SQL Functions (Analytic)

Introduction

You use analytic functions to determine values based on groups of values. For example, you can use this type of function to determine running totals, percentages, or the top result within a group.

Syntax

  1. FIRST_VALUE ( scalar_expression ) OVER ( [ partition_by_clause ] order_by_clause [ rows_range_clause ] )
  2. LAST_VALUE ( scalar_expression ) OVER ( [ partition_by_clause ] order_by_clause [ rows_range_clause ] )
  3. LAG ( scalar_expression [ , offset ] [ , default ] ) OVER ( [ partition_by_clause ] order_by_clause )
  4. LEAD ( scalar_expression [ , offset ] [ , default ] ) OVER ( [ partition_by_clause ] order_by_clause )
  5. PERCENT_RANK ( ) OVER ( [ partition_by_clause ] order_by_clause )
  6. CUME_DIST ( ) OVER ( [ partition_by_clause ] order_by_clause )
  7. PERCENTILE_DISC ( numeric_literal ) WITHIN GROUP ( ORDER BY order_by_expression [ ASC | DESC ] ) OVER ( [ partition_by_clause ] )
  8. PERCENTILE_CONT ( numeric_literal ) WITHIN GROUP ( ORDER BY order_by_expression [ ASC | DESC ] ) OVER ( [ partition_by_clause ] )

FIRST_VALUE

You use the FIRST_VALUE function to determine the first value in an ordered result set, which you identify using a scalar expression.

SELECT StateProvinceID, Name, TaxRate,
       FIRST_VALUE(StateProvinceID)
        OVER(ORDER BY TaxRate ASC) AS FirstValue
FROM SalesTaxRate;

In this example, the FIRST_VALUE function returns the ID of the state or province with the lowest tax rate. The OVER clause orders the rows by tax rate in ascending order and has no PARTITION BY clause, so every row receives the same value: the ID from the row with the lowest rate.

StateProvinceID  Name                           TaxRate  FirstValue
74               Utah State Sales Tax           5.00     74
36               Minnesota State Sales Tax      6.75     74
30               Massachusetts State Sales Tax  7.00     74
1                Canadian GST                   7.00     74
57               Canadian GST                   7.00     74
63               Canadian GST                   7.00     74
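
The syntax also allows a PARTITION BY clause, which the query above does not use. The following is a minimal sketch of its effect, run through Python's built-in sqlite3 module instead of SQL Server so that it is self-contained. That choice is an assumption (SQLite 3.25 or later is needed for window functions), and the tax_rate table and its Country column are hypothetical, added only to give the function something to partition on.

```python
import sqlite3

# Hypothetical table: Country is not part of the original SalesTaxRate example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tax_rate (Country TEXT, StateProvinceID INTEGER, TaxRate REAL)"
)
conn.executemany(
    "INSERT INTO tax_rate VALUES (?, ?, ?)",
    [("US", 74, 5.00), ("US", 36, 6.75), ("US", 30, 7.00), ("CA", 1, 7.00)],
)

# FIRST_VALUE restarts in every partition, so each country gets its own
# lowest-rate StateProvinceID.
rows = conn.execute("""
    SELECT Country, StateProvinceID, TaxRate,
           FIRST_VALUE(StateProvinceID)
             OVER (PARTITION BY Country ORDER BY TaxRate ASC) AS FirstValue
    FROM tax_rate
    ORDER BY Country, TaxRate
""").fetchall()

for row in rows:
    print(row)
```

Each US row reports 74 and the single CA row reports 1, because the function is evaluated separately within each partition.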

LAST_VALUE

The LAST_VALUE function provides the last value in an ordered result set, which you specify using a scalar expression.

SELECT TerritoryID, StartDate, BusinessEntityID,
       LAST_VALUE(BusinessEntityID)
        OVER(ORDER BY TerritoryID) AS LastValue
FROM SalesTerritoryHistory;

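This example orders the rows by TerritoryID and specifies no window frame, so the default frame ends at the current row and its peers, that is, the rows with the same TerritoryID. The LAST_VALUE function therefore returns the BusinessEntityID of the last row that shares the current row's TerritoryID, not the last row of the whole result set.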

TerritoryID  StartDate                BusinessEntityID  LastValue
1            2005-07-01 00:00:00.000  280               283
1            2006-11-01 00:00:00.000  284               283
1            2005-07-01 00:00:00.000  283               283
2            2007-01-01 00:00:00.000  277               275
2            2005-07-01 00:00:00.000  275               275
3            2007-01-01 00:00:00.000  275               277
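
What LAST_VALUE returns depends on the window frame given by the optional rows_range_clause. The following is a minimal sketch of that effect, again using Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption; SQLite 3.25 or later is required). The history table is hypothetical and holds only the territory 1 and 2 rows shown above.

```python
import sqlite3

# Hypothetical cut-down version of SalesTerritoryHistory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (TerritoryID INTEGER, BusinessEntityID INTEGER)")
conn.executemany(
    "INSERT INTO history VALUES (?, ?)",
    [(1, 280), (1, 284), (1, 283), (2, 277), (2, 275)],
)

# default_frame: no frame clause, so the frame stops at the current row.
# full_frame: the frame is widened to cover the whole partition.
rows = conn.execute("""
    SELECT TerritoryID, BusinessEntityID,
           LAST_VALUE(BusinessEntityID)
             OVER (PARTITION BY TerritoryID ORDER BY BusinessEntityID)
             AS default_frame,
           LAST_VALUE(BusinessEntityID)
             OVER (PARTITION BY TerritoryID ORDER BY BusinessEntityID
                   ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
             AS full_frame
    FROM history
    ORDER BY TerritoryID, BusinessEntityID
""").fetchall()

for row in rows:
    print(row)
```

With the default frame, each row simply sees its own BusinessEntityID, because it is the last row in a frame that ends at the current row. With the explicit frame, every row in territory 1 reports 284 and every row in territory 2 reports 277.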

LAG and LEAD

The LAG function provides data on rows before the current row in the same result set. For example, in a SELECT statement, you can compare values in the current row with values in a previous row.

You use a scalar expression to specify the value to return from the earlier row. The offset parameter is the number of rows before the current row from which that value is taken. If you don't specify an offset, the default of one row is used.

The default parameter specifies the value to return when the offset row falls outside the partition, for example for the first row when the offset is one. If you don't specify a default, NULL is returned.


The LEAD function provides data on rows after the current row in the row set. For example, in a SELECT statement, you can compare values in the current row with values in the following row.

You specify the value to return from the later row using a scalar expression. The offset parameter is the number of rows after the current row from which that value is taken.

The default parameter specifies the value to return when the offset row falls outside the partition, for example for the last row when the offset is one. If you don't specify these parameters, an offset of one row is used and NULL is returned for rows that have no following row.

SELECT BusinessEntityID, SalesYTD,
       LEAD(SalesYTD, 1, 0) OVER(ORDER BY BusinessEntityID) AS "Lead value",
       LAG(SalesYTD, 1, 0) OVER(ORDER BY BusinessEntityID) AS "Lag value"
FROM SalesPerson;

This example uses the LEAD and LAG functions to compare each salesperson's year-to-date sales with those of the salespeople listed directly below and above, with records ordered by the BusinessEntityID column. The first row has no preceding row, so its Lag value is the default of 0.

BusinessEntityID  SalesYTD      Lead value    Lag value
274               559697.5639   3763178.1787  0.0000
275               3763178.1787  4251368.5497  559697.5639
276               4251368.5497  3189418.3662  3763178.1787
277               3189418.3662  1453719.4653  4251368.5497
278               1453719.4653  2315185.6110  3189418.3662
279               2315185.6110  1352577.1325  1453719.4653
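
The default argument is used only when no row exists at the requested offset. A NULL stored in a row that does exist is returned as NULL. The following is a minimal sketch of the difference, using Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption; SQLite 3.25 or later is required). The sales table and its three rows are hypothetical.

```python
import sqlite3

# Hypothetical data: the middle row has a NULL SalesYTD on purpose.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (BusinessEntityID INTEGER, SalesYTD REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(274, 559697.56), (275, None), (276, 4251368.55)],
)

rows = conn.execute("""
    SELECT BusinessEntityID,
           LAG(SalesYTD, 1, 0)  OVER (ORDER BY BusinessEntityID) AS lag_value,
           LEAD(SalesYTD, 1, 0) OVER (ORDER BY BusinessEntityID) AS lead_value
    FROM sales
    ORDER BY BusinessEntityID
""").fetchall()

for row in rows:
    print(row)
```

Row 274 gets the default 0 as its lag value because nothing precedes it, but its lead value is NULL (None in Python) because row 275 exists and holds a NULL. Row 276 mirrors this: a NULL lag value and the default 0 as its lead value.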

PERCENT_RANK and CUME_DIST

The PERCENT_RANK function calculates the relative rank of a row within its row set as (rank - 1) / (number of rows - 1). The result is the fraction of the other rows in the group that sort before the current row.

The first value in the result set always has a percent rank of zero. The last value in the set has a percent rank of one, unless it ties with other rows or is the only row in its group.


The CUME_DIST function calculates the relative position of a specified value in a group of values, by determining the fraction of rows whose values sort at or before that value. This is called the cumulative distribution.

SELECT BusinessEntityID, JobTitle, SickLeaveHours,
       PERCENT_RANK() OVER(PARTITION BY JobTitle ORDER BY SickLeaveHours DESC)
       AS "Percent Rank",
       CUME_DIST() OVER(PARTITION BY JobTitle ORDER BY SickLeaveHours DESC)
       AS "Cumulative Distribution"
FROM Employee;

In this example, you use a PARTITION BY clause to partition, or group, the rows retrieved by the SELECT statement based on employees' job titles. The ORDER BY clause sorts the rows in each group by their SickLeaveHours values in descending order.

BusinessEntityID  JobTitle                                  SickLeaveHours  Percent Rank       Cumulative Distribution
267               Application Specialist                    57              0                  0.25
268               Application Specialist                    56              0.333333333333333  0.75
269               Application Specialist                    56              0.333333333333333  0.75
272               Application Specialist                    55              1                  1
262               Assistant to the Chief Financial Officer  48              0                  1
239               Benefits Specialist                       45              0                  1
252               Buyer                                     50              0                  0.111111111111111
251               Buyer                                     49              0.125              0.333333333333333
256               Buyer                                     49              0.125              0.333333333333333
253               Buyer                                     48              0.375              0.555555555555555
254               Buyer                                     48              0.375              0.555555555555555

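The PERCENT_RANK function ranks the entries within each group. In the Application Specialist group, which has four rows, the two rows with 56 hours share rank 2, so each has a percent rank of (2 - 1) / (4 - 1), or about 0.33.

The CUME_DIST function is similar, except that it also counts the current row and its ties. Three of the four Application Specialist rows sort at or before 56 hours, so the cumulative distribution for those rows is 3 / 4, or 0.75.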
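
Both results can be reproduced from RANK and COUNT. The following is a minimal sketch that compares the built-in functions with those hand-written formulas, using Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption; SQLite 3.25 or later is required). The employee table is hypothetical and holds only the four Application Specialist rows from the table above.

```python
import sqlite3

# Hypothetical cut-down Employee table with a single job title.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (BusinessEntityID INTEGER, SickLeaveHours INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [(267, 57), (268, 56), (269, 56), (272, 55)],
)

# pr_by_hand: (rank - 1) / (rows - 1)
# cd_by_hand: rows that sort at or before the current row / all rows.
# COUNT(*) with an ORDER BY window counts up to the current row and its ties.
rows = conn.execute("""
    SELECT BusinessEntityID,
           PERCENT_RANK() OVER (ORDER BY SickLeaveHours DESC) AS pr,
           CUME_DIST()    OVER (ORDER BY SickLeaveHours DESC) AS cd,
           (RANK() OVER (ORDER BY SickLeaveHours DESC) - 1) * 1.0
               / (COUNT(*) OVER () - 1) AS pr_by_hand,
           COUNT(*) OVER (ORDER BY SickLeaveHours DESC) * 1.0
               / COUNT(*) OVER () AS cd_by_hand
    FROM employee
    ORDER BY BusinessEntityID
""").fetchall()

for row in rows:
    print(row)
```

For each row, the built-in result and the hand-written formula agree. The values also match the Application Specialist rows in the result table above.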

PERCENTILE_DISC and PERCENTILE_CONT

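The PERCENTILE_DISC function returns the first value, in sort order, whose cumulative distribution is greater than or equal to the percentile that you provide using the numeric_literal parameter. The result is always a value that actually appears in the set.

The WITHIN GROUP clause specifies the order of the values. The PARTITION BY clause inside the OVER clause specifies the rowset or partition over which the percentile is calculated.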


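The PERCENTILE_CONT function is similar to the PERCENTILE_DISC function, but treats the values as a continuous distribution. When the requested percentile falls between two entries, it interpolates linearly between them, so the result might not be a value that appears in the set.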

SELECT BusinessEntityID, JobTitle, SickLeaveHours,
       CUME_DIST() OVER(PARTITION BY JobTitle ORDER BY SickLeaveHours ASC)
       AS "Cumulative Distribution",
       PERCENTILE_DISC(0.5) WITHIN GROUP(ORDER BY SickLeaveHours)
          OVER(PARTITION BY JobTitle) AS "Percentile Discrete"
FROM Employee;

To find the actual value from the row that matches or exceeds the 0.5 percentile, you pass the percentile as the numeric literal in the PERCENTILE_DISC function. The Percentile Discrete column in the result set lists the value of the first row at which the cumulative distribution is greater than or equal to the specified percentile. Here that value is 56, because the row with 55 hours reaches only 0.25 and the rows with 56 hours reach 0.75.

BusinessEntityID  JobTitle                SickLeaveHours  Cumulative Distribution  Percentile Discrete
272               Application Specialist  55              0.25                     56
268               Application Specialist  56              0.75                     56
269               Application Specialist  56              0.75                     56
267               Application Specialist  57              1                        56

To calculate a percentile by interpolating between values, you use the PERCENTILE_CONT function. The "Percentile Continuous" column in the results lists the interpolated value. With four rows, the 0.5 percentile falls midway between the second and third values in sort order, which are both 56, so the result is also 56.

SELECT BusinessEntityID, JobTitle, SickLeaveHours,
       CUME_DIST() OVER(PARTITION BY JobTitle ORDER BY SickLeaveHours ASC)
       AS "Cumulative Distribution",
       PERCENTILE_DISC(0.5) WITHIN GROUP(ORDER BY SickLeaveHours)
          OVER(PARTITION BY JobTitle) AS "Percentile Discrete",
       PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY SickLeaveHours) 
          OVER(PARTITION BY JobTitle) AS "Percentile Continuous"
FROM Employee;

BusinessEntityID  JobTitle                SickLeaveHours  Cumulative Distribution  Percentile Discrete  Percentile Continuous
272               Application Specialist  55              0.25                     56                   56
268               Application Specialist  56              0.75                     56                   56
269               Application Specialist  56              0.75                     56                   56
267               Application Specialist  57              1                        56                   56
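
In the results above, both functions return 56, so the difference between them is not visible. The following is a minimal pure-Python sketch of the two definitions, written for illustration only. The function names percentile_disc and percentile_cont are hypothetical helpers, not a database API, and the sketch assumes a non-empty list and a percentile between 0 and 1.

```python
import math


def percentile_disc(values, p):
    """Return the first value, in ascending order, whose cumulative
    distribution (position / row count) is greater than or equal to p."""
    ordered = sorted(values)
    n = len(ordered)
    for position, value in enumerate(ordered, start=1):
        if position / n >= p:
            return value


def percentile_cont(values, p):
    """Return the value at fractional row position p * (n - 1), counted
    from zero, interpolating linearly between the two neighboring rows."""
    ordered = sorted(values)
    position = p * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


hours = [55, 56, 56, 57]  # Application Specialist rows from the table above
print(percentile_disc(hours, 0.5), percentile_cont(hours, 0.5))

pair = [55, 57]           # made-up two-row group where the results differ
print(percentile_disc(pair, 0.5), percentile_cont(pair, 0.5))
```

For the four Application Specialist rows, both helpers give 56. For the made-up pair, percentile_disc returns 55, a value that exists in the set, while percentile_cont returns 56.0, which lies between the two rows.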