Common Table Expressions (CTE) were introduced into SQL Server 2005. The primary purpose was to perform recursive queries – a big plus feature to SQL Server. However they can also be used for many other purposes too.

A problem I was recently presented with was to take a common GROUP BY query that showed sales by category and also show the top selling item within that category.

Here is a sample schema and data we are working with:

CREATE TABLE Products(
      ProductID         int                     NOT NULL    PRIMARY KEY,
      ProductName       varchar(100)            NOT NULL,
      CategoryID        int                     NOT NULL);

INSERT INTO Products
SELECT 1, 'Red Car', 1
UNION ALL SELECT 2, 'Green Car', 1
UNION ALL SELECT 3, 'Black Car', 1
UNION ALL SELECT 4, 'Red Truck', 2
UNION ALL SELECT 5, 'White Truck', 2;

CREATE TABLE ProductCategories(
      CategoryID        int                     NOT NULL    PRIMARY KEY,
      CategoryName      varchar(100)            NOT NULL);

INSERT INTO ProductCategories
SELECT 1, 'Cars'
UNION ALL SELECT 2, 'Trucks';

CREATE TABLE SalesOrderDetails(
      DetailID          int   IDENTITY(1,1)     NOT NULL PRIMARY KEY,
      ProductID         int                     NOT NULL,
      Qty               int                     NOT NULL);

INSERT INTO SalesOrderDetails
SELECT 1, 5
UNION ALL SELECT 2, 4
UNION ALL SELECT 2, 2
UNION ALL SELECT 2, 6
UNION ALL SELECT 1, 1
UNION ALL SELECT 3, 4
UNION ALL SELECT 4, 3
UNION ALL SELECT 4, 2
UNION ALL SELECT 5, 1
UNION ALL SELECT 5, 8
UNION ALL SELECT 5, 1;

Historically this query would be done by creating a temp table or at least a derived table, performing the GROUP BY and doing a sub query to get the top item. This would work and performance would be fine on a small set of data. However it involves several statements and the sub query is limited to one column being returned.

Using 3 of the newer features we can write the query in one statement as follows:

WITH CTE AS(
SELECT P.CategoryID, ProductName, SUM(D.Qty) Qty, 
	ROW_NUMBER() OVER (PARTITION BY P.CategoryID 
		ORDER BY SUM(D.Qty) DESC) as [RANK]
FROM Products P
   JOIN SalesOrderDetails D ON D.ProductID = P.ProductID
GROUP BY P.CategoryID, ProductName)
SELECT C.CategoryName, SUM(CTE.Qty) TotalUnitSales, 
	T.ProductName TopSellingProduct
FROM CTE 
   JOIN ProductCategories C ON C.CategoryID = CTE.CategoryID
   CROSS APPLY (SELECT ProductName 
		FROM CTE PR 
		WHERE PR.[Rank] = 1 
				AND PR.CategoryID = CTE.CategoryID) T
GROUP BY C.CategoryName, T.ProductName;

Using the CTE we can get the total sales per product, we also rank the sales using the Row_Number feature over each category. Running just the CTE portion of the statement gives you:

CategoryID ProductName Qty RANK
1 Green Car 12 1
1 Red Car 6 2
1 Black Car 4 3
2 White Truck 10 1
2 Red Truck 5 2

The second part of the query aggregates the sales per category. And then lastly we use CROSS APPLY to select the top selling product for each category. This gives us the final resultset of:

CategoryName TotalUnitSales TopSellingProduct
Cars 22 Green Car
Trucks 15 White Truck

And happy days we have our results as expected. This is a very simple example of what could be done but hopefully this gives you the general idea.

Tagged with →  

Leave a Reply

Your email address will not be published. Required fields are marked *