One of the lesser mentioned features in SQL 2014 is the new cardinality estimator. In this release Microsoft are making significant changes to query optimiser – the heart of the SQL Server Engine. It’s a brave move – and something they are doing with caution, so much so that to use it on upgraded databases you will have to use a trace flag to enable it.
SQL uses statistics on data in your tables and indexes to determine how to execute a query. Cardinality is a measure of density which the optimiser uses to estimate how many rows match the predicates in your query (the filter criteria). This is a very important part of the plan generation, for example if the statistics suggest only a small number of rows will be returned from two tables that are to be joined then it is likely that an nested loop join will be used. If there are many rows then a hash join or maybe a merge join will be used. Each method has positives and negatives, getting it wrong can create painfully slow queries. For example if the statistics suggest 100 rows will be returned for a join but in truth it is a million rows then the optimiser will probably try to perform a nested loop join. This will probably be significantly slower.
For sometime people have been asking for Microsoft to introduce partitioned statistics. This is needed because for very large tables updating statistics on very old data repeatedly just to include a small percentage of new data was resource and time consuming. Microsoft has responded and included partitioned statistics in SQL 2014. I’ve been unable to find a way of testing this is CTP1 – I suspect it isn’t available yet as they have not included the new Compatibility Mode (120) yet. This is usually required when syntax changes are introduced.
So with this change to statistics has forced Microsoft’s hand in looking at the cardinality estimator to support the new feature. But they from the looks of things they have taken the opportunity to re-work the engine. The risk of forcing the change on everyone is too risky, if it was to wreck havoc on customer servers it would be severely damaging to the product – so the use of trace flags to enable it is no surprise. Slightly oddly the trace flag is not via the usual method of using DBCC TRACEON, instead it is a query hint. This is good in the sense that you can control where it is used if you see conflicting results.
The syntax to use the new cardinality estimator on databases created on previous versions of SQL Server (this includes attached or restored databases) is:
OPTION (QUERYTRACEON 2312)
To disable the new estimator on databases created on SQL 2014 the syntax is:
OPTION (QUERYTRACEON 9481)
When you run the query with this option enabled and view the XML execution plan you should see CardinalityEstimationModel110=”ON” near the top of the plan.
I’ve played with the option with several queries using the Adventure Works 2012 databases and found the results hit and miss. Some of the estimates were better, some were worse. But in either case the amount of reads recorded using STATISTICS IO were similar.
Here is an interesting example using a restored adventure works database on SQL 2014 CTP1 so I had to enable the option:
SET STATISTICS IO ON SELECT COUNT(*) FROM Sales.SalesOrderHeader H JOIN Sales.SalesOrderDetail D ON D.SalesOrderID = H.SalesOrderID JOIN Production.TransactionHistory TH ON D.ProductID = TH.ProductID WHERE H.DueDate > TH.TransactionDate AND OnlineOrderFlag = 0 SET STATISTICS IO OFF SET STATISTICS IO ON SELECT COUNT(*) FROM Sales.SalesOrderHeader H JOIN Sales.SalesOrderDetail D ON D.SalesOrderID = H.SalesOrderID JOIN Production.TransactionHistory TH ON D.ProductID = TH.ProductID WHERE H.DueDate > TH.TransactionDate AND OnlineOrderFlag = 0 OPTION (QUERYTRACEON 2312) SET STATISTICS IO OFF
If you look at the messages window the number of reads are identical for both queries. Now look at the query plans, this is from the first example using the old estimator:
In the screenshot I’m hovering over the arrow to the left of the Hash Match join operator. You can see the estimator is pretty close. Now look at the same item for the plan on the new estimator:
Here you can see that the estimate is well off. Yet the reads are the same, the query cost (which is never particularly reliable) shows that the second query is 41% of the load versus 59% for the old estimator query. Running the query with STATISTICS TIME on shows the second query is about 10% faster.
It’s hard to draw any conclusions from this yet – time will tell how it performs.
Microsoft are yet to release details of the changes they have made so it is very hard to write queries to test the scenarios they are expecting to be better. They may actually never release that detail as the optimiser has always been a black box that I suspect very few people at Microsoft even fully understand.
Still – download CTP1 and have a play.