Over the past 8 months, I’ve been performing a lot of SQL Server health checks (if you would like one done, drop me a line at jim AT jimmcleod DOT net), and this involves an analysis of the most read-intensive queries that are being performed by SQL Server. SQL Server’s architecture runs off 8 KB pages – database data files are made up of 8 KB pages (allocated as 64 KB extents, or 8 pages), and a read-intensive query is one that performs a high number of page reads to return the required information.
Some queries might require reading a large number of pages in order to get the correct data for the query, but inefficient indexes can blow out the number of pages read. It’s not unusual to see queries being executed that perform 50 GB of reads against a 10 GB database, effectively visiting the same pages over and over. It obviously takes more CPU time to process a higher number of pages, but it’s worse when you add the speed of the disk in as a factor. If your query performs a table scan of a table containing 100 MB of data, SQL Server will need to read all 100 MB (12,800 pages)off disk and into memory (unless it’s already cached). The query execution will effectively halt while waiting for all the pages to be loaded into memory, and each page will then be scanned looking for those rows that match the where criteria. Also, while loading in this 100 MB of data, another 100 MB has been pushed out of the cache, meaning SQL Server must load that data back from disk when it’s needed again.
If that wasn’t enough, high read counts are also major causes of blocking. A process is blocked when it is forced to wait for a resource to become available. Most important to the current topic is waiting for a page to be lifted from disk into memory as discussed above, or waiting on locks to be removed. For example, if a process is scanning a table looking for rows, and another process wants to update a single row, the update process will be blocked until the first process has finished, even if those two processes are after completely separate rows. (Note that this example doesn’t go into isolation levels, which makes blocking much more interesting.)
So, how can we see how many reads are used, and how can we tell if those are excessive? There are two easy ways to find out the number of reads – Profiler, or SET STATISTICS IO ON. More on these in the next post.