Apr 09 2013
 

If you haven’t heard, someone important died today. Someone that changed the world. No, it wasn’t a real person. SQL Server 2000’s support lifecycle finally came to a whimpering end. Not a dignified, noble, heroic, or tragic end. More of a fizzling out.

Cast your mind back, if you can, to the turn of the century. 2000. A year of renewed hope, where everyone was thankful that the Y2K bug hadn’t wiped out life as we know it. Everyone in the SQL Server world was excited about a brand new release1. SQL Server 8 was going to be fantastic, but when it exploded out onto the scene in late 2000, it was SQL Server 2000, which is a lot more exciting than 8.0. Like, 1992 better than 8. Over the next 12-24 months, SQL Server 2000 was it. Everyone loved it. Microsoft were making fantastic strides in enterprise database software, and SQL Server was finally putting some real pressure on Oracle. The new features in 2000 were such a leap from 7.0, and made managing databases much easier.

SQL Server 2000 rocked.

It had some cracks. How can anyone forget Slammer, a worm that exploited a buffer overflow in SQL Server 2000’s resolution service (the forerunner of the SQL Browser service)? It was relatively benign, apart from the massive amount of network traffic it generated. In theory, it could have done anything once it had gotten into the service. For a decent layman’s2 view, have a look at the Wired article from July 2003.

A 64-bit version was released in 2003. Reporting Services was released in 2004. It just kept getting better and better. There were hiccups – Service Pack 4 suffered a disastrous memory bug, where only half of the server’s memory could be used by AWE.

But then SQL Server 2005 came out, and it was fantastic in comparison. It took 12-24 months to start seeing a decline in SQL Server 2000 installations, but it happened. SQL Server 2008 came out, and people started sneering at SQL Server 2000. It was primitive, hard to work with, ugly. DTS was horrible compared to SSIS. Those that had sung its praises now sighed with resignation whenever they had to use Enterprise Manager, and I’m sorry to say that I have to count myself in that group of people. “Why haven’t you upgraded from 2000 yet? It’s 2010! Way past time!” And yet, SQL Server 2000 kept chugging along, doing what it had always done, as the number of installs dwindled and dwindled.

From a very high height, SQL Server 2000 has fallen, but it has not had an end like the greatest heroes of history. Let’s spend a minute today and remember how great this product was. If SQL Servers 2005, 2008, 2008 R2 and 2012 are so great, it’s because they stood on the shoulders of a giant.3

Rest in peace, old friend.

 

Notes:

1 Artistic liberty, because I was working with other SQL implementations back then and not paying attention to SQL Server.

2 If laymen understand assembly…

3And of course, SQL Server 2000 stood on the shoulders of 7.0, and so forth, but that fact ruins the tone of the sentence.

Jan 08 2013
 

Welcome to 2013! I’m not usually one for making resolutions based on the year changing, but the topic of this month’s T-SQL Tuesday, courtesy of Jason Brimhall, is "Standing Firm", which is awfully similar to the MacLeod clan’s motto: "Hold Fast".

While I haven’t started the year with a resolution in mind, New Year’s Day did mark the start of me riding my bicycle again after a number of years of neglect. The motivation was to get back into doing something I love – and, as usual, I was reminded how much I enjoy being on the bike as soon as I got out. However, I know from bitter experience that it’s so easy to get out of the habit. It’s too easy to make up excuses such as:

  • I have to ride early in the day due to other commitments, but I didn’t get to bed until late (my most common reason!)
  • It’s too cold/wet/windy/dark to go out.
  • There’s nothing but hills around and I want something flat.
  • How can I resolve to stay on the bike, and not fall off? (The answer is obvious – keep hold of the handlebars!)

Solution

I’ve gotten back into cycling a few times before, and it usually lasts as long as I have a strong reason to – for example, cycling being a more convenient way of commuting than public transport. Basically, a goal. However, what I find works well to stand firm with the resolve to ride is to spend some time thinking about what could go wrong, or what excuses I could come up with, and how to handle them ahead of time. For example:

  • Choose a goal to ride for – in my case it’s getting my fitness back, trying to ride more kilometres than @woodgreg, and spending time with my cycling friends who have been awfully neglected lately. I’ll need a stronger goal soon, but that works for now.
  • My left shifter only has a short life left, and you can’t buy 9 speed replacements. Solution is to prepare early, and get a new bike ($$$), or overhaul to 10 speed ($$).
  • It’s light outside at 6am now, but soon it will be dark, and I don’t want to bother with contact lenses at 6am, don’t want to wear normal glasses, and it’ll be too dark for my prescription sunglasses. Solution: look into prescription cycling glasses – it’s a solved problem. Do it before I need it.
  • Not enough sleep? Get to bed earlier, although I’ll probably need badgering from my wife, as there’s always something more interesting to do than sleep.
  • Too cold? Get the right gear. Too wet? Get the right gear. Too windy? Harden up a bit.

Jim, what’s this got to do with databases and standing firm?

Good question. The steps above show how my resolve is strengthened, not by force of will alone, but by putting together an environment that helps keep me on track. We can apply this to the SQL Server certification process. If the goal is to study for a specific exam, what are the possible problems that can crop up?

  • I just don’t understand this material.
  • I’m too tired to study.
  • I don’t have the right materials/equipment/practice environments.
  • There’s just too much to learn!
  • I have no interest in this – I’m just doing it because my boss wants me to.

I’m sure there are other reasons, but you can follow the same process for any objection.

  • If you don’t understand the material, you may need someone to help you work through it. Find a friendly SQL Server person, offer them coffee in exchange for some of their time.
  • If your brain is too tired at the end of the day, consider studying during a lunch break, before starting work, or arrange with your boss to spend an hour each morning studying (and thinking how you can apply the material to your job).
  • If you feel overwhelmed by the amount of the material, break it down, and resolve to learn one subject a week.
  • If you have no interest in the material, then considering this excuse may be the reflection you need to decide whether this is actually right for you.

That’s it. Put together an environment that supports and encourages you to stick to your resolution.

I’d also recommend spending a few hours a week on a bicycle. It’s fun, it gives you a lot of exercise, and plenty of time to compose your thoughts. I think it’s no coincidence that my blog posting frequency correlates to those times when I was riding a lot!

Aug 14 2012
 

It’s T-SQL Tuesday again, and this month’s gracious host is Mike Fal (Blog | Twitter), urging us to deliver wonderful stories about SQL Server trick shots. The first thing that comes to mind is the Yo Gabba Gabba “Cool Tricks” segment. You should probably read the rest of my post before watching the video, or you’re liable to end up watching Flight of the Conchords shorts for the next twenty minutes.

My trick shot is nothing particularly revolutionary, but my hope is that at least one person hasn’t seen this trick before. I’ve used it often enough that I’ve got it down to a fine art.

The trick is using SQL Server Profiler to discover the inner workings of SQL Server Management Studio. Almost everyone knows about the “Script” button at the top of most dialog boxes to generate the T-SQL commands to perform a specific action, but occasionally you want to see exactly how SSMS came up with something it is displaying on screen. There’s nothing particularly special about SSMS – it doesn’t appear to interface with SQL Server in any way other than you or I could, using primarily T-SQL commands, with a bit of SMO thrown in. This generally means that for anything you can see or do in SSMS, there will be an equivalent set of T-SQL commands, and you can capture that activity.

The trick in action

For example, one of my preferred methods to see how much space is available in a file is to use the Shrink File dialog (right click the database in Object Explorer, Tasks, Shrink, Files).

[Screenshot: the Shrink File dialog, showing 220 MB currently allocated and 36.75 MB available free space]

There’s 220 MB allocated, and 36.75 MB available. That could be a very useful thing to know, and to use in my own set of diagnostic queries. How does SSMS determine this? Time to bring out the trick shot.

Fire up SQL Server Profiler from the Tools menu, and connect to a non-production machine. You can keep all the default events, but the important part is the filter. If you’re on a machine by yourself, a simple filter on Application Name  (like ‘%Management Studio%’) is sufficient, but you can add extra filters for your login name if necessary.

[Screenshot: Profiler trace properties, with a column filter on Application Name LIKE '%Management Studio%']

Once the trace is running, perform the fewest steps necessary to trigger what you want to capture, clearing the trace window if need be. For example, right-clicking on a database in the Object Explorer causes 39 lines’ worth of activity to appear. The minimum here would be to get into the Shrink File dialog, then change the drop-down lists to choose which file you want to display. The fewer rows to wade through, the easier it will be to find the T-SQL you’re looking for. In our case:

exec sp_executesql N'use [AdventureWorks];select 
CAST(CASE s.type WHEN 2 THEN s.size * CONVERT(float,8) ELSE dfs.allocated_extent_page_count*convert(float,8) END AS float) AS [UsedSpace],
CASE s.type WHEN 2 THEN 0 ELSE @_msparam_0 - dfs.allocated_extent_page_count*convert(float,8) END AS [AvailableSpace] 
from 
sys.filegroups AS g
inner join sys.database_files AS s on ((s.type = 2 or s.type = 0) and (s.drop_lsn IS NULL)) AND (s.data_space_id=g.data_space_id)
left outer join sys.dm_db_file_space_usage as dfs ON dfs.database_id = db_id() AND dfs.file_id = s.file_id
where 
s.name = @_msparam_1 and g.data_space_id = @_msparam_2
',N'@_msparam_0 nvarchar(4000),@_msparam_1 nvarchar(4000),@_msparam_2 nvarchar(4000)',@_msparam_0=N'225280',@_msparam_1=N'AdventureWorks2008R2_Data',@_msparam_2=N'1'

This returns a UsedSpace of 187648 KB (183.25 MB), and AvailableSpace of 37632 KB (36.75 MB), the sum of which is the 220 MB shown above. We’ve managed to discover that SSMS uses sys.dm_db_file_space_usage to determine free space.
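
Distilled into something you can drop into your own diagnostic scripts, it boils down to the following sketch – not the exact SSMS query, and assuming SQL Server 2012, the version the query above was captured from:

-- Allocated, used and available space per data file in the current database, in MB
SELECT f.name AS [file_name],
       f.size * 8 / 1024.0 AS allocated_mb,
       dfs.allocated_extent_page_count * 8 / 1024.0 AS used_mb,
       (f.size - dfs.allocated_extent_page_count) * 8 / 1024.0 AS available_mb
FROM sys.database_files AS f
INNER JOIN sys.dm_db_file_space_usage AS dfs ON dfs.file_id = f.file_id
WHERE f.type = 0 -- data (rows) files only
GO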

What have I learned here?

I’ve learned many things in the past using this trick, but the most recent was while writing up this post, which was done using SQL Server 2012 (with an upgraded 2008 R2 AdventureWorks database). I originally tested using SSMS 2008 R2, connecting to a 2008 instance, and those tests showed that SSMS was actually using the FILEPROPERTY function to get the space used, with code similar to:

SELECT FILEPROPERTY('AdventureWorks2008R2_Data', 'SpaceUsed') * CONVERT(float, 8)

When I ran the test again to write it up, I was a little surprised to see sys.dm_db_file_space_usage appear. This has been available since SQL Server 2005, but this part of SSMS just hadn’t been updated to take advantage of the DMVs until now.

Parting words

There you have it – if you see an interesting value come out of SSMS, use Profiler to see exactly how it was calculated, and you can learn from it.

My name’s Jim, and my cool trick is using my Profiler x-ray glasses (go watch the video now).

Jun 26 2012
 

I recently posted my colour scheme for Management Studio 2005/2008, and have just completed porting it to Management Studio 2012.

[Screenshot: the colour scheme in Management Studio 2012]

As before, it’s based on a combination of Solarized and Son of Obsidian. The most visible change from the 2008 version is the highlighting – Visual Studio 2010 uses transparency for highlighting text, which looks rather bad when combined with my previous dark yellow colour. I’ve set it to black for now, which doesn’t look too bad. This plugin will apparently fix the transparency, but I haven’t tried it yet.

You can download the settings file from here (rename to remove the .txt extension from the file), and use the Tools –> Import and Export Settings menu in Management Studio. This also allows you to take a backup of your current settings, or revert back to defaults.

If you enjoy the settings, or have any suggestions, please let me know! I’m still using SSMS 2008 as my main IDE, so this version isn’t quite as well tested yet.

If you’re not a fan, or you’d like to enjoy other colour schemes for Visual Studio 2010, check out studiostyles (although many of the styles won’t be specifically set up for the SQL* elements in Fonts and Colors, so you may have to do some editing).

May 03 2012
 

I’m normally not a fan of messing with Management Studio’s default configuration unless I’m going to be using that version for a long, sustained period – I frequently use other environments that don’t allow me to have customised settings.

At the moment, I’m mainly using just two computers, and thought a change of scenery would be nice. Here’s a pretty screenshot:

[Screenshot: Management Studio with the Consolas font and the dark colour scheme]

There are two main components here – font selection and colour selection. Anyone who’s used Visual Studio or Management Studio 2012 will recognise the Consolas font (11pt), which is far nicer to read than the default Courier New. I believe Consolas is available with Word 2007, so most machines should have it preinstalled.

The colour scheme was originally inspired by Solarized, which is incredibly well designed and thought out. I made quite a few adjustments to add some more vibrant colours (such as the white for SQL operators, a brighter green, a brighter yellow, and a blacker background, heavily influenced by Son of Obsidian). This has the detrimental effect that pasting coloured text onto a white background (such as an email) doesn’t work too well. The original Solarized is great in that it works well on either light or dark backgrounds.

Switching back to the default colours does significantly shock your eyes due to the brightness, so I’m enjoying the comfort of the soft colours.

To install, simply back up your [HKEY_CURRENT_USER\Software\Microsoft\Microsoft SQL Server\100\Tools\Shell\FontAndColors\{A27B4E24-A735-4D1D-B8E7-9716E1E3D8E0}] registry key, and load in this new set (standard disclaimer about being careful with your registry, make sure you know what you’re doing, don’t blame me, etc). You can then either restart SSMS, or go to Tools, Options, Environment, Fonts and Colors, and simply click OK.

Enjoy, and if you make any changes, please let me know – I’d be interested in trying it!

Update – August 28, 2012: Bill Hurt has sent through a slightly modified version which changes the line numbers and background colours to a deep black/brown that goes well with the rest of the colour scheme. It’s certainly worth a look! Download here.

Jul 11 2011
 

Imagine that you’ve just used SQL Server Management Studio to generate a script, and it’s done its job wonderfully, except for a lot of unsightly GO statements and blank lines:

GRANT SELECT ON [Person].[Address] TO [SomeUser];
GO

GRANT INSERT ON [Person].[Address] TO [SomeUser];
GO

GRANT UPDATE ON [Person].[Address] TO [SomeUser];
GO

GRANT DELETE ON [Person].[Address] TO [SomeUser];
GO

Although technically correct, this may offend your sense of aesthetics. If there are only a couple of rows (as in the excerpt above), you can manually remove the unsightly GOs and blank lines, but this is a bigger ask when you have more than about 20 commands.

The simplest method I’ve come across is to use one of the DBA’s most trusted tools – Excel. Although there are other ways of doing this sort of operation, Excel is installed on almost every Windows machine, and this is a quick operation.

First, copy the entire set of commands, and paste into Excel as Column B, and in column A, add a number for each row (essentially an IDENTITY column). Remember that you can just fill out 1 and 2, select both of these, and drag down to the bottom:

[Screenshot: the script pasted into Excel column B, with row numbers filled down in column A]

This number column is used so that we know which order each row was in originally, because we’re about to sort the data. Select columns A and B for all rows, and sort by column B:

[Screenshot: columns A and B sorted by column B]

Next, select the rows you don’t want, and delete them.

[Screenshot: the GO and blank rows grouped together and selected for deletion]

Now, select the remaining data, and sort again, this time on column A, to return to the original sort order (minus the offending rows):

[Screenshot: the remaining rows sorted back into their original order by column A]

Copy the commands back into Management Studio, and you’re done.

GRANT SELECT ON [Person].[Address] TO [SomeUser];
GRANT INSERT ON [Person].[Address] TO [SomeUser];
GRANT UPDATE ON [Person].[Address] TO [SomeUser];
GRANT DELETE ON [Person].[Address] TO [SomeUser];
Apr 12 2011
 

Welcome to T-SQL Tuesday for April 12, 2011. This month is generously hosted by Matt Velic (Blog | Twitter), who poses the topic of APPLY, one of the many excellent additions to the T-SQL language in SQL Server 2005.

If you read other T-SQL Tuesday posts from today, you’ll get some excellent tips on how to use APPLY, and a list of the excellent things you can do with it. I’m going to go in a slightly different direction, and examine what your options are when using APPLY with database compatibility level 80.

Books Online states that the database must be at least compatibility level 90 to use APPLY.

It turns out that you can create a query that uses APPLY, and have it run in compatibility level 80. Consider the following setup:

CREATE DATABASE Compat80
GO
ALTER DATABASE Compat80 SET COMPATIBILITY_LEVEL=80
GO

USE Compat80
GO
SET NOCOUNT ON 
CREATE TABLE A (i int, name varchar(20)) 
CREATE TABLE B (i int, amount decimal(15,3)) 
INSERT INTO A (i, name) VALUES (1, 'Jack') 
INSERT INTO A (i, name) VALUES (2, 'Bob') 
INSERT INTO A (i, name) VALUES (3, 'Sally') 
INSERT INTO B (i, amount) VALUES (1, 25.0) 
INSERT INTO B (i, amount) VALUES (1, 50.5) 
INSERT INTO B (i, amount) VALUES (2, 16.0) 
INSERT INTO B (i, amount) VALUES (3, 110.0) 
GO 

Can we use an APPLY in Compatibility 80?

We’ve created a database in compatibility level 80, and created two tables, A and B. Let’s try a query with CROSS APPLY:

SELECT A.Name, dv.Amount FROM A 
CROSS APPLY (SELECT i, SUM(Amount) Amount 
             FROM B WHERE B.i = A.i GROUP BY i) dv 

[Screenshot: result set of the CROSS APPLY query, running successfully at compatibility level 80]

There’s not much to say here, except that I proved Books Online wrong – I can do a CROSS APPLY in Compatibility 80. I don’t know the exact reason why I can do this, but it’s likely to be the simplistic nature of the query (which makes it less interesting). I’ll do a quick little victory dance, and we’ll move on.

A more useful use of APPLY

Where APPLY really shines is when functions are used. Let’s go ahead and create a simple function to test with:

CREATE FUNCTION dbo.fn_getB(@i AS int) 
RETURNS @result TABLE 
( 
      i   int NOT NULL 
    , amount decimal(15,3) 
) 
AS BEGIN 
        INSERT INTO @result 
        SELECT i, amount FROM B WHERE i = @i 
        RETURN 
END 
GO

The logic for this function is slightly different from the previous query, but that only serves to make it more interesting. Let’s go ahead and use this function in an APPLY, remembering that we’re still in Compatibility 80:

SELECT * FROM A 
CROSS APPLY dbo.fn_getB(A.i) dv 

Msg 102, Level 15, State 1, Line 2

Incorrect syntax near ‘.’.

That’s disappointing, and rather cryptic. Let’s try again after switching to Compatibility 90:

ALTER DATABASE Compat80 SET COMPATIBILITY_LEVEL=90
GO 
SELECT * FROM A 
CROSS APPLY dbo.fn_getB(A.i) dv 
GO 

[Screenshot: result set of the function-based CROSS APPLY, now succeeding at compatibility level 90]

Success!

Getting around this restriction

We’ve seen that APPLY using functions is a handy tool, but can we still use it in databases that are in Compatibility 80? The following is a workaround, but it’s a bit of a hack. I just like thinking outside the box.

The limitation here is that we cannot let a Compatibility 80 database execute the query. We can, however, run the query from within a different database, such as Master (ignoring permission issues) by using sp_executesql.

EXEC sp_executesql N'USE Master; 
        SELECT * FROM Compat80.dbo.A 
        CROSS APPLY Compat80.dbo.fn_getB(A.i) dv ' 
GO

The main reason I’d want to use APPLY is to use the DMFs with the DMVs provided by SQL Server 2005, such as sys.dm_exec_*. However, these DMVs typically supply a database_id column, meaning you can run the query from within Master anyway. With that in mind, I can’t think of any real benefit of this technique – it’s almost always a better option to simply upgrade your databases to compatibility 90 or above – we are in 2011, after all.
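
To make the DMV-plus-DMF point concrete, here’s a minimal sketch of the kind of query I mean, joining sys.dm_exec_requests to the sys.dm_exec_sql_text function with CROSS APPLY. Run it from master (or any database at compatibility level 90 or above):

-- Show the statement text behind currently executing requests
SELECT r.session_id, r.database_id, t.text AS statement_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.session_id <> @@SPID
GO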

Thanks again for this month’s edition of T-SQL Tuesday, Matt!

Apr 11 2011
 

The question is: what is the difference between putting a filter in the ON clause and putting it in the WHERE clause?

Consider the following queries:

USE TempDB 
GO 
CREATE TABLE A (i int, name varchar(20)) 
INSERT INTO A (i, name) VALUES (1, 'Jack'), (2, 'Ryan')
    , (3, 'Simon'), (4, 'Sandra'), (5, 'Daryl') 
GO 

CREATE TABLE B (i int, amount decimal(9,3)) 
INSERT INTO B (i, amount) VALUES (1, 100.25), (2, 10000)
    , (1, 0.45), (4, 234.23) 

-- WHERE
SELECT * FROM A 
INNER JOIN B ON A.i = B.i 
WHERE A.Name NOT LIKE 'S%' 

-- ON
SELECT * FROM A 
INNER JOIN B ON A.i = B.i AND A.Name NOT LIKE 'S%' 

Consider the two queries at the bottom of the script above. The first uses an “A.Name NOT LIKE ‘S%’” filter in the WHERE clause, whereas the second uses the same filter in the ON clause. The results are identical:

[Screenshot: identical result sets from the WHERE and ON versions of the INNER JOIN]

Where this gets interesting is when we use an OUTER JOIN. For example, consider the following two queries, which are identical to the previous two, but using a LEFT OUTER JOIN:

-- Query A - WHERE filter 
SELECT * FROM A 
LEFT JOIN B ON A.i = B.i 
WHERE A.Name NOT LIKE 'S%' 

-- Query B - ON filter 
SELECT * FROM A 
LEFT JOIN B ON A.i = B.i AND A.Name NOT LIKE 'S%' 

And the results:

[Screenshot: results of the two LEFT JOIN queries – Query B (the ON filter) also returns Simon and Sandra, with NULL amounts]

What’s happened here? The only difference was the placement of the A.Name clause, but the second query (the ON) has added Simon and Sandra’s rows.

What’s happened is due to the ordering of the logical query processing steps. Logically, all JOINs are performed as INNER JOINs using the ON filters, and then, as a subsequent step, any OUTER JOIN rows are added back to the necessary side. Only after all JOINs are complete is the WHERE filter processed.

This means that in Query A (WHERE), the inner join between A and B was completed, then rows 3 (Simon), and 5 (Daryl) were added back in. Then the WHERE was applied, and Simon and Sandra were removed as their name begins with S.

In Query B (ON), A and B were joined together, but any rows in A with a name LIKE ‘S%’ were not joined, as the ON filter ignored them during the initial JOIN phase. The OUTER JOIN phase then added in all rows from the LEFT that weren’t joined, and so Simon and Sandra were added back in.
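
The complementary case – a filter on the non-preserved (B) side – is worth a quick sketch too, reusing tables A and B from above:

-- Filter on B in the ON clause: every row of A survives,
-- and non-matching amounts simply come back as NULL
SELECT * FROM A
LEFT JOIN B ON A.i = B.i AND B.amount > 100

-- Same filter in the WHERE clause: the NULL amounts are discarded,
-- and the LEFT JOIN quietly behaves like an INNER JOIN
SELECT * FROM A
LEFT JOIN B ON A.i = B.i
WHERE B.amount > 100

Same filter, same tables, and once again the only difference is whether it lives in the ON or the WHERE clause.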

Is this a problem? I don’t believe it’s much of an issue, as I tend to think of the WHERE clauses as being distinct from the ON clauses. Just remember to keep your join filters (ON) and your row excluders (WHERE) separate, and understand the difference when necessary.

If you ran the sample code, don’t forget to clean up:

DROP TABLE A 
DROP TABLE B 
Mar 08 2011
 

The subject of T-SQL Tuesday #016, kindly hosted by Jes Borland (blog | twitter), is Aggregation. Although it’s a fine topic, I could not figure out what I could write about aggregation that would capture my interest, and more importantly, dear reader, yours.

I thought about the topic ceaselessly while doing dishes, feeding children, and cleaning up the house. I considered aggregate functions, and optional grouping parameters. I twisted the topic inside and out, trying to figure out an interesting angle. I considered telling a story about the Tuples vs the Aggregates on their way to Query Station (and spent an embarrassingly long time on that train of thought). Finally, I went out for a run, thought of a topic pretty quickly and started turning the idea over in my mind for the next 7 km.

Columnstore Indexes

A columnstore index is a new feature of Denali, whereby an index is stored vertically – one column per page. Here is the whitepaper download – highly recommended, and only 5 pages of text. Imagine a row of a typical covering index (in this case, AdventureWorks’ Person.Person table, and I’ve covered every column):

[Diagram: a single wide row of a covering index over every column of Person.Person]

Every row stored in this particular index is stored on the same page. As each row is quite wide, only 5 rows will fit per 8 KB page.

A columnstore index, however, stores each column on a separate set of pages. I am unsure of the ordering of the rows* but each page is compressed. As the column is likely to contain similar data, this means that each page can have excellent compression ratios – a factor of 4-15 is quoted in the whitepaper – and thousands of rows can fit on a single page.

* Due to a lack of internals documentation so far. Are they ordered by the column, or by the clustering key? I can see arguments for either way, but haven’t yet dived deep enough to decide one or the other.

This allows for fantastic performance gains when performing aggregates, as fewer pages need to be read to service the query, and only the columns actually required by the query need to be accessed.
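
To make that concrete, here’s a minimal sketch of what this looks like in Denali – syntax as per the CTP, with the Person.Person column list chosen purely for illustration:

-- Create a columnstore index over the columns of interest (Denali CTP syntax)
CREATE NONCLUSTERED COLUMNSTORE INDEX IX_Person_ColumnStore
ON Person.Person (BusinessEntityID, PersonType, FirstName, LastName, ModifiedDate)
GO

-- An aggregate that touches only one or two columns can then be answered
-- from just those column segments, rather than reading every wide row
SELECT PersonType, COUNT(*) AS PersonCount
FROM Person.Person
GROUP BY PersonType
GO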

However, because columnstore indexes are read only, they really only have applications in data warehouses.

Can we reproduce columnstore indexes in SQL Server 2008?

The question of whether it is possible to reproduce columnstore indexes in SQL Server 2008 popped into my mind, and the short answer is “No, not a chance”. Sorry.

Let’s look at the properties of a columnstore index.

Read Only – SQL Server 2008 can certainly build index structures on read-only data. And, as it’s read only, separate aggregation steps can be created in 2008 to summarise data ahead of time. However, one of columnstore’s selling points is to remove the need to aggregate. Columnstore gives you a massive performance benefit with little effort.

Highly Compressed – SQL Server 2008 can compress indexes. However, to get compression ratios similar to columnstore’s, you’ll only want a single column per index, which makes the index less useful, and forces you to join frequently.

Only select required columns – A query accessing a columnstore index will only access those columns that are needed by the query. To get the same sort of functionality in 2008, you would need to either create a covering index for each query you want (nothing like columnstore!), or join multiple indexes together (resulting in a large amount of logical reads).

Different table structure – Denali is able to use a completely different page structure, particularly as this data is read-only. This isn’t possible in SQL Server 2008 while remaining usable from T-SQL.

No code changes – Denali’s implementation allows us to create a single columnstore index on all columns in the table, and we don’t need to modify any queries to use these columnstore indexes. It’s completely seamless.

I tried to create an example where we could create similar compressed indexes in 2008, but I was never able to merge multiple indexes together to query the data efficiently. Denali has specific optimisations available for columnstore indexes, and these cannot be reproduced. About the best we can do is to create a read-only compressed covering index, which is nowhere near as impressive these days.
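
To give a flavour of what I was attempting, the closest 2008 stand-in for “one column per structure” looks something like the sketch below – page compression requires Enterprise Edition, and the index names are purely illustrative:

-- One single-column, page-compressed index per column of interest
CREATE NONCLUSTERED INDEX IX_Person_LastName_Compressed
ON Person.Person (LastName)
WITH (DATA_COMPRESSION = PAGE)
GO
CREATE NONCLUSTERED INDEX IX_Person_FirstName_Compressed
ON Person.Person (FirstName)
WITH (DATA_COMPRESSION = PAGE)
GO

Each index compresses nicely on its own, but answering a query that needs both columns means index intersection and a pile of logical reads – which is exactly where the comparison with columnstore falls apart.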

After a few hours considering this, my conclusion is that you are better off optimising your data warehouse for those queries that you know are going to come, and ad-hoc queries will need to continue to run slowly, at least until you’ve upgraded to Denali. Frequently run queries will benefit from extra customisation (pre-aggregation, specialised indexes with compression), but there no chance of a similar payoff inside the database engine.

Not too long to wait now, and you’ll be able to take advantage of columnstore indexes!

Mar 01 2011
 

It’s finally happened. The table you created five years ago with a clustered integer IDENTITY column is about to hit 2³¹ values (2,147,483,648). Five years ago, you were inexperienced with SQL Server, and had no idea that this table would grow this large – or that you’d be around to see it!

First, you reseed the IDENTITY value, back to -2³¹.

DBCC CHECKIDENT (LargeTable, RESEED, -2147483647)

This should allow you to get another couple of years out of the table, and you will have enough time to implement a better table structure.
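
If you’d like to see this particular cliff coming a little earlier next time, a quick check along these lines shows how much headroom is left (a sketch, reusing the LargeTable name from above):

-- How far is the current identity value from the int maximum?
SELECT IDENT_CURRENT('LargeTable') AS current_identity,
       2147483647 - IDENT_CURRENT('LargeTable') AS values_remaining
GO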

A flash of inspiration

A few hours later, you’re relaxing after work, quietly pleased about the bullet you’ve dodged today. A thought suddenly strikes you like a lightning bolt. You recall that inserts into a clustered index work best when inserting ever-increasing values at the end of the index. Could you have made a massive error in judgement by forcing all inserts to happen just before identity value 1?

While you toss and turn about this all night, feel free to consider the issue before reading on. What do you think will happen? Will you have an enormous number of page splits as you constantly run into identity 1? This is obviously a frequently inserted table, given that it has had over 2 billion rows added in the past five years (nearly 1.2 million per day).

Testing the hypothesis

Here’s a block of code to test the problem. We create a table whose rows are just under 1,000 bytes long, so that 8 rows fit per page, and seed it with 1,000 rows.

CREATE DATABASE IdentityTest
GO
USE IdentityTest
GO

-- Create a test table, just under 1000 bytes per row. We can fit 8 rows per page.
CREATE TABLE T (i INT IDENTITY PRIMARY KEY, j char(960))
GO

SET NOCOUNT ON

-- Backup the database so we have a consistent log to examine
ALTER DATABASE IdentityTest SET RECOVERY FULL
BACKUP DATABASE IdentityTest TO DISK = 'C:\Work\IdentityTest_Full.bak' WITH INIT
GO

-- Seed the table with 1000 rows with identity ranges from 1-1000
INSERT INTO T (j) VALUES ('AAAAAAA')
GO 1000

-- Reseed back to a large negative number
DBCC CHECKIDENT (T, RESEED, -2147483647)
GO

-- Add another 100 rows with negative identities counting towards positive
INSERT INTO T (j) VALUES ('BBBBBBB')
GO 100

-- Examine the transaction log to see page split operations.
-- Note the "CheckIdent" Transaction Name around row 1136 where we reseeded.
SELECT [Transaction Name], * FROM fn_dblog(null, null)
WHERE [Transaction Name] IS NOT NULL
ORDER BY [Current LSN]
GO

-- Cleanup
USE master
GO
DROP DATABASE IdentityTest
GO

We can now clearly see the page split pattern in the transaction log. As expected, during inserts prior to the re-seeding, we have a pattern of 8 inserts before a Page Split occurs, signalling that no more rows will fit onto this page.

[Screenshot: fn_dblog output showing a page split after every 8 inserts]

Analysis of the results

After we run the CheckIdent operation (row 1136), we can see that the next insert operation causes a page split, because we are trying to insert onto the first page of the clustered index. This already has 8 rows on it (IDs 1-8), so ID -2,147,483,646 won’t fit.  The new page ends up with the new row, plus IDs 1, 2, and 3. This leaves room for four more rows to be inserted.

As expected, the fifth insert after the CheckIdent (row 1142) causes another page split. This time, the four negative ID rows are moved to one page, and rows 1,2,3 are on the other page.

So far, the pattern seems bad. We’ve had two splits after inserting five rows, and we get another split on the ninth insert (row 1147). In this case, however, we have not bumped up against the positive IDs. We currently have a single page holding the previous 8 negative ID rows, and the ninth won’t fit at the end, so a new page is allocated with this single row.

Now we’re into a stable state. Every 8 inserts, a new page is allocated, and the inserts never again touch the pages containing the first positive rows.
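
Rather than eyeballing the log output, you can also count the splits directly – a quick sketch, assuming (as appears to be the case here) that page split system transactions are logged under the name 'SplitPage':

-- Count page split transactions recorded in the current database's log
SELECT COUNT(*) AS page_splits
FROM fn_dblog(NULL, NULL)
WHERE [Transaction Name] = N'SplitPage'
GO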

Conclusion

The results of this test are pretty clear – the page splitting algorithms will quickly re-stabilise after re-seeding. You’ll no longer be inserting into the end of the clustered index B-tree, but you will be inserting into a consistent point, which is separated from the start of the values, and so you should not see any performance degradation.

You were stressing for nothing. Relax!