May 032012
 

I’m normally not a fan of messing with Management Studio’s default configuration unless I’m going to be using that version for a long, sustained period – I frequently use other environments that don’t allow me to have customised settings.

At the moment, I’m mainly using just two computers, and thought a change of scenery would be nice. Here’s a pretty screenshot:

image

There’s two main components here – font selection and colour selection. Anyone who’s used Visual Studio or Management Studio 2012 will recognise the Consolas font (11pt), which is incredibly nicer to read than the default Courier New. I believe Consolas is available with Word 2007, so most machines should have this preinstalled.

The colour scheme was originally inspired by Solarized which is incredibly well designed and thought out. I made quite a few adjustments to add a few more vibrant colours (such as the white for SQL operators, a brighter green, a brighter yellow, and a blacker background, heavily influenced by Son of Obsidian), and this has had the detrimental effect that pasting coloured text onto a white background (such as email) doesn’t work too well. The original Solarized is great in that it works well on either light or dark backgrounds.

Switching back to the default colours does significantly shock your eyes due to the brightness, so I’m enjoying the comfort of the soft colours.

To install, simply backup your [HKEY_CURRENT_USER\Software\Microsoft\Microsoft SQL Server\100\Tools\Shell\FontAndColors\{A27B4E24-A735-4D1D-B8E7-9716E1E3D8E0}] registry key, and load in this new set (standard disclaimer about being careful with your registry, make sure you know what you’re doing, don’t blame me, etc). You can then either restart SSMS, or go to the Tools, Options, Environements, Fonts and Colors and simply click OK.

Enjoy, and if you make any changes, please let me know – I’d be interested in trying it!

Update – August 28, 2012: Bill Hurt has sent through a slightly modified version which changes the line numbers and background colours to a deep black/brown that goes well with the rest of the colour scheme. It’s certainly worth a look! Download here.

Jul 112011
 

Imagine that you’ve just used SQL Server Management Studio to generate a script, and it’s done it’s job wonderfully, except for a lot of unsightly GO and blank lines:

GRANT SELECT ON [Person].[Address] TO [SomeUser];
GO

GRANT INSERT ON [Person].[Address] TO [SomeUser];
GO

GRANT UPDATE ON [Person].[Address] TO [SomeUser];
GO

GRANT DELETE ON [Person].[Address] TO [SomeUser];
GO

Although technically correct, this may offend your sense of aesthetics. If there’s only a couple of rows (as in the excerpt above, you can manually remove the unsightly GOs and blank lines, but this is a bigger ask when you have more than about 20 commands.

The simplest method I’ve come across is to use one of the DBA’s most trusted tools – Excel. Although there are other ways of doing this sort of operation, Excel is installed on almost every Windows machine, and this is a quick operation.

First, copy the entire set of commands, and paste into Excel as Column B, and in column A, add a number for each row (essentially an IDENTITY column). Remember that you can just fill out 1 and 2, select both of these, and drag down to the bottom:

image

This number column is used so that we know which order each row was in originally, because we’re about to sort the data. Select columns A and B for all rows, and sort by column B:

image

Next, select the rows you don’t want, and delete them.

image

Now, select the remaining data, and sort again, this time on column A, to return to the original sort order (minus the offending rows):

image

Copy the commands back into Management Studio, and you’re done.

GRANT SELECT ON [Person].[Address] TO [SomeUser];
GRANT INSERT ON [Person].[Address] TO [SomeUser];
GRANT UPDATE ON [Person].[Address] TO [SomeUser];
GRANT DELETE ON [Person].[Address] TO [SomeUser];
May 232011
 

If there’s one thing that really stands out about SQL Server, it’s the fantastic social community behind the product. If you keep an eye out on SQL blogs and Twitter networks, you can see events running every week across the US and the UK. Australia also has quite a strong SQL Server community, with the Melbourne SQL Server user group exceptionally popular. Although 30 minutes is set aside for socialising, I feel it’s just not enough – I’m such a social butterfly, you see.

To that end, Luke Hayler (@lukehayler)and I have organised an evening of social drinks. Come along, meet some other SQL Server professionals based in Melbourne, and discuss anything you like: SQL Server, virtualisation, your dog, or just wax lyrical about the quality of the beer available.

More details, and signup information at http://sqlserversocial.eventbrite.com/

Luke’s original post

image

That’s my kind of database cluster!

Apr 122011
 

Welcome to T-SQL Tuesday for April 12, 2011. This month is generously hosted by Matt Velic (Blog | Twitter), who poses the topic of APPLY, one of the many excellent additions to the T-SQL language in SQL Server 2005.

If you read other T-SQL Tuesday posts from today, you’ll get some excellent tips on how to use APPLY, and a list of the excellent things you can do with it. I’m going to go in a slightly different direction, and examine what your options are when using APPLY with database compatibility level 80.

Books Online states that the database must be at least compatibility level 90 to use APPLY.

It turns out that you can create a query that uses APPLY, and have it run in compatibility level 80. Consider the following setup:

CREATE DATABASE Compat80
GO
ALTER DATABASE Compat80 SET COMPATIBILITY_LEVEL=80
GO

USE Compat80
GO
SET NOCOUNT ON 
CREATE TABLE A (i int, name varchar(20)) 
CREATE TABLE B (i int, amount decimal(15,3)) 
INSERT INTO A (i, name) VALUES (1, 'Jack') 
INSERT INTO A (i, name) VALUES (2, 'Bob') 
INSERT INTO A (i, name) VALUES (3, 'Sally') 
INSERT INTO B (i, amount) VALUES (1, 25.0) 
INSERT INTO B (i, amount) VALUES (1, 50.5) 
INSERT INTO B (i, amount) VALUES (2, 16.0) 
INSERT INTO B (i, amount) VALUES (3, 110.0) 
GO 

Can we use an APPLY in Compatibility 80?

We’ve created a database in compatibility level 80, and created two tables, A and B. Let’s try a query with CROSS APPLY:

SELECT A.Name, dv.Amount FROM A 
CROSS APPLY (SELECT i, SUM(Amount) Amount 
             FROM B WHERE B.i = A.i GROUP BY i) dv 

image

There’s not much to say here, except that I proved Books Online wrong – I can do a CROSS APPLY in Compatibility 80. I don’t know the exact reason why I can do this, but it’s likely to be the simplistic nature of the query (which makes it less interesting). I’ll do a quick little victory dance, and we’ll move on.

A more useful use of APPLY

Where APPLY really shines is when functions are used. Let’s go ahead and create a simple function to test with:

CREATE FUNCTION dbo.fn_getB(@i AS int) 
RETURNS @result TABLE 
( 
      i   int NOT NULL 
    , amount decimal(15,3) 
) 
AS BEGIN 
        INSERT INTO @result 
        SELECT i, amount FROM B WHERE i = @i 
        RETURN 
END 
GO

The logic for this function is slightly different than the previous, but it only serves to make it more interesting. Let’s go ahead and use this function in an APPLY, remembering that we’re still in Compatibility 80:

SELECT * FROM A 
CROSS APPLY dbo.fn_getB(A.i) dv 

Msg 102, Level 15, State 1, Line 2

Incorrect syntax near ‘.’.

That’s disappointing, and rather cryptic. Let’s try again after switching to Compatibility 90:

ALTER DATABASE Compat80 SET COMPATIBILITY_LEVEL=90
GO 
SELECT * FROM A 
CROSS APPLY dbo.fn_getB(A.i) dv 
GO 

image

Success!

Getting around this restriction

We’ve seen that APPLY using functions is a handy tool, but can we still use it in databases that are in Compatibility 80? The following is a workaround, but it’s a bit of a hack. I just like thinking outside the box.

The limitation here is that we cannot let a Compatibility 80 database execute the query. We can, however, run the query from within a different database, such as Master (ignoring permission issues) by using sp_executesql.

EXEC sp_executesql N'USE Master; 
        SELECT * FROM Compat80.dbo.A 
        CROSS APPLY Compat80.dbo.fn_getB(A.i) dv ' 
GO

The main reason I’d want to use APPLY is to use the DMFs with the DMVs provided by SQL Server 2005, such as sys.dm_exec_*. However, these DMVs typically supply a database_id column, meaning you can run the query from within Master anyway. With that in mind, I can’t think of any real benefit of this technique – it’s almost always a better option to simply upgrade your databases to compatibility 90 or above – we are in 2011, after all.

Thanks again for this month’s edition of T-SQL Tuesday, Matt!

Apr 122011
 

It’s happened. After a long adolescence, SQL Server 2005 has finally grown up and left the nest. Today, April 12, 2011, marks the end-of-life date for mainstream support for SQL Server 2005.

As a parent, Microsoft will no longer have to completely support SQL Server 2005 in every way. MS won’t have to pick it up from Saturday night parties, or drive it to sports on Saturday mornings. (Quiz: Which sport would SQL Server 2005 play? Leave a comment!) While MS may spot SQL 2005 if it needs to catch a taxi home after a big night out, this sort of support will be fewer and far between.

SQL Server 2005 can now smoke if it chooses to, and if it happens to, it’s up to MS to decide whether to dissuade it, or let it continue. If you do catch your server smoking, I highly recommend putting out the fire immediately.

I think I’ve beaten that analogy quite enough. Extended support will continue until 12 April 2016, but it’s definitely time to consider a upgrade (to SQL Server 2008 R2 or SQL11) as part of your three-year plan. If you’ve still got SQL Server 2000 databases, I’d still consider those as a top priority though.

Thanks, SQL Server 2005! It’s been a pleasure, but it’s time to fly the coop. Good luck! (And yes, I do realise the irony of wishing it luck at the same time telling it to go away and never come back!)

(Full list of End Of Life dates for SQL Server)

Apr 112011
 

The question is, what is the difference between having a filter on the ON clause, or in the WHERE clause?

Consider the following queries:

USE TempDB 
GO 
CREATE TABLE A (i int, name varchar(20)) 
INSERT INTO A (i, name) VALUES (1, 'Jack'), (2, 'Ryan')
    , (3, 'Simon'), (4, 'Sandra'), (5, 'Daryl') 
GO 

CREATE TABLE B (i int, amount decimal(9,3)) 
INSERT INTO B (i, amount) VALUES (1, 100.25), (2, 10000)
    , (1, 0.45), (4, 234.23) 

-- WHERE
SELECT * FROM A 
INNER JOIN B ON A.i = B.i 
WHERE A.Name NOT LIKE 'S%' 

-- ON
SELECT * FROM A 
INNER JOIN B ON A.i = B.i AND A.Name NOT LIKE 'S%' 

Consider these two queries are the bottom. The first uses a “A.Name NOT LIKE ‘S%’” filter in the WHERE clause, whereas the second uses the same filter in the ON clause. The results are identical:

image

Where this gets interesting is when we use an OUTER JOIN. For example, consider the following two queries, which are identical to the previous two, but using a LEFT OUTER JOIN:

-- Query A - WHERE filter 
SELECT * FROM A 
LEFT JOIN B ON A.i = B.i 
WHERE A.Name NOT LIKE 'S%' 

-- Query B - ON filter 
SELECT * FROM A 
LEFT JOIN B ON A.i = B.i AND A.Name NOT LIKE 'S%' 

And the results:

image

What’s happened here? The only difference was the placement of the A.Name clause, but the second query (the ON) has added Simon and Sandra’s rows.

What’s happened is due to the ordering of the logical query processing steps. Logically, all JOINs are performed as INNER JOINs using the ON filters, and then, as a subsequent step, any OUTER JOIN rows are added back to the necessary side. After all JOINs are complete is the WHERE filter processed.

This means that in Query A (WHERE), the inner join between A and B was completed, then rows 3 (Simon), and 5 (Daryl) were added back in. Then the WHERE was applied, and Simon and Sandra were removed as their name begins with S.

In Query B (ON), A and B were joined together, but any rows in A with a name LIKE ‘S%’ were not joined, as the ON filter ignored them during the initial JOIN phase. The OUTER JOIN phase then added in all rows from the LEFT that weren’t joined, and so Simon and Sandra were added back in.

Is this a problem? I don’t believe it’s much of an issue, as I tend to think of the WHERE clauses as being distinct from the ON clauses. Just remember to keep your join filters (ON) and your row excluders (WHERE) separate, and understand the difference when necessary.

If you ran the sample code, don’t forget to clean up:

DROP TABLE A 
DROP TABLE B 
Mar 282011
 

A colleague mentioned he received a warning while building indexes online the other day, so I decided to check it out. A quick search in sys.messages for messages LIKE ‘%online%index%’ found the following message:

Warning: Online index operation on table ‘%.*ls’ will proceed but concurrent access to the table may be limited due to residual lock on the table from a previous operation in the same transaction.

The error message is fairly clear – if you have a transaction open with existing locks, you’ll get this warning, letting you know that your online index rebuild might not be as online as you think it will be.

As a test, let’s try the following code on AdventureWorks2008:

BEGIN TRAN
UPDATE Person.Person SET FirstName = 'Kenneth' WHERE BusinessEntityID = 1
-- Rebuild an index on this table
ALTER INDEX IX_Person_LastName_FirstName_MiddleName ON Person.Person 
   REBUILD WITH (ONLINE = ON)
ROLLBACK

In this case, we get the warning message due to the five extra locks taken as part of the UPDATE command. While this index is being rebuilt, other sessions trying to access that row will be blocked. This is a good warning.

However, what happens if we try to rebuild an index on a different table? Ideally, I’d want the same sort of warning – if I have any locks that could cause blocking, I’d like to know immediately. The following code will test that:

BEGIN TRAN
UPDATE Person.Person SET FirstName = 'Kenneth' WHERE BusinessEntityID = 1
-- Rebuild an index on a completely different table
ALTER INDEX IX_Customer_TerritoryID ON Sales.Customer 
   REBUILD WITH (ONLINE = ON)
ROLLBACK

The result is that we get no warning. Rebuilding an index on the Sales.Customer table has nothing to do with the Person.Person update we performed, so we miss out of the warning.

This goes against what I’d ideally like. In this case, access to the row modified in Person.Person will result in a block until the index rebuild has finished.

This is not much of an issue though. I can’t think of many situations where I’d want to rebuild an index inside a transaction – it’s the type of task where it’s more likely to be run in a very narrow, constrained transaction.

Mar 152011
 

Every couple of years, I pull out a particular book that’s been sitting on my bookshelf for the past decade, read it, and am suddenly refilled with the passion of computers and computing science. This book is Code: The Hidden Language of Computer Hardware and Software, authored by Charles Petzold.

This book is perfect for those who never did an EE or CS degree at university, and want to know exactly how and why computers work. I’m sure a lot of DBAs fall into this category! Petzold takes us on a wonderful journey, on how communication is encoded into our lives, and focussing particularly on digital codes – binary.  Starting with the most simple concepts (communicating simple messages with flashlights and Morse code over short range), new concepts are introduced in simple, logical steps. From the flashlights, electricity is introduced, and a telegraph system is designed.

From Morse code, we are introduced to Braille and to UPC barcodes, all tied together with the theme of binary logic – the information can be coded by a series of ones and zeroes.

The book slowly builds up these concepts in an easy to follow fashion, using telegraph relays to build logic gates, through to a machine that can add two binary numbers, and finally culminates in a design for a theoretical general purpose, programmable computer.

From there, the concept of a transistor is introduced, the Intel 8080 and the Motorola 68000 CPUs are examined in detail, and it’s then a whirlwind of fast logical steps from machine code, to assembly, to higher level languages, operating systems and file systems (with a reasonably detailed look at CP/M). Finally, modern (well, 1999) graphical operating systems are examined, along with how various types of information are stored – graphics formats, sound formats, video, and HTML.

The book doesn’t go into networking protocols (except for a brief couple of paragraphs on modems), but it is easy to extrapolate how computers can talk to one another from the previous concepts.

All along the way, Petzold sprinkles in the history of computing, and the entire story is woven in a very engaging tone. Every time I read this book, I’m reawakened to the beauty and underlying simplicity (underlying – there’s nothing simple about 10 million transistor processors!) of computing.

Highly recommended!

Mar 082011
 

The subject of T-SQL Tuesday #016, kindly hosted by Jes Borland (blog | twitter), is Aggregation. Although it’s a fine topic, I could not figure out what I could write about aggregation that would capture my interest, and more importantly, dear reader, yours.

I thought about the topic ceaselessly while doing dishes, feeding children, and cleaning up the house. I considered aggregate functions, and optional grouping parameters. I twisted the topic inside and out, trying to figure out an interesting angle. I considered telling a story about the Tuples vs the Aggregates on their way to Query Station (and spent an embarrassingly long time on that train of thought). Finally, I went out for a run, thought of a topic pretty quickly and started turning the idea over in my mind for the next 7 km.

Columnstore Indexes

A columnstore index is a new feature of Denali, whereby an index is stored vertically – one column per page.  Here is the whitepaper download – highly recommended, and only 5 pages of text. Imagine a row of a typical covering index (in this case, AdventureWork’s Person.Person table, and I’ve covered every column):

image

Every row stored in this particular index is stored on the same page. As each row is quite wide, only 5 rows will fit per 8 KB page.

A columnstore index, however, stores each column on a separate set of pages. I am unsure of the ordering of the rows* but each page is compressed. As the column is likely to contain similar data, this means that each page can have excellent compression ratios – a factor of 4-15 is quoted in the whitepaper – and thousands of rows can fit on a single page.

* Due to a lack of internals documentation so far. Are they ordered by the column, or by the clustering key? I can see arguments for either way, but haven’t yet dived deep enough to decide one or the other.

This allows for fantastic performance gains when performing aggregates, as fewer pages need to be read to service the query, and only the columns actually required by the query need to be accessed.

Howerver, due to the fact that columnstore indexes are read only, they really only have applications in data warehouses.

Can we reproduce columnstore indexes in SQL Server 2008?

The question of whether it is possible to reproduce columnstore indexes in SQL Server 2008 popped into my mind, and the short answer is “No, not a chance”. Sorry.

Let’s look at the properties of a columnstore index.

Read Only – SQL Server 2008 can certainly build index structures on read-only data. And, as it’s read only, separate aggregation steps can be created in 2008 to summarise data ahead of time. However, one of columnstore’s selling points is to remove the need to aggregate. Columnstore gives you a massive performance benefit with little effort.

Highly Compressed – SQL Server 2008 can compress indexes. However, to get highly compressible indexes, similar to columnstore’s you’ll only want a single column, which makes the index less useful, and forces you to join frequently.

Only select required columns – A query accessing a columnstore index will only access those columns that are needed by the query. To get the same sort of functionality in 2008, you would need to either create a covering index for each query you want (nothing like columnstore!), or join multiple indexes together (resulting in a large amount of logical reads).

Different table structure – Denali is able to use a completely different page structure, particularly as this data is read-only. This is not possible in SQL Server 2008, and still be usable with T-SQL.

No code changes – Denali’s implementation allows us to create a single columnstore index on all columns in the table, and we don’t need to modify any queries to use these columnstore indexes. It’s completely seamless.

I tried to create an example where we could create similar compressed indexes in 2008, but I was never able to merge multiple indexes together to query the data efficiently. Denali has specific optimisations available for columnstore indexes, and these cannot be reproduced. About the best we can do is to create a read-only compressed covering index, which is nowhere  near impressive these days.

After a few hours considering this, my conclusion is that you are better off optimising your data warehouse for those queries that you know are going to come, and ad-hoc queries will need to continue to run slowly, at least until you’ve upgraded to Denali. Frequently run queries will benefit from extra customisation (pre-aggregation, specialised indexes with compression), but there no chance of a similar payoff inside the database engine.

Not too long to wait now, and you’ll be able to take advantage of columnstore indexes!

Mar 012011
 

It’s finally happened. The table you created five years ago with an clustered integer IDENTITY column is about to hit 231 values (2,147,483,648). Five years ago, you were inexperienced with SQL Server, and had no idea that this table would grow this large – or that you’d be around to see it!

First, you reseed the IDENTITY value, back to -231.

DBCC CHECKIDENT (LargeTable, RESEED, -2147483647)

This should allow you to get another couple of years out of the table, and you will have enough time to implement a better table structure.

A flash of inspiration

A few hours later, you’re relaxing after work, quietly pleased about the bullet you’ve dodged today. A thought suddenly strikes you like a lightning bolt. You recall that inserts into a clustered index work best when inserting into an ever increasing value, at the end of the index. Can you  have made a massive error in judgement in forcing all inserts to always happen just before identity value 1?

While you toss and turn about this all night, feel free to consider the issue before reading on. What do you think will happen? Will you have an enormous number of page splits are you constantly run into identity 1? This is obviously a frequently inserted table, given that it has had 2 billion rows added in the past five years (1.1 million per day).

Testing the hypothesis

Here’s a nice block of code to test the problem. We create a table with 1000 rows approximately 1000 bytes long, to fit 8 rows per page.

CREATE DATABASE IdentityTest
GO
USE IdentityTest
GO

-- Create a test table, just under 1000 bytes per row. We can fit 8 rows per page.
CREATE TABLE T (i INT IDENTITY PRIMARY KEY, j char(960))
GO

SET NOCOUNT ON

-- Backup the database so we have a consistent log to examine
ALTER DATABASE IdentityTest SET RECOVERY FULL
BACKUP DATABASE IdentityTest TO DISK = 'C:\Work\IdentityTest_Full.bak' WITH INIT
GO

-- Seed the table with 1000 rows with identity ranges from 1-1000
INSERT INTO T (j) VALUES ('AAAAAAA')
GO 1000

-- Reseed back to a large negative number
DBCC CHECKIDENT (T, RESEED, -2147483647)
GO

-- Add another 100 rows with negative identities counting towards positive
INSERT INTO T (j) VALUES ('BBBBBBB')
GO 100

-- Examine the transaction log to see page split operations.
-- Note the "CheckIdent" Transaction Name around row 1136 where we reseeded.
SELECT [Transaction Name], * FROM fn_dblog(null, null)
WHERE [Transaction Name] IS NOT NULL
ORDER BY [Current LSN]
GO

-- Cleanup
USE master
GO
DROP DATABASE IdentityTest
GO

We can now clearly see the page split pattern in the transaction log. As expected, during inserts prior to the re-seeding, we have a pattern of 8 inserts before a Page Split occurs, signalling that no more rows will fit onto this page.

image

Analysis of the results

After we run the CheckIdent operation (row 1136), we can see that the next insert operation causes a page split, because we are trying to insert onto the first page of the clustered index. This already has 8 rows on it (IDs 1-8), so ID -2,147,483,646 won’t fit.  The new page ends up with the new row, plus IDs 1, 2, and 3. This leaves room for four more rows to be inserted.

As expected, the fifth insert after the CheckIdent (row 1142) causes another page split. This time, the four negative ID rows are moved to one page, and rows 1,2,3 are on the other page.

So far, the pattern seems bad. We’ve had two splits after inserting five rows, and we get another split on the ninth insert (row 1147). In this case, however, we have not bumped up against the positive IDs. We currently have a single page holding the previous 8 negative ID rows, and the ninth won’t fit at the end, so a new page is allocated with this single row.

Now we’re into a stable state. Every 8 inserts, a new page is allocated, and the inserts never again touch the pages containing the first positive rows.

Conclusion

The results of this test are pretty clear – the page splitting algorithms will quickly re-stabilise after re-seeding. You’ll no longer be inserting into the end of the clustered index B-tree, but you will be inserting into a consistent point, which is separated from the start of the values, and so you should not see any performance degradation.

You were stressing for nothing. Relax!