Feb 15 2011
 

I’ve done five SQL Server Microsoft Certification exams in the past few years, passing each on the first try, and only using an Exam Preparation book for the first exam. I believe there are two methods of studying for an exam – either cramming and hoping you retain enough knowledge during the exam, or creating a targeted list of topics, and living and breathing the subject until you know it intimately.

This article will talk about the second method, which I believe results in a much deeper understanding of the exam contents.

Daily Practice

Use SQL Server daily. This is the most important step. If you want to take an exam on SQL Server, you should be using it daily (alright, you can have weekends off!). The point of this is to be continually using what you’ve been learning, reinforcing it, and completely internalising it. If you aren’t using SQL Server, why get certified in it?

I recommend spending 30-60 minutes each day using SQL Server in addition to your day job. Create a database in your development environment, and practice the techniques and commands you are learning, and then utilise them in production where appropriate (and approved). Even if you’re doing routine work and have no current need to use the new features that will be on the exam, you should still aim to practice each day. This study period can be at any time of the day – whatever works for you. I recommend either during lunch, or before starting work in the morning. Personally, evenings don’t work for me, as I’m less alert at the end of the day and can’t get started until 9:30pm due to family commitments.

If you don’t have the correct environment at work (for example, you’re studying for SQL Server 2008 certifications, but you only use 2000 at work), it may be harder to get the necessary practice. In this case, I recommend discussing the options with your boss. It’s in their interests to get you skilled in the latest version of SQL Server, given the technical debt that accumulates on older versions. If this is not possible, buy your own copy of SQL Server 2008 Developer Edition and install it on your own computer (it runs fine on a netbook – just ask Noel McKinney!).

What’s Going To Be Examined?

How do you know which topics will be on the exam? The title of the exam is often not a very good indicator, and goes into no detail, but the exact requirements of the exam are freely available on the Microsoft Learning web site. For example, scan through the list of requirements for the 70-433 exam, available on the “Skills Measured” tab.

You’ll see that there are seven major sections, with four or five dot-points against each.  This tells you exactly which topics may be included, and as long as you have a good handle on each of these, you’re good to go!

I like to make a copy of this list, and cut out the topics that I already know well, leaving behind a list of subjects that I believe I need to work on. I then take one of these subjects, and throw myself into learning the topic.

This can be from:

  • reading Books Online
  • searching for online articles or blog posts on the topic
  • using one of the Exam Preparation guides
  • watching videos (e.g. from www.sqlshare.com)

Reproduce to Reinforce

When reading material with examples, reproduce each example in your own environment. Type out the code each time, run it, and check that it works.

After a period of time (an hour, a day), reproduce the example without looking at the original code. If you get stuck, use Books Online for syntax assistance.

This method of following the examples has two effects. Firstly, you are more likely to retain the syntax if you’re actively typing it, rather than passively copying and pasting. Secondly, by reproducing the example with minimal assistance, you’re proving to yourself that you now know the material.

Apply Your Knowledge

Finally, consider the pearls of wisdom that you’re learning, and see if you can apply them back to your own situation at work. If you’re learning about mirroring, and you currently have a cluster at work, consider the benefits of your current cluster, and contrast these with the benefits of switching to mirroring. You don’t actually have to change to a mirror, but consider how things would be different if you did. How does your DR plan change?

If you’re studying performance, and you don’t have a performance baseline for your servers, look into setting one up, and start to monitor performance more closely.
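
A minimal sketch of a starting point is below. The DBAdmin database name and the choice of counters are purely illustrative – capture whatever matters to your own workload.

-- Snapshot a few counters into a history table on a schedule (e.g. via a SQL Agent job).
-- Note: "per second" counters are cumulative, so compare successive snapshots to get a rate.
USE DBAdmin
GO
IF OBJECT_ID('dbo.PerfBaseline') IS NULL
    CREATE TABLE dbo.PerfBaseline (
          CaptureTime   datetime      NOT NULL DEFAULT GETDATE()
        , CounterObject nvarchar(128) NOT NULL
        , CounterName   nvarchar(128) NOT NULL
        , InstanceName  nvarchar(128) NULL
        , CounterValue  bigint        NOT NULL
    )
GO
INSERT INTO dbo.PerfBaseline (CounterObject, CounterName, InstanceName, CounterValue)
SELECT object_name, counter_name, instance_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN ('Batch Requests/sec', 'Page life expectancy', 'User Connections')
GO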

If you have a colleague available (preferably one who already knows this material), take them out for coffee and talk them through your reasoning. The act of explaining your thoughts will solidify the concepts in your head, and they can suggest things you might have missed.

If no colleagues are available, you can throw your ideas to the Internet. Find a SQL Server forum, search for related threads, and if none are found, start your own. Or, utilise the #sqlhelp tag on Twitter.

Rinse and Repeat

Once you’ve completed the above steps for a specific topic, go back and choose another topic. Once you’ve worked your way through the list, you should be ready for the exam. Good luck, and if these tips are helpful, please let me know!

Jan 18 2011
 

When you think of a craftsman, a likely image to immediately pop into your head would be one of an older man, working with hand tools on a piece of wood.  Each movement he makes is deliberate and precise, and he seems to know intuitively what needs to happen next.  His tools are so familiar to him, and used so effortlessly, that they seem like an extension of his body.

Although database work is a far cry from any form of woodwork, it is still a craft, albeit one that garners less sympathy at family gatherings than the classic crafts.  As with any craft, the two main things database professionals will talk about are "what have you created or done?" and "which tools do you use?"

This second question is a great question, as you may discover a fantastic tool that you’d never heard of, and which can provide vast improvements in the efficiency of your regular time-consuming tasks, whether by providing necessary information more easily, or by automating tasks you currently do manually.

I frequently get odd looks when I state that I don’t regularly use third-party tools at all (with the exception of backup compression tools). 

What?  No Tools?

The primary reason for this stance is that I work for many different clients, and the majority do not have any tools beyond what is provided with SQL Server, and for various reasons cannot justify acquiring any additional tools.

This is not to say that I am against third-party tools – I definitely do make use of them when available, but I believe that for every task that a tool provides assistance with, a database professional should be able to do without the tool at a pinch.

I liken this to our craftsman’s view of power tools.  There are many situations where power tools will greatly increase the speed and ease of creating some projects in the workshop, but for other situations hand-tools reign supreme.  Perhaps the craftsman is visiting family and is asked to repair a broken chair, or has a simple task to do that doesn’t justify setting up the power tools.

Let’s See Some Examples

As an example, Ola Hallengren’s Index Optimisation scripts (amongst other things) are an excellent, free utility for performing index maintenance more intelligently.  However, at a pinch, a DBA should be able to query the sys.dm_db_index_* family of DMVs to determine which indexes require maintenance, and issue the appropriate ALTER INDEX command.
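
If those scripts aren’t available, a rough manual equivalent might look like the following sketch. The 5%/30% thresholds, the LIMITED scan mode, and the page-count filter are illustrative only, not recommendations.

-- Find fragmented indexes in the current database and generate maintenance commands.
SELECT OBJECT_SCHEMA_NAME(ips.object_id) + '.' + OBJECT_NAME(ips.object_id) AS TableName
     , i.name AS IndexName
     , ips.avg_fragmentation_in_percent
     , ips.page_count
     , CASE WHEN ips.avg_fragmentation_in_percent > 30
            THEN 'ALTER INDEX ' + i.name + ' ON '
                 + OBJECT_SCHEMA_NAME(ips.object_id) + '.' + OBJECT_NAME(ips.object_id) + ' REBUILD;'
            ELSE 'ALTER INDEX ' + i.name + ' ON '
                 + OBJECT_SCHEMA_NAME(ips.object_id) + '.' + OBJECT_NAME(ips.object_id) + ' REORGANIZE;'
       END AS MaintenanceCommand
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
    ON i.object_id = ips.object_id AND i.index_id = ips.index_id
WHERE ips.avg_fragmentation_in_percent > 5
    AND ips.page_count > 100
    AND i.name IS NOT NULL  -- exclude heaps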

Automatic code-completion is another helpful tool that some people swear by.  As this is a relatively new feature in the SQL Server world (whether a third-party tool, or the one provided as part of SQL Server 2008’s Management Studio), many DBAs are used to working without it, although in future we will become used to having code completion tools available.  It’s still important to know how to quickly query the database structure using sp_help, sp_helptext, sys.objects, INFORMATION_SCHEMA, etc. (Incidentally, I wow at least five people a year by using the Alt-F1 shortcut for sp_help. Simply highlight a table name in a query window in Management Studio, and hit Alt-F1.  Very useful.)
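
For example (the object names below are placeholders – substitute your own):

-- Quick structure lookups without code completion.
EXEC sp_help 'Person.Person'          -- columns, indexes and constraints (or highlight the name and press Alt-F1)
EXEC sp_helptext 'dbo.SomeProcedure'  -- definition of a procedure, view or function

-- Find columns by name across the database.
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE COLUMN_NAME LIKE '%Name%'

-- Recently modified objects.
SELECT name, type_desc, modify_date
FROM sys.objects
ORDER BY modify_date DESC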

There are a number of tools available that can be used to trace expensive queries, and I do enjoy using a tool developed in-house to provide trace control and analysis.  If this is not available, or if it would take too much time to set up, I’m happy to pull out Profiler, script out a focussed trace, collect some data, and then analyse the resulting trace files with fn_trace_gettable().
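
For that last step, a minimal sketch might look like this (the file path is a placeholder, and DEFAULT tells the function to include any rollover files):

-- Load the trace files back into a queryable form and pull out the most expensive statements.
SELECT TOP (50)
       t.TextData
     , t.Duration / 1000 AS DurationMs  -- Duration is reported in microseconds on 2005 and later
     , t.CPU
     , t.Reads
     , t.Writes
     , t.StartTime
FROM sys.fn_trace_gettable('C:\Traces\ExpensiveQueries.trc', DEFAULT) AS t
WHERE t.TextData IS NOT NULL
ORDER BY t.Duration DESC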

There are numerous other examples.  Can you read a deadlock graph, or do you need Profiler to draw a picture?  Can you read a blocking chain in sp_who2, or do you need Adam Machanic’s (blog | twitter) Who Is Active?  What if you are called upon to help with a SQL Server 2000 instance, where this is unavailable?
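
As a sketch of the “no tools” approach to blocking (the first query needs 2005 or later; the second works on 2000, although sysprocesses is deprecated on newer versions):

-- Who is blocked, and by whom (SQL Server 2005 and later).
SELECT r.session_id, r.blocking_session_id, r.wait_type, r.wait_time, r.command
FROM sys.dm_exec_requests AS r
WHERE r.blocking_session_id <> 0

-- On SQL Server 2000, the blocked column in sysprocesses shows the same chain.
SELECT spid, blocked, lastwaittype, waittime, cmd
FROM master..sysprocesses
WHERE blocked <> 0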

Regardless of whether you primarily use the base set of tools, or primarily use the "power" tools, it pays to be familiar with both types, and to be able to use each tool in the proper way in the right situation.

Jan 07 2011
 

When a building is designed, one of the main requirements is to have an appropriate foundation, and a larger and stronger building requires a larger and stronger foundation.  A one-storey house, for example, requires a much simpler foundation than an 85-storey office building.  The deeper the foundation is, the higher the building can be, and the more resistant it is to outside forces, such as strong winds.  The foundations are hidden from the casual glance, but are absolutely crucial to the success of the building.

Careers, particularly in SQL Server administration, are much the same as a building.  You must first create a solid foundation of skills, and, like a building’s foundation, these may not be visible.  This will include both technical knowledge and "soft" skills, such as:

  • The Windows operating system (including monitoring)
  • File system knowledge
  • Knowledge of networking protocols
  • Scripting basics for automation purposes (e.g., Powershell)
  • Server architecture
  • Storage technologies and I/O concepts
  • Basic Windows/Active Directory security concepts
  • Interpersonal relationships and communication
  • Knowledge of table structure and basic SQL syntax
  • Data backups, restores, and recoverability
  • SQL Server security

(I hesitate to put SQL Server Internals in here; internals are a foundational skill for more in-depth SQL Server work, but at a much deeper level, and they require the other foundations first.)

You’ll notice that there is not a lot on that list that is specific to SQL Server.  At the heart of it, a database administrator is a specialised system administrator, and requires similar base skills to a Windows sys admin, with the addition of some database knowledge.  At the least, a DBA needs to know how to configure security, query and change data, and provide a measure of data protection in the form of backups.

The analogy with a building breaks down a little when you realise that, unlike building construction, you don’t have to complete your foundations prior to working on the "visible" parts of your career.  This is a good thing, though.  You can get started with a DBA career without knowing much about Windows Server, and your limited knowledge will be sufficient to keep you from blowing over during normal weather, but will be inadequate if you attempt to configure a clustered SQL Server deployment.

This holiday period is a good time to reflect.  If you made any New Year’s resolutions, after a week they’re either sticking or they’re shot, so you can reflect now without the stigma of frequently broken resolutions hanging over your head.  What areas of your foundations are a little shaky, and could use some attention?  Which SQL Server features would you like to focus on this year, but would require an improvement in the foundations before you can really understand them?  As an example, if you want to improve your backup speeds, you may need to improve your knowledge of I/O throughput to your SAN before you can confidently use SQL Server’s backup performance enhancements (compression, striped backup files, etc.).
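
As a rough sketch of those enhancements (the paths are placeholders, and backup compression requires SQL Server 2008 Enterprise Edition, or Standard Edition from 2008 R2 onwards):

-- Stripe the backup across two files (ideally on separate drives) and compress it.
BACKUP DATABASE AdventureWorks
TO DISK = 'G:\Backups\AdventureWorks_1.bak',
   DISK = 'H:\Backups\AdventureWorks_2.bak'
WITH COMPRESSION, STATS = 10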

Personally, I need to improve my knowledge of SANs, get more up to speed with the features that they can bring to the table, and learn the basics of how each of these features works – enough to understand the pros and cons of each.

So, get to it. Build those foundations stronger and become a rock!

Jul 21 2010
 

As mentioned in my last post, it is possible to use RAISERROR WITH NOWAIT in order to immediately send a message back to the client.  This is useful for long, procedural (i.e., not set-based) stored procedures that loop over many different rows.

Consider the following stored procedure:

CREATE PROCEDURE dbo.InfoMsgTest
AS
    DECLARE @i int;
    SET @i = 1;
    WHILE @i <= 100
    BEGIN
        RAISERROR('%d', 0, 1, @i) WITH NOWAIT;
        -- Do some processing!
        WAITFOR DELAY '00:00:01';
        SET @i = @i + 1;
    END
GO

This procedure is a simple loop that counts to 100.  Each time around the loop, a RAISERROR command is executed, passing out the value of @i.  Any message at all could be passed – you could include how many rows have been processed, how many to go, and what the primary key is of the current row.
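
For example, a richer progress message might look like this (the variables here are hypothetical – they would be maintained by your own loop):

-- Hypothetical progress message: rows done, total rows, and the key of the current row.
RAISERROR('Processed %d of %d rows; current OrderID = %d.', 0, 1,
    @RowsProcessed, @TotalRows, @CurrentOrderID) WITH NOWAIT;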

On the client, consider the following C# console application.  All error handling has been removed, and I haven’t written any .NET code in two years, so your forgiveness is appreciated!

using System;
using System.Data.SqlClient;

namespace InfoMessages
{
    class Program
    {
        static void Main(string[] args)
        {
            SqlConnection conn = new SqlConnection(
                "Data Source=(local);Initial Catalog=AdventureWorks;" 
                    + "Integrated Security=SSPI;");
            conn.InfoMessage += 
                new SqlInfoMessageEventHandler(InfoMessage);
            conn.Open();
            SqlCommand cmd = new SqlCommand("exec dbo.InfoMsgTest", conn);
            cmd.CommandTimeout = 120;
            Console.WriteLine("Processing starting.");
            cmd.ExecuteReader();
            conn.Close();
            Console.WriteLine("Processing complete.");
        }

        private static void InfoMessage (object sender, 
            SqlInfoMessageEventArgs e)
        {
            Console.WriteLine("Percent completed: " + e.Message + "%");
        }
    }
}

Note that it is vital to use cmd.ExecuteReader().  cmd.ExecuteNonQuery() will not fire the InfoMessage handler.

And the output:

[Screenshot: console output showing the incrementing counter messages]

There you have it!  A GUI application shouldn’t be too much harder.  Little things like this can make the difference between having a responsive application that informs the user as to what is happening, versus a black box that appears to hang for 30 seconds while the stored procedure is executed.

Jul 19 2010
 

SQL Server provides two primary ways of communicating data to the client – Result Sets and Messages.  Typically, a client application will respond to Result Sets, and any error messages that are raised by SQL Server with a severity higher than 10.  For error messages with a severity of 10 or less, the .NET event SqlConnection.InfoMessage can be used to return information during query processing.

In Management Studio, the difference between a Message and an Error is that the Error is flagged in red on the Messages panel and may trigger rollbacks or break connections, depending on the severity of the error.

PRINT

One use of communicating data back to the client is for stored procedures to let the user know where they are up to.  While this could be used for production code, it is usually used as a poor man’s debugger.  By sprinkling PRINT “Currently at point x” statements through your stored procedure, you can get an inkling of where the processing is up to.

However, PRINT has a noticeable drawback – the results are not returned immediately.  Instead, anything sent to PRINT will be buffered, and not released until the buffer is full, or the query completes.  This buffer is around 8KB in size.

“No problem!” I hear you cry. “I’ll just pad my PRINT message out to be 8KB!”  Nice try, but unfortunately, the PRINT statement will trim to varchar(8000) or nvarchar(4000), which isn’t enough.  For example:

PRINT 'A' + REPLICATE(' ', 8000)
PRINT 'B' + REPLICATE(' ', 124)
WAITFOR DELAY '00:00:05'
PRINT 'C'

In this example, we’re using REPLICATE to try to pad out the PRINT’s message, but we need two PRINT statements to get anything back immediately.  By running the example, and flicking to the Messages screen in Management Studio, you can see if A is being returned before or after the WAITFOR DELAY statement.  In my tests, the 124 on the B line is not a static value – it was 134 for a different server. 

So, needing two padded PRINT messages does not really seem like an acceptable solution.

RAISERROR

Enter RAISERROR. While the RAISERROR syntax is slightly more complicated, it’s also a lot more powerful (although the misspelling is quite annoying).

RAISERROR ('Message', 0, 1, ..., ...) WITH NOWAIT

The first parameter is simply a textual description of the error/message.  Next (0) is the Severity level.  If this value is 10 or less, it will be counted as a Message, and not as an Error.  The 1 indicates the State of the message – for a message, you’ll generally keep this at 1. After the State, you can list multiple parameters that will be inserted into the first parameter – more on this shortly.

 

Example 1 shows two ways of calling RAISERROR – one where the text of the message is stored in a variable, and one where it is included directly in the RAISERROR command.  Both return “Currently at position 56.”  Note the WITH NOWAIT.  This tells SQL Server to send the message back to the client immediately, avoiding the buffering problems PRINT has.

-- Example 1
DECLARE @msg nvarchar(200) = 'Currently at position %d.'
RAISERROR (@msg, 0, 1, 56) WITH NOWAIT
RAISERROR ('Currently at position %d.', 0, 1, 56) WITH NOWAIT

Note that the equivalent PRINT statement would be:

PRINT 'Currently at position ' + CONVERT(varchar(10), 56) + '.'

 

Example 2 shows how easy it is to output a text value.  This is useful for displaying the current value of a variable within a loop.

-- Example 2
DECLARE @somevalue varchar(200) = 'Melbourne'
DECLARE @msg nvarchar(200) = '@somevalue is currently %s.'
RAISERROR (@msg, 0, 1, @somevalue) WITH NOWAIT

 

Finally, Example 3 shows how you can combine multiple values in your output.

-- Example 3
DECLARE @somevalue varchar(200) = 'Melbourne'
DECLARE @msg nvarchar(200) = '@somevalue is currently "%s" at position %d.'
RAISERROR (@msg, 0, 1, @somevalue, 124) WITH NOWAIT

 

Monitoring

Another benefit of RAISERROR over PRINT is that it is much easier to trace  RAISERROR in Profiler.  Simply capture “User Error Message” events for Error 50000, and you’ll get the messages.  Of course, you can always filter on the severity or the SPID, or any other filter that is appropriate.

[Screenshot: Profiler trace capturing the User Error Message events]
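
If the trace has been captured to a file, the messages can be pulled back out with fn_trace_gettable (the path below is a placeholder):

-- Read the captured status messages back out of the trace file.
SELECT t.StartTime, t.SPID, t.TextData
FROM sys.fn_trace_gettable('C:\Traces\StatusMessages.trc', DEFAULT) AS t
WHERE t.Error = 50000
ORDER BY t.StartTime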

So, there you have it!  RAISERROR is a much more sophisticated method of returning status messages to the client than using PRINT.

Jul 13 2010
 

(It’s T-SQL Tuesday #008 – Gettin’ Schooled)

I learn by doing, and by teaching.

Studies have shown that the best way to learn a topic is to teach it to someone else.  I agree wholeheartedly with this – you don’t really know a topic until you’ve had to put it into your own words and get someone else to understand.  Helping out people on the MSDN SQL Forums and the SQLServerCentral forums is a great way of learning.  It’s a very ad-hoc method, as there is no guarantee what you’ll be looking at on any particular day.  Although I might not know the answer, a well written question will pique my interest, and, as one of my strengths is researching how to do things with SQL Server, I’ll attempt to ferret out the answer.  This results in a deeper understanding for me, and (hopefully) a thankful person on the other end. 

Although helping completely unknown people on the Internet can be fun, it’s a lot more satisfying when helping in person, either through teaching courses, giving presentations at user groups, or one-on-one mentoring.  These require you to know the topic thoroughly up front, as there is much less of an opportunity to dart off to Books Online.

I don’t read too many SQL Server books anymore, with a few notable exceptions, such as the Inside SQL Server series, and the SQL Server MVP Deep Dives. These are highly recommended due to their deep technical nature.  The MVP Deep Dives is especially interesting, as it contains a wide range of topics about what MVPs find interesting, as opposed to Books Online worded differently.  (This is not to bag authors – there’s definitely an audience for well written books – I’m just happy with Books Online!)  This is a very similar type of format to podcast interviews.  I don’t recall how many different times I’ve heard Paul and Kim go over the same material once more, but it’s always an entertaining listen!  With 90 minutes on a train each day, podcasts are quite useful, as long as they’re not dry technical topics.  Videocasts are not my thing as I rarely have the opportunity.

I keep up with blogs (thank you, Google Reader!) to see what current ideas are floating around, but it’s necessary to filter them – I don’t have time to read every blog post in detail, although many are deserving of that attention!  Instead, I’ll flick over the content to get a feel for the topic, and keep it in mind for later reference.  Blogs can be quite handy when searching, but it’s always worth remembering not to just blindly follow advice given.  Think through the offered steps and consider whether it makes sense before trying it out on your production system.

I believe in the value of certifications, although only as a supplement to experience.  I would love the opportunity to do the SQL Server MCM course as it appears to be an excellent test of all areas of SQL Server, but the wife and kids will insist on spending three weeks in Seattle!

If I had to pick one method of learning that I believe is optimal, I would choose mentoring.  It’s always important to have a mentor, even if you’re considered an expert, if only to bounce ideas off.  And it’s fantastic to give back by mentoring others.

Jul 12 2010
 

OK, stop groaning over the title of this post.  It’ll probably be the best pun you read until you flick over to a slightly wittier SQL blog.

I’ve recently been upgrading an application from SQL Server 2000 to SQL Server 2005, and analysing performance between the two.  A common technique with this application is to create reporting stored procedures that have many different parameters, and allow the user to enter as few or as many as they like.  (And then the text strings get a ‘%’ attached at each end, and get thrown at a LIKE operation.)

For example, consider the following stored procedure:

USE AdventureWorks2008
GO

CREATE PROCEDURE dbo.TestOR 
      @PersonType nchar(2)
    , @FirstName nvarchar(50)
    , @LastName nvarchar(50)
AS
    SELECT * 
    FROM Person.Person
    WHERE (PersonType = @PersonType OR @PersonType IS NULL)
        AND (FirstName = @FirstName OR @FirstName IS NULL)
        AND (LastName = @LastName OR @LastName IS NULL)
GO

EXEC dbo.TestOR @PersonType = 'EM', @FirstName = null, @LastName = null
EXEC dbo.TestOR @PersonType = 'EM', @FirstName = 'Rob', @LastName = null
EXEC dbo.TestOR @PersonType = null, @FirstName = null, @LastName = 'Caron'
GO

You can see that the driving force here is the pattern (FirstName = @FirstName OR @FirstName IS NULL).  This means that if you do not supply a value for @FirstName (or set it to NULL), then the second part of the OR will always return TRUE, and so all rows will be selected, cancelling out the need for the first part.

This appears to be a very good method of creating a single stored procedure that can flexibly take many different parameters.  It probably performed quite well in development too, until the amount of data increased.

Let’s have a look at the execution plan:

[Screenshot: execution plan showing scans of Person.Person]

Oh dear.  Table scans.  This example is only using a single table, but you can imagine what would happen if search parameters could be in multiple tables.

An additional problem with this method is parameter sniffing – if the query is initially run with a LastName only, then the execution plan will be optimised for a LastName lookup, and this may not be appropriate for the next execution.  This can be demonstrated by running “EXEC sp_recompile dbo.TestOR”, and then running query #3, then #2, then #1.  All three executions share the same plan, but the plan is different from before, as a different index has been used:

[Screenshot: the same execution plan reused for all three queries, using a different index]
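
Spelled out, the sequence described above is:

-- Clear the cached plan, then run the calls in reverse order.
-- The plan compiled for the @LastName-only call is reused for the other two executions.
EXEC sp_recompile 'dbo.TestOR'
GO
EXEC dbo.TestOR @PersonType = null, @FirstName = null, @LastName = 'Caron'
EXEC dbo.TestOR @PersonType = 'EM', @FirstName = 'Rob', @LastName = null
EXEC dbo.TestOR @PersonType = 'EM', @FirstName = null, @LastName = null
GO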

There are a few ways to fix this.  One is to create a different code path for each combination of optional parameters.  However, this rapidly becomes unwieldy – for this example, we would need eight different paths (one per combination of supplied parameters), all with very similar code!

Another option is to move to dynamic SQL:

CREATE PROCEDURE dbo.TestDynamic
      @PersonType nchar(2)
    , @FirstName nvarchar(50)
    , @LastName nvarchar(50)
AS
    DECLARE @sql nvarchar(max)
    DECLARE @params nvarchar(max)
    SET @sql = 'SELECT * 
                FROM Person.Person
                WHERE (1=1) ' 
    IF @PersonType IS NOT NULL 
        SET @sql = @sql + ' AND PersonType = @PersonType '
    IF @FirstName IS NOT NULL 
        SET @sql = @sql + ' AND FirstName = @FirstName '
    IF @LastName IS NOT NULL 
        SET @sql = @sql + ' AND LastName = @LastName '

    SET @params = '@PersonType nchar(2), @FirstName nvarchar(50), @LastName nvarchar(50)'
    
    EXEC sp_executesql @sql, @params, @PersonType = @PersonType
              , @FirstName = @FirstName, @LastName = @LastName
GO

Let’s have a look at the new execution plans:

[Screenshot: three different execution plans, one per parameter combination]

Much better!  We now have three different execution plans, and three different SQL statements being executed.

There are a number of benefits to switching to dynamic SQL in this case:

  1. The query is simpler.  This means that the query plans are likely to be more stable – there is less chance of a bad plan being generated.
  2. Each combination of parameters will get its own execution plan, and this will be stored in the cache – in this case, we could have eight different plans.
  3. The code is easier to maintain.  It’s a little harder to read, but you only have a single copy of the query – it’s just built up along the way.
  4. The users still get to have their flexible interface – very important when the upgrade is supposed to change as few things as possible, functionality-wise.

Jul 08 2010
 

A common myth is that the LocalSystem account has no access to networked resources, and so you may have trouble getting SQL Server to back up to remote locations.

Setting aside the fact that running SQL Server or IIS as LocalSystem is not a best practice, it is still possible to connect to networked resources. This is done via the computer’s domain account, DOMAIN\ComputerName$. For example, if my server SQL01 is on the COMPANY domain, there will be an account in Active Directory named COMPANY\SQL01$. Whenever a service running as LocalSystem attempts to connect remotely, it will attempt to use this account. The restriction is that the server must be in a domain – a workgroup will not cut it.

A recent example of where this came in handy was an IIS installation that called Crystal Reports, which ran using System DSNs to connect to the database. These DSNs were configured to use a SQL login with no password. This worked quite well until the security was tightened and the SQL login was to be given a password. This then caused every report using the DSN to prompt the user for a password, as System DSNs cannot store passwords. One possible solution was to change all 100+ reports to File DSNs (and protect them well, as the password would be stored in plain text inside the DSN). Alternatively, the System DSN could be modified to log in using Windows Authentication.

As IIS was running as LocalSystem, the initial attempt was to change this to a domain account. Unfortunately, many security issues appeared, particularly with some versions of Internet Explorer. Allowing the DOMAIN\ServerName$ account limited access to the database removed the reliance on the insecure SQL login, and avoided a lot of rework.
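
A minimal sketch of granting that access (the account name follows the earlier example; the database name and role are illustrative only):

-- Give the server's machine account read access to the reporting database.
CREATE LOGIN [COMPANY\SQL01$] FROM WINDOWS
GO
USE ReportingDB
GO
CREATE USER [COMPANY\SQL01$] FOR LOGIN [COMPANY\SQL01$]
EXEC sp_addrolemember 'db_datareader', 'COMPANY\SQL01$'
GO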

As mentioned earlier, it is a better practice to use a dedicated domain account, but this technique may be useful in a pinch.

Jul 07 2010
 

While looking for a detailed explanation of cache-store flush messages in the SQL Server ERRORLOG, I came across this page: http://blogs.msdn.com/b/sqlprogrammability/archive/2007/01/17/10-0-plan-cache-flush.aspx, which refers to dropping a database causing the procedure cache to be flushed.

As someone that occasionally creates separate databases to hold temporary data, or a subset of an entire database, I did some investigations.

Script 1:

SELECT * FROM sys.dm_os_memory_cache_counters
WHERE name in ('Object Plans', 'SQL Plans', 'Bound Trees')
SELECT * FROM sys.dm_exec_cached_plans

This code simply reports on the state of the procedure cache at the server level.

Script 2:

CREATE DATABASE TestDB
GO
DROP DATABASE TestDB
GO
SELECT * FROM sys.dm_os_memory_cache_counters
WHERE name in ('Object Plans', 'SQL Plans', 'Bound Trees')
SELECT * FROM sys.dm_exec_cached_plans

We create a database, drop it, and then execute the queries from Script 1.

Conveniently, I have two development instances (2005 and 2008) with well-populated plan caches that I have no problem potentially flushing.  In SQL Server 2005, we get the following:

Before (Script 1):

[Screenshot: cache counters and cached plans before the database is dropped]

After (Script 2):

[Screenshot: cache counters and cached plans after the drop – the procedure cache has been cleared]

As (unfortunately) expected, the procedure cache is gone.  Now let’s try SQL Server 2008:

Before:

[Screenshot: cache counters and cached plans before the database is dropped]

After:

[Screenshot: cache counters and cached plans after the drop – unchanged]

No change!  Fantastic! The obvious conclusion is that you can drop or detach databases as much as you like in SQL Server 2008, but you may want to be aware of the potential effect you have on the server when using 2005.

Jul 05 2010
 

It is a common practice to rebuild indexes frequently in order to improve SQL Server performance.  The problem with rebuilding indexes is that you need to have space inside the data file to hold the index currently being rebuilt.  This means that a 5 GB index will require an additional 5 GB of space.  The subsequent problem here is that when the operation is over, the database will appear to have 5 GB of free space, and the DBA might decide to shrink the database (a bad idea, as this will re-fragment the data file).

A potential solution, for those rebuilding non-clustered indexes offline, is to first disable the non-clustered index.  A disabled index is akin to dropping the index, but keeping the index definition. After an index is disabled, it must be rebuilt before it can be used, as SQL Server has no way of knowing how many, or which, rows were inserted, updated, or deleted while the index was disabled.
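
One quick way to see which indexes are currently disabled (and therefore in need of a rebuild):

-- List disabled indexes in the current database.
SELECT OBJECT_NAME(object_id) AS TableName, name AS IndexName, type_desc
FROM sys.indexes
WHERE is_disabled = 1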

This means that a 5 GB index can be rebuilt in place, using the same 5 GB.  The operation may be a little bit slower, and temporary space (either in TempDB or the current database) will be needed to re-sort the index (rather than base it off the current “live” copy of the index, as there is none), but it removes the requirement for the data file to have enough empty space to hold a second copy of the index.

Note that this only applies to non-clustered indexes, as disabling a clustered index will result in the entire table being unavailable, and all non-clustered indexes being disabled. Ideally, clustered indexes will be based on a small, ever-increasing clustered key, which will greatly reduce the need to ever de-fragment the non-clustered indexes (although there are cases – for example, when a row is initially inserted very small due to variable length columns, and is later updated to populate those columns, widening the row so that it no longer fits on the existing page and forcing a page split).

While this may be a useful technique to avoid data file growth, I would worry about an environment that does not have sufficient working room to rebuild the largest non-clustered index.

Example:

USE master
GO
IF EXISTS (SELECT * FROM sys.databases WHERE name = 'IndexRebuildTest')
    DROP DATABASE IndexRebuildTest
GO
CREATE DATABASE IndexRebuildTest
GO
USE IndexRebuildTest
GO

-- Force the transaction log to grow
ALTER DATABASE [IndexRebuildTest] MODIFY FILE ( NAME = N'IndexRebuildTest_log'
     , SIZE = 51200KB )
GO

CREATE TABLE t1 (
      i int IDENTITY
    , UniqueID UNIQUEIDENTIFIER DEFAULT newid()
    , c CHAR(1000)
    , CONSTRAINT pk_t1 PRIMARY KEY (i)
)
GO
CREATE UNIQUE NONCLUSTERED INDEX ncix_UniqueID ON t1(UniqueID) INCLUDE (c)
GO

SET NOCOUNT ON
GO

INSERT INTO t1 DEFAULT VALUES
GO 10000

-- Note that sp_spaceused reports the database size including transaction log.
EXEC sp_spaceused @updateusage = N'TRUE'

[Screenshot: sp_spaceused output – 80 MB database]

We have an 80 MB database –  but 50 MB of that is the transaction log.  The data is about 12 MB, and the index is 18 MB – quite a large, fragmented index!

ALTER INDEX ncix_UniqueID ON t1 REBUILD WITH (SORT_IN_TEMPDB = ON)
GO
EXEC sp_spaceused @updateusage = N'TRUE'

[Screenshot: sp_spaceused output – database grown to 91 MB]

Now, our database has grown to 91 MB – an increase of 11 MB.  The rebuild has also shrunk the index, saving 6 MB.

Let’s run the first listing again (to reset the database size back to 80 MB), and then try disabling the index first.  Note that the 18 MB currently being taken by the index is immediately released when the index is disabled.

ALTER INDEX ncix_UniqueID ON t1 DISABLE
GO
ALTER INDEX ncix_UniqueID ON t1 REBUILD WITH (SORT_IN_TEMPDB = ON)
GO
EXEC sp_spaceused @updateusage = N'TRUE'

[Screenshot: sp_spaceused output – database still 80 MB, with 6 MB unallocated]

There you have it.  The database is still 80 MB (it fluctuates slightly, depending on how fragmented the index is each time we reset the database), and there is 6 MB of unallocated space – which is the saving from de-fragmenting the index.

Again, this is primarily useful for rebuilding offline – the index will be unavailable while it’s disabled.  If you do have 2005/2008 Enterprise Edition, you could rebuild online and at least have the table available during the rebuild, but the index will not be.  (And if you do have Enterprise Edition, forking out another $2000 for extra disk space won’t be an issue!)