Articles for July 2010

Using RAISERROR for Progress Messages

As mentioned in my last post, it is possible to use RAISERROR WITH NOWAIT in order to immediately send a message back to the client.  This is useful for long, procedural (i.e., not set-based) stored procedures that loop over many different rows.

Consider the following stored procedure:

CREATE PROCEDURE dbo.InfoMsgTest
AS
    DECLARE @i int;
    SET @i = 1;
    WHILE @i <= 100
    BEGIN
        RAISERROR('%d', 0, 1, @i) WITH NOWAIT;
        -- Do some processing!
        WAITFOR DELAY '00:00:01';
        SET @i = @i + 1;
    END
GO

This procedure is a simple loop that counts to 100.  Each time around the loop, a RAISERROR command is executed, passing out the value of @i.  Any message at all could be passed – you could include how many rows have been processed, how many are to go, and what the primary key of the current row is.

On the client, consider the following C# console application.  All error handling has been removed, and I haven’t written any .NET code in two years, so your forgiveness is appreciated!

using System;
using System.Data.SqlClient;

namespace InfoMessages
{
    class Program
    {
        static void Main(string[] args)
        {
            SqlConnection conn = new SqlConnection(
                "Data Source=(local);Initial Catalog=AdventureWorks;" 
                    + "Integrated Security=SSPI;");
            conn.InfoMessage += 
                new SqlInfoMessageEventHandler(InfoMessage);
            conn.Open();
            SqlCommand cmd = new SqlCommand("exec dbo.InfoMsgTest", conn);
            cmd.CommandTimeout = 120;
            Console.WriteLine("Processing starting.");
            cmd.ExecuteReader();
            conn.Close();
            Console.WriteLine("Processing complete.");
        }

        private static void InfoMessage(object sender, 
            SqlInfoMessageEventArgs e)
        {
            Console.WriteLine("Percent completed: " + e.Message + "%");
        }
    }
}

Note that it is vital to use cmd.ExecuteReader() – in my tests, cmd.ExecuteNonQuery() did not fire the InfoMessage handler as the messages arrived.

And the output:

[image: console output showing the progress messages arriving one per second]

There you have it!  A GUI application shouldn’t be too much harder.  Little things like this can make the difference between having a responsive application that informs the user as to what is happening, versus a black box that appears to hang for 30 seconds while the stored procedure is executed.

PRINT vs RAISERROR

SQL Server provides two primary ways of communicating data to the client – Result Sets and Messages.  Typically, a client application will respond to Result Sets, and to any error messages raised by SQL Server with a severity higher than 10.  For messages with a severity of 10 or less, the .NET event SqlConnection.InfoMessage can be used to return information during query processing.

In Management Studio, the difference between a Message and an Error is that an Error is flagged in red on the Messages panel, and may trigger rollbacks or break the connection, depending on the severity of the error.

PRINT

One use of communicating data back to the client is for stored procedures to let the user know where they are up to.  While this could be used in production code, it is usually used as a poor man’s debugger.  By sprinkling PRINT 'Currently at point x' statements through your stored procedure, you can get an inkling of where the processing is up to.

However, PRINT has a noticeable drawback – the results are not returned immediately.  Instead, anything sent to PRINT will be buffered, and not released until the buffer is full, or the query completes.  This buffer is around 8KB in size.

“No problem!” I hear you cry. “I’ll just pad my PRINT message out to 8KB!”  Nice try, but unfortunately the PRINT statement truncates its message to varchar(8000) or nvarchar(4000), which isn’t quite enough to fill the buffer.  For example:

PRINT 'A' + REPLICATE(' ', 8000)
PRINT 'B' + REPLICATE(' ', 124)
WAITFOR DELAY '00:00:05'
PRINT 'C'

In this example, we’re using REPLICATE to pad out the PRINT messages, but we need two PRINT statements to get anything back immediately.  By running the example and flicking to the Messages pane in Management Studio, you can see whether A is returned before or after the WAITFOR DELAY statement.  In my tests, the 124 on the B line is not a static value – it was 134 on a different server.

So, using two PRINT statements does not really seem like an acceptable solution.

RAISERROR

Enter RAISERROR. While the RAISERROR syntax is slightly more complicated, it’s also a lot more powerful (although the misspelling is quite annoying).

RAISERROR ('Message', 0, 1, ..., ...) WITH NOWAIT

The first parameter is simply a textual description of the error/message.  Next (0) is the Severity level.  If this value is 10 or less, it will be counted as a Message, and not as an Error.  The 1 indicates the State of the message – for a message, you’ll generally keep this at 1. After the State, you can list multiple parameters that will be inserted into the first parameter – more on this shortly.


Example 1 shows two ways of calling RAISERROR – one where the text of the message is stored in a variable, and one where it is included directly in the RAISERROR command.  Both return “Currently at position 56.”  Note the WITH NOWAIT – this tells SQL Server to send the message back to the client immediately, avoiding the buffering problems PRINT has.

-- Example 1
DECLARE @msg nvarchar(200) = 'Currently at position %d.'
RAISERROR (@msg, 0, 1, 56) WITH NOWAIT
RAISERROR ('Currently at position %d.', 0, 1, 56) WITH NOWAIT

Note that the equivalent PRINT statement would be:

PRINT 'Currently at position ' + CONVERT(varchar(10), 56) + '.'


Example 2 shows how easy it is to output a text value.  This is useful for displaying the current value of a variable inside a loop.

-- Example 2
DECLARE @somevalue varchar(200) = 'Melbourne'
DECLARE @msg nvarchar(200) = '@somevalue is currently %s.'
RAISERROR (@msg, 0, 1, @somevalue) WITH NOWAIT


Finally, Example 3 shows how you can combine multiple values in your output.

-- Example 3
DECLARE @somevalue varchar(200) = 'Melbourne'
DECLARE @msg nvarchar(200) = '@somevalue is currently "%s" at position %d.'
RAISERROR (@msg, 0, 1, @somevalue, 124) WITH NOWAIT


Monitoring

Another benefit of RAISERROR over PRINT is that it is much easier to trace RAISERROR in Profiler.  Simply capture “User Error Message” events for Error 50000, and you’ll get the messages.  Of course, you can always filter on the severity or the SPID, or any other filter that is appropriate.
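On SQL Server 2008, an Extended Events session can capture the same messages without running Profiler.  A minimal sketch, assuming 2008 or later – the session name is my own choice, and the predicate filters on severity only, as the error-number field is named differently across versions:

```sql
-- Hedged sketch: an Extended Events session for low-severity messages.
-- Requires SQL Server 2008 or later; InfoMessageTrace is a placeholder name.
CREATE EVENT SESSION InfoMessageTrace ON SERVER
ADD EVENT sqlserver.error_reported (
    WHERE (severity <= 10))
ADD TARGET package0.ring_buffer;
GO
ALTER EVENT SESSION InfoMessageTrace ON SERVER STATE = START;
GO
```

The ring buffer target can then be queried through sys.dm_xe_session_targets to read the captured messages.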

[image: Profiler trace capturing the User Error Message events]

So, there you have it!  RAISERROR is a much more sophisticated method of returning status messages to the client than using PRINT.

How I Learn – T-SQL Tuesday #008

(It’s T-SQL Tuesday #008 – Gettin’ Schooled)

I learn by doing, and by teaching.

Studies have shown that the best way to learn a topic is to teach it to someone else.  I agree wholeheartedly with this – you don’t really know a topic until you’ve had to put it into your own words and get someone else to understand.  Helping out people on the MSDN SQL Forums and the SQLServerCentral forums is a great way of learning.  It’s a very ad-hoc method, as there is no guarantee what you’ll be looking at on any particular day.  Although I might not know the answer, a well written question will pique my interest, and, as one of my strengths is researching how to do things with SQL Server, I’ll attempt to ferret out the answer.  This results in a deeper understanding for me, and (hopefully) a thankful person on the other end. 

Although helping completely unknown people on the Internet can be fun, it’s a lot more satisfying when helping in person, either through teaching courses, giving presentations at user groups, or one-on-one mentoring.  These require you to know the topic thoroughly up front, as there is much less of an opportunity to dart off to Books Online.

I don’t read too many SQL Server books anymore, with a few notable exceptions, such as the Inside SQL Server series, and the SQL Server MVP Deep Dives. These are highly recommended due to their deep technical nature.  The MVP Deep Dives is especially interesting, as it contains a wide range of topics about what MVPs find interesting, as opposed to Books Online worded differently.  (This is not to bag authors – there’s definitely an audience for well-written books – I’m just happy with Books Online!)  The format is very similar to podcast interviews.  I’ve lost count of how many times I’ve heard Paul and Kim go over the same material, but it’s always an entertaining listen!  With 90 minutes on a train each day, podcasts are quite useful, as long as they’re not dry technical topics.  Videocasts are not my thing, as I rarely have the opportunity to watch them.

I keep up with blogs (thank you, Google Reader!) to see what current ideas are floating around, but it’s necessary to filter them – I don’t have time to read every blog post in detail, although many are deserving of that attention!  Instead, I’ll flick over the content to get a feel for the topic, and keep it in mind for later reference.  Blogs can be quite handy when searching, but it’s always worth remembering not to just blindly follow advice given.  Think through the offered steps and consider whether it makes sense before trying it out on your production system.

I believe in the value of certifications, although only as a supplement to experience.  I would love the opportunity to do the SQL Server MCM course as it appears to be an excellent test of all areas of SQL Server, but the wife and kids will insist on spending three weeks in Seattle!

If I had to pick one method of learning that I believe is optimal, I would choose mentoring.  It’s always important to have a mentor, even if you’re considered an expert, if only to bounce ideas off.  And it’s fantastic to give back by mentoring others.

Ors belong in Boats, Not Where Clauses

OK, stop groaning over the title of this post.  It’ll probably be the best pun you read until you flick over to a slightly wittier SQL blog.

I’ve recently been upgrading an application from SQL Server 2000 to SQL Server 2005, and analysing performance between the two.  A common technique with this application is to create reporting stored procedures that have many different parameters, and allow the user to enter as few, or as many, as they like.  (The text strings then get a ‘%’ attached at each end, and are thrown to a LIKE operation.)

For example, consider the following stored procedure:

USE AdventureWorks2008
GO

CREATE PROCEDURE dbo.TestOR 
      @PersonType nchar(2)
    , @FirstName nvarchar(50)
    , @LastName nvarchar(50)
AS
    SELECT * 
    FROM Person.Person
    WHERE (PersonType = @PersonType OR @PersonType IS NULL)
        AND (FirstName = @FirstName OR @FirstName IS NULL)
        AND (LastName = @LastName OR @LastName IS NULL)
GO

EXEC dbo.TestOR @PersonType = 'EM', @FirstName = null, @LastName = null
EXEC dbo.TestOR @PersonType = 'EM', @FirstName = 'Rob', @LastName = null
EXEC dbo.TestOR @PersonType = null, @FirstName = null, @LastName = 'Caron'
GO

You can see that the driving force here is the pattern (FirstName = @FirstName OR @FirstName IS NULL).  This means that if you do not supply a value for @FirstName (or set it to NULL), then the second part of the OR will always return TRUE, and so all rows will be selected, cancelling out the need for the first part.

This appears to be a very good method of creating a single stored procedure that can flexibly take many different parameters.  It probably performed quite well in development too, until the amount of data increased.

Let’s have a look at the execution plan:

[image: execution plans showing table scans]

Oh dear.  Table scans.  This example is only using a single table, but you can imagine what would happen if search parameters could be in multiple tables.

An additional problem with this method is parameter sniffing – if the query is initially run with a LastName only, then the execution plan will be optimised for a LastName, and this may not be appropriate for the next execution.  This can be demonstrated by running “EXEC sp_recompile dbo.TestOR”, and then running query #3, then #2, then #1.  All three executions now share a single plan, but it is a different plan from before, as a different index has been used:
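The demonstration just described can be scripted directly from the three EXEC statements above:

```sql
-- Flush the cached plan, then run the queries in reverse order
EXEC sp_recompile 'dbo.TestOR'
GO
EXEC dbo.TestOR @PersonType = null, @FirstName = null, @LastName = 'Caron'  -- query #3
EXEC dbo.TestOR @PersonType = 'EM', @FirstName = 'Rob', @LastName = null    -- query #2
EXEC dbo.TestOR @PersonType = 'EM', @FirstName = null, @LastName = null     -- query #1
GO
```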

[image: the same plan reused for all three executions, now based on a different index]

There are a few ways to fix this.  One is to create a different path for each combination of optional parameters.  However, this rapidly becomes unwieldy – for this example, we would need eight different paths, all with very similar code!
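To see why it becomes unwieldy, here is a sketch of just two of the branches – the full procedure needs one branch per combination of NULL and non-NULL parameters:

```sql
-- Two of the branches; each remaining combination needs its own copy
IF @PersonType IS NOT NULL AND @FirstName IS NULL AND @LastName IS NULL
    SELECT *
    FROM Person.Person
    WHERE PersonType = @PersonType
ELSE IF @PersonType IS NOT NULL AND @FirstName IS NOT NULL AND @LastName IS NULL
    SELECT *
    FROM Person.Person
    WHERE PersonType = @PersonType AND FirstName = @FirstName
-- ...and so on, for every other combination
```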

Another option is to move to dynamic SQL:

CREATE PROCEDURE dbo.TestDynamic
      @PersonType nchar(2)
    , @FirstName nvarchar(50)
    , @LastName nvarchar(50)
AS
    DECLARE @sql nvarchar(max)
    DECLARE @params nvarchar(max)
    SET @sql = 'SELECT * 
                FROM Person.Person
                WHERE (1=1) ' 
    IF @PersonType IS NOT NULL 
        SET @sql = @sql + ' AND PersonType = @PersonType '
    IF @FirstName IS NOT NULL 
        SET @sql = @sql + ' AND FirstName = @FirstName '
    IF @LastName IS NOT NULL 
        SET @sql = @sql + ' AND LastName = @LastName '

    SET @params = '@PersonType nchar(2), @FirstName nvarchar(50), @LastName nvarchar(50)'
    
    EXEC sp_executesql @sql, @params, @PersonType = @PersonType
              , @FirstName = @FirstName, @LastName = @LastName
GO

Let’s have a look at the new execution plans:

[image: three different execution plans for the dynamic SQL version]

Much better!  We now have three different execution plans, and three different SQL statements being executed.

There are a number of benefits to switching to dynamic SQL in this case:

  1. The query is simpler.  This means that the query plans are likely to be more stable – there is less chance of a bad plan being generated.
  2. Each combination of parameters will get its own execution plan, and this will be stored in the cache – in this case, we could have eight different plans.
  3. The code is easier to maintain.  It’s a little harder to read, but you only have a single copy of the query – it’s just built up along the way.
  4. The users still get to have their flexible interface – very important when the upgrade is supposed to change as few things as possible, functionality-wise.

Myth: LocalSystem has no access to Network

A common myth is that the LocalSystem account has no access to networked resources, and so you may have trouble getting SQL Server to back up to remote locations.

Excusing the fact that running SQL Server or IIS as LocalSystem is not a best practice, it is still possible to connect to networked resources. This is done by the domain account DOMAIN\ComputerName$. For example, if my server SQL01 was on the COMPANY domain, there will be an account in Active Directory named COMPANY\SQL01$. Whenever a service running as LocalSystem attempts to connect remotely, it will attempt to use this account. The restriction is that the server must be in a domain – a workgroup will not cut it.

A recent example of where this came in handy was an IIS installation that called Crystal Reports, which ran using System DSNs to connect to the database.  These DSNs were configured to use a SQL login with no password.  This worked quite well, until the security was tightened and the SQL login was to be given a password.  This then caused every report using the DSN to prompt the user for a password, as System DSNs cannot store passwords.  One possible solution was to change all 100+ reports to File DSNs (and protect them well, as the password would be stored in plain text inside the DSN).  Alternatively, the System DSN could be modified to log in using Windows Authentication.

As IIS was running as LocalSystem, the initial attempt was to change this to be a domain account. Unfortunately, however, many security issues appeared, particularly with some versions of Internet Explorer. Allowing the DOMAIN\ServerName$ account limited access to the database removed the reliance on the insecure SQL login, and avoided a lot of rework.
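For completeness, a sketch of granting the machine account that limited access – COMPANY\SQL01$ is the example machine account from above, and the database and role names are placeholders:

```sql
-- Hedged sketch: allow the SQL01 machine account into one database.
-- COMPANY\SQL01$ is the example machine account; SomeDatabase and
-- db_datareader are placeholder choices.
CREATE LOGIN [COMPANY\SQL01$] FROM WINDOWS;
GO
USE SomeDatabase;
GO
CREATE USER [COMPANY\SQL01$] FOR LOGIN [COMPANY\SQL01$];
EXEC sp_addrolemember 'db_datareader', 'COMPANY\SQL01$';
GO
```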

As mentioned earlier, it is a better practice to use a dedicated domain account, but this technique may be useful in a pinch.

The Effects of Dropping a Database on the Proc Cache

While looking for a detailed explanation of cache-store flush messages in the SQL Server ERRORLOG, I came across this page: http://blogs.msdn.com/b/sqlprogrammability/archive/2007/01/17/10-0-plan-cache-flush.aspx, which refers to dropping a database causing the procedure cache to be flushed.

As someone that occasionally creates separate databases to hold temporary data, or a subset of an entire database, I did some investigations.

Script 1:

SELECT * FROM sys.dm_os_memory_cache_counters
WHERE name in ('Object Plans', 'SQL Plans', 'Bound Trees')
SELECT * FROM sys.dm_exec_cached_plans

This code simply reports on the state of the procedure cache at the server level.

Script 2:

CREATE DATABASE TestDB
GO
DROP DATABASE TestDB
GO
SELECT * FROM sys.dm_os_memory_cache_counters
WHERE name in ('Object Plans', 'SQL Plans', 'Bound Trees')
SELECT * FROM sys.dm_exec_cached_plans

We create a database, drop it, and then execute the queries from Script 1.

Conveniently, I have two development instances (2005 and 2008) with detailed plan caches that I have no problems with potentially flushing.  In SQL Server 2005, we get the following:

Before (Script 1):

[image: cache counters and cached plans before the drop (SQL Server 2005)]

After (Script 2):

[image: cache counters after the drop – the procedure cache has been flushed (SQL Server 2005)]

As (unfortunately) expected, the procedure cache is gone.  Now let’s try SQL Server 2008:

Before:

[image: cache counters before the drop (SQL Server 2008)]

After:

[image: cache counters after the drop – unchanged (SQL Server 2008)]

No change!  Fantastic!  The obvious conclusion is that you can drop or detach databases as much as you like in SQL Server 2008, but you may want to be aware of the potential effect on the server when using 2005.

Disabling Indexes Before Rebuilding

It is a common practice to rebuild indexes frequently in order to improve SQL Server performance.  The problem with rebuilding indexes is that you need to have space inside the data file to hold the index currently being rebuilt.  This means that a 5 GB index will require an additional 5 GB of space.  The subsequent problem here is that when the operation is over, the database will appear to have 5 GB of free space, and the DBA might decide to shrink the database (a bad idea, as this will re-fragment the data file).

A potential solution, for those building non-clustered indexes offline, is to first disable the non-clustered index.  A disabled index is akin to dropping the index, but keeping the index definition. After an index is disabled, it must be rebuilt before it can be used, as SQL Server has no way of knowing how many, or which, rows were inserted, updated, or deleted during the period the index was disabled.
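You can confirm an index’s disabled state through sys.indexes; a quick sketch (the table name is a placeholder):

```sql
-- is_disabled = 1 means the index must be rebuilt before it can be used
SELECT name, type_desc, is_disabled
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.SomeTable');
```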

This means that a 5 GB index can be rebuilt in place, using the same 5 GB.  The operation may be a little bit slower, and temporary space (either in TempDB or the current database) will be needed to re-sort the index (rather than base it off the current “live” copy of the index, as there is none), but it removes the requirement for the data file to have enough empty space to hold a second copy of the index.

Note that this only applies to non-clustered indexes – disabling a clustered index will make the entire table unavailable, and will disable all non-clustered indexes as well.  Ideally, clustered indexes will be based on a small, ever-increasing clustering key, which will greatly reduce the need to ever de-fragment the non-clustered indexes.  (There are exceptions – for example, when a row is initially inserted very small, due to variable-length columns, and later updated to populate those columns, widening the row so it no longer fits on the existing page and forcing a page split.)
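The widening-row scenario in that parenthetical can be sketched as follows (the table and the sizes are my own invention):

```sql
-- A row is inserted narrow (notes is NULL), then widened by an UPDATE.
-- Once its page is full, the widened row no longer fits, forcing a page split.
CREATE TABLE dbo.WideningDemo (
      id int IDENTITY PRIMARY KEY
    , notes varchar(4000) NULL
);
GO
INSERT INTO dbo.WideningDemo DEFAULT VALUES;  -- narrow row
GO
UPDATE dbo.WideningDemo
SET notes = REPLICATE('x', 4000)              -- row widens by 4000 bytes
WHERE id = 1;
GO
```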

While this may be a useful technique to avoid data file growth, I would worry about an environment that does not have sufficient working room to rebuild the largest non-clustered index.

Example:

USE master
GO
IF EXISTS (SELECT * FROM sys.databases WHERE name = 'IndexRebuildTest')
    DROP DATABASE IndexRebuildTest
GO
CREATE DATABASE IndexRebuildTest
GO
USE IndexRebuildTest
GO

-- Force the transaction log to grow
ALTER DATABASE [IndexRebuildTest] MODIFY FILE ( NAME = N'IndexRebuildTest_log'
     , SIZE = 51200KB )
GO

CREATE TABLE t1 (
      i int IDENTITY
    , UniqueID UNIQUEIDENTIFIER DEFAULT newid()
    , c CHAR(1000)
    , CONSTRAINT pk_t1 PRIMARY KEY (i)
)
GO
CREATE UNIQUE NONCLUSTERED INDEX ncix_UniqueID ON t1(UniqueID) INCLUDE (c)
GO

SET NOCOUNT ON
GO

INSERT INTO t1 DEFAULT VALUES
GO 10000

-- Note that sp_spaceused reports the database size including transaction log.
EXEC sp_spaceused @updateusage = N'TRUE'

[image: sp_spaceused output – 80 MB database]

We have an 80 MB database – but 50 MB of that is the transaction log.  The data is about 12 MB, and the index is 18 MB – quite a large, fragmented index!

ALTER INDEX ncix_UniqueID ON t1 REBUILD WITH (SORT_IN_TEMPDB = ON)
GO
EXEC sp_spaceused @updateusage = N'TRUE'

[image: sp_spaceused output – database grown to 91 MB]

Now, our database has grown to 91 MB – an increase of 11 MB.  The rebuild has also shrunk the index, saving 6 MB.

Let’s run the first listing again (to reset the database size back to 80 MB), and then try disabling the index first.  Note that the 18 MB currently being taken by the index is immediately released when the index is disabled.

ALTER INDEX ncix_UniqueID ON t1 DISABLE
GO
ALTER INDEX ncix_UniqueID ON t1 REBUILD WITH (SORT_IN_TEMPDB = ON)
GO
EXEC sp_spaceused @updateusage = N'TRUE'

[image: sp_spaceused output – database still 80 MB, with 6 MB unallocated]

There you have it.  The database is still 80 MB (it fluctuates slightly, depending on how fragmented the index is each time we reset the database), and there is 6 MB of unallocated space – which is the savings from de-fragmenting the index.

Again, this is primarily useful only for offline rebuilds – the index will be unavailable while it’s disabled.  If you have 2005/2008 Enterprise Edition, you could rebuild online and at least have the table available during the rebuild, but the index will not be.  (And if you do have Enterprise Edition, forking out another $2000 for extra disk space won’t be an issue!)