Deprecated Features

SQL Server is a mature product, and as the product grows older, new features are added (we all know about those!) and some features are removed.  However, to avoid having customers upgrade to the latest and greatest version of SQL Server, Microsoft provide warnings that features will be deprecated (a fancy word for “removed”) in the next version, or in some future version.

If you’re considering upgrading your application, the first test you should do is detect if you’re using deprecated features.  You can either skim the list of deprecated features (look up “Deprecated features” in the Books Online index), or you can run a Profiler trace.

There are two Deprecation events in Profiler, residing in the not-so-hidden category of “Deprecation”.  These are:

  • Deprecation Announcement – captures those events that will be around for the next version, but will be removed in future.
  • Deprecation Final Support – captures those events that will NOT be in the next version.

Set up a Profiler trace on these two events.  Note that you should tick the “Show All Events” and “Show All Columns” options to be able to capture all relevant information.  Failure to “Show All Columns” can result in not capturing anything.

That done, sit back and wait for the messages to come rolling in.  On the other hand, if you’re running this on a test server with zero workload, you’ll need to produce some deprecated events to capture.  Again, look up “Deprecated Featres” in the Books Online index (on the web at: http://msdn.microsoft.com/en-us/library/ms143729.aspx, with a link to the 2005 list), and find a feature to use.

image

The deprecation warning message is kind enough to give you a hint on how to perform the function in future.

That’s all there is to it!  Now you can upgrade to SQL Server 2008 with a stronger feeling of confidence that Microsoft haven’t dropped support for a particular feature.  This doesn’t replace testing, of course, but will at least reduce the number of errors you get when you do upgrade.

Relog.exe – Converting Perfmon data to a different format

I bumped into a former colleague down the street today, and he asked how to go about converting existing Perfmon data from .blg (binary) to a different format, such as CSV or into a SQL database. (Yes, that’s right.  I run into people I haven’t seen in weeks, and we talk about SQL Server.  That’s normal, right?)

Going off on a slight tangent, which format of Perfmon log should you use?  The four choices are Binary (BLG), Tab Delimited, Comma Delimited, and SQL Database.  Binary is useful for most purposes, but becomes harder to extract the data out if you want to run any other processing on it.  Tab and Comma delimited have the benefit of being able to view the data in Excel (if you don’t have too many counters), and graphing can be easier.  SQL Database has the benefit of logging straight into a SQL database, and this allows you to write a query to process your Perfmon data.  For example, you could create an hourly average, or pull out the date/times of peak loads.  The drawback to using a SQL database is that it’s not as lightweight – you quickly store a lot of data in your database, and increase the load on your server. 

A good compromise may be to log into binary format, and then use the relog.exe command line tool to convert the data into SQL or comma/tab delimited format (which can then be opened in Excel or bulk-loaded into a database).

Relog.exe is a simple tool – it allows you to convert between Perfmon file formats, extract specific counters, or change the sample period.  There’s not really much to explain that isn’t mentioned in the Technet article at http://technet.microsoft.com/en-us/library/bb490958.aspx

Quick Tip #4: Can You Restore Your Backups?

This tip might seem obvious, but I occasionally come into a situation where “something” (usually a third party backup tool) is taking backups of the SQL Server databases, but the system administrators don’t know exactly where, or how to restore the file.  The case that triggered this blog post involved a SQL Agent job that generated an error when being modified, and losing the final 32 steps.  Whoops.  My colleague rang me, and my suggestion was to restore a copy of MSDB from backup so he could script out the job again.

The initial phone call to the sys admin went something like this “Do we take backups of the MSDB database?”.  “Uh, err, let me check… … … No.”

I jumped on the server and identified that yes, something was backing up MSDB each night, but it was going directly to a device which I had no visibility over.  To cut a long story short, the backup was found, but the backup tool apparently only allows you to restore over the top of the database you backed up – not something we wanted to do.

Learning where your backups go, and how to restore your databases to any location is something you don’t have the luxury of doing when you’re under pressure.  You need a pre-generated sequence of steps to run through in an emergency to take the guesswork out of it.

In this situation, my colleague ended up re-entering all the missing steps from the description of each step from the job’s history.  What should have been a simple task (restore MSDB elsewhere, script out the job, re-install the job) became impossible, and this job had to be running soon.

A similar situation exists in if you haven’t tested every single backup, you may not know that your backup file is corrupted.  The odds of a dead backup are small, but it’s always possible.  The best way to be sure is to actually restore the backup to another location, but even running a backup verification is better than nothing.

Wordle Image of My Blog

Worlde seems to be the in thing going around the SQL Server blogging community this week, so I figured I’d knock one up of this site.  Click for a larger image.

wordle

Apparently I enjoy talking about tables, indexes, queries and Perfmon.  Sounds about right.

Quick Tip #3: Think!

An important lesson for everyone that has adhoc access to SQL Server is that before you run any query in Management Studio is to think about the consequences first.

For example, if you want to view data in a table, your first thought is to SELECT * FROM Table, or to open it in Management Studio, which essentially performs the same function.  But before you run this, have a think about the possible consequences.

  • If this is a large table, you might start reading 10 GB of data out of SQL Server
  • If you’re reading 10 GB of data, you might destroy the buffer cache, causing queries to run slowly until everything’s cached again
  • You’ll likely end up with a table lock that will last until the end of the transaction, assuming the default isolation level of read committed.  How long will the SELECT take to run?

It’s important as a DBA to do no harm.  In this situation, there’s a couple of things you can do.  You can just get a quick selection of rows if you just want to get an idea of what’s in the table:

  • SELECT TOP 100 * FROM Table
  • SELECT TOP 100 * FROM Table (nolock)
Note that an ORDER BY clause will need to sort all rows, so unless the data is sorted, you’ve just loaded in the entire table to perform a sort.
We can find out the number of rows by:
  • SELECT * FROM sys.sysindexes WHERE object_id = object_id(’Table’) 
  • SELECT COUNT(*) FROM Table (nolock) — Not Recommended

The first option will give an approximate number of rows, but it is fast. (I like sysindexes as it works on both 2000 and 2005.)  Note that I don’t recommend the second option as, again, it loads all data pages into memory, destroying the buffer cache.

What about building an index, assuming you don’t have Enterprise Edition and can’t build the index online?  Let’s think:

  • How many rows in the table, and how wide will each row in the index be?  If it’s small, I might be able to build it immediately.
  • How active is the table?  If it’s very active, any blocking during an index build can be disasterous, and so should wait for a maintenance window.
  • How long will the index take to build?  Can any users afford to be blocked for 20 seconds?  Some websites kill the query after 10 seconds, so users would see timeouts, whereas other applications might not mind if the query runs for up to 30 seconds (30 being the usual .NET timeout). 

By thinking of the possible consequences, you are more likely to keep the system running, and not accidentally get in the way of your users.  So, before hitting “Execute”, have a quick think of what could happen when you run that query (especially if you have an DELETE statement with no WHERE clause!)

SQL Agent Handling Large Negative Error Codes

In my previous post, I discussed using Logman to automatically start Perfmon counters logging.  The solution involved running logman.exe to start the collection, and if the counters were already running you’d get an error message saying words to that effect.

The problem with this solution is SQL Agent will report that the job failed as the error code returned by Logman is non-zero.  You can tell SQL Agent a specific code to use as the “success” code, but the user interface will not allow big negative numbers, such as -2144337750 (which is what Perfmon returns if the counter is already running on my Vista machine.  Windows 2003 returns a different code).  While you may be able to enter this value via script, I’m not sure how Management Studio will handle it, and could cause problems down the line if you edit the job.

Instead, a solution is to use a batch file that can run logman, check the error code returned, and if the error code is “Already running”, it can then exit the batch file specifying an error code that SQL Agent can handle (such as 0):

@echo off
C:\Windows\System32\logman.exe start TestCollection -s SERVERNAME
IF %errorlevel%==-2144337750 (
 echo Collection already started!
 exit /B 0
) ELSE (
 echo Exiting with error level: %errorlevel%
)

That’s it!  Now the job will report as succeeded if the collection is already running.

Starting Perfmon Monitoring Automatically

Monitoring your database server is crucial to being able to analyse its performance, and you want to be monitoring continually - it’s no good starting to monitor when you detect a problem, as you a) might not know that you have a problem, and b) you don’t know what normal operation looks like.

Perfmon is the current standard (although SQL Server 2008 has some nifty features that can complement Perfmon).  I won’t go into details about how to use Perfmon, as there’s plenty of information available on the web, but it can be tricky to configure to get Perfmon running automatically after reboot, or if for some reason Perfmon fails (such as a temporary out of disk space error).

The simplest way is to use the logman.exe program, supplied with Windows 2003 and later, and available for Windows 2000.  This command line program can start or stop Perfmon counter logs on either the local or a remote server, among many other tasks.

The way we use it is to schedule a task in SQL Agent or Windows Scheduler that runs every 5 or 10 minutes and attempts to start the Perfmon counter log specified.  If the counter log is already started, an error is returned, but the counter log is not affected.

The command to start a job is (note that logman should be in your C:\Windows\System32 folder):

logman.exe start CounterCollectionName -s RemoteServerName

More information about Logman and its parameters is available at http://technet.microsoft.com/en-us/library/cc755366.aspx.

Performance Wasn’t a Top Priority Here…

I spent time today analysing a system monitoring tool that collects a lot of information about your servers and stores the data into a table.  A lot of data is collected, and the table was currently at 23 GB with 180,000,000 rows, with only a clustered index on a non-unique GUID.  There was no primary key set, although there was also an identity column.

A delete process runs nightly, and deletes anything older than 30 days.  This delete process was highly inefficient, and was timing out after 10 minutes, causing the transaction to roll back, and nothing deleted.  Additionally, during these 10 minutes, the table was locked and inserts couldn’t not be performed.

The code for the deletion looked like this (although this query is from AdventureWorks, and 2500 days is needed for demonstration purposes):

DELETE FROM Sales.SalesOrderHeader
WHERE datediff(day, ModifiedDate, getdate()) >= 2500 

If you examine the query, you’ll notice two things.  Firstly, as there is no index on ModifiedDate, we’ll need to perform a table scan.  That’s fixable with a non-clustered index:

CREATE NONCLUSTERED INDEX ncix_moddate ON
    Sales.SalesOrderHeader(ModifiedDate)

We’ll ignore the fact that our new index on 180,000,000 rows will take up 4.8 GB - a fast database is a large database!  Changing the clustered index to ModifiedDate, GUID would remove the need for the uniquifier and the new index.

The other thing you can notice about this query is that each row must process the datediff function to determine the number of days difference between the ModifiedDate and the current date.  180,000,000 function calls, and this can’t be indexed.

A much better way of writing this query would be to determine the boundary date.  Any rows earlier than this date will be deleted.  By checking the ModifiedDate against this boundary date (which is calculated once at the start of the query), the new index on ModifiedDate can be used to find the rows to delete.

DELETE FROM Sales.SalesOrderHeader
WHERE ModifiedDate <= dateadd(day, -2500, getdate())

If we compare the execution plans, we can see the following:

image

Note that I’m using a SELECT instead of a delete, as AdventureWorks has cascading deletes on the Sales.SalesOrderDetails table.  The first query takes 61 reads - an entire index scan of my new index, whereas the second query takes just 6 reads - seeking to those parts of the index where the date matches.

The only bad news now is that I can’t modify this code, as it appears to be run by a service and not in a stored procedure.  I’m hoping there will be an option to turn off purging the table, and I can run my own delete command.

Quick Tip #2 - Are You Using All Your Memory?

I have a couple of quick questions for you.  Are you running SQL Server 32 bit?  Do you have more than 4 GB of memory in the server?  Do you know if you’re using as much memory as you can?

I often see servers that are experiencing performance problems due to a lack of memory.  A recent case was with a 2005 Standard server with 16 GB of memory, but without AWE enabled, so SQL Server was limited to 1700 MB in its buffer cache.  It was running (amongst others) a single query constantly which scanned a 2.5 GB table, and so the disks were flat out trying to service these requests.  Turning on AWE enabled most of the database to fit in memory, and the large scanning query dropped from 400 seconds to about 7 seconds.  An index was then added, and performance was even better.

To find out how much physical memory is installed, and how much physical memory is still available, have a look in the Performance tab of Task Manager.  From within SQL Server, you can run the query:

SELECT * FROM sys.dm_os_performance_counters WHERE counter_name IN (’Target Server Memory (KB)’,’Total Server Memory (KB)’)

This will show you the amount of memory SQL Server is aiming to use (Target Server Memory), and how much it is currently using (Total Server Memory).  If AWE is not enabled, these values won’t go above 1700 MB.

For information on how to configure AWE, see my previous post for 2005, or for SQL Server 2000.

It’s Dangerous to Use XML in Triggers

There are a few examples of DDL triggers available on the web, which use XML to retrieve the EventData() of the trigger.  EventData() retrieves an XML object containing properties about the event.  For example, Books Online describes ALTER TABLE event as:

<EVENT_INSTANCE>
    <EventType>type</EventType>
    <PostTime>date-time</PostTime>
    <SPID>spid</SPID>
    <ServerName>name</ServerName>
    <LoginName>name</LoginName>
    <UserName>name</UserName>
    <DatabaseName>name</DatabaseName>
    <SchemaName>name</SchemaName>
    <ObjectName>name</ObjectName>
    <ObjectType>type</ObjectType>
    <TSQLCommand>command</TSQLCommand>
</EVENT_INSTANCE>

By using XML query methods, we can extract data out of this XML structure.  For example:

DECLARE @eventdata xml
SELECT @eventdata = EventData()
SELECT @eventdata.value('(/EVENT_INSTANCE/EventType)[1]'
    , 'nvarchar(100)')

This works well in most cases, but there is one little gotcha that occasionally springs up.  To use the XML methods, you need to have a specific set of SET options applied to your session.  If you have a DDL trigger that uses XML methods to parse out the EventData, and you have incompatible SET options set in your session, your trigger will generate an error, and rollback the entire transaction.

Now, although you can change set options inside a trigger, and they will automatically roll back to their previous values once the trigger ends, you may not be aware that you need to do this, and once you deploy the trigger into production, you start getting errors from some clients.  The easiest way around this is to not use the XML methods in your trigger, and just retrieve EventData() and store it in an XML column.  You can then use a separate process which has the correct session options set to process the XML in greater detail.

The alternative option is to ensure you set the set options correctly in your trigger.  The correct settings to use are:

SET CONCAT_NULL_YIELDS_NULL ON
SET QUOTED_IDENTIFIER ON
SET ANSI_NULLS ON
SET ANSI_PADDING ON
SET NUMERIC_ROUNDABORT OFF
SET ANSI_WARNINGS ON

If you change any of these and attempt to run an XML method, you’ll get an error message telling you which option needs changing:

DECLARE @xml xml
SET @xml = '<Root><Product ID="1" Name="Widget, 25cm">
</Product></Root>'
SELECT @xml.query('/Root/Product')

Msg 1934, Level 16, State 1, Line 4
SELECT failed because the following SET options have incorrect settings: ‘ANSI_NULLS, QUOTED_IDENTIFIER, CONCAT_NULL_YIELDS_NULL, ANSI_WARNINGS, ANSI_PADDING’. Verify that SET options are correct for use with indexed views and/or indexes on computed columns and/or
query notifications and/or xml data type methods.

Applying a DDL trigger into production without first checking that the session values are compatible could be disastrous!

Special thanks to Jon Kehayias for technical reviewing of this post.

WordPress Themes