Jun 26 2008
 

Cascading deletes are one of those features that people either love or hate.  The haters prefer to delete all child rows first, and then delete the parent, while the lovers like the idea of deleting from one table and having every other table in the database empty out.  I have to admit that I fall into both camps – I don’t use it too often, and most of the time I’ll opt for deleting the children first, even with the cascading delete turned on.

However, performance can suffer with a cascading delete, and because it is more hidden than the two-step Delete-Children-First method, it can be harder to spot that an index would improve performance.

Imagine that you have three tables: Employee, Equipment, and EquipmentLoans, where an employee can borrow equipment:

CREATE TABLE Employee (
      EmployeeID int IDENTITY NOT NULL PRIMARY KEY
    , FirstName varchar(30)
    , LastName varchar(30)
)

CREATE TABLE Equipment (
      EquipmentID int IDENTITY NOT NULL PRIMARY KEY
    , Description varchar(30)
)

CREATE TABLE EquipmentLoans (
      EquipmentID int not null
    , EmployeeID int not null
    , DateDue datetime not null
    , CONSTRAINT fk_EquipLoans_EquipmentID FOREIGN KEY (EquipmentID) 
            REFERENCES Equipment(EquipmentID) ON DELETE CASCADE
    , CONSTRAINT fk_EquipLoans_EmployeeID FOREIGN KEY (EmployeeID) 
            REFERENCES Employee(EmployeeID) ON DELETE CASCADE
    , PRIMARY KEY (EquipmentID, EmployeeID)
)

INSERT INTO Equipment (Description) VALUES ('Laptop');
INSERT INTO Equipment (Description) VALUES ('Projector');
INSERT INTO Equipment (Description) VALUES ('Laser Pointer');

INSERT INTO Employee (FirstName, LastName) VALUES ('Scott', 'Wood');
INSERT INTO Employee (FirstName, LastName) VALUES ('John', 'Otto');
INSERT INTO Employee (FirstName, LastName) VALUES ('Bart', 'Johnson');

INSERT INTO EquipmentLoans VALUES (1, 1, '2008-08-14');
INSERT INTO EquipmentLoans VALUES (3, 1, '2008-08-14');
INSERT INTO EquipmentLoans VALUES (2, 2, '2008-07-21');

Each of those primary keys has created a clustered index: one on Employee(EmployeeID), one on Equipment(EquipmentID), and one on EquipmentLoans(EquipmentID, EmployeeID).  Because we’ve added cascading deletes on the EquipmentLoans table for both Employee and Equipment, rows are automatically deleted from EquipmentLoans when either the Employee or the Equipment is deleted.  Let’s have a look at how this works.  If we run this command:

DELETE FROM Equipment WHERE EquipmentID = 3

Then the clustered index of EquipmentLoans will be used to seek to any loans with an EquipmentID of 3 and delete those rows.  Next, the clustered index of Equipment will be used to delete the row with an EquipmentID of 3.  This is very fast, as both operations are index seeks.

However, what if we delete an employee?

DELETE FROM Employee WHERE EmployeeID = 2

EmployeeID is only the second column of the clustered index on EquipmentLoans, so to find any rows with EmployeeID = 2, the whole clustered index must be scanned.  If there are a lot of rows, this could take a while.  After this is done, the clustered index on Employee will be used to find and delete the Employee with ID 2.

If there were no cascading delete, you would still need to run two statements:

DELETE FROM EquipmentLoans WHERE EmployeeID = 2
DELETE FROM Employee WHERE EmployeeID = 2

However, it’s easy to see which of these two commands is performing badly (the first one), and a nonclustered index on EquipmentLoans(EmployeeID) is an obvious fix.  Once this index is added, it can be used to find all EquipmentLoans rows where EmployeeID = 2 and delete them – no more scan.  The same index is used whether the delete is cascaded or issued explicitly.
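
The index itself is a one-liner (the index name here is just a suggestion):

CREATE NONCLUSTERED INDEX ncix_EquipmentLoans_EmployeeID 
    ON EquipmentLoans (EmployeeID)
GO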

Note that if you look at the execution plans for these queries, you will only see clustered index scans.  This is because these demo tables are all small enough to fit into a single 8 KB page.  If there were thousands of rows, the nonclustered index would come into play.
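
To see that happen, you could pad the tables out with a few thousand rows before comparing the plans – a quick sketch (the names and row count here are purely illustrative):

SET NOCOUNT ON;
DECLARE @i int;
SET @i = 0;
WHILE @i < 5000
BEGIN
    -- Each new employee borrows the laptop (EquipmentID 1)
    INSERT INTO Employee (FirstName, LastName) VALUES ('Test', 'Employee');
    INSERT INTO EquipmentLoans (EquipmentID, EmployeeID, DateDue)
        VALUES (1, SCOPE_IDENTITY(), '2008-09-01');
    SET @i = @i + 1;
END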

Jun 25 2008
 

I need to import Excel spreadsheets into SQL Server 2005 every few months, and I always end up banging my head on the same problem.  At least I know what to look for now!

Using the Import Data Wizard in Management Studio is my preferred method of getting data from spreadsheet format into the database, and from there I can massage the data to load it into the necessary tables.  The issue is that when you import a column containing data that could be read as either a number or text – for example a classification column with values like 150, 160, K20, 146, 12G – Excel’s ISAM driver takes a guess at what type of data the column holds, usually based on the first 8 rows.  If it decides the column is text, you’ll get an nvarchar(255); if it decides it’s numeric, you’ll get a float.  Either way, values that don’t match the guessed type are imported as NULL: numbers become NULL in an nvarchar column, and text becomes NULL in a float column.  If the first 8 rows are all NULL, the entire column will be imported as NULL.

There are a few ways around this.  The first is to separate your data so that all your “float” values are in one spreadsheet and all your “nvarchars” are in another.  This isn’t a very attractive idea, as it essentially doubles the work.

Another option is to modify the source document to ensure that text data is used for every cell, but just formatting the cells as text isn’t sufficient.  You have to reapply the data for each cell individually.  Needless to say, I didn’t go down this route, but it’s detailed in a link below.

The solution was to put IMEX=1 into the connection string.  This puts the Excel ISAM driver into “Import” mode, and all values come in correctly.  The main drawback to this method is that you can’t modify the connection string used in the Management Studio Import Data Wizard, so you have to save it as an SSIS package and then make the change.  “IMEX=1” goes just after HDR, inside the same quotes: “Excel 8.0;HDR=NO;IMEX=1;”.  It’s important to get this right or you won’t be able to connect.  Note that even with IMEX=1, you still want data that isn’t purely numeric somewhere in your first 8 rows.  More information is available at http://support.microsoft.com/kb/194124/EN-US/.
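
If you prefer to pull the data in with T-SQL, the same extended-properties string can be used with OPENROWSET – a rough sketch, with a made-up file path and sheet name (this also requires the ‘Ad Hoc Distributed Queries’ option to be enabled):

SELECT *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
        'Excel 8.0;HDR=YES;IMEX=1;Database=C:\Imports\Classifications.xls',
        'SELECT * FROM [Sheet1$]')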

A good overview on importing data from Excel to SQL Server with a variety of techniques is available at http://support.microsoft.com/kb/321686.

Jun 18 2008
 

I installed SQL Server 2008 RC0 over the weekend.  I was expecting a fairly smooth installation, but it ended up taking me most of an evening.  First I had to upgrade to Windows Server 2003 R2 SP2 (and had a blue screen occur halfway through the install!), but that is hardly SQL Server’s fault.  I opted to install everything I could, and received some strange error messages (since forwarded on to Microsoft).  However, the summary at the end indicated that the problems related only to full-text search, and the rest of SQL Server appears to be functioning normally.  The installation application was brilliant, and the debug logging exceptional.

Now to get in and play with it!

Jun 17 2008
 

I put together a quick example of a DDL trigger to prevent databases from being dropped for a forum post today, and figured I’d put it on the blog in case I need it again.

While the best way to prevent users from dropping databases is to not grant them the access needed to drop a database, it can sometimes come in handy to have a catch-all that stops any database from being dropped.  This trigger is currently very simple – if the trigger is active, it prevents any database on the instance from being dropped, and you must disable or delete the trigger in order to drop a database.  A more advanced version could check the username or hostname of the user trying to perform the drop, or could check a table containing the names of databases that are allowed to be dropped (you would insert the name of your database into this table, and then drop the database); a sketch of that idea follows the basic example below.

CREATE DATABASE TestDB
GO

USE Master
GO

CREATE TRIGGER Trig_Prevent_Drop_Database ON ALL SERVER
FOR DROP_DATABASE 
AS
    RAISERROR('Dropping of databases has been disabled on this server.', 16,1);
    ROLLBACK;
GO

DROP DATABASE TestDB -- Shouldn't work
GO 

-- Drop the trigger to allow deletions (you could also 
-- disable the trigger and then reenable it)
DROP TRIGGER Trig_Prevent_Drop_Database ON ALL SERVER
GO
 
DROP DATABASE TestDB -- Should work
GO
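
As a rough sketch of the “more advanced version” mentioned above, the trigger could read the database name from EVENTDATA() and compare it against a list of databases that are allowed to be dropped (the table and trigger names here are only suggestions):

USE Master
GO

-- Databases listed here are allowed to be dropped
CREATE TABLE DroppableDatabases (DatabaseName sysname PRIMARY KEY)
GO

CREATE TRIGGER Trig_Prevent_Drop_Database_Advanced ON ALL SERVER
FOR DROP_DATABASE
AS
    DECLARE @DatabaseName sysname
    SET @DatabaseName = EVENTDATA().value(
        '(/EVENT_INSTANCE/DatabaseName)[1]', 'nvarchar(128)')

    IF NOT EXISTS (SELECT * FROM master.dbo.DroppableDatabases
                   WHERE DatabaseName = @DatabaseName)
    BEGIN
        RAISERROR('Dropping database %s has been disabled on this server.', 16, 1, @DatabaseName)
        ROLLBACK
    END
GO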
 
Jun 16 2008
 

A frequent assumption that people make about SQL is that a query will return rows in the same order every time.  For example, if you run the following query in the AdventureWorks database:

SELECT * FROM Production.Product

[Screenshot: query results, in ProductID order]

You will see that all rows returned appear to be in order of ProductID, as the clustered index was used to return the data.  However, nothing in the query specified any order for the rows.  Try this query:

SELECT ProductID, Name FROM Production.Product

[Screenshot: query results, ordered by Name]

This query returns the ProductID and the Name, but ordered by Name.  This is because the index used was AK_Product_Name, on the [Name] column – the smallest index that can service the query.  As each row in that index is smaller than in the clustered index, more rows can be squeezed into each page.
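
If you want to see which indexes the optimiser has to choose from, a quick query against sys.indexes (or sp_helpindex) will list them:

SELECT name, type_desc
FROM sys.indexes
WHERE object_id = OBJECT_ID('Production.Product')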

However, what happens if someone adds a new index onto the table, and the query is run again?

CREATE NONCLUSTERED INDEX ncix_ProductID_Name ON 
  Production.Product(ProductID, Name)
GO
SELECT ProductID, Name FROM Production.Product

[Screenshot: query results, now in ProductID order]

Suddenly, the results are being returned ordered by ProductID.  SQL Server now has two choices of index to use to service this query, and both are the same cost.  There is no guarantee which index will be used.  Turn on the execution plan for the previous query (Query | Include Actual Execution Plan, or Control-M in Management Studio), and run the query again to see the plan.

[Screenshots: the Index Scan’s properties showing Ordered = False, and the results, which are no longer in ProductID order]

Note that the Ordered property is False.  This means that whatever order the rows came back in, the result set is not guaranteed to be sorted.  It might happen to be in a specific order, but there is no guarantee.  As you can see in the results above, while the index used was the ncix_ProductID_Name index we just created, the results weren’t returned in ProductID order.  When an unordered Index Scan occurs, SQL Server grabs the pages of the index in the quickest manner possible and processes the rows as it finds them.  If the index is fragmented, its pages are not stored in order on disk.  Consider a page split: if a row is inserted into the middle of this table (with IDENTITY_INSERT on) and there is no room for it on the page, a page split occurs.  If there are no free pages in the extent, a new extent is allocated, and this could be in a completely different part of the file.  When all the pages are read in, the split page may be processed last because of its location in the MDF file.

On the other hand, if an ORDER BY clause is used in the query, the Ordered property will be true, and SQL Server will “walk” the pages of the index in the proper order to get the sorted version of the results.

SELECT ProductID, Name 
FROM Production.Product 
ORDER BY ProductID

[Screenshots: the results in ProductID order, and the execution plan with the Ordered property set to True]

If there is no index that will return the data in the proper order, then SQL Server will choose the most efficient index, and then sort the data:

SELECT ProductID, Name, ProductNumber 
FROM Production.Product 
ORDER BY ProductNumber

[Screenshot: the execution plan, showing an index scan followed by a Sort operator]

Note that sorting a result set can be quite expensive for the server – a large sort may spill to TempDB, which involves writing to the TempDB transaction log.  If the application is a desktop app, it may be worth having the client perform the sort rather than the database.

Note that if you try to run these queries as I’ve done in this post, you may get different results.  Some things may appear to be ordered differently from how I’ve described them, but that’s really the whole point – if you want something ordered, include an ORDER BY clause.

Jun 11 2008
 

A frequent question with log shipping is “can I use the database that’s in restoring mode”?  This will come in handy for reporting, or any other read-only use.  The good news is that yes, you can – to a point.  The main restriction is that you can only restore a log when no one else is in the database.  This means you will have to kill all connections to the database whenever you want to perform your log restores.  You have the choice of either doing this more frequently, allowing you to have more up to date data in the database, or less frequently, and having slightly older data.

To begin with, you need to ensure that your database is using the FULL recovery model, not SIMPLE.  Next, take a full backup.  On the other server, restore the full backup WITH NORECOVERY.  From here, you can take log backups on the source server and restore each one (again WITH NORECOVERY) on the target server.
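
As a rough sketch of those steps (the database name and file paths are purely illustrative, and you may need MOVE clauses if the file locations differ on the standby server):

-- On the primary server
ALTER DATABASE Northwind SET RECOVERY FULL
BACKUP DATABASE Northwind TO DISK = 'E:\Backup\Northwind_Full.bak'
BACKUP LOG Northwind TO DISK = 'E:\Backup\Northwind_Log1.trn'

-- On the standby server
RESTORE DATABASE Northwind FROM DISK = 'E:\Backup\Northwind_Full.bak' WITH NORECOVERY
RESTORE LOG Northwind FROM DISK = 'E:\Backup\Northwind_Log1.trn' WITH NORECOVERY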

However, this does not allow us to open the database and poke around, and trying to do so will just get you an error message:

Msg 927, Level 14, State 2, Line 2
Database ‘Northwind’ cannot be opened. It is in the middle of a restore.

The solution is to restore the log with a standby file, e.g. (using an illustrative path for the log backup):

RESTORE LOG Northwind FROM DISK = 'E:\Log\Northwind_Log.trn' 
WITH STANDBY = 'E:\Log\StandByFile.stb'

The name of the standby file (also known as the undo file) is not important, but this is where the work from uncommitted transactions is stored.  The database cannot be opened for reading until uncommitted transactions in the transaction log backup have been rolled back, but the next log backup cannot be restored unless those transactions are put back as they were.  The undo file stores this rolled-back work, which is what allows you to restore additional transaction log backups.

If you are taking transaction log backups every 5 minutes, you can store them up, and once an hour disconnect all users from the standby database,  restore all transaction logs to the standby, and then bring the database back into Standby mode.

This technique allows you to have a reporting server with a tried and reliable technology (not that replication or mirroring aren’t), and also provides you with a database that you can bring online very quickly if something happens to the primary database.

Mixing SQL Server 2000 Log Shipping with SQL Server 2005

While you can log ship from SQL Server 2000 to SQL Server 2005, you cannot bring the database to a warm standby.  This is because the database needs to undergo an upgrade to bring it online, and it cannot downgrade back to 2000 to apply the next transaction log.  Trying to restore a log WITH STANDBY on a 2000 database under 2005 will give you this error:

Msg 3180, Level 16, State 2, Line 1
This backup cannot be restored using WITH STANDBY because a database upgrade is needed. Reissue the RESTORE without WITH STANDBY.
Msg 3013, Level 16, State 1, Line 1
RESTORE LOG is terminating abnormally.

As long as you don’t want to recover the database or use it as a warm standby, you can log ship from 2000 to 2005.  The database will only be upgraded when it is recovered, so you can apply as many transaction log backups as required.  However, once you do bring the database online and it is upgraded, you won’t be able to take this copy of the database back to SQL Server 2000.  This makes log shipping between 2000 and 2005 a poor solution for backups.  Your only option is to copy the data out of the 2005 database back into a 2000 database.

Jun 10 2008
 

This morning I got a phone call from a colleague requesting insights as to why an instance of SQL Server 2005 was not using more than 3.5 GB of memory.  The machine had 12 GB installed, /pae and /3gb were set in boot.ini (Windows Server 2003 Enterprise), and Lock Pages In Memory was set.  All seemed fine for a 12 GB system.

We confirmed that SQL Server was using 3.5 GB via Perfmon counters and "DBCC MemoryStatus".  Task Manager showed only 128 MB free on the server and that 12 GB was installed and visible to Windows, and that SQL Server was using about 256 MB (the 3.5 GB of AWE being hidden from Task Manager, which only shows VAS memory).  Obviously something was taking up the extra 9 GB, but what?  AWE memory isn’t the easiest to view under 32 bit Windows, so unless you know what is using it, you’re in for a search (let me know if you know an easy way to view applications using AWE).  My initial guess was a second copy of SQL Server running with AWE, but that wasn’t the case.

The server happened to be running under VMWare, and the issue was the existence of the VMWare Balloon Driver, which runs as vmmemctl.  VMWare is able to share identical memory between machines, so if the balloon driver reserves memory identically over a number of machines, the host only has to store one copy of this in memory.  This can give a substantial lift to memory usage efficiency.

What the balloon driver does is take up memory inside the guest to simulate memory pressure on the virtualised server.  Windows then determines what doesn’t need to stay in memory and pages it to disk.  If the server requires more memory, the balloon should deflate a little.  This ensures that the guest OS only has as much memory as it needs, and very little is wasted.

So what happens when SQL Server starts?  If there’s insufficient memory available, AWE won’t kick in at all, and SQL Server will be limited to VAS – 3 GB in this case, due to the /3gb switch in boot.ini.  However, as SQL Server was using AWE and 3.5 GB, the case is slightly different.  What has happened here is that SQL Server detected enough free memory to start AWE and grabbed as much as it could – 3.5 GB.  The balloon driver then conspires to keep the memory pressure on, and as SQL Server believes there is no free memory available, it will not increase its memory usage.

The balloon driver works well for almost all applications that blindly request more memory and are happy to let Windows manage memory.  SQL Server is different in that it actively manages its own memory, and will only request more if it believes the operating system can supply it.  I’d recommend not using a balloon driver on a production SQL Server installation.  Actually, I’d recommend not virtualising your SQL Server installation at all, if possible, but there are high availability reasons that may make it appropriate.

Jun 09 2008
 

In a previous post, I described what AWE memory is, and how it works with SQL Server 2005, against both Windows 2000 Server and Windows 2003 Server.  In this post, I’ll describe how it works in SQL Server 2000.

Firstly, run

SELECT @@version

to check what version of SQL Server you are running.  This will give you a version number, and the edition. 

The first thing to be aware of is that SQL Server 2000 will only use more than 2 GB of memory with Enterprise (or Developer) Edition.  This is the main limitation of SQL Server 2000 Standard, and if more than 2 GB of memory is required, you must upgrade to SQL Server 2000 Enterprise, or SQL Server 2005 Standard.

If you are running 8.00.2039 (Service Pack 4), you will also need to upgrade to 8.00.2040 to avoid a bug where only 50% of the available memory is available for use.  See http://support.microsoft.com/kb/899761/ for more information.

Enabling AWE on SQL Server 2000 is the same as SQL Server 2005:

  1. Add the /pae switch to boot.ini to allow Windows 2000/2003 to access more than 4 GB of memory
  2. Grant “Lock Pages in Memory” to the user account that runs the SQL Server process
  3. Set the configuration setting “AWE Enabled” to 1:

sp_configure 'show advanced options', 1
RECONFIGURE
GO
sp_configure 'awe enabled', 1
RECONFIGURE
GO

Note that AWE requires 2 GB of kernel VAS to address AWE memory above 16 GB, so using the /3gb switch in boot.ini will limit AWE to 16 GB.  See http://support.microsoft.com/kb/274750 for more information about the maximum memory useable for each operating system/SQL Server 2000 combination.

SQL Server 2000 will allocate all memory upon startup, to the value of the “Max Server Memory” configuration option.  If there is not enough memory, SQL Server will take as much as it can, leaving as little as 128 MB available.  This memory comes from the Windows non-pageable pool, and so this memory will never be swapped to disk.

If the server has less than 3 GB of memory available when SQL Server starts, AWE will not be available, and SQL Server will run in non-AWE mode.

It is always recommended to set a value for “Max Server Memory” to ensure that other processes running on the server have some memory available.  Leaving around 2 GB for the operating system is a good rule of thumb for servers with up to 32 GB of memory, and around 4 GB for servers with 64 GB.  Additionally, when multiple instances of SQL Server are running on the one machine, “Max Server Memory” is necessary to prevent one instance from using too much memory and starving the other.  If “Max Server Memory” has not been set, the first instance of SQL Server 2000 to start will allocate everything bar 128 MB and leave no memory for the second instance.
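
For example, on a 12 GB server you might cap SQL Server at roughly 10 GB, leaving about 2 GB for the operating system (the exact figure is up to you):

EXEC sp_configure 'show advanced options', 1
RECONFIGURE
GO
EXEC sp_configure 'max server memory (MB)', 10240
RECONFIGURE
GO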

Jun 06 2008
 

I came across a new (well, new to me) term on the MSDN forums today: SARGable, coming from Search ARGuments.  Apparently this is DBA slang for how well a search predicate can be used against an index.  For example, a comparison against a constant string is SARGable, as it can easily be checked against an index, and LIKE ‘A%’ is SARGable, as it is easy to look up anything beginning with A in the index.  However, LIKE ‘%A’ is NOT SARGable, as you would need to scan the entire index to find any matches where the column ends in A.

Likewise, if a WHERE clause wraps the column in function calls – for example, converting a datetime column to varchar and taking a substring to return only the time portion – the optimiser won’t be able to match the expression to an index, so the predicate is not SARGable.
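
To illustrate with a hypothetical Orders table that has an index on OrderDate: a date-range predicate is SARGable, while wrapping the column in a function is not.

-- SARGable: the optimiser can seek on an index on OrderDate
SELECT OrderID
FROM Orders
WHERE OrderDate >= '2008-06-01' AND OrderDate < '2008-06-02'

-- Not SARGable: the function on the column forces a scan
SELECT OrderID
FROM Orders
WHERE CONVERT(varchar(10), OrderDate, 120) = '2008-06-01'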

Jun 05 2008
 

A common misconception is that triggers fire once per row, and there are a couple of examples in Books Online that don’t help.  For example, if you have a trigger on an update statement, and you call:

UPDATE Widgets SET [Status] = 1 WHERE Id IN (1, 3, 5, 10);

then you might expect to see the trigger fire once for every row that is being updated.

Not so.

Triggers fire once per statement (unless recursive triggers are enabled and your trigger updates a table that fires a trigger that updates the original table).  This means that, in the statement above, up to four rows will be modified (assuming that Id is the primary key) – possibly fewer, if some of the values don’t exist – yet the trigger fires only once.

When using triggers, two “magic” tables come into being, Inserted and Deleted. The Inserted table contains all rows that have been inserted into the table, and not surprisingly, the Deleted table holds those rows that have been deleted.  An UPDATE statement is considered to have deleted the old values, and inserted the new values, so a row will appear in both magic tables for an update statement.

In the UPDATE statement above, we can then expect to see one to four rows in both the inserted and deleted tables, depending on how many of the specified Id values exist (the trigger still fires even if no rows match – the magic tables are simply empty).  However, the following code is quite common in triggers, and is fundamentally flawed if more than one row is updated:

CREATE TRIGGER TriggerOnUpdateWidgets ON Widgets
FOR UPDATE AS
    -- This is an incorrect way of using a trigger - assumes only one row will be modified!
    DECLARE @WidgetId int, @Name varchar(25), @Status int
    SELECT @WidgetId = Id, @Name = [Name], @Status = [Status]
    FROM deleted
    
    INSERT INTO WidgetsHistory (WidgetId, [Name], [Status])
    VALUES (@WidgetId, @Name, @Status) 
GO

The aim of this trigger is to write a row into the WidgetsHistory table when a row in Widgets is updated.  It will work fine as long as the update only ever operates on one row at a time.  It will also appear to work if multiple rows are updated, because the scalar variables @WidgetId, @Name and @Status are silently filled from just one of the rows returned from Deleted – there is no check on how many rows came back, so the remaining rows never reach the history table.

The solution is to treat the trigger operation as a set, and insert all rows into the WidgetsHistory table:

CREATE TRIGGER TriggerOnUpdateWidgets ON Widgets
FOR UPDATE AS
    INSERT INTO WidgetsHistory (WidgetId, [Name], [Status])
    SELECT deleted.Id, deleted.[Name], deleted.[Status]
    FROM deleted
GO

The code is shorter, which is a side benefit.  If you need to do row-by-row processing, you might consider using a cursor inside your trigger, but I would recommend against this.  Triggers should be fast and lightweight, and the more complicated they are, the more likely they are to fail and roll back your transaction.  Additionally, triggers can be disabled, and so the logic provided by the trigger will not run.  Instead, you might consider having the trigger place a request on a Service Broker queue so the work runs asynchronously, or redesigning the application to handle this in a different manner.  Another option is to have the trigger check how many rows are in Deleted and roll back any transaction that updates more than one row at a time.
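
If you did want to enforce that one-row-at-a-time rule, a rough sketch (separate from the sample script below) might look like this:

CREATE TRIGGER TriggerOnUpdateWidgets_OneRowOnly ON Widgets
FOR UPDATE AS
    IF (SELECT COUNT(*) FROM deleted) > 1
    BEGIN
        RAISERROR('Widgets may only be updated one row at a time.', 16, 1)
        ROLLBACK
        RETURN
    END

    INSERT INTO WidgetsHistory (WidgetId, [Name], [Status])
    SELECT deleted.Id, deleted.[Name], deleted.[Status]
    FROM deleted
GO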

Sample code to illustrate set-based triggers:

CREATE DATABASE TriggerTestDB;
GO
USE TriggerTestDB;

-- Create a table of Widgets
CREATE TABLE Widgets (
      Id int not null identity primary key
    , [Name] varchar(25) not null
    , [Status] int not null)
GO

-- Create a table to record the historical values of the widgets
CREATE TABLE WidgetsHistory (
      HistoryDate datetime default CURRENT_TIMESTAMP
    , WidgetId int not null
    , [Name] varchar(25) not null
    , [Status] int not null
    , PRIMARY KEY (HistoryDate, WidgetId)
)
GO

-- Works fine
CREATE TRIGGER TriggerOnUpdateWidgets ON Widgets
FOR UPDATE AS
    INSERT INTO WidgetsHistory (WidgetId, [Name], [Status])
    SELECT deleted.Id, deleted.[Name], deleted.[Status]
    FROM deleted
GO

-- Populate the table with 50 widgets
SET NOCOUNT ON;
INSERT INTO Widgets ([Name], [Status]) VALUES ('WidgetName', 0);
GO 50

-- Update the status of four of the widgets
UPDATE Widgets SET [Status] = 1 WHERE Id IN (1, 3, 5, 10);
GO

-- Have a look in the tables
SELECT * FROM Widgets;
SELECT * FROM WidgetsHistory;
GO

-- Cleanup
USE [Master];
DROP DATABASE TriggerTestDB;
GO