One of the problems with updating the schema of a database table that holds a significant amount of data is that it can take ages, and whilst the data is being fixed up your database is in a half-baked state. A faster approach is to make a simpler metadata-only change up-front, plug in the missing data in slow time, and then complete the metadata change afterwards to close the loop. Using nullable columns makes the process more manageable.
Say, for example, I have a large customer table and I want to add a new non-nullable column with a default value. If I add that column as-is the database is going to have to touch every page to set the default value on all existing rows (some newer versions and editions of SQL Server can avoid this for constant defaults, but it isn’t something you can count on everywhere). However, if I add a nullable column then SQL Server, at least, only has to update the table’s metadata; it touches no data pages.
ALTER TABLE Customer ADD NewColumn INT NULL;
If the public interface to your database is raw access to the database tables then you’re pretty well hosed at this point unless you fix the queries that touch the tables. But if you’ve built yourself a facade using views, functions, etc. to abstract the underlying data model you can use ISNULL() to fill in the default values on-the-fly when the client queries old data:
SELECT c.Name,
       ISNULL(c.NewColumn, @defaultValue) AS NewColumn
FROM Customer c;
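If the facade is a view, note that a view cannot reference a variable, so the default has to be a literal (or come from a function). A minimal sketch, assuming a hypothetical view name `dbo.CustomerFacade`, the `dbo` schema, and a default value of 0, none of which come from the original schema:

```sql
-- Hypothetical view name; 0 stands in for whatever the real default is.
CREATE VIEW dbo.CustomerFacade
AS
SELECT c.Name,
       -- Old rows still hold NULL, so substitute the default on read.
       ISNULL(c.NewColumn, 0) AS NewColumn
FROM dbo.Customer c;
```

Clients that query the view never see a NULL, whether or not the underlying row has been fixed up yet.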
When writing data your facade can manually apply the not null constraint and fill in the default values when not provided by the client. This may not be quite as performant as letting the database handle it natively but, bulk inserts aside, this should be more than adequate for most scenarios.
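As a rough illustration of the write side, the facade might be a stored procedure along these lines. The procedure name, parameter names, column types and the default value of 0 are all assumptions made for the sketch:

```sql
-- Hypothetical insert procedure; names, types and the default are illustrative.
CREATE PROCEDURE dbo.Customer_Insert
    @name      NVARCHAR(100),
    @newColumn INT = NULL      -- optional, so existing callers keep working
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @defaultValue INT = 0;

    -- Apply the default by hand, since the table column is still nullable
    -- and has no DEFAULT constraint of its own yet.
    INSERT INTO dbo.Customer (Name, NewColumn)
    VALUES (@name, ISNULL(@newColumn, @defaultValue));
END;
```

Because new rows always get a value this way, the number of NULLs left to fix can only go down.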
With the facade doing its job and encapsulating the data you’re free to update the old data in slow time. SQL Server (and probably other RDBMSs too) supports using the TOP clause with the UPDATE statement to allow you to fix your data in small batches, which helps keep the transaction time down. It also means you can avoid blowing the transaction log if it’s of a fixed size.
UPDATE TOP (1000) Customer
SET NewColumn = @defaultValue
WHERE NewColumn IS NULL;
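To work through the whole table the batch just needs repeating until no rows are left to fix. A minimal sketch of such a loop, where the batch size, the one second pause and the default value of 0 are arbitrary assumptions to tune for your own system:

```sql
DECLARE @defaultValue INT = 0;     -- assumed default
DECLARE @batchSize    INT = 1000;  -- assumed batch size
DECLARE @rows         INT = 1;

WHILE @rows > 0
BEGIN
    -- Each UPDATE is its own short (auto-commit) transaction.
    UPDATE TOP (@batchSize) Customer
    SET NewColumn = @defaultValue
    WHERE NewColumn IS NULL;

    -- Capture the count immediately; later statements would reset it.
    SET @rows = @@ROWCOUNT;

    -- Give other workloads (and log backups) some breathing space.
    WAITFOR DELAY '00:00:01';
END;
```

The loop can be stopped and restarted at any point because the WHERE clause only ever picks up rows that still need fixing.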
Theoretically speaking the nullability of the column is correctly handled, at least from the client’s perspective[1], so you don’t need to actually alter the underlying table again. But if you do want/need to enforce the constraint more naturally you’re going to have to bite the bullet. SQL Server supports the WITH NOCHECK option, which lets you add a CHECK constraint (rather than change the column’s nullability itself) without validating the existing rows. That keeps it a metadata-only change, but it comes with its own drawbacks, such as the constraint being marked as untrusted, and so you may just need to accept a final hit. However, at least you can split the whole task up into smaller chunks and execute them as capacity permits rather than panicking over how you’re going to squeeze the entire change into a smaller window.
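The two ways of closing the loop might look something like the following. The constraint names and the default value of 0 are made up for the example, and you would pick one of the two options rather than run both:

```sql
-- Option 1: bite the bullet. SQL Server has to verify that no NULLs
-- remain, so this is the "final hit" on a large table.
ALTER TABLE Customer ALTER COLUMN NewColumn INT NOT NULL;

-- Option 2: leave the column nullable and add a CHECK constraint without
-- validating existing rows. New writes are checked, but the constraint
-- is flagged as not trusted.
ALTER TABLE Customer WITH NOCHECK
    ADD CONSTRAINT CK_Customer_NewColumn_NotNull CHECK (NewColumn IS NOT NULL);

-- Either way, a DEFAULT constraint lets the database supply the value
-- natively for future inserts.
ALTER TABLE Customer
    ADD CONSTRAINT DF_Customer_NewColumn DEFAULT (0) FOR NewColumn;
```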
[1] When I talk about Database Development Using TDD, this is the kind of requirement I have in mind as it’s perfect unit test fodder.