Just over 18 months ago I wrote a post about Integration Testing using NUnit, and at the end I stated that I would cover database integration testing in a separate post. Well here it is. Finally!
The first thing to clear up is what scope of testing we’re covering here because it seems that many teams test their database entirely through their Data Access Layer. Now, I’m not saying that’s wrong, but in “You Write Your SQL Unit Tests in SQL?” I describe the reasons why I believe that it’s not the only approach and that in fact there is great value in doing it separately. So, in this post I’m starting out with a fundamental assumption that you’ve already unit tested both your SQL and C# code and that what we’re doing here is building on that level of trust.
Testing what then?
If we're not testing the behaviour of our SQL or C# code then you may rightly ask: what exactly are we testing? Well, as the name Integration Test implies, we're focusing on the interaction between our C# and SQL code. What sits between these two worlds is the client-side database API (which itself is probably layered) and the database server - a lot of code. Not only that, but these two worlds have different ideas about how numbers and strings are represented and how errors are handled.
There are two common kinds of error that only come out at this level of testing (and above) - schema changes and data type mismatches. The former is essentially a breaking change to the database's public interface, such as the renaming or re-ordering of parameters and result set columns; the breakage can sometimes be quite subtle[*]. A draconian change policy would try to ensure this never happens, but that also closes the door on refactoring the database schema. The second revolves around data representation, with floating-point numbers being the prime example, as databases often have a far more flexible way of defining the scale and precision of non-integral numbers. Unexpected trailing whitespace caused by the use of a char(n) instead of a varchar(n) type can cause surprises too.
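To make that last point concrete, here's a contrived illustration (the values are entirely made up) of the kind of surprise a char(n) column springs when the padded value comes back out:-

    // Contrived example: a char(10) column pads its value with trailing
    // spaces, so the string read back is not the string that was written.
    string written  = "Bob";
    string readBack = "Bob       ";   // what a char(10) column hands back

    Assert.That(readBack, Is.Not.EqualTo(written));        // the surprise
    Assert.That(readBack.TrimEnd(), Is.EqualTo(written));  // the usual workaround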
These effects are made all the more apparent when you have separate SQL and C# developers because their timescales may differ, along with their code commits, and that can cause feature-level integration testing to be passed up in favour of going straight to system-level testing.
Schema only, no data
It is a prerequisite that before you can run your C# tests against your database you must ensure it is in a known, empty state. This means you cannot just restore a production database, unless you also empty it of data. The alternative is to build it from scratch before running the entire test suite. And that is what we do.
However, because this can take some time we optimise the process. The Continuous Integration build first builds the database and runs the set of SQL tests, then it builds the C# code and runs those tests too. Finally it re-uses the SQL unit test database to run the integration tests, and because this is one single build & testing sequence the SQL/C# source code is guaranteed to be consistent (assuming the check-ins are themselves atomic and consistent at feature-level).
For our day-to-day development we optimise the process even further by creating a one-off backup of the current production schema and then restoring that each time we want to run the integration tests (after having applied any new patches). This restore & patch approach takes just minutes compared to the length of time it takes to create the database from scratch.
The test framework
The tests themselves are written almost exactly as you would write any other in NUnit, but we have our own helpers, exposed via a base class, to work around some of the problems inherent in using a unit testing framework to do other sorts of testing. The biggest problem, for example, is that you have the large dependency that is The Database to somehow configure for use within the test, and for this we use an environment variable. I generally dislike any sort of configuration like this[+], but it's a necessary evil, particularly when you can't just assume that every developer (and the build machine) will be using a local copy of SQL Server Express. By default we all have a PERSONAL_DATABASE variable configured that we use day-to-day and which can be picked up even when running the tests through Visual Studio/ReSharper. If the need arises a batch file can always be used to redirect the tests to another instance with little fuss.
The skeleton of a test looks something like this:-
[TestFixture, Category("Database")]
public class CustomerDatabaseTests : DatabaseTestBase
{
    [Test]
    public void Add_ShouldInsertCustomer()
    {
        using (var connection = Connection)
        {
            . . .
        }
    }
}
As you can see we use a separate category for the database tests so that you only run them when you know your “personal database” is correctly built. The fixture derives from the DatabaseTestBase class as that is how all the helpers are exposed, such as the Connection property that reads the connection string stored in the PERSONAL_DATABASE variable and serves up a freshly opened connection for it.
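A minimal sketch of that base class, assuming a raw SqlConnection rather than the thin wrapper mentioned below (the body here is my own guesswork, not our actual code), would be something like:-

    using System;
    using System.Data.SqlClient;

    public abstract class DatabaseTestBase
    {
        // Serves up a freshly opened connection using the connection string
        // held in the PERSONAL_DATABASE environment variable.
        protected SqlConnection Connection
        {
            get
            {
                var connectionString = Environment.GetEnvironmentVariable("PERSONAL_DATABASE");

                if (String.IsNullOrEmpty(connectionString))
                    throw new InvalidOperationException("The PERSONAL_DATABASE variable is not configured");

                var connection = new SqlConnection(connectionString);
                connection.Open();
                return connection;
            }
        }
    }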
One design constraint this implies is that we use Parameterise From Above, rather than ugly Singletons, to get our connection passed to the code under test. This is a good design practice anyway, and we exploit it further by ensuring that we control the outer transaction that surrounds it. This is made possible because we also have our own thin wrapper around the underlying .Net database classes. Controlling the real transaction means we can undo all changes made in a test simply by rolling back the transaction no matter which way the test ends (e.g. by intercepting Commit()).
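In spirit the interception looks something like this (the class below is invented purely for illustration; our real wrapper is naturally more involved):-

    using System.Data;

    // Illustrative only: the transaction handed to the code under test quietly
    // ignores Commit() so that the test can always roll the real one back.
    public class TestScopedTransaction
    {
        private readonly IDbTransaction _real;

        public TestScopedTransaction(IDbConnection connection)
        {
            _real = connection.BeginTransaction();
        }

        public void Commit()
        {
            // Swallowed - the code under test believes its work is committed.
        }

        public void Rollback()
        {
            _real.Rollback();   // invoked by the test helpers at the end
        }
    }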
Sadly the need to manually scope the connection with a using() requires a level of discipline when writing the test as otherwise the effects can leak across tests. The alternative, which has yet to be implemented by us, is to use that old stalwart Execute Around Method so that we can add as many smarts as we need in the helper method to ensure the test is as isolated as possible. This would make the test look like this instead:-
[Test]
public void Add_ShouldInsertCustomer()
{
    Execute
    (
        (connection) =>
        {
            . . .
        }
    );
}
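The Execute() helper itself need only be a few lines; since we haven't actually written it yet the details below are speculative, but sitting on the base class it might start out as:-

    // Speculative sketch: the helper scopes the connection itself so a test
    // cannot forget the using() and leak state into the ones that follow.
    protected void Execute(Action<SqlConnection> testBody)
    {
        using (var connection = Connection)
        {
            testBody(connection);
        }
    }

This is also the natural place to layer in the other smarts, such as owning the outer transaction and rolling it back when the test body returns.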
Arranging
The first part of any test "dance" involves the arrangement of the dependent data and objects - which is hopefully minimal. With databases, though, it is common practice to use referential integrity to keep the data sane, and so you may have to add some static data to avoid falling foul of it; it's either that or drop the constraints, which are there exactly to help find bugs.
NUnit supports both fixture and test level SetUp() methods to help keep them readable by factoring out common code. As a general rule we have used the fixture level setup for static (or reference) data that is orthogonal to the test and then used the other method or the test itself for data that is directly relevant to it. You often find yourself inserting the same static data time and again and so you have a choice of whether to create a C# based helper class or create some helper stored procedures in a separate schema so they can be shared across both the SQL and C# worlds, e.g.
[TestFixtureSetUp]
public void FixtureSetUp()
{
    using (var connection = Connection)
    {
        connection.Execute("exec test.InsertCustomerAndDetails 1, 'Bob', 'London', . . .;");
    }
}
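The C# helper class route is much the same from the test's point of view; a hypothetical equivalent (the Customer columns here are invented) that just builds the statement for connection.Execute() might be:-

    // Hypothetical C# alternative to the shared stored procedure: the insert
    // statement is built in one place and handed to connection.Execute().
    public static class TestData
    {
        public static string InsertCustomerAndDetails(int id, string name, string city)
        {
            return String.Format(
                "INSERT INTO Customer(Id, Name, City) VALUES({0}, '{1}', '{2}');",
                id, name, city);
        }
    }

The obvious downside is that, unlike the stored procedure, it cannot be reused by the SQL-based tests.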
Acting
The second part of the dance - acting - was pretty much covered in the overview of the framework: you just need to get your test-controlled connection into the code under test.
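In practice that just means the code under test accepts the connection as a constructor or method argument (the CustomerDataMapper type below is invented for illustration):-

    // Hypothetical code under test: the connection arrives from above rather
    // than being conjured up internally via a Singleton.
    var dataMapper = new CustomerDataMapper(connection);
    dataMapper.InsertCustomer(new Customer(1, "Bob", "London"));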
Asserting
The final step is to verify the outcome by writing a bunch of asserts. If you’re just reading data then you can use the same style of setup shown above and then use the normal features of NUnit that you’d use to compare in-memory based sequences. But if you’re writing data then it’s a slightly more complicated affair and this is another area where a bunch of helpers are required. We still use the basic Assert mechanism, but invoke custom methods to query aspects of the data we expect to have written, e.g.
Assert.That(RowCount("Customer"), Is.EqualTo(1));
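RowCount() and its sibling QueryResult() (used further below) are only thin wrappers over a scalar query; a sketch, assuming the details rather than quoting our actual helpers, might be:-

    // Sketches of the assertion helpers: each runs a scalar query and returns
    // the single value it produces.
    protected int RowCount(string table)
    {
        return QueryResult("SELECT count(*) FROM " + table);
    }

    protected int QueryResult(string query)
    {
        using (var connection = Connection)
        using (var command = new SqlCommand(query, connection))
        {
            return (int)command.ExecuteScalar();
        }
    }

In reality they would need to reuse the test's own connection and transaction so that uncommitted changes are still visible.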
Depending on the system you’re building you may be able to round-trip the data and get two tests for the price of one, as long as any caching is disabled between the write/read calls:-
{
    var expected = new Customer(1, . . .);
    dataMapper.InsertCustomer(expected);

    var actual = dataMapper.FindCustomer(expected.Id);
    Assert.That(actual.Name, Is.EqualTo(expected.Name));
}
However, testing more than one thing at once is generally frowned upon, and rightly so, because you don't know whether the read or the write failed. But you can also end up duplicating a lot of SQL in your test code if you don't leverage your own API, and that creates a test maintenance burden instead. If you're purely writing data then you may have little choice but to write some form of query:-
{
    . . .

    string query = String.Format("SELECT count(*) FROM Customer WHERE Id={0} AND Name='{1}'",
                                 expected.Id, expected.Name);

    Assert.That(RowCount("Customer"), Is.EqualTo(1));
    Assert.That(QueryResult(query), Is.EqualTo(1));
}
The NUnit constraint model allows you to build your own more fluent style of interface so that you can avoid hand cranking SQL if you prefer:-
Assert.That(Table("Customer").HasRow.Where("Id").Equals(expected.Id)
                             .And("Name").Equals(expected.Name).Go());
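A bare-bones sketch of such a builder (entirely my own invention; a production version would plug into the constraint model properly) could be:-

    using System;
    using System.Collections.Generic;

    // Invented sketch of a fluent row-matching helper: it accumulates
    // column/value pairs and Go() runs the resulting count query.
    public class TableQuery
    {
        private readonly string _table;
        private readonly Func<string, int> _runQuery;
        private readonly List<string> _clauses = new List<string>();
        private string _column;

        public TableQuery(string table, Func<string, int> runQuery)
        {
            _table = table;
            _runQuery = runQuery;
        }

        public TableQuery HasRow { get { return this; } }

        public TableQuery Where(string column) { _column = column; return this; }
        public TableQuery And(string column)   { return Where(column); }

        public new TableQuery Equals(object value)
        {
            _clauses.Add(String.Format("{0}='{1}'", _column, value));
            return this;
        }

        // Runs the accumulated query and reports whether a matching row exists.
        public bool Go()
        {
            var sql = String.Format("SELECT count(*) FROM {0} WHERE {1}",
                                    _table, String.Join(" AND ", _clauses.ToArray()));
            return _runQuery(sql) > 0;
        }
    }

The Table() factory method on the base class then just returns new TableQuery(table, QueryResult) so the builder has a way of running the query it constructs.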
One of the benefits of reusing the SQL unit test database is that you'll be DBO, and as such you can exploit your god-like status to get at the internals and write this sort of test, even if you've carefully constructed a tight interface to the database. It may feel painful writing out such long-winded queries, but if you're trying to ensure you maintain high fidelity of your data through all the layers, is there any alternative?
[*] The perfect example of a subtle breaking change comes with SQL Server and OLE-DB. In SQL Server any procedure parameter can have a default value, but with OLE-DB defaulted parameters have[**] to be at the end of the argument list. A SQL developer could quite rightly add a new parameter anywhere in the argument list and provide a default value to ensure the interface appears to remain unchanged, but if it's not at the end an OLE-DB based query would then fail.
[**] This may have been sorted now, but it was definitely still the case a few years ago.
[+] A new developer should be able to just install the version control client software, pull the source code (and preferably tools) locally and be able to build the software and run the tests out-of-the-box.