Friday 22 April 2011

Database Enumerations – A Non-Performant Query

Back in September last year I wrote about how we have implemented constants and enumerations in our SQL Server database (Implementing Constants & Enumerations in a Database). Essentially we have used parameterless User Defined Functions (UDFs) to act as the symbol for the constant or enumeration. At the time I said that I was not aware of any performance impact that this had, but that I would write it up when I found one. Well, I wrote a support query the other day and noticed that there was quite a difference between the query with and without the UDF. It looked something like this:-

SELECT *
FROM   Requests r
JOIN   Tasks t
ON     r.RequestId = t.RequestId
AND    t.TaskType = dbo.TaskType_Calculation()
WHERE  r.ValueDate = . . .

The two tables involved in the join have millions of rows in them and although there is an index covering part of the query the selectivity is quite low in this case. I didn’t look at the execution plans because this is a template query that I use regularly and the only difference was the explicit filter on TaskType using the UDF. The query took 13 seconds to execute.

Given how often I run very similar queries to this I suspected this was the evidence I was looking for to show how a UDF could affect query performance, and so I did the obvious thing which was to substitute the call to the UDF with the relevant constant:-

AND    t.TaskType = 2 -- dbo.TaskType_Calculation()

Lo and behold the query ran in just 2 seconds. I switched back and forth between the two forms to see if the performance was consistent and it was – 2s vs 13s. Somewhat perturbed now that our lovely idea may be an accident waiting to happen I checked the performance using an intermediate variable, which is effectively how we have used this idiom in our code (because you can’t always reference the UDF directly and it’s a bit verbose):-

DECLARE @calculationTask udt.TaskType_t
SET     @calculationTask =
dbo.TaskType_Calculation()

SELECT *
FROM   Requests r
JOIN   Tasks t
ON     r.RequestId = t.RequestId
AND    t.TaskType = @calculationTask
WHERE  r.ValueDate = . . .

Tentatively I pressed CTRL+E in SSMS to execute the query… and… it returned in just 2 seconds. Just to be sure I then alternated between all 3 forms but the intermediate variable version still performed exactly the same as the hard-coded constant. Phew! This is clearly not an exhaustive test but it does give me some continuing comfort that it is a sound idea but that we now know for sure that it can have a performance impact in some scenarios.

P.S. This was on SQL Server 2008 R1.

No comments:

Post a Comment