The majority of work I do, as a programmer who wears a shirt and tie, involves databases. And I try to keep things nice and simple when working with databases. What I mean by simple is, extra tables in a database is fine, even triple the “minimum number” (whatever that is), if they all mostly follow some pattern. Constructing a database schema is then a simple application of the rules devised when designing the pattern, and means there’s not much to think about or have go wrong. With a judicious choice of pattern, ORM tools like Hibernate or other things like .NET Entity Framework can start working for you rather than against you. It’s even possible to write code to check over the schema and ensure that the pattern is being followed, and then write code to write code to do all the data access (yes, write code to write code…)
My boss, on the other hand, has a slightly different idea of simplicity. Namely, in his view, simplicity = minimise number of tables, and rely on stored procedures for everything. The arguments I want to win with him are not really what I want to blog about today.
Today I’d like to describe how I implemented strongly-typed lists of objects (think
List<SomeType> in C# /
list<AClass> in C++) adhering to my boss’s vision of minimising table count, using only three tables and a pile of T-SQL and custom constraints, instead of the “more complex” system of lots of join tables and foreign key constraints. I do not particularly recommend this mode of implementation. For a start, it has made use of .NET Entity Framework, shall we say, “interesting.”
So consider directly representing classes, such as a Product class, each with a table in the schema, such as tProduct. Suppose the Product class has properties with the types
List<Retailers>, representing the extras one can buy with a product, and the retailers retailing a product. (Suppose we also model
If we were implementing this with regular old join tables, there would be no columns in
tProduct to represent the lists, rather, there would be some table
tProductExtras, pointing at both
tExtra with FK constraints (call the columns
ExtraID), and a
tProductRetailers, pointing at both
tRetailer in a similar fashion.
But now my boss comes along and says, “Oh, that’s too many tables. Your join tables really have the same structure, two columns of type
INT, so they could easily be merged.” And he’s right on the second point, but with one caveat. If we had one join table for the whole database, call it
tJoin, that simply had e.g.
ItemID, we cannot use foreign key constraints on either of them, since
ParentID points to any of the tables that represent objects possessing a
List of something, and
ItemID points to any of the tables representing objects being part of a
List. Which means there’s no database-level way of enforcing data integrity in
tJoin, right? Lists are now so untyped it’s not funny.
Well, there is a way. (My sincere apologies.) Enter
tClass. For simplicity, let
tClass just be a table with a two columns,
ID as usual, and
Name, and containing with names of classes, each one with a table in the database. So going back to the example above,
(2, "Extra"), and
tList as follows: give it an
ID column, and a column which is a FK to
tClass, call it
ClassID is going to give us the type of object being contained in the list—the “T” in “List
tJoin is now going to have the columns
ListID (FK to
ItemID. (It needs to have something else if the
List is going to be a proper list and specify the order in which items appear, but if this bugs you, just imagine we have an
Order column in
tJoin that specifies what order things appear in the list.)
CREATE TABLE dbo.tClass ( ID int IDENTITY(0,1) NOT NULL, Name varchar(50) NOT NULL, CONSTRAINT PK_tClass PRIMARY KEY CLUSTERED (ID ASC)) CREATE TABLE dbo.tList ( ID int IDENTITY(0,1) NOT NULL, ClassID int NOT NULL, CONSTRAINT PK_tList PRIMARY KEY CLUSTERED (ID ASC)) GO ALTER TABLE dbo.tList WITH CHECK ADD CONSTRAINT [FK_tList_tClass] FOREIGN KEY(ClassID) REFERENCES dbo.tClass (ID) GO CREATE TABLE dbo.tJoin ( ListID int NOT NULL, ItemID int NOT NULL, ) GO ALTER TABLE dbo.tJoin WITH CHECK ADD CONSTRAINT [FK_tJoin_tList] FOREIGN KEY(ListID) REFERENCES dbo.tList (ID) GO
Now go back to
tProduct. It is now going to have two FK columns to
tList, call them
RetailersListID. These will adequately model the
Product having attributes of type
List<Anything>. But really we also want to be sure that the
tProduct columns point to the right kind of lists, i.e. that
ExtrasListID refers to a
tList referring to
tClass with name ‘Extra’, and similarly for
RetailersListID. Here comes the first custom constraint.
In SQL Server, one cannot have arbitrary code in a constraint—not even a
SELECT. Only fairly simple things are allowed, such as
BETWEEN. How is it possible, then, to enforce this kind of policy? It turns out that scalar user-defined functions (UDFs) are allowed. So we create a UDF like so, to perform a
SELECT for us:
CREATE FUNCTION [dbo].[fListIsOfType] ( @ListID int, @ClassName varchar(50) ) RETURNS int AS BEGIN DECLARE @Result int SELECT @Result = COUNT(*) FROM tList INNER JOIN tClass ON tList.ClassID = tClass.ID WHERE tList.ID = @ListID AND tClass.Name = @ClassName RETURN @Result END
Now we have the pleasure of creating a custom checked constraint on each list column like so:
ALTER TABLE [dbo].[tProduct] WITH CHECK ADD CONSTRAINT [CK_tProduct_Extras] CHECK (([dbo].[fListIsOfType]([ExtrasListID],'Extra')>(0))) GO ALTER TABLE [dbo].[tProduct] CHECK CONSTRAINT [CK_tProduct_Extras] GO ALTER TABLE [dbo].[tProduct] WITH CHECK ADD CONSTRAINT [CK_tProduct_Retailers] CHECK (([dbo].[fListIsOfType]([RetailersListID],'Retailer')>(0))) GO ALTER TABLE [dbo].[tProduct] CHECK CONSTRAINT [CK_tProduct_Retailers] GO
Great. Now we can be pretty sure that
ExtrasListID is a
List<Extra>, and not a
List<SomethingElse>. We could stop here and rely on stored procedures to do the rest of the work—ensuring items are of the correct type when adding to lists and so on—but that’s no fun. Especially when *ahem* somebody invariably ignores the stored procedures and puts crap data in there manually.
The other side of
tJoin requires constraining. This is a bit harder. Here’s how I did it. First, I wrote another scalar UDF like so:
CREATE FUNCTION [dbo].[fItemBelongs] ( @ItemID int, @ListID int ) RETURNS int AS BEGIN DECLARE @Result int DECLARE @ClassName varchar(50) SELECT @ClassName = tClass.Name FROM tList INNER JOIN tClass ON tList.ClassID = tClass.ID WHERE tList.ID = @ListID SELECT @Result = CASE WHEN @ClassName = 'Extra' THEN (SELECT COUNT(*) FROM tExtra WHERE ID = @ItemID) WHEN @ClassName = 'Retailer' THEN (SELECT COUNT(*) FROM tRetailer WHERE ID = @ItemID) ELSE 0 END RETURN @Result END
Now I didn’t finish adding cases to the CASE expression, because I have a lot of tables. It’s boring.
The reason we can’t just write one metaquery, instead of a whole heap inside something like CASE, that finds the right table to do the lookup in is…because UDFs can’t execute arbitrary SQL. I’d love to just write
-- ... DECLARE @sql nvarchar(max), @params nvarchar(max) DECLARE @count int SET @sql = 'SELECT @count = COUNT(*) FROM t' + @ClassName + ' WHERE ID = @ItemID' SET @params = '@count int OUTPUT, @ItemID int' EXEC sp_executesql @sql, @params, @count OUTPUT, @ItemID RETURN @count -- ..
and put that into a UDF, but due to the limitations we can’t (and fair enough, if we could execute stuff like that in a UDF, forget about query optimisation).
I stopped adding cases, but I used the function as the basis for something else. I wrote a stored procedure that updates the
fItemBelongs function as follows.
CREATE PROCEDURE [dbo].[sUpdateItemBelongsTest] AS BEGIN SET NOCOUNT ON; DECLARE @sql nvarchar(max) SET @sql = 'ALTER FUNCTION fItemBelongs ( @ItemID int, @ListID int ) RETURNS int AS BEGIN DECLARE @Result int DECLARE @ClassName varchar(50) SELECT @ClassName = tClass.Name FROM tList INNER JOIN tClass ON tList.ClassID = tClass.ID WHERE tList.ID = @ListID SELECT @Result = CASE ' + REPLACE((SELECT ' WHEN @ClassName = ''' + ClassName + ''' THEN (SELECT COUNT(*) FROM t' + ClassName + ' WHERE ID = @ItemID) ' FROM tClass FOR XML PATH('')), '
', '') + ' ELSE 0 END RETURN @Result END' PRINT @sql EXEC sp_executesql @sql END
“What the hell is that about?” I hear you ask. Well. It builds the
ALTER FUNCTION statement using rows out of the
tClass table, prints the statement just to show me what is being executed, then executes it. To build the massive
CASE, it concatenates a pile of strings of the form
WHEN ClassName = 'Blah' (SELECT COUNT(*) FROM tBlah WHERE ID = @ItemID, using the
FOR XML PATH('') concatenation trick on the
Now using the function is as easy as
ALTER TABLE [dbo].[tJoin] WITH CHECK ADD CONSTRAINT [CK_tJoin_ItemsBelong] CHECK (([dbo].[fItemBelongs]([ItemID],[ListID])>(0))) GO ALTER TABLE [dbo].[tJoin] CHECK CONSTRAINT [CK_tJoin_ItemsBelong] GO
and to ensure you always have a current
fItemBelongs, consider adding the stored procedure to a trigger on changes to
Finally, I wanted to ensure that
tClass only referred to things that have tables—specifically, tables with a proper
ID column of the right type and so on. To do this, I wrote another UDF.
CREATE FUNCTION [dbo].[fClassHasTable] ( @ClassName varchar(50) ) RETURNS int AS BEGIN DECLARE @Result int SELECT @Result = COUNT(*) FROM sys.tables T INNER JOIN sys.columns C ON T.object_id = C.object_id WHERE T.name = 't' + @ClassName AND C.name = 'ID' AND C.user_type_id = 56 -- 56 = int, YMMV. AND C.is_nullable = 0 AND C.is_identity = 1 RETURN @Result END
And using it in another constraint was pretty straightforward.