Namespaced SQL Columns in Queries

One of the things I really like about the C# language is the wide range of database ORM and data mapper packages available. In a nutshell an ORM can help remove the need to write SQL queries and stored procedures in your database, instead you tell it what the “model” of your data is (i.e. the schema of the tables and how they relate to each other) and the ORM constructs SQL queries to fetch, update and add data. Data mappers do much the same thing.

A simple example would probably be helpful. This is psuedo-code, it’s not from a real ORM:

// create the connection to the database which we have modelled
MyDatabaseModel myDb = new MyDatabaseModel("localhost", "username", "password");

// get all records from the MyTableType table and cast them to MyTableType objects
IEnumerable allTheThings = myDb.MyTableType.Get();

// create a new MyTableType object
MyTableType newThing = new MyTableType {
Column1 = "Hello";
Column2 = "World";
};

// insert it into the database
myDb.MyTableType.Insert(newThing);

You can imagine this saves a LOT of time and effort, particularly for applications with a simple database schema. I’ve used several ORMs including Entity Framework (big, powerful, complex) and Linq2SQL (small, quick, easy, only works with SQL Server). I do want to try some others, in particular Rob Conery’s massive.

In PHP there are ORM packages available, too, although I admit I’ve not used any of them. Sometimes, however, you need to inject a bit of cleverness into your database access code to help speed up development, but don’t have time (or inclination) to rewrite the system around an ORM.

Repetitive SQL

This was the situation I found myself in the other night. I had a medium-sized application with lots of queries, and some tables that were being accessed a lot. I found I had very repetitive SQL strings like this:

select t1.col1, t1.col2, t2.col1, t2.col2
from table1 t1
inner join table2 t2 on t2.parentid = t1.id

Fortunately I’d put all the queries into two sets of files:

  • The classes for my types, each one of which had its own Get(), Save() and Delete() methods
  • Manager classes which handle loading of lists of object, or loading sets of related objects

The problem was I had used the same column names across many tables. This code:

select t1.col1, t1.col2, t2.col1, t2.col2

Gives a set of results that looks like this:

col1 col2 col1 col2
Table 1, Row 1, Col 1 Table 1, Row 2, Col 2 Table 2, Row 1, Col 1 Table 2, Row 2, Col 2

Which “col1” column belongs to table1, and which to table2? I was in a pickle, and no mistake. What I needed was a way to namespace the columns I returned.

Namespacing

Namespacing is simply a way to group related functionality together in a system. You can imagine a folder structure on a hard drive being like many namespaces:

/My Documents/
/My Documents/Photos/
/My Documents/Photos/Holidays/
/My Documents/Music/
/My Documents/Music/County-AND-Western/

You know where all the music of both types (country AND western) are because the folder structure clearly shows you the way.

In C# the different levels in a namespace are separated by a dot, for example:

MyApp.UI.Helpers.DateHelpers

I thought I could use this same naming convention for my queries. Also the SQL language itself uses dots to denote the boundary between a table and its columns:

select table1.*
from table1
where table1.id = 123

After a bit of experimentation I found that as long as the column names were put in quotes I could return pretty much anything I wanted. For example:

select t1.id as 'this.is.my.test'
from table1 t1

So I realised I could namespace all the columns by the table they belong to, like this:

select
t1.col1 as 'table1.col1', t1.col2 as 'table1.col2',
t2.col1 as 'table2.col1', t2.col2 as 'table2.col2'
from table1 t1
inner join table2 t2 on t2.parentid = t1.id

Now the result set I get back looks like this:

table1.col1 table1.col2 table2.col1 table2.col2
Table 1, Row 1, Col 1 Table 1, Row 2, Col 2 Table 2, Row 1, Col 1 Table 2, Row 2, Col 2

Much better! I can query the columns for each table safely.

Making namespaced column queries easy to use

Now I’d proved the concept I did some work to make it easier to use these namespaced columns. Firstly I added a new method to each of my classes which returned a string which I could use in queries.

As this application is a plugin for WordPress I am using the global $wpdb object to do my database access. Here is PHP for a small, simple class:

class MyType {

// the properties of this class
var $id;
var $name;

// returns a string for all the columns in this class
function ColumnQuery( $prefix ) {
$s = "%1.id as '%2.id', %1.name as '%2.name'";
$s = str_replace( "%1", $prefix, $s );
$s = str_replace( "%2", "MyType", $s );
return $s;
}

// loads the row with the given ID
function Load( $id ) {
global $wpdb;
$sql = $wpdb->prepare( "select
" . MyType::ColumnQuery( "t1" ) . "
from {$wpdb->prefix}MyType t1
where b.id = %d;",
$id );
$row = $wpdb->get_row( $sql, ARRAY_A );
$this->MapRow( $row );
return $this;
}

// maps a database row to the current object
function MapRow( $row ) {
$this->id = $row["MyType.id"];
$this->name = $row["MyType.name"];
}
}

// create a new instance of a MyType object
$myType = new MyType();

// populate the details of the $myType variable with the details from row 123
$myType.Load(123);

One really important think to note is that I use the ARRAY_A parameter in the $wpdb->get_row() method. This means that the columns are returned as an associative array, which is the only way I could access the namespaced columns because of the dots in them.

Something has just been dropped in my brain…

Actually, and this has only just struck me, I could probably use a different character than a dot and use the default column methods. Here’s quick psuedo-code using underscores instead of dots:

// returns a string for all the columns in this class
function ColumnQuery( $prefix ) {
$s = "%1.id as '%2_id', %1.name as '%2_name'";
$s = str_replace( "%1", $prefix, $s );
$s = str_replace( "%2", "MyType", $s );
return $s;
}

// maps a database row to the current object
function MapRow( $row ) {
$this->id = $row->MyType_id;
$this->name = $row->MyType_name;
}

Hmm, I might have to go back and do that as it’s a bit cleaner…

Table variables

The $prefix parameter in the ColumnQuery method means I can set what the variable for the table is, for example:

$sql = "select
" . MyType::ColumnQuery( "t1" ) . "
from {$wpdb->prefix}MyType t1";

In this case the table variable is “t1”. The resulting SQL string will be:

select t1.id as 'MyType.id', t1.name as 'MyType.name'
from MyType t1

Related tables

That’s great for single tables, but the real power comes when you use this structure for multiple related tables.

Imagine we have two related tables like this:

  1. Child
  2. Parent

These tables are related using a foreign key; a parent_id column in the Child table which stored the id of the Parent.

Whenever we load a Child object we also want to know the details of the parent. This is the kind of thing that an ORM system eats for breakfast. Using our new namespaced column query it’s also pretty easy. Here are the classes for the Child and Parent type:

class Child {

// the properties of this class
var $id;
var $name;
var $parent_id;
var $parent;

// returns a string for all the columns in this class
function ColumnQuery( $prefix ) {
$s = "%1.id as '%2.id', %1.name as '%2.name', %1.parent_id as '%2.parent_id'";
$s = str_replace( "%1", $prefix, $s );
$s = str_replace( "%2", "Child", $s );
return $s;
}

// maps a database row to the current object
function MapRow( $row ) {
$this->id = $row["Child.id"];
$this->name = $row["Child.name"];
$this->parent_id = $row["Child.parent_id"];

// if we have columns from the parent table then populate the parent object
if ( $row["Parent.id"] != "" ) {
$this->Parent = new Parent();
$this->Parent->MapRow( $row );
}
}
}

class Child {

// the properties of this class
var $id;
var $name;

// returns a string for all the columns in this class
function ColumnQuery( $prefix ) {
$s = "%1.id as '%2.id', %1.name as '%2.name'";
$s = str_replace( "%1", $prefix, $s );
$s = str_replace( "%2", "Parent", $s );
return $s;
}

// maps a database row to the current object
function MapRow( $row ) {
$this->id = $row["Parent.id"];
$this->name = $row["Parent.name"];
}
}

This is the magic bit:

// if we have columns from the parent table then populate the parent object
if ( $row["Parent.id"] != "" ) {
$this->Parent = new Parent();
$this->Parent->MapRow( $row );
}

If the system detects that we have columns from the parent table then we can fully populate the parent property. Here’s the Load() method for the Child class that would load and populate a Child instance with the details of its Parent:

class Child {
function Load( $id ) {
global $wpdb;
$sql = $wpdb->prepare( "select
" . Child::ColumnQuery( "c" ) . ",
" . Parent::ColumnQuery( "p" ) . "
from {$wpdb->prefix}Child c
inner join {$wpdb->prefix}Parent p on p.id = c.parent_id
where c.id = %d;",
$id );
$row = $wpdb->get_row( $sql, ARRAY_A );
$this->MapRow( $row );
return $this;
}

So we would load a child object like this:

$child = new Child();
$child->Load( 123 );

And we can access details of the child and it’s parent like this:

print "ID: " . $child->id . "
";
print "Name: " . $child->name . "
";
print "Parent Name: " . $child->parent->name;

The same approach can be used when loading multiple rows. For example here’s some SQL to load all children:

$sql = $wpdb->prepare( "select
" . Child::ColumnQuery( "c" ) . ",
" . Parent::ColumnQuery( "p" ) . "
from {$wpdb->prefix}Child c
inner join {$wpdb->prefix}Parent p on p.id = c.parent_id;";

Wrapping up

This technique helped me out of a little hole this week, hopefully it can help you as well.