SQL – Inner joins and limit…

I’ve been hard at work at the rockface of the new Wiblog system recently, and it seems that the more I get done the more there is to do. I met a bit of a problem tonight regarding tags (those lovely bits of folksonomy) which I think I’ve fixed. This one is quite hard to explain, and is only suitable for people who would wear a “geek” badge with pride, but I’ll do my best.

I have a MySQL database containing table with blog entries in it. That table is called ‘entries’. And I have another table with tags in it called ‘tags’. Each entry in the entries table has a unique ID, and each tag in the tag table is linked to the entry it belongs to using a field called ‘entryid’. So far so good.

However, I want to paginate (that is, split into multiple pages) my blog entries, showing 10 on a page then a link to the next page of 10. That’s great as long as there are zero or one tags linked to each entry. The code would look something like this:

select e.id, e.title, e.body, t.tag
from entries e
left outer join tags t on t.entryid = e.id
order by e.id desc
limit 0, 10

Those of you with some SQL knowledge will see my problem. When any particular entry has more than one tag linked to it, the ‘limit’ clause will not work correctly. Rather than bring back the top 10 entries, it will bring back the top 9, with one of thm (the one with two tags) twice. Once for each tag. Let me illustrate:

Output for the above code with one tag per entry:

id title body tag
100 100 title 100 body 100 tag
99 99 title 99 body 99 tag
98 98 title 98 body 98 tag
97 97 title 97 body 97 tag
96 96 title 96 body 96 tag
95 95 title 95 body 95 tag
94 94 title 94 body 94 tag
93 93 title 93 body 93 tag
92 92 title 92 body 92 tag
91 91 title 91 body 91 tag

Output for the above code where entry ID 100 has two tags:

id title body tag
100 100 title 100 body 100 tag
100 100 title 100 body 100 another tag
99 99 title 99 body 99 tag
98 98 title 98 body 98 tag
97 97 title 97 body 97 tag
96 96 title 96 body 96 tag
95 95 title 95 body 95 tag
94 94 title 94 body 94 tag
93 93 title 93 body 93 tag
92 92 title 92 body 92 tag

I’ve made the second tag bold to make it obvious. So, where has ID 91 gone? Dropped off the end because of the second tag linked to ID 10, that’s where. No good, in fact it’s downright bad.

I’ve had an idea for how to fix this for a while, but tonight has been my first chance to get it working. I’m sure this isn’t unique, but I had a quick search and couldn’t find anything similar. So, here we go.

The important thing we need to know is what the highest and lowest IDs of entries are that we want to bring back. Once we know that it doesn’t matter how many tags we have linked to any entries, as we always know our list will be bound by the upper and lower limits.

Here’s the code, with (hopefully) clear instructions:

# select the fields we want
select e.id, e.title, e.body, t.tag
# fom the entries table
from entries e
# join the tags table
left outer join tags t on t.entryid = e.id
# here's where it gets interesting
# firstly, get the minimum entry ID and check we only get ID equal or greater
where (e.id>=(select min(l.id) from (select id
from entries
# make sure we order correctly
order by id desc
# set our limit (used for pagination)
limit 0, 10) as l))
# then get the maximum entry ID and check we only get ID equal or less
and (e.id
from entries
# make sure we order correctly
order by id desc
# set our limit (used for pagination)
limit 0, 10) as l))
# make sure we order correctly
order by e.id desc

So, the above code should give you:

id title body tag
100 100 title 100 body 100 tag
100 100 title 100 body 100 another tag
99 99 title 99 body 99 tag
98 98 title 98 body 98 tag
97 97 title 97 body 97 tag
96 96 title 96 body 96 tag
95 95 title 95 body 95 tag
94 94 title 94 body 94 tag
93 93 title 93 body 93 tag
92 92 title 92 body 92 tag
91 91 title 91 body 91 tag

ID 91 is back! And while we actually have 11 rows returned by the query, it’s pretty easy to parse it using PHP to make sure we don’t show the same entry details twice. The only thing to watch out for is that you get the ordering and limits the same in both the min() and max() statements. I hope that’s useful to you.