Sometimes you just have to get your hands dirty
Lately, we’ve been experiencing increased search times on our image archive due to the inclusion of information on which collections a particular image is contained in (r1970). For each result in a results list three database queries are issued just to get the desired counts. Suppose _conditionN_
and _joinsN_
are one of the three pairs of conditions and joins to determine own, shared, or public collections, and _conditionR_
is the clause for restricting the query to just the given resource. We then have the following queries:
SELECT COUNT(*) FROM table _joins1_ WHERE _condition1_ AND _conditionR_
SELECT COUNT(*) FROM table _joins2_ WHERE _condition2_ AND _conditionR_
SELECT COUNT(*) FROM table _joins3_ WHERE _condition3_ AND _conditionR_
Multiply that by, say, 100 results per page and it will wipe any smile from your face. So the question becomes: How can we perform multiple COUNT
s in a single SQL query? There are quite a few discussions about this on the web (namely this or this), but the first attempt didn’t quite cut it:
SELECT
SUM(IF(_condition1_, 1, 0)) AS count1,
SUM(IF(_condition2_, 1, 0)) AS count2,
SUM(IF(_condition3_, 1, 0)) AS count3
FROM table
_joins1_
_joins2_
_joins3_
WHERE _conditionR_
The problem – as we soon learn – is that the JOIN
s introduce extra rows which might potentially skew the results. Crap… What’s a boy to do in such a situation? Exactly, think for himself! So this is what we’ve finally come up with:
SELECT
COUNT(DISTINCT IF(_condition1_, table.id, NULL)) AS count1,
COUNT(DISTINCT IF(_condition2_, table.id, NULL)) AS count2,
COUNT(DISTINCT IF(_condition3_, table.id, NULL)) AS count3
FROM table
_joins1_
_joins2_
_joins3_
WHERE _conditionR_
Hooray, now it works!
That was easy, you say? Well, then try to convince Rails, or, more specifically, ActiveRecord, to execute that query for you while still providing you with as much abstraction as possible ;-) It wasn’t too bad after all because, to be honest, we started messing with ActiveRecord’s internals all along (most of which is handled through our ApplicationModel all of our models inherit from). The resulting method is in r2092 which eventually dropped the rendering time down to a half. :-)
So, what do we learn from this? (Sure, we already knew it, but just for the sake of it…) – Sometimes you just have to get your hands dirty and handcraft your SQL, possibly bypassing any ORM layer or parts of it.
How could this be optimized even further? Any other ideas? Just leave a comment or get in touch with us directly!