Sometimes you just have to get your hands dirty
Lately, we’ve been experiencing increased search times on our image archive due to the inclusion of information on which collections a particular image is contained in (r1970). For each result in a results list three database queries are issued just to get the desired counts. Suppose
_joinsN_ are one of the three pairs of conditions and joins to determine own, shared, or public collections, and
_conditionR_ is the clause for restricting the query to just the given resource. We then have the following queries:
SELECT COUNT(*) FROM table _joins1_ WHERE _condition1_ AND _conditionR_ SELECT COUNT(*) FROM table _joins2_ WHERE _condition2_ AND _conditionR_ SELECT COUNT(*) FROM table _joins3_ WHERE _condition3_ AND _conditionR_
Multiply that by, say, 100 results per page and it will wipe any smile from your face. So the question becomes: How can we perform multiple
COUNTs in a single SQL query? There are quite a few discussions about this on the web (namely this or this), but the first attempt didn’t quite cut it:
SELECT SUM(IF(_condition1_, 1, 0)) AS count1, SUM(IF(_condition2_, 1, 0)) AS count2, SUM(IF(_condition3_, 1, 0)) AS count3 FROM table _joins1_ _joins2_ _joins3_ WHERE _conditionR_
The problem – as we soon learn – is that the
JOINs introduce extra rows which might potentially skew the results. Crap… What’s a boy to do in such a situation? Exactly, think for himself! So this is what we’ve finally come up with:
SELECT COUNT(DISTINCT IF(_condition1_, table.id, NULL)) AS count1, COUNT(DISTINCT IF(_condition2_, table.id, NULL)) AS count2, COUNT(DISTINCT IF(_condition3_, table.id, NULL)) AS count3 FROM table _joins1_ _joins2_ _joins3_ WHERE _conditionR_
Hooray, now it works!
That was easy, you say? Well, then try to convince Rails, or, more specifically, ActiveRecord, to execute that query for you while still providing you with as much abstraction as possible ;-) It wasn’t too bad after all because, to be honest, we started messing with ActiveRecord’s internals all along (most of which is handled through our ApplicationModel all of our models inherit from). The resulting method is in r2092 which eventually dropped the rendering time down to a half. :-)
So, what do we learn from this? (Sure, we already knew it, but just for the sake of it…) – Sometimes you just have to get your hands dirty and handcraft your SQL, possibly bypassing any ORM layer or parts of it.
How could this be optimized even further? Any other ideas? Just leave a comment or get in touch with us directly!