Sometimes you just have to get your hands dirty

Lately, we’ve been experiencing increased search times on our image archive due to the inclusion of information on which collections a particular image is contained in (r1970). For each result in a results list three database queries are issued just to get the desired counts. Suppose _conditionN_ and _joinsN_ are one of the three pairs of conditions and joins to determine own, shared, or public collections, and _conditionR_ is the clause for restricting the query to just the given resource. We then have the following queries:

SELECT COUNT(*) FROM table _joins1_ WHERE _condition1_ AND _conditionR_

SELECT COUNT(*) FROM table _joins2_ WHERE _condition2_ AND _conditionR_

SELECT COUNT(*) FROM table _joins3_ WHERE _condition3_ AND _conditionR_

Multiply that by, say, 100 results per page and it will wipe any smile from your face. So the question becomes: How can we perform multiple COUNTs in a single SQL query? There are quite a few discussions about this on the web (namely this or this), but the first attempt didn’t quite cut it:

SELECT
  SUM(IF(_condition1_, 1, 0)) AS count1,
  SUM(IF(_condition2_, 1, 0)) AS count2,
  SUM(IF(_condition3_, 1, 0)) AS count3

FROM table
   _joins1_
   _joins2_
   _joins3_

WHERE  _conditionR_

The problem – as we soon learn – is that the JOINs introduce extra rows which might potentially skew the results. Crap… What’s a boy to do in such a situation? Exactly, think for himself! So this is what we’ve finally come up with:

SELECT
  COUNT(DISTINCT IF(_condition1_, table.id, NULL)) AS count1,
  COUNT(DISTINCT IF(_condition2_, table.id, NULL)) AS count2,
  COUNT(DISTINCT IF(_condition3_, table.id, NULL)) AS count3

FROM table
   _joins1_
   _joins2_
   _joins3_

WHERE  _conditionR_

Hooray, now it works!

That was easy, you say? Well, then try to convince Rails, or, more specifically, ActiveRecord, to execute that query for you while still providing you with as much abstraction as possible ;-) It wasn’t too bad after all because, to be honest, we started messing with ActiveRecord’s internals all along (most of which is handled through our ApplicationModel all of our models inherit from). The resulting method is in r2092 which eventually dropped the rendering time down to a half. :-)

So, what do we learn from this? (Sure, we already knew it, but just for the sake of it…) – Sometimes you just have to get your hands dirty and handcraft your SQL, possibly bypassing any ORM layer or parts of it.

How could this be optimized even further? Any other ideas? Just leave a comment or get in touch with us directly!