Read more

PSA: Be super careful with complex `eager_load` or `includes` queries

Tobias Kraze
November 06, 2023Software engineer at makandra GmbH

TLDR

Using .includes or .eager_load with 1-n associations is dangerous. Always use .preload instead.

Consider the following ActiveRecord query:

BlogPost.eager_load(
  :comments
  :attachments,
).to_a
Illustration web development

Do you need DevOps-experts?

Your development team has a full backlog? No time for infrastructure architecture? Our DevOps team is ready to support you!

  • We build reliable cloud solutions with Infrastructure as code
  • We are experts in security, Linux and databases
  • We support your dev team to perform
Read more Show archive.org snapshot

(Let's assume we only have a couple of blog posts; if you use pagination the queries will be more complicated, but the point still stands.

Looks harmless enough? It is not.

The problem

ActiveRecord will rewrite this into a query using LEFT JOINs which looks something like this:

SELECT "blog_posts".*, "comments".*, "attachments".*
FROM "blog_posts"
LEFT OUTER JOIN "comments" ON "comments"."blog_post_id" = "blog_posts"."id"
LEFT OUTER JOIN "attachments" ON "attachments"."blog_post_id" = "blog_posts"."id";

Now assume there is a single blog post with 100 comments and 10 attachments. The OUTER JOINs will cause the query to return 1000 database rows, the full cross product of the blog post × all its comments × all its attachments, including 1000 redundant copies of all the fields of the blog post. If you add a third association this gets exponentially worse.

Needless to say, this is super inefficient. We've seen cases in production where such queries consume GBs of memory bringing down our servers.

The solution

Just use .preload instead. Rails will use separate queries to preload the data. This cannot lead to a cross product.

About .includes

If you use .includes instead of .eager_load you might or might not be okay. In this case ActiveRecord decides via some heuristic whether to use JOINs or separate queries. Do not risk it.

Sometimes you need a JOIN due to some WHERE condition on the joined data set. In this case, use .eager_load on the associations you need, and preload otherwise.

As long as all your associations are 1-1 you are also okay, in this case the JOIN might have slightly better performance.

Posted by Tobias Kraze to makandra dev (2023-11-06 12:42)