Read more

PSA: Be super careful with complex `eager_load` or `includes` queries

Tobias Kraze
November 06, 2023Software engineer at makandra GmbH

TLDR

Using .includes or .eager_load with 1-n associations is dangerous. Always use .preload instead.

Consider the following ActiveRecord query:

BlogPost.eager_load(
  :comments
  :attachments,
).to_a
Illustration online protection

Rails Long Term Support

Rails LTS provides security patches for old versions of Ruby on Rails (2.3, 3.2, 4.2 and 5.2)

  • Prevents you from data breaches and liability risks
  • Upgrade at your own pace
  • Works with modern Rubies
Read more Show archive.org snapshot

(Let's assume we only have a couple of blog posts; if you use pagination the queries will be more complicated, but the point still stands.

Looks harmless enough? It is not.

The problem

ActiveRecord will rewrite this into a query using LEFT JOINs which looks something like this:

SELECT "blog_posts".*, "comments".*, "attachments".*
FROM "blog_posts"
LEFT OUTER JOIN "comments" ON "comments"."blog_post_id" = "blog_posts"."id"
LEFT OUTER JOIN "attachments" ON "attachments"."blog_post_id" = "blog_posts"."id";

Now assume there is a single blog post with 100 comments and 10 attachments. The OUTER JOINs will cause the query to return 1000 database rows, the full cross product of the blog post × all its comments × all its attachments, including 1000 redundant copies of all the fields of the blog post. If you add a third association this gets exponentially worse.

Needless to say, this is super inefficient. We've seen cases in production where such queries consume GBs of memory bringing down our servers.

The solution

Just use .preload instead. Rails will use separate queries to preload the data. This cannot lead to a cross product.

About .includes

If you use .includes instead of .eager_load you might or might not be okay. In this case ActiveRecord decides via some heuristic whether to use JOINs or separate queries. Do not risk it.

Sometimes you need a JOIN due to some WHERE condition on the joined data set. In this case, use .eager_load on the associations you need, and preload otherwise.

As long as all your associations are 1-1 you are also okay, in this case the JOIN might have slightly better performance.

Posted by Tobias Kraze to makandra dev (2023-11-06 12:42)