Posted about 7 years ago. Visible to the public.

ActiveRecord scopes [4d]

In this card we will learn to write code that scales with a large number of database records. We will do this by pushing work out of Ruby and into the database.

Goals

  • Understand what scopes ("relations") are and how they differ from arrays.

  • Understand how a purely object-oriented approach of processing data doesn't scale for a large number of database records.

  • Get into a habit of off-loading list-related calculations (filtering, mapping) to the database. Only fetch an ActiveRecord objects from the database when you actually plan to display or change it.

  • Understand that an expression like User.where(email: 'foo@bar.com') does not make an SQL query. It simply returns a scope object for further chaining with #where, #order, etc.

    Scopes will make an SQL query when you use them like an array, e.g. by iterating over them with #each. This may cause the scope to accidentally load its results prematurely. To prevent this, get into a habit of explicitely loading scopes by calling to_a when you're done filtering in the database.

  • Learn about the N+1 query problem Archive and eager loading.

  • Understand the difference between "named scopes" Archive and ad-hoc scopes using #where.

Note

We don't like to use default scopes Archive in our projects. It can be confusing when new scopes inherit conditions out of nowhere.

If an app does have a frequently used condition (like ignoring all archived posts), we define a named scope like Post.active and use that.

Learn

Resources

Tips

Preventing scopes from loading

A scope like User.where(email: 'foo@bar.com') does not make an SQL query. It simply returns a scope object for further chaining with #where, #order, etc.

It can be confusing that typing a scope into the Rails console will immediately load results from the database. This is due to the console automatically inspecting Archive the last statement, in an attempt to visualize the results.

To prevent this, add any other expression after the scope:

Copy
scope = Movie.all; nil

Call #to_sql on a scope to see the query it will produce. This does not actually run a query:

Copy
Card.where(title: 'Foo').to_sql # => SELECT * FROM cards WHERE title="Foo"

Advanced conditions

You can pass an array as a value in #where. ActiveRecord will make an IN() Archive query:

Copy
Card.where(id: [1,2,3]).to_sql # => SELECT * FROM cards WHERE id IN (1, 2, 3)

If you need an operator that is not equality (=) or inclusion (IN()) you cannot pass a hash to where. You need to pass an SQL fragment instead:

Copy
Card.where('id > ?', 5).to_sql # => SELECT * FROM cards WHERE id > 5

Chaining multiple #where will AND the conditions:

Copy
Card.where(id: [1,2,3]).where(title: 'Foo').to_sql # => SELECT * FROM cards WHERE id IN (1, 2, 3) AND title="Foo"

Preloading associations

When you preload asociations with #preload or #includes Rails will make one query for each table:

Copy
Card.where(title: 'Foo').preload(:user).to_a # SELECT * FROM cards WHERE title="Foo" # # | id | user_id | title # +----+---------+------------------- # | 2 | 4003 | Foo # | 4 | 4005 | Foo # # SELECT * FROM users WHERE id IN (4003, 4005)

Plucking columns

You can use #pluck to fetch a single column from a scope's result set. The result is returned as an array:

Copy
user_ids = Card.where(title: 'Foo').pluck(:user_id) # => [4003, 4005]

This is much faster than loading an array of Card records!

A common technique is to use an array of plucked foreign keys in another query:

Copy
User.where(id: user_ids).to_sql # => SELECT * FROM users WHERE id IN (4003, 4005)

This way, instead of one complicated query with many JOINs, you use multiple but simple queries. This can be easier to understand and may even perform better.

Exercises

Cards query (makandra only)

Ask someone for a dump of the staging target and load it into your local database Archive .

In the makandra cards app, load an array of all Users that are an admin in a deck whose title begins with makandra (e.g. makandra-talent).

Try to write as little SQL as possible. Wherever possible, define conditions using #where(hash). If you do need to use an SQL fragment, make it as short as possible.

It's OK to do to use more than one query (or one scope) for this exercises. However, your solution should always use the number of queries, regardless of the number of decks.

Search for MovieDB

In your MovieDB app, adapt the existing implementation Movie.search(query) to return a scope of movies matching the following query format:

  • The query is a list of space-separated words that all need to appear in the movie title in order for the movie to match (e. g. event horizon)
  • A word can be prefixed by a minus character to exclude that word (e.g. event -horizon).

If you haven't already considered this, make sure that the returned value is not an array but a scope that can be chained further. E.g. I should be able to use it like this:

Copy
Movie.search('event -main').search('horizon').where('year > 1995').order(:year).to_a

Adapt both the Cucumber scenario and RSpec examples for the search functionality, using the guidelines from the "Testing" chapter in Growing Rails Applications in Practice Archive .

  • RSpec examples should cover all edge cases
  • Cucumber scenario should only cover one "happy path" to show successful integration

Open budgets in Project Hero (makandra only)

In the Project Hero app, build a view that lists all projects that have open (unclosed) budgets with unused amounts:

Copy
+---------------+-----------------------------------+ | Project | Unused amount in all open budgets | +---------------+-----------------------------------+ | Foo | 200 € | | Bar | 150.000 € | | Baz | 90.000 € | +---------------+-----------------------------------+

Rows for projects with more than 2500 € of unused budget should be highlighted visually.

To get realistic data for this exercise, ask someone for a dump of the makandra stage and load it into your local database Archive .

Push your changes into a branch prefixed with your initials and a slash (/). E.g. if your name is John Doe push it into a branch jd/curriculum-open-budgets.

Version 1: OOP

First, build it naively by iterating over all projects, iterating over the budgets and activities of each project, and summing up amounts in Ruby.

When you're done, run tail -f log/development.log, reload the report and observe the performance of your view:

  • How long does your code need to compute the report?
  • How many SQL queries does it trigger?
  • How many Project, Budget and Activity objects are instantiated to handle the request?

Version 2: With scopes

Now build a version of your view that does the same number of queries regardless of how many projects, budgets, activities, etc. you have.

This new version should also not instantiate thousands of ActiveRecord objects, regardless of the database size.

Tip

You don't need to produce the report in a single query!
Consider running multiple separate queries, then compose the results in Ruby. The #index_by Archive and #group_by Archive methods are useful for that.

When the report works: Can you move your code to a OpenBudgetReport class with a nice API? For this the OpenBudgetReport needs to do as much work as possible and offer comfortable and minimum methods to its callers (the controller and view).

Note

For the purpose of this exercise, please don't reuse any existing classes that solve a similar problem in Project Hero, such as BudgetCalculator or Analysis::Budgets.

Discussion

Using scopes to decouple logic

Scopes can be a great tool to decompose complicated queries into multiple aspects or processing steps.

To illustrate this, let's expand our requirements for the search:

  • Our search method should filter movie records as described above
  • We also have an authorization rule that says users may only see the movies that they created (Movie#user_id points to the author).
  • We also paginate results so we only show 50 records at a time.

To implement this we can expand our method like this:

Copy
def search_movies(query:, user:, page:) page_size = 50 page_offset = (page - 1) * page_size Movie .where(user_id: user.id) # authorization .where('title ILIKE ?', "%#{query}%") # very naive search .offset(page_offset) # pagination .limit(page_size) # pagination ) end results = search_movies(query: 'sunshine', user: current_user, page: 3)

What's not so nice is that our method now has three responsibilities mixed together:

  • Filter movies
  • Honor authorization
  • Paginate

By leveraging scopes we can split this into three methods that each have a single responsibility:

Copy
def accessible_movies(scope, user) scope.where(user_id: user.id) end def search_movies(scope, query) scope.where('title ILIKE ?', "%#{query}%") end def paginate(scope, page) page_size = 50 page_offset = (page - 1) * page_size scope.offset(page_offset).limit(page_size) end scope = Movie.all scope = accessible_movies(scope, current_user) scope = search_movies(scope, 'sunshine') scope = paginate(scope, 3)
  • Each method has a single responsibility. In practice they would live in different classes, like MovieSearch, Authorization, Pagination.
  • The methods can be reused for other use cases, not just search. E.g. accessible_movies is a useful starting point for any method that filters movies. And paginate works for any kind of ActiveRecord scope, not just movies.

Discuss this with your mentor.

Growing Rails Applications in Practice
Check out our new e-book:
Learn to structure large Ruby on Rails codebases with the tools you already know and love.

Owner of this card:

Avatar
Henning Koch
Last edit:
about 1 month ago
by Henning Koch
Keywords:
relations
About this deck:
We are makandra and do test-driven, agile Ruby on Rails software development.
License for source code
Posted by Henning Koch to makandra Curriculum
This website uses short-lived cookies to improve usability.
Accept or learn more