165 ActiveRecord scopes [4d]

Posted Almost 9 years ago. Visible to the public.

In this card we will learn to write code that scales with a large number of database records. We will do this by pushing work out of Ruby and into the database.

Goals

  • Understand what scopes ("relations") are and how they differ from arrays.

  • Understand how a purely object-oriented approach of processing data doesn't scale for a large number of database records.

  • Get into a habit of off-loading list-related calculations (filtering, mapping) to the database. Only fetch an ActiveRecord objects from the database when you actually plan to display or change it.

  • Understand that an expression like User.where(email: 'foo@bar.com') does not make an SQL query. It simply returns a scope object for further chaining with #where, #order, etc.

    Scopes will make an SQL query when you use them like an array, e.g. by iterating over them with #each. This may cause the scope to accidentally load its results prematurely. To prevent this, get into a habit of explicitely loading scopes by calling to_a when you're done filtering in the database.

  • Learn about the N+1 query problem Show archive.org snapshot and eager loading.

  • Understand the difference between "named scopes" Show archive.org snapshot and ad-hoc scopes using #where.

Note

We don't like to use default scopes Show archive.org snapshot in our projects. It can be confusing when new scopes inherit conditions out of nowhere.

If an app does have a frequently used condition (like ignoring all archived posts), we define a named scope like Post.active and use that.

Learn

Resources

Tips

Preventing scopes from loading

A scope like User.where(email: 'foo@bar.com') does not make an SQL query. It simply returns a scope object for further chaining with #where, #order, etc.

It can be confusing that typing a scope into the Rails console will immediately load results from the database. This is due to the console automatically inspecting Show archive.org snapshot the last statement, in an attempt to visualize the results.

To prevent this, add any other expression after the scope:

scope = Movie.all; nil

Call #to_sql on a scope to see the query it will produce. This does not actually run a query:

Card.where(title: 'Foo').to_sql
# => SELECT * FROM cards WHERE title="Foo"

Advanced conditions

You can pass an array as a value in #where. ActiveRecord will make an IN() Show archive.org snapshot query:

Card.where(id: [1,2,3]).to_sql
# => SELECT * FROM cards WHERE id IN (1, 2, 3)

If you need an operator that is not equality (=) or inclusion (IN()) you cannot pass a hash to where. You need to pass an SQL fragment instead:

Card.where('id > ?', 5).to_sql
# => SELECT * FROM cards WHERE id > 5

Chaining multiple #where will AND the conditions:

Card.where(id: [1,2,3]).where(title: 'Foo').to_sql
# => SELECT * FROM cards WHERE id IN (1, 2, 3) AND title="Foo"

Preloading associations

When you preload associations with #preload or #includes Rails will make one query for each table:

Card.where(title: 'Foo').preload(:user).to_a  

# SELECT * FROM cards WHERE title="Foo"
#
# | id | user_id | title
# +----+---------+-------------------
# |  2 |    4003 | Foo
# |  4 |    4005 | Foo
#    
# SELECT * FROM users WHERE id IN (4003, 4005)

Plucking columns

You can use #pluck to fetch a single column from a scope's result set. The result is returned as an array:

user_ids = Card.where(title: 'Foo').pluck(:user_id)
# => [4003, 4005]

This is much faster than loading an array of Card records!

A common technique is to use an array of plucked foreign keys in another query:

User.where(id: user_ids).to_sql
# => SELECT * FROM users WHERE id IN (4003, 4005)

This way, instead of one complicated query with many JOINs, you use multiple but simple queries. This can be easier to understand and may even perform better.

Exercises

Cards query (makandra only)

Ask someone for a dump of the staging target and load it into your local database Show archive.org snapshot .

In the makandra cards app, load an array of all Users that are an admin in a deck whose slug begins with makandra (e.g. makandra-besprechungen).

Try to write as little SQL as possible. Wherever possible, define conditions using #where(hash). If you do need to use an SQL fragment, make it as short as possible.

It's OK to do to use more than one query (or one scope) for this exercises. However, your solution should always use the number of queries, regardless of the number of decks.

Search for MovieDB

In your MovieDB app, adapt the existing implementation Movie.search(query) to return a scope of movies matching the following query format:

  • The query is a list of space-separated words that all need to appear in the movie title in order for the movie to match (e. g. event horizon)
  • A word can be prefixed by a minus character to exclude that word (e.g. event -horizon).

If you haven't already considered this, make sure that the returned value is not an array but a scope that can be chained further. E.g. I should be able to use it like this:

Movie.search('event -main').search('horizon').where('year > 1995').order(:year).to_a

Adapt both the integration test and RSpec examples for the search functionality, using the guidelines from the "Testing" chapter in Growing Rails Applications in Practice Show archive.org snapshot .

  • RSpec examples should cover all edge cases
  • E2E tests should only cover one "happy path" to show successful integration

Open budgets in Project Hero (makandra only)

In the Project Hero app, build a view that lists all projects that have open (unclosed) budgets with unused amounts:

+---------------+-----------------------------------+
| Project       | Unused amount in all open budgets |
+---------------+-----------------------------------+
| Foo           | 200 €                             |
| Bar           | 150.000 €                         |
| Baz           | 90.000 €                          |
+---------------+-----------------------------------+

Rows for projects with more than 2500 € of unused budget should be highlighted visually.

To get realistic data for this exercise, ask someone for a dump of the makandra stage and load it into your local database Show archive.org snapshot .

Push your changes into a branch prefixed with your initials and a slash (/). E.g. if your name is John Doe push it into a branch jd/curriculum-open-budgets.

Version 1: OOP

First, build it naively by iterating over all projects, iterating over the budgets and activities of each project, and summing up amounts in Ruby.

When you're done, run tail -f log/development.log, reload the report and observe the performance of your view:

  • How long does your code need to compute the report?
  • How many SQL queries does it trigger?
  • How many Project, Budget and Activity objects are instantiated to handle the request?

Version 2: With scopes

Now build a version of your view that does the same number of queries regardless of how many projects, budgets, activities, etc. you have.

This new version should also not instantiate thousands of ActiveRecord objects, regardless of the database size.

Tip

You don't need to produce the report in a single query!
Consider running multiple separate queries, then compose the results in Ruby. The #index_by Show archive.org snapshot and #group_by Show archive.org snapshot methods are useful for that.

When the report works: Can you move your code to a OpenBudgetReport class with a nice API? For this the OpenBudgetReport needs to do as much work as possible and offer comfortable and minimum methods to its callers (the controller and view).

Note

For the purpose of this exercise, please don't reuse any existing classes that solve a similar problem in Project Hero, such as BudgetCalculator or Analysis::Budgets.

Discussion

Using scopes to decouple logic

Scopes can be a great tool to decompose complicated queries into multiple aspects or processing steps.

To illustrate this, let's expand our requirements for the search:

  • Our search method should filter movie records as described above
  • We also have an authorization rule that says users may only see the movies that they created (Movie#user_id points to the author).
  • We also paginate results so we only show 50 records at a time.

To implement this we can expand our method like this:

def search_movies(query:, user:, page:)
  page_size = 50
  page_offset = (page - 1) * page_size

  Movie
    .where(user_id: user.id)              # authorization
    .where('title ILIKE ?', "%#{query}%") # very naive search
    .offset(page_offset)                  # pagination
    .limit(page_size)                     # pagination
  )
end

results = search_movies(query: 'sunshine', user: current_user, page: 3)

What's not so nice is that our method now has three responsibilities mixed together:

  • Filter movies
  • Honor authorization
  • Paginate

By leveraging scopes we can split this into three methods that each have a single responsibility:

def accessible_movies(scope, user)
  scope.where(user_id: user.id)
end

def search_movies(scope, query)
  scope.where('title ILIKE ?', "%#{query}%")
end

def paginate(scope, page)
  page_size = 50
  page_offset = (page - 1) * page_size
  scope.offset(page_offset).limit(page_size)
end

scope = Movie.all
scope = accessible_movies(scope, current_user)
scope = search_movies(scope, 'sunshine')
scope = paginate(scope, 3)
  • Each method has a single responsibility. In practice they would live in different classes, like MovieSearch, Authorization, Pagination.
  • The methods can be reused for other use cases, not just search. E.g. accessible_movies is a useful starting point for any method that filters movies. And paginate works for any kind of ActiveRecord scope, not just movies.

Discuss this with your mentor.

Henning Koch
Last edit
11 months ago
Michael Leimstädtner
Keywords
relations
License
Source code in this card is licensed under the MIT License.
Posted by Henning Koch to makandra Curriculum (2015-07-08 17:31)