In this card we will learn to write code that scales with a large number of database records. We will do this by pushing work out of Ruby and into the database.
Goals
-
Understand what scopes ("relations") are and how they differ from arrays.
-
Understand how a purely object-oriented approach of processing data doesn't scale for a large number of database records.
-
Get into a habit of off-loading list-related calculations (filtering, mapping) to the database. Only fetch an ActiveRecord objects from the database when you actually plan to display or change it.
-
Understand that an expression like
User.where(email: 'foo@bar.com')
does not make an SQL query. It simply returns a scope object for further chaining with#where
,#order
, etc.Scopes will make an SQL query when you use them like an array, e.g. by iterating over them with
#each
. This may cause the scope to accidentally load its results prematurely. To prevent this, get into a habit of explicitely loading scopes by callingto_a
when you're done filtering in the database. -
Learn about the N+1 query problem Show archive.org snapshot and eager loading.
-
Understand the difference between "named scopes" Show archive.org snapshot and ad-hoc scopes using
#where
.
Note
We don't like to use default scopes Show archive.org snapshot in our projects. It can be confusing when new scopes inherit conditions out of nowhere.
If an app does have a frequently used condition (like ignoring all archived posts), we define a named scope like
Post.active
and use that.
Learn
Resources
- Active Record Query Interface Show archive.org snapshot
- Rails Database Best Practices Show archive.org snapshot
- ActiveRecord: Specifying conditions on an associated table
- Preload, Eagerload, Includes and Joins Show archive.org snapshot
- Battling n+1 Queries in Rails Show archive.org snapshot
Tips
Preventing scopes from loading
A scope like User.where(email: 'foo@bar.com')
does not make an SQL query. It simply returns a scope object for further chaining with #where
, #order
, etc.
It can be confusing that typing a scope into the Rails console will immediately load results from the database. This is due to the console automatically inspecting Show archive.org snapshot the last statement, in an attempt to visualize the results.
To prevent this, add any other expression after the scope:
scope = Movie.all; nil
Call #to_sql
on a scope to see the query it will produce. This does not actually run a query:
Card.where(title: 'Foo').to_sql
# => SELECT * FROM cards WHERE title="Foo"
Advanced conditions
You can pass an array as a value in #where
. ActiveRecord will make an
IN()
Show archive.org snapshot
query:
Card.where(id: [1,2,3]).to_sql
# => SELECT * FROM cards WHERE id IN (1, 2, 3)
If you need an operator that is not equality (=
) or inclusion (IN()
) you cannot pass a hash to where
. You need to pass an SQL fragment instead:
Card.where('id > ?', 5).to_sql
# => SELECT * FROM cards WHERE id > 5
Chaining multiple #where
will AND
the conditions:
Card.where(id: [1,2,3]).where(title: 'Foo').to_sql
# => SELECT * FROM cards WHERE id IN (1, 2, 3) AND title="Foo"
Preloading associations
When you preload associations with #preload
or #includes
Rails will make one query for each table:
Card.where(title: 'Foo').preload(:user).to_a
# SELECT * FROM cards WHERE title="Foo"
#
# | id | user_id | title
# +----+---------+-------------------
# | 2 | 4003 | Foo
# | 4 | 4005 | Foo
#
# SELECT * FROM users WHERE id IN (4003, 4005)
Plucking columns
You can use #pluck
to fetch a single column from a scope's result set. The result is returned as an array:
user_ids = Card.where(title: 'Foo').pluck(:user_id)
# => [4003, 4005]
This is much faster than loading an array of Card
records!
A common technique is to use an array of plucked foreign keys in another query:
User.where(id: user_ids).to_sql
# => SELECT * FROM users WHERE id IN (4003, 4005)
This way, instead of one complicated query with many JOIN
s, you use multiple but simple queries. This can be easier to understand and may even perform better.
Exercises
Cards query (makandra only)
Ask someone for a dump of the staging
target and
load it into your local database
Show archive.org snapshot
.
In the makandra cards app, load an array of all Users
that are an admin in a deck whose slug begins with makandra
(e.g. makandra-besprechungen
).
Try to write as little SQL as possible. Wherever possible, define conditions using #where(hash)
. If you do need to use an SQL fragment, make it as short as possible.
It's OK to do to use more than one query (or one scope) for this exercises. However, your solution should always use the number of queries, regardless of the number of decks.
Search for MovieDB
In your MovieDB app, adapt the existing implementation Movie.search(query)
to return a scope of movies matching the following query format:
- The query is a list of space-separated words that all need to appear in the movie title for the movie to match (e. g.
event horizon
) - A word can be prefixed by a minus character to exclude that word (e.g.
event -horizon
).
If you haven't already considered this, make sure that the returned value is not an array but a scope that can be chained further. E.g. I should be able to use it like this:
Movie.search('event -main').search('horizon').where('year > 1995').order(:year).to_a
Adapt both the integration test and RSpec examples for the search functionality, using the guidelines from the "Testing" chapter in Growing Rails Applications in Practice.
- RSpec examples should cover all edge cases
- E2E tests should only cover one "happy path" to show successful integration
Open budgets in Project Hero (makandra only)
In the Project Hero app, build a view that lists all projects that have open (unclosed) budgets with unused amounts:
+---------------+-----------------------------------+
| Project | Unused amount in all open budgets |
+---------------+-----------------------------------+
| Foo | 200 € |
| Bar | 150.000 € |
| Baz | 90.000 € |
+---------------+-----------------------------------+
Rows for projects with more than 2500 € of unused budget should be highlighted visually.
To get realistic data for this exercise, ask someone for a dump of the makandra
stage and
load it into your local database
Show archive.org snapshot
.
Push your changes into a branch prefixed with your initials and a slash (/
). E.g. if your name is John Doe
push it into a branch jd/curriculum-open-budgets
.
Version 1: OOP
First, build it naively by iterating over all projects, iterating over the budgets and activities of each project, and summing up amounts in Ruby.
When you're done, run tail -f log/development.log
, reload the report and observe the performance of your view:
- How long does your code need to compute the report?
- How many SQL queries does it trigger?
- How many
Project
,Budget
andActivity
objects are instantiated to handle the request?
Version 2: With scopes
Now build a version of your view that does the same number of queries regardless of how many projects, budgets, activities, etc. you have.
This new version should also not instantiate thousands of ActiveRecord objects, regardless of the database size.
Tip
You don't need to produce the report in a single query!
Consider running multiple separate queries, then compose the results in Ruby. The#index_by
Show archive.org snapshot and#group_by
Show archive.org snapshot methods are useful for that.
When the report works: Can you move your code to a OpenBudgetReport
class with a nice API? For this the OpenBudgetReport
needs to do as much work as possible and offer comfortable and minimum methods to its callers (the controller and view).
Note
For the purpose of this exercise, please don't reuse any existing classes that solve a similar problem in Project Hero, such as
BudgetCalculator
orAnalysis::Budgets
.
Discussion
Using scopes to decouple logic
Scopes can be a great tool to decompose complicated queries into multiple aspects or processing steps.
To illustrate this, let's expand our requirements for the search:
- Our search method should filter movie records as described above
- We also have an authorization rule that says users may only see the movies that they created (
Movie#user_id
points to the author). - We also paginate results so we only show 50 records at a time.
To implement this we can expand our method like this:
def search_movies(query:, user:, page:)
page_size = 50
page_offset = (page - 1) * page_size
Movie
.where(user_id: user.id) # authorization
.where('title ILIKE ?', "%#{query}%") # very naive search
.offset(page_offset) # pagination
.limit(page_size) # pagination
)
end
results = search_movies(query: 'sunshine', user: current_user, page: 3)
What's not so nice is that our method now has three responsibilities mixed together:
- Filter movies
- Honor authorization
- Paginate
By leveraging scopes we can split this into three methods that each have a single responsibility:
def accessible_movies(scope, user)
scope.where(user_id: user.id)
end
def search_movies(scope, query)
scope.where('title ILIKE ?', "%#{query}%")
end
def paginate(scope, page)
page_size = 50
page_offset = (page - 1) * page_size
scope.offset(page_offset).limit(page_size)
end
scope = Movie.all
scope = accessible_movies(scope, current_user)
scope = search_movies(scope, 'sunshine')
scope = paginate(scope, 3)
- Each method has a single responsibility. In practice they would live in different classes, like
MovieSearch
,Authorization
,Pagination
. - The methods can be reused for other use cases, not just search. E.g.
accessible_movies
is a useful starting point for any method that filters movies. Andpaginate
works for any kind of ActiveRecord scope, not just movies.
Discuss this with your mentor.