I was recently asked to optimize the response time of a notoriously slow JSON API endpoint that was backed by a Rails application.
While every existing app has different performance bottlenecks and optimizing them is a rabbit hole of arbitrary depth, I'd like to demonstrate a few techniques that can help achieve actual improvements.
The baseline
The data flow examined in this card is based on an example barebones Rails app, which can be used to reproduce the results. I tried to mimic a case where things started out fast but did not scale well:
- There is an existing Rails app with a core model (`FatModel`) that has many (30) attributes and 100k records (totaling 500 MB of data) in the corresponding PostgreSQL table
- An API endpoint `/fat_models.json` renders a simple JSON array with the name of each record
- It is reported to be very slow
- Measurements (N=20) showed that it takes an average of 10 seconds to render (a script for such a measurement is sketched below)
- The log output already suggests an optimizable SQL query, view rendering time, and a very high number of `Allocations`
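A minimal measurement sketch, assuming the app runs locally on port 3000 (the URL and sample size are illustrative):

require 'benchmark'
require 'net/http'

# Request the endpoint 20 times and print the average response time
uri = URI('http://localhost:3000/fat_models.json')
times = Array.new(20) { Benchmark.realtime { Net::HTTP.get(uri) } }
puts "average: #{(times.sum / times.size).round(2)}s"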
The code involved looks like this:
# db/schema.rb
ActiveRecord::Schema[7.1].define do
  create_table "fat_models" do |t|
    t.string "attribute_1"
    t.string "attribute_2"
    # ... many more attributes
    t.timestamps # updated_at is used for cache invalidation later
  end
end
# app/controllers/fat_models_controller.rb
class FatModelsController < ApplicationController
  def index
    @fat_models = FatModel.all
  end
end
# app/views/fat_models/index.json.jbuilder
json.array! @fat_models do |fat_model|
  json.name fat_model.attribute_1
end
Note
I tried to keep this example as simple as I could. More often than not, real JSON endpoints also include data from (hopefully preloaded) associated records or data that needs to be transformed at render time. Some of the optimizations listed below can also be applied in those cases, but they tend to result in less readable code.
Let's see how we can get that thing 232 times faster.
Getting rid of JBuilder
You're getting `jbuilder` for free when you run `rails new`, but its DSL comes with a huge performance cost when rendering large amounts of data. In my case, it's already more than 3 times faster to render the JSON like so:
class FatModelsController < ApplicationController
  def index
    # Braces instead of do...end so the block binds to #map, not to #render
    render json: FatModel.all.map { |fat_model|
      { name: fat_model.attribute_1 }
    }.to_json
  end
end
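Side note: `render json:` would call `to_json` itself on any non-String argument, so the explicit `to_json` above mainly makes the serialization step obvious.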
SELECT only required columns
Earlier I mentioned that the `fat_models` table contains many more attributes (and thus data) that are unnecessary for the given request.
A `select(:attribute_1)` statement shaves off 75% of all `Allocations` and 85% of "ActiveRecord time", and is again more than 4 times faster than the previous solution. YMMV.
class FatModelsController < ApplicationController
  def index
    render json: FatModel.all.select(:attribute_1).map { |fat_model|
      { name: fat_model.attribute_1 }
    }.to_json
  end
end
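You can verify the narrower query in a Rails console via `to_sql` (the exact quoting depends on your adapter):

FatModel.all.select(:attribute_1).to_sql
# => SELECT "fat_models"."attribute_1" FROM "fat_models"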
`pluck`: Don't create `FatModel` instances
Speaking of `Allocations`: what if we could omit building all those `FatModel` instances when we are in fact only interested in some of their many attributes? `pluck` does just that, and is again 3 times faster.
class FatModelsController < ApplicationController
  def index
    render json: FatModel.pluck(:attribute_1).map { |attribute_1|
      { name: attribute_1 }
    }.to_json
  end
end
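The difference shows nicely in a console: both issue the same narrow SELECT, but `pluck` skips model instantiation entirely (values illustrative):

FatModel.select(:attribute_1).first.class # => FatModel (a model instance was built)
FatModel.pluck(:attribute_1).first        # => "First Name" (just a String)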
Adding DB indices
Inspect the record scope that is used to generate the JSON. Does `scope.explain` include a reference to an index it's using? Does it come with an ORDER BY that you could maybe drop (`reorder(nil)`) depending on your API requirements?
Even in my case, where I pluck (select) a single column, having an index on that column speeds things up for me by ~30%, likely because PostgreSQL can then answer the query with an index-only scan instead of reading the wide table rows - YMMV.
class AddIndexForPluckedColumn < ActiveRecord::Migration[7.1]
  def change
    add_index :fat_models, :attribute_1
  end
end
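To check whether the index is actually picked up, a quick sketch (the plan is abbreviated and varies with table statistics):

puts FatModel.select(:attribute_1).explain
# EXPLAIN for: SELECT "fat_models"."attribute_1" FROM "fat_models"
#   Index Only Scan using index_fat_models_on_attribute_1 on fat_models ...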
`jsonb_agg`: Aggregating in the database
We're now already down from 10 seconds to 168 milliseconds. Maybe it's a good time to stop and call it a day.
If the JSON you are trying to build is fairly simple, you could also move the heavy lifting to the database. PostgreSQL comes with a set of Aggregate Functions like `array_agg`, `jsonb_agg` and `jsonb_object_agg`, plus non-aggregate helpers like `jsonb_build_object`.
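To get a feel for what these functions return, here is a trivial query from a Rails console; note that `execute` hands jsonb values back as plain JSON strings (a sketch, values illustrative):

row = ActiveRecord::Base.connection.execute(<<~SQL).first
  SELECT jsonb_build_object('name', 'First Name') AS object,
         (SELECT jsonb_agg(t.x) FROM (VALUES ('a'), ('b')) AS t(x)) AS list
SQL
row['object'] # => '{"name": "First Name"}'
row['list']   # => '["a", "b"]'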
Using them we can generate the desired JSON right in the DB - at the cost of adding more load to it. It's not always faster, but in this case, it is:
class FatModelsController < ApplicationController
  def index
    result = ActiveRecord::Base.connection.execute(Arel.sql(<<-SQL))
      SELECT jsonb_agg(
        jsonb_build_object('name', attribute_1)
      ) AS aggregated_names
      FROM fat_models;
    SQL
    render json: result.first['aggregated_names']
  end
end
This still gives us the same result as before, except for some whitespace within and between the pairs (`Hash#to_json` does not add any):
[{"name": "First Name"}, {"name": "Second Name"}, ...
Maybe not worth it, but slightly faster (15%) for this particular example.
MATERIALIZED VIEW: Caching the Query
We're finally in the danger zone. Adding a caching layer to anything is always a trade-off between sanity and speed.
Now, how can we get faster than the database at building some JSON for data that it already owns? By writing it to disk (or into a so-called "materialized view").
First, tell PostgreSQL with a migration what the contents of this view should be:
class CreateFatModelsMaterializedView < ActiveRecord::Migration[7.1]
  def up
    # Because materialized views are always explicitly refreshed,
    # "now()" is not "right now" but the time of that refresh
    execute <<-SQL
      CREATE MATERIALIZED VIEW fat_models_materialized_json
      AS
        SELECT jsonb_agg(jsonb_build_object('name', attribute_1)) AS aggregated_names,
               NOW() AS last_refreshed_at
        FROM fat_models;
    SQL
  end

  def down
    execute <<-SQL
      DROP MATERIALIZED VIEW fat_models_materialized_json;
    SQL
  end
end
Then, create an interface that can be used to query it:
class FatModelMaterializedJson < ApplicationRecord
  self.table_name = 'fat_models_materialized_json'

  def self.read
    refresh
    first.aggregated_names
  end

  def self.refresh
    if pick(:last_refreshed_at) < FatModel.maximum(:updated_at)
      ActiveRecord::Base.connection.execute("REFRESH MATERIALIZED VIEW fat_models_materialized_json")
    end
  end
end
class FatModelsController < ApplicationController
  def index
    render json: FatModelMaterializedJson.read
  end
end
Note that we have to manually ensure that the underlying data is still up to date.
But surprisingly, that's twice as slow as the previous solution.
Let's identify the remaining culprits in the outlined solution.
First: `FatModel.maximum(:updated_at)` is not indexed:
class AddFatModelIndexOnUpdatedAt < ActiveRecord::Migration[7.1]
  def change
    add_index :fat_models, :updated_at
  end
end
You could also express the `if` statement in SQL to save yourself a data roundtrip, though that only gains a millisecond or two:
class FatModelMaterializedJson < ApplicationRecord
  def self.refresh
    ActiveRecord::Base.connection.execute(<<~SQL)
      DO $$
      BEGIN
        IF (
          (SELECT last_refreshed_at FROM fat_models_materialized_json LIMIT 1)
          <
          (SELECT MAX(updated_at) FROM fat_models)
        ) THEN
          REFRESH MATERIALIZED VIEW fat_models_materialized_json;
        END IF;
      END $$;
    SQL
  end
end
Second: `FatModelMaterializedJson.first.aggregated_names` went back to allocating 600k objects! We want to pipe the calculated JSON to the client without parsing it with Ruby first.
Replacing it with an explicit SQL query does just that:
class FatModelMaterializedJson < ApplicationRecord
  def self.read
    refresh
    ActiveRecord::Base.connection.execute(<<~SQL).first['aggregated_names']
      SELECT aggregated_names FROM fat_models_materialized_json
    SQL
  end
end
Third: If the data changes without touching `updated_at`, we get wrong results. As this does not affect the runtime performance, I'll just link to the CREATE TRIGGER documentation - the view can automatically be refreshed on data change (not always desirable, though).
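For completeness, a minimal sketch of what such a trigger could look like, wrapped in a migration like the ones above (assuming PostgreSQL 11+ for EXECUTE FUNCTION; note that refreshing on every write can get expensive for busy tables, and I did not apply this patch in my measurements):

class AddFatModelsJsonRefreshTrigger < ActiveRecord::Migration[7.1]
  def up
    execute <<~SQL
      CREATE FUNCTION refresh_fat_models_materialized_json()
      RETURNS trigger LANGUAGE plpgsql AS $$
      BEGIN
        REFRESH MATERIALIZED VIEW fat_models_materialized_json;
        RETURN NULL;
      END;
      $$;

      -- Refresh once per statement, not once per changed row
      CREATE TRIGGER fat_models_refresh_json
      AFTER INSERT OR UPDATE OR DELETE ON fat_models
      FOR EACH STATEMENT
      EXECUTE FUNCTION refresh_fat_models_materialized_json();
    SQL
  end

  def down
    execute <<~SQL
      DROP TRIGGER fat_models_refresh_json ON fat_models;
      DROP FUNCTION refresh_fat_models_materialized_json();
    SQL
  end
end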
With the first two patches above I'm now down from 10 seconds to 43 milliseconds for an unchanged/cached JSON view. Any suggestions to make it faster? :-)