How to speed up JSON rendering with Rails


I was recently asked to optimize the response time of a notoriously slow JSON API endpoint that was backed by a Rails application.
While every existing app has different performance bottlenecks and optimizing them is a rabbit hole of arbitrary depth, I'd like to demonstrate a few techniques that could help reach actual improvements.

The baseline

The data flow examined in this card is based on an example barebones Rails app, which can be used to reproduce the results. I tried to mimic the case where things started out fast but did not scale well:

  • There is an existing Rails App with a core model (class FatModel) that has many (30) attributes and 100k records (totaling 500MB of data) in the corresponding PostgreSQL table
  • An API endpoint /fat_models/index.json renders a simple JSON array with the name of each record
  • It is reported to be very slow
  • Measurements (N=20, taken roughly as sketched below) showed that it takes an average of 10 seconds to render
  • The log output already suggests an optimizable SQL query, view rendering time and a very high number of Allocations:

[Screenshot: log output of the baseline request showing SQL time, view rendering time and allocations]
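
Measurements like the N=20 average above can be taken with a minimal script along these lines (the script path and local URL are assumptions, not part of the example app):

# script/benchmark_index.rb - hypothetical helper, run while the app is served locally
require 'benchmark'
require 'net/http'

url = URI('http://localhost:3000/fat_models/index.json') # assumed local URL of the endpoint
durations = Array.new(20) do
  Benchmark.realtime { Net::HTTP.get_response(url) }
end

puts "average over #{durations.size} requests: #{(durations.sum / durations.size * 1000).round} ms"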

The code involved looks like this:

# db/schema.rb
ActiveRecord::Schema[7.1].define do
  create_table "fat_models" do |t|
    t.string "attribute_1"
    t.string "attribute_2"
    # ... many more attributes
    t.datetime "updated_at" # used further below to detect stale data
  end
end

# app/controllers/fat_models_controller.rb
class FatModelsController < ApplicationController
  def index
    @fat_models = FatModel.all
  end
end

# app/views/fat_models/index.json.jbuilder
json.array! @fat_models do |fat_model|
  json.name fat_model.attribute_1
end
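
The matching route is not part of the snippets above; a minimal assumption based on the /fat_models/index.json path could look like this:

# config/routes.rb
Rails.application.routes.draw do
  get "fat_models/index" # serves /fat_models/index.json through the jbuilder view above
end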

Note

I tried to keep this example as simple as I could. More often than not, real JSON endpoints also include data from (hopefully preloaded) associated records or data that needs to be transformed at render time. Some optimizations listed below can also be applied in those cases, but they tend to result in less readable code.

Let's see how we can get that thing 232 times faster...

Getting rid of JBuilder

You get jbuilder for free when you run rails new, but its DSL comes with a huge performance cost when rendering large amounts of data. In my case, it's already more than 3 times faster to render it like so:

class FatModelsController < ApplicationController
  def index
    render json: FatModel.all.map { |fat_model|
      { name: fat_model.attribute_1 }
    }.to_json
  end
end

SELECT only required columns

Earlier I mentioned that the fat_models table contains many more attributes (and thus data) that are unnecessary for the given request.

A select(:attribute_1) statement shaves off 75% of all Allocations, 85% of "ActiveRecord time" and is again more than 4 times faster than the previous solution. YMMV.

class FatModelsController < ApplicationController
  def index
    render json: FatModel.all.select(:attribute_1).map { |fat_model|
      { name: fat_model.attribute_1 }
    }.to_json
  end
end
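
One caveat of such a narrow select (illustrative console session, values assumed): the loaded records only carry the selected attributes, so accessing anything else raises.

fat_model = FatModel.select(:attribute_1).first
fat_model.attribute_1 # => "First Name"
fat_model.attribute_2 # raises ActiveModel::MissingAttributeError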

pluck: Don't create FatModel instances

Speaking of Allocations: What if we could omit building all those FatModel instances when we are in fact only interested in one of their many attributes?
pluck does just that, and is again 3 times faster.

class FatModelsController < ApplicationController
  def index
    render json: FatModel.pluck(:attribute_1).map { |attribute_1|
      { name: attribute_1 }
    }.to_json
  end
end
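
If the payload ever needs more than one column, pluck still avoids model instances. A hypothetical extension of the example:

class FatModelsController < ApplicationController
  def index
    # pluck returns plain value arrays in the requested column order
    render json: FatModel.pluck(:id, :attribute_1).map { |id, attribute_1|
      { id: id, name: attribute_1 }
    }.to_json
  end
end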

Adding DB indices

Inspect the record scope that is used to generate the JSON. Does scope.explain include a reference to an index it's using? Does it come with an ORDER that you could maybe drop (reorder(nil)) depending on your API requirements?
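
A rough sketch of that inspection in the Rails console (the order and the index name are illustrative; the index only exists once the migration below has run):

scope = FatModel.select(:attribute_1).order(:attribute_1)
puts scope.explain
# Hope for something like "Index Only Scan using index_fat_models_on_attribute_1"
# instead of a "Seq Scan on fat_models".

scope.reorder(nil) # drops the ORDER BY if your API contract allows it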

Even in my case where I pluck (select) a single column, having an index on that column speeds up things for me by ~30% - YMMV.

class AddIndexForPluckedColumn < ActiveRecord::Migration[7.1]
  def change
    add_index :fat_models, :attribute_1
  end
end

jsonb_agg: Aggregating in the database

We're now already down from 10 seconds to 168 milliseconds. Maybe it's a good time to stop and call it a day.


If the JSON you are trying to build is fairly simple, you could also move the heavy lifting to the database. PostgreSQL comes with a set of aggregate functions like array_agg, jsonb_agg and jsonb_object_agg, plus builder functions like jsonb_build_object.
Using them we can generate the desired JSON right in the database - at the cost of adding more load to it. It's not always faster, but in this case, it is:

class FatModelsController < ApplicationController
  def index
    result = ActiveRecord::Base.connection.execute(Arel.sql(<<-SQL))
      SELECT jsonb_agg(
               jsonb_build_object('name', attribute_1)
             ) AS aggregated_names
      FROM fat_models;
    SQL
    render json: result.first['aggregated_names']
  end
end

This still gives us essentially the same result as before, except that PostgreSQL's jsonb output adds a whitespace after each key and comma (Hash#to_json does not add any):

[{"name": "First Name"}, {"name": "Second Name"}, ...

Maybe not worth it, but slightly faster (15%) for this particular example.
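
For the less simple payloads mentioned in the note above (e.g. associated records), the same building blocks can nest JSON in a single query. A purely hypothetical sketch, assuming a comments table with a fat_model_id column that is not part of the example app:

class FatModelsController < ApplicationController
  def index
    result = ActiveRecord::Base.connection.execute(<<-SQL)
      SELECT jsonb_agg(
               jsonb_build_object(
                 'name', fat_models.attribute_1,
                 'comments', (
                   SELECT COALESCE(jsonb_agg(jsonb_build_object('body', comments.body)), '[]'::jsonb)
                   FROM comments
                   WHERE comments.fat_model_id = fat_models.id
                 )
               )
             ) AS aggregated_names
      FROM fat_models;
    SQL
    render json: result.first['aggregated_names']
  end
end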

MATERIALIZED VIEW: Caching the Query

We're finally in the danger zone. Adding a caching layer to anything is always a trade-off between sanity and speed.

Now, how can we get faster than the database at building some JSON for data that it already owns? By writing it to disk (or into a so-called "materialized view").

First, tell PostgreSQL with a migration what the contents of this view should be:

class CreateFatModelsMaterializedView < ActiveRecord::Migration[7.1]
  def up
    # Because materialized views are always explicitly refreshed,
    # "now()" is not "right now" but the time of that refresh
    execute <<-SQL
      CREATE MATERIALIZED VIEW fat_models_materialized_json
      AS
        SELECT jsonb_agg(jsonb_build_object('name', attribute_1)) AS aggregated_names,
               NOW() AS last_refreshed_at
        FROM fat_models;
    SQL
  end

  def down
    execute <<-SQL
      DROP MATERIALIZED VIEW fat_models_materialized_json;
    SQL
  end
end

Then, create an interface that can be used to query it:

class FatModelMaterializedJson < ApplicationRecord
  self.table_name = 'fat_models_materialized_json'

  def self.read
    refresh
    first.aggregated_names
  end

  def self.refresh
    if pick(:last_refreshed_at) < FatModel.maximum(:updated_at)
      ActiveRecord::Base.connection.execute("REFRESH MATERIALIZED VIEW fat_models_materialized_json")
    end
  end
end

class FatModelsController < ApplicationController
  def index
    render json: FatModelMaterializedJson.read
  end
end

Note that we have to manually ensure that the underlying data is still up to date.
But surprisingly, that's twice as slow as the previous solution.

Let's identify some remaining culprits in the outlined solution.

First: FatModel.maximum(:updated_at) is not indexed:

class AddFatModelIndexOnUpdatedAt < ActiveRecord::Migration[7.1]
  def change
    add_index :fat_models, :updated_at
  end
end

You could also express the if statement in SQL to save yourself a data roundtrip, but it saves you only a millisecond or two:

class FatModelMaterializedJson < ApplicationRecord
  def self.refresh
    ActiveRecord::Base.connection.execute(<<~SQL)
      DO $$
      BEGIN
        IF (
          (SELECT last_refreshed_at FROM fat_models_materialized_json LIMIT 1)
          <
          (SELECT MAX(updated_at) FROM fat_models)
        ) THEN
          REFRESH MATERIALIZED VIEW fat_models_materialized_json;
        END IF;
      END $$;
    SQL
  end
end

Second: FatModelMaterializedJson.first.aggregated_names is back to allocating 600k objects! We want to pipe the calculated JSON to the client without parsing it with Ruby first.

Replacing it with an explicit SQL query does just that:

class FatModelMaterializedJson < ApplicationRecord
  def self.read
    refresh
    ActiveRecord::Base.connection.execute(<<~SQL).first['aggregated_names']
      SELECT aggregated_names FROM fat_models_materialized_json
    SQL
  end
end

Third: If the data changes without touching updated_at, we get wrong results. As this does not affect runtime performance, I'll just link to the CREATE TRIGGER documentation: the view can also be refreshed automatically whenever the data changes (not always desirable, though).
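
A minimal sketch of such a trigger, phrased as a migration like the ones above (the migration and function names are made up; a statement-level trigger that refreshes on every write can get expensive for frequently changing tables):

class AddRefreshTriggerForFatModels < ActiveRecord::Migration[7.1]
  def up
    execute <<-SQL
      CREATE FUNCTION refresh_fat_models_materialized_json() RETURNS trigger AS $$
      BEGIN
        REFRESH MATERIALIZED VIEW fat_models_materialized_json;
        RETURN NULL;
      END;
      $$ LANGUAGE plpgsql;

      CREATE TRIGGER refresh_fat_models_materialized_json
      AFTER INSERT OR UPDATE OR DELETE ON fat_models
      FOR EACH STATEMENT
      EXECUTE FUNCTION refresh_fat_models_materialized_json();
    SQL
  end

  def down
    execute <<-SQL
      DROP TRIGGER refresh_fat_models_materialized_json ON fat_models;
      DROP FUNCTION refresh_fat_models_materialized_json();
    SQL
  end
end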

With the two patches above I'm now down from 10 seconds to 43 milliseconds for an unchanged/cached JSON view. Any suggestions to make it faster? :-)
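
One idea: since the payload only changes when the materialized view is refreshed, the controller could additionally offer HTTP conditional GETs so that revisiting clients only re-download the body when it actually changed. A hypothetical sketch:

class FatModelsController < ApplicationController
  def index
    json = FatModelMaterializedJson.read
    # Clients that sent a matching If-None-Match header get an empty 304 instead of the full payload.
    render json: json if stale?(etag: json)
  end
end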


Source code in this card is licensed under the MIT License.
Posted by Michael Leimstädtner to makandra dev (2024-08-28 11:03)