A simple example with a GIN index in Rails for optimizing a ILIKE query

Updated . Posted . Visible to the public. Repeats.

You can improve your LIKE / ILIKE search queries in PostgreSQL by adding a GIN index Show archive.org snapshot with an operate class ("opclass") to split the words into trigrams Show archive.org snapshot to the required columns.

Example

class AddSearchTextIndexToUsers < ActiveRecord::Migration[7.1]
  def change
    enable_extension 'pg_trgm'

    add_index :users, :search_text, using: :gin, opclass: :gin_trgm_ops
  end
end

Without index

Without index the query must use a sequential scan.

some_db=# EXPLAIN ANALYSE SELECT "users".* FROM "users" WHERE (users.search_text ILIKE '%max%');
                                                          QUERY PLAN                                                           
-------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on users  (cost=0.00..332518.66 rows=343223 width=422) (actual time=143.706..61621.403 rows=213865 loops=1)
   Filter: (search_text ~~* '%max%'::text)
   Rows Removed by Filter: 4891068
 Planning Time: 28.082 ms
 JIT:
   Functions: 4
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 5.688 ms, Inlining 0.000 ms, Optimization 8.882 ms, Emission 132.362 ms, Total 146.931 ms
 Execution Time: 61856.303 ms
(9 rows)

With a GIN index

With a GIN index the query may can use a bitmap index scan.

some_db=# EXPLAIN ANALYSE SELECT "users".* FROM "users" WHERE (users.search_text ILIKE '%max%');
                                                                         QUERY PLAN                                                                          
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on users  (cost=3175.98..340719.55 rows=343223 width=422) (actual time=308.028..10150.878 rows=213865 loops=1)
   Recheck Cond: (search_text ~~* '%max%'::text)
   Rows Removed by Index Recheck: 497792
   Heap Blocks: exact=43649 lossy=33582
   ->  Bitmap Index Scan on index_users_on_search_text  (cost=0.00..3090.18 rows=343223 width=0) (actual time=205.446..205.448 rows=213865 loops=1)
         Index Cond: (search_text ~~* '%max%'::text)
 Planning Time: 1.701 ms
 JIT:
   Functions: 4
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 2.845 ms, Inlining 0.000 ms, Optimization 2.752 ms, Emission 41.762 ms, Total 47.358 ms
 Execution Time: 10213.242 ms
(12 rows)

Notes

  • The execution times in the examples are very high, usually a Bitmap Index Scan returns within a few milliseconds.
  • The approach above assumes you have a search_text field, where some callback or trigger updates this text depending on other field values.
    • This approach requires more logic for updates and more space in the database (due to text duplication).
    • You can also index multiple columns directly e.g. in the example the users first_name, last_name and email. But it becomes more difficult to optimize this query when joining other records (for my tests PostgreSQL was not able to use the GIN index on the joined table within a join).
Last edit
Michael Leimstädtner
License
Source code in this card is licensed under the MIT License.
Posted by Emanuel to makandra dev (2024-10-04 06:20)