You can improve your LIKE
/ ILIKE
search queries in PostgreSQL by adding a
GIN index
Show archive.org snapshot
with an operate class ("opclass") to split the
words into trigrams
Show archive.org snapshot
to the required columns.
Example
class AddSearchTextIndexToUsers < ActiveRecord::Migration[7.1]
def change
enable_extension 'pg_trgm'
add_index :users, :search_text, using: :gin, opclass: :gin_trgm_ops
end
end
Without index
Without index the query must use a sequential scan.
some_db=# EXPLAIN ANALYSE SELECT "users".* FROM "users" WHERE (users.search_text ILIKE '%max%');
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Seq Scan on users (cost=0.00..332518.66 rows=343223 width=422) (actual time=143.706..61621.403 rows=213865 loops=1)
Filter: (search_text ~~* '%max%'::text)
Rows Removed by Filter: 4891068
Planning Time: 28.082 ms
JIT:
Functions: 4
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 5.688 ms, Inlining 0.000 ms, Optimization 8.882 ms, Emission 132.362 ms, Total 146.931 ms
Execution Time: 61856.303 ms
(9 rows)
With a GIN index
With a GIN index the query may can use a bitmap index scan.
some_db=# EXPLAIN ANALYSE SELECT "users".* FROM "users" WHERE (users.search_text ILIKE '%max%');
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on users (cost=3175.98..340719.55 rows=343223 width=422) (actual time=308.028..10150.878 rows=213865 loops=1)
Recheck Cond: (search_text ~~* '%max%'::text)
Rows Removed by Index Recheck: 497792
Heap Blocks: exact=43649 lossy=33582
-> Bitmap Index Scan on index_users_on_search_text (cost=0.00..3090.18 rows=343223 width=0) (actual time=205.446..205.448 rows=213865 loops=1)
Index Cond: (search_text ~~* '%max%'::text)
Planning Time: 1.701 ms
JIT:
Functions: 4
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 2.845 ms, Inlining 0.000 ms, Optimization 2.752 ms, Emission 41.762 ms, Total 47.358 ms
Execution Time: 10213.242 ms
(12 rows)
Notes
- The execution times in the examples are very high, usually a
Bitmap Index Scan
returns within a few milliseconds. - The approach above assumes you have a
search_text
field, where some callback or trigger updates this text depending on other field values.- This approach requires more logic for updates and more space in the database (due to text duplication).
- You can also index multiple columns directly e.g. in the example the users
first_name
,last_name
andemail
. But it becomes more difficult to optimize this query when joining other records (for my tests PostgreSQL was not able to use the GIN index on the joined table within a join).
Posted by Emanuel to makandra dev (2024-10-04 06:20)