Posted 5 months ago. Visible to the public.

Trigram indexing as an alternative to PostgreSQL fulltext search

For searching in large database tables we usually use PostgreSQL's fulltext search capabilities.

While this works reasonably well for content primarily consisting of prose, it is not necessarily a good solution for all use cases. The main issue is that it is only possible to search for prefixes of text tokens, which can potentially be unexpected for users.

One example are dates:

If you index the text 2019-01-23 15:16, PostgreSQL will create the following tokens: 2019, -01, -23, 15 16. A user searching for 01-23 will get no results (-01-23 would have worked).

Serial numbers (one case was s7-1500) pose similar problems.

Potential solution

One possibility is to use plain old LIKE queries:

SELECT * FROM contents WHERE search_text LIKE '%my-search-string%'

This has obvious performance problems for large datasets. You can mitigate those issues somewhat by using trigram indexes. These indexes work by collecting all combinations of 3 consecutive letters present in the source text. Trigram indexes are usually used for similarity search, but they do work for LIKE queries as well.

To use them, in a migration, enable the PostgreSQL trigram extension and add a trigram index on your search_text column:

class MoveToTrigramIndexing < ActiveRecord::Migration[5.2] def change enable_extension 'pg_trgm' # this requires superuser permissions for the database user add_index :contents, :search_test, using: :gin, opclass: :gin_trgm_ops end end

It is probably not a good idea to grant superuser capabilities to all database users. If you don't want to grant them, enable the extension manually with

sudo -u postgres psql -c "ENABLE EXTENSION IF NOT EXISTS pg_trgm" $DATABASE_NAME

PostgreSQL should now automatically start using the index for all LIKE queries, when it judges this to be a performance improvement.
This will however never work for search terms of fewer than 3 characters. In those cases PostgreSQL will fall back to a full table scan.

In theory, the index should also be used for ILIKE queries, and for queries using regular expressions, but I have not seen this working. I suspect that PostgreSQL judges the cost of using the index higher for ILIKE and might start using it for even large data sets. I would still recommend to simply convert search texts and queries to lowercase and use LIKE instead.

Growing Rails Applications in Practice
Check out our new e-book:
Learn to structure large Ruby on Rails codebases with the tools you already know and love.

Owner of this card:

Tobias Kraze
Last edit:
5 months ago
by Tobias Kraze
About this deck:
We are makandra and do test-driven, agile Ruby on Rails software development.
License for source code
Posted by Tobias Kraze to makandra dev
This website uses cookies to improve usability and analyze traffic.
Accept or learn more