Rails: Using PostgreSQL full-text search without a gem

PostgreSQL can cosplay as a full-text search engine. It doesn't have the features or fidelity of ElasticSearch or Algolia, but it's good enough if you just need to search and rank large volumes of text.

This card will teach you how to index, search and rank your Rails models in a PostgreSQL full-text index. We will do this without using any gems aside from ActiveRecord. While there are gems like pg_search Show archive.org snapshot or pg_fulltext Show archive.org snapshot , manual integration requires very little code. And since you are inevitably going to tweak your search implementation, you may find it easier to do so directly in your own code.

For the examples below we will search a simple Message model. It stores messages with a subject line and the message body:

class CreateMessage < ActiveRecord::Migration[7.2]
  def change
    create_table :messages do |t|
      t.string :subject
      t.text :body
      t.timestamps
    end
  end
end

class Message < ApplicationRecord
end

This card will start with a very basic search example. We will then gradually improve our implementation with additional requirements you will probably encounter. A good companion to this card is the Controlling Text Search Show archive.org snapshot reference from the PostgreSQL documentation.

Basic example

We want to implement a Message.search method that looks for a given query in the body column. It should return a scope (relation) with all matches:

Message.search('quarterly results') # => #<ActiveRecord::Relation [...]>

This is all we need to implement this method:

class Message < ApplicationRecord

  scope :search, ->(query) { where(<<~SQL, query: query) }
    to_tsvector('simple', messages.body) @@ websearch_to_tsquery('simple', :query)
  SQL

end

This implementation only finds matching records, without a particular order. We will learn how to rank results later.

The string 'simple' denotes the PostgreSQL dictionary for tokenization and normalization. simple is a very basic dictionary that only lower-cases words and removes some stop words. We will use more advanced dictionaries when we implement stemming later.

Google-like queries are supported by default

The websearch_to_tsquery function is provided by PostgreSQL and supports Google-like query grammar:

Query	Result
`Message.search('foo bar')`	Both `foo` and `bar` must match somewhere
`Message.search('foo OR bar')`	Either `foo` or `bar` must match somewhere
`Message.search('"foo bar"')`	The phrase `foo bar` must match in that exact order
`Message.search('foo -bar')`	`foo` must match, but `bar` must not

Searching in multiple columns

To search more than one column, we can apply the @@ operator multiple times and OR the results:

class Message < ApplicationRecord
  
  scope :search, ->(query) { where(<<~SQL, query: query) }
    (to_tsvector('simple', messages.subject) @@ websearch_to_tsquery('simple', :query)) OR
    (to_tsvector('simple', messages.body) @@ websearch_to_tsquery('simple', :query))
  SQL

end

Improving performance by caching vectors

The SQL queries we used so far are inefficient. We tokenize and normalize every column value (messages.body) into a tsvector, use that vector for a single comparison, and then discard it. The next query will recreate that vector, even if the underlying data hasn't changed.

A more performant approach would be to cache the vector in a separate column, and update it when the underlying data changes. That way all queries can run on the same, cached vector value. We can even index the vector cache to further improve performance.

To implement this, we create a new column search_tsvector and create a GIN index over it:

class AddSearchTsvectorToMessage < ActiveRecord::Migration[7.2]
  def up
    add_column :messages, :search_tsvector, :tsvector
    add_index :messages, :search_tsvector, using: :gin
  end

  def down
    remove_column :messages, :search_tsvector
  end
end

We now adjust our model so it updates the #search_tsvector attribute when subject or body changes. We also change our Message.search method so it runs the query against the cached vector value:

class Message < ApplicationRecord
  
  scope :search, ->(query) { where(<<~SQL, query: query) }
    search_tsvector @@ websearch_to_tsquery('simple', :query)
  SQL
  
  after_save :index_for_search

  def index_for_search
    return unless saved_change_to_subject? || saved_change_to_body?

    Message.where(id: id).update_all(<<~SQL)
      search_tsvector =
        to_tsvector('simple', messages.subject) ||
        to_tsvector('simple', messages.body)
    SQL
  end

end

Ranking results

So far our Message.search method has returned all matching messages, in no particular order. We often want to promote results where words match more often, or where matching words are closer to each other. We can do so by ordering against the ts_rank_cd function:

class Message < ApplicationRecord
  
  scope :search, ->(query) {
    query_sql = sanitize_sql(["websearch_to_tsquery('simple', ?)", query])

    self
      .where(Arel.sql("search_tsvector @@ #{query_sql}"))
      .order(Arel.sql("ts_rank_cd(search_tsvector, #{query_sql}) DESC"))
  }

  # ...

end

Normalizing long documents

Because longer documents are more likely to contain a search term by chance, it can make sense to consider document length. For this the ts_rank_cd function takes a third argument with a normalization setting.

For example, a normalization of 1 will divide the rank by 1 + the logarithm of the document length:

class Message < ApplicationRecord
  
  scope :search, ->(query) {
    query_sql = sanitize_sql(["websearch_to_tsquery('simple', ?)", query])

    self
      .where(Arel.sql("search_tsvector @@ #{query_sql}"))
      .order(Arel.sql("ts_rank_cd(search_tsvector, #{query_sql}, 1) DESC"))
  }

  # ...

end

There are many normalization strategies. See the PostgreSQL documentation Show archive.org snapshot for a list.

Weighing columns differently

When we search multiple columns, we sometimes want to weigh one attribute higher than another. That weight affects how much a match contributes to the document's rank.

PostgreSQL offers four categories which (by default) have the following ranking weights:

Category	Weight
`A`	`1.0`
`B`	`0.4`
`C`	`0.2`
`D`	`0.1`

We can set these weights when we create the tsvector value:

class Message < ApplicationRecord
  # ...
  
  def index_for_search
    return unless saved_change_to_subject? || saved_change_to_body?

    Message.where(id: id).update_all(<<~SQL)
      search_tsvector =
        setweight(to_tsvector('simple', messages.subject), 'A') ||
        setweight(to_tsvector('simple', messages.body), 'B')
    SQL
  end

end

Boosting or penalizing records

Sometimes you want the rank to reflect additional criteria than just the frequency of word matches. In our example we may want to higher rank messages from people in our address book, or boost messages that we received recently.

We can influence the ranking value by multiplying or adding to the return value of the ts_rank_cd() function. The following would penalize old messages by reducing the rank for each day since the message was received:

class Message < ApplicationRecord
  
  scope :search, ->(query) {
    query_sql = sanitize_sql(["websearch_to_tsquery('simple', ?)", query])
    penalize_age = '0.0002 * EXTRACT(DAY FROM (now() - messages.created_at))'

    self
      .where(Arel.sql("search_tsvector @@ #{query_sql}"))
      .order(Arel.sql("ts_rank_cd(search_tsvector, #{query_sql}) - #{penalize_age} DESC"))
  }

  # ...

end

Stemming

Stemming will reduce inflected words to their word stem. E.g. the words "fishing", "fished", and "fisher" can all be reduced to the stem "fish". Stemming can make our search more forgiving, as small variations in word usage can still yield a match.

So far we have used the simple dictionary, which only applies basic normalization by lower-casing words. Instead of simple we can use any available PostgreSQL dictionary Show archive.org snapshot , or even create our own.

For example, the english dictionary provides stemming for English text and also removes English stopwords:

class Message < ApplicationRecord
  
  scope :search, ->(query) { where(<<~SQL, query: query) }
    search_tsvector @@ websearch_to_tsquery('english', :query)
  SQL
  
  after_save :index_for_search

  def index_for_search
    return unless saved_change_to_subject? || saved_change_to_body?

    Message.where(id: id).update_all(<<~SQL)
      search_tsvector =
        to_tsvector('english', messages.subject) ||
        to_tsvector('english', messages.body)
    SQL   
  end

end

Henning Koch

makandra.de

Say thanks6

Last edit

5 days ago

Henning Koch

License

Source code in this card is licensed under the MIT License.

Scoping a sunspot solr search by text using a string field

Assuming the following sunspot setup of the post class:

class Post < ActiveRecord::Base
  searchable do
    text :title
    string :state
    integer :category_ids
  end
end

In Sunspot you can scope your search via th...

Exporting to Excel from Rails without a gem

See this Railscast.

Basically you can simply write views like index.xlsx.erb:

<?xml version="1.0"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
  xmlns:o="urn...

Posted by Henning Koch to makandra dev (2025-03-19 21:10)