Migrate searchkick from Elasticsearch client to Opensearch client without Downtime

General

A general overview of why and how we migrate can be found under Migrating from Elasticsearch to Opensearch.

This card deals with specifics concerning the use of searchkick.

Step 1: Make Opensearch available for Searchkick

In your Gemfile

# Search
gem 'searchkick'                   # needs to be > 5, to use Opensearch 2
gem 'elasticsearch'
gem 'opensearch-ruby'

In config/initializers/searchkick.rb (or wherever you have configured your Searchkick settings) add:

SEARCHKICK_CLIENT_TYPE = case Rails.env
when 'production', 'staging', 'development', 'test'
  :elasticsearch
else
  :opensearch
end

Searchkick.client_type = ENV.fetch('SEARCHKICK_CLIENT_TYPE', SEARCHKICK_CLIENT_TYPE).to_sym


ENV['OPENSEARCH_URL'] ||= case Rails.env
when 'production'
  OPENSEARCH_PRODUCTION_SERVER
when 'staging'
  OPENSEARCH_STAGING_SERVER
else
  'http://opensearch:9200'    # docker container name
end
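A misspelled SEARCHKICK_CLIENT_TYPE would otherwise only surface once Searchkick tries to talk to the cluster. A minimal sketch of a stricter lookup, written in plain Ruby so it also runs outside Rails. The helper name `resolve_client_type` and both constants are hypothetical and not part of Searchkick:

```ruby
# Hypothetical helper, not part of Searchkick: picks the client type for an
# environment and rejects values other than the two this card uses.
ALLOWED_CLIENT_TYPES = %i[elasticsearch opensearch].freeze
ELASTICSEARCH_ENVIRONMENTS = %w[production staging development test].freeze

def resolve_client_type(rails_env, env = ENV)
  # Same default as the case statement above: listed environments stay on
  # Elasticsearch, everything else uses Opensearch.
  default = ELASTICSEARCH_ENVIRONMENTS.include?(rails_env) ? :elasticsearch : :opensearch
  client_type = env.fetch('SEARCHKICK_CLIENT_TYPE', default).to_sym

  unless ALLOWED_CLIENT_TYPES.include?(client_type)
    raise ArgumentError, "Unknown SEARCHKICK_CLIENT_TYPE: #{client_type.inspect}"
  end

  client_type
end

# In the initializer this could replace the plain ENV.fetch call:
#   Searchkick.client_type = resolve_client_type(Rails.env)
```

Passing the environment hash as an argument keeps the helper easy to test without touching the real ENV.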

Step 2: Index on Staging to ensure that the Cluster is set up correctly

Deploy the above changes and coordinate with operations to set up the Opensearch cluster.
When the cluster exists, open a console on the staging server with the environment variable SEARCHKICK_CLIENT_TYPE set to "opensearch".

SEARCHKICK_CLIENT_TYPE=opensearch b rails c
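Before reindexing, it can be worth confirming that the console really talks to an Opensearch node. The sketch below assumes that the root endpoint of Opensearch reports "distribution": "opensearch" under "version", and that Elasticsearch has no such key. The helper name `opensearch_node?` is hypothetical and the sample responses are abbreviated, made-up data:

```ruby
require 'json'

# Hypothetical helper: returns true when the parsed root-endpoint response
# (GET /) looks like it came from an Opensearch node.
# Assumption: Opensearch reports version.distribution == "opensearch".
def opensearch_node?(info)
  info.dig('version', 'distribution') == 'opensearch'
end

# Abbreviated, made-up sample responses for illustration only.
opensearch_info    = JSON.parse('{"version": {"distribution": "opensearch", "number": "2.11.0"}}')
elasticsearch_info = JSON.parse('{"version": {"number": "7.17.0"}}')

puts opensearch_node?(opensearch_info)     # true
puts opensearch_node?(elasticsearch_info)  # false
```

In the console, the hash returned by `Searchkick.client.info` should be a suitable argument, assuming the underlying client exposes `info` as both the Elasticsearch and Opensearch Ruby clients usually do.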

If you use ingest pipelines, e.g. for extracting text from PDFs, make sure to initialize them first. In your console run

Searchkick.client.ingest.put_pipeline(<YOUR_PIPELINE_CONFIG>)

Then reindex all Models.

<MODELS_USING_SEARCHKICK>.each { |model| model.reindex }

Unfortunately, you cannot use the async option for reindexing, as that would hand the work to a background worker where searchkick is still configured to use Elasticsearch. However, if reindexing takes a long time, you can run the console inside screen, so the session keeps running after you disconnect and you don't have to keep your computer running.

g shell [environment]
screen -Rd opensearch-reindex
SEARCHKICK_CLIENT_TYPE=opensearch b rails c

Your application will still use Elasticsearch for the time being.
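With many models, a single failing reindex would abort the `each` loop from this step and leave the remaining models untouched. A minimal sketch of a more forgiving loop, in plain Ruby; the helper name `reindex_all` is hypothetical, and it only assumes that every model responds to `reindex` and `name`:

```ruby
# Hypothetical helper: reindexes each model in turn, reports the duration and
# collects failures so that one broken model does not abort the whole run.
# `models` is anything enumerable whose items respond to #reindex and #name.
def reindex_all(models, out: $stdout)
  failures = {}

  models.each do |model|
    started_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    begin
      model.reindex
      elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started_at
      out.puts format('%s reindexed in %.1fs', model.name, elapsed)
    rescue StandardError => e
      # Remember the failure and continue with the next model.
      failures[model.name] = e.message
      out.puts "#{model.name} failed: #{e.message}"
    end
  end

  failures
end
```

In the console you would pass your own list of models. The returned hash maps the names of failed models to their error messages and is empty when everything went through.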

Step 3: Make your Application use Opensearch

Docker

The default image comes with some demo configuration. You have to disable it via environment variables. We also disable the security plugin, because it forces connections via HTTPS, which we don't have in local environments.

Option 1: With plugins

  1. Create a Dockerfile
FROM opensearchproject/opensearch:2.11.0
RUN /usr/share/opensearch/bin/opensearch-plugin list | grep 'ingest-attachment' || /usr/share/opensearch/bin/opensearch-plugin install --batch ingest-attachment

I had some issues with the plugin being installed and in the end wasn't sure if it was included in the standard image or not. With the above command, it will be installed, if it is not already included.

  2. In your docker-compose.yml add:
opensearch:  
  image: registry.makandra.de/makandra/my-project/opensearch:1.0
  build: ./docker/opensearch
  command: ["./opensearch-docker-entrypoint.sh", "-Ediscovery.type=single-node"]
  ports:
    - "127.0.0.1:9200:9200"
  environment:
    - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m"
    - "DISABLE_SECURITY_PLUGIN=true"
    - "DISABLE_INSTALL_DEMO_CONFIG=true"
  volumes:
    - opensearch-data:/usr/share/opensearch/data
      
   ...
   
volumes:
  opensearch-data:   

Option 2: Without plugins

In your docker-compose.yml add:

opensearch:  
  image: opensearchproject/opensearch:latest # Alternatively, you can specify a version number
  ports:
    - "127.0.0.1:9200:9200"
  environment:
    - discovery.type=single-node
    - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m"
    - "DISABLE_SECURITY_PLUGIN=true"
    - "DISABLE_INSTALL_DEMO_CONFIG=true"
  volumes:
    - opensearch-data:/usr/share/opensearch/data
      
   ...
   
volumes:
  opensearch-data:   

Note

You can run both Elasticsearch and Opensearch at the same time on your local host. In that case, however, you have to change the port mapping of one of the containers to something other than "127.0.0.1:9200:9200".

CI

In your .gitlab-ci.yml add:


  # add to environment variables
  OPENSEARCH_URL: http://opensearch:9200                
  OPENSEARCH_JAVA_OPTS: '-Xms512m -Xmx512m'
  DISABLE_SECURITY_PLUGIN: 'true'
  DISABLE_INSTALL_DEMO_CONFIG: 'true'

   ...
   
  # define service
  services:
    - &opensearch
      name: opensearchproject/opensearch:latest  # use the same image you used in your docker-compose.yml
      alias: opensearch
      command: ["./opensearch-docker-entrypoint.sh", "-Ediscovery.type=single-node"]

Code

When initializing searchkick on a model:

module DoesSearch
  ENVIRONMENTS_WITH_MULTIPLE_NODES_IN_SEARCH_CLUSTER = %w[production]
  NUMBER_OF_REPLICAS = ENVIRONMENTS_WITH_MULTIPLE_NODES_IN_SEARCH_CLUSTER.include?(Rails.env) ? 1 : 0

  as_trait do
    # Pass the replica count explicitly, since manually patched index settings get overwritten
    options = {
      settings: { number_of_replicas: NUMBER_OF_REPLICAS },
    }

    searchkick **options

    ...
  end
end

class Foo < ActiveRecord::Base
  include DoesSearch

  ...
end

Our operations team used to patch the number_of_replicas setting manually. The Opensearch client of searchkick seems to overwrite the patched options, so we need to pass them via code. We usually only have multiple nodes in our cluster on production, so creating replicas makes no sense in any other environment.

Note

Opensearch SHOULD be a drop-in replacement for Elasticsearch. However, depending on whether you use searchkick out of the box or have added custom code, you might have to make some further adjustments.
I used a custom combined_fields query in Elasticsearch, which does not exist in Opensearch. However, the multi_match query in cross_fields mode in Opensearch does the same. Opensearch should have all the functionality that exists in Elasticsearch; it might just be called a little differently.
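The query rewrite mentioned above can be sketched as a small transformation of the query hash. This is an illustration only: the helper name `combined_fields_to_multi_match` and the sample query are hypothetical, and only the keys query, fields and operator are carried over.

```ruby
# Hypothetical helper: rewrites an Elasticsearch combined_fields clause into a
# multi_match clause in cross_fields mode. Only the keys query, fields and
# operator are carried over; anything else would need checking by hand.
def combined_fields_to_multi_match(clause)
  combined = clause.fetch(:combined_fields)

  multi_match = {
    query: combined.fetch(:query),
    fields: combined.fetch(:fields),
    type: 'cross_fields'
  }
  multi_match[:operator] = combined[:operator] if combined.key?(:operator)

  { multi_match: multi_match }
end

# Made-up sample clause for illustration.
elasticsearch_clause = {
  combined_fields: { query: 'database systems', fields: %w[title abstract], operator: 'and' }
}

p combined_fields_to_multi_match(elasticsearch_clause)
```

Scoring details may differ between the two query types, so relevance-sensitive searches are worth re-testing on Staging.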

Now go back to your config/initializers/searchkick.rb and change your environments to use Opensearch. I recommend leaving Production on Elasticsearch until you've made sure that everything works on Staging.

SEARCHKICK_CLIENT_TYPE = case Rails.env
when 'production'
  :elasticsearch
else
  :opensearch
end

Searchkick.client_type = ENV.fetch('SEARCHKICK_CLIENT_TYPE', SEARCHKICK_CLIENT_TYPE).to_sym

Step 4: Production Rollout

  1. Repeat Step 2 for Production.
  2. Remove gem 'elasticsearch' from your Gemfile.
  3. Change your config/initializers/searchkick.rb and remove all Elasticsearch and client_type logic. With only opensearch-ruby in your Gemfile, searchkick will automatically use Opensearch as client.
  4. Deploy your changes to production.
  5. Index all content that has been added or changed since your initial indexing to Opensearch.
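The catch-up indexing in the last step amounts to selecting everything that changed after the initial indexing run started. A minimal sketch in plain Ruby, assuming records carry an `updated_at` timestamp; the helper name `changed_since`, the `Record` struct and the dates are made up for illustration:

```ruby
require 'time'

# Hypothetical helper: picks the records that changed after the initial
# indexing run, so only those need to be sent to Opensearch again.
# `records` is anything enumerable whose items respond to #updated_at.
def changed_since(records, initial_indexing_started_at)
  records.select { |record| record.updated_at >= initial_indexing_started_at }
end

# Illustrative stand-in for a model; real records would come from the database.
Record = Struct.new(:id, :updated_at)

initial_indexing_started_at = Time.parse('2024-01-10 08:00:00 UTC')
records = [
  Record.new(1, Time.parse('2024-01-09 12:00:00 UTC')),
  Record.new(2, Time.parse('2024-01-10 09:30:00 UTC'))
]

p changed_since(records, initial_indexing_started_at).map(&:id)  # [2]
```

In the application itself the same selection would more likely be a database query, for example a `where` scope on `updated_at` followed by `reindex`, assuming your searchkick version supports reindexing a relation. Records deleted in the meantime are not covered by this approach; a full reindex after the switch avoids that gap.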

Step 5: Profit

Your application now uses Opensearch.

Bruno Sedler 5 months ago