Read more

How to write complex migrations in Rails

Henning Koch
April 12, 2013Software engineer at makandra GmbH

Rails gives you migrations Show archive.org snapshot to change your database schema with simple commands like add_column or update.
Unfortunately these commands are simply not expressive enough to handle complex cases.

Illustration online protection

Rails professionals since 2007

Our laser focus on a single technology has made us a leader in this space. Need help?

  • We build a solid first version of your product
  • We train your development team
  • We rescue your project in trouble
Read more Show archive.org snapshot

This card outlines three different techniques you can use to describe nontrivial migrations in Rails / ActiveRecord.

Note that the techniques below should serve you well for tables with many thousand rows. Once your database tables grows to millions of rows, migration performance becomes an issue. We will talk about this in another card.

1. Use the full power of SQL

Plain SQL can be very powerful. Once you get to know it, it can actually be pleasant to write complex queries or updates with it.

As a first example, we have Post has_many :followers. For better performance, we want to cache the number of followers in Post#followers_count.

Now we need to write a migration that not only adds the followers_count column, but also updates it with the correct count for existing posts.
The migration below uses subqueries Show archive.org snapshot for this:

class AddFollowersCountToPosts < ActiveRecord::Migration

  def up
    add_column :posts, :followers_count, :integer
    update "UPDATE posts SET followers_count = (SELECT COUNT(*) FROM followers WHERE followers.post_id = posts.id)"
  end
  
  def down
    # not implemented
  end

end

As a second example, we have Project has_many :activities. Each project has a category. We now want to move this category from Project to Activity, so each activity can have a category of its own.

Now we need to write a migration that not only adds the category column to the activities table, but also copies the category from the project to each of its activities.
The migration below uses an UPDATE with JOIN for this:

class MoveCategoriesFromProjectToActivity < ActiveRecord::Migration

  def up
    add_column :activities, :category, :string
    update <<-SQL.squish
      UPDATE activities
      LEFT JOIN projects ON activities.project_id = projects.id
      SET activities.category = projects.category
    SQL
    remove_column :projects, :category
  end

  def down
    # not implemented
  end

end

Note that the <<-SQL bit is a HEREDOC, which is a nice way to write strings that span multiple lines.

Note that this example uses MySQL. Refer to this card if you use PostgreSQL.
If you need more than one JOIN refer to this card

2. Embed models into your migrations

You can never call models in a migration because your models will eventually evolve away from your migrations. This will block future developers (or application instances) from ever migrating a fresh database. Fortunately there is a way to decouple migrations from models by embedding models into the migration Show archive.org snapshot . This way you can freely change your application model without affecting old migrations.

As an example, we have Article has_many :vendors. We now introduce a boolean flag Vendor#current, which determines whether the article is currently being procured from that vendor.

We now need to write a migration, that not only adds the current column, but also flags the first vendor for each article as "current". The migration below uses embedded models to perform that update:

class AddCurrentToVendor < ActiveRecord::Migration

  class Vendor < ActiveRecord::Base
  end

  class Article < ActiveRecord::Base
    has_many :vendors, :class_name => 'AddCurrentToVendor::Vendor', :order => 'created_at'
  end

  def up
    add_column :vendors, :current, :boolean
    Vendor.reset_column_information
    Vendor.update_all(:current => false)
    Article.find_each do |article|
      first_vendor = article.vendors.first
      if first_vendor
        first_vendor.update_attributes!(:current => true)
      end
    end
  end

  def down
    remove_column :vendors, :current
  end
end

This technique has one important caveat: It is very easy to unintentionally use the actual model instead of the embedded model. Note how (in the example above) our two classes were namespaced into the migration class, so Vendor becomes AddCurrentToVendor::Vendor. Also note how when defining the association Article has_many :vendors, we must tell Rails that the class name is actually AddCurrentToVendor::Vendor. If you forget to do set the :class_name for a single association, you are using the actual Vendor class in app/models/vendor.rb, leading to the exact problems you wanted to avoid with this technique.

If you are unsure if you applied this technique correctly, here is a simple test: Delete all files from app/models/* and run your migration. If it completes successfully, you are golden.

Also note the Vendor.reset_column_information. Call this after you have changed a database table, or Rails might have cached an old schema and will silently ignore updates for the new column.

3. Use helpers methods from the database adapter

If your migration needs to analyse existing data, the database adapter gives you methods like select_rows Show archive.org snapshot or select_all Show archive.org snapshot . These methods return query results as simple arrays, hashes, strings, numbers, etc.:

select_all('SELECT * FROM articles')
# => [ { 'name': 'Bioshock Infinite', 'price': 40.0 }, { 'name': 'EVOLAND', 'price': 19.0 } ]

As an example, we have Article has_many :vendors. We now introduce a boolean flag Vendor#current, which determines whether the article is currently being procured from that vendor.

We now need to write a migration, that not only adds the current column, but also flags the first vendor for each article as "current". The migration below uses select_all to analyse existing records and build UPDATE statements with the results:

class AddCurrentToVendor < ActiveRecord::Migration

  def up
    add_column :vendors, :current, :boolean
    articles = select_all('SELECT * FROM articles')
    articles.each do |article|
      first_vendor = select_one("SELECT * FROM vendors WHERE article_id = #{quote article['id']}")
      if first_vendor
        update("UPDATE vendors SET current=#{quoted_true} WHERE id=#{quote first_vendor['id']}")
      end
    end
  end

  def down
    remove_column :vendors, :current
  end
  
end

Also see Rails: Talking to the database without instantiating ActiveRecord objects.

Henning Koch
April 12, 2013Software engineer at makandra GmbH
Posted by Henning Koch to makandra dev (2013-04-12 12:11)