How to write complex migrations in Rails

Updated . Posted . Visible to the public. Repeats.

Rails gives you migrations Show archive.org snapshot to change your database schema with simple commands like add_column or update.
Unfortunately these commands are simply not expressive enough to handle complex cases.

This card outlines three different techniques you can use to describe nontrivial migrations in Rails / ActiveRecord.

Note that the techniques below should serve you well for tables with many thousand rows. Once your database tables grows to millions of rows, migration performance becomes an issue. We will talk about this in another card.

1. Use the full power of SQL

Plain SQL can be very powerful. Once you get to know it, it can actually be pleasant to write complex queries or updates with it.

As a first example, we have Post has_many :followers. For better performance, we want to cache the number of followers in Post#followers_count.

Now we need to write a migration that not only adds the followers_count column, but also updates it with the correct count for existing posts.
The migration below uses subqueries Show archive.org snapshot for this:

class AddFollowersCountToPosts < ActiveRecord::Migration

  def up
    add_column :posts, :followers_count, :integer
    
    update <<~SQL.squish
      UPDATE posts
      SET followers_count = (
        SELECT COUNT(*)
        FROM followers
        WHERE followers.post_id = posts.id
      )
    SQL
  end
  
  def down
    remove_column :posts, :followers_count
  end

end

As a second example, we have Project has_many :activities. Each project has a category. We now want to move this category from Project to Activity, so each activity can have a category of its own.

Now we need to write a migration that not only adds the category column to the activities table, but also copies the category from the project to each of its activities.
The migration below uses an UPDATE with JOIN for this:

class MoveCategoriesFromProjectToActivity < ActiveRecord::Migration

  def up
    add_column :activities, :category, :string
    update <<-SQL.squish
      UPDATE activities
      LEFT JOIN projects ON activities.project_id = projects.id
      SET activities.category = projects.category
    SQL
    remove_column :projects, :category
  end

  def down
    # not implemented
  end

end

Note that the <<-SQL bit is a HEREDOC, which is a nice way to write strings that span multiple lines.

Note that this example uses MySQL. Refer to this card if you use PostgreSQL.
If you need more than one JOIN refer to this card

2. Embed models into your migrations

You can never call models in a migration because your models will eventually evolve away from your migrations. This will block future developers (or application instances) from ever migrating a fresh database. Fortunately there is a way to decouple migrations from models by embedding models into the migration Show archive.org snapshot . This way you can freely change your application model without affecting old migrations.

As an example, we have Article has_many :vendors. We now introduce a boolean flag Vendor#current, which determines whether the article is currently being procured from that vendor.

We now need to write a migration, that not only adds the current column, but also flags the first vendor for each article as "current". The migration below uses embedded models to perform that update:

class AddCurrentToVendor < ActiveRecord::Migration

  class Vendor < ActiveRecord::Base
  end

  class Article < ActiveRecord::Base
    has_many :vendors, :class_name => 'AddCurrentToVendor::Vendor', :order => 'created_at'
  end

  def up
    add_column :vendors, :current, :boolean

    Vendor.reset_column_information # reset database caching
    Vendor.update_all(:current => false)

    Article.reset_column_information # reset database caching
    Article.find_each do |article|
      first_vendor = article.vendors.first
      if first_vendor
        first_vendor.update_attributes!(:current => true)
      end
    end
  end

  def down
    remove_column :vendors, :current
  end
end

This technique has one important caveat: It is very easy to unintentionally use the actual model instead of the embedded model. Note how (in the example above) our two classes were namespaced into the migration class, so Vendor becomes AddCurrentToVendor::Vendor. Also note how when defining the association Article has_many :vendors, we must tell Rails that the class name is actually AddCurrentToVendor::Vendor. If you forget to do set the :class_name for a single association, you are using the actual Vendor class in app/models/vendor.rb, leading to the exact problems you wanted to avoid with this technique.

If you are unsure if you applied this technique correctly, here is a simple test: Delete all files from app/models/* and run your migration. If it completes successfully, you are golden.

Also note the use of reset_column_information Show archive.org snapshot . As a rule of thumb, call it once for every migration model, as Rails might have cached an old schema. In the example above, the vendor table just got a new column which would silently be ignored by the following call to update_all. But you should also use reset_column_information if you don't alter the schema in the current migration as it might still cause problems when migrating from scratch due to the caching of latter migrations.

3. Use helpers methods from the database adapter

If your migration needs to analyse existing data, the database adapter gives you methods like select_rows Show archive.org snapshot or select_all Show archive.org snapshot . These methods return query results as simple arrays, hashes, strings, numbers, etc.:

select_all('SELECT * FROM articles')
# => [ { 'name': 'Bioshock Infinite', 'price': 40.0 }, { 'name': 'EVOLAND', 'price': 19.0 } ]

As an example, we have Article has_many :vendors. We now introduce a boolean flag Vendor#current, which determines whether the article is currently being procured from that vendor.

We now need to write a migration, that not only adds the current column, but also flags the first vendor for each article as "current". The migration below uses select_all to analyse existing records and build UPDATE statements with the results:

class AddCurrentToVendor < ActiveRecord::Migration

  def up
    add_column :vendors, :current, :boolean
    articles = select_all('SELECT * FROM articles')
    articles.each do |article|
      first_vendor = select_one("SELECT * FROM vendors WHERE article_id = #{quote article['id']}")
      if first_vendor
        update("UPDATE vendors SET current=#{quoted_true} WHERE id=#{quote first_vendor['id']}")
      end
    end
  end

  def down
    remove_column :vendors, :current
  end
  
end

Also see Rails: Talking to the database without instantiating ActiveRecord objects.

Henning Koch
Last edit
Dominik Schöler
Keywords
insane, hard, difficult, killing, kittens
License
Source code in this card is licensed under the MIT License.
Posted by Henning Koch to makandra dev (2013-04-12 10:11)