Updated: How to write complex migrations in Rails

Added a general advise to migration models: Using reset_column_information should prevent some common issues when migrating from scratch.

Changes

  • Rails gives you [migrations](http://guides.rubyonrails.org/migrations.html) to change your database schema with simple commands like `add_column` or `update`.
  • Unfortunately these commands are simply not expressive enough to handle complex cases.
  • This card outlines **three different techniques** you can use to describe nontrivial migrations in Rails / ActiveRecord.
  • Note that the techniques below should serve you well for tables with many thousand rows. Once your database tables grows to millions of rows, migration performance becomes an issue. We will talk about this in another card.
  • # 1. Use the full power of SQL
  • Plain SQL can be very powerful. Once you get to know it, it can actually be pleasant to write complex queries or updates with it.
  • As a first example, we have `Post has_many :followers`. For better performance, we want to cache the number of followers in `Post#followers_count`.
  • Now we need to write a migration that not only adds the `followers_count` column, but also updates it with the correct count for existing posts.
  • The migration below uses [subqueries](http://dev.mysql.com/doc/refman/5.0/en/subqueries.html) for this:
  • ```ruby
  • class AddFollowersCountToPosts < ActiveRecord::Migration
  • def up
  • add_column :posts, :followers_count, :integer
  • update "UPDATE posts SET followers_count = (SELECT COUNT(*) FROM followers WHERE followers.post_id = posts.id)"
  • end
  • def down
  • # not implemented
  • end
  • end
  • ```
  • As a second example, we have `Project has_many :activities`. Each project has a category. We now want to move this category from `Project` to `Activity`, so each activity can have a category of its own.
  • Now we need to write a migration that not only adds the `category` column to the `activities` table, but also copies the category from the project to each of its activities.
  • The migration below uses an [UPDATE with JOIN](https://makandracards.com/makandra/32357-postgresql-vs-mysql-how-to-update-using-a-join) for this:
  • ```ruby
  • class MoveCategoriesFromProjectToActivity < ActiveRecord::Migration
  • def up
  • add_column :activities, :category, :string
  • update <<-SQL.squish
  • UPDATE activities
  • LEFT JOIN projects ON activities.project_id = projects.id
  • SET activities.category = projects.category
  • SQL
  • remove_column :projects, :category
  • end
  • def down
  • # not implemented
  • end
  • end
  • ```
  • Note that the `<<-SQL` bit is a [HEREDOC](https://makandracards.com/makandra/1675-using-heredoc-for-prettier-ruby-code), which is a nice way to write strings that span multiple lines.
  • Note that this example uses MySQL. Refer to [this card](https://makandracards.com/makandra/32357-postgresql-vs-mysql-how-to-update-using-a-join) if you use PostgreSQL.
  • If you need more than one JOIN refer to [this card](https://makandracards.com/makandra/48741-postgresql-how-to-update-multiple-attributes-with-multiple-joins)
  • # 2. Embed models into your migrations
  • You can never call models in a migration because your models *will* eventually evolve away from your migrations. This will block future developers (or application instances) from ever migrating a fresh database. Fortunately there is a way to decouple migrations from models by [embedding models into the migration](http://gem-session.com/2010/03/how-to-use-models-in-your-migrations-without-killing-kittens). This way you can freely change your application model without affecting old migrations.
  • As an example, we have `Article has_many :vendors`. We now introduce a boolean flag `Vendor#current`, which determines whether the article is currently being procured from that vendor.
  • We now need to write a migration, that not only adds the `current` column, but also flags the first vendor for each article as "current". The migration below uses embedded models to perform that update:
  • ```ruby
  • class AddCurrentToVendor < ActiveRecord::Migration
  • class Vendor < ActiveRecord::Base
  • end
  • class Article < ActiveRecord::Base
  • has_many :vendors, :class_name => 'AddCurrentToVendor::Vendor', :order => 'created_at'
  • end
  • def up
  • add_column :vendors, :current, :boolean
  • +
  • Vendor.reset_column_information
  • Vendor.update_all(:current => false)
  • +
  • + Article.reset_column_information
  • Article.find_each do |article|
  • first_vendor = article.vendors.first
  • if first_vendor
  • first_vendor.update_attributes!(:current => true)
  • end
  • end
  • end
  • def down
  • remove_column :vendors, :current
  • end
  • end
  • ```
  • This technique has one important caveat: It is very easy to unintentionally use the actual model instead of the embedded model. Note how (in the example above) our two classes were namespaced into the migration class, so `Vendor` becomes `AddCurrentToVendor::Vendor`. Also note how when defining the association `Article has_many :vendors`, we must tell Rails that the class name is actually `AddCurrentToVendor::Vendor`. If you forget to do set the `:class_name` for a **single** association, you are using the actual `Vendor` class in `app/models/vendor.rb`, leading to the exact problems you wanted to avoid with this technique.
  • If you are unsure if you applied this technique correctly, here is a simple test: Delete all files from `app/models/*` and run your migration. If it completes successfully, you are golden.
  • -Also note the `Vendor.reset_column_information`. Call this after you have changed a database table, or Rails might have cached an old schema and will silently ignoreupdates for the new column.
  • -
  • +Also note the use of `reset_column_information`. As a rule of thumb, call it once for every migration model, as Rails might have cached an old schema. In the example above, the `vendor` table just got a new column which would silently be ignored by the following call to `update_all`. But you should also use `reset_column_information` if you don't alter the schema in the current migration as it might still cause problems when migrating from scratch.
  • +
  • +
  • # 3. Use helpers methods from the database adapter
  • If your migration needs to analyse existing data, the database adapter gives you methods like [`select_rows`](http://apidock.com/rails/v3.2.13/ActiveRecord/ConnectionAdapters/DatabaseStatements/select_rows) or [`select_all`](http://apidock.com/rails/v3.2.13/ActiveRecord/ConnectionAdapters/DatabaseStatements/select_all). These methods return query results as simple arrays, hashes, strings, numbers, etc.:
  • ```rb
  • select_all('SELECT * FROM articles')
  • # => [ { 'name': 'Bioshock Infinite', 'price': 40.0 }, { 'name': 'EVOLAND', 'price': 19.0 } ]
  • ```
  • As an example, we have `Article has_many :vendors`. We now introduce a boolean flag `Vendor#current`, which determines whether the article is currently being procured from that vendor.
  • We now need to write a migration, that not only adds the `current` column, but also flags the first vendor for each article as "current". The migration below uses `select_all` to analyse existing records and build `UPDATE` statements with the results:
  • ```rb
  • class AddCurrentToVendor < ActiveRecord::Migration
  • def up
  • add_column :vendors, :current, :boolean
  • articles = select_all('SELECT * FROM articles')
  • articles.each do |article|
  • first_vendor = select_one("SELECT * FROM vendors WHERE article_id = #{quote article['id']}")
  • if first_vendor
  • update("UPDATE vendors SET current=#{quoted_true} WHERE id=#{quote first_vendor['id']}")
  • end
  • end
  • end
  • def down
  • remove_column :vendors, :current
  • end
  • end
  • ```
  • Also see [Rails: Talking to the database without instantiating ActiveRecord objects](/makandra/42942).
Michael Leimstädtner 4 days ago