An incomplete guide to migrating a Rails application from paperclip to carrierwave

Emanuel
September 03, 2018
Software engineer at makandra GmbH

In this example we assume that not only the storage gem changes but also the file structure on disk.

A general approach

Part A: Create a commit that includes a script which copies the existing files to the new file structure.

Part B: Create a commit that removes all paperclip logic and replaces it with the same carrierwave code you used in the first commit.

Part A

Here are some implementation details you might want to reuse:

  • Use the existing models to read the files from

  • Use your own carrierwave models to write the files to the new file structure

  • Support reruns of the script: if files were added or deleted and you rerun the script, those changes are picked up.

  • Use sidekiq (pros: parallel jobs; overview of failed and pending jobs)

  • Add a new column for each file column of a model, e.g. avatar (paperclip uses avatar_file_name).

  • Translate the paperclip style geometries into carrierwave versions:

    • E.g. 148x148# is equivalent to resize_to_fill: [148, 148].

    • E.g. 1600x1200> is equivalent to resize_to_limit: [1600, 1200].

    • E.g. 100x200 is equivalent to resize_to_fit: [100, 200].

  • Carrierwave does not strip EXIF data from the original image and its versions (this can e.g. result in rotated images).

  • Add a column, e.g. carrierwave_migrated_at, to all models that have a file column. Set the timestamp in your migration script once you have processed the files of a record. By comparing carrierwave_migrated_at with updated_at you can easily rerun the script so that only modified records are processed again. This can save you a lot of time and resources.
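
The geometry examples in the list above can be captured in a small lookup, which helps when an application has many paperclip styles to translate. This is only a sketch in plain Ruby: the helper name carrierwave_processor and the constant are hypothetical, and it covers just the three geometry forms mentioned above (no other modifiers, no width-only geometries).

```ruby
# Hypothetical helper: maps a paperclip geometry string to the carrierwave
# processor name and arguments listed above. Only "WxH", "WxH#" and "WxH>"
# are handled; anything else raises.
PROCESSOR_BY_MODIFIER = {
  '#' => :resize_to_fill,  # "148x148#"
  '>' => :resize_to_limit, # "1600x1200>"
  ''  => :resize_to_fit    # "100x200"
}.freeze

def carrierwave_processor(geometry)
  match = /\A(\d+)x(\d+)([#>]?)\z/.match(geometry)
  raise ArgumentError, "unsupported geometry: #{geometry.inspect}" unless match

  width, height, modifier = match.captures
  { PROCESSOR_BY_MODIFIER.fetch(modifier) => [width.to_i, height.to_i] }
end
```

The returned hash has the same shape as the argument you would hand to process inside a version block of an uploader, so it can serve as a checklist while writing the uploaders by hand.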

Normally we would place a script like this in lib/scripts/. But this folder is excluded from the Rails autoloading path on purpose, so a Sidekiq worker will not find the required classes when invoked. Either choose a namespaced place in app/models or require the files from lib/scripts manually in the workers. If you use Ruby < 2.5 you need to choose your class names carefully to avoid wrong constant lookups caused by the Rails autoloading mechanism.

Part B

  • Replace all paperclip code with carrierwave logic. Copy as much as possible from Part A (it should be the same code).
  • Set default images / fallbacks
  • Carrierwave has no default version (paperclip has a default style), so pass the version explicitly wherever you build urls
  • Create a script or migration that replaces all old urls in e.g. wysiwyg text fields
  • Consider keeping the paperclip columns for a while (it makes debugging issues easier)

As many parts of file processing are not covered by tests (resolution, fallback images and many more), you need to know the application well and do a lot of manual testing.

All together

  • Deploy and run Part A (this may take several days)
  • Before deploying Part B
    • Run Part A again (or maybe twice, depending on how long it runs and how many files could have changed in the meantime)
    • You can now enable maintenance mode (or accept that you will lose some files), run Part A one last time and deploy
    • If you need to fix urls run the migration or script
    • Disable maintenance
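
The repeated runs of Part A only stay cheap if each run skips records that have not changed since they were last migrated. A minimal sketch of that rule in plain Ruby, assuming the carrierwave_migrated_at column described in Part A; the Struct is a stand-in for a real model and needs_migration? is a hypothetical name:

```ruby
# Stand-in for an ActiveRecord model; only the two timestamps matter here.
Record = Struct.new(:id, :updated_at, :carrierwave_migrated_at)

# A record must be (re)processed if it was never migrated, or if it was
# modified after its last migration run.
def needs_migration?(record)
  record.carrierwave_migrated_at.nil? ||
    record.updated_at > record.carrierwave_migrated_at
end

# Returns the ids a rerun would have to enqueue.
def ids_to_migrate(records)
  records.select { |record| needs_migration?(record) }.map(&:id)
end
```

In a real application the same condition would better be expressed in SQL so that not every record has to be loaded. Note also that the rule only works if writing carrierwave_migrated_at does not itself bump updated_at, for example by writing it with update_column.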

Error prone parts

Sidekiq

If you haven't used sidekiq heavily in the project before, starting many thousands of jobs might reveal unexpected issues:

  • Wrong order of sidekiq queues: Your queue might block the whole application and monitoring. Monitoring or mailer queues should always have a higher priority than the default queue. Consider using a dedicated queue with the lowest priority.
  • Wrong database pool size: You cannot set an arbitrary number of workers without getting pool size exceptions. The pool size (default 5) needs to match the maximum number of Passenger workers and Sidekiq workers. On a shared database server every possible connection is "expensive", so setting the pool size to a very high number is not a good idea (also take the number of application servers into account when calculating the total number of possible connections to the shared database server).
  • Blowing up the fail inbox: If you have configured an exception notifier, every failed job will trigger an email.
  • Blocking the server: Mogrify (the tool behind MiniMagick) sometimes freezes (with many thousands of jobs this is bound to happen). A frozen process causes high CPU load, slows down the application and blocks its sidekiq worker. You need to watch your servers carefully and kill frozen processes manually. If you use e.g. GlusterFS, the general load on the server will be much higher than usual. Expect performance issues while this migration runs.
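
The pool size arithmetic from the list above can be written down as a rough upper bound. This is a back-of-the-envelope sketch, not a sizing tool: it assumes that every process (Passenger worker or Sidekiq process) holds its own connection pool and that each Sidekiq thread may need one connection at the same time. All method and parameter names are made up for illustration.

```ruby
# Worst case for one application server: every process fills its own pool.
def max_connections_per_server(pool_size:, passenger_processes:, sidekiq_processes:)
  pool_size * (passenger_processes + sidekiq_processes)
end

# Worst case for the shared database server across all application servers.
def max_connections_total(servers:, pool_size:, passenger_processes:, sidekiq_processes:)
  servers * max_connections_per_server(
    pool_size: pool_size,
    passenger_processes: passenger_processes,
    sidekiq_processes: sidekiq_processes
  )
end

# A Sidekiq process with more threads than pooled connections can run
# into pool size exceptions under load.
def pool_large_enough?(pool_size:, sidekiq_concurrency:)
  pool_size >= sidekiq_concurrency
end
```

For example, 3 servers with a pool size of 5, 6 Passenger processes and 1 Sidekiq process each can open up to 3 * 5 * (6 + 1) = 105 connections, which is the number to compare against the connection limit of the database server.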

Corrupt files

There will be many files that cannot be processed, are missing in the old file structure (which normally should not happen) or simply fail to get copied by Carrierwave.

  • Check directly after the migration that every column filled by paperclip is now also filled by carrierwave
  • Check if there are records with missing files. Here is a non-performant example (User.where.not(avatar: nil).select { |user| user.avatar.blank? }.size)
  • Check if all records are valid
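
The first two checks can be combined into one pass that reports record IDs instead of just a count, which makes the follow-up debugging easier. The following is a plain Ruby sketch under assumptions: the Struct stands in for a model, and paperclip_file_name and carrierwave_path are hypothetical attribute names for "the old column" and "the path of the file carrierwave wrote".

```ruby
# Stand-in for a model row. carrierwave_path is nil if nothing was written.
MigratedRecord = Struct.new(:id, :paperclip_file_name, :carrierwave_path)

# Returns the ids of records that had a file under paperclip but have no
# file on disk after the migration.
def ids_with_missing_files(records)
  records.select do |record|
    had_file = !record.paperclip_file_name.to_s.empty?
    has_file = !record.carrierwave_path.nil? && File.file?(record.carrierwave_path)
    had_file && !has_file
  end.map(&:id)
end
```

Records that never had a file are ignored on purpose, so the result only lists files that were lost or never copied.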

URL replacements

If you need to replace references to files (e.g. in text / wysiwyg / markdown) manually, consider these notes:

  • Undoing a broken text is very hard or impossible.
  • The text can include any type of file link (e.g. a link to a profile image), not only image links that were inserted the intended way
  • Carefully look at the code and the result.
  • Try to print all things you're manipulating on the console and/or in a log file. Check samples for correctness. Always add IDs to the output, so in case you did something wrong you can better find the wrong objects.
  • Request a code review
  • Make additional checks in development after you have manipulated the text. Here are some examples:
    • Replace the URLs in the text field, then find all URLs again and check that they exist on disk.
    • Open your browser's development console and watch for 404s while browsing (they may indicate wrong references)
    • Tail the server log for ActionController::RoutingError
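
The "replace, log with IDs, then verify on disk" routine from the list above could look roughly like this. Everything about the URL layout is an assumption made for the example: the old pattern /system/images/:id/:style/:filename, the new pattern /uploads/image/file/:id/:filename and the version prefix are hypothetical and need to be adapted to the real application. The regular expressions are deliberately simple and would, for instance, swallow a full stop directly after a URL.

```ruby
# Hypothetical old (paperclip) URL layout: /system/images/:id/:style/:filename
OLD_URL = %r{/system/images/(\d+)/(\w+)/([\w.\-]+)}
# Hypothetical new (carrierwave) URL layout, used to find URLs again afterwards.
NEW_URL = %r{/uploads/[\w/.\-]+}

# Rewrites all old URLs in a text and logs every change with the record id.
def rewrite_urls(record_id, text, log: $stdout)
  text.gsub(OLD_URL) do |old|
    id, style, filename = Regexp.last_match.captures
    prefix = style == 'original' ? '' : "#{style}_"
    replacement = "/uploads/image/file/#{id}/#{prefix}#{filename}"
    log.puts "record=#{record_id} #{old} -> #{replacement}"
    replacement
  end
end

# Finds the rewritten URLs again and returns those without a file on disk.
def urls_missing_on_disk(text, public_root)
  text.scan(NEW_URL).reject { |url| File.file?(File.join(public_root, url)) }
end
```

Run it against a copy of the data first and keep the log: because undoing a broken text is hard, the logged old and new URLs per record ID are the only practical way back.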

Time

As usual, migrations take time. Copying the files can take many hours, and each time you fix an issue you need to wait for the whole processing again. This pain will follow you through the whole process:

  • Deployment planning (often needs an exact time)
  • UATs, preparations and clean up after a broken deployment
  • Fixing bugs
Posted by Emanuel to makandra dev (2018-09-03 10:02)