Best practices: Large data migrations from legacy systems

Migrating data from a legacy into a new system can be a surprisingly large undertaking. We have done this a few times. While there are significant differences from project to project, we do have a list of general suggestions.

Before you start, talk to someone who has done it before, and read the following hints:

Understand the old system

Before any technical considerations, you need to understand the old system as best as possible. If feasible, do not only look at its API, or database, or frontend, but let a user of the old system sho...

How to make changes to a Ruby gem (as a Rails developer)

At makandra, we've built a few gems over the years. Some of these are quite popular: spreewald (> 1M downloads), active_type (> 1M downloads), and geordi (> 200k downloads)

Developing a Ruby gem is different from developing Rails applications, with the biggest difference: there is no Rails. This means:

  • no defined structure (neither for code nor directories)
  • no autoloading of classes, i.e. you need to require all files yourself
  • no active_support niceties

Also, their scope...

Capistrano + Rails: Tagging production deploys

Just like Ruby Gems tag their version releases to the corresponding Git commit, it can be helpful to track production deploys within the commit history. This task does the tagging for you.

Capistrano 3

# lib/capistrano/tasks/deploy.rb
namespace :deploy do
  ...

  desc 'Tag the deployed revision'
  task :tag_revision do
    date = Date.today.to_s

    puts `git tag deploy-#{date} #{fetch :current_revision}`
    puts `git push --tags origin`
  end

end
# config/deploy/production.rb
after 'deploy:finished', 'deploy:tag_revi...

Capistrano + Rails: Automatically skipping asset compilation when assets have not changed

In medium-sized to large Rails applications, asset compilation can take several minutes. In order to speed up deployment, asset precompilation can be skipped. This card automates the process.

Capistrano 3

namespace :deploy do
  
  desc 'Automatically skip asset compile if possible'
  task :auto_skip_assets do
    asset_locations = %r(^(Gemfile\.lock|app/assets|lib/assets|vendor/asset))

    revisions = []
    on roles :app do
      within current_path do
        revisions << capture(:cat, 'REVISION').strip
      ...

Jasmine: Adding custom matchers

Definition

A matcher is a function that returns an object with a compare key. Usually it is registered with beforeEach:

beforeEach(() => {
  jasmine.addMatchers({
  
    // Example matcher
    toBeAnything() {
      return {
        compare(actualValue, ...matcherArguments) {
           // Do some computations here ...
         
           // Return whether the actualValue matches the expectation
           return {pass: true}
         }
      }
    }
  })
})

Usage

expect(actualValue).toBeAnything(...matcherArg...

screenfull.js: Simple wrapper for cross-browser usage of the JavaScript Fullscreen API

Using the JS fullscreen API is painful because all browers use different methods and events and you need to use lots of boilerplate code to make your application work in all browsers.

The "screenfull" library wraps that for you, including events.

Examples

The linked GitHub repo contains some information. You basically use the library like this:

// Make an element go fullscreen
screenfull.request(element)

// Leave fullscreen
screenfull.exit()

...

Understanding SQL compatibility modes in MySQL and MariaDB

MySQL and MariaDB have an SQL mode setting which changes how MySQL behaves.

The SQL mode value is comprised of multiple flags like "STRICT_TRANS_TABLES, NO_ZERO_IN_DATE". Each flag activates or disables a particular behavior.

The default SQL mode varies widly between versions of MySQL and MariaDB. In general, more recent versions of MySQL and MariaDB have stricter settings than older versions, and MySQL has stricter settings than the more liberal MariaDB.

If your app explodes ...

How to mirror a git repo to a new remote

Say you want to move a git repository from one remote (perhaps Github) to another (perhaps Gitlab).

If you have the repo checked out, you still should make sure to mirror all branches of the old remote, not only those you happen to have checked our. Otherwise the target repo will become a copy of your current repo, and not the source repo, potentially missing commits. You can use this:

git remote rename origin old-origin
git remote add origin <new-remote>
git fetch old-origin --prune
git push --prune origin +refs/remotes/old-origin/*:r...

Carrierwave: Deleting files outside of forms

TL;DR Use user.update!(remove_avatar: true) to delete attachments outside of forms. This will have the same behavior as if you were in a form.


As you know, Carrierwave file attachments work by mounting an Uploader class to an attribute of the model. Though the database field holds the file name as string, calling the attribute will always return the uploader, no matter if a file is attached or not. (Side note: use #present? on the uploader to check if the file exists.)

class User < ApplicationRecord
  mount :avatar, ...

Git: creating and deleting a tag

Git has two kind of tags:

  • annotated
  • lightweight

Annotated tags are stored as full objects in the Git database. They’re checksummed; contain the tagger name, email, and date; have a tagging message; and can be signed and verified with GNU Privacy Guard (GPG)

The following command will create a (lightweight) tag:

git tag v0.1

An annotated tag is created by:

git tag -a v0.1 -m "a special tag message"

A normal git push will not push the tag. So in order to publish your (local) tags, you have to

git push --tags
`...

RSpec: Run a single spec (Example or ExampleGroup)

RSpec allows you to mark a single Example/ExampleGroup so that only this will be run. This is very useful when using a test runner like guard.

Add the following config to spec/spec_helper.rb:

RSpec.configure do |config|
  # These two settings work together to allow you to limit a spec run
  # to individual examples or groups you care about by tagging them with
  # `:focus` metadata. When nothing is tagged with `:focus`, all examples
  # get run.
  config.filter_run_including :focus => true
  config.run_all_when_everything_filtere...

Fixing flaky E2E tests

An end-to-end test (E2E test) is a script that remote-controls a web browser with tools like Selenium WebDriver. This card shows basic techniques for fixing a flaky E2E test suite that sometimes passes and sometimes fails.

Although many examples in this card use Ruby, Cucumber and Selenium, the techniques are applicable to all languages and testing tools.

Why tests are flaky

Your tests probably look like this:

When I click on A
And I click on B
And I click on C
Then I should see effects of C

A test like this works fine...

JavaScript bookmarklet to click an element and copy its text contents

Here is some JavaScript code that allows you to click the screen and get the clicked element's text contents (or value, in case of inputs).

The approach is simple: we place an overlay so you don't really click the target element. When you click the overlay, we look up the element underneath it and show its text in a browser dialog. You can then copy it from there.

While moving the mouse, the detected element is highlighted.

Here is the one-liner URL that you can store as a bookmark. Place it in your bookmarks bar and click it to activate....

How to use a local gem in your Gemfile

You can use local copies of gems in your Gemfile like this:

gem 'spreewald', path: '~/gems/spreewald'

As soon as you have bundled your project with the local copy of the gem, all code changes in the copy will be available on your project. So you can for example set a debugger or add console output in the gem and use it from your project.
If you checked out the gem with your versioning tool, you can easily reset your changes afterwards or make a pull request for the gem if you improved it.

Don't commit a Gemfile with local path...

Speed up better_errors

If you use the Better Errors gem, you will sometimes notice that it can be very slow. This is because it sometimes renders a huge amount of data that will actually be hard to render for your browser.

You can significantly improve performance by adding this to config/initializers/better_errors:

if defined?(BetterErrors) && Rails.env.development?
  module BetterErrorsHugeInspectWarning
    def inspect_value(obj)
      inspected = obj.inspect
      if inspected.size > 20_000
        inspec...

RawGit

RawGit serves raw files directly from GitHub with proper Content-Type headers, for CDN-like purposes.

Note that they don't offer any uptime guarantee. You probably don't want to use it in production, but it may serve you well for some testing/prototyping/etc.

How to bulk-unwatch many repositories on Github

To easily opt out of notifications for a large number of Github repositories, go to https://github.com/watching.

Git: rebase dependent feature branch after squash

This card will show you how to use git rebase --onto without confusion.

Use case:

You've got two feature branches (one and two), where two depends on one. Now commits of branch one have changed after you branched two from it (i.e. after code review the commits of branch one are squashed). Thus the commit history of branch one has changed. Branch two's history however didn't change.

Solution:

To make the branches share the same commit history again you will have to rebase and replay (attach) the a...

Capistrano 3: How to deploy when a firewall blocks your git repo

Sometimes, through some firewall or proxy misconfiguration, you might have to deploy to a server that cannot access the git repository.

Solution 1: HTTP Proxy (this is the preferred fix)

SSH can be tunneled over an HTTP Proxy. For example, when the repo is on github, use this:

  1. Install socat

  2. Add a ~/.ssh/config on the target server(s) with permission 0600 and this content:

    Host github.com ssh.github.com
      User git
      Hostname ssh.github.com
      Port 443
      ProxyCommand socat - PROXY:<your proxyhost>:%h:%p,...
    

Git: undo delete

Assuming you're wanting to undo the effects of git rm or rm followed by git add -A or something similar:

This restores the file status in the index:

git reset -- <file>

then check out a copy from the index

git checkout -- <file>

To undo git add , the first line above suffices, assuming you haven't committed yet.


Note:

Make sure to use double dashes -- to tell git to checkout a file instead of a branch. This only is relevant for [files having the same name as a branch](https://git-scm.com/docs/git-ch...

Style Guide for Git commit messages

  1. Separate subject from body with a blank line
  2. Limit the subject line to 50 characters (max. 72), include reference (unique story ID) to requirements tracker (Linear in our case)
  3. Capitalize the subject line
  4. Do not end the subject line with a period
  5. Use the imperative mood in the subject line
  6. Wrap the body at 72 characters
  7. Use the body to explain what and why vs. how.

    Tip

    As an alternative, use a commit message that refers to a GitHub issue (fixes #321) or [story ID](https://makandracards.com/makandra/620718-b...

Rubymine provides a visual merge conflict resolution tool

RubyMine provides a visual tool for resolving merge conflicts locally.

Follow

Git > Resolve Conflicts

in the context menu to open RubyMine's merge conflict tool.

  • Left pane: local copy (read-only)
  • Right pane: checked in version from repository (read-only)
  • Central pane: base revision from which both conflicting versions are derived

You can also use a similar pane view to compare to files.
Mark two files and press Ctrl + D to compare.

Sending errors to sentry from development

For the initial setup or changes in the sentry reporting it might be useful to enabled reporting of sentry in development. Don't commit these changes and prefer to report to the staging environment. As other developers might be confused of these errors try to given them a proper message and delete them afterwards.


  1. Add config.raven_dsn = 'your-dns' in config/environments/development.rb.
  2. Add development to existing environments in the Raven.configure block: config.environments = ['development', 'staging', 'production'].
  3. ...

Auto-squashing Git Commits

git command line options for automating common rebasing tasks, like adding a fix to a commit that was already rebased into the history.