Carrierwave: Using a nested directory structure for file system performance

Arne Hartherz, Software engineer at makandra GmbH
April 26, 2021

When storing files for lots of records in the server's file system, Carrierwave's default store_dir approach may cause issues, because some directories will hold too many entries.

The default storage directory from the Carrierwave templates looks like so:

class ExampleUploader < CarrierWave::Uploader::Base
  def store_dir
    "uploads/#{model.class.to_s.underscore}/#{mounted_as}/#{model.id}"
  end
end

If you store files for 500k records, that store_dir's parent directory will have 500k sub-directories, which causes serious headaches when you try to navigate the file system, e.g. via ls or rsync.
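To make the fan-out concrete, here is a small plain-Ruby sketch (no Rails or CarrierWave needed) that builds the default store_dir for a range of ids and counts how many entries end up in each parent directory. FakeUser and the :avatar mount name are hypothetical stand-ins for a real model and mounted column; since model.class.to_s.underscore comes from ActiveSupport, the class segment is hardcoded here.

```ruby
# Hypothetical stand-in for an ActiveRecord model; only #id matters here.
FakeUser = Struct.new(:id)

# Mirrors the default template: uploads/<model>/<mounted_as>/<id>
# (the class segment is hardcoded instead of using ActiveSupport's #underscore)
def default_store_dir(model, mounted_as)
  "uploads/fake_user/#{mounted_as}/#{model.id}"
end

# Count how many record directories each parent directory would hold.
def entries_per_parent(count)
  (1..count).each_with_object(Hash.new(0)) do |id, tally|
    dir = default_store_dir(FakeUser.new(id), :avatar)
    tally[File.dirname(dir)] += 1
  end
end

tally = entries_per_parent(500_000)
puts tally.size         # → 1 (every record shares one parent directory)
puts tally.values.max   # → 500000
```

All 500k record directories land in the single parent uploads/fake_user/avatar.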

Here is a simple solution that scales for a long while.

Solution

A proven approach is to split the zero-padded model.id into chunks of three digits and use each chunk as one directory level. This works just as well if your directory structure also contains secrets.

Note that root below is the configured storage root. See our card on a suggested configuration for more information.

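class ExampleUploader < CarrierWave::Uploader::Base
  def store_dir
    segments = [
      root,
      model.class.model_name.collection, # e.g. "users"
      mounted_as.to_s,                   # mounted_as is a Symbol; File.join needs strings
      split_id_path(model),              # e.g. "000/001"
      secret_folder(model)               # nil unless you use secret folders
    ]
    File.join(segments.compact)          # File.join raises on nil, so drop nil segments
  end

  # Left-pads the id to six digits and splits off the last three:
  # 1 => "000/001", 123_456 => "123/456", 1_000_000 => "1000/000"
  def split_id_path(model)
    padded_id = model.id.to_s.rjust(6, '0')
    padded_id.split(/(\d\d\d)$/).join('/')
  end

  def secret_folder(model)
    # if you use secret folders, do your magic here (return a String)
    nil
  end
end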
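The chunking logic does not depend on CarrierWave, so it can be tried in isolation. The sketch below is a standalone copy of split_id_path that takes a bare integer id instead of the model (a simplification for illustration). It relies on String#split keeping captured groups in its result: splitting on the captured last three digits yields the remaining prefix plus that three-digit suffix.

```ruby
# Standalone copy of the uploader's chunking logic, taking a bare integer id.
def split_id_path(id)
  padded_id = id.to_s.rjust(6, '0')   # 1 -> "000001"
  # A captured group stays in the split result:
  # "000001".split(/(\d\d\d)$/) #=> ["000", "001"]
  padded_id.split(/(\d\d\d)$/).join('/')
end

[1, 999, 1_000, 123_456, 1_000_000].each do |id|
  puts "#{id} -> #{split_id_path(id)}"
end
# 1 -> 000/001
# 999 -> 000/999
# 1000 -> 001/000
# 123456 -> 123/456
# 1000000 -> 1000/000
```

Ids beyond 999,999 are simply longer than six digits, so the first level keeps growing (1000, 1001, ...) instead of breaking.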

Example structure

The resulting directory structure will be:

  • /app-root/public/system/users/avatar/000/001/... (1st record)
  • /app-root/public/system/users/avatar/000/002/... (2nd record)
  • ...
  • /app-root/public/system/users/avatar/000/999/... (999th record)
  • /app-root/public/system/users/avatar/001/000/... (1000th record)
  • ...
  • /app-root/public/system/users/avatar/999/999/... (999,999th record)
  • /app-root/public/system/users/avatar/1000/000/... (1 millionth record)

So if you have 500k records, /app-root/public/system/users/avatar/ will still contain only about 500 directories, and each of them holds at most 1000 sub-directories.
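These numbers can be checked with a quick tally. This sketch again uses a standalone copy of split_id_path taking an integer id (an illustrative simplification) and groups ids 1 to 500,000 by their first-level directory. Counted exactly, the first level runs from 000 to 500, because id 500,000 maps to 500/000.

```ruby
# Standalone copy of the two-level chunking, taking a bare integer id.
def split_id_path(id)
  id.to_s.rjust(6, '0').split(/(\d\d\d)$/).join('/')
end

# Tally how many record directories each first-level directory receives.
children = Hash.new(0)
(1..500_000).each do |id|
  first_level, _second_level = split_id_path(id).split('/')
  children[first_level] += 1
end

puts children.size         # → 501 (first-level directories "000" through "500")
puts children.values.max   # → 1000 (entries in the busiest first-level directory)
```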

But I have millions of files

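If you expect to store a lot more records, introduce a third level (.../123/456/789/...). Padding to nine digits alone is not enough, because /(\d\d\d)$/ only splits off the last three digits (000000/001); capture two groups instead:

  def split_id_path(model)
    padded_id = model.id.to_s.rjust(9, '0')
    # Capture the last two groups of three digits: "000000001" => "000/000/001"
    padded_id.split(/(\d{3})(\d{3})$/).join('/')
  end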
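The sketch below checks a three-level split in isolation and, for comparison, shows that nine-digit padding combined with the single-group regex /(\d\d\d)$/ yields only two levels. The method names split_id_path_3 and two_level_with_nine_digits, and their integer argument, are hypothetical and chosen for illustration.

```ruby
# Hypothetical standalone three-level variant, taking a bare integer id.
def split_id_path_3(id)
  padded_id = id.to_s.rjust(9, '0')   # 1 -> "000000001"
  # Capture the last two groups of three digits; the leading part keeps
  # whatever is left, so ids above 999,999,999 still work.
  padded_id.split(/(\d{3})(\d{3})$/).join('/')
end

# For comparison: nine-digit padding with the single-group regex.
def two_level_with_nine_digits(id)
  id.to_s.rjust(9, '0').split(/(\d\d\d)$/).join('/')
end

puts split_id_path_3(1)               # → 000/000/001
puts split_id_path_3(123_456_789)     # → 123/456/789
puts split_id_path_3(1_000_000_000)   # → 1000/000/000
puts two_level_with_nine_digits(1)    # → 000000/001
```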

Posted by Arne Hartherz to makandra dev (2021-04-26 09:20)