
How to: Upgrade CarrierWave to 3.x

Dominic Beger
November 09, 2023
Software engineer at makandra GmbH

While upgrading CarrierWave from version 0.11.x to 3.x, we encountered some very nasty failures. Below are the basic changes you need to make and some behavior you may run into when upgrading your application. This aims to save you some time understanding what happens under the hood, so you can discover problems faster, as digging deeply into CarrierWave code is not much fun.

Whitelists and blacklists

Whitelists have been renamed to allowlists. Blacklists have been renamed to denylists. You will have to rename all occurrences in your code.

The following focuses on extension allowlisting, but the same applies to content type allowlisting with the content_type_allowlist method.

Make sure that you have defined a trait for validating the extensions and content types in all of your uploaders (by including it on the base uploader) and that it now looks like this:

class BaseUploader
  module DoesStrictTypeValidations
    as_trait do

      def revalidate(*)
        return unless file

        check_content_type_allowlist!(file) # This was check_content_type_whitelist! before
        check_extension_allowlist!(file) # This was check_extension_whitelist! before

        @revalidated_cache_id = cache_id
      rescue CarrierWave::IntegrityError, CarrierWave::ProcessingError => error
        model.errors.add(mounted_as, error.message)
      end

      def revalidate_cache?
        cache_id && @revalidated_cache_id != cache_id
      end

    end
  end
end

If you called check_blacklist!(file) and check_content_type_blacklist!(file) in this trait, these lines should now be removed.

class BaseUploader < CarrierWave::Uploader::Base

  include DoesStrictTypeValidations

  # ...

end

Now, in your uploaders...

def extension_white_list
  %w[jpg jpeg gif png]
end

must become

def extension_allowlist
  %w[
    jpg
    jpeg
    png
    gif
  ]
end

Also, every uploader must define allowlists because they are empty/nil by default, and you will not be allowed to upload anything if you omit them.

Basically, it is quite simple: if you do not want a specific extension to be uploaded, just don't put it into the allowlist and you're fine.

The usage of denylists is deprecated. Denylists trump allowlists, i.e. if an extension is listed in both, it cannot be uploaded. The reason we sometimes use blacklists/denylists in our applications is to override the extensions of any uploader for security purposes (e.g. disallowing SVGs for all uploaders by default), in case someone accidentally adds them. In our case, we may actually want to enable SVGs in some uploaders when some conditions are met, so our code basically looked like this:

class ImageUploader < CarrierWave::Uploader::Base

  def extension_whitelist
    %w[
      jpg
      png
    ] 
  end
  
  def extension_blacklist
    unless svg_allowed?
      %w[
        svg
      ]  
    end
  end
  
  private

  def svg_allowed?
    false
  end

end

class SpecificImageUploader < ImageUploader

  def extension_whitelist
    super.concat(
      %w[
        svg
      ]
    )
  end
  
  private

  def svg_allowed?
    condition_fulfilled
  end

end

However, you should only define allowlists in CarrierWave 3.x, as defining both is ambiguous and requires a deeper understanding of which one will win. This means that you have to make sure that an entry is not present in the allowlist if it must not be uploaded. You could achieve this by just adding svg to the allowlist of the inheriting uploader if the svg_allowed? condition is met, but I recommend mimicking the previous denylist behavior by defining a private method disallowed_extensions:

class ImageUploader < CarrierWave::Uploader::Base
  EXTENSION_ALLOWLIST = %w[
    jpg
    png
  ]
  
  def extension_allowlist
    EXTENSION_ALLOWLIST - disallowed_extensions
  end
  
  private

  # Always returns an array, so the subtraction in extension_allowlist never receives nil
  def disallowed_extensions
    if svg_allowed?
      []
    else
      %w[
        svg
      ]
    end
  end

  def svg_allowed?
    false
  end
end

This way, you still ensure that nobody accidentally allows svg where it shouldn't happen.
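
To illustrate how an inheriting uploader could opt in to svg under this pattern, here is a minimal sketch in plain Ruby. The classes are stand-ins without CarrierWave, and the `Sketch` class names, the `self.class::EXTENSION_ALLOWLIST` lookup and the `condition_fulfilled:` flag are assumptions made for the example, not part of the original setup.

```ruby
# Plain-Ruby stand-ins for the uploaders (no CarrierWave needed to run this).
class ImageUploaderSketch
  EXTENSION_ALLOWLIST = %w[jpg png].freeze

  def extension_allowlist
    # self.class:: picks up the constant of an inheriting class, if it defines one.
    self.class::EXTENSION_ALLOWLIST - disallowed_extensions
  end

  private

  def disallowed_extensions
    svg_allowed? ? [] : %w[svg]
  end

  def svg_allowed?
    false
  end
end

class SpecificImageUploaderSketch < ImageUploaderSketch
  EXTENSION_ALLOWLIST = (ImageUploaderSketch::EXTENSION_ALLOWLIST + %w[svg]).freeze

  # condition_fulfilled is a hypothetical flag standing in for your real condition.
  def initialize(condition_fulfilled:)
    @condition_fulfilled = condition_fulfilled
  end

  private

  def svg_allowed?
    @condition_fulfilled
  end
end

p ImageUploaderSketch.new.extension_allowlist
p SpecificImageUploaderSketch.new(condition_fulfilled: true).extension_allowlist
p SpecificImageUploaderSketch.new(condition_fulfilled: false).extension_allowlist
```

Even though the inheriting class lists svg in its allowlist, the extension is only effective while svg_allowed? returns true, which is the safeguard the denylist used to provide.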

Inactive versions are no longer retrieved automatically

CarrierWave allows you to control whether versions are currently active or not. This is done by setting the if: ... kwarg within the version call. For example, if you migrated legacy videos from an old server, you may still want to offer them for download, but never generate them when a new video is uploaded. In old CarrierWave versions you could just define the version like

version :mov, if: :create_legacy_versions? do
  # ...
end

and this would work. The version mov is then marked as inactive and will thus not be generated upon upload, but you can still access its existing files.
With CarrierWave 3.x, any attempt to access the file will result in file being nil. That's because CarrierWave no longer loads/retrieves files from the storage if their version is inactive. This makes sense, but can break your desired behavior in this case. You have two options to fix this:

  • Migrate all legacy video versions to the new formats and remove these versions from the uploader (recommended, but more complex)
  • Monkeypatch CarrierWave to still retrieve all versions by adding this to lib/core_ext/carrierwave/version_extension.rb:
module VersionsExtension
  protected

  # We support legacy formats (.mov and .avi) in the video uploader for some videos that come from an early migration.
  # We still need to be able to access these versions to enable downloading them in the frontend, but we do not want to
  # generate them any longer when a new video is uploaded. This is why we have the `if: create_legacy_versions?` set on
  # them (which always evaluates to `false`).
  #
  # Carrierwave 0.11.x still allowed us to access the file in the file system, even though the versions are not active.
  # Carrierwave 3 disables access to files of inactive versions, i.e. versions that have a `if: ...` that evaluates
  # to `false`. As we cannot migrate these legacy versions for now, we restore the old Carrierwave behavior with this
  # small monkey patch.
  def retrieve_versions_from_cache!(cache_name)
    versions.each { |_, v| v.retrieve_from_cache!(cache_name) }
  end

  def retrieve_versions_from_store!(identifier)
    versions.each { |_, v| v.retrieve_from_store!(identifier) }
  end
end

module CarrierWave
  module Uploader
    module Versions
      prepend VersionsExtension
    end
  end
end

Be careful: CarrierWave initializes the files of all versions as soon as you either retrieve the original one from the storage or store the original file in the storage. That means, whenever retrieve_from_store! is called on the uploader, a hook also calls retrieve_versions_from_store! to initialize all of its versions. Until the point of retrieval, all files are nil. Once you store! a file, all dependent versions are generated and stored as well, effectively initializing them, too.

This may hurt you in one specific scenario: you have inactive versions, upload a new file to override them, and then try to delete the old files.

Say you have an uploader that still stores files for your inactive legacy versions that you can now still retrieve through our monkey patch (which is nice). If you now upload a new file, it will effectively mount a new uploader for it. The new uploader will not generate these legacy files, because they are inactive versions and upon store!, only active versions are stored (which is what we want). However, the files for the legacy versions are still there from the previous uploader, but as inactive versions have not been initialized by store!, their file methods will at this point return nil.
You most likely want to remove these stale files as soon as a new file is uploaded. For this purpose, there should be a method that is called after :store within the uploader and removes these files. In order to check whether the files are still there, you need to explicitly call my_legacy_version.retrieve_from_store!(identifier) first, to actually initialize the underlying file. Then you can check for its presence and remove it. If you forget to call retrieve_from_store!, the file will always be blank, leaving your code falsely thinking that the files are not there.

recreate_versions! no longer recreates the original

In CarrierWave < 3, recreate_versions! rebuilt the original file and all of its versions if no versions parameter was given. That was considered a bug, which has been fixed in CarrierWave 3.
If you used recreate_versions! in your app to rebuild all versions along with the original file, you can no longer do that and have to rebuild the original by manually calling

file.cache!
file.store!

before calling file.recreate_versions!.

force_extension

CarrierWave 3 allows you to enforce other file extensions without manually overriding the filename method. In earlier versions, you had to strip the extension (by calling File.basename) and then append the new one within an overriding filename method. This was often done by trait methods, so you didn't have to define a method in every affected version. Please refer to this card: CarrierWave: How to generate versions with different file extensions. That approach is no longer needed for changing the file extension. It can still be helpful if you need to adapt the filename itself, though.

Now, you can just do this by defining a force_extension method on the version's uploader or calling force_extension(...) within the version block. Both are possible:

version :webm do
  def force_extension
    '.webm'
  end
  
  process :transcode_video, :webm
end

and

version :webm do
  process :transcode_video, :webm
  force_extension('.webm')
end

I recommend using the latter.

Warning

Forcing the extension is no longer sufficient in CarrierWave 3.x to make custom transcoding/conversion logic work. Unless you adapt your processor, this will now fail. Read the following section to learn why and how to fix it.

The workfile_path breaking custom processing

CarrierWave < 3 immediately cached uploads with the provided filename in the cache directory and then processed them there. This way you had the possibility to convert the source file into different formats by relying on the file.path/current_path of the uploader.

Let's say you have a video uploader that generates a thumbnail for an uploaded video. You will need to create a version that uses a custom processor that extracts the first frame from the video using ffmpeg.

Old behavior

In CarrierWave < 3, you could write:

class VideoUploader < BaseUploader
  # ...

  version :thumbnail do
    def full_filename(_for_file) # CarrierWave calls this method with the file name as an argument
      'thumbnail.jpg'
    end
  
    process(take_frame: [at: 0.seconds])
    process(resize_to_limit: [1280, 720])
  end
  
  def take_frame(at:)
    raise TakeFrameError, "Input file does not exist, capturing a frame aborted (#{current_path})" unless File.exist?(current_path)

    run_ffmpeg! '-ss', ffmpeg_target_second(at), '-i', current_path, '-vframes', '1', '-qscale:v', '2', current_path
    raise TakeFrameError, "Something went wrong while capturing a frame for this version. The output file does not seem to exist in #{current_path}" unless File.exist?(current_path)

    true
  end

end

current_path would resolve to file.path, which is the path to the file in the cache directory with your specified filename (something like system/cache_dir/videos/<id>/thumbnail.jpg). The thumbnail.jpg is actually still the original mp4 video file at this point because we just renamed it, but have not done any processing yet. FFmpeg does not mind the extension, though, which is why this works. It is not really correct in the first place and could potentially break with any other tool. This is probably why CarrierWave 3.x introduces the new behavior described below.
Still, FFmpeg will then read the video file, extract the first frame image and save it to the thumbnail.jpg file, which is now a valid JPEG image. Afterwards, the next processing step can just take this JPEG and resize it to 1280x720.

New behavior

However, this will break in CarrierWave 3.x because it changed the way the processing happens. Before actually caching the file at your desired location with the desired filename (that already contains the target extension), CarrierWave will create a temporary workfile, process it and then copy the result to the cache. The problem is that this temporary workfile gets a predetermined path and the original filename from CarrierWave, no matter what you define in your version block.

This means that your take_frame method will now rely on a current_path that actually points to the workfile (something like tmp/<id>/video.mp4). As current_path is also the output path, this will result in a JPEG output file with an .mp4 extension. This cannot be processed correctly by resize_to_limit (ImageMagick), which does mind the extension, so it crashes.

The solution

The new way of dealing with this is to manually move the workfile to the path with the correct extension after processing. This also makes more sense, because it is actually nonsense that the file is already considered a JPEG in the cache directory before it has been processed into one. The logic is now completely encapsulated within your processor, in which you can be sure to receive the correct input, but have to take care to output the correct format in return, so that the next processing steps can rely on it.

In our case, we would now adapt the processor like this:

def take_frame(at:)
  raise TakeFrameError, "Input file does not exist, capturing a frame aborted (#{current_path})" unless File.exist?(current_path)

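  output_path = "tmp/ffmpeg_transcoding/#{SecureRandom.hex}_#{filename}" # Define a temporary JPEG output file as filename is thumbnail.jpg
  FileUtils.mkdir_p(File.dirname(output_path)) # Make sure the temporary output directory exists before FFmpeg writes to it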
  run_ffmpeg! '-ss', ffmpeg_target_second(at), '-i', current_path, '-vframes', '1', '-qscale:v', '2', output_path
  FileUtils.mv(output_path, current_path) # Move the JPEG file back to the current path, effectively setting the wrong extension (.mp4 again), which is no problem because we do not process it further before renaming it below

  raise TakeFrameError, "Something went wrong while capturing a frame for this version. The output file does not seem to exist in #{current_path}" unless File.exist?(current_path)

  # This is the crucial, new part
  # Enforce the correct content type and extension by moving the file handle (file is the uploader's SanitizedFile) to the same path with the correct extension
  file.content_type = 'image/jpeg'
  file.move_to(Pathname.new(current_path).sub_ext('.jpg').to_s)

  true
end

The resize_to_limit step relies on current_path, which now resolves to your JPEG file, and can process it correctly because it expects an image as input.
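
The path arithmetic behind the final move_to call can be tried in isolation. The following is a minimal sketch using only Ruby's standard library; the workfile path is a made-up example, as the real one is chosen by CarrierWave.

```ruby
require "pathname"

# Hypothetical workfile path; CarrierWave picks the real one.
workfile_path = "tmp/1699523880-1234-0001-5678/video.mp4"

# sub_ext swaps only the last extension and keeps directory and basename.
jpeg_path = Pathname.new(workfile_path).sub_ext(".jpg").to_s

puts jpeg_path # prints tmp/1699523880-1234-0001-5678/video.jpg
```

The file therefore stays in the same temporary directory, which is why the following processing steps still find it via current_path.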

This new logic can also be found in the CarrierWave::MiniMagick processing module. The minimagick! method does exactly the same to ensure the correct format is used, if the result of the underlying convert command delivers a different output.

Warning

Heads up: You still need to set the extension to the desired one within your version's full_filename method. You can use the force_extension method to accomplish this. Just moving it to the path with the correct extension within the processor is not sufficient because that's just the temporary workfile. Once all processing on this workfile is done, CarrierWave will cache it/move it to the cache path using its real filename that it gets from the uploader. Unless you force this filename to match the new extension, the original filename will be used and you end up with the wrong extension again.

In the example above, this will not happen because we override the full_filename method with a custom name and the correct extension. If you don't do this, you will have to call the force_extension(extension) method to tell CarrierWave that it should automatically adjust the full_filename for you internally to match this extension.

class VideoUploader < BaseUploader
  # ...

  version :thumbnail do
    process(take_frame: [at: 0.seconds])
    process(resize_to_limit: [1280, 720])
    force_extension('.jpg')
  end
  
  # ...
end

Hint: process convert: ... calls will automatically force the correct extension (as implemented by CarrierWave) and you do not have to call force_extension(...) in your version block.


If you have tests for file processors (e.g. for ClamAV virus scans via process :virus_scan) and you have assertions on the file path (e.g. that ClamAV receives a scan call with the correct file path), keep in mind that you can no longer expect the uploader's file.path, but need to match against the new temporary workfile path. You can use the following helper variable for this purpose:

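let(:carrierwave_workfile_path_matcher) do
  # Regexp group names may only contain word characters (no hyphens).
  # Regexp.escape keeps dots in the path and filename from matching arbitrary characters.
  %r{#{Regexp.escape(CarrierWave.tmp_path.to_s)}/(?<cache_id>[0-9\-]+)/#{Regexp.escape(file.original_filename)}}
end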

# ...

expect(location).to match(carrierwave_workfile_path_matcher)
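
A matcher of this kind can be tried outside of a spec. This is a minimal sketch in plain Ruby; the tmp path, the filename and the shape of the cache id are hypothetical values standing in for CarrierWave.tmp_path, file.original_filename and whatever CarrierWave generates in your app.

```ruby
# Hypothetical values standing in for CarrierWave.tmp_path and file.original_filename.
tmp_path = "/app/tmp"
original_filename = "video.mp4"

# Anchored and escaped, so that "." in the filename only matches a literal dot.
matcher = %r{\A#{Regexp.escape(tmp_path)}/(?<cache_id>[0-9\-]+)/#{Regexp.escape(original_filename)}\z}

# Hypothetical workfile location as a processor might receive it.
location = "/app/tmp/1699523880-1234-0001-5678/video.mp4"

match = matcher.match(location)
puts match[:cache_id] # prints 1699523880-1234-0001-5678
```

Anchoring with \A and \z is stricter than the helper in the spec and is a design choice of this sketch; drop the anchors if the path you assert on has a prefix.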
Posted by Dominic Beger to makandra dev (2023-11-09 09:58)