Generating and streaming ZIP archives on the fly


When your Rails application offers a bunch of files for download as a ZIP archive, you basically have two options:

  1. Write a ZIP file to disk and send it as a download to the user.
  2. Generate a ZIP archive on the fly while streaming it in chunks to the user.

This card is about option 2, and it is actually fairly easy to set up.

We are using this to generate ZIP archives with lots of files (500k+) on the fly, and it works like a charm.

Why stream downloads?

Offering downloads of large archives can be cumbersome:

  • It takes time to build the ZIP file. Users cannot start their download immediately.
  • Or you have to prepare it up front, and can only offer download links once that has happened.
  • For one-off downloads, generating a large file is unnecessary, and you have to take care of removing it once it is no longer required.

When you generate them on the fly, downloads start immediately, and you can add contents while the user is downloading.

How to do it

The excellent zip_tricks gem helps you out, and is not too difficult to integrate.
If you want to send only files from ActiveStorage or Carrierwave attachments, it's even simpler when using zipline.

Instructions for ZipTricks

You want to use ZipTricks if you need some level of control, or if you are building some of the ZIP contents on the fly (like some CSV).

  1. Add zip_tricks to your Gemfile and bundle install.

  2. In your controller, include ZipTricks::RailsStreaming.

  3. You can then use the zip_tricks_stream method in controller actions to generate your contents, as described in the documentation.

  4. Make sure you disable Rails middlewares that generate ETag headers for you (like Rails' Rack::ETag or Rack::SteadyETag).

    • Those middlewares usually want to capture the entire response body in order to generate the ETag based on its contents. If they do that, you won't be streaming to the user.
    • Usually, you do not have to remove them: when you set your own Last-Modified or ETag header, these middlewares leave the response alone.
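
To illustrate why a body-hashing middleware defeats streaming, here is a minimal plain-Ruby sketch. ChunkedBody and etag_by_buffering are hypothetical stand-ins, not the real Rack classes; they only show that hashing a body requires reading all of it first.

```ruby
require 'digest'

# Hypothetical response body that produces chunks one at a time
# and records which chunks have been produced so far.
class ChunkedBody
  attr_reader :produced

  def initialize(chunks)
    @chunks = chunks
    @produced = []
  end

  def each
    @chunks.each do |chunk|
      @produced << chunk
      yield chunk
    end
  end
end

# Hypothetical middleware-like helper: it has to read the complete
# body before it can compute a digest for the ETag header.
def etag_by_buffering(body)
  buffered = []
  body.each { |chunk| buffered << chunk }
  [Digest::SHA256.hexdigest(buffered.join), buffered]
end

# Buffering: every chunk is produced before anything could be sent.
body = ChunkedBody.new(%w[part1 part2 part3])
etag, buffered = etag_by_buffering(body)

# Streaming: the consumer can take the first chunk while the rest
# has not been produced yet.
streamed = ChunkedBody.new(%w[part1 part2 part3])
first_chunk = streamed.enum_for(:each).first
```

With a real archive, the buffered variant would hold the entire ZIP in memory before the first byte reaches the user.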

Example controller:

class DocumentsController < ApplicationController

  include ZipTricks::RailsStreaming

  def download
    documents = Document.all
    
    fresh_when last_modified: Time.current # Sets `Last-Modified` header (see above). Or say "fresh_when documents" to use the scope.
    send_file_headers! filename: 'all_documents.zip' # Sets `Content-Disposition` and `Content-Type` headers.
    
    zip_tricks_stream do |zip|
      documents.find_each do |document|
        filename = document.filename
        path = document.file.path
        
        zip.write_stored_file(filename) do |stream|
          File.open(path, 'rb') do |source|
            IO.copy_stream(source, stream)
          end
        end
      end
    end
  end

end

What happens here is:

  1. zip_tricks_stream generates a buffer object (zip) that can be written to from inside the block.
  2. The controller action completes.
  3. Your code inside the block given to zip_tricks_stream runs and writes to the buffer object.
  4. ZipTricks streams the contents to the user.
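
The order of steps 2 and 3 can be surprising, so here is a minimal plain-Ruby sketch of the idea. DeferredBody is a hypothetical stand-in and not how ZipTricks is actually implemented; it only shows a block being stored first and run later, once the web server iterates over the response body.

```ruby
# Hypothetical streaming body: it stores the block and only runs it
# when something iterates over the body with #each.
class DeferredBody
  def initialize(&writer)
    @writer = writer
  end

  def each(&send_chunk)
    @writer.call(send_chunk)
  end
end

events = []

# "Controller action": builds the body and returns right away.
body = DeferredBody.new do |out|
  events << :block_started
  out.call('chunk 1')
  out.call('chunk 2')
end
events << :action_completed

# "Web server": iterates the body afterwards, which finally runs the block.
sent = []
body.each { |chunk| sent << chunk }
```

After this runs, events holds :action_completed before :block_started, matching the order described above.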

About compression:

  • Use write_stored_file for files that are large or unlikely to compress significantly (like PNG, JPEG, MP4, ...)
  • Use write_deflated_file when adding files that compress well, like CSV, XML, or other text documents.
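
To get a feeling for the difference, the following sketch compares how well two kinds of data deflate, using only Ruby's standard library. The sample data is made up: repetitive CSV-like text, and random bytes as a rough stand-in for already-compressed media. Zlib::Deflate adds a small zlib header that ZIP entries do not have, so treat the numbers as approximate.

```ruby
require 'zlib'
require 'securerandom'

# Repetitive text, similar in spirit to a CSV export.
rows = (1..2_000).map { |i| "#{i},Document #{i},published\n" }
csv = "id,name,status\n" + rows.join

# Random bytes as a stand-in for data that is already compressed (JPEG, MP4, ...).
media_like = SecureRandom.random_bytes(csv.bytesize)

csv_ratio = Zlib::Deflate.deflate(csv).bytesize.to_f / csv.bytesize
media_ratio = Zlib::Deflate.deflate(media_like).bytesize.to_f / media_like.bytesize

puts format('CSV:        %.0f%% of original size', csv_ratio * 100)
puts format('Media-like: %.0f%% of original size', media_ratio * 100)
```

The text shrinks considerably, while the media-like data stays roughly the same size, so deflating it would only cost CPU time.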

Instructions for Zipline

Zipline uses ZipTricks under the hood and is specifically intended for streaming existing file attachments (from ActiveStorage, Carrierwave, etc.).

  1. Add zipline to your Gemfile and bundle install.
  2. In your controller, include Zipline.
  3. You can then use the zipline method in controller actions as described in the gem's documentation.

Note that Zipline already sets a Last-Modified header, which keeps ETag middlewares from buffering the response and allows streaming.

Example controller:

class DocumentsController < ApplicationController

  include Zipline

  def download
    documents = Document.all
    
    files_and_filenames = documents.find_each.lazy.map do |document|
      [document.file, document.filename]
    end
    zipline(files_and_filenames, 'all_documents.zip')
  end

end

Take care to use lazy.map instead of map, or your controller action has to iterate the entire collection before it can start streaming.
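
The difference can be seen in plain Ruby. In this sketch, records is a hypothetical stand-in for documents.find_each: an Enumerator that notes every record it loads.

```ruby
# Hypothetical stand-in for `documents.find_each`.
loaded = []
records = Enumerator.new do |yielder|
  (1..5).each do |id|
    loaded << id
    yielder << id
  end
end

# Eager: map walks the entire collection before it returns.
eager_names = records.map { |id| "document_#{id}.pdf" }
loaded_by_eager_map = loaded.dup

loaded.clear

# Lazy: nothing is loaded until a consumer asks for an element.
lazy_names = records.lazy.map { |id| "document_#{id}.pdf" }
loaded_before_consuming = loaded.dup

first_name = lazy_names.first
loaded_after_first = loaded.dup
```

Here the eager map has loaded all five records before returning, while the lazy variant has loaded only one record by the time the first filename is available.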

License
Source code in this card is licensed under the MIT License.
Posted by Arne Hartherz to makandra dev (2022-08-04 12:11)