Caching file properties with ActiveStorage Analyzers


When working with file uploads, we sometimes need to process intrinsic properties like the page count or page dimensions of PDF files. Retrieving those properties requires us to download (from S3 or GlusterFS) and parse the file, which is slow and resource-intensive.

Active Storage provides the metadata column on ActiveStorage::Blob to cache these values. You can populate this column either with ad-hoc metadata caching or with custom Analyzers.

Attachments vs. Blobs

Let's recap how Active Storage is structured:

  • ActiveStorage::Attachments are the join records connecting your application's models to the underlying files.
  • ActiveStorage::Blobs represent the actual file along with its metadata.

Because a single Blob can be referenced by multiple Attachments, the metadata stored on a Blob should only contain intrinsic properties of the file itself (like page count, duration or dimensions). Do not store domain-specific or record-specific state on a Blob, as it would be visible to every record that shares the file.
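To make that risk concrete, here is a minimal plain-Ruby sketch. FakeBlob and FakeAttachment are hypothetical stand-ins, not the real Active Storage classes; they only model that several attachments can point at the same blob and therefore read the same metadata hash.

```ruby
# Hypothetical stand-ins for ActiveStorage::Blob and ActiveStorage::Attachment.
FakeBlob = Struct.new(:filename, :metadata)
FakeAttachment = Struct.new(:record_name, :blob)

blob = FakeBlob.new("resume.pdf", {})

# Two different records reference the same underlying file.
user_resume = FakeAttachment.new("User#1", blob)
application_document = FakeAttachment.new("JobApplication#7", blob)

# Intrinsic property: true for the file no matter which record asks.
user_resume.blob.metadata[:page_count] = 3

# Record-specific state: only meaningful for User#1, yet stored on the shared blob.
user_resume.blob.metadata[:approved_by_hr] = true

# The second attachment sees both keys, including the one that is none of its business.
leaked_keys = application_document.blob.metadata.keys
puts leaked_keys.inspect
```

State like the hypothetical approved_by_hr flag belongs on the model that owns the attachment, not on the blob.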

Ad-hoc Metadata Caching

For isolated, single-use requirements, you can write directly to the blob's metadata hash. This evaluates the property the first time it is requested and persists it for subsequent calls.

require "open3"

class User < ApplicationRecord
  has_one_attached :pdf_resume

  def page_count
    return unless pdf_resume.attached?

    blob = pdf_resume.blob
    # Return the cached value if a previous call already stored it
    return blob.metadata[:page_count] if blob.metadata[:page_count].present?

    # Download the file to a tempfile and let pdfinfo inspect it
    blob.open do |tempfile|
      pdf_info, stderr, status = Open3.capture3("pdfinfo", tempfile.path.to_s)
      raise "pdfinfo execution failed: #{stderr}" unless status.success?

      if (match = pdf_info.match(/^Pages:\s*(\d+)/))
        page_count = match[1].to_i
        # Persist the value on the blob so the next call skips the download
        blob.update!(metadata: blob.metadata.merge(page_count:))
      end
    end

    blob.metadata[:page_count]
  end
end
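The same approach extends to other intrinsic properties such as page dimensions. The following is a minimal sketch of a parsing helper that can be tried without Rails or a PDF at hand. The helper name parse_pdfinfo, the returned keys and the sample text are assumptions for illustration; the sample only imitates the "Pages:" and "Page size:" lines that pdfinfo typically prints, so check the regular expressions against the output of your installed version.

```ruby
# Sample text in the shape pdfinfo typically prints (assumed and abbreviated).
SAMPLE_PDFINFO_OUTPUT = <<~OUTPUT
  Title:          Resume
  Pages:          3
  Page size:      595.276 x 841.89 pts (A4)
OUTPUT

# Hypothetical helper: extracts intrinsic properties from pdfinfo's text output.
# Returns a Hash containing only the keys that could be parsed.
def parse_pdfinfo(output)
  properties = {}

  if (match = output.match(/^Pages:\s*(\d+)/))
    properties[:page_count] = match[1].to_i
  end

  if (match = output.match(/^Page size:\s*([\d.]+) x ([\d.]+) pts/))
    properties[:page_width_pt] = match[1].to_f
    properties[:page_height_pt] = match[2].to_f
  end

  properties
end

properties = parse_pdfinfo(SAMPLE_PDFINFO_OUTPUT)
puts properties.inspect
```

The resulting Hash could be merged into blob.metadata in the same way as page_count above.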

Custom Active Storage Analyzers

For properties needed across multiple models, a custom Analyzer can be used to enrich all newly uploaded files of a certain MIME type.

All Analyzers must inherit from ActiveStorage::Analyzer and implement at least these two methods:

  • self.accept?(blob): Determines if this analyzer should run for a given blob.
  • metadata: Returns a Hash of data to be merged into the blob's metadata column.
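Our understanding is that Active Storage asks the registered analyzers in order and uses the first one whose accept? returns true. The plain-Ruby sketch below models that selection step only. All class names, the REGISTERED_ANALYZERS list and analyzer_class_for are hypothetical stand-ins, not the Rails implementation.

```ruby
# Hypothetical stand-in for a blob; only the content type matters here.
SketchBlob = Struct.new(:content_type)

class SketchImageAnalyzer
  def self.accept?(blob)
    blob.content_type.start_with?("image/")
  end
end

class SketchPdfAnalyzer
  def self.accept?(blob)
    blob.content_type == "application/pdf"
  end
end

# Fallback used when no registered analyzer accepts the blob.
class SketchNullAnalyzer
  def self.accept?(_blob)
    false
  end
end

# Order matters: the first analyzer that accepts the blob is used.
REGISTERED_ANALYZERS = [SketchImageAnalyzer, SketchPdfAnalyzer].freeze

def analyzer_class_for(blob)
  REGISTERED_ANALYZERS.detect { |klass| klass.accept?(blob) } || SketchNullAnalyzer
end

puts analyzer_class_for(SketchBlob.new("application/pdf"))
puts analyzer_class_for(SketchBlob.new("text/plain"))
```

Under this assumption, a custom analyzer appended to the list only runs for blobs that no earlier analyzer has claimed, which is fine for PDFs as long as no other registered analyzer accepts them.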

Active Storage usually queues an ActiveStorage::AnalyzeJob automatically when a matching file is attached. The above example could be refactored like so:

# app/util/pdf_analyzer.rb
require "open3"

class PdfAnalyzer < ActiveStorage::Analyzer
  class Error < StandardError; end

  def self.accept?(blob)
    blob.content_type == "application/pdf"
  end

  def metadata
    download_blob_to_tempfile do |file|
      pdf_info, stderr, status = Open3.capture3("pdfinfo", file.path.to_s)
      raise Error, "pdfinfo execution failed: #{stderr}" unless status.success?

      if (match = pdf_info.match(/^Pages:\s*(\d+)/))
        { page_count: match[1].to_i }
      else
        {}
      end
    end
  rescue StandardError => e
    Rails.logger.error("PdfAnalyzer failed to parse metadata: #{e.message}")
    {}
  end
  
  def self.analyze_later?
    true # default
  end
end

# config/initializers/active_storage.rb
Rails.application.config.active_storage.analyzers.append PdfAnalyzer

# db/migrate/xxxxx_add_pdf_analyzer.rb
class AddPdfAnalyzer < ActiveRecord::Migration[8.0]
  def up
    pdf_blobs = ActiveStorage::Blob.where(content_type: 'application/pdf')

    # Mark all existing PDF blobs as to-be-analyzed-again
    pdf_blobs.find_each do |blob|
      updated_metadata = blob.metadata.except("analyzed", :analyzed)
      blob.update_column(:metadata, updated_metadata)
    end
  end
end
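
# app/models/user.rb
def page_count
  return unless pdf_resume.attached?

  blob = pdf_resume.blob

  # Run the analyzer synchronously if the blob has not been analyzed yet
  blob.analyze unless blob.analyzed?
  # Raises KeyError if the analyzer could not determine a page count
  blob.metadata.fetch(:page_count)
end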

Note

Active Storage comes with built-in and autoloaded analyzers, among them ActiveStorage::Analyzer::ImageAnalyzer and ActiveStorage::Analyzer::VideoAnalyzer. Recent Rails versions also ship ActiveStorage::Analyzer::AudioAnalyzer.

Michael Leimstädtner
License
Source code in this card is licensed under the MIT License.
Posted by Michael Leimstädtner to makandra dev (2026-03-16 08:30)