What we know about PDFKit

What PDFKit is

  • PDFKit converts a web page to a PDF document. It uses a WebKit engine under the hood.
  • For you as a web developer, this means you can keep using the technology you are familiar with and don't need to learn LaTeX. All you need is a good print stylesheet.

How to use it from your Rails application

  • You can have PDFKit render a website by simply calling PDFKit.new('http://google.com').to_file('google.pdf'). You can then send the PDF using send_file 'google.pdf'.
  • You can use your controller to render the body of your PDF to a string and pass it to PDFKit.new. Stylesheets must be passed separately, by appending their paths to the kit's stylesheets (kit.stylesheets << path), before calling to_file or to_pdf.
  • Alternatively, you can use PDFKit::Middleware, and all your Rails routes automagically respond to the .pdf format. This is great for getting started fast, but details like setting the content disposition (download / inline) or the download filename are awkward.
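Rendering the PDF yourself in a controller makes those details explicit. A minimal sketch, assuming a hypothetical helper named pdf_send_options that computes the options for Rails' send_data:

```ruby
# Hypothetical helper (not part of PDFKit or Rails): builds the options for
# send_data so a PDF is either shown in the browser (inline) or offered as
# a download (attachment) under a given filename.
def pdf_send_options(filename, inline: false)
  # Append the extension only if the caller did not already include it
  name = filename.end_with?('.pdf') ? filename : "#{filename}.pdf"
  {
    filename: name,
    type: 'application/pdf',
    disposition: inline ? 'inline' : 'attachment'
  }
end

p pdf_send_options('invoice-42')
p pdf_send_options('report.pdf', inline: true)
```

In a controller you could then write something like send_data PDFKit.new(html).to_pdf, **pdf_send_options('invoice', inline: true), with html coming from render_to_string. The helper name and filenames are made up for illustration.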

Configure PDFKit in an initializer:

PDFKit.configure do |config|
  config.default_options = {
    # print_media_type: true,
    # page_size: 'A4',

    # margin_top: '2cm',
    # margin_right: '2cm',
    # margin_left: '2cm',
    # margin_bottom: '2cm',

    quiet: true, # No output during PDF generation
    load_error_handling: 'abort', # Crash early
    load_media_error_handling: 'abort', # Crash early
    no_outline: true, # Disable the default outline
    # disable_smart_shrinking: true, # Enable to keep the pixel/dpi ratio linear
  }
  
  config.wkhtmltopdf = Rails.root.join('vendor/wkhtmltopdf/linux-trusty-amd64/wkhtmltopdf').to_s
end

Most options are forwarded to wkhtmltopdf (see below). You can get a list of supported options by running man wkhtmltopdf or wkhtmltopdf --extended-help. Whatever else you configure, always set quiet: true to keep your test output and logs clean.
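Option keys use underscores in Ruby, while wkhtmltopdf flags use dashes (page_size becomes --page-size). The following simplified sketch illustrates that mapping; it is an assumption about the general idea, not PDFKit's actual code, and the hash case mirrors repeatable options such as replace:

```ruby
# Simplified sketch of turning an options hash into wkhtmltopdf arguments.
# NOT PDFKit's implementation; for illustration only.
def wkhtmltopdf_args(options)
  options.flat_map do |key, value|
    flag = "--#{key.to_s.tr('_', '-')}" # page_size -> --page-size
    case value
    when true then [flag]               # boolean switch, e.g. --quiet
    when false, nil then []             # option left out entirely
    when Hash                           # repeatable pairs, e.g. --replace name value
      value.flat_map { |name, val| [flag, name.to_s, val.to_s] }
    else [flag, value.to_s]             # flag followed by its value
    end
  end
end

args = wkhtmltopdf_args(
  quiet: true,
  page_size: 'A4',
  load_error_handling: 'abort',
  no_outline: true
)
puts args.join(' ')
# prints: --quiet --page-size A4 --load-error-handling abort --no-outline
```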

How to express page breaks, headers, footers, etc.

Some concepts and formatting only make sense on paper, so the question is how to implement them when all you have is CSS:

  • CSS has a few print-related properties, e.g. for controlling page breaks: page-break-before: always, page-break-after: always and page-break-inside: avoid.
  • PDFKit also comes with some custom options for things that are hard to express in CSS (or are not supported by the WebKit engine that PDFKit uses internally), such as:
    • Paper size
    • Print margins
    • Repeating header on every page
    • Repeating footer on every page
  • You can actually execute JavaScript before the page is rendered to PDF, and implement things like page numbers in the header or footer.
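As a small illustration of the page-break properties, here is a hypothetical Ruby helper (join_with_page_breaks is not part of PDFKit) that joins already-rendered HTML sections so that each one starts on a new page. In a real app you would more likely put these rules into your print stylesheet:

```ruby
# Hypothetical helper, not part of PDFKit: wraps each HTML section in a <div>,
# asks the renderer not to split a section across pages, and forces a
# page break after every section except the last one.
def join_with_page_breaks(sections)
  sections.each_with_index.map do |html, index|
    styles = ['page-break-inside: avoid']
    styles << 'page-break-after: always' unless index == sections.length - 1
    %(<div style="#{styles.join('; ')}">#{html}</div>)
  end.join("\n")
end

puts join_with_page_breaks(['<h1>Invoice</h1>', '<h1>Terms and conditions</h1>'])
```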

Headers and Footers

  • To add your repeated header and footer files, add or modify these attributes in your PDFKit options hash:
{
  header_html: 'app/views/foo/bar/header.html',
  footer_html: 'app/views/foo/bar/footer.html',
  margin_top: '200px', # Height of the header, can be px or mm
  margin_bottom: '150px', # Height of the footer, can be px or mm
  header_spacing: 52.197, # margin_top converted from px => mm (You can see it as "margin-top: -200px")
  footer_spacing: 0, # The top left edge of the footer is already at the right position
  replace: { # You can pass custom data to your JavaScript this way
    custom: {
      foo: 'bar',
    }.to_json,
  },
}
  • Both files are independent DOM trees and share nothing. This means that styles, fonts and scripts must be included in each of these files.

  • I ended up inlining style and script tags, since relative paths do not work when the files are loaded by the wkhtmltopdf binary. With something like %style= Rails.application.assets.find_asset('pdf.sass').body.html_safe in your layout, you can still keep the CSS in its own stylesheet file and only inline it at render time.

  • Use the following JavaScript function to access variables such as the page number or the custom JSON encoded hash:

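// wkhtmltopdf appends the page variables (page, topage, section, ...) to the
// URL of the header/footer document as a query string. This parses them into
// an object such as { page: "1", topage: "3", custom: "..." }.
function allQueryInformation() {
  var pdfInfo = {};
  var queryStrings = document.location.search.substring(1).split('&');

  for (var i = 0; i < queryStrings.length; i++) {
    // Split at the first "=" only, so values that contain "=" are kept whole
    var separator = queryStrings[i].indexOf('=');
    if (separator === -1) continue;

    var key = queryStrings[i].substring(0, separator);
    var value = queryStrings[i].substring(separator + 1);
    pdfInfo[key] = decodeURI(value);
  }

  return pdfInfo;
}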
  • If you want to place repeated content outside the header or footer area, I found that this is only possible via absolute positioning in the header, but not the footer.
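Because the header and footer files are standalone documents, one option is to generate them at runtime with the CSS inlined. A minimal sketch, assuming a hypothetical helper write_standalone_html that writes such a document to a temporary file whose path you then pass as header_html or footer_html:

```ruby
require 'tempfile'

# Hypothetical helper: builds a standalone header (or footer) document with
# its CSS inlined, since it shares nothing with the main page and relative
# paths may not resolve when wkhtmltopdf loads the file.
def write_standalone_html(body_html, css)
  file = Tempfile.new(['pdf_header', '.html'])
  file.write(<<~HTML)
    <!DOCTYPE html>
    <html>
      <head>
        <meta charset="utf-8">
        <style>#{css}</style>
      </head>
      <body>#{body_html}</body>
    </html>
  HTML
  file.close
  file # return the Tempfile object, not just the path (see note below)
end

header = write_standalone_html('<div class="header">ACME Inc.</div>', '.header { font-size: 10pt; }')
puts header.path
```

You would then use header_html: header.path. Keep the Tempfile object referenced until the PDF has been rendered, because Ruby deletes the file when the object is garbage-collected.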

Fonts and their rendering quality

  • The font rendering quality of PDFKit used to be really, really horrible when compared to e.g. saving a page as PDF from a Chrome Browser. Horrible kerning, distorted characters, bad support for web fonts, etc.
  • This has improved a lot: rendering quality is now fine in recent versions of wkhtmltopdf (0.12+).
  • You will never beat LaTeX if you need perfect font rendering.
  • If you are observing strange behavior when including your fonts, this card might help.

Understand the wkhtmltopdf binary

PDFKit is only a thin wrapper around the wkhtmltopdf binary. Unfortunately, old versions of wkhtmltopdf have many, many issues, and your package sources don't usually come with a recent version. You should have at least 0.12.1, which you can download from the wkhtmltopdf website. Bundle it with your application and tell PDFKit where to find the bundled binary like so:

PDFKit.configure do |config|
  config.wkhtmltopdf = "#{Rails.root}/vendor/wkhtmltopdf/linux-precise-amd64/wkhtmltopdf"
end
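A missing binary or a lost executable bit otherwise only shows up on the first PDF request, so you might want to check the path when the initializer runs. A minimal sketch with a hypothetical helper verify_wkhtmltopdf!:

```ruby
# Hypothetical check (not part of PDFKit): fail at boot time instead of on
# the first PDF request if the bundled binary is missing or not executable,
# e.g. because a deploy dropped its executable bit.
def verify_wkhtmltopdf!(path)
  raise ArgumentError, "wkhtmltopdf not found at #{path}" unless File.file?(path)
  raise ArgumentError, "wkhtmltopdf at #{path} is not executable" unless File.executable?(path)

  path # return the path so the call can be used inline in an assignment
end
```

In the initializer this could look like config.wkhtmltopdf = verify_wkhtmltopdf!("#{Rails.root}/vendor/wkhtmltopdf/linux-precise-amd64/wkhtmltopdf").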

When using version 0.12.6 and above, you'll need to add the following command line switch to your PDFKit configuration to avoid crashes with cryptic error messages like PDFKit::ImproperWkhtmltopdfExitStatus:

PDFKit.configure do |config|
  config.default_options = {
    # ... your other options ...
    enable_local_file_access: true,
  }
end
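If different machines run different wkhtmltopdf versions, you could derive the version-dependent parts of your configuration from the binary's version string. A sketch under assumptions: it expects output like "wkhtmltopdf 0.12.6 (with patched qt)" from wkhtmltopdf --version, and the method and constant names are hypothetical:

```ruby
require 'rubygems' # provides Gem::Version (already loaded in most Ruby setups)

MINIMUM_VERSION = Gem::Version.new('0.12.1')
LOCAL_FILE_ACCESS_SINCE = Gem::Version.new('0.12.6')

# Extracts the first "x.y.z" number from the --version output (assumed format).
def wkhtmltopdf_version(version_output)
  number = version_output[/\d+\.\d+\.\d+/]
  raise ArgumentError, "unrecognized version output: #{version_output.inspect}" unless number

  Gem::Version.new(number)
end

# Refuses versions that are too old and adds the switch only where it is needed.
def version_dependent_options(version)
  raise "wkhtmltopdf #{version} is too old, need #{MINIMUM_VERSION}+" if version < MINIMUM_VERSION

  version >= LOCAL_FILE_ACCESS_SINCE ? { enable_local_file_access: true } : {}
end

version = wkhtmltopdf_version('wkhtmltopdf 0.12.6 (with patched qt)')
p version_dependent_options(version)
```

You could obtain the real string by running the binary with --version (e.g. via backticks or Open3.capture2) and merge the result into config.default_options.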

Deadlock issues on development machine ("PDFKit middleware hangs")

When using the PDFKit middleware on your development machine, you might notice that your application "locks up" whenever you request a .pdf route.

This behavior is caused by a deadlock:

  1. The Rails process is trying to render the page to PDF
  2. To render the PDF, wkhtmltopdf needs additional assets (CSS, images, JavaScripts), which it requests from the same application server
  3. When using a single-threaded development server like Thin, there is no additional worker available to deliver those assets, so both requests wait for each other forever

The easiest fix for this is to use Passenger Standalone for development, which can spawn multiple worker processes. However, Passenger does not let you use debugger or byebug.

If you don't want to use Passenger you can also do this:

  • Switch from Thin to WEBrick
  • In config/environments/development.rb set config.allow_concurrency = true (default in Rails 4)

Note that this allows concurrent requests served from the same process using threads. This might cause unexpected behavior if your application or dependencies are not thread-safe. If you don't know what that means, your application probably isn't thread-safe.
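The deadlock can be reproduced without Rails. The following toy model is only an illustration, not how Thin, WEBrick or Rails work internally: a server with a fixed number of worker threads handles a "PDF" request that itself needs a second request (an asset) answered by the same server. ToyServer, render_pdf and completes? are hypothetical names.

```ruby
require 'timeout'

# Toy server: a fixed number of worker threads pull requests from a shared queue.
class ToyServer
  def initialize(workers)
    @queue = Queue.new
    @threads = Array.new(workers) do
      Thread.new do
        loop do
          job, reply = @queue.pop
          reply << job.call # send the result back to whoever made the request
        end
      end
    end
  end

  # Enqueues a request and blocks until some worker has answered it.
  def request(&job)
    reply = Queue.new
    @queue << [job, reply]
    reply.pop
  end

  def shutdown
    @threads.each(&:kill)
  end
end

# The "PDF request" needs an asset served by the same server before it can
# finish, like wkhtmltopdf fetching CSS and images.
def render_pdf(server)
  server.request do
    css = server.request { 'body { color: black }' }
    "PDF rendered with #{css.length} bytes of CSS"
  end
end

# Returns false if rendering did not finish within the time limit (deadlock).
def completes?(workers, seconds = 0.5)
  server = ToyServer.new(workers)
  Timeout.timeout(seconds) { render_pdf(server) }
  true
rescue Timeout::Error
  false
ensure
  server&.shutdown
end

puts completes?(1) # the only worker is busy with the PDF, nobody serves the asset
puts completes?(2) # a second worker can serve the asset
```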

Caveats when implementing pixel-perfect layouts

  • PDFKit sets a default vertical margin of 0.75 inch, which disables the automatic header/footer calculation of wkhtmltopdf. In some versions of PDFKit, this margin was impossible to unset.
  • If the rendered PDF document doesn't have a doctype, some versions of wkhtmltopdf won't render the header.
  • A white border is drawn around the header and footer, which you might want to reset.
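If you pass an HTML string to PDFKit.new, a hypothetical helper like ensure_doctype can guard against the missing-doctype problem by prepending an HTML5 doctype when none is present:

```ruby
# Hypothetical helper: prepends an HTML5 doctype unless the document already
# starts with one (leading whitespace and letter case are ignored).
def ensure_doctype(html)
  html.lstrip.match?(/\A<!doctype/i) ? html : "<!DOCTYPE html>\n#{html}"
end

puts ensure_doctype('<html><body>Hi</body></html>')
```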
Henning Koch Over 9 years ago