How to auto-rotate document (text-based) images before OCR or computer vision tasks


A page scanned upside down or sideways can confuse OCR engines and vision LLMs. Both often cope with such input, but extraction quality tends to be better when the text we pass in is correctly oriented.

Detecting and correcting the image orientation does not require extra hardware on our web servers; it only adds a bit of complexity to the overall pipeline.

Approach

Tesseract ships with an Orientation and Script Detection (OSD) mode (--psm 0) that returns a rotation angle and a confidence score. We run it on a small sector of the image, read the angle, and rotate the image with libvips.

Two caveats drive the design:

  • Tesseract OSD needs a minimum amount of text to be confident. Running it on a full page often picks up noise (borders, stamps, tiny footer text) and returns low-confidence garbage.
  • OSD takes about 1 s on a full A4 page, so we want to avoid running it on the whole page.
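To put the speed-up into numbers, the snippet below compares the pixel count of one 650x650 sector with a full A4 page at 150 and 200 DPI. It assumes OSD runtime grows roughly with the number of pixels scanned, which is a simplification for illustration, not a measured result.

```ruby
# Rough pixel-count comparison: one OSD probe window vs. a full A4 page.
# Assumption: OSD runtime scales roughly with the number of pixels scanned.
MM_PER_INCH = 25.4
A4_MM = [210, 297].freeze

# A4 page size in pixels at the given resolution.
def a4_pixels(dpi)
  A4_MM.map { |mm| (mm / MM_PER_INCH * dpi).round }
end

# Fraction of the page's pixels covered by one square sector.
def sector_share(dpi, sector_size: 650)
  width, height = a4_pixels(dpi)
  (sector_size**2).fdiv(width * height)
end

[150, 200].each do |dpi|
  w, h = a4_pixels(dpi)
  puts format('%d DPI: %dx%d page, one sector = %.1f%% of the pixels',
              dpi, w, h, sector_share(dpi) * 100)
end
```

At 200 DPI one sector covers roughly 11% of the page (about 19% at 150 DPI), so under that assumption probing one or two sectors should be considerably cheaper than a single full-page pass.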

So we crop a few candidate sectors (center first, then corners), probe each with OSD until one clears a confidence threshold, and rotate the image accordingly. If nothing clears the threshold, we leave the image alone.

If the image is smaller than a single sector in either dimension, sector scanning makes no sense. We skip the loop and run OSD once on the whole image instead, trusting whatever confidence tesseract reports since there is no better alternative.
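The probing order and both fallbacks can be illustrated without tesseract. In this sketch, `probe` is a hypothetical stand-in for an OSD call: it receives a region name and returns `[angle, confidence]`. The canned readings are made up.

```ruby
# Sketch of the detection strategy with the OSD call stubbed out.
Detection = Struct.new(:angle, :confidence, :source)

SECTOR_ORDER = %w[Center Top-Left Bottom-Left Top-Right Bottom-Right].freeze

def detect_orientation(width, height, probe, sector_size: 650, min_confidence: 2.0)
  # Too small for even one sector: scan the whole image once and trust the answer.
  if width < sector_size || height < sector_size
    angle, confidence = probe.call('Whole image')
    return Detection.new(angle, confidence, 'Whole image')
  end

  # Probe sectors in priority order and stop at the first confident reading.
  SECTOR_ORDER.each do |name|
    angle, confidence = probe.call(name)
    return Detection.new(angle, confidence, name) if confidence >= min_confidence
  end

  # Nothing cleared the threshold: report "no rotation".
  Detection.new(0, 0.0, 'No confident text')
end

# Hypothetical readings: the center is too sparse, the top-left corner is confident.
calls = []
readings = { 'Center' => [0, 0.4], 'Top-Left' => [90, 3.1] }
probe = lambda do |name|
  calls << name
  readings.fetch(name, [0, 0.0])
end

result = detect_orientation(1654, 2339, probe)
p result.to_a # prints [90, 3.1, "Top-Left"]
p calls       # prints ["Center", "Top-Left"]
```

The remaining three corners are never probed once the top-left sector clears the threshold, which is where the time saving comes from.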

Implementation

This class takes a Vips::Image and returns a Vips::Image, rotated if a confident orientation was found. This matches the interface of Image::Autocrop so the two can easily be chained.

require 'open3'
require 'vips'

class Autorotate

  # Data.define requires Ruby 3.2 or later.
  Result = Data.define(:angle, :confidence, :source)
  
  MIN_CONFIDENCE = 2.0
  SECTOR_SIZE = 650
  PADDING = 100
  TESSERACT_CMD = "tesseract stdin stdout --psm 0 -c min_characters_to_try=20".freeze
  
  # Tesseract "Rotate: N" means "rotate N° clockwise to upright".
  # libvips rot :d90 / :d180 / :d270 are clockwise rotations, so mapping is 1:1.
  VIPS_ROTATIONS = {
    0   => nil,
    90  => :d90,
    180 => :d180,
    270 => :d270,
  }.freeze
  
  def self.perform(image)
    result = detect(image)
    rotation = VIPS_ROTATIONS[result.angle]
    rotation ? image.rot(rotation) : image
  end
  
  def self.detect(image)
    if image.width < SECTOR_SIZE || image.height < SECTOR_SIZE
      angle, confidence = scan_whole(image)
      return Result.new(angle: angle, confidence: confidence, source: 'Whole image')
    end

    sectors = define_sectors(image.width, image.height)

    sectors.each do |sector|
      angle, confidence = scan_sector(image, sector[:x], sector[:y])

      if confidence >= MIN_CONFIDENCE
        return Result.new(angle: angle, confidence: confidence, source: sector[:name])
      end
    end

    Result.new(angle: 0, confidence: 0.0, source: 'No confident text')
  end

  def self.define_sectors(width, height)
    [
      { name: 'Center',       x: (width - SECTOR_SIZE) / 2, y: (height - SECTOR_SIZE) / 2 },
      { name: 'Top-Left',     x: PADDING,                   y: PADDING },
      { name: 'Bottom-Left',  x: PADDING,                   y: height - SECTOR_SIZE - PADDING },
      { name: 'Top-Right',    x: width - SECTOR_SIZE - PADDING, y: PADDING },
      { name: 'Bottom-Right', x: width - SECTOR_SIZE - PADDING, y: height - SECTOR_SIZE - PADDING },
    ]
  end

  def self.scan_whole(image)
    run_osd(image.write_to_buffer('.png'))
  end

  def self.scan_sector(image, x, y)
    safe_x = [0, x].max
    safe_y = [0, y].max
    safe_width = [SECTOR_SIZE, image.width - safe_x].min
    safe_height = [SECTOR_SIZE, image.height - safe_y].min
    # Skip slivers: a clamped sector this small holds too little text for OSD.
    return [0, 0.0] if safe_width < 100 || safe_height < 100

    run_osd(image.crop(safe_x, safe_y, safe_width, safe_height).write_to_buffer('.png'))
  end

  def self.run_osd(png_bytes)
    stdout, _stderr, status = Open3.capture3(
      TESSERACT_CMD,
      stdin_data: png_bytes, binmode: true,
    )
    return [0, 0.0] unless status.success?

    angle = stdout[/Rotate: (\d+)/, 1].to_i
    confidence = stdout[/Orientation confidence: ([\d\.]+)/, 1].to_f
    [angle, confidence]
  end

end
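The two regexes in `run_osd` can be checked without tesseract installed. `SAMPLE_OSD` below is a hand-written example in the shape of tesseract's `--psm 0` output with made-up values, not a captured run.

```ruby
# Hand-written sample in the shape of tesseract's OSD output (values made up).
SAMPLE_OSD = <<~TEXT
  Page number: 0
  Orientation in degrees: 270
  Rotate: 90
  Orientation confidence: 4.27
  Script: Latin
  Script confidence: 2.15
TEXT

# Same parsing as run_osd: missing fields fall back to 0 / 0.0 via nil.to_i / nil.to_f.
def parse_osd(stdout)
  angle = stdout[/Rotate: (\d+)/, 1].to_i
  confidence = stdout[/Orientation confidence: ([\d\.]+)/, 1].to_f
  [angle, confidence]
end

VIPS_ROTATIONS = { 0 => nil, 90 => :d90, 180 => :d180, 270 => :d270 }.freeze

angle, confidence = parse_osd(SAMPLE_OSD)
p [angle, confidence, VIPS_ROTATIONS[angle]] # prints [90, 4.27, :d90]
p parse_osd('')                              # prints [0, 0.0]
```

Empty or unexpected output degrades to `[0, 0.0]`, so a failed or garbled run never leads to a rotation.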

A few parameters can be tuned:

  • SECTOR_SIZE = 650 is a good default for images rendered at 150-200 DPI: large enough to contain several lines of text, small enough that tesseract finishes quickly. Scale it up for much larger inputs.
  • Sector priority: center first (body text is the cleanest signal), corners as fallback for pages with wide margins or figures in the middle.
  • MIN_CONFIDENCE = 2.0 is a threshold on tesseract's orientation confidence, which is a raw score, not a percentage. Below this, the result is essentially a guess. Well-scanned pages typically land between 2 and 5.
  • min_characters_to_try=20 makes OSD bail out quickly on near-empty sectors.
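As a worked example of the sector layout, this is where the five probe windows land on an A4 page rendered at 200 DPI (about 1654x2339 px), using the same formulas as `define_sectors` with `SECTOR_SIZE = 650` and `PADDING = 100`.

```ruby
SECTOR_SIZE = 650
PADDING = 100

# Same geometry as Autorotate.define_sectors, with the corner offsets named.
def define_sectors(width, height)
  right  = width - SECTOR_SIZE - PADDING
  bottom = height - SECTOR_SIZE - PADDING
  [
    { name: 'Center',       x: (width - SECTOR_SIZE) / 2, y: (height - SECTOR_SIZE) / 2 },
    { name: 'Top-Left',     x: PADDING, y: PADDING },
    { name: 'Bottom-Left',  x: PADDING, y: bottom },
    { name: 'Top-Right',    x: right,   y: PADDING },
    { name: 'Bottom-Right', x: right,   y: bottom },
  ]
end

# Print each window as a pixel range on the page.
define_sectors(1654, 2339).each do |s|
  puts format('%-12s x=%4d..%4d  y=%4d..%4d',
              s[:name], s[:x], s[:x] + SECTOR_SIZE, s[:y], s[:y] + SECTOR_SIZE)
end
```

The center window covers x=502..1152, y=844..1494, while the corner windows stay 100 px away from the page edges, keeping them clear of the outermost border area.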

Usage

require 'vips'
image = Vips::Image.new_from_file('sideways.png')
rotated = Autorotate.perform(image)
rotated.write_to_file('rotated.png')

For multi-page PDFs, render each page to an image first (pdftoppm -png -r 200 input.pdf out), or use libvips' PDF loader, then call Autorotate.perform on each page.
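For the pdftoppm route, a minimal sketch might look like this. `render_pdf_pages`, `pdftoppm_command` and `sort_pages` are hypothetical helpers; the code assumes poppler-utils is installed and that pdftoppm names its output `<prefix>-<page>.png`.

```ruby
require 'open3'

# Hypothetical helpers, not part of the card's Autorotate class.
# Assumes poppler-utils is installed so that `pdftoppm` is on the PATH.

# Argument list for pdftoppm; passing an array avoids going through a shell.
def pdftoppm_command(pdf_path, prefix, dpi: 200)
  ['pdftoppm', '-png', '-r', dpi.to_s, pdf_path, prefix]
end

# Sort on the trailing page number itself so the order does not depend on
# how pdftoppm pads the numbers.
def sort_pages(paths)
  paths.sort_by { |path| path[/-(\d+)\.png\z/, 1].to_i }
end

# Renders every page into out_dir and returns the PNG paths in page order.
def render_pdf_pages(pdf_path, out_dir, dpi: 200)
  prefix = File.join(out_dir, 'page')
  _stdout, stderr, status = Open3.capture3(*pdftoppm_command(pdf_path, prefix, dpi: dpi))
  raise "pdftoppm failed: #{stderr}" unless status.success?

  sort_pages(Dir.glob("#{prefix}-*.png"))
end

# Usage (needs a real PDF, libvips, and the Autorotate class from this card):
#
#   require 'tmpdir'
#   require 'vips'
#
#   Dir.mktmpdir do |dir|
#     render_pdf_pages('input.pdf', dir).each do |png|
#       rotated = Autorotate.perform(Vips::Image.new_from_file(png))
#       # Write to a new file: libvips may still be reading the source lazily.
#       rotated.write_to_file(png.sub(/\.png\z/, '-rotated.png'))
#     end
#   end
```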

Chaining with autocrop

Orientation detection and autocrop are complementary: both clean up the image before OCR or a vision LLM sees it. Rotate first, then crop: cropping a sideways page would trim the wrong edges, and the autocrop header/footer heuristic assumes the image is already upright.

original_image = Vips::Image.new_from_file('large_sideways_image.png')
thumbnail = original_image.thumbnail_image(1500, height: 1500, size: :down)
rotated = Autorotate.perform(thumbnail)
cropped = Autocrop.perform(rotated)
cropped&.write_to_file('ready_for_llm_analysis.png')
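Because both classes share the `perform(image)` interface, the chain can also be written as a fold over a list of steps. This sketch uses hypothetical stand-ins (`FakeRotate`, `FakeCrop`) so it runs without libvips, and it assumes, as the `&.` above suggests, that a step may return nil; once that happens, the remaining steps are skipped.

```ruby
# Apply each step's perform(image) in order; a nil result short-circuits the rest.
def run_pipeline(image, steps)
  steps.reduce(image) { |current, step| current && step.perform(current) }
end

# Hypothetical stand-ins with the same perform(image) interface.
# "Images" are arrays that record which steps touched them.
module FakeRotate
  def self.perform(image)
    image + ['rotated']
  end
end

module FakeCrop
  # Pretend cropping gives up (returns nil) on an image marked as blank.
  def self.perform(image)
    image.include?('blank') ? nil : image + ['cropped']
  end
end

p run_pipeline(['page'], [FakeRotate, FakeCrop])              # prints ["page", "rotated", "cropped"]
p run_pipeline(['blank'], [FakeRotate, FakeCrop, FakeRotate]) # prints nil
```

With the real classes this would read `run_pipeline(thumbnail, [Autorotate, Autocrop])&.write_to_file('ready_for_llm_analysis.png')`; the order of the array encodes the rotate-before-crop rule.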

Caveats

  • Pages without enough readable text in any sector fall through to angle: 0, confidence: 0.0. That is the intended behavior: better to leave a page alone than to rotate it based on noise. Figure-only pages, near-blank separator pages, and heavily handwritten pages will pass through unchanged.
  • OSD only detects rotations in 90° increments (0, 90, 180, 270). Skewed scans (a few degrees off) need a separate deskew step. Tika handles this with detectAngles="true" (see the Tika configuration card).
License
Source code in this card is licensed under the MIT License.
Posted by Michael Leimstädtner to makandra dev (2026-04-22 12:42)