Posted over 4 years ago. Visible to the public.

Import Excel files without running into memory limitations

There are several gems that make it easy to read and process xlsx files. Parsing the entire file at once however is error-prone since each cell is transformed to a ruby object - sometimes including thousands of formatted but empty cells.

As of today, I found two promising alternatives that provide a stream-based access to spradsheet rows:

  • Roo Archive supports multiple spreadsheet types like ODS or CSV and has a quite large contributor base
  • Creek Archive uses a similar approach and worked fine for me as well. It does not support XLS or CSV files but is the clear choice if you have to process metadata like hidden rows. All numbers will be parsed as floats though.

In my case, I had to switch from SimpleXlsxReader Archive to Creek since a badly formatted 12MB Sheet broke the 500MB Staging Memory limit when reading it as a whole.

The implementation follows the Creek's documentation Archive :

Copy
document = Creek::Book.new('/path/to/file.xlsx') document.sheets.each do |sheet| sheet.rows.each do |row| cells = row.values # Process this row end end

You should note two things:

  • The row hash maps cell names to their values
  • sheet.rows is an Enumerator and yields each row subsequently

Your development team has a full backlog of feature requests, chores and refactoring coupled with deadlines? We are familiar with that. With our "DevOps as a Service" offering, we support developer teams with infrastructure and operations expertise.

Owner of this card:

Avatar
Michael Leimstädtner
Last edit:
over 1 year ago
by Michael Leimstädtner
About this deck:
We are makandra and do test-driven, agile Ruby on Rails software development.
License for source code
Posted by Michael Leimstädtner to makandra dev
This website uses short-lived cookies to improve usability.
Accept or learn more