There are several gems that make it easy to read and process xlsx files. Parsing the entire file at once, however, can use a lot of memory, since every cell is turned into a Ruby object, sometimes including thousands of formatted but empty cells.
As of today, I have found two promising alternatives that provide stream-based access to spreadsheet rows:
In my case, I had to switch from SimpleXlsxReader to Creek because a badly formatted 12 MB sheet exceeded the 500 MB memory limit of our staging server when it was read as a whole.
The implementation follows Creek's documentation:
require 'creek'

document = Creek::Book.new('/path/to/file.xlsx')
document.sheets.each do |sheet|
  # sheet.rows is an Enumerator that yields one row hash at a time
  sheet.rows.each do |row|
    cells = row.values
    # Process this row
  end
end
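One way to keep the per-row logic testable without a real workbook is to write it against any Enumerable of row hashes. The sketch below assumes that Creek's `rows` yields one hash per row keyed by cell reference (e.g. `{"A1" => "foo"}`) and that empty cells come back as `nil` or `""`; `count_non_empty_rows` and `fake_rows` are hypothetical names used only for illustration.

```ruby
# Hypothetical helper: counts rows that contain at least one non-blank cell.
# It only relies on #count and Hash#values, so it accepts Creek's sheet.rows
# as well as any other Enumerable of row hashes (assumed row shape).
def count_non_empty_rows(rows)
  rows.count do |row|
    # Assumption: formatted but empty cells show up as nil or ""
    row.values.any? { |value| !value.nil? && value.to_s.strip != "" }
  end
end

# Usage with Creek (requires the creek gem):
#   require 'creek'
#   book = Creek::Book.new('/path/to/file.xlsx')
#   book.sheets.each { |sheet| puts count_non_empty_rows(sheet.rows) }

# Hypothetical stand-in for a sheet: an Enumerator that builds rows lazily,
# one at a time, instead of holding them all in an array
fake_rows = Enumerator.new do |y|
  y << { "A1" => "id", "B1" => "name" }
  y << { "A2" => nil, "B2" => "" }
  y << { "A3" => "1", "B3" => "Alice" }
end

puts count_non_empty_rows(fake_rows)
```

Because the helper only iterates, memory use stays bounded by a single row no matter how many formatted but empty rows the sheet contains.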
You should note two things: