Nokogiri: How to parse large XML files with a SAX parser

Florian Leinsinger
December 02, 2021
Software engineer at makandra GmbH

In my case [...] the catalog is an XML file that contains all kinds of possible products, categories and vendors, and it is updated once a month. When you read this file with Nokogiri's default (DOM) parser, it creates a tree structure with all branches and leaves. This allows you to easily navigate through it via CSS/XPath selectors.

The only problem is that if you read the whole file into memory, it takes a significant amount of RAM. It is really inefficient to pay for a server with that much RAM if you only need it once a month. Since I don't need to navigate through the tree structure, but just replicate all the needed data into the database, the best option is to use a SAX parser.

When you read very large XML files, Nokogiri may explode with this message when creating the tree structure of your file:

Nokogiri::XML::XPath::SyntaxError: FATAL: Memory allocation failed: growing nodeset hit limit

A SAX parser could be a way to solve this problem.

Posted by Florian Leinsinger to makandra dev (2021-12-02 09:10)