Nokogiri: How to parse large XML files with a SAX parser

Florian Leinsinger
December 02, 2021
Software engineer at makandra GmbH

In my case [...] the catalog is an XML file that contains all kinds of possible products, categories and vendors, and it is updated once a month. When you read this file with Nokogiri's default (DOM) parser, it builds a tree structure with all branches and leaves in memory. This allows you to easily navigate through it via CSS or XPath selectors.

The only problem is that reading the whole file into memory takes a significant amount of RAM. It is really inefficient to pay for a larger server if you only need this RAM once a month. Since I don't need to navigate through the tree structure, but just copy all the needed data into the database, the best option is to use a SAX parser, which reads the file sequentially and reports each element as an event instead of building a tree.

When you read very large XML files, Nokogiri may fail with this message while creating the tree structure of your file:

Nokogiri::XML::XPath::SyntaxError: FATAL: Memory allocation failed: growing nodeset hit limit

A SAX parser could be a way to solve this problem.

Posted by Florian Leinsinger to makandra dev (2021-12-02 09:10)