Nokogiri: How to parse large XML files with a SAX parser


In my case [...] the catalog is an XML file that contains all kinds of possible products, categories and vendors, and it is updated once a month. When you read this file with Nokogiri's default (DOM) parser, it creates a tree structure with all branches and leaves, which lets you navigate it easily via CSS/XPath selectors.

The only problem is that reading the whole file into memory takes a significant amount of RAM. Paying for a server with that much RAM is wasteful if you need it only once a month. Since I don't need to navigate through the tree structure, but only replicate the needed data into the database, the best option is to use a SAX parser.

When you read very large XML files, Nokogiri may fail with this message while creating the tree structure of your file:

Nokogiri::XML::XPath::SyntaxError: FATAL: Memory allocation failed: growing nodeset hit limit

A SAX parser could be a way to solve this problem.

Last edit by Florian Leinsinger
License
Source code in this card is licensed under the MIT License.
Posted by Florian Leinsinger to makandra dev (2021-12-02 08:10)