Traverse large XML files with Nokogiri

Tobias Kraze (Software engineer at makandra GmbH)
July 11, 2011

If you need to parse a large XML file (more than roughly 20 MB), you should parse it in chunks instead of loading the whole document tree into memory at once. Otherwise, parsing will need a lot of memory.


Nokogiri offers a reader, Nokogiri::XML::Reader, that lets you parse your XML one node at a time.

Given an XML file library.xml with this content

    <library>
      <book>
        <title>...</title>
        <author>...</author>
      </book>
      <book>
         ...
      </book>
       ...
    </library>

you can, for example, loop over all books with:

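    require 'nokogiri'

    # Streams +filename+ and yields every <book> element as a small,
    # standalone Nokogiri node, without building a DOM for the whole file.
    def each_book(filename)
      File.open(filename) do |file|
        Nokogiri::XML::Reader.from_io(file).each do |node|
          # The reader also reports closing tags, so only react to opening tags.
          if node.name == 'book' && node.node_type == Nokogiri::XML::Reader::TYPE_ELEMENT
            # Re-parse just this element's XML into a regular Nokogiri document.
            yield(Nokogiri::XML(node.outer_xml).root)
          end
        end
      end
    end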

This simply yields regular Nokogiri nodes, one per book, so you can query each of them with the usual methods such as css, xpath or at.

Posted by Tobias Kraze to makandra dev (2011-07-11 12:32)