
Traverse large XML files with Nokogiri

Tobias Kraze
July 11, 2011
Software engineer at makandra GmbH

If you need to parse a large XML file (more than roughly 20 MB), you should parse it in chunks. Loading the whole document at once builds the complete tree in memory, which can require a lot of RAM.

Nokogiri offers a reader that lets you parse your XML one node at a time.

Given an XML file library.xml with this content:

    <library>
      <book>
        <title>...</title>
        <author>...</author>
      </book>
      <book>
         ...
      </book>
       ...
    </library>

you can, for example, loop over all books with:

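    require 'nokogiri'

    def each_book(filename)
      File.open(filename) do |file|
        Nokogiri::XML::Reader.from_io(file).each do |node|
          # Only react to opening <book> tags, not to their closing tags
          if node.name == 'book' && node.node_type == Nokogiri::XML::Reader::TYPE_ELEMENT
            # Parse just this one element into a regular Nokogiri node
            yield(Nokogiri::XML(node.outer_xml).root)
          end
        end
      end
    end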

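This will simply yield regular Nokogiri nodes, one per book, so the usual methods such as xpath or css work on them. Only the current book is parsed into a tree at any time.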

Posted by Tobias Kraze to makandra dev (2011-07-11 12:32)