Traverse large XML files with Nokogiri


If you need to parse a large XML file (more than 20 MB or so), parse it in chunks. Otherwise the whole document is loaded at once, which takes a lot of memory.

Nokogiri offers a reader that lets you parse your XML one node at a time.

Given an XML file library.xml with this content

    <library>
      <book>
        <title>...</title>
        <author>...</author>
      </book>
      <book>
         ...
      </book>
       ...
    </library>

you can for example loop over all books with

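    require 'nokogiri'

    # Streams the file and yields each <book> element as a regular Nokogiri node.
    def each_book(filename)
      File.open(filename) do |file|
        Nokogiri::XML::Reader.from_io(file).each do |node|
          # Only react to opening <book> tags, not to their closing tags.
          if node.name == 'book' && node.node_type == Nokogiri::XML::Reader::TYPE_ELEMENT
            # Re-parse just this element so callers get a full node.
            yield(Nokogiri::XML(node.outer_xml).root)
          end
        end
      end
    end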

This yields each book as a regular Nokogiri node, so you can query it as usual (for example with xpath or css).

Tobias Kraze
Last edit: Judith Roth
License: Source code in this card is licensed under the MIT License.
Posted by Tobias Kraze to makandra dev (2011-07-11 10:32)