Traverse large XML files with Nokogiri

If you need to parse a large XML file (more than 20 MB or so), parse it in chunks. Loading the whole document at once needs a lot of memory.

Nokogiri offers a reader that lets you parse your XML one node at a time.

Given an XML file library.xml with this content:

    <library>
      <book>
        <title>...</title>
        <author>...</author>
      </book>
      <book>
         ...
      </book>
       ...
    </library>

you can, for example, loop over all books with:

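    require 'nokogiri'

    # Streams the file and yields each <book> element as a regular Nokogiri node.
    def each_book(filename)
      File.open(filename) do |file|
        Nokogiri::XML::Reader.from_io(file).each do |node|
          # Match opening <book> tags only; skip the corresponding end-element events.
          if node.name == 'book' && node.node_type == Nokogiri::XML::Reader::TYPE_ELEMENT
            yield(Nokogiri::XML(node.outer_xml).root)
          end
        end
      end
    end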

This yields each book as a regular Nokogiri node, so the usual methods such as `at_xpath` and `text` work on it.

License
Source code in this card is licensed under the MIT License.
Posted by Tobias Kraze to makandra dev (2011-07-11 10:32)