If you need to parse a large XML file (more than 20 MB or so), you should process it in chunks. Parsing it in one go builds the whole document tree in memory, which needs a lot of RAM.
Nokogiri offers a pull parser, Nokogiri::XML::Reader, that lets you walk through your XML one node at a time.
Given an XML file library.xml with this content:
    <library>
      <book>
        <title>...</title>
        <author>...</author>
      </book>
      <book>
         ...
      </book>
       ...
    </library>
you can, for example, loop over all books with:
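    require 'nokogiri'

    # Yields each <book> element of the given file as a regular Nokogiri node,
    # without loading the whole document into memory.
    def each_book(filename)
      File.open(filename) do |file|
        Nokogiri::XML::Reader.from_io(file).each do |node|
          # The reader visits opening and closing tags; only act on opening tags.
          if node.name == 'book' && node.node_type == Nokogiri::XML::Reader::TYPE_ELEMENT
            # Parse just this <book> subtree into a small standalone document.
            yield Nokogiri::XML(node.outer_xml).root
          end
        end
      end
    end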
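Each yielded value is a regular Nokogiri element containing just that book's subtree, so you can use the usual methods such as css, xpath or at on it.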
Posted by Tobias Kraze to makandra dev (2011-07-11 10:32)