If you need to parse a large XML file (more than roughly 20 MB), parse it in chunks. Otherwise the parser builds the whole document tree in memory, which can take a lot of RAM.
Nokogiri offers a reader that lets you parse your XML one node at a time.
Given an XML file library.xml with this content:
<library>
  <book>
    <title>...</title>
    <author>...</author>
  </book>
  <book>
    ...
  </book>
  ...
</library>
you can, for example, loop over all books with:
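require 'nokogiri'

# Streams the file and yields each <book> element as a Nokogiri node.
def each_book(filename)
  File.open(filename) do |file|
    Nokogiri::XML::Reader.from_io(file).each do |node|
      # Only react to opening <book> tags, not to their closing tags.
      if node.name == 'book' && node.node_type == Nokogiri::XML::Reader::TYPE_ELEMENT
        # Re-parse just this subtree into a regular DOM node.
        yield(Nokogiri::XML(node.outer_xml).root)
      end
    end
  end
end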
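Each yielded book is a regular Nokogiri node (the root element of a small document parsed from that book's XML), so you can query it with the usual Nokogiri API. Only the current book is parsed into a full tree, not the whole file.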
Posted by Tobias Kraze to makandra dev (2011-07-11 10:32)