Parse XML or HTML with Nokogiri


To parse XML documents, I recommend the gem nokogiri.

A few hints:

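  • xml = Nokogiri::XML("<list><item>foo</item><item>bar</item></list>") parses an XML string. You can also call Nokogiri::HTML, which uses an HTML parser and is more forgiving of invalid markup.
  • xml / 'list item' returns all matching nodes (a NodeSet); 'list item' is interpreted as a CSS selector.
  • xml / './/list/item' also returns all matching nodes, but here './/list/item' is an XPath selector.
    • Nokogiri treats a query as XPath when it starts with /, ./ or ..; anything else is treated as CSS. Call node.css(...) or node.xpath(...) if you want to choose the syntax explicitly.
  • xml % 'item' returns the first matching node, or nil if nothing matches.
  • node.attribute('foo') returns the attribute named foo (or nil if there is none).
  • node.attribute('foo').value returns its value as a string; node['foo'] is a shorter way to get the same string.
  • node.content returns the node's text content, i.e. the text of all its descendants concatenated.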

Careful with XPath:

Whenever an XML document declares a default namespace, like

<list xmlns="http://mylist.org">
  <item />
</list>

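xml % './/list' will no longer match anything: the element's local name is still list, but it now lives in the http://mylist.org namespace (sometimes written {http://mylist.org}list), and an unprefixed name in an XPath query only matches elements without a namespace.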

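Use xml % './/xmlns:list' instead. Nokogiri registers the root element's namespace declarations for XPath queries, and a default namespace is available under the prefix xmlns.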

XPath examples

XPath sandbox

License
Source code in this card is licensed under the MIT License.
Posted by Tobias Kraze to makandra dev (2010-08-25 13:04)