Parse XML or HTML with Nokogiri

To parse XML documents, I recommend the gem nokogiri.

A few hints:

xml = Nokogiri::XML("<list><item>foo</item><item>bar</item></list>") parses an XML string. You can also call Nokogiri::HTML, which parses the input as HTML and is more forgiving of invalid markup.
xml / 'list item' returns all matching nodes; list item is used like a CSS selector
xml / './/list/item' also returns all matching nodes, but .//list/item is now an XPath selector
- Nokogiri appears to treat a selector as XPath when it starts with /, ./ or ..; anything else is parsed as CSS
xml % 'item' returns the first matching node
node.attribute('foo') returns the attribute named foo
node.attribute('foo').value returns its value
node.content returns the content

Whenever an XML document declares a namespace, like

<list xmlns="http://mylist.org">
  <item />
</list>

xml % './/list' will not match any more (since there is no list tag any more, just a {http://mylist.org}:list tag).

You may use xml % './/xmlns:list' instead.

Tobias Kraze

Last edit

2012-09-21

License

Source code in this card is licensed under the MIT License.

Posted by Tobias Kraze to makandra dev (2010-08-25 13:04)