Parse XML or HTML with Nokogiri

Tobias Kraze
August 25, 2010
Software engineer at makandra GmbH

To parse XML documents, I recommend the gem nokogiri.

A few hints:

  • xml = Nokogiri::XML("<list><item>foo</item><item>bar</item></list>") parses an XML string. Use Nokogiri::HTML instead for a more lenient parser that tolerates invalid markup.
  • xml / 'list item' returns all matching nodes; list item is used like a CSS selector
  • xml / './/list/item' also returns all matching nodes, but .//list/item is now an XPath selector
    • XPath seems to be triggered by a leading ./ or / (a selector such as .item is still treated as CSS)
  • xml % 'item' returns the first matching node
  • node.attribute('foo') returns the attribute named foo
  • node.attribute('foo').value returns its value
  • node.content returns the node's text content

Careful with XPath:

Whenever an XML document declares a namespace, like

<list xmlns="http://mylist.org">
  <item />
</list>

xml % './/list' will no longer match, since the document no longer contains a plain list tag, only a namespaced {http://mylist.org}list tag.

You may use xml % './/xmlns:list' instead.

XPath examples

XPath sandbox

Posted by Tobias Kraze to makandra dev (2010-08-25 15:04)