
How to inspect RSS feeds with Spreewald, XPath, and Selenium

Dominik Schöler
June 06, 2016
Software engineer at makandra GmbH

Spreewald gives you the "<step> within <selector>" meta step, which constrains page inspection to a given scope.

Unfortunately, this does not work with RSS feeds. They are XML documents, and their markup is not valid HTML when parsed by Capybara's internal browser (e.g. in HTML, a <link> tag cannot have content).

Inspecting XML

If you're inspecting XML that is invalid in HTML, you need to inspect the page source instead of the DOM. You may use Spreewald's "... in the HTML" meta step, or add this proxy step for better semantics:

Then /^I should( not)? see "(.+?)" in the page source$/ do |negate, text|
  step %(I should#{ negate } see "#{ text }" in the HTML)
end

Now you can say Then I should see "Application title" in the page source within "channel > title" whenever a simple "within ..." does not work.
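
To make the delegation explicit, here is a minimal sketch of the string the proxy step builds before handing it to Spreewald's "... in the HTML" step. The method name delegated_step is hypothetical and only exists for this illustration; it assumes Cucumber passes nil for the optional "( not)?" group when the step is not negated.

```ruby
# Hypothetical helper mirroring the string interpolation inside the proxy step.
# `negate` is either nil (group did not match) or " not" (including the leading space).
def delegated_step(negate, text)
  %(I should#{negate} see "#{text}" in the HTML)
end

puts delegated_step(nil, "Application title")
puts delegated_step(" not", "Application title")
```

Because the captured group already contains its leading space, the interpolation yields a well-formed step name in both cases.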

Note that you need to write all selectors in lowercase letters in your tests. Capybara talks HTML, after all, and in HTML, tag and attribute names are case-insensitive (unlike in XML).
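
For contrast, this sketch shows the XML side of that difference using REXML, a real XML parser that ships with Ruby. The feed content and the helper name first_text are made up for illustration; the sketch does not involve Capybara at all, it only demonstrates that element names are matched case-sensitively in XML.

```ruby
require "rexml/document"

# Made-up sample feed, for illustration only.
FEED = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <rss version="2.0">
    <channel>
      <title>Application title</title>
      <pubDate>Mon, 06 Jun 2016 13:51:00 +0200</pubDate>
    </channel>
  </rss>
XML

# Hypothetical helper: text of the first node matching the XPath, or nil if none matches.
def first_text(xpath)
  node = REXML::XPath.first(REXML::Document.new(FEED), xpath)
  node && node.text
end

p first_text("//channel/pubDate") # exact case matches the element
p first_text("//channel/pubdate") # lowercase name finds nothing in XML
```

In a Capybara test, where the page is treated as HTML, the same element would be addressed in lowercase, as described above.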

Using XPath

If you need XPath to express things that CSS selectors cannot, add the following selector in selectors.rb:

    when /^"(.+?)" \(xpath\)$/
      [:xpath, $1]

Then use it like this: Then I should see "Application title" in the page source within "//channel/title" (xpath)
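
As a rough sketch of how that branch behaves, the following standalone method imitates a selector_for helper as it might look in selectors.rb. The surrounding method body, the plain quoted-selector branch, and the error message are assumptions for illustration; only the "(xpath)" branch comes from the snippet above.

```ruby
# Hypothetical stand-in for the selector_for helper in selectors.rb.
def selector_for(locator)
  case locator
  when /^"(.+?)" \(xpath\)$/
    # Quoted selector followed by "(xpath)": return a selector type plus the expression.
    [:xpath, $1]
  when /^"(.+?)"$/
    # Plain quoted selector: assumed to be passed through as a CSS selector.
    $1
  else
    raise ArgumentError, "Can't find mapping from #{locator.inspect} to a selector"
  end
end

p selector_for('"//channel/title" (xpath)')
p selector_for('"channel > title"')
```

The array form is presumably splatted into Capybara's within, which accepts a selector type followed by the locator, so the XPath expression is not misread as CSS.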

Using Selenium and Firefox

Selenium doesn't seem to be able to inspect RSS when you follow a link to the feed. Firefox, at least, seems to intercept those RSS links, so Capybara internally remains on the previous page.
This behavior can be circumvented by visiting the link's target URL directly, using the following simple step:

# Firefox seems to intercept RSS links. Use this step to circumvent that.
When /^I open the RSS feed behind "(.+?)"$/ do |link_label|
  link = find_link(link_label)
  visit link[:href]
end
Posted by Dominik Schöler to makandra dev (2016-06-06 13:51)