255 Parsing text with regular expressions [1d]

Posted Over 2 years ago. Visible to the public.

Learn

Resources

Exercises

Find words

Write a method second_words(string) that returns the second word of every sentence in the given string.

Tip

You can generate paragraphs of random text on https://loremipsum.de/.

Write a regular expression that matches a single sentence, then apply it to the string repeatedly until there are no more matches.
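One possible way to follow this tip is sketched below. It is only an illustration under simplifying assumptions: the SENTENCE pattern is a hypothetical helper that treats any run of characters ending in ".", "!" or "?" as a sentence, so abbreviations like "e.g." would be split incorrectly. String#scan applies the pattern repeatedly and returns all matches.

```ruby
# Hypothetical pattern: a sentence is a run of characters that are not
# '.', '!' or '?', followed by one of these terminators.
SENTENCE = /[^.!?]+[.!?]/

def second_words(string)
  # String#scan applies the pattern again and again until no match is left.
  sentences = string.scan(SENTENCE)

  sentences.map { |sentence|
    # Split the sentence into words (letters and apostrophes only).
    words = sentence.scan(/[[:alpha:]']+/)
    words[1] # nil when the sentence has fewer than two words
  }.compact
end

text = 'Lorem ipsum dolor sit amet. Consectetur adipiscing elit! Sed do eiusmod?'
second_words(text) # => ["ipsum", "adipiscing", "do"]
```

Sentences with fewer than two words are skipped here; whether that is the desired behavior is a design choice the exercise leaves open.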

Parse Ruby classes

Write a ClassScanner class that parses a .rb file containing a simple Ruby class:

# student.rb

class Student < Person
  attr_reader :first_name, :last_name, :disabled
  attr_accessor :credits

  def full_name
    first_name + ' ' + last_name
  end

  def active?
    !@disabled
  end

end

The ClassScanner should work like this:

# main.rb

code = File.read('student.rb')

scanner = ClassScanner.new(code)
scanner.name # => 'Student'
scanner.superclass # => 'Person'
scanner.own_methods # => [:first_name, :last_name, :disabled, :credits, :credits=, :full_name, :active?]

We're practicing regular expressions here, not implementing a fully correct Ruby parser. Here are more details to scope what your implementation does and does not need to do:

  • You don't need to implement the parser as a single giant regex. Instead, write individual patterns for methods, accessors, etc. Apply each pattern until there are no more matches left.
  • You only need to parse the given string of Ruby code. You don't need to include methods from the superclass.
  • You may assume that the given code contains exactly one class.
  • You don't need to support namespaced classes.
  • You only need to support method definition through def ... end, attr_reader and attr_accessor. You don't need to support metaprogramming.
  • You may assume that keywords like attr_reader, def or end do not appear in any strings.
  • You should support all of the following equivalent ways of declaring readers:
    attr_reader :one, :two
    
    attr_reader(:one, :two)
    
    attr_reader(:one, 'two')
    
    attr_reader :one
    attr_reader :two
    
  • You may assume that all arguments of attr_reader and attr_accessor sit on the same line.
  • The superclass is optional. ClassScanner#superclass should return nil if the class has no superclass.
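The scoping rules above could be approached roughly as in the following sketch. It is not a reference solution. All pattern names (CLASS_PATTERN, ATTR_PATTERN, ARG_PATTERN, DEF_PATTERN) and the private helper methods are made up for illustration, and cases outside the stated scope (for example def self.foo or namespaced classes) are not handled.

```ruby
class ClassScanner
  # "class Name" with an optional "< Superclass" part.
  CLASS_PATTERN = /^\s*class\s+([A-Z]\w*)(?:\s*<\s*([A-Z]\w*))?/
  # attr_reader / attr_accessor, with or without parentheses.
  # Arguments are assumed to sit on the same line; a trailing comment is ignored.
  ATTR_PATTERN = /^\s*(attr_reader|attr_accessor)\b[ \t]*\(?([^)#\n]*)\)?/
  # A single argument written as :symbol, 'string' or "string".
  ARG_PATTERN = /[:'"]([a-z_]\w*)/i
  # "def name", including names ending in ?, ! or =.
  DEF_PATTERN = /^\s*def\s+([a-z_]\w*[?!=]?)/i

  def initialize(code)
    @code = code
  end

  def name
    @code[CLASS_PATTERN, 1]
  end

  # Returns nil when the optional superclass group did not match.
  def superclass
    @code[CLASS_PATTERN, 2]
  end

  # Accessor-generated methods first, then methods defined with def.
  def own_methods
    accessor_methods + defined_methods
  end

  private

  def accessor_methods
    @code.scan(ATTR_PATTERN).flat_map do |macro, args|
      args.scan(ARG_PATTERN).flatten.flat_map do |attribute|
        if macro == 'attr_accessor'
          [attribute.to_sym, :"#{attribute}="] # reader and writer
        else
          [attribute.to_sym]
        end
      end
    end
  end

  def defined_methods
    @code.scan(DEF_PATTERN).flatten.map(&:to_sym)
  end
end
```

Note that this sketch lists accessor methods before def methods regardless of their order in the file. That happens to match the expected output for student.rb, but a solution that preserves source order would need a different approach.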

Info

In practice we would never parse Ruby code like this. For most file formats there are libraries that parse data correctly, like the parser gem for Ruby code or Nokogiri for HTML.

Regular expressions are a blunt tool that happens to be good enough much of the time. Read "Parsing Html The Cthulhu Way" for more.

Henning Koch
Last edit
About 1 month ago
Michael Leimstädtner
License
Source code in this card is licensed under the MIT License.
Posted by Henning Koch to makandra Curriculum (2021-12-23 16:16)