Learn
Resources
- RubyGuides: Mastering Ruby Regular Expressions
- Using regular expressions in JavaScript
- Testing regular expressions visually
- Regular Expressions: Quantifier modes
- Ruby: You can nest regular expressions
- Matching line feeds with regular expressions works differently in every language
- Ruby: Making your regular expressions more readable with /x
- Regular Expressions - Cheat Sheet
- ReDOS: Denial of Service attack vector on inefficient regular expressions
- Talk: ReDoS attacks (internal talk, in our library, slides)
- Tool to check if you are affected
- Ruby: Using named groups in Regex
Exercises
Find words
Write a method second_words(string) that returns the second word of every sentence in the given string.
Tip
You can generate paragraphs of random text on https://loremipsum.de/.
Write a regular expression that matches a sentence, then call it multiple times.
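As a hedged illustration of "calling a pattern multiple times", the sketch below uses String#scan, which returns every non-overlapping match of a pattern. The sentence pattern and the sample text are assumptions made for this example (a sentence is taken to end with ".", "!" or "?"), and the code only counts words per sentence, so the exercise itself is left open.

```ruby
# Assumption: a sentence is a run of characters ending in ".", "!" or "?".
# This is a simplification; abbreviations like "e.g." would break it.
sentence_pattern = /[^.!?]+[.!?]/

text = 'Lorem ipsum dolor. Sit amet! Consectetur adipiscing elit sed?'

# String#scan applies the pattern repeatedly and returns all matches.
sentences = text.scan(sentence_pattern).map(&:strip)

# A second, smaller pattern is then applied to each sentence.
word_counts = sentences.map { |sentence| sentence.scan(/\w+/).size }

puts sentences.inspect
puts word_counts.inspect
```

Splitting the work into a sentence pattern and a word pattern keeps each expression short enough to test on its own.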
Parse Ruby classes
Write a ClassScanner class that parses a .rb file containing a simple Ruby class:
# student.rb
class Student < Person
  attr_reader :first_name, :last_name, :disabled
  attr_accessor :credits

  def full_name
    first_name + ' ' + last_name
  end

  def active?
    !@disabled
  end
end
The ClassScanner should work like this:
# main.rb
code = File.read('student.rb')
scanner = ClassScanner.new(code)
scanner.name # => 'Student'
scanner.superclass # => 'Person'
scanner.own_methods # => [:first_name, :last_name, :disabled, :credits, :credits=, :full_name, :active?]
We're practicing regular expressions here, not implementing a fully correct Ruby parser. Here are more details to scope what your implementation does and does not need to do:
- You don't need to implement the parser as a single giant regex. Instead, write individual patterns for methods, accessors, etc. Call each pattern repeatedly until there are no more matches left.
- You only need to parse the given string of Ruby code. You don't need to include methods from the superclass.
- You may assume that the given code contains exactly one class.
- You don't need to support namespaced classes.
- You only need to support method definition through def ... end, attr_reader and attr_accessor. You don't need to support metaprogramming.
- You may assume that keywords like attr_reader, def or end do not appear in any strings.
- You should support the following variants for the same thing:
attr_reader :one, :two
attr_reader(:one, :two)
attr_reader(:one, 'two')
attr_reader :one
attr_reader :two
- You may assume that all arguments of attr_reader and attr_accessor sit on the same line.
- The superclass is optional. ClassScanner#superclass should return nil if the class has no superclass.
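To illustrate the "one small pattern per concern" idea from the list above, here is a minimal sketch that handles only the attr_reader variants. The method name reader_names and the sample input are hypothetical and not part of the exercise; treat this as one possible shape for a single building block rather than the intended solution.

```ruby
# Hypothetical helper (not from the exercise text): collects the names
# passed to attr_reader, with or without parentheses, as symbols or strings.
def reader_names(code)
  # First pattern: find each attr_reader call and capture its argument list
  # (everything up to the end of the line or a closing parenthesis).
  argument_lists = code.scan(/\battr_reader\b[ (]*([^\n)]*)/).flatten

  argument_lists.flat_map do |arguments|
    # Second pattern: each argument is a symbol (:one) or a quoted string ('two').
    arguments.scan(/:(\w+)|['"](\w+)['"]/).map do |symbol, string|
      (symbol || string).to_sym
    end
  end
end

sample = <<~RUBY
  attr_reader :one, :two
  attr_reader(:three, 'four')
  attr_reader :five
RUBY

names = reader_names(sample)
puts names.inspect # => [:one, :two, :three, :four, :five]
```

Because String#scan already returns all matches, there is no explicit loop; the same two-step approach (find the call, then pick apart its arguments) could be reused for attr_accessor.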
Info
In practice we would never parse Ruby code like this. For most file formats there are libraries that parse data correctly, like the parser gem for Ruby code or Nokogiri for HTML.
Regular expressions are a blunt tool that happens to be good enough much of the time. Read Parsing Html The Cthulhu Way for more.