Let’s set up an example use case for regular expressions:
- You have a database of human-entered addresses.
- These addresses typically use abbreviations for street type (e.g., “ST” for “Street”).
- We want to display the unabbreviated version.
- Don’t forget that addresses like “1234 St Peter St, Ste 12” (where “St” means both “Saint” and “Street”, and “Ste” means “Suite”) are a real possibility.
Using a capturing group to replace just the last instance
'1234 St Peter St, Ste 12'.sub(/(.*)\bst\b/i, '\1Street')
# => "1234 St Peter Street, Ste 12"
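- The full regex (the part to be replaced) matches “1234 St Peter St”. Because .* is greedy, it grabs as much of the string as it can, so \bst\b ends up matching the last standalone “St”. The \b word boundaries keep it from matching the “St” inside “Ste”.
- Capture group \1 matches “1234 St Peter ” (including the trailing space).
- \1Street essentially says, “Replace '1234 St Peter St' with '1234 St Peter ' + 'Street'.”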
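To see why the greedy (.*) prefix matters, here is a small comparison on the same address: plain sub hits the first standalone “St”, the capture-group version hits the last one, and gsub hits all of them. It uses only core String#sub and String#gsub.

```ruby
address = '1234 St Peter St, Ste 12'

# No prefix group: sub replaces the FIRST standalone "St" (the "Saint").
first_only = address.sub(/\bst\b/i, 'Street')

# Greedy (.*) prefix: .* first swallows the whole string, then backtracks
# until \bst\b can match, which happens at the LAST standalone "St".
# \1 puts the swallowed prefix back in front of the replacement.
last_only = address.sub(/(.*)\bst\b/i, '\1Street')

# gsub replaces EVERY standalone "St", including the saint's name.
every = address.gsub(/\bst\b/i, 'Street')

p first_only # prints "1234 Street Peter St, Ste 12"
p last_only  # prints "1234 St Peter Street, Ste 12"
p every      # prints "1234 Street Peter Street, Ste 12"
```

In all three cases “Ste” is left alone, because the “St” inside it is followed by a word character, so the trailing \b cannot match there.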
Advanced usage with block syntax and interpolation
Of course, there are more street types than just “Street”. What about courts, lanes, and boulevards? How do we handle all of those? Let’s extend our regex to match any of these abbreviations, and test it on “1234 St Mary Ln”:
'1234 St Mary Ln'.sub(/(.*)\b(st|ct|ln|blvd)\b/i, '\1Street')
# => "1234 St Mary Street"
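The result above goes wrong for a specific reason, which String#match makes visible: the regex already captures the abbreviation in group 2, but the fixed replacement string '\1Street' never looks at it.

```ruby
m = '1234 St Mary Ln'.match(/(.*)\b(st|ct|ln|blvd)\b/i)

p m[0] # prints "1234 St Mary Ln"  (the whole match, i.e. the part sub replaces)
p m[1] # prints "1234 St Mary "    (group 1, what \1 refers to)
p m[2] # prints "Ln"               (group 2, the abbreviation that '\1Street' ignores)
```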
Oops, now “St Mary” is a street, not a lane. We need to use a different replacement depending on the abbreviation. For this, we’ll have to make a few changes.
- We need to set up a dictionary to map abbreviations to long forms.
- We need a capture group for the abbreviation. (We already do, because it has parentheses around it; it’s \2.)
- We need to use sub’s block syntax in place of the '\1Street' replacement string that we have now.
expansions = {
'st' => 'Street',
'ct' => 'Court',
'ln' => 'Lane',
'blvd' => 'Boulevard',
}
pattern = /(.*)\b(#{expansions.keys.join('|')})\b/i # => /(.*)\b(st|ct|ln|blvd)\b/i
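Joining the keys with '|' works here because none of them contain regex metacharacters. If the dictionary might ever hold a key with one (say a hypothetical 'st.' entry), a slightly more defensive sketch escapes each key with Regexp.escape before joining. For the current keys it builds exactly the same pattern.

```ruby
expansions = {
  'st' => 'Street',
  'ct' => 'Court',
  'ln' => 'Lane',
  'blvd' => 'Boulevard',
}

# Regexp.escape backslash-escapes regex metacharacters in each key, so a
# key such as 'st.' would match a literal dot instead of "any character".
alternation = expansions.keys.map { |key| Regexp.escape(key) }.join('|')
pattern = /(.*)\b(#{alternation})\b/i

p pattern # prints /(.*)\b(st|ct|ln|blvd)\b/i
```

Joining the escaped source strings (rather than interpolating a Regexp built with Regexp.union) keeps the outer /i flag in effect for the whole alternation.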
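With block syntax, the capture groups are available inside the block through the special match variables $1, $2, etc. (sub sets them before running the block; they are not block parameters), and the block’s return value becomes the replacement. Using $2, we can look up the correct expansion to substitute for the given street abbreviation. Because the regex is case-insensitive, $2 may come back as “Blvd” or “BLVD”, so we downcase it before the lookup.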
"1234 St Peter's State Blvd Ste E".sub(pattern) do
"#{$1}#{expansions[$2.downcase]}"
end
# => "1234 St Peter's State Boulevard Ste E"
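Putting the pieces together, here is a minimal sketch that wraps the block form in a helper and runs it on the addresses from this post. The helper name expand_street_type and the extra address '1234 Main Rd' (which has no recognized abbreviation) are hypothetical, added for illustration.

```ruby
EXPANSIONS = {
  'st' => 'Street',
  'ct' => 'Court',
  'ln' => 'Lane',
  'blvd' => 'Boulevard',
}.freeze

STREET_TYPE = /(.*)\b(#{EXPANSIONS.keys.join('|')})\b/i

# Hypothetical helper: expands the LAST standalone street-type abbreviation.
# Addresses with no recognized abbreviation come back unchanged.
def expand_street_type(address)
  address.sub(STREET_TYPE) do
    # $1 is everything before the abbreviation, $2 is the abbreviation itself
    # (in whatever case the user typed it), so downcase it for the lookup.
    "#{$1}#{EXPANSIONS[$2.downcase]}"
  end
end

[
  '1234 St Peter St, Ste 12',
  '1234 St Mary Ln',
  "1234 St Peter's State Blvd Ste E",
  '1234 Main Rd',
].each { |address| p expand_street_type(address) }
# prints:
# "1234 St Peter Street, Ste 12"
# "1234 St Mary Lane"
# "1234 St Peter's State Boulevard Ste E"
# "1234 Main Rd"
```

When the pattern does not match, sub never calls the block and simply returns a copy of the original string, which is why the last address passes through untouched.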
Posted by Alexander M to Ruby and RoR knowledge base (2016-05-02 13:48)