If you have text that is edited in a WYSIWYG editor but still want to check its length, you need to strip all tags first and then the special characters:
# Strips all HTML tags, then replaces every run of non-word characters with a single space.
def hard_sanitize(text)
  ActionController::Base.helpers.strip_tags(text).gsub(/[^[:word:]]+/, " ")
end
:001 > hard_sanitize("This is <strong>beautiful</strong> <h1>markup<h1>")
=> "This is beautiful markup"
If you already have Nokogiri on board, you can use that as well, though it brings no extra benefit:
:001 > Nokogiri::HTML("This is <strong>beautiful</strong> <h1>markup<h1>").text.gsub(/[^[:word:]]+/, " ")
=> "This is beautiful markup"
DANGER, WILL ROBINSON: Both solutions need the extra gsub, otherwise special characters might leak through:
:001 > ActionController::Base.helpers.strip_tags("<<<<YY<>>NASTY><s<<>s>")
=> "<<<<>>NASTY><<>s>"
:002 > Nokogiri::HTML("<<<<YY<>>NASTY><s<<>s>").text
=> ">NASTY>s>"
:003 > ActionController::Base.helpers.strip_tags("<<<<YY<>>NASTY><s<<>s>").gsub(/[^[:word:]]+/, " ")
=> " NASTY s "
:004 > Nokogiri::HTML("<<<<YY<>>NASTY><s<<>s>").text.gsub(/[^[:word:]]+/, " ")
=> " NASTY s "
AGAIN: That's what the gsub at the end is for.
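To see what the gsub does on its own, here is a minimal sketch in plain Ruby (no Rails or Nokogiri required). The helper name collapse_non_word is made up for illustration. In Ruby, [[:word:]] matches letters, digits and the underscore, including non-ASCII letters in UTF-8 strings, so angle brackets and punctuation are replaced while umlauts survive:

```ruby
# Hypothetical helper: replaces every run of non-word characters
# (anything that is not a letter, digit or underscore) with one space.
def collapse_non_word(text)
  text.gsub(/[^[:word:]]+/, " ")
end

collapse_non_word(">NASTY>s>")       # => " NASTY s "
collapse_non_word("Grüße, Welt!")    # => "Grüße Welt "
collapse_non_word("snake_case 42!")  # => "snake_case 42 "
```

Note that leading and trailing runs also become a single space, which is why the results above start or end with one.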
In that step, you may even want to discard whitespace altogether, e.g. when you want to count only "real" characters:
# Counts only word characters: tags, whitespace and special characters are ignored.
def char_count(text)
  ActionController::Base.helpers.strip_tags(text).gsub(/[^[:word:]]+/, "").length
end
Like this:
:002 > "This is <strong>beautiful</strong> \n<br>\n<h1>markup<h1>".length
=> 55
:003 > char_count "This is <strong>beautiful</strong> \n<br>\n<h1>markup<h1>"
=> 21
Now you are no longer fooled by extra whitespace:
:004 > "This is <strong> beautiful </strong> \n<br>\n <h1> markup <h1> ".length
=> 61
:005 > char_count "This is <strong> beautiful </strong> \n<br>\n <h1> markup <h1> "
=> 21
This works for the Nokogiri solution as well.
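To turn the count into an actual length check, a minimal sketch could look like the following. It operates on text whose tags have already been removed (for example the result of strip_tags or Nokogiri's text), so it runs in plain Ruby. The names word_char_count and within_limit? and the limits used are hypothetical and chosen only for illustration:

```ruby
# Counts only word characters (letters, digits, underscore) in text
# whose tags have already been stripped.
def word_char_count(plain_text)
  plain_text.gsub(/[^[:word:]]+/, "").length
end

# Hypothetical length check: true if the text contains at most
# `max` word characters, no matter how much whitespace surrounds them.
def within_limit?(plain_text, max)
  word_char_count(plain_text) <= max
end

compact = "This is beautiful \n\nmarkup"
spaced  = "This is  beautiful  \n\n   markup  "

word_char_count(compact)   # => 21
word_char_count(spaced)    # => 21
within_limit?(spaced, 21)  # => true
within_limit?(spaced, 20)  # => false
```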
Posted by Lexy to makandra dev (2014-02-06 09:32)