If a text is edited with a WYSIWYG editor but you still want to check its length, you need to strip all tags first and then the special characters:
  def hard_sanitize(text)
    ActionController::Base.helpers.strip_tags(text).gsub(/[^[:word:]]+/, " ")
  end
   :001 > hard_sanitize("This is <strong>beautiful</strong> <h1>markup<h1>")
   => "This is beautiful markup" 
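To see what the gsub step keeps and drops on its own, here is a minimal plain-Ruby sketch that runs without Rails. The helper name `collapse_non_word` is made up for illustration, and it assumes the tags have already been stripped. In Ruby, `[[:word:]]` covers Unicode letters, digits and the underscore, so punctuation gets replaced along with whitespace:

```ruby
# Hypothetical helper: only the gsub step from hard_sanitize, without strip_tags.
def collapse_non_word(text)
  # Replace each run of non-word characters with a single space.
  text.gsub(/[^[:word:]]+/, " ")
end

collapse_non_word("Hello, world!")    # => "Hello world "
collapse_non_word("snake_case  text") # => "snake_case text"
collapse_non_word("Größe: 10cm")      # => "Größe 10cm"
```

Note the trailing space in the first result: if leading or trailing spaces matter to you, call `strip` on the result.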
If you already have Nokogiri on board, you can use that as well, though it has no extra benefit:
   :001 > Nokogiri::HTML("This is <strong>beautiful</strong> <h1>markup<h1>").text.gsub(/[^[:word:]]+/, " ")
   => "This is beautiful markup" 
DANGER, WILL ROBINSON: Both solutions need the extra gsub, otherwise special characters might leak through:
   :001 > ActionController::Base.helpers.strip_tags("<<<<YY<>>NASTY><s<<>s>")
   => "<<<<>>NASTY><<>s>"
  
   :002 > Nokogiri::HTML("<<<<YY<>>NASTY><s<<>s>").text
   => ">NASTY>s>" 
   :003 > ActionController::Base.helpers.strip_tags("<<<<YY<>>NASTY><s<<>s>").gsub(/[^[:word:]]+/, " ")
   => " NASTY s " 
   :004 > Nokogiri::HTML("<<<<YY<>>NASTY><s<<>s>").text.gsub(/[^[:word:]]+/, " ")
   => " NASTY s "  
   
AGAIN: That's what the gsub at the end is for.
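The effect of that final gsub can be reproduced in plain Ruby by feeding it the leaked outputs from the session above. This is only a sketch of the last step: the two input strings are copied from the strip_tags and Nokogiri results shown earlier, and the names `NON_WORD` and `scrub` are made up for illustration:

```ruby
# Runs of anything that is not a word character (letters, digits, underscore).
NON_WORD = /[^[:word:]]+/

# Hypothetical helper: the final gsub step, applied to already-stripped text.
def scrub(text)
  text.gsub(NON_WORD, " ")
end

scrub("<<<<>>NASTY><<>s>")  # => " NASTY s "  (strip_tags leftovers)
scrub(">NASTY>s>")          # => " NASTY s "  (Nokogiri leftovers)
scrub(">NASTY>s>").strip    # => "NASTY s"
```

Both leftovers end up identical because every remaining `<` and `>` is a non-word character.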
In that step, you may even want to discard whitespace altogether, e.g. when you want to count only "real" characters:
  def char_count(text)
    ActionController::Base.helpers.strip_tags(text).gsub(/[^[:word:]]+/, "").length
  end
Like this:
   :002 > "This is <strong>beautiful</strong> \n<br>\n<h1>markup<h1>".length
   => 55 
   
   :003 > char_count "This is <strong>beautiful</strong> \n<br>\n<h1>markup<h1>"
   => 21 
Now you are not fooled by extra whitespace anymore:
   :004 > "This is <strong> beautiful </strong> \n<br>\n <h1> markup <h1> ".length
   => 61
   
   :005 > char_count "This is <strong> beautiful </strong> \n<br>\n <h1> markup <h1> "
   => 21 
This works for the Nokogiri solution as well.
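For the length check itself, a minimal sketch in plain Ruby could look like the following. To keep it runnable without Rails, `naive_strip_tags` is a crude regex stand-in for strip_tags (an assumption for illustration only, not a safe sanitizer). The names `plain_char_count` and `within_limit?` and the limits used are likewise made up:

```ruby
# Crude stand-in for ActionController::Base.helpers.strip_tags so the sketch
# runs without Rails. Assumption for illustration only; use strip_tags or
# Nokogiri in real code.
def naive_strip_tags(html)
  html.gsub(/<[^>]*>/, "")
end

# Count only word characters, mirroring char_count from above.
def plain_char_count(text)
  naive_strip_tags(text).gsub(/[^[:word:]]+/, "").length
end

# Hypothetical check: does the visible text fit into max_chars?
def within_limit?(text, max_chars)
  plain_char_count(text) <= max_chars
end

markup = "This is <strong>beautiful</strong> \n<br>\n<h1>markup<h1>"
within_limit?(markup, 21) # => true
within_limit?(markup, 20) # => false
```

In a Rails app, the same comparison could call the real char_count, for example from a custom validation.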
Posted by Lexy to makandra dev (2014-02-06 09:32)