Checking the character length of a text containing markup (e.g. WYSIWYG)


If a text is edited with a WYSIWYG editor but you still want to check its length, strip all tags first and then the special characters:

  def hard_sanitize(text)
    ActionController::Base.helpers.strip_tags(text).gsub(/[^[:word:]]+/, " ")
  end

   :001 > hard_sanitize("This is <strong>beautiful</strong> <h1>markup<h1>")
   => "This is beautiful markup" 

If you already have Nokogiri on board, you can use that as well, though it has no extra benefit:

   :001 > Nokogiri::HTML("This is <strong>beautiful</strong> <h1>markup<h1>").text.gsub(/[^[:word:]]+/, " ")
   => "This is beautiful markup" 

DANGER, WILL ROBINSON: Both solutions need the extra gsub; otherwise special characters may leak through:

   :001 > ActionController::Base.helpers.strip_tags("<<<<YY<>>NASTY><s<<>s>")
   => "<<<<>>NASTY><<>s>"
  
   :002 > Nokogiri::HTML("<<<<YY<>>NASTY><s<<>s>").text
   => ">NASTY>s>" 

   :003 > ActionController::Base.helpers.strip_tags("<<<<YY<>>NASTY><s<<>s>").gsub(/[^[:word:]]+/, " ")
   => " NASTY s " 

   :004 > Nokogiri::HTML("<<<<YY<>>NASTY><s<<>s>").text.gsub(/[^[:word:]]+/, " ")
   => " NASTY s "  
   

AGAIN: That's what the gsub at the end is for.
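To see what the trailing gsub does on its own, here is a minimal sketch in plain Ruby. It reuses the leaked strip_tags output from the session above as a literal string, so it runs without Rails or Nokogiri. The variable names are made up for illustration. Note that `[[:word:]]` covers letters (including umlauts), digits and underscores, so punctuation and symbols are dropped as well:

```ruby
# Leaked strip_tags output from the session above, as a plain string.
leaked = "<<<<>>NASTY><<>s>"

# Every run of non-word characters collapses into a single space.
collapsed = leaked.gsub(/[^[:word:]]+/, " ")

# [[:word:]] is Unicode-aware on UTF-8 strings, so umlauts survive.
umlauts = "Größe".gsub(/[^[:word:]]+/, " ")

# Digits and underscores are word characters; punctuation and symbols are not.
# The sample string is hypothetical.
mixed = "price_1: 9,99 €!".gsub(/[^[:word:]]+/, "")

puts collapsed.inspect
puts umlauts.inspect
puts mixed.inspect
```

If punctuation should count towards the length in your application, this character class is too strict and would need to be adjusted.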

In the gsub step, you may even want to discard whitespace altogether, e.g. when you want to count only "real" characters:

  def char_count(text)
    ActionController::Base.helpers.strip_tags(text).gsub(/[^[:word:]]+/, "").length
  end

Like this:

   :002 > "This is <strong>beautiful</strong> \n<br>\n<h1>markup<h1>".length
   => 55 
   
   :003 > char_count "This is <strong>beautiful</strong> \n<br>\n<h1>markup<h1>"
   => 21 

Now you are no longer fooled by extra whitespace:

   :004 > "This is <strong> beautiful </strong> \n<br>\n <h1> markup <h1> ".length
   => 61
   
   :005 > char_count "This is <strong> beautiful </strong> \n<br>\n <h1> markup <h1> "
   => 21 

This works for the Nokogiri solution as well.
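The count can then be compared against a limit. The following is only a rough sketch under stated assumptions: `MAX_CHARS`, `naive_strip_tags` and `within_limit?` are hypothetical names, and `naive_strip_tags` is a crude regex stand-in for strip_tags, used here solely so that the snippet runs without Rails. In a real application, keep using strip_tags or Nokogiri as shown above.

```ruby
# Hypothetical limit, chosen only for this example.
MAX_CHARS = 25

# Crude stand-in for strip_tags so the snippet runs without Rails.
# Not suitable for real input; use strip_tags or Nokogiri there.
def naive_strip_tags(text)
  text.gsub(/<[^>]*>/, "")
end

# Same idea as char_count above: drop tags, then drop all non-word characters.
def char_count(text)
  naive_strip_tags(text).gsub(/[^[:word:]]+/, "").length
end

# True if the "real" character count does not exceed the limit.
def within_limit?(text, max = MAX_CHARS)
  char_count(text) <= max
end

short = "This is <strong>beautiful</strong> \n<br>\n<h1>markup<h1>"
long  = "#{short} <p>and quite a bit longer</p>"

puts within_limit?(short)
puts within_limit?(long)
```

A check like `within_limit?` could be called from a custom model validation, so that markup and whitespace added by the editor do not count against the user's limit.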

License
Source code in this card is licensed under the MIT License.
Posted by Lexy to makandra dev (2014-02-06 09:32)