How do I sanitize JavaScript text?

I have a textarea that will receive a JS snippet (Google Analytics). Is there a way to sanitize that? Since I cannot use functions like wp_filter_nohtml_kses(), what should I use?

3 comments

  1. No, there is no function for that. You would need a complete JavaScript parser. This is not part of the WordPress core.

  2. One simple way to do it would be to have the user enter just his or her Google Analytics Property ID, instead of having them input the entire JavaScript code. Then you generate the snippet yourself, using their Property ID.

    According to this Google help page, here is the current Analytics tracking code:

    <script type="text/javascript">
    
      var _gaq = _gaq || [];
      _gaq.push(['_setAccount', 'UA-XXXXX-Y']);
      _gaq.push(['_trackPageview']);
    
      (function() {
        var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
        ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
        var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
      })();
    
    </script>
    

    The ‘UA-XXXXX-Y’ is the Property ID. It looks like the first two characters are always ‘UA’. ‘XXXXX’ represents some number of numeric digits (not necessarily five digits, since one of the Property IDs for my sites has eight digits). ‘Y’ is an integer that might have more than one digit.

    One way you could validate the Property ID would be with a regular expression, like this:

    preg_match('/^UA-[0-9]+-[0-9]+$/', $input);
    

    This returns 1 if $input matches the Property ID format and 0 if it does not. (preg_match only returns false when the pattern itself fails.)

  3. If you want to let the user input their own JavaScript code, but you don’t want to allow any HTML, you could use something like this:

    preg_match( '/\A<script((?!<[a-zA-Z])[\s\S])*<\/script>\Z/', trim($input) );
    

    The trim function removes whitespace from the beginning and end, and the regex pattern will only match the string if it begins with ‘<script’, ends with ‘</script>’, and doesn’t contain any ‘<’ character immediately followed by a letter. That should keep stray HTML tags out, while still allowing the user to write JavaScript that uses the less-than operator, if they need to for some reason.

    The other thing I would do is make sure that only an administrator can set this option. If they sneak some HTML in there and break their own website, it’s their own fault. Of course, if this is going into the database, make sure it is sanitized for SQL.
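
The two approaches from comments 2 and 3 could be sketched in plain PHP roughly as follows. This is only a minimal sketch, not WordPress core code: the function names are hypothetical, the snippet text is the ga.js code quoted in comment 2, and the Property ID check uses \z instead of $ (an assumption on my part, made so that an ID followed by a newline is rejected).

```php
<?php
// Hypothetical helper names for illustration; none of these are WordPress core functions.

/**
 * True when $input looks like a Property ID such as "UA-12345678-1".
 */
function is_valid_property_id(string $input): bool
{
    // preg_match() returns 1 on a match, 0 on no match, and false on error,
    // so compare strictly against 1.
    // \z anchors at the very end of the string; $ would also accept a trailing newline.
    return preg_match('/\AUA-[0-9]+-[0-9]+\z/', $input) === 1;
}

/**
 * Builds the tracking snippet around a validated Property ID.
 * Returns an empty string when the ID is not in the expected format.
 */
function build_tracking_snippet(string $property_id): string
{
    $property_id = trim($property_id);
    if (!is_valid_property_id($property_id)) {
        return '';
    }

    // After validation the ID can only contain "UA", digits, and hyphens,
    // so it cannot break out of the quoted JavaScript string.
    return implode("\n", [
        '<script type="text/javascript">',
        '  var _gaq = _gaq || [];',
        "  _gaq.push(['_setAccount', '" . $property_id . "']);",
        "  _gaq.push(['_trackPageview']);",
        '  (function() {',
        "    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;",
        "    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';",
        "    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);",
        '  })();',
        '</script>',
    ]);
}

/**
 * Pattern check from comment 3: the trimmed input must start with "<script",
 * end with "</script>", and contain no "<" directly followed by a letter.
 */
function looks_like_single_script_block(string $input): bool
{
    return preg_match('/\A<script((?!<[a-zA-Z])[\s\S])*<\/script>\z/', trim($input)) === 1;
}
```

With these helpers, build_tracking_snippet(' UA-12345-2 ') yields the snippet with that ID filled in, while any input that is not a bare Property ID yields an empty string. The pattern check is only a coarse filter on the shape of the input; it does not parse or vet the JavaScript itself, which is the limitation comment 1 points out.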
