Something is unescaping all html entities before output to browser

I have a nasty problem. On my WordPress site, something unescapes all HTML entities before the data is sent to the browser. It happens in all of the following cases:

echo "&quot;XSS";
print_r( htmlspecialchars( '"XSS' ) );
echo esc_html( '"XSS' );
// output in all cases is unescaped
// I'm running version 3.4.1 with a bunch of plugins, all disabled

// error_log() is fine and manages to put &quot; in the log file

It works fine in a file that does not include any WordPress code, but in a plugin or theme: blam. All output escaping is killed in one go. That’s freaky.


Where can I look?
ob_list_handlers() only mentions the “default output handler”, and searching the WP code for htmlspecialchars_decode and html_entity_decode does not reveal anything interesting.

update
Going back to WP 3.1 and installing Twenty Eleven doesn’t change the issue.

Ok, I found it.

Both Firefox’s “view selection source” and Firebug’s “inspect element with firebug” decode the entities for you… :-(

This kind of stuff kills me!

“view page source” does not do this. Thanks to those who thought with me.
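
For anyone hitting the same thing, one way to check what PHP really emits, without trusting any browser tooling, is to capture the output and inspect the raw string. This is only a minimal sketch; the variable names are made up for illustration, and it runs without WordPress.

```php
<?php
// Capture the bytes PHP would send instead of reading them back
// through a DOM inspector, which shows already-decoded text.
ob_start();
echo htmlspecialchars('"XSS');
$raw = ob_get_clean();

// True if the entity survived all the way to the output buffer.
$still_escaped = (strpos($raw, '&quot;') !== false);

var_dump($raw);           // string(9) "&quot;XSS"
var_dump($still_escaped); // bool(true)
```

If the captured string still contains the entity, the escaping is intact and the decoding is happening in whatever is being used to look at the page.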


1 comment

  1. This was a case of UTF-8 character encoding taking over the presentational view of your browser and converting those HTML entities into their counterparts: human-readable text.

    After all, you might very well have wanted your viewer to see a string that looked like &quot;BLA, for some reason or another, instead of "BLA.

    From a security perspective such input, especially if it's coming in the form of user-generated data (unescaped), poses a great risk, so we have functions like,

    htmlentities       //Native PHP function
    htmlspecialchars   //Native PHP function
    esc_html           //WordPress API function
    esc_attr           //WordPress API function 
    

    …to take care of business and encode, and also decode, input according to our needs.

    In fact both htmlspecialchars_decode and html_entity_decode and others do reveal some interesting things about the way in which strings can be handled.
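
    As a rough illustration of how the two decoders differ (a sketch with a made-up sample string, not taken from the original post): htmlspecialchars_decode only reverses the handful of special characters, while html_entity_decode also handles named entities such as &eacute;.

    ```php
    <?php
    // Hypothetical sample: an escaped tag wrapped around a named entity.
    $encoded = '&lt;b&gt;caf&eacute;&lt;/b&gt;';

    // Only &lt; and &gt; are decoded; &eacute; is left alone.
    $special_only = htmlspecialchars_decode($encoded);
    // -> <b>caf&eacute;</b>

    // Every entity is decoded, and &eacute; becomes a UTF-8 "é".
    $all = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');
    // -> <b>café</b>
    ```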

    Strings in 'single quotes' are treated differently than strings in "double quotes".
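
    A small sketch of that difference, using illustrative variable names: escape sequences such as \x8F are only interpreted inside double quotes.

    ```php
    <?php
    // Single quotes: \x8F stays as four literal characters.
    $single = '\x8F!!';
    // Double quotes: \x8F becomes the single byte 0x8F.
    $double = "\x8F!!";

    var_dump(strlen($single));  // int(6)
    var_dump(strlen($double));  // int(3)
    var_dump(bin2hex($double)); // string(6) "8f2121"
    ```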

    I encourage you to read PHP’s documents on String Types as well as,

    Because they really do give some great insight to the topic at hand.

    Now beyond that I recommend reading,

    Which exposes the WordPress APIs functions for much of the above and more.

    When you take a look under the hood via View Page Source, as you mentioned, you get to see what PHP and WordPress are actually outputting. As mentioned above, that was hidden from your view because WordPress was enforcing UTF-8 character encoding. In the functions shown in your original post, htmlspecialchars was also being used in a stock-standard fashion, without any extra parameters; passing them may have produced different, and somewhat less frustrating, results 😉
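
    To make the point about extra parameters concrete, here is a minimal sketch using a sample string of my own. The fourth argument of htmlspecialchars is double_encode, and setting it to false leaves existing entities untouched.

    ```php
    <?php
    // Hypothetical input that already contains entities plus a bare ampersand.
    $string = '&quot;XSS&quot; & more';

    // Default behaviour (double_encode = true): existing entities are re-encoded.
    $default = htmlspecialchars($string, ENT_QUOTES, 'UTF-8');
    // -> &amp;quot;XSS&amp;quot; &amp; more

    // double_encode = false: existing entities are kept, the bare & is still escaped.
    $no_double = htmlspecialchars($string, ENT_QUOTES, 'UTF-8', false);
    // -> &quot;XSS&quot; &amp; more
    ```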

    Take a look at the following and the way in which they handle output when you are viewing the source via “View Page Source”:

        $string = '&quot;XSS&quot;';
    
        echo htmlspecialchars($string);
        //output -> &amp;quot;XSS&amp;quot;
        //converts double quotes, but also re-encodes existing HTML entities
        //specify double_encode = false to prevent double encoding of entities
        //WordPress' preferred method is esc_attr($string) (does not double-encode)
    
        echo htmlspecialchars_decode($string);
        //output -> "XSS"
        //decodes special char &quot; back to " (double quote)
        //default flag before PHP 8.1 is ENT_COMPAT, which only converts double quotes, not single
    
        echo htmlspecialchars_decode($string, ENT_NOQUOTES);
        //output -> &quot;XSS&quot;
        //leaves ' (single) and " (double) quote entities undecoded
    
        echo htmlspecialchars_decode($string, ENT_QUOTES);
        //output -> "XSS"
        //decodes special char &quot; back to " (double quote)
    
        echo htmlentities($string, ENT_NOQUOTES);
        //output -> &amp;quot;XSS&amp;quot;
    
        echo htmlentities($string, ENT_COMPAT);
        //output -> &amp;quot;XSS&amp;quot;
    
        echo esc_attr($string);
        //output -> &quot;XSS&quot;
        //encoding "UTF-8"
    
        $stringed = "\x8F!!\"!";
    
        echo htmlentities($stringed, ENT_QUOTES);
        //output -> ?!!&quot;!  (notice ? character) when the encoding is ISO-8859-1
        //PHP < 5.4 defaults encoding to "ISO-8859-1"
        //PHP >= 5.4 defaults encoding to "UTF-8"
    
        echo htmlentities($stringed, ENT_QUOTES | ENT_IGNORE, "UTF-8");
        //output -> !!&quot;!
        //ENT_IGNORE discards the invalid byte rather than returning an empty string
        //(the PHP manual discourages this flag for security reasons)
    
        echo htmlentities($stringed, ENT_QUOTES, "UTF-8");
        //returns an empty string
    
        echo html_entity_decode($stringed, ENT_QUOTES | ENT_IGNORE);
        //output -> ?!!"!   (notice ? character)
        //PHP < 5.4 defaults encoding to "ISO-8859-1"
        //PHP >= 5.4 defaults encoding to "UTF-8"
    
        echo esc_attr($stringed);
        //returns an empty string
        //encoding "UTF-8"
    

    So while all of that is going on under the hood, in the source, this is what you would have seen for each of those functions when viewed in the browser.

    [Image: screenshot of the browser rendering for each of the functions above]

    I wrote this in case it helps others who run into something similar: what at first appears to be a problem, but actually is not once you understand what is doing what.