I have a nasty problem. On my WordPress site, something unescapes all HTML entities before sending data to the browser. It happens in all of the following cases:
echo "&quot;XSS";
print_r( htmlspecialchars( '"XSS' ) );
echo esc_html( '"XSS' );
// output in all cases is unescaped
// I'm running version 3.4.1 with a bunch of plugins, all disabled
// error_log() is fine and manages to put &quot; in the log file
It works fine in a file that does not include any WordPress code, but in a plugin or theme? Blam. It kills all output escaping in one go. That’s freaky.
Where can I look?
ob_list_handlers() only mentions the “default output handler”, and searching the WP code for htmlspecialchars_decode and html_entity_decode does not reveal much of interest.
Update
Going back to WP 3.1 and installing Twenty Eleven doesn’t change the issue.
Ok, I found it.
Both Firefox’s “view selection source” and Firebug’s “inspect element with firebug” do the decoding for you… :-(
This kind of stuff kills me!
“View page source” does not do this. Thanks to those who thought along with me.
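For anyone hitting the same thing, one way to check what the server really emits, independent of any browser tooling, is to capture the output and inspect it directly. A minimal sketch, assuming plain PHP run outside WordPress (the variable names `$raw` and `$stillEscaped` are illustrative only):

```php
<?php
// Capture what PHP actually emits, before any browser tooling touches it.
ob_start();
echo htmlspecialchars('"XSS', ENT_QUOTES, 'UTF-8');
$raw = ob_get_clean();

// Log the raw string; a DOM inspector would show the decoded character instead.
error_log('raw output: ' . $raw);

// True when the entity is still present in the output as sent.
$stillEscaped = (strpos($raw, '&quot;') !== false);
```

If `$stillEscaped` is true here but the inspector shows a bare quote, the decoding is happening in the viewing tool, not on the server.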
This was a case of UTF-8 character encoding taking over the presentational view of your browser and converting those HTML entities into their counterparts: human-readable text.
After all, you might very well have wanted a string that looked like &quot;BLA to the eyes of your viewer, for some reason or another, instead of "BLA.
From a security perspective such input, especially if it’s coming in the form of unescaped user-generated data, poses a great risk, so we have functions like,
…to take care of business and encode and also decode input according to our needs.
In fact, both htmlspecialchars_decode and html_entity_decode, among others, do reveal some interesting things about the way in which strings can be handled. Strings in 'single quotes' are treated differently than strings in "double quotes".
I encourage you to read PHP’s documentation on String Types as well as,
Because they really do give some great insight into the topic at hand.
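To make both points concrete, here is a small sketch, assuming plain PHP with explicit flags and UTF-8 (the sample strings and variable names are made up for illustration). It shows how the two literal types differ, and how the two decode functions differ in what they undo:

```php
<?php
$name = 'XSS';

// Single-quoted literal: no variable interpolation, \n stays a backslash and an n.
$single = 'say $name\n';
// Double-quoted literal: $name is interpolated and \n becomes a newline.
$double = "say $name\n";

$encoded = '&quot;caf&eacute;&quot;';

// Only reverses what htmlspecialchars() produces; &eacute; is left untouched.
$special = htmlspecialchars_decode($encoded, ENT_QUOTES);
// Decodes every named entity it knows, including &eacute;.
$all = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');
```

So `htmlspecialchars_decode` is the narrow counterpart to `htmlspecialchars`, while `html_entity_decode` is the counterpart to `htmlentities`.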
Now, beyond that, I recommend reading,
Which exposes the WordPress API’s functions for much of the above and more.
When you take a look under the hood, as you did with View Page Source, you actually get to see how PHP and WordPress are really processing the data. As mentioned above, that was obscured from your view because WordPress was enforcing UTF-8 character encoding. Also, in the functions shown in your original post, htmlspecialchars was being used in a stock-standard fashion, without passing any of the extra parameters that might have produced different, and somewhat less frustrating, results 😉
Take a look at the following and the way in which they handle output when you are viewing the source via “View Page Source”;
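A minimal sketch of the kind of comparison meant here, assuming plain PHP with the flags passed explicitly so the result does not depend on the PHP version's defaults (the sample string and the `$variants` array are illustrative, not taken from the original post):

```php
<?php
// Illustrative input containing both quote types, an ampersand and a tag.
$input = '"XSS\' & <b>';

$variants = array(
    // Escapes double quotes only (the default before PHP 8.1).
    'ENT_COMPAT'   => htmlspecialchars($input, ENT_COMPAT, 'UTF-8'),
    // Escapes both double and single quotes.
    'ENT_QUOTES'   => htmlspecialchars($input, ENT_QUOTES, 'UTF-8'),
    // Leaves both kinds of quotes alone.
    'ENT_NOQUOTES' => htmlspecialchars($input, ENT_NOQUOTES, 'UTF-8'),
);

// Fourth parameter false: existing entities are not encoded a second time.
$once  = htmlspecialchars('&quot;XSS', ENT_QUOTES, 'UTF-8', false);
// Fourth parameter true (the default): the & of &quot; is itself encoded.
$twice = htmlspecialchars('&quot;XSS', ENT_QUOTES, 'UTF-8', true);

foreach ($variants as $flag => $escaped) {
    echo $flag, ': ', $escaped, "\n";
}
echo 'double_encode=false: ', $once, "\n";
echo 'double_encode=true:  ', $twice, "\n";
```

With ENT_QUOTES, for instance, the source shows `&quot;XSS&#039; &amp; &lt;b&gt;`, while ENT_NOQUOTES leaves both quote characters as they were.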
So while all that is going on under the hood, in source, this is what you would have been seeing for each one of those functions when viewed in the browser.
I wrote this in the event that it might help others who encounter something similar: what at first appears to be a problem, but actually is not once you understand what is doing what.