I’m trying to write a oneboxing routine that gives WordPress blog entries special treatment. So given a simple, unadorned URL in content, such as
http://blog.stackoverflow.com/2011/03/a-new-name-for-stack-overflow-with-surprise-ending/
How would I detect that this is a WordPress installation, ideally without doing a full HTTP GET on every URL I see?
There are certainly common conventions for WordPress URLs that we could start with, which would eliminate at least some URLs from contention. In this case it is …
But that isn’t a universal constant either.
I tried looking at the headers of that URL using HTTP HEAD, and I see:
Connection:Keep-Alive
Content-Encoding:gzip
Content-Length:18340
Content-Type:text/html; charset=UTF-8
Date:Thu, 07 Jun 2012 07:07:38 GMT
Keep-Alive:timeout=15, max=100
Server:Apache/2.2.9 (Ubuntu) DAV/2 PHP/5.2.6-2ubuntu4.2 with Suhosin-Patch mod_ssl/2.2.9 OpenSSL/0.9.8g
Vary:Cookie,Accept-Encoding
WP-Super-Cache:Served legacy cache file
X-Pingback:http://blog.stackoverflow.com/xmlrpc.php
X-Powered-By:PHP/5.2.6-2ubuntu4.2
I don’t think relying on the presence of WP-Super-Cache would be particularly reliable, and that’s the only thing I see in the headers that looks helpful. So maybe there are no common HTTP headers in a WordPress install at all?
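For reference, the HEAD-and-inspect approach described above can be sketched with Python's standard library. This is a minimal sketch, not a detector: fetch_head_headers and wordpress_hints are hypothetical names, redirects are not followed, and the two hint headers are simply the WordPress-looking entries in the dump above. WP-Super-Cache comes from a caching plugin and X-Pingback can be filtered out, so an empty result proves nothing.

```python
import http.client
from typing import Dict
from urllib.parse import urlsplit

# WordPress-looking headers from the dump in the question. Neither is
# guaranteed: WP-Super-Cache is added by a caching plugin, and X-Pingback
# can be filtered out.
HINT_HEADERS = ("wp-super-cache", "x-pingback")


def fetch_head_headers(url: str, timeout: float = 10.0) -> Dict[str, str]:
    """Send one HEAD request (redirects not followed) and return its headers."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("HEAD", target)
        response = conn.getresponse()
        # Lower-case the names; a repeated header keeps only its last value.
        return {name.lower(): value for name, value in response.getheaders()}
    finally:
        conn.close()


def wordpress_hints(headers: Dict[str, str]) -> Dict[str, str]:
    """Return the WordPress-looking headers present, keyed by lower-case name."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {name: lowered[name] for name in HINT_HEADERS if name in lowered}
```

Usage against the question's URL would be wordpress_hints(fetch_head_headers("http://blog.stackoverflow.com/2011/03/a-new-name-for-stack-overflow-with-surprise-ending/")).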
From my experience and a quick code search, there is no deliberate way WP identifies itself in headers. However, there are some signals that seem distinctive enough and are unlikely to be customized.
A HEAD request to /wp-login.php will contain the WordPress test cookie in its Set-Cookie header, both for a .org install and for .com.
The cookie name is customizable by defining the TEST_COOKIE constant, but the WP Cookie check value is hardcoded in core, as is the setcookie() call for it in the file’s source.
For locating wp-login.php, there are some URL shortcuts, implemented in wp_redirect_admin_locations() since WP 3.4 (see ticket #19607): /login on the site’s root does a 302 redirect to wp-login.php, wherever it is. So the only scenario that cannot be reliably detected is WP installed in, and confined to, a subdirectory, without being used to manage the site’s root at all.
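Both checks from this answer can be sketched in Python with the standard library. This is a sketch under assumptions: the helper names (head, has_wp_test_cookie, is_login_shortcut, probe_site_root) are hypothetical, the site root is assumed to be known, the /login shortcut needs WP 3.4 or later, and the cookie value is assumed to arrive URL-encoded the way PHP's setcookie() sends it (spaces as +). The cookie name is deliberately ignored because TEST_COOKIE can change it.

```python
import http.client
from typing import List, Optional, Tuple
from urllib.parse import unquote_plus, urljoin, urlsplit

# The test-cookie value hardcoded in core; the cookie *name* (TEST_COOKIE)
# is customizable, so only the value is matched.
WP_TEST_COOKIE_VALUE = "WP Cookie check"


def head(url: str, timeout: float = 10.0) -> Tuple[int, http.client.HTTPMessage]:
    """Send one HEAD request without following redirects."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.msg
    finally:
        conn.close()


def has_wp_test_cookie(set_cookie_values: List[str]) -> bool:
    """True if any Set-Cookie value decodes to the WordPress test string."""
    for raw in set_cookie_values:
        name_value = raw.split(";", 1)[0]  # drop path, domain and so on
        _, sep, value = name_value.partition("=")
        # PHP's setcookie() URL-encodes the value, so spaces arrive as '+'.
        if sep and unquote_plus(value.strip()) == WP_TEST_COOKIE_VALUE:
            return True
    return False


def is_login_shortcut(status: int, location: Optional[str]) -> bool:
    """True if /login answered with a 302 redirect pointing at wp-login.php."""
    return status == 302 and location is not None and "wp-login.php" in location


def probe_site_root(site_root: str) -> bool:
    """HEAD <root>/login, follow the shortcut, then look for the test cookie."""
    login_url = urljoin(site_root.rstrip("/") + "/", "login")
    status, headers = head(login_url)
    location = headers.get("Location")
    if not is_login_shortcut(status, location):
        return False
    # Location is normally absolute; urljoin also copes with a relative one.
    _, login_headers = head(urljoin(login_url, location))
    return has_wp_test_cookie(login_headers.get_all("Set-Cookie") or [])
```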
Send a HEAD request to /wp-feed.php in the same directory as /xmlrpc.php, whose location the X-Pingback header gives you (this works even in subdirectory installations). In WordPress you will get a Location header in the response containing the string feed.
In your example for blog.stackoverflow.com you’ll get:
The bare existence of a file named xmlrpc.php is not safe enough on its own; anybody can give a file this name.
Caveat: the X-Pingback header can be disabled by filtering 'wp_headers', so my suggestion is not bullet-proof.
Related: Steps to Take to Hide the Fact a Site is Using WordPress?
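A minimal Python sketch of this probe, assuming the page's own HEAD response carries an X-Pingback header whose URL ends in xmlrpc.php. The function names are hypothetical, and, per the caveat, a missing X-Pingback header makes this particular check inconclusive (None) rather than negative.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urljoin, urlsplit


def head(url: str, timeout: float = 10.0) -> Tuple[int, http.client.HTTPMessage]:
    """Send one HEAD request without following redirects."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.msg
    finally:
        conn.close()


def feed_probe_url(x_pingback: str) -> Optional[str]:
    """wp-feed.php in the same directory as the advertised xmlrpc.php."""
    x_pingback = x_pingback.strip()
    if not urlsplit(x_pingback).path.endswith("/xmlrpc.php"):
        return None
    # Resolving relative to xmlrpc.php keeps subdirectory installs working.
    return urljoin(x_pingback, "wp-feed.php")


def location_mentions_feed(location: Optional[str]) -> bool:
    """The answer's WordPress signal: a Location header containing 'feed'."""
    return location is not None and "feed" in location


def wp_feed_probe(page_url: str) -> Optional[bool]:
    """True/False from the wp-feed.php probe; None if X-Pingback is unusable."""
    _, headers = head(page_url)
    pingback = headers.get("X-Pingback")
    probe_url = feed_probe_url(pingback) if pingback else None
    if probe_url is None:
        return None  # header filtered via 'wp_headers', or not WordPress
    _, probe_headers = head(probe_url)
    return location_mentions_feed(probe_headers.get("Location"))
```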
Append ?page_id=-1 to the URL and send an HTTP HEAD request for that.
On self-installed WordPress blogs, this will result in a 404 response.
On wordpress.com blogs, this will result in a 301 response (which ends up at a 200 response if you follow the redirect).
On non-WordPress sites, you should get a 200 response (assuming the original URL without the query string gave you a 200) – the query string should make no difference.
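The three outcomes above can be encoded as a small classifier. This is a sketch with hypothetical names: the HEAD requests themselves are left to whatever HTTP client you already use (with redirect following turned off, so the wordpress.com 301 stays visible), and any status other than the three listed is treated as inconclusive. Because with_page_id_probe rebuilds the query with urlencode, an existing query string may come back re-encoded.

```python
from enum import Enum
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


class Verdict(Enum):
    SELF_HOSTED_WORDPRESS = "self-hosted WordPress"  # probe gave 404
    WORDPRESS_COM = "wordpress.com"                  # probe gave 301
    NOT_WORDPRESS = "probably not WordPress"         # probe gave 200
    INCONCLUSIVE = "inconclusive"


def with_page_id_probe(url: str) -> str:
    """Return url with page_id=-1 appended to its query string."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("page_id", "-1"))
    # urlencode may re-encode an existing query (e.g. '%20' as '+').
    return urlunsplit(parts._replace(query=urlencode(query)))


def classify(baseline_status: int, probe_status: int) -> Verdict:
    """Map HEAD status codes (redirects not followed) to a verdict.

    baseline_status is for the original URL; the reasoning assumes it was 200.
    """
    if baseline_status != 200:
        return Verdict.INCONCLUSIVE
    if probe_status == 404:
        return Verdict.SELF_HOSTED_WORDPRESS
    if probe_status == 301:
        return Verdict.WORDPRESS_COM
    if probe_status == 200:
        return Verdict.NOT_WORDPRESS
    return Verdict.INCONCLUSIVE
```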
Example with a HEAD request for
http://blog.stackoverflow.com/2011/03/a-new-name-for-stack-overflow-with-surprise-ending/?page_id=-1:
Example with a HEAD request for
http://dailycrave.wordpress.com/2012/06/01/three-cheese-grilled-pizza/?page_id=-1 (follow redirects turned off):
(Note the X-Hacker easter egg!)
If you follow the 301 redirect for the wordpress.com blog, you end up with this:
Note the “Link” header containing the http://wp.me/ URL, which seems to be common to all wordpress.com hosted blogs and could be used to identify them.
I believe this works because passing ?page_id=-1 in the URL overrides the default routing from the URL segments. There will not be a page with an ID of -1, so a 404 or a redirect is served instead.
Neither is wp-super-cache available on all WordPress installations, nor is there any fixed format for the URLs. While the permalinks settings page does offer some fixed URL schemes, anyone can use a custom one. For example, if a site uses only the page or post name in the URL, it is more or less impossible to tell whether it is a WordPress website.
The presence of xmlrpc can be used for detection, but again, it can be disabled.
And finally, even if you do a full GET on the URL, it is still not possible to detect with 100% certainty whether the page is built with WordPress. It all depends on the theme template and how it was developed.
One fairly reliable way is to look for the presence of wp-login and wp-admin, although even these could be moved. I’d still go this way.
Two alternatives to the comments. First, set your own WordPress header: drop this in your theme’s functions.php.
Second, the WPScan fingerprinter (Ruby) goes through several steps to try to figure out whether WordPress is being used, such as looking for the plugin directory, theme name, meta tags, readme, etc. (I have no idea how accurate this actually is.) http://code.google.com/p/wpscan/source/browse/#svn%2Ftrunk%2Flib%2Fwpscan
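To give a flavour of such content-based checks, here is a minimal Python sketch that scans already-fetched HTML for a WordPress generator meta tag and for theme, plugin, and wp-includes asset paths. It is not WPScan's code and is not claimed to match its rules; the marker names are made up, the readme check is omitted, and it needs the page body, i.e. the full GET the question wanted to avoid. What shows up in the markup depends on the theme, so an empty result is not proof of anything.

```python
from html.parser import HTMLParser
from typing import Set


class WordPressMarkerParser(HTMLParser):
    """Collects a few WordPress-style markers from an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.markers: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        # Attribute names arrive lower-cased; valueless attributes give None.
        attr = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attr.get("name", "").lower() == "generator":
            if attr.get("content", "").lower().startswith("wordpress"):
                self.markers.add("generator-meta")
        for key in ("href", "src"):
            url = attr.get(key, "")
            if "/wp-content/themes/" in url:
                self.markers.add("theme-path")
            elif "/wp-content/plugins/" in url:
                self.markers.add("plugin-path")
            elif "/wp-includes/" in url:
                self.markers.add("wp-includes-path")


def wordpress_markers(html: str) -> Set[str]:
    """Return the set of marker names found in the given HTML."""
    parser = WordPressMarkerParser()
    parser.feed(html)
    parser.close()
    return parser.markers
```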
How about sending a HEAD request to one of the files starting with the prefix wp-?
Ideally, look at wp-login.php. If it exists, the website is running WordPress.
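A minimal sketch of this check in Python, assuming WordPress lives at the root of the host (subdirectory installs are missed) and that only a 200 counts as "exists". wp_login_url, head_status and wp_login_exists are hypothetical names. A server that answers 200 for every path would produce a false positive here.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def wp_login_url(page_url: str) -> str:
    """Candidate wp-login.php URL at the root of the page's host (assumption)."""
    return urljoin(page_url, "/wp-login.php")


def head_status(url: str, timeout: float = 10.0) -> int:
    """Status code of a single HEAD request; redirects are not followed."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", parts.path or "/")
        return conn.getresponse().status
    finally:
        conn.close()


def wp_login_exists(page_url: str) -> bool:
    """True if <host>/wp-login.php answers a HEAD request with 200."""
    return head_status(wp_login_url(page_url)) == 200
```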