Making my AJAX-powered WordPress site crawlable

I read the following article and tried to apply its scheme to my website running WordPress: http://code.google.com/intl/fr-CA/web/ajaxcrawling/index.html

If you visit my website at http://www.visualise.ca/ you will see that it loads posts within the home page, and the URL becomes http://visualise.ca/#!/anne-au-cherry when a post is loaded. A static version of the same content is available to the crawler at http://visualise.ca/anne-au-cherry, but a visitor opening that URL in a browser is redirected to http://visualise.ca/#!/anne-au-cherry (this is done with JavaScript).


In order to provide the crawler with the needed ?_escaped_fragment_= I used a WordPress hack I found on the net: http://www.wordpress-fr.net/support/sujet-54810-add-action-parse-request and now Googlebot can see the content of my AJAX-powered pages. I thought it was all done.
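
If it helps, the hack boils down to something like the following in the theme's functions.php (this is my reconstruction from that thread, so the function name and details may differ):

    // My reconstruction of the hack (hypothetical function name): map the
    // crawler's ?_escaped_fragment_=/slug request onto the normal WordPress
    // query, so the static version of the post is served to it.
    add_action( 'parse_request', 'map_escaped_fragment' );
    function map_escaped_fragment( $wp ) {
        if ( isset( $_GET['_escaped_fragment_'] ) && $_GET['_escaped_fragment_'] !== '' ) {
            $slug = trim( $_GET['_escaped_fragment_'], '/' );
            // Query the post by its slug, e.g. anne-au-cherry.
            $wp->query_vars = array( 'name' => $slug );
        }
    }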

But when I paste a post link into Facebook, for example, it is unable to read the content of the page, so I guessed that my website isn't really respecting the scheme described in Google's documentation, since Facebook supports that scheme (if you paste http://twitter.com/#!/gablabelle it works). Since I'm using the jQuery Address plugin to get my hashbangs (#!), I went to their website and downloaded their sample files to see what the differences were between their files and mine. I realized that they are probably using a PHP script to create the needed HTML snapshots, which is, I guess, why Facebook can't read mine: https://github.com/bartaz/jquery-address/blob/master/samples/crawling/index.php

<?php

    error_reporting(E_ALL ^ (E_NOTICE | E_WARNING));

    // Slug requested by the crawler, e.g. ?_escaped_fragment_=/anne-au-cherry
    $fragment = isset($_REQUEST['_escaped_fragment_']) ? $_REQUEST['_escaped_fragment_'] : '';

    // Map the fragment to a static data file, falling back to "home".
    $file = 'data/' . ($fragment != '' && $fragment != '/' ? preg_replace('/\//', '', $fragment) : 'home') . '.xml';

    // Strip the leading XML declaration and collapse runs of whitespace.
    $re = '/(^<\?[^>]*>)|(\n|\r\n|\t|\s{2,4})*/';

    $handle = fopen($file, 'r');
    if ($handle !== false) {
        $content = preg_replace($re, '', fread($handle, filesize($file)));
        fclose($handle);
    } else {
        $content = 'Page not found!';
        header(php_sapi_name() == 'cgi' ? 'Status: 404' : 'HTTP/1.1 404');
    }

?>
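
If I understand the sample correctly, a request like index.php?_escaped_fragment_=/anne-au-cherry reads the snapshot from data/anne-au-cherry.xml (and $content is then echoed into the page markup further down), so the whole thing hinges on having a static file for each fragment.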

So my guess is that I could use a similar PHP function to serve the HTML snapshots instead of using the WordPress hack, but I would need to adapt it to WordPress. The problem is that I'm no programmer, and I've done my best so far.

My posts are in this format: http://visualise.ca/#!/anne-au-cherry and the static version is available at http://visualise.ca/anne-au-cherry (where anne-au-cherry is the slug of the post and changes depending on the page being viewed).

So my question is: could someone confirm that I'm on the right path, and if possible help me create that PHP function?
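
To show where I'm at, here is my rough, untested attempt at such an adaptation, pieced together from the sample above and the WordPress documentation (the file name and its placement in the WordPress root are just my guesses):

<?php

    // snapshot.php — my untested attempt; I assume it sits in the WordPress
    // root, next to wp-load.php, so it can bootstrap WordPress itself.
    require_once dirname( __FILE__ ) . '/wp-load.php';

    // Slug requested by the crawler, e.g. ?_escaped_fragment_=/anne-au-cherry
    $fragment = isset( $_GET['_escaped_fragment_'] ) ? trim( $_GET['_escaped_fragment_'], '/' ) : '';

    // Look the post up by its slug instead of reading a static data file.
    $post = ( $fragment !== '' ) ? get_page_by_path( $fragment, OBJECT, 'post' ) : null;

    if ( $post ) {
        // Serve a plain HTML snapshot of the post for the crawler.
        echo '<h1>' . esc_html( get_the_title( $post->ID ) ) . '</h1>';
        echo apply_filters( 'the_content', $post->post_content );
    } else {
        header( php_sapi_name() == 'cgi' ? 'Status: 404' : 'HTTP/1.1 404' );
        echo 'Page not found!';
    }

?>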

Many thanks for your time and help!



3 comments

  1. Specifically AVOID using “hashbangs” (“#!”) if you want to make an AJAX-powered WordPress site crawlable.

    You really don’t want to use the “hashbang” method on a WordPress site.

    The “#!” is more of a hacky patch for sites that cannot provide a static analog to their AJAX version. Google does not recommend its use in general unless no alternative is available.

    There is no benefit to implementing a hashbang system in WordPress. A front-end AJAX solution for WordPress should work around the existing URL scheme (no hash, no bang).

    Summary: WordPress is naturally crawlable; simply don’t break it with hashbangs.

  2. If you’re specifically referring to Facebook not properly showing the meta info for your page, you should look into the OpenGraph plugin for WordPress, as it will add the appropriate og: attribute metadata (see the sketch at the end of this comment): http://wordpress.org/extend/plugins/opengraph/

    Also, you can add a link like this in the header:

    <link rel="canonical" href="link_back_to_real_post_url">
    

    And see if that does anything.

    I have to ask, though: why are you doing this? Twitter has come under major fire for this URL structure precisely because it is a pain to crawl. Not saying you shouldn’t do it, but I’m quite curious as to what the reason is 🙂
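
    For illustration, both tags could also be printed by hand from the theme’s functions.php; a rough sketch (with a made-up function name, not the plugin’s actual code):

    // Rough sketch (made-up function name, not the OpenGraph plugin's code):
    // print og: metadata and a canonical link for single posts via wp_head.
    function my_og_meta() {
        if ( is_single() ) {
            echo '<meta property="og:title" content="' . esc_attr( get_the_title() ) . '" />' . "\n";
            echo '<meta property="og:url" content="' . esc_url( get_permalink() ) . '" />' . "\n";
            echo '<link rel="canonical" href="' . esc_url( get_permalink() ) . '" />' . "\n";
        }
    }
    add_action( 'wp_head', 'my_og_meta' );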

  3. What I actually did is not use hashbangs, as WraithKenny suggested.

    Using the jQuery Address plugin, with the $.address.state(value) method to set the base path of the website that is used in HTML5 state management and the $.address.value(value) method to set the current deep-linking value, I was able to do what I wanted (see the sketch at the end of this comment).

    http://visualise.ca/

    Posts are loaded via AJAX (click on an image thumbnail) and the URL changes at the same time; the posts also exist on their own under, of course, the same permalink structure, so the site is fully crawlable.

    The only problem will be with older browsers, which will, I think (this needs to be verified), still see the hashbangs.

    Since I’m not a developer it took me a lot of time to understand. :-/ But I’m quite happy with the results now 😉

    jQuery address: http://www.asual.com/jquery/address/
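
    For anyone curious, the wiring boils down to something like this, printed from the theme via the wp_footer hook (a simplified sketch; the function name and the selector are placeholders, not my exact code):

    // Simplified sketch (placeholder function name and selector): print the
    // jQuery Address wiring from the theme via wp_footer.
    function visualise_address_setup() {
        ?>
        <script type="text/javascript">
        jQuery(function ($) {
            // Base path used by jQuery Address for HTML5 state management.
            $.address.state('/');
            // On a thumbnail click, write the post's real permalink into the
            // address bar (no hash, no bang); the AJAX load happens elsewhere.
            $('.thumbnail a').click(function (event) {
                event.preventDefault();
                $.address.value($(this).attr('href'));
            });
        });
        </script>
        <?php
    }
    add_action( 'wp_footer', 'visualise_address_setup' );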