Using robots.txt to block /?param=X

I created a website using WordPress, and on the first day it was full of dummy content until I uploaded my own. Google indexed pages such as:

www.url.com/?cat=1

Now these pages don't exist, and to make a removal request Google asks me to block them in robots.txt.

Should I use:

User-Agent: *
Disallow: /?cat=

or

User-Agent: *
Disallow: /?cat=*

My robots.txt file would look something like this:

User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content
Disallow: /wp-login.php
Disallow: /wp-register.php
Disallow: /author
Disallow: /?cat=
Sitemap: http://url.com/sitemap.xml.gz

Does this look fine, or can it cause any problems with search engines? Should I use Allow: / along with all the Disallow: lines?
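
For what it's worth, here is a rough way I can sanity-check these rules locally with Python's standard urllib.robotparser (a minimal sketch; url.com is just the placeholder domain from above, and this parser only does plain prefix matching, so it cannot evaluate Google-style * wildcards):

from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content
Disallow: /wp-login.php
Disallow: /wp-register.php
Disallow: /author
Disallow: /?cat=
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A ?cat= URL should be blocked, an ordinary page should not be.
print(parser.can_fetch("*", "http://url.com/?cat=1"))      # False: blocked
print(parser.can_fetch("*", "http://url.com/some-post/"))  # True: allowed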

Comments

  1. I would go with this, actually:

    To block access to all URLs that include a question mark (?) (more specifically, any URL that begins with your domain name, followed by any string, followed by a question mark, followed by any string):

    User-agent: Googlebot
    Disallow: /*?
    

    So I would actually go with:

    User-agent: Googlebot
    Disallow: /*?cat=
    

    Resource (under pattern matching)
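
    A tiny sketch of that prefix-plus-wildcard matching in Python (my own rough approximation of the behaviour described above, not Google's actual matcher):

    import re

    def blocked(pattern, path):
        # Translate the robots.txt-style pattern: '*' matches any run of
        # characters, and the rule applies to anything starting with it.
        regex = re.escape(pattern).replace(r"\*", ".*")
        return re.match(regex, path) is not None

    print(blocked("/*?cat=", "/?cat=1"))       # True:  blocked
    print(blocked("/*?cat=", "/page/?cat=5"))  # True:  blocked
    print(blocked("/*?cat=", "/my-post/"))     # False: not matched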

  2. In general, you should not use the robots.txt directives to handle removed content. If a search engine can’t crawl it, it can’t tell whether or not it’s been removed and may continue to index (or even start indexing) those URLs. The right solution is to make sure that your site returns a 404 (or 410) HTTP result code for those URLs, then they’ll drop out automatically over time.

    If you want to use Google’s urgent URL removal tools, you would have to submit these URLs individually anyway, so you would not gain anything by using a robots.txt disallow.
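
    If it helps, a rough way to confirm that those URLs now return 404 or 410 is a small check like this (Python standard library only; the URL is just the example from the question, so swap in your own removed URLs):

    from urllib.request import urlopen
    from urllib.error import HTTPError

    removed_urls = ["http://www.url.com/?cat=1"]  # replace with the removed URLs

    for url in removed_urls:
        try:
            status = urlopen(url).status
        except HTTPError as err:
            status = err.code
        # 404 (Not Found) or 410 (Gone) lets the URL drop out of the index over time.
        print(url, status, "ok" if status in (404, 410) else "still not 404/410")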