I have a naive question that I can’t find an answer to.
I have a WordPress blog.
All posts are listed across several pages, e.g.
mydomain.com/blog/
mydomain.com/blog/page/2/
...
mydomain.com/blog/page/N/
I don’t want a crawler to “remember” which posts were on a particular page, but I do want to let it
crawl all the posts linked from each “/page/”. Will it still be able to follow and crawl links on pages I disallow with

Disallow: /blog/page/ ?

Or, put another way: how do I stop a crawler from recording which posts are on a particular page while still letting it crawl all of the posts?
You can’t do that with robots.txt. Your sample Disallow line would tell the crawler, “don’t ever request a URL that starts with /blog/page/.”
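
For reference, the rule in question would look like this in a robots.txt file (a sketch of what not to do here, since it blocks crawling of those URLs entirely rather than just keeping them out of the index):

    User-agent: *
    Disallow: /blog/page/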
What you probably want instead is to add a “noindex” robots meta tag to all of your /page/ pages. That tells Google, “don’t index these pages,” but still allows the bot to crawl them and follow the links to your individual blog posts.
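
For example, here is a minimal sketch of how you might add that tag in WordPress (this assumes a classic theme whose header template calls wp_head(); the wp_head hook and the is_paged() conditional are standard WordPress, but treat the snippet as a starting point rather than a drop-in fix). It could go in your theme’s functions.php:

    // Emit a robots meta tag on paginated archive pages
    // (/blog/page/2/ and beyond). is_paged() is WordPress's
    // built-in check for page 2 and later of a paginated listing.
    add_action( 'wp_head', function () {
        if ( is_paged() ) {
            echo '<meta name="robots" content="noindex, follow">' . "\n";
        }
    } );

The resulting tag in each paginated page’s <head> would be:

    <meta name="robots" content="noindex, follow">

The “follow” value makes explicit that the crawler should still follow links on the page, which is exactly the behavior you’re after.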