Preventing Search Engines Indexing Pages 2, 3 and More?

Do you know how to prevent indexing of pages past the home page in WP?
I mean I don’t want mysite.com/page/2, mysite.com/page/3 to be indexed.

This is because I use home.php for my theme, so that page/2, page/3
are all the same.

Please give me a hint or a code snippet. I don’t want to add another plugin (robots meta).

4 comments

  1. How exactly are you setting up your home page? I think the problem is that it has unwanted pagination in the first place, not that the pagination is being indexed.

    In general, a robots.txt file is a good way to prevent indexing in bulk. I think the following directive would work in your case (please test it so it doesn’t affect pagination in other places):

    User-agent: *
    Disallow: /page/
    

  2. If this is about SEO and the warnings in Google Search Console, these can be ignored. In WordPress, /page/2 and so on should still be indexed. See this answer and the article with the answer from Google:

    For a while, SEOs thought it might be a good idea to add a noindex robots meta tag to page 2 and further of a paginated archive. This would prevent people from finding page 2 and further in the search results. The idea was that the search engine would still follow all these links, so all the linked pages would still be properly indexed.

    The problem is that at the end of last year, Google said something that caught our attention: long-term noindex on a page will lead to them not following links on that page. This makes adding noindex to page 2 and further of paginated archives a bad idea, as it might lead to your articles no longer getting the internal links they need.

    Because of what Google said about long-term noindex, in Yoast SEO v6.3 we removed the option to add noindex to subpages of archives. Should page 2 and further of an archive have a canonical link to page 1, or to itself? The idea was that you mostly want visitors to end up on page 1 of an archive. That page is usually the most relevant for the majority of users.

    Google is very clear now: each page within a paginated series should canonicalize to itself, so /page/2/ has a canonical pointing to /page/2/. This is why you see your paginated archives being indexed.

    To learn more about it, you can refer to this article: https://yoast.com/pagination-seo-best-practices/

  3. If you’re trying to prevent duplicate content, you should look at the root of the problem. You state that your homepage uses a home.php template. Does it include some static text that you’re passing to all other pages that use the home template? If so, either remove it or create a unique home template, which in all honesty should be home.php.

    If for whatever reason you want to keep the pages that display the same content as your homepage under a different URL, you can always resort to canonicals.

    If you replace the content of your header.php with the following, you can specify different headers: one that includes a canonical and others that don’t.

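    <?php
    // Note: is_page() matches static Pages by ID, slug or title;
    // it does not test the /page/N/ pagination of the blog index.
    if ( is_page( '1' ) ) {
        include get_template_directory() . '/header1.php';
    } elseif ( is_page( '2' ) ) {
        include get_template_directory() . '/header2.php';
    } else {
        include get_template_directory() . '/headerdefault.php';
    }
    ?>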
    

    Then just make sure that the header for the duplicate pages includes the canonical that refers to your homepage:

    <link rel="canonical" href="http://www.yourdomain.com/" />
    

    This tells Google the appropriate URL for the content it’s viewing, without resorting to a plugin.

    Either way, this all seems a bit weird, and I fear that I am misunderstanding your request, as it doesn’t seem to make sense. Are you aware of how duplicate content works? Or is it me who should be heading back to bed?

    I fail to see the purpose of willingly creating new pages that contain the same content and then looking for a solution to prevent duplicate content.

  4. I think the robots meta tags are what need adjusting. You want the spiders to go to page 2, and follow the links to your articles, but you don’t want it actually indexing that page (since it will change). So in your header.php, find the “robots” meta tag, and change it to the following:

    <meta name="robots" content="follow, <?php echo is_paged() ? 'noindex' : 'index'; ?>" />
    

    Using a blanket robots.txt will unfortunately cause the spider to not follow the links, and not find the articles that are on the other pages.
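
If the theme's header.php has no "robots" meta tag to edit, the same idea from answer 4 could be sketched in functions.php instead. This is only a minimal sketch under some assumptions: the `mytheme_` function names are made up for illustration, it limits itself to the blog index via `is_home()`, and it assumes nothing else (theme, plugin or WordPress itself) already prints a robots meta tag, otherwise the page would end up with two.

```php
<?php
/**
 * Build the robots meta content (hypothetical helper name).
 * $is_paged is true on /page/2/, /page/3/ and so on.
 */
function mytheme_robots_content( bool $is_paged ): string {
    // Always allow "follow" so posts linked from deeper pages are still discovered.
    return $is_paged ? 'noindex, follow' : 'index, follow';
}

/**
 * Print the robots meta tag inside <head>, on the blog index only.
 */
function mytheme_print_robots_meta(): void {
    if ( is_home() ) {
        printf(
            '<meta name="robots" content="%s" />' . "\n",
            esc_attr( mytheme_robots_content( is_paged() ) )
        );
    }
}

// Register the hook only when WordPress is loaded, so the helper can be tried standalone.
if ( function_exists( 'add_action' ) ) {
    add_action( 'wp_head', 'mytheme_print_robots_meta' );
}
```

With this in place, mysite.com/ would get "index, follow" and mysite.com/page/2 would get "noindex, follow". Keep in mind the point from answer 2 about long-term noindex before relying on it.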