If you go to the WordPress admin and then Settings -> Privacy, there are two options asking whether you want to allow your blog to be searched through by search engines, one of which is this option:
I would like to block search engines,
but allow normal visitors
How does WordPress actually block search bots/crawlers from searching through the site when it is live?
According to the Codex, it's just robots meta tags, robots.txt and suppression of pingbacks:

These are "guidelines" that all friendly bots will follow. A malicious spider searching for e-mail addresses or forms to spam into will not be affected by these settings.
With a robots.txt (if WordPress is installed at the site root),
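presumably containing just:

```
# Ask every crawler to stay away from the entire site
User-agent: *
Disallow: /
```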
or with the robots meta tag (from here),
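which WordPress prints in the page head as something along the lines of:

```html
<meta name='robots' content='noindex,nofollow' />
```

(the exact content value has varied between WordPress versions, but the intent is the same: ask crawlers not to index the page or follow its links).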
You can’t actually block bots and crawlers from searching through a publicly available site; if a person with a browser can see it, then a bot or crawler can see it (caveat below).
However, there is something called the Robots Exclusion Standard (or robots.txt standard), which allows you to indicate to well-behaved bots and crawlers that they shouldn't index your site. This site, as well as Wikipedia, provides more information.
The caveat to the comment above (that a bot can see whatever you can see in your browser) is this: most simple bots do not include a JavaScript engine, so anything the browser renders as a result of JavaScript code will probably not be seen by a bot. I would suggest that you don't use this as a way to avoid indexing, since the robots.txt standard works regardless of whether JavaScript is present to render your page.
One last comment: bots are free to ignore this standard. Those bots are badly behaved. The bottom line is that anything that can read your HTML can do what it likes with it.
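To make "well behaved" concrete: a friendly crawler downloads /robots.txt first and skips whatever the rules disallow. A deliberately naive PHP sketch of that check (is_disallowed is a made-up helper for illustration; real crawlers use a proper parser, and this one ignores Allow rules, wildcards and per-bot groups):

```php
<?php
// Naive illustration of how a "friendly" crawler honours robots.txt:
// fetch it once, then skip any path covered by a Disallow rule in the
// "User-agent: *" group.
function is_disallowed( string $siteUrl, string $path ): bool {
    $rules = @file_get_contents( rtrim( $siteUrl, '/' ) . '/robots.txt' );
    if ( false === $rules ) {
        return false; // no robots.txt: nothing is disallowed
    }

    $inWildcardGroup = false;
    foreach ( preg_split( '/\R/', $rules ) as $line ) {
        $line = trim( preg_replace( '/#.*$/', '', $line ) ); // strip comments
        if ( 0 === stripos( $line, 'User-agent:' ) ) {
            $inWildcardGroup = ( '*' === trim( substr( $line, 11 ) ) );
        } elseif ( $inWildcardGroup && 0 === stripos( $line, 'Disallow:' ) ) {
            $prefix = trim( substr( $line, 9 ) );
            if ( '' !== $prefix && 0 === strpos( $path, $prefix ) ) {
                return true; // a rule covers this path
            }
        }
    }
    return false;
}

// Example: a polite crawler would skip /wp-admin/ on most WordPress sites.
var_dump( is_disallowed( 'https://example.com', '/wp-admin/' ) );
```

A badly behaved bot simply never runs a check like this, which is why robots.txt is a request, not a lock.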
I don't know for sure, but it probably generates a robots.txt file which specifies rules for search engines.
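For what it's worth, when there is no physical robots.txt on disk, WordPress answers requests for /robots.txt dynamically based on the same privacy setting (stored as the blog_public option). A condensed sketch of that kind of handler (sketch_do_robots is a made-up name; do_robots(), blog_public and the robots_txt filter are the real WordPress pieces, but the body here is simplified from memory rather than copied from core):

```php
<?php
// Condensed sketch of the kind of handler WordPress uses when /robots.txt
// is requested and no physical file exists on disk. Not the exact source.
function sketch_do_robots() {
    header( 'Content-Type: text/plain; charset=utf-8' );

    $public = get_option( 'blog_public' ); // the Settings -> Privacy choice
    $output = "User-agent: *\n";

    if ( '0' == $public ) {
        // "I would like to block search engines, but allow normal visitors"
        $output .= "Disallow: /\n";
    } else {
        // Site is public: only ask crawlers to skip the admin area.
        $output .= "Disallow: /wp-admin/\n";
    }

    // Plugins and themes can still change the rules via this filter.
    echo apply_filters( 'robots_txt', $output, $public );
}
```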
Using a Robots Exclusion file.
Example:
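Something along these lines (an illustration of the format, not a file WordPress writes for you):

```
# Ask every well-behaved crawler to skip the listed paths
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-login.php

# Ask one specific crawler to stay away from the whole site
User-agent: BadBot
Disallow: /
```

A Disallow: / rule under User-agent: * is effectively what the "block search engines" setting amounts to: every cooperating crawler is asked to skip every URL on the site.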