There are several reasons you may want to keep search engine spiders and other crawlers out of part or all of your website. Here are a few examples of how to do that with a robots.txt file.
Granting access with a robots.txt file
To tell a robot that it's okay to access your entire website, or only certain sections of it, you just need to tell it which sections not to access. Anything not explicitly restricted is assumed to be fair game.
How to tell all robots that it's okay to crawl your entire website.
User-agent: *
Disallow:
Note that * here matches all robots, and 'Disallow:' with no path after it means nothing is disallowed, so everything is allowed.
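One way to check how these rules are read is to feed them to a parser. The following is a minimal sketch, assuming Python's standard urllib.robotparser module; the site www.example.com and the robot name "examplebot" are placeholders that do not come from the examples above, and real crawlers may interpret rules slightly differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# The "allow everything" rules from the example above.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ALLOW_ALL.splitlines())

# "examplebot" and www.example.com are hypothetical placeholders.
home_ok = parser.can_fetch("examplebot", "https://www.example.com/")
deep_ok = parser.can_fetch("examplebot", "https://www.example.com/tmp/page.html")

print(home_ok, deep_ok)
```

Both checks should report that the robot is allowed, since an empty Disallow value restricts nothing.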
How to tell a specific robot that it's okay to crawl your entire website.
User-agent: googlebot
Disallow:

User-agent: *
Disallow: /
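Note that / by itself here matches the entire site. A robot follows the group that names it, so googlebot uses the first group and every other robot falls back to the * group.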
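The difference between the two groups can be sketched with Python's standard urllib.robotparser module. This is only an illustration: "otherbot" and www.example.com are made-up placeholders, and the blank line between the groups follows the usual convention for separating them.

```python
from urllib.robotparser import RobotFileParser

# One group for googlebot (nothing disallowed), one for everyone else.
RULES = """\
User-agent: googlebot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

url = "https://www.example.com/index.html"  # hypothetical page
googlebot_ok = parser.can_fetch("googlebot", url)
otherbot_ok = parser.can_fetch("otherbot", url)  # "otherbot" is a placeholder

print("googlebot:", googlebot_ok)
print("otherbot:", otherbot_ok)
```

Under this parser, googlebot is allowed to fetch the page while the placeholder robot is not.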
Restricting access with a robots.txt file
If you don't want robots to crawl certain parts of your website, then you can tell them which parts you don't want them to access with the robots.txt file.
It's important to remember, however, that robots.txt is only a request, and a robot might ignore it. And because your robots.txt file is publicly visible, a bad robot might even use it to identify potentially private sections of your site.
How to tell all robots not to crawl any part of your website.
User-agent: *
Disallow: /
How to tell all robots not to crawl certain folders within your website.
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~steve/
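As a rough check of the folder rules above, the sketch below runs a few sample paths through Python's standard urllib.robotparser module. The sample paths, the site www.example.com, and the robot name "examplebot" are assumptions made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The folder rules from the example above.
FOLDER_RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~steve/
"""

parser = RobotFileParser()
parser.parse(FOLDER_RULES.splitlines())

site = "https://www.example.com"  # hypothetical site
sample_paths = ("/tmp/report.txt", "/~steve/index.html", "/blog/post.html")

# Map each sample path to True (allowed) or False (blocked).
results = {path: parser.can_fetch("examplebot", site + path) for path in sample_paths}

for path, allowed in results.items():
    print(path, "allowed" if allowed else "blocked")
```

Anything under a listed folder comes back as blocked, while a path outside those folders, such as the hypothetical /blog/post.html, stays allowed.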
How to tell all robots not to crawl certain files within your website.
User-agent: *
Disallow: /dont-crawl-me-bro.html
How to tell a specific robot not to crawl any part of your website.
User-agent: googlebot
Disallow: /
How to tell a specific robot not to crawl specific sections and files within your website.
User-agent: googlebot
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /dont-crawl-me-google.html
Disallow: /dont-crawl-me-either.html
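Note that regular expressions are not valid in a robots.txt file. Each rule is matched as a plain prefix of the URL path, so each file or folder you want to restrict must be listed. Some crawlers, including Googlebot, also recognize the limited wildcards * and $, but not every robot supports them.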
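To see that rules written for one robot apply only to that robot, and only to the listed paths, here is a minimal sketch using Python's standard urllib.robotparser module. The file /crawl-me.html, the robot name "examplebot", and the site www.example.com are hypothetical and not part of the rules above.

```python
from urllib.robotparser import RobotFileParser

# The googlebot-only rules from the example above.
GOOGLEBOT_RULES = """\
User-agent: googlebot
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /dont-crawl-me-google.html
Disallow: /dont-crawl-me-either.html
"""

parser = RobotFileParser()
parser.parse(GOOGLEBOT_RULES.splitlines())

site = "https://www.example.com"  # hypothetical site

# A listed file is blocked for googlebot.
blocked_file = parser.can_fetch("googlebot", site + "/dont-crawl-me-google.html")
# A file that is not listed (hypothetical name) stays allowed for googlebot.
unlisted_file = parser.can_fetch("googlebot", site + "/crawl-me.html")
# A robot with no matching group (placeholder name) is not restricted at all.
other_robot = parser.can_fetch("examplebot", site + "/dont-crawl-me-google.html")

print(blocked_file, unlisted_file, other_robot)
```

Because there is no * group in this file, robots other than googlebot have no restrictions under this parser.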