A Deeper Look At Robots.txt
Robots.txt syntax
- User-agent: the robot the following rules apply to (e.g., “Googlebot”)
- Disallow: the pages or directories you want to block the robot from accessing (use as many Disallow lines as needed)
- Noindex: the pages you want a search engine to both block and keep out of its index (or de-index if previously indexed). Unofficially supported by Google; unsupported by Yahoo and Live Search.
- Each User-agent/Disallow group should be separated from the next by a blank line; however, no blank lines should exist within a group (between the User-agent line and the last Disallow line).
- The hash symbol (#) may be used for comments within a robots.txt file; everything after the # on that line is ignored. A comment may take up a whole line or follow a directive at the end of a line.
- Directories and filenames are case-sensitive: “private”, “Private”, and “PRIVATE” are all different to search engines.
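The syntax rules above can be sketched as a small parser. This is a minimal illustration, not how any particular search engine reads the file: the function name parse_groups and the sample text are hypothetical, and only the User-agent and Disallow fields are handled.

```python
def parse_groups(text):
    """Split robots.txt text into (user_agents, disallow_paths) groups.

    Hypothetical helper for illustration; it handles only the
    User-agent and Disallow fields described above.
    """
    groups = []
    agents, paths = [], []

    def flush():
        # Close the current group, if one has been started.
        if agents:
            groups.append((list(agents), list(paths)))
        agents.clear()
        paths.clear()

    for raw in text.splitlines():
        if not raw.strip():
            flush()  # a blank line ends the current group
            continue
        line = raw.split("#", 1)[0].strip()  # drop everything after '#'
        if not line or ":" not in line:
            continue  # comment-only or malformed line
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()  # field names are matched case-insensitively
        if field == "user-agent":
            if paths:
                flush()  # a User-agent line after rules starts a new group
            agents.append(value)
        elif field == "disallow" and agents:
            paths.append(value)  # path case is preserved: /private/ != /Private/
    flush()
    return groups


sample = (
    "# whole-line comment\n"
    "User-agent: Googlebot\n"
    "Disallow:\n"
    "\n"
    "User-agent: *\n"
    "Disallow: /private/  # end-of-line comment\n"
    "Disallow: /Private/\n"
)
for user_agents, disallowed in parse_groups(sample):
    print(user_agents, disallowed)
```

Note that the two Disallow paths in the second group stay distinct because path case is preserved, while the field names themselves are compared without regard to case.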
Let’s look at an example robots.txt file. The example below specifies that:
- The robot called “Googlebot” has nothing disallowed and may go anywhere;
- The entire site is closed off to the robot called “msnbot”;
- All other robots should not visit the /tmp/ directory, or any directory or file whose path begins with /logs (e.g., /logs/ or /logs.php), as explained in the comments.
User-agent: Googlebot
Disallow:

User-agent: msnbot
Disallow: /

# Block all robots from tmp and logs directories
User-agent: *
Disallow: /tmp/
Disallow: /logs # for directories and files called logs
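One way to check how rules like these are interpreted is Python's standard-library urllib.robotparser. The sketch below feeds it the example file as a string; the robot name “SomeOtherBot”, the www.example.com host, and the test paths are assumptions made up for illustration. It shows how this one parser reads the file, which may differ in detail from what a given search engine does.

```python
from urllib.robotparser import RobotFileParser

# The example file, with one directive per line.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: msnbot
Disallow: /

# Block all robots from tmp and logs directories
User-agent: *
Disallow: /tmp/
Disallow: /logs  # for directories and files called logs
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """Return True if `agent` may fetch `path` on a hypothetical host."""
    return parser.can_fetch(agent, "http://www.example.com" + path)


# (robot, path) pairs chosen to exercise each group in the file.
checks = [
    ("Googlebot", "/tmp/report.html"),
    ("msnbot", "/index.html"),
    ("SomeOtherBot", "/tmp/report.html"),
    ("SomeOtherBot", "/logs.php"),
    ("SomeOtherBot", "/tmp.htm"),
    ("SomeOtherBot", "/Logs/"),
]
for agent, path in checks:
    verdict = "allowed" if allowed(agent, path) else "blocked"
    print(f"{agent:13} {path:18} {verdict}")
```

With this parser, Disallow values are matched as path prefixes: /logs covers /logs.php, while /tmp/ does not cover /tmp.htm, and /Logs/ is not matched because paths are case-sensitive.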
via A Deeper Look At Robots.txt.
Tags: Googlebot, Robots.txt, SEO







