From Lunarpages Web Hosting Wiki
What is the robots.txt file used for?
In web site development, the robots.txt file is a special file that tells search engine spiders and crawlers which parts of your site they may or may not visit. Here is a little more about them from robotstxt.org:
Web Robots (also known as Web Wanderers, Crawlers, or Spiders), are programs that traverse the Web automatically. Search engines such as Google use them to index the web content, spammers use them to scan for email addresses, and they have many other uses.
How do you create a robots.txt file?
All you have to do is create a plain text file named robots.txt in the public_html folder of your hosting account. You can create it from within your control panel, or create it on your desktop and upload it with an FTP client.
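If you prefer to generate the file with a script before uploading it, the step above can be sketched in Python as follows. This is only a minimal sketch: the helper name write_robots_txt and the ALLOW_ALL_RULES constant are made up for illustration, and the folder you pass in is assumed to be a local copy of your public_html folder.

```python
from pathlib import Path

# Rules that let every crawler visit the whole site (an empty Disallow value).
ALLOW_ALL_RULES = "User-agent: *\nDisallow:\n"


def write_robots_txt(web_root: str, rules: str = ALLOW_ALL_RULES) -> Path:
    """Write `rules` to a file named robots.txt inside `web_root`.

    `web_root` is assumed to be a local copy of public_html (hypothetical path).
    Returns the path of the file that was written.
    """
    target = Path(web_root) / "robots.txt"
    target.write_text(rules, encoding="utf-8")
    return target
```

For example, write_robots_txt("public_html") would create public_html/robots.txt, which you can then upload with your FTP client.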
What should I put in my robots.txt file?
That depends on what you want to do.
If you would like to allow all crawlers and spiders to visit your entire site, insert this into your robots.txt file:
User-agent: *
Disallow:
If you want to keep the spiders and bots away from your content in a particular folder, use this:
User-agent: *
Disallow: /privatestuff
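Here "privatestuff" is the folder you want to keep crawlers out of. Keep in mind that robots.txt is only a request that well-behaved crawlers honor; it does not password-protect the folder.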
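The two rule sets above can be tried out locally before you upload anything. The sketch below uses Python's standard urllib.robotparser module; the helper name is_allowed and the crawler name "ExampleBot" are made up for illustration, and other crawlers may interpret rules slightly differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# The two example rule sets, with each directive on its own line.
ALLOW_ALL = """User-agent: *
Disallow:
"""

BLOCK_FOLDER = """User-agent: *
Disallow: /privatestuff
"""


def is_allowed(robots_txt: str, path: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if `user_agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    print(is_allowed(ALLOW_ALL, "/privatestuff/page.html"))
    print(is_allowed(BLOCK_FOLDER, "/privatestuff/page.html"))
    print(is_allowed(BLOCK_FOLDER, "/index.html"))
```

This parser matches rules as path prefixes, so Disallow: /privatestuff also covers a path such as /privatestuff.html. Writing the rule as /privatestuff/ limits it to the folder itself.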
How can I check to see my robots.txt file has been created successfully?
It is easy to check if you have an account with Google Webmaster Tools. To analyze a site's robots.txt file:
- Sign into Google Webmaster Tools with your Google Account.
- On the Dashboard, click the URL for the site you want.
- Click Tools, and then click Analyze robots.txt.
How can I figure out where this robot or crawler came from?
You can find a good list of just about all the different robots out there here:
That list is a good first place to check when a robot is crawling your site and you do not know where it came from.
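To look a robot up in such a list, you first need the name it reports, which is usually its User-Agent string. The sketch below counts User-Agent strings in web server access log lines. It assumes your host provides logs in the Apache "combined" format, where the User-Agent is the last quoted field; the function name count_user_agents and the sample log line (including its IP address) are made up for illustration.

```python
import re
from collections import Counter

# Assumes Apache "combined" log format: the User-Agent is the last quoted field.
LOG_PATTERN = re.compile(r'^(?P<ip>\S+) .*"(?P<agent>[^"]*)"\s*$')

# Hypothetical sample line; 192.0.2.1 is a reserved documentation address.
SAMPLE_LINES = [
    '192.0.2.1 - - [10/Oct/2023:13:55:36 +0000] "GET /robots.txt HTTP/1.1" '
    '200 68 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; '
    '+http://www.google.com/bot.html)"',
]


def count_user_agents(lines):
    """Count how many requests each User-Agent string made."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:  # skip lines that do not look like combined-format entries
            counts[match.group("agent")] += 1
    return counts


if __name__ == "__main__":
    for agent, hits in count_user_agents(SAMPLE_LINES).most_common():
        print(hits, agent)
```

Because a User-Agent string can be faked, treat the name it reports as a starting point for your search rather than as proof of where the robot came from.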