Let's start by saying that the robots.txt file is a powerful yet unobtrusive little file that sits in the root of your website.
Website owners use this file to give instructions about their website to web robots, typically the crawlers sent by search engines like Google, Yahoo and Bing. If you want to be technical, the convention is called the Robots Exclusion Protocol.
Let's picture the scenario:
Google has heard about your website and wants to come and visit to see what pages you have and if it would like to index them and show them on the Google search results page.
Before it goes steaming through your website, it first stops to have a look at your robots.txt (think of it as the gatekeeper to your website) and checks where it is allowed to go. Beware, though: there are some naughty robots out there, particularly malware robots and email harvesters, that simply ignore the file. So you should not rely on robots.txt to protect the parts of your website you don't want people to access.
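To make the gatekeeper idea concrete, here is a minimal sketch of the check a well-behaved robot performs, using Python's standard urllib.robotparser module. The robot name "ExampleBot", the example.com addresses and the /private/ rule are all made up for illustration; a real crawler would download the file from your site rather than read it from a string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the blocked path is illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A polite robot asks before fetching each page.
home_allowed = parser.can_fetch("ExampleBot", "https://www.example.com/index.html")
private_allowed = parser.can_fetch(
    "ExampleBot", "https://www.example.com/private/report.html"
)

print(home_allowed)     # True
print(private_allowed)  # False
```

Notice that the check happens entirely on the robot's side. Nothing stops a badly behaved robot from skipping it altogether.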
Here are some basic considerations:
- Robots can ignore your robots.txt file, as explained above.
- The robots.txt file is publicly accessible. Anyone can see where you do not want people to go.
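To show how little effort it takes to read those "keep out" signs, here is a rough sketch that pulls the Disallow paths out of a robots.txt file. The sample content and the function name disallowed_paths are invented for this example, and the parsing is deliberately simple rather than a full implementation of the protocol.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every non-empty Disallow value found in the given robots.txt text."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Drop anything after a '#', which starts a comment in robots.txt.
        line = raw_line.split("#", 1)[0].strip()
        field, separator, value = line.partition(":")
        if separator and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical file content; these paths are illustrative only.
sample = """\
User-agent: *
Disallow: /admin/      # not a secret once it is listed here
Disallow: /staging/
"""

exposed = disallowed_paths(sample)
print(exposed)  # ['/admin/', '/staging/']
```

In other words, listing a folder in robots.txt tells every curious visitor that it exists.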
How to Create A robots.txt File
There are loads of places on the web that will help you create a robots.txt file, including Google's very own Webmaster Tools. We suggest you head on over to Wikipedia, which has loads of examples.
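If you would rather see what goes into the file before using a tool, the sketch below builds the text of a very small robots.txt. The function name build_robots_txt and the /admin/ and /tmp/ paths are assumptions made for this example, not taken from any of the tools mentioned above.

```python
def build_robots_txt(blocked_paths, user_agent="*"):
    """Build robots.txt text with one group of rules for the given user agent."""
    lines = [f"User-agent: {user_agent}"]
    if blocked_paths:
        lines += [f"Disallow: {path}" for path in blocked_paths]
    else:
        # An empty Disallow value means nothing is blocked.
        lines.append("Disallow:")
    return "\n".join(lines) + "\n"


# Hypothetical paths, for illustration only.
robots_txt = build_robots_txt(["/admin/", "/tmp/"])
print(robots_txt)
```

Save the resulting text as a plain text file named robots.txt and upload it to your site.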
Summarising:
- When creating your robots.txt file, make sure the file name is all lower case: 'robots.txt'
- Don't rely on it to protect secure areas of your website
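- Don't use regular expressions in your robots.txt, because they are not supported (the major search engines recognise only the simple * and $ wildcards)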
- Remember to put the robots.txt file in the top level of your website (root level)
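The lower case and root level points above can be sketched in a few lines: whichever page a robot starts from, it looks for the file in the same place. The function name robots_txt_url and the example.com address are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the root-level robots.txt address for the site hosting page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; replace the path and drop the query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


location = robots_txt_url("https://www.example.com/blog/post.html?ref=1")
print(location)  # https://www.example.com/robots.txt
```

A file stored anywhere else, for example in a /blog/ subfolder, is not where robots look.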
If you have any questions or comments about robots.txt files, please let us know by using our contact form.