Using Robots.txt Files To Feed The Spiderbots
It's a Thursday evening. You are looking at your website logs to
determine where your hits are coming from. You notice you are
getting a ton of 404 error records for a robots.txt file.
You might not even know what a robots.txt file is, let alone why
it is missing from your website. Let's take a look at this
mysterious missing file and why it's important to have it.
Search engines like Google cruise the internet by sending out
spidering software, programs commonly known as spiderbots. The
spiderbots visit websites all around the internet to add them
to their search engine's index listings. The first thing a
spiderbot looks for when it visits a site is a file called
robots.txt. This file is normally found in the root directory
of the hosted website.
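As a rough illustration of that root-directory convention, here
is a short Python sketch of how a spider might work out where
to look for the file: whatever page it lands on, it keeps only
the scheme and host name and asks for /robots.txt. The function
name robots_url and the example.com address are hypothetical,
and this is not any search engine's actual code.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the site hosting page_url.

    Hypothetical helper: it keeps only the scheme and host, then
    appends /robots.txt, because spiders look for the file in the
    root directory rather than in the folder of the page they visit.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # prints http://www.example.com/robots.txt
    print(robots_url("http://www.example.com/articles/spiders.html?page=2"))
```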
This file contains a set of rules that the spiders are
programmed to obey, based on a standard protocol known as the
Robots Exclusion Protocol. These rules help the visiting spider
determine which parts of your website to include and which to
ignore altogether.
The most common rule used in a robots.txt file denies search
engine spiders access to areas of your website that you don't
want them to visit and index for the whole internet to view.
These restricted areas often include your downloads, images,
or a cgi-bin directory, which are used only by your website
visitors or for the normal daily operations of your website.
What A robots.txt file is not...
Keep in mind that a robots.txt file is not a method to keep your
information secure and safe from prying eyes. It simply asks
visiting spiders not to index certain areas of your website.
The file itself is publicly readable, and only well-behaved
spiders choose to obey it.
Note that using a robots.txt file does not speed up the process
of search engines indexing your website and listing it in their
search directories. Also, a robots.txt file is not used to tell
search engine spiders what to do, only what not to do.
Benefits of using a robots.txt file:
- If you have parts of your website that are very similar, you
can block them from being crawled so you are less likely to be
flagged as a spammer. This is especially useful if you have
similar pages optimized for different web browsers or
connection speeds.
- Having a robots.txt file eliminates the 404 errors for a
missing robots.txt from your server logs. Just create a blank
robots.txt file in a basic text editor and upload it to your
root directory.
- It can be used to block search engine spiders from crawling
part or all of your website, saving valuable bandwidth.
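To see why a blank file is harmless, this Python sketch feeds
an empty robots.txt to the standard urllib.robotparser module,
used here as a stand-in for a real spider. With no rules at
all, every page stays open to crawling; the only difference is
that the spider's request for robots.txt now succeeds instead
of leaving a 404 in your logs. The helper name
allowed_by_blank_file and the example.com URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_blank_file(url: str, user_agent: str = "*") -> bool:
    """Check a URL against an empty robots.txt (a file with no lines)."""
    parser = RobotFileParser()
    parser.parse([])  # a blank file: nothing to parse, so no rules
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # With no rules, every page on the site stays open to spiders.
    print(allowed_by_blank_file("http://www.example.com/any/page.html"))
```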
Creating A robots.txt
Creating a robots.txt file is not complicated, but you should be
sure to do it correctly. If your file contains incorrect rules,
it can block all spiders completely and prevent them from
indexing your website.
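The classic version of this mistake comes down to a single
slash. Under the Robots Exclusion Protocol, "Disallow: /" blocks
every page on the site, while "Disallow:" with nothing after it
blocks nothing. The Python sketch below checks the same page
against both files using the standard urllib.robotparser module
as a stand-in for a spider; the file contents, the can_crawl
helper, and the example.com address are illustrative only.

```python
from urllib.robotparser import RobotFileParser

# One character apart, opposite meanings.
BLOCK_EVERYTHING = "User-agent: *\nDisallow: /\n"
BLOCK_NOTHING = "User-agent: *\nDisallow:\n"


def can_crawl(robots_text: str, url: str) -> bool:
    """Return True if a rule-following spider may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch("*", url)


if __name__ == "__main__":
    page = "http://www.example.com/index.html"
    print("Disallow: / ->", can_crawl(BLOCK_EVERYTHING, page))  # False
    print("Disallow:   ->", can_crawl(BLOCK_NOTHING, page))     # True
```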
You can create a robots.txt file using a simple text editing
program like Notepad, or you can generate one automatically
using various software programs or online tools.
For information and rules on how to create a robots.txt file
manually, visit http://www.robotstxt.org/wc/exclusion.html#robotstxt
To create a robots.txt file online, visit:
http://searchbliss.com/webmaster_tools/robots-txt-text-generator.htm
Once you have created your robots.txt file, upload it to the
root directory of your website. Now you will be ready the next
time the spiderbots come around.
About the author:
Christian Whiting is the publisher of Internet Profits.
Dedicated to bringing you the best tips, tools and resources to
help you make more money online.
http://internetprofits.bushido.net