Alou Web Design

robots.txt and search engine spiders
What To Do When A Search Engine or Robot Knocks At Your Door

 

Do you know who is visiting your website at night in the dark shadows?

Search engines use programs called spiders, crawlers or robots to visit your site and gather the information on your web pages. These robots leave evidence behind in your access logs, just as human visitors do. If you know what to look for, you can tell when a spider has come to call, which saves you worrying that you haven't been visited. You can also tell exactly what a robot has recorded or failed to record, and you can spot robots that make a large number of requests, which can distort your page impression statistics or even burden your server.
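As a rough illustration of reading those access logs, here is a minimal Python sketch that counts requests per user agent, so that unusually busy robots stand out. It assumes your server writes the common "combined" log layout (host, time, request, status, size, referer, agent); the sample lines and the function name `requests_per_agent` are made up for this example, so adjust the pattern to whatever your own server records.

```python
import re
from collections import Counter

# Assumed "combined" access log layout:
# host ident user [time] "request" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def requests_per_agent(lines):
    """Count how many requests each user agent made."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:  # skip lines that do not fit the assumed layout
            counts[match.group("agent")] += 1
    return counts

# Made-up sample lines, for illustration only.
SAMPLE_LOG = [
    'scooter3.av.pa-x.dec.com - - [10/Oct/1999:02:14:07 -0700] '
    '"GET /robots.txt HTTP/1.0" 200 68 "-" "Scooter/2.0 G.R.A.B. X2.0"',
    'scooter3.av.pa-x.dec.com - - [10/Oct/1999:02:14:09 -0700] '
    '"GET /index.html HTTP/1.0" 200 5120 "-" "Scooter/2.0 G.R.A.B. X2.0"',
    'dialup-12.example.net - - [10/Oct/1999:09:30:41 -0700] '
    '"GET /index.html HTTP/1.0" 200 5120 "-" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)"',
]

if __name__ == "__main__":
    # Busiest agents first.
    for agent, hits in requests_per_agent(SAMPLE_LOG).most_common():
        print(f"{hits:5d}  {agent}")
```

To run it against a real log, pass an open file instead of the sample list, for example `requests_per_agent(open("access.log"))`, where `access.log` stands in for your own log file's path.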

How do you identify a spider?

Spiders from the major search engines can sometimes be identified by their host names, which often incorporate part of the search engine's name or the company's name. For example, one of AltaVista's host names is scooter.pa-x.dec.com.

Below is a chart of some of the agent names and host names that search engines use. Keep in mind that they change these names from time to time, but the chart will give you an idea of what to look for.

Search engine: AltaVista (normal spider)
Agent name: Scooter/2.0 G.R.A.B. X2.0 or Scooter/1.0 scooter@pa.dec.com
Host name: scooter.pa-x.dec.com or scooter*.av.pa-x.dec.com, such as scooter3.av.pa-x.dec.com

Search engine: AltaVista (instant spider)
Agent name: Scooter/1.0
Host name: add-url.altavista.digital.com or ww2.altavista.digital.com

Search engine: Euroseek
Agent name: Arachnoidea (arachnoidea@euroseek.com)
Host name: *.euroseek.net, such as infra.euroseek.net

Search engine: Google (experimental search engine)
Agent name: BackRub/2.1 backrub@google.stanford.edu
Host name: *.stanford.edu, such as hake.stanford.edu

Search engine: Inktomi (powers HotBot, others)
Agent name: Slurp/2.0 (slurp@inktomi.com; http://www.inktomi.com/slurp.html)
Host name: *.inktomi.com, such as j2001.inktomi.com or j10.inktomi.com

Search engine: Infoseek (normal spider)
Agent name: InfoSeek Sidewinder/0.9
Host name: *.infoseek.com, such as wilbur-bbn.infoseek.com, or an IP number, such as 204.162.98.90

Search engine: Infoseek (instant spider)
Agent name: Mozilla/3.01 (Win95; I)
Host name: as above

Search engine: Lycos (regular spider)
Agent name: Lycos_Spider_(T-Rex)
Host name: lycosidae.lycos.com or *.pgh.lycos.com, such as spider3.srv.pgh.lycos.com

Search engine: Lycos (Add URL spider)
Agent name: Lycos_Spider_(T-Rex)
Host name: *.sjc.lycos.com, such as sjc-fe4-1.sjc.lycos.com

Search engine: Northern Light
Agent name: Gulliver/1.2
Host name: taz.northernlight.com

Search engine: WebCrawler
Agent name: Served by Excite spiders
Host name: Served by Excite spiders
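The wildcard host names in the chart can be checked mechanically. The sketch below, in Python, matches a host name from your log against a handful of patterns copied from the chart, using the standard library's shell-style wildcard matching. The dictionary and the function name `identify_spider` are illustrative only, and since the names change over time, treat the pattern list as a starting point rather than a complete or current one.

```python
from fnmatch import fnmatchcase

# Host name patterns copied from the chart; "*" matches any run of characters.
# "*.stanford.edu" is left out on purpose, since it would match every
# Stanford host and not only the spider.
SPIDER_HOST_PATTERNS = {
    "AltaVista": ["scooter.pa-x.dec.com", "scooter*.av.pa-x.dec.com"],
    "Euroseek": ["*.euroseek.net"],
    "Inktomi": ["*.inktomi.com"],
    "Infoseek": ["*.infoseek.com"],
    "Lycos": ["lycosidae.lycos.com", "*.pgh.lycos.com", "*.sjc.lycos.com"],
    "Northern Light": ["taz.northernlight.com"],
}

def identify_spider(host):
    """Return the search engine whose pattern matches the host, or None."""
    host = host.lower()  # host names are case-insensitive
    for engine, patterns in SPIDER_HOST_PATTERNS.items():
        if any(fnmatchcase(host, pattern) for pattern in patterns):
            return engine
    return None

if __name__ == "__main__":
    for name in ["scooter3.av.pa-x.dec.com", "j2001.inktomi.com",
                 "dialup-12.example.net"]:
        print(name, "->", identify_spider(name))
```

Hosts that return None are not necessarily human visitors; they may simply be robots that are not in the pattern list.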

Your Best Clue: robots.txt

Start your search with a review of requests for the robots.txt file. This is a file that tells robots what they may and may not index within a site. Not all spiders follow the robots.txt convention, but most do. Anything requesting this file is almost certainly a spider, robot or agent.

By reviewing the requests, you can usually spot spiders from the major search engines by their host names, which in turn tells you the latest agent names. You'll probably be surprised to see how many smaller search engines, personal agents and other robots are also accessing your site.
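The review described above can be sketched in a few lines of Python: pick out every request for robots.txt and list, for each requesting host, the agent names it used. As before, this assumes the "combined" log layout, and the sample lines and the function name `robots_txt_requesters` are invented for the example.

```python
import re
from collections import defaultdict

# Assumed "combined" access log layout, with the request split into
# method and path so the path can be compared against /robots.txt.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def robots_txt_requesters(lines):
    """Map each host that asked for /robots.txt to the agent names it sent."""
    seen = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # line does not fit the assumed layout
        path = match.group("path").split("?", 1)[0].lower()
        if path == "/robots.txt":
            seen[match.group("host")].add(match.group("agent"))
    return seen

# Made-up sample lines, for illustration only.
SAMPLE_LOG = [
    'scooter3.av.pa-x.dec.com - - [10/Oct/1999:02:14:07 -0700] '
    '"GET /robots.txt HTTP/1.0" 200 68 "-" "Scooter/2.0 G.R.A.B. X2.0"',
    'j2001.inktomi.com - - [10/Oct/1999:03:01:55 -0700] '
    '"GET /robots.txt HTTP/1.0" 200 68 "-" '
    '"Slurp/2.0 (slurp@inktomi.com; http://www.inktomi.com/slurp.html)"',
    'dialup-12.example.net - - [10/Oct/1999:09:30:41 -0700] '
    '"GET /index.html HTTP/1.0" 200 5120 "-" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)"',
]

if __name__ == "__main__":
    for host, agents in sorted(robots_txt_requesters(SAMPLE_LOG).items()):
        print(host, "->", ", ".join(sorted(agents)))
```

Hosts you do not recognize in the output are the smaller search engines, personal agents and other robots worth a closer look.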

Review this information regularly, and you will start to see trends showing which search engines pay you regular visits.
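One simple way to look for such trends is to record the days on which each agent appeared. The Python sketch below does that, again assuming the "combined" log layout; the sample lines and the function name `visit_days` are hypothetical.

```python
import re
from collections import defaultdict

# Assumed "combined" access log layout; only the time and agent are kept.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def visit_days(lines):
    """Map each agent name to the set of days on which it made a request."""
    days = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            # "10/Oct/1999:02:14:07 -0700" -> "10/Oct/1999"
            day = match.group("time").split(":", 1)[0]
            days[match.group("agent")].add(day)
    return days

# Made-up sample lines, for illustration only.
SAMPLE_LOG = [
    'scooter3.av.pa-x.dec.com - - [10/Oct/1999:02:14:07 -0700] '
    '"GET /robots.txt HTTP/1.0" 200 68 "-" "Scooter/2.0 G.R.A.B. X2.0"',
    'taz.northernlight.com - - [11/Oct/1999:04:40:12 -0700] '
    '"GET /robots.txt HTTP/1.0" 200 68 "-" "Gulliver/1.2"',
    'scooter3.av.pa-x.dec.com - - [12/Oct/1999:01:05:33 -0700] '
    '"GET /index.html HTTP/1.0" 200 5120 "-" "Scooter/2.0 G.R.A.B. X2.0"',
]

if __name__ == "__main__":
    for agent, seen_on in visit_days(SAMPLE_LOG).items():
        print(f"{agent}: seen on {len(seen_on)} day(s)")
```

An agent that shows up on many different days is a regular visitor; one that appears once and never returns may deserve a fresh submission to that search engine.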
