Controlling Search Engines with Robots.txt
Introduction

As mentioned in previous articles, search engines can be a great source of traffic for a business or personal website. But what if you didn't want some or all of your site to appear in them?

This is the purpose of robots.txt files.

While these files generally do not help you get listed, they can help keep your pages out of a search engine if you would rather they not appear.
What is a robot?

A robot (also shortened to just "bot", or called a spider) is a computer program that visits websites and collects information from them.

Different bots do different things, depending on their owner's reasons for running them. In the case of search engines, the robot's purpose is to collect information about what your site contains so that it can be included in the search engine.
So where does the robots.txt file fit in?

Search engines generally like to respect the owners of websites. Most give site owners the option of keeping some or all of their pages out of the search engine, and the robots.txt file is how you tell them which pages those are.

Before the bot goes around your site looking at the various pages you have, it will take a look inside your robots.txt file first to see if it is allowed to.

If the bot doesn't find a robots.txt file, or the file is blank, it will normally assume that no robot is blocked and will feel free to roam around your site.
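To illustrate the blank-file case, here is a minimal sketch using the robots.txt parser from Python's standard library as a stand-in for a well-behaved bot. The function name and the example.com address are made up for the illustration, and a real search engine's robot may not behave identically.

```python
from urllib.robotparser import RobotFileParser


def allowed_with_empty_file(user_agent, url):
    """Return whether user_agent may fetch url when robots.txt is blank."""
    parser = RobotFileParser()
    parser.parse([])  # an empty robots.txt: no rules at all
    return parser.can_fetch(user_agent, url)


# Hypothetical page on a hypothetical site.
print(allowed_with_empty_file("googlebot", "http://www.example.com/greentree.html"))
# prints True: with no rules, nothing is blocked
```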
So how do I control where it can go?

Robots.txt files can either restrict individual robots by name or cover them all with a single rule.

Commands for robots consist of two parts:

* User-agent: used for the name of the robot to control
* Disallow: the path the robot is banned from accessing
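The two-part structure can be sketched as a small reader that groups User-agent and Disallow lines into records. This is a simplified illustration only: the function name parse_records is hypothetical, and it ignores any other fields a real robots.txt file might contain.

```python
def parse_records(text):
    """Group robots.txt lines into (user_agents, disallowed_paths) records."""
    records = []
    agents, paths = [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if paths:  # a User-agent line after rules starts a new record
                records.append((agents, paths))
                agents, paths = [], []
            agents.append(value)
        elif field == "disallow" and agents:
            paths.append(value)
    if agents:
        records.append((agents, paths))
    return records


sample = """\
User-agent: googlebot
Disallow: /greentree.html
"""
print(parse_records(sample))
# prints [(['googlebot'], ['/greentree.html'])]
```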

In the example below, we block the robot called googlebot from accessing greentree.html. Googlebot is the name of Google's search engine robot, and by blocking it from this page we would normally remove the page from Google the next time Google updates its results. Note that the path starts with "/", because Disallow paths are written relative to the top of the site.

User-agent: googlebot
Disallow: /greentree.html

While this works great for that individual page, what if we wanted to block more of the site? It would be highly inefficient to list every page as blocked, but we can block a whole directory at once:

User-agent: googlebot
Disallow: /greentree.html
Disallow: /frogs/

The above code would block googlebot from accessing greentree.html and every page in the frogs directory.
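One way to check rules like these before publishing them is to run them through Python's standard-library parser. The sketch below is an assumption-laden illustration: the helper name is_allowed and the page /frogs/tree-frog.html are invented, and the paths are written with a leading "/" so that they match from the top of the site.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: googlebot
Disallow: /greentree.html
Disallow: /frogs/
"""


def is_allowed(rules, user_agent, path):
    """Return True if user_agent may fetch path under the given rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, path)


# /frogs/tree-frog.html and /index.html are hypothetical pages.
for path in ("/greentree.html", "/frogs/tree-frog.html", "/index.html"):
    print(path, is_allowed(RULES, "googlebot", path))
# /greentree.html False
# /frogs/tree-frog.html False
# /index.html True
```

Robots other than googlebot are not named in these rules, so the same check returns True for them on every path.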

The whole site is still not blocked, but we have already significantly reduced the areas that can be seen. To block the whole site, we disallow the "/" directory, which covers absolutely everything on the site.

For example:

User-agent: googlebot
Disallow: /

You can block as many bots as you like by giving each one its own entry further down the file. In the case below, we have banned googlebot and slurp (the name of Yahoo's robot) from the site.

User-agent: googlebot
Disallow: /

User-agent: slurp
Disallow: /

If the same rules apply to all bots, we can use the "*" character in place of a robot's name.

User-agent: *
Disallow: /
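The difference between naming bots one by one and using "*" can be sketched by generating both files and testing them against a bot that is not named. The helper names build_robots_txt and may_fetch and the robot name examplebot are hypothetical, and Python's standard-library parser only approximates how a real robot reads the file.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(rules):
    """Render {user_agent: [disallowed paths]} as robots.txt text."""
    blocks = []
    for agent, paths in rules.items():
        lines = [f"User-agent: {agent}"]
        lines += [f"Disallow: {path}" for path in paths]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"  # blank line between entries


def may_fetch(robots_txt, user_agent, path):
    """Return True if user_agent may fetch path under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


per_bot = build_robots_txt({"googlebot": ["/"], "slurp": ["/"]})
everyone = build_robots_txt({"*": ["/"]})

print(may_fetch(per_bot, "googlebot", "/"))    # False: named and blocked
print(may_fetch(per_bot, "examplebot", "/"))   # True: not named, so not blocked
print(may_fetch(everyone, "examplebot", "/"))  # False: "*" covers every bot
```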

Finally, it is worth mentioning that while almost every bot likes to play nicely with the websites it visits, there are some that do not. If you have pages that really shouldn't be seen by any sort of robot, you should password-protect them rather than rely on robots.txt.

David Fitzgerald is a network administrator for the cheap web hosting and domain name registration services of Cheap Web Site Hosting.

©2008 ArticleSender.com All Rights Reserved.