Casual Articles - The Robots Text File Or How To Get Your Site Properly Spidered, Crawled, Indexed By Bots
So you heard about someone stressing the importance of the robots.txt file, or noticed in your website's logs that the robots.txt file is causing an error, or somehow it is on the very top of the top visited pages, or, you read some article about the death of the robots.txt file and about how you should not bother with it ever again. Or maybe you never heard of the robots.txt file but are intrigued by all that talk about spiders, robots and crawlers.
In this article, I will hopefully make some sense out of all of the above.

There are many folks out there who vehemently insist that the robots.txt file is useless, proclaiming it obsolete, a thing of the past, plain dead. I disagree. The robots.txt file is probably not among the top ten methods for promoting your get-rich-fast affiliate website in 24 hours or less, but it still plays a major role in the long run.

First of all, the robots.txt file is still a very important factor in promoting and maintaining a site, and I will show you why. Second, the robots.txt file is one of the simplest means by which you can protect your privacy and/or intellectual property. I will show you how. Let's try to figure out some of the lingo.

What is this robots.txt file?

The robots.txt file is just a plain text file (or an ASCII file, as some like to say) with a very simple set of instructions that we give to a web robot, so the robot knows which pages we want scanned (or crawled, or spidered, or indexed - all these terms refer to the same thing in this context) and which pages we would like to keep out of search engines.

What is a www robot?

A robot is a computer program that automatically reads web pages and goes through every link that it finds. The purpose of robots is to gather information. Some of the most famous robots mentioned in this article work for the search engines, indexing all the information available on the web.

The first robot was developed at MIT and launched in 1993. It was named the World Wide Web Wanderer, and its initial purpose was purely scientific: its mission was to measure the growth of the web. The index generated from the experiment's results proved to be an awesome tool and effectively became the first search engine. Most of the stuff we consider today to be indispensable online tools was born as a side effect of some scientific experiment.

What is a search engine?

Generically, a search engine is a program that searches through a database.
In the popular sense, as applied to the web, a search engine is a system with a user search form that searches through a repository of web pages gathered by a robot.

What are spiders and crawlers?

Spiders and crawlers are robots; the names just sound cooler in the press and within metro-geek circles.

What are the most popular robots? Is there a list?

Some of the best-known robots are Google's Googlebot, MSN's MSNBot, Ask Jeeves's Teoma and Yahoo!'s Slurp (funny). One of the most popular places to search for active robot info is the list maintained at http://www.robots.org.

Why do I need this robots.txt file anyway?

A great reason to use a robots.txt file is the fact that many search engines, including Google, publicly recommend using this tool. Why is it such a big deal that Google teaches people about robots.txt? Well, because nowadays search engines are no longer a playground for scientists and geeks, but large corporate enterprises. Google is one of the most secretive search engines out there. Very little is known to the public about how it operates, how it indexes, how it searches, how it creates its rankings, etc. In fact, if you do a careful search in specialized forums, or wherever else these issues are discussed, nobody really agrees on whether Google puts more emphasis on this or that element to create its rankings. And when people don't agree on things as precise as a ranking algorithm, it means two things: that Google constantly changes its methods, and that it does not make them very clear or very public.

There's only one thing that I believe to be crystal clear. If they recommend that you use a robots.txt ("Make use of the robots.txt file on your web server" - Google Technical Guidelines), then do it. It might not help your ranking, but it will definitely not hurt you.

There are other reasons to use the robots.txt file.
If you use your error logs to tweak your site and keep it free of errors, you will notice that most errors refer to someone or something not finding the robots.txt file. All you have to do is create a basic blank page (use Notepad in Windows, or the simplest text editor in Linux or on a Mac), name it robots.txt and upload it to the root of your server (that's where your home page is).

On a different note, nowadays the search engines' robots look for the robots.txt file as soon as they arrive on your site. There are unconfirmed rumors that some robots might even 'get annoyed' and leave if they don't find it. Not sure how true that is, but hey, why not be on the safe side?

Again, even if you don't intend to block anything or just don't want to bother with this stuff at all, having a blank robots.txt is still a good idea, as it can actually act as an invitation into your site.

Don't I want my site indexed? Why stop robots?

Some robots are well designed, professionally operated, cause no harm and provide valuable service to mankind (don't we all like to "google"?). Some robots are written by amateurs (remember, a robot is just a program). Poorly written robots can cause network overload, security problems, etc. The bottom line here is that robots are devised and operated by humans and are prone to human error. Consequently, robots are neither inherently bad nor inherently brilliant, and need careful attention. This is another case where the robots.txt file comes in handy - robot control.

Now, I'm sure your main goal in life, as a webmaster or site owner, is to get on the first page of Google. Then why in the world would you want to block robots? Here are some scenarios:

1. Unfinished site
You are still building your site, or portions of it, and don't want unfinished pages to appear in search engines. It is said that some search engines even penalize sites with pages that have been "under construction" for a long time.

2. Security
Always block your cgi-bin directory from robots. In most cases, cgi-bin contains applications, configuration files for those applications (which might actually hold sensitive information), etc. Even if you don't currently use any CGI scripts or programs, block it anyway; better safe than sorry.

3. Privacy
You might have some directories on your website where you keep stuff that you don't want the entire Galaxy to see, such as pictures of a friend who forgot to put clothes on, etc.

4. Doorway pages
Besides illicit attempts to increase rankings by blasting doorways all over the internet, doorway pages actually do have a very morally sound usage. They are similar pages, but each one is optimized for a specific search engine. In this case, you must make sure that each robot has access only to its own page and not to all of them. This is extremely important, in order to avoid being penalized for spamming a search engine with a series of extremely similar pages.

5. Bad bot, bad bot, what’cha gonna do...
You might want to exclude robots whose known purpose is to collect email addresses, or other robots whose activity does not agree with your beliefs about the world.

6. Your site gets overwhelmed
In rare situations, a robot goes through your site too fast, eating your bandwidth or slowing down your server. This is called "rapid-fire" and you'll notice it if you read your access log file. A medium-performance server should not slow down. You may, however, have problems if you have a low-performance site, such as one running off your personal PC or Mac, if you run poor server software, or if you have heavy scripts or huge documents. In these cases, you'll see dropped connections, heavy slowdowns and, in extreme cases, even a complete system crash. If this ever happens to you, read your logs, try to get the robot's IP or name, read the list of active robots and try to identify and block it.

What's in a robots.txt file anyway?
There are only two lines fo
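
Assuming the two lines in question are the conventional User-agent and Disallow lines, here is a rough sketch of how some of the scenarios above might translate into an actual file. The Python snippet embeds a hypothetical robots.txt and checks it with the standard library's urllib.robotparser module. The directory names, the example.com domain and the robot name BadBot are invented for illustration, and Crawl-delay is a non-standard extension that only some robots honor. Keep in mind that the file is a request to well-behaved robots and is readable by anyone, so it keeps pages out of search results rather than locking them away.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents. The directory names and the robot name
# "BadBot" are invented for illustration; they are not from the article.
SAMPLE_ROBOTS_TXT = """\
# Scenario 5: shut one unwanted robot out of the whole site
User-agent: BadBot
Disallow: /

# Scenario 6: ask one robot to pause between requests
# (Crawl-delay is a non-standard extension; not every robot honors it)
User-agent: Slurp
Crawl-delay: 10
Disallow: /under-construction/
Disallow: /cgi-bin/
Disallow: /private/

# Scenarios 1-3: rules for every other robot
User-agent: *
Disallow: /under-construction/
Disallow: /cgi-bin/
Disallow: /private/
"""

# A blank file contains no rules at all, so nothing is blocked.
BLANK_ROBOTS_TXT = ""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text roughly the way a well-behaved robot might."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def main() -> None:
    site = "http://www.example.com"  # placeholder domain
    rules = load_rules(SAMPLE_ROBOTS_TXT)

    checks = [
        ("Googlebot", "/index.html"),
        ("Googlebot", "/cgi-bin/form.pl"),
        ("Googlebot", "/private/photo.jpg"),
        ("Slurp", "/index.html"),
        ("BadBot", "/index.html"),
    ]
    for robot, path in checks:
        allowed = rules.can_fetch(robot, site + path)
        print(f"{robot:10} {path:20} {'allowed' if allowed else 'blocked'}")

    print("Crawl delay requested from Slurp:", rules.crawl_delay("Slurp"))

    blank = load_rules(BLANK_ROBOTS_TXT)
    print("Blank file blocks anything?",
          not blank.can_fetch("Googlebot", site + "/cgi-bin/form.pl"))


if __name__ == "__main__":
    main()
```

Note that a robot matching a named entry follows only that entry, which is why the Slurp entry repeats the Disallow lines from the catch-all entry instead of inheriting them.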
They are similar pages, but each one is optimized for a specific search engine. In this case, you must make sure that each robot can reach only the page meant for it, not all of them. This is extremely important in order to avoid being penalized for spamming a search engine with a series of extremely similar pages.

5. Bad bot, bad bot, what’cha gonna do...
You might want to exclude robots whose known purpose is to collect email addresses, or other robots whose activity does not agree with your beliefs on the world.

6. Your site gets overwhelmed
In rare situations, a robot goes through your site too fast, eating your bandwidth or slowing down your server. This is called "rapid-fire" and you'll notice it if you read your access log file. A medium performance server should not slow down. You may however have problems if you have a low performance site, such as one running off your personal PC or Mac, if you run poor server software, or if you have heavy scripts or huge documents. In these cases, you'll see dropped connections, heavy slowdowns and, in extreme cases, even a complete system crash. If this ever happens to you, read your logs, try to get the robot's IP or name, read the list of active robots and try to identify and block it.
What's in a robots.txt file anyway?

There are only two kinds of lines in each entry of a robots.txt file: the User-Agent line, which holds the name of the robot you want to give orders to (or the '*' wildcard symbol, meaning 'all'), and the Disallow line, which tells that robot which places it should not touch. The Disallow line can be repeated for every file or directory you don't want indexed, and the whole entry can be repeated for each robot you want to exclude. If you leave the Disallow line empty, you are not disallowing anything; in other words, you are allowing that particular robot to index your entire site. Some examples and a few scenarios should make it clear:

A. Exclude a file from Google's main robot (Googlebot):
User-Agent: Googlebot

B. Exclude a section of the site from all robots:
User-Agent: *

Note that the directory is enclosed between two forward slashes. Although you are probably used to seeing URLs, links and folder references that do not end with a slash, note that a web server needs the slash at the end of a directory address.
Even when you see links on websites that do not end with a slash, when such a link is clicked, the web server has to do an extra step before serving the page, which is adding the slash through what we call a redirect. In robots.txt the slash matters for another reason: a Disallow value is matched as a prefix, so without the ending slash the rule also covers any file whose name starts with the same characters. Always use the ending slash.

C. Allow everything (blank robots.txt):
User-Agent: *
Disallow:

Note that when a "blank robots.txt" is mentioned, it is not a completely blank file; it contains the two lines above.

D. Do not allow any robot on your site:
User-Agent: *
Disallow: /

Note that the single forward slash means "root", which is the main entrance to your site.

E. Do not allow Google to index any of your images (Google uses Googlebot-Image for images):
User-Agent: Googlebot-Image
Disallow: /

F. Do not allow Google to index some of your images:
User-Agent: Googlebot-Image

Note the use of multiple disallows. This is allowed, no pun intended.

G. Build a doorway for Google and Lycos (the Lycos robot is called T-Rex) - do not play with this unless you are 100% sure you know what you are doing:
User-Agent: T-Rex

H. Allow only Googlebot:
User-Agent: Googlebot
Disallow:

User-Agent: *
Disallow: /

Note how the two entries work together: a robot that finds an entry with its own name follows that one, and every other robot falls back to the '*' entry. The example above reads in English: let Googlebot through, then stop everyone else.

If your file gets really large, or you just feel like writing notes for yourself or for potential viewers (remember, robots.txt is a public file, anyone can see it), you can do so by preceding your comment with a # sign. Although the standard lets you put a comment on the same line as a command, I recommend that you start every command and every comment on a new line. This way, robots will never be confused by a potential formatting glitch. Examples:

This is correct, as per the standard, but not recommended (a newer robot or a badly written one might read the following as "disallow the # We... Directory" instead of complying with the "disallow all" command):
User-Agent: *
Disallow: / # We decided to stop all robots but we were very silly in typing a long comment which got truncated and made the robots.txt unusable

The way I recommend that you format this is:
# We decided to stop all robots and we made sure

Although theoretically each robot should comply with the standards introduced around 1994 and enhanced in 1996, each robot acts a little differently. You are advised to check the documentation provided by the owners of those robots; you'll be surprised to discover a world of useful facts and techniques. For instance, from Google's site we learn that Googlebot completely disregards any URL that contains "&id=". Here are some sites to check:

Google: http://www.google.com/bot.html
Yahoo: http://help.yahoo.com/help/us/ysearch/slurp/
MSN: http://search.msn.com/docs/siteowner.aspx

A database of robots is maintained at http://www.robotstxt.org/wc/active/html/contact.html

A robots.txt validation tool, invaluable in finding typos that can completely change the way search engines see your site, can be found at: http://searchengineworld.com/cgi-bin/robotcheck.cgi

There are also some extensions to the standard. For example, some robots allow wildcards in the Disallow line, and some even allow different commands. My advice is: don't bother with anything outside the standard and you will not be unpleasantly surprised.

A final word of caution: in this article I showed you how things should work in a perfect world. Somewhere along the way I mentioned that there are good bots and bad bots. Let's stop for a moment and think from a deranged person's perspective. Is there anything to prevent someone from writing a robot program that reads a robots.txt file and specifically looks at the pages that you marked as "disallowed"?
The answer is absolutely not. This entire standard runs on the honor system, on the idea that everyone should work hard to make the internet a better place. Basically, do not rely on it for real security or privacy. Use passwords when necessary.

In conclusion, do not forget that indexing robots are your best friends. While you should build your site for your human visitors rather than for robots, do not underestimate the power of those mindless crawlers. Make sure the pages you want indexed are clearly visible to robots, and make sure you have regular hyperlinks that robots can follow without roadblocks (robots can't follow Flash based navigation systems, for instance). To keep your site at tip top performance, and to keep your logs clean and your applications, scripts and private data safe, always use a robots.txt file and make sure you read your logs to monitor all robotic activity.
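
If you'd like to check how a polite robot might read your rules before you upload them, here is a minimal sketch in Python using the standard library's urllib.robotparser module. The paths /cgi-bin/ and /private/ and the robot name BadBot are made up for illustration, not taken from any real site, and real crawlers may treat edge cases differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the /cgi-bin/ and /private/ paths and the robot
# name "BadBot" are assumptions made up for this example.
ROBOTS_TXT = """\
# Let Googlebot see everything (empty Disallow)
User-agent: Googlebot
Disallow:

# Shut one specific robot out of the whole site
User-agent: BadBot
Disallow: /

# Every other robot: stay out of these two directories
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text into a parser object, without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    checks = [
        ("Googlebot", "/cgi-bin/test.cgi"),
        ("BadBot", "/index.html"),
        ("OtherBot", "/private/photo.jpg"),
        ("OtherBot", "/index.html"),
    ]
    for agent, path in checks:
        verdict = "allowed" if parser.can_fetch(agent, path) else "blocked"
        print(f"{agent:10} {path:20} {verdict}")
```

Keep in mind that this only shows what a robot that respects the file would conclude. Nothing in it stops a robot that chooses to ignore robots.txt.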