Casual Articles
60 Day Sandbox for Google & AskJeeves; MSN Indexes Quickest, Yahoo Next

    Search engine listing delays, which have come to be called the Google Sandbox effect, are real in practice at each of the four top-tier search engines in one form or another. MSN, it seems, has the shortest indexing delay at 30 days. This article is the second in a series following the spiders through a brand new web site, beginning on May 11, 2005, when the site first went live under a newly purchased domain name.

    First Case Study Article

    Previously we looked at the first 35 days and detailed the crawling behavior of Googlebot, Teoma, MSNbot and Slurp as they traversed the pages of this new site. We discovered that each robot spider displays distinctly different crawling frequency and similarly different indexing patterns.

    For reference, about 15 to 20 new pages are added to the site daily, and each is linked from the home page for a day. The site structure is non-traditional: there are no categories, and the linking structure is tied to author pages listing their articles, plus a "related articles" index that links to relevant pages containing similar content.

    So let's review where we are with each spider crawling and look at pages crawled and compare pages indexed by engine.

    The AskJeeves spider, Teoma, has crawled most of the pages on the site, yet has indexed no pages 60 days later at this writing. This is clearly a site aging delay that's modeled on Google's Sandbox behavior. The Teoma spider from Ask.com has crawled more pages on this site than any other engine over the 60-day period, but it appears to be tired of crawling, as it has not returned since July 13, its first break in 60 days.

    In the first two days, Googlebot gobbled up 250 pages, then didn't return until 60 days later, and it has not indexed even a single page in the 60 days since that initial crawl. Googlebot is, however, showing renewed interest in crawling the site since this crawling case study was published on several high-traffic sites. Now Googlebot is looking at a few pages each day: so far no more than about 20 pages, at a decidedly lackluster pace, a true "crawl" that will keep it occupied for years if it continues that slowly.

    MSNbot crawled timidly for the first 45 days, looking over 30 to 50 pages daily, and only after it found a robots.txt file. We had neglected to post one to the site for the first week, then bobbled the ball as we changed site structure and failed to implement robots.txt in the new subdomains until day 25. Even THEN, MSNbot didn't return until day 30. If little else was discovered about initial crawls and indexing, we have seen that MSNbot relies heavily on the robots.txt file, and that proper implementation of that file will speed crawling.

    MSNbot is now crawling with enthusiasm, at anywhere between 200 and 800 pages daily. As a matter of fact, we had to use a "crawl-delay" directive in the robots.txt file after MSNbot began hitting 6 pages per second last week. The MSN index now shows 4905 pages 60 days into this experiment, and cached pages change weekly. MSNbot has apparently found that it likes how we changed the page structure to include a new feature that links to questions from several other article pages.
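For readers who have not used the directive, here is a minimal sketch of what a "crawl-delay" rule looks like and how to check that it parses as intended, using Python's standard urllib.robotparser module. The robots.txt text and the 10-second value are illustrative assumptions, not the settings used on this site. Crawl-delay is not part of the original robots exclusion standard, so whether a given crawler honors it varies.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the 10-second delay is an illustrative value,
# not the one used on the site described in this article.
ROBOTS_TXT = """\
User-agent: msnbot
Crawl-delay: 10

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Delay (in seconds) that applies to MSNbot, and to a crawler with no rule.
msn_delay = parser.crawl_delay("msnbot")
other_delay = parser.crawl_delay("Googlebot")

# Upper bound on pages per day if the crawler waits msn_delay between requests.
max_pages_per_day = 86400 // msn_delay

print(msn_delay, other_delay, max_pages_per_day)
```

With these assumed values, a crawler that respects the rule waits 10 seconds between requests, which caps it at 8640 pages a day instead of several pages per second.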

    Slurp alternates between strangely inactive and hyperactive periods. The Yahoo crawler will look at 40 pages one day and 4000 the next, then look only at the home page for a few days, then jump back in for 3000 pages the next day and go back to reviewing only robots.txt for two days. Consistency is not a curse suffered by Slurp. Yahoo now shows 6 pages in its index; one is an error page and another is an "index/of" page, as we have not posted a home page to several subdomains. But Slurp has easily crawled 15,000 pages to date.

    Lessons learned in the first 60 days on a new site follow:

    1) Google crawls 250 pages on first discovery of links to the site. Then it doesn't return until it finds more links, and then it crawls slowly. Google has failed to index the new domain for 60 days.

    2) Yahoo looks for error pages, and once it finds bad links it will crawl them ceaselessly until you tell it to stop. Then it won't crawl at all for weeks, before crawling heavily one day and lightly the next in random fashion.

    3) MSNbot requires a robots.txt file, and once it decides it likes your site, it may crawl too fast, requiring "crawl-delay" instructions in that robots.txt file. Implement one immediately.

    4) Bad bots can strain resources and hit too many pages too quickly until you tell them to stay out. We banned 3 bots outright after they slammed our servers for a day or two. We noted that "aipbot" crawled first, then "BecomeBot" came along, and then "Pbot" from Picsearch.com crawled heavily looking for image files we don't have. Bad bots, stay out. It is best to implement robots.txt exclusions for all but the top engines if their crawlers strain your server resources. We considered excluding the Chinese search engine Baidu.com when it began crawling heavily early on. We don't expect much traffic from China, but why exclude one billion people? Especially since Google is rumored to be considering a possible purchase of Baidu.com as an entry to the Chinese market.
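The exclusions in lesson 4 can be sketched as follows. This is only an illustration: the bot names are taken from this article as written, and the exact user-agent tokens should be confirmed against your own server logs before relying on them. The helper build_robots_txt and the sample path are hypothetical. Keep in mind that robots.txt is advisory, so it only keeps out crawlers that choose to obey it.

```python
from urllib.robotparser import RobotFileParser

# Bot names as given in the article; verify the real user-agent tokens
# in your own access logs (assumption: they match these strings).
BANNED_BOTS = ["aipbot", "BecomeBot", "Pbot"]


def build_robots_txt(banned_bots):
    """Return robots.txt text that blocks each named bot and allows all others."""
    blocks = [f"User-agent: {bot}\nDisallow: /\n" for bot in banned_bots]
    # An empty Disallow line means "nothing is disallowed" for everyone else.
    blocks.append("User-agent: *\nDisallow:\n")
    return "\n".join(blocks)


robots_txt = build_robots_txt(BANNED_BOTS)

# Parse the generated file and check who may fetch a sample page.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

results = {
    agent: parser.can_fetch(agent, "/article/1.html")
    for agent in BANNED_BOTS + ["Googlebot", "msnbot", "Slurp"]
}
print(robots_txt)
print(results)
```

Generating the file from a list makes it easy to add or drop a bot later, and parsing it back is a cheap way to confirm that the top engines are still allowed in.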

    The bottom line is that all engines seem to delay indexing of new domain names for at least thirty days. Google so far has delayed indexing THIS new domain for 60 days since first crawling it. AskJeeves has crawled thousands of pages while indexing none of them. MSN indexes faster than the other engines but requires a robots.txt file. Yahoo's Slurp has crawled on again, off again for 60 days, but has indexed only six of the 15,000 or more pages crawled to date.

    We seem to have settled that there is a clear indexing delay, but whether this site specifically is "Sandboxed" and whether delays apply universally is less clear. Many webmasters claim that they have been indexed fully within 30 days of first posting a new domain. We'd love to see others track spiders through new sites following launch to document their results publicly so that indexing and crawling behavior are proven.
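For anyone who wants to track spiders through their own new site, a small script can tally crawler visits per day from the server's access log. The sketch below is a rough starting point under stated assumptions: it expects the log to be in the common "combined" format, it matches spiders by a substring of the user-agent string, and the sample log lines are made up for illustration and are not data from this study.

```python
import re
from collections import Counter

# Substrings that identify each crawler in the user-agent string
# (spider names from the article; matching is case-insensitive).
SPIDERS = ["Googlebot", "msnbot", "Slurp", "Teoma"]

# Assumed "combined" log format:
# host - - [day:time zone] "request" status bytes "referer" "user agent"
LOG_LINE = re.compile(
    r'\[(?P<day>[^:\]]+):[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_crawler_hits(lines):
    """Count requests per (day, spider); lines from other agents are skipped."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue
        agent = match.group("agent").lower()
        for spider in SPIDERS:
            if spider.lower() in agent:
                hits[(match.group("day"), spider)] += 1
                break
    return hits


# Made-up sample lines, for illustration only.
SAMPLE_LOG = [
    '192.0.2.1 - - [11/May/2005:10:00:00 -0700] "GET / HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '192.0.2.2 - - [11/May/2005:10:00:05 -0700] "GET /robots.txt HTTP/1.0" 200 64 "-" "msnbot/1.0"',
    '192.0.2.2 - - [12/May/2005:09:30:00 -0700] "GET /a.html HTTP/1.0" 200 2048 "-" "msnbot/1.0"',
    '192.0.2.3 - - [12/May/2005:09:31:00 -0700] "GET /a.html HTTP/1.1" 200 2048 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
]

hits = count_crawler_hits(SAMPLE_LOG)
for (day, spider), count in sorted(hits.items()):
    print(day, spider, count)
```

To run it against a real log, pass an open file object to count_crawler_hits in place of SAMPLE_LOG. Publishing the daily totals alongside the date each engine first shows indexed pages would make results from different sites comparable.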

    © Copyright July 18, 2005 Mike Banks Valentine
