Review: Implementing the Google Search Appliance in an Intranet environment

    Our corporate intranet is a non-framed environment with both Lotus Domino and IIS (.NET and classic ASP) applications and content. We have between 300,000 and 500,000 pages of web content and documents across more than 1,200 “sites” on approximately 30 unique domains. Our previous intranet search engine, Inktomi’s UltraSeek Server 3.0 (purchased in 1998), was beginning to show its age. It did not handle attachments well (DOC, PPT, PDF, etc.), would not crawl our secured sites, and was no longer supported by the vendor. We did a cursory review of the search vendors and were immediately attracted to Google’s 30-day trial offer for the Google Search Appliance (GSA). After we signed a standard agreement, Google shipped us a shiny new yellow unit that we could test for 30 days before returning or purchasing it.

    Product info

    The GSA is a “black box” 1U standard rack-mountable server. By “black box” I mean that Google gives you a web interface to administer the device but does not want you to access the operating system (a heavily Google-customized version of Linux). In fact, the license agreement stipulates that you will not tamper with the hardware or OS of the appliance in any way. The device has no need for a keyboard, mouse or video; all you need for normal operation is a network cable and standard power input.

    The GSA comes in different flavors to fit different needs, varying by the size of the hardware and, correspondingly, the size of the license. (Licensing is based on the number of URLs crawled by the appliance.) There are three hardware configurations: the GB-1001, GB-5005, and GB-8008. These break down as follows:

    • GB-1001 – 150K documents for $28K, 300K documents for $50K
    • GB-5005 – 1.5M documents for $230K
    • GB-8008 – 4M documents for $450K
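To compare the tiers, it helps to work out the price per licensed document. The sketch below uses only the figures listed above; the function names and the "cheapest tier that covers the need" rule are our own illustration, not anything from Google's price list.

```python
# License tiers as quoted in this review: (model, document limit, price in USD).
TIERS = [
    ("GB-1001", 150_000, 28_000),
    ("GB-1001", 300_000, 50_000),
    ("GB-5005", 1_500_000, 230_000),
    ("GB-8008", 4_000_000, 450_000),
]


def cost_per_document(documents, price_usd):
    """Return the license price divided by the document limit."""
    return price_usd / documents


def cheapest_tier(needed_documents):
    """Return the lowest-priced tier whose limit covers the need, or None."""
    candidates = [tier for tier in TIERS if tier[1] >= needed_documents]
    return min(candidates, key=lambda tier: tier[2]) if candidates else None


if __name__ == "__main__":
    for model, docs, price in TIERS:
        per_doc = cost_per_document(docs, price)
        print(f"{model}: {docs:>9,} documents at ${per_doc:.3f} per document")
    # An intranet at the upper end of our estimate (500,000 pages):
    print(cheapest_tier(500_000))
```

By this arithmetic, an intranet of up to 300,000 pages fits the larger GB-1001 license, while one closer to 500,000 pages needs the GB-5005.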

    Why Google?

    As advertised, the GSA met all of our needs: it can index the large variety of file types in our environment, it can access secured content, it has a documented API, and so on. The Google brand power was another big selling factor. When we told our users that they were going to get a Google-based search engine, they knew their days of troubled searching were over. Lastly, our 30-day trial experience with the GSA sealed the deal. The appliance is the easiest enterprise solution I’ve ever had to install, configure and maintain. We were literally up and running within an hour of opening the shipping box.

    Installation

    The appliance has two network ports on the back panel: one for normal operation and one used exclusively for network configuration. To configure the network settings, we connected a laptop to the appliance via the included orange Ethernet cable, which is special (some pin-outs are non-standard). The installation process was about as easy as one can imagine for a “black box.”

    First we plugged in the normal operation network cable and then the power. The power plug on the appliance IS the power switch: plug it in to turn the appliance on and unplug it to turn it off. After plugging it in, we waited about 5 minutes for the appliance to play a tune, which is the signal to continue. Next, we hooked up our laptop (already set to DHCP mode) to the appliance and powered the laptop up. After logging in to the laptop and making sure it had the correct IP assigned by the appliance’s built-in DHCP server, we were ready to configure the network settings. Total elapsed time (excluding rack mounting): 10 minutes.

    Configuration

    Network configuration, like normal administration, is done entirely through a browser and is a simple five-step process. The first screens ask you for basic network information: IP address, subnet mask, default gateway, and DNS. Subsequent screens collect the SMTP server, the “From” address for GSA notification messages, the time zone, the NTP (time) servers and the admin account name and password. The last step is to test a few of the URLs you will be crawling to make sure you’ve done the setup correctly. After a final settings review screen, configuration is complete and you can unplug your laptop and get to the good part: crawling. Total elapsed time: 10 minutes.
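It is worth sanity-checking the basic network values before typing them into the setup screens. The following is a minimal sketch using Python's standard `ipaddress` module; it is our own pre-flight check, not part of the appliance, and the addresses in the usage example are made up.

```python
import ipaddress


def check_network_settings(ip, netmask, gateway, dns):
    """Return a list of problems found in the proposed settings (empty if none).

    Raises ValueError if any value is not a well-formed IPv4/IPv6 address.
    """
    problems = []
    address = ipaddress.ip_address(ip)
    # strict=False lets us pass a host address plus mask and get its network.
    network = ipaddress.ip_network(f"{ip}/{netmask}", strict=False)

    if ipaddress.ip_address(gateway) not in network:
        problems.append(f"gateway {gateway} is not inside {network}")
    if ipaddress.ip_address(gateway) == address:
        problems.append("gateway and appliance address are identical")
    if address in (network.network_address, network.broadcast_address):
        problems.append(f"{ip} is the network or broadcast address of {network}")

    ipaddress.ip_address(dns)  # only checks that the DNS entry is well formed
    return problems


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(check_network_settings("10.1.2.50", "255.255.255.0", "10.1.2.1", "10.1.0.10"))
```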

    Crawling the site(s)

    Using the URL provided, all administration of the GSA is done remotely. After logging in with the ID and password we provided in the previous step, we were presented with the Administration console. We created a new collection to hold our index, put in the “Start crawling from” URL, copied that same URL into the “Follow and Crawl only URLs with the Following patterns” box and we were done. We saved our settings and then clicked the “Start crawling” button. We then went over to the “Crawl status” screen and watched the “Crawled URLs” counter increase. Google advertises that the appliance can crawl about 4,000 URLs in about 15 minutes. We found that the crawl time increased significantly when documents (Word, PDF, Excel, etc.) were linked to from those URLs.
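The advertised rate gives a quick back-of-the-envelope estimate for a full crawl. The sketch below simply scales the "4,000 URLs in 15 minutes" figure linearly; that linearity is our assumption, and since linked documents slowed our crawls, the result is best read as a lower bound.

```python
# Advertised crawl rate quoted in this review.
ADVERTISED_URLS = 4_000
ADVERTISED_MINUTES = 15


def estimated_crawl_hours(url_count):
    """Scale the advertised rate linearly to url_count URLs; result in hours."""
    return url_count / ADVERTISED_URLS * ADVERTISED_MINUTES / 60


if __name__ == "__main__":
    # The low and high ends of our intranet size estimate.
    for pages in (300_000, 500_000):
        print(f"{pages:,} URLs: about {estimated_crawl_hours(pages):.2f} hours")
```

For an intranet of our size, that works out to roughly 19 to 31 hours at the advertised rate.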

    After the crawl is done, the collection is automatically indexed and then checked against the Serving Prerequisites (any criteria you wish to use to decide whether an indexed collection should move to production). Depending on the outcome, the collection is moved either to Production (and so becomes searchable) or to Staging. The Staging area lets you validate new crawls before letting users search against them.
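The Production-or-Staging decision can be pictured as a simple gate. The sketch below is purely illustrative: the two criteria (a minimum document count and a set of URLs that must be present) and all names are hypothetical examples of prerequisites, not the appliance's actual settings or API.

```python
from dataclasses import dataclass, field


@dataclass
class Prerequisites:
    """Hypothetical serving prerequisites for a freshly indexed collection."""
    min_documents: int = 0
    required_urls: set = field(default_factory=set)


def destination(crawled_urls, prereqs):
    """Return "production" if every prerequisite is met, otherwise "staging"."""
    crawled = set(crawled_urls)
    if len(crawled) < prereqs.min_documents:
        return "staging"
    if not prereqs.required_urls <= crawled:  # some required URL is missing
        return "staging"
    return "production"


if __name__ == "__main__":
    prereqs = Prerequisites(
        min_documents=2,
        required_urls={"http://intranet.example.com/"},
    )
    crawl = ["http://intranet.example.com/", "http://intranet.example.com/hr/"]
    print(destination(crawl, prereqs))
```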

    Crawling configuration

    After your first crawl you may need to go back and tweak the crawling parameters. Google gives you a good amount of control over how sites are crawled, how often, how many threads are used, and so on. For sites with security, the GSA supports Basic Authentication, and an additional security module is available which supports Forms Authentication. The most challenging aspect of configuration for us was determining the right combination of URL patterns to exclude from the search. If you are a Domino shop looking to use the GSA, you may need to spend some time getting the crawler configuration just right to cope with the sometimes convoluted Domino query string parameters.
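To give a feel for the kind of exclusion rules involved, here is a rough sketch that filters candidate URLs with regular expressions. It is not the GSA's own pattern syntax; the host name is made up, and the choice of which Domino URL arguments to exclude (view expand/collapse and paging arguments, plus `?OpenForm`) is an assumption for illustration rather than the rule set we actually deployed.

```python
import re

# Hypothetical starting point; only URLs under these prefixes are followed.
FOLLOW_PREFIXES = ["http://intranet.example.com/"]

# Illustrative exclusions for Domino-style query strings, which can expose
# the same view under many different URLs.
EXCLUDE_PATTERNS = [
    re.compile(r"[?&](ExpandView|CollapseView)\b", re.IGNORECASE),
    re.compile(r"[?&](Expand|Collapse)=", re.IGNORECASE),
    re.compile(r"[?&]Start=\d+", re.IGNORECASE),
    re.compile(r"\?OpenForm\b", re.IGNORECASE),
]


def should_crawl(url):
    """Return True if url is under a followed prefix and matches no exclusion."""
    if not any(url.startswith(prefix) for prefix in FOLLOW_PREFIXES):
        return False
    return not any(pattern.search(url) for pattern in EXCLUDE_PATTERNS)


if __name__ == "__main__":
    samples = [
        "http://intranet.example.com/hr/policies.nsf/all/ABC123?OpenDocument",
        "http://intranet.example.com/hr/policies.nsf/all?OpenView&Start=30",
        "http://intranet.example.com/hr/policies.nsf/all?OpenView&ExpandView",
    ]
    for url in samples:
        print(should_crawl(url), url)
```

Running candidate URLs through a filter like this before changing the crawler settings makes it easier to see which pages a new pattern would drop.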

    After we got the crawl parameters tuned and the first complete crawl done, we did some testing to see whether the crawler had grabbed all the content. Browsing our site and searching for strings buried deep inside the taxonomy, we always found that the GSA had crawled them accurately. We also tested with strings inside PDF documents, PowerPoint presentations and the like. On the occasions when we came across something that hadn’t been crawled, careful analysis showed that we needed to tweak the crawl settings further.

    Other notable features

    Google also gives you a KeyMatch tool that allows you to specify which indexed documents should appear at the top of the results page for a given query. These manifest themselves almost identically to the Sponsored Links at the top of the results page of the Google we all use. A Synonym tool allows you to specify alternate words or phrases for search queries. For example, if someone searches for WCM, you can suggest “Web Content Management” at the top of the results page.
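Conceptually, both tools behave like lookup tables consulted before the ordinary results are shown. The sketch below models that idea only; the data structures, function name and the "benefits" entry are hypothetical, and the WCM entry mirrors the example above.

```python
# Hypothetical KeyMatch table: query -> promoted (title, URL) pairs.
KEYMATCHES = {
    "benefits": [("Employee Benefits Portal", "http://intranet.example.com/hr/benefits")],
}

# Hypothetical synonym table: query -> suggested alternate phrases.
SYNONYMS = {
    "wcm": ["Web Content Management"],
}


def decorate_results(query, organic_results):
    """Attach promoted links and suggestions to the ordinary search results."""
    key = query.strip().lower()
    return {
        "suggestions": SYNONYMS.get(key, []),
        "keymatches": KEYMATCHES.get(key, []),
        "results": list(organic_results),
    }


if __name__ == "__main__":
    print(decorate_results("WCM", ["http://intranet.example.com/it/cms-overview"]))
```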

    An output format feature lets you control the presentation of the search results via an XSLT stylesheet. You can use this to change the fonts, colors, logo, header, etc. of the results page. We were able to remove the “Cached” feature from the results page easily with some XSLT modifications.

    The Reporting tool lets you run reports on search queries over various time ranges. It will show you the number of searches per day, per hour, the top 100 keywords and top 100 queries for the time period specified.
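The figures in these reports are straightforward aggregates over the query log. As a rough sketch of the same calculations, assuming a log of (timestamp, query) pairs, which is our own simplification and not the appliance's log format:

```python
from collections import Counter
from datetime import datetime


def summarize(log, top_n=100):
    """Aggregate (timestamp, query) pairs into report-style counts."""
    entries = list(log)
    queries = [query.strip().lower() for _, query in entries]
    return {
        "per_day": Counter(ts.date() for ts, _ in entries),
        "per_hour": Counter(ts.hour for ts, _ in entries),
        "top_queries": Counter(queries).most_common(top_n),
        "top_keywords": Counter(
            word for query in queries for word in query.split()
        ).most_common(top_n),
    }


if __name__ == "__main__":
    # Made-up sample entries for illustration.
    sample = [
        (datetime(2006, 3, 1, 9, 15), "expense report"),
        (datetime(2006, 3, 1, 9, 40), "Expense Report"),
        (datetime(2006, 3, 2, 14, 5), "travel expense policy"),
    ]
    report = summarize(sample)
    print(report["top_queries"][0])
    print(report["top_keywords"][0])
```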

    Downsides

    The GSA is not for organizations looking to index their shared network drives, as the appliance has no facility for crawling file systems. This is really too bad, as many companies struggle with the huge quantities of unstructured content stored on their networks. Of course, there are plenty of other products out there for exactly this problem.

    Direct access to databases (e.g. SQL, Oracle, etc.) is also off-limits for the GSA, as is any kind of integration with content or document management systems.

    Conclusion

    The Google Search Appliance (GSA) is an excellent search product for HTTP-accessible content. It gives great control over administrative features such as crawler configuration and results serving, and its reporting capabilities are sufficient as well. Those looking for a solution that integrates directly with a content or document management system or with databases, or that indexes network drives, should look to another product. However, if you have an intranet or Internet site with plenty of HTML-based content, the GSA may be just what you need.
