Beginning insights from the Google Mini
Upon setting it up, nothing really great came to mind. However, after doing some tweaking (it's powering the search on toolbarn.com right now), it has become clear that the more I learn about this box and its capabilities, the more I understand Google.
For starters, I ended up having to cloak some pages to our mini to get our results to come out right. A search for makita drills gave me results of milwaukee drills as well because of our breadcrumb navigation and the cross-linking. Every page on our site was returned for power tools because it's in the main navigation. Some searches return poor results, such as makita 5000, which several people have searched for. I have a temp solution in place for that.
So, after playing with it and then sitting back to think about how it works / serves results, I figured something out that may end up being priceless.
Searches done on our site that the Google Mini returns 0 results for tell me where the site needs changes.
I'm logging how many results the mini returns for every search that is done on our site. What I'm seeing are patterns in the way people search that yield no results. Well guess what... people search for those same phrases at Google and many times get 0 relevant results there as well. Sure, they'll get results, but the relevancy isn't there.
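Here is a rough sketch of how that kind of log could be boiled down to a list of phrases worth fixing. The log format (pairs of query and result count), the function name, and the sample rows are all assumptions made up for illustration; this is not the mini's own reporting.

```python
from collections import Counter


def zero_result_queries(log_rows, min_count=2):
    """Count searches that came back empty, most frequent first.

    log_rows: iterable of (query, result_count) pairs (assumed log format).
    min_count: ignore phrases searched fewer times than this.
    """
    counts = Counter()
    for query, result_count in log_rows:
        if result_count == 0:
            # Normalize case and whitespace so near-duplicates group together.
            counts[" ".join(query.lower().split())] += 1
    return [(q, n) for q, n in counts.most_common() if n >= min_count]


# Invented sample rows, not real data from the site.
sample_log = [
    ("acme 5000", 0),
    ("Acme  5000", 0),
    ("cordless drills", 42),
    ("acme 5000", 0),
    ("widget 9999", 0),
]

report = zero_result_queries(sample_log)
print(report)
```

The `min_count` cutoff is just one way to separate a pattern from a one-off typo. Anything that survives it is a candidate for a new page or better copy.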
For example, the search for makita 5000 could be a GV5000 or an HR5000, accessories for either of those, or perhaps something else. The results at Google.com are 5000 RPM, 5000 staples per pack on a stapler page, or 5000 Watts for a generator. Why wouldn't I do some work to make my site come up #1 for makita 5000, since I've seen quite a few searches on our site for it (I'm sure I'll see more after hitting submit for this post) and the results in the SERPs are poor?
Now, that's not the only thing I've learned. Google's mini, while having some technical differences due to only being concerned with a small sampling of the web, gives me a sense of which of 2 pages is better optimized. For example, I can create a test result set and have 1 page using identical link text to point to 2 pages, then have their algo decide which is better optimized. Any SEO that just read that should be getting out their credit cards. How useful is that? I've seen some results from those experiments already within the Google SERPs. Oh, and I can suppress those pages from being served in the results, allow them for a few minutes to do my test, then hide them again. Very cool.
It also makes sense now why there is a delay between crawling and showing up in the SERPs.
There is a 3-step process that the mini uses.
1) Crawl.
2) Build Index.
3) Launch / Replicate Index.
While they've undoubtedly got more processing power and storage for their primary engine than thousands of these little guys (Dual PIII with 2GB of RAM in that little blue box), indexing our site takes the mini over 4 hours. By default, it tries to keep no more than 4 connections open at a time to any domain. Given how many pages our site is comprised of, 4 pages at a time makes for a very long crawl time.
Once everything is crawled, the index building takes it almost 30 minutes for our site. That's just 25,000 pages that we index out of the billions that they index. We're limiting which pages the mini crawls and assigning it a cookie so it doesn't see 100,000 different checkout page URLs to evaluate. Talk about some major processing power to build an index on the data they gather - mind blowing. When this machine takes that long for 25,000 pages, it's got to take a while for their index, and that's got to take more processing power than I've ever considered building. =)
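To put rough numbers on that, here is some back-of-the-envelope arithmetic using the figures above. It assumes the 4 hours is mostly crawl time, that the crawl covers about the same 25,000 pages that get indexed, and that all 4 connections stay busy the whole time. None of that is measured, so treat the output as a ballpark only.

```python
def crawl_rate(pages, hours, connections):
    """Return (pages per second overall, seconds per page on each connection)."""
    seconds = hours * 3600
    pages_per_second = pages / seconds
    # With N connections working in parallel, each one handles pages / N pages.
    seconds_per_page_per_connection = connections * seconds / pages
    return pages_per_second, seconds_per_page_per_connection


# Figures from the post: about 25,000 pages, a bit over 4 hours, 4 connections.
pages_per_second, per_fetch = crawl_rate(pages=25_000, hours=4, connections=4)

# Index build: almost 30 minutes for the same 25,000 pages.
build_seconds_per_page = 30 * 60 / 25_000

print(f"crawl: {pages_per_second:.2f} pages/sec overall")
print(f"crawl: {per_fetch:.2f} sec per page on each connection")
print(f"build: {build_seconds_per_page:.3f} sec per page")
```

Since the crawl ran "over" 4 hours and the build "almost" 30 minutes, the real crawl rate is a little slower and the real build rate a little faster than these estimates.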
Then, after everything is crawled and an index is built, it replicates the index. It copies the old index to a new location, sets the copy active, then replaces the primary index with the new build, followed by a switch to the new index after testing for our required results. After considering the safeguards that it gives by having some test searches with required results, I'm sure they've got a ton of required results to make an index active in their web search. For example, searching for microsoft had better give you microsoft.com somewhere in the top so many pages of the SERPs or you've got issues.
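The "required results" safeguard can be pictured with a small sketch like the one below. This is only my mental model of the idea, not how the mini or Google implements it: the dictionary-based index, the function names, and the `top_n` cutoff are all hypothetical.

```python
def failed_checks(index, required, top_n=10):
    """Return the test queries whose required URL is missing from the top results.

    index: dict mapping a query to its ranked list of URLs (toy stand-in).
    required: dict mapping a test query to the URL that must appear.
    """
    return [
        query
        for query, url in required.items()
        if url not in index.get(query, [])[:top_n]
    ]


def choose_active(old_index, new_index, required, top_n=10):
    """Switch to the new index only if every required result shows up."""
    if failed_checks(new_index, required, top_n):
        return old_index  # keep serving the known-good copy
    return new_index


# Toy data for illustration only.
required = {"microsoft": "microsoft.com"}
old_index = {"microsoft": ["microsoft.com", "example.com"]}
good_build = {"microsoft": ["example.com", "microsoft.com"]}
bad_build = {"microsoft": ["example.com", "example.org"]}

print(choose_active(old_index, good_build, required) is good_build)
print(choose_active(old_index, bad_build, required) is old_index)
```

Keeping the old index around until the new one passes is what makes a bad build harmless: the worst case is that searchers see slightly stale results.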
I've got more, but I'm still pondering what useful information I can garner from the insight. Really, for under $4000 (we bought the extra year of upgrades and hardware replacement, which is where the extra $1000 came in) it's probably going to be a worthwhile investment just for improving our rankings in the SERPs, let alone the search results it gives our customers.