Monday, August 15, 2005

Yahoo's 20 Billion Web Doc Index Kerfuffle

Yahoo's Index Calculations Attract Suspicions.
Recently Yahoo announced that their index now contained over 20 billion "web objects", specifically 19.2 billion web documents, 1.6 billion images, and over 50 million audio and video files.

At first this number was blithely accepted, but within hours, the impact began to sink in... 20 billion is a lot, and far outstrips Google's current claim of 8,168,684,336 web pages. Google pushed past the 8 billion mark last November in a flurry of spidering that everyone noticed.

No one has been posting about Yahoo's Slurp hammering servers. In fact, to my irritation, I noticed that Yahoo has been spidering and caching my sites' CSS files. What the heck is that about? How is spidering and caching a CSS file useful for the average user? Should I start optimizing so I can rank for p{font-size: 12px;}?

Why do I have to waste my time changing my robots.txt files just for this stupidity from Yahoo anyhow?
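For anyone stuck making the same change, here is a minimal sketch of the kind of robots.txt rule involved, checked with Python's standard `urllib.robotparser`. It assumes the stylesheets live under a `/css/` directory and uses `example.com` as a placeholder host; both are illustrative assumptions, not details from my actual sites. The standard-library parser only does prefix matching, so the sketch blocks a directory rather than a `*.css` pattern.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Yahoo's Slurp out of an assumed /css/ directory.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /css/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# example.com and the paths below are placeholders for illustration only.
slurp_css = parser.can_fetch("Slurp", "http://example.com/css/site.css")
slurp_page = parser.can_fetch("Slurp", "http://example.com/index.html")
other_css = parser.can_fetch("Googlebot", "http://example.com/css/site.css")

print("Slurp may fetch CSS:", slurp_css)
print("Slurp may fetch pages:", slurp_page)
print("Other crawlers may fetch CSS:", other_css)
```

The rule only names Slurp, so other crawlers and Slurp's access to regular pages are left alone.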

Anyhow, Google co-founder Sergey Brin soon weighed in to GOOG shareholders (well, actually to the New York Times), stating:

"The comprehensiveness of any search engine should be measured by real Web pages that can be returned in response to real search queries and verified to be unique, we report the total index size of Google based on this approach."

The article continues by citing a survey done on Sunday by the National Center for Supercomputing Applications, which found that Google returned an average of 166% more results than Yahoo in a random survey of over 10,000 search terms. The survey also found that Yahoo beat Google's raw overall result counts in only 3% of the searches.
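To put the competing numbers side by side, here is a rough back-of-the-envelope sketch using only the figures quoted above. It assumes "166% more results" means Google returned about 2.66 times what Yahoo did on average, and it treats "over 50 million" as a lower bound; both readings are my interpretation, not something the article spells out.

```python
# Figures as quoted in this post.
yahoo_docs = 19_200_000_000     # web documents
yahoo_images = 1_600_000_000    # images
yahoo_av = 50_000_000           # audio/video, "over 50 million" (lower bound)
google_pages = 8_168_684_336    # Google's claimed web pages

# Sum of Yahoo's claimed "web objects" (a lower bound).
yahoo_total = yahoo_docs + yahoo_images + yahoo_av

# How much bigger Yahoo's claimed document index is than Google's.
claimed_ratio = yahoo_docs / google_pages

# Assumption: "166% more" is read as Google = Yahoo * (1 + 1.66).
survey_ratio = 1 + 166 / 100

print(f"Yahoo objects claimed: at least {yahoo_total:,}")
print(f"Claimed index ratio (Yahoo/Google): {claimed_ratio:.2f}")
print(f"Survey result ratio (Google/Yahoo): {survey_ratio:.2f}")
```

Under those assumptions, Yahoo claims an index well over twice the size of Google's, while the survey has Google returning well over twice as many results, which is the mismatch behind the suspicion.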

Where's the proof, Yahoo? Could they be counting duplicate content, CSS files, RSS feeds in multiple formats, and all the other dregs of information that make up the human-useless, "better that it stays invisible" web? All I can say is that I have a couple of sites that have yet to be deep crawled by my friend Slurp, yet G's and MSN's index counts for these same sites are in the thousands.

Sorry Y!, just because a deep crawl is on your "to-do list" doesn't mean your URL crawl list should count towards the index numbers.
