Google Percolator

Tue 28 September 2010 09:00, Mark Farragher

Google rolled out its new search platform, Caffeine, in June, but the company has been tight-lipped about the implementation details. We were told that Caffeine can incrementally update the search index and produce "50% fresher" results, but Google did not go into further detail. That has now changed: two Google search engineers, Daniel Peng and Frank Dabek, have published a paper describing Percolator, the technology that drives Caffeine.

Up until now, Google used a system called MapReduce to build its search index. Many crawlers would gather content from all over the web and feed it into the MapReduce system, which then generated the search index in one large batch operation. Given the size of the index (more than 100 million gigabytes), this process could take many days to complete. The delay between a page being crawled and it showing up in the index used to be about 2-3 days on average, with sites being recrawled every one or two weeks. To be able to index breaking news stories, Google split the index into layers, with the fastest layer being updated every 10 seconds. But the majority of the web sat in the 'slow' layer, which meant we were stuck with a multi-day latency.
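To make the batch approach concrete, here is a minimal sketch of building an inverted index in map/reduce style. It is a toy, not Google's code: the function names (`map_phase`, `reduce_phase`, `rebuild_index`) and the sample pages are made up for illustration. The point is that every rebuild walks the whole crawl, even if only one page changed.

```python
from collections import defaultdict


def map_phase(url, text):
    """Emit one (word, url) pair per distinct word on the page."""
    return [(word, url) for word in set(text.lower().split())]


def reduce_phase(pairs):
    """Group URLs by word to form the inverted index."""
    index = defaultdict(set)
    for word, url in pairs:
        index[word].add(url)
    return {word: sorted(urls) for word, urls in index.items()}


def rebuild_index(crawl):
    # Batch job: every page is reprocessed, even if only one of them changed.
    pairs = [pair for url, text in crawl.items() for pair in map_phase(url, text)]
    return reduce_phase(pairs)


# Hypothetical two-page crawl.
crawl = {"a.example": "google search index", "b.example": "search news"}
index = rebuild_index(crawl)
print(index["search"])  # ['a.example', 'b.example']
```

With two pages this is instant, but the cost grows with the size of the whole crawl rather than with the size of the change, which is where the multi-day latency comes from.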

Percolator changes all that. In the new system pages are still crawled, but any changes are now written directly to Google's distributed database. Small programs called observers are embedded in the database and respond to these changes. When an observer activates, it writes directly into the search index and incrementally updates it to reflect the changed page. This in turn triggers other observers, and after roughly 50 operations the index is up to date.
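The observer idea can be sketched with a small in-memory table. This is only an illustration of the pattern as I understand it, not Percolator's actual API: `ToyTable`, the column names and both observer functions are hypothetical, and a real system would also need transactions and a way to remove stale entries.

```python
from collections import defaultdict, deque


class ToyTable:
    """In-memory stand-in for a distributed table with column observers."""

    def __init__(self):
        self.cells = {}                     # (row, column) -> value
        self.observers = defaultdict(list)  # column -> list of callbacks
        self.pending = deque()              # changed cells awaiting notification
        self.operations = 0                 # number of writes that changed data

    def observe(self, column, callback):
        self.observers[column].append(callback)

    def write(self, row, column, value):
        if self.cells.get((row, column)) == value:
            return  # unchanged data triggers nothing
        self.cells[(row, column)] = value
        self.operations += 1
        self.pending.append((row, column))

    def run(self):
        # Keep notifying observers until no write is left unprocessed.
        while self.pending:
            row, column = self.pending.popleft()
            for callback in self.observers[column]:
                callback(self, row, self.cells[(row, column)])


def parse_page(table, url, html):
    # Hypothetical observer: extract the distinct words of a crawled page.
    table.write(url, "words", tuple(sorted(set(html.lower().split()))))


def update_index(table, url, words):
    # Hypothetical observer: add the page to each word's posting list.
    for word in words:
        postings = set(table.cells.get((word, "postings"), ()))
        postings.add(url)
        table.write(word, "postings", tuple(sorted(postings)))


table = ToyTable()
table.observe("raw_html", parse_page)
table.observe("words", update_index)

# The crawler writes one changed page; the observers do the rest.
table.write("example.com/news", "raw_html", "breaking news story")
table.run()
print(table.cells[("news", "postings")])  # ('example.com/news',)
print(table.operations)                   # 5
```

One crawled page here results in five writes: the page itself, its word list, and three posting lists. Only the rows touched by that page are updated, and nothing else in the table is reprocessed.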

The new system is a massive performance improvement for Google. Pages are now indexed 100 times faster than before, so if my calculations are correct (2-3 days divided by 100) Google can now process a page in less than an hour! And with this reduction in processing time there is now room for a lot more pages in the index. Caffeine processes three times as many pages as the old MapReduce system, and the average age of a page in the index is down by 50%, which should put it at 1-2 days at most.
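For anyone who wants to check my back-of-the-envelope numbers, here they are spelled out. The assumption (mine, not Google's) is that both the 100x speed-up and the 50% age reduction apply to the old 2-3 day average delay mentioned above.

```python
OLD_LATENCY_DAYS = (2, 3)  # old crawl-to-index delay, low and high estimate
SPEEDUP = 100              # "100 times faster"
AGE_REDUCTION = 0.5        # "50% fresher"

# New per-page latency in hours, assuming the speed-up applies to the old delay.
new_latency_hours = tuple(days * 24 / SPEEDUP for days in OLD_LATENCY_DAYS)

# New average page age in days, assuming the same 2-3 day baseline.
new_age_days = tuple(days * (1 - AGE_REDUCTION) for days in OLD_LATENCY_DAYS)

print(new_latency_hours)  # (0.48, 0.72)
print(new_age_days)       # (1.0, 1.5)
```

So the per-page latency lands at roughly half an hour to three quarters of an hour, and the average age at 1 to 1.5 days, which fits the "1-2 days at most" estimate.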

So why is this good news for SEOs? Well, you can now run SEO campaigns around breaking news stories using long-tail keywords and have a much better chance of getting indexed in time. A tight deadline used to be a showstopper for SEO, forcing us to resort to AdWords or viral marketing through social media. With Percolator we can use traditional SEO techniques and still produce good results overnight.

... Or so I hope. What do you think?



Comments (3)

 

  • Obviously Percolator is better for those who use Google search, and hence it makes more sense for white-hat SEO.

    Wed 13 Oct 2010, 01:05


  • See also my blog post on it: http://bigdatacraft.com/archives/240

    Wed 13 Oct 2010, 01:06


  • Yup, you are right. Keyword rankings are coming faster than before, and rankings are being lost at the same speed :) To keep our rankings we need good content and quality backlinks.

    Mon 1 Nov 2010, 14:42

© 2014 Searchcowboys.com - All Rights Reserved - All views and opinions expressed are those of the authors of Searchcowboys.

All trademarks, slogans, text or logo representation used or referred to in this website are the property of their respective owners.