Google Percolator

Tue 28 September 2010 09:00, Mark Farragher

Google rolled out its new search platform, Caffeine, in June, but the company has been a little tight-lipped about the implementation details. We were told that Caffeine can incrementally update the search index and produce "50% fresher" results, but Google did not go into further detail. All of that has changed now that two Google search engineers, Daniel Peng and Frank Dabek, have published a paper describing the technology that drives Caffeine, called Percolator.

Until now, Google used a system called MapReduce to build its search index. Many crawlers gathered content from all over the web and fed it into the MapReduce system, which then generated the search index in one large batch operation. Given the size of the index (more than 100 million gigabytes), this process could take many days to complete. The delay between a page being crawled and it showing up in the index used to be about 2-3 days on average, with sites being recrawled every one or two weeks. To be able to index breaking news stories, Google split the index into layers, with the fastest layer being updated every 10 seconds. But the majority of the web sat in the 'slow' layer, which meant we were stuck with a multi-day latency.

Percolator changes all that. In the new system pages are still crawled, but any changes are now written directly to Google's distributed database. Small programs called observers are embedded in the database and respond to these changes. When an observer activates, it writes directly into the search index and incrementally updates it to reflect the changed page. That write in turn triggers other observers, and after roughly 50 operations the index is up to date.
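To make the observer idea a bit more concrete, here is a minimal sketch in Python. It is not Google's code or API: the `ToyTable` class, the column names and the two observer functions are all made up for illustration, and everything runs in memory on one machine instead of a distributed database. It only shows the principle that a write to one column wakes up an observer, whose own writes wake up the next one.

```python
from collections import defaultdict, deque


class ToyTable:
    """In-memory stand-in (hypothetical) for a table with column observers."""

    def __init__(self):
        self.cells = {}                     # (row, column) -> value
        self.observers = defaultdict(list)  # column -> list of callbacks
        self.pending = deque()              # changed cells awaiting observers

    def observe(self, column, callback):
        self.observers[column].append(callback)

    def write(self, row, column, value):
        # Only a real change queues a notification, so cascades terminate.
        if self.cells.get((row, column)) != value:
            self.cells[(row, column)] = value
            self.pending.append((row, column))

    def run(self):
        """Run observers until no changes remain; return how many ran."""
        runs = 0
        while self.pending:
            row, column = self.pending.popleft()
            for callback in self.observers[column]:
                callback(self, row, self.cells[(row, column)])
                runs += 1
        return runs


def parse_document(table, url, html):
    # Hypothetical observer: extract the distinct lowercase words of a page.
    table.write(url, "words", tuple(sorted(set(html.lower().split()))))


def update_index(table, url, words):
    # Hypothetical observer: add the page to each word's posting list.
    for word in words:
        postings = table.cells.get((word, "postings"), frozenset())
        table.write(word, "postings", postings | {url})


table = ToyTable()
table.observe("raw", parse_document)
table.observe("words", update_index)

# The "crawler" writes one changed page; only that page is reprocessed.
table.write("example.com/news", "raw", "Percolator updates the index")
observer_runs = table.run()
```

Writing the same page content again queues nothing, so a second `table.run()` does no work. A real incremental indexer would also have to remove postings for words that disappeared from a page, which this sketch leaves out.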

The new system is a massive performance improvement for Google. Pages are now indexed 100 times faster than before, so if my calculations are correct Google can now process a page in less than an hour! And with this reduction in processing time there is now room for a lot more pages in the index. Caffeine processes three times as many pages as the old MapReduce system, and the average age of a page in the index is down by 50%, which should put it at 1-2 days at most.
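For anyone who wants to check the "less than an hour" estimate, this is the back-of-the-envelope calculation, using only the 2-3 day latency and the 100x speedup quoted above. The variable names are my own, and it assumes the speedup applies evenly to the whole crawl-to-index delay.

```python
old_latency_days = (2, 3)  # from the article: 2-3 days on average
speedup = 100              # from the article: "100 times faster"

# Convert days to minutes, then divide by the speedup.
new_latency_minutes = tuple(
    days * 24 * 60 / speedup for days in old_latency_days
)

under_an_hour = all(minutes < 60 for minutes in new_latency_minutes)
```

That works out to roughly 29 to 43 minutes per page, so the estimate holds under that assumption.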

So why is this good news for SEOs? Well, you can now run SEO campaigns around breaking news stories using long-tail keywords and have a much better chance of getting indexed in time. A tight deadline used to be a showstopper for SEO, forcing us to resort to AdWords or viral marketing with social media. With Percolator we can use traditional SEO techniques and still produce good results overnight.

... Or so I hope. What do you think?

© 2016 - All Rights Reserved - All views and opinions expressed are those of the authors of Searchcowboys.