My reading of the Percolator architecture: a Google search engine component
In April 2010, Google updated its indexing system. Caffeine, the name of this project, went largely unnoticed by the general public but represents an in-depth change for Google. Unlike Instant Search, it does not directly change the search page; it changes the indexing mechanism, the way relevant search results are produced. For the end user, this change reduces the delay between when a page is found and when it becomes available in Google search. Google has recently published a research paper about Percolator, one of the backend systems that underlie Caffeine. The previous system was described in research papers on MapReduce and the Google File System. These two papers became the foundation for Hadoop, on which I have written some articles, so I was excited to discover this new architecture. After reading the paper, I decided to write this article to give you not just a summary but my own understanding of this new architecture.

