Wednesday, August 12

The Google File System Evolved: Real-Time User Applications

The ACM has an interview with Sean Quinlan on the evolution of the Google File System.

The interview covers the issues the team dealt with as GFS evolved, with an emphasis on the move to a distributed master design:
"Our distributed master system that will provide for 1-MB files is essentially a whole new design. That way, we can aim for something on the order of 100 million files per master. You can also have hundreds of masters."
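To get a feel for what those numbers imply, here is a rough back-of-envelope sketch. It is only an illustration: the choice of 100 masters (the low end of "hundreds") and the assumption that every file is about 1 MB are mine, not figures from the interview, and the function names are made up for this example.

```python
# Back-of-envelope scaling from the figures quoted above.
FILES_PER_MASTER = 100_000_000  # "on the order of 100 million files per master"
MASTERS = 100                   # assumption: low end of "hundreds of masters"
AVG_FILE_BYTES = 1_000_000      # assumption: every file is about 1 MB (decimal)


def namespace_capacity(masters: int, files_per_master: int = FILES_PER_MASTER) -> int:
    """Total number of files addressable across all masters."""
    return masters * files_per_master


def data_capacity_bytes(masters: int, avg_file_bytes: int = AVG_FILE_BYTES) -> int:
    """Total bytes stored if every file has the assumed average size."""
    return namespace_capacity(masters) * avg_file_bytes


if __name__ == "__main__":
    total_files = namespace_capacity(MASTERS)
    total_bytes = data_capacity_bytes(MASTERS)
    print(f"{total_files:,} files across {MASTERS} masters")
    print(f"about {total_bytes / 1e15:.0f} PB at 1 MB per file")
```

Under those assumptions, 100 masters would address about 10 billion files, or roughly 10 PB of 1-MB files. Real deployments would differ, since file sizes vary and the master count is only given as "hundreds".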
Towards the end, Sean discusses how GFS is evolving beyond its batch-oriented design to meet the needs of user-facing, latency-sensitive applications, which often use BigTable to store structured data:
"... engineers at Google have been working for much of the past two years on a new distributed master system designed to take full advantage of BigTable to attack some of those problems that have proved particularly difficult for GFS"
I'm sure the Hadoop and HBase teams will find it interesting reading. I haven't had a chance to read the entire interview in detail because I'm leaving for a week-long vacation on Cape Cod. Don't expect many updates from the beach!
