Originally via Matt Cutts.
Protocol buffers are a flexible, efficient, automated mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages. You can even update your data structure without breaking deployed programs that are compiled against the "old" format...

I'm a big fan of compact and simple serialization formats. We've developed a similar system here at Globalspec for our inter-system communication. If I'm allowed, perhaps I can elaborate more in the future. It's really exciting to take a look at Google's solution to a similar problem. One of the coolest features is that their protocol language generates stubs for multiple languages: C++, Java, and Python.
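To make the quoted description concrete, here's a sketch of what a .proto definition looked like in the proto2 syntax of the time (the message and field names are hypothetical, loosely modeled on Google's own tutorial):

```proto
// Field numbers (= 1, = 2, ...) identify fields on the wire, which is
// what lets programs compiled against the "old" format keep working
// when new optional fields are added later.
message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;
}
```

Running the protocol compiler over this file generates classes in C++, Java, or Python with typed accessors and serialization methods built in.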
As the system evolved, it acquired a number of other features and uses:

- Automatically-generated serialization and deserialization code avoided the need for hand parsing.
- In addition to being used for short-lived RPC (Remote Procedure Call) requests, people started to use protocol buffers as a handy self-describing format for storing data persistently (for example, in Bigtable).
- Server RPC interfaces started to be declared as part of protocol files, with the protocol compiler generating stub classes that users could override with actual implementations of the server's interface.

Protocol buffers are now Google's lingua franca for data – at time of writing, there are 48,162 different message types defined in the Google code tree across 12,183 .proto files. They're used both in RPC systems and for persistent storage of data in a variety of storage systems.
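The server interface declarations mentioned above look roughly like this in a .proto file (again, all names here are hypothetical):

```proto
message SearchRequest {
  required string query = 1;
  optional int32 result_limit = 2 [default = 10];
}

message SearchResponse {
  repeated string result = 1;
}

// The protocol compiler turns this into an abstract SearchService
// stub class; the server subclasses it with a real implementation.
service SearchService {
  rpc Search (SearchRequest) returns (SearchResponse);
}
```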
I wonder if Hadoop/HBase could leverage this as a way to store serialized data in the file system.
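Storing many serialized messages in one file usually just means length-prefixing each record. As a minimal sketch (hand-rolled Python, not the official protobuf library), the length prefix can use the same base-128 "varint" encoding protocol buffers use on the wire:

```python
import io

def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 value bits per byte, high bit set on all but the last."""
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)  # more bytes follow
        else:
            out.append(b)
            return bytes(out)

def decode_varint(stream) -> int:
    """Read one varint from a binary stream."""
    result = shift = 0
    while True:
        b = stream.read(1)
        if not b:
            raise EOFError("truncated varint")
        result |= (b[0] & 0x7F) << shift
        if not (b[0] & 0x80):
            return result
        shift += 7

def write_delimited(stream, payload: bytes) -> None:
    # Prefix each record with its length, so many messages share one file.
    stream.write(encode_varint(len(payload)))
    stream.write(payload)

def read_delimited(stream) -> bytes:
    return stream.read(decode_varint(stream))

# Round-trip two records through an in-memory "file".
buf = io.BytesIO()
write_delimited(buf, b"first record")
write_delimited(buf, b"second record")
buf.seek(0)
```

Small numbers cost one byte, big ones more, which is part of why the format stays compact.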
I can't wait to try it out; this looks incredibly useful. Thank you, Google developers!
Update: I ran across Thrift, which is Facebook's framework for cross-language service communication. Thrift is now being spun off as a new Apache incubator project, Apache Thrift.
Thrift allows you to define data types and service interfaces in a simple definition file. Taking that file as input, the compiler generates code to be used to easily build RPC clients and servers that communicate seamlessly across programming languages.

I don't have time for a more complete comparison, but a few differences:
1) Thrift defines services and the communication layer itself; Google's protocol buffers are primarily a way to serialize data (you can declare service interfaces in .proto files, but no RPC implementation ships with them)
2) Thrift's definition language is C-like; Google's protocol buffer language is a specialized data-definition language with very simple structures