Google Protocol Buffers

Post on 13-Jan-2017

Google Protocol Buffers (Overview)

Sergey Podolsky
sergey.podolsky@gmail.com

History

Binary serialization - sucks
• Language-specific
• Not safe (see “Effective Java”)
• Not extensible

XML, JSON – back to 1999
• Too verbose
• Need to parse
• Slow performance
• Huge size if not compressed
• No strong types (int vs float)
• Need to store field names

Google Protocol Buffers
• Cross-language
• Schema evolution
• Compact
• Strongly typed

Message

Person.json

    {
      "userName": "Martin",
      "favouriteNumber": 1337,
      "interests": ["daydreaming", "hacking"]
    }

Person.proto

    message Person {
      required string user_name = 1;
      optional int64 favourite_number = 2;
      repeated string interests = 3;
    }
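To illustrate why the format is compact, here is a minimal hand-written sketch of the Protocol Buffers wire format for the Person message above. It is not the generated code or the official library; the helper names (encode_varint, encode_key, encode_person) are made up for this example, and only non-negative integers are handled. Each field is written as a key, (field_number << 3) | wire_type, followed by the value, so field names never appear in the output.

```python
import json

WIRE_VARINT = 0            # wire type for int32/int64/bool/enum
WIRE_LENGTH_DELIMITED = 2  # wire type for string/bytes/nested messages


def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, high bit means 'more follows'."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def encode_key(field_number: int, wire_type: int) -> bytes:
    """The field tag from the .proto file plus the wire type, as one varint."""
    return encode_varint((field_number << 3) | wire_type)


def encode_string_field(field_number: int, text: str) -> bytes:
    data = text.encode("utf-8")
    return encode_key(field_number, WIRE_LENGTH_DELIMITED) + encode_varint(len(data)) + data


def encode_int64_field(field_number: int, value: int) -> bytes:
    return encode_key(field_number, WIRE_VARINT) + encode_varint(value)


def encode_person(user_name, favourite_number, interests) -> bytes:
    out = encode_string_field(1, user_name)          # required string user_name = 1
    if favourite_number is not None:                 # optional: omitted when unset
        out += encode_int64_field(2, favourite_number)
    for interest in interests:                       # repeated: one entry per element
        out += encode_string_field(3, interest)
    return out


person_bytes = encode_person("Martin", 1337, ["daydreaming", "hacking"])
person_json = json.dumps(
    {"userName": "Martin", "favouriteNumber": 1337,
     "interests": ["daydreaming", "hacking"]},
    separators=(",", ":"),
)
print(len(person_bytes), "bytes as protobuf")  # 33 bytes as protobuf
print(len(person_json.encode("utf-8")), "bytes as compact JSON")
```

Most of the saving comes from replacing field names with one-byte keys and storing 1337 as a two-byte varint.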

Options
• Outer class
• Default value
• Deprecated
• Speed / Size
• Custom options
• descriptor.proto

Our protobuf use cases:
• Java, C++, C#
• Payload for ZeroC ICE
• IBM MQ / Solace messages
• DB raw data
• Log messages to disk
• Compress using TAR.GZIP
• Show as XML / JSON
• exe utility associated with protobuf files

Disadvantages
• No Map<K, V> / Dictionary<K, V>
• No Set<T>
• No short / int16 / uint16
• No interning
• Generated classes are immutable
• Compiler and library versions are not backwards compatible with each other
• descriptor.proto is not backwards compatible
• Few officially supported languages
• Enum is not extensible (unknown value resets to 0)

Apache Avro
• No tag
• Schema is required
• The entire record is tagged by schema ID
• Fields are matched by name
• No optional values: union { null, long } is used instead
• Resolution rules are used for server vs client schemas
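The "matched by name" and "resolution rules" points can be sketched roughly as follows. This is a simplified illustration, not the Avro library: resolve_record is a hypothetical helper that works on already-decoded dictionaries and ignores aliases and type promotion. Fields the reader does not know are dropped, and fields the writer did not send are filled from the reader's default. The email field is invented for the example.

```python
def resolve_record(writer_record: dict, reader_fields: list) -> dict:
    """Project a record written with one schema onto a reader's field list."""
    resolved = {}
    for field in reader_fields:
        name = field["name"]
        if name in writer_record:
            # Same name on both sides: take the written value.
            resolved[name] = writer_record[name]
        elif "default" in field:
            # Reader-only field: fall back to the reader's default.
            resolved[name] = field["default"]
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    # Writer-only fields are never copied, so they are silently ignored.
    return resolved


# Record as written by an older client.
written = {
    "userName": "Martin",
    "favouriteNumber": 1337,
    "interests": ["daydreaming", "hacking"],
}

# Newer reader schema: favouriteNumber removed, hypothetical email added.
reader_fields = [
    {"name": "userName", "type": "string"},
    {"name": "interests", "type": {"type": "array", "items": "string"}},
    {"name": "email", "type": ["null", "string"], "default": None},
]

resolved = resolve_record(written, reader_fields)
print(resolved)
```

Because matching is by name, field order in the two schemas does not matter, but renaming a field breaks compatibility unless aliases are used.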

Apache Avro

JSON Notation

    {
      "type": "record",
      "name": "Person",
      "fields": [
        {"name": "userName", "type": "string"},
        {"name": "favouriteNumber", "type": ["null", "long"]},
        {"name": "interests", "type": {"type": "array", "items": "string"}}
      ]
    }

IDL

    record Person {
      string userName;
      union { null, long } favouriteNumber;
      array<string> interests;
    }
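Since Avro writes no tags, a record is just its field values concatenated in schema order. The sketch below hand-encodes the Person record under that assumption. It is not the Avro library, and the helper names are made up for illustration. Integers are zigzag-encoded varints, a union is a branch index followed by the value, and an array is a block count, the items, and a terminating zero.

```python
def encode_long(n: int) -> bytes:
    """Avro int/long: zigzag-map the sign, then write a base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # zigzag for values in the signed 64-bit range
    out = bytearray()
    while True:
        low7 = z & 0x7F
        z >>= 7
        if z:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def encode_string(text: str) -> bytes:
    data = text.encode("utf-8")
    return encode_long(len(data)) + data


def encode_optional_long(value) -> bytes:
    """union { null, long }: branch index, then the value (null writes nothing)."""
    if value is None:
        return encode_long(0)
    return encode_long(1) + encode_long(value)


def encode_string_array(items) -> bytes:
    out = b""
    if items:
        out += encode_long(len(items))  # one block holding all items
        for item in items:
            out += encode_string(item)
    return out + encode_long(0)         # a zero count ends the array


def encode_person(user_name, favourite_number, interests) -> bytes:
    # Field order must match the schema; nothing in the bytes names the fields.
    return (
        encode_string(user_name)
        + encode_optional_long(favourite_number)
        + encode_string_array(interests)
    )


avro_bytes = encode_person("Martin", 1337, ["daydreaming", "hacking"])
print(len(avro_bytes), "bytes")  # 32 bytes
```

Without the schema these bytes cannot be interpreted, which is why the schema is required on both sides.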

Apache Thrift
• “one-stop shop”
• RPC framework
• Different serialization formats (“protocols”)

Apache Thrift

    struct Person {
      1: string userName,
      2: optional i64 favouriteNumber,
      3: list<string> interests
    }
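For comparison, here is a rough sketch of how the struct above would look in one of those protocols, the Thrift binary protocol. It is hand-written with the standard struct module rather than the Thrift library, and the helper names are invented for this example. Each field is a one-byte type, a two-byte field id and a fixed-width value, and the struct ends with a stop byte.

```python
import struct

# Type codes used by the Thrift binary protocol.
T_STOP, T_I64, T_STRING, T_LIST = 0, 10, 11, 15


def field_header(field_type: int, field_id: int) -> bytes:
    """One byte of type followed by a big-endian 16-bit field id."""
    return struct.pack(">bh", field_type, field_id)


def binary_string(text: str) -> bytes:
    """Strings carry a fixed 4-byte big-endian length prefix."""
    data = text.encode("utf-8")
    return struct.pack(">i", len(data)) + data


def encode_person(user_name, favourite_number, interests) -> bytes:
    out = field_header(T_STRING, 1) + binary_string(user_name)
    if favourite_number is not None:  # optional field: skipped when unset
        out += field_header(T_I64, 2) + struct.pack(">q", favourite_number)
    # List header: element type, then a 4-byte element count.
    out += field_header(T_LIST, 3) + struct.pack(">bi", T_STRING, len(interests))
    for interest in interests:
        out += binary_string(interest)
    return out + struct.pack(">b", T_STOP)  # end of struct


thrift_bytes = encode_person("Martin", 1337, ["daydreaming", "hacking"])
print(len(thrift_bytes), "bytes")  # 59 bytes
```

The fixed-width integers and length prefixes make this protocol larger than a varint-based encoding; Thrift's compact protocol narrows that gap.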

Comparison: Thrift vs Protobuf

Language Bindings
• Thrift: Java, C++, Python, C#, Cocoa, Erlang, Haskell, OCaml, Perl, PHP, Ruby, Smalltalk
• Protobuf: Java, C++, Python

Primitive Types
• Thrift: bool, byte, 16/32/64-bit integers, double, string, byte sequence, map<t1,t2>, list<t>, set<t>
• Protobuf: bool, 32/64-bit integers, float, double, string, byte sequence, “repeated” properties act like lists

Feature                      Thrift     Protobuf
Enumerations                 Yes        Yes
Constants                    Yes        No
Composite Type               Struct     Message
Exception Handling           Yes        No
Documentation                Lacking    Good
License                      Apache     BSD-style
Compiler                     C++        C++
RPC Interfaces               Yes        Yes
RPC Implementation           Yes        No
Composite Type Extensions    No         Yes
Data Versioning              Yes        Yes

Pros

Thrift
- More languages supported out of the box
- Richer data structures than Protobuf (e.g. Map and Set)
- Includes RPC implementation for services

Protobuf
- Slightly faster than Thrift when using "optimize_for = SPEED"
- Serialized objects slightly smaller than Thrift due to more aggressive data compression
- Better documentation
- API a bit cleaner than Thrift

Cons

Thrift
- Good examples are hard to find
- Missing/incomplete documentation

Protobuf
- .proto can define services, but no RPC implementation is defined (although stubs are generated for you)