Protocol Buffers, Part 2 — The Untold Parts Of Using “Any”
The nice thing about Protobuf is that it is well documented... at least for the most part ;) One of the topics covered in less detail is how to wrap arbitrary messages within an “envelope” message. This post aims to fill that gap.
This is the second post in my series on Protocol Buffers. If you haven’t read the first one yet I recommend you check it out here.
Motivation: DDD Aggregates
For me, the most frequent use case for wrapping messages is that of Aggregates (from Domain-Driven Design). In short, aggregates act as boundaries that ensure the integrity of operations on an entity (along with its sub-entities).
For example, most virtual machine hypervisors won’t let you edit a VM’s spec while its state is in transition. One way to keep the integrity of a VM intact is to manage it within an Aggregate. That is, we funnel all of the commands for a given VM through a single VM Aggregate component. This way there is a single place (per VM) that can verify that the integrity conditions are met before going ahead and performing the operation.
Hence, the aggregate component would need to be able to process different types of messages without knowing in advance “which is which”.
Let’s see an example use case, followed by some code.
Use case: VM service
As mentioned earlier, the use case is that a given service may accept several different types of messages via a single channel. The frame story is the same as in the previous post, so let’s assume you are building a backend for a cloud platform (much like my team does at CloudShare).
So, suppose we want our service to support 3 operations:
- Provision a VM.
- Edit a VM’s spec.
- Stop a VM.
Per the Domain-Driven Design approach, each of the above operations would have a command message and an event message. The commands are funneled via a single service, the VmService, in order to deal with conflicting operations.
Back to wrapping Protobuf messages
So how do we go about parsing Protobuf messages without knowing their type in advance?
Attempt no. 1 — Polymorphic Messages:
Referencing a message of a specific type from another message is a built-in part of Protobuf, but can we have polymorphic references via message inheritance? We cannot: Protobuf has no message inheritance, so this attempt is a dead end.
Attempt no. 2 — Composition:
It turns out there’s a mechanism for packing an arbitrary message inside another “envelope” message — using the Any message type.
The full code example can be found at https://github.com/rotomer/protobuf-blogpost-2. We will focus on the Provision VM flow as an example.
Message Definitions
The ProvisionVmCommand and its enclosing envelope message definitions may look like so:
As we can see, the VmCommandEnvelope has two fields:
- An inner message of type Any.
- A VM id, which enables routing the envelope to the appropriate VM aggregate.
Packing Into Any
Packing a message into an Any message is done by: Any.pack(message)
The resulting Any message is actually quite simple. You can see its message definition here. It is composed of just two fields: an arbitrary serialized message as bytes, along with a type URL that acts as a globally unique identifier to resolve that message's type. In our example the type URL would be: type.googleapis.com/rotomer.simplevm.messages.ProvisionVmCommand.
The pack methods provided by protobuf library will by default use
'type.googleapis.com/full.type.name' as the type URL and the unpack
methods only use the fully qualified type name after the last '/'
in the type URL, for example "foo.bar.com/x/y.z" will yield type
name "y.z". (from the Any message definition)
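The rule quoted above (the type name is whatever follows the last '/') can be sketched in a few lines of plain Java. This is only an illustration of the rule, not the library's own code: the class and method names are made up for this post, and returning the input unchanged when there is no '/' is a choice of this sketch rather than documented library behaviour.

```java
final class TypeUrls {
    private TypeUrls() {}

    /**
     * Returns the part of the type URL after the last '/'.
     * Assumption of this sketch: a URL without any '/' is returned as-is.
     */
    static String typeNameOf(String typeUrl) {
        int lastSlash = typeUrl.lastIndexOf('/');
        return lastSlash < 0 ? typeUrl : typeUrl.substring(lastSlash + 1);
    }

    public static void main(String[] args) {
        // prints "rotomer.simplevm.messages.ProvisionVmCommand"
        System.out.println(typeNameOf(
                "type.googleapis.com/rotomer.simplevm.messages.ProvisionVmCommand"));
        // prints "y.z", matching the example in the quoted comment
        System.out.println(typeNameOf("foo.bar.com/x/y.z"));
    }
}
```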
Sharing Message Definitions
Theoretically, the type URL prefix can be used to specify a schema repository and look up the message schema there. However, in practice, at the time of writing there is no built-in / open source schema repository for Protobuf. Admittedly, Avro has an edge there with Confluent’s schema registry.
What worked for me in practice is to simply share the generated message classes as a binary package. Or as Udi Dahan puts it:
Encoding For Textual Formats
Once packed, we would like to send the serialized message over the wire. If you are using a textual protocol such as HTTP, you need to encode the message appropriately. This example uses AWS SQS as the asynchronous message transport, so the messages must be encoded before they are sent. Simply use Guava’s / Apache Commons’ Base64 encoders to get the job done.
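As a rough sketch of that encoding step, here is a Base64 round trip over raw bytes. Note the swap: it uses the JDK's java.util.Base64 (available since Java 8) instead of the Guava / Apache Commons encoders mentioned above, purely to keep the example dependency-free. The class and method names are hypothetical, and the hard-coded byte array merely stands in for the bytes of a serialized envelope.

```java
import java.util.Arrays;
import java.util.Base64;

final class Base64Transport {
    private Base64Transport() {}

    /** Turns serialized message bytes into text that is safe for a textual transport. */
    static String encodeForTextTransport(byte[] serializedMessage) {
        return Base64.getEncoder().encodeToString(serializedMessage);
    }

    /** Reverses encodeForTextTransport on the receiving side. */
    static byte[] decodeFromTextTransport(String body) {
        return Base64.getDecoder().decode(body);
    }

    public static void main(String[] args) {
        // Stand-in for the serialized envelope; real code would serialize the message here.
        byte[] serialized = {0x0a, 0x03, 0x76, 0x6d, 0x31, (byte) 0xff};

        String body = encodeForTextTransport(serialized);
        byte[] received = decodeFromTextTransport(body);

        System.out.println("body on the wire: " + body);
        System.out.println("round trip intact: " + Arrays.equals(serialized, received));
    }
}
```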
Another option, which will be covered in the next post, is to serialize the protobuf message to JSON instead of binary.
Full Example: Sending Any Wrapped Messages
Unpacking Using The Type URL — It’s Up To You
As you can see in the class below, there’s no magic involved in unpacking Any messages. You must specify the type of the message you wish to unpack to. In other words, it’s up to you to map from the type URL to the appropriate generated message class.
Let’s have a look at one possible implementation for doing just that in the VmMessageUnpacker class:
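Independently of that class, the idea of mapping a type URL to "the thing that knows how to parse it" can be sketched with JDK types only. Everything below is a hypothetical stand-in and not the repository's code: PackedMessage mimics the two fields of Any, and the registered parser decodes a UTF-8 string where real code would parse a generated message class.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class UnpackerRegistry {
    /** Hypothetical stand-in for Any: a type URL plus the serialized bytes. */
    static final class PackedMessage {
        final String typeUrl;
        final byte[] value;

        PackedMessage(String typeUrl, byte[] value) {
            this.typeUrl = typeUrl;
            this.value = value;
        }
    }

    // Maps a fully qualified type name to the function that parses its bytes.
    private final Map<String, Function<byte[], ?>> parsersByTypeName = new HashMap<>();

    <T> void register(String fullTypeName, Function<byte[], T> parser) {
        parsersByTypeName.put(fullTypeName, parser);
    }

    /** Looks up the parser by the type name after the last '/' and applies it. */
    Object unpack(PackedMessage packed) {
        String typeName = packed.typeUrl.substring(packed.typeUrl.lastIndexOf('/') + 1);
        Function<byte[], ?> parser = parsersByTypeName.get(typeName);
        if (parser == null) {
            throw new IllegalArgumentException("No parser registered for type: " + typeName);
        }
        return parser.apply(packed.value);
    }

    public static void main(String[] args) {
        UnpackerRegistry registry = new UnpackerRegistry();
        // Stand-in parser; real code would parse a generated message class here.
        registry.register("rotomer.simplevm.messages.ProvisionVmCommand",
                bytes -> new String(bytes, StandardCharsets.UTF_8));

        PackedMessage packed = new PackedMessage(
                "type.googleapis.com/rotomer.simplevm.messages.ProvisionVmCommand",
                "provision vm-1".getBytes(StandardCharsets.UTF_8));
        System.out.println(registry.unpack(packed));
    }
}
```

With the real library, the registered function would typically delegate to the Java API's any.unpack(SomeMessage.class), optionally guarded by any.is(SomeMessage.class); the registry shape stays the same.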
This is pretty much it for the demo part dealing with Any messages. For completeness of demonstrating the use case, let’s see the VmService and the ProvisionVmOperation classes:
Side note: The VmService in this demo is over-simplified for the sake of focusing on the Protobuf aspects of this post (it is a single instance, and the processing of the messages is performed in-process & synchronously). In real life our services are modelled as aggregates and implemented using Akka actors (The Akka toolkit facilitates concurrency and asynchronous message passing between services). A future blog post will cover that in detail.
The VmService processes an incoming message by decoding it, unpacking the inner Any message and invoking the appropriate handler:
The above example uses Vavr’s pattern matching to dynamically dispatch the appropriate operation based on the command type, but any other implementation will do (map, switch, etc.).
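To illustrate the map-based alternative, here is a minimal sketch that dispatches on the command's runtime class. It is not the Vavr-based code from the repository: ProvisionVm and StopVm are hypothetical stand-ins for generated command classes, and the handlers simply return a string instead of calling a hypervisor.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class CommandDispatcher {
    // Hypothetical stand-ins for generated command classes.
    static final class ProvisionVm {
        final String vmId;
        ProvisionVm(String vmId) { this.vmId = vmId; }
    }

    static final class StopVm {
        final String vmId;
        StopVm(String vmId) { this.vmId = vmId; }
    }

    // One handler per concrete command class.
    private final Map<Class<?>, Function<Object, String>> handlers = new HashMap<>();

    /** Registers a typed handler; the cast is safe because lookup is by exact class. */
    <C> void on(Class<C> commandType, Function<C, String> handler) {
        handlers.put(commandType, command -> handler.apply(commandType.cast(command)));
    }

    /** Invokes the handler registered for the command's exact class. */
    String dispatch(Object command) {
        Function<Object, String> handler = handlers.get(command.getClass());
        if (handler == null) {
            throw new IllegalArgumentException(
                    "Unsupported command: " + command.getClass().getName());
        }
        return handler.apply(command);
    }

    public static void main(String[] args) {
        CommandDispatcher dispatcher = new CommandDispatcher();
        dispatcher.on(ProvisionVm.class, c -> "provisioning " + c.vmId);
        dispatcher.on(StopVm.class, c -> "stopping " + c.vmId);

        System.out.println(dispatcher.dispatch(new ProvisionVm("vm-1"))); // prints "provisioning vm-1"
        System.out.println(dispatcher.dispatch(new StopVm("vm-1")));      // prints "stopping vm-1"
    }
}
```

Keying on the exact class keeps the lookup trivial; the trade-off is that an unregistered command type only fails at runtime, so the error path deserves a test.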
The ProvisionVmOperation processes an incoming message by calling the hypervisor service to provision a VM, creating an event with the result, and sending it back via the response channel.
Pitfalls of Any
Last but not least, it’s worth mentioning some of the pitfalls of using Any:
- As noted in this excellent blog post from the Envoy team, changing the package namespace of a message packed into Any breaks wire compatibility. Note that this case isn’t covered in the official Protobuf guidelines on how to evolve message schema without breaking existing code.
- JSON formatting causes deep deserialization of Any packed messages. This has a profound effect on the use cases of Any: it forces the very first component that deserializes the envelope message to have message type mappings for each of the possible types that can be packed into the Any field. For example, if we were to transform the above code into something more production grade, then we could consider adding a “router” component that would send the incoming message into the appropriate VmAggregate (thus lifting the constraint of a singleton serving all VMs). In this case, the router component would have to import each of the possible generated message classes in order to be able to parse just the envelope part. In summary, when using JSON formatting the Any type is no longer opaque. We’ll cover ways to deal with this case in the next post which deals with serializing Protobuf messages to JSON.
Recap
The Any construct provides an effective way to send arbitrary messages in Protobuf. There is zero magic involved, which leaves room for different implementation options, but there is also very little documentation to help evaluate those options.
I hope this post helped shed some light on how one can go about using Any. Please feel free to comment and share your experience with using Any if you found other techniques to be useful :)
Next post in the series
Be sure to check the next post on Protobuf & JSON formatting.