Blog Post

Avro Schema Versioning and Performance – Part 3

August 3, 2016 Author: Matt Dailey

Version dependency is an important consideration, especially in large distributed systems where it's usually impossible to upgrade all components in unison. This is the third and final in a series of posts about how we use Avro schema versioning in Rocana Ops to help minimize this dependency with minimum performance impact.

In our first post, we talked about how Rocana Ops leverages Apache Avro for data interchange and third-party integrations, and how microbenchmarks can be used to help improve Avro decode performance. In our second post, we provided an overview of JMH, a microbenchmark we use at Rocana. In this final post, we compare two alternate Avro Event decoding approaches using the JMH benchmark to determine which delivers the best runtime performance in Rocana Ops.

Check out part 1 of this series: High Performance Avro Schema Versioning in Rocana Ops, and part 2: Avro Schema Versioning and Performance.

Option 1: Decode VersionedEvent and EventV2 in a single sweep

Recall from our first post, that we were concerned this approach, while straightforward, would be slow due to excessive decode operations. Because EventV2 is wrapped within a VersionedEvent, the original procedure for decoding the Event itself has two steps.

  1. Decode a VersionedEvent that includes a byte array
  2. Decode that byte array as an EventV2

Avro serializes data in the same order that fields are defined in the schema.  This is what allows us to always read the schema version as the first field of an encoded VersionedEvent. Knowing this, the decode process can be improved by “manually” decoding the values in the VersionedEvent.


This diagram shows the different serialized fields of the VersionedEvent.  The left side shows how we logically think of the VersionedEvent wrapping the serialized EventV2. The right side shows that the binary representation simply contains a flattened, encoded EventV2; there are no special nesting semantics.

Hence, the decode procedure becomes:

  1. Decode a long from the VersionedEvent. This is the version.
  2. Decode another long, which is the length of the byte array for the Event itself. This value can be discarded.
  3. Decode the remaining bytes as an EventV2

Option 2: Update the schema to include the version

We could do away with the VersionedEvent in its entirety by changing the Event schema to include a version. This should perform better than Option 1 because it eliminates the need to decode a long, the length of the encoded Event. The question we wanted to answer with microbenchmarks was does this option perform better and if so, by how much?.

Prove it with microbenchmarks

To create a performance baseline, we have the method fromBytes that can decode any version of Event passed to it. The baseline is created by passing a VersionedEvent to fromBytes, which is decoded using the original algorithm. To test Option 1, we created a new decode method named fromBytesUnwrapped to decode VersionedEvents using the new algorithm. To test Option 2, fromBytes was modified to understand when to decode the bytes as an EventV3. With the benchmarking setup from the previous post, the benchmark code itself is very simple.

// Original Benchmark
public Event benchmarkDecodeV2Original() {
 return decoder.fromBytes(eventV2Bytes);

// Option 1
public Event benchmarkDecodeV2Unwrapped() {
 return decoder.fromBytesUnwrapped(eventV2Bytes);

// Option 2
public Event benchmarkDecodeV3() {
 return decoder.fromBytes(eventV3Bytes);

Microbenchmark results


The chart shows that decode time decreases when implementing the two Options. To express the numbers as a performance improvement, Option 1 improved decode performance by 22%, and Option 2 improved decode performance a total of 30%.


With the numbers in hand, we were able to deduce that changing the schema for maximum performance was the right solution. It also set us up to be able to better evolve the schema in the future by including the version as a first-class member of the schema.

However, unlike Option 1, Option 2 changed the format of our Event class by adding the version field.  Additional work, especially testing, was needed to verify the change was backward compatible. Luckily, this burden was mitigated by the fact that Producers and Consumers can evolve at different speeds. Consumers must be upgraded first to understand decoding both old and new schemas, but Producers can continue to produce in the old format as long as is needed. As such, we could more quickly deliver the decode performance improvement in the Consumers.

This change also has the added benefit of simplifying integrations with Rocana Ops. Any client that wants to encode or decode Avro events in a language that we do not provide an API for can more easily write their own encoder and decoder; they do not need to worry about both the Event and VersionedEvent schemas.

Finally, because our EventDecoder needs to handle all older schema versions, we also were able to incorporate the improvement in Option 1 for whenever we need to decode an EventV2 from a VersionedEvent.

Rocana Ops captures and processes many terabytes of data per day. In such a system, improving performance in a core operation like Avro decoding can have massive effects in aggregate. The optimizations discussed in this post ensures we can provide maximum overall performance using the fewest resources at the lowest possible cost.

Learn More...

Learn About Rocana Ops: The Central Nervous System for IT Operations