QuickBuffers - Fast Protocol Buffers without Allocations

QuickBuffers is a Java implementation of Google's Protocol Buffers that has been developed for low latency use cases in zero-allocation environments. It has no external dependencies, and the API follows Protobuf-Java where feasible to simplify migration.

The main highlights are

  • Allocation-free in steady state. All parts of the API are mutable and reusable.
  • No reflection. GraalVM native images and R8/ProGuard obfuscation are supported out of the box
  • Faster encoding and decoding speed
  • Smaller code size than protobuf-javalite
  • Built-in JSON marshalling compliant with the proto3 mapping
  • Improved order for optimized sequential memory access
  • Optional accessors as an opt-in feature (java8)

QuickBuffers passes all proto2 conformance tests and is compatible with all Java versions from 6 through 20 as well as Android. Proto3 messages can be generated and are wire compatible, but the behavioral differences of proto3 have not been implemented so far because some proto3 design decisions have kept us from using it. Current limitations include:

  • Services are not implemented
  • Extensions are embedded directly into the extended message, so support is limited to generation time.
  • Well-known proto3 types such as Timestamp and Duration are not special cased in JSON marshalling
  • Unsigned integer types are JSON encoded as signed integer numbers
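
The last limitation mostly matters for values above the signed maximum. As a minimal sketch that uses only the JDK (Java 8 or higher), and assuming the signed JSON number carries the same bit pattern as the unsigned field, a consumer could recover the unsigned interpretation as shown below. The class and method names are illustrative and not part of QuickBuffers.

```java
final class UnsignedJsonDemo {

    // Reinterprets a signed 32-bit value as the unsigned value with the same bits
    static long toUnsigned32(int signedValue) {
        return Integer.toUnsignedLong(signedValue);
    }

    // Renders a signed 64-bit value as the unsigned decimal string with the same bits
    static String toUnsigned64(long signedValue) {
        return Long.toUnsignedString(signedValue);
    }

    public static void main(String[] args) {
        // The uint32 maximum has the same bit pattern as the int -1
        System.out.println(toUnsigned32(-1));  // 4294967295
        // The uint64 maximum has the same bit pattern as the long -1
        System.out.println(toUnsigned64(-1L)); // 18446744073709551615
    }
}
```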

Getting started

In order to use QuickBuffers you need to generate messages and add the corresponding runtime dependency. The runtime can be found at the Maven coordinates below.

<properties>
  <quickbuf.version>1.2</quickbuf.version>
  <quickbuf.options>indent=4,allocation=lazy,extensions=embedded</quickbuf.options>
</properties>
<dependency>
  <groupId>us.hebi.quickbuf</groupId>
  <artifactId>quickbuf-runtime</artifactId>
  <version>${quickbuf.version}</version>
</dependency>

The message generator protoc-gen-quickbuf is set up as a plugin for the protocol buffers compiler protoc. You can install one of the pre-built packages and run:

protoc-quickbuf --quickbuf_out=${options}:<outputDir> <protoFiles>

or use a protoc-gen-quickbuf-${version}-${arch}.exe plugin binary with an absolute path:

protoc --plugin=protoc-gen-quickbuf=${exePath} --quickbuf_out=${options}:<outputDir> <protoFiles>

or build messages in Maven using the protoc-jar-maven-plugin:

<!-- Downloads protoc w/ plugin and generates messages -->
<!-- Default settings expect .proto files to be in src/main/protobuf -->
<plugin>  
  <groupId>com.github.os72</groupId>
  <artifactId>protoc-jar-maven-plugin</artifactId>
  <version>3.11.4</version>
  <executions>
    <execution>
      <phase>generate-sources</phase>
      <goals>
        <goal>run</goal>
      </goals>
      <configuration>
        <protocVersion>3.21.12</protocVersion>

        <outputTargets>
          <outputTarget>
            <type>quickbuf</type>
            <pluginArtifact>us.hebi.quickbuf:protoc-gen-quickbuf:${quickbuf.version}</pluginArtifact>
            <outputOptions>${quickbuf.options}</outputOptions>
          </outputTarget>
        </outputTargets>

      </configuration>
    </execution>
  </executions>
</plugin>

The generator features several options that can be supplied as a comma-separated list. The default values are marked bold.

| Option | Value | Description |
|---|---|---|
| indent | 2, 4, 8, tab | sets the indentation in generated files |
| replace_package | (pattern)=replacement | replaces the Java package of the generated messages to avoid name collisions with messages generated by --java_out. |
| input_order | quickbuf, number, none | improves decoding performance when parsing messages that were serialized in a known order. number matches protobuf-java, and none disables this optimization (not recommended). |
| output_order | quickbuf, number | number matches protobuf-java serialization to pass conformance tests that require binary equivalence (not recommended). |
| store_unknown_fields | false, true | generates code to retain unknown fields that were encountered during parsing. This allows messages to be routed without losing information, even if the schema is not fully known. Unknown fields are stored in binary form and are ignored in equality checks. |
| enforce_has_checks | false, true | throws an exception when accessing fields that were not set |
| allocation | eager, lazy, lazymsg | changes the allocation strategy for nested types. eager allocates up-front and results in fewer runtime allocations, but it may be wasteful and prohibits recursive type declarations. lazy waits until the field is actually needed. lazymsg acts lazy for nested messages, and eager for everything else. |
| extensions | disabled, embedded | embedded adds extensions from within a single protoc call directly to the extended message. This requires extensions to be known at generation time. Some plugins may do a separate request per file, so it may require an import to combine multiple files. |
| java8_optional | false, true | creates tryGet methods that are short for return hasField() ? Optional.of(getField()) : Optional.empty(). Requires a runtime with Java 8 or higher. |
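
To illustrate what the java8_optional accessors expand to, here is a hand-written stand-in for a message with one optional string field. It is a sketch and not generator output: the class name and the field are hypothetical, and it uses Optional.empty(), which is how java.util.Optional expresses the absent case.

```java
import java.util.Optional;

// Hand-written stand-in for a generated message with one optional string field
final class OptionalAccessorDemo {
    private String name = "";
    private boolean hasName = false;

    OptionalAccessorDemo setName(String value) {
        name = value;
        hasName = true;
        return this;
    }

    OptionalAccessorDemo clearName() {
        name = "";
        hasName = false;
        return this;
    }

    boolean hasName() { return hasName; }

    String getName() { return name; }

    // The has check and the getter combined into a single call
    Optional<String> tryGetName() {
        return hasName() ? Optional.of(getName()) : Optional.empty();
    }

    public static void main(String[] args) {
        OptionalAccessorDemo msg = new OptionalAccessorDemo();
        System.out.println(msg.tryGetName().isPresent()); // false
        msg.setName("quickbuf");
        System.out.println(msg.tryGetName().get());       // quickbuf
    }
}
```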

Reading and writing messages

We tried to keep the public API as close to Google's protobuf-java as possible, so most use cases should require very few changes. The Java related file options are all supported and behave the same way.

// .proto definition
message RootMessage {
  optional string text = 1;
  optional NestedMessage nested_message = 2;
  repeated Person people_list = 3;
}

message NestedMessage {
  optional double value = 1;
}

message Person {
  optional uint32 id = 1;
  optional string name = 2;
}

The main difference is that there are no extra builder classes and that all message contents are mutable. The getMutable() accessors set the has flag and provide access to the nested references.

// Use fluent-style to set values
RootMessage msg = RootMessage.newInstance()
        .setText("Hello World");

// Use getMutable() to set nested messages
msg.getMutableNestedMessage()
        .setValue(1.0);

// Write repeated values into the internally allocated list
RepeatedMessage<Person> people = msg.getMutablePeopleList().reserve(4);
for (int i = 0; i < 4; i++) {
    Person person = people.next()
        .setId(i)
        .setName("person " + i);
}

Messages can be read from a ProtoSource and written to a ProtoSink. newInstance instantiates optimized implementations for accessing contiguous blocks of memory such as byte[] and ByteBuffer. Reads and writes do not modify the ByteBuffer state, so positions and limits need to be set manually if needed.

// Convenience wrappers
byte[] buffer = msg.toByteArray();
RootMessage result = RootMessage.parseFrom(buffer);
assertEquals(result, msg);
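
The note about ByteBuffer state can be illustrated with plain java.nio. The sketch below assumes a writer that uses absolute puts, which leave position and limit untouched. It does not show how QuickBuffers accesses the buffer internally, and writeAbsolute is a hypothetical helper.

```java
import java.nio.ByteBuffer;

final class BufferStateDemo {

    // Writes bytes with absolute puts, which do not move the buffer's position
    static int writeAbsolute(ByteBuffer buffer, byte[] payload) {
        for (int i = 0; i < payload.length; i++) {
            buffer.put(i, payload[i]);
        }
        return payload.length; // number of bytes written
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(16);
        int written = writeAbsolute(buffer, new byte[]{1, 2, 3});

        // Position and limit are unchanged by the write
        System.out.println(buffer.position() + " " + buffer.limit()); // 0 16

        // The caller marks the readable region explicitly
        buffer.position(0).limit(written);
        System.out.println(buffer.remaining()); // 3
    }
}
```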

The internal state can be reset with the setInput and setOutput methods. ProtoMessage::getSerializedSize sets an internally cached size, so it should always be called before serialization if there were any changes.

// Reusable objects
byte[] buffer = new byte[512];
ProtoSink sink = ProtoSink.newArraySink();
ProtoSource source = ProtoSource.newArraySource();

// Stream messages
for (int i = 0; i < 100; i++) {
    int length = msg.getSerializedSize();
    msg.writeTo(sink.setOutput(buffer, 0, length));
    result.clearQuick().mergeFrom(source.setInput(buffer, 0, length));
}
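
Why the size should be recomputed after a change can be seen in a hand-written sketch of the cached-size pattern. This is not QuickBuffers code: the class is hypothetical and models a message with a single uint32 field (field number 1) encoded as a varint.

```java
final class CachedSizeDemo {
    private int value;           // treated as uint32
    private int cachedSize = -1; // only valid after getSerializedSize()

    CachedSizeDemo setValue(int value) {
        this.value = value;
        return this;
    }

    // Number of bytes needed to encode an unsigned 32-bit varint
    static int varintSize(int v) {
        int size = 1;
        while ((v & ~0x7F) != 0) {
            size++;
            v >>>= 7;
        }
        return size;
    }

    // Computes the encoded size (1 tag byte + varint) and caches it
    int getSerializedSize() {
        cachedSize = 1 + varintSize(value);
        return cachedSize;
    }

    int getCachedSize() { return cachedSize; }

    // Writes tag and value, returns the number of bytes written
    int writeTo(byte[] out, int offset) {
        int pos = offset;
        out[pos++] = 0x08; // field 1, wire type 0 (varint)
        int v = value;
        while ((v & ~0x7F) != 0) {
            out[pos++] = (byte) ((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out[pos++] = (byte) v;
        return pos - offset;
    }

    public static void main(String[] args) {
        CachedSizeDemo msg = new CachedSizeDemo().setValue(1);
        System.out.println(msg.getSerializedSize()); // 2
        msg.setValue(300);
        System.out.println(msg.getCachedSize());     // 2 (stale after the change)
        System.out.println(msg.getSerializedSize()); // 3
    }
}
```

Anything that relies on the cached value, such as a length prefix or a buffer bounds check, would be wrong between the setValue call and the next getSerializedSize call.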

There are also (non-optimized) convenience wrappers for InputStream, OutputStream, and ByteBuffer.

ProtoSink.newInstance(new ByteArrayOutputStream());
ProtoSource.newInstance(new ByteArrayInputStream(bytes));

Keep in mind that mutability comes at the cost of thread-safety, so contents should be cloned with ProtoMessage::clone or copied with ProtoMessage::copyFrom before being passed to another thread.

Direct Source/Sink

Depending on platform support for sun.misc.Unsafe, the DirectSource and DirectSink implementations allow working with off-heap memory. This is intended for reducing unnecessary memory copies when working with direct NIO buffers. Besides not needing to copy data, there is no performance benefit compared to working with heap arrays.

// Write to direct buffer
ByteBuffer directBuffer = ByteBuffer.allocateDirect(msg.getSerializedSize());
ProtoSink directSink = ProtoSink.newDirectSink();
msg.writeTo(directSink.setOutput(directBuffer));
directBuffer.limit(directSink.getTotalBytesWritten());

// Read from direct buffer
ProtoSource directSource = ProtoSource.newDirectSource();
RootMessage result = RootMessage.parseFrom(directSource.setInput(directBuffer));
assertEquals(msg, result);

JSON Source/Sink

ProtoMessages also support reading from and writing to JSON as specified in the proto3 mapping.

// Set some contents
RootMessage msg = RootMessage.newInstance();
msg.setText("👍 QuickBuffers \uD83D\uDC4D");
msg.getMutablePeopleList().next()
    .setId(0)
    .setName("First Name");
msg.getMutablePeopleList().next()
    .setId(1)
    .setName("Last Name");

// Print as prettified json
System.out.println(msg);

The default toString method for all messages returns prettified json. The above prints:

{
  "text": "👍 QuickBuffers 👍",
  "peopleList": [{
      "id": 0,
      "name": "First Name"
    }, {
      "id": 1,
      "name": "Last Name"
    }]
}

More fine grained control is exposed via the JsonSink and JsonSource interfaces.

// json options
JsonSink sink = JsonSink.newInstance()
    .setPrettyPrinting(false)
    .setWriteEnumsAsInts(false)
    .setPreserveProtoFieldNames(false);

// use ProtoMessage::writeTo or JsonSink::writeMessage to serialize the contents
msg.writeTo(sink.clear());
RepeatedByte bytes = sink.getBytes();

// use ProtoMessage::parseFrom or JsonSource::parseMessage to parse the contents
RootMessage result = JsonSource.newInstance(bytes)
    .setIgnoreUnknownFields(true)
    .parseMessage(RootMessage.getFactory());

The parts can be combined to convert an incoming protobuf stream to outgoing JSON and vice versa.

msg.clearQuick()
    .mergeFrom(protoSource.setInput(input))
    .writeTo(jsonSink.clear());

The default implementation encodes the minimal representation accepted by the protobuf spec, i.e., floating point numbers do not append a trailing zero, and long integers are encoded without quotes. Alternative implementations based on GSON and Jackson can be found in the quickbuf-compat artifact.

Note that the built-in JsonSink has been optimized quite a bit, but the JsonSource is very barebones due to a lack of an internal use case for JSON decoding.

Building from source

The project can be built with mvn package using JDK 8 through JDK 20.

mvn clean package --projects generator,runtime -am omits building the benchmarks.

Note that the package goal is always required; mvn clean test is not sufficient. This limitation comes from the plugin mechanism of protoc, which exchanges information with plugins via protobuf messages on stdin and stdout. Using stdin makes it comparatively easy to get schema information, but it makes it quite difficult to set up unit tests and debug plugins during development. To enable standard tests, the parser module contains a tiny protoc plugin that stores the raw request from stdin inside a file that can be loaded during testing and development of the actual generator plugin. This makes the generator module dependent on the packaged output of the parser module.
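
The idea of capturing the raw plugin request can be sketched with the JDK alone. The class below is hypothetical and is not the plugin in the parser module. It only shows the mechanism of copying everything that arrives on an input stream into a file and loading it again later, with a ByteArrayInputStream standing in for stdin.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class RequestDumpDemo {

    // Copies the entire stream (e.g. System.in in a real plugin) into the target file
    static void dump(InputStream in, Path target) {
        try {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Loads a previously stored request so that tests can replay it
    static byte[] load(Path source) {
        try {
            return Files.readAllBytes(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("request", ".bin");
        // Stand-in bytes; a real request would be a serialized protobuf message
        dump(new ByteArrayInputStream(new byte[]{1, 2, 3}), file);
        System.out.println(load(file).length); // 3
        Files.delete(file);
    }
}
```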

Detailed accessors for different types

All nested object types such as message or repeated fields have getField() and getMutableField() accessors. Both return the same internal storage object, but getField() should be considered read-only. Once a field is cleared, it should also no longer be modified.

Primitive fields

All primitive fields generate the same accessors and behavior as Protobuf-Java's Builder classes.

// .proto
message SimpleMessage {
    optional int32 primitive_value = 1;
}
// simplified generated code
public final class SimpleMessage {
    public SimpleMessage setPrimitiveValue(int value);
    public SimpleMessage clearPrimitiveValue();
    public boolean hasPrimitiveValue();
    public int getPrimitiveValue();

    private int primitiveValue;
}

Message fields

Nested message types are allocated internally. The recommended way to set nested message content is by accessing the internal store with getMutableNestedMessage(). Setting content using setNestedMessage(NestedMessage.newInstance()) copies the data, but does not change the internal reference.

// .proto
message NestedMessage {
    optional int32 primitive_value = 1;
}
message RootMessage {
    optional NestedMessage nested_message = 1;
}
// simplified generated code
public final class RootMessage {
    public RootMessage setNestedMessage(NestedMessage value); // copies contents to internal message
    public RootMessage clearNestedMessage(); // clears has bit as well as the backing object
    public boolean hasNestedMessage();
    public NestedMessage getNestedMessage(); // internal message -> treat as read-only
    public NestedMessage getMutableNestedMessage(); // internal message -> may be modified until has state is cleared

    private final NestedMessage nestedMessage = NestedMessage.newInstance();
}
// (1) setting nested values via 'set' (does a data copy!)
RootMessage msg = RootMessage.newInstance();
msg.setNestedMessage(NestedMessage.newInstance().setPrimitiveValue(0));

// (2) modify the internal store directly (recommended)
msg.getMutableNestedMessage().setPrimitiveValue(0);
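
The difference between copying and referencing can be made concrete with a small hand-written model. The classes below are hypothetical stand-ins and not generated code. They only mirror the described behavior: set copies the contents into a fixed internal instance, and getMutable sets the has flag and exposes that instance.

```java
final class NestedCopyDemo {

    static final class Nested {
        private int primitiveValue;

        Nested setPrimitiveValue(int v) { primitiveValue = v; return this; }
        int getPrimitiveValue() { return primitiveValue; }
        Nested copyFrom(Nested other) { primitiveValue = other.primitiveValue; return this; }
    }

    static final class Root {
        private final Nested nested = new Nested(); // allocated once, never replaced
        private boolean hasNested;

        // Copies the contents; the internal reference stays the same
        Root setNested(Nested value) {
            nested.copyFrom(value);
            hasNested = true;
            return this;
        }

        // Sets the has flag and exposes the internal store
        Nested getMutableNested() {
            hasNested = true;
            return nested;
        }

        Nested getNested() { return nested; }
        boolean hasNested() { return hasNested; }
    }

    public static void main(String[] args) {
        Root root = new Root();
        Nested external = new Nested().setPrimitiveValue(5);
        root.setNested(external);

        external.setPrimitiveValue(9); // later changes do not reach the root
        System.out.println(root.getNested().getPrimitiveValue()); // 5
        System.out.println(root.getNested() == external);         // false
    }
}
```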

String fields

String fields are stored internally as Utf8String objects that are parsed lazily and can be set from any CharSequence. Since Java String objects are immutable, there are additional access methods that decode the characters into a reusable StringBuilder instance, or that use a custom Utf8Decoder, which can implement interning.

// .proto
message SimpleMessage {
    optional string optional_string = 2;
}
// simplified generated code
public final class SimpleMessage {
    public SimpleMessage setOptionalString(CharSequence value);
    public SimpleMessage clearOptionalString(); // sets length = 0
    public boolean hasOptionalString();
    public String getOptionalString(); // lazily converted string
    public Utf8String getOptionalStringBytes(); // internal representation -> treat as read-only
    public Utf8String getMutableOptionalStringBytes(); // internal representation -> may be modified until has state is cleared

    private final Utf8String optionalString = Utf8String.newEmptyInstance();
}
// Get characters
SimpleMessage msg = SimpleMessage.newInstance().setOptionalString("my-text");

StringBuilder chars = new StringBuilder();
msg.getOptionalStringBytes().getChars(chars); // chars now contains "my-text"
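
The benefit of decoding into a reusable StringBuilder is that no intermediate String is created. The sketch below shows the general technique with a hand-written decoder. It is not the Utf8String implementation, the names are hypothetical, and it assumes well-formed UTF-8 input without doing any validation.

```java
import java.nio.charset.StandardCharsets;

final class ReusableDecodeDemo {

    // Decodes well-formed UTF-8 into the given builder without creating a String
    static void getChars(byte[] utf8, int length, StringBuilder out) {
        out.setLength(0); // keep the builder's capacity, drop the old contents
        int i = 0;
        while (i < length) {
            int b = utf8[i++] & 0xFF;
            if (b < 0x80) {
                // 1 byte: ASCII
                out.append((char) b);
            } else if (b < 0xE0) {
                // 2 bytes
                out.append((char) (((b & 0x1F) << 6) | (utf8[i++] & 0x3F)));
            } else if (b < 0xF0) {
                // 3 bytes
                out.append((char) (((b & 0x0F) << 12)
                        | ((utf8[i++] & 0x3F) << 6)
                        | (utf8[i++] & 0x3F)));
            } else {
                // 4 bytes: supplementary code point, appended as a surrogate pair
                int codePoint = ((b & 0x07) << 18)
                        | ((utf8[i++] & 0x3F) << 12)
                        | ((utf8[i++] & 0x3F) << 6)
                        | (utf8[i++] & 0x3F);
                out.appendCodePoint(codePoint);
            }
        }
    }

    public static void main(String[] args) {
        StringBuilder chars = new StringBuilder(64); // reused across calls
        byte[] bytes = "my-text".getBytes(StandardCharsets.UTF_8);
        getChars(bytes, bytes.length, chars);
        System.out.println(chars); // my-text
    }
}
```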

Repeated fields

Repeated scalar fields work mostly the same as String fields, but the internal array() can be accessed directly if needed. Repeated messages and object types provide a next() method that adds one element and provides a mutable reference to it.

// .proto
message SimpleMessage {
    repeated double repeated_double = 42;
}
// simplified generated code
public final class SimpleMessage {
    public SimpleMessage addRepeatedDouble(double value); // adds one value
    public SimpleMessage addAllRepeatedDouble(double... values); // adds N values
    public SimpleMessage clearRepeatedDouble(); // sets length = 0
    public boolean hasRepeatedDouble();
    public RepeatedDouble getRepeatedDouble(); // internal store -> treat as read-only
    public RepeatedDouble getMutableRepeatedDouble(); // internal store -> may be modified 

    private final RepeatedDouble repeatedDouble = RepeatedDouble.newEmptyInstance();
}
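
The next() pattern for repeated messages can be sketched as a list that keeps its elements after being cleared, so that steady-state use does not allocate. This is an illustrative stand-in and not the RepeatedMessage implementation. The class name is hypothetical, and unlike a real implementation it does not reset the contents of a reused element.

```java
import java.util.Arrays;
import java.util.function.Supplier;

final class ReusableListDemo<T> {
    private Object[] elements = new Object[0];
    private int length = 0;
    private final Supplier<T> factory;

    ReusableListDemo(Supplier<T> factory) {
        this.factory = factory;
    }

    // Grows the backing array so that 'count' more elements fit
    ReusableListDemo<T> reserve(int count) {
        int required = length + count;
        if (required > elements.length) {
            elements = Arrays.copyOf(elements, required);
        }
        return this;
    }

    // Adds one element and returns a mutable reference to it
    @SuppressWarnings("unchecked")
    T next() {
        reserve(1);
        if (elements[length] == null) {
            elements[length] = factory.get(); // allocate only the first time a slot is used
        }
        return (T) elements[length++];
    }

    // Resets the length but keeps the allocated elements for reuse
    ReusableListDemo<T> clear() {
        length = 0;
        return this;
    }

    int length() { return length; }

    public static void main(String[] args) {
        ReusableListDemo<StringBuilder> list = new ReusableListDemo<>(StringBuilder::new);
        StringBuilder first = list.reserve(2).next();
        list.next();
        System.out.println(list.length());               // 2
        System.out.println(list.clear().next() == first); // true
    }
}
```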

Proguard configuration

QuickBuffers does not use reflection, so none of the fields need to be preserved or special cased. However, Proguard may warn about missing methods when obfuscating against an older runtime. This is related to an intentional workaround, so the warnings can be disabled by adding the line below to the proguard.conf file. R8 should automatically pick it up from the bundled config file.

-dontwarn us.hebi.quickbuf.JdkMethods

Acknowledgements

Many internals and large parts of the generated API are based on Protobuf-Java. The encoding of floating point numbers during JSON serialization is based on Schubfach [Giu2020]. Many other JSON parts were inspired by dsl-json, jsoniter, and jsoniter-scala.