BinaryVice is better than term_to_binary/1 at serializing structured Erlang data.

rustyio/BinaryVice

BinaryVice

What is BinaryVice?

BinaryVice allows you to serialize Erlang terms more efficiently than term_to_binary/1, so long as the Erlang terms have a repetitive structure, or "schema."

Consider a simple record defined as follows:

-record(my_record, { n }).

Behind the scenes, Erlang stores the record as a tuple:

{my_record, 5000}

If you call term_to_binary on this tuple, Erlang needs to store information in the binary saying that:

  1. The term is a tuple (1 byte)
  2. The tuple contains two elements (1 byte for a small tuple, 4 for a large tuple)
  3. The first element is an atom (1 byte)
  4. The atom is 9 characters long (2 bytes)
  5. The name of the atom, 'my_record' (9 bytes)
  6. The second element is an integer (1 byte)
  7. The value of the integer (4 bytes)

If you are keeping track, this means we have wasted 15 bytes of space in order to store 4 bytes of data. If your application stores or transmits millions or billions of terms, this adds up.
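To check this accounting on your own system, a minimal sketch along these lines measures the encoded size and separates the 4-byte integer payload from everything else. The module and function names (ttb_overhead, overhead/0) are made up for illustration. The total also counts the leading version byte (131) that term_to_binary/1 puts in front of every term, and newer Erlang/OTP releases may encode the atom slightly more compactly, so the exact total can differ by a byte or so between releases.

```erlang
-module(ttb_overhead).
-export([overhead/0]).

%% Returns {TotalBytes, PayloadBytes, OverheadBytes} for the example tuple.
overhead() ->
    Bin = term_to_binary({my_record, 5000}),
    Total = byte_size(Bin),
    %% 5000 does not fit in a single byte, so it is written as
    %% INTEGER_EXT: a 1-byte tag (98) followed by a 32-bit value.
    %% Those 5 bytes are the last thing in the binary.
    HeadSize = Total - 5,
    <<_Head:HeadSize/binary, 98, 5000:32>> = Bin,
    %% Only the 32-bit value is data we actually care about.
    Payload = 4,
    {Total, Payload, Total - Payload}.
```

Calling ttb_overhead:overhead() returns the three sizes as a tuple; everything except the 4 payload bytes describes structure that is identical for every my_record term.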

How does BinaryVice work?

BinaryVice allows you to specify a schema when you encode an element with placeholders for the information that will change. Continuing with the example above:

% Our term...
Term = #my_record { n = 5000 },

% Our schema. Notice the 'integer@' placeholder.
Schema = #my_record { n=integer@ },

% Encode the term...
B = vice:to_binary(Schema, Term).

The binary produced by vice:to_binary/2 is 6 bytes, compared to the 20 bytes returned by term_to_binary/1. There are placeholders for every Erlang primitive, plus some special ones for encoding a list or dictionary where all items have the same schema.
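The idea behind the savings can be illustrated with a toy encoder. This is only a sketch of schema-driven encoding, not BinaryVice's actual implementation or wire format: the module name schema_sketch, its encode/2 and decode/2 functions, and the fixed 32-bit integer layout are all assumptions made for the example. It handles only tuples, constants, and the integer@ placeholder. Constants in the schema are never written, because the decoder can copy them back from the schema.

```erlang
-module(schema_sketch).
-export([encode/2, decode/2]).

%% encode(Schema, Term) -> binary()
encode(Schema, Term) ->
    iolist_to_binary(enc(Schema, Term)).

%% A placeholder marks a value that varies: store it as 32 signed bits.
enc('integer@', N) when is_integer(N) ->
    <<N:32/signed>>;
%% Tuples are walked element by element, schema and term side by side.
enc(S, T) when is_tuple(S), is_tuple(T), tuple_size(S) =:= tuple_size(T) ->
    [enc(Se, Te) || {Se, Te} <- lists:zip(tuple_to_list(S), tuple_to_list(T))];
%% Anything else in the schema is a constant: it must match the term
%% exactly and contributes no bytes to the output.
enc(Const, Const) when Const =/= 'integer@' ->
    [].

%% decode(Schema, Binary) -> Term. Fails if bytes are left over.
decode(Schema, Bin) ->
    {Term, <<>>} = dec(Schema, Bin),
    Term.

dec('integer@', <<N:32/signed, Rest/binary>>) ->
    {N, Rest};
dec(S, Bin) when is_tuple(S) ->
    %% Thread the remaining bytes through each element of the schema.
    {Elems, Rest} = lists:mapfoldl(fun dec/2, Bin, tuple_to_list(S)),
    {list_to_tuple(Elems), Rest};
dec(Const, Bin) ->
    %% Constants come from the schema, not from the binary.
    {Const, Bin}.
```

With this sketch, schema_sketch:encode({my_record, integer@}, {my_record, 5000}) yields the 4 bytes <<0,0,19,136>>, and decode/2 with the same schema restores {my_record, 5000}.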

Versioning

The one rule about a schema is that it will eventually change. When it does, BinaryVice is ready. BinaryVice allows you to encode your term with a version number. Then, when you want to decode your data, you can pass BinaryVice a list of possible versions, and BinaryVice will choose the right one. The version number can be any integer from 0 to 255 except for 131, because this number is used to identify terms encoded by term_to_binary/1.

For example:

-record(my_record1, { n }).
-record(my_record2, { n, a}).

...

% Schemas...
Schema1 = #my_record1 { n=integer@ },
Schema2 = #my_record2 { n=integer@, a=atom@ },
Schemas = [
	{1, Schema1},
	{2, Schema2}
],

% Encode using version 1...
Term1 = #my_record1 { n = 5000 },
B1 = vice:to_binary_version(1, Schema1, Term1),

% Encode using version 2...
Term2 = #my_record2 { n = 5000, a=version_two },
B2 = vice:to_binary_version(2, Schema2, Term2),

% Decode automatically detects whether our
% term is version 1 or version 2.	

% This returns {1, #my_record1 { n=5000}}.
vice:from_binary_version(Schemas, B1),

% And this returns {2, #my_record2 { n=5000, a=version_two}}.
vice:from_binary_version(Schemas, B2),

...

Drop-In Replacement

BinaryVice was built so that you can drop it into your current application without having to migrate your existing data. The vice:from_binary_version/2 function detects when a binary was encoded using term_to_binary/1 and returns the decoded term with version 131.
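Why 131 is reserved becomes clearer with a small sketch. It assumes the version number is stored as the first byte of the binary, which the 0 to 255 range suggests but this README does not spell out. The module version_sketch and its tag/2 and untag/1 functions are hypothetical and not part of BinaryVice. Every binary produced by term_to_binary/1 starts with the byte 131, so a decoder can recognize legacy data by looking at that first byte.

```erlang
-module(version_sketch).
-export([tag/2, untag/1]).

%% Prefix an already-encoded payload with a one-byte version number.
%% 131 is refused because it would be mistaken for term_to_binary/1 output.
tag(Version, Payload)
  when is_integer(Version), Version >= 0, Version =< 255,
       Version =/= 131, is_binary(Payload) ->
    <<Version:8, Payload/binary>>.

%% Inspect the first byte to decide how the rest should be read.
untag(<<131, _/binary>> = Legacy) ->
    %% Produced by term_to_binary/1: decode it the standard way.
    {legacy, binary_to_term(Legacy)};
untag(<<Version:8, Payload/binary>>) ->
    %% Produced by tag/2: the caller picks the schema for this version.
    {versioned, Version, Payload}.
```

Under these assumptions, data written with term_to_binary/1 before a migration and data written afterwards can sit side by side, and the reader dispatches on the first byte.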

BinaryVice vs. term_to_binary/1

Based on simple tests, BinaryVice makes your data about 40% smaller than term_to_binary(Term), and about 10% smaller than term_to_binary(Term, [compressed]).

BinaryVice is slower than term_to_binary(Term), but about 5 times faster than term_to_binary(Term, [compressed]).

Actual results depend upon your data.
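Since results vary with the data, it is worth measuring the two term_to_binary baselines on a sample of your own terms before comparing them against BinaryVice. The following is a minimal sketch that uses only standard library calls; the module name baseline_sketch and its function names are made up for this example.

```erlang
-module(baseline_sketch).
-export([sizes/1, encode_time_us/2]).

%% Encoded size in bytes as {Plain, Compressed}.
sizes(Term) ->
    {byte_size(term_to_binary(Term)),
     byte_size(term_to_binary(Term, [compressed]))}.

%% Wall-clock microseconds to run Encoder(Term) once.
%% Encoder is any fun of one argument, e.g. fun erlang:term_to_binary/1.
encode_time_us(Encoder, Term) when is_function(Encoder, 1) ->
    {Micros, _Result} = timer:tc(fun() -> Encoder(Term) end),
    Micros.
```

For example, baseline_sketch:sizes(lists:duplicate(1000, {my_record, 5000})) returns a {Plain, Compressed} pair for a highly repetitive list. A single timing run is noisy, so average many runs before relying on the numbers.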

Interface

  • vice:to_binary(Schema, Term) -> B - Encode a term using the provided schema.
  • vice:from_binary(Schema, B) -> Term - Decode a binary using the provided schema.
  • vice:to_binary_version(Version, Schema, Term) -> B - Encode a term using the provided schema, tagged with a version number.
  • vice:from_binary_version(Versions, B) -> {Version, Term} - Decode a versioned binary. Versions is a list of {Version, Schema} tuples.

What's with the name?

It vaguely rhymes with Miami Vice.

Disclaimer

Use this at your own risk, and test thoroughly. There may be some lurking corner cases that haven't been addressed.
