Beads Bracelet — an object serialisation strategy for Beads data structure

Beads Bracelet — an object serialisation strategy for Beads data structureMaxim ZaksBlockedUnblockFollowFollowingJun 1Beads data structure is an append only heterogeneous data structure, which applies bit packing techniques directly on append.

It is meant to be a simple serialisation strategy for a sequence of values.

When a value is appended it is examined and the smallest representation is stored in a buffer.

In order to distinguish heterogeneous values in a buffer, we store a 4 bit tag identifying the value type.

As the tags are just 4 bits small, we store them pairwise in 1 byte.

Adding values 34 and -48 to a beads sequence would look something like this in the buffer:[{u8/i8}, 34, -48]u8 corresponds with an unsigned integer which can be represented in 8bits / 1byte and i8 is a signed integer which is representable in 1byte.

Beads HeaderWhen we serialise the beads sequence, we prepend the buffer with a header.

The header has two reasons for existence:It transport information about the number of tags / values we added to the buffer and the total size of the buffer in bytesIt also indicates if we created this beads sequence with big or little endian byte order.

Both reasons are important for buffer validation and correct value deserialisation.

They also give us the possibility to store values in big or little endian.

Serialising an object means that we add all property values of the object to a beads sequence.

Here is an example of a Dart class which we want to serialise as a Beads Bracelet:class Person with BeadsBracelet { @BeadIndex(0) String userName; @BeadIndex(1) num favouriteNumber; @BeadIndex(2) List<String> interests;}The class has three properties — a string, a number and a list of strings.

This is how the Bead Bracelet buffer will look, if we set user name to be Martin, favourite number to 1337 and the list of interests to daydreaming and hacking.

Beads Bracelet buffer in hex notationBTW the class example, data and buffer representation is inspired by following blog post:Schema evolution in Avro, Protocol Buffers and ThriftPublished by Martin Kleppmann on 05 Dec 2012.

So you have some data that you want to store in a file or send over the…martin.

kleppmann.

comLater in the blog post, we will compare Beads Bracelet with formats mentioned in this blog post, but first let's examine how Beads Bracelet works.

As mentioned before, a beads sequence buffer starts with a header, which holds two values.

Number of bytes and number of tags.

The buffer itself is 35 bytes long, where 2 bytes are header and 33 bytes are the beads sequence itself.

So the first two bytes 21 and 0A are the header.

Next two bytes: 01 and 05 identify that the object itself contains of 5 values.

This information is needed for evolution strategy (we will discuss it in detail later).

Next 7 bytes represent the string Martin — 6C is a tag which says that type is tiny_data and the length is 6.

The following bytes are the string in UTF-8.

This is the value of property userName.

13 is the type tag for two next value beads.

First value bead is of type u16 and second is u8.

The values themselves are 39 05 and 02.

39 05 represent the number 1337, which is the value of favouriteNumber property.

02 is the number 2 which represents the number of elements in the interests property.

Rest of the bytes represent the values daydreaming, hacking and the according tag bytes.

How does backwards and forwards compatibility work in Beads Bracelet?In Beads Bracelet every property needs to be annotated with a beads index.

Every value of a property is serialised in the given index order.

A value can be a simple value, but it also could be a list, a map or even another beads bracelet serialisable object.

In a complex scenario an object can be seen as a sequence of sequences.

After we put all properties in the beads sequence, we prepend it with the number of elements of this sequence.

In our previous example it is 5 elements.

Now why do we do this?When an object is deserialised, we can check if we read all of the elements we expected to read.

If we get a buffer which was serialised with an older version of the object, we might see that we already exceeded the number of elements without setting the new properties.

This is fine, because the previous object had no notion of new properties and you need to set them to null and stop reading from the beads sequence.

If we read values from beads sequence and identify that we already set all our properties, but the number of elements is not exceeded yet, than it means that we got a buffer, which was serialised by a newer version of beads bracelet object and we need to skip through elements, which we can not associate to a property.

Comparing Beads Bracelet to other formatsIf we compare Beads Bracelet format to Protocol Buffer, Avro and Thrift format (mentioned in Martin Kleppmanns blog post) we can see that for given example Beads Bracelet is not the most compact solution.

The object serialised as Beads Bracelet is 35 bytes long where Protocol Buffers is 33 bytes, Avro is 32 bytes and Thrift can be represented in 34 or 59 bytes.

But don’t be sad about it, we can evaluate cases where Beads Bracelet can be a better choice.

Generally Beads Bracelet has following overheads:The header — the size of the header is dependent on the size of the beads buffer.

Beads buffer smaller than 127 (2⁷-1) bytes take up 2 bytes.

Buffer smaller than 2¹⁵-1 bytes take up 4 bytes.

2³¹-1 bytes buffer results in 8 bytes header and everything which is bigger than 2³¹ bytes will result in 16 bytes headerNumber of elements prepending every object — this is an integer number, which is prepended with a type tag so best case scenario 2 bytes, worst case scenario, 9 bytes.

Properties which are set to null or are deprecated will still be part of the sequence and will take up 4 bits — one tag.

The header gives us a possibility for a simple buffer validation and also to encode the values in big or little endian byte order.

But it also comes with a constant price per buffer.

This cost can be visible for very small objects, but for the larger ones, it should be neglectable.

Number of elements prepending every object is a much more complex topic.

If we have an object and its properties are instances of bracelet serialisable objects, we have a tree of objects, where number of elements is present in every parent node.

This is not a single overhead, but a repeated overhead.

It has some positive side effects as well, but for now we are concentrating on buffer size only.

This overhead can be amortised by the fact, that we don’t have to store the 1 byte field tag / index as it is done in Protocol Buffers and Thrift.

In Beads Bracelet we jut need to store the type of the property which is 4 bits.

Meaning that the 2 bytes cost for number of elements can be amortised if the number of non null properties is larger than 4.

This also means that a broad object tree is better than deep object tree for Beads Bracelet.

Another positive effect in Beads Bracelet is the fact that we don’t store integer values as VLQ, but as actual full binary representation of the value.

This means that numbers between 128 and 255 are stored in 1 byte in Beads Bracelet, but will take up 2 bytes in Protocol Buffers, Thrift and Avro.

Similar benefit can be observed with floating numbers.

Beads picks up the smallest tolerable floating number representation and even lets user set a tolerance range manually.

This kind of strategy is not given in other serialisation formats and could lead to significant benefits.

Beads Bracelet is an alternative object serialisation strategy, which is designed around Beads Data Structure.

It is currently implemented only in beads_dart repository and might find its way to other languages.

I hope this blog post was informative.

If you have any questions, don’t hesitate to ask.

.

. More details

Leave a Reply