DRAFT: Prefixed-compact extension: header-checksum

❝An extension for prefixed-compact encoding that introduces a header-checksum-byte, to allow confirming that read bytes are header-bytes.❞
Contents

note This is an initial idea. It has not yet been implemented and may yet be subject to invasive changes.

Prefixed-compact is a prefixed-based binary encoding for some most-elementary data and structure. It is sufficient to provide basic structure without going into full-blown type-definitions, which can get complicated quickly. Prefixed-compact is assumed to be used when integrity of data is guaranteed, as the encoding itself does not account for errors. The encoding by itself is sufficient. There is one significant caveat: there is no way to ensure that byte(s) read as part of a header, are indeed correctly interpreted as header-byte(s). Even if integrity is preserved, there is a risk of misinterpretation in case of processing failures or programming errors.

This extension defines a checksum-byte that covers the header and, by its definition, content-count. (count is either a number of bytes or a number of elements.) This makes it possible to detect processing errors early and abort. It does not provide any information to correct data or skip ahead in the data-stream.

Header-checksum-byte

Prefixed-compact deliberately does not provide error detection and/or correction, because it expects that integrity is preserved. Data must (should) be correct. This extension is supplementary and not necessary, but can be beneficial. The checksum-byte is superfluous, but may provide benefit.

This checksum is not intended as protection against malicious clients. If malicious data-streams are a real risk, it is better to strictly define expected nested structures and allowed data-sizes. Note that, for example, prefixed-compact defines a type for key-value-pairs thus allowing for defining a strict expectation for the value corresponding to a specific key. Due to the nature of prefixed-compact handling of termination and headersize bits, it is also trivial to cut up streams in smaller fragments or combine small continuing fragments into a larger single, contiguous value, in case of resource-constrained environments.

Definition

Add checksum-byte immediately after 1-byte or 2-byte header, before start of content-bytes.

A marker, either 0b01010101 for one-byte header or 0b10101010 for two-byte header, is mixed in for distinction, with distinct markers to further differentiate between checksums for one-byte and two-byte headers.

This definition extends the formalism of prefixed-compact.

Redefine header as header with subsequent checksum-byte.

header = header , checksum ;

under consideration checksums will be fixed values corresponding to any specific 1-byte or 2-byte header. At the moment, I see no harm in this considering its intended purpose.

Changelog

This article will receive updates, if necessary.


This post is part of the Encoding: prefixed-compact series.
Other posts in this series: