DRAFT: Prefixed-compact extension: header-checksum
Tue, Mar 17, 2026 ❝An extension for prefixed-compact encoding that introduces a header-checksum-byte, to allow confirming that read bytes are header-bytes.❞
Note: This is an initial idea. It has not yet been implemented and may still be subject to invasive changes.
Prefixed-compact is a prefix-based binary encoding for the most elementary data and structure. It is sufficient to provide basic structure without going into full-blown type-definitions, which can get complicated quickly. Prefixed-compact is assumed to be used when integrity of data is guaranteed, as the encoding itself does not account for errors. The encoding by itself is sufficient, with one significant caveat: there is no way to ensure that bytes read as part of a header are indeed correctly interpreted as header-bytes. Even if integrity is preserved, there is a risk of misinterpretation in case of processing failures or programming errors.
This extension defines a checksum-byte that covers the header and, by the header's definition, the content-count. (The count is either a number of bytes or a number of elements.) This makes it possible to detect processing errors early and abort. It does not provide any information to correct data or skip ahead in the data-stream.
Header-checksum-byte
Prefixed-compact deliberately does not provide error detection or correction, because it expects that integrity is preserved: data should be correct. This extension is therefore supplementary. The checksum-byte is strictly redundant, but it can be beneficial.
- Checksum-byte, where any value has a 1-in-256 chance of matching by accident:
  - Functions as one-shot confirmation/sanity-check of preceding header-byte(s);
  - Should not be used to search for next header, because a 1-in-256 probability can easily happen by accident;
  - Cannot be used to verify integrity. A different mechanism should be employed to protect or ensure integrity of the data-stream as a whole.
- A checksum-byte essentially covers the (local) path to a subsequent header.
- Under assumption of data-integrity:
  - a checksum-byte is redundant information;
  - a checksum-byte functions as local (single-value) confirmation;
  - a succeeding checksum provides a reasonable, safe checkpoint in a large data-stream;
  - a failing checksum likely indicates that preceding header-byte(s) were misinterpreted, whether due to bad data-stream or (local) processing error;
  - a failing checksum allows aborting early, thus avoiding excess resource allocation due to misinterpretation or processing failure;
- The checksum-byte does not confirm anything about the body, i.e. the payload, apart from count (number of bytes or elements), which is minimal but allows for determining the extent of this one entry.
- The header-checksum is an extension and therefore not an inherent part of prefixed-compact. It should be enabled by convention or through some negotiation-mechanism.
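The 1-in-256 figure can be checked by brute force. The sketch below is only an illustration: it uses the one-byte-header formula from the Definition section, and the function name accidental_matches is made up for this example. It enumerates every possible header byte together with every possible byte that could follow it, and counts how often the following byte happens to equal the checksum.

```python
# Marker for one-byte headers, as given in the Definition section.
MARKER_ONE_BYTE = 0b01010101


def accidental_matches() -> tuple[int, int]:
    """Count (header, next byte) pairs where the next byte equals the checksum.

    Hypothetical helper for illustration only; not part of prefixed-compact.
    """
    matches = 0
    total = 0
    for header in range(256):
        for candidate in range(256):
            total += 1
            if candidate == header ^ MARKER_ONE_BYTE:
                matches += 1
    return matches, total


matches, total = accidental_matches()
print(matches, total, matches / total)  # 256 65536 0.00390625
```

Exactly one candidate byte matches for each header value, so arbitrary data passes the check about 0.4% of the time. That is acceptable for a one-shot sanity check, but it is too frequent to rely on when scanning a stream for the next header.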
This checksum is not intended as protection against malicious clients. If malicious data-streams are a real risk, it is better to strictly define expected nested structures and allowed data-sizes. Note that, for example, prefixed-compact defines a type for key-value-pairs, which allows defining a strict expectation for the value corresponding to a specific key. Due to the way prefixed-compact handles termination and headersize bits, it is also trivial to cut up streams into smaller fragments, or to combine small continuing fragments into a single larger, contiguous value, in resource-constrained environments.
Definition
Add the checksum-byte immediately after the 1-byte or 2-byte header, before the start of the content-bytes.
A marker is mixed in: 0b01010101 for a one-byte header and 0b10101010 for a two-byte header. The distinct markers differentiate checksums for one-byte headers from checksums for two-byte headers.
- One-byte header (checksum): h[0] ^ 0b01010101
- Two-byte header (checksum): h[0] ^ h[1] ^ 0b10101010
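As a minimal sketch of the two formulas above: the function name header_checksum and the constant names are assumptions made for this example, and the header bytes used are arbitrary values, not meaningful prefixed-compact headers.

```python
# Markers as given in the definition above.
MARKER_ONE_BYTE = 0b01010101  # mixed into checksums of one-byte headers
MARKER_TWO_BYTE = 0b10101010  # mixed into checksums of two-byte headers


def header_checksum(header: bytes) -> int:
    """Return the checksum-byte for a 1-byte or 2-byte header.

    Hypothetical helper for illustration; the extension is not yet implemented.
    """
    if len(header) == 1:
        return header[0] ^ MARKER_ONE_BYTE
    if len(header) == 2:
        return header[0] ^ header[1] ^ MARKER_TWO_BYTE
    raise ValueError("header must be 1 or 2 bytes long")


# Arbitrary example bytes, chosen only to show the arithmetic.
print(f"{header_checksum(bytes([0x00])):#04x}")        # 0x55
print(f"{header_checksum(bytes([0x12, 0x34])):#04x}")  # 0x8c
```

XOR keeps the result within a single byte, so the checksum always occupies exactly one byte after the header.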
This definition extends the formalism of prefixed-compact.
Redefine header as header with subsequent checksum-byte.
header = header , checksum ;
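A reader following the redefined header rule might look roughly like the sketch below. It rests on several assumptions: the names read_checked_header and HeaderChecksumError are hypothetical, and the header length is passed in by the caller, because this sketch does not decode the bits with which prefixed-compact signals a one-byte or two-byte header. On a mismatch it aborts before any content is read or allocated.

```python
import io
from typing import BinaryIO

MARKER_ONE_BYTE = 0b01010101
MARKER_TWO_BYTE = 0b10101010


class HeaderChecksumError(Exception):
    """Raised when the byte after a header is not the expected checksum."""


def read_checked_header(stream: BinaryIO, header_len: int) -> bytes:
    """Read header_len header bytes plus one checksum-byte and verify them.

    header_len is an assumption of this sketch: a real reader would derive
    it from the header itself.
    """
    if header_len not in (1, 2):
        raise ValueError("header_len must be 1 or 2")
    header = stream.read(header_len)
    check = stream.read(1)
    if len(header) != header_len or len(check) != 1:
        raise EOFError("stream ended inside header or checksum")
    if header_len == 1:
        expected = header[0] ^ MARKER_ONE_BYTE
    else:
        expected = header[0] ^ header[1] ^ MARKER_TWO_BYTE
    if check[0] != expected:
        # Abort early: the preceding bytes were probably not a header.
        raise HeaderChecksumError(
            f"expected {expected:#04x}, got {check[0]:#04x}"
        )
    return header


# Arbitrary example: header byte 0x00, checksum 0x55, then one content byte.
stream = io.BytesIO(bytes([0x00, 0x55, 0xFF]))
print(read_checked_header(stream, 1))  # b'\x00'
```

After a successful call the stream is positioned at the first content-byte, so existing content handling can continue unchanged.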
Under consideration: checksums will be fixed values corresponding to any specific 1-byte or 2-byte header, because the checksum is computed from the header-bytes alone. At the moment, I see no harm in this, considering its intended purpose.
Changelog
This article will receive updates, if necessary.
- 2026-03-17 Initial draft version.