Features vs. Requirements - Buffering


Buffering is yet another of those challenges that may arise when writing a library or application. It is closely related to I/O, in that you typically encounter it when reading and/or processing large amounts of data. Large amounts of data are not a prerequisite for this issue, but they do emphasize its existence: there is no way around buffering if you need to process more data than fits in memory.

The question of how much memory to use, and how to use it, is dictated by two numbers: the minimum amount required for execution to succeed, and the amount of buffer space sufficient to speed up processing.

Buffered versus unbuffered

The first decision is whether to use a buffer at all. If you are severely memory-constrained and the implementation does not require a buffer in order to function, then you will likely not use one. This works for stream-processing tasks where data can be processed byte-by-byte without needing to look back. Another reason to avoid a buffer is that you acquire data from multiple sources and need data that is absolutely up to date, e.g. from a priority source; anything already stored in a (large) buffer ages, and that may not be desirable. Furthermore, a buffer only pays off if you can actually fill it: if data must be read "by the byte" and you cannot read a few kilobytes ahead before processing starts, the buffer may simply not be useful.

Otherwise, especially if there is a clear reason for buffering, we can reserve an amount of memory for buffering data before, during and/or after processing. There are a number of properties to consider.

Buffers

Memory usage is a trade-off between processing speed and efficient use of memory. Very large buffers can be allocated, and anything done in memory is more efficient than going through I/O, such as persisting data to hard disk or reading it from the network. A small memory footprint may be an explicit requirement; if so, you will be looking very directly at minimizing memory use and falling back to reading from the storage device or network, even if this slows down processing. In a typical situation, though, this is not a critical limitation.

Buffers for processing the main data stream

For the core processing logic, buffers are typically used to simplify code and speed up processing of data streams that are difficult to handle without them. An obvious example is processing XML data: XML contains start and end tags, and you need to keep track of the start and end of each element.

Without memory, you would need to read back to the last start tag you encountered in order to verify that it is indeed the element that is now being closed. Many other examples exist, all based on keeping an amount of data buffered during the processing phase in order to "finish up" a batch of data. Obviously, you do not need to buffer all data from a "batch". It can be limited to some selection that has special significance: structural/control information, such as a stack of tag names as you parse deeper into an XML document.
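The tag-name stack can be sketched with Go's standard `encoding/xml` tokenizer. The `maxDepth` function below is a hypothetical example: instead of buffering the raw document, it keeps only the stack of currently open element names, which is exactly the "structural/control information" described above:

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// maxDepth walks an XML document token-by-token, buffering only a
// stack of open element names rather than the raw input, and returns
// the deepest nesting level seen.
func maxDepth(doc string) (int, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	var stack []string // structural buffer: just the open tag names
	deepest := 0
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return deepest, nil
		}
		if err != nil {
			return deepest, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			stack = append(stack, t.Name.Local)
			if len(stack) > deepest {
				deepest = len(stack)
			}
		case xml.EndElement:
			// The decoder has already verified that this end tag
			// matches the element on top of the stack; we just pop.
			stack = stack[:len(stack)-1]
		}
	}
}

func main() {
	n, _ := maxDepth("<a><b><c/></b><b/></a>")
	fmt.Println(n) // 3
}
```

The memory used grows with the nesting depth of the document, not with its total size.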

Buffers for processing this data may be fixed or dynamic in size. Fixed buffers are far more convenient, as memory usage is very predictable, but they are not always an option.

Buffers for storing/recalling additional data (Cache)

Then there are buffers that store additional data which is used in processing but is not directly part of the data stream being processed. One can think of look-up data, such as codes used in the data stream that correspond to values, texts or translations. This data may actually be rather static in nature and of a fixed size (or it may change, but only at a low rate).

This data is buffered purely to speed up processing, as lookups on disk/network are eliminated. In its simplest form the in-memory data is just loaded once and read from repeatedly. More complex caching solutions work intelligently with usage characteristics to make lookups even more efficient, and smart caching may decrease the amount of required memory by "forgetting" values that are hardly ever requested. This trades reduced memory requirements for extra delays when such values are requested after all.
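The simplest form described above is a map in front of a slow lookup. The sketch below is a hypothetical example (the `lookupCache` type and its hit/miss counters are not from the post); the `fetch` function stands in for a disk or network lookup:

```go
package main

import "fmt"

// lookupCache memoizes an expensive lookup, e.g. a code-to-text
// translation normally fetched from disk or network.
type lookupCache struct {
	fetch        func(code string) string // the slow lookup
	data         map[string]string
	hits, misses int
}

func newLookupCache(fetch func(string) string) *lookupCache {
	return &lookupCache{fetch: fetch, data: make(map[string]string)}
}

// Get returns the cached value, fetching and storing it on first use.
func (c *lookupCache) Get(code string) string {
	if v, ok := c.data[code]; ok {
		c.hits++
		return v
	}
	c.misses++
	v := c.fetch(code)
	c.data[code] = v
	return v
}

func main() {
	// Stand-in for a slow disk/network lookup.
	slow := func(code string) string { return "value-for-" + code }
	c := newLookupCache(slow)
	fmt.Println(c.Get("A"), c.Get("A"), c.Get("B")) // second "A" comes from memory
	fmt.Println(c.hits, c.misses)                   // 1 2
}
```

A smarter cache would add an eviction policy (e.g. least-recently-used) on top of this, "forgetting" rarely requested values to bound memory use.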

Provided buffers

A buffer could also be provided by the user when initializing or calling the processor. If an implementation relies on many small bits of memory, or buffers are frequently created and thrown away, this can be the source of a significant amount of "garbage", i.e. used memory that needs to be cleaned up before it can be reused.

To avoid this "garbage", some implementations let the user provide a buffer. The user allocates an amount of memory for the implementation to use, so that it does not need to do any memory allocations itself. Additionally, the user can then decide the size of the buffer it provides.

User-provided buffers are especially valuable where low latency is a requirement and garbage-collector pauses threaten its feasibility. However, this tends toward the area of micro-optimization, so if there is no good reason to require a provided buffer, not doing so will probably make your implementation, and the use of your library, simpler. This holds especially if there are constraints on the provided buffer, e.g. in terms of size.

Alternatives to buffers

Next there are other types of buffers, such as buffers for the data source and target. If your source and target are unbuffered, that may cause delays, which may or may not be acceptable depending on your application. Depending on your sources and targets (which may not be known at implementation time, or may be dynamic in nature) it may not even be possible to optimize for them. On the other hand, is this something we really should optimize for?

Avoiding a buffer / user-determined “magic”

If the sources and destinations are not known in advance or are dynamic in nature, consider using a generic interface to an (intelligent) input or output component, such as Java’s InputStream or Go’s io.Reader, and leave any buffering and other magic to the implementations provided by the user. Such an implementation could use a finely tuned, memory-optimized buffer leveraging the last of the available memory of a complex system. Or it may just allocate an arbitrarily chosen 2 GB of memory; even so, it is decided by the user and fits the user’s needs. This all assumes, of course, that the requirements of the processor logic itself are met.

Note that by accepting a “reader” interface you defer two choices to the user:

  1. Whether or not to use a buffered implementation in the first place.
  2. What the behaviour of this buffer is.
    An implementation can be made to fit the capabilities of a device. Furthermore, a source implementation could read from multiple buffered and unbuffered sources, possibly prioritized, and a target could write in round-robin fashion to various targets, or publish to multiple targets at once.

By accepting an interface implementation, you give the user control over the buffer, and the buffer itself is encapsulated in the implementation. Consequently it is of no concern to the library implementor, who thus avoids choices that need not be made.

(Re)using existing memory

Another way of avoiding extra buffers is to reuse already available space during processing. Depending on the type of processing work required, additional memory may be needed during processing. By using an interface, and depending on how that interface is defined, it may be possible to (re)use memory from sources or destinations as temporary scratch space before passing on the final result. Go’s io.Reader explicitly allows the implementation to use the provided byte slice as scratch space during the call; the user should only read as many bytes as the count that is returned. So, for a buffer of 50 bytes, only 40 bytes may contain actual relevant information, while the remaining 10 bytes may have been used during processing and now contain obsolete (useless-to-the-user) temporary data.

Memory mapped files

Lastly, there is another approach to buffering: memory mapping, where an area of memory is mapped directly to a file or device. The memory map is managed by the operating system, which loads as much or as little into memory as is feasible given the requirements of the processes currently running. In many cases large amounts of data stay loaded simply because the operating system has the memory available; the operating system may also take memory back if other demand arises. This does not impact your code, however, as you always interact with the memory map itself. This mechanism has relatively little overhead and supports huge amounts of data. The downside is that you access raw data (bytes). Still, this is an interesting approach for huge amounts of data of a very simple form: a buffer transparently backed by disk.

Conclusion

There are a number of options available. Part of the solution is being very conscious of which decisions you really need to make yourself; it is always a win if you can defer decisions (and their implementation) to the other side of the interface boundary. Some best practices, such as generic access to sources/destinations through interfaces instead of direct (byte) buffers, are well established and will probably be supported by the standard library. Knowing about those features allows you to simplify your own implementation in a well-understood manner. Other solutions, such as memory maps and provided buffers, are more specific to your use case and will not be useful in all cases.

If you go for maximum speed with no regard for memory usage, the obvious solution is to buffer everything so you can work from memory as much as possible. As the trade-off becomes more specific and requirements more strict, the solution becomes more complicated and more intelligent behaviour is required.


This post is part of the Features vs Requirements series.