Features vs. Requirements - Conclusions
Sun, Nov 22, 2015 ❝Concluding the Features vs. Requirements series.❞
The common denominator
The hidden gem in all of this is simplicity. Simplicity, not in the sense of providing the user with everything he could possibly need so that he does not have to do anything, but simplicity in providing (and demanding) exactly enough to get the job done - and not a thing more.
“A designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away.” – Antoine de Saint-Exupéry
We’ve seen this in a number of examples, such as the use of a Reader interface to avoid having to decide whether to use a buffer and, if so, what size and behaviour are required. The choice is deferred, as it is not relevant to the implementation. Or the choice to use blocking I/O and let the user decide when - if at all - to spawn a worker thread.
Let’s look at this from another perspective. Any added complexity is there in every use of the implementation. Complexity is transitive, “viral” if you like. Necessary complexity is acceptable, of course. Unnecessary complexity is not.
Furthermore, there is typically a tension between complexity and efficiency. One aims to make an efficient implementation - one that provides the best possible service to its users, given its intended goal. In that arena, efficiency and complexity often pull in opposite directions. One needs to recognize these characteristics in order to make the best-suited trade-off.
In the case of threading, the challenge lies mostly in mastering concurrency and, to a much lesser extent, in parallel execution. In the case of I/O, it lies in keeping code understandable while minimizing delays - the waiting times for I/O operations - and the latency of the implementation in general. And in the case of buffering, it lies in memory management and the speed vs. memory usage trade-off of fast and efficient processing.
Conclusion
Concurrency is an essential part of any program complicated enough that it cannot be implemented as a simple sequential list of instructions. Blocking is very valuable, both for programmers (readability, understandability, a full overview of multiple concurrent threads of execution) and for programs (complexity). Unfortunately, blocking behaviour in itself does not contribute to concurrency at all. The application developer has a role to play.
It does not make sense to make implementations asynchronous and thread-safe by default. Even if individual implementations are thread-safe, it does not follow that their composition is thread-safe. You have already paid the performance and complexity penalties multiple times - once for each individual implementation - yet you still need to implement thread-safety over the composition and pay the corresponding penalties again. Nor are those penalties guaranteed to be cheaper because the individual components are already “safe”. The overall result may even be worse than the non-safe case, since you cannot optimize for safety specifically at the composition level.
For memory usage in buffering, the focus is very much on doing only what is necessary for the processing itself and leaving everything else for the user to provide. Memory allocation can be reduced to the amount technically required to enable processing. Arbitrary decisions on buffering can be deferred to the user - even the choice of whether to buffer at all. At the same time, this gives more control to the user and leaves less responsibility with the library implementation.
A few guidelines:
- Do not start asynchronous execution as a favor to the user. Let the user decide the exact “moment” / level of abstraction at which to handle execution asynchronously. Library methods should simply block.
- Do not provide thread-safe implementations by default. Let the user implement the mechanisms for thread-safety at the right level for his use case. (Unless thread-safety is your primary feature.)
- Do not provide arbitrary interruption triggers. If you do, at least provide a basic “interruptionless” method. If cancellation is feasible, provide a cancellation mechanism, such that the user can implement his own interruption mechanism matching his requirements while leveraging the provided cancellation method.
- Be aware that a property such as asynchronous execution or thread-safety - if it is applied for internal use - should not be exposed to the user.
- Evaluate whether buffering is actually necessary for the processing task being implemented. Additionally, check whether it is actually yours to decide upon. And be perceptive of the features already provided by the programming language or its standard library.
- Make use of interfaces that define the properties a reader requires to make optimal use of the implementation, instead of deciding on arbitrarily-sized internal buffers that are not required for any technical reason (other than that they “speed up processing a bit”). Let the user provide a buffered or unbuffered implementation as the way of deciding on that efficiency trade-off.
- In (rather extreme) cases, provide alternative methods that let the user pass a pre-allocated buffer for use as scratch space, so as to avoid unnecessary memory allocations.
Furthermore, programming languages can help immensely by providing the right primitives. Various locking mechanisms will help at the moment thread-safety becomes a requirement. It should be trivial to write methods for both synchronous and asynchronous execution, so that the user can decide for each particular use case. Support for this is typically incorporated in the language’s standard library or in the programming language (syntax) itself.
“If you need to decide on a solution where it seems like you do not have all the facts for a well-founded decision, then it may not be your decision to make in the first place.” – me
Do not try to decide these requirements for your (future) users. Keep a strict separation between the library’s features and the requirements of a potential (arbitrary) use case. This will help you implement a clean, lean, reusable library without the need to make assumptions. Users of your library may have fundamentally different use cases in mind, so a user’s solution may be very different from anything you could predict.
References
These are some other references that I have not yet referred to in this or the earlier articles.
- Steve Francia - Common mistakes in Go and when to avoid them. At the end of the talk, Steve also points out that you want to design your implementation to be unsafe by default.
- Turning Your Code Inside Out talks about doing too much in a function, resulting in non-reusable code, in a fashion that cripples the caller.
- A good explanation of futures and promises in Scala: Futures and Promises in Scala 2.10
- Martin Fowler - The LMAX Architecture
A good article that describes how far one can go to optimize computational throughput. Note that it does not describe using non-blocking I/O at all. Instead, required services are called upon in a separate thread and the result is treated as a new “event” to handle. Only after the event arrives and is queued up will processing eventually be resumed by the core business logic processor.
And as a last-minute addition, I’d like to add CockroachDB, and more specifically Ben Darnell’s talk “CockroachDB” from Golang UK 2015. He touches upon the fact that, for CockroachDB to be a fast, scalable, distributed SQL database, they rely upon/build on key-value stores rather than “classical” relational databases.
This is another example of how, once you have made a trade-off, you have to live with its costs - although here the example is not at the level of a single library, but at the level of supporting components in a distributed system. They decided to work with key-value stores so that they would not need to pay the cost of a relational database, implementing the missing characteristics themselves at a different level of abstraction.