Prematurely abandoning simplicity

Mon, Nov 1, 2021 ❝Premature optimization, premature generalization, premature expansion, ... appreciating simplicity❞


This post further explores simplicity as we previously defined it. We will look at the engineering concepts of premature optimization and premature abstraction to see how they relate to simplicity.

I was reminded, as so many times before, by quotes and tweets about premature optimization and, more and more often now, premature generalization. People who try to understand how best to approach programming look for some intuition, or even some rules, on dos and don’ts. Interestingly, these quotes perfectly fit two of the three dimensions of simplicity: optimization and generalization. The third dimension, expansion (the opposite of reduced), is less obvious.

First the definition:

simple = unoptimized & specialized & reduced

This article discusses premature optimization and premature generalization and, by the same reasoning, argues that there is likely such a thing as premature expansion. We will discuss these concepts in the domain of computer science and assume some basic knowledge of that domain.

Premature optimization

The most well-known, by far, is premature optimization. Optimization transitions to another level of abstraction, typically a lower one. The transition comes with other assumptions, tools, limits and restrictions. Consequently, the solution is often less readable, less portable, more elaborate, more detailed, less comprehensible (requiring intimate knowledge of this new domain), more restricted, less memory-safe, sensitive to memory ordering, and more prone to calculation errors, buffer overflows and arithmetic overflow or underflow. It is nonetheless workable for a very specific, particular case that happens to be the relevant case.

Optimization is most often applied by compilers and interpreters as a way to automatically enhance implementations. These benefit from having simple expressions as input. This is the (non-premature) optimization that one gets for free. It is automatic, and effortless, basically without sacrifice. (Whichever sacrifices you do make are documented in the programming language or run-time.)

Then there is the case for intentionally choosing to optimize manually, writing the low-level (optimized) code by hand to squeeze out every last bit of benefit. This is the go-to solution for getting maximum performance out of a small hot code-path. Such optimization is often applied to implementations of cryptographic primitives, for example by using processor architecture-specific instructions to take maximum advantage of processor capabilities and to gain maximum control. In the case of cryptography, there is often a need for a careful, well thought-out balance between maximum speed and constant-time operations. This balance gives access to performance while preserving the required security characteristics.
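As a small illustration of the tension between maximum speed and constant-time operation, consider comparing two byte strings. This is only a sketch: the function names are made up for this example, and only `hmac.compare_digest` comes from the Python standard library.

```python
import hmac


def equal_early_exit(a: bytes, b: bytes) -> bool:
    """Stops at the first difference: fast on mismatch, but the running
    time hints at how many leading bytes were correct."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True


def equal_constant_time(a: bytes, b: bytes) -> bool:
    """Delegates to the standard library helper that is designed to
    reduce timing differences between matching and non-matching input."""
    return hmac.compare_digest(a, b)
```

Both functions return the same answers. The difference lies in what the running time may reveal, which is exactly the kind of property that is easy to lose when optimizing for speed alone.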

Optimization is premature if it is not actually needed, or if other options are still available: for example, other types of improvements such as selecting the right algorithm or data structure, or a well thought-out hybrid. In that case, you sacrifice the benefits of unoptimized (simple) code for code that is difficult to comprehend and maintain, written for a narrower use case. This should only be done once other, higher-level options are exhausted.
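To make the "higher-level options first" point concrete, here is a minimal sketch of a duplicate check. The function names are hypothetical, and the second version assumes the items are hashable. Switching the data structure changes the growth rate of the work while the code stays readable, so no low-level tuning is needed yet.

```python
def has_duplicates_pairwise(items):
    """Straightforward version: compares every pair, roughly O(n^2)."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_with_set(items):
    """Same result with a better-suited data structure, roughly O(n).
    Assumes the items are hashable."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False
```

Only if the set-based version is still too slow for the actual workload would it make sense to consider trading readability for lower-level tricks.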

Premature generalization

It seems ideal that any piece of logic can be broadly applied. Generalization allows solutions to apply to a whole class of a certain type of problem. Of course, it seems attractive to generalize everything, but it requires a common denominator which enforces restrictions. The implementation of such a generalization may be lacking, due to the imposed limitations.

If a solution is necessary only to solve one particular, specific problem, then there is no point in generalizing. You preserve the expressiveness and directness that comes from a specialized implementation: it is simpler.
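A minimal sketch of the difference, with names invented for this example: the specialized function reads like the problem it solves, while the generalized one covers a whole class of problems but asks every caller to supply the operation and the starting value.

```python
from functools import reduce


def order_total_cents(prices_cents):
    """Specialized: adds up the prices of one order, nothing more."""
    return sum(prices_cents)


def fold(values, operation, initial):
    """Generalized: combines any values with any binary operation.
    More broadly applicable, but the intent now lives at the call site."""
    return reduce(operation, values, initial)


prices = [199, 250, 1]
specialized_total = order_total_cents(prices)
generalized_total = fold(prices, lambda a, b: a + b, 0)
```

Both calls compute the same total. If summing an order is the only requirement, the generalized form adds indirection without adding value.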

Premature expansion

As mentioned before, two of the three components of simplicity, unoptimized and specialized, are known to be sacrificed on occasion, knowingly, if the trade-off seems worthwhile. So what about the third component: reduced?

Reduced is the property of making the expression as concise as possible: reducing the number of “moving parts”, prescribed rules and dependencies. In terms of math, this would include precomputing values where possible, removing parts that eventually cancel out, removing variables that are unnecessary for the specific use case, removing rules that prescribe restrictions such as an allowed or denied range of values (if irrelevant for this use case), and removing dependencies such as variables required to be equal to, different from, or smaller or larger than one another. All of this results in fewer “moving parts” and less complexity.

The reason for introducing variability is to re-use and to support multiple closely related use cases. Variables allow for the necessary variation, while prescribed rules and dependencies ensure proper use. Introducing these variables and rules expands the original expression, especially if the variables are filled in with values from somewhere else, e.g. specification documents. Given a sufficiently elaborate solution, the expanded form may seem necessary. Reduction may be entirely possible but not obvious.
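The contrast between an expanded and a reduced expression can be sketched with a temperature conversion. The function names and parameters are assumptions made for this illustration only. The expanded form carries variables, a range rule and a dependency between two parameters; the reduced form fills in the values for one use case and drops everything that no longer varies.

```python
def linear_convert(value, scale, offset, minimum, maximum):
    """Expanded: reusable for many conversions, at the cost of extra
    variables, a range rule and a dependency between two parameters."""
    if minimum >= maximum:
        raise ValueError("minimum must be smaller than maximum")
    if not minimum <= value <= maximum:
        raise ValueError("value outside the allowed range")
    return value * scale + offset


def celsius_to_fahrenheit(celsius):
    """Reduced: the variables are filled in for one specific use case,
    and rules that are irrelevant here are gone."""
    return celsius * 1.8 + 32.0


expanded_result = linear_convert(100.0, 1.8, 32.0, -273.15, 1000.0)
reduced_result = celsius_to_fahrenheit(100.0)
```

Both forms give the same result for this input, but only the expanded one can fail on a misconfigured range, and only the expanded one needs its parameters looked up elsewhere.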

Logjam: precomputing to reduce variability

Logjam is the name of a vulnerability in systems that use the Diffie-Hellman key exchange with the same, widely shared constants. It takes advantage of the possibility to reduce a mathematical calculation under the assumption of predefined constants. When one of these predefined constants is chosen, part of the expression can be precomputed. The remaining computation is then sufficiently reduced that, for weaker constants, it is feasible in on-line attacks.

This vulnerability was not originally obvious. These constants are recommended by the specification. There is also the possibility to generate a suitable value yourself. However, many use cases recommend or require use of a prescribed value. The mathematics assumes a variable which may contain many values, but – in these cases – the value can be filled in. That part of the mathematics can be precomputed, thus reducing the expression. This leaves a simpler mathematical problem to solve. One that could reasonably be solved on-the-fly.

Please refer to information on the vulnerability itself for accurate details. The Wikipedia snippet below explains the actual vulnerability better.

Diffie–Hellman key exchange depends for its security on the presumed difficulty of solving the discrete logarithm problem. The authors took advantage of the fact that the number field sieve algorithm, which is generally the most effective method for finding discrete logarithms, consists of four large computational steps, of which the first three depend only on the order of the group G, not on the specific number whose finite log is desired. If the results of the first three steps are precomputed and saved, they can be used to solve any discrete log problem for that prime group in relatively short time. This vulnerability was known as early as 1992. It turns out that much Internet traffic only uses one of a handful of groups that are of order 1024 bits or less.

– Wikipedia: Logjam (computer security)
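The idea of paying once per group and then solving individual problems cheaply can be sketched at toy scale. Note the assumptions: this uses the much simpler baby-step giant-step method, not the number field sieve that the actual attack relies on, and the prime and generator below are small values picked for illustration. The sketch only shows the split between work that depends on the fixed group and work that depends on the specific target.

```python
from math import isqrt


def precompute_baby_steps(p, g):
    """One-time work that depends only on the fixed group (p, g)."""
    m = isqrt(p - 1) + 1              # table size, about sqrt(p)
    table = {}
    value = 1
    for j in range(m):
        table.setdefault(value, j)    # keep the smallest j with g^j == value
        value = value * g % p
    giant = pow(g, -m, p)             # g^(-m) mod p, needs Python 3.8+
    return table, m, giant


def discrete_log(h, p, table, m, giant):
    """Per-target work: find x with g^x == h (mod p) using the saved table.
    Returns None if h is not a power of g."""
    gamma = h % p
    for i in range(m):
        if gamma in table:
            return i * m + table[gamma]
        gamma = gamma * giant % p
    return None


# Toy parameters, chosen for illustration only.
P, G = 104729, 2
TABLE, M, GIANT = precompute_baby_steps(P, G)   # done once

secret = 12345
public = pow(G, secret, P)
recovered = discrete_log(public, P, TABLE, M, GIANT)  # done per target
```

With the table saved, each additional target for the same `P` and `G` costs only the lookup loop, which mirrors how a fixed, prescribed constant turns an expensive problem into a reduced one.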

Abstraction

Premature abstraction is another term that is used. Abstraction can refer to either of two things. One is generalization, i.e. working from a more general case. The other is what we here call “expansion”: the increased variability and/or addition of prescribed rules and dependencies. Both extend usage, either through broader application or repeated use. The term premature abstraction, therefore, seems to capture both premature generalization and premature expansion.

Consequences of premature complexity

The simple form is the most convenient form available: readable, comprehensible, functional and straightforward, but neither optimized, generalized nor expanded.

Going to any of the extremes means trading off the default conveniences and benefits for narrowly focused improvements. This choice adds to the complexity, so the trade-off must be worth it. By adding complexity, you essentially make your current use case one among many, adding other concerns which may not be relevant.

Optimization: may drop edge and/or corner cases, may add (boundary) conditions, such as input restrictions. Makes rework and fine-tuning more difficult.
Generalization: broadens applicability at the (potential) loss of a finely tuned, specialized implementation.
Expansion: adds variations or prescribed rules and dependencies, making the specific case just one of many.

Conclusion

Simplicity is decided by all three dimensions. Moving away from simplicity, or stopping prior to reaching it, in any dimension makes things more complex. A trade-off is made, where you decide whether the benefits of one dimension outweigh the sacrifices required. Whether you optimize, generalize or expand, you add complexity, meaning you sacrifice some benefits of simplicity.

This post is part of the Simplicity in engineering series.
Other posts in this series:

Timelessness