Java signed integer types and alternative representations

Java offers several notations for writing an integer literal. The decimal notation (e.g. 13) is by far the best known. The binary (e.g. 0b1101, available since Java 7), octal (e.g. 015) and hexadecimal (e.g. 0xD) notations can be used as alternatives to express the same literal value.
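As a quick check that these notations really denote the same value, here is a minimal sketch. The class and constant names are made up for illustration.

```java
class LiteralNotations {
    // All four literals denote the same int value, 13.
    static final int DECIMAL = 13;
    static final int BINARY = 0b1101; // binary literals need Java 7 or later
    static final int OCTAL = 015;     // a leading zero means octal
    static final int HEX = 0xD;

    static boolean allEqual() {
        return DECIMAL == BINARY && BINARY == OCTAL && OCTAL == HEX;
    }

    public static void main(String[] args) {
        System.out.println(allEqual());                      // true
        System.out.println(Integer.toBinaryString(DECIMAL)); // 1101
        System.out.println(Integer.toOctalString(DECIMAL));  // 15
        System.out.println(Integer.toHexString(DECIMAL));    // d
    }
}
```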

Now, when you use and reason about decimal values, Java behaves exactly as expected. Types are signed, i.e. both positive and negative values can be stored, 0 is a valid value and you can compare values as you would expect. Things get less intuitive when you use the other notations. Binary, octal and hexadecimal literals are usually read as raw bit patterns, without any notion of a sign. Hexadecimal is often used as a convenient shorthand for writing such bit patterns. For example, 0xff is the bit pattern 0b11111111 (255), in which the first f represents the upper 4 bits and the second f represents the lower 4 bits. This notation is especially useful when you are doing bitwise operations, implementing low-level protocols or writing data directly to the wire.
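To illustrate how each hexadecimal digit maps onto 4 bits, here is a small sketch. The helper names and the example value 0xa7 are assumptions chosen for illustration.

```java
class NibbleDemo {
    // Upper 4 bits of the low byte of b.
    static int upperNibble(int b) {
        return (b >> 4) & 0xf;
    }

    // Lower 4 bits of b.
    static int lowerNibble(int b) {
        return b & 0xf;
    }

    public static void main(String[] args) {
        int value = 0xa7; // hypothetical example byte: 1010 0111
        System.out.println(Integer.toBinaryString(value));           // 10100111
        System.out.println(Integer.toHexString(upperNibble(value))); // a
        System.out.println(Integer.toHexString(lowerNibble(value))); // 7
        System.out.println(0xff == 0b11111111 && 0xff == 255);       // true
    }
}
```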

However, even though this notation differs from the decimal one, you do need to take into account that Java still interprets these literals as signed numbers, i.e. with the highest bit acting as the sign bit that indicates whether the value is positive or negative. (See Two’s complement for details on how integers are stored in memory.) This extends to number comparisons, even when two hexadecimal literals are compared with each other.

Consider the following comparisons:
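A minimal sketch of such comparisons for the 32-bit int type follows. The specific literals, the class name and the lessThan helper are chosen here for illustration.

```java
class SignedComparisons {
    // Plain signed comparison, exactly what the < operator does on ints.
    static boolean lessThan(int a, int b) {
        return a < b;
    }

    public static void main(String[] args) {
        // Below the flip-over point everything behaves as expected.
        System.out.println(lessThan(0x7ffffffe, 0x7fffffff));  // true

        // Crossing the point where the highest bit comes into use.
        System.out.println(lessThan(0x7fffffff, 0x80000000));  // false
        System.out.println(0x80000000 == Integer.MIN_VALUE);   // true

        // All bits set reads as -1, which is smaller than 1.
        System.out.println(0xffffffff);                        // -1
        System.out.println(lessThan(0xffffffff, 0x00000001));  // true

        // The same comparison written in decimal looks unsurprising.
        System.out.println(lessThan(2147483647, -2147483648)); // false
    }
}
```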

You will see that in the decimal representation this makes sense. The binary, octal and hexadecimal notations carry no visible sign, and as such they have a “flip-over” point where the highest bit starts to be used and comparisons stop matching intuition. For an int, that point lies between 0x7fffffff (Integer.MAX_VALUE) and 0x80000000 (Integer.MIN_VALUE).

Furthermore, because Java interprets values as signed, certain conversions produce surprising results even though the values technically still fit in memory.

A “safe” integer (sign bit untouched) value will convert as expected:
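A minimal sketch of such a round trip from long to int and back, assuming 0x7fffffffL as the example value. The class and method names are illustrative.

```java
class SafeRoundTrip {
    // Narrow to int, then widen back to long.
    static long roundTrip(long value) {
        return (long) (int) value;
    }

    public static void main(String[] args) {
        long original = 0x7fffffffL; // highest bit of the lower 32 bits is not set
        long result = roundTrip(original);
        System.out.println(Long.toHexString(result)); // 7fffffff
        System.out.println(original == result);       // true
    }
}
```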

… while a value that depends on the sign bit will get converted incorrectly:
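A sketch of the same round trip with the value 0xffffffffL, which uses the bit that an int treats as its sign bit. Again, the class and method names are illustrative.

```java
class SignBitRoundTrip {
    // Narrow to int, then widen back to long.
    static long roundTrip(long value) {
        return (long) (int) value;
    }

    public static void main(String[] args) {
        long original = 0xffffffffL;   // 4294967295, fits easily in a long
        int narrowed = (int) original; // lower 32 bits, all set
        long widened = narrowed;       // implicit widening, sign-extended

        System.out.println(narrowed);                  // -1
        System.out.println(widened);                   // -1
        System.out.println(Long.toHexString(widened)); // ffffffffffffffff
        System.out.println(original == widened);       // false
    }
}
```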

So, what happened in the last case? The value 0xffffffffL, a value of type long, was cast to int. This narrowing conversion keeps only the lower 32 bits, all of which are set, and as a signed int that bit pattern is the value -1. Then we cast back to long. This time we cast to a larger type, a widening conversion that always preserves the numeric value. Hence, it converts the int value -1 to the value -1 as expressed in type long. To do so, Java extends the sign bit into the upper 32 bits, so the result is 0xffffffffffffffffL rather than the 0x00000000ffffffffL we started with, since type long is 64 bits wide instead of int’s 32 bits.

To be clear, none of this behaviour is erroneous. Java provides signed integer types and behaves accordingly. It is merely good practice to be aware of this behaviour when using alternative notations such as hexadecimal. This applies to hexadecimal in particular, as this format is often used in protocol specifications for (low-level) control values.

For completeness, let’s look at what the Java Language Specification says about this:

A narrowing conversion of a signed integer to an integral type T simply discards all but the n lowest order bits, where n is the number of bits used to represent type T. In addition to a possible loss of information about the magnitude of the numeric value, this may cause the sign of the resulting value to differ from the sign of the input value.
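The rule quoted above can be observed with a few casts. This is a small sketch with example values picked for illustration, not taken from the specification.

```java
class NarrowingDemo {
    // Keeps only the 8 lowest-order bits.
    static byte toByte(int v) {
        return (byte) v;
    }

    // Keeps only the 16 lowest-order bits.
    static short toShort(int v) {
        return (short) v;
    }

    public static void main(String[] args) {
        System.out.println(toByte(0x17f));     // 127: low 8 bits are 0x7f
        System.out.println(toByte(0x180));     // -128: low 8 bits are 0x80, sign bit set
        System.out.println(toShort(0x18000));  // -32768: low 16 bits are 0x8000
        System.out.println((int) 0x1ffffffffL); // -1: low 32 bits are all set
    }
}
```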

Java does not provide unsigned variants of its integer types byte, short, int and long. In general, the advice is to use a larger type which fits the whole value in the non-negative range of that type. In my particular case I needed a type that can store a 32-bit value as defined in the specification. These values are used as identifiers and as such only equality is relevant. Comparisons such as x < y have no meaning for this specific purpose, and we know that a signed int still has all 32 bits available for use.
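A minimal sketch of this idea, assuming a hypothetical identifier value 0xdeadbeef and an illustrative sameId helper. Integer.toUnsignedString requires Java 8 or later.

```java
class Identifier32 {
    // Equality depends only on the 32-bit pattern, not on how it is printed.
    static boolean sameId(int a, int b) {
        return a == b;
    }

    public static void main(String[] args) {
        int id = 0xdeadbeef; // hypothetical identifier with the highest bit set

        System.out.println(sameId(id, 0xdeadbeef));        // true
        System.out.println(id);                            // -559038737
        System.out.println(Integer.toHexString(id));       // deadbeef
        System.out.println(Integer.toUnsignedString(id));  // 3735928559 (Java 8+)
    }
}
```

Only the signed decimal rendering looks odd; the hexadecimal rendering shows the bits exactly as they were written.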

The only downside to this approach is that a value with the highest bit in use is shown as a negative number, but that’s alright. Also note that we are not abusing the type. We store a 32-bit value in a type with a size of 32 bits. What causes the confusion is its decimal representation, which is irrelevant for this use case. On the other hand, if I used a larger type, I would have to check that values do not pass the 32-bit boundary, as we should only use the lower 32 bits of the larger type. Then, later on, I would have to extract the lower 32 bits from the variable in order to get exactly the 32 bits defined in the specification.
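For comparison, this is roughly the bookkeeping the larger-type alternative would need. It is a sketch only: the class, constant and method names are hypothetical, and Integer.toUnsignedLong requires Java 8 or later.

```java
class WideIdentifier {
    static final long MASK_32 = 0xffffffffL;

    // True if no bit above the lower 32 bits is set.
    static boolean fitsIn32Bits(long value) {
        return (value & ~MASK_32) == 0;
    }

    // Extract the lower 32 bits as an int bit pattern.
    static int low32(long value) {
        return (int) (value & MASK_32);
    }

    public static void main(String[] args) {
        System.out.println(fitsIn32Bits(0xffffffffL));   // true
        System.out.println(fitsIn32Bits(0x100000000L));  // false
        System.out.println(fitsIn32Bits(-1L));           // false: upper bits are set

        int id = low32(0x1deadbeefL);                    // bit 32 is dropped
        System.out.println(Integer.toHexString(id));     // deadbeef

        // Going the other way without sign extension (Java 8+).
        System.out.println(Integer.toUnsignedLong(id));  // 3735928559
    }
}
```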

For my particular use case, this approach works nicely. YMMV.