Numeric Data
Representations
- unsigned integers
- 8-bit, 16-bit, 32-bit, 64-bit, others (implemented in hardware)
- unlimited length (implemented in software)
- signed integers (two's complement)
- positive integers: high bit 0
- zero: all bits 0
- negative integers: high bit 1
- negation: flip all bits, add 1
- BCD (binary coded decimal)
- 1 decimal digit (0000 through 1001) in 4 bits
- + (1100) or - (1101) sign in last 4 bits
- variations exist
- IEEE 754-2019 floating point
- value = $(-1)^{\text{sign}} \times \text{significand} \times \text{base}^{\text{exponent}}$
- binary16 (half): 1 sign bit, 5 exponent bits, 10 significand bits
- binary32 (single): 1 sign bit, 8 exponent bits, 23 significand bits
- binary64 (double): 1 sign bit, 11 exponent bits, 52 significand bits
- binary128 (quadruple): 1 sign bit, 15 exponent bits, 112 significand bits
- binary256 (octuple): 1 sign bit, 19 exponent bits, 236 significand bits
- also decimal32, decimal64, decimal128
- NaN (not a number): all exponent bits 1, not all significand bits 0
- infinity (too large): all exponent bits 1, all significand bits 0
- subnormal (too close to zero): all exponent bits 0, not all significand bits 0
- +0 and -0 have different bit patterns (only the sign bit differs) but compare equal
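
The representations above can be explored with a short Python sketch. It is illustrative only: the function names (twos_complement_negate, packed_bcd, float32_fields, classify) are hypothetical helpers invented here, the 8-bit width is an arbitrary choice, and the BCD encoder ignores byte padding and other variations. The binary32 layout is read through the standard struct module.

```python
import struct

def twos_complement_negate(value, bits=8):
    """Negate a fixed-width bit pattern: flip all bits, then add 1."""
    mask = (1 << bits) - 1
    return ((value ^ mask) + 1) & mask

def packed_bcd(n):
    """Encode an integer as BCD: one decimal digit per 4 bits, sign nibble last."""
    nibbles = [int(d) for d in str(abs(n))]
    nibbles.append(0b1101 if n < 0 else 0b1100)  # 1101 is -, 1100 is +
    result = 0
    for nibble in nibbles:
        result = (result << 4) | nibble
    return result

def float32_fields(x):
    """Round x to binary32 and return its (sign, exponent, significand) bit fields."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    sign = raw >> 31                 # 1 sign bit
    exponent = (raw >> 23) & 0xFF    # 8 exponent bits
    significand = raw & 0x7FFFFF     # 23 stored significand bits
    return sign, exponent, significand

def classify(x):
    """Name the special case, if any, that the binary32 encoding of x falls into."""
    _, exponent, significand = float32_fields(x)
    if exponent == 0xFF:             # all exponent bits 1
        return "NaN" if significand else "infinity"
    if exponent == 0:                # all exponent bits 0
        return "subnormal" if significand else "zero"
    return "normal"
```

For example, twos_complement_negate(5) gives 0b11111011, packed_bcd(-123) gives 0x123D, and float32_fields(-0.0) gives (1, 0, 0) even though -0.0 == 0.0 is true.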
Errors
- integer overflow
- result bits that do not fit in the fixed width are discarded, typically with no warning
- floating point roundoff and truncation
- many values cannot be represented exactly with a finite number of bits
- errors can accumulate in repeated calculations with no warning
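
Both kinds of error can be reproduced in a few lines of Python. This is a sketch under stated assumptions: Python integers have unlimited length, so the 8-bit overflow is emulated by masking in a hypothetical helper add_uint8, and naive_sum is likewise a made-up name. math.fsum is the standard library's compensated summation, shown only for comparison.

```python
import math

def add_uint8(a, b):
    """Add as an 8-bit unsigned register would: keep only the low 8 bits."""
    return (a + b) & 0xFF

def naive_sum(values):
    """Add floats one at a time; each addition may round the running total."""
    total = 0.0
    for v in values:
        total += v
    return total

# Integer overflow: 200 + 100 = 300 needs 9 bits, so the top bit is lost.
wrapped = add_uint8(200, 100)

# Roundoff: 0.1 has no exact binary representation, so small errors build up.
tenths = [0.1] * 10
naive = naive_sum(tenths)          # slightly less than 1.0
compensated = math.fsum(tenths)    # tracks the lost low-order bits

print(wrapped, naive, compensated, 0.1 + 0.2 == 0.3)
```

Neither the wrapped sum nor the inexact total raises an exception, which is why such errors tend to go unnoticed unless results are checked explicitly.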