compact-u16 (shortvec)

The 1-to-3-byte variable-length integer that prefixes every array in a Solana transaction. Seven value bits per byte plus a continuation flag; misreading it is a common cause of offset drift in hand-written transaction parsers.

Transaction encoding concept

What it is

compact-u16 (also called shortvec) is a variable-length encoding for the length prefix of every array inside a Solana transaction — the signature count, account-key count, instruction count, and the per-instruction account and data lengths. It encodes a value from 0 to 65,535 in 1 to 3 bytes.

Why it exists

Transactions are size-constrained (1,232 bytes on the wire), so spending a fixed 2 or 4 bytes on every array length is wasteful when most arrays are short. compact-u16 spends a single byte for lengths under 128 — which covers the overwhelming majority of real transactions — and only grows when it has to.

Byte layout

Each byte carries 7 bits of value in its low bits, least-significant group first; the high bit (0x80) is a continuation flag meaning “another byte follows.”

Bytes used	Value range	Encoding
1	0 – 127	`0vvvvvvv` — high bit clear, value in low 7 bits.
2	128 – 16,383	`1vvvvvvv 0vvvvvvv` — first byte’s high bit set; next 7 bits in the second byte.
3	16,384 – 65,535	`1vvvvvvv 1vvvvvvv 000000vv` — third byte uses only its low 2 bits.
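
The layout in the table can be sketched as a small encoder. This is an illustrative Python version, not code from any Solana library; the function name `encode_compact_u16` is made up for this example.

```python
def encode_compact_u16(value: int) -> bytes:
    """Encode an integer in 0..=65535 as 1 to 3 compact-u16 bytes."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError(f"value out of u16 range: {value}")
    out = bytearray()
    while True:
        low7 = value & 0x7F      # next 7 value bits, least significant first
        value >>= 7
        if value == 0:
            out.append(low7)     # high bit clear: this is the last byte
            return bytes(out)
        out.append(low7 | 0x80)  # high bit set: another byte follows


# Print the encoding at each boundary of the table above.
for n in (0, 127, 128, 16_383, 16_384, 65_535):
    print(n, encode_compact_u16(n).hex(" "))
```

The boundary values produce 00, 7f, 80 01, ff 7f, 80 80 01, and ff ff 03, matching the three rows of the table.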

Decoding accumulates 7 bits at a time, placing each subsequent byte's bits 7 positions higher than the previous byte's, and stops at the first byte whose high bit is clear.

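value = 0
shift = 0
loop:                               # at most 3 iterations for a u16
  byte = next_byte()
  value |= (byte & 0x7F) << shift   # low 7 bits carry value
  if (byte & 0x80) == 0: break      # high bit clear: this was the last byte
  shift += 7                        # next byte's bits sit 7 positions higher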
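
A runnable version of the loop above might look like the following Python sketch. The name `decode_compact_u16` and the `(value, bytes_consumed)` return shape are choices made for this example, not an existing API. It bounds the loop at 3 bytes but does not enforce the canonical-form rules listed under Common gotchas.

```python
def decode_compact_u16(buf: bytes, offset: int = 0) -> tuple[int, int]:
    """Return (value, bytes_consumed) for the compact-u16 starting at offset."""
    value = 0
    for i in range(3):                     # a u16 never needs more than 3 bytes
        if offset + i >= len(buf):
            raise ValueError("buffer ended inside a compact-u16")
        byte = buf[offset + i]
        value |= (byte & 0x7F) << (7 * i)  # 7 value bits, low group first
        if byte & 0x80 == 0:               # continuation flag clear: done
            return value, i + 1
    raise ValueError("compact-u16 longer than 3 bytes")


print(decode_compact_u16(bytes([0x05])))              # one-byte form
print(decode_compact_u16(bytes([0x80, 0x01])))        # two-byte form
print(decode_compact_u16(bytes([0xFF, 0xFF, 0x03])))  # three-byte form
```

Returning the number of bytes consumed alongside the value lets the caller advance its read offset by exactly the right amount, which is where hand-written parsers tend to slip.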

Where you see it

Before every array in both legacy and v0 transactions: the signatures vector, the account-keys vector, the instructions vector, each instruction’s account-index vector and data vector, and (in v0) the address-table-lookup vectors. If you’re hand-parsing a transaction and your offsets drift, a misread compact-u16 is the usual culprit.
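
To make the positions of those prefixes concrete, here is a rough Python sketch that walks a legacy transaction and reads each length prefix in turn. It assumes the standard legacy layout (64-byte signatures, a 3-byte message header, 32-byte account keys, a 32-byte recent blockhash) and uses a zero-filled synthetic transaction rather than real data. The helper names `read_len` and `summarize_legacy_tx` are hypothetical, and truncated input is not handled.

```python
def read_len(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a compact-u16 at pos; return (value, position after it)."""
    value = 0
    for i in range(3):
        byte = buf[pos + i]
        value |= (byte & 0x7F) << (7 * i)
        if byte & 0x80 == 0:
            return value, pos + i + 1
    raise ValueError("compact-u16 longer than 3 bytes")


def summarize_legacy_tx(tx: bytes) -> dict:
    """Walk a legacy transaction and count the entries in each array."""
    n_sigs, pos = read_len(tx, 0)
    pos += 64 * n_sigs                 # each signature is 64 bytes
    pos += 3                           # message header: three u8 counters
    n_keys, pos = read_len(tx, pos)
    pos += 32 * n_keys                 # each account key is 32 bytes
    pos += 32                          # recent blockhash
    n_ix, pos = read_len(tx, pos)
    instructions = []
    for _ in range(n_ix):
        pos += 1                       # program-id index (u8)
        n_accounts, pos = read_len(tx, pos)
        pos += n_accounts              # account indexes are one byte each
        n_data, pos = read_len(tx, pos)
        pos += n_data                  # instruction data bytes
        instructions.append((n_accounts, n_data))
    return {"signatures": n_sigs, "account_keys": n_keys,
            "instructions": instructions, "bytes_read": pos}


# Synthetic transaction: 1 signature, 2 account keys, 1 instruction with
# 1 account index and 4 data bytes. All contents are zero-filled placeholders.
tx = (bytes([1]) + bytes(64)           # signatures vector
      + bytes([1, 0, 1])               # message header
      + bytes([2]) + bytes(64)         # account-keys vector
      + bytes(32)                      # recent blockhash
      + bytes([1])                     # instruction count
      + bytes([1])                     # program-id index
      + bytes([1, 0])                  # account-index vector: [0]
      + bytes([4, 0, 0, 0, 0]))        # data vector: 4 bytes
summary = summarize_legacy_tx(tx)
assert summary["bytes_read"] == len(tx)  # every byte accounted for
```

Checking that the final offset equals the buffer length is a cheap way to detect a misread prefix: if any length is decoded wrongly, the walk almost always ends short of or past the end.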

Common gotchas

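It is not a fixed-width little-endian u16, and not general-purpose LEB128. The bit layout matches unsigned LEB128, but compact-u16 is capped at 3 bytes and 65,535 and must be canonical, so a generic LEB128 decoder accepts inputs that a strict shortvec decoder rejects. Reading the prefix as a single raw byte works by accident for values 0–127 and then silently breaks; reading it as a fixed 2-byte little-endian integer is wrong even for small values, because it always consumes two bytes. Use a real shortvec decoder.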
The third byte has only 2 usable value bits. Because the maximum value is 65,535 (the u16 limit), the third byte can be at most 0x03 (0b00000011), with its continuation flag clear. Decoders must reject a larger third byte; an encoder given a valid u16 never produces one.
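Length, then elements. The prefix is the element count, not a byte length. To skip an array of fixed-size elements, multiply the count by the element size (64 bytes per signature, 32 per account key, 1 per account index or data byte). The instructions vector has variable-size elements, so you must decode and walk each instruction; you can’t just jump N bytes.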
Non-canonical encodings are invalid. Encoding 5 as two bytes (0x85 0x00) instead of one (0x05) is malformed; strict decoders reject it. Don’t pad.
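
The third-byte and canonical-form rules above can be enforced with two extra checks in the decoder. The following Python sketch is one way to do it, written for this page rather than taken from a Solana library; `decode_compact_u16_strict` is a hypothetical name.

```python
def decode_compact_u16_strict(buf: bytes, offset: int = 0) -> tuple[int, int]:
    """Return (value, bytes_consumed), rejecting malformed encodings."""
    value = 0
    for i in range(3):
        if offset + i >= len(buf):
            raise ValueError("buffer ended inside a compact-u16")
        byte = buf[offset + i]
        if i == 2 and byte > 0x03:
            # Catches both overflow past 65,535 and a continuation
            # flag set on the third byte.
            raise ValueError("third byte exceeds 0x03")
        value |= (byte & 0x7F) << (7 * i)
        if byte & 0x80 == 0:
            if i > 0 and byte == 0:
                # A zero final byte adds no value bits, so a shorter
                # encoding exists: this one is padded.
                raise ValueError("non-canonical encoding")
            return value, i + 1
    raise AssertionError("unreachable: the third byte returns or raises")


print(decode_compact_u16_strict(bytes([0x05])))   # canonical form of 5
try:
    decode_compact_u16_strict(bytes([0x85, 0x00]))  # padded form of 5
except ValueError as err:
    print("rejected:", err)
```

The canonical check relies on one observation: a multi-byte encoding is minimal exactly when its last byte is nonzero, so testing that single byte is enough.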