Performance and Measurement

This is the one GitBook page that carries SOF's detailed performance story.

Use it for:

how SOF performance work is actually done
what kinds of optimizations have accumulated across releases
which measured wins are representative
what SOF is and is not claiming when it says it is optimized

The Method

SOF is not tuned by intuition alone.

The normal loop is:

form a concrete bottleneck hypothesis
capture a baseline
change one thing
re-run A/B validation with release fixtures
check perf and runtime metrics
keep the change only if the data holds

That matters because some candidate optimizations were explicitly measured and then reverted when they did not produce a stable win. Regressions are rejected rather than rationalized.
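The keep-or-revert step in the loop above can be expressed as a small decision rule. This Python sketch is illustrative only. `keep_change`, the median comparison, and the 2% `min_gain` threshold are assumptions, not SOF's actual validation tooling. The rule accepts a candidate only when its median time beats the baseline by more than both a fixed threshold and the run-to-run spread seen in either sample set.

```python
from statistics import median


def run_spread(samples: list[float]) -> float:
    """Relative spread of one sample set: (max - min) / median."""
    return (max(samples) - min(samples)) / median(samples)


def keep_change(baseline: list[float], candidate: list[float],
                min_gain: float = 0.02) -> bool:
    """Hypothetical keep-or-revert rule for an A/B run (lower time is better).

    Returns True only if the candidate's median time is lower than the
    baseline's by more than both min_gain (an assumed threshold) and the
    noisier of the two runs' relative spread.
    """
    base_med = median(baseline)
    cand_med = median(candidate)
    gain = (base_med - cand_med) / base_med  # fraction of time saved
    noise = max(run_spread(baseline), run_spread(candidate))
    return gain > max(min_gain, noise)


if __name__ == "__main__":
    baseline = [100.0, 101.0, 99.0, 100.0, 102.0]  # e.g. fixture times in us
    faster = [80.0, 81.0, 79.0, 80.0, 82.0]
    noisy = [97.0, 80.0, 110.0, 96.0, 98.0]
    print(keep_change(baseline, faster))  # True: clear win
    print(keep_change(baseline, noisy))   # False: gain hidden by noise
```

Median plus spread is only one simple policy. A stricter harness might require more repetitions or a proper statistical test before keeping a change.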

This Is A Multi-Release Story

SOF's performance work did not arrive all at once in 0.13.0.

`0.7.x` and `0.8.x`

These releases moved SOF away from a mostly serial ingest shape and toward:

multi-core packet-worker ingest
lower-copy dataset and transaction handling
narrower plugin fanout
cheaper dispatch and scratch-buffer reuse
better dataset reassembly locality

This is where much of the basic runtime-efficiency foundation came from.
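Scratch-buffer reuse means allocating a working buffer once and overwriting it for each new unit of work, rather than allocating per packet or per dataset. The Python sketch below illustrates the idea with a hypothetical `DatasetAssembler` that copies fragment payloads into one reused `bytearray`. The class, its capacity, and the fragment model are assumptions for illustration, not SOF's reassembly code.

```python
class DatasetAssembler:
    """Hypothetical assembler that reuses one scratch buffer across datasets."""

    def __init__(self, capacity: int = 64 * 1024) -> None:
        # Allocated once; later datasets overwrite it instead of allocating.
        self._scratch = bytearray(capacity)

    def assemble(self, fragments: list[bytes]) -> memoryview:
        total = sum(len(frag) for frag in fragments)
        if total > len(self._scratch):
            # Grow only when a dataset exceeds the current capacity.
            self._scratch = bytearray(total)
        offset = 0
        for frag in fragments:
            # Same-length slice assignment copies into the existing buffer.
            self._scratch[offset:offset + len(frag)] = frag
            offset += len(frag)
        # Read-only view of the assembled bytes; the next call overwrites them.
        return memoryview(self._scratch)[:total].toreadonly()


if __name__ == "__main__":
    assembler = DatasetAssembler(capacity=16)
    first = assembler.assemble([b"ab", b"cd"])
    print(bytes(first))
    second = assembler.assemble([b"xy"])
    print(bytes(second), bytes(first))  # first now shows the reused bytes
```

The trade-off is lifetime. A returned view is only valid until the next `assemble` call, so callers must finish with it (or copy it) first.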

`0.12.0`

0.12.0 tightened the shred-to-plugin path and made inline transaction visibility explicit, measurable, and easier to profile.

Validated VPS latency on the 0.12.0 line, measured as first_shred / last_required_shred / ready -> plugin, improved from:

59.978 / 8.007 / 6.415 ms

to:

44.929 / 6.593 / 5.370 ms
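As a worked check on the 0.12.0 figures, the share of latency removed for each metric is (before - after) / before. The short Python snippet below applies that formula to the three quoted pairs. It uses only the numbers above, and the dictionary keys reuse the metric names from the text.

```python
# 0.12.0 VPS latency figures quoted above, in milliseconds.
BEFORE_MS = {"first_shred": 59.978, "last_required_shred": 8.007, "ready": 6.415}
AFTER_MS = {"first_shred": 44.929, "last_required_shred": 6.593, "ready": 5.370}


def reduction_pct(before: float, after: float) -> float:
    """Percentage of the original latency that was removed."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    for metric, before in BEFORE_MS.items():
        after = AFTER_MS[metric]
        print(f"{metric}: {before:.3f} -> {after:.3f} ms "
              f"({reduction_pct(before, after):.1f}% lower)")
```

This works out to roughly 25%, 18%, and 16% lower latency for the three metrics, respectively.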

That release also added:

hot-path benchmark harnesses
standalone profiling binaries
symbolized perf investigation on live VPS traffic

It also carried measured dispatch/stateful-fanout improvements. One derived-state profiling slice improved from:

12424.649 ns/iter

to:

8603.920 ns/iter

on the final validated branch state.
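The ns/iter figures come from hot-path benchmark harnesses. The Python sketch below shows the basic shape of such a measurement: warm up, time many iterations, and divide by the iteration count. It also defines the speedup ratio used to compare two runs. `ns_per_iter` and `speedup` are hypothetical helpers. SOF's own harnesses are not shown here and will differ, for example in how they control noise.

```python
import time
from typing import Callable


def ns_per_iter(fn: Callable[[], object], iterations: int = 100_000,
                warmup: int = 1_000) -> float:
    """Average wall-clock nanoseconds per call of fn, after a warm-up phase."""
    for _ in range(warmup):  # let caches and lazy state settle before timing
        fn()
    start = time.perf_counter_ns()
    for _ in range(iterations):
        fn()
    return (time.perf_counter_ns() - start) / iterations


def speedup(old_ns: float, new_ns: float) -> float:
    """How many times faster the new measurement is than the old one."""
    return old_ns / new_ns


if __name__ == "__main__":
    data = list(range(256))
    old = ns_per_iter(lambda: sum([x * 2 for x in data]), iterations=10_000)
    new = ns_per_iter(lambda: 2 * sum(data), iterations=10_000)
    print(f"{old:.1f} -> {new:.1f} ns/iter ({speedup(old, new):.2f}x)")
```

Applied to the derived-state slice above, `speedup(12424.649, 8603.920)` is about 1.44, or roughly 31% less time per iteration. This simple harness also counts loop and call overhead in every iteration, which distorts results for very cheap functions.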

`0.13.0`

0.13.0 carried the largest single concentration of measured provider/runtime hot-path work so far.

Representative validated fixture results on the 0.13.0 line:

provider transaction-kind classification:
- 34112us -> 4487us
- about 7.6x faster
provider transaction dispatch path:
- 39157us -> 5751us
- about 6.8x faster
provider serialized-ignore path:
- 42422us -> 23760us
- about 44% less time
websocket full-transaction parse path:
- 162560us -> 133309us
- about 18% less time
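The list above mixes two measures: speedup ratios (7.6x, 6.8x) and percentages. For the last two rows, the percentage is the share of elapsed time removed, which is not the same number as the speedup ratio. The Python snippet below recomputes both measures from the quoted microsecond values so the rows can be compared on one scale. It introduces no new data.

```python
# 0.13.0 fixture results quoted above: (before_us, after_us).
RESULTS = {
    "provider transaction-kind classification": (34112, 4487),
    "provider transaction dispatch path": (39157, 5751),
    "provider serialized-ignore path": (42422, 23760),
    "websocket full-transaction parse path": (162560, 133309),
}


def speedup(before_us: float, after_us: float) -> float:
    """Ratio of old to new time: 2.0 means twice as fast."""
    return before_us / after_us


def time_saved(before_us: float, after_us: float) -> float:
    """Fraction of the original time removed: 0.5 means half the time."""
    return 1.0 - after_us / before_us


if __name__ == "__main__":
    for name, (before, after) in RESULTS.items():
        print(f"{name}: {speedup(before, after):.2f}x, "
              f"{time_saved(before, after):.0%} less time")
```

For the serialized-ignore path, 44% less time corresponds to a speedup of about 1.79x. For the websocket parse path, 18% less time is about 1.22x.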

That line also matters because it combined hardening with performance work. Replay, health, capability, and observability improvements were kept without giving back the main provider/runtime ingest wins on the validated 0.13.0 release branch.

What The Optimizations Usually Look Like

The gains above did not come from one trick. They came from repeated patterns:

removing redundant work from hot paths
adding fast paths so ignored or low-value traffic dies earlier
using borrowed/shared views where full owned materialization is not needed yet
cutting copies and allocations
reducing instruction count for the same work
reducing branching and, where the data showed it, improving cache behavior
narrowing plugin fanout and dispatch overhead instead of rescanning everything
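Two of these patterns combine naturally: letting ignored traffic die early, and using borrowed views instead of owned copies. The Python sketch below assumes a hypothetical wire format whose first byte is a kind tag. `IGNORED_KINDS`, `process`, and the tag values are illustrative, not SOF's or Solana's real encoding. The classifier reads through a `memoryview`, so no bytes are copied, and the owned copy for decoding is made only for traffic that survives the check.

```python
from typing import Callable, Optional

# Hypothetical tag values: 0x00 marks traffic that no consumer wants.
IGNORED_KINDS = frozenset({0x00})


def classify_borrowed(view: memoryview) -> int:
    """Read the hypothetical 1-byte kind tag through a view (no copy)."""
    return view[0]


def process(payload: bytes, decode: Callable[[bytes], object]) -> Optional[object]:
    view = memoryview(payload)  # borrowed view over the caller's bytes
    if not view:
        return None  # nothing to classify
    if classify_borrowed(view) in IGNORED_KINDS:
        return None  # fast path: ignored traffic stops before any decode
    # An owned copy of the body is materialized only when a consumer needs it.
    return decode(bytes(view[1:]))


if __name__ == "__main__":
    decoded = []

    def record(body: bytes) -> bytes:
        decoded.append(body)
        return body

    process(b"\x00ignored", record)
    process(b"\x01wanted", record)
    print(decoded)  # only the second payload reached decode
```

The cheaper the check, the more it pays off, because the fast path runs on every message while the decode runs only on the survivors.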

Representative examples across releases:

borrowed transaction classification
compiled transaction prefilters
skipping owned decode on prefiltered misses
skipping completed-dataset tx decode when the prefilter already proves nothing will consume it
reducing transaction dispatch handoff overhead
reducing generic plugin dispatch overhead
trimming provider transaction overhead and avoiding serialized payload copies
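Compiled prefilters, skipped decodes on misses, and narrowed plugin fanout fit together in one dispatch step. The Python sketch below is a hypothetical model, not SOF's API:

1. Each plugin declares the account keys it cares about.
2. Those interests are compiled once into a key-to-plugins index.
3. Each transaction is decoded only if some plugin matches.
4. The decoded transaction is delivered only to the matching plugins.

It assumes the account keys can be read from a borrowed view before full decode.

```python
from collections import defaultdict
from typing import Callable, Iterable


class CompiledPrefilter:
    """Hypothetical prefilter compiled once from plugin interests."""

    def __init__(self, interests: dict[str, set[str]]) -> None:
        # Invert plugin -> keys into key -> plugins, so matching is one lookup
        # per key instead of a scan over every plugin.
        self._by_key: dict[str, list[str]] = defaultdict(list)
        for plugin, keys in interests.items():
            for key in keys:
                self._by_key[key].append(plugin)

    def match(self, account_keys: Iterable[str]) -> set[str]:
        """Names of plugins interested in any key; an empty set means a miss."""
        hits: set[str] = set()
        for key in account_keys:
            hits.update(self._by_key.get(key, ()))
        return hits


def dispatch(account_keys: Iterable[str], raw_tx: bytes,
             prefilter: CompiledPrefilter,
             decode: Callable[[bytes], object],
             plugins: dict[str, Callable[[object], None]]) -> int:
    """Decode and deliver one transaction; returns how many plugins got it."""
    targets = prefilter.match(account_keys)
    if not targets:
        return 0  # prefilter miss: the owned decode is skipped entirely
    tx = decode(raw_tx)  # decode once, shared by every matching plugin
    for name in sorted(targets):  # fan out only to interested plugins
        plugins[name](tx)
    return len(targets)
```

The inverted index keeps per-transaction matching cost proportional to the number of keys in the transaction rather than to the number of registered plugins.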

What SOF Is Actually Claiming

The claim is not:

SOF is automatically the fastest way to see Solana data

The claim is:

SOF removes a large amount of local runtime waste
SOF gives you one reusable runtime instead of rebuilding ingest and dispatch every time
SOF keeps measured wins and rejects regressions
SOF makes runtime behavior more explicit and more observable while staying fast on the validated hot paths

That claim is intentionally scoped:

the measurements above are historical validated release-line results, live checks, and profiled slices
they are not a promise that every later branch will reproduce the exact same absolute numbers
they are not a claim about every mixed workload or every host topology
ingress still determines how early a host sees the data

If the earliest possible visibility is the goal, private raw distribution, validator-adjacent ingress, and host placement still matter more than any local runtime optimization.