Tuning and Environment Controls
SOF exposes many environment variables, but most operators should leave most of them alone.
The first performance decision is ingress choice, not knob count. If the host sees traffic late, local tuning cannot recover that loss.
Safe Baseline
For first deployments, keep to:
- `RUST_LOG`
- `SOF_BIND`
- `SOF_GOSSIP_ENTRYPOINT` (when using gossip bootstrap)
Enable SOF's runtime-owned probe and scrape listener only when you actually need it:
- `SOF_OBSERVABILITY_BIND`
When enabled, SOF exposes:
- `/metrics`
- `/healthz`
- `/readyz`
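As a config fragment, the safe baseline might look like the following. The addresses and the entrypoint host are illustrative placeholders, not recommended values.

```shell
# Illustrative safe-baseline environment; all values are placeholders.
export RUST_LOG=info
export SOF_BIND=0.0.0.0:8001
# Only when bootstrapping via gossip:
export SOF_GOSSIP_ENTRYPOINT=entrypoint.example.org:8001
# Opt in to the probe/scrape listener only when you need it:
export SOF_OBSERVABILITY_BIND=127.0.0.1:9100
```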
If you are starting in processed provider mode instead of raw-shred ingest, the safe baseline is different:
- keep provider endpoint and auth config explicit in code
- keep replay and durability settings at their defaults first
- do not tune packet-worker, dataset-worker, FEC, or relay/repair knobs before you have measured a raw-shred runtime, because built-in provider mode does not use those stages the same way
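"Explicit in code" can be sketched as below. Every name here (`ProviderConfig`, the endpoint URL, the `PROVIDER_TOKEN` variable) is a hypothetical stand-in for illustration, not SOF's actual provider API.

```rust
// Hypothetical stand-in types; the real provider-mode API may differ.
struct ProviderConfig {
    endpoint: String,
    auth_token: String,
}

fn provider_config() -> ProviderConfig {
    ProviderConfig {
        // Explicit in code: one obvious place to audit, instead of
        // scattered string overrides.
        endpoint: "https://provider.example.org".to_string(),
        // The secret still comes from the environment, but the lookup is
        // visible here rather than implicit in the runtime.
        auth_token: std::env::var("PROVIDER_TOKEN").unwrap_or_default(),
    }
}

fn main() {
    let cfg = provider_config();
    assert!(cfg.endpoint.starts_with("https://"));
    println!("endpoint = {}", cfg.endpoint);
    println!("auth token set: {}", !cfg.auth_token.is_empty());
}
```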
If you configure the runtime in code, prefer `RuntimeSetup` and `sof-gossip-tuning` instead of raw string overrides.
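A minimal sketch of why typed configuration beats string overrides: a misspelled preset fails to compile instead of being silently ignored at runtime. `RuntimeSetup` is a real SOF name, but every field, method, and the `GossipPreset` enum below are stand-ins invented for illustration.

```rust
// Stand-in types for illustration; the real RuntimeSetup and
// sof-gossip-tuning preset APIs are assumed to differ.
#[derive(Debug, Clone, Copy, PartialEq)]
enum GossipPreset {
    Default,
    Vps,
}

#[derive(Debug)]
struct RuntimeSetup {
    bind: String,
    gossip_preset: GossipPreset,
}

impl RuntimeSetup {
    fn new(bind: &str) -> Self {
        Self {
            bind: bind.to_string(),
            gossip_preset: GossipPreset::Default,
        }
    }

    // Typed preset selection: a typo here is a compile error, unlike a
    // raw string override that fails only at runtime.
    fn with_gossip_preset(mut self, preset: GossipPreset) -> Self {
        self.gossip_preset = preset;
        self
    }
}

fn main() {
    let setup = RuntimeSetup::new("0.0.0.0:8001")
        .with_gossip_preset(GossipPreset::Vps);
    assert_eq!(setup.gossip_preset, GossipPreset::Vps);
    println!("{:?}", setup);
}
```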
Placement Guidance
Pinning and placement should be read narrowly:
- SOF exposes useful single-host controls
- SOF does not claim full NUMA-aware scheduling
- multi-socket hosts still need measured placement decisions
Current playbook:
- public single-socket VPS: start from `sof-gossip-tuning`'s validated `Vps` preset
- processed provider mode: tune replay, durability, and source health before touching packet/shred worker knobs
- trusted raw-shred mode: keep receive, packet-worker, and dataset-worker placement on the same socket when possible
- multi-socket hosts: prefer one socket first unless measurement proves cross-socket fanout helps more than it hurts
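Before making a placement decision, check the topology you actually have. The inspection command below is standard Linux; the pinned invocation is only a hypothetical shape (the binary name and flags are invented).

```shell
# List online NUMA nodes (prints "0" on a single-socket host).
cat /sys/devices/system/node/online
# Confining the runtime to socket 0 on a two-socket host might look like
# the following (binary name and flags are hypothetical):
#   numactl --cpunodebind=0 --membind=0 ./sof-runtime
```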
If the real requirement is lower latency rather than cleaner operations, revisit ingress before revisiting thread knobs:
- private raw feed
- direct validator-adjacent ingress
- better host placement
Preferred Tuning Order
- keep defaults
- apply a typed `sof-gossip-tuning` preset
- measure
- change one advanced knob at a time
The measurement step is deliberate, not a formality. For the actual optimization workflow and release-level measured results, see Performance and Measurement.
For transaction plugins, prefer API-level fast paths before runtime knob tuning:
- use `TransactionDispatchMode::Inline` when the plugin actually benefits from earliest tx visibility
- use `TransactionPrefilter` for signature or account-key matching instead of heavier custom logic when possible
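The prefilter idea can be sketched as cheap matching that runs before any heavier plugin logic. `TransactionPrefilter` is a real SOF name, but the types and fields below are simplified stand-ins for illustration.

```rust
// Simplified stand-in types; real SOF transactions and the real
// TransactionPrefilter API are assumed to differ.
#[derive(Debug)]
struct Tx {
    signature: [u8; 4], // shortened for the example
    account_keys: Vec<[u8; 4]>,
}

// A prefilter in the spirit of TransactionPrefilter: a cheap account-key
// match decides whether the heavier plugin logic runs at all.
struct Prefilter {
    wanted_key: [u8; 4],
}

impl Prefilter {
    fn matches(&self, tx: &Tx) -> bool {
        tx.account_keys.iter().any(|k| *k == self.wanted_key)
    }
}

fn main() {
    let f = Prefilter { wanted_key: *b"AAAA" };
    let hit = Tx {
        signature: *b"sig1",
        account_keys: vec![*b"AAAA", *b"BBBB"],
    };
    let miss = Tx {
        signature: *b"sig2",
        account_keys: vec![*b"CCCC"],
    };
    assert!(f.matches(&hit));
    assert!(!f.matches(&miss));
    println!("prefilter hit: {:?}", hit);
}
```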
Signs You Are Overtuning
- queue capacities keep rising but latency keeps getting worse
- worker counts exceed the host's ability to keep caches warm
- repair traffic grows faster than recovered useful data
- changes are made without any before/after capture
- pinning is applied without proving that locality improved
Practical Advice
- prefer typed profiles for repeatable hosts
- keep a host-specific tuning log
- change one family of knobs at a time
- watch queue depth, drop counters, replay health, and repair behavior before and after each change
- do not keep a knob or code change just because it sounded faster before measurement