By AUJay
How Do Modern Proof Aggregation Layers Keep Latency Low When Batching Mixed Plonk and STARK Proofs?
Summary: The fastest proof aggregation stacks keep latency low by recursively compressing STARKs, using accumulation schemes to defer Plonk verifier work, and wrapping everything in a tiny, EVM‑friendly SNARK—while engineering the queues, hardware, and proof types to minimize tail delays. Below is a concrete blueprint with timings, gas math, and operational patterns you can deploy now.
Who this is for
- Decision‑makers evaluating zk rollups, zk coprocessors, or prover networks who need predictable p95/p99 latency—not just low average gas.
- Teams already producing both Plonk(ish) and STARK proofs and struggling to batch them without adding seconds or minutes of delay.
The real latency budget in mixed-proof aggregation
When your aggregator ingests heterogeneous proofs (Plonk, STARK, and Plonkish/FRI variants), end‑to‑end latency is governed by four knobs:
- Batch fill time
- If arrivals are roughly Poisson with rate λ, waiting to collect N proofs takes ≈ N/λ; with a time cap T, median wait ≈ T/2. In low‑throughput periods, this dominates, so engineer adaptive batch windows rather than a fixed N; a back‑of‑envelope sketch follows this list. (7blocklabs.com)
- Leaf proving times
- STARK VMs: today’s top networks split execution into shards, prove shards in parallel, then recursively compress; they can also skip local simulation to shave minutes in some SDKs. (docs.succinct.xyz)
- Plonk circuits: per‑proof prover time is circuit‑dependent; aggregation lets you push verification to the end of the pipeline so leaves don’t block each other (details below). (eprint.iacr.org)
- Aggregation and wrapping time
- STARK→SNARK wrapping is often a fixed overhead: e.g., ~6s extra when wrapping to Groth16 and ~70s to Plonk in one widely used stack; that cost doesn’t depend on the guest program size, so it dominates for “tiny” updates. Choose wrappers deliberately when latency matters. (docs.succinct.xyz)
- On‑chain inclusion
- Verification gas is largely constant per batch when you aggregate; Ethereum’s EIP‑1108 repriced bn128 pairings to ~45k + 34k·k gas, keeping Groth16 on BN254 the low‑latency default for L1. If you target Pectra‑era chains, BLS12‑381 precompiles are available too. (eips.ethereum.org)
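To make the first and last knobs concrete, here is a back‑of‑envelope sketch in Python. The arrival rate, batch size, and window are illustrative assumptions, not benchmarks; only the EIP‑1108 pricing formula (45k + 34k·k) comes from the source above.

```python
# Back-of-envelope numbers for batch fill time and on-chain pairing gas.
# Arrival rate, batch size, and window are illustrative assumptions.

lam = 120.0     # assumed proof arrival rate (proofs/second)
n_max = 1_000   # size trigger
t_max = 2.0     # time trigger (seconds)

# Expected time to fill a batch of n_max at rate lam is ~ n_max / lam;
# the window fires at whichever trigger trips first, and a proof landing
# uniformly inside a T-capped window waits ~ T/2 at the median.
expected_fill = min(n_max / lam, t_max)
print(f"expected batch fill: {expected_fill:.2f}s")

# EIP-1108 pricing for the bn128 pairing precompile: 45,000 + 34,000 * k.
def pairing_gas(k: int) -> int:
    return 45_000 + 34_000 * k

# Groth16 verification is one multi-pairing over 4 pairs, plus a small MSM
# over the public inputs and contract overhead (hence the ~200-300k totals
# quoted later in this article).
print(f"4-pair check: {pairing_gas(4):,} gas")  # 181,000 gas
```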
What “mixed Plonk + STARK batching” looks like in practice
Modern stacks converge on a three‑layer pattern (a type‑level sketch in code follows the list):
- Layer A — leaf proofs:
- STARK receipts from a zkVM (RISC‑V/EVM).
- Plonk proofs from app‑specific circuits (payments, Merkle checks, custom logic).
- Layer B — compression/accumulation:
- STARKs: recursive FRI verification inside a recursion circuit; optional “packing” to amortize Merkle/FRI queries across many receipts. (proxima-one.github.io)
- Plonk: accumulation/aggregation (e.g., aPlonK, SnarkFold) to defer pairings and shrink verifier work to O(1) or O(log n). (eprint.iacr.org)
- Layer C — final wrapper:
- Emit a small Groth16 or Plonk proof that attests “the batch verifier accepted K Plonk instances and M STARK receipts.” This keeps on‑chain gas predictable and tiny. (docs.succinct.xyz)
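As a rough mental model, the three layers can be typed out as below. This is a minimal sketch with placeholder names (`LeafProof`, `fold_plonk`, `wrap_outer`, and friends are all hypothetical); real stacks expose much richer receipt and proof objects.

```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class LeafProof:                      # Layer A output
    system: Literal["plonk", "stark"]
    payload: bytes

@dataclass
class CompressedBatch:                # Layer B output
    plonk_accumulator: bytes          # deferred-pairing accumulator state
    stark_recursion_proof: bytes      # one recursive proof over many receipts

@dataclass
class OuterProof:                     # Layer C output, posted on-chain
    proof: bytes                      # ~256-byte Groth16 (or a Plonk proof)
    public_digest: bytes              # roots/epochs hashed to one field element

def fold_plonk(proofs: list[LeafProof]) -> bytes:
    return b"acc"                     # stand-in for aPlonK/SnarkFold folding

def pack_starks(proofs: list[LeafProof]) -> bytes:
    return b"packed"                  # stand-in for recursive FRI / STARKPack

def wrap_outer(batch: CompressedBatch) -> OuterProof:
    return OuterProof(proof=b"groth16", public_digest=b"digest")  # stand-in wrap

def aggregate(leaves: list[LeafProof]) -> OuterProof:
    plonk = [p for p in leaves if p.system == "plonk"]
    stark = [p for p in leaves if p.system == "stark"]
    batch = CompressedBatch(fold_plonk(plonk), pack_starks(stark))  # Layer B
    return wrap_outer(batch)                                        # Layer C
```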
Two production‑grade references illustrate the pattern:
- RISC Zero: segment receipts → STARK recursion (succinct receipt) → identity_p254 → Groth16 “receipt” for EVM. (dev.risczero.com)
- Succinct SP1: shard → STARK recursion → optional Groth16/Plonk wrapper; their network parallelizes proving across machines for lower tail latency. (docs.succinct.xyz)
Cryptographic building blocks that cut latency (without cutting corners)
- Recursive FRI for STARKs
- Recursion circuits re‑verify FRI proofs inside a succinct arithmetic circuit; most cost sits in Poseidon‑based Merkle checks, so “packing” FRI queries (e.g., STARKPack) reduces marginal work per additional proof. Use this to aggregate many small zkVM segments quickly before any SNARK wrap. (proxima-one.github.io)
- Accumulation/folding for Plonk
- aPlonK aggregates multiple Plonk statements with a multi‑polynomial commitment; proof and verification scale as O(log n). SnarkFold takes the “defer expensive checks” idea further, folding many proofs so the heavy pairing work only happens once at the end. Both approaches suppress per‑leaf costs that would otherwise add latency; a toy accumulator sketch follows this list. (eprint.iacr.org)
- Cross‑system recursion (Plonk + STARK together)
- When you must verify “wrong‑field” artifacts (e.g., BN254 pairings for a Plonk/KZG proof) inside a STARK recursion circuit over Goldilocks or another STARK‑friendly prime, use non‑native arithmetic. Recent work like WARPfold and HyperNova/ProtoStar‑style folding provides practical recipes for handling multiple fields and arithmetizations in one recursive pipeline. (eprint.iacr.org)
- A small, fast outer SNARK
- For EVM today, Groth16 on BN254 remains the latency/gas sweet spot thanks to EIP‑1108. With Pectra, BLS12‑381 precompiles became available, expanding options for final wrappers; choose based on deployment constraints and existing SRS. (eips.ethereum.org)
- Vector‑commitment sidekicks for “many openings now”
- If your aggregator opens lots of commitments per batch, modern VCs like FlexProofs can compute all openings in O(N) with a tunable batch parameter b; early results show up to 6× speedups over prior art for N=2^16—useful when per‑proof metadata threatens to become the bottleneck. (arxiv.org)
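To make the “defer the expensive check” idea concrete, here is a toy accumulator in Python over Pedersen‑style commitments in a tiny Schnorr group. Everything here is an illustration‑only stand‑in: the parameters are insecure, and the single deferred opening plays the role of the one final multi‑pairing. This is not the actual aPlonK/SnarkFold construction.

```python
import hashlib

# Toy Schnorr group: p = 2q + 1, with g and h of prime order q.
# Illustration-only parameters; nothing here is a secure instantiation.
p, q = 1019, 509
g, h = 4, 9          # squares mod p, hence order-q elements

def commit(m: int, r: int) -> int:
    """Pedersen-style commitment C = g^m * h^r mod p."""
    return pow(g, m, p) * pow(h, r, p) % p

def challenge(acc_c: int, c: int) -> int:
    """Fiat-Shamir-style folding challenge (toy)."""
    digest = hashlib.sha256(f"{acc_c}|{c}".encode()).digest()
    return int.from_bytes(digest, "big") % q

acc_c, acc_m, acc_r = 1, 0, 0           # identity accumulator
for m in [7, 13, 42, 99]:               # toy "proof" messages arriving over time
    r = (m * 31) % q                    # deterministic toy randomness
    c = commit(m, r)
    gamma = challenge(acc_c, c)
    acc_c = pow(acc_c, gamma, p) * c % p   # cheap group ops per added proof
    acc_m = (gamma * acc_m + m) % q
    acc_r = (gamma * acc_r + r) % q

# One deferred check at the end, standing in for the single final pairing.
assert commit(acc_m, acc_r) == acc_c
print("4 proofs folded; verified with one final check")
```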
Engineering the queues so batches don’t add seconds
- Adaptive batch windows
- Maintain two triggers: size Nmax and time Tmax; fire when either trips. Tune Nmax/Tmax per input stream (Plonk vs STARK) so neither starves the other. Model arrivals with a simple Poisson approximation; in low traffic, rely on Tmax to bound waiting. A minimal batcher sketch follows this list. (7blocklabs.com)
- Split queues by work profile
- Run separate K8s queues for: (i) leaf proving, (ii) recursion/folding, (iii) final SNARK wrap. Co‑locate recursion with fast NVMe for transcript shuffles and route to GPU MIG slices with enough memory. Several prover‑network vendors recommend isolating recursion layers explicitly. (7blocklabs.com)
- Skip unnecessary pre‑steps
- In SDKs that default to local simulation before network proving, enable “skip simulation” for latency‑critical flows; this can save minutes on large traces. Gate it behind admission checks in your CI. (docs.succinct.xyz)
- Choose the right wrapper for latency
- If p95 wall‑clock matters, prefer Groth16 wrapping over Plonk when using stacks where Plonk wrapping adds ~70s fixed latency. Use Plonk wrapping when you need a universal SRS or specific compatibility. (docs.succinct.xyz)
- Outsource parallelism when you hit single‑box limits
- Distributed prover networks parallelize across many machines so latency does not scale linearly with trace size; benchmark on a latency‑optimized endpoint (not the default) to avoid misleading results. (docs.succinct.xyz)
- Reserve capacity where it counts
- Decentralized AVS‑backed prover networks (e.g., Lagrange on EigenLayer) now offer institution‑grade operators and subnetwork reservations so your batches don’t sit behind the public queue. Contract for SLOs when your p95 must stay below a fixed bound. (prnewswire.com)
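A minimal dual‑trigger batcher, sketched in Python under the assumptions above (one instance per input stream; the Nmax/Tmax values are illustrative):

```python
import time
from dataclasses import dataclass, field

@dataclass
class BatchWindow:
    """Dual-trigger batcher: fire on size Nmax or age Tmax, whichever first.
    Run one instance per input stream (Plonk vs STARK) with its own tuning."""
    n_max: int
    t_max: float                       # seconds
    items: list = field(default_factory=list)
    opened_at: float | None = None

    def add(self, item) -> list | None:
        if not self.items:
            self.opened_at = time.monotonic()
        self.items.append(item)
        return self._fire() if len(self.items) >= self.n_max else None

    def poll(self) -> list | None:
        """Call periodically; fires when the oldest item ages past Tmax."""
        if self.items and time.monotonic() - self.opened_at >= self.t_max:
            return self._fire()
        return None

    def _fire(self) -> list:
        batch, self.items, self.opened_at = self.items, [], None
        return batch

# Separate windows so neither stream starves the other (illustrative tuning).
plonk_window = BatchWindow(n_max=1_000, t_max=2.0)
stark_window = BatchWindow(n_max=50, t_max=5.0)
```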
Low‑latency blueprints for two common “mixed” scenarios
A) Oracle/co‑processor: 1,000 Plonk micro‑proofs + 50 zkVM STARK receipts per batch
Goal: < 10–12s p95 from “last proof received” to “on‑chain verified.”
- Step 1 — Plonk accumulation:
- As micro‑proofs arrive, add them to an aPlonK/SnarkFold accumulator. This defers KZG pairings to the end; marginal cost per new Plonk proof is tiny and parallelizable. Fire the accumulator every Tmax = 2s or when you reach 1,000 proofs. (eprint.iacr.org)
- Step 2 — STARK recursion/packing:
- Pack 50 zkVM receipts via a recursion circuit and STARKPack so the Merkle/FRI verification work amortizes across all receipts. Keep this in a separate queue so it does not block Plonk accumulation. (nethermind.io)
- Step 3 — Final wrapper:
- Verify (i) the Plonk accumulator state and (ii) the compressed STARK proof inside a single outer Groth16. Post that single ~260‑byte proof on L1/L2; base verification gas stays ~200–300k on EVM today. If you target a Pectra‑aligned chain and your stack already lives on BLS12‑381, consider a BLS‑based wrapper instead. (eips.ethereum.org)
What to expect:
- The outer Groth16 wrap avoids dozens of per‑proof on‑chain transactions. Teams report base on‑chain costs around a few hundred thousand gas in similar “super‑proof” patterns; per‑proof inclusion checks, when needed, can be ~16k gas each. A rough cost model is sketched below. (docs.electron.dev)
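Here is that cost model as a quick Python sketch, using the illustrative figures quoted above (~250k base verification, ~16k per inclusion check); your verifier’s exact numbers will differ.

```python
# Rough cost model for the "super-proof" pattern; all constants illustrative.
BASE_AGGREGATE_GAS = 250_000     # one outer Groth16 verification + overhead
INCLUSION_GAS = 16_000           # optional per-proof membership check
INDIVIDUAL_VERIFY_GAS = 250_000  # verifying one proof on its own

def aggregate_gas(n_proofs: int, with_inclusion: bool = True) -> int:
    return BASE_AGGREGATE_GAS + (INCLUSION_GAS * n_proofs if with_inclusion else 0)

for n in (10, 100, 1_000):
    agg, solo = aggregate_gas(n), n * INDIVIDUAL_VERIFY_GAS
    print(f"n={n:>5}: aggregated {agg:>12,} gas vs individual {solo:>12,} gas")
```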
B) Rollup checkpoint: STARK zkVM state transition + external Plonk proofs (side conditions)
Goal: bound tail latency while preserving L1 gas predictability.
- Step 1 — zkVM shards → recursion:
- Prove block execution via STARK shards → succinct receipt through recursive joins; this is the path used by RISC Zero and SP1 to make proof size constant and enable fast wrap. (dev.risczero.com)
- Step 2 — Ingest side‑condition Plonk proofs:
- Batch them through aPlonK or SnarkFold; if fields don’t match the zkVM’s, keep them separate until the outer wrap to avoid wrong‑field arithmetic inside the recursion circuit. (eprint.iacr.org)
- Step 3 — Wrap once:
- Outer Groth16 over BN254 (or BLS12‑381 on Pectra chains) verifies both the recursion result and the Plonk accumulator. Avoid multi‑wrap patterns (e.g., wrapping each receipt), which add fixed seconds per wrap. (docs.succinct.xyz)
A note on real‑time targets:
- In 2025, a production prover network demo proved most Ethereum L1 blocks in ≈10–12s using a large GPU cluster, highlighting what’s achievable when you parallelize end‑to‑end. If you need sub‑12s p95, design for distributed proving from day one. (theblock.co)
On‑chain verification design: gas and curves you should plan for
- BN254 today, BLS12‑381 increasingly viable
- BN254 remains cheapest because of EIP‑1108; your Groth16 verifier cost scales with the number of pairings plus a small fixed term. With Pectra (May 7, 2025), BLS12‑381 precompiles landed, making BLS‑based wrappers practical when your stack already uses BLS curves. Keep verifiers upgradable (with timelocks) so you can swap curves without migrating the whole rollup. (eips.ethereum.org)
- Keep public inputs tiny
- Gas rises with public signals; hash bulky data (Merkle roots/accumulators) into one field element whenever possible. This applies equally to Groth16 and Plonk verifiers; a sketch follows this list.
- When aggregation stays in SNARK‑land
- If you aggregate only Groth16/Plonk user‑proofs on‑chain, expect a fixed base verification cost plus small per‑proof inclusion checks rather than per‑proof pairings. This is often net‑cheaper and lower‑latency than verifying each proof individually. (docs.electron.dev)
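A minimal sketch of the digest pattern, assuming a BN254 verifier; `sha256` here is a stand‑in for whatever hash your circuit already constrains (keccak256, Poseidon, etc.), so the proof can recompute the same digest.

```python
import hashlib

# BN254 scalar field modulus: public inputs to a Groth16/Plonk verifier on
# BN254 live in this field.
BN254_SCALAR_FIELD = 21888242871839275222246405745257275088548364400416034343698204186575808495617

def public_digest(*chunks: bytes) -> int:
    """Collapse bulky public data (roots, accumulators, epoch ids) into a
    single field element, so the verifier sees one public signal."""
    h = hashlib.sha256()
    for c in chunks:
        h.update(len(c).to_bytes(8, "big"))  # length-prefix to avoid ambiguity
        h.update(c)
    return int.from_bytes(h.digest(), "big") % BN254_SCALAR_FIELD

digest = public_digest(b"state_root...", b"plonk_acc...", b"epoch_42")
print(hex(digest))  # the only public signal the on-chain verifier sees
```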
Implementation details that matter for latency (and get missed)
- “Wrong‑field” arithmetic is real latency
- Verifying BN254 pairings inside a STARK recursion over a different field burns cycles. If you must, use dedicated non‑native gadgets (Halo2‑style “wrong‑field” bigints) or WARPfold‑style folding to keep the cost bounded, and budget the time. A limb‑decomposition sketch follows this list. (eprint.iacr.org)
- Unify hash and field choices where possible
- Aligning hash functions and base fields across Plonkish and STARK pipelines reduces translator cost inside recursion circuits (e.g., fewer non‑native ops, leaner Merkle verifiers). Plonky2/FRI recursion shows Merkle hashing dominates; don’t fight that with avoidable heterogeneity. (proxima-one.github.io)
- Pre‑warm CRS/SRS and verifying keys
- Cache SRS and VKs across aggregator instances to avoid cold‑start spikes. Pin VK hashes in on‑chain verifiers; production stacks recommend version attestation and pinned verifiers to prevent accidental upgrades from spiking latency. (7blocklabs.com)
- Choose “compressed/recursion‑friendly” proof flavors
- Some zkVMs require a special compressed proof type for in‑circuit verification/aggregation; use the prescribed flavor to avoid unexpected recursion failures and retries. (docs.succinct.xyz)
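To see where the cycles go in wrong‑field work, here is a sketch of limb decomposition for a ~254‑bit BN254 base‑field element inside a Goldilocks‑native circuit. The limb width and count are illustrative choices; in a real circuit, every limb product additionally carries range and carry checks.

```python
# A BN254 base-field element (~254 bits) must be split into limbs that fit
# the recursion circuit's native field (Goldilocks here); each non-native
# multiplication then becomes many native products plus range checks.
GOLDILOCKS = 2**64 - 2**32 + 1
LIMB_BITS = 29   # illustrative: small enough that limb products cannot wrap

def to_limbs(x: int, n_limbs: int = 9) -> list[int]:
    mask = (1 << LIMB_BITS) - 1
    return [(x >> (LIMB_BITS * i)) & mask for i in range(n_limbs)]

# Sample ~254-bit value: the alt_bn128 base-field modulus.
a = to_limbs(0x30644E72E131A029B85045B68181585D97816A916871CA8D3C208C16D87CFD47)
b = to_limbs(3)

# Schoolbook limb multiplication: n^2 native products -- this, plus the
# range checks it implies, is the latency the text warns about.
partial = [0] * (len(a) + len(b) - 1)
for i, ai in enumerate(a):
    for j, bj in enumerate(b):
        partial[i + j] = (partial[i + j] + ai * bj) % GOLDILOCKS
print(f"{len(a) * len(b)} native products for one non-native multiplication")
```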
Operating with a prover network: keeping p95 low in the wild
- Reserve capacity and co‑locate queues
- Ask vendors for reserved lanes and SLAs; co‑locate recursion and wrapping jobs with high‑bandwidth NVMe/GPU to shrink queueing and I/O overheads. (docs.succinct.xyz)
- Decentralize for liveness and burst absorption
- AVS‑backed networks like Lagrange run many independent operators (Coinbase, OKX, Nethermind, etc.), offer subnetworks with dedicated bandwidth, and can absorb bursts without spiking latency. This materially improves batch completion times during traffic surges. (prnewswire.com)
- Set the right SDK flags
- Use “network” proving modes rather than local proving; validate environment flags so requests are actually distributed across machines, not stuck on one GPU. (docs.succinct.xyz)
Brief, in‑depth: how a single outer proof can attest mixed Plonk + STARK batches
- Inside the outer circuit (Groth16 or Plonk):
- Verify a Plonk accumulator:
- Given aPlonK/SnarkFold accumulator state (commitments and linear‑check witnesses), prove it corresponds to k underlying Plonk proofs without re‑running all pairings. Only the final multi‑pairing check happens once. (eprint.iacr.org)
- Verify a compressed STARK:
- Run the recursion verifier gadget that checks FRI or “packed” FRI openings (STARKPack), plus Merkle paths, for m receipts; this reuses the same hash and field choices tuned for recursion. (nethermind.io)
- Enforce cross‑batch invariants:
- Check that the Plonk accumulator and STARK batch commit to the same global state root or epoch. Then emit one succinct proof for on‑chain verification; gas is almost constant in m+k. (eips.ethereum.org)
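Written as pseudo‑constraints in Python, the outer statement looks roughly like this. Every name is illustrative, and each `assert` stands in for a real in‑circuit gadget (accumulator decider, FRI verifier, equality constraint).

```python
from dataclasses import dataclass

@dataclass
class PlonkAccumulator:
    state_root: bytes
    num_folded: int

@dataclass
class PackedStarkProof:
    state_root: bytes
    num_receipts: int

def decide_accumulator(acc: PlonkAccumulator) -> bool:
    return True   # stand-in for the single final multi-pairing check

def verify_packed_fri(p: PackedStarkProof) -> bool:
    return True   # stand-in for the Merkle-path + packed-FRI gadget

def outer_statement(acc: PlonkAccumulator, stark: PackedStarkProof,
                    claimed_root: bytes, k: int, m: int) -> bool:
    assert decide_accumulator(acc)        # k Plonk proofs, one pairing check
    assert verify_packed_fri(stark)       # m receipts, packed FRI openings
    assert acc.state_root == stark.state_root == claimed_root  # cross-batch invariant
    assert acc.num_folded == k and stark.num_receipts == m
    return True

acc = PlonkAccumulator(state_root=b"root", num_folded=1_000)
stark = PackedStarkProof(state_root=b"root", num_receipts=50)
print(outer_statement(acc, stark, b"root", k=1_000, m=50))  # True
```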
What “good” looks like in 2026: targets and sanity checks
- Batch assembly: adaptive N/T windows per stream; median wait ≤ T/2 under low load. (7blocklabs.com)
- Recursion/packing time: sub‑second marginal cost per added receipt on well‑tuned FRI recursion; total dominated by Merkle hashing, so pick hash/field combos wisely. (proxima-one.github.io)
- Wrapper choice: Groth16 when p95 matters; pick Plonk only when a universal SRS or specific compatibility demands it (remember the ~6s vs ~70s wrap overheads reported by one major stack). See the budget sketch after this list. (docs.succinct.xyz)
- On‑chain verify: target 200–300k gas per aggregate on EVM today; with Pectra chains, BLS12‑381 options may shift exact numbers—prototype both. (eips.ethereum.org)
- Prover network: measure on a latency‑optimized endpoint; distributed proving should flatten the size→latency curve. Real‑time block‑level proving demos have hit ≈10–12s with large GPU clusters, which is a useful North Star for p95 design. (docs.succinct.xyz)
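Putting the quoted figures together, a simple upper‑bound p95 budget looks like this. All stage timings are illustrative assumptions; in practice the Plonk accumulation stage overlaps with others, so the true critical path is shorter than the sum.

```python
# Illustrative p95 budget from the figures quoted in this article; swap in
# your own measurements. A plain sum is an upper bound because stages overlap.
def p95_budget(batch_wait: float, recursion: float, wrap: float,
               inclusion: float) -> float:
    return batch_wait + recursion + wrap + inclusion

groth16 = p95_budget(batch_wait=2.0, recursion=3.0, wrap=6.0, inclusion=1.0)
plonk   = p95_budget(batch_wait=2.0, recursion=3.0, wrap=70.0, inclusion=1.0)
print(f"Groth16 wrap p95 <= {groth16:.0f}s, Plonk wrap p95 <= {plonk:.0f}s")
```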
Emerging best practices we recommend to clients
- Start STARK, end SNARK (on Ethereum): recurse/pack STARK receipts, then wrap once in Groth16 for predictable gas and small calldata. (docs.succinct.xyz)
- Keep public inputs minimal: hash bulky statements into one field element before verification; saves gas and reduces encoding delays.
- Don’t SNARK‑wrap lots of medium receipts independently: always recurse first; wrapping adds a fixed‑seconds penalty per proof. (docs.succinct.xyz)
- Avoid wrong‑field work unless necessary: if you can, verify Plonk accumulators in the outer SNARK instead of inside a STARK recursion circuit. Use WARPfold‑style techniques only when requirements force mixed‑field recursion. (eprint.iacr.org)
- Reserve capacity: for strict SLOs, combine a latency‑optimized private lane on a decentralized prover network with pinned verifiers and version attestation. (docs.lagrange.dev)
Final take
If you aggregate mixed Plonk and STARK proofs and care about latency, the winning formula is: recurse/pack STARKs early, fold/accumulate Plonk proofs off the critical path, and wrap once with a tiny Groth16/Plonk outer proof—while engineering the queues and hardware for the long tail. With the 2025–2026 tooling (recursive FRI, aPlonK/SnarkFold, STARKPack, Pectra’s BLS12‑381 precompiles, and mature prover networks), you can hold p95 anywhere from a few seconds to roughly ten seconds for meaningful batch sizes—without compromising verifiability on L1. (nethermind.io)
References (selected)
- Succinct SP1 docs (recursion, wrappers, network usage and latencies). (docs.succinct.xyz)
- RISC Zero recursion & STARK‑to‑SNARK pipeline. (dev.risczero.com)
- STARKPack (Nethermind) for FRI‑based aggregation. (nethermind.io)
- aPlonK and SnarkFold for Plonk aggregation. (eprint.iacr.org)
- EIP‑1108 (bn128 repricing) and Pectra/BLS12‑381 precompiles. (eips.ethereum.org)
- Lagrange ZK Prover Network (EigenLayer, operators, subnetworks). (prnewswire.com)
- Plonky2 recursion profile (Merkle/FRI dominates). (proxima-one.github.io)
- Real‑time proving milestone (context for latency ceilings). (theblock.co)
Like what you're reading? Let's build together.
Get a free 30‑minute consultation with our engineering team.

