Summary: Launching a new rollup in 2026 means your “proof verification API” is a first‑order cost and UX lever. Here’s exactly what to benchmark for gas, latency, and throughput on Ethereum post‑Pectra/EIP‑2537 and post‑4844, with concrete target numbers, formulas, and vendor‑grade test cases you can copy into an RFP.

For a New Rollup Launch, What Proof Verification API Benchmarks Should I Request—Gas, Latency, and Throughput?

Decision-makers today don’t just choose a proving stack—they choose how, where, and how often proofs are verified. With Ethereum’s 2025 Pectra upgrade (BLS12‑381 precompiles + calldata repricing), Dencun (4844 blobs), and the rise of verification layers, the right benchmarks can cut L1 gas by 10–90%, shave p95 latencies from hours to minutes, and raise verified‑proof throughput by an order of magnitude. Below we lay out the precise metrics and target values to request from vendors. (blog.ethereum.org)

The one‑page TL;DR benchmarks to ask for

Gas per on‑chain verification:
- Groth16 on BN254: prove the vendor can hit ≤230k gas for l≈2 public inputs and report the exact slope w.r.t. public inputs: 207,700 + 7,160·l gas. Ask for a Foundry/Hardhat harness and transaction traces. (hackmd.io)
- Groth16/PLONK on BLS12‑381 (via EIP‑2537): show pairing‑only baseline of 37,700 + 32,600·k gas for k pairings; report full verifier totals including calldata. For a 3‑pairing check, budget ≈135,500 gas before scaffolding. (eips.ethereum.org)
- STARK on‑chain verification: disclose end‑to‑end gas (commonly multi‑million) and proof byte size; if offering “SNARK‑wrapped STARK,” report the wrapper’s Groth16/PLONK gas and the recursion overhead off‑chain. Expect raw STARK on Ethereum to be ≫1M gas; Starknet’s SHARP trains reference ≈6M gas per train as a reality check. (community.starknet.io)
- KZG point evaluation (4844): fixed 50,000 gas per check; vendor must show where and why it’s used (e.g., matching blobs to commitments in settlement flows). (eips.ethereum.org)
Latency SLOs (p50/p95/p99):
- On‑chain verify path: time‑to‑inclusion (slots) + time‑to‑finality (epochs). Expect inclusion in 1–2 slots (12–24s) and economic finality ≈2 epochs (~12.8 min) if you need “finalized” guarantees; ask vendors to disclose whether they gate UX on inclusion, safe head, or finality. (blocknative.com)
- Verification layers/aggregators: publish p95 “proof‑accepted” to “result on Ethereum” under light and heavy load, and the batching cadence. Modern layers return signed results within 1 block and settle in a few blocks; require a fallback to immediate‑post mode. (blog.alignedlayer.com)
Throughput:
- Native L1 verifying capacity math: with today’s gas limits, report proofs per block for your scheme. Example: at 45M block gas, ~204 Groth16 verifications at 220k gas fit in one block (~17/s at 12s block time). Make vendors do this math for their circuits and public‑input counts. (theblock.co)
- Verification layer targets: insist on published, measured proofs/sec and per‑proof amortized gas. State‑of‑the‑art mainnet beta numbers show 200 proofs/s and ≈2,100 gas per proof at scale; require reproducible dashboards and batch sizing disclosures. (blog.alignedlayer.com)
Cost sensitivity to calldata changes:
- Ask vendors to quantify the effect of EIP‑7623 (calldata floor cost) on their proof bytes and VK/public‑input payloads, and to show mitigation via aggregation or BLS compression. (eips.ethereum.org)

Why verification benchmarks changed in 2025–2026

Pectra made BLS12‑381 first‑class on Ethereum. You can now do pairings/MSM/mappings in precompiles at fixed gas schedules (e.g., pairing 37,700 + 32,600·k). That reduces the gas advantage BN254 previously had and lets you choose curves for security/interop, not just gas. Require vendors to support both BN254 and BLS12‑381 verifiers. (blog.ethereum.org)
Dencun’s 4844 blobs moved DA off calldata. It didn’t make your verifier calldata cheaper; proof bytes still pay calldata pricing when sent to a contract. The right response is proof aggregation and minimizing public inputs—benchmarks must surface those bytes and their cost. (eips.ethereum.org)
Ethereum’s higher block gas limits increase peak verifying throughput. Validators raised the limit through 2025 (to ~45M), directly affecting how many on‑chain verifications fit per block. Tie your throughput benchmarks to current limits, not outdated 30M assumptions. (theblock.co)

Benchmark dimension 1: Gas (and exactly how to measure it)

Ask vendors to run your reference proofs through three paths and report precise gas with traces:

Direct L1 verification (no external layer)

Groth16 over BN254 with l public inputs:
- Expect ≈207,700 + 7,160·l gas; have them show the slope by sweeping l∈{0,2,8,32}. This isolates public‑input decoding and MSM costs. (hackmd.io)
Groth16/PLONK over BLS12‑381:
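- Pairing baseline: 37,700 + 32,600·k gas for k pairings; then add the G1/G2 addition and MSM precompile calls (EIP‑2537 exposes no standalone scalar‑multiplication precompile) plus calldata. Have vendors supply the exact k from their verifier and compare the total against BN254. (eips.ethereum.org)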
Raw STARK:
- Require end‑to‑end gas and calldata bytes; for context, community data and production systems peg STARK verification and proof sizes orders of magnitude above SNARKs on L1. If they can’t show sub‑million gas on Ethereum for raw STARKs, demand a SNARK wrapper plan. (community.starknet.io)
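The public-input sweep for the BN254 path can be sketched in a few lines of Python. This only evaluates the rule-of-thumb model quoted above; the function and constant names are illustrative, and the output is an expectation to compare against vendor traces, not a measurement.

```python
# Rule-of-thumb gas model quoted in the text for a Groth16 verifier on BN254.
GROTH16_BN254_BASE_GAS = 207_700      # fixed part of the model
GROTH16_BN254_GAS_PER_INPUT = 7_160   # marginal cost per public input


def groth16_bn254_gas(num_public_inputs: int) -> int:
    """Modelled verification gas for l public inputs (an estimate, not a trace)."""
    if num_public_inputs < 0:
        raise ValueError("num_public_inputs must be non-negative")
    return GROTH16_BN254_BASE_GAS + GROTH16_BN254_GAS_PER_INPUT * num_public_inputs


if __name__ == "__main__":
    # Same sweep the text asks vendors to run: l in {0, 2, 8, 32}.
    for l in (0, 2, 8, 32):
        print(f"l={l:>2}: ~{groth16_bn254_gas(l):,} gas")
```

Under this model, l=2 lands at 222,020 gas and l=32 at 436,820 gas, so the ≤230k target only holds for very small l.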

SNARK‑wrapped STARK

Vendors should demonstrate that the outer Groth16/PLONK verify lands in the same 200–350k‑gas band and quantify any recursion overhead paid off‑chain. Push for a breakdown: verifier gas, calldata for proof bytes, and storage access. (7blocklabs.com)

Verification layer / aggregation service

Aligned’s current public materials: single‑proof batch ≈350k gas; ~40k gas/proof at n≈20; BLS aggregation verification ≈113k gas; target ~2,100 gas/proof at large batch sizes. Make them prove these figures on mainnet with your proof type. (blog.alignedlayer.com)
Electron Labs‑style super‑proofs: budget ~380k gas base per batch, amortized across the n proofs in it, plus ~16k gas for each consumer‑contract inclusion call. Have them show per‑proof cost curves as n grows. (docs.electron.dev)
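A per-proof cost curve for this kind of batching is simple to model. The sketch below assumes the simplest possible shape: one fixed base cost per batch, shared by all n proofs, plus a fixed per-proof inclusion cost. The defaults are the super-proof figures quoted above; the function name is illustrative, and a real vendor curve may deviate from this shape.

```python
def amortized_gas_per_proof(batch_size: int,
                            base_gas: int = 380_000,
                            per_proof_gas: int = 16_000) -> float:
    """Per-proof gas when one batch-level cost is shared by `batch_size` proofs.

    Assumed model: base_gas / n + per_proof_gas.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return base_gas / batch_size + per_proof_gas


if __name__ == "__main__":
    for n in (1, 8, 32, 128, 512):
        print(f"n={n:>3}: ~{amortized_gas_per_proof(n):,.0f} gas/proof")
```

Under this model the per-proof cost approaches the 16k inclusion cost as n grows, so the interesting vendor data is how large n can get before latency or transaction limits bite.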

Don’t forget the “invisible” gas line items:

BN254 precompiles (post‑EIP‑1108): 34,000·k + 45,000 pairing; 6,000 ECMUL; 150 ECADD—make vendors account for these, not just their “headline” pairing count. (eips.ethereum.org)
4844 KZG check: fixed 50,000 gas via 0x0a precompile when applicable. (eips.ethereum.org)
Calldata: with EIP‑7623’s floor cost for data‑heavy txs, aggregation that reduces bytes pays for itself faster—demand before/after bytes and gas. (eips.ethereum.org)
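For the KZG line item, it helps to know what the vendor is actually passing to the precompile. The sketch below builds the 192-byte input layout defined in EIP-4844 (versioned hash, z, y, commitment, proof) and derives the versioned hash from the commitment. It does not perform any KZG cryptography, and the bytes in the usage example are placeholders that a real precompile call would reject.

```python
import hashlib

POINT_EVALUATION_PRECOMPILE_ADDRESS = 0x0A
POINT_EVALUATION_GAS = 50_000
VERSIONED_HASH_VERSION_KZG = b"\x01"


def kzg_to_versioned_hash(commitment: bytes) -> bytes:
    """Version byte 0x01 followed by the last 31 bytes of SHA-256(commitment)."""
    if len(commitment) != 48:
        raise ValueError("KZG commitment must be 48 bytes")
    return VERSIONED_HASH_VERSION_KZG + hashlib.sha256(commitment).digest()[1:]


def encode_point_evaluation_input(commitment: bytes, z: bytes, y: bytes,
                                  proof: bytes) -> bytes:
    """Concatenate versioned_hash | z | y | commitment | proof (192 bytes)."""
    if len(z) != 32 or len(y) != 32:
        raise ValueError("z and y must be 32-byte big-endian field elements")
    if len(proof) != 48:
        raise ValueError("KZG proof must be 48 bytes")
    return kzg_to_versioned_hash(commitment) + z + y + commitment + proof


if __name__ == "__main__":
    # Placeholder bytes only: NOT a valid commitment or proof.
    dummy = encode_point_evaluation_input(b"\x11" * 48, b"\x00" * 32,
                                          b"\x00" * 32, b"\x22" * 48)
    print(len(dummy), "bytes, version byte", hex(dummy[0]))
```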

Concrete budgeting example you can reuse:

If you verify 10,000 Groth16 proofs/day at ~220k gas each and pay a 20 gwei base fee, that is 2.2 billion gas/day, or ≈44 ETH/day, just for verification. This is why aggregation and verification layers are now a standard line item in rollup budgets.
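The budgeting arithmetic is worth keeping as a reusable function so you can rerun it with vendor-reported gas. This is a minimal sketch: the function name is illustrative, and it prices everything at a single gas price, ignoring priority fees and fee volatility.

```python
GWEI_PER_ETH = 10**9


def daily_verification_cost_eth(proofs_per_day: int,
                                gas_per_proof: int,
                                gas_price_gwei: int) -> float:
    """Daily verification spend in ETH at a flat gas price."""
    total_gas = proofs_per_day * gas_per_proof
    return total_gas * gas_price_gwei / GWEI_PER_ETH


if __name__ == "__main__":
    direct = daily_verification_cost_eth(10_000, 220_000, 20)
    # Same volume at the ~40k gas/proof figure quoted for batches of about 20.
    batched = daily_verification_cost_eth(10_000, 40_000, 20)
    print(f"direct: {direct} ETH/day, batched: {batched} ETH/day")
```

With the inputs from the example, direct verification comes to 44 ETH/day, while the same volume at ~40k gas per proof comes to 8 ETH/day.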

Benchmark dimension 2: Latency (time to usable finality)

Define what “verified” means in your product:

Inclusion vs. “safe head” vs. finalized. Inclusion is one block (~12s), finalized is ~2 epochs (~12.8 minutes). In your SLOs, write separate targets for p95 inclusion and p95 finality. (blocknative.com)
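The timing figures follow directly from Ethereum's slot and epoch parameters. A minimal sketch of the conversion, with illustrative function names, assuming finality after two full epochs as stated above:

```python
SECONDS_PER_SLOT = 12
SLOTS_PER_EPOCH = 32


def inclusion_seconds(slots: int) -> int:
    """Wall-clock time for a given number of slots."""
    return slots * SECONDS_PER_SLOT


def finality_seconds(epochs: int = 2) -> int:
    """Wall-clock time for a given number of full epochs."""
    return epochs * SLOTS_PER_EPOCH * SECONDS_PER_SLOT


if __name__ == "__main__":
    print(inclusion_seconds(1), "s for one slot")
    print(finality_seconds() / 60, "min for two epochs")
```

Two epochs come to 768 seconds, which is the ~12.8 minutes used throughout this post.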

What to demand from vendors:

On‑chain path: p50/p95 slots‑to‑inclusion under normal load, and behavior during blob‑fee spikes (post‑4844). Require them to specify the posting strategy (e.g., immediate vs. batching every N blocks). (eips.ethereum.org)
Verification layer path: p95 “proof accepted” → “readable on chain,” batch interval, minimum confirmations before they expose a result, and a switch to “no‑batch” mode when the backlog grows. Aligned’s docs disclose “results readable after one block; batching for a few blocks to save gas.” Use that as the bar. (blog.alignedlayer.com)
ZK‑validity for OP‑stack style rollups: if you adopt ZK finality (e.g., OP Succinct), get the end‑to‑end proof submission cadence (today commonly ~hourly) and roadmap to minute‑scale finality. You’ll still pay on‑chain verification costs; vendors must separate proving latency from verify latency in their SLOs. (blog.succinct.xyz)
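When vendors hand over raw latency samples, you can recompute the percentiles yourself rather than trusting a summary. The sketch below uses the nearest-rank method with integer arithmetic; the method choice, the function name, and the sample values are assumptions for illustration, and a vendor may legitimately use an interpolating definition instead, so ask which one they report.

```python
from typing import Sequence


def nearest_rank_percentile(samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of
    observations at or below it. `p` is an integer in 1..100."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 1 <= p <= 100:
        raise ValueError("p must be between 1 and 100")
    ordered = sorted(samples)
    rank = (p * len(ordered) + 99) // 100  # ceil(p * n / 100) without floats
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical "proof accepted -> readable on chain" latencies, in seconds.
    latencies = [12, 12, 24, 12, 36, 12, 24, 12, 12, 48]
    for p in (50, 95, 99):
        print(f"p{p}: {nearest_rank_percentile(latencies, p)} s")
```

Run it separately on inclusion and finality samples so the two SLOs stay distinct.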

Benchmark dimension 3: Throughput (proofs/sec, proofs/block)

On Ethereum L1, capacity is gas‑bound:

Use current gas limits (≈45M/block as of mid‑2025) in your math. Examples:
- Groth16 at 220k gas → ~204 proofs/block (~17 proofs/s at 12s blocks).
- STARK verifier at 1M gas → ~45 proofs/block (~3.7 proofs/s).
- BLS aggregate signature verify (k=2) ~103k gas → ~437 verifies/block. Have vendors compute using their actual verifier gas, not brochure numbers. (theblock.co)
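The capacity math above is a floor division and is easy to rerun whenever the gas limit or a verifier's gas changes. A minimal sketch with illustrative names; it treats the whole block as available for verification and ignores per-transaction intrinsic gas and calldata, so the results are upper bounds.

```python
def proofs_per_block(block_gas_limit: int, verifier_gas: int) -> int:
    """Upper bound on verifications that fit in one block."""
    if verifier_gas <= 0:
        raise ValueError("verifier_gas must be positive")
    return block_gas_limit // verifier_gas


def proofs_per_second(block_gas_limit: int, verifier_gas: int,
                      slot_seconds: int = 12) -> float:
    """Upper bound on sustained verifications per second."""
    return proofs_per_block(block_gas_limit, verifier_gas) / slot_seconds


if __name__ == "__main__":
    limit = 45_000_000
    cases = {
        "Groth16 (220k)": 220_000,
        "STARK (1M)": 1_000_000,
        # 37,700 + 2 * 32,600: BLS12-381 pairing cost for k=2
        "BLS pairing check, k=2": 37_700 + 2 * 32_600,
    }
    for name, gas in cases.items():
        print(f"{name}: {proofs_per_block(limit, gas)} per block, "
              f"{proofs_per_second(limit, gas):.2f}/s")
```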

For verification layers, insist on real measurements:

Public mainnet beta figures sit at ~200 proofs/s today, and roadmaps cite 2,500+ proofs/s in tests. Ask for sustained throughput over 30–60 minutes, with success‑rate and queue‑length telemetry, and for how per‑proof gas amortizes as batch size grows. (blog.alignedlayer.com)

Practical examples (copy these into your RFP)

“BN254 Groth16 verifier, l=2 public inputs”

Must prove ≤230k gas on Holesky/Sepolia and mainnet; show a linear fit to 207,700 + 7,160·l with traces. Provide the exact calldata byte count and how many bytes are zero vs. non‑zero (calldata pricing differs). (hackmd.io)
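To check the zero versus non-zero byte breakdown and its gas impact yourself, the sketch below counts calldata bytes and applies the EIP-7623 rule: a transaction pays the larger of the standard cost (calldata plus execution) and a per-token floor. The constants are the ones in EIP-7623; the function names and the sample payload are illustrative, and contract-creation costs are left out.

```python
TX_BASE_GAS = 21_000
STANDARD_TOKEN_COST = 4          # gas per calldata token, standard pricing
TOTAL_COST_FLOOR_PER_TOKEN = 10  # gas per calldata token, floor pricing


def calldata_tokens(data: bytes) -> int:
    """EIP-7623 tokens: each zero byte counts 1, each non-zero byte counts 4."""
    zero_bytes = data.count(b"\x00")
    nonzero_bytes = len(data) - zero_bytes
    return zero_bytes + 4 * nonzero_bytes


def tx_gas_used(data: bytes, execution_gas: int) -> int:
    """Gas charged for a non-creation transaction under EIP-7623."""
    tokens = calldata_tokens(data)
    standard = STANDARD_TOKEN_COST * tokens + execution_gas
    floor = TOTAL_COST_FLOOR_PER_TOKEN * tokens
    return TX_BASE_GAS + max(standard, floor)


if __name__ == "__main__":
    # Hypothetical 256-byte payload: 28 zero bytes, 228 non-zero bytes.
    payload = b"\x00" * 28 + b"\x01" * 228
    print(tx_gas_used(payload, execution_gas=200_000))  # execution-heavy
    print(tx_gas_used(payload, execution_gas=1_000))    # data-heavy, floor binds
```

In this model the floor only binds when execution gas is below 6 gas per token, so a compute-heavy verifier call is priced as before, while byte-heavy, compute-light postings pay more.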

“BLS12‑381 Groth16 verifier, 3‑pairing check”

Show pairing compute ≈135,500 gas baseline (37,700 + 32,600×3) plus decoding/MSM overhead; compare total vs BN254 equivalent (≈147,000 gas pairing baseline). Include a justification for curve choice (security and interop). (eips.ethereum.org)
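The two pairing baselines can be compared side by side using the precompile schedules from EIP-2537 and EIP-1108. This is a sketch of the pairing cost only; function names are illustrative, and it leaves out MSM, decoding, and calldata, which a vendor's full total must include.

```python
def bls12_381_pairing_gas(k: int) -> int:
    """EIP-2537 pairing check: 37,700 + 32,600 per pair."""
    return 37_700 + 32_600 * k


def bn254_pairing_gas(k: int) -> int:
    """EIP-1108 pairing check: 45,000 + 34,000 per pair."""
    return 45_000 + 34_000 * k


if __name__ == "__main__":
    for k in (2, 3, 4):
        bls, bn = bls12_381_pairing_gas(k), bn254_pairing_gas(k)
        print(f"k={k}: BLS12-381 {bls:,} vs BN254 {bn:,} (diff {bn - bls:,})")
```

For k=3 this gives 135,500 versus 147,000 gas, a difference of 11,500 gas before any other costs.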

“SNARK‑wrapped STARK”

Provide gas for the outer Groth16/PLONK verifier; report the recursion overhead (off‑chain), the wrapped STARK proof size, and the net calldata cost with and without EIP‑7623 floor pricing. If proposing raw STARK on Ethereum, show a credible path below ~2–3M gas; otherwise default to wrapping. (community.starknet.io)

“Aligned‑style verification layer”

Target batch with 1 proof ≈350k gas; batch of 20 proofs ≈40k/proof; show per‑proof gas vs batch size up to your stated max. Publish p95 end‑to‑end latency and proofs/sec under load, plus a fallback to immediate posting when queues build up. (blog.alignedlayer.com)

“Electron super‑proof”

Demonstrate ~380k gas base per batch and ~16k gas per consumer inclusion call; supply a cost curve for n∈{8,32,128,512} and operational limits on n. (docs.electron.dev)

Emerging best practices (2026)

Prefer BLS12‑381 on Ethereum when you can
- With EIP‑2537 live, pairings/MSM/mapping are native and slightly cheaper per pairing than BN254, with stronger security. Use BN254 when toolchains demand it, but re‑evaluate defaulting to BN254 for cost reasons—it’s no longer the only cheap option. (blog.ethereum.org)
Minimize public inputs aggressively
- Your Groth16 gas grows linearly with public inputs; put stable data in the VK or commit to it and pass a hash as a single public input. Test with the 207,700 + 7,160·l model. (hackmd.io)
Treat calldata as a precious resource
- EIP‑7623 introduced a floor cost for data‑heavy transactions. If you’re shipping lots of bytes (proofs, signatures, witnesses), aggregate them (proof recursion, BLS fast‑aggregate) and post fewer bytes. This reduces both gas and inclusion latency variance. (eips.ethereum.org)
Use 4844 blobs for DA, not for proof bytes
- Blobs are perfect for your batched L2 data, with a 50k‑gas KZG check per blob evaluation. But your proof verifier still consumes calldata—keep proof objects tiny and well‑aggregated. (eips.ethereum.org)
Engineer for today’s gas limits, monitor changes
- With validators having raised gas limits (e.g., to ~45M in 2025), your per‑block verifying capacity increases. Build dashboards that recompute “proofs per block” as limits change. (theblock.co)
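The "pass a hash as a single public input" practice can be sketched as follows. This assumes SHA-256 over 32-byte big-endian encodings, reduced modulo the BN254 scalar field; that is one common pattern, not the only one, and the function names are illustrative. The circuit has to recompute the same hash, which adds proving cost off-chain, and the contract still needs the raw values (or its own copy of them) to rebuild the hash on-chain.

```python
import hashlib
from typing import Sequence

# Order of the BN254 scalar field (the field Groth16 public inputs live in).
BN254_SCALAR_FIELD = (
    21888242871839275222246405745257275088548364400416034343698204186575808495617
)


def compress_public_inputs(inputs: Sequence[int]) -> int:
    """Hash many values into one field element to use as the sole public input."""
    payload = b"".join(x.to_bytes(32, "big") for x in inputs)
    digest = hashlib.sha256(payload).digest()
    return int.from_bytes(digest, "big") % BN254_SCALAR_FIELD


def modelled_gas(num_public_inputs: int) -> int:
    """Groth16-on-BN254 rule of thumb from the text."""
    return 207_700 + 7_160 * num_public_inputs


if __name__ == "__main__":
    single = compress_public_inputs(list(range(32)))
    print(f"single public input: {single}")
    print(f"modelled gas, 32 inputs: {modelled_gas(32):,}")
    print(f"modelled gas, 1 input:  {modelled_gas(1):,}")
```

The model puts 32 inputs at 436,820 gas and one input at 214,860 gas; the on-chain hash and the extra calldata eat into that gap, so have vendors report the net figure.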

Vendor questions that surface real trade‑offs

Gas and bytes
- What’s your exact gas formula for my verifier (fixed + per public input + per pairing)? Show calldata bytes and their zero/non‑zero composition.
- If we switch BN254→BLS12‑381, how do gas and calldata change?
Latency policy
- Do you expose results at inclusion, at safe head, or at finality? What are p50/p95 under nominal load? Under blob‑fee spikes?
- For verification layers: what is your default batch cadence? When do you switch to “post immediately”?
Throughput headroom
- Publish sustained proofs/sec for both 30‑minute and hour‑long runs; report the max batch size before timeouts or calldata limits hit.
- What is your on‑chain gas ceiling per settlement tx? How many proofs does that comfortably amortize?
Failure modes and fallbacks
- If your aggregator stalls, can we bypass and verify directly on L1? What’s the on‑chain selector or config toggle?
- How do you handle partial batches (some proofs bad)? What’s the re‑prove path?
Ops and audits
- Provide Foundry/Hardhat tests demonstrating gas numbers and revert‑free paths; include function‑level traces.
- Share audits of your verifier contracts and precompile call encodings.

A short, concrete “acceptance test” you can require

Provide a repo with:
- A BN254 Groth16 verifier and a BLS12‑381 Groth16 verifier for the same circuit; a script runs both with l∈{0,2,8,32} public inputs and dumps gas and calldata bytes per run. The BN254 runs must fit the 207,700 + 7,160·l model within ±3%. (hackmd.io)
- A KZG point‑evaluation check example demonstrating a 50,000‑gas precompile call with valid/invalid inputs and the exact encoding. (eips.ethereum.org)
- An aggregator path (Aligned/super‑proof or equivalent): single‑proof batch, n=20, n=256; report on‑chain gas, proofs/sec, p95 time to “readable on Ethereum,” and the fallback toggle to immediate posting. (blog.alignedlayer.com)
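The ±3% requirement on the BN254 runs is easy to automate once the vendor's script dumps gas per run. A minimal sketch, assuming the results arrive as a mapping from public-input count to measured gas; the names are illustrative and the measured values below are made up purely to show a pass and a fail.

```python
from typing import Dict


def model_gas(num_public_inputs: int) -> int:
    """Groth16-on-BN254 rule of thumb from the text."""
    return 207_700 + 7_160 * num_public_inputs


def within_tolerance(measured_gas: int, num_public_inputs: int,
                     tolerance: float = 0.03) -> bool:
    """True if the measurement is within ±tolerance of the model."""
    expected = model_gas(num_public_inputs)
    return abs(measured_gas - expected) <= tolerance * expected


def check_runs(runs: Dict[int, int], tolerance: float = 0.03) -> Dict[int, bool]:
    """Map each public-input count to a pass/fail flag."""
    return {l: within_tolerance(gas, l, tolerance) for l, gas in runs.items()}


if __name__ == "__main__":
    # Hypothetical measurements, NOT real vendor data.
    measured = {0: 209_500, 2: 223_900, 8: 268_000, 32: 452_000}
    for l, ok in check_runs(measured).items():
        print(f"l={l:>2}: {'PASS' if ok else 'FAIL'}")
```

A failure at large l with passes at small l usually points at the per-input slope, which is exactly what the sweep is meant to expose.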

Reality checks and back‑of‑the‑envelope math you should do

Proofs per block = floor(block_gas_limit / verifier_gas). At 45M/block and 220k/proof, expect ~204 verifications per block; at 1M/proof, ~45. If your rollup requires more, you either aggregate, use a verification layer, or you won’t keep up at peak. (theblock.co)
STARK proof sizes are big; if anyone proposes posting raw STARKs to Ethereum frequently, ask for bytes and gas. Community and production references show ~100–200kB proofs and multi‑million gas verifies—SNARK‑wrapping is the norm on Ethereum. (perama-v.github.io)

What “good” looks like in 2026

For direct L1 verification:
- Groth16 BN254: 210–260k gas for small l; well‑documented with traces. (hackmd.io)
- BLS12‑381 verifier path: cleanly implemented with 0x0b–0x11 precompiles; pairing counts and gas math documented. (eips.ethereum.org)
For verification layers:
- Batch sizes that keep amortized per‑proof gas under ~40k at modest n (~20), with a credible path toward ~2,100 gas/proof at scale; p95 to on‑chain readability ≤ a few blocks, not minutes. (blog.alignedlayer.com)
For ops:
- Dashboards exposing per‑proof gas, bytes, p95 latency, backlog depth, and block‑by‑block inclusion outcomes; alarms when latency or batch sizes deviate from SLOs.

Appendix: Handy reference numbers you can cite in meetings

BN254 precompiles (EIP‑1108): ECADD 150 gas; ECMUL 6,000; Pairing 45,000 + 34,000·k. (eips.ethereum.org)
Groth16 BN254 gas model: ≈207,700 + 7,160·l. (hackmd.io)
BLS12‑381 precompiles (EIP‑2537): pairing 37,700 + 32,600·k; map Fp2→G2 23,800 gas. Live on mainnet since May 7, 2025 (Pectra). (eips.ethereum.org)
KZG point evaluation (4844): fixed 50,000 gas at 0x0a. (eips.ethereum.org)
Calldata floor pricing (EIP‑7623): increased cost for data‑heavy transactions—aggregate to save bytes. (eips.ethereum.org)
Ethereum timing: 12‑second slots; ~6.4‑minute epochs; finality typically after 2 epochs (~12.8 minutes). (info.etherscan.com)
Block gas limit: raised through 2025 toward ~45M, increasing per‑block verification headroom. Recompute proofs/block when limits change. (theblock.co)
Verification layers (public benchmarks): ~350k gas for single‑proof batch; ~40k/proof at n≈20; targets of ~2,100 gas/proof at scale; 200 proofs/sec today on mainnet beta. Validate with your proof type. (blog.alignedlayer.com)

If you only add one page to your vendor RFP, make it this: exact gas formulas (with traces), latency SLOs split by inclusion/finality, and sustained proofs/sec with amortized gas curves. In 2026, those three numbers—gas, latency, throughput—are the difference between a rollup that scales with predictable unit economics and one that fights the mempool every time the market wakes up.