By AUJay
Gas-Efficient Batching of Groth16 Proofs on Ethereum for a High-Throughput Rollup: Real Gas Savings Benchmarks
Summary: This post distills how to batch and verify large numbers of Groth16 proofs on Ethereum with precise gas models and real, decision-grade benchmarks. We quantify savings across naive, on-chain batch, recursive/aggregated, and external verification-layer approaches on BN254 and BLS12-381 (post-Pectra), and outline concrete best practices your rollup can implement this quarter.
Why this matters now
Ethereum’s 2025–2026 roadmap changed the calculus for on-chain proof verification. Istanbul’s EIP-1108 already slashed BN254 precompile costs; Pectra added BLS12‑381 precompiles (EIP‑2537), and calldata pricing tightened (EIP‑7623). The result: Groth16 remains the most economical L1-verified proof for rollups, but the best way to batch proofs depends on your throughput target, public inputs, and latency budget. (eips.ethereum.org)
The baseline: what a single Groth16 verify actually costs on Ethereum
- Pairings dominate verification cost. After EIP‑1108, Ethereum’s BN254 pairing check costs 45,000 base gas plus 34,000 per pairing. Typical Solidity verifiers perform four pairings, for 181,000 gas just for the pairings. (eips.ethereum.org)
- Public inputs cost linear gas due to an MSM over the verifying key’s IC points. On BN254 (no MSM precompile), the de facto pattern uses ECMUL (6,000 gas) + ECADD (150 gas) per input ≈ 6,150 gas per public input. Including scaffolding and calldata, multiple independent analyses converge to a simple rule-of-thumb: 207,700 + 7,160 × l gas, where l = number of public inputs. Empirically, that puts a two-input proof around ~220k gas and eight inputs around ~265k gas. (hackmd.io)
- Proof bytes in calldata are small but non-zero. A BN254 Groth16 proof is 256 bytes (G1: 64, G2: 128, G1: 64) ≈ 4,096 gas at 16 gas/byte; if your transaction is “data heavy,” EIP‑7623 can raise the effective calldata floor, a meaningful consideration when you post many proofs per tx. (xn--2-umb.com)
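The three cost components above can be folded into a small calculator. This is a minimal Python sketch that only restates the figures quoted in this section; the function names are illustrative, and it assumes every proof byte is nonzero (16 gas per byte).

```python
# Gas figures quoted in the text (BN254, post EIP-1108).
PAIRING_BASE = 45_000
PAIRING_PER_PAIR = 34_000
ECMUL_GAS = 6_000
ECADD_GAS = 150
CALLDATA_GAS_PER_BYTE = 16   # assumption: every proof byte is nonzero
PROOF_BYTES = 64 + 128 + 64  # A in G1, B in G2, C in G1


def pairing_gas(pairs: int) -> int:
    """Cost of one pairing-check precompile call over `pairs` pairs."""
    return PAIRING_BASE + PAIRING_PER_PAIR * pairs


def public_input_msm_gas(l: int) -> int:
    """One ECMUL plus one ECADD per public input (no MSM precompile on BN254)."""
    return (ECMUL_GAS + ECADD_GAS) * l


def rule_of_thumb_verify_gas(l: int) -> int:
    """Empirical end-to-end estimate quoted in the text."""
    return 207_700 + 7_160 * l


print("4 pairings:", pairing_gas(4))
print("proof calldata:", PROOF_BYTES * CALLDATA_GAS_PER_BYTE)
for l in (2, 8):
    print(f"l={l}: MSM {public_input_msm_gas(l)}, total ~{rule_of_thumb_verify_gas(l)}")
```

The gap between the raw precompile cost (pairings plus MSM) and the rule of thumb is the scaffolding: memory handling, calldata decoding, and call overhead in the verifier contract.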
What about BLS12‑381 after Pectra?
- EIP‑2537 adds BLS12‑381 precompiles with a cheaper pairing schedule than BN254: 37,700 base + 32,600 per pair. That’s 168,100 gas for four pairings vs. 181,000 on BN254. However, each base-field element is encoded in 64 bytes, so a BLS12‑381 Groth16 proof (G1: 128 bytes, G2: 256, G1: 128) is 512 bytes, double the calldata. For public inputs, use the new G1MSM precompile (per-scalar cost falls with the built-in discount table) rather than looping ECMUL/ECADD in Solidity. (eips.ethereum.org)
Takeaway for a single proof:
- BN254 remains excellent if you need the smallest calldata and you’re already on BN254.
- BLS12‑381 is newly viable: slightly cheaper pairings, stronger security margin, MSM precompile, but more calldata. Model your exact l and data profile before switching. (eips.ethereum.org)
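To make the curve comparison concrete, here is a rough Python sketch of the BLS12‑381 side. The pairing schedule and proof size come from the text. The 12,000 gas G1 multiplication cost and the first four discount-table entries are transcribed from EIP‑2537 for illustration and should be confirmed against the EIP before you rely on them. Function names are hypothetical.

```python
# EIP-2537 pairing schedule as quoted in the text.
BLS_PAIRING_BASE = 37_700
BLS_PAIRING_PER_PAIR = 32_600

# Assumption: G1 multiplication cost and the first discount entries
# (per-mille, k = 1..4) as transcribed from EIP-2537; verify before use.
BLS_G1_MUL_GAS = 12_000
G1_MSM_DISCOUNT = {1: 1000, 2: 949, 3: 848, 4: 797}

BLS_PROOF_BYTES = 128 + 256 + 128  # A in G1, B in G2, C in G1


def bls_pairing_gas(pairs: int) -> int:
    """Cost of one BLS12-381 pairing check over `pairs` pairs."""
    return BLS_PAIRING_BASE + BLS_PAIRING_PER_PAIR * pairs


def bls_g1msm_gas(k: int) -> int:
    """G1MSM cost for k (point, scalar) pairs; only k <= 4 is tabulated here."""
    if k not in G1_MSM_DISCOUNT:
        raise ValueError("extend G1_MSM_DISCOUNT from EIP-2537 for larger k")
    return k * BLS_G1_MUL_GAS * G1_MSM_DISCOUNT[k] // 1000


print("BLS12-381, 4 pairings:", bls_pairing_gas(4))
print("BLS12-381, G1MSM for 3 inputs:", bls_g1msm_gas(3))
print("BLS12-381, proof calldata at 16 gas/byte:", BLS_PROOF_BYTES * 16)
```

For small l the MSM discount barely matters; the table only starts to pay off as the number of public inputs grows.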
3 vs 4 pairings: bleeding 34k–32.6k gas you don’t have to
Groth16’s math allows verification with three pairings. Many Solidity templates still do four, leaving ~34k (BN254) or ~32.6k (BLS12‑381) gas on the table each call. If you control your verifier, implement the 3‑pairing product check; if you use a generator (e.g., snarkjs), ensure you’re on a template that emits the 3‑pairing variant and includes malleability checks. (docs.pantherprotocol.io)
Batching strategies you can actually deploy (and what they cost)
There are four practical patterns for “batching” Groth16 proofs. Pick based on how many proofs you settle per L1 period and how much latency you can tolerate.
1) Naive N× on-chain verification (don’t)
Cost: N × (207,700 + 7,160 × l) gas on BN254 with four pairings per proof; clearly unscalable for high N. Even with three pairings and code optimizations, you’re still linear in N. Use only for very small N or one-offs. (hackmd.io)
2) On-chain batch verification via random linear combination
You can batch-verify n proofs of the same circuit by combining their equations with random field coefficients, reducing pairings from ~3n or 4n to ~n+2, while still paying MSM over l per proof. This trims constant factors but remains O(n) and quickly hits gas limits as n grows. The batch method is standard in libraries (off-chain) and is well-understood cryptographically. On-chain, it’s attractive for n in the low tens. (encrypt.a41.io)
Concrete BN254 example (n = 64, l = 3):
- Pairings: 45,000 + 34,000 × (n + 2) = 45,000 + 34,000 × 66 = 2,289,000 gas.
- MSM over public inputs: 6,150 × l × n = 6,150 × 3 × 64 = 1,180,800 gas.
- Proof calldata: ~4,096 × 64 = 262,144 gas, before EIP‑7623 floor effects.
- Total ≈ 3.73M gas in one tx—doable, but leaves little headroom for other logic. (eips.ethereum.org)
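The arithmetic in this example generalizes to any n and l. The Python sketch below reproduces the three line items and adds a helper that inverts the formula to find the largest batch fitting a gas budget. The budget is a hypothetical parameter you would set yourself, and the model ignores application logic and any extra scalar multiplications your batching contract performs.

```python
def batch_verify_gas(n: int, l: int) -> dict:
    """Gas model from the text for batching n BN254 proofs with l inputs each."""
    pairings = 45_000 + 34_000 * (n + 2)  # n + 2 pairings in one precompile call
    msm = 6_150 * l * n                   # ECMUL + ECADD per input, per proof
    calldata = 4_096 * n                  # 256 bytes per proof at 16 gas/byte
    return {
        "pairings": pairings,
        "msm": msm,
        "calldata": calldata,
        "total": pairings + msm + calldata,
    }


def max_batch_size(l: int, gas_budget: int) -> int:
    """Largest n whose modeled total stays within gas_budget (0 if none fits)."""
    fixed = 45_000 + 34_000 * 2
    per_proof = 34_000 + 6_150 * l + 4_096
    return max(0, (gas_budget - fixed) // per_proof)


print(batch_verify_gas(64, 3))
print("max n for l=3 under a 5M gas budget:", max_batch_size(3, 5_000_000))
```

Each additional proof costs about 56.5k gas at l = 3 in this model, so the batch size you can afford scales almost linearly with the gas you are willing to spend.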
3) Recursion/wrapping to a single Groth16/Plonk proof (today’s default for throughput)
You verify many leaf proofs inside a recursion circuit off-chain and post a single succinct proof to L1. On Ethereum, the best production stacks wrap large proof sets into one Groth16 or Plonk proof, keeping on-chain verification roughly constant. Empirical ranges we see in practice for the final on-chain verify are ≈200k–300k gas on BN254 for small l; you trade off seconds of off-chain proving for amortized on-chain cost. (7blocklabs.com)
- If you choose Groth16 as the outermost wrapper, you’re squarely in the 200k–300k verify band, with precise cost driven by l. Keep l tiny (one root, one block number, one domain separator). (hackmd.io)
- If you choose BLS12‑381 Groth16/Plonk as the wrapper, pairings are cheaper and MSM is precompiled, but proof bytes double. Model your calldata exposure in a world with EIP‑7623’s floor. (eips.ethereum.org)
4) Proof aggregation systems and verification layers
- SnarkPack (Groth16 aggregation) compresses n Groth16 proofs into an aggregated object with O(log n) verifier time and proof size; in practice, Protocol Labs reported aggregating 8,192 proofs in ~8–9s and verifying in tens of milliseconds. On-chain, that maps to “a few” pairings plus modest MSM—i.e., a constant-scale verify w.r.t. L1 gas even at very large n. Use when you can centralize aggregation and accept that single-aggregator trust model (or decentralize the aggregator). (research.protocol.ai)
- External verification layers (e.g., Aligned Layer) offload heavy verification to a restaked operator set and return a BLS-attested result to Ethereum. Published figures: ~350–380k gas base per batch and ~16k gas per consumer inclusion check, making per-proof cost ≈ 16k + 380k/n. This is compelling when you want near-constant L1 cost without building recursion yourself and can accept the AVS trust/latency profile. (blog.alignedlayer.com)
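The per-proof formula for a verification layer is easy to tabulate. A minimal Python sketch, using only the published figures quoted above as defaults (the function names are illustrative):

```python
def layer_total_gas(n: int, base_gas: int = 380_000, inclusion_gas: int = 16_000) -> int:
    """One batch posting plus n consumer inclusion checks."""
    return base_gas + inclusion_gas * n


def layer_per_proof_gas(n: int, base_gas: int = 380_000, inclusion_gas: int = 16_000) -> float:
    """Amortized cost per proof: inclusion_gas + base_gas / n."""
    return layer_total_gas(n, base_gas, inclusion_gas) / n


for n in (16, 64, 256, 1024):
    low = layer_per_proof_gas(n, base_gas=350_000)
    high = layer_per_proof_gas(n, base_gas=380_000)
    print(f"n={n}: {low:,.0f} to {high:,.0f} gas per proof")
```

As n grows the base cost washes out and the per-proof figure approaches the 16k inclusion check, which is the floor for this approach whenever every proof needs its own on-chain check.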
Real benchmarks you can budget against
Below are “decision-grade” comparisons for an L1 settlement that needs to attest to N leaf Groth16 proofs of the same circuit with l public inputs each. We show BN254 numbers; BLS12‑381 follows analogously with slightly cheaper pairings but double calldata and different MSM pricing.
Assumptions for BN254 unless stated:
- Single verify: 207,700 + 7,160 × l gas. Batch verify: pairings scale to n + 2, MSM remains l per proof. Calldata: 256 bytes per proof (≈4,096 gas), subject to EIP‑7623 floors when data-heavy. All numbers rounded. (hackmd.io)
- N = 64, l = 3
- Naive (64 separate verifies): 64 × (207,700 + 21,480) ≈ 14.7M gas, roughly a third to half of a 30–45M gas block. Not practical. (hackmd.io)
- On-chain batch verify (random linear combo):
- Pairings ≈ 2.289M, MSM ≈ 1.181M, calldata ≈ 0.262M → ≈ 3.73M gas total. (eips.ethereum.org)
- Recursive wrapper to one Groth16 (outer l = 2–3):
- ≈ 220k–235k gas. Savings vs. naive: ~98.5%. Savings vs. on-chain batch: ~93–94%. (hackmd.io)
- Verification layer (Aligned):
- ~380k base + 64 × 16k = 1.4M gas; per-proof amortized ≈ 22k. Savings vs. on-chain batch: ~62%. (docs.electron.dev)
- N = 256, l = 2
- Naive: ~56.8M gas (non-starter).
- On-chain batch verify:
- Pairings: 45k + 34k × 258 = 8.817M; MSM: 6,150 × 2 × 256 = 3.149M; calldata: 1.05M → ≈ 13.01M gas (oversized for a single 30–45M gas block after other activity).
- Recursive wrapper:
- ~220k–230k gas (outer l = 2). Even with EIP‑7623, calldata is small (single proof). This is the proven path for high throughput. (hackmd.io)
- Verification layer:
- ~380k + 256 × 16k ≈ 4.48M gas if you need per-consumer inclusion checks en masse. If you only post once to a hub and let consumers query off-chain, on-chain gas stays near constant. (docs.electron.dev)
- BLS12‑381 variant (post‑Pectra), N = 64, l = 3, aggregated into one BLS12‑381 Groth16
- Pairings: 37,700 + 32,600 × 4 = 168,100 gas.
- MSM: use G1MSM precompile; at l = 3 the discount is minor (per-scalar near ~9–12k), so add ≲ 30–36k gas.
- Calldata: 512 bytes proof ≈ 8,192 gas; small vs. pairing/MSM.
- Total verify: ≈ 210–225k gas, comparable to BN254 but with stronger security and better scaling for larger MSMs; re-check if your tx becomes “data heavy” (EIP‑7623). (eips.ethereum.org)
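The BN254 comparisons above can be regenerated from the stated assumptions, which is handy when you want to plug in your own N and l. This Python sketch uses only the formulas given in this section; the function names are illustrative, and the verification-layer defaults take the upper published base figure.

```python
def single_verify(l: int) -> int:
    return 207_700 + 7_160 * l


def naive(n: int, l: int) -> int:
    return n * single_verify(l)


def onchain_batch(n: int, l: int) -> int:
    return 45_000 + 34_000 * (n + 2) + 6_150 * l * n + 4_096 * n


def recursive_wrapper(outer_l: int) -> int:
    # One outer Groth16 verify, independent of the number of leaf proofs.
    return single_verify(outer_l)


def verification_layer(n: int, base: int = 380_000, per_check: int = 16_000) -> int:
    return base + per_check * n


def saving(new: int, old: int) -> float:
    """Fraction of gas saved when moving from `old` to `new`."""
    return 1 - new / old


for n, l in ((64, 3), (256, 2)):
    costs = {
        "naive": naive(n, l),
        "on-chain batch": onchain_batch(n, l),
        "recursive wrapper": recursive_wrapper(l),
        "verification layer": verification_layer(n),
    }
    print(f"N={n}, l={l}")
    for name, gas in costs.items():
        print(f"  {name:<18} {gas:>12,} gas  ({saving(gas, costs['naive']):.1%} vs naive)")
```

Because the wrapper cost does not depend on N, its saving against the naive approach is simply 1 − 1/N when the outer and leaf circuits have the same number of public inputs.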
Emerging best practices for gas-efficient Groth16 batching in production
- Prefer recursion/wrapping for high throughput; batch-verify on-chain only for small n
- If you settle dozens to thousands of leaf proofs per L1 interval, design a recursion tree and wrap to a single Groth16 or Plonk proof. This is the most battle-tested way to turn O(n) verifies into ≈O(1) gas. Keep outer public inputs to a bare minimum (e.g., a Merkle root, L2 state root, and a counter). (7blocklabs.com)
- Shrink public inputs aggressively (they’re your gas multiplier)
- Every public input costs ~6,150–7,160 gas on BN254 and a similar order on BLS12‑381 with MSM. Hash or commit auxiliary data off-chain and expose only compact commitments on-chain. This is often the single highest-ROI optimization. (hackmd.io)
- Switch to the 3‑pairing verifier
- Audit your verifier to ensure it uses the 3‑pairing product equation and rejects malleable points via < q checks. The 3‑pairing swap saves ≈34k/verify on BN254 and ≈32.6k on BLS12‑381. If you rely on codegen (e.g., snarkjs), ensure your template is modern and includes the hardening changes. (github.com)
- Re-evaluate your curve post‑Pectra
- BN254 still wins on calldata size and a well-tuned cost model. But with EIP‑2537, BLS12‑381 pairings are slightly cheaper, MSM is natively precompiled with volume discounts, and you gain 120‑bit+ security. For large-l circuits or big MSMs, BLS12‑381 can be neutral-to-better in gas even after doubling the proof bytes. Run your numbers. (eips.ethereum.org)
- Account for calldata floors (EIP‑7623)
- For “data-heavy” txs (e.g., posting many proofs), EIP‑7623 sets a floor of 10 gas per zero byte and 40 gas per nonzero byte of calldata, which applies when the transaction’s execution gas is low relative to its data. This tilts you toward recursion (one small proof) or a verification layer (constant-size attestations) rather than uploading many proofs in one call. (eips.ethereum.org)
- If you must batch on-chain, cap n and keep l tiny
- For n in the low tens and l ≤ 3, on-chain batch verification is viable and keeps latency minimal (no waiting for recursion). Estimate using: gas ≈ (45k + 34k × (n + 2)) + n × l × 6,150 + 4,096 × n (BN254), then add safety headroom and any application logic. (eips.ethereum.org)
- Consider an external verification layer when you need “constant” L1 gas without building recursion
- If you prefer operational simplicity, outsource verification to a restaked network (e.g., Aligned) that returns a BLS-attested result. Budget ~350–380k gas base and ~16k per consumer inclusion check, with near-constant L1 cost as n grows. Evaluate latency/security trade-offs vs. native recursion. (blog.alignedlayer.com)
- Prover-side advances are your friend
- GPU-accelerated Groth16 stacks (e.g., ICICLE-Snark) materially reduce off-chain aggregation latency; combine with recursion to keep L1 cost near-constant while hitting aggressive throughput SLOs. (ingonyama.com)
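Whether the EIP‑7623 floor actually applies to a given transaction depends on how much execution gas accompanies the calldata. The Python sketch below follows the EIP's formula for a non-creation transaction: calldata tokens are zero bytes plus four times nonzero bytes, and the charge is the larger of the standard cost and the floor. The payloads and execution-gas figures in the usage lines are made-up inputs for illustration.

```python
TX_BASE_GAS = 21_000
STANDARD_TOKEN_COST = 4    # gas per calldata token under the standard schedule
FLOOR_COST_PER_TOKEN = 10  # gas per calldata token under the EIP-7623 floor


def calldata_tokens(data: bytes) -> int:
    """Zero bytes count as 1 token, nonzero bytes as 4."""
    zeros = data.count(0)
    return zeros + 4 * (len(data) - zeros)


def floor_applies(data: bytes, execution_gas: int) -> bool:
    tokens = calldata_tokens(data)
    return FLOOR_COST_PER_TOKEN * tokens > STANDARD_TOKEN_COST * tokens + execution_gas


def tx_gas(data: bytes, execution_gas: int) -> int:
    """Total gas for a non-creation transaction under EIP-7623."""
    tokens = calldata_tokens(data)
    standard = STANDARD_TOKEN_COST * tokens + execution_gas
    floor = FLOOR_COST_PER_TOKEN * tokens
    return TX_BASE_GAS + max(standard, floor)


# Hypothetical payload: 64 BN254 proofs, every byte assumed nonzero.
proofs = b"\x01" * (256 * 64)
for execution_gas in (100_000, 3_469_800):
    print(execution_gas, floor_applies(proofs, execution_gas), tx_gas(proofs, execution_gas))
```

In this model the floor only binds when execution gas is below six times the token count, so a call that spends most of its gas on pairings and MSM is charged at the standard rate, while a transaction that mainly posts proof bytes for later use pays the floor.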
Implementation notes your engineers will thank you for
- Precompiles and addresses:
- BN254: ECADD 0x06 (150 gas), ECMUL 0x07 (6,000 gas), ECPAIRING 0x08 (45,000 + 34,000·k). (eips.ethereum.org)
- BLS12‑381 (EIP‑2537): G1ADD 0x0b (375), G1MSM 0x0c (discounted), G2ADD 0x0d (600), G2MSM 0x0e, PAIRING 0x0f (37,700 + 32,600·k), MAP FP→G1 0x10, MAP FP2→G2 0x11. Use MSM where possible. (eips.ethereum.org)
- Solidity verifier hygiene:
- Enforce < q checks and subgroup rules to prevent malleability; recent verifier templates (e.g., snarkjs updates) include these fixes. (github.com)
- Use custom errors over string requires; compile via IR; minimize memory copies when calling precompiles; pass calldata pointers directly when safe. These are incremental but free gas wins at scale. (ethereum.stackexchange.com)
- Calldata planning:
- BN254 element sizes: G1 64 bytes, G2 128 bytes. BLS12‑381: G1 128 bytes, G2 256 bytes. Don’t attempt point compression for precompile calls; both the BN254 and BLS12‑381 precompiles expect uncompressed coordinates. (xn--2-umb.com)
- Public input layout:
- Favor hashing or Merkle-accumulation to one or two field elements. On BN254, each additional input is ~7,160 gas; on BLS12‑381 with MSM, per-scalar cost depends on k via discounts, but is of the same order when l is small. (hackmd.io)
- Three-pairing product:
- Review your pairing product equation and move to a 3‑pairing check if your stack still uses four; we routinely measure ≈15–20% savings on the pairing portion. Validate correctness with formal tests and cross-implementations. (xn--2-umb.com)
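One way to get down to a single public input is to hash all auxiliary data and expose only the digest, reduced to fit the scalar field. The Python sketch below is an illustration under stated assumptions: it uses SHA-256 because it ships with the standard library, whereas your circuit and contract may use a different hash, and the helper name and length-prefixed encoding are hypothetical. Keeping the top 253 bits of the digest guarantees the result is below the BN254 scalar field modulus.

```python
import hashlib

# BN254 scalar field modulus (the field that public inputs live in).
BN254_SCALAR_FIELD = (
    21888242871839275222246405745257275088548364400416034343698204186575808495617
)


def commit_to_field(*chunks: bytes) -> int:
    """Hash arbitrary byte strings into one value below the BN254 scalar modulus.

    Each chunk is length-prefixed so that ("ab", "c") and ("a", "bc") hash
    differently. The contract and the circuit must apply the same encoding.
    """
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(len(chunk).to_bytes(8, "big"))
        h.update(chunk)
    digest = int.from_bytes(h.digest(), "big")
    return digest >> 3  # keep the top 253 bits; 2**253 < BN254_SCALAR_FIELD


# Hypothetical auxiliary data that would otherwise be separate public inputs.
state_root = bytes.fromhex("11" * 32)
block_number = (19_000_000).to_bytes(8, "big")
public_input = commit_to_field(state_root, block_number)
print(hex(public_input))
```

The circuit then takes the underlying values as private witnesses and proves that they hash to the single public input, trading a few in-circuit hash constraints for roughly 7k gas per input removed on chain.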
When to choose which approach (quick decision matrix)
- You settle ≤ 16 leaf proofs per L1 interval and cannot afford the extra latency of off-chain recursion: use on-chain batch verification (random linear combination), keep l ≤ 3, and implement the 3‑pairing verifier. Revisit if EIP‑7623 makes the tx “data heavy.” (encrypt.a41.io)
- You settle 10s–1000s of leaf proofs and can tolerate seconds of off-chain latency: use recursion to a single Groth16 on BN254 or BLS12‑381. This is the dominant, production-friendly path. (7blocklabs.com)
- You want constant L1 gas without operating recursion infra: use a verification layer (e.g., Aligned), budget ~350–380k base + ~16k per consumer inclusion check, and document trust/latency trade-offs. (blog.alignedlayer.com)
- You already generate Groth16 proofs en masse and want the fewest possible L1 pairings: evaluate SnarkPack; it yields O(log n) verifier time and tiny on-chain footprints when integrated as your settlement object. (research.protocol.ai)
Putting it all together: a high-throughput rollup plan
- Target architecture
- Leaf circuits emit Groth16 proofs per batch of L2 transactions.
- A recursion/aggregation service produces one wrapper proof per L1 posting interval.
- The L1 verifier contract:
- Uses a 3‑pairing check.
- Exposes one or two field-element public inputs (state root and domain separator).
- Supports both BN254 and BLS12‑381 verifiers behind a timelocked upgrade path so you can migrate if/when MSM-heavy l arises or if you consolidate onto BLS12‑381 infra. (7blocklabs.com)
- Budget
- BN254 wrapper (l = 2): ~220k gas/settlement.
- BLS12‑381 wrapper (l = 2): ~210–225k gas/settlement, with 2× proof calldata. Re-evaluate under EIP‑7623 depending on other calldata in the tx. (eips.ethereum.org)
- Scalability knobs
- If your L2 intervals get very proof-dense, either deepen recursion (same on-chain cost) or move verification to an AVS and post a BLS-attested result at ~350–380k gas (constant). (blog.alignedlayer.com)
Key takeaways for decision‑makers
- Groth16 remains the cheapest way to settle proofs on Ethereum L1. Expect ≈200–300k gas per batch if you wrap recursively or aggregate—regardless of how many leaf proofs you include. (7blocklabs.com)
- The biggest gas lever you control is public input count. Every input you eliminate saves you ~6–8k gas on BN254 and similar scale on BLS12‑381 MSM. (hackmd.io)
- Post‑Pectra, BLS12‑381 is real: slightly cheaper pairings, native MSM, stronger security—at the cost of 2× calldata per proof. Model your own l and calldata profile before switching the outer curve. (eips.ethereum.org)
- For very large batches without building recursion infra, verification layers deliver near-constant L1 gas and high throughput, at the cost of an AVS trust/latency trade-off. (blog.alignedlayer.com)
If you want a tailored, numbers-first plan for your rollup’s proof pipeline, 7Block Labs can benchmark your circuits across BN254 vs. BLS12‑381, size the recursion tree, and produce a production verifier with the 3‑pairing optimization and malleability hardening.
Sources and further reading
- EIP‑1108: BN254 precompiles (ECADD, ECMUL, pairing cost = 45,000 + 34,000·k). (eips.ethereum.org)
- EIP‑2537: BLS12‑381 precompiles (pairing cost = 37,700 + 32,600·k; MSM discounts; 64‑byte field encoding). (eips.ethereum.org)
- EIP‑7623: Increased calldata cost floor for data-heavy txs. (eips.ethereum.org)
- Groth16 verification gas model on BN254 (207,700 + 7,160 × l) and sizing. (hackmd.io)
- Groth16 “3‑pairing” discussion and template pointers; malleability checks in modern verifiers. (xn--2-umb.com)
- SnarkPack aggregation (O(log n) verifier), performance on 8,192 proofs. (research.protocol.ai)
- External verification layers and measured gas. (blog.alignedlayer.com)
- Prover acceleration: ICICLE‑Snark (GPU). (ingonyama.com)

