Web3 Application Penetration Testing: A Practical Playbook for Dev Teams

Short description: A 2025-first playbook for pen‑testing modern web3 apps across smart contracts, wallets, L2s, bridges, and MEV surfaces—complete with concrete test cases for EIP‑7702 programmable EOAs, rollup fault proofs, private mempools, and invariant-driven fuzzing.

Why this playbook now

2025 has been unusually volatile for blockchain security. Q1 alone set a record $1.64B in losses, largely driven by a single catastrophic incident but with steady exploit pressure elsewhere. Teams can no longer rely on audits alone; they need repeatable pen‑testing that covers both on‑chain and off‑chain attack paths. (theblock.co)
At the same time, Ethereum’s Pectra upgrade (May 7, 2025) introduced EIP‑7702 programmable EOAs on mainnet, changing wallet threat models and creating new testing requirements for transaction flows, front‑ends, and signers. (eips.ethereum.org)
Drainer kits and phishing remain a major loss driver: 2024 saw ~$494M drained via malicious signatures, with 2025 waves exploiting 7702-style batch signatures. Pen testing has to validate signature UX, simulation, and transaction policy—not only bytecode. (drops.scamsniffer.io)
Rollups are maturing: OP Stack fault proofs are permissionless on OP Mainnet; Base and Arbitrum have moved to Stage 1 with permissionless validation (BoLD on Arbitrum). Your tests should include forced-exit, challenge‑period, and governance‑key scenarios. (optimism.io)

What follows is a hands‑on playbook we use with engineering teams—from threat modeling to granular checks and tool commands—covering the full stack that real attackers exploit.

Scope first: a web3 pen‑test includes more than contracts

Define the attack surface in six layers and scope each explicitly in your statement of work:

Contracts and protocols

Solidity/Vyper code, proxies, upgrade paths, timelocks, access control, tokenomics and oracle logic.
Standards to map coverage: OWASP Smart Contract Top 10 (2025), EEA EthTrust Security Levels v3, and SCSVS v2 checklists. (scs.owasp.org)

Wallets and key material

EOAs, smart accounts (ERC‑4337), and EIP‑7702 programmable EOAs. Hardware, MPC/TSS, and hot‑wallet boundaries. (ethereum.org)

L2s, bridges, and cross‑chain paths

Canonical bridges, message passing, forced-exit mechanisms, challenge windows, and Security Council controls. Use L2BEAT’s Stages framework as your maturity yardstick. (l2beat.com)

Off‑chain services

Relayers, indexers, oracles, pricing feeds, lambdas, queues, and webhook consumers.

Front‑ends and supply chain

Domains, DNS, CDN, signing modals, NPM dependencies, CI/CD secrets, analytics scripts.

Transaction routing and MEV surface

Public vs. private mempool submission, refund/relay policies, and bundle behavior. (docs.flashbots.net)

Deliverables should include: threat model, test plan per layer, evidence of exploit attempts, PoCs with reproduction steps, and verified remediation guidance keyed to standards (EthTrust levels, OWASP SC Top 10, SCSVS control IDs). (entethalliance.org)

Threat modeling updates for 2025

Programmable EOAs (EIP‑7702): Identify every user journey that can add or change the delegation indicator (0xef0100 || address) and the lifetime of delegation. Explicitly model revocation, replay, and phishing flows, and align signer policies accordingly. (eips.ethereum.org)
Drainers and signature traps: Treat “what can a single blind signature authorize?” as a first‑class risk. Validate that your simulation layer and wallet prompts surface net balance deltas and NFT transfers before signing. Reference recent drainer statistics to prioritize scenarios. (drops.scamsniffer.io)
Rollup governance and exits: Confirm your dependency chains meet Stage 1 properties (≥7‑day challenge windows for optimistic systems, and a design in which only a compromise of ≥75% of the Security Council can block or forge L2→L1 messages), then pen‑test those assumptions on mainnet‑like testnets. (forum.l2beat.com)
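
The Stage 1 assumptions in the last item can be turned into a small, repeatable check. The sketch below is a simplification, not L2BEAT's own methodology: `RollupConfig` and `stage1_gaps` are hypothetical names, and the inputs are values you would collect by hand from each chain's documentation.

```python
from dataclasses import dataclass

SEVEN_DAYS = 7 * 24 * 60 * 60  # seconds


@dataclass(frozen=True)
class RollupConfig:
    """Hypothetical summary of a rollup's published parameters."""
    name: str
    is_optimistic: bool
    challenge_period_s: int   # dispute window, relevant for optimistic systems
    council_size: int         # number of Security Council signers
    council_threshold: int    # signatures required for the council to act


def stage1_gaps(cfg: RollupConfig) -> list[str]:
    """Return the Stage 1 assumptions from the text that cfg does not meet."""
    gaps = []
    if cfg.is_optimistic and cfg.challenge_period_s < SEVEN_DAYS:
        gaps.append("challenge period shorter than 7 days")
    # Integer arithmetic avoids float rounding: threshold / size >= 3 / 4.
    if cfg.council_size == 0 or cfg.council_threshold * 4 < cfg.council_size * 3:
        gaps.append("Security Council threshold below 75%")
    return gaps
```

An empty list means the documented parameters match the assumptions; any entry is a scoping item for the exit and governance drills.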

A step‑by‑step pen‑test playbook

1) Preparation and baseline

Map standards → tests
- SC01–SC10 from OWASP Smart Contract Top 10 to unit/fuzz/formal checks.
- EthTrust v3 requirements to static/dynamic checks; publish coverage.
- SCSVS v2 items as your “pentest checklist” for contracts, integrations, and design. (scs.owasp.org)
Toolchain and CI
- Static analysis: Slither (detectors plus upgradeability review); wire into CI. (github.com)
- Fuzzing/invariants: Foundry fuzz/invariant tests; Echidna/Medusa for long‑running fuzz; capture corpus for regressions. (blog.trailofbits.com)
- Symbolic testing: Halmos or Manticore for path exploration on tricky math/authorization flows. (github.com)
- Formal verification (selected properties): Certora Prover for critical invariants (equivalences, access controls, token accounting). Track CVL rule coverage in CI. (docs.certora.com)
- Transaction simulation: Tenderly single and bundle simulations—front‑end hooks and CI guards for “dangerous deltas.” (docs.tenderly.co)
Data points to watch
- External risk trendlines (Immunefi/Chainalysis/TRM) to reprioritize tests quarterly—e.g., CeFi private‑key compromises vs. DeFi logic flaws. (theblock.co)

2) Contract layer: attack-driven testing

Static quick pass
- Run Slither on the full Foundry/Hardhat repo; fail builds on critical detectors (reentrancy, delegatecall to untrusted, unprotected upgrade functions). Include the built‑in upgradeability review. (github.com)
Invariant‑driven fuzzing (the testing half of Invariant‑Driven Development, IDD)
- Write system‑level invariants (e.g., sum(balances) == totalSupply, debt conservation, collateralization bounds). Use Echidna’s exploration and assertion modes; let long jobs run to billions of iterations via cloud runners. Trail of Bits’ Curvance engagement showed how expanding invariants uncovered criticals. (blog.trailofbits.com)
- Keep corpora: rebases can invalidate fuzz corpora—preserve, port, and shrink sequences to keep coverage high after refactors. (blog.trailofbits.com)
Symbolic tests where fuzzing stalls
- Use Halmos to symbolically test authorization matrices or batched flows (e.g., complex approve/transferFrom ladders) and look for counterexamples; Manticore for crafted paths on revert edges. (github.com)
Formal specs for “cannot fail” properties
- Adopt Certora CVL rules for access control, pausing, and token accounting; track CLI 5.0 changes if you upgraded in 2025 (parametric rules coverage). Include rule reports in pen‑test deliverables. (docs.certora.com)
Update your taxonomy
- SWC remains useful but unmaintained; align to EthTrust v3 and OWASP SC Top 10 (2025) to capture newer issues (e.g., oracle manipulation tiers, unchecked external call patterns in modern proxy designs). (github.com)

3) EIP‑7702 programmable EOA testing (post‑Pectra)

Key 7702 mechanics to validate in tests: a Type‑4 tx can set a delegation indicator (0xef0100 || delegate) for the EOA; calls then execute the delegate’s code in the EOA’s context. The authorization list carries tuples [chain_id, address, nonce, y_parity, r, s], and certain intrinsic gas and refund paths differ from pre‑Pectra rules. (eips.ethereum.org)
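
As a minimal sketch of the first mechanic, a test helper can recognize the delegation indicator in an account's code (for example, the bytes returned by `eth_getCode`). The function name `parse_delegation` is hypothetical; the 3-byte prefix and 20-byte address layout follow the indicator format quoted above.

```python
from typing import Optional

DELEGATION_PREFIX = bytes.fromhex("ef0100")
INDICATOR_LENGTH = len(DELEGATION_PREFIX) + 20  # prefix plus a 20-byte address


def parse_delegation(code: bytes) -> Optional[str]:
    """Return the delegate address if `code` is a 7702 delegation indicator.

    Returns None for empty code (plain EOA) and for ordinary contract code.
    """
    if len(code) != INDICATOR_LENGTH or not code.startswith(DELEGATION_PREFIX):
        return None
    return "0x" + code[len(DELEGATION_PREFIX):].hex()
```

A test suite can call this before and after each user journey to assert that delegation was set, changed, or cleared only where the threat model expects it.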

Concrete test cases:

Enforce a delegate allow‑list: your front‑end should block setting delegation to untrusted delegates; pen‑test by hosting a malicious delegate and ensuring prompts/simulations flag net asset drains. (docs.tenderly.co)
TTL and revocation: a 7702 delegation persists until the EOA signs a new authorization, so verify that authorizing the null address clears the delegation indicator; confirm UI and signer policy trigger that revocation after a transaction batch or time‑boxed session. (eips.ethereum.org)
Simulation as a policy gate: require simulations for Type‑4 txs and fail if predicted deltas include approvals/transfer of unrelated assets. Include bundle simulations when the delegate performs batched calls. (docs.tenderly.co)
Phishing drill: red‑team drainer flows using “swap‑looking” transactions that approve dozens of tokens via 7702 delegation; validate that your wallet modal and back‑end heuristics detect and block them. Industry reports show attackers abusing 7702 batch signatures in 2025. (dig.watch)
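
The "simulation as a policy gate" case above can be sketched as a filter over predicted effects. This is an illustration under assumptions: `AssetChange` is a hypothetical, simulator-agnostic record, not Tenderly's response schema, so a real integration would first map the simulator's output into this shape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssetChange:
    """Hypothetical record of one predicted effect from a simulation."""
    kind: str    # "transfer" or "approval"
    token: str   # token contract address
    owner: str   # account whose assets move or are approved
    amount: int


def unexpected_changes(changes, user, expected_tokens):
    """Return changes that touch the user's assets outside the expected tokens."""
    user = user.lower()
    allowed = {t.lower() for t in expected_tokens}
    return [
        c for c in changes
        if c.owner.lower() == user
        and c.amount > 0
        and c.token.lower() not in allowed
    ]
```

The gate fails the transaction when the returned list is non-empty. In the phishing drill, a "swap" that also approves unrelated tokens should always land in that list.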

Also note: EIP‑3074 has been withdrawn; if legacy code refers to AUTH/AUTHCALL paths or expectations, retire them and port to 7702 semantics. (eips.ethereum.org)

4) L2 and bridge testing

Stage conformance checks (design)
- Use L2BEAT’s Stage 1 principle: only a ≥75% Security Council compromise (besides bugs) can indefinitely block or push invalid L2→L1 messages; optimistic rollups should enforce ≥7‑day challenge periods. Scope tests to these properties. (l2beat.com)
Fault‑proofs and exits (execution)
- OP Stack: verify permissionless fault proofs on OP Mainnet; for OP‑based chains (Base, etc.), confirm the chain’s upgrade to the same mechanism and rehearse in‑flight withdrawal invalidation and re‑proving after upgrades. (optimism.io)
- Base: validate permissionless proofs and Security Council controls as per its Stage‑1 transition; include outage drills for sequencer downtime and forced withdrawals. (coindesk.com)
- Arbitrum: BoLD is live (Feb 12, 2025). Pen‑test challenge windows, validator participation, and time‑bounded dispute assumptions; ensure bridging UIs surface settlement timelines. (docs.arbitrum.io)
Bridge attack readiness
- Data shows cross‑chain bridge logic attacks have caused multi‑billion‑dollar losses since 2021; add tests for event mismatches, replay, message ordering, and light‑client verification assumptions. (arxiv.org)
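
The replay and ordering tests in the last item can be prototyped against a toy model before you point them at a real bridge. The sketch below is an assumption-heavy stand-in: `MessageInbox` is hypothetical, and `sha256` from the standard library replaces the keccak256 hashing a real EVM bridge would use.

```python
import hashlib


class MessageInbox:
    """Toy destination-chain inbox: strict per-sender ordering, no replays."""

    def __init__(self, chain_id: int):
        self.chain_id = chain_id
        self.next_nonce = {}  # (src_chain, sender) -> next expected nonce
        self.seen = set()     # digests of already processed messages

    @staticmethod
    def digest(src_chain, dst_chain, sender, nonce, payload: bytes) -> str:
        # Binding both chain ids into the digest prevents cross-chain replay.
        header = f"{src_chain}|{dst_chain}|{sender.lower()}|{nonce}|".encode()
        return hashlib.sha256(header + payload).hexdigest()

    def accept(self, src_chain, dst_chain, sender, nonce, payload: bytes) -> bool:
        if dst_chain != self.chain_id:
            return False  # message was addressed to another chain
        d = self.digest(src_chain, dst_chain, sender, nonce, payload)
        if d in self.seen:
            return False  # exact replay
        key = (src_chain, sender.lower())
        if nonce != self.next_nonce.get(key, 0):
            return False  # gap or out-of-order delivery
        self.seen.add(d)
        self.next_nonce[key] = nonce + 1
        return True
```

Each rejected branch corresponds to one attack class to attempt against the real bridge: wrong destination, replayed message, and reordered or skipped nonce.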

5) MEV and transaction‑routing tests

Private orderflow sanity
- Flashbots Protect: test privacy “hint” settings in staging (hash‑only vs. logs/calldata for refunds), mempool failover behavior, and revert protection; ensure dangerous flows never leak to public mempools. (docs.flashbots.net)
- MEV Blocker: benchmark inclusion time and price improvement on swaps; set full‑privacy endpoints for sensitive txs. Incorporate rebates into unit‑economics and confirm the RPC is wired only for swaps where you accept data sharing. (docs.cow.fi)
- Decommissioned products: if you still point to legacy RPCs (e.g., Eden), migrate; some private RPCs shut down in 2025. (theblock.co)
Sandwich and backrun drills
- Replay large swaps via public mempool in a fork to verify your slippage, deadline, and routing; then route through private RPCs and compare price/output, refunds, and failure rates. Publish a “routing SLO” based on your benchmarks. (docs.cow.fi)
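
To make the public-versus-private comparison concrete, the drill needs two numbers per swap: the slippage floor and the price improvement. A minimal sketch follows; both function names are hypothetical, and amounts are integers in the output token's smallest unit.

```python
def min_amount_out(quoted_out: int, slippage_bps: int) -> int:
    """Lowest acceptable output for a quote and a slippage tolerance in bps."""
    if not 0 <= slippage_bps <= 10_000:
        raise ValueError("slippage_bps must be between 0 and 10000")
    return quoted_out * (10_000 - slippage_bps) // 10_000


def improvement_bps(public_out: int, private_out: int) -> float:
    """Price improvement of the private route over the public one, in bps."""
    if public_out <= 0:
        raise ValueError("public_out must be positive")
    return (private_out - public_out) * 10_000 / public_out
```

For example, `min_amount_out(1_000_000, 50)` gives a floor of 995,000 units, and `improvement_bps(1000, 1003)` reports 30 bps in favor of the private route. Aggregating these per route over many replays gives the data behind a routing SLO.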

6) Wallets, keys, and off‑chain services

Key management tabletop
- Simulate theft and rotation: rotate deployer/governance keys, pause via multisig, and recover via guardians or Security Council as designed. Practice a signer‑compromise scenario quarterly.
4337 paymasters and aggregators
- Review OpenZeppelin’s public 4337 audit notes (gas, deposit records, sig aggregation) and re‑test your paymaster logic. Add invariants around fee‑sponsored flows. (openzeppelin.com)
Front‑end integrity
- Build “modal invariants”: verify that what the user signs matches on‑chain effects (e.g., no extra approvals/NFT transfers). Fail builds if your dependency graph pulls unsigned third‑party scripts into critical user flows.
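
One way to express a "modal invariant" is to decode the calldata that is about to be signed and compare it with what the modal displays. The sketch below covers only ERC-20 `approve(address,uint256)` and is an illustration; `decode_approve` and `modal_matches_calldata` are hypothetical helper names.

```python
APPROVE_SELECTOR = bytes.fromhex("095ea7b3")  # approve(address,uint256)
MAX_UINT256 = 2**256 - 1


def decode_approve(calldata: bytes):
    """Return (spender, amount) for ERC-20 approve calldata, else None."""
    if len(calldata) != 4 + 64 or calldata[:4] != APPROVE_SELECTOR:
        return None
    # The address is the low 20 bytes of the first 32-byte argument word.
    spender = "0x" + calldata[16:36].hex()
    amount = int.from_bytes(calldata[36:68], "big")
    return spender, amount


def modal_matches_calldata(calldata: bytes, shown_spender: str, shown_amount: int) -> bool:
    """True only if the modal shows exactly what the calldata will approve."""
    decoded = decode_approve(calldata)
    if decoded is None:
        return False
    spender, amount = decoded
    return spender == shown_spender.lower() and amount == shown_amount
```

A front-end test can fail the build whenever this returns False, for instance when the modal shows a bounded amount but the calldata carries `MAX_UINT256`.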

Putting it into practice: minimal viable test packs

Below are concrete packs teams can implement in a sprint, with examples you can paste into your repos/tools.

Static + upgradeability pack

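Run Slither in CI; include the upgradeability tool and ERC conformance checks.
Command example (MyContractV1, MyContractV2, and MyToken are placeholder contract names):

slither . --checklist
slither . --print human-summary,inheritance
slither-check-upgradeability . MyContractV1 --new-contract-name MyContractV2
slither-check-erc . MyToken --erc ERC20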

(github.com)
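
To fail builds on specific detectors, a small gate can read Slither's JSON report. This is a sketch under assumptions: it expects a report written with `slither . --json <file>` that keeps the `results.detectors[].check` layout, and the set of blocking detectors is an example to adapt, not a recommended baseline.

```python
import json
import sys

# Example blocking set; extend it to match your own risk policy.
BLOCKING_CHECKS = {"reentrancy-eth", "controlled-delegatecall", "unprotected-upgrade"}


def blocking_findings(report: dict) -> list[str]:
    """Return the sorted names of blocking detectors present in the report."""
    detectors = (report.get("results") or {}).get("detectors") or []
    return sorted({d.get("check") for d in detectors if d.get("check") in BLOCKING_CHECKS})


def main(path: str) -> int:
    """Print blocking findings and return a process exit code for CI."""
    with open(path) as fh:
        report = json.load(fh)
    found = blocking_findings(report)
    for check in found:
        print(f"blocking Slither finding: {check}")
    return 1 if found else 0


if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))
```

Run it as a CI step after Slither with the report path as the only argument; a non-zero exit code stops the pipeline.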

Invariant fuzz pack (Foundry + Echidna)

Create Foundry invariant tests plus Echidna properties mirroring your tokenomics.
Persist corpora in object storage and replay them after rebases so coverage survives refactors. Trail of Bits shows this materially improves bug‑finding over time. (blog.trailofbits.com)

Symbolic pack (Halmos)

For complex batched flows (DEX routers, multi‑asset accounting), add Halmos symbolic tests alongside Foundry—in particular for paths fuzzers struggle to reach. (github.com)

Formal pack (Certora)

Write a small CVL suite: supply conservation, access control, pause invariants, upgrade preconditions; target critical contracts only. Keep rule reports as artifacts. (docs.certora.com)

7702 policy pack

Add a policy service that rejects Type‑4 transactions pointing to non‑approved delegates; require Tenderly simulations for any delegation change and auto‑revoke after a bundle is mined. (eips.ethereum.org)
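
The core of such a policy service could look like the sketch below. It is a minimal illustration: `Authorization` and `review_type4` are hypothetical names, signature fields are omitted, and the rule that flags `chain_id` 0 (an authorization valid on every chain) is an extra policy choice rather than something the text above requires. Authorizations that point to the null address pass, since they clear delegation.

```python
from dataclasses import dataclass

ZERO_ADDRESS = "0x" + "00" * 20
SET_CODE_TX_TYPE = 0x04  # Type-4 transaction


@dataclass(frozen=True)
class Authorization:
    """Subset of an authorization tuple; signature fields omitted."""
    chain_id: int
    address: str
    nonce: int


def review_type4(tx_type, authorizations, allowed_delegates):
    """Return rejection reasons; an empty list means the transaction passes."""
    if tx_type != SET_CODE_TX_TYPE:
        return []
    allowed = {a.lower() for a in allowed_delegates}
    reasons = []
    for auth in authorizations:
        target = auth.address.lower()
        if auth.chain_id == 0:
            reasons.append(f"authorization for {target} is valid on every chain")
        if target != ZERO_ADDRESS and target not in allowed:
            reasons.append(f"delegate {target} is not on the allow-list")
    return reasons
```

In practice this check would run before the simulation step, so that transactions pointing to unknown delegates are rejected without spending simulation budget.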

MEV routing pack

Default to private RPCs for swaps; set “hash‑only” privacy for sensitive orders, and enable mempool failover only for transactions still pending after more than 25 blocks. Benchmark against your public‑mempool path quarterly. (docs.flashbots.net)

Governance and rollup‑dependent drills

Security Council exercises
- Follow emerging best practices for Security Councils: key rotation cadence, emergency pause decision trees, and transparent incident comms. Include Council‑triggered pause/unpause drills in quarterly pen‑testing. (blog.openzeppelin.com)
Stage audits
- Ask your L2 providers to document Stage 1/2 status and challenge periods; pen‑test forced‑exit paths and message timeouts that align with L2BEAT’s framework (≥7‑day optimistic challenge). (forum.l2beat.com)

KPIs and what “good” looks like

Coverage and signal
- 90% of critical contracts covered by invariants with long‑run fuzz corpora; at least 3 formal properties proved on the most critical contract. (blog.trailofbits.com)
Time‑to‑detect (TTD) and time‑to‑revoke (TTR) for 7702 delegations
- Simulated deltas must render in <500 ms; revocation flows should complete within one block after bundle execution. (eips.ethereum.org)
Exit drills
- Successful forced‑exit rehearsals on OP/Arbitrum‑based rollups respecting challenge periods and post‑upgrade reproving patterns. Track failures after OP fault‑proof system upgrades. (help.superbridge.app)
MEV routing outcomes
- Private routing delivers measurably better swap output and no observed sandwiches in staging; document price improvement deltas and refund share policies. (docs.cow.fi)

Emerging practices to adopt in 2025

Invariant‑Driven Development (IDD) as a product culture
- Writing invariants first and checking them via fuzz/formal tools consistently outperforms “tests after code.” Bake invariants into design docs and code reviews. (blog.trailofbits.com)
Shift‑left simulation
- Simulate every signature‑producing action (front‑end and back‑end), render balance/NFT deltas, and block “implicit approval” patterns that drain unrelated assets. (docs.tenderly.co)
Standards‑aligned reporting
- Publish EthTrust v3 and SCSVS coverage alongside audit links; align issues to OWASP SC Top 10 categories to help non‑security stakeholders prioritize. (entethalliance.org)

Budget and timeline guidance

Two‑week “MVP” pentest for an early‑stage protocol/wallet
- Day 1–3: scoping, threat model, standards mapping
- Day 4–7: Slither and quick fuzzing pass; 7702 policy checks; MEV routing benchmarks
- Day 8–10: targeted symbolic/formal checks on critical flows; rollup exit drill on testnet
- Deliverables: prioritized findings, remediation diffs, policy configs, and CI setup
Six‑week “production‑ready” pentest
- Everything above + long‑run fuzz (cloud), formal properties for key invariants, full L2/bridge exercises, Security Council tabletop, and 7702 phishing red‑team campaign.

Final word

Security moved under your feet in 2025: EIP‑7702 changed wallet behavior, Stage‑1 rollups changed exit guarantees, and drainer kits exploited signature UX at scale. A modern pen‑test meets reality where it is—testing transaction policies, simulations, mempool routing, governance keys, and exits just as rigorously as bytecode.

If you need help customizing this playbook to your stack (EVM, OP/Arbitrum stacks, cross‑chain), 7Block Labs can turn it into a concrete backlog, CI pipelines, and drills your team can run every quarter.

References and data points

Worst quarter in hacks (Q1 2025) and monthly figures, Immunefi via The Block. (theblock.co)
2024 wallet‑drainer losses and 2025 phishing waves (7702 batch signatures) as tracked by ScamSniffer and industry recaps. (drops.scamsniffer.io)
Pectra/EIP‑7702 spec, status, and Type‑4 details (ethereum.org and EIPs). (eips.ethereum.org)
EIP‑3074 withdrawal (EIPs). (eips.ethereum.org)
L2BEAT Stage framework and Stage‑1 principle updates. (l2beat.com)
OP Stack fault proofs and Stage‑1 announcement (OP Labs). (optimism.io)
Base Stage‑1 transition; outage context for resilience testing. (coindesk.com)
Arbitrum BoLD deployment (docs and news). (docs.arbitrum.io)
MEV private RPC docs (Flashbots Protect, MEV Blocker) and configuration. (docs.flashbots.net)
Echidna/Medusa invariant work in practice; IDD guidance (Trail of Bits). (blog.trailofbits.com)
Slither static analysis and upgradeability review. (github.com)
Certora Prover docs and 2025 CLI updates. (docs.certora.com)
OWASP Smart Contract Top 10 (2025), EEA EthTrust v3, SCSVS v2. (scs.owasp.org)