Web3 application penetration testing: Scope Creep Traps and How to Avoid Them

A practical guide for scoping Web3 pentests without burning time or budget. Learn the 12 most common scope-creep traps across L1/L2, AA/4337, bridges, oracles, MEV, and supply chain—and how to lock them down with concrete acceptance criteria, examples, and checklists.

Who this is for

Decision-makers and security leads at startups and enterprises planning or running a Web3 security engagement with internal teams or external vendors (audits, contests, or penetration tests).

Why Web3 pentests are uniquely vulnerable to scope creep

Unlike Web2, “the app” in Web3 is a moving target: on-chain logic spread across multiple networks; off-chain services (relayers, bundlers, oracles); cross-chain bridges; wallet libraries; and L2 rollup infrastructure that’s itself evolving. Meanwhile, the cost of missing something material remains high: during 2024, hundreds of millions were still lost to hacks and rug pulls, with Ethereum and BNB Chain the most targeted networks. (coindesk.com)

At the same time, the security baseline is maturing fast. OWASP released a Smart Contract Top 10 that incorporates modern DeFi risks like price oracle manipulation and flash-loan-driven exploits. Your scope must reflect this new risk landscape, not just vintage SWC checks. (scs.owasp.org)

Below are the 12 traps where Web3 pentest scopes most often sprawl or miss critical coverage—and exactly how to avoid them.

Trap 1 — L2 finality and “training wheels” are hand‑waved

What goes out of scope:

Fault/fraud proofs (are they live, who can challenge, what’s the challenge window?).
Forced transaction inclusion during sequencer downtime.
Security council powers and upgrade procedures that can bypass “trustless” withdrawals.

Why it matters:

OP Stack chains shipped permissionless fault proofs in 2024 (“Stage 1”), but still retain a Security Council that can intervene. If your protocol relies on L2 withdrawal or finality guarantees, those assumptions can change overnight, so the control planes that can change them need to be tested. (cointelegraph.com)
Forced-inclusion semantics (timing, gas caps, and “sequencer window”) materially affect censorship resistance and emergency exits. (docs.optimism.io)

How to scope it right:

Declare targeted networks and their decentralization stage and proof status (e.g., “Base Mainnet, Stage 1 with permissionless fault proofs as of Oct 2024”). (theblock.co)
Include acceptance tests for: forced L2 deposit inclusion via the L1 portal; behavior across <30m, 30m–12h, and >12h sequencer downtime windows. (docs.optimism.io)
State if governance multisigs or security councils that can halt/upgrade rollup contracts are in scope to simulate adverse changes (timelocks, notice periods). (coindesk.com)
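To make the downtime acceptance tests checkable, it can help to record which of the three windows each simulated outage falls into. The sketch below is a minimal illustration: the 30-minute and 12-hour boundaries come from the scope text above, while the function names and the choice to treat the boundaries as inclusive of the middle window are assumptions, not OP Stack behavior.

```python
from datetime import timedelta
from typing import List, Set

# Boundaries taken from the scope text; confirm them against the target
# chain's current configuration before relying on them.
SHORT_LIMIT = timedelta(minutes=30)
LONG_LIMIT = timedelta(hours=12)


def downtime_bucket(outage: timedelta) -> str:
    """Map a simulated sequencer outage to one of the three scoped windows."""
    if outage < timedelta(0):
        raise ValueError("outage duration cannot be negative")
    if outage < SHORT_LIMIT:
        return "<30m"
    if outage <= LONG_LIMIT:
        return "30m-12h"
    return ">12h"


def coverage_gaps(simulated_outages: List[timedelta]) -> Set[str]:
    """Return the scoped windows that no simulated outage exercises."""
    required = {"<30m", "30m-12h", ">12h"}
    covered = {downtime_bucket(o) for o in simulated_outages}
    return required - covered
```

A test plan that only simulates a 5-minute and a 1-hour outage would leave `{">12h"}` uncovered, which is the kind of gap worth catching before the engagement starts.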

Trap 2 — Account Abstraction (ERC‑4337) gets lumped into “wallet testing”

What goes out of scope:

EntryPoint version and configuration.
Bundler simulation guarantees.
Paymaster griefing/abuse and signature packing edge cases.

Why it matters:

Real incidents and research have shown signature/packing weaknesses and paymaster abuse paths if implementers deviate from spec or rely on off-chain packing. (alchemy.com)
The AA spec includes explicit griefing protections and validation constraints that require deterministic testing. (docs.erc4337.io)

How to scope it right:

Pin EntryPoint version(s), target chains, and paymaster types in scope (e.g., verifying paymaster, ERC‑20 paymaster), and require tests for packed UserOperation hashing equivalence between signers and on-chain validation. (alchemy.com)
Include bundler-simulation acceptance criteria: reject UserOps that exceed gas limits or fail determinism checks; test griefing via replays and reverted-sponsored calls. (docs.erc4337.io)
If you rely on community reference implementations, include precisely which repos/commits are covered. (docs.erc4337.io)
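One way to exercise "packing equivalence" is a round-trip test on the packed gas fields. The sketch below assumes the v0.7-style layout in which two uint128 values share one 32-byte word, with the first value in the high half (for example, verification gas limit high and call gas limit low). Treat that layout as an assumption to verify against the EntryPoint version you pin; the helper names are hypothetical.

```python
from typing import Tuple

UINT128_MAX = (1 << 128) - 1


def pack_u128_pair(high: int, low: int) -> bytes:
    """Pack two uint128 values into one 32-byte big-endian word."""
    for value in (high, low):
        if not 0 <= value <= UINT128_MAX:
            raise ValueError("value does not fit in uint128")
    return high.to_bytes(16, "big") + low.to_bytes(16, "big")


def unpack_u128_pair(packed: bytes) -> Tuple[int, int]:
    """Inverse of pack_u128_pair; returns (high, low)."""
    if len(packed) != 32:
        raise ValueError("expected exactly 32 bytes")
    return int.from_bytes(packed[:16], "big"), int.from_bytes(packed[16:], "big")
```

A signer that swaps the two halves produces a different 32-byte word than the one validated on-chain, so a useful acceptance test asserts both the round trip and that the swapped ordering does not collide.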

Trap 3 — “Bridge in the middle” is waved away as a third party

What goes out of scope:

Messaging/verification on each connected chain, guardians/validators, program constraints.
Rate-limits, governance powers (e.g., pausing or upgrading bridge contracts).
Cross-chain replay and partial failure modes.

Why it matters:

Scope boundaries are the difference between catching “extractable TVL” design flaws and missing them. Mature bug bounties enumerate exact assets, chain components, and prohibited activities; your pentest should be just as explicit. (immunefi.com)

How to scope it right:

Enumerate every bridge/messaging stack you depend on and what you’re testing: “our integration only,” “our integration plus bridge contracts on chain X,” or “end-to-end including guardians/validators.” Borrow the pattern of “Assets in Scope” and “Impacts in Scope” tables from bounties. (immunefi.com)
Respect prohibited activities (e.g., no mainnet destructive tests; use local forks); require POCs on forked networks. (immunefi.com)
For risk context, align on a bridge risk taxonomy (native verification vs. external validators vs. optimistic bridges) and test accordingly. (forum.l2beat.com)
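Cross-chain replay is easier to scope once the expected behavior is written down as a property. The toy model below is a sketch only: SHA-256 stands in for whatever hashing the real bridge uses, and `Message` and `Receiver` are hypothetical names, not part of any bridge SDK. It shows the two checks a replay test should hit: a message is accepted once, and only on its intended destination chain.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    source_chain_id: int
    dest_chain_id: int
    nonce: int
    payload: bytes

    def message_id(self) -> bytes:
        # SHA-256 is a stand-in for the bridge's real hashing scheme.
        encoded = b"|".join([
            str(self.source_chain_id).encode(),
            str(self.dest_chain_id).encode(),
            str(self.nonce).encode(),
            self.payload.hex().encode(),  # hex avoids delimiter ambiguity
        ])
        return hashlib.sha256(encoded).digest()


@dataclass
class Receiver:
    chain_id: int
    processed: set = field(default_factory=set)

    def deliver(self, msg: Message) -> bool:
        """Accept a message once, and only on its destination chain."""
        if msg.dest_chain_id != self.chain_id:
            return False
        mid = msg.message_id()
        if mid in self.processed:
            return False
        self.processed.add(mid)
        return True
```

In a real engagement the same property would be asserted against the integration contracts on forked networks rather than against a model.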

Trap 4 — Oracle risk is treated as “read-only”

What goes out of scope:

Heartbeat/deviation thresholds and staleness checks.
Confidence-interval aware pricing and adversarial selection.
Multi-feed composition drift.

Why it matters:

Chainlink feeds update on deviation or heartbeat; stale data and per-chain differences can break assumptions. Pyth exposes confidence intervals that should drive conservative valuations and settlement policies. These are testable behaviors. (docs.chain.link)

How to scope it right:

Require invariant tests for “price freshness” using `updatedAt`/round timestamps (Chainlink) or “no older than” guards (Pyth), including failure actions. (docs.chain.link)
For derivatives/lending, specify confidence-interval usage (e.g., use μ+σ for liabilities) and define thresholds that trigger pauses. (docs.pyth.network)
Declare precise feeds (addresses, networks) and expected heartbeat/deviation to test against; don’t assume mainnet values across chains. (data.chain.link)
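The freshness and confidence rules above can be written as small, testable predicates before anyone touches Solidity. This is a minimal sketch under stated assumptions: `PriceReading` and the threshold parameters are hypothetical, and valuing collateral at price minus confidence is an illustrative mirror of the "μ+σ for liabilities" rule rather than something the scope text prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceReading:
    price: float       # reported price
    confidence: float  # confidence half-width; 0 if the feed has none
    updated_at: int    # unix seconds of the last update


def is_stale(reading: PriceReading, now: int, max_age_seconds: int) -> bool:
    """True when the reading is older than the agreed freshness window."""
    return now - reading.updated_at > max_age_seconds


def liability_price(reading: PriceReading) -> float:
    """Conservative valuation for debts: price plus confidence."""
    return reading.price + reading.confidence


def collateral_price(reading: PriceReading) -> float:
    """Conservative valuation for collateral: price minus confidence."""
    return max(reading.price - reading.confidence, 0.0)


def should_pause(reading: PriceReading, now: int,
                 max_age_seconds: int, max_conf_ratio: float) -> bool:
    """Pause on stale data, non-positive prices, or too-wide confidence."""
    if is_stale(reading, now, max_age_seconds):
        return True
    if reading.price <= 0:
        return True
    return reading.confidence / reading.price > max_conf_ratio
```

The values for `max_age_seconds` and `max_conf_ratio` belong in the SoW per feed and per chain, since heartbeats differ across networks.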

Trap 5 — Upgradeability is “just OZ proxies”

What goes out of scope:

UUPS vs. Transparent vs. Beacon differences.
ERC‑1967 slots, `proxiableUUID` checks, and non‑UUPS upgrade lock‑in.
Timelock coverage and governance execution testing.

Why it matters:

Mis-scoped upgrades can brick proxies or remove safety checks; security of upgrades depends on the pattern and access control, not just “we use OpenZeppelin.” (docs.openzeppelin.com)

How to scope it right:

State upgrade pattern(s) and test cases: reject non‑UUPS implementations; verify `proxiableUUID`; simulate admin mistakes (direct calls to implementation) and ensure `onlyProxy()` enforcement. (docs.openzeppelin.com)
Include TimelockController behaviors in scope—minimum delays, proposer/executor role setup, and user exit windows tied to timelocks. (docs.openzeppelin.com)
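The "reject non‑UUPS implementations" test case can be expressed as a simple model before it is ported to a fork test. The sketch below is illustrative only: the Python classes are hypothetical stand-ins for contracts, and the constant is the ERC‑1967 implementation slot that a UUPS implementation is expected to report from `proxiableUUID`.

```python
from typing import Optional

# ERC-1967 implementation slot, expected from proxiableUUID().
IMPLEMENTATION_SLOT = (
    "0x360894a13ba1a3210667c828492db98dca3e2076cc3735a920a3ca505d382bbc"
)


class UupsLogic:
    def proxiable_uuid(self) -> str:
        return IMPLEMENTATION_SLOT


class PlainLogic:
    """No proxiable_uuid: upgrading to it would remove the upgrade path."""


class WrongSlotLogic:
    def proxiable_uuid(self) -> str:
        return "0x" + "00" * 32


def rejection_reason(candidate) -> Optional[str]:
    """Return why a candidate must be rejected, or None if acceptable."""
    probe = getattr(candidate, "proxiable_uuid", None)
    if not callable(probe):
        return "candidate is not UUPS-compatible"
    if probe() != IMPLEMENTATION_SLOT:
        return "candidate reports an unexpected proxiableUUID"
    return None


class Proxy:
    def __init__(self, implementation):
        self.implementation = implementation

    def upgrade_to(self, candidate) -> None:
        reason = rejection_reason(candidate)
        if reason is not None:
            raise ValueError(reason)
        self.implementation = candidate
```

The acceptance criterion then reads: the proxy still points at the previous implementation after every rejected upgrade attempt.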

Trap 6 — MEV/front‑running is filed under “out of scope mempool”

What goes out of scope:

Route‑level sandwich risk for swaps/mints.
Private orderflow policies and Protect RPC behavior.
Refunds and delivery guarantees for private transactions.

Why it matters:

In many protocols, the “exploit” is profitable reordering, not a bug. If UX depends on private submission (e.g., via Flashbots Protect), you must test it—including defaults, rate limits, and fallbacks when private inclusion fails. (docs.flashbots.net)

How to scope it right:

Specify whether to use private RPC in tests; pin Protect settings (builders, retries, mempool fallback) and acceptance: “no on‑chain reveal if revert,” “retry N blocks,” “status observable via Protect API.” (docs.flashbots.net)
Include simulated sandwich attempts on forks vs. private-path submission to validate mitigations. (docs.flashbots.net)
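A back-of-the-envelope model helps agree on what "sandwich risk" means before running fork tests. The sketch below assumes a fee-less constant-product pool, which is a simplification; the function names are hypothetical. It compares the victim's output with and without a front-run and back-run, and checks whether a given slippage tolerance would have reverted the sandwiched swap.

```python
def swap_out(reserve_in: float, reserve_out: float, amount_in: float) -> float:
    """Output of a fee-less constant-product swap (x * y = k)."""
    return reserve_out * amount_in / (reserve_in + amount_in)


def simulate_sandwich(reserve_in: float, reserve_out: float,
                      victim_in: float, attacker_in: float) -> dict:
    """Compare a clean swap with the same swap wrapped by an attacker."""
    clean = swap_out(reserve_in, reserve_out, victim_in)

    # Front-run: the attacker buys first and moves the price.
    attacker_out = swap_out(reserve_in, reserve_out, attacker_in)
    r_in = reserve_in + attacker_in
    r_out = reserve_out - attacker_out

    # Victim trades at the worse price.
    victim_out = swap_out(r_in, r_out, victim_in)
    r_in += victim_in
    r_out -= victim_out

    # Back-run: the attacker sells what they bought.
    attacker_back = swap_out(r_out, r_in, attacker_out)
    return {
        "victim_clean": clean,
        "victim_sandwiched": victim_out,
        "attacker_profit": attacker_back - attacker_in,
    }


def min_out_blocks_sandwich(result: dict, tolerance_bps: int) -> bool:
    """True if the victim's minimum-output bound would revert the swap."""
    min_out = result["victim_clean"] * (1 - tolerance_bps / 10_000)
    return result["victim_sandwiched"] < min_out
```

With equal reserves of 1,000, a victim swap of 10, and an attacker swap of 100, the victim receives less than in the clean case and the attacker ends up with a positive balance change; a tight tolerance reverts the trade while a loose one lets it through.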

Trap 7 — Supply chain risks are “DevOps’ problem”

What goes out of scope:

Package integrity for wallet connectors and SDKs.
Build provenance and attestations.
Release controls for security‑sensitive libraries.

Why it matters:

A December 2023 Ledger Connect Kit compromise injected a wallet drainer through a popular npm package. Dapps integrating it were impacted until they upgraded—classic scope leak across “someone else’s library.” (ledger.com)
Modern frameworks like SLSA define practical levels for build provenance and hardened builders; you can require them in your SoW. (slsa.dev)

How to scope it right:

Make supply chain controls in‑scope: require signed releases or SLSA Build L2+ attestations for critical off‑chain artifacts (SDKs, connectors, relayers); simulate dependency downgrade/poison tests in staging. (slsa.dev)
Document dependency pins and update playbooks; include rollback testing for compromised packages. (ledger.com)
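Dependency pinning is only testable if the pin is verified against the bytes that actually get installed. The sketch below computes an SRI-style `sha512-<base64>` string, the form used by `integrity` fields in npm lockfiles, and compares it to a pinned value. The function names are hypothetical, and fetching the artifact is deliberately left out.

```python
import base64
import hashlib
import hmac


def sri_sha512(data: bytes) -> str:
    """Return the SRI-style integrity string for an artifact's bytes."""
    digest = hashlib.sha512(data).digest()
    return "sha512-" + base64.b64encode(digest).decode("ascii")


def verify_artifact(data: bytes, pinned_integrity: str) -> bool:
    """True only when the artifact matches the pinned integrity value."""
    return hmac.compare_digest(sri_sha512(data), pinned_integrity)
```

A poison-test drill in staging can then swap the artifact bytes and assert that `verify_artifact` fails and the build stops.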

Trap 8 — “Runtime security” is left to ops, not tested

What goes out of scope:

On‑chain monitoring of admin actions, pausing, minting, and upgrade queues.
Automated response (pause, scope‑limited killswitches) and alert fidelity.

Why it matters:

You can and should test whether monitors would have caught critical events and whether responses (pauses, role revocations) work as designed. With OpenZeppelin pivoting to open‑source Monitor/Relayer and sunsetting Defender SaaS by July 1, 2026, teams must own their runtime stack. (blog.openzeppelin.com)

How to scope it right:

Include monitors in scope: define event filters (e.g., Timelock `CallScheduled`, role grants, Pausable toggles), alert routes, and “break-glass” Actions. Validate alert-to-action SLOs on testnets/forks. (docs.openzeppelin.com)
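Alert-to-action SLOs are easiest to validate when drill results are recorded in a uniform shape. The sketch below is one possible format, not a feature of any monitoring product: `DrillResult` and `slo_violations` are hypothetical names, and timestamps are plain seconds measured during a testnet or fork drill.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DrillResult:
    event_name: str              # e.g. "CallScheduled"
    emitted_at: float            # when the event landed on the fork
    alerted_at: Optional[float]  # when the alert reached its route, if ever
    action_at: Optional[float]   # when the break-glass action ran, if ever


def slo_violations(results: List[DrillResult],
                   max_alert_delay: float,
                   max_action_delay: float) -> List[str]:
    """List every drill event that missed its alert or action deadline."""
    problems = []
    for r in results:
        if r.alerted_at is None:
            problems.append(f"{r.event_name}: no alert fired")
        elif r.alerted_at - r.emitted_at > max_alert_delay:
            problems.append(f"{r.event_name}: alert too slow")
        if r.action_at is None:
            problems.append(f"{r.event_name}: no action executed")
        elif r.action_at - r.emitted_at > max_action_delay:
            problems.append(f"{r.event_name}: action too slow")
    return problems
```

An empty list from `slo_violations` is a concrete acceptance criterion that can go straight into the SoW.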

Trap 9 — Contest vs. audit vs. pentest: mismatched expectations

What goes out of scope:

Commit freezes, scope diffs, and mitigation reviews.
Repo preparedness (docs, threat model, test coverage).
Off‑chain components (APIs, bots, UIs) in code‑only contests.

Why it matters:

Contest platforms (Sherlock, Code4rena) optimize for breadth and speed on code snapshots; they can miss integrated behaviors unless scoped deliberately and backed by mitigation reviews. (sherlock.xyz)

How to scope it right:

Decide upfront: contest for code diff coverage + follow‑up mitigation review, or deep integrated pentest across off‑chain infra. Require scouts/pre‑audit scoping where available to confirm SoW realism. (code4rena.dev)

Trap 10 — “Tools = coverage” (SWC-only checklists)

What goes out of scope:

Modern vulnerability classes (oracle manipulation, AA griefing, bridge logic).
The reality that static/dynamic tools miss real‑world bugs without invariants and formal rules.

Why it matters:

The SWC Registry is no longer actively maintained, so rely on modern standards (OWASP SCSVS, EthTrust) and layer tools with invariants and properties. Empirical work shows that automated tools can underperform on real‑world code when they are evaluated only on unrealistic datasets. (github.com)

How to scope it right:

Require: Slither for static analysis, Echidna for property fuzzing, Foundry invariants, and formal specs (Certora) for high‑risk modules, each with specific properties to prove. (blog.trailofbits.com)
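"Specific properties to prove" is the part of this requirement that most often goes missing. The toy below illustrates the idea in Python rather than in Echidna or Foundry: a property (recorded total equals the sum of balances) is checked after every random call. The vault classes and the `fuzz` helper are hypothetical, and the buggy variant exists only to show the property catching a defect.

```python
import random


class Vault:
    """Toy vault: tracks per-user balances and a running total."""

    def __init__(self):
        self.balances = {}
        self.total = 0

    def deposit(self, user, amount):
        self.balances[user] = self.balances.get(user, 0) + amount
        self.total += amount

    def withdraw(self, user, amount):
        if amount > self.balances.get(user, 0):
            raise ValueError("insufficient balance")
        self.balances[user] -= amount
        self.total -= amount


class BuggyVault(Vault):
    def withdraw(self, user, amount):
        if amount > self.balances.get(user, 0):
            raise ValueError("insufficient balance")
        self.balances[user] -= amount  # bug: total is never reduced


def solvency_holds(vault) -> bool:
    """Property: the recorded total equals the sum of all balances."""
    return vault.total == sum(vault.balances.values())


def fuzz(vault_cls, steps=500, seed=1):
    """Run random calls; return the first failing step, or None."""
    rng = random.Random(seed)
    vault = vault_cls()
    for step in range(steps):
        user = rng.choice(["alice", "bob", "carol"])
        amount = rng.randint(1, 100)
        try:
            if rng.random() < 0.5:
                vault.deposit(user, amount)
            else:
                vault.withdraw(user, amount)
        except ValueError:
            continue  # rejected calls are fine; only the property matters
        if not solvency_holds(vault):
            return step
    return None
```

In the SoW, each high-risk module should list properties at this level of precision so that the fuzzing and formal-verification deliverables can be checked against them.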

Trap 11 — “Mainnet fork” is unspecified

What goes out of scope:

Which block to fork, which RPC, and chain‑specific quirks.
Impersonation and rate limits that affect reproducibility.

Why it matters:

Reproducible forks depend on pinned block numbers and reliable archive RPCs; unpinned defaults, such as forking from the latest block, change behavior from one test run to the next. (hardhat.org)

How to scope it right:

Pin per‑test forks (network, block, RPC URL); document impersonation accounts; verify limits (e.g., Flashbots Protect RPC read limits) for integrated tests. (v2.hardhat.org)
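Fork pins can be linted before any test runs. The sketch below is a minimal check, with a hypothetical `ForkPin` record and an `.invalid` placeholder URL in the usage; it rejects floating block tags such as "latest" and endpoints that are not usable URLs.

```python
from dataclasses import dataclass
from typing import List, Union
from urllib.parse import urlparse


@dataclass(frozen=True)
class ForkPin:
    suite: str
    network: str
    block: Union[int, str]
    rpc_url: str


def pin_errors(pin: ForkPin) -> List[str]:
    """Return the reasons a fork pin is not reproducible, if any."""
    errors = []
    if not pin.network:
        errors.append(f"{pin.suite}: network is missing")
    block_ok = (
        isinstance(pin.block, int)
        and not isinstance(pin.block, bool)
        and pin.block > 0
    )
    if not block_ok:
        errors.append(
            f"{pin.suite}: block must be a positive integer, got {pin.block!r}"
        )
    parsed = urlparse(pin.rpc_url)
    if parsed.scheme not in ("http", "https", "ws", "wss") or not parsed.netloc:
        errors.append(f"{pin.suite}: rpc_url is not a usable endpoint")
    return errors
```

For example, `ForkPin("oracle", "base", "latest", "")` yields two errors, while a pin with an integer block and an `https://` endpoint yields none.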

Trap 12 — “Governance is out of scope”

What goes out of scope:

Timelock windows, proposal flows, and edge‑case execution testing.
Emergency powers (pause/upgrade) and exit windows for users.

Why it matters:

Governance is your last control surface in production. Testing it requires simulated proposals, queueing, execution, and user exits within specified delays. (docs.openzeppelin.com)

How to scope it right:

Include governance E2E tests in scope: simulate proposals that upgrade implementations, change oracle feeds, or toggle Pausable; verify minimum delays are enforced and alerts fire. (docs.openzeppelin.com)
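The timelock behavior under test can be pinned down with a small model before writing fork tests. The sketch below is a simplified stand-in, not OpenZeppelin's TimelockController: the class, method names, and the `user_exit_margin` helper are hypothetical. It captures two checks from the scope text: operations cannot execute before the delay elapses, and the delay should leave users enough time to exit.

```python
class Timelock:
    """Minimal timelock model: schedule now, execute after a delay."""

    def __init__(self, min_delay: int):
        self.min_delay = min_delay
        self._ready_at = {}

    def schedule(self, op_id: str, now: int, delay: int) -> None:
        if delay < self.min_delay:
            raise ValueError("delay is below the configured minimum")
        if op_id in self._ready_at:
            raise ValueError("operation already scheduled")
        self._ready_at[op_id] = now + delay

    def is_ready(self, op_id: str, now: int) -> bool:
        return op_id in self._ready_at and now >= self._ready_at[op_id]

    def execute(self, op_id: str, now: int) -> None:
        if not self.is_ready(op_id, now):
            raise PermissionError("operation is not ready")
        del self._ready_at[op_id]


def user_exit_margin(min_delay: int, slowest_exit: int) -> int:
    """Seconds to spare after the slowest exit path; negative means
    users cannot leave before a scheduled change takes effect."""
    return min_delay - slowest_exit
```

If the slowest exit path takes longer than the minimum delay, the margin is negative, and that mismatch is a finding in its own right.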

A copy‑paste scoping template that prevents scope creep

Use this as a starting point in your SoW/RFP.

Networks and stages
- Networks: Ethereum Mainnet (block X), Arbitrum One (block Y), Base (Stage 1 as of 2024‑10), OP Mainnet (Stage 1 as of 2024‑06). (cointelegraph.com)
- Forked testing: pin blocks and RPCs per test suite (Hardhat/Anvil). (hardhat.org)
On‑chain targets (exact)
- Contract addresses, ABIs, and proxy patterns (UUPS/Transparent/Beacon), with commit hashes for the code under review. Example format drawn from public audits that pin commits and folder scopes. (openzeppelin.com)
Cross‑chain and bridges
- Bridge(s) included, chains covered, and which components: “integration only,” “messaging contracts on X,” or “guardians/validators.” Add “Impacts in Scope” table style. (immunefi.com)
Account Abstraction (if any)
- EntryPoint version, bundler environment, paymaster type, sponsorship rules, griefing protections to test. (docs.erc4337.io)
Oracles
- Chainlink feed addresses with expected heartbeat/deviation; Pyth feeds with confidence thresholds and staleness windows; required fallback/pausing behavior. (docs.chain.link)
MEV and transaction delivery
- Use of private RPC, builder share, retries, refunds; explicit acceptance outcomes for revert behavior and on‑chain exposure. (docs.flashbots.net)
Governance and upgrades
- Timelock delays, proposer/executor roles, emergency pause tests; UUPS upgrade checks (`proxiableUUID`, `onlyProxy`). (docs.openzeppelin.com)
Runtime security
- Monitoring rules (events/functions), alert channels, and automated actions tested on testnet/fork. (docs.openzeppelin.com)
Supply chain and build provenance
- Require SLSA Build L2+ attestations or equivalent for critical off‑chain artifacts; verify lockfiles and release signing. (slsa.dev)
Out of scope (explicit)
- E.g., mainnet destructive testing, third‑party SaaS backends unless listed, off‑protocol UIs—mirroring bounty “Prohibited Activities.” (immunefi.com)
Deliverables
- Reproducible forks, PoCs, invariant/property specs, formal verification rules for critical modules, mitigation review window.

Two concrete example scopes

DeFi lending on an OP Stack L2 with Chainlink oracles and UUPS upgrades

In scope: lending markets, UUPS proxies, Timelock governance, Chainlink ETH/USD and WBTC/USD feeds; forced inclusion of L1 deposits to L2 in sequencer downtime tests; Stage‑1 fault proof assumptions. Tests must reject non‑UUPS upgrades and verify `proxiableUUID`. Governance delays must be enforced, and alerts must fire on `CallScheduled` and `Upgraded` events. (optimism.io)
Acceptance: stale price detection via `updatedAt`; pause on heartbeat miss; forced inclusion accepted after window expiry; timelock min delay ≥ specified hours. (docs.chain.link)

ERC‑4337 smart account + ERC‑20 paymaster on Base

In scope: EntryPoint v0.6+/v0.7+/v0.8.x (pin), VerifyingPaymaster, bundler simulation, signature packing equivalence, griefing resistance (invalid ops, replay, sponsorship policy). Include private submission path tests via Protect RPC for gasless UX. (docs.erc4337.io)
Acceptance: changing `initCode`/`callData` after signature must be caught; deterministic `validatePaymasterUserOp`; bundler rejects ops exceeding gas bounds; private path keeps reverted ops off‑chain. (alchemy.com)

Emerging best practices to bake into every 2025 scope

Use the OWASP Smart Contract Top 10 (2025) as your baseline coverage categories—don’t rely solely on SWC. (scs.owasp.org)
Map contest/audit outputs to SCSVS controls and demand a mitigation review—especially after public contests. (scs.owasp.org)
Treat bridges as first‑class dependencies with explicit impact scopes and multi‑chain PoCs. (immunefi.com)
Assume AA stacks are in scope whenever users don’t hold native gas; test paymaster economics and griefing protection. (docs.erc4337.io)
Require SLSA‑style attestations for wallet connectors/SDKs and add dependency compromise drills to your security runbooks. (slsa.dev)
Align L2 assumptions with current decentralization stage and proof systems, and test forced inclusion/withdrawal paths. (cointelegraph.com)

Tooling that supports strong scoping (and how to require it)

Static analysis: Slither (baseline). (blog.trailofbits.com)
Property fuzzing: Echidna; Foundry fuzz/invariants with pinned blocks and replay of counterexamples. (blog.trailofbits.com)
Formal verification: Certora Prover (rules for debt accounting, liquidation invariants, governance correctness). (docs.certora.com)
Private submission & MEV hygiene: Flashbots Protect RPC with explicit configuration. (docs.flashbots.net)

Put these in the SoW as required methodologies and deliverables—not “nice to have” tools.

Final checklist: questions that kill scope creep early

Which exact commits, addresses, and networks are you testing? Are forks pinned?
Which rollup proof/upgrade model are you assuming? Are forced‑inclusion and challenge windows tested?
Are AA components (EntryPoint, bundlers, paymasters) covered with deterministic validation tests?
Which oracles/feeds are covered with staleness and confidence‑aware invariants?
Which bridges/messaging layers are in scope, on which chains, and what impacts are covered?
Are governance and upgrades tested end‑to‑end with timelock delays and alerts?
Do you test runtime monitoring and break‑glass procedures?
Are supply chain protections (attestations, signed releases) verified?

If you can answer these concretely, you’ll avoid 80% of the time/cost overrun we see in Web3 pentests.

How 7Block Labs runs “scope‑tight” Web3 pentests

Scope Gating Workshop: 90 minutes to apply the template above, confirm dependencies, and negotiate out‑of‑scope items with business justification.
Release Candidate Pinning: code freeze and commit hash(es); generate a testable “audit kit” to eliminate documentation gaps before day 1. (This mirrors the commit‑pinned scopes used in public audits.) (openzeppelin.com)
Multi‑modal Testing: static + property fuzzing + invariants on forks + formal rules for high‑risk modules; optional private‑orderflow path tests. (blog.trailofbits.com)
Mitigation Review + Runtime Drill: fix‑review with the same team, plus a runtime monitoring tabletop to validate alerts and runbooks. (docs.openzeppelin.com)

Want a scoping session tailored to your stack (L2s, bridges, AA, oracles)? We’re happy to run one with your engineering and product teams.

References (selected)

OWASP Smart Contract Top 10 (2025), SCSVS. (scs.owasp.org)
2024 loss trends and targets. (coindesk.com)
OP Stack fault proofs and stages. (cointelegraph.com)
Forced transactions on OP Stack. (docs.optimism.io)
Wormhole bounty scope patterns. (immunefi.com)
ERC‑4337 risks and protections. (alchemy.com)
UUPS/1967 proxy safety. (docs.openzeppelin.com)
Chainlink and Pyth oracle best practices. (docs.chain.link)
Flashbots Protect RPC behavior. (docs.flashbots.net)
Ledger Connect Kit supply‑chain incident. (ledger.com)
SLSA supply chain levels. (slsa.dev)
Foundry/Hardhat forking guidance. (hardhat.org)
OpenZeppelin Defender sunset and OSS Monitor/Relayer. (blog.openzeppelin.com)
