Web3 Application Penetration Testing: Securing Smart Contracts and Frontends

Summary: Web3 security failures are no longer “just smart contract bugs”—recent exploits show equally critical weaknesses in dapp frontends, build pipelines, and L2 bridges. This post lays out a concrete, 2025-ready penetration testing playbook—what to test, how to test it, and which emerging standards and tools to incorporate to actually reduce risk.

Why pen‑test Web3 differently in 2025

In the past 24 months, the riskiest incidents were as much about UI and software supply chain as about on-chain logic:

Ledger Connect Kit’s npm package was hijacked via a compromised maintainer session, weaponizing many dapp frontends with a drainer payload within hours. (ledger.com)
Major crypto sites (CoinMarketCap in June 2025 and Cointelegraph days later) suffered front-end code injection that spawned fake wallet prompts—no protocol bug needed. (coindesk.com)
Phishing/drainer operations ballooned: Scam Sniffer tallied ~$494M stolen via wallet-drainer campaigns in 2024 alone. (drops.scamsniffer.io)

At the same time, the base layer evolved: Dencun (EIP‑4844) changed L2 data publishing; EIP‑6780 restricted SELFDESTRUCT; recent Solidity releases changed code generation and added language features; and account‑abstraction patterns (ERC‑4337, plus EIP‑7702 since the Pectra upgrade) expanded the attack surface. Your pen test must reflect these changes. (eips.ethereum.org)

What changed technically—and why it affects your test scope

EIP‑4844 (proto‑danksharding) moved rollup data to “blobs” that are cheap and pruned in ~18 days; the EVM can’t read blob contents. That’s great for fees, but it alters assumptions about data availability and mempool behavior your app may implicitly make. Pen tests should validate you’re not relying on blob persistence or EVM access. (eips.ethereum.org)
EIP‑6780 changed SELFDESTRUCT: outside “create‑and‑destroy in the same tx,” it no longer deletes code/storage—only transfers ETH. This breaks old “metamorphic” patterns and affects upgrade/downgrade strategies your tests should probe. (eips.ethereum.org)
Solidity 0.8.25–0.8.29: compiler and EVM upgrades matter to security and performance tests—MCOPY for faster byte handling (0.8.25), custom errors in require, big IR pipeline perf wins (0.8.26–0.8.28), and experimental EOF backend (0.8.29). Validate your build settings, optimizer, and EVM target in CI; fuzz under those exact settings. (soliditylang.org)
Account abstraction: ERC‑4337 is widely deployed (smart accounts, paymasters, bundlers), but adds new DoS and griefing vectors. EIP‑7702, live on mainnet since the Pectra upgrade in May 2025, lets an EOA delegate execution to contract code, and the delegation persists until the account replaces or clears it. Factor its semantics, and its rollout status on each chain you target, into threat models as you plan upgrades. (docs.erc4337.io)
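To make the "~18 days" blob window concrete, here is a small worked calculation. It is a sketch that assumes the Ethereum mainnet consensus values (4096 epochs of minimum blob retention, 32 slots per epoch, 12‑second slots); the helper names are ours, and other chains may use different parameters.

```typescript
// Consensus-layer values behind the "~18 days" figure (assumed: Ethereum mainnet).
const MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS = 4096;
const SLOTS_PER_EPOCH = 32;
const SECONDS_PER_SLOT = 12;

/** Minimum time, in seconds, that consensus peers must keep serving a blob sidecar. */
function blobRetentionSeconds(): number {
  return MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS * SLOTS_PER_EPOCH * SECONDS_PER_SLOT;
}

/** True when a blob included at `includedAtSec` can no longer be assumed retrievable at `nowSec`. */
function blobMayBePruned(includedAtSec: number, nowSec: number): boolean {
  return nowSec - includedAtSec > blobRetentionSeconds();
}

console.log((blobRetentionSeconds() / 86400).toFixed(1)); // prints "18.2"
```

A pen test can use a check like `blobMayBePruned` to pick test cases where the app tries to read rollup data older than the window and must fall back to an archive source or fail gracefully.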

A Web3 pen‑testing scope that actually maps to real risk

We divide assessments into three planes:

On‑chain contracts (Solidity/Vyper/Rust on EVM/L2s/alt‑L1s)
Frontend and build supply chain (npm, CI/CD, scripts, CSP)
Off‑chain crypto plumbing (AA wallets, relayers/paymasters, oracles, indexers, bridges/L2 messaging)

The value is in testing the seams between these planes, where most incidents originate.

Smart contracts: tests and checks that catch today’s failures

Upgradeability and storage layout

UUPS/Transparent/Diamond proxies: verify `_authorizeUpgrade` access controls, EIP‑1967 slots, and storage gaps, and confirm initializers cannot be re‑run. Run storage‑collision diffing between versions. Validate there’s no implicit reliance on SELFDESTRUCT‑based redeploy patterns broken by EIP‑6780. (github.com)

Example: hardened UUPS gate

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

import {UUPSUpgradeable} from "openzeppelin-contracts-upgradeable/proxy/utils/UUPSUpgradeable.sol";
import {AccessControlUpgradeable} from "openzeppelin-contracts-upgradeable/access/AccessControlUpgradeable.sol";

contract Treasury is UUPSUpgradeable, AccessControlUpgradeable {
    bytes32 public constant UPGRADER_ROLE = keccak256("UPGRADER_ROLE");
    uint256 public reserveCap;
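    uint256[49] private __gap; // storage gap

    /// @custom:oz-upgrades-unsafe-allow constructor
    constructor() {
        _disableInitializers(); // lock the implementation so it cannot be initialized directly
    }
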
    function initialize(address admin, uint256 cap) public initializer {
        __UUPSUpgradeable_init();
        __AccessControl_init();
        _grantRole(DEFAULT_ADMIN_ROLE, admin);
        _grantRole(UPGRADER_ROLE, admin);
        reserveCap = cap;
    }

    function _authorizeUpgrade(address newImpl)
        internal
        override
        onlyRole(UPGRADER_ROLE)
    {}
}
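The storage‑collision diffing mentioned above can be sketched as a comparison of two compiler storage layouts. This is a minimal illustration, not a replacement for dedicated upgrade tooling: `LayoutEntry` is a simplified view of the entries in solc's `storageLayout` output, and treating any variable named `__gap` as reserved space is our assumption.

```typescript
// Simplified view of one entry in solc's `storageLayout` output (assumed shape).
interface LayoutEntry {
  label: string;  // variable name
  slot: string;   // storage slot as a decimal string
  offset: number; // byte offset inside the slot
  type: string;   // type identifier, e.g. "t_uint256"
}

/** Returns findings for the old layout; an empty array means existing variables kept their place. */
function diffStorageLayout(prev: LayoutEntry[], next: LayoutEntry[]): string[] {
  const findings: string[] = [];
  prev.forEach((before, i) => {
    // Assumption: `__gap` is reserved space that later versions may consume.
    if (before.label === "__gap") return;
    const after = next[i];
    if (after === undefined) {
      findings.push(`removed: ${before.label} (slot ${before.slot})`);
    } else if (after.slot !== before.slot || after.offset !== before.offset) {
      findings.push(`moved: ${before.label} ${before.slot}:${before.offset} -> ${after.slot}:${after.offset}`);
    } else if (after.type !== before.type) {
      findings.push(`retyped: ${before.label} ${before.type} -> ${after.type}`);
    } else if (after.label !== before.label) {
      findings.push(`renamed: ${before.label} -> ${after.label} (review intent)`);
    }
  });
  return findings;
}

// Hypothetical layouts: v2 inserts `owner` in front of `reserveCap`, which shifts existing state.
const v1: LayoutEntry[] = [{ label: "reserveCap", slot: "0", offset: 0, type: "t_uint256" }];
const v2: LayoutEntry[] = [
  { label: "owner", slot: "0", offset: 0, type: "t_address" },
  { label: "reserveCap", slot: "1", offset: 0, type: "t_uint256" },
];
console.log(diffStorageLayout(v1, v2)); // one "retyped" finding for reserveCap
```

The check does not verify that a shrunken gap exactly offsets the variables added in front of it, so gap arithmetic still needs a manual look.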

ERC‑4337 smart accounts and paymasters

Validate `validateUserOp()` determinism and gas bounds; test griefing on paymaster deposits with intentionally invalid ops; run bundler simulations and `postOp()` edge cases. Include fuzzing around timeouts and nonce handling. (docs.erc4337.io)

Permits and approvals (EIP‑2612, Permit2)

Replay protection: verify domain separator strategy (chainId changes, potential forks), per‑owner nonces, and deadlines. Simulate front‑running of `permit()` relays and the UX impact. For Permit2 integrations, validate signer prompts in the UI and the max allowance duration logic. Pen tests should include phishing‑style flows that trick users into signing off‑chain approvals. (eips.ethereum.org)

L2 semantics, bridges, and cross‑domain messages

On OP Stack, test two‑step withdrawals (prove/finalize), Superchain pause hooks, and message replay/versioning, including behavior under paused messengers. Validate token bridges’ constraints (fee‑on‑transfer/rebasing). (optimism.io)
Perform adversarial tests against your app’s assumptions about message finality windows and replay across versions. Review known classes of bridge failures from recent SoK research to ensure design defenses are in place. (arxiv.org)

Fuzzing, invariants, and formal verification that scale

Foundry fuzz/invariants: build stateful fuzz harnesses that cover multi‑call sequences (e.g., deposit→borrow→liquidate→reenter) and invariants (e.g., total supply conservation, solvency). Use public examples as starting templates. (github.com)
Formal methods: integrate Certora Prover in CI to prove protocol invariants (e.g., “no unauthorized mint,” “reserves never negative”). Certora open‑sourced the Prover in 2025—use it on every change. (certora.com)
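The shape of a stateful invariant fuzz run is easier to see in miniature. A real harness would be written in Solidity for Foundry; the TypeScript sketch below only illustrates the loop (random call sequence, tolerated reverts, invariant check after every step) against a hypothetical `ToyVault` model with a seeded generator so failures can be replayed.

```typescript
/** Deterministic Park-Miller generator so a failing sequence can be replayed from its seed. */
function makeRng(seed: number): () => number {
  let state = (Math.abs(Math.floor(seed)) % 2147483646) + 1; // keep state in 1..2147483646
  return () => {
    state = (state * 48271) % 2147483647;
    return state / 2147483647;
  };
}

/** Hypothetical system under test: a ledger with per-user balances and a running total. */
class ToyVault {
  totalSupply = 0;
  balances: Record<string, number> = {};
  deposit(user: string, amount: number): void {
    this.balances[user] = (this.balances[user] || 0) + amount;
    this.totalSupply += amount;
  }
  withdraw(user: string, amount: number): void {
    const bal = this.balances[user] || 0;
    if (amount > bal) throw new Error("insufficient balance"); // models a revert
    this.balances[user] = bal - amount;
    this.totalSupply -= amount;
  }
}

/** Runs `steps` random calls; returns a description of the first invariant break, or null. */
function fuzzSupplyInvariant(seed: number, steps: number): string | null {
  const rng = makeRng(seed);
  const vault = new ToyVault();
  const users = ["alice", "bob", "carol"];
  for (let i = 0; i < steps; i++) {
    const user = users[Math.floor(rng() * users.length)];
    const amount = Math.floor(rng() * 1000);
    const action = rng() < 0.5 ? "deposit" : "withdraw";
    try {
      if (action === "deposit") vault.deposit(user, amount);
      else vault.withdraw(user, amount);
    } catch (e) {
      // Reverts are expected; the invariant must still hold afterwards.
    }
    const sum = Object.keys(vault.balances).reduce((acc, u) => acc + vault.balances[u], 0);
    if (sum !== vault.totalSupply) return `step ${i}: ${action}(${user}, ${amount}) broke supply conservation`;
  }
  return null;
}

console.log(fuzzSupplyInvariant(7, 300)); // null: supply conservation held for this sequence
```

The same structure carries over to Foundry: handlers play the role of the random actions, and the conservation check becomes an `invariant_` function.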

Maturity against compiler/EVM changes

Compile and fuzz against your production compiler version (e.g., 0.8.28/0.8.29), confirm every target chain supports the opcodes your EVM target emits (e.g., MCOPY under cancun), and benchmark gas/memory changes. These details directly affect DoS costs, loop bounds, and on‑chain unit economics. (soliditylang.org)
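One way to keep tests honest about build settings is a CI check that compares the compiler metadata of each artifact with the pinned production values. The sketch below assumes the field names of solc's metadata JSON as we understand them (`compiler.version`, `settings.evmVersion`, `settings.viaIR`, `settings.optimizer`); `PinnedBuild` and the commit suffix in the example are hypothetical.

```typescript
// Pinned production settings (hypothetical type for this sketch).
interface PinnedBuild {
  compiler: string;      // release prefix, e.g. "0.8.28"
  evmVersion: string;    // e.g. "cancun"
  viaIR: boolean;
  optimizerRuns: number; // 0 means optimizer disabled
}

/** Compares solc metadata JSON with the pinned settings and lists every mismatch. */
function checkBuildSettings(metadataJson: string, expected: PinnedBuild): string[] {
  const meta = JSON.parse(metadataJson);
  const settings = meta.settings || {};
  const optimizer = settings.optimizer || {};
  const version: string = (meta.compiler && meta.compiler.version) || "";
  const mismatches: string[] = [];
  // Metadata versions look like "0.8.28+commit.<hash>", so match on the release prefix.
  if (version !== expected.compiler && version.indexOf(expected.compiler + "+") !== 0) {
    mismatches.push(`compiler: got "${version}", want ${expected.compiler}`);
  }
  if (settings.evmVersion !== expected.evmVersion) {
    mismatches.push(`evmVersion: got "${settings.evmVersion}", want ${expected.evmVersion}`);
  }
  if ((settings.viaIR === true) !== expected.viaIR) {
    mismatches.push(`viaIR: want ${expected.viaIR}`);
  }
  const runs = optimizer.enabled === true ? optimizer.runs : 0;
  if (runs !== expected.optimizerRuns) {
    mismatches.push(`optimizer runs: got ${runs}, want ${expected.optimizerRuns}`);
  }
  return mismatches;
}

// Example metadata with a placeholder commit hash.
const sampleMetadata = JSON.stringify({
  compiler: { version: "0.8.28+commit.deadbeef" },
  settings: { evmVersion: "cancun", viaIR: true, optimizer: { enabled: true, runs: 200 } },
});
const pinned: PinnedBuild = { compiler: "0.8.28", evmVersion: "cancun", viaIR: true, optimizerRuns: 200 };
console.log(checkBuildSettings(sampleMetadata, pinned)); // []
```

Running the fuzz suite only after this check passes avoids the common case where tests exercise bytecode that differs from what is deployed.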

Frontend and supply chain: what to actually test now

Recent incidents show that a single compromised npm dependency or ad script can drain users at scale, even if your contracts are perfect.

Actionable test cases:

Malicious package injection: emulate the Ledger Connect Kit compromise by swapping a transitive dependency during build; confirm SRI checks and lockfiles prevent tampered bundles from shipping, and that CI fails on integrity mismatches. (ledger.com)
Third‑party script abuse: simulate in‑page script injection (via CDN or CMS slot) and verify your CSP blocks inline script execution and restricts connect-src to known RPC/wallet endpoints; enforce Trusted Types to shut down DOM‑XSS sinks. (developer.mozilla.org)
Malvertising and embeddables: validate that wallet connect prompts cannot be triggered from third‑party iframes/ads; reproduce the CoinMarketCap/Cointelegraph popup patterns against your site and ensure they fail under your CSP/reporting. (coindesk.com)
Email and analytics vendors: rehearse post‑breach comms and detection for list‑provider compromise (like CoinGecko’s GetResponse incident) to reduce phishing blast radius. (coingecko.com)
Ad networks on explorers: incorporate a “no‑click” control and script‑source denylist to protect users who reach your site through compromised ad slots (multiple Etherscan‑targeted campaigns used ad routers). (getblock.net)

Supply chain hardening steps to verify:

Enforce strict lockfiles and checksum verification (npm/pnpm/yarn), require 2FA for maintainers, and adopt provenance/signing (e.g., Sigstore) in CI. Test that tampered artifacts fail the pipeline. (github.blog)
Use Subresource Integrity for any CDN‑hosted script and verify failure mode in browsers; do a red‑team drill where the CDN serves a modified hash. (developer.mozilla.org)
Turn on CSP with `require-trusted-types-for 'script'` and audited Trusted Types policies; run the Chrome Lighthouse “Trusted Types” audit as a gate. (developer.chrome.com)
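The lockfile part of these checks can be sketched as a small CI gate. This example assumes an npm `package-lock.json` in the v2/v3 format (a `packages` map keyed by install path, with `resolved` and `integrity` per entry); the function name, the registry prefix argument, and the sha512‑only rule are our choices for illustration.

```typescript
// Subset of the package-lock v2/v3 fields used by this sketch.
interface LockedPackage {
  resolved?: string;
  integrity?: string;
  link?: boolean;     // workspace symlink entry
  inBundle?: boolean; // shipped inside a parent tarball
}
interface PackageLock {
  lockfileVersion: number;
  packages: Record<string, LockedPackage>;
}

/** Flags installed packages that lack a strong integrity hash or resolve outside the allowed registry. */
function auditLockfile(lock: PackageLock, allowedRegistry: string): string[] {
  const findings: string[] = [];
  Object.keys(lock.packages).forEach((path) => {
    // The root entry ("") and workspace source folders have no tarball to verify.
    if (path.indexOf("node_modules/") === -1) return;
    const pkg = lock.packages[path];
    if (pkg.link || pkg.inBundle) return;
    if (!pkg.integrity) {
      findings.push(`${path}: missing integrity hash`);
    } else if (pkg.integrity.indexOf("sha512-") !== 0) {
      findings.push(`${path}: integrity is not sha512`);
    }
    if (!pkg.resolved || pkg.resolved.indexOf(allowedRegistry) !== 0) {
      findings.push(`${path}: resolved outside ${allowedRegistry}`);
    }
  });
  return findings;
}
```

In a tamper drill, swap the `resolved` URL or the `integrity` value of one transitive dependency and confirm that both this gate and `npm ci` reject the build.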

Off‑chain crypto plumbing: wallets, relayers, and simulations

Transaction simulation in the UI is now table stakes: integrate Tenderly simulations or equivalent so users preview balance deltas and approvals before signing. Pen tests should verify that malicious transactions are caught by the simulator and that UX blocks them. (docs.tenderly.co)
Operational reality check: If you use OpenZeppelin Defender for monitors/relayers, plan migration—new sign‑ups were disabled on June 30, 2025, with final shutdown scheduled July 1, 2026. We validate that observability and auto‑pauses work post‑migration. (docs.openzeppelin.com)
Wallet prompts: forbid raw `eth_sign`; prefer EIP‑712 typed data; surface spender/chainId/expiry in human‑readable form; verify your app never asks for unlimited allowances silently. Include phishing‑style tests around Permit2 signature prompts and ensure explainers warn users appropriately. (eips.ethereum.org)
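A pre‑sign review of permit‑style typed data might look like the sketch below. The `PermitRequest` shape follows the EIP‑2612 message fields (spender, value, deadline) in simplified form; `SigningPolicy`, its knobs, and the warning texts are hypothetical, and the deadline is assumed to fit in a JavaScript number.

```typescript
// Simplified EIP-2612-style request as the dapp would pass it to the wallet.
interface PermitRequest {
  domain: { chainId: number; verifyingContract: string };
  message: { spender: string; value: string; deadline: number }; // value: decimal string, deadline: unix seconds
}

// Hypothetical policy the UI enforces before showing the signature prompt.
interface SigningPolicy {
  walletChainId: number;
  nowSec: number;
  maxValiditySec: number;
  knownSpenders: string[];
}

// 2^256 - 1, the value used for "unlimited" allowances.
const MAX_UINT256 = "115792089237316195423570985008687907853269984665640564039457584007913129639935";

/** Returns the warnings a user should see before signing; an empty array means nothing was flagged. */
function reviewPermit(req: PermitRequest, policy: SigningPolicy): string[] {
  const warnings: string[] = [];
  if (req.domain.chainId !== policy.walletChainId) {
    warnings.push("chainId in typed data does not match the connected chain");
  }
  if (req.message.value === MAX_UINT256) {
    warnings.push("unlimited allowance requested");
  }
  if (req.message.deadline <= policy.nowSec) {
    warnings.push("deadline already passed");
  } else if (req.message.deadline - policy.nowSec > policy.maxValiditySec) {
    warnings.push("deadline exceeds the allowed validity window");
  }
  const known = policy.knownSpenders.map((s) => s.toLowerCase());
  if (known.indexOf(req.message.spender.toLowerCase()) === -1) {
    warnings.push("spender is not on the known list");
  }
  return warnings;
}
```

A phishing‑style test then feeds the UI a request with an unknown spender and an unlimited value, and passes only if the prompt shows both warnings before the wallet opens.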

Account Abstraction specifics: what we break on purpose

Paymaster griefing: we flood your paymaster with invalid userOps to probe deposit drain and check that `validatePaymasterUserOp()` is deterministic and gas‑bound, and that stake plus bundler reputation rules actually limit the damage (ERC‑4337 throttles or bans misbehaving entities; it does not slash stake). We also verify `postOp()` paths against revert‑loops and unexpected gas spikes. (docs.erc4337.io)
Session keys/delegation: we attempt scope escalation (e.g., bypassing `validUntil`/target restrictions) and test revocation UX. If you’re exploring EIP‑7702 patterns, we validate chain‑specific rules, storage side effects, and front‑run resistance of delegation updates. (docs.erc4337.io)
Bundler ecosystem: ensure off‑chain simulation parity with on‑chain behavior across networks (L2s with different gas accounting). We attempt to poison simulation caches and vary mempool conditions.
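When planning a paymaster flood, it helps to estimate when bundlers will start throttling the paymaster. The sketch below models the reputation rule as we read it in the ERC‑4337 reputation rules; the formula and the default constants (denominator 10, throttling slack 10, ban slack 50) are assumptions to confirm against the current spec and the bundler you test.

```typescript
type ReputationStatus = "OK" | "THROTTLED" | "BANNED";

interface ReputationParams {
  minInclusionRateDenominator: number;
  throttlingSlack: number;
  banSlack: number;
}

// Assumed bundler-side defaults; verify against the spec version your bundler implements.
const ASSUMED_BUNDLER_PARAMS: ReputationParams = {
  minInclusionRateDenominator: 10,
  throttlingSlack: 10,
  banSlack: 50,
};

/** Classifies an entity from how many of its ops a bundler saw versus how many were included. */
function reputationStatus(
  opsSeen: number,
  opsIncluded: number,
  p: ReputationParams = ASSUMED_BUNDLER_PARAMS
): ReputationStatus {
  const minExpectedIncluded = Math.floor(opsSeen / p.minInclusionRateDenominator);
  if (minExpectedIncluded <= opsIncluded + p.throttlingSlack) return "OK";
  if (minExpectedIncluded <= opsIncluded + p.banSlack) return "THROTTLED";
  return "BANNED";
}

/** Number of never-included ops an attacker can push before the entity leaves the OK state. */
function opsUntilThrottled(p: ReputationParams = ASSUMED_BUNDLER_PARAMS): number {
  let seen = 0;
  while (reputationStatus(seen + 1, 0, p) === "OK") seen++;
  return seen;
}

console.log(opsUntilThrottled()); // 109 under the assumed defaults
```

The number is a per‑bundler budget under these assumptions, which is why a griefing test should also measure what each invalid op costs the paymaster before throttling starts.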

L2 and bridge testing: assumptions are where bugs hide

OP Stack bridge: validate prove/finalize flow, check replay protection and pause mechanisms (including Superchain‑wide pauses), and rehearse incident runbooks that use those levers. We simulate paused messengers and ensure deposits/withdrawals degrade safely. (optimism.io)
“EVM equivalence” expectations: we regression‑test for historical divergences (e.g., SELFDESTRUCT semantics) and ensure your contracts don’t depend on undefined behavior across L1/L2. (github.com)
Cross‑chain design: we map your architecture to a taxonomy of bridge design flaws and test for signature verification bypass, state verification gaps, centralized validator sets, and operator key risks using SoK‑derived checklists. (arxiv.org)
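Client code often bakes in assumptions about the prove/finalize flow, so we find it useful to test it against a small model first. The sketch below is a toy state machine, not the OP Stack portal logic: the pause flag, the `proofMaturitySec` window, and all names are assumptions chosen to exercise paused messengers, early finalization, and replay.

```typescript
type WithdrawalStatus = "initiated" | "proven" | "finalized";

interface BridgeWithdrawal {
  status: WithdrawalStatus;
  provenAtSec?: number;
}

// Hypothetical bridge parameters for the drill.
interface BridgeConfig {
  paused: boolean;
  proofMaturitySec: number; // waiting period between prove and finalize
}

/** Step 1: prove the withdrawal on L1. Rejected while the bridge is paused. */
function proveWithdrawal(w: BridgeWithdrawal, cfg: BridgeConfig, nowSec: number): BridgeWithdrawal {
  if (cfg.paused) throw new Error("bridge paused");
  if (w.status !== "initiated") throw new Error("already proven or finalized");
  return { status: "proven", provenAtSec: nowSec };
}

/** Step 2: finalize after the maturity window. Replays fail because the status is no longer "proven". */
function finalizeWithdrawal(w: BridgeWithdrawal, cfg: BridgeConfig, nowSec: number): BridgeWithdrawal {
  if (cfg.paused) throw new Error("bridge paused");
  if (w.status !== "proven" || w.provenAtSec === undefined) throw new Error("not proven");
  if (nowSec - w.provenAtSec < cfg.proofMaturitySec) throw new Error("proof not mature");
  return { status: "finalized", provenAtSec: w.provenAtSec };
}

// Example drill with an assumed 7-day window.
const cfg: BridgeConfig = { paused: false, proofMaturitySec: 7 * 24 * 3600 };
const proven = proveWithdrawal({ status: "initiated" }, cfg, 0);
const done = finalizeWithdrawal(proven, cfg, 7 * 24 * 3600);
console.log(done.status); // prints "finalized"
```

Each thrown error corresponds to a state the frontend should present clearly, for example "paused, funds safe, retry later" rather than a generic failure.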

Red‑teaming the frontend like attackers did

We reproduce real‑world attacker TTPs:

DNS registrar social engineering → subdomain takeover → malicious WalletConnect prompt (as alleged in Balancer’s 2023 incident). We verify registrar lock settings, 2FA, and registry guardrails. (cointelegraph.com)
Ad‑script malvertising on high‑traffic pages → fake airdrop banners. Your CSP should block it; your client‑side detectors should flag it; your incident banner should route users to safety within minutes. We validate all three steps. (coindesk.com)
GTM/script‑tag injection → swapped approval target (KyberSwap’s 2022 UI exploit class). We confirm SRI breaks the load and your build disallows inline script execution. (blog.kyberswap.com)
npm maintainer/session compromise (Ledger 2023): we test whether a compromised dependency can ship to production or whether your pipeline’s provenance/SBOM/integrity gates block it. (ledger.com)

Practical examples you can copy into your backlog

Add simulation to approval flows:
- For any token approval (incl. Permit2), show a pre‑sign summary with spender, allowance, expiry, and chainId. Reject signatures if the sim shows transfers you didn’t initiate. Back it with Tenderly single‑tx simulation. (docs.tenderly.co)
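The "reject if the simulation shows transfers you didn’t initiate" rule can be sketched as a fail‑closed gate. The `SimOutcome` shape below is hypothetical and is not Tenderly's response schema; map your simulator's output into it before calling the gate.

```typescript
// Hypothetical, simulator-agnostic shapes for this sketch.
interface AssetDelta { token: string; from: string; to: string; amount: string; }
interface SimOutcome { success: boolean; deltas: AssetDelta[]; }
interface SigningIntent {
  user: string;
  expectedOutflows: { token: string; to: string }[]; // transfers the user actually asked for
}

function sameAddr(a: string, b: string): boolean {
  return a.toLowerCase() === b.toLowerCase();
}

/** Transfers out of the user's account that do not match anything the user initiated. */
function unexpectedOutflows(intent: SigningIntent, deltas: AssetDelta[]): AssetDelta[] {
  return deltas.filter(
    (d) =>
      sameAddr(d.from, intent.user) &&
      !intent.expectedOutflows.some((e) => sameAddr(e.token, d.token) && sameAddr(e.to, d.to))
  );
}

/** Fail closed: block when the simulation is missing, failed, or shows unexpected outflows. */
function shouldBlockSigning(intent: SigningIntent, sim: SimOutcome | null): boolean {
  if (sim === null || !sim.success) return true;
  return unexpectedOutflows(intent, sim.deltas).length > 0;
}
```

Blocking on a missing simulation is a deliberate choice here: an attacker who can make the simulator time out should not be able to turn the check off.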

Harden CSP and Trusted Types:

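Enforce a policy along these lines. Add 'strict-dynamic' only together with nonces or hashes, because browsers that honor it ignore 'self' and host allowlists in script-src:

Content-Security-Policy: default-src 'self'; script-src 'self' https://cdn.youcontrol/; require-trusted-types-for 'script'; trusted-types appPolicy; connect-src 'self' https://mainnet.infura.io ...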

Add SRI to any CDN script and assert that builds fail when hashes drift. (developer.chrome.com)
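A policy like this is easy to regress, so it is worth linting the header value in CI. The sketch below is a deliberately small checker with rules we chose for this post (Trusted Types required, no unguarded 'unsafe-inline', no 'strict-dynamic' without a nonce or hash, no wildcard connect-src); it is not a full CSP parser, and the hosts in the example are placeholders.

```typescript
/** Parses a CSP header value (without the "Content-Security-Policy:" prefix) into directives. */
function parseCsp(header: string): Record<string, string[]> {
  const directives: Record<string, string[]> = {};
  header.split(";").forEach((part) => {
    const tokens = part.trim().split(/\s+/).filter((t) => t.length > 0);
    if (tokens.length === 0) return;
    const name = tokens[0].toLowerCase();
    if (!(name in directives)) directives[name] = tokens.slice(1); // keep the first occurrence
  });
  return directives;
}

/** Returns findings for the rules described above; an empty array means the policy passed. */
function auditCsp(header: string): string[] {
  const csp = parseCsp(header);
  const findings: string[] = [];
  if (!("script-src" in csp) && !("default-src" in csp)) {
    findings.push("no script-src or default-src: scripts are unrestricted");
  }
  const scriptSrc = csp["script-src"] || csp["default-src"] || [];
  const hasNonceOrHash = scriptSrc.some((s) => /^'(nonce|sha256|sha384|sha512)-/.test(s));
  if (scriptSrc.indexOf("'unsafe-inline'") !== -1 && !hasNonceOrHash) {
    findings.push("script-src allows 'unsafe-inline' without a nonce or hash");
  }
  if (scriptSrc.indexOf("'strict-dynamic'") !== -1 && !hasNonceOrHash) {
    findings.push("'strict-dynamic' without a nonce or hash blocks allowlisted scripts");
  }
  const trustedTypes = csp["require-trusted-types-for"] || [];
  if (trustedTypes.indexOf("'script'") === -1) {
    findings.push("require-trusted-types-for 'script' is missing");
  }
  const connectSrc = csp["connect-src"] || csp["default-src"] || [];
  if (connectSrc.length === 0 || connectSrc.indexOf("*") !== -1) {
    findings.push("connect-src is missing or allows any origin");
  }
  return findings;
}

// Placeholder hosts; substitute your own CDN and RPC endpoints.
const examplePolicy =
  "default-src 'self'; script-src 'self' https://cdn.example; " +
  "require-trusted-types-for 'script'; trusted-types appPolicy; connect-src 'self' https://rpc.example";
console.log(auditCsp(examplePolicy)); // []
```

Restricting connect-src matters for dapps in particular, because an injected script that cannot reach its own RPC or exfiltration endpoint has far fewer ways to stage a drainer prompt.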

Approvals safety:
- Prefer bounded allowances and expiring approvals. Use EIP‑2612 with nonces+deadline; if you cache the domain separator, recompute it whenever block.chainid differs from the cached chain ID so signatures cannot be replayed across a fork. (eips.ethereum.org)

Compiler pinning & test realism:
- Pin to your production compiler (e.g., 0.8.28 or 0.8.29), enable via‑IR if production uses it, and rerun fuzz+invariants after each update; track the EVM target (cancun, from the Dencun upgrade, and later) on every chain you deploy to. (soliditylang.org)

Tooling upgrades worth making this quarter

Slither‑MCP: augment code review with modern Slither and LLM‑assisted workflows to find variant bugs earlier in CI. We add detectors and tune prompts to your codebase. (trailofbits.com)
Formal proofs in CI: adopt the now‑open‑source Certora Prover on the few invariants that matter for your protocol (e.g., solvency, one‑way upgrades). Treat proofs as build gates for high‑risk paths. (github.com)
Simulations everywhere: wire Tenderly RPC/API into staging and prod UIs, displaying asset deltas for every tx; fail closed if simulation returns unexpected side effects. (docs.tenderly.co)

A concise 2‑week pen‑test plan we run for startups and enterprises

Day 1–2: Architecture review and threat modeling (contracts, AA flows, L2 bridges, frontends, CI/CD). Artifact: threat map with attack trees tied to business impact.
Day 3–6: On‑chain testing (static analysis, Foundry fuzz/invariants, targeted manual review) under your exact compiler/EVM settings; formal checks on critical invariants.
Day 5–8: Frontend/supply‑chain exercises (CSP/Trusted Types/SRI validation, npm tamper drills, malvertising and popup phishing playbook).
Day 7–10: Off‑chain plumbing (bundler/paymaster griefing; simulation parity; bridge prove/finalize and pause drills; cross‑chain replay tests).
Day 11–12: Exploit reproduction and risk quantification with business metrics (loss ceilings, exploit paths, blast radius by user cohort).
Day 13–14: Fix‑oriented report and retest, with PR‑level patches or policy diffs your team can merge immediately.

Emerging best practices (2025) to institutionalize

Treat blob data (EIP‑4844) as ephemeral and off‑EVM—don’t design UX or proofs that assume long‑term blob availability or EVM access. Pen‑test for failure when blobs expire. (eips.ethereum.org)
Assume wallet drainer campaigns are continuous: require simulation and human‑readable typed data for every signature flow; never use `eth_sign`. Track phishing stats as a KPI. (drops.scamsniffer.io)
Pre‑approve incident levers on L2s: verify Superchain‑pause paths, messenger pause, and withdrawal halts work end‑to‑end with signoff from governance. (gov.optimism.io)
Keep up with compilers/EVM: budget for quarterly re‑tests after compiler/EVM upgrades (0.8.29’s EOF is experimental but already changes assumptions in assembly‑heavy code). (soliditylang.org)

Bottom line for decision‑makers

The biggest Web3 losses now often start outside your contracts—your pen test must attack the frontend, the vendor chain, the wallet prompts, and the bridge assumptions with the same rigor as it audits Solidity.
Modernize your security stack: simulations in‑UI, strong CSP/Trusted Types/SRI, AA/paymaster hardening, L2 bridge drills, formal proofs for critical invariants, and storage‑safe upgrades.
Make it continuous: tie these controls to CI, freeze builds on integrity or proof failures, and budget for re‑testing after protocol/compiler releases.

If you want a pen test that mirrors how 2025 attackers actually operate—and leaves you with patches and policies you can merge—7Block Labs can help. We’ll scope an engagement around your protocol’s unique risk profile and ship fixes, not just findings.