By AUJay
Blockchain intelligence for Web3: Entity Resolution for Wallet Clusters and Exchanges
Short description: How leading Web3 teams reliably map pseudonymous addresses to real entities today. We go deep on practical heuristics for UTXO and account-based chains, PoR-backed exchange labeling, 4337/AA impacts, cross‑chain bridges, quality metrics, and a concrete 90‑day build plan.
Why entity resolution matters now
If you’re building in Web3, you already know “who owns what” is the root of market intelligence, compliance, and incident response. In 2025, two shifts make accurate entity resolution mission‑critical:
- Exchange transparency is rising (ongoing Proof‑of‑Reserves, address attestations), giving us high‑precision anchors to seed label graphs. OKX alone reports >650k verified addresses across 20+ chains and publishes signed “I am an OKX address” proofs with open‑source verification tooling. (prnewswire.com)
- The onchain UX stack is changing. Account abstraction (EIP‑4337, and EIP‑7702 in the Pectra era) moves many user txns behind bundlers and paymasters, altering attribution and clustering patterns. 2024–2025 saw more than 100M 4337 UserOps and heavy paymaster coverage, with a Base‑led spike followed by a Q1 2025 pullback, shifting the “who pressed send” surface from EOAs to infrastructure. (ethereum.org)
This post is a practical field guide for decision‑makers: precise heuristics that work, where they break, what’s changed across major chains, and exactly how to stand up a production‑grade pipeline in 90 days.
From addresses to entities: what still works (and what doesn’t)
UTXO chains (Bitcoin, Litecoin, etc.)
Core heuristics
- Common‑input ownership (CIO): multiple inputs in one tx imply a single controller. This remains the safest baseline. (cacm.acm.org)
- Change‑address identification: conservative variants help cluster inputs with their (one‑time) change outputs; use sparingly to avoid super‑clusters. (cacm.acm.org)
- Miner coinbase clustering: useful for mining pools and validators. (mdpi.com)
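As a rough illustration of the CIO baseline, the sketch below merges input addresses with a union‑find structure. The transaction shape (`txid`, `inputs`) and the `skip_txids` set for suspected coinjoins are assumptions made for this example, not a fixed schema.

```python
from collections import defaultdict


class UnionFind:
    """Minimal disjoint-set structure keyed by address string."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving keeps lookups shallow.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_common_inputs(transactions, skip_txids=frozenset()):
    """Merge all input addresses of each tx into one cluster (CIO).

    Transactions listed in skip_txids (e.g., suspected coinjoins) still
    register their addresses but never cause a merge.
    """
    uf = UnionFind()
    for tx in transactions:
        inputs = tx["inputs"]
        for addr in inputs:
            uf.find(addr)  # register the address even if nothing is merged
        if tx["txid"] in skip_txids or len(inputs) < 2:
            continue
        for other in inputs[1:]:
            uf.union(inputs[0], other)
    clusters = defaultdict(set)
    for addr in list(uf.parent):
        clusters[uf.find(addr)].add(addr)
    return sorted(sorted(c) for c in clusters.values())
```

Keeping the skip list separate from the merge logic makes it easy to re‑run clustering when the set of "do not merge" transactions changes.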
Where it breaks
- CoinJoin/PayJoin disrupt CIO. 2024–2025 enforcement actions (e.g., Samourai/Whirlpool) reduced some coordinator‑led flows, but privacy tech (Wasabi/WabiSabi) still creates real false positives if you over‑cluster. Treat coinjoin‑adjacent flows as “do not merge” unless corroborated. (justice.gov)
- Regulatory landscape shifts are non‑trivial signals. OFAC’s 2022 Tornado Cash designation—and subsequent legal challenges—show how fast risk labels can change jurisdictionally. Use sanctions feeds as dynamic features, not absolute truths. (home.treasury.gov)
Practical tip
- Maintain a “mixer proximity score” that down‑weights CIO linkages within N hops of designated mixers or known coinjoin coordinators; don’t hard‑cut unless you’re in a sanctions‑screening workflow. (home.treasury.gov)
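One possible shape for such a score, assuming an undirected address graph given as edge pairs and a linear down‑weighting inside N hops (both the graph format and the weighting curve are illustrative choices, not a standard):

```python
from collections import deque


def hops_to_mixers(edges, mixers, max_hops):
    """Breadth-first search outward from mixer addresses, capped at max_hops.

    Returns {address: hop_distance} for every address within max_hops.
    """
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    dist = {m: 0 for m in mixers}
    queue = deque(mixers)
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the hop limit
        for nxt in adjacency.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def linkage_weight(address, dist, max_hops):
    """Weight in [0, 1] applied to CIO links that touch this address.

    0.0 at the mixer itself, rising linearly with distance, and 1.0
    (no down-weighting) for anything farther than max_hops.
    """
    d = dist.get(address)
    if d is None:
        return 1.0
    return d / (max_hops + 1)
```

A sanctions‑screening workflow could swap the soft weight for a hard cut by treating any weight below 1.0 as a block.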
Account‑based chains (Ethereum, Tron, others)
Core heuristics
- Deposit‑address clustering to exchanges: unique deposit accounts that periodically sweep to one or more exchange hot wallets are high‑fidelity pivots. Academic and industry analyses show large, discoverable deposit graphs and high reuse rates by users. (researchgate.net)
- Behavior/interaction patterns: “airdrop multi‑participation” and “self‑authorization” add precision for EOAs; use contract‑interaction signatures and temporal motifs to link related EOAs. (researchgate.net)
Exchange anchors you can trust (examples)
- Etherscan name tags for well‑known exchange wallets (e.g., Binance “14”; Coinbase deposit gateways) are maintained and widely referenced. Use them as seeds, not the entire label set. (etherscan.io)
- Proof‑of‑Reserves (PoR) disclosures with signed address attestations (e.g., OKX “I am an OKX address” plus a public VerifyAddress tool) are gold‑standard ground truth across chains. (okx.com)
Network‑specific factors
- Tron’s USDT gravity well: with upward of $75–80B in USDT on TRON in 2025 and a dominant share of transfers, entity resolution on TRON is largely stablecoin‑centric. Prioritizing TRC‑20 USDT flows to/from exchange clusters yields outsized coverage. (coindesk.com)
- XRP/Stellar memos: exchange custody often uses a shared address + destination tag/memo. Your entity resolver must treat “address, tag/memo” as the primary key; a missing tag isn’t a new user—it's a misattributed deposit. SEP‑29 even lets Stellar exchanges mark “memo required” on‑chain. (stellar.org)
AA/4337 impact
- Post‑4337, “who signed” ≠ “who sent.” UserOperations originate offchain; bundlers submit to EntryPoint; paymasters may fund gas. Attribution now requires decoding UserOps logs and mapping bundler/paymaster entities, not just the msg.sender of a transaction. 2024–2025 data shows heavy paymaster use (87% of UserOps in 2024) and concentration in a handful of bundlers (Coinbase, Alchemy, Pimlico, Biconomy), so maintain an AA‑specific label table. (medium.com)
Exchange labeling: from public proofs to complete clusters
A robust exchange entity graph is built from multiple independent anchors:
- PoR and signed ownership
  - Use exchanges that publish address inventories and signatures (e.g., OKX’s signed “I am an OKX address” plus zk‑STARK/Merkle proof repos). Verify cryptographically and snapshot at block heights for auditability. (okx.com)
- Explorer name tags and label APIs
  - Pull Etherscan name tags (and their API) for “Exchange” labels. Cross‑validate sample balances and historical activity to avoid squatting or vanity tags. (info.etherscan.com)
- Characteristic sweep patterns
  - Deposit addresses receiving funds from many unique EOAs and sweeping to a small set of hot/warm wallets on tight cadences (e.g., 5–60 minutes) with “round‑amount” consolidation are strong exchange signals. Validate against a known seed (e.g., Binance hot “0x28C6…bf21d60”) and grow the cluster outward via bipartite flow patterns. (etherscan.io)
- Attestations and audits
  - Some exchanges enable customer‑verifiable Merkle proofs (e.g., Kraken’s PoR flow) that indirectly confirm liabilities and reserve relationships. While not address labels per se, these increase confidence in the exchange’s on‑chain footprint. (kraken.com)
- Sanctions and takedowns
  - Integrate OFAC actions (e.g., Garantex re‑designations in 2025) and law‑enforcement seizures as negative labels. Down‑rank clusters downstream of sanctioned CEX/OTC hubs; infer successor entities (e.g., Grinex) only with multi‑source corroboration. (home.treasury.gov)
Emerging best practice
- Prefer cryptographic ownership proofs (PoR files + signatures) over purely heuristic labeling. Where available, verifying ownership transforms an otherwise probabilistic cluster into ground truth, dramatically raising precision. (github.com)
Case‑level heuristics that deliver
Below are “ready to implement” heuristics with the knobs that actually matter in 2025.
Heuristic A: Deposit‑sweep discovery (EVM chains)
Goal: discover exchange deposit addresses and their hot wallets.
Signals
- In‑degree: N unique funders over T days (e.g., N ≥ 50 funders / 7 days).
- Sweep cadence: median time‑to‑sweep ≤ 60 minutes to a small set K of recipients (K ≤ 5).
- Net flow: the address empties >90% of its balance within 24h, leaving only residual dust.
- Token mix: presence of stables + long‑tail ERC‑20s consistent with an exchange’s listing set.
- Counterparties: intersection with known exchange hot(s) (seed set). (walletfinder.ai)
Validation
- Sample 30 addresses that meet criteria; confirm 10 via explorer labels or PoR‑published sets; iterate thresholds until ≥95% precision on the sample.
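The signals above could be computed along these lines. This is a simplified sketch: the transfer record shape (`from`, `to`, `value`, `ts` in seconds) is assumed, the caller is expected to pre‑filter transfers to the observation window (e.g., 7 days), and the swept fraction is taken over that whole window rather than per 24h bucket. Token mix and seed‑set intersection are left out.

```python
from statistics import median


def deposit_sweep_features(address, transfers):
    """Summarise inflow/outflow behaviour of one candidate deposit address."""
    inflows = sorted((t for t in transfers if t["to"] == address), key=lambda t: t["ts"])
    outflows = sorted((t for t in transfers if t["from"] == address), key=lambda t: t["ts"])
    delays = []
    for t in inflows:
        # Time from each inflow to the next outflow at or after it.
        later = [o["ts"] - t["ts"] for o in outflows if o["ts"] >= t["ts"]]
        if later:
            delays.append(min(later))
    total_in = sum(t["value"] for t in inflows)
    total_out = sum(t["value"] for t in outflows)
    return {
        "unique_funders": len({t["from"] for t in inflows}),
        "sweep_recipients": len({t["to"] for t in outflows}),
        "median_sweep_seconds": median(delays) if delays else None,
        "swept_fraction": (total_out / total_in) if total_in else 0.0,
    }


def looks_like_deposit_address(f, min_funders=50, max_recipients=5,
                               max_sweep_seconds=3600, min_swept=0.90):
    """Apply the thresholds from the Signals list (all tunable)."""
    return (
        f["unique_funders"] >= min_funders
        and 1 <= f["sweep_recipients"] <= max_recipients
        and f["median_sweep_seconds"] is not None
        and f["median_sweep_seconds"] <= max_sweep_seconds
        and f["swept_fraction"] > min_swept
    )
```

Exposing the thresholds as keyword arguments fits the validation loop: re‑score the sampled addresses after each threshold change and track precision.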
Heuristic B: XRP/Stellar shared‑address pinning
Goal: resolve entity at “shared address + tag/memo.”
Signals
- Repeated deposits to the same XRP address with unique DestinationTags that later credit a single offchain user account; treat “address, tag” as the entity key.
- Stellar accounts with config.memo_required set (SEP‑29) flag custodial receivers; wallets should block memo‑less sends to those accounts. (stellar.org)
Edge case
- Missing tag ≠ new entity; it’s a miscredited deposit. Keep a recovery queue, not a “new cluster.” (support.bitpay.com)
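A minimal sketch of the composite key and the recovery queue, assuming deposits arrive as dicts with `chain`, `address`, `amount`, and an optional `tag` (destination tag or memo), and that the set of shared custodial addresses is already known. All field names here are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class CustodyKey(NamedTuple):
    """Entity key for custodial deposits: the tag/memo is part of the identity."""
    chain: str
    address: str
    tag: str


def route_deposits(deposits, shared_addresses):
    """Credit deposits to (chain, address, tag) keys.

    Deposits to a shared custodial address without a tag/memo go to a
    recovery queue instead of creating a new entity.
    """
    balances = defaultdict(int)
    recovery_queue = []
    for d in deposits:
        shared = (d["chain"], d["address"]) in shared_addresses
        tag = d.get("tag")
        if shared and tag in (None, ""):
            recovery_queue.append(d)
            continue
        # Tag 0 is a legitimate value, so only None/"" count as missing.
        key = CustodyKey(d["chain"], d["address"], str(tag) if shared else "")
        balances[key] += d["amount"]
    return dict(balances), recovery_queue
```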
Heuristic C: TRON stablecoin funnels
Goal: prioritize TRC‑20 USDT flows for coverage.
Signals
- Given USDT’s dominance on TRON (>$75B outstanding in 2025), cluster on USDT paths first; you’ll cover most exchange cash‑in/out behavior. Weight USDT flows > other tokens. (coindesk.com)
Compliance overlay
- Watch for sanctioned hubs where stablecoin issuers have frozen assets (e.g., wallet blocks tied to sanctioned exchanges). Frozen events change outflow graph dynamics. (reuters.com)
Heuristic D: 4337 attribution
Goal: attribute “who initiated” when a bundler submits the txn.
Signals
- Decode EntryPoint events to extract sender smart account, factory, paymaster. Maintain label tables for top bundlers/paymasters (Coinbase, Alchemy, Pimlico, Biconomy). For user‑level clustering, anchor on the smart account address; for B2B risk, anchor on the paymaster. (medium.com)
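The decoding step might look like the sketch below. It assumes the EntryPoint event `UserOperationEvent(bytes32 indexed userOpHash, address indexed sender, address indexed paymaster, uint256 nonce, bool success, uint256 actualGasCost, uint256 actualGasUsed)`, a log dict in JSON‑RPC style (hex strings for `topics` and `data`) that has already been filtered to that event, and that the bundler is the outer transaction's `from` address. Verify the layout against the EntryPoint version you index; factory extraction (from the separate account‑deployment event) is not covered here.

```python
ZERO_ADDRESS = "0x" + "00" * 20


def _topic_to_address(topic):
    """An indexed address is the low 20 bytes of a 32-byte topic."""
    return "0x" + topic[-40:].lower()


def decode_user_operation_event(log, tx_from):
    """Split one UserOperationEvent log into attribution roles."""
    topics = log["topics"]
    if len(topics) != 4:
        raise ValueError("expected topic0 plus three indexed fields")
    data = bytes.fromhex(log["data"][2:])
    if len(data) != 4 * 32:
        raise ValueError("expected four 32-byte words in data")
    # Non-indexed fields in order: nonce, success, actualGasCost, actualGasUsed.
    words = [int.from_bytes(data[i:i + 32], "big") for i in range(0, 128, 32)]
    paymaster = _topic_to_address(topics[3])
    return {
        "user_op_hash": topics[1].lower(),
        "smart_account": _topic_to_address(topics[2]),  # user-level anchor
        "paymaster": None if paymaster == ZERO_ADDRESS else paymaster,  # B2B anchor
        "bundler": tx_from.lower(),  # infrastructure, not the initiator
        "nonce": words[0],
        "success": words[1] == 1,
        "actual_gas_cost": words[2],
        "actual_gas_used": words[3],
    }
```

The returned `bundler` and `paymaster` addresses can then be joined against the AA label table, while `smart_account` feeds user‑level clustering.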
Cross‑chain flows: bridges as “connective tissue”
Bridges are now standard laundering and UX rails; entity resolution must follow messages across chains.
- Identify canonical bridges per network (e.g., Base’s L1StandardBridge on L1 and L2StandardBridge on L2, and the Base‑Solana bridge contracts). Treat the bridge contracts as “portals” and preserve the entity label across the hop. (docs.base.org)
- TRM’s 2025 analysis notes a shift from mixers to cross‑chain bridges for obfuscation—your resolver should assign an “obfuscation score” to bridge‑heavy routes, especially when hops are frequent and values fragment. (trmlabs.com)
Practical example
- Label: “CEX‑A on Ethereum” (seeded via Etherscan/PoR). Observe withdrawals bridging to Base via L1StandardBridge, then receiving on Base to a known “CEX‑A on Base” hot wallet—propagate the entity label across the bridge and close the cluster under a single entity. (docs.base.org)
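The label propagation in this example can be sketched as follows, assuming bridge hops have already been decoded into matched (source, destination) pairs and that labels are keyed by `(chain, address)`. The hop fields and the split into "confirmed" versus "candidate" are illustrative choices: an unlabeled recipient may simply be a customer withdrawing through the bridge, so it is not merged automatically.

```python
def propagate_across_bridge(hops, labels):
    """Carry entity labels across decoded bridge hops.

    Returns (confirmed, candidates):
      confirmed  - hops where both ends already carry the same entity label,
                   so the two chain-level clusters can be closed as one entity.
      candidates - unlabeled recipients of a labeled sender; leads only.
    """
    confirmed, candidates = [], []
    for hop in hops:
        src = (hop["src_chain"], hop["sender"])
        dst = (hop["dst_chain"], hop["recipient"])
        entity = labels.get(src)
        if entity is None:
            continue  # nothing to propagate from an unlabeled sender
        if labels.get(dst) == entity:
            confirmed.append((entity, src, dst))
        elif dst not in labels:
            candidates.append((entity, dst))
    return confirmed, candidates
```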
Quality control: how to know your clusters are right
Metrics that matter
- Precision first. Use PoR sets, signed messages, or explorer‑verified tags as ground truth.
- Gini impurity on tags per cluster: near‑zero Gini indicates purity; rising Gini signals over‑merging. Track Gini vs. threshold changes. (mdpi.com)
- Temporal stability: entity graphs should be stable over 30–90 days; high churn = over‑fitting to short‑term behavior.
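For reference, Gini impurity over the ground‑truth tags inside a cluster is 1 minus the sum of squared tag shares. A small sketch, assuming each cluster is represented as a list of entity tags for its labeled addresses (unlabeled addresses are simply omitted):

```python
from collections import Counter


def gini_impurity(tags):
    """1 - sum(p_i^2) over the tag shares p_i; 0.0 means a single tag."""
    n = len(tags)
    if n == 0:
        return 0.0
    counts = Counter(tags)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def cluster_purity_report(clusters):
    """Map each cluster id to the Gini impurity of its known tags."""
    return {cid: gini_impurity(tags) for cid, tags in clusters.items()}
```

Re‑running the report after every threshold change gives the Gini‑vs‑threshold curve described above; a jump in impurity for a previously clean cluster is the over‑merge signal.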
Nuisance patterns to guard against
- Over‑aggressive change heuristics (UTXO) creating super‑clusters. Be conservative; CIO is safer. (cacm.acm.org)
- CoinJoin/Whirlpool residues: exclude or heavily down‑weight. Recent cases (arrests, pleas, sentencings) changed the topology; annotate your model with time‑bounded policy events. (justice.gov)
- FIFO/temporal linkers on mixers: promising but use with caution; 2025 research shows FIFO matching raises linkage rates, but confirm each match via multi‑signal corroboration. (arxiv.org)
Architecture blueprint: your 90‑day build
What you need to stand up a production‑grade resolver across BTC, ETH‑family, TRON, XRP/Stellar:
Days 1–15: Data and anchors
- Nodes/indexers: archive/API access for BTC, ETH L1/L2s, TRON, XRP/Stellar.
- Seed labels:
  - PoR repos and signed address lists (OKX as exemplar) + VerifyAddress tooling. (okx.com)
  - Explorer tags (Etherscan exchange label sets via API). (docs.etherscan.io)
  - Sanctions feeds (OFAC SDN updates, mixer/exchange actions). (home.treasury.gov)
- Storage and compute: Spark/Cassandra or equivalent lakehouse; open‑source GraphSense’s stack is a solid reference for scale profiles. (graphsense.org)
Days 16–45: Heuristics and graph
- Implement CIO and conservative change heuristics for UTXO; add mixer proximity scores. (cacm.acm.org)
- Implement deposit‑sweep clustering (EVM) with thresholds above; add 4337 decoder and bundler/paymaster tables. (medium.com)
- Add XRP/Stellar composite keys (address + tag/memo) and SEP‑29 detection. (stellar.org)
- Add bridge‑aware pathfinding (Base standard bridges; other canonical L2 bridges). (docs.base.org)
Days 46–75: Evaluation and feedback
- Build a labeling workbench: show cluster purity (Gini), PoR overlap, explorer‑tag overlap, sanctions proximity. (mdpi.com)
- Human‑in‑the‑loop rules: require at least two independent anchors (e.g., signed address + sweep motif) for promotion to “trusted.”
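The two‑anchor rule is simple enough to encode directly. In this sketch, "independent" is approximated by distinct evidence families; the family names and the record shape are hypothetical.

```python
def promotion_status(evidence, required=2):
    """Return 'trusted' only when enough distinct evidence families agree.

    Two passing checks from the same family (e.g., two explorer tags)
    count once, since they are not independent anchors.
    """
    families = {e["family"] for e in evidence if e["passed"]}
    return "trusted" if len(families) >= required else "candidate"
```

For example, a signed‑address proof plus a matching sweep motif promotes a cluster, while two explorer tags alone leave it as a candidate for human review.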
Days 76–90: Operationalization
- Streaming updates: Kafka‑style ingestion of new txns, daily re‑scoring.
- Alerting: e.g., “first‑touch from sanctioned hub” or “bridge‑plus‑CEX cashout pattern.” TRM/Chainalysis trends suggest bridges are now a prime obfuscation stage; alert on those sequences. (trmlabs.com)
- Data products: entity‑level flows (by asset, chain), risk‑scored counterparties, and “who owns what” dashboards for BD, compliance, and intel teams. Nansen’s label/entity concepts are a useful mental model for UX. (docs.nansen.ai)
Practical examples (with real anchors)
Example 1: Growing a Binance cluster on Ethereum
- Start from Etherscan‑tagged hot wallets (e.g., “Binance 14” 0x28C6…bf21d60).
- Identify deposit addresses with N ≥ 50 unique funders/7d that sweep ≥ 90% of inflows to 0x28C6… or co‑moving warm wallets within 60 min.
- Validate 30 samples against explorer tags; promote matches to “trusted deposit” set; expand the entity. (etherscan.io)
Example 2: Coinbase deposit gateway recognition
- Etherscan‑tagged Coinbase deposit address “Coinbase 33” is your anchor. Find upstream EOAs repeatedly funding this address category, then follow exchange internal sweep patterns to warm/cold wallets. Keep these labels partitioned by function: “deposit,” “hot,” “cold.” (etherscan.io)
Example 3: Bridge‑to‑CEX cashout on Base
- Detect L1→Base bridge events (L1StandardBridge/L2StandardBridge), trace receiver on Base, then watch for short‑horizon transfers to a known CEX hot cluster on Base; attribute the cross‑chain leg to the same entity. (docs.base.org)
Example 4: TRON USDT retail funnels
- Prioritize TRC‑20 USDT transfers to/from tagged exchange clusters; expect coverage to dwarf non‑stable flows given >$75B USDT on TRON in 2025. (coindesk.com)
Risk and compliance notes for decision‑makers
- Sanctions change topology. OFAC actions against high‑risk exchanges (e.g., Garantex re‑designation and successor mapping to Grinex in 2025) create forks in entity graphs; model lineage explicitly and time‑bound labels. (home.treasury.gov)
- Mixers, coins, and courts: Tornado Cash’s legal status is evolving across jurisdictions; treat mixer adjacency as a risk feature, not a verdict. (reuters.com)
- Bridges are the new mixers. TRM’s 2025 report: ransomware and other actors shifted laundering from mixers to cross‑chain bridges—adjust monitoring budgets accordingly. (trmlabs.com)
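One way to model lineage and time‑bounded labels is sketched below. The record fields (`valid_from`, `valid_to`, `predecessor`) and the Unix‑timestamp convention are assumptions for illustration; the entity names in any real table come from your own sources.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TimedLabel:
    address: str
    entity: str
    valid_from: int                     # inclusive, Unix seconds
    valid_to: Optional[int] = None      # exclusive; None means still active
    predecessor: Optional[str] = None   # entity this one is believed to succeed


def label_at(labels, address, ts):
    """Return the label that applied to an address at time ts, if any."""
    for lab in labels:
        if (lab.address == address and lab.valid_from <= ts
                and (lab.valid_to is None or ts < lab.valid_to)):
            return lab
    return None


def lineage(labels, entity):
    """Walk predecessor links back from an entity, oldest last."""
    by_entity = {lab.entity: lab for lab in labels}
    chain, seen = [], set()
    while entity is not None and entity not in seen:
        seen.add(entity)
        chain.append(entity)
        lab = by_entity.get(entity)
        entity = lab.predecessor if lab else None
    return chain
```

Querying labels by timestamp keeps historical flows attributed to the entity that controlled the address at the time, rather than to whatever label is current.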
Common pitfalls (and how to avoid them)
- Over‑merging UTXO clusters with aggressive change rules: keep CIO as the primary signal; gate change‑based merges behind quality thresholds and mixer proximity scores. (cacm.acm.org)
- Treating shared custodial addresses as single users on XRP/Stellar: always include tag/memo in the entity key. (stellar.org)
- Mis‑attributing 4337 txns: make bundler/paymaster decoding mandatory in EVM resolvers; otherwise, you’ll attribute actions to infra rather than the controlling entity. (ethereum.org)
What “good” looks like in 2025
- Exchange coverage comes from verified anchors (PoR + signed addresses), not just heuristics. OKX demonstrates a replicable pattern: public address lists, signatures, and open‑source PoR verification. Push other exchanges you work with to publish similar proofs. (okx.com)
- Entity graphs are bridge‑aware, mixer‑aware, and AA‑aware. They include bundler/paymaster roles, canonical bridge contracts, and sanctions dynamics. (medium.com)
- Quality is measured continuously (precision vs. PoR/explorer seeds, Gini impurity, temporal stability), and labels are versioned over time. (mdpi.com)
Brief deep dive: FIFO and temporal heuristics on privacy systems
Recent research demonstrates that even privacy tools leak through behavior. A 2025 cross‑chain study of Tornado Cash shows 5–12% linkage from reuse/transactional heuristics and an additional 15–22 percentage‑point lift via a FIFO temporal matching rule—underscoring why multi‑signal correlation outperforms any single heuristic. Use temporal matchers for leads, but require independent confirmation before promotion to “trusted.” (arxiv.org)
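To make the idea concrete, here is a minimal first‑in‑first‑out matcher. It is an illustrative simplification and not necessarily the rule used in the cited study: it assumes all events belong to one pool and denomination, and that each record carries an `address` and a timestamp `ts`.

```python
from collections import deque


def fifo_match(deposits, withdrawals):
    """Pair each withdrawal with the oldest unmatched earlier deposit.

    Returns (deposit_address, withdrawal_address) leads. These are
    hypotheses for corroboration, not confirmed links.
    """
    pending = deque(sorted(deposits, key=lambda d: d["ts"]))
    matches = []
    for w in sorted(withdrawals, key=lambda w: w["ts"]):
        # Only a deposit made strictly before the withdrawal can fund it.
        if pending and pending[0]["ts"] < w["ts"]:
            d = pending.popleft()
            matches.append((d["address"], w["address"]))
    return matches
```

Each pair would enter the pipeline as a low‑confidence lead and only reach "trusted" after an independent anchor agrees.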
Your next 30–90 days: an actionable checklist
- Ingest PoR and signed address sets for top counterparties; verify signatures programmatically. (github.com)
- Stand up CIO + conservative change heuristics for BTC; deposit‑sweep and AA decoding for EVM; tag/memo keys for XRP/Stellar; USDT‑first clustering for TRON. (cacm.acm.org)
- Make bridges first‑class: index canonical bridge contracts and propagate entity labels across chains. (docs.base.org)
- Add sanctions proximity features and a mixer/bridge obfuscation score. (home.treasury.gov)
- Evaluate with Gini impurity and PoR/explorer overlap; require two independent anchors for “trusted” promotion. (mdpi.com)
How 7Block Labs can help
We deploy these pipelines for exchanges, L2s, and fintechs: from fast PoR‑seeded exchange maps to AA‑aware EVM resolvers and cross‑chain bridge tracking. If you need a pilot that moves the needle in 90 days—and stands up to audits—we’ll bring the playbook and the code.
Sources and further reading (selected)
- Account abstraction status, EIP‑4337/7702 and adoption metrics. (ethereum.org)
- Classic and modern clustering heuristics (BTC, ETH). (cacm.acm.org)
- Exchange labeling: Etherscan tags, OKX PoR with signed address verification, Kraken PoR examples. (info.etherscan.com)
- TRON stablecoin dominance and operational implications. (coindesk.com)
- Sanctions and enforcement: Tornado Cash, Garantex/Grinex. (home.treasury.gov)
- Mixer and privacy system leakage (FIFO temporal matches). (arxiv.org)

