
By AUJay

Blockchain Indexing Tools and On-Chain Indexer: How to Pick Indexed Ethereum Data Providers

A practical buyer’s guide for startups and enterprises selecting Ethereum indexing providers after the Pectra upgrade—covering vendor capabilities, verifiable on-chain indexing patterns, SLAs, latency, data sinks, and architecture choices with concrete examples.

In this post, we translate 2025–2026 changes across Ethereum data infrastructure into an actionable checklist and decision tree, with specific product capabilities and emerging best practices your team can apply this quarter.

Why this matters now

  • Ethereum’s Pectra mainnet upgrade went live on May 7, 2025, doubling average blob throughput (from ~3 to ~6 blobs per block) and raising the validator max effective balance to 2,048 ETH. This materially increases L2 data publication throughput and can alter event and trace volumes your indexers must ingest and reconcile. Your provider choice should reflect this new ceiling on data rates and reorg handling. (blog.ethereum.org)
  • The Graph fully sunset its hosted service in June 2024 and now serves subgraphs on its decentralized network; the protocol migrated economic activity to Arbitrum in 2024 to lower costs. If your stack still points at hosted-service endpoints, you must migrate or swap providers. (thegraph.com)

The four dominant approaches to indexed Ethereum data

  1. Managed multi-chain data APIs
  • Examples: Covalent GoldRush, Alchemy Enhanced APIs, Infura Archive, QuickNode Streams.
  • Pros: Fast time-to-value, unified schemas, enterprise support and SLAs, streaming options.
  • Cons: Vendor schemas and rate limits; limited custom transforms unless paired with your own data pipeline. (covalenthq.com)
  2. Decentralized indexing (subgraphs on The Graph Network)
  • Pros: Open standard (GraphQL; see the query sketch after this list), community subgraphs, decentralization, 100k free monthly queries.
  • Cons: Performance/latency depends on indexers; complex features may require Substreams or custom indexing. (thegraph.com)
  3. Managed subgraph platforms and pipelines
  • Examples: Goldsky Subgraphs and Turbo/Mirror/Pipelines; QuickNode Streams to sinks; dedicated subgraph nodes.
  • Pros: “Drop-in” subgraph support with faster syncs, dedicated performance tuning, multi-RPC verification, real-time sinks (Postgres/ClickHouse/Kafka/S3).
  • Cons: Centralized operator; extra cost for dedicated capacity. (goldsky.com)
  4. Build-your-own indexer stack
  • Examples: Subsquid SDK/Archives (ArrowSquid), Envio HyperIndex/HyperSync, Sequence Indexer for wallet-centric use cases.
  • Pros: Full control, lowest query latency when tuned, custom transforms and multi-chain logic.
  • Cons: Ops burden (reorgs, traces, hot storage), hiring specialized data infra skills. (medium.com)
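
To make option 2 concrete, here is a minimal sketch of querying a subgraph over GraphQL in TypeScript. The gateway URL components and the `swaps` entity are placeholders for illustration; substitute your own API key, deployment ID, and schema from Subgraph Studio.

```typescript
// Query a subgraph on The Graph Network over plain GraphQL. The gateway
// URL components and the `swaps` entity are placeholders; use your own
// API key, deployment ID, and schema from Subgraph Studio.
const SUBGRAPH_URL =
  "https://gateway.thegraph.com/api/<API_KEY>/subgraphs/id/<DEPLOYMENT_ID>";

async function latestSwaps() {
  const query = `{
    swaps(first: 5, orderBy: timestamp, orderDirection: desc) {
      id
      timestamp
      amountUSD
    }
  }`;
  const res = await fetch(SUBGRAPH_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query }),
  });
  if (!res.ok) throw new Error(`Subgraph query failed: ${res.status}`);
  const { data, errors } = await res.json();
  if (errors) throw new Error(JSON.stringify(errors));
  return data.swaps;
}
```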

2026 vendor landscape: what’s actually new and useful

The Graph Network (decentralized)

  • Hosted service is permanently sunset; all queries run on the decentralized network with Subgraph Studio providing 100k free queries/month. The protocol moved rewards and economics to Arbitrum as of June 28, 2024 to reduce costs. If you rely on open subgraphs and decentralization, this is your default baseline. (thegraph.com)
  • Post-Sunrise usage has climbed (e.g., 12,402 active subgraphs in Q1 2025 and record quarterly queries), signaling continued developer adoption. Plan capacity for decentralized query latency variance. (messari.io)

Practical tip: When running mission-critical workloads on subgraphs, ask indexers for their reorg handling policy and average time-to-index after Pectra (target sub-second to a few seconds for new events; minutes for finalized state). Model query costs before committing to heavy-query dashboards. (messari.io)

Goldsky (managed subgraphs and pipelines)

  • “Serverless” subgraphs for general use; “Dedicated indexers” for custom RPC tuning, caching, and DB optimizations. One-line migration from The Graph; multi-provider RPC with global cache; versioning/blue-green cutovers. (docs.goldsky.com)
  • Turbo Pipelines stream events with millisecond-scale latency to sinks like Postgres, ClickHouse, Kafka, S3; supports TypeScript transforms, live data inspection, and dynamic address tables. Ideal when subgraphs aren’t flexible enough. (goldsky.com)
  • Alchemy sunset its Subgraphs on December 8, 2025 and partnered with Goldsky—if you’re still on Alchemy Subgraphs, migrate. (alchemy.com)

Alchemy (enhanced APIs)

  • Transfers API provides historical and real-time transfers (external, internal, token) in a single call; Token API (balances/metadata), webhooks (Notify), and Trace endpoints lower integration effort and CU costs vs raw RPC fan-outs. Use when you need “wallet/asset views” fast. (alchemy.com)
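
As a sketch of the single-call pattern: `alchemy_getAssetTransfers` is a JSON-RPC method, so a plain fetch works. The API key and address below are placeholders.

```typescript
// One alchemy_getAssetTransfers call returns external, internal, and
// token transfers for an address, replacing per-asset-type log scans.
// The API key and address are placeholders.
const RPC_URL = "https://eth-mainnet.g.alchemy.com/v2/<API_KEY>";

async function walletTransfers(address: string) {
  const res = await fetch(RPC_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      jsonrpc: "2.0",
      id: 1,
      method: "alchemy_getAssetTransfers",
      params: [
        {
          fromBlock: "0x0",
          toBlock: "latest",
          toAddress: address,
          category: ["external", "internal", "erc20", "erc721", "erc1155"],
          withMetadata: true,
          maxCount: "0x64", // 100 transfers per page
        },
      ],
    }),
  });
  const { result } = await res.json();
  return result?.transfers ?? [];
}
```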

Infura (archive + traces access)

  • Archive access is now bundled with the free tier (up to 25k archive requests/day); good for analytics backfills and contract state audits without running your own archive nodes. Validate your daily needs against plan limits. (infura.io)

QuickNode (Streams)

  • Streams delivers event-driven blockchain data to webhooks/S3 and handles retries and reorgs—reducing polling and simplifying ETL. Available to Build-plan users and higher. Useful as a clean source feeding your lakehouse or queue. (blog.quicknode.com)

Covalent GoldRush (unified multi-chain API and streaming)

  • GoldRush APIs (100+ chains) are available via Google Cloud Marketplace and expose balances, transactions, logs, traces, and NFT metadata; the Streaming API public beta offers sub-second streams for high-frequency use cases. Consider GoldRush when you need breadth across L1s/L2s with one schema. (covalenthq.com)

Dune (programmatic SQL and connectors)

  • Dune’s APIs let you turn any DuneSQL query into a REST endpoint and integrate with BI tools; rate limits scale by plan up to 1000 rpm for high-limit endpoints. Great for analytics and ops dashboards without operating infra. (dune.com)
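
A minimal sketch of the execute-then-poll flow against Dune's v1 API; the query ID and API key are placeholders, and you should confirm endpoint details against Dune's current docs.

```typescript
// Execute a saved DuneSQL query and poll for results via the v1 API.
// The query ID and API key are placeholders.
const DUNE_API = "https://api.dune.com/api/v1";
const HEADERS = { "X-Dune-API-Key": "<API_KEY>" };

async function runDuneQuery(queryId: number) {
  // Kick off a fresh execution of the saved query.
  const exec = await fetch(`${DUNE_API}/query/${queryId}/execute`, {
    method: "POST",
    headers: HEADERS,
  }).then((r) => r.json());

  // Poll until the execution finishes, then return the rows.
  for (;;) {
    const res = await fetch(
      `${DUNE_API}/execution/${exec.execution_id}/results`,
      { headers: HEADERS },
    ).then((r) => r.json());
    if (res.state === "QUERY_STATE_COMPLETED") return res.result.rows;
    if (res.state === "QUERY_STATE_FAILED") throw new Error("Query failed");
    await new Promise((ok) => setTimeout(ok, 2000)); // back off 2s
  }
}
```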

Subsquid (ArrowSquid real-time)

  • ArrowSquid introduces real-time indexing of unfinalized blocks and Apache Arrow-based ingestion from Archives for heavy squids; support for traced internal transactions is planned shortly after launch. Consider it for custom ETL with GraphQL APIs on top. (medium.com)

Envio (HyperIndex/HyperSync)

  • Envio continues expanding chain coverage (e.g., Tempo support) and publishes scenario-based benchmarks showing fast sync times for specific datasets (e.g., Uniswap V3 pool)—treat results as workload-specific and benchmark your own contracts. (docs.envio.dev)

Sequence Indexer (wallet-centric real-time)

  • If you’re building wallet/portfolio features, Sequence’s indexer advertises sub-second event availability and cached reads <300ms across many EVM networks, with webhook subscriptions for receipts/events. (sequence.xyz)

“On-chain indexer” and verifiable indexing: what’s feasible in 2026

If your business requires cryptographic guarantees for aggregated or historical data, pair conventional indexing with a ZK coprocessor that proves off-chain computation/results on-chain:

  • Axiom’s OpenVM (the team pivoted from Axiom V2 to OpenVM) is a modular zkVM with audited releases. OpenVM v1.0.0 (Mar 31, 2025) proved mainnet blocks for ~$0.0015/tx in under 3 minutes on CPU; v1.4.0 (Sep 2, 2025) added a GPU prover, cutting latency to ~15s and costs to ~$0.0003/tx, with further gains reported since. This makes verifiable post-processing practical for select data products (proof-of-analytics, verifiable risk). (axiom.xyz)
  • Lagrange’s ZK Coprocessor pre-indexes state into a proof-friendly Verifiable Database, enabling SQL-like queries with ZK proofs returned on-chain, and announced a path to decentralize proving (including AVS on EigenLayer). It’s designed for scalable queries over large storage slot sets rather than one-off slot proofs. (lagrange.dev)

Pattern: your indexer (Goldsky/Subsquid/Envio/etc.) generates fast views; your “on-chain indexer” is a thin smart contract that accepts ZK proofs from a coprocessor (OpenVM/Lagrange) for critical claims (e.g., “30-day unique counterparties across X contracts”). You thus get the speed of off-chain indexing with on-chain verifiability for the KPIs that matter.
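
A minimal sketch of the attester side of this pattern, assuming a hypothetical `submitAttestation(bytes32, bytes)` contract method; the ABI, claim encoding, and proof bytes are ours for illustration, since real OpenVM/Lagrange integrations ship their own verifier interfaces.

```typescript
import { Contract, JsonRpcProvider, Wallet } from "ethers";

// Hypothetical attester interface; real coprocessor integrations
// (OpenVM, Lagrange) define their own verifier ABIs.
const ATTESTER_ABI = [
  "function submitAttestation(bytes32 claimHash, bytes proof) external",
];

async function attestKpi(
  rpcUrl: string,
  attesterAddr: string,
  privKey: string,
  claimHash: string, // e.g., keccak256 of (metric id, block range, value)
  proofBytes: string, // serialized ZK proof produced off-chain
): Promise<void> {
  const wallet = new Wallet(privKey, new JsonRpcProvider(rpcUrl));
  const attester = new Contract(attesterAddr, ATTESTER_ABI, wallet);
  // The contract is expected to verify the proof on-chain and store
  // the claim hash so auditors can check it later.
  const tx = await attester.submitAttestation(claimHash, proofBytes);
  await tx.wait();
}
```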


Selection criteria that actually predict success

  1. Data domains and depth
  • Do you need only logs/events and token balances, or full traces and state diffs? Alchemy Enhanced APIs and Covalent cover balances/logs/traces quickly; deep traces for custom compute may still require archive access or your own indexer. (alchemy.com)
  2. Latency targets vs finality policy
  • For consumer UIs, sub-second “seen” latency from Streams/Goldsky/Sequence is workable with a reorg-safe time window. For financial reporting, query finalized blocks only (e.g., 64–128 block confirmations on L1; L2 finality varies). Verify each provider’s reorg strategy post-Pectra; a confirmation-depth sketch follows this list. (blog.quicknode.com)
  3. Reorgs and data consistency
  • Ask vendors for: number of confirmations before “final,” idempotent upsert strategy, and how they rectify divergent RPC views. Goldsky documents multi-RPC cross-checking and auto-recovery; QuickNode Streams handles retries and chain reorgs for pushes. (goldsky.com)
  4. Throughput and scaling headroom
  • Pectra doubled average blob throughput; plan for higher L2 posting rates. Stress test with brownout drills: can your provider sustain bursty event rates and trace-heavy spikes? EF materials detail the blob throughput change and block size cap to offset bandwidth. (blog.ethereum.org)
  5. Data sinks and warehouse ergonomics
  • If you drive BI/ML from your own lakehouse: Goldsky Turbo Pipelines -> ClickHouse/Kafka/Postgres/S3; QuickNode Streams -> S3/webhooks; then ingest to Snowflake via Snowpipe Streaming (10 GB/s per table, typical <10s ingest-to-query). These capabilities matter for near-real-time analytics SLAs. (docs.goldsky.com)
  6. Decentralization and portability
  • The Graph offers decentralization, open subgraphs, and a vibrant ecosystem; Goldsky provides one-line migration and dedicated indexers when you need managed performance without rewriting queries. Factor in vendor portability and schema lock-in. (thegraph.com)
  7. Programmatic analytics
  • If your team prefers SQL-first with materialized views, Dune’s API turns any query into REST with rate limits scaling by plan. Excellent for ops dashboards and ad-hoc reporting. (dune.com)
  8. Verifiability
  • For “provable KPIs,” integrate ZK coprocessors. OpenVM and Lagrange have production-ready components and credible performance envelopes today. Budget for proof costs and latency only for the claims that matter. (axiom.xyz)
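
A tiny helper for criterion 2's seen-vs-final distinction; the confirmation thresholds are illustrative defaults, not provider guarantees.

```typescript
// Classify an event's trust level from confirmation depth (criterion 2).
// Thresholds are illustrative defaults; tune per chain and risk appetite.
type TrustLevel = "seen" | "safe" | "final";

function classify(eventBlock: number, headBlock: number): TrustLevel {
  const confirmations = headBlock - eventBlock;
  if (confirmations >= 64) return "final"; // settle reports and P&L here
  if (confirmations >= 12) return "safe"; // low reorg risk, still revisable
  return "seen"; // show in the UI, but mark provisional
}
```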

A short decision tree (use-case driven)

  1. You’re shipping a cross-chain wallet or portfolio view in 4–6 weeks
  • Start with Covalent GoldRush for balances/transactions/logs/NFT metadata and pricing; add QuickNode Streams to push new transfers and token changes to your webhook/S3. If you already use subgraphs, consider Goldsky for zero-downtime migration and webhooks per entity. (covalenthq.com)
  2. You need a decentralized, open query layer with community subgraphs
  • The Graph Network with Subgraph Studio is your baseline. Validate latency and indexer quality; for heavy backfills or advanced transforms consider Substreams or running dedicated managed subgraphs on Goldsky for performance-sensitive endpoints. (thegraph.com)
  3. You need custom ETL at scale with your own schema and warehouse
  • Build with Subsquid or Envio; pipe to ClickHouse/Snowflake via Kafka/S3. Subsquid’s Arrow-based ingestion accelerates heavy historical loads; Envio’s HyperIndex focuses on high-speed sync for specific workloads. Benchmark on your contracts. (medium.com)
  4. You must prove results on-chain (compliance/settlement-grade)
  • Keep your fast off-chain indexer for UX. For monthly attestation, have a job recompute the metric via a coprocessor and post a proof to an “attester” smart contract. OpenVM’s proving pipeline and Lagrange’s verifiable database approach are both viable in 2026. (axiom.xyz)

Practical examples with exact details

  1. Base + Ethereum “DeFi user journey” analytics in under 2 weeks
  • Data ingest: QuickNode Streams pushes swaps/transfers to S3; Goldsky Turbo Pipelines mirror specific DEX pools to ClickHouse for joins, with TypeScript transforms adding token price tags (sketched after this list). (blog.quicknode.com)
  • Warehouse: Snowflake Snowpipe Streaming consumes S3 events via an intermediary or direct streaming client; expect typical <10s ingest-to-query. KPI compute runs every minute (unique wallets per pool, volume by token). (docs.snowflake.com)
  • Dashboard: Dune SQL API for public-facing charts sourced from your materialized views, or serve your own Grafana; rate limits scale with plan. (dune.mintlify.app)
  2. NFT portfolio view with verified balances and metadata
  • Use Covalent GoldRush balances API for ERC721/ERC1155 + metadata, then subscribe to the Streaming API beta for sub-second updates; cache results in Redis for <100ms UI loads. For historic correctness checks, backfill via Infura archive for specific token transfers when anomalies are detected. (covalenthq.com)
  3. Verifiable TVL attestation for a lending protocol
  • Off-chain ETL: Subsquid indexes deposits/withdrawals and computes TVL per block-height into Postgres.
  • Monthly proof: Re-run the TVL computation with OpenVM over the canonical historical state range; post a proof of “TVL at block N” to your Attester contract. Publish proof hash + methodology on-chain for auditors. Expect proof latencies in tens of seconds with GPU-backed proving for representative blocks. (blog.openvm.dev)
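
For example 1's enrichment step, a hedged sketch of the kind of TypeScript transform a pipeline might run before rows land in ClickHouse; the row shape and static price table are ours, not Goldsky's actual SDK surface.

```typescript
// Hypothetical enrichment transform in the spirit of Goldsky Turbo
// Pipelines' TypeScript transforms: tag a raw swap row with a USD price
// before it lands in ClickHouse. The row shape and static price table
// are ours, not Goldsky's actual SDK surface.
interface RawSwap {
  block_number: number;
  tx_hash: string;
  pool: string;
  token: string;
  amount_raw: string; // uint256 as a decimal string
}

const PRICES_USD: Record<string, number> = { WETH: 3000, USDC: 1 }; // stub

function enrich(row: RawSwap) {
  const amount = Number(row.amount_raw) / 1e18; // assumes 18 decimals
  return {
    ...row,
    amount,
    amount_usd: amount * (PRICES_USD[row.token] ?? 0),
  };
}
```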

Emerging best practices we see working in 2026

  • Plan for blob throughput growth: Pectra’s doubled blob capacity can raise L2 posting rates—budget extra CPU/network for indexers and ensure your sinks (Kafka/ClickHouse/Snowflake) keep up during bursts. Cap downstream consumers with backpressure. (blog.ethereum.org)
  • Separate “time-to-seen” from “time-to-final”: Present preliminary results fast (sub-second via Streams/Turbo Pipelines/Sequence), but reconcile to finalized blocks on a timer (e.g., 64–128 confirmations) to avoid drift on dashboards and P&L. (blog.quicknode.com)
  • Design for replays: Persist a deterministic event_id (chain_id, block_number, tx_hash, log_index, trace_addr) and make all upserts idempotent (see the sketch after this list). Most providers support retries; use them. (quicknode.com)
  • Use columnar + vectorized engines: When running your own pipelines, Arrow-native ingestion and engines like DataFusion or ClickHouse materially reduce cost for joins and time-series scans. Subsquid’s Arrow-based approach is aligned with this trend. (medium.com)
  • Version and canary your subgraphs: Goldsky supports tagging/blue-green; roll out new mapping logic safely without downtime. Keep a shadow indexer for diffs before cutover. (docs.goldsky.com)
  • Only prove what you must: ZK proofs are now fast enough for periodic attestations, not for every screen refresh. Use coprocessors to certify monthly KPIs, liquidation windows, or regulatory snapshots—not real-time feeds. (axiom.xyz)
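
A minimal sketch of the replay-safe pattern with node-postgres, assuming an `events` table with a unique index on `event_id`; table and column names are illustrative.

```typescript
import { Pool } from "pg";

// Replay-safe ingestion: a deterministic event_id plus an idempotent
// upsert means provider retries and reorg replays converge to one row.
// Assumes an `events` table with a UNIQUE index on event_id.
const db = new Pool({ connectionString: process.env.DATABASE_URL });

interface ChainEvent {
  chainId: number;
  blockNumber: number;
  txHash: string;
  logIndex: number;
  payload: unknown;
}

function eventId(e: ChainEvent): string {
  return `${e.chainId}:${e.blockNumber}:${e.txHash}:${e.logIndex}`;
}

async function upsertEvent(e: ChainEvent): Promise<void> {
  await db.query(
    `INSERT INTO events (event_id, block_number, payload)
     VALUES ($1, $2, $3)
     ON CONFLICT (event_id) DO UPDATE SET payload = EXCLUDED.payload`,
    [eventId(e), e.blockNumber, JSON.stringify(e.payload)],
  );
}
```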

A concise scorecard you can copy into your RFP

Weight the following (sum to 100):

  • Coverage (chains, traces, NFT/media, pricing): 15
  • Latency tiers and finality/reorg policy documented: 15
  • Sinks and streaming (Postgres/ClickHouse/Kafka/S3/Snowflake, sub-second support): 15
  • SLAs and support (on-call, incident history, migration tooling): 10
  • Decentralization/portability (subgraphs availability, one-line migration, open specs): 10
  • Cost model transparency (per-call, egress, dedicated indexers): 10
  • Observability (retries, dead-letter queues, lineage/metrics): 10
  • Verifiability options (ZK coprocessor integration patterns): 15
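
If it helps your RFP review, a small TypeScript sketch that turns the weights above into a vendor score; the criterion keys and the 0–10 rating scale are our own convention.

```typescript
// Vendor scoring helper for the RFP scorecard above. Rate each criterion
// 0–10 per vendor; weights mirror the list (they sum to 100), so the
// result lands on a 0–100 scale.
const WEIGHTS = {
  coverage: 15,
  latencyFinality: 15,
  sinksStreaming: 15,
  slaSupport: 10,
  decentralizationPortability: 10,
  costTransparency: 10,
  observability: 10,
  verifiability: 15,
} as const;

type Criterion = keyof typeof WEIGHTS;

function scoreVendor(ratings: Record<Criterion, number>): number {
  return (
    (Object.keys(WEIGHTS) as Criterion[]).reduce(
      (sum, c) => sum + ratings[c] * WEIGHTS[c],
      0,
    ) / 10
  );
}
```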

Ask every vendor for:

  • “What’s your average and p95 time-to-availability for a new L1 event? L2?”
  • “How many confirmations before you mark data ‘final’?”
  • “Show me a reorg that you auto-recovered in the last 30 days.”
  • “Which sinks and schemas do you support out of the box?”
  • “Can we replicate this workload in a 72-hour paid POC with our contracts?”
  • “If we leave, how do we export our full, normalized dataset?”

Bottom line

  • If you need decentralization and ecosystem parity: start on The Graph Network; use managed subgraph providers (e.g., Goldsky) when you outgrow baseline performance. (thegraph.com)
  • If you need fastest time-to-product for wallets/dashboards: combine a unified API (Covalent GoldRush or Alchemy Enhanced APIs) with event Streams to your warehouse. (covalenthq.com)
  • If you need custom, high-throughput ETL under your control: Subsquid or Envio + columnar sinks. Benchmark with your contracts. (medium.com)
  • If you need cryptographic assurances: add a ZK coprocessor (OpenVM/Lagrange) and prove critical KPIs on-chain on a schedule. (axiom.xyz)

Choose the smallest stack that satisfies your verifiability, latency, and data governance requirements today—then layer in dedicated indexers or coprocessors only where your product’s trust wall actually begins.
