Low-Latency Indexing on Sui with gRPC Streaming
Sui’s indexing stack supports gRPC streaming as a first-class data source, enabling low-latency ingestion without sacrificing correctness or recoverability.
Main Takeaways
- gRPC streaming allows indexers to ingest checkpoints as soon as they are finalized, reducing latency compared to polling-based approaches.
- Streaming is paired with polling-based sources to backfill historical data and recover from interruptions, ensuring correctness and resilience.
- The Custom Indexing Framework lets developers adopt streaming incrementally without changing how checkpoints are processed.
Overview
Accessing blockchain data has traditionally been harder than it should be. On most networks, developers rely on endpoints that were not designed for high-volume, structured, or real-time reads. As applications such as wallets, dashboards, explorers, and analytics tools grow beyond simple transaction lookups, the cracks start to show.
Polling-heavy designs introduce unnecessary latency, duplicate work, and fragile retry logic. Teams end up maintaining custom indexers just to get predictable reads, and those pipelines often become one-off systems that are expensive to scale and painful to operate.
On Sui, we’re taking a different approach. Instead of treating data access as an afterthought, the data stack is built around checkpoints as a first-class abstraction and indexers that advance deterministically with the ledger. gRPC streaming is a natural extension of that design. It is now supported as a first-class data source in the Custom Indexing Framework and used in practice by the General-Purpose Indexer.
Custom Indexing Framework: Decoupling Processing from Ingestion
The Custom Indexing Framework provides a checkpoint-driven processing pipeline that is intentionally agnostic to where checkpoint data comes from. Indexers consume checkpoints in order and transform them into application-specific views, but the ingestion layer is fully configurable.
This separation matters because data access requirements change over time. Early-stage projects often optimize for simplicity. Production systems optimize for latency, reliability, and cost. The framework supports both without forcing an early commitment.
Today, data sources fall into two categories: polling-based sources that periodically fetch checkpoints, and push-based sources that stream checkpoints as they are produced. gRPC streaming is the only push-based option.
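To make the separation concrete, here is a minimal sketch of a processing loop that depends only on an abstract source. The names `Checkpoint`, `CheckpointSource`, `PollingSource`, `PushSource`, and `process_all` are hypothetical and are not the framework's actual types. A `Vec` stands in for a remote checkpoint store and a standard-library channel stands in for a gRPC stream.

```rust
use std::sync::mpsc::Receiver;

/// Hypothetical, simplified checkpoint: only a sequence number.
#[derive(Debug, Clone, PartialEq)]
pub struct Checkpoint {
    pub sequence_number: u64,
}

/// Hypothetical abstraction over where checkpoints come from.
pub trait CheckpointSource {
    /// Returns the next checkpoint, or None when the source has nothing more.
    fn next_checkpoint(&mut self) -> Option<Checkpoint>;
}

/// Polling-style source: the consumer asks a store for a specific sequence number.
pub struct PollingSource {
    pub store: Vec<Checkpoint>, // stand-in for a remote checkpoint store
    pub next: u64,
}

impl CheckpointSource for PollingSource {
    fn next_checkpoint(&mut self) -> Option<Checkpoint> {
        let found = self
            .store
            .iter()
            .find(|c| c.sequence_number == self.next)
            .cloned();
        if found.is_some() {
            self.next += 1;
        }
        found
    }
}

/// Push-style source: a producer sends checkpoints as they are produced.
pub struct PushSource {
    pub incoming: Receiver<Checkpoint>, // stand-in for a streaming connection
}

impl CheckpointSource for PushSource {
    fn next_checkpoint(&mut self) -> Option<Checkpoint> {
        // Blocks until a checkpoint is pushed; yields None once the sender is dropped.
        self.incoming.recv().ok()
    }
}

/// Processing depends only on the trait, not on the concrete source.
pub fn process_all(source: &mut dyn CheckpointSource) -> Vec<u64> {
    let mut seen = Vec::new();
    while let Some(checkpoint) = source.next_checkpoint() {
        seen.push(checkpoint.sequence_number);
    }
    seen
}
```

Passing either a `PollingSource` or a `PushSource` to `process_all` runs the same processing code, which is the property that makes the ingestion layer swappable in this sketch.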
What gRPC Streaming Unlocks
With gRPC streaming, fullnodes push new checkpoint data to your indexer as soon as it becomes available. There is no need to poll, no guesswork around timing, and no artificial delay introduced by fetch intervals.
This changes the shape of data ingestion. Instead of repeatedly asking “is there something new?”, your indexer reacts to new checkpoints as events. For latency-sensitive workloads, this can significantly reduce the time between a checkpoint being finalized and it being processed downstream.
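A rough way to see the cost of fetch intervals is to compute how long a checkpoint waits for the next poll. The sketch below assumes polls fire at exact multiples of a fixed interval and ignores network and fetch time. The function names and the 500 ms interval used in the usage note are illustrative assumptions, not framework defaults.

```rust
/// Delay in milliseconds between a checkpoint becoming available and the
/// next poll tick, assuming polls fire at multiples of `poll_interval_ms`.
pub fn polling_delay_ms(available_at_ms: u64, poll_interval_ms: u64) -> u64 {
    assert!(poll_interval_ms > 0, "poll interval must be positive");
    let remainder = available_at_ms % poll_interval_ms;
    if remainder == 0 {
        0
    } else {
        poll_interval_ms - remainder
    }
}

/// Mean added delay when availability times are spread evenly, at
/// millisecond granularity, across one polling interval.
pub fn mean_polling_delay_ms(poll_interval_ms: u64) -> f64 {
    assert!(poll_interval_ms > 0, "poll interval must be positive");
    let total: u64 = (0..poll_interval_ms)
        .map(|t| polling_delay_ms(t, poll_interval_ms))
        .sum();
    total as f64 / poll_interval_ms as f64
}
```

With a 500 ms interval, a checkpoint that becomes available at t = 1250 ms waits 250 ms for the next poll, and the mean added delay under these assumptions is about half the interval. A push-based source does not pay this particular cost, because the checkpoint is sent as soon as it is available.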
Run your indexer with the --streaming-url argument to configure a fullnode as the checkpoint source via gRPC streaming. A polling-based source, configured with --remote-store-url, serves as the fallback for the push-based source, as discussed in the next section.
cargo run -- --streaming-url https://fullnode.testnet.sui.io:443 --remote-store-url https://checkpoints.testnet.sui.io
Why Streaming Alone is not Enough
Streaming is fast, but it is not designed to be a complete data source on its own. A streaming connection only delivers checkpoints from the moment the connection is established. It does not provide historical data and cannot fill gaps caused by transient failures.
Rather than hiding these constraints, the Custom Indexing Framework embraces them explicitly. Every streaming setup requires a polling-based fallback source, such as a remote checkpoint store or RPC endpoint. The indexer uses polling to backfill historical checkpoints or recover from interruptions, then switches back to streaming once it catches up to the chain tip.
This hybrid model avoids the two common failure modes of indexing systems: real-time pipelines that lose data, and reliable pipelines that lag behind.
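The decision described above can be sketched as a small pure function: given the next checkpoint the indexer needs and the position at which the stream would start, decide what to backfill. `Plan`, `plan_ingestion`, and the `batch` parameter are hypothetical names for illustration, not framework APIs.

```rust
use std::ops::Range;

/// What the indexer should do next (hypothetical model).
#[derive(Debug, PartialEq)]
pub enum Plan {
    /// The stream starts at or before the next needed checkpoint: no gap to fill.
    StreamOnly,
    /// Backfill the half-open range via polling while streaming from its end.
    BackfillThenStream(Range<u64>),
    /// The stream is unavailable: poll a bounded batch, then retry the stream.
    PollOnly(Range<u64>),
}

/// `next_needed` is the first checkpoint not yet ingested.
/// `stream_start` is the first checkpoint the stream would deliver, if it is up.
/// `batch` bounds how far to poll before retrying a failed stream.
pub fn plan_ingestion(next_needed: u64, stream_start: Option<u64>, batch: u64) -> Plan {
    match stream_start {
        // The stream begins past what we need, so polling must fill the gap.
        Some(start) if start > next_needed => Plan::BackfillThenStream(next_needed..start),
        // Nothing is missing between our position and the stream.
        Some(_) => Plan::StreamOnly,
        // No stream: make bounded progress with polling alone.
        None => Plan::PollOnly(next_needed..next_needed.saturating_add(batch)),
    }
}
```

For example, an indexer restarting at checkpoint 100 while the stream begins at 250 would backfill 100..250 via polling, so no checkpoint is skipped even though the stream itself carries no history.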
Here is an example illustrating how the indexer can automatically switch between polling and streaming:
let mut current_checkpoint = start_checkpoint;
while current_checkpoint < end_checkpoint {
    let polling_handle;
    let streaming_handle;
    if let Ok(network_latest) = stream.peek() {
        // Ingest using two sources in parallel:
        // 1. Use the polling-based source to backfill up to the network latest.
        //    This uses a reliable source with retries built into the client.
        polling_handle = backfill(current_checkpoint, network_latest);
        // 2. Use streaming for the latest checkpoints in real time. Connections
        //    to the streaming source may break, and no resumption option is provided.
        streaming_handle = stream.start(network_latest, end_checkpoint);
    } else {
        // If the stream fails to start, use the polling-based source to backfill
        // BACKOFF_SIZE checkpoints before retrying.
        let polling_end = current_checkpoint + BACKOFF_SIZE;
        polling_handle = backfill(current_checkpoint, polling_end);
        streaming_handle = noop_handle();
    }
    // Wait for both ingestion handles to complete in parallel, either due to
    // range completion or interruptions in streaming. Results come back in
    // argument order: polling first, then streaming.
    let (polling_result, streaming_end) = futures::join!(polling_handle, streaming_handle);
    // Error out if backfill fails, which indicates overall cancellation or
    // issues that need developer attention.
    if let Err(error) = polling_result {
        return Err(error);
    }
    // Ingestion is complete up to streaming_end, so advance the current checkpoint to it.
    current_checkpoint = streaming_end;
}
How the General-Purpose Indexer Uses gRPC Streaming
The General-Purpose Indexer is designed to serve as the core infrastructure behind GraphQL RPC. That means it needs to stay close to the chain head while remaining correct and resilient under load.
In practice, this means using gRPC streaming as the primary ingestion path, with polling-based sources always configured as a safety net. Streaming keeps the indexed data fresh. Polling ensures the system can restart cleanly, recover from failures, and backfill seamlessly.
This approach lets us run a single, modular indexer.
What this Enables for Custom Indexers
You do not need to be running a general-purpose indexer to benefit from gRPC streaming. Custom indexers can adopt streaming incrementally, based on their requirements.
Streaming is a good fit when you care about freshness or responsiveness, such as real-time monitoring, analytics, or alerting systems. Polling-only setups remain a perfectly valid choice for simpler workflows or offline processing.
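One way to picture this choice is as configuration in which the polling source is always present and the streaming source is optional. The sketch below is a hypothetical model, not the framework's actual argument parser. `IngestionConfig`, `Mode`, and `from_args` are illustrative names, and the sketch simplifies by treating a remote checkpoint store as the only polling source.

```rust
/// Hypothetical configuration model: polling is required, streaming is optional.
#[derive(Debug, PartialEq)]
pub struct IngestionConfig {
    pub remote_store_url: String,
    pub streaming_url: Option<String>,
}

#[derive(Debug, PartialEq)]
pub enum Mode {
    PollingOnly,
    StreamingWithPollingFallback,
}

impl IngestionConfig {
    /// Parses `--remote-store-url <url>` and `--streaming-url <url>` pairs.
    pub fn from_args(args: &[&str]) -> Result<Self, String> {
        let mut remote_store_url = None;
        let mut streaming_url = None;
        let mut iter = args.iter();
        while let Some(flag) = iter.next() {
            match *flag {
                "--remote-store-url" => remote_store_url = iter.next().map(|v| (*v).to_string()),
                "--streaming-url" => streaming_url = iter.next().map(|v| (*v).to_string()),
                other => return Err(format!("unknown argument: {}", other)),
            }
        }
        // Streaming never runs without a polling fallback in this model.
        let remote_store_url = remote_store_url
            .ok_or_else(|| "a polling source (--remote-store-url) is required".to_string())?;
        Ok(Self { remote_store_url, streaming_url })
    }

    /// Which ingestion mode this configuration selects.
    pub fn mode(&self) -> Mode {
        if self.streaming_url.is_some() {
            Mode::StreamingWithPollingFallback
        } else {
            Mode::PollingOnly
        }
    }
}
```

In this model, moving from polling-only to streaming means supplying one additional value. Nothing about how checkpoints are processed needs to change.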
The key point is that the framework does not lock you in. You can start with polling, add streaming later, or run both together from day one. The processing model stays the same. If you have an existing custom indexer that is based on the official indexing framework and only uses the polling source:
cargo run -- --remote-store-url https://checkpoints.testnet.sui.io
You can add a gRPC streaming source to it by passing a --streaming-url argument:
cargo run -- --streaming-url https://fullnode.testnet.sui.io:443 --remote-store-url https://checkpoints.testnet.sui.io
Streaming-First Data Access on Sui
gRPC streaming is part of a broader shift in how Sui exposes data. The goal is to move away from loosely typed, request-response APIs toward structured, composable, and streaming-friendly interfaces.
As more of the Sui data stack adopts this model, developers spend less time managing bespoke infrastructure and more time building applications. Indexers become easier to reason about, failures become easier to recover from, and real-time use cases stop feeling like special cases.