At StreamingFast, we believe there is a need for a definitive home for codified blockchain data intelligence.
Smart contracts are often a subtle blend of efficiency and expressivity, which means that, for the sake of efficiency, what gets logged on chain isn't always the most expressive: logs and transaction data don't tell the whole story.
It also means that interpreting the data of certain contracts requires a lot of prior knowledge. This knowledge first lives with the authors of the protocols involved, then with the analysts who study protocol data in order to provide actionable insights, dashboards, and apps on top of it, and lastly with the developers who want to utilise that raw data for their own purposes.
This is precious knowledge.
Unfortunately, blockchains typically provide pretty raw data. Users will query it, then decode it to feed their own systems. Sometimes this work is codified for the benefit of the community (The Graph's subgraphs are a seminal instantiation of this paradigm); other times it is kept private and fed into the internal systems of a company or developer.
With the pace of innovation in Web3, new protocols and new blockchains crop up every day, and before you know it, a dozen teams have done the same work interpreting what is happening on chain. Yet another format, another API, another technology and language, each with its own tradeoffs. Often, they produce disparate values, sowing confusion in the community. “Isn’t there one way to know the price of token X on DEX Y?”, they rightly ask.
As we’ve heard from many teams, producing high-quality interpretations of blockchain data takes a lot of effort. Subtle bugs are often discovered long after smart contracts are deployed; certain unforeseen on-chain events start meddling with the data, skewing its values.
So, having dozens of teams separately tracking all of this, looking at the same bits of data and trying to extract the same meaning, all while keeping the quality of that data high, sounds like clear duplication and a lot of wasted effort.
When things happen in silos, data analysts typically request raw blockchain data as a first step, receive it, and only then decode and interpret it. Let’s see how this can be improved.
In the Shared Intelligence Layer that we envision, decoding happens earlier in the stack. By the time developers request data, it is already decoded and nicely abstracted: numbers with decimal places applied, token names included, properly laid-out prices instead of raw curve values.
By doing decoding and interpretation earlier in the stack, leveraging Substreams and its community modules as both an engine and an intelligence encoding layer, we can provide all downstream systems with higher-quality data, with no trade-off.
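To make this concrete, here is a minimal, hedged sketch of what encoding that intelligence in a Substreams module can look like, written in Rust against the substreams and substreams-ethereum crates. The `abi` bindings, the `pb` protobuf types, and the module name `map_transfers` are hypothetical stand-ins for code a module author would generate from an ABI and a .proto definition; the point is that decoding and interpretation live inside the module rather than in every consumer.

```rust
// Hedged sketch of a Substreams "map" module: decode raw ERC-20 Transfer
// logs into an abstracted, ready-to-consume form. `abi` and `pb` are
// assumed to be generated by the module author.
use substreams::errors::Error;
use substreams::Hex;
use substreams_ethereum::pb::eth::v2 as eth;
use substreams_ethereum::Event;

use crate::abi::erc20::events::Transfer; // generated from the ERC-20 ABI (assumed)
use crate::pb::erc20::v1::{Transfer as TransferEvent, Transfers}; // generated from erc20.proto (assumed)

#[substreams::handlers::map]
fn map_transfers(block: eth::Block) -> Result<Transfers, Error> {
    // Walk every log in the block and keep only ERC-20 Transfer events,
    // emitting them in an already-decoded shape.
    let transfers = block
        .logs()
        .filter_map(|log| {
            Transfer::match_and_decode(log).map(|event| TransferEvent {
                token: Hex(log.address()).to_string(),
                from: Hex(&event.from).to_string(),
                to: Hex(&event.to).to_string(),
                // This is where interpretation lives: amounts could be scaled
                // by the token's decimals, prices normalised, and so on.
                amount: event.value.to_string(),
            })
        })
        .collect();

    Ok(Transfers { transfers })
}
```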
Because the technology supports both low-latency streaming and high-speed batch use cases, consumers can feed large-scale SQL analytics stores and subgraphs, or consume the data for their trading operations or alerting systems. It is without compromise, and is therefore the best abstraction in which to encode intelligence about the sort of data we find on blockchains.
By moving decoding earlier, the flow now becomes: decode, request, receive.
This means that once someone has fully understood a set of contracts, that knowledge can be shared with everyone needing that data. And if any updates need to be made, everyone also shares in that improved intelligence. Fewer bugs, higher quality, more eyes reviewing the same code to ensure its accuracy, helping countless developers, and making it easier to build on protocols that provide Substreams modules.
StreamingFast’s latest data indexing stack, Substreams, builds upon the composability idea originally introduced by subgraphs, but at a different level. By breaking down every data source into distinct modules, data consumers can stack or combine multiple modules to retrieve exactly the data they need. And once a Substreams provider has indexed a module, its output gets cached, meaning that subsequent consumers will have near-instant access to it. Remember… flat files FTW!
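As a rough illustration, a Substreams package is described by a substreams.yaml manifest, and stacking modules is simply a matter of wiring one module's output into another module's inputs. The module names and protobuf types below are hypothetical; the overall shape follows the Substreams manifest format.

```yaml
# Hedged sketch of a substreams.yaml manifest; names and types are illustrative.
specVersion: v0.1.0
package:
  name: erc20_transfers
  version: v0.1.0

protobuf:
  files:
    - erc20.proto
  importPaths:
    - ./proto

binaries:
  default:
    type: wasm/rust-v1
    file: ./target/wasm32-unknown-unknown/release/substreams.wasm

modules:
  - name: map_transfers
    kind: map
    inputs:
      - source: sf.ethereum.type.v2.Block
    output:
      type: proto:erc20.v1.Transfers

  # A second module stacks on top of the first: it consumes already-decoded
  # transfers rather than re-reading raw blocks, so its output can be cached
  # and reused by anyone downstream.
  - name: store_balances
    kind: store
    updatePolicy: add
    valueType: bigint
    inputs:
      - map: map_transfers
```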
Substreams also changes one important aspect of the new data flow. We know that speed is important, so instead of decode, request, and receive, Substreams’ flow is rather decode, listen for new data, and push it out as soon as possible. This small yet significant change means that Substreams providers will race to push data to you as soon as it is available. No more polling, and lower latency: get closer to real time than ever before, and make better decisions quicker.
Furthermore, we have recently launched the home for all things Substreams, at substreams.dev, where you’ll find a public registry of all Substreams packages. This will help to enable the Shared Intelligence Layer, giving you direct access to the collective knowledge of all other Substreams developers.
Most blockchain implementations end up building a PostgreSQL plugin, and then a few more plugins when they get serious. But many have realised that coupling database connectors with full nodes is not the right approach; and it pains them to think they’ll need to build connectors for All The Things™. Some have built indexing frameworks with protocol-specific plugins, but that is yet more effort duplication, this time by core blockchain teams.
With the intelligence layer we propose, backed by an easy-to-integrate data layer (Firehose and Substreams), all those targets (call them sinks, or plugins) can be built once, without losing the insights and high-quality code that interprets on-chain data (the Substreams modules).
Tools like the Substreams:SQL product make it easy to pipe Substreams data into PostgreSQL or ClickHouse. All sorts of sinks exist for Substreams (go ahead and Star some of these), even one that pipes into Google Sheets (for those accountants out there tired of copying and pasting from block explorers).
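As a hedged sketch of how such a sink is fed: the SQL sink conventionally consumes a map module that emits DatabaseChanges (often named db_out). The snippet below assumes the substreams-database-change crate and the hypothetical Transfers type from the earlier sketch; the table, columns, and the transfer.id key are illustrative only.

```rust
// Hedged sketch of a `db_out` module: turn decoded transfers into
// DatabaseChanges that a SQL sink can apply to PostgreSQL or ClickHouse.
use substreams::errors::Error;
use substreams_database_change::pb::database::DatabaseChanges;
use substreams_database_change::tables::Tables;

use crate::pb::erc20::v1::Transfers; // hypothetical decoded type from the earlier sketch

#[substreams::handlers::map]
fn db_out(transfers: Transfers) -> Result<DatabaseChanges, Error> {
    let mut tables = Tables::new();

    for transfer in &transfers.transfers {
        // One row per transfer; the key (assumed here to be a unique `id`,
        // e.g. "tx_hash-log_index") and the column names would mirror the
        // SQL schema declared for the sink.
        tables
            .create_row("transfers", transfer.id.as_str())
            .set("token", &transfer.token)
            .set("from_address", &transfer.from)
            .set("to_address", &transfer.to)
            .set("amount", &transfer.amount);
    }

    Ok(tables.to_database_changes())
}
```

The sink binary then takes care of applying those changes to the target database; the exact schema and invocation depend on your setup.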
Let’s take a quick look at an example of what this Shared Intelligence Layer may look like.
A trader wants to create a program that will automatically trade NFTs on OpenSea. For this, he’ll need to know all of the listings, bids, and collections on OpenSea, as well as the current price of every token those NFTs may be listed in.
First, he’ll start by combining price feeds from three different sources: Uniswap V3, Sushiswap, and Chainlink. Instead of having to decode all of that himself, he can simply go to substreams.dev, reuse the already-available Substreams modules, and combine them into a new general-purpose (and freely available) pricing Substreams module. Each of the different colours shown below represents a different developer, each writing Substreams modules for the parts of the data layer they understand.
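As a hedged sketch under those assumptions, the trader's new package could import the existing packages and wire their modules into his own. The package file names, versions, and module names below are purely illustrative; the imports mechanism and cross-package module inputs are how published Substreams packages get combined.

```yaml
# Hypothetical manifest for the trader's pricing package.
specVersion: v0.1.0
package:
  name: nft_pricing
  version: v0.1.0

imports:
  uniswap_v3: uniswap-v3-v0.1.0.spkg       # hypothetical package file
  sushiswap: sushiswap-v0.1.0.spkg         # hypothetical package file
  chainlink: chainlink-prices-v0.1.0.spkg  # hypothetical package file

modules:
  # Combine the three imported price feeds into one general-purpose module.
  - name: map_token_prices
    kind: map
    inputs:
      - map: uniswap_v3:map_prices
      - map: sushiswap:map_prices
      - map: chainlink:map_prices
    output:
      type: proto:pricing.v1.TokenPrices
```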
Now that he has accurate, up-to-the-moment pricing information, he can combine it with all of the OpenSea data he needs.
Together, this feeds his trading program. And with granular, highly composable components, should he need to tweak anything, everything is well contained, giving him confidence that he can isolate any issues or bugs as they arise and speed up his development cycle considerably.
If you’d like to watch StreamingFast’s CTO, Alex Bourget, present this concept (as well as the tech that enables it — Firehose and Substreams), be sure to watch his quick presentation from Messari Mainnet 2023 (and you can find the transcript here).
If you’d like to take part in the revolution to move towards a shared intelligence layer, join the StreamingFast and The Graph Discord servers. Check out substreams.dev and see what Substreams packages are already available to you. Contribute where you can, as we each have our own expertise.