Substreams’ new index module brings a major performance boost

Published on October 25, 2024

For some smart contracts with sparse data, we’ve observed an impressive 50x performance increase!

One of the most important features of Substreams is the ability to select which specific data you want to retrieve from the blockchain. This is possible because you have access to the full and raw data of every block through a Protobuf schema (the Block Protobuf schema for each protocol).

To start developing a Substreams, you import the Block Protobuf, which means your module automatically processes all the data of every block. While this is powerful, it also has a drawback: you might not be interested in all the data contained in the block.

For example, if you are developing an application that extracts transactions from the Aave protocol, you can use index modules to skip the blocks that do not contain transactions related to the Aave protocol, reducing cost and increasing speed.

How Does It Work?
An index module is a special type of module that creates an index describing the data contained within each block. Suppose you want to retrieve all the Ethereum events emitted by a specific address. Without an index, you would iterate through all the logs of each block, looking for those where log.address == ADDRESS.
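To make the cost of that manual scan concrete, here is a minimal sketch in plain Rust. The `Log` and `Block` types and the `scan_all_blocks` function are hypothetical stand-ins invented for illustration; they are not the Substreams API, and the real Ethereum Block Protobuf is far richer.

```rust
// Hypothetical, simplified types (assumption): not the real Block Protobuf.
#[derive(Debug, Clone, PartialEq)]
struct Log {
    address: String, // address of the contract that emitted the event
    data: Vec<u8>,   // raw, undecoded event payload
}

struct Block {
    number: u64,
    logs: Vec<Log>,
}

/// Visits every log of every block and keeps those emitted by `address`.
/// Each block is fully read, even when it holds nothing relevant.
fn scan_all_blocks<'a>(blocks: &'a [Block], address: &str) -> Vec<(u64, &'a Log)> {
    let mut matches = Vec::new();
    for block in blocks {
        for log in &block.logs {
            if log.address == address {
                matches.push((block.number, log));
            }
        }
    }
    matches
}
```

The work here grows with the total number of logs on the chain, not with the number of logs you actually care about, which is why sparse contracts suffer the most.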

With an index module, you have a pre-computed, cached record of all the event addresses in each block. Instead of reading the full Ethereum block, you can search the events index (the event cache) and avoid decoding the blocks that do not contain events you are interested in. This cache is simply a set of strings containing all the event addresses found within each block.
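The "set of strings" idea can be sketched as follows. This is an illustration under stated assumptions rather than how Substreams implements it: `Log`, `Block`, and `index_block` are hypothetical names, and a real index module would be declared in a Substreams manifest rather than called directly.

```rust
use std::collections::HashSet;

// Hypothetical, simplified types (assumption): not the real Block Protobuf.
struct Log {
    address: String,
}

struct Block {
    logs: Vec<Log>,
}

/// Builds the index for one block: the distinct event addresses it contains.
/// Duplicates collapse, so the index stays small even for busy blocks.
fn index_block(block: &Block) -> HashSet<String> {
    block.logs.iter().map(|log| log.address.clone()).collect()
}
```

Because the index is computed once per block and cached, every later query for a given address can reuse it instead of re-reading the block.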

In the following diagram, you can see three blocks with their corresponding indexes. Consider that you want to retrieve all the events where log.address == 0x345…. Without an index module, you would have to go through the data of every block, but with an index module, you can skip the blocks that do not contain the event that you want.

In an index module, the event addresses of every block are pre-cached in a special store (in the diagram below, the keys), so when you look for events where log.address == 0x345…, you can simply search the index of each block. If the key is present in a block's index, you decode that block's data. If not, you skip the block. In the following diagram, Block 1 and Block 2 contain an event where log.address == 0x345…, but Block 3 does not.
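The lookup-then-decode flow described above can be sketched like this. All names (`IndexedBlock`, `filter_with_index`) are hypothetical, and the `logs` field merely stands in for the full block that would be expensive to decode; the function also reports how many blocks were actually decoded so the saving is visible.

```rust
use std::collections::HashSet;

// Hypothetical, simplified types (assumption): not the real Substreams types.
#[derive(Debug, Clone, PartialEq)]
struct Log {
    address: String,
}

struct IndexedBlock {
    number: u64,
    keys: HashSet<String>, // pre-computed index: event addresses in this block
    logs: Vec<Log>,        // stands in for the full, costly-to-decode block
}

/// Returns the matching logs together with the number of blocks decoded.
fn filter_with_index(blocks: &[IndexedBlock], address: &str) -> (Vec<(u64, Log)>, usize) {
    let mut matches = Vec::new();
    let mut decoded = 0;
    for block in blocks {
        // Cheap check against the index: skip the block if the key is absent.
        if !block.keys.contains(address) {
            continue;
        }
        // Key present: only now is the block's data read.
        decoded += 1;
        for log in block.logs.iter().filter(|l| l.address == address) {
            matches.push((block.number, log.clone()));
        }
    }
    (matches, decoded)
}
```

With three blocks of which only the first two contain the address, as in the scenario above, this sketch decodes two blocks and skips the third entirely.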

Now, developing Substreams and extracting the blockchain data that you need is even more performant! Check out the documentation on index modules to learn how to develop your own index module.