
Allium and Confluent: How to Build a Foundational Data Platform for Blockchain


Blockchains enable individuals and organizations to record and verify transactions as part of a secure, transparent, and decentralized system. They form the basis of most cryptocurrencies, and are used as the foundation of many other applications—smart contracts, supply chain management, and digital identity verification, to name but a few.

Despite the growing popularity of blockchain technology (this report predicts the market will grow at a CAGR of 87.7% between 2023 and 2030), working with blockchain data is not without its challenges. It might be free and public, but it’s difficult to access (i.e., blockchains are optimized for writes, not reads), understand, and maintain. Developers and data users who want to retrieve information from a blockchain commonly read or index it. The sheer volume and complexity of blockchain data, however, make it tedious and technically challenging for companies to analyze blockchain activity, invest in blockchain opportunities, and build blockchain products. Answering a simple question like “Who are the biggest Ethereum token holders over time?” requires an engineering team to run their own infrastructure, ingest the full history of the blockchain, clean the data, transform the data, and write complex SQL queries.
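To make the scale of that work concrete, here is a deliberately simplified sketch of the kind of query involved, assuming a hypothetical table of already decoded and cleaned ERC-20 transfer events. The table and column names are illustrative, not Allium’s actual schemas, and a production version would also need to handle token decimals, contract metadata, and chain reorganizations.

```python
# Illustrative only: ranking token holders over time from a hypothetical
# table of decoded ERC-20 transfers (token_address, from_address, to_address,
# value, block_time). Building and maintaining that table is the hard part.
TOP_HOLDERS_OVER_TIME_SQL = """
WITH balance_changes AS (
    SELECT token_address, to_address   AS holder,  value  AS delta, block_time
    FROM erc20_token_transfers
    UNION ALL
    SELECT token_address, from_address AS holder, -value AS delta, block_time
    FROM erc20_token_transfers
)
SELECT
    DATE_TRUNC('month', block_time) AS month,
    token_address,
    holder,
    -- running balance per holder, month over month
    SUM(SUM(delta)) OVER (
        PARTITION BY token_address, holder
        ORDER BY DATE_TRUNC('month', block_time)
    ) AS running_balance
FROM balance_changes
GROUP BY 1, 2, 3
ORDER BY month, running_balance DESC
"""
```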

Allium, a global finalist in Confluent’s inaugural $1M Data Streaming Startup Challenge and a member of the Confluent for Startups program since April 2023, is an industry leader aiming to change this by serving up 50+ blockchains and 1000+ schemas through their platform. They want to make blockchain data as accessible and usable as Google made webpages or Bloomberg made financial information. Their platform simplifies access to blockchain data, enabling developers to build real-time applications on top of it and analysts to gain valuable insights on transactions with as few SQL queries as possible. And at the core of their platform is Confluent.

In this blog, we’ll explain the role of data streaming in Allium's mission of “driving trust and transparency in blockchain, and helping people understand and build with full confidence.” We’ll first cover the business and technical challenges that led them to Confluent, before exploring how they’re aiming to drive further value by shifting processing and governance “left” or to the point where data is created.

The challenge: Indexing every blockchain while ensuring data freshness

Allium provides two main products, Allium Explorer and Allium Developer. Explorer enables organizations to query and visualize blockchain data across 50+ blockchains and share those insights via interactive charts, while Developer enables engineers to build real-time applications based on blockchain data and integrate it with a range of downstream customer systems (e.g., Snowflake, Databricks, BigQuery, S3/GCS, Postgres, and Apache Kafka®).

One of Allium’s core value propositions across both products is the large (and ever-increasing) number of blockchains they support. Each new indexed blockchain expands Allium's potential market and enhances its cross-chain analysis capabilities. The freshness of the data in those indexed blockchains is also critical, helping Allium’s customers drive quicker time to insight.

Before deploying Confluent, however, Allium was limited in the number of blockchains they could support. This was due primarily to the resources required to build and manage integrations between their main messaging technology at the time, Google Pub/Sub, and their downstream cloud data warehouse, Snowflake.

Allium’s in-house ingestion system would extract data from blockchain nodes and send it to Google Pub/Sub, before a self-built solution consumed the data from Pub/Sub and upserted it into Snowflake. Managing and tuning this solution was time-consuming, and its limitations held the team back from scaling their products.
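The sketch below illustrates what such self-built glue code can look like: a worker that pulls decoded records from a Google Pub/Sub subscription and merges them into Snowflake. It is a minimal, hypothetical example (the subscription, table, and column names are invented), not Allium’s actual implementation, but it hints at the retry, batching, and schema-drift concerns the team had to own themselves.

```python
# Hypothetical sketch of a self-built Pub/Sub -> Snowflake upsert worker.
import json

import snowflake.connector
from google.cloud import pubsub_v1

conn = snowflake.connector.connect(
    account="my_account", user="loader", password="...",
    warehouse="LOAD_WH", database="BLOCKCHAIN", schema="RAW",
)

MERGE_SQL = """
MERGE INTO transactions t
USING (SELECT %(hash)s AS hash,
              %(block_number)s AS block_number,
              PARSE_JSON(%(payload)s) AS payload) s
ON t.hash = s.hash
WHEN MATCHED THEN UPDATE SET t.block_number = s.block_number, t.payload = s.payload
WHEN NOT MATCHED THEN INSERT (hash, block_number, payload)
    VALUES (s.hash, s.block_number, s.payload)
"""

def handle(message):
    # One small MERGE per message: simple, but slow and costly at blockchain scale.
    record = json.loads(message.data)
    cur = conn.cursor()
    try:
        cur.execute(MERGE_SQL, {
            "hash": record["hash"],
            "block_number": record["block_number"],
            "payload": json.dumps(record),
        })
    finally:
        cur.close()
    message.ack()

subscriber = pubsub_v1.SubscriberClient()
subscription = subscriber.subscription_path("my-gcp-project", "decoded-transactions")
# Blocks forever; batching, retries, backpressure, and schema changes are all
# the operator's problem.
subscriber.subscribe(subscription, callback=handle).result()
```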

Recognizing the need for a more scalable and future-proof solution, Allium turned to Confluent’s data streaming platform.

The solution: Confluent’s data streaming platform

Confluent’s data streaming platform now enables Allium to index the entire blockchain market, enhancing their ability to offer new products and attract larger customers. Allium chose Confluent for several reasons:

  1. Confluent’s fully managed connectors allow them to stream large volumes of blockchain data in real time to downstream applications or warehouses (e.g., Snowflake) without having to worry about the underlying infrastructure.

  2. Confluent’s Stream Sharing feature enables Allium to preview real-time blockchain data with prospects and customers, facilitating new opportunities.

  3. Confluent is cloud agnostic, providing Allium with a greater degree of flexibility as they scale.

Additionally, Confluent’s support of the startup ecosystem (i.e., through initiatives such as free cloud credits and developer training accessible via Confluent for Startups) gave Allium extra reassurance that Confluent was dedicated to their success.

Implementation

This high-level reference architecture demonstrates how Allium unlocks blockchain data for various use cases.

Fig. 1. – High-level reference architecture


Allium’s Data Ingestion Service first extracts data from blockchain nodes via API calls, before sending it either directly to Confluent Cloud as a raw stream of events, or to real-time transformation pipelines on Google Cloud Platform (GCP). These transformation pipelines, built on Google Dataflow, decode and enrich (i.e., apply additional context to) the blockchain data.
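As a rough illustration of the first hop in Fig. 1, the snippet below shows how an ingestion worker might publish raw block events to a Confluent Cloud topic using the confluent-kafka Python client. The topic name, key scheme, and credentials are placeholders rather than details of Allium’s production setup.

```python
# Minimal sketch: publish raw block events to a Confluent Cloud topic.
import json

from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "<CONFLUENT_CLOUD_BOOTSTRAP>:9092",
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "<CLUSTER_API_KEY>",
    "sasl.password": "<CLUSTER_API_SECRET>",
})

def publish_block(chain: str, block: dict) -> None:
    # Key by chain and block number so consumers can partition and replay per chain.
    producer.produce(
        topic="raw-blocks",
        key=f"{chain}-{block['number']}",
        value=json.dumps(block),
        on_delivery=lambda err, msg: err and print(f"delivery failed: {err}"),
    )
    producer.poll(0)  # serve delivery callbacks without blocking

# e.g., publish_block("ethereum", {"number": 19000000, "transactions": [...]})
producer.flush()
```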

The output of these pipelines is subsequently synced to Confluent Cloud, and then sent to Snowflake or a self-built observability service on GCP via fully managed connectors. Allium is also able to share streams of blockchain data directly with customers via Confluent Cloud’s Stream Sharing, a feature that allows for the secure sharing of real-time data streams between organizations.

The result: Secure data sharing and blockchains, made accessible to enterprises

Data streaming with Confluent now plays a fundamental role in Allium's blockchain platform. Aside from enabling Allium to index more blockchains (they’re currently aiming to index around 1 new blockchain a week), Confluent allows Allium to share previews of blockchain data with potential customers and demonstrate the value of their 1000+ schemas.

With Confluent at the core of their platform, Allium provides a secure, scalable backbone for the delivery of blockchain data to customers for both historical analyses (Allium Explorer) and real-time blockchain applications (Allium Developer).

This has enabled Allium’s customers to drive value with blockchain data. Visa, for example, has published a public dashboard showing a normalized view of stablecoin activity (a stablecoin is a type of cryptocurrency designed to maintain a stable value), allowing for better analysis of real-world adoption and trends.

Fig. 2. – An example report from Visa’s On-chain Analytics dashboard


Another example can be seen with Phantom, a cryptocurrency wallet. By using Allium Developer to find active EVM addresses (i.e., unique identifiers on the Ethereum Virtual Machine), Phantom has reduced their P50/P99 latency by 30-45% and avoided sharp latency spikes.

The future: Shift left to clean and process data at the source

As Allium continues its data streaming journey, it’s looking to take advantage of the “shift-left” pattern made possible by the data streaming platform, which involves moving data processing and governance “leftwards,” closer to the source of the data. This helps prevent data errors at the source, reduces the cost of processing data downstream in analytical systems, and increases developer and engineer productivity by giving them high-quality, ready-to-use datasets.

In Allium’s context, Confluent’s “shift-left” pattern would enable them to move away from a medallion architecture, where terabytes or petabytes of blockchain data are formatted into “tiers” on Snowflake before being consumed by downstream applications. Instead of processing data (and applying data quality rules) on Snowflake, Allium could shift some or all of their processing and data governance upstream, to the point where data is created, using Confluent Cloud for Apache Flink®.
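As a sketch of what shifting left could look like, the example below uses open source PyFlink and the Flink SQL Kafka connector to apply simple quality rules to a raw transaction stream before it ever reaches the warehouse. On Confluent Cloud for Apache Flink the equivalent logic would be written as Flink SQL against existing topics; the topic names, fields, and rules here are hypothetical.

```python
# Hypothetical "shift-left" cleaning job; requires the Flink SQL Kafka
# connector jar on the classpath when run with open source Flink.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Raw, possibly messy events as they arrive from the ingestion service.
t_env.execute_sql("""
    CREATE TABLE raw_transactions (
        tx_hash STRING,
        chain STRING,
        value_wei DECIMAL(38, 0),
        block_time TIMESTAMP(3)
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'raw-transactions',
        'properties.bootstrap.servers' = '<BROKER>',
        'format' = 'json',
        'scan.startup.mode' = 'earliest-offset'
    )
""")

# Cleaned stream that downstream systems (Snowflake, Iceberg, apps) can trust.
t_env.execute_sql("""
    CREATE TABLE clean_transactions (
        tx_hash STRING,
        chain STRING,
        value_eth DOUBLE,
        block_time TIMESTAMP(3)
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'clean-transactions',
        'properties.bootstrap.servers' = '<BROKER>',
        'format' = 'json'
    )
""")

# Apply quality rules once, upstream, instead of in every warehouse job.
t_env.execute_sql("""
    INSERT INTO clean_transactions
    SELECT tx_hash, chain, CAST(value_wei AS DOUBLE) / 1e18 AS value_eth, block_time
    FROM raw_transactions
    WHERE tx_hash IS NOT NULL AND value_wei >= 0
""")
```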

Data streams can also be turned into Iceberg tables using Tableflow (currently in private early access) to pipe data into other downstream analytical systems. With this pattern, only high-quality, ready-to-query data reaches Snowflake, cutting the compute costs of unnecessary, redundant cleaning within the data warehouse and shortening the time to value of blockchain data for analytics.

Allium: Leveraging data streaming to unlock blockchain data

With the help of Confluent, Allium is making blockchain data accessible for analytics and real-time customer applications through their data platform. With a growing number of blockchains indexed, Allium is enabling organizations to gain reliable, aggregate insights on blockchain transactions and deploy custom workflows with real-time blockchain data. As Allium continues to unlock the full functionality of Confluent’s data streaming platform, they’re discovering exciting possibilities for maximizing the value of blockchain data and scaling their operations to the next level.

  • Will Stolton is a Product Marketing Manager at Confluent, where he focuses on communicating the value of data streaming through the lens of solutions.

  • Tim Graczewski is the Global Head of Confluent for Startups. A two-time venture backed entrepreneur, Tim has also held senior strategy and business development roles at Oracle and Intuit.

  • Paul Chun is the Founding Engineer at Allium, where he oversees the technical architecture and the engineering team. He spent 6+ years as a tech lead at Google, building infrastructure to process planet scale data with Google Cloud, and real-time cloud game streaming with Google Stadia.
