Over years of managing the data streaming needs of top global financial institutions, our system engineers and field CTOs have amassed a wealth of expertise around the critical path to modernizing a core banking system. This blog provides real-world insights into the practicalities of evolving core banking with a streaming data platform, based on their first-hand engagements with these customers, some of which rank among the top 10 U.S. banks.
To be precise, when the term core banking modernization is used below, it is in reference to a truly event-driven platform that can respond and scale to meet customer demand in near-real time. This contrasts with monolithic batch or micro batch-based architectures constrained by pre-allocated compute and storage.
A core banking system is the workhorse of a financial institution, the back-end system that handles the bank’s most mission-critical operations including transaction processing, account management, payments, and loans. These powerful engines are designed to process high volumes of transactions daily across every connected branch of the bank and interface with the general ledger, all with unwavering resilience.
For financial institutions, traditional core banking processes often revolve around a mainframe. About 90-95% of all credit card transactions globally go through a mainframe. And yes, by mainframe I mean the big closet-size computer—where someone is plugging away on a green screen—that you thought only existed in an 80s-movie hacker scene.
While mainframe refers to a whole class of machines built for large-scale, high-transaction business data processing, in banking they are typically IBM AS/400 or IBM Z series machines running core banking software such as Hogan or FIS Systematics. Part of the reason mainframes remain central to core banking is that they are extremely good at processing transactions, reliably and quickly.
But despite their widespread success as the engine for mission-critical banking data, mainframes have their challenges. The primary language running on the mainframe is COBOL, a 60-year-old language that requires specialist programmers who are unfortunately in short supply (as we saw with the unemployment payment crisis during the height of the pandemic).
On top of this skills shortage, there is the problem of retaining domain knowledge from industry veterans: the subject matter experts who thoroughly understand mainframe systems are aging out of the workforce, and their niche expertise departs with them. The total cost of ownership for these high-value mainframe workloads is also substantial when you consider the aggregate cost of general processing unit (GPU) cycles. Aside from the physical space and power required for on-premises data centers, the per-cycle cost you pay IBM can be significant.
With that said, the industry is evolving, and the institutions we've been working closely with are now tasked with finding ways to operationalize real-time processes on this legacy data infrastructure. In large part, this effort around core banking modernization is a response to rising customer expectations for real-time product experiences (e.g., viewing every transaction over the lifetime of an account as you scroll in the mobile app). Banks are looking for scalable, cost-effective ways to unlock their critical mainframe data, minimize disruptions, and power their ever-growing landscape of cloud-native applications and systems.
For most banks, absorbing their mainframe into a cloud-native platform in a swift big-bang is completely out of the question. Simply put, the risk of disruption to any mission-critical banking operation transcends customer experience or future innovation. Still, we have worked with these banks to meet them where they are and developed an incremental modernization process that allows them to unlock their mainframe data for product innovation while keeping security at the forefront.
This approach enables banks to extend their streaming data platform to the mainframe—whether that is for getting that mainframe data unlocked to cloud systems or migrating mainframe workloads incrementally to a real-time platform to reduce GPU costs and improve scale.
For example, you might have a next-best-offer (NBO) application running in AWS that sends push notifications to customers who have recently made large deposits, prompting them to take some action (e.g., opening an interest-bearing account). The NBO team runs entirely in AWS, using tools like Lambda and Twilio. Still, they have a dependency on mainframe data, because that is where the ledger data lives, and this adds to your mainframe bill. Mainframe capacity usage, in the form of MSUs, accrues each time data is emitted from or read back into the mainframe. A single app with any dependency on mainframe data will typically both read from and write to the mainframe, so these MSUs can stack up quickly. (Please see the resources section at the bottom of this article for more information on mainframe migrations.)
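To make the cost dynamic concrete, here is a rough back-of-envelope sketch of how read/write traffic from a cloud app can drive mainframe spend. The consumption and dollar rates are hypothetical placeholders, not real IBM pricing, and the model loosely treats MSUs as accruing per operation, which simplifies real capacity-based pricing:

```python
# Back-of-envelope sketch of MSU-driven cost for a cloud app that reads
# from and writes to the mainframe. All rates below are hypothetical
# placeholders for illustration, not real IBM pricing.

def monthly_mainframe_cost(reads_per_day: int,
                           writes_per_day: int,
                           msu_per_1k_ops: float = 0.5,   # assumed consumption rate
                           dollars_per_msu: float = 40.0  # assumed cost per MSU
                           ) -> float:
    """Estimate monthly cost driven by read/write traffic against the mainframe."""
    ops_per_month = (reads_per_day + writes_per_day) * 30
    msus = ops_per_month / 1_000 * msu_per_1k_ops
    return msus * dollars_per_msu

# An NBO app polling ledger data 100k times a day adds up fast:
cost = monthly_mainframe_cost(reads_per_day=100_000, writes_per_day=5_000)
```

Even at these invented rates, a single chatty application lands in five-figure monthly territory, which is why unlocking the data once into a streaming platform is attractive.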
Outside of mainframes, there are a number of other systems that require some kind of integration or unlock in order to achieve core banking modernization. Extending our NBO example from above, you might have account creation data emitted by software like Oracle FLEXCUBE (a financial services application from Oracle). To build a truly event-driven, modern core banking platform, you need to JOIN this FLEXCUBE data with your ledger data from the mainframe as the customer journey unfolds.
Why is that a requirement? Let's imagine you have a customer conversion problem. Your NBO team might have zero visibility into FLEXCUBE; this is more likely than not, because the team that owns Oracle applications and the team that leads cloud-native development are usually separate within an organization. However, to address the problem you need to know precisely where the customer churns. Is it after getting NBO's push notification? Is it after they have created an account and clicked around the product some? Having updates on customer activity on a more real-time basis makes all the difference in how you would approach solving this problem.
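The join described above can be sketched in plain Python to show the mechanics a stream processor like Flink handles for you (state, keying, out-of-order arrival). The event shapes and field names here are invented for illustration:

```python
# Illustrative sketch of a keyed stream join: account-creation events
# (FLEXCUBE side) enriched with ledger deposits (mainframe side) on a
# shared customer ID. Event fields are assumptions for illustration.

accounts = {}   # customer_id -> account-creation event seen so far

def on_account_created(event: dict) -> None:
    """Remember the account-creation event for this customer."""
    accounts[event["customer_id"]] = event

def on_ledger_deposit(event: dict):
    """Join a deposit with the customer's account-creation event, if seen."""
    account = accounts.get(event["customer_id"])
    if account is None:
        return None  # a real stream processor would buffer and retry
    return {
        "customer_id": event["customer_id"],
        "account_type": account["account_type"],
        "deposit_amount": event["amount"],
    }

on_account_created({"customer_id": "c42", "account_type": "checking"})
enriched = on_ledger_deposit({"customer_id": "c42", "amount": 25_000})
```

A real deployment would express this as a Flink join with managed state and watermarks rather than an in-memory dict, but the shape of the problem is the same.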
These compounding factors are understandably driving business leaders to take a second look at how they might safely and incrementally modernize core banking processes. The following technologies are common tools in the modern core banking stack:
Unlock core banking system of record. MQ on Z(IIP) connector supports mainframe unlock without incurring MSU costs, while the Oracle CDC connector enables Oracle unlock without GoldenGate licensing costs.
Enrich data in real-time. Apache Flink® provides scalable cloud-native stream processing.
Decoupled system of record and system of access. MongoDB is a modern non-relational DB purpose-built for microservices and developers.
You will notice the MQ on Z and Oracle CDC connectors are centered on reducing costs. As of fall 2023, IT budgets are leaner than in previous years. Regarding the specific mainframe costs saved by MQ on Z, I recommend watching Fidelity's deep dive to hear from them directly: the highlights include roughly a 90% reduction in GPU cycles and a 50% reduction in CHINIT (i.e., network-related) costs, or around $3M in total mainframe operating cost reduction. As for the Oracle CDC connector, it does not rely on GoldenGate to function, so we typically see approximately $3M saved per connector deployed. (Confluent has entire teams dedicated to running TCO assessments if you would like to determine potential savings given your unique workloads.)
After data is unlocked into a streaming data platform with connectors such as Oracle CDC or MQ on Z, customers often expect near real-time data enrichment to occur. To revisit our NBO example from above, imagine your customer executed a large deposit and the NBO app sends a push notification, but it arrives hours after they've logged off; the likelihood the customer converts to an interest-bearing account is now much lower. Flink is designed precisely for use cases like this: taking streams of data and enriching them as they flow through a streaming data platform.
Once data is in its final form, MongoDB is a very popular destination for landing data out of the streaming data platform. Perhaps the main business driver for adopting Mongo is that it can be deployed as an elastic, cloud-native system (e.g., Mongo Atlas). This contrasts with on-premises legacy systems of record such as Oracle which are historically overprovisioned to accommodate peak workloads.
As the name implies, the MQ on Z connector runs locally on the mainframe's zIIP processor. As soon as data lands on an MQ queue, the connector picks it up and produces it to the streaming data platform like so:
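For a sense of what configuring such a connector involves, here is a hedged sketch of an IBM MQ source connector configuration, expressed as the Python dict you would serialize and POST to the Connect REST API. The property names follow Confluent's IBM MQ source connector, but verify them (especially the transport/bindings setting) against your connector version's documentation; all values are placeholders:

```python
# Hedged sketch of an IBM MQ source connector config. Property names
# follow Confluent's IBM MQ source connector but should be verified
# against your version's docs; all values are placeholders.
mq_source_config = {
    "connector.class": "io.confluent.connect.ibm.mq.IbmMQSourceConnector",
    "kafka.topic": "ledger.deposits",     # destination topic (example name)
    "mq.queue.manager": "QM1",            # queue manager local to the zIIP
    "mq.channel.name": "DEV.APP.SVRCONN",
    "mq.queue": "DEPOSITS.OUT",           # queue the core emits events onto
    "mq.transport.type": "bindings",      # bindings mode: shared memory, no TCP hop
    "tasks.max": "1",
}
```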
The connector gains direct access to the queue manager in bindings mode, which provides concurrency and shared-memory communication: in practice, better security and performance than a solution running outside the mainframe.
The Oracle CDC connector tails the LogMiner API of the Oracle DB and ensures any modifications or updates to the data in Oracle are produced to the streaming data platform like so:
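As with the MQ connector, a configuration sketch helps make this concrete. The property names below follow Confluent's Oracle CDC Source connector, but confirm each one against your connector version's documentation; host, SID, user, and table names are placeholders:

```python
# Hedged sketch of an Oracle CDC source connector config. Property names
# follow Confluent's Oracle CDC Source connector but should be confirmed
# against your version's docs; all values are placeholders.
oracle_cdc_config = {
    "connector.class": "io.confluent.connect.oracle.cdc.OracleCdcSourceConnector",
    "oracle.server": "oracle.internal.example.com",  # placeholder host
    "oracle.port": "1521",
    "oracle.sid": "ORCLCDB",
    "oracle.username": "CDC_USER",                   # needs LogMiner privileges
    "table.inclusion.regex": "BANK[.]ACCOUNTS",      # which tables to capture
    "start.from": "snapshot",                        # initial snapshot, then redo log
    "tasks.max": "2",
}
```

Note there is no GoldenGate anywhere in this configuration, which is the source of the licensing savings discussed earlier.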
Once data is unlocked to the streaming data platform, Flink is used for the streaming data enrichment. Using the NBO use case as an example, the architecture would look like this:
Finally, we simply write data out to Mongo Atlas to round out our modern core banking architecture. The Mongo Atlas sink connector is commonly run fully managed by Confluent:
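Rounding out the pipeline, here is a hedged sketch of a MongoDB sink connector configuration following the MongoDB Kafka sink connector's property names; the connection string, topic, database, and collection names are placeholders:

```python
# Hedged sketch of a MongoDB Atlas sink connector config, following the
# MongoDB Kafka sink connector's property names; all values below are
# placeholders.
mongo_sink_config = {
    "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
    "topics": "nbo.enriched",   # enriched output topic from Flink (example name)
    "connection.uri": "mongodb+srv://user:pass@cluster0.example.mongodb.net",
    "database": "core_banking",
    "collection": "customer_offers",
    "tasks.max": "1",
}
```

When run fully managed on Confluent Cloud, much of this (workers, scaling, monitoring) is handled for you; the logical properties above are what you still specify.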
Each of these considerations can be more easily evaluated with the help of your Confluent account team—please involve them if you are curious about any of these solutions. With that said, below is a shortlist of considerations for each of the solutions reviewed so far:
The MQ on Z connector deployment has some considerations to review. It is best to align mainframe security, mainframe engineering, and Kafka teams to meet these assumptions.
The Oracle CDC connector has some database requirements that should be reviewed prior to adopting the solution. Because the connector relies on the LogMiner API, it is helpful to benchmark that API's performance to get a ballpark of anticipated connector performance. Additionally, you will likely need representation from a few teams, including DBAs, Kafka, and security.
As of fall 2023, Flink SQL is available in AWS for Confluent Cloud. However, for most core banking modernization efforts, the more advanced Flink APIs may better suit your needs, so self-hosting Flink clusters on your compute of choice (e.g., VMs) until those features reach Confluent Cloud is commonplace. Teams you will likely need involvement from include Kafka, Flink developers, and security.
The considerations for Mongo Atlas connectors can be found here. The teams we often see come together for this connector are similar to those for the Oracle CDC connector; the only additional team occasionally pulled in is cloud networking.
To learn more about mainframe migrations, explore these resources:
VIDEO: Fidelity Investments Journey Taking Event Streaming for IBM zSystems Mainstream
ONLINE TALK: Show Me How: Streaming Data Pipelines from Your IBM Mainframe
The modernization of a core banking operation is slow, iterative, and typically not big bang. Many large financial institutions are still in the early innings of this modernization. However, there are compelling factors—such as the talent shortage of mainframe subject matter experts and the number of customers expecting real-time product experiences—motivating many business leaders to identify their strategy and begin this process immediately. A streaming data platform is often at the heart of this modernization, in part because it can meet you wherever your data resides, deploys on your cloud of choice, delivers cloud-native resilience and enterprise-grade security features, and is available virtually everywhere around the globe.