Today’s data is in motion and incredibly difficult to pin down. For organizations to deliver rich, in-the-moment customer experiences and drive efficiency within business operations, they must support a continuous, real-time flow of data.
Also known as data in transit or data in flight, data in motion refers to digital information as it flows in and out of various locations.
Unlike the old model of data at rest, data in motion implies not just the ability to stream data, but the ability to process and use that data as soon as it's generated. After all, the front-end customer experiences and backend operations of a company are activities that almost always happen continuously and in real time.
That effort requires a centralized data architecture that aggregates real-time data streams from a variety of sources and makes them constantly available to all kinds of applications. Once you understand the concept of data in motion, and why it’s superior to a model of data at rest, you can home in on the technologies that best support your organization’s needs.
Rideshare Apps
Consider the difference between real-time and legacy operations:
The traditional way: call a taxi and wait for a human operator to notify another human that they should drive to you and pick you up. You might get a rough estimate of when the taxi would arrive, but a lot of variables could change the outcome of that equation — traffic, weather, an unexpected incident on the road.
The modern, on-demand way: with rideshare apps like Lyft and Uber, you request a car via an app on your phone or other connected device. Pulling from multiple real-time data sources — your location, the nearest driver, your destination — the app automatically finds the best rider-driver match and routes the car to you. The driver’s location and ETA are updated in real time on your app view; you know when they’re a block away. At the same time, pricing is dynamically updated based on real-time traffic data, using real-time data streaming and integration.
This modern experience of ridesharing is fluid and easy for both driver and rider, which is why it has virtually usurped the traditional taxi industry and created wild upheaval in regulatory law. But despite its simplicity and convenience for users, behind the scenes, there’s a lot happening with various kinds of data, including location data, pricing data, traffic data, review data, and more. All of this data in motion converges in your easy, convenient ride to work.
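To make the matching step concrete, here is a minimal sketch of how an app might re-evaluate the best driver every time a new location update arrives. It is an illustration only, not how any real rideshare service works: the `nearest_driver` function, the driver ids, and the coordinates are all hypothetical, and straight-line distance stands in for real routing and traffic data.

```python
import math

Position = tuple[float, float]

def nearest_driver(rider: Position, drivers: dict[str, Position]) -> str:
    """Return the id of the driver closest to the rider (straight-line distance)."""
    return min(drivers, key=lambda driver_id: math.dist(rider, drivers[driver_id]))

# Hypothetical stream of (driver_id, position) location updates.
location_events = [
    ("d1", (0.0, 5.0)),
    ("d2", (3.0, 3.0)),
    ("d1", (1.0, 1.0)),  # d1 has moved closer to the rider
]

rider = (0.0, 0.0)
latest_positions: dict[str, Position] = {}
matches = []
for driver_id, position in location_events:
    latest_positions[driver_id] = position  # keep only the freshest position
    matches.append(nearest_driver(rider, latest_positions))

print(matches)
```

The point of the sketch is that the answer is recomputed on every event, so the match always reflects the latest known positions rather than a snapshot taken earlier.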
As another example, consider global, omnichannel retailers trying to connect brick-and-mortar stores with digital inventory. This can only succeed if inventory data is accurate and up to the minute. If a customer in one store purchases the last of an item, a sales rep in another store should be able to see this automatically, in real time. At the same time, an online shopper would see that the item is out of stock and be shown a similar item.
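The retail scenario can be sketched as a tiny in-memory publish/subscribe loop, where one purchase event updates every view of the inventory at once. This is a toy stand-in for a real streaming platform, under stated assumptions: the `InventoryStream` and `StockView` classes and the `widget` SKU are hypothetical names invented for illustration.

```python
from typing import Callable

class InventoryStream:
    """Toy in-memory stand-in for a stream of inventory change events."""

    def __init__(self) -> None:
        self._subscribers: list[Callable[[str, int], None]] = []

    def subscribe(self, handler: Callable[[str, int], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, sku: str, delta: int) -> None:
        # Deliver the event to every subscriber as soon as it happens.
        for handler in self._subscribers:
            handler(sku, delta)

class StockView:
    """One consumer's view of stock levels (a store terminal, the website, ...)."""

    def __init__(self, initial: dict[str, int]) -> None:
        self.stock = dict(initial)

    def apply(self, sku: str, delta: int) -> None:
        self.stock[sku] = self.stock.get(sku, 0) + delta

    def in_stock(self, sku: str) -> bool:
        return self.stock.get(sku, 0) > 0

stream = InventoryStream()
other_store = StockView({"widget": 1})
website = StockView({"widget": 1})
stream.subscribe(other_store.apply)
stream.subscribe(website.apply)

# A customer buys the last widget in one store.
stream.publish("widget", -1)

print(other_store.in_stock("widget"), website.in_stock("widget"))
```

Both views change in the same moment the sale is published, which is the behavior the sales rep and the online shopper rely on.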
The number of widgets in inventory at the end of each month is an example of data at rest. In the moment a purchase is enacted — or a withdrawal made from a bank account — data is in use. The actual sending of that data across applications and devices is data in motion. It’s a concept that’s critical to every kind of business use case today.
It’s in the moment of the transaction that the data matters most to you, which is why we focus on the idea of data in motion as being at the core of every important activity and experience consumers and businesses have.
All of these are very simple models of data in motion, a concept that applies to all kinds of applications, industries, operations, experiences, and use cases.
Data in motion is at the center of every kind of company today. It transcends lines of business and continues to deliver valuable real-time insights, streamline operations, and transform user experiences.
To get a little more technical for a moment, engineers typically describe data as existing in one of three states: at rest, in motion, or in use.
Data at rest is data that's sitting idle in storage, like a database or data warehouse, on a data storage device, or in the cloud.
In a traditional data architecture model, data sits in various siloed repositories, each with its own specialized tools for access. To perform analytics and use any of that data, it must be collected from various sources and aggregated in one place, by which point it has already become stale and outdated.
Data in motion, sometimes referred to as data in transit or data flow, is digital information that's actively being transferred from one place to another. For a deeper understanding of data in transit, material on data pipelines and on ETL, ELT, or streaming ETL provides a thorough introduction to data movement.
Data in use is data that is actively being created, updated, or processed. Cloud and SaaS providers often consider data in use when it’s currently being processed by an application. Data in motion, by comparison, could be an email being sent, data being sent from your phone to iCloud, or hourly weather updates based on IoT sensors.
The notion of data states is important in computing because information security endeavors to encrypt data differently in each state. Data at rest is almost always encrypted, while data that’s been pulled into processing is more difficult to encrypt, and historically, that often made it vulnerable to cybercrime.
While traditional data architects thought of data at rest as “most secure” and data in motion as less secure, that paradigm has shifted with the advent of new platforms such as Confluent, which are specifically designed to support enterprise customers that place a high priority on protecting both data at rest and data in motion.
Confluent customers across highly regulated industries such as financial services, healthcare, government, energy, and high-tech require optimal data security, and that’s what they get, with layered security controls designed to protect and secure customer data, no matter where that data resides.
The language typically used to talk about data in motion is data streaming. Envision a data stream as a rushing torrent, with information generated continuously and at volumes too massive to hold onto. To harness and manage data in motion, you need a platform that can handle real-time data processing.
Unlike batch processing, which takes data at rest and processes it at a particular time, data streaming handles data on a continuous basis, which enables it to use real-time data within applications. The data architecture of a company is critical to enabling data in motion. Systems that carry out business operations and deliver customer experiences have to be integrated to support a continuous flow of data from across the company and allow applications to be built that can process data in real time.
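The contrast between batch and stream processing can be shown with a small sketch. It uses plain Python rather than any streaming platform, and the function names and order amounts are hypothetical, chosen only to illustrate the idea.

```python
from typing import Iterable, Iterator

def batch_total(stored_orders: list[float]) -> float:
    """Batch: process data at rest, all at once, at a chosen time."""
    return sum(stored_orders)

def streaming_totals(order_stream: Iterable[float]) -> Iterator[float]:
    """Streaming: update the result as each event arrives."""
    running = 0.0
    for amount in order_stream:
        running += amount
        yield running

orders = [20.0, 5.0, 12.5]

# One answer, available only after all the data has been collected.
print(batch_total(orders))

# An up-to-date answer after every single event.
print(list(streaming_totals(orders)))
```

Both approaches reach the same final total. The difference is that the streaming version has a usable, current result at every step, which is what an application needs in order to react in real time.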
You need a platform for data in motion: not just an add-on feature but a bottom-up rethinking of your entire data infrastructure. Most enterprises that were not born digital (and born recently) simply weren’t built this way. The concept of data in motion is a vast departure from the idea of databases architected to bring queries to stored, at-rest data.
Created by the original creators of Apache Kafka, Confluent is the only fully managed, cloud-native data streaming platform that enables you to easily access, store, and manage data as continuous, real-time streams with enterprise scalability, security, and performance.
It’s not just a fully managed Kafka, but a complete data system with automated, pre-built connectors, the ability to create a universal data pipeline, and real-time stream processing. Simply put, Confluent is how you set your data in motion.