Implementing real-time data pipelines is still a challenge - even more so for data scientists, who were typically brought up on batch processing and files and have often heard of Kafka but never really used it. If your team consists only of data scientists and you want them to build a real-time data pipeline, your fate seems sealed: you need to hire real-time and streaming experts first, and you are guaranteed to lose months before implementation can even start. But is there really no other way? At Forecasty.AI, developer of Commodity Desk, a SaaS platform for commodity price forecasting, we thought again - and found a way to bridge the gap between the batch- and file-based world of most data scientists and the world of real-time streaming.
Our solution is a new open-source library that lets any Python programmer, including the aforementioned data scientists, access the Kafka API more easily than ever before. kash.py ("Kafka Shell for Python") offers a large number of easy-to-use abstractions on top of the Kafka API, including bash-inspired commands like "ls" or "l" for listing Kafka topics and "cat", "head" or "tail" for displaying their contents. kash.py bridges the gap between the file and streaming worlds with commands like "upload", which writes a file to a topic, and "download", which writes a topic to a file, and it offers commands inspired by functional programming ("map", "flatMap", "filter" etc.) for Kafka-Kafka, File-Kafka or Kafka-File stream processing in one-liners - of course with full support for JSON Schema, Avro and Protobuf.
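To make the shell-like, functional style concrete, here is a minimal pure-Python sketch of the idea. A plain list of JSON strings stands in for a Kafka topic so the snippet runs without a broker, and the function names only mirror the commands mentioned above - they are illustrative, not kash.py's actual API or signatures.

```python
import json

# Conceptual sketch only: a plain Python list stands in for a Kafka topic,
# so no broker is needed. Names mirror the bash-inspired commands described
# above ("head", "tail", "filter", "map") but are NOT kash.py's real API.

# A "topic" of JSON-encoded messages (roughly what "cat" would display).
topic = [json.dumps({"symbol": "XAU", "price": p})
         for p in (1801, 1795, 1810, 1822)]

def head(msgs, n=2):
    """First n messages of a topic, like the shell's head."""
    return msgs[:n]

def tail(msgs, n=2):
    """Last n messages of a topic, like the shell's tail."""
    return msgs[-n:]

def filter_topic(msgs, predicate):
    """Keep only the messages whose decoded payload passes the predicate."""
    return [m for m in msgs if predicate(json.loads(m))]

def map_topic(msgs, fn):
    """Transform each decoded payload and re-encode it."""
    return [json.dumps(fn(json.loads(m))) for m in msgs]

# One-liner style: keep prices above 1800 and tag each surviving record.
high = map_topic(filter_topic(topic, lambda r: r["price"] > 1800),
                 lambda r: {**r, "tag": "high"})
```

The point of the sketch is the shape of the code: a data scientist composes small, familiar list-style operations instead of wiring up consumers, producers and deserializers by hand - which is exactly the gap the library's abstractions are meant to close.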
In this session, we show how kash.py brings the two disparate worlds of files and streaming together - not only saving you the time and money of hiring real-time and streaming experts, but also making your data scientists start loving real-time.