Implementing real-time data pipelines is still a challenge - even more so for data scientists, who were typically brought up on batch processing and files and have often heard of Kafka but never really used it. If your team consists only of data scientists and you want them to build a real-time data pipeline, your fate seems sealed: you need to hire real-time and streaming experts first, and you are guaranteed to lose months before implementation can even start. But is there really no other way? At Forecasty.AI, developer of Commodity Desk, a SaaS platform for commodity price forecasting, we thought again - and found a way to bridge the gap between the batch- and file-based world of most data scientists and the world of real-time streaming.
Our solution is a new open-source library that lets any Python programmer, including the aforementioned data scientists, access the Kafka API more easily than ever before. kash.py ("Kafka Shell for Python") offers a large number of easy-to-use abstractions on top of the Kafka API, including bash-inspired commands like "ls" or "l" for listing Kafka topics and "cat", "head" or "tail" for displaying their contents. kash.py bridges the gap between the file and streaming worlds with commands like "upload", which writes a file to a topic, and "download", which writes a topic to a file, and it offers commands inspired by functional programming ("map", "flatMap", "filter" etc.) for Kafka-Kafka, File-Kafka or Kafka-File stream processing in one-liners - of course with full support for JSON Schema, Avro and Protobuf.
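To make the shell-like, functional style concrete, here is a minimal pure-Python sketch of the idea. A plain list of JSON strings stands in for a Kafka topic so the snippet runs without a broker, and the function names only mirror the commands mentioned above - they are illustrative, not kash.py's actual API or signatures.

```python
import json

# Conceptual sketch only: a plain Python list stands in for a Kafka topic,
# so no broker is needed. Names mirror the bash-inspired commands described
# above ("head", "tail", "filter", "map") but are NOT kash.py's real API.

# A "topic" of JSON-encoded messages (roughly what "cat" would display).
topic = [json.dumps({"symbol": "XAU", "price": p})
         for p in (1801, 1795, 1810, 1822)]

def head(msgs, n=2):
    """First n messages of a topic, like the shell's head."""
    return msgs[:n]

def tail(msgs, n=2):
    """Last n messages of a topic, like the shell's tail."""
    return msgs[-n:]

def filter_topic(msgs, predicate):
    """Keep only the messages whose decoded payload passes the predicate."""
    return [m for m in msgs if predicate(json.loads(m))]

def map_topic(msgs, fn):
    """Transform each decoded payload and re-encode it."""
    return [json.dumps(fn(json.loads(m))) for m in msgs]

# One-liner style: keep prices above 1800 and tag each surviving record.
high = map_topic(filter_topic(topic, lambda r: r["price"] > 1800),
                 lambda r: {**r, "tag": "high"})
```

The point of the sketch is the shape of the code: a data scientist composes small, familiar list-style operations instead of wiring up consumers, producers and deserializers by hand - which is exactly the gap the library's abstractions are meant to close.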
In this session, we show how kash.py brings the two disparate worlds of files and streaming together - not only saving you the time and money of hiring real-time and streaming experts, but also making your data scientists start loving real-time.