a-d
Apache Kafka: Benefits and Use Cases
Apache Kafka is an open-source distributed streaming platform that's popular for being reliable, durable, and scalable. Created at LinkedIn in 2011 to handle real-time data feeds, it's used today by over 80% of the Fortune 100 to build streaming data pipelines, integrate data, enable event-driven architecture, and more.
Application Programming Interface (API)
An application programming interface (API) is a set of protocols that help computer programs interact with one another. Learn how APIs work, with examples, an introduction to each API type, and the best tools to use.
Application Security (AppSec)
Application security (AppSec) refers to the processes, practices, and tools that protect software applications against external threats and vulnerabilities.
Automotive SPICE
Automotive SPICE (ASPICE) is a framework designed to assess and improve software development processes within the automotive industry.
Batch Processing
Batch processing is the processing and analysis of a set of data that has already been stored over a period of time. Examples include payroll and billing systems that are processed weekly or monthly. Learn how batch processing works, when to use it, common tools, and alternatives.
Beam: Unified Data Pipelines, Batch Processing, and Streaming
Apache Beam is a unified model that defines and executes batch and stream data processing pipelines. Learn Beam architecture, its benefits, examples, and how it works.
Bring Your Own Cloud
Bring Your Own Cloud (BYOC) involves deploying a vendor's software in a customer's cloud environment, typically within their own VPC (Virtual Private Cloud), while data resides in that customer’s cloud environment.
Change Data Capture (CDC)
Change Data Capture (CDC) is a software process that identifies, processes, and tracks changes in a database. Ultimately, CDC allows for low-latency, reliable, and scalable data movement and replication between all your data sources.
CI/CD
In today’s fast-paced environment, success in software development depends significantly on development speed, reliability, and security, which continuous integration and continuous delivery (CI/CD) practices aim to improve.
Cloud Migrations
There are plenty of benefits to moving to the cloud; however, cloud migrations are not a simple, one-time project. Learn how cloud migrations work and the best way to approach this complex process.
Command Query Responsibility Segregation (CQRS)
CQRS is an architectural design pattern that separates operations that write data (commands) from operations that read data (queries), so each side can be handled in a scalable way. Learn how it works, its benefits, use cases, and how to get started.
Complex Event Processing (CEP)
Similar to event stream processing, complex event processing (CEP) is a technology for aggregating, processing, and analyzing massive streams of data in order to gain real-time insights from events as they occur.
Data Fabric
Data fabric architectures enable consistent data access and capabilities across distributed systems. Learn how it’s used, examples, benefits, and common solutions.
Data Flow
Also known as dataflow or data movement, data flow refers to how information moves through a system. Learn how it works, its benefits, and modern dataflow solutions.
Data Governance
Data governance is a process to ensure data access, usability, integrity, and security for all the data in enterprise systems, based on internal data standards and policies that also control data usage. Effective data governance ensures that data is consistent and trustworthy and doesn't get misused. It's increasingly critical as organizations face new data privacy regulations and rely more and more on data analytics to help optimize operations and drive business decision-making.
Data in Motion
Also known as data in transit or data in flight, data in motion is a process in which digital information is transported between locations either within or between computer systems. The term can also be used to describe data within a computer's RAM that is ready to be read, accessed, updated or processed. Data in motion is one of the three different states of data; the others are data at rest and data in use.
Data Ingestion
Data ingestion is the extraction of data from multiple sources into a data store for further processing and analysis. Learn about ingestion architectures, processes, and the best tools.
Data Integration
Data integration works by unifying data across disparate sources for a complete view of your business. Learn how data integration works with benefits, examples, and use cases.
Data Lakes, Databases, and Data Warehouses
Learn the most common types of data stores: the database, data lake, relational database, and data warehouse. You'll also learn their differences, commonalities, and which to choose.
Data Mesh Basics, Principles and Architecture
Data mesh is a decentralized approach to data management, federation, and governance, designed to enhance data sharing and scalability within organizations.
Data Pipeline
A data pipeline is a set of data processing actions to move data from source to destination. From ingestion and ETL, to streaming data pipelines, learn how it works with examples.
Data Routing
If computer networks were cities, routing would be the interstates and freeways connecting them all, and vehicles would be the data packets traveling along those routes.
Data Serialization
Data serialization can be defined as the process of converting data objects to a sequence of bytes or characters to preserve their structure in an easily storable and transmittable format.
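The definition above can be illustrated with a minimal sketch, assuming JSON as the format and a hypothetical `order` record invented for this example. It converts an in-memory object to bytes and back, which is only one of many possible serialization approaches.

```python
import json

# Hypothetical record, used only for illustration.
order = {"order_id": 42, "items": ["keyboard", "mouse"], "paid": True}

# Serialize: in-memory object -> JSON text -> UTF-8 bytes
# that can be stored in a file or sent over a network.
payload = json.dumps(order).encode("utf-8")

# Deserialize: bytes -> JSON text -> an equivalent in-memory object.
restored = json.loads(payload.decode("utf-8"))

print(type(payload).__name__, restored == order)
```

JSON is text-based and human-readable; binary formats trade that readability for smaller payloads.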
Data Streaming
Streaming data is the continuous, simultaneous flow of data generated by various sources, which is typically fed into a data streaming platform for real-time processing, event-driven applications, and analytics.
Databases & DBMS
A database is a collection of structured data (or information) stored electronically, which allows for easier access, data management, and retrieval. Learn the different types of databases, how they're used, and how to use a database management system to simplify data management.
Distributed Control System
A Distributed Control System (DCS) is a control system used in industrial processes to manage and automate complex operations.
Distributed Systems
Also known as distributed computing, a distributed system is a collection of independent components on different machines that aim to operate as a single system.
e-l
Enterprise Service Bus (ESB)
An ESB is an architectural pattern that centralizes integrations between applications.
Event Streaming
Event streaming (similar to event sourcing, stream processing, and data streaming) allows events to be processed, stored, and acted upon in real time, as they happen.
Event-Driven Architecture
Event-driven architecture is a software design pattern that can detect, process, and react to real-time events as they happen. Learn how it works, its benefits, use cases, and examples.
Event Sourcing
Event sourcing records each change to a system's state as an event, so you can track both the current state of the system and how it evolved over time. Learn how event sourcing works, its benefits and use cases, and how to get started.
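One way to picture this is a minimal sketch, assuming a hypothetical bank-account domain with made-up `deposited` and `withdrawn` event types. State is never stored directly. Instead it is rebuilt by replaying an append-only list of events, which also makes earlier states recoverable.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    kind: str    # hypothetical event types: "deposited" or "withdrawn"
    amount: int

def apply_event(balance: int, event: Event) -> int:
    """Return the new balance after applying a single event."""
    if event.kind == "deposited":
        return balance + event.amount
    if event.kind == "withdrawn":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")

def replay(events) -> int:
    """Rebuild state from scratch by applying events in order."""
    balance = 0
    for event in events:
        balance = apply_event(balance, event)
    return balance

# The append-only event log is the source of truth.
log = [Event("deposited", 100), Event("withdrawn", 30), Event("deposited", 5)]

current = replay(log)      # state after all events
earlier = replay(log[:2])  # state as of the second event
```

Replaying a prefix of the log, as in `earlier`, is what lets an event-sourced system answer questions about past states.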
Extract Load Transform (ELT)
ELT (Extract, Load, Transform) is a data integration process where raw data is loaded first and transformation happens after.
Extract Transform Load (ETL)
Extract, Transform, Load (ETL) is a three-step process used to consolidate data from multiple sources. Learn how it works, and how it differs from ELT and Streaming ETL.
Federal Information Processing Standards (FIPS)
The Federal Information Processing Standards (FIPS) are a set of publicly announced standards developed by the National Institute of Standards and Technology (NIST).
Flink: Unified Streaming and Batch Processing
Apache Flink is an open-source framework that unifies real-time distributed streaming and batch processing. Learn about Flink architecture, how it works, and how it's used.
Flume: Log Collection, Aggregation, and Processing
Apache Flume is an open-source distributed system designed for efficient data extraction, aggregation, and movement from various sources to a centralized storage or processing system.
Generative AI (GenAI)
GenAI refers to deep-learning models that generate text, images, audio, and video in real time, based on the data they were trained on. Learn how GenAI works with use case examples.
Interoperability
Interoperability is when disparate systems, devices, and software can communicate and exchange data in order to accomplish tasks. Here’s why it’s important, how it works, and how to get started.
Infrastructure as Code (IaC)
IaC is a transformative approach to managing IT infrastructure by allowing organizations to define and provision their resources through code rather than manual processes.
Kafka Backup
Apache Kafka's backup mechanisms are essential components of a robust data infrastructure strategy.
Kafka Benefits and Use Cases
Apache Kafka is the most commonly used stream processing / data streaming system. Learn how Kafka benefits companies big and small, why it's so popular, and common use cases.
Kafka Message Key
A Kafka message key is an attribute that you can assign to a message in a Kafka topic. Each Kafka message consists of two primary components: a key and a value.
Kafka Message Size Limit
The Kafka message size limit is the maximum size a message can be to be successfully produced to a Kafka topic.
Kafka Partition Key
A partition key in Apache Kafka is a fundamental concept that plays a critical role in Kafka's partitioning mechanism.
Kafka Partition Strategy
Apache Kafka partition strategy revolves around how Kafka divides data across multiple partitions within a topic to optimize throughput, reliability, and scalability.
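Key-based partitioning, one common strategy, can be sketched as follows. This is an illustration under stated assumptions, not Kafka's implementation: the Java client's default partitioner hashes keys with murmur2, while this sketch substitutes `zlib.crc32` as a stand-in, and `choose_partition` is a hypothetical helper name.

```python
import zlib

def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index.

    Stand-in hash (CRC32) for illustration only; real Kafka
    clients use a different hash function.
    """
    return zlib.crc32(key) % num_partitions

# The same key always lands on the same partition, which is what
# preserves per-key ordering within a topic.
p1 = choose_partition(b"customer-17", 6)
p2 = choose_partition(b"customer-17", 6)
print(p1 == p2)
```

Because the partition depends on the partition count, changing the number of partitions can change where a given key lands.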
Kafka Sink Connector
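A Kafka Sink Connector is part of Kafka Connect, the robust API for data integration. Kafka Connect makes data integration easy by automating data movement both into and out of Kafka; sink connectors handle the outbound direction, delivering data from Kafka topics to external systems.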
Kafka Source Connectors
Kafka Source Connectors are integral components of Kafka Connect, designed to import data into Kafka topics from various data sources.
Kafka Streams
Apache Kafka Streams is a client library for building applications and microservices that process and analyze data stored in Kafka.
Kafka Topic Naming Convention
A Kafka topic naming convention keeps your data organized and makes it easier to understand, scale, and maintain.
m-r
Message Brokers
Message brokers facilitate communication between applications, systems, and services. Learn how they work, their use cases, and how to get started.
Microservices Architecture
Microservices refers to an architectural approach where software applications are composed of small, independently deployable services that communicate with each other over a network.
Middleware
Middleware is software that sits between applications and systems and simplifies integration between them. Learn how middleware works, its benefits, use cases, and common solutions.
NiFi - Data Processing, Routing, and Distribution
Apache NiFi is an integrated data logistics platform for automating the movement of data between disparate systems. It can handle various data types and support various protocols. Learn how it's used, and how it works.
NIST SSDF
The National Institute of Standards and Technology's Secure Software Development Framework (NIST SSDF) is a set of guidelines intended to help organizations develop their software securely.
Observability
Observability is the ability to measure the current state or condition of your system based on the data it generates. With the adoption of distributed systems, cloud computing, and microservices, observability has become more critical, yet complex.
Publish-Subscribe Messaging
Pub/sub is a messaging framework commonly used for inter-service communication and data integration pipelines. Learn how it works, with examples, benefits, and use cases.
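The pattern can be sketched in a few lines. This is a minimal in-memory illustration, assuming a hypothetical `Broker` class and made-up topic names. Real pub/sub systems add persistence, delivery guarantees, and network transport that are omitted here.

```python
from collections import defaultdict
from typing import Callable

class Broker:
    """Toy in-memory broker: publishers and subscribers only share topic names."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[str], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: str) -> None:
        # Deliver to every handler registered for this topic; the
        # publisher does not know who, if anyone, is listening.
        for handler in self._subscribers.get(topic, []):
            handler(message)

broker = Broker()
received_by_billing: list = []
received_by_email: list = []

broker.subscribe("orders", received_by_billing.append)
broker.subscribe("orders", received_by_email.append)

broker.publish("orders", "order-42 created")
broker.publish("payments", "payment-7 settled")  # no subscribers, so dropped
```

Both subscribers receive the `orders` message independently, while the publisher needs no knowledge of either.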
RabbitMQ
RabbitMQ is a message broker that routes messages between applications. Learn how RabbitMQ works, common use cases, pros, cons, and best alternatives.
RabbitMQ vs Apache Kafka
RabbitMQ and Apache Kafka are both open-source distributed messaging systems, but they have different strengths and weaknesses.
RAG
RAG (retrieval-augmented generation) leverages real-time, domain-specific data to improve the accuracy of LLM-generated responses and reduce hallucinations. Learn how RAG works with use case examples.
Real-Time Data & Analytics
Real-time data (RTD) refers to data that is processed, consumed, and/or acted upon immediately after it's generated. While data processing is not new, real-time data streaming is a newer paradigm that changes how businesses run.
Redpanda vs Kafka
A complete comparison of Kafka vs Redpanda and two cloud Kafka services - Confluent vs Redpanda. Learn how each works, the pros and cons, and how their features stack up.
Refactoring
Refactoring is an important part of software development that optimizes the code's internal structure without changing how the application works on the outside.
REST API
REST stands for Representational State Transfer, and a REST API is an API that follows its conventions. Learn more about REST APIs, how they simplify server communication, and how they leverage large-scale data.
s-z
Shift Left
Shift Left in data integration derives from the software engineering practice of shift-left testing, where tests are performed earlier in the software development lifecycle to improve software quality, accelerate time to market, and identify issues sooner.
Static Application Security Testing (SAST)
SAST is a method that checks for security flaws in code before it reaches production.
Stream Processing
Stream processing allows data to be ingested, processed, and managed in real time, as it's generated. Learn how stream processing works, its benefits, common use cases, and the best technologies to get started.
Streaming Analytics
Streaming analytics is an approach to business analytics and business intelligence where data is analyzed in real time. Learn how streaming analytics works, common use cases, and technologies.
Streaming Data Pipelines
Streaming data pipelines move data from multiple sources to multiple target destinations in real time. Learn how they work, with examples and demos.
Streaming ETL vs ELT vs ETL
What are ETL and ELT, and how do they differ from streaming ETL pipelines? Learn the differences between data pipeline and integration tools, their processes, and which to choose.
Technical Debt
Technical debt is a concept that originated in software engineering and refers to the future costs due to shortcuts taken in system development.