/ˈkɑːfkə/
noun — "high-throughput distributed event streaming platform."
Apache Kafka is a distributed event streaming platform designed for high-throughput, fault-tolerant, scalable messaging. It implements a **publish-subscribe** model: producers publish messages to topics, and consumers subscribe to those topics to receive messages asynchronously. This architecture decouples producers from consumers, enabling independent scaling and real-time data processing across distributed systems.
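The publish-subscribe decoupling can be illustrated with a toy in-memory broker (all names here, such as `MiniBroker`, are hypothetical; real Kafka consumers pull messages over the network rather than receiving synchronous callbacks):

```python
from collections import defaultdict

class MiniBroker:
    """Toy in-memory stand-in for a broker: topics map to subscriber callbacks."""
    def __init__(self):
        self.subscribers = defaultdict(list)  # topic name -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Every subscriber of the topic receives the message independently;
        # the producer never needs to know who, or how many, they are.
        for callback in self.subscribers[topic]:
            callback(message)

broker = MiniBroker()
received = []
broker.subscribe("clicks", received.append)
broker.publish("clicks", {"user": "alice", "page": "/home"})
# received now holds the published event
```

The key property this sketch preserves is that the producer addresses only the topic, never a specific consumer.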
Technically, Kafka consists of brokers, topics, partitions, producers, and consumers. Messages are stored in **append-only logs** partitioned across brokers. Partitions provide parallelism and per-partition ordering guarantees, while replication ensures durability and fault tolerance. Consumers maintain offsets to track their progress, which enables replay and at-least-once processing; Kafka's transactional APIs additionally support exactly-once semantics.
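A minimal sketch of these storage mechanics, assuming a hypothetical `PartitionedLog` class (not a real Kafka API): records with the same key land in the same append-only partition, and a consumer reads by `(partition, offset)`:

```python
class PartitionedLog:
    """Toy model of one topic: N append-only partition logs."""
    def __init__(self, num_partitions=3):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        # Same key -> same partition, which is what gives per-key ordering.
        p = hash(key) % len(self.partitions)
        self.partitions[p].append(value)
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def read(self, partition, offset):
        # The log never mutates in place; any past offset can be re-read.
        return self.partitions[partition][offset]

log = PartitionedLog()
p, off0 = log.append("user-42", "login")
_, off1 = log.append("user-42", "click")
# off0 == 0, off1 == 1: offsets grow monotonically within a partition
```

Because the log is append-only, an offset permanently identifies a record, which is what makes replay cheap.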
Kafka supports streaming pipelines in which data flows continuously from sources to sinks. Producers push structured or unstructured events, the Kafka cluster persists them durably, and consumers process them in real time. Kafka integrates with stream processing frameworks such as Kafka Streams, Flink, and Spark Streaming to perform transformations, aggregations, or analytics on live data.
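The source-to-sink flow can be sketched with plain Python generators (the `source`, `transform`, and `sink` names are illustrative stand-ins, loosely analogous to a stateless map step in a framework like Kafka Streams):

```python
def source(events):
    # Stand-in for reading an input topic as a stream of events.
    yield from events

def transform(stream):
    # A stateless per-record transformation, applied as events flow through.
    for event in stream:
        yield {**event, "page": event["page"].lower()}

def sink(stream):
    # Stand-in for writing results to an output topic or external store.
    return list(stream)

events = [{"user": "alice", "page": "/Home"}, {"user": "bob", "page": "/CART"}]
out = sink(transform(source(events)))
```

Unlike this finite list, a real pipeline runs over an unbounded stream; the generator structure is the same, it simply never terminates.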
In workflow terms, a web application might publish user activity events to Kafka topics. Multiple downstream services consume these events: one updates a recommendation engine, another logs analytics metrics, and a third triggers notifications. Each service scales independently and can replay events as needed, without disrupting producers or other consumers.
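The independent-consumers-with-replay pattern above can be sketched with a toy `Consumer` class (a hypothetical illustration, not the real client API) where each consumer owns its offset into a shared log:

```python
class Consumer:
    """Toy consumer: holds its own offset into a shared topic log."""
    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self):
        # Return everything from the current offset, then advance past it.
        new = self.log[self.offset:]
        self.offset = len(self.log)
        return new

    def seek(self, offset):
        # Rewinding the offset replays past events without touching producers
        # or other consumers.
        self.offset = offset

activity_log = ["signup", "view", "purchase"]  # shared topic contents
recommender = Consumer(activity_log)
analytics = Consumer(activity_log)

seen_by_recommender = recommender.poll()  # all three events
analytics.poll()                          # analytics also consumes everything
analytics.seek(0)                         # ...then rewinds
replayed = analytics.poll()               # and replays, independently
```

Because offsets are per-consumer state, one service replaying history has no effect on the others' progress.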
Conceptually, Kafka acts as a durable, distributed message bus for streaming data, preserving order within each partition. It functions as the backbone for real-time analytics, event-driven architectures, and scalable microservices ecosystems.