How to use these notes: Read top to bottom. Each section builds on the previous one. By the end, you'll be able to explain Kafka in an interview, draw a whiteboard diagram, and know exactly how Spring Boot connects to a Kafka cluster.
Apache Kafka is an Event Streaming Platform.
Event Streaming means:
| # | Capability |
|---|---|
| 1 | Publish (write) and subscribe (read) streams of events |
| 2 | Store streams of events durably and reliably for as long as you want |
| 3 | Process streams as they occur in real time or retrospectively |
All of this is provided in a distributed, fault-tolerant, elastic, and secure manner. Kafka can be deployed on bare metal, VMs, on-premises, or in the cloud.
An event is "something that happened" — a record or message.
When you read or write data to Kafka, you do it in the form of events.
Every event conceptually has a key (e.g., transactionId, userId), a value, a timestamp, and optional metadata headers.
Producers are client applications that publish (write) events to Kafka.
Consumers are client applications that read and process events from Kafka.
A Topic is the fundamental way to organize data in Kafka.
Think of a topic like a folder in a filesystem, and events are the files inside that folder.
| Property | Detail |
|---|---|
| Multi-producer | One, two, or many producers can write to the same topic |
| Multi-consumer | One, two, or many consumers can read from the same topic |
| Events are not deleted on consumption | Consumers can re-read events at any time |
| Retention is configurable | e.g., keep events for 7 days, or forever |
| Topics are partitioned | For scalability and parallelism (see next section) |
Use descriptive, hyphen-separated names. Examples:
payments-processed
user-signup-events
order-created
A Partition is the physical subdivision of a Kafka topic. This is the most important concept for scalability.
A Topic is logical. A Partition is physical (an actual append-only log on disk).
Instead of storing all events of a topic in one place, Kafka splits (shards) the topic into multiple partitions spread across brokers.
| Reason | Explanation |
|---|---|
| Scalability | Data is spread across multiple brokers; many producers/consumers work in parallel |
| Throughput | Parallel reads and writes = higher throughput |
| Ordering guarantee | Order is guaranteed within a partition, not across partitions |
| Scenario | Behavior |
|---|---|
| Message has a Key | Kafka hashes the key → same key always goes to the same partition (order preserved per key) |
| No Key (default) | Kafka uses sticky partitioning — picks a partition for a batch, then rotates |
| Manual override | Producer explicitly specifies partition number |
Topic: orders
-----------------------------
P0 → [event1] [event2] [event3]   ← appended in order
P1 → [event4] [event5]
P2 → [event6] [event7]
Events within a partition are ordered and immutable. New events are only ever appended to the end.
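The keyed-routing rule from the table above can be sketched as a hash of the key taken modulo the partition count. This is a minimal simulation, not Kafka's client code: `partition_for` is a hypothetical helper, and CRC32 stands in for the murmur2 hash that Kafka's Java client applies to the key bytes.

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition: hash the key bytes, then take the result modulo the partition count."""
    # Assumption: CRC32 is only a stand-in for the murmur2 hash used by the Java client.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# Hypothetical events as (key, value) pairs, in the order they are produced.
events = [("user-1", "login"), ("user-2", "login"), ("user-1", "purchase"), ("user-1", "logout")]

# Each partition is an append-only list.
partitions = {p: [] for p in range(3)}
for key, value in events:
    partitions[partition_for(key, 3)].append((key, value))

for p, log in partitions.items():
    print(f"P{p}: {log}")
```

All `user-1` events land in a single partition in the order they were produced, which is the per-key ordering guarantee. Nothing is guaranteed about how `user-1` and `user-2` events interleave if they land in different partitions.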
A broker is a single Kafka server (a single running process/node).
A Kafka Cluster is a group of multiple brokers working together.
Kafka Cluster
┌───────────────────────────────────────┐
│  Broker 1     Broker 2     Broker 3   │
│  (Node 1)     (Node 2)     (Node 3)   │
└───────────────────────────────────────┘
| | ZooKeeper (old) | KRaft (new, Kafka 3+) |
|---|---|---|
| Role | External service managing cluster metadata | Built-in consensus, no external dependency |
| Status | Deprecated (removed in Kafka 4.0) | Current standard |
| Setup | Requires separate ZooKeeper cluster | Self-contained — just start Kafka |
Use KRaft mode for all new setups. This is what the local setup commands below use.
This is the most important concept for understanding how Kafka achieves both performance and fault tolerance.
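Every partition has one leader replica, which handles all produce and consume traffic by default, and zero or more follower replicas, which only copy the leader's log.
Partition P1:
┌──────────────────────────────────────┐
│ Broker-2 (Leader)    ← ALL traffic   │
│ Broker-1 (Follower)  ← replica only  │
│ Broker-3 (Follower)  ← replica only  │
└──────────────────────────────────────┘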
This is how Kafka achieves fault tolerance.
The Replication Factor (RF) defines how many copies of a partition exist across the cluster.
Replication Factor = 3 → 1 leader copy + 2 follower copies = 3 total replicas
Broker-1 Broker-2 Broker-3
-------- -------- --------
P0 (Leader) P1 (Leader) P0 (Follower)
P1 (Follower) P0 (Follower) P1 (Follower)
| Setting | Meaning | Use Case |
|---|---|---|
| RF = 1 | No replication — if broker dies, data is lost | Dev/testing only |
| RF = 2 | One backup copy | Acceptable for non-critical data |
| RF = 3 | Two backup copies | Standard for production |
Rule: RF cannot exceed the number of brokers, because each replica of a partition must live on a different broker.
RF = 3 requires at least 3 brokers.
WITHOUT replication:
Broker crashes → partition data gone forever ❌
WITH replication (RF=3):
Broker crashes → follower becomes new leader → no data loss ✅
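The failover behaviour can be sketched with a small simulation. It is a simplified model under stated assumptions: `elect_leaders` is a hypothetical helper, every live replica is treated as fully in sync, and the first entry in each replica list is treated as the preferred leader.

```python
# Replica lists per partition; the first entry is the preferred leader.
# The layout mirrors the RF = 3 example above.
replicas = {
    "P0": ["Broker-1", "Broker-2", "Broker-3"],
    "P1": ["Broker-2", "Broker-1", "Broker-3"],
}

def elect_leaders(replicas, alive):
    """Pick the first live replica of each partition; None means the partition is offline."""
    return {
        partition: next((b for b in brokers if b in alive), None)
        for partition, brokers in replicas.items()
    }

all_up = {"Broker-1", "Broker-2", "Broker-3"}
print(elect_leaders(replicas, all_up))                 # {'P0': 'Broker-1', 'P1': 'Broker-2'}
print(elect_leaders(replicas, all_up - {"Broker-2"}))  # {'P0': 'Broker-1', 'P1': 'Broker-1'}

# With RF = 1 there is no follower to promote, so the partition goes offline.
print(elect_leaders({"P0": ["Broker-1"]}, all_up - {"Broker-1"}))  # {'P0': None}
```

In a real cluster the new leader is chosen from the in-sync replicas, so a follower that has fallen behind would not normally be promoted.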
bootstrap-servers (initial contact only).
The acks setting controls durability guarantees:
| acks value | Meaning | Risk |
|---|---|---|
| acks=0 | Fire and forget: no acknowledgment | Possible data loss |
| acks=1 | Leader writes to its log, then acks | Data loss if leader crashes before replication |
| acks=all | Leader + all ISR followers must acknowledge | Strongest guarantee; use in production |
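To make the difference between the three settings concrete, here is a rough model of a single write where the leader crashes before followers have copied the message. The function `outcome` and its parameters are hypothetical, and the model ignores retries, timeouts, and network failures.

```python
def outcome(acks, leader_crashes_before_replication):
    """Return (producer_sees_success, message_survives) for one write."""
    # In this model the message survives only if followers copied it before any crash.
    survives = not leader_crashes_before_replication
    if acks in ("0", "1"):
        # acks=0 never waits; acks=1 is acknowledged as soon as the leader has written.
        sees_success = True
    elif acks == "all":
        # Acknowledged only after the in-sync replicas have the message.
        sees_success = survives
    else:
        raise ValueError(f"unknown acks value: {acks}")
    return sees_success, survives

for acks in ("0", "1", "all"):
    ok, survives = outcome(acks, leader_crashes_before_replication=True)
    print(f"acks={acks}: silent loss = {ok and not survives}")
# acks=0: silent loss = True
# acks=1: silent loss = True
# acks=all: silent loss = False
```

With `acks=all` the write can still fail, but the producer finds out and can retry instead of losing the message silently.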
This is a very common interview question. Understanding this end-to-end is essential.
# application.properties
spring.kafka.bootstrap-servers=broker1:9092,broker2:9092,broker3:9092
spring.kafka.producer.key-serializer=org.apache.kafka.common.serialization.StringSerializer
spring.kafka.producer.value-serializer=org.apache.kafka.common.serialization.StringSerializer
Common misconception: "The producer only talks to the bootstrap server."
Reality: Bootstrap servers are just the initial contact point. After the first handshake, the producer talks directly to whichever broker is the partition leader.
Step 1: Bootstrap Connection
Spring Boot Producer
│
│ Initial connection (just for metadata)
▼
Broker-1 (any broker in list)
│
│ Responds with cluster metadata:
│ "Partition 0 → Leader: Broker-1"
│ "Partition 1 → Leader: Broker-2"
│ "Partition 2 → Leader: Broker-3"
▼
Step 2: Partition Selection
Producer hashes the message key
→ selects Partition 1
Step 3: Direct Send to Leader
Producer sends message DIRECTLY to Broker-2
(leader of Partition 1)
Step 4: Replication
Broker-2 (Leader)
│ replicates
├──→ Broker-1 (Follower of P1)
└──→ Broker-3 (Follower of P1)
+----------------------+
| Spring Boot Producer|
+----------+-----------+
│
│ (1) Bootstrap connection
▼
+----------------------+
| Broker-1 |
| (Metadata request) |
+----------+-----------+
│
│ (2) Metadata response:
│ Partition → Leader mapping
▼
Topic: payments
-------------------------
Partition 0 → Broker-1
Partition 1 → Broker-2
Partition 2 → Broker-3
│
                │  (3) Select partition (key hash / sticky when no key)
▼
+----------------------+
| Broker-2 | ← Leader of Partition 1
+----------+-----------+
│
│ (4) Replication
──────────────────────────────
│ │
▼ ▼
+-------------+ +-------------+
| Broker-1 | | Broker-3 |
| (Follower) | | (Follower) |
+-------------+ +-------------+
"The producer connects to any bootstrap broker to fetch cluster metadata: topics, partitions, and their leader brokers. When sending a message, the producer picks the target partition by hashing the message key, or by sticky partitioning (filling a batch for one partition, then moving to another) if no key is provided. The producer then sends the message directly to the leader broker of that partition, which is not necessarily the bootstrap broker. The followers then replicate the message from the leader. If a broker goes down, Kafka elects a new leader, and the producer automatically refreshes its metadata and continues with no manual intervention."
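The routing steps in this answer can be sketched as a lookup against a metadata map. This is an illustrative simulation only: the `metadata` contents, the `route` helper, and the key `txn-42` are assumptions, and CRC32 again stands in for the client's real key hash.

```python
import zlib

# Steps 1-2: the partition -> leader map, as a bootstrap broker might return it.
metadata = {0: "Broker-1", 1: "Broker-2", 2: "Broker-3"}

def route(key, metadata):
    """Step 3: choose a partition from the key, then look up that partition's leader."""
    partition = zlib.crc32(key.encode("utf-8")) % len(metadata)
    return partition, metadata[partition]

partition, leader = route("txn-42", metadata)
print(f"send to {leader} (leader of partition {partition})")

# Leader failover: a different broker takes over the partition.
new_leader = next(b for b in metadata.values() if b != leader)
refreshed = {**metadata, partition: new_leader}

# After the metadata refresh the key maps to the same partition, but a new broker.
print(route("txn-42", refreshed))
```

The key point is that failover changes only the metadata, not the partition choice, so per-key ordering is unaffected by a leader change.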
A consumer reads events from one or more partitions of a topic.
__consumer_offsets internal topic).
A Consumer Group is a set of consumers that work together to consume a topic.
Core rule: Within a consumer group, each partition is consumed by at most one consumer. But multiple consumer groups can each independently consume the same topic.
Topic: orders (3 partitions)
Consumer Group: payment-service
P0 → Consumer-1
P1 → Consumer-2
P2 → Consumer-3
3 partitions, 3 consumers → perfect parallelism ✅
P0 P1 P2
│ │ │
C1 C2 C3
3 partitions, 2 consumers → C1 handles 2 partitions
P0 P1 P2
│ │ │
C1 C1 C2
3 partitions, 4 consumers → C4 is idle
P0 P1 P2 (nothing)
│ │ │ │
C1 C2 C3 C4 ← idle ❌
Maximum useful parallelism = number of partitions
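The three scenarios above can be reproduced with a small assignment function. This is a sketch loosely modeled on range-style assignment; `assign` is a hypothetical helper, and Kafka's actual assignors (a consumer-side setting) differ in detail.

```python
def assign(partitions, consumers):
    """Give each consumer a contiguous slice of partitions; every partition gets one owner."""
    per_consumer, extra = divmod(len(partitions), len(consumers))
    assignment, start = {}, 0
    for i, consumer in enumerate(consumers):
        # The first `extra` consumers take one additional partition.
        count = per_consumer + (1 if i < extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment

partitions = ["P0", "P1", "P2"]
print(assign(partitions, ["C1", "C2"]))
# {'C1': ['P0', 'P1'], 'C2': ['P2']}
print(assign(partitions, ["C1", "C2", "C3"]))
# {'C1': ['P0'], 'C2': ['P1'], 'C3': ['P2']}
print(assign(partitions, ["C1", "C2", "C3", "C4"]))
# {'C1': ['P0'], 'C2': ['P1'], 'C3': ['P2'], 'C4': []}
```

The empty list for `C4` is the idle consumer: adding members beyond the partition count adds no parallelism.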
This is where Kafka truly shines. The same data can be consumed independently by completely different services.
Topic: payments (2 partitions)
─────────────────────────────────
P0 P1
Group A (Payment Service): C1 C2
Group B (Fraud Detection): C3 ← reads both P0 and P1
Group C (Analytics): C4 ← reads both P0 and P1
Group D (Audit Logging): C5 ← reads both P0 and P1
The same payment event is consumed independently by all four groups: Payment Service, Fraud Detection, Analytics, and Audit Logging.
"If consumers belong to different consumer groups, each group independently consumes the same data. The one-partition-per-consumer rule applies only within a consumer group. So multiple services can each receive every event, enabling fan-out patterns. Offsets are tracked per consumer group, so each group reads at its own pace without affecting others."
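Per-group offsets can be illustrated with one shared log and one read position per group. The sketch below is a simplified in-memory model; the `poll` helper, the group names, and the records are hypothetical.

```python
# One partition's append-only log, shared by every consumer group.
log = ["pay-1", "pay-2", "pay-3", "pay-4"]

# The next offset to read, tracked separately for each group.
offsets = {"payment-service": 0, "fraud-detection": 0}

def poll(group, max_records):
    """Return up to max_records from the group's position, then advance that position."""
    start = offsets[group]
    records = log[start:start + max_records]
    offsets[group] = start + len(records)  # "commit" the new position
    return records

print(poll("payment-service", 4))  # ['pay-1', 'pay-2', 'pay-3', 'pay-4']
print(poll("fraud-detection", 2))  # ['pay-1', 'pay-2']
print(offsets)                     # {'payment-service': 4, 'fraud-detection': 2}
```

Reading never removes anything from `log`, so a slow group simply has a smaller offset and catches up later.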
Kafka Connect is a framework for moving data into and out of Kafka without writing custom code.
In enterprise systems, Kafka acts as a central data backbone. But you need to:
Writing this integration code manually is error-prone, repetitive, and fragile. Kafka Connect solves this with ready-made connectors.
Data flows into Kafka from:
Data flows out of Kafka to:
| Problem | Why Kafka Connect Helps |
|---|---|
| Database overload | Kafka buffers data; downstream systems don't query DB directly |
| Point-to-point integrations | One Kafka topic can feed many consumers instead of N×M integrations |
| No real-time streaming | Kafka provides millisecond-latency event propagation |
| No replay capability | Kafka retains events; consumers can replay |
Source Systems Kafka Sink Systems
───────────── ────────────── ──────────────────
MySQL DB → │ │ → Elasticsearch
PostgreSQL → │ Kafka │ → Snowflake DW
File System → │ Cluster │ → AWS S3
Oracle → │ │ → Another Kafka
───────────── ────────────── ──────────────────
↑ Source ↑ Sink
Connectors Connectors
Choosing the right number of partitions is a critical design decision. Too few = bottleneck. Too many = overhead.
BAD DESIGN (too few partitions — bottleneck):
Topic (1 Partition)
│
▼
Broker-1 (Leader)
(ALL traffic hits here) ❌
GOOD DESIGN (distributed load):
Topic (6 Partitions)
P0 → Broker-1 P3 → Broker-1
P1 → Broker-2 P4 → Broker-2
P2 → Broker-3 P5 → Broker-3
✔ Load spread across all brokers
✔ Parallel producers & consumers
| Factor | Guidance |
|---|---|
| Max consumer parallelism | Partitions = max number of consumers you'll ever want in a group |
| Throughput target | Measure throughput per partition, then divide target by that |
| Number of brokers | Partitions should be a multiple of broker count for even distribution |
| Ordering requirements | If you need strict ordering for an entity (e.g., per user), all messages for that entity go to one partition via key |
Desired partition count ≈ max(
target throughput / throughput per partition,
max consumer instances you'll scale to
)
→ Use 12 partitions (covers both throughput and consumer parallelism)
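The formula can be turned into a small calculator. The inputs below are hypothetical numbers chosen for illustration (100 MB/s target, 10 MB/s measured per partition, 12 consumer instances, 3 brokers), and rounding up to a multiple of the broker count is an added step taken from the guidance table, not part of the formula itself.

```python
import math

def suggested_partitions(target_mb_s, per_partition_mb_s, max_consumers, brokers):
    """Apply the sizing formula, then round up to a multiple of the broker count."""
    by_throughput = math.ceil(target_mb_s / per_partition_mb_s)
    needed = max(by_throughput, max_consumers)
    return math.ceil(needed / brokers) * brokers

# Throughput needs 10 partitions, consumers need 12, so the result is 12.
print(suggested_partitions(100, 10, 12, 3))  # 12

# With only 6 consumers, throughput dominates: 10, rounded up to a multiple of 3.
print(suggested_partitions(100, 10, 6, 3))   # 12
```

The per-partition throughput figure should come from a measurement on your own hardware and message sizes rather than a rule of thumb.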
You can increase partitions later, but you cannot decrease them.
Increasing partitions can also break key-based ordering for existing keys.
So: over-partition slightly rather than under-partition.
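The ordering caveat follows from the fact that the key-to-partition mapping depends on the partition count. A quick sketch, using a hypothetical `partition_for` helper with CRC32 as a stand-in for the client's real key hash, shows how many keys change partition when a topic grows from 6 to 12 partitions:

```python
import zlib

def partition_for(key, num_partitions):
    """Hash the key and take it modulo the partition count (CRC32 is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# Hypothetical key set.
keys = [f"user-{i}" for i in range(1000)]

# Keys whose target partition changes when the topic grows from 6 to 12 partitions.
moved = [k for k in keys if partition_for(k, 6) != partition_for(k, 12)]
print(f"{len(moved)} of {len(keys)} keys map to a different partition after 6 -> 12")
```

For a moved key, older events stay in the old partition while new events go to the new one, so a consumer can no longer rely on seeing that key's events in order.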
| Cluster Size | Recommendation |
|---|---|
| Small / Dev | 3–6 partitions per topic |
| Medium | 12–24 partitions for high-throughput topics |
| Large / Enterprise | 50–100+ partitions, based on SLA and scaling targets |
# Step 1: Generate Cluster ID
KAFKA_CLUSTER_ID="$(./bin/kafka-storage.sh random-uuid)"

# Step 2: Format storage
./bin/kafka-storage.sh format --standalone -t $KAFKA_CLUSTER_ID -c config/server.properties

# Step 3: Start the broker
./bin/kafka-server-start.sh config/server.properties
# Create a topic
./bin/kafka-topics.sh \
  --create \
  --topic payments \
  --partitions 3 \
  --replication-factor 1 \
  --bootstrap-server localhost:9092

# Describe a topic (shows partition distribution, leaders, ISR)
./bin/kafka-topics.sh \
  --describe \
  --topic payments \
  --bootstrap-server localhost:9092

# List all topics
./bin/kafka-topics.sh \
  --list \
  --bootstrap-server localhost:9092

# Delete a topic
./bin/kafka-topics.sh \
  --delete \
  --topic payments \
  --bootstrap-server localhost:9092
# Start a console producer
./bin/kafka-console-producer.sh \
  --topic payments \
  --bootstrap-server localhost:9092

# Start a console consumer (read from beginning)
./bin/kafka-console-consumer.sh \
  --topic payments \
  --from-beginning \
  --bootstrap-server localhost:9092

# Consumer in a group
./bin/kafka-console-consumer.sh \
  --topic payments \
  --group payment-service \
  --bootstrap-server localhost:9092
Before KRaft, Kafka required ZooKeeper as a separate service to manage cluster metadata (leader elections, broker registry, etc.). KRaft became production-ready in Kafka 3.3, and Kafka 4.0 removed ZooKeeper support entirely.
KRaft (Kafka Raft) removes this dependency. Kafka now manages its own metadata internally using a built-in Raft consensus protocol.
In server.properties:
# Combined mode (broker + controller in one process): good for local dev
process.roles=broker,controller

# Recommended log directory configuration
log.dirs=/path/to/kafka-broker-logs
metadata.log.dir=/path/to/kafka-metadata-logs
For production, these directories should be on separate disks for reliability.
| Directory | Purpose |
|---|---|
| log.dirs | Broker data logs: actual topic/partition event data |
| metadata.log.dir | KRaft controller metadata: cluster state, leader info |
Separating them ensures metadata writes (which need low latency) don't compete with data writes.
This is a real architecture using Kafka to decouple services:
External Config Kafka Middleware Targets
Sources Manager Cluster Service (3rd Party)
───────── ───────── ───────── ───────── ─────────
┌─────────┐
Merchant → CM Service →│bin data │→ MW consumes → Cache
Data │merchant │ merchant & (Redis)
│ topic │ bin data → Payment
└─────────┘ Processor A
→ Payment
┌─────────┐ Processor B
Transaction → App →│txn │→ TLM consumes → DB (save txn)
Events │events │
│ topic │
└─────────┘
Kafka Cluster
Flow:
Why Kafka here?
| Question | One-Line Answer |
|---|---|
| What is Kafka? | A distributed, fault-tolerant event streaming platform for publishing, storing, and processing real-time data streams. |
| What is a topic? | A logical category for organizing events, similar to a folder, split into partitions for scalability. |
| What is a partition? | A physical, ordered, append-only log that is the actual unit of storage and parallelism in Kafka. |
| What is a broker? | A single Kafka server that stores partitions and serves producer/consumer requests. |
| What is a partition leader? | The one broker responsible for all reads and writes for a given partition. |
| What is replication factor? | The number of copies of each partition across the cluster — ensures fault tolerance. |
| How does Spring Boot connect to Kafka? | It connects to bootstrap servers for initial metadata, then routes messages directly to the partition leader. |
| What is a consumer group? | A set of consumers that together consume a topic, with each partition assigned to exactly one consumer in the group. |
| How does Kafka scale? | By increasing partition count and adding brokers, distributing leader partitions across the cluster. |
| Why can't consumers share a partition in a group? | To guarantee ordering — only one consumer reads from a partition at a time within a group. |
| Setting | Common Value |
|---|---|
| Replication Factor (production) | 3 |
| Min brokers for RF=3 | 3 |
| Default retention | 7 days |
| Max useful consumers per consumer group | = number of partitions |
These notes cover: Event Streaming, Kafka Architecture, Topics, Partitions, Brokers, Leaders & Followers, Replication, Producers, Spring Boot Bootstrap, Consumer Groups, Kafka Connect, Partition Sizing, Topic CLI Commands, and KRaft mode.