2025-06-01
A bottom-up walkthrough of Kafka: what it is, how it works internally, and how Spring Boot connects to it. Written as study notes refined from real production experience.
Apache Kafka is an Event Streaming Platform.
Event Streaming means:
| # | Capability |
|---|---|
| 1 | Publish and subscribe to streams of events |
| 2 | Store streams durably for as long as needed |
| 3 | Process streams in real time or retrospectively |
Kafka is distributed, fault-tolerant, and elastic. It runs on bare metal, VMs, or cloud.
An event is something that happened — a record or message. Every event has:
- An optional key (e.g., transactionId, userId), which drives partitioning
- A value: the payload describing what happened
- A timestamp
- Optional metadata headers
Producers publish (write) events to Kafka.
Consumers read and process events from Kafka.
A Topic is the fundamental way to organize data in Kafka.
Think of a topic like a folder in a filesystem, and events as the files inside it.
| Property | Detail |
|---|---|
| Multi-producer | Many producers can write to the same topic |
| Multi-consumer | Many consumers can read from the same topic |
| No deletion on read | Consumers can re-read events at any time |
| Configurable retention | Keep events for 7 days, or indefinitely |
| Partitioned | Split for scalability and parallelism |
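
The "no deletion on read" property is easy to demonstrate: a brand-new consumer group can replay everything still within retention. A minimal plain-Java sketch, assuming a local broker and an existing `payments` topic (the group id is illustrative):

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ReReadSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "fresh-audit-group");   // a brand-new group...
        props.put("auto.offset.reset", "earliest");   // ...starts from the oldest retained event
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("payments"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            // Reading is non-destructive: other consumers still see every event
            records.forEach(r -> System.out.printf("offset %d: %s%n", r.offset(), r.value()));
        }
    }
}
```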
Use descriptive, hyphen-separated names:
- `payments-processed`
- `user-signup-events`
- `order-created`
A Partition is the physical subdivision of a topic. This is the most important concept for scalability.
A Topic is logical. A Partition is physical — an actual append-only log on disk.
| Reason | Explanation |
|---|---|
| Scalability | Data spread across brokers; parallel producers/consumers |
| Throughput | Parallel reads and writes |
| Ordering | Guaranteed within a partition, not across partitions |
How a producer chooses a partition for each event:

| Scenario | Behavior |
|---|---|
| Message has a Key | Kafka hashes the key → same key always hits same partition |
| No Key | Sticky partitioning — picks a partition per batch, then rotates |
| Manual override | Producer explicitly specifies partition number |
```
Topic: orders
-----------------------------
P0 → [event1] [event2] [event3]   ← appended in order
P1 → [event4] [event5]
P2 → [event6] [event7]
```
Events within a partition are ordered and immutable. Only ever appended to the end.
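
To make the key → partition rule above concrete, here is a minimal producer sketch in plain Java. The broker address and `orders` topic are illustrative, and `StringSerializer` for key and value is an assumption:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class KeyedProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Same key → same partition → strict ordering for this order's events
            producer.send(new ProducerRecord<>("orders", "order-42", "created"));
            producer.send(new ProducerRecord<>("orders", "order-42", "paid"));
            // No key → sticky partitioning spreads batches across partitions
            producer.send(new ProducerRecord<>("orders", "metrics-ping"));
        }
    }
}
```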
A broker is a single Kafka server. It:
- Stores partitions (the append-only logs) on disk
- Serves producer writes and consumer reads
- Replicates partitions hosted on other brokers
A Kafka Cluster is multiple brokers working together:
```
Kafka Cluster
┌───────────────────────────────────────┐
│  Broker 1      Broker 2      Broker 3 │
│  (Node 1)      (Node 2)      (Node 3) │
└───────────────────────────────────────┘
```
Adding more brokers = horizontal scaling.
| | ZooKeeper (old) | KRaft (new, Kafka 3+) |
|---|---|---|
| Role | External service managing cluster metadata | Built-in consensus, no external dependency |
| Status | Deprecated | Current standard |
| Setup | Requires separate ZooKeeper cluster | Self-contained |
Use KRaft mode for all new setups.
Every partition has one leader and zero or more followers:
```
Partition P1:
┌──────────────────────────────────────┐
│ Broker-2 (Leader)   ← ALL traffic    │
│ Broker-1 (Follower) ← replica only   │
│ Broker-3 (Follower) ← replica only   │
└──────────────────────────────────────┘
```
One leader means simpler consistency — no stale reads. The leader is the single source of truth.
Replication Factor (RF) = how many copies of a partition exist across the cluster.
```
RF = 3, Topic: payments, Partition P0

Broker 1 → [P0 Leader]  ← handles all reads/writes
Broker 2 → [P0 Replica] ← stays in sync
Broker 3 → [P0 Replica] ← stays in sync
```
| RF | Meaning |
|---|---|
| 1 | No redundancy. Broker dies → data lost |
| 2 | One backup. Rarely used in prod |
| 3 | Standard for production |
You need at least as many brokers as your replication factor.
In-sync replicas (ISR) are the replicas that are caught up to the leader. If a replica falls behind, it's removed from the ISR.
`min.insync.replicas=2` with RF=3 means: at least 2 replicas must acknowledge a write before it's confirmed.
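
Putting RF and `min.insync.replicas` together: a sketch using Kafka's AdminClient to create a topic with RF=3 and `min.insync.replicas=2`. The broker address is illustrative:

```java
import java.util.List;
import java.util.Map;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicSketch {
    public static void main(String[] args) throws Exception {
        Map<String, Object> conf = Map.of(
                AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        try (Admin admin = Admin.create(conf)) {
            // 3 partitions, 3 copies of each partition across the cluster
            NewTopic payments = new NewTopic("payments", 3, (short) 3)
                    // with acks=all, a write needs 2 in-sync acks to succeed
                    .configs(Map.of("min.insync.replicas", "2"));
            admin.createTopics(List.of(payments)).all().get();
        }
    }
}
```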
Producers write events to Kafka. Key config:
| `acks` | Meaning | Risk |
|---|---|---|
| `0` | Fire and forget | Message can be lost |
| `1` | Leader ACKs only | Lost if leader fails before replication |
| `all` | All ISR replicas ACK | Safest — use for financial/critical data |
`enable.idempotence=true` guarantees exactly-once delivery to a partition even if retries happen.
Producers batch messages before sending for throughput efficiency:
| Config | Purpose |
|---|---|
| `linger.ms` | Wait up to N ms to fill a batch |
| `batch.size` | Max bytes per batch |
| `compression.type` | `snappy`, `lz4`, `gzip` — reduces network usage |
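
Assembled into producer configuration, the safety and batching settings above look like this. A sketch in plain Java, with illustrative values rather than recommendations:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class TunedProducerSketch {
    static KafkaProducer<String, String> build() {
        Properties p = new Properties();
        p.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        p.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        p.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        p.put(ProducerConfig.ACKS_CONFIG, "all");               // all ISR replicas must ack
        p.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);  // no duplicates on retry
        p.put(ProducerConfig.LINGER_MS_CONFIG, 5);              // wait up to 5 ms to fill a batch
        p.put(ProducerConfig.BATCH_SIZE_CONFIG, 32 * 1024);     // cap batches at 32 KB
        p.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");   // compress batches on the wire
        return new KafkaProducer<>(p);
    }
}
```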
This is how the connection actually works — not just "add bootstrap servers to config".
```yaml
spring:
  kafka:
    bootstrap-servers: broker1:9092,broker2:9092,broker3:9092
```
Bootstrap servers are just an initial contact point. Spring Boot connects to any one of them to fetch cluster metadata (all brokers, all topics, all partition leaders).
From that initial connection, the client gets a full map of the cluster:
```
Topic: payments
  Partition 0 → Leader: Broker 2
  Partition 1 → Leader: Broker 1
  Partition 2 → Leader: Broker 3
```
After metadata fetch, the producer routes messages directly to the partition leader — not through the bootstrap server.
```
Spring Boot Producer
        │
        ├──→ Broker 2 (Leader for P0) ← payment with key "txn-001"
        ├──→ Broker 1 (Leader for P1) ← payment with key "txn-002"
        └──→ Broker 3 (Leader for P2) ← payment with key "txn-003"
```
Key insight: You don't need all brokers in bootstrap-servers — just enough that at least one is reachable at startup.
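
In Spring Boot terms, that routing is invisible: you send via `KafkaTemplate` and the client library handles metadata and leader routing for you. A minimal sketch, assuming spring-kafka on the classpath and the YAML above; the `payments` topic and class name are illustrative:

```java
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
public class PaymentPublisher {

    private final KafkaTemplate<String, String> kafkaTemplate;

    public PaymentPublisher(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    public void publish(String txnId, String payload) {
        // The key is hashed to pick a partition; the client then sends
        // directly to that partition's leader broker.
        kafkaTemplate.send("payments", txnId, payload);
    }
}
```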
A Consumer Group is a set of consumers that together consume a topic.
```
Topic: payments (3 partitions)

Consumer Group A (payment-service):
  Consumer 1 → P0
  Consumer 2 → P1
  Consumer 3 → P2

Consumer Group B (audit-service):
  Consumer 1 → P0, P1, P2   ← reads all partitions independently
```
When a consumer joins or leaves the group, Kafka triggers a rebalance: partitions are redistributed across the remaining consumers.
Offset = position of the last consumed message in a partition.
| Mode | Behaviour |
|---|---|
| Auto commit | Kafka commits offset periodically (risk of re-processing on crash) |
| Manual commit | Consumer commits after processing (safer for financial systems) |
Offsets are stored in the internal topic `__consumer_offsets`.
For payment systems, always use manual commit:
```java
@KafkaListener(topics = "payments", groupId = "payment-service")
public void consume(ConsumerRecord<String, String> record,
                    Acknowledgment ack) {
    process(record);
    ack.acknowledge(); // commit only after successful processing
}
```
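
One caveat: the `Acknowledgment` parameter only works when the listener container runs in manual ack mode. A sketch of that wiring, assuming Spring Boot has already auto-configured a `ConsumerFactory` bean:

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.listener.ContainerProperties.AckMode;

@Configuration
public class KafkaConsumerConfig {

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
            ConsumerFactory<String, String> consumerFactory) {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);
        // Offsets are committed only when the listener calls ack.acknowledge()
        factory.getContainerProperties().setAckMode(AckMode.MANUAL);
        return factory;
    }
}
```

(With Spring Boot's auto-configured factory, setting `spring.kafka.listener.ack-mode: manual` in application.yml achieves the same thing.)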
More partitions = more parallelism, but also more overhead.
| Factor | Guidance |
|---|---|
| Max consumer parallelism | Partitions = max consumers you'll ever want in a group |
| Throughput target | Measure per-partition throughput, divide target by that |
| Broker count | Partitions should be a multiple of broker count for even spread |
| Ordering | If strict per-entity ordering is needed, use key-based partitioning |
```
Desired partitions ≈ max(
    target throughput / throughput per partition,
    max consumer instances you'll scale to
)
```
Example: a 600 MB/s target at 100 MB/s per partition needs 6 partitions for throughput, but scaling to 12 consumers needs 12 → choose 12 partitions.
You can increase partitions later but cannot decrease them.
Increasing partitions can break key-based ordering for existing keys.
Over-partition slightly rather than under-partition.
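
Increasing is a one-line admin operation, which is exactly why the ordering warning matters: existing records stay in their old partitions while new records for the same key may hash elsewhere. A sketch using the AdminClient (topic name and broker address are illustrative):

```java
import java.util.Map;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewPartitions;

public class GrowPartitionsSketch {
    public static void main(String[] args) throws Exception {
        Map<String, Object> conf = Map.of(
                AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        try (Admin admin = Admin.create(conf)) {
            // 3 → 6 partitions; asking for fewer than the current count fails,
            // because Kafka never shrinks a topic
            admin.createPartitions(Map.of("payments", NewPartitions.increaseTo(6)))
                 .all().get();
        }
    }
}
```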
| Cluster Size | Recommendation |
|---|---|
| Small / Dev | 3–6 partitions per topic |
| Medium | 12–24 for high-throughput topics |
| Large / Enterprise | 50–100+ based on SLA |
```bash
# Generate Cluster ID
KAFKA_CLUSTER_ID="$(./bin/kafka-storage.sh random-uuid)"

# Format storage
./bin/kafka-storage.sh format --standalone -t $KAFKA_CLUSTER_ID -c config/server.properties

# Start the broker
./bin/kafka-server-start.sh config/server.properties
```
```bash
# Create
./bin/kafka-topics.sh --create --topic payments \
  --partitions 3 --replication-factor 1 \
  --bootstrap-server localhost:9092

# Describe
./bin/kafka-topics.sh --describe --topic payments \
  --bootstrap-server localhost:9092

# List
./bin/kafka-topics.sh --list --bootstrap-server localhost:9092

# Delete
./bin/kafka-topics.sh --delete --topic payments \
  --bootstrap-server localhost:9092
```
```bash
# Produce
./bin/kafka-console-producer.sh --topic payments \
  --bootstrap-server localhost:9092

# Consume from beginning
./bin/kafka-console-consumer.sh --topic payments \
  --from-beginning --bootstrap-server localhost:9092

# Consume in a group
./bin/kafka-console-consumer.sh --topic payments \
  --group payment-service --bootstrap-server localhost:9092
```
```properties
# Combined mode — good for local dev
process.roles=broker,controller

# Separate directories for data and metadata
log.dirs=/path/to/kafka-broker-logs
metadata.log.dir=/path/to/kafka-metadata-logs
```
In production, put these on separate disks — metadata writes need low latency and shouldn't compete with data writes.
This is based on the payment processing platform I work on:
```
External          Config          Kafka           Middleware        Targets
Sources           Manager         Cluster         Service           (3rd Party)
─────────         ─────────       ─────────       ─────────         ─────────
                                  ┌─────────┐
Merchant     →    CM Service  →   │merchant │  →  MW consumes   →   Redis Cache
Data                              │ topic   │                   →   Payment Processor A
                                  └─────────┘                   →   Payment Processor B
                                  ┌─────────┐
Transaction  →    App         →   │txn      │  →  TLM consumes  →   DB (save txn)
Events                            │ topic   │
                                  └─────────┘
```
Why Kafka here? The same events fan out to several independent consumers (the Redis cache, both payment processors, the transaction DB) without coupling the producing services to any of them, and durable retention means a downstream target that was down can replay the events it missed.
| Question | Answer |
|---|---|
| What is Kafka? | Distributed, fault-tolerant event streaming platform for publishing, storing, and processing real-time data streams |
| What is a topic? | Logical category for organizing events, split into partitions for scalability |
| What is a partition? | Physical, ordered, append-only log — the actual unit of storage and parallelism |
| What is a broker? | Single Kafka server that stores partitions and serves producer/consumer requests |
| What is a partition leader? | The one broker responsible for all reads and writes for a given partition |
| What is replication factor? | Number of copies of each partition — ensures fault tolerance |
| How does Spring Boot connect? | Connects to bootstrap servers for metadata, then routes directly to partition leaders |
| What is a consumer group? | Set of consumers sharing a topic, with each partition assigned to exactly one consumer |
| Setting | Value |
|---|---|
| Replication Factor (prod) | 3 |
| Min brokers for RF=3 | 3 |
| Default retention | 7 days |
| Max useful consumers per topic | = number of partitions |
Tags: kafka, distributed-systems, backend, spring-boot