Apache Kafka — A Complete Guide

2025-06-01

A bottom-up walkthrough of Kafka: what it is, how it works internally, and how Spring Boot connects to it. Written as study notes refined from real production experience.


Table of Contents

1. What is Kafka?
2. Core Concepts
3. Topics
4. Partitions
5. Brokers & Clusters
6. Partition Leader & Follower
7. Replication Factor
8. Producers
9. How Spring Boot Connects to Kafka
10. Consumers & Consumer Groups
11. How to Decide Partition Count
12. Running Kafka Locally (KRaft Mode)
13. Real-World Architecture Example
14. Interview Cheat Sheet

1. What is Kafka?

Apache Kafka is an Event Streaming Platform.

Event streaming means capturing data in real time from sources such as databases, applications, and sensors; storing those event streams durably; and processing, reacting to, and routing them as they happen.

Kafka's 3 Core Capabilities

#   Capability
--  ------------------------------------------------
1   Publish and subscribe to streams of events
2   Store streams durably for as long as needed
3   Process streams in real time or retrospectively

Kafka is distributed, fault-tolerant, and elastic. It runs on bare metal, VMs, or cloud.


2. Core Concepts

Events

An event is something that happened — a record or message. Every event has a key (optional), a value (the payload), a timestamp, and optional metadata headers.
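
A minimal sketch of that anatomy using the Java client's ProducerRecord — topic, key, and payload here are illustrative:

import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.header.internals.RecordHeader;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class EventAnatomy {
    public static void main(String[] args) {
        ProducerRecord<String, String> event = new ProducerRecord<>(
                "payments",                 // topic
                null,                       // partition — null lets Kafka decide
                System.currentTimeMillis(), // timestamp
                "txn-001",                  // key
                "{\"amount\":100}",         // value
                List.of(new RecordHeader("source",
                        "checkout".getBytes(StandardCharsets.UTF_8)))
        );
        System.out.println(event);
    }
}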

Producers

Producers publish (write) events to Kafka.

Consumers

Consumers read and process events from Kafka.


3. Topics

A Topic is the fundamental way to organize data in Kafka.

Think of a topic like a folder in a filesystem, and events as the files inside it.

Property                 Detail
---------------------    --------------------------------------------
Multi-producer           Many producers can write to the same topic
Multi-consumer           Many consumers can read from the same topic
No deletion on read      Consumers can re-read events at any time
Configurable retention   Keep events for 7 days, or indefinitely
Partitioned              Split for scalability and parallelism
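
If a service owns its topics, they can also be declared in code. A hypothetical sketch using spring-kafka's TopicBuilder (topic name and settings are illustrative) — Spring Boot creates the topic on startup if it doesn't already exist:

import org.apache.kafka.clients.admin.NewTopic;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.TopicBuilder;

@Configuration
public class TopicConfig {

    @Bean
    public NewTopic paymentsTopic() {
        // 3 partitions for parallelism, RF 1 for local dev (use 3 in prod)
        return TopicBuilder.name("payments")
                .partitions(3)
                .replicas(1)
                .build();
    }
}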

Naming Convention

Use descriptive, hyphen-separated names, e.g. payment-transactions, order-events, user-signups.


4. Partitions

A Partition is the physical subdivision of a topic. This is the most important concept for scalability.

A Topic is logical. A Partition is physical — an actual append-only log on disk.

Why Partitions Exist

Reason        Explanation
-----------   --------------------------------------------------------
Scalability   Data spread across brokers; parallel producers/consumers
Throughput    Parallel reads and writes
Ordering      Guaranteed within a partition, not across partitions

How Events Land in a Partition

Scenario            Behavior
-----------------   ---------------------------------------------------------------
Message has a Key   Kafka hashes the key → same key always hits same partition
No Key              Sticky partitioning — picks a partition per batch, then rotates
Manual override     Producer explicitly specifies partition number
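
A sketch of that key → partition mapping, mirroring the murmur2-based formula Kafka's default partitioner applies to keyed records (assumes kafka-clients on the classpath):

import org.apache.kafka.common.utils.Utils;
import java.nio.charset.StandardCharsets;

public class PartitionForKey {

    // Same formula the default partitioner uses for records with a key
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    public static void main(String[] args) {
        // A fixed key always maps to the same partition — until the count changes
        System.out.println(partitionFor("txn-001", 3));
        System.out.println(partitionFor("txn-001", 3)); // identical result
    }
}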

Partition as Append-Only Log

Topic: orders
-----------------------------
P0 → [event1] [event2] [event3]   ← appended in order
P1 → [event4] [event5]
P2 → [event6] [event7]

Events within a partition are ordered and immutable; they are only ever appended to the end.


5. Brokers & Clusters

A broker is a single Kafka server. It stores partitions on disk, serves producer and consumer requests, and replicates data from partition leaders on other brokers.

A Kafka Cluster is multiple brokers working together:

Kafka Cluster
┌───────────────────────────────────────┐
│  Broker 1    Broker 2    Broker 3     │
│  (Node 1)    (Node 2)    (Node 3)     │
└───────────────────────────────────────┘

Adding more brokers = horizontal scaling.

ZooKeeper vs KRaft

          ZooKeeper (old)                               KRaft (new, Kafka 3+)
------    --------------------------------------------  ------------------------------------------
Role      External service managing cluster metadata    Built-in consensus, no external dependency
Status    Deprecated                                    Current standard
Setup     Requires separate ZooKeeper cluster           Self-contained

Use KRaft mode for all new setups.


6. Partition Leader & Follower

Every partition has one leader (the broker handling all reads and writes for that partition) and zero or more followers (brokers that replicate the leader's log).

One leader means simpler consistency — no stale reads. The leader is the single source of truth.

What Happens When a Leader Fails?

The controller detects the failure and elects a new leader from the in-sync replicas. Clients refresh their metadata and transparently reroute to the new leader — no data is lost as long as an in-sync replica survives.


7. Replication Factor

Replication Factor (RF) = how many copies of a partition exist across the cluster.

RF = 3, Topic: payments, Partition P0

  Broker 1 → [P0 Leader]   ← handles all reads/writes
  Broker 2 → [P0 Replica]  ← stays in sync
  Broker 3 → [P0 Replica]  ← stays in sync

RF   Meaning
--   ----------------------------------------
1    No redundancy. Broker dies → data lost
2    One backup. Rarely used in prod
3    Standard for production

You need at least as many brokers as your replication factor.

ISR — In-Sync Replicas

Replicas that are caught up to the leader. If a replica falls behind, it's removed from ISR.

min.insync.replicas=2 with RF=3 means: at least 2 replicas must acknowledge a write before it's confirmed. With acks=all, a write is rejected if fewer than 2 replicas are in sync — trading availability for durability.


8. Producers

Producers write events to Kafka. The key configuration areas — acknowledgements, idempotence, and batching — are covered below.

Acknowledgement Modes (acks)

acks   Meaning                Risk
----   --------------------   ------------------------------------------
0      Fire and forget        Message can be lost
1      Leader ACKs only       Lost if leader fails before replication
all    All ISR replicas ACK   Safest — use for financial/critical data

Idempotent Producer

enable.idempotence=true

Guarantees that retries never write duplicates to a partition — effectively exactly-once delivery per producer session.

Batching

Producers batch messages before sending for throughput efficiency:

Config             Purpose
----------------   -------------------------------------------
linger.ms          Wait up to N ms to fill a batch
batch.size         Max bytes per batch
compression.type   snappy, lz4, gzip — reduces network usage
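
A sketch tying these settings together — a plain Java producer configured for safe delivery and efficient batching (servers, topic, and values are placeholders):

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;

public class SafeProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.ACKS_CONFIG, "all");              // all ISR replicas must ACK
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true); // retries never duplicate
        props.put(ProducerConfig.LINGER_MS_CONFIG, 10);            // wait up to 10 ms per batch
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 32 * 1024);    // 32 KB max per batch
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("payments", "txn-001", "{\"amount\":100}"));
        }
    }
}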

9. How Spring Boot Connects to Kafka

This is how the connection actually works — not just "add bootstrap servers to config".

Step 1: Bootstrap

spring:
  kafka:
    bootstrap-servers: broker1:9092,broker2:9092,broker3:9092

Bootstrap servers are just an initial contact point. Spring Boot connects to any one of them to fetch cluster metadata (all brokers, all topics, all partition leaders).

Step 2: Metadata Fetch

From that initial connection, the client gets a full map of the cluster:

Topic: payments
  Partition 0 → Leader: Broker 2
  Partition 1 → Leader: Broker 1
  Partition 2 → Leader: Broker 3

Step 3: Direct Routing

After metadata fetch, the producer routes messages directly to the partition leader — not through the bootstrap server.

Spring Boot Producer
        │
        ├──→ Broker 2 (Leader for P0) ← payment with key "txn-001"
        ├──→ Broker 1 (Leader for P1) ← payment with key "txn-002"
        └──→ Broker 3 (Leader for P2) ← payment with key "txn-003"

Key insight: You don't need all brokers in bootstrap-servers — just enough that at least one is reachable at startup.
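
In application code this routing is invisible. A hypothetical Spring Boot publisher (class and topic names are illustrative) — KafkaTemplate resolves the partition leader from cached metadata and sends directly to it:

import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
public class PaymentPublisher {

    private final KafkaTemplate<String, String> kafkaTemplate;

    public PaymentPublisher(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    public void publish(String txnId, String payload) {
        // Same key → same partition → per-transaction ordering preserved
        kafkaTemplate.send("payments", txnId, payload);
    }
}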


10. Consumers & Consumer Groups

Consumer Group

A Consumer Group is a set of consumers that together consume a topic. Each partition is assigned to exactly one consumer in the group, so the partition count sets the upper bound on parallelism.

What Happens If a Consumer Joins/Leaves?

Kafka triggers a rebalance — partitions are redistributed across the group.
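
What a rebalance looks like from the client side — a sketch using the plain Java consumer's callbacks (Spring handles this internally, but the hooks make the mechanics visible; servers and topic are placeholders):

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.time.Duration;
import java.util.Collection;
import java.util.List;
import java.util.Properties;

public class RebalanceDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "payment-service");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(List.of("payments"), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                // Called before partitions move away — commit in-flight work here
                System.out.println("Revoked: " + partitions);
            }

            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                // Called after the rebalance with this consumer's new assignment
                System.out.println("Assigned: " + partitions);
            }
        });

        // Rebalance callbacks fire inside poll(); a real app would loop here
        consumer.poll(Duration.ofSeconds(1));
        consumer.close();
    }
}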

Offset Management

Offset = a consumer's position within a partition; the committed offset marks the next record to read.

Mode            Behaviour
-------------   -------------------------------------------------------------------
Auto commit     Kafka commits offset periodically (risk of re-processing on crash)
Manual commit   Consumer commits after processing (safer for financial systems)

Stored in internal topic: __consumer_offsets

enable.auto.commit=false

For payment systems, always use manual commit:

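// Requires spring.kafka.listener.ack-mode: manual (or manual_immediate)
// so the container injects the Acknowledgment parameter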
@KafkaListener(topics = "payments", groupId = "payment-service")
public void consume(ConsumerRecord<String, String> record,
                    Acknowledgment ack) {
    process(record);
    ack.acknowledge(); // commit only after successful processing
}

11. How to Decide Partition Count

More partitions = more parallelism, but also more overhead.

Factor                     Guidance
------------------------   ---------------------------------------------------------------
Max consumer parallelism   Partitions = max consumers you'll ever want in a group
Throughput target          Measure per-partition throughput, divide target by that
Broker count               Partitions should be a multiple of broker count for even spread
Ordering                   If strict per-entity ordering is needed, use key-based partitioning

Practical Formula

Desired partitions ≈ max(
    target throughput / throughput per partition,
    max consumer instances you'll scale to
)

Example: 600 MB/s target, 100 MB/s per partition, max 12 consumers → 12 partitions
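
The same arithmetic as a toy helper — the throughput figures are assumptions you'd measure yourself:

public class PartitionSizing {

    // Desired partitions ≈ max(throughput need, consumer parallelism need)
    static int desiredPartitions(double targetMBps, double perPartitionMBps, int maxConsumers) {
        int byThroughput = (int) Math.ceil(targetMBps / perPartitionMBps);
        return Math.max(byThroughput, maxConsumers);
    }

    public static void main(String[] args) {
        // 600 MB/s target, 100 MB/s per partition, up to 12 consumers → 12
        System.out.println(desiredPartitions(600, 100, 12));
    }
}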

You can increase partitions later but cannot decrease them.
Increasing partitions can break key-based ordering for existing keys.
Over-partition slightly rather than under-partition.

Cluster Size         Recommendation
------------------   ---------------------------------
Small / Dev          3–6 partitions per topic
Medium               12–24 for high-throughput topics
Large / Enterprise   50–100+ based on SLA

12. Running Kafka Locally (KRaft Mode)

Start the Broker

# Generate Cluster ID
KAFKA_CLUSTER_ID="$(./bin/kafka-storage.sh random-uuid)"

# Format storage
./bin/kafka-storage.sh format --standalone -t $KAFKA_CLUSTER_ID -c config/server.properties

# Start the broker
./bin/kafka-server-start.sh config/server.properties

Topic Operations

# Create
./bin/kafka-topics.sh --create --topic payments \
  --partitions 3 --replication-factor 1 \
  --bootstrap-server localhost:9092

# Describe
./bin/kafka-topics.sh --describe --topic payments \
  --bootstrap-server localhost:9092

# List
./bin/kafka-topics.sh --list --bootstrap-server localhost:9092

# Delete
./bin/kafka-topics.sh --delete --topic payments \
  --bootstrap-server localhost:9092

Test with CLI Producer/Consumer

# Produce
./bin/kafka-console-producer.sh --topic payments \
  --bootstrap-server localhost:9092

# Consume from beginning
./bin/kafka-console-consumer.sh --topic payments \
  --from-beginning --bootstrap-server localhost:9092

# Consume in a group
./bin/kafka-console-consumer.sh --topic payments \
  --group payment-service --bootstrap-server localhost:9092

KRaft server.properties

# Combined mode — good for local dev
process.roles=broker,controller

# Separate directories for data and metadata
log.dirs=/path/to/kafka-broker-logs
metadata.log.dir=/path/to/kafka-metadata-logs

In production, put these on separate disks — metadata writes need low latency and shouldn't compete with data writes.


13. Real-World Architecture Example

This is based on the payment processing platform I work on:

External        Config      Kafka          Middleware     Targets
Sources         Manager    Cluster         Service        (3rd Party)
─────────      ─────────  ─────────       ─────────      ─────────
                          ┌─────────┐
 Merchant  →  CM Service →│merchant │→ MW consumes  →   Redis Cache
 Data                     │ topic   │                →   Payment Processor A
                          └─────────┘               →   Payment Processor B

                          ┌─────────┐
 Transaction →  App      →│txn      │→ TLM consumes → DB (save txn)
 Events                   │ topic   │
                          └─────────┘

Why Kafka here?

One event stream fans out to multiple independent consumers (cache, payment processors, DB) without the producers knowing about any of them. The durable log buffers bursts so slow downstream targets never back-pressure the source, and retention lets a consumer replay events after a failure.


14. Interview Cheat Sheet

One-Line Answers

Question                        Answer
-----------------------------   ---------------------------------------------------------------
What is Kafka?                  Distributed, fault-tolerant event streaming platform for publishing, storing, and processing real-time data streams
What is a topic?                Logical category for organizing events, split into partitions for scalability
What is a partition?            Physical, ordered, append-only log — the actual unit of storage and parallelism
What is a broker?               Single Kafka server that stores partitions and serves producer/consumer requests
What is a partition leader?     The one broker responsible for all reads and writes for a given partition
What is replication factor?     Number of copies of each partition — ensures fault tolerance
How does Spring Boot connect?   Connects to bootstrap servers for metadata, then routes directly to partition leaders
What is a consumer group?       Set of consumers sharing a topic, with each partition assigned to exactly one consumer

Key Numbers

Setting                     Value
-------------------------   ------------------------------------------
Replication Factor (prod)   3
Min brokers for RF=3        3
Default retention           7 days
Max useful consumers        = number of partitions (per consumer group)

Phrases Worth Remembering

"A topic is logical; a partition is physical."
"Ordering is guaranteed within a partition, not across partitions."
"Bootstrap servers are just the initial contact point — clients route directly to partition leaders."
"You can add partitions later, but you can never remove them."


Tags: kafka, distributed-systems, backend, spring-boot
