
· 4 min read
Byju Luckose

In the age of microservices and polyglot development, it's common for teams to use different languages for different tasks: Java for orchestration, Python for AI, and C# for enterprise system integration. To tie all this together, Apache Kafka shines as a powerful messaging backbone. In this blog post, we'll explore how to build a multi-language worker architecture using Spring Boot and Kafka, with workers written in Java, Python, and C#.

Why Use Kafka with Multiple Language Workers?

Kafka is a distributed event streaming platform designed for high-throughput, decoupled communication. Using Kafka with multi-language workers allows you to:

  • Scale task execution independently per language.

  • Use the best language for each task.

  • Decouple orchestration logic from implementation details.

  • Add or remove workers without restarting the system.

Architecture Overview


+-----------------------------+        Kafka Topics         +-------------------------+
| Spring Boot App             |                             | Java Worker             |
| (Orchestrator)              |      [task-submission]      | - Parses DOCX           |
|                             | --------------------------> | - Converts to PDF       |
| - Accepts job via REST      |                             +-------------------------+
| - Sends JSON tasks to Kafka |        [task-result]        +-------------------------+
| - Collects results          | <-------------------------- | Python Worker           |
+-----------------------------+                             | - Runs ML Inference     |
                                                            | - Extracts Text         |
                                                            +-------------------------+
                                                            +-------------------------+
                                                            | C# (.NET) Worker        |
                                                            | - Legacy System API     |
                                                            | - Data Enrichment       |
                                                            +-------------------------+



Topics

  • task-submission: Receives tasks from orchestrator

  • task-result: Publishes results from workers
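
Both topics have to exist before the orchestrator and workers start. A minimal sketch for declaring them from the Spring Boot side (the partition and replica counts are assumptions, not values from this setup):

import org.apache.kafka.clients.admin.NewTopic;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.TopicBuilder;

@Configuration
public class KafkaTopicConfig {

    // Spring Boot's KafkaAdmin creates these topics on startup if they are missing.
    @Bean
    public NewTopic taskSubmissionTopic() {
        return TopicBuilder.name("task-submission").partitions(3).replicas(1).build();
    }

    @Bean
    public NewTopic taskResultTopic() {
        return TopicBuilder.name("task-result").partitions(3).replicas(1).build();
    }
}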

Common Message Format

All communication uses a shared JSON message schema:


{
  "jobId": "123e4567-e89b-12d3-a456-426614174000",
  "taskType": "DOC_CONVERT",
  "payload": {
    "source": "http://example.com/sample.docx",
    "outputFormat": "pdf"
  }
}
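
The workers answer on task-result with an equally small JSON document; in the worker examples below it carries the job id, a status, and the name of the worker that handled it:

{
  "jobId": "123e4567-e89b-12d3-a456-426614174000",
  "status": "done",
  "worker": "java"
}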


Spring Boot Orchestrator

Dependencies (Maven)

<dependency>
    <groupId>org.springframework.kafka</groupId>
    <artifactId>spring-kafka</artifactId>
</dependency>
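
Spring Boot auto-configures its Kafka clients from application properties. A minimal application.properties sketch, assuming a single local broker:

# Kafka connection for the orchestrator (adjust the broker address to your environment)
spring.kafka.bootstrap-servers=localhost:9092
spring.kafka.consumer.group-id=orchestrator
spring.kafka.consumer.auto-offset-reset=earliest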

REST + Kafka Integration

@RestController
@RequestMapping("/jobs")
public class JobController {

    private final KafkaTemplate<String, String> kafkaTemplate;
    private final ObjectMapper objectMapper = new ObjectMapper();

    public JobController(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    @PostMapping
    public ResponseEntity<String> submitJob(@RequestBody Map<String, Object> job) throws JsonProcessingException {
        String jobId = UUID.randomUUID().toString();
        job.put("jobId", jobId);
        String json = objectMapper.writeValueAsString(job);
        kafkaTemplate.send("task-submission", jobId, json);
        return ResponseEntity.ok("Job submitted: " + jobId);
    }

    @KafkaListener(topics = "task-result", groupId = "orchestrator")
    public void receiveResult(String message) {
        System.out.println("Received result: " + message);
    }
}


Java Worker Example


// Inside a @Component; kafkaTemplate is an injected KafkaTemplate<String, String>.
@KafkaListener(topics = "task-submission", groupId = "java-worker")
public void consume(String message) throws JsonProcessingException {
    ObjectMapper mapper = new ObjectMapper();
    Map<String, Object> task = mapper.readValue(message, new TypeReference<>() {});
    // ... Process ...
    Map<String, Object> result = Map.of(
            "jobId", task.get("jobId"),
            "status", "done",
            "worker", "java"
    );
    kafkaTemplate.send("task-result", mapper.writeValueAsString(result));
}

Python Worker Example


from kafka import KafkaConsumer, KafkaProducer
import json

consumer = KafkaConsumer('task-submission', bootstrap_servers='localhost:9092', group_id='py-worker')
producer = KafkaProducer(bootstrap_servers='localhost:9092', value_serializer=lambda v: json.dumps(v).encode())

for msg in consumer:
    task = json.loads(msg.value.decode())
    print("Python Worker got task:", task)

    result = {
        "jobId": task["jobId"],
        "status": "completed",
        "worker": "python"
    }
    producer.send("task-result", result)

C# Worker Example (.NET Core)


using Confluent.Kafka;
using System.Text.Json;

var config = new ConsumerConfig { BootstrapServers = "localhost:9092", GroupId = "csharp-worker" };
var consumer = new ConsumerBuilder<Ignore, string>(config).Build();
var producer = new ProducerBuilder<Null, string>(new ProducerConfig { BootstrapServers = "localhost:9092" }).Build();

consumer.Subscribe("task-submission");

while (true)
{
    var consumeResult = consumer.Consume();
    var task = JsonSerializer.Deserialize<Dictionary<string, object>>(consumeResult.Message.Value);

    var result = new {
        jobId = task["jobId"],
        status = "done",
        worker = "csharp"
    };

    producer.Produce("task-result", new Message<Null, string> {
        Value = JsonSerializer.Serialize(result)
    });
}
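
One detail worth noting: because each worker uses its own consumer group, every worker receives every message on task-submission. If a task should only be handled by one kind of worker, a simple approach is to filter on the taskType field from the shared message schema. A sketch for the Java worker (DOC_CONVERT is the value from the example message above):

// At the top of the Java worker's consume() method: skip tasks meant for other workers.
if (!"DOC_CONVERT".equals(task.get("taskType"))) {
    return; // not a document-conversion task; another worker will pick it up
}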

Monitoring & Logging

  • Use Prometheus + Grafana to monitor worker throughput and failures.

  • Add structured logs with jobId for end-to-end traceability.
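
For the traceability point, one way to do this with SLF4J is to put the jobId into the logging MDC while a task is being processed (a sketch; the logger and field names are illustrative):

// Inside a worker's consume() method, around the processing step.
MDC.put("jobId", String.valueOf(task.get("jobId")));
try {
    log.info("Processing task of type {}", task.get("taskType"));
    // ... process ...
} finally {
    MDC.clear(); // avoid leaking the jobId into unrelated log lines
}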

Local Testing Tips

  • Use Docker to spin up Kafka quickly (e.g., Bitnami Kafka).

  • Use test producers/consumers (kafka-console-producer, kafka-console-consumer) to verify topics.

  • Use Postman or cURL to submit jobs via Spring Boot.
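
For the last point, a job submission against the orchestrator's REST endpoint might look like this (host and port are assumptions for a local run):

curl -X POST http://localhost:8080/jobs \
  -H "Content-Type: application/json" \
  -d '{"taskType": "DOC_CONVERT", "payload": {"source": "http://example.com/sample.docx", "outputFormat": "pdf"}}'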

Benefits of This Architecture

  • Kafka decoupling: Workers can scale independently

  • Multi-language support: Best language per use case

  • Spring Boot Orchestrator: Central control and REST API

  • Standard JSON format: Easy integration and testing

Conclusion

This architecture empowers teams to build distributed, language-agnostic workflows powered by Kafka. By combining the orchestration strength of Spring Boot with the flexibility of multi-language workers, you can build scalable, fault-tolerant systems that grow with your needs.

· 3 min read
Byju Luckose

In this blog post, we'll build a real-world application that combines Spring StateMachine, Apache Kafka, and CSV-based document ingestion to manage complex document lifecycles in a scalable and reactive way.

Use Case Overview

You have a CSV file that contains many documents. Each row defines a document and an event to apply (e.g., START, COMPLETE). The system should:

  1. Read the CSV file

  2. Send a Kafka message for each row

  3. Consume the Kafka message

  4. Trigger a Spring StateMachine transition for the related document

  5. Persist the updated document state

Sample CSV Format


documentId,title,state,event
doc-001,Contract A,NEW,START
doc-002,Contract B,NEW,START
doc-003,Report C,PROCESSING,COMPLETE

Technologies Used

  • Java 17

  • Spring Boot 3.x

  • Spring StateMachine

  • Spring Kafka

  • Apache Commons CSV

  • H2 Database
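
The Maven side of this list, sketched with the usual coordinates (the spring-boot-starter-web and spring-boot-starter-data-jpa starters are assumed as well; the versions shown are assumptions to align with your Spring Boot 3.x setup):

<dependency>
    <groupId>org.springframework.kafka</groupId>
    <artifactId>spring-kafka</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.statemachine</groupId>
    <artifactId>spring-statemachine-starter</artifactId>
    <version>4.0.0</version> <!-- assumption: use the release matching your Boot version -->
</dependency>
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-csv</artifactId>
    <version>1.10.0</version> <!-- assumption -->
</dependency>
<dependency>
    <groupId>com.h2database</groupId>
    <artifactId>h2</artifactId>
    <scope>runtime</scope>
</dependency>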

Enum Definitions


public enum DocumentState {
    NEW, PROCESSING, COMPLETED, ERROR
}

public enum DocumentEvent {
    START, COMPLETE, FAIL
}

StateMachine Configuration

@Configuration
@EnableStateMachineFactory
public class DocumentStateMachineConfig extends StateMachineConfigurerAdapter<DocumentState, DocumentEvent> {

    @Override
    public void configure(StateMachineStateConfigurer<DocumentState, DocumentEvent> states) throws Exception {
        // The machine needs its states registered, with NEW as the initial state.
        states
            .withStates()
            .initial(DocumentState.NEW)
            .states(EnumSet.allOf(DocumentState.class));
    }

    @Override
    public void configure(StateMachineTransitionConfigurer<DocumentState, DocumentEvent> transitions) throws Exception {
        transitions
            .withExternal().source(DocumentState.NEW).target(DocumentState.PROCESSING).event(DocumentEvent.START)
            .and()
            .withExternal().source(DocumentState.PROCESSING).target(DocumentState.COMPLETED).event(DocumentEvent.COMPLETE)
            .and()
            .withExternal().source(DocumentState.NEW).target(DocumentState.ERROR).event(DocumentEvent.FAIL);
    }
}

Document Entity

@Entity
public class Document {

    @Id
    private String id;
    private String title;

    @Enumerated(EnumType.STRING)
    private DocumentState state;

    // Getters and Setters
}

Kafka Producer and CSV Processing


@Component
public class CsvProcessor {

    @Autowired
    private KafkaTemplate<String, String> kafkaTemplate;

    public void processCSV(InputStream inputStream) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream));
             CSVParser parser = CSVFormat.DEFAULT.withFirstRecordAsHeader().parse(reader)) {

            for (CSVRecord record : parser) {
                String documentId = record.get("documentId");
                String event = record.get("event");
                kafkaTemplate.send("document-events", documentId, event);
            }
        }
    }
}

REST Upload Endpoint

@RestController
@RequestMapping("/api/documents")
public class DocumentUploadController {

    @Autowired
    private CsvProcessor csvProcessor;

    @PostMapping("/upload")
    public ResponseEntity<String> upload(@RequestParam("file") MultipartFile file) throws IOException {
        csvProcessor.processCSV(file.getInputStream());
        return ResponseEntity.ok("CSV processed successfully");
    }
}
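
To try the endpoint locally, an upload might look like this (host, port, and file name are assumptions):

curl -X POST http://localhost:8080/api/documents/upload \
  -F "file=@documents.csv"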

Kafka Listener and State Transition


@Component
public class DocumentEventListener {

    @Autowired
    private StateMachineFactory<DocumentState, DocumentEvent> stateMachineFactory;

    @Autowired
    private DocumentRepository documentRepository;

    @KafkaListener(topics = "document-events")
    public void onMessage(ConsumerRecord<String, String> record) {
        String docId = record.key();
        DocumentEvent event = DocumentEvent.valueOf(record.value());

        StateMachine<DocumentState, DocumentEvent> sm = stateMachineFactory.getStateMachine(docId);
        sm.start();
        sm.sendEvent(event);

        // Assumes the document was already persisted; otherwise orElseThrow() fails here.
        Document doc = documentRepository.findById(docId).orElseThrow();
        doc.setState(sm.getState().getId());
        documentRepository.save(doc);
    }
}

Document Repository


public interface DocumentRepository extends JpaRepository<Document, String> {}

Final Thoughts

This architecture provides:

  • Decoupled, event-driven state management

  • Easily testable document lifecycles

  • A scalable pattern for batch processing from CSVs

You can extend this with:

  • Retry transitions

  • Error handling

  • Audit logging

  • UI feedback via WebSockets or REST polling

Let me know if you'd like the full GitHub repo, Docker setup, or integration with a frontend uploader!