
· 4 min read
Byju Luckose

In the age of microservices and polyglot development, it's common for teams to use different languages for different tasks: Java for orchestration, Python for AI, and C# for enterprise system integration. To tie all this together, Apache Kafka shines as a powerful messaging backbone. In this blog post, we'll explore how to build a multi-language worker architecture using Spring Boot and Kafka, with workers written in Java, Python, and C#.

Why Use Kafka with Multiple Language Workers?

Kafka is a distributed event streaming platform designed for high-throughput, decoupled communication. Using Kafka with multi-language workers allows you to:

  • Scale task execution independently per language.

  • Use the best language for each task.

  • Decouple orchestration logic from implementation details.

  • Add or remove workers without restarting the system.

Architecture Overview


+-----------------------------+        Kafka Topics         +-------------------------+
| Spring Boot App             |                             | Java Worker             |
| (Orchestrator)              |      [task-submission]      | - Parses DOCX           |
|                             | --------------------------> | - Converts to PDF       |
| - Accepts job via REST      |                             +-------------------------+
| - Sends JSON tasks to Kafka |        [task-result]        +-------------------------+
| - Collects results          | <-------------------------- | Python Worker           |
+-----------------------------+                             | - Runs ML Inference     |
                                                            | - Extracts Text         |
                                                            +-------------------------+
                                                            +-------------------------+
                                                            | C# (.NET) Worker        |
                                                            | - Legacy System API     |
                                                            | - Data Enrichment       |
                                                            +-------------------------+



Topics

  • task-submission: Receives tasks from orchestrator

  • task-result: Publishes results from workers
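
Both topics have to exist before the orchestrator and workers start. A minimal sketch for declaring them from the Spring Boot side (the partition and replica counts are assumptions, not values from this setup):

import org.apache.kafka.clients.admin.NewTopic;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.TopicBuilder;

@Configuration
public class KafkaTopicConfig {

    // Spring Boot's KafkaAdmin creates these topics on startup if they are missing.
    @Bean
    public NewTopic taskSubmissionTopic() {
        return TopicBuilder.name("task-submission").partitions(3).replicas(1).build();
    }

    @Bean
    public NewTopic taskResultTopic() {
        return TopicBuilder.name("task-result").partitions(3).replicas(1).build();
    }
}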

Common Message Format

All communication uses a shared JSON message schema:


{
  "jobId": "123e4567-e89b-12d3-a456-426614174000",
  "taskType": "DOC_CONVERT",
  "payload": {
    "source": "http://example.com/sample.docx",
    "outputFormat": "pdf"
  }
}
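
The workers answer on task-result with an equally small JSON document; in the worker examples below it carries the job id, a status, and the name of the worker that handled it:

{
  "jobId": "123e4567-e89b-12d3-a456-426614174000",
  "status": "done",
  "worker": "java"
}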


Spring Boot Orchestrator

Dependencies (Maven)

<dependency>
    <groupId>org.springframework.kafka</groupId>
    <artifactId>spring-kafka</artifactId>
</dependency>
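
Spring Boot auto-configures its Kafka clients from application properties. A minimal application.properties sketch, assuming a single local broker:

# Kafka connection for the orchestrator (adjust the broker address to your environment)
spring.kafka.bootstrap-servers=localhost:9092
spring.kafka.consumer.group-id=orchestrator
spring.kafka.consumer.auto-offset-reset=earliest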

REST + Kafka Integration

@RestController
@RequestMapping("/jobs")
public class JobController {

    private final KafkaTemplate<String, String> kafkaTemplate;
    private final ObjectMapper objectMapper = new ObjectMapper();

    public JobController(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    @PostMapping
    public ResponseEntity<String> submitJob(@RequestBody Map<String, Object> job) throws JsonProcessingException {
        String jobId = UUID.randomUUID().toString();
        job.put("jobId", jobId);
        String json = objectMapper.writeValueAsString(job);
        kafkaTemplate.send("task-submission", jobId, json);
        return ResponseEntity.ok("Job submitted: " + jobId);
    }

    @KafkaListener(topics = "task-result", groupId = "orchestrator")
    public void receiveResult(String message) {
        System.out.println("Received result: " + message);
    }
}


Java Worker Example


// Inside a @Component; kafkaTemplate is an injected KafkaTemplate<String, String>.
@KafkaListener(topics = "task-submission", groupId = "java-worker")
public void consume(String message) throws JsonProcessingException {
    ObjectMapper mapper = new ObjectMapper();
    Map<String, Object> task = mapper.readValue(message, new TypeReference<>() {});
    // ... Process ...
    Map<String, Object> result = Map.of(
            "jobId", task.get("jobId"),
            "status", "done",
            "worker", "java"
    );
    kafkaTemplate.send("task-result", mapper.writeValueAsString(result));
}

Python Worker Example


from kafka import KafkaConsumer, KafkaProducer
import json

consumer = KafkaConsumer('task-submission', bootstrap_servers='localhost:9092', group_id='py-worker')
producer = KafkaProducer(bootstrap_servers='localhost:9092', value_serializer=lambda v: json.dumps(v).encode())

for msg in consumer:
    task = json.loads(msg.value.decode())
    print("Python Worker got task:", task)

    result = {
        "jobId": task["jobId"],
        "status": "completed",
        "worker": "python"
    }
    producer.send("task-result", result)

C# Worker Example (.NET Core)


using Confluent.Kafka;
using System.Text.Json;

var config = new ConsumerConfig { BootstrapServers = "localhost:9092", GroupId = "csharp-worker" };
var consumer = new ConsumerBuilder<Ignore, string>(config).Build();
var producer = new ProducerBuilder<Null, string>(new ProducerConfig { BootstrapServers = "localhost:9092" }).Build();

consumer.Subscribe("task-submission");

while (true)
{
    var consumeResult = consumer.Consume();
    var task = JsonSerializer.Deserialize<Dictionary<string, object>>(consumeResult.Message.Value);

    var result = new {
        jobId = task["jobId"],
        status = "done",
        worker = "csharp"
    };

    producer.Produce("task-result", new Message<Null, string> {
        Value = JsonSerializer.Serialize(result)
    });
}
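
One detail worth noting: because each worker uses its own consumer group, every worker receives every message on task-submission. If a task should only be handled by one kind of worker, a simple approach is to filter on the taskType field from the shared message schema. A sketch for the Java worker (DOC_CONVERT is the value from the example message above):

// At the top of the Java worker's consume() method: skip tasks meant for other workers.
if (!"DOC_CONVERT".equals(task.get("taskType"))) {
    return; // not a document-conversion task; another worker will pick it up
}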

Monitoring & Logging

  • Use Prometheus + Grafana to monitor worker throughput and failures.

  • Add structured logs with jobId for end-to-end traceability.
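
For the traceability point, one way to do this with SLF4J is to put the jobId into the logging MDC while a task is being processed (a sketch; the logger and field names are illustrative):

// Inside a worker's consume() method, around the processing step.
MDC.put("jobId", String.valueOf(task.get("jobId")));
try {
    log.info("Processing task of type {}", task.get("taskType"));
    // ... process ...
} finally {
    MDC.clear(); // avoid leaking the jobId into unrelated log lines
}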

Local Testing Tips

  • Use Docker to spin up Kafka quickly (e.g., Bitnami Kafka).

  • Use test producers/consumers (kafka-console-producer, kafka-console-consumer) to verify topics.

  • Use Postman or cURL to submit jobs via Spring Boot.
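
For the last point, a job submission against the orchestrator's REST endpoint might look like this (host and port are assumptions for a local run):

curl -X POST http://localhost:8080/jobs \
  -H "Content-Type: application/json" \
  -d '{"taskType": "DOC_CONVERT", "payload": {"source": "http://example.com/sample.docx", "outputFormat": "pdf"}}'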

Benefits of This Architecture

  • Kafka decoupling: Workers can scale independently

  • Multi-language support: Best language per use case

  • Spring Boot Orchestrator: Central control and REST API

  • Standard JSON format: Easy integration and testing

Conclusion

This architecture empowers teams to build distributed, language-agnostic workflows powered by Kafka. By combining the orchestration strength of Spring Boot with the flexibility of multi-language workers, you can build scalable, fault-tolerant systems that grow with your needs.

· 3 min read
Byju Luckose

In this blog post, we'll build a real-world application that combines Spring StateMachine, Apache Kafka, and CSV-based document ingestion to manage complex document lifecycles in a scalable and reactive way.

Use Case Overview

You have a CSV file that contains many documents. Each row defines a document and an event to apply (e.g., START, COMPLETE). The system should:

  1. Read the CSV file

  2. Send a Kafka message for each row

  3. Consume the Kafka message

  4. Trigger a Spring StateMachine transition for the related document

  5. Persist the updated document state

Sample CSV Format


documentId,title,state,event
doc-001,Contract A,NEW,START
doc-002,Contract B,NEW,START
doc-003,Report C,PROCESSING,COMPLETE

Technologies Used

  • Java 17

  • Spring Boot 3.x

  • Spring StateMachine

  • Spring Kafka

  • Apache Commons CSV

  • H2 Database
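
The Maven side of this list, sketched with the usual coordinates (the spring-boot-starter-web and spring-boot-starter-data-jpa starters are assumed as well; the versions shown are assumptions to align with your Spring Boot 3.x setup):

<dependency>
    <groupId>org.springframework.kafka</groupId>
    <artifactId>spring-kafka</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.statemachine</groupId>
    <artifactId>spring-statemachine-starter</artifactId>
    <version>4.0.0</version> <!-- assumption: use the release matching your Boot version -->
</dependency>
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-csv</artifactId>
    <version>1.10.0</version> <!-- assumption -->
</dependency>
<dependency>
    <groupId>com.h2database</groupId>
    <artifactId>h2</artifactId>
    <scope>runtime</scope>
</dependency>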

Enum Definitions


public enum DocumentState {
    NEW, PROCESSING, COMPLETED, ERROR
}

public enum DocumentEvent {
    START, COMPLETE, FAIL
}

StateMachine Configuration

@Configuration
@EnableStateMachineFactory
public class DocumentStateMachineConfig extends StateMachineConfigurerAdapter<DocumentState, DocumentEvent> {

    @Override
    public void configure(StateMachineStateConfigurer<DocumentState, DocumentEvent> states) throws Exception {
        // The machine needs its states registered, with NEW as the initial state.
        states
            .withStates()
            .initial(DocumentState.NEW)
            .states(EnumSet.allOf(DocumentState.class));
    }

    @Override
    public void configure(StateMachineTransitionConfigurer<DocumentState, DocumentEvent> transitions) throws Exception {
        transitions
            .withExternal().source(DocumentState.NEW).target(DocumentState.PROCESSING).event(DocumentEvent.START)
            .and()
            .withExternal().source(DocumentState.PROCESSING).target(DocumentState.COMPLETED).event(DocumentEvent.COMPLETE)
            .and()
            .withExternal().source(DocumentState.NEW).target(DocumentState.ERROR).event(DocumentEvent.FAIL);
    }
}

Document Entity

@Entity
public class Document {

    @Id
    private String id;
    private String title;

    @Enumerated(EnumType.STRING)
    private DocumentState state;

    // Getters and Setters
}

Kafka Producer and CSV Processing


@Component
public class CsvProcessor {

    @Autowired
    private KafkaTemplate<String, String> kafkaTemplate;

    public void processCSV(InputStream inputStream) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream));
             CSVParser parser = CSVFormat.DEFAULT.withFirstRecordAsHeader().parse(reader)) {

            for (CSVRecord record : parser) {
                String documentId = record.get("documentId");
                String event = record.get("event");
                kafkaTemplate.send("document-events", documentId, event);
            }
        }
    }
}

REST Upload Endpoint

@RestController
@RequestMapping("/api/documents")
public class DocumentUploadController {

    @Autowired
    private CsvProcessor csvProcessor;

    @PostMapping("/upload")
    public ResponseEntity<String> upload(@RequestParam("file") MultipartFile file) throws IOException {
        csvProcessor.processCSV(file.getInputStream());
        return ResponseEntity.ok("CSV processed successfully");
    }
}
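
To try the endpoint locally, an upload might look like this (host, port, and file name are assumptions):

curl -X POST http://localhost:8080/api/documents/upload \
  -F "file=@documents.csv"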

Kafka Listener and State Transition


@Component
public class DocumentEventListener {

    @Autowired
    private StateMachineFactory<DocumentState, DocumentEvent> stateMachineFactory;

    @Autowired
    private DocumentRepository documentRepository;

    @KafkaListener(topics = "document-events")
    public void onMessage(ConsumerRecord<String, String> record) {
        String docId = record.key();
        DocumentEvent event = DocumentEvent.valueOf(record.value());

        StateMachine<DocumentState, DocumentEvent> sm = stateMachineFactory.getStateMachine(docId);
        sm.start();
        sm.sendEvent(event);

        // Assumes the document was already persisted; otherwise orElseThrow() fails here.
        Document doc = documentRepository.findById(docId).orElseThrow();
        doc.setState(sm.getState().getId());
        documentRepository.save(doc);
    }
}

Document Repository


public interface DocumentRepository extends JpaRepository<Document, String> {}

Final Thoughts

This architecture provides:

  • Decoupled, event-driven state management

  • Easily testable document lifecycles

  • A scalable pattern for batch processing from CSVs

You can extend this with:

  • Retry transitions

  • Error handling

  • Audit logging

  • UI feedback via WebSockets or REST polling

Let me know if you'd like the full GitHub repo, Docker setup, or integration with a frontend uploader!