MeteorOps is looking for a freelance Kafka troubleshooting & modernization specialist to step into an older on-prem Kafka environment that currently has no dedicated Kafka owner and limited observability. The cluster supports real-time market quote / HFT tick data at very high throughput (potentially millions of messages/sec) and feeds downstream systems including downsampling services and a SQL Server writer, eventually supporting trading execution workflows.
The Kafka setup is 6–7 years old, deployed on on-prem VMware VMs with 10 Kafka brokers and 5 ZooKeeper nodes, running Kafka 3.0.0 (the Scala 2.13 build, kafka_2.13-3.0.0). Each broker has multiple data disks (currently stated as 7 disks of ~1 TB each; prior notes mention higher disk counts, so part of the engagement will be to verify the actual layout). Historically, disk usage has sat around 10%, but recently one or more brokers spiked toward 100%, coinciding with application-side Kafka errors and broker/topic instability (e.g., missing leaders, invalid partitions, impaired topic failover).
You’ll diagnose the incident and its underlying risks, produce a clear findings-and-recommendations report, and help the engineering team implement pragmatic improvements: monitoring, tooling, operational runbooks, resilience and failover improvements, and an assessment of upgrade options (including a path away from ZooKeeper).
Must-have
Nice-to-have










Submit your CV, LinkedIn, and GitHub via the form. We’ll review your profile.
If your skills align, we’ll reach out for a quick conversation to understand your experience and project preferences.
Once selected, we’ll match you with a client project that fits your expertise. A brief onboarding ensures you’re set up with our tools and ready to start.