We’re looking for a GPU Performance Engineer to help teams turn “it runs” into “it scales.” This role is for someone who can move confidently between GPU kernels, distributed training communication, and system-level bottlenecks—then translate findings into practical fixes that improve throughput, latency, and cost.
You’ll focus on performance work across CUDA code paths, NCCL-backed multi-GPU/multi-node communication, and end-to-end profiling. The goal is not theoretical optimization—it’s measurable wins: faster training steps, better GPU utilization, stable scaling efficiency, and clear evidence for what changed and why.
You’ll profile real workloads, form hypotheses, run controlled experiments, and deliver improvements that hold up in production. You’ll work across the stack, from kernel-level tuning and memory behavior to compute/communication overlap and network-aware scaling.
Success looks like sustained, easily verified performance gains: lower step time, higher effective TFLOPS, better scaling efficiency, fewer performance regressions, and a repeatable methodology the team can keep using after your engagement.
You’ll collaborate closely with ML engineers, systems/platform teams, and anyone touching the training stack. You’ll be expected to communicate clearly—sharing traces, explaining tradeoffs, and recommending the next highest-leverage change instead of chasing micro-optimizations.
You will make GPU and multi-node training/inference faster, more stable, and easier to operate by turning profiling data into concrete fixes across kernels, communication, and system configuration.
Start by reproducing current performance baselines, validating profiling methodology, and mapping the top bottlenecks across compute, memory, and communication. You’ll work closely with ML engineers and systems teams to prioritize fixes and define success metrics.

Submit your CV, LinkedIn, and GitHub via the form. We’ll review your profile.
If your skills align, we’ll reach out for a quick conversation to understand your experience and project preferences.
Once selected, we’ll match you with a client project that fits your expertise. A brief onboarding ensures you’re set up with our tools and ready to start.