Senior Software Engineer – Inference Infrastructure
Nebul – Leiden, South Holland, Netherlands (Hybrid)
AI Inference Systems | GPU Performance | Distributed Infrastructure
At Nebul, we are building Europe’s sovereign AI infrastructure. Our Private AI platform delivers secure, high-performance LLM inference across dedicated GPU clusters for enterprises and governments that require full control over their AI stack.
We are hiring a Senior Software Engineer – Inference Infrastructure to design and scale large GPU serving systems. This is a systems-level engineering role that works close to runtime internals, CUDA performance, and distributed inference – not model research.
This role is about real AI performance: efficiency, reliability, and scale in production environments.
What You Will Do
- Design and scale distributed inference systems from dozens to thousands of GPUs
- Build and optimize serving pipelines using vLLM, TensorRT, Triton, or SGLang
- Implement performance optimizations: batching, caching, quantization, parallelism
- Develop CUDA-level improvements and GPU memory optimization strategies
- Operate large-scale GPU workloads on Kubernetes with custom scheduling
- Improve GPU utilization, latency, throughput, and fault tolerance
- Own observability, monitoring, and performance in production
What You Bring
- 3–7+ years in systems engineering or ML infrastructure
- Strong experience with GPU systems and CUDA programming
- Hands-on with vLLM, TensorRT, Triton, or similar inference runtimes
- Strong programming skills in Python and at least one of Go, Rust, or C++
- Experience with Kubernetes and distributed serving in production
- Solid understanding of performance engineering and debugging
Bonus Experience
- GPU kernel tuning (PTX, CUTLASS, Nsight)
- Mixed precision or quantization (FP8, INT4)
- Ray Serve, Tensor Parallelism, NVIDIA/TorchDynamo
- Background from GPU-scale environments (NVIDIA, Mistral, Hugging Face, Cohere, Together AI, Aleph Alpha, Nebius)
Why Join Nebul
- Build Europe’s sovereign AI infrastructure
- Work on deep performance challenges across large GPU clusters
- Engineering-led culture with fast execution and high ownership
- Competitive salary + equity
- Hybrid setup at our Leiden headquarters
Interested in building AI at production scale?
Apply now via Frank Poll and join Nebul — powering secure, independent AI infrastructure.