Senior Software Engineer – Inference Infrastructure
Nebul – Leiden, South Holland, Netherlands (Hybrid)
AI Inference Systems | GPU Performance | Distributed Infrastructure
At Nebul, we are building Europe’s sovereign AI infrastructure. Our Private AI platform delivers secure, high-performance LLM inference across dedicated GPU clusters for enterprises and governments that require full control over their AI stack.
We are hiring a Senior Software Engineer – Inference Infrastructure to design and scale large GPU serving systems. This is a systems-level engineering role that works close to runtime internals, CUDA performance, and distributed inference – not model research.
This role is about real AI performance: efficiency, reliability, and scale in production environments.
What You Will Do
- Design and scale distributed inference systems from dozens to thousands of GPUs
- Build and optimize serving pipelines using vLLM, TensorRT, Triton, or SGLang
- Implement performance optimizations: batching, caching, quantization, parallelism
- Develop CUDA-level improvements and GPU memory optimization strategies
- Operate large-scale GPU workloads on Kubernetes with custom scheduling
- Improve GPU utilization, latency, throughput, and fault tolerance
- Own observability, monitoring, and performance in production
What You Bring
- 3–7+ years in systems engineering or ML infrastructure
- Strong experience with GPU systems and CUDA programming
- Hands-on with vLLM, TensorRT, Triton, or similar inference runtimes
- Strong programming skills in Python and at least one of Go, Rust, or C++
- Experience with Kubernetes and distributed serving in production
- Solid understanding of performance engineering and debugging
Bonus Experience
- GPU kernel tuning (PTX, CUTLASS, Nsight)
- Mixed precision or quantization (FP8, INT4)
- Ray Serve, Tensor Parallelism, NVIDIA/TorchDynamo
- Background from GPU-scale environments (NVIDIA, Mistral, Hugging Face, Cohere, Together AI, Aleph Alpha, Nebius)
Why Join Nebul
- Build Europe’s sovereign AI infrastructure
- Work on deep performance challenges across large GPU clusters
- Engineering-led culture with fast execution and high ownership
- Competitive salary + equity
- Hybrid setup at our Leiden headquarters
Interested in building AI at production scale?
Apply now via Frank Poll and join Nebul — powering secure, independent AI infrastructure.