Nebul

Senior Software Engineer – Inference Infrastructure – Nebul – Leiden

Nebul – Leiden, South Holland, Netherlands (Hybrid)

AI Inference Systems | GPU Performance | Distributed Infrastructure

At Nebul, we are building Europe’s sovereign AI infrastructure. Our Private AI platform delivers secure, high-performance LLM inference across dedicated GPU clusters for enterprises and governments that require full control over their AI stack.

We are hiring a Senior Software Engineer – Inference Infrastructure to design and grow large-scale GPU serving systems. This is a systems-level engineering role focused on runtime internals, CUDA performance, and distributed inference – not model research.

This role is about real AI performance: efficiency, reliability, and scale in production environments.

What You Will Do

  • Design and scale distributed inference systems from dozens to thousands of GPUs
  • Build and optimize serving pipelines using vLLM, TensorRT, Triton, or SGLang
  • Implement performance optimizations: batching, caching, quantization, parallelism
  • Develop CUDA-level improvements and GPU memory optimization strategies
  • Operate large-scale GPU workloads on Kubernetes with custom scheduling
  • Improve GPU utilization, latency, throughput, and fault tolerance
  • Own observability, monitoring, and performance in production

What You Bring

  • 3–7+ years in systems engineering or ML infrastructure
  • Strong experience with GPU systems and CUDA programming
  • Hands-on experience with vLLM, TensorRT, Triton, or similar inference runtimes
  • Strong programming skills in Python and Go, Rust, or C++
  • Experience with Kubernetes and distributed serving in production
  • Solid understanding of performance engineering and debugging

Bonus Experience

  • GPU kernel tuning (PTX, CUTLASS, Nsight)
  • Mixed precision or quantization (FP8, INT4)
  • Ray Serve, Tensor Parallelism, NVIDIA/TorchDynamo
  • Background from GPU-scale environments (NVIDIA, Mistral, Hugging Face, Cohere, Together AI, Aleph Alpha, Nebius)

Why Join Nebul

  • Build Europe’s sovereign AI infrastructure
  • Work on deep performance challenges across large GPU clusters
  • Engineering-led culture with fast execution and high ownership
  • Competitive salary + equity
  • Hybrid setup at the Leiden headquarters

Interested in building AI at production scale?

Apply now via Frank Poll and join Nebul – powering secure, independent AI infrastructure.
