Lead GPU Engineer

Hiring Remotely in Paris, Île-de-France, FRA
In-Office or Remote
Expert/Leader
Gaming
The Role
The Lead GPU Engineer will develop and optimize Kog's inference engine focusing on GPU kernels and performance, collaborating closely with architecture and engineering teams to improve generation speed and execution efficiency.
About Kog

Kog builds the fastest LLM inference engine on standard datacenter GPUs. Our Kog Inference Engine generates 3,000 output tokens per second per request on a single 8× AMD MI300X node and 2,100 on an 8× NVIDIA H200 node (FP16, batch size 1, no speculative decoding).

The hot path is a monokernel implemented with handwritten CUDA (with PTX inline assembly) on NVIDIA, and HIP (with CDNA ISA inline assembly) on AMD.

We optimize at the low level with engine/kernel/model co-design, using reverse engineering to understand and exploit the details of how the GPU hardware works at the micro level.

We are a team of 11 people, including 10 engineers and 4 PhDs.

Test it at playground.kog.ai. Read the technical details on the Kog Labs blog.

What you will work on

You will own the technical execution of the Kog inference engine at the hardware boundary. You write code, review kernels, define performance priorities, make architecture calls, and drive a small team toward improvements that matter in production.

You move between microscopic detail and system-level judgment with the same rigor.

  • Own the monokernel pipeline, the single persistent GPU program that covers the full decode pass from QKV projection to LM head sampling, across AMD and NVIDIA architectures.

  • Drive low-level GPU optimization, including extremely fast grid synchronizations and inter-GPU collectives, and GEMM and attention kernels tuned for specific batch sizes and context lengths.

  • Build and maintain profiling infrastructure inside a monokernel, including custom instrumentation, device-timestamp frameworks, and per-stage analysis to translate machine behavior into concrete engineering decisions.

  • Scale the stack to third-party MoE models such as DeepSeek v4 and Qwen 3 to push generation speed on the models that matter in production today.

  • Contribute to building AI agents that perform GPU engineering research and kernel optimization autonomously, calibrated to the hardware target and workload, starting from the inference foundations we are building now.

  • Set technical direction for a small team, raise the bar through code and reviews, and connect ambitious performance targets to work that can be shipped, measured, and iterated on quickly.
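As an illustration of the per-stage analysis mentioned above, here is a minimal host-side sketch in Python. It assumes the kernel has already written one record per thread block and stage in the form (stage, block_id, start_tick, end_tick). That record layout, the stage names, and the `ticks_per_us` conversion factor are all hypothetical; nothing here describes Kog's actual instrumentation.

```python
from collections import defaultdict


def per_stage_summary(records, ticks_per_us):
    """Reduce per-block device timestamps to per-stage timings.

    records: iterable of (stage, block_id, start_tick, end_tick) tuples
             (hypothetical layout, assumed to be copied back from the GPU).
    ticks_per_us: device timer ticks per microsecond (assumed known).
    """
    by_stage = defaultdict(list)
    for stage, _block_id, start, end in records:
        by_stage[stage].append((start, end))

    summary = {}
    for stage, spans in by_stage.items():
        first_start = min(start for start, _ in spans)
        first_end = min(end for _, end in spans)
        last_end = max(end for _, end in spans)
        summary[stage] = {
            # Wall time of the stage: earliest start to latest finish.
            "span_us": (last_end - first_start) / ticks_per_us,
            # How long the fastest block waits for the slowest one.
            "tail_us": (last_end - first_end) / ticks_per_us,
        }
    return summary


# Made-up timestamps for two blocks and two stages.
example = [
    ("qkv", 0, 0, 100),
    ("qkv", 1, 10, 160),
    ("attn", 0, 160, 400),
    ("attn", 1, 160, 380),
]
report = per_stage_summary(example, ticks_per_us=100)
for stage, stats in report.items():
    print(stage, stats)
```

A large `tail_us` relative to `span_us` suggests load imbalance ahead of a grid synchronization, which is the kind of observation that can feed a concrete decision about how work is partitioned.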

What we look for

You have written GPU kernels for production workloads where performance was the central constraint. Showing the code is a requirement to move forward in the process.

You have operated with real ownership over difficult technical work and raised the standard of the people around you through code, reviews, and decision-making. You are comfortable carrying both individual technical depth and team-level responsibility in the same role.

Stronger signals include inline PTX or CDNA ISA in public repositories, experience with latency-sensitive execution paths, an understanding of why memory bandwidth utilization (MBU) matters more than model FLOPs utilization (MFU) at batch size 1, and a background in inference engine components. A top engineering school or a PhD with concrete GPU work counts, even without extensive industry experience.
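For readers less familiar with the MBU versus MFU point, the following back-of-the-envelope sketch in Python shows the arithmetic. It rests on stated assumptions: a dense decoder that reads every weight once per decode step, KV-cache and activation traffic ignored, and approximate public datasheet figures for peak FP16 throughput and HBM bandwidth. The 8-billion-parameter model and the 1,000 steps per second are hypothetical inputs, not figures from this posting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    name: str
    peak_flops: float     # dense FP16 FLOP/s, approximate datasheet value
    mem_bandwidth: float  # HBM bytes/s, approximate datasheet value


# Approximate public figures, used only for illustration; verify before reuse.
H200 = Gpu("H200", 989e12, 4.8e12)
MI300X = Gpu("MI300X", 1307e12, 5.3e12)


def decode_utilization(params, steps_per_s, gpu, n_gpus,
                       bytes_per_param=2, batch=1):
    """Return (MBU, MFU) for a dense decoder.

    steps_per_s is decode steps per second, which equals tokens per second
    per request. Each step reads all weights once and performs one
    multiply-add (2 FLOPs) per weight per sequence in the batch.
    """
    bytes_per_s = params * bytes_per_param * steps_per_s
    flops_per_s = 2 * params * batch * steps_per_s
    mbu = bytes_per_s / (n_gpus * gpu.mem_bandwidth)
    mfu = flops_per_s / (n_gpus * gpu.peak_flops)
    return mbu, mfu


# Hypothetical workload: 8e9 parameters at 1,000 decode steps per second.
mbu, mfu = decode_utilization(params=8e9, steps_per_s=1000, gpu=H200, n_gpus=8)
print(f"MBU = {mbu:.1%}, MFU = {mfu:.2%}, ratio = {mbu / mfu:.0f}x")
```

Under these assumptions, the ratio of MBU to MFU at FP16 and batch size 1 reduces to peak FLOP/s divided by memory bandwidth, regardless of model size or speed. On the figures above that is roughly 200, so memory bandwidth saturates long before the compute units do, which is why MBU is the more informative metric in this regime.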

PyTorch custom ops are an acceptable starting point if the kernels show a genuine understanding of the hardware below the framework level.

Top 0.1% for this role

The strongest candidates have already developed original judgment at the hardware boundary. They find performance wins that require going beyond what documentation exposes. They can explain why those wins worked, what tradeoffs they introduced, and how those decisions improved real token-by-token generation speed.

They shorten the loop between observation, hypothesis, implementation, and measured result. They know when to push deeper into the machine, when to change the execution plan, and when to influence model structure so the whole system moves faster. And they make the engineers around them better.

What we offer
  • Direct access to AMD and NVIDIA datacenter GPUs from day one

  • A team where creativity and technical judgment carry weight and where the people closest to the problem shape the key decisions

  • Problems that sit on the critical path of model execution speed and that directly influence what the system can become

  • Compensation aligned with top technical profiles in the Paris AI market, including equity

Skills Required

  • Experience writing GPU kernels for production workloads
  • Understanding of memory hierarchy, scheduling, occupancy
  • Track record of shipping optimizations with measurable impact
  • Real ownership over technical work and team influence
  • Comfortable with individual technical depth and team responsibility