Lead GPU Engineer

Hiring Remotely in Paris, Île-de-France, FRA
In-Office or Remote
Expert/Leader
Gaming
The Role
The Lead GPU Engineer will develop and optimize Kog's inference engine, focusing on GPU kernels and performance, and will collaborate closely with the architecture and engineering teams to improve generation speed and execution efficiency.

A shift is happening in AI that most people have not fully priced in. As models become more capable and agents take over more software work, inference becomes the critical bottleneck. The question stops being whether a model can do the work and becomes whether it can run fast enough to feel like thinking.

Kog was built for that shift.

We co-design the execution engine and the model architecture together, specifically for AMD MI300X hardware. Our monokernel runs from first token to last without returning control to the CPU. Our Laneformer architecture is designed to overlap computation and communication by deferring all-reduce by one layer.
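To make the overlap idea concrete, here is a toy timing model. It is a sketch under stated assumptions, not a description of Kog's implementation: it assumes every layer has the same compute time and the same all-reduce time (both hypothetical inputs), and that deferring the all-reduce by one layer lets layer i's communication run concurrently with layer i+1's compute.

```python
def sequential_time(n_layers, compute_us, allreduce_us):
    """Baseline: each layer computes, then blocks on its own all-reduce."""
    return n_layers * (compute_us + allreduce_us)


def deferred_time(n_layers, compute_us, allreduce_us):
    """Assumed overlap model: layer i's all-reduce runs while layer i+1 computes.

    The first layer's compute and the last layer's all-reduce have nothing
    to overlap with, so they are paid in full.
    """
    if n_layers == 0:
        return 0.0
    overlapped = (n_layers - 1) * max(compute_us, allreduce_us)
    return compute_us + overlapped + allreduce_us


# Hypothetical numbers, chosen only for illustration.
baseline = sequential_time(32, 10.0, 4.0)
overlap = deferred_time(32, 10.0, 4.0)
print(baseline, overlap)
```

In this model, communication disappears from the critical path whenever the all-reduce is no slower than one layer of compute. Real kernels will deviate from it, since compute and communication contend for memory bandwidth and interconnect.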

Today, Kog serves 2,500 tokens per second. Our next target is 5,000.
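As a rough illustration of what those numbers mean per token, the sketch below converts a throughput into a time budget, assuming the figures describe a single sequential generation stream (the posting does not say so explicitly). It also estimates how many bytes one GPU could read from HBM inside that budget. The bandwidth value is an assumption: it uses AMD's published peak theoretical figure for the MI300X, which sustained workloads do not reach.

```python
def per_token_budget_us(tokens_per_second):
    """Time available per token, in microseconds, for one sequential stream."""
    return 1_000_000 / tokens_per_second


def max_bytes_per_token(budget_us, bandwidth_bytes_per_s):
    """Upper bound on bytes one GPU can read from HBM within the budget."""
    return budget_us * 1e-6 * bandwidth_bytes_per_s


# Assumption: 5.3 TB/s peak theoretical HBM bandwidth per MI300X.
PEAK_HBM_BYTES_PER_S = 5.3e12

for tps in (2500, 5000):
    budget = per_token_budget_us(tps)
    ceiling_gb = max_bytes_per_token(budget, PEAK_HBM_BYTES_PER_S) / 1e9
    print(f"{tps} tok/s: {budget:.0f} us per token, "
          f"at most {ceiling_gb:.2f} GB read per GPU")
```

Doubling throughput halves the per-token budget from 400 to 200 microseconds, which also halves the amount of weight and KV-cache data each GPU can touch per token.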

Our MoE v3 already outperforms Llama 3.2-3B on CORE benchmarks and shows emergent reasoning capabilities where dense models of similar size score zero.

We are a team of 11 people, including 10 engineers and 4 PhDs, building a different kind of inference company from first principles.

Why this role matters now

Inference speed is becoming a product constraint, a model constraint, and a company constraint at the same time. At Kog, this role sits directly on that bottleneck. The work you do here will shape token-by-token generation speed, influence which model designs become viable, and determine how quickly engineering judgment turns into measurable performance.

The problem

Most inference systems still carry architectural decisions that made sense for an earlier generation of workloads. Sequential generation still absorbs synchronization costs, CPU handoffs, and memory behavior that become limiting when every token matters.

Kog took a different route. We built a monokernel execution path and co-designed the model architecture with the hardware. That created a different set of opportunities and a higher level of technical difficulty. Progress comes from understanding the machine at a very fine-grained level, making strong tradeoffs, and turning them into real gains in generation speed.

The role

You will own the technical execution of the Kog inference engine at the hardware boundary. You will work close to the machine, close to the model, and close to the people making the most consequential architectural decisions in the company.

This is a hands-on leadership role. You will write code, review kernels, define performance priorities, make architecture calls, and drive a team toward improvements that matter in production. You will be expected to move between microscopic detail and system-level judgment with the same rigor.

What you will work on

  • Kernel architecture for the monokernel pipeline, including memory hierarchy choices, scheduling behavior, and strategies that hide HBM latency behind useful computation

  • Low-level optimization work on modern GPU hardware, with profiling that turns machine behavior into concrete engineering decisions

  • Execution strategies that improve end-to-end sequential generation speed rather than isolated wins on local kernels

  • Close collaboration with model architecture to turn model constraints into execution opportunities, and execution constraints into model design feedback

  • Technical direction for a small team working on the critical path of generation speed

  • Engineering milestones that connect ambitious performance targets to work that can be shipped, measured, and iterated on quickly

Must-have

  • You have written GPU kernels for production workloads where performance was central to the system outcome

  • You understand memory hierarchy, scheduling, occupancy, and execution behavior at the level where you can anticipate likely bottlenecks before profiling confirms them

  • You have shipped optimizations with measurable impact and can explain the exact decisions that created the result

  • You have operated with real ownership over difficult technical work and raised the standard of the people around you through code, reviews, and decision-making

  • You are comfortable carrying both individual technical depth and team-level responsibility in the same role

Strong signal

  • You have deep low-level GPU performance experience on AMD, NVIDIA, or both

  • You have worked on inference engine components such as attention kernels, KV cache management, quantization-aware execution, or communication-sensitive execution paths

  • You have built or shaped systems where model behavior and execution behavior had to be designed together

  • You have a public trace of serious low-level work, such as benchmarks, repositories, technical writing, conference talks, or profiling methods adopted by others

Top 0.1% for this role

The strongest candidates for this role have already developed original judgment at the hardware boundary. They have found performance wins that were not obvious from documentation alone. They can explain why those wins worked, what tradeoffs they introduced, and how those decisions improved real token-by-token generation speed.

They have a track record of shortening the loop between observation, hypothesis, implementation, and measured result. They bring both authorship and taste. They know when to push deeper into the machine, when to change the execution plan, and when to influence model structure so the whole system moves faster.

What we offer

  • Direct access to AMD MI300X clusters from day one, with enough compute to validate serious work at real scale

  • A team where technical judgment carries weight and where the people closest to the problem shape the key decisions

  • Problems that sit on the critical path of model execution speed and that directly influence what the system can become

  • A remote-first working model, with regular time overlap close to France time and monthly Paris weeks for engineering depth, alignment, and time together

  • Compensation aligned with top technical profiles in the Paris AI market, including meaningful equity

Skills Required

  • Experience writing GPU kernels for production workloads
  • Understanding of memory hierarchy, scheduling, occupancy
  • Track record of shipping optimizations with measurable impact
  • Real ownership over technical work and team influence
  • Comfortable with individual technical depth and team responsibility

