Research Engineer

Paris, Île-de-France, FRA
Hybrid
Mid level
Gaming
The Role
Design and run experiments to measure how architecture decisions affect inference; create architecture variants optimized for inference speed; own post-training pipelines (fine-tuning, evaluation, adaptation); scale inference for large MoE models; publish research and build agent-driven research tooling.

About Kog
Kog builds the fastest LLM inference engine on standard datacenter GPUs. Our Kog Inference Engine generates 3,000 output tokens per second per request on a single 8× AMD MI300X node and 2,100 on an 8× NVIDIA H200 node (FP16, batch size 1, no speculative decoding).
We co-design the model architecture and the execution engine. Our Laneformer model uses Delayed Tensor Parallelism (DTP), a novel architecture that restructures the Transformer dependency graph so that inter-GPU communication overlaps with computation rather than blocking it.
We pretrained a 2B-parameter DTP model on 6T tokens on 256 H100 GPUs.
We are a team of 11 people, including 10 engineers and 4 PhDs.
Test it at playground.kog.ai. Read the technical details on the Kog Labs blog.
What you will work on
You will imagine, design, and run experiments to understand how architectural decisions propagate through inference behavior, morph existing open-weight models into architecture variants optimized for speed, and turn findings into measurable gains in generation speed and model quality.

  • Design new model architecture variants, including routing strategies, attention mechanisms, and MoE structure, with execution constraints as a first-order design input.

  • Extend the Laneformer thesis by exploring inference-aware architectural variants such as DTP, Ladder Residual, and PT-Transformer, and finding what compounds at scale.

  • Own the post-training pipeline across fine-tuning, evaluation methodology, and adaptation of existing open-weight models toward architecture variants optimized for inference speed.

  • Scale the stack to large MoE models such as DeepSeek v4 and Qwen 3, working through routing, expert parallelism, and communication patterns at inference time.

  • Write up findings as research papers, submit them to top venues, and present them at conferences.

  • Contribute to building AI agents that will perform architecture research and training experiments autonomously, starting from the research foundations we are building now.
What we look for
You are rigorous, curious, and comfortable working at the intersection of model design and hardware constraints.
You have worked on complex AI problems and have something concrete to show for it. A paper, a repository, a thesis, or a side project with evidence of serious technical thinking is what we want to see.
Strong signals include experience adapting or modifying existing model architectures, understanding of how communication structure and layer dependencies affect inference behavior, and fluency in Transformers and MoE with enough depth to reason across trade-offs.
Experience in post-training methods such as fine-tuning, preference optimization, or quantization is a plus, even without production-scale exposure.
What we offer

  • Direct access to AMD and NVIDIA datacenter GPUs from day one

  • A team where creativity and technical judgment carry weight and where the people closest to the problem shape the key decisions

  • Problems that sit on the critical path of model execution speed and that directly influence what the system can become

  • A remote-friendly working model, though you'll spend at least 50% of your time in our Paris office

Skills Required

  • Demonstrable work on complex AI problems (paper, repository, thesis, or substantial side project)
  • Experience adapting or modifying existing model architectures
  • Fluency in Transformers and Mixture of Experts (MoE) architectures
  • Understanding of communication structure, layer dependencies, and how they affect inference
  • Experience scaling models for inference (routing, expert parallelism, communication patterns)
  • Experience with post-training methods such as fine-tuning, quantization, or preference optimization
  • Ability to write and submit research papers and present at conferences