About Kog
Kog builds the fastest LLM inference engine on standard datacenter GPUs. Our Kog Inference Engine generates 3,000 output tokens per second per request on a single 8× AMD MI300X node and 2,100 on an 8× NVIDIA H200 node (FP16, batch size 1, no speculative decoding).
We co-design the model architecture and the execution engine. Our Laneformer model uses Delayed Tensor Parallelism (DTP), a novel architecture that restructures the Transformer dependency graph so that inter-GPU communication overlaps with computation rather than blocking it.
We pretrained a 2B-parameter DTP model on 6T tokens on 256 H100 GPUs.
We are a team of 11 people, including 10 engineers and 4 PhDs.
Test it at playground.kog.ai. Read the technical details on the Kog Labs blog.
What you will work on
You will imagine, design, and run experiments to understand how architectural decisions affect inference behavior, morph existing open-weight models into architecture variants optimized for speed, and turn findings into measurable gains in generation speed and model quality.
Design new model architecture variants, including routing strategies, attention mechanisms, and MoE structure, with execution constraints as a first-order design input.
Extend the Laneformer thesis by exploring inference-aware architectural variants such as DTP, Ladder Residual, and PT-Transformer, and finding what compounds at scale.
Own the post-training pipeline across fine-tuning, evaluation methodology, and adaptation of existing open-weight models toward architecture variants optimized for inference speed.
Scale the stack to large MoE models such as DeepSeek v4 and Qwen 3, working through routing, expert parallelism, and communication patterns at inference time.
Write up findings as research papers, submit them to top venues, and present them at conferences.
Contribute to building AI agents that will perform architecture research and training experiments autonomously, starting from the research foundations we are building now.
What we look for
You are rigorous, curious, and comfortable working at the intersection of model design and hardware constraints.
You have worked on complex AI problems and have something concrete to show for it. A paper, a repository, a thesis, or a side project with evidence of serious technical thinking is what we want to see.
Strong signals include experience adapting or modifying existing model architectures, understanding of how communication structure and layer dependencies affect inference behavior, and fluency in Transformers and MoE with enough depth to reason across trade-offs.
Experience in post-training methods such as fine-tuning, preference optimization, or quantization is a plus, even without production-scale exposure.
What we offer
Direct access to AMD and NVIDIA datacenter GPUs from day one
A team where creativity and technical judgment carry weight and where the people closest to the problem shape the key decisions
Problems that sit on the critical path of model execution speed and that directly influence what the system can become
A remote-friendly working model, though you'll spend at least 50% of your time in our Paris office
Skills Required
- Demonstrable work on complex AI problems (paper, repository, thesis, or substantial side project)
- Experience adapting or modifying existing model architectures
- Fluency in Transformers and Mixture of Experts (MoE) architectures
- Understanding of communication structure, layer dependencies, and how they affect inference
- Experience scaling models for inference (routing, expert parallelism, communication patterns)
- Experience with post-training methods such as fine-tuning, quantization, or preference optimization
- Ability to write and submit research papers and present at conferences






