LLM Algorithm Engineer

Posted 4 Days Ago
Changshu Shi, Suzhou, Jiangsu, CHN
In-Office
Mid level
Gaming • Payments • Software • Sports
The Role
Design and deploy high-throughput LLM inference systems on NVIDIA hardware, optimize GPU cluster and KV cache strategies for long contexts, lead model fine-tuning with PEFT (LoRA/QLoRA), implement prompt engineering and RAG, build LangChain/AutoGen-based applications, and develop enterprise code-assist and customer-service platforms integrating IVR/ACD.
Summary Generated by Built In

Job Summary:

We are looking to add a Large Language Model (LLM) Algorithm Engineer in Changsha, China within our EM Labs team.

This is a great opportunity to work in a tech-driven company with a relaxed and friendly environment and plenty of interesting, challenging projects. Our headquarters are in the heart of the city, at Runhe Financial Center.

Company Intro:

EveryMatrix is a leading B2B SaaS provider delivering iGaming software, content and services. We provide casino, sports betting, platform and payments, and affiliate management to 200 customers worldwide. The company is profitable, has over EUR 100m in annual revenues, and 1200+ employees in offices across ten countries in Europe, Asia and the US. EveryMatrix was founded in 2008 and remains a founder-owned private company.

What You'll get to do:

  • Production Deployment & Optimization of LLM Environments:
      • Design distributed deployment solutions based on NVIDIA hardware architecture (NVLink/NVSwitch). Lead framework selection and performance tuning for vLLM, TensorRT-LLM, SGLang, etc., to achieve high-throughput inference services.
      • Build a multi-modal GPU cluster management system to optimize KV cache storage and loading strategies, improving service efficiency for long-context scenarios.
      • Model optimization and engineering deployment.
  • LLM Application Development:
      • Design prompt engineering strategies combined with RAG (Retrieval-Augmented Generation) to improve response accuracy in scenarios such as intelligent customer service and knowledge-based Q&A. Familiarity with the LangChain and AutoGen frameworks is required.
      • Lead model fine-tuning using parameter-efficient techniques such as LoRA/QLoRA to address long-tail issues in vertical domains. Proficiency in PEFT (Parameter-Efficient Fine-Tuning) methods is essential.
      • Develop enterprise-grade internal toolchains, including code assistance/generation tools, code review systems, and private knowledge-base systems.
      • Design external customer systems, such as smart customer service platforms (integrating speech recognition, ticket management, and compliance auditing). Build collaborative multi-Agent online assistants using multi-Agent task allocation mechanisms.

Requirements:

  • Education & Experience:
      • Master’s degree or above in Computer Science, Artificial Intelligence, or related fields.
      • 3+ years of professional experience in NLP/LLM projects.
  • Technical Proficiency:
      • Proficiency in PyTorch/TensorFlow frameworks, with deep understanding of Transformer architecture and optimization of attention mechanisms.
      • Familiarity with CUDA programming and NVLink topology design. Experience in NVIDIA chip operator development (e.g., CUDA kernel optimization) is a plus.
      • Mastery of development and deployment frameworks such as LangChain, vLLM, and SGLang, with the ability to independently develop and deploy API services.
  • Preferred Qualifications or Experience with:
      • Pre-training or fine-tuning of open-source large models (e.g., LLaMA, DeepSeek).
      • Development of intelligent customer service systems (knowledge of IVR, ACD, and call center technologies required).
      • Enterprise-level code assistance tools (e.g., code generation, code review systems).
      • Construction of knowledge graphs in domains like e-commerce or internet industries.

Here's what we offer:

  • Start with 20 days of annual leave, with 2 additional days added each year, up to 30 days by your fifth year with us. Enjoy an additional 13 public holidays and time off for special events, including parental leave, sick leave, bereavement leave, and marriage leave.

Stay Healthy: 10 sick leave days per year, no doctor's note required.

Support for New Parents:

22 weeks of paid maternity leave, with the flexibility to work from home full-time until your child turns 1 year old.

4 weeks of paternity leave, plus the flexibility to work from home full-time until your child is 13 weeks old.

  • Our office perks include on-site massages, and frequent team-building activities in various locations.

Benefits & Perks:

  • Monthly lunch allowance.
  • English courses.
  • Onsite gym.
  • Access to online learning platforms like Udemy for Business and LinkedIn Learning, and a budget for external training.

At EveryMatrix, we're committed to creating a supportive and inclusive workplace where you can thrive both personally and professionally. Come join us and experience the difference!

Skills Required

  • Master's degree or above in Computer Science, Artificial Intelligence, or related fields
  • 3+ years of professional experience in NLP/LLM projects
  • Proficiency in PyTorch and/or TensorFlow
  • Deep understanding of Transformer architecture and optimization of attention mechanisms
  • Familiarity with CUDA programming and NVLink topology design
  • Mastery of development and deployment frameworks such as LangChain, vLLM, and SGLang and ability to deploy API services
  • Proficiency in PEFT methods and experience with LoRA/QLoRA for fine-tuning
  • Experience in NVIDIA chip operator development or CUDA kernel optimization
  • Pre-training or fine-tuning of open-source large models (e.g., LLaMA, DeepSeek)
  • Experience developing intelligent customer service systems and knowledge-based Q&A (IVR, ACD, call center technologies)
  • Experience building enterprise code assistance tools or knowledge graphs in e-commerce/internet domains

The Company
1,500 Employees
Year Founded: 2008

What We Do

EveryMatrix is a leading B2B provider of iGaming software, solutions, and services, including casino, sports betting, payments, and affiliate management. They serve global tier-1 operators and lotteries with a modular, scalable, and compliant platform. Founded in 2008, the company has grown to over 1,500 employees across 16 offices, focusing on innovation and delivering outstanding player experiences in regulated markets worldwide.
