Infrastructure Engineer

Posted 5 Days Ago
Toronto, ON, CAN
In-Office
Senior level
Software
The Role
Operate and maintain production infrastructure and internal services, including DGX/GPU clusters, bare-metal Kubernetes servers, CI/CD pipelines, and the customer-facing AI Gateway. Drive reliability, security hardening, observability, and DevOps best practices; contribute to product hardening and roadmap. Ensure uptime for internal and customer-facing systems.
Summary Generated by Built In
At Shakudo, we're building the world's first operating system for data and AI. We use the term "operating system" in the truest sense: just like iOS, Windows, or Linux, Shakudo's end-to-end OS provides ever-evolving, fully automated, best-in-class open-source components tailored to each business's unique needs.
 
We are seeking an Infrastructure Engineer to join our Business Automation team to own and operate the internal systems, infrastructure, and AI Gateway product that power Shakudo at scale. This is a hands-on role for someone who thrives on keeping production systems reliable, secure, and fast. You will be responsible for everything from physical servers and DGX machines to CI/CD pipelines and customer-facing AI Gateway infrastructure. You will also contribute directly to product hardening, security, and DevOps practices across the platform.
 
At Shakudo, our culture is proactive, collaborative, and supportive — we succeed together by building strong partnerships and solving complex challenges. We expect high ownership: you will be hands-on, driving outcomes directly rather than delegating or waiting for direction. Individual contribution matters here — your work will have a visible, measurable impact on the company's operations and product.

Key Responsibilities

  • Maintain and operate internal services for Shakudo employees, including proprietary applications for sales and ETL pipelines
  • Maintain and operate DGX machines that host LLMs for the team's use
  • Maintain and operate Shakudo's product for internal use, and contribute to product hardening, security, and DevOps practices
  • Maintain and operate physical servers for Kubernetes clusters and ensure uptime
  • Create CI/CD pipelines for internal deployments
  • Maintain and operate the AI Gateway product for customers, ensure uptime, and contribute to the product roadmap

Qualifications

  • 8+ years of experience across software, data, platform, or AI engineering roles
  • 5+ years of strong experience with Kubernetes cluster operation, DevOps, and bare-metal server operations
  • Experience operating production infrastructure at scale, including physical servers, GPU clusters, and CI/CD systems
  • Strong background in security hardening, observability, and reliability engineering
  • Proficiency in Rust is preferred
  • Experience with AI/ML infrastructure, including LLM hosting and inference serving, is preferred

Why Shakudo Stands Out

    Work with cutting-edge technologies in machine learning and high-performance computing. Contribute to a platform that transforms how organizations leverage data and AI. Join a dynamic team that values innovation, efficiency, and diversity.
     
    Shakudo offers a high-impact package: competitive salary, meaningful equity so you share in the upside of transformational technology, and comprehensive health benefits that have you fully covered. We provide a flexible vacation policy—because building transformational technology requires supporting the people who build it. More importantly, you'll work on technology that matters.
     
    This role is based onsite in Toronto to support the high security requirements of our clients and enable effective collaboration. We have a welcoming office environment with a very focused and passionate team, doing meaningful, impactful work together.
     
    Shakudo is an equal opportunity employer and encourages candidates of all backgrounds to apply. We foster diversity and inclusivity and welcome applications from a broad range of backgrounds and experiences.

Skills Required

  • 8+ years of experience across software, data, platform, or AI engineering roles
  • 5+ years of experience with Kubernetes cluster operation, DevOps, and bare-metal server operations
  • Experience operating production infrastructure at scale, including physical servers, GPU clusters, and CI/CD systems
  • Strong background in security hardening, observability, and reliability engineering
  • Proficiency in Rust
  • Experience with AI/ML infrastructure, including LLM hosting and inference serving
  • Onsite work in Toronto

The Company
HQ: Toronto, Ontario
29 Employees
Year Founded: 2021

What We Do

Shakudo is an easy-to-use data platform that has everything a data team needs to deliver products end-to-end, and it continuously adds the new integrations that data teams want. By using Shakudo, data teams become less reliant on engineers. Shakudo's platform automates many common engineering and development tasks and comes with built-in tools that simplify the process of scaling data solutions.
