You will be a core contributor to our cloud infrastructure and delivery engineering practice. As a DevOps Engineer, you will own the design and operation of our multi-cloud environments across Azure, AWS, and GCP — building the pipelines, platforms, and automation that empower development teams to ship with speed and confidence. This is a hands-on, high-ownership role within a small-to-medium software team. You will be expected to bring strong opinions, take initiative, and continuously improve how we build and operate our systems.
What You'll Do:
Design, build, and maintain CI/CD pipelines that support continuous delivery across multiple cloud platforms (Azure DevOps, GitHub Actions, GitLab CI)
Architect and manage cloud infrastructure across Azure, AWS, and GCP using Infrastructure as Code (Terraform, Bicep, CloudFormation)
Manage containerized application workloads using Kubernetes (AKS, EKS, or GKE) and Docker
Implement and maintain cloud security best practices: IAM policies, network segmentation, secrets management, vulnerability scanning
Design and maintain observability stacks — logging, metrics, alerting — using tools such as Azure Monitor, CloudWatch, Datadog, or Grafana/Prometheus
Collaborate with software and ML engineering teams to define deployment strategies, optimize release pipelines, and reduce deployment risk
Evaluate and introduce tooling improvements that enhance reliability, scalability, and developer productivity
Contribute to incident response and post-mortem processes, driving root cause analysis and corrective actions
Build and maintain internal documentation on infrastructure architecture, operational runbooks, and disaster recovery (DR) procedures
Mentor junior team members and provide technical guidance on cloud and DevOps best practices
What You Bring:
Degree or equivalent work experience in Computer Science, Systems Engineering, or a related discipline
3–6 years of progressive DevOps, cloud engineering, or site reliability engineering experience
Strong hands-on experience with at least two of: Azure, AWS, GCP — multi-cloud exposure is highly valued
Proven experience building and maintaining CI/CD pipelines in production environments
Proficiency with Infrastructure as Code: Terraform required; Bicep, Pulumi, or CDK are a plus
Solid Kubernetes experience: cluster management, Helm charts, workload scaling, networking
Scripting fluency in Python, Bash, or PowerShell for automation and tooling
Experience implementing cloud security controls: IAM, RBAC, network policies, key management
Understanding of the software delivery lifecycle and agile development practices
Strong troubleshooting ability across networking, compute, storage, and application layers
Desirable:
Relevant cloud certifications, such as AZ-104 / AZ-400, AWS Certified Solutions Architect, or Google Cloud Professional Cloud Architect
Experience supporting ML/AI workloads: GPU clusters, model deployment pipelines, MLflow, Kubeflow
Background in GitOps practices using ArgoCD or Flux
Experience with service mesh technologies (Istio, Linkerd)
Exposure to FinOps principles and cloud cost optimization
Prior experience in a startup or scale-up software environment
Familiarity with compliance frameworks and privacy legislation relevant to Canadian tech companies (SOC 2, PIPEDA)
What We Do
AltaML is a leading developer of AI-powered solutions. Working with organizations that want to leverage their data using artificial intelligence (AI), AltaML develops solutions that create operational efficiency, reduce risk, and generate new sources of revenue. Through a deep understanding of organizational pain points and challenges, AltaML develops solutions that encompass the entire machine learning (ML) life cycle, from evaluating potential use cases and determining feasibility, to piloting solutions, putting code into production, and ensuring models evolve over time.








