Cloud Platform DevOps Engineer - Assistant Vice President

Reposted 12 Days Ago
Mississauga, ON, CAN
In-Office
94K-142K Annually
Senior level
Fintech • Financial Services
The Role
The role involves enhancing the stability and performance of AI and DevOps platforms, leading design and implementation of infrastructure, managing databases and messaging systems, and driving DevOps practices within the team.
Summary Generated by Built In

We are seeking an experienced (5+ years), motivated, and hands-on Cloud Platform DevOps Engineer to join our North American AI and DevOps Platform Engineering team. In this critical role, you will be responsible for enhancing the stability, reliability, and performance of our AI and DevOps platforms, which support a diverse ecosystem of AI applications, developer tools, and CI/CD pipeline technologies across the organization. You will actively contribute to infrastructure design, implementation, and maintenance, and facilitate agile development within the team.

The ideal candidate is a strong technical leader who champions agile practices, drives continuous improvement, and excels in both coding and coaching. You have a deep understanding of the infrastructure and operational considerations for Artificial Intelligence and Machine Learning initiatives, proven hands-on experience with DevOps tools and technologies such as Kubernetes, Docker, Helm, Ansible, or similar CI/CD platforms, and proficiency in scripting and automation (e.g., Python, Bash). We are looking for someone with a track record of implementing scalable, resilient, and high-performance solutions, strong communication and collaboration skills, and the ability to mentor and guide junior team members. You will join a dynamic team committed to fostering innovation and collaboration.

Responsibilities:

Hands-on DevOps & Infrastructure Engineering

  • Design & Implementation: Lead the design, implementation, and ongoing management of secure, scalable, and resilient infrastructure components.

  • Secret & Certificate Management: Administer and maintain secret and certificate management solutions using HashiCorp Vault, including policy definition and integration.

  • Database Management: Perform hands-on administration and optimization of database systems (PostgreSQL, Oracle, MongoDB), including performance tuning, backup, and recovery strategies.

  • Workflow Orchestration: Deploy, monitor, and troubleshoot data orchestration workflows using Apache Airflow, and develop/optimize DAGs.

  • Messaging Systems: Implement and manage messaging queues such as Kafka and IBM MQ, including cluster setup and configuration.

  • API Integrations: Develop, maintain, and troubleshoot RESTful API and SOAP integrations critical for system connectivity.

  • Build Automation: Implement and optimize build and deployment processes using Gradle.

  • Container Orchestration: Design, implement, and manage container orchestration platforms with Kubernetes and Helm, including integration with CyberArk and HashiCorp Vault for secrets management. Create, debug, and troubleshoot Kubernetes Pods, Jobs, and Deployments using YAML.

  • Storage Management: Configure and manage persistent storage solutions including PVC, SONiC NAS, and S3, with an awareness of storage requirements for AI/ML workloads.

  • Networking & Load Balancing: Set up and maintain load balancing solutions (e.g., Nginx, HAProxy, AWS ELB/ALB, Kubernetes Ingress controllers) for high availability and performance.

  • Monitoring & Logging: Implement, configure, and utilize comprehensive monitoring and logging solutions (Prometheus, Grafana, ELK Stack) to ensure system health and proactively identify issues, including those relevant to AI/ML applications.

  • Automation & Scripting: Develop robust automation scripts and tools using Python, Bash, Go, or similar languages to streamline operations and enhance efficiency.

  • Incident Response: Participate actively in on-call rotations, responding to and resolving critical incidents with hands-on troubleshooting.

  • Documentation: Create and maintain technical documentation, architecture diagrams, and runbooks for infrastructure components and processes.

  • Impediment Resolution: Proactively identify and resolve technical impediments and process bottlenecks within the team and across organizational boundaries, paying special attention to unique challenges posed by AI/ML infrastructure.

  • Backlog Refinement: Collaborate closely with stakeholders (e.g., product owners, technical leads) to ensure a well-defined and prioritized backlog for infrastructure work, technical debt, operational improvements, and AI/ML platform needs.

  • Process Improvement: Drive continuous improvement in the team's agile and DevOps practices, helping them adapt and optimize their workflow for maximum efficiency and quality.

Required Qualifications:

Hands-on DevOps & Infrastructure Engineering Expertise

  • Secret & Certificate Management: Proven hands-on experience with HashiCorp Vault (installation, configuration, policy management, integrations).

  • Database Administration: Strong hands-on experience with at least two of PostgreSQL, Oracle, or MongoDB (installation, tuning, replication, backup/restore).

  • Workflow Orchestration: Hands-on experience deploying, managing, and developing DAGs for Apache Airflow.

  • Messaging Systems: Solid hands-on experience with Kafka and/or IBM MQ (cluster setup, topic management, producer/consumer configuration).

  • Container Orchestration: In-depth hands-on experience with Kubernetes and Helm, including YAML configuration, troubleshooting Pods/Jobs/Deployments, and integrations with secrets management (CyberArk, HashiCorp Vault).

  • Storage Management: Practical experience with Kubernetes PVCs, Persistent Volumes, S3, and/or enterprise NAS solutions (e.g., SONiC NAS).

  • Monitoring & Logging: Strong hands-on experience with Prometheus, Grafana, and the ELK Stack (setup, dashboard creation, query optimization, alert configuration).

  • Scripting & Automation: High proficiency in Python, Bash, or Go for automation, tooling development, and system administration.

  • Cloud Platforms: Extensive hands-on experience with at least one major cloud provider (AWS, Azure, GCP).

  • Infrastructure as Code (IaC): Proficiency with IaC tools such as Terraform or Ansible.

  • CI/CD: Experience designing, implementing, and maintaining CI/CD pipelines (e.g., Jenkins, GitLab CI, GitHub Actions).

  • API Integration: Experience with RESTful API and SOAP web services.

  • Build Tools: Proficiency with Gradle for build automation.

AI/ML Awareness & Support

  • AI/ML Infrastructure Concepts: Understanding of the specific infrastructure requirements for deploying, managing, and scaling Artificial Intelligence and Machine Learning workloads (e.g., GPU resources, specialized storage, MLOps pipelines).

  • Data for AI/ML: Awareness of data management strategies and data governance principles relevant to AI/ML models and training datasets.

  • Monitoring AI/ML Systems: Familiarity with metrics and monitoring approaches for the performance and health of AI/ML applications and their underlying infrastructure.

Agile & Leadership Skills

  • Working Scrum Master Experience: Proven experience acting as a Scrum Master within a technical team where you also performed significant hands-on engineering.

  • Agile & Scrum Mastery: In-depth knowledge and practical application of Agile principles and the Scrum framework.

  • Facilitation & Coaching: Excellent facilitation, coaching, and mentoring skills within a technical context.

  • Communication: Strong verbal and written communication skills, able to bridge technical and process discussions.

  • Technical Leadership: Ability to guide technical discussions, influence architectural decisions, and drive best practices.

Preferred Qualifications:

  • Certified ScrumMaster (CSM) or Professional Scrum Master (PSM) certification.

  • Relevant cloud certifications (e.g., AWS Certified DevOps Engineer, Azure DevOps Engineer Expert, GCP Professional Cloud DevOps Engineer).

  • Experience with site reliability engineering (SRE) principles and practices.

  • Familiarity with other Agile scaling frameworks (e.g., SAFe, LeSS).

  • Exposure to MLOps platforms or tools (e.g., Kubeflow, MLflow).

Education:

  • Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field, or equivalent experience.

------------------------------------------------------

Job Family Group: Technology

------------------------------------------------------

Job Family: Applications Development

------------------------------------------------------

Time Type: Full time

------------------------------------------------------

Primary Location Full Time Salary Range: $94,300.00 - $141,500.00

------------------------------------------------------

Most Relevant Skills: Please see the requirements listed above.

------------------------------------------------------

Other Relevant Skills: For complementary skills, please see above and/or contact the recruiter.

------------------------------------------------------

Automated Processing and AI

We use automated processing, including artificial intelligence, for our legitimate business interests (or our reasonable and appropriate business purposes) to identify and align the candidate's skills and abilities with a specific job opening. Additionally, if you so choose, or consent, we can match your skills and abilities to other suitable roles at Citi.

Importantly, all our hiring processes and decisions, including determining your suitability for a role, are conducted, checked, and decided by individuals. Our automated processing and AI do not involve relying on automatic or autonomous decision-making. Please refer to any Jurisdictional Considerations, with specific provisions for your country (where relevant) for further details.

------------------------------------------------------

This job opening is for an existing job vacancy.

------------------------------------------------------

Citi is an equal opportunity employer, and qualified candidates will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law.

 

If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi.
View Citi’s EEO Policy Statement and the Know Your Rights poster.

Skills Required

  • 5+ years of hands-on DevOps & Infrastructure Engineering experience
  • Proven experience with HashiCorp Vault
  • Hands-on experience with PostgreSQL, Oracle, or MongoDB
  • Experience deploying and managing Apache Airflow
  • Solid experience with Kafka and/or IBM MQ
  • In-depth hands-on experience with Kubernetes and Helm
  • Practical experience with Kubernetes PVCs, S3, and enterprise NAS solutions
  • Strong experience with Prometheus, Grafana, and the ELK Stack
  • High proficiency in scripting with Python or Bash
  • Experience with major cloud platforms (AWS, Azure, GCP)
  • Proficiency with IaC tools such as Terraform or Ansible
  • Experience with CI/CD pipelines
  • Experience with RESTful API and SOAP web services
  • Proficiency with Gradle for build automation

Citi Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Citi and has not been reviewed or approved by Citi.

  • Healthcare Strength: Benefits coverage is positioned as comprehensive, including health, dental, and vision insurance plus on-site clinics, prescription drug support, and disability coverage. Family-building support such as fertility assistance is described as a notable differentiator within the overall package.
  • Retirement Support: Retirement benefits are framed as strong, highlighted by a 401(k) with matching and additional plan options like a Roth 401(k). Financial support is reinforced through discounts and broader financial guidance resources tied to the benefits ecosystem.
  • Wellbeing & Lifestyle Benefits: Wellbeing support extends beyond insurance through programs like an Employee Assistance Program, counseling/legal resources, and gym or wellness reimbursement. These offerings increase the perceived total rewards value even when cash compensation sentiment varies by role.


The Company
HQ: Kwun Tong, Kowloon
223,850 Employees

What We Do

Citi's mission is to serve as a trusted partner to our clients by responsibly providing financial services that enable growth and economic progress. Our core activities are safeguarding assets, lending money, making payments and accessing the capital markets on behalf of our clients. We have 200 years of experience helping our clients meet the world's toughest challenges and embrace its greatest opportunities. We are Citi, the global bank – an institution connecting millions of people across hundreds of countries and cities.
