With a company culture rooted in collaboration, expertise and innovation, we aim to promote progress and inspire our clients, employees, investors and communities to achieve their greatest potential. Our work is the catalyst that helps others achieve their goals. In short, We Enable Possibility℠.
The Manager of DevOps Engineering is responsible for leading the design, implementation, and operational excellence of enterprise-scale CI/CD systems, infrastructure automation, and engineering. This role requires deep expertise in modern DevOps tooling, distributed systems, and cloud-native architectures, while also providing technical leadership to a team of DevOps Engineers.
This individual will drive the evolution of our DevOps practices, ensuring automation-first delivery pipelines, hardened infrastructure, and highly available services that underpin mission-critical business applications.
Core Responsibilities
DevOps Engineering & CI/CD
- Own and scale enterprise-wide CI/CD pipelines using modern orchestration tools (e.g., GitHub Actions, CI, ArgoCD).
- Architect developer self-service platforms with Infrastructure-as-Code (IaC).
- Implement role-based access controls (RBAC) across Kubernetes, cloud IAM, and toolchains to ensure compliance and security.
- Build extensible automation frameworks enabling teams to provision, deploy, and monitor workloads with minimal friction.
- Evaluate and integrate next-generation CI/CD features such as ephemeral environments, policy-as-code enforcement, and test environment provisioning on demand.
- Establish and govern standardization of base container images, Helm charts, and deployment templates to promote consistency and reduce security drift across development teams.
Infrastructure & Cloud Automation
- Manage cloud-native infrastructure (Azure and AWS) with a focus on resiliency, scalability, and cost optimization as it pertains to product workloads.
- Lead adoption of Kubernetes and container orchestration platforms with advanced configuration (e.g., service mesh, Cilium, Calico, OPA/Gatekeeper).
- Standardize configuration management using Terraform, Terragrunt, or ArgoCD “Helm”, and integrate with CI/CD pipelines for immutable deployments.
- Optimize cloud spend and resource utilization by implementing advanced autoscaling strategies, rightsizing recommendations, and reserved instance/savings plan management using FinOps best practices.
Reliability & Observability
- Define, measure, and enforce Service Level Objectives (SLOs) and Service Level Agreements (SLAs) for platforms and services.
- Establish observability practices through metrics, distributed tracing, and logging using tools such as Prometheus, Grafana, ELK/EFK, and Dynatrace.
- Drive proactive capacity management, chaos testing, and resilience engineering to validate system recovery under failure scenarios.
- Advance the maturity of AIOps initiatives, leveraging machine learning techniques on telemetry data to predict and preempt potential service degradation.
Security & Compliance
- Integrate DevSecOps practices into pipelines (e.g., SysDig, Artifactory X-Ray, dependency scanning, container image hardening).
- Enforce least-privilege principles and manage secrets with tools like HashiCorp Vault, AWS Secrets Manager, Azure Key Vault, or Kubernetes Secrets.
- Ensure compliance with regulatory and organizational requirements.
- Manage secrets rotation, key generation, and Public Key Infrastructure (PKI) at scale, ensuring cryptographic best practices are applied across all environments.
Disaster Recovery & Business Continuity
- Architect and validate multi-region, multi-cloud disaster recovery strategies with automated failover testing.
- Design recovery procedures to minimize RTO/RPO and validate through game-day exercises.
- Document and evangelize clear runbooks and incident response plans for all major infrastructure platforms, supporting a 24/7 on-call rotation.
- Develop and automate failover testing for all distributed systems, ensuring minimal impact during simulated regional outages.
Leadership & Strategy
- Lead, mentor, and grow a high-performing DevOps team with a focus on engineering excellence and automation-first culture.
- Translate business requirements into technical roadmaps for DevOps platforms and reliability engineering initiatives.
- Collaborate with software engineering, security, and product leadership to align DevOps strategy with enterprise goals.
- Partner with vendors and service providers to evaluate, implement, and optimize third-party DevOps tooling.
Qualifications & Technical Expertise
- Proven expertise with cloud platforms (Azure and AWS required).
- Strong hands-on experience with Kubernetes, service mesh, and containerized deployments at scale.
- Deep knowledge of Infrastructure-as-Code (Terraform, Terragrunt) and configuration management.
- Proficiency in scripting/programming (Python, Go, Bash, PowerShell) for automation and tooling.
- Advanced understanding of CI/CD best practices, including GitOps workflows and progressive delivery (canary, blue/green).
- Familiarity with networking (VPC design, ingress/egress, load balancing, CNI plugins).
- Strong foundation in distributed systems, scaling, fault tolerance, and reliability engineering.
- Experience with observability stacks (Prometheus, Grafana, ELK, Dynatrace) and incident management.
- Understanding of modern security controls, secrets management, and compliance frameworks.
#LI-Hybrid
For individuals assigned or hired to work in the location(s) indicated below, the base salary range is provided. Range is as of the time of posting. Position is incentive eligible.
$130,000 - $223,700/year
Total individual compensation (base salary, short & long-term incentives) offered will take into account a number of factors including but not limited to geographic location, scope & responsibilities of the role, qualifications, talent availability & specialization as well as business needs. The above pay range may be modified in the future.
Arch is committed to helping employees succeed through our comprehensive benefits package that includes multiple medical plans plus dental, vision and prescription drug coverage; a competitive 401k with generous matching; PTO beginning at 20 days per year; up to 12 paid company holidays per year plus 2 paid days of Volunteer Time Off; basic Life and AD&D Insurance as well as Short and Long-Term Disability; Paid Parental Leave of up to 10 weeks; Student Loan Assistance and Tuition Reimbursement, Backup Child and Elder Care; and more.
Do you like solving complex business problems and working with talented colleagues, and do you have an innovative mindset? Arch may be a great fit for you. If this job isn’t the right fit but you’re interested in working for Arch, create a job alert! Simply create an account and opt in to receive emails when we have job openings that meet your criteria. Join our talent community to share your preferences directly with Arch’s Talent Acquisition team.
14400 Arch Insurance Group Inc.
Skills Required
- Expertise with cloud platforms Azure and AWS
- Hands-on experience with Kubernetes, service mesh, and containerized deployments at scale (Cilium, Calico)
- Infrastructure-as-Code experience (Terraform, Terragrunt) and GitOps (ArgoCD)
- CI/CD pipeline design and orchestration experience (GitHub Actions, ArgoCD, progressive delivery: canary/blue-green)
- Proficiency in scripting/programming for automation (Python, Go, Bash, PowerShell)
- Observability and monitoring experience (Prometheus, Grafana, ELK/EFK, Dynatrace) and incident management
- Knowledge of security, secrets management, and DevSecOps tooling (HashiCorp Vault, AWS Secrets Manager, Azure Key Vault, SysDig, Artifactory X-Ray)
- Networking familiarity (VPC design, ingress/egress, load balancing, CNI plugins)
- Experience with disaster recovery, multi-region/multi-cloud failover and resilience engineering
- Experience leading and mentoring DevOps/Platform engineering teams
- Experience applying FinOps best practices for cloud cost optimization
What We Do
Arch Capital Group Ltd. (Arch Capital or ACGL), a Bermuda public limited liability company, writes insurance and reinsurance on a worldwide basis through operations in Bermuda, the United States, Canada, Europe and Australia, with a focus on specialty lines. Arch Capital Services LLC is owned by ACGL and provides corporate, legal and other support services to Arch Capital. ACGL provides insurance, reinsurance and mortgage insurance on a worldwide basis through operations in Bermuda, the United States, Canada, Europe, Australia and Hong Kong.








