Lead Site Reliability Engineer

Reposted 6 Days Ago
Hiring Remotely in United States
Remote
170K-200K Annually
Senior level
Software
The Role
Lead SRE to define SRE strategy, architecture, and roadmap; design and operate containerized, compliant cloud environments; build observability, incident management, automation, and developer platform capabilities; mentor SRE team and collaborate with security, compliance, and product teams to ensure reliability at scale.
Summary Generated by Built In
Mattermost is the leading collaborative workflow platform for defense, intelligence, security, and critical infrastructure. Trusted by the U.S. Department of War and Fortune 500s, our platform runs on-premises and in private clouds, delivering secure messaging, file sharing, workflow automation, audio/screenshare, and project management—all with full data and operational control. Mattermost powers high-stakes workflows across mission planning, real-time, real-world operations, DevSecOps, incident response, and cyber defense—enabling secure collaboration from tactical edge and DDIL environments to enterprise HQ. Teams operate across web, desktop, and mobile, with embedded interoperability for Microsoft Teams, Outlook, and Microsoft 365.
To learn more, visit www.mattermost.com

Mattermost is seeking an experienced and visionary Lead Site Reliability Engineer (SRE) to guide the architecture, reliability, and operational excellence of the infrastructure powering our secure, mission-critical collaboration platform. 

In this role, you will provide technical leadership across our SRE function, driving strategic initiatives for scalability, observability, performance, and automation across cloud and hybrid environments. You will mentor engineers, establish best practices, and collaborate closely with development, security, and operations teams to ensure our customers in defense, government, and critical infrastructure sectors experience exceptional reliability and performance. 

Responsibilities Include:

  • Define the strategy, architecture, and roadmap for Mattermost’s site reliability engineering function, aligning infrastructure initiatives with product and business goals. 
  • Lead the design, deployment, and optimization of production-grade containerized workloads, infrastructure-as-code, and compliant cloud environments for regulated domains (e.g., FedRAMP, DoD). 
  • Establish and evolve observability, monitoring, and alerting frameworks to ensure performance, reliability, and capacity planning at scale. 
  • Drive incident management processes, including on-call rotations, root cause analysis, and systemic reliability improvements. 
  • Partner with security and compliance teams to meet data sovereignty, security, and regulatory requirements. 
  • Champion automation and operational excellence to improve efficiency, reduce risk, and scale operations. 
  • Oversee cloud cost management and capacity planning to optimize infrastructure spending while meeting performance targets. 
  • Build and maintain a developer platform that enables fast, secure software delivery and improves application stability in production. 
  • Mentor and coach SRE team members, fostering a culture of learning, collaboration, and technical excellence. 

Requirements:

  • BS in Computer Science, Cybersecurity, Software Engineering, or a related technical field, or equivalent experience, with 5+ years of relevant experience in site reliability engineering, DevOps, or cloud infrastructure roles. 
  • Proven expertise in container orchestration platforms, ideally Kubernetes. 
  • Extensive experience with infrastructure-as-code, ideally Terraform. 
  • Strong background in cloud platforms, ideally AWS. 
  • Demonstrated experience designing and implementing monitoring, alerting, and performance optimization strategies. 
  • Exceptional troubleshooting and incident management skills for distributed systems. 
  • Proficiency in at least one scripting or programming language for automation. 
  • Excellent communication skills with a track record of influencing cross-functional teams. 
  • Experience leading globally distributed teams in a remote-first environment. 

Preferences:

  • Familiarity with observability stacks such as Grafana and Prometheus. 
  • Experience designing high-availability, disaster recovery, and scaling architectures. 
  • Exposure to GCP and Azure cloud environments. 
  • Leadership experience in highly regulated industries such as defense, finance, or critical infrastructure. 
  • Experience with U.S. federal compliance frameworks and authorization processes, including FedRAMP, DoD ATO, NIST 800-53, and related government standards. 
  • Experience preparing, delivering, and maintaining software offerings through AWS Marketplace and other cloud provider marketplaces (e.g., Azure Marketplace, Google Cloud Marketplace), including packaging, compliance validation, and ongoing operational support. 
  • Open-source contributions in reliability, DevOps, or infrastructure tooling. 
  • Certifications in cloud infrastructure, reliability, or DevOps engineering (e.g., CKA, CKAD, AWS Certified Solutions Architect). 

Compensation 

Salary range: $145,000 – $200,000

Mattermost takes a market-based approach to pay. Compensation is determined based on skills, experience, qualifications, and work location. Ranges may be updated as market conditions evolve.

U.S. Eligibility & Compliance

This role may require obtaining and maintaining a U.S. government security clearance. Candidates must meet federal eligibility requirements to be considered. For more information, see the U.S. Department of State's Security Clearances page.

Applicants must meet eligibility requirements for access to export-controlled information as defined by U.S. export control laws, including EAR and ITAR. For more information visit the Bureau of Industry and Security and the Directorate of Defense Trade Controls.


 

Mattermost is an EEO employer and a remote-first, open-source company.
 
We are continually working to expand our hiring into more countries and regions. Ensuring compliance with local laws and regulations takes time.
 
Mattermost values your unique perspective—we welcome all applicants. We encourage individuals from all backgrounds to apply and are committed to assessing candidates based on their skills and qualifications. We do not tolerate discrimination against staff or applicants based on race, religion, national origin, age, disability, pregnancy status, veteran status, or other personal characteristics.
 
If you require accommodations during the interview process, please let us know—we’re happy to assist.

Skills Required

  • BS in Computer Science, Cybersecurity, Software Engineering, or related technical field, or equivalent experience
  • 5+ years relevant experience in site reliability engineering, DevOps, or cloud infrastructure roles
  • Proven expertise in container orchestration platforms (ideally Kubernetes)
  • Extensive experience with infrastructure-as-code (ideally Terraform)
  • Strong background in cloud platforms (ideally AWS)
  • Designing and implementing monitoring, alerting, and performance optimization strategies
  • Exceptional troubleshooting and incident management skills for distributed systems
  • Proficiency in at least one scripting or programming language for automation
  • Excellent communication skills with a track record of influencing cross-functional teams
  • Experience leading globally distributed teams in a remote-first environment
  • Ability to obtain and maintain a U.S. government security clearance in the future (U.S. applicants must be U.S. citizens and eligible)
  • Eligibility for access to export-controlled information per U.S. export control laws (EAR, ITAR)
  • Familiarity with observability stacks such as Grafana and Prometheus
  • Experience designing high-availability, disaster recovery, and scaling architectures
  • Exposure to GCP and Azure cloud environments
  • Leadership experience in highly regulated industries (defense, finance, critical infrastructure)
  • Experience with U.S. federal compliance frameworks and authorization processes (FedRAMP, DoD ATO, NIST 800-53)
  • Experience preparing and maintaining offerings on cloud provider marketplaces (AWS, Azure, Google)
  • Open-source contributions in reliability, DevOps, or infrastructure tooling
  • Certifications in cloud infrastructure, reliability, or DevOps engineering (e.g., CKA, CKAD, AWS Certified Solutions Architect)

The Company
Palo Alto, CA
165 Employees
Year Founded: 2011

What We Do

Mattermost’s mission is to make the world safer and more productive by developing and delivering secure, open source collaboration software that is trusted, flexible and offers fast time-to-value. Mattermost’s first product is a collaboration platform built to accelerate DevOps workflows in high-trust environments by offering secure messaging across web, desktop and native mobile devices. www.mattermost.com
