Job Summary
We are seeking a highly experienced Senior / Lead Linux Engineering Support Engineer to lead and develop a small team supporting engineering systems within a fast-paced AI-focused environment.
This role combines deep Linux expertise with strong leadership, automation, and DevOps practices to ensure systems are reliable, scalable, and supportable. A key aspect of the role is establishing and operating within a configuration-as-code environment, where system configuration and operational processes are managed through automation, pipelines, and source control rather than manual administration.
You will be responsible for leading incident response, driving operational improvements, and setting standards for how Linux systems are managed and supported across the organisation.
Although the role includes leadership responsibilities, it will initially require a hands-on approach, including direct involvement in troubleshooting, system support, and automation efforts, alongside building team capability and scaling processes.
Working closely with engineering teams, platform engineers, and infrastructure specialists, you will ensure systems remain stable, performant, and aligned with evolving business and product delivery needs.
The Team
You’ll be joining a multi-disciplinary team with strong technical skills and a very supportive culture. We work closely together, regularly share knowledge, and your skills will make a direct impact on our business. It’s an exciting and pivotal moment for us right now, with plenty of new projects ahead. If you're looking to solve interesting problems and see your work deliver real-world results, this is the team for you.
Responsibilities and Duties
- Lead, mentor, and develop a team of Linux Engineering Support Engineers, establishing clear roles, responsibilities, and ways of working
- Own and oversee support for Linux-based systems and engineering environments, ensuring stability, performance, and availability
- Act as an escalation point for complex technical issues and outages, providing hands-on support where required
- Diagnose and resolve high-impact system and interoperability issues across mixed and distributed environments
- Perform hands-on investigation and troubleshooting to understand issues and drive effective solutions
- Lead incident response activities, including triage, coordination, and resolution
- Own and drive Root Cause Analysis (RCA) processes, ensuring preventative improvements are identified and implemented
- Establish and improve incident management processes, driving operational maturity and reliability
- Drive adoption of automation and configuration-as-code practices across Linux systems
- Ensure system changes are delivered through controlled, auditable processes wherever possible
- Oversee development and implementation of automation solutions for system management and operational tasks
- Promote and enforce use of Git-driven workflows and CI/CD pipelines for configuration and operational processes
- Identify and prioritise opportunities to reduce manual effort through automation and improved tooling
- Work closely with engineering teams to support development environments and system requirements
- Act as a senior technical liaison between engineering teams and infrastructure/platform functions
- Support onboarding of new systems, services, and environments using standardised and automated approaches
- Ensure system configurations remain consistent and aligned with defined standards and governance
- Oversee integration points (e.g. identity, CI/CD, tooling) and ensure issues are resolved effectively
- Identify and drive improvements in system performance, scalability, and maintainability
- Contribute to and enforce documentation, standards, and operational best practices
- Ensure systems meet audit, compliance, and governance requirements, with full traceability of changes
Essential
- Extensive experience administering and supporting Linux-based systems in complex technical or engineering environments
- Strong troubleshooting skills across operating systems, networking, storage, and application layers
- Proven experience diagnosing and resolving complex technical issues, including across mixed or distributed environments
- Proven experience handling major incidents and outages, including leading resolution and contributing to Root Cause Analysis (RCA)
- Strong experience with automation and scripting (e.g. Bash, Python, or similar)
- Strong experience with configuration management or infrastructure-as-code tools (e.g. Ansible, Terraform, Puppet, or similar)
- Experience working with configuration-as-code practices and Git-driven workflows
- Experience designing, implementing, or supporting CI/CD pipelines for configuration and operational processes
- Strong understanding of system interoperability across distributed environments
- Experience working within defined standards, governance frameworks, and controlled processes
- Strong communication skills and ability to work closely with engineering, platform, and infrastructure teams
- Experience mentoring or supporting the development of other engineers
- Ability to operate effectively across time zones in a distributed organisation
- Proven ability to operate independently, set direction, and deliver outcomes
Desirable
- Experience leading or coordinating incident response activities
- Experience working alongside DevOps, platform, or infrastructure engineering teams
- Experience with monitoring, observability, and logging systems
- Experience supporting AI/ML or high-performance computing environments
- Understanding of identity and access management concepts
- Experience building or scaling operational processes or support functions
- Experience administering and supporting Linux-based systems in a technical or engineering environment
Skills Required
- Verification experience in relevant industry
- Proven leadership and planning skills
- Experience of the verification process applied in CPU and/or ASIC environments
- Ability to work across teams and programming languages
Graphcore Compensation & Benefits Highlights
- Healthcare Strength: Health coverage includes medical and dental insurance, with US plans through Cigna and Kaiser, HDHP options with employer‑funded HSA contributions, a health cash plan, EAP access, and dedicated mental‑health support. These provisions extend to family options in some regions, reinforcing broad medical and wellbeing support.
- Retirement Support: Retirement programs include a UK pension match up to 5% and a US 401(k) with a 100% company match up to 6% (with a true‑up). This pairing signals strong, predictable long‑term savings support across key locations.
- Leave & Time Off Breadth: Time‑off policies feature “unlimited” holiday in the UK and flexible, generous PTO with paid US holidays. Paid family leave for birthing parents and bonding further broadens time‑away support.
Graphcore Insights
What We Do
At Graphcore, we’re building the future of AI compute. We’re a team of semiconductor, software and AI experts, with deep experience in creating the complete AI compute stack - from silicon and software to infrastructure at datacenter scale. As part of the SoftBank Group, backed by significant long-term investment, we are delivering key technology into the fast-growing SoftBank AI ecosystem. To meet the vast and exciting AI opportunity, Graphcore is expanding its teams around the world. We are bringing together the brightest minds to solve the toughest problems, in a place where everyone has the opportunity to make an impact on the company, our products and the future of artificial intelligence.
Why Work With Us
Our team is at the forefront of the machine intelligence revolution, enabling innovators from all industries to build AI-native products to expand human potential. What we do at Graphcore really makes a difference.
Graphcore Offices
Hybrid Workspace
Employees engage in a combination of remote and on-site work.
At Graphcore, we value wellbeing and flexibility to support a healthy work/life balance. Our hybrid approach encourages office-based colleagues to work onsite three days a week, with flexibility built on trust and transparency for everyone.





