Director, Platform Management & Observability

Sorry, this job was removed at 04:12 p.m. (CST) on Friday, Jul 18, 2025
2 Locations
In-Office
Artificial Intelligence • Semiconductor
Joining Graphcore gives you a seat at the top table, shaping the future of Artificial Intelligence.
The Role
About Graphcore

How often do you get the chance to build a technology that transforms the future of humanity? Graphcore products have set the standard in made-for-AI compute hardware and software, gaining global attention and industry acclaim. Now we are developing the next generation of artificial intelligence compute with systems that will allow AI researchers to develop more advanced models, help scientists unlock exciting new discoveries, and power companies around the world as they put AI at the heart of their business. Graphcore recently joined SoftBank Group – bringing large and ongoing investment from one of the world’s leading backers of innovative AI companies.  

Job Summary 

As the engineering Director for Platform Management & Observability, you will be responsible for building, managing and guiding a team of talented engineers focussed on the architecture, implementation and deployment of highly scalable management solutions for AI infrastructure built using our next-generation products. Covering monitoring, observability, control, and data centre infrastructure management, you will work closely with software, cloud and customer-facing teams to establish first-hand knowledge of these solutions, enabling the creation of proofs of concept, reference designs and integrations with third-party tooling.

Your team will work closely with product, architecture and other delivery teams to ensure that functionally complete, simple-to-deploy and easy-to-use solutions are deployed internally to support engineering efforts and supply reference designs to our customers.

 

Responsibilities and Duties  
  • Manage a team contributing to all phases of overall product development, from product definition, architecture, and design, through implementation, debugging, testing and early customer support. 
  • Deliver and operate an internal management & observability service that engineering teams use for debugging, performance analysis, benchmarking, test/QA and similar work on our systems, from system bring-up through customer release, at all scales.
  • Foster evaluation of new technologies and innovation, both to anticipate future customer needs and to develop a strategy for Graphcore data centre management solutions.
  • Take ownership of rapidly prioritizing team objectives in response to dynamic business objectives. 
  • Identify and act on opportunities for process improvement within the team, leading initiatives to enhance team efficiency and improve quality. 
  • Work with product management, other engineering team leads, our customer-facing teams, and internal customers to ensure timely delivery of team deliverables. 
  • Champion quality by ensuring solutions are continually and thoroughly tested. 
  • Work with senior management to establish strategic plans and objectives.
  • Mentor and guide junior engineers; coach managers and team leads.
  • Foster a culture of continuous learning and improvement. 
Skills and Experience 
  • BSc or MSc degree in Computer Engineering, Computer Science or a related field, or equivalent experience.
  • Over 10 years of proven experience managing engineering teams.
  • Comfortable working on complex issues where problems are not clearly defined and where fundamental principles do not fully apply. 
  • Detail-oriented and comfortable with multitasking in a dynamic environment with shifting priorities and changing requirements. 
  • Strong analytical, creative, and problem-solving skills. 
  • Excellent written and verbal communication skills. 
  • Experience in the use of Jira and Confluence for project management. 
Desirable: 
  • 14+ years of relevant post-degree experience. 
  • Familiarity with component technologies such as Prometheus, Grafana, OpenTelemetry, ClickHouse, Kafka and Superset, in addition to common integrated stacks such as Elastic Stack, Better Stack and LGTM.
  • Familiarity with commercial observability solutions like Datadog, Dynatrace and Splunk. 
Benefits

In addition to a competitive salary, Graphcore offers flexible working, a generous annual leave policy, private medical insurance and health cash plan, a dental plan, pension (matched up to 5%), life assurance and income protection. We have a generous parental leave policy and an employee assistance programme (which includes health, mental wellbeing, and bereavement support). We offer a range of healthy food and snacks at our central Bristol office and have our own barista bar!

We welcome people of different backgrounds and experiences; we’re committed to building an inclusive work environment that makes Graphcore a great home for everyone. We offer an equal opportunity process and understand that there are visible and invisible differences in all of us. We can provide a flexible approach to interview and encourage you to chat to us if you require any reasonable adjustments.

Applicants for this position must hold the right to work in the UK. Unfortunately, we are unable at this time to provide visa sponsorship or support for visa applications.

The Company
HQ: Bristol
488 Employees
Year Founded: 2016

What We Do

At Graphcore, we’re building the future of AI compute.

We’re a team of semiconductor, software and AI experts, with deep experience in creating the complete AI compute stack - from silicon and software to infrastructure at datacenter scale.

As part of the SoftBank Group, backed by significant long-term investment, we are delivering key technology into the fast-growing SoftBank AI ecosystem.

To meet the vast and exciting AI opportunity, Graphcore is expanding its teams around the world.

We are bringing together the brightest minds to solve the toughest problems, in a place where everyone has the opportunity to make an impact on the company, our products and the future of artificial intelligence.

Why Work With Us

Our team is at the forefront of the machine intelligence revolution, enabling innovators from all industries to build AI-native products to expand human potential. What we do at Graphcore really makes a difference.

Graphcore Offices

Hybrid Workspace

Employees engage in a combination of remote and on-site work.

At Graphcore, we value wellbeing and flexibility to support a healthy work/life balance. Our hybrid approach encourages office-based colleagues to work onsite three days a week, with flexibility built on trust and transparency for everyone.

Typical time on-site: 3 days a week
Headquarters
Austin Office
Bengaluru Office
Cambridge Office
Gdańsk Office
Hsinchu Office
London Office