How often do you get the chance to build a technology that transforms the future of humanity? Graphcore products have set the standard in made-for-AI compute hardware and software, gaining global attention and industry acclaim. Now we are developing the next generation of artificial intelligence compute with systems that will allow AI researchers to develop more advanced models, help scientists unlock exciting new discoveries, and power companies around the world as they put AI at the heart of their business. Graphcore recently joined SoftBank Group – bringing large and ongoing investment from one of the world’s leading backers of innovative AI companies.
Job Summary
As the Engineering Director for Platform Management & Observability, you will be responsible for building, managing and guiding a team of talented engineers focussed on the architecture, implementation and deployment of highly scalable management solutions for AI infrastructure built using our next-generation products. Covering monitoring, observability, control, and data centre infrastructure management, you will work closely with software, cloud and customer-facing teams to establish first-hand knowledge of these solutions, enabling the creation of proof-of-concepts, reference designs and integrations with third-party tooling.
Your team will work closely with product, architecture and other delivery teams to ensure that functionally complete, simple-to-deploy and easy-to-use solutions are deployed internally to support engineering efforts, and that reference designs are supplied to our customers.
Responsibilities and Duties
- Manage a team contributing to all phases of overall product development, from product definition, architecture, and design, through implementation, debugging, testing and early customer support.
- Deliver and manage the operation of an internal management & observability service for use by engineering teams to aid debugging, performance analysis, benchmarking, test/QA, etc. of our systems from system bring-up through customer release, at all scales.
- Foster evaluations of new technologies and innovation to both anticipate future customer needs and develop a strategy for Graphcore data center management solutions.
- Take ownership of rapidly prioritizing team objectives in response to changing business needs.
- Identify and act on opportunities for process improvement within the team, leading initiatives to enhance team efficiency and improve quality.
- Work with product management, other engineering team leads, our customer-facing teams, and internal customers to ensure timely delivery of team deliverables.
- Champion quality by ensuring solutions are continually and thoroughly tested.
- Work with senior management to establish strategic plans and objectives.
- Mentor and guide junior engineers; coach managers and team leads.
- Foster a culture of continuous learning and improvement.
Requirements
- BSc or MSc degree in Computer Engineering, Computer Science, or a related field, or equivalent experience.
- Over 10 years of proven experience managing engineering teams.
- Comfortable working on complex issues where problems are not clearly defined and where fundamental principles do not fully apply.
- Detail-oriented and comfortable multitasking in a dynamic environment with shifting priorities and changing requirements.
- Strong analytical, creative, and problem-solving skills.
- Excellent written and verbal communication skills.
- Experience using Jira and Confluence for project management.
- 14+ years of relevant post-degree experience.
- Familiarity with component technologies such as Prometheus, Grafana, OpenTelemetry, ClickHouse, Kafka and Superset, as well as common integrated stacks such as Elastic Stack, Better Stack and LGTM (Loki, Grafana, Tempo, Mimir).
- Familiarity with commercial observability solutions like Datadog, Dynatrace and Splunk.
In addition to a competitive salary, Graphcore offers flexible working, a generous annual leave policy, private medical insurance and a health cash plan, a dental plan, pension (matched up to 5%), life assurance and income protection. We have a generous parental leave policy and an employee assistance programme (which includes health, mental wellbeing, and bereavement support). We offer a range of healthy food and snacks at our central Bristol office and have our own barista bar! We welcome people of different backgrounds and experiences; we’re committed to building an inclusive work environment that makes Graphcore a great home for everyone. We offer an equal opportunity process and understand that there are visible and invisible differences in all of us. We can provide a flexible approach to interview and encourage you to chat to us if you require any reasonable adjustments.
Applicants for this position must hold the right to work in the UK. Unfortunately, at this time, we are unable to provide visa sponsorship or support for visa applications.
What We Do
At Graphcore, we’re building the future of AI compute.
We’re a team of semiconductor, software and AI experts, with deep experience in creating the complete AI compute stack - from silicon and software to infrastructure at datacenter scale.
As part of the SoftBank Group, backed by significant long-term investment, we are delivering key technology into the fast-growing SoftBank AI ecosystem.
To meet the vast and exciting AI opportunity, Graphcore is expanding its teams around the world.
We are bringing together the brightest minds to solve the toughest problems, in a place where everyone has the opportunity to make an impact on the company, our products and the future of artificial intelligence.
Why Work With Us
Our team is at the forefront of the machine intelligence revolution, enabling innovators from all industries to build AI-native products to expand human potential. What we do at Graphcore really makes a difference.
Hybrid Workspace
Employees engage in a combination of remote and on-site work.
At Graphcore, we value wellbeing and flexibility to support a healthy work/life balance. Our hybrid approach encourages office-based colleagues to work onsite three days a week, with flexibility built on trust and transparency for everyone.