Platform Engineer, AI

SUSE

30 Locations

In-Office or Remote

Senior level

Software

The Role

Hands-on platform engineering role building and operating an internal Agentic AI platform: implement infrastructure-as-code, security and secrets management, GitOps delivery, observability, autoscaling, API/AI gateway routing, and run day-to-day operations including incident response, backups, and runbooks.

Summary Generated by Built In

About Us

SUSE is a global leader in enterprise open source software. By transforming community innovations into secure, sovereign and AI-ready solutions, SUSE empowers customers to escape vendor lock-in and regain control of their IT destiny. Through industry-leading Linux, Kubernetes, Edge and AI infrastructure solutions, SUSE delivers the flexibility to innovate everywhere, from the data center to multi-cloud and out to the edge. Only SUSE manages multiple Linux and Kubernetes distributions. At SUSE, Choice Happens because we prioritize community, interoperability and relentless innovation. Discover how we power mission-critical resilience at www.suse.com.

Job Description

About the Role

SUSE Internal IT is hiring a Platform Engineer to join the Shared Services team. Your primary focus will be building and operating our internal Agentic AI Platform as one of two engineers who share end-to-end ownership of it. As priorities require, the role also flexes into the team's wider Shared Services portfolio (source control, CI/CD, monitoring, backup, and the other systems the team runs), including shared operational rotations. This role reports to the Senior Platform Engineering Manager, Shared Services.

This is a hands-on engineering role. You will be equally responsible for building new capabilities, keeping systems operationally healthy, and maintaining the infrastructure-as-code and documentation that underpin them, both on the Agentic AI Platform and, when called on, elsewhere in Shared Services. On the platform specifically, you'll pair with one other engineer, sharing ownership, peer-reviewing each other's work, and developing complementary depth over time.

The platform is in active delivery. You will join at a point where the core infrastructure is running and the next phase of security hardening, automation, and observability is under way. There is meaningful work to deliver from day one.

Key Responsibilities:

You'll build out the platform's foundational layers: security and secrets management, policy enforcement, observability, delivery automation, workload scaling, and AI traffic routing. Rather than working against a fixed feature list, you will translate architectural designs into production-grade infrastructure-as-code.

Day to day, you own the platform's operational health: monitoring, incident response, root-cause analysis, and remediation. This includes standing up and operating local LLM inference capability (e.g. vLLM on GPU nodes) so internal workloads can run inference on-prem. You'll manage secrets rotation, certificate lifecycle, and identity configuration as ongoing responsibilities, and participate in planned high-stakes procedures such as secrets infrastructure initialisation and rotation events.

Beyond the platform itself, you'll contribute engineering and review capacity to Shared Services more broadly and take part in the team's shared operational and on-call rotations. You're expected to keep your infrastructure-as-code versioned and peer-reviewed, proactively chase down technical debt, maintain runbooks and documentation so any team member can operate what you've built, and peer-review your platform counterpart's changes in turn.

Required Skills:

Candidates will need to demonstrate hands-on production delivery experience, not just conceptual familiarity. At interview, we expect evidence of real delivery in each of these areas.

Kubernetes – production cluster operation (RKE2, EKS, GKE, or equivalent); Helm, RBAC design, multi-namespace workload management
Secrets management – production deployment of a secrets management platform (HashiCorp Vault or equivalent), covering PKI, dynamic credentials, and workload secrets injection
Policy-as-code – admission control policy authoring and enforcement in production Kubernetes environments (OPA/Rego, Kyverno, or equivalent)
GitOps – Fleet, ArgoCD, Flux, or equivalent at production scale; declarative drift reconciliation, rollback strategy, multi-environment targeting
Observability stack – log aggregation, log pipeline design, distributed tracing (OpenTelemetry or equivalent), and metrics dashboards (Prometheus/Grafana or equivalent)
API/AI gateway and model serving – production deployment of an API or AI gateway (Kong, Envoy, or equivalent) and local LLM inference serving (vLLM or equivalent), including GPU-aware scheduling
Linux platform engineering – networking fundamentals, TLS and PKI, CSI storage operations, container runtime

If this role is filled in Italy, the expected Total Target Compensation (“TTC”) range is between 58,000 EUR and 78,000 EUR gross annually. The TTC includes both the annual base salary and target corporate bonus opportunity, which is typically paid quarterly, as well as access to an attractive benefits package.
Actual compensation will be determined based on objective, non-discriminatory criteria including experience, skills, qualifications, geographical location, internal equity, and budget considerations. Bonus payments are subject to the terms of the applicable bonus plan and company policies. Please note that this compensation information is applicable to roles hired in Italy only.

This position is subject to a background check(s), including criminal, credit, and/or employment references. The candidate is required to complete the background check(s) once an offer has been accepted. This will be conducted by SUSE’s external provider, where legally permitted.

Job

Information Technology

What We Offer

We empower you to be bold, driving your career to create the future you want. We celebrate and reward your achievements. 

SUSE is a dynamic environment that is evolving rapidly, thus requiring agility, strong entrepreneurship and an open mind.

This is a compelling opportunity for the right person to join us as we continue to scale and prosper.  

If you’re a big thinker, obsessed with execution, and thrive in a dynamic environment in which you can tangibly create a lasting legacy, then please apply now!

We give you the freedom to be yourself. You will work in a global community of unique individuals – like you – with different backgrounds, talents, skills and perspectives. A truly open community where everyone is welcome, has a voice and is encouraged to reach their full potential regardless of age, gender, race, nationality, disability, sexual orientation, religion, or any other characteristics.  

Sounds like the right fit for you? Click Apply to submit your resume. A recruiter will contact you if your skills match our current or any future positions. In the meantime, stay updated on the latest SUSE news and job vacancies by joining our Talent Community.

SUSE Values

SUSE's culture is centered on four key values (Choice, Community, Trust, and Innovation), which are deeply integrated with our open source ethos. SUSE fosters a diverse and inclusive environment where our people are encouraged to be themselves.

Choice:

We are continuously making choice happen
We are accountable for our choices
We never get complacent

Community:

Nobody is smarter than everybody
We embrace diversity of thought
We are “open source first, upstream first” where collaboration benefits all

Trust:

We are trusted to deliver with integrity
We offer trust by default, and do not wait for it to be earned
We foster an environment where everyone trusts each other

Innovation:

We foster a culture of experimentation, and embrace change by challenging the norm
We are committed to continuous improvement, creativity and adaptability
Ideas are great, but without execution they are just ideas

Skills Required

Production Kubernetes cluster operation (RKE2, EKS, GKE or equivalent)
Helm, RBAC design, and multi-namespace workload management
Production deployment and operation of a secrets management platform (HashiCorp Vault or equivalent) including PKI and dynamic credentials
Policy-as-code authoring and enforcement in Kubernetes (OPA/Rego, Kyverno or equivalent)
GitOps at production scale (Fleet, ArgoCD, Flux or equivalent) with drift reconciliation and rollback strategies
Observability stack design and operation: log aggregation, log pipeline design, distributed tracing (OpenTelemetry), and Prometheus/Grafana metrics and dashboards
API/AI gateway engineering and operation (Kong, Envoy or equivalent) including route management and rate limiting
Linux platform engineering: networking fundamentals, TLS and PKI, CSI storage operations, and container runtime experience
Infrastructure-as-code development and maintenance with version-controlled configurations and peer review
Operational experience: incident response, root-cause analysis, backups, failover testing, capacity management and runbook creation
Secrets rotation, certificate lifecycle management, policy drift detection, and identity configuration operations
Design and implementation of workload autoscaling for AI workflow workers, balancing cost and performance
Experience implementing governed AI model routing/gateway layers with auditable routing and per-consumer rate limiting
Demonstrable hands-on production delivery experience across the listed areas (not just conceptual familiarity)

The Company

HQ: Nuremberg

2,483 Employees

Year Founded: 1992

What We Do

SUSE is a global leader in innovative, reliable and secure enterprise open source solutions, including SUSE Linux Enterprise (SLE), Rancher and NeuVector. More than 60% of the Fortune 500 rely on SUSE to power their mission-critical workloads, enabling them to innovate everywhere – from the data center to the cloud, to the edge and beyond. SUSE puts the “open” back in open source, collaborating with partners and communities to give customers the agility to tackle innovation challenges today and the freedom to evolve their strategy and solutions tomorrow. For more information, visit www.suse.com.