Agentic AI Solutions Engineer

28 Locations
Remote
Senior level
Software
The Role
Hands-on platform engineering role building and operating an internal Agentic AI platform: implement infrastructure-as-code, security and secrets management, GitOps delivery, observability, autoscaling, API/AI gateway routing, and run day-to-day operations including incident response, backups, and runbooks.
Summary Generated by Built In

About Us


SUSE is a global leader in enterprise open source software. By transforming community innovations into secure, sovereign and AI-ready solutions, SUSE empowers customers to escape vendor lock-in and regain control of their IT destiny. Through industry-leading Linux, Kubernetes, Edge and AI infrastructure solutions, SUSE delivers the flexibility to innovate everywhere, from the data center to multi-cloud and out to the edge. Only SUSE manages multiple Linux and Kubernetes distributions. At SUSE, Choice Happens because we prioritize community, interoperability and relentless innovation. Discover how we power mission-critical resilience at www.suse.com.

Job Description

   

About the Role
SUSE Internal IT is hiring a Senior Platform Engineer to join the team building and operating our internal Agentic AI Platform.
This is a hands-on engineering role. You will be equally responsible for building new platform capabilities, keeping the platform operationally healthy, and maintaining the infrastructure-as-code and documentation that underpin it. You will work in a pair with one other engineer: sharing ownership of the full platform, peer-reviewing each other's work, and developing complementary depth across the stack over time.
The platform is in active delivery. You will join at a point where the core infrastructure is running and the next phase of security hardening, automation, and observability is under way. There is meaningful work to deliver from day one.
Key Responsibilities
Build
  • Implement new platform capabilities from architectural designs, translating security, governance, and infrastructure requirements into production-grade infrastructure-as-code
  • Design and build the platform security and secrets management layer, ensuring all workloads operate with least-privilege credentials and certificates issued through a governed PKI hierarchy
  • Implement and enforce security policy across the cluster using admission control, covering workload configuration, image standards, network traffic, and resource constraints
  • Build and establish the platform observability stack, providing consistent log aggregation, metrics, distributed tracing, and alerting across all platform components
  • Design and implement GitOps delivery automation, ensuring all platform changes flow through version-controlled, auditable pipelines with drift reconciliation
  • Build and configure workload autoscaling, ensuring AI workflow workers scale efficiently and cost-effectively in response to demand
  • Implement the AI model routing and gateway layer, enabling governed, auditable routing of model traffic with per-consumer rate limiting

Operate
  • Own the day-to-day operational health of the platform: monitor for issues, respond to incidents, conduct root-cause analysis, and implement lasting remediation
  • Maintain the health of platform data services — database cluster, job queue, and object storage — including backup schedules, failover testing, and capacity management
  • Monitor and tune autoscaling and resource configuration as workload patterns evolve, ensuring the platform scales responsively without over-provisioning
  • Manage secrets rotation, certificate lifecycle, policy drift detection, and identity configuration as ongoing operational responsibilities
  • Participate in planned high-stakes operational procedures — such as secrets infrastructure initialisation and rotation events — applying disciplined, documented execution

Maintain
  • Own and evolve the infrastructure-as-code for your areas of the platform; keep all configurations versioned, peer-reviewed, and aligned with the architectural design
  • Proactively identify and resolve technical debt — manual processes, undocumented configurations, legacy credential management, and gaps in observability coverage
  • Produce and maintain operational runbooks for all platform procedures, ensuring any team member can execute them safely and independently
  • Peer-review all platform infrastructure changes produced by your engineering counterpart, providing challenge and quality assurance across the full stack
  • Contribute to platform documentation and knowledge-sharing, supporting the wider team's understanding of the platform as it matures
Required Skills:

Candidates will need to demonstrate hands-on production delivery experience, not just conceptual familiarity. We expect evidence of real delivery against each of these areas at interview.
 
  • Kubernetes — production cluster operation (RKE2, EKS, GKE, or equivalent); Helm, RBAC design, multi-namespace workload management
  • Secrets management — production deployment of a secrets management platform (HashiCorp Vault or equivalent), covering PKI, dynamic credentials, and workload secrets injection
  • Policy-as-code — admission control policy authoring and enforcement in production Kubernetes environments (OPA/Rego, Kyverno, or equivalent)
  • GitOps — Fleet, ArgoCD, Flux, or equivalent at production scale; declarative drift reconciliation, rollback strategy, multi-environment targeting
  • Observability stack — log aggregation, log pipeline design, distributed tracing (OpenTelemetry or equivalent), and metrics dashboards (Prometheus/Grafana or equivalent)
  • API gateway engineering — production deployment and operation of an API or AI gateway (Kong, Envoy, or equivalent); rate limiting, plugin/policy authoring, route management
  • Linux platform engineering — networking fundamentals, TLS and PKI, CSI storage operations, container runtime

Job

Information Technology

What We Offer 

We empower you to be bold, driving your career to create the future you want. We celebrate and reward your achievements. 

SUSE is a dynamic environment that is evolving rapidly, thus requiring agility, strong entrepreneurship and an open mind. 

This is a compelling opportunity for the right person to join us as we continue to scale and prosper. 

If you’re a big thinker, obsessed with execution, and thrive in a dynamic environment in which you can tangibly create a lasting legacy, then please apply now! 

We give you the freedom to be yourself. You will work in a global community of unique individuals – like you – with different backgrounds, talents, skills and perspectives. A truly open community where everyone is welcome, has a voice and is encouraged to reach their full potential regardless of age, gender, race, nationality, disability, sexual orientation, religion, or any other characteristics. 

Sounds like the right fit for you? Click Apply to submit your resume. A recruiter will contact you if your skills match our current or any future positions. In the meantime, stay updated on the latest SUSE news and job vacancies by joining our Talent Community.


SUSE Values 

SUSE's culture is centered on four key values (Choice, Community, Trust, and Innovation), which are deeply integrated with our open source ethos. SUSE fosters a diverse and inclusive environment where our people are encouraged to be themselves.


Choice: 

We are continuously making choice happen 

We are accountable for our choices 

We never get complacent


Community: 

Nobody is smarter than everybody 

We embrace diversity of thought 

We are “open source first, upstream first” where collaboration benefits all


Trust: 

We are trusted to deliver with integrity 

We offer trust by default, and do not wait for it to be earned

We foster an environment where everyone trusts each other 


Innovation: 

We foster a culture of experimentation, and embrace change by challenging the norm

We are committed to continuous improvement, creativity and adaptability 

Ideas are great, but without execution they are just ideas

Skills Required

  • Production Kubernetes cluster operation (RKE2, EKS, GKE or equivalent)
  • Helm, RBAC design, and multi-namespace workload management
  • Production deployment and operation of a secrets management platform (HashiCorp Vault or equivalent) including PKI and dynamic credentials
  • Policy-as-code authoring and enforcement in Kubernetes (OPA/Rego, Kyverno or equivalent)
  • GitOps at production scale (Fleet, ArgoCD, Flux or equivalent) with drift reconciliation and rollback strategies
  • Design and operation of an observability stack: log aggregation, log pipeline design, distributed tracing (OpenTelemetry), Prometheus/Grafana metrics and dashboards
  • API/AI gateway engineering and operation (Kong, Envoy or equivalent) including route management and rate limiting
  • Linux platform engineering: networking fundamentals, TLS and PKI, CSI storage operations, and container runtime experience
  • Infrastructure-as-code development and maintenance with version-controlled configurations and peer review
  • Operational experience: incident response, root-cause analysis, backups, failover testing, capacity management and runbook creation
  • Secrets rotation, certificate lifecycle management, policy drift detection, and identity configuration operations
  • Design and implement workload autoscaling for AI workflow workers to balance cost and performance
  • Experience implementing governed AI model routing/gateway layers with auditable routing and per-consumer rate limiting
  • Demonstrable hands-on production delivery experience across the listed areas (not just conceptual familiarity)

The Company
HQ: Nuremberg
2,483 Employees
Year Founded: 1992

What We Do

SUSE is a global leader in innovative, reliable and secure enterprise open source solutions, including SUSE Linux Enterprise (SLE), Rancher and NeuVector. More than 60% of the Fortune 500 rely on SUSE to power their mission-critical workloads, enabling them to innovate everywhere – from the data center to the cloud, to the edge and beyond. SUSE puts the “open” back in open source, collaborating with partners and communities to give customers the agility to tackle innovation challenges today and the freedom to evolve their strategy and solutions tomorrow. For more information, visit www.suse.com
