Engineer, SRE GenAI

3 Locations
In-Office
93K-167K Annually
Mid level
Other • Utilities
The Role
As an SRE Engineer for AI Systems, you'll ensure the reliability and performance of AI platforms, supporting operations and monitoring system health, while participating in on-call rotations.

At T-Mobile, we invest in YOU!  Our Total Rewards Package ensures that employees get the same big love we give our customers.  All team members receive a competitive base salary and compensation package - this is Total Rewards. Employees enjoy multiple wealth-building opportunities through our annual stock grant, employee stock purchase plan, 401(k), and access to free, year-round money coaches. That’s how we’re UNSTOPPABLE for our employees!

Job Overview
As an Engineer in Site Reliability Engineering (SRE) for AI Systems, you will help ensure the reliability, scalability, and performance of AI platforms. This role includes participating in on-call rotations, improving system observability, and supporting operations across cloud-native infrastructure.
This is a hands-on role ideal for someone with foundational SRE skills and a growth mindset to expand in GenAI and LLM infrastructure operations.
We pride ourselves on encouraging a culture of innovation, advocating for agile methodologies, and promoting transparency in all that we do. Join us in embodying the spirit of the 'Un-carrier' and make a tangible impact! Our team is dynamic, no two days are the same, and we are diverse, inclusive, and passionate about growth and transformation. If you're up to the challenge, apply today!

Job Responsibilities:
  • Participate in on-call rotations to support AI platforms and respond to production incidents with urgency and precision. 

  • Monitor system health and performance using tools like Grafana, Splunk, and Power BI.

  • Support cloud-native infrastructure deployments, with a focus on Azure (primary), and exposure to AWS or GCP. 

  • Implement runbooks and automate repetitive operational tasks to reduce toil. 

  • Support CI/CD pipelines and IaC deployments using GitLab pipelines and Databricks.

  • Assist in the development and enforcement of Service Level Objectives (SLOs) and real-time alerts for AI APIs and services. 

  • Collaborate with senior engineers to improve platform reliability and scale LLM-based applications. 

Education and Work Experience:

  • Bachelor's degree in Computer Science, Engineering, or a related field (Required)

  • 2–4 years of experience in DevOps, SRE, or cloud platform engineering. 

  • Hands-on experience with monitoring/logging systems such as Prometheus, Grafana, Splunk, or OpenSearch. 

  • Familiarity with cloud environments (preferably Azure; AWS/GCP a plus). 

  • Experience in scripting or automation using Python, Bash, or PowerShell. 

  • Basic understanding of containerization (Docker, Kubernetes) and CI/CD concepts. 

  • Willingness to participate in an on-call schedule and incident resolution. 

  • Strong problem-solving and root cause analysis skills.

Preferred Qualifications:
  • Exposure to AI/ML infrastructure or LLM-based systems (e.g., OpenAI, ChatGPT, Azure OpenAI). 

  • Experience with infrastructure-as-code tools like Terraform or ARM templates. 

  • Familiarity with LLM observability or API token usage metrics. 

  • Passion for learning AI reliability practices and collaborating with cross-functional teams. 

Knowledge, Skills and Abilities:

  • Communication (Required)
  • Customer Service (Required)
  • Analytics (Required)
  • Technical Writing (Required)

    • At least 18 years of age
    • Legally authorized to work in the United States

    Travel:
    Travel Required (Yes/No): Yes
    DOT Regulated:
    DOT Regulated Position (Yes/No): No
    Safety Sensitive Position (Yes/No): No

    Base Pay Range: $92,500 - $166,800

    Corporate Bonus Target: 15%

    The pay range above is the general base pay range for a successful candidate in the role. The successful candidate’s actual pay will be based on various factors, such as work location, qualifications, and experience, so the actual starting pay will vary within this range.

    At T-Mobile, employees in regular, non-temporary roles are eligible for an annual bonus or periodic sales incentive or bonus, based on their role. Most Corporate employees are eligible for a year-end bonus based on company and/or individual performance, which is set at a percentage of the employee’s eligible earnings in the prior year. Certain positions in Customer Care are eligible for monthly bonuses based on individual and/or team performance. To find the pay range for this role based on hiring location, visit https://paylookup.t-mobile.com/paylookup?reqID=REQ321838&paradox=1

    At T-Mobile, our benefits exemplify the spirit of One Team, Together! A big part of how we care for one another is working to ensure our benefits evolve to meet the needs of our team members. Full and part-time employees have access to the same benefits when eligible. We cover all of the bases, offering medical, dental and vision insurance, a flexible spending account, 401(k), employee stock grants, employee stock purchase plan, paid time off and up to 12 paid holidays - which total about 4 weeks for new full-time employees and about 2.5 weeks for new part-time employees annually - paid parental and family leave, family building benefits, back-up care, enhanced family support, childcare subsidy, tuition assistance, college coaching, short- and long-term disability, voluntary AD&D coverage, voluntary accident coverage, voluntary life insurance, voluntary disability insurance, and voluntary long-term care insurance. We don't stop there - eligible employees can also receive mobile service & home internet discounts, pet insurance, and access to commuter and transit programs! To learn about T-Mobile’s amazing benefits, check out www.t-mobilebenefits.com.

    Never stop growing!
    As part of the T-Mobile team, you know the Un-carrier doesn’t have a corporate ladder–it’s more like a jungle gym of possibilities! We love helping our employees grow in their careers, because it’s that shared drive to aim high that drives our business and our culture forward. By applying for this career opportunity, you’re living our values while investing in your career growth–and we applaud it. You’re unstoppable!
    T-Mobile USA, Inc. is an Equal Opportunity Employer. All decisions concerning the employment relationship will be made without regard to age, race, ethnicity, color, religion, creed, sex, sexual orientation, gender identity or expression, national origin, religious affiliation, marital status, citizenship status, veteran status, the presence of any physical or mental disability, or any other status or characteristic protected by federal, state, or local law. Discrimination, retaliation or harassment based upon any of these factors is wholly inconsistent with how we do business and will not be tolerated.
    Talent comes in all forms at the Un-carrier. If you are an individual with a disability and need reasonable accommodation at any point in the application or interview process, please let us know by emailing [email protected] or calling 1-844-873-9500. Please note, this contact channel is not a means to apply for or inquire about a position and we are unable to respond to non-accommodation related requests.

    Top Skills

    ARM Templates
    AWS
    Azure
    Bash
    Databricks
    Docker
    GCP
    GitLab
    Grafana
    Kubernetes
    Power BI
    PowerShell
    Python
    Splunk
    Terraform

    The Company
    HQ: Bellevue, WA
    89,016 Employees

    What We Do

    T-Mobile U.S. Inc. (NASDAQ: TMUS) is America’s supercharged Un-carrier, delivering an advanced 4G LTE and transformative nationwide 5G network that will offer reliable connectivity for all. T-Mobile’s customers benefit from its unmatched combination of value and quality, unwavering obsession with offering them the best possible service experience and undisputable drive for disruption that creates competition and innovation in wireless and beyond. Based in Bellevue, Wash., T-Mobile provides services through its subsidiaries and operates its flagship brands, T-Mobile, Metro by T-Mobile and Sprint.
