Site Reliability Engineering (SRE) Architect

Dallas, TX, USA
In-Office
Expert/Leader
Information Technology
The Role
Design and architect highly available OSS/BSS and mainframe systems using SRE principles. Lead reliability, observability, automation, disaster recovery, incident management, and cross-functional transformations across hybrid cloud and on‑prem environments for telecom operations.

Position Title: Site Reliability Engineering (SRE) Architect (Telecom OSS/BSS & Mainframe)

Location: Dallas, TX; Basking Ridge, NJ; NC; and Tampa, FL.

Work Arrangement: Hybrid/Onsite

Interview Type: Video

Must have:

  • 15+ years of progressive experience in enterprise IT and telecommunications environments, with extensive expertise in designing, implementing, and supporting complex OSS/BSS ecosystems that enable large-scale business and network operations.

  • 8+ years of hands-on architecture experience across IBM Mainframe z/OS and midrange platforms (Linux/Solaris), delivering scalable, secure, and highly available enterprise solutions.

  • Demonstrated expertise in Site Reliability Engineering (SRE) principles, including defining and managing Service Level Objectives (SLOs), Service Level Indicators (SLIs), Error Budgets, reliability governance, and continuous service improvement.

  • Deep functional and technical knowledge of Telcordia OSS applications, including SWITCH, TIRKS, FACS, WFA, and SOAC, with experience integrating and optimizing telecom operational support systems.

  • Proven ability to design and implement high-availability, fault-tolerant, resilient, and disaster recovery architectures, ensuring business continuity and mission-critical system reliability.

  • Strong hands-on expertise with IBM Mainframe technologies, including z/OS internals, JCL, IMS, VSAM, DB2, CICS, system utilities, workload management, performance tuning, and production diagnostics.

  • Extensive experience implementing observability and monitoring solutions using industry-leading tools such as Splunk, Dynatrace, Instana, IBM NetCool, Grafana, and AppDynamics to improve operational visibility and proactive incident detection.

  • Proven success in driving automation, self-healing capabilities, infrastructure as code, CI/CD reliability practices, and DevOps/SRE transformation across hybrid cloud and on-premises enterprise environments.

  • Strong understanding of end-to-end telecommunications business processes, including service provisioning, inventory management, order management, activation, network fulfillment, service assurance, and lifecycle management.

  • Extensive experience leading major incident management, conducting Root Cause Analysis (RCA), problem management, and implementing preventive measures to significantly improve MTTD (Mean Time to Detect), MTTR (Mean Time to Resolve), system stability, and operational excellence.

  • Proven ability to collaborate with cross-functional teams including Enterprise Architecture, Infrastructure, Development, Operations, Network Engineering, and business stakeholders to deliver highly reliable, business-critical technology solutions.

  • Excellent leadership, stakeholder management, and communication skills, with a strong track record of mentoring technical teams, driving reliability engineering best practices, and supporting large-scale enterprise transformation initiatives.

About Us

At Radiant Digital, we provide IT solutions and consulting services to help government agencies and businesses in the USA, Canada, the Middle East, and Southeast Asia. On the federal side, we support agencies like NASA, the Department of State (DOS), the IRS, ACL, ACF, USDA, and many others, along with numerous state and local government agencies.

We work with industries like telecom, healthcare, entertainment, and oil and gas, offering solutions designed to meet their specific needs. We focus on improving systems, making better use of data, and updating applications to keep up with changing markets.

Skills Required

  • 15+ years enterprise IT and telecommunications experience designing, implementing, and supporting complex OSS/BSS ecosystems
  • 8+ years architecture experience across IBM Mainframe z/OS and midrange platforms (Linux/Solaris)
  • Expertise in Site Reliability Engineering principles including SLOs, SLIs, Error Budgets, and reliability governance
  • Deep functional and technical knowledge of Telcordia OSS applications: SWITCH, TIRKS, FACS, WFA, SOAC
  • Design and implement high-availability, fault-tolerant, resilient, and disaster recovery architectures
  • Strong hands-on expertise with IBM Mainframe technologies: z/OS internals, JCL, IMS, VSAM, DB2, CICS, system utilities, workload management, performance tuning, production diagnostics
  • Experience implementing observability and monitoring with Splunk, Dynatrace, Instana, IBM NetCool, Grafana, AppDynamics
  • Proven experience driving automation, self-healing, Infrastructure as Code, CI/CD reliability practices, and DevOps/SRE transformations
  • Strong understanding of telecom business processes: service provisioning, inventory, order management, activation, fulfillment, assurance, lifecycle management
  • Extensive experience leading major incident management, RCA, problem management to improve MTTD/MTTR and system stability
  • Ability to collaborate with Enterprise Architecture, Infrastructure, Development, Operations, Network Engineering, and business stakeholders
  • Excellent leadership, stakeholder management, mentoring, and communication skills to drive reliability engineering best practices

The Company
HQ: Vienna, VA
139 Employees
Year Founded: 2000

What We Do

We deliver meaningful and measurable technology solutions for digital transformation.
