Senior Cloud Site Reliability Engineer

Sandy, UT
In-Office
Senior level
Cloud • Software • Analytics
The Role
The Senior Cloud SRE enhances reliability and availability of solutions by automating tasks, providing on-call support, and mentoring teams.

At NiCE, we don’t limit our challenges. We challenge our limits. Always. We’re ambitious. We’re game changers. And we play to win. We set the highest standards and execute beyond them. And if you’re like us, we can offer you the ultimate career opportunity that will light a fire within you.

The Senior Cloud SRE works to improve the reliability and availability of our solutions. This includes providing on-call support for Major Incidents and helping us reduce the duration and occurrence of outages.

A Typical Day Might Include the Following:

  • Create new dashboards that give development teams observability into the health of their applications. These can include SLI/SLO metrics.
  • Consult with development work streams on SRE services and how we can help them improve their reliability.
  • Automate activities previously done manually to reduce toil.
  • Participate in the design, definition, and scoping of new solutions to meet our internal customers' needs. Document these thoroughly and ensure the participants agree.
  • Document findings and share with other SREs.
  • Work with teams to ensure proper monitoring is set up and enabled.
  • Identify evolutionary improvements.
  • Meet with Incident and Problem Management to discuss previous Major Incidents and help identify root causes and permanent fixes. Help determine which of these the SREs can assist with.
  • Assist other teams in doing data/performance analysis to identify why an issue is occurring.
  • Review work of other SREs and help train them.
  • Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews.
  • Practice sustainable incident response and blameless postmortems.
  • Assist in the creation of automated end-to-end diagnostics.
  • Communicate effectively with technical and non-technical peers and customers.
  • Coordinate and work on multiple cross-functional base work initiatives and projects.
  • Participate in planning long- and short-term project efforts.
  • Lead or provide technical direction for the planning, execution, and validation of testing work.
  • Provide technical guidance and coaching/mentoring to team members.
  • Follow established processes when performing work or help document and create processes as necessary.
  • Document troubleshooting steps and results in appropriate locations for historical access.
  • Ensure compliance with policies, procedures, and standards.
  • Implement or coordinate remediation required by audits/assessments, and document as necessary.
  • Provide on-call support for high-priority incidents.
  • Estimate time to complete activities/projects.

To Land This Gig You'll Need:

  • 4+ years programming/scripting experience
  • 4+ years of experience working within public or private cloud environments
  • 4+ years of SRE or related experience
  • Experience with Agile, Jira, GitHub, monitoring, automation, dashboarding
  • 6+ years communicating in English in a technical field
  • Can effectively troubleshoot supported applications.
  • Can work on complex issues which may span multiple applications or environments.
  • Proactively engages with peers to discuss issues and keep stakeholders updated.
  • Mentors co-workers by sharing expertise
  • Coordinates work with peers
  • Shares discoveries and best practices
  • Learns from others within the team
  • Self-driven; proactively looks for ways to improve
  • Able to work with little supervision and complete tasks and projects as directed.

Bonus Experience:

  • Experience working with Prometheus, Datadog, Grafana, Splunk, BMC
  • Experience with Application Performance Monitoring solutions (Dynatrace, AppDynamics, New Relic)
  • Experience working with Kubernetes, Docker, microservices, serverless compute
  • Experience working with Ansible, Terraform
  • Experience with one or more of the following: C#, C++, Java, Python, Perl, or Ruby.

About NiCE

NICE Ltd. (NASDAQ: NICE) software products are used by 25,000+ global businesses, including 85 of the Fortune 100 corporations, to deliver extraordinary customer experiences, fight financial crime and ensure public safety. Every day, NiCE software manages more than 120 million customer interactions and monitors 3+ billion financial transactions.

Known as an innovation powerhouse that excels in AI, cloud and digital, NiCE is consistently recognized as the market leader in its domains, with over 8,500 employees across 30+ countries.

NiCE is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, age, sex, marital status, ancestry, neurotype, physical or mental disability, veteran status, gender identity, sexual orientation or any other category protected by law.


Top Skills

Ansible
AppDynamics
BMC
C#
C++
Datadog
Docker
Dynatrace
Git
Grafana
Java
Jira
Kubernetes
New Relic
Perl
Prometheus
Python
Ruby
Splunk
Terraform

The Company
HQ: Hoboken, NJ
10,130 Employees
Year Founded: 1986

What We Do

NICE (Nasdaq: NICE) is the worldwide leading provider of both cloud and on-premises enterprise software solutions that empower organizations to make smarter decisions based on advanced analytics of structured and unstructured data. NICE helps organizations of all sizes deliver better customer service, ensure compliance, combat fraud and safeguard citizens. Over 25,000 organizations in more than 150 countries, including over 85 of the Fortune 100 companies, are using NICE solutions. www.nice.com.
