Site Reliability Engineer II

Posted 6 Days Ago
Chennai, Tamil Nadu, IND
Hybrid
Senior level
Fintech • Financial Services
The Role
The Site Reliability Engineer II enhances system resilience and performance through automation, disaster recovery planning, and collaboration with engineering teams to improve service reliability.
Summary Generated by Built In

The Site Reliability Engineer II collaborates with engineering teams to enhance system resilience, scalability, and performance through feature development, automation, architectural design, resiliency testing, and disaster recovery planning, while promoting best practices for continuous improvement.

Responsibilities

Key Responsibilities

  • Monitor application and infrastructure health using enterprise monitoring and observability tools, including ELF, to ensure availability, performance, and reliability of enterprise platforms
  • Configure, tune, and maintain alerting mechanisms in ELF, aligned to service health indicators and SLOs, to enable timely incident detection and reduce noise and false positives
  • Develop and maintain dashboards providing visibility into system performance, availability, reliability trends, and key operational metrics
  • Analyze metrics, logs, and distributed traces across application and infrastructure layers to proactively identify issues and support effective root cause analysis (RCA)
  • Own and execute blameless RCAs for production incidents, identify corrective and preventive actions, and track them to closure
  • Implement minor code fixes, configuration updates, and reliability enhancements as part of incident remediation and preventive measures
  • Collaborate with application development and platform teams to review defects, propose fixes, and improve overall service reliability
  • Participate in Agile sprint planning ceremonies, backlog grooming, estimation, and delivery of SRE‑owned work items
  • Drive reliability improvements through sprint‑based commitments, including automation, operational fixes, and platform enhancements
  • Participate in Disaster Recovery (DR) planning, testing, and execution to ensure resilience of business‑critical services
  • Perform regular system patching and maintenance activities in line with organizational security, compliance, and audit requirements
  • Support ITIL‑based Incident, Problem, and Change Management processes, including planning, documentation, approvals, execution, and post‑implementation validation
  • Monitor network performance and troubleshoot connectivity, latency, and access‑related issues impacting platform traffic
  • Participate in certificate lifecycle management, including provisioning, renewal, validation, and troubleshooting of SSL/TLS certificates
  • Maintain and manage service accounts (Service IDs), including access provisioning, credential rotation, and compliance with security policies
  • Drive automation and operational toil reduction using scripting, CI/CD pipelines, and platform tooling to improve reliability and scalability
  • Maintain accurate documentation of system configurations, runbooks, SOPs, platform operational guidelines, and troubleshooting procedures, and generate reports on system performance, incidents, and resolutions
Qualifications

Education and Knowledge

  • Minimum of 5 years of relevant experience in application development, maintenance, and production support, along with hands-on exposure to Java and distributed systems in enterprise environments
  • Bachelor’s degree in Computer Science, Information Technology, Engineering, or equivalent practical experience; advanced degree is a plus
  • Strong knowledge of operating systems and application runtimes such as Java and .NET
  • Knowledge of distributed systems and service‑based architectures from an operations and reliability perspective
  • Strong knowledge of modern observability stacks and platforms, including Splunk, Elasticsearch, Prometheus, and Grafana
  • Knowledge of observability practices including logging, monitoring, tracing, and performance analysis
  • Knowledge of RDBMS and NoSQL databases including MySQL, PostgreSQL, Couchbase, HBase, and Cassandra
  • Knowledge of scripting and automation using languages such as PowerShell and Python
  • Basic understanding of AI, analytics, or AIOps platforms from an operational perspective is a plus

Work Experience

  • Experience in Incident, Problem, and Change Management using ServiceNow or similar ITSM tools
  • Experience supporting production systems in large‑scale enterprise environments with a focus on reliability and availability
  • Experience in system administration, infrastructure operations, and network troubleshooting
  • Experience with CI/CD pipeline implementation and support using tools such as Jenkins, GitHub Actions, XL Release (XLR), or similar
  • Experience managing and troubleshooting technology infrastructure and services, including servers, networks, and cloud platforms
  • Knowledge of cloud‑based Site Reliability Engineering (SRE) practices with hands‑on experience with public cloud platforms such as AWS, Azure, or Google Cloud Platform
  • Knowledge of containerization and orchestration technologies such as Docker and Kubernetes, and microservices‑based architectures
  • Experience using enterprise monitoring and alerting platforms such as ELF
  • Exposure to AI‑assisted monitoring, automation, or AIOps tools is a plus
  • Experience accessing and managing remote systems using tools such as RDP and Citrix
  • Proficiency in connecting to and administering servers via SSH (Secure Shell)
  • Knowledge of core networking concepts including ports, protocols, firewalls, and secure remote access

Licenses & Certifications

  • Certification in at least one programming language or runtime such as Java, .NET, or Python
  • Certification in containerization and orchestration technologies (Docker, Kubernetes, OpenShift) is a plus
  • Public cloud certification in AWS or GCP is a plus
  • Certification or training related to AI platforms, analytics platforms, or AIOps is a plus

About Us

At American Express, our culture is built on a 175-year history of innovation, shared values and Leadership Behaviors, and an unwavering commitment to back our customers, communities, and colleagues. From delivering differentiated products to providing world-class customer service, we operate with a strong risk mindset, ensuring we continue to uphold our brand promise of trust, security, and service.

As part of Team Amex, you’ll experience our powerful backing with comprehensive support for your holistic well-being and many opportunities to learn new skills, develop as a leader, and grow your career. Here, your voice and ideas matter, your work makes an impact, and together, you will help us define the future of American Express.

About the Team

We back you with benefits that support your holistic well-being so you can be and deliver your best. This means caring for you and your loved ones' physical, financial, and mental health, as well as providing the flexibility you need to thrive personally and professionally:

  • Competitive base salaries
  • Bonus incentives
  • Support for financial well-being and retirement
  • Comprehensive medical, dental, vision, life insurance, and disability benefits (depending on location)
  • Flexible working model with hybrid, onsite or virtual arrangements depending on role and business need
  • Generous paid parental leave policies (depending on location)
  • Free access to global on-site wellness centers staffed with nurses and doctors (depending on location)
  • Free and confidential counseling support through our Healthy Minds program
  • Career development and training opportunities

American Express is an equal opportunity employer and makes employment decisions without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran status, disability status, age, or any other status protected by law.

Offer of employment with American Express is conditioned upon the successful completion of a background verification check, subject to applicable laws and regulations.

Skills Required

  • 5+ years of relevant experience in application development, maintenance, and production support
  • Hands-on exposure to Java and distributed systems
  • Bachelor's degree in Computer Science, IT, Engineering, or equivalent practical experience
  • Strong knowledge of modern observability stacks and platforms
  • Knowledge of RDBMS and NoSQL databases
  • Knowledge of scripting and automation using PowerShell and Python
  • Experience in Incident, Problem, and Change Management using ITSM tools
  • Experience supporting production systems in enterprise environments
  • Experience in system administration and network troubleshooting
  • Experience with CI/CD pipeline implementation
  • Knowledge of cloud-based SRE practices
  • Knowledge of containerization and orchestration technologies
  • Experience with enterprise monitoring and alerting platforms
  • Certification in at least one programming language or runtime

American Express Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about American Express and has not been reviewed or approved by American Express.

  • Healthcare Strength: Pay is often viewed as attractive when combined with comprehensive health, dental, and vision coverage that supports day-to-day needs. The benefits package is also framed as especially helpful for those supporting dependents.
  • Retirement Support: Retirement benefits are positioned as a meaningful part of total rewards through a 401(k) plan with company matching. Financial wellness services and coaching are also highlighted as strengthening longer-term financial security.
  • Leave & Time Off Breadth: Paid time off is repeatedly characterized as generous and a valued component of the overall package. Time off and flexibility are presented as helping the total rewards feel more complete beyond base salary.

The Company
HQ: New Delhi, Delhi
100,703 Employees
Year Founded: 1850

What We Do

At American Express, we know that with the right backing, people and businesses have the power to progress in incredible ways. Whether we’re supporting our customers’ financial confidence to move ahead, taking commerce to new heights, or encouraging people to explore the world, our colleagues are constantly striving to uphold our powerful backing promise to our customers and each other every day. These beliefs have been our North Star for 170 years as our business transformed – from helping evacuate travelers during World Wars, to ensuring the safety of our customers’ funds during the Great Depression in the U.S., to creating the Shop Small® movement to help small businesses recover from the Financial Crisis, to providing aid to communities impacted by many natural disasters and so much more. For generations, the key to our success has been the determination and resilience of our American Express colleagues. Now, as a globally integrated payments company, we work together to provide customers with access to products, insights and world-class experiences that enrich lives and build business success. Join us and let’s lead the way together.
