System Reliability Engineer / T2 Support Engineer

Gurugram, Haryana, IND
In-Office
Mid level
Software
About the Role

We are looking for an engineer who enjoys understanding how systems behave in production, not just writing features. This role is responsible for maintaining the reliability, stability, and smooth functioning of our live platform running on Google Cloud.

You will act as the first technical owner of production systems: monitoring services, investigating alerts, resolving issues, and performing controlled configuration and operational changes. You will work closely with backend developers, QA, and infrastructure teams to prevent incidents and reduce downtime.

This is neither a call-center support role nor a pure development role; it is a hands-on technical position focused on debugging, incident handling, and system operations.

Tech Stack
  • Google Cloud Platform (Compute, Logging, Monitoring)
  • Java (Spring Boot based microservices)
  • MongoDB
  • Apache Kafka (event-driven architecture)
  • Redis cache
  • Linux servers
Key Responsibilities

Production Monitoring & Alert Handling
  • Monitor application health, latency, errors, consumer lag, database connections, and resource utilization
  • Acknowledge and investigate monitoring alerts
  • Perform first-level troubleshooting and stabilize services
  • Identify whether an issue is infrastructure, application, database, or messaging related
Incident Response
  • Participate in on-call rotation
  • Diagnose production incidents and restore services with minimal downtime
  • Safely restart services, scale instances, or roll back deployments when required
  • Communicate incident status to stakeholders
Technical Support & Operational Changes
  • Handle technical support tickets requiring engineering understanding
  • Update configurations and feature flags
  • Manage scheduled jobs / cron triggers
  • Trigger or replay events in Kafka
  • Assist in minor Java configuration/code fixes when needed
  • Coordinate production releases
Database & Messaging Operations
  • Investigate MongoDB performance issues and slow queries
  • Monitor and resolve Kafka consumer lag and stuck messages
  • Manage Redis cache behavior (TTL, eviction, connection issues)
Logs & RCA
  • Analyze logs and metrics to determine root cause of issues
  • Prepare basic Root Cause Analysis (RCA) reports
  • Suggest preventive actions to reduce recurring incidents

Requirements

Required Skills

Core Technical Skills
  • Good understanding of Linux commands and server behavior
  • Experience analyzing application logs and debugging runtime issues
  • Basic Java knowledge (stack trace reading, configuration changes, rebuild & deploy)
  • Practical experience with MongoDB (indexes, connections, slow queries)
  • Understanding of Kafka concepts (consumer, offset, lag, partitions)
  • Basic Redis knowledge (caching behavior, TTL)
Cloud & Tools
  • Hands-on experience with any cloud platform (GCP preferred / AWS acceptable)
  • Experience using monitoring tools (GCP Monitoring, Prometheus, Grafana, ELK, or similar)
  • Understanding of REST APIs and HTTP status codes
What We Expect From You
  • Ability to investigate problems logically rather than randomly restarting services
  • Comfort working with live production systems
  • Willingness to participate in on-call support
  • Strong ownership mindset and attention to detail
  • Good communication during incidents
Good to Have
  • Experience in e-commerce, fintech, logistics, or high-traffic systems
  • Exposure to CI/CD pipelines and deployments
  • Basic scripting (Shell or Python)
  • Experience writing RCA documents
Experience

3 – 6 years of relevant experience in production support, application support, SRE, DevOps operations, or similar roles.


Benefits

Why Join Us
  • Direct exposure to real distributed systems
  • Hands-on production debugging experience
  • Opportunity to learn system architecture deeply
  • Close interaction with development and platform teams
Important Note

This role involves handling live production systems and occasional on-call responsibilities. Candidates interested only in feature development or pure infrastructure automation may not find this role suitable.

Teleport Compensation & Benefits Highlights

The following summarizes recurring compensation and benefits themes identified from responses generated by popular LLMs to common candidate questions about Teleport and has not been reviewed or approved by Teleport.

  • Fair & Transparent Compensation: Pay is considered competitive and fairly set, with public salary ranges and role-specific bands providing clarity. Market-aligned offers are visible across engineering and go-to-market roles, reinforcing a perception of strong total compensation.
  • Healthcare Strength: Health coverage spans medical, dental, vision, disability, and mental-health support, including resources like 24/7 assistance and meditation tools. This breadth is consistently highlighted as a core element of the package.
  • Wellbeing & Lifestyle Benefits: A substantial annual expense/wellness benefit and remote-work support (home office, internet/phone, gym, commuting, and professional development) are emphasized. These flexible perks meaningfully augment total rewards for a remote-first setup.

The Company
HQ: Oakland, CA
74 Employees
Year Founded: 2015

What We Do

Teleport allows engineers and security professionals to unify access for SSH servers, Kubernetes clusters, web applications, and databases across all environments.
