The Role
Monitor and support production systems across a multi-account AWS environment; triage incidents, execute runbooks, manage SLAs, participate in on-call rotation, coordinate cross-team escalations, run post-incident reviews, and improve operational documentation and monitoring.
AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in areas like application development and AI/ML, and our people-first culture has earned us multiple Best Place to Work awards.
WHY JOIN US
If you're looking for a place to grow, make an impact, and work with people who care, we'd love to meet you!
ABOUT THE ROLE
We are looking for a Production Support Engineer to monitor and support production systems across a multi-account AWS environment, serving as the front line of a tiered support model for a fintech platform. You will triage incidents, execute runbooks, manage SLA performance, and coordinate with engineering, help desk, and security partners. The role includes on-call rotation and structured post-incident review with a focus on continuous operational improvement.
WHAT YOU WILL DO
- Monitor production systems and respond to alerts across infrastructure, application, and data layers;
- Perform first-level triage on incidents and support requests; escalate to developers with thorough context and diagnostics;
- Execute patching, operational tasks, and documented runbooks;
- Participate in on-call rotation and support scheduled deployments as needed;
- Conduct post-incident reviews and feed lessons back into runbooks and playbooks;
- Identify recurring issues and systemic risks before they escalate;
- Improve documentation and monitoring coverage between active support activities;
- Contribute to operational reporting and SLA dashboards;
- Manage and track SLA performance across all supported services; surface risks proactively;
- Coordinate with the Help Desk / Deskside Support partner on production tasks affecting employees;
- Escalate security incidents and vulnerabilities to the vCISO partner per documented procedures.
MUST HAVES
- 3+ years in production support, SRE, NOC, or operations engineering;
- Hands-on AWS experience with EC2/ECS, networking (VPC, security groups, ACLs), and IAM;
- Operational proficiency with PostgreSQL and/or Amazon RDS;
- Incident triage across infrastructure and application layers;
- Track record managing SLAs in a ticketed support environment such as Jira;
- Strong written communication for escalation and post-incident reporting;
- Upper-intermediate English level.
NICE TO HAVES
- Experience with structured incident response such as ITIL or NIST;
- Familiarity with Datadog, CloudWatch, or comparable observability platforms;
- Exposure to AWS data services including Glue, S3, Athena, and EventBridge;
- Basic IaC familiarity with CloudFormation, SAM, or Terraform;
- Background in financial services or regulated environments;
- AWS certification such as SysOps Administrator or Solutions Architect;
- Experience with scripting/automation to reduce manual toil.
PERKS AND BENEFITS
- Professional growth: Mentorship, TechTalks, and personalized growth roadmaps.
- Competitive compensation: USD-based pay with education, fitness, and team activity budgets.
- Exciting projects: Modern solutions with Fortune 500 and top product companies.
- Flextime: Flexible schedule with remote and office options.
The Company
What We Do
AgileEngine is a privately held company established in 2010 that builds dedicated teams of designers and developers. We turn good ideas into awesome software that people actually want to use. Some of the biggest names and the hottest startups around the world chose us to build their tech.