We are looking for talented, creative, and proactive individuals who are passionate about solving complex business problems and contributing to the next generation of modern applications. Our goal is to help our customers understand the connections between application performance, user experience, and business outcomes, thereby creating exceptional customer experiences. Join us in shaping the future of Observability Engineering within our Intelligent Operations team with innovative data and integration solutions and tools.
Responsibilities
- Implement and maintain cutting-edge Observability solutions utilizing tools like New Relic, Datadog, AppDynamics, or Dynatrace for our large-scale enterprise customers.
- Develop and maintain systems for effective monitoring, logging, and tracing, ensuring scalability and reliability.
- Collaborate with cross-functional teams, including software engineers, product managers, and data scientists, to build resilient systems.
- Integrate observability practices into different engineering workflows and lead the adoption, optimization, and integration of products within the customer’s business infrastructure.
- Create custom dashboards, set up alerts, and develop AIOps rules, ensuring effective tracking against goals/KPIs.
- Provide technical support in post-sales processes, including installation, deployment, training, technical check-ups, and escalation management.
- Identify performance bottlenecks and anomalous system behavior, and resolve the root causes of service issues.
- Stay current with the latest trends in observability, logging, monitoring, and cloud technologies, and introduce innovative solutions and best practices.
- Participate in strategic technology planning, focusing on scalability, cost-effectiveness, and risk management in observability infrastructure.
- Document observability systems and processes comprehensively and prepare reports for management on system performance and reliability.
- Utilize Infrastructure as Code (IaC) principles for efficient infrastructure provisioning and management.
Qualifications
- 3-5 years of hands-on experience with Application Performance Management tools such as Datadog, New Relic, AppDynamics, Dynatrace, Splunk ITSI, Honeycomb, Chronosphere, Riverbed Aternity/Alluvio, ExtraHop, and LogicMonitor.
- Hands-on experience with cloud-native, open-source solutions like Prometheus, Grafana, ELK stack/Elastic.io, and OpenTelemetry (OTEL).
- Experience with public cloud monitoring solutions like CloudWatch, App Insights, etc.
- Strong understanding of network & system management solutions, distributed systems, networking, and database technologies.
- Operational background and familiarity with ITIL ITSM, SRE, or DevOps best practices and principles.
- Excellent problem-solving, organizational, project management, and communication skills.
- Eagerness to collaborate and contribute to team success, along with a continuous-learning mindset.
What We Do
AHEAD builds platforms for digital business. By weaving together cloud infrastructure, intelligent operations, and modern applications, we help enterprises deliver on the promise of digital transformation.