We are building a powerful, intelligent alerting engine that identifies key performance and security issues out of the box for applications and infrastructure running on modern cloud services such as AWS, GCP, and Azure. The alerting engine can detect anomalies and outliers, correlate similar insights, reduce alert fatigue, and put the focus on alerts that are more meaningful and actionable.
Customers receive real-time notifications of insights via email, Slack, PagerDuty, Jira, ServiceNow, and many other incident management tools. Our alerting UI helps customers visualize and troubleshoot all alerts across their environment from a single page, with the ability to instantly view critical details for each alert and get context on the affected entities to quickly debug and resolve the issue.
Come join us on our mission to build an intelligent alerting system with the most powerful AIOps capabilities that can evaluate petabytes of logs, metrics and tracing data in real time and proactively detect, troubleshoot, root cause and resolve performance and availability issues without any human intervention.
What you’ll be working on:
- Design and implement extremely high-volume, fault-tolerant, scalable backend systems for alerting that can process and manage petabytes of customer data in both real-time streaming and batch form.
- Solve complex challenges in low-latency and zero data-loss scenarios that require demonstrated expertise in distributed systems, fault-tolerance, and multi-tenancy.
- Build systems to derive actionable insights and intelligence on petabytes of data using machine learning and artificial intelligence.
- Work collaboratively as a member of a team to deliver identified projects, respond quickly and effectively to business needs, and mentor junior engineers.
- Analyze and improve the efficiency and reliability of our backend systems.
- Write robust code; demonstrate its robustness through automated tests.
- Mentor and train other team members on design techniques and coding standards.
Your experience and skills include:
- B.S. or higher in Computer Science or a related discipline (M.S. a plus)
- 8+ years of industry experience with a proven track record of ownership and delivery
- Experience developing scalable distributed data processing solutions
- Experience in multi-threaded programming
- Experience running large, scalable distributed services that follow a microservice architecture
- Hands-on object-oriented programming experience (e.g., Java, Scala)
- Excellent verbal and written communication
- Experience with big data and/or 24x7 commercial services is highly desirable.
- You should be happy working with Unix (Linux, OS X).
- Agile software development experience (test-driven development, iterative and incremental development) is a plus.
Sumo Logic is the pioneer in continuous intelligence, a new category of software that enables organizations of all sizes to address the data challenges and opportunities presented by digital transformation, modern applications, and cloud computing. The Sumo Logic Continuous Intelligence Platform™ automates the collection, ingestion, and analysis of application, infrastructure, security, and IoT data to derive actionable insights within seconds. More than 2,100 customers rely on Sumo Logic to build, run, and secure their modern applications and cloud infrastructures. Only Sumo Logic delivers its platform as a true, multi-tenant SaaS architecture, across multiple use cases, enabling businesses to thrive in the Intelligence Economy.