Site Reliability Engineer
About Us:
LogicMonitor is the leading fully automated, cloud-based infrastructure monitoring and observability platform for enterprise IT and managed service providers.
We love going to work and think you should too. We are customer-obsessed, work as one agile team, and strive to be better every day while building trust. These are our core values. So it's no surprise that we work hard and genuinely have fun working with each other as we expand our global presence and achieve record-breaking success.
This position can be remote, offering you the flexibility to work out of your home full-time. You'll have easy access to and support from your manager and frequent video meetings to keep you plugged into your team. If you are traveling to the area, we invite you to take advantage of our space if you would like to work in an office environment.
LogicMonitor is an equal opportunity employer. We’re committed to creating an inclusive environment for all our employees, where different backgrounds and perspectives are valued and encouraged - regardless of race, color, religion, national origin, age, sex, marital status, ancestry, physical or mental disability, veteran status, gender identity, or sexual orientation. We encourage all people to come as they are.
We operate with integrity, esteem diversity, and treat each other fairly and with respect. We strive to find our own versions of personal and professional harmony through community building and holistic growth. We hear time and time again that our awesome people are a huge part of why LMers chose LogicMonitor, love their teams, and choose to stay.
To learn more about life at LogicMonitor, check out our Careers Page.
What You'll Do:
LogicMonitor is disrupting the observability market and changing the way businesses take disparate sources of data and turn it into action. We are already a leader in this space - and we started by solving the hardest, most complicated problem first. With roots in the IT Infrastructure Monitoring space, we are on an evolutionary journey heading toward what’s next - unified observability. Our platform enables enterprise resiliency through data insights from the infrastructure, network, and application. As we enter this next phase of growth, we are in search of a Site Reliability Engineer.
LogicMonitor is looking for a talented and experienced Site Reliability Engineer to make their mark by maintaining the operational uptime of all mission-critical systems. You will facilitate and automate operational tasks while also looking for ways to streamline and improve them. You will work in tandem with developers, providing feedback that helps the product run better within the LM infrastructure. Along the way, you will develop your TechOps skills and become a valuable member of the core LM Operations team.
Here's a closer look at this key role:
- Maintain uptime of LogicMonitor's SaaS-based service and drive technical/process enhancements to improve uptime
- Deploy production applications and drive improvements to the deployment process
- Design and deploy new application components
- Design and deploy new infrastructures and integrations
- Ensure security of the production environment
- Write code to automate various aspects of infrastructure maintenance and deployments
- Support development and work closely with developers to drive operational and architecture/design changes
- Own, manage, and execute large and technically complex projects across teams
- Act as a strategic resource for the company with the ability to develop and deliver technical presentations for other departments, customers, and conferences
- Mentor junior team members
- Lead by example by providing good documentation and thorough runbooks
What You'll Need:
- 3+ years of experience working in an engineering role at a SaaS-based company
- Solid understanding of Linux system administration in distributed environments
- Solid understanding of automated deployments
- Experience with AWS
- Experience with various application scaling methodologies, including (but not limited to) load balancing
- Experience with configuration management tools such as Chef, Puppet, or Ansible
- Experience with Java/Tomcat applications
- Experience with CI and build systems
- Experience with virtualization and container technologies (Docker, Kubernetes, etc.)
- Experience with relational databases (e.g., MySQL) and NoSQL databases (e.g., MongoDB), in both administration and querying
- Significant programming/scripting experience (Java/Ruby/Python/shell/Go)
- Experience with source code management tools (Git)
- Knowledge of security as it relates to Linux systems, applications, and networking
- High-level understanding of networking technologies (routing, switching, firewalls, iptables, etc.)
- High-level understanding of SOA and high-availability systems
- Excellent problem solving skills.
- A desire not just to resolve problems, but to fully understand them. We're looking for the tenacity and skill to quickly delve to the root of the problem, understand why it happened, and prevent it in the future.
- Able to work without close supervision and self-direct projects
- Experience with Bamboo or other continuous integration build environments
- Experience with package management systems (RPM, RubyGems, etc.)
Pluses:
- Cisco routing/switching and routing protocols (OSPF/BGP)
- NetScaler or other load balancing technologies
- Experience with Java programming for web applications
- Experience with Atlassian products (Jira/Confluence/HipChat/etc.)
- CS degree
Residents of California: please see our California Applicant Privacy Notice.