Principal SRE, Observability & Deployment Tooling - Remote
LivePerson is a transformational force in how brands and consumers communicate. With over 18,000 brands, including HSBC, Verizon, and Home Depot, we are on a mission to make life easier for people and brands everywhere through trusted Conversational AI. We believe in a future where conversations are the norm for getting your intentions fulfilled - whatever they are.
We are an innovative, intent-driven company that believes in building the future, and we are looking for growth-minded, unconventional thinkers, developers, and builders to join the team.
You will thrive here if:
You can operate in a fast-paced, dynamic environment
You can build partnerships that move our business forward
You build code that is simple, understandable, and clean
You see feedback or failure as motivation to learn and to grow
You believe data-driven decision making is the norm
You relate to our core principles and want to work with Conversational AI experts
The successful candidate will have the opportunity to join an outstanding team within a fast-paced and successful organization.
LivePerson is growing fast, and we want our technology to keep up. We’ve established a site reliability engineering (SRE) team whose charter is to guide our evolution toward systems that can handle 10x the scale they do today. We seek strong principal engineers to guide our SRE efforts, beginning with observability and deployment tools. In this role, you will set the standards that teams must follow to achieve world-class observability of their services and infrastructure. You will help our company deliver high-quality outcomes more quickly by shifting our culture and tools toward automated and continuous delivery (CI/CD).
In this role you will:
- Evaluate multiple monitoring vendors that enable us to move our metrics solutions to SaaS cloud offerings and publish a “request for comment” (RFC) to other principal engineers explaining which vendor you recommend, and why
- Review our logging systems for the services that drive most of our logging traffic and seek opportunities to “right-size” their log volume
- Define the logging standards that teams must follow moving forward (e.g. format, log levels, what to log, what NOT to log)
- Identify and remove sources of toil for people using our tools, or for our own team as we maintain them (e.g., how can a new service get core metrics, logging, and a default dashboard without any extra engineering effort?)
- Work with our Chaos Engineering team to create a design for how we’ll run automated chaos tests as part of deployment verification
- Select a next-generation deployment tool that will enable engineering teams to shift to continuous integration & continuous delivery (CI/CD) deployment models in both private and public cloud environments
- Sit with a team that was unable to identify the root cause of a recent customer-impacting incident, and help them identify how to make their systems observable enough that they can identify the root cause next time
- Review incident post-mortems from other teams to identify ways we can detect problems sooner (via observability tooling) or before they reach production (through more robust automated deployment testing)
- Decompose our vision for SRE into work we must do in the next year, and work with leadership to build a funding case for it
- Assist existing teams in resolving issues on private cloud solutions for logging, monitoring, and deployments
- Educate other SREs on best practices for the above activities, and mentor them to help develop the next generation of Principal SREs at LivePerson
What you need for success:
- Bachelor’s degree in Computer Science, Information Systems, or a related field, or equivalent training or work experience.
- 10+ years of experience building successful production software systems
Preferred Qualifications:
- Experience creating architectures that scale with volume and are fault-tolerant
- Experience with object-oriented programming, especially in Java
- Familiarity with distributed systems, asynchronous messaging, and network protocols
- Demonstrated SQL and data modeling skills
- Familiarity with the ELK stack (Elasticsearch, Logstash, Kibana), Prometheus, Grafana, GitLab, and Jenkins is a plus
- Experience tuning distributed systems & software to improve their performance under increasing volumes of traffic
- Experience developing cloud software services and an understanding of design for scalability, performance and reliability, in both private and public cloud environments
- Proven ability to dive deep into problems another team is facing and help them identify short-term mitigations and long-term plans to overcome those problems
- Experience with the best practices of developing APIs in a microservice architecture
- Excellent written and verbal communication skills, with proven ability to adapt communication style to technical, non-technical, and executive audiences
- Proven ability to influence without authority up through the VP level
- Proven ability to work in a fully remote environment with people across the globe
- Relevant certifications are a plus (e.g., AWS Certification, Google Cloud Platform Certification)
- High attention to detail
Why you’ll love working here:
LivePerson was named to Fast Company’s World’s Most Innovative Companies list for 2020 in the Artificial Intelligence category. We offer top-tier tech and data science colleagues, along with opportunities to push your own limits. We embrace invention and experimentation. You’ll have great benefits, flexible time off, plus snacks and drinks to keep your mind fresh and stomach full. Most importantly, you’ll have the ability to make an impact at work and at brands across the globe as we build the future with trusted Conversational AI together.
At LivePerson, people from diverse backgrounds come together to do their best work and be their authentic selves. We are proud to be an equal opportunity employer.
All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements.