Senior Site Reliability Engineer (Remote) at FanDuel
FanDuel Group is a world-class team of brands and products all built with one goal in mind — to give fans new and innovative ways to interact with their favourite games, sports, teams, and leagues. That’s no easy task, which is why we’re so dedicated to building a winning team. And make no mistake, we are here to win, but we believe in winning right. That means we’ll never compromise when it comes to looking out for our teammates. From our many opportunities for professional development to our generous insurance and paid leave policies, we’re committed to making sure our employees get as much out of FanDuel as we ask them to give.
FanDuel Group is based in New York, with offices in Scotland, California, New Jersey, Florida and Oregon. Our brands include:
- FanDuel — A game-changing real-money fantasy sports app
- FanDuel Sportsbook — America’s #1 sports betting app
- TVG — The best-in-class horse racing TV/media network and betting platform
- FanDuel Racing — A horse racing app built for the average sports fan
- FanDuel Casino & Betfair Casino — Fan-favourite online casino apps
- FOXBet — A world-class betting platform and affiliate of FanDuel Group
- PokerStars — The premier online poker product and affiliate of FanDuel Group
Our roster has an opening with your name on it
The Core Products & Experiences Platform Engineering Team at FanDuel is looking for a Senior Site Reliability Engineer (SRE) to support and enhance our cloud-based systems and our developer experience.
As a Senior Site Reliability Engineer (SRE) and member of our technology team, you’ll have the opportunity to work on dynamic projects in a collaborative environment where innovative thinking is encouraged. Working with internal stakeholders, engineers, cloud platform engineers and other technologists across the business, you will enhance delivery and support of various software and infrastructure systems across business verticals. You will be charged with managing complex challenges while using your experience in coding, algorithms and cloud system design. You will enrich the experience of engineers and customers by using a data-driven approach to identify areas for improvement, then working collaboratively to implement the improvements and track the results.
The position requires the candidate to have experience in development, cloud architecture or virtualised environments, CI/CD, Infrastructure-as-Code (IaC), system monitoring and maintaining operational platforms.
THE GAME PLAN
Everyone on our team has a part to play
- You will be part of a newly established and growing team responsible for improving the stability, reliability and performance of the systems within the Core Products and Experiences domain.
- You will also focus on the experience of developers in the domain by improving tooling, monitoring and automation, reducing support incidents and working on incident response
- You will work with development teams to diagnose performance, reliability and security issues in applications and system design
- You will work with both product and development teams to create and agree Service Level Objectives (SLOs) along with error budget utilisation strategies and pre-planned resource allocations
- Operational duties include automation, monitoring metrics, issue analysis, debugging and code refactoring.
- Ensure all documentation is created and updated, including design, development, and deployment documentation
- Participate in the analysis and implementation of new tools
- Participate in creating and governing cloud operational best practices
- Prioritise and find the most efficient path towards solving complex, ambiguous business problems with data, keeping a mindset of simplicity, robustness, and speed above all
What we’re looking for in our next teammate
- Have a “monitor everything” mentality, from an alerting as well as a metrics point of view
- Ability to work with development teams and other technical stakeholders to solve complex issues affecting a live production environment serving many thousands of customers
- Experience setting up, configuring and analysing distributed logging and metrics to spot problems (DataDog experience a plus)
- Thorough understanding of and solid experience working with SaaS or cloud-based systems, focusing on reliability, security, performance and support
- Experience implementing and maintaining cloud-native CI/CD workflows and tools, such as Jenkins, Buildkite, Spinnaker, AWS CodeDeploy and/or GitHub
- Experience with containers and container orchestration platforms such as Kubernetes or ECS
- Experience with networking, load balancing and working with tools for performance monitoring and troubleshooting
- Experience with automation/configuration management using tools such as Puppet (using Ruby) or Chef.
- Ability to automate processes in various scripting languages, including Python, Groovy and Shell.
- Comfortable with experimentation and the communication of results, good or bad, to help the organisation learn
- A successful candidate has technical depth and hands-on implementation experience of various practices and tools in the Agile Software Development Lifecycle and DevSecOps toolchain.
- The SRE is comfortable rolling up their sleeves to appropriately design and code modules for infrastructure, application, and processes, allowing them to be maintained by other stakeholders
- A system engineering or developer background with the ability to learn quickly and share your knowledge with the broader team
- An “automate everything” mindset, backed by demonstrable experience
- AWS (IAM, S3, Kinesis, KMS, DynamoDB, CloudFormation, VPC, Lambda, Security Groups, SQS, RDS)
- GitHub, Buildkite, Jenkins
- Configuration Management tools (Puppet)
- Infrastructure as Code (Terraform, CloudFormation)
- Apache Airflow, Databricks
- Docker, Container Orchestration (Kubernetes, ECS)
- DataDog, CloudWatch
We treat our team right
Competitive compensation is just the beginning. As part of our team, you can expect:
- An exciting and fun environment committed to driving real growth
- Opportunities to build really cool products that fans love
- Mentorship and professional development resources to help you refine your game
- Flexible vacation allowance to let you refuel
- Hall of Fame benefit programs and platforms