Site Reliability Engineer | Remote
Sorry, this job was removed at 9:46 a.m. (CST) on Thursday, September 23, 2021

Talespin is building the platform to transform talent development and skill alignment for the future of work. Our spatial computing products deliver a new standard for learning and workforce data, empowering organizations and individuals to make better talent and career decisions. We are looking for highly motivated and creative individuals to join in our mission to transform how humans learn, work and play.  

In this role you will:

  • Lead and Mentor
    • Evangelize and support technology and best practices from the SRE team
    • Lead tactical strategies for the SRE team
    • Plan future architecture of core services technologies
    • Develop and drive balanced and fair service level objectives
    • Optimize on-call rotations and processes
    • Document tribal knowledge for operating technologies
  • Collaborate
    • Strategize and plan with IT on production and CI/CD infrastructure
    • Strategize and plan with the Engineering team on the core platform
    • Collaborate with the Engineering team on performance bottlenecks, security risks, and process improvements
    • Partner with Engineering to improve services through rigorous testing and release procedures
  • Develop & Operate
    • Develop and support software and systems that help manage platform infrastructure and applications and that assist the operations and support teams
    • Practice sustainable incident response and blameless postmortems
    • Operate the production environment by monitoring availability and taking a holistic view of system health
    • Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
    • Improve reliability, quality, and time-to-market of our suite of software solutions
    • Provide primary operational support and engineering for multiple large distributed software applications
  • Innovate
    • Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve
    • Contribute ideas and code to core frameworks that drive our technology and product roadmap 
    • Research new technologies that could be used to improve our products 


Skills and abilities we are looking for:

  • Required
    • 5+ years of experience in DevOps or SRE
    • 3+ years of experience with Linux operating systems
    • Automation skills in shell (Bash), Python, and/or other languages
    • Basic understanding of C#, Java, and JavaScript
    • Advanced proficiency with one of: Python, C#, Java, JavaScript/TypeScript, or Go
    • Advanced proficiency managing infrastructure on Azure, AWS or GCP 
    • 1-2 years of experience with Docker, Vagrant, and Kubernetes, or similar technologies
    • 5+ years with Git, Perforce, or other version control software 
    • Strong understanding of virtualization and hypervisor technologies
    • Understanding of databases and data modeling
    • Experience with automatically managing dozens or hundreds of servers
    • Focus on performance bottlenecks and performance improvement techniques
    • Strong networking knowledge of TCP/IP
    • Experienced with monitoring/data aggregation tools and platforms such as Splunk, Grafana, New Relic
    • Experience with workflow and issue management tools such as Jira
    • Must be comfortable working with mission critical and sensitive systems, with a sense of urgency appropriate to the responsibilities
    • A proactive approach to spotting problems, areas for improvement, and performance bottlenecks
    • Able to work in a collaborative, global, agile/lean development environment
    • Excellent time-management, organization, and communication skills
  • Preferred
    • Advanced experience with Azure architectural design patterns, admin tools and services
    • Advanced experience operating Kubernetes 
    • Advanced experience operating MongoDB
    • Experience operating data pipelines, data lakes, data warehouses
    • Experience managing CI/CD and art pipelines for games, entertainment, mobile or XR
    • Advanced experience operating sophisticated Jenkins deployments  
    • Experience with real-time multiplayer systems 
    • Experience as a Services Software Engineer
    • Bachelor’s degree in computer science or another highly technical or scientific discipline
    • OWASP training
    • Experience running system load and stress tests 