Position Summary
The Staff DevOps Engineer is responsible for designing, implementing, and maintaining scalable and secure Linux-based infrastructure in both on-premises and cloud environments. This role will drive automation, improve system reliability, support cloud migration efforts, and contribute to compliance initiatives such as FDA and SOX. The ideal candidate will have a strong background in Linux systems, Kubernetes, cloud technologies, and modern DevOps practices, including Infrastructure as Code and CI/CD pipelines.
Job Responsibilities
- Design, implement, and manage Linux infrastructure across on-premises and cloud environments.
- Utilize Ansible for configuration management and automation of Linux systems and applications.
- Develop and maintain automation pipelines for various tasks, including compliance reporting and security vulnerability scanning.
- Design, implement, maintain, and optimize database environments, including MySQL, PostgreSQL, and RDS.
- Implement security compliance projects, such as SOX compliance and SOC 2 initiatives.
- Execute and maintain a robust patching strategy for Linux systems, ensuring timely application of security updates and addressing CVEs as directed by management, to uphold the overall security posture of the infrastructure.
- Monitor and optimize system performance, including CPU, memory, and disk usage.
- Collaborate across departments to optimize workflows, leverage GitLab capabilities, and ensure adherence to company standards through automated scripting and documentation.
- Support deployment to Kubernetes and other container orchestration platforms.
- Engage in cross-functional troubleshooting sessions during application outages and security incidents to restore functionality swiftly.
- Design and implement solutions that accelerate business delivery and developer productivity.
- Actively participate in daily standup meetings, contribute to and maintain the team's knowledge base, and support team members in troubleshooting by sharing expertise and best practices.
- Participate in on-call rotations to support critical infrastructure and respond to incidents as needed.
Required Qualifications
- Bachelor’s degree in Computer Science, Information Technology, or a related field.
- 8+ years of experience in DevOps or Site Reliability Engineering roles.
- 8+ years of experience in Linux systems administration and DevOps practices.
- 3+ years of experience with AWS services and cloud security best practices.
- Proficiency in at least one scripting language (e.g., Python, Bash).
- Comprehensive understanding of Linux system administration, including package management (Yum and RPM), mail server configuration (Postfix), file sharing protocols (Samba/NFS), secure communication (SSH and RSA keys), and system initialization (init/systemd).
- Experience with containerization technologies (Docker) and orchestration platforms (Kubernetes).
- Strong knowledge of Linux networking concepts and implementation.
- Proficiency in system performance monitoring, optimization, and reporting, with experience in tools such as Grafana, Prometheus, or Datadog.
- Familiarity with Infrastructure as Code tools (e.g., Terraform, CloudFormation).
- Experience with CI/CD tools (e.g., GitHub Actions, GitLab CI) and methodologies, including unit testing, integration, and deployment.
- Experience with server orchestration tools (e.g., Ansible).
- Ability to balance multiple high-priority projects while addressing immediate support requests.
- Experience documenting, building, and implementing new processes and procedures based on industry best practices.
- Proficient in Microsoft Office Suite, specifically Word, Excel, and Outlook, with a general working knowledge of the Internet for business use.
Preferred Qualifications
- Experience with High Performance Computing (HPC) environments, including IBM Spectrum Scale (formerly GPFS) and IBM Spectrum LSF (Load Sharing Facility).
- Advanced experience with Ansible and other server orchestration tools.
- AWS certifications (e.g., Solutions Architect, DevOps Engineer).
- CKA or CKAD certifications.
- IT certifications such as RHCE or RHCSA.
- Familiarity with Microsoft Windows System Administration principles and practices to facilitate collaboration with the Windows and VMware team.
Physical Demands
- Ability to sit for extended periods while working on a computer.
Other
- The role may require after-hours response to emergency issues, on-call availability, and periodic travel.
- Willingness to pursue ongoing professional development and stay current with emerging technologies in the field.
Conditions of Employment: The individual must successfully complete the pre-employment process, which includes a criminal background check, drug screening, credit check (applicable for certain positions), and reference verification.
This job description reflects management’s assignment of essential functions. Nothing in this job description restricts management’s right to assign or reassign duties and responsibilities to this job at any time.
Caris Life Sciences is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, religion, color, national origin, gender, gender identity, sexual orientation, age, status as a protected veteran, or status as a qualified individual with a disability, among other things.
What We Do
Caris Life Sciences was founded in 2008 with a simple but powerful purpose – to help improve the lives of as many people as possible. With transformative technologies informed by massive amounts of big data, we are revolutionizing healthcare to provide physicians and patients with the highest quality information about their disease – from detecting it early and determining how best to treat it, to developing the next wave of novel therapies.