Site Reliability Engineer II

Reposted 11 Days Ago
52 Locations
In-Office or Remote
125K-135K Annually
Senior level
Artificial Intelligence • Computer Vision • HR Tech • Machine Learning • Software
Veritone’s mission is to advance the capabilities of AI and empower people to do even better than their best.
The Role
The Site Reliability Engineer II will manage and ensure the reliability and efficiency of SaaS application platforms, leveraging tools for automation, monitoring, and incident response while collaborating with various teams.
Summary Generated by Built In
POSITION SUMMARY

The ideal candidate will have 5+ years of experience in Linux systems and software management and expertise with Terraform, Ansible, and cloud platforms such as AWS, Azure, and GCP. Experience with large-scale distributed systems, monitoring and alerting systems (Prometheus, Grafana), CI/CD pipelines, containers and container orchestration (Docker, Kubernetes), and programming languages (Go, Java, Python) is essential. A background in implementing security controls, automating deployments, and troubleshooting complex systems is also required.

WHAT YOU'LL DO

  • Deploy and maintain a resilient, secure, and efficient SaaS application platform to meet established SLAs.

  • Automate monitoring, management, and incident response to achieve an auto-remediation system.

  • Monitor site stability and performance and troubleshoot site issues.

  • Participate in on-call rotation to ensure stability and uptime for our platforms.

  • Scale infrastructure to meet rapidly increasing demand.

  • Collaborate with cross-functional teams working with Engineering, Product, Services, and other departments.

  • Collaborate with developers to bring new features and services into production.

  • Independently design and develop tools to aid in operations and automation as well as work jointly with other team members to deliver innovative solutions to complex business and technical challenges.

  • Provide deployment and operations support for multi-tiered distributed software applications.

  • Estimate engineering effort, plan implementation, and roll out system changes that meet requirements for functionality, performance, scalability, reliability, and adherence to development goals and principles.

  • Collaborate in a fast-paced environment with multiple teams (software development, release management, build and release, etc.) in a dynamic entrepreneurial organization.

  • Define how the desired behavior of large-scale systems can be achieved.

  • Measure and achieve reliability through engineering and operations work.

  • Develop, document, and manage monitoring and alerts with the goal of creating an auto-remediation system.

  • Adapt security controls that are not typically native to GA releases to the product.

  • Develop automation methods to extend standard deployment pipelines for bespoke implementations.

  • Patch, enforce policy on, and audit production systems.

WHAT YOU'LL NEED

  • Expertise with Infrastructure-as-Code such as Terraform.

  • 5+ years of professional Linux systems and software management experience 

  • Knowledge of programming languages including Python, Go, Node.js, and Java

  • Experience managing infrastructure within Azure, GCP, and AWS

  • Expertise in Kubernetes management and upgrades

  • Strong scripting skills for systems and data-driven solutions

  • Strong GitOps and CI/CD experience with tools such as Jenkins, ArgoCD, and Helm

  • Extensive experience in troubleshooting large-scale distributed systems.

  • Comprehensive background in monitoring and alerting for auto-remediation systems, including Prometheus and Grafana

  • Proven examples of standardizing security controls across large-scale systems

  • Comfort working within project/task management platforms.

Systems and Tools
  • Cloud platforms including: AWS, Azure, and GCP. 

  • Infrastructure coding languages: Terraform, CloudFormation, Ansible, Puppet, Python

  • CI/CD: experience working with and supporting build and deploy pipelines and tools: Jenkins, ArgoCD, GitHub Actions, Rundeck

  • Datastore management and query skills: Postgres, MySQL, MongoDB, MSSQL, Elasticsearch, Solr

  • Container orchestration platforms: Docker, Kubernetes, EKS, AKS

  • Familiarity with coding languages including: Go, Node.js, Java, Python

  • Monitoring/Alerting Tools: Prometheus, Grafana, VividCortex, Runscope, CloudWatch, Monitor, VictorOps

  • OS and Container Hardening: STIG, CIS, SELinux, iptables, FIPS 140-2, FIPS 140-3

  • JSON data structures and database schemas

  • API styles and query languages: REST, GQL

Bonus Points If
  • Have a bachelor’s degree in Computer Science or a related field

  • Have worked in regulated or public sector environments, developing and assessing cloud-based solutions

  • Have worked with, developed, or supported continuous integration/continuous deployment systems

  • Have concrete examples ready to present of auto-remediation systems you created

DISCLOSURE

Our company provides equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, sex, national origin, age, disability or genetics.

(Colorado & California Only*): The annual salary listed for the position is a range of $125,000-$135,000. This base pay is for illustrative purposes only and will be determined based on skills and experience comparable to the job requirements. This position may be eligible for additional compensation and benefits including but not limited to: incentive compensation; health benefits; retirement benefits; life insurance; paid time off; parental leave and benefits; and other employee perks and benefits.
 

*Note: Disclosure as required by SB19-085 (8-5-20) of the minimum salary compensation for this role when being hired in Colorado & California.

Top Skills

Ansible
ArgoCD
AWS
Azure
CIS
Docker
Elasticsearch
FIPS 140-2
FIPS 140-3
GCP
Go
Grafana
Helm
iptables
Java
Jenkins
Kubernetes
Linux
MongoDB
MSSQL
MySQL
Postgres
Prometheus
Python
SELinux
Solr
STIG
Terraform

The Company
650 Employees
Year Founded: 2014

What We Do

Veritone (NASDAQ: VERI) designs human-centered AI solutions. Serving customers in the talent acquisition, media, entertainment, and public sector industries, Veritone’s software and services empower individuals at many of the world’s largest and most recognizable brands to run more efficiently, accelerate decision making, and increase profitability. Veritone’s leading enterprise AI platform, aiWARE, orchestrates an ever-growing ecosystem of machine learning models that transforms data sources into actionable intelligence. Guided by its commitment to responsible AI use, Veritone blends human expertise with AI technology to advance human potential and help organizations achieve more than ever before.

Why Work With Us

Our team is growing exponentially, globally. Once hired, you will hit the ground running, supported by people who want to see you succeed. We solve some of the biggest challenges in the world in the energy, government, legal and compliance, and media industries.
