Senior Site Reliability Engineer
Who We Are
Veritone (Nasdaq: VERI) is a leading provider of artificial intelligence (AI) technology and solutions. The company’s proprietary operating system, aiWARE™, orchestrates an expanding ecosystem of machine learning models to transform audio, video, and other data sources into actionable intelligence. aiWARE can be deployed in a number of environments and configurations to meet customers’ needs. Its open architecture enables customers in the media and entertainment, legal and compliance, and government sectors to easily deploy applications that leverage the power of AI to dramatically improve operational efficiency and effectiveness. Veritone is headquartered in Costa Mesa, California, with over 300 employees, and has offices in Denver, London, New York, San Diego, and Seattle. To learn more, visit Veritone.com.
What You’ll Do
- Deploy and maintain a resilient, secure, and efficient SaaS application platform to meet established SLAs.
- Automate monitoring, management, and incident response to achieve an auto-remediation system.
- Monitor site stability and performance and troubleshoot site issues.
- Scale infrastructure to meet rapidly increasing demand.
- Manage cross-functional requirements working with Engineering, Product, Services, and other departments.
- Collaborate with developers to bring new features and services into production.
- Independently design and develop tools to aid in operations and automation as well as work jointly with other team members to deliver innovative solutions to complex business and technical challenges.
- Provide deployment and operations support for multi-tiered distributed software applications.
- Estimate engineering effort, plan implementation, and roll out system changes that meet requirements for functionality, performance, scalability, reliability, and adherence to development goals and principles.
- Collaborate with multiple teams (software development, release management, build and release, etc.) in a fast-paced, dynamic, entrepreneurial organization.
- Define how the behavior of large-scale systems can be achieved
- Measure and achieve reliability through engineering and operations work
- Develop, document, and manage monitoring and alerting with the goal of creating an auto-remediation system
- Adapt security controls to products not typically native to GA releases
- Develop automation methods to extend standard deployment pipelines for bespoke implementations
- Handle patching, policy enforcement, and auditing of production systems
- Drive the Disaster Recovery process
What You'll Need
- Expertise with Terraform and/or Ansible.
- Knowledge of JavaScript, Go, or other programming languages
- 5+ years of professional Linux systems and software management experience
- Expertise with Infrastructure-as-Code including Ansible and Terraform
- Knowledge of programming languages and runtimes including Go, Node.js, and Java
- Experience with managing infrastructure within Azure, GCP and AWS
- Expertise with monitoring and alerting systems including Prometheus and Grafana
- Strong scripting skills for systems and data-driven solutions
- JIRA experience for project/task management
- Extensive experience in troubleshooting large-scale distributed systems.
- Strong background working in AWS, GCP, Azure and general Linux environments.
- Comprehensive background in monitoring, alerting, and auto-remediation systems.
- Proven examples of standardizing security controls across large-scale systems
- Comfort working within project/task management platforms
Bonus Points If
- Bachelor’s degree in Computer Science or related field
- Have worked in regulated or public sector environments through development and assessment of cloud-based solutions
- Have worked with, developed, or supported continuous integration/continuous deployment systems
- Have concrete examples ready to present for creating auto-remediation systems
What’s In It For You
- A competitive compensation package.
- Stock Options.
- Flexible Time Off.
- Quality benefits: medical, dental, vision, 401K.
- An opportunity to be a part of the next big thing in artificial intelligence!
Our company provides equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, sex, national origin, age, disability or genetics.
(Colorado Only*): Minimum annual salary of $100,000. This base pay is for illustrative purposes only and will be determined based on skills and experience comparable to the job requirements. This position may be eligible for additional compensation and benefits including but not limited to: incentive compensation; health benefits; retirement benefits; life insurance; paid time off; parental leave and benefits; and other employee perks and benefits.
*Note: Disclosure as required by SB19-085 (8-5-20) of the minimum salary compensation for this role when being hired in Colorado.