As a Senior DevOps Engineer at Dynamo AI, you will play a crucial role in ensuring the smooth and efficient operation of our production environments. You will be responsible for building and optimizing our CI/CD pipelines, managing our AWS infrastructure, and overseeing the deployment and maintenance of our Kubernetes clusters. Your role will also involve implementing robust monitoring and alerting systems, maintaining security and compliance standards, and managing infrastructure costs effectively. Your expertise in cloud infrastructure, CI/CD automation, Kubernetes, and security best practices will be essential in driving Dynamo AI's success.
Responsibilities
- CI/CD Automation: Build and improve our CI/CD pipelines using Jenkins and our shared library, ensuring efficient and reliable deployment processes. Manage our CD processes using ArgoCD with an "app of apps" approach, ensuring seamless deployment of applications.
- Production environment setup and maintenance: Manage our entire AWS infrastructure using Terraform, ensuring adherence to best practices. Manage networking configurations, Kubernetes clusters, API Gateway, etc.
- Production environment monitoring: Implement and manage monitoring and alerting using the kube-prometheus stack, ensuring high availability and performance. Utilize Fluentd, Kafka, and Elasticsearch for logging and analytics, ensuring comprehensive visibility into our systems.
- Production security and compliance: Implement and maintain security best practices across the production and corporate environments, ensuring compliance with industry standards and regulations. Work with the broader engineering team to handle production incidents and perform root cause analysis. Ensure rapid response and resolution to minimize impact.
- Cost management: Monitor and manage the cost of our infrastructure, implementing cost-saving measures. Ensure efficient use of resources without compromising on performance and reliability.
- Collaboration with the software engineering team: Work closely with software development teams to understand their needs and communicate infrastructure plans. Document infrastructure and processes clearly and concisely.
Qualifications
- 4+ years of experience in DevOps or a related field, with a proven track record of designing, implementing, and managing cloud infrastructure on AWS or a similar cloud service provider.
- Proficiency in IaC and CI/CD automation tools including Terraform, Jenkins, ArgoCD, etc.
- Strong expertise in managing Kubernetes clusters. Proficiency in Kubernetes, Helm, operators, Prometheus, Kafka, Elasticsearch, Kong API gateway, etc.
- Advanced programming skills in Python and Bash.
- Excellent analytical and troubleshooting skills.
- Strong communication skills, with the ability to work in a team-oriented environment, and a proactive attitude.
- Deep understanding of networking concepts and protocols is highly desirable.
- Solid understanding of security best practices and compliance requirements is a significant advantage.
Dynamo AI is committed to maintaining compliance with all applicable local and state laws regarding job listings and salary transparency. This includes adhering to specific regulations that mandate the disclosure of salary ranges in job postings or upon request during the hiring process. We strive to ensure our practices promote fairness, equity, and transparency for all candidates.
Salary for this position may vary based on several factors, including the candidate's experience, expertise, and the geographic location of the role. Compensation is determined to ensure competitiveness and equity, reflecting the cost of living in different regions and the specific skills and qualifications of the candidate.
What We Do
Dynamo AI is pioneering the first end-to-end secure and compliant generative AI infrastructure that runs in any on-premise or cloud environment.
With a holistic approach to GenAI compliance, we help accelerate enterprise adoption to deploy secure, reliable, and compliant AI applications at scale.
Our platform includes three products:
- DynamoEval evaluates GenAI models for security, hallucination, privacy, and compliance risks.
- DynamoEnhance remediates identified risks, ensuring more reliable operations.
- DynamoGuard offers real-time guardrailing with minimal latency, customizable in natural language.
Our client base and partnerships include Fortune 1000 companies across all industries, which underscores our proven success in securing GenAI in highly regulated environments.