Articul8 AI is at the forefront of Generative AI innovation, delivering cutting-edge SaaS products that transform how businesses operate. Our platform empowers organizations to leverage the power of artificial intelligence in a reliable, scalable, and secure environment.
Position Overview
We are seeking an experienced Site Reliability Engineer (SRE) to join our team and help ensure the reliability, performance, and scalability of our GenAI SaaS platform. As an SRE, you will bridge the gap between development and operations, implementing automation and best practices to meet our service reliability objectives while supporting rapid innovation.
Key Responsibilities
Architect and maintain scalable, highly available infrastructure for our GenAI platform.
Design and implement robust monitoring, alerting, and observability solutions to proactively ensure system health and performance.
Automate deployment, scaling, and management of our cloud-native infrastructure, reducing toil and improving efficiency.
Define, measure, and improve Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to deliver outstanding service quality.
Participate in on-call rotations and provide rapid response to production incidents, minimizing downtime and user impact.
Collaborate closely with development teams to build reliable, scalable, and efficient systems for complex AI workloads.
Lead incident response efforts, conduct thorough post-mortems, and champion continuous improvement initiatives.
Optimize infrastructure for performance, scalability, and cost-effectiveness, especially for high-demand AI workloads.
Implement and enforce security best practices across all systems and environments.
Create and maintain comprehensive documentation, including runbooks and knowledge base articles, to foster a culture of shared knowledge.
Qualifications
Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience
5+ years of experience in DevOps, SRE, or similar roles
Strong experience with cloud platforms (AWS, GCP, or Azure)
Proficiency in at least one programming/scripting language (Python, Go, Bash, etc.)
Hands-on experience with infrastructure as code tools (Terraform, CloudFormation, etc.)
Solid background in containerization technologies (Docker, Kubernetes)
Proven experience with monitoring and observability tools (Prometheus, Grafana, ELK stack, etc.)
Strong understanding of CI/CD pipelines and automation
Exceptional troubleshooting and problem-solving skills, including the ability to diagnose issues in complex systems
Preferred Qualifications
Experience supporting AI/ML systems in production
Knowledge of GPU infrastructure management and optimization
Familiarity with distributed systems and high-performance computing
Experience with database systems (SQL and NoSQL)
Certifications in cloud platforms (AWS, GCP, Azure)
Experience with chaos engineering and resilience testing
Knowledge of security best practices and compliance requirements
Ready to shape the future of resilient software systems? Apply now and help drive the reliability of tomorrow’s AI at Articul8 AI!
NOTE: This position is available via CLT contract only. Thank you!
What We Do
Articul8 AI is a technology company whose products transform enterprise data and expertise into powerful engines of growth, value, and impact. Our full-stack GenAI platform is revolutionizing how enterprises harness their data and expertise to build expert-level Generative AI applications for their mission-critical challenges. Our products deliver enterprise-scale impact with ROI in hours to weeks. General-purpose GenAI models, while necessary, are not sufficient to deliver enterprise-specific decisioning and actioning. Our platform addresses this gap by making it straightforward for companies to build sophisticated, enterprise-scale, expert-level GenAI applications that encode their domain expertise. Our proprietary technology does the heavy lifting through autonomous decisions and actions, automated data intelligence, and improved precision and relevance, drawing on industry knowledge encoded in Articul8's library of domain- and task-specific models. We are purpose-built for regulated industries and meet the highest standards of compliance, data security, privacy, and performance, including traceability and auditability at every step. We are trusted by leading global enterprises such as AIAA, Itochu Techno-Solutions Corporation, Uptycs, AWS, NIQ, Intel, and Franklin Templeton to transform their mission-critical work.
We are the enterprise GenAI platform that simply works! For more information, please visit www.articul8.ai.