LanceDB is a developer-friendly, open-source data lake for multimodal AI. From hyper-scalable vector search to advanced retrieval for RAG, from streaming training data to interactive exploration of large-scale AI datasets, LanceDB is the best foundation for your AI application, and powers some of the most groundbreaking applications and challenging requirements today.
About the role
We’re seeking a seasoned Cloud Infrastructure Engineer with deep expertise in automation, infrastructure-as-code (IaC), and cloud platform management. You’ll design, deploy, and maintain robust cloud environments while collaborating with cross-functional teams to streamline CI/CD pipelines, enhance system reliability, and drive operational excellence.
As a Cloud Infrastructure Engineer at LanceDB, your responsibilities will include:
Design & Build Cloud Infrastructure: Architect and manage secure, scalable cloud environments (AWS, Azure, GCP) using IaC tools like Terraform and CloudFormation.
Automate Everything: Develop and maintain automation scripts to streamline deployments, monitoring, and system operations.
Systems Reliability: Implement monitoring/alerting solutions (Prometheus, Grafana, Datadog) to proactively address performance bottlenecks and ensure 99.9% uptime.
Security & Compliance: Enforce security policies, manage secrets (Vault, AWS KMS), and ensure compliance with industry standards (GDPR, SOC2).
Troubleshoot & Optimize: Resolve complex infrastructure issues and lead cost-optimization initiatives for cloud resources.
Collaborate & Mentor: Partner with software engineering teams to integrate DevOps practices into the SDLC and mentor junior engineers on IaC and cloud best practices.
10+ years in DevOps, Cloud Infrastructure, or SRE roles, with hands-on experience in public cloud platforms (AWS, Azure, GCP, Heroku).
Strong experience operating and supporting production distributed systems and/or databases-as-a-service in the cloud (AWS, Azure, GCP), where that system was the company's primary product. Being a user of a cloud service provider's database, such as RDS or BigQuery, does not count. This is a hard requirement; applicants without this experience will not qualify for this role.
Expertise in IaC tools (Puppet, Terraform, Ansible, CloudFormation) and configuration management.
Experience designing and managing complex production environments using Kubernetes and Helm.
Deep understanding of networking, security, and cloud architecture best practices.
Experience with monitoring tools (Prometheus, Grafana) and logging systems (ELK, Splunk).
Strong knowledge of CI/CD tools (GitHub Actions) and containerization (Docker, Kubernetes).
You like working with a small, high-caliber team with a lot of autonomy and drive, and you can iterate fast.
You’ve made substantial contributions to open-source projects (e.g., Puppet modules, Terraform providers).
You design and automate single-command deployments for complex, globally distributed systems to ensure consistency, reliability, and scalability across multi-cloud or hybrid environments.
You fearlessly challenge the status quo and dismiss mediocre engineering as unacceptable.
You have worked on large-scale distributed systems and have a good understanding of how to use tracing tools to identify bottlenecks.
Experience building large-scale semantic search and/or caching systems is especially relevant.
You’ll join a world-class team of open-source builders (co-authors of pandas, and contributors to HDFS, Arrow, Iceberg, and HBase) working on cutting-edge AI infrastructure. You’ll collaborate on systems that power next-generation AI workloads while shaping how LanceDB operates and scales production environments.
What We Do
LanceDB is a developer-friendly, open-source database for multimodal AI. From hyper-scalable vector search to advanced retrieval for RAG, from streaming training data to interactive exploration of large-scale AI datasets, LanceDB is the best foundation for your AI application.









