Company Overview
Allen Control Systems (ACS) is a cutting-edge defense startup founded by two former Navy electrical engineers with a proven track record in robotics and software. We are developing a small, autonomous gun turret that employs advanced computer vision and control systems to precisely target and neutralize small drones and loitering munitions. Our innovative approach requires overcoming significant technical challenges, making this an exciting and dynamic environment for experienced engineers.
With an engineering-first culture, ACS values technical excellence and innovation. Backed by our founders’ successful exits from two previous ventures acquired for a combined $180M in 2022, we are committed to ensuring that the groundbreaking technologies we develop have a real-world impact.
Position Overview
We are seeking an experienced Platform Engineer specializing in Computer Vision and Machine Learning (CV/ML) to design, build, and own the data, model, and compute infrastructure powering the ACS CV/ML team. You will help manage a 130+ GPU bare-metal Kubernetes cluster, own the CV/ML CI/CD pipelines, and ensure ML model training proceeds at high volume with low friction.
What You'll Do:
- Deploy and operate Kubernetes clusters on bare-metal infrastructure hosting 130+ NVIDIA GPUs, with hybrid burst capability to AWS for scalable compute and storage workloads.
- Manage NVIDIA GPU clusters for ML training.
- Own the ACS CV/ML CI/CD pipeline.
- Improve and maintain core ML infrastructure, such as model registration and versioning, experiment tracking, and model and data provenance tracking.
- Improve and maintain ML model testing, performance analysis, and reporting tools.
- Automate repetitive model training and testing tasks to increase developer velocity.
- Work with Software Team Platform Engineers to ensure efficient coordination and minimal duplication between CV/ML infrastructure and wider Software infrastructure.
- Collaborate with the Software Team to automate the optimization of models (TensorRT/quantization) for deployment on NVIDIA Jetson and other edge hardware.
Required Technical Skills:
- 2+ years of experience in Platform Engineering or DevOps/MLOps.
- Strong programming skills for automating ML lifecycles and building custom CLI tools for CV engineers.
- Hands-on experience with NVIDIA GPU infrastructure, including managing CUDA libraries and development environments, GPU Operator, device plugins, and scheduling (MIG, Volcano, or fractional GPU sharing).
- Experience implementing and maintaining MLOps platforms such as Kubeflow, MLflow, Weights & Biases (W&B), or DVC for experiment tracking and model versioning.
- Familiarity with high-performance storage solutions (e.g., MinIO, WEKA, or Ceph) and data orchestration tools capable of handling terabytes of video/image data.
- Proven track record building CI/CD pipelines that include automated model validation, performance benchmarking, and artifact management for both cloud and edge targets.
- Experience with model optimization toolchains, including TensorRT, ONNX, and quantization techniques, specifically for cross-compilation to ARM targets like NVIDIA Jetson.
- Proficiency with observability stacks (ELK, Prometheus/Grafana) adapted for ML, including monitoring GPU health, training throughput, and model inference metrics.
- Strong Linux systems knowledge (Debian/Ubuntu), including networking for high-throughput data, storage, and security hardening for defense-grade production environments.
What We Offer
- Competitive salary
- Health, dental, and vision insurance
- Paid Time Off
Allen Control Systems is an Equal Opportunity Employer, providing equal employment opportunities to all employees and applicants for employment. Allen Control Systems prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.
Skills Required
- 2+ years of experience in Platform Engineering or DevOps/MLOps
- Experience deploying and operating Kubernetes clusters on bare-metal infrastructure hosting 130+ NVIDIA GPUs, with hybrid AWS bursting
- Strong programming skills for automating ML lifecycles and building custom CLI tools
- Hands-on experience with NVIDIA GPU infrastructure including CUDA libraries, GPU Operator, device plugins, MIG, and fractional GPU scheduling (Volcano or similar)
- Experience implementing and maintaining MLOps platforms (Kubeflow, MLflow, Weights & Biases (W&B), or DVC)
- Familiarity with high-performance storage solutions (MinIO, WEKA, Ceph) and data orchestration for terabytes of video/image data
- Proven track record building CI/CD pipelines for ML including automated model validation, performance benchmarking, and artifact management for cloud and edge
- Experience with model optimization toolchains (TensorRT, ONNX, quantization) and cross-compilation to ARM targets like NVIDIA Jetson
- Proficiency with observability stacks adapted for ML (ELK, Prometheus/Grafana) including GPU health and training/inference metrics
- Strong Linux systems knowledge (Debian/Ubuntu), networking for high-throughput data, storage, and security hardening for defense-grade environments
What We Do
Allen Control Systems is a defense technology company built for a new era of drone warfare, with the goal of completely changing battlefield economics. ACS is developing counter-drone robotic gun systems designed to neutralize attacking drone swarms, drones that are pre-programmed with AI, and drones that are non-jammable. ACS was created to lower the cost per kill of a drone to a few dollars. We do this by combining cutting-edge hardware and software that lets us point an inexpensive gun that already exists in the field more accurately than anyone ever has before. ACS is a remote organization, with our HQ in Austin, Texas, and an office in Alexandria, Va. If you're passionate about our mission, we’d love to hear from you.