You’ll be a generalist responsible for building and running large-scale data, machine learning, and agentic systems. The focus is operational ML/AI, including agentic systems and geospatial data pipelines. You should be comfortable owning the full lifecycle: from data ingestion and distributed processing to model development, deployment, and monitoring. This role requires the ability to iterate quickly from initial concept to a robust, production-ready solution.
Key Responsibilities
- Take ownership of the end-to-end AI/ML lifecycle, with a strong focus on handling complex and messy data, thoroughly evaluating different approaches, deploying robust models, and managing cost vs. performance tradeoffs.
- Build large-scale, agent-based systems with access to external systems from the ground up, and integrate them with our existing infrastructure.
- Establish observability for pipelines, models, and agents (metrics, tracing, alerting).
- Collaborate with product and customer teams to drive revenue.
Requirements
- Strong experience with distributed data processing, particularly Spark and SQL.
- Proven expertise in building production machine learning systems, including working with large, wide datasets and handling training, deployment, and monitoring effectively.
- Experience designing and deploying task-oriented AI agents and working with coding agents.
- Experience working with cloud services across data, compute, and ML.
- Strong communication abilities, reflected in code architecture and documentation clear enough that any technical team member can troubleshoot and contribute easily.
Languages: Scala, Python
Tools / Frameworks: Spark, AWS SageMaker / Bedrock, Kubernetes
Nice to Haves
- Startup experience or growing projects from 0 to production in a larger org.
- Experience with large geospatial datasets, formats, and indexing strategies.
- Experience building operational AI agents that work at scale (millions of separate, complex tasks, including web research).
- Experience with fine-tuning, distilling, and self-hosting LLMs.
- Experience in traditional ML, with a focus on working with messy data and robust evaluation of model approaches.
- Proficiency with CI/CD, infrastructure as code, and containerization.
What Success Looks Like
- ML/AI models deployed with robust monitoring and significant customer impact.
- Agentic workflows improving internal/external operations.
- Infrastructure that is stable, observable, and automated.
- Successful iteration and delivery of new ML/AI products from concept to production.
- Ability to contribute to existing geospatial pipelines, either directly or through the use of AI.
What We Do
PartnerOne is an enterprise software company that manages the world’s largest data environments through virtualized cloud storage, hyper-automation, artificial intelligence, and metadata analytics. Unlike other software companies, we play a mission-critical role in not just one, but many aspects of the enterprise Big Data cycle. Over 1,250 of the world’s largest data environments rely on our software for their most critical needs and to safeguard their most valuable data.









