Lead Software Engineer – ML & Agentic Workloads

Reposted 11 Days Ago
Hiring Remotely in USA
Remote
Senior level
Big Data • Information Technology
The Role
Lead the architecture and development of ML systems that integrate multiple models and tools, ensure those systems are secure and efficient, and mentor engineers.
Summary Generated by Built In

SUMMARY

MARA is redefining the future of sovereign, energy-aware AI infrastructure. We’re building a modular platform that unifies IaaS, PaaS, and SaaS, enabling governments, enterprises, and AI innovators to deploy, scale, and govern workloads across data centers, edge environments, and sovereign clouds.

MARA is seeking a Lead Software Engineer to design, build, and scale systems that power agentic and intelligent workloads across our product ecosystem. This role blends deep expertise in machine learning application engineering, prompt orchestration, and retrieval-augmented generation (RAG) with strong software craftsmanship and automation discipline. 

You will lead development of production-grade ML integrations—from model selection and evaluation to deployment pipelines, guardrails, and orchestration frameworks—ensuring that agentic systems are secure, reliable, and explainable. The ideal candidate thrives at the intersection of ML infrastructure, applied AI, and modern software engineering. 

 

ESSENTIAL DUTIES AND RESPONSIBILITIES

  • Lead architecture and development of agentic platforms that integrate multiple models, tools, and knowledge sources into dynamic reasoning systems.
  • Evaluate and deploy foundation and open-source models (LLMs, vision, multimodal) using efficient inference strategies and fine-tuning where applicable.
  • Design and maintain prompt lifecycle pipelines with version control, testing, and CI/CD integration (“PromptOps”).
  • Build and optimize RAG systems—vector database configuration, retriever-generator orchestration, and embedding quality improvement.
  • Implement guardrail frameworks for content safety, hallucination control, and policy enforcement across agentic workflows.
  • Integrate and extend agentic frameworks (LangChain, LangGraph, CrewAI, AutoGen, or equivalent), both in code-based and visual orchestration environments.
  • Collaborate with data, product, and infrastructure teams to design scalable APIs and services that enable model-driven applications.
  • Define observability and evaluation metrics for model performance, latency, and behavior drift in production.
  • Drive best practices for secure AI development, privacy-preserving data handling, and governance of third-party model integrations.
  • Mentor engineers across ML, backend, and platform domains; champion continuous learning and experimentation. 

 

QUALIFICATIONS

  • 8+ years of professional software engineering experience, including 3+ years in ML application development or AI platform engineering.
  • Proficiency in Python, with strong understanding of ML toolchains (PyTorch, Hugging Face, LangChain, MLflow, Ray, etc.).
  • Proven experience with model evaluation, fine-tuning, and deployment across cloud and on-prem environments.
  • Hands-on experience with RAG architectures and vector databases (Weaviate, Milvus, pgvector, LanceDB, FAISS).
  • Deep understanding of prompt design, orchestration, and versioning using CI/CD workflows and automated testing frameworks.
  • Familiarity with agentic systems, both code-driven and visual-builder interfaces (LangGraph Studio, Dust, Flowise, Relevance AI, etc.).
  • Strong knowledge of guardrail techniques (rule-based filters, policy evaluators, toxicity detection, grounding validation).
  • Experience deploying ML systems on Kubernetes and serverless environments with observability (Prometheus, Grafana, OpenTelemetry).
  • Solid understanding of API design, microservice architecture, and data pipeline integration.
  • Excellent communication and leadership skills, with ability to translate complex ML concepts into actionable engineering outcomes.

 

PREFERRED EXPERIENCE

  • Background in HPC, ML infrastructure, or sovereign/regulated environments.
  • Familiarity with energy-aware computing, modular data centers, or ESG-driven infrastructure design.
  • Experience collaborating with European and global engineering partners.
  • Strong communication skills, with the ability to bridge engineering, business, and vendor ecosystems.

Top Skills

FAISS
Grafana
Hugging Face
Kubernetes
LanceDB
LangChain
Milvus
MLflow
OpenTelemetry
pgvector
Prometheus
Python
PyTorch
Ray
Weaviate

The Company
Fort Lauderdale, Florida
180 Employees
Year Founded: 2013

What We Do

Marathon Digital Holdings (NASDAQ:MARA) is a global leader in digital asset compute that develops and deploys innovative technologies to build a more sustainable and inclusive future. Marathon secures the world’s preeminent blockchain ledger and supports the energy transformation by converting clean, stranded, or otherwise underutilized energy into economic value.

For more information, visit www.mara.com, or follow us on:

Twitter: https://twitter.com/marathondh
LinkedIn: www.linkedin.com/company/marathon-digital-holdings
Facebook: www.facebook.com/MarathonDigitalHoldings
Instagram: https://www.instagram.com/marathondigitalholdings/

The information, views, facts and opinions expressed throughout any social media, blogs, videos, written material, website or any medium of information, shared by Marathon Digital Holdings, Inc. are solely those of the author or other content provider and do not express our information, views, facts or opinions. Marathon Digital Holdings, Inc. has neither independently verified any such material, nor does it otherwise endorse or confirm the information provided by the author, or other content provider.
