We're looking for a product-minded Machine Learning Engineer to pioneer the
engineering of intelligent resilience systems at Fusion. This role will focus on designing,
building, deploying, and operating production-grade machine learning systems,
including predictive models, reinforcement learning, and optimization-driven
intelligence, to power the next generation of resilience capabilities.
A core focus of this role is building ML systems that get smarter over time. Fusion's data
strategy centers on three proprietary feedback loops (predictive threat intelligence,
threat escalation prediction, and ML-powered recovery modeling) in which customer
outcomes flow back to retrain and improve models continuously. You will own the
infrastructure that makes these flywheels work: model evaluation, automated retraining,
CI/CD for models, drift detection, and governance at scale.
This is a high-ownership role for someone who thrives at the intersection of software
engineering and machine learning, someone who wants to build durable ML
infrastructure, ship intelligent product features, and ensure that production models are
rigorously evaluated, reliably deployed, and continuously improved.
Key Responsibilities
• Design, build, deploy, and maintain production machine learning systems, including predictive models for threat intelligence, escalation timing, and recovery prediction.
• Own the end-to-end model lifecycle for flywheel use cases: data ingestion, feature engineering, training, rigorous evaluation, deployment, monitoring, and automated retraining based on customer outcome data.
• Build and maintain robust model evaluation frameworks (including offline metrics, A/B testing infrastructure, backtesting against historical outcomes, and calibration analysis) to ensure models improve with each retraining cycle.
• Architect scalable ML pipelines with full CI/CD: automated testing of model code and artifacts, validation gates before promotion, staged rollouts, and rollback capabilities.
• Own ML Ops and AI Ops practices, including automated model validation, performance monitoring, drift detection, observability dashboards, and governance frameworks.
• Maintain and expand operations for simulation (Monte Carlo, Bayesian Networks) and optimization engines (linear, constraint, CP-SAT) for continued reliable service.
• Design ML systems that operate across both managed cloud and customer-hosted (reverse SaaS) environments, with pluggable inference adapters that respect customer governance boundaries.
• Refactor and harden existing AI systems to improve scalability, latency, cost efficiency, and fault tolerance.
• Build and maintain data pipelines and feature engineering workflows that support reliable and reproducible model training.
• Collaborate closely with product and engineering teams to translate resilience use cases into scalable, maintainable ML-powered product capabilities.
Knowledge, Skills, and Abilities
- Strong software engineering foundation with hands-on experience building and deploying machine learning systems in production environments.
- Deep experience with model evaluation methodology, including metric selection, offline/online evaluation, statistical testing, calibration, and understanding when a model is ready for production.
- Strong experience with ML Ops tooling and practices: CI/CD pipelines for model code and artifacts, automated testing, model registries, experiment tracking, and reproducible training.
- Experience designing and operating feedback-loop or continuous-learning ML systems where production outcomes are used to retrain and improve models over time.
- Experience with reinforcement learning, decision systems, simulation modeling, or optimization techniques.
- Proficiency in writing clean, maintainable, well-tested code with version control, CI/CD, and observability best practices.
- Experience with containerized deployments and orchestration (Docker, Kubernetes, Helm) and deploying ML services in both cloud and on-premise/VPC environments.
- Familiarity with drift detection, model monitoring, alerting, and governance frameworks for production ML.
- Experience designing ML architectures, APIs, and services that integrate with enterprise SaaS platforms.
- Ability to design modular, extensible ML systems that evolve alongside product requirements.
- Familiarity with AI-assisted development tools (e.g., Copilot, Cursor, Claude Code, or similar) and comfort using them to accelerate ML engineering workflows.
- Strong communication skills and the ability to explain model behavior, evaluation results, tradeoffs, and architectural decisions to technical and non-technical stakeholders.
Qualifications (Education and Experience)
• Bachelor's or Master's degree in Computer Science, Machine Learning, Artificial Intelligence, Engineering, or a related field.
• 3+ years of experience building, deploying, and operating machine learning systems in production environments.
• Demonstrated experience with model evaluation, validation, and testing in production ML systems (strongly preferred).
• Experience building CI/CD pipelines for ML, including automated testing, validation gates, and staged deployments (strongly preferred).
• Experience with feedback-loop or continuous-learning ML architectures where models retrain on outcome data (preferred).
• Experience with reinforcement learning, decision intelligence systems, or control systems (preferred).
• Experience with simulation, optimization, constraint programming, or operations research techniques (preferred).
• Experience building ML pipelines in cloud environments (Azure preferred).
• Experience deploying ML systems in hybrid cloud/on-premise environments (nice to have).
Milestones for the First Six Months
In One Month, You Will:
- Complete onboarding and gain familiarity with Fusion's resilience domain, data strategy, existing product line, and simulation and optimization engines
- Review and assess the current ML pipeline, model evaluation practices, and deployment workflows
- Contribute code to existing ML systems and participate in production improvements
In Three Months, You Will:
- Design and deploy the evaluation and retraining framework for at least one flywheel use case (threat intelligence, escalation prediction, or recovery modeling)
- Implement CI/CD pipelines for model training, validation, and deployment with automated testing and promotion gates
- Implement monitoring, drift detection, and automated validation for one production ML system
In Six Months, You Will:
- Own and deliver a production-grade flywheel-powered ML capability with end-to-end evaluation, retraining, and governance
- Establish baseline ML Ops standards for model deployment, CI/CD, monitoring, retraining, and governance across Fusion's ML systems
- Lead architectural improvements to Fusion's ML infrastructure, including support for hybrid cloud/VPC deployment
- Propose and prototype new ML-driven product capabilities that extend Fusion's resilience intelligence platform
Compensation & Benefits
The annual base salary range for this position is $135,000-$155,000, depending on experience, qualifications, and relevant skill set. The position is also eligible for an annual bonus. Fusion offers a comprehensive benefits package including medical, dental, vision, and a 401(k) plan.
Disclaimers
Fusion is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, disability, age, pregnancy, military service or discharge status, genetic information, sex, sexual orientation, gender identity, or national origin. Nothing in this job posting should be construed as an offer or guarantee of employment.
What We Do
Fusion Risk Management is recognized as the most innovative and fastest-growing provider of cloud-based enterprise software for business continuity risk management, IT disaster recovery, and crisis management. Fusion is transforming the industry and has been named a leader in Gartner's Magic Quadrant for Business Continuity Management software.
Why Work With Us
Fusion provides a highly collaborative work environment where motivated employees can advance their careers and contribute to Fusion's success. Work-life balance is of high importance at Fusion. We are committed to fostering an environment of trust, inclusion, transparency, and innovation, one that encourages hard work, passion, and having fun.
Hybrid Workspace
Employees engage in a combination of remote and on-site work.
We have a Chicago headquarters and another office in London. While we are very much a remote environment, we encourage attendance for those who wish to work in the office, and we sponsor a number of in-person and remote engagement activities.