Become a part of our caring community and help us put health first.
We are seeking a skilled Decision Intelligence Engineer to design, train, and continuously improve the reinforcement learning policy at the heart of Humana's Next Best Action platform. In this role you will own the full RL development lifecycle, from feature engineering and reward design through distributed training, evaluation, and production deployment. You will ensure that every decision the platform makes for our 8 million members is informed by a policy that learns and improves with every interaction. You will work at the intersection of healthcare outcomes and decision engineering, translating member journey data into durable, explainable, and auditable decisioning intelligence.
This role is hands-on and research-oriented: you will implement and evaluate RL algorithms, instrument training pipelines, collaborate closely with data and platform engineers, and ensure the model operates correctly within the constraints of clinical eligibility rules and program-specific reward structures.
Key Responsibilities
Reinforcement Learning Model Development
- Design, implement, and evaluate RL algorithms suited to long-horizon, sparse-reward healthcare decisioning, including policy gradient methods (PPO, A3C), value-based approaches (DQN, Q-learning), and offline RL methods (CQL, Decision Transformer).
- Define and maintain the member state representation and action space, evolving both as new programs and data sources are onboarded.
- Apply the Bellman equation, reward shaping, and constraint mapping to encode clinical eligibility, suppression rules, and program-specific objectives directly into the learning objective.
- Manage exploration-exploitation tradeoffs appropriate for a production healthcare environment where poorly explored actions have real member impact.
Model Evaluation and Production Safety
- Build simulation and backtesting environments to evaluate policy quality before production promotion, using historical member journey data.
- Diagnose and remediate common RL failure modes: policy collapse, credit assignment errors across long member journeys, and distributional shift between training and serving populations.
- Define reward threshold criteria and automated evaluation gates within the nightly Databricks training workflow; block promotion of underperforming policies to MLflow production.
- Instrument every training cycle with MLflow, tracking hyperparameters, reward curves, action distributions, and feature importance.
Training Pipeline Engineering
- Own the nightly Databricks training workflow: feature engineering from Gold Activity History and Gold Patient Profile, state vector normalization, distributed RL training via Ray RLlib, and batch scoring of all 8M eligible members.
- Collaborate with the Data Engineering team (Decisioning Team 2) to ensure training inputs are correctly joined, reward signals are accurately computed from disposition outcomes, and the feature pipeline is reproducible and auditable.
- Write production-quality PySpark feature engineering jobs; maintain data lineage through Databricks Unity Catalog.
- Manage model artifacts, versioning, and lifecycle in the MLflow Model Registry; ensure rollback capability is maintained at all times.
Multi-Agent and Constraint-Aware Decisioning
- Apply multi-agent RL concepts (MARL via PettingZoo) where member household or population-level coordination is required.
- Implement constraint mapping to enforce hard business rules — member caps, cooldown periods, clinical eligibility — as constraints within the RL objective rather than downstream filters.
- Collaborate with the Rules Engine team to ensure Drools eligibility guards and RL policy priorities are correctly aligned and do not conflict.
Collaboration and Governance
- Partner with Decisioning Team 1 (Decision Engine, Rules Engine) to ensure model outputs integrate cleanly with the real-time decisioning hot path and that scored recommendations cached in Redis are correctly structured and interpreted.
- Collaborate with platform architects to define feedback loop contracts: how disposition outcomes flow from Kafka back through Databricks Delta Live Tables into the next training cycle.
- Document model behavior, known limitations, and failure modes for clinical and compliance stakeholders; support explainability requirements for member-facing decisions.
- Utilize AI-assisted engineering tools for scaffolding, testing, and documentation; ensure all core model logic and reward design remain human-authored and subject to rigorous peer review.
Use your skills to make an impact
Required Qualifications
- 8+ years of software engineering experience building and operating large-scale production systems, with emphasis on data-intensive platforms, recommendation systems, or optimization engines serving millions of users.
- 3+ years of hands-on experience implementing reinforcement learning or deep learning systems in production, such as policy gradient methods (PPO, A3C), value-based approaches (DQN, Q-learning), or offline RL algorithms (CQL, Decision Transformer).
- Deep familiarity with the Bellman equation, reward shaping, the exploration-exploitation tradeoff, and constraint mapping in real-world RL systems.
- Demonstrated ability to diagnose RL-specific failure modes: policy collapse, credit assignment issues, and distributional shift across large populations.
- Proficiency in Python 3.x; experience with PyTorch or TensorFlow for policy network implementation.
- Experience with Ray RLlib for distributed RL training at scale.
- Experience with Databricks, PySpark, and Delta Lake for large-scale ML pipelines processing tens of millions of records.
- Experience with MLflow for experiment tracking, model registry, and artifact management.
- Track record of shipping ML systems that operate reliably under production load — not just research or prototype work.
Preferred Qualifications
- Experience with multi-agent RL frameworks (PettingZoo or equivalent).
- Familiarity with probabilistic modeling, Markov Decision Processes, and linear programming for constraint-aware action selection.
- Experience operating RL systems in regulated domains — healthcare, finance, or insurance — where member safety, auditability, and explainability are requirements.
- Experience with Gymnasium for simulation environment development and backtesting.
- Familiarity with Kafka-based feedback loops and how disposition signals feed RL retraining pipelines.
- OpenTelemetry instrumentation experience for ML training pipeline observability.
Additional Information
This role is not eligible for work visa sponsorship.
SSN Alert Statement
Humana values personal identity protection. Please be aware that applicants may be asked to provide their Social Security Number, if it is not already on file. When required, an email will be sent from [email protected] with instructions on how to add the information into your official application on Humana's secure website.
WAH Internet Statement
To ensure Home or Hybrid Home/Office employees' ability to work effectively, the self-provided internet service of Home or Hybrid Home/Office employees must meet the following criteria:
- At minimum, a download speed of 25 Mbps and an upload speed of 10 Mbps are required; a wireless, wired cable, or DSL connection is suggested.
- Satellite, cellular, and microwave connections can be used only if approved by leadership.
- Employees who live and work from Home in California, Illinois, Montana, or South Dakota will be provided a bi-weekly payment for their internet expense.
- Humana will provide Home or Hybrid Home/Office employees with telephone equipment appropriate to meet the business requirements for their position/job.
- Work from a dedicated space free of ongoing interruptions to protect member PHI / HIPAA information.
Travel: While this is a remote position, occasional travel to Humana's offices for training or meetings may be required.
Scheduled Weekly Hours
40
Pay Range
The compensation range below reflects a good faith estimate of starting base pay for full time (40 hours per week) employment at the time of posting. The pay range may be higher or lower based on geographic location and individual pay will vary based on demonstrated job related skills, knowledge, experience, education, certifications, etc.
Description of Benefits
Humana, Inc. and its affiliated subsidiaries (collectively, “Humana”) offers competitive benefits that support whole-person well-being. Associate benefits are designed to encourage personal wellness and smart healthcare decisions for you and your family while also knowing your life extends outside of work. Among our benefits, Humana provides medical, dental and vision benefits, 401(k) retirement savings plan, time off (including paid time off, company and personal holidays, volunteer time off, paid parental and caregiver leave), short-term and long-term disability, life insurance and many other opportunities.
Application Deadline: 06-06-2026
About us
About Humana: Humana Inc. (NYSE: HUM) is a leading U.S. healthcare company. Through our Humana insurance services and our CenterWell healthcare services, we make it easier for the millions of people we serve to achieve their best health – delivering the care and service they need, when they need it. These efforts are leading to a better quality of life for people with Medicare and Medicaid, families, individuals, military service personnel, and communities at large. Learn more about what we offer at Humana.com and at CenterWell.com.
Equal Opportunity Employer
It is the policy of Humana not to discriminate against any employee or applicant for employment because of race, color, religion, sex, sexual orientation, gender identity, national origin, age, marital status, genetic information, disability or protected veteran status. It is also the policy of Humana to take affirmative action, in compliance with Section 503 of the Rehabilitation Act and VEVRAA, to employ and to advance in employment individuals with disability or protected veteran status, and to base all employment decisions only on valid job requirements. This policy shall apply to all employment actions, including but not limited to recruitment, hiring, upgrading, promotion, transfer, demotion, layoff, recall, termination, rates of pay or other forms of compensation and selection for training, including apprenticeship, at all levels of employment.
What We Do
At Humana, our cultural foundation is aligned to helping members achieve their best health by delivering personalized, simplified, whole-person healthcare experiences. Recognizing that healthcare needs continue to evolve for each person, for each family and for each community, Humana continuously creates innovative solutions and resources that help people live their healthiest lives on their terms, when and where they need it. Our employees are at the heart of making this happen, and that’s why we are dedicated to building an organization of dynamic talent whose experience and passion center on putting the customer first.