A Meta Reinforcement Learning Approach to Goals-Based Wealth Management

Sanjiv R. Das
Harshad Khadilkar
Sukrit Mittal
Daniel Ostrov
Deep Srivastav
Hungjen Wang
Published on 5/4/2026
Equities
Fixed Income
Cross-asset
Reinforcement learning
Machine learning
Deep learning
Risk management
Diversification
Factor allocation

This paper introduces a meta reinforcement learning (MetaRL) approach for goals-based wealth management (GBWM), where investors optimize dynamic portfolio selection and goal-fulfillment decisions over time to maximize expected utility from multiple financial goals. The MetaRL model is pre-trained on thousands of synthetic GBWM problems using a dual-agent Proximal Policy Optimization (PPO) algorithm, with one agent for goal-taking and another for portfolio choice. The state space is carefully normalized to ensure scale invariance and generalizability, incorporating 26 variables such as time, wealth relative to discounted goal costs, aggregated utility and cost blocks, and heuristic indicator states. During inference, the model produces near-optimal strategies for new investor problems in milliseconds, without requiring retraining.
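The scale-invariance idea behind the normalized state can be illustrated with a minimal sketch. It covers only two of the 26 state variables (time and wealth relative to discounted goal costs). The function names, the flat 3% discount rate, and the `(due_year, cost)` goal format are assumptions made for illustration, not the paper's definitions.

```python
def discounted_goal_cost(goals, t, rate):
    """Present value at time t of all goal costs due at or after t.

    goals: list of (due_year, cost) pairs (hypothetical format).
    rate:  flat annual discount rate (assumed; the paper's rate is not given here).
    """
    return sum(cost / (1.0 + rate) ** (year - t)
               for year, cost in goals if year >= t)


def normalized_state(wealth, t, horizon, goals, rate=0.03):
    """Return two scale-free features: elapsed-time fraction and wealth ratio."""
    pv = discounted_goal_cost(goals, t, rate)
    time_frac = t / horizon
    # Assumed convention: with no remaining goal costs the ratio is unbounded.
    wealth_ratio = wealth / pv if pv > 0 else float("inf")
    return time_frac, wealth_ratio


goals = [(5, 50_000.0), (10, 120_000.0)]
small = normalized_state(100_000.0, 2, 10, goals)

# Scaling wealth and every goal cost by the same factor leaves the features
# unchanged, which is the property that lets one policy serve investors of
# very different sizes.
big = normalized_state(1_000_000.0, 2, 10, [(y, 10 * c) for y, c in goals])
```

Because both features are ratios, an investor with ten times the wealth and ten times the goal costs maps to the same state, so a single pre-trained policy can be queried without rescaling or retraining.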

Experimental results on 66 test cases show that MetaRL achieves an average of 97.8% of the optimal expected utility computed by dynamic programming (DP), while being over 100 times faster for determining current actions. The model exhibits robustness to changes in capital market regimes and efficient frontier parameters, even when trained on a single regime. Moreover, MetaRL can handle extensions such as stochastic inflation, which adds state variables and renders DP computationally infeasible due to the curse of dimensionality. This work demonstrates that meta-learning can effectively address complex, multi-goal financial optimization problems, offering a scalable and practical alternative to traditional DP-based methods.
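The curse-of-dimensionality argument can be made concrete with a back-of-the-envelope count of DP grid nodes. The grid resolution (100 points per state variable) and the 50-period horizon below are illustrative assumptions, and `dp_grid_nodes` is a hypothetical helper; the paper's actual discretization is not described in this summary.

```python
def dp_grid_nodes(points_per_dim, n_state_vars, n_periods):
    """Number of (time, state) nodes a grid-based DP must evaluate.

    Assumes a full tensor-product grid with the same resolution in every
    state dimension, which is a simplification.
    """
    return n_periods * points_per_dim ** n_state_vars


# Node counts for 1 to 4 state variables on an assumed 100-point grid.
counts = {n: dp_grid_nodes(100, n, 50) for n in (1, 2, 3, 4)}
for n, nodes in counts.items():
    print(f"{n} state variable(s): {nodes:,} nodes")
```

Each added state variable multiplies the work by the grid resolution, whereas a neural policy only gains one more input feature.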

Highlights

  • Develops a meta reinforcement learning (MetaRL) approach pre-trained on thousands of goals-based wealth management (GBWM) problems, enabling zero-shot inference for new investor scenarios.
  • Achieves near-optimal expected utilities averaging 97.8% of the dynamic programming (DP) optimum, with inference speeds over 100 times faster than DP for policy decisions.
  • Demonstrates remarkable robustness to capital market regime changes, even when trained on a single regime.
  • Extends to larger state spaces (e.g., stochastic inflation) where DP becomes computationally infeasible due to the curse of dimensionality.

Methods

  • Dual-agent Proximal Policy Optimization (PPO) with separate actor networks for goal-taking and portfolio selection.
  • Normalized state space design (26 variables) including time, wealth relative to discounted goal costs, utility/cost blocks, and indicator states from deterministic Monte Carlo simulations.
  • Training on 1000 randomly generated GBWM scenarios with varying horizons (5–50 years), wealth, goals, and infusions, using five random seeds.
  • Comparison with dynamic programming (DP) using a test suite of 66 unseen problems, evaluated via Monte Carlo simulation of decision heatmaps.
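The random scenario generation described in the methods above might look roughly like the following sketch. Only the horizon range (5–50 years) and the scenario count (1000) come from the text; the wealth, cost, utility, and infusion ranges, the cap of 10 goals, and the dictionary layout are all hypothetical placeholders.

```python
import random


def sample_scenario(rng):
    """Draw one synthetic GBWM problem from assumed sampling ranges."""
    horizon = rng.randint(5, 50)                # years; range taken from the text
    n_goals = rng.randint(1, min(horizon, 10))  # assumed cap of 10 goals
    goal_years = sorted(rng.sample(range(1, horizon + 1), n_goals))
    goals = [
        {"year": y,
         "cost": rng.uniform(10.0, 100.0),      # assumed cost range
         "utility": rng.uniform(1.0, 10.0)}     # assumed utility range
        for y in goal_years
    ]
    return {
        "horizon": horizon,
        "initial_wealth": rng.uniform(50.0, 500.0),                    # assumed
        "goals": goals,
        "infusions": [rng.uniform(0.0, 20.0) for _ in range(horizon)],  # assumed
    }


def make_training_set(n_scenarios=1000, seed=0):
    """Build a reproducible set of scenarios from a single seed."""
    rng = random.Random(seed)
    return [sample_scenario(rng) for _ in range(n_scenarios)]


train = make_training_set()
```

Using a dedicated `random.Random(seed)` instance keeps each generated set reproducible, so repeating an experiment under several seeds only requires changing one argument.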

Results

  • RL inference determines current actions 100–237 times faster than DP (mean 20.94 ms vs. 2198 ms for problems with goals).
  • Mean RL-Efficiency (ratio of RL to DP expected utility) is 0.978 across 66 test cases, with a minimum of 0.917 and a maximum of 0.999.
  • MetaRL generalizes to out-of-distribution scenarios (e.g., 100-year horizons, different efficient frontiers) without retraining.
  • Extending to stochastic inflation (4 state variables) does not noticeably slow MetaRL, while DP becomes infeasible.
  • Runtimes for computing expected utility are comparable between RL and DP, with RL scaling better for larger problems.
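The two headline metrics in the results above are simple ratios, sketched below. The function names are assumptions; the timing inputs are the mean runtimes quoted in the text, and the utility values in the last line are made-up numbers used only to show the calculation.

```python
def rl_efficiency(rl_utility, dp_utility):
    """Ratio of RL expected utility to the DP optimum (1.0 means optimal)."""
    return rl_utility / dp_utility


def speedup(dp_ms, rl_ms):
    """How many times faster RL inference is than DP for one decision."""
    return dp_ms / rl_ms


# Ratio of the reported mean runtimes for problems with goals.
mean_speedup = speedup(2198.0, 20.94)

# Hypothetical utilities, chosen only to illustrate the efficiency ratio.
example_efficiency = rl_efficiency(97.8, 100.0)
```

The ratio of the two reported means is about 105, which sits inside the reported per-problem range of 100–237. A ratio of means generally differs from a mean of per-problem ratios, so it should not be read as the average speedup.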