A Meta Reinforcement Learning Approach to Goals-Based Wealth Management

Sanjiv R. Das

Harshad Khadilkar

Sukrit Mittal

Daniel Ostrov

Deep Srivastav

Hungjen Wang

Published on 5/4/2026

Equities

Fixed Income

Cross-asset

Reinforcement learning

Machine learning

Deep learning

Risk management

Diversification

Factor allocation

This paper introduces a meta reinforcement learning (MetaRL) approach for goals-based wealth management (GBWM), where investors optimize dynamic portfolio selection and goal-fulfillment decisions over time to maximize expected utility from multiple financial goals. The MetaRL model is pre-trained on thousands of synthetic GBWM problems using a dual-agent Proximal Policy Optimization (PPO) algorithm, with one agent for goal-taking and another for portfolio choice. The state space is carefully normalized to ensure scale invariance and generalizability, incorporating 26 variables such as time, wealth relative to discounted goal costs, aggregated utility and cost blocks, and heuristic indicator states. During inference, the model produces near-optimal strategies for new investor problems in milliseconds, without requiring retraining.
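
To illustrate why expressing wealth relative to discounted goal costs makes the state scale invariant, here is a minimal sketch. It is not the paper's 26-variable state: the function names, the 3% discount rate, and the cap on the ratio are all assumptions made for illustration.

```python
from typing import Sequence, Tuple

Goal = Tuple[int, float]  # (year the goal is due, cost of the goal)


def discounted_goal_cost(goals: Sequence[Goal], t: int, rate: float) -> float:
    """Present value at time t of all goal costs due at or after t.

    `rate` is a hypothetical annual discount rate, not a value from the paper.
    """
    return sum(cost / (1.0 + rate) ** (year - t) for year, cost in goals if year >= t)


def normalized_state(
    t: int,
    horizon: int,
    wealth: float,
    goals: Sequence[Goal],
    rate: float = 0.03,   # assumed discount rate
    cap: float = 10.0,    # assumed upper bound to keep the feature finite
) -> Tuple[float, float]:
    """Return two illustrative features: elapsed-time fraction and wealth ratio."""
    pv = discounted_goal_cost(goals, t, rate)
    time_frac = t / horizon
    # Wealth is measured in units of remaining discounted goal cost, so the
    # feature does not change when every dollar amount is rescaled.
    wealth_ratio = min(wealth / pv, cap) if pv > 0 else cap
    return time_frac, wealth_ratio
```

Under these assumptions, an investor with 100,000 in wealth and goals costing 50,000 and 80,000 maps to the same features as an investor whose wealth and goal costs are all 1,000 times larger, which is the property that lets a single pre-trained policy serve problems of very different dollar scale.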

Experimental results on 66 test cases show that MetaRL achieves an average of 97.8% of the optimal expected utility computed by dynamic programming (DP), while being over 100 times faster for determining current actions. The model exhibits robustness to changes in capital market regimes and efficient frontier parameters, even when trained on a single regime. Moreover, MetaRL can handle extensions such as stochastic inflation, which adds state variables and renders DP computationally infeasible due to the curse of dimensionality. This work demonstrates that meta-learning can effectively address complex, multi-goal financial optimization problems, offering a scalable and practical alternative to traditional DP-based methods.
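
The curse of dimensionality mentioned above can be made concrete with a rough count of the nodes a tabular DP must visit. The grid resolution and period count below are hypothetical and are not taken from the paper; the sketch only shows how the table grows exponentially with the number of state variables.

```python
def dp_table_size(points_per_dim: int, n_state_vars: int, n_periods: int) -> int:
    """Number of (time, state) nodes in a tabular DP on a regular grid.

    Assumes every state variable is discretized to the same number of grid
    points, which is a simplification made for illustration.
    """
    return n_periods * points_per_dim ** n_state_vars


if __name__ == "__main__":
    # Hypothetical setup: 100 grid points per variable, 50 annual periods.
    for n_vars in (1, 2, 4):
        print(n_vars, dp_table_size(100, n_vars, 50))
```

With these assumed numbers, one state variable gives 5,000 nodes while four give 5 billion, before counting the actions evaluated at each node. A pre-trained policy network, by contrast, only sees a slightly longer input vector.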

Highlights

1. Develops a meta reinforcement learning (MetaRL) approach pre-trained on thousands of goals-based wealth management (GBWM) problems, enabling zero-shot inference for new investor scenarios.
2. Achieves near-optimal expected utilities averaging 97.8% of the dynamic programming (DP) optimum, with inference speeds over 100 times faster than DP for policy decisions.
3. Demonstrates remarkable robustness to capital market regime changes, even when trained on a single regime.
4. Extends to larger state spaces (e.g., stochastic inflation) where DP becomes computationally infeasible due to the curse of dimensionality.

Methods

- Dual-agent Proximal Policy Optimization (PPO) with separate actor networks for goal-taking and portfolio selection.
- Normalized state space design (26 variables) including time, wealth relative to discounted goal costs, utility/cost blocks, and indicator states from deterministic Monte Carlo simulations.
- Training on 1000 randomly generated GBWM scenarios with varying horizons (5–50 years), wealth, goals, and infusions, using five random seeds.
- Comparison with dynamic programming (DP) using a test suite of 66 unseen problems, evaluated via Monte Carlo simulation of decision heatmaps.
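
For readers unfamiliar with PPO, the sketch below computes the standard clipped surrogate loss that each actor would minimize. It is a generic textbook form written in plain Python, not the authors' implementation. The clip value of 0.2 is a common default and an assumption here, and this summary does not say how the two agents share advantages or combine their losses.

```python
import math
from typing import Sequence


def ppo_clip_loss(
    new_logp: Sequence[float],    # log-probabilities under the current policy
    old_logp: Sequence[float],    # log-probabilities under the data-collecting policy
    advantages: Sequence[float],  # advantage estimates for the sampled actions
    clip_eps: float = 0.2,        # assumed clipping range
) -> float:
    """Negative mean of the PPO clipped surrogate objective."""
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)  # probability ratio pi_new / pi_old
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Taking the minimum removes any incentive to push the ratio
        # outside the interval [1 - clip_eps, 1 + clip_eps].
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)
```

In a dual-agent setup, one plausible arrangement is to evaluate this loss once with the goal-taking actor's log-probabilities and once with the portfolio actor's, then update each network on its own loss.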

Results

- RL inference determines the current action 100–237 times faster than DP (mean 20.94 ms vs. 2198 ms for problems with goals).
- Mean RL-Efficiency (the ratio of RL expected utility to DP expected utility) is 0.978 across 66 test cases, with a minimum of 0.917 and a maximum of 0.999.
- MetaRL generalizes to out-of-distribution scenarios (e.g., 100-year horizons, different efficient frontiers) without retraining.
- Extending to stochastic inflation (4 state variables) does not noticeably slow MetaRL, while DP becomes infeasible.
- Runtimes for computing expected utility are comparable between RL and DP, with RL scaling better for larger problems.
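
The two headline metrics are simple ratios, sketched below. The runtimes are the means quoted above; the utility values in the second call are hypothetical placeholders chosen only to show the calculation.

```python
def speedup(dp_ms: float, rl_ms: float) -> float:
    """How many times faster RL inference is than DP for one decision."""
    return dp_ms / rl_ms


def rl_efficiency(rl_utility: float, dp_utility: float) -> float:
    """Ratio of RL expected utility to the DP optimum (1.0 means optimal)."""
    return rl_utility / dp_utility


if __name__ == "__main__":
    # Mean runtimes reported above for problems with goals.
    print(round(speedup(2198.0, 20.94), 1))
    # Hypothetical utilities, used only to illustrate the ratio.
    print(rl_efficiency(97.8, 100.0))
```

The mean runtimes give a speedup of roughly 105, which sits at the low end of the reported 100–237 range and is consistent with the "over 100 times faster" claim.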
