A Meta Reinforcement Learning Approach to Goals-Based Wealth Management
This paper introduces a meta reinforcement learning (MetaRL) approach for goals-based wealth management (GBWM), where investors optimize dynamic portfolio selection and goal-fulfillment decisions over time to maximize expected utility from multiple financial goals. The MetaRL model is pre-trained on thousands of synthetic GBWM problems using a dual-agent Proximal Policy Optimization (PPO) algorithm, with one agent for goal-taking and another for portfolio choice. The state space is carefully normalized to ensure scale invariance and generalizability, incorporating 26 variables such as time, wealth relative to discounted goal costs, aggregated utility and cost blocks, and heuristic indicator states. During inference, the model produces near-optimal strategies for new investor problems in milliseconds, without requiring retraining.
Experimental results on 66 test cases show that MetaRL achieves an average of 97.8% of the optimal expected utility computed by dynamic programming (DP), while being over 100 times faster for determining current actions. The model exhibits robustness to changes in capital market regimes and efficient frontier parameters, even when trained on a single regime. Moreover, MetaRL can handle extensions such as stochastic inflation, which adds state variables and renders DP computationally infeasible due to the curse of dimensionality. This work demonstrates that meta-learning can effectively address complex, multi-goal financial optimization problems, offering a scalable and practical alternative to traditional DP-based methods.
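The scale invariance that the normalized state space aims for can be illustrated with a minimal sketch. It computes only two of the 26 state variables (the elapsed time fraction and wealth relative to discounted remaining goal costs). The function name `normalized_state`, the flat discount rate, and the cap on the ratio are assumptions made for illustration, not the paper's definitions.

```python
def normalized_state(t, horizon, wealth, goal_costs, discount_rate=0.03, cap=10.0):
    """Return (time fraction, wealth / discounted remaining goal costs).

    goal_costs maps a goal's year to its cost. The flat discount rate and
    the cap are hypothetical choices made for this sketch.
    """
    # Discount every goal that has not yet passed back to the current year t.
    discounted = sum(
        cost / (1.0 + discount_rate) ** (year - t)
        for year, cost in goal_costs.items()
        if year >= t
    )
    time_fraction = t / horizon
    if discounted == 0.0:
        wealth_ratio = cap  # no goals left: the ratio is undefined, so use the cap
    else:
        wealth_ratio = min(wealth / discounted, cap)
    return time_fraction, wealth_ratio


# The same investor problem expressed in two different currency scales.
small = normalized_state(5, 20, 100.0, {10: 50.0, 15: 80.0})
large = normalized_state(5, 20, 1_000_000.0, {10: 500_000.0, 15: 800_000.0})
print(small, large)
```

Because wealth and goal costs are scaled by the same factor, both calls yield (up to floating-point rounding) the same state, which is the property that lets one pre-trained policy serve investors of very different sizes.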
Highlights
- Develops a meta reinforcement learning (MetaRL) approach pre-trained on thousands of goals-based wealth management (GBWM) problems, enabling zero-shot inference for new investor scenarios.
- Achieves near-optimal expected utilities averaging 97.8% of the dynamic programming (DP) optimum, with inference speeds over 100 times faster than DP for policy decisions.
- Demonstrates remarkable robustness to capital market regime changes, even when trained on a single regime.
- Extends to larger state spaces (e.g., stochastic inflation) where DP becomes computationally infeasible due to the curse of dimensionality.
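The curse of dimensionality mentioned in the last highlight comes down to simple arithmetic: a full-grid DP must visit every combination of discretized state values at every time step. The sketch below counts those nodes. The grid resolution (100 points per variable) and the 50 time steps are hypothetical values chosen for illustration, not figures from the paper.

```python
def dp_grid_points(points_per_variable, n_state_variables, n_time_steps):
    """Number of (state, time) nodes a full-grid DP must evaluate."""
    return points_per_variable ** n_state_variables * n_time_steps


# Node counts as the number of state variables grows (assumed resolution).
for n_vars in (1, 2, 4):
    print(n_vars, dp_grid_points(100, n_vars, 50))
```

Under these assumptions, moving from one to four state variables multiplies the node count by a factor of one million, whereas a policy network only gains a few extra inputs.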
Methods
- Dual-agent Proximal Policy Optimization (PPO) with separate actor networks for goal-taking and portfolio selection.
- Normalized state space design (26 variables) including time, wealth relative to discounted goal costs, utility/cost blocks, and indicator states from deterministic Monte Carlo simulations.
- Training on 1000 randomly generated GBWM scenarios with varying horizons (5–50 years), wealth, goals, and infusions, using five random seeds.
- Comparison with dynamic programming (DP) using a test suite of 66 unseen problems, evaluated via Monte Carlo simulation of decision heatmaps.
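As a rough illustration of the dual-agent PPO setup in the first item, the sketch below applies the standard PPO clipped surrogate objective separately to a goal-taking actor and a portfolio actor. The clipped objective itself is the usual PPO formula. The function names, the clip value of 0.2, and the use of one shared advantage estimate for both actors are assumptions of this sketch, since the summary does not specify them.

```python
def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """Standard PPO clipped objective for one sample (to be maximized).

    ratio is new_policy_prob / old_policy_prob for the action taken.
    """
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped_ratio * advantage)


def dual_agent_objectives(goal_ratio, portfolio_ratio, advantage, clip_eps=0.2):
    """Apply the clipped objective separately to each actor.

    Sharing one advantage estimate across both actors is an assumption of
    this sketch, not a detail confirmed by the summary.
    """
    return (
        clipped_surrogate(goal_ratio, advantage, clip_eps),
        clipped_surrogate(portfolio_ratio, advantage, clip_eps),
    )


# The goal actor moved far from its old policy, the portfolio actor did not.
goal_obj, portfolio_obj = dual_agent_objectives(1.5, 0.9, advantage=1.0)
print(goal_obj, portfolio_obj)
```

In this example the goal actor's ratio of 1.5 is clipped to 1.2, so its update is limited, while the portfolio actor's ratio of 0.9 lies inside the clip range and passes through unchanged. Keeping the two objectives separate lets each actor be updated on its own action type.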
Results
- RL inference determines current actions 100–237 times faster than DP (mean 20.94 ms vs. 2198 ms for problems with goals).
- Mean RL-Efficiency (ratio of RL to DP expected utility) is 0.978 across 66 test cases, with a minimum of 0.917 and maximum of 0.999.
- MetaRL generalizes to out-of-distribution scenarios (e.g., 100-year horizons, different efficient frontiers) without retraining.
- Extending to stochastic inflation (4 state variables) does not noticeably slow MetaRL, while DP becomes infeasible.
- Runtimes for computing expected utility are comparable between RL and DP, with RL scaling better for larger problems.
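The two headline metrics above are simple ratios, sketched here for concreteness. The speedup uses the mean inference times reported in the first result. The expected-utility pairs are hypothetical placeholders, not the paper's 66 test cases, and the helper names are illustrative.

```python
from statistics import mean


def rl_efficiency(rl_expected_utility, dp_expected_utility):
    """RL-Efficiency: RL expected utility as a fraction of the DP optimum."""
    return rl_expected_utility / dp_expected_utility


def speedup(dp_ms, rl_ms):
    """How many times faster RL inference is than DP."""
    return dp_ms / rl_ms


# Mean inference times reported above for problems with goals.
print(round(speedup(2198.0, 20.94), 1))

# Hypothetical (RL, DP) expected-utility pairs, NOT the paper's data.
pairs = [(97.0, 100.0), (45.9, 50.0), (199.8, 200.0)]
ratios = [rl_efficiency(rl, dp) for rl, dp in pairs]
print(min(ratios), mean(ratios), max(ratios))
```

The ratio of the two reported mean times is about 105, which sits inside the 100–237 range quoted for individual cases.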