Optimal Investment and Entropy-Regularized Learning Under Stochastic Volatility Models with Portfolio Constraints
This paper studies optimal portfolio selection under stochastic volatility within a continuous-time reinforcement learning framework with portfolio constraints. The investor uses entropy-regularized relaxed controls, selecting probability distributions over admissible portfolio allocations rather than deterministic strategies. The authors derive the associated entropy-regularized Hamilton–Jacobi–Bellman (HJB) equation, where the Hamiltonian involves optimization over probability measures supported on a compact set. They show that the optimal exploratory policy is a truncated Gaussian distribution characterized by spatial derivatives of the value function. Under suitable structural conditions, they prove the existence of classical solutions to the nonlinear parabolic PDE via a homothetic transformation and Hölder space theory. A verification theorem establishes optimality of the truncated Gaussian policy. The paper also analyzes the policy-improvement structure, showing that the entropy-regularized Hamiltonian induces a sequence of PDEs that provides a continuous-time interpretation of actor–critic learning dynamics. Finally, the PDE analysis enables the design of an implementable reinforcement learning algorithm using a martingale framework, with numerical experiments confirming convergence and consistency with theoretical results.
Highlights
- Derives the entropy-regularized HJB equation for optimal investment under stochastic volatility with portfolio constraints, characterizing the optimal exploratory policy as a truncated Gaussian distribution.
- Proves existence of classical solutions to the resulting quasilinear parabolic PDE using Hölder space theory and structural growth conditions.
- Establishes a verification theorem linking the PDE solution to the stochastic control problem, showing optimality of the truncated Gaussian policy.
- Provides a continuous-time policy improvement theorem at the PDE level, yielding a sequence of PDEs that gives a continuous-time interpretation of actor–critic learning dynamics.
- Designs an implementable reinforcement learning algorithm via a martingale framework, with numerical experiments confirming convergence of the critic parameters and consistency of the learned policy with the theoretical results.
Methods
- Entropy-regularized relaxed control framework with Shannon differential entropy penalty.
- Dynamic programming and Hamilton–Jacobi–Bellman (HJB) equation with optimization over probability measures.
- Homothetic transformation reducing the multi-dimensional HJB to a one-dimensional quasilinear parabolic PDE.
- Nonlinear PDE analysis in Hölder spaces (Ladyzhenskaya–Solonnikov theory) for existence of classical solutions.
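To illustrate why optimizing over probability measures on a compact set produces a truncated Gaussian, the following worked identity may help. It is a generic sketch, not the paper's exact Hamiltonian: the symbols A, λ, B, C and h are assumptions introduced here, with B and C standing in for whatever combinations of model coefficients and value-function derivatives appear in the actual HJB equation.

```latex
% Assumed notation (not from the paper):
%   A = [\underline{a}, \overline{a}]  compact set of admissible allocations
%   \lambda > 0                        temperature of the entropy term
%   h(a) = B a - \tfrac{1}{2} C a^2    generic concave quadratic in the control, C > 0
%   \mathcal{P}(A)                     probability densities supported on A
\begin{align*}
\sup_{\pi \in \mathcal{P}(A)}
  \left\{ \int_A h(a)\,\pi(a)\,da - \lambda \int_A \pi(a) \ln \pi(a)\,da \right\}
  &= \lambda \ln \int_A \exp\!\left(\frac{h(u)}{\lambda}\right) du, \\
\pi^*(a)
  = \frac{\exp\left(h(a)/\lambda\right)}{\int_A \exp\left(h(u)/\lambda\right) du}
  &\propto \exp\!\left(-\frac{(a - B/C)^2}{2\,\lambda/C}\right),
  \qquad a \in A.
\end{align*}
```

Under these assumptions the maximizer is a Gibbs density, and completing the square in h shows that it is a Gaussian with location B/C and variance parameter λ/C, restricted to A and renormalized.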
Results
- Optimal exploratory policy is a truncated Gaussian distribution with mean and variance depending on spatial derivatives of the value function; variance is proportional to temperature and inversely proportional to curvature.
- Existence of a classical solution to the reduced quasilinear parabolic PDE under structural conditions on model coefficients.
- Verification theorem confirms that the truncated Gaussian policy achieves optimality in the class of relaxed entropy-regularized controls.
- Policy improvement generates a monotone sequence of value functions converging to the optimal one, with entropy regularization ensuring regularity.
- Numerical experiments show stable convergence of the critic parameters, and the learned stochastic policy matches the theoretical truncated-Gaussian form.
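As a rough illustration of the truncated Gaussian policy described above, the sketch below draws portfolio allocations from a Gaussian restricted to an interval. It is a minimal example under stated assumptions, not the paper's algorithm: the names `policy_parameters`, `sample_allocation`, `b_coef` and `c_coef` are hypothetical, with `b_coef` and `c_coef` standing in for the quantities built from value-function derivatives that set the policy's location and curvature. Only the Python standard library is used.

```python
import random
from statistics import NormalDist


def policy_parameters(b_coef, c_coef, temperature):
    """Location and scale of the Gaussian before truncation.

    Assumption: the control enters a concave quadratic
    h(a) = b_coef * a - 0.5 * c_coef * a**2, so the Gibbs density
    exp(h(a) / temperature) has mean b_coef / c_coef and
    variance temperature / c_coef.
    """
    if c_coef <= 0 or temperature <= 0:
        raise ValueError("c_coef and temperature must be positive")
    mean = b_coef / c_coef
    std = (temperature / c_coef) ** 0.5
    return mean, std


def sample_allocation(mean, std, lower, upper, rng):
    """Draw one allocation from N(mean, std^2) truncated to [lower, upper].

    Uses inverse-CDF sampling: a uniform draw between the CDF values of the
    two bounds is mapped back through the Gaussian quantile function.
    """
    base = NormalDist(mean, std)
    p_lo, p_hi = base.cdf(lower), base.cdf(upper)
    if p_hi - p_lo < 1e-12:
        # The interval carries almost no Gaussian mass; this simple
        # sampler is not numerically reliable in that regime.
        raise ValueError("interval has negligible probability mass")
    u = rng.uniform(p_lo, p_hi)
    # inv_cdf requires a probability strictly between 0 and 1.
    u = min(max(u, 1e-15), 1.0 - 1e-15)
    x = base.inv_cdf(u)
    # Guard against floating-point overshoot at the bounds.
    return min(max(x, lower), upper)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    mean, std = policy_parameters(b_coef=0.5, c_coef=2.0, temperature=0.1)
    draws = [sample_allocation(mean, std, 0.0, 1.0, rng) for _ in range(5)]
    print("location:", mean, "scale:", std)
    print("sampled allocations in [0, 1]:", draws)
```

In this sketch, raising `temperature` widens the policy (more exploration) while a larger `c_coef` narrows it, which mirrors the stated dependence of the variance on temperature and curvature. The bounds `0.0` and `1.0` are illustrative values for the constraint set.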