Optimal Investment and Entropy-Regularized Learning Under Stochastic Volatility Models with Portfolio Constraints

Thai Nguyen

Pertiny Nkuize

Published on 4/24/2026

Equities

Options

Derivatives

Reinforcement learning

Risk management

Machine learning

Volatility effect

This paper studies optimal portfolio selection under stochastic volatility within a continuous-time reinforcement learning framework with portfolio constraints. The investor uses entropy-regularized relaxed controls, selecting probability distributions over admissible portfolio allocations rather than deterministic strategies. The authors derive the associated entropy-regularized Hamilton–Jacobi–Bellman (HJB) equation, where the Hamiltonian involves optimization over probability measures supported on a compact set. They show that the optimal exploratory policy is a truncated Gaussian distribution characterized by spatial derivatives of the value function. Under suitable structural conditions, they prove the existence of classical solutions to the nonlinear parabolic PDE via a homothetic transformation and Hölder space theory. A verification theorem establishes optimality of the truncated Gaussian policy. The paper also analyzes the policy-improvement structure, showing that the entropy-regularized Hamiltonian induces a sequence of PDEs that provides a continuous-time interpretation of actor–critic learning dynamics. Finally, the PDE analysis enables the design of an implementable reinforcement learning algorithm using a martingale framework, with numerical experiments confirming convergence and consistency with theoretical results.
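The truncated Gaussian form can be motivated by a standard calculation, sketched here under assumptions that are not taken from the paper: for a fixed time and state, suppose the Hamiltonian depends on the allocation $a$ through a concave quadratic $h(a) = b\,a + \tfrac{1}{2} c\,a^2$ with $c < 0$, where $b$ and $c$ are hypothetical placeholders for terms built from derivatives of the value function, $K$ is the compact set of admissible allocations, and $\lambda > 0$ is the temperature.

```latex
% Entropy-regularized maximization over densities supported on K
\sup_{\pi \in \mathcal{P}(K)}
  \left\{ \int_K h(a)\,\pi(a)\,da \;-\; \lambda \int_K \pi(a)\ln \pi(a)\,da \right\}

% The maximizer is the Gibbs density
\pi^*(a) \;=\; \frac{\exp\!\big(h(a)/\lambda\big)}{\int_K \exp\!\big(h(u)/\lambda\big)\,du},
\qquad a \in K.

% Completing the square in h(a) = b a + \tfrac{1}{2} c a^2 with c < 0 gives
\pi^*(a) \;\propto\; \exp\!\left( -\frac{(a - m)^2}{2 s^2} \right) \mathbf{1}_K(a),
\qquad m = -\frac{b}{c}, \quad s^2 = -\frac{\lambda}{c}.
```

Under these assumptions the pre-truncation variance $s^2$ grows linearly in the temperature $\lambda$ and shrinks as the curvature $|c|$ grows.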

Highlights

1. Derives the entropy-regularized HJB equation for optimal investment under stochastic volatility with portfolio constraints, characterizing the optimal exploratory policy as a truncated Gaussian distribution.
2. Proves existence of classical solutions to the resulting quasilinear parabolic PDE using Hölder space theory and structural growth conditions.
3. Establishes a verification theorem linking the PDE solution to the stochastic control problem, showing optimality of the truncated Gaussian policy.
4. Provides a continuous-time policy improvement theorem at the PDE level, yielding a sequence of PDEs that gives an interpretation of actor–critic learning dynamics.
5. Designs an implementable reinforcement learning algorithm via a martingale framework, with numerical experiments confirming convergence of the critic parameters and consistency of the learned policy with the theory.

Methods

Entropy-regularized relaxed control framework with Shannon differential entropy penalty.
Dynamic programming and Hamilton–Jacobi–Bellman (HJB) equation with optimization over probability measures.
Homothetic transformation reducing the multi-dimensional HJB to a one-dimensional quasilinear parabolic PDE.
Nonlinear PDE analysis in Hölder spaces (Ladyzhenskaya–Solonnikov theory) for existence of classical solutions.
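
The optimization over probability measures listed above can be illustrated on a discretized action set. The following is a minimal sketch, not the paper's algorithm: it assumes a hypothetical concave quadratic `h(a) = b*a + 0.5*c*a**2` on an interval `[lo, hi]`, replaces densities with probability weights on a grid (so the discrete Shannon entropy stands in for the differential entropy, which differs only by a constant), and checks that the Gibbs weights score at least as well as two alternatives. All names and parameter values are illustrative.

```python
import math

def gibbs_weights(h_values, temperature):
    """Return probabilities p_i proportional to exp(h_i / temperature)."""
    h_max = max(h_values)  # subtract the max for numerical stability
    unnorm = [math.exp((h - h_max) / temperature) for h in h_values]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def regularized_objective(probs, h_values, temperature):
    """Expected value of h plus temperature times the discrete Shannon entropy."""
    expected = sum(p * h for p, h in zip(probs, h_values))
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return expected + temperature * entropy

# Hypothetical inputs: concave quadratic on the compact interval [lo, hi].
b, c, temperature = 0.8, -2.0, 0.5
lo, hi, n = -1.0, 1.0, 201
grid = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
h_values = [b * a + 0.5 * c * a * a for a in grid]

# Candidate policies: Gibbs weights, uniform weights, and a point mass at the argmax.
gibbs = gibbs_weights(h_values, temperature)
uniform = [1.0 / n] * n
best = h_values.index(max(h_values))
greedy = [1.0 if i == best else 0.0 for i in range(n)]

obj_gibbs = regularized_objective(gibbs, h_values, temperature)
obj_uniform = regularized_objective(uniform, h_values, temperature)
obj_greedy = regularized_objective(greedy, h_values, temperature)

# Closed-form optimum of the discrete problem: temperature * log-sum-exp(h / temperature).
closed_form = temperature * math.log(sum(math.exp(h / temperature) for h in h_values))
```

In this discrete setting the Gibbs objective coincides with the log-sum-exp value, and the deterministic (greedy) policy scores lower because it earns no entropy bonus.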

Results

The optimal exploratory policy is a truncated Gaussian distribution whose mean and variance depend on spatial derivatives of the value function; the variance is proportional to the temperature and inversely proportional to the curvature.
A classical solution to the reduced quasilinear parabolic PDE exists under structural conditions on the model coefficients.
The verification theorem confirms that the truncated Gaussian policy is optimal in the class of relaxed entropy-regularized controls.
Policy improvement generates a monotone sequence of value functions converging to the optimal one, with entropy regularization ensuring regularity.
Numerical experiments show stable convergence of the critic parameters, and the learned stochastic policy matches the theoretical truncated-Gaussian form.
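
To make the variance scaling concrete, here is a small sampling sketch. It is an illustration under stated assumptions rather than the authors' implementation: the Hamiltonian is taken to be a concave quadratic in the allocation with hypothetical coefficients `first_order` and `curvature`, the admissible set is an interval `[lo, hi]`, and samples are drawn by inverting the Gaussian CDF restricted to that interval.

```python
import random
from statistics import NormalDist

def gaussian_params(first_order, curvature, temperature):
    """Mean and variance of the untruncated Gaussian for
    h(a) = first_order * a + 0.5 * curvature * a**2 with curvature < 0."""
    if curvature >= 0:
        raise ValueError("curvature must be negative (concave quadratic)")
    mean = -first_order / curvature
    variance = -temperature / curvature  # linear in temperature, inverse in |curvature|
    return mean, variance

def sample_truncated_gaussian(mean, variance, lo, hi, rng):
    """Draw one sample from N(mean, variance) conditioned on [lo, hi]."""
    base = NormalDist(mean, variance ** 0.5)
    u_lo, u_hi = base.cdf(lo), base.cdf(hi)
    u = u_lo + (u_hi - u_lo) * rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # inv_cdf requires 0 < u < 1
    # Clamp to guard against round-off at the interval boundaries.
    return min(max(base.inv_cdf(u), lo), hi)

# Hypothetical example: allocations constrained to [0, 1].
rng = random.Random(0)
mean, variance = gaussian_params(first_order=0.8, curvature=-2.0, temperature=0.5)
samples = [sample_truncated_gaussian(mean, variance, 0.0, 1.0, rng) for _ in range(2000)]
```

Doubling `temperature` doubles the pre-truncation variance, while a larger `|curvature|` concentrates the policy around its mean; truncation then changes the realized mean and variance of the samples relative to these parameters.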
