Optimal Investment and Entropy-Regularized Learning Under Stochastic Volatility Models with Portfolio Constraints

Thai Nguyen
Pertiny Nkuize
Published on 4/24/2026
Equities
Options
Derivatives
Reinforcement learning
Risk management
Machine learning
Volatility effect

This paper studies optimal portfolio selection under stochastic volatility within a continuous-time reinforcement learning framework with portfolio constraints. The investor uses entropy-regularized relaxed controls, selecting probability distributions over admissible portfolio allocations rather than deterministic strategies. The authors derive the associated entropy-regularized Hamilton–Jacobi–Bellman (HJB) equation, where the Hamiltonian involves optimization over probability measures supported on a compact set. They show that the optimal exploratory policy is a truncated Gaussian distribution characterized by spatial derivatives of the value function. Under suitable structural conditions, they prove the existence of classical solutions to the nonlinear parabolic PDE via a homothetic transformation and Hölder space theory. A verification theorem establishes optimality of the truncated Gaussian policy. The paper also analyzes the policy-improvement structure, showing that the entropy-regularized Hamiltonian induces a sequence of PDEs that provides a continuous-time interpretation of actor–critic learning dynamics. Finally, the PDE analysis enables the design of an implementable reinforcement learning algorithm using a martingale framework, with numerical experiments confirming convergence and consistency with theoretical results.

Highlights

  • Derives the entropy-regularized HJB equation for optimal investment under stochastic volatility with portfolio constraints, characterizing the optimal exploratory policy as a truncated Gaussian distribution.
  • Proves existence of classical solutions to the resulting quasilinear parabolic PDE using Hölder space theory and structural growth conditions.
  • Establishes a verification theorem linking the PDE solution to the stochastic control problem, showing optimality of the truncated Gaussian policy.
  • Provides a continuous-time policy improvement theorem at the PDE level, yielding a sequence of PDEs that gives an interpretation of actor–critic learning dynamics.
  • Designs an implementable reinforcement learning algorithm via a martingale framework, with numerical experiments confirming convergence of the critic parameters and consistency of the learned policy with the theoretical one.

Methods

  • Entropy-regularized relaxed control framework with Shannon differential entropy penalty.
  • Dynamic programming and Hamilton–Jacobi–Bellman (HJB) equation with optimization over probability measures.
  • Homothetic transformation reducing the multi-dimensional HJB to a one-dimensional quasilinear parabolic PDE.
  • Nonlinear PDE analysis in Hölder spaces (Ladyzhenskaya–Solonnikov theory) for existence of classical solutions.
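
To illustrate why optimizing over probability measures on a compact set leads to a truncated Gaussian, consider a generic one-dimensional Hamiltonian that is quadratic in the control. The symbols A, B, λ and the interval [a, b] are placeholders assumed here for illustration only; they are not the paper's notation. In a setting like the paper's, A and B would be built from model coefficients and spatial derivatives of the value function, and λ would be the entropy temperature.

```latex
\begin{aligned}
&\sup_{\pi \in \mathcal{P}([a,b])}
  \left\{ \int_a^b \Big( A u - \tfrac{1}{2} B u^2 \Big)\, \pi(u)\, du
  \;-\; \lambda \int_a^b \pi(u) \ln \pi(u)\, du \right\},
  \qquad B > 0,\ \lambda > 0, \\[4pt]
&\pi^*(u)
  = \frac{\exp\!\big( (A u - \tfrac{1}{2} B u^2)/\lambda \big)}
         {\int_a^b \exp\!\big( (A v - \tfrac{1}{2} B v^2)/\lambda \big)\, dv}
  \;\propto\; \exp\!\left( -\frac{(u - A/B)^2}{2\,\lambda/B} \right) \mathbf{1}_{[a,b]}(u).
\end{aligned}
```

The maximizer follows from the standard Gibbs variational principle: the optimal density is proportional to the exponential of the integrand divided by λ. Completing the square shows it is a Gaussian with location A/B and scale parameter λ/B, restricted to [a, b]. In this generic form the spread grows linearly with the temperature λ and shrinks as the curvature term B grows.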

Results

  • The optimal exploratory policy is a truncated Gaussian distribution with mean and variance depending on spatial derivatives of the value function; the variance is proportional to the temperature and inversely proportional to the curvature.
  • A classical solution to the reduced quasilinear parabolic PDE exists under structural conditions on the model coefficients.
  • The verification theorem confirms that the truncated Gaussian policy is optimal in the class of relaxed entropy-regularized controls.
  • Policy improvement generates a monotone sequence of value functions converging to the optimal one, with entropy regularization ensuring regularity.
  • Numerical experiments show stable convergence of the critic parameters, and the learned stochastic policy matches the theoretical truncated-Gaussian form.
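
The truncated-Gaussian form of the exploratory policy can be sketched in a few lines of Python. This is a minimal illustration, not the authors' algorithm: the function names, the coefficients `a_coef` and `b_coef` (standing in for the linear and curvature terms of a quadratic Hamiltonian), and the numerical values are all hypothetical. It assumes the untruncated Gaussian has mean `a_coef / b_coef` and variance `temperature / b_coef`, and samples by inverting the normal CDF on the admissible interval.

```python
import random
from statistics import NormalDist


def gaussian_parameters(a_coef, b_coef, temperature):
    """Mean and std of the untruncated Gaussian (hypothetical parametrization).

    Assumes a Hamiltonian of the form a_coef*u - 0.5*b_coef*u**2 with
    b_coef > 0, so the variance is temperature / b_coef.
    """
    if b_coef <= 0 or temperature <= 0:
        raise ValueError("b_coef and temperature must be positive")
    mean = a_coef / b_coef
    std = (temperature / b_coef) ** 0.5
    return mean, std


def sample_truncated_gaussian(mean, std, lower, upper, rng):
    """Draw one sample from N(mean, std^2) restricted to [lower, upper]."""
    base = NormalDist(mean, std)
    # Map a uniform draw on [cdf(lower), cdf(upper)] back through the inverse CDF.
    p = rng.uniform(base.cdf(lower), base.cdf(upper))
    # inv_cdf requires 0 < p < 1, so keep p strictly inside that range.
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    # Clamp to absorb floating-point round-off at the boundaries.
    return min(max(base.inv_cdf(p), lower), upper)


rng = random.Random(0)
# Illustrative numbers only: portfolio weight constrained to [0, 1].
mean, std = gaussian_parameters(a_coef=0.06, b_coef=0.04, temperature=0.01)
samples = [sample_truncated_gaussian(mean, std, 0.0, 1.0, rng) for _ in range(1000)]
```

With these illustrative numbers the unconstrained mean lies above the upper bound, so the samples concentrate near the constraint boundary while staying inside the admissible interval. Raising `temperature` widens the distribution, which corresponds to more exploration.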