SNAPO: Smooth Neural Adjoint Policy Optimization for Optimal Control via Differentiable Simulation

Dmitri Goloubentsev

Natalija Karpichina

Published on 5/7/2026

Commodities

Fixed Income

Derivatives

Machine learning

Reinforcement learning

Risk management

SNAPO (Smooth Neural Adjoint Policy Optimization) is a framework for optimal control that embeds a neural policy inside a known, differentiable simulator. It replaces hard constraints with smooth approximations, so that exact gradients of the resulting smooth objective can be computed with adjoint methods. The key innovation is that a single backward pass simultaneously trains the policy and computes sensitivities of the objective with respect to all policy parameters and inputs, at a cost proportional to one reverse pass regardless of the number of sensitivities. This makes SNAPO efficient for both policy optimization and sensitivity analysis.
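The summary does not say which smooth approximation SNAPO uses. As a minimal sketch, assuming a softplus-based relaxation (our choice, not necessarily the authors'), a hard bound such as an inventory limit `clip(x, lo, hi)` can be replaced by a function whose derivative is nonzero everywhere, so gradients keep flowing even when a constraint is active. The names `smooth_clip`, `smooth_clip_grad` and the sharpness parameter `beta` are hypothetical.

```python
import numpy as np

def softplus(z, beta):
    """Smooth approximation of max(z, 0); approaches it as beta grows."""
    return np.logaddexp(0.0, beta * z) / beta

def sigmoid(z):
    """Numerically stable logistic function; sigmoid(beta * z) is d softplus(z, beta) / dz."""
    return np.exp(-np.logaddexp(0.0, -z))

def smooth_clip(x, lo, hi, beta=20.0):
    """Hypothetical smooth stand-in for np.clip(x, lo, hi).

    lo + softplus(x - lo) - softplus(x - hi) is ~lo below lo, ~x inside
    the band and ~hi above hi, but it is differentiable everywhere.
    """
    return lo + softplus(x - lo, beta) - softplus(x - hi, beta)

def smooth_clip_grad(x, lo, hi, beta=20.0):
    """d smooth_clip / dx. Mathematically it lies in (0, 1), whereas the
    hard clip has derivative exactly 0 whenever a bound is active."""
    return sigmoid(beta * (x - lo)) - sigmoid(beta * (x - hi))


# Usage: compare the hard and smooth versions on a few points.
x = np.linspace(-1.0, 2.0, 7)
print(np.clip(x, 0.0, 1.0))
print(smooth_clip(x, 0.0, 1.0, beta=50.0))
print(smooth_clip_grad(x, 0.0, 1.0, beta=50.0))
```

Larger `beta` tracks the hard constraint more closely but makes the gradient steeper near the bounds, so in practice the sharpness is a tuning choice.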

The framework is demonstrated on three real-world domains: natural gas storage, pension fund asset-liability management (ALM), and pharmaceutical manufacturing. For gas storage, training completes in under a minute and yields 365 forward curve sensitivities at no extra cost. For pension ALM, SNAPO achieves a 6.5x to 200x speedup in sensitivity computation compared to traditional bump-and-revalue methods. In pharmaceutical manufacturing, it computes cross-unit sensitivities through a 4-unit process chain, producing 20 ICH Q8 regulatory sensitivities in just 74.5 milliseconds. These results highlight SNAPO's ability to combine fast training with comprehensive sensitivity analysis, making it a powerful tool for optimal control under uncertainty.

Highlights

1. Introduces SNAPO, a framework that embeds a neural policy inside a differentiable simulator for optimal control.
2. Replaces hard constraints with smooth approximations to enable gradient-based optimization.
3. Computes exact gradients of the objective with respect to all policy parameters and inputs in a single adjoint pass.
4. Demonstrates training in under a minute for natural gas storage and 6.5x–200x sensitivity speedup for pension fund ALM.
5. Produces 20 ICH Q8 regulatory sensitivities for pharmaceutical manufacturing in 74.5 milliseconds.

Methods

Neural policy embedded in a differentiable simulator with smooth constraint approximations.
Adjoint sensitivity method for computing exact gradients of the objective with respect to policy parameters and inputs.
Single backward pass that simultaneously trains the policy and computes all sensitivities.
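To make the "single backward pass" idea concrete, here is a hedged sketch on a toy storage problem of our own construction, not the paper's model. A one-neuron tanh policy (a stand-in for the neural policy) chooses daily injection or withdrawal from the forward price `F[t]` and the current inventory, and the objective is cash flow minus a penalty on ending inventory. The reverse sweep is written by hand so the adjoint recursion is visible; it returns the policy gradient and all `len(F)` forward-curve sensitivities together. The function name `simulate_and_adjoint`, the parameters `rate` and `lam`, and the dynamics are all hypothetical.

```python
import numpy as np

def simulate_and_adjoint(theta, F, rate=1.0, lam=0.1):
    """Toy storage simulator plus a hand-written reverse (adjoint) sweep.

    Hypothetical dynamics (not the paper's model):
      z_t = theta0 + theta1 * F_t + theta2 * v_t   (one-neuron policy)
      u_t = rate * tanh(z_t)                       (+ inject / - withdraw)
      v_{t+1} = v_t + u_t,  v_0 = 0                (inventory)
      J = -sum_t F_t * u_t - lam * v_T**2          (cash flow minus end penalty)
    Returns J, dJ/dtheta and dJ/dF from one forward and one reverse pass.
    """
    T = len(F)
    v = np.zeros(T + 1)
    z = np.zeros(T)
    u = np.zeros(T)
    # Forward pass: record the values the reverse pass will need.
    for t in range(T):
        z[t] = theta[0] + theta[1] * F[t] + theta[2] * v[t]
        u[t] = rate * np.tanh(z[t])
        v[t + 1] = v[t] + u[t]
    J = -np.dot(F, u) - lam * v[T] ** 2

    # Reverse pass: on entry to step t, v_bar holds dJ/dv_{t+1}.
    theta_bar = np.zeros(3)
    F_bar = np.zeros(T)
    v_bar = -2.0 * lam * v[T]
    for t in reversed(range(T)):
        u_bar = -F[t] + v_bar                   # u_t feeds the cash flow and v_{t+1}
        z_bar = u_bar * rate * (1.0 - np.tanh(z[t]) ** 2)
        theta_bar += z_bar * np.array([1.0, F[t], v[t]])
        F_bar[t] = -u[t] + z_bar * theta[1]     # direct cash-flow term + policy input
        v_bar = v_bar + z_bar * theta[2]        # now dJ/dv_t
    return J, theta_bar, F_bar


# Usage: gradient ascent on a hypothetical 365-day seasonal forward curve.
rng = np.random.default_rng(0)
days = np.arange(365)
F = 3.0 + 0.5 * np.cos(2 * np.pi * days / 365) + 0.05 * rng.standard_normal(365)
theta = np.zeros(3)
for step in range(50):
    J, g_theta, g_F = simulate_and_adjoint(theta, F)
    theta += 1e-5 * g_theta   # the same pass also produced g_F: 365 curve sensitivities
print(J, g_F.shape)
```

In a larger model an autodiff framework (for example JAX's `jax.grad`) would generate this reverse sweep automatically; the hand-written version only shows that the policy gradient and the input sensitivities fall out of the same recursion.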

Results

Natural gas storage: policy trained in under a minute, with 365 forward curve sensitivities obtained at no additional cost per sensitivity.
Pension fund ALM: sensitivity speedup of 6.5x to 200x over bump-and-revalue, with the speedup growing with the number of risk factors.
Pharmaceutical manufacturing: cross-unit sensitivities propagated through a 4-unit process chain, yielding 20 ICH Q8 regulatory sensitivities in 74.5 milliseconds.
All sensitivities are produced by the same backward pass that trains the policy, at a cost proportional to one reverse pass regardless of the number of sensitivities.
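The bump-and-revalue comparison rests on a general cost asymmetry: one-sided finite differences need N + 1 full simulations for N risk factors, whereas a reverse sweep delivers all N derivatives for roughly the cost of one forward plus one backward pass. The hedged sketch below uses a hypothetical asset roll-forward (compound at period rate `r[t]`, then pay a fixed benefit) to show both methods returning the same gradient; the model, the names `roll_forward`, `bump_and_revalue` and `adjoint`, and the parameter values are illustrative assumptions, and the paper's 6.5x to 200x figures are not reproduced here.

```python
import numpy as np

def roll_forward(r, A0=100.0, payment=3.0):
    """Hypothetical asset roll-forward: compound at rate r[t], then pay a fixed benefit."""
    A = A0
    for rt in r:
        A = A * (1.0 + rt) - payment
    return A

def bump_and_revalue(r, h=1e-6, **kw):
    """One-sided finite differences: N + 1 full simulations for N risk factors."""
    base = roll_forward(r, **kw)
    n_sims = 1
    grad = np.empty(len(r))
    for i in range(len(r)):
        bumped = np.array(r, dtype=float)   # fresh copy with one rate bumped
        bumped[i] += h
        grad[i] = (roll_forward(bumped, **kw) - base) / h
        n_sims += 1
    return grad, n_sims

def adjoint(r, A0=100.0, payment=3.0):
    """One forward sweep (storing the path) and one reverse sweep, for any N."""
    A = np.empty(len(r) + 1)
    A[0] = A0
    for t, rt in enumerate(r):
        A[t + 1] = A[t] * (1.0 + rt) - payment
    grad = np.empty(len(r))
    A_bar = 1.0                      # on entry to step t: dA_T / dA_{t+1}
    for t in reversed(range(len(r))):
        grad[t] = A_bar * A[t]       # dA_T/dr_t = dA_T/dA_{t+1} * A_t
        A_bar *= 1.0 + r[t]          # now dA_T / dA_t
    return grad


# Usage: 50 hypothetical period rates treated as risk factors.
rng = np.random.default_rng(0)
r = 0.04 + 0.01 * rng.standard_normal(50)
g_bump, n_sims = bump_and_revalue(r)
g_adj = adjoint(r)
print(n_sims, np.max(np.abs(g_bump - g_adj)))
```

The bump count grows linearly with the number of risk factors while the adjoint cost stays at two sweeps, which is why the gap widens as risk factors are added.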
