SNAPO: Smooth Neural Adjoint Policy Optimization for Optimal Control via Differentiable Simulation
SNAPO (Smooth Neural Adjoint Policy Optimization) is a framework for optimal control that embeds a neural policy inside a known, differentiable simulator. It replaces hard constraints with smooth approximations so that exact gradients can be computed with adjoint methods. The key innovation is that a single backward pass both supplies the gradient used to train the policy and yields the sensitivities of the objective with respect to all policy parameters and inputs. Its cost is proportional to one reverse pass regardless of how many sensitivities are requested. Policy optimization and sensitivity analysis therefore share one computation instead of requiring separate runs.
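The smoothing step can be illustrated with a box constraint such as a storage-capacity limit. The sketch below assumes a softplus-based smooth clip with sharpness `beta`. The source does not say which smooth approximation SNAPO uses, so this choice and the helper names (`hard_clip`, `smooth_clip`, `smooth_clip_grad`) are hypothetical. It shows the property that matters for adjoint methods. The hard clip has zero derivative outside its bounds. The smooth version keeps a positive derivative and approaches the hard clip as `beta` grows.

```python
import math

# Assumption: softplus-based smoothing; SNAPO's actual approximation may differ.

def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def softplus(z, beta):
    """Numerically stable log(1 + exp(beta * z)) / beta, a smooth max(z, 0)."""
    t = beta * z
    return (max(t, 0.0) + math.log1p(math.exp(-abs(t)))) / beta

def hard_clip(x, lo, hi):
    """Hard box constraint: its derivative is exactly 0 whenever x is outside [lo, hi]."""
    return min(max(x, lo), hi)

def smooth_clip(x, lo, hi, beta=20.0):
    """Smooth surrogate for hard_clip; approaches it as beta grows."""
    return lo + softplus(x - lo, beta) - softplus(x - hi, beta)

def smooth_clip_grad(x, lo, hi, beta=20.0):
    """Analytic derivative of smooth_clip; positive everywhere in exact arithmetic."""
    return sigmoid(beta * (x - lo)) - sigmoid(beta * (x - hi))

# Hypothetical example: a storage level requested just above capacity 1.0.
x = 1.05
print(hard_clip(x, 0.0, 1.0), smooth_clip(x, 0.0, 1.0))
print(smooth_clip_grad(x, 0.0, 1.0))  # nonzero slope where hard_clip's derivative is 0
```

A larger `beta` tracks the hard constraint more closely but makes the derivative steeper near the bounds. Choosing it is a trade-off between fidelity to the constraint and how well-conditioned the gradients are.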
The framework is demonstrated on three real-world domains: natural gas storage, pension fund asset-liability management (ALM), and pharmaceutical manufacturing. For gas storage, training completes in under a minute and yields 365 forward curve sensitivities at no extra cost. For pension ALM, SNAPO achieves a 6.5x to 200x speedup in sensitivity computation compared to traditional bump-and-revalue methods. In pharmaceutical manufacturing, it computes cross-unit sensitivities through a 4-unit process chain, producing 20 ICH Q8 regulatory sensitivities in just 74.5 milliseconds. These results highlight SNAPO's ability to combine fast training with comprehensive sensitivity analysis, making it a powerful tool for optimal control under uncertainty.
Highlights
- Introduces SNAPO, a framework that embeds a neural policy inside a differentiable simulator for optimal control.
- Replaces hard constraints with smooth approximations to enable gradient-based optimization.
- Computes exact gradients of the objective with respect to all policy parameters and inputs in a single adjoint pass.
- Demonstrates training in under a minute for natural gas storage and a 6.5x–200x sensitivity speedup for pension fund ALM.
- Produces 20 ICH Q8 regulatory sensitivities for pharmaceutical manufacturing in 74.5 milliseconds.
Methods
- Neural policy embedded in a differentiable simulator with smooth constraint approximations.
- Adjoint sensitivity method for computing exact gradients of the objective with respect to policy parameters and inputs.
- Single backward pass that simultaneously trains the policy and computes all sensitivities.
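To make the adjoint idea concrete, here is a minimal hand-written sketch on a hypothetical toy gas-storage model. The dynamics, the three-parameter tanh policy, the constants `S_MAX`, `CAP`, `P_REF`, `BETA` and the function names are illustrative assumptions, not SNAPO's model or code. The forward pass records a tape. A single reverse sweep then returns three things: the policy gradient used for a training step, the sensitivity of profit to every forward price, and the sensitivity to the initial inventory.

```python
import math

# Hypothetical toy settings (not from the paper): capacity, max rate, reference price, smoothing.
S_MAX, CAP, P_REF, BETA = 1.0, 0.2, 3.0, 20.0

def sigmoid(t):
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def softplus(z):
    t = BETA * z
    return (max(t, 0.0) + math.log1p(math.exp(-abs(t)))) / BETA

def sclip(x):
    """Smooth surrogate for clipping inventory to [0, S_MAX]."""
    return softplus(x) - softplus(x - S_MAX)

def sclip_grad(x):
    return sigmoid(BETA * x) - sigmoid(BETA * (x - S_MAX))

def forward(theta, F, s0=0.5):
    """Simulate one period per forward price; return profit J and the tape for the adjoint sweep."""
    s, J, tape = s0, 0.0, []
    for Ft in F:
        z = theta[0] + theta[1] * (P_REF - Ft) + theta[2] * s / S_MAX
        u = s + CAP * math.tanh(z)           # requested inventory after injecting/withdrawing
        s_next = sclip(u)
        J += -Ft * (s_next - s)              # pay to inject, earn on withdrawal
        tape.append((s, z, u, s_next))
        s = s_next
    return J, tape

def adjoint(theta, F, tape):
    """One reverse sweep returning dJ/dtheta, dJ/dF (one entry per forward price) and dJ/ds0."""
    d_theta, d_F = [0.0, 0.0, 0.0], [0.0] * len(F)
    lam = 0.0                                # adjoint state: dJ/ds_{t+1} from later periods
    for t in reversed(range(len(F))):
        s, z, u, s_next = tape[t]
        g_u = (-F[t] + lam) * sclip_grad(u)  # dJ/du_t
        g_z = g_u * CAP * (1.0 - math.tanh(z) ** 2)
        d_theta[0] += g_z
        d_theta[1] += g_z * (P_REF - F[t])
        d_theta[2] += g_z * s / S_MAX
        d_F[t] = -(s_next - s) - g_z * theta[1]
        lam = F[t] + g_u + g_z * theta[2] / S_MAX  # dJ/ds_t
    return d_theta, d_F, lam

def train_step(theta, F, lr=0.05):
    """Gradient ascent on J; the same sweep also yields the forward-curve sensitivities."""
    J, tape = forward(theta, F)
    d_theta, d_F, _ = adjoint(theta, F, tape)
    new_theta = [th + lr * g for th, g in zip(theta, d_theta)]
    return new_theta, J, d_F                 # d_F is evaluated at the pre-update theta

# Hypothetical 12-point seasonal forward curve.
F = [3.0 + 0.5 * math.sin(2 * math.pi * t / 12) for t in range(12)]
theta = [0.0, 1.0, 0.0]
for _ in range(100):
    theta, J, d_F = train_step(theta, F)
print(round(J, 4), [round(g, 3) for g in d_F[:3]])
```

The reverse sweep visits each period once, so obtaining all `len(F)` price sensitivities costs about the same as obtaining the three policy gradients alone.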
Results
- Natural gas storage: policy trained in under a minute, with 365 forward curve sensitivities obtained at no additional cost per sensitivity.
- Pension fund ALM: sensitivity speedup of 6.5x to 200x over bump-and-revalue, scaling with the number of risk factors.
- Pharmaceutical manufacturing: cross-unit sensitivities through a 4-unit process chain, yielding 20 regulatory sensitivities in 74.5 milliseconds.
- All sensitivities are produced by the same backward pass that trains the policy, at a cost proportional to one reverse pass regardless of the number of sensitivities.
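The claim that cost does not grow with the number of sensitivities can be checked on a small example. The sketch below uses a minimal scalar reverse-mode tape, a generic automatic-differentiation technique and not SNAPO's implementation. It runs on a hypothetical ALM-style shortfall objective and counts simulator evaluations for one-sided bump-and-revalue versus one forward-plus-reverse pass. `Var`, `shortfall`, `CALLS` and all parameter values are illustrative assumptions.

```python
import math

class Var:
    """Scalar node on a reverse-mode tape; a minimal sketch, not a production AD system."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents          # tuple of (parent Var, local partial derivative)
        self.grad = 0.0
    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value, ((self, other.value), (other, self.value)))
    __rmul__ = __mul__
    def __sub__(self, other):
        return self + (-1.0) * other
    def __rsub__(self, other):
        return other + (-1.0) * self

def softplus(x):
    """Smooth max(x, 0) on a Var; its local derivative is the logistic sigmoid."""
    v = x.value
    sig = 1.0 / (1.0 + math.exp(-v)) if v >= 0 else math.exp(v) / (1.0 + math.exp(v))
    return Var(max(v, 0.0) + math.log1p(math.exp(-abs(v))), ((x, sig),))

def backward(out):
    """Single reverse sweep: order nodes topologically, then push adjoints to parents."""
    order, seen = [], set()
    def visit(v):                       # recursive DFS; fine for small illustrative graphs
        if id(v) not in seen:
            seen.add(id(v))
            for p, _ in v.parents:
                visit(p)
            order.append(v)
    visit(out)
    out.grad = 1.0
    for v in reversed(order):
        for p, local in v.parents:
            p.grad += local * v.grad

def shortfall(rates, a0=1.0, outflow=0.05, liability=1.0):
    """Hypothetical ALM-style objective: smoothed funding shortfall summed over periods."""
    a, total = a0, 0.0
    for r in rates:
        a = a * (1.0 + r) - outflow     # assets earn r, then pay benefits
        total = total + softplus(liability - a)
    return total

CALLS = {"forward": 0}

def evaluate(rates):
    """One forward simulation returning a plain float."""
    CALLS["forward"] += 1
    return shortfall([Var(r) for r in rates]).value

def bump_and_revalue(rates, h=1e-6):
    base = evaluate(rates)
    grads = []
    for i in range(len(rates)):
        bumped = list(rates)
        bumped[i] += h
        grads.append((evaluate(bumped) - base) / h)
    return grads

def adjoint_gradient(rates):
    CALLS["forward"] += 1               # one forward pass records the tape
    xs = [Var(r) for r in rates]
    backward(shortfall(xs))             # one reverse pass gives every dJ/dr_i
    return [x.grad for x in xs]

rates = [0.03 + 0.01 * math.sin(i) for i in range(40)]  # hypothetical 40 risk factors
g_bump = bump_and_revalue(rates)
n_bump = CALLS["forward"]
CALLS["forward"] = 0
g_adj = adjoint_gradient(rates)
print(n_bump, CALLS["forward"])  # call counts: 41 for bump-and-revalue, 1 for the adjoint
```

For N inputs, one-sided bump-and-revalue needs N + 1 forward evaluations (2N with central differences). The reverse sweep costs a small constant multiple of one forward evaluation, independent of N, which is consistent with a speedup that grows with the number of risk factors. Measured speedups also depend on memory traffic and implementation details that this toy ignores.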