Safe Bilevel Delegation (SBD): A Formal Framework for Runtime Delegation Safety in Multi-Agent Systems

Yuan Sun
Published on 4/30/2026
Equities
United States (US)
AI
LLM
Multi-Agent
Risk management
Machine learning

Safe Bilevel Delegation (SBD) is a formal framework for runtime delegation safety in hierarchical multi-agent systems. It formulates task delegation as a bilevel optimization problem: an outer meta-weight network learns context-dependent safety–efficiency weights λ_φ(s) ∈ [0,1], while an inner loop optimizes the delegation policy π subject to a probabilistic safety constraint P(safe) ≥ 1−δ. The continuous delegation degree α ∈ [0,1] controls how much decision authority is transferred to each sub-agent, interpolating smoothly between full human override and fully autonomous execution. This structure enables dynamic adjustment of the safety–efficiency trade-off as task context changes during execution, analogous to how OSPF metric weighting redistributes traffic away from congested links in network routing.
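
The bilevel structure described above can be written out roughly as follows. This is a sketch under stated assumptions, not the paper's formula: the loss names ℓ_out, ℓ_safe and ℓ_eff are placeholders, the inner objective is read as a λ-weighted sum of a safety loss and an efficiency loss (one interpretation of "safety–efficiency weights"), and the policy is assumed to output the delegation degree α directly.

```latex
\begin{aligned}
\min_{\phi}\;\; & F(\phi) = \mathbb{E}_{s}\big[\ell_{\mathrm{out}}(\pi^{*}_{\phi}, s)\big] \\
\text{s.t.}\;\; & \pi^{*}_{\phi} \in \arg\min_{\pi \in \Pi_{\delta}}
  \mathbb{E}_{s}\Big[\lambda_{\phi}(s)\,\ell_{\mathrm{safe}}(\pi, s)
  + \big(1-\lambda_{\phi}(s)\big)\,\ell_{\mathrm{eff}}(\pi, s)\Big], \\
& \Pi_{\delta} = \{\pi : P_{\pi}(\mathrm{safe}) \ge 1-\delta\}, \qquad
  \alpha = \pi(s) \in [0,1], \quad \lambda_{\phi}(s) \in [0,1].
\end{aligned}
```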

The paper establishes three theoretical results: Safety Monotonicity (a higher outer safety weight produces a weakly safer inner policy), Inner Policy Convergence (projected gradient descent on the inner problem converges linearly under standard smoothness and strong-convexity assumptions), and an Accountability Propagation bound that distributes responsibility across multi-hop delegation chains with a provable per-agent ceiling. The SBD algorithm implements bilevel gradient descent with hypergradient-based outer updates and projected-gradient inner updates. The framework is instantiated in three high-stakes domains: medical AI (MIMIC-III), financial risk control (S&P 500), and educational agent supervision (ASSISTments), with a pre-registered evaluation protocol specifying datasets, safety constraint sets, baselines, and metrics. Empirical validation following these protocols is planned for future work.
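
The linear-convergence claim can be checked numerically on a toy problem. The sketch below assumes a hypothetical separable quadratic inner objective over three per-sub-agent delegation degrees, with the box [0, cap]^3 standing in for the δ-safe feasible set (the paper's feasible set need not be a box) and step size η = 1/L. It verifies one common form of the guarantee, distance to the optimum shrinking by at least (1 − ημ) per step, and computes the step count implied by that rate.

```python
import math

def project_box(x, lo, hi):
    """Euclidean projection onto the box [lo, hi]^n: clip each coordinate."""
    return [min(max(xi, lo), hi) for xi in x]

def projected_gd(d, c, cap, x0, steps):
    """Projected GD with step 1/L on f(x) = 0.5 * sum_i d_i * (x_i - c_i)**2 over [0, cap]^n."""
    eta = 1.0 / max(d)
    x, history = list(x0), [list(x0)]
    for _ in range(steps):
        grad = [di * (xi - ci) for di, xi, ci in zip(d, x, c)]
        x = project_box([xi - eta * gi for xi, gi in zip(x, grad)], 0.0, cap)
        history.append(x)
    return history

def dist(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

d = [0.5, 1.0, 4.0]    # curvatures of the toy objective, so mu = 0.5 and L = 4.0
c = [0.9, 0.2, 1.3]    # hypothetical unconstrained optima for three sub-agents
cap = 0.8              # hypothetical stand-in for the delta-safe feasible set
mu, L = min(d), max(d)
x_star = project_box(c, 0.0, cap)  # exact constrained optimum (objective is separable)

xs = projected_gd(d, c, cap, x0=[0.0, 0.0, 0.0], steps=60)
rate = 1.0 - mu / L                # (1 - eta * mu) with eta = 1/L
e0 = dist(xs[0], x_star)
ok = all(dist(x, x_star) <= rate ** t * e0 + 1e-12 for t, x in enumerate(xs))

eps = 1e-6
t_needed = math.ceil(math.log(1.0 / eps) / -math.log(rate))  # smallest t with rate**t <= eps
print(ok, t_needed, round((L / mu) * math.log(1.0 / eps), 1))
```

Because −log(1 − μ/L) ≥ μ/L, the exact step count never exceeds (L/μ) log(1/ε), which is where the O((L/μ) log(1/ε)) figure comes from.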

Highlights

  • Formalizes runtime delegation safety as a bilevel optimization problem with a continuous delegation degree α ∈ [0,1] and a probabilistic safety constraint P(safe) ≥ 1−δ.
  • Proves Safety Monotonicity: a higher outer safety weight produces a weakly safer inner policy.
  • Proves linear convergence of the inner policy under projected gradient descent, assuming standard smoothness and strong-convexity conditions.
  • Derives an Accountability Propagation bound that distributes responsibility across multi-hop delegation chains with a provable per-agent ceiling.
  • Specifies a pre-registered evaluation protocol across three high-stakes domains: medical AI (MIMIC-III), financial risk control (S&P 500), and educational agent supervision (ASSISTments).
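
A toy, one-dimensional illustration of Safety Monotonicity follows. It assumes a single sub-agent with delegation degree α, a hypothetical inner objective λ·s·α² − (1−λ)·e·α (quadratic delegation risk, linear efficiency gain) and a hypothetical safety model P(safe) = 1 − p·α. None of these forms come from the paper; the sketch only shows the direction of the effect in a case where the constrained inner optimum has a closed form.

```python
def inner_optimum(lam, s=1.0, e=1.0, p=0.2, delta=0.1):
    """Closed-form minimizer of the toy inner objective
    f(alpha) = lam * s * alpha**2 - (1 - lam) * e * alpha
    over the delta-safe set {alpha in [0, 1] : 1 - p * alpha >= 1 - delta}."""
    cap = min(1.0, delta / p)          # largest alpha with P(safe) >= 1 - delta
    if lam == 0.0:
        return cap                     # pure efficiency: delegate as much as allowed
    alpha_u = (1.0 - lam) * e / (2.0 * lam * s)  # unconstrained minimizer
    return min(max(alpha_u, 0.0), cap) # strictly convex 1-D problem: clipping is exact

def p_safe(alpha, p=0.2):
    """Toy safety model: failure probability grows linearly with delegation degree."""
    return 1.0 - p * alpha

lams = [i / 10 for i in range(11)]
alphas = [inner_optimum(lam) for lam in lams]
safety = [p_safe(a) for a in alphas]
for lam, a, ps in zip(lams, alphas, safety):
    print(f"lambda={lam:.1f}  alpha*={a:.3f}  P(safe)={ps:.3f}")
```

With these toy numbers the safety constraint binds for λ ≤ 0.5 (α* sits at the δ-safe cap of 0.5); above that, the safety term pulls α* below the cap, so P(safe) rises weakly as λ grows and never falls.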

Methods

  • Bilevel optimization: the outer loop learns a meta-weight network λ_φ(s) for a context-dependent safety–efficiency trade-off; the inner loop optimizes the delegation policy π under a probabilistic safety constraint.
  • Projected gradient descent for inner policy optimization, with projection onto the δ-safe feasible set.
  • Hypergradient computation for the outer meta-weight updates via implicit differentiation, approximated by truncated unrolling of the inner loop.
  • Accountability weight formulation for multi-hop delegation chains, enabling principled auditing of responsibility.
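
The outer update can be sketched on the same kind of toy problem. This sketch makes several assumptions: a scalar weight λ replaces the network λ_φ(s), the hypergradient comes from forward-mode differentiation through a fixed number of unrolled projected-GD steps (truncated unrolling, used here in place of implicit differentiation), and the outer loss 0.5·(α − ALPHA_REF)² with ALPHA_REF = 0.4 is a hypothetical validation target. All function names are illustrative.

```python
ALPHA_REF = 0.4  # hypothetical target delegation degree used by the outer loss

def inner_unrolled(lam, T=200, eta=0.5, s=1.0, e=1.0, cap=0.75, alpha0=0.1):
    """T projected-GD steps on f(alpha; lam) = lam*s*alpha**2 - (1 - lam)*e*alpha
    over [0, cap], carrying d(alpha)/d(lam) forward through every unrolled step."""
    alpha, dalpha = alpha0, 0.0
    for _ in range(T):
        grad = 2.0 * lam * s * alpha - (1.0 - lam) * e
        # Total derivative of grad w.r.t. lam: explicit term + term through alpha.
        dgrad = 2.0 * s * alpha + e + 2.0 * lam * s * dalpha
        step = alpha - eta * grad
        if 0.0 < step < cap:
            alpha, dalpha = step, dalpha - eta * dgrad
        else:
            # Projection clipped to a bound; treat its derivative as 0 there.
            alpha, dalpha = min(max(step, 0.0), cap), 0.0
    return alpha, dalpha

def outer_loss(alpha):
    """Hypothetical outer (validation) loss evaluated at the inner solution."""
    return 0.5 * (alpha - ALPHA_REF) ** 2

def hypergradient(lam):
    """dF/dlam = F'(alpha_T) * d(alpha_T)/d(lam), by the chain rule."""
    alpha_T, dalpha_T = inner_unrolled(lam)
    return (alpha_T - ALPHA_REF) * dalpha_T

# Outer loop: projected gradient step on the scalar weight, kept inside [1e-3, 1].
lam, beta = 0.5, 0.1
for _ in range(50):
    lam = min(max(lam - beta * hypergradient(lam), 1e-3), 1.0)
print(f"lambda = {lam:.4f}, alpha_T = {inner_unrolled(lam)[0]:.4f}")
```

In this toy the inner optimum is α*(λ) = (1−λ)/(2λ), so the outer loop should settle near λ = 1/1.8 ≈ 0.556, where α* matches the reference value; a central finite difference on outer_loss(inner_unrolled(λ)[0]) is a cheap way to check the unrolled derivative.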

Results

  • Safety Monotonicity Theorem: a pointwise higher safety weight λ_φ(s) leads to a weakly higher safety probability P(safe) at the inner optimum.
  • Inner Policy Convergence Theorem: projected gradient descent converges linearly at rate (1−ημ)^t, requiring O((L/μ) log(1/ε)) steps for ε-accuracy.
  • Accountability Upper Bound: the maximum accountability weight of any agent in a k-hop chain is ≤ 1−(1−ᾱ)^k, where ᾱ is the maximum delegation degree; no single agent can bear full accountability if ᾱ < 1.
  • The framework provides a deterministic safety floor that is independent of how an LLM interprets skill instructions, addressing the semantic attack surface of natural-language skills.
  • SBD composes with existing multi-agent architectures and protocols (e.g., LDP) for complementary safety guarantees.
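
The Accountability Upper Bound is plain arithmetic and can be tabulated directly. The helper name below is illustrative; it evaluates only the stated ceiling 1−(1−ᾱ)^k and does not reproduce the paper's accountability weight formulation, which is not given here.

```python
def accountability_ceiling(alpha_bar: float, k: int) -> float:
    """Upper bound 1 - (1 - alpha_bar)**k on any single agent's accountability weight
    in a k-hop delegation chain, where alpha_bar is the maximum delegation degree."""
    if not 0.0 <= alpha_bar <= 1.0 or k < 1:
        raise ValueError("need alpha_bar in [0, 1] and k >= 1")
    # Note: for alpha_bar close to 1 and large k, the float result can round to 1.0.
    return 1.0 - (1.0 - alpha_bar) ** k

for alpha_bar in (0.5, 0.9, 1.0):
    print(alpha_bar, [round(accountability_ceiling(alpha_bar, k), 4) for k in (1, 2, 3, 5)])
```

For example, ᾱ = 0.5 and k = 3 give 1 − 0.5³ = 0.875. The ceiling stays below 1 whenever ᾱ < 1 but grows toward 1 as the chain lengthens, so the per-agent guarantee is tightest for short chains and small ᾱ.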