Sobol sensitivity analysis
What it is
A global sensitivity analysis run quarterly on each model × region combination, using SALib (the Saltelli sampler) to identify which input parameters dominate the variance of the per-query estimate. Results are published in the methodology repo and inform where to focus future calibration effort.
Method
We run two flavours, in order of computational cost:
- Morris elementary effects (cheap; weekly): coarse ranking of which inputs matter
- Sobol indices (expensive; quarterly): variance decomposition with first-order and total-order indices
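As a rough illustration of the Morris screening step, the sketch below computes elementary effects along random one-at-a-time trajectories on a grid in the unit hypercube. It is a simplified, standard-library version of the technique and not the production code: the function `morris_screening`, the `toy_model`, and all parameter values are hypothetical, and the design only takes upward steps, whereas the full method also allows downward ones.

```python
import random
import statistics


def morris_screening(model, k, trajectories=50, levels=4, seed=0):
    """Return (mu_star, sigma), each a list of length k.

    mu_star[i] is the mean absolute elementary effect of input i (overall
    importance); sigma[i] is the standard deviation of its elementary effects
    (a sign of nonlinearity or interaction). Inputs live on a grid in [0, 1].
    """
    rng = random.Random(seed)
    delta = levels / (2 * (levels - 1))  # conventional Morris step size
    # Start only from grid points where a +delta step stays inside [0, 1].
    # (Simplification: the full method also allows downward steps.)
    base_levels = [j / (levels - 1) for j in range(levels // 2)]
    effects = [[] for _ in range(k)]
    for _ in range(trajectories):
        x = [rng.choice(base_levels) for _ in range(k)]
        y = model(x)
        for i in rng.sample(range(k), k):  # visit inputs in random order
            x_next = list(x)
            x_next[i] = x[i] + delta
            y_next = model(x_next)
            effects[i].append((y_next - y) / delta)
            x, y = x_next, y_next  # the trajectory continues from the new point
    mu_star = [statistics.fmean([abs(e) for e in ee]) for ee in effects]
    sigma = [statistics.stdev(ee) for ee in effects]
    return mu_star, sigma


def toy_model(x):
    # Hypothetical stand-in: one strong linear input, one interacting pair,
    # and one input with no effect at all.
    return 10.0 * x[0] + 2.0 * x[1] * x[2] + 0.0 * x[3]


mu_star, sigma = morris_screening(toy_model, k=4)
ranking = sorted(range(4), key=lambda i: mu_star[i], reverse=True)
print("mu*:", [round(m, 2) for m in mu_star])
print("ranking (most to least influential):", ranking)
```

Each trajectory costs k + 1 model evaluations, which is why this kind of screening is cheap enough to run often. A high `sigma` relative to `mu_star` suggests that an input acts through interactions or nonlinearity, which is the kind of effect the Sobol run then quantifies.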
Sobol's first-order index S_i for input i is the share of output variance attributable to varying input i alone, averaged over the distributions of the other inputs. The total-order index ST_i additionally captures every interaction involving input i, so ST_i ≥ S_i, and the gap ST_i − S_i indicates how much of input i's influence comes through interactions.
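To make the two indices concrete, here is a minimal standard-library sketch that estimates them by Monte Carlo with the A / B / A_B^(i) sampling design, using the first-order estimator from Saltelli et al. (2010) and the Jansen total-order estimator discussed in the same paper. Everything in it is an assumption for illustration: the toy product model, the median-1 log-normal inputs, the function names, and the use of plain pseudo-random draws instead of a Sobol sequence. It is not the SALib pipeline itself.

```python
import math
import random


def sobol_indices(model, sigmas, n=50_000, seed=0):
    """Estimate first-order (S1) and total-order (ST) Sobol indices.

    Inputs are independent log-normals with median 1 and log-space standard
    deviations `sigmas` (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    k = len(sigmas)

    def draw():
        return [rng.lognormvariate(0.0, s) for s in sigmas]

    A = [draw() for _ in range(n)]
    B = [draw() for _ in range(n)]
    fA = [model(x) for x in A]
    fB = [model(x) for x in B]
    mean = sum(fA + fB) / (2 * n)
    var = sum((y - mean) ** 2 for y in fA + fB) / (2 * n - 1)

    s1, st = [], []
    for i in range(k):
        # A_B^(i): each row of A with column i replaced by the value from B.
        fAB = [model(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        # First-order: covariance between f(B) and f(A_B^(i)), which share only x_i.
        s1.append(sum(yb * (yab - ya) for ya, yb, yab in zip(fA, fB, fAB)) / n / var)
        # Total-order (Jansen): variance left when only x_i is resampled.
        st.append(sum((ya - yab) ** 2 for ya, yab in zip(fA, fAB)) / (2 * n) / var)
    return s1, st


def product_model(x):
    # Hypothetical toy model: the output is the product of the inputs.
    return math.prod(x)


def analytic_product_indices(sigmas):
    """Closed-form indices for a product of independent log-normals."""
    c = [math.exp(s * s) - 1.0 for s in sigmas]  # squared coefficient of variation
    total = math.prod(1.0 + ci for ci in c)
    s1 = [ci / (total - 1.0) for ci in c]
    st = [ci * total / (1.0 + ci) / (total - 1.0) for ci in c]
    return s1, st


SIGMAS = [0.30, 0.15, 0.05]  # illustrative log-space standard deviations
s1, st = sobol_indices(product_model, SIGMAS)
s1_exact, st_exact = analytic_product_indices(SIGMAS)
for i in range(len(SIGMAS)):
    print(f"input {i}: S1={s1[i]:.3f} (exact {s1_exact[i]:.3f}), "
          f"ST={st[i]:.3f} (exact {st_exact[i]:.3f})")
```

The design needs n × (k + 2) model evaluations, which is where the cost of the Sobol run comes from. The product model is convenient only because its indices have a closed form to compare against. The real per-query estimate may have a different structure.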
Inputs we vary
For tier-2 estimates, the input parameters in the sensitivity sweep are:
| Input | Distribution | Notes |
|---|---|---|
| GPU energy per token | Log-normal, GSD from pedigree | Dominant for most workloads |
| Host CPU/DRAM share | Log-normal | Material at low batch sizes |
| PUE | Log-normal, GSD 1.05 | Modest contribution |
| Embodied amortisation | Log-normal, GSD 1.50 (high uncertainty) | Material at long context |
| Grid intensity | Log-normal, GSD 1.10 (live data) | Material in high-carbon regions |
| Batch size estimate | Log-normal | Less material; usually well-estimated |
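For readers less used to geometric standard deviations: a log-normal with geometric standard deviation GSD has log-space standard deviation ln(GSD), and its central 95% interval runs from the median divided by GSD^1.96 to the median multiplied by GSD^1.96. The sketch below shows that mapping for the three GSDs quoted in the table. The medians are normalised to 1.0 as a placeholder, and the helper names are hypothetical.

```python
import math
import random
import statistics

# GSDs quoted in the table; medians are normalised to 1.0 as a placeholder.
GSD = {
    "PUE": 1.05,
    "Embodied amortisation": 1.50,
    "Grid intensity": 1.10,
}


def log_sigma(gsd):
    """Log-space standard deviation of a log-normal with the given GSD."""
    return math.log(gsd)


def interval_95(median, gsd):
    """Central ~95% interval: median divided and multiplied by GSD**1.96."""
    factor = gsd ** 1.96
    return median / factor, median * factor


def draw(median, gsd, n, rng):
    """Draw n log-normal samples with the given median and GSD."""
    mu, sigma = math.log(median), math.log(gsd)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


rng = random.Random(0)
for name, gsd in GSD.items():
    low, high = interval_95(1.0, gsd)
    xs = draw(1.0, gsd, 20_000, rng)
    inside = sum(low <= x <= high for x in xs) / len(xs)
    print(f"{name}: sigma={log_sigma(gsd):.3f}, "
          f"95% interval x{low:.2f}..x{high:.2f}, "
          f"sample median={statistics.median(xs):.3f}, inside={inside:.3f}")
```

Read this way, a GSD of 1.50 spans roughly a factor of 2.2 either side of the median, while a GSD of 1.05 stays within about 10%.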
Typical results
For Mistral Medium 3 on Scaleway PAR-1, Q1 2026:
| Input | First-order S_i | Total-order ST_i |
|---|---|---|
| GPU energy per token | 0.52 | 0.61 |
| Grid intensity | 0.18 | 0.21 |
| Embodied amortisation | 0.12 | 0.18 |
| Host CPU/DRAM share | 0.06 | 0.09 |
| PUE | 0.05 | 0.07 |
| Batch size | 0.04 | 0.05 |
| (interactions) | — | (residual) |
GPU energy per token dominates, as expected. Grid intensity is the second-largest contributor; for atNorth regions (lower-variance grid), it drops below embodied amortisation. The first-order indices in the table sum to 0.97, so interactions account for only about 3% of the output variance; the inputs act largely additively.
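The arithmetic behind reading such a table can be sketched as follows: the interaction residual is one minus the sum of the first-order indices, and the per-input gap ST_i − S_i shows how much of each input's influence comes through interactions. The values are copied from the table above. Because they are rounded to two decimals, the result may differ slightly from a figure computed on unrounded estimates. The variable names are illustrative only.

```python
# (first-order S_i, total-order ST_i) as published in the table above.
RESULTS = {
    "GPU energy per token": (0.52, 0.61),
    "Grid intensity": (0.18, 0.21),
    "Embodied amortisation": (0.12, 0.18),
    "Host CPU/DRAM share": (0.06, 0.09),
    "PUE": (0.05, 0.07),
    "Batch size": (0.04, 0.05),
}

# Share of variance explained by inputs acting alone, and what is left over.
first_order_sum = sum(s1 for s1, _ in RESULTS.values())
residual = 1.0 - first_order_sum

# Per-input gap between total-order and first-order index.
interaction_gap = {name: round(st - s1, 2) for name, (s1, st) in RESULTS.items()}

# Rank inputs by total-order index, the usual basis for prioritisation.
ranked = sorted(RESULTS, key=lambda name: RESULTS[name][1], reverse=True)

print(f"sum of first-order indices: {first_order_sum:.2f}")
print(f"interaction residual:       {residual:.2f}")
for name in ranked:
    s1, st = RESULTS[name]
    print(f"{name:<24} S1={s1:.2f} ST={st:.2f} gap={interaction_gap[name]:.2f}")
```

Ranking on the total-order index is a common choice because it does not understate inputs whose influence is partly through interactions.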
How we use the results
- Calibration prioritisation. The dominant inputs become the targets for the next round of calibration data collection. If GPU energy per token dominates, we run more tier-3 measurements at varied batch sizes to reduce its uncertainty.
- Auditor communication. The Sobol decomposition is part of the annual evidence pack; it lets the auditor see which inputs the methodology is most sensitive to and where the methodology team is investing.
- Customer guidance. For customers under ISAE 3000 review, knowing which inputs dominate lets them anticipate which methodology changes would affect their reported aggregates.
Publication
Sobol results for the prior quarter are published as part of the quarterly methodology changelog at docs.vettedinference.com/changelog/. The full notebooks are in methodology/sensitivity/.
Where this is implemented
methodology/sensitivity/sobol.py
Citations
- Saltelli, A., et al. (2010). Variance based sensitivity analysis of model output. Design and estimator for the total sensitivity index. Computer Physics Communications 181(2).
- Herman, J., & Usher, W. (2017). SALib: An open-source Python library for Sensitivity Analysis. Journal of Open Source Software 2(9).