Stochastic Scenario Generation Frameworks

A stochastic scenario generation framework is the engineered subsystem that turns calibrated economic and behavioral assumptions into thousands of correlated, seed-reproducible future paths that a projection model consumes to compute tail-risk reserves and economic capital. It is the arithmetic heart of principle-based reserving: NAIC VM-20 Section 7 requires a stochastic reserve computed as the Conditional Tail Expectation (CTE 70) of the greatest present value of accumulated deficiency across a prescribed or company-generated scenario set, and VM-21 extends CTE-based reserving to variable annuities. Solvency II internal models, the OSFI Life Insurance Capital Adequacy Test (LICAT), and IFRS 17 risk-adjustment work all lean on the same machinery. This guide builds that subsystem in production: how correlated scenarios are sampled, seeded, validated, and handed to the projection engine as an examiner-reproducible artifact. It sits inside the broader Actuarial Model Ingestion & Testing Workflows pipeline and consumes the clean, contract-validated inputs that pipeline produces upstream.

The problem this subsystem solves

The core difficulty is not generating random numbers; it is generating the right random numbers, reproducibly, at filing scale. A stochastic reserve is only defensible if an examiner can regenerate the exact scenario set that produced it — bit for bit — from a recorded seed and a versioned parameter manifest. Three requirements collide:

Statistical fidelity. Scenarios must preserve the target moments, correlation structure, and boundary conditions of the calibrated assumptions. A scenario set whose equity-rate correlation drifts from the calibrated value produces a biased CTE and an indefensible reserve.
Determinism. The same seed and the same parameters must always regenerate the same paths. Any dependence on wall-clock time, dictionary ordering, or thread scheduling breaks reproducibility and fails review under SR 11-7 and OSFI E-23 Principle 4.
Scale. A CTE 70 estimate for a large variable-annuity block routinely requires 10,000 or more real-world or risk-neutral paths across 360 or more monthly steps, multiplied across every policy — millions of policy-scenario evaluations per valuation.
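
To put the scale requirement in rough numbers, the sketch below sizes the scenario tensor and the evaluation count for the path and step counts quoted above. The factor count and the 50,000-policy block are illustrative assumptions, not figures from any filing.

```python
# Back-of-envelope sizing. n_factors and n_policies are illustrative assumptions.
n_scenarios = 10_000
n_steps = 360
n_factors = 3           # assumed: short rate, equity, credit
n_policies = 50_000     # assumed block size

bytes_per_float64 = 8
tensor_bytes = n_scenarios * n_steps * n_factors * bytes_per_float64
policy_scenario_evals = n_policies * n_scenarios

print(f"scenario tensor: {tensor_bytes / 1e6:.1f} MB")               # 86.4 MB
print(f"policy-scenario evaluations: {policy_scenario_evals:,}")     # 500,000,000
```

Under these assumptions the tensor itself fits comfortably in memory; it is the policy-by-scenario projection work that dominates runtime.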

When a regulator questions a reserve, the question is almost never “is the number correct?” It is “can you reproduce it and show me the assumptions behind it?” This framework exists to make the answer a deterministic yes. Correlation calibration and no-arbitrage discipline are shared with Economic Scenario Mapping & Yield Curve Alignment, which governs how the covariance inputs to this engine are validated before they arrive.

Architecture of the scenario subsystem

The engine is a deterministic pipeline: validated assumptions and a covariance matrix enter, a seeded generator produces correlated standard normals, per-factor stochastic processes map those normals onto economic paths, and a manifest plus the scenario tensor exit for the projection layer. A drift breach detected downstream feeds back to recalibrate the covariance inputs rather than to re-roll the same run.

Prerequisites

Before implementing the engine, put the following in place:

Python packages: numpy (correlated sampling and the linear algebra), scipy (distribution families and inverse-transform sampling), pandas (cohort alignment), and pydantic for the parameter manifest. The low-level sampling techniques — Sobol sequences, inverse-transform, and variance reduction — are covered in the build guide on Generating Monte Carlo Scenarios with NumPy and SciPy.
A validated data contract. The covariance matrix, calibrated volatilities, and mean-reversion parameters must already have passed Schema Validation with Pydantic & Great Expectations; an unvalidated correlation input silently corrupts every downstream path.
Vectorization fluency. The engine broadcasts factor shocks across cohorts and periods using contiguous arrays; the memory-layout and dtype patterns are documented in Pandas & NumPy for Actuarial Data Pipelines.
Regulatory context. Understand which reserve you are computing — the deterministic and stochastic reserves under NAIC VM-20 Compliance Frameworks — because the scenario count, CTE level, and real-world-versus-risk-neutral choice all flow from the governing clause.

Core implementation: seeded correlated path generation

The canonical pattern separates three concerns that are routinely (and dangerously) tangled together: the parameters (calibrated, versioned, hashed), the randomness (an isolated seeded generator), and the processes (the stochastic differential equations that map normals to economic variables). Keeping them separate is what makes a run reproducible.

Correlation is imposed through Cholesky factorization. Given a symmetric positive-definite correlation matrix $R$, the decomposition $R = LL^{\top}$ yields a lower-triangular factor $L$. If $Z$ is a matrix whose columns are independent standard-normal draws, then $LZ$ has the target correlation structure, because $\operatorname{Cov}(LZ) = L\,I\,L^{\top} = LL^{\top} = R$. generate_scenarios stores draws as rows instead, so it applies the equivalent row form $ZL^{\top}$.
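
As a quick numerical check of that identity, the following sketch factors an illustrative three-factor matrix and compares the empirical correlation of the transformed draws with the target. The matrix values, seed, and sample size are assumptions chosen for the demonstration.

```python
import numpy as np

# Illustrative 3-factor correlation matrix (assumed values).
R = np.array([
    [1.00, -0.30, 0.15],
    [-0.30, 1.00, -0.25],
    [0.15, -0.25, 1.00],
])
L = np.linalg.cholesky(R)               # lower-triangular, R = L @ L.T

rng = np.random.default_rng(12345)      # arbitrary seed for the demonstration
z = rng.standard_normal((200_000, 3))   # each row is one independent draw
x = z @ L.T                             # row form of L z

reconstruction_ok = np.allclose(L @ L.T, R)
empirical = np.corrcoef(x, rowvar=False)
max_abs_error = float(np.abs(empirical - R).max())
print(reconstruction_ok, max_abs_error)
```

The residual in max_abs_error is pure sampling noise and shrinks as the number of draws grows.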

from __future__ import annotations

import hashlib
import json
from datetime import date

import numpy as np
from pydantic import BaseModel, Field, field_validator


class ScenarioConfig(BaseModel):
    """Versioned, hashable parameter set for one stochastic run."""

    valuation_date: date
    seed: int = Field(..., ge=0)
    n_scenarios: int = Field(..., gt=0, le=100_000)
    n_steps: int = Field(..., gt=0)          # monthly projection steps
    dt: float = Field(..., gt=0.0)           # year fraction per step, e.g. 1/12
    risk_factors: list[str]                  # e.g. ["short_rate", "equity", "credit"]
    correlation: list[list[float]]           # calibrated R, factor-by-factor
    vols: list[float]                        # annualized volatility per factor
    drifts: list[float]                      # long-run target (theta) per factor
    reversion: list[float]                   # mean-reversion speed (0 = none)

    @field_validator("correlation")
    @classmethod
    def correlation_is_valid(cls, v: list[list[float]]) -> list[list[float]]:
        R = np.asarray(v, dtype=np.float64)
        if R.ndim != 2 or R.shape[0] != R.shape[1]:
            raise ValueError("Correlation matrix must be square.")
        if not np.allclose(R, R.T, atol=1e-10):
            raise ValueError("Correlation matrix must be symmetric.")
        # A correlation (not covariance) matrix has ones on the diagonal.
        if not np.allclose(np.diag(R), 1.0, atol=1e-10):
            raise ValueError("Correlation matrix must have a unit diagonal.")
        eigenvalues = np.linalg.eigvalsh(R)
        if eigenvalues.min() <= 0:
            raise ValueError("Correlation matrix must be positive-definite.")
        return v

    def fingerprint(self) -> str:
        """SHA-256 over canonical JSON — identical params, identical hash."""
        canonical = json.dumps(self.model_dump(mode="json"), sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def generate_scenarios(config: ScenarioConfig) -> np.ndarray:
    """Return a (n_scenarios, n_steps, n_factors) tensor of economic paths."""
    rng = np.random.default_rng(config.seed)          # isolated, seeded generator
    R = np.asarray(config.correlation, dtype=np.float64)
    L = np.linalg.cholesky(R)                          # R = L @ L.T

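    n_factors = len(config.risk_factors)
    vols = np.asarray(config.vols, dtype=np.float64)
    drifts = np.asarray(config.drifts, dtype=np.float64)
    kappa = np.asarray(config.reversion, dtype=np.float64)
    dt = config.dt
    # Fail closed on a length mismatch; broadcasting would otherwise hide it.
    if R.shape != (n_factors, n_factors) or any(
        a.shape != (n_factors,) for a in (vols, drifts, kappa)
    ):
        raise ValueError("Every per-factor input must match len(risk_factors).")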

    # Independent standard normals: (scenarios, steps, factors)
    z = rng.standard_normal((config.n_scenarios, config.n_steps, n_factors))
    # Impose the calibrated correlation across the factor axis.
    correlated = z @ L.T

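    paths = np.zeros_like(correlated)
    # Every factor starts at 0 for simplicity; a production run would start
    # from the level observed on the valuation date.
    level = np.zeros(n_factors)
    for t in range(config.n_steps):
        shock = correlated[:, t, :] * vols * np.sqrt(dt)
        # Euler step of Ornstein-Uhlenbeck reversion toward the target `drifts`.
        # With kappa = 0 the target is ignored and the factor is a driftless
        # random walk, so such factors need their own process (e.g. GBM).
        level = level + kappa * (drifts - level) * dt + shock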
        paths[:, t, :] = level
    return paths

Two design decisions carry the reproducibility guarantee. First, numpy.random.default_rng(config.seed) creates a local Generator seeded from the recorded integer, never the legacy global np.random.seed, which is process-wide mutable state that a concurrent run can clobber. Second, ScenarioConfig.fingerprint() hashes a canonical, key-sorted JSON serialization, so two runs with the same parameters produce the same hash regardless of insertion order or environment. That hash is what ties a filed reserve back to the exact scenario tensor an examiner can regenerate. Bit-for-bit regeneration also assumes the NumPy version is pinned and recorded alongside the seed, because NumPy does not guarantee that Generator streams stay identical across releases.

The Ornstein-Uhlenbeck update above is the discretized short-rate / spread dynamic $dX_t = \kappa(\theta - X_t)\,dt + \sigma\,dW_t$; for an equity factor you would swap in geometric Brownian motion, $S_{t+1} = S_t \exp\!\big((\mu - \tfrac{1}{2}\sigma^2)\,dt + \sigma\sqrt{dt}\,\varepsilon_t\big)$. The correlated normals feed whichever process each factor requires; the correlation is imposed once, before the per-factor mapping.
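
The geometric Brownian motion mapping can be sketched as a vectorized function over a block of shocks. The name gbm_paths and its parameter values are hypothetical; in the engine, eps would be the equity slice of the correlated normals rather than fresh draws.

```python
import numpy as np


def gbm_paths(eps: np.ndarray, s0: float, mu: float, sigma: float, dt: float) -> np.ndarray:
    """Map standard-normal shocks of shape (n_scenarios, n_steps) to GBM levels."""
    # Exact log-space step: (mu - sigma^2 / 2) dt + sigma sqrt(dt) eps
    log_increments = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * eps
    # Cumulative sum of log returns gives the level at the end of each step.
    return s0 * np.exp(np.cumsum(log_increments, axis=1))


rng = np.random.default_rng(7)               # stand-in for the correlated equity shocks
eps = rng.standard_normal((1_000, 12))
equity = gbm_paths(eps, s0=100.0, mu=0.055, sigma=0.16, dt=1.0 / 12.0)
print(equity.shape)
```

Working in log space keeps every level strictly positive, which an additive update on the level itself does not guarantee.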

Configuration and tuning

The parameters that most affect a filed reserve are the scenario count, the CTE level, and the real-world-versus-risk-neutral choice. Tune them against the governing clause, not against runtime convenience.

config = ScenarioConfig(
    valuation_date=date(2026, 6, 30),
    seed=20260630,                 # recorded in the run manifest; never time-based
    n_scenarios=10_000,            # CTE 70 stability; see convergence note below
    n_steps=360,                   # 30 years monthly for a level-premium block
    dt=1.0 / 12.0,
    risk_factors=["short_rate", "equity", "credit"],
    correlation=[
        [1.00, -0.30, 0.15],
        [-0.30, 1.00, -0.25],
        [0.15, -0.25, 1.00],
    ],
    vols=[0.012, 0.16, 0.020],     # annualized, from the calibration study
    drifts=[0.035, 0.055, 0.010],  # real-world long-run means
    reversion=[0.25, 0.00, 0.40],  # equity has no mean reversion here
)


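def cte(losses: np.ndarray, level: float) -> float:
    """Conditional Tail Expectation: mean of the worst (1 - level) tail.

    `level` has no default on purpose: read it from the run manifest.
    Losses are signed so that larger values are worse.
    """
    if not 0.0 <= level < 1.0:
        raise ValueError("CTE level must lie in [0, 1).")
    ordered = np.sort(losses)                      # ascending; worst outcomes last
    # round() guards against float noise lifting an exact integer product.
    cutoff = int(np.ceil(round(level * ordered.size, 9)))
    tail = ordered[cutoff:]
    return float(tail.mean()) if tail.size else float(ordered[-1])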

Scenario count trades Monte Carlo error against runtime. The standard error of a CTE 70 estimate scales as $O(n^{-1/2})$; below roughly 1,000 paths the tail estimate is too noisy for a defensible reserve, and most companies settle at 5,000–10,000 with a documented convergence test. When runtime becomes binding, reach for low-discrepancy sequences and variance reduction rather than simply cutting paths.
CTE level is prescribed: CTE 70 for the VM-20 Section 7 stochastic reserve, and higher levels (CTE 95/98) for internal capital. Never hard-code it; read it from the manifest so the same engine serves reserving and capital with an auditable parameter.
Time step must align with the liability cash-flow granularity. Monthly steps (dt = 1/12) are standard; coarser steps understate path-dependent guarantees like GMWB ratchets.
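Seed policy. One recorded seed per filing run. For parallel batches, derive independent child streams with rng.spawn(n) (a method of recent NumPy releases; numpy.random.SeedSequence.spawn is the longer-standing equivalent) rather than reseeding, so concurrency never perturbs the numbers.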
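
A convergence study of the kind described above can be sketched by repeating the CTE estimate over independent streams and watching its spread fall as the path count grows. The lognormal loss draw is a stand-in assumption for the projection engine's per-scenario output, and cte_standard_error is a hypothetical helper name.

```python
import numpy as np


def cte(losses: np.ndarray, level: float) -> float:
    """Mean of the worst (1 - level) share of outcomes (larger = worse)."""
    ordered = np.sort(losses)
    cutoff = int(np.ceil(level * ordered.size))
    tail = ordered[cutoff:]
    return float(tail.mean()) if tail.size else float(ordered[-1])


def cte_standard_error(n_paths: int, level: float, n_reps: int, seed: int) -> float:
    """Spread of the CTE estimate across independent replications."""
    children = np.random.SeedSequence(seed).spawn(n_reps)
    estimates = [
        # Assumed stand-in loss model; replace with real per-scenario results.
        cte(np.random.default_rng(c).lognormal(0.0, 1.0, size=n_paths), level)
        for c in children
    ]
    return float(np.std(estimates, ddof=1))


se_by_n = {
    n: cte_standard_error(n, level=0.70, n_reps=200, seed=2026)
    for n in (250, 1_000, 4_000)
}
for n, se in se_by_n.items():
    print(f"n={n:>5}  standard error of CTE 70 = {se:.4f}")
```

Each fourfold increase in paths should roughly halve the standard error, which is the $O(n^{-1/2})$ behaviour a documented convergence test is meant to confirm.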

Step-by-step implementation walkthrough

Validate the inputs. Instantiate ScenarioConfig; the field_validator rejects a non-symmetric or non-positive-definite correlation matrix before a single path is drawn. This is the same fail-closed discipline the ingestion contract enforces upstream.
Fingerprint the parameters. Call config.fingerprint() and persist the hash in the run manifest. This is the identity the reserve will be filed against.
Seed one isolated generator. np.random.default_rng(config.seed) — local state only, never the global RNG.
Decompose the correlation. np.linalg.cholesky(R) yields $L$; if this raises LinAlgError, the matrix failed positive-definiteness and must be repaired by eigenvalue shrinkage before proceeding, not silently patched.
Draw and correlate. Sample independent normals shaped (scenarios, steps, factors) and right-multiply by L.T to impose the calibrated dependence across the factor axis in one vectorized operation.
Map to economic processes. Step each factor through its stochastic process (Ornstein-Uhlenbeck for rates and spreads, geometric Brownian motion for equity), broadcasting volatilities and drifts across all scenarios simultaneously.
Hand off to projection. Emit the (n_scenarios, n_steps, n_factors) tensor together with the manifest to the projection engine, which computes the greatest present value of accumulated deficiency per scenario and then the CTE. At filing scale this hand-off runs under Async Batch Processing for Large Models so millions of policy-scenario evaluations complete before the quarter-end cutoff.
Record the trail. Write the manifest, seed, fingerprint, and scenario checksum into the immutable log described in Actuarial Audit Trail Architecture.
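
The last two steps can be sketched as a manifest record carrying a checksum of the tensor. The field names, the tensor_checksum and build_manifest helpers, and the placeholder fingerprint string are hypothetical; the checksum also assumes every machine that regenerates the run uses the same byte order.

```python
import hashlib
import json

import numpy as np


def tensor_checksum(paths: np.ndarray) -> str:
    """SHA-256 over the float64, C-contiguous bytes of the scenario tensor."""
    canonical = np.ascontiguousarray(paths, dtype=np.float64)
    return hashlib.sha256(canonical.tobytes()).hexdigest()


def build_manifest(seed: int, fingerprint: str, paths: np.ndarray) -> str:
    """Serialize the run identity as key-sorted JSON (hypothetical field names)."""
    record = {
        "seed": seed,
        "config_fingerprint": fingerprint,
        "tensor_shape": list(paths.shape),
        "tensor_dtype": "float64",
        "tensor_sha256": tensor_checksum(paths),
        "numpy_version": np.__version__,   # streams may differ across releases
    }
    return json.dumps(record, sort_keys=True)


# Small stand-in tensor; a real run would pass the output of generate_scenarios.
demo = np.random.default_rng(20260630).standard_normal((4, 3, 2))
manifest_json = build_manifest(20260630, "<value of config.fingerprint()>", demo)
print(manifest_json)
```

An examiner who regenerates the tensor from the recorded seed can then compare one hash instead of millions of floats.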

Validation and testing

A scenario engine is a model in its own right and must be validated as one. Four checks belong in every CI run for the framework.

def test_correlation_preserved():
    """Sampled paths must reproduce the calibrated correlation within tolerance."""
    paths = generate_scenarios(config)
    increments = np.diff(paths, axis=1)                # per-step innovations
    flat = increments.reshape(-1, len(config.risk_factors))
    empirical = np.corrcoef(flat, rowvar=False)
    target = np.asarray(config.correlation)
    assert np.allclose(empirical, target, atol=0.02), "Correlation drift in sampler"


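def test_determinism():
    """Same seed and params must regenerate a byte-identical tensor."""
    a = generate_scenarios(config)
    b = generate_scenarios(config)
    assert np.array_equal(a, b), "Non-deterministic scenario generation"
    # Rebuild the config from its serialized form; comparing a fingerprint
    # with itself would pass trivially.
    rebuilt = ScenarioConfig.model_validate(config.model_dump(mode="json"))
    assert rebuilt.fingerprint() == config.fingerprint()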

Moment matching. Assert that the sampled mean, volatility, and correlation reproduce the calibration targets within a stated tolerance (the atol=0.02 above). A failure means the sampler, not the assumptions, is biased.
Determinism regression. Regenerate against a stored golden tensor for a fixed seed and assert bitwise equality. This is the single most important test in the suite — it is the property an examiner relies on.
Distributional gates. Wrap the scenario output in a Great Expectations checkpoint that asserts no NaN/inf, that terminal rates stay within economically plausible bounds, and that the null rate is zero. The same expectation-suite pattern used at ingestion applies to generated output.
Drift surveillance. Track the Population Stability Index of the generated distribution against the prior filing’s scenario set; a PSI above 0.25 signals a material shift that must route into review before use. The banding and adaptive thresholds come from Dynamic Threshold Tuning for Assumption Drift, which sits in the Assumption Validation & Rule Engine Design control plane.
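
A Population Stability Index check of the kind described in the drift-surveillance item might look like the sketch below. Binning on baseline deciles and flooring empty bins are common conventions assumed here, not requirements stated in this guide, and the normal draws are stand-ins for terminal rates from two filings.

```python
import numpy as np


def population_stability_index(
    baseline: np.ndarray, current: np.ndarray, n_bins: int = 10, eps: float = 1e-6
) -> float:
    """PSI of `current` against `baseline`, binned on baseline quantiles."""
    # Interior quantile edges; the two outer bins are open-ended.
    interior = np.quantile(baseline, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    p = np.bincount(np.searchsorted(interior, baseline, side="right"), minlength=n_bins)
    q = np.bincount(np.searchsorted(interior, current, side="right"), minlength=n_bins)
    p = np.clip(p / baseline.size, eps, None)   # floor avoids log(0) in empty bins
    q = np.clip(q / current.size, eps, None)
    return float(np.sum((q - p) * np.log(q / p)))


rng = np.random.default_rng(5)
prior = rng.normal(0.03, 0.01, size=50_000)     # stand-in: prior filing's terminal rates
same = rng.normal(0.03, 0.01, size=50_000)      # same calibration, new draw
shifted = rng.normal(0.04, 0.01, size=50_000)   # mean moved by one standard deviation

psi_same = population_stability_index(prior, same)
psi_shifted = population_stability_index(prior, shifted)
print(psi_same, psi_shifted)
```

In this sketch the unchanged calibration scores near zero while the one-sigma shift lands well above the 0.25 review threshold.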

For risk-neutral scenario sets, add a martingale test: the discounted expected value of each traded asset must equal its initial price within Monte Carlo error, or the set violates no-arbitrage and cannot be used for market-consistent valuation.
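
A minimal martingale test, assuming a single equity index under geometric Brownian motion with a flat risk-free rate, can be sketched as follows. The rate, volatility, seed, and the four-standard-error tolerance are illustrative assumptions; a production test would discount along each scenario's own rate path.

```python
import numpy as np

rng = np.random.default_rng(99)
n_scenarios, n_steps, dt = 20_000, 120, 1.0 / 12.0
r, sigma, s0 = 0.03, 0.16, 100.0      # assumed flat risk-free rate and equity vol

# Risk-neutral GBM: the drift is the risk-free rate.
eps = rng.standard_normal((n_scenarios, n_steps))
log_inc = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * eps
s = s0 * np.exp(np.cumsum(log_inc, axis=1))

# Discount each step back to time zero.
t = dt * np.arange(1, n_steps + 1)
discounted = s * np.exp(-r * t)

mean = discounted.mean(axis=0)
stderr = discounted.std(axis=0, ddof=1) / np.sqrt(n_scenarios)
z_scores = (mean - s0) / stderr
passes = bool(np.all(np.abs(z_scores) < 4.0))   # within Monte Carlo error at every step
print(passes, float(np.abs(z_scores).max()))
```

Expressing the gap in standard errors keeps the tolerance tied to the scenario count rather than to a fixed price difference.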

Failure modes and gotchas

Global-RNG contamination. Using np.random.seed() or the module-level np.random.normal lets any concurrent task advance shared state, so a parallel batch changes the numbers non-deterministically. Always pass a local default_rng and derive child streams with spawn.
Non-positive-definite correlation. Empirical correlation matrices estimated from short or misaligned histories frequently have a tiny negative eigenvalue, and np.linalg.cholesky raises LinAlgError. The fix is eigenvalue clipping or shrinkage toward a well-conditioned target before factorization — never a silent try/except that skips the correlation step, which would ship uncorrelated scenarios into a filed reserve.
float32 tail error. Casting the scenario tensor to float32 to save memory limits every value to roughly seven significant digits, and that rounding error compounds through hundreds of steps of path accumulation and discounting before the CTE averages the worst outcomes. Keep the sampling and accumulation in float64; down-cast only for archival storage, if at all.
Seed reuse across factors. Creating a separate generator per factor from the same seed gives every factor the identical sequence of normals, so the factors come out perfectly correlated before $L$ is ever applied. Draw the full (scenarios, steps, factors) block from a single generator and impose correlation through $L$, as above.
Too few paths for the tail. A CTE 70 or CTE 98 estimate from a few hundred scenarios has a standard error wide enough to swing the reserve materially between runs. Document a convergence study and pin the minimum path count in the manifest.
Silent parameter drift. If the covariance inputs change but the fingerprint is not recomputed, the audit trail points at the wrong scenario set. Recompute and persist the fingerprint on every parameter change, and fail the run if the stored hash and the live hash disagree.
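
The eigenvalue-clipping repair mentioned above can be sketched as below. This is plain spectral clipping followed by rescaling to a unit diagonal, not a nearest-correlation-matrix algorithm; repair_correlation, its floor value, and the deliberately inconsistent example matrix are assumptions for illustration.

```python
import numpy as np


def repair_correlation(R: np.ndarray, floor: float = 1e-6) -> np.ndarray:
    """Clip eigenvalues at `floor`, then rescale back to a unit diagonal."""
    sym = 0.5 * (R + R.T)
    eigvals, eigvecs = np.linalg.eigh(sym)
    clipped = np.clip(eigvals, floor, None)
    repaired = (eigvecs * clipped) @ eigvecs.T    # V diag(clipped) V^T
    d = np.sqrt(np.diag(repaired))
    repaired = repaired / np.outer(d, d)          # restore ones on the diagonal
    return 0.5 * (repaired + repaired.T)


# Deliberately inconsistent pairwise correlations (assumed example).
bad = np.array([
    [1.0, 0.9, -0.9],
    [0.9, 1.0, 0.9],
    [-0.9, 0.9, 1.0],
])
min_eig_before = float(np.linalg.eigvalsh(bad).min())

fixed = repair_correlation(bad)
L = np.linalg.cholesky(fixed)                     # succeeds after the repair
max_change = float(np.abs(fixed - bad).max())
print(min_eig_before, max_change)
```

The size of max_change is worth logging and reviewing: the repaired matrix is a new assumption, so it should be fingerprinted and persisted like any other parameter change.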

Frequently Asked Questions

Why impose correlation with Cholesky instead of sampling from a multivariate normal directly?

They are equivalent in distribution, but the Cholesky factor $L$ is an explicit, cacheable, version-controllable artifact you can hash into the manifest and reuse across factor processes. A library multivariate-normal routine may factor the matrix differently internally, so the same seed need not give the same draws, and it hides the dependence structure that an examiner needs to inspect. Factoring once and reusing $L$ is also faster at filing scale.

How many scenarios does a VM-20 stochastic reserve need?

The Valuation Manual does not fix a count; it requires the CTE 70 estimate to be stable. In practice companies run a convergence study and settle at 5,000–10,000 real-world paths, documenting the standard error of the CTE at the chosen count. When runtime is binding, low-discrepancy sequences and variance reduction buy accuracy more cheaply than raw path count.

How do I keep a stochastic reserve reproducible across a parallel run?

Seed one numpy.random.default_rng(seed) per run, record the seed and the correlation fingerprint in the manifest, and derive independent worker streams with rng.spawn(n) rather than reseeding each batch. The same seed and parameters must always regenerate the identical scenario tensor, which the determinism regression test enforces on every build.
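
The child-stream pattern can be sketched with numpy.random.SeedSequence.spawn, used here in place of rng.spawn so the example also runs on older NumPy releases. The batch count, batch shape, and batch_normals helper are assumptions; the point is that the assembled tensor does not depend on the order in which workers finish.

```python
import numpy as np


def batch_normals(child: np.random.SeedSequence, shape: tuple[int, ...]) -> np.ndarray:
    """Draw one worker's block from its own independent stream."""
    return np.random.default_rng(child).standard_normal(shape)


run_seed = 20260630
n_batches = 4                                   # assumed; record it in the manifest
children = np.random.SeedSequence(run_seed).spawn(n_batches)
shape = (2_500, 12, 3)                          # (scenarios per batch, steps, factors)

# Batches computed in submission order.
in_order = [batch_normals(c, shape) for c in children]

# Simulate workers finishing in reverse order; results are keyed by batch index.
out_of_order = {i: batch_normals(children[i], shape) for i in reversed(range(n_batches))}

tensor_a = np.concatenate(in_order, axis=0)
tensor_b = np.concatenate([out_of_order[i] for i in range(n_batches)], axis=0)
print(np.array_equal(tensor_a, tensor_b))
```

Changing the number of batches changes which stream produces which scenario, so the batch count belongs in the manifest next to the seed.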

What is the difference between a real-world and a risk-neutral scenario set here?

Real-world sets use calibrated physical drifts and are used for CTE-based reserves and economic capital; risk-neutral sets are calibrated so discounted asset prices are martingales and are used for market-consistent option and guarantee valuation. The engine is the same; only the drift calibration and the validation test (moment matching versus the martingale test) differ.

Related

Generating Monte Carlo Scenarios with NumPy and SciPy: low-discrepancy sequences, inverse-transform sampling, and variance reduction.
Schema Validation with Pydantic & Great Expectations: the contract that validates covariance and volatility inputs.
Async Batch Processing for Large Models: running millions of policy-scenario evaluations under a filing deadline.
Economic Scenario Mapping & Yield Curve Alignment: no-arbitrage and discount-curve validation for the inputs.
NAIC VM-20 Compliance Frameworks: how the CTE 70 stochastic reserve maps to the Valuation Manual.

Up a level: Actuarial Model Ingestion & Testing Workflows — the end-to-end pipeline this scenario engine plugs into.
