Automating Mortality Table Validation Against Industry Standards

Automating mortality table validation against industry standards means proving, on every run and without human judgement, that an incoming mortality_table agrees with a published basis — the 2017 CSO, the 2012 IAM, or an RPEC improvement scale such as MP-2021 — to within a documented tolerance, and that any deviation is captured as a logged, checksum-bound artifact rather than a silent number in a reserve model. This page builds the minimal Python routine that performs that comparison: a strict ingestion contract followed by a proportional log-ratio drift check against the benchmark. It is the specific, filing-ready technique behind the broader Mortality & Morbidity Rate Validation methodology and slots into the deterministic control plane described in Assumption Validation & Rule Engine Design.

The reason this warrants automation is that the failure is invisible. A table loaded with an off-by-one age offset, an improvement scale applied in the wrong direction, or a decimal-precision truncation on qx_rates passes a naive “is it a number between 0 and 1” check and only surfaces years later as reserve inadequacy or a failed asset-adequacy test. The routine below converts each of those into a loud, logged FAIL at the ingestion boundary.

Minimal Working Example

Two focused pieces do the entire job. First, a contract that refuses malformed tables at the boundary — the same Schema Validation with Pydantic & Great Expectations discipline applied to a mortality basis. Second, a vectorized comparison that measures proportional drift against the benchmark rather than absolute differences, because a 0.0002 gap means one thing at age 25 and something entirely different at age 95.

from pydantic import BaseModel, field_validator, ValidationInfo


class MortalityTable(BaseModel):
    table_id: str
    basis: str                 # e.g. "2017 CSO ANB"
    valuation_date: str
    improvement_scale: str | None
    ages: list[int]
    qx_rates: list[float]

    @field_validator("ages")
    @classmethod
    def ages_in_domain(cls, v: list[int]) -> list[int]:
        if not v or min(v) < 0 or max(v) > 120:
            raise ValueError("ages must be a non-empty vector within [0, 120]")
        return v

    @field_validator("qx_rates")
    @classmethod
    def rates_valid(cls, v: list[float], info: ValidationInfo) -> list[float]:
        if any(q <= 0.0 or q > 1.0 for q in v):
            raise ValueError("every qx must lie in (0, 1]")
        ages = info.data.get("ages", [])
        for i in range(1, len(v)):
            if i < len(ages) and ages[i] > 65 and v[i] < v[i - 1]:
                raise ValueError("ultimate mortality must be non-decreasing past age 65")
        return v

The proportional deviation of the company rate $q^{\text{co}}_x$ from the industry rate $q^{\text{ind}}_x$ is computed on the log scale and mapped back to a percentage. The log ratio itself is symmetric (a ratio of 2 and a ratio of 1/2 give $\pm\ln 2$), but the mapped-back value is the ordinary relative difference, so a doubling reads as +100% while a halving reads as -50%:

$$
\delta_x \;=\; \exp\!\left(\ln\frac{q^{\text{co}}_x}{q^{\text{ind}}_x}\right) - 1 \;=\; \frac{q^{\text{co}}_x}{q^{\text{ind}}_x} - 1
$$

import numpy as np


def validate_rate_drift(
    mortality_table: np.ndarray,
    benchmark_qx: np.ndarray,
    tolerance_pct: float = 0.05,
) -> dict:
    """Proportional drift of a table against a published industry basis."""
    safe_benchmark = np.where(benchmark_qx == 0.0, 1e-10, benchmark_qx)
    log_drift = np.log(mortality_table / safe_benchmark)
    pct_deviation = np.expm1(log_drift)              # e^x - 1, stable near zero

    within_band = np.abs(pct_deviation) <= tolerance_pct
    return {
        "max_positive_drift": float(np.max(pct_deviation)),
        "max_negative_drift": float(np.min(pct_deviation)),
        "out_of_tolerance_ages": np.where(~within_band)[0].tolist(),
        "overall_pass": bool(np.all(within_band)),
    }

How the Comparison Works, Block by Block

The schema is the cheapest defense and runs first. The ages_in_domain validator rejects an empty or out-of-range age vector — the signature of a truncated feed. The rates_valid validator enforces two invariants that catch the most common loading errors: every qx is a genuine probability in the half-open interval (0, 1], and ultimate mortality is non-decreasing past age 65. A monotonicity break in that band almost never reflects a real demographic effect; it is the fingerprint of a mis-sorted table or an inverted improvement scale, and it is exactly the kind of error that a spreadsheet review misses at three in the morning before a filing deadline.

The drift check operates on aligned NumPy arrays, which is what makes it fast enough to run inside a build pipeline over every table on every commit. Three design decisions carry the actuarial weight. The np.where(benchmark_qx == 0.0, 1e-10, ...) guard prevents a division-by-zero at ages where the published basis records a zero rate and leaves every non-zero benchmark untouched; at the guarded ages the ratio is very large, so they surface as out of tolerance instead of as inf or NaN. Working through np.log and then np.expm1 yields the same quantity as a raw ratio-minus-one, with np.expm1 staying accurate when the log drift is close to zero, and it keeps the symmetric log_drift available as an intermediate. And the tolerance is expressed as a proportion, so a single tolerance_pct band is meaningful across the whole age range instead of being swamped by the absolute scale of old-age mortality.
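The behaviour of the guard and of the log-then-expm1 path is easiest to see on a handful of numbers. The following is a minimal sketch using made-up rates (they are not taken from any published basis); it only reuses the same NumPy calls as the routine above.

```python
import numpy as np

# Illustrative rates only; not taken from any published table.
benchmark_qx = np.array([0.0010, 0.0010, 0.0, 0.0500])
company_qx = np.array([0.0020, 0.0005, 0.0002, 0.0500])

# Same zero guard as the routine above: only exact zeros are replaced.
safe_benchmark = np.where(benchmark_qx == 0.0, 1e-10, benchmark_qx)

log_drift = np.log(company_qx / safe_benchmark)
pct_deviation = np.expm1(log_drift)

# Doubling and halving are mirror images on the log scale...
assert np.isclose(log_drift[0], -log_drift[1])
# ...but not after mapping back to a percentage: +100% versus -50%.
assert np.isclose(pct_deviation[0], 1.0)
assert np.isclose(pct_deviation[1], -0.5)

# The guarded age yields a huge but finite ratio, so it fails the band loudly.
assert np.isfinite(pct_deviation[2]) and pct_deviation[2] > 1e5

# expm1(log(r)) agrees with r - 1 up to floating-point rounding.
assert np.allclose(pct_deviation, company_qx / safe_benchmark - 1.0)
```

One consequence worth keeping in mind: because the band is applied to the mapped-back percentage, a 5% tolerance admits ratios between 0.95 and 1.05, which is not a symmetric interval on the log scale. If symmetric treatment matters, the band could be applied to log_drift instead.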

The returned dictionary is deliberately structured for downstream consumption rather than for a human eye: out_of_tolerance_ages gives the exact indices an actuary must justify, max_positive_drift and max_negative_drift bound the worst case in each direction, and overall_pass is the single boolean a continuous-integration gate keys on. Feeding these thresholds from a YAML or JSON config — rather than hard-coding 0.05 — lets term, whole-life, and annuity blocks carry different strictness without redeploying the engine, the same configuration-driven pattern developed in Dynamic Threshold Tuning for Assumption Drift.
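As a rough illustration of the config-driven tolerance described above, the sketch below loads per-block bands from JSON using only the standard library. The config layout, the key names, the block names, and the tolerance values are all hypothetical placeholders, not a schema taken from this page.

```python
import json

# Hypothetical config; the block names and tolerance values are placeholders.
CONFIG_TEXT = """
{
  "config_version": "example-1",
  "default_tolerance_pct": 0.05,
  "blocks": {
    "term": {"tolerance_pct": 0.05},
    "whole_life": {"tolerance_pct": 0.03},
    "annuity": {"tolerance_pct": 0.02}
  }
}
"""


def tolerance_for(config: dict, block: str) -> float:
    """Return the block's tolerance, falling back to the configured default."""
    default = float(config["default_tolerance_pct"])
    tol = float(config.get("blocks", {}).get(block, {}).get("tolerance_pct", default))
    if not 0.0 < tol < 1.0:
        raise ValueError(f"tolerance_pct for {block!r} must lie in (0, 1), got {tol}")
    return tol


config = json.loads(CONFIG_TEXT)
annuity_tol = tolerance_for(config, "annuity")
unknown_tol = tolerance_for(config, "disability")  # not configured: default applies
```

The resolved value would then be passed as tolerance_pct to validate_rate_drift, and config_version recorded alongside the result so that a change of band is itself auditable.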

Edge Cases and Production Hardening

Missing age bands. Real feeds arrive with gaps — a reinsurance extract that stops at age 100, an experience study thin above age 95. A hard failure there blocks a filing; a silent zero corrupts a reserve. The fix is a deterministic fallback that interpolates, caps at a regulatory maximum, and logs every substitution with a reason code and an audit id.

import logging
import pandas as pd
from dataclasses import dataclass, field

logger = logging.getLogger("actuarial.mortality.validation")


@dataclass
class FallbackResult:
    table_id: str
    applied_fallback: str
    affected_ages: list
    audit_trail_id: str
    resolved_qx: list = field(default_factory=list)


def resolve_missing_segments(qx_series: pd.Series, benchmark: pd.Series) -> FallbackResult:
    missing = qx_series.isna()
    if not missing.any():
        return FallbackResult(str(qx_series.name), "NONE", [], "", qx_series.tolist())

    resolved = qx_series.interpolate(method="linear").bfill().ffill()
    # Cap at 1.5x the benchmark maximum, but never above 1.0: qx is a probability.
    resolved = resolved.clip(lower=0.0, upper=min(1.0, float(benchmark.max()) * 1.5))

    audit_id = f"FALLBACK_{pd.Timestamp.now(tz='UTC').strftime('%Y%m%d_%H%M%S')}"
    logger.warning("Missing qx for %s; linear interpolation applied. Audit: %s",
                   qx_series.name, audit_id)
    return FallbackResult(
        table_id=str(qx_series.name),
        applied_fallback="INTERPOLATION_WITH_BOUNDARY_CAP",
        affected_ages=missing[missing].index.tolist(),
        audit_trail_id=audit_id,
        resolved_qx=resolved.tolist(),
    )

Zero-rate and near-zero benchmark ages. Even with the 1e-10 guard, a benchmark that is legitimately near zero at young ages can produce an enormous pct_deviation from a tiny absolute gap and trip a false FAIL. Harden the check by flooring the comparison to ages where the benchmark exceeds a materiality threshold (for example, exclude ages where benchmark_qx < 1e-6) and validating those young ages against an absolute band instead.
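One way to sketch that split is shown below: ages whose benchmark is at or above a materiality floor get the proportional test, and the remaining ages get an absolute test. The function name, the 1e-6 floor, and the 1e-6 absolute band are assumptions chosen for illustration, not calibrated values.

```python
import numpy as np


def validate_with_materiality_floor(
    company_qx: np.ndarray,
    benchmark_qx: np.ndarray,
    tolerance_pct: float = 0.05,
    materiality_floor: float = 1e-6,  # assumed cut-off, echoing the example above
    absolute_band: float = 1e-6,      # assumed absolute tolerance below the floor
) -> dict:
    """Proportional check at or above the floor, absolute check below it."""
    company_qx = np.asarray(company_qx, dtype=float)
    benchmark_qx = np.asarray(benchmark_qx, dtype=float)
    if company_qx.shape != benchmark_qx.shape:
        raise ValueError("company and benchmark vectors must be aligned")

    material = benchmark_qx >= materiality_floor

    # Dummy denominator of 1.0 where the benchmark is immaterial, so the division
    # is always defined; those positions are judged by absolute_ok instead.
    safe_benchmark = np.where(material, benchmark_qx, 1.0)
    pct_deviation = company_qx / safe_benchmark - 1.0
    proportional_ok = np.abs(pct_deviation) <= tolerance_pct
    absolute_ok = np.abs(company_qx - benchmark_qx) <= absolute_band

    within_band = np.where(material, proportional_ok, absolute_ok)
    return {
        "out_of_tolerance_indices": np.where(~within_band)[0].tolist(),
        "absolute_checked_indices": np.where(~material)[0].tolist(),
        "overall_pass": bool(within_band.all()),
    }


# Illustrative rates only.
benchmark = np.array([0.0, 5e-7, 0.001, 0.05])
company = np.array([2e-7, 9e-7, 0.00104, 0.06])
result = validate_with_materiality_floor(company, benchmark)
```

In this example the second age is 80% above its benchmark in proportional terms but only 4e-7 away in absolute terms, so it passes; the last age is 20% above a material benchmark and fails.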

Improvement-scale direction and vintage drift. The most dangerous silent error is an improvement scale applied with the wrong sign, which turns deterioration into improvement. Guard it by asserting that a projected qx after applying the scale is strictly below the base qx for a mortality improvement, and route any table whose successive vintage diverges materially from its validated baseline to recalibration rather than rolling it forward. That vintage-over-vintage comparison is where the Population Stability Index belongs. Where these validations run over portfolios of millions of policies, execute them as vectorized batches using the patterns in Pandas & NumPy for Actuarial Data Pipelines.
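Both guards can be sketched in a few lines. The projection convention used here, q_x(t) = q_x(0) * (1 - MI_x) ** t with one flat improvement rate per age, is a simplifying assumption for illustration; published scales vary by calendar year. Applying the Population Stability Index to shares per age band (for example, expected deaths by band) is likewise an assumption of this sketch, and all function names are hypothetical.

```python
import numpy as np


def project_qx(base_qx, improvement_rates, years):
    """Assumed convention: q_x(t) = q_x(0) * (1 - MI_x) ** t, one flat rate per age."""
    base = np.asarray(base_qx, dtype=float)
    rates = np.asarray(improvement_rates, dtype=float)
    return base * (1.0 - rates) ** years


def assert_improvement_direction(base_qx, projected_qx, improvement_rates):
    """Raise if any age with a positive improvement rate fails to project lower."""
    base = np.asarray(base_qx, dtype=float)
    projected = np.asarray(projected_qx, dtype=float)
    improving = np.asarray(improvement_rates, dtype=float) > 0.0
    wrong = improving & (projected >= base)
    if wrong.any():
        raise ValueError(
            f"improvement applied in the wrong direction at indices {np.where(wrong)[0].tolist()}"
        )


def population_stability_index(baseline, current, eps=1e-12):
    """PSI = sum((c - b) * ln(c / b)) over shares b, c that each sum to 1."""
    b = np.asarray(baseline, dtype=float)
    c = np.asarray(current, dtype=float)
    b = np.clip(b / b.sum(), eps, None)  # eps avoids log(0) for empty bands
    c = np.clip(c / c.sum(), eps, None)
    return float(np.sum((c - b) * np.log(c / b)))


# Illustrative values only.
base = [0.010, 0.020, 0.040]
mi = [0.010, 0.010, 0.005]

good = project_qx(base, mi, years=5)
assert_improvement_direction(base, good, mi)  # passes silently

inverted = np.asarray(base) * (1.0 + np.asarray(mi)) ** 5  # sign error
direction_error_caught = False
try:
    assert_improvement_direction(base, inverted, mi)
except ValueError:
    direction_error_caught = True

psi_same = population_stability_index([100, 200, 700], [100, 200, 700])
psi_shifted = population_stability_index([100, 200, 700], [150, 250, 600])
```

The PSI value at which a vintage is routed to recalibration is a policy decision and belongs in configuration alongside the tolerance bands.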

Compliance Note

The drift check earns its place in a filing only when its output is tamper-evident. Serialize each validation run to a canonical JSON payload, bind it to a SHA-256 checksum, and map every rule to the specific clause it satisfies, so that a reviewer can recompute the hash from the archived result and confirm that the number in the actuarial memorandum came from that exact run. The function below hashes the drift result only; tying the record to the input table as well requires hashing the table's own canonical serialization alongside it.

import hashlib
import json
import pandas as pd


def build_audit_package(drift_result: dict, table_id: str, config_version: str) -> dict:
    payload = json.dumps(drift_result, sort_keys=True, default=str)
    return {
        "table_id": table_id,
        "run_timestamp": pd.Timestamp.now(tz="UTC").isoformat(),  # when the check ran, not the valuation date
        "config_version": config_version,
        "regulatory_mappings": ["VM-20 Section 9", "ASOP 52", "IFRS 17 B119"],
        "payload_checksum": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
        "overall_pass": drift_result["overall_pass"],
    }
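The reviewer's side of this can be sketched with the standard library alone. The snippet assumes the same canonical form as build_audit_package (json.dumps with sort_keys=True and default=str); the helper names and the input_table_checksum field are hypothetical additions for illustration.

```python
import hashlib
import json


def canonical_checksum(obj) -> str:
    """SHA-256 over the same canonical JSON form used for the audit package."""
    payload = json.dumps(obj, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def verify_audit_package(package: dict, archived_result: dict) -> bool:
    """True when the archived drift result reproduces the recorded checksum."""
    return canonical_checksum(archived_result) == package["payload_checksum"]


# Illustrative values only.
drift_result = {
    "max_positive_drift": 0.04,
    "max_negative_drift": -0.01,
    "out_of_tolerance_ages": [],
    "overall_pass": True,
}
table = {"table_id": "EXAMPLE-TABLE", "ages": [65, 66], "qx_rates": [0.012, 0.013]}

package = {
    "payload_checksum": canonical_checksum(drift_result),
    "input_table_checksum": canonical_checksum(table),  # hypothetical extra field
}

# Any edit to the archived result changes the hash and is detected.
tampered = dict(drift_result, max_positive_drift=0.03)
intact_ok = verify_audit_package(package, drift_result)
tampered_ok = verify_audit_package(package, tampered)
```

Sorting keys before hashing is what makes the checksum independent of dictionary ordering, so two systems that serialize the same result reach the same digest.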

Under NAIC VM-20 Section 9, the prudent-estimate mortality assumption must be built from a company experience study, graded to an industry table through a documented credibility procedure, and reproducible on demand during examination; ASOP 52 requires that the blending method itself be defensible, and IFRS 17 requires that morbidity and mortality rates in the fulfilment cash flows be traceable to their source. A checksum-bound drift report against a named published basis is precisely the artifact those standards expect — it turns “we validated the table” into a self-verifying record an examiner can reproduce. The filing-package structure that consumes these records is detailed in NAIC VM-20 Compliance Frameworks, and the hash-chained, append-only storage that makes them tamper-evident in Actuarial Audit Trail Architecture.

Related

Schema Validation with Pydantic & Great Expectations — the ingestion-boundary contract this technique builds on.
Dynamic Threshold Tuning for Assumption Drift — config-driven tolerance bands and vintage-over-vintage drift detection.
NAIC VM-20 Compliance Frameworks — the actuarial-memorandum structure that consumes these validation records.
Actuarial Audit Trail Architecture — hash-chained, WORM-backed storage for the audit package.
Pandas & NumPy for Actuarial Data Pipelines — vectorized execution for portfolio-scale validation.

Up a level: Mortality & Morbidity Rate Validation — the rate-validation methodology this technique implements — part of Assumption Validation & Rule Engine Design.
