Schema Validation with Pydantic & Great Expectations

Compliance teams and Python developers can no longer rely on manual spreadsheet reconciliations or ad-hoc data checks when preparing statutory filings under NAIC VM-20, IFRS 17, or Solvency II. A statutory reserve is only as defensible as the data that fed it, and examiners increasingly ask carriers to prove that every in-force extract passed a documented, reproducible validation gate before it reached a projection engine. This guide shows how to build that gate by pairing Pydantic for deterministic structural validation with Great Expectations for probabilistic data-quality monitoring — a two-layer contract that rejects malformed records outright and flags statistically drifted ones before they corrupt a reserve calculation. It is the ingestion boundary for the broader Actuarial Model Ingestion & Testing Workflows reference architecture, and the enforcement point that keeps everything downstream deterministic.

Pydantic is the hard gate that rejects malformed rows; Great Expectations is the soft gate that flags distributional drift before release.

The Validation Problem This Solves

Actuarial ingestion failures are rarely loud. A premium column silently coerced from string to float, a mortality improvement scale applied a year out of date, or a lapse-rate extract whose distribution shifted after a source-system migration will all pass a naive load and only surface as an unexplained movement in the reserve roll-forward, often after the filing has gone out. VM-20 expects principle-based reserves to rest on assumptions and data whose provenance can be demonstrated, and the Federal Reserve’s SR 11-7 sets the expectation that model inputs are validated with the same rigour as model code. Neither standard is satisfied by “the load ran without an exception.”

The failure modes split cleanly into two categories, and each demands a different tool:

Structural defects — wrong type, out-of-range value, missing mandatory field, broken cross-field invariant (a claim date after a policy termination date). These are deterministic: a record is either well-formed or it is not. Pydantic schema enforcement handles this class, rejecting the offending row at the boundary with a precise, machine-readable error trace.
Distributional defects — the record is individually well-formed, but the population has drifted. Mean face amount doubled, the lapse-rate histogram flattened, a yield-curve tenor developed a fat tail. No single row is “wrong,” yet the batch is unfit for projection. Great Expectations captures this class as statistical expectations asserted against a validated historical baseline.

Running only the first layer lets drifted-but-valid data through; running only the second wastes compute profiling records that should never have been admitted. The canonical pattern is to chain them: Pydantic first as a hard gate, Great Expectations second as a soft, alerting gate.

Architecture of the Two-Layer Gate

The subsystem sits directly on the ingestion edge. Raw extracts — CSV drops from a policy admin system, Parquet partitions from a data lake, rows from a valuation database — arrive untrusted. The Pydantic layer parses each record into a typed model, routing failures to a quarantine log rather than raising into the caller. Survivors are assembled into a validated frame that Great Expectations then profiles against a saved expectation suite. Only a batch that clears both layers is stamped filing-ready and released to the projection stage.

A single batch in time: per-row structural validation splits pass and quarantine streams before the validated frame reaches the statistical checkpoint and its release decision.

This ordering matters. Great Expectations profiles a whole column or table at once; if half the rows carry a coerced-null face amount, the distributional statistics are already poisoned before the suite runs. By making Pydantic the first gate, every value the statistical layer sees is guaranteed to be of the right type and within absolute bounds, so any expectation breach is a genuine distributional signal rather than an artefact of dirty input.
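
To make the ordering argument concrete, here is a minimal sketch using only the standard library and made-up face amounts. It assumes a hypothetical lenient loader that turned four missing face amounts into 0.0; the numbers and variable names are illustrative and do not come from any real extract.

```python
from statistics import mean, median

# Illustrative face amounts only; not real portfolio data.
clean = [100_000.0, 150_000.0, 200_000.0, 250_000.0]

# Assumption: a lenient loader turned four missing face amounts into 0.0.
coerced_nulls = [0.0, 0.0, 0.0, 0.0]
dirty = clean + coerced_nulls

# Profiling the dirty column halves the mean and collapses the median,
# which looks like drift even though the portfolio has not changed.
dirty_mean, dirty_median = mean(dirty), median(dirty)

# A structural bound such as face_amount > 0 removes the artefacts first,
# so the statistics describe the real population again.
gated = [v for v in dirty if v > 0]
gated_mean, gated_median = mean(gated), median(gated)

print(f"dirty: mean={dirty_mean:,.0f} median={dirty_median:,.0f}")
print(f"gated: mean={gated_mean:,.0f} median={gated_median:,.0f}")
```

A mean-range expectation tuned on the clean population would fail on the dirty column for reasons unrelated to the business, which is the false signal the hard gate is there to prevent.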

Prerequisites

Before implementing the gate, establish the following:

Python packages: pydantic>=2.5 (the v2 core is materially faster and introduces model_validate), great-expectations>=0.18,<1.0 (the context.sources and Validator calls used in this guide were replaced in the 1.0 release), numpy for the PSI calculation, pytest for the tests, plus pandas and pyarrow for frame assembly and Parquet reads. The vectorized frame handling here builds on the patterns in Pandas & NumPy for Actuarial Data Pipelines; read that first if your extracts are large enough that per-row Python loops become the bottleneck.
A data contract. You cannot validate against a schema you have not written down. Enumerate every field the projection engine consumes — policy_id, valuation_date, face_amount, mortality_table, lapse_rate, issue_age, premium_mode — with its type, unit, permissible range, and nullability. The dedicated build guide for this contract, Validating Actuarial Input Schemas with Pydantic, walks through mapping an actuarial data dictionary to executable model fields and cross-field validators.
A baseline population. Great Expectations needs a trusted reference batch — typically the prior valuation’s post-validation extract — to profile expectations from. Store it immutably so the baseline itself is auditable.
Regulatory context. Know which clause each check answers to. Structural bounds on mortality rates trace to the prescribed-table requirements the NAIC VM-20 Compliance Frameworks guide covers; drift thresholds connect to the assumption-governance discipline enforced by the Assumption Validation & Rule Engine Design subsystem.
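
One way to write the data contract down before turning it into a model is a small, reviewable table in code. The sketch below is illustrative only: FieldSpec, CONTRACT and missing_from_contract are hypothetical names, the units and nullability flags are assumptions, and the numeric bounds are approximate echoes of the ones the structural gate in this guide applies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldSpec:
    """One row of the data contract (hypothetical structure)."""
    name: str
    py_type: str
    unit: str
    minimum: Optional[float]
    maximum: Optional[float]
    nullable: bool


# Units and nullability are assumptions made for this sketch.
CONTRACT = [
    FieldSpec("policy_id", "str", "n/a", None, None, False),
    FieldSpec("valuation_date", "date", "calendar date", None, None, False),
    FieldSpec("face_amount", "Decimal", "currency units", 0, 100_000_000, False),
    FieldSpec("mortality_table", "str", "table code", None, None, False),
    FieldSpec("lapse_rate", "float", "proportion", 0.0, 1.0, False),
    FieldSpec("issue_age", "int", "years", 0, 120, False),
    FieldSpec("premium_mode", "enum", "n/a", None, None, False),
]


def missing_from_contract(engine_inputs: set[str]) -> set[str]:
    """Return engine inputs that the contract does not yet describe."""
    return engine_inputs - {spec.name for spec in CONTRACT}


# An input the engine reads but the contract omits shows up immediately.
gaps = missing_from_contract({"policy_id", "lapse_rate", "issue_date"})
```

Here gaps contains issue_date, a reminder that a field the model relies on must be added to the written contract before it is validated.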

Core Implementation: The Structural Gate

The Pydantic layer models one input record. Strict types stop silent coercion, Field constraints enforce absolute bounds, and @field_validator / @model_validator hooks encode the actuarial invariants that a plain type system cannot express.

from datetime import date
from decimal import Decimal
from enum import Enum
from pydantic import BaseModel, Field, field_validator, model_validator, ValidationError


class PremiumMode(str, Enum):
    annual = "annual"
    semiannual = "semiannual"
    quarterly = "quarterly"
    monthly = "monthly"


class PolicyRecord(BaseModel):
    model_config = {"strict": True, "extra": "forbid"}

    policy_id: str = Field(min_length=1, max_length=32)
    valuation_date: date
    issue_date: date
    issue_age: int = Field(ge=0, le=120)
    face_amount: Decimal = Field(gt=0, le=Decimal("100_000_000"))
    mortality_table: str
    lapse_rate: float = Field(ge=0.0, le=1.0)
    premium_mode: PremiumMode

    @field_validator("mortality_table")
    @classmethod
    def known_table(cls, v: str) -> str:
        approved = {"2017_CSO_ANB", "2017_CSO_ALB", "VBT_2015"}
        if v not in approved:
            raise ValueError(f"mortality_table {v!r} not in approved set")
        return v

    @model_validator(mode="after")
    def issue_precedes_valuation(self) -> "PolicyRecord":
        if self.issue_date > self.valuation_date:
            raise ValueError("issue_date must not follow valuation_date")
        return self

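A factory function turns an untrusted extract into a clean frame and a quarantine ledger, never letting a single bad row abort the batch. Because the model is strict, each row must already carry native Python values (date, Decimal, PremiumMode); fields that arrive as strings, as they do from a raw CSV read, are quarantined unless strictness is relaxed for that field with Field(strict=False):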

import pandas as pd


def validate_extract(rows: list[dict]) -> tuple[pd.DataFrame, list[dict]]:
    accepted, quarantined = [], []
    for raw in rows:
        try:
            record = PolicyRecord.model_validate(raw)
            accepted.append(record.model_dump())
        except ValidationError as exc:
            quarantined.append({
                "policy_id": raw.get("policy_id", "UNKNOWN"),
                "errors": exc.errors(include_url=False),
            })
    return pd.DataFrame(accepted), quarantined

Each quarantined entry preserves the exact field, the failing value, and the rule that rejected it — the machine-readable trace an examiner or a model-risk reviewer expects when asking why a record was excluded from a valuation.
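
A quarantine ledger is easier to review when it is rolled up by field and rule. The following is a minimal sketch, not part of the original pipeline: summarise_quarantine is a hypothetical helper, and the ledger entries are hand-written in the shape that ValidationError.errors() returns (type, loc, msg, input), so no Pydantic import is needed to run it.

```python
from collections import Counter


def summarise_quarantine(quarantined: list[dict]) -> Counter:
    """Count rejections by (field path, error type) across the ledger."""
    counts: Counter = Counter()
    for entry in quarantined:
        for err in entry["errors"]:
            # loc is a tuple path; model-level validators report an empty loc.
            field = ".".join(str(part) for part in err["loc"]) or "<record>"
            counts[(field, err["type"])] += 1
    return counts


# Hand-written entries for illustration only.
ledger = [
    {"policy_id": "P-001", "errors": [
        {"type": "less_than_equal", "loc": ("lapse_rate",),
         "msg": "Input should be less than or equal to 1", "input": 1.4},
    ]},
    {"policy_id": "P-002", "errors": [
        {"type": "less_than_equal", "loc": ("lapse_rate",),
         "msg": "Input should be less than or equal to 1", "input": 2.0},
        {"type": "value_error", "loc": (),
         "msg": "Value error, issue_date must not follow valuation_date",
         "input": {}},
    ]},
]

summary = summarise_quarantine(ledger)
for (field, rule), n in summary.most_common():
    print(f"{field:12s} {rule:18s} {n}")
```

A spike in one (field, rule) pair usually points at a single upstream cause, such as a lapse rate exported as a percentage rather than a proportion.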

Core Implementation: The Statistical Gate

With a structurally clean frame in hand, Great Expectations asserts that its distribution still matches the approved baseline. Expectations are declarative, versioned, and stored as JSON, so the suite itself becomes a reviewable artefact.

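import great_expectations as gx
import pandas as pd

SUITE_NAME = "policy_inforce_suite"


def build_suite(context, validated_df: pd.DataFrame):
    # Register (or refresh) the named suite so it exists in the context store.
    context.add_or_update_expectation_suite(SUITE_NAME)

    validator = context.sources.pandas_default.read_dataframe(
        validated_df, asset_name="inforce_batch"
    )
    # read_dataframe attaches its own default suite; rename it so the
    # expectations below are saved under SUITE_NAME, not a stray suite.
    validator.expectation_suite_name = SUITE_NAME

    validator.expect_column_values_to_be_unique("policy_id")
    validator.expect_column_values_to_not_be_null("face_amount")
    validator.expect_column_mean_to_be_between(
        "lapse_rate", min_value=0.02, max_value=0.18
    )
    validator.expect_column_quantile_values_to_be_between(
        "face_amount",
        quantile_ranges={
            "quantiles": [0.5, 0.95],
            "value_ranges": [[75_000, 250_000], [500_000, 2_000_000]],
        },
    )
    validator.save_expectation_suite(discard_failed_expectations=False)
    # Return the populated suite rather than the empty one created above.
    return validator.get_expectation_suite(discard_failed_expectations=False)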

Where a simple range check is too blunt, quantify drift explicitly with the Population Stability Index. Binning the incoming column against the baseline and summing the weighted log-ratio gives a single scalar that regulators and model-risk teams recognise:

$$\mathrm{PSI} = \sum_{i=1}^{B}\left(A_i - E_i\right)\,\ln\!\frac{A_i}{E_i}$$

where $A_i$ is the actual proportion of records falling in bin $i$ and $E_i$ is the expected (baseline) proportion across $B$ bins.

import numpy as np


def population_stability_index(expected: np.ndarray,
                               actual: np.ndarray,
                               bins: int = 10) -> float:
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    e_prop = np.histogram(expected, edges)[0] / len(expected)
    a_prop = np.histogram(actual, edges)[0] / len(actual)
    eps = 1e-6  # guard the log against empty bins
    e_prop, a_prop = e_prop + eps, a_prop + eps
    return float(np.sum((a_prop - e_prop) * np.log(a_prop / e_prop)))

The same PSI machinery drives the assumption-drift monitoring in Dynamic Threshold Tuning for Assumption Drift; using one implementation across ingestion and assumption governance keeps a single, defensible definition of “drift” across the pipeline.
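
To see how the individual terms of the PSI sum behave, it can help to work one small case by hand. The sketch below uses four bins with made-up proportions (an assumption for illustration, not baseline data) and only the standard library.

```python
import math

# Illustrative bin proportions: a uniform baseline and a shifted batch.
expected = [0.25, 0.25, 0.25, 0.25]
actual = [0.10, 0.20, 0.30, 0.40]

# Each term is (A_i - E_i) * ln(A_i / E_i). The two factors always share
# a sign, so every bin contributes a non-negative amount.
terms = [(a - e) * math.log(a / e) for a, e in zip(actual, expected)]
psi = sum(terms)

for i, term in enumerate(terms, start=1):
    print(f"bin {i}: {term:.4f}")
print(f"PSI = {psi:.4f}")
```

The total comes to roughly 0.228, with the first bin (0.25 down to 0.10) contributing more than half of it. Because no term can be negative, shifts in opposite directions never cancel each other out.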

Configuration and Tuning

The gate is only as good as its thresholds, and thresholds are judgement calls that must be documented, not buried in code. Externalise them so a change is a reviewable diff rather than a silent edit:

# validation_config.yaml, loaded at startup
psi_thresholds:
  green: 0.10      # no material shift — release
  amber: 0.25      # investigate; release under review flag
  # >= amber is red — hold the batch and escalate
lapse_rate_mean_bounds: [0.02, 0.18]
face_amount_p95_bounds: [500000, 2000000]
quarantine_reject_ratio: 0.05   # abort batch if >5% of rows fail structural gate

The PSI bands follow the convention widely used in model-risk practice: below 0.10 the population is stable, 0.10–0.25 warrants investigation, and above 0.25 signals a material shift that should hold the batch. Do not hard-code these against a single line of business — a term portfolio and a deferred-annuity block drift at different rates, so keep a threshold set per product cohort. The quarantine_reject_ratio is a circuit breaker: if more than a configured fraction of rows fail the structural gate, the extract itself is suspect (a bad source export, a schema migration upstream) and the batch should abort loudly rather than silently proceed on a partial population.
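
The banding and the per-cohort advice can be sketched as a small classifier. This is an illustration under stated assumptions: PsiBands, BANDS_BY_COHORT and classify_psi are hypothetical names, the term_life override values are invented, and a PSI exactly equal to the amber bound is treated as red, following the comment in the config above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PsiBands:
    green: float = 0.10   # below this: stable, release
    amber: float = 0.25   # below this: release under review; at or above: red


# Hypothetical cohort names and override values, for illustration only.
BANDS_BY_COHORT = {
    "default": PsiBands(),
    "term_life": PsiBands(green=0.08, amber=0.20),
}


def classify_psi(psi: float, cohort: str = "default") -> str:
    """Map a PSI value to green / amber / red for a product cohort."""
    bands = BANDS_BY_COHORT.get(cohort, BANDS_BY_COHORT["default"])
    if psi < bands.green:
        return "green"
    if psi < bands.amber:
        return "amber"
    return "red"


# The same drift can be tolerable for one cohort and a hold for another.
print(classify_psi(0.22))               # default bands
print(classify_psi(0.22, "term_life"))  # tighter bands
```

Keeping the bands in a frozen structure loaded from the versioned config means a threshold change shows up as a reviewable diff, which is the point of externalising them.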

Step-by-Step Walkthrough

Write the data contract as a PolicyRecord model, one field per column the projection engine consumes, with strict types and Field bounds derived from the actuarial data dictionary.
Encode the invariants the type system cannot — known_table, issue_precedes_valuation — as @field_validator and @model_validator hooks so cross-field logic lives beside the schema.
Run the structural gate with validate_extract, accumulating clean rows into a frame and rejects into a quarantine ledger keyed by policy_id and the failing rule.
Check the circuit breaker. If the quarantine ratio exceeds quarantine_reject_ratio, abort and escalate — do not proceed on a partial batch.
Assemble and profile. Feed the validated frame to build_suite and compute population_stability_index for every monitored numeric column against the stored baseline.
Classify against thresholds. Map each PSI to green/amber/red; release, release-under-review, or hold accordingly.
Persist the audit artefact. Hash the validated frame, the expectation results, and the config version into an immutable record before releasing downstream.
Release to projection. Only a green-or-amber batch with a written audit record is handed to the scenario and projection stages.
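
The control flow of the walkthrough above can be sketched as a single function with the two gates injected as callables. This is a minimal outline rather than a definitive implementation: run_gate, its return layout, and the decision to abort on an empty extract are assumptions, and the toy gates in the usage lines stand in for validate_extract and the PSI classification.

```python
from typing import Callable, Optional


def run_gate(
    rows: list[dict],
    structural_gate: Callable[[list[dict]], tuple[list[dict], list[dict]]],
    statistical_status: Callable[[list[dict]], str],
    quarantine_reject_ratio: float = 0.05,
) -> dict:
    """Chain the hard gate, the circuit breaker and the soft gate."""
    status: Optional[str] = None
    if not rows:
        # Assumption: an empty extract is treated as a failed delivery.
        return {"decision": "abort", "reject_ratio": 0.0, "status": status}

    accepted, quarantined = structural_gate(rows)

    # Circuit breaker: too many structural rejects means the extract is suspect.
    reject_ratio = len(quarantined) / len(rows)
    if reject_ratio > quarantine_reject_ratio:
        return {"decision": "abort", "reject_ratio": reject_ratio, "status": status}

    # The soft gate runs only on rows that survived the hard gate.
    status = statistical_status(accepted)  # expected: green, amber or red
    decision = {"green": "release", "amber": "release_under_review"}.get(status, "hold")
    return {"decision": decision, "reject_ratio": reject_ratio, "status": status}


# Toy stand-in for the structural gate: reject non-positive face amounts.
def toy_structural(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    ok = [r for r in rows if r["face_amount"] > 0]
    bad = [r for r in rows if r["face_amount"] <= 0]
    return ok, bad


mostly_clean = [{"face_amount": 100_000}] * 24 + [{"face_amount": -1}]
broken_export = [{"face_amount": 100_000}] * 18 + [{"face_amount": -1}] * 2

released = run_gate(mostly_clean, toy_structural, lambda accepted: "green")
aborted = run_gate(broken_export, toy_structural, lambda accepted: "green")
```

Any status the function does not recognise falls through to hold, so a misconfigured statistical gate fails closed rather than releasing a batch.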

Validation and Testing

The gate decides what the model is allowed to consume, so under SR 11-7 it should be tested like model code. Three layers of assertion give confidence it behaves as specified:

Unit tests on the schema. Prove that each invariant rejects what it should and admits what it should. A red test suite here is the cheapest possible place to catch a loosened bound.

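from datetime import date
from decimal import Decimal

import pytest
from pydantic import ValidationError

# The model is strict, so fixtures must carry real date, Decimal and enum
# values. String inputs would be rejected for their type before the
# invariant under test ever ran, and the test would pass for the wrong reason.
VALID = {
    "policy_id": "P-001",
    "valuation_date": date(2026, 6, 30),
    "issue_date": date(2010, 1, 1),
    "issue_age": 45,
    "face_amount": Decimal("250000"),
    "mortality_table": "2017_CSO_ANB",
    "lapse_rate": 0.06,
    "premium_mode": PremiumMode.annual,
}


def test_valid_record_admitted():
    assert PolicyRecord.model_validate(VALID).policy_id == "P-001"


def test_future_issue_date_rejected():
    bad = {**VALID, "issue_date": date(2026, 12, 1)}   # after valuation
    with pytest.raises(ValidationError, match="issue_date must not follow"):
        PolicyRecord.model_validate(bad)


def test_unknown_mortality_table_rejected():
    bad = {**VALID, "mortality_table": "1980_CSO"}     # not approved
    with pytest.raises(ValidationError, match="not in approved set"):
        PolicyRecord.model_validate(bad)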

Great Expectations checkpoints. Run the saved suite as a checkpoint in CI against a golden batch so a change to the suite that would let drifted data through fails the build. The checkpoint’s JSON result — pass/fail per expectation — is the same artefact you archive for the filing.

Audit-log assertions. Assert that every released batch produced a hash-stamped record and that no batch classified red was ever released. This is the evidence the Actuarial Audit Trail Architecture subsystem consumes to build an examiner-ready lineage from raw extract to filed number.
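
The audit-log assertions can be sketched with the standard library alone. Everything here is a hypothetical layout rather than the audit subsystem's actual schema: audit_record and assert_audit_log_sound are illustrative names, and the record fields (batch_id, sha256, status, released) are assumptions.

```python
import hashlib
import json


def audit_record(batch_id: str, rows: list[dict], expectation_results: dict,
                 config_version: str, status: str, released: bool) -> dict:
    """Build a hash-stamped audit entry for one batch (hypothetical layout)."""
    payload = {
        "rows": rows,
        "expectation_results": expectation_results,
        "config_version": config_version,
    }
    # sort_keys gives a stable serialisation, so equal content hashes equally;
    # default=str covers dates and Decimals.
    canonical = json.dumps(payload, sort_keys=True, default=str).encode("utf-8")
    return {
        "batch_id": batch_id,
        "sha256": hashlib.sha256(canonical).hexdigest(),
        "status": status,
        "released": released,
    }


def assert_audit_log_sound(log: list[dict]) -> None:
    """Fail if a released batch lacks a hash or a red batch was released."""
    for entry in log:
        if entry["released"]:
            assert len(entry.get("sha256", "")) == 64, f"{entry['batch_id']}: no hash"
            assert entry["status"] != "red", f"{entry['batch_id']}: red batch released"


log = [
    audit_record("B1", [{"policy_id": "P-001"}], {"success": True}, "v1", "green", True),
    audit_record("B2", [{"policy_id": "P-002"}], {"success": False}, "v1", "red", False),
]
assert_audit_log_sound(log)
```

Note that the hash depends on row order, so the validated frame should be sorted on a stable key such as policy_id before it is hashed if reruns are expected to reproduce the same digest.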

Failure Modes and Gotchas

Silent coercion in Pydantic v1 habits. Without strict=True, Pydantic will happily turn the string "250000" into an int and "true" into a boolean. For actuarial data that masks exactly the source-system defects you are trying to catch. Always set strict mode on financial models and let coercion be an explicit, per-field decision.
PSI instability on sparse bins. An empty bin sends the log-ratio to infinity. The eps guard above is mandatory, and for small batches reduce the bin count — ten quantile bins on 200 records is noise, not signal.
Baseline staleness. A baseline profiled from data two valuations old will flag legitimate portfolio evolution as drift and drown reviewers in false alarms. Re-baseline on a documented cadence and record which baseline version each run was scored against.
Great Expectations profiling on unclean data. If the structural gate is bypassed “just this once,” coerced nulls poison every distributional statistic and the suite either false-passes or false-fails. Never let the statistical layer run on unvalidated input.
Blocking the batch pipeline on slow suites. Large expectation suites over multi-million-row frames can dominate runtime. When ingestion is throughput-bound, move suite execution off the critical path using the orchestration patterns in Async Batch Processing for Large Models, and profile before the extract reaches Stochastic Scenario Generation Frameworks so no Monte Carlo run ever consumes an unvalidated cohort.

Related Guides

Validating Actuarial Input Schemas with Pydantic: the deep-dive build guide for the structural contract.
Pandas & NumPy for Actuarial Data Pipelines: vectorized frame handling for the validated dataset.
Dynamic Threshold Tuning for Assumption Drift: PSI banding and adaptive thresholds for governed assumptions.
NAIC VM-20 Compliance Frameworks: the regulatory clauses these checks answer to.
