Enforcement¶
dfguard enforces schema annotations at the function call site. There are two ways to enable this. Pick one; you do not need both.
arm() / disarm()¶
arm() is the preferred approach for packages. Call it once from your entry
point and every annotated function in the entire package is enforced
automatically. No decorator on each function.
disarm() silences all enforcement globally. Useful in tests where you want
to exercise transform logic without schema-valid fixtures.
# my_pipeline/__init__.py
import dfguard.pyspark as dfg
dfg.arm() # subset=True: extra columns fine
# dfg.arm(subset=False) # strict: exact match everywhere
- dfguard.pyspark._enforcement.arm(module=None, *, package=None, subset=True)[source]¶
Arm the entire calling package and set the global subset default.
Call once from your entry point, __init__.py, or settings.py (Kedro):

import dfguard.pyspark as dfg
dfg.arm()              # subset=True (default): extra columns are fine
dfg.arm(subset=False)  # exact match: no extra columns allowed anywhere

The subset value becomes the global default. Individual functions decorated with @dfg.enforce(subset=...) override it for that function only.

If called when already armed, re-enables enforcement (sets _ENABLED = True) without re-walking the package.

Specific module object:
dfg.arm(my_module)
Explicit package name:
dfg.arm(package="my_pipeline.nodes")
# my_pipeline/__init__.py
import dfguard.pandas as dfg
dfg.arm()
- dfguard.pandas._enforcement.arm(module=None, *, package=None, subset=True)¶
# my_pipeline/__init__.py
import dfguard.polars as dfg
dfg.arm()
- dfguard.polars._enforcement.arm(module=None, *, package=None, subset=True)¶
arm() has no effect and emits a warning when called from __main__
(a file run directly as a script). Use @enforce there instead.
@enforce¶
A per-function decorator for scripts and notebooks. Only checks parameters annotated with a schema type; all other arguments pass through untouched.
@dfg.enforce
def enrich(df: OrderSchema, label: str, limit: int = 10):
# only df is checked; label and limit are not touched
return df.withColumn("revenue", F.col("amount") * F.col("quantity"))
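The parameter-selection behavior described above can be sketched in plain Python: the decorator inspects each parameter's annotation and validates only those carrying a schema. This is a hypothetical mock, not dfguard's code; the `__columns__` attribute, the dict-as-DataFrame stand-in, and the error message are all assumptions for illustration.

```python
import inspect

# Hypothetical schema type: a class exposing its declared columns.
class OrderSchema:
    __columns__ = {"amount", "quantity"}


def enforce(func):
    """Toy @enforce: validate only schema-annotated parameters."""
    sig = inspect.signature(func)

    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            ann = sig.parameters[name].annotation
            cols = getattr(ann, "__columns__", None)
            if cols is not None:  # schema-annotated parameter
                missing = cols - set(value.keys())
                if missing:
                    raise TypeError(f"{name} missing columns: {sorted(missing)}")
        return func(*args, **kwargs)
    return wrapper


@enforce
def enrich(df: OrderSchema, label: str, limit: int = 10):
    # only df is checked; label and limit pass through untouched
    return {**df, "revenue": df["amount"] * df["quantity"]}


# a dict stands in for a DataFrame in this sketch
print(enrich({"amount": 5, "quantity": 3}, "x"))
```

Note that `label: str` and `limit: int` have annotations too, but because neither carries schema information they are never inspected, matching the "all other arguments pass through untouched" guarantee.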
- dfguard.pyspark._enforcement.enforce(func=None, *, subset=<object object>)[source]¶
- Overloads:
func (F) → F
func (None), subset (bool) → Callable[[F], F]
- Parameters:
func (F | None)
subset (Any)
- Return type:
F | Callable[[F], F]
Validate schema annotations on DataFrame arguments.
Only intercepts parameters annotated with a dfg.schema_of type or a dfg.SparkSchema subclass. All other arguments are left completely alone.

Default: inherits the global subset set by dfg.arm():

@dfg.enforce
def process(df: OrderSchema, label: str): ...
subset=True: extra columns in the DataFrame are fine (overrides global):

@dfg.enforce(subset=True)
def process(df: OrderSchema): ...

subset=False: DataFrame must match the schema exactly (overrides global):

@dfg.enforce(subset=False)
def process(df: OrderSchema): ...
@dfg.enforce
def enrich(df: OrderSchema, label: str):
return df.assign(revenue=df["amount"] * df["quantity"])
@dfg.enforce
def enrich(df: OrderSchema, label: str):
return df.with_columns(revenue=pl.col("amount") * pl.col("quantity"))
The subset flag¶
Both arm() and @enforce accept a subset parameter.
- subset=True (default): declared columns must be present with correct types; extra columns in the DataFrame are fine.
- subset=False: exact match required; extra columns are an error.
arm(subset=False) sets the global default. @enforce(subset=True) overrides
it for that function only. Function level always wins.
schema_of(df) types always use exact matching, regardless of subset.
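The two subset modes reduce to a simple set comparison over column names. The sketch below is an assumption about the shape of the check, not dfguard's implementation: it models a schema as a plain dict of column name to dtype string and omits the dtype comparison that the real library performs.

```python
# Sketch of the subset check; a schema is modeled as {column: dtype}.
# Dtype validation (which dfguard also does) is omitted for brevity.
def check(schema, df_columns, subset=True):
    declared = set(schema)
    actual = set(df_columns)

    missing = declared - actual
    if missing:  # declared columns must always be present
        raise TypeError(f"missing columns: {sorted(missing)}")

    extra = actual - declared
    if extra and not subset:  # exact-match mode rejects extras
        raise TypeError(f"unexpected extra columns: {sorted(extra)}")


schema = {"amount": "double", "quantity": "int"}

check(schema, ["amount", "quantity", "note"], subset=True)  # extra column ok
try:
    check(schema, ["amount", "quantity", "note"], subset=False)
except TypeError as e:
    print(e)  # unexpected extra columns: ['note']
```

Missing columns fail in both modes; only the handling of extra columns differs, which is why subset=False is the stricter setting.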