Enforcement
===========

dfguard enforces schema annotations at the function call site. There are two
ways to enable this. Pick one; you do not need both.

----

arm() / disarm()
----------------

``arm()`` is the preferred approach for packages. Call it once from your entry
point and every annotated function in the package is enforced automatically,
with no decorator needed on each function.

``disarm()`` turns off all enforcement globally. This is useful in tests where
you want to exercise transform logic without schema-valid fixtures.

.. tab-set::

   .. tab-item:: PySpark
      :sync: pyspark

      .. code-block:: python

         # my_pipeline/__init__.py
         import dfguard.pyspark as dfg
         dfg.arm()                 # default subset=True: extra columns are allowed
         # dfg.arm(subset=False)   # strict: exact match everywhere

      .. autofunction:: dfguard.pyspark._enforcement.arm

      .. autofunction:: dfguard.pyspark._enforcement.disarm

   .. tab-item:: pandas
      :sync: pandas

      .. code-block:: python

         # my_pipeline/__init__.py
         import dfguard.pandas as dfg
         dfg.arm()

      .. autofunction:: dfguard.pandas._enforcement.arm

      .. autofunction:: dfguard.pandas._enforcement.disarm

   .. tab-item:: Polars
      :sync: polars

      .. code-block:: python

         # my_pipeline/__init__.py
         import dfguard.polars as dfg
         dfg.arm()

      .. autofunction:: dfguard.polars._enforcement.arm

      .. autofunction:: dfguard.polars._enforcement.disarm

``arm()`` has no effect and emits a warning when called from ``__main__``
(a file run directly as a script). Use ``@enforce`` there instead.
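
The guard described above can be pictured with a small stand-alone sketch. It
is not dfguard's implementation: ``should_arm`` is a hypothetical helper, and
the only assumption is that the decision hinges on the calling module's name
being ``__main__``.

.. code-block:: python

   import warnings


   def should_arm(caller_module: str) -> bool:
       """Hypothetical helper: decide whether package-wide arming applies."""
       if caller_module == "__main__":
           # A file run directly as a script: warn and do nothing.
           warnings.warn(
               "arm() has no effect in __main__; use @enforce instead",
               stacklevel=2,
           )
           return False
       return True


   # An imported package module would be armed; a script would not.
   print(should_arm("my_pipeline"))

In a script or notebook, the practical consequence is the same as stated above:
decorate the functions you care about with ``@enforce``.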

----

@enforce
--------

``@enforce`` is a per-function decorator for scripts and notebooks. It checks
only parameters annotated with a schema type; all other arguments pass through
untouched.

.. tab-set::

   .. tab-item:: PySpark
      :sync: pyspark

      .. code-block:: python

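         import dfguard.pyspark as dfg
         from pyspark.sql import functions as F

         # OrderSchema is a schema type defined elsewhere
         @dfg.enforce
         def enrich(df: OrderSchema, label: str, limit: int = 10):
             # only df is checked; label and limit are not touched
             return df.withColumn("revenue", F.col("amount") * F.col("quantity"))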

      .. autofunction:: dfguard.pyspark._enforcement.enforce

   .. tab-item:: pandas
      :sync: pandas

      .. code-block:: python

         import dfguard.pandas as dfg

         # OrderSchema is a schema type defined elsewhere
         @dfg.enforce
         def enrich(df: OrderSchema, label: str):
             return df.assign(revenue=df["amount"] * df["quantity"])

      .. autofunction:: dfguard.pandas._enforcement.enforce

   .. tab-item:: Polars
      :sync: polars

      .. code-block:: python

         import dfguard.polars as dfg
         import polars as pl

         # OrderSchema is a schema type defined elsewhere
         @dfg.enforce
         def enrich(df: OrderSchema, label: str):
             return df.with_columns(revenue=pl.col("amount") * pl.col("quantity"))

      .. autofunction:: dfguard.polars._enforcement.enforce
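
To make the "only schema-annotated parameters are checked" rule concrete, here
is a library-free sketch of such a decorator. Everything in it is hypothetical
and for illustration only: ``ToySchema``, ``toy_enforce``, and the use of a
plain ``dict`` in place of a DataFrame are assumptions, not dfguard's actual
types or implementation.

.. code-block:: python

   import functools
   import inspect


   class ToySchema:
       """Hypothetical marker base class standing in for a schema type."""
       columns: dict = {}


   class OrderSchema(ToySchema):
       columns = {"amount": float, "quantity": int}


   def toy_enforce(func):
       """Check only arguments whose annotation is a ToySchema subclass."""
       sig = inspect.signature(func)

       @functools.wraps(func)
       def wrapper(*args, **kwargs):
           bound = sig.bind(*args, **kwargs)
           for name, value in bound.arguments.items():
               annotation = sig.parameters[name].annotation
               if isinstance(annotation, type) and issubclass(annotation, ToySchema):
                   # Here a "frame" is a dict mapping column name to values.
                   missing = set(annotation.columns) - set(value)
                   if missing:
                       raise TypeError(f"{name}: missing columns {sorted(missing)}")
           return func(*args, **kwargs)

       return wrapper


   @toy_enforce
   def enrich(df: OrderSchema, label: str, limit: int = 10):
       # label and limit are never inspected by the decorator
       return {**df, "label": label}


   # label is not a str here, yet the call succeeds: only df is checked.
   enrich({"amount": [1.0], "quantity": [2]}, label=123)

Calling ``enrich({"amount": [1.0]}, "x")`` in this sketch raises ``TypeError``
because the ``quantity`` column is missing from the first argument.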

----

The subset flag
---------------

Both ``arm()`` and ``@enforce`` accept a ``subset`` parameter.

- ``subset=True`` (default): declared columns must be present with correct types;
  extra columns in the DataFrame are fine.
- ``subset=False``: exact match required; extra columns are also an error.
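
The two modes above can be illustrated with a small library-free sketch.
``check_columns`` is a hypothetical helper that compares plain dictionaries of
column name to type name; it mirrors the rules as described here and is not
dfguard's own checking code.

.. code-block:: python

   def check_columns(declared: dict, actual: dict, subset: bool = True) -> list:
       """Return a list of problems; an empty list means the frame matches."""
       problems = []
       # Declared columns must always be present with the right type.
       for name, dtype in declared.items():
           if name not in actual:
               problems.append(f"missing column: {name}")
           elif actual[name] != dtype:
               problems.append(f"wrong type for {name}")
       # Extra columns only matter when an exact match is required.
       if not subset:
           for name in actual:
               if name not in declared:
                   problems.append(f"unexpected column: {name}")
       return problems


   declared = {"amount": "double", "quantity": "int"}
   actual = {"amount": "double", "quantity": "int", "note": "string"}

   print(check_columns(declared, actual, subset=True))   # []
   print(check_columns(declared, actual, subset=False))  # ['unexpected column: note']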

``arm(subset=False)`` sets the global default. ``@enforce(subset=True)`` overrides
it for that function only. The function-level setting always wins.
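
The precedence rule amounts to a simple fallback, sketched below.
``resolve_subset`` is a hypothetical helper, and using ``None`` to mean "not
set on the function" is an assumption made for this illustration.

.. code-block:: python

   from typing import Optional


   def resolve_subset(function_level: Optional[bool], global_default: bool) -> bool:
       """Use the function-level value when given, else the global default."""
       return global_default if function_level is None else function_level


   # Global default is strict (subset=False), one function opts back in.
   print(resolve_subset(True, global_default=False))   # True
   # No function-level value: the global default applies.
   print(resolve_subset(None, global_default=False))   # False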

``schema_of(df)`` types always use exact matching, regardless of ``subset``.