Supported Types¶
dfguard accepts whatever types your library accepts as annotations. There is no closed list compiled into the library; type dispatch is structural. This means any new type added in a future library release works automatically without a dfguard update. The scalar type tests in the test suite are generated at runtime from the installed library version to verify this claim.
PySpark¶
All annotations are instances, not classes (`T.LongType()`, not `T.LongType`). Complex types take arguments and enforce inner types recursively at the schema level.
Numeric

| Annotation | Spark SQL type | Notes |
|---|---|---|
| `T.ByteType()` | `ByteType` | 8-bit signed integer |
| `T.ShortType()` | `ShortType` | 16-bit signed integer |
| `T.IntegerType()` | `IntegerType` | 32-bit signed integer |
| `T.LongType()` | `LongType` | 64-bit signed integer |
| `T.FloatType()` | `FloatType` | 32-bit floating point |
| `T.DoubleType()` | `DoubleType` | 64-bit floating point |
| `T.DecimalType(10, 2)` | `DecimalType` | arbitrary precision |
String and binary

| Annotation | Spark SQL type | Notes |
|---|---|---|
| `T.StringType()` | `StringType` | |
| `T.CharType(10)` | `CharType(n)` | fixed-length (>= 3.3) |
| `T.VarcharType(10)` | `VarcharType(n)` | variable-length with max (>= 3.3) |
| `T.BinaryType()` | `BinaryType` | raw bytes |
Boolean and temporal

| Annotation | Spark SQL type | Notes |
|---|---|---|
| `T.BooleanType()` | `BooleanType` | |
| `T.DateType()` | `DateType` | |
| `T.TimestampType()` | `TimestampType` | with timezone (legacy Spark behaviour) |
| `T.TimestampNTZType()` | `TimestampNTZType` | no timezone (>= 3.4) |
| `T.DayTimeIntervalType()` | `DayTimeIntervalType` | day-time interval (>= 3.3) |
| `T.YearMonthIntervalType()` | `YearMonthIntervalType` | year-month interval (>= 3.3) |
Complex and nested

| Annotation | Spark SQL type | Notes |
|---|---|---|
| `T.ArrayType(T.LongType())` | `ArrayType` | inner type enforced |
| `T.MapType(T.StringType(), T.LongType())` | `MapType` | key and value types enforced |
| `T.StructType([T.StructField("a", T.LongType())])` | `StructType` | fully recursive |
| a nested schema class | `StructType` | converted to StructType automatically |
Nullability

| Annotation | Meaning |
|---|---|
| `T.LongType()` | declares `nullable=False` |
| `Optional[T.LongType()]` | declares `nullable=True` |
pandas¶
Note
Use pd.ArrowDtype for nested types. pd.ArrowDtype(pa.list_(pa.struct([...]))) gives pandas full inner-type enforcement at every depth, the same as PySpark and Polars. This is where PyArrow-backed pandas surpasses every other pandas dtype.
dfguard dispatches on the kind of annotation, not a hard-coded list. Any dtype that is an instance of np.dtype or pd.api.extensions.ExtensionDtype is accepted automatically, including third-party extension types.
NumPy dtypes

| Annotation | pandas dtype | Notes |
|---|---|---|
| `np.int8` | `int8` | |
| `np.int16` | `int16` | |
| `np.int32` | `int32` | |
| `np.int64` | `int64` | |
| `np.uint8` | `uint8` | |
| `np.uint16` | `uint16` | |
| `np.uint32` | `uint32` | |
| `np.uint64` | `uint64` | |
| `np.float16` | `float16` | |
| `np.float32` | `float32` | |
| `np.float64` | `float64` | |
| `np.bool_` | `bool` | |
| `np.datetime64` | `datetime64[ns]` | |
| `np.timedelta64` | `timedelta64[ns]` | |
| `np.object_` | `object` | Python objects (see note below) |
| `np.str_` | `object` | |
pandas nullable extension dtypes

| Annotation | pandas dtype | Notes |
|---|---|---|
| `pd.Int8Dtype()` | `Int8` | nullable |
| `pd.Int16Dtype()` | `Int16` | nullable |
| `pd.Int32Dtype()` | `Int32` | nullable |
| `pd.Int64Dtype()` | `Int64` | nullable |
| `pd.UInt8Dtype()` | `UInt8` | nullable |
| `pd.UInt16Dtype()` | `UInt16` | nullable |
| `pd.UInt32Dtype()` | `UInt32` | nullable |
| `pd.UInt64Dtype()` | `UInt64` | nullable |
| `pd.Float32Dtype()` | `Float32` | nullable |
| `pd.Float64Dtype()` | `Float64` | nullable |
| `pd.BooleanDtype()` | `boolean` | nullable |
| `pd.StringDtype()` | `string` | nullable |
| `pd.CategoricalDtype()` | `category` | |
| `pd.DatetimeTZDtype(tz="UTC")` | `datetime64[ns, UTC]` | timezone-aware |
| `pd.PeriodDtype("D")` | `period[D]` | |
| `pd.IntervalDtype("int64")` | `interval[int64]` | |
PyArrow-backed dtypes (pandas >= 1.5)
pd.ArrowDtype wraps any pyarrow.DataType and is a subclass of
pd.api.extensions.ExtensionDtype. dfguard accepts it through the same
structural path as every other extension dtype: no special handling, no
hard-coded list. Any pa.* type, including ones not listed here, works
automatically.
Integer and unsigned

| Annotation | pandas dtype | Notes |
|---|---|---|
| `pd.ArrowDtype(pa.int8())` | `int8[pyarrow]` | |
| `pd.ArrowDtype(pa.int16())` | `int16[pyarrow]` | |
| `pd.ArrowDtype(pa.int32())` | `int32[pyarrow]` | |
| `pd.ArrowDtype(pa.int64())` | `int64[pyarrow]` | |
| `pd.ArrowDtype(pa.uint8())` | `uint8[pyarrow]` | |
| `pd.ArrowDtype(pa.uint16())` | `uint16[pyarrow]` | |
| `pd.ArrowDtype(pa.uint32())` | `uint32[pyarrow]` | |
| `pd.ArrowDtype(pa.uint64())` | `uint64[pyarrow]` | |
Float and decimal

| Annotation | pandas dtype | Notes |
|---|---|---|
| `pd.ArrowDtype(pa.float16())` | `halffloat[pyarrow]` | |
| `pd.ArrowDtype(pa.float32())` | `float[pyarrow]` | |
| `pd.ArrowDtype(pa.float64())` | `double[pyarrow]` | |
| `pd.ArrowDtype(pa.decimal128(10, 2))` | `decimal128(10, 2)[pyarrow]` | arbitrary precision |
Boolean, string, and binary

| Annotation | pandas dtype | Notes |
|---|---|---|
| `pd.ArrowDtype(pa.bool_())` | `bool[pyarrow]` | |
| `pd.ArrowDtype(pa.string())` | `string[pyarrow]` | UTF-8 variable-length |
| `pd.ArrowDtype(pa.large_string())` | `large_string[pyarrow]` | 64-bit offsets |
| `pd.ArrowDtype(pa.binary())` | `binary[pyarrow]` | variable-length bytes |
| `pd.ArrowDtype(pa.large_binary())` | `large_binary[pyarrow]` | 64-bit offsets |
| `pd.ArrowDtype(pa.binary(16))` | `fixed_size_binary[16][pyarrow]` | fixed-width bytes (e.g. UUIDs) |
Temporal

| Annotation | pandas dtype | Notes |
|---|---|---|
| `pd.ArrowDtype(pa.date32())` | `date32[day][pyarrow]` | days since epoch |
| `pd.ArrowDtype(pa.date64())` | `date64[ms][pyarrow]` | milliseconds since epoch |
| `pd.ArrowDtype(pa.time32("s"))` | `time32[s][pyarrow]` | |
| `pd.ArrowDtype(pa.time64("us"))` | `time64[us][pyarrow]` | |
| `pd.ArrowDtype(pa.timestamp("us"))` | `timestamp[us][pyarrow]` | optional tz: `pa.timestamp("us", tz="UTC")` |
| `pd.ArrowDtype(pa.duration("us"))` | `duration[us][pyarrow]` | |
Complex and nested
This is where PyArrow-backed pandas surpasses every other pandas dtype. Inner types are enforced at full depth, exactly like PySpark and Polars.

| Annotation | pandas dtype | Notes |
|---|---|---|
| `pd.ArrowDtype(pa.list_(pa.int64()))` | `list<item: int64>[pyarrow]` | inner type enforced |
| `pd.ArrowDtype(pa.large_list(pa.int64()))` | `large_list<item: int64>[pyarrow]` | 64-bit offsets |
| `pd.ArrowDtype(pa.list_(pa.list_(pa.int64())))` | `list<item: list<item: int64>>[pyarrow]` | nested list |
| `pd.ArrowDtype(pa.struct([("a", pa.int64())]))` | `struct<a: int64>[pyarrow]` | struct with named fields, fully recursive |
| `pd.ArrowDtype(pa.list_(pa.struct([("a", pa.int64())])))` | `list<item: struct<a: int64>>[pyarrow]` | list of dicts |
| `pd.ArrowDtype(pa.map_(pa.string(), pa.int64()))` | `map<string, int64>[pyarrow]` | key and value types enforced |
| `pd.ArrowDtype(pa.map_(pa.string(), pa.list_(pa.int64())))` | `map<string, list<item: int64>>[pyarrow]` | map of string to list |
| `pd.ArrowDtype(pa.dictionary(pa.int32(), pa.string()))` | `dictionary<values=string, indices=int32, ordered=0>[pyarrow]` | dictionary-encoded (categorical) |
| `pd.ArrowDtype(pa.list_(pa.float32(), 128))` | `fixed_size_list<item: float>[128][pyarrow]` | fixed-width list (e.g. embeddings) |
Deeply nested example

```python
import pandas as pd
import pyarrow as pa

import dfguard as dfg

# list of structs, where one field is itself a list of floats
embedding_type = pd.ArrowDtype(
    pa.list_(
        pa.struct([
            pa.field("label", pa.string()),
            pa.field("scores", pa.list_(pa.float32())),
        ])
    )
)


class ModelOutput(dfg.PandasSchema):
    doc_id = pd.ArrowDtype(pa.int64())
    results = embedding_type
```
Any pa.* type, including ones not shown here, is accepted without a
dfguard update. The dispatch is structural, not a lookup table.
Note
pd.ArrowDtype gives pandas columns the same nested-type precision as
Polars and PySpark. Use pd.ArrowDtype(pa.list_(pa.int64())) instead of
list[int] when inner-type enforcement matters. The object dtype
limitation does not apply to PyArrow-backed columns.
Python builtins and generics

| Annotation | pandas dtype | Notes |
|---|---|---|
| `int` | `int64` | |
| `float` | `float64` | |
| `bool` | `bool` | |
| `str` | `object` | |
| `bytes` | `object` | |
| `datetime.datetime` | `datetime64[ns]` | |
| `list[int]` | `object` | inner type not enforced (use ArrowDtype instead) |
Nullability

| Annotation | Meaning |
|---|---|
| `int` | non-nullable (NaN collapses to float) |
| `pd.Int64Dtype()` | nullable integer (no NaN collapse) |
| `Optional[int]` | marks nullable intent; use `pd.Int64Dtype()` for actual null support |
| `int \| None` | native Python union syntax, also accepted |
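The NaN-collapse behaviour in the first two rows can be seen directly in pandas:

```python
import pandas as pd

# With the default NumPy-backed int64, one missing value forces the
# whole column to float64 (the classic NaN collapse):
collapsed = pd.Series([1, 2, None])
assert str(collapsed.dtype) == "float64"

# The nullable extension dtype keeps integers and represents the
# missing value as pd.NA instead:
kept = pd.Series([1, 2, None], dtype="Int64")
assert str(kept.dtype) == "Int64"
assert kept.isna().sum() == 1
```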
pandas dtype reference | pd.ArrowDtype reference | pandas PyArrow integration guide
Polars¶
Polars dtypes work as both classes (pl.Int64) and instances (pl.Datetime("ms", "UTC")). Both are accepted. Complex types enforce inner types at the schema level.
Integer

| Annotation | Polars dtype | Notes |
|---|---|---|
| `pl.Int8` | `Int8` | 8-bit signed |
| `pl.Int16` | `Int16` | 16-bit signed |
| `pl.Int32` | `Int32` | 32-bit signed |
| `pl.Int64` | `Int64` | 64-bit signed |
| `pl.UInt8` | `UInt8` | unsigned |
| `pl.UInt16` | `UInt16` | unsigned |
| `pl.UInt32` | `UInt32` | unsigned |
| `pl.UInt64` | `UInt64` | unsigned |
Float and numeric

| Annotation | Polars dtype | Notes |
|---|---|---|
| `pl.Float32` | `Float32` | |
| `pl.Float64` | `Float64` | |
| `pl.Decimal` | `Decimal` | arbitrary precision |
String, binary, and boolean

| Annotation | Polars dtype | Notes |
|---|---|---|
| `pl.String` | `String` | |
| `pl.Binary` | `Binary` | raw bytes |
| `pl.Boolean` | `Boolean` | |
| `pl.Categorical` | `Categorical` | |
| `pl.Enum(["low", "high"])` | `Enum` | fixed set of strings |
Temporal

| Annotation | Polars dtype | Notes |
|---|---|---|
| `pl.Date` | `Date` | |
| `pl.Datetime("us", "UTC")` | `Datetime` | optional time unit + timezone |
| `pl.Time` | `Time` | |
| `pl.Duration` | `Duration` | |
Complex and nested

| Annotation | Polars dtype | Notes |
|---|---|---|
| `pl.List(pl.Int64)` | `List` | inner type enforced |
| `pl.Array(pl.Float32, 128)` | `Array` | fixed-width, inner type enforced |
| `pl.Struct({"a": pl.Int64})` | `Struct` | recursive, all field types enforced |
| `pl.Object` | `Object` | arbitrary Python objects |
| `pl.Null` | `Null` | all-null column |
Python builtins and generics

| Annotation | Polars dtype | Notes |
|---|---|---|
| `int` | `Int64` | |
| `float` | `Float64` | |
| `bool` | `Boolean` | |
| `str` | `String` | |
| `bytes` | `Binary` | |
| `list[int]` | `List(Int64)` | inner type preserved |
| `datetime.datetime` | `Datetime` | |
| `datetime.date` | `Date` | |
| `datetime.timedelta` | `Duration` | |
Nullability

| Annotation | Meaning |
|---|---|
| `pl.Int64` | physically nullable (all Polars columns are) |
| `Optional[pl.Int64]` | declares that nulls are intentional in this column |
| `pl.Int64 \| None` | native Python union syntax, also accepted |
Runtime type coverage¶
Scalar types for PySpark and Polars are tested via runtime discovery: the test
suite walks T.DataType.__subclasses__() and pl.DataType.__subclasses__()
recursively at test time and runs every concrete, no-argument-constructible type
through the conversion pipeline. New types added in future library releases are
covered automatically. Complex nested types are tested with multi-level
constructions (three-level nested struct, array of structs containing maps, etc.)
to verify that inner types are enforced at every depth.
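The discovery pattern can be sketched generically. `walk_concrete` below is a hypothetical helper, shown on a toy class hierarchy rather than on the real `T.DataType` or `pl.DataType` trees, but the recursion is the same:

```python
def walk_concrete(base):
    """Recursively collect subclasses of `base` that can be
    constructed with no arguments."""
    found = []
    for sub in base.__subclasses__():
        try:
            sub()  # keep only no-argument-constructible types
        except TypeError:
            pass
        else:
            found.append(sub)
        found.extend(walk_concrete(sub))  # recurse into grandchildren
    return found


# Toy stand-in for a dtype hierarchy:
class DataType: ...
class IntegerType(DataType): ...
class Int8Type(IntegerType): ...
class DecimalType(DataType):  # requires arguments -> skipped
    def __init__(self, precision, scale):
        self.precision, self.scale = precision, scale


print([cls.__name__ for cls in walk_concrete(DataType)])
# → ['IntegerType', 'Int8Type']
```

Running the same walk over a library's real base class at test time is what keeps the scalar-type coverage in sync with the installed library version.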