Mypy strict optional pitfalls with pandas DataFrames: detection and fix
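Unexpected None values when reading pandas DataFrames often appear in production ETL pipelines that ingest CSV exports or API payloads, where columns may contain missing data. Under mypy’s strict optional checking (the --strict-optional behavior, enabled by default in current mypy releases), a function that promises a float but may return a missing value is reported as a type mismatch. If the report is ignored, the missing value reaches downstream calculations and breaks them.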
# Example showing the issue
import pandas as pd
from typing import Optional
df = pd.DataFrame({"value": [1.0, None]})
def get_first() -> float:
    # mypy --strict rejects this return when the accessor is typed as Optional[float]
    return df["value"].iloc[0]
print(f"df rows: {len(df)}")
print(f"first value: {get_first()}")
# Runtime prints 1.0, but mypy reports a type mismatch
pandas accessors such as .iloc return a scalar that may stand for a missing entry. At runtime pandas stores a missing entry in a float column as NaN (itself a float), not None; what mypy sees is the type declared by the stubs installed in your environment. When that declared type includes None, strict optional checking makes you handle Union[float, None] explicitly before returning a float, otherwise a type mismatch is reported. This follows PEP 484’s treatment of Optional types and pandas’ documentation that column values may be missing. Related factors:
- Columns contain NaN or None
- Accessors return nullable types
- Strict optional checking never lets a possibly-None value pass as float
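The runtime side of this can be checked directly. The following is a minimal sketch, assuming pandas’ default dtype inference for this column; the variable names are illustrative. The None passed to the constructor comes back as NaN, so an identity check against None does not detect it, while pd.isna does.

```python
import pandas as pd

# Same frame as in the example above: one real value, one missing entry.
df = pd.DataFrame({"value": [1.0, None]})

missing = df["value"].iloc[1]

dtype_name = str(df["value"].dtype)   # the column is inferred as a float dtype
is_none = missing is None             # identity check against None
is_missing = bool(pd.isna(missing))   # pandas' own missing-value test

print(dtype_name, is_none, is_missing)
```

This is why a guard written only as `val is None` can pass mypy and still let a missing value through.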
To diagnose this in your code:
# Run mypy in strict mode
mypy --strict test.py
# Sample output
# test.py:7: error: Incompatible return value type (got "float | None", expected "float") [return-value]
# The error points to the return statement, where the .iloc access yields a nullable type.
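To see what mypy actually infers, which depends on the pandas stubs installed in your environment, one option is reveal_type. This is a sketch rather than a required step: the TYPE_CHECKING guard keeps the call away from the interpreter, since reveal_type is understood by mypy but is not a builtin at runtime on older Python versions.

```python
from typing import TYPE_CHECKING

import pandas as pd

df = pd.DataFrame({"value": [1.0, None]})
first = df["value"].iloc[0]

if TYPE_CHECKING:
    # Only the type checker evaluates this branch; mypy reports the inferred
    # type of `first` as a note. The exact type shown depends on your stubs.
    reveal_type(first)

print(first)
```

Running mypy on this file prints a "Revealed type is ..." note for the marked line, which tells you whether you are dealing with an Optional, a broader scalar type, or Any.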
Fixing the Issue
Quick Fix (1‑Liner Solution)
return df["value"].iloc[0] # type: ignore[return-value]
When to use: Quick prototyping or when you know the column has no missing data. Tradeoff: Suppresses the type check, may hide real None values.
Best Practice Solution (Production‑Ready)
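import logging
from typing import cast

import pandas as pd

def get_first_safe() -> float:
    # df is the module-level DataFrame from the example above
    val = df["value"].iloc[0]
    # A missing entry in a float column is NaN, not None, so test for both
    if val is None or pd.isna(val):
        logging.error("Encountered a missing value in 'value' column where a float is required")
        raise ValueError("Missing value in DataFrame")
    return cast(float, val)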
When to use: Production pipelines where data quality matters. Why better: Explicit runtime guard, logs the problem, satisfies mypy’s strict optional checks without silencing warnings.
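If several accessors need the same guard, it can be factored into one helper. The sketch below is an assumption rather than part of the original fix: require_float and its column parameter are hypothetical names, and it uses math.isnan from the standard library so that both None and NaN are rejected.

```python
import math
from typing import Optional


def require_float(val: Optional[float], column: str) -> float:
    """Return val as a float, or raise if it is None or NaN."""
    if val is None or math.isnan(val):
        raise ValueError(f"Missing value in column {column!r}")
    # After the guard, mypy has narrowed val from Optional[float] to float.
    return float(val)


clean = require_float(1.0, "value")

try:
    require_float(float("nan"), "value")
    nan_rejected = False
except ValueError:
    nan_rejected = True
```

Because the narrowing happens inside the helper, call sites need neither a cast nor a type: ignore comment.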
What Doesn’t Work
❌ Using df.fillna(0) after the function returns: This changes data semantics and may mask real missing values.
❌ Casting with cast(float, df['value'].iloc[0]) without a missing-value check: mypy is satisfied, but cast does nothing at runtime, so a missing value passes through and fails later, far from its source.
❌ Switching to df['value'].astype(float) globally: This raises on non‑numeric entries and leaves missing entries as NaN, so the check is still needed.
- Silencing mypy with # type: ignore instead of fixing the nullable type.
- Assuming .iloc always returns a concrete type and forgetting NaN handling.
- Skipping validation because the DataFrame is small and presumed clean.
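The point about cast can be shown without pandas. In this minimal sketch, unsafe_first is a hypothetical function: cast returns its argument unchanged, so the None survives the call and only fails when arithmetic is attempted.

```python
from typing import Optional, Sequence, cast


def unsafe_first(values: Sequence[Optional[float]]) -> float:
    # cast() only informs the type checker; it performs no runtime check.
    return cast(float, values[0])


result = unsafe_first([None, 2.0])
leaked_none = result is None  # annotated as float, yet None at runtime

try:
    result + 1.0
    failed_later = False
except TypeError:
    failed_later = True
```

The failure shows up at the first use of the value, not at the cast, which is what makes this pattern hard to debug.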
When NOT to optimize
- Exploratory notebooks: One‑off analysis where missing values are acceptable.
- Small synthetic data: Hand‑built frames of a few rows that you know contain no missing values.
- Deliberately nullable values: When you allow None on purpose, annotate the return as Optional[float], and handle it later.
- Legacy scripts: Temporary utilities that will be retired soon.
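Where a missing value is allowed on purpose, the honest annotation is Optional[float], and the caller does the narrowing. A minimal sketch, with nan_to_none and describe as hypothetical names: NaN is normalized to None at the boundary so that the rest of the code can rely on a plain `is None` check.

```python
import math
from typing import Optional


def nan_to_none(val: Optional[float]) -> Optional[float]:
    """Normalize None and NaN to None so callers can narrow with `is None`."""
    if val is None or math.isnan(val):
        return None
    return float(val)


def describe(val: Optional[float]) -> str:
    first = nan_to_none(val)
    if first is None:
        return "missing"
    # mypy has narrowed `first` to float here.
    return f"{first:.1f}"
```

For example, describe(float("nan")) and describe(None) both yield "missing", while describe(2.0) yields "2.0".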
Frequently Asked Questions
Q: Can I disable strict optional just for pandas code?
You can add # type: ignore comments, or turn the check off for selected modules with a per‑module section in the mypy config file (strict_optional = False), but either approach hides real issues.
Handling nullable pandas values under mypy’s strict optional mode forces you to think about data quality early. By adding explicit checks or proper casts you keep both the type checker and runtime happy, preventing silent None propagation in production pipelines.