Pandas fillna inplace issue: detection and resolution
Missing values that remain unchanged after a pandas fillna call usually show up in production pipelines that load CSV exports or API JSON payloads, where developers rely on inplace=True on a selected column to mutate the DataFrame. The call can silently leave NaNs in the data, breaking downstream analytics and model training.
# Example showing the issue
import pandas as pd
df = pd.DataFrame({'a': [1, None, 3], 'b': [None, 2, 3]})
print('Before fillna:')
print(df)
# Attempt to fill NaNs in column 'a' in place
df['a'].fillna(0, inplace=True)
print('After fillna on Series with inplace=True:')
print(df)
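# With Copy-on-Write (the default from pandas 3.0), column 'a' still contains NaN;
# older versions may update df, and pandas 2.2 warns about this pattern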
Calling fillna on a Series obtained via df['col'] operates on a separate Series object, not on the DataFrame itself. Under Copy-on-Write (the default from pandas 3.0) that Series always behaves as a copy, so inplace=True only modifies the temporary object and the change never propagates back to the parent DataFrame. In older versions, whether the selection acted as a view or a copy depended on pandas' internal indexing rules, which made the result unreliable. This behavior often surprises developers transitioning from mutable NumPy arrays. Related factors:
- Column selection creates a new Series object
- inplace flag does not affect the original DataFrame
- Chained assignment can produce a copy instead of a view
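Because the outcome depends on the pandas version and on whether Copy-on-Write is active, it can help to check what your own installation does. The following is a minimal sketch rather than a definitive test; the names col and propagated are illustrative only.

```python
import warnings

import pandas as pd

df = pd.DataFrame({"a": [1, None, 3], "b": [None, 2, 3]})

# Select the column first, then fill the selected Series in place.
col = df["a"]
with warnings.catch_warnings():
    # Some pandas versions warn about this pattern; silence that for the demo.
    warnings.simplefilter("ignore")
    col.fillna(0, inplace=True)

# The selected Series itself is filled...
print("NaNs left in col:", int(col.isna().sum()))

# ...but whether the parent DataFrame sees the change depends on the
# pandas version and on the Copy-on-Write setting.
propagated = not df["a"].isna().any()
print("pandas", pd.__version__, "- change reached df:", propagated)
```

If the last line reports False, any code that relies on the column-level inplace call is leaving NaNs behind on that installation.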
To diagnose this in your code:
# Verify that NaNs remain after fillna
if df['a'].isna().any():
    print('NaNs still present in column a')
else:
    print('Column a successfully filled')
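The single-column check above can be extended to the whole DataFrame. As a rough sketch, the hypothetical helper nan_counts below (not part of pandas) reports every column that still contains NaNs.

```python
import pandas as pd


def nan_counts(df: pd.DataFrame) -> dict:
    """Return {column: number of NaNs} for columns that still contain NaNs."""
    counts = df.isna().sum()
    return {col: int(n) for col, n in counts.items() if n > 0}


df = pd.DataFrame({"a": [1, None, 3], "b": [None, 2, 3], "c": [1, 2, 3]})
remaining = nan_counts(df)
print(remaining)  # {'a': 1, 'b': 1}
```

Running this right after a fill step makes it obvious which columns the fill did not reach.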
Fixing the Issue
Quick Fix (assign back):
df['a'] = df['a'].fillna(0)
When to use: Quick debugging or exploratory notebooks. Tradeoff: Doesn’t catch hidden copies elsewhere.
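A variation on the quick fix, assuming you would rather fill at the DataFrame level: DataFrame.fillna accepts a dict that maps column names to fill values, so one or more columns can be filled in a single assignment. The fill value 0 is arbitrary here.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, None, 3], "b": [None, 2, 3]})

# Fill only column 'a'; columns not named in the dict are left untouched.
df = df.fillna({"a": 0})

print(df["a"].tolist())           # [1.0, 0.0, 3.0]
print(int(df["b"].isna().sum()))  # 1
```

Like the assign-back form, this never operates on a temporary Series, so the result does not depend on view-versus-copy behavior.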
Best Practice Solution (production‑ready with validation):
import logging

def fillna_strict(df, column, value):
    if column not in df.columns:
        raise KeyError(f"Column {column} missing")
    if df[column].isna().any():
        logging.info(f"Filling {df[column].isna().sum()} NaNs in '{column}' with {value}")
        df[column] = df[column].fillna(value)
    assert not df[column].isna().any(), f"NaNs remain in {column} after fillna"
    return df

df = fillna_strict(df, 'a', 0)
When to use: Production data pipelines, CI checks, and team codebases. Why better: Explicit assignment guarantees mutation, logs the operation, and asserts that no NaNs survive, preventing silent data quality issues.
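One caveat with the assert in the helper above: Python strips assert statements when run with the -O flag, so the check disappears in optimized runs. A possible variant, sketched under the assumption that you want to fill several columns at once and return a new DataFrame, raises an explicit exception instead. The name fill_columns_checked and its fill_values parameter are hypothetical.

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)


def fill_columns_checked(df: pd.DataFrame, fill_values: dict) -> pd.DataFrame:
    """Hypothetical helper: fill several columns and verify the result.

    Returns a new DataFrame and leaves the input untouched.
    """
    missing = [col for col in fill_values if col not in df.columns]
    if missing:
        raise KeyError(f"Columns missing: {missing}")

    columns = list(fill_values)
    before = df[columns].isna().sum()
    result = df.fillna(fill_values)
    for col, n in before.items():
        if n:
            logger.info("Filled %d NaNs in %r with %r", n, col, fill_values[col])

    remaining = result[columns].isna().sum()
    if remaining.any():
        # An explicit exception survives `python -O`, unlike assert.
        leftover = remaining[remaining > 0].to_dict()
        raise ValueError(f"NaNs remain after fillna: {leftover}")
    return result


raw = pd.DataFrame({"a": [1, None, 3], "b": [None, 2, 3]})
clean = fill_columns_checked(raw, {"a": 0, "b": -1})
print(clean)
```

Returning a new DataFrame keeps the raw input available for comparison, at the cost of holding two copies in memory.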
What Doesn’t Work
❌ Calling df.fillna(…, inplace=True) and assigning its return value back, as in df = df.fillna(…, inplace=True): inplace=True returns None, so this overwrites the DataFrame with None and loses data
❌ Using .replace(np.nan, value, inplace=True) on the DataFrame after fillna: It masks the original issue without confirming NaNs are gone
❌ Switching to df = df.apply(lambda s: s.fillna(0)): This creates a new DataFrame unnecessarily and can be slower
- Using df['col'].fillna(…, inplace=True) expecting the original DataFrame to change
- Chaining fillna after another operation that returns a copy, e.g., df.head().fillna(…)
- Assuming inplace=True returns the DataFrame and reassigning it, e.g., df = df.fillna(…, inplace=True)
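Two of the mistakes listed above can be reproduced in a few lines. This is an illustrative sketch; the comment about the return value reflects pandas 2.x and earlier, so check it against the version you run.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, None, 3], "b": [None, 2, 3]})

# Mistake: fillna on a temporary object whose result is discarded.
df.head().fillna(0)  # returns a new DataFrame that nobody keeps
nans_after_discard = int(df.isna().sum().sum())
print(nans_after_discard)  # 2, df itself is unchanged

# Mistake: treating the return value of an in-place call as the DataFrame.
returned = df.fillna(0, inplace=True)
print(type(returned))  # NoneType in pandas 2.x and earlier

# The DataFrame was filled in place; it is the return value that is unusable.
nans_after_inplace = int(df.isna().sum().sum())
print(nans_after_inplace)  # 0
```

Keeping the in-place call and the assignment separate, or dropping inplace entirely and assigning the result, avoids both problems.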
When NOT to optimize
- Exploratory notebooks: One‑off analysis where speed outweighs strict validation
- Very small datasets: With only a handful of rows, leftover NaNs are easy to spot by eye, so logging and validation add little
- Known one‑to‑many fill patterns: When you intentionally keep NaNs for later imputation
- One‑time scripts: Ad‑hoc data cleaning that won’t be reused
Frequently Asked Questions
Q: Why doesn’t inplace=True work on a Series obtained via df['col']?
Because the selected Series is a separate object from the parent DataFrame (always so under Copy-on-Write), the in‑place flag only affects that temporary object.
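Q: Is inplace=True deprecated in pandas?
Not for fillna called directly on a DataFrame, where it still works. What changes is the chained form df['col'].fillna(…, inplace=True): pandas 2.2 warns about it, and with Copy-on-Write (the default from pandas 3.0) it never updates the parent DataFrame. Explicit assignment behaves the same in every version, which is why many developers prefer it for clarity.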
Missing value handling is a silent killer in production ETL jobs. By explicitly assigning the result of fillna and adding validation, you ensure data quality stays intact and avoid hard‑to‑track bugs downstream. Remember, clear mutation semantics beat ambiguous inplace tricks every time.