Deepcopy vs shallow copy in pandas: recursion pitfall and fix
Infinite recursion during a pandas DataFrame copy typically shows up in production pipelines that clone large nested structures, where a DataFrame holds a reference back to itself through an object column. deepcopy traverses the cycle until it exhausts Python's recursion limit, aborting downstream processing with a RecursionError.
# Example showing the issue
import pandas as pd
import copy

# Create a self‑referencing DataFrame
df = pd.DataFrame({'a': [1, 2]})
df['self'] = [df, df]  # each cell points back to the whole DataFrame
print(f"df rows: {len(df)}")

try:
    df_copy = copy.deepcopy(df)
except RecursionError as e:
    print('RecursionError:', e)
# Output shows a RecursionError caused by the infinite recursion
The DataFrame contains a column that references the DataFrame itself. deepcopy follows every reference it encounters, meets the self‑reference, and starts copying the original DataFrame again, looping until Python's recursion limit is reached. The copy module documented in the Python standard library normally breaks such cycles with a memo dictionary (see the sketch after the list below), but that protection can be sidestepped when copying is routed through a type's custom hooks. Related factors:
- Object columns holding the parent DataFrame
- Cyclic references introduced by custom objects
- No explicit deepcopy handling for pandas objects
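For contrast, copy.deepcopy already handles cycles in plain Python containers: it records every object it copies in a memo dictionary and reuses the finished copy when it meets the same object again. A minimal sketch of that well‑behaved case (a plain list, no pandas involved):
import copy

lst = [1, 2]
lst.append(lst)             # a genuine cycle in a plain list
clone = copy.deepcopy(lst)  # no RecursionError: the memo dict breaks the cycle
print(clone[2] is clone)    # True: the copy points at itself, not the original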
To diagnose this in your code:
# Detect self‑referencing columns in a DataFrame
self_refs = [col for col in df.columns if df[col].apply(lambda x: x is df).any()]
print('Self‑referencing columns:', self_refs)
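That one‑liner only catches cells that are the DataFrame itself. If a custom object holds the back‑reference one level down, a short walk over the garbage collector's reference graph can surface it; has_cycle_back_to is a hypothetical helper written for this sketch, not a pandas or stdlib API:
import gc

def has_cycle_back_to(frame, obj, max_depth=3):
    # Breadth-first walk over obj's referents, looking for frame
    seen, frontier = {id(obj)}, [obj]
    for _ in range(max_depth):
        nxt = []
        for o in frontier:
            for ref in gc.get_referents(o):
                if ref is frame:
                    return True
                if id(ref) not in seen:
                    seen.add(id(ref))
                    nxt.append(ref)
        frontier = nxt
    return False

deep_refs = [c for c in df.columns
             if df[c].apply(lambda x: x is df or has_cycle_back_to(df, x)).any()]
print('Columns with direct or nested back-references:', deep_refs)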
Fixing the Issue
The quickest way to avoid the crash is to use a shallow copy, which does not traverse the object graph:
df_copy = df.copy(deep=False)
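The reason this is cheap is that deep=False shares the underlying buffers instead of duplicating them. A quick way to confirm the sharing, with the caveat that exact behavior may differ once pandas' Copy-on-Write mode is enabled:
import numpy as np

shallow = df.copy(deep=False)
# The numeric column's buffer is shared with the original, not duplicated
print(np.shares_memory(df['a'].to_numpy(), shallow['a'].to_numpy()))  # typically True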
For production code you should guard against self‑references before deep copying:
import copy
import logging

def safe_deepcopy(frame: pd.DataFrame) -> pd.DataFrame:
    # Identify columns that point back to the frame
    self_cols = [c for c in frame.columns if frame[c].apply(lambda x: x is frame).any()]
    if self_cols:
        logging.warning('Dropping self-referencing columns before deepcopy: %s', self_cols)
    # Remove them temporarily
    temp = frame.drop(columns=self_cols)
    # Perform a true deepcopy on the safe subset
    result = copy.deepcopy(temp)
    # Re‑attach empty placeholders to keep the schema intact
    for c in self_cols:
        result[c] = None
    return result

logging.info('Performing safe deepcopy of DataFrame')
df_safe = safe_deepcopy(df)
assert len(df_safe) == len(df), 'Row count mismatch after safe deepcopy'
The production approach logs the problematic columns, strips them before deep copying, and restores the expected schema, so cycles are handled explicitly instead of crashing the copy.
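If the cycle is intentional and should survive the copy, one option is to point the placeholder column at the new frame, mirroring what deepcopy's memo does for plain containers. rebind_self_references is a hypothetical helper built on safe_deepcopy above, not a pandas API:
def rebind_self_references(original: pd.DataFrame, clone: pd.DataFrame) -> pd.DataFrame:
    # Cells that pointed at `original` now point at `clone` instead
    for col in original.columns:
        if original[col].apply(lambda x: x is original).any():
            clone[col] = [clone] * len(clone)
    return clone

df_rebound = rebind_self_references(df, safe_deepcopy(df))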
What Doesn’t Work
❌ Using sys.setrecursionlimit(10000) to avoid the RecursionError: it only postpones the failure, and very high limits can overflow the C stack and crash the interpreter (demonstrated after this list).
❌ Applying df.apply(lambda x: copy.deepcopy(x)): each per-column deepcopy still traverses the self‑reference and raises the same RecursionError.
❌ Dropping the self‑referencing column after deepcopy: the error occurs before you get a chance to drop it.
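To see why raising the limit is no fix, here is a sketch assuming the self‑referencing df from the first example; the copy still fails, just deeper in the stack:
import sys
import copy

sys.setrecursionlimit(10000)  # raises the ceiling, does not remove the cycle
try:
    copy.deepcopy(df)
except RecursionError:
    print('Still fails, only later; very high limits risk crashing the interpreter')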
Common mistakes that lead to this failure:
- Calling copy.deepcopy on a DataFrame with object columns without checking for cycles
- Assuming df.copy() deep-copies everything: it defaults to deep=True, but per the pandas docs it does not recursively copy Python objects stored in object columns (see the sketch after this list)
- Raising sys.setrecursionlimit to silence the symptom instead of removing the cycle
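A quick demonstration of the df.copy() point above: per the pandas documentation, deep=True duplicates the data buffers but does not recursively copy Python objects held in object columns:
obj_df = pd.DataFrame({'o': [{'n': 1}]})
copied = obj_df.copy(deep=True)
# The buffers are new, but the dict inside the object column is shared
print(copied.loc[0, 'o'] is obj_df.loc[0, 'o'])  # True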
When NOT to Add These Safeguards
- Exploratory notebooks: small, throw‑away analyses where a crash costs little.
- One‑off scripts: data migrations run once, where the risk of recursion is negligible.
- Intentional self‑references: cycles that are by design and handled downstream.
- Tiny datasets: with fewer than a few dozen rows, the overhead of extra checks outweighs the benefit.
Frequently Asked Questions
Q: Can I safely deepcopy a pandas DataFrame that contains custom objects?
A: Only if those objects do not hold references back to the DataFrame; otherwise you must break the cycle first, as sketched below.
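One way to break such a cycle is to give the custom object a __deepcopy__ hook that drops the back‑reference; Node here is purely illustrative, not part of any library:
import copy

class Node:
    def __init__(self, payload, parent_frame=None):
        self.payload = payload
        self.parent = parent_frame  # back-reference that would create the cycle

    def __deepcopy__(self, memo):
        clone = Node(None)
        memo[id(self)] = clone  # register first so nested cycles resolve
        clone.payload = copy.deepcopy(self.payload, memo)
        return clone  # clone.parent stays None: the cycle is broken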
Q: Is shallow copy always safe for large DataFrames?
A: A shallow copy shares the underlying data buffers rather than duplicating them, so it is memory‑efficient and avoids recursion, but in‑place modifications can propagate to the original.
Self‑referencing structures are a hidden source of crashes in data pipelines that rely on deepcopy. By detecting and removing cyclic references, or by opting for a shallow copy, you keep your pandas workflows robust and avoid unexpected recursion limits. Remember to validate the copy before downstream processing, as in the sketch below.
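A minimal validation sketch along those lines; validate_copy is a hypothetical helper, and df/df_safe are the frames from the examples above:
def validate_copy(original: pd.DataFrame, clone: pd.DataFrame) -> None:
    assert clone is not original, 'Copy must be a distinct object'
    assert clone.shape == original.shape, 'Shape changed during copy'
    assert list(clone.columns) == list(original.columns), 'Schema changed during copy'

validate_copy(df, df_safe)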
Related Issues
→ Fix pandas SettingWithCopyWarning false positive
→ Why Python GC tunables slow pandas DataFrame processing
→ Why pandas concat uses more memory than append
→ Why CPython ref counting vs GC impacts memory