Pandas concat vs append memory usage: detection and optimization

Unexpected memory spikes in pandas operations often appear in production pipelines that stitch together large CSV exports or API feeds, where developers grow a DataFrame inside a loop with repeated append calls. The intermediate copies push RAM consumption far above the size of the final result and can crash downstream analysis.

# Example showing the issue
import pandas as pd, numpy as np, sys

def mem(df):
    return df.memory_usage(deep=True).sum()

# Simulate 5 chunks of 200k rows each
chunks = [pd.DataFrame({'a': np.random.rand(200_000), 'b': np.random.randint(0,100,size=200_000)}) for _ in range(5)]

# Repeated append (deprecated in pandas 1.4 and removed in 2.0; this loop needs an older pandas to run)
df_append = pd.DataFrame()
for chunk in chunks:
    df_append = df_append.append(chunk, ignore_index=True)
print('Append memory:', mem(df_append))

# Single concat
df_concat = pd.concat(chunks, ignore_index=True)
print('Concat memory:', mem(df_concat))
# The two final DataFrames have nearly identical footprints; the real cost of the
# append loop is the peak RSS held by intermediate copies while it runs (measured below)

Repeated DataFrame.append in a loop creates a brand‑new DataFrame on every iteration, copying both the accumulated result and the new chunk each time. Those intermediate copies inflate the process’s resident set size even though the final object is no larger than the concat result. A single pd.concat builds the final object once, avoiding the cascade of temporary copies. The cost pattern mirrors repeatedly concatenating Python lists or strings with + inside a loop. How severe the spike gets depends on the factors below; a peak-memory measurement sketch follows the list.

  • Number of iterations
  • Size of each chunk
  • Absence of in‑place extension API
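
The extra cost shows up in peak allocations, not in the size of the final frame. Here is a minimal measurement sketch; it rebuilds a single-column chunk list so the block is self-contained, and it grows the frame with pd.concat inside a loop (the modern equivalent of repeated append) so it also runs on pandas 2.x. It relies on tracemalloc, which traces NumPy buffer allocations on recent NumPy versions; exact numbers will vary by version and platform.

# Compare peak traced memory: growing a frame in a loop vs a single concat
import tracemalloc
import numpy as np
import pandas as pd

chunks = [pd.DataFrame({'a': np.random.rand(200_000)}) for _ in range(5)]

def peak_mb(build):
    tracemalloc.start()
    build(chunks)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak / 1e6

def loop_growth(cs):
    acc = pd.DataFrame()
    for c in cs:
        acc = pd.concat([acc, c], ignore_index=True)   # same copy pattern as .append
    return acc

def single_concat(cs):
    return pd.concat(cs, ignore_index=True)

print(f'Loop peak:   {peak_mb(loop_growth):.1f} MB')
print(f'Single peak: {peak_mb(single_concat):.1f} MB')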

To diagnose this in your code:

# Monitor process RSS while the loop runs
import os
import psutil

process = psutil.Process(os.getpid())
df = pd.DataFrame()   # accumulator being grown the wrong way
for i, chunk in enumerate(chunks, 1):
    # DataFrame.append needs pandas < 2.0; on 2.x the equivalent anti-pattern is pd.concat([df, chunk])
    df = df.append(chunk, ignore_index=True)
    print(f'After iteration {i}: {process.memory_info().rss/1e6:.1f} MB')
# A steady upward trend across iterations signals the issue

Fixing the Issue

The quick, readable fix is to avoid the loop entirely:

df = pd.concat(chunks, ignore_index=True)
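
In the pipeline scenario from the introduction, that means collecting the pieces in a list as you read them and concatenating once at the end. A minimal sketch, with a hypothetical exports/ directory of CSV files:

# Read each export, collect into a list, concatenate once
import glob
import pandas as pd

paths = sorted(glob.glob('exports/*.csv'))   # hypothetical export directory
frames = [pd.read_csv(p) for p in paths]
df = pd.concat(frames, ignore_index=True)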

For production pipelines you may need explicit validation and logging:

import logging
logging.basicConfig(level=logging.INFO)

if not chunks:
    raise ValueError('No chunks were produced upstream')
if len(chunks) > 1:
    logging.info('Concatenating %d chunks', len(chunks))
    df = pd.concat(chunks, ignore_index=True)
else:
    df = chunks[0]

# Verify memory footprint stays within expectations
expected = sum(c.memory_usage(deep=True).sum() for c in chunks)
actual = df.memory_usage(deep=True).sum()
assert actual <= expected * 1.1, f'Unexpected memory growth: {actual}'

This pattern logs the operation, validates its input instead of quietly appending in a loop, and asserts that the final DataFrame does not exceed a reasonable memory budget.
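
If a single export is itself too big to read in one call, the same one-concat principle still applies: pd.read_csv can yield the file in pieces via chunksize, and pd.concat accepts any iterable of DataFrames, so nothing ever gets appended piece by piece. A sketch, assuming a hypothetical big_export.csv:

# Stream a large CSV in pieces, then build the final frame with one concat
import pandas as pd

reader = pd.read_csv('big_export.csv', chunksize=100_000)   # hypothetical file
df = pd.concat(reader, ignore_index=True)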

What Doesn’t Work

❌ Using df.reset_index(inplace=True) after each append: only reindexes, does not reduce copies

❌ Converting each chunk to a NumPy array then back to DataFrame: adds conversion overhead and larger memory peaks

❌ Calling .copy() on every chunk before appending: adds an extra copy of every chunk on top of the copies append already makes

Common mistakes that lead to the spike in the first place (a cleanup sketch follows this list):

  • Appending inside a for‑loop without measuring memory
  • Calling pd.concat inside the loop instead of once
  • Neglecting to drop references to intermediate DataFrames
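
The last point matters because the chunk list and the concatenated result briefly coexist, roughly doubling the footprint until the list is released. A small cleanup sketch, assuming the chunks list from earlier (the allocator may not hand every freed page straight back to the OS):

# Release the chunk list once the concatenated frame exists
df = pd.concat(chunks, ignore_index=True)
del chunks   # the concatenated copy is now the only live reference to that data
# In long-running jobs, gc.collect() can additionally prompt collection of cycles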

When NOT to optimize

  • Tiny datasets: Under a few thousand rows, the overhead is negligible
  • One‑off analysis: Interactive notebooks where speed matters more than RAM
  • Already using list‑of‑chunks: If you already have a list, concat is the natural step
  • Legacy code: Maintaining backward compatibility where append is still supported

Frequently Asked Questions

Q: Does DataFrame.append create a copy of the original DataFrame?

Yes. Each call returns a new DataFrame, copying both the existing data and the new chunk; this copy-per-call behavior is why DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0.
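
A quick way to confirm the copy on a current pandas release is to check whether the result shares buffers with the input; pd.concat behaves the same way as the removed append in this respect:

# The concatenated result does not share memory with the input frame
import numpy as np
import pandas as pd

base = pd.DataFrame({'a': np.arange(3)})
out = pd.concat([base, base], ignore_index=True)
print(np.shares_memory(base['a'].to_numpy(), out['a'].to_numpy()))   # False: data was copied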


Memory efficiency in pandas hinges on minimizing temporary DataFrames. By collecting chunks and calling pd.concat once, you keep RAM usage predictable and avoid hidden spikes that can derail production jobs. Keep an eye on the process RSS during any iterative build.
