Sentry performance tip: avoid sending full pandas DataFrames

Performance degradation in Sentry-instrumented pipelines usually appears when code that processes pandas DataFrames from SQL exports, API payloads, or log aggregations attaches the full DataFrame as extra context on an event. The SDK then spends time serializing every cell, silently inflating both latency and payload size.

# Example showing the issue
import time
import pandas as pd
import sentry_sdk

# Simulate a 1,000‑row DataFrame coming from a database export
df = pd.DataFrame({"id": range(1000), "value": [x * 2 for x in range(1000)]})
print(f"df rows: {len(df)}")

sentry_sdk.init(dsn="https://examplePublicKey@o0.ingest.sentry.io/0", traces_sample_rate=1.0)
start = time.time()
# Bad: attaching the whole DataFrame as extra context
sentry_sdk.capture_message("test event", level="info", extra={"df": df})
print("elapsed:", time.time() - start)
# Sample output
# df rows: 1000
# elapsed: 0.73   # >0.5 s just for serialization

Sentry serializes every value passed in the extra payload. When a full pandas DataFrame is supplied, the SDK walks its rows and columns, producing a massive JSON structure. This matches the Sentry SDK documentation, which notes that event payloads are JSON-encoded recursively, so large objects can dominate processing time. Related factors (illustrated in the sketch after this list):

  • DataFrames with thousands of rows and many columns
  • Nested types (e.g., datetime, numpy scalars) that each need an extra conversion step before JSON encoding
  • traces_sample_rate=1.0 (as set above), which records every transaction, plus the default max_breadcrumbs=100 trail attached to each event
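To see why nested types matter, here is a minimal sketch (column names are illustrative) showing that numpy and datetime cells are not natively JSON-serializable, so each one needs a per-value conversion:

import json
import numpy as np
import pandas as pd

typed_df = pd.DataFrame({
    "id": np.arange(1000),                                        # numpy int64 cells
    "ts": pd.date_range("2024-01-01", periods=1000, freq="min"),  # Timestamp cells
})

try:
    json.dumps(typed_df.to_dict())
except TypeError as exc:
    print(exc)  # e.g. "Object of type int64 is not JSON serializable"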

To diagnose this in your code:

# Enable a before_send hook that prints the outgoing payload size
import json, sys

def size_hook(event, hint):
    # default=str guards against any values the SDK left non-JSON-native
    payload = json.dumps(event, default=str).encode("utf-8")
    print(f"Sentry payload size: {len(payload) / 1024:.1f} KB", file=sys.stderr)
    return event

sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",
    before_send=size_hook,
    traces_sample_rate=1.0,
)
# After running the same capture_message call as in the first example, you will see a payload of several hundred KB, confirming the overhead.

Fixing the Issue

Sending full pandas DataFrames to Sentry slows performance because the SDK serializes the entire object. Fix it by sending a trimmed representation.

Quick Fix (1‑Liner Solution)

sentry_sdk.capture_message("test event", extra={"df_head": df.head().to_dict()})

When to use: Debugging, notebooks, one‑off scripts
Trade‑off: Only the first few rows are visible in Sentry
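If the bare head is not enough, a slightly richer one-liner (the df_info key is illustrative) pairs a row count with a small records-style preview:

sentry_sdk.capture_message(
    "test event",
    extra={"df_info": {"rows": len(df), "preview": df.head(3).to_dict("records")}},
)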

Best Practice Solution (Production‑Ready)

import sentry_sdk
import pandas as pd

def _filter_event(event, hint):
    # Replace any pandas DataFrame in the extra payload with a compact summary
    for key, value in list(event.get("extra", {}).items()):
        if isinstance(value, pd.DataFrame):
            event["extra"][key] = {
                "rows": len(value),
                "columns": list(value.columns),
                "preview": value.head().to_dict(),
            }
    return event

sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",
    before_send=_filter_event,
    traces_sample_rate=0.2,   # sample only 20% of transactions in prod
    max_breadcrumbs=50,       # limit memory pressure
)

# Normal usage – the DataFrame is summarized automatically
sentry_sdk.capture_message("user processed", extra={"df": df})

When to use: Production services, CI pipelines, team projects
Why better: Prevents huge payloads, reduces latency, respects privacy, and keeps you within Sentry’s rate limits.
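Note that depending on the SDK version, before_send may run after extras have already been serialized, in which case the isinstance check above never fires. A more defensive pattern, sketched here with a hypothetical df_summary helper, is to summarize at the call site so a raw DataFrame never enters the event pipeline:

def df_summary(df, preview_rows=5):
    # Compact, JSON-friendly stand-in for a full DataFrame
    return {
        "rows": len(df),
        "columns": list(df.columns),
        "preview": df.head(preview_rows).to_dict(),
    }

sentry_sdk.capture_message("user processed", extra={"df": df_summary(df)})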

What Doesn’t Work

❌ Adding df = df.copy() before capture: just creates another large object, increasing memory use.

❌ Setting max_breadcrumbs=0 to hide the problem: it silences useful context without solving payload bloat.

❌ Using sentry_sdk.flush() immediately after capture: forces synchronous sending and magnifies latency.

❌ Attaching the DataFrame after the event is already sent, expecting it to be lazy‑loaded: events are serialized and dispatched immediately.

❌ Using str(df) in extra: pandas truncates the text representation, so you get an unstructured blob that is neither a faithful copy nor a compact summary.

❌ Setting traces_sample_rate=1.0 for all endpoints without considering payload impact.

When NOT to optimize

  • Small DataFrames: Under a few hundred rows the serialization cost is negligible.
  • One‑off analysis scripts: Running locally without a monitoring SLA.
  • Development environment: Debug builds where latency is not a production concern.
  • Low‑traffic services: If the endpoint processes < 10 requests per minute, the added overhead may be acceptable.

Frequently Asked Questions

Q: Can I still view the full DataFrame in Sentry if needed?

Yes, retrieve it from your logs or store a CSV snapshot separately.
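A minimal sketch of the snapshot approach (the path is illustrative; use durable storage in production):

snapshot_path = "/tmp/df_snapshot.csv"  # hypothetical location
df.to_csv(snapshot_path, index=False)
sentry_sdk.capture_message("user processed", extra={"df_snapshot": snapshot_path})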

Q: Does disabling breadcrumbs improve performance?

Only marginally; the dominant cost is the DataFrame serialization.

Q: What size limit does Sentry enforce on events?

The default is 1 MB per event; large DataFrames exceed this quickly.

Q: Is before_send the only place to filter large objects?

No. You can also use before_breadcrumb to filter breadcrumb payloads, as sketched below.
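A minimal before_breadcrumb sketch (the 200-character threshold is an arbitrary choice for illustration):

def _trim_crumb(crumb, hint):
    # Truncate oversized string values in breadcrumb data
    data = crumb.get("data") or {}
    for key, value in data.items():
        if isinstance(value, str) and len(value) > 200:
            data[key] = value[:200] + "…"
    return crumb

sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",
    before_breadcrumb=_trim_crumb,
)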


By summarizing pandas DataFrames before they reach Sentry, you keep monitoring lightweight and avoid hidden latency spikes. The gotcha is that the SDK JSON-encodes whatever you attach, so even a seemingly innocuous DataFrame becomes a performance liability. Apply the before_send filter and sensible sampling to keep your production telemetry fast and reliable.
