Sentry performance tip: avoid sending full pandas DataFrames
Performance degradation in Sentry integrations usually appears in production pipelines that process pandas DataFrames from SQL exports, API payloads, or log aggregations, where the full DataFrame is attached as extra context. The SDK then spends time serializing millions of cells, silently inflating latency and payload size.
# Example showing the issue
import time
import pandas as pd
import sentry_sdk
# Simulate a 1,000‑row DataFrame coming from a database export
df = pd.DataFrame({"id": range(1000), "value": [x * 2 for x in range(1000)]})
print(f"df rows: {len(df)}")
sentry_sdk.init(dsn="https://examplePublicKey@o0.ingest.sentry.io/0", traces_sample_rate=1.0)
start = time.time()
# Bad: attaching the whole DataFrame as extra context
sentry_sdk.capture_message("test event", level="info", extra={"df": df})
print("elapsed:", time.time() - start)
# Sample output
# df rows: 1000
# elapsed: 0.73 # >0.5 s just for serialization
Sentry serializes every value passed in the extra payload. When a full pandas DataFrame is supplied, the SDK walks through all rows and columns, producing a massive JSON structure. This behavior follows the Sentry SDK documentation, which states that event payloads are JSON‑encoded recursively and that large objects can dominate processing time. Related factors (a timing sketch follows this list):
- DataFrames with thousands of rows and many columns
- Nested types (e.g., datetime, numpy) that require extra conversion
- Setting traces_sample_rate=1.0 (as in the example above) and keeping the default max_breadcrumbs, which retain every span and breadcrumb
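To see how that cost scales, here is a minimal standalone sketch (it mimics the serialization step outside of Sentry; the row counts and the default=str encoder are illustrative assumptions) that times JSON‑encoding DataFrames of increasing size:
import json
import time
import pandas as pd
# Time how long it takes to JSON-encode DataFrames of growing size,
# roughly what happens when a DataFrame lands in an event payload.
for rows in (1_000, 10_000, 100_000):
    df = pd.DataFrame({"id": range(rows), "value": [x * 2 for x in range(rows)]})
    start = time.time()
    encoded = json.dumps(df.to_dict(orient="records"), default=str)  # default=str coerces numpy/datetime values
    print(f"{rows:>7} rows -> {len(encoded) / 1024:8.1f} KB in {time.time() - start:.3f}s")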
To diagnose this in your code:
# Enable a before_send hook that prints the outgoing payload size
import json, sys

def size_hook(event, hint):
    # default=str guards against datetime/other non-JSON values in the event
    payload = json.dumps(event, default=str).encode("utf-8")
    print(f"Sentry payload size: {len(payload) / 1024:.1f} KB", file=sys.stderr)
    return event

sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",
    before_send=size_hook,
    traces_sample_rate=1.0,
)
# After running the same capture_message call as in the example above, you will see a payload of several hundred KB, confirming the overhead.
Fixing the Issue
Sending full pandas DataFrames to Sentry slows performance because the SDK serializes the entire object. Fix it by sending a trimmed representation.
Quick Fix (1‑Liner Solution)
sentry_sdk.capture_message("test event", extra={"df_head": df.head().to_dict()})
When to use: Debugging, notebooks, one‑off scripts
Trade‑off: Only the first few rows are visible in Sentry
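If you want slightly more context than the first rows while keeping the payload tiny, a richer one-off variant (a sketch reusing the same capture_message call as above; the extra key names are arbitrary) also records shape and dtypes:
sentry_sdk.capture_message(
    "test event",
    extra={
        "df_head": df.head().to_dict(),                            # first 5 rows only
        "df_shape": df.shape,                                      # (rows, columns)
        "df_dtypes": {col: str(dtype) for col, dtype in df.dtypes.items()},
    },
)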
Best Practice Solution (Production‑Ready)
import sentry_sdk
import pandas as pd

def _filter_event(event, hint):
    # Replace any pandas DataFrame attached as extra context with a compact summary
    for key, value in list(event.get("extra", {}).items()):
        if isinstance(value, pd.DataFrame):
            df_summary = {
                "rows": len(value),
                "columns": list(value.columns),
                "preview": value.head().to_dict(),
            }
            event["extra"][key] = df_summary
    return event
sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",
    before_send=_filter_event,
    traces_sample_rate=0.2,  # sample only 20% of transactions in prod
    max_breadcrumbs=50,      # limit memory pressure
)
# Normal usage – the DataFrame is summarized automatically
sentry_sdk.capture_message("user processed", extra={"df": df})
When to use: Production services, CI pipelines, team projects
Why better: Prevents huge payloads, reduces latency, respects privacy, and helps you stay within Sentry's rate limits.
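To confirm the filter is working, you can combine it with the size hook from the diagnosis section; this sketch simply calls _filter_event first and then logs the (now much smaller) payload size:
import json, sys
import sentry_sdk

def _filter_then_measure(event, hint):
    event = _filter_event(event, hint)  # summarize DataFrames first
    payload = json.dumps(event, default=str).encode("utf-8")
    print(f"Sentry payload size: {len(payload) / 1024:.1f} KB", file=sys.stderr)
    return event

sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",
    before_send=_filter_then_measure,
    traces_sample_rate=0.2,
    max_breadcrumbs=50,
)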
What Doesn’t Work
❌ Adding df = df.copy() before capture: just creates another large object, increasing memory use.
❌ Setting max_breadcrumbs=0 to hide the problem: it silences useful context without solving payload bloat.
❌ Using sentry_sdk.flush() immediately after capture: forces synchronous sending and magnifies latency.
❌ Attaching the DataFrame after the event is already sent, thinking it will be lazy‑loaded.
❌ Using str(df) in extra: it creates a huge string representation and defeats any size limits.
❌ Setting traces_sample_rate=1.0 for all endpoints without considering payload impact.
When NOT to optimize
- Small DataFrames: Under a few hundred rows the serialization cost is negligible.
- One‑off analysis scripts: Running locally without a monitoring SLA.
- Development environment: Debug builds where latency is not a production concern.
- Low‑traffic services: If the endpoint processes < 10 requests per minute, the added overhead may be acceptable.
Frequently Asked Questions
Q: Can I still view the full DataFrame in Sentry if needed?
Yes, retrieve it from your logs or store a CSV snapshot separately.
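One way to do that (a sketch; the snapshot directory is a hypothetical choice, adjust to your storage) is to write a CSV snapshot alongside the event and attach only its path:
from pathlib import Path

snapshot_dir = Path("/tmp/df_snapshots")  # hypothetical location
snapshot_dir.mkdir(parents=True, exist_ok=True)
snapshot = snapshot_dir / "user_processed.csv"
df.to_csv(snapshot, index=False)
sentry_sdk.capture_message("user processed", extra={"df_snapshot": str(snapshot)})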
Q: Does disabling breadcrumbs improve performance?
Only marginally; the dominant cost is the DataFrame serialization.
Q: What size limit does Sentry enforce on events?
The default is 1 MB per event; large DataFrames exceed this quickly.
Q: Is before_send the only place to filter large objects?
You can also use before_breadcrumb for breadcrumb payloads.
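As a sketch of that breadcrumb-side filter (the 1 KB cap per value is an arbitrary assumption, not a Sentry default):
def _trim_breadcrumb(crumb, hint):
    data = crumb.get("data") or {}
    for key, value in list(data.items()):
        text = str(value)
        if len(text) > 1024:  # arbitrary 1 KB cap per value
            data[key] = text[:1024] + "... [truncated]"
    return crumb

sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",
    before_breadcrumb=_trim_breadcrumb,
)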
By summarizing pandas DataFrames before they hit Sentry you keep monitoring lightweight and avoid hidden latency spikes. The gotcha is that Sentry serializes any attached object to JSON, so even seemingly innocuous DataFrames become performance liabilities. Apply the before_send filter and sensible sampling to keep your production telemetry fast and reliable.
Related Issues
→ Why buffer protocol speeds up pandas DataFrame I/O
→ Why pandas read_csv parse_dates slows loading
→ Why numpy object dtype hurts pandas performance
→ Why Python GC tunables slow pandas DataFrame processing