json vs orjson speed: detection and optimization
Serializing large pandas DataFrames for API responses is common in production pipelines that ship JSON to front‑end services or log aggregators. The built-in json module can become a bottleneck, while orjson promises lower latency. Understanding the trade‑off helps keep request latency low.
# Example showing the issue
import pandas as pd, json, orjson, time
# Create a 10 k row DataFrame
df = pd.DataFrame({"id": range(10000), "value": ["x"]*10000})
# Convert to a list of records for JSON serialization
data = df.to_dict(orient="records")
# json serialization
start = time.perf_counter()
json_bytes = json.dumps(data).encode()
json_time = time.perf_counter() - start
# orjson serialization
start = time.perf_counter()
orjson_bytes = orjson.dumps(data)
orjson_time = time.perf_counter() - start
print(f"json: {json_time:.4f}s, size {len(json_bytes)}")
print(f"orjson: {orjson_time:.4f}s, size {len(orjson_bytes)}")
The standard json module builds an intermediate Python str (escaping non‑ASCII characters by default) and, although it has a C accelerator, falls back to Python‑level traversal whenever a default hook or non‑standard options are involved. orjson is implemented in Rust and serializes directly into a UTF‑8 byte buffer, eliminating the intermediate str object; the orjson README documents these design decisions along with benchmarks. Related factors:
- json has no native support for NumPy scalars or pandas types, so every value must first become a plain Python object
- The Rust implementation does the traversal outside the Python interpreter loop, cutting per‑object overhead
- orjson returns UTF‑8 bytes directly, while json.dumps returns a str that still needs .encode()
To diagnose this in your code:
# Quick profiling with timeit
import timeit
setup = (
"import json, orjson, pandas as pd; "
"df = pd.DataFrame({'id': range(10000), 'value': ['x']*10000}); "
"data = df.to_dict('records')"
)
json_stmt = "json.dumps(data).encode()"
orjson_stmt = "orjson.dumps(data)"
print('json:', timeit.timeit(json_stmt, setup=setup, number=10))
print('orjson:', timeit.timeit(orjson_stmt, setup=setup, number=10))
Fixing the Issue
The quickest fix is to swap the import and use orjson directly:
import orjson
payload = orjson.dumps(data)
For production you often need a fallback for environments where orjson isn’t installed and you must handle non‑serializable types like pandas Timestamps. A common pattern is:
try:
    import orjson as json_lib
except ImportError:  # pragma: no cover
    import json as json_lib

    def _default(obj):
        # Render date-like objects (e.g. pandas Timestamp) as ISO 8601 strings
        return obj.isoformat() if hasattr(obj, "isoformat") else str(obj)

    def dumps(obj):
        # stdlib json returns str, so encode to bytes for a uniform interface
        return json_lib.dumps(obj, default=_default).encode()
else:
    def dumps(obj):
        # orjson already returns UTF-8 bytes
        return json_lib.dumps(obj)

# Use the unified interface
payload = dumps(data)
This keeps the codebase agnostic to the underlying serializer: callers always receive UTF‑8 bytes from dumps(), whichever library is installed. If you want visibility into the fallback, add a logging.warning call in the ImportError branch.
The gotcha here is that pandas objects aren’t natively understood by json. orjson serializes Timestamp out of the box (it subclasses datetime.datetime), but other non‑standard types still need a default= converter, so “just swap the import” is not always enough.
What Doesn’t Work
❌ Using json.dumps(..., separators=(',', ':')): this only trims whitespace and doesn’t address the Python‑level traversal overhead
❌ Calling df.to_json(orient='records') and then json.loads on the result: the extra parse/serialize round trip adds cost instead of removing it
❌ Switching to ujson without testing: ujson lacks full support for datetime and pandas types, which can lead to silent data loss
❌ Importing json but calling orjson‑specific functions or options on it, causing AttributeError
❌ Assuming orjson will serialize arbitrary pandas objects (e.g. Categorical) without first converting them to plain Python types
❌ Relying on ensure_ascii=False in the json module for speed: it changes escaping behavior, not the slow Python‑level traversal
When NOT to optimize
- Small payloads: Under a few kilobytes the latency difference is negligible
- One‑off scripts: Quick data dumps where install size matters less than simplicity
- Strict environment constraints: Platforms where adding a compiled Rust wheel is disallowed
- Legacy services: APIs that require strict json module output formatting (e.g., ensure_ascii handling)
Frequently Asked Questions
Q: Does orjson produce identical JSON to the stdlib json module?
A: Both produce valid JSON that parses to the same data, but the text differs: json.dumps returns a str with spaces after separators and \uXXXX escapes for non‑ASCII (ensure_ascii=True by default), while orjson returns compact UTF‑8 bytes with no padding
Q: Can I use orjson with FastAPI responses out of the box?
A: Yes; FastAPI ships fastapi.responses.ORJSONResponse, which you can set as the app’s default_response_class or per route, so response bodies are serialized with orjson
Choosing the right serializer can shave milliseconds off every request when you’re moving millions of rows from pandas into JSON APIs. While orjson delivers the raw speed, a graceful fallback keeps your code portable across environments. Profile early and let the data dictate the trade‑off.
Related Issues
→ Why buffer protocol speeds up pandas DataFrame I/O
→ Why numpy object dtype hurts pandas performance
→ Why pandas read_csv parse_dates slows loading
→ Why pandas read_parquet loads faster than read_csv