json vs orjson speed: detection and optimization
Serializing large pandas DataFrames for API responses is common in production pipelines that ship JSON to front‑end services or log aggregators. The built-in json module can become a bottleneck, while orjson promises lower latency. Understanding the trade‑off helps keep request latency low.
# Example showing the issue
import pandas as pd, json, orjson, time
# Create a 10 k row DataFrame
df = pd.DataFrame({"id": range(10000), "value": ["x"]*10000})
# Convert to a list of records for JSON serialization
data = df.to_dict(orient="records")
# json serialization
start = time.perf_counter()
json_bytes = json.dumps(data).encode()
json_time = time.perf_counter() - start
# orjson serialization
start = time.perf_counter()
orjson_bytes = orjson.dumps(data)
orjson_time = time.perf_counter() - start
print(f"json: {json_time:.4f}s, size {len(json_bytes)}")
print(f"orjson: {orjson_time:.4f}s, size {len(orjson_bytes)}")
The standard json module builds an intermediate Python str (escaping non‑ASCII characters by default) and, although it has a C accelerator, falls back to Python‑level traversal whenever a default hook or non‑standard options are involved. orjson is implemented in Rust and serializes directly into a UTF‑8 byte buffer, eliminating the intermediate str object; the orjson README documents these design decisions along with benchmarks. Related factors:
- json has no native support for NumPy scalars or pandas types, so every value must first become a plain Python object
- The Rust implementation does the traversal outside the Python interpreter loop, cutting per‑object overhead
- orjson returns UTF‑8 bytes directly, while json.dumps returns a str that still needs .encode()
To diagnose this in your code:
# Quick profiling with timeit
import timeit
setup = (
"import json, orjson, pandas as pd; "
"df = pd.DataFrame({'id': range(10000), 'value': ['x']*10000}); "
"data = df.to_dict('records')"
)
json_stmt = "json.dumps(data).encode()"
orjson_stmt = "orjson.dumps(data)"
print('json:', timeit.timeit(json_stmt, setup=setup, number=10))
print('orjson:', timeit.timeit(orjson_stmt, setup=setup, number=10))
Fixing the Issue
The quickest fix is to swap the import and use orjson directly:
import orjson
payload = orjson.dumps(data)
For production you often need a fallback for environments where orjson isn’t installed and you must handle non‑serializable types like pandas Timestamps. A common pattern is:
try:
    import orjson as json_lib
except ImportError:  # pragma: no cover
    import json as json_lib

    def _default(obj):
        # Render date-like objects (e.g. pandas Timestamp) as ISO 8601 strings
        return obj.isoformat() if hasattr(obj, "isoformat") else str(obj)

    def dumps(obj):
        # stdlib json returns str, so encode to bytes for a uniform interface
        return json_lib.dumps(obj, default=_default).encode()
else:
    def dumps(obj):
        # orjson already returns UTF-8 bytes
        return json_lib.dumps(obj)

# Use the unified interface
payload = dumps(data)
This keeps the codebase agnostic to the underlying serializer: callers always receive UTF‑8 bytes from dumps(), whichever library is installed. If you want visibility into the fallback, add a logging.warning call in the ImportError branch.
The gotcha here is that pandas objects aren’t natively understood by json. orjson serializes Timestamp out of the box (it subclasses datetime.datetime), but other non‑standard types still need a default= converter, so “just swap the import” is not always enough.
What Doesn’t Work
❌ Using json.dumps(..., separators=(',', ':')): this only trims whitespace and doesn’t address the Python‑level traversal overhead
❌ Calling df.to_json(orient='records') and then json.loads on the result: the extra parse/serialize round trip adds cost instead of removing it
❌ Switching to ujson without testing: ujson lacks full support for datetime and pandas types, which can lead to silent data loss
❌ Importing json but calling orjson‑specific functions or options on it, causing AttributeError
❌ Assuming orjson will serialize arbitrary pandas objects (e.g. Categorical) without first converting them to plain Python types
❌ Relying on ensure_ascii=False in the json module for speed: it changes escaping behavior, not the slow Python‑level traversal
When NOT to optimize
- Small payloads: Under a few kilobytes the latency difference is negligible
- One‑off scripts: Quick data dumps where install size matters less than simplicity
- Strict environment constraints: Platforms where adding a compiled Rust wheel is disallowed
- Legacy services: APIs that require strict json module output formatting (e.g., ensure_ascii handling)
Frequently Asked Questions
Q: Does orjson produce identical JSON to the stdlib json module?
A: Both produce valid JSON that parses to the same data, but the text differs: json.dumps returns a str with spaces after separators and \uXXXX escapes for non‑ASCII (ensure_ascii=True by default), while orjson returns compact UTF‑8 bytes with no padding
Q: Can I use orjson with FastAPI responses out of the box?
A: Yes; FastAPI ships fastapi.responses.ORJSONResponse, which you can set as the app’s default_response_class or per route, so response bodies are serialized with orjson
Choosing the right serializer can shave milliseconds off every request when you’re moving millions of rows from pandas into JSON APIs. While orjson delivers the raw speed, a graceful fallback keeps your code portable across environments. Profile early and let the data dictate the trade‑off.
Related Issues
→ Why buffer protocol speeds up pandas DataFrame I/O
→ Why numpy object dtype hurts pandas performance
→ Why pandas read_csv parse_dates slows loading
→ Why pandas read_parquet loads faster than read_csv