NumPy broadcasting row multiplication in pandas: cause and fix

The result DataFrame suddenly had twice the expected number of rows. The DataFrame itself looked normal right after the NumPy computation. Only after printing the shapes did we realize that broadcasting had produced a (3, 2) array, two values for every row.

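import numpy as np
import pandas as pd

# original data
df = pd.DataFrame({'id': [1, 2, 3], 'value': [10, 20, 30]})
# intended single weight, but length 2 triggers broadcasting
weights = np.array([0.5, 0.5])  # FIXME: length mismatch
# (3, 1) * (2,) broadcasts to a (3, 2) array, so each row gets two values
product = df['value'].values[:, None] * weights
print(f"product shape: {product.shape}")  # (3, 2)
# pandas rejects a direct assignment of the (3, 2) array to one column with a
# ValueError, so the rows are stored as one array object per cell
df['weighted'] = list(product)
print(f"df rows: {len(df)}")
print(df)
# Output shows 3 rows, but each 'weighted' cell holds a length-2 array;
# a later df.explode('weighted') turns these into 6 rows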

The root cause is broadcasting a (3, 1) column vector against a length-2 1-D array without checking that the lengths agree. NumPy compares shapes starting from the trailing dimension and stretches any dimension of size 1, so (3, 1) * (2,) yields a (3, 2) matrix instead of an error. Stored as one array per cell, that matrix doubles the row count when the column is later flattened. This follows the NumPy broadcasting rules documented at https://numpy.org/doc/stable/user/basics.broadcasting.html and often surprises developers who expect element-wise multiplication.
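The shape arithmetic can be checked without pandas. The snippet below is a minimal sketch; it assumes NumPy 1.20 or later for np.broadcast_shapes, and the variable names are illustrative only.

```python
import numpy as np

# Shapes are compared right to left; a dimension of size 1 is stretched.
result_shape = np.broadcast_shapes((3, 1), (2,))
print(result_shape)  # (3, 2)

values = np.array([10, 20, 30])
weights = np.array([0.5, 0.5])

# The added axis turns (3,) into (3, 1), which broadcasts against (2,)
product = values[:, None] * weights
print(product.shape)  # (3, 2)

# Without the added axis, 3 and 2 are incompatible and NumPy raises
try:
    values * weights
    mismatch_raised = False
except ValueError as exc:
    mismatch_raised = True
    print(f"broadcast error: {exc}")
```

The added axis is what turns a loud length mismatch into a silent (3, 2) result.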

How to fix it

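Give NumPy operands of the same length so that nothing is broadcast. The weights need either one entry per row or a single scalar, and the column should stay 1-D. Adding a new axis to both operands does not help: (3, 1) * (2, 1) cannot be broadcast and raises a ValueError. For a quick fix:

weights = np.array([0.5, 0.5, 0.5])  # one weight per row (a scalar 0.5 also works)
df['weighted'] = df['value'].values * weights

The column now contains a plain (3,) array of floats, so there is nothing to flatten and the row count stays at three. A more robust approach lets pandas handle alignment: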

df['weighted'] = df['value'].mul(pd.Series(weights, index=df.index), axis=0)

Building the Series with index=df.index raises a ValueError when the number of weights differs from the number of rows, so a length mismatch fails immediately instead of being broadcast. The gotcha that bit us was assuming NumPy would silently drop the extra dimension; it instead created a matrix that later exploded into duplicate rows.
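A minimal sketch of the Series-based approach wrapped in a function follows. The helper name weight_column is hypothetical and not part of pandas; it relies only on pd.Series and Series.mul.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'value': [10, 20, 30]})

def weight_column(frame, column, weights):
    """Hypothetical helper: multiply frame[column] by per-row weights."""
    # pd.Series raises ValueError if len(weights) != len(frame.index)
    aligned = pd.Series(weights, index=frame.index)
    return frame[column].mul(aligned)

# Matching length: one weight per row
df['weighted'] = weight_column(df, 'value', np.array([0.5, 0.5, 0.5]))
print(df['weighted'].tolist())  # [5.0, 10.0, 15.0]

# Mismatched length: rejected before any multiplication happens
try:
    weight_column(df, 'value', np.array([0.5, 0.5]))
    rejected = False
except ValueError as exc:
    rejected = True
    print(f"rejected: {exc}")
```

A scalar weight also passes through this helper, because pd.Series repeats a scalar for every index label.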

Pitfalls to avoid

  • Multiplying a DataFrame column by a flat NumPy array whose length does not match the number of rows.
  • Relying on implicit NumPy broadcasting instead of explicit pandas alignment.
  • Flattening a broadcasted 2-D array with .explode() before verifying its shape.
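One way to catch the last pitfall is to count the elements in each cell before calling .explode(). This is a sketch under the assumption that the column was stored as one array per row, as in the reproduction above; cell_lengths is a hypothetical helper.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'value': [10, 20, 30]})
# Reproduce the bad state: one length-2 array per cell
df['weighted'] = list(df['value'].to_numpy()[:, None] * np.array([0.5, 0.5]))

def cell_lengths(series):
    """Hypothetical check: number of elements held in each cell."""
    return series.map(lambda cell: np.size(cell))

lengths = cell_lengths(df['weighted'])
print(lengths.tolist())  # [2, 2, 2]

# Any length above 1 means explode() will multiply the rows
exploded = df.explode('weighted')
print(len(df), len(exploded))  # 3 6
```

A pipeline could assert that every length equals 1 before exploding and fail early otherwise.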

We replaced the raw NumPy multiplication with pandas Series alignment; the job now finishes with the original row count and no hidden duplication.


Last verified: 2026-02-05
