Tox and Nox matrix testing: detection and resolution

# tox.ini (original)
[tox]
envlist = py311-{unit,integration}

[testenv]
deps = pytest
commands = pytest tests/unit



# noxfile.py (original)
import nox

@nox.session(python="3.11", tags=["unit", "integration"])
def tests(session):
    session.install('pytest')
    session.run('pytest', 'tests/')  # this runs both unit and integration
    print(f"session tags: {session.tags}")
    # FIXME: temporary until we split the suites


Running `tox -vv` printed:

>>> tox run-test: py311-unit
>>> tox run-test: py311-integration
>>> nox -s tests (runs both suites again)


This resulted in duplicated test runs and a total of 12 processes instead of the expected 6.

print(f"envlist: {len(['py311-unit', 'py311-integration'])} entries")
print(f"nox tags: {len(['unit', 'integration'])} entries")
# Output shows 2 envlist entries and 2 nox tags; every tox env runs both suites, so there are 2 x 2 = 4 suite runs in total (each suite runs twice).

Our CI pipeline suddenly started launching six identical pytest sessions for a pandas DataFrame validation suite. No failures were reported, but the executor timed out and the nightly build missed its SLA. The root cause turned out to be a mismatch between our tox envlist and the tagged session defined in nox, which created an overlapping test matrix.

The overlap occurs because tox expands every entry in envlist into a separate environment, while the nox session is tagged with both ‘unit’ and ‘integration’ and runs the full test suite each time. Tox then invokes nox for each environment, leading to a Cartesian product of the two matrices. This behavior follows the tox documentation on factor expansion (see https://tox.wiki/en/latest/config.html#factor-expansion), which states that each factor combination creates a distinct environment. Related factors:

  • envlist defines two factors (unit, integration)
  • nox session tags include both factors
  • No guard preventing duplicate execution
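
The Cartesian product described above can be reproduced in a few lines of Python. This is only a sketch of the counting argument, not of how tox or nox work internally: the helper name `expand_matrix` is made up for illustration, and the single `pytest tests/` call in the original session is modelled as one run of each suite.

```python
from itertools import product

# Names copied from the original configuration above.
tox_envs = ["py311-unit", "py311-integration"]
# The original nox session runs pytest on tests/, which covers both suites.
suites_per_nox_run = ["tests/unit", "tests/integration"]


def expand_matrix(envs, suites):
    """Return every (tox env, suite) pair that ends up being executed."""
    return list(product(envs, suites))


runs = expand_matrix(tox_envs, suites_per_nox_run)
for env, suite in runs:
    print(f"{env} -> {suite}")
print(f"total suite runs: {len(runs)}")
```

Each suite appears once per tox environment, which is the duplication the bullet points describe.
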

Run tox with increased verbosity:

tox -vv | grep -i 'session tags'

You will see the session being invoked multiple times with the same tags, confirming the duplicate matrix.
Alternatively, inspect the CI logs for lines like:

>>> nox -s tests
>>> nox -s tests
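
If the logs are long, the same check can be scripted. The sketch below assumes the log contains lines shaped like the ones shown above (`nox -s <session>`); the function names and the regular expression are illustrative and would need adjusting to your CI's actual log format.

```python
import re
from collections import Counter

# Matches "nox -s NAME" (also "--session NAME" / "--sessions NAME") anywhere in a line.
NOX_CALL = re.compile(r"nox\s+(?:-s|--sessions?)\s+(\S+)")


def count_nox_invocations(log_text):
    """Count how often each nox session name is invoked in the log text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = NOX_CALL.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def duplicated_sessions(log_text):
    """Return only the sessions that were invoked more than once."""
    return {name: n for name, n in count_nox_invocations(log_text).items() if n > 1}


sample_log = """\
>>> nox -s tests
>>> nox -s tests
"""
print(duplicated_sessions(sample_log))
```

An empty result means no session was started twice in the captured log.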

How to fix it

Align the two tools so that each factor is handled by only one of them.

Fix the tox envlist to target a single factor and let nox drive the full matrix:

# tox.ini (fixed)
[tox]
envlist = py311

[testenv]
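# the tox env only needs nox; nox installs pytest inside its own session venv
deps = nox
# select by tag only: the fixed noxfile no longer defines a session named "tests"
commands = nox --python 3.11 --tags {env:TOX_TAG:unit}

# noxfile.py (fixed)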
import nox

@nox.session(python="3.11", tags=["unit"])
def unit(session):
    session.install('pytest')
    session.run('pytest', 'tests/unit')

@nox.session(python="3.11", tags=["integration"])
def integration(session):
    session.install('pytest')
    session.run('pytest', 'tests/integration')
    # this isolates the two suites

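Now tox creates a single environment (py311) and delegates to the matching nox session through the TOX_TAG environment variable, which falls back to unit when it is unset; run tox with TOX_TAG=integration to select the integration session. Across both tags the matrix collapses to two runs instead of four, restoring the original runtime.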
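
To make the TOX_TAG fallback concrete, here is a small sketch of how the substitution resolves. It only imitates the "use the variable if set, otherwise the default" behaviour of `{env:TOX_TAG:unit}`; it is not tox's implementation, and `resolve_tag` and `nox_command` are hypothetical helper names.

```python
import os


def resolve_tag(environ=None, default="unit"):
    """Imitate {env:TOX_TAG:unit}: use TOX_TAG if it is set, else the default."""
    environ = os.environ if environ is None else environ
    return environ.get("TOX_TAG", default)


def nox_command(environ=None):
    """Build the argument list tox would hand to nox for the given environment."""
    return ["nox", "--python", "3.11", "--tags", resolve_tag(environ)]


# Pass explicit dicts so the example does not depend on the real environment.
print(nox_command({}))
print(nox_command({"TOX_TAG": "integration"}))
```

Because the default is unit, a plain tox run exercises only the unit suite; the integration suite needs an explicit TOX_TAG=integration.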

If you prefer to keep tox handling the matrix, remove the tags from the nox sessions and let tox invoke a single nox command per environment:

# tox.ini (alternative)
[tox]
envlist = py311-{unit,integration}

[testenv]
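# the tox env only needs nox; nox installs pytest inside its own session venv
deps = nox
commands = nox -s tests --python=3.11

# noxfile.py (alternative)
import os

import nox

@nox.session(python="3.11")
def tests(session):
    session.install('pytest')
    # tox exports the active env name (e.g. py311-unit) as TOX_ENV_NAME;
    # session.env is a dict of variables passed to commands, not the env name
    tox_env = os.environ.get('TOX_ENV_NAME', '')
    if 'unit' in tox_env:
        session.run('pytest', 'tests/unit')
    elif 'integration' in tox_env:
        session.run('pytest', 'tests/integration')
    # spent 30min debugging this gating logic, but it isolates runs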

Both approaches eliminate the Cartesian product and keep the CI runtime predictable.

The gotcha we ran into was assuming that tagging a nox session with multiple labels would automatically filter based on the tox factor – it does not. Explicitly gate the session logic or let one tool own the matrix.
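
One way to make that gate explicit is to map the tox environment name to exactly one suite and fail loudly otherwise. This is a sketch under the naming scheme used in this post (`py311-unit`, `py311-integration`); `SUITE_PATHS` and `suite_for_env` are hypothetical names, and the function could be called from a nox session with the value of TOX_ENV_NAME.

```python
# Factor name -> test directory, following the layout used in this post.
SUITE_PATHS = {"unit": "tests/unit", "integration": "tests/integration"}


def suite_for_env(env_name):
    """Return the single test path selected by a tox env name such as 'py311-unit'."""
    factors = env_name.split("-")
    matches = [factor for factor in factors if factor in SUITE_PATHS]
    if len(matches) != 1:
        # Zero or several suite factors would reintroduce ambiguous or duplicate runs.
        raise ValueError(f"expected exactly one suite factor in {env_name!r}, got {matches}")
    return SUITE_PATHS[matches[0]]


print(suite_for_env("py311-unit"))
print(suite_for_env("py311-integration"))
```

Matching whole factors rather than substrings avoids accidental hits, and raising on an unknown environment keeps a misconfigured matrix from silently running nothing.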