preprocessing: standardize constant columns to 0.0 (#1058) #1163

Open
jbbqqf wants to merge 1 commit into rasbt:master from jbbqqf:feat/1058-standardize-constant-cols

Conversation

@jbbqqf jbbqqf commented May 9, 2026

Code of Conduct

I have read the project's Code of Conduct.

Description

mlxtend.preprocessing.standardize claims in its Notes section that
"if all values in a given column are the same, these values are all set
to 0.0." For a constant column whose value is non-zero (e.g.
[5, 5, 5]), the function actually returns [-5, -5, -5] instead of
[0, 0, 0]. This PR brings the behaviour in line with the documented
contract.

The current code pre-zeroes the column and then runs
(x - mean) / std. With std forced to 1.0 and mean left untouched
at the original constant value, that yields (0 - c) / 1 = -c rather
than zero. Removing the pre-zero step lets the existing subtraction
collapse the column to exactly 0.0 ((c - c) / 1.0 = 0), which is the
intent.
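
The arithmetic above can be reproduced with a small NumPy-only sketch. The two functions below are hypothetical stand-ins written for illustration, not mlxtend's actual source; they assume a population standard deviation (`ddof=0`) and differ only in whether the constant column is zeroed before the subtraction.

```python
import numpy as np


def standardize_prezero(ary):
    """Illustrative stand-in for the old flow: zero constant columns first."""
    ary = np.asarray(ary, dtype=float).copy()
    means = ary.mean(axis=0)
    stds = ary.std(axis=0, ddof=0)  # ddof=0 is an assumption of this sketch
    constant = stds == 0.0
    stds[constant] = 1.0            # avoid division by zero
    ary[:, constant] = 0.0          # the pre-zero step described above
    return (ary - means) / stds     # constant column becomes (0 - c) / 1


def standardize_no_prezero(ary):
    """Illustrative stand-in for the new flow: no pre-zero step."""
    ary = np.asarray(ary, dtype=float).copy()
    means = ary.mean(axis=0)
    stds = ary.std(axis=0, ddof=0)
    stds[stds == 0.0] = 1.0
    return (ary - means) / stds     # constant column becomes (c - c) / 1


Z = np.array([[0, 1, 2, 5], [1, 2, 3, 5], [3, 1, 2, 5]], dtype=float)
before = standardize_prezero(Z)
after = standardize_no_prezero(Z)
print(before[:, -1])  # [-5. -5. -5.]
print(after[:, -1])   # [0. 0. 0.]
```

The non-constant columns come out identical in both variants; only the constant column changes.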

The existing test_zero_division_* tests only ever used a constant
column of value 0, so the bug never surfaced — (0 - 0) / 1 is 0
either way. The new regression tests here cover a non-zero constant
column.
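
To illustrate why a zero-valued fixture cannot tell the two code paths apart, here is a minimal sketch on a single column. Both helper names are hypothetical and exist only for this comparison.

```python
import numpy as np


def prezero_then_scale(col):
    # Hypothetical stand-in for the old branch: column zeroed, mean kept, std = 1.
    return (np.zeros_like(col) - col.mean()) / 1.0


def scale_only(col):
    # Hypothetical stand-in for the new branch: plain (x - mean) / 1.
    return (col - col.mean()) / 1.0


agreement = {}
for c in (0.0, 5.0):
    col = np.full(3, c)
    agreement[c] = bool(np.array_equal(prezero_then_scale(col), scale_only(col)))
    print(c, agreement[c])
# 0.0 True   -> a zero constant column hides the difference
# 5.0 False  -> a non-zero constant column exposes it
```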

Related issues or pull requests

Fixes #1058

Pull Request Checklist

  • Added a note about the modification or contribution to the ./docs/sources/CHANGELOG.md file
  • Added appropriate unit test functions in the ./mlxtend/preprocessing/tests/ directory
  • Modify documentation in the corresponding Jupyter Notebook under mlxtend/docs/sources/ (not applicable — behaviour now matches the existing docstring)
  • Ran PYTHONPATH='.' pytest ./mlxtend/preprocessing -sv — 41/41 passed
  • Checked for style issues by running flake8 ./mlxtend (clean) and black --check (clean)

Reproduce BEFORE/AFTER yourself (copy-paste)

A reviewer can verify this fix end-to-end by pasting the block below.

# --- one-time setup ---
git clone https://github.com/rasbt/mlxtend.git /tmp/repro-1058 && cd /tmp/repro-1058
python -m venv .venv && source .venv/bin/activate
pip install -e . pytest

# --- BEFORE (origin/master) ---
git checkout origin/master
python - <<'PY'
import numpy as np
from mlxtend.preprocessing import standardize
Z = np.array([[0, 1, 2, 5], [1, 2, 3, 5], [3, 1, 2, 5]], dtype=float)
print(standardize(Z))
PY
# Expected (BEFORE, BUGGY): last column is [-5, -5, -5]

# --- AFTER (this PR) ---
git fetch https://github.com/jbbqqf/mlxtend.git feat/1058-standardize-constant-cols
git checkout FETCH_HEAD
python - <<'PY'
import numpy as np
from mlxtend.preprocessing import standardize
Z = np.array([[0, 1, 2, 5], [1, 2, 3, 5], [3, 1, 2, 5]], dtype=float)
print(standardize(Z))
PY
# Expected (AFTER, FIXED): last column is [0., 0., 0.]

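# --- Regression tests (added by this PR, so run them from the PR checkout) ---
PYTHONPATH=. pytest mlxtend/preprocessing/tests/test__scaling__standardizing.py -v -k issue_1058
# Expected (AFTER): 3 passed
# To see the BEFORE failures, put master's scaling.py under the new tests:
git checkout origin/master -- mlxtend/preprocessing/scaling.py
PYTHONPATH=. pytest mlxtend/preprocessing/tests/test__scaling__standardizing.py -v -k issue_1058
# Expected (BEFORE): 3 failed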

What I ran locally

  • PYTHONPATH=. pytest mlxtend/preprocessing/tests/test__scaling__standardizing.py -v → 12/12 passed (9 existing + 3 new regression)
  • PYTHONPATH=. pytest mlxtend/preprocessing -q → 41/41 passed
  • flake8 mlxtend/preprocessing/scaling.py mlxtend/preprocessing/tests/test__scaling__standardizing.py → clean
  • black --check mlxtend/preprocessing/scaling.py mlxtend/preprocessing/tests/test__scaling__standardizing.py → clean
  • Same three new tests run against origin/master's scaling.py: 3/3 fail with the documented -mean(column) regression — confirming they pin the bug.

Edge cases tested

| # | Scenario | Input | Expected | Verified by |
|---|----------|-------|----------|-------------|
| 1 | Constant non-zero column (numpy) | [[0,1,2,5],[1,2,3,5],[3,1,2,5]] | last column → [0,0,0] | test_standardize_constant_column_numpy_issue_1058 |
| 2 | Constant non-zero column (pandas) | DataFrame with k=[5,5,5] | column k → [0,0,0] | test_standardize_constant_column_pandas_issue_1058 |
| 3 | return_params=True round-trip | [[5,1],[5,2],[5,3]] | constant col zero AND params['stds'][0] == 1.0 | test_standardize_constant_column_returns_unit_std_param_issue_1058 |
| 4 | Constant zero column (existing) | [[0,...],[0,...],...] | unchanged behaviour: zero column stays zero | test_zero_division_pandas, test_zero_division_numpy |
| 5 | All-non-constant columns (existing) | regular two-column data | unchanged behaviour: standard z-score | test_pandas_standardize, test_numpy_standardize |

Risk / blast radius

Minimal. The change only touches the constant-column branch, and it only extends the all-zeros result from constant columns whose value happens to be 0 to constant columns of any value. No public API change. No new dependency. The returned parameters dict still has stds[c] = 1.0 for constant columns, so any downstream code reusing those params is unaffected.
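
As a rough illustration of the params point, the sketch below fits means and standard deviations on data with a constant column and reuses them on new rows. `fit_params` and `apply_params` are hypothetical helpers, and the `'avgs'` key name is an assumption of this sketch; only `'stds'` is confirmed by the tests in this PR.

```python
import numpy as np


def fit_params(ary):
    """Hypothetical helper: column means and stds, with zero stds replaced by 1.0."""
    ary = np.asarray(ary, dtype=float)
    stds = ary.std(axis=0)
    stds[stds == 0.0] = 1.0
    return {"avgs": ary.mean(axis=0), "stds": stds}  # 'avgs' key is assumed


def apply_params(ary, params):
    """Hypothetical helper: reuse previously fitted params on new data."""
    return (np.asarray(ary, dtype=float) - params["avgs"]) / params["stds"]


train = np.array([[5, 1], [5, 2], [5, 3]], dtype=float)
params = fit_params(train)

new_rows = np.array([[5, 4], [7, 0]], dtype=float)
out = apply_params(new_rows, params)

print(params["stds"][0])  # 1.0
print(out[:, 0])          # [0. 2.]
```

Because the stored standard deviation for the constant column stays at 1.0, new values in that column are only shifted by the stored mean, never divided by zero.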

Release note

Fix `preprocessing.standardize` so a constant column is mapped to all-zeros (as the docstring promises) instead of `-mean(column)`.

PR drafted with assistance from Claude Code. The change was reviewed manually against rasbt/mlxtend's source and the docstring's "Notes" section, which already specified the intended behaviour. The reproducer block above was used during development; it is the same one a reviewer can paste verbatim.

The constant-column branch of `standardize` pre-zeroed the column before
the (x - mean) / std division. With std forced to 1.0 and mean unchanged,
that turned a constant column of value `c` into `(0 - c) / 1 = -c` rather
than the all-zeros vector promised by the docstring's Notes section.

Removing the pre-zeroing lets the existing subtraction collapse the
column to exactly 0.0 ((c - c) / 1.0 = 0). Existing tests only used a
constant column of value 0, so the bug went unnoticed.

Adds three regression tests covering numpy + pandas inputs and the
return_params dict, all of which fail on origin/master and pass on this
branch.

Co-Authored-By: Claude Code <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Standardize does not handle constant columns
