feat: Add GroupStandardScaler for scaling variables relative to a giv…#915

Open
ankitlade12 wants to merge 5 commits into feature-engine:main from ankitlade12:feat/group-standard-scaler

ankitlade12 (Contributor) commented Mar 10, 2026

Description

This PR introduces the GroupStandardScaler to the feature_engine.scaling module.

Currently, existing scalers such as scikit-learn's StandardScaler scale a numerical feature globally, using a single mean and standard deviation computed over the entire dataset. However, a common pattern in data science is to scale a feature relative to its group: for example, standardizing house_price relative to its neighborhood, or scaling a student's exam_score relative to their class_id.
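Group-relative standardization is the ordinary z-score, computed with the statistics of the row's own group instead of those of the whole column. For row i belonging to group g(i), with μ_g and σ_g the mean and standard deviation of the variable over the training rows of group g (whether σ is the sample or population standard deviation is not specified here):

```latex
z_i = \frac{x_i - \mu_{g(i)}}{\sigma_{g(i)}}
```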

GroupStandardScaler addresses this by taking two sets of columns: variables (the numerical features to scale) and reference (the grouping keys). During fit, it learns the mean and standard deviation of each variable within each group. During transform, it standardizes each value using the parameters of its own group. Groups not seen during fit fall back to the global mean and standard deviation.
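The fit/transform logic described above can be sketched in plain pandas. This is a minimal illustration, not the PR's implementation: the helper names fit_group_params and transform_group are hypothetical, and the sketch assumes the population standard deviation (ddof=0) and that a zero standard deviation is replaced by 1, neither of which the PR text specifies.

```python
import pandas as pd


def fit_group_params(df, variables, reference):
    """Learn per-group and global mean/std for each variable.

    Assumption: population std (ddof=0), as in scikit-learn's StandardScaler.
    """
    grouped = df.groupby(reference)[variables]
    return {
        "group_mean": grouped.mean().reset_index(),
        "group_std": grouped.std(ddof=0).reset_index(),
        "global_mean": df[variables].mean(),
        "global_std": df[variables].std(ddof=0),
    }


def transform_group(df, variables, reference, params):
    """Standardize each row with its group's statistics, falling back to the
    global statistics for groups not seen during fit."""
    keys = df[reference]
    # Left-join the learned per-group statistics onto each row (row order is
    # preserved); rows whose group was not seen during fit get NaN.
    means = keys.merge(params["group_mean"], on=reference, how="left")[variables]
    stds = keys.merge(params["group_std"], on=reference, how="left")[variables]
    means.index = df.index
    stds.index = df.index
    # Fallback for unseen groups: use the global mean and std.
    means = means.fillna(params["global_mean"])
    stds = stds.fillna(params["global_std"])
    # Assumption: a zero std (e.g. a single-row group) is replaced by 1,
    # mirroring scikit-learn's handling of constant features.
    stds = stds.replace(0, 1.0)
    out = df.copy()
    out[variables] = (df[variables] - means) / stds
    return out


train = pd.DataFrame({
    "House_Price": [100000, 150000, 120000, 500000, 550000, 480000],
    "Neighborhood": ["A", "A", "A", "B", "B", "B"],
})
params = fit_group_params(train, ["House_Price"], ["Neighborhood"])

# "C" was not seen during fit, so its row uses the global statistics.
test = pd.DataFrame({"House_Price": [130000, 300000], "Neighborhood": ["A", "C"]})
print(transform_group(test, ["House_Price"], ["Neighborhood"], params))
```

A left merge keeps the original row order and turns unseen groups into NaN, so the global fallback reduces to a single fillna call.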

Changes:

  • Added GroupStandardScaler class in feature_engine/scaling/group_standard.py.
  • Exported GroupStandardScaler in feature_engine/scaling/__init__.py.
  • Added tests for single-reference scaling, handling of missing values, the fallback for unseen groups, and parameter validation.
  • Added full API documentation in docs/api_doc/scaling/GroupStandardScaler.rst.
  • Added User Guide explanations and examples in docs/user_guide/scaling/GroupStandardScaler.rst.

Example:

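import pandas as pd
from feature_engine.scaling import GroupStandardScaler

# House prices in two neighborhoods with very different price levels
df = pd.DataFrame({
    "House_Price": [100000, 150000, 120000, 500000, 550000, 480000],
    "Neighborhood": ["A", "A", "A", "B", "B", "B"],
})

# Standardize House_Price within each Neighborhood
scaler = GroupStandardScaler(
    variables=["House_Price"],
    reference=["Neighborhood"],
)

# Learn the per-neighborhood mean and standard deviation
scaler.fit(df)

# Replace each price with its z-score within its neighborhood
df_scaled = scaler.transform(df)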
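Assuming GroupStandardScaler uses the population standard deviation (ddof=0, as scikit-learn's StandardScaler does; the PR text does not say), df_scaled in the example above should match this plain-pandas computation, which can serve as a cross-check in tests:

```python
import pandas as pd

df = pd.DataFrame({
    "House_Price": [100000, 150000, 120000, 500000, 550000, 480000],
    "Neighborhood": ["A", "A", "A", "B", "B", "B"],
})

# Per-row group mean and population std (ddof=0), broadcast back to each row
by_group = df.groupby("Neighborhood")["House_Price"]
group_mean = by_group.transform("mean")
group_std = by_group.transform(lambda s: s.std(ddof=0))

expected = df.copy()
expected["House_Price"] = (df["House_Price"] - group_mean) / group_std
print(expected)
```

Comparing expected against scaler.transform(df) with pandas.testing.assert_frame_equal would confirm which ddof the implementation actually uses.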

Checklist:

  • I have read the contribution guidelines.
  • I have tested my code locally.
  • I have added documentation for my new feature.
  • I have added unit tests for my changes.

codecov bot commented Mar 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.29%. Comparing base (f72a2b7) to head (0b266f8).

@@            Coverage Diff             @@
##             main     #915      +/-   ##
==========================================
+ Coverage   98.27%   98.29%   +0.02%     
==========================================
  Files         116      117       +1     
  Lines        4978     5048      +70     
  Branches      795      806      +11     
==========================================
+ Hits         4892     4962      +70     
  Misses         55       55              
  Partials       31       31              


…, std edge cases, get_feature_names_out, _more_tags)