feat: add unseen='warn' option to CountFrequencyEncoder #911

Open
direkkakkar319-ops wants to merge 12 commits into feature-engine:main from direkkakkar319-ops:issue-909-count-frequency-encoder-unseen-warn

@direkkakkar319-ops direkkakkar319-ops commented Mar 8, 2026

Description

Fixes #909

Adds 'warn' as a valid option for the existing unseen parameter of CountFrequencyEncoder. When it is set, unseen categories are encoded as NaN and a UserWarning is emitted for each affected variable, naming the unseen categories found in it.

Changes

feature_engine/encoding/count_frequency.py

  • Added 'warn' to the check_parameter_unseen() accepted-values list
  • Extended _unseen_docstring to document the new 'warn' option

feature_engine/encoding/base_encoder.py

  • Added a 'warn' branch in _encode() that detects unseen categories before .map(), so the category names can be reported in the warning
  • Updated _check_nan_values_after_transformation() to fall through silently for 'warn' (per-variable warnings already issued in _encode)
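
The detect-before-map idea can be sketched outside the library as follows. This is a minimal illustration using plain pandas, not the PR's actual `_encode()` code: the helper `encode_with_warning`, its signature, the warning text, and the sample data are all hypothetical.

```python
import warnings

import pandas as pd


def encode_with_warning(series: pd.Series, mapping: dict, variable: str) -> pd.Series:
    """Hypothetical helper; not the PR's actual _encode() implementation."""
    # Collect unseen categories before mapping, while their names are still available.
    unseen = sorted(set(series.dropna().unique()) - set(mapping))
    if unseen:
        warnings.warn(
            f"Variable {variable!r} contains unseen categories: {unseen}. "
            "They will be encoded as NaN.",
            UserWarning,
            stacklevel=2,
        )
    # Series.map with a dict yields NaN for values missing from the dict.
    return series.map(mapping)


# Learn a count mapping from "training" data: {"a": 2, "b": 1}.
train = pd.Series(["a", "a", "b"], name="colour")
counts = train.value_counts().to_dict()

# "c" was never seen during training.
test = pd.Series(["a", "b", "c"], name="colour")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    encoded = encode_with_warning(test, counts, "colour")
```

Checking for unseen values first matters because after `.map()` the unseen entries are already NaN and their original labels are no longer recoverable from the encoded column.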

tests/test_encoding/test_count_frequency_encoder.py

  • Added 'warn' to parametrized lists in test_fit_raises_error_if_df_contains_na and test_transform_raises_error_if_df_contains_na
  • Added 5 new tests covering: UserWarning emission, NaN encoding, correct encoding of seen categories, no warning when no unseen categories, and invalid unseen value
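
The shape of four of these checks (warning emitted, NaN for unseen, correct value for seen, no warning when nothing is unseen) can be sketched with the standard library alone. The function `encode_value` below is a hypothetical stand-in for an encoder with unseen='warn', and `run_checks` is not taken from the PR's test file; in a pytest suite, `pytest.warns` would be the more usual way to assert on the warning.

```python
import math
import warnings


def encode_value(value, mapping):
    """Hypothetical stand-in for an encoder configured with unseen='warn'."""
    if value not in mapping:
        warnings.warn(f"Unseen category: {value!r}", UserWarning, stacklevel=2)
        return math.nan
    return mapping[value]


def run_checks():
    mapping = {"a": 2, "b": 1}

    # Unseen category: exactly one UserWarning naming it, and the result is NaN.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = encode_value("c", mapping)
    assert len(caught) == 1
    assert issubclass(caught[0].category, UserWarning)
    assert "'c'" in str(caught[0].message)
    assert math.isnan(result)

    # Seen category: correct encoding and no warning recorded.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        assert encode_value("a", mapping) == 2
    assert caught == []
    return True
```

Calling `warnings.simplefilter("always")` inside the context manager ensures the warning is recorded even if an identical one was already raised earlier in the same process.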

docs/whats_new/v_190.rst

  • Added changelog entry

Type of Change

  • Bug fix
  • New feature (non-breaking)
  • Breaking change
  • Documentation update

Tests

All 42 tests pass (37 pre-existing + 5 new):

pytest tests/test_encoding/test_count_frequency_encoder.py

Notes


codecov bot commented Mar 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.27%. Comparing base (f72a2b7) to head (0b468f3).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #911   +/-   ##
=======================================
  Coverage   98.27%   98.27%           
=======================================
  Files         116      116           
  Lines        4978     4992   +14     
  Branches      795      802    +7     
=======================================
+ Hits         4892     4906   +14     
  Misses         55       55           
  Partials       31       31           

@direkkakkar319-ops direkkakkar319-ops marked this pull request as ready for review March 9, 2026 19:06
@direkkakkar319-ops (Author)

Hi @solegalli

The failing check ci/circleci: test_feature_engine_py312_pandas230 is unrelated to the changes in this PR.

All 51 failures are in tests/test_wrappers/test_sklearn_wrapper.py (specifically test_get_feature_names_out_transformers, test_get_feature_names_out_selectors, and test_get_feature_names_out_polynomialfeatures), which are not touched by this PR at all.

This appears to be a pre-existing compatibility issue between the sklearn wrapper and pandas 2.3.0. All checks that are relevant to this PR's changes are passing.

Happy to investigate the pandas 2.3.0 compatibility issue separately if needed.

@direkkakkar319-ops direkkakkar319-ops force-pushed the issue-909-count-frequency-encoder-unseen-warn branch from 9f1f972 to 0b468f3 Compare March 16, 2026 13:29
Development

Successfully merging this pull request may close these issues.

make CountFrequencyEncoder not raise an error with unseen categories during transform

1 participant