feat: add unseen='warn' option to CountFrequencyEncoder #911
direkkakkar319-ops wants to merge 12 commits into feature-engine:main
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
##             main    #911   +/- ##
=======================================
  Coverage   98.27%  98.27%
=======================================
  Files         116     116
  Lines        4978    4992   +14
  Branches      795     802    +7
=======================================
+ Hits         4892    4906   +14
  Misses         55      55
  Partials       31      31
Hi @solegalli, the failing check is not caused by this PR. All 51 failures come from what appears to be a pre-existing compatibility issue between the sklearn wrapper and pandas 2.3.0. All checks relevant to this PR's changes are passing. Happy to investigate the pandas 2.3.0 compatibility issue separately if needed.
…ove accidental test output files
9f1f972 to 0b468f3
Description
Fixes #909
Adds 'warn' as a valid option to the existing unseen parameter of CountFrequencyEncoder. When set, unseen categories are encoded as NaN and a UserWarning is emitted per variable, explicitly naming the unseen categories found.
Changes
feature_engine/encoding/count_frequency.py
- Added 'warn' to the check_parameter_unseen() accepted-values list
- Updated _unseen_docstring to document the new 'warn' option
feature_engine/encoding/base_encoder.py
- Added a 'warn' branch in _encode() that detects unseen categories before .map(), so category names can be reported in the warning
- Updated _check_nan_values_after_transformation() to fall through silently for 'warn' (per-variable warnings are already issued in _encode())
tests/test_encoding/test_count_frequency_encoder.py
- Added 'warn' to the parametrized lists in test_fit_raises_error_if_df_contains_na and test_transform_raises_error_if_df_contains_na
- … unseen value
docs/whats_new/v_190.rst
Type of Change
Tests
All 42 tests pass (37 pre-existing + 5 new).
Notes
- unseen='ignore' is unchanged; fully backward compatible
- … errors param in CategoricalImputer)
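For reviewers, the 'warn' behaviour described in the Description can be illustrated with a minimal, standard-library-only sketch. This is not the code in base_encoder.py: the function name encode_with_warn, its arguments, and the warning wording are all hypothetical, and a plain dict lookup stands in for the pandas .map() call. It only shows the idea of detecting unseen categories before mapping so that they can be named in a per-variable UserWarning.

```python
import math
import warnings


def encode_with_warn(values, mapping, variable):
    """Map categories to learned counts; warn about unseen ones (hypothetical helper)."""
    # Detect unseen categories *before* mapping, so they can be named in the warning.
    unseen = sorted({v for v in values if v not in mapping})
    if unseen:
        # Assumed message format; the real wording in the PR may differ.
        warnings.warn(
            f"Variable {variable!r} contains unseen categories: {unseen}. "
            "They will be encoded as NaN.",
            UserWarning,
        )
    # Unseen categories have no entry in the mapping and become NaN.
    return [mapping.get(v, math.nan) for v in values]


# Counts "learned" from made-up training data.
counts = {"red": 3, "blue": 2}

# Record warnings so the example can inspect them instead of printing them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    encoded = encode_with_warn(["red", "green", "blue"], counts, "colour")
```

Here "green" was not present when the counts were learned, so it is the only category named in the single warning recorded for the variable, and its encoded value is NaN while the known categories keep their counts.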