make CountFrequencyEncoder not raise an error with unseen categories during transform #909

@direkkakkar319-ops

Description

Is your feature request related to a problem? Please describe.
Currently, CountFrequencyEncoder raises a ValueError during the transform() step if it encounters categories that were not seen during the fit() phase. This behavior can interrupt pipelines and make the transformer less flexible when working with real-world datasets where unseen categories frequently occur during inference or deployment.

Describe the solution you'd like
Introduce a parameter to control how unseen categories should be handled during transform(). For example:

    unseen_categories: str = "raise"  # options: 'raise', 'warn', 'ignore'

  • 'raise' → Keep the current behavior and raise a ValueError.
  • 'warn' → Encode unseen categories as NaN (or optionally 0) and emit a UserWarning indicating which categories were unseen.
  • 'ignore' → Encode unseen categories as NaN silently without raising an error.

This would allow the transformer to continue operating while still informing the user when unexpected categories appear.
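
To make the three modes concrete, here is a minimal, library-independent sketch of the semantics I have in mind. It is not the CountFrequencyEncoder implementation: `fit_counts` and `encode` are hypothetical helper names used only for illustration, they operate on plain Python lists rather than DataFrames, and the sketch assumes categories are strings and that unseen values map to NaN.

```python
import math
import warnings
from collections import Counter


def fit_counts(values):
    """Learn a category -> count mapping from the training values."""
    return dict(Counter(values))


def encode(values, mapping, unseen_categories="raise"):
    """Replace each value by its learned count.

    Hypothetical helper: `unseen_categories` mirrors the parameter proposed
    in this issue and is not an existing library argument.
    """
    if unseen_categories not in ("raise", "warn", "ignore"):
        raise ValueError(
            "unseen_categories must be 'raise', 'warn' or 'ignore', "
            f"got {unseen_categories!r}"
        )

    # Categories present at transform time but absent from the fitted mapping.
    unseen = sorted({v for v in values if v not in mapping})

    if unseen:
        if unseen_categories == "raise":
            raise ValueError(f"Unseen categories during transform: {unseen}")
        if unseen_categories == "warn":
            warnings.warn(
                f"Unseen categories encoded as NaN: {unseen}", UserWarning
            )
        # 'ignore' falls through silently.

    # Assumption: unseen categories are encoded as NaN.
    return [mapping.get(v, math.nan) for v in values]
```

With a mapping fitted on `["a", "a", "b"]`, calling `encode(["a", "c"], mapping, unseen_categories="warn")` would return the count for `"a"` followed by NaN and emit a single UserWarning that names `"c"`, while the default `"raise"` would stop with a ValueError as it does today.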

Describe alternatives you've considered
An alternative approach could be to always encode unseen categories as NaN without providing configuration options. However, this removes user control over strict validation and may hide data issues. Providing a configurable parameter maintains flexibility while preserving the option to enforce strict behavior.

Additional context
This change would align CountFrequencyEncoder with the design pattern being introduced across other transformers in the library that avoid raising errors during transformation and instead provide configurable handling of unexpected values. It would also improve usability in production pipelines where unseen categories are common.
