Feature Request
Support for `sample_weights: Vec<f64>` in `RandomForestRegressor::fit()` (and ideally `RandomForestClassifier` and the underlying `DecisionTree` models as well).
Use Case
I'm training a RandomForest on time-series data where recent observations should be weighted more heavily than older ones (exponential decay: `weight = 0.9^months_ago`). This is a common pattern in scikit-learn:

```python
model.fit(X, y, sample_weight=weights)
```
Without sample weights, there's no way to express "this training example matters more than that one" — which is important for recency weighting, class imbalance correction, and importance sampling.
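As a concrete example of the decay scheme above, here is a small helper (hypothetical, not part of smartcore) that turns observation ages into weights:

```rust
// Hypothetical helper: exponential recency decay, weight = 0.9^months_ago.
// Age 0 (this month) gets weight 1.0; older observations decay geometrically.
fn recency_weights(months_ago: &[u32]) -> Vec<f64> {
    months_ago.iter().map(|&m| 0.9_f64.powi(m as i32)).collect()
}
```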
Current State
Looking at the source code, the internal plumbing is close to supporting this:
- `BaseForestRegressor::sample_with_replacement()` does uniform bootstrap sampling — this could be extended to weighted sampling
- `BaseTreeRegressor::fit_weak_learner()` already accepts `samples: Vec<usize>` (bootstrap counts) and uses them as integer multipliers in split statistics:
  ```rust
  sum += *sample_i as f64 * y_m.get(i).to_f64().unwrap();
  ```
- Generalizing `samples` from `Vec<usize>` (integer counts) to `Vec<f64>` (continuous weights) in the tree splitter would enable this
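To illustrate the generalization, here is a minimal sketch (names assumed, not smartcore's actual internals) of the count-based accumulation rewritten with float weights:

```rust
// Sketch: the integer bootstrap counts generalized to continuous weights.
// With all weights equal to 1.0 this reduces to the plain mean used today.
fn weighted_mean(y: &[f64], weights: &[f64]) -> f64 {
    let mut sum = 0.0;
    let mut wsum = 0.0;
    for (yi, wi) in y.iter().zip(weights.iter()) {
        sum += wi * yi; // was: *sample_i as f64 * y_i (integer multiplier)
        wsum += *wi;    // was: sum of integer counts
    }
    sum / wsum
}
```

The same substitution applies to the variance accumulator used by the split criterion.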
Proposed API
Option A — Add to parameters struct:
```rust
RandomForestRegressorParameters {
    // ... existing fields ...
    sample_weights: Option<Vec<f64>>,
}
```
Option B — Extend the fit signature (breaking change):
```rust
pub fn fit(x: &X, y: &Y, parameters: P, sample_weights: Option<&[f64]>) -> Result<Self, Failed>
```
Option A is backwards-compatible and probably preferable.
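A simplified stand-in (fields assumed for illustration, not the crate's real struct) showing why Option A is non-breaking: an `Option<Vec<f64>>` that defaults to `None` leaves existing callers untouched.

```rust
// Simplified stand-in for the real parameters struct (fields assumed).
// Defaulting sample_weights to None keeps the change backwards-compatible.
#[derive(Default)]
struct RandomForestRegressorParameters {
    n_trees: u16,
    sample_weights: Option<Vec<f64>>,
}
```

Existing code that constructs the struct via `..Default::default()` compiles unchanged, and weighted training becomes an explicit opt-in with `sample_weights: Some(weights)`.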
Scope
Two pieces:
- Weighted bootstrap sampling in `BaseForestRegressor` — sample with probability proportional to weights instead of uniformly
- Weighted split statistics in `BaseTreeRegressor` — use float weights instead of integer counts when computing mean/variance for split criteria
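For the first piece, one way to sample with replacement proportionally to weights is an inverse-CDF lookup over cumulative weights. This is a sketch of the approach only; the toy LCG stands in for whatever RNG smartcore uses internally, and none of these names exist in the crate:

```rust
// Sketch: weighted sampling with replacement via cumulative weights
// and inverse-CDF lookup. A tiny LCG stands in for the library's RNG.
fn sample_weighted(weights: &[f64], n: usize, seed: u64) -> Vec<usize> {
    let total: f64 = weights.iter().sum();
    // Running cumulative sum of weights, e.g. [0.5, 0.3] -> [0.5, 0.8].
    let cumulative: Vec<f64> = weights
        .iter()
        .scan(0.0, |acc, w| {
            *acc += *w;
            Some(*acc)
        })
        .collect();
    let mut state = seed;
    (0..n)
        .map(|_| {
            // LCG step, then map the top 53 bits to a uniform draw in [0, total).
            state = state
                .wrapping_mul(6364136223846793005)
                .wrapping_add(1442695040888963407);
            let u = (state >> 11) as f64 / (1u64 << 53) as f64 * total;
            // First index whose cumulative weight exceeds the draw.
            cumulative.partition_point(|&c| c <= u).min(weights.len() - 1)
        })
        .collect()
}
```

Zero-weight rows are then never drawn, and uniform weights reproduce today's bootstrap behavior.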
scikit-learn Reference
For reference, scikit-learn's implementation passes `sample_weight` through to each tree's `fit()`.
This is one of the most commonly used features in scikit-learn's RandomForest and would make smartcore a much more viable alternative for real-world ML pipelines.
Thank you for maintaining this crate — the WASM-first posture is exactly what drew me to it!