The main entry point for Indonesian stemming.
Creates a new stemmer without a dictionary.
use harmorp::IndonesianStemmer;
let stemmer = IndonesianStemmer::new();Creates a new stemmer with an FST dictionary for improved accuracy.
use harmorp::IndonesianStemmer;
let stemmer = IndonesianStemmer::with_fst("dictionary.fst");If the file doesn't exist, the stemmer silently falls back to no-dictionary mode.
Stems a single word.
let stemmed = stemmer.stem("membaca");
assert_eq!(stemmed, "baca");Parameters:
word: The word to stem
Returns:
- The stemmed word as a
String
Stems multiple words efficiently.
let words = vec!["membaca".to_string(), "menulis".to_string()];
let stems = stemmer.stem_batch(&words);
// Returns: ["baca", "tulis"]Parameters:
words: A slice of words to stem
Returns:
- A vector of stemmed words
- Cache hit: ~50 ns (after first stem of a word)
- Cold stem: ~1-5 µs (first time seeing a word)
- Batch processing: ~5 ms for 10,000 words
The stemmer uses a thread-safe cache (DashMap) for O(1) repeated lookups.
The stemmer can be safely shared across threads:
use std::sync::Arc;
use harmorp::IndonesianStemmer;
let stemmer = Arc::new(IndonesianStemmer::new());
// Safe to clone and share across threads