This repo evaluates NLLB-200 based machine translation models. It first generates translations for a given test dataset and then evaluates the CTranslate2-converted model on several metrics: BLEU, chrF++, and COMET. Chunking is applied for efficient translation generation and to avoid memory overflows.
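The chunking idea can be sketched as a simple batching helper. This is a hypothetical illustration, not the repo's actual code; the function name `chunked` and the chunk size are assumptions.

```python
# Hypothetical sketch of chunking: split a long list of source sentences
# into fixed-size chunks so each translation call stays within memory.
def chunked(sentences, chunk_size):
    """Yield successive chunks of at most chunk_size sentences."""
    for i in range(0, len(sentences), chunk_size):
        yield sentences[i:i + chunk_size]

# Usage: translate each chunk in turn instead of the whole corpus at once.
sources = [f"sentence {i}" for i in range(10)]
chunks = list(chunked(sources, 4))
# chunks[0] holds the first 4 sentences; the last chunk holds the remaining 2.
```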
pip install -r requirements.txt
The config file contains all the configurations needed to evaluate the model, including which model to evaluate. Here is an overview of its sections.
- ct_model_path: path to the CTranslate2 version of the model
- sp_model_path: path to the NLLB SentencePiece model
- batch_size: batch size
- beam_size: beam size
- path: Hugging Face ID of the test dataset
- src_config: source language config name of the dataset
- tgt_config: target language config name of the dataset
- text_col: column name for the text data
- split: which split of the dataset to use for evaluation
- src_lang: source language code
- tgt_lang: target language code
- comet_model_name: COMET model path
- debug: debug mode
- log_dir: directory to save the logs
- log_file: log file name
- results_file: results file name to save as JSON
- evaluation_summary_file: results file to save as CSV
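Putting the sections above together, a config file might look like the following. This is a hypothetical example: every value is an illustrative placeholder, not taken from the repo.

```yaml
# Hypothetical example config; all paths and values are placeholders.
ct_model_path: models/nllb-600M-ct2        # CTranslate2-converted model directory
sp_model_path: models/nllb_spm.model       # NLLB SentencePiece model
batch_size: 32
beam_size: 4

path: example/test-dataset                 # Hugging Face dataset ID
src_config: eng_Latn
tgt_config: amh_Ethi
text_col: sentence
split: test
src_lang: eng_Latn
tgt_lang: amh_Ethi
comet_model_name: Unbabel/wmt22-comet-da

debug: false
log_dir: logs
log_file: evaluation.log
results_file: results.json
evaluation_summary_file: summary.csv
```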
To evaluate your model, prepare a configuration file and run evaluate.py:
python evaluate.py --config configs/nllb_distilled_600M_full_dataset_finetuend_no_quant.yaml
Results are displayed in a formatted table and also saved as CSV and JSON files. Logs are written to a log file.
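Persisting the scores in both formats can be sketched with the standard library alone. This is a hypothetical helper, assuming the `results_file` and `evaluation_summary_file` paths from the config; the repo's actual field names and layout may differ, and the metric values below are made-up placeholders.

```python
import csv
import json

def save_results(scores, results_file, summary_file):
    """Write the metric scores as JSON and as a one-row CSV summary."""
    # Full results as JSON (results_file in the config).
    with open(results_file, "w", encoding="utf-8") as f:
        json.dump(scores, f, indent=2)
    # One-row summary as CSV (evaluation_summary_file in the config).
    with open(summary_file, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(scores))
        writer.writeheader()
        writer.writerow(scores)

# Placeholder scores for illustration only.
save_results({"BLEU": 30.1, "chrF++": 55.2, "COMET": 0.82},
             "results.json", "summary.csv")
```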
This repository is part of the AfriNLLB project. If you use any part of the project's code, data, models, or approaches, please cite the following paper:
@inproceedings{moslem-etal-2026-afrinllb,
title = "{A}fri{NLLB}: Efficient Translation Models for African Languages",
author = "Moslem, Yasmin and
Wassie, Aman Kassahun and
Gizachew, Amanuel",
booktitle = "Proceedings of the Seventh Workshop on African Natural Language Processing (AfricaNLP)",
month = jul,
year = "2026",
address = "Rabat, Morocco",
publisher = "Association for Computational Linguistics",
}