This repository is the official open-source companion to the paper:
HSGraphAgent: Knowledge-Graph-Guided Large Language Models for Harmonized System Code Classification
The paper has been accepted to the ACL 2026 main conference.
HSBench provides:
- A knowledge graph for Harmonized System (HS) codes
- Benchmark datasets for HS code classification
- Open resources that support the paper's graph-guided modeling and dataset design
This repository provides an open and extensible benchmark and knowledge resource for hierarchical HS code classification and reasoning.
The release includes a curated HS-domain knowledge base together with multilingual benchmark datasets for evaluating structure-aware, graph-guided, and reasoning-based classification methods.
- knowledge/: machine-readable HS knowledge graph data
- datasets/: multilingual benchmark datasets for HS classification
- datasets/cn/: Chinese benchmark sets
- datasets/en/: English benchmark sets
- hscode_cn_2025/: Chinese HS hierarchy and rule resources in CSV format
Accurate classification under the Harmonized System is hierarchical, rule-constrained, and knowledge-intensive. Many benchmarks still treat HS coding as flat text classification and do not expose hierarchy, exclusion rules, or a clean split between domain knowledge and task data.
This repository addresses that gap by releasing:
- a structured HS knowledge layer covering taxonomy, definitions, and exclusions
- task-oriented benchmark datasets for hierarchical consistency and reasoning evaluation
The knowledge resources are reusable on their own and are not tied to one model or evaluation setup.
The knowledge/ directory stores task-independent HS domain knowledge.
Defines HS nodes across hierarchical levels. Each node includes a code, level, and official name, and may also include a semantic summary, explanatory description, or exclusion note.
Defines the explicit hierarchy of the HS system through parent-child relations across these levels:
- section
- chapter
- heading
- subheading
- extension
Encodes exclusion relations derived from HS explanatory rules. This supports regulation-aware reasoning, invalid-path detection, and hierarchical backtracking.
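As a rough illustration of how the three knowledge components (nodes, parent-child hierarchy, exclusion relations) could fit together in memory, here is a minimal sketch. The class `HSGraph`, its method names, the direction of the exclusion edge, and the toy codes are all assumptions made for this example; they are not the file schema or the API of this release.

```python
from dataclasses import dataclass, field

# Level names as listed in this README.
LEVELS = ("section", "chapter", "heading", "subheading", "extension")


@dataclass
class HSGraph:
    """Hypothetical in-memory view of the knowledge graph (not the release schema)."""
    level: dict = field(default_factory=dict)     # code -> level name
    parent: dict = field(default_factory=dict)    # child code -> parent code
    excludes: dict = field(default_factory=dict)  # code -> set of codes its notes rule out

    def add_node(self, code, level, parent=None):
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level}")
        self.level[code] = level
        if parent is not None:
            self.parent[code] = parent

    def add_exclusion(self, source, target):
        # Assumed semantics: a note attached to `source` rules out `target`.
        self.excludes.setdefault(source, set()).add(target)

    def path_to_root(self, code):
        """Return the code followed by its ancestors, ending at the top-level node."""
        path = [code]
        while path[-1] in self.parent:
            path.append(self.parent[path[-1]])
        return path

    def is_blocked(self, candidate, source):
        """True if an exclusion on `source` targets `candidate` or one of its ancestors."""
        ruled_out = self.excludes.get(source, set())
        return any(node in ruled_out for node in self.path_to_root(candidate))


# Toy placeholder nodes and one toy exclusion; not taken from the released data.
g = HSGraph()
g.add_node("S1", "section")
g.add_node("10", "chapter", parent="S1")
g.add_node("1001", "heading", parent="10")
g.add_node("100110", "subheading", parent="1001")
g.add_node("20", "chapter", parent="S1")
g.add_node("2002", "heading", parent="20")
g.add_exclusion("1001", "20")

print(g.path_to_root("100110"))
print(g.is_blocked("2002", "1001"))
```

A classifier that reaches a blocked candidate can walk back up `path_to_root` and try a sibling branch, which is one simple way to read "hierarchical backtracking".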
The datasets/ directory contains task-oriented HS classification benchmarks, organized by language and label granularity.
- Languages: cn, en
- Granularity: 4-digit, 6-digit
Each sample contains:
- id: unique sample identifier
- input.text: product description text
- label: hierarchical HS labels by level
HS names are excluded from inputs to reduce label leakage.
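The sample fields can be sanity-checked with a few lines of code. The sketch below assumes each sample is a JSON object with a nested `input` object and a `label` object keyed by level (`hs2`, `hs4`, `hs6`), and that codes are stored as strings. The on-disk format is not specified here, so treat the layout, the helper name `check_sample`, and the placeholder values as illustrative only.

```python
import json

# Assumed code width per label key.
WIDTHS = {"hs2": 2, "hs4": 4, "hs6": 6}


def check_sample(sample):
    """Return a list of problems found in one sample; an empty list means none found."""
    problems = [f"missing field: {k}" for k in ("id", "input", "label") if k not in sample]
    if problems:
        return problems
    if not sample["input"].get("text"):
        problems.append("empty input.text")
    label = sample["label"]
    present = [k for k in ("hs2", "hs4", "hs6") if k in label]
    for key in present:
        if len(label[key]) != WIDTHS[key]:
            problems.append(f"{key} should have {WIDTHS[key]} digits")
    # Each finer code must extend the coarser code before it.
    for coarse, fine in zip(present, present[1:]):
        if not label[fine].startswith(label[coarse]):
            problems.append(f"{fine} does not extend {coarse}")
    return problems


# Placeholder record, not taken from the released datasets.
raw = '{"id": "demo-1", "input": {"text": "placeholder product description"}, "label": {"hs2": "85", "hs4": "8517", "hs6": "851713"}}'
sample = json.loads(raw)
print(check_sample(sample))
```

Keeping codes as strings preserves leading zeros, which matters for chapters 01 through 09.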
Targets 4-digit heading classification. Labels include hs2 and hs4.
Targets 6-digit subheading classification. Labels include hs2, hs4, and hs6.
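Because labels are given per level, predictions can be scored at each level separately. The following is one possible sketch, not the evaluation protocol used in the paper: it assumes a model outputs a single code string per sample and truncates that string to 2, 4, and 6 digits before comparing it with the gold labels. The function name `per_level_accuracy` and the example values are hypothetical.

```python
# Assumed code width per label key.
WIDTHS = {"hs2": 2, "hs4": 4, "hs6": 6}


def per_level_accuracy(gold_labels, predicted_codes, levels=("hs2", "hs4", "hs6")):
    """Compute prefix-match accuracy at each requested level.

    gold_labels: list of dicts such as {"hs2": "85", "hs4": "8517", "hs6": "851713"}
    predicted_codes: list of predicted code strings, one per sample
    """
    if not gold_labels or len(gold_labels) != len(predicted_codes):
        raise ValueError("gold_labels and predicted_codes must be non-empty and aligned")
    correct = {lv: 0 for lv in levels}
    for gold, pred in zip(gold_labels, predicted_codes):
        for lv in levels:
            # A prediction shorter than the level width can never match.
            if pred[:WIDTHS[lv]] == gold[lv]:
                correct[lv] += 1
    return {lv: correct[lv] / len(gold_labels) for lv in levels}


# Placeholder gold labels and predictions for illustration only.
gold = [
    {"hs2": "85", "hs4": "8517", "hs6": "851713"},
    {"hs2": "84", "hs4": "8471", "hs6": "847130"},
]
preds = ["851762", "900000"]
scores = per_level_accuracy(gold, preds)
print(scores)
```

For the 4-digit variant, pass `levels=("hs2", "hs4")` so that the missing `hs6` key is never read.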
This repository is intended for HS code classification and reasoning research, especially:
- studying hierarchical HS coding
- enabling knowledge-graph-guided LLM methods
- providing reproducible data and knowledge resources
If you use the resources in this repository, please cite the paper:
HSGraphAgent: Knowledge-Graph-Guided Large Language Models for Harmonized System Code Classification