Add MGP-STR (alibaba-damo/mgp-str-base) image-to-text task support #952
ssss141414 wants to merge 1 commit into
Adds Effort-L1-light registration so MGP-STR scene-text-recognition models resolve under the user-facing 'image-to-text' task label. The vendor MgpstrOnnxConfig (Optimum) already exposes the 3-head outputs (char_logits, bpe_logits, wp_logits) correctly but is registered only under feature-extraction. This PR adds a task-label alias plus a MODEL_CLASS_MAPPING binding to MgpstrForSceneTextRecognition.

Files:
- src/winml/modelkit/models/hf/mgp_str.py: MgpstrImage2TextOnnxConfig subclass (58 lines)
- src/winml/modelkit/models/hf/__init__.py: 3-line wiring
- examples/recipes/alibaba-damo_mgp-str-base/image-to-text_config.json: recipe (49 lines)
- examples/recipes/README.md: catalog row
- research/adding-model-support/model_knowledge/mgp_str.json: mgp_str-004 finding

Goal-ladder (alibaba-damo/mgp-str-base @ image-to-text @ fp32 @ cpu):
- L0 PASS: build 83.7s, 374 nodes, 564.5 MB optimized
- L1 PASS: avg=100.76ms, P90=123.26ms, 9.92 samples/sec (20 iters)
- L2 PASS: cosine vs PyTorch reference all 3 heads >=0.999999 (max-abs <3e-4)
- L3 CLI-BLOCKED: image-to-text task has no default dataset (same as nlpconnect/vit-gpt2-image-captioning per known limitation)

Step 1b verification: baseline 'winml build' on main fails with 'mgp-str doesn't support task image-to-text' (real engineering delta, not catalog-only).
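For readers checking the L1 numbers, here is a minimal sketch of how avg, P90, and samples/sec could be derived from per-iteration latencies. The function name `summarize_latencies`, the nearest-rank percentile method, and the batch-size-1 assumption are illustrative and not taken from the `winml perf` implementation, which may compute P90 differently.

```python
def summarize_latencies(latencies_ms):
    """Return (mean ms, P90 ms, samples/sec) for per-iteration latencies.

    Hypothetical helper for illustration; not part of winml.
    """
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    n = len(ordered)
    avg_ms = sum(ordered) / n
    # Nearest-rank P90: rank = ceil(0.9 * n), computed with integer arithmetic.
    rank = (9 * n + 9) // 10
    p90_ms = ordered[rank - 1]
    # Assumes one sample per iteration (batch size 1).
    samples_per_sec = 1000.0 / avg_ms
    return avg_ms, p90_ms, samples_per_sec


# Consistency check of the reported L1 figures: 1000 ms / 100.76 ms per sample.
reported_avg_ms = 100.76
print(round(1000.0 / reported_avg_ms, 2))  # 9.92
```

Under the batch-size-1 assumption, the reported 9.92 samples/sec matches the reported 100.76 ms average.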
Reviewer verification: OV cpu / gpu / npu, branch `shzhen/add-mgp-str-base`
Commands
```powershell
# config
uv run winml config -m alibaba-damo/mgp-str-base --task image-to-text -o temp/verify_pr952_mgpstr_config.json

# build (OV CPU, fp32, using recipe)
uv run winml build -c examples/recipes/alibaba-damo_mgp-str-base/image-to-text_config.json -m alibaba-damo/mgp-str-base -o temp/verify_pr952_mgpstr_build --ep openvino --device cpu --precision fp32 --no-quant --no-compile --rebuild

# perf: cpu / gpu / npu (from built ONNX, 5 iters + 2 warmup)
uv run winml perf -m temp/verify_pr952_mgpstr_build/model.onnx --ep openvino --device cpu --iterations 5 --warmup 2 --skip-build -f json

# eval
uv run winml eval -m alibaba-damo/mgp-str-base --task image-to-text --device cpu --ep openvino --samples 1
```
Results
Notes:
Validation results (2026-06-25) for PR #952 on this Windows ARM64 host.
Scope
Main branch baseline (before PR)
PR #952 branch
Conclusion
ADDENDUM: main branch baseline (NO support)
On current `main` @ HEAD:
Conclusion: This PR adds `image-to-text` task support (via `MgpstrImage2TextOnnxConfig` alias + `MODEL_CLASS_MAPPING` binding). Without this PR, mgp-str only works under `feature-extraction`. The engineering delta is real (not catalog-only). All OV devices now pass config/build/perf validation.
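To make the alias idea concrete, the sketch below models task resolution with plain dictionaries. Everything here is a stand-in: `ONNX_CONFIGS`, `resolve`, `supports`, `add_image_to_text_alias`, and the `*Stub` classes are hypothetical and do not reproduce the Optimum or winml registration API. Only the task labels, the output names, and the `MgpstrForSceneTextRecognition` binding come from this PR.

```python
class MgpstrOnnxConfigStub:
    """Placeholder for the vendor config; output names are taken from the PR text."""
    outputs = ("char_logits", "bpe_logits", "wp_logits")


class MgpstrImage2TextOnnxConfigStub(MgpstrOnnxConfigStub):
    """Alias subclass: same outputs, exposed under a second task label."""


# Hypothetical registries keyed by (model_type, task).
ONNX_CONFIGS = {("mgp-str", "feature-extraction"): MgpstrOnnxConfigStub}
MODEL_CLASS_MAPPING = {}


def supports(model_type, task):
    return (model_type, task) in ONNX_CONFIGS


def resolve(model_type, task):
    """Return (config class, model class name or None) or raise if unsupported."""
    key = (model_type, task)
    if key not in ONNX_CONFIGS:
        raise KeyError(f"{model_type} doesn't support task {task}")
    return ONNX_CONFIGS[key], MODEL_CLASS_MAPPING.get(key)


def add_image_to_text_alias():
    """Register the alias config and bind the head-bearing model class."""
    key = ("mgp-str", "image-to-text")
    ONNX_CONFIGS[key] = MgpstrImage2TextOnnxConfigStub
    MODEL_CLASS_MAPPING[key] = "MgpstrForSceneTextRecognition"


print(supports("mgp-str", "image-to-text"))  # False
add_image_to_text_alias()
print(supports("mgp-str", "image-to-text"))  # True
```

Because the alias subclass inherits from the vendor stub, the three output heads are identical under both task labels in this model; only the lookup key and the bound model class differ.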
Is the exported model the same as the one exported for the currently supported task?
Summary
Adds Effort-L1-light registration so MGP-STR scene-text-recognition models resolve under the user-facing `image-to-text` task label. The vendor `MgpstrOnnxConfig` (Optimum) already exposes the 3-head outputs (`char_logits`, `bpe_logits`, `wp_logits`) correctly, but is registered ONLY under `feature-extraction`. This PR adds a task-label alias + `MODEL_CLASS_MAPPING` binding to `MgpstrForSceneTextRecognition` (the head-bearing class; MGP-STR is NOT a generic Vision2Seq).

Files changed (5)
- `src/winml/modelkit/models/hf/mgp_str.py` (NEW, 58 lines): `MgpstrImage2TextOnnxConfig(MgpstrOnnxConfig)` subclass
- `src/winml/modelkit/models/hf/__init__.py`: 3-line wiring
- `examples/recipes/alibaba-damo_mgp-str-base/image-to-text_config.json` (NEW, 49 lines): recipe
- `examples/recipes/README.md`: catalog row
- `research/adding-model-support/model_knowledge/mgp_str.json`: `mgp_str-004` post-mortem finding

Goal-ladder verdict
`alibaba-damo/mgp-str-base @ image-to-text @ fp32 @ cpu`
- L3 (CLI-blocked): `image-to-text` task has no default dataset (same as vit-gpt2)

Step 1b verification: real engineering vs catalog-only
- `winml config --task image-to-text` (recipe is autoconf-faithful)
- `mgp-str doesn't support task image-to-text for the onnx backend.` → real engineering delta, NOT catalog-only.

Known gotchas
- `architectures: ['MGPSTRModel']` but current `transformers` exports `MgpstrModel` (CamelCase rename). Without an explicit `--task image-to-text`, `winml inspect/config/build` fail with `Cannot import MGPSTRModel from transformers`. CLI robustness gap separate from this PR.
- `a3_module` heads are non-fatal on CPU.

Verification
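The L2 parity check reported for this PR (cosine vs the PyTorch reference on all three heads, plus max-abs difference) could be reproduced along these lines. This is a sketch under assumptions: the helper names are hypothetical, the logits are treated as flattened lists of floats, and using the reported figures (0.999999 and 3e-4) as pass thresholds is an illustrative choice, not a documented winml rule.

```python
import math

HEADS = ("char_logits", "bpe_logits", "wp_logits")


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length flattened vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def max_abs_diff(a, b):
    """Largest element-wise absolute difference."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))


def heads_match(reference, exported, min_cosine=0.999999, max_abs=3e-4):
    """True if every head meets both the cosine and max-abs criteria."""
    for name in HEADS:
        ref, out = reference[name], exported[name]
        if cosine_similarity(ref, out) < min_cosine:
            return False
        if max_abs_diff(ref, out) >= max_abs:
            return False
    return True
```

In practice `reference` would hold the flattened PyTorch outputs and `exported` the ONNX outputs for the same input image, each as a dict keyed by head name.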