Add MGP-STR (alibaba-damo/mgp-str-base) image-to-text task support by ssss141414 · Pull Request #952 · microsoft/winml-cli

ssss141414 · 2026-06-24T04:14:31Z

Summary

Adds Effort-L1-light registration so MGP-STR scene-text-recognition models resolve under the user-facing image-to-text task label. The vendor MgpstrOnnxConfig (Optimum) already exposes the 3-head outputs (char_logits, bpe_logits, wp_logits) correctly, but is registered ONLY under feature-extraction. This PR adds a task-label alias + MODEL_CLASS_MAPPING binding to MgpstrForSceneTextRecognition (the head-bearing class — MGP-STR is NOT a generic Vision2Seq).
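The alias mechanism described above can be pictured with a toy registry. This is a minimal sketch, not the Optimum or winml implementation: `REGISTRY`, `resolve`, and `register` are hypothetical names, and the two classes are empty stand-ins for the real configs.

```python
from typing import Dict, Type


class MgpstrOnnxConfig:
    """Stand-in for the vendor config that exposes the three logits heads."""
    OUTPUTS = ("char_logits", "bpe_logits", "wp_logits")


class MgpstrImage2TextOnnxConfig(MgpstrOnnxConfig):
    """Alias subclass: same outputs, registered under a different task label."""


# Hypothetical registry: model_type -> task label -> config class.
REGISTRY: Dict[str, Dict[str, Type[MgpstrOnnxConfig]]] = {
    "mgp-str": {"feature-extraction": MgpstrOnnxConfig},
}


def resolve(model_type: str, task: str) -> Type[MgpstrOnnxConfig]:
    """Look up the config class for a task, failing the way the baseline build does."""
    tasks = REGISTRY.get(model_type, {})
    if task not in tasks:
        supported = ", ".join(sorted(tasks))
        raise ValueError(
            f"{model_type} doesn't support task {task} for the onnx backend. "
            f"Supported tasks are: {supported}."
        )
    return tasks[task]


def register(model_type: str, task: str, config_cls: Type[MgpstrOnnxConfig]) -> None:
    """Add a task-label alias for an existing model type."""
    REGISTRY.setdefault(model_type, {})[task] = config_cls
```

Before `register("mgp-str", "image-to-text", MgpstrImage2TextOnnxConfig)` is called, `resolve("mgp-str", "image-to-text")` raises; afterwards it returns the alias class, whose outputs are inherited unchanged.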

Files changed (5)

src/winml/modelkit/models/hf/mgp_str.py (NEW, 58 lines) — MgpstrImage2TextOnnxConfig(MgpstrOnnxConfig) subclass
src/winml/modelkit/models/hf/__init__.py — 3-line wiring
examples/recipes/alibaba-damo_mgp-str-base/image-to-text_config.json (NEW, 49 lines) — recipe
examples/recipes/README.md — catalog row
research/adding-model-support/model_knowledge/mgp_str.json — mgp_str-004 post-mortem finding

Goal-ladder verdict

alibaba-damo/mgp-str-base @ image-to-text @ fp32 @ cpu

| Tier | Verdict | Evidence |
|---|---|---|
| L0 build | PASS | 83.7s, 374 nodes, 564.5 MB optimized; autoconf converged in 2 iters |
| L1 perf | PASS | avg=100.76ms, P90=123.26ms, 9.92 samples/sec (20 iters CPU) |
| L2 numerical | PASS | cosine vs PT: char=0.99999999999992, bpe=0.99999999999974, wp=0.99999999999860; max-abs 5.7e-05 / 2.4e-04 / 2.1e-04 |
| L3 eval | CLI-BLOCKED | `image-to-text` task has no default dataset (same as vit-gpt2) |
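The two L2 metrics measure different things, which is presumably why both are reported. A minimal pure-Python sketch of how they could be computed over flattened logits follows; the function names are illustrative, and the actual comparison script is not shown in this PR.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two flattened tensors (direction only)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference (sensitive to scale and offsets)."""
    return max(abs(x - y) for x, y in zip(a, b))
```

Cosine similarity ignores uniform scaling, so `[1, 2, 3]` and `[2, 4, 6]` score about 1.0 even though their max-abs difference is 3. Checking max-abs alongside cosine guards against that kind of drift.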

Step 1b verification — real engineering vs catalog-only

Gate 1 (auto-config-diff): identical to winml config --task image-to-text (recipe is autoconf-faithful)
Gate 2 (baseline build on main): FAILS with mgp-str doesn't support task image-to-text for the onnx backend. → real engineering delta, NOT catalog-only.
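Gate 1 amounts to a structural comparison of two JSON documents. A minimal sketch of such a check, assuming both configs are plain JSON objects; `diff_configs` is a hypothetical helper, not a winml command.

```python
import json
from typing import Any, List


def diff_configs(a: Any, b: Any, path: str = "") -> List[str]:
    """Return dotted paths at which two JSON-like values differ."""
    if isinstance(a, dict) and isinstance(b, dict):
        diffs: List[str] = []
        for key in sorted(set(a) | set(b)):
            child = f"{path}.{key}" if path else str(key)
            if key not in a or key not in b:
                diffs.append(child)  # key present on one side only
            else:
                diffs.extend(diff_configs(a[key], b[key], child))
        return diffs
    return [] if a == b else [path or "<root>"]


def configs_identical(recipe_text: str, autoconf_text: str) -> bool:
    """True when the two JSON texts parse to the same structure."""
    return not diff_configs(json.loads(recipe_text), json.loads(autoconf_text))
```

An empty list from `diff_configs` corresponds to the "identical" verdict. Any returned path names a field where the recipe departs from what auto-configuration would produce.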

Known gotchas

The HF model card declares the legacy architectures: ['MGPSTRModel'], but current transformers exports MgpstrModel (CamelCase rename). Unless --task image-to-text is passed explicitly, winml inspect/config/build fail with Cannot import MGPSTRModel from transformers. This is a CLI robustness gap separate from this PR.
3 Einsum ops in a3_module heads are non-fatal on CPU.
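One possible way to close the architecture-name gap, which is not part of this PR, is a case-insensitive fallback when the declared class name is missing. A minimal sketch under that assumption; `resolve_architecture` is a hypothetical helper and the list of available names is supplied by the caller.

```python
from typing import Iterable, Optional


def resolve_architecture(declared: str, available: Iterable[str]) -> Optional[str]:
    """Map a declared class name to an available one, ignoring case as a fallback."""
    names = list(available)
    if declared in names:
        return declared  # exact match wins
    folded = declared.casefold()
    matches = [name for name in names if name.casefold() == folded]
    # Only accept an unambiguous case-insensitive match.
    return matches[0] if len(matches) == 1 else None
```

With this fallback, `resolve_architecture("MGPSTRModel", ["MgpstrModel", "ViTModel"])` returns `"MgpstrModel"`, while an unknown name still returns `None`, so the caller can raise the original import error.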

Verification

uv run winml build -c examples/recipes/alibaba-damo_mgp-str-base/image-to-text_config.json -m alibaba-damo/mgp-str-base -o temp/mgp_build --ep cpu --device cpu --rebuild
uv run winml perf -m temp/mgp_build/model.onnx --ep cpu --device cpu --iterations 20

Adds Effort-L1-light registration so MGP-STR scene-text-recognition models resolve under the user-facing 'image-to-text' task label. The vendor MgpstrOnnxConfig (Optimum) already exposes the 3-head outputs (char_logits, bpe_logits, wp_logits) correctly but is registered only under feature-extraction. This PR adds a task-label alias plus MODEL_CLASS_MAPPING binding to MgpstrForSceneTextRecognition.

Files:
- src/winml/modelkit/models/hf/mgp_str.py: MgpstrImage2TextOnnxConfig subclass (58 lines)
- src/winml/modelkit/models/hf/__init__.py: 3-line wiring
- examples/recipes/alibaba-damo_mgp-str-base/image-to-text_config.json: recipe (49 lines)
- examples/recipes/README.md: catalog row
- research/adding-model-support/model_knowledge/mgp_str.json: mgp_str-004 finding

Goal-ladder (alibaba-damo/mgp-str-base @ image-to-text @ fp32 @ cpu):
- L0 PASS: build 83.7s, 374 nodes, 564.5 MB optimized
- L1 PASS: avg=100.76ms, P90=123.26ms, 9.92 samples/sec (20 iters)
- L2 PASS: cosine vs PyTorch reference all 3 heads >=0.999999 (max-abs <3e-4)
- L3 CLI-BLOCKED: image-to-text task has no default dataset (same as nlpconnect/vit-gpt2-image-captioning per known limitation)

Step 1b verification: baseline 'winml build' on main fails with 'mgp-str doesn't support task image-to-text' (real engineering delta, not catalog-only).

ssss141414 · 2026-06-25T03:31:28Z

Reviewer verification: OV cpu / gpu / npu, branch `shzhen/add-mgp-str-base`

Commands

```powershell
# config
uv run winml config -m alibaba-damo/mgp-str-base --task image-to-text -o temp/verify_pr952_mgpstr_config.json

# build (OV CPU, fp32, using recipe)
uv run winml build -c examples/recipes/alibaba-damo_mgp-str-base/image-to-text_config.json -m alibaba-damo/mgp-str-base -o temp/verify_pr952_mgpstr_build --ep openvino --device cpu --precision fp32 --no-quant --no-compile --rebuild

# perf: cpu / gpu / npu (from built ONNX, 5 iters + 2 warmup)
uv run winml perf -m temp/verify_pr952_mgpstr_build/model.onnx --ep openvino --device cpu --iterations 5 --warmup 2 --skip-build -f json
uv run winml perf -m temp/verify_pr952_mgpstr_build/model.onnx --ep openvino --device gpu --iterations 5 --warmup 2 --skip-build -f json
uv run winml perf -m temp/verify_pr952_mgpstr_build/model.onnx --ep openvino --device npu --iterations 5 --warmup 2 --skip-build -f json

# eval
uv run winml eval -m alibaba-damo/mgp-str-base --task image-to-text --device cpu --ep openvino --samples 1
```

Results

| Command | cpu | gpu | npu |
|---|---|---|---|
| config | ✅ PASS | n/a | n/a |
| build | ✅ PASS (79s, 564.5 MB, autoconf converged in 2 iters) | n/a | n/a |
| perf mean | ✅ 305 ms/iter | ✅ 9.1 ms/iter | ✅ 22 ms/iter |
| perf throughput | 3.27 samples/s | 109.38 samples/s | 45.48 samples/s |
| eval | ❌ CLI-BLOCKED | ❌ CLI-BLOCKED | ❌ CLI-BLOCKED |
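The mean and throughput rows are two views of the same measurement. A minimal sketch of how such a summary could be derived from raw per-iteration latencies, assuming batch size 1 and a nearest-rank P90; the percentile method the CLI actually uses is not stated here.

```python
from typing import Dict, List


def summarize(latencies_ms: List[float], batch_size: int = 1) -> Dict[str, float]:
    """Mean latency, nearest-rank P90, and throughput from per-iteration timings."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    n = len(ordered)
    mean_ms = sum(ordered) / n
    rank = (9 * n + 9) // 10  # ceil(0.9 * n) in integer arithmetic
    return {
        "mean_ms": mean_ms,
        "p90_ms": ordered[rank - 1],
        "samples_per_sec": batch_size * 1000.0 / mean_ms,
    }
```

Under this definition, a 305 ms mean gives about 3.28 samples/s and a 22 ms mean about 45.5 samples/s, close to the reported 3.27 and 45.48; the small gaps are consistent with the table showing rounded means.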

Notes:

`config` and `build` pass on CPU, and `perf` passes on all three OV devices. OV sessions were created successfully for cpu, gpu, and npu.
Build emits 3 `Einsum` op warnings (`OpUnsupportedError: Node Einsum is not supported`) for `char_a3_module`, `bpe_a3_module`, and `wp_a3_module`, consistent with the 'non-fatal on CPU' note in the PR. OV EP handles these via fallback.
`eval` returns `No dataset provided and no default for task 'image-to-text'. Use --dataset.` This is the same CLI-BLOCKED verdict described in the PR (same as vit-gpt2) and is not an OV EP limitation.
ONNX artifact: 374 nodes (post-optimize), opset 17, fp32, input: `pixel_values[1,3,32,128]`, outputs: `char_logits[1,27,38]`, `bpe_logits[1,27,50257]`, `wp_logits[1,27,30522]`.
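Each output head is a `[batch, positions, vocab]` tensor, so turning it into text means picking a token per position. The following is a minimal greedy-decoding sketch for one head. It assumes the first position is a start slot and that an end-of-sequence token terminates the string, which should be checked against the Hugging Face processor; the four-token vocabulary is a toy, not the model's real character set.

```python
from typing import Sequence


def greedy_decode(
    logits: Sequence[Sequence[float]],
    vocab: Sequence[str],
    eos: str = "[s]",
) -> str:
    """Argmax each position, skip the first (start) slot, stop at the EOS token."""
    chars = []
    for step in logits[1:]:  # assumption: position 0 is a start slot
        best = max(range(len(step)), key=step.__getitem__)
        token = vocab[best]
        if token == eos:
            break
        chars.append(token)
    return "".join(chars)


# Toy example: 5 positions over a hypothetical 4-token vocabulary.
TOY_VOCAB = ["[GO]", "[s]", "a", "b"]
TOY_LOGITS = [
    [9.0, 0.0, 0.0, 0.0],  # start slot, ignored
    [0.0, 0.1, 3.0, 0.2],  # "a"
    [0.0, 0.1, 0.2, 3.0],  # "b"
    [0.0, 5.0, 0.2, 0.3],  # EOS, decoding stops here
    [0.0, 0.1, 4.0, 0.3],  # never reached
]
```

For the real artifact, the same loop would run over the 27 positions of `char_logits` with the model's own vocabulary. The `bpe_logits` and `wp_logits` heads need their respective tokenizers instead of a character table.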

ssss141414 · 2026-06-25T03:32:34Z

Validation results (2026-06-25) for PR #952 on this Windows ARM64 host.

Scope

Compare main vs PR branch behavior
Verify winml config on QNN NPU/GPU

Main branch baseline (before PR)

Command: uv run winml config -m alibaba-damo/mgp-str-base --task image-to-text --ep cpu --device cpu
Result: FAIL
Error: mgp-str doesn't support task image-to-text for the onnx backend. Supported tasks are: feature-extraction.

PR #952 branch

CPU config: PASS
- uv run winml config -m alibaba-damo/mgp-str-base --task image-to-text --ep cpu --device cpu
- Resolved to Device=CPU, EP=CPUExecutionProvider
QNN NPU config: PASS
- uv run winml config -m alibaba-damo/mgp-str-base --task image-to-text --ep qnn --device npu
- Resolved to Device=NPU, EP=QNNExecutionProvider
QNN GPU config: PASS
- uv run winml config -m alibaba-damo/mgp-str-base --task image-to-text --ep qnn --device gpu
- Resolved to Device=GPU, EP=QNNExecutionProvider
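The resolution lines above pair each `--ep`/`--device` combination with an ONNX Runtime provider name. A minimal sketch of that mapping follows; `EP_NAMES`, `resolve_target`, and `describe` are illustrative names, and the CLI's real resolution logic may consider more than these two flags.

```python
from typing import Tuple

# ONNX Runtime provider names for the two --ep values used in this comment.
EP_NAMES = {
    "cpu": "CPUExecutionProvider",
    "qnn": "QNNExecutionProvider",
}


def resolve_target(ep: str, device: str) -> Tuple[str, str]:
    """Return (device label, provider name) for an --ep/--device pair."""
    key = ep.lower()
    if key not in EP_NAMES:
        raise ValueError(f"unknown execution provider: {ep}")
    return device.upper(), EP_NAMES[key]


def describe(ep: str, device: str) -> str:
    """Format the pair the way the validation log reports it."""
    dev, provider = resolve_target(ep, device)
    return f"Resolved to Device={dev}, EP={provider}"
```

For example, `describe("qnn", "npu")` yields `Resolved to Device=NPU, EP=QNNExecutionProvider`, matching the QNN NPU entry.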

Conclusion

Confirmed: this PR adds real image-to-text task support for mgp-str (main fails, PR passes), including QNN NPU/GPU configuration resolution.

ssss141414 · 2026-06-25T03:34:14Z

ADDENDUM: main branch baseline (NO support)

On current `main` @ HEAD:
```powershell
uv run winml config -m alibaba-damo/mgp-str-base --task image-to-text
```
Returns:
```
Error: mgp-str doesn't support task image-to-text for the onnx backend. Supported tasks are: feature-extraction.
```

Conclusion: This PR adds `image-to-text` task support (via `MgpstrImage2TextOnnxConfig` alias + `MODEL_CLASS_MAPPING` binding). Without this PR, mgp-str only works under `feature-extraction`. The engineering delta is real (not catalog-only). All OV devices now pass config/build/perf validation.

xieofxie · 2026-06-25T06:22:16Z

Is the exported model the same as the one exported under the currently supported task?

ssss141414 wants to merge 1 commit into main from shzhen/add-mgp-str-base
