feat: add im-markdown output for doc fetch by liujiashu-shiro · Pull Request #1550 · larksuite/cli

liujiashu-shiro · 2026-06-23T09:47:29Z

Summary

Add an im-markdown output format for doc fetch, converting Docx content into Markdown suitable for IM messages. The change expands conversion coverage for common document structures and documents the intended lark-doc to lark-im usage path.

Changes

Add Docx-to-IM-Markdown conversion logic
Support --doc-format im-markdown in doc fetch
Cover headings, lists, code blocks, tables, images, links, nested structures, and edge cases in unit tests
Extend docs_fetch_v2 tests for the new format behavior
Document im-markdown in lark-doc fetch references as a fetch-only format for lark-im usage
Document the lark-im sending workflow for forwarding fetched doc content with --markdown
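
The code-block item above depends on choosing a fence that backticks inside the code cannot close early (the CodeRabbit walkthrough calls this "backtick-run sizing"). Below is a minimal Go sketch of that rule. It assumes the fence is one backtick longer than the longest backtick run in the code, with a minimum of three. `longestBacktickRun` and `fencedCodeBlock` are hypothetical names, not the PR's helpers.

```go
package main

import "strings"

// longestBacktickRun returns the length of the longest run of consecutive
// backticks in s.
func longestBacktickRun(s string) int {
	longest, run := 0, 0
	for _, r := range s {
		if r == '`' {
			run++
			if run > longest {
				longest = run
			}
		} else {
			run = 0
		}
	}
	return longest
}

// fencedCodeBlock wraps code in a fence that is at least three backticks long
// and strictly longer than any backtick run inside the code, so an embedded
// fence cannot terminate the block.
func fencedCodeBlock(lang, code string) string {
	n := longestBacktickRun(code) + 1
	if n < 3 {
		n = 3
	}
	fence := strings.Repeat("`", n)
	return fence + lang + "\n" + strings.TrimRight(code, "\n") + "\n" + fence
}
```

Under this rule, a code sample that itself contains a triple-backtick fence gets wrapped in a four-backtick fence.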

Test Plan

Unit tests passed: go test ./shortcuts/doc
Format check passed: gofmt -l shortcuts/doc/docs_fetch_im_markdown.go shortcuts/doc/docs_fetch_im_markdown_test.go shortcuts/doc/docs_fetch_v2.go shortcuts/doc/docs_fetch_v2_test.go
Diff whitespace check passed: git diff --check
Manually verify lark-cli docs +fetch --doc-format im-markdown output can be sent through lark-im with --markdown

Related Issues

None

Summary by CodeRabbit

New Features
- Added im-markdown as an allowed --doc-format for v2 +fetch. It fetches as standard Markdown from the API, then converts IM-style markup (headings, callouts, blockquotes, lists, grids/columns, tables, sheets/bookmarks, and citations) into clean Markdown, including nested and partially malformed fragments.
Bug Fixes
- Improved post-processing robustness for unclosed containers and scanner/attribute edge cases, preserving or safely dropping fragments as appropriate.
Tests
- Expanded unit and integration-style coverage for request construction/downgrades and Markdown conversion behaviors, including escaping.
Documentation
- Updated lark-doc/lark-im docs to clarify im-markdown is fetch-only and how to send converted content as a message.

coderabbitai · 2026-06-23T09:47:46Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Walkthrough

Adds im-markdown as a new --doc-format option for +fetch. The flag value is downgraded to markdown when calling the /fetch API. The returned content, which contains XML-style IM markup tags, is then post-processed by a new converter that scans for registered tags and rewrites them as standard Markdown, with comprehensive test coverage for all tag handlers and edge cases. Documentation updates explain the fetch-only usage in lark-im scenarios and the workflow for sending doc content as messages.
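
A minimal Go sketch of the downgrade-then-post-process flow described above. `effectiveFetchFormat` is named in the PR, but the signature used here is an assumption; `applyIMMarkdownSketch` and its `convert` parameter are hypothetical stand-ins for `applyFetchIMMarkdown` and the converter. Content that is not a string is left unchanged, in line with how `TestApplyFetchIMMarkdown` is described.

```go
package main

// effectiveFetchFormat maps the user-facing --doc-format value to the format
// sent to the /fetch API. im-markdown is a client-side post-processing mode,
// so the request asks for plain markdown. (Signature assumed.)
func effectiveFetchFormat(docFormat string) string {
	if docFormat == "im-markdown" {
		return "markdown"
	}
	return docFormat
}

// applyIMMarkdownSketch rewrites document["content"] in place, but only when
// the user asked for im-markdown and the content is a string; any other shape
// is left untouched. convert is a hypothetical stand-in for the converter.
func applyIMMarkdownSketch(docFormat string, document map[string]any, convert func(string) string) {
	if docFormat != "im-markdown" {
		return
	}
	if content, ok := document["content"].(string); ok {
		document["content"] = convert(content)
	}
}
```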

Changes

IM-Markdown Conversion Pipeline

Layer / File(s)	Summary
Fetch v2 flag, format downgrade, and post-processing hook `shortcuts/doc/docs_fetch_v2.go`	Adds `im-markdown` to the `--doc-format` enum, introduces `effectiveFetchFormat` to map `im-markdown`→`markdown` for the outgoing API request, and inserts a post-processing call to `applyFetchIMMarkdown` after the fetch response is received.
Converter context, handler registry, and utilities `shortcuts/doc/docs_fetch_im_markdown.go`	Defines `imMarkdownContext` and handler types, precompiles close-regexes and attribute/cell/link detection regexes, registers all supported tag handlers in `init`, and implements `newIMMarkdownContext` with tenant-aware base URL extraction and blockquote depth tracking.
Main tag scanning and dispatch loop `shortcuts/doc/docs_fetch_im_markdown.go`	Implements the main `convertToIMMarkdown` loop: scans for the next registered tag, preserves intervening text unchanged, parses attributes with HTML unescaping, routes self-closing tags directly to handlers, and locates matching closing tags using depth tracking.
Block-level element handlers `shortcuts/doc/docs_fetch_im_markdown.go`	Implements handlers for `title`/headings, paragraphs, line breaks, lists (`ul`/`ol` with `li` via `seq`), `callout` with optional emoji prefixes, blockquotes with depth tracking, and passthrough containers (`grid`/`column`), plus a generic discard handler.
Code, media, and resource link handlers `shortcuts/doc/docs_fetch_im_markdown.go`	Implements handlers for inline backtick code (`whiteboard`), fenced code blocks with optional language and backtick-run sizing, inline LaTeX, horizontal rules, image/source rendering, and `sheet`/`bookmark` conversion to Markdown links using computed base URL.
Table-to-Markdown conversion `shortcuts/doc/docs_fetch_im_markdown.go`	Extracts `tr`/`td`/`th` structures with depth-aware matching, recursively converts nested registered tags in cells, normalizes `<br>` to newlines, strips unknown tags while preserving `<at>` content, HTML-unescapes and pipe-escapes cell text, and pads rows to consistent column counts.
List-body conversion helper `shortcuts/doc/docs_fetch_im_markdown.go`	Implements `ul`/`ol` list conversion by iterating `li` blocks via depth-tracked matching, converting each `li` body to Markdown, applying ordered numbering rules (`seq` or fallback index), and indenting continuation lines.
Citation, link, text, and escaping utilities `shortcuts/doc/docs_fetch_im_markdown.go`	Adds helpers to extract inner anchor href/text, convert markup to plain text with tag stripping and unescaping, build Markdown links with character escaping, compute inline/fenced code fences from backtick runs, apply list continuation indentation, and select first non-empty value.
Converter test infrastructure and apply-function test `shortcuts/doc/docs_fetch_im_markdown_test.go`	Adds test case structure and helpers for table-driven converter assertions, plus `TestApplyFetchIMMarkdown` to verify mutation behavior when `document.content` is a string and tenant URL extraction for context initialization.
Unit tests for tag handlers `shortcuts/doc/docs_fetch_im_markdown_test.go`	Tests individual handler behavior: `title` (trimming, inner markup, concatenation, case-insensitivity, unclosed), `callout` (emoji, nesting, recursive same-name, embedded tags, unclosed), `blockquote` (nested markers, paragraph handling), `grid`/`column` (newline separation, nesting, empty behavior, unclosed), `table` (header/data inference, pipe escaping, `br` normalization, nested tags, padding, unclosed), discard tags (dropping specific containers including self-closing variants), `whiteboard` (backtick escaping, paired/self-closing, unclosed), `sheet` (context-dependent links, missing attributes), `bookmark` (label precedence, href fallback, escaping, missing href, wrapped tags), and `cite` variants (user/doc/citation/unknown with attribute precedence and fallbacks).
Edge case and integration tests `shortcuts/doc/docs_fetch_im_markdown_test.go`	Tests scanner/parsing boundaries (unknown tag preservation with known child conversion, single-quoted attributes, leading text before tags, XML comments, `br` conversion, malformed attributes), composite nesting (callout-grid-table-cites-sheets, nested grids in table cells, bookmark-wrapping-callout fallback), unclosed fragments (preserving opening tags, leaving nested content unconverted across multiple tag types), deep nesting robustness (repeated `grid`/`column` and emoji-wrapped `callout` containers), document-wide tag/escaping smoke test (headings, paragraphs, lists, inline formatting, links, LaTeX, code blocks with backtick escaping, media), mixed-document smoke test (verifying conversions appear while raw fragments don't), and base URL extraction from various URL formats and token-only inputs.
Fetch v2 integration tests for im-markdown `shortcuts/doc/docs_fetch_v2_test.go`	Adds tests for fetch-body construction (revision_id, export_option for with-ids detail, read_option mapping for scope/boundary), detail downgrade no-op assertion for supported format/detail combinations, im-markdown dry-run format downgrade, detail-downgrade export_option validation for markdown and im-markdown, API error handling, and end-to-end test verifying IM XML tags in stubbed response are converted in JSON output.
User-facing documentation updates `skills/lark-doc/SKILL.md`, `skills/lark-doc/references/lark-doc-fetch.md`, `skills/lark-doc/references/lark-doc-md.md`, `skills/lark-im/SKILL.md`	Clarifies that `im-markdown` is a fetch-only format for lark-im scenarios (not for create/update), updates the `--doc-format` parameter table and notes to include `im-markdown`, and adds workflow guidance to lark-im SKILL.md explaining how to fetch doc content in `im-markdown` and send via `--markdown` while preserving user cites.
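
Several rows above rely on depth tracking to find the closing tag that matches an opening tag, so a nested same-name element (for example a `callout` inside a `callout`) does not end the outer one early. A minimal Go sketch, assuming plain string scanning rather than an XML parser; `findMatchingClose`, `hasPrefixFold`, and `isTagBoundary` are hypothetical names, and attribute values that contain `>` are not handled.

```go
package main

import "strings"

// hasPrefixFold reports whether s has prefix p at index i, ignoring case.
func hasPrefixFold(s string, i int, p string) bool {
	return i+len(p) <= len(s) && strings.EqualFold(s[i:i+len(p)], p)
}

// isTagBoundary reports whether a tag name ends at j, so "<grid" does not
// match "<gridx>".
func isTagBoundary(s string, j int) bool {
	if j >= len(s) {
		return false
	}
	switch s[j] {
	case '>', '/', ' ', '\t', '\n', '\r':
		return true
	}
	return false
}

// findMatchingClose returns the index of the "</tag>" that closes the element
// whose body starts at bodyStart, or -1 if the element is unclosed. Nested
// same-name openings increase the depth; self-closing "<tag/>" forms do not.
func findMatchingClose(s, tag string, bodyStart int) int {
	open := "<" + tag
	closeTag := "</" + tag + ">"
	depth := 1
	for i := bodyStart; i < len(s); {
		switch {
		case hasPrefixFold(s, i, closeTag):
			depth--
			if depth == 0 {
				return i
			}
			i += len(closeTag)
		case hasPrefixFold(s, i, open) && isTagBoundary(s, i+len(open)):
			end := strings.IndexByte(s[i:], '>')
			if end < 0 {
				return -1 // opening tag never terminated
			}
			if s[i+end-1] != '/' {
				depth++ // nested, non-self-closing opening tag
			}
			i += end + 1
		default:
			i++
		}
	}
	return -1
}
```

Returning -1 for an unclosed element lets a caller keep the original fragment unchanged, which matches the "preserving opening tags" behavior the test descriptions mention.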

Sequence Diagram(s)

sequenceDiagram
  actor User
  participant CLI as +fetch CLI
  participant executeFetchV2
  participant effectiveFetchFormat
  participant APIServer as /fetch API
  participant applyFetchIMMarkdown
  participant convertToIMMarkdown

  User->>CLI: +fetch --doc-format im-markdown <docToken>
  CLI->>executeFetchV2: execute with im-markdown
  executeFetchV2->>effectiveFetchFormat: compute wire format
  effectiveFetchFormat-->>executeFetchV2: "markdown"
  executeFetchV2->>APIServer: POST /fetch {format: "markdown"}
  APIServer-->>executeFetchV2: {content: "<title>...</title><callout>...</callout>..."}
  executeFetchV2->>applyFetchIMMarkdown: post-process document
  applyFetchIMMarkdown->>convertToIMMarkdown: scan and convert IM tags
  convertToIMMarkdown-->>applyFetchIMMarkdown: "# Title\n> callout...\n..."
  applyFetchIMMarkdown-->>executeFetchV2: document with Markdown content
  executeFetchV2-->>User: JSON output with converted content
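
The diagram shows a callout turning into `> callout...`. One plausible rendering is sketched below in Go. It assumes callouts and blockquotes both map to `>`-prefixed lines, with one marker per nesting level; `quoteLines` and `calloutToQuote` are hypothetical helpers, not the PR's handlers. Blank lines keep a bare `>` so that a multi-paragraph quote stays a single block.

```go
package main

import "strings"

// quoteLines prefixes every line of body with depth ">" markers
// (depth 2 yields "> > "). Blank lines get a bare marker so paragraph
// breaks stay inside the quote instead of ending it.
func quoteLines(body string, depth int) string {
	marker := strings.TrimSpace(strings.Repeat("> ", depth))
	lines := strings.Split(strings.Trim(body, "\n"), "\n")
	for i, line := range lines {
		if line == "" {
			lines[i] = marker
		} else {
			lines[i] = marker + " " + line
		}
	}
	return strings.Join(lines, "\n")
}

// calloutToQuote renders a callout body as a blockquote, optionally
// prefixing the first line with the callout's emoji.
func calloutToQuote(emoji, body string) string {
	if emoji != "" {
		body = emoji + " " + strings.TrimLeft(body, "\n")
	}
	return quoteLines(body, 1)
}
```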

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

larksuite/cli#1466: Modifies the same executeFetchV2 flow and --doc-format downgrade handling in docs_fetch_v2.go, with direct coupling to this PR's format-routing and post-processing wiring.
larksuite/cli#1291: Standardizes +fetch command onto v2 flag/validation paths; this PR extends v2FetchFlags and hooks into the same v2 fetch execution flow.
larksuite/cli#638: Introduces the original v2 fetch pipeline and executeFetchV2 infrastructure that this PR now extends with post-processing and im-markdown conversion support.

Suggested labels

feature, size/XL

Suggested reviewers

YangJunzhou-01
SunPeiYang996

Poem

🐇 A new format hops into the fold,
IM-markdown tags turned into gold!
<callout> and <title> take their bow,
Converted to Markdown — neat as a plow.
The rabbit scans tags with depth and care,
Tables and links bloom fresh in the air! 🌿

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 2.65%, which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main change: adding a new im-markdown output format for doc fetch functionality.
Description check	✅ Passed	The description covers all required template sections: Summary, Changes with bullet points, Test Plan with checkboxes, and Related Issues. Testing details are documented.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


codecov · 2026-06-23T09:52:36Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.29%. Comparing base (d71bab0) to head (df09172).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1550      +/-   ##
==========================================
+ Coverage   74.04%   74.29%   +0.24%     
==========================================
  Files         787      788       +1     
  Lines       76353    76916     +563     
==========================================
+ Hits        56534    57141     +607     
+ Misses      15572    15536      -36     
+ Partials     4247     4239       -8


github-actions · 2026-06-23T09:53:25Z

🚀 PR Preview Install Guide

🧰 CLI update

npm i -g https://pkg.pr.new/larksuite/cli/@larksuite/cli@df091727e11420ecd182a00a39efd75696c86258

🧩 Skill update

npx skills add larksuite/cli#feat/doc_im_markdown -y -g

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

shortcuts/doc/docs_fetch_im_markdown.go (1)
296-313: 🎯 Functional Correctness | 🔵 Trivial

Nested tables inside table cells will be mis-parsed due to non-greedy regex matching.

The non-greedy (.*?)</t[dh]> pattern in imMarkdownCellsRE matches the first closing </td> or </th>, which would be the inner table's cell tag if a <table> is nested within a <td>. This truncates the cell content and corrupts the row. The handler correctly handles other nested elements like <grid> (which use different tag names), but <table> uses the same tag names and will break.

No tests currently cover nested tables. If Docx exports can produce nested tables, add a test case or document this as a known limitation and fall back to inline code for cells containing nested <table> elements.
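
A short Go demonstration of the failure mode, plus the kind of guard the comment suggests. `cellsRE` only mirrors the shape of the non-greedy pattern described here; the actual `imMarkdownCellsRE` is not reproduced, and `firstCell` and `hasNestedTable` are hypothetical.

```go
package main

import (
	"regexp"
	"strings"
)

// cellsRE has the same shape as the non-greedy cell pattern the review
// describes; the real imMarkdownCellsRE may differ.
var cellsRE = regexp.MustCompile(`(?is)<t[dh][^>]*>(.*?)</t[dh]>`)

// firstCell returns the first cell captured by cellsRE. With a nested table,
// the lazy (.*?) stops at the inner </td>, truncating the outer cell.
func firstCell(row string) string {
	m := cellsRE.FindStringSubmatch(row)
	if m == nil {
		return ""
	}
	return m[1]
}

// hasNestedTable is the suggested guard: detect a nested <table> before
// converting a cell so the caller can fall back instead of corrupting the row.
func hasNestedTable(cell string) bool {
	return strings.Contains(strings.ToLower(cell), "<table")
}
```

For `<td>a<table><tr><td>x</td></tr></table>b</td>`, the captured cell is `a<table><tr><td>x`, which the guard flags. The later commit `fix: handle im markdown nested tables and urls` suggests this was addressed; the sketch only illustrates the original concern.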
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@shortcuts/doc/docs_fetch_im_markdown.go` around lines 296 - 313, The
handleIMMarkdownTable function has a bug where nested tables inside cells will
be mis-parsed because the imMarkdownCellsRE regex uses a non-greedy pattern that
matches the first closing </td> or </th> tag, which would be from an inner table
instead of the outer cell. To fix this, before processing the cell content in
the inner loop where cellMatch[1] is used, add a check to detect if the cell
content contains a nested <table> element. If it does, either fall back to
calling imMarkdownInlineCode on the segment or skip processing that row to avoid
corrupting the output. This guard should be placed before the
normalizeIMMarkdownTableCell and convertToIMMarkdown calls.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@shortcuts/doc/docs_fetch_im_markdown.go`:
- Around line 412-421: The `markdownLink` function does not URL-encode the href
parameter before inserting it into the markdown link format. Apply URL encoding
to the href string in the `markdownLink` function before passing it to
fmt.Sprintf, ensuring that special characters like spaces are encoded as %20 and
parentheses as %28 and %29 to comply with Lark/Feishu Markdown requirements. Use
the appropriate URL encoding function from the standard library to encode the
href while maintaining the fmt.Sprintf call structure.
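
A minimal Go sketch of the requested encoding. It swaps in a targeted `strings.NewReplacer` for space, `(`, and `)` instead of a blanket standard-library escaper, because `url.PathEscape` would also encode `/` and break absolute URLs. It also backslash-escapes `[`, `]`, and `\` in the link text. `markdownLinkSketch` is hypothetical, and whether Lark/Feishu Markdown needs further characters encoded is not verified here.

```go
package main

import (
	"fmt"
	"strings"
)

// hrefEscaper percent-encodes only the characters the review calls out,
// leaving "/" and existing %XX sequences intact.
var hrefEscaper = strings.NewReplacer(" ", "%20", "(", "%28", ")", "%29")

// textEscaper backslash-escapes characters that would end the link text early.
var textEscaper = strings.NewReplacer(`\`, `\\`, "[", `\[`, "]", `\]`)

// markdownLinkSketch builds [text](href) with both parts escaped.
func markdownLinkSketch(text, href string) string {
	return fmt.Sprintf("[%s](%s)", textEscaper.Replace(text), hrefEscaper.Replace(href))
}
```

For example, a link labeled `Q3 [draft]` pointing at `https://example.com/a b(1)` becomes `[Q3 \[draft\]](https://example.com/a%20b%281%29)`.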



ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 29d4a8f7-db41-46e6-ba9c-332e69158f45

📥 Commits

Reviewing files that changed from the base of the PR and between 736b131 and 453c74b.

📒 Files selected for processing (4)

shortcuts/doc/docs_fetch_im_markdown.go
shortcuts/doc/docs_fetch_im_markdown_test.go
shortcuts/doc/docs_fetch_v2.go
shortcuts/doc/docs_fetch_v2_test.go

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@shortcuts/doc/docs_fetch_v2_test.go`:
- Around line 667-669: In the TestDocsFetchV2ReturnsAPIError test, replace the
simple strings.Contains check for "fetch failed" with comprehensive typed error
assertions. Use errs.ProblemOf to extract and validate the error's typed
metadata including category, subtype, and param fields to ensure the API error
contract is properly maintained. Additionally, verify that the error cause chain
is preserved by unwrapping the error to check that the underlying error is
accessible, rather than only validating the error message text.
- Around line 325-331: The test for validateReadModeFlags() currently validates
error details using string substring matching with strings.Contains(err.Error(),
tt.wantParam), which doesn't catch classification regressions. Replace this with
typed error metadata assertions by removing the substring check and instead use
errs.ProblemOf to assert the error's category and subtype, and use errors.As to
extract the *errs.ValidationError and directly assert its Param field. Apply
this pattern to all error-path tests in the file including the instance at lines
421-423.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cbea290c-ced3-4ba5-8a54-08e298cfa67a

📥 Commits

Reviewing files that changed from the base of the PR and between 389d80f and c882023.

📒 Files selected for processing (2)

shortcuts/doc/docs_fetch_im_markdown_test.go
shortcuts/doc/docs_fetch_v2_test.go

github-actions Bot added the labels domain/ccm (PR touches the ccm domain) and size/L (Large or sensitive change across domains or core paths) on Jun 23, 2026

coderabbitai Bot reviewed Jun 23, 2026

Comment thread shortcuts/doc/docs_fetch_im_markdown.go

github-actions Bot added the domain/im label (PR touches the im domain) on Jun 23, 2026

coderabbitai Bot reviewed Jun 24, 2026

Comment thread shortcuts/doc/docs_fetch_v2_test.go Outdated

Comment thread shortcuts/doc/docs_fetch_v2_test.go Outdated

liujiashu-shiro force-pushed the feat/doc_im_markdown branch from dbb5acb to 986b98e on June 24, 2026 03:26

SunPeiYang996 reviewed Jun 24, 2026

Comment thread skills/lark-doc/SKILL.md Outdated

Comment thread skills/lark-doc/SKILL.md Outdated

liujiashu-shiro added 12 commits June 24, 2026 15:19

feat: add docs im-markdown fetch format

b4c2efa

refactor: tune docs im-markdown conversion

8ba3a03

test: expand docs im-markdown conversion coverage

8de9c90

refactor: simplify docs im-markdown handlers

0fe5b09

test: cover docs im-markdown edge cases

696aea6

fix: expand doc im markdown tag downgrades

07d84a3

fix: preserve blockquote paragraph breaks

25a4ba3

fix: handle im markdown nested tables and urls

705aba0

docs: document im markdown skill usage

42aee57

test: cover doc im markdown fetch

28f1349

test: strengthen doc fetch error coverage

ddf79ce

fix: fetch doc skill typo

df09172

liujiashu-shiro force-pushed the feat/doc_im_markdown branch from c4e340d to df09172 on June 24, 2026 07:20

liujiashu-shiro wants to merge 12 commits into main from feat/doc_im_markdown
