feat: add im-markdown output for doc fetch#1550
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds ChangesIM-Markdown Conversion Pipeline
Sequence Diagram(s)sequenceDiagram
actor User
participant CLI as +fetch CLI
participant executeFetchV2
participant effectiveFetchFormat
participant APIServer as /fetch API
participant applyFetchIMMarkdown
participant convertToIMMarkdown
User->>CLI: +fetch --doc-format im-markdown <docToken>
CLI->>executeFetchV2: execute with im-markdown
executeFetchV2->>effectiveFetchFormat: compute wire format
effectiveFetchFormat-->>executeFetchV2: "markdown"
executeFetchV2->>APIServer: POST /fetch {format: "markdown"}
APIServer-->>executeFetchV2: {content: "<title>...</title><callout>...</callout>..."}
executeFetchV2->>applyFetchIMMarkdown: post-process document
applyFetchIMMarkdown->>convertToIMMarkdown: scan and convert IM tags
convertToIMMarkdown-->>applyFetchIMMarkdown: "# Title\n> callout...\n..."
applyFetchIMMarkdown-->>executeFetchV2: document with Markdown content
executeFetchV2-->>User: JSON output with converted content
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1550 +/- ##
==========================================
+ Coverage 74.04% 74.29% +0.24%
==========================================
Files 787 788 +1
Lines 76353 76916 +563
==========================================
+ Hits 56534 57141 +607
+ Misses 15572 15536 -36
+ Partials 4247 4239 -8 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
🚀 PR Preview Install Guide🧰 CLI updatenpm i -g https://pkg.pr.new/larksuite/cli/@larksuite/cli@df091727e11420ecd182a00a39efd75696c86258🧩 Skill updatenpx skills add larksuite/cli#feat/doc_im_markdown -y -g |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
shortcuts/doc/docs_fetch_im_markdown.go (1)
296-313: 🎯 Functional Correctness | 🔵 TrivialNested tables inside table cells will be mis-parsed due to non-greedy regex matching.
The non-greedy
(.*?)</t[dh]>pattern inimMarkdownCellsREmatches the first closing</td>or</th>, which would be the inner table's cell tag if a<table>is nested within a<td>. This truncates the cell content and corrupts the row. The handler correctly handles other nested elements like<grid>(which use different tag names), but<table>uses the same tag names and will break.No tests currently cover nested tables. If Docx exports can produce nested tables, add a test case or document this as a known limitation and fall back to inline code for cells containing nested
<table>elements.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@shortcuts/doc/docs_fetch_im_markdown.go` around lines 296 - 313, The handleIMMarkdownTable function has a bug where nested tables inside cells will be mis-parsed because the imMarkdownCellsRE regex uses a non-greedy pattern that matches the first closing </td> or </th> tag, which would be from an inner table instead of the outer cell. To fix this, before processing the cell content in the inner loop where cellMatch[1] is used, add a check to detect if the cell content contains a nested <table> element. If it does, either fall back to calling imMarkdownInlineCode on the segment or skip processing that row to avoid corrupting the output. This guard should be placed before the normalizeIMMarkdownTableCell and convertToIMMarkdown calls.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@shortcuts/doc/docs_fetch_im_markdown.go`:
- Around line 412-421: The `markdownLink` function does not URL-encode the href
parameter before inserting it into the markdown link format. Apply URL encoding
to the href string in the `markdownLink` function before passing it to
fmt.Sprintf, ensuring that special characters like spaces are encoded as %20 and
parentheses as %28 and %29 to comply with Lark/Feishu Markdown requirements. Use
the appropriate URL encoding function from the standard library to encode the
href while maintaining the fmt.Sprintf call structure.
---
Nitpick comments:
In `@shortcuts/doc/docs_fetch_im_markdown.go`:
- Around line 296-313: The handleIMMarkdownTable function has a bug where nested
tables inside cells will be mis-parsed because the imMarkdownCellsRE regex uses
a non-greedy pattern that matches the first closing </td> or </th> tag, which
would be from an inner table instead of the outer cell. To fix this, before
processing the cell content in the inner loop where cellMatch[1] is used, add a
check to detect if the cell content contains a nested <table> element. If it
does, either fall back to calling imMarkdownInlineCode on the segment or skip
processing that row to avoid corrupting the output. This guard should be placed
before the normalizeIMMarkdownTableCell and convertToIMMarkdown calls.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 29d4a8f7-db41-46e6-ba9c-332e69158f45
📒 Files selected for processing (4)
shortcuts/doc/docs_fetch_im_markdown.goshortcuts/doc/docs_fetch_im_markdown_test.goshortcuts/doc/docs_fetch_v2.goshortcuts/doc/docs_fetch_v2_test.go
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@shortcuts/doc/docs_fetch_v2_test.go`:
- Around line 667-669: In the TestDocsFetchV2ReturnsAPIError test, replace the
simple strings.Contains check for "fetch failed" with comprehensive typed error
assertions. Use errs.ProblemOf to extract and validate the error's typed
metadata including category, subtype, and param fields to ensure the API error
contract is properly maintained. Additionally, verify that the error cause chain
is preserved by unwrapping the error to check that the underlying error is
accessible, rather than only validating the error message text.
- Around line 325-331: The test for validateReadModeFlags() currently validates
error details using string substring matching with strings.Contains(err.Error(),
tt.wantParam), which doesn't catch classification regressions. Replace this with
typed error metadata assertions by removing the substring check and instead use
errs.ProblemOf to assert the error's category and subtype, and use errors.As to
extract the *errs.ValidationError and directly assert its Param field. Apply
this pattern to all error-path tests in the file including the instance at lines
421-423.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: cbea290c-ced3-4ba5-8a54-08e298cfa67a
📒 Files selected for processing (2)
shortcuts/doc/docs_fetch_im_markdown_test.goshortcuts/doc/docs_fetch_v2_test.go
dbb5acb to
986b98e
Compare
c4e340d to
df09172
Compare
Summary
Add an
im-markdownoutput format for doc fetch, converting Docx content into Markdown suitable for IM messages. The change expands conversion coverage for common document structures and documents the intendedlark-doctolark-imusage path.
Changes
--doc-format im-markdownin doc fetchdocs_fetch_v2tests for the new format behaviorim-markdowninlark-docfetch references as a fetch-only format forlark-imusagelark-imsending workflow for forwarding fetched doc content with--markdownTest Plan
go test ./shortcuts/docgofmt -l shortcuts/doc/docs_fetch_im_markdown.go shortcuts/doc/docs_fetch_im_markdown_test.go shortcuts/doc/docs_fetch_v2.go shortcuts/doc/docs_fetch_v2_test.gogit diff --checklark-cli docs +fetch --doc-format im-markdownoutput can be sent throughlark-imwith--markdownRelated Issues
None
Summary by CodeRabbit
New Features
im-markdownas an allowed--doc-formatfor v2+fetch. It fetches as standard Markdown from the API, then converts IM-style markup (headings, callouts, blockquotes, lists, grids/columns, tables, sheets/bookmarks, and citations) into clean Markdown, including nested and partially malformed fragments.Bug Fixes
Tests
Documentation
lark-doc/lark-imdocs to clarifyim-markdownis fetch-only and how to send converted content as a message.