Skip to content

fix: remove 'data' from _REMOVE_FILE_COMMAND_RESULT#199

Merged
bjk7119 merged 1 commit intomainfrom
data
Apr 15, 2026
Merged

fix: remove 'data' from _REMOVE_FILE_COMMAND_RESULT#199
bjk7119 merged 1 commit intomainfrom
data

Conversation

@bjk7119
Copy link
Copy Markdown
Contributor

@bjk7119 bjk7119 commented Apr 8, 2026

Description

AI/ML model and dataset files (e.g. ONNX, TFLite, TF Checkpoint, SafeTensors, GGUF, Caffe, GGML, word2vec, LevelDB, LMDB, MXNet, MessagePack) return 'data' from magic, causing check_binary() to incorrectly return False and skip them entirely.

Remove 'data' from _REMOVE_FILE_COMMAND_RESULT so that these files proceed to the is_binary() check via binaryornot, which correctly identifies them as binary.

Summary by CodeRabbit

  • Bug Fixes
    • Improved binary file classification accuracy. The detection algorithm has been refined to correctly identify binary files that were previously misclassified as non-binary. This results in enhanced reliability and precision of analysis across diverse file formats and improved overall system performance.

@bjk7119 bjk7119 requested review from dd-jy and soimkim April 8, 2026 05:00
@bjk7119 bjk7119 self-assigned this Apr 8, 2026
@bjk7119 bjk7119 added the chore [PR/Issue] Refactoring, maintenance the code label Apr 8, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 8, 2026

Warning

Rate limit exceeded

@bjk7119 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 21 minutes and 8 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 21 minutes and 8 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ad21cb1f-ff91-4935-accd-ba1af488da1a

📥 Commits

Reviewing files that changed from the base of the PR and between de6a35e and 0f22a0e.

📒 Files selected for processing (1)
  • src/fosslight_binary/binary_analysis.py
📝 Walkthrough

Walkthrough

A private constant _REMOVE_FILE_COMMAND_RESULT in the binary analysis module was simplified by removing the 'data' entry. This change alters binary classification behavior for files where magic output begins with that prefix.

Changes

Cohort / File(s) Summary
Binary Classification Logic
src/fosslight_binary/binary_analysis.py
Removed 'data' entry from _REMOVE_FILE_COMMAND_RESULT constant, changing how magic output prefixes are handled during binary detection.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested reviewers

  • soimkim
  • dd-jy
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: removing 'data' from the _REMOVE_FILE_COMMAND_RESULT constant, which directly matches the primary functional change in the changeset.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch data

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/fosslight_binary/binary_analysis.py (1)

36-36: Add a regression test for the 'data' magic-output path.

Given this behavior change, please add coverage that verifies files with magic output starting with 'data' are no longer short-circuited at the prefix filter and proceed to is_binary() classification.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/fosslight_binary/binary_analysis.py` at line 36, Add a regression test
that ensures magic outputs starting with "data" are not filtered by the
_REMOVE_FILE_COMMAND_RESULT prefix check and instead proceed to binary
classification: create a test (e.g., test_binary_analysis.py) that mocks the
magic inspection used by binary_analysis (mock magic.from_file or the wrapper
used) to return a string beginning with "data" and assert that the code does not
early-return due to _REMOVE_FILE_COMMAND_RESULT and that is_binary() (or the
public classification function that calls is_binary) is invoked and its result
used; reference _REMOVE_FILE_COMMAND_RESULT and is_binary() in the test to make
intent clear.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/fosslight_binary/binary_analysis.py`:
- Line 36: The docstring example in check_binary() references a removed item
("data") from the _REMOVE_FILE_COMMAND_RESULT list; update the docstring to
reflect the current contents of _REMOVE_FILE_COMMAND_RESULT (e.g., remove the
"'data'" example or replace it with an accurate example such as "'timezone data'
or 'apple binary property list' ) so the example matches the actual list and
avoids confusion when reading _REMOVE_FILE_COMMAND_RESULT and check_binary().

---

Nitpick comments:
In `@src/fosslight_binary/binary_analysis.py`:
- Line 36: Add a regression test that ensures magic outputs starting with "data"
are not filtered by the _REMOVE_FILE_COMMAND_RESULT prefix check and instead
proceed to binary classification: create a test (e.g., test_binary_analysis.py)
that mocks the magic inspection used by binary_analysis (mock magic.from_file or
the wrapper used) to return a string beginning with "data" and assert that the
code does not early-return due to _REMOVE_FILE_COMMAND_RESULT and that
is_binary() (or the public classification function that calls is_binary) is
invoked and its result used; reference _REMOVE_FILE_COMMAND_RESULT and
is_binary() in the test to make intent clear.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 70968e7f-4d61-4e99-a0b1-7a590a331636

📥 Commits

Reviewing files that changed from the base of the PR and between 36046e3 and de6a35e.

📒 Files selected for processing (1)
  • src/fosslight_binary/binary_analysis.py

Comment thread src/fosslight_binary/binary_analysis.py
@bjk7119 bjk7119 merged commit d8e05db into main Apr 15, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

chore [PR/Issue] Refactoring, maintenance the code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants