fix: remove 'data' from _REMOVE_FILE_COMMAND_RESULT by bjk7119 · Pull Request #199 · fosslight/fosslight_binary_scanner

bjk7119 · 2026-04-08T05:00:52Z

Description

AI/ML model and dataset files (e.g. ONNX, TFLite, TF Checkpoint, SafeTensors, GGUF, Caffe, GGML, word2vec, LevelDB, LMDB, MXNet, MessagePack) return 'data' from magic, causing check_binary() to incorrectly return False and skip them entirely.

Remove 'data' from _REMOVE_FILE_COMMAND_RESULT so that these files proceed to the is_binary() check via binaryornot, which correctly identifies them as binary.

Summary by CodeRabbit

Bug Fixes
- Improved binary file classification accuracy. The detection algorithm has been refined to correctly identify binary files that were previously misclassified as non-binary. This results in enhanced reliability and precision of analysis across diverse file formats and improved overall system performance.

coderabbitai · 2026-04-08T05:02:13Z

Warning

Rate limit exceeded

@bjk7119 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 21 minutes and 8 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 21 minutes and 8 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ad21cb1f-ff91-4935-accd-ba1af488da1a

📥 Commits

Reviewing files that changed from the base of the PR and between de6a35e and 0f22a0e.

📒 Files selected for processing (1)

src/fosslight_binary/binary_analysis.py

📝 Walkthrough

Walkthrough

A private constant _REMOVE_FILE_COMMAND_RESULT in the binary analysis module was simplified by removing the 'data' entry. This change alters binary classification behavior for files where magic output begins with that prefix.

Changes

Cohort / File(s)	Summary
Binary Classification Logic `src/fosslight_binary/binary_analysis.py`	Removed `'data'` entry from `_REMOVE_FILE_COMMAND_RESULT` constant, changing how magic output prefixes are handled during binary detection.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

fosslight/fosslight_binary_scanner#193: Directly related; modifies how _REMOVE_FILE_COMMAND_RESULT entries are applied in check_binary() logic.

Suggested reviewers

soimkim
dd-jy

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically describes the main change: removing 'data' from the _REMOVE_FILE_COMMAND_RESULT constant, which directly matches the primary functional change in the changeset.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch data

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

src/fosslight_binary/binary_analysis.py (1)

36-36: Add a regression test for the 'data' magic-output path.

Given this behavior change, please add coverage that verifies files with magic output starting with 'data' are no longer short-circuited at the prefix filter and proceed to is_binary() classification.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/fosslight_binary/binary_analysis.py` at line 36, Add a regression test
that ensures magic outputs starting with "data" are not filtered by the
_REMOVE_FILE_COMMAND_RESULT prefix check and instead proceed to binary
classification: create a test (e.g., test_binary_analysis.py) that mocks the
magic inspection used by binary_analysis (mock magic.from_file or the wrapper
used) to return a string beginning with "data" and assert that the code does not
early-return due to _REMOVE_FILE_COMMAND_RESULT and that is_binary() (or the
public classification function that calls is_binary) is invoked and its result
used; reference _REMOVE_FILE_COMMAND_RESULT and is_binary() in the test to make
intent clear.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/fosslight_binary/binary_analysis.py`:
- Line 36: The docstring example in check_binary() references a removed item
("data") from the _REMOVE_FILE_COMMAND_RESULT list; update the docstring to
reflect the current contents of _REMOVE_FILE_COMMAND_RESULT (e.g., remove the
"'data'" example or replace it with an accurate example such as "'timezone data'
or 'apple binary property list' ) so the example matches the actual list and
avoids confusion when reading _REMOVE_FILE_COMMAND_RESULT and check_binary().

---

Nitpick comments:
In `@src/fosslight_binary/binary_analysis.py`:
- Line 36: Add a regression test that ensures magic outputs starting with "data"
are not filtered by the _REMOVE_FILE_COMMAND_RESULT prefix check and instead
proceed to binary classification: create a test (e.g., test_binary_analysis.py)
that mocks the magic inspection used by binary_analysis (mock magic.from_file or
the wrapper used) to return a string beginning with "data" and assert that the
code does not early-return due to _REMOVE_FILE_COMMAND_RESULT and that
is_binary() (or the public classification function that calls is_binary) is
invoked and its result used; reference _REMOVE_FILE_COMMAND_RESULT and
is_binary() in the test to make intent clear.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 70968e7f-4d61-4e99-a0b1-7a590a331636

📥 Commits

Reviewing files that changed from the base of the PR and between 36046e3 and de6a35e.

📒 Files selected for processing (1)

src/fosslight_binary/binary_analysis.py

bjk7119 requested review from dd-jy and soimkim April 8, 2026 05:00

bjk7119 self-assigned this Apr 8, 2026

bjk7119 added the chore [PR/Issue] Refactoring, maintenance the code label Apr 8, 2026

coderabbitai bot reviewed Apr 8, 2026

View reviewed changes

Comment thread src/fosslight_binary/binary_analysis.py

fix: remove 'data' from _REMOVE_FILE_COMMAND_RESULT

0f22a0e

bjk7119 force-pushed the data branch from de6a35e to 0f22a0e Compare April 14, 2026 07:58

soimkim approved these changes Apr 15, 2026

View reviewed changes

bjk7119 merged commit d8e05db into main Apr 15, 2026
8 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: remove 'data' from _REMOVE_FILE_COMMAND_RESULT#199

fix: remove 'data' from _REMOVE_FILE_COMMAND_RESULT#199
bjk7119 merged 1 commit intomainfrom
data

bjk7119 commented Apr 8, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Apr 8, 2026 •

edited

Loading

Rate limit exceeded

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

bjk7119 commented Apr 8, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

bjk7119 commented Apr 8, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Apr 8, 2026 •

edited

Loading