Skip to content

Conversation

@CrazyRoka
Copy link
Contributor

Description

This PR fixes a panic in ptx when an invalid regular expression is passed to the -W (--word-regexp) argument.

Previously, running:
ptx -W 'bar\\\'
would result in a Rust panic (called Result::unwrap() on an Err value).

This change catches the regex::Error, wraps it in a USimpleError, and ensures the program exits with code 1 and prints a user-friendly message to stderr.

Testing

  • Added unit tests in tests/by-util/test_ptx.rs to verify:
    • Trailing backslashes (bar\) return exit code 1.
    • Unclosed groups ((wrong) return exit code 1.
    • The stderr contains "invalid regular expression".

@github-actions
Copy link

GNU testsuite comparison:

GNU test failed: tests/sort/sort-stale-thread-mem. tests/sort/sort-stale-thread-mem is passing on 'main'. Maybe you have to rebase?
Congrats! The gnu test tests/sort/sort-debug-keys is no longer failing!

@github-actions
Copy link

GNU testsuite comparison:

Congrats! The gnu test tests/sort/sort-debug-keys is no longer failing!

@github-actions
Copy link

GNU testsuite comparison:

GNU test failed: tests/sort/sort-stale-thread-mem. tests/sort/sort-stale-thread-mem is passing on 'main'. Maybe you have to rebase?

filter: &WordFilter,
file_map: &FileMap,
) -> UResult<BTreeSet<WordRef>> {
let reg = Regex::new(&filter.word_regex)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a branch with all of your ptx PR's to see if it passed the GNU tests and this part failed since when an invalid regex is provided its expected to print to stderr and return empty with a success code

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To solve this I tested with a:

/// Try to compile a regex, printing a warning and returning None on failure.
/// This matches GNU ptx behavior which handles invalid regex gracefully.
fn try_compile_regex(pattern: &str) -> Option<Regex> {
    Regex::new(pattern)
        .inspect_err(|e| show_error!("{}", translate!("ptx-error-invalid-regexp", "error" => e)))
        .ok()
}

So that the error handling for the create word set looks like this:

    let Some(reg) = try_compile_regex(&filter.word_regex) else {
        return Ok(BTreeSet::new());
    };
    let Some(ref_reg) = try_compile_regex(&config.context_regex) else {
        return Ok(BTreeSet::new());
    };

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually even though this will pass the GNU tests, I am not finding any cases where the Regex fails to compile and it provides a warning, will mainly just need to succeed those with an empty list

let mut file_map: FileMap = HashMap::new();
let mut offset: usize = 0;

let sentence_splitter = if let Some(re_str) = &config.sentence_regex {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same idea here:

    let sentence_splitter = config
        .sentence_regex
        .as_ref()
        .and_then(|re_str| try_compile_regex(re_str));

In GNU the expectation is to succeed while printing the error to stderr

fn test_invalid_regex_word_unclosed_group() {
new_ucmd!()
.args(&["-W", "(wrong"])
.fails_with_code(1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only other follow up would be to change these to reflect the success message, but the error message can stay the same

@ChrisDryden
Copy link
Collaborator

ChrisDryden commented Jan 7, 2026

Whats going to be tricky here is that some of the testing to compare the two implementations is going to not work because whats considered invalid to rust's regex does not line up 100% with the GNU regex. Some examples: bar\ and [abc

GNU allows trailing backslashes, but uutils would show a warning for bar\ whereas if theres an unmatched [ it fails completely for gnu ptx but would succeed with an error for the uutils implementation. I think for the time being it would be best to just have this as an open issue to track some of the edge cases and merge the PR's you have with the fixes that pass all of the test cases and then the deviation with GNU can be handled when uutils figures out what it wants to do about the whole regex library situation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants