Skip to content

[NAE-2408] Implement field sanitization modes for text data fields#432

Merged
machacjozef merged 5 commits intorelease/6.4.3from
NAE-2408
Apr 16, 2026
Merged

[NAE-2408] Implement field sanitization modes for text data fields#432
machacjozef merged 5 commits intorelease/6.4.3from
NAE-2408

Conversation

@renczesstefan
Copy link
Copy Markdown
Member

@renczesstefan renczesstefan commented Apr 15, 2026

Description

Implement field sanitization modes for text data fields

Implements NAE-2408

Dependencies

Third party dependencies

<dependency>
    <groupId>com.googlecode.owasp-java-html-sanitizer</groupId>
    <artifactId>owasp-java-html-sanitizer</artifactId>
    <version>20260313.1</version>
</dependency>

Blocking Pull requests

There are no dependencies on other PR.

How Has Been This Tested?

This was tested manually and with unit tests.

Test Configuration

Name Tested on
OS macOS Tahoe 26.3
Runtime Java 21
Dependency Manager Maven 3.9.9n
Framework version Spring Boot 3.4.4
Run parameters
Other configuration

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes have been checked, personally or remotely, with @tuplle
  • I have commented my code, particularly in hard-to-understand areas
  • I have resolved all conflicts with the target branch of the PR
  • I have updated and synced my code with the target branch
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing tests pass locally with my changes:
    • Lint test
    • Unit tests
    • Integration tests
  • I have checked my contribution with code analysis tools:
  • I have made corresponding changes to the documentation:
    • Developer documentation
    • User Guides
    • Migration Guides

Summary by CodeRabbit

  • New Features
    • Added configurable text field sanitization with multiple modes including plain text, various safe HTML presets, and JavaScript disabling.
    • Introduced sanitization actions to either sanitize unsafe content or reject it entirely.
    • Sanitization settings can be configured on a per-field basis.

machacjozef and others added 3 commits April 15, 2026 10:02
- Introduces `IFieldSanitizationService` and its implementation.
- Adds configuration for HTML sanitization modes and actions.
- Updates `DataService` to sanitize text fields during data modification.
- Includes unit and integration tests for sanitization behavior.
- Bumps project version to 6.4.3-SNAPSHOT.
Introduced a sanitization mechanism for text fields using IFieldSanitizationService, ensuring that input values are sanitized before being processed. This includes methods to resolve the appropriate field and component for sanitization. Updated the data parsing logic to integrate the sanitization step.
Revised test assertions to match the updated error message format. This ensures consistency with the new message structure indicating the configured action for unsafe content.
Changed the default fallback sanitization mode from PLAIN_TEXT to DISABLE_JAVASCRIPT in SanitizationMode and FieldSanitizationService. This ensures better protection against potential JavaScript-related vulnerabilities.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 15, 2026

Warning

Rate limit exceeded

@renczesstefan has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 22 minutes and 57 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 22 minutes and 57 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 33be2947-b0f0-41b2-835b-7dc1922f2184

📥 Commits

Reviewing files that changed from the base of the PR and between ff46792 and 569634b.

📒 Files selected for processing (3)
  • src/main/java/com/netgrif/application/engine/workflow/service/sanitization/SanitizationMode.java
  • src/test/java/com/netgrif/application/engine/workflow/service/FieldSanitizationIntegrationTest.java
  • src/test/java/com/netgrif/application/engine/workflow/service/FieldSanitizationServiceTest.java

Walkthrough

Version bump from 6.4.2 to 6.4.3-SNAPSHOT. Added OWASP HTML sanitizer dependency. Introduced field sanitization feature with FieldSanitizationService supporting multiple sanitization modes and rejection actions. Integrated sanitization into DataService.setData(). Comprehensive unit and integration tests added.

Changes

Cohort / File(s) Summary
Maven Configuration
pom.xml
Updated project version to 6.4.3-SNAPSHOT and added compile dependency on OWASP HTML sanitizer library (version 20260313.1).
Service Interface
src/main/java/com/netgrif/application/engine/workflow/service/interfaces/IFieldSanitizationService.java
New interface defining single sanitize(String value, Field<?> field) method for field value sanitization contract.
DataService Integration
src/main/java/com/netgrif/application/engine/workflow/service/DataService.java
Added IFieldSanitizationService dependency injection and integrated sanitization into setData() workflow. New protected methods: sanitizeValueIfNeeded(), resolveFieldForSanitization(), resolveSanitizationComponent() to handle field-level sanitization configuration and component resolution.
Sanitization Service Implementation
src/main/java/com/netgrif/application/engine/workflow/service/sanitization/FieldSanitizationService.java, src/main/java/com/netgrif/application/engine/workflow/service/sanitization/SanitizationMode.java, src/main/java/com/netgrif/application/engine/workflow/service/sanitization/SanitizationAction.java
Implemented FieldSanitizationService with multiple OWASP PolicyFactory instances for different sanitization modes. Added SanitizationMode enum (OFF, PLAIN_TEXT, SAFE_HTML variants, DISABLE_JAVASCRIPT) and SanitizationAction enum (SANITIZE, REJECT) with factory methods for string-to-enum conversion.
Unit Tests
src/test/java/com/netgrif/application/engine/workflow/service/FieldSanitizationServiceTest.java
Comprehensive unit tests for FieldSanitizationService covering null handling, mode/action resolution, safe HTML preservation, property extraction, and rejection behavior across sanitization configurations.
Integration Tests
src/test/java/com/netgrif/application/engine/workflow/service/FieldSanitizationIntegrationTest.java
End-to-end integration tests importing Petri net, creating cases, and verifying sanitization behavior across all modes (OFF, PLAIN_TEXT, SAFE_HTML variants, DISABLE_JAVASCRIPT) with SANITIZE and REJECT actions.
Test Resources
src/test/resources/sanitization_modes.xml, src/test/resources/data_text_validation.xml
Added sanitization_modes.xml defining Petriflow test document with data fields and transitions covering all sanitization modes and actions. Minor formatting update to data_text_validation.xml.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant DataService
    participant FieldSanitizationService
    participant Field
    participant Case

    Client->>DataService: setData(case, task, data)
    DataService->>Field: extract newValue for each field
    
    loop For each field
        DataService->>DataService: sanitizeValueIfNeeded(case, task, field, newValue)
        alt Is TextField and value is String?
            DataService->>DataService: resolveFieldForSanitization(case, task, field)
            DataService->>DataService: resolveSanitizationComponent(case, task, field)
            DataService->>FieldSanitizationService: sanitize(newValue, field)
            FieldSanitizationService->>Field: getComponent().getProperty(SANITIZATION_MODE)
            FieldSanitizationService->>Field: getComponent().getProperty(SANITIZATION_ACTION)
            alt Mode is OFF?
                FieldSanitizationService-->>DataService: return original value
            else Mode is SANITIZE or SAFE_HTML?
                FieldSanitizationService->>FieldSanitizationService: resolvePolicy(mode)
                FieldSanitizationService->>FieldSanitizationService: apply OWASP PolicyFactory.sanitize()
                alt Action is REJECT and content changed?
                    FieldSanitizationService-->>DataService: throw IllegalArgumentException
                else
                    FieldSanitizationService-->>DataService: return sanitized value
                end
            end
        else
            DataService-->>DataService: return value unchanged
        end
        
        DataService->>Case: apply sanitized value to field
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

new feature, improvement, Large

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 2.04% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and accurately describes the main change: implementing field sanitization modes for text data fields, which is the core feature across all modified files.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot added improvement A change that improves on an existing feature new feature A change that introduces new functionality Large labels Apr 15, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/main/java/com/netgrif/application/engine/workflow/service/DataService.java (1)

282-317: ⚠️ Potential issue | 🟠 Major

Apply incoming component-property changes before sanitizing the submitted value.

sanitizeValueIfNeeded() reads the current component from the case/task, but the new properties from the same payload are only applied later. A single setData() call that changes sanitizationMode/sanitizationAction and value together will therefore sanitize with stale rules. That can let unsafe HTML through when tightening a field in the same request, or strip content unexpectedly in the opposite direction.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@src/main/java/com/netgrif/application/engine/workflow/service/DataService.java`
around lines 282 - 317, The sanitization is run before applying incoming
component property changes, so sanitizeValueIfNeeded() sees stale component
rules; parse the properties via parseProperties(entry.getValue()) and apply them
(call changeComponentProperties(useCase, task, field.getStringId(), properties)
or otherwise update the in-memory component/state) before calling
sanitizeValueIfNeeded(useCase, task, field, newValue); then continue with
setting value, allowedNets, filterMetadata, options, choices and adding the
outcome — ensure modified flags and outcome.addOutcome ordering reflect the
early application of properties so sanitization uses the updated
sanitizationMode/sanitizationAction.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@src/main/java/com/netgrif/application/engine/workflow/service/sanitization/SanitizationAction.java`:
- Around line 7-18: The from method in SanitizationAction should trim and
validate the input instead of silently returning SANITIZE for malformed or
whitespace-padded values; update SanitizationAction.from to first handle null
(return SANITIZE), then trim the input, return SANITIZE only if the trimmed
string is blank, otherwise compare trimmed value case-insensitively against enum
names, and if no match is found throw an IllegalArgumentException (or at least
log an error) naming the invalid value so mis-typed values like " REJECT " fail
fast and are visible.

In
`@src/main/java/com/netgrif/application/engine/workflow/service/sanitization/SanitizationMode.java`:
- Around line 14-25: Change SanitizationMode.from so it trims the input, treats
null/blank as "unspecified" (return null or the previous-compatibility sentinel
instead of defaulting to DISABLE_JAVASCRIPT), only accept DISABLE_JAVASCRIPT
when the trimmed value explicitly matches that name, and reject other invalid
values by throwing an IllegalArgumentException; i.e., in from(String) call value
= value == null ? null : value.trim(), if value==null or value.isEmpty() return
null (compat behavior), if value.equalsIgnoreCase("DISABLE_JAVASCRIPT") return
SanitizationMode.DISABLE_JAVASCRIPT, otherwise loop values() to match names
case-insensitively and return match or throw IllegalArgumentException when no
match is found.

In
`@src/test/java/com/netgrif/application/engine/workflow/service/FieldSanitizationIntegrationTest.java`:
- Around line 424-432: The two tests dataLevelSafeHtmlShouldApplyConfiguration
and dataLevelSafeHtmlShouldKeepSafeHtmlUntouched are duplicates of earlier cases
using the same field id text_safe_html and identical inputs/expectations; update
them to target a distinct data-level configured field (e.g., create/use a new
field id like text_safe_html_data_level) or explicitly document in the test
method Javadoc/inline comment that these exercise the data-level configuration
path (as opposed to transition/task-level) so the intent is clear; ensure any
new field id is wired to the data-level sanitization configuration and adjust
assertStoredEquals calls to reference that id.
- Around line 419-422: The test
disableJavascriptRejectShouldRejectNormalizedSafeMarkup currently asserts
rejection for input "<p><img src=\"https://example.com/image.png\"></p>" which
appears safe; update the test to clarify why rejection is expected: either
rename the test to something like
disableJavascriptRejectShouldRejectWhenHtmlIsNormalized (or similar) and/or add
an inline comment inside that test referencing the normalization behavior that
triggers rejection (e.g., self-closing tag or attribute normalization) and that
this is why assertRejected("text_disable_javascript_reject", ...) is used;
ensure the comment mentions the relevant behavior being tested so future readers
understand the reason for rejecting otherwise-safe markup.
- Around line 548-549: The assertion is doing an extra DB lookup by calling
getCase(stored.getStringId()) again; replace that redundant call and directly
use the already fetched Case instance stored to check the field value (i.e. use
stored.getDataField("text_plain_sanitize").getValue() in the assertNull),
keeping the same assertion intent but avoiding the unnecessary getCase
invocation.
- Line 70: Replace the manual instantiation of ObjectMapper in
FieldSanitizationIntegrationTest with the Spring-managed bean: remove "new
ObjectMapper()" and inject the application ObjectMapper (e.g., via an
`@Autowired/private` final field or constructor injection) so the test uses the
same JSON configuration and registered modules as the rest of the app.
- Around line 575-580: The createCase() method opens a FileInputStream with new
FileInputStream(XML_PATH) and passes it to petriNetService.importPetriNet
without closing it; wrap the FileInputStream creation in a try-with-resources in
createCase() so the stream is automatically closed even if importPetriNet throws
(i.e., use try (InputStream is = new FileInputStream(XML_PATH)) {
petriNetService.importPetriNet(is, VersionType.MAJOR,
superCreator.getLoggedSuper()); } and proceed to return the created Case) to
eliminate the resource leak.

In
`@src/test/java/com/netgrif/application/engine/workflow/service/FieldSanitizationServiceTest.java`:
- Around line 48-63: Tests assume plain-text stripping but
SanitizationMode.from() now defaults to DISABLE_JAVASCRIPT, so update the tests
to either explicitly set the field to plain-text mode or change
expectations/names to the new behavior: in FieldSanitizationServiceTest update
the two methods (shouldStripHtmlTagsWhenDefaultModeIsPlainText and
shouldStripDangerousAttributesWhenDefaultModeIsPlainText) to either call
field.setSanitizationMode(SanitizationMode.PLAIN_TEXT) before invoking
service.sanitize(...) (keeping existing expected strings) or rename the tests
and change expected results to reflect DISABLE_JAVASCRIPT semantics (e.g.,
expect "<b>Hello</b>  world" for the first and "test" vs "<img src=\"x\">test"
as appropriate) and reference SanitizationMode.from(), TextField, and
service.sanitize to locate the code to change.

In `@src/test/resources/sanitization_modes.xml`:
- Around line 9-15: The fixture with id "text_plain_default" lacks an explicit
sanitizationMode and thus resolves to DISABLE_JAVASCRIPT in production; update
the <data type="text"> element for id "text_plain_default" to include an
explicit <sanitizationMode>PLAIN</sanitizationMode> (or alternatively rename
id/title/placeholder to reflect DISABLE_JAVASCRIPT if you intend to adopt the
new default) so tests assert the intended plain-text stripping behavior.

---

Outside diff comments:
In
`@src/main/java/com/netgrif/application/engine/workflow/service/DataService.java`:
- Around line 282-317: The sanitization is run before applying incoming
component property changes, so sanitizeValueIfNeeded() sees stale component
rules; parse the properties via parseProperties(entry.getValue()) and apply them
(call changeComponentProperties(useCase, task, field.getStringId(), properties)
or otherwise update the in-memory component/state) before calling
sanitizeValueIfNeeded(useCase, task, field, newValue); then continue with
setting value, allowedNets, filterMetadata, options, choices and adding the
outcome — ensure modified flags and outcome.addOutcome ordering reflect the
early application of properties so sanitization uses the updated
sanitizationMode/sanitizationAction.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 52340b05-073e-4155-b1fd-75f9492704d7

📥 Commits

Reviewing files that changed from the base of the PR and between c4f79bd and ff46792.

📒 Files selected for processing (10)
  • pom.xml
  • src/main/java/com/netgrif/application/engine/workflow/service/DataService.java
  • src/main/java/com/netgrif/application/engine/workflow/service/interfaces/IFieldSanitizationService.java
  • src/main/java/com/netgrif/application/engine/workflow/service/sanitization/FieldSanitizationService.java
  • src/main/java/com/netgrif/application/engine/workflow/service/sanitization/SanitizationAction.java
  • src/main/java/com/netgrif/application/engine/workflow/service/sanitization/SanitizationMode.java
  • src/test/java/com/netgrif/application/engine/workflow/service/FieldSanitizationIntegrationTest.java
  • src/test/java/com/netgrif/application/engine/workflow/service/FieldSanitizationServiceTest.java
  • src/test/resources/data_text_validation.xml
  • src/test/resources/sanitization_modes.xml

Comment thread src/test/resources/sanitization_modes.xml
@machacjozef machacjozef changed the title Nae 2408 [NAE-2408] Implement field sanitization modes for text data fields Apr 15, 2026
@netgrif netgrif deleted a comment from Retoocs Apr 15, 2026
Updated the default sanitization mode from 'DISABLE_JAVASCRIPT' to 'OFF' to align with behavior changes. Removed redundant test cases that validated HTML stripping in 'PLAIN_TEXT' mode, as it is no longer relevant. Adjusted existing tests to match the new default mode behavior.
@machacjozef machacjozef merged commit 4ee8eab into release/6.4.3 Apr 16, 2026
6 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

improvement A change that improves on an existing feature Large new feature A change that introduces new functionality

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants