[NAE-2408] Implement field sanitization modes for text data fields#432
[NAE-2408] Implement field sanitization modes for text data fields#432machacjozef merged 5 commits intorelease/6.4.3from
Conversation
- Introduces `IFieldSanitizationService` and its implementation. - Adds configuration for HTML sanitization modes and actions. - Updates `DataService` to sanitize text fields during data modification. - Includes unit and integration tests for sanitization behavior. - Bumps project version to 6.4.3-SNAPSHOT.
Introduced a sanitization mechanism for text fields using IFieldSanitizationService, ensuring that input values are sanitized before being processed. This includes methods to resolve the appropriate field and component for sanitization. Updated the data parsing logic to integrate the sanitization step.
Revised test assertions to match the updated error message format. This ensures consistency with the new message structure indicating the configured action for unsafe content.
Changed the default fallback sanitization mode from PLAIN_TEXT to DISABLE_JAVASCRIPT in SanitizationMode and FieldSanitizationService. This ensures better protection against potential JavaScript-related vulnerabilities.
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 22 minutes and 57 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (3)
WalkthroughVersion bump from 6.4.2 to 6.4.3-SNAPSHOT. Added OWASP HTML sanitizer dependency. Introduced field sanitization feature with Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant DataService
participant FieldSanitizationService
participant Field
participant Case
Client->>DataService: setData(case, task, data)
DataService->>Field: extract newValue for each field
loop For each field
DataService->>DataService: sanitizeValueIfNeeded(case, task, field, newValue)
alt Is TextField and value is String?
DataService->>DataService: resolveFieldForSanitization(case, task, field)
DataService->>DataService: resolveSanitizationComponent(case, task, field)
DataService->>FieldSanitizationService: sanitize(newValue, field)
FieldSanitizationService->>Field: getComponent().getProperty(SANITIZATION_MODE)
FieldSanitizationService->>Field: getComponent().getProperty(SANITIZATION_ACTION)
alt Mode is OFF?
FieldSanitizationService-->>DataService: return original value
else Mode is SANITIZE or SAFE_HTML?
FieldSanitizationService->>FieldSanitizationService: resolvePolicy(mode)
FieldSanitizationService->>FieldSanitizationService: apply OWASP PolicyFactory.sanitize()
alt Action is REJECT and content changed?
FieldSanitizationService-->>DataService: throw IllegalArgumentException
else
FieldSanitizationService-->>DataService: return sanitized value
end
end
else
DataService-->>DataService: return value unchanged
end
DataService->>Case: apply sanitized value to field
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/main/java/com/netgrif/application/engine/workflow/service/DataService.java (1)
282-317:⚠️ Potential issue | 🟠 MajorApply incoming component-property changes before sanitizing the submitted value.
sanitizeValueIfNeeded()reads the current component from the case/task, but the newpropertiesfrom the same payload are only applied later. A singlesetData()call that changessanitizationMode/sanitizationActionandvaluetogether will therefore sanitize with stale rules. That can let unsafe HTML through when tightening a field in the same request, or strip content unexpectedly in the opposite direction.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/main/java/com/netgrif/application/engine/workflow/service/DataService.java` around lines 282 - 317, The sanitization is run before applying incoming component property changes, so sanitizeValueIfNeeded() sees stale component rules; parse the properties via parseProperties(entry.getValue()) and apply them (call changeComponentProperties(useCase, task, field.getStringId(), properties) or otherwise update the in-memory component/state) before calling sanitizeValueIfNeeded(useCase, task, field, newValue); then continue with setting value, allowedNets, filterMetadata, options, choices and adding the outcome — ensure modified flags and outcome.addOutcome ordering reflect the early application of properties so sanitization uses the updated sanitizationMode/sanitizationAction.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@src/main/java/com/netgrif/application/engine/workflow/service/sanitization/SanitizationAction.java`:
- Around line 7-18: The from method in SanitizationAction should trim and
validate the input instead of silently returning SANITIZE for malformed or
whitespace-padded values; update SanitizationAction.from to first handle null
(return SANITIZE), then trim the input, return SANITIZE only if the trimmed
string is blank, otherwise compare trimmed value case-insensitively against enum
names, and if no match is found throw an IllegalArgumentException (or at least
log an error) naming the invalid value so mis-typed values like " REJECT " fail
fast and are visible.
In
`@src/main/java/com/netgrif/application/engine/workflow/service/sanitization/SanitizationMode.java`:
- Around line 14-25: Change SanitizationMode.from so it trims the input, treats
null/blank as "unspecified" (return null or the previous-compatibility sentinel
instead of defaulting to DISABLE_JAVASCRIPT), only accept DISABLE_JAVASCRIPT
when the trimmed value explicitly matches that name, and reject other invalid
values by throwing an IllegalArgumentException; i.e., in from(String) call value
= value == null ? null : value.trim(), if value==null or value.isEmpty() return
null (compat behavior), if value.equalsIgnoreCase("DISABLE_JAVASCRIPT") return
SanitizationMode.DISABLE_JAVASCRIPT, otherwise loop values() to match names
case-insensitively and return match or throw IllegalArgumentException when no
match is found.
In
`@src/test/java/com/netgrif/application/engine/workflow/service/FieldSanitizationIntegrationTest.java`:
- Around line 424-432: The two tests dataLevelSafeHtmlShouldApplyConfiguration
and dataLevelSafeHtmlShouldKeepSafeHtmlUntouched are duplicates of earlier cases
using the same field id text_safe_html and identical inputs/expectations; update
them to target a distinct data-level configured field (e.g., create/use a new
field id like text_safe_html_data_level) or explicitly document in the test
method Javadoc/inline comment that these exercise the data-level configuration
path (as opposed to transition/task-level) so the intent is clear; ensure any
new field id is wired to the data-level sanitization configuration and adjust
assertStoredEquals calls to reference that id.
- Around line 419-422: The test
disableJavascriptRejectShouldRejectNormalizedSafeMarkup currently asserts
rejection for input "<p><img src=\"https://example.com/image.png\"></p>" which
appears safe; update the test to clarify why rejection is expected: either
rename the test to something like
disableJavascriptRejectShouldRejectWhenHtmlIsNormalized (or similar) and/or add
an inline comment inside that test referencing the normalization behavior that
triggers rejection (e.g., self-closing tag or attribute normalization) and that
this is why assertRejected("text_disable_javascript_reject", ...) is used;
ensure the comment mentions the relevant behavior being tested so future readers
understand the reason for rejecting otherwise-safe markup.
- Around line 548-549: The assertion is doing an extra DB lookup by calling
getCase(stored.getStringId()) again; replace that redundant call and directly
use the already fetched Case instance stored to check the field value (i.e. use
stored.getDataField("text_plain_sanitize").getValue() in the assertNull),
keeping the same assertion intent but avoiding the unnecessary getCase
invocation.
- Line 70: Replace the manual instantiation of ObjectMapper in
FieldSanitizationIntegrationTest with the Spring-managed bean: remove "new
ObjectMapper()" and inject the application ObjectMapper (e.g., via an
`@Autowired/private` final field or constructor injection) so the test uses the
same JSON configuration and registered modules as the rest of the app.
- Around line 575-580: The createCase() method opens a FileInputStream with new
FileInputStream(XML_PATH) and passes it to petriNetService.importPetriNet
without closing it; wrap the FileInputStream creation in a try-with-resources in
createCase() so the stream is automatically closed even if importPetriNet throws
(i.e., use try (InputStream is = new FileInputStream(XML_PATH)) {
petriNetService.importPetriNet(is, VersionType.MAJOR,
superCreator.getLoggedSuper()); } and proceed to return the created Case) to
eliminate the resource leak.
In
`@src/test/java/com/netgrif/application/engine/workflow/service/FieldSanitizationServiceTest.java`:
- Around line 48-63: Tests assume plain-text stripping but
SanitizationMode.from() now defaults to DISABLE_JAVASCRIPT, so update the tests
to either explicitly set the field to plain-text mode or change
expectations/names to the new behavior: in FieldSanitizationServiceTest update
the two methods (shouldStripHtmlTagsWhenDefaultModeIsPlainText and
shouldStripDangerousAttributesWhenDefaultModeIsPlainText) to either call
field.setSanitizationMode(SanitizationMode.PLAIN_TEXT) before invoking
service.sanitize(...) (keeping existing expected strings) or rename the tests
and change expected results to reflect DISABLE_JAVASCRIPT semantics (e.g.,
expect "<b>Hello</b> world" for the first and "test" vs "<img src=\"x\">test"
as appropriate) and reference SanitizationMode.from(), TextField, and
service.sanitize to locate the code to change.
In `@src/test/resources/sanitization_modes.xml`:
- Around line 9-15: The fixture with id "text_plain_default" lacks an explicit
sanitizationMode and thus resolves to DISABLE_JAVASCRIPT in production; update
the <data type="text"> element for id "text_plain_default" to include an
explicit <sanitizationMode>PLAIN</sanitizationMode> (or alternatively rename
id/title/placeholder to reflect DISABLE_JAVASCRIPT if you intend to adopt the
new default) so tests assert the intended plain-text stripping behavior.
---
Outside diff comments:
In
`@src/main/java/com/netgrif/application/engine/workflow/service/DataService.java`:
- Around line 282-317: The sanitization is run before applying incoming
component property changes, so sanitizeValueIfNeeded() sees stale component
rules; parse the properties via parseProperties(entry.getValue()) and apply them
(call changeComponentProperties(useCase, task, field.getStringId(), properties)
or otherwise update the in-memory component/state) before calling
sanitizeValueIfNeeded(useCase, task, field, newValue); then continue with
setting value, allowedNets, filterMetadata, options, choices and adding the
outcome — ensure modified flags and outcome.addOutcome ordering reflect the
early application of properties so sanitization uses the updated
sanitizationMode/sanitizationAction.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 52340b05-073e-4155-b1fd-75f9492704d7
📒 Files selected for processing (10)
pom.xmlsrc/main/java/com/netgrif/application/engine/workflow/service/DataService.javasrc/main/java/com/netgrif/application/engine/workflow/service/interfaces/IFieldSanitizationService.javasrc/main/java/com/netgrif/application/engine/workflow/service/sanitization/FieldSanitizationService.javasrc/main/java/com/netgrif/application/engine/workflow/service/sanitization/SanitizationAction.javasrc/main/java/com/netgrif/application/engine/workflow/service/sanitization/SanitizationMode.javasrc/test/java/com/netgrif/application/engine/workflow/service/FieldSanitizationIntegrationTest.javasrc/test/java/com/netgrif/application/engine/workflow/service/FieldSanitizationServiceTest.javasrc/test/resources/data_text_validation.xmlsrc/test/resources/sanitization_modes.xml
Updated the default sanitization mode from 'DISABLE_JAVASCRIPT' to 'OFF' to align with behavior changes. Removed redundant test cases that validated HTML stripping in 'PLAIN_TEXT' mode, as it is no longer relevant. Adjusted existing tests to match the new default mode behavior.
Description
Implement field sanitization modes for text data fields
Implements NAE-2408
Dependencies
Third party dependencies
Blocking Pull requests
There are no dependencies on other PR.
How Has Been This Tested?
This was tested manually and with unit tests.
Test Configuration
Checklist:
Summary by CodeRabbit