Add FORM-based source provider by barnaliy · Pull Request #379 · Framework-R-D/phlex

barnaliy · 2026-03-03T15:43:18Z

Adds form_source.cpp implementing a FORM-based provider that retrieves data products from FORM storage into Phlex.
This is the source-side complement to form_module.cpp.

barnaliy · 2026-03-03T15:56:16Z

@phlexbot format

github-actions · 2026-03-03T15:56:40Z

No automatic header-guards fixes were necessary.

github-actions · 2026-03-03T15:56:44Z

No automatic cmake-format fixes were necessary.

github-actions · 2026-03-03T15:56:45Z

No automatic markdownlint fixes were necessary.

github-actions · 2026-03-03T15:56:45Z

Automatic clang-format fixes pushed (commit 388095a).
⚠️ Note: Some issues may require manual review and fixing.

github-actions · 2026-03-03T15:56:45Z

No automatic jsonnetfmt fixes were necessary.

codecov · 2026-03-03T16:05:09Z

Codecov Report

❌ Patch coverage is 0% with 42 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
form/form_source.cpp	0.00%	42 Missing ⚠️

❌ Your patch check has failed because the patch coverage (0.00%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

@@            Coverage Diff             @@
##             main     #379      +/-   ##
==========================================
- Coverage   85.36%   84.96%   -0.41%     
==========================================
  Files         122      120       -2     
  Lines        2433     2441       +8     
  Branches      389      390       +1     
==========================================
- Hits         2077     2074       -3     
- Misses        233      249      +16     
+ Partials      123      118       -5

Flag	Coverage Δ
unittests	`84.96% <0.00%> (-0.41%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines	Coverage Δ
form/form_source.cpp	`0.00% <0.00%> (ø)`

... and 8 files with indirect coverage changes

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0435987...24a97c6. Read the comment docs.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

gemmeren · 2026-03-03T16:38:28Z

form/form_source.cpp

+  // --- Extract configuration ---
+  std::string const input_file = config.get<std::string>("input_file");
+  std::string const creator = config.get<std::string>("creator");
+  std::string const tech_string = config.get<std::string>("technology", "ROOT_TTREE");


Longer-term, you may want to do some rethinking and approach this in terms of major & minor technology.
To open a e.g. ROOT (major technology) file, FORM doesn't need to know the minor technology (e.g. TTree vs RNT). The minor technology can easily be discovered. That would avoid having the user specify the minor technology of the input and make configurations more portable.

Good point — noted for future improvement. For Prototype 0.1, the minor technology is explicitly specified in config to keep things simple. Auto-discovery of minor technology is a good goal for a future iteration.

OK for this PR and the first version, but maybe we can create a git issue?

gemmeren · 2026-03-03T16:43:26Z

form/form_source.cpp

+  // product_query fields must be compile-time _id literals, not runtime strings.
+  // To add new products, add new s.provide() blocks below following the same pattern.
+
+  s.provide("provide_i",


Will this be needed for every data product we read from input? If so, we may want to think about a way to shorten or auto-generate these functions.

This is already noted in the FIXME comment — it's a known limitation of Prototype 0.1 that will be addressed as the type list grows.

Here too I wouldn't mind a git issue

aolivier23 · 2026-03-03T16:46:13Z

form/form_source.cpp

+              if (!pb.data) {
+                throw std::runtime_error("FORM read returned null data for product: i");
+              }
+              return *static_cast<int const*>(pb.data);


Does this line leak memory? I think the lambda function returns a copy of pb.data (because of the trailing return type) constructed from a reference to pb.data. But I think the memory pointed to by
pb.data itself is never de-allocated because it's stored in a raw pointer.

I'm working on a way to verify my understanding. I think something like valgrind will give us a strong answer if I can get it to work.

There might be some r-value/move-semantics trickery here that I haven't fully wrapped my mind around. Line 66 looks like an r-value to me at first glance.

But I think the memory pointed to by pb.data itself is never de-allocated because it's stored in a raw pointer.

However it comes from the product_with_name where FORM may (or may not) have put a view (pointer to) the data 'owned' by the TTree or RNTuple (either way, this should be documented in product_with_name and if it is not a view but owns the data then it should deleted).

Related (possibly) future point (in this example this is moot since this 'just' an int), the provider should not be doing a copy of the potentially extremely large data but should instead pass ownership to Phlex. [Whether this is 'always' the case or just in case of non-trivial data is yet another question]

I believe the memory in pb.data is allocated here on each call to read(): https://github.com/Framework-R-D/phlex/blob/main/form/root_storage/root_tbranch_container.cpp#L167

There's a similar version that uses TClass::New() for objects that are not fundamental types.

This is a design question. The memory should be freed within FORM, not by the caller. form_source.cpp calls form_interface::read(), making the API user/caller. My understanding is the allocator should be the deallocator. Since FORM allocates the memory inside read(), FORM should also free it — not the API.

I believe the memory in pb.data is allocated here on each call to read()

I see. So indeed the product_with_name "owns" the data and should destruct it or, better yet, give ownership to phlex.

Side note that the code at https://github.com/Framework-R-D/phlex/blob/main/form/root_storage/root_tbranch_container.cpp#L167 is missing a reset of the branch address (as is the TTree is holding a reference to the address of a stack variable and thus a reference to invalid address after the end of the function).

Yes, exactly, instead of FORM freeing the memory itself, it should transfer ownership to Phlex — because Phlex knows the product lifecycle.

However, you actually need both in case the think is read but never past on to Phlex.

Following up on the discussion regarding memory management: I've created an issue to address this here:
https://github.com/Framework-R-D/phlex/issues/392

barnaliy · 2026-03-03T18:59:22Z

@phlexbot format

github-actions · 2026-03-03T19:00:15Z

No automatic header-guards fixes were necessary.

github-actions · 2026-03-03T19:01:15Z

No automatic jsonnetfmt fixes were necessary.

github-actions · 2026-03-03T19:01:33Z

No automatic markdownlint fixes were necessary.

wwuoneway

I have a structural concern regarding the separation between form_module and form_source, as both currently communicate directly with Phlex. form_source appears to focus on reading input files, while form_module likely handles the integration with Phlex. Should form_source be a standalone module that communicates directly with Phlex, or would it be more appropriate as an internal component used by form_module?

wwuoneway · 2026-03-03T23:11:09Z

form/form_source.cpp

+  form::experimental::config::output_item_config input_cfg;
+  form::experimental::config::tech_setting_config tech_cfg;
+  input_cfg.addItem("i", input_file, technology);
+  input_cfg.addItem("j", input_file, technology);


Instead of debating what to hardcode ( here, placeholder i and j, or meaningful names), it would be preferable to make the code automatically detect available items from the input file. This would eliminate hardcoding entirely and make the module truly reusable and scalable. A real input file may contain many possible items, and production code should ideally adapt dynamically instead assuming a fixed set. I would suggest moving such hardcoded i and j to test code, while the production code should be intelligent enough to discover available items automatically.

The current approach is simple and explicit for Prototype 0.1. Auto-discovery is the right long-term goal, but there are two concrete blockers that go beyond the scope of this PR:

product_query fields require compile-time _id literals and do not accept runtime strings, making it impossible to loop over a dynamically discovered list of products

FORM has no enumeration API to discover available items from an input file

I propose keeping the current approach for Prototype 0.1 and tracking auto-discovery as a follow-up issue once these blockers are resolved.

wwuoneway · 2026-03-03T23:15:41Z

form/form_source.cpp

+              }
+              return *static_cast<int const*>(pb.data);
+            })
+    .output_product(product_query{.creator = "input"_id, .layer = "event"_id, .suffix = "j"_id});


The two provider lambdas for i and j are almost identical, with only the product name string differing. Consider extracting the common logic into a factory function or template to reduce code duplication and improve maintainability. This would also make it easier to add more products in the future.

The duplication is not an oversight.
The lambda body duplication is minimal and intentional for clarity in Prototype 0.1
The output_product() call cannot be generalized due to product_query compile-time _id literal constraint — already noted in FIXME.

barnaliy · 2026-03-05T16:23:28Z

I have a structural concern regarding the separation between form_module and form_source, as both currently communicate directly with Phlex. form_source appears to focus on reading input files, while form_module likely handles the integration with Phlex. Should form_source be a standalone module that communicates directly with Phlex, or would it be more appropriate as an internal component used by form_module?

The strong reason to keep them separate is that Phlex itself makes this distinction at the framework level — form_module uses PHLEX_REGISTER_ALGORITHMS while form_source uses PHLEX_REGISTER_PROVIDERS. These are fundamentally different constructs with different roles in the framework. Merging them would fight the framework design.

I prefer to keep form_module and form_source separate. That said, what do you think @pcanal and @knoepfel ?

Add FORM-based source provider

0997901

Apply clang-format fixes

388095a

barnaliy requested review from gemmeren, knoepfel and pcanal March 3, 2026 16:00

barnaliy requested a review from aolivier23 March 3, 2026 16:17

gemmeren requested a review from wwuoneway March 3, 2026 16:29

gemmeren reviewed Mar 3, 2026

View reviewed changes

aolivier23 reviewed Mar 3, 2026

View reviewed changes

Add form_source to CMakeLists.txt

24a97c6

wwuoneway reviewed Mar 3, 2026

View reviewed changes

barnaliy mentioned this pull request Mar 5, 2026

Auto-discover major and minor technology when opening FORM input files #393

Open

Conversation

barnaliy commented Mar 3, 2026

Uh oh!

barnaliy commented Mar 3, 2026

Uh oh!

github-actions bot commented Mar 3, 2026

Uh oh!

github-actions bot commented Mar 3, 2026

Uh oh!

github-actions bot commented Mar 3, 2026

Uh oh!

github-actions bot commented Mar 3, 2026

Uh oh!

github-actions bot commented Mar 3, 2026

Uh oh!

codecov bot commented Mar 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aolivier23 Mar 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

barnaliy commented Mar 3, 2026

Uh oh!

github-actions bot commented Mar 3, 2026

Uh oh!

github-actions bot commented Mar 3, 2026

Uh oh!

github-actions bot commented Mar 3, 2026

Uh oh!

wwuoneway left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

barnaliy commented Mar 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

codecov bot commented Mar 3, 2026 •

edited

Loading

aolivier23 Mar 3, 2026 •

edited

Loading