Conversation
|
@phlexbot format |
|
No automatic header-guards fixes were necessary. |
|
No automatic cmake-format fixes were necessary. |
|
No automatic markdownlint fixes were necessary. |
|
Automatic clang-format fixes pushed (commit 388095a). |
|
No automatic jsonnetfmt fixes were necessary. |
Codecov Report❌ Patch coverage is
❌ Your patch check has failed because the patch coverage (0.00%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage. @@ Coverage Diff @@
## main #379 +/- ##
==========================================
- Coverage 85.36% 84.96% -0.41%
==========================================
Files 122 120 -2
Lines 2433 2441 +8
Branches 389 390 +1
==========================================
- Hits 2077 2074 -3
- Misses 233 249 +16
+ Partials 123 118 -5
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 8 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
| // --- Extract configuration --- | ||
| std::string const input_file = config.get<std::string>("input_file"); | ||
| std::string const creator = config.get<std::string>("creator"); | ||
| std::string const tech_string = config.get<std::string>("technology", "ROOT_TTREE"); |
There was a problem hiding this comment.
Longer-term, you may want to do some rethinking and approach this in terms of major & minor technology.
To open a e.g. ROOT (major technology) file, FORM doesn't need to know the minor technology (e.g. TTree vs RNT). The minor technology can easily be discovered. That would avoid having the user specify the minor technology of the input and make configurations more portable.
There was a problem hiding this comment.
Good point — noted for future improvement. For Prototype 0.1, the minor technology is explicitly specified in config to keep things simple. Auto-discovery of minor technology is a good goal for a future iteration.
There was a problem hiding this comment.
OK for this PR and the first version, but maybe we can create a git issue?
| // product_query fields must be compile-time _id literals, not runtime strings. | ||
| // To add new products, add new s.provide() blocks below following the same pattern. | ||
|
|
||
| s.provide("provide_i", |
There was a problem hiding this comment.
Will this be needed for every data product we read from input? If so, we may want to think about a way to shorten or auto-generate these functions.
There was a problem hiding this comment.
This is already noted in the FIXME comment — it's a known limitation of Prototype 0.1 that will be addressed as the type list grows.
There was a problem hiding this comment.
Here too I wouldn't mind a git issue
| if (!pb.data) { | ||
| throw std::runtime_error("FORM read returned null data for product: i"); | ||
| } | ||
| return *static_cast<int const*>(pb.data); |
There was a problem hiding this comment.
Does this line leak memory? I think the lambda function returns a copy of pb.data (because of the trailing return type) constructed from a reference to pb.data. But I think the memory pointed to by
pb.data itself is never de-allocated because it's stored in a raw pointer.
I'm working on a way to verify my understanding. I think something like valgrind will give us a strong answer if I can get it to work.
There was a problem hiding this comment.
There might be some r-value/move-semantics trickery here that I haven't fully wrapped my mind around. Line 66 looks like an r-value to me at first glance.
There was a problem hiding this comment.
But I think the memory pointed to by pb.data itself is never de-allocated because it's stored in a raw pointer.
However it comes from the product_with_name where FORM may (or may not) have put a view (pointer to) the data 'owned' by the TTree or RNTuple (either way, this should be documented in product_with_name and if it is not a view but owns the data then it should deleted).
Related (possibly) future point (in this example this is moot since this 'just' an int), the provider should not be doing a copy of the potentially extremely large data but should instead pass ownership to Phlex. [Whether this is 'always' the case or just in case of non-trivial data is yet another question]
There was a problem hiding this comment.
I believe the memory in pb.data is allocated here on each call to read(): https://github.com/Framework-R-D/phlex/blob/main/form/root_storage/root_tbranch_container.cpp#L167
There's a similar version that uses TClass::New() for objects that are not fundamental types.
There was a problem hiding this comment.
This is a design question. The memory should be freed within FORM, not by the caller. form_source.cpp calls form_interface::read(), making the API user/caller. My understanding is the allocator should be the deallocator. Since FORM allocates the memory inside read(), FORM should also free it — not the API.
There was a problem hiding this comment.
I believe the memory in pb.data is allocated here on each call to read()
I see. So indeed the product_with_name "owns" the data and should destruct it or, better yet, give ownership to phlex.
Side note that the code at https://github.com/Framework-R-D/phlex/blob/main/form/root_storage/root_tbranch_container.cpp#L167 is missing a reset of the branch address (as is the TTree is holding a reference to the address of a stack variable and thus a reference to invalid address after the end of the function).
There was a problem hiding this comment.
Yes, exactly, instead of FORM freeing the memory itself, it should transfer ownership to Phlex — because Phlex knows the product lifecycle.
There was a problem hiding this comment.
However, you actually need both in case the think is read but never past on to Phlex.
There was a problem hiding this comment.
Following up on the discussion regarding memory management: I've created an issue to address this here:
https://github.com/Framework-R-D/phlex/issues/392
|
@phlexbot format |
|
No automatic header-guards fixes were necessary. |
|
No automatic jsonnetfmt fixes were necessary. |
|
No automatic markdownlint fixes were necessary. |
wwuoneway
left a comment
There was a problem hiding this comment.
I have a structural concern regarding the separation between form_module and form_source, as both currently communicate directly with Phlex. form_source appears to focus on reading input files, while form_module likely handles the integration with Phlex. Should form_source be a standalone module that communicates directly with Phlex, or would it be more appropriate as an internal component used by form_module?
| form::experimental::config::output_item_config input_cfg; | ||
| form::experimental::config::tech_setting_config tech_cfg; | ||
| input_cfg.addItem("i", input_file, technology); | ||
| input_cfg.addItem("j", input_file, technology); |
There was a problem hiding this comment.
Instead of debating what to hardcode ( here, placeholder i and j, or meaningful names), it would be preferable to make the code automatically detect available items from the input file. This would eliminate hardcoding entirely and make the module truly reusable and scalable. A real input file may contain many possible items, and production code should ideally adapt dynamically instead assuming a fixed set. I would suggest moving such hardcoded i and j to test code, while the production code should be intelligent enough to discover available items automatically.
There was a problem hiding this comment.
The current approach is simple and explicit for Prototype 0.1. Auto-discovery is the right long-term goal, but there are two concrete blockers that go beyond the scope of this PR:
product_queryfields require compile-time_idliterals and do not accept runtime strings, making it impossible to loop over a dynamically discovered list of products- FORM has no enumeration API to discover available items from an input file
I propose keeping the current approach for Prototype 0.1 and tracking auto-discovery as a follow-up issue once these blockers are resolved.
| } | ||
| return *static_cast<int const*>(pb.data); | ||
| }) | ||
| .output_product(product_query{.creator = "input"_id, .layer = "event"_id, .suffix = "j"_id}); |
There was a problem hiding this comment.
The two provider lambdas for i and j are almost identical, with only the product name string differing. Consider extracting the common logic into a factory function or template to reduce code duplication and improve maintainability. This would also make it easier to add more products in the future.
There was a problem hiding this comment.
The duplication is not an oversight.
The lambda body duplication is minimal and intentional for clarity in Prototype 0.1
The output_product() call cannot be generalized due to product_query compile-time _id literal constraint — already noted in FIXME.
The strong reason to keep them separate is that Phlex itself makes this distinction at the framework level — I prefer to keep form_module and form_source separate. That said, what do you think @pcanal and @knoepfel ? |
Adds form_source.cpp implementing a FORM-based provider that retrieves data products from FORM storage into Phlex.
This is the source-side complement to form_module.cpp.