test/workflows/automatic_job_grouping/inputs_files-strings.yaml
class TransformationSubmissionModel(BaseModel):
    """Transformation definition sent to the router."""

    # Allow arbitrary types to be passed to the model
    model_config = ConfigDict(arbitrary_types_allowed=True)

    task: CommandLineTool | Workflow | ExpressionTool
    input_data: Optional[list[str | File] | None] = None
As we are going to integrate input sandbox within transformations (#92), it would be interesting to see if we could reuse the JobInputModel (renamed as InputModel?)
Regarding @arrabito comments:
- I agree that we don't need to have input sandbox for now, so it can't be local files.
- I don't remember how we will add support for sandboxes in the transformation system. For simplicity, I would keep just LFN paths for now.
- As said before, in my opinion there is no need to support/create sandboxes for now.
Should I still make this change in this PR, or would it be better to do it in a (future) sandbox PR? Maybe I misunderstood what you meant here.
Let's make this change in a future sandbox PR, I would say.
Thinking a little bit further, we may also want to allow local file paths, but only to be used for Local execution (without adding them to SB).
So if the submission is local we allow only local paths, while if the submission is to DIRAC we allow only LFN paths.
In this way, we could also execute transformations locally.
Later on, we will also allow local file paths for DIRAC submission (adding them to the ISB).
@aldbr what do you think?
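To make the proposal above concrete, here is a minimal sketch of a mode-dependent check: local submission accepts only local paths, and DIRAC submission accepts only LFN paths. The names `SubmissionMode`, `is_lfn` and `validate_input_data` are hypothetical and not part of dirac-cwl. Detecting an LFN by an `LFN:` prefix is also an assumption; the project may identify LFNs differently.

```python
from enum import Enum


class SubmissionMode(Enum):
    """Hypothetical submission targets, used for illustration only."""

    LOCAL = "local"
    DIRAC = "dirac"


# Assumption: LFN paths are written with an "LFN:" prefix.
LFN_PREFIX = "LFN:"


def is_lfn(path: str) -> bool:
    """Return True if the path looks like an LFN (per the assumption above)."""
    return path.startswith(LFN_PREFIX)


def validate_input_data(paths: list[str], mode: SubmissionMode) -> list[str]:
    """Reject any path that does not match the submission mode.

    LOCAL accepts only local paths, DIRAC accepts only LFN paths.
    """
    for path in paths:
        if mode is SubmissionMode.LOCAL and is_lfn(path):
            raise ValueError(f"LFN path not allowed for local submission: {path}")
        if mode is SubmissionMode.DIRAC and not is_lfn(path):
            raise ValueError(f"Local path not allowed for DIRAC submission: {path}")
    return paths
```

Supporting local paths for DIRAC submission later (by adding them to the ISB) would then only require relaxing the second check.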
|
@aldbr Regarding this part of the code: dirac-cwl/src/dirac_cwl_proto/transformation/__init__.py Lines 130 to 163 in 72956d5 Are we planning on keeping it? Just so I un-comment it and make the changes related to the |
|
Waiting on #66 (comment) and #95 (comment) approval about what we're doing, and then, PR should be ready to be fully reviewed (and potentially merged 🙏). |
Yes we want to keep it. A transformation should either get inputs from the CLI, or from a |
|
I’m also not sure whether the Also, the If you have any ideas. |
Since |
As far as I see, I'm not sure that any input_name is needed anymore. In the current QueryBasedPlugin, input_name is just used to build the LFN path, see: Probably we could just change get_input_query to not take any argument and just build LFN path as:
instead of:
Then, I guess that the group_size in the yaml file should be specified as: instead of: @aldbr do you agree? (Maybe some other changes are needed that I haven't thought of.)
Yes I agree. In any case, this is going to be revised at some point with the hints proposed in #69 |
|
Current PR status:
|
|
I don't know how to fix the current lint error. I had to rebase and reword all my "wrong" commits, and it's still failing because of old commits that were already pushed.
|
Also, PyPi is failing on something I didn't touch directly (I'm mostly sure about that), so I don't really know what to do about that either. If you have any ideas:

models.update(
    {
        "JobInputModel": JobInputModel,  # <--- error happens here
        "JobSubmissionModel": JobSubmissionModel,
        "TransformationSubmissionModel": TransformationSubmissionModel,
        "ProductionSubmissionModel": ProductionSubmissionModel,
    }
)

class JobInputModel(BaseModel):  # <-- it's a BaseModel ?
    """Input data and sandbox files for a job execution."""

    # Allow arbitrary types to be passed to the model
    model_config = ConfigDict(arbitrary_types_allowed=True)

    sandbox: list[str] | None
    cwl: dict[str, Any]

    @field_serializer("cwl")
    def serialize_cwl(self, value):
        """Serialize CWL object to dictionary.

        :param value: CWL object to serialize.
        :return: Serialized CWL dictionary.
        """
        return save(value)
f9337a8 to
52e6144
Compare
|
Thank you @Stellatsuu, apart from the small comments, overall it looks good to me.
99df614 to
3a1314c
Compare
            raise TypeError(f"Cannot serialize type {type(value)}")

    @field_serializer("input_data")
    def serialize_input_data(self, value):
I see that the TransformationSubmissionModel and the ProductionSubmissionModel now share methods and fields. I don't know if we want to create a base model for them to inherit from, like:

class ???SubmissionModel(BaseModel):
    """BaseModel for high level DIRAC Workflow"""

    # Allow arbitrary types to be passed to the model
    model_config = ConfigDict(arbitrary_types_allowed=True)

    task: CommandLineTool | Workflow | ExpressionTool
    input_data: Optional[list[str | File] | None] = None

    @field_serializer("task")
    def serialize_task(self, value):
        """Serialize CWL task object to dictionary.

        :param value: CWL task object to serialize.
        :return: Serialized task dictionary.
        :raises TypeError: If value is not a valid CWL task type.
        """
        if isinstance(value, (CommandLineTool, Workflow, ExpressionTool)):
            return save(value)
        else:
            raise TypeError(f"Cannot serialize type {type(value)}")

    @field_serializer("input_data")
    def serialize_input_data(self, value):
        """Serialize an input data list to a list of strings.

        :param value: Input data list to serialize.
        :return: Serialized input data list.
        """
        if value:
            return [save(item) if isinstance(item, File) else item for item in value]
        return None

@aldbr do you think it is worth it? Or will this evolve soon, so that it is no longer the case afterward?
cc @aldbr @arrabito @natthan-pigoux
Closes: #66
Related to: #61
Changes:
- input_data: list[str | File] to TransformationSubmissionModel and ProductionSubmissionModel
- inputs-file parameter to Transformation and Production CLIs: dirac-cwl transformation/production submit file.cwl --inputs-file file.yaml
- parameter-path to input_files: list[str] in Job CLI: dirac-cwl job submit file.cwl --input-files file1.yaml file2.yaml ...
- group_size executionHooksHint to Transformation Workflows, such as:
- group_size determines the number of jobs to be created and how many input files each will contain in submit_transformation_router. By default, it equals 1, which means a job will be created for each input in the inputs file. Once the list of jobs is created, it is sent to the job_router and processed.
- JobWrapper related tests: task.cwl was created during post_process but never cleared after running tests.

TODO after this PR:
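For reference, the group_size behaviour described in the changes above can be sketched as a simple chunking step. This is only an illustration, not the code in submit_transformation_router: the function name `group_inputs` is hypothetical, and letting the last group be smaller when the number of inputs is not a multiple of group_size is an assumption.

```python
def group_inputs(input_files: list[str], group_size: int = 1) -> list[list[str]]:
    """Split input files into consecutive groups, one group per job.

    With the default group_size of 1, each input file gets its own job.
    Assumption: a trailing remainder forms a final, smaller group.
    """
    if group_size < 1:
        raise ValueError("group_size must be >= 1")
    return [
        input_files[start:start + group_size]
        for start in range(0, len(input_files), group_size)
    ]


# Five inputs with group_size=2 would yield three jobs under these assumptions.
jobs = group_inputs(["a.txt", "b.txt", "c.txt", "d.txt", "e.txt"], group_size=2)
```

Each inner list would then correspond to the inputs of one job sent to the job_router.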