
feat(ro-crate-1.2): introduce validation profile for RO-Crate 1.2 #164

Open
kikkomep wants to merge 160 commits into develop from feat/ro-crate-1.2

@kikkomep

@kikkomep kikkomep commented Apr 20, 2026

Member

This PR introduces an initial implementation of the validation profile compliant with the RO-Crate 1.2 specification, adding support for validating crates that declare this version in their conformsTo property.
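For reference, a crate declares its specification version through the conformsTo property of its metadata descriptor. The sketch below shows one way to detect an RO-Crate 1.2 declaration; it assumes the standard descriptor layout (the entity with @id "ro-crate-metadata.json") and the identifier https://w3id.org/ro/crate/1.2. The helper names are hypothetical and are not part of rocrate-validator.

```python
import json

# Identifier of the RO-Crate 1.2 specification, as used in conformsTo
RO_CRATE_1_2 = "https://w3id.org/ro/crate/1.2"


def declared_versions(metadata: dict) -> list:
    """Return the conformsTo @ids of the metadata descriptor (hypothetical helper)."""
    for entity in metadata.get("@graph", []):
        # The metadata descriptor is the entity whose @id is "ro-crate-metadata.json"
        if entity.get("@id") == "ro-crate-metadata.json":
            conforms = entity.get("conformsTo", [])
            # conformsTo may hold a single reference or a list of references
            if isinstance(conforms, dict):
                conforms = [conforms]
            return [c["@id"] for c in conforms if isinstance(c, dict) and "@id" in c]
    return []


def is_ro_crate_1_2(metadata: dict) -> bool:
    return RO_CRATE_1_2 in declared_versions(metadata)


doc = json.loads("""
{
  "@context": "https://w3id.org/ro/crate/1.2/context",
  "@graph": [
    {
      "@id": "ro-crate-metadata.json",
      "@type": "CreativeWork",
      "conformsTo": {"@id": "https://w3id.org/ro/crate/1.2"},
      "about": {"@id": "./"}
    },
    {"@id": "./", "@type": "Dataset"}
  ]
}
""")
print(is_ro_crate_1_2(doc))  # True
```

Handling both the single-reference and list forms of conformsTo matters because a descriptor may list additional profile URIs alongside the specification version.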

The implementation covers the core requirements of the specification, in particular those listed in the quick reference. Any additional rules specific to derived profiles (e.g., Workflow RO-Crate) will be addressed in subsequent PRs, where needed.

How to test

The code is available on the feat/ro-crate-1.2 branch and can be installed as usual via Poetry:

git checkout feat/ro-crate-1.2
poetry install

Development and testing relied on the few examples provided by the specification, together with additional ad hoc examples derived from them. These ad hoc examples are located under tests/data/crates/ro-crate-1.2: for each significant requirement (particularly the new ones) in every section of the quick reference, the folder contains one valid crate and several invalid ones, each designed to exercise a specific violation of that requirement.

CLI options

A few new CLI options have been introduced. In particular, validation now performs additional availability checks on the resources referenced by the RO-Crate. Since these checks require remote connections, they can be disabled via the --skip-availability-check flag:

rocrate-validator validate  --skip-availability-check
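To illustrate which entities such remote checks concern, here is a rough sketch (not the validator's actual code) that collects data entities whose @id is an absolute http(s) URI, i.e. the ones that would require a network connection, and returns nothing when checks are skipped. The function name and the skip_availability_check parameter are hypothetical; only the File/MediaObject and Dataset types come from RO-Crate.

```python
from urllib.parse import urlparse

# Types treated as data entities in this sketch ("File" is the RO-Crate alias for MediaObject)
DATA_ENTITY_TYPES = {"File", "MediaObject", "Dataset"}


def web_data_entities(graph: list, skip_availability_check: bool = False) -> list:
    """Return @ids of data entities that a remote availability check would contact."""
    if skip_availability_check:
        # Mirrors the intent of --skip-availability-check: no remote resource is contacted
        return []
    ids = []
    for entity in graph:
        types = entity.get("@type", [])
        if isinstance(types, str):
            types = [types]
        if not DATA_ENTITY_TYPES.intersection(types):
            continue  # contextual entities (Person, Organization, ...) are not checked
        if urlparse(entity.get("@id", "")).scheme in ("http", "https"):
            ids.append(entity["@id"])
    return ids


graph = [
    {"@id": "./", "@type": "Dataset"},
    {"@id": "data/local.csv", "@type": "File"},
    {"@id": "https://zenodo.org/files/gldgezao.zip", "@type": "File"},
    {"@id": "https://example.org/person", "@type": "Person"},
]
print(web_data_entities(graph))  # ['https://zenodo.org/files/gldgezao.zip']
print(web_data_entities(graph, skip_availability_check=True))  # []
```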

Any kind of feedback (bug reports, misinterpretations of the specs, edge cases we may have missed, etc.) is very welcome and much appreciated 🙂

[related to #107]

kikkomep added 30 commits March 25, 2026 11:57
@simleo

simleo commented May 18, 2026

Member

I tried validating this crate:

crate_0002_eiuuqb-ro-crate-metadata.json

rocrate-validator -y validate -v -w 79 -o err.txt ~/git/ro-crate-py/tools/ro_crates_output/crate_0002_eiuuqb-ro-crate-metadata.json

and got this result:

                                                                               
 ╭────────────────────────── - Validation Report - ──────────────────────────╮ 
 │                                                                           │ 
 │                                                                           │ 
 │  RO-Crate:                                                                │ 
 │  /home/simleo/git/ro-crate-py/tools/ro_crates_output/crate_0002_eiuuqb-r  │ 
 │  o-crate-metadata.json                                                    │ 
 │  Target Profile: ro-crate-1.2                                             │ 
 │  ╭────────────────── Requirements Checks Validation ───────────────────╮  │ 
 │  │                                                                     │  │ 
 │  │ ╭─ Severity: REQUIRE─╮╭─ Severity: RECOMME─╮╭─ Severity: OPTIONAL─╮ │  │ 
 │  │ │                    ││                    ││                     │ │  │ 
 │  │ │         65         ││         0          ││          0          │ │  │ 
 │  │ │                    ││                    ││                     │ │  │ 
 │  │ ╰────────────────────╯╰────────────────────╯╰─────────────────────╯ │  │ 
 │  │ ╭──────── PASSED Checks ────────╮╭──────── FAILED Checks ─────────╮ │  │ 
 │  │ │                               ││                                │ │  │ 
 │  │ │              64               ││               1                │ │  │ 
 │  │ │                               ││                                │ │  │ 
 │  │ ╰───────────────────────────────╯╰────────────────────────────────╯ │  │ 
 │  │                                                                     │  │ 
 │  ╰─────────────────────────────────────────────────────────────────────╯  │ 
 │  ╭───────────────────────── Overall Progress ──────────────────────────╮  │ 
 │  │ Profiles            ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1/1   0:00:00 │  │ 
 │  │ Requirements        ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 33/33 0:00:00 │  │ 
 │  │ Requirements Checks ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 65/65 0:00:00 │  │ 
 │  ╰─────────────────────────────────────────────────────────────────────╯  │ 
 │                                                                           │ 
 ╰───────────────────────────────────────────────────────────────────────────╯ 
                                                                               
  ────────────── ❌ RO-Crate is not a valid ro-crate-1.2 !!!  ───────────────  
                                                                               
                                                                               
                                                                               
                                                                               
                                                                               
  The following requirements have not been met:                                
                                                                               
                                 [profile: RO-Crate Metadata Specification 1.2]
     [ ro-crate-1.2_2 ]: File Descriptor JSON-LD format                        
                                                                               
      The file descriptor MUST be a valid JSON-LD file                         
                                                                               
          Failed checks                                                        
                                                                               
                                                                               
        [ ro-crate-1.2_2.3 ] Validation of entity references:                  
                             The file descriptor MUST be a valid JSON-LD file  
         Detected issues                                                       
         - [Violation]: Entity 'https://doi.org/10.9027/zfqgkgek'              
         references 'https://doi.org/10.9027/zfqgkgek' as a string;            
         use {"@id": "https://doi.org/10.9027/zfqgkgek"}                       
                                                                               
                                                                               
                                                                               
                                                                               
                                 [profile: RO-Crate Metadata Specification 1.2]
     [ ro-crate-1.2_22 ]: Web-based Data Entity: REQUIRED availability         
                                                                               
      Web-based Data Entities MUST be directly downloadable at the time        
      of creation (RO-Crate 1.2). Downloadability is verified via              
      Signposting (rel=item, rel=describedby), direct Content-Type             
      inspection, and content negotiation.                                     
                                                                               
          Failed checks                                                        
                                                                               
                                                                               
        [ ro-crate-1.2_22.2 ] Web-based Data Entity: RECOMMENDED               
        resource availability:                                                 
                              Web-based Data Entities MUST be directly         
                              downloadable at the time of creation (RO-Crate   
                              1.2). Downloadability is verified via Signposting
                              (rel=item, rel=describedby), direct Content-Type 
                              inspection, and content negotiation.             
         Detected issues                                                       
                                                                               
                                                                               

Problems encountered:

  • The failed checks counter is 1, but there are two issues
  • First issue: both url and identifier can have a plain URL as their value; they are not supposed to refer to an entity. If the value were {"@id": "https://doi.org/10.9027/zfqgkgek"}, the entity would be referring to itself, which does not make sense.
  • Second issue: I could not find a "Web-based Data Entities MUST be directly downloadable at the time of creation" requirement in the spec. The spec says "File Data Entities with an @id URI outside the RO-Crate Root SHOULD at the time of RO-Crate creation be directly downloadable...", but 1) it's a SHOULD and 2) the time at which the check is performed is not the time of creation. I think remote entity existence checks, if any, should be optional by default. Another problem here is that the ids of the entities in question are not reported.
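One way to act on the second point is a SHOULD-level check that is off unless explicitly enabled and names each offending entity. A minimal sketch of that idea follows; the Issue class, the downloadability_issues function and the caller-supplied probe (returning True when a URL is downloadable) are all hypothetical, and the probe is injected so the example runs without network access.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Issue:
    severity: str   # "RECOMMENDED" for a SHOULD-level requirement
    entity_id: str  # @id of the entity that triggered the issue
    message: str


def downloadability_issues(
    entity_ids: Iterable[str],
    probe: Callable[[str], bool],
    enabled: bool = False,  # opt-in: no remote check unless explicitly requested
) -> List[Issue]:
    if not enabled:
        return []
    issues = []
    for entity_id in entity_ids:
        if not probe(entity_id):
            issues.append(Issue(
                severity="RECOMMENDED",
                entity_id=entity_id,
                message=f"Web-based Data Entity '{entity_id}' SHOULD be directly downloadable",
            ))
    return issues


# Fake probe: only the first URL counts as reachable
reachable = {"https://example.org/ok.zip"}
ids = ["https://example.org/ok.zip", "https://example.org/missing.zip"]
print(downloadability_issues(ids, probe=lambda u: u in reachable))  # []
for issue in downloadability_issues(ids, probe=lambda u: u in reachable, enabled=True):
    print(issue.severity, issue.entity_id)  # RECOMMENDED https://example.org/missing.zip
```

Carrying the @id on each issue is what lets a report say which entity failed, which is the information missing from the output above.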

@kikkomep

Member Author

The first and last points were actually already fixed in develop; they were caused by internal issues in the validation/reporting logic rather than by the rule itself.

The first issue (url/identifier accepting plain URLs) is fixed by commit d3c9b6f: plain URL values are no longer treated as references to entities.
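As a rough illustration of the distinction, the sketch below flags string values that equal the @id of an entity in the graph (suggesting {"@id": ...} instead), but exempts url and identifier, which may legitimately hold plain URLs. This is one plausible heuristic under stated assumptions; the exemption set and the function name are hypothetical and not taken from commit d3c9b6f.

```python
# Properties allowed to carry a plain URL string (assumed list for this sketch)
PLAIN_URL_PROPERTIES = {"url", "identifier"}


def string_reference_issues(graph: list) -> list:
    """Report string values that match an entity @id and should be written as {"@id": ...}."""
    known_ids = {e["@id"] for e in graph if "@id" in e}
    issues = []
    for entity in graph:
        for prop, value in entity.items():
            # Skip JSON-LD keywords and properties where a plain URL is acceptable
            if prop.startswith("@") or prop in PLAIN_URL_PROPERTIES:
                continue
            values = value if isinstance(value, list) else [value]
            for v in values:
                if isinstance(v, str) and v in known_ids:
                    suggestion = '{"@id": "' + v + '"}'
                    issues.append(
                        f"Entity '{entity.get('@id')}' references '{v}' "
                        f"as a string in '{prop}'; use {suggestion}"
                    )
    return issues


doi = "https://doi.org/10.9027/zfqgkgek"
graph = [
    {"@id": doi, "@type": "Dataset", "identifier": doi, "url": doi},  # exempt
    {"@id": "#study", "@type": "CreativeWork", "about": doi},          # flagged
]
issues = string_reference_issues(graph)
print(len(issues))  # 1 (only the 'about' property of '#study')
```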

@simleo

simleo commented May 20, 2026

Member

I'm getting weird behavior with the following crate:

crate_0004_bmfhbn.zip

The validation passes with no errors:

rocrate-validator -y validate -v -w 79 -o err.txt crate_0004_bmfhbn.zip

But I get the following on the console:

 [2026-05-20 12:04:51,393] ERROR in signposting: Error checking downloadability of 'https://zenodo.org/files/gldgezao.zip': 404 Client Error:    
 NOT FOUND for url: https://zenodo.org/files/gldgezao.zip                                                                                                       
 Traceback (most recent call last):                                                                                                                                
   File "/home/simleo/git/rocrate-validator/rocrate_validator/utils/signposting.py", line 94, in check_downloadable                                                
     response.raise_for_status()                                                                                                                                   
   File "/home/simleo/git/rocrate-validator/.venv/lib/python3.12/site-packages/requests/models.py", line 1028, in raise_for_status                                 
     raise HTTPError(http_error_msg, response=self)                                                                                                                
 requests.exceptions.HTTPError: 404 Client Error: NOT FOUND for url: https://zenodo.org/files/gldgezao.zip                                                         
 [2026-05-20 12:04:51,431] ERROR in signposting: Error checking downloadability of 'https://github.com/files/xgnccfwq.zip': 404 Client Error:    
 Not Found for url: https://github.com/files/xgnccfwq.zip                                                                                                       
 Traceback (most recent call last):                                                                                                                                
   File "/home/simleo/git/rocrate-validator/rocrate_validator/utils/signposting.py", line 94, in check_downloadable                                                
     response.raise_for_status()                                                                                                                                   
   File "/home/simleo/git/rocrate-validator/.venv/lib/python3.12/site-packages/requests/models.py", line 1028, in raise_for_status                                 
     raise HTTPError(http_error_msg, response=self)                                                                                                                
 requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://github.com/files/xgnccfwq.zip                                                         

These messages disappear if I add --skip-availability-check to the command.
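The log above shows HTTP errors from the availability check being logged with full tracebacks while validation still passes. A rough sketch of the alternative, where the error is caught and returned as a short description that a check could turn into a reported issue: it uses only the standard library (urllib), not the requests-based code in rocrate_validator/utils/signposting.py, and the probe_url name and return convention are hypothetical. The opener is injectable so the example runs offline.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional


def probe_url(url: str, opener: Callable = urllib.request.urlopen,
              timeout: float = 10.0) -> Optional[str]:
    """Return None if a HEAD request succeeds, otherwise a short error description."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout):
            return None
    except urllib.error.HTTPError as e:
        # Must come before URLError (HTTPError is a subclass); e.g. 404 Not Found
        return f"HTTP {e.code} for '{url}'"
    except urllib.error.URLError as e:
        # DNS failures, refused connections and similar network errors
        return f"Cannot reach '{url}': {e.reason!s}"


def fake_404_opener(request, timeout):
    # Simulates the Zenodo response seen in the log above
    raise urllib.error.HTTPError(request.full_url, 404, "NOT FOUND", None, None)


print(probe_url("https://zenodo.org/files/gldgezao.zip", opener=fake_404_opener))
# HTTP 404 for 'https://zenodo.org/files/gldgezao.zip'
```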

@kikkomep

kikkomep commented May 20, 2026

Member Author

The issue with crate_0004_bmfhbn.zip is fixed by 9468652 and b37bc0e.

@kikkomep kikkomep force-pushed the feat/ro-crate-1.2 branch from b37bc0e to 28d71af June 16, 2026 07:51
kikkomep added 3 commits June 16, 2026 16:10
…e 1.2 checkers

- Reorder imports consistently (stdlib → third-party → local)
- Extract helper methods (_resolve_cite_as_url, _not_downloadable_message,
  _needs_sddatepublished_check, _check_entity_local_path, _check_distribution)
- Replace try/except/pass with contextlib.suppress
- Flatten nested conditionals with combined guards
- Use e!s over str(e) in f-strings
- Normalize quote style and multi-line formatting
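Two of the refactorings listed in this commit are standard Python idioms. A small generic illustration (not code from the PR; the function names are made up for the example):

```python
import contextlib
import json


def load_optional_json(text: str):
    # Before:
    #     try:
    #         return json.loads(text)
    #     except json.JSONDecodeError:
    #         pass
    # After: contextlib.suppress states the "ignore this error" intent explicitly
    with contextlib.suppress(json.JSONDecodeError):
        return json.loads(text)
    return None  # reached only when the decode error was suppressed


def describe_error(e: Exception) -> str:
    # The !s conversion flag applies str(), so this equals f"Error: {str(e)}"
    return f"Error: {e!s}"


print(load_optional_json('{"a": 1}'))  # {'a': 1}
print(load_optional_json("not json"))  # None
print(describe_error(ValueError("boom")))  # Error: boom
```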
@kikkomep kikkomep force-pushed the feat/ro-crate-1.2 branch from 28d71af to ba83e51 June 18, 2026 09:08
@kikkomep kikkomep marked this pull request as ready for review June 18, 2026 09:08

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants