#2634 added feature for CTKP and DAKP edges #2644

Merged
edeutsch merged 8 commits into master from issue2634 on Feb 26, 2026

Conversation

@chunyuma
Collaborator

@chunyuma chunyuma commented Feb 6, 2026

Hi @dkoslicki,

I have added the conditional features to CTKP and DAKP edges based on the requirements from #2634.

For TMKP edges, I don't know what TMKP is, so I don't know how to process them. If "TM"KP refers to infores:text-mining-provider-cooccurrence, that KP doesn't have a "treats" predicate; it only has the occurs_together_in_literature_with predicate.

@chunyuma chunyuma requested a review from dkoslicki February 6, 2026 19:12
@saramsey
Member

saramsey commented Feb 6, 2026

TMKP = text mining knowledge provider, I think

@chunyuma
Collaborator Author

chunyuma commented Feb 7, 2026

Thanks @saramsey. If TMKP = text mining knowledge provider, then the only "text mining" KP appears to be infores:text-mining-provider-cooccurrence, based on arax_infores_list.json. I am not sure whether this list includes all KPs. But as I mentioned above, infores:text-mining-provider-cooccurrence doesn't have a "treats" predicate; it only has the occurs_together_in_literature_with predicate.

@chunyuma chunyuma requested a review from saramsey February 12, 2026 19:40
@chunyuma
Collaborator Author

chunyuma commented Feb 12, 2026

Hi @dkoslicki and @saramsey,

I think I have resolved [issue2634](https://github.com//issues/2634).

These "elevate to treats" rules (please see below) now work for both the old ARAX system (TRAPI 1.5.0) and the new Shepherd system.

For CTKP edges,

  • the edge key needs to contain biolink:in_clinical_trials_for and infores:multiomics-clinicaltrials.
  • they need to contain the elevate_to_prediction attribute (without this attribute, we don't consider the edges to be from CTKP, so they are still elevated to treats).
  • when they contain the elevate_to_prediction attribute, we elevate to a treats prediction if and only if elevate_to_prediction = True.

For DAKP/FAERS edges,

  • the edge key needs to contain biolink:applied_to_treat and either infores:multiomics-drugapprovals or infores:faers.
  • they need to contain the biolink:number_of_cases attribute (without this attribute, we don't consider the edges to be from DAKP/FAERS, so they are still elevated to treats).
  • when they contain the biolink:number_of_cases attribute, we elevate to a treats prediction if and only if biolink:number_of_cases > 24.

For TMKP edges,

  • the edge key needs to contain biolink:treats_or_applied_or_studied_to_treat and infores:text-mining-provider-cooccurrence.
  • they need to contain both an agent_type attribute equal to text_mining_agent and a biolink:evidence_count attribute (without both attributes, we don't consider the edges to be from TMKP, so they are still elevated to treats).
  • when they contain the biolink:evidence_count attribute, we elevate to a treats prediction if and only if biolink:evidence_count > 5.

Other treats subclass edges from other KPs are still elevated to treats.
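
For readers skimming the thread, the rules listed above can be sketched roughly as follows. This is an illustration only, not the code merged in this PR: the function names `attr_value` and `should_elevate` are hypothetical, the attribute id `elevate_to_prediction` is taken as written in the comment (it may carry a prefix in real data), and the flag is assumed to be a Python boolean.

```python
from typing import Any, Optional


def attr_value(edge: dict, type_id: str) -> Optional[Any]:
    """Return the value of the first attribute with this type id, or None."""
    for attr in edge.get("attributes") or []:
        if attr.get("attribute_type_id") == type_id:
            return attr.get("value")
    return None


def should_elevate(edge_key: str, edge: dict) -> bool:
    """Hypothetical helper: apply the CTKP, DAKP/FAERS, and TMKP rules."""
    # CTKP: only gated when the elevate_to_prediction attribute is present.
    if ("biolink:in_clinical_trials_for" in edge_key
            and "infores:multiomics-clinicaltrials" in edge_key):
        flag = attr_value(edge, "elevate_to_prediction")  # assumed attribute id
        if flag is not None:
            return flag is True

    # DAKP/FAERS: only gated when biolink:number_of_cases is present.
    if "biolink:applied_to_treat" in edge_key and (
            "infores:multiomics-drugapprovals" in edge_key
            or "infores:faers" in edge_key):
        cases = attr_value(edge, "biolink:number_of_cases")
        if cases is not None:
            return cases > 24

    # TMKP: only gated when both agent_type and evidence_count are present.
    if ("biolink:treats_or_applied_or_studied_to_treat" in edge_key
            and "infores:text-mining-provider-cooccurrence" in edge_key):
        agent = attr_value(edge, "biolink:agent_type")
        count = attr_value(edge, "biolink:evidence_count")
        if agent == "text_mining_agent" and count is not None:
            return count > 5

    # Any other treats-subclass edge is still elevated at this stage of the PR.
    return True
```

Note that the fall-through `return True` reflects the behavior described in this comment only; edges that match a KP by key but lack the gating attribute are treated like edges from any other KP.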

@chunyuma
Collaborator Author

Here is an example:

Edge Key ID:
infores:retriever:CHEBI:8382--biolink:applied_to_treat--None--None--None--MONDO:0015564--infores:faers
Edge Content:

{'attributes': [{'attribute_source': None,
                 'attribute_type_id': 'biolink:number_of_cases',
                 'attributes': None,
                 'description': None,
                 'original_attribute_name': None,
                 'value': 29,
                 'value_type_id': None,
                 'value_url': None},
                {'attribute_source': None,
                 'attribute_type_id': 'biolink:agent_type',
                 'attributes': None,
                 'description': None,
                 'original_attribute_name': None,
                 'value': 'manual_validation_of_automated_agent',
                 'value_type_id': None,
                 'value_url': None},
                {'attribute_source': None,
                 'attribute_type_id': 'biolink:knowledge_level',
                 'attributes': None,
                 'description': None,
                 'original_attribute_name': None,
                 'value': 'observation',
                 'value_type_id': None,
                 'value_url': None},
                {'attribute_source': None,
                 'attribute_type_id': 'biolink:clinical_approval_status',
                 'attributes': None,
                 'description': None,
                 'original_attribute_name': None,
                 'value': 'off_label_use',
                 'value_type_id': None,
                 'value_url': None},
                {'attribute_source': None,
                 'attribute_type_id': 'biolink:original_subject',
                 'attributes': None,
                 'description': None,
                 'original_attribute_name': None,
                 'value': 'CHEBI:8382',
                 'value_type_id': None,
                 'value_url': None},
                {'attribute_source': None,
                 'attribute_type_id': 'biolink:original_object',
                 'attributes': None,
                 'description': None,
                 'original_attribute_name': None,
                 'value': 'MONDO:0015564',
                 'value_type_id': None,
                 'value_url': None},
                {'attribute_source': None,
                 'attribute_type_id': 'biolink:category',
                 'attributes': None,
                 'description': None,
                 'original_attribute_name': None,
                 'value': ['biolink:EntityToDiseaseAssociation'],
                 'value_type_id': None,
                 'value_url': None}],
 'object': 'MONDO:0015564',
 'predicate': 'biolink:applied_to_treat',
 'qualifiers': None,
 'sources': [{'resource_id': 'infores:multiomics-drugapprovals',
              'resource_role': 'aggregator_knowledge_source',
              'source_record_urls': ['https://db.systemsbiology.net/gestalt/cgi-pub/KGinfo.pl?id=71da8365-e309-3c4c-b2be-0c372b64b94f'],
              'upstream_resource_ids': ['infores:dailymed', 'infores:faers']},
             {'resource_id': 'infores:faers',
              'resource_role': 'primary_knowledge_source',
              'source_record_urls': None,
              'upstream_resource_ids': None},
             {'resource_id': 'infores:dailymed',
              'resource_role': 'supporting_data_source',
              'source_record_urls': None,
              'upstream_resource_ids': None},
             {'resource_id': 'infores:dogpark-tier0',
              'resource_role': 'aggregator_knowledge_source',
              'source_record_urls': None,
              'upstream_resource_ids': ['infores:multiomics-drugapprovals']},
             {'resource_id': 'infores:retriever',
              'resource_role': 'aggregator_knowledge_source',
              'source_record_urls': None,
              'upstream_resource_ids': ['infores:dogpark-tier0']},
             {'resource_id': 'infores:arax',
              'resource_role': 'aggregator_knowledge_source',
              'source_record_urls': None,
              'upstream_resource_ids': ['infores:retriever']}],
 'subject': 'CHEBI:8382'}

It was elevated to treats prediction:

Edge Key ID:
creative_expand_treats_edge:CHEBI:8382--treats--MONDO:0015564--infores:arax
Edge Content:

{'attributes': [{'attribute_source': 'infores:arax',
                 'attribute_type_id': 'biolink:agent_type',
                 'attributes': None,
                 'description': None,
                 'original_attribute_name': None,
                 'value': 'computational_model',
                 'value_type_id': None,
                 'value_url': None},
                {'attribute_source': 'infores:arax',
                 'attribute_type_id': 'biolink:knowledge_level',
                 'attributes': None,
                 'description': None,
                 'original_attribute_name': None,
                 'value': 'prediction',
                 'value_type_id': None,
                 'value_url': None},
                {'attribute_source': 'infores:arax',
                 'attribute_type_id': 'biolink:support_graphs',
                 'attributes': None,
                 'description': None,
                 'original_attribute_name': None,
                 'value': ['aux_graph_infores:retriever:CHEBI:8382--biolink:applied_to_treat--None--None--None--MONDO:0015564--infores:faers_creative_expand_treats_group_t_edge'],
                 'value_type_id': None,
                 'value_url': None}],
 'object': 'MONDO:0015564',
 'predicate': 'biolink:treats',
 'qualifiers': None,
 'sources': [{'resource_id': 'infores:arax',
              'resource_role': 'primary_knowledge_source',
              'source_record_urls': None,
              'upstream_resource_ids': None}],
 'subject': 'CHEBI:8382'}

Member

@dkoslicki dkoslicki left a comment

Looks good to me! I pinged @mbrush to take a look too

@dkoslicki
Member

Note: Matt relays that:

The infores for TMKP should be infores:text-mining-provider-targeted (not infores:text-mining-provider-cooccurrence) . . . This was an error in the initial ingest. Should be corrected in next release.

@chunyuma
Collaborator Author

chunyuma commented Feb 24, 2026

Thanks for the update @dkoslicki.

I didn't find infores:text-mining-provider-targeted in the KP list. This list is probably incomplete for the new system. I am happy to update the KP if Matt confirms that infores:text-mining-provider-targeted will be used.

@dkoslicki
Member

@mbrush can you confirm the above? We like to hold these kinds of discussions in the GitHub issues themselves instead of slack DMs, so the record is searchable in the future

@saramsey
Member

I took a look at the EPC edge attributes in the example provided, and they look reasonable.

@dkoslicki
Member

@chunyuma , three updates to this:

  1. Our previous (and current on this PR) behavior was to automatically elevate to treats any non-treats predicates in the mixin treats_or_applied_or_studied_to_treat (i.e. the treats_like_predicates in this line). We want to switch that behavior: do not elevate to treats any treats_like_predicates unless they conform to one of the rules you are implementing. I think this is equivalent to removing things like these lines and possibly elsewhere too.
  2. With this new logic, we will want to add a rule to allow any and all treats_like_predicates coming from CTD to be elevated to treats.
  3. Add a rule to elevate SemMedDB treats_like_predicates to treats if and only if the number of associated publications/PMIDs is >=10.

Then we should be good to go!
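
A rough sketch of how items 1 to 3 might look, for illustration only. It assumes the CTD and SemMedDB sources are identified as infores:ctd and infores:semmeddb, that PMIDs are carried in a `biolink:publications` attribute, and that the caller already has the set of infores ids for the edge; the helper names are hypothetical and none of this is confirmed by the PR.

```python
from typing import Any, Optional, Set


def attr_value(edge: dict, type_id: str) -> Optional[Any]:
    """Return the value of the first attribute with this type id, or None."""
    for attr in edge.get("attributes") or []:
        if attr.get("attribute_type_id") == type_id:
            return attr.get("value")
    return None


def publication_count(edge: dict) -> int:
    """Count distinct publications; assumes a biolink:publications attribute."""
    pubs = attr_value(edge, "biolink:publications")
    if pubs is None:
        return 0
    if isinstance(pubs, str):
        return 1
    return len(set(pubs))


def elevate_by_source(edge: dict, source_ids: Set[str]) -> bool:
    """Hypothetical helper covering the CTD and SemMedDB rules only."""
    # Item 2: every treats-like edge from CTD is elevated.
    if "infores:ctd" in source_ids:
        return True
    # Item 3: SemMedDB edges need at least 10 associated publications.
    if "infores:semmeddb" in source_ids:
        return publication_count(edge) >= 10
    # Item 1: with no matching rule, the edge is NOT elevated.
    return False
```

In a fuller version, the CTKP, DAKP/FAERS, and TMKP checks would run before the final `return False`, so that the default becomes "do not elevate" rather than "elevate".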

@edeutsch
Collaborator

Perhaps relevant Slack DM:
image

@dkoslicki
Member

Thanks Eric, and yes, @edeutsch we will ping you after @chunyuma implements the three items above

@chunyuma
Collaborator Author

Thanks for the updates @dkoslicki.

As for the new rule to elevate SemMedDB treats_like_predicates, Amy's previous code removes all SemMedDB treats_like_predicates. That means that no treats_like_predicates from SemMedDB will be included in the final result KG.

However, given the new rule, @mbrush or @dkoslicki, could you clarify which of the following you want?

  1. Keep only those SemMedDB treats_like_predicates edges whose number of associated publications/PMIDs is >=10 and elevate them to treats, while still removing the other SemMedDB treats_like_predicates edges.
  2. Keep all SemMedDB treats_like_predicates edges, but elevate to treats only those that satisfy the rule.

@dkoslicki
Member

For these creative mode queries, the only results we wish to show are those that have been elevated to treats, with the support graphs containing the underlying true treats_like_predicates that support them. So for the SemMedDB edges, we want to keep those SemMedDB treats_like_predicates edges whose number of associated publications is >=10, elevate them to treats, and include the underlying predicate as a support/aux graph. None of the other SemMedDB edges (those that do not meet the >=10 publications threshold) should be included in any result. LMK if this makes sense or not.
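
The "elevate and keep the underlying edge as a support graph" step could look roughly like the sketch below. It mirrors the shape of the elevated edge shown in the example earlier in this thread; `build_treats_edge` and the `aux_graph_` id format are hypothetical simplifications, not the naming used by the merged code.

```python
from typing import Dict, Tuple


def build_treats_edge(edge_key: str, edge: dict) -> Tuple[dict, Dict[str, dict]]:
    """Hypothetical helper: wrap a qualifying treats-like edge in a treats edge."""
    aux_id = f"aux_graph_{edge_key}"  # simplified id, assumed for illustration

    def arax_attr(type_id: str, value) -> dict:
        return {
            "attribute_type_id": type_id,
            "value": value,
            "attribute_source": "infores:arax",
        }

    treats_edge = {
        "subject": edge["subject"],
        "object": edge["object"],
        "predicate": "biolink:treats",
        "attributes": [
            arax_attr("biolink:agent_type", "computational_model"),
            arax_attr("biolink:knowledge_level", "prediction"),
            # The support graph points back at the underlying edge.
            arax_attr("biolink:support_graphs", [aux_id]),
        ],
        "sources": [{
            "resource_id": "infores:arax",
            "resource_role": "primary_knowledge_source",
        }],
    }
    # The auxiliary graph holds the key of the original treats-like edge.
    aux_graphs = {aux_id: {"edges": [edge_key], "attributes": []}}
    return treats_edge, aux_graphs
```

Edges that fail the rule would simply never be passed to such a helper, so they would appear neither as results nor inside any support graph.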

@chunyuma
Collaborator Author

Thanks @dkoslicki.

I have already completed the three new rules above.

@dkoslicki
Member

Thanks @chunyuma ! A question about how different KPs are being identified: since you are pulling this information from the edge keys, are we confident that in the new Shepherd framework, these edge keys will remain stable/formatted in this fashion? I.e., do we construct the edge keys at run time from the various edge properties, or are they created elsewhere? I just want to be sure that the logic isn't tied to something that might change unbeknownst to us.

@chunyuma
Collaborator Author

are we confident that in the new Shepherd framework, these edge keys will remain stable/formatted in this fashion.

Hi @dkoslicki, my code is based on the edge keys from the "Dev Info" page of https://arax.ncats.io/shepherd/?r=434221. It seems like the new Shepherd framework uses a similar edge id format. Is that the correct place to refer to?

@dkoslicki
Member

@chunyuma , we will need to base this logic off the edge attributes, not the edge keys. When looking for a particular infores, if it appears nowhere in any of the EPC chain, then it's not present. If it shows up in at least one of the attributes in the EPC chain, then count it as from this source.
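
One way to read this requirement is sketched below, as an interpretation rather than the merged implementation. It treats the EPC chain as the edge's `sources` list (each entry's `resource_id` and `upstream_resource_ids`) plus any `attribute_source` on the attributes; the helper names `epc_infores_ids` and `is_from` are hypothetical.

```python
from typing import Set


def epc_infores_ids(edge: dict) -> Set[str]:
    """Collect every infores id mentioned anywhere in the edge's provenance."""
    ids: Set[str] = set()
    for source in edge.get("sources") or []:
        if source.get("resource_id"):
            ids.add(source["resource_id"])
        ids.update(source.get("upstream_resource_ids") or [])
    for attr in edge.get("attributes") or []:
        if attr.get("attribute_source"):
            ids.add(attr["attribute_source"])
    return ids


def is_from(edge: dict, infores_id: str) -> bool:
    """True if the infores id shows up at least once in the provenance."""
    return infores_id in epc_infores_ids(edge)
```

Applied to the FAERS example earlier in the thread, `is_from(edge, "infores:faers")` and `is_from(edge, "infores:multiomics-drugapprovals")` would both hold without looking at the edge key at all.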

@dkoslicki
Member

For posterity, this request comes from a bunch of different committees. From Matt:

It was discussed on several calls - including DINGO, Standards, and Data Modeling (last phase)

@chunyuma
Collaborator Author

we will need to base this logic off the edge attributes, not the edge keys. When looking for a particular infores, if it appears nowhere in any of the EPC chain, then it's not present. If it shows up in at least one of the attributes in the EPC chain, then count it as from this source.

Updated the code for this already @dkoslicki.

Member

@dkoslicki dkoslicki left a comment

Awesome, this looks good to go for me!

@dkoslicki
Member

This is good to merge. @edeutsch , can you deploy this to the ARAX Shepherd endpoint?

@edeutsch
Collaborator

deployed!

@edeutsch edeutsch merged commit 49e4c74 into master Feb 26, 2026
4 checks passed