API Xena Browser
The Xena Browser module provides a Python API for programmatically downloading and managing TCGA cohort data from the UCSC Xena Browser.
If you use UCSC Xena Browser data in your research, please cite:
Goldman, M.J., Craft, B., Hastie, M. et al. Visualizing and interpreting cancer genomics data via the Xena platform. Nat Biotechnol (2020). https://doi.org/10.1038/s41587-020-0546-8
Cohorts are described in YAML configuration files, so datasets can be added or changed without touching Python code. The module and its configuration files are laid out as follows:
src/oncolearn/api/xenabrowser/
├── builder.py # Builder pattern for creating cohorts from YAML
├── xena_dataset.py # Dataset class for Xena data
└── download.py # Download utilities
data/xenabrowser/configs/ # YAML configuration files
├── acc.yaml
├── blca.yaml
├── brca.yaml
└── ... (all TCGA cohorts)
Each cohort is defined in a YAML file with the following structure:
cohort:
  code: BRCA
  name: TCGA-BRCA
  description: TCGA Breast Cancer cohort with multi-modal genomics data
datasets:
  - name: BRCA Gene Expression (HiSeq)
    description: Illumina HiSeq gene expression (RNAseq) data
    category: mrna_seq
    url: https://tcga.xenahubs.net/download/TCGA.BRCA.sampleMap/HiSeqV2.gz
    filename: HiSeqV2.gz
    default_subdir: TCGA-BRCA/gene_expression
  # ... more datasets

To build and download a cohort:
from oncolearn.api.xenabrowser import XenaCohortBuilder
# Create a builder
builder = XenaCohortBuilder()
# Build and download a cohort
brca_cohort = builder.build_cohort("BRCA")
brca_cohort.download() # Downloads all BRCA datasets
# Download to a specific directory
brca_cohort.download(output_dir="my_data/brca")

To list the available cohorts:
from oncolearn.api.xenabrowser import XenaCohortBuilder
builder = XenaCohortBuilder()
cohorts = builder.list_available_cohorts()
print(cohorts)  # ['ACC', 'BLCA', 'BRCA', ...]

To list a cohort's datasets and download one of them:
from oncolearn.api.xenabrowser import XenaCohortBuilder
builder = XenaCohortBuilder()
brca_cohort = builder.build_cohort("BRCA")
# List all datasets
dataset_names = brca_cohort.list_datasets()
print(dataset_names)
# Download a specific dataset
gene_expr = brca_cohort.get_dataset("BRCA Gene Expression (HiSeq)")
gene_expr.download("my_data/brca/gene_expression")

To select datasets by category:
from oncolearn.api.xenabrowser import XenaCohortBuilder
from oncolearn.api.dataset import DataCategory
builder = XenaCohortBuilder()
brca_cohort = builder.build_cohort("BRCA")
# Get all clinical datasets
clinical_datasets = brca_cohort.get_datasets_by_category(DataCategory.CLINICAL)
# Get all mutation datasets
mutation_datasets = brca_cohort.get_datasets_by_category(DataCategory.MUTATION)

Available data categories and their subcategories/aliases:
- mrna_seq: mRNA sequencing data
  - Aliases: mrna, gene expression rnaseq, gene_expression_rnaseq
- dna_seq: DNA sequencing data
  - Aliases: dna
- mirna_seq: microRNA sequencing data
  - Aliases: mirna, stem loop expression, stem_loop_expression
- cnv: Copy number variation
  - Aliases: copy number, copy_number, copy number (gene-level), copy_number_gene_level
- mutation: Somatic mutations
  - Aliases: somatic mutation, somatic_mutation, somatic mutation (snps and small indels)
- methylation: DNA methylation
  - Aliases: dna methylation, dna_methylation
- protein: Protein expression
  - Aliases: protein expression, protein_expression
- clinical: Clinical/phenotype data
  - Aliases: phenotype
- snp: SNP data
- transcriptome: Transcriptome data
- genomics: General genomics (includes ATAC-seq)
  - Subcategories: atac-seq
- metabolomics: Metabolomics data
- proteomics: Proteomics data
- image: Imaging data
- manifest: Manifest files
- multimodal: Combined data types
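The alias list above implies a many-to-one lookup from free-form labels to canonical category names. The following is a minimal sketch of such a lookup, with the alias table transcribed from the list. `resolve_category` and the case-insensitive, whitespace-trimmed matching are illustrative assumptions, not OncoLearn's actual implementation.

```python
# Alias table transcribed from the category list above. The lookup logic
# itself is an assumption made for illustration, not OncoLearn's own code.
CATEGORY_ALIASES = {
    "mrna_seq": ["mrna", "gene expression rnaseq", "gene_expression_rnaseq"],
    "dna_seq": ["dna"],
    "mirna_seq": ["mirna", "stem loop expression", "stem_loop_expression"],
    "cnv": ["copy number", "copy_number", "copy number (gene-level)",
            "copy_number_gene_level"],
    "mutation": ["somatic mutation", "somatic_mutation",
                 "somatic mutation (snps and small indels)"],
    "methylation": ["dna methylation", "dna_methylation"],
    "protein": ["protein expression", "protein_expression"],
    "clinical": ["phenotype"],
    "genomics": ["atac-seq"],  # listed as a subcategory rather than an alias
}

# Categories that the list gives without any aliases.
PLAIN_CATEGORIES = ["snp", "transcriptome", "metabolomics", "proteomics",
                    "image", "manifest", "multimodal"]


def build_lookup():
    """Map every canonical name and alias to its canonical category name."""
    lookup = {}
    for canonical, aliases in CATEGORY_ALIASES.items():
        lookup[canonical] = canonical
        for alias in aliases:
            lookup[alias] = canonical
    for name in PLAIN_CATEGORIES:
        lookup[name] = name
    return lookup


_LOOKUP = build_lookup()


def resolve_category(label: str) -> str:
    """Return the canonical category for a label, ignoring case and outer spaces."""
    key = label.strip().lower()
    try:
        return _LOOKUP[key]
    except KeyError:
        raise ValueError(f"Unknown data category: {label!r}") from None
```

Under these assumptions, `resolve_category("Gene Expression RNAseq")` resolves to `mrna_seq`, and an unrecognized label raises `ValueError` instead of silently falling through.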
To add a new dataset to an existing cohort:
- Open the cohort's YAML file (e.g., configs/brca.yaml)
- Add a new entry to the datasets list:
    - name: BRCA New Dataset
      description: Description of the new dataset
      category: appropriate_category
      url: https://download.url/dataset.gz
      filename: dataset.gz
      default_subdir: TCGA-BRCA/subdirectory
- Save the file - no Python code changes needed!
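Because a new dataset is picked up from YAML alone, a typo in an entry may only surface at download time. The sketch below shows one way to check an entry after it has been parsed into a Python dict. The six keys come from the example entry; treating all of them as required is an assumption, and `validate_dataset_entry` is a hypothetical helper, not part of OncoLearn.

```python
from urllib.parse import urlparse

# Keys taken from the example dataset entry. Requiring all six is an
# assumption made for this sketch.
REQUIRED_DATASET_FIELDS = (
    "name", "description", "category", "url", "filename", "default_subdir",
)


def validate_dataset_entry(entry: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    for field in REQUIRED_DATASET_FIELDS:
        value = entry.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {field}")

    # Only check the URL shape when a non-empty URL string is present.
    url = entry.get("url")
    if isinstance(url, str) and url.strip():
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"url is not an http(s) URL: {url}")
    return problems


new_entry = {
    "name": "BRCA New Dataset",
    "description": "Description of the new dataset",
    "category": "appropriate_category",
    "url": "https://download.url/dataset.gz",
    "filename": "dataset.gz",
    "default_subdir": "TCGA-BRCA/subdirectory",
}
print(validate_dataset_entry(new_entry))
```

Returning a list of problems rather than raising on the first one lets all mistakes in an entry be reported in a single pass.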
To add a completely new cohort:
- Create a new YAML file in configs/ (e.g., newcohort.yaml)
- Follow the YAML structure shown above
- The cohort will automatically be available via the builder
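One plausible way for a new file to become available automatically is for cohort codes to be derived from the file names in the config directory (acc.yaml becomes ACC, and so on). The sketch below assumes exactly that. `discover_cohort_codes` is a hypothetical function, and the real `XenaCohortBuilder` may discover cohorts differently.

```python
import tempfile
from pathlib import Path


def discover_cohort_codes(config_dir):
    """List cohort codes as the upper-cased stems of the *.yaml files in config_dir.

    Assumption: one YAML file per cohort, named after its lower-case code.
    """
    config_path = Path(config_dir)
    return sorted(
        p.stem.upper() for p in config_path.glob("*.yaml") if p.is_file()
    )


# Demonstration against a throwaway directory, so no real configs are needed.
with tempfile.TemporaryDirectory() as tmp:
    for file_name in ("acc.yaml", "brca.yaml", "notes.txt"):
        (Path(tmp) / file_name).write_text("")
    demo_codes = discover_cohort_codes(tmp)

print(demo_codes)  # ['ACC', 'BRCA']
```

Files without the .yaml extension, such as notes.txt here, are ignored, so stray files in the directory do not show up as cohorts.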