This Nextflow pipeline performs pairwise blast to support clonal identification analysis.

seqwell/nextflow-pairwise-blast

nextflow-pairwise-blast

This Nextflow pipeline performs pairwise BLAST alignments to support clonal identification analysis. The pipeline aligns assembled FASTA sequences against multiple reference sequences using BLAST, extracts alignment scores, and identifies the best alignment for each sample.

Pipeline Overview

The pipeline forms the Cartesian product of all assembled FASTA files and reference sequences and runs a BLAST alignment for every sample-reference combination. The final output includes the alignment scores for each combination and a summary of the best alignment for each sample.
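
To make the pairing concrete, the shell sketch below enumerates every sample-reference combination. It only illustrates the idea and is not the pipeline's own code, which builds the pairs inside Nextflow. The function name `list_pairs` and the tab-separated output are assumptions made for this example.

```shell
# Hypothetical helper (not part of the pipeline): print one
# "sample<TAB>reference" line for every combination of assembled
# FASTA file (*.fasta) and reference FASTA file (*.fa or *.fasta).
list_pairs() {
    local assembled_dir="$1" ref_dir="$2"
    local sample ref
    for sample in "$assembled_dir"/*.fasta; do
        [ -e "$sample" ] || continue      # glob matched nothing
        for ref in "$ref_dir"/*.fa "$ref_dir"/*.fasta; do
            [ -e "$ref" ] || continue     # glob matched nothing
            printf '%s\t%s\n' "$(basename "$sample")" "$(basename "$ref")"
        done
    done
}
```

Calling `list_pairs tests/assembled_fa tests/ref` prints one line per pair, so N assembled files and M references give N × M lines.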

Pairwise Alignment Workflow

The pipeline processes assembled sequences through the following key steps:

  1. BLAST2: Performs BLAST alignment of each assembled sequence against all reference sequences
  2. BEST_ALIGNMENTS: Identifies the best alignment for each sample based on BLAST scores
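
The selection in step 2 can be sketched in shell as follows. This is an illustration under stated assumptions, not the BEST_ALIGNMENTS module itself. It assumes the BLAST results are in tabular format (`-outfmt 6`), where column 12 holds the bit score; the format the pipeline actually writes may differ. The function name `best_hit` is hypothetical.

```shell
# Hypothetical helper: print the hit with the highest bit score.
# ASSUMPTION: input is BLAST tabular output (-outfmt 6), in which
# column 12 is the bit score. On ties the first hit seen is kept.
best_hit() {
    awk -F '\t' '
        NF >= 12 && (!seen || $12 + 0 > best) {
            best = $12 + 0      # force numeric comparison
            line = $0
            seen = 1
        }
        END { if (seen) print line }
    ' "$@"
}
```

Passing all of a sample's result files at once, for example `best_hit sample.*.blast.txt`, returns the single top-scoring hit across every reference for that sample.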

Dependencies

This pipeline requires installation of:

  • Nextflow: Workflow management system
  • Docker: Containerization platform for running pipeline processes

Docker Containers

All Docker containers used in this pipeline should be publicly available. Each one is specified in its respective module file:

  • BLAST2: seqwell/fq_assemble:v1.0
  • BEST_ALIGNMENTS: ubuntu:22.04

How to Run the Pipeline

Required Parameters

The pipeline requires the following parameters:

--assembled_fa

Path to a directory containing assembled FASTA files (*.fasta). Each FASTA file represents an assembled sequence to be aligned against references. This can be either a local absolute path or an AWS S3 URI. If using an S3 URI, ensure your AWS credentials are properly configured in the nextflow.config file.

--ref

Path to a directory containing reference FASTA files (*.fa or *.fasta). Each FASTA file will be used as a BLAST reference database. This can be either a local absolute path or an AWS S3 URI. If using an S3 URI, ensure your AWS credentials are properly configured in the nextflow.config file.

--output

Path to the output directory where results will be saved. This can be either a local absolute path or an AWS S3 URI. If using an S3 URI, ensure your AWS credentials are properly configured in the nextflow.config file.

--run_id

A unique identifier for the sequencing run being analysed.
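
Before launching a run, it can help to confirm that the input directories actually contain files with the expected extensions. The sketch below is an optional pre-flight check, not something the pipeline provides; the function name `has_fasta` is hypothetical, and S3 URIs are simply accepted without inspection because plain shell globs cannot read them.

```shell
# Hypothetical pre-flight check (not part of the pipeline): succeed only
# if a local directory holds at least one file with one of the given
# extensions. S3 URIs are accepted without being inspected.
has_fasta() {
    local dir="$1"
    shift
    case "$dir" in
        s3://*) return 0 ;;
    esac
    local ext f
    for ext in "$@"; do
        for f in "$dir"/*."$ext"; do
            if [ -e "$f" ]; then
                return 0
            fi
        done
    done
    return 1
}
```

For example, `has_fasta "${PWD}/tests/assembled_fa" fasta` checks the --assembled_fa directory, and `has_fasta "${PWD}/tests/ref" fa fasta` checks the --ref directory, matching the extensions listed above.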

Profiles

Profiles can be selected with the -profile option at the command line. Common profiles include:

  • docker: Runs the pipeline using Docker containers (this is the default profile)
  • test: Runs the pipeline using Docker containers with all parameters set to their default values

Example Commands

Basic Execution

A minimal execution might look like:

nextflow run \
    main.nf \
    --assembled_fa "${PWD}/path/to/assembled/directory" \
    --ref "${PWD}/path/to/references" \
    --run_id "test" \
    --output "blast2_out" \
    -resume -bg

Running Test Data

The pipeline can be run using test data with:

nextflow run \
    main.nf \
    --assembled_fa "${PWD}/tests/assembled_fa" \
    --ref "${PWD}/tests/ref" \
    --run_id "test" \
    --output "blast2_out" \
    -resume -bg

Expected Outputs

└── blast2_out
    ├── best_alignments
    │   └── test_best_alignments.csv
    └── blast2
        ├── EP_1002_A01.final.pBR322.blast.besthit.txt                               # best blast alignment for each sample
        ├── EP_1002_A01.final.pBR322.blast.txt                                       # blast results
        ├── EP_1002_A01.final.pUC19.blast.besthit.txt
        ├── EP_1002_A01.final.pUC19.blast.txt
        ├── EP_1002_A01.final.seqWell_DelwithpUCIDT-KanGoldenGate+.blast.besthit.txt
        ├── EP_1002_A01.final.seqWell_DelwithpUCIDT-KanGoldenGate+.blast.txt
        ......
        ├── EP_1002_A06.final.pBR322.blast.besthit.txt
        ├── EP_1002_A06.final.pBR322.blast.txt
        ├── EP_1002_A06.final.pUC19.blast.besthit.txt
        ├── EP_1002_A06.final.pUC19.blast.txt
        ├── EP_1002_A06.final.seqWell_DelwithpUCIDT-KanGoldenGate+.blast.besthit.txt
        └── EP_1002_A06.final.seqWell_DelwithpUCIDT-KanGoldenGate+.blast.txt
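
A quick way to confirm that a run produced everything is to check for both result files for each sample-reference pair. The sketch below assumes the naming pattern seen in the tree above, `<sample>.<reference>.blast.txt` and `<sample>.<reference>.blast.besthit.txt` under a `blast2` subdirectory. The function name `missing_outputs` is hypothetical and not part of the pipeline.

```shell
# Hypothetical check: print the expected BLAST2 result files that are
# missing for one sample. ASSUMPTION: files are named
# <sample>.<reference>.blast.txt and <sample>.<reference>.blast.besthit.txt
# inside <output>/blast2, as in the listing above.
missing_outputs() {
    local out_dir="$1" sample="$2"
    shift 2
    local ref suffix
    for ref in "$@"; do
        for suffix in blast.txt blast.besthit.txt; do
            [ -e "$out_dir/blast2/$sample.$ref.$suffix" ] ||
                printf '%s\n' "$sample.$ref.$suffix"
        done
    done
}
```

For instance, `missing_outputs blast2_out EP_1002_A01.final pBR322 pUC19` prints nothing when all four files for that sample are present, and otherwise prints the name of each missing file.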
