This Nextflow pipeline performs pairwise BLAST alignments to support clonal identification analysis. The pipeline aligns assembled FASTA sequences against multiple reference sequences using BLAST, extracts alignment scores, and identifies the best alignment for each sample.
The pipeline creates a Cartesian product of all assembled FASTA files and reference sequences, performing comprehensive BLAST alignment across all combinations. The final output includes alignment scores for each sample-reference combination and a summary of the best alignments.
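The pairing described above can be sketched in shell. This is only an illustration of the Cartesian product, not the pipeline's own code: `list_pairs` is a hypothetical helper, and the file extensions it globs for are taken from the parameter descriptions in this README.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the pipeline): print one line per
# (assembled FASTA, reference FASTA) combination as "sample<TAB>reference".
list_pairs() {
    local assembled_dir="$1" ref_dir="$2"
    local sample ref
    for sample in "$assembled_dir"/*.fasta; do
        [ -e "$sample" ] || continue          # skip an unmatched glob
        for ref in "$ref_dir"/*.fa "$ref_dir"/*.fasta; do
            [ -e "$ref" ] || continue
            # Strip directory and extension from both names
            printf '%s\t%s\n' \
                "$(basename "$sample" .fasta)" \
                "$(basename "${ref%.*}")"
        done
    done
}

# Usage (paths are placeholders):
#   list_pairs tests/assembled_fa tests/ref
```

With N assembled files and M references this prints N x M lines, which corresponds to the `<sample>.<reference>.blast.txt` files listed in the output tree.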
The pipeline processes assembled sequences through the following key steps:
- BLAST2: Performs BLAST alignment of each assembled sequence against all reference sequences
- BEST_ALIGNMENTS: Identifies the best alignment for each sample based on BLAST scores
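The selection done by the second step can be pictured with a small shell sketch. It assumes a simplified three-column, tab-separated table (sample, reference, score); the real intermediate format and the exact score used by BEST_ALIGNMENTS are not specified here, and `best_per_sample` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper (assumed input format, not the pipeline's own):
# read "sample<TAB>reference<TAB>score" lines on stdin and print the
# highest-scoring line for each sample.
best_per_sample() {
    local tab
    tab="$(printf '\t')"
    # Sort by sample, then by score in descending numeric order,
    # and keep only the first line seen for each sample.
    sort -t "$tab" -k1,1 -k3,3nr | awk -F '\t' '!seen[$1]++'
}

# Usage (scores.tsv is a placeholder file name):
#   best_per_sample < scores.tsv
```

Ties between references with equal scores are resolved by sort order in this sketch; the pipeline may handle them differently.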
This pipeline requires installation of:
- Nextflow: Workflow management system
- Docker: Containerization platform for running pipeline processes
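Before launching, it may help to confirm that both tools are on the `PATH`. The following is a minimal sketch; `check_tools` is a hypothetical helper and not something shipped with the pipeline.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report every named tool that is not found on PATH.
# Returns 0 if all tools are present, 1 otherwise.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   check_tools nextflow docker || echo "install the missing tools first"
```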
All Docker containers used in this pipeline should be publicly available and are specified in the respective module files:
- BLAST2: seqwell/fq_assemble:v1.0
- BEST_ALIGNMENTS: ubuntu:22.04
The pipeline requires the following parameters:
- --assembled_fa: Path to a directory containing assembled FASTA files (*.fasta). Each FASTA file represents an assembled sequence to be aligned against the references. This can be either a local absolute path or an AWS S3 URI. If using an S3 URI, ensure your AWS credentials are properly configured in the nextflow.config file.
- --ref: Path to a directory containing reference FASTA files (*.fa or *.fasta). Each FASTA file will be used as a BLAST reference database. This can be either a local absolute path or an AWS S3 URI. If using an S3 URI, ensure your AWS credentials are properly configured in the nextflow.config file.
- --output: The output directory path where results will be saved. This can be either a local absolute path or an AWS S3 URI. If using an S3 URI, ensure your AWS credentials are properly configured in the nextflow.config file.
- --run_id: A unique identifier for the sequencing run being analysed.
Profiles can be selected with the -profile option at the command line. Common profiles include:
- docker: Run the pipeline using Docker containers (this is the default profile)
- test: Run the pipeline using Docker containers with all parameters set to their default values
A minimal execution might look like:
nextflow run \
main.nf \
--assembled_fa "${PWD}/path/to/assembled/directory" \
--ref "${PWD}/path/to/references" \
--run_id "test" \
--output "blast2_out" \
-resume -bg

The pipeline can be run using test data with:
nextflow run \
main.nf \
--assembled_fa "${PWD}/tests/assembled_fa" \
--ref "${PWD}/tests/ref" \
--run_id "test" \
--output "blast2_out" \
-resume -bg

└── blast2_output
    ├── best_alignments
    │   └── test_best_alignments.csv
    └── blast2
        ├── EP_1002_A01.final.pBR322.blast.besthit.txt # best blast alignment for each sample
        ├── EP_1002_A01.final.pBR322.blast.txt # blast results
        ├── EP_1002_A01.final.pUC19.blast.besthit.txt
        ├── EP_1002_A01.final.pUC19.blast.txt
        ├── EP_1002_A01.final.seqWell_DelwithpUCIDT-KanGoldenGate+.blast.besthit.txt
        ├── EP_1002_A01.final.seqWell_DelwithpUCIDT-KanGoldenGate+.blast.txt
        ......
        ├── EP_1002_A06.final.pBR322.blast.besthit.txt
        ├── EP_1002_A06.final.pBR322.blast.txt
        ├── EP_1002_A06.final.pUC19.blast.besthit.txt
        ├── EP_1002_A06.final.pUC19.blast.txt
        ├── EP_1002_A06.final.seqWell_DelwithpUCIDT-KanGoldenGate+.blast.besthit.txt
        └── EP_1002_A06.final.seqWell_DelwithpUCIDT-KanGoldenGate+.blast.txt
