Roary — rapid large-scale prokaryotic pan genome analysis. Calculates the pan genome from annotated assemblies (GFF3 from Prokka/Bakta), producing core and accessory gene clusters, gene presence/absen

10

Sage

Use when working with sage, an ultrafast proteomics search engine written in Rust

9

SAIGE

SAIGE — scalable genome-wide association tests in biobank-scale data using generalized mixed models with saddlepoint approximation. Fits null logistic/linear mixed models with sparse or full GRM to ac

11

salmon

salmon — Fast, bias-aware transcript quantification from RNA-seq data using selective alignment to the transcriptome. Supports bulk RNA-seq (mapping-based and alignment-based modes), single-cell RNA-s

11

SALSA2

SALSA2 — scaffold long-read genome assemblies using Hi-C proximity ligation data. Takes a draft contig assembly and Hi-C read alignments (BAM/BED) to produce chromosome-scale scaffolds. Iteratively co

10

Sambamba

Sambamba — high-performance BAM/CRAM processing tool written in D with native multi-threading. Provides fast sorting, indexing, duplicate marking, merging, filtering, depth calculation, flagstat, and

11

scib -- Single-Cell Integration Benchmarking

scib (single-cell integration benchmarking) -- Python framework for evaluating and benchmarking batch correction and data integration methods in single-cell omics. Computes standardized metrics for bi

9

SciPy

Use when working with SciPy — the foundational Python scientific computing library — for statistical testing, signal processing, optimization, linear algebra, spatial analysis, and numerical integrati

10

SEACR (Sparse Enrichment Analysis for CUT&RUN)

SEACR (Sparse Enrichment Analysis for CUT&RUN) — peak caller specifically designed for CUT&RUN and CUT&Tag chromatin profiling data. Uses the sparse signal characteristics of CUT&RUN to call enriched

11

seqtk

seqtk — fast lightweight C toolkit for processing FASTA and FASTQ files. Supports format conversion (FASTQ↔FASTA), random subsampling, quality trimming, reverse complement, base composition, sequence

11

Shasta

Shasta — fast de novo long-read genome assembler optimized for Oxford Nanopore (ONT) reads. Produces haploid or phased diploid assemblies from nanopore data using run-length encoding, MinHash-based ov

10

sleuth

sleuth — R package for differential expression analysis of RNA-seq data at the transcript level. Works with kallisto bootstrap quantifications to model technical variability using a response error mod

11

smartpca/EIGENSOFT

smartpca/EIGENSOFT -- C/C++ tool for principal component analysis of genome-wide SNP genotype data. Computes eigenvectors and eigenvalues for population structure analysis, ancestry inference, stratif

8

SNAP

Use when working with SNAP (Scalable Nucleotide Alignment Program) — a fast DNA sequence aligner developed at UC Berkeley's AMPLab. Use for aligning short or long DNA reads to a reference genome, buil

10

SnapATAC2

SnapATAC2 — Python/Rust toolkit for single-cell ATAC-seq analysis. Provides fragment file import, cell-by-bin/peak matrix generation, spectral embedding dimensionality reduction, leiden clustering, MA

11

Sniffles2

Sniffles2 — fast structural variant caller for long-read sequencing data (PacBio HiFi, Oxford Nanopore). Detects deletions, insertions, duplications, inversions, and translocations from BAM/CRAM align

11

SnpEff

SnpEff — fast Java-based variant annotation and effect prediction tool that annotates genomic variants (SNPs, indels, MNPs) with gene impact, protein changes, loss-of-function predictions, and HGVS no

10

SOAPdenovo2

SOAPdenovo2 — de novo short-read genome assembler for large plant and animal genomes using de Bruijn graph construction. Runs a four-stage pipeline: pregraph (k-mer graph), contig (initial contigs), m

10

SOAPnuke

SOAPnuke — C++ quality control and preprocessing tool for high-throughput sequencing data. Filters and trims paired-end or single-end FASTQ reads by adapter content, low quality bases, N-base ratio, r

11

SortMeRNA

SortMeRNA — fast filtering of ribosomal RNA reads from metatranscriptomic and RNA-seq data using local sequence alignment against curated rRNA databases (SILVA, RFAM). CLI tool for rRNA removal, rRNA

10

Souporcell

Verified

Souporcell — genotype-based demultiplexing of pooled single-cell RNA-seq experiments. Assigns cells to donor of origin using SNP variants from the aligned BAM file without requiring known genotypes. D

10

squigualiser

squigualiser is a Python tool for visualizing raw nanopore sequencing signal (squiggle) data aligned to reference sequences. Generates interactive HTML-based plots using Bokeh that overlay raw current

11

Stacks

Stacks is a software pipeline for building loci from short-read sequencing data (RAD-seq, GBS, ddRAD, 2b-RAD) for population genomics and phylogeography. Supports de novo and reference-guided assembly

9

STAR-Fusion

STAR-Fusion — detects candidate fusion transcripts from RNA-seq data using STAR alignments and the FusionInspector validation framework. Integrates with CTAT genome resource libraries for comprehensiv

11

STARR-seq Tools (STARRpeaker)

Use when working with STARR-seq (Self-Transcribing Active Regulatory Region sequencing) data analysis. STARRpeaker calls enhancer peaks from STARR-seq BAM files using negative binomial regression to a

10

Strelka2

Strelka2 — fast and accurate small variant caller for germline and somatic analysis. Detects SNVs and indels (up to ~49 bp) from mapped paired-end sequencing reads with tiered haplotype modeling, adap

10

SUPPA2

SUPPA2 — fast, accurate analysis of alternative splicing from RNA-seq data. Calculates PSI (Percent Spliced In) values per event and per transcript from transcript quantification (Salmon, kallisto, RS

11

SURVIVOR

SURVIVOR — C++ toolkit for structural variation (SV) analysis including merging multi-caller VCF files into consensus callsets, SV simulation on reference genomes, benchmarking SV callers against trut

11

SvABA

SvABA -- structural variant and indel caller using genome-wide local assembly. Detects deletions, insertions, duplications, inversions, and complex rearrangements from short-read (Illumina) whole-geno

11

SVIM

SVIM — structural variant identification from long-read sequencing data (PacBio, Oxford Nanopore). Detects deletions, insertions, tandem and interspersed duplications, inversions, and translocations f

10

Sylph

Sylph — ultrafast metagenomic profiling and containment ANI estimation using k-mer sketching. Performs species-level taxonomic profiling with abundance quantification and genome querying against pre-b

10

Tandem Repeats Finder (TRF)

Tandem Repeats Finder (TRF) — command-line tool for locating and displaying tandem repeats in DNA sequences. Detects microsatellites (STRs), minisatellites, and larger tandem duplications using a prob

10

TRUST4

Use when reconstructing T-cell receptor (TCR) or B-cell receptor (BCR) repertoires from bulk RNA-seq or single-cell RNA-seq data using TRUST4. Covers IMGT reference preparation, BAM/FASTQ input, CDR3

9

UNITE

UNITE is the reference database and taxonomy system for fungal ITS (Internal Transcribed Spacer) metabarcoding and amplicon sequencing. Use for classifying fungal sequences against Species Hypotheses

9

USEARCH

USEARCH — ultra-fast amplicon sequence analysis toolkit for 16S/ITS/18S microbiome studies. Supports FASTQ quality filtering (fastq_filter), paired-end merging (fastq_mergepairs), dereplication (derep

10

VarScan2

VarScan2 -- Java-based variant caller for somatic and germline SNV/indel detection, copy number analysis, and LOH detection from samtools mpileup output. Supports tumor-normal paired somatic calling,

9

Velocyto

Velocyto — RNA velocity estimation tool that distinguishes unspliced and spliced mRNAs in single-cell RNA-seq data to predict future cell states. Provides a CLI for counting spliced/unspliced/ambiguou

11

VEP

VEP (Ensembl Variant Effect Predictor) — gold-standard tool for annotating and predicting the functional effects of genomic variants on genes, transcripts, and protein sequences. Provides consequence

11

Verkko

Verkko — hybrid genome assembler for telomere-to-telomere (T2T) diploid assembly from PacBio HiFi and Oxford Nanopore ultra-long reads. Combines MBG de Bruijn graphs with progressive ONT resolution, t

10

VSEARCH

VSEARCH — open-source, multithreaded alternative to USEARCH for amplicon and metagenomics sequence analysis. Performs dereplication, chimera detection (de novo and reference-based), OTU/ASV clustering

10

vt

vt — C++ command-line variant tool set for manipulating VCF files. Provides variant normalization (left-alignment and trimming), multiallelic decomposition, VCF summary statistics (peek), annotation,

10

Wengan

Wengan — hybrid genome assembler combining short and long reads using a synthetic scaffolding approach. Integrates short-read assembly backends (Minia3, ABySS2, DiscovarDenovo) with long-read pseudo-a

10

WiggleTools

WiggleTools — command-line toolkit for streaming arithmetic and set operations on genomic signal tracks stored in Wiggle, BigWig, BedGraph, and BAM/CRAM formats. Computes sums, means, products, log tr

10

wtdbg2

wtdbg2 — ultrafast de novo long-read genome assembler using a fuzzy de Bruijn graph approach. Assembles PacBio (RSII, Sequel, CCS) and Oxford Nanopore reads without prior error correction. Two-step wo

10

YaHS

Use when scaffolding genome assemblies with Hi-C chromatin contact data using YaHS (Yet Another Hi-C Scaffolding Tool). Covers Hi-C BAM preparation, contig scaffolding, AGP output, juicer_tools contac

9

cf-python

cf-python is a Python library implementing the CF (Climate and Forecast) metadata conventions for reading, writing, and analysing Earth-science datasets stored in netCDF, Zarr, PP, and UM formats. Use

8

EDAM Ontology Explorer

Navigate the EDAM ontology hierarchy, find compatible tools and data types, and check format compatibility

9

GTDB-Tk

Verified

GTDB-Tk — toolkit for objective taxonomic classification of bacterial and archaeal genomes using the Genome Taxonomy Database (GTDB). Assigns taxonomy based on placement in reference trees inferred fr

10

VarDict

Verified

VarDict variant caller for SNVs, MNVs, indels, complex variants, and structural variants from BAM files. Supports somatic paired tumor-normal calling and single-sample germline mode. Ultra-sensitive v

10

AnnData

AnnData — annotated data matrices for single-cell and multi-omics analysis. Core data structure for the scverse ecosystem storing expression matrices (X) with observation metadata (obs), variable meta

8

Tool	Registry	Domain	Docs
Roary Roary — rapid large-scale prokaryotic pan genome analysis. Calculates the pan genome from annotated assemblies (GFF3 from Prokka/Bakta), producing core and accessory gene clusters, gene presence/absen	sanger-pathogens/Roary	Phylogenetics	10
Sage Use when working with sage, an ultrafast proteomics search engine written in Rust	lazear/sage	Proteomics	9
SAIGE SAIGE — scalable genome-wide association tests in biobank-scale data using generalized mixed models with saddlepoint approximation. Fits null logistic/linear mixed models with sparse or full GRM to ac	saigegit/SAIGE	Population Genetics	11
salmon salmon — Fast, bias-aware transcript quantification from RNA-seq data using selective alignment to the transcriptome. Supports bulk RNA-seq (mapping-based and alignment-based modes), single-cell RNA-s	COMBINE-lab/salmon	Transcriptomics	11
SALSA2 SALSA2 — scaffold long-read genome assemblies using Hi-C proximity ligation data. Takes a draft contig assembly and Hi-C read alignments (BAM/BED) to produce chromosome-scale scaffolds. Iteratively co	marbl/SALSA	Genomics	10
Sambamba Sambamba — high-performance BAM/CRAM processing tool written in D with native multi-threading. Provides fast sorting, indexing, duplicate marking, merging, filtering, depth calculation, flagstat, and	biod/sambamba	Genomics	11
scib -- Single-Cell Integration Benchmarking scib (single-cell integration benchmarking) -- Python framework for evaluating and benchmarking batch correction and data integration methods in single-cell omics. Computes standardized metrics for bi	theislab/scib	Single-Cell	9
SciPy Use when working with SciPy — the foundational Python scientific computing library — for statistical testing, signal processing, optimization, linear algebra, spatial analysis, and numerical integrati	scipy/scipy	Utilities & Infrastructure	10
SEACR (Sparse Enrichment Analysis for CUT&RUN) SEACR (Sparse Enrichment Analysis for CUT&RUN) — peak caller specifically designed for CUT&RUN and CUT&Tag chromatin profiling data. Uses the sparse signal characteristics of CUT&RUN to call enriched	FredHutch/SEACR	Genomics	11
seqtk seqtk — fast lightweight C toolkit for processing FASTA and FASTQ files. Supports format conversion (FASTQ↔FASTA), random subsampling, quality trimming, reverse complement, base composition, sequence	lh3/seqtk	QC & Preprocessing	11
Shasta Shasta — fast de novo long-read genome assembler optimized for Oxford Nanopore (ONT) reads. Produces haploid or phased diploid assemblies from nanopore data using run-length encoding, MinHash-based ov	paoloshasta/shasta	Genomics	10
sleuth sleuth — R package for differential expression analysis of RNA-seq data at the transcript level. Works with kallisto bootstrap quantifications to model technical variability using a response error mod	pachterlab/sleuth	Transcriptomics	11
smartpca/EIGENSOFT smartpca/EIGENSOFT -- C/C++ tool for principal component analysis of genome-wide SNP genotype data. Computes eigenvectors and eigenvalues for population structure analysis, ancestry inference, stratif	DReichLab/EIG	Genomics	8
SNAP Use when working with SNAP (Scalable Nucleotide Alignment Program) — a fast DNA sequence aligner developed at UC Berkeley's AMPLab. Use for aligning short or long DNA reads to a reference genome, buil	amplab/snap	Genomics	10
SnapATAC2 SnapATAC2 — Python/Rust toolkit for single-cell ATAC-seq analysis. Provides fragment file import, cell-by-bin/peak matrix generation, spectral embedding dimensionality reduction, leiden clustering, MA	kaizhang/SnapATAC2	Single-Cell	11
Sniffles2 Sniffles2 — fast structural variant caller for long-read sequencing data (PacBio HiFi, Oxford Nanopore). Detects deletions, insertions, duplications, inversions, and translocations from BAM/CRAM align	fritzsedlazeck/Sniffles	Genomics	11
SnpEff SnpEff — fast Java-based variant annotation and effect prediction tool that annotates genomic variants (SNPs, indels, MNPs) with gene impact, protein changes, loss-of-function predictions, and HGVS no	pcingola/SnpEff	Genomics	10
SOAPdenovo2 SOAPdenovo2 — de novo short-read genome assembler for large plant and animal genomes using de Bruijn graph construction. Runs a four-stage pipeline: pregraph (k-mer graph), contig (initial contigs), m	aquaskyline/SOAPdenovo2	Genomics	10
SOAPnuke SOAPnuke — C++ quality control and preprocessing tool for high-throughput sequencing data. Filters and trims paired-end or single-end FASTQ reads by adapter content, low quality bases, N-base ratio, r	BGI-flexlab/SOAPnuke	QC & Preprocessing	11
SortMeRNA SortMeRNA — fast filtering of ribosomal RNA reads from metatranscriptomic and RNA-seq data using local sequence alignment against curated rRNA databases (SILVA, RFAM). CLI tool for rRNA removal, rRNA	sortmerna/sortmerna	QC & Preprocessing	10
Souporcell Verified Souporcell — genotype-based demultiplexing of pooled single-cell RNA-seq experiments. Assigns cells to donor of origin using SNP variants from the aligned BAM file without requiring known genotypes. D	wheaton5/souporcell	Single-Cell	10
squigualiser squigualiser is a Python tool for visualizing raw nanopore sequencing signal (squiggle) data aligned to reference sequences. Generates interactive HTML-based plots using Bokeh that overlay raw current	hiruna72/squigualiser	Genomics	11
Stacks Stacks is a software pipeline for building loci from short-read sequencing data (RAD-seq, GBS, ddRAD, 2b-RAD) for population genomics and phylogeography. Supports de novo and reference-guided assembly	catchenlab/stacks	Other	9
STAR-Fusion STAR-Fusion — detects candidate fusion transcripts from RNA-seq data using STAR alignments and the FusionInspector validation framework. Integrates with CTAT genome resource libraries for comprehensiv	STAR-Fusion/STAR-Fusion	Transcriptomics	11
STARR-seq Tools (STARRpeaker) Use when working with STARR-seq (Self-Transcribing Active Regulatory Region sequencing) data analysis. STARRpeaker calls enhancer peaks from STARR-seq BAM files using negative binomial regression to a	gersteinlab/starrpeaker	Genomics	10
Strelka2 Strelka2 — fast and accurate small variant caller for germline and somatic analysis. Detects SNVs and indels (up to ~49 bp) from mapped paired-end sequencing reads with tiered haplotype modeling, adap	Illumina/strelka	Transcriptomics	10
SUPPA2 SUPPA2 — fast, accurate analysis of alternative splicing from RNA-seq data. Calculates PSI (Percent Spliced In) values per event and per transcript from transcript quantification (Salmon, kallisto, RS	comprna/SUPPA	Transcriptomics	11
SURVIVOR SURVIVOR — C++ toolkit for structural variation (SV) analysis including merging multi-caller VCF files into consensus callsets, SV simulation on reference genomes, benchmarking SV callers against trut	fritzsedlazeck/SURVIVOR	Genomics	11
SvABA SvABA -- structural variant and indel caller using genome-wide local assembly. Detects deletions, insertions, duplications, inversions, and complex rearrangements from short-read (Illumina) whole-geno	walaj/svaba	Genomics	11
SVIM SVIM — structural variant identification from long-read sequencing data (PacBio, Oxford Nanopore). Detects deletions, insertions, tandem and interspersed duplications, inversions, and translocations f	eldariont/svim	Genomics	10
Sylph Sylph — ultrafast metagenomic profiling and containment ANI estimation using k-mer sketching. Performs species-level taxonomic profiling with abundance quantification and genome querying against pre-b	bluenote-1577/sylph	Metagenomics	10
Tandem Repeats Finder (TRF) Tandem Repeats Finder (TRF) — command-line tool for locating and displaying tandem repeats in DNA sequences. Detects microsatellites (STRs), minisatellites, and larger tandem duplications using a prob	Benson-Genomics-Lab/TRF	Genomics	10
TRUST4 Use when reconstructing T-cell receptor (TCR) or B-cell receptor (BCR) repertoires from bulk RNA-seq or single-cell RNA-seq data using TRUST4. Covers IMGT reference preparation, BAM/FASTQ input, CDR3	liulab-dfci/TRUST4	Clinical Genomics	9
UNITE UNITE is the reference database and taxonomy system for fungal ITS (Internal Transcribed Spacer) metabarcoding and amplicon sequencing. Use for classifying fungal sequences against Species Hypotheses	manual	Metagenomics	9
USEARCH USEARCH — ultra-fast amplicon sequence analysis toolkit for 16S/ITS/18S microbiome studies. Supports FASTQ quality filtering (fastq_filter), paired-end merging (fastq_mergepairs), dereplication (derep	manual	Metagenomics	10
VarScan2 VarScan2 -- Java-based variant caller for somatic and germline SNV/indel detection, copy number analysis, and LOH detection from samtools mpileup output. Supports tumor-normal paired somatic calling,	dkoboldt/varscan	Genomics	9
Velocyto Velocyto — RNA velocity estimation tool that distinguishes unspliced and spliced mRNAs in single-cell RNA-seq data to predict future cell states. Provides a CLI for counting spliced/unspliced/ambiguou	velocyto-team/velocyto.py	Transcriptomics	11
VEP VEP (Ensembl Variant Effect Predictor) — gold-standard tool for annotating and predicting the functional effects of genomic variants on genes, transcripts, and protein sequences. Provides consequence	Ensembl/ensembl-vep	Genomics	11
Verkko Verkko — hybrid genome assembler for telomere-to-telomere (T2T) diploid assembly from PacBio HiFi and Oxford Nanopore ultra-long reads. Combines MBG de Bruijn graphs with progressive ONT resolution, t	marbl/verkko	Genomics	10
VSEARCH VSEARCH — open-source, multithreaded alternative to USEARCH for amplicon and metagenomics sequence analysis. Performs dereplication, chimera detection (de novo and reference-based), OTU/ASV clustering	torognes/vsearch	Metagenomics	10
vt vt — C++ command-line variant tool set for manipulating VCF files. Provides variant normalization (left-alignment and trimming), multiallelic decomposition, VCF summary statistics (peek), annotation,	atks/vt	Genomics	10
Wengan Wengan — hybrid genome assembler combining short and long reads using a synthetic scaffolding approach. Integrates short-read assembly backends (Minia3, ABySS2, DiscovarDenovo) with long-read pseudo-a	adigenova/wengan	Genomics	10
WiggleTools WiggleTools — command-line toolkit for streaming arithmetic and set operations on genomic signal tracks stored in Wiggle, BigWig, BedGraph, and BAM/CRAM formats. Computes sums, means, products, log tr	Ensembl/WiggleTools	Utilities & Infrastructure	10
wtdbg2 wtdbg2 — ultrafast de novo long-read genome assembler using a fuzzy de Bruijn graph approach. Assembles PacBio (RSII, Sequel, CCS) and Oxford Nanopore reads without prior error correction. Two-step wo	ruanjue/wtdbg2	Genomics	10
YaHS Use when scaffolding genome assemblies with Hi-C chromatin contact data using YaHS (Yet Another Hi-C Scaffolding Tool). Covers Hi-C BAM preparation, contig scaffolding, AGP output, juicer_tools contac	c-zhou/yahs	Genomics	9
cf-python cf-python is a Python library implementing the CF (Climate and Forecast) metadata conventions for reading, writing, and analysing Earth-science datasets stored in netCDF, Zarr, PP, and UM formats. Use	NCAS-CMS/cf-python	Other	8
EDAM Ontology Explorer Navigate the EDAM ontology hierarchy, find compatible tools and data types, and check format compatibility	edamontology/edamontology	Genomics	9
GTDB-Tk Verified GTDB-Tk — toolkit for objective taxonomic classification of bacterial and archaeal genomes using the Genome Taxonomy Database (GTDB). Assigns taxonomy based on placement in reference trees inferred fr	Ecogenomics/GTDBTk	Metagenomics	10
VarDict Verified VarDict variant caller for SNVs, MNVs, indels, complex variants, and structural variants from BAM files. Supports somatic paired tumor-normal calling and single-sample germline mode. Ultra-sensitive v	AstraZeneca-NGS/VarDict	Genomics	10
AnnData AnnData — annotated data matrices for single-cell and multi-omics analysis. Core data structure for the scverse ecosystem storing expression matrices (X) with observation metadata (obs), variable meta	scverse/anndata	Single-Cell	8

