Browse the BioContext7 deep skill library. Tool pages surface documentation, registry links, and install details for agent-facing workflows.
2,064 tools — page 7 of 42
| Tool | Registry | Domain | Docs |
|---|---|---|---|
Roary — rapid large-scale prokaryotic pan genome analysis. Calculates the pan genome from annotated assemblies (GFF3 from Prokka/Bakta), producing core and accessory gene clusters, gene presence/absen | sanger-pathogens/Roary | Phylogenetics | 10 |
Use when working with sage — ultrafast Rust proteomics search | lazear/sage | Proteomics | 9 |
SAIGE — scalable genome-wide association tests in biobank-scale data using generalized mixed models with saddlepoint approximation. Fits null logistic/linear mixed models with sparse or full GRM to ac | saigegit/SAIGE | Population Genetics | 11 |
salmon — Fast, bias-aware transcript quantification from RNA-seq data using selective alignment to the transcriptome. Supports bulk RNA-seq (mapping-based and alignment-based modes), single-cell RNA-s | COMBINE-lab/salmon | Transcriptomics | 11 |
SALSA2 — scaffold long-read genome assemblies using Hi-C proximity ligation data. Takes a draft contig assembly and Hi-C read alignments (BAM/BED) to produce chromosome-scale scaffolds. Iteratively co | marbl/SALSA | Genomics | 10 |
Sambamba — high-performance BAM/CRAM processing tool written in D with native multi-threading. Provides fast sorting, indexing, duplicate marking, merging, filtering, depth calculation, flagstat, and | biod/sambamba | Genomics | 11 |
scib (single-cell integration benchmarking) -- Python framework for evaluating and benchmarking batch correction and data integration methods in single-cell omics. Computes standardized metrics for bi | theislab/scib | Single-Cell | 9 |
Use when working with SciPy — the foundational Python scientific computing library — for statistical testing, signal processing, optimization, linear algebra, spatial analysis, and numerical integrati | scipy/scipy | Utilities & Infrastructure | 10 |
SEACR (Sparse Enrichment Analysis for CUT&RUN) — peak caller specifically designed for CUT&RUN and CUT&Tag chromatin profiling data. Uses the sparse signal characteristics of CUT&RUN to call enriched | FredHutch/SEACR | Genomics | 11 |
seqtk — fast lightweight C toolkit for processing FASTA and FASTQ files. Supports format conversion (FASTQ↔FASTA), random subsampling, quality trimming, reverse complement, base composition, sequence | lh3/seqtk | QC & Preprocessing | 11 |
Shasta — fast de novo long-read genome assembler optimized for Oxford Nanopore (ONT) reads. Produces haploid or phased diploid assemblies from nanopore data using run-length encoding, MinHash-based ov | paoloshasta/shasta | Genomics | 10 |
sleuth — R package for differential expression analysis of RNA-seq data at the transcript level. Works with kallisto bootstrap quantifications to model technical variability using a response error mod | pachterlab/sleuth | Transcriptomics | 11 |
smartpca/EIGENSOFT -- C/C++ tool for principal component analysis of genome-wide SNP genotype data. Computes eigenvectors and eigenvalues for population structure analysis, ancestry inference, stratif | DReichLab/EIG | Genomics | 8 |
Use when working with SNAP (Scalable Nucleotide Alignment Program) — a fast DNA sequence aligner developed at UC Berkeley's AMPLab. Use for aligning short or long DNA reads to a reference genome, buil | amplab/snap | Genomics | 10 |
SnapATAC2 — Python/Rust toolkit for single-cell ATAC-seq analysis. Provides fragment file import, cell-by-bin/peak matrix generation, spectral embedding dimensionality reduction, leiden clustering, MA | kaizhang/SnapATAC2 | Single-Cell | 11 |
Sniffles2 — fast structural variant caller for long-read sequencing data (PacBio HiFi, Oxford Nanopore). Detects deletions, insertions, duplications, inversions, and translocations from BAM/CRAM align | fritzsedlazeck/Sniffles | Genomics | 11 |
SnpEff — fast Java-based variant annotation and effect prediction tool that annotates genomic variants (SNPs, indels, MNPs) with gene impact, protein changes, loss-of-function predictions, and HGVS no | pcingola/SnpEff | Genomics | 10 |
SOAPdenovo2 — de novo short-read genome assembler for large plant and animal genomes using de Bruijn graph construction. Runs a four-stage pipeline: pregraph (k-mer graph), contig (initial contigs), m | aquaskyline/SOAPdenovo2 | Genomics | 10 |
SOAPnuke — C++ quality control and preprocessing tool for high-throughput sequencing data. Filters and trims paired-end or single-end FASTQ reads by adapter content, low quality bases, N-base ratio, r | BGI-flexlab/SOAPnuke | QC & Preprocessing | 11 |
SortMeRNA — fast filtering of ribosomal RNA reads from metatranscriptomic and RNA-seq data using local sequence alignment against curated rRNA databases (SILVA, RFAM). CLI tool for rRNA removal, rRNA | sortmerna/sortmerna | QC & Preprocessing | 10 |
Souporcell (Verified) — genotype-based demultiplexing of pooled single-cell RNA-seq experiments. Assigns cells to donor of origin using SNP variants from the aligned BAM file without requiring known genotypes. D | wheaton5/souporcell | Single-Cell | 10 |
squigualiser is a Python tool for visualizing raw nanopore sequencing signal (squiggle) data aligned to reference sequences. Generates interactive HTML-based plots using Bokeh that overlay raw current | hiruna72/squigualiser | Genomics | 11 |
Stacks is a software pipeline for building loci from short-read sequencing data (RAD-seq, GBS, ddRAD, 2b-RAD) for population genomics and phylogeography. Supports de novo and reference-guided assembly | catchenlab/stacks | Other | 9 |
STAR-Fusion — detects candidate fusion transcripts from RNA-seq data using STAR alignments and the FusionInspector validation framework. Integrates with CTAT genome resource libraries for comprehensiv | STAR-Fusion/STAR-Fusion | Transcriptomics | 11 |
Use when working with STARR-seq (Self-Transcribing Active Regulatory Region sequencing) data analysis. STARRpeaker calls enhancer peaks from STARR-seq BAM files using negative binomial regression to a | gersteinlab/starrpeaker | Genomics | 10 |
Strelka2 — fast and accurate small variant caller for germline and somatic analysis. Detects SNVs and indels (up to ~49 bp) from mapped paired-end sequencing reads with tiered haplotype modeling, adap | Illumina/strelka | Genomics | 10 |
SUPPA2 — fast, accurate analysis of alternative splicing from RNA-seq data. Calculates PSI (Percent Spliced In) values per event and per transcript from transcript quantification (Salmon, kallisto, RS | comprna/SUPPA | Transcriptomics | 11 |
SURVIVOR — C++ toolkit for structural variation (SV) analysis including merging multi-caller VCF files into consensus callsets, SV simulation on reference genomes, benchmarking SV callers against trut | fritzsedlazeck/SURVIVOR | Genomics | 11 |
SvABA -- structural variant and indel caller using genome-wide local assembly. Detects deletions, insertions, duplications, inversions, and complex rearrangements from short-read (Illumina) whole-geno | walaj/svaba | Genomics | 11 |
SVIM — structural variant identification from long-read sequencing data (PacBio, Oxford Nanopore). Detects deletions, insertions, tandem and interspersed duplications, inversions, and translocations f | eldariont/svim | Genomics | 10 |
Sylph — ultrafast metagenomic profiling and containment ANI estimation using k-mer sketching. Performs species-level taxonomic profiling with abundance quantification and genome querying against pre-b | bluenote-1577/sylph | Metagenomics | 10 |
Tandem Repeats Finder (TRF) — command-line tool for locating and displaying tandem repeats in DNA sequences. Detects microsatellites (STRs), minisatellites, and larger tandem duplications using a prob | Benson-Genomics-Lab/TRF | Genomics | 10 |
Use when reconstructing T-cell receptor (TCR) or B-cell receptor (BCR) repertoires from bulk RNA-seq or single-cell RNA-seq data using TRUST4. Covers IMGT reference preparation, BAM/FASTQ input, CDR3 | liulab-dfci/TRUST4 | Clinical Genomics | 9 |
UNITE is the reference database and taxonomy system for fungal ITS (Internal Transcribed Spacer) metabarcoding and amplicon sequencing. Use for classifying fungal sequences against Species Hypotheses | manual | Metagenomics | 9 |
USEARCH — ultra-fast amplicon sequence analysis toolkit for 16S/ITS/18S microbiome studies. Supports FASTQ quality filtering (fastq_filter), paired-end merging (fastq_mergepairs), dereplication (derep | manual | Metagenomics | 10 |
VarScan2 -- Java-based variant caller for somatic and germline SNV/indel detection, copy number analysis, and LOH detection from samtools mpileup output. Supports tumor-normal paired somatic calling, | dkoboldt/varscan | Genomics | 9 |
Velocyto — RNA velocity estimation tool that distinguishes unspliced and spliced mRNAs in single-cell RNA-seq data to predict future cell states. Provides a CLI for counting spliced/unspliced/ambiguou | velocyto-team/velocyto.py | Transcriptomics | 11 |
VEP (Ensembl Variant Effect Predictor) — gold-standard tool for annotating and predicting the functional effects of genomic variants on genes, transcripts, and protein sequences. Provides consequence | Ensembl/ensembl-vep | Genomics | 11 |
Verkko — hybrid genome assembler for telomere-to-telomere (T2T) diploid assembly from PacBio HiFi and Oxford Nanopore ultra-long reads. Combines MBG de Bruijn graphs with progressive ONT resolution, t | marbl/verkko | Genomics | 10 |
VSEARCH — open-source, multithreaded alternative to USEARCH for amplicon and metagenomics sequence analysis. Performs dereplication, chimera detection (de novo and reference-based), OTU/ASV clustering | torognes/vsearch | Metagenomics | 10 |
vt — C++ command-line variant tool set for manipulating VCF files. Provides variant normalization (left-alignment and trimming), multiallelic decomposition, VCF summary statistics (peek), annotation, | atks/vt | Genomics | 10 |
Wengan — hybrid genome assembler combining short and long reads using a synthetic scaffolding approach. Integrates short-read assembly backends (Minia3, ABySS2, DiscovarDenovo) with long-read pseudo-a | adigenova/wengan | Genomics | 10 |
WiggleTools — command-line toolkit for streaming arithmetic and set operations on genomic signal tracks stored in Wiggle, BigWig, BedGraph, and BAM/CRAM formats. Computes sums, means, products, log tr | Ensembl/WiggleTools | Utilities & Infrastructure | 10 |
wtdbg2 — ultrafast de novo long-read genome assembler using a fuzzy de Bruijn graph approach. Assembles PacBio (RSII, Sequel, CCS) and Oxford Nanopore reads without prior error correction. Two-step wo | ruanjue/wtdbg2 | Genomics | 10 |
Use when scaffolding genome assemblies with Hi-C chromatin contact data using YaHS (Yet Another Hi-C Scaffolding Tool). Covers Hi-C BAM preparation, contig scaffolding, AGP output, juicer_tools contac | c-zhou/yahs | Genomics | 9 |
cf-python is a Python library implementing the CF (Climate and Forecast) metadata conventions for reading, writing, and analysing Earth-science datasets stored in netCDF, Zarr, PP, and UM formats. Use | NCAS-CMS/cf-python | Other | 8 |
Navigate the EDAM ontology hierarchy, find compatible tools and data types, and check format compatibility | edamontology/edamontology | Genomics | 9 |
GTDB-Tk (Verified) — toolkit for objective taxonomic classification of bacterial and archaeal genomes using the Genome Taxonomy Database (GTDB). Assigns taxonomy based on placement in reference trees inferred fr | Ecogenomics/GTDBTk | Metagenomics | 10 |
VarDict (Verified) variant caller for SNVs, MNVs, indels, complex variants, and structural variants from BAM files. Supports somatic paired tumor-normal calling and single-sample germline mode. Ultra-sensitive v | AstraZeneca-NGS/VarDict | Genomics | 10 |
AnnData — annotated data matrices for single-cell and multi-omics analysis. Core data structure for the scverse ecosystem storing expression matrices (X) with observation metadata (obs), variable meta | scverse/anndata | Single-Cell | 8 |