Chai-1 -- multi-modal foundation model for molecular structure prediction from Chai Discovery. Predicts 3D structures of proteins, nucleic acids (DNA/RNA), small molecules, glycans, ions, and their co

10

ChEMBL

ChEMBL — EMBL-EBI's manually curated database of bioactive molecules with drug-like properties. Query compound bioactivity data, ADMET properties, target annotations, mechanism of action, and approved

10

ChEMBL Database

ChEMBL Database — manually curated database of bioactive molecules with drug-like properties maintained by EMBL-EBI. Contains 2.4M compounds, 1.6M assays, 20M+ activity measurements, 15K+ targets, and

9

CHESS

CHESS (Comparison of Hi-C Experiments using Structural Similarity) — Python command-line tool for quantitative comparison and automatic feature extraction of chromatin contact data using the structura

10

CKMRsim

Use when working with the R package CKMRsim for Close-Kin Mark-Recapture simulation and population size estimation. CKMRsim supports power analysis for CKMR studies, pairwise kinship likelihood ratio

9

cmocean

cmocean — perceptually uniform colormaps for oceanographic data visualization. Provides 22 colormaps (thermal, haline, solar, ice, deep, dense, algae, matter, turbid, speed, amp, tempo, rain, phase, t

9

ColabFold

ColabFold — fast protein structure prediction combining AlphaFold2 with MMseqs2 for rapid MSA generation. Predict monomer and multimer structures, generate multiple sequence alignments, run batch pred

11

Comet

Comet — open-source tandem mass spectrometry (MS/MS) sequence database search engine for peptide identification. Searches MS/MS spectra against FASTA protein databases producing pepXML, mzIdentML, SQT

11

Conda/Mamba

Conda and Mamba package/environment management for bioinformatics. Create, export, and reproduce isolated software environments using conda or mamba (fast C++ solver). Manage Bioconda channels, resolv

9

COPASI

COPASI — COmplex PAthway SImulator for systems biology. Simulates and analyzes biochemical reaction networks using deterministic ODE integration (LSODA, Radau5, CVODE), stochastic simulation (Gibson-B

9

COSMOS

Use when inferring mechanistic causal links across multiple omics layers with COSMOS (Causal Oriented Search of Multi-Omics Space). Integrates metabolomics, transcriptomics, and signaling pathway data

10

CPTAC Pipelines

CPTAC pipelines — Python toolkit for accessing and analyzing Clinical Proteomic Tumor Analysis Consortium (CPTAC) multi-omics cancer data. Provides programmatic download of proteomics, phosphoproteomi

9

CRISPResso2

CRISPResso2 — Python tool for quantifying CRISPR genome editing outcomes from amplicon sequencing data. Analyzes NHEJ (insertions, deletions), HDR, and mixed repair outcomes. Supports single amplicon

9

cryoSPARC

cryoSPARC — cryo-EM single particle analysis platform for high-resolution 3D structure determination. Provides motion correction, CTF estimation, particle picking (blob, template, Topaz), 2D classific

10

Cyrius

Cyrius — CYP2D6 star allele genotyping from whole-genome sequencing data. Call CYP2D6 diplotypes (*1/*2, *4/*5) from BAM/CRAM files for pharmacogenomics clinical reporting. Handles CYP2D6/CYP2D7 fusio

9

DeepChem

DeepChem — Python framework for deep learning in drug discovery, materials science, quantum chemistry, and biology. Provides molecular featurizers (ECFP, graph convolutions, Coulomb matrices), pre-bui

10

Dental/Oral Microbiome Analysis

Use when working with dental or oral microbiome analysis — profiling microbial communities from saliva, dental plaque, gingival crevicular fluid, tongue dorsum, or buccal mucosa. Covers 16S rRNA ampli

9

DepMap/Chronos

DepMap/Chronos — Bayesian algorithm for inferring gene fitness effects from CRISPR knockout screen readcount data. Separates true gene knockout effects from copy-number artifacts, guide efficacy varia

9

DGL (Deep Graph Library)

Use when working with DGL (Deep Graph Library) — the Python framework for building and training graph neural networks (GNNs). Apply GCN, GAT, GraphSAGE, GIN, and other GNN architectures to biological

9

DIABLO

DIABLO (Data Integration Analysis for Biomarker discovery using Latent cOmponents) — supervised multi-omics integration method from the mixOmics R package. Performs sparse generalized canonical correl

9

DiffDock

DiffDock — diffusion generative model for molecular docking that predicts protein-ligand binding poses. Uses a diffusion process over translations, rotations, and torsion angles to generate and rank d

11

DNABERT

DNABERT — pre-trained BERT model for DNA sequence understanding and classification. Tokenizes DNA using overlapping k-mers (k=3,4,5,6) and provides contextualized embeddings for promoter prediction, s

10

DNAplotlib

dnaplotlib — Python library for programmable visualization of genetic designs and synthetic biology circuits. Renders genetic parts (Promoter, RBS, CDS, Terminator, Spacer) as vector-quality matplotli

9

Dockstore

Dockstore — open platform for sharing Docker-based bioinformatics tools and workflows written in CWL, WDL, Nextflow, and Galaxy. Provides workflow discovery, versioned registrations, TRS API for progr

10

DoRothEA

DoRothEA — Discriminant Regulon Expression Analysis R/Bioconductor package for transcription factor (TF) activity inference from gene expression data. Provides curated TF regulons (confidence levels A

9

DVC (Data Version Control)

DVC (Data Version Control) — Git-based version control for data, models, and ML experiments. Provides data versioning with .dvc files, ML pipeline definition via dvc.yaml DAGs, experiment tracking and

10

Earth2Mip

NVIDIA earth2mip — Earth-2 Model Interoperability Platform for AI weather and climate forecasting. Unified Python framework for running FourCastNet (v1/v2), CorrDiff, and other AI models at 0.25° glob

10

echtvar

Use when working with echtvar, an ultra-fast Rust CLI for variant

9

eigenMT

eigenMT — multiple testing correction for QTL mapping using eigenvalue decomposition of the genotype correlation matrix. Estimates the effective number of independent tests (M_eff) per gene for cis-eQ

9

emmeans

emmeans — Estimated Marginal Means (least-squares means) in R. Compute marginal means from fitted models with emmeans(), pairwise comparisons with pairs() and contrast(), interaction analysis with joi

9

EpiDISH

EpiDISH — Bioconductor R package for cell-type deconvolution of DNA methylation data. Estimates cell-type proportions from Illumina 450K and EPIC arrays using reference-based methods: Robust Partial C

10

Evo

Evo — genomic foundation model for DNA sequence modeling at single-nucleotide resolution. Uses StripedHyena architecture (7B parameters) trained on 2.7M prokaryotic and phage genomes (OpenGenome datas

10

EvoDiff

EvoDiff — discrete diffusion models for protein sequence generation directly from evolutionary-scale data. Generates novel protein sequences without requiring 3D structures using order-agnostic autore

10

factoextra

factoextra — R package for extracting and visualizing multivariate analysis results with ggplot2. Covers PCA (fviz_pca_ind, fviz_pca_var, fviz_pca_biplot), CA (fviz_ca_row, fviz_ca_col, fviz_ca_biplot

10

Firecloud / WARP Pipelines

WARP (WDL Analysis Research Pipelines) — Broad Institute's cloud-optimized collection of genomics pipelines written in WDL (Workflow Description Language). Covers whole-genome sequencing (WGS), whole-

9

FPocket

FPocket — fast open-source protein pocket detection and druggability estimation using Voronoi tessellation and alpha spheres. Identifies binding sites on protein surfaces from PDB structures, scores p

10

FUSION

Use when running Transcriptome-Wide Association Studies (TWAS) with FUSION, integrating GWAS summary statistics with precomputed eQTL expression weights to identify genes whose predicted expression is

9

GA4GH Passport

GA4GH Passport — the Global Alliance for Genomics and Health standard for researcher identity and federated data access authorization. Uses JWT-encoded visa bundles to grant controlled access across i

9

Geneformer

Geneformer — transformer-based foundation model pretrained on ~30 million single-cell transcriptomes for context-specific gene network analysis. Supports fine-tuning for cell type classification, gene

10

NCBI Genome Data Viewer

NCBI Genome Data Viewer (GDV) — interactive web-based genome browser for exploring RefSeq genome assemblies with gene, variant, and functional annotation tracks. Supports URL-based navigation to speci

9

GenSLMs

Use when working with GenSLMs (Genome-Scale Language Models) for genome sequence generation, embedding, evolutionary analysis, or fine-tuning on viral or bacterial genomes. Covers model loading, infer

9

GEOfetch

GEOfetch — Python tool for downloading and converting GEO and SRA metadata and data into PEP (Portable Encapsulations of Projects) format. Fetches sample metadata from NCBI GEO/SRA, builds standardize

10

GeoPandas

GeoPandas — geospatial vector data analysis in Python. Read/write Shapefile, GeoJSON, GeoPackage, and PostGIS with read_file()/to_file(). Spatial joins with sjoin(), overlay operations (union, interse

9

ggsci

Use when working with ggsci, an R package of ggplot2 color palettes inspired by scientific journals, visualization libraries, and science fiction themes. Covers scale_color_*(), scale_fill_*(), and pa

10

Gnina

Gnina — deep learning molecular docking program built on AutoDock Vina. Uses convolutional neural networks (CNNs) to rescore protein-ligand poses for improved binding pose prediction and virtual scree

11

GUIDES

GUIDES (Graphical User Interface for DNA Editing Screens) — web-based tool for designing customized CRISPR-Cas9 guide RNA libraries. Integrates Doench on-target efficiency scoring, tissue-specific gen

10

HLA-HD

Use when performing HLA typing from NGS data with HLA-HD. Covers high-resolution HLA genotyping from WGS, WES, or targeted sequencing FASTQ files, calling classical and non-classical HLA alleles at 3-

9

HMMRATAC

HMMRATAC — Hidden Markov Model-based peak caller for ATAC-seq chromatin accessibility data. Identifies nucleosome-free regions (NFR), mono-, di-, and tri-nucleosomal fragments using an HMM that models

9

IDR

IDR (Irreproducible Discovery Rate) — statistical framework for assessing and controlling reproducibility of high-throughput sequencing experiments across biological replicates. Fits a copula mixture

9

ImageJ/Fiji

Use when working with ImageJ, an open-source image processing program with

8

Tool	Registry	Domain	Docs
Chai-1 Chai-1 -- multi-modal foundation model for molecular structure prediction from Chai Discovery. Predicts 3D structures of proteins, nucleic acids (DNA/RNA), small molecules, glycans, ions, and their co	chaidiscovery/chai-lab	Structure Prediction	10
ChEMBL ChEMBL — EMBL-EBI's manually curated database of bioactive molecules with drug-like properties. Query compound bioactivity data, ADMET properties, target annotations, mechanism of action, and approved	chembl/chembl_webresource_client	Utilities & Infrastructure	10
ChEMBL Database ChEMBL Database — manually curated database of bioactive molecules with drug-like properties maintained by EMBL-EBI. Contains 2.4M compounds, 1.6M assays, 20M+ activity measurements, 15K+ targets, and	chembl/chembl_webresource_client	Drug Discovery	9
CHESS CHESS (Comparison of Hi-C Experiments using Structural Similarity) — Python command-line tool for quantitative comparison and automatic feature extraction of chromatin contact data using the structura	vaquerizaslab/chess	Genomics	10
CKMRsim Use when working with the R package CKMRsim for Close-Kin Mark-Recapture simulation and population size estimation. CKMRsim supports power analysis for CKMR studies, pairwise kinship likelihood ratio	eriqande/CKMRsim	Other	9
cmocean cmocean — perceptually uniform colormaps for oceanographic data visualization. Provides 22 colormaps (thermal, haline, solar, ice, deep, dense, algae, matter, turbid, speed, amp, tempo, rain, phase, t	matplotlib/cmocean	Other	9
ColabFold ColabFold — fast protein structure prediction combining AlphaFold2 with MMseqs2 for rapid MSA generation. Predict monomer and multimer structures, generate multiple sequence alignments, run batch pred	sokrypton/ColabFold	Proteomics	11
Comet Comet — open-source tandem mass spectrometry (MS/MS) sequence database search engine for peptide identification. Searches MS/MS spectra against FASTA protein databases producing pepXML, mzIdentML, SQT	UWPR/Comet	Proteomics	11
Conda/Mamba Conda and Mamba package/environment management for bioinformatics. Create, export, and reproduce isolated software environments using conda or mamba (fast C++ solver). Manage Bioconda channels, resolv	mamba-org/mamba	Workflows	9
COPASI COPASI — COmplex PAthway SImulator for systems biology. Simulates and analyzes biochemical reaction networks using deterministic ODE integration (LSODA, Radau5, CVODE), stochastic simulation (Gibson-B	copasi/COPASI	Systems Biology	9
COSMOS Use when inferring mechanistic causal links across multiple omics layers with COSMOS (Causal Oriented Search of Multi-Omics Space). Integrates metabolomics, transcriptomics, and signaling pathway data	saezlab/COSMOS	Systems Biology	10
CPTAC Pipelines CPTAC pipelines — Python toolkit for accessing and analyzing Clinical Proteomic Tumor Analysis Consortium (CPTAC) multi-omics cancer data. Provides programmatic download of proteomics, phosphoproteomi	PayneLab/cptac	Utilities & Infrastructure	9
CRISPResso2 CRISPResso2 — Python tool for quantifying CRISPR genome editing outcomes from amplicon sequencing data. Analyzes NHEJ (insertions, deletions), HDR, and mixed repair outcomes. Supports single amplicon	pinellolab/CRISPResso2	Systems Biology	9
cryoSPARC cryoSPARC — cryo-EM single particle analysis platform for high-resolution 3D structure determination. Provides motion correction, CTF estimation, particle picking (blob, template, Topaz), 2D classific	manual	Structure Prediction	10
Cyrius Cyrius — CYP2D6 star allele genotyping from whole-genome sequencing data. Call CYP2D6 diplotypes (*1/*2, *4/*5) from BAM/CRAM files for pharmacogenomics clinical reporting. Handles CYP2D6/CYP2D7 fusio	Illumina/Cyrius	Drug Discovery	9
DeepChem DeepChem — Python framework for deep learning in drug discovery, materials science, quantum chemistry, and biology. Provides molecular featurizers (ECFP, graph convolutions, Coulomb matrices), pre-bui	deepchem/deepchem	Structure Prediction	10
Dental/Oral Microbiome Analysis Use when working with dental or oral microbiome analysis — profiling microbial communities from saliva, dental plaque, gingival crevicular fluid, tongue dorsum, or buccal mucosa. Covers 16S rRNA ampli	biobakery/MetaPhlAn	Metagenomics	9
DepMap/Chronos DepMap/Chronos — Bayesian algorithm for inferring gene fitness effects from CRISPR knockout screen readcount data. Separates true gene knockout effects from copy-number artifacts, guide efficacy varia	broadinstitute/chronos	Systems Biology	9
DGL (Deep Graph Library) Use when working with DGL (Deep Graph Library) — the Python framework for building and training graph neural networks (GNNs). Apply GCN, GAT, GraphSAGE, GIN, and other GNN architectures to biological	dmlc/dgl	Machine Learning	9
DIABLO DIABLO (Data Integration Analysis for Biomarker discovery using Latent cOmponents) — supervised multi-omics integration method from the mixOmics R package. Performs sparse generalized canonical correl	mixOmicsTeam/mixOmics	Systems Biology	9
DiffDock DiffDock — diffusion generative model for molecular docking that predicts protein-ligand binding poses. Uses a diffusion process over translations, rotations, and torsion angles to generate and rank d	gcorso/DiffDock	Structure Prediction	11
DNABERT DNABERT — pre-trained BERT model for DNA sequence understanding and classification. Tokenizes DNA using overlapping k-mers (k=3,4,5,6) and provides contextualized embeddings for promoter prediction, s	jerryji1993/DNABERT	Machine Learning	10
DNAplotlib dnaplotlib — Python library for programmable visualization of genetic designs and synthetic biology circuits. Renders genetic parts (Promoter, RBS, CDS, Terminator, Spacer) as vector-quality matplotli	VoigtLab/dnaplotlib	Systems Biology	9
Dockstore Dockstore — open platform for sharing Docker-based bioinformatics tools and workflows written in CWL, WDL, Nextflow, and Galaxy. Provides workflow discovery, versioned registrations, TRS API for progr	dockstore/dockstore	Workflows	10
DoRothEA DoRothEA — Discriminant Regulon Expression Analysis R/Bioconductor package for transcription factor (TF) activity inference from gene expression data. Provides curated TF regulons (confidence levels A	saezlab/dorothea	Systems Biology	9
DVC (Data Version Control) DVC (Data Version Control) — Git-based version control for data, models, and ML experiments. Provides data versioning with .dvc files, ML pipeline definition via dvc.yaml DAGs, experiment tracking and	iterative/dvc	Machine Learning	10
Earth2Mip NVIDIA earth2mip — Earth-2 Model Interoperability Platform for AI weather and climate forecasting. Unified Python framework for running FourCastNet (v1/v2), CorrDiff, and other AI models at 0.25° glob	NVIDIA/earth2mip	Machine Learning	10
echtvar Use when working with echtvar, an ultra-fast Rust CLI for variant	brentp/echtvar	Genomics	9
eigenMT eigenMT — multiple testing correction for QTL mapping using eigenvalue decomposition of the genotype correlation matrix. Estimates the effective number of independent tests (M_eff) per gene for cis-eQ	cran/eigenMT	Population Genetics	9
emmeans emmeans — Estimated Marginal Means (least-squares means) in R. Compute marginal means from fitted models with emmeans(), pairwise comparisons with pairs() and contrast(), interaction analysis with joi	rvlenth/emmeans	Statistics	9
EpiDISH EpiDISH — Bioconductor R package for cell-type deconvolution of DNA methylation data. Estimates cell-type proportions from Illumina 450K and EPIC arrays using reference-based methods: Robust Partial C	sjczheng/EpiDISH	Epigenomics	10
Evo Evo — genomic foundation model for DNA sequence modeling at single-nucleotide resolution. Uses StripedHyena architecture (7B parameters) trained on 2.7M prokaryotic and phage genomes (OpenGenome datas	evo-design/evo	Machine Learning	10
EvoDiff EvoDiff — discrete diffusion models for protein sequence generation directly from evolutionary-scale data. Generates novel protein sequences without requiring 3D structures using order-agnostic autore	microsoft/evodiff	Structure Prediction	10
factoextra factoextra — R package for extracting and visualizing multivariate analysis results with ggplot2. Covers PCA (fviz_pca_ind, fviz_pca_var, fviz_pca_biplot), CA (fviz_ca_row, fviz_ca_col, fviz_ca_biplot	kassambara/factoextra	Statistics	10
Firecloud / WARP Pipelines WARP (WDL Analysis Research Pipelines) — Broad Institute's cloud-optimized collection of genomics pipelines written in WDL (Workflow Description Language). Covers whole-genome sequencing (WGS), whole-	broadinstitute/warp	Utilities & Infrastructure	9
FPocket FPocket — fast open-source protein pocket detection and druggability estimation using Voronoi tessellation and alpha spheres. Identifies binding sites on protein surfaces from PDB structures, scores p	Discngine/fpocket	Structure Prediction	10
FUSION Use when running Transcriptome-Wide Association Studies (TWAS) with FUSION, integrating GWAS summary statistics with precomputed eQTL expression weights to identify genes whose predicted expression is	gusevlab/fusion_twas	Population Genetics	9
GA4GH Passport GA4GH Passport — the Global Alliance for Genomics and Health standard for researcher identity and federated data access authorization. Uses JWT-encoded visa bundles to grant controlled access across i	ga4gh/data-security	Clinical Genomics	9
Geneformer Geneformer — transformer-based foundation model pretrained on ~30 million single-cell transcriptomes for context-specific gene network analysis. Supports fine-tuning for cell type classification, gene	manual	Machine Learning	10
NCBI Genome Data Viewer NCBI Genome Data Viewer (GDV) — interactive web-based genome browser for exploring RefSeq genome assemblies with gene, variant, and functional annotation tracks. Supports URL-based navigation to speci	manual	Genomics	9
GenSLMs Use when working with GenSLMs (Genome-Scale Language Models) for genome sequence generation, embedding, evolutionary analysis, or fine-tuning on viral or bacterial genomes. Covers model loading, infer	ramanathanlab/genslm	Genomics	9
GEOfetch GEOfetch — Python tool for downloading and converting GEO and SRA metadata and data into PEP (Portable Encapsulations of Projects) format. Fetches sample metadata from NCBI GEO/SRA, builds standardize	pepkit/geofetch	Workflows	10
GeoPandas GeoPandas — geospatial vector data analysis in Python. Read/write Shapefile, GeoJSON, GeoPackage, and PostGIS with read_file()/to_file(). Spatial joins with sjoin(), overlay operations (union, interse	geopandas/geopandas	Other	9
ggsci Use when working with ggsci, an R package of ggplot2 color palettes inspired by scientific journals, visualization libraries, and science fiction themes. Covers scale_color_*(), scale_fill_*(), and pa	nanxstats/ggsci	Visualization	10
Gnina Gnina — deep learning molecular docking program built on AutoDock Vina. Uses convolutional neural networks (CNNs) to rescore protein-ligand poses for improved binding pose prediction and virtual scree	gnina/gnina	Structure Prediction	11
GUIDES GUIDES (Graphical User Interface for DNA Editing Screens) — web-based tool for designing customized CRISPR-Cas9 guide RNA libraries. Integrates Doench on-target efficiency scoring, tissue-specific gen	sanjanalab/GUIDES	Systems Biology	10
HLA-HD Use when performing HLA typing from NGS data with HLA-HD. Covers high-resolution HLA genotyping from WGS, WES, or targeted sequencing FASTQ files, calling classical and non-classical HLA alleles at 3-	ANHIG/IMGTHLA	Clinical Genomics	9
HMMRATAC HMMRATAC — Hidden Markov Model-based peak caller for ATAC-seq chromatin accessibility data. Identifies nucleosome-free regions (NFR), mono-, di-, and tri-nucleosomal fragments using an HMM that models	LiuLabIUPUI/HMMRATAC	Epigenomics	9
IDR IDR (Irreproducible Discovery Rate) — statistical framework for assessing and controlling reproducibility of high-throughput sequencing experiments across biological replicates. Fits a copula mixture	nboley/idr	Epigenomics	9
ImageJ/Fiji Use when working with ImageJ, an open-source image processing program with	manual	Imaging	8


Browse Tools
