hifiasm

v0.19.9

Haplotype-resolved de novo assembler for PacBio HiFi reads

License: MIT

EDAM Annotations

Topics

Topic:0196

Operations

Operation:0525

Documentation Snippets

<a name="started"></a>Getting Started

```sh
# Install hifiasm (requiring g++ and zlib)
git clone https://github.com/chhylp123/hifiasm
cd hifiasm && make

# Run on test data (use -f0 for small datasets)
wget https://github.com/chhylp123/hifiasm/releases/download/v0.7/chr11-2M.fa.gz
./hifiasm -o test -t4 -f0 chr11-2M.fa.gz 2> test.log
awk '/^S/{print ">"$2;print $3}' test.bp.p_ctg.gfa > test.p_ctg.fa  # get primary contigs in FASTA

# Assemble inbred/homozygous genomes (-l0 disables duplication purging)
hifiasm -o CHM13.asm -t32 -l0 CHM13-HiFi.fa.gz 2> CHM13.asm.log

# Assemble heterozygous genomes with built-in duplication purging
hifiasm -o HG002.asm -t32 HG002-file1.fq.gz HG002-file2.fq.gz

# Assemble genomes with ONT R10 reads rather than PacBio HiFi reads using the latest release of hifiasm (>0.21.0-r686)
hifiasm -o HG002.asm --ont -t32 HG002-ont.fq.gz

# Hi-C phasing with paired-end short reads in two FASTQ files
hifiasm -o HG002.asm --h1 read1.fq.gz --h2 read2.fq.gz HG002-HiFi.fq.gz

# Trio binning assembly (requiring https://github.com/lh3/yak)
yak count -b37 -t16 -o pat.yak <(cat pat_1.fq.gz pat_2.fq.gz) <(cat pat_1.fq.gz pat_2.fq.gz)
yak count -b37 -t16 -o mat.yak <(cat mat_1.fq.gz mat_2.fq.gz) <(cat mat_1.fq.gz mat_2.fq.gz)
hifiasm -o HG002.asm -t32 -1 pat.yak -2 mat.yak HG002-HiFi.fa.gz

# Improve contiguity for diploid genome assembly by self-scaffolding (--dual-scaf)
hifiasm -o HG002.asm --dual-scaf --h1 read1.fq.gz --h2 read2.fq.gz HG002-HiFi.fq.gz

# Preserve more telomeres for human genomes (--telo-m CCCTAA)
hifiasm -o HG002.asm --telo-m CCCTAA --h1 read1.fq.gz --h2 read2.fq.gz HG002-HiFi.fq.gz

# Hybrid assembly with HiFi, ultralong and Hi-C reads
hifiasm -o HG002.asm --h1 read1.fq.gz --h2 read2.fq.gz --ul ul.fq.gz HG002-HiFi.fq.gz

# Single-sample telomere-to-telomere assembly for diploid human genomes
hifiasm -o HG002.asm --dual-scaf --telo-m CCCTAA --h1 read1.fq.gz --h2 read2.fq.gz --ul ul.fq.gz HG002-HiFi.fq.gz
```

See [tutorial][tutorial] for more details.
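
The `awk` one-liner in Getting Started turns GFA segment (`S`) lines into FASTA records: column 2 becomes the header and column 3 the sequence. The sketch below wraps it in a function and runs it on a tiny made-up GFA so the transformation is easy to see. The function name `gfa_to_fasta` and the demo contig names are illustrative assumptions, not part of hifiasm, and the input is not real hifiasm output.

```sh
#!/bin/sh
# Hypothetical helper: print a FASTA record for every segment (S) line in a GFA file.
# Assumes the sequence is stored in column 3, as in the Getting Started example.
gfa_to_fasta() {
    awk '/^S/{print ">"$2; print $3}' "$1"
}

# Made-up three-line GFA for illustration only: two segments and one link.
demo_gfa=$(mktemp)
printf 'S\tptg000001l\tACGTACGT\nS\tptg000002l\tGGCC\nL\tptg000001l\t+\tptg000002l\t+\t0M\n' > "$demo_gfa"

# Only the two S lines produce output; the L (link) line is skipped.
gfa_to_fasta "$demo_gfa"
rm -f "$demo_gfa"
```

On a real run, the same function would be pointed at a contig graph such as `test.bp.p_ctg.gfa` and its output redirected to a `.fa` file.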

Table of Contents

- [Getting Started](#started)
- [Introduction](#intro)
- [Why Hifiasm?](#why)
- [Usage](#use)
  - [Assembling HiFi reads without additional data types](#hifionly)
  - [Assembling ONT reads](#ontonly)
  - [Hi-C integration](#hic)
  - [Trio binning](#trio)
  - [Ultra-long ONT integration](#ul)
  - [Output files](#output)
- [Results](#results)
- [Getting Help](#help)
- [Limitations](#limit)
- [Citing Hifiasm](#cite)

<a name="intro"></a>Introduction

Hifiasm is a fast haplotype-resolved de novo assembler initially designed for PacBio HiFi reads.

Its latest release supports telomere-to-telomere assembly by using ultralong Oxford Nanopore reads. Hifiasm produces arguably the best single-sample telomere-to-telomere assemblies by combining HiFi, ultralong and Hi-C reads, and it is one of the best haplotype-resolved assemblers for trio-binning assembly given parental short reads. For a human genome, hifiasm can produce a telomere-to-telomere assembly in one day.
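
Which flags to pass depends on the data available alongside the HiFi reads. The following is a minimal dry-run sketch that prints, rather than runs, a hifiasm command. It uses only flags that appear in Getting Started (`-o`, `-t`, `-1`/`-2`, `--h1`/`--h2`, `--ul`). The function name and the variables `PAT_YAK`, `MAT_YAK`, `HIC_R1`, `HIC_R2` and `UL_READS` are hypothetical, and only the flag combinations shown in Getting Started are confirmed by this document.

```sh
#!/bin/sh
# Hypothetical dry-run helper: build_hifiasm_cmd PREFIX THREADS HIFI_READS
# Optional inputs are read from variables (names are assumptions, not hifiasm settings):
#   PAT_YAK, MAT_YAK  parental yak k-mer databases (trio binning)
#   HIC_R1, HIC_R2    paired-end Hi-C reads
#   UL_READS          ultralong ONT reads
build_hifiasm_cmd() {
    prefix=$1; threads=$2; hifi=$3
    set -- hifiasm -o "$prefix" -t"$threads"
    if [ -n "${PAT_YAK:-}" ] && [ -n "${MAT_YAK:-}" ]; then
        # Trio binning takes precedence in this sketch
        set -- "$@" -1 "$PAT_YAK" -2 "$MAT_YAK"
    elif [ -n "${HIC_R1:-}" ] && [ -n "${HIC_R2:-}" ]; then
        # Hi-C phasing with two FASTQ files
        set -- "$@" --h1 "$HIC_R1" --h2 "$HIC_R2"
    fi
    if [ -n "${UL_READS:-}" ]; then
        # Add ultralong reads for hybrid assembly
        set -- "$@" --ul "$UL_READS"
    fi
    set -- "$@" "$hifi"
    # Print the command instead of executing it
    printf '%s\n' "$*"
}

# HiFi reads only
build_hifiasm_cmd HG002.asm 32 HG002-HiFi.fq.gz

# HiFi + Hi-C + ultralong reads (subshell keeps the variables from leaking)
( HIC_R1=read1.fq.gz; HIC_R2=read2.fq.gz; UL_READS=ul.fq.gz
  build_hifiasm_cmd HG002.asm 32 HG002-HiFi.fq.gz )
```

Printing the command first makes it easy to review before committing to a run that may take many hours.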

<a name="why"></a>Why Hifiasm?

Hifiasm delivers high-quality telomere-to-telomere assemblies. It tends to generate longer contigs and resolve more segmental duplications than other assemblers.

Given Hi-C reads or short reads from the parents, hifiasm can produce overall the best haplotype-resolved assembly so far. It is the assembler chosen by the [Human Pangenome Project][hpp] for the first batch of samples.

Hifiasm can purge duplications between haplotigs without relying on third-party tools such as purge\_dups. Hifiasm does not need polishing tools like pilon or racon, either. This simplifies the assembly pipeline and saves running time.

Hifiasm is fast. It can assemble a human genome in half a day and assemble a ~30Gb redwood genome in three days. No genome is too large for hifiasm.

Hifiasm is trivial to install and easy to use. It does not require Python, R or C++11 compilers, and can be compiled into a single executable. The default setting works well with a variety of genomes.

[hpp]: https://humanpangenome.org
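
Contig length claims are usually checked with N50, the contig length at which the cumulative length of contigs sorted from longest to shortest first reaches half of the total assembly size. The sketch below computes it from the `S` lines of a GFA file with standard tools. The function name `gfa_n50` and the demo data are illustrative assumptions, and the approach assumes sequences are present in column 3 of each `S` line.

```sh
#!/bin/sh
# Hypothetical helper: contig N50 from the segment (S) lines of a GFA file.
# Assumes column 3 holds the full sequence (not a "*" placeholder).
gfa_n50() {
    awk '/^S/{print length($3)}' "$1" |
        sort -rn |
        awk '{len[NR] = $1; total += $1}
             END {
                 half = total / 2
                 for (i = 1; i <= NR; i++) {
                     acc += len[i]
                     # First contig at which the running sum reaches half the total
                     if (acc >= half) { print len[i]; exit }
                 }
             }'
}

# Made-up GFA with contigs of length 8, 4 and 2 (total 14, half 7).
demo_gfa=$(mktemp)
printf 'S\tc1\tACGTACGT\nS\tc2\tGGCC\nS\tc3\tAT\n' > "$demo_gfa"
gfa_n50 "$demo_gfa"   # prints 8
rm -f "$demo_gfa"
```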

<a name="use"></a>Usage

Installation

conda

conda install -c bioconda hifiasm
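
After installing, it can be worth confirming that the executable is actually on `PATH` before starting a long run. This is a small sketch of such a check; the helper name `have_tool` is an assumption for illustration, and the message text is only a suggestion.

```sh
#!/bin/sh
# Hypothetical helper: succeed if the named command can be found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool hifiasm; then
    # Print the installed hifiasm version
    hifiasm --version
else
    echo "hifiasm not found on PATH; check that the conda environment is active" >&2
fi
```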

Container Images

quay.io

docker pull quay.io/biocontainers/hifiasm:0.19.9--h43eeafb_0

Version History

v0.19.9