High-Quality de novo Genome Assembly of Huajingxian 74, a Receptor Parent of Single Segment Substitution Lines

Cite this Article

Li Fangping, Gao Yanhao, Wu Bingqi, Cai Qingpei, Zhan Pengling, Yang Weifeng, Shi Wanxuan, Li Xiaohua, Yang Zifeng, Tan Quanya, Luan Xin, Zhang Guiquan, Wang Shaokui. 2021. High-Quality de novo Genome Assembly of Huajingxian 74, a Receptor Parent of Single Segment Substitution Lines. Rice Science, 28(2): 109-113.

Doi:10.1016/j.rsci.2020.09.010

China National Rice Research Institute

Li Fangping, Gao Yanhao, Wu Bingqi, Cai Qingpei, Zhan Pengling, Yang Weifeng, Shi Wanxuan, Li Xiaohua, Yang Zifeng, Tan Quanya, Luan Xin, Zhang Guiquan, Wang Shaokui

Guangdong Provincial Key Laboratory of Plant Molecular Breeding / State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, South China Agricultural University, Guangzhou 510642, China

Corresponding author: Wang Shaokui (shaokuiwang@scau.edu.cn)

Rice (Oryza sativa L.) is grown nearly worldwide and provides the staple food for more than half of the global population (Luo et al, 2017). The genomes of several cultivated rice varieties, including Nipponbare (NPB) (Kawahara et al, 2013; Sakai et al, 2013), IR64 (Tanaka et al, 2020), 93-11 (Zhang et al, 2018) and R498 (Du et al, 2017) at the chromosome level, and Minghui 63 and Zhenshan 97 (Zhang et al, 2016) at the scaffold level, have been assembled, annotated and released; among these, the R498 and NPB genomes are widely used as reference genomes in rice research. However, there are thousands of rice cultivars, landraces and wild rice accessions in the world with dramatically different genetic backgrounds, and the genomes of native rice varieties in South China, one of the major rice production areas in China, have not been assembled de novo. Huajingxian 74 (HJX74) is an indica rice variety bred at South China Agricultural University, Guangdong Province, with wide environmental adaptability and high yield (www.ricedata.cn/variety/varis/602548.htm). HJX74 exhibits significant phenotypic and genetic differences from the varieties whose whole genomes have been sequenced and assembled (Fig. 1).

Fig. 1. Phenotype (A) and phylogeny (B) of HJX74 (Huajingxian 74).
Phylogenetic tree constructed by the maximum-likelihood method using coding sequences of single-copy orthologous genes (the genes are shown in Table S4). In total, 12 species or varieties were used for alignment: 9 are cultivated rice (Nipponbare, 93-11, R498, Zhenshan 97, IR64, Minghui 63, Basmati, Dom Sufid and HJX74) and the other 3 are wild rice (O. meridionalis, O. barthii and O. rufipogon).

In the past 30 years, a large library of single segment substitution lines (SSSLs) has been constructed using HJX74 as the receptor parent and 43 accessions belonging to 7 AA-genome rice species as donors. Hence, all these SSSLs share the same genetic background (Zhang, 2019). The SSSL library has contributed greatly to the identification of QTLs/genes involved in disease resistance, fertility, panicle length, stress resistance, grain shape determination and other traits (Wang S K et al, 2015; Fang et al, 2019; Wang et al, 2019). In addition, the SSSL library has provided a powerful platform for rice breeding by design (Luan et al, 2019; Zhao et al, 2019). A high-quality genome of the receptor parent (HJX74) of the SSSL library is therefore essential for improving the efficiency of genetic and mechanistic studies of desirable agronomic traits in rice, as well as for accelerating rice breeding by design. We produced a high-accuracy chromosome-level HJX74 genome by whole-genome sequencing on the PacBio platform (Rhoads and Au, 2015), followed by Hi-C-assisted scaffolding (van Berkum et al, 2010). The corresponding online platform has been constructed as well (https://RiceGenomicHJX.xiaomy.net). The sequencing and de novo assembly of the HJX74 genome will significantly enrich the understanding of the rice genome and provide a powerful tool for rice studies.

A total of 7 380 677 reads (137.31 Gb) of HJX74 genome sequence were produced by PacBio Sequel II (Fig. 2-A and -B), and 51.23 Gb and 40.93 Gb of sequence data were generated by Illumina sequencing of the standard and Hi-C libraries, respectively. After assembly and polishing, the contig set comprised 155 FASTA sequences with a total size of 399.00 Mb (N50 = 14.41 Mb) (Table S1).
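The N50 statistic quoted above can be illustrated with a minimal Python sketch. It assumes the usual definition (the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly); the helper names and the toy records are illustrative only and are not taken from the authors' scripts.

```python
def fasta_lengths(lines):
    """Yield the sequence length of each record from an iterable of FASTA lines."""
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                yield current
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        yield current


def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy input standing in for a contig FASTA file (hypothetical data).
toy = [">ctg1", "ACGTACGTAC", ">ctg2", "ACGTAC", ">ctg3", "ACGT"]
lengths = list(fasta_lengths(toy))
print(lengths, n50(lengths))
```

For a real assembly the same functions could be fed an open file handle instead of the toy list.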

Fig. 2. Characteristics of the Huajingxian 74 (HJX74) genome, synteny analysis, mining of SNPs (single nucleotide polymorphisms) and InDels (insertions/deletions), and dN/dS comparison with Nipponbare (NPB) and R498.
A, Distribution of SNPs and InDels between HJX74 and NPB (the data refer to Table S5).
B, Distribution of SNPs and InDels between HJX74 and R498 (the data refer to Table S6).
C, dN/dS distribution of different combinations. The vertical lines represent average values of dN/dS.
D, Chromosomal synteny among HJX74 and two reference genomes of rice.
E, Interactive dot plot between HJX74 and NPB.
F, Interactive dot plot between HJX74 and R498.
G, Characteristics of the HJX74 genome. Tracks from outside to inside are the 12 chromosomes of HJX74, GC content, long terminal repeat density, and simple sequence repeat density (the data refer to Table S11).

Table S1. Comparison of contigs and scaffolds among Huajingxian 74 (HJX74) and two reference rice genomes (Nipponbare and R498).

Visualization of the Hi-C signals showed 12 square blocks in the Hi-C heat map that differed markedly from the background signal, matching the chromosome number of the rice nuclear genome (Fig. S1). The final polished scaffold-level genome was constructed using the Hi-C data; the consensus sequence file spanned 398.87 Mb and contained 108 contigs, including 12 chromosome-length contigs (Fig. 2-D and Table S1). The genome assembly recovered more than 98% of the 1 440 Benchmarking Universal Single-Copy Orthologs (BUSCO) embryophyte genes and completely assembled more than 92.5% of the 248 embryophyte core genes from the Core Eukaryotic Genes Mapping Approach (CEGMA) database (Li et al, 2020) (Table S2). The long terminal repeat retrotransposon (LTR-RT) assembly index (LAI) of the HJX74 genome was 23.42, close to those of the high-quality rice genomes of NPB (22.59) and R498 (23.94) (Table S3).

	Fig. S1. Hi-C interactive heat map.

Table S2. Evaluation of Huajingxian 74 genome assembly by Benchmarking Universal Single-Copy Orthologs (BUSCO) and Core Eukaryotic Genes Mapping Approach (CEGMA).

Table S3. LTR-RTs assembly index (LAI) of R498, Nipponbare and Huajingxian 74 (HJX74).

Combining ab initio, protein and expressed sequence tag (EST) evidence into consensus gene predictions (Zhang et al, 2015), we annotated 46 993 non-redundant genes in the HJX74 genome. Among them, 39 002 genes (83.0%) formed 27 202 clusters with genes from 11 other Oryza species, whereas 7 991 genes were singletons in OrthoVenn2 (Wang Y et al, 2015). The clustering analysis, based on the Markov Clustering (MCL) algorithm, indicated high annotation reliability. In total, 2 850 single-copy gene clusters were generated by orthologous cluster analysis of 9 Oryza sativa varieties, O. rufipogon, O. meridionalis and O. barthii on the OrthoVenn2 platform (Table S4). The phylogenetic tree constructed from the coding sequences of these 2 850 single-copy orthologous gene clusters indicated that HJX74 clustered in the O. sativa subsp. indica clade and was most closely related to IR64 (Fig. 1-B and Table S5). HJX74 was genetically distant from both NPB and R498, even though HJX74 and R498 both clustered within the indica clade, which is consistent with the SNPs, InDels and presence and absence variations (PAVs) across the 12 chromosomes of HJX74 compared with NPB and R498 (Fig. 2-A and -B; Fig. S2 and Tables S6 and S7). In addition, the dN/dS density curve (Kryazhimskiy and Plotkin, 2008) showed more genes at the peak of 0.4-0.5 for HJX74/R498 than for HJX74/NPB, which suggested that more genes in HJX74 were positively selected relative to R498 than relative to NPB (Fig. 2-C). A possible reason is the crossbreeding between rice subspecies (indica and japonica) during the breeding of HJX74 and the preference for tropical japonica as a germplasm resource for rice breeding in South China.

	Fig. S2. Cumulative sequence length and presence and absence variations (PAVs) distribution. A and B, Comparison of cumulative sequence length of HJX74 and two published reference rice genomes (NPB and R498); C and D, Comparison of PAVs length distribution in HJX74 and two published reference rice genomes (NPB and R498).

The relative lengths of the HJX74 chromosomes are consistent with those of NPB and R498 (Table S3). According to the whole-genome comparison, the HJX74 genome showed a sequence inversion of about 5 Mb at about 12-17 Mb on chromosome 6 compared with the NPB genome, whereas the HJX74 sequence was in the same order as that of R498 (Fig. S3). In addition, the HJX74 genome was nearly 8.1 Mb and 25.2 Mb larger than those of R498 (390.9 Mb) and NPB (373.8 Mb), respectively. We examined the synteny between the HJX74 and R498/NPB genomes using the Python version of MCScanX (Wang et al, 2012). HJX74 showed a high degree of synteny with the indica/japonica genomes, together with the same large inversion in the middle of chromosome 6, which was consistent with the whole-genome alignment between the HJX74 and R498/NPB genomes (Fig. 2-D to -F and Fig. S3). This inversion, or a disordered alignment to NPB at the same locus, was also detected in the genomes of the O. sativa varieties Basmati 334 and Dom Sufid (Choi et al, 2020; Xie et al, 2020). The long inversion at this site therefore appears to be genuine, which suggests that it might have occurred during the differentiation of rice subspecies. According to the synteny plot, there is a large syntenic block of about 3 Mb between the short arms of chromosomes 11 and 12, which was estimated to result from a duplication event 7.7 million years ago, consistent with previous research (The Rice Chromosomes 11 and 12 Sequencing Consortia, 2005).

	Fig. S3. Interactive dot plot of Huajingxian 74 (HJX74) and two reference rice genomes (R498 and Nipponbare, NPB). A to C, Paired comparison of HJX74 and two reference genomes on genome level; D, Comparison of HJX74 and two reference rice genomes on the region of chromosome 6 (un, Unique alignment; re, Repetitive alignment; sh, Show repetitive alignment).

There are a considerable number of PAVs between the genomes of HJX74 and NPB (Table S8 and Fig. S2-B). Compared with NPB, the HJX74 genome has more long insertions and expanded repeats (Fig. S2-B). The three chromosomes that differed most between NPB and HJX74 (NPB-Chr.02, NPB-Chr.03 and NPB-Chr.10) were compared. Long insertions (> 10 kb) and tandem/repeat expansions contributed significantly to the greater chromosome lengths of HJX74 compared with NPB (Fig. S4-A to -C). This result agrees with the previous report that chromosome length differences are most probably due to changes in tandem/repeat regions (Kim et al, 2017). In contrast, the length of each HJX74 chromosome was close to that of R498, with an average length difference of about 0.075 Mb (Table S9).

	Fig. S4. Types and distribution of presence and absence variations (PAVs) on chromosomes with significantly different lengths between Huajingxian 74 (HJX74) and Nipponbare (NPB). A, Chromosome 2; B, Chromosome 3; C, Chromosome 10.

Table S7. Clustering of homologous genes of 12 species of rice.

Table S9. Chromosome lengths of Huajingxian 74 (HJX74), Nipponbare (NPB) and R498.

We also found that the LTR-RT length and type ratio (Gypsy/Copia/unknown) of the HJX74 genome were similar to those of R498, but significantly different from those of NPB (Table S10). Previous research reported that the two subspecies of rice, indica and japonica, experienced independent amplification or loss of LTR-RTs after their divergence (Du et al, 2017). In this study, the chromosome structure comparison showed few differences in PAVs and LTR-RTs between the two indica varieties HJX74 and R498, whereas their PAVs and LTR-RTs were very different from those of NPB. Meanwhile, a total of 26 647 simple sequence repeat loci (number of repeating units ≥ 3 bp) were detected on the 12 chromosomes of HJX74 (Fig. 2-G and Table S11), which demonstrates the potential of the HJX74 genome for developing molecular breeding markers.

To encourage the use of the genomes of HJX74 and other rice varieties, a platform (https://RiceGenomicHJX.xiaomy.net) supporting sequence search (BLAST), gene browsing, download and sequence extraction was built with support from the Guangdong Provincial Key Laboratory of Plant Molecular Breeding, China. The platform also collects information on mutation sites in HJX74 and other rice genomes, and links to multiple rice research platforms and websites. Further improvement and development of the platform are underway (Fig. S5).

Table S10. Repeat content (LTR-RTs) in Huajingxian 74 (HJX74), R498 and Nipponbare (NPB).

	Fig. S5. Online accessible platform of HJX74 genome data. A, Display of platform homepage; B, Platform sequence search (BLAST) function; C, Display of FPtools page and online sequence extraction page; D, Display of gene browsing page.

In previous studies, considerable progress has been made by combining bioinformatics and whole-genome sequencing methods (such as RNA-seq and genome-wide association studies) with traditional molecular biology methods for germplasm resource mining and molecular breeding in rice (Shao et al, 2019; Groen et al, 2020). However, these technologies require a reliable reference genome. Here, we present a highly contiguous and near-complete genome assembly for HJX74, a high-yielding indica rice variety widely grown in South China. As a platform variety, HJX74 has been used to construct a large SSSL library with 2 360 independent lines (Zhang, 2019). The SSSL library has excellent prospects for rice breeding by design and QTL/gene identification (Zhou et al, 2017). Compared with NPB, the HJX74 reference genome allows more SNP loci and insertion/deletion sites to be detected in many PAVs when combined with whole-genome sequencing technologies (Fig. S6). Our work provides a precise reference genome and an accessible platform for further research based on the SSSL library. This reference genome of the receptor parent of the SSSL library should simplify the mining and identification of rice functional genes controlling agronomic traits of interest, thereby promoting the research and application of rice breeding by design.

	Fig. S6. Sequence difference in presence and absence variations (PAVs) locus. A, SNP detection in PAVs locus; B, Insertion sites detection in PAVs locus.

Acknowledgements

This study was supported by the National Key Research and Development Program of China (Grant No. 2016YFD0100406), National College Students Innovation and Entrepreneurship Foundation of China (Grant No. 201910564054), National Natural Science Foundation of China (Grant Nos. 91735304 and 31622041) and Special Project for Leading Talents in Innovation of Science and Technology of Guangdong Province, China (Grant No. 2016TX03N224). We thank Ji Zhe (Department of Plant Sciences, University of Oxford) for suggestions.

Supplemental Data

The following materials are available in the online version of this article at http://www.sciencedirect.com/science/journal/rice-science; http://www.ricescience.org.

File S1. Methods.

Fig. S1. Hi-C interactive heat map.

Fig. S2. Cumulative sequence length and presence and absence variation distribution.

Fig. S3. Interactive dot plot of Huajingxian 74 and two reference rice genomes (R498 and Nipponbare).

Fig. S4. Presence and absence variations types and distribution of some chromosomes with significantly different lengths between Huajingxian 74 and Nipponbare.

Fig. S5. Online platform of Huajingxian 74 genome data.

Fig. S6. Sequence difference in presence and absence variation locus.

Table S1. Comparison of contigs and scaffolds among Huajingxian 74 and two reference rice genomes.

Table S2. Evaluation of Huajingxian 74 genome assembly by Benchmarking Universal single-copy ortholog and Core Eukaryotic Genes Mapping Approach.

Table S3. Long terminal repeat-retrotransposons assembly index of R498, Nipponbare, 93-11 and Huajingxian 74.

Table S4. Clustering of homologous genes of 12 species of rice.

Table S5. Single copy homologous genes of 12 Oryza species.

Table S6. Mutation site of Huajingxian 74 compared with Nipponbare.

Table S7. Mutation site of Huajingxian 74 compared with R498.

Table S8. Presence and absence variations length distribution.

Table S9. Chromosome lengths of Huajingxian 74, Nipponbare and R498.

Table S10. Long terminal repeat-retrotransposons in Huajingxian 74, R498 and Nipponbare.

Table S11. Detection of simple sequence repeat locus on Huajingxian 74 genome.

Methods

Collection of rice samples and DNA extractions

Plant materials were grown at the experimental station of South China Agricultural University in Guangdong Province, China (23°10′3.07″ N, 113°21′41.39″ E). The seeds used for germination were produced from flowers that were bagged to prevent cross-pollination, and harvested in the late season of 2012. After hydroponic culture at 28 °C, about 2.0 g of plant tissue was collected from two-week-old seedlings of Huajingxian 74 (HJX74) for DNA extraction.

Genomic DNA was extracted by the CTAB method (Porebski et al, 1997). The quality of the extracted DNA was assessed by 1% agarose gel electrophoresis and with a Qubit™ Fluorometer. Specifically, only a single DNA band should be present on the agarose gel, and the fragment length should exceed 30 kb. The concentration of the genomic DNA was 302 ng/µL.

Library preparation and sequencing

SMRTbell Express Template Prep Kit 2.0 was used for library preparation of the HJX74 template, and the PacBio Sequel II platform was used for sequencing. The Illumina HiSeq platform was used to generate short paired-end reads (2 × 150 bp). In the Hi-C experiment, the genomic DNA of HJX74 was fixed with 1% formaldehyde solution and then digested with the DpnII restriction enzyme. Biotin labeling was followed by ligation with T4 DNA ligase (NEB, Ipswich, USA). Biotin-labeled DNA fragments were finally isolated on Dynabeads® M-280 Streptavidin (Thermo Fisher Scientific, Waltham, USA). Hi-C libraries were quality-controlled and sequenced on an Illumina HiSeq X Ten sequencer (Illumina, San Diego, USA).

Genome assembly, polishing and scaffolding

PacBio reads were assembled by Canu 2.0 (corOutCoverage = 120, corMinCoverage = 2, minReadLength = 2000, minOverlapLength = 500). To improve the accuracy of the result, the preliminary contig file was subsequently polished with PacBio reads by Quiver (https://github.com/PacificBiosciences/GenomicConsensus) and with Illumina reads by Pilon v2.0 (https://github.com/broadinstitute/pilon) and NextPolish (Hu et al, 2019).

Short Hi-C reads were mapped and processed by BWA-MEM (Li, 2013), Samtools v1.9 (Li et al, 2009) and GATK (https://github.com/broadinstitute/gatk/). HiC-Pro (Servant et al, 2015) and HiCPlotter (Akdemir and Chin, 2015) were used to visualize the Hi-C interaction signals. The HapCUT algorithm (Bansal and Bafna, 2008) and the HiCUP pipeline (Wingett et al, 2015) were used for haplotype reconstruction and for generating contact matrices from the Hi-C data, respectively. The completeness of the assembled genome was assessed by Benchmarking Universal Single-Copy Orthologs (BUSCO) and Core Eukaryotic Genes Mapping Approach (CEGMA) (Li et al, 2020).
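To make the notion of a Hi-C contact matrix concrete, the following Python sketch bins mapped read-pair positions on a single sequence into a symmetric count matrix. It is a simplified illustration and not the implementation used by HiCUP or HiC-Pro; the function name, the 0-based coordinates and the toy read pairs are all assumptions.

```python
def contact_matrix(pairs, length, bin_size):
    """Count read pairs per pair of bins on one sequence.

    pairs    -- iterable of (pos_a, pos_b), 0-based mapped positions
    length   -- sequence length in bp
    bin_size -- bin width in bp
    """
    n_bins = -(-length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        matrix[i][j] += 1
        if i != j:
            # Keep the matrix symmetric for off-diagonal contacts.
            matrix[j][i] += 1
    return matrix


# Hypothetical read pairs on a 30-bp toy sequence with 10-bp bins.
toy_pairs = [(1, 5), (2, 12), (14, 3), (25, 28), (8, 27)]
matrix = contact_matrix(toy_pairs, length=30, bin_size=10)
for row in matrix:
    print(row)
```

In a heat map of such a matrix, bins belonging to the same chromosome tend to form dense square blocks along the diagonal, which is the pattern described for Fig. S1.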

Gene annotation

A combined strategy was adopted to predict the protein-coding genes in the HJX74 genome. Augustus v3.0.3 (Stanke et al, 2004) was used to detect hypothetical gene-coding regions in the HJX74 genome (ab initio). We developed a pipeline called RiceOrthoblast (https://github.com/lipingfangs/RiceOrthoblast) by integrating BLAST+ v2.9.0 (ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/) and Genewise v2.4.1 (http://www.ebi.ac.uk/~birney/wise2/) to annotate the HJX74 genome by homology to the reference protein sequence file (Oryza_sativa.IRGSP-1.0.pep.all.fa; ftp://ftp.ensemblgenomes.org/pub/plants/release-47/fasta/oryza_sativa/pep/). This generated the non-redundant homologous gene annotation files (GFF3; Pep) used for the orthology-based phylogenetic analysis and as protein evidence for further annotation. To further aid gene annotation, 21 682 489 Illumina RNA-seq reads were mapped to the HJX74 genome by Hisat2 v2.1.0 (Pertea et al, 2016) and Samtools v1.9. The resulting .bam file was assembled into transcripts and annotation files by StringTie v1.3.5 (Pertea et al, 2016) and TransDecoder v5.5.0 (Kim et al, 2016). EVidenceModeler (EVM) (Haas et al, 2008) was used to combine the ab initio, protein and expressed sequence tag (EST) evidence into consensus gene predictions.

Phylogenomic analysis

OrthoVenn2 (Wang et al, 2015) and its corresponding database were used for the comparison and clustering of nine cultivated rice samples [Nipponbare (NPB), 93-11, R498, IR64, Basmati, Dom Sufid, Zhenshan 97 (ZS97), Minghui 63 (MH63) and HJX74] and three wild rice samples (Oryza meridionalis, O. barthii and O. rufipogon). Protein sequences of the single-copy gene clusters produced by orthologous clustering of the 12 samples were aligned by MAFFT v7.0 (Katoh et al, 2002). IQ-TREE v1.1.6.1 (Nguyen et al, 2015) was used to select the optimal amino acid substitution model (ModelFinder module) and to build the phylogenetic tree.
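The selection of single-copy clusters can be sketched in Python as follows. This is a hypothetical illustration of the filtering idea (keep only clusters with exactly one gene from every sample), not OrthoVenn2's own code or output format; the cluster table layout, sample labels and gene identifiers are invented for the example.

```python
def single_copy_clusters(clusters, samples):
    """Return clusters that contain exactly one gene from every sample.

    clusters -- dict: cluster id -> list of (sample, gene id) tuples
    samples  -- iterable of all sample names that must be represented
    """
    wanted = set(samples)
    kept = {}
    for cluster_id, members in clusters.items():
        counts = {}
        for sample, _gene in members:
            counts[sample] = counts.get(sample, 0) + 1
        if set(counts) == wanted and all(n == 1 for n in counts.values()):
            kept[cluster_id] = dict(members)
    return kept


# Toy example with three samples instead of twelve (hypothetical gene ids).
samples = ["HJX74", "NPB", "R498"]
clusters = {
    "c1": [("HJX74", "h1"), ("NPB", "n1"), ("R498", "r1")],                  # single copy
    "c2": [("HJX74", "h2"), ("HJX74", "h3"), ("NPB", "n2"), ("R498", "r2")],  # duplicated
    "c3": [("HJX74", "h4"), ("NPB", "n3")],                                  # missing R498
}
selected = single_copy_clusters(clusters, samples)
print(sorted(selected))
```

Each retained cluster then supplies one sequence per sample for alignment and tree building.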

Repetitive DNA annotation

The LTR_finder software (Xu and Wang, 2007) was used for long terminal repeat (LTR) locus detection of HJX74 and three published cultivated rice genomes (NPB, 93-11, R498). The result files were conveyed to LTR_retriever (Ou and Jiang, 2018) for LTR retrotransposons (LTR-RTs) site recognition, combined screening of the HJX74 genome and the calculation of LTR-RTs Assembly index (LAI).

MISA (Thiel et al, 2003) was used for the detection of simple sequence repeats (SSRs) in HJX74 (the number of repeating units ≥ 3 bp was calculated). The GC content and the densities of SSRs and LTR-RTs were visualized with the R package OmicCircos (Hu et al, 2014).
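The idea behind SSR detection can be shown with a small regular-expression sketch in Python. This is not MISA itself, and the unit lengths and minimum repeat counts below are hypothetical illustration values rather than the settings used in this study.

```python
import re

# Hypothetical thresholds: motif length (bp) -> minimum number of repeats.
MIN_REPEATS = {2: 6, 3: 5, 4: 4}


def is_compound(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat count) for perfect tandem repeats in seq."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-bp motif followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_compound(motif):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


toy_seq = "TT" + "AG" * 6 + "CC" + "CTT" * 5 + "G"
ssrs = find_ssrs(toy_seq)
print(ssrs)
```

Loci found this way are candidate positions for marker development, since repeat counts often vary between varieties.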

Genome-wide comparison

MUMmer v4.0.0 (-maxmatch -l 100 -c 500) (http://mummer.sourceforge.net/) was used to compare the HJX74 genome with two published cultivated rice genomes (NPB and R498). We examined and classified the presence/absence variations (PAVs) between these genomes with the Assemblytics package (Nattestad and Schatz, 2016). Using the Python version of MCScanX (https://github.com/tanghaibao/jcvi/wiki/MCscan-(Python-version)) with default parameters, we identified collinear orthologous genes and plotted the synteny blocks, and used the codeml program of PAML (Yang, 1997) to calculate the dN/dS values (Kryazhimskiy and Plotkin, 2008) for HJX74 versus NPB and R498. The distribution of dN/dS values was plotted with the R package ggplot2.
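How per-gene dN/dS values might be summarized before plotting is sketched below in Python. The sketch assumes that dN and dS have already been estimated for each orthologous gene pair (for example by codeml); the input layout, the function names and the toy numbers are assumptions made for illustration and are not results from this study.

```python
def omega_values(pairs):
    """Return dN/dS for each (dN, dS) pair, skipping pairs with dS == 0."""
    return [dn / ds for dn, ds in pairs if ds > 0]


def summarize(values, lo=0.4, hi=0.5):
    """Return the mean dN/dS and the fraction of genes with lo <= dN/dS < hi."""
    if not values:
        raise ValueError("no dN/dS values to summarize")
    mean = sum(values) / len(values)
    in_bin = sum(1 for v in values if lo <= v < hi)
    return mean, in_bin / len(values)


# Hypothetical (dN, dS) estimates for five orthologous gene pairs.
toy_pairs = [(0.021, 0.05), (0.01, 0.04), (0.009, 0.02), (0.0, 0.0), (0.03, 0.03)]
values = omega_values(toy_pairs)
mean_omega, frac_in_bin = summarize(values)
print(round(mean_omega, 3), frac_in_bin)
```

Running the same summary on the HJX74/NPB and HJX74/R498 gene sets would give the per-comparison means drawn as vertical lines in a density plot such as Fig. 2-C.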

Platform construction

The platform (https://RiceGenomicHJX.xiaomy.net) is hosted by the Guangdong Provincial Key Laboratory of Plant Molecular Breeding, State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, South China Agricultural University. Its front end is built on HTML5, CSS and the JavaScript framework Vue.js. The back-end sequence search (BLAST) function is based on the open-source PHP project ViroBLAST (https://indra.mullins.microbiol.washington.edu/viroblast/viroblast.php). Sequence extraction is provided through web pages and the accompanying tool FPtools, which was developed as a custom Python script.

Data availability

All analyses and quality control steps were coded in Python3 scripts or Linux shell commands/scripts except where stated explicitly. The custom Python3 code for building overlaps and generating final sequences is provided at https://github.com/lipingfangs. The sequence reads are available at the Genome Sequence Archive (GSA) (http://gsa.big.ac.cn/index.jsp) under project code PRJCA002801. The genome assembly of HJX74 and the whole-genome sequencing (WGS) data have been deposited under NCBI BioProjects PRJNA636594 (standard Illumina sequence data), PRJNA637189 (PacBio data) and PRJNA637223 (Hi-C Illumina sequence data). The accessions of the HJX74 assembly and the Illumina RNA-seq reads are PRJNA637414 and PRJNA639680, respectively. The assembled genomes, annotated genes and protein files for HJX74, NPB, R498, 93-11, IR64, Basmati, Dom Sufid, MH63 and ZS97, and the example data showing sequence differences at PAV loci, are also accessible at https://RiceGenomicHJX.xiaomy.net.

REFERENCES

Akdemir K C, Chin L. 2015. HiCPlotter integrates genomic data with interaction matrices. Genome Biol, 16: 198.

Bansal V, Bafna V. 2008. HapCUT: An efficient and accurate algorithm for the haplotype assembly problem. Bioinformatics, 24: 153-159.

Haas B J, Salzberg S L, Zhu W, Haas M, Allen J E, Orvis J, White O, Buell R C, Wortman J R. 2008. Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignments. Genome Biol, 9: R7.

Hu Y, Yan C H, Hsu C H, Chen Q R, Niu K, Komatsoulis G A, Meerzaman D. 2014. OmicCircos: A simple-to-use R package for the circular visualization of multidimensional omics data. Cancer Inform, 13: 13-20.

Hu J, Fan J, Sun Z, Liu S. 2019. NextPolish: A fast and efficient genome polishing tool for long read assembly. Bioinformatics, 36: 2253-2255.

Katoh K, Misawa K, Kuma K, Miyata T. 2002. MAFFT: A novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucl Acids Res, 30: 3059-3066.

Kim H, Hwang D, Lee B, Park J C, Lee Y H, Lee J. 2016. De novo assembly and annotation of the marine mysid (Neomysis awatschensis) transcriptome. Mar Genom, 28: 41-43.

Kryazhimskiy S, Plotkin J B. 2008. The population genetics of dN/dS. PLoS Genet, 4: e1000304.

Li H. 2013. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Genomics, 1303: 1-3.

Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup. 2009. The sequence alignment/map format and SAMtools. Bioinformatics, 25: 2078-2079.

Li W, Li K, Zhang Q, Zhu T, Zhang Y, Shi C, Liu Y, Xia E, Jiang J, Shi C, Zhang L, Huang H, Tong Y, Liu Y, Zhang D, Zhao Y, Jiang W, Zhao Y, Mao S, Gao L. 2020. Improved hybrid de novo genome assembly and annotation of African wild rice, Oryza longistaminata, from Illumina and PacBio sequencing reads. Plant Genome, 13: e20001.

Nguyen L, Schmidt H A, von Haeseler A, Minh B Q. 2015. IQ-TREE: A fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol, 32: 268-274.

Nattestad M, Schatz M C. 2016. Assemblytics: A web analytics tool for the detection of variants from an assembly. Bioinformatics, 32: 3021-3023.

Ou S, Jiang N. 2018. LTR_retriever: A highly accurate and sensitive program for identification of long terminal repeat retrotransposons. Plant Physiol, 176: 1410-1422.

Pertea M, Kim D, Pertea G M, Leek J T, Salzberg S L. 2016. Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat Protoc, 11: 1650-1667.

Porebski S, Bailey L G, Baum B R. 1997. Modification of a CTAB DNA extraction protocol for plants containing high polysaccharide and polyphenol components. Plant Mol Biol Rep, 15: 8-15.

Servant N, Varoquaux N, Lajoie B R, Viara E, Chen C J, Vert J P, Heard E, Dekker J, Barillot E. 2015. HiC-Pro: An optimized and flexible pipeline for Hi-C data processing. Genome Biol, 16: 259.

Stanke M, Steinkamp R, Waack S, Morgenstern B. 2004. AUGUSTUS: A web server for gene finding in eukaryotes. Nucl Acids Res, 32: W309-W312.

Thiel T, Michalek W, Varshney R, Graner A. 2003. Exploiting EST databases for the development and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.). Theor Appl Genet, 106: 411-422.

Wang Y, Coleman-Derr D, Chen G, Gu Y. 2015. OrthoVenn: A web server for genome wide comparison and annotation of orthologous clusters across multiple species. Nucl Acids Res, 43: W78-W84.

Wingett S W, Ewels P, Furlan-Magaril M, Nagano T, Schoenfelder S, Fraser P, Andrews S. 2015. HiCUP: Pipeline for mapping and processing Hi-C data. F1000 Res, 4: 1310.

Xu Z, Wang H. 2007. LTR_FINDER: An efficient tool for the prediction of full-length LTR retrotransposons. Nucl Acids Res, 35: W265-W268.

Yang Z. 1997. PAML: A program package for phylogenetic analysis by maximum likelihood. Computer Appl Biosci, 13: 555-556.

Reference

[1]	Choi J Y, Lye Z N, Groen S C, Dai X G, Rughani P, Zaaijer S, Harrington E D, Juul S, Purugganan M D. 2020. Nanopore sequencing-based genome assembly and evolutionary genomics of circum-basmati rice. Genome Biol, 21(1): 21. [Cited Within:1]
[2]	Du H L, Yu Y, Ma Y F, Gao Q, Cao Y H, Chen Z, Ma B, Qi M, Li Y, Zhao X F, Wang J, Liu K F, Qin P, Yang X, Zhu L H, Li S G, Liang C Z. 2017. *Sequencing and de novo* assembly of a near complete indica rice genome*. Nat Commun*, 8(1): 15324. [Cited Within:2]
[3]	Fang C W, Li L, He R M, Wang D Q, Wang M, Hu Q, Ma Q R, Qin K Y, Feng X Y, Zhang G Q, Fu X L, Liu Z Q. 2019. Identification of S23 causing both interspecific hybrid male sterility and environment-conditioned male sterility in rice. Rice, 12(1): 10. [Cited Within:1]
[4]	Groen S C, Calic I, Joly-Lopez Z, Platts A E, Choi J Y, Natividad M, Dorph K, Mauck III W M, Bracken B, Cabral C L U, Kumar A, Torres R O, Satija R, Vergara G, Henry A, Franks S J, Purugganan M D. 2020. The strength and pattern of natural selection on gene expression in rice. Nature, 578: 572-576. [Cited Within:1]
[5]	Kawahara Y, de la Bastide M, Hamilton J P, Kanamori H, Mccombie W R, Ouyang S, Schwartz D C, Tanaka T, Wu J Z, Zhou S G, Childs K L, Davidson R M, Lin H N, Quesada-Ocampo L, Vaillancourt B, Sakai H, Lee S S, Kim J, Numa H, Itoh T, Buell C R, Matsumoto T. 2013. *Improvement of the Oryza sativa* Nipponbare reference genome using next generation sequence and optical map data*. Rice*, 6(1): 4. [Cited Within:1]
[6]	Kim S, Park J, Yeom S I, Kim Y M, Seo E, Kim K T, Kim M S, Lee J M, Cheong K, Shin H S, Kim S B, Han K, Lee J, Park M, Lee H A, Lee H Y, Lee Y, Oh S, Lee J H, Choi E, Choi E, Lee S E, Jeon J, Kim H, Choi G, Song H, Lee J, Lee S C, Kwon J K, Lee H Y, Koo N, Hong Y, Kim R W, Kang W H, Huh J H, Kang B C, Yang T J, Lee Y H, Bennetzen J L, Choi D. 2017. New reference genome sequences of hot pepper reveal the massive evolution of plant disease-resistance genes by retroduplication. Genome Biol, 18(1): 210-221. [Cited Within:1]
[7]	Kryazhimskiy S, Plotkin J B. 2008. The population genetics of dN/dS. PLoS Genet, 4(12): e1000304.
[8]	Li W, Li K, Zhang Q J, Zhu T, Zhang Y, Shi C, Liu Y L, Xia E H, Jiang J J, Shi C, Zhang L P, Huang H, Tong Y, Liu Y, Zhang D, Zhao Y, Jiang W K, Zhao Y J, Mao S Y, Jiao J Y, Xu P Z, Yang L L, Yin G Y, Gao L Z. 2020. Improved hybrid de novo genome assembly and annotation of African wild rice, Oryza longistaminata, from Illumina and PacBio sequencing reads. Plant Genome, 13(1): e20001.
[9]	Luan X, Dai Z J, Yang W F, Tan Q Y, Lu Q, Guo J, Zhu H T, Liu G F, Wang S K, Zhang G Q. 2019. Breeding by design of CMS lines on the platform of SSSL library in rice. Mol Breeding, 39(9): 126.
[10]	Luo Y C, Ma T C, Zhang A F, Ong K H, Luo Z X, Li Z F, Yang J B, Yin Z C. 2017. Marker-assisted breeding of Chinese elite rice cultivar 9311 for disease resistance to rice blast and bacterial blight and tolerance to submergence. Mol Breeding, 37(8): 106.
[11]	Rhoads A, Au K F. 2015. PacBio sequencing and its applications. Genom Proteom Bioinf, 13(5): 278-289.
[12]	Sakai H, Lee S S, Tanaka T, Numa H, Kim J, Kawahara Y, Wakimoto H, Yang C C, Iwamoto M, Abe T, Yamada Y, Muto A, Inokuchi H, Ikemura T, Matsumoto T, Sasaki T, Itoh T. 2013. Rice Annotation Project Database (RAP-DB): An integrative and interactive database for rice genomics. Plant Cell Physiol, 54(2): e6.
[13]	Shao L, Xing F, Xu C H, Zhang Q H, Che J, Wang X M, Song J M, Li X H, Xiao J H, Chen L L, Ouyang Y D, Zhang Q F. 2019. Patterns of genome-wide allele-specific expression in hybrid rice and the implications on the genetic basis of heterosis. Proc Natl Acad Sci USA, 116(12): 5653-5658.
[14]	Tanaka T, Nishijima R, Teramoto S, Kitomi Y, Hayashi T, Uga Y, Kawakatsu T. 2020. De novo genome assembly of the indica rice variety IR64 using linked-read sequencing and nanopore sequencing. G3: Genes Genom Genet, 10(5): 1495-1501.
[15]	The Rice Chromosomes 11 and 12 Sequencing Consortia. 2005. The sequence of rice chromosomes 11 and 12, rich in disease resistance genes and recent gene duplications. BMC Biol, 3(1): 20.
[16]	van Berkum N L, Lieberman-Aiden E, Williams L, Imakaev M, Gnirke A, Mirny L A, Dekker J, Lander E S. 2010. Hi-C: A method to study the three-dimensional architecture of genomes. J Visualized Exp, 39: e1869.
[17]	Wang S K, Li S, Liu Q, Wu K, Zhang J Q, Wang S S, Wang Y, Chen X B, Zhang Y, Gao C X, Wang F, Huang H X, Fu X D. 2015. The OsSPL16-GW7 regulatory module determines grain shape and simultaneously improves rice yield and grain quality. Nat Genet, 47(8): 949-954.
[18]	Wang X L, Liu G F, Wang Z Q, Chen S L, Xiao Y L, Yu C Y. 2019. Identification and application of major quantitative trait loci for panicle length in rice (Oryza sativa) through single-segment substitution lines. Plant Breeding, 138(3): 299-308.
[19]	Wang Y, Coleman-Derr D, Chen G P, Gu Y Q. 2015. OrthoVenn: A web server for genome wide comparison and annotation of orthologous clusters across multiple species. Nucl Acids Res, 43: W78-W84.
[20]	Wang Y P, Tang H B, Debarry J D, Tan X, Li J P, Wang X Y, Lee T H, Jin H Z, Marler B, Guo H, Kissinger J C, Paterson A H. 2012. MCScanX: A toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucl Acids Res, 40(7): e49.
[21]	Xie X R, Du H L, Tang H W, Tang J N, Tan X Y, Liu W Z, Li T, Lin Z S, Liang C Z, Liu Y G. 2020. A chromosome-level genome assembly of the wild rice Oryza rufipogon facilitates tracing the origins of Asian cultivated rice. Sci China: Life Sci, 5(1): 1-11.
[22]	Zhang G Q. 2019. The platform of breeding by design based on the SSSL library in rice. Hereditas, 41(8): 754-760.
[23]	Zhang J W, Chen L L, Xing F, Kudrna D A, Yao W, Copetti D, Mu T, Li W M, Song J M, Xie W B, Lee S, Talag J, Shao L, An Y, Zhang C L, Ouyang Y D, Sun S, Jiao W B, Lv F, Du B G, Luo M Z, Maldonado C E, Goicoechea J L, Xiong L Z, Wu C Y, Xing Y Z, Zhou D X, Yu S B, Zhao B, Wang G W, Yu Y, Luo Y J, Zhou Z W, Hurtado B E P, Danowitz A, Wing R A, Zhang Q F. 2016. Extensive sequence divergence between the reference genomes of two elite indica rice varieties Zhenshan 97 and Minghui 63. Proc Natl Acad Sci USA, 113: E5163-E5171.
[24]	Zhang Q, Liang Z, Cui X A, Ji C M, Li Y, Zhang P X, Liu J R, Riaz A, Yao P, Liu M, Wang Y P, Lu T G, Yu H, Yang D L, Zheng H K, Gu X F. 2018. N-6-methyladenine DNA methylation in japonica and indica rice genomes and its association with gene expression, plant development, and stress responses. Mol Plant, 11(12): 1492-1508.
[25]	Zhang T Z, Hu Y, Jiang W K, Fang L, Guan X Y, Chen J D, Zhang J B, Saski C A, Scheffler B E, Stelly D M, Hulse-Kemp A M, Wan Q, Liu B L, Liu C X, Wang S, Pan M Q, Wang Y K, Wang D W, Ye W X, Chang L J, Zhang W P, Song Q X, Kirkbride R C, Chen X Y, Dennis E, Llewellyn D J, Peterson D G, Thaxton P, Jones D C, Wang Q, Xu X Y, Zhang H, Wu H T, Zhou L, Mei G F, Chen S Q, Tian Y, Xiang D, Li X H, Ding J, Zuo Q Y, Tao L N, Liu Y C, Li J, Lin Y, Hui Y Y, Cao Z S, Cai C P, Zhu X F, Jiang Z, Zhou B L, Guo W Z, Li R Q, Chen Z J. 2015. Sequencing of allotetraploid cotton (Gossypium hirsutum L. acc. TM-1) provides a resource for fiber improvement. Nat Biotechnol, 33(5): 531-537.
[26]	Zhao H W, Sun L L, Xiong T Y, Wang Z Q, Liao Y, Zou T, Zheng M M, Zhang Z, Pan X P, He N, Zhang G Q, Zhu H T, Liu Z Q, He P, Fu X L. 2019. Genetic characterization of the chromosome single-segment substitution lines of O. glumaepatula and O. barthii and identification of QTLs for yield-related traits. Mol Breeding, 39(4): 51.
[27]	Zhou Y L, Xie Y H, Cai J L, Liu C B, Zhu H T, Jiang R, Zhong Y Y, Zhang G L, Tan B, Liu G F, Fu X L, Liu Z Q, Wang S K, Zhang G Q, Zeng R Z. 2017. Substitution mapping of QTLs controlling seed dormancy using single segment substitution lines derived from multiple cultivated rice donors in seven cropping seasons. Theor Appl Genet, 130(6): 1191-1205.
