Characteristics and phylogenetic analysis of the complete chloroplast genome of Cupressus tonkinensis (Cupressaceae)
Abstract
Cupressus tonkinensis Silba (1994), locally known as “Hoàng Đàn Hữu Liên,” is a critically endangered conifer endemic to the limestone karst of Huu Lien Nature Reserve in Lang Son Province, northern Vietnam. Despite its high ecological, cultural, and economic value, this species faces severe threats from overexploitation and habitat degradation, and its taxonomic identity remains controversial. To address these issues, we report the complete chloroplast (cp) genome of C. tonkinensis, assembled from high-throughput sequencing data and annotated against closely related Cupressus plastomes. The cp genome is 128,000 bp in length and encodes 119 genes, consisting of 82 protein-coding genes, 33 tRNA genes, and 4 rRNA genes. It comprises a large single-copy region (91,607 bp), a small single-copy region (35,945 bp), and a pair of inverted repeat regions (224 bp each). Comparative analysis revealed high structural conservation among Cupressus plastomes, while phylogenetic reconstruction based on 82 concatenated protein-coding genes resolved C. tonkinensis as one of the earliest diverging lineages within the genus. This well-supported phylogenetic placement helps clarify long-standing taxonomic ambiguities and reinforces the recognition of a distinct Tonkinensis clade. This study provides the first complete cp genome of C. tonkinensis from its type locality in northern Vietnam. These genomic resources serve as a critical foundation for further evolutionary studies, species identification, and conservation planning, supporting urgent efforts to safeguard this unique and irreplaceable conifer lineage.
INTRODUCTION
Cupressus tonkinensis Silba (1994), locally known in Vietnam as “Hoàng Đàn Hữu Liên”, is a rare and critically endangered conifer species of the Cupressaceae family. Its distribution is extremely narrow, being confined to the Huu Lien Nature Reserve, Lang Son Province, northern Vietnam (Little et al., 2011; Van The et al., 2013; Plants of The World, 2025). The species has long been valued for its durable, fragrant wood and essential oils, leading to overexploitation (Little et al., 2011; Van The et al., 2013). Combined with habitat degradation, this has caused a drastic population decline, placing C. tonkinensis among the most threatened conifers in Vietnam and making conservation a pressing priority (Ministry of Science and Technology, 2007).
Despite its ecological and economic importance, the species’ taxonomic identity remains controversial. Various scientific names have been applied in different studies, reflecting inconsistencies in morphological classification and highlighting the need for molecular data to resolve its systematic position (Rushforth, 2007). This taxonomic uncertainty is not only of academic concern but also has practical implications for conservation because precise species delimitation is fundamental for the design of effective management strategies.
With the advent of high-throughput sequencing, chloroplast (cp) genomes have become an indispensable resource in plant systematics and conservation genetics (Nguyen et al., 2023, 2024). Their relatively conserved structure, uniparental inheritance, and abundant phylogenetically informative markers make them particularly valuable for resolving taxonomic disputes, reconstructing evolutionary relationships, and developing species-specific molecular tools (Dobrogojski et al., 2020; Zhang et al., 2023).
In this context, assembling the complete cp genome of C. tonkinensis is both timely and essential. This study provides a reliable genetic reference for clarifying its taxonomic status, sheds light on its evolutionary history within the Cupressaceae family, and establishes a foundation for population genetic studies. More importantly, these genomic resources can directly inform conservation planning and sustainable management, supporting the long-term survival of this unique endemic species.
MATERIALS AND METHODS
Sample collection and genomic DNA extraction
Fresh leaves of C. tonkinensis were collected from the Huu Lien Nature Reserve in Lang Son Province, Vietnam, and immediately preserved in silica gel. A voucher specimen was deposited in the herbarium of the Institute of Forest Tree Improvement and Biotechnology - Vietnamese Academy of Forest Sciences under the accession number HDHL_02_2024. Total genomic DNA was extracted from silica-dried leaf tissue using the DNeasy Plant Mini Kit (Cat. No. 69104, Qiagen, Hilden, Germany) according to the manufacturer’s protocol. Whole-genome sequencing was performed using the MGI DNBSeq-G99 platform (KTest Science Co., Ltd, Ho Chi Minh City, Vietnam).
Sequencing, assembly, and annotation of the cp genome
Raw sequencing reads were quality-filtered using fastp version 0.24.2 with a Phred quality score threshold of 15 (the fastp default) and a minimum read length of 100 bp (Chen, 2023). High-quality reads were then used to assemble the complete cp genome of C. tonkinensis using GetOrganelle version 1.7.7.1 (Jin et al., 2020). Genome annotation was conducted using the GeSeq online platform (https://chlorobox.mpimp-golm.mpg.de/geseq.html) (Tillich et al., 2017). The resulting annotation was manually inspected and refined in Geneious Prime version 2025.0.2 using the reference cp genomes of C. tonkinensis (GenBank: NC_039562), Cupressus gigantea (GenBank: NC_028155), and Cupressus duclouxiana (GenBank: NC_065031). The complete annotated C. tonkinensis cp genome was deposited in GenBank under accession number PV786099. A circular genome map was generated using the OGDraw tool (https://chlorobox.mpimp-golm.mpg.de/OGDraw.html) (Greiner et al., 2019).
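The read-level quality criteria used in the filtering step can be illustrated with a simplified, hypothetical re-implementation. This is a sketch for clarity, not fastp itself: it assumes Phred+33 quality encoding, treats a base as qualified when its score is at least 15, and, following fastp's default unqualified-base limit of 40%, rejects reads with more than 40% unqualified bases or a length below 100 bp. Adapter trimming, the N-base limit, and other fastp features are not modeled, and the names `FastqRecord`, `passes_filter`, and `filter_reads` are illustrative only.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class FastqRecord:
    name: str
    seq: str
    qual: str


def phred_scores(qual: str, offset: int = 33) -> list[int]:
    # Decode an ASCII quality string (assumed Phred+33) into integer scores.
    return [ord(c) - offset for c in qual]


def passes_filter(rec: FastqRecord, min_q: int = 15,
                  max_unqualified_pct: float = 40.0, min_len: int = 100) -> bool:
    # Reject reads shorter than the minimum length (100 bp in this study).
    if len(rec.seq) < min_len:
        return False
    scores = phred_scores(rec.qual)
    if not scores:
        return False
    # A base is "unqualified" when its Phred score is below min_q.
    unqualified = sum(1 for q in scores if q < min_q)
    return 100.0 * unqualified / len(scores) <= max_unqualified_pct


def filter_reads(records: Iterable[FastqRecord], **kwargs) -> Iterator[FastqRecord]:
    # Lazily yield only the reads that pass the simplified quality criteria.
    for rec in records:
        if passes_filter(rec, **kwargs):
            yield rec
```

For example, a 120 bp read whose qualities are all `I` (Q40) is kept, whereas a 50 bp read or a 120 bp read with half its bases at `#` (Q2) is discarded.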
Phylogenetic analysis
To clarify the phylogenetic position of C. tonkinensis relative to related species, 33 complete cp genome sequences from Cupressaceae and Taxaceae were retrieved from GenBank (Table 1). Amentotaxus yunnanensis (Taxaceae) was selected as the outgroup. Protein-coding sequences were extracted from all genomes and aligned using MAFFT version 7 (Katoh and Standley, 2013). Poorly aligned regions and gaps were removed using trimAl version 1.5.0 in “automated1” mode (Capella-Gutiérrez et al., 2009). The resulting alignment was analyzed in IQ-TREE for phylogenetic reconstruction. ModelFinder version 1.5.4 selected GTR + GAMMA as the best-fit substitution model (Kalyaanamoorthy et al., 2017). The phylogeny was inferred using the maximum likelihood (ML) method, with 1,000 ultrafast bootstrap replicates to assess nodal support. The final tree was visualized using the iTOL web server (https://itol.embl.de/upload.cgi) (Letunic and Bork, 2021).
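Concatenating per-gene alignments into a single alignment (a supermatrix) can be sketched as follows. The study does not describe how concatenation was performed, so this is an assumed, minimal approach: each gene alignment is a Python dictionary mapping taxon names to aligned sequences, genes are joined in alphabetical order, and any taxon lacking a gene is padded with gap characters. The function names are hypothetical.

```python
def concatenate_alignments(gene_alignments: dict[str, dict[str, str]]):
    """Join per-gene alignments into one supermatrix (illustrative sketch).

    Returns (matrix, partitions): matrix maps taxon -> concatenated sequence;
    partitions lists (gene, start, end) with 1-based inclusive coordinates.
    """
    taxa = sorted({taxon for aln in gene_alignments.values() for taxon in aln})
    pieces: dict[str, list[str]] = {taxon: [] for taxon in taxa}
    partitions = []
    pos = 1
    for gene in sorted(gene_alignments):
        aln = gene_alignments[gene]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned (unequal lengths)")
        length = lengths.pop()
        for taxon in taxa:
            # Assumption: a taxon missing this gene is filled with gaps.
            pieces[taxon].append(aln.get(taxon, "-" * length))
        partitions.append((gene, pos, pos + length - 1))
        pos += length
    matrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return matrix, partitions


def to_fasta(matrix: dict[str, str]) -> str:
    # Serialize the supermatrix as FASTA text for downstream tree inference.
    return "".join(f">{taxon}\n{seq}\n" for taxon, seq in matrix.items())
```

The recorded coordinates could define per-gene partitions in a downstream analysis, although the study reports a single GTR + GAMMA model for the concatenated alignment.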
RESULTS
Chloroplast genome characteristics of Cupressus tonkinensis
The complete cp genome of C. tonkinensis is a circular, double-stranded DNA molecule of 128,000 bp (Fig. 1), with an overall GC content of 34.7%. A total of 119 genes were annotated, including 82 protein-coding genes (PCGs), 33 transfer RNA (tRNA) genes, and 4 ribosomal RNA (rRNA) genes (Table 2). Of these, 14 genes (atpF, ndhA, ndhB, petB, petD, rpl16, rpl2, rpoC1, trnA-UGC, trnG-UCC, trnI-GAU, trnK-UUU, trnL-UAA, and trnV-UAC) contain a single intron, and two genes (pafI and rps12) each contain two introns (Tables 2, 3). The rps12 gene is a well-characterized trans-spliced gene.
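The basic statistics reported here can be cross-checked with a few lines of Python. The region lengths given in the Abstract (LSC 91,607 bp, SSC 35,945 bp, and two IR copies of 224 bp) sum to 91,607 + 35,945 + 2 × 224 = 128,000 bp, matching the total genome length, and 82 + 33 + 4 = 119 genes. The GC function below is a sketch that counts only unambiguous A/C/G/T bases in the denominator; this convention is an assumption, and other tools may treat ambiguous bases differently. The file name PV786099.fasta in the usage comment is hypothetical.

```python
def gc_content(seq: str) -> float:
    """Percent G+C, counting only unambiguous A/C/G/T bases (assumed convention)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def total_length(lsc: int, ssc: int, ir: int) -> int:
    """Genome length implied by one LSC, one SSC, and two copies of the IR."""
    return lsc + ssc + 2 * ir


def read_fasta_sequence(path: str) -> str:
    """Concatenate all sequence lines of a single-record FASTA file."""
    with open(path) as handle:
        return "".join(line.strip() for line in handle if not line.startswith(">"))


if __name__ == "__main__":
    # Region lengths and gene counts as reported for PV786099.
    assert total_length(91_607, 35_945, 224) == 128_000
    assert 82 + 33 + 4 == 119
    # Hypothetical usage, assuming the genome was saved to this local file:
    # print(round(gc_content(read_fasta_sequence("PV786099.fasta")), 1))
```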
Map of the chloroplast genome of Cupressus tonkinensis. Genes on the inner side of the circle are transcribed clockwise, whereas those on the outer side are transcribed counterclockwise. Genes are categorized by function and color-coded accordingly. The dark gray shading of the inner ring indicates regions with high GC content, while light gray areas represent AT-rich regions. SSC, small single-copy; LSC, large single-copy.
Phylogenetic analysis
In this study, we used PCGs from the cp genomes of 32 Cupressaceae species to resolve the phylogenetic position of C. tonkinensis (Fig. 2). The ML tree was well supported overall, with 32 of 33 nodes (about 97%) showing bootstrap values >90%. The tree resolved all sampled Cupressaceae taxa into four major, well-supported monophyletic clades (bootstrap value = 100), corresponding to four of the currently recognized subfamilies: Sequoioideae, Taxodioideae, Callitroideae, and Cupressoideae.
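The proportion of well-supported nodes can be computed directly from a Newick tree in which support values are stored as internal node labels, as IQ-TREE typically does for ultrafast bootstrap values. The sketch below assumes that labeling scheme and uses a toy five-taxon tree rather than the study's tree; the function names are illustrative. For the counts reported here, 32 of 33 nodes corresponds to 32/33 ≈ 0.97.

```python
import re


def internal_support_values(newick: str) -> list[float]:
    """Collect numeric node labels that directly follow a closing parenthesis.

    Assumes support is written as an internal node label, e.g. "(A:0.1,B:0.1)95:0.02".
    Combined labels such as "80.5/95" contribute only their first number.
    """
    return [float(v) for v in re.findall(r"\)(\d+(?:\.\d+)?)", newick)]


def fraction_supported(values: list[float], threshold: float = 90.0) -> float:
    """Fraction of labeled nodes whose support strictly exceeds the threshold."""
    if not values:
        raise ValueError("no support values found")
    return sum(1 for v in values if v > threshold) / len(values)


toy_tree = "((A:0.1,B:0.1)100:0.05,(C:0.1,(D:0.1,E:0.1)85:0.02)99:0.03);"
support = internal_support_values(toy_tree)
print(support)                      # [100.0, 85.0, 99.0]
print(fraction_supported(support))  # 2 of 3 nodes exceed 90
```

Branch lengths are ignored because the pattern only matches numbers that immediately follow `)`, whereas branch lengths are always preceded by `:`.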
Maximum likelihood phylogeny of Cupressaceae based on concatenated chloroplast protein-coding sequences. The tree was reconstructed with 1,000 ultrafast bootstrap replicates. Numbers next to the branches indicate ultrafast bootstrap support values; values of 100 are omitted for clarity. The Cupressus tonkinensis sequence newly assembled in this study is highlighted in bold.
All sampled Cupressus species within the Cupressoideae subfamily formed a distinct, strongly supported clade (bootstrap value = 99). Our C. tonkinensis accession (GenBank: PV786099, from Vietnam) clustered with the previously reported C. tonkinensis cp genome (GenBank: NC_039562), reflecting high sequence similarity and low divergence between the two accessions. Furthermore, C. tonkinensis was recovered as one of the earliest diverging lineages within the genus Cupressus based on the currently available cp genomic data.
DISCUSSION
Cupressus tonkinensis is a critically endangered conifer endemic to northern Vietnam (Little et al., 2011; Van The et al., 2013; Plants of The World, 2025), with an extremely limited distribution confined to the limestone karst landscapes of Huu Lung District, Lang Son Province. The species has undergone a sharp population decline due to overexploitation and extensive habitat degradation (Little et al., 2011; Van The et al., 2013). By assembling its complete cp genome, this study expands the genomic resources available for C. tonkinensis and supports ongoing efforts to conserve the species at the molecular level. Importantly, the previously available C. tonkinensis cp genome (GenBank: NC_039562) was derived from material collected outside the species’ native range in Vietnam (Zhu et al., 2018). To better represent Vietnamese populations, we generated a Vietnam-specific reference plastome from native individuals. This reference is intended to capture provenance-specific (regional) genomic features, thereby improving downstream applications in conservation genomics, population structure and phylogeography analyses, and DNA barcoding of C. tonkinensis in Vietnam. We assembled the cp genome directly from whole-genome sequencing reads without prior cp enrichment, a strategy that has proven effective in recent plastid genome studies (Nguyen et al., 2025).
The newly sequenced cp genome is 128,000 bp in length, within the range observed among the Cupressus plastomes compared here, from 127,835 bp (C. tonkinensis, NC_039562) to 129,959 bp (C. duclouxiana) (Table 4). The overall GC content was remarkably consistent across species, ranging only from 34.6% to 34.7% (Zhang et al., 2017; Li et al., 2019; Chen et al., 2022). The cp genome of C. tonkinensis comprises 119 genes, including 82 PCGs, 33 tRNA genes, and 4 rRNA genes. Its gene content is consistent with that of other Cupressus species, such as C. sempervirens, C. gigantea, C. chengiana, C. torulosa, and C. duclouxiana, all of which have identical numbers of PCGs and RNA genes (Zhang et al., 2017; Li et al., 2019; Chen et al., 2022). However, a minor discrepancy was noted relative to the previously published C. tonkinensis plastome (GenBank: NC_039562), which encodes 118 genes in total, including 32 tRNA genes, and lacks the trnQ-UUG gene. Such annotation-driven differences in reported plastome gene counts have been noted previously (Kahraman and Lucas, 2019) and do not affect conclusions regarding the core plastid gene set or downstream analyses. Additionally, the cp genome of C. tonkinensis (GenBank: PV786099) shares 96.3–97.5% identity with other Cupressus species, corresponding to approximately 3,218–4,827 nucleotide differences; similarity is highest to C. chengiana (97.5%; 3,218 differences) and lowest to C. duclouxiana (96.3%; 4,827 differences) (Table 5). As expected, it is nearly identical to the previously published C. tonkinensis plastome (GenBank: NC_039562), with 99.7% identity and only 362 differences. Together, these findings show that the C. tonkinensis cp genome conforms to the highly conserved architecture and gene content of Cupressus plastomes, is highly similar to those of congeneric species, and is nearly identical to the previous C. tonkinensis assembly.
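Percent identity and nucleotide-difference counts of this kind can be derived from a pairwise alignment. The function below is a minimal sketch; the exact convention behind the values in Table 5 is not stated, so the treatment of gap columns (counted as differences by default here, with columns that are gaps in both sequences skipped) is an assumption. For orientation, if roughly 128,000 aligned columns were compared (an assumed alignment length), 362 differences would correspond to 100 × (1 - 362/128,000) ≈ 99.7% identity.

```python
def pairwise_identity(aln_a: str, aln_b: str, count_gaps: bool = True):
    """Return (differences, compared_columns, percent_identity) for two aligned sequences."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    diffs = compared = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" and b == "-":
            continue  # column is empty in both sequences
        if not count_gaps and (a == "-" or b == "-"):
            continue  # optionally ignore indel columns entirely
        compared += 1
        if a != b:
            diffs += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return diffs, compared, 100.0 * (compared - diffs) / compared
```

Whether indels are scored as differences can shift identity values noticeably between highly divergent plastomes, so the convention should be reported alongside the numbers.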
In most plant lineages, the cp genome has emerged as a robust tool for species identification and phylogenetic reconstruction, often regarded as a “super-barcoding” marker due to its high resolution and uniparental inheritance (Thi Huynh et al., 2024; Nguyen et al., 2025). However, earlier phylogenetic studies of C. tonkinensis primarily used cp fragments such as matK and rbcL or nuclear ribosomal ITS regions (Terry et al., 2018; Pham et al., 2021). Although useful, these markers provide limited resolution, which may hinder the accurate inference of phylogenetic relationships. To address this limitation, the ML phylogeny was reconstructed using 82 unique PCGs from cp genomes. The resulting tree topology was congruent with previous plastome-based phylogenetic frameworks and strongly supported the placement of C. tonkinensis within a well-resolved lineage (Zhu et al., 2018; Ping et al., 2021).
In terms of taxonomic placement, Terry et al. (2018) performed a phylogenetic analysis based on 5.4 kb of concatenated cp and nuclear loci, identified three major lineages within the genus Cupressus (the Duclouxiana, Tonkinensis, and Sempervirens clades), and placed C. tonkinensis within the Tonkinensis clade. In our analysis of the available Cupressus cp genomes, C. tonkinensis, representing the Tonkinensis clade, was consistently recovered as a distinct, well-supported lineage clearly separated from the Duclouxiana and Sempervirens clades. These results support the placement of C. tonkinensis in the Tonkinensis clade, as originally delineated by Terry et al. (2018).
In this study, we successfully sequenced and characterized the complete cp genome of C. tonkinensis. Its size was consistent with previously reported Cupressus cp genome lengths of approximately 128 kb. Comparative analysis of seven Cupressus plastomes revealed conserved gene content, with each genome comprising 118–119 genes, including 82 PCGs, 32 or 33 tRNA genes, and 4 rRNA genes. Phylogenetic reconstruction based on concatenated PCGs supported the monophyly of the genus Cupressus, with all species forming a single well-supported clade, and C. tonkinensis was recovered as one of the earliest diverging lineages within the genus. Overall, our study provides valuable genomic resources for further evolutionary, phylogenetic, and conservation studies of the genus Cupressus.
Notes
ACKNOWLEDGMENTS
This work was conducted within the framework of the project “Research on genetic resource conservation and development of Cupressus tonkinensis Silba. J. in the northern part of Vietnam (Pro. No. NVQG-2020/ĐT.20)”, funded by the Ministry of Science and Technology (MOST). The authors thank MOST for supporting this project.
CONFLICTS OF INTEREST
The authors declare that there are no conflicts of interest.
