Characteristics and phylogenetic analysis of the complete chloroplast genome of Cupressus tonkinensis (Cupressaceae)

Article information

Korean J. Pl. Taxon. 2025;55(4):275-282
Publication date (electronic) : 2025 December 31
doi : https://doi.org/10.11110/kjpt.2025.55.4.275
1Institute of Forest Tree Improvement and Biotechnology, Vietnamese Academy of Forest Sciences, Hanoi, 10000, Vietnam
2School of Biological Sciences, University of Tasmania, Hobart, 7005, Australia
3Forest Science Centre of North-Eastern Vietnam, Vietnamese Academy of Forest Sciences, Hanoi, 10000, Vietnam
4Thai Nguyen University of Agriculture and Forestry, Thai Nguyen, 24000, Vietnam
5Tay Bac University, Son La, 34000, Vietnam
6Department of Microbiology - Parasitology, School of Pharmacy, University of Medicine and Pharmacy at Ho Chi Minh City, Ho Chi Minh City, 70000, Vietnam
Corresponding author: Son LE, E-mail: leson@vafs.gov.vn, son.le@utas.edu.au. Minh Trong QUANG, E-mail: qtminh@ump.edu.vn
Editor: Sang-Tae KIM
Received 2025 August 27; Revised 2025 October 24; Accepted 2025 November 22.

Abstract

Cupressus tonkinensis Silba (1994), locally known as “Hoàng Đàn Hữu Liên,” is a critically endangered conifer endemic to the limestone karst of Huu Lien Nature Reserve in Lang Son Province of northern Vietnam. Despite its high ecological, cultural, and economic value, this species faces severe threats from overexploitation and habitat degradation, and its taxonomic identity remains controversial. To address these issues, we report the complete chloroplast (cp) genome of C. tonkinensis, which was assembled from high-throughput sequencing data and annotated against closely related Cupressus plastomes. The cp genome is 128,000 bp in length and encodes 119 genes, consisting of 82 protein-coding genes, 33 tRNA genes, and 4 rRNA genes. The genome comprises a large single-copy region (91,607 bp), a small single-copy region (35,945 bp), and a pair of inverted repeat regions (224 bp each). A comparative analysis revealed high structural conservation among Cupressus plastomes, while phylogenetic reconstruction based on 82 concatenated protein-coding genes resolved C. tonkinensis as one of the earliest diverging lineages within the genus. This robust phylogenetic placement helps clarify long-standing taxonomic ambiguities and reinforces the recognition of a distinct Tonkinensis clade. This study provides the first complete cp genome of C. tonkinensis from its type locality in northern Vietnam. These genomic resources serve as a critical foundation for further evolutionary studies, species identification, and conservation planning, supporting urgent efforts to safeguard this unique and irreplaceable conifer lineage.

INTRODUCTION

Cupressus tonkinensis Silba (1994), locally known in Vietnam as “Hoàng Đàn Hữu Liên”, is a rare and critically endangered conifer species of the Cupressaceae family. Its distribution is extremely narrow, being confined to the Huu Lien Nature Reserve, Lang Son Province, northern Vietnam (Little et al., 2011; Van The et al., 2013; Plants of The World, 2025). The species has long been valued for its durable, fragrant wood and essential oils, leading to overexploitation (Little et al., 2011; Van The et al., 2013). Combined with habitat degradation, this has caused a drastic population decline, placing C. tonkinensis among the most threatened conifers in Vietnam and making conservation a pressing priority (Ministry of Science and Technology, 2007).

Despite its ecological and economic importance, the species’ taxonomic identity remains controversial. Various scientific names have been applied in different studies, reflecting inconsistencies in morphological classification and highlighting the need for molecular data to resolve its systematic position (Rushforth, 2007). This taxonomic uncertainty is not only of academic concern but also has practical implications for conservation because precise species delimitation is fundamental for the design of effective management strategies.

With the advent of high-throughput sequencing, chloroplast (cp) genomes have become an indispensable resource in plant systematics and conservation genetics (Nguyen et al., 2023, 2024). Their relatively conserved structure, uniparental inheritance, and abundant phylogenetically informative markers make them particularly valuable for resolving taxonomic disputes, reconstructing evolutionary relationships, and developing species-specific molecular tools (Dobrogojski et al., 2020; Zhang et al., 2023).

In this context, assembling the complete cp genome of C. tonkinensis is both timely and essential. This study provides a reliable genetic reference to clarify its taxonomic status, sheds light on its evolutionary history within the Cupressaceae family, and establishes a foundation for population genetic studies. More importantly, these genomic resources directly contribute to conservation planning and sustainable management, ensuring the long-term survival of this unique endemic species.

MATERIALS AND METHODS

Sample collection and genomic DNA extraction

Fresh leaves of C. tonkinensis were collected from the Huu Lien Nature Reserve in Lang Son Province, Vietnam, and immediately preserved in silica gel. A voucher specimen was deposited in the herbarium of the Institute of Forest Tree Improvement and Biotechnology - Vietnamese Academy of Forest Sciences under the accession number HDHL_02_2024. Total genomic DNA was extracted from silica-dried leaf tissue using the DNeasy Plant Mini Kit (Cat. No. 69104, Qiagen, Hilden, Germany) according to the manufacturer’s protocol. Whole-genome sequencing was performed using the MGI DNBSeq-G99 platform (KTest Science Co., Ltd, Ho Chi Minh City, Vietnam).

Sequencing, assembly, and annotation of the cp genome

Raw sequencing reads were quality-filtered using the fastp tool version 0.24.2 with the following parameters: a Phred quality score threshold of 15 and a minimum read length of 100 bp (Chen, 2023). High-quality reads were then used to assemble the complete cp genome of C. tonkinensis using GetOrganelle version 1.7.7.1 (Jin et al., 2020). Genome annotation was conducted using the GeSeq online platform (https://chlorobox.mpimp-golm.mpg.de/geseq.html) (Tillich et al., 2017). The resulting annotation was manually inspected and refined in Geneious Prime version 2025.0.2, using reference cp genomes of C. tonkinensis (GenBank: NC_039562), Cupressus gigantea (GenBank: NC_028155), and Cupressus duclouxiana (GenBank: NC_065031). The complete annotated C. tonkinensis cp genome was submitted to GenBank under accession number PV786099. A circular genome map was generated using the OGDraw tool (https://chlorobox.mpimp-golm.mpg.de/OGDraw.html) (Greiner et al., 2019).
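The read-filtering and assembly steps described above can be sketched as command lines assembled in Python. This is an illustrative sketch only, not the exact invocations used in the study: the file names, the paired-end layout, and the GetOrganelle seed database (`embplant_pt`) are assumptions; only the quality threshold (15) and the minimum read length (100 bp) are taken from the text.

```python
import shlex


def fastp_command(r1, r2, out1, out2, min_quality=15, min_length=100):
    """Build a fastp call for paired-end reads (file names are hypothetical)."""
    return [
        "fastp",
        "-i", r1, "-I", r2,        # raw read pairs
        "-o", out1, "-O", out2,    # filtered read pairs
        "-q", str(min_quality),    # Phred threshold for a qualified base
        "-l", str(min_length),     # discard reads shorter than this length
    ]


def getorganelle_command(r1, r2, out_dir):
    """Build a GetOrganelle call; the embplant_pt database is an assumption."""
    return [
        "get_organelle_from_reads.py",
        "-1", r1, "-2", r2,        # filtered read pairs
        "-o", out_dir,             # output directory
        "-F", "embplant_pt",       # plant plastome mode (assumed setting)
    ]


if __name__ == "__main__":
    # Hypothetical file names, printed rather than executed.
    filtering = fastp_command("raw_R1.fq.gz", "raw_R2.fq.gz",
                              "clean_R1.fq.gz", "clean_R2.fq.gz")
    assembly = getorganelle_command("clean_R1.fq.gz", "clean_R2.fq.gz",
                                    "ctonkinensis_plastome")
    for command in (filtering, assembly):
        print(shlex.join(command))
```

Printing the commands instead of running them keeps the sketch usable on a machine where neither tool is installed; passing each list to `subprocess.run` would execute it.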

Phylogenetic analysis

To clarify the phylogenetic position of C. tonkinensis relative to related species, 33 complete cp genome sequences from Cupressaceae and Taxaceae were retrieved from GenBank (Table 1). Amentotaxus yunnanensis (Taxaceae) was selected as the outgroup. Protein-coding sequences from all genomes were extracted and aligned using MAFFT version 7 (Katoh and Standley, 2013). Poorly aligned regions and gaps were removed using trimAl version 1.5.0 in “automated1” mode (Capella-Gutiérrez et al., 2009). The resulting alignment was used as input for IQ-TREE for phylogenetic tree reconstruction. ModelFinder version 1.5.4 was used to determine the best-fit substitution model, which selected GTR + GAMMA (Kalyaanamoorthy et al., 2017). Phylogenetic reconstruction was performed using the maximum likelihood (ML) method with 1,000 ultrafast bootstrap replicates to assess nodal support. The final phylogenetic tree was visualized using the iTOL web server (https://itol.embl.de/upload.cgi) (Letunic and Bork, 2021).
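Because the tree is built from concatenated protein-coding genes, the per-gene alignments have to be joined into a single supermatrix before tree inference. The following is a minimal sketch of that step under stated assumptions: the dictionary layout, the gap-filling of genes missing from a taxon, and the toy sequences are all hypothetical and are not taken from the study.

```python
def concatenate_alignments(gene_alignments, taxa, missing="-"):
    """Join per-gene alignments into one supermatrix row per taxon.

    gene_alignments maps gene name -> {taxon name -> aligned sequence}.
    Genes are concatenated in alphabetical order; a taxon lacking a gene
    receives a run of gap characters of the same width.
    """
    parts = {taxon: [] for taxon in taxa}
    for gene in sorted(gene_alignments):
        alignment = gene_alignments[gene]
        lengths = {len(sequence) for sequence in alignment.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to one length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(alignment.get(taxon, missing * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


if __name__ == "__main__":
    # Toy data: two genes, two taxa, one gene missing from taxon_B.
    toy = {
        "rbcL": {"taxon_A": "ATG", "taxon_B": "ATA"},
        "matK": {"taxon_A": "CC--"},
    }
    for taxon, row in concatenate_alignments(toy, ["taxon_A", "taxon_B"]).items():
        print(taxon, row)
```

Sorting the gene names fixes the column order, so the same input always yields the same supermatrix.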

The species used in phylogenetic analysis.

RESULTS

Chloroplast genome characteristics of Cupressus tonkinensis

The complete cp genome of C. tonkinensis is a circular, double-stranded DNA molecule of 128,000 bp (Fig. 1). The overall GC content of the genome is 34.7%. A total of 119 genes were annotated, including 82 protein-coding genes (PCGs), 33 transfer RNA (tRNA) genes, and 4 ribosomal RNA (rRNA) genes (Table 2). Among these, 14 genes (atpF, ndhA, ndhB, petB, petD, rpl16, rpl2, rpoC1, trnA-UGC, trnG-UCC, trnI-GAU, trnK-UUU, trnL-UAA, and trnV-UAC) harbor a single intron. Notably, two genes (pafI and rps12) each contain two introns (Tables 2, 3). The rps12 gene is a well-characterized trans-spliced gene.
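The reported figures can be cross-checked with simple arithmetic: the single-copy regions plus two copies of the inverted repeat should sum to the genome length, and the three gene classes should sum to the total gene count. The sketch below uses the region sizes given in the abstract; the `gc_content` helper and its toy input are illustrative assumptions, not part of the study's workflow.

```python
GENOME_LENGTH = 128_000
REGIONS = {"LSC": 91_607, "SSC": 35_945, "IR": 224}  # the IR occurs twice
GENES = {"protein-coding": 82, "tRNA": 33, "rRNA": 4}


def partition_total(regions):
    """Sum the LSC, the SSC, and both copies of the inverted repeat."""
    return regions["LSC"] + regions["SSC"] + 2 * regions["IR"]


def gc_content(sequence):
    """Percentage of G and C bases in a nucleotide sequence."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in sequence) / len(sequence)


if __name__ == "__main__":
    print(partition_total(REGIONS) == GENOME_LENGTH)  # True
    print(sum(GENES.values()))                        # 119
    print(gc_content("ATGC"))                         # 50.0 (toy sequence)
```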

Fig. 1

Map of the chloroplast genome of Cupressus tonkinensis. Genes on the inner side of the circle are transcribed clockwise, whereas those on the outer side are transcribed counterclockwise. Genes are categorized by function and color-coded accordingly. The dark gray shading of the inner ring indicates regions with high GC content, while light gray areas represent AT-rich regions. SSC, small single-copy; LSC, large single-copy.

The gene composition of the Cupressus tonkinensis chloroplast genome.

Genes harboring introns in the chloroplast genome of Cupressus tonkinensis.

Phylogenetic analysis

In this study, we used PCGs from the cp genomes of 32 species within the Cupressaceae family to resolve the phylogenetic position of C. tonkinensis (Fig. 2). The ML phylogenetic tree demonstrated high overall support, with approximately 97% of the nodes (32 out of 33) showing bootstrap values of >90%. The resulting tree resolved all sampled Cupressaceae taxa into four major, well-supported monophyletic clades (bootstrap value = 100), corresponding to the currently recognized subfamilies Sequoioideae, Taxodioideae, Callitroideae, and Cupressoideae.
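The share of well-supported nodes can be tallied directly from a tree file. The sketch below assumes that support values are stored as node labels immediately after closing parentheses in a Newick string, which is a common layout for ML tree output; the toy tree is hypothetical and is not the tree in Fig. 2.

```python
import re


def support_values(newick):
    """Return numeric node labels that directly follow a closing parenthesis."""
    return [float(value) for value in re.findall(r"\)(\d+(?:\.\d+)?)", newick)]


def share_above(values, threshold=90.0):
    """Percentage of support values strictly greater than the threshold."""
    if not values:
        raise ValueError("no support values found")
    return 100.0 * sum(value > threshold for value in values) / len(values)


if __name__ == "__main__":
    toy_tree = "((A:0.1,B:0.2)100:0.05,(C:0.1,D:0.1)85:0.02,E:0.3);"
    print(share_above(support_values(toy_tree)))  # 50.0 for this toy tree
    print(round(100 * 32 / 33, 1))                # 97.0, i.e. 32 of 33 nodes
```

The regular expression captures only the label after each `)`, so branch lengths after the colon are ignored.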

Fig. 2

The phylogenetic tree of the Cupressaceae family based on concatenated chloroplast protein-coding sequences. The tree was reconstructed using the maximum likelihood method with 1,000 ultrafast bootstrap replicates. Numbers next to the branches indicate bootstrap support values. For clarity, values of 100 are not shown. The Cupressus tonkinensis sequence newly assembled in this study is highlighted in bold.

All sampled Cupressus species within the Cupressoideae subfamily formed a distinct, strongly supported clade (bootstrap value = 99). Our C. tonkinensis accession (GenBank: PV786099, from Vietnam) clustered closely with the previously reported C. tonkinensis cp genome (GenBank: NC_039562), reflecting high sequence similarity and low inter-accession divergence. Furthermore, C. tonkinensis was one of the earliest diverging lineages within the Cupressus genus based on the currently available cp genomic data.

DISCUSSION

Cupressus tonkinensis is a critically endangered conifer species endemic to northern Vietnam (Little et al., 2011; Van The et al., 2013; Plants of The World, 2025), with an extremely limited distribution confined to the limestone karst landscapes of Huu Lung District, Lang Son Province. This species has undergone a sharp population decline due to overexploitation and extensive habitat degradation (Little et al., 2011; Van The et al., 2013). This study contributes to the genomic resources available for C. tonkinensis by assembling its complete cp genome as part of ongoing efforts toward the species’ long-term conservation at the molecular level. Importantly, the previously available C. tonkinensis cp genome (GenBank: NC_039562) was derived from material sourced outside the species’ native range (Zhu et al., 2018). To better represent Vietnamese populations, we generated a Vietnam-specific reference plastome from native individuals. This reference is intended to capture provenance-specific (regional) genomic features, thereby improving downstream applications in conservation genomics, population structure and phylogeography, and DNA barcoding for C. tonkinensis in Vietnam. We assembled the cp genome from whole-genome sequencing reads without prior cp enrichment, a strategy that has proven effective in recent plastid genome studies (Nguyen et al., 2025).

The newly sequenced cp genome is 128,000 bp in length, which falls within the genome size range observed in Cupressus, from 127,835 bp (C. tonkinensis, NC_039562) to 129,959 bp (C. duclouxiana) (Table 4). The overall GC content was remarkably consistent across species, varying only slightly between 34.6% and 34.7% (Zhang et al., 2017; Li et al., 2019; Chen et al., 2022). The cp genome of C. tonkinensis comprises 119 genes, including 82 PCGs, 33 tRNA genes, and 4 rRNA genes. Its gene content is consistent with other Cupressus species, such as C. sempervirens, C. gigantea, C. chengiana, C. torulosa, and C. duclouxiana, all of which exhibit identical gene counts across PCGs and RNA genes (Zhang et al., 2017; Li et al., 2019; Chen et al., 2022). However, a minor discrepancy was noted when compared with the previously published plastome of C. tonkinensis (GenBank: NC_039562), which encodes 118 genes in total, including 32 tRNA genes, and lacks a trnQ-UUG gene. Such annotation-driven differences in reported plastome gene counts have been noted previously (Kahraman and Lucas, 2019) and do not affect conclusions regarding core plastid genes or downstream analyses. Additionally, the cp genome of C. tonkinensis (GenBank: PV786099) shares 96.3–97.5% identity with other Cupressus species, corresponding to 3,218–4,827 nucleotide differences, with the highest similarity to C. chengiana (97.5%; 3,218 differences) and the lowest to C. duclouxiana (96.3%; 4,827 differences) (Table 5). As expected, it is nearly identical to the previously published C. tonkinensis plastome (GenBank: NC_039562), with 99.7% identity and only 362 differences. These findings show that the C. tonkinensis cp genome conforms to the highly conserved Cupressus plastome architecture and gene content, showing high similarity to other Cupressus species and near-identity to the previous C. tonkinensis assembly.
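The relationship between the number of nucleotide differences and percent identity can be illustrated with a short sketch. How the study treated gaps and which alignment length it used are not stated, so the assumptions here are that every mismatched column (including gap versus base) counts as a difference and that the alignment length is close to the genome length; the function names are hypothetical.

```python
def count_differences(seq_a, seq_b):
    """Count mismatched columns between two sequences from one alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    return sum(a != b for a, b in zip(seq_a.upper(), seq_b.upper()))


def percent_identity(differences, alignment_length):
    """Identity as the percentage of alignment columns that match."""
    return 100.0 * (1 - differences / alignment_length)


if __name__ == "__main__":
    # Toy alignment: columns 3 and 5 differ.
    print(count_differences("ACGT-", "ACTTA"))       # 2
    # 362 differences over an assumed alignment length of 128,000 columns.
    print(round(percent_identity(362, 128_000), 1))  # 99.7
```

Under the same assumption, 4,827 differences over roughly 130,000 columns gives about 96.3%, in line with the lowest value reported above.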

The structural features of the complete chloroplast genomes of eight Cupressus plastomes representing seven species.

Pairwise similarity and nucleotide differences among Cupressus chloroplast genomes.

In most plant lineages, the cp genome has emerged as a robust tool for species identification and phylogenetic reconstruction, often regarded as a “super-barcoding” marker due to its high resolution and uniparental inheritance (Thi Huynh et al., 2024; Nguyen et al., 2025). However, earlier phylogenetic studies of C. tonkinensis primarily used cp fragments such as matK and rbcL or nuclear ribosomal ITS regions (Terry et al., 2018; Pham et al., 2021). Although useful, these markers provide limited resolution, which may hinder the accurate inference of phylogenetic relationships. To address this limitation, the ML phylogeny was reconstructed using 82 unique PCGs from cp genomes. The resulting tree topology was congruent with previous plastome-based phylogenetic frameworks and strongly supported the placement of C. tonkinensis within a well-resolved lineage (Zhu et al., 2018; Ping et al., 2021).

In terms of taxonomic placement, Terry et al. (2018) performed a phylogenetic analysis based on 5.4 kb of concatenated cp and nuclear loci, identified three major lineages within the Cupressus genus (the Duclouxiana, Tonkinensis, and Sempervirens clades), and placed C. tonkinensis within the Tonkinensis clade. In our study, based on the available cp genomes of Cupressus species, C. tonkinensis, representing the Tonkinensis clade, was consistently recovered as a distinct and well-supported lineage, clearly separated from the Duclouxiana and Sempervirens clades. These results reinforce the view that C. tonkinensis corresponds to the Tonkinensis clade, as originally delineated by Terry et al. (2018).

In this study, we successfully sequenced and characterized the complete cp genome of C. tonkinensis. The genome size of C. tonkinensis was consistent with previously reported Cupressus cp genome lengths of approximately 128 kb. Comparative analysis of eight Cupressus plastomes representing seven species revealed a conserved gene content, with each genome comprising 118–119 genes, including 82 PCGs, 32 or 33 tRNA genes, and 4 rRNA genes. Phylogenetic reconstruction based on concatenated PCGs supported the monophyly of the Cupressus genus, with all species forming a single well-supported clade. Cupressus tonkinensis was identified as one of the earliest diverging lineages within the genus. Overall, our study provides valuable genomic resources for further evolutionary, phylogenetic, and conservation studies of the Cupressus genus.

Notes

ACKNOWLEDGMENTS

This work was conducted within the framework of “Research on genetic resource conservation and development of Cupressus tonkinensis Silba. J. in the northern part of Vietnam (Pro. No. NVQG-2020/ĐT.20)”, funded by the Ministry of Science and Technology (MOST). The authors would like to thank MOST for supporting the project.

CONFLICTS OF INTEREST

The authors declare that there are no conflicts of interest.

References

Capella-Gutiérrez S., Silla-Martínez J. M., Gabaldón T.. 2009;trimAl: A tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25:1972–1973.
Chen S.. 2023;Ultrafast one-pass FASTQ data preprocessing, quality control, and deduplication using fastp. Imeta 2:e107.
Chen C., Xia X., Peng J., Wang D.. 2022;Comparative analyses of six complete chloroplast genomes from the genus Cupressus and Juniperus (Cupressaceae). Gene 837:146696.
Dobrogojski J., Adamiec M., Luciński R.. 2020;The chloroplast genome: A review. Acta Physiologiae Plantarum 42:98.
Greiner S., Lehwark P., Bock R.. 2019;OrganellarGenomeDRAW (OGDRAW) version 1.3.1: Expanded toolkit for the graphical visualization of organellar genomes. Nucleic Acids Research 47:W59–W64.
Jin J.-J., Yu W.-B., Yang J.-B., Song Y., dePamphilis C. W., Yi T.-S., Li D.-Z.. 2020;GetOrganelle: A fast and versatile toolkit for accurate de novo assembly of organelle genomes. Genome Biology 21:241.
Kahraman K., Lucas S. J.. 2019;Comparison of different annotation tools for characterization of the complete chloroplast genome of Corylus avellana cv Tombul. BMC Genomics 20:874.
Kalyaanamoorthy S., Minh B. Q., Wong T. K. F., von Haeseler A., Jermiin L. S.. 2017;ModelFinder: Fast model selection for accurate phylogenetic estimates. Nature Methods 14:587–589.
Katoh K., Standley D. M.. 2013;MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Molecular Biology and Evolution 30:772–780.
Letunic I., Bork P.. 2021;Interactive Tree Of Life (iTOL) v5: An online tool for phylogenetic tree display and annotation. Nucleic Acids Research 49:W293–W296.
Li J., Wu J., Zhang L., Zhang L., Wang L., Mao K.. 2019;The complete chloroplast genome of Cupressus jiangeensis (Cupressaceae), a critically endangered conifer species in China. Conservation Genetics Resources 11:67–69.
Little D. P., Thomas P., Nguyen H. T., Phan L. K.. 2011;Before it had a name: Diagnostic characteristics, geographic distribution, and the conservation of Cupressus tonkinensis (Cupressaceae). Brittonia 63:171–196.
Ministry of Science and Technology (MOST). 2007. Vietnam Red Data Book, Part II. Plants Publisher of Science and Technology. Hanoi: p. 590.
Nguyen H. D., Do H. D. K., Vu M. T.. 2024;Comparative genomics revealed new insights into the plastome evolution of Ludwigia (Onagraceae, Myrtales). Science Progress 107:368504241272741.
Nguyen H. D., Vu M. T., Do H. D. K.. 2023;The complete chloroplast genome of Syzygium polyanthum (Wight) Walp. (Myrtales: Myrtaceae). Journal of Asia-Pacific Biodiversity 16:267–271.
Nguyen H. D., Vu N. H., Do H. D. K., Vu M. T.. 2025;Comparative chloroplast genomic analysis of Pithecellobium dulce (Roxb.) Benth 1844 and related species within Caesalpinioideae. Genetica 153:19.
Pham M. P., Tran V. H., Vu D. D., Nguyen Q. K., Shah M.. 2021;Phylogenetics of native conifer species in Vietnam based on two chloroplast gene regions rbcL and matK. Czech Journal of Genetics and Plant Breeding 57:58–66.
Ping J., Feng P., Li J., Zhang R., Su Y., Wang T.. 2021;Molecular evolution and SSRs analysis based on the chloroplast genome of Callitropsis funebris. Ecology and Evolution 11:4786–4802.
Plants of The World (POWO). 2025. Plants of the World Online. Facilitated by the Royal Botanic Gardens, Kew. Published on the Internet. Retrieved Jun. 5, 2025, available from http://www.plantsoftheworldonline.org/.
Rushforth K.. 2007;Notes on the Cupressaceae in Vietnam. Academia Journal of Biology 29:32–39.
Terry R. G., Schwarzbach A. E., Bartel J. A.. 2018;A molecular phylogeny of the Old World cypresses (Cupressus: Cupressaceae): Evidence from nuclear and chloroplast DNA sequences. Plant Systematics and Evolution 304:1181–1197.
Van The P., Loc P. K., Hiep N. T., Silba J.. 2013;The status of wild and cultivated populations of Cupressus tonkinensis Silba in Vietnam. Bulletin of the Cupressus Conservation Project 2:10–16.
Thi Huynh T. T., Quang M. T., Nguyen H. D.. 2024;Complete chloroplast genome sequence of the medicinal plant Oxyceros horridus (Rubiaceae) and phylogenetic analysis. Mitochondrial DNA Part B Resources 9:1658–1663.
Tillich M., Lehwark P., Pellizzer T., Ulbricht-Jones E. S., Fischer A., Bock R., Greiner S.. 2017;GeSeq: Versatile and accurate annotation of organelle genomes. Nucleic Acids Research 45:W6–W11.
Zhang D., Tu J., Ding X., Guan W., Gong L., Qiu X., Huang Z., Su H.. 2023;Analysis of the chloroplast genome and phylogenetic evolution of Bidens pilosa. BMC Genomics 24:113.
Zhang Z.-L., Ma L.-Y., Yao H., Yang X., Luo J.-H., Gong X., Wei S.-Y., Li Q.-F., Wang W., Sun H.-B.. 2017;The complete chloroplast genome of Cupressus chengiana. Conservation Genetics Resources 9:347–349.
Zhu A., Fan W., Adams R. P., Mower J. P.. 2018;Phylogenomic evidence for ancient recombination between plastid genomes of the Cupressus-Juniperus-Xanthocyparis complex (Cupressaceae). BMC Evolutionary Biology 18:137.

Table 1

The species used in phylogenetic analysis.

No. Name of the species GenBank accession no.
1 Cupressus gigantea NC_028155
2 Cupressus duclouxiana NC_065031
3 Cupressus torulosa NC_039563
4 Cupressus chengiana NC_034788
5 Cupressus jiangeensis NC_036939
6 Cupressus sempervirens NC_026296
7 Cupressus tonkinensis PV786099
8 Cupressus tonkinensis NC_039562
9 Callitropsis vietnamensis NC_026298
10 Hesperocyparis stephensonii NC_061008
11 Juniperus communis NC_035068
12 Microbiota decussata LC500580
13 Platycladus orientalis KX832626
14 Tetraclinis articulata LC500583
15 Calocedrus formosana NC_023121
16 Chamaecyparis formosensis NC_034943
17 Thujopsis dolabrata KX832628
18 Thuja occidentalis NC_042177
19 Fitzroya cupressoides LC500578
20 Diselma archeri MW470978
21 Widdringtonia nodiflora MW470997
22 Callitris pyramidalis MW470972
23 Pilgerodendron uviferum LC500581
24 Libocedrus plumosa MW470984
25 Austrocedrus chilensis LC500576
26 Papuacedrus papuana MW470986
27 Glyptostrobus pensilis NC_031354
28 Taxodium distichum NC_034941
29 Cryptomeria japonica NC_010548
30 Sequoiadendron giganteum LC500582
31 Sequoia sempervirens NC_030372
32 Metasequoia glyptostroboides NC_027423
33 Amentotaxus yunnanensis NC_060492

Table 2

The gene composition of the Cupressus tonkinensis chloroplast genome.

Group of genes Name of the genes
Photosynthetic Genes ATP synthase atpA, atpB, atpE, atpFᵃ, atpH, atpI
Photosystem I psaA, psaB, psaC, psaI, psaJ, psaM, pafIᵃ, pafII
Photosystem II psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbT, psbZ, pbf1
Cytochrome b/f complex petA, petBᵃ, petDᵃ, petG, petL, petN
Biosynthesis of chlorophyll chlB, chlL, chlN
The large subunit of Rubisco rbcL
NADPH dehydrogenase ndhAᵃ, ndhBᵃ, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK
Genes for replication Ribosomal RNA rrn16S, rrn23S, rrn5S, rrn4.5S
Transfer RNA trnA-UGCᵃ, trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, trnG-GCC, trnG-UCCᵃ, trnH-GUG, trnI-GAUᵃ, trnK-UUUᵃ, trnL-CAA, trnL-UAAᵃ, trnL-UAG, trnM-CAU (×4), trnN-GUU, trnP-GGG, trnP-UGG, trnQ-UUG (×2), trnR-ACG, trnR-UCU, trnS-GCU, trnS-GGA, trnS-UGA, trnT-GGU, trnT-UGU, trnV-GAC, trnV-UACᵃ, trnW-CCA, trnY-GUA
Small ribosomal units (SUs) rps11, rps12ᵃ, rps14, rps15, rps18, rps19, rps2, rps3, rps4, rps7, rps8
Large ribosomal units (LRUs) rpl14, rpl16ᵃ, rpl2ᵃ, rpl20, rpl22, rpl23, rpl32, rpl33, rpl36
RNA polymerase subunits rpoA, rpoB, rpoC1ᵃ, rpoC2
The translation initiation factor infA
Miscellaneous Genes C-type cytochrome synthesis ccsA
Acetyl-CoA-carboxylase subunit accD
Enveloped membrane protein cemA
ATP-dependent protease P subunit clpP1
Maturase matK
Hypothetical reading frames of genes ycf1, ycf2

(×2), (×4): gene present in two and four copies in the chloroplast genome, respectively.

ᵃ Labeled genes have at least one intron.

Table 3

Genes harboring introns in the chloroplast genome of Cupressus tonkinensis.

Gene Exon 1 (bp) Intron 1 (bp) Exon 2 (bp) Intron 2 (bp) Exon 3 (bp)
rpl2 399 640 432 - -
rpl16 9 826 414 - -
petD 8 654 490 - -
petB 6 763 648 - -
rps12 114 519 232 - 26
ndhB 723 723 756 - -
trnI-GAU 42 881 35 - -
trnA-UGC 38 770 36 - -
ndhA 558 751 549 - -
trnL-UAA 38 284 50 - -
trnG-UCC 23 756 48 - -
atpF 144 660 411 - -
rpoC1 441 744 1,668 - -
pafI 129 721 225 692 156
trnK-UUU 37 2,441 25 - -
trnV-UAC 39 527 37 - -

Table 4

The structural features of the complete chloroplast genomes of eight Cupressus plastomes representing seven species.

Name of the species Accession no. Total length (bp) Overall GC (%) PCGs/tRNAs/rRNAs
Cupressus tonkinensis PV786099 128,000 34.7 82/33/4
Cupressus tonkinensis NC_039562 127,835 34.7 82/32/4
Cupressus sempervirens NC_026296 129,150 34.6 82/33/4
Cupressus gigantea NC_028155 128,244 34.7 82/33/4
Cupressus chengiana NC_034788 128,151 34.7 82/33/4
Cupressus jiangeensis NC_036939 128,286 34.7 82/33/4
Cupressus torulosa NC_039563 128,322 34.6 82/33/4
Cupressus duclouxiana NC_065031 129,959 34.6 82/33/4

PCG, protein-coding gene; tRNA, transfer RNA; rRNA, ribosomal RNA.

Table 5

Pairwise similarity and nucleotide differences among Cupressus chloroplast genomes.

Species (GenBank accession number) (1) (2) (3) (4) (5) (6) (7) (8)
(1) Cupressus sempervirens (NC_026296) - 3,719 3,293 3,778 5,454 4,332 4,732 4,746
(2) Cupressus gigantea (NC_028155) 97.1 - 1,654 2,484 3,925 2,912 3,502 3,575
(3) Cupressus chengiana (NC_034788) 97.5 98.7 - 1,050 3,775 2,457 3,140 3,218
(4) Cupressus jiangeensis (NC_036939) 97.1 98.1 99.2 - 3,741 2,356 3,414 3,492
(5) Cupressus duclouxiana (NC_065031) 95.9 97.0 97.1 97.1 - 3,548 4,816 4,827
(6) Cupressus torulosa (NC_039563) 96.7 97.8 98.1 98.2 97.3 - 3,597 3,627
(7) Cupressus tonkinensis (NC_039562) 96.4 97.3 97.6 97.4 96.3 97.2 - 362
(8) Cupressus tonkinensis (PV786099) 96.4 97.2 97.5 97.3 96.3 97.2 99.7 -

Values in the lower triangle are percent nucleotide identity between genome pairs; values in the upper triangle are the absolute number of nucleotide differences.