The complete chloroplast genome of Dendrobium porphyrochilum (Orchidaceae)
Abstract
Dendrobium porphyrochilum Lindl. has been valued as a traditional herb in China for over 2,000 years. However, owing to overexploitation and habitat destruction, the species now faces the threat of extinction. This study presents the first sequencing and assembly of the complete chloroplast (cp) genome of D. porphyrochilum and investigates its phylogenetic relationships. The cp genome of D. porphyrochilum is 154,604 bp in length, comprising a large single copy (LSC) region of 84,816 bp, a small single copy (SSC) region of 17,660 bp, and two inverted repeat (IR) regions totaling 52,128 bp (26,064 bp each). The overall GC content of the plastome is 37.3%, with GC contents of 35.0% for the LSC, 30.4% for the SSC, and 43.5% for IRa and IRb. In addition, 118 genes were annotated, including 72 protein-coding genes, 8 ribosomal RNA genes, and 38 transfer RNA genes. A preliminary phylogenetic analysis based on the cp genome and coding sequences confirmed that D. porphyrochilum belongs to the genus Dendrobium. These findings provide reliable genetic information for future investigations into the taxonomy and evolutionary relationships of D. porphyrochilum and related species.
INTRODUCTION
The genus Dendrobium Sw., a member of the family Orchidaceae, includes approximately 1,500 species found predominantly in tropical and subtropical Asia and in eastern Australia (Wang et al., 2019; Tan et al., 2023). Many Dendrobium species are noted for their extensive pharmacological activities, such as anti-inflammatory, anti-bacterial, anti-oxidant, and anti-tumor effects (Zhao et al., 2022; Tan et al., 2023). However, wild populations have been severely depleted by prolonged over-harvesting. The protection of germplasm resources and the advancement of fundamental research on this genus are therefore pressing concerns. D. porphyrochilum and D. strongylanthum are classified within Sect. Stachyobium of Dendrobium. This section is characterized by succulent growth forms with swollen stems and small flowers featuring light green petals and red veins (Lang et al., 1999; Liu et al., 2021). Research on D. porphyrochilum has concentrated mainly on its cultivation, morphological features, and processing (Chen et al., 2018; Li et al., 2018). Studies of its genetic characteristics and genome structure remain scarce, which restricts our understanding of its genetic background, germplasm resources, and phylogenetic evolution. Addressing these gaps in genetic information is crucial for advancing the application and development of D. porphyrochilum.
The chloroplast (cp) genome is a key tool for phylogenetic analyses and has been widely used to elucidate genetic and evolutionary relationships (Feng et al., 2023). Some studies have indicated that the cp genome may occasionally yield misleading phylogenetic relationships because of variation in length, the presence of gaps/indels, and the accuracy of sequence evolution models (Goremykin et al., 2005). Nonetheless, shared coding sequences (CDSs) are relatively stable and can be used alongside cp genomes to construct more accurate plant phylogenetic relationships (Jiang et al., 2022). To date, there are no reports on the use of CDSs to investigate the phylogenetic relationships of D. porphyrochilum.
In this study, we present the first de novo sequencing and assembly of the cp genome of D. porphyrochilum. The phylogenetic relationships of D. porphyrochilum and D. strongylanthum were elucidated using the cp genome and CDSs. These findings offer a valuable scientific reference for future phylogenetic investigations and conservation efforts of D. porphyrochilum.
MATERIALS AND METHODS
Fresh leaves of D. porphyrochilum and D. strongylanthum were obtained from the Institute of Caulis Dendrobii in Longling County, Yunnan, China. The voucher specimens (Y23SH504 and Y23SH492) were authenticated by Prof. Baozhong Duan and are preserved at the herbarium of Dali University. Approximately 1.0 g of fresh leaf tissue was collected, immediately frozen in liquid nitrogen, and stored for subsequent DNA extraction. Total DNA was extracted using the Plant Genomic DNA kit (Tiangen, Beijing, China) following the manufacturer’s guidelines. The quality, purity, and quantity of the extracted DNA were evaluated using a high-sensitivity Qubit 4.0 Fluorometer (Life Technologies, Inc., Carlsbad, CA, USA).
Library construction was performed using 100 ng/μL of qualified DNA, followed by sequencing on the Illumina NovaSeq system (Illumina, San Diego, CA, USA). Paired-end reads were filtered with NGS QC Toolkit v2.3.3 to remove adapter sequences and low-quality reads (Phred quality score < 30). The cp genome was assembled with GetOrganelle v.1.6.4, which uses Bowtie2 v.2.4.4, SPAdes v.3.13.0, and BLAST v.2.5.0 as dependencies (Jin et al., 2020). The circular cp genome was then annotated using CPGAVAS2 (http://47.96.249.172:16019/analyzer/annotate) (Shi et al., 2019) and GeSeq (https://chlorobox.mpimp-golm.mpg.de/geseq.html) (Tillich et al., 2017). The annotated cp genome sequences were submitted to the NCBI GenBank database under accession numbers PP479730 and PP786689. The OGDRAW tool (https://chlorobox.mpimp-golm.mpg.de/OGDraw.html) (Lohse et al., 2007) was employed to generate a circular gene map of the D. porphyrochilum cp genome.
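For context, a Phred score Q corresponds to an estimated base-call error probability of 10^(-Q/10), so the Q30 cutoff mentioned above means roughly one expected error per 1,000 bases. The following Python sketch is illustrative only. It assumes Phred+33 ASCII encoding and filters on mean read quality, which may differ from the exact criterion applied by NGS QC Toolkit, and all function names are hypothetical.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to an estimated base-call error probability."""
    return 10 ** (-q / 10)


def mean_phred(quality_string: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (assumes Phred+33 encoding)."""
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores)


def passes_filter(quality_string: str, min_q: float = 30.0) -> bool:
    """Keep a read only if its mean quality reaches the threshold (assumed criterion)."""
    return mean_phred(quality_string) >= min_q
```

Under these assumptions, a read whose quality string consists of the character "I" (Q40) passes the filter, whereas one consisting of "5" (Q20) is discarded.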
GC content, genome size, tRNA, and repeat content were analyzed using Geneious 9.0.2 software (Darling et al., 2010). Codon usage analysis, including relative synonymous codon usage (RSCU), effective number of codons (ENc), GC content of synonymous third codon positions (GC3s), codon adaptation index (CAI), and frequency of optimal codons (Fop), was conducted using CodonW 1.4.2 (Bylaiah et al., 2021). The REPuter web service (https://bibiserv.cebitec.unibielefeld.de/reputer/) was used to identify four types of dispersed repeats (F, forward; R, reverse; P, palindromic; C, complementary) with a minimum repeat size of 30 bp, an edit distance of 3 bp, and a similarity threshold of 90% between repeat pairs (Kurtz et al., 2001). Simple sequence repeats (SSRs) were identified using MISA (http://pgrc.ipk-gatersleben.de/misa/) with minimum repeat numbers of 10 for mono-, 5 for di-, 4 for tri-, and 3 for tetra-, penta-, and hexa-nucleotide motifs (Beier et al., 2017). Boundary information for the cp genomes of D. porphyrochilum and D. strongylanthum was analyzed using the IRSCOPE online tool (https://irscope.shinyapps.io/irapp/) (Amiryousefi et al., 2018).
Maximum likelihood (ML) trees were constructed using the cp genomes and CDSs of 23 species (Table 1). Two species, Cymbidium elegans Lindl. (GenBank NC067753) and C. floribundum Lindl. (GenBank NC063952), were chosen as outgroup taxa. The 23 cp genomes were aligned using MAFFT with default settings (Katoh and Standley, 2013) and subsequently trimmed using TrimAl v.1.4 (RRID: SCR_017334) (http://trimal.cgenomics.org/) with the automated option. The best-fit nucleotide substitution model was selected using ModelFinder (Kalyaanamoorthy et al., 2017), based on the Bayesian Information Criterion, as implemented in IQ-TREE v.1.6.12 (RRID: SCR_017254) (http://www.iqtree.org/) (Nguyen et al., 2015).
Branch support was assessed in IQ-TREE with 1,000 ultrafast bootstrap replicates and 1,000 SH-aLRT replicates, using the following parameters: iqtree -s input -m TVM+F+R3 -bb 1000 -alrt 1000 -nt AUTO -o NC063952,NC067753.
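The alignment, trimming, and tree-inference steps described above could be chained roughly as follows. This is a sketch under stated assumptions rather than the authors' actual script: the file names are hypothetical, the trimAl flag -automated1 is assumed to be the "automated option" referred to in the text, and the IQ-TREE arguments are adapted from the command reported above. The function only builds the command strings and does not execute them.

```python
import shlex


def build_phylogeny_commands(fasta, aligned, trimmed, outgroups):
    """Return the three shell commands (as strings) without executing them."""
    q = shlex.quote
    return [
        # MAFFT with default settings writes the alignment to standard output.
        f"mafft {q(fasta)} > {q(aligned)}",
        # Assumption: -automated1 is the "automated option" meant in the text.
        f"trimal -in {q(aligned)} -out {q(trimmed)} -automated1",
        # Model and support settings adapted from the reported IQ-TREE command.
        f"iqtree -s {q(trimmed)} -m TVM+F+R3 -bb 1000 -alrt 1000 -nt AUTO "
        f"-o {q(','.join(outgroups))}",
    ]


# Hypothetical file names; the outgroup accessions are those given in the text.
for cmd in build_phylogeny_commands(
    "cp_genomes.fasta", "aligned.fasta", "trimmed.fasta",
    ["NC063952", "NC067753"],
):
    print(cmd)
```

Keeping the commands as data makes it easy to log exactly what was run for the whole-genome and CDS alignments, which differ only in the input file.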
RESULTS AND DISCUSSION
After the raw data of D. porphyrochilum were filtered to eliminate adaptors and low-quality reads, the cp genome was assembled. As illustrated in Fig. 1, the cp genome of D. porphyrochilum is a circular DNA molecule with a typical quadripartite structure, consisting of two inverted repeats (IRa and IRb) separated by a large single copy (LSC) region and a small single copy (SSC) region. The total length of the cp genome is 154,604 bp, comprising 84,816 bp for the LSC, 17,660 bp for the SSC, and 52,128 bp for the two IR regions combined (26,064 bp each) (Table 2). The overall GC content of the D. porphyrochilum cp genome is 37.3%, with the IR regions showing a higher GC content (43.5%) than the LSC (35.0%) and SSC (30.4%) regions. This difference is likely due to the four ribosomal RNA (rRNA) genes with high GC content located in the IR regions (Bock, 2007).
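As a quick consistency check on these figures, the region lengths should sum to the total genome length, and the length-weighted mean of the regional GC contents should reproduce the overall GC content. The short Python sketch below uses only the values reported in the text, treating 52,128 bp as the combined length of IRa and IRb. The helper gc_content is a hypothetical illustration of how GC content is computed from a sequence.

```python
def gc_content(sequence: str) -> float:
    """Percentage of G and C bases in a nucleotide sequence."""
    sequence = sequence.upper()
    return 100 * (sequence.count("G") + sequence.count("C")) / len(sequence)


# Region lengths (bp) and GC contents (%) as reported in the text.
regions = {
    "LSC": (84_816, 35.0),
    "SSC": (17_660, 30.4),
    "IRa + IRb": (52_128, 43.5),
}

total_length = sum(length for length, _ in regions.values())
weighted_gc = sum(length * gc for length, gc in regions.values()) / total_length

print(total_length)           # 154604
print(round(weighted_gc, 1))  # 37.3
```

Both values agree with the reported genome length of 154,604 bp and overall GC content of 37.3%.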
The cp genome of D. porphyrochilum contains 118 genes, comprising 72 protein-coding genes, 38 transfer RNA (tRNA) genes, and eight rRNA genes. These genes can be categorized into four groups: photosynthesis-related genes, replication-related genes, protein genes, and other genes (Table 3). Of the 118 genes, 17 are located in the IR region, including eight tRNA genes (trnL-UAA, trnG-UCC, trnV-UAC, trnI-GAU, trnA-UGC, trnA-UGC, trnI-GAU, trnK-UUU), four rRNA genes (rrn16S, rrn23S, rrn4.5S, rrn5S), and six protein-coding genes (rpl22, rpl23, rps19, rps7, rps12, and ndhB). Notably, 19 genes in D. porphyrochilum contain introns; two of them (clpP and ycf3) have two introns each, while the remaining genes contain a single intron (Table 3).
The amino acid frequency, number of codons, codon usage bias, and RSCU of the D. porphyrochilum cp genome were analyzed. The protein-coding genes used 64 codons encoding 21 amino acids. Leucine and isoleucine were the most frequent amino acids, while tryptophan was the rarest (Fig. 2). Similar findings have been reported for other Dendrobium species (Shang et al., 2023). An RSCU value below 1.00 indicates that a codon is used less often than expected, while a value above 1.00 indicates that it is used more often than expected (Sharp and Li, 1987). In this study, 28 codons had RSCU values greater than 1, 33 had values below 1, and 3 codons (AUG, UGA, and UGG) had an RSCU value of 1. Additionally, GC3s was 25.8%, reflecting a preference for A/U-ending codons in D. porphyrochilum. The ENc value was 48.62, and the CAI and Fop were both below 0.5, suggesting a slight codon usage bias in D. porphyrochilum.
Fig. 2. Codon content of 21 amino acids and stop codons in all protein-coding genes of the chloroplast genome of Dendrobium porphyrochilum. RSCU, relative synonymous codon usage.
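To make the RSCU measure concrete: for each amino acid, the RSCU of a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally. The Python sketch below illustrates that definition using the standard genetic code. It is not a reimplementation of CodonW, and the function name and example sequence are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_sequences):
    """Return {codon: RSCU} for the codons of every amino acid observed in the input."""
    counts = Counter()
    for seq in cds_sequences:
        seq = seq.upper().replace("U", "T")
        # Read in-frame codons and ignore any trailing partial codon.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1
    values = {}
    for aa in set(AMINO):  # '*' groups the three stop codons together
        synonyms = [c for c, a in CODON_TABLE.items() if a == aa]
        total = sum(counts[c] for c in synonyms)
        if total == 0:
            continue  # amino acid not observed, RSCU undefined
        expected = total / len(synonyms)  # count under equal synonymous usage
        for c in synonyms:
            values[c] = counts[c] / expected
    return values


# Toy example: Met, Phe (TTT twice, TTC once), Trp.
example = rscu(["ATGTTTTTTTTCTGG"])
print(example["ATG"], example["TGG"])  # 1.0 1.0
```

Because methionine (AUG) and tryptophan (UGG) are each encoded by a single codon, their RSCU is 1 by definition, which is consistent with the values reported above for these two codons.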
In addition, 53 dispersed repeat sequences were identified in the cp genome of D. porphyrochilum, consisting of 2 complementary repeats (C), 12 reverse repeats (R), 36 forward repeats (F), and 49 palindromic repeats (P). These repeat sequences ranged from 30 to 50 bp, with the most common lengths being 30 to 39 bp, while repeats longer than 50 bp were the least common. F and P repeats were more prevalent than R and C repeats (Table 4).
SSRs are tandem repeats of 1 to 6 nucleotide motifs that are widely used for species identification and genetic diversity research because of their high polymorphism, site specificity, and multiple alleles (Ahmad et al., 2018). A total of 43 SSRs were identified in the cp genome of D. porphyrochilum. Based on motif length, they comprise mono-nucleotide (A/T), di-nucleotide (AG/CT and AT/AT), tri-nucleotide (AAT/ATT), tetra-nucleotide (AAAG/CTTT, AATT/AATT, ACAG/CTGT, and AGAT/ATCT), and hexa-nucleotide (AAATTC/AATTTG) repeats. The most common repeat units were A and T bases, indicating a strong A/T preference in the cp genome, consistent with previous studies on angiosperm cp genomes (Liao et al., 2024).
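The SSR thresholds given in the methods (at least 10 repeats for mono-, 5 for di-, 4 for tri-, and 3 for tetra-, penta-, and hexa-nucleotide motifs) can be illustrated with a small regular-expression scan. The Python sketch below is a simplified illustration and not the MISA program itself. The function names are hypothetical, and repeats of different unit sizes that overlap are not handled specially.

```python
import re

# Minimum number of repeats per motif length, as stated in the methods.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(sequence):
    """Return (start, motif, repeat_count) tuples, 0-based start, sorted by position."""
    sequence = sequence.upper()
    hits = []
    for unit, min_rep in MIN_REPEATS.items():
        # One motif copy followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        for m in pattern.finditer(sequence):
            motif = m.group(1)
            if is_primitive(motif):  # skip e.g. 'AA' x 5, already counted as 'A' x 10
                hits.append((m.start(), motif, len(m.group(0)) // unit))
    return sorted(hits)


print(find_ssrs("G" + "A" * 10 + "C" + "AT" * 5 + "G"))
# [(1, 'A', 10), (12, 'AT', 5)]
```

In this toy sequence, the run of ten A bases and the five AT repeats are reported, while the same A run is not counted a second time as a di-nucleotide repeat.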
The cp genomes of D. porphyrochilum and D. strongylanthum each displayed four boundaries: LSC-IRb, IRb-SSC, SSC-IRa, and IRa-LSC. As illustrated in Fig. 3, the rps19 and psbA genes were located entirely in the IR and LSC regions, respectively. The rpl22 gene was found at the LSC-IRb junction, originating in the LSC and extending into the IRb region in both species, with a size of 37 bp. In addition, the ndhF gene was situated at the IRb-SSC junction in D. strongylanthum, originating in the IRb region and extending into the SSC, with a size of 2,201 bp. The ycf1 gene was found at the IRb-SSC and SSC-IRa junctions in D. strongylanthum. Notably, the ndhF and ycf1 genes were absent in D. porphyrochilum, suggesting that these genes may serve as distinguishing markers between D. porphyrochilum and D. strongylanthum.
The cp genome is a valuable source of phylogenetic information, commonly used for reconstructing phylogenies and analyzing plant populations (Kim et al., 2015; Xu et al., 2017). ML trees were generated using the cp genomes and CDSs of 21 Dendrobium species. As illustrated in Fig. 4, the two trees exhibit highly consistent topologies, and the phylogenetic relationships among the species remain robust regardless of whether non-coding regions are included. Notably, D. porphyrochilum formed a distinct evolutionary branch, while the two samples of D. strongylanthum did not cluster into a monophyletic group but were instead placed on separate branches. Previous studies have confirmed intraspecific variability in D. devonianum (Shang et al., 2023) and Isodon rubescens (Zhou et al., 2022) collected from different geographical areas. This suggests that geographical origin may influence the genetic diversity of D. strongylanthum. D. hercoglossum and D. aduncum were initially classified in Sect. Breviflores according to traditional taxonomic studies. However, our results based on cp genomes and CDSs show that these two species are deeply nested within the Sect. Dendrobium branch. Previous studies based on internal transcribed spacer sequences have also indicated that D. hercoglossum and D. aduncum should be placed in Sect. Dendrobium rather than the conventional Sect. Breviflores (Wang et al., 2006; Li et al., 2012). Therefore, these findings suggest that merging Sect. Breviflores into Sect. Dendrobium might be a more reasonable classification.
In summary, this study provides a preliminary phylogenetic placement of D. porphyrochilum, offering valuable insights for future research on the taxonomy and evolutionary relationships of D. porphyrochilum and its related species.
Notes
ACKNOWLEDGMENTS
This work was supported by the Yunnan academician expert workstation (202205AF150026) and the science and technology plan project of Yunnan province (202301BA070001-042). We thank Northeast Forestry University and the China Academy of Chinese Medical Sciences for technical assistance.
CONFLICTS OF INTEREST
The authors declare that there are no conflicts of interest.
