INTRODUCTION
Dillenia philippinensis Rolfe, commonly known as the elephant apple, is an important fruit-bearing tree species belonging to the subfamily Dillenioideae of the family Dilleniaceae (Horn, 2009). It typically inhabits wet tropical biomes, particularly primary and secondary forests from lowland areas up to elevations of 2,000 meters (Rugayah and Lemmens, 1995; POWO, 2025). Endemic to the Philippines (Pelser et al., 2011), it is locally referred to as ‘katmon’ and is commonly found growing wild along riverbanks and forested slopes. The species is widely distributed, with documented occurrences across major islands and numerous provinces throughout the country (Barcelo, 2015). Despite its relatively broad distribution, D. philippinensis is currently listed as Near Threatened under criteria B2ab(ii,iii) (Energy Development Corporation, 2020), reflecting ongoing threats that may increase its risk of endangerment without effective conservation measures.
Almost all parts of the plant are traditionally used for culinary, medicinal, and ornamental purposes. Locals commonly use its timber for furniture making and light construction, and its sweetish-sour, astringent edible fruit as a souring agent in cooking (Lim, 2011; Artes et al., 2024). Its fruit juice is taken orally to relieve fever, cough, and chest pain, and is also used to clean hair (Macahig et al., 2011; Quattrocchi, 2012). The tree produces phytochemicals, including flavonoids, triterpenoids, ionones, and phenolics, that contribute to its various pharmacological activities (Sabandar et al., 2017). It is a source of bioactive compounds with antibacterial and chemotherapeutic properties (Yu et al., 2020; Ansari et al., 2021). The tree is also widely cultivated in gardens and urban landscapes for its large leaves and flowers (Kerrigan et al., 2011).
Although D. philippinensis holds significant ethnopharmacological (Dayrit et al., 2021), ecological, and economic importance, its complete chloroplast (cp) genome has not yet been reported. To date, only two cp genomes from the genus Dillenia are available in the GenBank database of the National Center for Biotechnology Information (NCBI) (Tan et al., 2019; Li et al., 2021). In this study, we assembled, annotated, and analyzed the first complete cp genome sequence of D. philippinensis to improve our understanding of this Near Threatened species, particularly its systematics, genetic characteristics, and phylogenetic relationships within the family Dilleniaceae. These genomic data also provide baseline information to support future conservation and management efforts.
MATERIALS AND METHODS
Plant sampling
The study used disease-free leaves of D. philippinensis collected from a mature tree maintained at the field gene bank of the Crop Breeding and Genetic Resources Division (CBGRD), Institute of Crop Science, College of Agriculture and Food Science, University of the Philippines Los Baños (UPLB), Laguna, Philippines (14°09′44.6″N, 121°14′46.0″E). A voucher specimen (ICROPS 0102025002) was prepared and deposited in the Philippine Herbarium of Cultivated Plants of the Institute of Crop Science at UPLB (https://cafs.uplb.edu.ph/icrops/, Curator Dr. Renerio P. Gentallan Jr., rpgentallan@up.edu.ph). To confirm and support the taxonomic identification of the specimen archived in the herbarium, morphological characterization was performed using diagnostic traits based on the monograph by Hoogland (1948).
DNA extraction, sequencing, assembly, and annotation of cp genome
Total genomic DNA was extracted from 50 mg of silica-dried leaf tissue using the liquid-nitrogen-free cetyltrimethylammonium bromide (CTAB) protocol for next-generation sequencing and genome assembly developed by Quiñones et al. (2024). DNA integrity was assessed by 1% agarose gel electrophoresis, and DNA concentration and purity (A260/280 ratio) were measured on a DeNovix DS-11+ spectrophotometer. A high-quality DNA sample was then submitted to NovogeneAIT Genomics Singapore PTE Ltd. (Singapore) for whole cp genome sequencing on the Illumina HiSeq-PE150 platform (Illumina Inc., San Diego, CA, USA). Raw 150-bp paired-end reads were quality-filtered using Fastp version 0.20.0 (Chen et al., 2018), yielding 29,580,580 clean reads. The cp genome of D. philippinensis was assembled into a complete circular genome using GetOrganelle v1.7.5 (Jin et al., 2020), following the recommended settings for embryophyte plastid genomes and using 2 Gb of raw 150-bp paired-end reads as input. The circularized genome was annotated using GeSeq (Tillich et al., 2017) and CPGAVAS2 (Shi et al., 2019), with the D. turbinata plastome (NCBI GenBank accession number NC_062798.1) as the reference. The annotated plastome was visualized with OGDRAW (Greiner et al., 2019) and the Chloroplast Genome Viewer (CPGView) (Liu et al., 2023) to generate graphical representations of gene content and genome structure. The complete cp genome sequence was deposited in NCBI GenBank under accession number PQ768872.1. The associated BioProject, BioSample, and Sequence Read Archive (SRA) identifiers are PRJNA1331504, SAMN51604990, and SRR35708821, respectively.
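The filtering and assembly steps above can be sketched as command lines. This is a minimal illustration under stated assumptions, not the exact commands used in this study: the file names are hypothetical, fastp is shown with its default filters, and the GetOrganelle options (-R 15, -k 21,45,65,85,105, -F embplant_pt) are taken from the embryophyte-plastome example in the GetOrganelle documentation, which we assume corresponds to the "recommended settings". The sketch also shows the arithmetic behind the 2 Gb input: each 150-bp read pair contributes 300 bp, so roughly 6.67 million pairs are needed. How the reads were subsampled is not reported, so that step is omitted.

```python
import shlex


def fastp_cmd(r1: str, r2: str, out_prefix: str) -> list[str]:
    """Paired-end quality filtering with fastp using default filters (study settings not reported)."""
    return ["fastp", "-i", r1, "-I", r2,
            "-o", f"{out_prefix}_R1.fq.gz", "-O", f"{out_prefix}_R2.fq.gz"]


def getorganelle_cmd(r1: str, r2: str, out_dir: str) -> list[str]:
    """Plastome assembly; options follow the GetOrganelle README example for embryophyte plastids (assumed)."""
    return ["get_organelle_from_reads.py", "-1", r1, "-2", r2, "-o", out_dir,
            "-R", "15", "-k", "21,45,65,85,105", "-F", "embplant_pt"]


def read_pairs_for_budget(total_bases: float, read_len: int = 150) -> int:
    """Number of read pairs whose combined length (2 x read_len each) fits within total_bases."""
    return int(total_bases // (2 * read_len))


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(shlex.join(fastp_cmd("raw_R1.fq.gz", "raw_R2.fq.gz", "clean")))
    print(shlex.join(getorganelle_cmd("reads_2Gb_R1.fq.gz", "reads_2Gb_R2.fq.gz", "dphil_plastome")))
    print(read_pairs_for_budget(2e9))  # read pairs in ~2 Gb of 150-bp paired-end data
```

The commands are only built and printed here; executing them would require fastp and GetOrganelle to be installed (for example, by passing each list to subprocess.run with check=True).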
Simple sequence repeat locus analysis
Simple sequence repeats (SSRs), also known as microsatellites, are tandem repeats of nucleotide motifs typically consisting of 1–6 bp repeat units. SSRs identified within the cp genome are referred to as chloroplast SSRs (cpSSRs). Detection of cpSSRs was performed using MISA v1.0 (MIcroSAtellite identification tool; https://webblast.ipk-gatersleben.de/misa/) (Beier et al., 2017). The minimum repeat thresholds were set as follows: mononucleotide repeats ≥8, dinucleotide repeats ≥5, trinucleotide repeats ≥3, tetranucleotide repeats ≥3, pentanucleotide repeats ≥3, and hexanucleotide repeats ≥3.
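To make these thresholds concrete, the following sketch scans a sequence for perfect SSRs using the same minimum repeat numbers. It is a simplified, MISA-like illustration written for this text, not MISA itself: it reports only perfect (uninterrupted) repeats, tests the shortest motif first at each position, skips motifs that are themselves repeats of a shorter unit (for example "AA" or "ATAT"), and does not merge neighboring SSRs into compound SSRs, which MISA does using a maximum interruption distance not reported here.

```python
# Minimum number of repeat units per motif length, as set in this study.
MIN_REPEATS = {1: 8, 2: 5, 3: 3, 4: 3, 5: 3, 6: 3}


def is_primitive(motif: str) -> bool:
    """True if the motif is not a repetition of a shorter unit ('AT' -> True, 'ATAT' -> False)."""
    return motif not in (motif + motif)[1:-1]


def find_perfect_ssrs(seq: str, min_repeats: dict[int, int] = MIN_REPEATS) -> list[tuple[int, str, int]]:
    """Return (1-based start, motif, number of repeat units) for each perfect SSR found."""
    seq = seq.upper()
    hits: list[tuple[int, str, int]] = []
    i = 0
    while i < len(seq):
        found = False
        for k in sorted(min_repeats):  # shortest motif length first
            motif = seq[i:i + k]
            if len(motif) < k or not is_primitive(motif):
                continue
            copies = 1
            # Count consecutive, complete copies of the motif starting at i.
            while seq[i + copies * k:i + (copies + 1) * k] == motif:
                copies += 1
            if copies >= min_repeats[k]:
                hits.append((i + 1, motif, copies))
                i += copies * k  # continue scanning after the repeat
                found = True
                break
        if not found:
            i += 1
    return hits


# Toy sequence (hypothetical): an A10 run, an (AT)5 repeat and an (AAG)3 repeat.
toy = "GC" + "A" * 10 + "GC" + "AT" * 5 + "GC" + "AAG" * 3 + "C"
print(find_perfect_ssrs(toy))  # [(3, 'A', 10), (15, 'AT', 5), (27, 'AAG', 3)]
```

A run of seven A's, one short of the mononucleotide threshold, would return an empty list.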
Inverted repeat boundary analysis
The cp genome typically exhibits a circular quadripartite structure comprising a large single-copy (LSC) region, a small single-copy (SSC) region, and two inverted repeat (IR) regions (IRA and IRB), which form four junctions: LSC/IRB, IRB/SSC, SSC/IRA, and IRA/LSC. Expansion and contraction of the IRs during genome evolution can shift genes between the IR and single-copy regions. To assess and visualize IR contraction and expansion, the cp genome of D. philippinensis was compared with those of the two other Dillenia species with published plastomes, D. indica and D. turbinata, using IRscope (Amiryousefi et al., 2018).
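The positions of the four junctions follow directly from the region lengths. The sketch below, assuming the conventional LSC, IRB, SSC, IRA order with position 1 at the start of the LSC, converts region lengths into 1-based coordinates and reports how many bases of a feature fall in each region, which is the kind of quantity used to describe genes that straddle a junction. The example uses the D. philippinensis region lengths reported under Genome content and organization; the feature coordinates are hypothetical, and features that wrap around position 1 are not handled.

```python
REGION_ORDER = ("LSC", "IRB", "SSC", "IRA")


def region_bounds(lengths: dict[str, int]) -> dict[str, tuple[int, int]]:
    """1-based, inclusive (start, end) of each region; assumes position 1 starts the LSC."""
    bounds: dict[str, tuple[int, int]] = {}
    start = 1
    for name in REGION_ORDER:
        end = start + lengths[name] - 1
        bounds[name] = (start, end)
        start = end + 1
    return bounds


def split_feature(start: int, end: int, bounds: dict[str, tuple[int, int]]) -> dict[str, int]:
    """Number of bases of the feature [start, end] lying in each region (no origin wrap-around)."""
    parts: dict[str, int] = {}
    for name, (r_start, r_end) in bounds.items():
        overlap = min(end, r_end) - max(start, r_start) + 1
        if overlap > 0:
            parts[name] = overlap
    return parts


# Region lengths reported for D. philippinensis.
lengths = {"LSC": 89_409, "IRB": 26_486, "SSC": 19_203, "IRA": 26_486}
bounds = region_bounds(lengths)
# Junctions: LSC/IRB after LSC end, IRB/SSC after IRB end, SSC/IRA after SSC end, IRA/LSC at the origin.
print(bounds)
# Hypothetical feature crossing the SSC/IRA junction.
print(split_feature(130_000, 136_000, bounds))  # {'SSC': 5099, 'IRA': 902}
```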
Phylogenetic analysis
To infer phylogenetic relationships, the complete cp genomes of D. philippinensis and two congeneric species, D. turbinata and D. indica, were used as the ingroup, with Tetracera sarmentosa (Yang et al., 2022) included as the putative sister group within the family Dilleniaceae. Three additional species, Vitis californica (Wen et al., 2018), Sabia yunnanensis (Sun et al., 2015), and Macadamia ternifolia (Liu et al., 2017), served as outgroups. Taxon selection was guided by previous phylogenetic studies of Dilleniaceae (Horn, 2009). The complete plastome sequences of the six reference species were retrieved from GenBank and aligned with the D. philippinensis plastome using MAFFT v7.4 (Katoh and Standley, 2013). The alignment was neither partitioned nor trimmed, given the high conservation and quality of the sequences. Phylogenetic reconstruction was performed using the maximum likelihood method under the best-fit substitution model (GTR + G) (Nei and Kumar, 2000), as implemented in MEGA v11.0.13 (Tamura et al., 2021), which was chosen for its reliable implementation of maximum likelihood analyses and straightforward bootstrap testing. Node support was assessed using 1,000 bootstrap replicates.
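MEGA performs bootstrap resampling internally; the sketch below only illustrates what each of the 1,000 replicates is under the standard nonparametric bootstrap. Alignment columns are sampled with replacement to build a pseudo-alignment of the same length, each pseudo-alignment is then analyzed by maximum likelihood, and the support for a node is the percentage of replicate trees that contain that clade. Tree inference itself is not shown, and the toy alignment and the support helper are hypothetical.

```python
import random


def bootstrap_alignment(alignment: list[str], rng: random.Random) -> list[str]:
    """One nonparametric bootstrap pseudo-replicate: resample alignment columns with replacement."""
    n_sites = len(alignment[0])
    if any(len(row) != n_sites for row in alignment):
        raise ValueError("all sequences must have the same aligned length")
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    # The same column indices are applied to every sequence, so sites stay homologous.
    return ["".join(row[c] for c in cols) for row in alignment]


def bootstrap_replicates(alignment: list[str], n_reps: int = 1000, seed: int = 1) -> list[list[str]]:
    """Generate n_reps pseudo-alignments with a seeded random number generator."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_reps)]


def support(clade_found: list[bool]) -> float:
    """Bootstrap support (%): share of replicate trees that recover a given clade."""
    return 100.0 * sum(clade_found) / len(clade_found)


toy = ["ACGTACGTAA", "ACGTACGTTA", "ACTTACGTTA"]  # hypothetical 3-taxon alignment
reps = bootstrap_replicates(toy, n_reps=1000)
print(len(reps), reps[0])
```

Sampling whole columns, rather than individual bases, is what makes each replicate a plausible alternative dataset drawn from the same set of sites.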
RESULTS AND DISCUSSION
Morphological characteristics of the plant material
Dillenia philippinensis is a tree with fissured bark (Fig. 1A) and alternately arranged, elliptic leaves, 10–15-nerved, 19–26 × 9–15 cm, chartaceous, with an obtuse, often slightly acuminate apex, obtuse base, and slightly dentate margin (Fig. 1D, E); petioles 2–5 cm long, with half-elliptic, 1–3 mm broad, caducous wings (Fig. 1C). Inflorescences are racemes, mostly terminal; flowers ca. 10–15 cm in diameter (Fig. 1F); sepals 5, the inner three longer than the outer two (Fig. 1G); petals 5, white, 6–9 × 5–7 cm (Fig. 1H); stamens in 2 distinct groups, the outer ones yellow, the inner ones purplish with apices reflexed in bud (Fig. 1Ib–Ie); carpels 10–12, with linear, spreading styles ca. 17 mm long, each carpel with 10–12 ovules. The fruit is indehiscent, depressed-globose, 4–5 cm high and 5–6 cm in diameter including the enclosing, slightly fleshy sepals; carpels slightly spirally twisted, fleshy, 1–2-seeded (Fig. 1Jb). Seeds 5 × 3 mm, very finely echinate, enclosed at the base by a 2 mm long, membranaceous aril (Fig. 1K). These diagnostic features are consistent with previous descriptions (Hoogland, 1948) and support the taxonomic identity of the sampled specimen (ICROPS 0102025002, Fig. 2), in agreement with its plastome-based phylogenetic placement.
Genome content and organization
The first cp genome of D. philippinensis was successfully sequenced and assembled. Sequencing depth was generally high and uniform, with an average coverage of 555.8× (Fig. 3), supporting a reliable and well-resolved assembly; the depth and evenness of sequencing coverage are key indicators of genome assembly quality (Jenke et al., 2025). Coverage nonetheless varied across regions. Lower depths are likely attributable to GC bias or polymerase chain reaction inefficiencies, whereas elevated coverage in repetitive regions, such as the IRs typical of cp genomes, reflects increased read redundancy. Minor fluctuations in coverage are consistent with the stochastic nature of sequencing (Benjamini and Speed, 2012). The assembled cp genome of D. philippinensis exhibited the typical quadripartite structure observed in other Dillenia species (Fig. 4). The complete plastome is 161,584 bp long, longer than that of D. indica (159,266 bp) (Tan et al., 2019) but shorter than that of D. turbinata (163,250 bp) (Li et al., 2021). It comprises an LSC region (89,409 bp), an SSC region (19,203 bp), and a pair of IR regions (IRA and IRB, 26,486 bp each). The overall GC content of the cp genome was 36.3%, with base compositions of 31.4% A, 18.4% C, 17.8% G, and 32.3% T. GC content differed markedly among the three regions: it was highest in the IR regions (42.8%), intermediate in the LSC region (33.8%), and lowest in the SSC region (29.6%) (Table 1).
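The genome size and overall GC content can be cross-checked against the regional values. The sketch below sums the four region lengths and computes a length-weighted mean of the regional GC contents, counting each IR copy separately; because the regional values are rounded to one decimal place, the result is expected to match the overall value only to within rounding. The gc_percent helper is an illustrative function for computing GC content directly from a sequence; it excludes ambiguous bases such as N from the denominator, which is an assumption rather than a documented choice of this study.

```python
def gc_percent(seq: str) -> float:
    """GC content (%) over unambiguous bases; ambiguity codes such as N are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * gc / acgt


# Region lengths (bp) and GC contents (%) as reported for D. philippinensis.
lengths = {"LSC": 89_409, "SSC": 19_203, "IRA": 26_486, "IRB": 26_486}
region_gc = {"LSC": 33.8, "SSC": 29.6, "IRA": 42.8, "IRB": 42.8}

total_len = sum(lengths.values())
weighted_gc = sum(lengths[r] * region_gc[r] for r in lengths) / total_len

print(total_len)                          # 161584
print(round(weighted_gc, 1))              # 36.3
print(round(gc_percent("ATGCGCNN"), 2))   # 66.67
```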
Gene composition
A total of 129 functional genes were identified (Table 2), comprising 36 tRNA genes, eight rRNA genes, and 85 protein-coding genes. The protein-coding genes include 44 genes for photosynthesis, 29 genes for self-replication, six other genes (accD, ccsA, cemA, clpP, infA, and matK), and six conserved open reading frames (ORFs) (ycf1, ycf15 (×2), ycf2 (×2), and ycf4). In addition, 13 cis-splicing genes were identified: nine single-copy genes (rps16, atpF, rpoC1, ycf3, clpP, petB, petD, rpl16, and ndhA) and two duplicated genes (rpl2 and ndhB), each present in two copies (Fig. 5A). The rps12 gene is trans-spliced (Fig. 5B). The complete cp genome sequence was deposited in GenBank under accession number PQ768872.1, with associated BioProject, SRA, and BioSample numbers PRJNA1331504, SRR35708821, and SAMN51604990, respectively.
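The gene counts reconcile as follows, assuming (as the ×2 notation suggests) that genes present in two copies are counted once per copy. The snippet is simple bookkeeping using only the numbers and gene names given in the text.

```python
# Numbers and gene names as given in the text; duplicated genes are counted once per copy.
protein_coding = {
    "photosynthesis": 44,
    "self-replication": 29,
    "other": len(["accD", "ccsA", "cemA", "clpP", "infA", "matK"]),
    "conserved ORFs": len(["ycf1", "ycf15", "ycf15", "ycf2", "ycf2", "ycf4"]),
}
n_protein_coding = sum(protein_coding.values())
n_total = n_protein_coding + 36 + 8  # plus tRNA genes and rRNA genes

single_copy_cis = ["rps16", "atpF", "rpoC1", "ycf3", "clpP", "petB", "petD", "rpl16", "ndhA"]
duplicated_cis = ["rpl2", "ndhB"]  # two copies each
n_cis = len(single_copy_cis) + 2 * len(duplicated_cis)

print(n_protein_coding, n_total, n_cis)  # 85 129 13
```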
SSR locus analysis
A total of 299 SSR loci were identified in the cp genome (161,584 bp), including 85 compound SSRs (Fig. 6). The detected SSRs ranged from mono- to hexanucleotide repeats, with shorter motifs generally more common. Mononucleotide repeats were the most abundant, comprising 181 loci (60.54%), and were dominated by A/T motifs (177), whereas C/G repeats were rare (4). Trinucleotide repeats were the second most frequent (99; 33.11%), followed by dinucleotide repeats (13; 4.35%). Tetranucleotide, pentanucleotide, and hexanucleotide repeats were present in very low numbers, with 4 (1.34%), 1 (0.33%), and 1 (0.33%) loci, respectively. Overall, SSR frequency tended to decrease as motif length increased, the main exception being that trinucleotide repeats outnumbered dinucleotide repeats. The strong predominance of A/T-rich mononucleotide repeats is typical of cp genomes and has been widely reported in previous plastome studies (Liu et al., 2021; Lubna et al., 2025). This pattern is often attributed to replication slippage and the inherent AT richness of cp DNA (Golchini and Soorni, 2026). The identified SSRs may serve as useful molecular markers for future studies on genetic diversity, species identification, and phylogenetic analysis (Khade et al., 2025).
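The percentages reported for each repeat type are the locus counts divided by the 299 SSRs. The snippet below recomputes them, rounding to two decimal places as in the text, and checks that the A/T and C/G mononucleotide counts add up to the mononucleotide total.

```python
ssr_counts = {"mono": 181, "di": 13, "tri": 99, "tetra": 4, "penta": 1, "hexa": 1}
total = sum(ssr_counts.values())  # 299

percent = {motif: round(100 * n / total, 2) for motif, n in ssr_counts.items()}

at_mono, cg_mono = 177, 4  # A/T and C/G mononucleotide loci
assert at_mono + cg_mono == ssr_counts["mono"]

print(total, percent)
# 299 {'mono': 60.54, 'di': 4.35, 'tri': 33.11, 'tetra': 1.34, 'penta': 0.33, 'hexa': 0.33}
```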
IR boundary analysis among three Dillenia species
The expansion and contraction of IR regions are recognized as major driving forces in the evolution of land plant plastomes (Xiong et al., 2009). In this study, we compared the positions of the four junctions between the IRs and the single-copy regions (JLB, LSC/IRB; JSB, IRB/SSC; JSA, SSC/IRA; and JLA, IRA/LSC) across three Dillenia plastomes (Fig. 7). The IR regions were similar in length, ranging from approximately 26,457 to 26,497 bp, indicating only minor contraction or expansion among the three plastomes. The JLB (LSC/IRB) boundary, flanked by rpl22 and rps19, showed a similar pattern across the three plastomes, with slight variation in the position of rps19, which extended into the IR region by approximately 47 bp in D. indica, 22 bp in D. turbinata, and 3 bp in D. philippinensis, indicating small shifts associated with IR boundary dynamics. The JSB (IRB/SSC) boundary, flanked by ycf1 and ndhF, was also largely conserved: ndhF extended into the IRB region by approximately 11 bp in D. indica, 32 bp in D. philippinensis, and 57 bp in D. turbinata, with the remainder of the gene located in the SSC region. At the JSA (SSC/IRA) boundary, ycf1 crossed into the IR region in all species, with IR-located segments of approximately 1,136 bp in D. indica and 1,129 bp in both D. philippinensis and D. turbinata; the portion of ycf1 within the SSC region ranged from 4,529 to 4,537 bp across the species. At the JLA (IRA/LSC) boundary, the intergenic spacer between the junction and trnH (located in the LSC region) was 107 bp in D. indica, 58 bp in D. philippinensis, and 19 bp in D. turbinata. These results indicate that Dillenia plastomes are highly conserved in structure, with minor IR expansion, particularly in D. turbinata, contributing to slight differences in genome size among species.
Phylogenetic analysis
The maximum-likelihood tree reconstructed from the available whole cp genomes revealed a close evolutionary relationship among D. philippinensis, D. turbinata, and D. indica, which together form a well-supported monophyletic clade, with Tetracera sarmentosa as their sister group (Fig. 8). These results are consistent with previous studies on the infrafamilial relationships of Dilleniaceae, in which Tetracera, the only pantropical genus in the family, was inferred from plastid loci to be sister to all other Dilleniaceae (Horn, 2009). However, certain limitations should be noted. The present analysis relies solely on cp genome data, and taxon sampling remains limited owing to the scarcity of complete plastome sequences within Dilleniaceae. Future studies incorporating broader taxon sampling and additional genomic datasets will be essential to further resolve phylogenetic relationships within the family. Nevertheless, this study provides a new plastome sequence for the genus Dillenia, contributing to an improved understanding of evolutionary relationships within Dilleniaceae. The plastome data also represent a valuable genomic resource for species identification and comparative genomics. Moreover, they provide baseline information for downstream conservation genetic research on species of conservation concern such as D. philippinensis, by facilitating the development of SSR markers and enabling population genetic analyses (Hu et al., 2024).








