A clearly defined phylogenetic relationship can allow comparison of genetic characters between two lineages. Thus, we constructed a highly supported phylogenetic tree based on the 77 mtDNA genome sequences from the current study (including 73 donkeys from China, Tadzhikistan, Kyrgyzstan, Kenya, and Nigeria, two Asiatic wild asses, and two Tadzhikistan indigenous horses) combined with seven mtDNA genomes downloaded from NCBI (including two donkeys, two Somali wild asses, and two Asian wild asses) (Supplementary Table S1). Using the horse as an outgroup, the Somali wild ass formed a sister clade with all donkeys, a topology highly differentiated from that based on D-loop sequences, which was highly supported by the neighbor-joining, maximum likelihood, and Bayesian methods analyzed here (Figure 1). This topology was also consistent with previous study that concentrated on the major mitochondrial coding regions and placed the Somali wild ass as a sister clade outside all domestic donkeys with high confidence, although they did not label the lineage information (Sun et al., 2016). Here, we obtained consistent topologies for the major clades according to both whole mitochondrial genomes and coding regions only (Figure 1; Supplementary Figure S2). With a view that coding regions experience restricted selective sweep, whereas selective force is potentially relaxed on the D-loop regions (Endicott et al., 2009), it is reasonable to assume that the phylogenetic relationship revealed by mitochondrial genome data would be much closer to reality.
The two donkey lineages were clearly separated from each other, with high support (Figure 1), further confirming the existence of two lineages. Considering the lack of geographic representation from mtDNA genomes (58 of 75 domestic donkeys were from China), we reconstructed the phylogenetic tree using both neighbor-joining and maximum likelihood methods with D-loop sequences from the 171 sequences obtained in this study and 536 sequences publicly available on NCBI (see sample information in Supplementary Table S1 and phylogenetic tree in Supplementary Figure S3). Similarly, the donkeys were divided into two groups. Although the clades showed lower support, which is common in analysis of short segments like the D-loop region, all sequences with published lineage information clustered into their corresponding clades. In addition, the 73 donkey D-loop sequences obtained in this study showed consistent lineage sorting, as observed in the whole mtDNA genome phylogeny, further supporting the D-loop region phylogenetic relationship. Our phylogenetic tree based on D-loop sequences also revealed a sister relationship between the Somali wild ass and all domesticated donkeys (Supplementary Figure S3), consistent with the topology revealed by the mitochondrial genome, but inconsistent with previous studies showing different topology using D-loop sequences (Beja-Pereira et al., 2004; Han et al., 2014). The number of haplotypes used for phylogenetic construction may account for this discrepancy, as more haplotypes represent greater genetic diversity, and subsequently more statistical credibility. To test this, we randomly selected donkey haplotypes of equal sample size, as previously described in Han et al. (2014), and constructed a phylogenetic tree with the same parameters in this study. We obtained similar phylogenetic topology (Supplementary Figure S4), showing a closer relationship shared by the Somali wild ass and domesticated Clade II lineage. Therefore, a large sample size for phylogenetic analysis can compensate for the limited information obtained from short segments.
As the donkey samples were credibly sorted, we subsequently compared several population characteristics between the two lineages. First, we scanned sequence variation within the mitochondrial genome for each lineage. The Clade I lineage showed a total of 288 varied sites: 258 in the coding region and 30 in the non-coding regions (D-loop). The Clade II lineage showed a total of 188 varied sites: 168 in the coding region and 20 in the non-coding regions (D-loop). Figure 2A illustrates the distribution of nucleotide diversity (π) within each lineage along the genome based on assessment of 200 nt windows (step size=100 nt) centered at the midpoint. Although the highest diversity was observed in the D-loop region presenting the short tandem repeats of CACACCCACACACCCATGCGCGCA (donkey reference NC_001788: from 16 173 nt to 16 340 nt) in both lineages, the Clade I lineage possessed significantly more segregation sites (average segregation sites per window=1.300) in the coding region than the Clade II lineage (average segregation sites per window=1.091) (Wilcox test: P=0.016), whereas no significant differences were found in segregation counts in the non-coding region in the two lineages (average segregation sites per window in Clade I and Clade II were 3.057 and 2.303, respectively; Wilcox test: P=0.628). This discrepancy may be due to the saturation effect in the D-loop region, demographics, selective forces, and/or other factors suggestive of potential independent domestic histories between the two lineages. To further address this issue, we used mtDNA genomes to reconstruct ancestral population dynamics with BSP for each lineage (Figure 2B). A similar demographic history would be expected under the assumption of simultaneous domestication. However, we observed a constant effective population size in the Clade II lineage during most history, whereas the Clade I lineage experienced an apparent population increase approximately 8 000 years ago, coinciding with the archeological date 7 770±95 before present (BP) (Marshall, 2007). Noticeably, both lineages exhibited a marked decrease in recent years, probably due to the occurrence of the industrial revolution. To further confirm this discrepancy in population dynamics, we assessed the demographic history of the two lineages based on D-loop sequences, with nucleotide mismatch distribution (Rogers & Harpending, 1992), Fu's Fs test, and Tajima's D test. The nucleotide mismatch curve showed a single peak in the Clade I lineage and double peaks in the Clade II lineage (Figure 2C). Moreover, results showed that both Fu's Fs test and Tajima's D test significantly deviated from neutrality in the Clade I lineage (Tajima's D=–2.310, P<0.01; Fu's Fs=–33.909) but not in the Clade II lineage (Tajima's D=–0.993, P>0.10; Fu's Fs=–17.464). These results were in accordance with the expansion in the Clade I lineage and constant size in the Clade II lineage, confirming the demographic dynamics revealed by mitochondrial genomes.
In addition to demographic estimation, we also investigated several other genetic characters to assess whether they showed similar evolution patterns between the two lineages. We first constructed a reduced median network based on D-loop sequences (Figure 2D). Similar to previous study (Beja-Pereira et al., 2004; Kimura et al., 2011), the Clade II haplotypes were mainly derived from a single major haplotype (H16), with a simple star-like shape, whereas the genetic architecture of the Clade I haplotypes was more complicated, with more universally occurring haplotypes (e.g., H20, H52, H22). This was consistent with the much higher genetic distance and nucleotide diversity within the Clade I lineage (average pairwise distance=1.932 7, SD=1.100 7; Pi=0.008 4, SD=0.000 3) than within the Clade II lineage (average pairwise distance=1.074 4, SD=0.711 2; Pi=0.004 6, SD=0.000 5), implying that the Clade I lineage involved many more individuals at the beginning of domestication compared with the Clade II lineage (Table 1).
Pairwise distance SD (distance) Pi SD (Pi) Clade I lineage 1.932 7 1.100 7 0.008 4 0.000 3 Clade II lineage 1.074 4 0.711 2 0.004 6 0.000 5
Table 1. Genetic distance and nucleotide diversity of two clades
Independent migration events may result in a geographical structure. Therefore, we estimated the migration routes of the two lineages by referring to their proportion across the world. D-loop sequences sampled from the Balkans and microsatellites sampled from northeast Africa, the Near East, and the Arabian Peninsula indicated the absence of a geographical structure (Pérez-Pardal et al., 2014; Rosenbom et al., 2015). Indeed, samples collected from Middle Asia (Pakistan, Kazakhstan) and major areas of China demonstrated an almost equal proportion of the two lineages (Figure 3A). The Arabian Peninsula is assumed to be the melting pot from where domestic donkeys migrated to the world (Rosenbom et al., 2015). As this area showed a nearly equal proportion of the two lineages, it is reasonable that a similar pattern is commonly observed across the Eurasian mainland (Beja-Pereira et al., 2004). Nevertheless, all 20 samples collected from Nigeria belonged to the Clade I lineage. When we focused on the distribution in Africa, an apparent spatial structure was detected: donkeys from sub-Saharan Africa (e.g., Ghana, Guinea, Benin, Mali, Senegal, South Africa, and Burkina Faso) tended to be descended from the Clade I lineage, whereas the Clade II lineage was dominant along the East (e.g., Eritrea, Somalia, Swaziland, and Zambia) and North coasts (e.g., Libya, Tunisia, and Morocco) (Figure 3A).
Figure 3. Worldwide geographic distribution of two lineages plotted using “rworldmap” package in R (A) and genetic diversity of Clade I (B) and Clade II (C) lineages across China
The time of migration to areas distant from the domestication center can be inferred by dating the time to the major haplotype for the derived haplotypes only found in that area. In this way, we inferred that the Clade I lineage migrated into sub-Sahara 4 874.28±1 817.92 years ago, around the commencement of desertification in the Sahara (5 000 to 7 000 BP) (Marshall, 2000). Due to the excellent tolerance of donkeys for deserts, it is reasonable to assume that donkeys, rather than horses, were the major means of transport across the Sahara during the initial period of desertification. Given the biased Clade I lineage distribution in the sub-Sahara, this migration time provides possible evidence for the "pastoralist hypothesis": i. e., pastoralists in northeastern Africa domesticated Clade I lineage donkeys in response to the increasing aridity in the Sahara.
Donkeys are thought to be have been brought into Europe by the second millennium BC, possibly through viticulture introduction, as the donkey is associated with the Syrian god of wine, Dionysus (Meutchieye et al., 2017). Interestingly, the majority of sequences from the Iberian Peninsula belonged to the Clade II lineage, much different from those collected in other parts of Europe (Figure 3A). Additionally, the estimated time of arrival in the Iberian Peninsula was 5 336.8±1 652.56 years ago, much earlier than the known history of ~4 000 years (Cardoso et al., 2013). Considering the dominance of the Clade II lineage in Morocco, it could be assumed that donkeys migrated into the Iberian Peninsula directly through the Strait of Gibraltar. Furthermore, the unequal distribution of Clade I and Clade II in Europe may account for the pronounced footprint in American following the complex process of colonization, where an apparent geographic structure has been observed (Jordana et al., 2016; Xia et al., 2019). The Clade II lineage also migrated dominantly along East Africa, implying a potential association with ancient social expansion, such as the Bantu expansion (Hiernaux, 1968; Herrera & Garcia-Bertrand, 2018), and ancient trade routes of empires such as the Punt and Aksum (Andrews, 2017; Mark, 2011; Mukhtār, 1981).
As populations expand from centers of origin, genetic diversity is lost as a consequence of the limited numbers of individuals involved in the expansionist movement. Although no apparent geographic structure was detected across China, where the two lineages show an almost equal proportion (Figure 3A), discrepancies in diversity decay may exist between the two lineages if non-simultaneous expansion into China occurred, thereby suggesting possible private migration routes. The Xinjiang Province may be a transportation center as it demonstrated almost the highest nucleotide diversity for both lineages (Figure 3B, C), consistent with previous studies on genetic diversity among various Chinese breeds (Ge et al., 2007) and written records (Xie, 1987). The genetic diversity of Clade I remained relatively high in the Qinghai and Henan provinces, but declined gradually towards north and south, respectively (Figure 3B), consistent with a migration route from Xinjiang to the Guanzhong Plain through the Ningxia and Gansu provinces, and finally other areas of China inferred previously (Ge et al., 2007). The genetic diversity of Clade II remained relatively high in the Inner Mongolia and Yunnan provinces, but declined substantially in the middle region of China (Figure 3C), consistent with previously inferred migration routes from Xinjiang to Inner Mongolia (north forward) and to Yunnan (south forward) (Ge et al., 2007). These potential migration routes partially match written records, which suggest that the "Taihang donkey" was introduced into Hebei Province through Inner Mongolia; the "Yunnan donkey" was introduced from Xinjiang to Yunnan, as it shares many morphological traits with the Xinjiang donkey; and many other breeds may have been formed through migration routes from Xinjiang to Gansu and then Guanzhong Plain, Henan, and other areas of China (Xie, 1987). Therefore, it is likely that the two lineages expanded into China through different migration events, although from the same transportation center (Xinjiang).
Two clearly separated domesticated donkey clades
Different demographic dynamics of two lineages
Discrepancies of haplotype structure
Biased distribution of two lineages in Africa
Different routes of two lineages during expansion to China
|ZoolRes-41-1-51-Supplementary Tables and Figures.zip|