26 results for phylogenomics
Abstract:
A set of Chinese muntjac (Muntiacus reevesi) chromosome-specific paints has been hybridized onto the metaphases of sika deer (Cervus nippon, CNI, 2n = 66), red deer (Cervus elaphus, CEL, 2n = 62) and tufted deer (Elaphodus cephalophus, ECE, 2n = 47). Thir
Abstract:
BACKGROUND: The evolutionary relationships of modern birds are among the most challenging to understand in systematic biology and have been debated for centuries. To address this challenge, we assembled or collected the genomes of 48 avian species spanning most orders of birds, including all Neognathae and two of the five Palaeognathae orders, and used the genomes to construct a genome-scale avian phylogenetic tree and perform comparative genomics analyses (Jarvis et al. in press; Zhang et al. in press). Here we release assemblies and datasets associated with the comparative genome analyses, which include 38 newly sequenced avian genomes plus previously released or simultaneously released genomes of Chicken, Zebra finch, Turkey, Pigeon, Peregrine falcon, Duck, Budgerigar, Adelie penguin, Emperor penguin and the Medium Ground Finch. We hope that this resource will serve future efforts in phylogenomics and comparative genomics. FINDINGS: The 38 bird genomes were sequenced using the Illumina HiSeq 2000 platform and assembled using a whole genome shotgun strategy. The 48 genomes were categorized into two groups according to the N50 scaffold size of the assemblies: a high depth group comprising 23 species sequenced at high coverage (>50X) with multiple insert size libraries resulting in N50 scaffold sizes greater than 1 Mb (except the White-throated Tinamou and Bald Eagle); and a low depth group comprising 25 species sequenced at low coverage (~30X) with two insert size libraries resulting in an average N50 scaffold size of about 50 kb. Repetitive elements comprised 4%-22% of the bird genomes. The assembled scaffolds allowed the homology-based annotation of 13,000 to 17,000 protein-coding genes in each avian genome relative to chicken, zebra finch and human, as well as comparative and sequence conservation analyses. 
CONCLUSIONS: Here we release full genome assemblies of 38 newly sequenced avian species, link to genome assembly downloads for 7 of the remaining 10 species, and provide a guideline to the genomic data that have been generated and used in our Avian Phylogenomics Project. To the best of our knowledge, the Avian Phylogenomics Project is the biggest vertebrate comparative genomics project to date. The genomic data presented here are expected to accelerate further analyses in many fields, including phylogenetics, comparative genomics, evolution, neurobiology, developmental biology, and other related areas.
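The abstract above groups assemblies by N50 scaffold size. As a reminder of what that statistic measures, here is a minimal sketch of the usual definition: the length of the scaffold at which the running total, taken from longest to shortest, first reaches half of the assembly size. The function name `n50` and the toy lengths are illustrative assumptions, not values or code from the project.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)   # longest scaffold first
    half_total = sum(ordered) / 2             # half of the assembly size
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:             # crossed the halfway point
            return length


# Toy assembly (hypothetical lengths): total 290, half 145.
toy_assembly = [80, 70, 50, 40, 30, 20]
print(n50(toy_assembly))  # 70, because 80 + 70 = 150 >= 145
```

A larger N50 means that more of the assembly sits in long scaffolds, which is why the high depth group (N50 above 1 Mb) and the low depth group (about 50 kb) are reported separately.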
Abstract:
BACKGROUND: Determining the evolutionary relationships among the major lineages of extant birds has been one of the biggest challenges in systematic biology. To address this challenge, we assembled or collected the genomes of 48 avian species spanning most orders of birds, including all Neognathae and two of the five Palaeognathae orders. We used these genomes to construct a genome-scale avian phylogenetic tree and perform comparative genomic analyses. FINDINGS: Here we present the datasets associated with the phylogenomic analyses, which include sequence alignment files consisting of nucleotides, amino acids, indels, and transposable elements, as well as tree files containing gene trees and species trees. Inferring an accurate phylogeny required generating: 1) A well annotated data set across species based on genome synteny; 2) Alignments with unaligned or incorrectly overaligned sequences filtered out; and 3) Diverse data sets, including genes and their inferred trees, indels, and transposable elements. Our total evidence nucleotide tree (TENT) data set (consisting of exons, introns, and UCEs) gave what we consider our most reliable species tree when using the concatenation-based ExaML algorithm or when using statistical binning with the coalescence-based MP-EST algorithm (which we refer to as MP-EST*). Other data sets, such as the coding sequence of some exons, revealed other properties of genome evolution, namely convergence. CONCLUSIONS: The Avian Phylogenomics Project is the largest vertebrate phylogenomics project to date that we are aware of. The sequence, alignment, and tree data are expected to accelerate analyses in phylogenomics and other related areas.
Abstract:
Affiliation: H. Philippe: Département de Biochimie, Université de Montréal
Abstract:
Affiliation: Département de Biochimie, Université de Montréal
Abstract:
Although fungi are routinely used as models for studying eukaryotic systems, their phylogenetic relationships still raise controversial questions. Among these, the classification of the zygomycetes remains inconsistent. They are potentially paraphyletic, i.e. they group together fungal lineages that are not directly related. The phylogenetic position of the genus Schizosaccharomyces is also controversial: does it belong to the Taphrinomycotina (previously known as archiascomycetes), as predicted by analyses of nuclear genes, or is it instead related to the Saccharomycotina (budding yeasts), as suggested by mitochondrial phylogeny? Another question concerns the phylogenetic position of the nucleariids, a group of amoeboid eukaryotes thought to be closely related to fungi. Earlier multi-gene analyses were inconclusive because they sampled few taxa and used only six nuclear genes. We addressed these questions through phylogenetic inference and statistical tests applied to nuclear and mitochondrial phylogenomic data sets. According to our results, the zygomycetes are paraphyletic (Chapter 2), although the phylogenetic signal in the available mitochondrial data set is insufficient to resolve the branching order within this phylum with significant statistical confidence. In Chapter 3, we show with a large nuclear data set (more than one hundred proteins) and conclusive statistical support that the genus Schizosaccharomyces belongs to the Taphrinomycotina. 
Moreover, we demonstrate that the conflicting grouping of Schizosaccharomyces with the Saccharomycotina, obtained from mitochondrial data, results from a known type of phylogenetic error: long-branch attraction (LBA), an artifact that groups together species whose rapid evolutionary rate is not representative of their true position in the phylogenetic tree. In Chapter 4, again using a large nuclear data set, we demonstrate with significant statistical support that the nucleariids are the group most closely related to fungi. We also confirm, with significant statistical support, the paraphyly of the traditional zygomycetes as previously suggested, although we cannot place all members of the group with confidence. Our results call into question aspects of a recent taxonomic reclassification of the zygomycetes and their neighbors, the chytridiomycetes. Countering or minimizing phylogenetic artifacts such as LBA is a major recurring issue. To this end, we developed a new method (Chapter 5) that identifies and removes from a sequence the sites showing high variation in evolutionary rate (highly heterotachous sites, HH sites); these sites are known to contribute significantly to LBA. Our method is based on a likelihood ratio test (LRT). Two previously published data sets are used to show that the gradual removal of HH sites in fast-evolving species (those sensitive to LBA) significantly increases support for the expected "true" topology, and does so more effectively than other published site-removal methods. Nevertheless, and in general, manipulating data prior to analysis is far from ideal. 
Future developments should aim to integrate the identification and weighting of HH sites into the phylogenetic inference process itself.
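The method above rests on a likelihood ratio test. The abstract does not give the models being compared or the degrees of freedom, so the following is only a generic sketch of an LRT, assuming two nested models that differ by one free parameter (so that the statistic is compared against a chi-square distribution with 1 degree of freedom). The function name and the log-likelihood values are hypothetical.

```python
import math


def lrt_pvalue_df1(lnl_null, lnl_alt):
    """Generic likelihood ratio test for nested models, assuming 1 degree of freedom.

    lnl_null: log-likelihood under the simpler (null) model
    lnl_alt:  log-likelihood under the richer (alternative) model
    Returns (statistic, p_value).
    """
    # LRT statistic: twice the gain in log-likelihood; clamp tiny negative
    # values that can arise from numerical optimisation noise.
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical per-site log-likelihoods, for illustration only.
stat, p = lrt_pvalue_df1(lnl_null=-1234.5, lnl_alt=-1230.0)
print(stat, p)
```

In a site-removal setting one could imagine running such a test per site and removing sites whose p-value falls below a chosen threshold, but the actual criterion used in the thesis is not stated in the abstract.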
Abstract:
Resolving the relationships between Metazoa and other eukaryotic groups as well as between metazoan phyla is central to the understanding of the origin and evolution of animals. The current view is based on limited data sets, either a single gene with many species (e.g., ribosomal RNA) or many genes but with only a few species. Because a reliable phylogenetic inference simultaneously requires numerous genes and numerous species, we assembled a very large data set containing 129 orthologous proteins (~30,000 aligned amino acid positions) for 36 eukaryotic species. Included in the alignments are data from the choanoflagellate Monosiga ovata, obtained through the sequencing of about 1,000 cDNAs. We provide conclusive support for choanoflagellates as the closest relative of animals and for fungi as the second closest. The monophyly of Plantae and chromalveolates was recovered but without strong statistical support. Within animals, in contrast to the monophyly of Coelomata observed in several recent large-scale analyses, we recovered a paraphyletic Coelomata, with nematodes and platyhelminths nested within. To include a diverse sample of organisms, data from EST projects were used for several species, resulting in a large amount of missing data in our alignment (about 25%). By using different approaches, we verify that the inferred phylogeny is not sensitive to these missing data. Therefore, this large data set provides a reliable phylogenetic framework for studying eukaryotic and animal evolution and will be easily extendable when large amounts of sequence information become available from a broader taxonomic range.
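The abstract above reports about 25% missing data in the alignment. As a small sketch of how such a proportion can be computed, the function below counts missing cells over all cells of an alignment. The choice of `-`, `?` and `X` as missing-data symbols is an assumption (conventions differ between tools), and the toy alignment is invented for illustration.

```python
def missing_fraction(alignment, missing_chars="-?X"):
    """Fraction of alignment cells that are gaps or unknown residues.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    missing_chars: symbols treated as missing (an assumed convention).
    """
    total = 0
    missing = 0
    for seq in alignment.values():
        total += len(seq)
        missing += sum(1 for ch in seq.upper() if ch in missing_chars)
    if total == 0:
        raise ValueError("alignment is empty")
    return missing / total


# Toy protein alignment: 3 missing cells out of 12.
toy = {"taxon_A": "MKT-", "taxon_B": "M??L", "taxon_C": "MKTL"}
print(missing_fraction(toy))  # 0.25
```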
Abstract:
The Escherichia coli O26 serogroup includes important food-borne pathogens associated with human and animal diarrheal disease. Current typing methods have revealed great genetic heterogeneity within the O26 group; the data are often inconsistent and focus only on verotoxin (VT)-positive O26 isolates. To improve current understanding of diversity within this serogroup, the genomic relatedness of VT-positive and -negative O26 strains was assessed by comparative genomic indexing. Our results clearly demonstrate that irrespective of virulence characteristics and pathotype designation, the O26 strains show greater genomic similarity to each other than to any other strain included in this study. Our data suggest that enteropathogenic and VT-expressing E. coli O26 strains represent the same clonal lineage and that VT-expressing E. coli O26 strains have gained additional virulence characteristics. Using this approach, we established the core genes which are central to the E. coli species and identified regions of variation from the E. coli K-12 chromosomal backbone.
Abstract:
Alignment-free methods, in which shared properties of sub-sequences (e.g. identity or match length) are extracted and used to compute a distance matrix, have recently been explored for phylogenetic inference. However, the scalability and robustness of these methods to key evolutionary processes remain to be investigated. Here, using simulated sequence sets of various sizes in both nucleotides and amino acids, we systematically assess the accuracy of phylogenetic inference using an alignment-free approach, based on D2 statistics, under different evolutionary scenarios. We find that compared to a multiple sequence alignment approach, D2 methods are more robust against among-site rate heterogeneity, compositional biases, genetic rearrangements and insertions/deletions, but are more sensitive to recent sequence divergence and sequence truncation. Across diverse empirical datasets, the alignment-free methods perform well for sequences sharing low divergence, at greater computation speed. Our findings provide strong evidence for the scalability and the potential use of alignment-free methods in large-scale phylogenomics.
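To make the D2 idea above concrete, here is a minimal sketch. The basic D2 statistic is the sum, over all k-mers, of the product of the k-mer counts in the two sequences. Converting D2 into a distance can be done in several ways; the normalisation used in `d2_distance` below (one minus the cosine of the count vectors) is an illustrative assumption, not the variant evaluated in the study. Function names and test sequences are hypothetical.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    """Count all overlapping k-mers in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """D2 statistic: sum over k-mers of count_a(w) * count_b(w)."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # Counter returns 0 for k-mers absent from seq_b.
    return sum(n * counts_b[w] for w, n in counts_a.items())


def d2_distance(seq_a, seq_b, k):
    """Illustrative distance: 1 - D2(a,b) / sqrt(D2(a,a) * D2(b,b))."""
    self_a = d2(seq_a, seq_a, k)
    self_b = d2(seq_b, seq_b, k)
    if self_a == 0 or self_b == 0:
        raise ValueError("sequences must be at least k characters long")
    return 1.0 - d2(seq_a, seq_b, k) / math.sqrt(self_a * self_b)


print(d2("ACGTACGT", "ACGTTT", 2))           # 6 (AC, CG and GT each contribute 2 * 1)
print(d2_distance("ACGTACGT", "ACGTTT", 2))
```

Pairwise distances computed this way for all sequence pairs would fill the distance matrix from which a tree is then built, for example with neighbor joining.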
Abstract:
Background: The obligate intracellular bacterium Chlamydia pneumoniae is a common respiratory pathogen, which has been found in a range of hosts including humans, marsupials and amphibians. Whole genome comparisons of human C. pneumoniae have previously highlighted a highly conserved nucleotide sequence, with minor but key polymorphisms and additional coding capacity when human and animal strains are compared. Results: In this study, we sequenced three Australian human C. pneumoniae strains, two of which were isolated from patients in remote indigenous communities, and compared them to all available C. pneumoniae genomes. Our study demonstrated a phylogenetically distinct human C. pneumoniae clade containing the two indigenous Australian strains, with estimates that the most recent common ancestor of these strains predates the arrival of European settlers in Australia. We describe several polymorphisms characteristic of these strains, some of which are similar in sequence to animal C. pneumoniae strains, as well as evidence to suggest that several recombination events have shaped these distinct strains. Conclusions: Our study reveals greater sequence diversity among both human and animal C. pneumoniae strains, and suggests that a wider range of strains may be circulating in the human population than current sampling indicates.
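Abstract:
The main tasks of molecular phylogenetic analysis include (1) helping to build the tree of life and (2) tracing the origin and evolution of genes and gene families, in order to understand the functional divergence of genes during evolution and the accompanying key molecular events and key innovations in morphological traits. This study addresses both. For the former, the plant mitochondrial matR gene was used to reconstruct phylogenetic relationships among the rosids of the angiosperms; for the latter, the SET gene superfamily was taken as an example to explore its evolutionary classification in eukaryotes and its relationship to functional diversity.
I. Molecular systematics of the rosids. The rosids are one of the major angiosperm clades established on the basis of molecular data, comprising 13 orders and about one third of angiosperm species. The two major subclades within the rosids are the fabids (7 orders) and the malvids (3 orders). However, relationships within these two subclades, and between them and the basal rosid lineages, including Geraniales, Myrtales and Crossosomatales, remain largely unclear. In this study, we sampled 174 species representing 72 rosid families and used two data sets, a single-gene mitochondrial matR data set and a four-gene data set comprising mitochondrial matR, two plastid genes (rbcL and atpB) and one nuclear gene (18S rDNA), to reconstruct rosid relationships above the family level. We also evaluated the evolutionary characteristics of the mitochondrial matR gene and its suitability and potential for large-scale phylogenetic analysis. The single-gene matR data support the monophyly of the malvids and of most rosid orders. However, members of the fabids did not form a single clade; their COM subclade, comprising Celastrales, Oxalidales, Malpighiales and Huaceae, was resolved as sister to the malvids. This relationship was recently proposed on the basis of floral structural characters but had never been resolved in previous molecular phylogenetic analyses. The four-gene data set supports Geraniales, followed by Myrtales, as the most basal rosid branches; Crossosomatales as sister to the malvids; and a core rosid group comprising the fabids, malvids and Crossosomatales. Analysis of the evolutionary characteristics of matR shows that its synonymous substitution rate is about one quarter of that of the two chloroplast genes (rbcL and atpB), while its nonsynonymous substitution rate is close to its own synonymous rate, indicating relaxed selective constraint on matR. The relatively slow evolution of matR reduces homoplasious mutations and improves the quality of the phylogenetic signal, while relaxed selection increases the number of nonsynonymous substitutions and compensates for the limited amount of phylogenetic information that slow evolution would otherwise provide. Together, these two properties make matR well suited to phylogenetic studies of angiosperms above the family level.
II. Phylogenomic analysis of the SET gene superfamily. Genes of the SET superfamily encode proteins containing a SET domain; in eukaryotes, SET-domain proteins are generally multi-domain. SET-domain proteins have enzymatic activity that methylates lysine residues in the N-terminal tails of histones H3 and H4; methylated histones broadly affect chromatin-level gene regulation, from heterochromatin formation to gene transcription. Based on primary sequence similarity of the SET domain and on domain architecture, the SET-domain gene superfamily is currently divided into 4 to 7 families. Because these classifications used either animal or plant SET genes, with only a few species from other groups included, they may be incomplete. This study takes a phylogenomic approach, sampling broadly across eukaryotes, with the aim of obtaining a relatively complete evolutionary classification of SET-domain gene families and, on that basis, a better understanding of the evolutionary mechanisms and functional diversity of SET-domain genes. Using SET protein sequences extracted from 17 species representing 5 eukaryotic supergroups, phylogenetic analysis combined with domain architecture identified 9 SET gene families, one of which is new. The former SET8 and Class VI families, and the SMYD and SUV4-20 families, were each merged into a single family. Most families underwent two or more gene duplication events during evolution, producing new genes with different functions through the acquisition of different domains. One SET gene family is inferred to have undergone horizontal gene transfer from a vertebrate ancestor to Dictyostelium discoideum.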
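Abstract:
The genus Oryza L. is an important group in the grass family (Poaceae), comprising more than 20 wild species and two cultivated species with ten genome types (A, B, C, E, F, G, BC, CD, HJ and HK). It harbors extremely rich genetic resources and is an important gene pool for the genetic improvement of rice. Because the polyploids among extant species of the genus all originated through hybridization between diploids, clarifying the evolutionary relationships among the diploid genomes is essential for correctly understanding the evolutionary history of the whole genus, and it also provides an important foundation for research on the evolutionary biology, comparative genomics and functional genomics of Oryza and its relatives. To date there is no consensus on the phylogenetic relationships among the Oryza genomes; in particular, the relationships among the A, B and C genomes and the identity of the basal lineage of the genus remain disputed. This study selected six Oryza species from different diploid genomes, with L. tisserantii from the related genus Leersia as the outgroup. Through detailed analysis of genome-scale multi-gene sequence data, we examined the relationships among the diploid genomes of Oryza, the mechanisms underlying conflict among gene trees, and methods for phylogenetic analysis using genome-scale multi-gene sequences. The main results are as follows.
Using the completed whole-genome sequences of the two rice subspecies (O. sativa L. ssp. indica and O. sativa L. ssp. japonica), we screened and amplified 142 single-copy nuclear gene fragments distributed across all 12 chromosomes of the nuclear genome. A concatenated analysis of all loci yielded a fully resolved tree with significant statistical support. When exons, introns and third codon positions were extracted separately from each gene and concatenated for tree building, the topology was unchanged except in the MP analysis of concatenated exons, indicating that the tree is essentially unaffected by the choice of genomic region or site class, even though different regions and sites are under different selective constraints. Resampling with replacement at the gene level also strongly supported the concatenated result, showing that the phylogenetic estimate from the concatenated multi-gene sequences is not dominated by a few unusual genes. To examine the effect of species sampling within genomes, we added two A-genome species and two further C-genome species, and amplified and sequenced 62 randomly chosen loci (the added O. sativa sequences came from the BGI-RIS database). Concatenated analysis of the 62 loci across all 11 species left the relationships among genomes unchanged. We further assessed systematic error in the concatenated data and found that the phylogenetic reconstruction was not affected by it. In summary, the tree obtained here by a phylogenomic approach reflects the true evolutionary relationships of the group.
To investigate why earlier studies produced conflicting phylogenetic relationships, we analyzed each of the 142 loci separately and used phylogenetic network methods to locate where conflict among genes is concentrated and how strong it is. Based on the single-gene analyses and the analysis of systematic error, we ruled out random error and systematic error as direct causes of the conflict among genes. Further analysis based on coalescent theory indicates that Oryza experienced two episodes of successive divergence events separated by only a short interval in generations. Large ancestral populations led to lineage sorting of genes, so that gene trees reconstructed from sequences of extant species do not correctly reflect the species tree and show genome-wide gene tree conflict. These two episodes correspond to two rapid diversification events in Oryza, during which almost all of the genome diversity of the genus arose. Random sampling analysis shows that a large amount of sequence data is needed to resolve the relationships among the diploid genomes correctly (for a 95% probability, at least 120 genes or 50 kb of randomly sampled sites). By using genome-scale concatenated multi-gene data, this study overcame the "noise" that lineage sorting introduces into tree building and obtained a correct estimate of the species tree despite widespread conflict among single-gene phylogenies, demonstrating the great potential and broad applicability of phylogenomic approaches for resolving relationships in rapidly diversifying groups.
Using the 142 nuclear genes, we made a preliminary exploration of model selection and indel coding when building trees from multi-gene sequence data, and assessed the effect of missing data on genome-scale phylogenetic reconstruction. The results show that, for concatenated data, mixed models fit the evolutionary pattern of the data better than a single model; that identifying the sources of heterogeneity in the concatenated data and partitioning the data appropriately is the key to using mixed models successfully; and that certain model components play an important role in improving fit, in particular rate heterogeneity among sites and among lineages. We argue that the most complex model is not necessarily the best, and that capturing the most important evolutionary features of the data matters far more than simply increasing model complexity. Indel coding markedly increased support for the grouping of the A and B genomes but did little to improve resolution of the basal lineage of the genus. In addition, we removed taxa with a high proportion of missing data to reduce the effect of missing data on phylogenetic inference; the resulting relationships were unchanged and support values changed only very slightly, indicating that genome-scale multi-gene data, being rich in phylogenetic information, are well buffered against missing data.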
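The gene-level resampling with replacement mentioned in this abstract can be sketched as follows. This is a minimal illustration, assuming every gene alignment contains the same set of taxa; the function name, the data layout (gene name mapped to a taxon-to-sequence dictionary) and the toy sequences are hypothetical, and the tree-building step that would follow each replicate is not shown.

```python
import random


def gene_bootstrap_replicates(genes, n_replicates, seed=0):
    """Yield gene-level bootstrap replicates of a concatenated alignment.

    genes: dict gene_name -> {taxon: aligned_sequence}; all genes are
           assumed to share the same taxa (an assumption of this sketch).
    Each replicate draws len(genes) genes with replacement and concatenates
    the sampled gene alignments per taxon.
    """
    rng = random.Random(seed)            # seeded for reproducibility
    names = sorted(genes)
    taxa = sorted(genes[names[0]])
    for _ in range(n_replicates):
        sampled = [rng.choice(names) for _ in names]   # with replacement
        concatenated = {
            taxon: "".join(genes[g][taxon] for g in sampled) for taxon in taxa
        }
        yield sampled, concatenated


# Toy data: three genes, two taxa.
toy_genes = {
    "gene1": {"sp_A": "ACGT", "sp_B": "ACGA"},
    "gene2": {"sp_A": "GG", "sp_B": "GC"},
    "gene3": {"sp_A": "TTT", "sp_B": "TTA"},
}
for sampled, alignment in gene_bootstrap_replicates(toy_genes, n_replicates=2):
    print(sampled, alignment)
```

Resampling whole genes rather than individual sites tests whether a few loci dominate the concatenated signal, which is the question the abstract raises.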