46 results for Whole Genome Sequences
Abstract:
Background: Despite the small number of ursid species, bear phylogeny has long been a focus of study due to their conservation value, as all bear genera have been classified as endangered at either the species or subspecies level. The Ursidae family repre
Abstract:
The human genome project has been recently complemented by whole-genome assessment sequence of 32 mammals and 24 nonmammalian vertebrate species suitable for comparative genomic analyses. Here we anticipate a precipitous drop in costs and increase in sequ
Abstract:
Mitochondria are essential for cellular energy production in most eukaryotic organisms. However, when glucose is abundant, yeast species that underwent whole-genome duplication (WGD) mostly conduct fermentation even under aerobic conditions, and most can
Abstract:
Transcription factor binding sites (TFBS) play key roles in gene expression and regulation. They are short sequence segments with definite structure that can be recognized correctly by the corresponding transcription factors. From a statistical viewpoint, candidate TFBS should be quite different from segments formed by randomly combining nucleotides. This paper proposes a combined statistical model for finding over-represented short sequence segments in different kinds of data sets. While the over-represented short sequence segment is described by a position weight matrix, the nucleotide distribution at most sites of the segment should be far from the background nucleotide distribution. The central idea of this approach is to search for such signals. The algorithm is tested on three data sets: a binding-site data set for the cyclic AMP receptor protein in E. coli; PlantProm DB, a non-redundant collection of proximal promoter sequences from different species; and a collection of the intergenic sequences of the whole genome of E. coli. Even though the complexity of these three data sets is quite different, the results show that this model is rather general and sensible.
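The abstract above does not spell out its combined model, so the following is only a minimal sketch of the two ingredients it names: a position weight matrix built from aligned sites, and a per-position measure of how far the nucleotide distribution lies from the background. The relative-entropy score, the pseudocount, the uniform background and the example sites are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def position_weight_matrix(sites, pseudocount=1.0):
    """Return per-position nucleotide frequencies for equal-length sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for i in range(length):
        counts = Counter(s[i] for s in sites)
        pwm.append({b: (counts.get(b, 0) + pseudocount) / total for b in ALPHABET})
    return pwm


def relative_entropy(column, background):
    """Kullback-Leibler divergence (in bits) of one PWM column from the background."""
    return sum(p * math.log2(p / background[b]) for b, p in column.items() if p > 0)


def segment_score(pwm, background):
    """Sum the per-position divergences; larger means further from background."""
    return sum(relative_entropy(col, background) for col in pwm)


# Hypothetical inputs: a uniform background and two toy sets of aligned sites.
background = {b: 0.25 for b in ALPHABET}
conserved = ["TGTGA", "TGTGA", "TGTGA", "TGTGC"]
mixed = ["ACGTA", "CGTAC", "GTACG", "TACGT"]

conserved_score = segment_score(position_weight_matrix(conserved), background)
mixed_score = segment_score(position_weight_matrix(mixed), background)
```

Under these assumptions a strongly conserved set of sites scores well above a set whose columns match the background, which is the kind of signal the abstract describes searching for.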
Abstract:
With complete sets of chromosome-specific painting probes derived from flow-sorted chromosomes of human and grey squirrel (Sciurus carolinensis), the whole genome homologies between human and representatives of tree squirrels (Sciurus carolinensis, Callosciurus erythraeus), flying squirrels (Petaurista albiventer) and chipmunks (Tamias sibiricus) have been defined by cross-species chromosome painting. The results show that, unlike the highly rearranged karyotypes of mouse and rat, the karyotypes of squirrels are highly conserved. Two methods have been used to reconstruct the genome phylogeny of squirrels with the laboratory rabbit (Oryctolagus cuniculus) as the out-group: (1) phylogenetic analysis by parsimony using chromosomal characters identified by comparative cytogenetic approaches; (2) mapping the genome rearrangements onto recently published sequence-based molecular trees. Our chromosome painting results, in combination with molecular data, show that flying squirrels are phylogenetically close to New World tree squirrels. Chromosome painting and G-banding comparisons place chipmunks (Tamias sibiricus), with a derived karyotype, outside the clade comprising tree and flying squirrels. The superorder Glires (order Rodentia + order Lagomorpha) is firmly supported by two conserved syntenic associations between human chromosomes 1 and 10p homologues, and between 9 and 11 homologues.
Abstract:
We report improved whole-genome shotgun sequences for the genomes of indica and japonica rice, both with multimegabase contiguity, or almost 1,000-fold improvement over the drafts of 2002. Tested against a nonredundant collection of 19,079 full-length cDNAs, 97.7% of the genes are aligned, without fragmentation, to the mapped superscaffolds of one or the other genome. We introduce a gene identification procedure for plants that does not rely on similarity to known genes to remove erroneous predictions resulting from transposable elements. Using the available EST data to adjust for residual errors in the predictions, the estimated gene count is at least 38,000-40,000. Only 2%-3% of the genes are unique to any one subspecies, comparable to the amount of sequence that might still be missing. Despite this lack of variation in gene content, there is enormous variation in the intergenic regions. At least a quarter of the two sequences could not be aligned, and where they could be aligned, single nucleotide polymorphism (SNP) rates varied from as little as 3.0 SNP/kb in the coding regions to 27.6 SNP/kb in the transposable elements. A more inclusive new approach for analyzing duplication history is introduced here. It reveals an ancient whole-genome duplication, a recent segmental duplication on Chromosomes 11 and 12, and massive ongoing individual gene duplications. We find 18 distinct pairs of duplicated segments that cover 65.7% of the genome; 17 of these pairs date back to a common time before the divergence of the grasses. More important, ongoing individual gene duplications provide a never-ending source of raw material for gene genesis and are major contributors to the differences between members of the grass family.
Sequencing, annotation and comparative analysis of nine BACs of giant panda (Ailuropoda melanoleuca)
Abstract:
A 10-fold BAC library for giant panda was constructed and nine BACs were selected to generate finished sequences. These BACs could be used as a validation resource for the de novo assembly accuracy of the whole genome shotgun sequencing reads of giant panda newly generated by the Illumina GA sequencing technology. Complete Sanger sequencing, assembly, annotation and comparative analysis were carried out on the selected BACs, which have a combined length of 878 kb. Homologue search and de novo prediction methods were used to annotate genes and repeats. Twelve protein coding genes were predicted, seven of which could be functionally annotated. The seven genes have an average gene size of about 41 kb, an average coding size of about 1.2 kb and an average exon number of 6 per gene. In addition, seven tRNA genes were found. About 27 percent of the BAC sequence is composed of repeats. A phylogenetic tree was constructed using the neighbor-joining algorithm across five species, including giant panda, human, dog, cat and mouse, which reconfirms dog as the species most closely related to giant panda. Our results provide detailed sequence and structure information for new genes and repeats of giant panda, which will be helpful for further studies on the giant panda.
Abstract:
Using next-generation sequencing technology alone, we have successfully generated and assembled a draft sequence of the giant panda genome. The assembled contigs (2.25 gigabases (Gb)) cover approximately 94% of the whole genome, and the remaining gaps (0.05 Gb) seem to contain carnivore-specific repeats and tandem repeats. Comparisons with the dog and human showed that the panda genome has a lower divergence rate. The assessment of panda genes potentially underlying some of its unique traits indicated that its bamboo diet might be more dependent on its gut microbiome than its own genetic composition. We also identified more than 2.7 million heterozygous single nucleotide polymorphisms in the diploid genome. Our data and analyses provide a foundation for promoting mammalian genetic research, and demonstrate the feasibility of using next-generation sequencing technologies for accurate, cost-effective and rapid de novo assembly of large eukaryotic genomes.
Abstract:
Background: Giardia are a group of widespread intestinal protozoan parasites in a number of vertebrates. Much evidence from G. lamblia indicated they might be the most primitive extant eukaryotes. When and how such a group of the earliest branching unicellular eukaryotes developed the ability to successfully parasitize the latest branching higher eukaryotes (vertebrates) is an intriguing question. Gene duplication has long been thought to be the most common mechanism in the production of primary resources for the origin of evolutionary novelties. In order to parse the evolutionary trajectory of the Giardia parasitic lifestyle, here we carried out a genome-wide analysis of gene duplication patterns in G. lamblia. Results: Although genomic comparison showed that in G. lamblia the contents of many fundamental biologic pathways are simplified and the whole genome is very compact, in our study 40% of its genes were identified as duplicated genes. Evolutionary distance analyses of these duplicated genes indicated that two rounds of large-scale duplication events had occurred in the G. lamblia genome. Functional annotation of them further showed that the majority of recently duplicated genes are VSPs (Variant-specific Surface Proteins), which are essential for the successful parasitic life of Giardia in hosts. Based on evolutionary comparison with their hosts, it was found that the rapid expansion of VSPs in G. lamblia is consistent with the evolutionary radiation of placental mammals. Conclusions: Based on the genome-wide analysis of duplicated genes in G. lamblia, we found that gene duplication was essential for the origin and evolution of the Giardia parasitic lifestyle. The recent expansion of VSPs uniquely occurring in G. lamblia is consistent with the increment of its hosts. Therefore we proposed a hypothesis that the increment of Giardia hosts might be the driving force for the rapid expansion of VSPs.
Abstract:
The Sox gene family is found in a broad range of animal taxa and encodes important gene regulatory proteins involved in a variety of developmental processes. We have obtained clones representing the HMG boxes of twelve Sox genes from grass carp (Ctenopharyngodon idella), one of the four major domestic carps in China. The cloned Sox genes belong to groups B1, B2 and C. Our analyses show that whereas the human genome contains a single copy of Sox4, Sox11 and Sox14, each of these genes has two co-orthologs in grass carp, and the duplication of Sox4 and Sox11 occurred before the divergence of grass carp and zebrafish, which supports the "fish-specific whole-genome duplication" theory. An estimation for the origin of grass carp based on the molecular clock using Sox1, Sox3 and Sox11 genes as markers indicates that grass carp (subfamily Leuciscinae) and zebrafish (subfamily Danioninae) diverged approximately 60 million years ago. The potential uses of Sox genes as markers in revealing the evolutionary history of grass carp are discussed.
Abstract:
Full-length and partial genome sequences of four members of the genus Aquareovirus, family Reoviridae (Golden shiner reovirus, Grass carp reovirus, Striped bass reovirus and golden ide reovirus) were characterized. Based on sequence comparison, the unclassified Grass carp reovirus was shown to be a member of the species Aquareovirus C. The status of golden ide reovirus, another unclassified aquareovirus, was also examined. Sequence analysis showed that it did not belong to the species Aquareovirus A or C, but assessment of its relationship to the species Aquareovirus B, D, E and F was hampered by the absence of genetic data from these species. In agreement with previous reports of ultrastructural resemblance between aquareoviruses and orthoreoviruses, genetic analysis revealed homology in the genes of the two groups. This homology concerned eight of the 11 segments of the aquareovirus genome (amino acid identity 17-42%), and similar genetic organization was observed in two other segments. The conserved terminal sequences in the genomes of members of the two groups were also similar. These data are undoubtedly an indication of the common evolutionary origin of these viruses. This clear genetic relatedness between members of distinct genera is unique within the family Reoviridae. Such a genetic relationship is usually observed between members of a single genus. However, the current taxonomic classification of aquareoviruses and orthoreoviruses in two different genera is supported by a number of characteristics, including their distinct G+C contents, unequal numbers of genome segments, absence of an antigenic relationship, different cytopathic effects and specific econiches.
Abstract:
Vibrio anguillarum is a common bacterial pathogen in fish. However, little is known about its pathogenic mechanism, in part because the entire genome has not been completely sequenced. We constructed a fosmid library for V. anguillarum containing 960 clones with an average insert size of 37.7 kb and 8.6-fold genome coverage. We characterized the library by end-sequencing 50 randomly selected clones. This generated 93 sequences with a total length of 57,485 bp, covering 1.4% of the whole genome. Of these sequences, 58 (62.4%) were homologous to known genes, 30 (32.3%) were genes with hypothetical functions, and the remaining 5 (5.3%) were unknown genes. We demonstrated the utility of this library by PCR screening of 10 genes. This resulted in an average of 6.2 fosmid clones per screening. This fosmid library offers a new tool for gene screening and cloning of V. anguillarum, and for comparative genomic studies among Vibrio species.
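The coverage figures in this abstract can be cross-checked with simple arithmetic. The sketch below is an illustration only: the function names are hypothetical, the genome size is inferred from the reported clone count, insert size and fold coverage rather than stated in the abstract, and the last function uses the standard Clarke-Carbon estimate, which the abstract does not mention.

```python
def implied_genome_size_kb(n_clones, mean_insert_kb, fold_coverage):
    """Genome size implied by fold coverage = clones * insert size / genome size."""
    return n_clones * mean_insert_kb / fold_coverage


def sampled_fraction(sequenced_bp, genome_kb):
    """Fraction of the genome represented by the sequenced bases."""
    return sequenced_bp / (genome_kb * 1000)


def probability_locus_present(n_clones, mean_insert_kb, genome_kb):
    """Clarke-Carbon estimate that a given locus is in at least one clone."""
    insert_fraction = mean_insert_kb / genome_kb
    return 1 - (1 - insert_fraction) ** n_clones


# Values reported in the abstract: 960 clones, 37.7 kb inserts, 8.6-fold coverage.
genome_kb = implied_genome_size_kb(960, 37.7, 8.6)
print(round(genome_kb))                                    # 4208
print(round(100 * sampled_fraction(57485, genome_kb), 1))  # 1.4
print(probability_locus_present(960, 37.7, genome_kb) > 0.999)  # True
```

The implied genome size of roughly 4.2 Mb reproduces the reported 1.4% figure for 57,485 sequenced bases, which supports reading the garbled length in the source as base pairs.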
Abstract:
Arthrospira (Spirulina) (Setchell & Gardner) is an important cyanobacterium, not only for its nutritional potential but also for its special biological characteristics. An unbiased fosmid library of Arthrospira maxima FACHB438 that contains 4300 clones was constructed. The size distribution of insert fragments is from 15.5 to 48.9 kb and the average size is 37.6 kb. The recombination frequency is 100%. The library therefore represents 29.9 genome equivalents, given the Arthrospira genome size of 5.4 Mb. A total of 719 sample clones were randomly chosen from the library, yielding 602 usable sequences totalling 307,547 bases and covering 5.70% of the whole genome. The codon usage of A. maxima was not strongly biased. GC content at the first position of codons (46.9%) was higher than at the second (39.8%) and the third (45.5%) positions. GC content of the genome was 43.6%. Of these sequences, 287 (47.7%) showed high similarities to known genes, 63 (10.5%) to hypothetical genes and the remaining 252 (41.8%) had no significant similarities. The assigned genes were classified into 22 categories with respect to different biological roles. Remarkably, the high presence of 25 sequences (4.2%) encoding reverse transcriptase indicates the RT gene may have multiple copies in the A. maxima genome and might play an important role in the evolutionary history and metabolic regulation. In addition, the sequences encoding the ATP-binding cassette transport system and the two-component signal transduction system were the second and third most frequent genes, respectively. These genomic features provide some clues as to the mechanisms by which this organism adapts to the high concentration of bicarbonate and to the high pH environment.
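The abstract reports GC content separately for the three codon positions. As a rough sketch of how such values can be computed, the function below tallies G and C at each position of an in-frame coding sequence. The function name and the toy sequence are hypothetical, and the abstract does not describe the authors' actual procedure.

```python
def gc_by_codon_position(cds):
    """Return GC fractions at codon positions 1, 2 and 3 of an in-frame CDS."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    totals = [0, 0, 0]
    gc_counts = [0, 0, 0]
    for i in range(usable):
        position = i % 3
        totals[position] += 1
        if cds[i] in "GC":
            gc_counts[position] += 1
    return [g / t if t else 0.0 for g, t in zip(gc_counts, totals)]


# Hypothetical toy sequence with codons ATG, GCC and TAA.
gc1, gc2, gc3 = gc_by_codon_position("ATGGCCTAA")
```

For real data the same tally would be run over every annotated coding sequence and the counts pooled before dividing.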
Abstract:
Galloanserae is an ancient and diverse avian group, for which comprehensive molecular evidence relevant to phylogenetic analysis in the context of molecular chronology is lacking. In this study, we present two additional mitochondrial genome sequences of Galloanserae (the whistling duck, Dendrocygna javanica, and the black swan, Cygnus atratus) to broaden the scope of molecular phylogenetic reconstruction. The lengths of the whistling duck's and black swan's mitochondrial genomes are 16,753 and 16,748 bases, respectively. Phylogenetic analyses suggest that Dendrocygna is more likely to be in a basal position of the branch consisting of Anatinae and Anserinae, an affiliation that does not conform to its traditional classification. Bayesian approaches were employed to provide a rough timescale for Galloanserae evolution. In general, a narrow range of 95% confidence intervals gave younger estimates than those based on limited genes and estimated that at least two lineages originated before the Coniacian epoch around 90 MYA, well before the Cretaceous-Tertiary boundary. In addition, these results, which were compatible with estimates from fossil evidence, also imply that the origin of numerous genera in Anseriformes took place in the late Oligocene to early Miocene. Taken together, the results presented here provide a working framework for future research on Galloanserae evolution, and they underline the utility of whole mitochondrial genome sequences for the resolution of deep divergence.
Abstract:
The Chinese pangolin (Manis pentadactyla), a representative species of the order Pholidota, has been enlisted in the mammalian whole-genome sequencing project mainly because of its phylogenetic importance. Previous studies showed that the diploid number o