790 results for Datasets
Abstract:
A recent phylogenetic study based on multiple datasets is used as the framework for a more detailed examination of one of the ten molecularly circumscribed groups identified, the Ophrys fuciflora aggregate. The group is highly morphologically variable, prone to phenotypic convergence, shows low levels of sequence divergence and contains an unusually large proportion of threatened taxa, including the rarest Ophrys species in the UK. The aims of this study were to (a) circumscribe the minimum resolvable genetically distinct entities within the O. fuciflora aggregate, and (b) assess the likelihood of gene flow between genetically and geographically distinct entities at the species and population levels. Fifty-five accessions of the O. fuciflora aggregate sampled in Europe and Asia Minor were studied using the AFLP genetic fingerprinting technique to evaluate levels of infraspecific and interspecific genetic variation and to assess genetic relationships between UK populations of O. fuciflora s.s. in Kent and their continental European and Mediterranean counterparts. The two genetically and geographically distinct groups recovered, one located in England and central Europe and one in south-eastern Europe, are incongruent with current species delimitation both within the aggregate as a whole and within O. fuciflora s.s. Genetic diversity is higher in Kent than in the rest of western and central Europe. Gene flow is more likely to occur between populations in close geographical proximity than between populations that are morphologically more similar. Little if any gene flow occurs between populations in the south-eastern Mediterranean and those dispersed throughout the remainder of the distribution, revealing a genetic discontinuity that runs north-south through the Adriatic.
This discontinuity is also evident in other clades of Ophrys and is tentatively attributed to the long-term influence of prevailing winds on the long-distance distribution of pollinia and especially seeds. A cline of gene flow connects populations from Kent and central and southern Europe; these individuals should therefore be considered part of an extensive metapopulation. Gene flow is also evident among populations from Kent, which appear to constitute a single metapopulation. The Kent populations show some evidence of hybridization, and possibly also introgression, with O. apifera.
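AFLP data are scored as the presence or absence of bands, so genetic relationships between accessions are usually summarized with a binary distance coefficient. The abstract does not name the coefficient used; the Python sketch below assumes the Jaccard distance, which ignores shared band absences, and the band profiles and sample names are invented for illustration.

```python
from itertools import combinations

def jaccard_distance(a: list[int], b: list[int]) -> float:
    """1 - Jaccard similarity of two binary AFLP band profiles (1 = band present)."""
    if len(a) != len(b):
        raise ValueError("profiles must score the same loci")
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    # Shared absences (0/0) are ignored: a missing band is weak evidence of homology.
    return 1.0 - shared / either if either else 0.0

def distance_matrix(profiles: dict[str, list[int]]) -> dict[tuple[str, str], float]:
    """Pairwise Jaccard distances for every unordered pair of accessions."""
    return {(p, q): jaccard_distance(profiles[p], profiles[q])
            for p, q in combinations(profiles, 2)}

# Hypothetical band profiles over six AFLP loci.
profiles = {
    "kent_1": [1, 1, 0, 1, 0, 1],
    "kent_2": [1, 1, 0, 1, 1, 1],
    "greece_1": [0, 1, 1, 0, 1, 0],
}
for pair, d in distance_matrix(profiles).items():
    print(pair, round(d, 3))
```

A matrix of such distances is the usual input for ordination or clustering of accessions.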
Abstract:
Background and Aims: Highly variable, yet possibly convergent, morphology and a lack of sequence variation have severely hindered the production of a robust phylogenetic framework for the genus Ophrys. The aim of this study is to produce such a framework as a basis for more rigorous species delimitation and conservation recommendations. Methods: Nuclear and plastid DNA sequencing and amplified fragment length polymorphism (AFLP) analysis were performed on 85 accessions of Ophrys, spanning the full range of species aggregates currently recognized. Data were analysed using a combination of parsimony and Bayesian tree-building techniques and by principal coordinates analysis. Key Results: Complementary phylogenetic analyses and ordinations using nuclear, plastid and AFLP datasets identify ten genetically distinct groups (six robust) within the genus that may in turn be grouped into three sections (treated as subgenera by some authors). Additionally, genetic evidence is provided for a close relationship between the O. tenthredinifera, O. bombyliflora and O. speculum groups. The combination of these analytical techniques provides new insights into Ophrys systematics, notably the recognition of the novel O. umbilicata group. Conclusions: Heterogeneous copies of the nuclear ITS region show that some putative Ophrys species arose through hybridization rather than divergent speciation. The supposedly highly specific pseudocopulatory pollination syndrome of Ophrys is demonstrably 'leaky', suggesting that the genus has been substantially over-divided at the species level.
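Principal coordinates analysis (PCoA) ordinates accessions directly from a pairwise distance matrix, which is why it suits AFLP band data. The sketch below is a minimal classical (Gower) PCoA in Python with NumPy; the function name `pcoa` and the toy distances are illustrative assumptions, not the study's software.

```python
import numpy as np

def pcoa(d, k=2):
    """Classical (Gower) principal coordinates analysis of a symmetric distance matrix.

    Returns the first k coordinates of each object and all eigenvalues (descending).
    """
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n      # centring matrix
    b = -0.5 * j @ (d ** 2) @ j              # double-centred Gower matrix
    eigvals, eigvecs = np.linalg.eigh(b)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Negative eigenvalues (possible for non-Euclidean distances) give zero-length axes.
    coords = eigvecs[:, :k] * np.sqrt(np.clip(eigvals[:k], 0.0, None))
    return coords, eigvals

# Toy check: three points on a line at positions 0, 1 and 3.
x = np.array([0.0, 1.0, 3.0])
d = np.abs(x[:, None] - x[None, :])
coords, eigvals = pcoa(d, k=1)
print(coords.ravel(), eigvals)
```

Non-Euclidean coefficients such as Jaccard can produce negative eigenvalues; clipping them to zero for the retained axes is one common convention, not the only one.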
Abstract:
The rate at which a given site in a gene sequence alignment evolves over time may vary. This phenomenon, known as heterotachy, can bias or distort phylogenetic trees inferred from models of sequence evolution that assume constant rates of evolution. Here, we describe a phylogenetic mixture model designed to accommodate heterotachy. The method sums the likelihood of the data at each site over more than one set of branch lengths on the same tree topology. A branch-length set that is best for one site may differ from the branch-length set that is best for another site, thereby allowing different sites to have different rates of change throughout the tree. Because rate variation may not be present in all branches, we use a reversible-jump Markov chain Monte Carlo algorithm to identify those branches in which reliable amounts of heterotachy occur. We implement the method in combination with our 'pattern-heterogeneity' mixture model, applying it to simulated data and five published datasets. We find that complex evolutionary signals of heterotachy are routinely present over and above variation in the rate or pattern of evolution across sites, that the reversible-jump method requires far fewer parameters than conventional mixture models to describe them, and that it identifies the regions of the tree in which heterotachy is most pronounced. The reversible-jump procedure also removes the need for a posteriori tests of 'significance' such as the Akaike or Bayesian information criteria, or Bayes factors. Heterotachy has important consequences for the correct reconstruction of phylogenies as well as for tests of hypotheses that rely on accurate branch-length information, including molecular clocks, analyses of the tempo and mode of evolution, comparative studies and ancestral state reconstruction. The model is available from the authors' website and can be used for the analysis of both nucleotide and morphological data.
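The core of the heterotachy mixture is that each site's likelihood is a weighted sum over several branch-length sets on one topology: L_i = sum over k of w_k * L(D_i | T, b_k). To keep the illustration short, the Python sketch below assumes a two-taxon tree under the Jukes-Cantor (JC69) model, so each branch-length set collapses to a single path length; it omits the reversible-jump step and the pattern-heterogeneity component, and the sequences and weights are invented.

```python
import math

def jc69_site_likelihood(x: str, y: str, t: float) -> float:
    """Likelihood of tip bases x and y separated by path length t under JC69."""
    e = math.exp(-4.0 * t / 3.0)
    p_xy = 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e  # transition probability
    return 0.25 * p_xy  # multiplied by the stationary frequency of x

def mixture_log_likelihood(seq1: str, seq2: str,
                           path_lengths: list[float], weights: list[float]) -> float:
    """Sum over sites of log(sum_k w_k * L(site | t_k)): a separate mixture at each site."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    total = 0.0
    for x, y in zip(seq1, seq2):
        site = sum(w * jc69_site_likelihood(x, y, t) for t, w in zip(path_lengths, weights))
        total += math.log(site)
    return total

seq_a = "ACGTACGTAAGGTTCC"
seq_b = "ACGTACGAAAGCTTCA"
single = mixture_log_likelihood(seq_a, seq_b, [0.2], [1.0])
mixed = mixture_log_likelihood(seq_a, seq_b, [0.05, 1.0], [0.7, 0.3])
print(f"one branch-length set: {single:.3f}, two-set mixture: {mixed:.3f}")
```

In the full method each component would be a complete set of branch lengths on a multi-taxon tree, evaluated with Felsenstein's pruning algorithm rather than this closed-form pair likelihood.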
Abstract:
A phylogenetic approach was taken to investigate the evolutionary history of seed appendages in the plant family Polygalaceae (Fabales) and to determine which factors might be associated with the evolution of elaiosomes through comparisons with abiotic (climate) and biotic (ant species number and abundance) timelines. Molecular datasets from three plastid regions representing 160 species were used to reconstruct a phylogenetic tree of the order Fabales, focusing on Polygalaceae. Bayesian dating methods were used to estimate the age of the appearance of ant-dispersed elaiosomes in Polygalaceae, shown by likelihood optimizations to have a single origin in the family. Topology-based tests indicated a diversification rate shift associated with the appearance of caruncular elaiosomes. We show that evolution of the caruncular elaiosome type currently associated with ant dispersal occurred 54.0-50.5 million years ago. This is long after an estimated increase in ant lineages in the Late Cretaceous based on molecular studies, but broadly concomitant with increasing global temperatures culminating in the Late Paleocene-Early Eocene thermal maxima. These results suggest that although most major ant clades were present when elaiosomes appeared, the environmental significance of elaiosomes may have been an important factor in the success of elaiosome-bearing lineages. The ecological abundance of ants is perhaps more important than lineage numbers in determining the significance of ant dispersal. Thus, our observation that elaiosomes predate the increased ecological abundance of ants inferred from amber deposits could indicate an initial abiotic environmental function.
Abstract:
Stable isotope labeling combined with MS is a powerful method for measuring relative protein abundances, for instance, by differential metabolic labeling of some or all amino acids with ¹⁴N and ¹⁵N in cell culture or hydroponic media. These and most other types of quantitative proteomics experiments using high-throughput technologies, such as LC-MS/MS, generate large amounts of raw MS data. This data needs to be processed efficiently and automatically, from the mass spectrometer to statistically evaluated protein identifications and abundance ratios. This paper describes in detail an approach to the automated analysis of uniformly ¹⁴N/¹⁵N-labeled proteins using MASCOT peptide identification in conjunction with the trans-proteomic pipeline (TPP) and a few scripts to integrate the analysis workflow. Two large proteomic datasets from uniformly labeled Arabidopsis thaliana were used to illustrate the analysis pipeline. The pipeline can be fully automated and uses only common or freely available software.
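With uniform ¹⁵N labeling, the heavy form of a peptide is shifted by the ¹⁵N/¹⁴N mass difference (about 0.997 Da) for every nitrogen atom, so pairing light and heavy peaks requires each peptide's nitrogen count. The Python sketch below computes that shift and a simple light/heavy ratio. It is not part of MASCOT or the TPP, the helper names are hypothetical, and it assumes unmodified peptides with complete ¹⁵N incorporation.

```python
# Nitrogen atoms per amino acid residue (backbone amide N plus side-chain N).
N_PER_RESIDUE = {
    "A": 1, "R": 4, "N": 2, "D": 1, "C": 1, "E": 1, "Q": 2, "G": 1, "H": 3, "I": 1,
    "L": 1, "K": 2, "M": 1, "F": 1, "P": 1, "S": 1, "T": 1, "W": 2, "Y": 1, "V": 1,
}
# Monoisotopic mass difference between 15N and 14N, in daltons.
DELTA_15N = 15.0001089 - 14.0030740

def nitrogen_count(peptide: str) -> int:
    """Number of nitrogen atoms in an unmodified peptide."""
    return sum(N_PER_RESIDUE[aa] for aa in peptide.upper())

def heavy_light_mz_shift(peptide: str, charge: int) -> float:
    """Expected m/z spacing between the 14N and fully 15N-labeled forms of a peptide ion."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return nitrogen_count(peptide) * DELTA_15N / charge

def light_to_heavy_ratio(light_intensity: float, heavy_intensity: float) -> float:
    """Relative abundance estimate from paired peak intensities (or areas)."""
    return light_intensity / heavy_intensity

print(nitrogen_count("PEPTIDE"), round(heavy_light_mz_shift("PEPTIDE", 2), 4))  # 7 3.4896
```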
Abstract:
Nested clade phylogeographic analysis (NCPA) is a popular method for reconstructing the demographic history of spatially distributed populations from genetic data. Although some parts of the analysis are automated, there is no unique and widely followed algorithm for doing this in its entirety, beginning with the data, and ending with the inferences drawn from the data. This article describes a method that automates NCPA, thereby providing a framework for replicating analyses in an objective way. To do so, a number of decisions need to be made so that the automated implementation is representative of previous analyses. We review how the NCPA procedure has evolved since its inception and conclude that there is scope for some variability in the manual application of NCPA. We apply the automated software to three published datasets previously analyzed manually and replicate many details of the manual analyses, suggesting that the current algorithm is representative of how a typical user will perform NCPA. We simulate a large number of replicate datasets for geographically distributed, but entirely random-mating, populations. These are then analyzed using the automated NCPA algorithm. Results indicate that NCPA tends to give a high frequency of false positives. In our simulations we observe that 14% of the clades give a conclusive inference that a demographic event has occurred, and that 75% of the datasets have at least one clade that gives such an inference. This is mainly due to the generation of multiple statistics per clade, of which only one is required to be significant to apply the inference key. We survey the inferences that have been made in recent publications and show that the most commonly inferred processes (restricted gene flow with isolation by distance and contiguous range expansion) are those that are commonly inferred in our simulations. 
However, published datasets typically yield a richer set of inferences with NCPA than obtained in our random-mating simulations, and further testing of NCPA with models of structured populations is necessary to examine its accuracy.
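The false-positive explanation above is a multiple-testing effect: if a clade yields m test statistics and an inference needs only one of them to be significant, the chance of at least one spurious hit under the null is 1 - (1 - α)^m. The Python sketch below evaluates this, assuming independent tests at level α. NCPA statistics are not independent, so it illustrates the direction of the effect rather than reproducing the simulated rates.

```python
def familywise_false_positive_rate(m: int, alpha: float = 0.05) -> float:
    """P(at least one of m independent null tests is significant at level alpha)."""
    if m < 0 or not 0.0 <= alpha <= 1.0:
        raise ValueError("need m >= 0 and 0 <= alpha <= 1")
    return 1.0 - (1.0 - alpha) ** m

# More statistics per clade means more chances for a spurious "conclusive" inference.
for m in (1, 4, 10):
    print(m, round(familywise_false_positive_rate(m), 3))
```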
Abstract:
An important element of the developing field of proteomics is to understand protein-protein interactions and other functional links amongst genes. Across-species correlation methods for detecting functional links work on the premise that functionally linked proteins will tend to show a common pattern of presence and absence across a range of genomes. We describe a maximum likelihood statistical model for predicting functional gene linkages. The method detects independent instances of the correlated gain or loss of pairs of proteins on phylogenetic trees, reducing the high rates of false positives observed in conventional across-species methods that do not explicitly incorporate a phylogeny. We show, in a dataset of 10,551 protein pairs, that the phylogenetic method improves by up to 35% on across-species analyses at identifying known functionally linked proteins. The method shows that protein pairs with at least two to three correlated events of gain or loss are almost certainly functionally linked. Contingent evolution, in which one gene's presence or absence depends upon the presence of another, can also be detected phylogenetically, and may identify genes whose functional significance depends upon their interaction with other genes. Incorporating phylogenetic information improves the prediction of functional linkages. The improvement derives from a lower rate of false positives and from detecting trends that across-species analyses miss. Phylogenetic methods can easily be incorporated into the screening of large-scale bioinformatics datasets to identify sets of protein links and to characterise gene networks.
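Conventional across-species methods score a protein pair by how well their presence/absence profiles agree across genomes, for example with the phi (binary Pearson) correlation sketched below in Python. This is the non-phylogenetic baseline, not the maximum likelihood model described above, and the profiles are invented. If the first four genomes form a single clade, the perfect correlation here could reflect just one joint gain event, which is the kind of inflation that counting independent events on a tree avoids.

```python
import math

def phi_coefficient(p: list[int], q: list[int]) -> float:
    """Pearson correlation of two binary presence/absence profiles (phi coefficient)."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genomes")
    n11 = sum(1 for a, b in zip(p, q) if a and b)
    n10 = sum(1 for a, b in zip(p, q) if a and not b)
    n01 = sum(1 for a, b in zip(p, q) if not a and b)
    n00 = sum(1 for a, b in zip(p, q) if not a and not b)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0

# Hypothetical profiles over six genomes; genomes 1-4 are assumed to form one clade.
gene_a = [1, 1, 1, 1, 0, 0]
gene_b = [1, 1, 1, 1, 0, 0]
print(phi_coefficient(gene_a, gene_b))  # 1.0, yet one gain at the clade root explains both
```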
Abstract:
Phylogenetic hypotheses for the largely South African genus Pelargonium L'Hér. (Geraniaceae) were derived from DNA sequence data from nuclear, chloroplast and mitochondrial encoded regions. The datasets were unequally represented and comprised cpDNA trnL-F sequences for 152 taxa, nrDNA ITS sequences for 55 taxa, and mtDNA nad1 b/c exons for 51 taxa. Phylogenetic hypotheses derived from the three separate datasets were congruent overall. A single hypothesis synthesising the information in the three datasets was constructed following a total evidence approach and implementing dataset-specific stepmatrices in order to correct for substitution biases. Pelargonium was found to consist of five main clades, some with contrasting evolutionary patterns with respect to biogeographic distributions, dispersal capacity, pollination biology and karyological diversification. The five main clades are structured in two (subgeneric) clades that correlate with chromosome size. One of these clades includes a "winter rainfall clade" containing more than 70% of all currently described Pelargonium species, all restricted to the South African Cape winter rainfall region. Apart from (woody) shrubs and small herbaceous rosette subshrubs, this clade comprises a large "xerophytic" clade including geophytes and stem and leaf succulents, harbouring in total almost half of the genus. This clade is considered to be the result of in situ proliferation, possibly in response to late-Miocene and Pliocene aridification events. Nested within it is a radiation comprising c. 80 species from the geophytic Pelargonium section Hoarea, all characterised by the possession of (a series of) tunicate tubers.
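A stepmatrix assigns a cost to every possible state change, and the minimum cost of a site on a given tree can be computed with Sankoff's dynamic-programming algorithm. The Python sketch below assumes a hypothetical stepmatrix (transitions cost 1, transversions 2) and a nested-tuple tree format; the dataset-specific stepmatrices used for Pelargonium are not given in the abstract.

```python
import math

STATES = "ACGT"
TRANSITIONS = ({"A", "G"}, {"C", "T"})
# Hypothetical stepmatrix: no change 0, transition 1, transversion 2.
STEP = {(a, b): 0 if a == b else (1 if {a, b} in TRANSITIONS else 2)
        for a in STATES for b in STATES}

def sankoff(tree, leaf_states, step=STEP):
    """Minimum stepmatrix-weighted parsimony cost of one site.

    tree is a leaf name (str) or a 2-tuple (left, right) of subtrees;
    leaf_states maps each leaf name to its observed base.
    """
    def costs(node):
        # Cost vector: minimum cost of the subtree given each possible state at `node`.
        if isinstance(node, str):
            return {s: 0 if s == leaf_states[node] else math.inf for s in STATES}
        left, right = costs(node[0]), costs(node[1])
        return {s: min(left[t] + step[s, t] for t in STATES)
                   + min(right[t] + step[s, t] for t in STATES)
                for s in STATES}
    return min(costs(tree).values())

tree = (("t1", "t2"), ("t3", "t4"))
print(sankoff(tree, {"t1": "A", "t2": "G", "t3": "C", "t4": "C"}))  # 3
```

Here one transition (A/G) plus one transversion gives the minimum cost of 3; under equal weights the same site would cost 2.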
Abstract:
In this paper, we give an overview of our studies by static and time-resolved X-ray diffraction of inverse cubic phases and phase transitions in lipids. In section 1, we briefly discuss the lyotropic phase behaviour of lipids, focusing attention on non-lamellar structures and their geometric/topological relationship to fusion processes in lipid membranes. Possible pathways for transitions between different cubic phases are also outlined. In section 2, we discuss the effects of hydrostatic pressure on lipid membranes and lipid phase transitions, and describe how the parameters required to predict the pressure dependence of lipid phase transition temperatures can be conveniently measured. We review some earlier results on inverse bicontinuous cubic phases from our laboratory, showing effects such as pressure-induced formation and swelling. In section 3, we describe the technique of pressure-jump synchrotron X-ray diffraction. We present results obtained from the lipid system 1:2 dilauroylphosphatidylcholine/lauric acid for cubic-inverse hexagonal, cubic-cubic and lamellar-cubic transitions. The rate of transition was found to increase with the amplitude of the pressure jump and with increasing temperature. Evidence for intermediate structures occurring transiently during the transitions was also obtained. In section 4, we describe an IDL-based 'AXCESS' software package being developed in our laboratory to permit batch processing and analysis of the large X-ray datasets produced by pressure-jump synchrotron experiments. In section 5, we present some recent results on the fluid lamellar-Pn3m cubic phase transition of the single-chain lipid 1-monoelaidin, which we have studied by both pressure-jump and temperature-jump X-ray diffraction. Finally, in section 6, we give a few indicators of future directions of this research.
We anticipate that the most useful technical advance will be the development of pressure-jump apparatus on the microsecond time-scale, which will involve the use of a stack of piezoelectric pressure actuators. The pressure-jump technique is not restricted to lipid phase transitions, but can be used to study a wide range of soft matter transitions, ranging from protein unfolding and DNA unwinding and transitions, to phase transitions in thermotropic liquid crystals, surfactants and block copolymers.
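For a first-order lipid phase transition, the standard thermodynamic route to the pressure dependence of the transition temperature is the Clapeyron relation shown below. It is presumably the kind of prediction referred to above, although the abstract does not state the exact form used. Here T_m is the absolute transition temperature, and ΔV and ΔH are the molar volume and enthalpy changes at the transition.

```latex
\frac{\mathrm{d}T_m}{\mathrm{d}p} = \frac{T_m\,\Delta V}{\Delta H}
```

For chain-melting transitions both ΔV and ΔH are positive, so the transition temperature increases with pressure; measuring ΔH and ΔV at atmospheric pressure is enough to estimate the initial slope.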
Abstract:
Senescence of plant organs is a genetically controlled process that regulates cell death to facilitate nutrient recovery and recycling, and frequently precedes, or is concomitant with, ripening of reproductive structures. In Arabidopsis thaliana, the seeds are contained within a silique, which is itself a photosynthetic organ in the early stages of development and undergoes a programme of senescence prior to dehiscence. A transcriptional analysis of the silique wall was undertaken to identify changes in gene expression during senescence and to correlate these events with ultrastructural changes. The study revealed that the most highly up-regulated genes in senescing silique wall tissues encoded seed storage proteins, and the significance of this finding is discussed. Global transcription profiles of senescing siliques were compared with those from senescing Arabidopsis leaf or petal tissues using microarray datasets and metabolic pathway analysis software (MapMan). In all three tissues, members of the NAC and WRKY transcription factor families were up-regulated, whereas components of the shikimate and cell-wall biosynthetic pathways were down-regulated during senescence. Expression of genes involved in ethylene biosynthesis and action showed more similarity between senescing siliques and petals than between senescing siliques and leaves. Genes involved in autophagy were highly expressed in the late stages of death in all plant tissues studied, but not always during the preceding remobilization phase of senescence. Analyses showed that, during senescence, silique wall tissues exhibited more transcriptional features in common with petals than with leaves. The shared and distinct regulatory events associated with senescence in the three organs are evaluated and discussed.
Abstract:
Tycho was conceived in 2003 in response to a need by the GridRM [1] resource-monitoring project for a "light-weight", scalable and easy-to-use wide-area distributed registry and messaging system. Since Tycho's first release in 2006, a number of modifications have been made to the system to make it easier to use and more flexible. Since its inception, Tycho has been utilised across a number of application domains, including wide-area resource monitoring, distributed queries across archival databases, providing services for the nodes of a Cray supercomputer, and transferring multi-terabyte scientific datasets across the Internet. This paper provides an overview of the initial Tycho system, describes a number of applications that utilise Tycho, discusses a number of new utilities, and explains how the Tycho infrastructure has evolved in response to experience of building applications with it.
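The registry-plus-messaging pattern described above can be illustrated with a toy, single-process Python model: producers register with descriptive metadata, consumers look them up by attribute, and messages are queued for named recipients. This is not Tycho's API; the class and method names are hypothetical, and a real wide-area system must also handle networking, security, failures and discovery across sites.

```python
from collections import defaultdict, deque

class ToyRegistry:
    """Single-process stand-in for a registry with per-recipient message queues (hypothetical)."""

    def __init__(self):
        self._entries = {}                    # name -> metadata dict
        self._mailboxes = defaultdict(deque)  # name -> FIFO queue of messages

    def register(self, name, **metadata):
        """Advertise a producer or consumer under a unique name with searchable metadata."""
        self._entries[name] = metadata

    def lookup(self, **criteria):
        """Return the sorted names whose metadata matches every key/value in criteria."""
        return sorted(name for name, meta in self._entries.items()
                      if all(meta.get(k) == v for k, v in criteria.items()))

    def send(self, recipient, message):
        """Queue a message for a registered recipient."""
        if recipient not in self._entries:
            raise KeyError(f"unknown recipient: {recipient}")
        self._mailboxes[recipient].append(message)

    def receive(self, name):
        """Pop the oldest queued message for name, or None if its queue is empty."""
        box = self._mailboxes[name]
        return box.popleft() if box else None

registry = ToyRegistry()
registry.register("node-a/cpu", kind="monitor", site="site-1")
registry.register("archive-query", kind="query", site="site-2")
for name in registry.lookup(kind="monitor"):
    registry.send(name, {"request": "load-average"})
print(registry.lookup(kind="monitor"), registry.receive("node-a/cpu"))
```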
Abstract:
As terabyte datasets become the norm, the focus has shifted away from our ability to produce and store ever larger amounts of data and towards its utilization. It is becoming increasingly difficult to gain meaningful insights into the data produced. Also, many forms of the data we currently produce cannot easily be fitted into traditional visualization methods. This paper presents a novel visualization technique based on the concept of a Data Forest. Our Data Forest has been designed to use virtual reality (VR) as its presentation method. VR is a natural medium for investigating large datasets. Our approach can easily be adapted for use in a variety of ways, from a stand-alone single-user environment to large multi-user collaborative environments. A test application using multi-dimensional data is presented to demonstrate the concepts involved.
Abstract:
The Self-Organizing Map (SOM) is a popular unsupervised neural network able to provide effective clustering and data visualization for data represented in multidimensional input spaces. In this paper, we describe Fast Learning SOM (FLSOM), which adopts a learning algorithm that improves on the standard SOM with respect to convergence time in the training phase. We show that FLSOM also improves the quality of the map by providing better clustering quality and topology preservation of multidimensional input data. Several tests have been carried out on different multidimensional datasets, which demonstrate better performance of the algorithm compared with the original SOM.
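The standard SOM training loop that FLSOM sets out to speed up finds the best-matching unit (BMU) for each input and pulls the BMU and its grid neighbours toward that input, with a learning rate and neighbourhood radius that shrink over time. The NumPy sketch below implements that standard online Kohonen update with linear decay schedules chosen for illustration; it does not implement FLSOM's learning rule, which the abstract does not describe, and the toy two-cluster data are invented.

```python
import numpy as np

def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Standard online Kohonen SOM with linearly decaying learning rate and radius."""
    rng = np.random.default_rng(seed)
    n, dim = data.shape
    weights = rng.random((rows, cols, dim))  # random initial codebook in [0, 1)
    # (row, col) coordinates of every map unit, shape (rows, cols, 2).
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    total_steps = epochs * n
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(n)]:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # keep a minimal neighbourhood
            # Best-matching unit: the codebook vector closest to x.
            dists = np.linalg.norm(weights - x, axis=-1)
            bmu = np.array(np.unravel_index(np.argmin(dists), dists.shape))
            # Gaussian neighbourhood on the map grid around the BMU.
            h = np.exp(-np.sum((grid - bmu) ** 2, axis=-1) / (2.0 * sigma ** 2))
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights

def quantization_error(data, weights):
    """Mean distance from each input to its best-matching unit."""
    codebook = weights.reshape(-1, weights.shape[-1])
    d = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=-1)
    return float(d.min(axis=1).mean())

rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.1, 0.02, (100, 2)), rng.normal(0.9, 0.02, (100, 2))])
initial = train_som(data, 4, 4, epochs=0)   # untrained codebook (same seed)
trained = train_som(data, 4, 4, epochs=20)
print(quantization_error(data, initial), quantization_error(data, trained))
```

Convergence time in this baseline is governed by the number of epochs and the decay schedules, which is the cost a faster learning rule would aim to reduce.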
Abstract:
Objective: This paper presents a detailed study of fractal-based methods for texture characterization of mammographic mass lesions and architectural distortion. The purpose of this study is to explore the use of fractal and lacunarity analysis for the characterization and classification of both tumor lesions and normal breast parenchyma in mammography. Materials and methods: We conducted comparative evaluations of five popular fractal dimension estimation methods for characterizing the texture of mass lesions and architectural distortion. We applied the concept of lacunarity to the description of the spatial distribution of pixel intensities in mammographic images. These methods were tested with a set of 57 breast masses and 60 normal breast parenchyma (dataset1), and with another set of 19 architectural distortions and 41 normal breast parenchyma (dataset2). Support vector machines (SVM) were used as the pattern classification method for tumor classification. Results: Experimental results showed that the fractal dimension of regions of interest (ROIs) depicting mass lesions and architectural distortion was statistically significantly lower than that of normal breast parenchyma for all five methods. Receiver operating characteristic (ROC) analysis showed that the fractional Brownian motion (FBM) method generated the highest area under the ROC curve among the five methods for both datasets (Az = 0.839 for dataset1 and 0.828 for dataset2). Lacunarity analysis showed that ROIs depicting mass lesions and architectural distortion had higher lacunarities than ROIs depicting normal breast parenchyma. The combination of FBM fractal dimension and lacunarity yielded higher Az values (0.903 and 0.875, respectively) than either feature alone for both datasets.
The application of the SVM improved the performance of the fractal-based features in differentiating tumor lesions from normal breast parenchyma, generating higher Az values. Conclusion: The FBM texture model is the most appropriate model for characterizing mammographic images because its self-affinity assumption is a better approximation. Lacunarity is an effective counterpart measure to the fractal dimension in texture feature extraction from mammographic images. The classification results obtained in this work suggest that SVM is an effective method with great potential for classification in mammographic image analysis.
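Lacunarity measures how unevenly intensity "mass" is spread across an image at a given box size. A common estimator is the gliding-box method: slide an r × r box over every position, record the summed intensity M in each box, and compute Λ(r) = E[M²] / E[M]². The abstract does not say which estimator was used, so the NumPy sketch below is an assumed, illustrative implementation; a homogeneous image gives Λ = 1, and clumpier images give larger values.

```python
import numpy as np

def gliding_box_lacunarity(image, r):
    """Gliding-box lacunarity E[M^2] / E[M]^2 for an intensity image and box size r."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    if not 1 <= r <= min(h, w):
        raise ValueError("box size must fit inside the image")
    # Summed-area table padded with a zero row and column: any r x r box sum is O(1).
    sat = np.zeros((h + 1, w + 1))
    sat[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    # Box "mass" M for every top-left position (i, j) of an r x r box.
    masses = sat[r:, r:] - sat[:-r, r:] - sat[r:, :-r] + sat[:-r, :-r]
    mean = masses.mean()
    if mean == 0:
        raise ValueError("lacunarity is undefined for an all-zero image")
    return float((masses ** 2).mean() / mean ** 2)

uniform = np.ones((8, 8))
clumped = np.zeros((8, 8))
clumped[:4, :4] = 1.0
print(gliding_box_lacunarity(uniform, 2), gliding_box_lacunarity(clumped, 2))
```

Evaluating Λ(r) over several box sizes gives a lacunarity curve, which can then be summarized as a feature alongside the fractal dimension.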