902 results for Large Data Sets


Relevance: 100.00%

Abstract:

Understanding the evolutionary history of threatened populations can improve their conservation management. Re-establishment of past but recent gene flow could re-invigorate threatened populations and replenish genetic diversity, necessary for population persistence. One of the four nominal subspecies of the common yellow-tufted honeyeater, Lichenostomus melanops cassidix, is critically endangered despite substantial conservation efforts over 55 years. Using a combination of morphometric, genetic and modelling approaches we tested for its evolutionary distinctiveness and conservation merit. We confirmed that cassidix has at least one morphometric distinction. It also differs genetically from the other subspecies in allele frequencies but not phylogenetically, implying that its evolution was recent. Modelling historical distribution supported the lack of vicariance and suggested a possibility of gene flow among subspecies at least since the late Pleistocene. Multi-locus coalescent analyses indicated that cassidix diverged from its common ancestor with neighbouring subspecies gippslandicus sometime from the mid-Pleistocene to the Holocene, and that it has the smallest historical effective population size of all subspecies. It appears that cassidix diverged from its ancestor with gippslandicus through a combination of drift and local selection. From patterns of genetic subdivision on two spatial scales and morphological variation we concluded that cassidix, gippslandicus and (melanops + meltoni) are diagnosable as subspecies. Low genetic diversity and effective population size of cassidix may translate to low genetic fitness and evolutionary potential, thus managed gene flow from gippslandicus is recommended for its recovery.

Relevance: 100.00%

Abstract:

The JGOFS International Collection Volume 2: Integrated Data Sets CD is a coherent, organised compilation of existing data sets produced by the member countries that participated in JGOFS. In most cases, the data were gathered from the JGOFS International Collection, Volume 1: Discrete Datasets DVD. To produce Vol. 1, data were taken from the original sources and copied "as is" to the DVD. For Vol. 2, data and metadata have been harmonized using the conversion software PanTool and the import routine of PANGAEA, checking for completeness of metadata and defining the relations between data and metadata. Prior to the import, the data underwent technical quality control, i.e. checks of file format and readability, availability and combination of parameters and units, and range of values.

Relevance: 100.00%

Abstract:

The software PanGet is a special tool for downloading multiple data sets from PANGAEA. It uses the PANGAEA data set ID, which is unique and part of the DOI. As a first step, a list of the IDs of the data sets to be downloaded must be created. There are two ways to define this individual collection of data sets. Based on the ID list, the tool downloads the data sets. Failed downloads are written to the file *_failed.txt. The functionality of PanGet is also part of the programs Pan2Applic (choose File > Download PANGAEA datasets...) and PanTool2 (choose Basic tools > Download PANGAEA datasets...).
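The workflow described in this abstract (take a list of IDs, download each data set, log the failures) can be sketched in a few lines of Python. This is not PanGet's implementation: the function names, the `.tab` file extension, the `download` prefix of the failure file and the URL template are assumptions made for illustration only.

```python
from pathlib import Path
from urllib.request import urlopen

# Assumption: data sets are retrievable as text via the DOI resolver.
# This template is illustrative and is not taken from the PanGet source.
URL_TEMPLATE = "https://doi.pangaea.de/10.1594/PANGAEA.{id}?format=textfile"


def fetch_textfile(dataset_id):
    """Download one data set by its numeric ID and return the raw bytes."""
    with urlopen(URL_TEMPLATE.format(id=dataset_id), timeout=60) as response:
        return response.read()


def download_all(ids, out_dir, fetch=fetch_textfile, prefix="download"):
    """Download every ID in `ids`; return the list of IDs that failed.

    Failed IDs are also written to <prefix>_failed.txt in `out_dir`,
    mirroring the *_failed.txt file mentioned in the abstract.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for dataset_id in ids:
        try:
            data = fetch(dataset_id)
        except OSError:  # network errors raised by urlopen derive from OSError
            failed.append(dataset_id)
            continue
        (out_dir / f"{dataset_id}.tab").write_bytes(data)
    if failed:
        lines = "\n".join(str(i) for i in failed) + "\n"
        (out_dir / f"{prefix}_failed.txt").write_text(lines)
    return failed
```

Passing the download function in as the `fetch` parameter keeps the loop testable without network access, and a later run can be restricted to the IDs listed in the failure file.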

Relevance: 100.00%

Abstract:

The microarray technique is powerful, as it allows up to thousands of genes to be tested at a time, but it produces an overwhelming set of data files containing huge amounts of data, which are difficult to pre-process, separate, classify and correlate so that interesting conclusions can be extracted. Modern machine learning, data mining and clustering techniques based on information theory are needed to read and interpret the information buried in those large data sets. The Independent Component Analysis method can be used to correct data affected by corruption processes or to filter out data that cannot be corrected; clustering methods can then group similar genes or classify samples. In this paper a hybrid approach is used to obtain a two-way unsupervised clustering of corrected microarray data.

Relevance: 100.00%

Abstract:

We examine, with recently developed Lagrangian tools, altimeter data and numerical simulations obtained from the HYCOM model in the Gulf of Mexico. Our data correspond to the months just after the Deepwater Horizon oil spill in the year 2010. Our Lagrangian analysis provides a skeleton that allows the interpretation of transport routes over the ocean surface. The transport routes are further verified by the simultaneous study of the evolution of several drifters launched during those months in the Gulf of Mexico. We find that there exist Lagrangian structures that justify the dynamics of the drifters, although the agreement depends on the quality of the data. We discuss the impact of the Lagrangian tools on the assessment of the predictive capacity of these data sets.

Relevance: 100.00%

Abstract:

Funding: The International Primary Care Respiratory Group (IPCRG) provided funding for this research project as an UNLOCK group study for which the funding was obtained through an unrestricted grant by Novartis AG, Basel, Switzerland. The latter funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. Database access for the OPCRD was provided by the Respiratory Effectiveness Group (REG) and Research in Real Life; the OPCRD statistical analysis was funded by REG. The Bocholtz Study was funded by PICASSO for COPD, an initiative of Boehringer Ingelheim, Pfizer and the Caphri Research Institute, Maastricht University, The Netherlands.

Relevance: 100.00%

Abstract:

We introduce a method of functionally classifying genes by using gene expression data from DNA microarray hybridization experiments. The method is based on the theory of support vector machines (SVMs). SVMs are considered a supervised computer learning method because they exploit prior knowledge of gene function to identify unknown genes of similar function from expression data. SVMs avoid several problems associated with unsupervised clustering methods, such as hierarchical clustering and self-organizing maps. SVMs have many mathematical features that make them attractive for gene expression analysis, including their flexibility in choosing a similarity function, sparseness of solution when dealing with large data sets, the ability to handle large feature spaces, and the ability to identify outliers. We test several SVMs that use different similarity metrics, as well as some other supervised learning methods, and find that the SVMs best identify sets of genes with a common function using expression data. Finally, we use SVMs to predict functional roles for uncharacterized yeast ORFs based on their expression data.

Relevance: 100.00%

Abstract:

The thermodynamic consistency of almost 90 VLE data series, including isothermal and isobaric conditions for systems of both total and partial miscibility in the liquid phase, has been examined by means of the area and point-to-point tests. In addition, the Gibbs energy of mixing function calculated from these experimental data has been inspected, with some rather surprising results: certain data sets exhibiting high dispersion or leading to Gibbs energy of mixing curves inconsistent with the total or partial miscibility of the liquid phase, surprisingly, pass the tests. Several possible inconsistencies in the tests themselves or in their application are discussed. Related to this is a very interesting and ambitious initiative that arose within the NIST organization: the development of an algorithm to assess the quality of experimental VLE data. The present paper questions the applicability of two of the five tests that are combined in the algorithm. It further shows that the deviation of the experimental VLE data from the correlation obtained by a given model, the basis of some point-to-point tests, should not be used to evaluate the quality of these data.
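For readers unfamiliar with the area test named in this abstract, the sketch below shows its usual form for isothermal binary data: the integral of ln(γ1/γ2) over x1 from 0 to 1 should vanish, and the deviation is reported as D = 100 · |net area| / (total absolute area). The code is an illustration under stated assumptions, not the procedure used in the paper: the function names are hypothetical, the data are synthetic (a two-suffix Margules model with an arbitrary constant), and the points are assumed to span x1 = 0 to 1.

```python
def trapezoid(xs, ys):
    """Integrate y over x with the trapezoidal rule."""
    return sum(
        (x_hi - x_lo) * (y_lo + y_hi) / 2.0
        for x_lo, x_hi, y_lo, y_hi in zip(xs, xs[1:], ys, ys[1:])
    )


def area_test_deviation(x1, ln_gamma_ratio):
    """Return D = 100 * |net area| / total absolute area.

    x1             liquid mole fractions of component 1, sorted, from 0 to 1
    ln_gamma_ratio ln(gamma1 / gamma2) at each composition
    """
    net = trapezoid(x1, ln_gamma_ratio)
    total = trapezoid(x1, [abs(v) for v in ln_gamma_ratio])
    return 100.0 * abs(net) / total


def margules_ln_ratio(x1, a, bias=0.0):
    """Synthetic ln(gamma1/gamma2) from the two-suffix Margules model.

    ln(gamma1) = a * x2**2 and ln(gamma2) = a * x1**2, so the ratio is
    a * (1 - 2 * x1). `bias` is a hypothetical systematic error added to
    make the data thermodynamically inconsistent.
    """
    return [a * (1.0 - 2.0 * x) + bias for x in x1]


if __name__ == "__main__":
    x1 = [i / 20 for i in range(21)]
    consistent = area_test_deviation(x1, margules_ln_ratio(x1, a=1.2))
    biased = area_test_deviation(x1, margules_ln_ratio(x1, a=1.2, bias=0.3))
    print(f"D (consistent data) = {consistent:.3f}")
    print(f"D (biased data)     = {biased:.3f}")
```

A tolerance of about D < 10 is often quoted for isothermal data; treat that value as an assumption here. Because the test looks only at a net area, errors of opposite sign can cancel, which is one way scattered data can still pass.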

Relevance: 100.00%

Abstract:

This article considers the handling of unbalanced data. PNN and MLP are used as classification models. The problem of estimating model performance when the training set is unbalanced is solved. Several methods (a clustering approach and a boosting approach) are considered useful for dealing with the problem of input data.
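The abstract does not say which performance estimate the authors use. As a minimal illustration of why estimation is a problem on unbalanced data, the sketch below compares plain accuracy with balanced accuracy (the mean of the per-class recalls). The metric choice, the function names and the toy labels are all assumptions made for this example.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Recall for each class that occurs in y_true."""
    recalls = {}
    for cls in sorted(set(y_true)):
        members = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(1 for i in members if y_pred[i] == cls)
        recalls[cls] = hits / len(members)
    return recalls


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so every class counts equally."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


if __name__ == "__main__":
    # Toy data: 95 majority-class samples, 5 minority-class samples,
    # and a trivial classifier that always predicts the majority class.
    y_true = [0] * 95 + [1] * 5
    y_pred = [0] * 100
    print("accuracy:         ", accuracy(y_true, y_pred))
    print("balanced accuracy:", balanced_accuracy(y_true, y_pred))
```

In this toy case the classifier never finds the minority class, yet plain accuracy stays high because the majority class dominates the count. Balanced accuracy exposes the failure.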