842 results for Data Mining, Yield Improvement, Self Organising Map, Clustering Quality


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Many classifiers achieve high levels of accuracy but have limited applicability in real-world situations because they do not lead to a greater understanding of, or insight into, the way features influence the classification. In areas such as health informatics, a classifier that clearly identifies the influences on classification can be used to direct research and formulate interventions. This research investigates the practical applications of Automated Weighted Sum (AWSum), a classifier that provides accuracy comparable to other techniques whilst providing insight into the data. This is achieved by calculating a weight for each feature value that represents its influence on the class value. The merits of this approach in classification and insight are evaluated on Cystic Fibrosis and Diabetes datasets with positive results.
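The idea of a weight per feature value can be illustrated with a minimal Python sketch. It assumes a binary class and a weight equal to P(positive | value) minus P(negative | value), so that each weight lies in [-1, 1]; this weighting, the function names and the toy records are assumptions made for illustration, and the abstract does not confirm that AWSum computes its weights this way.

```python
from collections import Counter, defaultdict


def influence_weights(records, labels, positive):
    """Return {(feature, value): weight}, each weight in [-1, 1].

    Assumed weighting (not taken from the abstract):
    P(positive | value) - P(not positive | value).
    """
    counts = defaultdict(Counter)
    for record, label in zip(records, labels):
        for feature, value in record.items():
            # Count how often this feature value co-occurs with each class.
            counts[(feature, value)][label == positive] += 1
    weights = {}
    for key, c in counts.items():
        total = c[True] + c[False]
        weights[key] = (c[True] - c[False]) / total
    return weights


def score(record, weights):
    """Sum the weights of a record's feature values; unseen values add 0."""
    return sum(weights.get(item, 0.0) for item in record.items())


# Hypothetical toy data, invented for this sketch.
records = [
    {"smoker": "yes", "bmi": "high"},
    {"smoker": "yes", "bmi": "low"},
    {"smoker": "no", "bmi": "high"},
    {"smoker": "no", "bmi": "low"},
]
labels = ["ill", "ill", "ill", "well"]
weights = influence_weights(records, labels, positive="ill")
```

A positive score points towards the positive class and a negative score towards the other class. The individual weights can be read directly as the direction and strength of each feature value's influence, which is the kind of insight the abstract describes.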

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Our procedure to detect moving groups in the solar neighbourhood (Chen et al., 1997) in the four-dimensional space of the stellar velocity components and age has been improved. The method, which takes advantage of non-parametric estimators of density distribution to avoid any a priori knowledge of the kinematic properties of these stellar groups, now includes the effect of observational errors on the process to select moving group stars, uses a better estimation of the density distribution of the total sample and field stars, and classifies moving group stars using all the available information. It is applied here to an accurately selected sample of early-type stars with known radial velocities and Strömgren photometry. Astrometric data are taken from the HIPPARCOS catalogue (ESA, 1997), which results in an important decrease in the observational errors with respect to ground-based data, and ensures the uniformity of the observed data. Both the improvement of our method and the use of precise astrometric data have allowed us not only to confirm the existence of classical moving groups, but also to detect finer structures that in several cases can be related to kinematic properties of nearby open clusters or associations.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Aim: To improve our understanding of how biological communities assemble, we investigated changes in bumblebee communities in space along an elevation gradient. We assessed how much deterministic abiotic and biotic factors shape community assembly. We focused on proboscis length (influencing the species' dietary regime) and phylogenetic relatedness to investigate if competition and environmental filtering occur in more and less productive climates, respectively. Location: Western Swiss Alps. Methods: We recorded bumblebee species in 149 plots along a 1800-m wide elevation gradient. We contrasted two major clades of bumblebees, a short-tongued and a long-tongued clade. We calculated the phylogenetic and proboscis-length diversity of the bumblebee communities and compared these observed data with a random distribution to detect clustering likely to be caused by environmental filtering or overdispersion likely to be caused by competition. We compared the prevalence of clustered and overdispersed communities along the gradients of plant species richness (biotic) and temperature (abiotic). Results: Under colder conditions, where plant species richness is lower and floral resources are scarcer, the clade with shorter proboscides prevails over the clade with longer proboscides, and communities are functionally and phylogenetically clustered. Under warmer conditions, we found phylogenetic but not functional overdispersion in communities. Main conclusions: We show for the first time a strong correlation between phylogenetic relatedness, proboscis length and species distribution along temperature and plant richness gradients shaping bumblebee communities. The low temperatures and low levels of plant species richness limit the dispersal of the species from the long-tongued clade, which have more specialized diets, into high-elevation areas. Competition under warmer conditions may produce communities composed of less closely related species that share distinct ecological preferences. Our empirical results corroborate theoretical expectations as well as experiments on the prevalence of deterministic processes in the most severe and most productive parts of environmental gradients.
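The comparison of observed diversity with a random distribution can be sketched in Python as a simple null model. The sketch assumes mean pairwise trait distance as the diversity measure and random draws without replacement from a species pool as the null model. Both are illustrative choices rather than the authors' stated procedure, and the proboscis lengths are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean, stdev


def mean_pairwise_distance(values):
    """Average absolute trait difference over all species pairs."""
    return mean(abs(a - b) for a, b in combinations(values, 2))


def standardized_effect_size(community, pool, n_random=999, seed=0):
    """Compare observed diversity with random communities of equal size.

    Negative values suggest clustering (species more similar than
    expected by chance); positive values suggest overdispersion.
    """
    rng = random.Random(seed)
    observed = mean_pairwise_distance(community)
    null = [
        mean_pairwise_distance(rng.sample(pool, len(community)))
        for _ in range(n_random)
    ]
    return (observed - mean(null)) / stdev(null)


# Hypothetical proboscis lengths in mm, invented for this sketch.
pool = [6.0, 6.5, 7.0, 7.5, 9.0, 10.5, 12.0, 13.5, 15.0, 16.0]
clustered = [6.0, 6.5, 7.0]      # similar species, as under filtering
dispersed = [6.0, 10.5, 16.0]    # dissimilar species, as under competition

ses_clustered = standardized_effect_size(clustered, pool)
ses_dispersed = standardized_effect_size(dispersed, pool)
```

With these toy values the clustered community yields a negative effect size and the dispersed one a positive effect size. The same comparison could be run on phylogenetic distances instead of trait values.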

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Purpose: To describe a novel in silico method to gather and analyze data from high-throughput heterogeneous experimental procedures, i.e. gene and protein expression arrays. Methods: Each microarray is assigned to a database which handles common data (names, symbols, antibody codes, probe IDs, etc.). Links between pieces of information are automatically generated from knowledge obtained in freely accessible databases (NCBI, Swissprot, etc.). Requests can be made from any point of entry and the displayed result is fully customizable. Results: The initial database has been loaded with two sets of data. The first originates from an Affymetrix-based retinal profiling performed in an RPE65 knock-out mouse model of Leber's congenital amaurosis. The second was generated from a Kinexus microarray experiment done on the retinas from the same mouse model. Queries display wild-type versus knock-out expression at several time points for both genes and proteins. Conclusions: This freely accessible database allows for easy consultation of data and facilitates data mining by integrating experimental data and biological pathways.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The induction of fungal metabolites by fungal co-cultures grown on solid media was explored using multi-well co-cultures in 2 cm diameter Petri dishes. Fungi were grown in 12-well plates to easily and rapidly obtain the large number of replicates necessary for employing metabolomic approaches. Fungal culture using such a format accelerated the production of metabolites by several weeks compared with using the large-format 9 cm Petri dishes. This strategy was applied to a co-culture of a Fusarium and an Aspergillus strain. The metabolite composition of the cultures was assessed using ultra-high pressure liquid chromatography coupled to electrospray ionisation and time-of-flight mass spectrometry, followed by automated data mining. The de novo production of metabolites was dramatically increased by nutriment reduction. A time-series study of the induction of the fungal metabolites of interest over nine days revealed that they exhibited various induction patterns. The concentrations of most of the de novo induced metabolites increased over time. However, interesting patterns were observed, such as with the presence of some compounds only at certain time points. This result indicates the complexity and dynamic nature of fungal metabolism. The large-scale production of the compounds of interest was verified by co-culture in 15 cm Petri dishes; most of the induced metabolites of interest (16/18) were found to be produced as effectively as on a small scale, although not in the same time frames. Large-scale production is a practical solution for the future production, identification and biological evaluation of these metabolites.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Objective: Candidate genes for non-alcoholic fatty liver disease (NAFLD) identified by a bioinformatics approach were examined for variant associations to quantitative traits of NAFLD-related phenotypes. Research Design and Methods: By integrating public database text mining, trans-organism protein-protein interaction transferal, and information on liver protein expression, a protein-protein interaction network was constructed and from this a smaller isolated interactome was identified. Five genes from this interactome were selected for genetic analysis. Twenty-one tag single-nucleotide polymorphisms (SNPs) which captured all common variation in these genes were genotyped in 10,196 Danes, and analyzed for association with NAFLD-related quantitative traits, type 2 diabetes (T2D), central obesity, and WHO-defined metabolic syndrome (MetS). Results: 273 genes were included in the protein-protein interaction analysis and EHHADH, ECHS1, HADHA, HADHB, and ACADL were selected for further examination. A total of 10 nominally statistically significant associations (P<0.05) to quantitative metabolic traits were identified. Also, the case-control study showed associations between variation in the five genes and T2D, central obesity, and MetS, respectively. Bonferroni adjustments for multiple testing negated all associations. Conclusions: Using a bioinformatics approach we identified five candidate genes for NAFLD. However, we failed to provide evidence of associations with major effects between SNPs in these five genes and NAFLD-related quantitative traits, T2D, central obesity, and MetS.
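How a Bonferroni adjustment can negate nominally significant associations is shown in the short Python sketch below. The p-values are hypothetical and are not taken from the study; the abstract does not report the individual p-values or the number of tests performed.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag tests that remain significant after Bonferroni correction.

    Each p-value is compared with alpha divided by the number of tests.
    """
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


# Hypothetical p-values, invented for this sketch.
p_values = [0.04, 0.03, 0.02, 0.20, 0.60]

nominal = [p < 0.05 for p in p_values]        # uncorrected test
adjusted = bonferroni_significant(p_values)   # corrected threshold: 0.05 / 5

print(sum(nominal), "nominal,", sum(adjusted), "after correction")
```

In this toy case three tests pass the uncorrected threshold, but none falls below the corrected threshold of 0.05 / 5. The more tests are run, the stricter the per-test threshold becomes.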

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Over the past three decades, pedotransfer functions (PTFs) have been widely used by soil scientists to estimate soil properties in temperate regions in response to the lack of soil data for these regions. Several authors indicated that little effort has been dedicated to the prediction of soil properties in the humid tropics, where the need for soil property information is of even greater priority. The aim of this paper is to provide an up-to-date repository of past and recently published articles as well as papers from proceedings of events dealing with water-retention PTFs for soils of the humid tropics. Of the 35 publications found in the literature on PTFs for prediction of water retention of soils of the humid tropics, 91 % of the PTFs are based on an empirical approach, and only 9 % are based on a semi-physical approach. Of the empirical PTFs, 97 % are continuous, and 3 % (one) is a class PTF; of the empirical PTFs, 97 % are based on multiple linear and nth-order polynomial regression techniques, and 3 % (one) is based on the k-Nearest Neighbor approach; 84 % of the continuous PTFs are point-based, and 16 % are parameter-based; 97 % of the continuous PTFs are equation-based PTFs, and 3 % (one) is based on pattern recognition. Additionally, it was found that 26 % of the tropical water-retention PTFs were developed for soils in Brazil, 26 % for soils in India, 11 % for soils in other countries in America, and 11 % for soils in other countries in Africa.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Amplified Fragment Length Polymorphisms (AFLPs) are a cheap and efficient protocol for generating large sets of genetic markers. This technique has become increasingly used during the last decade in various fields of biology, including population genomics, phylogeography, and genome mapping. Here, we present RawGeno, an R library dedicated to the automated scoring of AFLPs (i.e., the coding of electropherogram signals into ready-to-use datasets). Our program includes a complete suite of tools for binning, editing, visualizing, and exporting results obtained from AFLP experiments. RawGeno can either be used with command lines and program analysis routines or through a user-friendly graphical user interface. We describe the whole RawGeno pipeline along with recommendations for (a) setting the analysis of electropherograms in combination with PeakScanner, a program freely distributed by Applied Biosystems; (b) performing quality checks; (c) defining bins and proceeding to scoring; (d) filtering nonoptimal bins; and (e) exporting results in different formats.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The Iowa Department of Transportation is committed to improved management systems, which in turn has led to increased automation to record and manage construction data. A possible improvement to the current data management system can be found with pen-based computers. Pen-based computers coupled with user friendly software are now to the point where an individual's handwriting can be captured and converted to typed text to be used for data collection. It would appear pen-based computers are sufficiently advanced to be used by construction inspectors to record daily project data. The objective of this research was to determine: (1) if pen-based computers are durable enough to allow maintenance-free operation for field work during Iowa's construction season; and (2) if pen-based computers can be used effectively by inspectors with little computer experience. The pen-based computer's handwriting recognition was not fast or accurate enough to be successfully utilized. The IBM Thinkpad with the pen pointing device did prove useful for working in Windows' graphical environment. The pen was used for pointing, selecting and scrolling in the Windows applications because of its intuitive nature.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Advanced neuroinformatics tools are required for methods of connectome mapping, analysis, and visualization. The inherent multi-modality of connectome datasets poses new challenges for data organization, integration, and sharing. We have designed and implemented the Connectome Viewer Toolkit - a set of free and extensible open source neuroimaging tools written in Python. The key components of the toolkit are as follows: (1) The Connectome File Format is an XML-based container format to standardize multi-modal data integration and structured metadata annotation. (2) The Connectome File Format Library enables management and sharing of connectome files. (3) The Connectome Viewer is an integrated research and development environment for visualization and analysis of multi-modal connectome data. The Connectome Viewer's plugin architecture supports extensions with network analysis packages and an interactive scripting shell, to enable easy development and community contributions. Integration with tools from the scientific Python community allows the leveraging of numerous existing libraries for powerful connectome data mining, exploration, and comparison. We demonstrate the applicability of the Connectome Viewer Toolkit using Diffusion MRI datasets processed by the Connectome Mapper. The Connectome Viewer Toolkit is available from http://www.cmtk.org/

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This article proposes an analysis of the interactions between Twitter users, covering both the activity generated around a specific user and the analysis of a given hashtag over a set period of time.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Past and current climate change has already induced drastic biological changes. We need projections of how future climate change will further impact biological systems. Modeling is one approach to forecast future ecological impacts, but requires data for model parameterization. As collecting new data is costly, an alternative is to use the increasingly available georeferenced species occurrence and natural history databases. Here, we illustrate the use of such databases to assess climate change impacts on mountain flora. We show that these data can be used effectively to derive dynamic impact scenarios, suggesting upward migration of many species and possible extinctions when no suitable habitat is available at higher elevations. Systematically georeferencing all existing natural history collections data in mountain regions could allow a larger assessment of climate change impact on mountain ecosystems in Europe and elsewhere.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

PURPOSE: Pharmacovigilance methods have advanced greatly during the last decades, making post-market drug assessment an essential drug evaluation component. These methods mainly rely on the use of spontaneous reporting systems and health information databases to collect expertise from huge amounts of real-world reports. The EU-ADR Web Platform was built to further facilitate accessing, monitoring and exploring these data, enabling an in-depth analysis of adverse drug reaction risks. METHODS: The EU-ADR Web Platform exploits the wealth of data collected within a large-scale European initiative, the EU-ADR project. Millions of electronic health records, provided by national health agencies, are mined for specific drug events, which are correlated with literature, protein and pathway data, resulting in a rich drug-event dataset. Next, advanced distributed computing methods are tailored to coordinate the execution of data-mining and statistical analysis tasks. This permits obtaining a ranked drug-event list, removing spurious entries and highlighting relationships with high risk potential. RESULTS: The EU-ADR Web Platform is an open workspace for the integrated analysis of pharmacovigilance datasets. Using this software, researchers can access a variety of tools provided by distinct partners in a single centralized environment. Besides performing standalone drug-event assessments, they can also control the pipeline for an improved batch analysis of custom datasets. Drug-event pairs can be substantiated and statistically analysed within the platform's innovative working environment. CONCLUSIONS: A pioneering workspace that helps in explaining the biological path of adverse drug reactions was developed within the EU-ADR project consortium. This tool, targeted at the pharmacovigilance community, is available online at https://bioinformatics.ua.pt/euadr/. Copyright © 2012 John Wiley & Sons, Ltd.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

BACKGROUND: We examined the associations between substance use (cigarette smoking, alcohol drinking, and cannabis use) and psychosocial characteristics at the individual and family levels among adolescents of the Seychelles, a rapidly developing small island state in the African region. METHODS: A school survey was conducted in a representative sample of 1432 students aged 11-17 years from all secondary schools. Data came from a self-administered anonymous questionnaire administered following a standard methodology (Global School-based Health Survey, GSHS). Risk behaviors and psychosocial characteristics were dichotomized. Association analyses were adjusted for a possible classroom effect. RESULTS: The prevalence of cigarette smoking, alcohol drinking and cannabis use was higher in boys than in girls and increased with age. Age-adjusted and multivariate analyses showed that several individual-level characteristics (e.g. suicidal ideation and truancy) and family-level characteristics (e.g. poor parental monitoring) were associated with substance use among students. CONCLUSIONS: Our results suggest that health promotion programs should simultaneously address multiple risk behaviors and take into account a wide range of psychosocial characteristics of the students at the individual and family levels.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The aim of this article is to introduce the Spanish reader to some recent debates in the English-speaking digital humanities community. Rather than attempting to define the discipline in absolute terms, it takes a diachronic approach, while emphasizing principles such as interdisciplinarity and model building, values such as access and open source, and practices such as data mining and collaboration.