41 results for Data Mining, Yield Improvement, Self Organising Map, Clustering Quality
Abstract:
Aim: To improve our understanding of how biological communities assemble, we investigated changes in bumblebee communities in space along an elevation gradient. We assessed how much deterministic abiotic and biotic factors shape community assembly. We focused on proboscis length (influencing the species' dietary regime) and phylogenetic relatedness to investigate whether competition and environmental filtering occur in more and less productive climates, respectively. Location: Western Swiss Alps. Methods: We recorded bumblebee species in 149 plots along a 1800-m-wide elevation gradient. We contrasted two major clades of bumblebees, a short-tongued and a long-tongued clade. We calculated the phylogenetic and proboscis-length diversity of the bumblebee communities and compared these observed data with a random distribution to detect clustering likely to be caused by environmental filtering, or overdispersion likely to be caused by competition. We compared the prevalence of clustered and overdispersed communities along the gradients of plant species richness (biotic) and temperature (abiotic). Results: Under colder conditions, where plant species richness is lower and floral resources are scarcer, the clade with shorter proboscides prevails over the clade with longer proboscides, and communities are functionally and phylogenetically clustered. Under warmer conditions, we found phylogenetic but not functional overdispersion in communities. Main conclusions: We show for the first time a strong correlation between phylogenetic relatedness, proboscis length and species distribution along temperature and plant richness gradients shaping bumblebee communities. The low temperatures and low levels of plant species richness limit the dispersal of the species from the long-tongued clade, which have more specialized diets, into high-elevation areas. Competition under warmer conditions may produce communities composed of less closely related species that share distinct ecological preferences. Our empirical results corroborate theoretical expectations as well as experiments on the prevalence of deterministic processes in the most severe and most productive parts of environmental gradients.
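The comparison of observed diversity with a random distribution, as described in the abstract above, can be illustrated with a standardized effect size against a null model. The following is a minimal sketch, not the authors' analysis: the trait values, the function names, the choice of mean pairwise distance as the diversity measure, and the random-draw null model are all assumptions made for illustration.

```python
import random
import statistics


def mean_pairwise_distance(values):
    """Mean absolute difference between all pairs of trait values."""
    pairs = [(a, b) for i, a in enumerate(values) for b in values[i + 1:]]
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def standardized_effect_size(community, species_pool, n_random=999, seed=0):
    """Compare observed diversity with random draws from the species pool.

    Negative values suggest clustering (species more similar than expected),
    positive values suggest overdispersion (less similar than expected).
    """
    rng = random.Random(seed)
    observed = mean_pairwise_distance(community)
    null = [
        mean_pairwise_distance(rng.sample(species_pool, len(community)))
        for _ in range(n_random)
    ]
    return (observed - statistics.mean(null)) / statistics.stdev(null)


# Hypothetical proboscis lengths (mm) for a regional species pool.
pool = [6.0, 6.5, 7.0, 8.0, 9.5, 11.0, 12.5, 14.0]
ses_clustered = standardized_effect_size([6.0, 6.5, 7.0], pool)
ses_overdispersed = standardized_effect_size([6.0, 9.5, 14.0], pool)
print(ses_clustered, ses_overdispersed)
```

In this toy setup a community of similar short-tongued species yields a negative value and a community spanning the full trait range yields a positive one.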
Abstract:
Purpose: To describe a novel in silico method to gather and analyze data from high-throughput heterogeneous experimental procedures, i.e. gene and protein expression arrays. Methods: Each microarray is assigned to a database which handles common data (names, symbols, antibody codes, probe IDs, etc.). Links between pieces of information are automatically generated from knowledge obtained in freely accessible databases (NCBI, Swissprot, etc.). Requests can be made from any point of entry and the displayed result is fully customizable. Results: The initial database has been loaded with two sets of data. The first originates from an Affymetrix-based retinal profiling performed in an RPE65 knock-out mouse model of Leber's congenital amaurosis. The second was generated from a Kinexus microarray experiment done on retinas from the same mouse model. Queries display wild-type versus knock-out expression at several time points for both genes and proteins. Conclusions: This freely accessible database allows for easy consultation of data and facilitates data mining by integrating experimental data and biological pathways.
Abstract:
The induction of fungal metabolites by fungal co-cultures grown on solid media was explored using multi-well co-cultures in 2 cm diameter Petri dishes. Fungi were grown in 12-well plates to easily and rapidly obtain the large number of replicates necessary for employing metabolomic approaches. Fungal culture using such a format accelerated the production of metabolites by several weeks compared with using the large-format 9 cm Petri dishes. This strategy was applied to a co-culture of a Fusarium and an Aspergillus strain. The metabolite composition of the cultures was assessed using ultra-high pressure liquid chromatography coupled to electrospray ionisation and time-of-flight mass spectrometry, followed by automated data mining. The de novo production of metabolites was dramatically increased by nutriment reduction. A time-series study of the induction of the fungal metabolites of interest over nine days revealed that they exhibited various induction patterns. The concentrations of most of the de novo induced metabolites increased over time. However, interesting patterns were observed, such as with the presence of some compounds only at certain time points. This result indicates the complexity and dynamic nature of fungal metabolism. The large-scale production of the compounds of interest was verified by co-culture in 15 cm Petri dishes; most of the induced metabolites of interest (16/18) were found to be produced as effectively as on a small scale, although not in the same time frames. Large-scale production is a practical solution for the future production, identification and biological evaluation of these metabolites.
Abstract:
Objective: Candidate genes for non-alcoholic fatty liver disease (NAFLD) identified by a bioinformatics approach were examined for variant associations with quantitative traits of NAFLD-related phenotypes. Research Design and Methods: By integrating public database text mining, trans-organism protein-protein interaction transferal, and information on liver protein expression, a protein-protein interaction network was constructed, and from this a smaller isolated interactome was identified. Five genes from this interactome were selected for genetic analysis. Twenty-one tag single-nucleotide polymorphisms (SNPs) which captured all common variation in these genes were genotyped in 10,196 Danes, and analyzed for association with NAFLD-related quantitative traits, type 2 diabetes (T2D), central obesity, and WHO-defined metabolic syndrome (MetS). Results: 273 genes were included in the protein-protein interaction analysis, and EHHADH, ECHS1, HADHA, HADHB, and ACADL were selected for further examination. A total of 10 nominally statistically significant associations (P<0.05) with quantitative metabolic traits were identified. Also, the case-control study showed associations between variation in the five genes and T2D, central obesity, and MetS, respectively. Bonferroni adjustments for multiple testing negated all associations. Conclusions: Using a bioinformatics approach we identified five candidate genes for NAFLD. However, we failed to provide evidence of associations with major effects between SNPs in these five genes and NAFLD-related quantitative traits, T2D, central obesity, and MetS.
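To see how a Bonferroni adjustment can negate nominally significant associations, consider the minimal sketch below. The p-values and the number of tests are hypothetical and are not taken from the study; only the correction rule itself (comparing each p-value with alpha divided by the number of tests) is standard.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag p-values that stay significant after Bonferroni correction."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


# Hypothetical p-values for 21 tests (illustrative only, not study data).
p_values = [0.012, 0.034, 0.048] + [0.2] * 18

nominal_hits = sum(p < 0.05 for p in p_values)
corrected_hits = sum(bonferroni_significant(p_values))
print(nominal_hits, corrected_hits)
```

With 21 tests the corrected threshold is 0.05 / 21, roughly 0.0024, so none of the three nominal hits in this example survives.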
Abstract:
Amplified Fragment Length Polymorphisms (AFLPs) are a cheap and efficient protocol for generating large sets of genetic markers. This technique has become increasingly used during the last decade in various fields of biology, including population genomics, phylogeography, and genome mapping. Here, we present RawGeno, an R library dedicated to the automated scoring of AFLPs (i.e., the coding of electropherogram signals into ready-to-use datasets). Our program includes a complete suite of tools for binning, editing, visualizing, and exporting results obtained from AFLP experiments. RawGeno can either be used with command lines and program analysis routines or through a user-friendly graphical user interface. We describe the whole RawGeno pipeline along with recommendations for (a) setting the analysis of electropherograms in combination with PeakScanner, a program freely distributed by Applied Biosystems; (b) performing quality checks; (c) defining bins and proceeding to scoring; (d) filtering nonoptimal bins; and (e) exporting results in different formats.
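The idea of scoring, that is, coding electropherogram peaks into a ready-to-use presence/absence dataset, can be sketched as follows. This is not RawGeno's algorithm or API (RawGeno is an R library): the function name, the gap-based binning rule, the `max_gap` parameter and the peak sizes are all assumptions chosen for illustration.

```python
def bin_peaks(samples, max_gap=0.5):
    """Group peak sizes (in bp) into bins and score presence/absence.

    samples: dict mapping a sample name to a list of peak sizes.
    A new bin starts whenever the gap to the previous peak exceeds max_gap.
    Returns (bin_ranges, matrix), where matrix[name][i] is 1 if the sample
    has a peak falling inside bin i, else 0.
    """
    all_sizes = sorted(size for peaks in samples.values() for size in peaks)
    bins = []
    for size in all_sizes:
        if bins and size - bins[-1][-1] <= max_gap:
            bins[-1].append(size)
        else:
            bins.append([size])
    bin_ranges = [(b[0], b[-1]) for b in bins]
    matrix = {
        name: [int(any(lo <= s <= hi for s in peaks)) for lo, hi in bin_ranges]
        for name, peaks in samples.items()
    }
    return bin_ranges, matrix


# Hypothetical peak sizes for three samples.
samples = {"A": [100.1, 150.0, 200.3], "B": [100.3, 200.1], "C": [150.2]}
bin_ranges, matrix = bin_peaks(samples)
print(bin_ranges)
print(matrix)
```

A real pipeline would also filter bins by width and peak intensity, which this sketch omits.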
Abstract:
Advanced neuroinformatics tools are required for methods of connectome mapping, analysis, and visualization. The inherent multi-modality of connectome datasets poses new challenges for data organization, integration, and sharing. We have designed and implemented the Connectome Viewer Toolkit - a set of free and extensible open source neuroimaging tools written in Python. The key components of the toolkit are as follows: (1) The Connectome File Format is an XML-based container format to standardize multi-modal data integration and structured metadata annotation. (2) The Connectome File Format Library enables management and sharing of connectome files. (3) The Connectome Viewer is an integrated research and development environment for visualization and analysis of multi-modal connectome data. The Connectome Viewer's plugin architecture supports extensions with network analysis packages and an interactive scripting shell, to enable easy development and community contributions. Integration with tools from the scientific Python community allows the leveraging of numerous existing libraries for powerful connectome data mining, exploration, and comparison. We demonstrate the applicability of the Connectome Viewer Toolkit using Diffusion MRI datasets processed by the Connectome Mapper. The Connectome Viewer Toolkit is available from http://www.cmtk.org/
Abstract:
Past and current climate change has already induced drastic biological changes. We need projections of how future climate change will further impact biological systems. Modeling is one approach to forecast future ecological impacts, but requires data for model parameterization. As collecting new data is costly, an alternative is to use the increasingly available georeferenced species occurrence and natural history databases. Here, we illustrate the use of such databases to assess climate change impacts on mountain flora. We show that these data can be used effectively to derive dynamic impact scenarios, suggesting upward migration of many species and possible extinctions when no suitable habitat is available at higher elevations. Systematically georeferencing all existing natural history collections data in mountain regions could allow a larger assessment of climate change impact on mountain ecosystems in Europe and elsewhere.
Abstract:
BACKGROUND: We examined the associations between substance use (cigarette smoking, alcohol drinking, and cannabis use) and psychosocial characteristics at the individual and family levels among adolescents of the Seychelles, a rapidly developing small island state in the African region. METHODS: A school survey was conducted in a representative sample of 1432 students aged 11-17 years from all secondary schools. Data came from a self-administered anonymous questionnaire conducted along a standard methodology (Global School-based Health Survey, GSHS). Risk behaviors and psychosocial characteristics were dichotomized. Association analyses were adjusted for a possible classroom effect. RESULTS: The prevalence of cigarette smoking, alcohol drinking and cannabis use was higher in boys than in girls and increased with age. Age-adjusted and multivariate analyses showed that several individual level characteristics (e.g. suicidal ideation and truancy) and family level characteristics (e.g. poor parental monitoring) were associated with substance use among students. CONCLUSIONS: Our results suggest that health promotion programs should simultaneously address multiple risk behaviors and take into account a wide range of psychosocial characteristics of the students at the individual and family levels.
Abstract:
BACKGROUND: Selective publication of studies, which is commonly called publication bias, is widely recognized. Over the years a new nomenclature for other types of bias related to non-publication or distortion related to the dissemination of research findings has been developed. However, several of these different biases are often still summarized by the term 'publication bias'. METHODS/DESIGN: As part of the OPEN Project (To Overcome failure to Publish nEgative fiNdings) we will conduct a systematic review with the following objectives: (1) to systematically review highly cited articles that focus on non-publication of studies and to present the various definitions of biases related to the dissemination of research findings contained in the articles identified; (2) to develop and discuss a new framework on nomenclature of various aspects of distortion in the dissemination process that leads to public availability of research findings in an international group of experts in the context of the OPEN Project. We will systematically search Web of Knowledge for highly cited articles that provide a definition of biases related to the dissemination of research findings. A specifically designed data extraction form will be developed and pilot-tested. Working in teams of two, we will independently extract relevant information from each eligible article. For the development of a new framework we will construct an initial table listing different levels and different hazards en route to making research findings public. An international group of experts will iteratively review the table and reflect on its content until no new insights emerge and consensus has been reached. DISCUSSION: Results are expected to be publicly available in mid-2013. This systematic review together with the results of other systematic reviews of the OPEN project will serve as a basis for the development of future policies and guidelines regarding the assessment and prevention of publication bias.
Abstract:
As part of a broader research effort assessing destabilizing and triggering factors to model cliff dynamics along the Dieppe shoreline in High Normandy, this study aims at testing boat-based mobile LiDAR capabilities by scanning 3D point clouds of the unstable coastal cliffs. Two acquisition campaigns were performed in September 2012 and September 2013, scanning (1) a 30-km-long shoreline and (2) the same test cliffs in different environmental conditions and device settings. The potential of the collected data for 3D modelling, change detection and landslide monitoring was afterwards assessed. By scanning during favourable meteorological and marine conditions and close to the coast, mobile LiDAR devices are able to quickly scan a long shoreline with median point spacing of up to 10 cm. The acquired data are then sufficiently detailed to map geomorphological features smaller than 0.5 m². Furthermore, our capability to detect rockfalls and erosion deposits (>m³) is confirmed, since the classical approach of computing differences between sequential acquisitions reveals many cliff collapses between Pourville and Quiberville and only sparse changes between Dieppe and Belleville-sur-Mer. These different change rates result from different rockfall susceptibilities. Finally, we also confirmed the capability of the boat-based mobile LiDAR technique to monitor single large changes, characterizing the Dieppe landslide geometry with two main active scarps, retrogression of up to 40 m and about 100,000 m³ of eroded materials.
Abstract:
The extension of traditional data mining methods to time series has been effectively applied to a wide range of domains such as finance, econometrics, biology, security, and medicine. Many existing mining methods deal with the task of change-point detection, but very few provide a flexible approach. Querying specific change points with linguistic variables is particularly useful in crime analysis, where intuitive, understandable, and appropriate detection of changes can significantly improve the allocation of resources for timely and concise operations. In this paper, we propose an on-line method for detecting and querying change points in crime-related time series with the use of a meaningful representation and a fuzzy inference system. Change-point detection is based on a shape space representation, and linguistic terms describing geometric properties of the change points are used to express queries, offering the advantage of intuitiveness and flexibility. An empirical evaluation is first conducted on a crime data set to confirm the validity of the proposed method and then on a financial data set to test its general applicability. A comparison to a similar change-point detection algorithm and a sensitivity analysis are also conducted. Results show that the method is able to accurately detect change points at very low computational cost. More broadly, the detection of specific change points within time series of virtually any domain is made more intuitive and more understandable, even for experts outside the field of data mining.