56 results for Datasets


Relevance: 10.00%

Abstract:

Background: Combining different sources of information to improve the available biological knowledge is currently a challenge in bioinformatics. Kernel-based methods are among the most powerful approaches for integrating heterogeneous data types. Kernel-based data integration consists of two basic steps: first, an appropriate kernel is chosen for each dataset; second, the kernels from the different data sources are combined to give a complete representation of the available data for a given statistical task. Results: We analyze the integration of data from several sources of information using kernel PCA, from the point of view of dimensionality reduction. Moreover, we improve the interpretability of kernel PCA by adding to the plot a representation of the input variables belonging to any of the datasets. In particular, for each input variable or linear combination of input variables, we can represent the local direction of maximum growth, which allows us to identify the samples with higher or lower values of the variables analyzed. Conclusions: The integration of different datasets, together with the simultaneous representation of samples and variables, gives us a better understanding of the underlying biology.
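
The two integration steps described above (one kernel per data source, then a combination of the kernels) can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: it assumes an equally weighted sum of an RBF kernel and a linear kernel as the combination rule, uses made-up data for six samples, and extracts the leading eigenvectors of the centred kernel matrix by power iteration in plain Python. The representation of input variables (local direction of maximum growth) is not reproduced.

```python
import math
import random


def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix for a list of feature vectors."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj)))
             for xj in X] for xi in X]


def linear_kernel(X):
    """Linear kernel (plain dot products) for a list of feature vectors."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]


def combine_kernels(kernels, weights):
    """Weighted sum of per-source kernels; non-negative weights keep it PSD."""
    n = len(kernels[0])
    return [[sum(w * K[i][j] for K, w in zip(kernels, weights)) for j in range(n)]
            for i in range(n)]


def center_kernel(K):
    """Centre K in feature space: Kc = H K H with H = I - (1/n) 11^T."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + total for j in range(n)] for i in range(n)]


def top_eigenpairs(A, k, iters=500, seed=0):
    """Leading eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    rng = random.Random(seed)
    n = len(A)
    A = [r[:] for r in A]
    pairs = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(A[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        # Deflate so that the next power iteration finds the next component.
        A = [[A[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def kernel_pca(K, k=2):
    """Scores of the training samples on the first k kernel principal components."""
    pairs = top_eigenpairs(center_kernel(K), k)
    scores = [[math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs] for i in range(len(K))]
    return scores, [lam for lam, _ in pairs]


# Hypothetical example: two data sources measured on the same six samples.
expression = [[1.0, 0.2, 0.5], [0.9, 0.1, 0.4], [0.2, 0.8, 0.9],
              [0.1, 0.9, 1.0], [0.5, 0.5, 0.5], [0.6, 0.4, 0.6]]
clinical = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.5, 0.5], [0.5, 0.5]]

K = combine_kernels([rbf_kernel(expression, gamma=1.0), linear_kernel(clinical)], [0.5, 0.5])
scores, eigenvalues = kernel_pca(K, k=2)
```

Each column of `scores` is one kernel principal component; as with any eigenvector, its sign is arbitrary. In practice a library routine (for example scikit-learn's `KernelPCA` with `kernel="precomputed"`) would replace the hand-written eigensolver.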

Relevance: 10.00%

Abstract:

DnaSP is a software package for the comprehensive analysis of DNA polymorphism data. Version 5 implements a number of new features and analytical methods that allow extensive DNA polymorphism analyses on large datasets. Among other features, the newly implemented methods allow for: (i) analyses of multiple data files; (ii) haplotype phasing; (iii) analyses of insertion/deletion polymorphism data; and (iv) visualization of sliding-window results integrated with the genome annotations available in the UCSC browser.
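
As an illustration of the kind of sliding-window statistic such software reports, the sketch below computes nucleotide diversity (π, the average number of pairwise differences per site) in windows along an alignment. It is a simplified, hypothetical version and not DnaSP's algorithm: columns containing a gap or `N` are simply skipped, and the alignment is made up.

```python
from itertools import combinations


def nucleotide_diversity(seqs, start, end):
    """Average pairwise differences per site over alignment columns [start, end).

    Columns with a gap ('-') or an ambiguous base ('N') in any sequence are
    skipped, a simplifying assumption for this sketch.
    """
    sites = [i for i in range(start, end) if all(s[i] not in "-N" for s in seqs)]
    if not sites:
        return None
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a[i] != b[i] for i in sites) for a, b in pairs)
    return diffs / len(pairs) / len(sites)


def sliding_window_pi(seqs, window, step):
    """Return (0-based window start, pi) for each window along the alignment."""
    length = len(seqs[0])
    return [(start, nucleotide_diversity(seqs, start, start + window))
            for start in range(0, length - window + 1, step)]


# Hypothetical alignment of four sequences.
alignment = ["ACGTACGTAC",
             "ACGTACGTTC",
             "ACCTACGTAC",
             "ACGTACGAAC"]
windows = sliding_window_pi(alignment, window=5, step=5)
```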

Relevance: 10.00%

Abstract:

In this work, a new one-class classification ensemble strategy called the approximate polytope ensemble is presented. The paper makes three main contributions. First, the geometric concept of the convex hull is used to define the boundary of the target class defining the problem. Expansions and contractions of this geometric structure are introduced to avoid overfitting. Second, the decision of whether a point belongs to the convex hull model in high-dimensional spaces is approximated by means of random projections and an ensemble decision process. Finally, a tiling strategy is proposed to model non-convex structures. Experimental results show that the proposed strategy is significantly better than state-of-the-art one-class classification methods on over 200 datasets.
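
A minimal sketch of the random-projection idea, under stated assumptions: each of `n_proj` Gaussian random projections maps the target data to 2-D, the projected convex hull is expanded (`alpha > 0`) or contracted (`alpha < 0`) about the mean of its vertices, and a test point is accepted only if it falls inside every projected hull. The parameter names, the scaling rule and the all-projections decision rule are assumptions for illustration, not necessarily the paper's exact choices; the tiling strategy for non-convex classes is not shown.

```python
import random


def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside(hull, p):
    """True if p lies inside or on the boundary of a counter-clockwise convex polygon."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))


def project(P, x):
    """Project a d-dimensional point to 2-D with the 2 x d matrix P."""
    return tuple(sum(w * v for w, v in zip(row, x)) for row in P)


def fit_ape(X, n_proj=20, alpha=0.1, seed=0):
    """One scaled 2-D hull per Gaussian random projection (hypothetical parameters)."""
    rng = random.Random(seed)
    d = len(X[0])
    models = []
    for _ in range(n_proj):
        P = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(2)]
        hull = convex_hull([project(P, x) for x in X])
        cx = sum(h[0] for h in hull) / len(hull)
        cy = sum(h[1] for h in hull) / len(hull)
        # Expand (alpha > 0) or contract (alpha < 0) about the mean of the hull vertices.
        hull = [(cx + (1 + alpha) * (hx - cx), cy + (1 + alpha) * (hy - cy)) for hx, hy in hull]
        models.append((P, hull))
    return models


def predict_ape(models, x):
    """Accept x as target only if every projected hull contains its projection."""
    return all(inside(hull, project(P, x)) for P, hull in models)


# Hypothetical target class: the corners of the unit cube in 3-D.
cube = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
models = fit_ape(cube)
inside_center = predict_ape(models, (0.5, 0.5, 0.5))
far_point = predict_ape(models, (5.0, 5.0, 5.0))
```

A point inside the high-dimensional hull always projects inside every projected hull, so with `alpha >= 0` this approximation can only err by accepting points that lie outside the true polytope; more projections tighten it.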

Relevance: 10.00%

Abstract:

This case study deals with the monitoring of a rock face in an urban area using a terrestrial laser scanner (TLS). The pilot study area is an almost vertical, fifty-meter-high cliff on top of which the village of Castellfollit de la Roca is located. Rockfall activity is currently causing a retreat of the rock face, which may endanger the houses located at its edge. The TLS datasets consist of high-density 3-D point clouds acquired from five stations, nine times over a span of 22 months (from March 2006 to January 2008). Change detection, i.e. the detection of rockfalls, was performed through a sequential comparison of the datasets. Two types of mass movement were detected in the monitoring period: (a) detachment of single basaltic columns, with magnitudes below 1.5 m³, and (b) detachment of groups of columns, with magnitudes of 1.5 to 150 m³. Furthermore, the historical record revealed (c) the occurrence of slab failures with magnitudes higher than 150 m³. Displacements of a likely slab failure were measured, suggesting an apparently stationary stage. Even though failures are clearly episodic, our results, together with the study of the historical record, enabled us to estimate a mean detachment of material of 46 to 91.5 m³ year⁻¹. The application of TLS considerably improved our understanding of rockfall phenomena in the study area.
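
The sequential comparison of epochs can be illustrated with a simple cloud-to-cloud nearest-neighbour distance: points of the earlier scan with no point of the later scan within a tolerance are flagged as lost material. This is a hypothetical, brute-force sketch that assumes the two clouds are already co-registered; the threshold, the grid spacing and the crude volume estimate (distance times the face area each point represents) are illustrative assumptions, not the procedure used in the study.

```python
import math


def nearest_distances(points, reference):
    """For each point, the distance to its nearest neighbour in `reference` (brute force)."""
    return [min(math.dist(p, q) for q in reference) for p in points]


def detect_loss(epoch_before, epoch_after, threshold):
    """(index, distance) of earlier points with no later point within `threshold`."""
    d = nearest_distances(epoch_before, epoch_after)
    return [(i, di) for i, di in enumerate(d) if di > threshold]


def rough_volume(changes, cell_area):
    """Crude volume: each changed point stands for `cell_area` of face times its distance."""
    return sum(di for _, di in changes) * cell_area


# Hypothetical 5 x 5 grid on the face plane x = 0 (1 m spacing); a 2 x 2 patch retreats 0.5 m.
before = [(0.0, y, z) for y in range(5) for z in range(5)]
after = [(-0.5 if (y in (1, 2) and z in (1, 2)) else 0.0, y, z) for (_, y, z) in before]
changes = detect_loss(before, after, threshold=0.2)
volume = rough_volume(changes, cell_area=1.0)
```

For clouds of realistic size, the brute-force search would be replaced by a spatial index such as a k-d tree.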

Relevance: 10.00%

Abstract:

Over the past two decades, several fungal outbreaks caused by Cryptococcus gattii have occurred, including the high-profile 'Vancouver Island' and 'Pacific Northwest' outbreaks, which have affected hundreds of otherwise healthy humans and animals. Over the same period, C. gattii caused several additional case clusters at localities outside the tropical and subtropical climate zones where the species normally occurs. In every case, the causative agent belongs to a previously rare genotype of C. gattii called AFLP6/VGII, but the origin of the outbreak clades remains enigmatic. Here we used phylogenetic and recombination analyses, based on AFLP and multiple MLST datasets, together with coalescent gene genealogy, to demonstrate that these outbreaks have arisen from a highly recombining C. gattii population in the native rainforest of Northern Brazil. Thus, the modern virulent C. gattii AFLP6/VGII outbreak lineages derived from mating events in South America and then dispersed to temperate regions, where they cause serious infections in humans and animals.

Relevance: 10.00%

Abstract:

In this paper, an advanced technique for generating deformation maps from synthetic aperture radar (SAR) data is presented. The algorithm estimates the linear and nonlinear components of the displacement, the error of the digital elevation model (DEM) used to cancel the topographic terms, and the atmospheric artifacts from a reduced set of low-spatial-resolution interferograms. The pixel candidates are selected from those presenting a good coherence level across the whole set of interferograms, and the resulting nonuniform mesh is tessellated with a Delaunay triangulation to establish connections among them. The linear component of the movement and the DEM error are estimated by fitting a linear model to the data, only on the connections. Later, this information, once unwrapped to retrieve the absolute values, is used to calculate the nonlinear component of the movement and the atmospheric artifacts with alternate filtering techniques in both the temporal and spatial domains. The method is highly flexible with respect to the required number of images and the baseline lengths. However, better results are obtained with large datasets of short-baseline interferograms. The technique has been tested with European Remote Sensing SAR data from an area of Catalonia (Spain) and validated with precise on-field leveling measurements.
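
The linear-model step on a connection can be sketched with the commonly used phase model Δφ_i = (4π/λ)·(T_i·Δv + B⊥,i·Δε / (R·sin θ)), where T_i is the temporal baseline, B⊥,i the perpendicular baseline, Δv the velocity difference and Δε the DEM error difference between the two connected pixels. Because the observed phases are wrapped, the sketch fits (Δv, Δε) by maximizing the model coherence |(1/N) Σ exp(j(Δφ_i − model_i))| over a grid rather than by ordinary least squares. The sign convention, the ERS-like geometry constants, the grids and the synthetic baselines are all assumptions; the paper's exact estimator may differ.

```python
import cmath
import math

WAVELENGTH = 0.0566            # m, C-band wavelength of ERS
SLANT_RANGE = 850e3            # m, assumed typical slant range
INCIDENCE = math.radians(23)   # assumed typical incidence angle


def model_phase(v, eps, t, b):
    """Phase of a linear velocity v (m/yr) plus DEM error eps (m) for one interferogram."""
    k = 4 * math.pi / WAVELENGTH
    return k * (t * v + b * eps / (SLANT_RANGE * math.sin(INCIDENCE)))


def model_coherence(phases, times, baselines, v, eps):
    """|mean of exp(j * residual phase)|; equals 1 when the model explains every phase."""
    acc = sum(cmath.exp(1j * (phi - model_phase(v, eps, t, b)))
              for phi, t, b in zip(phases, times, baselines))
    return abs(acc) / len(phases)


def fit_linear_model(phases, times, baselines, v_grid, eps_grid):
    """Grid search for the (v, eps) pair that maximizes the model coherence."""
    return max(((v, e) for v in v_grid for e in eps_grid),
               key=lambda ve: model_coherence(phases, times, baselines, *ve))


# Hypothetical connection: temporal (years) and perpendicular (m) baselines of 8 interferograms.
times = [0.1, 0.35, 0.7, 1.05, 1.4, 2.1, 0.5, 1.75]
baselines = [50, -120, 200, -30, 90, 150, -200, 10]
true_v, true_eps = 0.005, 8.0
# Noise-free wrapped phase differences generated from the model itself.
phases = [cmath.phase(cmath.exp(1j * model_phase(true_v, true_eps, t, b)))
          for t, b in zip(times, baselines)]

v_grid = [-0.02 + 0.0005 * i for i in range(81)]
eps_grid = [-20 + 0.5 * i for i in range(81)]
v_hat, eps_hat = fit_linear_model(phases, times, baselines, v_grid, eps_grid)
```

Working with coherence on wrapped phases avoids unwrapping before the linear terms are known, which is consistent with the abstract's statement that unwrapping happens afterwards.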

Relevance: 10.00%

Abstract:

Forecasting coal resources and reserves is critical for coal mine development. Thickness maps are commonly used for assessing coal resources and reserves; however, they are limited in capturing coal splitting effects in thick and heterogeneous coal zones. As an alternative, three-dimensional geostatistical methods are used to populate the facies distribution within a densely drilled, heterogeneous coal zone in the As Pontes Basin (NW Spain). Coal distribution in this zone is mainly characterized by coal-dominated areas in the central parts of the basin interfingering with terrigenous-dominated alluvial fan zones at the margins. The three-dimensional models obtained are applied to forecast coal resources and reserves. Predictions using subsets of the entire dataset are also generated to understand the performance of the methods under limited data. Three-dimensional facies interpolation methods tend to overestimate coal resources and reserves due to interpolation smoothing. Facies simulation methods yield resource predictions similar to those of conventional thickness-map approximations. Reserves predicted by facies simulation methods are mainly influenced by: a) the specific coal proportion threshold used to determine whether a block can be recovered, and b) the capability of the modelling strategy to reproduce areal trends in coal proportions and splitting between coal-dominated and terrigenous-dominated areas of the basin. Reserve predictions differ between the simulation methods, even with dense conditioning datasets. Simulation methods can be ranked according to the correlation of their outputs with predictions from the directly interpolated coal proportion maps: a) with low-density datasets, sequential indicator simulation with trends yields the best correlation; b) with high-density datasets, sequential indicator simulation with post-processing yields the best correlation, because the areal trends are provided implicitly by the dense conditioning data.
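
The influence of the recoverability threshold on reserves can be illustrated with a toy block model. In this hypothetical sketch, resources are the total coal volume in place and reserves count only blocks whose coal proportion reaches the threshold; the block values, the block volume and this simple recoverability rule are assumptions, not the criteria used in the As Pontes study.

```python
def resources_and_reserves(coal_proportions, block_volume, threshold):
    """Coal volume in place (resources) and in blocks deemed recoverable (reserves).

    A block counts towards reserves only if its coal proportion reaches
    `threshold`, a simplified, hypothetical recoverability rule.
    """
    resources = sum(p * block_volume for p in coal_proportions)
    reserves = sum(p * block_volume for p in coal_proportions if p >= threshold)
    return resources, reserves


# Hypothetical coal proportions of six 1000 m^3 blocks from one simulated realization.
blocks = [0.9, 0.75, 0.55, 0.4, 0.2, 0.05]
for t in (0.3, 0.5, 0.7):
    print(t, resources_and_reserves(blocks, 1000.0, t))
```

Resources do not depend on the threshold, while reserves change stepwise as blocks cross it, which is why realizations that place coal proportions differently near the threshold can give different reserve estimates.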

Relevance: 10.00%

Abstract:

This paper presents a novel image classification scheme for benthic coral reef images that can be applied to both single-image and composite mosaic datasets. The proposed method can be configured to the characteristics of individual datasets (e.g., the size of the dataset, number of classes, resolution of the samples, availability of color information, class types, etc.). The proposed method uses completed local binary patterns (CLBP), the grey level co-occurrence matrix (GLCM), Gabor filter responses, and opponent angle and hue channel color histograms as feature descriptors. For classification, either k-nearest neighbor (KNN), neural network (NN), support vector machine (SVM) or probability density weighted mean distance (PDWMD) is used. The combination of features and classifiers that attains the best results is presented together with guidelines for selection. The accuracy and efficiency of the proposed method are compared with other state-of-the-art techniques using three benthic and three texture datasets. The proposed method achieves the highest overall classification accuracy of all the tested methods and has a moderate execution time. Finally, the proposed classification scheme is applied to a large-scale image mosaic of the Red Sea to create a fully classified thematic map of the reef benthos.
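
Of the feature descriptors listed, the GLCM is simple enough to sketch in a few lines. The example below is a plain-Python illustration, not the authors' code: it builds a normalized, symmetric co-occurrence matrix for one pixel offset from an image already quantized to `levels` grey levels and derives three standard Haralick-style statistics. The offset, the number of levels, the patch and the choice of statistics are assumptions.

```python
def glcm(image, dx=1, dy=0, levels=8, symmetric=True):
    """Normalized grey level co-occurrence matrix for the pixel offset (dx, dy)."""
    P = [[0.0] * levels for _ in range(levels)]
    h, w = len(image), len(image[0])
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                i, j = image[y][x], image[y2][x2]
                P[i][j] += 1
                if symmetric:
                    P[j][i] += 1
    total = sum(map(sum, P))
    return [[v / total for v in row] for row in P] if total else P


def contrast(P):
    """Sum of P(i, j) * (i - j)^2: large for abrupt grey-level changes."""
    n = len(P)
    return sum(P[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def homogeneity(P):
    """Sum of P(i, j) / (1 + (i - j)^2): large when co-occurring levels are similar."""
    n = len(P)
    return sum(P[i][j] / (1 + (i - j) ** 2) for i in range(n) for j in range(n))


def angular_second_moment(P):
    """Sum of squared entries: large for uniform textures."""
    return sum(v * v for row in P for v in row)


# Hypothetical 4 x 4 patch already quantized to 4 grey levels.
patch = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
P = glcm(patch, dx=1, dy=0, levels=4)
features = [contrast(P), homogeneity(P), angular_second_moment(P)]
```

In practice, features for several offsets and angles are usually concatenated before being passed to a classifier such as KNN or SVM; scikit-image provides `graycomatrix` and `graycoprops` for the same computation.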

Relevance: 10.00%

Abstract:

Background: The G1-to-S transition of the cell cycle in the yeast Saccharomyces cerevisiae involves an extensive transcriptional program driven by the transcription factors SBF (Swi4-Swi6) and MBF (Mbp1-Swi6). Activation of these factors ultimately depends on the G1 cyclin Cln3. Results: To determine the transcriptional targets of Cln3 and their dependence on SBF or MBF, we first used DNA microarrays to interrogate gene expression upon Cln3 overexpression in synchronized cultures of strains lacking components of SBF and/or MBF. Second, we integrated this expression dataset with other heterogeneous data sources into a single probabilistic model based on Bayesian statistics. Our analysis produced more than 200 transcription factor-target assignments, validated by ChIP assays and by functional enrichment. Our predictions show higher internal coherence and predictive power than previous classifications. Our results support a model in which SBF and MBF may be differentially activated by Cln3. Conclusions: Integration of heterogeneous genome-wide datasets is key to building accurate transcriptional networks. Through such integration, we provide a reliable transcriptional network for the G1-to-S transition in the budding yeast cell cycle. Our results suggest that, to improve the reliability of predictions, our models need to be fed with more informative experimental data.
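
The abstract does not detail the probabilistic model, so the following is only a hypothetical illustration of how heterogeneous evidence can be combined with Bayes' rule: a naive-Bayes-style update that assumes the data sources are conditionally independent and that each contributes a likelihood ratio for the hypothesis "this gene is a target of this factor". The prior, the source names and the ratios are invented for the example.

```python
import math


def posterior_probability(prior, likelihood_ratios):
    """Posterior P(target | evidence) assuming conditionally independent sources.

    Each likelihood ratio is P(evidence | target) / P(evidence | not target).
    """
    log_odds = math.log(prior / (1.0 - prior))
    log_odds += sum(math.log(lr) for lr in likelihood_ratios)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical likelihood ratios for one gene and one transcription factor.
evidence = {
    "expression change upon Cln3 overexpression": 4.0,
    "factor binding data": 6.0,
    "promoter motif present": 1.5,
}
p = posterior_probability(prior=0.02, likelihood_ratios=evidence.values())
print(f"P(target) = {p:.3f}")
```

Working in log-odds makes each source an additive contribution, so a weak source (ratio near 1) barely moves the posterior while a strong one dominates.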

Relevance: 10.00%

Abstract:

SEPServer is a three-year collaborative project funded by the Seventh Framework Programme (FP7-SPACE) of the European Union. The objective of the project is to provide the scientific community with access to state-of-the-art observations and analysis tools for solar energetic particle (SEP) events and related electromagnetic (EM) emissions. The project will eventually lead to a better understanding of the particle acceleration and transport processes at the Sun and in the inner heliosphere. These processes give rise to SEP events, which are one of the key elements of space weather. In this paper we present the first results from the systematic analysis of the following datasets: SOHO/ERNE, SOHO/EPHIN, ACE/EPAM, Wind/WAVES and GOES X-rays. A catalogue of SEP events at 1 AU, with complete coverage of solar cycle 23, based on high-energy (~68 MeV) protons from SOHO/ERNE and on electron recordings of the events by SOHO/EPHIN and ACE/EPAM, is presented. A total of 115 energetic particle events have been identified and analysed using velocity dispersion analysis (VDA) for protons and time-shifting analysis (TSA) for electrons and protons in order to infer the SEP release times at the Sun. EM observations during the times of the SEP event onsets have been gathered and compared with the release time estimates of the particles. Data from the events that occurred during the European daytime, i.e., those that also have observations from the ground-based observatories included in SEPServer, are listed, and a preliminary analysis of their associations is presented. We find that VDA can be a useful tool for the analysis of proton release times, but if the derived proton path length falls outside the range 1 AU < s < 3 AU, the result of the analysis may be compromised, as indicated by the anti-correlation between the derived path length and the release time delay from the associated X-ray flare.
The average path length derived from VDA is about 1.9 times the nominal length of the spiral magnetic field line. This implies that the path length of the first-arriving MeV to deka-MeV protons is affected by interplanetary scattering. TSA of near-relativistic electrons yields release times that scatter significantly with respect to the EM emissions, with a tendency to be more delayed as the distance between the flare and the nominal footpoint of the Earth-connected field line increases.
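
Velocity dispersion analysis rests on the relation t_onset(E) = t_release + s / v(E): plotting onset times against 1/β gives a straight line whose intercept is the release time and whose slope is s/c. The sketch below fits that line by least squares for protons and also shows the TSA variant, which shifts a single onset time back by an assumed path length. The synthetic onsets and the 1.2 AU path used to generate them are assumptions for illustration, not SEPServer data.

```python
import math

C = 299_792_458.0            # speed of light, m/s
AU = 1.495978707e11          # astronomical unit, m
PROTON_REST_MEV = 938.272    # proton rest energy, MeV


def inverse_beta(kinetic_mev):
    """1/beta of a proton with the given kinetic energy (relativistic)."""
    gamma = 1.0 + kinetic_mev / PROTON_REST_MEV
    beta = math.sqrt(1.0 - 1.0 / gamma ** 2)
    return 1.0 / beta


def velocity_dispersion_analysis(energies_mev, onset_times_s):
    """Least-squares fit of t_onset = t_release + (s / c) / beta.

    Returns the release time (s) and the apparent path length (AU).
    """
    xs = [inverse_beta(e) for e in energies_mev]
    n = len(xs)
    mx, my = sum(xs) / n, sum(onset_times_s) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, onset_times_s))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope * C / AU


def time_shifting_analysis(onset_time_s, kinetic_mev, path_au):
    """Release time assuming scatter-free travel along an assumed path length."""
    return onset_time_s - path_au * AU * inverse_beta(kinetic_mev) / C


# Synthetic onsets (s after an arbitrary reference) for s = 1.2 AU and release at t = 1000 s.
energies = [10.0, 20.0, 40.0, 80.0]
onsets = [1000.0 + 1.2 * AU * inverse_beta(e) / C for e in energies]
t0, s_au = velocity_dispersion_analysis(energies, onsets)
suspicious = not (1.0 < s_au < 3.0)   # path-length range quoted in the abstract
```

With real data the onset times carry observational delays that grow at low intensities, which is one reason a derived path length outside the plausible range should make the fitted release time suspect.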

Relevance: 10.00%

Abstract:

This paper describes an evaluation framework that allows a standardized and quantitative comparison of IVUS lumen and media segmentation algorithms. This framework was introduced at the MICCAI 2011 Computing and Visualization for (Intra)Vascular Imaging (CVII) workshop, where the results of the eight participating teams were compared. We describe the available database, comprising multi-center, multi-vendor and multi-frequency IVUS datasets, their acquisition, the creation of the reference standard and the evaluation measures. The approaches address segmentation of the lumen, the media, or both borders; semi- or fully automatic operation; and 2-D vs. 3-D methodology. Three performance measures for quantitative analysis have been proposed. The results of the evaluation indicate that the vessel lumen and media can be segmented with an accuracy comparable to manual annotation when semi-automatic methods are used, and that fully automatic segmentation also yields encouraging results. The analysis performed in this paper also highlights the challenges in IVUS segmentation that remain to be solved.
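
The abstract does not name the three performance measures. A common choice for comparing a segmentation with a reference border is the Jaccard index, the percentage of area difference and the Hausdorff distance, and the sketch below assumes these. Masks are sets of pixel coordinates and contours are point lists, a deliberately simple representation chosen for illustration; the example masks are hypothetical.

```python
import math


def jaccard(mask_a, mask_b):
    """Jaccard index of two segmentations given as sets of (row, col) pixels."""
    union = mask_a | mask_b
    return len(mask_a & mask_b) / len(union) if union else 1.0


def percentage_area_difference(mask_auto, mask_ref):
    """|A_auto - A_ref| / A_ref, with areas counted in pixels."""
    return abs(len(mask_auto) - len(mask_ref)) / len(mask_ref)


def hausdorff(contour_a, contour_b):
    """Symmetric Hausdorff distance between two contours given as point lists."""
    def directed(p_set, q_set):
        return max(min(math.dist(p, q) for q in q_set) for p in p_set)
    return max(directed(contour_a, contour_b), directed(contour_b, contour_a))


# Hypothetical lumen masks: a 10 x 10 square (automatic) vs a 10 x 12 rectangle (reference).
auto = {(r, c) for r in range(10) for c in range(10)}
ref = {(r, c) for r in range(10) for c in range(12)}
print(jaccard(auto, ref), percentage_area_difference(auto, ref))
```

The area-based measures summarize overall overlap, while the Hausdorff distance captures the single worst local border error, so the measures complement each other.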

Relevance: 10.00%

Abstract:

Several clinical studies have reported that EEG synchrony is affected by Alzheimer’s disease (AD). In this paper, a frequency band analysis of AD EEG signals is presented, with the aim of improving the diagnosis of AD from EEG signals. Multiple synchrony measures, including correlation, phase synchrony and Granger causality measures, are assessed through statistical tests (Mann–Whitney U test). Moreover, linear discriminant analysis (LDA) is conducted with those synchrony measures as features. For the dataset at hand, the 5-6 Hz frequency range, which lies within the classical theta band (4-8 Hz), yields the best accuracy for diagnosing AD. The corresponding classification error is 4.88% for the directed transfer function (DTF) Granger causality measure. Interestingly, the results show that the EEG of AD patients is more synchronous than that of healthy subjects within the optimized 5-6 Hz range, which is in sharp contrast with the loss of synchrony in AD EEG reported in many earlier studies. This new finding may provide new insights into the neurophysiology of AD. Additional testing on larger AD datasets is required to verify the effectiveness of the proposed approach.
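
For reference, the Mann-Whitney U test used to compare synchrony measures between groups can be sketched as follows. It computes U from pooled ranks (tied values share the average rank) and a two-sided p-value from the normal approximation, without tie or continuity correction. This is a simplified illustration; a library routine such as `scipy.stats.mannwhitneyu` would normally be used. The synchrony values are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """U statistic for sample x and a two-sided p-value (normal approximation).

    No tie or continuity correction is applied to the variance (a simplification).
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    return u, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical 5-6 Hz synchrony values for two groups.
ad = [0.61, 0.58, 0.66, 0.70, 0.64, 0.59]
control = [0.52, 0.55, 0.49, 0.57, 0.50, 0.60]
u, p = mann_whitney_u(ad, control)
```

Being rank-based, the test makes no normality assumption about the synchrony values, which suits small clinical samples.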

Relevance: 10.00%

Abstract:

Gene filtering is a useful preprocessing technique often applied to microarray datasets. However, it is not common practice, because clear guidelines are lacking and it carries the risk of excluding potentially relevant genes. In this work, we propose to model microarray data as a mixture of two Gaussian distributions, which allows us to obtain an optimal filter threshold in terms of gene expression level.
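
The two-Gaussian idea can be sketched under assumptions not stated in the abstract: expression values are on a log scale, the lower component represents non-expressed genes and the upper one expressed genes, the mixture is fitted by expectation-maximization, and the filter threshold is taken as the point between the two means where the weighted component densities are equal (equal posterior probability). The data are simulated, and the equal-density criterion is one plausible definition of "optimal", not necessarily the authors'.

```python
import math
import random


def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def fit_two_gaussians(data, iters=200):
    """EM for a 1-D two-component Gaussian mixture; returns (w1, mu1, sd1, w2, mu2, sd2), mu1 < mu2."""
    xs = sorted(data)
    n = len(xs)
    mu1, mu2 = xs[n // 4], xs[3 * n // 4]
    mean = sum(xs) / n
    sd1 = sd2 = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    w1 = 0.5
    for _ in range(iters):
        # E-step: responsibility of component 1 for each point.
        r = []
        for x in xs:
            p1 = w1 * normal_pdf(x, mu1, sd1)
            p2 = (1 - w1) * normal_pdf(x, mu2, sd2)
            r.append(p1 / (p1 + p2) if p1 + p2 > 0 else 0.5)
        # M-step: re-estimate the weight, means and standard deviations.
        n1 = sum(r)
        n2 = n - n1
        w1 = n1 / n
        mu1 = sum(ri * x for ri, x in zip(r, xs)) / n1
        mu2 = sum((1 - ri) * x for ri, x in zip(r, xs)) / n2
        sd1 = max(math.sqrt(sum(ri * (x - mu1) ** 2 for ri, x in zip(r, xs)) / n1), 1e-6)
        sd2 = max(math.sqrt(sum((1 - ri) * (x - mu2) ** 2 for ri, x in zip(r, xs)) / n2), 1e-6)
    if mu1 > mu2:
        w1, mu1, sd1, mu2, sd2 = 1 - w1, mu2, sd2, mu1, sd1
    return w1, mu1, sd1, 1 - w1, mu2, sd2


def filter_threshold(params, tol=1e-9):
    """Point between the means where the weighted densities are equal (bisection)."""
    w1, mu1, sd1, w2, mu2, sd2 = params

    def f(x):
        return w1 * normal_pdf(x, mu1, sd1) - w2 * normal_pdf(x, mu2, sd2)

    lo, hi = mu1, mu2
    if f(lo) * f(hi) > 0:
        raise ValueError("no density crossing between the component means")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


# Simulated log-expression values: 300 "non-expressed" and 200 "expressed" genes.
rng = random.Random(1)
log_expr = ([rng.gauss(4.0, 0.5) for _ in range(300)]
            + [rng.gauss(10.0, 1.0) for _ in range(200)])
params = fit_two_gaussians(log_expr)
threshold = filter_threshold(params)
kept = [x for x in log_expr if x > threshold]
```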

Relevance: 10.00%

Abstract:

The purpose of our project is to contribute to earlier diagnosis of AD and better estimates of its severity by using automatic analysis performed through new biomarkers extracted with non-invasive intelligent methods. The methods selected in this case are speech biomarkers oriented to Spontaneous Speech and Emotional Response Analysis. Thus, the main goal of the present work is a feature search in Spontaneous Speech, oriented to pre-clinical evaluation, for the definition of a test for AD diagnosis based on a one-class classifier. The one-class classification problem differs from the multi-class one in one essential aspect: in one-class classification it is assumed that information about only one of the classes, the target class, is available. In this work we explore the problem of imbalanced datasets, which is particularly crucial in applications where the goal is to maximize recognition of the minority class, as in medical diagnosis. The use of information about outliers and of Fractal Dimension features improves system performance.
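
The abstract does not specify which fractal dimension estimator is used; Higuchi's method is a common choice for 1-D signals such as speech, and the sketch below assumes it. It measures the mean normalized curve length L(k) at time scales k = 1 to kmax and returns the slope of log L(k) against log(1/k); `kmax = 10` is an assumed default, and the test signals are synthetic.

```python
import math
import random


def higuchi_fd(x, kmax=10):
    """Higuchi fractal dimension of a 1-D signal: slope of log L(k) vs log(1/k)."""
    n = len(x)
    log_inv_k, log_len = [], []
    for k in range(1, kmax + 1):
        lengths = []
        for m in range(k):
            steps = (n - 1 - m) // k
            if steps < 1:
                continue
            dist = sum(abs(x[m + i * k] - x[m + (i - 1) * k]) for i in range(1, steps + 1))
            # Normalize for the number of steps actually taken, then divide by k.
            lengths.append(dist * (n - 1) / (steps * k) / k)
        mean_len = sum(lengths) / len(lengths)
        if mean_len <= 0:
            raise ValueError("signal has zero curve length (constant?)")
        log_inv_k.append(math.log(1.0 / k))
        log_len.append(math.log(mean_len))
    mx = sum(log_inv_k) / kmax
    my = sum(log_len) / kmax
    num = sum((a - mx) * (b - my) for a, b in zip(log_inv_k, log_len))
    den = sum((a - mx) ** 2 for a in log_inv_k)
    return num / den


# Synthetic signals: a straight line (FD close to 1) and white noise (FD close to 2).
rng = random.Random(0)
line = [2.0 * i for i in range(200)]
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]
print(higuchi_fd(line), higuchi_fd(noise))
```

In a speech pipeline the estimator would typically be applied frame by frame, yielding one feature value per frame for the classifier.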