813 resultados para microarray data classification


Relevância:

40.00% 40.00%

Publicador:

Resumo:

Gene set enrichment (GSE) analysis is a popular framework for condensing information from gene expression profiles into a pathway or signature summary. The strengths of this approach over single gene analysis include noise and dimension reduction, as well as greater biological interpretability. As molecular profiling experiments move beyond simple case-control studies, robust and flexible GSE methodologies are needed that can model pathway activity within highly heterogeneous data sets. To address this challenge, we introduce Gene Set Variation Analysis (GSVA), a GSE method that estimates variation of pathway activity over a sample population in an unsupervised manner. We demonstrate the robustness of GSVA in a comparison with current state of the art sample-wise enrichment methods. Further, we provide examples of its utility in differential pathway activity and survival analysis. Lastly, we show how GSVA works analogously with data from both microarray and RNA-seq experiments. GSVA provides increased power to detect subtle pathway activity changes over a sample population in comparison to corresponding methods. While GSE methods are generally regarded as end points of a bioinformatic analysis, GSVA constitutes a starting point to build pathway-centric models of biology. Moreover, GSVA contributes to the current need of GSE methods for RNA-seq data. GSVA is an open source software package for R which forms part of the Bioconductor project and can be downloaded at http://www.bioconductor.org.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

The main objective of this study was todo a statistical analysis of ecological type from optical satellite data, using Tipping's sparse Bayesian algorithm. This thesis uses "the Relevence Vector Machine" algorithm in ecological classification betweenforestland and wetland. Further this bi-classification technique was used to do classification of many other different species of trees and produces hierarchical classification of entire subclasses given as a target class. Also, we carried out an attempt to use airborne image of same forest area. Combining it with image analysis, using different image processing operation, we tried to extract good features and later used them to perform classification of forestland and wetland.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In vivo proton magnetic resonance spectroscopy (¹H-MRS) is a technique capable of assessing biochemical content and pathways in normal and pathological tissue. In the brain, ¹H-MRS complements the information given by magnetic resonance images. The main goal of the present study was to assess the accuracy of ¹H-MRS for the classification of brain tumors in a pilot study comparing results obtained by manual and semi-automatic quantification of metabolites. In vivo single-voxel ¹H-MRS was performed in 24 control subjects and 26 patients with brain neoplasms that included meningiomas, high-grade neuroglial tumors and pilocytic astrocytomas. Seven metabolite groups (lactate, lipids, N-acetyl-aspartate, glutamate and glutamine group, total creatine, total choline, myo-inositol) were evaluated in all spectra by two methods: a manual one consisting of integration of manually defined peak areas, and the advanced method for accurate, robust and efficient spectral fitting (AMARES), a semi-automatic quantification method implemented in the jMRUI software. Statistical methods included discriminant analysis and the leave-one-out cross-validation method. Both manual and semi-automatic analyses detected differences in metabolite content between tumor groups and controls (P < 0.005). The classification accuracy obtained with the manual method was 75% for high-grade neuroglial tumors, 55% for meningiomas and 56% for pilocytic astrocytomas, while for the semi-automatic method it was 78, 70, and 98%, respectively. Both methods classified all control subjects correctly. The study demonstrated that ¹H-MRS accurately differentiated normal from tumoral brain tissue and confirmed the superiority of the semi-automatic quantification method.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

The recent rapid development of biotechnological approaches has enabled the production of large whole genome level biological data sets. In order to handle thesedata sets, reliable and efficient automated tools and methods for data processingand result interpretation are required. Bioinformatics, as the field of studying andprocessing biological data, tries to answer this need by combining methods and approaches across computer science, statistics, mathematics and engineering to studyand process biological data. The need is also increasing for tools that can be used by the biological researchers themselves who may not have a strong statistical or computational background, which requires creating tools and pipelines with intuitive user interfaces, robust analysis workflows and strong emphasis on result reportingand visualization. Within this thesis, several data analysis tools and methods have been developed for analyzing high-throughput biological data sets. These approaches, coveringseveral aspects of high-throughput data analysis, are specifically aimed for gene expression and genotyping data although in principle they are suitable for analyzing other data types as well. Coherent handling of the data across the various data analysis steps is highly important in order to ensure robust and reliable results. Thus,robust data analysis workflows are also described, putting the developed tools andmethods into a wider context. The choice of the correct analysis method may also depend on the properties of the specific data setandthereforeguidelinesforchoosing an optimal method are given. The data analysis tools, methods and workflows developed within this thesis have been applied to several research studies, of which two representative examplesare included in the thesis. The first study focuses on spermatogenesis in murinetestis and the second one examines cell lineage specification in mouse embryonicstem cells.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Site-specific management requires accurate knowledge of the spatial variation in a range of soil properties within fields. This involves considerable sampling effort, which is costly. Ancillary data, such as crop yield, elevation and apparent electrical conductivity (ECa) of the soil, can provide insight into the spatial variation of some soil properties. A multivariate classification with spatial constraint imposed by the variogram was used to classify data from two arable crop fields. The yield data comprised 5 years of crop yield, and the ancillary data 3 years of yield data, elevation and ECa. Information on soil chemical and physical properties was provided by intensive surveys of the soil. Multivariate variograms computed from these data were used to constrain sites spatially within classes to increase their contiguity. The constrained classifications resulted in coherent classes, and those based on the ancillary data were similar to those from the soil properties. The ancillary data seemed to identify areas in the field where the soil is reasonably homogeneous. The results of targeted sampling showed that these classes could be used as a basis for management and to guide future sampling of the soil.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Airborne LIght Detection And Ranging (LIDAR) provides accurate height information for objects on the earth, which makes LIDAR become more and more popular in terrain and land surveying. In particular, LIDAR data offer vital and significant features for land-cover classification which is an important task in many application domains. In this paper, an unsupervised approach based on an improved fuzzy Markov random field (FMRF) model is developed, by which the LIDAR data, its co-registered images acquired by optical sensors, i.e. aerial color image and near infrared image, and other derived features are fused effectively to improve the ability of the LIDAR system for the accurate land-cover classification. In the proposed FMRF model-based approach, the spatial contextual information is applied by modeling the image as a Markov random field (MRF), with which the fuzzy logic is introduced simultaneously to reduce the errors caused by the hard classification. Moreover, a Lagrange-Multiplier (LM) algorithm is employed to calculate a maximum A posteriori (MAP) estimate for the classification. The experimental results have proved that fusing the height data and optical images is particularly suited for the land-cover classification. The proposed approach works very well for the classification from airborne LIDAR data fused with its coregistered optical images and the average accuracy is improved to 88.9%.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Distributed and collaborative data stream mining in a mobile computing environment is referred to as Pocket Data Mining PDM. Large amounts of available data streams to which smart phones can subscribe to or sense, coupled with the increasing computational power of handheld devices motivates the development of PDM as a decision making system. This emerging area of study has shown to be feasible in an earlier study using technological enablers of mobile software agents and stream mining techniques [1]. A typical PDM process would start by having mobile agents roam the network to discover relevant data streams and resources. Then other (mobile) agents encapsulating stream mining techniques visit the relevant nodes in the network in order to build evolving data mining models. Finally, a third type of mobile agents roam the network consulting the mining agents for a final collaborative decision, when required by one or more users. In this paper, we propose the use of distributed Hoeffding trees and Naive Bayes classifers in the PDM framework over vertically partitioned data streams. Mobile policing, health monitoring and stock market analysis are among the possible applications of PDM. An extensive experimental study is reported showing the effectiveness of the collaborative data mining with the two classifers.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Pocket Data Mining (PDM) describes the full process of analysing data streams in mobile ad hoc distributed environments. Advances in mobile devices like smart phones and tablet computers have made it possible for a wide range of applications to run in such an environment. In this paper, we propose the adoption of data stream classification techniques for PDM. Evident by a thorough experimental study, it has been proved that running heterogeneous/different, or homogeneous/similar data stream classification techniques over vertically partitioned data (data partitioned according to the feature space) results in comparable performance to batch and centralised learning techniques.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Advances in hardware and software in the past decade allow to capture, record and process fast data streams at a large scale. The research area of data stream mining has emerged as a consequence from these advances in order to cope with the real time analysis of potentially large and changing data streams. Examples of data streams include Google searches, credit card transactions, telemetric data and data of continuous chemical production processes. In some cases the data can be processed in batches by traditional data mining approaches. However, in some applications it is required to analyse the data in real time as soon as it is being captured. Such cases are for example if the data stream is infinite, fast changing, or simply too large in size to be stored. One of the most important data mining techniques on data streams is classification. This involves training the classifier on the data stream in real time and adapting it to concept drifts. Most data stream classifiers are based on decision trees. However, it is well known in the data mining community that there is no single optimal algorithm. An algorithm may work well on one or several datasets but badly on others. This paper introduces eRules, a new rule based adaptive classifier for data streams, based on an evolving set of Rules. eRules induces a set of rules that is constantly evaluated and adapted to changes in the data stream by adding new and removing old rules. It is different from the more popular decision tree based classifiers as it tends to leave data instances rather unclassified than forcing a classification that could be wrong. The ongoing development of eRules aims to improve its accuracy further through dynamic parameter setting which will also address the problem of changing feature domain values.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Airborne lidar provides accurate height information of objects on the earth and has been recognized as a reliable and accurate surveying tool in many applications. In particular, lidar data offer vital and significant features for urban land-cover classification, which is an important task in urban land-use studies. In this article, we present an effective approach in which lidar data fused with its co-registered images (i.e. aerial colour images containing red, green and blue (RGB) bands and near-infrared (NIR) images) and other derived features are used effectively for accurate urban land-cover classification. The proposed approach begins with an initial classification performed by the Dempster–Shafer theory of evidence with a specifically designed basic probability assignment function. It outputs two results, i.e. the initial classification and pseudo-training samples, which are selected automatically according to the combined probability masses. Second, a support vector machine (SVM)-based probability estimator is adopted to compute the class conditional probability (CCP) for each pixel from the pseudo-training samples. Finally, a Markov random field (MRF) model is established to combine spatial contextual information into the classification. In this stage, the initial classification result and the CCP are exploited. An efficient belief propagation (EBP) algorithm is developed to search for the global minimum-energy solution for the maximum a posteriori (MAP)-MRF framework in which three techniques are developed to speed up the standard belief propagation (BP) algorithm. Lidar and its co-registered data acquired by Toposys Falcon II are used in performance tests. The experimental results prove that fusing the height data and optical images is particularly suited for urban land-cover classification. There is no training sample needed in the proposed approach, and the computational cost is relatively low. An average classification accuracy of 93.63% is achieved.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

The bewildering complexity of cortical microcircuits at the single cell level gives rise to surprisingly robust emergent activity patterns at the level of laminar and columnar local field potentials (LFPs) in response to targeted local stimuli. Here we report the results of our multivariate data-analytic approach based on simultaneous multi-site recordings using micro-electrode-array chips for investigation of the microcircuitary of rat somatosensory (barrel) cortex. We find high repeatability of stimulus-induced responses, and typical spatial distributions of LFP responses to stimuli in supragranular, granular, and infragranular layers, where the last form a particularly distinct class. Population spikes appear to travel with about 33 cm/s from granular to infragranular layers. Responses within barrel related columns have different profiles than those in neighbouring columns to the left or interchangeably to the right. Variations between slices occur, but can be minimized by strictly obeying controlled experimental protocols. Cluster analysis on normalized recordings indicates specific spatial distributions of time series reflecting the location of sources and sinks independent of the stimulus layer. Although the precise correspondences between single cell activity and LFPs are still far from clear, a sophisticated neuroinformatics approach in combination with multi-site LFP recordings in the standardized slice preparation is suitable for comparing normal conditions to genetically or pharmacologically altered situations based on real cortical microcircuitry.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Full-waveform laser scanning data acquired with a Riegl LMS-Q560 instrument were used to classify an orange orchard into orange trees, grass and ground using waveform parameters alone. Gaussian decomposition was performed on this data capture from the National Airborne Field Experiment in November 2006 using a custom peak-detection procedure and a trust-region-reflective algorithm for fitting Gauss functions. Calibration was carried out using waveforms returned from a road surface, and the backscattering coefficient c was derived for every waveform peak. The processed data were then analysed according to the number of returns detected within each waveform and classified into three classes based on pulse width and c. For single-peak waveforms the scatterplot of c versus pulse width was used to distinguish between ground, grass and orange trees. In the case of multiple returns, the relationship between first (or first plus middle) and last return c values was used to separate ground from other targets. Refinement of this classification, and further sub-classification into grass and orange trees was performed using the c versus pulse width scatterplots of last returns. In all cases the separation was carried out using a decision tree with empirical relationships between the waveform parameters. Ground points were successfully separated from orange tree points. The most difficult class to separate and verify was grass, but those points in general corresponded well with the grass areas identified in the aerial photography. The overall accuracy reached 91%, using photography and relative elevation as ground truth. The overall accuracy for two classes, orange tree and combined class of grass and ground, yielded 95%. Finally, the backscattering coefficient c of single-peak waveforms was also used to derive reflectance values of the three classes. The reflectance of the orange tree class (0.31) and ground class (0.60) are consistent with published values at the wavelength of the Riegl scanner (1550 nm). The grass class reflectance (0.46) falls in between the other two classes as might be expected, as this class has a mixture of the contributions of both vegetation and ground reflectance properties.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Recent studies showed that features extracted from brain MRIs can well discriminate Alzheimer’s disease from Mild Cognitive Impairment. This study provides an algorithm that sequentially applies advanced feature selection methods for findings the best subset of features in terms of binary classification accuracy. The classifiers that provided the highest accuracies, have been then used for solving a multi-class problem by the one-versus-one strategy. Although several approaches based on Regions of Interest (ROIs) extraction exist, the prediction power of features has not yet investigated by comparing filter and wrapper techniques. The findings of this work suggest that (i) the IntraCranial Volume (ICV) normalization can lead to overfitting and worst the accuracy prediction of test set and (ii) the combined use of a Random Forest-based filter with a Support Vector Machines-based wrapper, improves accuracy of binary classification.