813 resultados para microarray data classification
Resumo:
Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies.
Resumo:
Grasslands in semi-arid regions, like Mongolian steppes, are facing desertification and degradation processes, due to climate change. Mongolia’s main economic activity consists on an extensive livestock production and, therefore, it is a concerning matter for the decision makers. Remote sensing and Geographic Information Systems provide the tools for advanced ecosystem management and have been widely used for monitoring and management of pasture resources. This study investigates which is the higher thematic detail that is possible to achieve through remote sensing, to map the steppe vegetation, using medium resolution earth observation imagery in three districts (soums) of Mongolia: Dzag, Buutsagaan and Khureemaral. After considering different thematic levels of detail for classifying the steppe vegetation, the existent pasture types within the steppe were chosen to be mapped. In order to investigate which combination of data sets yields the best results and which classification algorithm is more suitable for incorporating these data sets, a comparison between different classification methods were tested for the study area. Sixteen classifications were performed using different combinations of estimators, Landsat-8 (spectral bands and Landsat-8 NDVI-derived) and geophysical data (elevation, mean annual precipitation and mean annual temperature) using two classification algorithms, maximum likelihood and decision tree. Results showed that the best performing model was the one that incorporated Landsat-8 bands with mean annual precipitation and mean annual temperature (Model 13), using the decision tree. For maximum likelihood, the model that incorporated Landsat-8 bands with mean annual precipitation (Model 5) and the one that incorporated Landsat-8 bands with mean annual precipitation and mean annual temperature (Model 13), achieved the higher accuracies for this algorithm. The decision tree models consistently outperformed the maximum likelihood ones.
Resumo:
Magdeburg, Univ., Fak. für Inf., Diss., 2014
Resumo:
Introduction: As part of the MicroArray Quality Control (MAQC)-II project, this analysis examines how the choice of univariate feature-selection methods and classification algorithms may influence the performance of genomic predictors under varying degrees of prediction difficulty represented by three clinically relevant endpoints. Methods: We used gene-expression data from 230 breast cancers (grouped into training and independent validation sets), and we examined 40 predictors (five univariate feature-selection methods combined with eight different classifiers) for each of the three endpoints. Their classification performance was estimated on the training set by using two different resampling methods and compared with the accuracy observed in the independent validation set. Results: A ranking of the three classification problems was obtained, and the performance of 120 models was estimated and assessed on an independent validation set. The bootstrapping estimates were closer to the validation performance than were the cross-validation estimates. The required sample size for each endpoint was estimated, and both gene-level and pathway-level analyses were performed on the obtained models. Conclusions: We showed that genomic predictor accuracy is determined largely by an interplay between sample size and classification difficulty. Variations on univariate feature-selection methods and choice of classification algorithm have only a modest impact on predictor performance, and several statistically equally good predictors can be developed for any given classification problem.
Resumo:
This study presents a classification criteria for two-class Cannabis seedlings. As the cultivation of drug type cannabis is forbidden in Switzerland, law enforcement authorities regularly ask laboratories to determine cannabis plant's chemotype from seized material in order to ascertain that the plantation is legal or not. In this study, the classification analysis is based on data obtained from the relative proportion of three major leaf compounds measured by gas-chromatography interfaced with mass spectrometry (GC-MS). The aim is to discriminate between drug type (illegal) and fiber type (legal) cannabis at an early stage of the growth. A Bayesian procedure is proposed: a Bayes factor is computed and classification is performed on the basis of the decision maker specifications (i.e. prior probability distributions on cannabis type and consequences of classification measured by losses). Classification rates are computed with two statistical models and results are compared. Sensitivity analysis is then performed to analyze the robustness of classification criteria.
Resumo:
One in a series of six data briefings based on regional-level analysis of data from the National Child Measurement Programme (NCMP) undertaken by the National Obesity Observatory (NOO). The briefings are intended to complement the headline results for the region published in January 2010, at Quick Link 20510.
Resumo:
Breast cancer is a heterogeneous disease with varied morphological appearances, molecular features, behavior, and response to therapy. Current routine clinical management of breast cancer relies on the availability of robust clinical and pathological prognostic and predictive factors to support clinical and patient decision making in which potentially suitable treatment options are increasingly available. One of the best-established prognostic factors in breast cancer is histological grade, which represents the morphological assessment of tumor biological characteristics and has been shown to be able to generate important information related to the clinical behavior of breast cancers. Genome-wide microarray-based expression profiling studies have unraveled several characteristics of breast cancer biology and have provided further evidence that the biological features captured by histological grade are important in determining tumor behavior. Also, expression profiling studies have generated clinically useful data that have significantly improved our understanding of the biology of breast cancer, and these studies are undergoing evaluation as improved prognostic and predictive tools in clinical practice. Clinical acceptance of these molecular assays will require them to be more than expensive surrogates of established traditional factors such as histological grade. It is essential that they provide additional prognostic or predictive information above and beyond that offered by current parameters. Here, we present an analysis of the validity of histological grade as a prognostic factor and a consensus view on the significance of histological grade and its role in breast cancer classification and staging systems in this era of emerging clinical use of molecular classifiers.
Resumo:
The 2008 Data Fusion Contest organized by the IEEE Geoscience and Remote Sensing Data Fusion Technical Committee deals with the classification of high-resolution hyperspectral data from an urban area. Unlike in the previous issues of the contest, the goal was not only to identify the best algorithm but also to provide a collaborative effort: The decision fusion of the best individual algorithms was aiming at further improving the classification performances, and the best algorithms were ranked according to their relative contribution to the decision fusion. This paper presents the five awarded algorithms and the conclusions of the contest, stressing the importance of decision fusion, dimension reduction, and supervised classification methods, such as neural networks and support vector machines.
Resumo:
The classical binary classification problem is investigatedwhen it is known in advance that the posterior probability function(or regression function) belongs to some class of functions. We introduceand analyze a method which effectively exploits this knowledge. The methodis based on minimizing the empirical risk over a carefully selected``skeleton'' of the class of regression functions. The skeleton is acovering of the class based on a data--dependent metric, especiallyfitted for classification. A new scale--sensitive dimension isintroduced which is more useful for the studied classification problemthan other, previously defined, dimension measures. This fact isdemonstrated by performance bounds for the skeleton estimate in termsof the new dimension.
Resumo:
BACKGROUND: Prognosis prediction for resected primary colon cancer is based on the T-stage Node Metastasis (TNM) staging system. We investigated if four well-documented gene expression risk scores can improve patient stratification. METHODS: Microarray-based versions of risk-scores were applied to a large independent cohort of 688 stage II/III tumors from the PETACC-3 trial. Prognostic value for relapse-free survival (RFS), survival after relapse (SAR), and overall survival (OS) was assessed by regression analysis. To assess improvement over a reference, prognostic model was assessed with the area under curve (AUC) of receiver operating characteristic (ROC) curves. All statistical tests were two-sided, except the AUC increase. RESULTS: All four risk scores (RSs) showed a statistically significant association (single-test, P < .0167) with OS or RFS in univariate models, but with HRs below 1.38 per interquartile range. Three scores were predictors of shorter RFS, one of shorter SAR. Each RS could only marginally improve an RFS or OS model with the known factors T-stage, N-stage, and microsatellite instability (MSI) status (AUC gains < 0.025 units). The pairwise interscore discordance was never high (maximal Spearman correlation = 0.563) A combined score showed a trend to higher prognostic value and higher AUC increase for OS (HR = 1.74, 95% confidence interval [CI] = 1.44 to 2.10, P < .001, AUC from 0.6918 to 0.7321) and RFS (HR = 1.56, 95% CI = 1.33 to 1.84, P < .001, AUC from 0.6723 to 0.6945) than any single score. CONCLUSIONS: The four tested gene expression-based risk scores provide prognostic information but contribute only marginally to improving models based on established risk factors. A combination of the risk scores might provide more robust information. Predictors of RFS and SAR might need to be different.
Resumo:
The DRG classification provides a useful tool for the evaluation of hospital care. Indicators such as readmissions and mortality rates adjusted for the hospital Casemix could be adopted in Switzerland at the price of minor additions to the hospital discharge record. The additional information required to build patients histories and to identify the deaths occurring after hospital discharge is detailed.
Resumo:
The paper deals with the development and application of the generic methodology for automatic processing (mapping and classification) of environmental data. General Regression Neural Network (GRNN) is considered in detail and is proposed as an efficient tool to solve the problem of spatial data mapping (regression). The Probabilistic Neural Network (PNN) is considered as an automatic tool for spatial classifications. The automatic tuning of isotropic and anisotropic GRNN/PNN models using cross-validation procedure is presented. Results are compared with the k-Nearest-Neighbours (k-NN) interpolation algorithm using independent validation data set. Real case studies are based on decision-oriented mapping and classification of radioactively contaminated territories.
Resumo:
Expression data contribute significantly to the biological value of the sequenced human genome, providing extensive information about gene structure and the pattern of gene expression. ESTs, together with SAGE libraries and microarray experiment information, provide a broad and rich view of the transcriptome. However, it is difficult to perform large-scale expression mining of the data generated by these diverse experimental approaches. Not only is the data stored in disparate locations, but there is frequent ambiguity in the meaning of terms used to describe the source of the material used in the experiment. Untangling semantic differences between the data provided by different resources is therefore largely reliant on the domain knowledge of a human expert. We present here eVOC, a system which associates labelled target cDNAs for microarray experiments, or cDNA libraries and their associated transcripts with controlled terms in a set of hierarchical vocabularies. eVOC consists of four orthogonal controlled vocabularies suitable for describing the domains of human gene expression data including Anatomical System, Cell Type, Pathology and Developmental Stage. We have curated and annotated 7016 cDNA libraries represented in dbEST, as well as 104 SAGE libraries,with expression information,and provide this as an integrated, public resource that allows the linking of transcripts and libraries with expression terms. Both the vocabularies and the vocabulary-annotated libraries can be retrieved from http://www.sanbi.ac.za/evoc/. Several groups are involved in developing this resource with the aim of unifying transcript expression information.
Resumo:
As part of its 2006 systemic evaluation of DOC’s facilities, operations and programming, the Durrant/PBA consulting group found several shortcomings with the Department’s inmate custody classification system. Specifically, the consultants found that the system: