967 results for Classification models
Abstract:
BACKGROUND: Extensive research exists estimating the effect of hazardous alcohol use on morbidity and mortality, but little research quantifies the association between alcohol consumption and utility scores in patients with alcohol dependence. In the context of comparative research, the World Health Organisation (WHO) proposed to categorise the risk for alcohol-related acute and chronic harm according to patients' average daily alcohol consumption. OBJECTIVES: To estimate utility scores associated with each category of the WHO drinking risk-level classification in patients with alcohol dependence (AD). METHODS: We used data from CONTROL, an observational cohort study including 143 AD patients from the Alcohol Treatment Center at Lausanne University Hospital, followed for 12 months. Average daily alcohol consumption was assessed monthly using the Timeline Followback method and patients were categorised according to the WHO drinking risk-level classification: abstinent, low, medium, high and very high. Other measures, such as sociodemographic characteristics and utility scores derived from the EuroQoL 5-Dimensions questionnaire (EQ-5D), were collected every three months. Mixed models for repeated measures were used to estimate mean utility scores associated with WHO drinking risk-level categories. RESULTS: A total of 143 patients were included, and the 12-month follow-up permitted the assessment of 1318 person-months. At baseline the mean age of the patients was 44.6 (SD 11.8) and the majority of patients were male (63.6%). Using repeated measures analysis, utility scores decreased with increasing drinking levels, ranging from 0.80 in abstinent patients to 0.62 in patients with very high risk drinking level (p < 0.0001). CONCLUSIONS: In this sample of patients with alcohol dependence undergoing specialized care, utility scores estimated from the EQ-5D appeared to substantially and consistently vary according to patients' WHO drinking level.
Abstract:
BACKGROUND: Several studies have established Glioblastoma Multiforme (GBM) prognostic and predictive models based on age and Karnofsky Performance Status (KPS), while very few studies evaluated the prognostic and predictive significance of preoperative MR-imaging. However, to date, there is no simple preoperative GBM classification that also correlates with a highly prognostic genomic signature. Thus, we present for the first time a biologically relevant, and clinically applicable tumor Volume, patient Age, and KPS (VAK) GBM classification that can easily and non-invasively be determined upon patient admission. METHODS: We quantitatively analyzed the volumes of 78 GBM patient MRIs present in The Cancer Imaging Archive (TCIA) corresponding to patients in The Cancer Genome Atlas (TCGA) with VAK annotation. The variables were then combined using a simple 3-point scoring system to form the VAK classification. A validation set (N = 64) from both the TCGA and Rembrandt databases was used to confirm the classification. Transcription factor and genomic correlations were performed using the gene pattern suite and Ingenuity Pathway Analysis. RESULTS: VAK-A and VAK-B classes showed significant median survival differences in discovery (P = 0.007) and validation sets (P = 0.008). VAK-A is significantly associated with P53 activation, while VAK-B shows significant P53 inhibition. Furthermore, a molecular gene signature comprised of a total of 25 genes and microRNAs was significantly associated with the classes and predicted survival in an independent validation set (P = 0.001). A favorable MGMT promoter methylation status resulted in a 10.5 months additional survival benefit for VAK-A compared to VAK-B patients. 
CONCLUSIONS: The non-invasively determined VAK classification with its implication of VAK-specific molecular regulatory networks, can serve as a very robust initial prognostic tool, clinical trial selection criteria, and important step toward the refinement of genomics-based personalized therapy for GBM patients.
Abstract:
1. Identifying the boundary of a species' niche from observational and environmental data is a common problem in ecology and conservation biology, and a variety of techniques have been developed or applied to model niches and predict distributions. Here, we examine the performance of some pattern-recognition methods as ecological niche models (ENMs). In particular, one-class pattern recognition is a flexible and seldom-used methodology for modelling ecological niches and distributions from presence-only data. The development of one-class methods that perform comparably to two-class methods (for presence/absence data) would remove modelling decisions about sampling pseudo-absences or background data points when absence points are unavailable.
2. We studied nine methods for one-class classification and seven methods for two-class classification (five common to both), all primarily used in pattern recognition and therefore not common in species distribution and ecological niche modelling, across a set of 106 mountain plant species for which presence-absence data were available. We assessed accuracy using standard metrics and compared trade-offs in omission and commission errors between classification groups, as well as effects of prevalence and spatial autocorrelation on accuracy.
3. One-class models fit to presence-only data were comparable to two-class models fit to presence-absence data when performance was evaluated with a measure weighting omission and commission errors equally. One-class models were superior for reducing omission errors (i.e. yielding higher sensitivity), and two-class models were superior for reducing commission errors (i.e. yielding higher specificity). For these methods, spatial autocorrelation was only influential when prevalence was low.
4. These results differ from previous efforts to evaluate alternative modelling approaches to build ENMs and are particularly noteworthy because the data are from exhaustively sampled populations, minimizing false absence records. Accurate, transferable models of species' ecological niches and distributions are needed to advance ecological research and are crucial for effective environmental planning and conservation; the pattern-recognition approaches studied here show good potential for future modelling studies. This study also provides an introduction to promising methods for ecological modelling inherited from the pattern-recognition discipline.
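The trade-off between omission and commission errors described in this abstract can be illustrated with a small sketch. The abstract does not name the "measure weighting omission and commission errors equally"; the true skill statistic (TSS = sensitivity + specificity - 1) is assumed here purely for illustration, and the presence/absence vectors are made-up values, not data from the study.

```python
def confusion_counts(observed, predicted):
    """Count true/false positives and negatives for 0/1 presence data."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)  # omission errors
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)  # commission errors
    return tp, fn, tn, fp


def accuracy_summary(observed, predicted):
    """Sensitivity, specificity and TSS (assumed equal-weight measure)."""
    tp, fn, tn, fp = confusion_counts(observed, predicted)
    sensitivity = tp / (tp + fn)  # 1 - omission error rate
    specificity = tn / (tn + fp)  # 1 - commission error rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "tss": sensitivity + specificity - 1.0,
    }


# Hypothetical survey of 10 plots (1 = species present, 0 = absent).
observed = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
one_class = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]  # no omissions, two commissions
two_class = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # one omission, no commissions

one_class_scores = accuracy_summary(observed, one_class)
two_class_scores = accuracy_summary(observed, two_class)
print(one_class_scores)
print(two_class_scores)
```

In this toy case the one-class predictions have the higher sensitivity and the two-class predictions the higher specificity, while the equal-weight score stays in a similar range for both.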
Abstract:
The paper deals with the development and application of a generic methodology for automatic processing (mapping and classification) of environmental data. The General Regression Neural Network (GRNN) is considered in detail and is proposed as an efficient tool to solve the problem of spatial data mapping (regression). The Probabilistic Neural Network (PNN) is considered as an automatic tool for spatial classification. The automatic tuning of isotropic and anisotropic GRNN/PNN models using a cross-validation procedure is presented. Results are compared with the k-Nearest-Neighbours (k-NN) interpolation algorithm using an independent validation data set. Real case studies are based on decision-oriented mapping and classification of radioactively contaminated territories.
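As a rough illustration of the tuning step mentioned above, the sketch below implements an isotropic GRNN as a Gaussian-kernel weighted average (the Nadaraya-Watson form) and selects the kernel width by leave-one-out cross-validation. The function names, the candidate widths and the synthetic grid are assumptions made for this example; the paper's actual procedure, including its anisotropic variant, may differ.

```python
import math


def grnn_predict(train_xy, train_z, query, sigma):
    """Isotropic GRNN estimate at one 2-D location (kernel-weighted mean)."""
    num = 0.0
    den = 0.0
    for (x, y), z in zip(train_xy, train_z):
        d2 = (x - query[0]) ** 2 + (y - query[1]) ** 2
        w = math.exp(-d2 / (2.0 * sigma ** 2))
        num += w * z
        den += w
    # Fall back to the global mean if every weight underflows to zero.
    return num / den if den > 0.0 else sum(train_z) / len(train_z)


def loo_error(train_xy, train_z, sigma):
    """Leave-one-out mean squared error for a given kernel width."""
    total = 0.0
    for i in range(len(train_z)):
        rest_xy = train_xy[:i] + train_xy[i + 1:]
        rest_z = train_z[:i] + train_z[i + 1:]
        pred = grnn_predict(rest_xy, rest_z, train_xy[i], sigma)
        total += (pred - train_z[i]) ** 2
    return total / len(train_z)


def tune_sigma(train_xy, train_z, candidates):
    """Pick the candidate width with the lowest leave-one-out error."""
    return min(candidates, key=lambda s: loo_error(train_xy, train_z, s))


# Synthetic 4 x 4 grid with a simple linear trend (illustrative data only).
points = [(float(x), float(y)) for x in range(4) for y in range(4)]
values = [x + y for x, y in points]
candidates = [0.3, 0.6, 1.0, 3.0]

best_sigma = tune_sigma(points, values, candidates)
print("selected sigma:", best_sigma)
print("estimate at (1.5, 1.5):", grnn_predict(points, values, (1.5, 1.5), best_sigma))
```

An anisotropic version would use one width per coordinate axis instead of a single sigma, and a PNN would replace the weighted mean with a vote over kernel-weighted class memberships.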
Dissemination of the Swiss Model for Outcome Classification in Health Promotion and Prevention SMOC.
Abstract:
Summary: Following recent technological advances, digital image archives have seen unprecedented qualitative and quantitative growth. Despite the enormous possibilities they offer, these advances raise new questions about how to process the masses of data acquired. This question is the basis of this Thesis: problems of processing digital information at very high spatial and/or spectral resolution are addressed using statistical learning approaches, namely kernel methods. This Thesis studies image classification problems, that is, the categorization of pixels into a reduced number of classes reflecting the spectral and contextual properties of the objects they represent. Emphasis is placed on the efficiency of the algorithms and on their simplicity, so as to increase their potential for implementation by users. Moreover, the challenge of this Thesis is to remain close to the concrete problems of satellite image users without losing sight of the interest of the proposed methods for the machine learning community from which they originate. In this sense, this work is deliberately transdisciplinary, maintaining a strong link between the two sciences in all the developments proposed. Four models are proposed: the first addresses the problem of high dimensionality and data redundancy with a model that optimizes classification performance by adapting to the particularities of the image. This is made possible by a system for ranking the variables (the bands) that is optimized at the same time as the base model: in doing so, only the variables important for solving the problem are used by the classifier. The lack of labeled information and the uncertainty about its relevance to the problem are the source of the next two models, based respectively on active learning and semi-supervised methods: the first makes it possible to improve the quality of a training set through direct interaction between the user and the machine, while the second uses unlabeled pixels to improve the description of the available data and the robustness of the model. Finally, the last model proposed considers the more theoretical question of structure among the outputs: the integration of this source of information, until now never considered in remote sensing, opens new research challenges.
Advanced kernel methods for remote sensing image classification
Devis Tuia, Institut de Géomatique et d'Analyse du Risque, September 2009
Abstract: The technical developments in recent years have brought the quantity and quality of digital information to an unprecedented level, as enormous archives of satellite images are available to the users. However, even if these advances open more and more possibilities in the use of digital imagery, they also raise several problems of storage and treatment. The latter is considered in this Thesis: the processing of very high spatial and spectral resolution images is treated with approaches based on data-driven algorithms relying on kernel methods. In particular, the problem of image classification, i.e. the categorization of the image's pixels into a reduced number of classes reflecting spectral and contextual properties, is studied through the different models presented. The accent is put on algorithmic efficiency and the simplicity of the approaches proposed, to avoid overly complex models that would not be adopted by users.
The major challenge of the Thesis is to remain close to concrete remote sensing problems, without losing the methodological interest from the machine learning viewpoint: in this sense, this work aims at building a bridge between the machine learning and remote sensing communities, and all the models proposed have been developed keeping in mind the need for such a synergy. Four models are proposed: first, an adaptive model learning the relevant image features has been proposed to solve the problem of high dimensionality and collinearity of the image features. This model automatically provides an accurate classifier and a ranking of the relevance of the single features. The scarcity and unreliability of labeled information were the common root of the second and third models proposed: when confronted with such problems, the user can either construct the labeled set iteratively by direct interaction with the machine or use the unlabeled data to increase robustness and quality of the description of data. Both solutions have been explored, resulting in two methodological contributions, based respectively on active learning and semisupervised learning. Finally, the more theoretical issue of structured outputs has been considered in the last model, which, by integrating output similarity into a model, opens new challenges and opportunities for remote sensing image processing.
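The iterative construction of a labeled set "by direct interaction with the machine" can be sketched with a generic active learning heuristic. The abstract does not say which query criterion the thesis uses; margin sampling (querying the pixels whose two most probable classes are closest) is assumed here only as a common example, and the pixel identifiers and class probabilities are hypothetical.

```python
def margin(probabilities):
    """Difference between the two largest class probabilities."""
    top_two = sorted(probabilities, reverse=True)[:2]
    return top_two[0] - top_two[1]


def select_queries(unlabeled_probs, batch_size):
    """Return the ids of the most ambiguous pixels (smallest margins)."""
    ranked = sorted(unlabeled_probs, key=lambda pid: margin(unlabeled_probs[pid]))
    return ranked[:batch_size]


# Hypothetical class probabilities produced by the current classifier
# for four unlabeled pixels and three land-cover classes.
pixel_probs = {
    "px_a": [0.90, 0.05, 0.05],
    "px_b": [0.40, 0.35, 0.25],
    "px_c": [0.34, 0.33, 0.33],
    "px_d": [0.60, 0.30, 0.10],
}

# Pixels the user would be asked to label in this iteration.
queries = select_queries(pixel_probs, batch_size=2)
print(queries)
```

In a full loop, the user labels the returned pixels, the classifier is retrained on the enlarged training set, and the selection is repeated until the labeling budget is spent.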
Abstract:
Evaluating other individuals with respect to personality characteristics plays a crucial role in human relations and it is the focus of attention for research in diverse fields such as psychology and interactive computer systems. In psychology, face perception has been recognized as a key component of this evaluation system. Multiple studies suggest that observers use face information to infer personality characteristics. Interactive computer systems are trying to take advantage of these findings and apply them to increase the natural aspect of interaction and to improve the performance of interactive computer systems. Here, we experimentally test whether the automatic prediction of facial trait judgments (e.g. dominance) can be made by using the full appearance information of the face and whether a reduced representation of its structure is sufficient. We evaluate two separate approaches: a holistic representation model using the facial appearance information and a structural model constructed from the relations among facial salient points. State-of-the-art machine learning methods are applied to a) derive a facial trait judgment model from training data and b) predict a facial trait value for any face. Furthermore, we address the issue of whether there are specific structural relations among facial points that predict perception of facial traits. Experimental results over a set of labeled data (9 different trait evaluations) and classification rules (4 rules) suggest that a) prediction of perception of facial traits is learnable by both holistic and structural approaches; b) the most reliable prediction of facial trait judgments is obtained by certain types of holistic descriptions of the face appearance; and c) for some traits such as attractiveness and extroversion, there are relationships between specific structural features and social perceptions.
Abstract:
The pace of on-going climate change calls for reliable plant biodiversity scenarios. Traditional dynamic vegetation models use plant functional types that are summarized to such an extent that they become meaningless for biodiversity scenarios. Hybrid dynamic vegetation models of intermediate complexity (hybrid-DVMs) have recently been developed to address this issue. These models, at the crossroads between phenomenological and process-based models, are able to involve an intermediate number of well-chosen plant functional groups (PFGs). The challenge is to build meaningful PFGs that are representative of plant biodiversity, and consistent with the parameters and processes of hybrid-DVMs. Here, we propose and test a framework based on a few selected traits to define a limited number of PFGs, which are both representative of the diversity (functional and taxonomic) of the flora in the Ecrins National Park, and adapted to hybrid-DVMs. This new classification scheme, together with recent advances in vegetation modeling, constitutes a step forward for mechanistic biodiversity modeling.
Abstract:
Radioactive soil-contamination mapping and risk assessment is a vital issue for decision makers. Traditional approaches for mapping the spatial concentration of radionuclides employ various regression-based models, which usually provide a single-value prediction realization accompanied (in some cases) by estimation error. Such approaches do not provide the capability for rigorous uncertainty quantification or probabilistic mapping. Machine learning is a recent and fast-developing approach based on learning patterns and information from data. Artificial neural networks for prediction mapping have been especially powerful in combination with spatial statistics. A data-driven approach provides the opportunity to integrate additional relevant information about spatial phenomena into a prediction model for more accurate spatial estimates and associated uncertainty. Machine-learning algorithms can also be used for a wider spectrum of problems than before: classification, probability density estimation, and so forth. Stochastic simulations are used to model spatial variability and uncertainty. Unlike regression models, they provide multiple realizations of a particular spatial pattern that allow uncertainty and risk quantification. This paper reviews the most recent methods of spatial data analysis, prediction, and risk mapping, based on machine learning and stochastic simulations in comparison with more traditional regression models. The radioactive fallout from the Chernobyl Nuclear Power Plant accident is used to illustrate the application of the models for prediction and classification problems. This fallout is a unique case study that provides the challenging task of analyzing huge amounts of data ('hard' direct measurements, as well as supplementary information and expert estimates) and solving particular decision-oriented problems.
Abstract:
A recent study of a pair of sympatric species of cichlids in Lake Apoyo in Nicaragua is viewed as providing probably one of the most convincing examples of sympatric speciation to date. Here, we describe and study a stochastic, individual-based, explicit genetic model tailored for this cichlid system. Our results show that relatively rapid (<20,000 generations) colonization of a new ecological niche and (sympatric or parapatric) speciation via local adaptation and divergence in habitat and mating preferences are theoretically plausible if: (i) the number of loci underlying the traits controlling local adaptation, and habitat and mating preferences is small; (ii) the strength of selection for local adaptation is intermediate; (iii) the carrying capacity of the population is intermediate; and (iv) the effects of the loci influencing nonrandom mating are strong. We discuss patterns and timescales of ecological speciation identified by our model, and we highlight important parameters and features that need to be studied empirically to provide information that can be used to improve the biological realism and power of mathematical models of ecological speciation.
Abstract:
Tire traces can be observed at many crime scenes, as vehicles are often used by criminals. The tread abrasion on the road, while braking or skidding, leads to the production of small rubber particles which can be collected for comparison purposes. This research focused on the statistical comparison of Py-GC/MS profiles of tire traces and tire treads. The optimisation of the analytical method was carried out using experimental designs. The aim was to determine the best pyrolysis parameters regarding the repeatability of the results. Thus, the pyrolysis factor effect could also be calculated. The pyrolysis temperature was found to be five times more important than time. Finally, a pyrolysis at 650 °C for 15 s was selected. Ten tires of different manufacturers and models were used for this study. Several samples were collected on each tire, and several replicates were carried out to study the variability within each tire (intravariability). More than eighty compounds were integrated for each analysis and the variability study showed that more than 75% presented a relative standard deviation (RSD) below 5% for the ten tires, thus supporting a low intravariability. The variability between the ten tires (intervariability) presented higher values and the ten most variant compounds had an RSD value above 13%, supporting their high potential of discrimination between the tires tested. Principal Component Analysis (PCA) was able to fully discriminate the ten tires with the help of the first three principal components. The ten tires were finally used to perform braking tests on a racetrack with a vehicle equipped with an anti-lock braking system. The resulting tire traces were adequately collected using sheets of white gelatine. As for tires, the intravariability for the traces was found to be lower than the intervariability. 
Clustering methods were applied, and Ward's method based on the squared Euclidean distance was able to correctly group all of the tire trace replicates in the same cluster as the replicates of their corresponding tire. Blind tests on traces were performed and the traces were correctly assigned to their tire source. These results support the hypothesis that the tested tires, of different manufacturers and models, can be discriminated by a statistical comparison of their chemical profiles. The traces were found to be not differentiable from their source but differentiable from all the other tires present in the subset. The results are promising and will be extended to a larger sample set.
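The intravariability and intervariability comparison rests on the relative standard deviation (RSD), which can be sketched as follows. The peak areas are invented for illustration, and computing the between-tire RSD from the per-tire means is an assumption; the abstract does not spell out how the intervariability RSD was calculated.

```python
from statistics import mean, stdev


def rsd_percent(values):
    """Relative standard deviation: sample SD divided by mean, in percent."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical normalised peak areas of ONE compound, 3 replicates per tire.
replicates = {
    "tire_1": [10.0, 10.2, 9.8],
    "tire_2": [14.9, 15.0, 15.1],
    "tire_3": [20.0, 20.4, 19.6],
}

# Within-tire variability: one RSD per tire.
intra = {tire: rsd_percent(vals) for tire, vals in replicates.items()}

# Between-tire variability: RSD of the per-tire means (assumed definition).
inter = rsd_percent([mean(vals) for vals in replicates.values()])

print(intra)
print(inter)
```

A compound of this kind, with low RSD within each tire and high RSD between tires, is the sort of variable that would help discriminate between sources.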
Abstract:
This paper presents a validation study on statistical nonsupervised brain tissue classification techniques in magnetic resonance (MR) images. Several image models assuming different hypotheses regarding the intensity distribution model, the spatial model and the number of classes are assessed. The methods are tested on simulated data for which the classification ground truth is known. Different noise and intensity nonuniformities are added to simulate real imaging conditions. No enhancement of the image quality is considered either before or during the classification process. This way, the accuracy of the methods and their robustness against image artifacts are tested. Classification is also performed on real data where a quantitative validation compares the methods' results with an estimated ground truth from manual segmentations by experts. Validity of the various classification methods in the labeling of the image as well as in the tissue volume is estimated with different local and global measures. Results demonstrate that methods relying on both intensity and spatial information are more robust to noise and field inhomogeneities. We also demonstrate that partial volume is not perfectly modeled, even though methods that account for mixture classes outperform methods that only consider pure Gaussian classes. Finally, we show that simulated data results can also be extended to real data.
Abstract:
Background: Development of three classification trees (CT) based on the CART (Classification and Regression Trees), CHAID (Chi-Square Automatic Interaction Detection) and C4.5 methodologies for the calculation of probability of hospital mortality; the comparison of the results with the APACHE II, SAPS II and MPM II-24 scores, and with a model based on multiple logistic regression (LR). Methods: Retrospective study of 2864 patients. Random partition (70:30) into a Development Set (DS) n = 1808 and Validation Set (VS) n = 808. Their properties of discrimination are compared with the ROC curve (AUC CI 95%) and the percent of correct classification (PCC CI 95%), and the calibration with the Calibration Curve and the Standardized Mortality Ratio (SMR CI 95%). Results: CTs are produced with a different selection of variables and decision rules: CART (5 variables and 8 decision rules), CHAID (7 variables and 15 rules) and C4.5 (6 variables and 10 rules). The common variables were: inotropic therapy, Glasgow, age, (A-a)O2 gradient and antecedent of chronic illness. In VS: all the models achieved acceptable discrimination with AUC above 0.7. CT: CART (0.75(0.71-0.81)), CHAID (0.76(0.72-0.79)) and C4.5 (0.76(0.73-0.80)). PCC: CART (72(69-75)), CHAID (72(69-75)) and C4.5 (76(73-79)). Calibration (SMR) better in the CT: CART (1.04(0.95-1.31)), CHAID (1.06(0.97-1.15)) and C4.5 (1.08(0.98-1.16)). Conclusion: With different methodologies of CTs, trees are generated with different selection of variables and decision rules. The CTs are easy to interpret, and they stratify the risk of hospital mortality. The CTs should be taken into account for the classification of the prognosis of critically ill patients.
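The discrimination and calibration measures named in this abstract can be sketched in a few lines. The AUC is computed here through its rank interpretation (the chance that a randomly chosen non-survivor receives a higher predicted risk than a randomly chosen survivor), and the SMR as observed deaths divided by the sum of predicted risks. The predicted probabilities and outcomes are invented, and the confidence intervals reported in the study are not reproduced.

```python
def auc(probabilities, died):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [p for p, d in zip(probabilities, died) if d == 1]
    neg = [p for p, d in zip(probabilities, died) if d == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def smr(probabilities, died):
    """Standardized mortality ratio: observed deaths / expected deaths."""
    return sum(died) / sum(probabilities)


# Hypothetical validation set: predicted mortality risk and outcome (1 = died).
predicted = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.4, 0.7]
outcome = [1, 1, 0, 1, 0, 0, 1, 0]

model_auc = auc(predicted, outcome)
model_smr = smr(predicted, outcome)
print("AUC:", model_auc)
print("SMR:", model_smr)
```

An SMR close to 1 means the model expects about as many deaths as were observed, while an AUC above 0.7 matches the level the abstract describes as acceptable discrimination.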
Abstract:
Near-infrared spectroscopy (NIRS) was used to analyse the crude protein content of dried and milled samples of wheat and to discriminate samples according to their stage of growth. A calibration set of 72 samples from three growth stages of wheat (tillering, heading and harvest) and a validation set of 28 samples were collected for this purpose. Principal components analysis (PCA) of the calibration set discriminated groups of samples according to the growth stage of the wheat. Based on these differences, a classification procedure (SIMCA) showed a very accurate classification of the validation set samples: all of them were successfully classified in each group using this procedure when both the residual and the leverage were used in the classification criteria. Looking only at the residuals, all the samples were also correctly classified except one of the tillering stage, which was assigned to both the tillering and heading stages. Finally, the determination of the crude protein content of these samples was considered in two ways: building up a global model for all the growth stages, and building up local models for each stage separately. The best prediction results for crude protein were obtained using a global model for samples in the first two growth stages (tillering and heading), and using a local model for the harvest stage samples.
Abstract:
Many classification systems rely on clustering techniques in which a collection of training examples is provided as an input, and a number of clusters c_1, ..., c_m modelling some concept C results as an output, such that every cluster c_i is labelled as positive or negative. Given a new, unlabelled instance e_new, the above classification is used to determine to which particular cluster c_i this new instance belongs. In such a setting clusters can overlap, and a new unlabelled instance can be assigned to more than one cluster with conflicting labels. In the literature, such a case is usually solved non-deterministically by making a random choice. This paper presents a novel, hybrid approach to solve this situation by combining a neural network for classification with a defeasible argumentation framework which models preference criteria for performing clustering.
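The conflict that motivates this paper can be made concrete with a small sketch. Clusters are modelled here as labelled circles in the plane, which is an assumption made only for illustration; the abstract does not describe the cluster geometry, and the argumentation-based resolution itself is not reproduced. The sketch only detects when an instance falls into clusters with conflicting labels, which is the case the literature is said to settle by random choice.

```python
import math

# Hypothetical overlapping clusters: centre, radius and label.
clusters = [
    {"name": "c1", "center": (0.0, 0.0), "radius": 1.5, "label": "positive"},
    {"name": "c2", "center": (2.0, 0.0), "radius": 1.5, "label": "negative"},
]


def candidate_labels(instance, cluster_list):
    """Sorted labels of every cluster whose region contains the instance."""
    labels = set()
    for cluster in cluster_list:
        if math.dist(instance, cluster["center"]) <= cluster["radius"]:
            labels.add(cluster["label"])
    return sorted(labels)


def has_conflict(instance, cluster_list):
    """True when the instance belongs to clusters with different labels."""
    return len(candidate_labels(instance, cluster_list)) > 1


e_new = (1.0, 0.0)  # lies in the overlap of c1 and c2
print(candidate_labels(e_new, clusters), has_conflict(e_new, clusters))
print(candidate_labels((-1.0, 0.0), clusters), has_conflict((-1.0, 0.0), clusters))
```

A preference-based scheme of the kind the paper proposes would replace the random tie-break at the point where `has_conflict` returns true.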