70 results for Classification models


Relevance: 30.00%

Abstract:

The InterPro database (http://www.ebi.ac.uk/interpro/) is a freely available resource that can be used to classify sequences into protein families and to predict the presence of important domains and sites. Central to the InterPro database are predictive models, known as signatures, from a range of different protein family databases that have different biological focuses and use different methodological approaches to classify protein families and domains. InterPro integrates these signatures, capitalizing on the respective strengths of the individual databases, to produce a powerful protein classification resource. Here, we report on the status of InterPro as it enters its 15th year of operation, and give an overview of new developments with the database and its associated Web interfaces and software. In particular, the new domain architecture search tool is described and the process of mapping of Gene Ontology terms to InterPro is outlined. We also discuss the challenges faced by the resource given the explosive growth in sequence data in recent years. InterPro (version 48.0) contains 36 766 member database signatures integrated into 26 238 InterPro entries, an increase of over 3993 entries (5081 signatures), since 2012.

Relevance: 30.00%

Abstract:

Abstract: The existence of a causal relationship between the spatial distribution of living organisms and their environment, in particular climate, has long been recognized and is the central principle of biogeography. In turn, this recognition has led scientists to the idea of using the climatic, topographic, edaphic and biotic characteristics of the environment to predict its potential suitability for a given species or biological community. In this thesis, my objective is to contribute to the development of methodological improvements in the field of species distribution modeling. More precisely, the objectives are to propose solutions to overcome limitations of species distribution models when applied to conservation biology issues, or when used as an assessment tool of the potential impacts of global change. The first objective of my thesis is to help demonstrate the potential of species distribution models for conservation-related applications. I present a methodology to generate pseudo-absences in order to overcome the frequent lack of reliable absence data. I also demonstrate, both theoretically (simulation-based) and practically (field-based), how species distribution models can be successfully used to model and sample rare species. Overall, the results of this first part of the thesis demonstrate the strong potential of species distribution models as a tool for practical applications in conservation biology. The second objective of this thesis is to help improve projections of potential climate change impacts on species distributions, in particular for mountain flora. I develop a dynamic model, MIGCLIM, that allows the implementation of dispersal limitations into classic species distribution models and present an application of this model to two virtual species.
Given that accounting for dispersal limitations requires information on seed dispersal distances, a general methodology to classify species into broad dispersal types is also developed. Finally, the MIGCLIM model is applied to a large number of species in a study area of the western Swiss Alps. Overall, the results indicate that while dispersal limitations can have an important impact on the outcome of future projections of species distributions under climate change scenarios, estimating species threat levels (e.g. species extinction rates) for mountainous areas of limited size (i.e. regional scale) can also be successfully achieved when considering dispersal as unlimited (i.e. ignoring dispersal limitations, which is easier from a practical point of view). Finally, I present the largest fine-scale assessment of potential climate change impacts on mountain vegetation that has been carried out to date. This assessment involves vegetation from 12 study areas distributed across all major western and central European mountain ranges. The results highlight that some mountain ranges (the Pyrenees and the Austrian Alps) are expected to be more affected by climate change than others (Norway and the Scottish Highlands). The results I obtain in this study also indicate that the threat levels projected by fine-scale models are less severe than those derived from coarse-scale models. This result suggests that some species could persist in small refugia that are not detected by coarse-scale models. Summary: The existence of a causal relationship between the distribution of animal and plant species and their environment, in particular climate, has long been established and is one of the central principles of biogeography. This link naturally led to the idea of using the climatic, topographic, edaphic and biotic characteristics of the environment to predict its quality for a species or a community.
In this thesis, my objective is to contribute to the development of methodological improvements in the field of modelling species distributions in the landscape. More precisely, the objectives are to propose solutions to overcome certain limitations of species distribution models in practical conservation biology applications or in their use to assess the potential impact of climate change on the environment. The first major objective of my work is to help demonstrate the potential of species distribution models for practical applications in conservation biology. I propose a method for generating pseudo-absences that overcomes the recurrent problem of the lack of reliable absence data. I also demonstrate, theoretically (by simulation) and practically (by field sampling), how species distribution models can be used to model rare species and improve their sampling. These results demonstrate the potential of species distribution models as tools for conservation biology applications. The second major objective of this work is to help improve projections of the potential impacts of climate change on flora, particularly in mountain areas. I develop a dynamic distribution model called MigClim that accounts for dispersal limitations in future projections of potential species distributions, and test its application on two virtual species.
Given that accounting for dispersal limitations requires substantial additional data (e.g. seed dispersal distances), this work also proposes a simplified method for classifying plant species into broad "disperser types", which makes it possible to obtain good approximations of dispersal distances for a large number of species. Finally, I also apply the MIGCLIM model to a large number of plant species in a study area in the Vaud Prealps. The results show that dispersal limitations can have a considerable impact on the potential distribution of species predicted under climate change scenarios. However, when the models are used to assess species extinction rates in mountain areas of limited size (regional assessment), good approximations can also be obtained by treating species dispersal as unlimited, which is considerably simpler from a practical point of view. To conclude, I present the largest fine-scale assessment to date of the potential impact of climate change on mountain flora. This assessment covers 12 study areas spread across all the main mountain ranges of western and central Europe. The results show that some mountain ranges (the Pyrenees and the Austrian Alps) are projected to be more sensitive to climate change than others (the Scandinavian Alps and the Scottish Highlands). The results also show that fine-scale models project less severe climate change impacts (e.g. species extinction rates) than coarse-scale models. This suggests that fine-scale models are able to capture climatic micro-niches that are not detected by coarse-scale models.

Relevance: 30.00%

Abstract:

BACKGROUND: Extensive research exists estimating the effect of hazardous alcohol use on morbidity and mortality, but little research quantifies the association between alcohol consumption and utility scores in patients with alcohol dependence. In the context of comparative research, the World Health Organisation (WHO) proposed to categorise the risk for alcohol-related acute and chronic harm according to patients' average daily alcohol consumption. OBJECTIVES: To estimate utility scores associated with each category of the WHO drinking risk-level classification in patients with alcohol dependence (AD). METHODS: We used data from CONTROL, an observational cohort study including 143 AD patients from the Alcohol Treatment Center at Lausanne University Hospital, followed for 12 months. Average daily alcohol consumption was assessed monthly using the Timeline Followback method and patients were categorised according to the WHO drinking risk-level classification: abstinent, low, medium, high and very high. Other measures, such as sociodemographic characteristics and utility scores derived from the EuroQoL 5-Dimensions questionnaire (EQ-5D), were collected every three months. Mixed models for repeated measures were used to estimate mean utility scores associated with WHO drinking risk-level categories. RESULTS: A total of 143 patients were included, and the 12-month follow-up permitted the assessment of 1318 person-months. At baseline the mean age of the patients was 44.6 (SD 11.8) and the majority of patients were male (63.6%). Using repeated measures analysis, utility scores decreased with increasing drinking levels, ranging from 0.80 in abstinent patients to 0.62 in patients with a very high risk drinking level (p < 0.0001). CONCLUSIONS: In this sample of patients with alcohol dependence undergoing specialized care, utility scores estimated from the EQ-5D appeared to vary substantially and consistently according to patients' WHO drinking level.
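
The categorisation step described in the methods can be sketched in Python. This is only an illustration: the abstract does not state the cut-offs, so the gram-per-day thresholds below are an assumption (values commonly cited for the WHO risk levels, to be checked against the WHO guidance before any real use). The records are hypothetical, and the plain per-category mean stands in for the mixed models for repeated measures that the study actually used.

```python
from collections import defaultdict

# ASSUMPTION: cut-offs in grams of pure alcohol per day (upper bounds of the
# low, medium and high levels). They are not given in the abstract.
WHO_CUTOFFS = {"male": (40.0, 60.0, 100.0), "female": (20.0, 40.0, 60.0)}


def who_risk_level(grams_per_day, sex):
    """Map average daily consumption to one of the five risk-level labels."""
    if grams_per_day <= 0:
        return "abstinent"
    low_max, medium_max, high_max = WHO_CUTOFFS[sex]
    if grams_per_day <= low_max:
        return "low"
    if grams_per_day <= medium_max:
        return "medium"
    if grams_per_day <= high_max:
        return "high"
    return "very high"


def mean_utility_by_level(records):
    """Average utility per risk level (a simplification, not a mixed model)."""
    grouped = defaultdict(list)
    for grams_per_day, sex, utility in records:
        grouped[who_risk_level(grams_per_day, sex)].append(utility)
    return {level: sum(v) / len(v) for level, v in grouped.items()}


# Hypothetical person-month records: (grams per day, sex, EQ-5D utility).
records = [(0, "male", 0.80), (0, "female", 0.82),
           (120, "male", 0.60), (70, "female", 0.64)]
means = mean_utility_by_level(records)
```

A mixed model would additionally account for repeated measurements on the same patient, which this grouping ignores.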

Relevance: 30.00%

Abstract:

BACKGROUND: Several studies have established Glioblastoma Multiforme (GBM) prognostic and predictive models based on age and Karnofsky Performance Status (KPS), while very few studies have evaluated the prognostic and predictive significance of preoperative MR imaging. However, to date, there is no simple preoperative GBM classification that also correlates with a highly prognostic genomic signature. Thus, we present for the first time a biologically relevant and clinically applicable tumor Volume, patient Age, and KPS (VAK) GBM classification that can easily and non-invasively be determined upon patient admission. METHODS: We quantitatively analyzed the volumes of 78 GBM patient MRIs present in The Cancer Imaging Archive (TCIA) corresponding to patients in The Cancer Genome Atlas (TCGA) with VAK annotation. The variables were then combined using a simple 3-point scoring system to form the VAK classification. A validation set (N = 64) from both the TCGA and Rembrandt databases was used to confirm the classification. Transcription factor and genomic correlations were performed using the GenePattern suite and Ingenuity Pathway Analysis. RESULTS: VAK-A and VAK-B classes showed significant median survival differences in discovery (P = 0.007) and validation sets (P = 0.008). VAK-A is significantly associated with P53 activation, while VAK-B shows significant P53 inhibition. Furthermore, a molecular gene signature comprising a total of 25 genes and microRNAs was significantly associated with the classes and predicted survival in an independent validation set (P = 0.001). A favorable MGMT promoter methylation status resulted in an additional survival benefit of 10.5 months for VAK-A compared to VAK-B patients.
CONCLUSIONS: The non-invasively determined VAK classification, with its implication of VAK-specific molecular regulatory networks, can serve as a very robust initial prognostic tool, a clinical trial selection criterion, and an important step toward the refinement of genomics-based personalized therapy for GBM patients.

Relevance: 30.00%

Abstract:

1. Identifying the boundary of a species' niche from observational and environmental data is a common problem in ecology and conservation biology, and a variety of techniques have been developed or applied to model niches and predict distributions. Here, we examine the performance of some pattern-recognition methods as ecological niche models (ENMs). In particular, one-class pattern recognition is a flexible and seldom used methodology for modelling ecological niches and distributions from presence-only data. The development of one-class methods that perform comparably to two-class methods (for presence/absence data) would remove modelling decisions about sampling pseudo-absences or background data points when absence points are unavailable. 2. We studied nine methods for one-class classification and seven methods for two-class classification (five common to both), all primarily used in pattern recognition and therefore not common in species distribution and ecological niche modelling, across a set of 106 mountain plant species for which presence-absence data were available. We assessed accuracy using standard metrics and compared trade-offs in omission and commission errors between classification groups as well as effects of prevalence and spatial autocorrelation on accuracy. 3. One-class models fit to presence-only data were comparable to two-class models fit to presence-absence data when performance was evaluated with a measure weighting omission and commission errors equally. One-class models were superior for reducing omission errors (i.e. yielding higher sensitivity), and two-class models were superior for reducing commission errors (i.e. yielding higher specificity). For these methods, spatial autocorrelation was only influential when prevalence was low. 4.
These results differ from previous efforts to evaluate alternative modelling approaches for building ENMs and are particularly noteworthy because the data are from exhaustively sampled populations, minimizing false absence records. Accurate, transferable models of species' ecological niches and distributions are needed to advance ecological research and are crucial for effective environmental planning and conservation; the pattern-recognition approaches studied here show good potential for future modelling studies. This study also provides an introduction to promising methods for ecological modelling inherited from the pattern-recognition discipline.
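
The relationship between omission and commission errors, sensitivity and specificity mentioned above can be illustrated with a short Python sketch. The abstract does not name the measure that weights both error types equally. The true skill statistic (TSS = sensitivity + specificity - 1) is used here only as one plausible example of such a measure, and the observations and predictions are hypothetical.

```python
def accuracy_metrics(observed, predicted):
    """Compute error rates from binary presence (1) / absence (0) vectors."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)  # presences found
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)  # omission errors
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)  # absences found
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)  # commission errors
    sensitivity = tp / (tp + fn)  # 1 - omission rate
    specificity = tn / (tn + fp)  # 1 - commission rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "omission_rate": 1.0 - sensitivity,
        "commission_rate": 1.0 - specificity,
        # ASSUMPTION: TSS as an example of an equally weighted measure.
        "tss": sensitivity + specificity - 1.0,
    }


# Hypothetical survey plots: 4 presences followed by 4 absences.
observed = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 1, 1]
metrics = accuracy_metrics(observed, predicted)
```

In this toy case the model omits one of four presences and commits two of four false presences, so it favours sensitivity over specificity, the pattern the abstract reports for one-class models.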

Relevance: 30.00%

Abstract:

The paper deals with the development and application of a generic methodology for the automatic processing (mapping and classification) of environmental data. The General Regression Neural Network (GRNN) is considered in detail and is proposed as an efficient tool to solve the problem of spatial data mapping (regression). The Probabilistic Neural Network (PNN) is considered as an automatic tool for spatial classification. The automatic tuning of isotropic and anisotropic GRNN/PNN models using a cross-validation procedure is presented. Results are compared with the k-Nearest-Neighbours (k-NN) interpolation algorithm using an independent validation data set. Real case studies are based on decision-oriented mapping and classification of radioactively contaminated territories.
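
As a rough illustration of the tuning idea, the sketch below implements an isotropic GRNN, which amounts to a Gaussian-kernel weighted average of the training values, and selects its single kernel width by leave-one-out cross-validation. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the candidate widths and the toy samples are all hypothetical, and the anisotropic case (one width per coordinate) is not covered.

```python
import math


def grnn_predict(train, query, sigma):
    """Isotropic GRNN: Gaussian-kernel weighted mean of training values."""
    num = den = 0.0
    for (x, y), z in train:
        d2 = (x - query[0]) ** 2 + (y - query[1]) ** 2
        w = math.exp(-d2 / (2.0 * sigma ** 2))
        num += w * z
        den += w
    # If every weight underflows to zero, fall back to the global mean.
    return num / den if den > 0.0 else sum(z for _, z in train) / len(train)


def loo_mse(data, sigma):
    """Leave-one-out cross-validation error for a given kernel width."""
    err = 0.0
    for i, (point, z) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        err += (grnn_predict(rest, point, sigma) - z) ** 2
    return err / len(data)


def tune_sigma(data, candidates):
    """Pick the candidate width with the lowest leave-one-out error."""
    return min(candidates, key=lambda s: loo_mse(data, s))


# Hypothetical toy samples: ((x, y), measured value).
samples = [((0.0, 0.0), 1.0), ((1.0, 0.0), 2.0), ((0.0, 1.0), 2.0),
           ((1.0, 1.0), 3.0), ((0.5, 0.5), 2.0), ((2.0, 2.0), 5.0)]
candidates = [0.1, 0.3, 0.5, 1.0, 2.0]
best_sigma = tune_sigma(samples, candidates)
estimate = grnn_predict(samples, (0.5, 0.5), best_sigma)
```

Because the prediction is a weighted mean with positive weights, it always lies between the smallest and largest training values. The kernel width is the only free parameter, which is what makes this kind of automatic tuning practical.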

Relevance: 30.00%

Abstract:

Summary: Following recent technological advances, digital image archives have seen unprecedented qualitative and quantitative growth. Despite the enormous possibilities they offer, these advances raise new questions about how to process the masses of data acquired. This question is the basis of this Thesis: the problems of processing digital information at very high spatial and/or spectral resolution are addressed using statistical learning approaches, namely kernel methods. This Thesis studies image classification problems, i.e. the categorization of pixels into a reduced number of classes reflecting the spectral and contextual properties of the objects they represent. The emphasis is on the efficiency of the algorithms, as well as on their simplicity, so as to increase their potential for implementation by users. In addition, the challenge of this Thesis is to stay close to the concrete problems of satellite image users without losing sight of the interest of the proposed methods for the machine learning community from which they originate. In this sense, this work is deliberately transdisciplinary, maintaining a strong link between the two sciences in all the developments proposed. Four models are proposed: the first addresses the problem of high dimensionality and data redundancy with a model that optimizes classification performance by adapting to the particularities of the image. This is made possible by a ranking system for the variables (the bands) that is optimized at the same time as the base model: in this way, only the variables important for solving the problem are used by the classifier.
The lack of labeled information and the uncertainty about its relevance to the problem motivate the next two models, based respectively on active learning and semi-supervised methods: the first improves the quality of a training set through direct interaction between the user and the machine, while the second uses unlabeled pixels to improve the description of the available data and the robustness of the model. Finally, the last model proposed considers the more theoretical question of the structure among the outputs: integrating this source of information, never before considered in remote sensing, opens up new research challenges. Advanced kernel methods for remote sensing image classification. Devis Tuia, Institut de Géomatique et d'Analyse du Risque, September 2009. Abstract: The technical developments of recent years have brought the quantity and quality of digital information to an unprecedented level, as enormous archives of satellite images are available to users. However, even if these advances open more and more possibilities in the use of digital imagery, they also raise several problems of storage and processing. The latter is considered in this Thesis: the processing of very high spatial and spectral resolution images is treated with approaches based on data-driven algorithms relying on kernel methods. In particular, the problem of image classification, i.e. the categorization of the image's pixels into a reduced number of classes reflecting spectral and contextual properties, is studied through the different models presented. The emphasis is put on algorithmic efficiency and the simplicity of the approaches proposed, to avoid overly complex models that users would not adopt.
The major challenge of the Thesis is to remain close to concrete remote sensing problems, without losing the methodological interest from the machine learning viewpoint: in this sense, this work aims at building a bridge between the machine learning and remote sensing communities, and all the models proposed have been developed keeping in mind the need for such a synergy. Four models are proposed: first, an adaptive model learning the relevant image features has been proposed to solve the problem of high dimensionality and collinearity of the image features. This model automatically provides an accurate classifier and a ranking of the relevance of the individual features. The scarcity and unreliability of labeled information were the common root of the second and third models proposed: when confronted with such problems, the user can either construct the labeled set iteratively by direct interaction with the machine or use the unlabeled data to increase the robustness and quality of the description of the data. Both solutions have been explored, resulting in two methodological contributions, based respectively on active learning and semi-supervised learning. Finally, the more theoretical issue of structured outputs has been considered in the last model, which, by integrating output similarity into a model, opens new challenges and opportunities for remote sensing image processing.

Relevance: 30.00%

Abstract:

The pace of ongoing climate change calls for reliable plant biodiversity scenarios. Traditional dynamic vegetation models use plant functional types that are summarized to such an extent that they become meaningless for biodiversity scenarios. Hybrid dynamic vegetation models of intermediate complexity (hybrid-DVMs) have recently been developed to address this issue. These models, at the crossroads between phenomenological and process-based models, are able to involve an intermediate number of well-chosen plant functional groups (PFGs). The challenge is to build meaningful PFGs that are representative of plant biodiversity, and consistent with the parameters and processes of hybrid-DVMs. Here, we propose and test a framework based on a few selected traits to define a limited number of PFGs, which are both representative of the diversity (functional and taxonomic) of the flora in the Ecrins National Park, and adapted to hybrid-DVMs. This new classification scheme, together with recent advances in vegetation modeling, constitutes a step forward for mechanistic biodiversity modeling.

Relevance: 30.00%

Abstract:

Radioactive soil-contamination mapping and risk assessment is a vital issue for decision makers. Traditional approaches for mapping the spatial concentration of radionuclides employ various regression-based models, which usually provide a single-value prediction realization accompanied (in some cases) by estimation error. Such approaches do not provide the capability for rigorous uncertainty quantification or probabilistic mapping. Machine learning is a recent and fast-developing approach based on learning patterns and information from data. Artificial neural networks for prediction mapping have been especially powerful in combination with spatial statistics. A data-driven approach provides the opportunity to integrate additional relevant information about spatial phenomena into a prediction model for more accurate spatial estimates and associated uncertainty. Machine-learning algorithms can also be used for a wider spectrum of problems than before: classification, probability density estimation, and so forth. Stochastic simulations are used to model spatial variability and uncertainty. Unlike regression models, they provide multiple realizations of a particular spatial pattern that allow uncertainty and risk quantification. This paper reviews the most recent methods of spatial data analysis, prediction, and risk mapping, based on machine learning and stochastic simulations in comparison with more traditional regression models. The radioactive fallout from the Chernobyl Nuclear Power Plant accident is used to illustrate the application of the models for prediction and classification problems. This fallout is a unique case study that provides the challenging task of analyzing huge amounts of data ('hard' direct measurements, as well as supplementary information and expert estimates) and solving particular decision-oriented problems.

Relevance: 30.00%

Abstract:

A recent study of a pair of sympatric species of cichlids in Lake Apoyo in Nicaragua is viewed as providing probably one of the most convincing examples of sympatric speciation to date. Here, we describe and study a stochastic, individual-based, explicit genetic model tailored for this cichlid system. Our results show that relatively rapid (<20,000 generations) colonization of a new ecological niche and (sympatric or parapatric) speciation via local adaptation and divergence in habitat and mating preferences are theoretically plausible if: (i) the number of loci underlying the traits controlling local adaptation, and habitat and mating preferences is small; (ii) the strength of selection for local adaptation is intermediate; (iii) the carrying capacity of the population is intermediate; and (iv) the effects of the loci influencing nonrandom mating are strong. We discuss patterns and timescales of ecological speciation identified by our model, and we highlight important parameters and features that need to be studied empirically to provide information that can be used to improve the biological realism and power of mathematical models of ecological speciation.

Relevance: 30.00%

Abstract:

Tire traces can be observed at many crime scenes, as vehicles are often used by criminals. The tread abrasion on the road, while braking or skidding, leads to the production of small rubber particles which can be collected for comparison purposes. This research focused on the statistical comparison of Py-GC/MS profiles of tire traces and tire treads. The optimisation of the analytical method was carried out using experimental designs. The aim was to determine the best pyrolysis parameters with regard to the repeatability of the results. The effect of each pyrolysis factor could thus also be calculated. The pyrolysis temperature was found to be five times more important than the pyrolysis time. Finally, a pyrolysis at 650 °C for 15 s was selected. Ten tires of different manufacturers and models were used for this study. Several samples were collected on each tire, and several replicates were carried out to study the variability within each tire (intravariability). More than eighty compounds were integrated for each analysis, and the variability study showed that more than 75% presented a relative standard deviation (RSD) below 5% for the ten tires, thus supporting a low intravariability. The variability between the ten tires (intervariability) presented higher values, and the ten most variable compounds had an RSD value above 13%, supporting their high potential of discrimination between the tires tested. Principal Component Analysis (PCA) was able to fully discriminate the ten tires with the help of the first three principal components. The ten tires were finally used to perform braking tests on a racetrack with a vehicle equipped with an anti-lock braking system. The resulting tire traces were adequately collected using sheets of white gelatine. As for the tires, the intravariability for the traces was found to be lower than the intervariability.
Clustering methods were applied, and Ward's method based on the squared Euclidean distance was able to correctly group all of the tire trace replicates in the same cluster as the replicates of their corresponding tire. Blind tests on traces were performed, and the traces were correctly assigned to their source tire. These results support the hypothesis that the tested tires, of different manufacturers and models, can be discriminated by a statistical comparison of their chemical profiles. The traces could not be differentiated from their source tire but could be differentiated from all the other tires present in the subset. The results are promising and will be extended to a larger sample set.
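
The intravariability and intervariability comparison rests on the relative standard deviation, RSD = 100 × standard deviation / mean. The Python sketch below shows the calculation for a single compound. The peak areas and tire names are hypothetical, and computing the intervariability from the per-tire means is an assumption about how the comparison could be set up, not a detail given in the abstract.

```python
import statistics


def rsd_percent(values):
    """Relative standard deviation in percent (sample standard deviation)."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical normalised peak areas of one compound, three replicates per tire.
replicates = {
    "tire_A": [10.0, 10.2, 9.8],
    "tire_B": [14.0, 14.3, 13.7],
}

# Intravariability: spread of the replicates within each tire.
intra = {tire: rsd_percent(areas) for tire, areas in replicates.items()}

# Intervariability (assumed definition): spread of the per-tire means.
tire_means = [statistics.mean(areas) for areas in replicates.values()]
inter = rsd_percent(tire_means)
```

A compound is a useful discriminator when its intravariability is small relative to its intervariability, as in this toy data, where each tire varies by about 2% internally while the tire means differ by more than 20%.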

Relevance: 30.00%

Abstract:

This paper presents a validation study on statistical nonsupervised brain tissue classification techniques in magnetic resonance (MR) images. Several image models assuming different hypotheses regarding the intensity distribution model, the spatial model and the number of classes are assessed. The methods are tested on simulated data for which the classification ground truth is known. Different noise and intensity nonuniformities are added to simulate real imaging conditions. No enhancement of the image quality is considered either before or during the classification process. This way, the accuracy of the methods and their robustness against image artifacts are tested. Classification is also performed on real data where a quantitative validation compares the methods' results with an estimated ground truth from manual segmentations by experts. Validity of the various classification methods in the labeling of the image as well as in the tissue volume is estimated with different local and global measures. Results demonstrate that methods relying on both intensity and spatial information are more robust to noise and field inhomogeneities. We also demonstrate that partial volume is not perfectly modeled, even though methods that account for mixture classes outperform methods that only consider pure Gaussian classes. Finally, we show that simulated data results can also be extended to real data.

Relevance: 30.00%

Abstract:

Over the past few decades, age estimation of living persons has represented a challenging task for many forensic services worldwide. In general, the process for age estimation includes the observation of the degree of maturity reached by some physical attributes, such as dentition or several ossification centers. The estimated chronological age, or the probability that an individual belongs to a meaningful class of ages, is then obtained from the observed degree of maturity by means of various statistical methods. Among these methods, those developed in a Bayesian framework offer users the possibility of coherently dealing with the uncertainty associated with age estimation and of assessing in a transparent and logical way the probability that an examined individual is younger or older than a given age threshold. Recently, a Bayesian network for age estimation has been presented in the scientific literature; this kind of probabilistic graphical tool may facilitate the use of the probabilistic approach. Probabilities of interest in the network are assigned by means of transition analysis, a parametric statistical model that links chronological age and degree of maturity by means of specific regression models, such as logit or probit models. Since different regression models can be employed in transition analysis, the aim of this paper is to study the influence of the model on the classification of individuals. The analysis was performed using a dataset on the ossification status of the medial clavicular epiphysis, and the results indicate that the classification of individuals does not depend on the choice of the regression model.
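
The kind of comparison the abstract describes can be sketched in Python with a deliberately simplified setting: a single binary maturity indicator ("fused" or "not fused"), a uniform prior over a grid of ages, and either a logit or a probit link for the probability of being fused at a given age. Everything numeric here is a hypothetical assumption: the coefficients, the age range, the 18-year threshold, and the scaling of the probit coefficients by 1/1.702 so that the two curves roughly coincide. None of these values come from the paper or its dataset.

```python
import math


def logistic_cdf(t):
    """Logit link: standard logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-t))


def normal_cdf(t):
    """Probit link: standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def prob_over_threshold(link, intercept, slope, fused, threshold=18.0,
                        age_min=10.0, age_max=30.0, step=0.1):
    """Posterior P(age >= threshold | observed stage), uniform prior on a grid."""
    n = int(round((age_max - age_min) / step)) + 1
    above = total = 0.0
    for i in range(n):
        age = age_min + i * step
        p_fused = link(intercept + slope * age)
        likelihood = p_fused if fused else 1.0 - p_fused
        total += likelihood
        if age >= threshold:
            above += likelihood
    return above / total


# Hypothetical coefficients: 50% probability of fusion at age 21.
A_LOGIT, B_LOGIT = -21.0, 1.0
A_PROBIT, B_PROBIT = A_LOGIT / 1.702, B_LOGIT / 1.702  # assumed rescaling

p_logit = prob_over_threshold(logistic_cdf, A_LOGIT, B_LOGIT, fused=True)
p_probit = prob_over_threshold(normal_cdf, A_PROBIT, B_PROBIT, fused=True)
same_class = (p_logit >= 0.5) == (p_probit >= 0.5)
```

With these assumed values both links put the individual above the threshold with high probability, so the classification is the same under either model. Whether that holds for real ossification data is the question the paper addresses empirically.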