929 resultados para PREDICTIVE PERFORMANCE
Resumo:
An abstract of a thesis devoted to using helix-coil models to study unfolded states.\\
Research on polypeptide unfolded states has received much more attention in the last decade or so than it has in the past. Unfolded states are thought to be implicated in various
misfolding diseases and likely play crucial roles in protein folding equilibria and folding rates. Structural characterization of unfolded states has proven to be
much more difficult than the now well established practice of determining the structures of folded proteins. This is largely because many core assumptions underlying
folded structure determination methods are invalid for unfolded states. This has led to a dearth of knowledge concerning the nature of unfolded state conformational
distributions. While many aspects of unfolded state structure are not well known, there does exist a significant body of work stretching back half a century that
has been focused on structural characterization of marginally stable polypeptide systems. This body of work represents an extensive collection of experimental
data and biophysical models associated with describing helix-coil equilibria in polypeptide systems. Much of the work on unfolded states in the last decade has not been devoted
specifically to the improvement of our understanding of helix-coil equilibria, which arguably is the most well characterized of the various conformational equilibria
that likely contribute to unfolded state conformational distributions. This thesis seeks to provide a deeper investigation of helix-coil equilibria using modern
statistical data analysis and biophysical modeling techniques. The studies contained within seek to provide deeper insights and new perspectives on what we presumably
know very well about protein unfolded states. \\
Chapter 1 gives an overview of recent and historical work on studying protein unfolded states. The study of helix-coil equilibria is placed in the context
of the general field of unfolded state research and the basics of helix-coil models are introduced.\\
Chapter 2 introduces the newest incarnation of a sophisticated helix-coil model. State of the art modern statistical techniques are employed to estimate the energies
of various physical interactions that serve to influence helix-coil equilibria. A new Bayesian model selection approach is utilized to test many long-standing
hypotheses concerning the physical nature of the helix-coil transition. Some assumptions made in previous models are shown to be invalid and the new model
exhibits greatly improved predictive performance relative to its predecessor. \\
Chapter 3 introduces a new statistical model that can be used to interpret amide exchange measurements. As amide exchange can serve as a probe for residue-specific
properties of helix-coil ensembles, the new model provides a novel and robust method to use these types of measurements to characterize helix-coil ensembles experimentally
and test the position-specific predictions of helix-coil models. The statistical model is shown to perform exceedingly better than the most commonly used
method for interpreting amide exchange data. The estimates of the model obtained from amide exchange measurements on an example helical peptide
also show a remarkable consistency with the predictions of the helix-coil model. \\
Chapter 4 involves a study of helix-coil ensembles through the enumeration of helix-coil configurations. Aside from providing new insights into helix-coil ensembles,
this chapter also introduces a new method by which helix-coil models can be extended to calculate new types of observables. Future work on this approach could potentially
allow helix-coil models to move into use domains that were previously inaccessible and reserved for other types of unfolded state models that were introduced in chapter 1.
Resumo:
The distribution, abundance, behaviour, and morphology of marine species is affected by spatial variability in the wave environment. Maps of wave metrics (e.g. significant wave height Hs, peak energy wave period Tp, and benthic wave orbital velocity URMS) are therefore useful for predictive ecological models of marine species and ecosystems. A number of techniques are available to generate maps of wave metrics, with varying levels of complexity in terms of input data requirements, operator knowledge, and computation time. Relatively simple "fetch-based" models are generated using geographic information system (GIS) layers of bathymetry and dominant wind speed and direction. More complex, but computationally expensive, "process-based" models are generated using numerical models such as the Simulating Waves Nearshore (SWAN) model. We generated maps of wave metrics based on both fetch-based and process-based models and asked whether predictive performance in models of benthic marine habitats differed. Predictive models of seagrass distribution for Moreton Bay, Southeast Queensland, and Lizard Island, Great Barrier Reef, Australia, were generated using maps based on each type of wave model. For Lizard Island, performance of the process-based wave maps was significantly better for describing the presence of seagrass, based on Hs, Tp, and URMS. Conversely, for the predictive model of seagrass in Moreton Bay, based on benthic light availability and Hs, there was no difference in performance using the maps of the different wave metrics. For predictive models where wave metrics are the dominant factor determining ecological processes it is recommended that process-based models be used. Our results suggest that for models where wave metrics provide secondarily useful information, either fetch- or process-based models may be equally useful.
Quantificação de açúcares com uma língua eletrónica: calibração multivariada com seleção de sensores
Resumo:
Este trabalho incide na análise dos açúcares majoritários nos alimentos (glucose, frutose e sacarose) com uma língua eletrónica potenciométrica através de calibração multivariada com seleção de sensores. A análise destes compostos permite contribuir para a avaliação do impacto dos açúcares na saúde e seu efeito fisiológico, além de permitir relacionar atributos sensoriais e atuar no controlo de qualidade e autenticidade dos alimentos. Embora existam diversas metodologias analíticas usadas rotineiramente na identificação e quantificação dos açúcares nos alimentos, em geral, estes métodos apresentam diversas desvantagens, tais como lentidão das análises, consumo elevado de reagentes químicos e necessidade de pré-tratamentos destrutivos das amostras. Por isso se decidiu aplicar uma língua eletrónica potenciométrica, construída com sensores poliméricos selecionados considerando as sensibilidades aos açucares obtidas em trabalhos anteriores, na análise dos açúcares nos alimentos, visando estabelecer uma metodologia analítica e procedimentos matemáticos para quantificação destes compostos. Para este propósito foram realizadas análises em soluções padrão de misturas ternárias dos açúcares em diferentes níveis de concentração e em soluções de dissoluções de amostras de mel, que foram previamente analisadas em HPLC para se determinar as concentrações de referência dos açúcares. Foi então feita uma análise exploratória dos dados visando-se remover sensores ou observações discordantes através da realização de uma análise de componentes principais. Em seguida, foram construídos modelos de regressão linear múltipla com seleção de variáveis usando o algoritmo stepwise e foi verificado que embora fosse possível estabelecer uma boa relação entre as respostas dos sensores e as concentrações dos açúcares, os modelos não apresentavam desempenho de previsão satisfatório em dados de grupo de teste. Dessa forma, visando contornar este problema, novas abordagens foram testadas através da construção e otimização dos parâmetros de um algoritmo genético para seleção de variáveis que pudesse ser aplicado às diversas ferramentas de regressão, entre elas a regressão pelo método dos mínimos quadrados parciais. Foram obtidos bons resultados de previsão para os modelos obtidos com o método dos mínimos quadrados parciais aliado ao algoritmo genético, tanto para as soluções padrão quanto para as soluções de mel, com R²ajustado acima de 0,99 e RMSE inferior a 0,5 obtidos da relação linear entre os valores previstos e experimentais usando dados dos grupos de teste. O sistema de multi-sensores construído se mostrou uma ferramenta adequada para a análise dos iii açúcares, quando presentes em concentrações maioritárias, e alternativa a métodos instrumentais de referência, como o HPLC, por reduzir o tempo da análise e o valor monetário da análise, bem como, ter um preparo mínimo das amostras e eliminar produtos finais poluentes.
Resumo:
As azeitonas de mesa são consumidas e apreciadas em todo o mundo e, embora a sua classificação comercial não seja legalmente exigida, o Conselho Oleícola Internacional sugere que seja regulamentada com base na avaliação sensorial por um painel de provadores. A implementação de tal requer o cumprimento de diretrizes estabelecidas pelo Conselho Oleícola Internacional, resultando numa tarefa complexa, demorada e cujas avaliações não estão isentas de subjetividade. Neste trabalho, pela primeira vez, uma língua eletrónica foi utilizada com o intuito de classificar azeitonas de mesa em categorias comerciais, estipuladas com base na presença e na mediana das intensidades do defeito organolético predominante percebido pelo painel de provadores. Modelos de discriminação lineares foram estabelecidos com base em subconjuntos de sinais potenciométricos de sensores da língua eletrónica, selecionados recorrendo ao algoritmo de arrefecimento simulado. Os desempenhos qualitativo de previsão dos modelos de classificação estabelecidos foram avaliados recorrendo à técnica de validação cruzada leave-one-out e à técnica de validação cruzada K-folds com repetição, que permite minimizar o risco de sobreajustamento, permitindo obter resultados mais realistas. O potencial desta abordagem qualitativa, baseada nos perfis eletroquímicos gerados pela língua eletrónica, foi satisfatoriamente demonstrado: (i) na classificação correta (sensibilidades ≥ 93%) de soluções padrão (ácido n-butírico, 2-mercaptoetanol e ácido ciclohexanocarboxílico) de acordo com o defeito sensorial que mimetizam (butírico, pútrido ou sapateira); (ii) na classificação correta (sensibilidades ≥ 93%) de amostras de referência de azeitonas e salmouras (presença de um defeito único intenso) de acordo com o tipo de defeito percebido (avinhado-avinagrado, butírico, mofo, pútrido ou sapateira), e selecionadas pelo painel de provadores; e, (iii) na classificação correta (sensibilidade ≥ 86%) de amostras de azeitonas de mesa com grande heterogeneidade, contendo um ou mais defeitos organoléticos percebidos pelo painel de provadores nas azeitona e/ou salmouras, de acordo com a sua categoria comercial (azeitona extra sem defeito, extra, 1ª escolha, 2ª escolha e azeitonas que não podem ser comercializadas como azeitonas de mesa). Por fim, a capacidade língua eletrónica em quantificar as medianas das intensidades dos atributos negativos detetados pelo painel nas azeitonas de mesa foi demonstrada recorrendo a modelos de regressão linear múltipla-algoritmo de arrefecimento simulado, com base em subconjuntos selecionados de sinais gerados pela língua eletrónica durante a análise potenciométrica das azeitonas e salmouras. O xii desempenho de previsão dos modelos quantitativos foi validado recorrendo às mesmas duas técnicas de validação cruzada. Os modelos estabelcidos para cada um dos 5 defeitos sensoriais presentes nas amostras de azeitona de mesa, permitiram quantificar satisfatoriamente as medianas das intensidades dos defeitos (R² ≥ 0,97). Assim, a qualidade satisfatória dos resultados qualitativos e quantitativos alcançados permite antever, pela primeira vez, uma possível aplicação prática das línguas eletrónicas como uma ferramenta de análise sensorial de defeitos em azeitonas de mesa, podendo ser usada como uma técnica rápida, económica e útil na avaliação organolética de atributos negativos, complementar à tradicional análise sensorial por um painel de provadores.
Resumo:
In this work, the relationship between diameter at breast height (d) and total height (h) of individual-tree was modeled with the aim to establish provisory height-diameter (h-d) equations for maritime pine (Pinus pinaster Ait.) stands in the Lomba ZIF, Northeast Portugal. Using data collected locally, several local and generalized h-d equations from the literature were tested and adaptations were also considered. Model fitting was conducted by using usual nonlinear least squares (nls) methods. The best local and generalized models selected, were also tested as mixed models applying a first-order conditional expectation (FOCE) approximation procedure and maximum likelihood methods to estimate fixed and random effects. For the calibration of the mixed models and in order to be consistent with the fitting procedure, the FOCE method was also used to test different sampling designs. The results showed that the local h-d equations with two parameters performed better than the analogous models with three parameters. However a unique set of parameter values for the local model can not be used to all maritime pine stands in Lomba ZIF and thus, a generalized model including covariates from the stand, in addition to d, was necessary to obtain an adequate predictive performance. No evident superiority of the generalized mixed model in comparison to the generalized model with nonlinear least squares parameters estimates was observed. On the other hand, in the case of the local model, the predictive performance greatly improved when random effects were included. The results showed that the mixed model based in the local h-d equation selected is a viable alternative for estimating h if variables from the stand are not available. Moreover, it was observed that it is possible to obtain an adequate calibrated response using only 2 to 5 additional h-d measurements in quantile (or random) trees from the distribution of d in the plot (stand). Balancing sampling effort, accuracy and straightforwardness in practical applications, the generalized model from nls fit is recommended. Examples of applications of the selected generalized equation to the forest management are presented, namely how to use it to complete missing information from forest inventory and also showing how such an equation can be incorporated in a stand-level decision support system that aims to optimize the forest management for the maximization of wood volume production in Lomba ZIF maritime pine stands.
Resumo:
Mechanistic models used for prediction should be parsimonious, as models which are over-parameterised may have poor predictive performance. Determining whether a model is parsimonious requires comparisons with alternative model formulations with differing levels of complexity. However, creating alternative formulations for large mechanistic models is often problematic, and usually time-consuming. Consequently, few are ever investigated. In this paper, we present an approach which rapidly generates reduced model formulations by replacing a model’s variables with constants. These reduced alternatives can be compared to the original model, using data based model selection criteria, to assist in the identification of potentially unnecessary model complexity, and thereby inform reformulation of the model. To illustrate the approach, we present its application to a published radiocaesium plant-uptake model, which predicts uptake on the basis of soil characteristics (e.g. pH, organic matter content, clay content). A total of 1024 reduced model formulations were generated, and ranked according to five model selection criteria: Residual Sum of Squares (RSS), AICc, BIC, MDL and ICOMP. The lowest scores for RSS and AICc occurred for the same reduced model in which pH dependent model components were replaced. The lowest scores for BIC, MDL and ICOMP occurred for a further reduced model in which model components related to the distinction between adsorption on clay and organic surfaces were replaced. Both these reduced models had a lower RSS for the parameterisation dataset than the original model. As a test of their predictive performance, the original model and the two reduced models outlined above were used to predict an independent dataset. The reduced models have lower prediction sums of squares than the original model, suggesting that the latter may be overfitted. The approach presented has the potential to inform model development by rapidly creating a class of alternative model formulations, which can be compared.
Resumo:
A Lontra Euroasiática foi alvo de quatro prospeções na Península Ibérica (1990-2008). Em 2003, foi publicado um modelo de distribuição da lontra, com base nos dados de presença/ausência das prospeções publicadas em 1998. Dadas as suas características, este tipo de modelos pode tornar-se um elemento chave nas estratégias de recuperação da lontra como também, de outras espécies, se comprovada a sua fiabilidade e capacidade de antecipar tendências na distribuição das mesmas. Assim, esta dissertação confrontou as previsões do modelo com os dados de distribuição de 2008, a fim de identificar potências áreas de discordância. Os resultados revelam que, o modelo de distribuição de lontra proposto, apesar de ter por base dados de 1998 e de não considerar explicitamente processos biológicos, conseguiu captar o essencial da relação espécie-ambiente, resultando num bom desempenho preditivo para a distribuição da mesma em Espanha, uma década depois da sua construção; Evolution of otter (Lutra lutra L.) distribution in the Iberian Peninsula: Models at different scales and their projection through space and time Abstract: The Eurasian otter was already surveyed four times in the Iberian Peninsula (1990-2008). In 2003, a distribution model for the otter based on presence/absence data from the survey published in 1998, was published. This type of models has advantages that can make it in a key element for otter conservation strategies and also, for other species, but only, if their reliability and capability to predict species distribution tendencies are validated. The present thesis compares the model predictions with 2008 data, in order to find potential mismatch areas. Results suggest that, although the distribution model for the otter was based on data from 1998 and, doesn’t include explicitly biological mechanisms, it managed to correctly identify the essence of the species-environment relationship, what was translated in a good predictive performance for its actual distribution in Spain, after a decade of its construction.
Resumo:
In quantitative risk analysis, the problem of estimating small threshold exceedance probabilities and extreme quantiles arise ubiquitously in bio-surveillance, economics, natural disaster insurance actuary, quality control schemes, etc. A useful way to make an assessment of extreme events is to estimate the probabilities of exceeding large threshold values and extreme quantiles judged by interested authorities. Such information regarding extremes serves as essential guidance to interested authorities in decision making processes. However, in such a context, data are usually skewed in nature, and the rarity of exceedance of large threshold implies large fluctuations in the distribution's upper tail, precisely where the accuracy is desired mostly. Extreme Value Theory (EVT) is a branch of statistics that characterizes the behavior of upper or lower tails of probability distributions. However, existing methods in EVT for the estimation of small threshold exceedance probabilities and extreme quantiles often lead to poor predictive performance in cases where the underlying sample is not large enough or does not contain values in the distribution's tail. In this dissertation, we shall be concerned with an out of sample semiparametric (SP) method for the estimation of small threshold probabilities and extreme quantiles. The proposed SP method for interval estimation calls for the fusion or integration of a given data sample with external computer generated independent samples. Since more data are used, real as well as artificial, under certain conditions the method produces relatively short yet reliable confidence intervals for small exceedance probabilities and extreme quantiles.
Resumo:
Discovery of microRNAs (miRNAs) relies on predictive models for characteristic features from miRNA precursors (pre-miRNAs). The short length of miRNA genes and the lack of pronounced sequence features complicate this task. To accommodate the peculiarities of plant and animal miRNAs systems, tools for both systems have evolved differently. However, these tools are biased towards the species for which they were primarily developed and, consequently, their predictive performance on data sets from other species of the same kingdom might be lower. While these biases are intrinsic to the species, their characterization can lead to computational approaches capable of diminishing their negative effect on the accuracy of pre-miRNAs predictive models. We investigate in this study how 45 predictive models induced for data sets from 45 species, distributed in eight subphyla/classes, perform when applied to a species different from the species used in its induction. Results: Our computational experiments show that the separability of pre-miRNAs and pseudo pre-miRNAs instances is species-dependent and no feature set performs well for all species, even within the same subphylum/class. Mitigating this species dependency, we show that an ensemble of classifiers reduced the classification errors for all 45 species. As the ensemble members were obtained using meaningful, and yet computationally viable feature sets, the ensembles also have a lower computational cost than individual classifiers that rely on energy stability parameters, which are of prohibitive computational cost in large scale applications. Conclusion: In this study, the combination of multiple pre-miRNAs feature sets and multiple learning biases enhanced the predictive accuracy of pre-miRNAs classifiers of 45 species. This is certainly a promising approach to be incorporated in miRNA discovery tools towards more accurate and less species-dependent tools.
Resumo:
Méthodologie: Simulation; Analyse discriminante linéaire et logistique; Arbres de classification; Réseaux de neurones en base radiale
Resumo:
Species distribution and ecological niche models are increasingly used in biodiversity management and conservation. However, one thing that is important but rarely done is to follow up on the predictive performance of these models over time, to check if their predictions are fulfilled and maintain accuracy, or if they apply only to the set in which they were produced. In 2003, a distribution model of the Eurasian otter (Lutra lutra) in Spain was published, based on the results of a country-wide otter survey published in 1998. This model was built with logistic regression of otter presence-absence in UTM 10 km2 cells on a diverse set of environmental, human and spatial variables, selected according to statistical criteria. Here we evaluate this model against the results of the most recent otter survey, carried out a decade later and after a significant expansion of the otter distribution area in this country. Despite the time elapsed and the evident changes in this species’ distribution, the model maintained a good predictive capacity, considering both discrimination and calibration measures. Otter distribution did not expand randomly or simply towards vicinity areas,m but specifically towards the areas predicted as most favourable by the model based on data from 10 years before. This corroborates the utility of predictive distribution models, at least in the medium term and when they are made with robust methods and relevant predictor variables.
Resumo:
In this article an alternate sensitivity analysis is proposed for train schedules. It characterises the schedules robustness or lack thereof and provides unique profiles of performance for different sources of delay and for different values of delay. An approach like this is necessary because train schedules are only a prediction of what will actually happen. They can perform poorly with respect to a variety of performance metrics, when deviations and other delays occur, if for instance they can even be implemented, and as originally intended. The information provided by this analytical approach is beneficial because it can be used as part of a proactive scheduling approach to alter a schedule in advance or to identify suitable courses of action for specific “bad behaviour”. Furthermore this information may be used to quantify the cost of delay. The effect of sectional running time (SRT) deviations and additional dwell time in particular were quantified for three railway schedule performance measures. The key features of this approach were demonstrated in a case study.