118 results for multilevel statistical modeling
at Université de Lausanne, Switzerland
Abstract:
PURPOSE: Ocular anatomy and radiation-associated toxicities pose unique challenges for external beam radiation therapy. For treatment planning, precise modeling of organs at risk and of the tumor volume is crucial. Developing a precise eye model and automatically adapting it to a patient's anatomy remain problematic because of variability in organ shape. This work introduces the application of a 3-dimensional (3D) statistical shape model as a novel method for precise eye modeling for external beam radiation therapy of intraocular tumors. METHODS AND MATERIALS: Manual and automatic segmentations were compared for 17 patients, based on head computed tomography (CT) volume scans. A 3D statistical shape model of the cornea, lens, and sclera, as well as of the optic disc position, was developed. Furthermore, an active shape model was built to enable automatic fitting of the eye model to CT slice stacks. Cross-validation was performed with leave-one-out tests for all training shapes by measuring Dice coefficients and mean segmentation errors between the automatic segmentation and manual segmentation by an expert. RESULTS: Cross-validation revealed a Dice similarity of 95% ± 2% for the sclera and cornea and 91% ± 2% for the lens. Overall, the mean segmentation error was 0.3 ± 0.1 mm. Average segmentation time was 14 ± 2 s on a standard personal computer. CONCLUSIONS: Our results show that the presented solution outperforms state-of-the-art methods in terms of accuracy, reliability, and robustness. Moreover, the eye model shape and its variability are learned from a training set rather than derived from shape assumptions (e.g., as with the spherical or elliptical model). Therefore, the model appears capable of modeling nonspherically and nonelliptically shaped eyes.
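The leave-one-out evaluation reports Dice coefficients between automatic and expert segmentations, summarized as mean ± SD. As a minimal sketch, assuming each segmentation is available as a flattened binary voxel mask (a representation the abstract does not specify), the Dice coefficient 2|A∩B| / (|A| + |B|) and its summary could be computed as follows; the masks are toy values, not study data.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity 2|A∩B| / (|A| + |B|) for two binary masks.

    Masks are equal-length sequences of 0/1 (or bool) values, e.g. a
    flattened CT volume with one entry per voxel (assumed representation).
    """
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must contain the same number of voxels")
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size_a = sum(1 for a in mask_a if a)
    size_b = sum(1 for b in mask_b if b)
    if size_a + size_b == 0:
        return 1.0  # convention: two empty masks agree perfectly
    return 2.0 * overlap / (size_a + size_b)


def summarize(scores):
    """Mean and sample standard deviation, as in a 'mean ± SD' report."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1) if n > 1 else 0.0
    return mean, var ** 0.5


# Toy example: manual (expert) vs. automatic segmentation of six voxels.
manual = [1, 1, 1, 1, 0, 0]
automatic = [1, 1, 1, 0, 1, 0]
print(dice_coefficient(manual, automatic))  # 0.75
```

In a leave-one-out setting, one Dice score would be computed per left-out shape and the list passed to `summarize`.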
Abstract:
Global warming led to an average increase in the Earth's surface temperature of about 0.7 °C during the 20th century, according to the 2007 IPCC report. In Switzerland, the increase over the same period was even higher: 1.3 °C in the Northern Alps and 1.7 °C in the Southern Alps. The impacts of this warming on ecosystems, especially on climatically sensitive systems such as the treeline ecotone, are already visible today. Alpine treeline species show increased growth rates, more young trees are establishing in forest gaps at many locations, and treelines are migrating upwards. With the forecast warming, this globally visible phenomenon is expected to continue. This PhD thesis aimed to develop a set of methods and models to investigate current and future climatic treeline positions and treeline shifts in the Swiss Alps in a spatial context. The focus was therefore on: 1) the quantification of current treeline dynamics and their potential causes, 2) the evaluation and improvement of temperature-based treeline indicators and 3) the spatial analysis and projection of past, current and future climatic treeline positions and their respective elevational shifts. The methods combined field temperature measurements, statistical modeling and spatial modeling in a geographical information system. To determine treeline shifts and assign the respective drivers, neighborhood relationships between forest patches were analyzed using moving window algorithms. Time series regression modeling was used to develop an air-to-soil temperature transfer model for calculating thermal treeline indicators. The indicators were then applied spatially to delineate the climatic treeline, based on interpolated temperature data. Observation of recent forest dynamics in the Swiss treeline ecotone showed that changes were mainly due to forest in-growth, but also partly to upward altitudinal shifts.
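The summary mentions a time series regression used as an air-to-soil temperature transfer model, without giving its form. As a hypothetical illustration only, the sketch below fits a first-order autoregressive model with an exogenous input (ARX(1)), soil[t] = a + b·air[t] + c·soil[t-1], by ordinary least squares on daily series, using only the Python standard library. The model form and the synthetic air series are assumptions, not the thesis's model or data.

```python
def solve_linear(a, b):
    """Solve a small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_air_to_soil(air, soil):
    """Least-squares fit of soil[t] = a + b*air[t] + c*soil[t-1] (hypothetical ARX(1) form)."""
    rows = [(1.0, air[t], soil[t - 1]) for t in range(1, len(soil))]
    y = soil[1:]
    # Normal equations (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve_linear(xtx, xty)


def predict_soil(params, air, soil0):
    """Run the fitted model forward from an initial soil temperature."""
    a, b, c = params
    soil = [soil0]
    for t in range(1, len(air)):
        soil.append(a + b * air[t] + c * soil[-1])
    return soil


# Synthetic check: generate soil temperatures from known coefficients, then refit.
air = [float((7 * t) % 11) for t in range(60)]  # placeholder air series (°C)
soil = predict_soil((0.5, 0.3, 0.6), air, soil0=2.0)
print([round(p, 3) for p in fit_air_to_soil(air, soil)])  # [0.5, 0.3, 0.6]
```

The lagged soil term lets the model represent the damped, delayed response of soil temperature to air temperature; a real application would use measured paired series.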
The recent reduction in agricultural land use was found to be the dominant driver of these changes. Climate-driven changes were identified only at the uppermost limits of the treeline ecotone. Seasonal mean temperature indicators were found to be the best predictors of climatic treelines. Applying dynamic seasonal delimitations and the air-to-soil temperature transfer model improved the indicators' applicability for spatial modeling. Reproducing the climatic treelines of the past 45 years revealed regionally different altitudinal shifts, the largest being located near the highest mountain massif. Modeling climatic treelines under two IPCC climate warming scenarios predicted major shifts in treeline altitude. However, the currently observed treeline is not expected to reach this limit easily, owing to lagged reactions, possible climate feedback effects and other limiting factors.
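The thesis derives seasonal mean temperature indicators with dynamic seasonal delimitation and then maps the climatic treeline from interpolated temperatures. The sketch below is illustrative only: the delimitation rule (first to last day at or above a threshold), the critical indicator value and the linear lapse rate are hypothetical assumptions, not values or procedures taken from the thesis.

```python
def seasonal_mean(daily_temps_c, threshold_c):
    """Mean daily temperature over a dynamically delimited season.

    Hypothetical delimitation: the season runs from the first to the last
    day whose mean temperature reaches `threshold_c` (inclusive).
    """
    warm_days = [i for i, t in enumerate(daily_temps_c) if t >= threshold_c]
    if not warm_days:
        return None  # no season at this location/year
    season = daily_temps_c[warm_days[0]: warm_days[-1] + 1]
    return sum(season) / len(season)


def climatic_treeline_m(ref_elev_m, ref_indicator_c, critical_c, lapse_c_per_100m):
    """Elevation at which the indicator drops to its critical value,
    assuming a constant (linear) lapse rate above the reference point."""
    return ref_elev_m + (ref_indicator_c - critical_c) / lapse_c_per_100m * 100.0


temps = [-2, 1, 5, 8, 3, 6, 2, -1]        # toy daily means (°C)
print(seasonal_mean(temps, 3.0))          # 5.5
print(climatic_treeline_m(1500, 10.0, 7.0, 0.5))  # 2100.0 (hypothetical values)
```

With a lapse rate of 0.5 °C per 100 m, each 1 °C of warming raises the modeled limit by 200 m, which illustrates why warming scenarios translate into large predicted shifts in treeline altitude.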
Abstract:
The potential ecological impact of ongoing climate change has been much discussed. High mountain ecosystems were identified early on as potentially very sensitive areas. Scenarios of upward species movement and vegetation shift are commonly discussed in the literature. Because mountains are characteristically conical, impact scenarios usually assume that a smaller surface area will be available as species move up. However, because the frequency distribution of additional physiographic factors (e.g., slope angle) also changes with increasing elevation (e.g., few gentle slopes are available at higher elevations), species migrating upslope may encounter increasingly unsuitable conditions. As a result, many species could suffer a severe reduction of their habitat surface, which could in turn affect patterns of biodiversity. In this paper, results from static plant distribution modeling are used to derive climate change impact scenarios in a high mountain environment. Models were calibrated with species presence/absence data. The environmental predictors used are annual mean air temperature, slope, indices of topographic position, geology, rock cover, modeled permafrost, and several indices of solar radiation and snow cover duration. Potential habitat distribution maps were drawn for 62 higher plant species, from which three separate climate change impact scenarios were derived. These scenarios show a wide range of responses, depending on the species and the degree of warming. Alpine species would be at the greatest risk of local extinction, whereas species with a large elevation range would run the lowest risk. Limitations of the models and scenarios are further discussed.
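The argument that upslope migration can shrink habitat because gentle slopes become rare at high elevation can be made concrete with a toy calculation. This sketch swaps the paper's statistical presence/absence models for a hypothetical rectangular envelope (temperature range plus maximum slope) on an invented elevation gradient; it shows the mechanism only, not the paper's results.

```python
def suitable_cells(cells, t_min, t_max, max_slope, warming=0.0):
    """Indices of equal-area grid cells inside a simple climatic/topographic envelope.

    cells: list of (mean_annual_temp_c, slope_deg); `warming` is added uniformly.
    """
    return [
        i for i, (temp, slope) in enumerate(cells)
        if t_min <= temp + warming <= t_max and slope <= max_slope
    ]


def habitat_change(cells, t_min, t_max, max_slope, warming):
    """Ratio of suitable area after vs. before warming (None if no current habitat)."""
    before = len(suitable_cells(cells, t_min, t_max, max_slope))
    after = len(suitable_cells(cells, t_min, t_max, max_slope, warming))
    return after / before if before else None


# Invented gradient: colder cells (higher up) are also steeper.
cells = [(6, 10), (5, 12), (4, 15), (3, 20), (2, 30), (1, 35), (0, 40), (-1, 45)]
print(suitable_cells(cells, 3, 5, 25))               # [1, 2, 3]
print(suitable_cells(cells, 3, 5, 25, warming=2.0))  # [3]
```

After +2 °C the thermal belt moves onto steeper cells, so only one of the three previously suitable cells' worth of habitat remains even though the temperature niche itself is unchanged.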
Abstract:
Gliomas are routinely graded according to histopathological criteria established by the World Health Organization. Although this classification can be used to understand some of the variance in the clinical outcome of patients, there is still substantial heterogeneity within and between lesions of the same grade. This study evaluated image-guided tissue samples acquired from a large cohort of patients presenting with either new or recurrent gliomas of grades II-IV using ex vivo proton high-resolution magic angle spinning spectroscopy. The quantification of metabolite levels revealed several discrete profiles associated with primary glioma subtypes, as well as secondary subtypes that had undergone transformation to a higher grade at the time of recurrence. Statistical modeling further demonstrated that these metabolomic profiles could be differentially classified with respect to pathological grading and inter-grade conversions. Importantly, the myo-inositol to total choline index allowed for a separation of recurrent low-grade gliomas on different pathological trajectories, the heightened ratio of phosphocholine to glycerophosphocholine uniformly characterized several forms of glioblastoma multiforme, and the onco-metabolite D-2-hydroxyglutarate was shown to help distinguish secondary from primary grade IV glioma, as well as grade II and III from grade IV glioma. These data provide evidence that metabolite levels are of interest in the assessment of both intra-grade and intra-lesional malignancy. Such information could be used to enhance the diagnostic specificity of in vivo spectroscopy and to aid in the selection of the most appropriate therapy for individual patients.
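The abstract highlights two ratio-based indices. As a minimal sketch with invented concentrations in arbitrary units, they could be computed as below; taking total choline (tCho) as free choline + phosphocholine (PC) + glycerophosphocholine (GPC) is the usual convention and is assumed here, since the abstract does not define it.

```python
def glioma_indices(conc):
    """Metabolite ratios from a dict of concentrations (arbitrary units).

    Assumes tCho = free choline (Cho) + phosphocholine (PC) + glycerophosphocholine (GPC).
    """
    t_cho = conc["Cho"] + conc["PC"] + conc["GPC"]
    return {
        "mI/tCho": conc["mI"] / t_cho,       # myo-inositol to total choline index
        "PC/GPC": conc["PC"] / conc["GPC"],  # phosphocholine to glycerophosphocholine
    }


sample = {"mI": 6.0, "Cho": 0.5, "PC": 1.0, "GPC": 0.5}  # hypothetical values
print(glioma_indices(sample))  # {'mI/tCho': 3.0, 'PC/GPC': 2.0}
```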
Abstract:
BACKGROUND: We sought to improve upon previously published statistical modeling strategies for binary classification of dyslipidemia for general population screening purposes, based on the waist-to-hip circumference ratio and body mass index anthropometric measurements. METHODS: Study subjects were participants in WHO-MONICA population-based surveys conducted in two Swiss regions. Outcome variables were based on the ratio of total serum cholesterol to high-density lipoprotein cholesterol. Other potential predictor variables were gender, age, current cigarette smoking, and hypertension. The models investigated were: (i) linear regression; (ii) logistic classification; (iii) regression trees; and (iv) classification trees (iii and iv are collectively known as "CART"). The binary classification performance of the region-specific models was externally validated by classifying the subjects from the other region. RESULTS: Waist-to-hip circumference ratio and body mass index remained modest predictors of dyslipidemia. Correct classification rates for all models were 60-80%, with marked gender differences. Gender-specific models provided only small gains in classification. The external validations provided assurance about the stability of the models. CONCLUSIONS: There were no striking differences between the algebraic (i, ii) and non-algebraic (iii, iv) approaches, or between the regression (i, iii) and classification (ii, iv) approaches. The advantages of CART over simple additive linear and logistic models were smaller than anticipated in this particular application, which used a relatively small set of predictor variables. CART models may be more useful when considering main effects and interactions among larger sets of predictor variables.
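To make the external-validation idea concrete, the sketch below computes the two anthropometric predictors, applies a logistic classifier whose coefficients would come from one region, and reports the correct classification rate on subjects from the other region. The coefficients, the cholesterol-ratio cutoff and the subjects are all hypothetical; the abstract gives none of them.

```python
import math


def bmi(weight_kg, height_m):
    return weight_kg / height_m ** 2


def waist_to_hip(waist_cm, hip_cm):
    return waist_cm / hip_cm


def dyslipidemia_label(total_chol, hdl_chol, ratio_cutoff):
    """Binary outcome from the total/HDL cholesterol ratio (cutoff is a hypothetical input)."""
    return int(total_chol / hdl_chol >= ratio_cutoff)


def logistic_classify(whr, bmi_value, coef, cutoff=0.5):
    """1 if the predicted probability reaches `cutoff`.

    coef = (intercept, b_whr, b_bmi): hypothetical values, not the study's estimates.
    """
    z = coef[0] + coef[1] * whr + coef[2] * bmi_value
    p = 1.0 / (1.0 + math.exp(-z))
    return int(p >= cutoff)


def correct_classification_rate(predicted, observed):
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


# "Fitted" in region A (hypothetical), applied to subjects from region B.
coef_region_a = (-10.0, 5.0, 0.2)
region_b = [(0.95, 30.0), (0.80, 22.0), (0.85, 24.0)]  # (WHR, BMI)
observed_b = [1, 0, 1]
predicted_b = [logistic_classify(w, b, coef_region_a) for w, b in region_b]
print(correct_classification_rate(predicted_b, observed_b))
```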
Abstract:
An important statistical development of the last 30 years has been the advance in regression analysis provided by generalized linear models (GLMs) and generalized additive models (GAMs). Here we introduce a series of papers prepared within the framework of an international workshop entitled: Advances in GLMs/GAMs modeling: from species distribution to environmental management, held in Riederalp, Switzerland, 6-11 August 2001. We first discuss some general uses of statistical models in ecology and provide a short review of several key examples of the use of GLMs and GAMs in ecological modeling efforts. We next present an overview of GLMs and GAMs and discuss some of their related statistics used for predictor selection, model diagnostics, and evaluation. Included is a discussion of several new approaches applicable to GLMs and GAMs, such as ridge regression, an alternative to stepwise selection of predictors, and methods for identifying interactions through the combined use of regression trees and several other approaches. We close with an overview of the papers and of how we feel they advance our understanding of the application of these models to ecological modeling.
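As a compact illustration of how a GLM is fitted, the sketch below estimates a logistic GLM, logit(p) = a + b·x, by Newton-Raphson, which for the canonical logit link is the same as iteratively reweighted least squares (IRLS). The optional `ridge` argument adds a penalty ridge·b²/2 on the slope only, a minimal stand-in for the ridge regression mentioned above; the single-predictor setting and the toy data are assumptions for illustration.

```python
import math


def fit_logistic(x, y, ridge=0.0, tol=1e-10, max_iter=100):
    """Logistic GLM logit(p) = a + b*x fitted by Newton-Raphson (= IRLS for the logit link).

    `ridge` adds the penalty ridge * b**2 / 2 to the negative log-likelihood (slope only).
    """
    a = b = 0.0
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]          # IRLS working weights
        g0 = sum(yi - pi for yi, pi in zip(y, p))  # score for the intercept
        g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p)) - ridge * b
        h00 = sum(w)                               # information matrix entries
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x)) + ridge
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det           # step = information^-1 @ score
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


x = [0, 1, 2, 3, 4, 5]
y = [0, 0, 1, 0, 1, 1]  # toy presence/absence data with overlap (MLE exists)
a_ml, b_ml = fit_logistic(x, y)
a_r, b_r = fit_logistic(x, y, ridge=1.0)
```

The penalized slope `b_r` is shrunk toward zero relative to `b_ml`, which is the behavior that makes ridge penalties attractive when predictors are numerous or correlated.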
Abstract:
With the advancement of high-throughput sequencing and the dramatic increase in available genetic data, statistical modeling has become an essential part of the field of molecular evolution. Statistical modeling has led to many interesting discoveries in the field, from the detection of highly conserved or diverse regions in a genome to the phylogenetic inference of species' evolutionary history. Among the different types of genome sequences, protein-coding regions are particularly interesting because of their impact on proteins. The building blocks of proteins, i.e., amino acids, are coded by triplets of nucleotides, known as codons. Accordingly, studying the evolution of codons leads to a fundamental understanding of how proteins function and evolve. Current codon models can be classified into three principal groups: mechanistic codon models, empirical codon models and hybrid ones. The mechanistic models attract particular attention because of the clarity of their underlying biological assumptions and parameters. However, they rely on simplifying assumptions that are required to overcome the burden of computational complexity. The main assumptions of current mechanistic codon models are that (a) double and triple nucleotide substitutions within codons are negligible, (b) there is no mutation variation among the nucleotides of a single codon, and (c) the HKY nucleotide model is sufficient to capture the essence of transition/transversion rates at the nucleotide level. In this thesis, I develop a framework of mechanistic codon models, named the KCM-based model family framework, based on holding or relaxing these assumptions. Accordingly, eight different models are proposed from the eight combinations of holding or relaxing the assumptions, from the simplest one, which holds all the assumptions, to the most general one, which relaxes all of them.
The models derived from the proposed framework allow me to investigate the biological plausibility of the three simplifying assumptions on real data sets, as well as to find the model best aligned with the underlying characteristics of each data set.
Experiments show that holding all three assumptions is not realistic for any of the real data sets, which means that simple models relying on them can be misleading and yield inaccurate parameter estimates. A second objective of the thesis is to develop a generalized mechanistic codon model that relaxes all three simplifying assumptions while remaining computationally efficient, by using a matrix operation called the Kronecker product. On randomly selected data sets, the proposed generalized mechanistic codon model outperforms the other codon models according to the AICc metric in about half of the data sets. Several further experiments show that the proposed general model is biologically plausible.
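The thesis summary mentions that the generalized codon model stays computationally efficient by using the Kronecker product. The abstract does not give the construction, so the sketch below only illustrates the operation itself: the Kronecker product of three 4 × 4 nucleotide transition-probability matrices gives a 64 × 64 codon matrix whose entries are products of position-specific probabilities, so double and triple changes receive non-zero probability. The matrix values and the independence of codon positions are hypothetical assumptions, and a real codon model would also exclude the three stop codons.

```python
NUC = "TCAG"


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


def codon_index(codon):
    """Row/column of a codon in kron(P1, kron(P2, P3)) with nucleotide order NUC."""
    i, j, k = (NUC.index(ch) for ch in codon)
    return 16 * i + 4 * j + k


# Hypothetical nucleotide transition-probability matrix (rows sum to 1):
# 0.9 no change, 0.06 transition (T<->C, A<->G), 0.02 per transversion.
P = [
    [0.90, 0.06, 0.02, 0.02],  # from T
    [0.06, 0.90, 0.02, 0.02],  # from C
    [0.02, 0.02, 0.90, 0.06],  # from A
    [0.02, 0.02, 0.06, 0.90],  # from G
]
P_codon = kron(P, kron(P, P))  # 64 x 64, codon positions treated as independent

single = P_codon[codon_index("TTT")][codon_index("TCT")]  # 0.9 * 0.06 * 0.9
double = P_codon[codon_index("TTT")][codon_index("CCT")]  # 0.06 * 0.06 * 0.9
```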
Abstract:
This paper presents a statistical model for quantifying the weight of fingerprint evidence. In contrast to previous models (generative and score-based models), our model estimates the probability distributions of spatial relationships, directions and types of minutiae observed on fingerprints for any given fingermark. Our model relies on an AFIS algorithm provided by 3M Cogent and on a dataset of more than 4,000,000 fingerprints representing a sample from a relevant population of potential sources. The performance of our model was tested using several hundred minutiae configurations observed on a set of 565 fingermarks. In particular, the effects of various sub-populations of fingers (i.e., finger number, finger general pattern) on the expected evidential value of our test configurations were investigated. The performance of our model indicates that the spatial relationships between minutiae carry more evidential weight than their type or direction. Our results also indicate that the AFIS component of our model directly enables us to assign weight to fingerprint evidence without the additional layer of complex statistical modeling involved in estimating the probability distributions of fingerprint features. In fact, the AFIS component appears to be more sensitive to sub-population effects than the other components of the model. Overall, the data generated during this research project support the idea that fingerprint evidence is a valuable forensic tool for the identification of individuals.
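Weight of evidence in this setting is usually expressed as a likelihood ratio (LR): the probability of the observation if the mark and print share a source, divided by its probability if they come from different sources. The paper's model is feature-based; the sketch below instead shows a generic score-based LR on a single comparison score, with Gaussian densities and parameters that are purely hypothetical stand-ins for fitted distributions.

```python
import math


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def likelihood_ratio(score, same_mean, same_sd, diff_mean, diff_sd):
    """LR = f(score | same source) / f(score | different sources).

    Gaussian densities are hypothetical stand-ins for the fitted distributions.
    """
    return normal_pdf(score, same_mean, same_sd) / normal_pdf(score, diff_mean, diff_sd)


lr_mid = likelihood_ratio(50.0, 80.0, 10.0, 20.0, 10.0)   # equally likely under both: LR = 1
lr_high = likelihood_ratio(80.0, 80.0, 10.0, 20.0, 10.0)  # strongly supports same source
print(math.log10(lr_high))  # log10(LR), a common scale for reporting evidential weight
```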
Abstract:
Among the largest resources of biological sequence data are the expressed sequence tags (ESTs) available in public and proprietary databases. ESTs provide information on transcripts, but for technical reasons they often contain sequencing errors. Therefore, such errors must be taken into account when analyzing EST sequences computationally. Earlier attempts to model error-prone coding regions have shown good performance in detecting and predicting these regions while correcting sequencing errors using codon usage frequencies. In the research presented here, we improve the detection of translation start and stop sites by integrating a more complex mRNA model with codon-usage-bias-based error correction into one hidden Markov model (HMM), thus generalizing this error correction approach to more complex HMMs. We show that our method maintains the performance in detecting coding sequences.
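Decoding an HMM such as the one described means finding the most probable hidden-state path for an observed sequence, typically with the Viterbi algorithm. The sketch below is a generic log-space Viterbi decoder applied to a hypothetical two-state (noncoding/coding) model with invented composition-biased emissions; it does not reproduce the paper's mRNA model or its error-correction states.

```python
import math


def _log(p):
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for `obs` (log-space Viterbi)."""
    v = [{s: _log(start_p[s]) + _log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        v.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, v[t - 1][r] + _log(trans_p[r][s])) for r in states),
                key=lambda item: item[1],
            )
            v[t][s] = score + _log(emit_p[s][obs[t]])
            back[t][s] = prev
    last = max(states, key=lambda s: v[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):  # follow back-pointers to the start
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical model: noncoding is AT-rich, coding is GC-rich.
states = ("noncoding", "coding")
start = {"noncoding": 0.5, "coding": 0.5}
trans = {"noncoding": {"noncoding": 0.9, "coding": 0.1},
         "coding": {"noncoding": 0.1, "coding": 0.9}}
emit = {"noncoding": {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},
        "coding": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4}}
path = viterbi("ATAGCG", states, start, trans, emit)
```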
Abstract:
1. Species distribution modelling is used increasingly in both applied and theoretical research to predict how species are distributed and to understand attributes of species' environmental requirements. In species distribution modelling, various statistical methods are used that combine species occurrence data with environmental spatial data layers to predict the suitability of any site for that species. While the number of data sharing initiatives involving species' occurrences in the scientific community has increased dramatically over the past few years, various data quality and methodological concerns related to using these data for species distribution modelling have not been addressed adequately. 2. We evaluated how uncertainty in georeferences and the associated locational error in occurrences influence species distribution modelling using two treatments: (1) a control treatment where models were calibrated with original, accurate data and (2) an error treatment where data were first degraded spatially to simulate locational error. To incorporate error into the coordinates, we shifted each coordinate by a random number drawn from a normal distribution with a mean of zero and a standard deviation of 5 km. We evaluated the influence of error on the performance of 10 commonly used distributional modelling techniques applied to 40 species in four distinct geographical regions. 3. Locational error in occurrences reduced model performance in three of these regions; nevertheless, relatively accurate predictions of species distributions were possible for most species, even with degraded occurrences. Two species distribution modelling techniques, boosted regression trees and maximum entropy, were the best performing models in the face of locational errors. The results obtained with boosted regression trees were only slightly degraded by errors in location, and the results obtained with the maximum entropy approach were not affected by such errors. 4. Synthesis and applications.
To use the vast array of occurrence data that exists currently for research and management relating to the geographical ranges of species, modellers need to know the influence of locational error on model quality and whether some modelling techniques are particularly robust to error. We show that certain modelling techniques are particularly robust to a moderate level of locational error and that useful predictions of species distributions can be made even when occurrence data include some error.
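The error treatment described in point 2 can be sketched as follows, assuming occurrence coordinates are in a projected system measured in metres (the abstract does not state the projection). Each x and y value is shifted independently by normal noise with mean zero and standard deviation 5 km; the seed is only there to make runs reproducible.

```python
import random


def degrade_coordinates(points, sd_m=5000.0, seed=None):
    """Shift x and y of each point independently by N(0, sd_m) noise.

    Assumes projected coordinates in metres; sd_m = 5000 matches the 5 km SD.
    """
    rng = random.Random(seed)
    return [(x + rng.gauss(0.0, sd_m), y + rng.gauss(0.0, sd_m)) for x, y in points]


occurrences = [(2600000.0, 1200000.0), (2601500.0, 1198000.0)]  # hypothetical points
degraded = degrade_coordinates(occurrences, seed=42)
```

The control treatment would simply use `occurrences` unchanged, so both model sets differ only in the simulated locational error.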
Abstract:
The dynamical analysis of large biological regulatory networks requires the development of scalable methods for mathematical modeling. Following the approach initially introduced by Thomas, we formalize the interactions between the components of a network in terms of discrete variables, functions, and parameters. Model simulations result in directed graphs, called state transition graphs. We are particularly interested in reachability properties and asymptotic behaviors, which correspond to terminal strongly connected components (or "attractors") in the state transition graph. A well-known problem is the exponential increase in the size of state transition graphs with the number of network components, in particular when using the biologically realistic asynchronous updating assumption. To address this problem, we have developed several complementary methods enabling the analysis of the behavior of large and complex logical models: (i) the definition of transition priority classes to simplify the dynamics; (ii) a model reduction method preserving essential dynamical properties; and (iii) a novel algorithm to compact state transition graphs and directly generate compressed representations, emphasizing relevant transient and asymptotic dynamical properties. The power of an approach combining these different methods is demonstrated by applying them to a recent multilevel logical model for the network controlling the CD4+ T helper cell response to antigen presentation and to a dozen cytokines. This model accounts for the differentiation of canonical Th1 and Th2 lymphocytes, as well as of inflammatory Th17 and regulatory T cells, along with many hybrid subtypes. All these methods have been implemented in the software GINsim, which enables the definition, analysis, and simulation of logical regulatory graphs.
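The link between asynchronous updating, state transition graphs and attractors can be shown on a tiny network. The sketch below is restricted to Boolean variables (the paper's model is multilevel) and enumerates the full state space, which is exactly what becomes infeasible for large networks; it does not use GINsim or reproduce its algorithms. A state lies in a terminal strongly connected component when every state reachable from it can reach it back.

```python
from itertools import product


def async_successors(state, rules):
    """Asynchronous update: change one variable at a time toward its target value."""
    succ = []
    for i, rule in enumerate(rules):
        target = rule(state)
        if target != state[i]:
            nxt = list(state)
            nxt[i] = target
            succ.append(tuple(nxt))
    return succ


def state_transition_graph(rules):
    return {s: async_successors(s, rules) for s in product((0, 1), repeat=len(rules))}


def reachable(graph, start):
    seen, stack = {start}, [start]
    while stack:
        for nxt in graph[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def attractors(graph):
    """Terminal SCCs, found by checking that s is reachable from everything it reaches."""
    reach = {s: reachable(graph, s) for s in graph}
    return {frozenset(r) for s, r in reach.items() if all(s in reach[t] for t in r)}


# Mutual inhibition (toggle switch): two stable states.
toggle = [lambda s: 1 - s[1], lambda s: 1 - s[0]]
# Negative feedback loop: one cyclic attractor through all four states.
oscillator = [lambda s: s[1], lambda s: 1 - s[0]]
```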
Multimodel inference and multimodel averaging in empirical modeling of occupational exposure levels.
Abstract:
Empirical modeling of exposure levels has been popular for identifying exposure determinants in occupational hygiene. Traditional data-driven methods for choosing the model on which to base inferences have typically not accounted for the uncertainty linked to the process of selecting the final model. Several new approaches propose making statistical inferences from a set of plausible models rather than from a single model regarded as 'best'. This paper introduces the multimodel averaging approach described in the monograph by Burnham and Anderson. In their approach, a set of plausible models is defined a priori, taking into account the sample size and previous knowledge of variables influencing exposure levels. The Akaike information criterion is then calculated to evaluate the relative support of the data for each model, expressed as an Akaike weight, to be interpreted as the probability of the model being the best approximating model given the model set. The model weights can then be used to rank models, quantify the evidence favoring one over another, perform multimodel prediction, estimate the relative influence of the potential predictors and estimate multimodel-averaged effects of determinants. The whole approach is illustrated with the analysis of a data set of 1500 volatile organic compound exposure levels collected by the Institute for Work and Health (Lausanne, Switzerland) over 20 years, each concentration having been divided by the relevant Swiss occupational exposure limit and log-transformed before analysis. Multimodel inference is a promising procedure for modeling exposure levels: it incorporates the notion that several models can be supported by the data and makes it possible to evaluate, to some extent, model selection uncertainty, which is seldom mentioned in current practice.
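The Akaike-weight calculations described above can be sketched in a few lines. With Δᵢ = AICᵢ − min AIC, the weight of model i is exp(−Δᵢ/2) divided by the sum over all models. The AIC values and coefficients below are hypothetical, the small-sample AICc helper is included only as an option (the abstract does not say which variant was used), and the averaged coefficient uses the convention of entering 0 for models that exclude the predictor, one of the two conventions Burnham and Anderson discuss.

```python
import math


def aicc(aic, k, n):
    """Small-sample corrected AIC for k parameters and n observations (needs n > k + 1)."""
    return aic + 2.0 * k * (k + 1) / (n - k - 1)


def akaike_weights(aics):
    best = min(aics)
    rel = [math.exp(-0.5 * (a - best)) for a in aics]  # relative likelihoods exp(-delta/2)
    total = sum(rel)
    return [r / total for r in rel]


def model_averaged(weights, estimates):
    """Weighted average of one coefficient; use 0.0 where a model excludes the predictor."""
    return sum(w * e for w, e in zip(weights, estimates))


def relative_importance(weights, includes):
    """Sum of the Akaike weights of the models that contain the predictor."""
    return sum(w for w, inc in zip(weights, includes) if inc)


weights = akaike_weights([100.0, 102.0, 110.0])  # hypothetical model set
beta_avg = model_averaged(weights, [0.5, 0.3, 0.0])
importance = relative_importance(weights, [True, False, True])
```

A difference of 2 AIC units, for example, makes the better model e ≈ 2.72 times as likely as the other, which is how the weights quantify the evidence favoring one model over another.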
Abstract:
Hidden Markov models (HMMs) are probabilistic models that are well adapted to many tasks in bioinformatics, for example predicting the occurrence of specific motifs in biological sequences. MAMOT is a command-line program for Unix-like operating systems, including MacOS X, that we developed to allow scientists to apply HMMs more easily in their research. One can define the architecture and initial parameters of the model in a text file and then use MAMOT for parameter optimization on example data, decoding (e.g., predicting motif occurrences in sequences) and the production of stochastic sequences generated according to the probabilistic model. Two examples for which models are provided are coiled-coil domains in protein sequences and protein binding sites in DNA. Useful features include the use of pseudocounts, state tying and fixing of selected parameters in learning, and the inclusion of prior probabilities in decoding. AVAILABILITY: MAMOT is implemented in C++ and is distributed under the GNU General Public Licence (GPL). The software, documentation, and example model files can be found at http://bcf.isb-sib.ch/mamot
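Pseudocounts keep symbols that happen not to appear in the training examples from receiving zero probability. The sketch below shows the idea for one state's emission distribution estimated from example sequences assigned to that state; it does not reproduce MAMOT's model-file format or options. With unlabelled training data, the same formula would be applied to expected counts from Baum-Welch rather than observed counts.

```python
from collections import Counter


def emission_probs(sequences, alphabet, pseudocount=1.0):
    """Emission distribution of one state from example sequences, with a pseudocount
    added to every symbol so that unseen symbols keep a non-zero probability."""
    counts = Counter(ch for seq in sequences for ch in seq if ch in alphabet)
    total = sum(counts.values()) + pseudocount * len(alphabet)
    return {a: (counts[a] + pseudocount) / total for a in alphabet}


probs = emission_probs(["ACGA", "AAC"], "ACGT")  # hypothetical training examples
print(probs["T"])  # 1/11: T never occurs but is not assigned zero probability
```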