988 results for statistical framework
Abstract:
Coalescent theory represents the most significant progress in theoretical population genetics in the past three decades. It states that all genes or alleles in a given population are ultimately inherited from a single ancestor shared by all members of the population, known as the most recent common ancestor. It is now widely recognized as a cornerstone for rigorous statistical analyses of molecular data from populations [1]. Scientists have developed a large number of coalescent models and methods [2,3,4,5,6], which are applied not only in coalescent analysis but also in today's population genetics and genome studies, and even in public health. This thesis aims to complete a computer-based statistical framework for coalescent analysis. The framework provides a large number of coalescent models and statistical methods to assist students and researchers in coalescent analysis, and presents results in various formats such as text, graphics and printed pages. In particular, it also supports the creation of new coalescent models and statistical methods.
Abstract:
This thesis addresses on-road vehicle detection and tracking with a monocular vision system. This problem has attracted the attention of the automotive industry and the research community, as it is the first step towards driver assistance and collision avoidance systems and, eventually, autonomous driving. Although much effort has been devoted to it in recent years, no fully satisfactory solution has yet been devised, and it therefore remains an open research topic. The main challenges for vision-based vehicle detection and tracking are the high variability among vehicles, the dynamically changing background due to camera motion, and the real-time processing requirement. In this thesis, a unified approach using statistical methods is presented for vehicle detection and tracking that tackles these issues. The approach is divided into three primary tasks, i.e., vehicle hypothesis generation, hypothesis verification, and vehicle tracking, which are performed sequentially. Nevertheless, the exchange of information between processing blocks is fostered so that the maximum degree of adaptation to changes in the environment can be achieved and the computational cost is reduced. Two complementary strategies are proposed to address the first task, i.e., hypothesis generation, based respectively on appearance and geometry analysis. To this end, the use of a rectified domain in which the perspective is removed from the original image is especially interesting, as it allows for fast image scanning and efficient generation of vehicle location hypotheses. The final vehicle candidates are produced using a collaborative framework between the original and the rectified domains. A supervised classification strategy is adopted for the verification of the hypothesized vehicle locations. In particular, state-of-the-art methods for feature extraction are evaluated and new descriptors are proposed by exploiting knowledge of vehicle appearance. Due to the lack of appropriate public databases, a new database is generated and the classification performance of the descriptors is extensively tested on it. Finally, a methodology for the fusion of the different classifiers is presented and the best combinations are discussed. The core of the proposed approach is a Bayesian tracking framework using particle filters. Contributions are made on its three key elements: the inference algorithm, the dynamic model and the observation model. In particular, the use of a Markov chain Monte Carlo (MCMC) method is proposed for sampling, which circumvents the exponential increase in complexity of traditional particle filters and thus makes joint tracking of multiple vehicles computationally affordable. In addition, the aforementioned rectified domain allows for the definition of a constant-velocity dynamic model, since it preserves the smooth motion of vehicles on highways. Finally, a multiple-cue observation model is proposed that not only accounts for vehicle appearance but also integrates the information available from the analysis in the previous blocks. The proposed approach is shown to run in near real time on a general-purpose PC and to deliver outstanding results compared to traditional methods.
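The constant-velocity dynamic model mentioned in this abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the state layout `(x, y, vx, vy)`, the function name `propagate`, and the noise parameters `sigma_pos` and `sigma_vel` are assumptions made for illustration, and the sketch shows only the prediction step of a generic particle filter, not the MCMC sampling scheme the thesis proposes.

```python
import random


def propagate(particles, dt, sigma_pos, sigma_vel, rng):
    """Advance (x, y, vx, vy) particles one step under a constant-velocity model.

    Position moves by velocity * dt; Gaussian noise on position and velocity
    lets the particle set absorb small deviations from perfectly smooth motion.
    All parameter names are hypothetical.
    """
    moved = []
    for x, y, vx, vy in particles:
        moved.append((
            x + vx * dt + rng.gauss(0.0, sigma_pos),
            y + vy * dt + rng.gauss(0.0, sigma_pos),
            vx + rng.gauss(0.0, sigma_vel),
            vy + rng.gauss(0.0, sigma_vel),
        ))
    return moved


if __name__ == "__main__":
    rng = random.Random(0)
    start = [(0.0, 0.0, 1.0, 2.0)] * 3
    for particle in propagate(start, dt=0.5, sigma_pos=0.05, sigma_vel=0.01, rng=rng):
        print(particle)
```

With both noise terms set to zero the function reduces to the deterministic update x' = x + vx * dt, which is the sense in which the motion is "constant velocity".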
Abstract:
We present an approach for assessing the significance of sequence and structure comparisons by using nearly identical statistical formalisms for both sequence and structure. Doing so involves an all-vs.-all comparison of protein domains [taken here from the Structural Classification of Proteins (SCOP) database] and then fitting a simple distribution function to the observed scores. By using this distribution, we can attach a statistical significance to each comparison score in the form of a P value, the probability that a better score would occur by chance. As expected, we find that the scores for sequence matching follow an extreme-value distribution. Moreover, the agreement between the P values that we derive from this distribution and those reported by standard programs (e.g., BLAST and FASTA) validates our approach. Structure comparison scores also follow an extreme-value distribution when the statistics are expressed in terms of a structural alignment score (essentially the sum of reciprocated distances between aligned atoms minus gap penalties). We find that the traditional metric of structural similarity, the rms deviation in atom positions after fitting aligned atoms, follows a different distribution of scores and does not perform as well as the structural alignment score. Comparison of the sequence and structure statistics for pairs of proteins known to be distantly related shows that structural comparison is able to detect approximately twice as many distant relationships as sequence comparison at the same error rate. The comparison also indicates that there are very few pairs with significant similarity in terms of sequence but not structure, whereas many pairs have significant similarity in terms of structure but not sequence.
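As a rough illustration of how a P value follows from an extreme-value fit, the sketch below evaluates the upper tail of a Gumbel (type I extreme-value) distribution. The location `mu` and scale `beta` stand in for fitted parameters; the abstract does not give their values or the exact parameterisation used, so both the names and the numbers in the usage example are assumptions.

```python
import math


def evd_p_value(score, mu, beta):
    """Probability of a score at least this large under a Gumbel distribution.

    The Gumbel CDF is exp(-exp(-z)) with z = (score - mu) / beta, so the
    upper tail is 1 - exp(-exp(-z)). expm1 keeps precision when the tail
    probability is very small. mu and beta are hypothetical fitted values.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    z = (score - mu) / beta
    return -math.expm1(-math.exp(-z))


if __name__ == "__main__":
    # Illustrative parameters only, not values from the paper.
    for s in (20.0, 30.0, 40.0, 60.0):
        print(s, evd_p_value(s, mu=25.0, beta=5.0))
```

The P value decreases monotonically with the score, and for large scores it approaches exp(-z), which is why extreme-value tails are often reported on a log scale.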
Abstract:
Fourier-phase information is important in determining the appearance of natural scenes, but the structure of natural-image phase spectra is highly complex and difficult to relate directly to human perceptual processes. This problem is addressed by extending previous investigations of human visual sensitivity to the randomisation and quantisation of Fourier phase in natural images. The salience of the image changes induced by these physical processes is shown to depend critically on the nature of the original phase spectrum of each image, and the processes of randomisation and quantisation are shown to be perceptually equivalent provided that they shift image phase components by the same average amount. These results are explained by assuming that the visual system is sensitive to those phase-domain image changes which also alter certain global higher-order image statistics. This assumption may be used to place constraints on the likely nature of cortical processing: mechanisms which correlate the outputs of a bank of relative-phase-sensitive units are found to be consistent with the patterns of sensitivity reported here.
Abstract:
Mountain regions worldwide are particularly sensitive to ongoing climate change. In the Swiss Alps specifically, the temperature has increased twice as fast as in the rest of the Northern Hemisphere. Water temperature closely follows the annual air temperature cycle, severely impacting streams and freshwater ecosystems. In the last 20 years, brown trout (Salmo trutta L.) catch has declined by approximately 40-50% in many rivers in Switzerland. Increasing water temperature has been suggested as one of the most likely causes of this decline. Temperature has a direct effect on trout population dynamics through developmental and disease control but can also indirectly impact dynamics via food-web interactions such as resource availability. We developed a spatially explicit modelling framework that allows spatial and temporal projections of trout biomass using the Aare river catchment as a model system, in order to assess the spatial and seasonal patterns of trout biomass variation. Given that biomass varies seasonally depending on trout life history stage, we developed seasonal biomass variation models for three periods of the year (Autumn-Winter, Spring and Summer). Because stream water temperature is a critical parameter for brown trout development, we first calibrated a model to predict water temperature as a function of air temperature so that climate change scenarios could subsequently be applied. We then built a model of trout biomass variation by linking water temperature to trout biomass measurements collected by electro-fishing at 21 stations from 2009 to 2011. The different modelling components of our framework had good overall predictive ability, and we could show a seasonal effect of water temperature on trout biomass variation.
Our statistical framework uses a minimum set of input variables, which makes it easily transferable to other study areas or fish species, but it could be improved by including effects of the biotic environment and the evolution of demographic parameters over time. Nevertheless, our framework remains informative for highlighting spatially where potential changes in water temperature could affect trout biomass. (C) 2015 Elsevier B.V. All rights reserved.
Abstract:
Natural protein sequences are the net result of the interplay between mutation, natural selection and stochastic drift over evolutionary time. Probabilistic models of molecular evolution that account for these factors have been substantially improved in recent years. In particular, models that explicitly incorporate protein structure and interdependencies between sites have been proposed, along with statistical tools for assessing their performance. However, despite significant advances in this direction, only very simplified representations of protein structure have been used so far. In this context, the general subject of this thesis is the modelling of the three-dimensional structure of proteins, taking into account the practical limitations imposed by the use of phylogenetic methods that are very demanding in computing time. First, a general statistical method is presented for optimizing the parameters of a statistical potential (a pseudo-energy measuring sequence-structure compatibility). The functional form of the potential is then refined by increasing the level of detail in the structural description without increasing computational cost. Several structural elements are explored: pairwise interactions between residues, solvent accessibility, main-chain conformation and flexibility. The potentials are then included in an evolutionary model, and their performance is evaluated in terms of statistical fit to real data and contrasted with standard evolutionary models. Finally, the resulting structurally constrained model is used to better understand the relationships between gene expression level and the selection and conservation of the corresponding protein sequence.
Abstract:
Modeling and predicting co-occurrences of events is a fundamental problem of unsupervised learning. In this contribution we develop a statistical framework for analyzing co-occurrence data in a general setting where elementary observations are joint occurrences of pairs of abstract objects from two finite sets. The main challenge for statistical models in this context is to overcome the inherent data sparseness and to estimate the probabilities for pairs which were rarely observed or even unobserved in a given sample set. Moreover, it is often of considerable interest to extract grouping structure or to find a hierarchical data organization. A novel family of mixture models is proposed which explain the observed data by a finite number of shared aspects or clusters. This provides a common framework for statistical inference and structure discovery and also includes several recently proposed models as special cases. Adopting the maximum likelihood principle, EM algorithms are derived to fit the model parameters. We develop improved versions of EM which largely avoid overfitting problems and overcome the inherent locality of EM-based optimization. Among the broad variety of possible applications, e.g., in information retrieval, natural language processing, data mining, and computer vision, we have chosen document retrieval, the statistical analysis of noun/adjective co-occurrence and the unsupervised segmentation of textured images to test and evaluate the proposed algorithms.
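To make the idea of fitting a co-occurrence mixture by EM concrete, here is a minimal sketch. It assumes one simple member of such a model family, in which the joint probability of a pair is P(x, y) = sum over k of P(k) P(x|k) P(y|k); the abstract does not spell out the model form, so this factorisation, the function names and the toy count matrix are all assumptions. The sketch is plain maximum-likelihood EM and does not include the improved EM variants the abstract refers to.

```python
import math
import random


def normalize(values):
    """Scale non-negative values so they sum to one."""
    total = sum(values)
    return [v / total for v in values]


def fit_aspect_model(counts, n_aspects, n_iter=50, seed=0):
    """Fit P(x, y) = sum_k P(k) P(x|k) P(y|k) to a count matrix by plain EM.

    Returns (p_k, p_x_given_k, p_y_given_k, history), where history holds the
    log-likelihood evaluated at the start of each iteration.
    """
    rng = random.Random(seed)
    n_x, n_y = len(counts), len(counts[0])
    p_k = [1.0 / n_aspects] * n_aspects
    p_x = [normalize([rng.random() + 0.1 for _ in range(n_x)]) for _ in range(n_aspects)]
    p_y = [normalize([rng.random() + 0.1 for _ in range(n_y)]) for _ in range(n_aspects)]
    history = []
    for _ in range(n_iter):
        new_k = [0.0] * n_aspects
        new_x = [[0.0] * n_x for _ in range(n_aspects)]
        new_y = [[0.0] * n_y for _ in range(n_aspects)]
        loglik = 0.0
        for x in range(n_x):
            for y in range(n_y):
                n = counts[x][y]
                if n == 0:
                    continue
                # E-step: posterior responsibility of each aspect for pair (x, y).
                joint = [p_k[k] * p_x[k][x] * p_y[k][y] for k in range(n_aspects)]
                total = sum(joint)
                loglik += n * math.log(total)
                for k in range(n_aspects):
                    r = n * joint[k] / total
                    new_k[k] += r
                    new_x[k][x] += r
                    new_y[k][y] += r
        history.append(loglik)
        # M-step: re-estimate parameters from the expected counts.
        p_k = normalize(new_k)
        p_x = [normalize(row) for row in new_x]
        p_y = [normalize(row) for row in new_y]
    return p_k, p_x, p_y, history


if __name__ == "__main__":
    toy = [[5, 4, 0, 0], [6, 5, 0, 0], [0, 0, 7, 3], [0, 0, 4, 6]]
    p_k, p_x, p_y, history = fit_aspect_model(toy, n_aspects=2)
    print("aspect priors:", p_k)
    print("log-likelihood, first and last:", history[0], history[-1])
```

Because unobserved pairs still receive probability through the shared aspects, a model of this kind can assign non-zero estimates to pairs that never occurred in the sample, which is the sparseness problem the abstract describes.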
Abstract:
Statistical modelling and statistical learning theory are two powerful analytical frameworks for analyzing signals and developing efficient processing and classification algorithms. In this thesis, these frameworks are applied to the modelling and processing of biomedical signals in two different contexts: ultrasound medical imaging systems, and the analysis and modelling of primate neural activity. In the context of ultrasound medical imaging, two main applications are explored: deconvolution of signals measured from an ultrasonic transducer, and automatic image segmentation and classification of prostate ultrasound scans. In the former application, a stochastic model of the radio frequency signal measured from an ultrasonic transducer is derived. This model is then employed to develop, within a statistical framework, a regularized deconvolution procedure for enhancing signal resolution. In the latter application, different statistical models are used to characterize images of prostate tissues, extracting different features. These features are then used to segment the images into regions of interest by means of an automatic procedure based on a statistical model of the extracted features. Finally, machine learning techniques are used for automatic classification of the different regions of interest. In the context of neural activity signals, an example of a bio-inspired dynamical network was developed to help in studies of motor-related processes in the brain of primate monkeys. The presented model aims to mimic the abstract functionality of a cell population in the 7a parietal region of primate monkeys during the execution of learned behavioural tasks.
Abstract:
The ocean and its resources are increasingly seen as indispensable in addressing the multiple challenges the planet is facing in the decades to come. It has never been easy to quantify this particular sector of the economy in any country, given the lack of a detailed, centralized database with adequate specifics covering the necessary sectors. This article aims to compare the existing ocean economy statistical systems, especially those of Asia-Pacific, American and European countries, in order to overcome the deficiencies arising from the diversity of definitions and statistical representations of ocean sectors, establish a standard statistical system and compile data for the global ocean economy.
Abstract:
Simultaneous acquisition of electroencephalography (EEG) and functional magnetic resonance imaging (fMRI) aims to disentangle the description of brain processes by exploiting the advantages of each technique. Most studies in this field focus on exploring the relationships between fMRI signals and the power spectrum at some specific frequency bands (alpha, beta, etc.). On the other hand, brain mapping of EEG signals (e.g., interictal spikes in epileptic patients) usually assumes a haemodynamic response function for a parametric analysis applying the GLM, as a rough approximation. The integration of the information provided by the high spatial resolution of MR images and the high temporal resolution of EEG may be improved by relating them through transfer functions, which allows the identification of neurally driven areas without strong assumptions about haemodynamic response shapes or the homogeneity of brain haemodynamics. The difference in sampling rate is the first obstacle to a full integration of EEG and fMRI information. Moreover, a parametric specification of a function representing the commonalities of both signals has not been established. In this study, we introduce a new data-driven method for estimating the transfer function from EEG signal to fMRI signal at the EEG sampling rate. This approach avoids EEG subsampling to fMRI time resolution and naturally provides a test for EEG predictive power over BOLD signal fluctuations, in a well-established statistical framework. We illustrate this concept in resting state (eyes closed) and visual simultaneous fMRI-EEG experiments. The results indicate that it is possible to predict the BOLD fluctuations in occipital cortex by using EEG measurements. (C) 2010 Elsevier Inc. All rights reserved.
Abstract:
This paper introduces a new unsupervised hyperspectral unmixing method conceived for linear but highly mixed hyperspectral data sets, in which the simplex of minimum volume, usually estimated by the purely geometrically based algorithms, is far away from the true simplex associated with the endmembers. The proposed method, an extension of our previous studies, resorts to the statistical framework. The abundance fraction prior is a mixture of Dirichlet densities, thus automatically enforcing the constraints on the abundance fractions imposed by the acquisition process, namely, nonnegativity and sum-to-one. A cyclic minimization algorithm is developed where the following are observed: 1) The number of Dirichlet modes is inferred based on the minimum description length principle; 2) a generalized expectation maximization algorithm is derived to infer the model parameters; and 3) a sequence of augmented Lagrangian-based optimizations is used to compute the signatures of the endmembers. Experiments on simulated and real data are presented to show the effectiveness of the proposed algorithm in unmixing problems beyond the reach of the geometrically based state-of-the-art competitors.
Abstract:
Neutrality tests in quantitative genetics provide a statistical framework for the detection of selection on polygenic traits in wild populations. However, the existing method based on comparisons of divergence at neutral markers and quantitative traits (Q(st)-F(st)) suffers from several limitations that hinder a clear interpretation of the results with typical empirical designs. In this article, we propose a multivariate extension of this neutrality test based on empirical estimates of the among-populations (D) and within-populations (G) covariance matrices by MANOVA. A simple pattern is expected under neutrality: D = 2F(st)/(1 - F(st))G, so that neutrality implies both proportionality of the two matrices and a specific value of the proportionality coefficient. This pattern is tested using Flury's framework for matrix comparison [common principal-component (CPC) analysis], a well-known tool in G matrix evolution studies. We show the importance of using a Bartlett adjustment of the test for the small sample sizes typically found in empirical studies. We propose a dual test: (i) that the proportionality coefficient is not different from its neutral expectation [2F(st)/(1 - F(st))] and (ii) that the MANOVA estimates of mean square matrices between and among populations are proportional. These two tests combined provide a more stringent test for neutrality than the classic Q(st)-F(st) comparison and avoid several statistical problems. Extensive simulations of realistic empirical designs suggest that these tests correctly detect the expected pattern under neutrality and have enough power to efficiently detect mild to strong selection (homogeneous, heterogeneous, or mixed) when it is occurring on a set of traits. This method also provides a rigorous and quantitative framework for disentangling the effects of different selection regimes and of drift on the evolution of the G matrix. 
We discuss practical requirements for the proper application of our test in empirical studies and potential extensions.
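The neutral expectation quoted in this abstract, D = 2F(st)/(1 - F(st))G, can be worked through numerically. The sketch below computes the proportionality coefficient and the D matrix expected under neutrality for a given G; the function names and the example G matrix are illustrative assumptions, and the sketch does not implement the CPC-based test or the Bartlett adjustment.

```python
def neutral_coefficient(fst):
    """Proportionality coefficient 2*Fst / (1 - Fst) expected under neutrality."""
    if not 0.0 <= fst < 1.0:
        raise ValueError("fst must lie in [0, 1)")
    return 2.0 * fst / (1.0 - fst)


def expected_d(g_matrix, fst):
    """Among-population covariance matrix D implied by G under neutrality."""
    c = neutral_coefficient(fst)
    return [[c * g for g in row] for row in g_matrix]


if __name__ == "__main__":
    # Hypothetical two-trait G matrix, for illustration only.
    g = [[1.0, 0.3], [0.3, 2.0]]
    print(neutral_coefficient(0.2))
    print(expected_d(g, 0.2))
```

For example, F(st) = 0.2 gives a coefficient of 0.4 / 0.8 = 0.5, so under neutrality every element of D is expected to be half the corresponding element of G.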
Abstract:
One of the standard tools used to understand the processes shaping trait evolution along the branches of a phylogenetic tree is the reconstruction of ancestral states (Pagel 1999). The purpose is to estimate the values of the trait of interest for every internal node of a phylogenetic tree based on the trait values of the extant species, a topology and, depending on the method used, branch lengths and a model of trait evolution (Ronquist 2004). This approach has been used in a variety of contexts such as biogeography (e.g., Nepokroeff et al. 2003, Blackburn 2008), ecological niche evolution (e.g., Smith and Beaulieu 2009, Evans et al. 2009) and metabolic pathway evolution (e.g., Gabaldón 2003, Christin et al. 2008). Investigations of the factors affecting the accuracy with which ancestral character states can be reconstructed have focused in particular on the choice of statistical framework (Ekman et al. 2008) and the selection of the best model of evolution (Cunningham et al. 1998, Mooers et al. 1999). However, other potential biases affecting these methods, such as the effect of tree shape (Mooers 2004), taxon sampling (Salisbury and Kim 2001) and the reconstruction of traits involved in species diversification (Goldberg and Igić 2008), have also received specific attention. Most of these studies conclude that ancestral character state reconstruction is still not perfect, and that further developments are necessary to improve its accuracy (e.g., Christin et al. 2010). Here, we examine how different estimations of branch lengths affect the accuracy of ancestral character state reconstruction. In particular, we tested the effect of using time-calibrated versus molecular branch lengths and provide guidelines for selecting the most appropriate branch lengths to reconstruct the ancestral state of a trait.
Abstract:
1. Aim - Concerns over how global change will influence species distributions, in conjunction with increased emphasis on understanding niche dynamics in evolutionary and community contexts, highlight the growing need for robust methods to quantify niche differences between or within taxa. We propose a statistical framework to describe and compare environmental niches from occurrence and spatial environmental data.
2. Location - Europe, North America, South America
3. Methods - The framework applies kernel smoothers to densities of species occurrence in gridded environmental space to calculate metrics of niche overlap and test hypotheses regarding niche conservatism. We use this framework and simulated species with predefined distributions and amounts of niche overlap to evaluate several ordination and species distribution modeling techniques for quantifying niche overlap. We illustrate the approach with data on two well-studied invasive species.
4. Results - We show that niche overlap can be accurately detected with the framework when variables driving the distributions are known. The method is robust to known and previously undocumented biases related to the dependence of species occurrences on the frequency of environmental conditions that occur across geographic space. The use of a kernel smoother makes the process of moving from geographical space to multivariate environmental space independent of both sampling effort and arbitrary choice of resolution in environmental space. However, the use of ordination and species distribution model techniques for selecting, combining and weighting variables on which niche overlap is calculated provides contrasting results.
5. Main conclusions - The framework meets the increasing need for robust methods to quantify niche differences. It is appropriate to study niche differences between species, subspecies or intraspecific lineages that differ in their geographical distributions. Alternatively, it can be used to measure the degree to which the environmental niche of a species or intraspecific lineage has changed over time.
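The Methods step of smoothing occurrences onto an environmental grid and then computing an overlap metric can be sketched as follows. This is a simplified illustration under several assumptions: a single environmental axis, a Gaussian kernel with a hand-picked bandwidth, and Schoener's D as the overlap metric (the abstract does not name the metric). It also omits any correction for how frequently each environmental condition occurs in geographic space.

```python
import math


def kernel_density(occurrences, grid, bandwidth):
    """Gaussian-kernel occupancy on a 1-D environmental grid, scaled to sum to one."""
    density = [
        sum(math.exp(-0.5 * ((g - o) / bandwidth) ** 2) for o in occurrences)
        for g in grid
    ]
    total = sum(density)
    return [d / total for d in density]


def schoener_d(p1, p2):
    """Schoener's D: 1 for identical occupancy profiles, 0 for no overlap."""
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(p1, p2))


if __name__ == "__main__":
    # Hypothetical occurrence values along one environmental gradient.
    grid = [i * 0.1 for i in range(101)]
    species_a = [2.0, 2.5, 3.0, 3.5]
    species_b = [3.0, 3.5, 4.0, 4.5]
    pa = kernel_density(species_a, grid, bandwidth=0.5)
    pb = kernel_density(species_b, grid, bandwidth=0.5)
    print("niche overlap D:", schoener_d(pa, pb))
```

Because both profiles are normalised on the same grid before comparison, the overlap value depends on the shape of the smoothed occupancy rather than on the number of occurrence records for each species.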
Abstract:
It is well appreciated that in many organisms the process of ageing is highly plastic, so that lifespan increases or decreases when the environment changes. Perhaps the best-known example of such lifespan plasticity is dietary restriction (DR), a phenomenon whereby reduced food intake without malnutrition extends lifespan (typically at the expense of reduced fecundity) and which has been documented in numerous species, from invertebrates to mammals. For the evolutionary biologist, DR and other cases of lifespan plasticity are examples of a more general phenomenon called phenotypic plasticity, the ability of a single genotype to produce different phenotypes (e.g. lifespan) in response to changes in the environment (e.g. changes in diet). To analyse phenotypic plasticity, evolutionary biologists (and epidemiologists) often use a conceptual and statistical framework based on reaction norms (genotype-specific response curves) and genotype × environment interactions (G × E; differences in the plastic response among genotypes), concepts with which biologists working on molecular aspects of ageing are usually not familiar. Here I briefly discuss what has been learned about lifespan plasticity or, more generally, about plasticity of somatic maintenance and survival ability. In particular, I argue that adopting the conceptual framework of reaction norms and G × E interactions, as used by evolutionary biologists, is crucially important for our understanding of the mechanisms underlying DR and other forms of lifespan or survival plasticity.