924 results for Automatic Analysis of Multivariate Categorical Data Sets
Abstract:
It is thought that speciation in phytophagous insects is often due to colonization of novel host plants, because radiations of plant and insect lineages are typically asynchronous. Recent phylogenetic comparisons have supported this model of diversification for both insect herbivores and specialized pollinators. An exceptional case where contemporaneous plant-insect diversification might be expected is the obligate mutualism between fig trees (Ficus species, Moraceae) and their pollinating wasps (Agaonidae, Hymenoptera). The ubiquity and ecological significance of this mutualism in tropical and subtropical ecosystems has long intrigued biologists, but the systematic challenge posed by >750 interacting species pairs has hindered progress toward understanding its evolutionary history. In particular, taxon sampling and analytical tools have been insufficient for large-scale cophylogenetic analyses. Here, we sampled nearly 200 interacting pairs of fig and wasp species from across the globe. Two supermatrices were assembled: on average, wasps had sequences from 77% of 6 genes (5.6 kb) and figs had sequences from 60% of 5 genes (5.5 kb); overall, 850 new DNA sequences were generated for this study. We also developed a new analytical tool, Jane 2, for event-based phylogenetic reconciliation analysis of very large data sets. Separate Bayesian phylogenetic analyses for figs and fig wasps under relaxed molecular clock assumptions indicate Cretaceous diversification of crown groups and contemporaneous divergence for nearly half of all fig and pollinator lineages. Event-based cophylogenetic analyses further support the codiversification hypothesis. Biogeographic analyses indicate that the present-day distribution of fig and pollinator lineages is consistent with a Eurasian origin and subsequent dispersal, rather than with Gondwanan vicariance.
Overall, our findings indicate that the fig-pollinator mutualism represents an extreme case among plant-insect interactions of coordinated dispersal and long-term codiversification.
Abstract:
A computational pipeline combining texture analysis and pattern classification algorithms was developed for investigating associations between high-resolution MRI features and histological data. This methodology was tested in the study of dentate gyrus images of sclerotic hippocampi resected from refractory epilepsy patients. Images were acquired using a simple surface coil in a 3.0T MRI scanner. All specimens were subsequently submitted to histological semiquantitative evaluation. The computational pipeline was applied for classifying pixels according to: a) dentate gyrus histological parameters and b) patients' febrile or afebrile initial precipitating insult history. The pipeline results for febrile and afebrile patients achieved 70% classification accuracy, with 78% sensitivity and 80% specificity [area under the receiver operating characteristic (ROC) curve: 0.89]. The analysis of the histological data alone was not sufficient to achieve significant power to separate febrile and afebrile groups. Interestingly, the results from our approach did not show significant correlation with histological parameters (which per se were not enough to classify patient groups). These results showed the potential of combining computational texture analysis with classification methods for detecting subtle MRI signal differences, a method sufficient to provide good clinical classification. This pipeline could also be applied in a wide range of other areas of medical imaging. Magn Reson Med, 2012. (c) 2012 Wiley Periodicals, Inc.
Abstract:
The cone penetration test (CPT), together with its recent variation (CPTU), has become the most widely used in-situ testing technique for soil profiling and geotechnical characterization. The knowledge gained over the last decades on the interpretation procedures in sands and clays is certainly wide, whilst very few contributions can be found as regards the analysis of CPT(u) data in intermediate soils. Indeed, it is widely accepted that at the standard rate of penetration (v = 20 mm/s), drained penetration occurs in sands while undrained penetration occurs in clays. However, a problem arises when the available interpretation approaches are applied to cone measurements in silts, sandy silts, silty or clayey sands, since such intermediate geomaterials are often characterized by permeability values within the range in which partial drainage is very likely to occur. Hence, the application of the available and well-established interpretation procedures, developed for ‘standard’ clays and sands, may result in invalid estimates of soil parameters. This study aims at providing a better understanding of the interpretation of CPTU data in natural sand and silt mixtures, by taking into account two main aspects, as specified below: 1) Investigating the effect of penetration rate on piezocone measurements, with the aim of identifying drainage conditions when cone penetration is performed at a standard rate. This part of the thesis has been carried out with reference to a specific CPTU database recently collected in a liquefaction-prone area (Emilia-Romagna Region, Italy). 2) Providing a better insight into the interpretation of piezocone tests in the widely studied silty sediments of the Venetian lagoon (Italy). Research has focused on the calibration and verification of some site-specific correlations, with special reference to the estimate of compressibility parameters for the assessment of long-term settlements of the Venetian coastal defences.
Abstract:
Obesity is a multifactorial trait, which comprises an independent risk factor for cardiovascular disease (CVD). The aim of the current work is to study the complex etiology beneath obesity and identify genetic variations and/or factors related to nutrition that contribute to its variability. To this end, a set of more than 2300 white subjects who participated in a nutrigenetics study was used. For each subject a total of 63 factors describing genetic variants related to CVD (24 in total), gender, and nutrition (38 in total), e.g. average daily intake in calories and cholesterol, were measured. Each subject was categorized according to body mass index (BMI) as normal (BMI ≤ 25) or overweight (BMI > 25). Two artificial neural network (ANN) based methods were designed and used towards the analysis of the available data. These corresponded to i) a multi-layer feed-forward ANN combined with a parameter decreasing method (PDM-ANN), and ii) a multi-layer feed-forward ANN trained by a hybrid method (GA-ANN) which combines genetic algorithms and the popular back-propagation training algorithm.
Abstract:
Congenital syndactyly with a variable number of affected feet was observed in eight black and white German Holstein calves. Analysis of the pedigree data revealed that all affected individuals could be traced back to a single founder. The pedigree was consistent with monogenic autosomal recessive inheritance and variable expressivity. Bovine syndactyly or "mulefoot" has previously been shown to map to the telomeric end of bovine chromosome 15, and we performed PCR genotyping of microsatellite markers spanning 27 cM of this chromosomal region to test the new cases for genetic linkage with the phenotype. The haplotype segregation confirmed the suggested inheritance pattern of the mulefoot mutation in this family, and markers RM004, BM848 and BMS820 showed significant linkage to the phenotype. The results confirmed the chromosomal location of the mulefoot gene in this pedigree. Furthermore, the study demonstrated that, although marker testing has been available for nearly a decade, the use of mulefoot carriers in cattle breeding remains uncontrolled. The presented family provides a resource for positional cloning of the causative mutation.
Abstract:
The work described in this thesis had two objectives. The first objective was to develop a physically based computational model that could be used to predict the electronic conductivity, Seebeck coefficient, and thermal conductivity of Pb1-xSnxTe alloys over the 400 K to 700 K temperature range as a function of Sn content and doping level. The second objective was to determine how the secondary phase inclusions observed in Pb1-xSnxTe alloys made by consolidating mechanically alloyed elemental powders impact the ability of the material to harvest waste heat and generate electricity in the 400 K to 700 K temperature range. The motivation for this work was that, although the promise of this alloy as an unusually efficient thermoelectric power generator material in the 400 K to 700 K range had been demonstrated in the literature, methods to reproducibly control and subsequently optimize the material's thermoelectric figure of merit remain elusive. Mechanical alloying, though not typically used to fabricate these alloys, is a potential method for cost-effectively engineering these properties. Given that there are deviations from crystalline perfection in mechanically alloyed material, such as secondary phase inclusions, the question arises as to whether these defects are detrimental to thermoelectric function or, alternatively, whether they enhance the thermoelectric function of the alloy. The hypothesis formed at the outset of this work was that the small secondary phase SnO2 inclusions observed to be present in the mechanically alloyed Pb1-xSnxTe would increase the thermoelectric figure of merit of the material over the temperature range of interest. It was proposed that the increase in the figure of merit would arise because the inclusions in the material would not reduce the electrical conductivity to as great an extent as the thermal conductivity.
If this were true, then the experimentally measured electronic conductivity in mechanically alloyed Pb1-xSnxTe alloys that have these inclusions would not be less than that expected in alloys without these inclusions, while the portion of the thermal conductivity that is not due to charge carriers (the lattice thermal conductivity) would be less than what would be expected from alloys that do not have these inclusions. Furthermore, it would be possible to approximate the observed changes in the electrical and thermal transport properties using existing physical models for the scattering of electrons and phonons by small inclusions. The approach taken to investigate this hypothesis was to first experimentally characterize the mobile carrier concentration at room temperature along with the extent and type of secondary phase inclusions present in a series of three mechanically alloyed Pb1-xSnxTe alloys with different Sn content. Second, the physically based computational model was developed. This model was used to determine what the electronic conductivity, Seebeck coefficient, total thermal conductivity, and the portion of the thermal conductivity not due to mobile charge carriers would be in these particular Pb1-xSnxTe alloys if there were no secondary phase inclusions. Third, the electronic conductivity, Seebeck coefficient and total thermal conductivity were experimentally measured at elevated temperatures for these three alloys with inclusions present. The model predictions for electrical conductivity and Seebeck coefficient were directly compared to the experimental elevated temperature electrical transport measurements. The computational model was then used to extract the lattice thermal conductivity from the experimentally measured total thermal conductivity. This lattice thermal conductivity was then compared to what would be expected from the alloys in the absence of secondary phase inclusions.
Secondary phase inclusions were determined by X-ray diffraction analysis to be present in all three alloys to a varying extent. The inclusions were found not to significantly degrade electrical conductivity at temperatures above ~ 400 K in these alloys, though they do dramatically impact electronic mobility at room temperature. It is shown that, at temperatures above ~ 400 K, electrons are scattered predominantly by optical and acoustical phonons rather than by an alloy scattering mechanism or the inclusions. The experimental electrical conductivity and Seebeck coefficient data at elevated temperatures were found to be within ~ 10 % of what would be expected for material without inclusions. The inclusions were not found to reduce the lattice thermal conductivity at elevated temperatures. The experimentally measured thermal conductivity data were found to be consistent with the lattice thermal conductivity that would arise due to two scattering processes: phonon-phonon scattering (Umklapp scattering) and the scattering of phonons by the disorder induced by the formation of a PbTe-SnTe solid solution (alloy scattering). As opposed to the case in electrical transport, the alloy scattering mechanism in thermal transport is shown to be a significant contributor to the total thermal resistance. An estimation of the extent to which the mean free time between phonon scattering events would be reduced due to the presence of the inclusions is consistent with the above analysis of the experimental data. The first important result of this work was the development of an experimentally validated, physically based computational model that can be used to predict the electronic conductivity, Seebeck coefficient, and thermal conductivity of Pb1-xSnxTe alloys over the 400 K to 700 K temperature range as a function of Sn content and doping level.
This model will be critical in future work as a tool to determine, first, the highest thermoelectric figure of merit one can expect from this alloy system at a given temperature and, second, the optimum Sn content and doping level to achieve this figure of merit. The second important result of this work is the determination that the secondary phase inclusions that were observed to be present in the Pb1-xSnxTe made by mechanical alloying do not keep the material from having the same electrical and thermal transport that would be expected from "perfect" single crystal material at elevated temperatures. The analytical approach described in this work will be critical in future investigations to predict how changing the size, type, and volume fraction of secondary phase inclusions can be used to impact thermal and electrical transport in this materials system.
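The thermoelectric figure of merit discussed in this abstract is conventionally written as ZT = S^2 * sigma * T / kappa, with Seebeck coefficient S, electrical conductivity sigma, absolute temperature T, and total thermal conductivity kappa. The sketch below only illustrates that standard definition and the logic of the stated hypothesis (lower kappa at unchanged sigma raises ZT). The function name and all numerical values are hypothetical and are not taken from the thesis.

```python
def figure_of_merit(seebeck_v_per_k, sigma_s_per_m, kappa_w_per_m_k, temperature_k):
    """Dimensionless figure of merit ZT = S^2 * sigma * T / kappa (SI units)."""
    return seebeck_v_per_k ** 2 * sigma_s_per_m * temperature_k / kappa_w_per_m_k


# Hypothetical illustrative values, NOT measurements from the thesis.
S = 200e-6      # Seebeck coefficient, V/K
SIGMA = 5.0e4   # electrical conductivity, S/m
T = 600.0       # temperature, K

zt_reference = figure_of_merit(S, SIGMA, 2.0, T)       # kappa = 2.0 W/(m K)
zt_lower_kappa = figure_of_merit(S, SIGMA, 1.5, T)     # same sigma, lower kappa

print(f"ZT at kappa = 2.0: {zt_reference:.3f}")
print(f"ZT at kappa = 1.5: {zt_lower_kappa:.3f}")
```

In these terms, the reported finding is that the inclusions changed neither sigma nor the lattice part of kappa appreciably at elevated temperature, so ZT would be expected to stay close to the inclusion-free value.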
Abstract:
This paper investigates the use of virtual reality (VR) technologies to facilitate the analysis of plant biological data in distinctive steps in the application pipeline. Reconstructed three-dimensional biological models (primary polygonal models) transferred to a virtual environment support scientists' collaborative exploration of biological datasets so that they obtain accurate analysis results and uncover information hidden in the data. Examples of the use of virtual reality in practice are provided and a complementary user study was performed.
Abstract:
The Critical Moment (CM) in basketball is a game-related phenomenon with particular characteristics determined by the idiosyncrasies of a team; it can affect the players and coaches and, in turn, the course of the game. This thesis studies the incidence of the CM in the A.C.B. League through two investigations, one quantitative and one qualitative, whose methodology is as follows. The quantitative research is based on the "performance analysis" technique. Four seasons of the A.C.B. League (2007/08 to 2010/11) were studied and, following the literature consulted, the critical moments of the game were taken to be the last five minutes of games in which the point difference was six points, together with all overtime periods played, giving 197 critical moments in total. The study was contextualized in terms of the situational variables "game location" (home or away), "team quality" (higher- or lower-ranked teams) and "competition" (regular season and playoff phases). To interpret the results, the following descriptive analyses were performed: 1) discriminant analysis, 2) multiple linear regression, and 3) multivariate general linear model analysis. The qualitative research is based on semi-structured interviews. Twelve coaches who were active in the A.C.B. League during the 2011/12 season were interviewed in order to learn the coach's point of view on the concept of the CM and thereby obtain a more practical perspective, grounded in their knowledge and experience, on how to act when facing the CM in basketball.
The results of both studies agree on the importance of the CM for the final outcome of the game. Likewise, the concept itself is highly complex, so both the scientific perspective provided by observation of the game and the coach's subjective perception of the phenomenon are considered essential; for the latter, the psychological aspects of the protagonists (players and coaches) are decisive.
Abstract:
We present the first joint analysis of gamma-ray data from the MAGIC Cherenkov telescopes and the Fermi Large Area Telescope (LAT) to search for gamma-ray signals from dark matter annihilation in dwarf satellite galaxies. We combine 158 hours of Segue 1 observations with MAGIC with 6-year observations of 15 dwarf satellite galaxies by the Fermi-LAT. We obtain limits on the annihilation cross-section for dark matter particle masses between 10 GeV and 100 TeV – the widest mass range ever explored by a single gamma-ray analysis. These limits improve on previously published Fermi-LAT and MAGIC results by up to a factor of two at certain masses. Our new inclusive analysis approach is completely generic and can be used to perform a global, sensitivity-optimized dark matter search by combining data from present and future gamma-ray and neutrino detectors.
Abstract:
To be able to determine the grain size obtained from the addition of a grain refining master alloy, the relationship between grain size (d), solute content (defined by the growth restriction factor Q), and the potency and number density of nucleant particles needs to be understood. A study was undertaken in which additions of TiB2 and Ti were made to eight wrought aluminum alloys covering a range of alloying elements and compositions. It was found from analysis of the data that $d = a/\sqrt[3]{\mathrm{pct\ TiB_2}} + b/Q$. From consideration of the experimental data and from further analysis of previously published data, it is shown that the coefficients a and b relate to characteristics of the nucleant particles added by a grain refiner. The term a is related to the maximum density of active TiB2 nucleant particles within the melt, while b is related to their potency. By using the analysis methodology presented in this article, the performance characteristics of different master alloys were defined and the effects of Zr and Si on the poisoning of grain refinement were illustrated.
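As a rough illustration of the relationship stated in this abstract (grain size as a term in the inverse cube root of the TiB2 addition plus a term in 1/Q), the following sketch evaluates it for made-up inputs. The function name, the coefficient values, and the units are assumptions chosen for easy arithmetic; the abstract does not report fitted values of a and b.

```python
def grain_size(a, b, pct_tib2, q):
    """Evaluate d = a / cbrt(pct TiB2) + b / Q for positive inputs."""
    if pct_tib2 <= 0 or q <= 0:
        raise ValueError("pct_tib2 and q must be positive")
    return a / pct_tib2 ** (1.0 / 3.0) + b / q


# Hypothetical coefficients and inputs, NOT values from the paper.
A_COEFF = 8.0     # linked in the abstract to the density of active TiB2 particles
B_COEFF = 100.0   # linked in the abstract to nucleant potency

d_low_solute = grain_size(A_COEFF, B_COEFF, pct_tib2=0.008, q=10.0)
d_high_solute = grain_size(A_COEFF, B_COEFF, pct_tib2=0.008, q=20.0)

print(f"d at Q = 10: {d_low_solute:.2f}")
print(f"d at Q = 20: {d_high_solute:.2f}")
```

With these inputs, doubling Q halves only the b/Q term, which is how the form of the relationship separates the effect of solute content from that of the particle addition.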
Abstract:
Objective: Five double-blind, randomized, saline-controlled trials (RCTs) were included in the United States marketing application for an intra-articular hyaluronan (IA-HA) product for the treatment of osteoarthritis (OA) of the knee. We report an integrated analysis of the primary Case Report Form (CRF) data from these trials. Method: Trials were similar in design, patient population and outcome measures; all included the Lequesne Algofunctional Index (LI), a validated composite index of pain and function, evaluating treatment over 3 months. Individual patient data were pooled; a repeated measures analysis of covariance was performed in the intent-to-treat (ITT) population. Analyses utilized both fixed and random effects models. Safety data from the five RCTs were summarized. Results: A total of 1155 patients with radiologically confirmed knee OA were enrolled: 619 received three or five IA-HA injections; 536 received placebo saline injections. In the active and control groups, mean ages were 61.8 and 61.4 years; 62.4% and 58.8% were women; baseline total Lequesne scores were 11.03 and 11.30, respectively. Integrated analysis of the pooled data set found a statistically significant reduction (P < 0.001) in total Lequesne score with hyaluronan (HA) (-2.68) vs placebo (-2.00); estimated difference -0.68 (95% CI: -0.56 to -0.79), effect size 0.20. Additional modeling approaches confirmed robustness of the analyses. Conclusions: This integrated analysis demonstrates that multiple design factors influence the results of RCTs assessing efficacy of intra-articular (IA) therapies, and that integrated analyses based on primary data differ from meta-analyses using transformed data. (C) 2006 OsteoArthritis Research Society International. Published by Elsevier Ltd. All rights reserved.
Abstract:
Count data with excess zeros relative to a Poisson distribution are common in many biomedical applications. A popular approach to the analysis of such data is to use a zero-inflated Poisson (ZIP) regression model. Often, because of the hierarchical study design or the data collection procedure, zero-inflation and lack of independence may occur simultaneously, which renders the standard ZIP model inadequate. To account for the preponderance of zero counts and the inherent correlation of observations, a class of multi-level ZIP regression models with random effects is presented. Model fitting is facilitated using an expectation-maximization algorithm, whereas variance components are estimated via residual maximum likelihood estimating equations. A score test for zero-inflation is also presented. The multi-level ZIP model is then generalized to cope with a more complex correlation structure. Application to the analysis of correlated count data from a longitudinal infant feeding study illustrates the usefulness of the approach.
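For readers unfamiliar with the standard ZIP model that this abstract builds on, the sketch below writes out its probability mass function: a point mass at zero with probability pi mixed with a Poisson(lam) count. It covers only the plain single-level model, not the multi-level random-effects extension or the EM fitting described above; the names `zip_pmf`, `lam`, and `pi` are illustrative choices.

```python
import math


def zip_pmf(k, lam, pi):
    """P(Y = k) under a zero-inflated Poisson with rate lam and zero-inflation pi."""
    if k < 0:
        return 0.0
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # Structural zeros plus zeros produced by the Poisson component.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def zip_mean(lam, pi):
    """Mean of the zero-inflated Poisson distribution, (1 - pi) * lam."""
    return (1.0 - pi) * lam


# Hypothetical parameter values for illustration only.
LAM, PI = 2.0, 0.3
print(f"P(Y = 0) with inflation:    {zip_pmf(0, LAM, PI):.4f}")
print(f"P(Y = 0) plain Poisson:     {zip_pmf(0, LAM, 0.0):.4f}")
print(f"Mean count under the model: {zip_mean(LAM, PI):.2f}")
```

Setting `pi` to zero recovers the ordinary Poisson distribution, which is the null hypothesis that a score test for zero-inflation examines.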
Abstract:
Objectives: Are behavioural interventions effective in reducing the rate of sexually transmitted infections (STIs) among genitourinary medicine (GUM) clinic patients? Design: Systematic review and meta-analysis of published articles. Data sources: Medline, CINAHL, Embase, PsychINFO, Applied Social Sciences Index and Abstracts, Cochrane Library Controlled Clinical Trials Register, National Research Register (1966 to January 2004). Review methods: Randomised controlled trials of behavioural interventions in sexual health clinic patients were included if they reported change to STI rates or self reported sexual behaviour. Trial quality was assessed using the Jadad score and results pooled using random effects meta-analyses where outcomes were consistent across studies. Results: 14 trials were included; 12 based in the United States. Experimental interventions were heterogeneous and most control interventions were more structured than typical UK care. Eight trials reported data on laboratory confirmed infections, of which four observed a greater reduction in their intervention groups (in two cases this result was statistically significant, p<0.05). Seven trials reported consistent condom use, of which six observed a greater increase among their intervention subjects. Results for other measures of sexual behaviour were inconsistent. Success in reducing STIs was related to trial quality, use of social cognition models, and formative research in the target population. However, effectiveness was not related to intervention format or length. Conclusions: While results were heterogeneous, several trials observed reductions in STI rates. The most effective interventions were developed through extensive formative research. These findings should encourage further research in the United Kingdom where new approaches to preventing STIs are urgently required.
Abstract:
Many modern applications fall into the category of "large-scale" statistical problems, in which both the number of observations n and the number of features or parameters p may be large. Many existing methods focus on point estimation, despite the continued relevance of uncertainty quantification in the sciences, where the number of parameters to estimate often exceeds the sample size, despite huge increases in the value of n typically seen in many fields. Thus, the tendency in some areas of industry to dispense with traditional statistical analysis on the basis that "n=all" is of little relevance outside of certain narrow applications. The main result of the Big Data revolution in most fields has instead been to make computation much harder without reducing the importance of uncertainty quantification. Bayesian methods excel at uncertainty quantification, but often scale poorly relative to alternatives. This conflict between the statistical advantages of Bayesian procedures and their substantial computational disadvantages is perhaps the greatest challenge facing modern Bayesian statistics, and is the primary motivation for the work presented here.
Two general strategies for scaling Bayesian inference are considered. The first is the development of methods that lend themselves to faster computation, and the second is design and characterization of computational algorithms that scale better in n or p. In the first instance, the focus is on joint inference outside of the standard problem of multivariate continuous data that has been a major focus of previous theoretical work in this area. In the second area, we pursue strategies for improving the speed of Markov chain Monte Carlo algorithms, and characterizing their performance in large-scale settings. Throughout, the focus is on rigorous theoretical evaluation combined with empirical demonstrations of performance and concordance with the theory.
One topic we consider is modeling the joint distribution of multivariate categorical data, often summarized in a contingency table. Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. In Chapter 2, we derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.
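The reduced-rank factorization mentioned in this paragraph can be made concrete with a small example. Under a latent class (PARAFAC-type) model, the probability of a cell of the contingency table is a weighted sum over classes of products of per-variable conditional probabilities. The sketch below builds such a probability table in pure Python; the function name, the data layout, and all probabilities are hypothetical, and it does not implement the collapsed Tucker decomposition proposed in the thesis.

```python
from itertools import product


def latent_class_pmf(class_weights, cond_probs):
    """Joint pmf P(y_1..y_p) = sum_h w_h * prod_j cond_probs[j][h][y_j].

    cond_probs[j][h][c] is P(y_j = c | latent class h).
    Returns a dict mapping each cell (tuple of category indices) to its probability.
    """
    n_levels = [len(variable[0]) for variable in cond_probs]
    pmf = {}
    for cell in product(*(range(d) for d in n_levels)):
        total = 0.0
        for h, weight in enumerate(class_weights):
            term = weight
            for j, category in enumerate(cell):
                term *= cond_probs[j][h][category]
            total += term
        pmf[cell] = total
    return pmf


# Hypothetical example: two latent classes, three variables with 2, 3 and 2 levels.
weights = [0.6, 0.4]
conditionals = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.3, 0.2], [0.1, 0.2, 0.7]],
    [[0.8, 0.2], [0.3, 0.7]],
]
table = latent_class_pmf(weights, conditionals)
print(f"cells: {len(table)}, total probability: {sum(table.values()):.6f}")
```

The full table here has 12 cells but is determined by far fewer free parameters, which is the sense in which a small number of latent classes gives a low-rank description of the probability tensor.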
Latent class models for the joint distribution of multivariate categorical data, such as the PARAFAC decomposition, play an important role in the analysis of population structure. In this context, the number of latent classes is interpreted as the number of genetically distinct subpopulations of an organism, an important factor in the analysis of evolutionary processes and conservation status. Existing methods focus on point estimates of the number of subpopulations, and lack robust uncertainty quantification. Moreover, whether the number of latent classes in these models is even an identified parameter is an open question. In Chapter 3, we show that when the model is properly specified, the correct number of subpopulations can be recovered almost surely. We then propose an alternative method for estimating the number of latent subpopulations that provides good quantification of uncertainty, and provide a simple procedure for verifying that the proposed method is consistent for the number of subpopulations. The performance of the model in estimating the number of subpopulations and other common population structure inference problems is assessed in simulations and a real data application.
In contingency table analysis, sparse data is frequently encountered for even modest numbers of variables, resulting in non-existence of maximum likelihood estimates. A common solution is to obtain regularized estimates of the parameters of a log-linear model. Bayesian methods provide a coherent approach to regularization, but are often computationally intensive. Conjugate priors ease computational demands, but the conjugate Diaconis-Ylvisaker priors for the parameters of log-linear models do not give rise to closed form credible regions, complicating posterior inference. In Chapter 4 we derive the optimal Gaussian approximation to the posterior for log-linear models with Diaconis-Ylvisaker priors, and provide convergence rate and finite-sample bounds for the Kullback-Leibler divergence between the exact posterior and the optimal Gaussian approximation. We demonstrate empirically in simulations and a real data application that the approximation is highly accurate, even in relatively small samples. The proposed approximation provides a computationally scalable and principled approach to regularized estimation and approximate Bayesian inference for log-linear models.
Another challenging and somewhat non-standard joint modeling problem is inference on tail dependence in stochastic processes. In applications where extreme dependence is of interest, data are almost always time-indexed. Existing methods for inference and modeling in this setting often cluster extreme events or choose window sizes with the goal of preserving temporal information. In Chapter 5, we propose an alternative paradigm for inference on tail dependence in stochastic processes with arbitrary temporal dependence structure in the extremes, based on the idea that the information on strength of tail dependence and the temporal structure in this dependence are both encoded in waiting times between exceedances of high thresholds. We construct a class of time-indexed stochastic processes with tail dependence obtained by endowing the support points in de Haan's spectral representation of max-stable processes with velocities and lifetimes. We extend Smith's model to these max-stable velocity processes and obtain the distribution of waiting times between extreme events at multiple locations. Motivated by this result, a new definition of tail dependence is proposed that is a function of the distribution of waiting times between threshold exceedances, and an inferential framework is constructed for estimating the strength of extremal dependence and quantifying uncertainty in this paradigm. The method is applied to climatological, financial, and electrophysiology data.
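The raw ingredient of the paradigm described in this paragraph is the sequence of waiting times between exceedances of a high threshold. The sketch below only extracts those waiting times from a single time-indexed series. It is not the max-stable velocity process or the estimator proposed in the thesis, and the function name, the series, and the threshold are made up for illustration.

```python
def exceedance_waiting_times(series, threshold):
    """Return the gaps (in time steps) between consecutive values above threshold."""
    exceedance_times = [t for t, value in enumerate(series) if value > threshold]
    return [later - earlier
            for earlier, later in zip(exceedance_times, exceedance_times[1:])]


# Hypothetical series and threshold, for illustration only.
observations = [0.1, 2.5, 0.3, 0.2, 3.1, 2.9, 0.0, 4.0]
waits = exceedance_waiting_times(observations, threshold=2.0)
print(waits)
```

Short waiting times indicate extremes that cluster in time, while long ones indicate isolated extremes, which is why their distribution carries information about both the strength and the temporal structure of tail dependence.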
The remainder of this thesis focuses on posterior computation by Markov chain Monte Carlo (MCMC), the dominant paradigm for posterior computation in Bayesian analysis. It has long been common to control computation time by approximating the Markov transition kernel. Comparatively little attention has been paid to convergence and estimation error in these approximating Markov chains. In Chapter 6, we propose a framework for assessing when to use approximations in MCMC algorithms, and how much error in the transition kernel should be tolerated to obtain optimal estimation performance with respect to a specified loss function and computational budget. The results require only ergodicity of the exact kernel and control of the kernel approximation accuracy. The theoretical framework is applied to approximations based on random subsets of data, low-rank approximations of Gaussian processes, and a novel approximating Markov chain for discrete mixture models.
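To make the idea of an approximate transition kernel concrete, the following is a minimal sketch of a random-walk Metropolis sampler whose log-likelihood is either computed on all data (exact kernel) or estimated from a random subset (approximate kernel). Everything here is an assumption made for illustration: the normal-mean model with unit variance and flat prior, the naive rescaled-subset estimator, and the names `metropolis_mean` and `subset_size`. It is not the algorithm analyzed in the thesis.

```python
import math
import random


def scaled_log_lik(theta, batch, scale):
    """Log-likelihood of a N(theta, 1) model on a batch, rescaled to full-data size."""
    return scale * sum(-0.5 * (x - theta) ** 2 for x in batch)


def metropolis_mean(data, n_iter, step, subset_size=None, seed=0):
    """Random-walk Metropolis for a normal mean under a flat prior.

    subset_size=None uses the exact kernel; otherwise each iteration evaluates
    the acceptance ratio on a fresh random subset (a naive approximation that
    perturbs the stationary distribution).
    """
    rng = random.Random(seed)
    n = len(data)
    theta = 0.0
    draws = []
    for _ in range(n_iter):
        if subset_size is None:
            batch, scale = data, 1.0
        else:
            batch = rng.sample(data, subset_size)
            scale = n / subset_size
        proposal = theta + rng.gauss(0.0, step)
        # The same batch is used for the current and proposed values.
        log_ratio = (scaled_log_lik(proposal, batch, scale)
                     - scaled_log_lik(theta, batch, scale))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            theta = proposal
        draws.append(theta)
    return draws


data_rng = random.Random(1)
data = [data_rng.gauss(1.0, 1.0) for _ in range(400)]
exact = metropolis_mean(data, n_iter=2000, step=0.05)
approx = metropolis_mean(data, n_iter=2000, step=0.05, subset_size=40)
burn = 500
print("exact kernel posterior mean: ", sum(exact[burn:]) / len(exact[burn:]))
print("subset kernel posterior mean:", sum(approx[burn:]) / len(approx[burn:]))
```

The subset version does roughly a tenth of the likelihood work per iteration at the price of a perturbed target, which is the kind of trade-off between computational budget and kernel error that the framework is meant to quantify.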
Data augmentation Gibbs samplers are arguably the most popular class of algorithms for approximately sampling from the posterior distribution for the parameters of generalized linear models. The truncated normal and Polya-Gamma data augmentation samplers are standard examples for probit and logit links, respectively. Motivated by an important problem in quantitative advertising, in Chapter 7 we consider the application of these algorithms to modeling rare events. We show that when the sample size is large but the observed number of successes is small, these data augmentation samplers mix very slowly, with a spectral gap that converges to zero at a rate at least proportional to the reciprocal of the square root of the sample size, up to a log factor. In simulation studies, moderate sample sizes result in high autocorrelations and small effective sample sizes. Similar empirical results are observed for related data augmentation samplers for multinomial logit and probit models. When applied to a real quantitative advertising dataset, the data augmentation samplers mix very poorly. Conversely, Hamiltonian Monte Carlo and a type of independence chain Metropolis algorithm show good mixing on the same dataset.
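The slow mixing described above can be seen in a small experiment. The sketch below runs a truncated normal data augmentation sampler, in the style of Albert and Chib, on an intercept-only probit model and compares the lag-1 autocorrelation of the chain when successes are rare versus balanced. The intercept-only model, the flat prior, the sample sizes, and the function names are all assumptions made for illustration; this is not the thesis's simulation design.

```python
import random
from statistics import NormalDist, mean

STD_NORMAL = NormalDist()


def probit_da_chain(n, successes, n_iter, seed=0):
    """Data augmentation Gibbs sampler for an intercept-only probit model.

    Assumes a flat prior on the intercept and 0 < successes < n.
    """
    rng = random.Random(seed)
    beta = STD_NORMAL.inv_cdf(successes / n)  # start at the observed probit rate
    draws = []
    for _ in range(n_iter):
        p_success = STD_NORMAL.cdf(beta)
        p_failure = STD_NORMAL.cdf(-beta)
        total = 0.0
        for i in range(n):
            if i < successes:
                # Latent z > 0: z = beta - e with e ~ N(0, 1) truncated to e < beta.
                u = max(rng.random() * p_success, 1e-300)
                total += beta - STD_NORMAL.inv_cdf(u)
            else:
                # Latent z <= 0: z = beta + e with e ~ N(0, 1) truncated to e <= -beta.
                u = max(rng.random() * p_failure, 1e-300)
                total += beta + STD_NORMAL.inv_cdf(u)
        # Under the flat prior, beta | z ~ N(mean(z), 1 / n).
        beta = rng.gauss(total / n, (1.0 / n) ** 0.5)
        draws.append(beta)
    return draws


def lag1_autocorrelation(xs):
    """Sample lag-1 autocorrelation of a sequence."""
    m = mean(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    den = sum((a - m) ** 2 for a in xs)
    return num / den


rare = lag1_autocorrelation(probit_da_chain(n=300, successes=2, n_iter=600))
balanced = lag1_autocorrelation(probit_da_chain(n=300, successes=150, n_iter=600))
print("lag-1 autocorrelation, rare successes:    ", rare)
print("lag-1 autocorrelation, balanced successes:", balanced)
```

In this toy setting the conditional variance of the intercept given the latent variables is 1/n regardless of the data, while the posterior variance is much larger when successes are rare, so each update moves only a small fraction of the posterior width.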
Resumo:
The demographic dynamics at work in the Caribbean region are highly distinctive, notably in the speed at which the population is aging, a pace among the highest of any region in the world. The crucial challenges of ensuring the quality of life of today's and tomorrow's older adults, and of managing these aging societies effectively, must be addressed and taken into account. This thesis presents the results of a targeted analysis of the sociodemographic characteristics of older persons in four Caribbean states (Antigua and Barbuda, Saint Lucia, Saint Vincent and the Grenadines, and Trinidad and Tobago), based on data from their most recent census. This portrait places particular emphasis on the living conditions, health, and labor market participation of older persons, that is, on the main themes of the three objectives of the Madrid International Plan of Action on Ageing. In addition, the effects of the first five years of the Madrid Plan on Caribbean populations are examined. Information obtained from interviews with contact persons in several Caribbean countries is synthesized and identifies the efforts made, mainly by governments, to incorporate the objectives of the Madrid Plan and other population-aging issues into social and economic development mechanisms and policies, as well as those concerning respect for human rights.