854 results for sparse Bayesian regression
Abstract:
In this demonstration we present our web services to perform Bayesian learning for classification tasks.
Abstract:
A statewide study was performed to develop regional regression equations for estimating selected annual exceedance-probability statistics for ungaged stream sites in Iowa. The study area comprises streamgages located within Iowa and 50 miles beyond the State’s borders. Annual exceedance-probability estimates were computed for 518 streamgages by using the expected moments algorithm to fit a Pearson Type III distribution to the logarithms of annual peak discharges for each streamgage using annual peak-discharge data through 2010. The estimation of the selected statistics included a Bayesian weighted least-squares/generalized least-squares regression analysis to update regional skew coefficients for the 518 streamgages. Low-outlier and historic information were incorporated into the annual exceedance-probability analyses, and a generalized Grubbs-Beck test was used to detect multiple potentially influential low flows. Also, geographic information system software was used to measure 59 selected basin characteristics for each streamgage. Regional regression analysis, using generalized least-squares regression, was used to develop a set of equations for each flood region in Iowa for estimating discharges for ungaged stream sites with 50-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent annual exceedance probabilities, which are equivalent to annual flood-frequency recurrence intervals of 2, 5, 10, 25, 50, 100, 200, and 500 years, respectively. A total of 394 streamgages were included in the development of regional regression equations for three flood regions (regions 1, 2, and 3) that were defined for Iowa based on landform regions and soil regions. Average standard errors of prediction range from 31.8 to 45.2 percent for flood region 1, 19.4 to 46.8 percent for flood region 2, and 26.5 to 43.1 percent for flood region 3.
The pseudo coefficients of determination for the generalized least-squares equations range from 90.8 to 96.2 percent for flood region 1, 91.5 to 97.9 percent for flood region 2, and 92.4 to 96.0 percent for flood region 3. The regression equations are applicable only to stream sites in Iowa with flows not significantly affected by regulation, diversion, channelization, backwater, or urbanization and with basin characteristics within the range of those used to develop the equations. These regression equations will be implemented within the U.S. Geological Survey StreamStats Web-based geographic information system tool. StreamStats allows users to click on any ungaged site on a river and compute estimates of the eight selected statistics; 90-percent prediction intervals and the measured basin characteristics for the ungaged sites are also provided by the Web-based tool. StreamStats also allows users to click on any streamgage in Iowa to obtain the estimates computed for these eight selected statistics.
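The flood-frequency step described above, fitting a Pearson Type III distribution to the logarithms of annual peak discharges and reading off discharges at selected annual exceedance probabilities, can be sketched as follows. This is a minimal method-of-moments illustration, not the study's expected-moments-algorithm procedure with Bayesian regional skew updating; the function name and data are invented.

```python
import numpy as np
from scipy import stats

def aep_discharges(annual_peaks, aeps):
    """Fit a Pearson Type III distribution to log10 annual peak
    discharges and return the discharge for each annual exceedance
    probability (AEP). Simplified method-of-moments fit, for
    illustration only."""
    logq = np.log10(np.asarray(annual_peaks, dtype=float))
    mean, std = logq.mean(), logq.std(ddof=1)
    skew = stats.skew(logq, bias=False)
    # Frequency factor: standardized Pearson III quantile at
    # non-exceedance probability 1 - AEP
    k = stats.pearson3.ppf(1 - np.asarray(aeps), skew)
    return 10 ** (mean + k * std)
```

For example, `aep_discharges(peaks, [0.01])` would give the 1-percent AEP (100-year) discharge estimate for a gauged record `peaks`.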
Abstract:
We describe the case of a man with a history of complex partial seizures and severe language, cognitive and behavioural regression during early childhood (3.5 years), who underwent epilepsy surgery at the age of 25 years. His early epilepsy had clinical and electroencephalogram features of the syndromes of epilepsy with continuous spike waves during sleep and acquired epileptic aphasia (Landau-Kleffner syndrome), which we considered initially to be of idiopathic origin. Seizures recurred at 19 years and presurgical investigations at 25 years showed a lateral frontal epileptic focus with spread to Broca's area and the frontal orbital regions. Histopathology revealed a focal cortical dysplasia, not visible on magnetic resonance imaging. The prolonged but reversible early regression and the residual neuropsychological disorders during adulthood were probably the result of an active left frontal epilepsy, which interfered with language and behaviour during development. Our findings raise the question of the role of focal cortical dysplasia as an aetiology in the syndromes of epilepsy with continuous spike waves during sleep and acquired epileptic aphasia.
Abstract:
This thesis investigates formal methods for assisting the forensic expert in managing the factors that influence the evaluation of scientific evidence, while respecting established and acceptable inference procedures. According to a view advocated by the majority of the forensic and legal literature, adopted here without reservation as a starting point, the conceptualization of an evaluative procedure is said to be 'coherent' when it rests on a systematic implementation of probability theory. Often, however, probabilistic reasoning cannot be implemented automatically and may run into problems of complexity, due, for example, to limited knowledge of the domain in question or to the large number of factors that may come into play. To manage such complications, the present work proposes to investigate a formalization of probability theory by means of a graphical environment known as Bayesian networks. The main hypothesis this research sets out to examine is that Bayesian networks, together with certain accessory concepts (such as qualitative and sensitivity analyses), constitute a key resource available to the forensic expert for approaching inference problems coherently, both conceptually and practically. From this working hypothesis, individual problems were extracted, articulated, and addressed in a series of distinct but interconnected studies, whose results, published in peer-reviewed journals, are presented as appendices. Overall, this work yields three categories of results.
A first group of results demonstrates, on the basis of numerous examples spanning diverse forensic domains, the compatibility and complementarity between Bayesian network models and existing probabilistic evaluation procedures. Building on these indications, the two other categories of results show, respectively, that Bayesian networks also make it possible to address domains previously largely unexplored from a probabilistic point of view, and that the availability of so-called 'hard' numerical data is not an indispensable condition for implementing the approaches proposed in this work. The thesis discusses these results with respect to the current literature and concludes by proposing Bayesian networks as a means of exploring new research directions, such as the study of various forms of evidence combination and the analysis of decision making. For this last aspect, the evaluation of probabilities, as advocated in this work, constitutes both a fundamental preliminary step and an operational means.
Abstract:
Well-developed experimental procedures currently exist for retrieving and analyzing particle evidence from the hands of individuals suspected of being associated with the discharge of a firearm. Although analytical approaches (e.g. automated Scanning Electron Microscopy with Energy Dispersive X-ray (SEM-EDS) microanalysis) allow the determination of the presence of elements typically found in gunshot residue (GSR) particles, such analyses provide no information about a given particle's actual source. Possible origins that scientists may need to account for are a primary exposure to the discharge of a firearm or a secondary transfer due to a contaminated environment. In order to approach such sources of uncertainty in the context of evidential assessment, this paper studies the construction and practical implementation of graphical probability models (i.e. Bayesian networks). These can assist forensic scientists in making the issue tractable within a probabilistic perspective. The proposed models focus on likelihood ratio calculations at various levels of detail as well as case pre-assessment.
Abstract:
Standard practice of wave-height hazard analysis often pays little attention to the uncertainty of assessed return periods and occurrence probabilities. This fact favors the opinion that, when large events happen, the hazard assessment should change accordingly. However, uncertainty of the hazard estimates is normally able to hide the effect of those large events. This is illustrated using data from the Mediterranean coast of Spain, where the last years have been extremely disastrous. Thus, it is possible to compare the hazard assessment based on data previous to those years with the analysis including them. With our approach, no significant change is detected when the statistical uncertainty is taken into account. The hazard analysis is carried out with a standard model. Time-occurrence of events is assumed Poisson distributed. The wave-height of each event is modelled as a random variable whose upper tail follows a Generalized Pareto Distribution (GPD). Moreover, wave-heights are assumed independent from event to event and also independent of their occurrence in time. A threshold for excesses is assessed empirically. The other three parameters (Poisson rate, shape and scale parameters of the GPD) are jointly estimated using Bayes' theorem. The prior distribution accounts for physical features of ocean waves in the Mediterranean sea and experience with these phenomena. The posterior distribution of the parameters allows one to obtain posterior distributions of other derived parameters such as occurrence probabilities and return periods. Predictive distributions are also available. Computations are carried out using the program BGPE v2.0.
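The peaks-over-threshold model described in this abstract (Poisson occurrence in time, GPD excesses over a threshold) can be sketched as a simple return-level calculation. The sketch below uses maximum-likelihood point estimates rather than the Bayesian posterior treatment the abstract describes, and all names and numbers are illustrative.

```python
import numpy as np
from scipy import stats

def return_level(wave_heights, threshold, years, period):
    """Peaks-over-threshold return level: excesses over the
    threshold follow a Generalized Pareto Distribution and event
    occurrence is Poisson in time. Point (ML) estimates only; the
    abstract's approach instead places a prior on the Poisson rate
    and GPD parameters and works with their joint posterior."""
    exc = wave_heights[wave_heights > threshold] - threshold
    rate = len(exc) / years                      # Poisson events per year
    xi, _, sigma = stats.genpareto.fit(exc, floc=0.0)
    # Wave height exceeded on average once per `period` years
    return threshold + sigma / xi * ((rate * period) ** xi - 1.0)
```

Posterior distributions of such derived quantities would follow by evaluating this expression over posterior samples of the rate, shape, and scale parameters.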
Abstract:
Spatial data analysis, mapping and visualization is of great importance in various fields: environment, pollution, natural hazards and risks, epidemiology, spatial econometrics, etc. A basic task of spatial mapping is to make predictions based on some empirical data (measurements). A number of state-of-the-art methods can be used for the task: deterministic interpolations; methods of geostatistics, i.e. the family of kriging estimators (Deutsch and Journel, 1997); machine learning algorithms such as artificial neural networks (ANNs) of different architectures; hybrid ANN-geostatistics models (Kanevski and Maignan, 2004; Kanevski et al., 1996); etc. All the methods mentioned above can be used for solving the problem of spatial data mapping. Environmental empirical data are always contaminated/corrupted by noise, often of unknown nature. That is one of the reasons why deterministic models can be inconsistent, since they treat the measurements as values of some unknown function that should be interpolated. Kriging estimators treat the measurements as the realization of some spatial random process. To obtain an estimate with kriging, one has to model the spatial structure of the data: the spatial correlation function or (semi-)variogram. This task can be complicated if there is not a sufficient number of measurements, and the variogram is sensitive to outliers and extremes. ANNs are a powerful tool, but they also suffer from a number of drawbacks. ANNs of a special type (multilayer perceptrons) are often used as a detrending tool in hybrid (ANN+geostatistics) models (Kanevski and Maignan, 2004). Therefore, the development and adaptation of a method that is nonlinear and robust to noise in measurements, can deal with small empirical datasets, and has a solid mathematical background is of great importance. The present paper deals with such a model, based on Statistical Learning Theory (SLT): Support Vector Regression.
SLT is a general mathematical framework devoted to the problem of estimating dependencies from empirical data (Hastie et al., 2004; Vapnik, 1998). SLT models for classification, Support Vector Machines, have shown good results on different machine learning tasks. The results of SVM classification of spatial data are also promising (Kanevski et al., 2002). The properties of SVM for regression, Support Vector Regression (SVR), are less studied. First results of the application of SVR to spatial mapping of physical quantities were obtained by the authors for mapping of medium porosity (Kanevski et al., 1999) and for mapping of radioactively contaminated territories (Kanevski and Canu, 2000). The present paper is devoted to further understanding of the properties of the SVR model for spatial data analysis and mapping. A detailed description of SVR theory can be found in (Cristianini and Shawe-Taylor, 2000; Smola, 1996), and basic equations for nonlinear modeling are given in section 2. Section 3 discusses the application of SVR to spatial data mapping on a real case study: soil pollution by the Cs137 radionuclide. Section 4 discusses the properties of the model applied to noisy data or data with outliers.
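A minimal illustration of epsilon-SVR for spatial mapping of scattered measurements, in the spirit of the abstract, might look like the following; the synthetic field, kernel, and hyperparameters (`C`, `epsilon`, `gamma`) are arbitrary assumptions, not values from the paper.

```python
import numpy as np
from sklearn.svm import SVR

# Toy spatial mapping: learn a noisy field z(x, y) from scattered
# "measurements" and predict at unsampled locations.
rng = np.random.default_rng(0)
pts = rng.uniform(0, 1, size=(200, 2))              # sample locations
z = np.sin(3 * pts[:, 0]) * np.cos(3 * pts[:, 1])   # underlying field
z += rng.normal(0, 0.1, size=200)                   # measurement noise

# epsilon defines the insensitive tube (robustness to noise),
# C the regularization trade-off, gamma the RBF kernel width.
model = SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma=5.0)
model.fit(pts, z)

grid = np.array([[0.5, 0.5], [0.1, 0.9]])           # prediction locations
pred = model.predict(grid)
```

In practice these hyperparameters would be tuned, e.g. by cross-validation, which is where the robustness-to-noise properties discussed in the paper come into play.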
Abstract:
Forensic scientists face increasingly complex inference problems for evaluating likelihood ratios (LRs) for an appropriate pair of propositions. Up to now, scientists and statisticians have derived LR formulae using an algebraic approach. However, this approach reaches its limits when addressing cases with an increasing number of variables and dependence relationships between these variables. In this study, we suggest using a graphical approach, based on the construction of Bayesian networks (BNs). We first construct a BN that captures the problem, and then deduce the expression for calculating the LR from this model to compare it with existing LR formulae. We illustrate this idea by applying it to the evaluation of an activity level LR in the context of the two-trace transfer problem. Our approach allows us to relax assumptions made in previous LR developments, produce a new LR formula for the two-trace transfer problem and generalize this scenario to n traces.
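The core of a likelihood-ratio calculation deduced from a small Bayesian network amounts to summing the probability of the evidence over latent variables under each proposition. A toy sketch with invented probabilities and a single latent transfer node, far simpler than the two-trace transfer problem the paper treats:

```python
# E = matching trace found; it can arise via a transfer event T
# (whose probability differs under the prosecution proposition Hp
# and the defence proposition Hd) or via background presence.
# All probability values are invented for illustration only.
def p_evidence(p_transfer, p_background, p_match_given_transfer=1.0):
    # Marginalize over the latent transfer node T
    return (p_transfer * p_match_given_transfer
            + (1 - p_transfer) * p_background)

# Under Hp a transfer is possible; under Hd only background explains E
lr = p_evidence(p_transfer=0.6, p_background=0.01) / \
     p_evidence(p_transfer=0.0, p_background=0.01)
```

The graphical approach the paper proposes scales this marginalization to many dependent variables, where deriving the algebraic LR formula by hand becomes impractical.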
Abstract:
The objective of this study was to estimate genetic parameters for survival and weight of Nile tilapia (Oreochromis niloticus), farmed in cages and ponds in Brazil, and to predict genetic gain under different scenarios. Survival was recorded as a binary response (dead or alive) at harvest time in the 2008 grow-out period. Genetic parameters were estimated using a Bayesian mixed linear-threshold animal model via Gibbs sampling. The breeding population consisted of 2,912 individual fish, which were analyzed together with the pedigree of 5,394 fish. The heritability estimates, with 95% posterior credible intervals, for tagging weight, harvest weight and survival were 0.17 (0.09-0.27), 0.21 (0.12-0.32) and 0.32 (0.22-0.44), respectively. The credible intervals show a 95% probability that the true genetic correlations were in a favourable direction. Selection for weight has a positive impact on survival. Estimated genetic gain was high when selecting for harvest weight (5.07%), and the indirect gains for tagging weight (2.17%) and survival (2.03%) were also considerable.
Abstract:
The final year project came to us as an opportunity to get involved in a topic which appeared attractive during the process of majoring in economics: statistics and its application to the analysis of economic data, i.e. econometrics. Moreover, the combination of econometrics and computer science is a very hot topic nowadays, given the Information Technologies boom of the last decades and the consequent exponential increase in the amount of data collected and stored day by day. Data analysts able to deal with Big Data and to extract useful results from it are in high demand these days and, according to our understanding, the work they do, although sometimes controversial in terms of ethics, is a clear source of added value both for private corporations and the public sector. For these reasons, the essence of this project is the study of a statistical instrument valid for the analysis of large datasets and directly related to computer science: Partial Correlation Networks. The structure of the project has been determined by our objectives throughout its development. First, the characteristics of the studied instrument are explained, from the basic ideas up to the features of the model behind it, with the final goal of presenting the SPACE model as a tool for estimating interconnections between elements in large data sets. Afterwards, an illustrated simulation is performed in order to show the power and efficiency of the model presented. Finally, the model is put into practice by analyzing a relatively large set of real-world data, with the objective of assessing whether the proposed statistical instrument is valid and useful when applied to a real multivariate time series.
In short, our main goals are to present the model and to evaluate whether Partial Correlation Network Analysis is an effective, useful instrument that allows finding valuable results in Big Data. The findings throughout this project suggest that the Partial Correlation Estimation by Joint Sparse Regression Models approach presented by Peng et al. (2009) works well under the assumption of sparsity of the data. Moreover, partial correlation networks are shown to be a very valid tool to represent cross-sectional interconnections between elements in large data sets. The scope of this project is, however, limited, as there are some sections in which deeper analysis would have been appropriate. Considering intertemporal connections between elements, the choice of the tuning parameter lambda, or a deeper analysis of the results in the real data application are examples of aspects in which this project could be extended. To sum up, the analyzed statistical tool has proved to be a very useful instrument for finding relationships that connect the elements present in a large data set. After all, partial correlation networks allow the owner of such a set to observe and analyze existing linkages that could otherwise have been omitted.
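As a sketch of the idea, a sparse partial-correlation network can be estimated with the graphical lasso, a close relative of the SPACE joint-sparse-regression estimator discussed above but not the same method; all data and thresholds below are synthetic and illustrative.

```python
import numpy as np
from sklearn.covariance import GraphicalLassoCV

# Synthetic data: 6 variables, one planted linkage between 0 and 1
rng = np.random.default_rng(0)
n, p = 400, 6
x = rng.normal(size=(n, p))
x[:, 1] += 0.8 * x[:, 0]

# Sparse inverse covariance with cross-validated penalty
model = GraphicalLassoCV().fit(x)
prec = model.precision_

# Partial correlation: rho_ij = -prec_ij / sqrt(prec_ii * prec_jj);
# nonzero entries are the edges of the network
d = np.sqrt(np.diag(prec))
pcorr = -prec / np.outer(d, d)
np.fill_diagonal(pcorr, 1.0)
```

The penalty plays the same role as the tuning parameter lambda mentioned above: larger values yield sparser networks with fewer estimated linkages.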
Abstract:
The ground-penetrating radar (GPR) geophysical method has the potential to provide valuable information on the hydraulic properties of the vadose zone because of its strong sensitivity to soil water content. In particular, recent evidence has suggested that the stochastic inversion of crosshole GPR traveltime data can allow for a significant reduction in uncertainty regarding subsurface van Genuchten-Mualem (VGM) parameters. Much of the previous work on the stochastic estimation of VGM parameters from crosshole GPR data has considered the case of steady-state infiltration conditions, which represent only a small fraction of practically relevant scenarios. We explored in detail the dynamic infiltration case, specifically examining to what extent time-lapse crosshole GPR traveltimes, measured during a forced infiltration experiment at the Arreneas field site in Denmark, could help to quantify VGM parameters and their uncertainties in a layered medium, as well as the corresponding soil hydraulic properties. We used a Bayesian Markov-chain-Monte-Carlo inversion approach. We first explored the advantages and limitations of this approach with regard to a realistic synthetic example before applying it to field measurements. In our analysis, we also considered different degrees of prior information. Our findings indicate that the stochastic inversion of the time-lapse GPR data does indeed allow for a substantial refinement in the inferred posterior VGM parameter distributions compared with the corresponding priors, which in turn significantly improves knowledge of soil hydraulic properties. Overall, the results obtained clearly demonstrate the value of the information contained in time-lapse GPR data for characterizing vadose zone dynamics.
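A Bayesian MCMC inversion of the kind described reduces, in its simplest form, to a Metropolis random walk over the model parameters. The toy below inverts a single parameter of an invented forward model; it stands in for, and greatly simplifies, the traveltime inversion in the abstract (forward model, prior bounds, and noise level are all illustrative).

```python
import numpy as np

# Synthetic "observations": d = f(theta) + noise, with f(t) = t^2
rng = np.random.default_rng(0)
theta_true = 2.0
data = theta_true ** 2 + rng.normal(0, 0.1, size=50)

def log_post(theta):
    """Log posterior: uniform prior on (0, 5) + Gaussian likelihood."""
    if not (0.0 < theta < 5.0):
        return -np.inf
    resid = data - theta ** 2
    return -0.5 * np.sum(resid ** 2) / 0.1 ** 2

# Metropolis random walk
samples, theta = [], 1.0
for _ in range(5000):
    prop = theta + rng.normal(0, 0.05)
    if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
        theta = prop
    samples.append(theta)
post = np.array(samples[1000:])   # drop burn-in
```

The retained samples approximate the posterior distribution of the parameter, from which credible intervals (the "refined posterior distributions" of the abstract) follow directly.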
Abstract:
Purpose: To assess time trends of testicular cancer (TC) mortality in Spain for the period 1985-2019 for age groups 15-74 years old through a Bayesian age-period-cohort (APC) analysis. Methods: A Bayesian age-drift model has been fitted to describe trends. Projections for 2005-2019 have been calculated by means of an autoregressive APC model. Prior precision for these parameters has been selected through evaluation of an adaptive precision parameter, and 95% credible intervals (95% CRI) have been obtained for each model parameter. Results: A decrease of -2.41% (95% CRI: -3.65%; -1.13%) per year has been found for TC mortality rates in age groups 15-74 during 1985-2004, whereas mortality showed a lower annual decrease when the data were restricted to age groups 15-54 (-1.18%; 95% CRI: -2.60%; -0.31%). During 2005-2019, TC mortality is expected to decrease by 2.30% per year for men younger than 35, whereas a leveling off of TC mortality rates is expected for men older than 35. Conclusions: A Bayesian approach should be recommended to describe and project time trends for diseases with a low number of cases. Through this model it has been assessed that management of TC and advances in therapy led to a decreasing trend of TC mortality during 1985-2004, whereas a leveling off of these trends can be expected during 2005-2019 among men older than 35.