979 results for Methods : Statistical
Abstract:
Approximate Bayesian computation (ABC) methods make use of comparisons between simulated and observed summary statistics to overcome the problem of computationally intractable likelihood functions. As the practical implementation of ABC requires computations based on vectors of summary statistics, rather than full data sets, a central question is how to derive low-dimensional summary statistics from the observed data with minimal loss of information. In this article we provide a comprehensive review and comparison of the performance of the principal methods of dimension reduction proposed in the ABC literature. The methods are split into three non-mutually exclusive classes consisting of best subset selection methods, projection techniques and regularization. In addition, we introduce two new methods of dimension reduction. The first is a best subset selection method based on Akaike and Bayesian information criteria, and the second uses ridge regression as a regularization procedure. We illustrate the performance of these dimension reduction techniques through the analysis of three challenging models and data sets.
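A minimal sketch of the regularization idea, assuming a toy normal model, prior, tolerance and sample sizes that are not taken from the article: ridge regression maps a vector of candidate summary statistics onto a single projected summary, which is then used in plain rejection ABC.

```python
# Sketch: ridge-regression projection of summary statistics for rejection ABC.
# The toy model (normal with unknown mean), prior, tolerance and sample sizes
# are illustrative assumptions, not the examples analysed in the article.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

def simulate(theta, n=50):
    """Simulate one data set and return a vector of candidate summary statistics."""
    x = rng.normal(theta, 1.0, size=n)
    return np.array([x.mean(), np.median(x), x.std(), x.min(), x.max()])

# Pilot simulations from the prior theta ~ N(0, 5^2)
theta_pilot = rng.normal(0.0, 5.0, size=2000)
S_pilot = np.array([simulate(t) for t in theta_pilot])

# Ridge regression of theta on the summaries defines a one-dimensional projection
proj = Ridge(alpha=1.0).fit(S_pilot, theta_pilot)

# Rejection ABC on the projected summary
s_obs = proj.predict(simulate(2.0).reshape(1, -1))[0]   # "observed" data with theta = 2
theta_ref = rng.normal(0.0, 5.0, size=20000)
s_ref = proj.predict(np.array([simulate(t) for t in theta_ref]))
eps = np.quantile(np.abs(s_ref - s_obs), 0.01)          # keep the closest 1%
posterior = theta_ref[np.abs(s_ref - s_obs) <= eps]
print(posterior.mean(), posterior.std())
```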
Abstract:
We investigate the initialization of Northern-hemisphere sea ice in the global climate model ECHAM5/MPI-OM by assimilating sea-ice concentration data. The analysis updates for concentration are given by Newtonian relaxation, and we discuss different ways of specifying the analysis updates for mean thickness. Because the conservation of mean ice thickness or actual ice thickness in the analysis updates leads to poor assimilation performance, we introduce a proportional dependence between concentration and mean thickness analysis updates. Assimilation with these proportional mean-thickness analysis updates significantly reduces assimilation error both in identical-twin experiments and when assimilating sea-ice observations, reducing the concentration error by a factor of four to six, and the thickness error by a factor of two. To understand the physical aspects of assimilation errors, we construct a simple prognostic model of the sea-ice thermodynamics, and analyse its response to the assimilation. We find that the strong dependence of thermodynamic ice growth on ice concentration necessitates an adjustment of mean ice thickness in the analysis update. To understand the statistical aspects of assimilation errors, we study the model background error covariance between ice concentration and ice thickness. We find that the spatial structure of covariances is best represented by the proportional mean-thickness analysis updates. Both physical and statistical evidence supports the experimental finding that proportional mean-thickness updates are superior to the other two methods considered and enable us to assimilate sea ice in a global climate model using simple Newtonian relaxation.
Abstract:
We investigate the initialisation of Northern Hemisphere sea ice in the global climate model ECHAM5/MPI-OM by assimilating sea-ice concentration data. The analysis updates for concentration are given by Newtonian relaxation, and we discuss different ways of specifying the analysis updates for mean thickness. Because the conservation of mean ice thickness or actual ice thickness in the analysis updates leads to poor assimilation performance, we introduce a proportional dependence between concentration and mean thickness analysis updates. Assimilation with these proportional mean-thickness analysis updates leads to good assimilation performance for sea-ice concentration and thickness, both in identical-twin experiments and when assimilating sea-ice observations. The simulation of other Arctic surface fields in the coupled model is, however, not significantly improved by the assimilation. To understand the physical aspects of assimilation errors, we construct a simple prognostic model of the sea-ice thermodynamics, and analyse its response to the assimilation. We find that an adjustment of mean ice thickness in the analysis update is essential to arrive at plausible state estimates. To understand the statistical aspects of assimilation errors, we study the model background error covariance between ice concentration and ice thickness. We find that the spatial structure of covariances is best represented by the proportional mean-thickness analysis updates. Both physical and statistical evidence supports the experimental finding that assimilation with proportional mean-thickness updates outperforms the other two methods considered. The method described here is very simple to implement, and gives results that are sufficiently good to be used for initialising sea ice in a global climate model for seasonal to decadal predictions.
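A schematic of the analysis step described above, for a single grid point: concentration is relaxed toward the observation, and the mean-thickness increment is taken proportional to the concentration increment. The relaxation time scale and the proportionality factor are placeholder assumptions, not values from the ECHAM5/MPI-OM experiments.

```python
# Schematic single-gridpoint analysis update. The relaxation time scale and the
# proportionality factor between the thickness and concentration increments are
# illustrative placeholders, not values from the ECHAM5/MPI-OM experiments.
import numpy as np

def analysis_update(c_b, h_b, c_obs, dt_days=1.0, tau_days=5.0, alpha=1.5):
    """
    c_b   : background sea-ice concentration (0..1)
    h_b   : background mean ice thickness (m, averaged over the grid cell)
    c_obs : observed concentration
    alpha : assumed proportionality (m of mean thickness per unit of concentration)
    """
    k = dt_days / tau_days                  # Newtonian relaxation weight
    dc = k * (c_obs - c_b)                  # concentration analysis increment
    dh = alpha * dc                         # proportional mean-thickness increment
    c_a = float(np.clip(c_b + dc, 0.0, 1.0))
    h_a = max(h_b + dh, 0.0)
    return c_a, h_a

print(analysis_update(c_b=0.6, h_b=1.2, c_obs=0.8))
```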
Abstract:
A statistical–dynamical downscaling (SDD) approach for the regionalization of wind energy output (Eout) over Europe with special focus on Germany is proposed. SDD uses an extended circulation weather type (CWT) analysis on global daily mean sea level pressure fields with the central point being located over Germany. Seventy-seven weather classes based on the associated CWT and the intensity of the geostrophic flow are identified. Representatives of these classes are dynamically downscaled with the regional climate model COSMO-CLM. By using weather class frequencies of different data sets, the simulated representatives are recombined into probability density functions (PDFs) of near-surface wind speed and finally into Eout of a sample wind turbine for present and future climate. This is performed for reanalysis, decadal hindcasts and long-term future projections. For evaluation purposes, results of SDD are compared to wind observations and to simulated Eout of purely dynamical downscaling (DD) methods. For the present climate, SDD is able to simulate realistic PDFs of 10-m wind speed for most stations in Germany. The resulting spatial Eout patterns are similar to DD-simulated Eout. In terms of decadal hindcasts, results of SDD are similar to DD-simulated Eout over Germany, Poland, the Czech Republic, and Benelux, for which high correlations between annual Eout time series of SDD and DD are detected for selected hindcasts. Lower correlation is found for other European countries. It is demonstrated that SDD can be used to downscale the full ensemble of the Earth System Model of the Max Planck Institute (MPI-ESM) decadal prediction system. Long-term climate change projections in the Special Report on Emissions Scenarios of ECHAM5/MPI-OM as obtained by SDD agree well with the results of other studies using DD methods, with increasing Eout over northern Europe and a negative trend over southern Europe. Despite some biases, it is concluded that SDD is an adequate tool to assess regional wind energy changes in large model ensembles.
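A sketch of the recombination step under assumed inputs: class-conditional wind-speed PDFs are weighted by weather-class frequencies and converted to energy output with a simplified turbine power curve. The Weibull class PDFs, frequencies and power-curve parameters are illustrative, not COSMO-CLM output.

```python
# Sketch of the recombination step: class-conditional wind-speed PDFs weighted by
# weather-class frequencies, then converted to energy output with a power curve.
# The Weibull class PDFs, frequencies and turbine parameters are illustrative only.
import numpy as np
from scipy import stats

v = np.linspace(0.0, 30.0, 301)                       # wind-speed grid (m/s)
dv = v[1] - v[0]

# Hypothetical class-conditional PDFs (one Weibull per weather class)
class_pdfs = np.vstack([stats.weibull_min(c=2.0, scale=s).pdf(v) for s in (5.0, 8.0, 11.0)])
class_freq = np.array([0.5, 0.35, 0.15])              # class frequencies from one data set

pdf_v = class_freq @ class_pdfs                       # recombined mixture PDF

def power_curve(v, v_in=3.5, v_rated=13.0, v_out=25.0, p_rated=2.0e6):
    """Simplified turbine power curve (W): cubic ramp-up, rated plateau, cut-out."""
    p = np.where((v >= v_in) & (v < v_rated),
                 p_rated * ((v - v_in) / (v_rated - v_in)) ** 3, 0.0)
    return np.where((v >= v_rated) & (v <= v_out), p_rated, p)

eout_gwh = (power_curve(v) * pdf_v).sum() * dv * 8760.0 / 1e9   # expected annual Eout
print(f"expected annual energy output: {eout_gwh:.2f} GWh")
```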
Abstract:
BACKGROUND: Social networks are common in digital health. A new stream of research is beginning to investigate the mechanisms of digital health social networks (DHSNs), how they are structured, how they function, and how their growth can be nurtured and managed. DHSNs increase in value when additional content is added, and the structure of networks may resemble the characteristics of power laws. Power laws are contrary to traditional Gaussian averages in that they demonstrate correlated phenomena. OBJECTIVES: The objective of this study is to investigate whether the distribution frequency in four DHSNs can be characterized as following a power law. A second objective is to describe the method used to determine the comparison. METHODS: Data from four DHSNs—Alcohol Help Center (AHC), Depression Center (DC), Panic Center (PC), and Stop Smoking Center (SSC)—were compared to power law distributions. To assist future researchers and managers, the 5-step methodology used to analyze and compare datasets is described. RESULTS: All four DHSNs were found to have right-skewed distributions, indicating the data were not normally distributed. When power trend lines were added to each frequency distribution, R² values indicated that, to a very high degree, the variance in post frequencies can be explained by actor rank (AHC .962, DC .975, PC .969, SSC .95). Spearman correlations provided further indication of the strength and statistical significance of the relationship (AHC .987, DC .967, PC .983, SSC .993, P<.001). CONCLUSIONS: This is the first study to investigate power distributions across multiple DHSNs, each addressing a unique condition. Results indicate that despite vast differences in theme, content, and length of existence, DHSNs follow properties of power laws. The structure of DHSNs is important as it gives insight to researchers and managers into the nature and mechanisms of network functionality. The 5-step process undertaken to compare actor contribution patterns can be replicated in networks that are managed by other organizations, and we conjecture that patterns observed in this study could be found in other DHSNs. Future research should analyze network growth over time and examine the characteristics and survival rates of superusers.
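A sketch of the comparison described in the methods, on synthetic post counts rather than the four DHSN datasets: a power trend line is fitted in log-log space to the rank-ordered frequencies, and R² and Spearman's rho are reported.

```python
# Power trend line in log-log space plus Spearman correlation, on synthetic
# per-actor post counts rather than the AHC/DC/PC/SSC data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
posts = np.sort(rng.pareto(a=1.2, size=500) + 1.0)[::-1]   # right-skewed post counts
rank = np.arange(1, posts.size + 1)

# Fit log(posts) = b0 + b1 * log(rank); R^2 measures variance explained by rank
slope, intercept, r_value, p_value, stderr = stats.linregress(np.log(rank), np.log(posts))
print(f"power-law exponent ~ {slope:.2f}, R^2 = {r_value ** 2:.3f}")

rho, p_spear = stats.spearmanr(rank, posts)
print(f"Spearman rho = {rho:.3f}, P = {p_spear:.3g}")
```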
Abstract:
Knowing how much misalignment is tolerable for a particle accelerator is an important input for the design of these machines. In particle accelerators the beam must be guided and focused using bending magnets and magnetic lenses, respectively. The alignment of the lenses along a transport line aims to ensure that the beam passes through their optical axes and is a critical point in the assembly of the machine. There are more and more accelerators in the world, many of which are very small machines, yet the existing literature and programs are mostly targeted at large machines; in this work we therefore describe a method suitable for small machines. The method consists of statistically determining the alignment tolerance of a set of lenses. Unlike the methods used in standard simulation codes for particle accelerators, the statistical method we propose makes it possible to evaluate particle losses as a function of the alignment accuracy of the optical elements in a transport line. Results for 100 keV electrons on the 3.5-m-long conforming beam stage of the IFUSP Microtron are presented as an example of use.
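A simplified Monte Carlo sketch of the statistical idea, with a placeholder thin-lens beam line rather than the IFUSP Microtron lattice: lens offsets are drawn at a given alignment tolerance, particles are tracked through drifts and displaced lenses, and the fraction lost at the aperture is recorded.

```python
# Simplified Monte Carlo sketch: particle losses versus lens alignment tolerance.
# Thin-lens transport with drifts; the lattice, aperture and beam parameters are
# placeholders, not the IFUSP Microtron conforming beam stage.
import numpy as np

rng = np.random.default_rng(2)
n_particles, n_lenses = 2000, 10
drift_len, focal_len, aperture = 0.35, 0.5, 5e-3       # m

def loss_fraction(tolerance_rms, n_lines=200):
    losses = []
    for _ in range(n_lines):                           # many randomly misaligned lines
        offsets = rng.normal(0.0, tolerance_rms, size=n_lenses)
        x = rng.normal(0.0, 0.5e-3, size=n_particles)  # initial positions (m)
        xp = rng.normal(0.0, 0.5e-3, size=n_particles) # initial angles (rad)
        alive = np.ones(n_particles, dtype=bool)
        for i, d in enumerate(offsets):
            x = x + drift_len * xp                     # drift
            f = focal_len if i % 2 == 0 else -focal_len  # alternating thin lenses
            xp = xp - (x - d) / f                      # lens displaced transversely by d
            alive &= np.abs(x) < aperture
        losses.append(1.0 - alive.mean())
    return float(np.mean(losses))

for tol in (0.1e-3, 0.5e-3, 1.0e-3):
    print(f"tolerance {tol * 1e3:.1f} mm -> mean loss {loss_fraction(tol):.1%}")
```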
Abstract:
We report a statistical analysis of Doppler broadening coincidence data of electron-positron annihilation radiation in silicon using a ²²Na source. The Doppler broadening coincidence spectrum was fit using a model function that included positron annihilation at rest with 1s, 2s, 2p, and valence band electrons. In-flight positron annihilation was also fit. The response functions of the detectors accounted for backscattering, combinations of Compton effects, pileup, ballistic deficit, and pulse-shaping problems. The procedure allows the quantitative determination of the intensities of positron annihilation with core and valence electrons, as well as their standard deviations, directly from the experimental spectrum. The results obtained for the core and valence band electron annihilation intensities were 2.56(9)% and 97.44(9)%, respectively. These intensities are consistent with published experimental data treated by conventional analysis methods. This new procedure has the advantage of allowing one to distinguish additional effects from those associated with the detection system response function.
Abstract:
The topology of real-world complex networks, such as in transportation and communication, is always changing with time. Such changes can arise not only as a natural consequence of their growth, but also due to major modifications in their intrinsic organization. For instance, the network of transportation routes between cities and towns (hence locations) of a given country undergoes a major change with the progressive implementation of commercial air transportation. While the locations could originally be interconnected through highways (paths, giving rise to geographical networks), transportation between those sites progressively shifted to or was complemented by air transportation, with scale-free characteristics. In the present work we introduce the path-star transformation (in its uniform and preferential versions) as a means to model such network transformations where paths give rise to stars of connectivity. It is also shown, through optimal multivariate statistical methods (i.e. canonical projections and maximum likelihood classification), that while the US highways network adheres closely to a geographical network model, its path-star transformation yields a network whose topological properties closely resemble those of the respective airport transportation network.
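One possible reading of the uniform and preferential path-star steps, sketched on a toy grid graph standing in for a geographical network; this is an interpretation for illustration, not necessarily the authors' exact construction: a path is sampled, a hub on it is chosen (uniformly, or preferentially by degree), and every node on the path is connected to that hub.

```python
# Rough sketch of a path-star step: sample a path in the geographical network,
# pick a hub on it (uniformly, or preferentially by degree), and connect every
# node of the path to the hub. One interpretation for illustration only.
import random
import networkx as nx

random.seed(3)

def path_star_step(g, preferential=False):
    u, v = random.sample(list(g.nodes), 2)
    path = nx.shortest_path(g, u, v)                  # a path of the original network
    if preferential:
        hub = random.choices(path, weights=[g.degree(n) for n in path], k=1)[0]
    else:
        hub = random.choice(path)
    for n in path:                                    # the path gives rise to a star
        if n != hub:
            g.add_edge(hub, n)

g = nx.grid_2d_graph(10, 10)                          # stand-in for a geographical network
for _ in range(200):
    path_star_step(g, preferential=False)
print(g.number_of_edges(), "edges after 200 uniform path-star steps")
```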
Abstract:
We propose a likelihood ratio test (LRT) with Bartlett correction in order to identify Granger causality between sets of time series gene expression data. The performance of the proposed test is compared to a previously published bootstrap-based approach. The LRT is shown to be significantly faster and statistically powerful even under non-normal distributions. An R package named gGranger containing an implementation of both Granger causality identification tests is also provided.
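A compact sketch of a bivariate Granger-causality LRT of the kind described, with a generic small-sample scaling standing in for the Bartlett correction; the exact correction used in the paper and in gGranger may differ.

```python
# Sketch of a Granger-causality likelihood ratio test for "x Granger-causes y".
# The small-sample scaling below is a generic Bartlett-type factor used only for
# illustration; the correction in the paper / gGranger may differ.
import numpy as np
from scipy import stats

def granger_lrt(y, x, p=2):
    """LR test comparing an AR(p) model of y with and without p lags of x."""
    n = len(y)
    lags = lambda z: np.array([z[t - p:t][::-1] for t in range(p, n)])
    Y = y[p:]
    X0 = np.column_stack([np.ones(len(Y)), lags(y)])            # restricted model
    X1 = np.column_stack([X0, lags(x)])                         # adds lags of x
    rss0 = np.sum((Y - X0 @ np.linalg.lstsq(X0, Y, rcond=None)[0]) ** 2)
    rss1 = np.sum((Y - X1 @ np.linalg.lstsq(X1, Y, rcond=None)[0]) ** 2)
    T = len(Y)
    lr = T * np.log(rss0 / rss1)                                # LR statistic, df = p
    bartlett = 1.0 - (X1.shape[1] + 1.0) / (2.0 * T)            # illustrative correction
    return lr * bartlett, stats.chi2.sf(lr * bartlett, df=p)

rng = np.random.default_rng(4)
x = rng.normal(size=300)
y = np.zeros(300)
for t in range(1, 300):
    y[t] = 0.4 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()       # y driven by lagged x
print(granger_lrt(y, x, p=2))
```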
Abstract:
Quadratic assignment problems (QAPs) are commonly solved by heuristic methods, where the optimum is sought iteratively. Heuristics are known to provide good solutions, but the quality of those solutions, i.e., a confidence interval for the optimum, is unknown. This paper uses statistical optimum estimation techniques (SOETs) to assess the quality of genetic algorithm solutions for QAPs. We examine the performance of different SOETs in terms of bias, coverage rate and interval length, and we then compare the SOET lower bound with deterministic ones. The commonly used deterministic bounds are confined to only a few algorithms. We show that the Jackknife estimators have better performance than Weibull estimators, and when the number of heuristic solutions is as large as 100, higher-order Jackknife estimators perform better than lower-order ones. Compared with the deterministic bounds, the SOET lower bound performs significantly better than most deterministic lower bounds and is comparable with the best deterministic ones.
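A sketch of two SOET point estimates of the unknown optimum from repeated heuristic runs, on synthetic objective values: a first-order jackknife-type estimator from the two best values, and the location parameter of a fitted three-parameter Weibull. The higher-order estimators and interval constructions examined in the paper are not reproduced.

```python
# Two SOET-style point estimates of the unknown optimum from repeated heuristic
# runs, on synthetic objective values (true optimum 1000): a first-order
# jackknife-type estimator and the location of a fitted three-parameter Weibull.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
values = np.sort(1000.0 + 50.0 * rng.weibull(a=2.0, size=100))   # 100 GA run results

jk_first_order = 2.0 * values[0] - values[1]         # jackknife-type minimum estimate

c, loc, scale = stats.weibull_min.fit(values)        # three-parameter Weibull MLE
print(f"jackknife estimate of optimum : {jk_first_order:.1f}")
print(f"Weibull location estimate     : {loc:.1f}")
```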
Predictive models for chronic renal disease using decision trees, naïve Bayes and case-based methods
Abstract:
Data mining can be used in the healthcare industry to “mine” clinical data and discover hidden information for intelligent and effective decision making. The discovery of hidden patterns and relationships often goes untapped, yet advanced data mining techniques can help remedy this scenario. This thesis mainly deals with Intelligent Prediction of Chronic Renal Disease (IPCRD). The data cover blood tests, urine tests, and external symptoms used to predict chronic renal disease. Data from the database are first imported into Weka (3.6), and the Chi-Square method is used for feature selection. After normalizing the data, three classifiers are applied and the efficiency of the output is evaluated: Decision Tree, Naïve Bayes, and the K-Nearest Neighbour algorithm. Results show that each technique has its unique strength in realizing the objectives of the defined mining goals. The efficiency of the Decision Tree and KNN was almost the same, but Naïve Bayes proved to have a comparative edge over the others. Sensitivity and specificity tests are further used as statistical measures to examine the performance of the binary classification. Sensitivity (also called recall rate in some fields) measures the proportion of actual positives that are correctly identified, while specificity measures the proportion of negatives that are correctly identified. The CRISP-DM methodology is applied to build the mining models. It consists of six major phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.
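The sensitivity and specificity measures mentioned above, computed from a confusion matrix on made-up labels and predictions:

```python
# Sensitivity (recall) and specificity from a binary confusion matrix; the
# labels and predictions below are made up for illustration.
def sensitivity_specificity(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)      # proportion of actual positives correctly identified
    specificity = tn / (tn + fp)      # proportion of actual negatives correctly identified
    return sensitivity, specificity

y_true = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
print(sensitivity_specificity(y_true, y_pred))   # (0.8, 0.8)
```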
Abstract:
Generalized linear mixed models are flexible tools for modeling non-normal data and are useful for accommodating overdispersion in Poisson regression models with random effects. Their main difficulty resides in parameter estimation, because there is no analytic solution for the maximization of the marginal likelihood. Many methods have been proposed for this purpose and many of them are implemented in software packages. The purpose of this study is to compare the performance of three different statistical principles (marginal likelihood, extended likelihood, and Bayesian analysis) via simulation studies. Real data on contact wrestling are used for illustration.
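A small numerical illustration of why the marginal likelihood has no analytic solution here, using synthetic Poisson counts with a normal random intercept (not the contact-wrestling data): the random effect is integrated out by Gauss-Hermite quadrature and the resulting marginal likelihood is maximized numerically.

```python
# Poisson counts with a normal random intercept per cluster: the random effect is
# integrated out by Gauss-Hermite quadrature and the marginal likelihood is
# maximized numerically. Synthetic data, not the contact-wrestling example.
import numpy as np
from numpy.polynomial.hermite import hermgauss
from scipy.special import gammaln
from scipy.optimize import minimize

rng = np.random.default_rng(6)
n_clusters, n_per = 30, 10
b_true = rng.normal(0.0, 0.7, n_clusters)                    # random intercepts
y = rng.poisson(np.exp(1.0 + b_true)[:, None], size=(n_clusters, n_per))

nodes, weights = hermgauss(30)                               # Gauss-Hermite rule

def neg_marginal_loglik(params):
    beta0, log_sigma = params
    sigma = np.exp(log_sigma)
    ll = 0.0
    for yi in y:                                             # integrate out b cluster-wise
        eta = np.clip(beta0 + np.sqrt(2.0) * sigma * nodes, -30.0, 30.0)
        mu = np.exp(eta)
        log_pois = yi[:, None] * np.log(mu) - mu - gammaln(yi[:, None] + 1.0)
        ll += np.log(np.sum(weights * np.exp(log_pois.sum(axis=0))) / np.sqrt(np.pi))
    return -ll

fit = minimize(neg_marginal_loglik, x0=[0.0, 0.0], method="Nelder-Mead")
print("beta0 ~", fit.x[0], " sigma ~", np.exp(fit.x[1]))
```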
Abstract:
Researchers analyzing spatiotemporal or panel data, which varies both in location and over time, often find that their data has holes or gaps. This thesis explores alternative methods for filling those gaps and also suggests a set of techniques for evaluating those gap-filling methods to determine which works best.
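One way to make that evaluation concrete, sketched on synthetic panel data with two placeholder gap-filling methods: known values are hidden artificially, each method fills the gaps, and the methods are scored by RMSE on the held-out cells.

```python
# Hide known values, fill the artificial gaps with each candidate method, and
# score against the hidden truth. The panel data and the two candidate methods
# are illustrative placeholders.
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
t = np.arange(120)
panel = pd.DataFrame({f"site_{i}": np.sin(t / 12.0 + i) + 0.1 * rng.normal(size=t.size)
                      for i in range(5)})

mask = rng.random(panel.shape) < 0.15                 # hide 15% of the known cells
with_gaps = panel.mask(mask)

candidates = {
    "linear_interp": with_gaps.interpolate(method="linear", limit_direction="both"),
    "column_mean":   with_gaps.fillna(with_gaps.mean()),
}
for name, filled in candidates.items():
    err = (filled - panel).where(mask)                # errors on held-out cells only
    rmse = float(np.sqrt((err ** 2).mean().mean()))
    print(f"{name:14s} RMSE on held-out cells: {rmse:.3f}")
```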
Abstract:
Data available on continuous-time diffusions are always sampled discretely in time. In most cases, the likelihood function of the observations is not directly computable. This survey covers a sample of the statistical methods that have been developed to solve this problem. We concentrate on some recent contributions to the literature based on three different approaches to the problem: an improvement of the Euler-Maruyama discretization scheme, the use of Martingale Estimating Functions and the application of Generalized Method of Moments (GMM).
Abstract:
Data available on continuous-time diffusions are always sampled discretely in time. In most cases, the likelihood function of the observations is not directly computable. This survey covers a sample of the statistical methods that have been developed to solve this problem. We concentrate on some recent contributions to the literature based on three different approaches to the problem: an improvement of the Euler-Maruyama discretization scheme, the employment of Martingale Estimating Functions, and the application of Generalized Method of Moments (GMM).
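A minimal illustration of the first approach, on a simulated Ornstein-Uhlenbeck process: the Euler-Maruyama scheme replaces the unknown transition density with a Gaussian approximation, giving a tractable pseudo-likelihood; the refined schemes, estimating functions, and GMM estimators covered in the survey are not shown.

```python
# Euler-Maruyama pseudo-likelihood for a discretely sampled Ornstein-Uhlenbeck
# diffusion dX_t = kappa*(theta - X_t) dt + sigma dW_t. Illustrates only the
# discretization approach; estimating functions and GMM are not shown.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(8)
dt, n = 0.1, 2000
kappa0, theta0, sigma0 = 1.5, 0.5, 0.3

x = np.empty(n)
x[0] = theta0
for t in range(n - 1):                                # simulate the observed path
    x[t + 1] = x[t] + kappa0 * (theta0 - x[t]) * dt + sigma0 * np.sqrt(dt) * rng.normal()

def neg_pseudo_loglik(params):
    kappa, theta, log_sigma = params
    var = np.exp(2.0 * log_sigma) * dt                # Euler transition variance
    mean = x[:-1] + kappa * (theta - x[:-1]) * dt     # Euler transition mean
    return 0.5 * np.sum(np.log(2.0 * np.pi * var) + (x[1:] - mean) ** 2 / var)

fit = minimize(neg_pseudo_loglik, x0=[1.0, 0.0, np.log(0.5)], method="Nelder-Mead")
print("kappa ~", fit.x[0], " theta ~", fit.x[1], " sigma ~", np.exp(fit.x[2]))
```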