893 resultados para variable sample size
Resumo:
This paper gives a review of recent progress in the design of numerical methods for computing the trajectories (sample paths) of solutions to stochastic differential equations. We give a brief survey of the area focusing on a number of application areas where approximations to strong solutions are important, with a particular focus on computational biology applications, and give the necessary analytical tools for understanding some of the important concepts associated with stochastic processes. We present the stochastic Taylor series expansion as the fundamental mechanism for constructing effective numerical methods, give general results that relate local and global order of convergence and mention the Magnus expansion as a mechanism for designing methods that preserve the underlying structure of the problem. We also present various classes of explicit and implicit methods for strong solutions, based on the underlying structure of the problem. Finally, we discuss implementation issues relating to maintaining the Brownian path, efficient simulation of stochastic integrals and variable-step-size implementations based on various types of control.
Resumo:
Background Reliable information on causes of death is a fundamental component of health development strategies, yet globally only about one-third of countries have access to such information. For countries currently without adequate mortality reporting systems there are useful models other than resource-intensive population-wide medical certification. Sample-based mortality surveillance is one such approach. This paper provides methods for addressing appropriate sample size considerations in relation to mortality surveillance, with particular reference to situations in which prior information on mortality is lacking. Methods The feasibility of model-based approaches for predicting the expected mortality structure and cause composition is demonstrated for populations in which only limited empirical data is available. An algorithm approach is then provided to derive the minimum person-years of observation needed to generate robust estimates for the rarest cause of interest in three hypothetical populations, each representing different levels of health development. Results Modelled life expectancies at birth and cause of death structures were within expected ranges based on published estimates for countries at comparable levels of health development. Total person-years of observation required in each population could be more than halved by limiting the set of age, sex, and cause groups regarded as 'of interest'. Discussion The methods proposed are consistent with the philosophy of establishing priorities across broad clusters of causes for which the public health response implications are similar. The examples provided illustrate the options available when considering the design of mortality surveillance for population health monitoring purposes.
Resumo:
Registration of births, recording deaths by age, sex and cause, and calculating mortality levels and differentials are fundamental to evidence-based health policy, monitoring and evaluation. Yet few of the countries with the greatest need for these data have functioning systems to produce them despite legislation providing for the establishment and maintenance of vital registration. Sample vital registration (SVR), when applied in conjunction with validated verbal autopsy, procedures and implemented in a nationally representative sample of population clusters represents an affordable, cost-effective, and sustainable short- and medium-term solution to this problem. SVR complements other information sources by producing age-, sex-, and cause-specific mortality data that are more complete and continuous than those currently available. The tools and methods employed in an SVR system, however, are imperfect and require rigorous validation and continuous quality assurance; sampling strategies for SVR are also still evolving. Nonetheless, interest in establishing SVR is rapidly growing in Africa and Asia. Better systems for reporting and recording data on vital events will be sustainable only if developed hand-in-hand with existing health information strategies at the national and district levels; governance structures; and agendas for social research and development monitoring. If the global community wishes to have mortality measurements 5 or 10 years hence, the foundation stones of SVR must be laid today.
Bias, precision and heritability of self-reported and clinically measured height in Australian twins
Resumo:
Many studies of quantitative and disease traits in human genetics rely upon self-reported measures. Such measures are based on questionnaires or interviews and are often cheaper and more readily available than alternatives. However, the precision and potential bias cannot usually be assessed. Here we report a detailed quantitative genetic analysis of stature. We characterise the degree of measurement error by utilising a large sample of Australian twin pairs (857 MZ, 815 DZ) with both clinical and self-reported measures of height. Self-report height measurements are shown to be more variable than clinical measures. This has led to lowered estimates of heritability in many previous studies of stature. In our twin sample the heritability estimate for clinical height exceeded 90%. Repeated measures analysis shows that 2-3 times as many self-report measures are required to recover heritability estimates similar to those obtained from clinical measures. Bivariate genetic repeated measures analysis of self-report and clinical height measures showed an additive genetic correlation > 0.98. We show that the accuracy of self-report height is upwardly biased in older individuals and in individuals of short stature. By comparing clinical and self-report measures we also showed that there was a genetic component to females systematically reporting their height incorrectly; this phenomenon appeared to not be present in males. The results from the measurement error analysis were subsequently used to assess the effects of error on the power to detect linkage in a genome scan. Moderate reduction in error (through the use of accurate clinical or multiple self-report measures) increased the effective sample size by 22%; elimination of measurement error led to increases in effective sample size of 41%.
Resumo:
Among the Solar System’s bodies, Moon, Mercury and Mars are at present, or have been in the recent years, object of space missions aimed, among other topics, also at improving our knowledge about surface composition. Between the techniques to detect planet’s mineralogical composition, both from remote and close range platforms, visible and near-infrared reflectance (VNIR) spectroscopy is a powerful tool, because crystal field absorption bands are related to particular transitional metals in well-defined crystal structures, e.g., Fe2+ in M1 and M2 sites of olivine or pyroxene (Burns, 1993). Thanks to the improvements in the spectrometers onboard the recent missions, a more detailed interpretation of the planetary surfaces can now be delineated. However, quantitative interpretation of planetary surface mineralogy could not always be a simple task. In fact, several factors such as the mineral chemistry, the presence of different minerals that absorb in a narrow spectral range, the regolith with a variable particle size range, the space weathering, the atmosphere composition etc., act in unpredictable ways on the reflectance spectra on a planetary surface (Serventi et al., 2014). One method for the interpretation of reflectance spectra of unknown materials involves the study of a number of spectra acquired in the laboratory under different conditions, such as different mineral abundances or different particle sizes, in order to derive empirical trends. This is the methodology that has been followed in this PhD thesis: the single factors previously listed have been analyzed, creating, in the laboratory, a set of terrestrial analogues with well-defined composition and size. The aim of this work is to provide new tools and criteria to improve the knowledge of the composition of planetary surfaces. In particular, mixtures composed with different content and chemistry of plagioclase and mafic minerals have been spectroscopically analyzed at different particle sizes and with different mineral relative percentages. The reflectance spectra of each mixture have been analyzed both qualitatively (using the software ORIGIN®) and quantitatively applying the Modified Gaussian Model (MGM, Sunshine et al., 1990) algorithm. In particular, the spectral parameter variations of each absorption band have been evaluated versus the volumetric FeO% content in the PL phase and versus the PL modal abundance. This delineated calibration curves of composition vs. spectral parameters and allow implementation of spectral libraries. Furthermore, the trends derived from terrestrial analogues here analyzed and from analogues in the literature have been applied for the interpretation of hyperspectral images of both plagioclase-rich (Moon) and plagioclase-poor (Mars) bodies.
Resumo:
The performance of seven minimization algorithms are compared on five neural network problems. These include a variable-step-size algorithm, conjugate gradient, and several methods with explicit analytic or numerical approximations to the Hessian.
Resumo:
The Vapnik-Chervonenkis (VC) dimension is a combinatorial measure of a certain class of machine learning problems, which may be used to obtain upper and lower bounds on the number of training examples needed to learn to prescribed levels of accuracy. Most of the known bounds apply to the Probably Approximately Correct (PAC) framework, which is the framework within which we work in this paper. For a learning problem with some known VC dimension, much is known about the order of growth of the sample-size requirement of the problem, as a function of the PAC parameters. The exact value of sample-size requirement is however less well-known, and depends heavily on the particular learning algorithm being used. This is a major obstacle to the practical application of the VC dimension. Hence it is important to know exactly how the sample-size requirement depends on VC dimension, and with that in mind, we describe a general algorithm for learning problems having VC dimension 1. Its sample-size requirement is minimal (as a function of the PAC parameters), and turns out to be the same for all non-trivial learning problems having VC dimension 1. While the method used cannot be naively generalised to higher VC dimension, it suggests that optimal algorithm-dependent bounds may improve substantially on current upper bounds.
Resumo:
Researchers often use 3-way interactions in moderated multiple regression analysis to test the joint effect of 3 independent variables on a dependent variable. However, further probing of significant interaction terms varies considerably and is sometimes error prone. The authors developed a significance test for slope differences in 3-way interactions and illustrate its importance for testing psychological hypotheses. Monte Carlo simulations revealed that sample size, magnitude of the slope difference, and data reliability affected test power. Application of the test to published data yielded detection of some slope differences that were undetected by alternative probing techniques and led to changes of results and conclusions. The authors conclude by discussing the test's applicability for psychological research. Copyright 2006 by the American Psychological Association.
Resumo:
Multiple regression analysis is a complex statistical method with many potential uses. It has also become one of the most abused of all statistical procedures since anyone with a data base and suitable software can carry it out. An investigator should always have a clear hypothesis in mind before carrying out such a procedure and knowledge of the limitations of each aspect of the analysis. In addition, multiple regression is probably best used in an exploratory context, identifying variables that might profitably be examined by more detailed studies. Where there are many variables potentially influencing Y, they are likely to be intercorrelated and to account for relatively small amounts of the variance. Any analysis in which R squared is less than 50% should be suspect as probably not indicating the presence of significant variables. A further problem relates to sample size. It is often stated that the number of subjects or patients must be at least 5-10 times the number of variables included in the study.5 This advice should be taken only as a rough guide but it does indicate that the variables included should be selected with great care as inclusion of an obviously unimportant variable may have a significant impact on the sample size required.
Resumo:
The principal theme of this thesis is the identification of additional factors affecting, and consequently to better allow, the prediction of soft contact lens fit. Various models have been put forward in an attempt to predict the parameters that influence soft contact lens fit dynamics; however, the factors that influence variation in soft lens fit are still not fully understood. The investigations in this body of work involved the use of a variety of different imaging techniques to both quantify the anterior ocular topography and assess lens fit. The use of Anterior-Segment Optical Coherence Tomography (AS-OCT) allowed for a more complete characterisation of the cornea and corneoscleral profile (CSP) than either conventional keratometry or videokeratoscopy alone, and for the collection of normative data relating to the CSP for a substantial sample size. The scleral face was identified as being rotationally asymmetric, the mean corneoscleral junction (CSJ) angle being sharpest nasally and becoming progressively flatter at the temporal, inferior and superior limbal junctions. Additionally, 77% of all CSJ angles were within ±50 of 1800, demonstrating an almost tangential extension of the cornea to form the paralimbal sclera. Use of AS-OCT allowed for a more robust determination of corneal diameter than that of white-to-white (WTW) measurement, which is highly variable and dependent on changes in peripheral corneal transparency. Significant differences in ocular topography were found between different ethnicities and sexes, most notably for corneal diameter and corneal sagittal height variables. Lens tightness was found to be significantly correlated with the difference between horizontal CSJ angles (r =+0.40, P =0.0086). Modelling of the CSP data gained allowed for prediction of up to 24% of the variance in contact lens fit; however, it was likely that stronger associations and an increase in the modelled prediction of variance in fit may have occurred had an objective method of lens fit assessment have been made. A subsequent investigation to determine the validity and repeatability of objective contact lens fit assessment using digital video capture showed no significant benefit over subjective evaluation. The technique, however, was employed in the ensuing investigation to show significant changes in lens fit between 8 hours (the longest duration of wear previously examined) and 16 hours, demonstrating that wearing time is an additional factor driving lens fit dynamics. The modelling of data from enhanced videokeratoscopy composite maps alone allowed for up to 77% of the variance in soft contact lens fit, and up to almost 90% to be predicted when used in conjunction with OCT. The investigations provided further insight into the ocular topography and factors affecting soft contact lens fit.
Resumo:
* This work was financially supported by RFBR-04-01-00858.
Resumo:
* This work was financially supported by RFBR-04-01-00858.
Resumo:
The Highway Safety Manual (HSM) estimates roadway safety performance based on predictive models that were calibrated using national data. Calibration factors are then used to adjust these predictive models to local conditions for local applications. The HSM recommends that local calibration factors be estimated using 30 to 50 randomly selected sites that experienced at least a total of 100 crashes per year. It also recommends that the factors be updated every two to three years, preferably on an annual basis. However, these recommendations are primarily based on expert opinions rather than data-driven research findings. Furthermore, most agencies do not have data for many of the input variables recommended in the HSM. This dissertation is aimed at determining the best way to meet three major data needs affecting the estimation of calibration factors: (1) the required minimum sample sizes for different roadway facilities, (2) the required frequency for calibration factor updates, and (3) the influential variables affecting calibration factors. In this dissertation, statewide segment and intersection data were first collected for most of the HSM recommended calibration variables using a Google Maps application. In addition, eight years (2005-2012) of traffic and crash data were retrieved from existing databases from the Florida Department of Transportation. With these data, the effect of sample size criterion on calibration factor estimates was first studied using a sensitivity analysis. The results showed that the minimum sample sizes not only vary across different roadway facilities, but they are also significantly higher than those recommended in the HSM. In addition, results from paired sample t-tests showed that calibration factors in Florida need to be updated annually. To identify influential variables affecting the calibration factors for roadway segments, the variables were prioritized by combining the results from three different methods: negative binomial regression, random forests, and boosted regression trees. Only a few variables were found to explain most of the variation in the crash data. Traffic volume was consistently found to be the most influential. In addition, roadside object density, major and minor commercial driveway densities, and minor residential driveway density were also identified as influential variables.
Resumo:
We present new Holocene century to millennial-scale proxies for the well-dated piston core MD99-2269 from Húnaflóadjúp on the North Iceland Shelf. The core is located in 365 mwd and lies close to the fluctuating boundary between Atlantic and Arctic/Polar waters. The proxies are: alkenone-based SST°C, and Mg/Ca SST°C estimates and stable d13C and d18O values on planktonic and benthic foraminifera. The data were converted to 60 yr equi-spaced time-series. Significant trends in the data were extracted using Singular Spectrum Analysis and these accounted for between 50% and 70% of the variance. A comparison between these data with previously published climate proxies from MD99-2269 was carried out on a data set which consisted of 14-variable data set covering the interval 400-9200 cal yr BP at 100 yr time steps. This analysis indicated that the 1st two PC axes accounted for 57% of the variability with high loadings clustering primarily into "nutrient" and "temperature" proxies. Clustering on the 100 yr time-series indicated major changes in environment at ~6350 and ~3450 cal yr BP, which define early, mid- and late Holocene climatic intervals. We argue that a pervasive freshwater cap during the early Holocene resulted in warm SST°s, a stratified water column, and a depleted nutrient supply. The loss of the freshwater layer in the mid-Holocene resulted in high carbonate production, and the late Holocene/neoglacial interval was marked by significantly more variable sea surface conditions.
Resumo:
Fitting statistical models is computationally challenging when the sample size or the dimension of the dataset is huge. An attractive approach for down-scaling the problem size is to first partition the dataset into subsets and then fit using distributed algorithms. The dataset can be partitioned either horizontally (in the sample space) or vertically (in the feature space), and the challenge arise in defining an algorithm with low communication, theoretical guarantees and excellent practical performance in general settings. For sample space partitioning, I propose a MEdian Selection Subset AGgregation Estimator ({\em message}) algorithm for solving these issues. The algorithm applies feature selection in parallel for each subset using regularized regression or Bayesian variable selection method, calculates the `median' feature inclusion index, estimates coefficients for the selected features in parallel for each subset, and then averages these estimates. The algorithm is simple, involves very minimal communication, scales efficiently in sample size, and has theoretical guarantees. I provide extensive experiments to show excellent performance in feature selection, estimation, prediction, and computation time relative to usual competitors.
While sample space partitioning is useful in handling datasets with large sample size, feature space partitioning is more effective when the data dimension is high. Existing methods for partitioning features, however, are either vulnerable to high correlations or inefficient in reducing the model dimension. In the thesis, I propose a new embarrassingly parallel framework named {\em DECO} for distributed variable selection and parameter estimation. In {\em DECO}, variables are first partitioned and allocated to m distributed workers. The decorrelated subset data within each worker are then fitted via any algorithm designed for high-dimensional problems. We show that by incorporating the decorrelation step, DECO can achieve consistent variable selection and parameter estimation on each subset with (almost) no assumptions. In addition, the convergence rate is nearly minimax optimal for both sparse and weakly sparse models and does NOT depend on the partition number m. Extensive numerical experiments are provided to illustrate the performance of the new framework.
For datasets with both large sample sizes and high dimensionality, I propose a new "divided-and-conquer" framework {\em DEME} (DECO-message) by leveraging both the {\em DECO} and the {\em message} algorithm. The new framework first partitions the dataset in the sample space into row cubes using {\em message} and then partition the feature space of the cubes using {\em DECO}. This procedure is equivalent to partitioning the original data matrix into multiple small blocks, each with a feasible size that can be stored and fitted in a computer in parallel. The results are then synthezied via the {\em DECO} and {\em message} algorithm in a reverse order to produce the final output. The whole framework is extremely scalable.