970 resultados para SAMPLING METHODS
Resumo:
Contexte. Les études cas-témoins sont très fréquemment utilisées par les épidémiologistes pour évaluer l’impact de certaines expositions sur une maladie particulière. Ces expositions peuvent être représentées par plusieurs variables dépendant du temps, et de nouvelles méthodes sont nécessaires pour estimer de manière précise leurs effets. En effet, la régression logistique qui est la méthode conventionnelle pour analyser les données cas-témoins ne tient pas directement compte des changements de valeurs des covariables au cours du temps. Par opposition, les méthodes d’analyse des données de survie telles que le modèle de Cox à risques instantanés proportionnels peuvent directement incorporer des covariables dépendant du temps représentant les histoires individuelles d’exposition. Cependant, cela nécessite de manipuler les ensembles de sujets à risque avec précaution à cause du sur-échantillonnage des cas, en comparaison avec les témoins, dans les études cas-témoins. Comme montré dans une étude de simulation précédente, la définition optimale des ensembles de sujets à risque pour l’analyse des données cas-témoins reste encore à être élucidée, et à être étudiée dans le cas des variables dépendant du temps. Objectif: L’objectif général est de proposer et d’étudier de nouvelles versions du modèle de Cox pour estimer l’impact d’expositions variant dans le temps dans les études cas-témoins, et de les appliquer à des données réelles cas-témoins sur le cancer du poumon et le tabac. Méthodes. J’ai identifié de nouvelles définitions d’ensemble de sujets à risque, potentiellement optimales (le Weighted Cox model and le Simple weighted Cox model), dans lesquelles différentes pondérations ont été affectées aux cas et aux témoins, afin de refléter les proportions de cas et de non cas dans la population source. Les propriétés des estimateurs des effets d’exposition ont été étudiées par simulation. Différents aspects d’exposition ont été générés (intensité, durée, valeur cumulée d’exposition). Les données cas-témoins générées ont été ensuite analysées avec différentes versions du modèle de Cox, incluant les définitions anciennes et nouvelles des ensembles de sujets à risque, ainsi qu’avec la régression logistique conventionnelle, à des fins de comparaison. Les différents modèles de régression ont ensuite été appliqués sur des données réelles cas-témoins sur le cancer du poumon. Les estimations des effets de différentes variables de tabac, obtenues avec les différentes méthodes, ont été comparées entre elles, et comparées aux résultats des simulations. Résultats. Les résultats des simulations montrent que les estimations des nouveaux modèles de Cox pondérés proposés, surtout celles du Weighted Cox model, sont bien moins biaisées que les estimations des modèles de Cox existants qui incluent ou excluent simplement les futurs cas de chaque ensemble de sujets à risque. De plus, les estimations du Weighted Cox model étaient légèrement, mais systématiquement, moins biaisées que celles de la régression logistique. L’application aux données réelles montre de plus grandes différences entre les estimations de la régression logistique et des modèles de Cox pondérés, pour quelques variables de tabac dépendant du temps. Conclusions. Les résultats suggèrent que le nouveau modèle de Cox pondéré propose pourrait être une alternative intéressante au modèle de régression logistique, pour estimer les effets d’expositions dépendant du temps dans les études cas-témoins
Resumo:
Background: An important challenge in conducting social research of specific relevance to harm reduction programs is locating hidden populations of consumers of substances like cannabis who typically report few adverse or unwanted consequences of their use. Much of the deviant, pathologized perception of drug users is historically derived from, and empirically supported, by a research emphasis on gaining ready access to users in drug treatment or in prison populations with higher incidence of problems of dependence and misuse. Because they are less visible, responsible recreational users of illicit drugs have been more difficult to study. Methods: This article investigates Respondent Driven Sampling (RDS) as a method of recruiting experienced marijuana users representative of users in the general population. Based on sampling conducted in a multi-city study (Halifax, Montreal, Toronto, and Vancouver), and compared to samples gathered using other research methods, we assess the strengths and weaknesses of RDS recruitment as a means of gaining access to illicit substance users who experience few harmful consequences of their use. Demographic characteristics of the sample in Toronto are compared with those of users in a recent household survey and a pilot study of Toronto where the latter utilized nonrandom self-selection of respondents. Results: A modified approach to RDS was necessary to attain the target sample size in all four cities (i.e., 40 'users' from each site). The final sample in Toronto was largely similar, however, to marijuana users in a random household survey that was carried out in the same city. Whereas well-educated, married, whites and females in the survey were all somewhat overrepresented, the two samples, overall, were more alike than different with respect to economic status and employment. Furthermore, comparison with a self-selected sample suggests that (even modified) RDS recruitment is a cost-effective way of gathering respondents who are more representative of users in the general population than nonrandom methods of recruitment ordinarily produce. Conclusions: Research on marijuana use, and other forms of drug use hidden in the general population of adults, is important for informing and extending harm reduction beyond its current emphasis on 'at-risk' populations. Expanding harm reduction in a normalizing context, through innovative research on users often overlooked, further challenges assumptions about reducing harm through prohibition of drug use and urges consideration of alternative policies such as decriminalization and legal regulation.
Resumo:
Study on variable stars is an important topic of modern astrophysics. After the invention of powerful telescopes and high resolving powered CCD’s, the variable star data is accumulating in the order of peta-bytes. The huge amount of data need lot of automated methods as well as human experts. This thesis is devoted to the data analysis on variable star’s astronomical time series data and hence belong to the inter-disciplinary topic, Astrostatistics. For an observer on earth, stars that have a change in apparent brightness over time are called variable stars. The variation in brightness may be regular (periodic), quasi periodic (semi-periodic) or irregular manner (aperiodic) and are caused by various reasons. In some cases, the variation is due to some internal thermo-nuclear processes, which are generally known as intrinsic vari- ables and in some other cases, it is due to some external processes, like eclipse or rotation, which are known as extrinsic variables. Intrinsic variables can be further grouped into pulsating variables, eruptive variables and flare stars. Extrinsic variables are grouped into eclipsing binary stars and chromospheri- cal stars. Pulsating variables can again classified into Cepheid, RR Lyrae, RV Tauri, Delta Scuti, Mira etc. The eruptive or cataclysmic variables are novae, supernovae, etc., which rarely occurs and are not periodic phenomena. Most of the other variations are periodic in nature. Variable stars can be observed through many ways such as photometry, spectrophotometry and spectroscopy. The sequence of photometric observa- xiv tions on variable stars produces time series data, which contains time, magni- tude and error. The plot between variable star’s apparent magnitude and time are known as light curve. If the time series data is folded on a period, the plot between apparent magnitude and phase is known as phased light curve. The unique shape of phased light curve is a characteristic of each type of variable star. One way to identify the type of variable star and to classify them is by visually looking at the phased light curve by an expert. For last several years, automated algorithms are used to classify a group of variable stars, with the help of computers. Research on variable stars can be divided into different stages like observa- tion, data reduction, data analysis, modeling and classification. The modeling on variable stars helps to determine the short-term and long-term behaviour and to construct theoretical models (for eg:- Wilson-Devinney model for eclips- ing binaries) and to derive stellar properties like mass, radius, luminosity, tem- perature, internal and external structure, chemical composition and evolution. The classification requires the determination of the basic parameters like pe- riod, amplitude and phase and also some other derived parameters. Out of these, period is the most important parameter since the wrong periods can lead to sparse light curves and misleading information. Time series analysis is a method of applying mathematical and statistical tests to data, to quantify the variation, understand the nature of time-varying phenomena, to gain physical understanding of the system and to predict future behavior of the system. Astronomical time series usually suffer from unevenly spaced time instants, varying error conditions and possibility of big gaps. This is due to daily varying daylight and the weather conditions for ground based observations and observations from space may suffer from the impact of cosmic ray particles. Many large scale astronomical surveys such as MACHO, OGLE, EROS, xv ROTSE, PLANET, Hipparcos, MISAO, NSVS, ASAS, Pan-STARRS, Ke- pler,ESA, Gaia, LSST, CRTS provide variable star’s time series data, even though their primary intention is not variable star observation. Center for Astrostatistics, Pennsylvania State University is established to help the astro- nomical community with the aid of statistical tools for harvesting and analysing archival data. Most of these surveys releases the data to the public for further analysis. There exist many period search algorithms through astronomical time se- ries analysis, which can be classified into parametric (assume some underlying distribution for data) and non-parametric (do not assume any statistical model like Gaussian etc.,) methods. Many of the parametric methods are based on variations of discrete Fourier transforms like Generalised Lomb-Scargle peri- odogram (GLSP) by Zechmeister(2009), Significant Spectrum (SigSpec) by Reegen(2007) etc. Non-parametric methods include Phase Dispersion Minimi- sation (PDM) by Stellingwerf(1978) and Cubic spline method by Akerlof(1994) etc. Even though most of the methods can be brought under automation, any of the method stated above could not fully recover the true periods. The wrong detection of period can be due to several reasons such as power leakage to other frequencies which is due to finite total interval, finite sampling interval and finite amount of data. Another problem is aliasing, which is due to the influence of regular sampling. Also spurious periods appear due to long gaps and power flow to harmonic frequencies is an inherent problem of Fourier methods. Hence obtaining the exact period of variable star from it’s time series data is still a difficult problem, in case of huge databases, when subjected to automation. As Matthew Templeton, AAVSO, states “Variable star data analysis is not always straightforward; large-scale, automated analysis design is non-trivial”. Derekas et al. 2007, Deb et.al. 2010 states “The processing of xvi huge amount of data in these databases is quite challenging, even when looking at seemingly small issues such as period determination and classification”. It will be beneficial for the variable star astronomical community, if basic parameters, such as period, amplitude and phase are obtained more accurately, when huge time series databases are subjected to automation. In the present thesis work, the theories of four popular period search methods are studied, the strength and weakness of these methods are evaluated by applying it on two survey databases and finally a modified form of cubic spline method is intro- duced to confirm the exact period of variable star. For the classification of new variable stars discovered and entering them in the “General Catalogue of Vari- able Stars” or other databases like “Variable Star Index“, the characteristics of the variability has to be quantified in term of variable star parameters.
Resumo:
The prediction of climate variability and change requires the use of a range of simulation models. Multiple climate model simulations are needed to sample the inherent uncertainties in seasonal to centennial prediction. Because climate models are computationally expensive, there is a tradeoff between complexity, spatial resolution, simulation length, and ensemble size. The methods used to assess climate impacts are examined in the context of this trade-off. An emphasis on complexity allows simulation of coupled mechanisms, such as the carbon cycle and feedbacks between agricultural land management and climate. In addition to improving skill, greater spatial resolution increases relevance to regional planning. Greater ensemble size improves the sampling of probabilities. Research from major international projects is used to show the importance of synergistic research efforts. The primary climate impact examined is crop yield, although many of the issues discussed are relevant to hydrology and health modeling. Methods used to bridge the scale gap between climate and crop models are reviewed. Recent advances include large-area crop modeling, quantification of uncertainty in crop yield, and fully integrated crop–climate modeling. The implications of trends in computer power, including supercomputers, are also discussed.
Resumo:
The Representative Soil Sampling Scheme of England and Wales has recorded information on the soil of agricultural land in England and Wales since 1969. It is a valuable source of information about the soil in the context of monitoring for sustainable agricultural development. Changes in soil nutrient status and pH were examined over the period 1971-2001. Several methods of statistical analysis were applied to data from the surveys during this period. The main focus here is on the data for 1971, 1981, 1991 and 2001. The results of examining change over time in general show that levels of potassium in the soil have increased, those of magnesium have remained fairly constant, those of phosphorus have declined and pH has changed little. Future sampling needs have been assessed in the context of monitoring, to determine the mean at a given level of confidence and tolerable error and to detect change in the mean over time at these same levels over periods of 5 and 10 years. The results of a non-hierarchical multivariate classification suggest that England and Wales could be stratified to optimize future sampling and analysis. To monitor soil quality and health more generally than for agriculture, more of the country should be sampled and a wider range of properties recorded.
Resumo:
The systematic sampling (SYS) design (Madow and Madow, 1944) is widely used by statistical offices due to its simplicity and efficiency (e.g., Iachan, 1982). But it suffers from a serious defect, namely, that it is impossible to unbiasedly estimate the sampling variance (Iachan, 1982) and usual variance estimators (Yates and Grundy, 1953) are inadequate and can overestimate the variance significantly (Särndal et al., 1992). We propose a novel variance estimator which is less biased and that can be implemented with any given population order. We will justify this estimator theoretically and with a Monte Carlo simulation study.
Resumo:
This paper presents our experience with combining statistical principles and participatory methods to generate national statistics. The methodology was developed in Malawi during 1999–2002. We demonstrate that if PRA is combined with statistical principles (including probability-based sampling and standardization), it can produce total population statistics and estimates of the proportion of households with certain characteristics (e.g., poverty). It can also provide quantitative data on complex issues of national importance such as poverty targeting. This approach is distinct from previous PRA-based approaches, which generate numbers at community level but only provide qualitative information at national level.
Resumo:
Imputation is commonly used to compensate for item non-response in sample surveys. If we treat the imputed values as if they are true values, and then compute the variance estimates by using standard methods, such as the jackknife, we can seriously underestimate the true variances. We propose a modified jackknife variance estimator which is defined for any without-replacement unequal probability sampling design in the presence of imputation and non-negligible sampling fraction. Mean, ratio and random-imputation methods will be considered. The practical advantage of the method proposed is its breadth of applicability.
Resumo:
Phylogenetic methods hold great promise for the reconstruction of the transition from precursor to modern flora and the identification of underlying factors which drive the process. The phylogenetic methods presently used to address the question of the origin of the Cape flora of South Africa are considered here. The sampling requirements of each of these methods, which include dating of diversifications using calibrated molecular trees, sister pair comparisons, lineage through time plots and biogeographical optimizations are reviewed. Sampling of genes, genomes and species are considered. Although increased higher-level studies and increased sampling are required for robust interpretation, it is clear that much progress is already made. It is argued that despite the remarkable richness of the flora, the Cape flora is a valuable model system to demonstrate the utility of phylogenetic methods in determining the history of a modern flora.
Resumo:
The sampling of certain solid angle is a fundamental operation in realistic image synthesis, where the rendering equation describing the light propagation in closed domains is solved. Monte Carlo methods for solving the rendering equation use sampling of the solid angle subtended by unit hemisphere or unit sphere in order to perform the numerical integration of the rendering equation. In this work we consider the problem for generation of uniformly distributed random samples over hemisphere and sphere. Our aim is to construct and study the parallel sampling scheme for hemisphere and sphere. First we apply the symmetry property for partitioning of hemisphere and sphere. The domain of solid angle subtended by a hemisphere is divided into a number of equal sub-domains. Each sub-domain represents solid angle subtended by orthogonal spherical triangle with fixed vertices and computable parameters. Then we introduce two new algorithms for sampling of orthogonal spherical triangles. Both algorithms are based on a transformation of the unit square. Similarly to the Arvo's algorithm for sampling of arbitrary spherical triangle the suggested algorithms accommodate the stratified sampling. We derive the necessary transformations for the algorithms. The first sampling algorithm generates a sample by mapping of the unit square onto orthogonal spherical triangle. The second algorithm directly compute the unit radius vector of a sampling point inside to the orthogonal spherical triangle. The sampling of total hemisphere and sphere is performed in parallel for all sub-domains simultaneously by using the symmetry property of partitioning. The applicability of the corresponding parallel sampling scheme for Monte Carlo and Quasi-D/lonte Carlo solving of rendering equation is discussed.
Resumo:
Consideration is given to a standard CDMA system and determination of the density function of the interference with and without Gaussian noise using sampling theory concepts. The formula derived provides fast and accurate results and is a simple, useful alternative to other methods
Resumo:
Weeds tend to aggregate in patches within fields and there is evidence that this is partly owing to variation in soil properties. Because the processes driving soil heterogeneity operate at different scales, the strength of the relationships between soil properties and weed density would also be expected to be scale-dependent. Quantifying these effects of scale on weed patch dynamics is essential to guide the design of discrete sampling protocols for mapping weed distribution. We have developed a general method that uses novel within-field nested sampling and residual maximum likelihood (REML) estimation to explore scale-dependent relationships between weeds and soil properties. We have validated the method using a case study of Alopecurus myosuroides in winter wheat. Using REML, we partitioned the variance and covariance into scale-specific components and estimated the correlations between the weed counts and soil properties at each scale. We used variograms to quantify the spatial structure in the data and to map variables by kriging. Our methodology successfully captured the effect of scale on a number of edaphic drivers of weed patchiness. The overall Pearson correlations between A. myosuroides and soil organic matter and clay content were weak and masked the stronger correlations at >50 m. Knowing how the variance was partitioned across the spatial scales we optimized the sampling design to focus sampling effort at those scales that contributed most to the total variance. The methods have the potential to guide patch spraying of weeds by identifying areas of the field that are vulnerable to weed establishment.
Resumo:
This paper presents a critique of current methods of sampling and analyzing soils for metals in archaeological prospection. Commonly used methodologies in soil science are shown to be suitable for archaeological investigations, with a concomitant improvement in their resolution. Understanding the soil-fraction location, concentration range, and spatial distribution of autochthonous (native) soil metals is shown to be a vital precursor to archaeological-site investigations, as this is the background upon which anthropogenic deposition takes place. Nested sampling is suggested as the most cost-effective method of investigating the spatial variability in the autochthonous metal concentrations. The use of the appropriate soil horizon (or sampling depth) and point sampling are critical in the preparation of a sampling regime. Simultaneous extraction is proposed as the most efficient method of identifying the location and eventual fate of autochthonous and anthropogenic metals, respectively.
Resumo:
Sea surface temperature (SST) data are often provided as gridded products, typically at resolutions of order 0.05 degrees from satellite observations to reduce data volume at the request of data users and facilitate comparison against other products or models. Sampling uncertainty is introduced in gridded products where the full surface area of the ocean within a grid cell cannot be fully observed because of cloud cover. In this paper we parameterise uncertainties in SST as a function of the percentage of clear-sky pixels available and the SST variability in that subsample. This parameterisation is developed from Advanced Along Track Scanning Radiometer (AATSR) data, but is applicable to all gridded L3U SST products at resolutions of 0.05-0.1 degrees, irrespective of instrument and retrieval algorithm, provided that instrument noise propagated into the SST is accounted for. We also calculate the sampling uncertainty of ~0.04 K in Global Area Coverage (GAC) Advanced Very High Resolution Radiometer (AVHRR) products, using related methods.