947 results for stratified random sampling


Relevance:

100.00%

Abstract:

With Tweet volumes reaching 500 million a day, sampling is inevitable for any application using Twitter data. Recognizing this, data providers such as Twitter, Gnip and Boardreader license sampled data streams priced according to sample size. Big Data applications that work with sampled data need a sample that is large enough to be representative of the full dataset. Previous work on representativeness has sought to ensure that the global occurrence rates of key terms can be reliably estimated from the sample. Existing techniques allow the sample size to be estimated from probabilistic bounds on occurrence rates for the case of uniform random sampling. In this paper, we consider the problem of further improving sample size estimates by leveraging stratification in Twitter data. We analyze our estimates through an extensive study using simulations and real-world data, establishing the superiority of our method over uniform random sampling. Our work provides the technical know-how for data providers to expand their portfolio to include stratified sampled datasets, while applications benefit from being able to monitor more topics and events at the same data and computing cost.
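The gain from stratification can be illustrated with a textbook calculation. The sketch below is not the paper's estimator: it assumes a normal approximation, proportional allocation, no finite population correction, and hypothetical stratum weights and per-stratum occurrence rates for a key term.

```python
import math

def uniform_sample_size(rate, margin, z=1.96):
    """Sample size for estimating an occurrence rate to within +/- margin
    under uniform (simple) random sampling, normal approximation."""
    return math.ceil(z ** 2 * rate * (1 - rate) / margin ** 2)

def stratified_sample_size(weights, rates, margin, z=1.96):
    """Total sample size under proportional allocation across strata.
    weights: stratum shares of the population (sum to 1)
    rates:   occurrence rate of the term within each stratum"""
    within_variance = sum(w * p * (1 - p) for w, p in zip(weights, rates))
    return math.ceil(z ** 2 * within_variance / margin ** 2)

# Hypothetical example: a term that is rare in one stratum and common in another.
weights = [0.5, 0.5]
rates = [0.1, 0.5]
overall_rate = sum(w * p for w, p in zip(weights, rates))

n_uniform = uniform_sample_size(overall_rate, margin=0.01)
n_stratified = stratified_sample_size(weights, rates, margin=0.01)
print(n_uniform, n_stratified)
```

Under these assumptions the weighted within-stratum variance never exceeds the overall variance, so the stratified total is never larger than the uniform one; the saving grows as the strata differ more in their occurrence rates.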

Relevance:

90.00%

Abstract:

OBJECTIVE: To estimate the prevalence of chronic obstructive pulmonary disease and its associated factors. METHODS: Cross-sectional, population-based study of 1,441 individuals of both sexes aged 40 years or older in the city of São Paulo, SP, between 2008 and 2009. Information was collected through household interviews, and participants were selected by probability sampling, stratified by sex and age, with two-stage cluster sampling (census tracts and households). Multiple Poisson regression was used for the adjusted analysis. RESULTS: Of those interviewed, 4.2% (95% CI 3.1-5.4) reported chronic obstructive pulmonary disease. After adjusted analysis, the following factors were independently associated with the condition: number of cigarettes smoked in a lifetime (> 1,500/none), PR = 3.85 (95% CI 1.87-7.94); tiring easily (yes/no), PR = 2.61 (95% CI 1.39-4.90); age (60 to 69 years/50 to 59 years), PR = 3.27 (95% CI 1.01-11.24); age (70 years or older/50 to 59 years), PR = 4.29 (95% CI 1.30-14.29); health problems in the previous 15 days (yes/no), PR = 1.31 (95% CI 1.02-1.77); and leisure-time physical activity (yes/no), PR = 0.57 (95% CI 0.26-0.97). CONCLUSIONS: The prevalence of chronic obstructive pulmonary disease is high and is associated with tobacco use and age over 60 years. Frequent health problems and reduced leisure-time physical activity may be considered consequences of this disease.

Relevance:

90.00%

Abstract:

Epidemiological studies of drug misusers have until recently relied on two main forms of sampling: probability and convenience. The former has been used when the aim was simply to estimate the prevalence of the condition, and the latter when in-depth studies of the characteristics, profiles and behaviour of drug users were required, but each method has its limitations. Probability samples become impracticable when the prevalence of the condition is very low (less than 0.5%, for example) or when the condition being studied is a clandestine activity such as illicit drug use. When stratified random samples are used, it may be difficult to obtain a truly representative sample, depending on the quality of the information used to develop the stratification strategy. The main limitation of studies using convenience samples is that the results cannot be generalised to the whole population of drug users, owing to selection bias and a lack of information about the sampling frame. New methods have been developed that aim to overcome some of these difficulties, for example social network analysis, snowball sampling, capture-recapture techniques, the privileged access interviewer method and contact tracing. All these methods have been applied to the study of drug misuse. The various methods are described and examples of their use are given, drawn from both the Brazilian and the international drug misuse literature.

Relevance:

90.00%

Abstract:

Introduction - The prevalence of chronic obstructive pulmonary disease (COPD) varies widely around the world. The Burden of Obstructive Lung Disease (BOLD) initiative was developed so that COPD prevalence can be assessed with a standardized methodology. The aim of this study was to estimate the prevalence of COPD in adults aged 40 years or older in a target population of 2,700,000 inhabitants in the Lisbon region, according to the BOLD protocol. Methods - A stratified, multistage random sample was drawn by selecting 12 parishes (freguesias). The survey comprised a questionnaire covering risk factors for COPD and self-reported respiratory disease; in addition, spirometry with bronchodilator testing was performed. Results - A total of 710 participants with acceptable questionnaire and spirometry data were included. The estimated population prevalence of COPD was 14.2% (95% CI: 11.1-18.1) at GOLD stage I+ and 7.3% (95% CI: 4.7-11.3) at stage II+. The unadjusted prevalence was 20.2% (95% CI: 17.4-23.3) at stage I+ and 9.5% (95% CI: 7.6-11.9) at stage II+. The prevalence of GOLD stage II+ COPD increased with age and was higher in men. The estimated prevalence of GOLD stage I+ COPD was 9.2% (95% CI: 5.9-14.0) in non-smokers versus 27.4% (95% CI: 18.5-38.5) in smokers with a smoking history of ≥ 20 pack-years. Agreement between a reported previous medical diagnosis and the spirometric diagnosis was poor, with 86.8% underdiagnosis. Conclusions - The estimated COPD prevalence of 14.2% suggests that this is a common disease in the Lisbon region, albeit with a high proportion of underdiagnosis. These data point to the need to raise health professionals' awareness of COPD and to make greater use of spirometry in primary care.

Relevance:

90.00%

Abstract:

Dissertation submitted for the degree of Master in Informatics Engineering (Engenharia Informática)

Relevance:

90.00%

Abstract:

Objectives: To identify factors that correlate with insulin values and to examine their independent associations among adolescents. Methods: A cross-sectional population-based study was conducted among adolescents aged 12 to 16.9 years. A multi-stage stratified cluster random sampling method was employed. Anthropometric measurements and a nutritional survey were performed, and fasting blood samples for insulin were obtained. Statistics: Multiple linear regression. Results: 379 adolescents were included. Mean age was 14.08 ± 1.30 years. Factors associated with higher fasting insulin levels were puberty [4.55 (95% CI 0.42 to 8.69)], abdominal obesity [6.11 (95% CI 3.93 to 8.29)] and being born small for gestational age (SGA) [7.45 (95% CI 2.47 to 12.44)]. A negative association was observed between the regular intake of olive oil at home and insulin values [-4.14 (95% CI -7.31 to -0.98)]. Conclusions: Abdominal obesity and SGA were factors associated with higher fasting insulin values. In contrast, the regular intake of olive oil at home was an independent protective factor.

Relevance:

90.00%

Abstract:

BACKGROUND: Mental and body weight disorders are among the major global health challenges, and their comorbidity may play an important role in the treatment and prevention of both pathologies. A growing number of studies have examined the relationship between psychiatric status and body weight, but our knowledge is still limited. OBJECTIVE: The present study aims to investigate the cross-sectional relationships between psychiatric status and body mass index (BMI) in Málaga, a Mediterranean city in the south of Spain. MATERIALS AND METHODS: A total of 563 participants were recruited from among patients attending their primary care physician, using systematic random sampling with non-proportional stratification by BMI category. Structured clinical interviews were used to assess current Axis I and Axis II mental disorders according to the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, Text Revision (DSM-IV-TR). BMI was calculated as weight (kg) divided by the square of height in meters (m²). Logistic regression was used to investigate the association between BMI and the presence of any mental disorder. BMI was introduced in the models using restricted cubic splines. RESULTS: We found that high BMI values were directly associated with mood and adjustment disorders, and low BMI values were directly associated with avoidant and dependent personality disorders (PDs). We observed an inverse relationship between low BMI values and cluster A PDs. There were no significant relationships between anxiety or substance-related disorders and BMI. CONCLUSION: Psychiatric status and BMI are related in a Mediterranean Spanish population. A multidisciplinary approach to both pathologies is becoming increasingly necessary.

Relevance:

90.00%

Abstract:

Aim: Specialized mutualistic clades may revert and thus increase their autonomy and generalist characteristics. However, our understanding of the drivers that trigger reductions in mutualistic traits and of the consequences for the tolerance of these species to various environmental conditions remains limited. This study investigates the relationship between the environmental niche and the degree of myrmecophily (i.e. the ability to interact with ants) among members of the Lycaenidae. Location: The western Swiss Alps. Methods: We measured the tolerance of Lycaenidae species to low temperatures by comparing observations from a random stratified field sampling with climatic maps. We then compared the species-specific degree of myrmecophily with the species range limits at colder temperatures while controlling for phylogenetic dependence. We further evaluated whether the community-averaged degree of myrmecophily increases with temperature, as would be expected in the case of environmental filters acting on myrmecophilous species. Results: Twenty-nine Lycaenidae species were found during sampling. Ancestral state reconstruction indicated that the 24 species of Polyommatinae displayed both strong myrmecophily and secondary loss of mutualism; these species were used in the subsequent statistical analyses. Species with a higher degree of ant interaction were, on average, more likely to inhabit warmer sites. Species inhabiting the coldest environments displayed little or no interaction with ants. Main conclusions: Colder climates at high elevations filter out species with a high degree of myrmecophily and may have been the direct evolutionary force that promoted the loss of mutualism. A larger taxon sampling across the Holarctic may help to distinguish between the ecological and evolutionary effects of climate.

Relevance:

90.00%

Abstract:

OBJECTIVE: Accuracy studies of Patient Safety Indicators (PSIs) are critical but limited by the large samples required due to low occurrence of most events. We tested a sampling design based on test results (verification-biased sampling [VBS]) that minimizes the number of subjects to be verified. METHODS: We considered 3 real PSIs, whose rates were calculated using 3 years of discharge data from a university hospital and a hypothetical screen of very rare events. Sample size estimates, based on the expected sensitivity and precision, were compared across 4 study designs: random and VBS, with and without constraints on the size of the population to be screened. RESULTS: Over sensitivities ranging from 0.3 to 0.7 and PSI prevalence levels ranging from 0.02 to 0.2, the optimal VBS strategy makes it possible to reduce sample size by up to 60% in comparison with simple random sampling. For PSI prevalence levels below 1%, the minimal sample size required was still over 5000. CONCLUSIONS: Verification-biased sampling permits substantial savings in the required sample size for PSI validation studies. However, sample sizes still need to be very large for many of the rarer PSIs.
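Why low event rates inflate the sample under random sampling can be seen from a rough calculation. The sketch below is not the verification-biased design tested in the study; it assumes a normal approximation for the sensitivity estimate and a hypothetical target half-width for its confidence interval.

```python
import math

def cases_needed(sensitivity, half_width, z=1.96):
    """Number of true events needed to estimate sensitivity to within
    +/- half_width (normal approximation)."""
    return math.ceil(z ** 2 * sensitivity * (1 - sensitivity) / half_width ** 2)

def records_needed_random(sensitivity, half_width, prevalence, z=1.96):
    """Records to review under simple random sampling so that the expected
    number of true events reaches cases_needed()."""
    return math.ceil(cases_needed(sensitivity, half_width, z) / prevalence)

# Hypothetical inputs: expected sensitivity 0.5, half-width 0.1.
for prevalence in (0.2, 0.02, 0.005):
    print(prevalence, records_needed_random(0.5, 0.1, prevalence))
```

The number of true events required depends only on the expected sensitivity and the precision sought, so the number of records to review scales inversely with prevalence.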

Relevance:

90.00%

Abstract:

Identifying the geographic distribution of populations is a basic yet crucial step in many fundamental and applied ecological projects, as it provides key information on which many subsequent analyses depend. However, this task is often costly and time consuming, especially where rare species are concerned and where most sampling designs generally prove inefficient. At the same time, rare species are those for which distribution data are most needed for their conservation to be effective. To enhance fieldwork sampling, model-based sampling (MBS) uses predictions from species distribution models: when looking for a species in areas of high habitat suitability, the chances of finding it should be higher. We thoroughly tested the efficiency of MBS by conducting an important survey in the Swiss Alps, assessing the detection rate of three rare and five common plant species. For each species, habitat suitability maps were produced following an ensemble modeling framework combining two spatial resolutions and two modeling techniques. We tested the efficiency of MBS and the accuracy of our models by sampling 240 sites in the field (30 sites × 8 species). Across all species, the MBS approach proved to be effective. In particular, the MBS design strictly led to the discovery of six sites of presence of one rare plant, increasing the chances of finding this species from 0 to 50%. For common species, MBS doubled the rate of discovery of new populations compared with random sampling. Habitat suitability maps obtained by combining four individual modeling methods predicted the species' distribution well, and more accurately than the individual models. In conclusion, using MBS for fieldwork could efficiently help to increase our knowledge of rare species distribution. More generally, we recommend using habitat suitability models to support conservation plans.

Relevance:

90.00%

Abstract:

Because data on rare species usually are sparse, it is important to have efficient ways to sample additional data. Traditional sampling approaches are of limited value for rare species because a very large proportion of randomly chosen sampling sites are unlikely to shelter the species. For these species, spatial predictions from niche-based distribution models can be used to stratify the sampling and increase sampling efficiency. New data sampled are then used to improve the initial model. Applying this approach repeatedly is an adaptive process that may allow increasing the number of new occurrences found. We illustrate the approach with a case study of a rare and endangered plant species in Switzerland and a simulation experiment. Our field survey confirmed that the method helps in the discovery of new populations of the target species in remote areas where the predicted habitat suitability is high. In our simulations the model-based approach provided a significant improvement (by a factor of 1.8 to 4 times, depending on the measure) over simple random sampling. In terms of cost this approach may save up to 70% of the time spent in the field.

Relevance:

90.00%

Abstract:

The study examined factors related to academic success at Lappeenranta University of Technology (LTKK). It was part of the development of teaching and learning in the Department of Industrial Engineering and Management. The framework of the study was a model explaining learning outcomes, drawn up on the basis of the model compiled by Tynjälä (1999). The study population consisted of LTKK's enrolled undergraduate students, excluding postgraduate and exchange students. Using stratified sampling, the students were divided into groups, within which simple random sampling was carried out. The sample size was 645 students. Data were collected by means of an Internet survey. The data were analysed using several quantitative and qualitative methods. The results of the study can be considered reliable, and the study yielded important and useful information on academic success and learning processes. According to the results, the factors most strongly and positively related to learning outcomes are a deep approach to studying and confidence in one's own abilities, while the negative factors are a lack of self-regulation of learning, doubt about one's own abilities and a surface approach to studying. Meaning-oriented students capable of self-regulation performed best at LTKK.
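The sampling design described in this abstract (strata first, then simple random sampling within each stratum) can be sketched as follows. This is a minimal illustration only: the population, the stratum labels and the equal sampling fraction are hypothetical and are not taken from the study.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=0):
    """Draw a simple random sample of the same fraction from every stratum.
    units:      list of population units
    stratum_of: function mapping a unit to its stratum label
    fraction:   sampling fraction applied within each stratum"""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for label in sorted(strata):
        members = strata[label]
        k = max(1, round(fraction * len(members)))  # at least one unit per stratum
        sample.extend(rng.sample(members, k))       # sampling without replacement
    return sample

# Hypothetical population: (student id, department) pairs.
students = [(i, "A" if i < 600 else "B" if i < 900 else "C") for i in range(1000)]
chosen = stratified_sample(students, stratum_of=lambda s: s[1], fraction=0.1)
print(len(chosen))
```

With an equal fraction in every stratum, each group is represented in the sample in proportion to its size in the population.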

Relevance:

90.00%

Abstract:

Imputation is often used in surveys to handle item non-response. It is well known that treating imputed values as observed values leads to substantial underestimation of the variance of point estimators. To remedy this problem, several variance estimation methods have been proposed in the literature, including adapted resampling methods such as the bootstrap and the jackknife. We define the concept of double robustness for point and variance estimation under the non-response model approach and the imputation model approach. We focus on variance estimation using the jackknife, which is often used in practice. We study the properties of various jackknife variance estimators for deterministic and random regression imputation. We first consider the case of simple random sampling. The cases of stratified sampling and unequal probability sampling are also studied. A simulation study compares several jackknife variance estimation methods in terms of bias and relative stability when the sampling fraction is not negligible. Finally, we establish the asymptotic normality of imputed estimators under deterministic and random regression imputation.
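For orientation, the basic delete-one jackknife that such methods build on can be sketched as follows. The sketch assumes a fully observed simple random sample and ignores both imputation and the finite population correction, so it does not reproduce the adjusted estimators studied in the thesis; the data values are hypothetical.

```python
import statistics

def jackknife_variance(values, estimator):
    """Delete-one jackknife variance estimate of estimator(values)."""
    n = len(values)
    # Recompute the estimator n times, leaving out one observation each time.
    replicates = [estimator(values[:i] + values[i + 1:]) for i in range(n)]
    center = sum(replicates) / n
    return (n - 1) / n * sum((r - center) ** 2 for r in replicates)

# Hypothetical fully observed sample (no imputation).
y = [4.0, 7.0, 1.0, 9.0, 5.0, 6.0]
v_jack = jackknife_variance(y, statistics.mean)
v_classic = statistics.variance(y) / len(y)
print(v_jack, v_classic)
```

For the sample mean, the delete-one jackknife coincides with the usual estimate s²/n, which makes it a convenient check before moving to estimators for which no closed form is available.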

Relevance:

90.00%

Abstract:

The EU Water Framework Directive (WFD) requires that the ecological and chemical status of water bodies in Europe should be assessed, and action taken where possible to ensure that at least "good" quality is attained in each case by 2015. This paper is concerned with the accuracy and precision with which chemical status in rivers can be measured given certain sampling strategies, and how this can be improved. High-frequency (hourly) chemical data from four rivers in southern England were subsampled to simulate different sampling strategies for four parameters used for WFD classification: dissolved phosphorus, dissolved oxygen, pH and water temperature. These data sub-sets were then used to calculate the WFD classification for each site. Monthly sampling was less precise than weekly sampling, but the effect on WFD classification depended on the closeness of the range of concentrations to the class boundaries. In some cases, monthly sampling for a year could result in the same water body being assigned to three or four of the WFD classes with 95% confidence, due to random sampling effects, whereas with weekly sampling this was one or two classes for the same cases. In the most extreme case, the same water body could have been assigned to any of the five WFD quality classes. Weekly sampling considerably reduces the uncertainties compared to monthly sampling. The width of the weekly sampled confidence intervals was about 33% that of the monthly for P species and pH, about 50% for dissolved oxygen, and about 67% for water temperature. For water temperature, which is assessed as the 98th percentile in the UK, monthly sampling biases the mean downwards by about 1 °C compared to the true value, due to problems of assessing high percentiles with limited data. Low-frequency measurements will generally be unsuitable for assessing standards expressed as high percentiles. 
Confining sampling to the working week compared to all 7 days made little difference, but a modest improvement in precision could be obtained by sampling at the same time of day within a 3 h time window, and this is recommended. For parameters with a strong diel variation, such as dissolved oxygen, the value obtained, and thus possibly the WFD classification, can depend markedly on when in the cycle the sample was taken. Specifying this in the sampling regime would be a straightforward way to improve precision, but there needs to be agreement about how best to characterise risk in different types of river. These results suggest that in some cases it will be difficult to assign accurate WFD chemical classes or to detect likely trends using current sampling regimes, even for these largely groundwater-fed rivers. A more critical approach to sampling is needed to ensure that management actions are appropriate and supported by data.
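The subsampling experiment described in this abstract can be sketched in a few lines. The hourly series below is synthetic (a seasonal cycle, a diel cycle and noise with made-up amplitudes), not the river data used in the paper, and the nearest-rank percentile is one of several possible definitions.

```python
import math
import random

def nearest_rank_percentile(values, q):
    """q-th percentile (0 < q <= 100) by the nearest-rank method."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]

# Synthetic hourly water temperature for one year: seasonal + diel cycle + noise.
rng = random.Random(1)
hourly = [
    12 + 6 * math.sin(2 * math.pi * h / (365 * 24))
    + 1.5 * math.sin(2 * math.pi * h / 24)
    + rng.gauss(0, 0.3)
    for h in range(365 * 24)
]

weekly = hourly[10::7 * 24]    # one sample a week, always at hour 10 of the day
monthly = hourly[10::30 * 24]  # roughly one sample a month, same time of day

for name, series in (("hourly", hourly), ("weekly", weekly), ("monthly", monthly)):
    print(name, len(series), round(nearest_rank_percentile(series, 98), 2))
```

Changing the starting offset (here hour 10) shifts every sample to a different point of the diel cycle, which is a simple way to explore how the time of day of sampling affects the result.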