951 results for statistical distribution
Abstract:
Distributed systems are widely used for solving large-scale and data-intensive computing problems, including all-to-all comparison (ATAC) problems. However, when used for ATAC problems, existing computational frameworks such as Hadoop focus on load balancing for allocating comparison tasks, without careful consideration of data distribution and storage usage. While Hadoop-based solutions provide users with simplicity of implementation, their inherent MapReduce computing pattern does not match the ATAC pattern. This leads to load imbalances and poor data locality when Hadoop's data distribution strategy is used for ATAC problems. Here we present a data distribution strategy which considers data locality, load balancing and storage savings for ATAC computing problems in homogeneous distributed systems. A simulated annealing algorithm is developed for data distribution and task scheduling. Experimental results show a significant performance improvement for our approach over Hadoop-based solutions.
Abstract:
This chapter addresses opportunities for problem posing in developing young children’s statistical literacy, with a focus on student-directed investigations. Although the notion of problem posing has broadened in recent years, there nevertheless remains limited research on how problem posing can be integrated within the regular mathematics curriculum, especially in the areas of statistics and probability. The chapter first briefly reviews aspects of problem posing that have featured in the literature over the years. Consideration is next given to the importance of developing children’s statistical literacy, in which problem posing is an inherent feature. Some findings from a school playground investigation conducted in four fourth-grade classes illustrate the different ways in which children posed investigative questions, how they made predictions about their outcomes and compared these with their findings, and the ways in which they chose to represent their findings.
Abstract:
As statistical education becomes more firmly embedded in the school curriculum and its value across the curriculum is recognised, attention moves from knowing procedures, such as calculating a mean or drawing a graph, to understanding the purpose of a statistical investigation in decision making in many disciplines. As students learn to complete the stages of an investigation, the question of meaningful assessment of the process arises. This paper considers models for carrying out a statistical inquiry and, based on a four-phase model, creates a developmental sequence that can be used for the assessment of outcomes from each of the four phases as well as for the complete inquiry. The developmental sequence is based on the SOLO model, focussing on the "observed" outcomes during the inquiry process.
Abstract:
Modularity has been suggested to be connected to evolvability because a higher degree of independence among parts allows them to evolve as separate units. Recently, the Escoufier RV coefficient has been proposed as a measure of the degree of integration between modules in multivariate morphometric datasets. However, it has been shown, using randomly simulated datasets, that the value of the RV coefficient depends on sample size. Also, so far there is no statistical test for the difference in the RV coefficient between a priori defined groups of observations. Here, we (1) show, using a rarefaction analysis, that the value of the RV coefficient also depends on sample size in real geometric morphometric datasets; (2) propose a permutation procedure to test for the difference in the RV coefficient between a priori defined groups of observations; (3) show, through simulations, that such a permutation procedure has an appropriate Type I error; (4) suggest that a rarefaction procedure could be used to obtain sample-size-corrected values of the RV coefficient; and (5) propose a nearest-neighbor procedure that could be used when studying the variation of modularity in geographic space. The approaches outlined here, readily extendable to non-morphometric datasets, allow study of the variation in the degree of integration between a priori defined modules. A Java application with a graphical user interface for performing the proposed test has also been developed and is available at the Morphometrics at Stony Brook Web page (http://life.bio.sunysb.edu/morph/).
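The permutation idea in point (2) of the abstract above can be illustrated with a minimal Python sketch. It is not the authors' Java application, and the abstract does not spell out their exact permutation scheme: pooling the two groups and reshuffling group membership is an assumption made here, and the function names and parameters (`rv_coefficient`, `rv_difference_permutation_test`, `n_perm`, `seed`) are hypothetical. Each observation is a row of measurements, and `block_a` and `block_b` list the column indices of the two a priori modules.

```python
import random


def _centered_columns(rows):
    """Transpose rows into columns and subtract each column's mean."""
    n = len(rows)
    return [[v - sum(col) / n for v in col] for col in zip(*rows)]


def _sum_squared_cross_products(cols_a, cols_b):
    """Sum of squared cross-products between every pair of columns."""
    return sum(
        sum(a * b for a, b in zip(col_a, col_b)) ** 2
        for col_a in cols_a
        for col_b in cols_b
    )


def rv_coefficient(rows, block_a, block_b):
    """Escoufier RV coefficient between two blocks of columns.

    The 1/(n-1) scaling of the covariance matrices cancels in the ratio,
    so raw cross-products of centered columns are used.
    """
    cols = _centered_columns(rows)
    a = [cols[i] for i in block_a]
    b = [cols[i] for i in block_b]
    shared = _sum_squared_cross_products(a, b)
    within = _sum_squared_cross_products(a, a) * _sum_squared_cross_products(b, b)
    return shared / within ** 0.5


def rv_difference_permutation_test(group1, group2, block_a, block_b,
                                   n_perm=999, seed=0):
    """Permutation p-value for |RV(group1) - RV(group2)|.

    Assumption: group labels are exchangeable under the null hypothesis,
    so observations are pooled and randomly reassigned to the two groups.
    """
    rng = random.Random(seed)
    observed = abs(rv_coefficient(group1, block_a, block_b)
                   - rv_coefficient(group2, block_a, block_b))
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(rv_coefficient(pooled[:n1], block_a, block_b)
                   - rv_coefficient(pooled[n1:], block_a, block_b))
        if diff >= observed:
            at_least_as_extreme += 1
    # Count the observed arrangement as one of the permutations.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)
```

Because the abstract reports that the RV coefficient depends on sample size, a comparison of this kind is easiest to interpret when the two groups contain the same number of observations.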
Abstract:
Genome-wide association studies (GWAS) have identified around 60 common variants associated with multiple sclerosis (MS), but these loci only explain a fraction of the heritability of MS. Some missing heritability may be caused by rare variants that have been suggested to play an important role in the aetiology of complex diseases such as MS. However, current genetic and statistical methods for detecting rare variants are expensive and time consuming. 'Population-based linkage analysis' (PBLA), or so-called identity-by-descent (IBD) mapping, is a novel way to detect rare variants in extant GWAS datasets. We employed BEAGLE fastIBD to search for rare MS variants utilising IBD mapping in a large GWAS dataset of 3,543 cases and 5,898 controls. We identified a genome-wide significant linkage signal on chromosome 19 (LOD = 4.65; p = 1.9×10⁻⁶). Network analysis of cases and controls sharing haplotypes on chromosome 19 further strengthened the association, as there are more large networks of cases sharing haplotypes than controls. This linkage region includes a cluster of zinc finger genes of unknown function. Analysis of genome-wide transcriptome data suggests that genes in this zinc finger cluster may be involved in very early developmental regulation of the CNS. Our study also indicates that BEAGLE fastIBD allowed identification of rare variants in a large unrelated population with moderate computational intensity. Even with the development of whole-genome sequencing, IBD mapping still may be a promising way to narrow down the region of interest for sequencing priority. © 2013 Lin et al.
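The relationship between the reported LOD score and p-value can be checked with a short Python sketch. It assumes the standard linkage convention, which the abstract does not state explicitly: 2·ln(10)·LOD is treated as a chi-square statistic with one degree of freedom, and the upper-tail probability is halved for a one-sided test. The function name `lod_to_p_value` is hypothetical.

```python
import math


def lod_to_p_value(lod):
    """Approximate one-sided p-value for a LOD score.

    Assumption: 2 * ln(10) * LOD follows a chi-square distribution with
    1 degree of freedom under the null, and the test is one-sided.
    """
    chi_square = 2.0 * math.log(10.0) * lod
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    upper_tail = math.erfc(math.sqrt(chi_square / 2.0))
    return 0.5 * upper_tail


print(lod_to_p_value(4.65))
```

Under this assumption, a LOD of 4.65 gives a p-value close to 2 × 10⁻⁶, the same order of magnitude as the value reported in the abstract.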
Abstract:
We thank Ploski and colleagues for their interest in our study. The explanation for the difference in our findings is a typographic error in Table 2 of our article, whereby the alleles for marker TNF −1031 were labeled incorrectly...
Abstract:
Provision of network infrastructure to meet rising network peak demand is increasing the cost of electricity. Addressing this demand is a major imperative for Australian electricity agencies. The network peak demand model reported in this paper provides a quantified decision support tool and a means of understanding the key influences and impacts on network peak demand. An investigation of the system factors impacting residential consumers’ peak demand for electricity was undertaken in Queensland, Australia. Technical factors, such as the customers’ location, housing construction and appliances, were combined with social factors, such as household demographics, culture, trust and knowledge, and Change Management Options (CMOs) such as tariffs, price, managed supply, etc., in a conceptual ‘map’ of the system. A Bayesian network was used to quantify the model and provide insights into the major influential factors and their interactions. The model was also used to examine the reduction in network peak demand with different market-based and government interventions in various customer locations of interest and investigate the relative importance of instituting programs that build trust and knowledge through well designed customer-industry engagement activities. The Bayesian network was implemented via a spreadsheet with a tick box interface. The model combined available data from industry-specific and public sources with relevant expert opinion. The results revealed that the most effective intervention strategies involve combining particular CMOs with associated education and engagement activities. The model demonstrated the importance of designing interventions that take into account the interactions of the various elements of the socio-technical system. The options that provided the greatest impact on peak demand were Off-Peak Tariffs and Managed Supply and increases in the price of electricity.
The impact in peak demand reduction differed for each of the locations and highlighted that household numbers, demographics as well as the different climates were significant factors. It presented possible network peak demand reductions which would delay any upgrade of networks, resulting in savings for Queensland utilities and ultimately for households. The use of this systems approach using Bayesian networks to assist the management of peak demand in different modelled locations in Queensland provided insights about the most important elements in the system and the intervention strategies that could be tailored to the targeted customer segments.
Abstract:
Bone mass acquired during childhood is the primary determinant of adult bone mineral density (BMD) and osteoporosis risk. Bone accrual is subject to genetic influences. Activating and inactivating LRP5 gene mutations elicit extreme bone phenotypes, while more common LRP5 polymorphisms are associated with normal variation of BMD. Our aim was to test the hypothesis that LRP5 gene polymorphisms influence bone mass acquisition during childhood. The association between LRP5 gene polymorphisms and bone size and mineralization was examined in 819 unrelated British Caucasian children (n = 429 boys) aged 9 years. Height, weight, pubertal status (where available), total-body and spinal bone area, bone mineral content (BMC), BMD, and area-adjusted BMC (aBMC) were assessed. Dual-energy X-ray absorptiometry (DXA)-gene associations were assessed by linear regression, with adjustment for age, gender, pubertal status, and body size parameters. There were 140, 79, 12, and 2 girls who achieved Tanner stages I-IV, respectively, and 179 and 32 boys who achieved Tanner stages I and II, respectively. The rs2306862 (N740N) coding polymorphism in exon 10 of the LRP5 gene was associated with spinal BMD and aBMC (each P = 0.01) and total-body BMD and aBMC (P = 0.04 and 0.03, respectively). Adjusting for pubertal stage strengthened associations between this polymorphism and spinal BMD and aBMC (P = 0.01 and 0.002, respectively). Individuals homozygous for the T allele had greater spinal BMD and aBMC scores than those homozygous for the C allele. A dose effect was apparent as the mean spinal BMD and aBMC of heterozygous TC individuals were intermediate between those of their TT and CC counterparts. The N740N polymorphism in exon 10 of LRP5 was associated with spinal BMD and aBMC in pre- and early pubertal children. These results indicate that LRP5 influences volumetric bone density in childhood, possibly through effects on trabecular bone formation.
Abstract:
Some statistical procedures already available in the literature are employed in developing the water quality index (WQI). The complexity and interdependency that occur in the physical and chemical processes of water could be more easily explained if statistical approaches were applied to water quality indexing. The most popular statistical method used in developing the WQI is principal component analysis (PCA). In the literature, WQI development based on classical PCA has mostly used water quality data that have been transformed and normalized. Outliers may be included in or eliminated from the analysis. However, the classical mean and sample covariance matrix used in the classical PCA methodology are not reliable if outliers exist in the data. Since the presence of outliers may affect the computation of the principal components, robust principal component analysis (RPCA) should be used. Focusing on the Langat River, the RPCA-WQI was introduced for the first time in this study to re-calculate the DOE-WQI. Results show that the RPCA-WQI is capable of capturing a distribution similar to that of the existing DOE-WQI.
Abstract:
Adverse health effects caused by worker exposure to ultrafine particles have been detected in recent years. The scientific community focuses on the assessment of ultrafine aerosols in different microenvironments in order to determine the related worker exposure/dose levels. To this end, particle size distribution measurements have to be taken along with total particle number concentrations. The latter are obtainable through hand-held monitors. A portable particle size distribution analyzer (Nanoscan SMPS 3910, TSI Inc.) was recently commercialized, but so far no metrological assessment has been performed to characterize its performance with respect to well-established laboratory-based instruments such as the scanning mobility particle sizer (SMPS) spectrometer. The present paper compares the aerosol monitoring capability of the Nanoscan SMPS to the laboratory SMPS in order to evaluate whether the Nanoscan SMPS is suitable for field experiments designed to characterize particle exposure in different microenvironments. Tests were performed both in a Marple calm air chamber, where fresh diesel particulate matter and atomized dioctyl phthalate particles were monitored, and in microenvironments, where outdoor, urban, indoor aged, and indoor fresh aerosols were measured. Results show that the Nanoscan SMPS is able to properly measure the particle size distribution for each type of aerosol investigated, but it overestimates the total particle number concentration in the case of fresh aerosols. In particular, the test performed in the Marple chamber showed total concentrations up to twice those measured by the laboratory SMPS, likely because of the inability of the Nanoscan SMPS unipolar charger to properly charge aerosols made up of aggregated particles. Based on these findings, when field test exposure studies are conducted, the Nanoscan SMPS should be used in tandem
Abstract:
Japanese encephalitis (JE) is the most common cause of viral encephalitis and an important public health concern in the Asia-Pacific region, particularly in China, where 50% of global cases are notified. To explore the association between environmental factors and human JE cases and identify the high-risk areas for JE transmission in China, we used annual notified data on JE cases at the center of each administrative township and environmental variables with a pixel resolution of 1 km × 1 km from 2005 to 2011 to construct models using ecological niche modeling (ENM) approaches based on maximum entropy. These models were then validated by overlaying reported human JE case localities from 2006 to 2012 onto each prediction map. ENMs had good discriminatory ability, with an area under the curve (AUC) of the receiver operating characteristic (ROC) curve of 0.82-0.91 and a low extrinsic omission rate of 5.44-7.42%. The resulting maps showed JE to be present extensively throughout southwestern and central China, with local spatial variations in probability influenced by minimum temperatures, human population density, mean temperatures, and elevation, with contributions of 17.94%-38.37%, 15.47%-21.82%, 3.86%-21.22%, and 12.05%-16.02%, respectively. Approximately 60% of JE cases occurred in predicted high-risk areas, which covered less than 6% of areas in mainland China. Our findings will help inform optimal geographical allocation of the limited resources available for JE prevention and control in China, find hidden high-risk areas, and increase the effectiveness of public health interventions against JE transmission.
Abstract:
Electronic cigarette-generated mainstream aerosols were characterized in terms of particle number concentrations and size distributions through a Condensation Particle Counter and a Fast Mobility Particle Sizer spectrometer, respectively. A thermodilution system was also used to properly sample and dilute the mainstream aerosol. Different types of electronic cigarettes, liquid flavors, liquid nicotine contents, as well as different puffing times were tested. Conventional tobacco cigarettes were also investigated. The total particle number concentration peak (for a 2-s puff), averaged across the different electronic cigarette types and liquids, was measured as 4.39 ± 0.42 × 10⁹ part. cm⁻³, comparable to that of the conventional cigarette (3.14 ± 0.61 × 10⁹ part. cm⁻³). Puffing times and nicotine contents were found to influence the particle concentration, whereas no significant differences were recognized in terms of flavors and types of cigarettes used. Particle number distribution modes of the electronic cigarette-generated aerosol were in the 120–165 nm range, similar to that of the conventional cigarette.
Abstract:
Background Spanish is one of the five most spoken languages in the world. There is currently no published Spanish version of the Örebro Musculoskeletal Pain Questionnaire (OMPQ). The aim of the present study is to describe the process of translating the OMPQ into Spanish and to perform an analysis of reliability, internal structure, internal consistency and concurrent criterion-related validity. Methods Design: Translation and psychometric testing. Procedure: Two independent translators translated the OMPQ into Spanish. From both translations a consensus version was achieved. A backward translation was made to verify and resolve any semantic or conceptual problems. A total of 104 patients (67 men/37 women) with a mean age of 53.48 (±11.63), suffering from chronic musculoskeletal disorders, twice completed a Spanish version of the OMPQ. Statistical analysis was performed to evaluate the reliability, the internal structure, internal consistency and concurrent criterion-related validity with reference to the gold standard questionnaire SF-12v2. Results All variables except “Coping” showed a rate above 0.85 on reliability. The internal structure calculation through exploratory factor analysis indicated that 75.2% of the variance can be explained with six components with an eigenvalue higher than 1 and 52.1% with only three components higher than 10% of variance explained. In the concurrent criterion-related validity, several significant correlations were seen close to 0.6, exceeding that value in the correlation between general health and total value of the OMPQ. Conclusions The Spanish version of the screening questionnaire OMPQ can be used to identify Spanish patients with musculoskeletal pain at risk of developing a chronic disability.
Abstract:
Background An increasing body of evidence associates a high level of sitting time with poor health outcomes. The benefits of moderate to vigorous-intensity physical activities to various aspects of health are now well documented; however, individuals may engage in moderate-intensity physical activity for at least 30 minutes on five or more days of the week and still exhibit a high level of sitting time. The purpose of this study was to examine differences in total wellness among adults relative to high/low levels of sitting time combined with insufficient/sufficient physical activity (PA). The construct of total wellness incorporates a holistic approach to the body, mind and spirit components of life, an approach which may be more encompassing than some definitions of health. Methods Data were obtained from 226 adult respondents (27 ± 6 years), including 116 (51%) males and 110 (49%) females. Total PA and total sitting time were assessed with the International Physical Activity Questionnaire (IPAQ) (short-version). The Wellness Evaluation of Lifestyle Inventory was used to assess total wellness. An analysis of covariance (ANCOVA) was utilised to assess the effects of the sitting time/physical activity group on total wellness. A covariate was included to partial out the effects of age, sex and work status (student or employed). Cross-tabulations were used to show associations between the IPAQ derived high/low levels of sitting time with insufficient/sufficient PA and the three total wellness groups (i.e. high level of wellness, moderate wellness and wellness development needed). Results The majority of the participants were located in the high total sitting time and sufficient PA group. There were statistical differences among the IPAQ groups for total wellness [F (2,220) = 32.5 (p < 0.001)]. A Chi-square test revealed a significant difference in the distribution of the IPAQ categories within the classification of wellness [χ² (N = 226) = 54.5, p < .001].
One-hundred percent (100%) of participants who self-rated as high total sitting time/insufficient PA were found in the wellness development needed group. In contrast, 72% of participants who were located in the low total sitting time/sufficient PA group were situated in the moderate wellness group. Conclusion Many participants who meet the physical activity guidelines, in this sample, sit for longer periods of time than the median Australian sitting time. An understanding of the effects of the enhanced PA and reduced sitting time on total wellness can add to the development of public health initiatives. Keywords: IPAQ; The Wellness Evaluation of Lifestyle (WEL); Sedentary lifestyle
Abstract:
Based on protein molecular dynamics, we investigate the fractal properties of energy, pressure and volume time series using the multifractal detrended fluctuation analysis (MF-DFA) and the topological and fractal properties of their converted horizontal visibility graphs (HVGs). The energy parameters of protein dynamics we considered are bonded potential, angle potential, dihedral potential, improper potential, kinetic energy, Van der Waals potential, electrostatic potential, total energy and potential energy. The shape of the h(q) curves from MF-DFA indicates that these time series are multifractal. The numerical values of the exponent h(2) of MF-DFA show that the series of total energy and potential energy are non-stationary and anti-persistent; the other time series are stationary and persistent apart from series of pressure (with H ≈ 0.5 indicating the absence of long-range correlation). The degree distributions of their converted HVGs show that these networks are exponential. The results of fractal analysis show that fractality exists in these converted HVGs. For each energy, pressure or volume parameter, it is found that the values of h(2) of MF-DFA on the time series, exponent λ of the exponential degree distribution and fractal dimension d_B of their converted HVGs do not change much for different proteins (indicating some universality). We also found that after taking average over all proteins, there is a linear relationship between ⟨h(2)⟩ (from MF-DFA on time series) and ⟨d_B⟩ of the converted HVGs for different energy, pressure and volume.
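The conversion of a time series into a horizontal visibility graph, as used in the abstract above, can be sketched in a few lines of Python. This is a minimal illustration of the standard HVG definition (two points are linked when every value strictly between them is lower than both), not the authors' code; the function names `horizontal_visibility_edges` and `degree_distribution` are hypothetical.

```python
from collections import Counter


def horizontal_visibility_edges(series):
    """Return HVG edges (i, j), i < j, for a sequence of numbers.

    Points i and j are connected when all values strictly between them
    are lower than min(series[i], series[j]).
    """
    edges = []
    n = len(series)
    for i in range(n - 1):
        highest_between = float("-inf")
        for j in range(i + 1, n):
            if highest_between < min(series[i], series[j]):
                edges.append((i, j))
            highest_between = max(highest_between, series[j])
            # A value at least as high as series[i] blocks all later points.
            if highest_between >= series[i]:
                break
    return edges


def degree_distribution(edges, n):
    """Fraction of the n nodes having each degree."""
    degree = Counter()
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    counts = Counter(degree[node] for node in range(n))
    return {k: count / n for k, count in sorted(counts.items())}


series = [3, 1, 2, 4]
edges = horizontal_visibility_edges(series)
print(edges)
print(degree_distribution(edges, len(series)))
```

For a long series, the exponent λ mentioned in the abstract would be estimated from the slope of the logarithm of this degree distribution against degree; that fitting step is left out of the sketch.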