989 results for "Sample data"
Abstract:
This paper develops a methodology to estimate entire population distributions from bin-aggregated sample data. We do this by estimating the parameters of mixtures of distributions that allow for maximal parametric flexibility. The statistical approach we develop enables comparisons of the full distributions of height data from potential army conscripts across France's 88 departments for most of the nineteenth century. These comparisons are made by testing for differences of means and for stochastic dominance. Corrections for possible measurement errors are also devised by taking advantage of the richness of the data sets. Our methodology is of interest to researchers working on historical as well as contemporary bin-aggregated or histogram-type data, which remain widespread since much of the publicly available information takes that form, often because of restrictions arising from political sensitivity and/or confidentiality concerns.
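The core estimation step described above can be sketched as follows: a minimal, hypothetical example that fits a two-component normal mixture to bin counts by maximizing the multinomial log-likelihood of the bin probabilities. The simulated height data, bin width, and starting values are illustrative assumptions, not the paper's actual specification.

```python
# Hypothetical sketch: fit a two-component normal mixture to bin-aggregated
# counts by maximizing the multinomial log-likelihood of the bin probabilities.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
# Simulate "height" data and aggregate into 2 cm bins, mimicking historical tables.
heights = np.concatenate([rng.normal(165, 6, 4000), rng.normal(175, 5, 1000)])
edges = np.arange(140, 201, 2)
counts, _ = np.histogram(heights, bins=edges)

def neg_loglik(p):
    w = 1 / (1 + np.exp(-p[0]))                      # mixture weight kept in (0, 1)
    mu1, mu2, s1, s2 = p[1], p[2], np.exp(p[3]), np.exp(p[4])
    cdf = lambda x: w * norm.cdf(x, mu1, s1) + (1 - w) * norm.cdf(x, mu2, s2)
    probs = np.clip(cdf(edges[1:]) - cdf(edges[:-1]), 1e-12, None)
    return -np.sum(counts * np.log(probs))           # multinomial log-likelihood

res = minimize(neg_loglik, x0=[0.0, 160.0, 178.0, np.log(5), np.log(5)],
               method="Nelder-Mead", options={"maxiter": 5000})
w_hat = 1 / (1 + np.exp(-res.x[0]))
mu1_hat, mu2_hat = res.x[1], res.x[2]
```

Once the mixture parameters are estimated, the full fitted distribution (rather than just the bin counts) is available for the comparisons the paper describes.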
Abstract:
Includes bibliography
Abstract:
This paper examines how the geospatial accuracy of samples and the sample size influence conclusions drawn from geospatial analyses. It does so using the example of a study investigating the global phenomenon of large-scale land acquisitions and the socio-ecological characteristics of the areas they target. First, we analysed land deal datasets of varying geospatial accuracy and varying sizes and compared the results in terms of land cover, population density, and two indicators of agricultural potential: yield gap and availability of uncultivated land that is suitable for rainfed agriculture. We found that an increase in geospatial accuracy led to a substantially greater change in conclusions about the targeted land cover types than an increase in sample size did, suggesting that using a sample of higher geospatial accuracy does more to improve results than using a larger sample. The same finding emerged for population density, yield gap, and the availability of uncultivated land suitable for rainfed agriculture. Furthermore, the statistical median proved more consistent than the mean when comparing descriptive statistics across datasets of different geospatial accuracy. Second, we analysed the effects of geospatial accuracy on estimates of the potential for advancing agricultural development in target contexts. Our results show that the target contexts of the majority of land deals in our sample whose geolocation is known with a high level of accuracy contain smaller amounts of suitable but uncultivated land than regional- and national-scale averages suggest. Consequently, the more target contexts vary within a country, the more detailed the spatial scale of analysis has to be in order to draw meaningful conclusions about the phenomena under investigation. We therefore advise against using national-scale statistics to approximate or characterize phenomena that have a local-scale impact, particularly if key indicators vary widely within a country.
Abstract:
Traditional methods of submerged aquatic vegetation (SAV) survey are time-consuming and therefore costly. Optical remote sensing is an alternative, but it has some limitations in the aquatic environment. Echosounder techniques, by contrast, are efficient at detecting submerged targets. The aim of this study is therefore to evaluate different interpolation approaches applied to SAV sample data collected by echosounder. The case study was performed in a region of the Uberaba River, Brazil. The interpolation methods evaluated in this work were: Nearest Neighbor, Weighted Average, Triangular Irregular Network (TIN), and ordinary kriging. The best results were obtained with kriging interpolation. The use of geostatistics is thus recommended for spatial inference of SAV from sample data surveyed with echosounder techniques. © 2012 IEEE.
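As an illustration of the recommended geostatistical approach, here is a minimal ordinary-kriging sketch with a fixed exponential variogram. The variogram parameters and the sample points are illustrative assumptions; in practice the variogram would be fitted to the echosounder survey data.

```python
# Minimal ordinary-kriging sketch with a fixed exponential variogram.
# Sample points and variogram parameters are illustrative, not fitted.
import numpy as np

def variogram(h, sill=1.0, range_=10.0, nugget=0.0):
    """Exponential variogram model gamma(h)."""
    return nugget + sill * (1 - np.exp(-h / range_))

def ordinary_kriging(xy, z, x0):
    """Kriged estimate at location x0 from samples z at coordinates xy."""
    n = len(z)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    A = np.ones((n + 1, n + 1))          # kriging system with Lagrange row
    A[:n, :n] = variogram(d)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = variogram(np.linalg.norm(xy - x0, axis=1))
    w = np.linalg.solve(A, b)
    return w[:n] @ z                     # weights sum to 1 by construction

pts = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
vals = np.array([1.0, 2.0, 3.0, 4.0])   # e.g. SAV biomass at surveyed points
est = ordinary_kriging(pts, vals, np.array([5.0, 5.0]))
```

By symmetry, the estimate at the center of this square of sample points is simply the mean of the four values; at other locations the variogram controls how the weights decay with distance.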
Mineral chemistry, whole-rock compositions, and petrogenesis of Leg 176 gabbros: Data and discussion
Abstract:
We report mineral chemistry, whole-rock major element compositions, and trace element analyses on Hole 735B samples drilled and selected during Leg 176. We discuss these data, together with Leg 176 shipboard data and Leg 118 sample data from the literature, in terms of primary igneous petrogenesis. Despite mineral compositional variation within a given sample, the major constituent minerals in Hole 735B gabbroic rocks display good chemical equilibrium, as shown by significant correlations among the Mg# (= Mg/[Mg+Fe2+]) of olivine, clinopyroxene, and orthopyroxene and the An (= Ca/[Ca+Na]) of plagioclase. This indicates that the mineral assemblages olivine + plagioclase in troctolite, plagioclase + clinopyroxene in gabbro, plagioclase + clinopyroxene + olivine in olivine gabbro, and plagioclase + clinopyroxene + olivine + orthopyroxene in gabbronorite, and so on, have all coprecipitated from their respective parental melts. Fe-Ti oxides (ilmenite and titanomagnetite), which are ubiquitous in most of these rocks, are not in chemical equilibrium with olivine, clinopyroxene, and plagioclase, but precipitated later at lower temperatures. Disseminated oxides in some samples may have precipitated from trapped Fe-Ti–rich melts. Oxides that concentrate along shear bands/zones may mark zones of coalescence/transport of melt expelled from the cumulate sequence as a result of compaction or filter pressing. Bulk Hole 735B is of cumulate composition. The most primitive olivine in Hole 735B, with Fo = 0.842, suggests that the most primitive melt parental to the Hole 735B lithologies must have Mg# ≤ 0.637, which is significantly less than the Mg# = 0.714 of bulk Hole 735B.
Abstract:
Consider a model with parameter phi and an auxiliary model with parameter theta. Let phi be randomly sampled from a given density over the known parameter space. Monte Carlo methods can be used to draw simulated data and compute the corresponding estimate of theta, say theta_tilde. A large set of tuples (phi, theta_tilde) can be generated in this manner. Nonparametric methods may be used to fit the function E(phi | theta_tilde = a) using these tuples. It is proposed to estimate phi using the fitted E(phi | theta_tilde = theta_hat), where theta_hat is the auxiliary estimate computed from the real sample data. Under certain assumptions, this estimator is consistent and asymptotically normally distributed. Monte Carlo results for dynamic panel data and vector autoregressions show that it can have very attractive small-sample properties. Confidence intervals can be constructed from the quantiles of the phi values for which theta_tilde is close to theta_hat; such intervals are found to have very accurate coverage.
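The proposed procedure can be sketched in a deliberately simple stand-in setting rather than the dynamic panel or VAR models of the paper: the model is x_i ~ N(0, phi), the auxiliary estimator is the biased small-sample variance estimate (dividing by n), and a Nadaraya-Watson kernel fit is used as one possible nonparametric choice for E(phi | theta_tilde). All these choices are illustrative assumptions.

```python
# Sketch of the simulation-based estimator under an illustrative toy model.
# Auxiliary statistic: the biased variance estimate (divide by n, not n-1).
import numpy as np

rng = np.random.default_rng(1)
n = 20                                   # small sample, so the bias matters

def aux(x):                              # auxiliary (biased) estimator
    return np.mean((x - x.mean()) ** 2)

# Step 1: draw phi over the parameter space, simulate, store (phi, theta_tilde).
phis = rng.uniform(0.5, 4.0, 20000)
thetas = np.array([aux(rng.normal(0, np.sqrt(p), n)) for p in phis])

def fitted_E(a, bw=0.15):                # Nadaraya-Watson fit of E(phi | theta)
    w = np.exp(-0.5 * ((thetas - a) / bw) ** 2)
    return np.sum(w * phis) / np.sum(w)

# Step 2: apply to "real" data (here simulated with true phi = 2.0).
x_real = rng.normal(0, np.sqrt(2.0), n)
theta_hat = aux(x_real)
phi_est = fitted_E(theta_hat)            # bias-corrected estimate of phi
```

A confidence interval in the spirit of the paper would take quantiles of the `phis` whose `thetas` fall close to `theta_hat`, instead of the kernel-weighted mean.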
Abstract:
SUMMARY: ExpressionView is an R package that provides an interactive graphical environment to explore transcription modules identified in gene expression data. A sophisticated ordering algorithm is used to present the modules together with the expression data in a visually appealing layout that provides an intuitive summary of the results. From this overview, the user can select individual modules and access the biologically relevant metadata associated with them. AVAILABILITY: http://www.unil.ch/cbg/ExpressionView. Screenshots, tutorials, and sample data sets can be found on the ExpressionView web site.
Abstract:
This paper presents a process of mining research & development abstract databases to profile the current status and to project potential developments for target technologies. The process is called "technology opportunities analysis." This article steps through the process using a sample data set of abstracts from the INSPEC database on the topic of "knowledge discovery and data mining." The paper offers a set of specific indicators suitable for mining such databases to understand innovation prospects. In illustrating the uses of such indicators, it offers some insights into the status of knowledge discovery research.
Abstract:
Land cover plays a key role in global to regional monitoring and modeling because it affects, and is affected by, climate change, and has thus become one of the essential variables for climate change studies. National and international organizations require timely and accurate land cover information for reporting and management actions. The North American Land Change Monitoring System (NALCMS) is an international cooperation of organizations and entities of Canada, the United States, and Mexico to map land cover change across North America's changing environment. This paper presents the methodology used to derive the land cover map of Mexico for the year 2005, which was integrated into the NALCMS continental map. The map is based on a time series of 250 m Moderate Resolution Imaging Spectroradiometer (MODIS) data and an extensive sample database; the complexity of the Mexican landscape required a specific approach to reflect land cover heterogeneity. To estimate the proportion of each land cover class for every pixel, several decision tree classifications were combined to obtain class membership maps, which were finally converted to a discrete map accompanied by a confidence estimate. The map yielded an overall accuracy of 82.5% (Kappa of 0.79) for pixels with at least 50% map confidence (71.3% of the data). An additional assessment with 780 randomly stratified samples, using primary and alternative calls in the reference data to account for ambiguity, indicated 83.4% overall accuracy (Kappa of 0.80). A high agreement of 83.6% for all pixels, and 92.6% for pixels with a map confidence of more than 50%, was found for the comparison between the land cover maps of 2005 and 2006. Further wall-to-wall comparisons to related land cover maps resulted in 56.6% agreement with the MODIS land cover product and 49.5% agreement with GlobCover.
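The per-pixel class-membership step described above can be sketched roughly as follows, with synthetic data and scikit-learn decision trees standing in for the actual MODIS features and classification setup; class probabilities from several bootstrap-trained trees are averaged into membership maps, the argmax gives the discrete label, and the winning probability serves as a confidence estimate.

```python
# Hypothetical sketch of combining decision trees into class-membership maps.
# Features, labels, and tree settings are synthetic stand-ins.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(600, 4))                       # stand-in spectral features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)       # two synthetic cover classes

trees = []
for _ in range(10):                                 # trees on bootstrap samples
    idx = rng.integers(0, len(y), len(y))
    trees.append(DecisionTreeClassifier(max_depth=5).fit(X[idx], y[idx]))

membership = np.mean([t.predict_proba(X) for t in trees], axis=0)
label = membership.argmax(axis=1)                   # discrete land cover map
confidence = membership.max(axis=1)                 # per-pixel confidence
high_conf = confidence >= 0.5                       # e.g. 50% map confidence cut
```

Reporting accuracy only for the `high_conf` pixels mirrors the abstract's "at least 50% map confidence" stratification.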
Abstract:
We present the results of the combination of searches for the standard model Higgs boson produced in association with a W or Z boson and decaying into bb̄, using the data sample collected with the D0 detector in pp̄ collisions at √s = 1.96 TeV at the Fermilab Tevatron Collider. We derive 95% C.L. upper limits on the Higgs boson cross section relative to the standard model prediction in the mass range 100 GeV ≤ M_H ≤ 150 GeV, and we exclude Higgs bosons with masses smaller than 102 GeV at the 95% C.L. In the mass range 120 GeV ≤ M_H ≤ 145 GeV, the data exhibit an excess above the background prediction with a global significance of 1.5 standard deviations, consistent with the expectation in the presence of a standard model Higgs boson. © 2012 American Physical Society.
Abstract:
BACKGROUND: Physiologic data display is essential to decision making in critical care. Current displays echo first-generation hemodynamic monitors dating to the 1970s and have not kept pace with new insights into physiology or the needs of clinicians who must make progressively more complex decisions about their patients. The effectiveness of any redesign must be tested before deployment. Tools that compare current displays with novel presentations of processed physiologic data are required. Regenerating conventional physiologic displays from archived physiologic data is an essential first step. OBJECTIVES: The purposes of the study were to (1) describe the SSSI (single sensor single indicator) paradigm that is currently used for physiologic signal displays, (2) identify and discuss possible extensions and enhancements of the SSSI paradigm, and (3) develop a general approach and a software prototype to construct such "extended SSSI displays" from raw data. RESULTS: We present the Multi Wave Animator (MWA) framework, a set of open-source MATLAB (MathWorks, Inc., Natick, MA, USA) scripts aimed at creating dynamic visualizations (e.g., video files in AVI format) of patient vital signs recorded from bedside (intensive care unit or operating room) monitors. Multi Wave Animator creates animations in which vital signs are displayed to mimic their appearance on current bedside monitors. The source code of MWA is freely available online together with a detailed tutorial and sample data sets.
Abstract:
Dynamic changes in ERP topographies can be conveniently analyzed by means of microstates, the so-called "atoms of thought", which represent brief periods of quasi-stable synchronized network activation. Comparing temporal microstate features such as onset, offset, or duration between groups and conditions therefore allows a precise assessment of the timing of cognitive processes. So far, this has been achieved by assigning the individual time-varying ERP maps to spatially defined microstate templates obtained by clustering the grand-mean data into predetermined numbers of topographies (microstate prototypes). Features obtained from these individual assignments were then statistically compared. The problem with this approach is that individual noise dilutes the match between individual topographies and templates, leading to lower statistical power. We therefore propose a randomization-based procedure that works without assigning grand-mean microstate prototypes to individual data. In addition, we propose a new criterion to select the optimal number of microstate prototypes based on cross-validation across subjects. After a formal introduction, the method is applied to a sample data set from an N400 experiment and to simulated data with varying signal-to-noise ratios, and the results are compared to existing methods. In a first comparison with previously employed statistical procedures, the new method showed increased robustness to noise and higher sensitivity to more subtle effects of microstate timing. We conclude that the proposed method is well suited for assessing timing differences in cognitive processes. The increased statistical power allows identifying more subtle effects, which is particularly important in small and scarce patient populations.
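A generic randomization test on a timing feature between two conditions, in the spirit of the proposed procedure (though not its actual microstate-specific implementation), can be sketched as follows; the onset latencies are simulated stand-ins.

```python
# Generic randomization (permutation) test on a timing feature, e.g.
# microstate onset latency, between two conditions. Data are synthetic.
import numpy as np

rng = np.random.default_rng(4)
onset_a = rng.normal(380, 25, 15)        # condition A onsets in ms (simulated)
onset_b = rng.normal(410, 25, 15)        # condition B onsets in ms (simulated)
observed = onset_b.mean() - onset_a.mean()

pooled = np.concatenate([onset_a, onset_b])
diffs = []
for _ in range(5000):                    # reshuffle condition labels
    perm = rng.permutation(pooled)
    diffs.append(perm[15:].mean() - perm[:15].mean())

# Two-sided p-value: fraction of shuffles at least as extreme as observed.
p_value = np.mean(np.abs(diffs) >= abs(observed))
```

Because the null distribution is built by shuffling labels within the same data, no assignment of individual maps to grand-mean templates is needed for this kind of inference.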
Abstract:
The lognormal distribution has abundant applications in various fields. In the literature, most inferences on the two parameters of the lognormal distribution are based on Type-I censored sample data. However, exact measurements are not always attainable, especially when observations fall below or above the detection limits; in such cases only the numbers of measurements falling into predetermined intervals can be recorded. This is the so-called grouped data. In this paper, we show the existence and uniqueness of the maximum likelihood estimators of the two parameters of the underlying lognormal distribution with Type-I censored data and grouped data. The proof is first established for the normal distribution and then extended to the lognormal distribution through the invariance property. The results are applied to estimate the median and mean of the lognormal population.
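The grouped-data likelihood described above can be sketched as follows: maximum likelihood estimation of the lognormal parameters from interval counts, working with the normal distribution on the log scale (which is also how the invariance argument operates). The bin edges and counts are simulated for illustration, not taken from the paper.

```python
# Illustrative sketch: ML estimation of lognormal parameters from grouped
# (interval-count) data, via the normal likelihood on the log scale.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
mu_true, sigma_true = 1.0, 0.5
x = rng.lognormal(mu_true, sigma_true, 5000)
edges = np.array([0.0, 1.0, 2.0, 3.0, 5.0, np.inf])  # detection-limit style bins
counts, _ = np.histogram(x, bins=edges)

def neg_loglik(p):
    mu, sigma = p[0], np.exp(p[1])                   # sigma > 0 via log-parameter
    with np.errstate(divide="ignore"):
        log_edges = np.log(edges)                    # log(0) -> -inf is intended
    cdf = norm.cdf(log_edges, mu, sigma)             # lognormal CDF via normal on logs
    probs = np.clip(np.diff(cdf), 1e-12, None)
    return -np.sum(counts * np.log(probs))           # grouped-data log-likelihood

res = minimize(neg_loglik, x0=[0.0, 0.0], method="Nelder-Mead")
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
median_hat = np.exp(mu_hat)                          # lognormal median
mean_hat = np.exp(mu_hat + sigma_hat**2 / 2)         # lognormal mean
```

The final two lines illustrate the paper's closing application: once (mu, sigma) are estimated, the median and mean of the lognormal population follow directly.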