59 resultados para Data Accuracy
Resumo:
In order to examine metacognitive accuracy (i.e., the relationship between metacognitive judgment and memory performance), researchers often rely on by-participant analysis, where metacognitive accuracy (e.g., resolution, as measured by the gamma coefficient or signal detection measures) is computed for each participant and the computed values are entered into group-level statistical tests such as the t-test. In the current work, we argue that the by-participant analysis, regardless of the accuracy measurements used, would produce a substantial inflation of Type-1 error rates, when a random item effect is present. A mixed-effects model is proposed as a way to effectively address the issue, and our simulation studies examining Type-1 error rates indeed showed superior performance of mixed-effects model analysis as compared to the conventional by-participant analysis. We also present real data applications to illustrate further strengths of mixed-effects model analysis. Our findings imply that caution is needed when using the by-participant analysis, and recommend the mixed-effects model analysis.
Resumo:
Recent studies showed that features extracted from brain MRIs can well discriminate Alzheimer’s disease from Mild Cognitive Impairment. This study provides an algorithm that sequentially applies advanced feature selection methods for findings the best subset of features in terms of binary classification accuracy. The classifiers that provided the highest accuracies, have been then used for solving a multi-class problem by the one-versus-one strategy. Although several approaches based on Regions of Interest (ROIs) extraction exist, the prediction power of features has not yet investigated by comparing filter and wrapper techniques. The findings of this work suggest that (i) the IntraCranial Volume (ICV) normalization can lead to overfitting and worst the accuracy prediction of test set and (ii) the combined use of a Random Forest-based filter with a Support Vector Machines-based wrapper, improves accuracy of binary classification.
Resumo:
We present an analysis of the accuracy of the method introduced by Lockwood et al. (1994) for the determination of the magnetopause reconnection rate from the dispersion of precipitating ions in the ionospheric cusp region. Tests are made by applying the method to synthesised data. The simulated cusp ion precipitation data are produced by an analytic model of the evolution of newly-opened field lines, along which magnetosheath ions are firstly injected across the magnetopause and then dispersed as they propagate into the ionosphere. The rate at which these newly opened field lines are generated by reconnection can be varied. The derived reconnection rate estimates are then compared with the input variation to the model and the accuracy of the method assessed. Results are presented for steady-state reconnection, for continuous reconnection showing a sine-wave variation in rate and for reconnection which only occurs in square wave pulses. It is found that the method always yields the total flux reconnected (per unit length of the open-closed field-line boundary) to within an accuracy of better than 5%, but that pulses tend to be smoothed so that the peak reconnection rate within the pulse is underestimated and the pulse length is overestimated. This smoothing is reduced if the separation between energy channels of the instrument is reduced; however this also acts to increase the experimental uncertainty in the estimates, an effect which can be countered by improving the time resolution of the observations. The limited time resolution of the data is shown to set a minimum reconnection rate below which the method gives spurious short-period oscillations about the true value. Various examples of reconnection rate variations derived from cusp observations are discussed in the light of this analysis.
Resumo:
Advances in hardware and software technologies allow to capture streaming data. The area of Data Stream Mining (DSM) is concerned with the analysis of these vast amounts of data as it is generated in real-time. Data stream classification is one of the most important DSM techniques allowing to classify previously unseen data instances. Different to traditional classifiers for static data, data stream classifiers need to adapt to concept changes (concept drift) in the stream in real-time in order to reflect the most recent concept in the data as accurately as possible. A recent addition to the data stream classifier toolbox is eRules which induces and updates a set of expressive rules that can easily be interpreted by humans. However, like most rule-based data stream classifiers, eRules exhibits a poor computational performance when confronted with continuous attributes. In this work, we propose an approach to deal with continuous data effectively and accurately in rule-based classifiers by using the Gaussian distribution as heuristic for building rule terms on continuous attributes. We show on the example of eRules that incorporating our method for continuous attributes indeed speeds up the real-time rule induction process while maintaining a similar level of accuracy compared with the original eRules classifier. We termed this new version of eRules with our approach G-eRules.
Resumo:
Land cover plays a key role in global to regional monitoring and modeling because it affects and is being affected by climate change and thus became one of the essential variables for climate change studies. National and international organizations require timely and accurate land cover information for reporting and management actions. The North American Land Change Monitoring System (NALCMS) is an international cooperation of organizations and entities of Canada, the United States, and Mexico to map land cover change of North America's changing environment. This paper presents the methodology to derive the land cover map of Mexico for the year 2005 which was integrated in the NALCMS continental map. Based on a time series of 250 m Moderate Resolution Imaging Spectroradiometer (MODIS) data and an extensive sample data base the complexity of the Mexican landscape required a specific approach to reflect land cover heterogeneity. To estimate the proportion of each land cover class for every pixel several decision tree classifications were combined to obtain class membership maps which were finally converted to a discrete map accompanied by a confidence estimate. The map yielded an overall accuracy of 82.5% (Kappa of 0.79) for pixels with at least 50% map confidence (71.3% of the data). An additional assessment with 780 randomly stratified samples and primary and alternative calls in the reference data to account for ambiguity indicated 83.4% overall accuracy (Kappa of 0.80). A high agreement of 83.6% for all pixels and 92.6% for pixels with a map confidence of more than 50% was found for the comparison between the land cover maps of 2005 and 2006. Further wall-to-wall comparisons to related land cover maps resulted in 56.6% agreement with the MODIS land cover product and a congruence of 49.5 with Globcover.
Resumo:
Highly heterogeneous mountain snow distributions strongly affect soil moisture patterns; local ecology; and, ultimately, the timing, magnitude, and chemistry of stream runoff. Capturing these vital heterogeneities in a physically based distributed snow model requires appropriately scaled model structures. This work looks at how model scale—particularly the resolutions at which the forcing processes are represented—affects simulated snow distributions and melt. The research area is in the Reynolds Creek Experimental Watershed in southwestern Idaho. In this region, where there is a negative correlation between snow accumulation and melt rates, overall scale degradation pushed simulated melt to earlier in the season. The processes mainly responsible for snow distribution heterogeneity in this region—wind speed, wind-affected snow accumulations, thermal radiation, and solar radiation—were also independently rescaled to test process-specific spatiotemporal sensitivities. It was found that in order to accurately simulate snowmelt in this catchment, the snow cover needed to be resolved to 100 m. Wind and wind-affected precipitation—the primary influence on snow distribution—required similar resolution. Thermal radiation scaled with the vegetation structure (~100 m), while solar radiation was adequately modeled with 100–250-m resolution. Spatiotemporal sensitivities to model scale were found that allowed for further reductions in computational costs through the winter months with limited losses in accuracy. It was also shown that these modeling-based scale breaks could be associated with physiographic and vegetation structures to aid a priori modeling decisions.
Resumo:
Respiration chambers are one of the primary sources of data on methane emissions from livestock. This paper describes the results from a coordinated set of chamber validation experiments which establishes the absolute accuracy of the methane emission rates measured by the chambers, and for the first time provides metrological traceability to international standards, assesses the impact of both analyser and chamber response times on measurement uncertainty and establishes direct comparability between measurements made across different facilities with a wide range of chamber designs. As a result of the validation exercise the estimated combined uncertainty associated with the overall capability across all facilities reduced from 25.7% (k = 2, 95% confidence) before the validation to 2.1% (k = 2, 95% confidence) when the validation results are applied to the facilities’ data.
Resumo:
The EU Water Framework Directive (WFD) requires that the ecological and chemical status of water bodies in Europe should be assessed, and action taken where possible to ensure that at least "good" quality is attained in each case by 2015. This paper is concerned with the accuracy and precision with which chemical status in rivers can be measured given certain sampling strategies, and how this can be improved. High-frequency (hourly) chemical data from four rivers in southern England were subsampled to simulate different sampling strategies for four parameters used for WFD classification: dissolved phosphorus, dissolved oxygen, pH and water temperature. These data sub-sets were then used to calculate the WFD classification for each site. Monthly sampling was less precise than weekly sampling, but the effect on WFD classification depended on the closeness of the range of concentrations to the class boundaries. In some cases, monthly sampling for a year could result in the same water body being assigned to three or four of the WFD classes with 95% confidence, due to random sampling effects, whereas with weekly sampling this was one or two classes for the same cases. In the most extreme case, the same water body could have been assigned to any of the five WFD quality classes. Weekly sampling considerably reduces the uncertainties compared to monthly sampling. The width of the weekly sampled confidence intervals was about 33% that of the monthly for P species and pH, about 50% for dissolved oxygen, and about 67% for water temperature. For water temperature, which is assessed as the 98th percentile in the UK, monthly sampling biases the mean downwards by about 1 °C compared to the true value, due to problems of assessing high percentiles with limited data. Low-frequency measurements will generally be unsuitable for assessing standards expressed as high percentiles. Confining sampling to the working week compared to all 7 days made little difference, but a modest improvement in precision could be obtained by sampling at the same time of day within a 3 h time window, and this is recommended. For parameters with a strong diel variation, such as dissolved oxygen, the value obtained, and thus possibly the WFD classification, can depend markedly on when in the cycle the sample was taken. Specifying this in the sampling regime would be a straightforward way to improve precision, but there needs to be agreement about how best to characterise risk in different types of river. These results suggest that in some cases it will be difficult to assign accurate WFD chemical classes or to detect likely trends using current sampling regimes, even for these largely groundwater-fed rivers. A more critical approach to sampling is needed to ensure that management actions are appropriate and supported by data.
Resumo:
This paper presents an approximate closed form sample size formula for determining non-inferiority in active-control trials with binary data. We use the odds-ratio as the measure of the relative treatment effect, derive the sample size formula based on the score test and compare it with a second, well-known formula based on the Wald test. Both closed form formulae are compared with simulations based on the likelihood ratio test. Within the range of parameter values investigated, the score test closed form formula is reasonably accurate when non-inferiority margins are based on odds-ratios of about 0.5 or above and when the magnitude of the odds ratio under the alternative hypothesis lies between about 1 and 2.5. The accuracy generally decreases as the odds ratio under the alternative hypothesis moves upwards from 1. As the non-inferiority margin odds ratio decreases from 0.5, the score test closed form formula increasingly overestimates the sample size irrespective of the magnitude of the odds ratio under the alternative hypothesis. The Wald test closed form formula is also reasonably accurate in the cases where the score test closed form formula works well. Outside these scenarios, the Wald test closed form formula can either underestimate or overestimate the sample size, depending on the magnitude of the non-inferiority margin odds ratio and the odds ratio under the alternative hypothesis. Although neither approximation is accurate for all cases, both approaches lead to satisfactory sample size calculation for non-inferiority trials with binary data where the odds ratio is the parameter of interest.
Resumo:
Pasture-based ruminant production systems are common in certain areas of the world, but energy evaluation in grazing cattle is performed with equations developed, in their majority, with sheep or cattle fed total mixed rations. The aim of the current study was to develop predictions of metabolisable energy (ME) concentrations in fresh-cut grass offered to non-pregnant non-lactating cows at maintenance energy level, which may be more suitable for grazing cattle. Data were collected from three digestibility trials performed over consecutive grazing seasons. In order to cover a range of commercial conditions and data availability in pasture-based systems, thirty-eight equations for the prediction of energy concentrations and ratios were developed. An internal validation was performed for all equations and also for existing predictions of grass ME. Prediction error for ME using nutrient digestibility was lowest when gross energy (GE) or organic matter digestibilities were used as sole predictors, while the addition of grass nutrient contents reduced the difference between predicted and actual values, and explained more variation. Addition of N, GE and diethyl ether extract (EE) contents improved accuracy when digestible organic matter in DM was the primary predictor. When digestible energy was the primary explanatory variable, prediction error was relatively low, but addition of water-soluble carbohydrates, EE and acid-detergent fibre contents of grass decreased prediction error. Equations developed in the current study showed lower prediction errors when compared with those of existing equations, and may thus allow for an improved prediction of ME in practice, which is critical for the sustainability of pasture-based systems.
Resumo:
An important application of Big Data Analytics is the real-time analysis of streaming data. Streaming data imposes unique challenges to data mining algorithms, such as concept drifts, the need to analyse the data on the fly due to unbounded data streams and scalable algorithms due to potentially high throughput of data. Real-time classification algorithms that are adaptive to concept drifts and fast exist, however, most approaches are not naturally parallel and are thus limited in their scalability. This paper presents work on the Micro-Cluster Nearest Neighbour (MC-NN) classifier. MC-NN is based on an adaptive statistical data summary based on Micro-Clusters. MC-NN is very fast and adaptive to concept drift whilst maintaining the parallel properties of the base KNN classifier. Also MC-NN is competitive compared with existing data stream classifiers in terms of accuracy and speed.
Resumo:
Geospatial information of many kinds, from topographic maps to scientific data, is increasingly being made available through web mapping services. These allow georeferenced map images to be served from data stores and displayed in websites and geographic information systems, where they can be integrated with other geographic information. The Open Geospatial Consortium’s Web Map Service (WMS) standard has been widely adopted in diverse communities for sharing data in this way. However, current services typically provide little or no information about the quality or accuracy of the data they serve. In this paper we will describe the design and implementation of a new “quality-enabled” profile of WMS, which we call “WMS-Q”. This describes how information about data quality can be transmitted to the user through WMS. Such information can exist at many levels, from entire datasets to individual measurements, and includes the many different ways in which data uncertainty can be expressed. We also describe proposed extensions to the Symbology Encoding specification, which include provision for visualizing uncertainty in raster data in a number of different ways, including contours, shading and bivariate colour maps. We shall also describe new open-source implementations of the new specifications, which include both clients and servers.
Resumo:
4-Dimensional Variational Data Assimilation (4DVAR) assimilates observations through the minimisation of a least-squares objective function, which is constrained by the model flow. We refer to 4DVAR as strong-constraint 4DVAR (sc4DVAR) in this thesis as it assumes the model is perfect. Relaxing this assumption gives rise to weak-constraint 4DVAR (wc4DVAR), leading to a different minimisation problem with more degrees of freedom. We consider two wc4DVAR formulations in this thesis, the model error formulation and state estimation formulation. The 4DVAR objective function is traditionally solved using gradient-based iterative methods. The principle method used in Numerical Weather Prediction today is the Gauss-Newton approach. This method introduces a linearised `inner-loop' objective function, which upon convergence, updates the solution of the non-linear `outer-loop' objective function. This requires many evaluations of the objective function and its gradient, which emphasises the importance of the Hessian. The eigenvalues and eigenvectors of the Hessian provide insight into the degree of convexity of the objective function, while also indicating the difficulty one may encounter while iterative solving 4DVAR. The condition number of the Hessian is an appropriate measure for the sensitivity of the problem to input data. The condition number can also indicate the rate of convergence and solution accuracy of the minimisation algorithm. This thesis investigates the sensitivity of the solution process minimising both wc4DVAR objective functions to the internal assimilation parameters composing the problem. We gain insight into these sensitivities by bounding the condition number of the Hessians of both objective functions. We also precondition the model error objective function and show improved convergence. We show that both formulations' sensitivities are related to error variance balance, assimilation window length and correlation length-scales using the bounds. We further demonstrate this through numerical experiments on the condition number and data assimilation experiments using linear and non-linear chaotic toy models.
Resumo:
Ocean prediction systems are now able to analyse and predict temperature, salinity and velocity structures within the ocean by assimilating measurements of the ocean’s temperature and salinity into physically based ocean models. Data assimilation combines current estimates of state variables, such as temperature and salinity, from a computational model with measurements of the ocean and atmosphere in order to improve forecasts and reduce uncertainty in the forecast accuracy. Data assimilation generally works well with ocean models away from the equator but has been found to induce vigorous and unrealistic overturning circulations near the equator. A pressure correction method was developed at the University of Reading and the Met Office to control these circulations using ideas from control theory and an understanding of equatorial dynamics. The method has been used for the last 10 years in seasonal forecasting and ocean prediction systems at the Met Office and European Center for Medium-range Weather Forecasting (ECMWF). It has been an important element in recent re-analyses of the ocean heat uptake that mitigates climate change.