909 resultados para Data accuracy
Resumo:
We consider the problem of stable determination of a harmonic function from knowledge of the solution and its normal derivative on a part of the boundary of the (bounded) solution domain. The alternating method is a procedure to generate an approximation to the harmonic function from such Cauchy data and we investigate a numerical implementation of this procedure based on Fredholm integral equations and Nyström discretization schemes, which makes it possible to perform a large number of iterations (millions) with minor computational cost (seconds) and high accuracy. Moreover, the original problem is rewritten as a fixed point equation on the boundary, and various other direct regularization techniques are discussed to solve that equation. We also discuss how knowledge of the smoothness of the data can be used to further improve the accuracy. Numerical examples are presented showing that accurate approximations of both the solution and its normal derivative can be obtained with much less computational time than in previous works.
Resumo:
Productivity at the macro level is a complex concept but also arguably the most appropriate measure of economic welfare. Currently, there is limited research available on the various approaches that can be used to measure it and especially on the relative accuracy of said approaches. This thesis has two main objectives: firstly, to detail some of the most common productivity measurement approaches and assess their accuracy under a number of conditions and secondly, to present an up-to-date application of productivity measurement and provide some guidance on selecting between sometimes conflicting productivity estimates. With regards to the first objective, the thesis provides a discussion on the issues specific to macro-level productivity measurement and on the strengths and weaknesses of the three main types of approaches available, namely index-number approaches (represented by Growth Accounting), non-parametric distance functions (DEA-based Malmquist indices) and parametric production functions (COLS- and SFA-based Malmquist indices). The accuracy of these approaches is assessed through simulation analysis, which provided some interesting findings. Probably the most important were that deterministic approaches are quite accurate even when the data is moderately noisy, that no approaches were accurate when noise was more extensive, that functional form misspecification has a severe negative effect in the accuracy of the parametric approaches and finally that increased volatility in inputs and prices from one period to the next adversely affects all approaches examined. The application was based on the EU KLEMS (2008) dataset and revealed that the different approaches do in fact result in different productivity change estimates, at least for some of the countries assessed. To assist researchers in selecting between conflicting estimates, a new, three step selection framework is proposed, based on findings of simulation analyses and established diagnostics/indicators. An application of this framework is also provided, based on the EU KLEMS dataset.
Resumo:
In the face of global population growth and the uneven distribution of water supply, a better knowledge of the spatial and temporal distribution of surface water resources is critical. Remote sensing provides a synoptic view of ongoing processes, which addresses the intricate nature of water surfaces and allows an assessment of the pressures placed on aquatic ecosystems. However, the main challenge in identifying water surfaces from remotely sensed data is the high variability of spectral signatures, both in space and time. In the last 10 years only a few operational methods have been proposed to map or monitor surface water at continental or global scale, and each of them show limitations. The objective of this study is to develop and demonstrate the adequacy of a generic multi-temporal and multi-spectral image analysis method to detect water surfaces automatically, and to monitor them in near-real-time. The proposed approach, based on a transformation of the RGB color space into HSV, provides dynamic information at the continental scale. The validation of the algorithm showed very few omission errors and no commission errors. It demonstrates the ability of the proposed algorithm to perform as effectively as human interpretation of the images. The validation of the permanent water surface product with an independent dataset derived from high resolution imagery, showed an accuracy of 91.5% and few commission errors. Potential applications of the proposed method have been identified and discussed. The methodology that has been developed 27 is generic: it can be applied to sensors with similar bands with good reliability, and minimal effort. Moreover, this experiment at continental scale showed that the methodology is efficient for a large range of environmental conditions. Additional preliminary tests over other continents indicate that the proposed methodology could also be applied at the global scale without too many difficulties
Resumo:
The research presented in this paper is part of an ongoing investigation into how best to incorporate speech-based input within mobile data collection applications. In our previous work [1], we evaluated the ability of a single speech recognition engine to support accurate, mobile, speech-based data input. Here, we build on our previous research to compare the achievable speaker-independent accuracy rates of a variety of speech recognition engines; we also consider the relative effectiveness of different speech recognition engine and microphone pairings in terms of their ability to support accurate text entry under realistic mobile conditions of use. Our intent is to provide some initial empirical data derived from mobile, user-based evaluations to support technological decisions faced by developers of mobile applications that would benefit from, or require, speech-based data entry facilities.
Resumo:
Optical data communication systems are prone to a variety of processes that modify the transmitted signal, and contribute errors in the determination of 1s from 0s. This is a difficult, and commercially important, problem to solve. Errors must be detected and corrected at high speed, and the classifier must be very accurate; ideally it should also be tunable to the characteristics of individual communication links. We show that simple single layer neural networks may be used to address these problems, and examine how different input representations affect the accuracy of bit error correction. Our results lead us to conclude that a system based on these principles can perform at least as well as an existing non-trainable error correction system, whilst being tunable to suit the individual characteristics of different communication links.
Resumo:
One of the main challenges of classifying clinical data is determining how to handle missing features. Most research favours imputing of missing values or neglecting records that include missing data, both of which can degrade accuracy when missing values exceed a certain level. In this research we propose a methodology to handle data sets with a large percentage of missing values and with high variability in which particular data are missing. Feature selection is effected by picking variables sequentially in order of maximum correlation with the dependent variable and minimum correlation with variables already selected. Classification models are generated individually for each test case based on its particular feature set and the matching data values available in the training population. The method was applied to real patients' anonymous mental-health data where the task was to predict the suicide risk judgement clinicians would give for each patient's data, with eleven possible outcome classes: zero to ten, representing no risk to maximum risk. The results compare favourably with alternative methods and have the advantage of ensuring explanations of risk are based only on the data given, not imputed data. This is important for clinical decision support systems using human expertise for modelling and explaining predictions.
Resumo:
This article presents a new method for data collection in regional dialectology based on site-restricted web searches. The method measures the usage and determines the distribution of lexical variants across a region of interest using common web search engines, such as Google or Bing. The method involves estimating the proportions of the variants of a lexical alternation variable over a series of cities by counting the number of webpages that contain the variants on newspaper websites originating from these cities through site-restricted web searches. The method is evaluated by mapping the 26 variants of 10 lexical variables with known distributions in American English. In almost all cases, the maps based on site-restricted web searches align closely with traditional dialect maps based on data gathered through questionnaires, demonstrating the accuracy of this method for the observation of regional linguistic variation. However, unlike collecting dialect data using traditional methods, which is a relatively slow process, the use of site-restricted web searches allows for dialect data to be collected from across a region as large as the United States in a matter of days.
Resumo:
Aim: To evaluate OneTouch® Verio™ test strip performance at hypoglycaemic blood glucose (BG) levels (<3.9mmol/L [<70mg/dL]) at seven clinical studies. Methods: Trained clinical staff performed duplicate capillary BG monitoring system tests on 700 individuals with type 1 and type 2 diabetes using blood from a single fingerstick lancing. BG reference values were obtained using a YSI 2300 STAT™ Glucose Analyzer. The number and percentage of BG values within ±0.83. mmol/L (±15. mg/dL) and ±0.56. mmol/L (±10. mg/dL) were calculated at BG concentrations of <3.9. mmol/L (<70. mg/dL), <3.3. mmol/L (<60. mg/dL), and <2.8. mmol/L (<50. mg/dL). Results: At BG concentrations <3.9. mmol/L (<70. mg/dL), 674/674 (100%) of meter results were within ±0.83. mmol/L (±15. mg/dL) and 666/674 (98.8%) were within ±0.56. mmol/L (±10. mg/dL) of reference values. At BG concentrations <3.3. mmol/L (<60. mg/dL), and <2.8. mmol/L (<50. mg/dL), 358/358 (100%) and 270/270 (100%) were within ±0.56. mmol/L (±10. mg/dL) of reference values, respectively. Conclusion: In this analysis of data from seven independent studies, OneTouch Verio test strips provide highly accurate results at hypoglycaemic BG levels. © 2012 Elsevier Ireland Ltd.
Resumo:
A spatial object consists of data assigned to points in a space. Spatial objects, such as memory states and three dimensional graphical scenes, are diverse and ubiquitous in computing. We develop a general theory of spatial objects by modelling abstract data types of spatial objects as topological algebras of functions. One useful algebra is that of continuous functions, with operations derived from operations on space and data, and equipped with the compact-open topology. Terms are used as abstract syntax for defining spatial objects and conditional equational specifications are used for reasoning. We pose a completeness problem: Given a selection of operations on spatial objects, do the terms approximate all the spatial objects to arbitrary accuracy? We give some general methods for solving the problem and consider their application to spatial objects with real number attributes. © 2011 British Computer Society.
Resumo:
Dimensionality reduction is a very important step in the data mining process. In this paper, we consider feature extraction for classification tasks as a technique to overcome problems occurring because of “the curse of dimensionality”. Three different eigenvector-based feature extraction approaches are discussed and three different kinds of applications with respect to classification tasks are considered. The summary of obtained results concerning the accuracy of classification schemes is presented with the conclusion about the search for the most appropriate feature extraction method. The problem how to discover knowledge needed to integrate the feature extraction and classification processes is stated. A decision support system to aid in the integration of the feature extraction and classification processes is proposed. The goals and requirements set for the decision support system and its basic structure are defined. The means of knowledge acquisition needed to build up the proposed system are considered.
Resumo:
The sheer volume of citizen weather data collected and uploaded to online data hubs is immense. However as with any citizen data it is difficult to assess the accuracy of the measurements. Within this project we quantify just how much data is available, where it comes from, the frequency at which it is collected, and the types of automatic weather stations being used. We also list the numerous possible sources of error and uncertainty within citizen weather observations before showing evidence of such effects in real data. A thorough intercomparison field study was conducted, testing popular models of citizen weather stations. From this study we were able to parameterise key sources of bias. Most significantly the project develops a complete quality control system through which citizen air temperature observations can be passed. The structure of this system was heavily informed by the results of the field study. Using a Bayesian framework the system learns and updates its estimates of the calibration and radiation-induced biases inherent to each station. We then show the benefit of correcting for these learnt biases over using the original uncorrected data. The system also attaches an uncertainty estimate to each observation, which would provide real world applications that choose to incorporate such observations with a measure on which they may base their confidence in the data. The system relies on interpolated temperature and radiation observations from neighbouring professional weather stations for which a Bayesian regression model is used. We recognise some of the assumptions and flaws of the developed system and suggest further work that needs to be done to bring it to an operational setting. Such a system will hopefully allow applications to leverage the additional value citizen weather data brings to longstanding professional observing networks.
Resumo:
This paper presents the results of our data mining study of Pb-Zn (lead-zinc) ore assay records from a mine enterprise in Bulgaria. We examined the dataset, cleaned outliers, visualized the data, and created dataset statistics. A Pb-Zn cluster data mining model was created for segmentation and prediction of Pb-Zn ore assay data. The Pb-Zn cluster data model consists of five clusters and DMX queries. We analyzed the Pb-Zn cluster content, size, structure, and characteristics. The set of the DMX queries allows for browsing and managing the clusters, as well as predicting ore assay records. A testing and validation of the Pb-Zn cluster data mining model was developed in order to show its reasonable accuracy before beingused in a production environment. The Pb-Zn cluster data mining model can be used for changes of the mine grinding and floatation processing parameters in almost real-time, which is important for the efficiency of the Pb-Zn ore beneficiation process. ACM Computing Classification System (1998): H.2.8, H.3.3.
Resumo:
Diabetes patients might suffer from an unhealthy life, long-term treatment and chronic complicated diseases. The decreasing hospitalization rate is a crucial problem for health care centers. This study combines the bagging method with base classifier decision tree and costs-sensitive analysis for diabetes patients' classification purpose. Real patients' data collected from a regional hospital in Thailand were analyzed. The relevance factors were selected and used to construct base classifier decision tree models to classify diabetes and non-diabetes patients. The bagging method was then applied to improve accuracy. Finally, asymmetric classification cost matrices were used to give more alternative models for diabetes data analysis.
Resumo:
Objective: To test the practicality and effectiveness of cheap, ubiquitous, consumer-grade smartphones to discriminate Parkinson’s disease (PD) subjects from healthy controls, using self-administered tests of gait and postural sway. Background: Existing tests for the diagnosis of PD are based on subjective neurological examinations, performed in-clinic. Objective movement symptom severity data, collected using widely-accessible technologies such as smartphones, would enable the remote characterization of PD symptoms based on self-administered, behavioral tests. Smartphones, when backed up by interviews using web-based videoconferencing, could make it feasible for expert neurologists to perform diagnostic testing on large numbers of individuals at low cost. However, to date, the compliance rate of testing using smart-phones has not been assessed. Methods: We conducted a one-month controlled study with twenty participants, comprising 10 PD subjects and 10 controls. All participants were provided identical LG Optimus S smartphones, capable of recording tri-axial acceleration. Using these smartphones, patients conducted self-administered, short (less than 5 minute) controlled gait and postural sway tests. We analyzed a wide range of summary measures of gait and postural sway from the accelerometry data. Using statistical machine learning techniques, we identified discriminating patterns in the summary measures in order to distinguish PD subjects from controls. Results: Compliance was high all 20 participants performed an average of 3.1 tests per day for the duration of the study. Using this test data, we demonstrated cross-validated sensitivity of 98% and specificity of 98% in discriminating PD subjects from healthy controls. Conclusions: Using consumer-grade smartphone accelerometers, it is possible to distinguish PD from healthy controls with high accuracy. Since these smartphones are inexpensive (around $30 each) and easily available, and the tests are highly non-invasive and objective, we envisage that this kind of smartphone-based testing could radically increase the reach and effectiveness of experts in diagnosing PD.
Resumo:
The seminal multiple-view stereo benchmark evaluations from Middlebury and by Strecha et al. have played a major role in propelling the development of multi-view stereopsis (MVS) methodology. The somewhat small size and variability of these data sets, however, limit their scope and the conclusions that can be derived from them. To facilitate further development within MVS, we here present a new and varied data set consisting of 80 scenes, seen from 49 or 64 accurate camera positions. This is accompanied by accurate structured light scans for reference and evaluation. In addition all images are taken under seven different lighting conditions. As a benchmark and to validate the use of our data set for obtaining reasonable and statistically significant findings about MVS, we have applied the three state-of-the-art MVS algorithms by Campbell et al., Furukawa et al., and Tola et al. to the data set. To do this we have extended the evaluation protocol from the Middlebury evaluation, necessitated by the more complex geometry of some of our scenes. The data set and accompanying evaluation framework are made freely available online. Based on this evaluation, we are able to observe several characteristics of state-of-the-art MVS, e.g. that there is a tradeoff between the quality of the reconstructed 3D points (accuracy) and how much of an object’s surface is captured (completeness). Also, several issues that we hypothesized would challenge MVS, such as specularities and changing lighting conditions did not pose serious problems. Our study finds that the two most pressing issues for MVS are lack of texture and meshing (forming 3D points into closed triangulated surfaces).