909 results for "Data accuracy"
Abstract:
Data mining is one of the most active research areas today, with a wide variety of applications that touch everyday life. It is concerned with finding interesting hidden patterns in large historical databases. For example, from a sales database one might find a pattern such as "people who buy magazines tend to buy newspapers as well"; from a sales point of view, the advantage is that these items can be placed together in the shop to increase sales. In this research work, data mining is applied to the domain of placement chance prediction, since making a wise career decision is crucial for anyone. In India, technical manpower analysis is carried out by the National Technical Manpower Information System (NTMIS), established in 1983-84 by India's Ministry of Education & Culture. The NTMIS comprises a lead centre at the IAMR, New Delhi, and 21 nodal centres located in different parts of the country. The Kerala State Nodal Centre is located at Cochin University of Science and Technology; it collects placement information by sending postal questionnaires to graduated students on a regular basis. From the raw data available at the nodal centre, a historical database was prepared. Each record in this database includes entrance rank range, reservation, sector, sex, and a particular engineering branch. For each such combination of attributes from the student records, the corresponding placement chance is computed and stored in the database. From this data, various popular data mining models are built and tested; these models can be used to predict the most suitable branch for a new student matching one of the above combinations of criteria. A detailed performance comparison of the various data mining models is also carried out. This research work proposes a combination of data mining models, namely a hybrid stacking ensemble, for better predictions. Strategies are also proposed to predict the overall absorption rate for the various branches, as well as the time it takes for all the students of a particular branch to be placed. Finally, this research work puts forward a new data mining algorithm, C4.5*stat, for numeric data sets, which has been shown to achieve competitive accuracy on the standard UCI benchmark data sets, and proposes an optimization strategy called parameter tuning to improve the standard C4.5 algorithm. In summary, this research work covers all four dimensions of a typical data mining research effort: application to a domain, development of classifier models, optimization, and ensemble methods.
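The hybrid stacking ensemble mentioned above can be pictured with a minimal sketch using scikit-learn. The feature encoding, the toy data, and the choice of base learners and meta-learner below are hypothetical illustrations, not the thesis's actual models:

    # Minimal sketch of a stacking ensemble for placement prediction.
    # All data and learner choices are hypothetical placeholders.
    from sklearn.ensemble import StackingClassifier, RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.linear_model import LogisticRegression
    import pandas as pd

    # Hypothetical history database: one row per attribute combination.
    df = pd.DataFrame({
        "rank_range":  [1, 1, 2, 3, 2, 3],   # encoded entrance-rank band
        "reservation": [0, 1, 0, 1, 1, 0],
        "sector":      [0, 0, 1, 1, 0, 1],
        "sex":         [0, 1, 0, 1, 1, 0],
        "branch":      [0, 1, 2, 0, 1, 2],   # engineering branch code
        "placed":      [1, 0, 1, 1, 0, 0],   # placement outcome
    })
    X, y = df.drop(columns="placed"), df["placed"]

    # Heterogeneous base learners under a logistic-regression meta-learner.
    stack = StackingClassifier(
        estimators=[("tree", DecisionTreeClassifier(max_depth=4)),
                    ("forest", RandomForestClassifier(n_estimators=50)),
                    ("nb", GaussianNB())],
        final_estimator=LogisticRegression(),
        cv=2,
    )
    stack.fit(X, y)

    # Estimated placement chance for a new attribute combination.
    new_student = pd.DataFrame([{"rank_range": 2, "reservation": 1,
                                 "sector": 0, "sex": 1, "branch": 2}])
    print(stack.predict_proba(new_student))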
Abstract:
Seafloor imagery is a rich source of data for the study of biological and geological processes. Among several applications, still images of the ocean floor can be used to build image composites referred to as photo-mosaics. Photo-mosaics provide a wide-area visual representation of the benthos and enable applications as diverse as geological surveys, mapping, and detection of temporal changes in the morphology of biodiversity. We present an approach for creating globally aligned photo-mosaics using 3D position estimates provided by navigation sensors available in deep-water surveys. Without image registration, such navigation data do not provide enough accuracy to produce useful composite images. Results from a challenging data set of the Lucky Strike vent field at the Mid-Atlantic Ridge are reported.
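The fusion of navigation estimates with registration constraints can be pictured as a weighted least-squares problem. A minimal sketch, reducing the mosaic to 1D image positions with hypothetical numbers (the paper's actual 3D, image-based formulation is much richer):

    # Minimal sketch: globally align image positions by least squares,
    # fusing weak absolute constraints (noisy navigation) with strong
    # relative constraints (pairwise registration offsets).
    import numpy as np

    nav = np.array([0.0, 5.2, 9.7, 15.3])            # navigation estimates (m)
    pairs = [(0, 1, 5.0), (1, 2, 5.0), (2, 3, 5.0)]  # registration offsets (m)

    rows, rhs, w = [], [], []
    for i, x in enumerate(nav):                      # absolute, low weight
        r = np.zeros(4); r[i] = 1
        rows.append(r); rhs.append(x); w.append(0.1)
    for i, j, dx in pairs:                           # relative, high weight
        r = np.zeros(4); r[i], r[j] = -1, 1
        rows.append(r); rhs.append(dx); w.append(1.0)

    A = np.array(rows) * np.sqrt(np.array(w))[:, None]
    b = np.array(rhs) * np.sqrt(np.array(w))
    positions = np.linalg.lstsq(A, b, rcond=None)[0]
    print(positions)        # globally consistent image positions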
Abstract:
Flood modelling of urban areas is still at an early stage, partly because until recently topographic data of sufficiently high resolution and accuracy have been lacking in urban areas. However, Digital Surface Models (DSMs) generated from airborne scanning laser altimetry (LiDAR) having sub-metre spatial resolution have now become available, and these are able to represent the complexities of urban topography. The paper describes the development of a LiDAR post-processor for urban flood modelling based on the fusion of LiDAR and digital map data. The map data are used in conjunction with LiDAR data to identify different object types in urban areas, though pattern recognition techniques are also employed. Post-processing produces a Digital Terrain Model (DTM) for use as model bathymetry, and also a friction parameter map for use in estimating spatially distributed friction coefficients. In vegetated areas, friction is estimated from LiDAR-derived vegetation height, and (unlike most vegetation removal software) the method copes with short vegetation less than ~1 m high, which may occupy a substantial fraction of even an urban floodplain. The DTM and friction parameter map may also be used to help to generate an unstructured mesh of a vegetated urban floodplain for use by a 2D finite element model. The mesh is decomposed to reflect floodplain features having different frictional properties to their surroundings, including urban features such as buildings and roads as well as taller vegetation features such as trees and hedges. This allows a more accurate estimation of local friction. The method produces a substantial node density due to the small dimensions of many urban features.
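The friction parameter map step can be sketched as a per-cell mapping from LiDAR-derived vegetation height to a Manning's n roughness value. The thresholds and n values below are illustrative placeholders, not the paper's calibrated relations:

    # Minimal sketch of a spatially distributed friction (Manning's n) map
    # derived from a vegetation-height raster. Values are hypothetical.
    import numpy as np

    def manning_n_from_vegetation(veg_height_m: np.ndarray) -> np.ndarray:
        """Assign Manning's n per cell from vegetation height (metres)."""
        n = np.full(veg_height_m.shape, 0.035)      # default: short grass
        n = np.where(veg_height_m > 0.2, 0.06, n)   # shrubs / tall grass
        n = np.where(veg_height_m > 2.0, 0.10, n)   # trees and hedges
        return n

    # Hypothetical 1 m resolution vegetation-height raster (metres).
    veg = np.array([[0.05, 0.4, 3.2],
                    [0.0,  1.1, 0.3]])
    print(manning_n_from_vegetation(veg))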
Abstract:
Flood extent maps derived from SAR images are a useful source of data for validating hydraulic models of river flood flow. The accuracy of such maps is reduced by a number of factors, including changes in returns from the water surface caused by different meteorological conditions and the presence of emergent vegetation. The paper describes how improved accuracy can be achieved by modifying an existing flood extent delineation algorithm to use airborne laser altimetry (LiDAR) as well as SAR data. The LiDAR data provide an additional constraint that waterline (land-water boundary) heights should vary smoothly along the flooded reach. The method was tested on a SAR image of a flood for which contemporaneous aerial photography existed, together with LiDAR data of the unflooded reach. Waterline heights of the SAR flood extent conditioned on both SAR and LiDAR data matched the corresponding heights from the aerial photo waterline significantly more closely than those from the SAR flood extent conditioned only on SAR data.
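The smoothness constraint can be illustrated by regularizing waterline heights along the reach. A simple moving average stands in here for the paper's actual conditioning of the SAR flood extent on LiDAR heights; all numbers are synthetic:

    # Minimal sketch: enforce smooth variation of waterline heights
    # along the flooded reach with a moving-average filter.
    import numpy as np

    def smooth_waterline(chainage_m, height_m, window=5):
        """Moving-average smoothing of heights ordered by chainage."""
        order = np.argsort(chainage_m)
        h = np.asarray(height_m, float)[order]
        h_pad = np.pad(h, window // 2, mode="edge")   # repeat end values
        h_smooth = np.convolve(h_pad, np.ones(window) / window, mode="valid")
        return np.asarray(chainage_m)[order], h_smooth

    chainage = np.linspace(0, 900, 10)                # distance along reach (m)
    rng = np.random.default_rng(0)
    heights = 10 + 0.001 * chainage + rng.normal(0, 0.15, 10)  # noisy heights
    print(smooth_waterline(chainage, heights)[1])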
Abstract:
The long-term stability, high accuracy, all-weather capability, high vertical resolution, and global coverage of Global Navigation Satellite System (GNSS) radio occultation (RO) suggest that it is a promising tool for global monitoring of atmospheric temperature change. With the aim of investigating and quantifying how well a GNSS RO observing system can detect climate trends, we are currently performing a (climate) observing system simulation experiment over the 25-year period 2001 to 2025, which involves quasi-realistic modeling of the neutral atmosphere and the ionosphere. We carried out two climate simulations with the general circulation model MAECHAM5 (Middle Atmosphere European Centre/Hamburg Model Version 5) of the MPI-M Hamburg, covering the period 2001–2025: one control run with natural variability only, and one run also including anthropogenic forcings due to greenhouse gases, sulfate aerosols, and tropospheric ozone. On this basis, we perform quasi-realistic simulations of RO observables for a small GNSS receiver constellation (six satellites), state-of-the-art data processing for atmospheric profile retrieval, and a statistical analysis of temperature trends in both the "observed" climatology and the "true" climatology. Here we describe the setup of the experiment and results from a test bed study conducted to obtain a basic set of realistic estimates of observational errors (instrument- and retrieval-processing-related errors) and sampling errors (due to spatial-temporal undersampling). The test bed results, obtained for a typical summer season and compared to the climatic 2001–2025 trends from the MAECHAM5 simulation including anthropogenic forcing, were found encouraging for performing the full 25-year experiment. They indicated that observational and sampling errors (both contributing about 0.2 K) are consistent with recent estimates of these errors from real RO data and that they should be sufficiently small for monitoring expected temperature trends in the global atmosphere over the next 10 to 20 years in most regions of the upper troposphere and lower stratosphere (UTLS). Inspection of the MAECHAM5 trends in different RO-accessible atmospheric parameters (microwave refractivity and pressure/geopotential height in addition to temperature) indicates complementary climate change sensitivity in different regions of the UTLS, so that optimized climate monitoring should combine information from all climatic key variables retrievable from GNSS RO data.
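The trend-detection step reduces to fitting a linear trend to the "observed" climatology and comparing it against the combined error budget. A minimal sketch with synthetic numbers (the ~0.2 K error contributions are taken from the abstract; the temperature series itself is hypothetical):

    # Minimal sketch: linear trend in a 25-year series of seasonal-mean
    # UTLS temperatures versus combined observational + sampling error.
    import numpy as np

    years = np.arange(2001, 2026)                 # the 25-year experiment span
    rng = np.random.default_rng(7)
    temp = 210 + 0.03 * (years - 2001) + rng.normal(0, 0.2, years.size)  # K

    trend = np.polyfit(years, temp, 1)[0]         # K per year
    err = np.hypot(0.2, 0.2)                      # ~0.2 K each, per abstract
    print(f"trend: {10 * trend:.2f} K/decade; combined error: {err:.2f} K")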
Abstract:
Using a flexible chemical box model with full heterogeneous chemistry, intercepts of chemically modified Langley plots have been computed for the 5 years of zenith-sky NO2 data from Faraday in Antarctica (65°S). By using these intercepts as the effective amount in the reference spectrum, drifts in the zero of total vertical NO2 were much reduced. The error in the zero of total NO2 is ±0.03×10¹⁵ molec cm⁻² from one year to another. This error is small enough to determine trends in midsummer and any variability in denoxification between midwinters. The technique also suggests a more sensitive method for determining N2O5 from zenith-sky NO2 data.
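The underlying Langley-plot idea is a straight-line fit of differential slant column against air-mass factor, whose intercept gives the amount in the reference spectrum; the chemically modified version replaces the straight line with model-computed NO2 variation, which is not reproduced in this sketch. All values below are hypothetical:

    # Minimal sketch of a (plain) Langley regression for zenith-sky NO2:
    # dscd = vcd * amf - scd_ref, so the intercept estimates -scd_ref.
    import numpy as np

    amf = np.array([2.0, 4.0, 6.0, 10.0, 14.0, 17.0])     # air-mass factors
    dscd = np.array([3.9, 10.1, 16.2, 28.0, 40.3, 48.8])  # 10^15 molec cm^-2

    slope, intercept = np.polyfit(amf, dscd, 1)
    print(f"vertical column  ~ {slope:.2f} x 10^15 molec cm^-2")
    print(f"reference amount ~ {-intercept:.2f} x 10^15 molec cm^-2")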
Abstract:
Insect returns from the UK's Doppler weather radars were collected in the summers of 2007 and 2008 to ascertain their usefulness in providing information about boundary-layer winds. Such observations could be assimilated into numerical weather prediction models to improve forecasts of convective showers before precipitation begins. Significant numbers of insect returns were observed during daylight hours on a number of days through this period, when they were detected at up to 30 km range from the radars and up to 2 km above sea level. The range of detectable insect returns was found to vary with time of year and temperature. There was also a very weak correlation with wind speed and direction. Use of a dual-polarized radar revealed that the insects did not orient themselves at random, but showed distinct evidence of common orientation on several days, sometimes at an angle to their direction of travel. Observation-minus-model-background residuals of wind profiles showed greater bias and standard deviation than those of other wind measurement types, which may be due to the insects' headings/airspeeds and to imperfect data extraction. The method used here, similar to the Met Office's procedure for extracting precipitation returns, requires further development, as clutter contamination remained one of the largest error contributors. Wind observations derived from the insect returns would then be useful for data assimilation applications.
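Deriving a wind from such returns rests on the standard velocity-azimuth display (VAD) idea: at low elevation the Doppler radial velocity traces a sinusoid in azimuth whose amplitude and phase give the horizontal wind. A minimal sketch with synthetic radial velocities (not the Met Office's actual processing chain):

    # Minimal sketch of a VAD fit: vr ~ u*sin(az) + v*cos(az).
    import numpy as np

    az = np.deg2rad(np.arange(0.0, 360.0, 10.0))   # beam azimuths
    rng = np.random.default_rng(3)
    vr = 3.0 * np.sin(az) - 4.0 * np.cos(az) + rng.normal(0, 0.3, az.size)

    A = np.column_stack([np.sin(az), np.cos(az)])
    u, v = np.linalg.lstsq(A, vr, rcond=None)[0]
    print(f"u = {u:.1f} m/s, v = {v:.1f} m/s, speed = {np.hypot(u, v):.1f} m/s")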
Abstract:
We have designed and implemented a low-cost digital system using closed-circuit television cameras coupled to a digital acquisition system for the recording of in vivo behavioral data in rodents and for allowing observation and recording of more than 10 animals simultaneously at a reduced cost, as compared with commercially available solutions. This system has been validated using two experimental rodent models: one involving chemically induced seizures and one assessing appetite and feeding. We present observational results showing comparable or improved levels of accuracy and observer consistency between this new system and traditional methods in these experimental models, discuss advantages of the presented system over conventional analog systems and commercially available digital systems, and propose possible extensions to the system and applications to nonrodent studies.
Abstract:
Matheron's usual variogram estimator can result in unreliable variograms when data are strongly asymmetric or skewed. Asymmetry in a distribution can arise from a long tail of values in the underlying process or from outliers that belong to another population that contaminates the primary process. This paper examines the effects of underlying asymmetry on the variogram and on the accuracy of prediction; a second paper examines the effects arising from outliers. Standard geostatistical texts suggest ways of dealing with underlying asymmetry; however, this advice is based on informed intuition rather than detailed investigation. To determine whether the methods generally used to deal with underlying asymmetry are appropriate, the effects of different coefficients of skewness on the shape of the experimental variogram and on the model parameters were investigated. Simulated annealing was used to create normally distributed random fields of different size from variograms with different nugget:sill ratios. These data were then modified to give different degrees of asymmetry, and the experimental variogram was computed in each case. The effects of standard data transformations on the form of the variogram were also investigated. Cross-validation was used to assess quantitatively the performance of the different variogram models for kriging. The results showed that the shape of the variogram was affected by the degree of asymmetry, and that the effect increased as the size of the data set decreased. Transformations of the data were more effective in reducing the skewness coefficient in the larger data sets. Cross-validation confirmed that variogram models from transformed data were more suitable for kriging than those from the raw asymmetric data. The results of this study have implications for the 'standard best practice' in dealing with asymmetry in data for geostatistical analyses.
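The estimator at the centre of this study is the method-of-moments semivariance, gamma(h) = (1/2N(h)) * sum of squared differences over the N(h) point pairs separated by approximately h. A minimal sketch applied to skewed (lognormal) data; lags, tolerance, and data are arbitrary illustrations:

    # Minimal sketch of Matheron's variogram estimator for 2D point data.
    import numpy as np

    def matheron_variogram(coords, values, lags, tol):
        """Empirical semivariance at each lag distance."""
        coords = np.asarray(coords, float)
        z = np.asarray(values, float)
        d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
        gamma = []
        for h in lags:
            i, j = np.where((np.abs(d - h) <= tol) & (d > 0))
            keep = i < j                     # count each pair once
            sq = (z[i[keep]] - z[j[keep]]) ** 2
            gamma.append(sq.mean() / 2 if sq.size else np.nan)
        return np.array(gamma)

    rng = np.random.default_rng(0)
    pts = rng.uniform(0, 100, size=(200, 2))
    vals = rng.lognormal(0, 1, 200)          # strongly skewed, as in the study
    print(matheron_variogram(pts, vals, lags=[10, 20, 30], tol=5))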
Abstract:
With the rapid development in technology over recent years, construction, in common with many areas of industry, has become increasingly complex. It is therefore important to develop and extend the understanding of complexity so that industry in general, and the construction industry in particular, can work with greater accuracy and efficiency and provide clients with a better service. This paper aims to generate a definition of complexity and a method for its measurement in order to assess its influence upon the accuracy of the quantity surveying profession in UK new-build office construction. Quantitative data came from an analysis of twenty projects of varying size and value, and qualitative data came from interviews with professional quantity surveyors. The findings highlight the difficulty of defining and measuring project complexity. The correlation between accuracy and complexity was not straightforward, being subject to many extraneous variables, particularly the impact of project size. Further research is required to develop a better measure of complexity, in order to improve the response of quantity surveyors so that an appropriate level of effort can be applied to individual projects, permitting greater accuracy and enabling better resource planning within the profession.
Abstract:
The proportional odds model provides a powerful tool for analysing ordered categorical data and setting sample size, although for many clinical trials its validity is questionable. The purpose of this paper is to present a new class of constrained odds models which includes the proportional odds model. The efficient score and Fisher's information are derived from the profile likelihood for the constrained odds model. These results are new even for the special case of proportional odds where the resulting statistics define the Mann-Whitney test. A strategy is described involving selecting one of these models in advance, requiring assumptions as strong as those underlying proportional odds, but allowing a choice of such models. The accuracy of the new procedure and its power are evaluated.
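As a reference point, the proportional odds special case can be fitted with standard software. A minimal sketch assuming statsmodels' OrderedModel and simulated trial data (the paper's constrained odds generalization and its score test are not implemented here):

    # Minimal sketch: proportional odds (ordinal logistic) fit on a
    # simulated two-arm trial with an ordered categorical outcome.
    import numpy as np
    import pandas as pd
    from statsmodels.miscmodels.ordinal_model import OrderedModel

    rng = np.random.default_rng(1)
    treat = rng.integers(0, 2, 200)                  # 0 = control, 1 = active
    latent = 0.8 * treat + rng.logistic(size=200)    # latent severity
    outcome = pd.Series(pd.cut(latent, [-np.inf, -0.5, 0.5, 1.5, np.inf],
                        labels=["none", "mild", "moderate", "marked"]))

    res = OrderedModel(outcome, treat.reshape(-1, 1), distr="logit").fit(
        method="bfgs", disp=False)
    print(res.params)    # treatment log odds ratio, then threshold parameters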
Abstract:
This study investigated the relationships between phonological awareness and reading in Oriya and English. Oriya is the official language of Orissa, an eastern state of India; its writing system is an alphasyllabary. Ninety-nine fifth-grade children (mean age 9 years 7 months) were assessed on measures of phonological awareness, word reading and pseudo-word reading in both languages. Forty-eight of the children attended Oriya-medium schools where they received literacy instruction in Oriya from grade 1 and learned English from grade 2. Fifty-one children attended English-medium schools where they received literacy instruction in English from grade 1 and in Oriya from grade 2. The results showed that phonological awareness in Oriya contributed significantly to reading Oriya and English words and pseudo-words for the children in the Oriya-medium schools. However, it only contributed to Oriya pseudo-word reading and English word reading for children in the English-medium schools. Phonological awareness in English contributed to English word and pseudo-word reading for both groups. Further analyses investigated the contribution of awareness of large phonological units (syllables, onsets and rimes) and small phonological units (phonemes) to reading in each language. The data suggest that cross-language transfer and facilitation of phonological awareness to word reading is not symmetrical across languages and may depend both on the characteristics of the different orthographies of the languages being learned and on whether the first literacy language is also the first spoken language.
Abstract:
Once unit-cell dimensions have been determined from a powder diffraction data set and therefore the crystal system is known (e.g. orthorhombic), the method presented by Markvardsen, David, Johnson & Shankland [Acta Cryst. (2001), A57, 47-54] can be used to generate a table ranking the extinction symbols of the given crystal system according to probability. Markvardsen et al. tested a computer program (ExtSym) implementing the method against Pawley refinement outputs generated using the TF12LS program [David, Ibberson & Matthewman (1992). Report RAL-92-032. Rutherford Appleton Laboratory, Chilton, Didcot, Oxon, UK]. Here, it is shown that ExtSym can be used successfully with many well-known powder diffraction analysis packages, namely DASH [David, Shankland, van de Streek, Pidcock, Motherwell & Cole (2006). J. Appl. Cryst. 39, 910-915], FullProf [Rodriguez-Carvajal (1993). Physica B, 192, 55-69], GSAS [Larson & Von Dreele (1994). Report LAUR 86-748. Los Alamos National Laboratory, New Mexico, USA], PRODD [Wright (2004). Z. Kristallogr. 219, 1-11] and TOPAS [Coelho (2003). Bruker AXS GmbH, Karlsruhe, Germany]. In addition, a precise description of the optimal input for ExtSym is given, to enable other software packages to interface with ExtSym and to allow the improvement/modification of existing interfacing scripts. ExtSym takes as input the powder data in the form of integrated intensities and error estimates for these intensities. The output returned by ExtSym is demonstrated to be strongly dependent on the accuracy of these error estimates, and the reason for this is explained. ExtSym is tested against a wide range of data sets, confirming the algorithm to be very successful at ranking the published extinction symbol as the most likely.
Abstract:
An increasing number of neuroscience experiments are using virtual reality to provide a more immersive and less artificial experimental environment. This is particularly useful for navigation and three-dimensional scene perception experiments. Such experiments require accurate real-time tracking of the observer's head in order to render the virtual scene. Here, we present data on the accuracy of a commonly used six-degrees-of-freedom tracker (Intersense IS900) when it is moved in ways typical of virtual reality applications. We compared the reported location of the tracker with its location computed by an optical tracking method. When the tracker was stationary, the root-mean-square error in spatial accuracy was 0.64 mm. However, we found that errors increased over ten-fold (up to 17 mm) when the tracker moved at speeds common in virtual reality applications. We demonstrate that the errors we report here are predominantly due to inaccuracies of the IS900 system rather than of the optical tracking against which it was compared.
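The accuracy figures quoted above correspond to a root-mean-square positional error between the tracker's reported positions and the optical reference. A minimal sketch assuming time-synchronized 3D position samples in millimetres:

    # Minimal sketch: RMS positional error between paired 3D samples.
    import numpy as np

    def rms_position_error(tracker_xyz, reference_xyz):
        """RMS of Euclidean distances between paired 3D positions."""
        diff = np.asarray(tracker_xyz, float) - np.asarray(reference_xyz, float)
        return np.sqrt((np.linalg.norm(diff, axis=1) ** 2).mean())

    tracker = np.array([[0.0, 0.0, 0.0], [10.2, 0.1, 0.0], [20.5, 0.3, 0.1]])
    optical = np.array([[0.1, 0.0, 0.0], [10.0, 0.0, 0.0], [20.0, 0.0, 0.0]])
    print(f"RMS error: {rms_position_error(tracker, optical):.2f} mm")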