Abstract:
Few valid and reliable placement procedures are available to assess the English language proficiency of adults who enroll in English for Speakers of Other Languages (ESOL) programs. Whereas placement material exists for children and university ESOL students, the needs of students in adult community education programs have not been adequately addressed. Furthermore, the research suggests that a number of variables, such as native language, age, prior schooling, length of residence, and employment, are related to second language acquisition. Numerous studies contribute to our understanding of the relationship of these factors to second language acquisition of Spanish-speaking students. However, there is a void in the research investigating the factors affecting second language acquisition, and consequently appropriate placement, of Haitian Creole-speaking students. This study compared a standardized instrument, the NYS Place Test, used alone and in combination with a writing sample in English, to the subjective judgement of a department coordinator for initial placement of Haitian adult ESOL students in a community education program. The study also investigated whether consideration of student profile data improved the accuracy of the test. Finally, the study sought to determine if a relationship existed between student profile data and those who withdrew from the program or did not enter a class after registering. Analysis of the data by crosstabulation and chi-square revealed that the standardized NYS Place Test was at least as accurate as subjective department coordinator placement and that one procedure could be substituted for the other. Although the writing sample in English improved the accuracy of placement by the NYS test, the results were not significant. Of the profile variables, only length of residence was found to be significantly related to accuracy of placement using the NYS Place Test. The number of incorrect placements was higher for those students who lived in the host country from twenty-five to one hundred ten months. A post hoc analysis of NYS test scores according to level showed that those learners who placed in level three also had a significantly higher incidence of incorrect placements. No significant relationship was observed between the profile variables and those who withdrew from the program or registered but did not enter a class.
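To make the crosstabulation and chi-square analysis concrete, a minimal sketch is given below; the counts are invented and scipy is assumed, so this illustrates the technique rather than the study's actual data.

```python
# Hypothetical sketch: comparing placement accuracy of two procedures
# via crosstabulation and a chi-square test (counts are invented).
import numpy as np
from scipy.stats import chi2_contingency

# Rows: placement procedure; columns: correct vs. incorrect placements.
crosstab = np.array([
    [41, 9],   # NYS Place Test
    [38, 12],  # coordinator judgement
])

chi2, p, dof, expected = chi2_contingency(crosstab)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}, dof = {dof}")
# A non-significant p suggests the two procedures may be interchangeable.
```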
Abstract:
Methods for accessing data on the Web have been the focus of active research over the past few years. In this thesis we propose a method for representing Web sites as data sources. We designed Data Extractor, a data retrieval solution that allows us to define queries to Web sites and process the resulting data sets. Data Extractor is being integrated into the MSemODB heterogeneous database management system. With its help, database queries can be distributed over both local and Web data sources within the MSemODB framework. Data Extractor treats Web sites as data sources, controlling query execution and data retrieval, and works as an intermediary between applications and sites. Data Extractor uses a two-fold "custom wrapper" approach for information retrieval: wrappers for the majority of sites are easily built using a powerful and expressive scripting language, while complex cases are handled by Java-based wrappers that use a specially designed library of data retrieval, parsing and Web access routines. In addition to wrapper development, we thoroughly investigate issues associated with Web site selection, analysis and processing. Data Extractor is designed to act as a data retrieval server as well as an embedded data retrieval solution. We also use it to create mobile agents that are shipped over the Internet to the client's computer to perform data retrieval on behalf of the user, which allows Data Extractor to distribute and scale well. This study confirms the feasibility of building custom wrappers for Web sites: the approach provides accurate data retrieval, along with power and flexibility in handling complex cases.
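As an illustration of the custom-wrapper idea, here is a minimal, hypothetical wrapper in Python (standard library only). The URL and table layout are invented, and the thesis's own scripting language and Java library are not reproduced here.

```python
# Minimal, hypothetical "custom wrapper" sketch: treat a Web site as a
# data source by fetching a page and extracting rows from an HTML table.
from html.parser import HTMLParser
from urllib.request import urlopen

class TableWrapper(HTMLParser):
    """Collects the text of <td> cells, one list per <tr>."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_td = [], [], False
    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_td = True
    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
        elif tag == "td":
            self._in_td = False
    def handle_data(self, data):
        if self._in_td and data.strip():
            self._row.append(data.strip())

def query_site(url):
    """Return the site's tabular content as a result set (list of rows)."""
    wrapper = TableWrapper()
    wrapper.feed(urlopen(url).read().decode("utf-8", errors="replace"))
    return wrapper.rows
```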
Abstract:
Until recently, the use of biometrics was restricted to high-security environments and criminal identification applications, for economic and technological reasons. In recent years, however, biometric authentication has become part of people's daily lives. The large-scale use of biometrics has shown that users within a system may exhibit different degrees of accuracy: some people have trouble authenticating, while others are particularly vulnerable to imitation. Recent studies have investigated and identified these user types, giving them the names of animals: Sheep, Goats, Lambs, Wolves, Doves, Chameleons, Worms and Phantoms. The aim of this study is to evaluate the existence of these user types in a fingerprint database and to propose a new way of investigating them, based on verification performance between subjects' samples. After introducing some basic concepts of biometrics and fingerprints, we present the biometric menagerie and how to evaluate it.
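A toy sketch of how such user types can be flagged from per-user match scores follows; the thresholds and the simple labeling rule are invented for illustration (real menagerie analyses use per-user score distributions and percentile criteria).

```python
# Illustrative sketch: flag "goats" (hard to authenticate) and "lambs"
# (easy to imitate) from per-user match scores. Quantile cut-offs are
# invented; this is not the study's actual evaluation protocol.
import numpy as np

def menagerie_labels(genuine, impostor, low=0.25, high=0.75):
    """genuine/impostor: dicts mapping user -> array of match scores."""
    g_means = {u: np.mean(s) for u, s in genuine.items()}
    i_means = {u: np.mean(s) for u, s in impostor.items()}
    g_lo = np.quantile(list(g_means.values()), low)
    i_hi = np.quantile(list(i_means.values()), high)
    labels = {}
    for u in g_means:
        if g_means[u] <= g_lo:
            labels[u] = "goat"   # unusually low genuine scores
        elif i_means[u] >= i_hi:
            labels[u] = "lamb"   # unusually high impostor scores
        else:
            labels[u] = "sheep"  # typical user
    return labels
```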
Abstract:
A mosaic of two WorldView-2 high resolution multispectral images (acquisition dates: October 2010 and April 2012), in conjunction with field survey data, was used to create a habitat map of the Danajon Bank, Philippines (10°15'0'' N, 124°08'0'' E) using an object-based approach. To create the habitat map, we conducted benthic cover (seafloor) field surveys using two methods. First, we undertook georeferenced point intercept transects (English et al., 1997): for ten sites we recorded habitat cover types at 1 m intervals along 10 m transects (n = 2,070 points). Second, we conducted georeferenced spot check surveys, placing a viewing bucket in the water to estimate the percent cover of benthic cover types (n = 2,357 points). Survey locations were chosen to cover a diverse and representative subset of habitats found in the Danajon Bank. The combination of methods was a compromise between the higher accuracy of point intercept transects and the larger sample area achievable through spot check surveys (Roelfsema and Phinn, 2008, doi:10.1117/12.804806). Object-based image analysis, using the field data for calibration, was used to classify the image mosaic at the reef, geomorphic and benthic community levels. The benthic community level segregated the image into a total of 17 pure and mixed benthic classes.
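For the point intercept transects, percent cover reduces to class proportions over the recorded points; a toy computation with invented habitat codes is sketched below.

```python
# Toy sketch: percent benthic cover from point-intercept transect
# records (habitat codes and counts are invented).
from collections import Counter

points = ["coral", "sand", "seagrass", "coral", "rubble", "coral", "sand"]
counts = Counter(points)
total = sum(counts.values())
for habitat, n in counts.most_common():
    print(f"{habitat}: {100 * n / total:.1f}% cover")
```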
Abstract:
The oxygen isotopic composition (d18O) of the calcium carbonate of planktonic calcifying organisms is a key tool for reconstructing both past seawater temperature and salinity. The calibration of palaeoceanographic proxies relies in general on empirical relationships derived from field experiments on extant species. Laboratory experiments have more often than not revealed that variables other than the target parameter influence the proxy signal, which makes proxy calibration a challenging task. Understanding these secondary or "vital" effects is crucial for increasing proxy accuracy. We present data from laboratory experiments showing that oxygen isotope fractionation during calcification in the coccolithophore Calcidiscus leptoporus and the calcareous dinoflagellate Thoracosphaera heimii depends on the carbonate chemistry of seawater in addition to its dependence on temperature. A similar result has previously been reported for planktonic foraminifera, supporting the idea that the [CO3]2- effect on d18O is universal among unicellular calcifying planktonic organisms. The slopes of the d18O/[CO3]2- relationships range between -0.0243 per mil/(µmol/kg) (calcareous dinoflagellate T. heimii) and the previously published -0.0022 per mil/(µmol/kg) (non-symbiotic planktonic foraminifera Orbulina universa), while C. leptoporus has a slope of -0.0048 per mil/(µmol/kg). We present a simple conceptual model, based on the contribution of d18O-enriched [HCO3]- to the [CO3]2- pool in the calcifying vesicle, which can explain the [CO3]2- effect on d18O for the different unicellular calcifiers. This approach provides new insight into biological fractionation in calcifying organisms. The large range in d18O/[CO3]2- slopes should be explored as a means for paleoreconstruction of surface [CO3]2-, particularly through comparison of the response in ecologically similar planktonic organisms.
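Rendered as an equation, the carbonate-ion effect reported above is linear in [CO3]2-; a LaTeX sketch with the quoted slopes follows (intercepts omitted, as they are not given in the abstract).

```latex
% Linear carbonate-ion effect with the slopes quoted above
% (intercepts omitted; requires amsmath):
\[
  \delta^{18}\mathrm{O} = a\,[\mathrm{CO}_3^{2-}] + b, \qquad
  a = \begin{cases}
    -0.0243\ \text{per mil}/(\mu\mathrm{mol/kg}) & \textit{T. heimii}\\[2pt]
    -0.0048\ \text{per mil}/(\mu\mathrm{mol/kg}) & \textit{C. leptoporus}\\[2pt]
    -0.0022\ \text{per mil}/(\mu\mathrm{mol/kg}) & \textit{O. universa}
  \end{cases}
\]
```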
Abstract:
The auditory evoked N1m-P2m response complex presents a challenging case for MEG source-modelling, because symmetrical, phase-locked activity occurs in the hemispheres both contralateral and ipsilateral to stimulation. Beamformer methods, in particular, can be susceptible to localisation bias and spurious sources under these conditions. This study explored the accuracy and efficiency of event-related beamformer source models for auditory MEG data under typical experimental conditions: monaural and diotic stimulation; and whole-head beamformer analysis compared to a half-head analysis using only sensors from the hemisphere contralateral to stimulation. Event-related beamformer localisations were also compared with more traditional single-dipole models. At the group level, the event-related beamformer performed as well as the single-dipole models in terms of accuracy for both the N1m and the P2m, and in terms of efficiency (number of successful source models) for the N1m. The results yielded by the half-head analysis did not differ significantly from those produced by the traditional whole-head analysis. Any localisation bias caused by the presence of correlated sources is minimal in the context of the inter-individual variability in source localisations. In conclusion, event-related beamformers provide a useful alternative to equivalent-current dipole models for localisation of auditory evoked responses.
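For reference, the generic linearly constrained minimum variance (LCMV) beamformer weight computation is sketched below; this is the textbook formula, not the specific event-related implementation evaluated in the study, and the regularisation scheme shown is an assumption.

```python
# Generic LCMV beamformer weights for one source location (a textbook
# formula, not the study's specific event-related implementation):
#   w = C^{-1} L (L^T C^{-1} L)^{-1}
# where C is the sensor covariance and L the forward (lead-field) matrix.
import numpy as np

def lcmv_weights(C, L, reg=0.05):
    """C: (n_sensors, n_sensors) covariance; L: (n_sensors, n_orient)."""
    # Diagonal (Tikhonov-style) regularisation, an assumed choice here.
    Cr = C + reg * np.trace(C) / C.shape[0] * np.eye(C.shape[0])
    Ci = np.linalg.inv(Cr)
    return Ci @ L @ np.linalg.inv(L.T @ Ci @ L)  # (n_sensors, n_orient)

# Usage: source_ts = lcmv_weights(C, L).T @ sensor_data
```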
Abstract:
Energy efficiency and user comfort have recently become priorities in the Facility Management (FM) sector. This has resulted in the use of innovative building components, such as thermal solar panels and heat pumps, as they have the potential to provide better performance, energy savings and increased user comfort. However, as the complexity of components increases, so does the requirement for maintenance management. The standard routine for building maintenance is inspection, which results in repairs or replacement when a fault is found. This routine leads to unnecessary inspections, which have a cost in terms of component downtime and work hours. This research proposes an alternative routine: performing building maintenance at the point in time when the component is degrading and requires maintenance, thus reducing the frequency of unnecessary inspections. This thesis demonstrates that statistical techniques can be used as part of a maintenance management methodology to invoke maintenance before failure occurs. The proposed FM process is presented through a scenario utilising current Building Information Modelling (BIM) technology and innovative contractual and organisational models. This FM scenario supports a Degradation based Maintenance (DbM) scheduling methodology, implemented using two statistical techniques, Particle Filters (PFs) and Gaussian Processes (GPs). DbM consists of extracting and tracking a degradation metric for a component. Limits for the degradation metric are identified based on one of a number of proposed processes, which determine the limits according to the maturity of the historical information available. DbM is implemented for three case study components: a heat exchanger, a heat pump, and a set of bearings. The identified degradation points for each case study, from a PF, a GP and a hybrid (PF and GP combined) DbM implementation, are assessed against known degradation points. The GP implementations are successful for all components. For the PF implementations, the results presented in this thesis find that the extracted metrics and limits identify degradation occurrences accurately for components in continuous operation; for components with seasonal operational periods, the PF may wrongly identify degradation. The GP performs more robustly than the PF, but the PF, on average, results in fewer false positives. The hybrid implementations, which combine GP and PF results, are successful for two of the three case studies and are not affected by seasonal data. Overall, DbM is effectively applied for the three case study components. The accuracy of the implementations depends on the relationships modelled by the PF and GP, and on the type and quantity of data available. This novel maintenance process can improve equipment performance and reduce energy wastage from BSC operation.
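As an illustration of the PF side of the DbM methodology, here is a minimal bootstrap particle filter tracking a degradation metric against a fixed limit; the random-walk dynamics, noise levels and limit are all invented, and the thesis's actual models are not reproduced.

```python
# Minimal bootstrap particle filter sketch for tracking a degradation
# metric (random-walk state, Gaussian observation noise; parameters
# invented). A maintenance alert fires when the estimate crosses a limit.
import numpy as np

rng = np.random.default_rng(0)

def track(observations, n_particles=500, q=0.05, r=0.2, limit=1.0):
    particles = np.zeros(n_particles)
    estimates = []
    for y in observations:
        particles += rng.normal(0.0, q, n_particles)       # propagate
        weights = np.exp(-0.5 * ((y - particles) / r) ** 2)
        weights /= weights.sum()
        idx = rng.choice(n_particles, n_particles, p=weights)
        particles = particles[idx]                         # resample
        est = particles.mean()
        estimates.append(est)
        if est > limit:
            print(f"degradation limit exceeded: {est:.2f}")
    return estimates
```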
Abstract:
We propose a novel method to harmonize diffusion MRI data acquired from multiple sites and scanners, which is imperative for joint analysis of the data to significantly increase sample size and statistical power of neuroimaging studies. Our method incorporates the following main novelties: i) we take into account the scanner-dependent spatial variability of the diffusion signal in different parts of the brain; ii) our method is independent of compartmental modeling of diffusion (e.g., tensor, and intra/extra cellular compartments) and the acquired signal itself is corrected for scanner related differences; and iii) inter-subject variability as measured by the coefficient of variation is maintained at each site. We represent the signal in a basis of spherical harmonics and compute several rotation invariant spherical harmonic features to estimate a region and tissue specific linear mapping between the signal from different sites (and scanners). We validate our method on diffusion data acquired from seven different sites (including two GE, three Philips, and two Siemens scanners) on a group of age-matched healthy subjects. Since the extracted rotation invariant spherical harmonic features depend on the accuracy of the brain parcellation provided by Freesurfer, we propose a feature based refinement of the original parcellation such that it better characterizes the anatomy and provides robust linear mappings to harmonize the dMRI data. We demonstrate the efficacy of our method by statistically comparing diffusion measures such as fractional anisotropy, mean diffusivity and generalized fractional anisotropy across multiple sites before and after data harmonization. We also show results using tract-based spatial statistics before and after harmonization for independent validation of the proposed methodology. Our experimental results demonstrate that, for nearly identical acquisition protocol across sites, scanner-specific differences can be accurately removed using the proposed method.
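A sketch of the rotation invariant spherical harmonic (RISH) idea follows: per-order energies are computed from SH coefficients, and a per-order scaling maps one site's signal towards a reference. The coefficient layout and the simple scaling rule are assumptions for illustration, not the paper's full region- and tissue-specific mapping.

```python
# Sketch of rotation invariant spherical harmonic (RISH) features and a
# per-order linear mapping between sites, in the spirit of the method
# above (coefficient layout assumed: even orders 0, 2, ..., L).
import numpy as np

def rish_features(coeffs, orders=(0, 2, 4, 6, 8)):
    """coeffs: dict order -> array of 2*l+1 SH coefficients (one voxel)."""
    return {l: np.sum(coeffs[l] ** 2) for l in orders}

def harmonize(coeffs, ref_energy, target_energy):
    """Scale each order so the target site's expected RISH energy
    matches the reference site's (region/tissue handling omitted)."""
    out = {}
    for l, c in coeffs.items():
        scale = np.sqrt(ref_energy[l] / max(target_energy[l], 1e-12))
        out[l] = scale * c
    return out
```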
Abstract:
Continuous variables are among the major data types collected by survey organizations. They can be incomplete, so that data collectors need to fill in the missing values, or they can contain sensitive information that needs protection from re-identification. One approach to protecting continuous microdata is to sum values up according to different cells of features. In this thesis, I present novel multiple imputation (MI) methods that can be applied to impute missing values and to synthesize confidential values for continuous and magnitude data.
The first method limits the disclosure risk of continuous microdata whose marginal sums are fixed. The motivation for developing such a method comes from magnitude tables of non-negative integer values in economic surveys. I present approaches based on a mixture of Poisson distributions to describe the multivariate distribution, so that the marginals of the synthetic data are guaranteed to sum to the original totals (a small sketch of this fixed-total property follows this abstract). At the same time, I present methods for assessing the disclosure risks of releasing such synthetic magnitude microdata. An illustration on a survey of manufacturing establishments shows that the disclosure risks are low while the information loss is acceptable.
The second method releases synthetic continuous microdata by a nonstandard MI method. Traditionally, MI fits a model on the confidential values and then generates multiple synthetic datasets from this model. Its disclosure risk tends to be high, especially when the original data contain extreme values. I present a nonstandard MI approach conditioned on protective intervals; its basic idea is to estimate the model parameters from these intervals rather than from the confidential values. The encouraging results of simple simulation studies suggest the potential of this new approach for limiting the posterior disclosure risk.
The third method imputes missing values in continuous and categorical variables. It extends a hierarchically coupled mixture model with local dependence, but separates the variables into non-focused (e.g., almost fully observed) and focused (e.g., largely missing) ones. The sub-model structure of the focused variables is more complex than that of the non-focused ones; their cluster indicators are linked together by tensor factorization, and the focused continuous variables depend locally on non-focused values. The model properties suggest that moving strongly associated non-focused variables to the focused side can help to improve estimation accuracy, which is examined in several simulation studies. Finally, this method is applied to data from the American Community Survey.
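As referenced above, a small illustration of the fixed-total property exploited by the first method: independent Poisson counts conditioned on their sum follow a multinomial distribution, so synthetic non-negative integers that preserve an observed total can be drawn directly. The rates below are invented; the thesis's mixture model is not reproduced.

```python
# Sketch of the fixed-total idea: Poisson counts conditioned on their
# sum are multinomial, so synthetic values preserving an observed total
# can be drawn directly (rates are invented).
import numpy as np

rng = np.random.default_rng(1)

def synthesize(total, rates):
    """Draw non-negative integers with the given expected profile that
    are guaranteed to sum to `total`."""
    p = np.asarray(rates, dtype=float)
    return rng.multinomial(total, p / p.sum())

print(synthesize(120, [3.0, 1.5, 0.5]))  # three cells, always summing to 120
```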
Abstract:
The amount and quality of available biomass is a key factor for sustainable livestock industry and agricultural management related decision making. Globally, 31.5% of land cover is grassland, while 80% of Ireland's agricultural land is grassland. In Ireland, grasslands are intensively managed and provide the cheapest feed source for animals. This dissertation presents a detailed state-of-the-art review of satellite remote sensing of grasslands, and the potential application of optical (Moderate-resolution Imaging Spectroradiometer (MODIS)) and radar (TerraSAR-X) time series imagery to estimate grassland biomass at two study sites (Moorepark and Grange) in the Republic of Ireland using both statistical and state-of-the-art machine learning algorithms. High quality weather data available from the on-site weather station was also used to calculate the Growing Degree Days (GDD) for Grange to determine the impact of ancillary data on biomass estimation (a minimal GDD and regression sketch follows this abstract). In situ and satellite data covering 12 years for Moorepark and 6 years for Grange were used to predict grassland biomass using multiple linear regression and Adaptive Neuro-Fuzzy Inference System (ANFIS) models. The results demonstrate that a dense (8-day composite) MODIS image time series, along with high quality in situ data, can be used to retrieve grassland biomass with high performance (R2 = 0.86, p < 0.05, RMSE = 11.07 for Moorepark). The model for Grange was modified to evaluate the synergistic use of vegetation indices derived from remote sensing time series and accumulated GDD information. As GDD is strongly linked to plant development, or phenological stage, an improvement in biomass estimation would be expected. It was observed that using the ANFIS model the biomass estimation accuracy increased from R2 = 0.76 (p < 0.05) to R2 = 0.81 (p < 0.05) and the root mean square error was reduced by 2.72%. The work on the application of optical remote sensing was further developed using a TerraSAR-X Staring Spotlight mode time series over the Moorepark study site to explore the extent to which very high resolution Synthetic Aperture Radar (SAR) data of interferometrically coherent paddocks can be exploited to retrieve grassland biophysical parameters. After filtering out the non-coherent plots, it is demonstrated that interferometric coherence can be used to retrieve grassland biophysical parameters (i.e., height, biomass), and that it is possible to detect changes due to grass growth, and grazing and mowing events, when the temporal baseline is short (11 days). However, it is not possible to automatically and uniquely identify the cause of these changes based only on the SAR backscatter and coherence, due to the ambiguity caused by tall grass laid down by the wind. Overall, the work presented in this dissertation has demonstrated the potential of dense remote sensing and weather data time series to predict grassland biomass using machine-learning algorithms, where high quality ground data were used for training. At present a major limitation for national scale biomass retrieval is the lack of spatial and temporal ground samples, which can be partially resolved by minor modifications to the existing PastureBaseIreland database, adding the location and extent of each grassland paddock.
As far as remote sensing data requirements are concerned, MODIS is useful for large scale evaluation, but due to its coarse resolution it is not possible to detect variations within and between fields at the farm scale. This issue will be resolved in terms of spatial resolution by the Sentinel-2 mission, and when both satellites (Sentinel-2A and Sentinel-2B) are operational the revisit time will reduce to 5 days, which, together with Landsat-8, should provide sufficient cloud-free data for operational biomass estimation at a national scale. The Synthetic Aperture Radar Interferometry (InSAR) approach is feasible if enough coherent interferometric pairs are available; however, this is difficult to achieve due to temporal decorrelation of the signal. For repeat-pass InSAR over a vegetated area, even an 11-day temporal baseline is too large. To achieve better coherence, a very high resolution is required at the cost of spatial coverage, which limits its scope for use in an operational context at a national scale. Future InSAR missions with pair acquisition in Tandem mode will minimise the temporal decorrelation over vegetated areas for more focused studies. The proposed approach complements the current paradigm of Big Data in Earth Observation, and illustrates the feasibility of integrating data from multiple sources. In the future, this framework can be used to build an operational decision support system for retrieval of grassland biophysical parameters based on data from long term planned optical missions (e.g., Landsat, Sentinel) that will ensure the continuity of data acquisition. Similarly, the Spanish X-band PAZ and TerraSAR-X2 missions will ensure the continuity of TerraSAR-X and COSMO-SkyMed.
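As referenced above, a minimal sketch of GDD accumulation and a simple linear biomass model on a vegetation index plus GDD is given here; all numbers are invented, and the thesis's ANFIS model is not reproduced.

```python
# Sketch: accumulate Growing Degree Days (GDD) and fit a simple linear
# biomass model on NDVI plus accumulated GDD (all data invented).
import numpy as np

def gdd(tmax, tmin, t_base=0.0):
    """Cumulative GDD from daily max/min air temperatures (deg C)."""
    daily = np.maximum((np.asarray(tmax) + np.asarray(tmin)) / 2 - t_base, 0.0)
    return np.cumsum(daily)

# Invented 8-day composites: NDVI, accumulated GDD, measured biomass.
ndvi = np.array([0.42, 0.55, 0.63, 0.70, 0.74])
agdd = np.array([110.0, 190.0, 280.0, 360.0, 450.0])
biomass = np.array([850.0, 1200.0, 1550.0, 1900.0, 2150.0])  # kg DM/ha

# Ordinary least squares: biomass ~ 1 + NDVI + accumulated GDD.
X = np.column_stack([np.ones_like(ndvi), ndvi, agdd])
coef, *_ = np.linalg.lstsq(X, biomass, rcond=None)
print("intercept, NDVI and GDD coefficients:", coef)
```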
Abstract:
Clinical optical motion capture allows us to obtain kinematic and kinetic outcome measures that aid clinicians in diagnosing and treating pathologies affecting healthy gait. The long-term aim for gait centres is subject-specific analyses that can predict, prevent, or reverse the effects of pathologies through gait retraining. To track the body, anatomical segment coordinate systems are commonly created by applying markers to the surface of the skin over specific, manually palpated bony anatomy. The location and placement of these markers is subjective, and precision errors of up to 25 mm have been reported [1]. Additionally, the selection of anatomical landmarks used in segment models can result in large angular differences; for example, angular differences in the trunk can range up to 53° for the same motion depending on marker placement [2]. These errors can result in erroneous kinematic outcomes that either diminish or exaggerate the apparent effects of a treatment or pathology compared to healthy data. Our goal was to improve the accuracy and precision of optical motion capture outcome measures. This thesis describes two separate studies. In the first study we aimed to establish an approach for independently quantifying the error among trunk models, and used it to determine whether there was a best model for accurately tracking trunk motion. In the second study we designed a device to improve precision for test-retest protocols that would also reduce the set-up time for motion capture experiments. Our method, comparing a kinematically derived centre of mass velocity to a kinetically derived one, was successful in quantifying error among trunk models. Our findings indicate that models that use lateral shoulder markers, and limit the translational degrees of freedom of the trunk through shared pelvic markers, result in the least error for the tasks we studied. We also successfully reduced intra- and inter-operator anatomical marker placement errors using a marker alignment device. The improved accuracy and precision resulting from the methods established in this thesis may lead to increased sensitivity to changes in kinematics, and ultimately result in more consistent treatment outcomes.
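The error-quantification idea of the first study can be sketched as follows: a kinetically derived centre-of-mass (COM) velocity, integrated from ground reaction forces, is compared against a kinematically derived one, differentiated from a trunk-model COM position. The signals and integration scheme below are invented for illustration.

```python
# Sketch: compare a kinematically derived COM velocity with a kinetic
# one obtained by integrating the ground reaction force (signals invented).
import numpy as np

def com_velocity_kinetic(grf_vertical, mass, dt, g=9.81, v0=0.0):
    """Integrate vertical acceleration (F/m - g) to get COM velocity."""
    acc = grf_vertical / mass - g
    return v0 + np.cumsum(acc) * dt

def com_velocity_kinematic(com_position, dt):
    """Differentiate a model-based COM position from marker data."""
    return np.gradient(com_position, dt)

# The RMS difference between the two estimates quantifies trunk-model error:
# err = np.sqrt(np.mean((v_kinematic - v_kinetic) ** 2))
```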
Abstract:
The need for continuous recording rain gauges makes it difficult to determine the rainfall erosivity factor (R-factor) of the (R)USLE model in areas without good temporal data coverage. In mainland Spain, the Nature Conservation Institute (ICONA) determined the R-factor at only a few selected pluviographs, so simple estimates of the R-factor are of great interest. The objectives of this study were: (1) to identify a readily available estimate of the R-factor for mainland Spain; (2) to discuss the applicability of a single (global) estimate based on analysis of regional results; (3) to evaluate the effect of record length on estimate precision and accuracy; and (4) to validate an available regression model developed by ICONA. Four estimators based on monthly precipitation were computed at 74 rainfall stations throughout mainland Spain. The regression analysis conducted at a global level clearly showed that the modified Fournier index (MFI) ranked first among all assessed indexes. Applicability of this preliminary global model across mainland Spain was evaluated by analyzing regression results obtained at a regional level. It was found that three contiguous regions of eastern Spain (Catalonia, the Valencian Community and Murcia) could have a different rainfall erosivity pattern, so a new regression analysis was conducted by dividing mainland Spain into two areas: eastern Spain and a plateau-lowland area. A comparative analysis concluded that the bi-areal regression model based on MFI for a 10-year record length provided a simple, precise and accurate estimate of the R-factor in mainland Spain. Finally, validation of the regression model proposed by ICONA showed that the R-ICONA index overpredicted the R-factor by approximately 19%.
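For reference, the modified Fournier index has the standard closed form MFI = sum(p_i^2) / P, where p_i is monthly precipitation and P the annual total; a toy computation with invented values follows.

```python
# Standard modified Fournier index (MFI) from monthly precipitation,
# the best-ranked R-factor estimator above (values invented):
#   MFI = sum(p_i^2) / P
import numpy as np

monthly_mm = np.array([55, 48, 60, 70, 52, 30, 12, 18, 65, 95, 80, 62],
                      dtype=float)
mfi = np.sum(monthly_mm ** 2) / monthly_mm.sum()
print(f"MFI = {mfi:.1f} mm")
```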
Abstract:
A compositional multivariate approach is used to analyse regional scale soil geochemical data obtained as part of the Tellus Project, generated by the Geological Survey of Northern Ireland (GSNI). The multi-element total concentration data comprise XRF analyses of 6862 rural soil samples collected at 20 cm depth on a non-aligned grid at one site per 2 km². Censored data were imputed using published detection limits. Using these imputed values for 46 elements (including LOI), each soil sample site was assigned to the regional geology map provided by GSNI, initially using the dominant lithology for the map polygon. Northern Ireland includes a diversity of geology representing a stratigraphic record from the Mesoproterozoic up to and including the Palaeogene. However, the advance of ice sheets and their meltwaters over the last 100,000 years has left at least 80% of the bedrock covered by superficial deposits, including glacial till and post-glacial alluvium and peat. The question is to what extent the soil geochemistry reflects the underlying geology or the superficial deposits. To address this, the geochemical data were transformed using centered log ratios (clr) to observe the requirements of compositional data analysis and avoid closure issues. Following this, compositional multivariate techniques, including compositional Principal Component Analysis (PCA) and minimum/maximum autocorrelation factor (MAF) analysis, were used to determine the influence of the underlying geology on the soil geochemistry signature. PCA showed that 72% of the variation was captured by the first four principal components (PCs), implying "significant" structure in the data. Analysis of variance showed that only 10 PCs were necessary to classify the soil geochemical data. To improve on PCA by using the spatial relationships of the data, a classification based on MAF analysis was undertaken using the first 6 dominant factors. Understanding the relationship between soil geochemistry and superficial deposits is important for environmental monitoring of fragile ecosystems such as peat. To explore whether peat cover could be predicted from the classification, the lithology designation was adapted to include the presence of peat, based on GSNI superficial deposit polygons, and linear discriminant analysis (LDA) was undertaken. Prediction accuracy for the LDA classification improved from 60.98% based on PCA using 10 principal components to 64.73% using MAF based on the 6 most dominant factors. The misclassification of peat may reflect degradation of peat-covered areas since the creation of the superficial deposit classification. Further work will examine the influence of underlying lithologies on elemental concentrations in peat composition and the effect of this on classification analysis.
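The centered log-ratio transform used above has the standard form clr(x) = ln(x / g(x)), with g(x) the geometric mean of the parts; a toy sketch follows (imputation of censored values omitted).

```python
# Standard centered log-ratio (clr) transform for opening compositional
# data before PCA (toy values; censored-data imputation not shown).
import numpy as np

def clr(compositions):
    """compositions: (n_samples, n_parts), strictly positive."""
    logx = np.log(compositions)
    # Subtracting the row mean of logs equals dividing by the geometric mean.
    return logx - logx.mean(axis=1, keepdims=True)

X = np.array([[10.0, 30.0, 60.0],
              [20.0, 20.0, 60.0]])
Z = clr(X)                     # rows of Z sum to zero
cov = np.cov(Z, rowvar=False)  # basis for compositional PCA
```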
Abstract:
The viscosity of ionic liquids (ILs) has been modeled as a function of temperature at atmospheric pressure using a new method based on the UNIFAC–VISCO method. This model extends the calculations previously reported by our group (see Zhao et al., J. Chem. Eng. Data 2016, 61, 2160–2169), which used 154 experimental viscosity data points of 25 ionic liquids to regress a set of binary interaction parameters and ion Vogel–Fulcher–Tammann (VFT) parameters. Discrepancies in the experimental data for the same IL affect the quality of the correlation and thus the development of the predictive method. In this work, mathematical gnostics was used to analyze the experimental data from different sources and recommend one set of reliable data for each IL. These recommended data (819 data points in total) for 70 ILs were correlated using this model to obtain an extended set of binary interaction parameters and ion VFT parameters, with a regression accuracy of 1.4%. In addition, 966 experimental viscosity data points for 11 binary mixtures of ILs were collected from the literature to establish this model. The binary data consist of 128 training data points used for the optimization of binary interaction parameters and 838 test data points used for comparison with the predicted values. The relative average absolute deviation (RAAD) for training and test is 2.9% and 3.9%, respectively.
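For context, the Vogel–Fulcher–Tammann form that underlies the ion VFT parameters mentioned above is ln(eta) = A + B/(T - T0); the sketch below evaluates it with invented coefficients.

```python
# Standard Vogel-Fulcher-Tammann (VFT) viscosity form:
#   ln(eta) = A + B / (T - T0)
# Coefficients below are invented for illustration only.
import numpy as np

def vft_viscosity(T, A=-2.5, B=800.0, T0=165.0):
    """Viscosity (mPa*s) at temperature T (K) from VFT parameters."""
    return np.exp(A + B / (T - T0))

for T in (298.15, 323.15, 348.15):
    print(f"T = {T:.2f} K  ->  eta = {vft_viscosity(T):.2f} mPa s")
```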