863 resultados para data redundancy


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Citizen Science projects are initiatives in which members of the general public participate in scientific research projects and perform or manage research-related tasks such as data collection and/or data annotation. Citizen Science is technologically possible and scientifically significant. However, as the gathered information is from the crowd, the data quality is always hard to manage. There are many ways to manage data quality, and reputation management is one of the common approaches. In recent year, many research teams have deployed many audio or image sensors in natural environment in order to monitor the status of animals or plants. The collected data will be analysed by ecologists. However, as the amount of collected data is exceedingly huge and the number of ecologists is very limited, it is impossible for scientists to manually analyse all these data. The functions of existing automated tools to process the data are still very limited and the results are still not very accurate. Therefore, researchers have turned to recruiting general citizens who are interested in helping scientific research to do the pre-processing tasks such as species tagging. Although research teams can save time and money by recruiting general citizens to volunteer their time and skills to help data analysis, the reliability of contributed data varies a lot. Therefore, this research aims to investigate techniques to enhance the reliability of data contributed by general citizens in scientific research projects especially for acoustic sensing projects. In particular, we aim to investigate how to use reputation management to enhance data reliability. Reputation systems have been used to solve the uncertainty and improve data quality in many marketing and E-Commerce domains. The commercial organizations which have chosen to embrace the reputation management and implement the technology have gained many benefits. Data quality issues are significant to the domain of Citizen Science due to the quantity and diversity of people and devices involved. However, research on reputation management in this area is relatively new. We therefore start our investigation by examining existing reputation systems in different domains. Then we design novel reputation management approaches for Citizen Science projects to categorise participants and data. We have investigated some critical elements which may influence data reliability in Citizen Science projects. These elements include personal information such as location and education and performance information such as the ability to recognise certain bird calls. The designed reputation framework is evaluated by a series of experiments involving many participants for collecting and interpreting data, in particular, environmental acoustic data. Our research in exploring the advantages of reputation management in Citizen Science (or crowdsourcing in general) will help increase awareness among organizations that are unacquainted with its potential benefits.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Health complaint statistics are important for identifying problems and bringing about improvements to health care provided by health service providers and to the wider health care system. This paper overviews complaints handling by the eight Australian state and territory health complaint entities, based on an analysis of data from their annual reports. The analysis shows considerable variation between jurisdictions in the ways complaint data are defined, collected and recorded. Complaints from the public are an important accountability mechanism and open a window on service quality. The lack of a national approach leads to fragmentation of complaint data and a lost opportunity to use national data to assist policy development and identify the main areas causing consumers to complain. We need a national approach to complaints data collection in order to better respond to patients’ concerns.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Background: National physical activity data suggest that there is a considerable difference in physical activity levels of US and Australian adults. Although different surveys (Active Australia and BRFSS) are used, the questions are similar. Different protocols, however, are used to estimate “activity” from the data collected. The primary aim of this study was to assess whether the 2 approaches to the management of PA data could explain some of the difference in prevalence estimates derived from the two national surveys. Methods: Secondary data analysis of the most recent AA survey (N = 2987). Results: 15% of the sample was defined as “active” using Australian criteria but as “inactive” using the BRFSS protocol, even though weekly energy expenditure was commensurate with meeting current guidelines. Younger respondents (age < 45 y) were more likely to be “misclassified” using the BRFSS criteria. Conclusions: The prevalence of activity in Australia and the US appears to be more similar than we had previously thought.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Client owners usually need an estimate or forecast of their likely building costs in advance of detailed design in order to confirm the financial feasibility of their projects. Because of their timing in the project life cycle, these early stage forecasts are characterized by the minimal amount of information available concerning the new (target) project to the point that often only its size and type are known. One approach is to use the mean contract sum of a sample, or base group, of previous projects of a similar type and size to the project for which the estimate is needed. Bernoulli’s law of large numbers implies that this base group should be as large as possible. However, increasing the size of the base group inevitably involves including projects that are less and less similar to the target project. Deciding on the optimal number of base group projects is known as the homogeneity or pooling problem. A method of solving the homogeneity problem is described involving the use of closed form equations to compare three different sampling arrangements of previous projects for their simulated forecasting ability by a cross-validation method, where a series of targets are extracted, with replacement, from the groups and compared with the mean value of the projects in the base groups. The procedure is then demonstrated with 450 Hong Kong projects (with different project types: Residential, Commercial centre, Car parking, Social community centre, School, Office, Hotel, Industrial, University and Hospital) clustered into base groups according to their type and size.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The selection of appropriate analogue materials is a central consideration in the design of realistic physical models. We investigate the rheology of highly-filled silicone polymers in order to find materials with a power-law strain-rate softening rheology suitable for modelling rock deformation by dislocation creep and report the rheological properties of the materials as functions of the filler content. The mixtures exhibit strain-rate softening behaviour but with increasing amounts of filler become strain-dependent. For the strain-independent viscous materials, flow laws are presented while for strain-dependent materials the relative importance of strain and strain rate softening/hardening is reported. If the stress or strain rate is above a threshold value some highly-filled silicone polymers may be considered linear visco-elastic (strain independent) and power-law strain-rate softening. The power-law exponent can be raised from 1 to ~3 by using mixtures of high-viscosity silicone and plasticine. However, the need for high shear strain rates to obtain the power-law rheology imposes some restrictions on the usage of such materials for geodynamic modelling. Two simple shear experiments are presented that use Newtonian and power-law strain-rate softening materials. The results demonstrate how materials with power-law rheology result in better strain localization in analogue experiments.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A significant issue encountered when fusing data received from multiple sensors is the accuracy of the timestamp associated with each piece of data. This is particularly important in applications such as Simultaneous Localisation and Mapping (SLAM) where vehicle velocity forms an important part of the mapping algorithms; on fastmoving vehicles, even millisecond inconsistencies in data timestamping can produce errors which need to be compensated for. The timestamping problem is compounded in a robot swarm environment due to the use of non-deterministic readily-available hardware (such as 802.11-based wireless) and inaccurate clock synchronisation protocols (such as Network Time Protocol (NTP)). As a result, the synchronisation of the clocks between robots can be out by tens-to-hundreds of milliseconds making correlation of data difficult and preventing the possibility of the units performing synchronised actions such as triggering cameras or intricate swarm manoeuvres. In this thesis, a complete data fusion unit is designed, implemented and tested. The unit, named BabelFuse, is able to accept sensor data from a number of low-speed communication buses (such as RS232, RS485 and CAN Bus) and also timestamp events that occur on General Purpose Input/Output (GPIO) pins referencing a submillisecondaccurate wirelessly-distributed "global" clock signal. In addition to its timestamping capabilities, it can also be used to trigger an attached camera at a predefined start time and frame rate. This functionality enables the creation of a wirelessly-synchronised distributed image acquisition system over a large geographic area; a real world application for this functionality is the creation of a platform to facilitate wirelessly-distributed 3D stereoscopic vision. A ‘best-practice’ design methodology is adopted within the project to ensure the final system operates according to its requirements. Initially, requirements are generated from which a high-level architecture is distilled. This architecture is then converted into a hardware specification and low-level design, which is then manufactured. The manufactured hardware is then verified to ensure it operates as designed and firmware and Linux Operating System (OS) drivers are written to provide the features and connectivity required of the system. Finally, integration testing is performed to ensure the unit functions as per its requirements. The BabelFuse System comprises of a single Grand Master unit which is responsible for maintaining the absolute value of the "global" clock. Slave nodes then determine their local clock o.set from that of the Grand Master via synchronisation events which occur multiple times per-second. The mechanism used for synchronising the clocks between the boards wirelessly makes use of specific hardware and a firmware protocol based on elements of the IEEE-1588 Precision Time Protocol (PTP). With the key requirement of the system being submillisecond-accurate clock synchronisation (as a basis for timestamping and camera triggering), automated testing is carried out to monitor the o.sets between each Slave and the Grand Master over time. A common strobe pulse is also sent to each unit for timestamping; the correlation between the timestamps of the di.erent units is used to validate the clock o.set results. Analysis of the automated test results show that the BabelFuse units are almost threemagnitudes more accurate than their requirement; clocks of the Slave and Grand Master units do not di.er by more than three microseconds over a running time of six hours and the mean clock o.set of Slaves to the Grand Master is less-than one microsecond. The common strobe pulse used to verify the clock o.set data yields a positive result with a maximum variation between units of less-than two microseconds and a mean value of less-than one microsecond. The camera triggering functionality is verified by connecting the trigger pulse output of each board to a four-channel digital oscilloscope and setting each unit to output a 100Hz periodic pulse with a common start time. The resulting waveform shows a maximum variation between the rising-edges of the pulses of approximately 39¥ìs, well below its target of 1ms.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Mandatory data breach notification laws are a novel statutory solution in relation to organizational protections of personal information. They require organizations which have suffered a breach of security involving personal information to notif'y those persons whose information may have been affected. These laws originated in the state based legislatures of the United States during the last decade and have subsequently garnered worldwide legislative interest. Despite their perceived utility, mandatory data breach notification laws have several conceptual and practical concems that limit the scope of their applicability, particularly in relation to existing information privacy law regimes. We outline these concerns, and in doing so, we contend that while mandatory data breach notification laws have many useful facets, their utility as an 'add-on' to enhance the failings of current information privacy law frameworks should not necessarily be taken for granted.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

longitudinal study of data modelling across grades 1-3. The activity engaged children in designing, implementing, and analysing a survey about their new playground. Data modelling involves investigations of meaningful phenomena, deciding what is worthy of attention (identifying complex attributes), and then progressing to organising, structuring, visualising, and representing data. The core components of data modelling addressed here are children’s structuring and representing of data, with a focus on their display of metarepresentational competence (diSessa, 2004). Such competence includes students’ abilities to invent or design a variety of new representations, explain their creations, understand the role they play, and critique and compare the adequacy of representations. Reported here are the ways in which the children structured and represented their data, the metarepresentational competence displayed, and links between their metarepresentational competence and conceptual competence.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Background Cancer outlier profile analysis (COPA) has proven to be an effective approach to analyzing cancer expression data, leading to the discovery of the TMPRSS2 and ETS family gene fusion events in prostate cancer. However, the original COPA algorithm did not identify down-regulated outliers, and the currently available R package implementing the method is similarly restricted to the analysis of over-expressed outliers. Here we present a modified outlier detection method, mCOPA, which contains refinements to the outlier-detection algorithm, identifies both over- and under-expressed outliers, is freely available, and can be applied to any expression dataset. Results We compare our method to other feature-selection approaches, and demonstrate that mCOPA frequently selects more-informative features than do differential expression or variance-based feature selection approaches, and is able to recover observed clinical subtypes more consistently. We demonstrate the application of mCOPA to prostate cancer expression data, and explore the use of outliers in clustering, pathway analysis, and the identification of tumour suppressors. We analyse the under-expressed outliers to identify known and novel prostate cancer tumour suppressor genes, validating these against data in Oncomine and the Cancer Gene Index. We also demonstrate how a combination of outlier analysis and pathway analysis can identify molecular mechanisms disrupted in individual tumours. Conclusions We demonstrate that mCOPA offers advantages, compared to differential expression or variance, in selecting outlier features, and that the features so selected are better able to assign samples to clinically annotated subtypes. Further, we show that the biology explored by outlier analysis differs from that uncovered in differential expression or variance analysis. mCOPA is an important new tool for the exploration of cancer datasets and the discovery of new cancer subtypes, and can be combined with pathway and functional analysis approaches to discover mechanisms underpinning heterogeneity in cancers

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper, we present WebPut, a prototype system that adopts a novel web-based approach to the data imputation problem. Towards this, Webput utilizes the available information in an incomplete database in conjunction with the data consistency principle. Moreover, WebPut extends effective Information Extraction (IE) methods for the purpose of formulating web search queries that are capable of effectively retrieving missing values with high accuracy. WebPut employs a confidence-based scheme that efficiently leverages our suite of data imputation queries to automatically select the most effective imputation query for each missing value. A greedy iterative algorithm is also proposed to schedule the imputation order of the different missing values in a database, and in turn the issuing of their corresponding imputation queries, for improving the accuracy and efficiency of WebPut. Experiments based on several real-world data collections demonstrate that WebPut outperforms existing approaches.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents an input-orientated data envelopment analysis (DEA) framework which allows the measurement and decomposition of economic, environmental and ecological efficiency levels in agricultural production across different countries. Economic, environmental and ecological optimisations search for optimal input combinations that minimise total costs, total amount of nutrients, and total amount of cumulative exergy contained in inputs respectively. The application of the framework to an agricultural dataset of 30 OECD countries revealed that (i) there was significant scope to make their agricultural production systemsmore environmentally and ecologically sustainable; (ii) the improvement in the environmental and ecological sustainability could be achieved by being more technically efficient and, even more significantly, by changing the input combinations; (iii) the rankings of sustainability varied significantly across OECD countries within frontier-based environmental and ecological efficiency measures and between frontier-based measures and indicators.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The ability to forecast machinery health is vital to reducing maintenance costs, operation downtime and safety hazards. Recent advances in condition monitoring technologies have given rise to a number of prognostic models which attempt to forecast machinery health based on condition data such as vibration measurements. This paper demonstrates how the population characteristics and condition monitoring data (both complete and suspended) of historical items can be integrated for training an intelligent agent to predict asset health multiple steps ahead. The model consists of a feed-forward neural network whose training targets are asset survival probabilities estimated using a variation of the Kaplan–Meier estimator and a degradation-based failure probability density function estimator. The trained network is capable of estimating the future survival probabilities when a series of asset condition readings are inputted. The output survival probabilities collectively form an estimated survival curve. Pump data from a pulp and paper mill were used for model validation and comparison. The results indicate that the proposed model can predict more accurately as well as further ahead than similar models which neglect population characteristics and suspended data. This work presents a compelling concept for longer-range fault prognosis utilising available information more fully and accurately.