918 results for data quality issues
Abstract:
Smartphones and other powerful sensor-equipped consumer devices make it possible to sense the physical world at an unprecedented scale. Nearly 2 million Android and iOS devices are activated every day, each carrying numerous sensors and a high-speed internet connection. Whereas traditional sensor networks have typically deployed a fixed number of devices to sense a particular phenomenon, community networks can grow as additional participants choose to install apps and join the network. In principle, this allows networks of thousands or millions of sensors to be created quickly and at low cost. However, making reliable inferences about the world using so many community sensors involves several challenges, including scalability, data quality, mobility, and user privacy.
This thesis focuses on how learning at both the sensor and network level can provide scalable techniques for data collection and event detection. First, this thesis considers the abstract problem of distributed algorithms for data collection, and proposes a distributed, online approach to selecting which set of sensors should be queried. In addition to providing theoretical guarantees for submodular objective functions, the approach is also compatible with local rules or heuristics for detecting and transmitting potentially valuable observations. Next, the thesis presents a decentralized algorithm for spatial event detection, and describes its use in detecting strong earthquakes within the Caltech Community Seismic Network. Despite the fact that strong earthquakes are rare and complex events, and that community sensors can be very noisy, our decentralized anomaly detection approach obtains theoretical guarantees for event detection performance while simultaneously limiting the rate of false alarms.
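As a loose illustration of the kind of sensor selection described above, the sketch below runs a plain greedy algorithm on a toy submodular coverage objective; for monotone submodular objectives the greedy choice carries a (1 - 1/e) approximation guarantee, which is the flavour of guarantee the abstract refers to. The sensor names and the coverage objective are invented for illustration and are not taken from the thesis.

```python
# Illustrative sketch (not the thesis's actual algorithm): greedy selection of k
# sensors to maximize a submodular coverage objective.
def coverage(selected, sensor_coverage):
    """Submodular objective: number of distinct locations covered."""
    covered = set()
    for s in selected:
        covered |= sensor_coverage[s]
    return len(covered)

def greedy_select(sensor_coverage, k):
    """Pick k sensors, each step adding the sensor with the largest marginal gain."""
    selected = []
    remaining = set(sensor_coverage)
    for _ in range(k):
        best = max(remaining,
                   key=lambda s: coverage(selected + [s], sensor_coverage)
                                 - coverage(selected, sensor_coverage))
        selected.append(best)
        remaining.discard(best)
    return selected

# Example: three phone sensors, each observing a set of city blocks.
sensors = {"phone_a": {1, 2, 3}, "phone_b": {3, 4}, "phone_c": {5}}
print(greedy_select(sensors, 2))  # e.g. ['phone_a', 'phone_c']
```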
Abstract:
The purpose of the project is to improve our understanding of best management practices (BMPs) that can be used on diked managed wetlands in Suisun Marsh to reduce the occurrence of low dissolved oxygen (DO) and high methylmercury (MeHg) events associated primarily with fall flood-up practices. Low DO events are of concern because they can lead to undue stress and even mortality of sensitive aquatic organisms. Elevated MeHg levels are of concern because MeHg is a neurotoxin that biomagnifies up the food chain and can cause deleterious effects to higher trophic level consumers such as piscivorous fish, birds, and mammals (including humans). This study involved two years (2007-2008) of intensive field data collection at two managed wetland sites in northwest Suisun Marsh and their surrounding tidal sloughs, an area with previously documented low DO events. In addition, the study collected limited soil and water quality field data and mapped vegetation at three managed wetland sites in the central interior of Suisun Marsh, in order to examine whether wetlands at other locations exhibit characteristics that could indicate potential for similar concerns. The objective of Year 1 of the study was to identify baseline conditions in the managed wetlands and determine which physical management conditions could be modified in Year 2 to most effectively reduce low DO and MeHg production issues. The objective of Year 2 was to evaluate the effectiveness of these modified management actions at reducing low DO and elevated MeHg conditions within the managed wetlands and to continue improving understanding of the underlying biogeochemical processes at play. This Final Evaluation Memorandum examined a total of 19 BMPs, 14 involving modified water management operations and the remaining five involving modified soil and vegetation management practices. Some of these BMPs had been employed previously and others have not yet been tested. For each BMP, this report assesses its efficacy in improving water quality conditions and its potential conflicts with wetland management; it makes recommendations for further study (either feasibility assessments or field testing) and on whether to consider the BMP for future use. Certain previously used BMPs were found to be important contributors to poor water quality conditions, and their continued use is not recommended. Some BMPs that could improve water quality conditions appear difficult to implement with regard to compatibility with wetland management; these BMPs require further elaboration and feasibility assessment to determine whether they should be field tested. In practice, for any given wetland there is likely a combination of BMPs that would together have the greatest potential to address the low DO and high MeHg water quality concerns. Consequently, this report makes no sweeping recommendations applicable to large groups of wetlands, but instead promotes careful consideration of the factors at each wetland or small group of wetlands and, from that assessment, application of the most effective suite of BMPs. This report also identifies a number of recommended future actions and studies. These recommendations are geared toward improving the process understanding of factors that promote low DO and high MeHg conditions, the extent of these problems in Suisun Marsh, the regulatory basis for the DO standards for a large estuarine marsh, the economics of BMPs, and alternative approaches to BMPs on diked managed wetlands that may address the water quality issues.
The most important of these recommendations is that future BMP implementation should be carried out within the context of rigorous scientific evaluation, so as to gain the maximum improvement in understanding of how to manage these water quality issues in the diked managed wetlands of Suisun Marsh.
Abstract:
Technology-supported citizen science has created huge volumes of data with increasing potential to facilitate scientific progress; however, verifying data quality is still a substantial hurdle due to the limitations of existing data quality mechanisms. In this study, we adopted a mixed-methods approach to investigate community-based data validation practices and the characteristics of records of wildlife species observations that affected the outcomes of collaborative data quality management in an online community where people record what they see in nature. The findings describe processes that both relied upon and added to information provenance through information stewardship behaviors, which led to improved reliability and informativity. The likelihood of community-based validation interactions was predicted by several factors, including the types of organisms observed and whether the data were submitted from a mobile device. We conclude with implications for technology design, citizen science practices, and research.
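The kind of predictive claim in the last sentences could be sketched, for illustration only, as a logistic regression on hypothetical record features; the variables below are invented and do not reproduce the study's actual model or data.

```python
# Minimal sketch (hypothetical variables): a logistic regression predicting whether an
# observation record receives community validation, from factors such as taxon group
# and whether it was submitted from a mobile device.
import pandas as pd
from sklearn.linear_model import LogisticRegression

records = pd.DataFrame({
    "taxon_is_bird": [1, 0, 1, 0, 1, 0],
    "mobile_submit": [1, 1, 0, 0, 1, 0],
    "has_photo":     [1, 0, 1, 1, 0, 0],
    "was_validated": [1, 0, 1, 0, 1, 0],   # outcome: did the community validate the record?
})
X = records[["taxon_is_bird", "mobile_submit", "has_photo"]]
y = records["was_validated"]

model = LogisticRegression().fit(X, y)
print(dict(zip(X.columns, model.coef_[0])))  # direction and size of each predictor
```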
Abstract:
The GEOTRACES Intermediate Data Product 2014 (IDP2014) is the first publicly available data product of the international GEOTRACES programme, and contains data measured and quality controlled before the end of 2013. It consists of two parts: (1) a compilation of digital data for more than 200 trace elements and isotopes (TEIs) as well as classical hydrographic parameters, and (2) the eGEOTRACES Electronic Atlas, a strongly inter-linked online atlas including more than 300 section plots and 90 animated 3D scenes. The IDP2014 covers the Atlantic, Arctic, and Indian oceans, with the highest data density in the Atlantic. The TEI data in the IDP2014 are quality controlled by careful assessment of intercalibration results and multi-laboratory data comparisons at cross-over stations. The digital data are provided in several formats, including ASCII spreadsheet, Excel spreadsheet, netCDF, and Ocean Data View collection. In addition to the actual data values, the IDP2014 also contains data quality flags and 1-sigma data error values where available. Quality flags and error values are useful for data filtering. Metadata about data originators, analytical methods, and original publications related to the data are linked to the data in an easily accessible way. The eGEOTRACES Electronic Atlas is the visual representation of the IDP2014 data, providing section plots and a new kind of animated 3D scene. The basin-wide 3D scenes allow data from many cruises to be viewed at the same time, providing quick overviews of large-scale tracer distributions. In addition, the 3D scenes provide the geographical and bathymetric context that is crucial for the interpretation and assessment of observed tracer plumes, as well as for making inferences about controlling processes.
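As a hedged sketch of how quality flags and 1-sigma errors support filtering, the toy example below applies a flag mask and inverse-variance weighting to an IDP-style variable; the variable names and flag convention (1 = good) are assumptions for illustration, not the IDP2014's actual conventions.

```python
# Toy stand-in for an IDP-style variable: concentration, quality flag, 1-sigma error.
import numpy as np
import xarray as xr

ds = xr.Dataset({
    "fe_conc": ("sample", np.array([0.42, 0.55, 9.99, 0.47])),
    "fe_flag": ("sample", np.array([1, 1, 4, 1])),     # 1 = good, 4 = bad (assumed convention)
    "fe_err":  ("sample", np.array([0.03, 0.05, 0.50, 0.04])),
})

good = ds["fe_conc"].where(ds["fe_flag"] == 1)          # drop values not flagged as good
weights = 1.0 / ds["fe_err"].where(ds["fe_flag"] == 1) ** 2
weighted_mean = float((good * weights).sum() / weights.sum())
print(weighted_mean)                                     # flag-filtered, error-weighted mean
```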
Abstract:
Although data quality and weighting decisions impact the outputs of reserve selection algorithms, these factors have not been closely studied. We examine these methodological issues in the use of reserve selection algorithms by comparing: (1) quality of input data and (2) use of different weighting methods for prioritizing among species. In 2003, the government of Madagascar, a global biodiversity hotspot, committed to tripling the size of its protected area network to protect 10% of the country’s total land area. We apply the Zonation reserve selection algorithm to distribution data for 52 lemur species to identify priority areas for the expansion of Madagascar’s reserve network. We assess the similarity of the areas selected, as well as the proportions of lemur ranges protected in the resulting areas when different forms of input data were used: extent of occurrence versus refined extent of occurrence. Low overlap between the areas selected suggests that refined extent of occurrence data are highly desirable, and to best protect lemur species, we recommend refining extent of occurrence ranges using habitat and altitude limitations. Reserve areas were also selected for protection based on three different species weighting schemes, resulting in marked variation in proportional representation of species among the IUCN Red List of Threatened Species extinction risk categories. This result demonstrates that assignment of species weights influences whether a reserve network prioritizes maximizing overall species protection or maximizing protection of the most threatened species.
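For illustration only, the sketch below shows a simplified greedy selection of planning units in which each species' marginal contribution is multiplied by a weight, for example one derived from its Red List category; it is not the Zonation algorithm, and the units, species, and weights are invented.

```python
# Simplified weighted reserve-selection sketch (not Zonation itself).
def select_units(units, weights, n_units):
    """units: {unit_id: {species: fraction_of_range_in_unit}}"""
    chosen, protected = [], {sp: 0.0 for sp in weights}
    for _ in range(n_units):
        def gain(u):
            # Weighted marginal gain, capped so a species cannot exceed full protection.
            return sum(weights[sp] * min(frac, 1.0 - protected[sp])
                       for sp, frac in units[u].items())
        best = max((u for u in units if u not in chosen), key=gain)
        chosen.append(best)
        for sp, frac in units[best].items():
            protected[sp] = min(1.0, protected[sp] + frac)
    return chosen, protected

units = {"A": {"lemur_1": 0.4, "lemur_2": 0.1},
         "B": {"lemur_1": 0.3},
         "C": {"lemur_2": 0.6}}
weights = {"lemur_1": 1.0, "lemur_2": 3.0}   # e.g. weight a more threatened species higher
print(select_units(units, weights, 2))
```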
Abstract:
In many environmental valuation applications, standard sample sizes for choice modelling surveys are impractical to achieve. One can improve data quality by using more in-depth surveys administered to fewer respondents. We report on a study using high-quality rank-ordered data elicited with the best-worst approach. The resulting "exploded logit" choice model, estimated on 64 responses per person, was used to study visitors' willingness to pay for the external benefits of policies which maintain the cultural heritage of alpine grazing commons. We find evidence supporting this approach and reasonable estimates of mean WTP, which appear theoretically valid and policy informative.
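For reference, the standard rank-ordered ("exploded") logit decomposes a full ranking into a sequence of conditional logit choices over the alternatives not yet ranked, and marginal WTP is the usual ratio of coefficients; this is the textbook form of the model, not the paper's exact specification.

```latex
% V_i is the systematic utility of alternative i; \beta_a and \beta_{\text{cost}}
% are the estimated attribute and cost coefficients.
P(r_1 \succ r_2 \succ \dots \succ r_J)
  = \prod_{j=1}^{J-1} \frac{\exp(V_{r_j})}{\sum_{k=j}^{J} \exp(V_{r_k})},
\qquad
\mathrm{WTP}_a = -\frac{\beta_a}{\beta_{\text{cost}}}.
```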
Abstract:
Introduction
The use of video capture of lectures in Higher Education is not a recent occurrence, with web-based learning technologies, including digital recording of live lectures, becoming increasingly common offerings at universities throughout the world (Holliman and Scanlon, 2004). However, in the past decade the increase in technical infrastructure provision, including the availability of high-speed broadband, has increased the potential and use of video lecture capture. This has led to a variety of lecture capture formats, including podcasting, live streaming, and delayed broadcasting of whole or part of lectures.
Additionally, in the past five years there has been a significant increase in the popularity of online learning, specifically via Massive Open Online Courses (MOOCs) (Vardi, 2014). One of the key aspects of MOOCs is the simulated recording of lecture-like activities. There has been, and continues to be, much debate on the consequences of the popularity of MOOCs, especially in relation to their potential uses within established university programmes.
There have been a number of studies dedicated to the effects of videoing lectures.
Research on video lecture capture clusters around the following main themes:
• Staff perceptions including attendance, performance of students and staff workload
• Reinforcement versus replacement of lectures
• Improved flexibility of learning
• Facilitating engaging and effective learning experiences
• Student usage, perception and satisfaction
• Facilitating students learning at their own pace
Most of the body of research has concentrated on student and faculty perceptions, including academic achievement, student attendance and engagement (Johnston et al., 2012).
Generally, the research has reviewed the benefits of lecture capture positively for both students and faculty. This perception, coupled with technical infrastructure improvements and student demand, may well mean that the use of video lecture capture will continue to increase in tertiary education over the coming years. However, there is relatively little research on the effects of lecture capture specifically in the area of computer programming, with Watkins et al. (2007) being one of the few studies. Video delivery of programming solutions is particularly useful for enabling a lecturer to illustrate the complex decision-making processes and iterative nature of the actual code development process (Watkins et al., 2007). As such, research in this area would appear to be particularly appropriate to help inform debate and future decisions made by policy makers.
Research questions and objectives
The purpose of the research was to investigate how a series of lecture captures (in which the audio of lectures and video of on-screen projected content were recorded) impacted on the delivery and learning of a programme of study in an MSc Software Development course at Queen's University Belfast, Northern Ireland. The MSc is a conversion programme, intended to take graduates from non-computing primary degrees and upskill them in this area. The research specifically targeted the Java programming module within the course. It also analyses and reports on empirical data from attendance records and various video viewing statistics. In addition, qualitative data were collected from staff and student feedback to help contextualise the quantitative results.
Methodology, Methods and Research Instruments Used
The study was conducted with a cohort of 85 postgraduate students taking a compulsory module in Java programming in the first semester of a one-year MSc in Software Development. A pre-course survey of students found that 58% preferred to have videos of "key moments" of lectures available rather than whole lectures. A large-scale study carried out by Guo concluded that "shorter videos are much more engaging" (Guo, 2013). Of concern was the potential for low audience retention for videos of whole lectures.
The lecturers recorded snippets of the lecture directly before or after the actual physical delivery of the lecture, in a quiet environment, and then uploaded the videos directly to a closed YouTube channel. These snippets generally concentrated on significant parts of the theory, followed by theory-related coding demonstration activities, and faithfully replicated the face-to-face lecture. Generally, each lecture was supported by two to three videos of durations ranging from 20 to 30 minutes.
Attendance
The MSc programme has several attendance-based modules, of which Java Programming was one element. In order to assess the effect on attendance for the Programming module, a control was established. The control used was a Database module taken by the same students and running in the same semester.
Access engagement
The videos were hosted on a closed YouTube channel made available only to the students in the class. The channel had analytics enabled, which reported on the following areas for all videos and for each individual video: views (hits), audience retention, viewing devices / operating systems used, and minutes watched.
Student attitudes
Three surveys were administered to investigate student attitudes towards the videoing of lectures: the first before the start of the programming module, the second at the mid-point, and the third after the programme was complete.
The questions in the first survey were targeted at eliciting student attitudes towards lecture capture before they had experienced it in the programme. The midpoint survey gathered data on how the students were individually using the system up to that point, including how many videos an individual had watched, viewing duration, primary reasons for watching, and the effect on attendance, in addition to probing for comments or suggestions. The final survey, on course completion, contained questions similar to the midpoint survey but took a summative view of the whole video programme.
Conclusions and Outcomes
The study confirmed the findings of other such investigations, showing that there is little or no effect on attendance at lectures. The use of the videos appears to help promote continual learning, but they are particularly accessed by students at assessment periods. Students respond positively to the ability to access lectures digitally as a means of reinforcing learning experiences rather than replacing them. Feedback from students was overwhelmingly positive, indicating that the videos benefited their learning. There are also significant benefits to part-recording of lectures rather than recording whole lectures. The viewing-trend analytics suggest that, despite the increase in the popularity of online learning via MOOCs and the promotion of video learning on mobile devices, in this study the vast majority of students accessed the online videos at home on laptops or desktops. However, in part, this is likely due to the nature of the taught subject, that being programming.
The research involved pre-recording the lecture in smaller timed units and then uploading them for distribution, to counteract existing quality issues with recording entire live lectures. However, the advancement and consequent improvement in the quality of in situ lecture capture equipment may well help negate the need to record elsewhere. The research has also highlighted an area of potentially very significant use for performance analysis and improvement that could have major implications for the quality of teaching: a study of the analytics of video viewings could provide a quick-response formative feedback mechanism for the lecturer. If a videoed lecture, whether recorded live or later, is a true reflection of the face-to-face lecture, an analysis of the viewing patterns for the video may well reveal trends that correspond with the live delivery.
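As a toy illustration of this formative-feedback idea (invented numbers, not the study's analytics export), the sketch below finds the largest drop in a video's audience-retention curve, which a lecturer could read as a pointer to where the corresponding explanation loses viewers.

```python
# Locate the minute with the largest audience drop in a retention curve.
import pandas as pd

retention = pd.DataFrame({
    "minute":            range(10),
    "fraction_watching": [1.0, 0.95, 0.9, 0.88, 0.6, 0.55, 0.52, 0.5, 0.48, 0.45],
})
drop = retention["fraction_watching"].diff().idxmin()   # index of steepest decline
print(f"Largest audience drop after minute {retention.loc[drop, 'minute']}")
```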
Abstract:
The main object of this thesis is the study of algorithms for the automatic processing and representation of data, in particular information obtained from sensors mounted on board vehicles (2D and 3D), with application in the context of driver assistance systems. The work focuses on some of the problems that both automated driving (AD) systems and advanced driver assistance systems (ADAS) face today. The document consists of two parts. The first describes the design, construction, and development of three robotic prototypes, including details of the sensors mounted on board the robots, the algorithms, and the software architectures. These robots were used as test platforms to evaluate and validate the proposed techniques. In addition, they took part in several autonomous driving competitions with very good results. The second part of the document presents several algorithms used to generate intermediate representations of sensor data. These can be used to improve existing pattern recognition, detection, or navigation techniques, and thereby contribute to future AD or ADAS applications. Since autonomous vehicles carry a large number of sensors of different natures, intermediate representations are particularly suitable, as they can handle problems related to the diverse nature of the data (2D, 3D, photometric, etc.), to its asynchronous character (multiple sensors sending data at different rates), or to data alignment (calibration problems, different sensors providing different measurements for the same object). In this context, new techniques are proposed for computing a multi-camera, multimodal inverse perspective mapping representation, for performing colour correction between images so as to obtain high-quality mosaics, and for generating a scene representation based on polygonal primitives that can handle large amounts of 3D and 2D data and can refine the representation as new sensor data arrive.
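A minimal single-camera sketch of the inverse perspective mapping idea mentioned above is given below, using OpenCV and an invented camera geometry; the thesis's multi-camera, multimodal IPM representation is considerably richer than this.

```python
# Single-camera inverse perspective mapping (bird's-eye view) via a homography between
# four points on the road plane in the image and their top-down positions.
import cv2
import numpy as np

image = np.zeros((480, 640, 3), dtype=np.uint8)        # stand-in for a camera frame

src = np.float32([[200, 300], [440, 300], [620, 470], [20, 470]])   # road corners in the image
dst = np.float32([[0, 0], [400, 0], [400, 600], [0, 600]])          # same corners, top-down

H = cv2.getPerspectiveTransform(src, dst)
birds_eye = cv2.warpPerspective(image, H, (400, 600))
print(birds_eye.shape)   # (600, 400, 3): ground-plane (bird's-eye) view
```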
Abstract:
This paper presents the SmartClean tool. The purpose of this tool is to detect and correct data quality problems (DQPs). Compared with existing tools, SmartClean has the following main advantage: the user does not need to specify the execution sequence of the data cleaning operations. To this end, an execution sequence was developed; the problems are manipulated (i.e., detected and corrected) following that sequence. The sequence also supports the incremental execution of the operations. In this paper, the underlying architecture of the tool is presented and its components are described in detail. The validity of the tool and, consequently, of the architecture is demonstrated through the presentation of a case study. Although SmartClean has cleaning capabilities at all the other levels, this paper describes only those related to the attribute value level.
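For illustration only, the sketch below mimics the idea of a fixed execution sequence of attribute-level cleaning operations, so that the user does not have to order them manually; the operations, their order, and the example data are invented and are not SmartClean's actual components.

```python
# Attribute-level cleaning operations applied in a fixed execution sequence.
import pandas as pd

def strip_whitespace(df):   return df.assign(name=df["name"].str.strip())
def fix_missing_age(df):    return df.assign(age=df["age"].fillna(df["age"].median()))
def drop_invalid_email(df): return df[df["email"].str.contains("@", na=False)]

# A fixed sequence: syntactic fixes first, then missing values, then integrity checks.
EXECUTION_SEQUENCE = [strip_whitespace, fix_missing_age, drop_invalid_email]

def clean(df):
    for operation in EXECUTION_SEQUENCE:
        df = operation(df)
    return df

dirty = pd.DataFrame({"name": [" Ana ", "Rui"],
                      "age": [34, None],
                      "email": ["ana@example.pt", "no-email"]})
print(clean(dirty))
```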
Abstract:
The emergence of new business models, namely the establishment of partnerships between organizations, and the opportunity companies have to add existing data from the web, especially the semantic web, to their own information have brought to the fore several problems that exist in databases, particularly those related to data quality. Poor data can result in a loss of competitiveness for the organizations holding them, and may even lead to their disappearance, since many of their decision-making processes are based on these data. For this reason, data cleaning is essential. Current approaches to solving these problems are closely tied to database schemas and specific domains. For data cleaning to be usable across different repositories, computer systems must be able to understand these data, i.e., an associated semantics is needed. The solution presented in this paper uses ontologies: (i) for the specification of data cleaning operations and (ii) as a way of solving the semantic heterogeneity problems of data stored in different sources. With data cleaning operations defined at a conceptual level, and with mappings between domain ontologies and an ontology derived from a database, the operations can be instantiated and proposed to the expert/specialist for execution over that database, thus enabling their interoperability.
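A deliberately simplified sketch of this idea follows: a cleaning operation declared against domain-ontology concepts is instantiated for a concrete database schema through a mapping, producing a statement an expert could approve and execute. The names, the mapping format, and the generated SQL are invented for illustration and are far simpler than the paper's ontology machinery.

```python
# Conceptual-level cleaning rule, instantiated against a concrete schema via a mapping.
CLEANING_RULE = {                         # declared at the conceptual (ontology) level
    "concept": "Person",
    "property": "hasEmail",
    "constraint": "must contain '@'",
}

MAPPING = {                               # domain ontology -> this database's schema
    ("Person", "hasEmail"): ("customers", "email_addr"),
}

def instantiate(rule, mapping):
    table, column = mapping[(rule["concept"], rule["property"])]
    # Query returning the rows that violate the conceptual constraint.
    return f"SELECT * FROM {table} WHERE {column} NOT LIKE '%@%';"

print(instantiate(CLEANING_RULE, MAPPING))
```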
Abstract:
Dissertation submitted to obtain the Master's degree in Electrical Engineering, branch of Automation and Industrial Electronics.
Abstract:
The enhanced functional sensitivity offered by ultra-high field imaging may significantly benefit simultaneous EEG-fMRI studies, but the concurrent increases in artifact contamination can strongly compromise EEG data quality. In the present study, we focus on EEG artifacts created by head motion in the static B0 field. A novel approach for motion artifact detection is proposed, based on a simple modification of a commercial EEG cap, in which four electrodes are non-permanently adapted to record only magnetic induction effects. Simultaneous EEG-fMRI data were acquired with this setup, at 7T, from healthy volunteers undergoing a reversing-checkerboard visual stimulation paradigm. Data analysis assisted by the motion sensors revealed that, after gradient artifact correction, EEG signal variance was largely dominated by pulse artifacts (81-93%), but contributions from spontaneous motion (4-13%) were still comparable to or even larger than those of actual neuronal activity (3-9%). Multiple approaches were tested to determine the most effective procedure for denoising EEG data incorporating motion sensor information. Optimal results were obtained by applying an initial pulse artifact correction step (AAS-based), followed by motion artifact correction (based on the motion sensors) and ICA denoising. On average, motion artifact correction (after AAS) yielded a 61% reduction in signal power and a 62% increase in VEP trial-by-trial consistency. Combined with ICA, these improvements rose to a 74% power reduction and an 86% increase in trial consistency. Overall, the improvements achieved were well appreciable at single-subject and single-trial levels, and set an encouraging quality mark for simultaneous EEG-fMRI at ultra-high field.
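As a minimal, hedged sketch of the final ICA denoising stage (run here on synthetic data, and omitting the AAS pulse-artifact and motion-sensor correction steps that precede it in the study), the example below uses MNE-Python; the channel names and the components chosen for exclusion are arbitrary.

```python
# ICA denoising of a synthetic EEG recording with MNE-Python.
import numpy as np
import mne

sfreq, n_ch, n_samples = 250.0, 8, 5000
info = mne.create_info([f"EEG{i}" for i in range(n_ch)], sfreq, ch_types="eeg")
raw = mne.io.RawArray(np.random.randn(n_ch, n_samples) * 1e-5, info)

ica = mne.preprocessing.ICA(n_components=6, random_state=0)
ica.fit(raw)
ica.exclude = [0, 1]              # components judged to be artifact (hypothetical choice)
cleaned = ica.apply(raw.copy())   # reconstruct the signal without the excluded components
print(cleaned)
```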
Abstract:
The main objective of this master's thesis is to examine whether Weibull analysis is a suitable method for warranty forecasting in the Case Company. The Case Company has used ReliaSoft's Weibull++ software, which is based on the Weibull method, but the Company has noticed that the analysis has not given correct results. This study was conducted by making Weibull simulations in different profit centers of the Case Company and then comparing actual costs and forecasted costs. Simulations were made using different time frames and two methods for determining future deliveries. The first sub-objective is to examine which simulation parameters give the best result for each profit center. The second sub-objective of this study is to create a simple control model for following forecasted costs and actual realized costs. The third sub-objective is to document all Qlikview parameters of the profit centers. This study is constructive research, and solutions for the company's problems are worked out in this master's thesis. The theory part introduces quality issues, for example: what quality is, quality costing, and the cost of poor quality. Quality is one of the major concerns in the Case Company, so understanding the link between quality and warranty forecasting is important. Warranty management and other tools for warranty forecasting were also introduced, along with the Weibull method, its mathematical properties, and reliability engineering. The main results of this master's thesis are that the Weibull analysis forecasted costs that were too high when calculating the provision. Although some forecasted values for profit centers were lower than actual values, the method works better for planning purposes. One of the reasons is that quality improvement, or alternatively quality deterioration, does not show in the results of the analysis in the short run. The other reason for the overly high values is that the products of the Case Company are complex and the analyses were made at the profit center level. The Weibull method was developed for standard products, but the products of the Case Company consist of many complex components. According to the theory, this method was developed for homogeneous data. The most important finding is therefore that the analysis should be made at the product level, not the profit center level, where the data are more homogeneous.
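A simplified sketch of the underlying calculation follows: fit a two-parameter Weibull distribution to observed times-to-failure and turn the probability of failure within the warranty period into a cost forecast. The numbers are invented, the fit ignores censored (still-working) units, and this is not the Case Company's Weibull++ workflow.

```python
# Fit a two-parameter Weibull to toy failure times and forecast warranty cost.
import numpy as np
from scipy.stats import weibull_min

failures_months = np.array([3, 7, 9, 14, 20, 26, 31, 40])     # toy times to failure
shape, loc, scale = weibull_min.fit(failures_months, floc=0)   # fix location at zero

warranty_months = 24
p_fail_in_warranty = weibull_min.cdf(warranty_months, shape, loc=0, scale=scale)

units_to_deliver = 1000
avg_repair_cost = 150.0                                        # assumed per-failure cost
forecast_cost = units_to_deliver * p_fail_in_warranty * avg_repair_cost
print(f"shape={shape:.2f}, scale={scale:.1f}, forecast warranty cost ~ {forecast_cost:.0f}")
```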
Abstract:
The amateur birding community has a long and proud tradition of contributing to bird surveys and bird atlases. Coordinated activities such as Breeding Bird Atlases and the Christmas Bird Count are examples of "citizen science" projects. With the advent of technology, Web 2.0 sites such as eBird have been developed to facilitate online sharing of data and thus increase the potential for real-time monitoring. However, as recently articulated in an editorial in this journal and elsewhere, monitoring is best served when based on a priori hypotheses. Harnessing citizen scientists to collect data following a hypothetico-deductive approach carries challenges. Moreover, the use of citizen science in scientific and monitoring studies has raised issues of data accuracy and quality. These issues are compounded when data collection moves into the Web 2.0 world. An examination of the literature from social geography on the concept of "citizen sensors" and volunteered geographic information (VGI) yields thoughtful reflections on the challenges of data quality/data accuracy when applying information from citizen sensors to research and management questions. VGI has been harnessed in a number of contexts, including for environmental and ecological monitoring activities. Here, I argue that conceptualizing a monitoring project as an experiment following the scientific method can further contribute to the use of VGI. I show how principles of experimental design can be applied to monitoring projects to better control for data quality of VGI. This includes suggestions for how citizen sensors can be harnessed to address issues of experimental controls and how to design monitoring projects to increase randomization and replication of sampled data, hence increasing scientific reliability and statistical power.
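As a toy sketch of the randomization and replication point (with invented site and observer names), the example below randomly assigns several independent volunteers to each survey site so that observations are replicated across observers.

```python
# Random assignment of volunteer observers to monitoring sites with replication.
import random

random.seed(42)
sites = [f"site_{i}" for i in range(6)]
volunteers = ["obs_A", "obs_B", "obs_C", "obs_D"]
replicates_per_site = 2

assignments = {s: random.sample(volunteers, replicates_per_site) for s in sites}
for site, observers in assignments.items():
    print(site, "->", observers)
```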