895 results for Data Linkage
Abstract:
This paper proposes an experimental study of quality metrics that can be applied to visual and infrared images acquired from cameras onboard an unmanned ground vehicle (UGV). The relevance of existing metrics in this context is discussed and a novel metric is introduced. Selected metrics are evaluated on data collected by a UGV in clear and challenging environmental conditions, represented in this paper by the presence of airborne dust or smoke.
Abstract:
This document describes large, accurately calibrated and time-synchronised datasets, gathered in controlled environmental conditions, using an unmanned ground vehicle equipped with a wide variety of sensors. These sensors include multiple laser scanners, a millimetre-wave radar scanner, a colour camera and an infrared camera. Full details of the sensors are given, as well as the calibration parameters needed to locate them with respect to each other and to the platform. This report also specifies the format and content of the data, and the conditions in which the data were gathered. Data were collected in two vehicle configurations: static and dynamic. The static tests consisted of sensing a fixed 'reference' terrain, containing simple known objects, from a motionless vehicle. For the dynamic tests, data were acquired from a moving vehicle in various environments, mainly rural, including an open area, a semi-urban zone and a natural area with different types of vegetation. For both categories, data were gathered in controlled environmental conditions, which included the presence of dust, smoke and rain. Most of the environments involved were static, except for a few specific datasets that include a walking pedestrian. Finally, this document presents illustrations of the effects of adverse environmental conditions on sensor data, as a first step towards reliability and integrity in autonomous perceptual systems.
Abstract:
In this paper we present large, accurately calibrated and time-synchronized data sets, gathered outdoors in controlled and variable environmental conditions, using an unmanned ground vehicle (UGV) equipped with a wide variety of sensors. These include four 2D laser scanners, a radar scanner, a color camera and an infrared camera. We provide a full description of the system used for data collection and of the types of environments and conditions in which these data sets were gathered, which include the presence of airborne dust, smoke and rain.
Abstract:
This work aims to promote integrity in autonomous perceptual systems, with a focus on outdoor unmanned ground vehicles equipped with a camera and a 2D laser range finder. A method to check for inconsistencies between the data provided by these two heterogeneous sensors is proposed and discussed. First, uncertainties in the estimated transformation between the laser and camera frames are evaluated and propagated up to the projection of the laser points onto the image. Then, for each acquired pair of laser scan and camera image, the information at corners of the laser scan is compared with the content of the image, resulting in a likelihood of correspondence. The result of this process is then used to validate segments of the laser scan that are found to be consistent with the image, while inconsistent segments are rejected. Experimental results illustrate how this technique can improve the reliability of perception in challenging environmental conditions, such as in the presence of airborne dust.
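The projection and uncertainty-propagation step described in this abstract can be illustrated with a minimal sketch. It assumes a plain pinhole camera model and propagates only an isotropic translation uncertainty to first order; the function names, the intrinsics (`fx`, `fy`, `cx`, `cy`) and the variance `var_t` are hypothetical values chosen for illustration, not the authors' calibration or method.

```python
def project_laser_point(point_laser, rotation, translation, fx, fy, cx, cy):
    """Map a 3D laser point into the camera frame, then onto the image plane."""
    # Rigid transform: p_cam = R * p_laser + t
    x, y, z = (
        sum(rotation[i][k] * point_laser[k] for k in range(3)) + translation[i]
        for i in range(3)
    )
    # Pinhole projection (assumed model, no lens distortion)
    pixel = (fx * x / z + cx, fy * y / z + cy)
    return pixel, (x, y, z)


def pixel_variance(point_cam, fx, fy, var_t):
    """First-order pixel variance caused by translation uncertainty only.

    Assumes each translation component has independent variance var_t.
    The Jacobian of (u, v) with respect to t is
    [[fx/z, 0, -fx*x/z^2], [0, fy/z, -fy*y/z^2]].
    """
    x, y, z = point_cam
    var_u = var_t * ((fx / z) ** 2 + (fx * x / z ** 2) ** 2)
    var_v = var_t * ((fy / z) ** 2 + (fy * y / z ** 2) ** 2)
    return var_u, var_v


# Hypothetical example: identity rotation, zero translation, 1 cm std deviation.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pixel, point_cam = project_laser_point(
    (1.0, 0.0, 2.0), identity, (0.0, 0.0, 0.0), 500.0, 500.0, 320.0, 240.0
)
var_u, var_v = pixel_variance(point_cam, 500.0, 500.0, var_t=1e-4)
print(pixel, var_u, var_v)
```

A fuller treatment would also propagate rotation uncertainty and use the full covariance matrix of the calibration; this sketch keeps only the simplest term.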
Abstract:
Server consolidation using virtualization technology has become an important means of improving the energy efficiency of data centers. Virtual machine placement is the key step in server consolidation. In the past few years, many approaches to virtual machine placement have been proposed. However, existing approaches consider only the energy consumed by physical machines and ignore the energy consumed by the communication network in a data center. That network energy consumption is not trivial and should therefore be considered in virtual machine placement. In our preliminary research, we proposed a genetic algorithm for a new virtual machine placement problem that considers the energy consumption of both the physical machines and the communication network in a data center. Aiming to improve the performance and efficiency of the genetic algorithm, this paper presents a hybrid genetic algorithm for the energy-efficient virtual machine placement problem. Experimental results show that the hybrid genetic algorithm significantly outperforms the original genetic algorithm, and that the hybrid genetic algorithm is scalable.
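To make the objective concrete, here is a hedged sketch of a cost function that a genetic algorithm could minimise for this kind of placement problem. The energy model (idle power per active machine, a fixed power increment per hosted VM, and a per-unit network cost for traffic between VMs on different machines) is an assumption made for illustration; the abstract does not give the authors' model, and capacity constraints are omitted.

```python
def placement_energy(placement, pm_idle_power, pm_power_per_vm, traffic, link_cost):
    """Total energy of a placement under a simplified, hypothetical model.

    placement[v] is the index of the physical machine hosting VM v.
    traffic maps a VM pair (a, b) to its communication rate.
    """
    # Only machines that host at least one VM draw idle power.
    active = set(placement)
    server = sum(pm_idle_power[p] for p in active)
    server += sum(pm_power_per_vm[p] for p in placement)
    # Traffic between co-located VMs is assumed to cost nothing on the network.
    network = sum(
        rate * link_cost
        for (a, b), rate in traffic.items()
        if placement[a] != placement[b]
    )
    return server + network


# Hypothetical instance: three VMs, two identical physical machines.
idle = [100.0, 100.0]
per_vm = [10.0, 10.0]
traffic = {(0, 1): 5.0, (1, 2): 2.0}

spread = placement_energy([0, 0, 1], idle, per_vm, traffic, link_cost=3.0)
packed = placement_energy([0, 0, 0], idle, per_vm, traffic, link_cost=3.0)
print(spread, packed)  # 236.0 130.0
```

In a genetic algorithm, each chromosome would be one `placement` list and this function would serve as the fitness to minimise.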
Abstract:
OBJECTIVES: Four randomized phase II/III trials investigated the addition of cetuximab to platinum-based, first-line chemotherapy in patients with advanced non-small cell lung cancer (NSCLC). A meta-analysis was performed to examine the benefit/risk ratio for the addition of cetuximab to chemotherapy. MATERIALS AND METHODS: The meta-analysis included individual patient efficacy data from 2018 patients and individual patient safety data from 1970 patients comprising respectively the combined intention-to-treat and safety populations of the four trials. The effect of adding cetuximab to chemotherapy was measured by hazard ratios (HRs) obtained using a Cox proportional hazards model and odds ratios calculated by logistic regression. Survival rates at 1 year were calculated. All applied models were stratified by trial. Tests on heterogeneity of treatment effects across the trials and sensitivity analyses were performed for all endpoints. RESULTS: The meta-analysis demonstrated that the addition of cetuximab to chemotherapy significantly improved overall survival (HR 0.88, p=0.009, median 10.3 vs 9.4 months), progression-free survival (HR 0.90, p=0.045, median 4.7 vs 4.5 months) and response (odds ratio 1.46, p<0.001, overall response rate 32.2% vs 24.4%) compared with chemotherapy alone. The safety profile of chemotherapy plus cetuximab in the meta-analysis population was confirmed as manageable. Neither trials nor patient subgroups defined by key baseline characteristics showed significant heterogeneity for any endpoint. CONCLUSION: The addition of cetuximab to platinum-based, first-line chemotherapy for advanced NSCLC significantly improved outcome for all efficacy endpoints with an acceptable safety profile, indicating a favorable benefit/risk ratio.
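As a quick sanity check on the response figures, a crude odds ratio can be computed directly from the two reported overall response rates. This is only an unstratified approximation: the abstract's odds ratio of 1.46 comes from a logistic regression stratified by trial, so the two values need not match exactly. The function name is illustrative.

```python
def odds_ratio(p_treatment: float, p_control: float) -> float:
    """Crude (unstratified) odds ratio from two response proportions."""
    odds_treatment = p_treatment / (1.0 - p_treatment)
    odds_control = p_control / (1.0 - p_control)
    return odds_treatment / odds_control


# Overall response rates reported in the abstract: 32.2% vs 24.4%.
crude_or = odds_ratio(0.322, 0.244)
print(round(crude_or, 2))  # 1.47
```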
Abstract:
Modern health information systems can generate several exabytes of patient data, the so-called "Health Big Data", per year. Many health managers and experts believe that with these data it is possible to easily discover useful knowledge to improve health policies, increase patient safety and eliminate redundancies and unnecessary costs. The objective of this paper is to discuss the characteristics of Health Big Data as well as the challenges and solutions for health Big Data Analytics (BDA), the process of extracting knowledge from sets of Health Big Data, and to design and evaluate a pipelined framework for use as a guideline/reference in health BDA.
Abstract:
This paper uses innovative content analysis techniques to map how the death of Oscar Pistorius' girlfriend, Reeva Steenkamp, was framed in Twitter conversations. Around 1.5 million posts from a two-week timeframe are analyzed with a combination of syntactic and semantic methods. This analysis is grounded in the frame analysis perspective and is different from sentiment analysis. Instead of looking for explicit evaluations, such as “he is guilty” or “he is innocent”, we showcase through the results how opinions can be identified by complex articulations of more implicit symbolic devices such as examples and metaphors repeatedly mentioned. Different frames are adopted by users as more information about the case is revealed: from a more episodic one, highly used in the very beginning, to more systemic approaches, highlighting the association of the event with urban violence, gun control issues, and violence against women. A detailed timeline of the discussions is provided.
Abstract:
After nearly fifteen years of the open access (OA) movement and its hard-fought struggle for a more open scholarly communication system, publishers are realizing that business models can be both open and profitable. Making journal articles available under an OA license is becoming an accepted strategy for maximizing the value of content to both research communities and the businesses that serve them. The first blog in this two-part series celebrating Data Innovation Day looks at the role that data innovation is playing in the shift to open access for journal articles.
Abstract:
Recent studies have linked the ability of novice (CS1) programmers to read and explain code with their ability to write code. This study extends earlier work by asking CS2 students to explain object-oriented data structures problems that involve recursion. Results show a strong correlation between ability to explain code at an abstract level and performance on code writing and code reading test problems for these object-oriented data structures problems. The authors postulate that there is a common set of skills concerned with reasoning about programs that explains the correlation between writing code and explaining code. The authors suggest that an overly exclusive emphasis on code writing may be detrimental to learning to program. Non-code writing learning activities (e.g., reading and explaining code) are likely to improve student ability to reason about code and, by extension, improve student ability to write code. A judicious mix of code-writing and code-reading activities is recommended.
Abstract:
BACKGROUND Experimental and epidemiologic evidence has suggested that chronic inflammation may play a critical role in endometrial carcinogenesis. METHODS To investigate this hypothesis, a two-stage study was carried out to evaluate single-nucleotide polymorphisms (SNP) in inflammatory pathway genes in association with endometrial cancer risk. In stage I, 64 candidate pathway genes were identified and 4,542 directly genotyped or imputed SNPs were analyzed among 832 endometrial cancer cases and 2,049 controls, using data from the Shanghai Endometrial Cancer Genetics Study. Linkage disequilibrium of stage I SNPs significantly associated with endometrial cancer (P < 0.05) indicated that the majority of associations could be linked to one of 24 distinct loci. One SNP from each of the 24 loci was then selected for follow-up genotyping. Of these, 21 SNPs were successfully designed and genotyped in stage II, which consisted of 10 additional studies including 6,604 endometrial cancer cases and 8,511 controls. RESULTS Five of the 21 SNPs had significant allelic odds ratios (ORs) and 95% confidence intervals (CI) as follows: FABP1, 0.92 (0.85-0.99); CXCL3, 1.16 (1.05-1.29); IL6, 1.08 (1.00-1.17); MSR1, 0.90 (0.82-0.98); and MMP9, 0.91 (0.87-0.97). Two of these polymorphisms were independently significant in the replication sample (rs352038 in CXCL3 and rs3918249 in MMP9). The association for the MMP9 polymorphism remained significant after Bonferroni correction and showed a significant association with endometrial cancer in both Asian- and European-ancestry samples. CONCLUSIONS These findings lend support to the hypothesis that genetic polymorphisms in genes involved in the inflammatory pathway may contribute to genetic susceptibility to endometrial cancer. Impact statement: This study adds to the growing evidence that inflammation plays an important role in endometrial carcinogenesis.
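For readers unfamiliar with the Bonferroni correction mentioned here, the sketch below shows how the per-test significance threshold is obtained. It assumes the correction was applied across the 21 SNPs genotyped in stage II with a family-wise level of 0.05; the abstract does not state the number of tests actually used, and the p-values in the example are made up for illustration.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def significant_after_correction(p_values, alpha=0.05):
    """Return the names whose p-value passes the corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [name for name, p in p_values.items() if p < threshold]


# Assumed setting: 21 tests at a family-wise level of 0.05.
threshold_21 = bonferroni_threshold(0.05, 21)
print(round(threshold_21, 5))  # 0.00238

# Hypothetical p-values, not taken from the study.
example = {"snp_a": 0.001, "snp_b": 0.02, "snp_c": 0.04}
print(significant_after_correction(example))  # ['snp_a']
```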
Abstract:
Road networks are a national critical infrastructure. Road assets need to be monitored and maintained efficiently as their condition deteriorates over time. The condition of one such asset, road pavement, plays a major role in road network maintenance programmes. Pavement conditions depend upon many factors such as pavement types, traffic and environmental conditions. This paper presents a data analytics case study for assessing the factors affecting the pavement deflection values measured by the traffic speed deflectometer (TSD) device. The analytics process includes acquisition and integration of data from multiple sources, data pre-processing, mining useful information from the data and utilising data mining outputs for knowledge deployment. Data mining techniques are able to show how TSD outputs vary across different roads, traffic and environmental conditions. The generated data mining models map the TSD outputs to a set of classes and define correction factors for each class.
Abstract:
The quality of data collection methods selected and the integrity of the data collected are integral to the success of a study. This chapter focuses on data collection and study validity. After reading the chapter, readers should be able to define types of data collection methods in quantitative research; list advantages and disadvantages of each method; discuss factors related to internal and external validity; critically evaluate data collection methods; and discuss the need to operationalise variables of interest for data collection.
Abstract:
A spatial process observed over a lattice or a set of irregular regions is usually modeled using a conditionally autoregressive (CAR) model. The neighborhoods within a CAR model are generally formed deterministically using the inter-distances or boundaries between the regions. An extension of the CAR model is proposed in this article where the selection of the neighborhood depends on unknown parameter(s). This extension is called a Stochastic Neighborhood CAR (SNCAR) model. The resulting model shows flexibility in accurately estimating covariance structures for data generated from a variety of spatial covariance models. Specific examples are illustrated using data generated from some common spatial covariance functions as well as real data concerning radioactive contamination of the soil in Switzerland after the Chernobyl accident.
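The idea that the neighborhood itself depends on a parameter can be sketched with the standard proper CAR precision matrix Q = tau * (D - rho * W), where W is the adjacency matrix and D holds the neighbour counts. In the sketch below the adjacency is built from a distance cutoff `radius`, which stands in for the unknown neighborhood parameter; this cutoff rule and the function name are assumptions for illustration, since the abstract does not give the SNCAR parameterization.

```python
import math


def car_precision(coords, radius, rho=0.9, tau=1.0):
    """Proper CAR precision matrix with a distance-based neighbourhood.

    Regions i and j are neighbours when their centroids are within `radius`.
    Changing `radius` changes W, and therefore the implied covariance.
    """
    n = len(coords)
    w = [
        [
            1.0 if i != j and math.dist(coords[i], coords[j]) <= radius else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]
    # Q = tau * (D - rho * W), with D the diagonal matrix of neighbour counts.
    return [
        [tau * ((sum(w[i]) if i == j else 0.0) - rho * w[i][j]) for j in range(n)]
        for i in range(n)
    ]


# Four regions on a line, one unit apart (hypothetical layout).
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
q_small = car_precision(coords, radius=1.0)  # nearest neighbours only
q_large = car_precision(coords, radius=2.0)  # second-nearest included
print(q_small[0], q_large[0])
```

Treating `radius` as a parameter to be estimated, rather than fixing it in advance, is the kind of flexibility the abstract attributes to the SNCAR model.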
Abstract:
Environmental monitoring is becoming critical as human activity and climate change place greater pressures on biodiversity, leading to an increasing need for data to make informed decisions. Acoustic sensors can help collect data across large areas for extended periods, making them attractive for environmental monitoring. However, managing and analysing large volumes of environmental acoustic data is a great challenge and consequently hinders the effective utilization of the large datasets collected. This paper presents an overview of our current techniques for collecting, storing and analysing large volumes of acoustic data efficiently, accurately, and cost-effectively.