970 results for Datasets


Relevance: 10.00%

Abstract:

This document describes large, accurately calibrated and time-synchronised datasets, gathered in controlled environmental conditions, using an unmanned ground vehicle equipped with a wide variety of sensors: multiple laser scanners, a millimetre-wave radar scanner, a colour camera and an infra-red camera. Full details of the sensors are given, as well as the calibration parameters needed to locate them with respect to each other and to the platform. The report also specifies the format and content of the data and the conditions in which the data were gathered. Data were collected with the vehicle in two states: static and dynamic. The static tests consisted of sensing a fixed 'reference' terrain, containing simple known objects, from a motionless vehicle. For the dynamic tests, data were acquired from a moving vehicle in various environments, mainly rural, including an open area, a semi-urban zone and a natural area with different types of vegetation. For both categories, data were gathered in controlled environmental conditions, including the presence of dust, smoke and rain. Most of the environments were static, except for a few datasets that involve a walking pedestrian. Finally, the document presents illustrations of the effects of adverse environmental conditions on sensor data, as a first step towards reliability and integrity in autonomous perceptual systems.
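As a minimal sketch of how such extrinsic calibration parameters are typically applied (the angles, offsets and function below are hypothetical placeholders, not values from the report), a laser return can be mapped into the platform frame with a rigid-body transform:

```python
# Hypothetical sketch: applying extrinsic calibration parameters (rotation +
# translation) to express a laser return in the vehicle/platform frame.
# The parameter values below are illustrative, not taken from the report.
import numpy as np

def extrinsic_matrix(roll, pitch, yaw, tx, ty, tz):
    """Build a 4x4 sensor-to-platform transform from Euler angles (rad) and a translation (m)."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    T = np.eye(4)
    T[:3, :3] = Rz @ Ry @ Rx
    T[:3, 3] = [tx, ty, tz]
    return T

# Illustrative extrinsics for one laser scanner (placeholder values).
T_laser_to_platform = extrinsic_matrix(0.0, 0.05, 1.57, 1.2, 0.0, 0.8)

# A laser return given as (range, bearing) in the scanner plane, mapped to the platform frame.
rng_m, bearing = 12.4, np.deg2rad(30.0)
p_sensor = np.array([rng_m * np.cos(bearing), rng_m * np.sin(bearing), 0.0, 1.0])
p_platform = T_laser_to_platform @ p_sensor
print(p_platform[:3])
```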

Relevance: 10.00%

Abstract:

Malignant Pleural Mesothelioma (MPM) is an aggressive cancer that is often diagnosed at an advanced stage and is characterized by prior asbestos exposure and a long latency period (20-40 years between initial exposure and diagnosis). Currently, accurate diagnosis of MPM is difficult due to the lack of sensitive biomarkers, and despite minor improvements in treatment, median survival does not exceed 12 months. Accumulating evidence suggests that aberrant expression of long non-coding RNAs (lncRNAs) plays an important functional role in cancer biology. LncRNAs are a class of recently discovered non-protein-coding RNAs >200 nucleotides in length with a role in regulating transcription. Here we used NCode long noncoding microarrays to identify differentially expressed lncRNAs potentially involved in MPM pathogenesis. High-priority candidate lncRNAs were selected on the basis of statistical (P<0.05) and biological significance (>3-fold difference). Expression levels of 9 candidate lncRNAs were technically validated using RT-qPCR, and biologically validated in three independent test sets: (1) 57 archived MPM tissues obtained from extrapleural pneumonectomy patients, (2) 15 cryopreserved MPM and 3 benign pleura samples, and (3) an extended panel of 10 MPM cell lines. RT-qPCR analysis demonstrated consistent up-regulation of these lncRNAs in the independent datasets. ROC curve analysis showed that two candidates were able to separate benign pleura and MPM with high sensitivity and specificity, and were associated with nodal metastases and survival following induction chemotherapy. These results suggest that lncRNAs have potential to serve as biomarkers in MPM.
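As a minimal sketch of the kind of ROC analysis described (using synthetic expression values rather than the study's data), a single candidate biomarker can be assessed as follows:

```python
# Minimal sketch of ROC-style biomarker evaluation on synthetic lncRNA expression
# values (the data here are invented for illustration, not the study's measurements).
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(0)
# 0 = benign pleura, 1 = MPM; expression assumed up-regulated in MPM.
labels = np.concatenate([np.zeros(20), np.ones(40)])
expression = np.concatenate([rng.normal(1.0, 0.5, 20), rng.normal(2.5, 0.8, 40)])

auc = roc_auc_score(labels, expression)
fpr, tpr, thresholds = roc_curve(labels, expression)
# Pick the threshold maximising Youden's J = sensitivity + specificity - 1.
best = np.argmax(tpr - fpr)
print(f"AUC={auc:.2f}, sensitivity={tpr[best]:.2f}, specificity={1 - fpr[best]:.2f}")
```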

Relevance: 10.00%

Abstract:

The global financial crisis (GFC) in 2008 rocked local, regional, and state economies throughout the world. Several intermediate outcomes of the GFC, including job losses and reduced income, have been well documented in the literature. Relatively little research, however, has examined the impacts of the GFC on individual-level travel behaviour change. To address this shortcoming, HABITAT panel data were employed to estimate a multinomial logit model of mode-switching behaviour between 2007 (pre-GFC) and 2009 (post-GFC) for a cohort of baby boomers in Brisbane, Australia, a city in a developed country that has, on many metrics, been the least affected by the GFC. In addition, a Poisson regression model was estimated to model the number of trips made by individuals in 2007, 2008, and 2009, using the South East Queensland Travel Survey datasets. Four linear regression models were estimated to assess the effects of the GFC on time allocated to travel during a day: one for each of three travel modes (public transport, active transport, and less environmentally friendly transport), and an overall travel-time model irrespective of mode. The results reveal that individuals who lost their job or whose income was reduced between 2007 and 2009 were more likely to switch to public transport. Individuals also made significantly fewer trips in 2008 and 2009 than in 2007, and spent significantly less time using less environmentally friendly transport but more time using public transport in 2009. Overall, baby boomers switched to more environmentally friendly travel modes during the GFC.
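A hedged sketch of the two main model types mentioned above is given below; the variable names and data are hypothetical placeholders, not HABITAT or South East Queensland Travel Survey fields:

```python
# Illustrative sketch: a multinomial logit model for mode switching and a Poisson
# regression for trip counts, using invented data.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "lost_job": rng.integers(0, 2, n),
    "income_drop": rng.integers(0, 2, n),
    "age": rng.integers(45, 65, n),
})
# 0 = no switch, 1 = switch to public transport, 2 = switch to active transport.
df["mode_switch"] = rng.integers(0, 3, n)
df["n_trips"] = rng.poisson(3, n)

X = sm.add_constant(df[["lost_job", "income_drop", "age"]])

mnl = sm.MNLogit(df["mode_switch"], X).fit(disp=False)                   # mode-switching model
poisson = sm.GLM(df["n_trips"], X, family=sm.families.Poisson()).fit()   # trip-frequency model
print(mnl.summary())
print(poisson.summary())
```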

Relevance: 10.00%

Abstract:

In order to increase the accuracy of patient positioning for complex radiotherapy treatments, various 3D imaging techniques have been developed. MegaVoltage Cone Beam CT (MVCBCT) can utilise existing hardware to implement a 3D imaging modality to aid patient positioning. MVCBCT has been investigated using an unmodified Elekta Precise linac and iView amorphous silicon electronic portal imaging device (EPID). Two methods of delivery and acquisition have been investigated for imaging an anthropomorphic head phantom and a quality assurance phantom. Phantom projections were successfully acquired and CT datasets reconstructed using both acquisition methods. Bone, tissue and air were clearly resolvable in both phantoms, even with low-dose (22 MU) scans. The feasibility of MVCBCT was investigated using a standard linac, an amorphous silicon EPID, and a combination of a free open-source reconstruction toolkit and custom in-house software written in Matlab. The resultant image quality has been assessed and presented. Although bone, tissue and air were resolvable in all scans, artifacts are present and scan doses are increased compared with standard portal imaging. The feasibility of MVCBCT with an unmodified Elekta Precise linac and EPID has been considered, and possible areas for future development in artifact correction techniques to further improve image quality have been identified.
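As background to the reconstruction step, raw EPID portal images are typically converted to attenuation line integrals before back-projection. A minimal sketch of that preprocessing (the arrays and the flood/dark-field handling are assumptions for illustration, not details from this work) might look like:

```python
# Minimal sketch: converting raw portal images into log-normalised line integrals,
# the usual preprocessing step before cone-beam reconstruction. The arrays and the
# flood/dark fields are placeholders, not data from this study.
import numpy as np

def to_line_integrals(raw_projection, flood_field, dark_field, eps=1e-6):
    """Apply dark/flood correction and Beer-Lambert log conversion: p = -ln(I / I0)."""
    corrected = (raw_projection - dark_field) / np.maximum(flood_field - dark_field, eps)
    return -np.log(np.clip(corrected, eps, None))

# Illustrative synthetic data standing in for EPID frames.
flood = np.full((256, 256), 1000.0)   # open-field (I0) image
dark = np.full((256, 256), 20.0)      # dark-current offset
raw = np.full((256, 256), 400.0)      # attenuated projection through the phantom

p = to_line_integrals(raw, flood, dark)
print(p.mean())  # average attenuation line integral
```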

Relevance: 10.00%

Abstract:

The K-means algorithm is one of the most popular clustering techniques. Nevertheless, its performance depends strongly on the initial cluster centers, and it can converge to local minima. This paper proposes a hybrid evolutionary-programming-based clustering algorithm, called PSO-SA, which combines particle swarm optimization (PSO) and simulated annealing (SA). The basic idea is to search around the global solution using SA and to increase the information exchange among particles using a mutation operator, in order to escape local optima. Three datasets, Iris, Wisconsin Breast Cancer, and Ripley's Glass, were used to show the effectiveness of the proposed clustering algorithm in providing optimal clusters. The simulation results show that the PSO-SA clustering algorithm not only produces better clustering results but also converges more quickly than the K-means, PSO, and SA algorithms.
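The following is an illustrative simplification of a PSO-style clustering loop with a mutation operator and a simulated-annealing acceptance rule, in the spirit of the hybrid described above; it is a sketch, not the authors' PSO-SA implementation:

```python
# Sketch of PSO-based clustering with SA acceptance and mutation (illustrative only).
import numpy as np

def sse(centers, X):
    """K-means objective: sum of squared distances to the nearest centre."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return d.min(axis=1).sum()

def pso_sa_cluster(X, k=3, n_particles=10, iters=100, w=0.7, c1=1.5, c2=1.5,
                   temp=1.0, cooling=0.95, mut_prob=0.1, seed=0):
    rng = np.random.default_rng(seed)
    pos = X[rng.integers(0, len(X), (n_particles, k))]          # particle = k candidate centres
    vel = np.zeros_like(pos)
    pbest, pbest_f = pos.copy(), np.array([sse(p, X) for p in pos])
    gbest = pbest[pbest_f.argmin()].copy()
    for _ in range(iters):
        for i in range(n_particles):
            r1, r2 = rng.random(pos[i].shape), rng.random(pos[i].shape)
            vel[i] = w * vel[i] + c1 * r1 * (pbest[i] - pos[i]) + c2 * r2 * (gbest - pos[i])
            cand = pos[i] + vel[i]
            if rng.random() < mut_prob:                          # mutation to escape local optima
                cand += rng.normal(0, 0.1, cand.shape)
            f_cand, f_cur = sse(cand, X), sse(pos[i], X)
            # SA acceptance: always accept improvements, sometimes accept worse moves.
            if f_cand < f_cur or rng.random() < np.exp((f_cur - f_cand) / temp):
                pos[i] = cand
            if sse(pos[i], X) < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i].copy(), sse(pos[i], X)
        gbest = pbest[pbest_f.argmin()].copy()
        temp *= cooling                                          # cooling schedule
    return gbest

# Three well-separated synthetic 2D clusters as a toy example.
X = np.vstack([np.random.default_rng(1).normal(m, 0.3, (50, 2)) for m in (0, 3, 6)])
print(pso_sa_cluster(X, k=3))
```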

Relevance: 10.00%

Abstract:

An important aspect of decision support systems is applying sophisticated and flexible statistical models to real datasets and communicating the results to decision makers in interpretable ways. An important class of problem is the modelling of incidence, such as fire or disease. Models of incidence known as point processes or Cox processes are particularly challenging because they are 'doubly stochastic', i.e. obtaining the probability mass function of incidents requires two integrals to be evaluated. Existing approaches to the problem either use simple models that obtain predictions from plug-in point estimates and do not distinguish between Cox processes and density estimation, but do use sophisticated 3D visualization for interpretation; or they employ sophisticated non-parametric Bayesian Cox process models but do not use visualization to render complex spatial-temporal forecasts interpretable. The contribution here is to fill this gap by inferring predictive distributions of log-Gaussian Cox processes and rendering them using state-of-the-art 3D visualization techniques. This requires performing inference on an approximation of the model on a large discretized grid, and adapting an existing spatial-diurnal kernel to the log-Gaussian Cox process context.
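To make the 'doubly stochastic' structure concrete, the following sketch simulates a log-Gaussian Cox process on a discretized grid (the kernel, grid size and mean are illustrative assumptions): a Gaussian field is sampled, exponentiated into an intensity, and Poisson counts are then drawn per cell.

```python
# Minimal sketch of the two levels of randomness in a log-Gaussian Cox process
# on a discretised grid. Kernel and grid settings are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n = 30                                   # grid is n x n cells
cell_area = 1.0
xs, ys = np.meshgrid(np.arange(n), np.arange(n))
coords = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)

# Squared-exponential covariance for the latent Gaussian field.
d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
K = 1.0 * np.exp(-d2 / (2 * 5.0 ** 2)) + 1e-6 * np.eye(n * n)

# First level of randomness: the latent log-intensity surface.
log_lambda = rng.multivariate_normal(np.full(n * n, -1.0), K)

# Second level: Poisson counts given the (exponentiated) intensity in each cell.
counts = rng.poisson(np.exp(log_lambda) * cell_area).reshape(n, n)
print(counts.sum(), "events simulated on the grid")
```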

Relevance: 10.00%

Abstract:

Many emerging economies are holding out the patent system as an incentive to stimulate biotechnological innovation, on the premise that such innovation will ultimately improve their economic and social growth. The patent system mandates full disclosure of the patented invention in exchange for a temporary exclusive patent right. Recently, however, patent offices have fallen short of complying with this mandate, especially for genetic inventions. Most patent offices provide only static information about disclosed patent sequences, and some do not even keep track of sequence listing data in their own databases. The successful partnership of QUT Library and Cambia exemplifies advocacy in Open Access, Open Innovation and User Participation. The library extends its services to various departments within the university, and builds and encourages research networks to complement the skills needed to make a contribution in the real world.

Relevance: 10.00%

Abstract:

Enterprises, both public and private, have rapidly begun exploiting the benefits of enterprise resource planning (ERP) combined with business analytics and “open data sets”, which are often outside the control of the enterprise, to gain further efficiencies, build new service operations and increase business activity. In many cases, these business activities are based on relevant software systems hosted in a “cloud computing” environment. “Garbage in, garbage out”, or “GIGO”, is a term dating from the 1960s and long used to describe the problems of unqualified dependency on information systems. A more pertinent variation arose sometime later, namely “garbage in, gospel out”, signifying that with large-scale information systems such as ERP, and with the use of open datasets in a cloud environment, verifying the authenticity of the data sets used may be almost impossible, resulting in dependence upon questionable results. Illicit data set “impersonation” becomes a reality. At the same time, the ability to audit such results may be an important requirement, particularly in the public sector. This paper discusses the need for enhanced identity, reliability, authenticity and audit services, including naming and addressing services, in this emerging environment, and analyses some current technologies on offer that may be appropriate. However, severe limitations in addressing these requirements have been identified, and the paper proposes further research work in the area.

Relevance: 10.00%

Abstract:

Enterprise resource planning (ERP) systems are rapidly being combined with “big data” analytics processes and publicly available “open data sets”, which are usually outside the arena of the enterprise, to expand activity through better service to current clients as well as to identify new opportunities. Moreover, these activities are now largely based on relevant software systems hosted in a “cloud computing” environment. The over 50-year-old phrase related to mistrust in computer systems, “garbage in, garbage out” or “GIGO”, describes the problems of unqualified and unquestioning dependency on information systems. A more relevant GIGO interpretation arose sometime later, however, namely “garbage in, gospel out”, signifying that with large-scale information systems based around ERP, open datasets and “big data” analytics, particularly in a cloud environment, verifying the authenticity and integrity of the data sets used may be almost impossible. In turn, this may easily result in decision making based upon questionable and unverifiable results. Illicit “impersonation” of, and modifications to, legitimate data sets may become a reality, while at the same time the ability to audit any derived results of analysis may be an important requirement, particularly in the public sector. The pressing need for enhanced identity, reliability, authenticity and audit services, including naming and addressing services, in this emerging environment is discussed in this paper, and some appropriate technologies currently on offer are examined. However, severe limitations in addressing the problems identified are found, and the paper proposes further necessary research work for the area. (Note: This paper is based on an earlier unpublished paper/presentation “Identity, Addressing, Authenticity and Audit Requirements for Trust in ERP, Analytics and Big/Open Data in a ‘Cloud’ Computing Environment: A Review and Proposal” presented to the Department of Accounting and IT, College of Management, National Chung Chen University, 20 November 2013.)
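One basic mechanism relevant to the authenticity and integrity concerns raised above is digest verification of a downloaded data set against a value published by its provider; the sketch below is an illustrative example only, not a technology proposed in the paper:

```python
# Illustrative sketch: verifying the integrity of an "open data set" against a
# published SHA-256 digest. The file and digest here are stand-ins created on the fly.
import hashlib, os, tempfile

def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file and return its SHA-256 hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Stand-in for an open data set and the digest its provider would publish alongside it.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as f:
    f.write(b"id,value\n1,42\n")
    path = f.name
published_digest = sha256_of_file(path)

# Later, before analysis: re-verify the local copy against the published digest.
if sha256_of_file(path) != published_digest:
    raise ValueError("Data set failed integrity check: possible tampering or 'impersonation'")
print("integrity check passed")
os.remove(path)
```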

Relevance: 10.00%

Abstract:

In this study, a machine learning technique called anomaly detection is employed for wind turbine bearing fault detection. The anomaly detection algorithm is used to recognize the presence of unusual and potentially faulty data in a dataset, and comprises two phases: a training phase and a testing phase. Two bearing datasets were used to validate the proposed technique: fault-seeded bearing data from a test rig located at Case Western Reserve University, used to validate the accuracy of the anomaly detection method, and run-to-failure bearing data from the NSF I/UCR Center for Intelligent Maintenance Systems (IMS). The latter dataset was used to compare anomaly detection with SVM, a well-known previously applied method, in rapidly detecting incipient faults.
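A hedged sketch of the two-phase workflow (training on healthy-operation features, then flagging unusual windows) is shown below; the IsolationForest detector and the RMS/kurtosis features are illustrative choices, not necessarily those used in the study:

```python
# Illustrative two-phase anomaly detection on synthetic vibration windows.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

def vibration_features(signal):
    """Simple time-domain features per window: RMS and kurtosis."""
    rms = np.sqrt(np.mean(signal ** 2, axis=1))
    centred = signal - signal.mean(axis=1, keepdims=True)
    kurt = np.mean(centred ** 4, axis=1) / np.mean(centred ** 2, axis=1) ** 2
    return np.column_stack([rms, kurt])

# Synthetic stand-ins for windowed vibration signals (healthy vs incipient fault with impulses).
healthy = rng.normal(0, 1.0, (200, 1024))
faulty = rng.normal(0, 1.0, (50, 1024)) + rng.normal(0, 3.0, (50, 1024)) * (rng.random((50, 1024)) < 0.01)

detector = IsolationForest(random_state=0).fit(vibration_features(healthy))  # training phase
flags = detector.predict(vibration_features(faulty))                          # testing phase
print(f"{(flags == -1).sum()} of {len(flags)} windows flagged as anomalous")
```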

Relevance: 10.00%

Abstract:

We present the PAC-Bayes-Empirical-Bernstein inequality. The inequality is based on a combination of the PAC-Bayesian bounding technique with the Empirical Bernstein bound. It allows one to take advantage of small empirical variance and is especially useful in regression. We show that when the empirical variance is significantly smaller than the empirical loss, the PAC-Bayes-Empirical-Bernstein inequality is significantly tighter than the PAC-Bayes-kl inequality of Seeger (2002), and otherwise it is comparable. The PAC-Bayes-Empirical-Bernstein inequality is an interesting example of the application of the PAC-Bayesian bounding technique to self-bounding functions. We provide an empirical comparison of the PAC-Bayes-Empirical-Bernstein inequality with the PAC-Bayes-kl inequality on a synthetic example and several UCI datasets.
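For background, the standard (non-PAC-Bayesian) empirical Bernstein bound of Maurer and Pontil (2009) already illustrates why small empirical variance helps; the sketch below compares it numerically with a Hoeffding-style bound on synthetic losses, and is not the authors' PAC-Bayes-Empirical-Bernstein inequality itself:

```python
# Background sketch (not the authors' PAC-Bayes bound): the standard empirical
# Bernstein bound for i.i.d. losses in [0, 1], compared with a Hoeffding-style bound,
# to show how small empirical variance tightens the gap term.
import numpy as np

def empirical_bernstein_gap(losses, delta=0.05):
    """High-probability upper bound on E[loss] - mean(losses) for losses in [0, 1]."""
    n = len(losses)
    var = losses.var(ddof=1)                     # sample variance
    return np.sqrt(2 * var * np.log(2 / delta) / n) + 7 * np.log(2 / delta) / (3 * (n - 1))

def hoeffding_gap(n, delta=0.05):
    """Variance-independent Hoeffding bound on the same gap."""
    return np.sqrt(np.log(1 / delta) / (2 * n))

rng = np.random.default_rng(0)
low_var_losses = np.clip(rng.normal(0.1, 0.02, 1000), 0, 1)   # small loss, small variance

print(f"empirical Bernstein gap: {empirical_bernstein_gap(low_var_losses):.4f}")
print(f"Hoeffding gap:           {hoeffding_gap(len(low_var_losses)):.4f}")
```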

Relevance: 10.00%

Abstract:

While popular accounts of the 2011 ‘Arab Spring’ are likely to overstate the impact of Facebook and Twitter on these uprisings, it is nonetheless true that protests and unrest in countries from Tunisia to Syria generated a substantial amount of social media activity. On Twitter alone, several million tweets containing the hashtags #libya or #egypt were generated during 2011, both by directly affected citizens of these countries and by onlookers from further afield. What remains unclear, though, is the extent to which there was any direct interaction between these two groups (especially considering the potential language barriers between them). Building on hashtag datasets gathered between January and November 2011, this paper compares patterns of Twitter usage during the popular revolution in Egypt and the civil war in Libya. Using custom-made tools for processing ‘big data’, we examine the volume of tweets sent by English-, Arabic-, and mixed-language Twitter users over time, and examine the networks of interaction (variously through @replying, retweeting, or both) between these groups as they developed and shifted over the course of these uprisings. Examining @reply and retweet traffic, we identify general patterns of information flow between the English- and Arabic-speaking sides of the Twittersphere, and highlight the roles played by users bridging both language spheres.
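A minimal sketch of the kind of interaction-network construction described above (with invented tweet records and language labels) might look like:

```python
# Illustrative sketch: build a directed @reply/retweet graph from tweet records and
# count cross-language edges. The records and language labels are invented placeholders.
import networkx as nx

tweets = [
    {"user": "alice", "lang": "en", "interacts_with": "omar", "type": "retweet"},
    {"user": "omar", "lang": "ar", "interacts_with": "alice", "type": "@reply"},
    {"user": "lina", "lang": "ar", "interacts_with": "omar", "type": "@reply"},
]

G = nx.DiGraph()
for t in tweets:
    G.add_node(t["user"], lang=t["lang"])
    G.add_edge(t["user"], t["interacts_with"], kind=t["type"])

# Count edges that bridge the English- and Arabic-language sides of the network.
cross = sum(
    1 for u, v in G.edges
    if "lang" in G.nodes[u] and "lang" in G.nodes[v] and G.nodes[u]["lang"] != G.nodes[v]["lang"]
)
print(f"{cross} cross-language interactions out of {G.number_of_edges()}")
```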

Relevance: 10.00%

Abstract:

Traditional nearest points methods use all the samples in an image set to construct a single convex or affine hull model for classification. However, strong artificial features and noisy data may be generated from combinations of training samples when significant intra-class variations and/or noise occur in the image set. Existing multi-model approaches extract local models by clustering each image set individually only once, with the fixed clusters used for matching against various image sets. This may not be optimal for discrimination, as undesirable environmental conditions (e.g. illumination and pose variations) may result in the two closest clusters representing different characteristics of an object (e.g. a frontal face being compared to a non-frontal face). To address this problem, we propose a novel approach that enhances nearest points based methods by integrating affine/convex hull classification with an adapted multi-model approach. We first extract multiple local convex hulls from a query image set via maximum margin clustering to diminish the artificial variations and constrain the noise in the local convex hulls. We then propose adaptive reference clustering (ARC), which constrains the clustering of each gallery image set by forcing its clusters to resemble the clusters in the query image set. By applying ARC, noisy clusters in the query set can be discarded. Experiments on the Honda, MoBo and ETH-80 datasets show that the proposed method outperforms single-model approaches and other recent techniques such as Sparse Approximated Nearest Points, Mutual Subspace Method and Manifold Discriminant Analysis.
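The core geometric quantity behind nearest points methods is the distance between the convex hulls of two sample sets; a simplified sketch (random features standing in for image descriptors, and not the authors' full ARC pipeline) is:

```python
# Illustrative sketch: distance between the convex hulls of two sets, found by
# optimising over convex combinations of the samples in each set.
import numpy as np
from scipy.optimize import minimize

def convex_hull_distance(X, Y):
    """Min ||X a - Y b|| over convex weights a, b (columns of X, Y are samples)."""
    nx_, ny_ = X.shape[1], Y.shape[1]

    def objective(w):
        a, b = w[:nx_], w[nx_:]
        diff = X @ a - Y @ b
        return diff @ diff

    cons = [{"type": "eq", "fun": lambda w: w[:nx_].sum() - 1.0},
            {"type": "eq", "fun": lambda w: w[nx_:].sum() - 1.0}]
    w0 = np.concatenate([np.full(nx_, 1.0 / nx_), np.full(ny_, 1.0 / ny_)])
    res = minimize(objective, w0, bounds=[(0, 1)] * (nx_ + ny_), constraints=cons)
    return np.sqrt(res.fun)

rng = np.random.default_rng(0)
gallery = rng.normal(0.0, 1.0, (64, 10))   # 10 samples, 64-dim features each (as columns)
query = rng.normal(0.5, 1.0, (64, 8))
print(convex_hull_distance(gallery, query))
```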

Relevance: 10.00%

Abstract:

This paper describes a novel system for automatic classification of images obtained from Anti-Nuclear Antibody (ANA) pathology tests on Human Epithelial type 2 (HEp-2) cells using the Indirect Immunofluorescence (IIF) protocol. The IIF protocol on HEp-2 cells has been the hallmark method for identifying the presence of ANAs, due to its high sensitivity and the large range of antigens that can be detected. However, it suffers from numerous shortcomings, such as being subjective as well as time- and labour-intensive. Computer Aided Diagnostic (CAD) systems have been developed to address these problems by automatically classifying a HEp-2 cell image into one of its known patterns (e.g. speckled, homogeneous). Most existing CAD systems use handpicked features to represent a HEp-2 cell image, which may only work in limited scenarios. We propose a novel automatic cell image classification method termed Cell Pyramid Matching (CPM), which comprises regional histograms of visual words coupled with the Multiple Kernel Learning framework. We present a study of several variations of generating histograms and show the efficacy of the system on two publicly available datasets: the ICPR HEp-2 cell classification contest dataset and the SNPHEp-2 dataset.
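A hedged sketch of regional visual-word histograms in the spirit of CPM is given below; the codebook size, grid regions and descriptors are invented placeholders rather than the paper's configuration:

```python
# Illustrative sketch: quantise local descriptors against a codebook, then build one
# bag-of-visual-words histogram per spatial region and concatenate them.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Stand-ins: 500 local descriptors with their (x, y) positions inside a 64x64 cell
# image, plus a small visual codebook learned from separate training patches.
descriptors = rng.normal(size=(500, 32))
positions = rng.integers(0, 64, size=(500, 2))
codebook = KMeans(n_clusters=16, n_init=5, random_state=0).fit(rng.normal(size=(2000, 32)))

def regional_histograms(desc, pos, codebook, grid=2, image_size=64):
    """One visual-word histogram per grid cell, concatenated after a whole-image level."""
    words = codebook.predict(desc)
    k = codebook.n_clusters
    hists = [np.bincount(words, minlength=k)]                 # level 0: whole image
    cell = image_size // grid
    for gx in range(grid):
        for gy in range(grid):
            mask = ((pos[:, 0] // cell) == gx) & ((pos[:, 1] // cell) == gy)
            hists.append(np.bincount(words[mask], minlength=k))
    h = np.concatenate(hists).astype(float)
    return h / max(h.sum(), 1.0)

print(regional_histograms(descriptors, positions, codebook).shape)  # (16 + 4*16,) = (80,)
```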

Relevance: 10.00%

Abstract:

Person re-identification is particularly challenging due to significant appearance changes across separate camera views. In order to re-identify people, a representative human signature should effectively handle differences in illumination, pose and camera parameters. While general appearance-based methods are modelled in Euclidean spaces, it has been argued that some applications in image and video analysis are better modelled via non-Euclidean manifold geometry. To this end, recent approaches represent images as covariance matrices, and interpret such matrices as points on Riemannian manifolds. As direct classification on such manifolds can be difficult, in this paper we propose to represent each manifold point as a vector of similarities to class representers, via a recently introduced form of Bregman matrix divergence known as the Stein divergence. This is followed by using a discriminative mapping of similarity vectors for final classification. The use of similarity vectors is in contrast to the traditional approach of embedding manifolds into tangent spaces, which can suffer from representing the manifold structure inaccurately. Comparative evaluations on benchmark ETHZ and iLIDS datasets for the person re-identification task show that the proposed approach obtains better performance than recent techniques such as Histogram Plus Epitome, Partial Least Squares, and Symmetry-Driven Accumulation of Local Features.
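A minimal sketch of the similarity-vector idea (random SPD matrices standing in for covariance descriptors) is:

```python
# Illustrative sketch: Stein divergence between a query covariance descriptor and a
# set of class representers, with the (negated) divergences used as a similarity
# vector for a downstream discriminative classifier.
import numpy as np

def stein_divergence(X, Y):
    """S(X, Y) = log det((X + Y) / 2) - 0.5 * log det(X Y), for SPD matrices X, Y."""
    _, ld_mid = np.linalg.slogdet((X + Y) / 2.0)
    _, ld_x = np.linalg.slogdet(X)
    _, ld_y = np.linalg.slogdet(Y)
    return ld_mid - 0.5 * (ld_x + ld_y)

def random_spd(d, rng):
    A = rng.normal(size=(d, d))
    return A @ A.T + d * np.eye(d)

rng = np.random.default_rng(0)
d = 8
representers = [random_spd(d, rng) for _ in range(5)]   # one representer per class
query = random_spd(d, rng)                               # covariance descriptor of a query image

# Similarity vector: one entry per class representer (smaller divergence = more similar).
sim_vector = np.array([-stein_divergence(query, R) for R in representers])
print(sim_vector)
```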