97 results for Noisy corpora.


Relevance: 10.00%

Abstract:

The application of Bluetooth (BT) technology to transportation has enabled researchers to make accurate travel time observations on freeway and arterial roads. Bluetooth traffic data are generally incomplete, since they relate only to those vehicles that are equipped with Bluetooth devices and that are detected by the Bluetooth sensors of the road network. The fraction of detected vehicles over the total number of transiting vehicles is often referred to as the Bluetooth Penetration Rate (BTPR). The aim of this study is to precisely define the spatio-temporal relationship between the quantities that become available through the partial, noisy BT observations and the hidden variables that describe the actual dynamics of vehicular traffic. To do so, we propose to incorporate a multi-class traffic model into a sequential Monte Carlo estimation algorithm. Our framework has been applied to empirical travel time investigations in the Brisbane metropolitan region.
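A minimal sketch of the sequential Monte Carlo (particle filter) step that this kind of estimation relies on, assuming a generic scalar hidden state (a link travel time), a hypothetical random-walk transition model and Gaussian observation noise; the study's actual multi-class traffic model and BTPR handling are not reproduced here.

import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, weights, observation,
                         process_std=2.0, obs_std=5.0):
    """One bootstrap-filter update: propagate, reweight, resample.

    particles   : samples of the hidden state (e.g. link travel time, seconds)
    observation : one noisy Bluetooth travel-time measurement (seconds)
    The transition and noise models are illustrative assumptions.
    """
    # Propagate each particle through a simple random-walk transition model.
    particles = particles + rng.normal(0.0, process_std, size=particles.shape)

    # Re-weight by the Gaussian likelihood of the noisy BT observation.
    likelihood = np.exp(-0.5 * ((observation - particles) / obs_std) ** 2)
    weights = weights * likelihood
    weights /= weights.sum()

    # Systematic resampling to avoid weight degeneracy.
    cumulative = np.cumsum(weights)
    cumulative[-1] = 1.0  # guard against floating-point round-off
    positions = (rng.random() + np.arange(len(particles))) / len(particles)
    indices = np.searchsorted(cumulative, positions)
    particles = particles[indices]
    weights = np.full(len(particles), 1.0 / len(particles))
    return particles, weights

# Usage: track a hidden travel time from a stream of sparse BT observations.
particles = rng.normal(120.0, 20.0, size=1000)   # prior: roughly 120 s travel time
weights = np.full(1000, 1.0 / 1000)
for z in [118.0, 131.0, 140.0, 137.0]:           # noisy BT travel times (s)
    particles, weights = particle_filter_step(particles, weights, z)
    print(round(float(np.average(particles, weights=weights)), 1))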

Relevance: 10.00%

Abstract:

Big Data presents many challenges related to volume, whether one is interested in studying past datasets or, even more problematically, attempting to work with live streams of data. The most obvious challenge, in a 'noisy' environment such as contemporary social media, is to collect the pertinent information, whether that is information for a specific study, tweets which can inform emergency services or other responders to an ongoing crisis, or information that gives an advantage to those involved in prediction markets. Such a process is often iterative: keywords and hashtags change with the passage of time, and both collection and analytic methodologies need to be continually adapted to respond to this changing information. While many of the datasets collected and analyzed are preformed, that is, built around a particular keyword, hashtag, or set of authors, they still contain a large volume of information, much of which is unnecessary for the current purpose and/or potentially useful for future projects. Accordingly, this panel considers methods for separating and combining data to optimize big data research and to report findings to stakeholders.

The first paper considers possible coding mechanisms for incoming tweets during a crisis, taking a large stream of incoming tweets and selecting which of those need to be placed immediately in front of responders for manual filtering and possible action. The paper suggests two solutions: content analysis and user profiling. In the former, aspects of each tweet are assigned a score to assess its likely relationship to the topic at hand and the urgency of the information; the latter attempts to identify users who either serve as amplifiers of information or are known as authoritative sources. Through these techniques, the information contained in a large dataset can be filtered down to match the expected capacity of emergency responders, and knowledge of the core keywords or hashtags relating to the current event is constantly refined for future data collection (a simple scoring sketch follows below).

The second paper is also concerned with identifying significant tweets, but in this case tweets relevant to a particular prediction market: tennis betting. As increasing numbers of professional sportsmen and sportswomen create Twitter accounts to communicate with their fans, information is being shared about injuries, form and emotions that has the potential to affect future results. As has already been demonstrated with leading US sports, such information is extremely valuable. Tennis, like American Football (NFL) and Baseball (MLB), has paid subscription services that manually filter incoming news sources, including tweets, for information valuable to gamblers, gambling operators, and fantasy sports players. However, while such services remain niche operations, much of the value of the information is lost by the time it reaches one of these services. The paper therefore considers how information could be filtered from Twitter user lists and hashtag or keyword monitoring, assessing the value of the source, the information, and the prediction markets to which it may relate.

The third paper examines methods for collecting Twitter data and following changes in an ongoing, dynamic social movement, such as the Occupy Wall Street movement. It involves the development of technical infrastructure to collect the tweets and make them available for exploration and analysis. A strategy for responding to changes in the social movement is also required, or the resulting tweets will only reflect the discussions and strategies the movement used at the time the keyword list was created; in a way, keyword creation is part strategy and part art. This paper describes strategies for the creation of a social media archive, specifically of tweets related to the Occupy Wall Street movement, and methods for continuing to adapt data collection strategies as the movement's presence on Twitter changes over time. It also discusses the opportunities and methods for extracting smaller slices of data from an archive of social media data to support a multitude of research projects in multiple fields of study.

The common theme among these papers is that of constructing a dataset, filtering it for a specific purpose, and then using the resulting information to aid future data collection. The intention is that, through the papers presented and the subsequent discussion, the panel will inform the wider research community not only about the objectives and limitations of data collection, live analytics, and filtering, but also about current and in-development methodologies that could be adopted by those working with such datasets, and how such approaches could be customized depending on the project stakeholders.
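A minimal sketch of the kind of content scoring and user profiling described for the first paper: each incoming tweet receives a relevance score (keyword/hashtag match), an urgency score, and an authority bonus, and only the highest-scoring tweets are passed to responders. The keyword lists, account names, weights and capacity are illustrative assumptions, not the panel's actual coding scheme.

# Illustrative tweet triage: scores are hypothetical, not the panel's scheme.
CRISIS_KEYWORDS = {"flood": 2.0, "evacuate": 3.0, "trapped": 4.0, "#floodwatch": 2.5}
URGENT_MARKERS = {"help", "urgent", "now", "emergency"}
AUTHORITATIVE_USERS = {"emergency_agency", "weather_bureau"}   # hypothetical accounts

def score_tweet(text: str, author: str) -> float:
    tokens = text.lower().split()
    relevance = sum(w for kw, w in CRISIS_KEYWORDS.items() if kw in tokens)
    urgency = sum(1.0 for t in tokens if t in URGENT_MARKERS)
    authority = 3.0 if author.lower() in AUTHORITATIVE_USERS else 0.0
    return relevance + urgency + authority

def triage(tweets, capacity=2):
    """Keep only as many tweets as responders can manually review."""
    ranked = sorted(tweets, key=lambda t: score_tweet(*t), reverse=True)
    return ranked[:capacity]

print(triage([("Flood waters rising, need help now", "resident1"),
              ("Nice weather today", "resident2"),
              ("Evacuate low-lying areas immediately #floodwatch", "emergency_agency")]))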

Relevance: 10.00%

Abstract:

A security system based on the recognition of the iris of human eyes using the wavelet transform is presented. The zero-crossings of the wavelet transform are used to extract the unique features obtained from the grey-level profiles of the iris. The recognition process is performed in two stages. The first stage consists of building a one-dimensional representation of the grey-level profiles of the iris, followed by obtaining the wavelet transform zero-crossings of the resulting representation. The second stage is the matching procedure for iris recognition. The proposed approach uses only a few selected intermediate resolution levels for matching, making it computationally efficient as well as less sensitive to noise and quantisation errors. A normalisation process is implemented to compensate for size variations due to possible changes in the camera-to-face distance. The technique has been tested on real images in both noise-free and noisy conditions, and is being investigated for real-time implementation, as a stand-alone system, for access control to high-security areas.
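A minimal sketch of extracting zero-crossing positions from a one-dimensional circular grey-level iris profile at a few dyadic scales. A difference-of-Gaussians response is used here as a simple stand-in for the wavelet transform; the paper's exact wavelet, resolution levels and matching metric are not specified here, so all parameters are assumptions.

import numpy as np

def smooth(signal, sigma):
    """Gaussian smoothing by direct convolution, with wrap-around padding
    because the iris profile is sampled on a closed circle."""
    radius = int(4 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.concatenate([signal[-radius:], signal, signal[:radius]])
    return np.convolve(padded, kernel, mode="valid")

def zero_crossings(profile, sigma):
    """Zero-crossings of a difference-of-Gaussians response, standing in for
    the wavelet transform at one resolution level."""
    response = smooth(profile, sigma) - smooth(profile, 2.0 * sigma)
    signs = np.sign(response)
    return np.where(np.diff(signs) != 0)[0]   # indices where the sign flips

def iris_signature(profile, sigmas=(2.0, 4.0, 8.0)):
    """Zero-crossing positions at a few intermediate resolution levels only,
    which keeps matching cheap and less sensitive to noise."""
    return [zero_crossings(profile, s) for s in sigmas]

# Usage with a synthetic circular grey-level profile.
theta = np.linspace(0.0, 2.0 * np.pi, 256, endpoint=False)
profile = 128 + 40 * np.sin(3 * theta) + 5 * np.random.default_rng(1).normal(size=256)
for level, zc in zip((2.0, 4.0, 8.0), iris_signature(profile)):
    print(f"sigma={level}: {len(zc)} zero-crossings")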

Relevance: 10.00%

Abstract:

Guaranteeing the quality of extracted features that describe relevant knowledge to users or topics is a challenge because of the large number of extracted features. Most popular term-based feature selection methods suffer from noisy feature extraction, producing features that are irrelevant to user needs. One popular alternative is to extract phrases or n-grams to describe the relevant knowledge; however, extracted n-grams and phrases usually contain a great deal of noise. This paper proposes a method for reducing the noise in n-grams. The method first extracts more specific features (terms) to remove noisy features, and then uses an extended random set to accurately weight n-grams based on their distribution in the documents and the distribution of their terms across n-grams. The proposed approach not only reduces the number of extracted n-grams but also improves performance. Experimental results on the Reuters Corpus Volume 1 (RCV1) data collection and TREC topics show that the proposed method significantly outperforms state-of-the-art methods underpinned by Okapi BM25, tf*idf and Rocchio.
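A simplified sketch of the weighting idea: candidate n-grams are scored by combining how many documents support them with the support of the specific terms they contain, so noisy n-grams built from rare, incidental terms are pushed down the ranking. This only illustrates the idea; it is not the extended random set formulation used in the paper.

from collections import Counter, defaultdict

def weight_ngrams(documents, n=2):
    """documents: list of token lists drawn from relevant (feedback) documents.
    Returns n-grams weighted by document support and by the support of the
    terms they contain."""
    term_support = Counter()
    ngram_docs = defaultdict(set)
    for doc_id, tokens in enumerate(documents):
        term_support.update(set(tokens))
        for i in range(len(tokens) - n + 1):
            ngram_docs[tuple(tokens[i:i + n])].add(doc_id)

    num_docs = len(documents)
    weights = {}
    for ngram, docs in ngram_docs.items():
        doc_weight = len(docs) / num_docs                       # n-gram document support
        term_weight = sum(term_support[t] for t in ngram) / (n * num_docs)
        weights[ngram] = doc_weight * term_weight
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)

docs = [["noisy", "feature", "extraction", "hurts", "retrieval"],
        ["feature", "extraction", "for", "relevance", "ranking"],
        ["noisy", "terms", "reduce", "retrieval", "performance"]]
for ngram, w in weight_ngrams(docs)[:3]:
    print(ngram, round(w, 3))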

Relevance: 10.00%

Abstract:

The detection and correction of defects remain among the most time-consuming and expensive aspects of software development. Extensive automated testing and code inspections may mitigate their effect, but some code fragments are inevitably more likely to be faulty than others, and automated identification of fault-prone modules helps to focus testing and inspections, thus limiting wasted effort and potentially improving detection rates. However, software metrics data are often extremely noisy, with enormous imbalances in the sizes of the positive and negative classes. In this work, we present a new approach to predictive modelling of fault proneness in software modules, introducing a new feature representation to overcome some of these issues. This rank sum representation offers improved, or at worst comparable, performance relative to earlier approaches on standard data sets, and readily allows the user to choose an appropriate trade-off between precision and recall to optimise inspection effort to suit different testing environments. The method is evaluated using the NASA Metrics Data Program (MDP) data sets, and performance is compared with existing studies based on the Support Vector Machine (SVM) and Naïve Bayes (NB) classifiers, and with our own comprehensive evaluation of these methods.
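One plausible reading of a rank-based representation for noisy, skewed software-metrics data, shown purely as an illustration: each raw metric is replaced by its rank across all modules, the per-module rank sum becomes a single robust score, and a threshold on that score trades precision against recall. The metric columns and threshold are assumptions and the paper's exact construction may differ.

import numpy as np

def rank_sum_scores(metrics):
    """metrics: (n_modules, n_metrics) array of raw software metrics
    (e.g. LOC, cyclomatic complexity, Halstead volume). Ranking each column
    makes the score insensitive to the skewed, noisy scales of raw metrics."""
    n_modules = metrics.shape[0]
    ranks = np.empty_like(metrics, dtype=float)
    for j in range(metrics.shape[1]):
        # argsort of argsort gives 0-based ranks within each metric column
        ranks[:, j] = np.argsort(np.argsort(metrics[:, j]))
    return ranks.sum(axis=1) / (metrics.shape[1] * (n_modules - 1))  # in [0, 1]

def predict_fault_prone(metrics, threshold=0.6):
    """Lower threshold -> higher recall, more modules flagged for inspection;
    higher threshold -> higher precision, less inspection effort."""
    return rank_sum_scores(metrics) >= threshold

modules = np.array([[120, 6, 300.0],     # LOC, complexity, Halstead volume
                    [2400, 35, 9800.0],
                    [60, 2, 90.0],
                    [900, 18, 4200.0]])
print(predict_fault_prone(modules))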

Relevance: 10.00%

Abstract:

This thesis presents a sequential pattern-based model (PMM) to detect news topics on a popular microblogging platform, Twitter. PMM captures key topics and measures their importance using pattern properties and Twitter characteristics. This study shows that PMM outperforms traditional term-based models and could potentially be implemented as a decision support system. The research contributes to news detection and addresses the challenging issue of extracting information from short and noisy text.

Relevance: 10.00%

Abstract:

Term-based approaches can extract many features from text documents, but most of them are noisy. Many popular text-mining strategies have been adapted to reduce noisy information in the extracted features; however, these techniques still suffer from the low-frequency problem. The key issue is how to discover relevance features in text documents to fulfil user information needs. To address this issue, we propose a new method that extracts specific features from user relevance feedback. The proposed approach includes two stages. The first stage extracts topics (or patterns) from text documents to focus on interesting topics. In the second stage, topics are deployed to lower-level terms to address the low-frequency problem and to find specific terms. The specific terms are determined based on their appearances in relevance feedback and their distribution in topics or high-level patterns. We test the proposed method with extensive experiments on the Reuters Corpus Volume 1 dataset and TREC topics. The results show that our approach significantly outperforms state-of-the-art models.
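A small sketch of the second stage described above: higher-level topics/patterns are deployed down to their terms, so a term's weight reflects how strongly the discovered patterns support it rather than its raw document frequency. The pattern supports and data below are hypothetical; the paper's exact deploying formula may differ.

from collections import defaultdict

def deploy_patterns_to_terms(patterns):
    """patterns: list of (set_of_terms, support) pairs discovered from the
    positive (relevance-feedback) documents. Each pattern's support is shared
    evenly among its terms and accumulated per term, so specific terms that
    occur in many strong patterns receive high weights even when their raw
    document frequency is low."""
    term_weights = defaultdict(float)
    for terms, support in patterns:
        for term in terms:
            term_weights[term] += support / len(terms)
    return dict(sorted(term_weights.items(), key=lambda kv: kv[1], reverse=True))

patterns = [({"traffic", "congestion"}, 0.6),
            ({"traffic", "signal", "timing"}, 0.4),
            ({"weather"}, 0.2)]
print(deploy_patterns_to_terms(patterns))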

Relevance: 10.00%

Abstract:

A long query provides more useful hints for retrieving relevant documents, but it is also likely to introduce noise that hurts retrieval performance. To mitigate this adverse effect, it is important to remove noisy terms and to introduce and boost additional relevant terms. This paper presents a comprehensive framework, called the Aspect Hidden Markov Model (AHMM), which integrates query reduction and expansion for retrieval with long queries. It optimizes the probability distribution of query terms by exploiting intra-query term dependencies as well as the relationships between query terms and words observed in relevance-feedback documents. Empirical evaluation on three large-scale TREC collections demonstrates that our approach, which is fully automatic, achieves salient improvements over various strong baselines and reaches performance comparable to a state-of-the-art method based on users' interactive query term reduction and expansion.
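The AHMM itself is not reproduced here; the sketch below only illustrates the underlying idea of redistributing probability mass over long-query terms using relevance-feedback statistics, so that noisy terms are demoted and terms well supported by the feedback documents are boosted. The interpolation weight and data are assumptions.

from collections import Counter

def reweight_query_terms(query_terms, feedback_docs, mu=0.5):
    """Interpolate the original uniform query distribution with the term
    distribution observed in relevance-feedback documents. Terms absent from
    the feedback are demoted (reduction); feedback-only mass could likewise
    be used to select expansion terms."""
    fb_counts = Counter(t for doc in feedback_docs for t in doc)
    fb_total = sum(fb_counts.values())
    uniform = 1.0 / len(query_terms)
    weights = {t: (1 - mu) * uniform + mu * fb_counts[t] / fb_total
               for t in query_terms}
    norm = sum(weights.values())
    return {t: w / norm for t, w in weights.items()}

query = ["cheap", "flights", "to", "brisbane", "airport", "please"]
feedback = [["flights", "brisbane", "airport", "fares"],
            ["brisbane", "airport", "domestic", "flights"]]
for term, w in sorted(reweight_query_terms(query, feedback).items(),
                      key=lambda kv: kv[1], reverse=True):
    print(f"{term:10s} {w:.3f}")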

Relevance: 10.00%

Abstract:

This paper describes the design and implementation of a wireless neural telemetry system that enables new experimental paradigms, such as neural recordings during rodent navigation in large outdoor environments. RoSco, short for Rodent Scope, is a small, lightweight, user-configurable module suitable for digital wireless recording from freely behaving small animals. Owing to its digital transmission technology, RoSco has advantages over most other wireless modules in noise immunity and online user-configurable settings. RoSco digitally transmits entire neural waveforms for 14 of 16 channels at 20 kHz with 8-bit encoding, streamed to the PC as standard USB audio packets. Up to 31 RoSco wireless modules can coexist in the same environment on non-overlapping independent channels. The design has spatial diversity reception via two antennas, which makes wireless communication resilient to fading and obstacles. In comparison with most existing wireless systems, this system offers online user-selectable independent gain control of each channel in eight steps from 500 to 32,000 times, two selectable ground references from a subset of channels, selectable channel grounding to disable noisy electrodes, and selectable bandwidth suitable for action potentials (300 Hz–3 kHz) and low-frequency field potentials (4 Hz–3 kHz). Indoor and outdoor recordings taken from freely behaving rodents are shown to be comparable to those of a commercial wired system for sorting neural populations. The module has low input-referred noise, a battery life of 1.5 hours, and transmission losses of 0.1% up to a range of 10 m.
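The raw data rate implied by the figures above (14 of 16 channels streamed at 20 kHz with 8-bit samples) can be checked with a few lines; USB-audio packetisation overhead is not included, so this is only the payload rate.

channels = 14          # channels actually transmitted (of 16)
sample_rate = 20_000   # samples per second per channel
bits_per_sample = 8    # 8-bit encoding

bits_per_second = channels * sample_rate * bits_per_sample
print(f"{bits_per_second / 1e6:.2f} Mbit/s")         # 2.24 Mbit/s raw payload
print(f"{bits_per_second / 8 / 1e3:.0f} kB/s")       # 280 kB/s
print(f"{bits_per_second / 8 * 5400 / 1e6:.0f} MB per 1.5 h battery life")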

Relevance: 10.00%

Abstract:

Time- and position-resolved synchrotron small-angle X-ray scattering data were acquired from samples of two Australian coal seams: the Bulli seam (Bulli 4, Ro = 1.42%, Sydney Basin), which naturally contains CO2, and the Baralaba seam (Ro = 0.67%, Bowen Basin), a potential candidate for sequestering CO2. This experimental approach has provided unique, pore-size-specific insights into the kinetics of CO2 sorption in the micro- and small mesopores (diameter 5 to 175 Å) and the density of the sorbed CO2 at reservoir-like conditions of temperature and hydrostatic pressure. For both samples, at pressures above 5 bar, the density of CO2 confined in pores was found to be uniform, with no densification in near-wall regions. In the Bulli 4 sample, CO2 first flooded the slit pores between polyaromatic sheets; in the pore-size range analysed, the confined CO2 density was close to that of the free CO2. The kinetics data are too noisy for reliable quantitative analysis, but qualitatively indicate faster kinetics in mineral-matter-rich regions. In the Baralaba sample, CO2 preferentially invaded the smallest micropores, and the confined CO2 density was up to five times that of the free CO2. Faster CO2 sorption kinetics was found to be correlated with higher mineral matter content, but the mineral-matter-rich regions had lower-density CO2 confined in their pores. Remarkably, the kinetics was pore-size dependent, being faster for smaller pores. These results suggest that injection into the permeable section of an interbedded coal-clastic sequence could provide a viable combination of reasonable injectivity and high sorption capacity.

Relevance: 10.00%

Abstract:

Traditional nearest-points methods use all the samples in an image set to construct a single convex or affine hull model for classification. However, strong artificial features and noisy data may be generated from combinations of training samples when significant intra-class variations and/or noise occur in the image set. Existing multi-model approaches extract local models by clustering each image set individually only once, with fixed clusters used for matching against various image sets. This may not be optimal for discrimination, as undesirable environmental conditions (e.g. illumination and pose variations) may result in the two closest clusters representing different characteristics of an object (e.g. a frontal face being compared to a non-frontal face). To address this problem, we propose a novel approach that enhances nearest-points-based methods by integrating affine/convex hull classification with an adapted multi-model approach. We first extract multiple local convex hulls from a query image set via maximum margin clustering to diminish the artificial variations and constrain the noise in local convex hulls. We then propose adaptive reference clustering (ARC) to constrain the clustering of each gallery image set by forcing its clusters to resemble the clusters in the query image set. By applying ARC, noisy clusters in the query set can be discarded. Experiments on the Honda, MoBo and ETH-80 datasets show that the proposed method outperforms single-model approaches and other recent techniques, such as Sparse Approximated Nearest Points, the Mutual Subspace Method and Manifold Discriminant Analysis.
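A minimal sketch of the nearest-points distance between the affine hulls of two image sets, the basic building block that hull-based methods of this kind apply (here to whole sets rather than to the paper's local convex hulls); the maximum margin clustering and adaptive reference clustering steps are not shown, and the energy threshold is an assumption.

import numpy as np

def affine_hull(X, energy=0.95):
    """X: (n_features, n_samples) image set, columns are vectorised frames.
    Returns the hull mean and an orthonormal basis of its principal
    directions, kept up to the given fraction of variance."""
    mu = X.mean(axis=1, keepdims=True)
    U, s, _ = np.linalg.svd(X - mu, full_matrices=False)
    k = int(np.searchsorted(np.cumsum(s**2) / np.sum(s**2), energy)) + 1
    return mu, U[:, :k]

def affine_hull_distance(X, Y):
    """Smallest distance between points of the two affine hulls, found by
    solving a linear least-squares problem for the hull coefficients."""
    mu_x, Ux = affine_hull(X)
    mu_y, Uy = affine_hull(Y)
    A = np.hstack([Ux, -Uy])
    b = (mu_y - mu_x).ravel()
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    px = mu_x.ravel() + Ux @ coeffs[:Ux.shape[1]]   # nearest point on hull of X
    py = mu_y.ravel() + Uy @ coeffs[Ux.shape[1]:]   # nearest point on hull of Y
    return np.linalg.norm(px - py)

# Usage with two small synthetic "image sets".
rng = np.random.default_rng(0)
gallery = rng.normal(size=(50, 20))
query = rng.normal(loc=0.5, size=(50, 15))
print(round(float(affine_hull_distance(gallery, query)), 3))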

Relevance: 10.00%

Abstract:

This paper describes and analyzes research on the dynamics of long-term care and the policy relevance of identifying the sources of persistence in caregiving arrangements (including the effect of dynamics on parameter estimates and the implications for family welfare, parent welfare, child welfare, and the cost of government programs). We discuss sources and causes of observed persistence in caregiving arrangements, including inertia/state dependence (confounded by unobserved heterogeneity) and the costs of changing caregivers. We comment on causes of dynamics, including learning/human capital accumulation, burnout, and game-playing. We suggest how to deal with endogenous geography, dynamics in discrete and continuous choices, and equilibrium issues (multiple equilibria, dynamic equilibria). We also present an overview of commonly used longitudinal data sets and evaluate their relative advantages and disadvantages, and discuss further data issues related to noisy measures of wealth and family structure. Finally, we suggest some methods to handle econometric problems such as endogenous geography.

Relevance: 10.00%

Abstract:

Monitoring the integrity of rolling element bearings in the traction system of high-speed trains is fundamental in order to avoid catastrophic failures and to implement effective condition-based maintenance strategies. Diagnostics of rolling element bearings is usually based on vibration signal analysis by means of suitable signal processing techniques. The experimental validation of such techniques has traditionally been performed by means of laboratory tests on artificially damaged bearings, while their actual effectiveness in industrial applications, particularly in the field of rail transport, remains scarcely investigated. This paper addresses the diagnostics of bearings taken out of service after long-term operation on a high-speed train. These worn bearings have been installed on a test-rig consisting of a complete full-scale traction system of a high-speed train, able to reproduce the effects of wheel-track interaction and bogie-wheelset dynamics. The results of the experimental campaign show that suitable signal processing techniques are able to diagnose bearing failures even in this harsh and noisy application. Moreover, the most suitable location for the sensors on the traction system is also proposed.

Relevance: 10.00%

Abstract:

Rolling element bearings are the most critical components in the traction system of high-speed trains. Monitoring their integrity is fundamental in order to avoid catastrophic failures and to implement effective condition-based maintenance strategies. Diagnostics of rolling element bearings is usually performed by analyzing vibration signals measured by accelerometers placed in the proximity of the bearing under investigation. Several papers have been published on this subject in the last two decades, mainly devoted to the development and assessment of signal processing techniques for diagnostics. The experimental validation of such techniques has traditionally been performed by means of laboratory tests on artificially damaged bearings, while their actual effectiveness in specific industrial applications, particularly in the rail industry, remains scarcely investigated. This paper aims to fill this knowledge gap by addressing the diagnostics of bearings taken out of service after long-term operation in the traction system of a high-speed train. Moreover, in order to test the effectiveness of the diagnostic procedures under the environmental conditions peculiar to the rail application, a specific test-rig has been built, consisting of a complete full-scale train traction system able to reproduce the effects of wheel-track interaction and bogie-wheelset dynamics. The results of the experimental campaign show that suitable signal processing techniques are able to diagnose bearing failures even in this harsh and noisy application. Moreover, the most suitable location for the sensors on the traction system is proposed, in order to limit their number.
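The two abstracts above do not name the specific signal processing techniques used. As an illustration only, the sketch below shows envelope analysis, a standard approach for rolling element bearing diagnostics (not necessarily the one adopted in these papers): the vibration signal is band-pass filtered around an assumed resonance band, its envelope is extracted with the Hilbert transform, and the envelope spectrum is inspected for peaks at the bearing fault frequencies. The band edges, sampling rate and synthetic defect frequency are assumptions.

import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def envelope_spectrum(vibration, fs, band=(2000.0, 6000.0)):
    """Band-pass around an assumed resonance band, take the Hilbert envelope,
    and return the envelope spectrum, whose peaks (e.g. at ball pass
    frequencies) indicate localized bearing defects."""
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="bandpass")
    filtered = filtfilt(b, a, vibration)
    envelope = np.abs(hilbert(filtered))
    envelope -= envelope.mean()
    spectrum = np.abs(np.fft.rfft(envelope)) / len(envelope)
    freqs = np.fft.rfftfreq(len(envelope), d=1.0 / fs)
    return freqs, spectrum

# Usage with a synthetic signal: periodic impacts at an assumed 87 Hz defect
# frequency exciting a 4 kHz resonance, buried in broadband noise.
fs = 25_000
t = np.arange(0, 2.0, 1.0 / fs)
impacts = (np.sin(2 * np.pi * 87 * t) > 0.999).astype(float)
resonance = np.exp(-t[:500] * 400) * np.sin(2 * np.pi * 4000 * t[:500])
signal = np.convolve(impacts, resonance, mode="same") \
         + 0.2 * np.random.default_rng(0).normal(size=t.size)
freqs, spec = envelope_spectrum(signal, fs)
print("dominant envelope frequency: %.1f Hz" % freqs[1 + np.argmax(spec[1:])])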

Relevance: 10.00%

Abstract:

This paper presents a novel method to rank map hypotheses by the quality of localization they afford. The highest-ranked hypothesis at any moment becomes the active representation that is used to guide the robot to its goal location. A single static representation is insufficient for navigation in dynamic environments, where paths can be blocked periodically, a common scenario that poses significant challenges for typical planners. In our approach, we simultaneously rank multiple map hypotheses by the influence that localization in each of them has on locally accurate odometry. This is done online for the current locally accurate window by formulating a factor graph of odometry relaxed by localization constraints. Comparing the resulting perturbed odometry of each hypothesis with the original odometry yields a score that can be used to rank map hypotheses by their utility. We deploy the proposed approach on a real robot navigating a structurally noisy office environment. The configuration of the environment is physically altered outside the robot's sensory horizon during navigation tasks to demonstrate the proposed approach to hypothesis selection.
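A minimal sketch of the ranking criterion described above: for each map hypothesis, the odometry over the current window is perturbed by that hypothesis's localization constraints, and the hypothesis whose perturbed trajectory stays closest to the original, locally accurate odometry ranks highest. The factor-graph optimisation itself is replaced here by pre-computed perturbed trajectories, and the hypothesis names and poses are assumptions.

import numpy as np

def hypothesis_score(original_odometry, perturbed_odometry):
    """Both arguments: (N, 3) arrays of poses (x, y, heading) over the current
    locally accurate window. A small average pose perturbation means the
    hypothesis's localization constraints agree with the odometry."""
    diff = perturbed_odometry - original_odometry
    # wrap heading differences into [-pi, pi) before measuring them
    diff[:, 2] = (diff[:, 2] + np.pi) % (2 * np.pi) - np.pi
    return float(np.mean(np.linalg.norm(diff, axis=1)))

def rank_hypotheses(original_odometry, perturbed_by_hypothesis):
    """perturbed_by_hypothesis: dict mapping hypothesis name -> (N, 3) poses
    obtained by relaxing the odometry with that hypothesis's localization
    constraints (e.g. from a factor-graph solve). Lower score ranks higher."""
    scores = {name: hypothesis_score(original_odometry, poses)
              for name, poses in perturbed_by_hypothesis.items()}
    return sorted(scores.items(), key=lambda kv: kv[1])

odom = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.1, 0.05]])
hypotheses = {"door_open_map": odom + 0.02,                          # small perturbation
              "door_closed_map": odom + np.array([0.5, 0.4, 0.3])}   # large perturbation
print(rank_hypotheses(odom, hypotheses))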