844 resultados para Bag-of-Features
Resumo:
Capacity limits in visual attention have traditionally been studied using static arrays of elements from which an observer must detect a target defined by a certain visual feature or combination of features. In the current study we use this visual search paradigm, with accuracy as the dependent variable, to examine attentional capacity limits for different visual features undergoing change over time. In Experiment 1, detectability of a single changing target was measured under conditions where the type of change (size, speed, colour), the magnitude of change, the set size and homogeneity of the unchanging distractors were all systematically varied. Psychometric function slopes were calculated for different experimental conditions and ‘change thresholds’extracted from these slopes were used in Experiment 2, in which multiple supra-threshold changes were made, simultaneously, either to a single or to two or three different stimulus elements. These experiments give an objective psychometric paradigm for measuring changes in visual features over time. Results favour object-based accounts of visual attention, and show consistent differences in the allocation of attentional capacity to different perceptual dimensions.
Resumo:
In recent years, learning word vector representations has attracted much interest in Natural Language Processing. Word representations or embeddings learned using unsupervised methods help addressing the problem of traditional bag-of-word approaches which fail to capture contextual semantics. In this paper we go beyond the vector representations at the word level and propose a novel framework that learns higher-level feature representations of n-grams, phrases and sentences using a deep neural network built from stacked Convolutional Restricted Boltzmann Machines (CRBMs). These representations have been shown to map syntactically and semantically related n-grams to closeby locations in the hidden feature space. We have experimented to additionally incorporate these higher-level features into supervised classifier training for two sentiment analysis tasks: subjectivity classification and sentiment classification. Our results have demonstrated the success of our proposed framework with 4% improvement in accuracy observed for subjectivity classification and improved the results achieved for sentiment classification over models trained without our higher level features.
Resumo:
SQL Injection Attack (SQLIA) remains a technique used by a computer network intruder to pilfer an organisation’s confidential data. This is done by an intruder re-crafting web form’s input and query strings used in web requests with malicious intent to compromise the security of an organisation’s confidential data stored at the back-end database. The database is the most valuable data source, and thus, intruders are unrelenting in constantly evolving new techniques to bypass the signature’s solutions currently provided in Web Application Firewalls (WAF) to mitigate SQLIA. There is therefore a need for an automated scalable methodology in the pre-processing of SQLIA features fit for a supervised learning model. However, obtaining a ready-made scalable dataset that is feature engineered with numerical attributes dataset items to train Artificial Neural Network (ANN) and Machine Leaning (ML) models is a known issue in applying artificial intelligence to effectively address ever evolving novel SQLIA signatures. This proposed approach applies numerical attributes encoding ontology to encode features (both legitimate web requests and SQLIA) to numerical data items as to extract scalable dataset for input to a supervised learning model in moving towards a ML SQLIA detection and prevention model. In numerical attributes encoding of features, the proposed model explores a hybrid of static and dynamic pattern matching by implementing a Non-Deterministic Finite Automaton (NFA). This combined with proxy and SQL parser Application Programming Interface (API) to intercept and parse web requests in transition to the back-end database. In developing a solution to address SQLIA, this model allows processed web requests at the proxy deemed to contain injected query string to be excluded from reaching the target back-end database. This paper is intended for evaluating the performance metrics of a dataset obtained by numerical encoding of features ontology in Microsoft Azure Machine Learning (MAML) studio using Two-Class Support Vector Machines (TCSVM) binary classifier. This methodology then forms the subject of the empirical evaluation.
Resumo:
This paper presents a study made in a field poorly explored in the Portuguese language – modality and its automatic tagging. Our main goal was to find a set of attributes for the creation of automatic tag- gers with improved performance over the bag-of-words (bow) approach. The performance was measured using precision, recall and F1. Because it is a relatively unexplored field, the study covers the creation of the corpus (composed by eleven verbs), the use of a parser to extract syntac- tic and semantic information from the sentences and a machine learning approach to identify modality values. Based on three different sets of attributes – from trigger itself and the trigger’s path (from the parse tree) and context – the system creates a tagger for each verb achiev- ing (in almost every verb) an improvement in F1 when compared to the traditional bow approach.
Resumo:
This paper describes various experiments done to investigate author profiling of tweets in 4 different languages – English, Dutch, Italian, and Spanish. Profiling consists of age and gender classification, as well as regression on 5 different person- ality dimensions – extroversion, stability, agreeableness, open- ness, and conscientiousness. Different sets of features were tested – bag-of-words, word ngrams, POS ngrams, and average of word embeddings. SVM was used as the classifier. Tfidf worked best for most English tasks while for most of the tasks from the other languages, the combination of the best features worked better.
Resumo:
Heart rate variability (HRV) refers to the regulation of the sinoatrial node, the natural pacemaker of the heart, by the sympathetic and parasympathetic branches of the autonomic nervous system. Heart rate variability analysis is an important tool to observe the heart's ability to respond to normal regulatory impulses that affect its rhythm. A computer-based intelligent system for analysis of cardiac states is very useful in diagnostics and disease management. Like many bio-signals, HRV signals are nonlinear in nature. Higher order spectral analysis (HOS) is known to be a good tool for the analysis of nonlinear systems and provides good noise immunity. In this work, we studied the HOS of the HRV signals of normal heartbeat and seven classes of arrhythmia. We present some general characteristics for each of these classes of HRV signals in the bispectrum and bicoherence plots. We also extracted features from the HOS and performed an analysis of variance (ANOVA) test. The results are very promising for cardiac arrhythmia classification with a number of features yielding a p-value < 0.02 in the ANOVA test.
Resumo:
High-rate flooding attacks (aka Distributed Denial of Service or DDoS attacks) continue to constitute a pernicious threat within the Internet domain. In this work we demonstrate how using packet source IP addresses coupled with a change-point analysis of the rate of arrival of new IP addresses may be sufficient to detect the onset of a high-rate flooding attack. Importantly, minimizing the number of features to be examined, directly addresses the issue of scalability of the detection process to higher network speeds. Using a proof of concept implementation we have shown how pre-onset IP addresses can be efficiently represented using a bit vector and used to modify a “white list” filter in a firewall as part of the mitigation strategy.
Resumo:
The theory of nonlinear dyamic systems provides some new methods to handle complex systems. Chaos theory offers new concepts, algorithms and methods for processing, enhancing and analyzing the measured signals. In recent years, researchers are applying the concepts from this theory to bio-signal analysis. In this work, the complex dynamics of the bio-signals such as electrocardiogram (ECG) and electroencephalogram (EEG) are analyzed using the tools of nonlinear systems theory. In the modern industrialized countries every year several hundred thousands of people die due to sudden cardiac death. The Electrocardiogram (ECG) is an important biosignal representing the sum total of millions of cardiac cell depolarization potentials. It contains important insight into the state of health and nature of the disease afflicting the heart. Heart rate variability (HRV) refers to the regulation of the sinoatrial node, the natural pacemaker of the heart by the sympathetic and parasympathetic branches of the autonomic nervous system. Heart rate variability analysis is an important tool to observe the heart's ability to respond to normal regulatory impulses that affect its rhythm. A computerbased intelligent system for analysis of cardiac states is very useful in diagnostics and disease management. Like many bio-signals, HRV signals are non-linear in nature. Higher order spectral analysis (HOS) is known to be a good tool for the analysis of non-linear systems and provides good noise immunity. In this work, we studied the HOS of the HRV signals of normal heartbeat and four classes of arrhythmia. This thesis presents some general characteristics for each of these classes of HRV signals in the bispectrum and bicoherence plots. Several features were extracted from the HOS and subjected an Analysis of Variance (ANOVA) test. The results are very promising for cardiac arrhythmia classification with a number of features yielding a p-value < 0.02 in the ANOVA test. An automated intelligent system for the identification of cardiac health is very useful in healthcare technology. In this work, seven features were extracted from the heart rate signals using HOS and fed to a support vector machine (SVM) for classification. The performance evaluation protocol in this thesis uses 330 subjects consisting of five different kinds of cardiac disease conditions. The classifier achieved a sensitivity of 90% and a specificity of 89%. This system is ready to run on larger data sets. In EEG analysis, the search for hidden information for identification of seizures has a long history. Epilepsy is a pathological condition characterized by spontaneous and unforeseeable occurrence of seizures, during which the perception or behavior of patients is disturbed. An automatic early detection of the seizure onsets would help the patients and observers to take appropriate precautions. Various methods have been proposed to predict the onset of seizures based on EEG recordings. The use of nonlinear features motivated by the higher order spectra (HOS) has been reported to be a promising approach to differentiate between normal, background (pre-ictal) and epileptic EEG signals. In this work, these features are used to train both a Gaussian mixture model (GMM) classifier and a Support Vector Machine (SVM) classifier. Results show that the classifiers were able to achieve 93.11% and 92.67% classification accuracy, respectively, with selected HOS based features. About 2 hours of EEG recordings from 10 patients were used in this study. This thesis introduces unique bispectrum and bicoherence plots for various cardiac conditions and for normal, background and epileptic EEG signals. These plots reveal distinct patterns. The patterns are useful for visual interpretation by those without a deep understanding of spectral analysis such as medical practitioners. It includes original contributions in extracting features from HRV and EEG signals using HOS and entropy, in analyzing the statistical properties of such features on real data and in automated classification using these features with GMM and SVM classifiers.
Resumo:
Paired speaking tests are now commonly used in both high-stakes testing and classroom assessment contexts. The co-construction of discourse by candidates is regarded as a strength of paired speaking tests, as candidates have the opportunity to display a wider range of interactional competencies, including turn taking, initiating topics and engaging in extended discourse with a partner, rather than an examiner. However, the impact of the interlocutor in such jointly negotiated discourse and the implications for assessing interactional competence are areas of concern. This article reports on the features of interactional competence that were salient to four trained raters of 12 paired speaking tests through the analysis of rater notes, stimulated verbal recalls and rater discussions. Findings enabled the identification of features of the performance noted by raters when awarding scores for interactional competence, and the particular features associated with higher and lower scores. A number of these features were seen by the raters as mutual achievements, which raises the issue of the extent to which it is possible to assess individual contributions to the co-constructed performance. The findings have implications for defining the construct of interactional competence in paired speaking tests and operationalising this in rating scales.
Resumo:
In this paper, we seek to expand the use of direct methods in real-time applications by proposing a vision-based strategy for pose estimation of aerial vehicles. The vast majority of approaches make use of features to estimate motion. Conversely, the strategy we propose is based on a MR (Multi- Resolution) implementation of an image registration technique (Inverse Compositional Image Alignment ICIA) using direct methods. An on-board camera in a downwards-looking configuration, and the assumption of planar scenes, are the bases of the algorithm. The motion between frames (rotation and translation) is recovered by decomposing the frame-to-frame homography obtained by the ICIA algorithm applied to a patch that covers around the 80% of the image. When the visual estimation is required (e.g. GPS drop-out), this motion is integrated with the previous known estimation of the vehicles’ state, obtained from the on-board sensors (GPS/IMU), and the subsequent estimations are based only on the vision-based motion estimations. The proposed strategy is tested with real flight data in representative stages of a flight: cruise, landing, and take-off, being two of those stages considered critical: take-off and landing. The performance of the pose estimation strategy is analyzed by comparing it with the GPS/IMU estimations. Results show correlation between the visual estimation obtained with the MR-ICIA and the GPS/IMU data, that demonstrate that the visual estimation can be used to provide a good approximation of the vehicle’s state when it is required (e.g. GPS drop-outs). In terms of performance, the proposed strategy is able to maintain an estimation of the vehicle’s state for more than one minute, at real-time frame rates based, only on visual information.
Resumo:
This paper reports results from a study exploring the multimedia search functionality of Chinese language search engines. Web searching in Chinese (Mandarin) is a growing research area and a technical challenge for popular commercial Web search engines. Few studies have been conducted on Chinese language search engines. We investigate two research questions: which Chinese language search engines provide multimedia searching, and what multimedia search functionalities are available in Chinese language Web search engines. Specifically, we examine each Web search engine's (1) features permitting Chinese language multimedia searches, (2) extent of search personalization and user control of multimedia search variables, and (3) the relationships between Web search engines and their features in the Chinese context. Key findings show that Chinese language Web search engines offer limited multimedia search functionality, and general search engines provide a wider range of features than specialized multimedia search engines. Study results have implications for Chinese Web users, Website designers and Web search engine developers. © 2009 Elsevier Ltd. All rights reserved.
Resumo:
Modelling activities in crowded scenes is very challenging as object tracking is not robust in complicated scenes and optical flow does not capture long range motion. We propose a novel approach to analyse activities in crowded scenes using a “bag of particle trajectories”. Particle trajectories are extracted from foreground regions within short video clips using particle video, which estimates long range motion in contrast to optical flow which is only concerned with inter-frame motion. Our applications include temporal video segmentation and anomaly detection, and we perform our evaluation on several real-world datasets containing complicated scenes. We show that our approaches achieve state-of-the-art performance for both tasks.
Resumo:
An historical analysis of the management of the arts in Australia in the last fifty years demonstrates clearly the problems faced by arts organisations which have poorly selected and trained Boards of Directors. Traditionally Board members were selected because they represented the various facets and skills involved in business (marketing, law, accountancy, management, entrepreneurship) or they were arts practitioners or patrons, or they had some particular social standing. Arts organisations recruited Board members like a "mixed bag of lollies - one of these and one of those". No consideration was given to the vital qualities of enthusiasm, reliability, empathy, capacity for hard work, strong arts interest, effective communication skills and respect for organisational processes.
Resumo:
Background. We have characterised a new highly divergent geminivirus species, Eragrostis curvula streak virus (ECSV), found infecting a hardy perennial South African wild grass. ECSV represents a new genus-level geminivirus lineage, and has a mixture of features normally associated with other specific geminivirus genera. Results. Whereas the ECSV genome is predicted to express a replication associated protein (Rep) from an unspliced complementary strand transcript that is most similar to those of begomoviruses, curtoviruses and topocuviruses, its Rep also contains what is apparently a canonical retinoblastoma related protein interaction motif such as that found in mastreviruses. Similarly, while ECSV has the same unusual TAAGATTCC virion strand replication origin nonanucleotide found in another recently described divergent geminivirus, Beet curly top Iran virus (BCTIV), the rest of the transcription and replication origin is structurally more similar to those found in begomoviruses and curtoviruses than it is to those found in BCTIV and mastreviruses. ECSV also has what might be a homologue of the begomovirus transcription activator protein gene found in begomoviruses, a mastrevirus-like coat protein gene and two intergenic regions. Conclusion. Although it superficially resembles a chimaera of geminiviruses from different genera, the ECSV genome is not obviously recombinant, implying that the features it shares with other geminiviruses are those that were probably present within the last common ancestor of these viruses. In addition to inferring how the ancestral geminivirus genome may have looked, we use the discovery of ECSV to refine various hypotheses regarding the recombinant origins of the major geminivirus lineages. © 2009 Varsani et al; licensee BioMed Central Ltd.