329 results for Pattern recognition systems.
Abstract:
Frogs have received increasing attention because they are effective indicators of environmental change. It is therefore important to monitor and assess frogs. With the development of sensor techniques, large volumes of audio data (including frog calls) have been collected and need to be analysed. After the audio data are transformed into a spectrogram representation using the short-time Fourier transform, visual inspection of this representation motivates the use of image processing techniques for analysing audio data. An acoustic event detection (AED) method is first applied to the spectrograms to detect acoustic events, from which ridges are then extracted. Three feature sets, Mel-frequency cepstral coefficients (MFCCs), the AED feature set and the ridge feature set, are then used for frog call classification with a support vector machine classifier. Fifteen frog species widely distributed in Queensland, Australia, are selected to evaluate the proposed method. The experimental results show that the ridge feature set achieves an average classification accuracy of 74.73%, outperforming the MFCCs (38.99%) and the AED feature set (67.78%).
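The spectrogram step mentioned in this abstract can be illustrated with a minimal sketch. It assumes NumPy, a Hann window, and a synthetic 1 kHz tone in place of a real frog recording; the function name `magnitude_spectrogram`, the frame length, the hop size and the sample rate are illustrative choices, not values taken from the paper.

```python
import numpy as np

def magnitude_spectrogram(signal, frame_len=256, hop=128):
    """Short-time Fourier transform magnitudes, shape (n_bins, n_frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and taper each one.
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft keeps the non-negative frequencies: frame_len // 2 + 1 bins.
    return np.abs(np.fft.rfft(frames, axis=1)).T

sample_rate = 8000                      # Hz (assumed for this example)
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)     # synthetic stand-in for a call
spec = magnitude_spectrogram(tone)

# The strongest frequency bin, converted back to Hz.
peak_bin = int(spec.mean(axis=1).argmax())
peak_hz = peak_bin * sample_rate / 256
```

The resulting 2-D array can be treated as an image, which is what makes image processing techniques such as ridge extraction applicable.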
Abstract:
Over the past few decades, frog species have been experiencing a dramatic decline around the world. The reasons for this decline include habitat loss, invasive species and climate change. To better understand the status of frog species, classifying frogs has become increasingly important. In this study, acoustic features are investigated for multi-level classification of Australian frogs by family, genus and species, covering three families, eleven genera and eighty-five species collected from Queensland, Australia. For each frog species, six instances are selected, from which ten acoustic features are calculated. The multicollinearity among the ten features is then studied in order to select non-correlated features for subsequent analysis. A decision tree (DT) classifier is used to determine visually and explicitly which acoustic features are relatively important for classifying family, which for genus, and which for species. Finally, a weighted support vector machine (SVM) classifier is used for the multi-level classification with the three most important acoustic features at each level. Our experimental results indicate that using different acoustic feature sets can successfully classify frogs at different levels, and that the average classification accuracy can reach 85.6%, 86.1% and 56.2% for family, genus and species respectively.
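One common way to remove correlated features before classification is a greedy filter on the pairwise correlation matrix. The sketch below is an assumption about how such a step might look, not the procedure used in the study; the function name `drop_correlated`, the 0.9 threshold and the synthetic data are all hypothetical.

```python
import numpy as np

def drop_correlated(features, threshold=0.9):
    """Return indices of columns kept after greedy correlation filtering."""
    # Absolute Pearson correlation between every pair of feature columns.
    corr = np.abs(np.corrcoef(features, rowvar=False))
    kept = []
    for j in range(corr.shape[0]):
        # Keep a feature only if it is not highly correlated
        # with any feature that has already been kept.
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return kept

rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
# Column 1 is almost a rescaled copy of column 0; column 2 is independent.
X = np.column_stack([a, 2 * a + 0.01 * rng.normal(size=200), b])
kept = drop_correlated(X)
```

Here the near-duplicate column is discarded while the independent one survives. The order in which features are visited affects which member of a correlated pair is kept.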
Abstract:
Document clustering is one of the prominent methods for mining important information from the vast amount of data available on the web. However, document clustering generally suffers from the curse of dimensionality. Fortunately, in high-dimensional space, data points tend to be more concentrated in some areas of clusters. We take advantage of this phenomenon by introducing a novel concept of dynamic cluster representation called loci. Clusters’ loci are efficiently calculated using documents’ ranking scores generated from a search engine. We propose a fast loci-based semi-supervised document clustering algorithm that uses clusters’ loci instead of conventional centroids for assigning documents to clusters. Empirical analysis on real-world datasets shows that the proposed method produces cluster solutions with promising quality and is substantially faster than several benchmarked centroid-based semi-supervised document clustering methods.
Abstract:
Bioacoustic data can be used for monitoring animal species diversity. The deployment of acoustic sensors enables acoustic monitoring at large temporal and spatial scales. We describe a content-based birdcall retrieval algorithm for the exploration of large databases of acoustic recordings. In the algorithm, an event-based searching scheme and compact features are developed. In detail, ridge events are detected from audio files using event detection on spectral ridges. Event alignment is then used to search through audio files to locate candidate instances. A similarity measure is then applied to dimension-reduced spectral ridge feature vectors. The event-based searching method processes a smaller list of instances for faster retrieval. The experimental results demonstrate that our features achieve a better success rate than existing methods and that the feature dimension is greatly reduced.
Abstract:
In this paper we tackle the problem of efficient video event detection. We argue that linear detection functions should be preferred because of their scalability and efficiency during estimation and evaluation. A popular approach is to represent a sequence using a bag of words (BOW) representation because of (i) its fixed dimensionality irrespective of the sequence length, and (ii) its ability to compactly model the statistics in the sequence. A drawback of the BOW representation, however, is the intrinsic destruction of the temporal ordering information. In this paper we propose a new representation that leverages the uncertainty in relative temporal alignments between pairs of sequences while not destroying temporal ordering. Our representation, like BOW, is of a fixed dimensionality, making it easily integrated with a linear detection function. Extensive experiments on the CK+, 6DMG, and UvA-NEMO databases show significant performance improvements across both isolated and continuous event detection tasks.
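The drawback of the bag of words representation noted in this abstract, the loss of temporal ordering, can be seen in a small sketch. The vocabulary and the two toy sequences are hypothetical, and the code shows only plain BOW, not the representation proposed in the paper.

```python
from collections import Counter

def bag_of_words(sequence, vocabulary):
    """Fixed-length histogram of word counts, independent of sequence length."""
    counts = Counter(sequence)
    return [counts[w] for w in vocabulary]

vocabulary = ["a", "b", "c"]
rising = ["a", "a", "b", "c"]    # one temporal ordering
falling = ["c", "b", "a", "a"]   # the same words in reverse order

h1 = bag_of_words(rising, vocabulary)
h2 = bag_of_words(falling, vocabulary)
# h1 and h2 are identical, so a detector fed only these histograms
# cannot tell the two orderings apart.
```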
Abstract:
In this paper, we tackle the problem of unsupervised domain adaptation for classification. In the unsupervised scenario where no labeled samples from the target domain are provided, a popular approach consists in transforming the data such that the source and target distributions become similar. To compare the two distributions, existing approaches make use of the Maximum Mean Discrepancy (MMD). However, this does not exploit the fact that probability distributions lie on a Riemannian manifold. Here, we propose to make better use of the structure of this manifold and rely on the distance on the manifold to compare the source and target distributions. In this framework, we introduce a sample selection method and a subspace-based method for unsupervised domain adaptation, and show that both these manifold-based techniques outperform the corresponding approaches based on the MMD. Furthermore, we show that our subspace-based approach yields state-of-the-art results on a standard object recognition benchmark.
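For reference, the Maximum Mean Discrepancy that this abstract uses as its baseline can be estimated from two samples as follows. This is a sketch of the standard biased estimator of the squared MMD with an RBF kernel; the kernel choice, the `gamma` value and the synthetic data are assumptions, and the manifold-based distance proposed in the paper is not shown.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """Pairwise RBF kernel values exp(-gamma * ||a_i - b_j||^2)."""
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2 * a @ b.T)
    return np.exp(-gamma * sq)

def mmd2(source, target, gamma=1.0):
    """Biased estimate of the squared MMD between two samples."""
    return (rbf_kernel(source, source, gamma).mean()
            + rbf_kernel(target, target, gamma).mean()
            - 2 * rbf_kernel(source, target, gamma).mean())

rng = np.random.default_rng(0)
source = rng.normal(size=(100, 2))
same = mmd2(source, source)           # identical samples
shifted = mmd2(source, source + 3.0)  # target domain shifted by 3
```

The estimate is zero for identical samples and grows as the target sample moves away from the source, which is why it can serve as an objective when transforming the data.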
Abstract:
Background: CD14, a coreceptor for several pattern recognition receptors and a widely used monocyte/macrophage marker, plays a key role in host responses to gram-negative bacteria. Despite the central role of CD14 in the inflammatory response to lipopolysaccharide and other microbial products and in the dissemination of bacteria in some infections, the signaling networks controlled by CD14 during urinary tract infection (UTI) are unknown. Methods: We used uropathogenic Escherichia coli (UPEC) infection of wild-type (WT) C57BL/6 and Cd14−/− mice and RNA sequencing to define the CD14-dependent transcriptional signature and the role of CD14 in host defense against UTI in the bladder. Results: UPEC induced the upregulation of Cd14 and the monocyte/macrophage-related genes Emr1/F4/80 and Csf1r/c-fms, which was associated with lower UPEC burdens in WT mice, compared with Cd14−/− mice. Exacerbation of infection in Cd14−/− mice was associated with the absence of a 491-gene transcriptional signature in the bladder that encompassed multiple host networks not previously associated with this receptor. CD14-dependent pathways included immune cell trafficking, differential cytokine production in macrophages, and interleukin 17 signaling. Depletion of monocytes/macrophages in the bladder by administration of liposomal clodronate led to higher UPEC burdens. Conclusions: This study identifies new host protective and signaling roles for CD14 in the bladder during UPEC UTI.
Abstract:
Age estimation from facial images is receiving increasing attention for applications such as age-based access control and age-adaptive targeted marketing. Since even humans can be misled by the complex biological processes involved, finding a robust method remains a research challenge today. In this paper, we propose a new framework for the integration of Active Appearance Models (AAM), Local Binary Patterns (LBP), Gabor wavelets (GW) and Local Phase Quantization (LPQ) in order to obtain a highly discriminative feature representation which is able to model shape, appearance, wrinkles and skin spots. In addition, this paper proposes a novel flexible hierarchical age estimation approach consisting of a multi-class Support Vector Machine (SVM) to classify a subject into an age group followed by a Support Vector Regression (SVR) to estimate a specific age. The errors that may occur in the classification step, caused by the hard boundaries between age classes, are compensated in the specific age estimation by a flexible overlapping of the age ranges. The performance of the proposed approach was evaluated on the FG-NET Aging and MORPH Album 2 datasets, and mean absolute errors (MAE) of 4.50 and 5.86 years were achieved respectively. The robustness of the proposed approach was also evaluated on a merge of both datasets, and an MAE of 5.20 years was achieved. Furthermore, we compared the age estimation made by humans with the proposed approach, and the comparison showed that the machine outperforms humans. The proposed approach is competitive with the current state of the art, and it provides additional robustness to blur, lighting and expression variance brought about by the local phase features.
Abstract:
In this paper we investigate the effectiveness of class-specific sparse codes in the context of discriminative action classification. The bag-of-words representation is widely used in activity recognition to encode features, and although it yields state-of-the-art performance with several feature descriptors, it still suffers from large quantization errors, which reduce the overall performance. Recently proposed sparse representation methods have been shown to effectively represent features as a linear combination of an overcomplete dictionary by minimizing the reconstruction error. In contrast to most sparse representation methods, which focus on Sparse-Reconstruction based Classification (SRC), this paper focuses on discriminative classification using an SVM by constructing class-specific sparse codes for motion and appearance separately. Experimental results demonstrate that separate motion-specific and appearance-specific sparse coefficients provide the most effective and discriminative representation for each class compared to a single set of class-specific sparse coefficients.
Abstract:
This paper presents an effective feature representation method in the context of activity recognition. Efficient and effective feature representation plays a crucial role not only in activity recognition, but also in a wide range of applications such as motion analysis, tracking and 3D scene understanding. In the context of activity recognition, local features are increasingly popular for representing videos because of their simplicity and efficiency. While they achieve state-of-the-art performance with low computational requirements, their performance is still limited for real-world applications due to a lack of contextual information and models not being tailored to specific activities. We propose a new activity representation framework to address the shortcomings of the popular but simple bag-of-words approach. In our framework, a multiple instance SVM (mi-SVM) is first used to identify positive features for each action category, and the k-means algorithm is used to generate a codebook. Locality-constrained linear coding is then used to encode the features into the generated codebook, followed by spatio-temporal pyramid pooling to convey the spatio-temporal statistics. Finally, an SVM is used to classify the videos. Experiments carried out on two popular datasets of varying complexity demonstrate significant performance improvement over the baseline bag-of-features method.
Abstract:
This paper presents 'vSpeak', the first initiative taken in Pakistan for ICT-enabled conversion of dynamic Sign Urdu gestures into natural language sentences. To realize this, vSpeak adopts a novel approach to feature extraction using edge detection and image compression, whose output is fed to an Artificial Neural Network that recognizes the gesture. This technique also handles blurred images. Training and testing are currently being performed on a dataset of 200 patterns of 20 words from Sign Urdu, with a target accuracy of 90% and above.
Abstract:
This paper presents a statistical aircraft trajectory clustering approach aimed at discriminating between typical manned and expected unmanned traffic patterns. First, a resampled version of each trajectory is modelled using a mixture of Von Mises distributions (circular statistics). Second, the remodelled trajectories are globally aligned using tools from bioinformatics. Third, the alignment scores are used to cluster the trajectories using an iterative k-medoids approach and an appropriate distance function. The approach is then evaluated using synthetically generated unmanned aircraft flights combined with real air traffic position reports taken over a sector of Northern Queensland, Australia. Results suggest that the technique is useful in distinguishing between expected unmanned and manned aircraft traffic behaviour, as well as identifying some common conventional air traffic patterns.
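The clustering step in this abstract relies on k-medoids, which needs only a pairwise distance matrix. The sketch below shows a basic alternating k-medoids loop on a toy distance matrix; the initialisation, the function name `k_medoids` and the 1-D example points are assumptions, and the alignment-score distance used in the paper is not reproduced here. It also assumes the points are distinct, so that no cluster becomes empty.

```python
import numpy as np

def k_medoids(dist, k, n_iter=100):
    """Alternate assignment and medoid update on a precomputed distance matrix."""
    medoids = list(range(k))  # naive initialisation: first k items (assumed)
    for _ in range(n_iter):
        # Assign every item to its nearest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = []
        for c in range(k):
            members = np.flatnonzero(labels == c)
            # New medoid: the member with the smallest total distance
            # to the other members of its cluster.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids.append(int(members[np.argmin(within)]))
        if new_medoids == medoids:
            break  # converged
        medoids = new_medoids
    return medoids, labels

points = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
dist = np.abs(points[:, None] - points[None, :])
medoids, labels = k_medoids(dist, k=2)
```

Because the medoid is always an actual item, the method works with any dissimilarity, including scores derived from sequence alignment, where a mean "centroid" would not be defined.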
Abstract:
The aim of this study was to identify and describe the types of errors in clinical reasoning that contribute to poor diagnostic performance at different levels of medical training and experience. Three cohorts of subjects, second- and fourth- (final) year medical students and a group of general practitioners, completed a set of clinical reasoning problems. The responses of those whose scores fell below the 25th centile were analysed to establish the stage of the clinical reasoning process - identification of relevant information, interpretation or hypothesis generation - at which most errors occurred and whether this was dependent on problem difficulty and level of medical experience. Results indicate that hypothesis errors decrease as expertise increases but that identification and interpretation errors increase. This may be due to inappropriate use of pattern recognition or to failure of the knowledge base. Furthermore, although hypothesis errors increased in line with problem difficulty, identification and interpretation errors decreased. A possible explanation is that as problem difficulty increases, subjects at all levels of expertise are less able to differentiate between relevant and irrelevant clinical features and so give equal consideration to all information contained within a case. It is concluded that the development of clinical reasoning in medical students throughout the course of their pre-clinical and clinical education may be enhanced by both an analysis of the clinical reasoning process and a specific focus on each of the stages at which errors commonly occur.
Abstract:
This paper addresses the challenges of flood mapping using multispectral images. Quantitative flood mapping is critical for flood damage assessment and management. Remote sensing images obtained from various satellite or airborne sensors provide valuable data for this application, from which information on the extent of flooding can be extracted. However, the main challenge in interpreting the data is to achieve more reliable flood extent mapping that includes both the fully inundated areas and the 'wet' areas where trees and houses are partly covered by water. This is a typical combined pure pixel and mixed pixel problem. In this paper, a recently developed extended Support Vector Machines method for spectral unmixing is applied to generate an integrated map showing both pure pixels (fully inundated areas) and mixed pixels (trees and houses partly covered by water). The outputs were compared with the conventional mean-based linear spectral mixture model, and better performance was demonstrated with a subset of Landsat ETM+ data recorded at the Daly River Basin, NT, Australia, on 3rd March, 2008, after a flood event.
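The conventional linear spectral mixture model used as the comparison here treats each pixel as a weighted sum of class spectra (endmembers). The sketch below illustrates that baseline only, not the extended SVM method; the endmember values, band count and function name `unmix` are hypothetical, and the clip-and-renormalise step is a simple stand-in for properly constrained unmixing.

```python
import numpy as np

# Hypothetical mean spectra (columns): water, vegetation; rows are 4 bands.
endmembers = np.array([[0.05, 0.04],
                       [0.04, 0.08],
                       [0.03, 0.05],
                       [0.02, 0.45]])

def unmix(pixel, endmembers):
    """Least-squares abundance estimate, clipped and renormalised to sum to one."""
    abundances, *_ = np.linalg.lstsq(endmembers, pixel, rcond=None)
    abundances = np.clip(abundances, 0.0, None)  # no negative fractions
    return abundances / abundances.sum()

# A mixed pixel made of 70% water and 30% vegetation.
mixed_pixel = 0.7 * endmembers[:, 0] + 0.3 * endmembers[:, 1]
fractions = unmix(mixed_pixel, endmembers)
```

A pure pixel yields a fraction close to one for a single class, while a partly inundated pixel yields intermediate fractions, which is the distinction the integrated flood map relies on.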