960 results for Optical pattern recognition.
Abstract:
The concept of a “mutualistic teacher” is introduced for unsupervised learning of the mean vectors of the components of a mixture of multivariate normal densities, when the number of classes is also unknown. The unsupervised learning problem is formulated here as a multi-stage quasi-supervised problem incorporating a cluster approach. The mutualistic teacher creates a quasi-supervised environment at each stage by picking out “mutual pairs” of samples and assigning identical (but unknown) labels to the individuals of each mutual pair. The number of classes, if not specified, can be determined at an intermediate stage. The risk in assigning identical labels to the individuals of mutual pairs is estimated. Results of some simulation studies are presented.
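The abstract does not define a "mutual pair". One common reading, assumed here, is two samples that are each other's nearest neighbour. A minimal Python sketch under that assumption (the function name `mutual_pairs` and the sample points are hypothetical, not taken from the paper):

```python
import math

def mutual_pairs(samples):
    """Return index pairs (i, j), i < j, that are each other's nearest neighbour.

    Assumes at least two samples and Euclidean distance; both are
    illustrative choices, not details confirmed by the abstract.
    """
    def nearest(i):
        # Index of the sample closest to samples[i], excluding i itself.
        return min(
            (j for j in range(len(samples)) if j != i),
            key=lambda j: math.dist(samples[i], samples[j]),
        )

    nn = [nearest(i) for i in range(len(samples))]
    # Keep a pair only when the nearest-neighbour relation holds both ways.
    return [(i, j) for i, j in enumerate(nn) if i < j and nn[j] == i]

points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.2, 5.0), (2.0, 2.0)]
pairs = mutual_pairs(points)
```

Here the fifth point has a nearest neighbour that does not reciprocate, so it belongs to no mutual pair and would receive no label at this stage.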
Abstract:
Multi-document summarization, which addresses the problem of information overload, has been widely used in real-world applications. Most existing approaches adopt a term-based representation for documents, which limits the performance of multi-document summarization systems. In this paper, we propose a novel pattern-based topic model (PBTMSum) for multi-document summarization. By combining pattern mining techniques with LDA topic modelling, PBTMSum generates discriminative and semantically rich representations of topics and documents, so that the most representative and non-redundant sentences can be selected to form a succinct and informative summary. Extensive experiments are conducted on the Document Understanding Conference (DUC) 2007 data. The results demonstrate the effectiveness and efficiency of the proposed approach.
Abstract:
Learning automata are adaptive decision making devices that are found useful in a variety of machine learning and pattern recognition applications. Although most learning automata methods deal with the case of finitely many actions for the automaton, there are also models of continuous-action-set learning automata (CALA). A team of such CALA can be useful in stochastic optimization problems where one has access only to noise-corrupted values of the objective function. In this paper, we present a novel formulation for noise-tolerant learning of linear classifiers using a CALA team. We consider the general case of nonuniform noise, where the probability that the class label of an example is wrong may be a function of the feature vector of the example. The objective is to learn the underlying separating hyperplane given only such noisy examples. We present an algorithm employing a team of CALA and prove, under some conditions on the class conditional densities, that the algorithm achieves noise-tolerant learning as long as the probability of wrong label for any example is less than 0.5. We also present some empirical results to illustrate the effectiveness of the algorithm.
Abstract:
This paper presents a novel crop detection system applied to the challenging task of field sweet pepper (capsicum) detection. The field-grown sweet pepper crop presents several challenges for robotic systems such as the high degree of occlusion and the fact that the crop can have a similar colour to the background (green on green). To overcome these issues, we propose a two-stage system that performs per-pixel segmentation followed by region detection. The output of the segmentation is used to search for highly probable regions and declares these to be sweet pepper. We propose the novel use of the local binary pattern (LBP) to perform crop segmentation. This feature improves the accuracy of crop segmentation from an AUC of 0.10, for previously proposed features, to 0.56. Using the LBP feature as the basis for our two-stage algorithm, we are able to detect 69.2% of field grown sweet peppers in three sites. This is an impressive result given that the average detection accuracy of people viewing the same colour imagery is 66.8%.
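For readers unfamiliar with the local binary pattern, the basic 8-neighbour operator can be sketched as follows. This shows only the textbook operator; the abstract does not say which LBP variant, radius, or bit ordering the authors used, so the neighbour order and the `>=` threshold below are assumptions, and `lbp_code` is a hypothetical name.

```python
def lbp_code(image, r, c):
    """Basic 8-neighbour LBP code for the interior pixel at row r, column c.

    `image` is a list of rows of grey-level values. Each neighbour that is
    at least as bright as the centre contributes one bit to the code.
    """
    centre = image[r][c]
    # Neighbours visited clockwise from the top-left corner (a convention).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if image[r + dr][c + dc] >= centre:
            code |= 1 << bit
    return code

patch = [
    [5, 9, 1],
    [4, 6, 7],
    [2, 3, 8],
]
code = lbp_code(patch, 1, 1)
```

Because the code depends only on whether neighbours are brighter or darker than the centre, it describes local texture rather than absolute colour, which is relevant when crop and background are both green.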
Abstract:
This paper presents a statistical aircraft trajectory clustering approach aimed at discriminating between typical manned and expected unmanned traffic patterns. First, a resampled version of each trajectory is modelled using a mixture of Von Mises distributions (circular statistics). Second, the remodelled trajectories are globally aligned using tools from bioinformatics. Third, the alignment scores are used to cluster the trajectories using an iterative k-medoids approach and an appropriate distance function. The approach is then evaluated using synthetically generated unmanned aircraft flights combined with real air traffic position reports taken over a sector of Northern Queensland, Australia. Results suggest that the technique is useful in distinguishing between expected unmanned and manned aircraft traffic behaviour, as well as identifying some common conventional air traffic patterns.
Abstract:
Partitional clustering algorithms, which partition the dataset into a pre-defined number of clusters, can be broadly classified into two types: algorithms which explicitly take the number of clusters as input and algorithms that take the expected size of a cluster as input. In this paper, we propose a variant of the k-means algorithm and prove that it is more efficient than standard k-means algorithms. An important contribution of this paper is the establishment of a relation between the number of clusters and the size of the clusters in a dataset through the analysis of our algorithm. We also demonstrate that the integration of this algorithm as a pre-processing step in classification algorithms reduces their running-time complexity.
Abstract:
With the development of wearable and mobile computing technology, more and more people start using sleep-tracking tools to collect personal sleep data on a daily basis aiming at understanding and improving their sleep. While sleep quality is influenced by many factors in a person’s lifestyle context, such as exercise, diet and steps walked, existing tools simply visualize sleep data per se on a dashboard rather than analyse those data in combination with contextual factors. Hence many people find it difficult to make sense of their sleep data. In this paper, we present a cloud-based intelligent computing system named SleepExplorer that incorporates sleep domain knowledge and association rule mining for automated analysis on personal sleep data in light of contextual factors. Experiments show that the same contextual factors can play a distinct role in sleep of different people, and SleepExplorer could help users discover factors that are most relevant to their personal sleep.
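Association rule mining scores a rule such as "exercise → good sleep" by its support and confidence. The sketch below computes both over a handful of made-up daily records; the item names and data are hypothetical, and this is not SleepExplorer's actual pipeline, which the abstract does not describe in detail.

```python
def support(records, items):
    """Fraction of records that contain every item in `items`."""
    items = set(items)
    return sum(items <= record for record in records) / len(records)

def confidence(records, antecedent, consequent):
    """Estimate of P(consequent | antecedent) over the records.

    Raises ZeroDivisionError if the antecedent never occurs.
    """
    both = set(antecedent) | set(consequent)
    return support(records, both) / support(records, antecedent)

# Each record is the set of observations logged for one day (made-up data).
days = [
    {"exercise", "good_sleep"},
    {"exercise", "late_caffeine", "poor_sleep"},
    {"exercise", "good_sleep"},
    {"late_caffeine", "poor_sleep"},
    {"good_sleep"},
]
sup = support(days, {"exercise", "good_sleep"})
conf = confidence(days, {"exercise"}, {"good_sleep"})
```

Running the same computation on each user's own records is one way the same factor could rank differently for different people.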
Abstract:
Sequence motifs occurring in a particular order in proteins or DNA have been proved to be of biological interest. In this paper, a new method to locate the occurrences of up to five user-defined motifs in a specified order in large proteins and in nucleotide sequence databases is proposed. It has been designed using the concept of quantifiers in regular expressions and linked lists for data storage. The application of this method includes the extraction of relevant consensus regions from biological sequences. This might be useful in clustering of protein families as well as to study the correlation between positions of motifs and their functional sites in DNA sequences.
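The idea of using regular-expression quantifiers to find motifs in a given order can be illustrated with Python's `re` module. This is a sketch only: it treats motifs as literal strings, ignores the linked-list storage mentioned in the abstract, and the function name, the `max_gap` parameter, and the example sequence are all hypothetical.

```python
import re

def find_ordered_motifs(sequence, motifs, max_gap=None):
    """Return the (start, end) span of each motif in the first in-order match.

    Returns None when the motifs do not occur in the requested order.
    """
    # A lazy quantifier bridges consecutive motifs; max_gap, if given,
    # bounds how many residues may separate them.
    gap = ".*?" if max_gap is None else ".{0,%d}?" % max_gap
    pattern = gap.join("(%s)" % re.escape(m) for m in motifs)
    match = re.search(pattern, sequence)
    if match is None:
        return None
    return [match.span(i) for i in range(1, len(motifs) + 1)]

seq = "MKTAYIAKQRQISFVKSHFSRQ"
spans = find_ordered_motifs(seq, ["KTA", "QRQ", "SHF"])
```

Each span gives the motif's position in the sequence, which is the information needed to relate motif positions to functional sites.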
Abstract:
The aim of this study was to identify and describe the types of errors in clinical reasoning that contribute to poor diagnostic performance at different levels of medical training and experience. Three cohorts of subjects, second- and fourth- (final) year medical students and a group of general practitioners, completed a set of clinical reasoning problems. The responses of those whose scores fell below the 25th centile were analysed to establish the stage of the clinical reasoning process - identification of relevant information, interpretation or hypothesis generation - at which most errors occurred and whether this was dependent on problem difficulty and level of medical experience. Results indicate that hypothesis errors decrease as expertise increases but that identification and interpretation errors increase. This may be due to inappropriate use of pattern recognition or to failure of the knowledge base. Furthermore, although hypothesis errors increased in line with problem difficulty, identification and interpretation errors decreased. A possible explanation is that as problem difficulty increases, subjects at all levels of expertise are less able to differentiate between relevant and irrelevant clinical features and so give equal consideration to all information contained within a case. It is concluded that the development of clinical reasoning in medical students throughout the course of their pre-clinical and clinical education may be enhanced by both an analysis of the clinical reasoning process and a specific focus on each of the stages at which errors commonly occur.
Abstract:
This paper addresses the challenges of flood mapping using multispectral images. Quantitative flood mapping is critical for flood damage assessment and management. Remote sensing images obtained from various satellite or airborne sensors provide valuable data for this application, from which information on the flood extent can be extracted. However, the main challenge in interpreting the data is to achieve more reliable flood extent mapping that includes both the fully inundated areas and the 'wet' areas where trees and houses are partly covered by water. This is a typical combined pure-pixel and mixed-pixel problem. In this paper, a recently developed extended Support Vector Machine method for spectral unmixing is applied to generate an integrated map showing both pure pixels (fully inundated areas) and mixed pixels (trees and houses partly covered by water). The outputs were compared with the conventional mean-based linear spectral mixture model, and better performance was demonstrated on a subset of Landsat ETM+ data recorded at the Daly River Basin, NT, Australia, on 3 March 2008, after a flood event.
Abstract:
In this paper we present a novel algorithm for learning oblique decision trees. Most of the current decision tree algorithms rely on impurity measures to assess goodness of hyperplanes at each node. These impurity measures do not properly capture the geometric structures in the data. Motivated by this, our algorithm uses a strategy, based on some recent variants of SVM, to assess the hyperplanes in such a way that the geometric structure in the data is taken into account. We show through empirical studies that our method is effective.
Abstract:
We view association of concepts as a complex network and present a heuristic for clustering concepts by taking into account the underlying network structure of their associations. Clusters generated from our approach are qualitatively better than clusters generated from the conventional spectral clustering mechanism used for graph partitioning.
Abstract:
This paper addresses the problem of resolving ambiguities in frequently confused online Tamil character pairs by employing script-specific algorithms as a post-classification step. Robust structural cues and temporal information from the preprocessed character are used extensively in the design of these algorithms. The methods automatically extract the discriminative sub-strokes of confused characters for further analysis. Experimental validation on the IWFHR Database indicates error rates of less than 3% for the confused characters. These post-processing steps therefore have good potential to improve the performance of online Tamil handwritten character recognition.
Abstract:
Knowledge-based clusters are studied from the structural point of view. Generalized descriptions for such clusters are stated and illustrated. Peculiarities of certain knowledge-based cluster configurations are highlighted. The adequacy of the logical connectives “and” and “exclusive-or” in describing such clusters is justified. The definition of “concept” is elaborated from the clustering point of view and used to establish the equivalence between descriptions of clusters and concepts. The order-independence of the semantic-directed clustering approach is established formally based on axiomatic considerations.
Abstract:
This paper discusses a method for scaling an SVM with a Gaussian kernel to large data sets by using a selective sampling strategy for the training set. It employs a scalable hierarchical clustering algorithm to construct cluster indexing structures of the training data in the kernel-induced feature space. These structures are then used to selectively sample the training data, which makes SVM training scalable. Empirical studies on real-world data sets show that the proposed strategy performs well on large data sets.
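Clustering in the kernel-induced feature space needs distances between mapped points, and these can be computed from kernel values alone. For a Gaussian kernel, k(x, x) = 1, so the squared feature-space distance is 2 - 2·k(x, y). The sketch below illustrates that identity only; the `gamma` parameterisation and function names are assumptions, and the paper's clustering algorithm itself is not reproduced.

```python
import math

def gaussian_kernel(x, y, gamma):
    """k(x, y) = exp(-gamma * ||x - y||^2); gamma is an assumed parameterisation."""
    squared = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared)

def feature_space_distance(x, y, gamma):
    """Distance between the images of x and y in the kernel-induced space.

    Uses ||phi(x) - phi(y)||^2 = k(x, x) + k(y, y) - 2 k(x, y) = 2 - 2 k(x, y).
    """
    # max() guards against a tiny negative value from floating-point rounding.
    return math.sqrt(max(0.0, 2.0 - 2.0 * gaussian_kernel(x, y, gamma)))

near = feature_space_distance((0.0, 0.0), (0.1, 0.0), gamma=1.0)
far = feature_space_distance((0.0, 0.0), (10.0, 10.0), gamma=1.0)
```

The distance is bounded above by the square root of 2, which it approaches as the two input points move far apart.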