897 resultados para Optical pattern recognition -- Mathematical models
Resumo:
Random Indexing K-tree is the combination of two algorithms suited for large scale document clustering.
Resumo:
The wide range of contributing factors and circumstances surrounding crashes on road curves suggest that no single intervention can prevent these crashes. This paper presents a novel methodology, based on data mining techniques, to identify contributing factors and the relationship between them. It identifies contributing factors that influence the risk of a crash. Incident records, described using free text, from a large insurance company were analysed with rough set theory. Rough set theory was used to discover dependencies among data, and reasons using the vague, uncertain and imprecise information that characterised the insurance dataset. The results show that male drivers, who are between 50 and 59 years old, driving during evening peak hours are involved with a collision, had a lowest crash risk. Drivers between 25 and 29 years old, driving from around midnight to 6 am and in a new car has the highest risk. The analysis of the most significant contributing factors on curves suggests that drivers with driving experience of 25 to 42 years, who are driving a new vehicle have the highest crash cost risk, characterised by the vehicle running off the road and hitting a tree. This research complements existing statistically based tools approach to analyse road crashes. Our data mining approach is supported with proven theory and will allow road safety practitioners to effectively understand the dependencies between contributing factors and the crash type with the view to designing tailored countermeasures.
Resumo:
This article explores two matrix methods to induce the ``shades of meaning" (SoM) of a word. A matrix representation of a word is computed from a corpus of traces based on the given word. Non-negative Matrix Factorisation (NMF) and Singular Value Decomposition (SVD) compute a set of vectors corresponding to a potential shade of meaning. The two methods were evaluated based on loss of conditional entropy with respect to two sets of manually tagged data. One set reflects concepts generally appearing in text, and the second set comprises words used for investigations into word sense disambiguation. Results show that for NMF consistently outperforms SVD for inducing both SoM of general concepts as well as word senses. The problem of inducing the shades of meaning of a word is more subtle than that of word sense induction and hence relevant to thematic analysis of opinion where nuances of opinion can arise.
Resumo:
This paper proposes a novel Hybrid Clustering approach for XML documents (HCX) that first determines the structural similarity in the form of frequent subtrees and then uses these frequent subtrees to represent the constrained content of the XML documents in order to determine the content similarity. The empirical analysis reveals that the proposed method is scalable and accurate.
Resumo:
A method of improving the security of biometric templates which satisfies desirable properties such as (a) irreversibility of the template, (b) revocability and assignment of a new template to the same biometric input, (c) matching in the secure transformed domain is presented. It makes use of an iterative procedure based on the bispectrum that serves as an irreversible transformation for biometric features because signal phase is discarded each iteration. Unlike the usual hash function, this transformation preserves closeness in the transformed domain for similar biometric inputs. A number of such templates can be generated from the same input. These properties are illustrated using synthetic data and applied to images from the FRGC 3D database with Gabor features. Verification can be successfully performed using these secure templates with an EER of 5.85%
Resumo:
Artificial neural networks (ANN) have demonstrated good predictive performance in a wide range of applications. They are, however, not considered sufficient for knowledge representation because of their inability to represent the reasoning process succinctly. This paper proposes a novel methodology Gyan that represents the knowledge of a trained network in the form of restricted first-order predicate rules. The empirical results demonstrate that an equivalent symbolic interpretation in the form of rules with predicates, terms and variables can be derived describing the overall behaviour of the trained ANN with improved comprehensibility while maintaining the accuracy and fidelity of the propositional rules.
Resumo:
In this paper, we discuss our participation to the INEX 2008 Link-the-Wiki track. We utilized a sliding window based algorithm to extract the frequent terms and phrases. Using the extracted phrases and term as descriptive vectors, the anchors and relevant links (both incoming and outgoing) are recognized efficiently.
Resumo:
Traffic safety is a major concern world-wide. It is in both the sociological and economic interests of society that attempts should be made to identify the major and multiple contributory factors to those road crashes. This paper presents a text mining based method to better understand the contextual relationships inherent in road crashes. By examining and analyzing the crash report data in Queensland from year 2004 and year 2005, this paper identifies and reports the major and multiple contributory factors to those crashes. The outcome of this study will support road asset management in reducing road crashes.
Resumo:
In this paper, we classify, review, and experimentally compare major methods that are exploited in the definition, adoption, and utilization of element similarity measures in the context of XML schema matching. We aim at presenting a unified view which is useful when developing a new element similarity measure, when implementing an XML schema matching component, when using an XML schema matching system, and when comparing XML schema matching systems.