356 resultados para data Mining


Relevância:

60.00% 60.00%

Publicador:

Resumo:

Iris based identity verification is highly reliable but it can also be subject to attacks. Pupil dilation or constriction stimulated by the application of drugs are examples of sample presentation security attacks which can lead to higher false rejection rates. Suspects on a watch list can potentially circumvent the iris based system using such methods. This paper investigates a new approach using multiple parts of the iris (instances) and multiple iris samples in a sequential decision fusion framework that can yield robust performance. Results are presented and compared with the standard full iris based approach for a number of iris degradations. An advantage of the proposed fusion scheme is that the trade-off between detection errors can be controlled by setting parameters such as the number of instances and the number of samples used in the system. The system can then be operated to match security threat levels. It is shown that for optimal values of these parameters, the fused system also has a lower total error rate.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The work described in this technical report is part of an ongoing project to build practical tools for the manipulation, analysis and visualisation of recordings of the natural environment. This report describes the methods we use to remove background noise from spectrograms. It updates techniques previously described in Towsey and Planitz (2011), Technical report: acoustic analysis of the natural environment, downloadable from: http://eprints.qut.edu.au/41131/. It also describes noise removal from wave-forms, a technique not described in the above 2011 technical report.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In a classification problem typically we face two challenging issues, the diverse characteristic of negative documents and sometimes a lot of negative documents that are closed to positive documents. Therefore, it is hard for a single classifier to clearly classify incoming documents into classes. This paper proposes a novel gradual problem solving to create a two-stage classifier. The first stage identifies reliable negatives (negative documents with weak positive characteristics). It concentrates on minimizing the number of false negative documents (recall-oriented). We use Rocchio, an existing recall based classifier, for this stage. The second stage is a precision-oriented “fine tuning”, concentrates on minimizing the number of false positive documents by applying pattern (a statistical phrase) mining techniques. In this stage a pattern-based scoring is followed by threshold setting (thresholding). Experiment shows that our statistical phrase based two-stage classifier is promising.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The Web is a steadily evolving resource comprising much more than mere HTML pages. With its ever-growing data sources in a variety of formats, it provides great potential for knowledge discovery. In this article, we shed light on some interesting phenomena of the Web: the deep Web, which surfaces database records as Web pages; the Semantic Web, which de�nes meaningful data exchange formats; XML, which has established itself as a lingua franca for Web data exchange; and domain-speci�c markup languages, which are designed based on XML syntax with the goal of preserving semantics in targeted domains. We detail these four developments in Web technology, and explain how they can be used for data mining. Our goal is to show that all these areas can be as useful for knowledge discovery as the HTML-based part of the Web.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Tags or personal metadata for annotating web resources have been widely adopted in Web 2.0 sites. However, as tags are freely chosen by users, the vocabularies are diverse, ambiguous and sometimes only meaningful to individuals. Tag recommenders may assist users during tagging process. Its objective is to suggest relevant tags to use as well as to help consolidating vocabulary in the systems. In this paper we discuss our approach for providing personalized tag recommendation by making use of existing domain ontology generated from folksonomy. Specifically we evaluated the approach in sparse situation. The evaluation shows that the proposed ontology-based method has improved the accuracy of tag recommendation in this situation.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Tag recommendation is a specific recommendation task for recommending metadata (tag) for a web resource (item) during user annotation process. In this context, sparsity problem refers to situation where tags need to be produced for items with few annotations or for user who tags few items. Most of the state of the art approaches in tag recommendation are rarely evaluated or perform poorly under this situation. This paper presents a combined method for mitigating sparsity problem in tag recommendation by mainly expanding and ranking candidate tags based on similar items’ tags and existing tag ontology. We evaluated the approach on two public social bookmarking datasets. The experiment results show better accuracy for recommendation in sparsity situation over several state of the art methods.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We propose a cluster ensemble method to map the corpus documents into the semantic space embedded in Wikipedia and group them using multiple types of feature space. A heterogeneous cluster ensemble is constructed with multiple types of relations i.e. document-term, document-concept and document-category. A final clustering solution is obtained by exploiting associations between document pairs and hubness of the documents. Empirical analysis with various real data sets reveals that the proposed meth-od outperforms state-of-the-art text clustering approaches.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Entity-oriented retrieval aims to return a list of relevant entities rather than documents to provide exact answers for user queries. The nature of entity-oriented retrieval requires identifying the semantic intent of user queries, i.e., understanding the semantic role of query terms and determining the semantic categories which indicate the class of target entities. Existing methods are not able to exploit the semantic intent by capturing the semantic relationship between terms in a query and in a document that contains entity related information. To improve the understanding of the semantic intent of user queries, we propose concept-based retrieval method that not only automatically identifies the semantic intent of user queries, i.e., Intent Type and Intent Modifier but introduces concepts represented by Wikipedia articles to user queries. We evaluate our proposed method on entity profile documents annotated by concepts from Wikipedia category and list structure. Empirical analysis reveals that the proposed method outperforms several state-of-the-art approaches.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Crashes on motorway contribute to a significant proportion (40-50%) of non-recurrent motorway congestions. Hence reduce crashes will help address congestion issues (Meyer, 2008). Crash likelihood estimation studies commonly focus on traffic conditions in a Short time window around the time of crash while longer-term pre-crash traffic flow trends are neglected. In this paper we will show, through data mining techniques, that a relationship between pre-crash traffic flow patterns and crash occurrence on motorways exists, and that this knowledge has the potential to improve the accuracy of existing models and opens the path for new development approaches. The data for the analysis was extracted from records collected between 2007 and 2009 on the Shibuya and Shinjuku lines of the Tokyo Metropolitan Expressway in Japan. The dataset includes a total of 824 rear-end and sideswipe crashes that have been matched with traffic flow data of one hour prior to the crash using an incident detection algorithm. Traffic flow trends (traffic speed/occupancy time series) revealed that crashes could be clustered with regards of the dominant traffic flow pattern prior to the crash. Using the k-means clustering method allowed the crashes to be clustered based on their flow trends rather than their distance. Four major trends have been found in the clustering results. Based on these findings, crash likelihood estimation algorithms can be fine-tuned based on the monitored traffic flow conditions with a sliding window of 60 minutes to increase accuracy of the results and minimize false alarms.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Crashes that occur on motorways contribute to a significant proportion (40-50%) of non-recurrent motorway congestions. Hence, reducing the frequency of crashes assists in addressing congestion issues (Meyer, 2008). Crash likelihood estimation studies commonly focus on traffic conditions in a short time window around the time of a crash while longer-term pre-crash traffic flow trends are neglected. In this paper we will show, through data mining techniques that a relationship between pre-crash traffic flow patterns and crash occurrence on motorways exists. We will compare them with normal traffic trends and show this knowledge has the potential to improve the accuracy of existing models and opens the path for new development approaches. The data for the analysis was extracted from records collected between 2007 and 2009 on the Shibuya and Shinjuku lines of the Tokyo Metropolitan Expressway in Japan. The dataset includes a total of 824 rear-end and sideswipe crashes that have been matched with crashes corresponding to traffic flow data using an incident detection algorithm. Traffic trends (traffic speed time series) revealed that crashes can be clustered with regards to the dominant traffic patterns prior to the crash. Using the K-Means clustering method with Euclidean distance function allowed the crashes to be clustered. Then, normal situation data was extracted based on the time distribution of crashes and were clustered to compare with the “high risk” clusters. Five major trends have been found in the clustering results for both high risk and normal conditions. The study discovered traffic regimes had differences in the speed trends. Based on these findings, crash likelihood estimation models can be fine-tuned based on the monitored traffic conditions with a sliding window of 30 minutes to increase accuracy of the results and minimize false alarms.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this paper, we describe a method to represent and discover adversarial group behavior in a continuous domain. In comparison to other types of behavior, adversarial behavior is heavily structured as the location of a player (or agent) is dependent both on their teammates and adversaries, in addition to the tactics or strategies of the team. We present a method which can exploit this relationship through the use of a spatiotemporal basis model. As players constantly change roles during a match, we show that employing a "role-based" representation instead of one based on player "identity" can best exploit the playing structure. As vision-based systems currently do not provide perfect detection/tracking (e.g. missed or false detections), we show that our compact representation can effectively "denoise" erroneous detections as well as enabe temporal analysis, which was previously prohibitive due to the dimensionality of the signal. To evaluate our approach, we used a fully instrumented field-hockey pitch with 8 fixed high-definition (HD) cameras and evaluated our approach on approximately 200,000 frames of data from a state-of-the-art real-time player detector and compare it to manually labelled data.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper elaborates the approach used by the Applied Data Mining Research Group (ADMRG) for the Social Event Detection (SED) Tasks of the 2013 MediaEval Benchmark. We extended the constrained clustering algorithm to apply to the first semi-supervised clustering task, and we compared several classifiers with Latent Dirichlet Allocation as feature selector in the second event classification task. The proposed approach focuses on scalability and efficient memory allocation when applied to a high dimensional data with large clusters. Results of the first task show the effectiveness of the proposed method. Results from task 2 indicate that attention on the imbalance categories distributions is needed.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Online dating websites enable a specific form of social networking and their efficiency can be increased by supporting proactive recommendations based on participants' preferences with the use of data mining. This research develops two-way recommendation methods for people-to-people recommendation for large online social networks such as online dating networks. This research discovers the characteristics of the online dating networks and utilises these characteristics in developing efficient people-to-people recommendation methods. Methods developed support improved recommendation accuracy, can handle data sparsity that often comes with large data sets and are scalable for handling online networks with a large number of users.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Crashes that occur on motorways contribute to a significant proportion (40-50%) of non-recurrent motorway congestion. Hence, reducing the frequency of crashes assist in addressing congestion issues (Meyer, 2008). Analysing traffic conditions and discovering risky traffic trends and patterns are essential basics in crash likelihood estimations studies and still require more attention and investigation. In this paper we will show, through data mining techniques, that there is a relationship between pre-crash traffic flow patterns and crash occurrence on motorways, compare them with normal traffic trends, and that this knowledge has the potentiality to improve the accuracy of existing crash likelihood estimation models, and opens the path for new development approaches. The data for the analysis was extracted from records collected between 2007 and 2009 on the Shibuya and Shinjuku lines of the Tokyo Metropolitan Expressway in Japan. The dataset includes a total of 824 rear-end and sideswipe crashes that have been matched with crashes corresponding traffic flow data using an incident detection algorithm. Traffic trends (traffic speed time series) revealed that crashes can be clustered with regards to the dominant traffic patterns prior to the crash occurrence. K-Means clustering algorithm applied to determine dominant pre-crash traffic patterns. In the first phase of this research, traffic regimes identified by analysing crashes and normal traffic situations using half an hour speed in upstream locations of crashes. Then, the second phase investigated the different combination of speed risk indicators to distinguish crashes from normal traffic situations more precisely. Five major trends have been found in the first phase of this paper for both high risk and normal conditions. The study discovered traffic regimes had differences in the speed trends. Moreover, the second phase explains that spatiotemporal difference of speed is a better risk indicator among different combinations of speed related risk indicators. Based on these findings, crash likelihood estimation models can be fine-tuned to increase accuracy of estimations and minimize false alarms.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Safety concerns in the operation of autonomous aerial systems require safe-landing protocols be followed during situations where the a mission should be aborted due to mechanical or other failure. On-board cameras provide information that can be used in the determination of potential landing sites, which are continually updated and ranked to prevent injury and minimize damage. Pulse Coupled Neural Networks have been used for the detection of features in images that assist in the classification of vegetation and can be used to minimize damage to the aerial vehicle. However, a significant drawback in the use of PCNNs is that they are computationally expensive and have been more suited to off-line applications on conventional computing architectures. As heterogeneous computing architectures are becoming more common, an OpenCL implementation of a PCNN feature generator is presented and its performance is compared across OpenCL kernels designed for CPU, GPU and FPGA platforms. This comparison examines the compute times required for network convergence under a variety of images obtained during unmanned aerial vehicle trials to determine the plausibility for real-time feature detection.