358 resultados para educational data mining


Relevância:

80.00% 80.00%

Publicador:

Resumo:

Online dating websites enable a specific form of social networking and their efficiency can be increased by supporting proactive recommendations based on participants' preferences with the use of data mining. This research develops two-way recommendation methods for people-to-people recommendation for large online social networks such as online dating networks. This research discovers the characteristics of the online dating networks and utilises these characteristics in developing efficient people-to-people recommendation methods. Methods developed support improved recommendation accuracy, can handle data sparsity that often comes with large data sets and are scalable for handling online networks with a large number of users.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Crashes that occur on motorways contribute to a significant proportion (40-50%) of non-recurrent motorway congestion. Hence, reducing the frequency of crashes assist in addressing congestion issues (Meyer, 2008). Analysing traffic conditions and discovering risky traffic trends and patterns are essential basics in crash likelihood estimations studies and still require more attention and investigation. In this paper we will show, through data mining techniques, that there is a relationship between pre-crash traffic flow patterns and crash occurrence on motorways, compare them with normal traffic trends, and that this knowledge has the potentiality to improve the accuracy of existing crash likelihood estimation models, and opens the path for new development approaches. The data for the analysis was extracted from records collected between 2007 and 2009 on the Shibuya and Shinjuku lines of the Tokyo Metropolitan Expressway in Japan. The dataset includes a total of 824 rear-end and sideswipe crashes that have been matched with crashes corresponding traffic flow data using an incident detection algorithm. Traffic trends (traffic speed time series) revealed that crashes can be clustered with regards to the dominant traffic patterns prior to the crash occurrence. K-Means clustering algorithm applied to determine dominant pre-crash traffic patterns. In the first phase of this research, traffic regimes identified by analysing crashes and normal traffic situations using half an hour speed in upstream locations of crashes. Then, the second phase investigated the different combination of speed risk indicators to distinguish crashes from normal traffic situations more precisely. Five major trends have been found in the first phase of this paper for both high risk and normal conditions. The study discovered traffic regimes had differences in the speed trends. Moreover, the second phase explains that spatiotemporal difference of speed is a better risk indicator among different combinations of speed related risk indicators. Based on these findings, crash likelihood estimation models can be fine-tuned to increase accuracy of estimations and minimize false alarms.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Safety concerns in the operation of autonomous aerial systems require safe-landing protocols be followed during situations where the a mission should be aborted due to mechanical or other failure. On-board cameras provide information that can be used in the determination of potential landing sites, which are continually updated and ranked to prevent injury and minimize damage. Pulse Coupled Neural Networks have been used for the detection of features in images that assist in the classification of vegetation and can be used to minimize damage to the aerial vehicle. However, a significant drawback in the use of PCNNs is that they are computationally expensive and have been more suited to off-line applications on conventional computing architectures. As heterogeneous computing architectures are becoming more common, an OpenCL implementation of a PCNN feature generator is presented and its performance is compared across OpenCL kernels designed for CPU, GPU and FPGA platforms. This comparison examines the compute times required for network convergence under a variety of images obtained during unmanned aerial vehicle trials to determine the plausibility for real-time feature detection.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Association rule mining is one technique that is widely used when querying databases, especially those that are transactional, in order to obtain useful associations or correlations among sets of items. Much work has been done focusing on efficiency, effectiveness and redundancy. There has also been a focusing on the quality of rules from single level datasets with many interestingness measures proposed. However, with multi-level datasets now being common there is a lack of interestingness measures developed for multi-level and cross-level rules. Single level measures do not take into account the hierarchy found in a multi-level dataset. This leaves the Support-Confidence approach, which does not consider the hierarchy anyway and has other drawbacks, as one of the few measures available. In this chapter we propose two approaches which measure multi-level association rules to help evaluate their interestingness by considering the database’s underlying taxonomy. These measures of diversity and peculiarity can be used to help identify those rules from multi-level datasets that are potentially useful.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Textual document set has become an important and rapidly growing information source in the web. Text classification is one of the crucial technologies for information organisation and management. Text classification has become more and more important and attracted wide attention of researchers from different research fields. In this paper, many feature selection methods, the implement algorithms and applications of text classification are introduced firstly. However, because there are much noise in the knowledge extracted by current data-mining techniques for text classification, it leads to much uncertainty in the process of text classification which is produced from both the knowledge extraction and knowledge usage, therefore, more innovative techniques and methods are needed to improve the performance of text classification. It has been a critical step with great challenge to further improve the process of knowledge extraction and effectively utilization of the extracted knowledge. Rough Set decision making approach is proposed to use Rough Set decision techniques to more precisely classify the textual documents which are difficult to separate by the classic text classification methods. The purpose of this paper is to give an overview of existing text classification technologies, to demonstrate the Rough Set concepts and the decision making approach based on Rough Set theory for building more reliable and effective text classification framework with higher precision, to set up an innovative evaluation metric named CEI which is very effective for the performance assessment of the similar research, and to propose a promising research direction for addressing the challenging problems in text classification, text mining and other relative fields.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

A new approach for recognizing the iris of the human eye is presented. Zero-crossings of the wavelet transform at various resolution levels are calculated over concentric circles on the iris, and the resulting one-dimensional (1-D) signals are compared with model features using different dissimilarity functions.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In recent years, the Web 2.0 has provided considerable facilities for people to create, share and exchange information and ideas. Upon this, the user generated content, such as reviews, has exploded. Such data provide a rich source to exploit in order to identify the information associated with specific reviewed items. Opinion mining has been widely used to identify the significant features of items (e.g., cameras) based upon user reviews. Feature extraction is the most critical step to identify useful information from texts. Most existing approaches only find individual features about a product without revealing the structural relationships between the features which usually exist. In this paper, we propose an approach to extract features and feature relationships, represented as a tree structure called feature taxonomy, based on frequent patterns and associations between patterns derived from user reviews. The generated feature taxonomy profiles the product at multiple levels and provides more detailed information about the product. Our experiment results based on some popularly used review datasets show that our proposed approach is able to capture the product features and relations effectively.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

As of today, opinion mining has been widely used to iden- tify the strength and weakness of products (e.g., cameras) or services (e.g., services in medical clinics or hospitals) based upon people's feed- back such as user reviews. Feature extraction is a crucial step for opinion mining which has been used to collect useful information from user reviews. Most existing approaches only find individual features of a product without the structural relationships between the features which usually exists. In this paper, we propose an approach to extract features and feature relationship, represented as tree structure called a feature hi- erarchy, based on frequent patterns and associations between patterns derived from user reviews. The generated feature hierarchy profiles the product at multiple levels and provides more detailed information about the product. Our experiment results based on some popularly used review datasets show that the proposed feature extraction approach can identify more correct features than the baseline model. Even though the datasets used in the experiment are about cameras, our work can be ap- plied to generate features about a service such as the services in hospitals or clinics.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The support for typically out-of-vocabulary query terms such as names, acronyms, and foreign words is an important requirement of many speech indexing applications. However, to date many unrestricted vocabulary indexing systems have struggled to provide a balance between good detection rate and fast query speeds. This paper presents a fast and accurate unrestricted vocabulary speech indexing technique named Dynamic Match Lattice Spotting (DMLS). The proposed method augments the conventional lattice spotting technique with dynamic sequence matching, together with a number of other novel algorithmic enhancements, to obtain a system that is capable of searching hours of speech in seconds while maintaining excellent detection performance

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Several websites utilise a rule-base recommendation system, which generates choices based on a series of questionnaires, for recommending products to users. This approach has a high risk of customer attrition and the bottleneck is the questionnaire set. If the questioning process is too long, complex or tedious; users are most likely to quit the questionnaire before a product is recommended to them. If the questioning process is short; the user intensions cannot be gathered. The commonly used feature selection methods do not provide a satisfactory solution. We propose a novel process combining clustering, decisions tree and association rule mining for a group-oriented question reduction process. The question set is reduced according to common properties that are shared by a specific group of users. When applied on a real-world website, the proposed combined method outperforms the methods where the reduction of question is done only by using association rule mining or only by observing distribution within the group.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We present an approach to automatically de-identify health records. In our approach, personal health information is identified using a Conditional Random Fields machine learning classifier, a large set of linguistic and lexical features, and pattern matching techniques. Identified personal information is then removed from the reports. The de-identification of personal health information is fundamental for the sharing and secondary use of electronic health records, for example for data mining and disease monitoring. The effectiveness of our approach is first evaluated on the 2007 i2b2 Shared Task dataset, a widely adopted dataset for evaluating de-identification techniques. Subsequently, we investigate the robustness of the approach to limited training data; we study its effectiveness on different type and quality of data by evaluating the approach on scanned pathology reports from an Australian institution. This data contains optical character recognition errors, as well as linguistic conventions that differ from those contained in the i2b2 dataset, for example different date formats. The findings suggest that our approach compares to the best approach from the 2007 i2b2 Shared Task; in addition, the approach is found to be robust to variations of training size, data type and quality in presence of sufficient training data.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Although recommender systems and reputation systems have quite different theoretical and technical bases, both types of systems have the purpose of providing advice for decision making in e-commerce and online service environments. The similarity in purpose makes it natural to integrate both types of systems in order to produce better online advice, but their difference in theory and implementation makes the integration challenging. In this paper, we propose to use mappings to subjective opinions from values produced by recommender systems as well as from scores produced by reputation systems, and to combine the resulting opinions within the framework of subjective logic.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Newsletter ACM SIGIR Forum: The Seventeenth Australian Document Computing Symposium was held in Dunedin, New Zealand on the 5th and 6th of December 2012. In total twenty four papers were submitted. From those eleven were accepted for full presentation and 8 for short presentation. A poster session was held jointly with the Australasian Language Technology Workshop.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

With the growing size and variety of social media files on the web, it’s becoming critical to efficiently organize them into clusters for further processing. This paper presents a novel scalable constrained document clustering method that harnesses the power of search engines capable of dealing with large text data. Instead of calculating distance between the documents and all of the clusters’ centroids, a neighborhood of best cluster candidates is chosen using a document ranking scheme. To make the method faster and less memory dependable, the in-memory and in-database processing are combined in a semi-incremental manner. This method has been extensively tested in the social event detection application. Empirical analysis shows that the proposed method is efficient both in computation and memory usage while producing notable accuracy.