954 resultados para educational data mining


Relevância:

100.00% 100.00%

Publicador:

Resumo:

An Automatic Vehicle Location (AVL) system is a computer-based vehicle tracking system that is capable of determining a vehicle's location in real time. As a major technology of the Advanced Public Transportation System (APTS), AVL systems have been widely deployed by transit agencies for purposes such as real-time operation monitoring, computer-aided dispatching, and arrival time prediction. AVL systems make a large amount of transit performance data available that are valuable for transit performance management and planning purposes. However, the difficulties of extracting useful information from the huge spatial-temporal database have hindered off-line applications of the AVL data. ^ In this study, a data mining process, including data integration, cluster analysis, and multiple regression, is proposed. The AVL-generated data are first integrated into a Geographic Information System (GIS) platform. The model-based cluster method is employed to investigate the spatial and temporal patterns of transit travel speeds, which may be easily translated into travel time. The transit speed variations along the route segments are identified. Transit service periods such as morning peak, mid-day, afternoon peak, and evening periods are determined based on analyses of transit travel speed variations for different times of day. The seasonal patterns of transit performance are investigated by using the analysis of variance (ANOVA). Travel speed models based on the clustered time-of-day intervals are developed using important factors identified as having significant effects on speed for different time-of-day periods. ^ It has been found that transit performance varied from different seasons and different time-of-day periods. The geographic location of a transit route segment also plays a role in the variation of the transit performance. The results of this research indicate that advanced data mining techniques have good potential in providing automated techniques of assisting transit agencies in service planning, scheduling, and operations control. ^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Online Social Network (OSN) services provided by Internet companies bring people together to chat, share the information, and enjoy the information. Meanwhile, huge amounts of data are generated by those services (they can be regarded as the social media ) every day, every hour, even every minute, and every second. Currently, researchers are interested in analyzing the OSN data, extracting interesting patterns from it, and applying those patterns to real-world applications. However, due to the large-scale property of the OSN data, it is difficult to effectively analyze it. This dissertation focuses on applying data mining and information retrieval techniques to mine two key components in the social media data — users and user-generated contents. Specifically, it aims at addressing three problems related to the social media users and contents: (1) how does one organize the users and the contents? (2) how does one summarize the textual contents so that users do not have to go over every post to capture the general idea? (3) how does one identify the influential users in the social media to benefit other applications, e.g., Marketing Campaign? The contribution of this dissertation is briefly summarized as follows. (1) It provides a comprehensive and versatile data mining framework to analyze the users and user-generated contents from the social media. (2) It designs a hierarchical co-clustering algorithm to organize the users and contents. (3) It proposes multi-document summarization methods to extract core information from the social network contents. (4) It introduces three important dimensions of social influence, and a dynamic influence model for identifying influential users.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Online Social Network (OSN) services provided by Internet companies bring people together to chat, share the information, and enjoy the information. Meanwhile, huge amounts of data are generated by those services (they can be regarded as the social media ) every day, every hour, even every minute, and every second. Currently, researchers are interested in analyzing the OSN data, extracting interesting patterns from it, and applying those patterns to real-world applications. However, due to the large-scale property of the OSN data, it is difficult to effectively analyze it. This dissertation focuses on applying data mining and information retrieval techniques to mine two key components in the social media data — users and user-generated contents. Specifically, it aims at addressing three problems related to the social media users and contents: (1) how does one organize the users and the contents? (2) how does one summarize the textual contents so that users do not have to go over every post to capture the general idea? (3) how does one identify the influential users in the social media to benefit other applications, e.g., Marketing Campaign? The contribution of this dissertation is briefly summarized as follows. (1) It provides a comprehensive and versatile data mining framework to analyze the users and user-generated contents from the social media. (2) It designs a hierarchical co-clustering algorithm to organize the users and contents. (3) It proposes multi-document summarization methods to extract core information from the social network contents. (4) It introduces three important dimensions of social influence, and a dynamic influence model for identifying influential users.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Whilst atom probe tomography (APT) is a powerful technique with the capacity to gather information containing hundreds of millions of atoms from a single specimen, the ability to effectively use this information creates significant challenges. The main technological bottleneck lies in handling the extremely large amounts of data on spatial-chemical correlations, as well as developing new quantitative computational foundations for image reconstruction that target critical and transformative problems in materials science. The power to explore materials at the atomic scale with the extraordinary level of sensitivity of detection offered by atom probe tomography has not been not fully harnessed due to the challenges of dealing with missing, sparse and often noisy data. Hence there is a profound need to couple the analytical tools to deal with the data challenges with the experimental issues associated with this instrument. In this paper we provide a summary of some key issues associated with the challenges, and solutions to extract or "mine" fundamental materials science information from that data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

With the advanced technology of medical devices and sensors, an abundance of medical data streams are available. However, data analysis techniques are very limited, especially for processing massive multiple physiological streams that may only be understood by medical experts. The state-of-the-art techniques only allow multiple medical devices to independently monitor different physiological parameters for the patient's status, thus they signal too many false alarms, creating unnecessary noise, especially in the Intensive Care Unit (ICU). An effective solution which has been recently studied is to integrate information from multiple physiologic parameters to reduce alarms. But it is a challenge to detect abnormalities from high frequently changed physiological streams data, since abnormalities occur gradually due to the complex situation of patients. An analysis of ICU physiological data streams shows that many vital physiological parameters are changed periodically (such as heart rate, arterial pressure, and respiratory impedance) and thus abnormalities are generally abnormal period patterns. In this paper, we develop a Mining Abnormal Period Patterns from Multiple Physiological Streams (MAPPMPS) method to detect and rank abnormalities in medical sensor streams. The efficiency and effectiveness of the MAPPMPS method is demonstrated by a real-world massive database of multiple physiological streams sampled in ICU, comprising 250 patients' streams (each stream involving over 1.3 million data points) with a total size of 28 GB data.

Relevância:

90.00% 90.00%

Publicador:

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Light Detection and Ranging (LIDAR) has great potential to assist vegetation management in power line corridors by providing more accurate geometric information of the power line assets and vegetation along the corridors. However, the development of algorithms for the automatic processing of LIDAR point cloud data, in particular for feature extraction and classification of raw point cloud data, is in still in its infancy. In this paper, we take advantage of LIDAR intensity and try to classify ground and non-ground points by statistically analyzing the skewness and kurtosis of the intensity data. Moreover, the Hough transform is employed to detected power lines from the filtered object points. The experimental results show the effectiveness of our methods and indicate that better results were obtained by using LIDAR intensity data than elevation data.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

It is a big challenge to clearly identify the boundary between positive and negative streams. Several attempts have used negative feedback to solve this challenge; however, there are two issues for using negative relevance feedback to improve the effectiveness of information filtering. The first one is how to select constructive negative samples in order to reduce the space of negative documents. The second issue is how to decide noisy extracted features that should be updated based on the selected negative samples. This paper proposes a pattern mining based approach to select some offenders from the negative documents, where an offender can be used to reduce the side effects of noisy features. It also classifies extracted features (i.e., terms) into three categories: positive specific terms, general terms, and negative specific terms. In this way, multiple revising strategies can be used to update extracted features. An iterative learning algorithm is also proposed to implement this approach on RCV1, and substantial experiments show that the proposed approach achieves encouraging performance.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Dealing with the ever-growing information overload in the Internet, Recommender Systems are widely used online to suggest potential customers item they may like or find useful. Collaborative Filtering is the most popular techniques for Recommender Systems which collects opinions from customers in the form of ratings on items, services or service providers. In addition to the customer rating about a service provider, there is also a good number of online customer feedback information available over the Internet as customer reviews, comments, newsgroups post, discussion forums or blogs which is collectively called user generated contents. This information can be used to generate the public reputation of the service providers’. To do this, data mining techniques, specially recently emerged opinion mining could be a useful tool. In this paper we present a state of the art review of Opinion Mining from online customer feedback. We critically evaluate the existing work and expose cutting edge area of interest in opinion mining. We also classify the approaches taken by different researchers into several categories and sub-categories. Each of those steps is analyzed with their strength and limitations in this paper.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

An information filtering (IF) system monitors an incoming document stream to find the documents that match the information needs specified by the user profiles. To learn to use the user profiles effectively is one of the most challenging tasks when developing an IF system. With the document selection criteria better defined based on the users’ needs, filtering large streams of information can be more efficient and effective. To learn the user profiles, term-based approaches have been widely used in the IF community because of their simplicity and directness. Term-based approaches are relatively well established. However, these approaches have problems when dealing with polysemy and synonymy, which often lead to an information overload problem. Recently, pattern-based approaches (or Pattern Taxonomy Models (PTM) [160]) have been proposed for IF by the data mining community. These approaches are better at capturing sematic information and have shown encouraging results for improving the effectiveness of the IF system. On the other hand, pattern discovery from large data streams is not computationally efficient. Also, these approaches had to deal with low frequency pattern issues. The measures used by the data mining technique (for example, “support” and “confidences”) to learn the profile have turned out to be not suitable for filtering. They can lead to a mismatch problem. This thesis uses the rough set-based reasoning (term-based) and pattern mining approach as a unified framework for information filtering to overcome the aforementioned problems. This system consists of two stages - topic filtering and pattern mining stages. The topic filtering stage is intended to minimize information overloading by filtering out the most likely irrelevant information based on the user profiles. A novel user-profiles learning method and a theoretical model of the threshold setting have been developed by using rough set decision theory. The second stage (pattern mining) aims at solving the problem of the information mismatch. This stage is precision-oriented. A new document-ranking function has been derived by exploiting the patterns in the pattern taxonomy. The most likely relevant documents were assigned higher scores by the ranking function. Because there is a relatively small amount of documents left after the first stage, the computational cost is markedly reduced; at the same time, pattern discoveries yield more accurate results. The overall performance of the system was improved significantly. The new two-stage information filtering model has been evaluated by extensive experiments. Tests were based on the well-known IR bench-marking processes, using the latest version of the Reuters dataset, namely, the Reuters Corpus Volume 1 (RCV1). The performance of the new two-stage model was compared with both the term-based and data mining-based IF models. The results demonstrate that the proposed information filtering system outperforms significantly the other IF systems, such as the traditional Rocchio IF model, the state-of-the-art term-based models, including the BM25, Support Vector Machines (SVM), and Pattern Taxonomy Model (PTM).

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Traffic safety is a major concern world-wide. It is in both the sociological and economic interests of society that attempts should be made to identify the major and multiple contributory factors to those road crashes. This paper presents a text mining based method to better understand the contextual relationships inherent in road crashes. By examining and analyzing the crash report data in Queensland from year 2004 and year 2005, this paper identifies and reports the major and multiple contributory factors to those crashes. The outcome of this study will support road asset management in reducing road crashes.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Abstract With the phenomenal growth of electronic data and information, there are many demands for the development of efficient and effective systems (tools) to perform the issue of data mining tasks on multidimensional databases. Association rules describe associations between items in the same transactions (intra) or in different transactions (inter). Association mining attempts to find interesting or useful association rules in databases: this is the crucial issue for the application of data mining in the real world. Association mining can be used in many application areas, such as the discovery of associations between customers’ locations and shopping behaviours in market basket analysis. Association mining includes two phases. The first phase, called pattern mining, is the discovery of frequent patterns. The second phase, called rule generation, is the discovery of interesting and useful association rules in the discovered patterns. The first phase, however, often takes a long time to find all frequent patterns; these also include much noise. The second phase is also a time consuming activity that can generate many redundant rules. To improve the quality of association mining in databases, this thesis provides an alternative technique, granule-based association mining, for knowledge discovery in databases, where a granule refers to a predicate that describes common features of a group of transactions. The new technique first transfers transaction databases into basic decision tables, then uses multi-tier structures to integrate pattern mining and rule generation in one phase for both intra and inter transaction association rule mining. To evaluate the proposed new technique, this research defines the concept of meaningless rules by considering the co-relations between data-dimensions for intratransaction-association rule mining. It also uses precision to evaluate the effectiveness of intertransaction association rules. The experimental results show that the proposed technique is promising.