852 resultados para 080109 Pattern Recognition and Data Mining
Resumo:
* The research is supported partly by INTAS: 04-77-7173 project, http://www.intas.be
Resumo:
In this work the new pattern recognition method based on the unification of algebraic and statistical approaches is described. The main point of the method is the voting procedure upon the statistically weighted regularities, which are linear separators in two-dimensional projections of feature space. The report contains brief description of the theoretical foundations of the method, description of its software realization and the results of series of experiments proving its usefulness in practical tasks.
Resumo:
Dimensionality reduction is a very important step in the data mining process. In this paper, we consider feature extraction for classification tasks as a technique to overcome problems occurring because of “the curse of dimensionality”. Three different eigenvector-based feature extraction approaches are discussed and three different kinds of applications with respect to classification tasks are considered. The summary of obtained results concerning the accuracy of classification schemes is presented with the conclusion about the search for the most appropriate feature extraction method. The problem how to discover knowledge needed to integrate the feature extraction and classification processes is stated. A decision support system to aid in the integration of the feature extraction and classification processes is proposed. The goals and requirements set for the decision support system and its basic structure are defined. The means of knowledge acquisition needed to build up the proposed system are considered.
Resumo:
Categorising visitors based on their interaction with a website is a key problem in Web content usage. The clickstreams generated by various users often follow distinct patterns, the knowledge of which may help in providing customised content. This paper proposes an approach to clustering weblog data, based on ART2 neural networks. Due to the characteristics of the ART2 neural network model, the proposed approach can be used for unsupervised and self-learning data mining, which makes it adaptable to dynamically changing websites.
Resumo:
AMS Subj. Classification: 62P10, 62H30, 68T01
Resumo:
Report published in the Proceedings of the National Conference on "Education and Research in the Information Society", Plovdiv, May, 2014
Resumo:
Computer software plays an important role in business, government, society and sciences. To solve real-world problems, it is very important to measure the quality and reliability in the software development life cycle (SDLC). Software Engineering (SE) is the computing field concerned with designing, developing, implementing, maintaining and modifying software. The present paper gives an overview of the Data Mining (DM) techniques that can be applied to various types of SE data in order to solve the challenges posed by SE tasks such as programming, bug detection, debugging and maintenance. A specific DM software is discussed, namely one of the analytical tools for analyzing data and summarizing the relationships that have been identified. The paper concludes that the proposed techniques of DM within the domain of SE could be well applied in fields such as Customer Relationship Management (CRM), eCommerce and eGovernment. ACM Computing Classification System (1998): H.2.8.
Resumo:
The nation's freeway systems are becoming increasingly congested. A major contribution to traffic congestion on freeways is due to traffic incidents. Traffic incidents are non-recurring events such as accidents or stranded vehicles that cause a temporary roadway capacity reduction, and they can account for as much as 60 percent of all traffic congestion on freeways. One major freeway incident management strategy involves diverting traffic to avoid incident locations by relaying timely information through Intelligent Transportation Systems (ITS) devices such as dynamic message signs or real-time traveler information systems. The decision to divert traffic depends foremost on the expected duration of an incident, which is difficult to predict. In addition, the duration of an incident is affected by many contributing factors. Determining and understanding these factors can help the process of identifying and developing better strategies to reduce incident durations and alleviate traffic congestion. A number of research studies have attempted to develop models to predict incident durations, yet with limited success. ^ This dissertation research attempts to improve on this previous effort by applying data mining techniques to a comprehensive incident database maintained by the District 4 ITS Office of the Florida Department of Transportation (FDOT). Two categories of incident duration prediction models were developed: "offline" models designed for use in the performance evaluation of incident management programs, and "online" models for real-time prediction of incident duration to aid in the decision making of traffic diversion in the event of an ongoing incident. Multiple data mining analysis techniques were applied and evaluated in the research. The multiple linear regression analysis and decision tree based method were applied to develop the offline models, and the rule-based method and a tree algorithm called M5P were used to develop the online models. ^ The results show that the models in general can achieve high prediction accuracy within acceptable time intervals of the actual durations. The research also identifies some new contributing factors that have not been examined in past studies. As part of the research effort, software code was developed to implement the models in the existing software system of District 4 FDOT for actual applications. ^
Resumo:
This research is to establish new optimization methods for pattern recognition and classification of different white blood cells in actual patient data to enhance the process of diagnosis. Beckman-Coulter Corporation supplied flow cytometry data of numerous patients that are used as training sets to exploit the different physiological characteristics of the different samples provided. The methods of Support Vector Machines (SVM) and Artificial Neural Networks (ANN) were used as promising pattern classification techniques to identify different white blood cell samples and provide information to medical doctors in the form of diagnostic references for the specific disease states, leukemia. The obtained results prove that when a neural network classifier is well configured and trained with cross-validation, it can perform better than support vector classifiers alone for this type of data. Furthermore, a new unsupervised learning algorithm---Density based Adaptive Window Clustering algorithm (DAWC) was designed to process large volumes of data for finding location of high data cluster in real-time. It reduces the computational load to ∼O(N) number of computations, and thus making the algorithm more attractive and faster than current hierarchical algorithms.
Resumo:
Electronic database handling of buisness information has gradually gained its popularity in the hospitality industry. This article provides an overview on the fundamental concepts of a hotel database and investigates the feasibility of incorporating computer-assisted data mining techniques into hospitality database applications. The author also exposes some potential myths associated with data mining in hospitaltiy database applications.
Resumo:
[EN]The classification speed of state-of-the-art classifiers such as SVM is an important aspect to be considered for emerging applications and domains such as data mining and human-computer interaction. Usually, a test-time speed increase in SVMs is achieved by somehow reducing the number of support vectors, which allows a faster evaluation of the decision function. In this paper a novel approach is described for fast classification in a PCA+SVM scenario. In the proposed approach, classification of an unseen sample is performed incrementally in increasingly larger feature spaces. As soon as the classification confidence is above a threshold the process stops and the class label is retrieved...
Resumo:
In this talk, I will describe various computational modelling and data mining solutions that form the basis of how the office of Deputy Head of Department (Resources) works to serve you. These include lessons I learn about, and from, optimisation issues in resource allocation, uncertainty analysis on league tables, modelling the process of winning external grants, and lessons we learn from student satisfaction surveys, some of which I have attempted to inject into our planning processes.
Resumo:
Data mining, as a heatedly discussed term, has been studied in various fields. Its possibilities in refining the decision-making process, realizing potential patterns and creating valuable knowledge have won attention of scholars and practitioners. However, there are less studies intending to combine data mining and libraries where data generation occurs all the time. Therefore, this thesis plans to fill such a gap. Meanwhile, potential opportunities created by data mining are explored to enhance one of the most important elements of libraries: reference service. In order to thoroughly demonstrate the feasibility and applicability of data mining, literature is reviewed to establish a critical understanding of data mining in libraries and attain the current status of library reference service. The result of the literature review indicates that free online data resources other than data generated on social media are rarely considered to be applied in current library data mining mandates. Therefore, the result of the literature review motivates the presented study to utilize online free resources. Furthermore, the natural match between data mining and libraries is established. The natural match is explained by emphasizing the data richness reality and considering data mining as one kind of knowledge, an easy choice for libraries, and a wise method to overcome reference service challenges. The natural match, especially the aspect that data mining could be helpful for library reference service, lays the main theoretical foundation for the empirical work in this study. Turku Main Library was selected as the case to answer the research question: whether data mining is feasible and applicable for reference service improvement. In this case, the daily visit from 2009 to 2015 in Turku Main Library is considered as the resource for data mining. In addition, corresponding weather conditions are collected from Weather Underground, which is totally free online. Before officially being analyzed, the collected dataset is cleansed and preprocessed in order to ensure the quality of data mining. Multiple regression analysis is employed to mine the final dataset. Hourly visits are the independent variable and weather conditions, Discomfort Index and seven days in a week are dependent variables. In the end, four models in different seasons are established to predict visiting situations in each season. Patterns are realized in different seasons and implications are created based on the discovered patterns. In addition, library-climate points are generated by a clustering method, which simplifies the process for librarians using weather data to forecast library visiting situation. Then the data mining result is interpreted from the perspective of improving reference service. After this data mining work, the result of the case study is presented to librarians so as to collect professional opinions regarding the possibility of employing data mining to improve reference services. In the end, positive opinions are collected, which implies that it is feasible to utilizing data mining as a tool to enhance library reference service.
Resumo:
The incredible rapid development to huge volumes of air travel, mainly because of jet airliners that appeared to the sky in the 1950s, created the need for systematic research for aviation safety and collecting data about air traffic. The structured data can be analysed easily using queries from databases and running theseresults through graphic tools. However, in analysing narratives that often give more accurate information about the case, mining tools are needed. The analysis of textual data with computers has not been possible until data mining tools have been developed. Their use, at least among aviation, is still at a moderate level. The research aims at discovering lethal trends in the flight safety reports. The narratives of 1,200 flight safety reports from years 1994 – 1996 in Finnish were processed with three text mining tools. One of them was totally language independent, the other had a specific configuration for Finnish and the third originally created for English, but encouraging results had been achieved with Spanish and that is why a Finnish test was undertaken, too. The global rate of accidents is stabilising and the situation can now be regarded as satisfactory, but because of the growth in air traffic, the absolute number of fatal accidents per year might increase, if the flight safety will not be improved. The collection of data and reporting systems have reached their top level. The focal point in increasing the flight safety is analysis. The air traffic has generally been forecasted to grow 5 – 6 per cent annually over the next two decades. During this period, the global air travel will probably double also with relatively conservative expectations of economic growth. This development makes the airline management confront growing pressure due to increasing competition, signify cant rise in fuel prices and the need to reduce the incident rate due to expected growth in air traffic volumes. All this emphasises the urgent need for new tools and methods. All systems provided encouraging results, as well as proved challenges still to be won. Flight safety can be improved through the development and utilisation of sophisticated analysis tools and methods, like data mining, using its results supporting the decision process of the executives.
Resumo:
In this work we focus on pattern recognition methods related to EMG upper-limb prosthetic control. After giving a detailed review of the most widely used classification methods, we propose a new classification approach. It comes as a result of comparison in the Fourier analysis between able-bodied and trans-radial amputee subjects. We thus suggest a different classification method which considers each surface electrodes contribute separately, together with five time domain features, obtaining an average classification accuracy equals to 75% on a sample of trans-radial amputees. We propose an automatic feature selection procedure as a minimization problem in order to improve the method and its robustness.