954 resultados para educational data mining


Relevância:

90.00% 90.00%

Publicador:

Resumo:

The idea of extracting knowledge in process mining is a descendant of data mining. Both mining disciplines emphasise data flow and relations among elements in the data. Unfortunately, challenges have been encountered when working with the data flow and relations. One of the challenges is that the representation of the data flow between a pair of elements or tasks is insufficiently simplified and formulated, as it considers only a one-to-one data flow relation. In this paper, we discuss how the effectiveness of knowledge representation can be extended in both disciplines. To this end, we introduce a new representation of the data flow and dependency formulation using a flow graph. The flow graph solves the issue of the insufficiency of presenting other relation types, such as many-to-one and one-to-many relations. As an experiment, a new evaluation framework is applied to the Teleclaim process in order to show how this method can provide us with more precise results when compared with other representations.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Over the past decade, many powerful data mining techniques have been developed to analyze temporal and sequential data. The time is now fertile for addressing problems of larger scope under the purview of temporal data mining. The fourth SIGKDD workshop on temporal data mining focused on the question: What can we infer about the structure of a complex dynamical system from observed temporal data? The goals of the workshop were to critically evaluate the need in this area by bringing together leading researchers from industry and academia, and to identify promising technologies and methodologies for doing the same. We provide a brief summary of the workshop proceedings and ideas arising out of the discussions.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The problem of classification of time series data is an interesting problem in the field of data mining. Even though several algorithms have been proposed for the problem of time series classification we have developed an innovative algorithm which is computationally fast and accurate in several cases when compared with 1NN classifier. In our method we are calculating the fuzzy membership of each test pattern to be classified to each class. We have experimented with 6 benchmark datasets and compared our method with 1NN classifier.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

No presente trabalho foram utilizados modelos de classificação para minerar dados relacionados à aprendizagem de Matemática e ao perfil de professores do ensino fundamental. Mais especificamente, foram abordados os fatores referentes aos educadores do Estado do Rio de Janeiro que influenciam positivamente e negativamente no desempenho dos alunos do 9 ano do ensino básico nas provas de Matemática. Os dados utilizados para extrair estas informações são disponibilizados pelo Instituto Nacional de Estudos e Pesquisas Educacionais Anísio Teixeira que avalia o sistema educacional brasileiro em diversos níveis e modalidades de ensino, incluindo a Educação Básica, cuja avaliação, que foi foco deste estudo, é realizada pela Prova Brasil. A partir desta base, foi aplicado o processo de Descoberta de Conhecimento em Bancos de Dados (KDD - Knowledge Discovery in Databases), composto das etapas de preparação, mineração e pós-processamento dos dados. Os padrões foram extraídos dos modelos de classificação gerados pelas técnicas árvore de decisão, indução de regras e classificadores Bayesianos, cujos algoritmos estão implementados no software Weka (Waikato Environment for Knowledge Analysis). Além disso, foram aplicados métodos de grupos e uma metodologia para tornar as classes uniformemente distribuídas, afim de melhorar a precisão dos modelos obtidos. Os resultados apresentaram importantes fatores que contribuem para o ensino-aprendizagem de Matemática, assim como evidenciaram aspectos que comprometem negativamente o desempenho dos discentes. Por fim, os resultados extraídos fornecem ao educador e elaborador de políticas públicas fatores para uma análise que os auxiliem em posteriores tomadas de decisão.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Compared with structured data sources that are usually stored and analyzed in spreadsheets, relational databases, and single data tables, unstructured construction data sources such as text documents, site images, web pages, and project schedules have been less intensively studied due to additional challenges in data preparation, representation, and analysis. In this paper, our vision for data management and mining addressing such challenges are presented, together with related research results from previous work, as well as our recent developments of data mining on text-based, web-based, image-based, and network-based construction databases.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

National Key Basic Research and Development Program of China [2006CB701305]; State Key Laboratory of Resource and Environment Information System [088RA400SA]; Chinese Academy of Sciences

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Clare, A., Williams, H. E. and Lester, N. M. (2004) Scalable Multi-Relational Association Mining. In proceedings of the 4th International Conference on Data Mining ICDM '04.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Ferr?, S. and King, R. D. (2004) A dichotomic search algorithm for mining and learning in domain-specific logics. Fundamenta Informaticae. IOS Press. To appear

Relevância:

90.00% 90.00%

Publicador:

Resumo:

R. Jensen, Q. Shen, Data Reduction with Rough Sets, In: Encyclopedia of Data Warehousing and Mining - 2nd Edition, Vol. II, 2008.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The aim of this research, which focused on the Irish adult population, was to generate information for policymakers by applying statistical analyses and current technologies to oral health administrative and survey databases. Objectives included identifying socio-demographic influences on oral health and utilisation of dental services, comparing epidemiologically-estimated dental treatment need with treatment provided, and investigating the potential of a dental administrative database to provide information on utilisation of services and the volume and types of treatment provided over time. Information was extracted from the claims databases for the Dental Treatment Benefit Scheme (DTBS) for employed adults and the Dental Treatment Services Scheme (DTSS) for less-well-off adults, the National Surveys of Adult Oral Health, and the 2007 Survey of Lifestyle Attitudes and Nutrition in Ireland. Factors associated with utilisation and retention of natural teeth were analysed using count data models and logistic regression. The chi-square test and the student’s t-test were used to compare epidemiologically-estimated need in a representative sample of adults with treatment provided. Differences were found in dental care utilisation and tooth retention by Socio-Economic Status. An analysis of the five-year utilisation behaviour of a 2003 cohort of DTBS dental attendees revealed that age and being female were positively associated with visiting annually and number of treatments. Number of adults using the DTBS increased, and mean number of treatments per patient decreased, between 1997 and 2008. As a percentage of overall treatments, restorations, dentures, and extractions decreased, while prophylaxis increased. Differences were found between epidemiologically-estimated treatment need and treatment provided for those using the DTBS and DTSS. This research confirms the utility of survey and administrative data to generate knowledge for policymakers. Public administrative databases have not been designed for research purposes, but they have the potential to provide a wealth of knowledge on treatments provided and utilisation patterns.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

BACKGROUND: The inherent complexity of statistical methods and clinical phenomena compel researchers with diverse domains of expertise to work in interdisciplinary teams, where none of them have a complete knowledge in their counterpart's field. As a result, knowledge exchange may often be characterized by miscommunication leading to misinterpretation, ultimately resulting in errors in research and even clinical practice. Though communication has a central role in interdisciplinary collaboration and since miscommunication can have a negative impact on research processes, to the best of our knowledge, no study has yet explored how data analysis specialists and clinical researchers communicate over time. METHODS/PRINCIPAL FINDINGS: We conducted qualitative analysis of encounters between clinical researchers and data analysis specialists (epidemiologist, clinical epidemiologist, and data mining specialist). These encounters were recorded and systematically analyzed using a grounded theory methodology for extraction of emerging themes, followed by data triangulation and analysis of negative cases for validation. A policy analysis was then performed using a system dynamics methodology looking for potential interventions to improve this process. Four major emerging themes were found. Definitions using lay language were frequently employed as a way to bridge the language gap between the specialties. Thought experiments presented a series of "what if" situations that helped clarify how the method or information from the other field would behave, if exposed to alternative situations, ultimately aiding in explaining their main objective. Metaphors and analogies were used to translate concepts across fields, from the unfamiliar to the familiar. Prolepsis was used to anticipate study outcomes, thus helping specialists understand the current context based on an understanding of their final goal. CONCLUSION/SIGNIFICANCE: The communication between clinical researchers and data analysis specialists presents multiple challenges that can lead to errors.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

An enterprise information system (EIS) is an integrated data-applications platform characterized by diverse, heterogeneous, and distributed data sources. For many enterprises, a number of business processes still depend heavily on static rule-based methods and extensive human expertise. Enterprises are faced with the need for optimizing operation scheduling, improving resource utilization, discovering useful knowledge, and making data-driven decisions.

This thesis research is focused on real-time optimization and knowledge discovery that addresses workflow optimization, resource allocation, as well as data-driven predictions of process-execution times, order fulfillment, and enterprise service-level performance. In contrast to prior work on data analytics techniques for enterprise performance optimization, the emphasis here is on realizing scalable and real-time enterprise intelligence based on a combination of heterogeneous system simulation, combinatorial optimization, machine-learning algorithms, and statistical methods.

On-demand digital-print service is a representative enterprise requiring a powerful EIS.We use real-life data from Reischling Press, Inc. (RPI), a digit-print-service provider (PSP), to evaluate our optimization algorithms.

In order to handle the increase in volume and diversity of demands, we first present a high-performance, scalable, and real-time production scheduling algorithm for production automation based on an incremental genetic algorithm (IGA). The objective of this algorithm is to optimize the order dispatching sequence and balance resource utilization. Compared to prior work, this solution is scalable for a high volume of orders and it provides fast scheduling solutions for orders that require complex fulfillment procedures. Experimental results highlight its potential benefit in reducing production inefficiencies and enhancing the productivity of an enterprise.

We next discuss analysis and prediction of different attributes involved in hierarchical components of an enterprise. We start from a study of the fundamental processes related to real-time prediction. Our process-execution time and process status prediction models integrate statistical methods with machine-learning algorithms. In addition to improved prediction accuracy compared to stand-alone machine-learning algorithms, it also performs a probabilistic estimation of the predicted status. An order generally consists of multiple series and parallel processes. We next introduce an order-fulfillment prediction model that combines advantages of multiple classification models by incorporating flexible decision-integration mechanisms. Experimental results show that adopting due dates recommended by the model can significantly reduce enterprise late-delivery ratio. Finally, we investigate service-level attributes that reflect the overall performance of an enterprise. We analyze and decompose time-series data into different components according to their hierarchical periodic nature, perform correlation analysis,

and develop univariate prediction models for each component as well as multivariate models for correlated components. Predictions for the original time series are aggregated from the predictions of its components. In addition to a significant increase in mid-term prediction accuracy, this distributed modeling strategy also improves short-term time-series prediction accuracy.

In summary, this thesis research has led to a set of characterization, optimization, and prediction tools for an EIS to derive insightful knowledge from data and use them as guidance for production management. It is expected to provide solutions for enterprises to increase reconfigurability, accomplish more automated procedures, and obtain data-driven recommendations or effective decisions.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The last decade has witnessed an unprecedented growth in availability of data having spatio-temporal characteristics. Given the scale and richness of such data, finding spatio-temporal patterns that demonstrate significantly different behavior from their neighbors could be of interest for various application scenarios such as – weather modeling, analyzing spread of disease outbreaks, monitoring traffic congestions, and so on. In this paper, we propose an automated approach of exploring and discovering such anomalous patterns irrespective of the underlying domain from which the data is recovered. Our approach differs significantly from traditional methods of spatial outlier detection, and employs two phases – i) discovering homogeneous regions, and ii) evaluating these regions as anomalies based on their statistical difference from a generalized neighborhood. We evaluate the quality of our approach and distinguish it from existing techniques via an extensive experimental evaluation.