792 resultados para Pattern mining


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Il presente lavoro nasce dall’obiettivo di individuare strumenti statistici per indagare, sotto diversi aspetti, il flusso di lavoro di un Laboratorio di Anatomia Patologica. Il punto di partenza dello studio è l’ambiente di lavoro di ATHENA, software gestionale utilizzato nell’Anatomia Patologica, sviluppato dalla NoemaLife S.p.A., azienda specializzata nell’informatica per la sanità. A partire da tale applicativo è stato innanzitutto formalizzato il workflow del laboratorio (Capitolo 2), nelle sue caratteristiche e nelle sue possibili varianti, identificando le operazioni principali attraverso una serie di “fasi”. Proprio le fasi, unitamente alle informazioni addizionali ad esse associate, saranno per tutta la trattazione e sotto diversi punti di vista al centro dello studio. L’analisi che presentiamo è stata per completezza sviluppata in due scenari che tengono conto di diversi aspetti delle informazioni in possesso. Il primo scenario tiene conto delle sequenze di fasi, che si presentano nel loro ordine cronologico, comprensive di eventuali ripetizioni o cicli di fasi precedenti alla conclusione. Attraverso l’elaborazione dei dati secondo specifici formati è stata svolta un’iniziale indagine grafica di Workflow Mining (Capitolo 3) grazie all’ausilio di EMiT, un software che attraverso un set di log di processo restituisce graficamente il flusso di lavoro che li rappresenta. Questa indagine consente già di valutare la completezza dell’utilizzo di un applicativo rispetto alle sue potenzialità. Successivamente, le stesse fasi sono state elaborate attraverso uno specifico adattamento di un comune algoritmo di allineamento globale, l’algoritmo Needleman-Wunsch (Capitolo 4). L’utilizzo delle tecniche di allineamento applicate a sequenze di processo è in grado di individuare, nell’ambito di una specifica codifica delle fasi, le similarità tra casi clinici. L’algoritmo di Needleman-Wunsch individua le identità e le discordanze tra due stringhe di caratteri, assegnando relativi punteggi che portano a valutarne la similarità. Tale algoritmo è stato opportunamente modificato affinché possa riconoscere e penalizzare differentemente cicli e ripetizioni, piuttosto che fasi mancanti. Sempre in ottica di allineamento sarà utilizzato l’algoritmo euristico Clustal, che a partire da un confronto pairwise tra sequenze costruisce un dendrogramma rappresentante graficamente l’aggregazione dei casi in funzione della loro similarità. Proprio il dendrogramma, per la sua struttura grafica ad albero, è in grado di mostrare intuitivamente l’andamento evolutivo della similarità di un pattern di casi. Il secondo scenario (Capitolo 5) aggiunge alle sequenze l’informazione temporale in termini di istante di esecuzione di ogni fase. Da un dominio basato su sequenze di fasi, si passa dunque ad uno scenario di serie temporali. I tempi rappresentano infatti un dato essenziale per valutare la performance di un laboratorio e per individuare la conformità agli standard richiesti. Il confronto tra i casi è stato effettuato con diverse modalità, in modo da stabilire la distanza tra tutte le coppie sotto diversi aspetti: le sequenze, rappresentate in uno specifico sistema di riferimento, sono state confrontate in base alla Distanza Euclidea ed alla Dynamic Time Warping, in grado di esprimerne le discordanze rispettivamente temporali, di forma e, dunque, di processo. Alla luce dei risultati e del loro confronto, saranno presentate già in questa fase le prime valutazioni sulla pertinenza delle distanze e sulle informazioni deducibili da esse. Il Capitolo 6 rappresenta la ricerca delle correlazioni tra elementi caratteristici del processo e la performance dello stesso. Svariati fattori come le procedure utilizzate, gli utenti coinvolti ed ulteriori specificità determinano direttamente o indirettamente la qualità del servizio erogato. Le distanze precedentemente calcolate vengono dunque sottoposte a clustering, una tecnica che a partire da un insieme eterogeneo di elementi individua famiglie o gruppi simili. L’algoritmo utilizzato sarà l’UPGMA, comunemente applicato nel clustering in quanto, utilizzando, una logica di medie pesate, porta a clusterizzazioni pertinenti anche in ambiti diversi, dal campo biologico a quello industriale. L’ottenimento dei cluster potrà dunque essere finalmente sottoposto ad un’attività di ricerca di correlazioni utili, che saranno individuate ed interpretate relativamente all’attività gestionale del laboratorio. La presente trattazione propone quindi modelli sperimentali adattati al caso in esame ma idealmente estendibili, interamente o in parte, a tutti i processi che presentano caratteristiche analoghe.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Il problema relativo alla predizione, la ricerca di pattern predittivi all‘interno dei dati, è stato studiato ampiamente. Molte metodologie robuste ed efficienti sono state sviluppate, procedimenti che si basano sull‘analisi di informazioni numeriche strutturate. Quella testuale, d‘altro canto, è una tipologia di informazione fortemente destrutturata. Quindi, una immediata conclusione, porterebbe a pensare che per l‘analisi predittiva su dati testuali sia necessario sviluppare metodi completamente diversi da quelli ben noti dalle tecniche di data mining. Un problema di predizione può essere risolto utilizzando invece gli stessi metodi : dati testuali e documenti possono essere trasformati in valori numerici, considerando per esempio l‘assenza o la presenza di termini, rendendo di fatto possibile una utilizzazione efficiente delle tecniche già sviluppate. Il text mining abilita la congiunzione di concetti da campi di applicazione estremamente eterogenei. Con l‘immensa quantità di dati testuali presenti, basti pensare, sul World Wide Web, ed in continua crescita a causa dell‘utilizzo pervasivo di smartphones e computers, i campi di applicazione delle analisi di tipo testuale divengono innumerevoli. L‘avvento e la diffusione dei social networks e della pratica di micro blogging abilita le persone alla condivisione di opinioni e stati d‘animo, creando un corpus testuale di dimensioni incalcolabili aggiornato giornalmente. Le nuove tecniche di Sentiment Analysis, o Opinion Mining, si occupano di analizzare lo stato emotivo o la tipologia di opinione espressa all‘interno di un documento testuale. Esse sono discipline attraverso le quali, per esempio, estrarre indicatori dello stato d‘animo di un individuo, oppure di un insieme di individui, creando una rappresentazione dello stato emotivo sociale. L‘andamento dello stato emotivo sociale può condizionare macroscopicamente l‘evolvere di eventi globali? Studi in campo di Economia e Finanza Comportamentale assicurano un legame fra stato emotivo, capacità nel prendere decisioni ed indicatori economici. Grazie alle tecniche disponibili ed alla mole di dati testuali continuamente aggiornati riguardanti lo stato d‘animo di milioni di individui diviene possibile analizzare tali correlazioni. In questo studio viene costruito un sistema per la previsione delle variazioni di indici di borsa, basandosi su dati testuali estratti dalla piattaforma di microblogging Twitter, sotto forma di tweets pubblici; tale sistema include tecniche di miglioramento della previsione basate sullo studio di similarità dei testi, categorizzandone il contributo effettivo alla previsione.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Analisi e applicazione dei processi di data mining al flusso informativo di sistemi real-time. Implementazione e analisi di un algoritmo autoadattivo per la ricerca di frequent patterns su macchine automatiche.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this position paper, we claim that the need for time consuming data preparation and result interpretation tasks in knowledge discovery, as well as for costly expert consultation and consensus building activities required for ontology building can be reduced through exploiting the interplay of data mining and ontology engineering. The aim is to obtain in a semi-automatic way new knowledge from distributed data sources that can be used for inference and reasoning, as well as to guide the extraction of further knowledge from these data sources. The proposed approach is based on the creation of a novel knowledge discovery method relying on the combination, through an iterative ?feedbackloop?, of (a) data mining techniques to make emerge implicit models from data and (b) pattern-based ontology engineering to capture these models in reusable, conceptual and inferable artefacts.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

There has been an increased demand for characterizing user access patterns using web mining techniques since the informative knowledge extracted from web server log files can not only offer benefits for web site structure improvement but also for better understanding of user navigational behavior. In this paper, we present a web usage mining method, which utilize web user usage and page linkage information to capture user access pattern based on Probabilistic Latent Semantic Analysis (PLSA) model. A specific probabilistic model analysis algorithm, EM algorithm, is applied to the integrated usage data to infer the latent semantic factors as well as generate user session clusters for revealing user access patterns. Experiments have been conducted on real world data set to validate the effectiveness of the proposed approach. The results have shown that the presented method is capable of characterizing the latent semantic factors and generating user profile in terms of weighted page vectors, which may reflect the common access interest exhibited by users among same session cluster.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Pattern discovery in temporal event sequences is of great importance in many application domains, such as telecommunication network fault analysis. In reality, not every type of event has an accurate timestamp. Some of them, defined as inaccurate events may only have an interval as possible time of occurrence. The existence of inaccurate events may cause uncertainty in event ordering. The traditional support model cannot deal with this uncertainty, which would cause some interesting patterns to be missing. A new concept, precise support, is introduced to evaluate the probability of a pattern contained in a sequence. Based on this new metric, we define the uncertainty model and present an algorithm to discover interesting patterns in the sequence database that has one type of inaccurate event. In our model, the number of types of inaccurate events can be extended to k readily, however, at a cost of increasing computational complexity.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A study of information available on the settlement characteristics of backfill in restored opencast coal mining sites and other similar earthworks projects has been undertaken. In addition, the methods of opencast mining, compaction controls, monitoring and test methods have been reviewed. To consider and develop the methods of predicting the settlement of fill, three sites in the West Midlands have been examined; at each, the backfill had been placed in a controlled manner. In addition, use has been made of a finite element computer program to compare a simple two-dimensional linear elastic analysis with field observations of surface settlements in the vicinity of buried highwalls. On controlled backfill sites, settlement predictions have been accurately made, based on a linear relationship between settlement (expressed as a percentage of fill height) against logarithm of time. This `creep' settlement was found to be effectively complete within 18 months of restoration. A decrease of this percentage settlement was observed with increasing fill thickness; this is believed to be related to the speed with which the backfill is placed. A rising water table within the backfill is indicated to cause additional gradual settlement. A prediction method, based on settlement monitoring, has been developed and used to determine the pattern of settlement across highwalls and buried highwalls. The zone of appreciable differential settlement was found to be mainly limited to the highwall area, the magnitude was dictated by the highwall inclination. With a backfill cover of about 15 metres over a buried highwall the magnitude of differential settlement was negligible. Use has been made of the proposed settlement prediction method and monitoring to control the re-development of restored opencase sites. The specifications, tests and monitoring techniques developed in recent years have been used to aid this. Such techniques have been valuable in restoring land previously derelict due to past underground mining.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

* The work is partially supported by Grant no. NIP917 of the Ministry of Science and Education – Republic of Bulgaria.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

With advances in science and technology, computing and business intelligence (BI) systems are steadily becoming more complex with an increasing variety of heterogeneous software and hardware components. They are thus becoming progressively more difficult to monitor, manage and maintain. Traditional approaches to system management have largely relied on domain experts through a knowledge acquisition process that translates domain knowledge into operating rules and policies. It is widely acknowledged as a cumbersome, labor intensive, and error prone process, besides being difficult to keep up with the rapidly changing environments. In addition, many traditional business systems deliver primarily pre-defined historic metrics for a long-term strategic or mid-term tactical analysis, and lack the necessary flexibility to support evolving metrics or data collection for real-time operational analysis. There is thus a pressing need for automatic and efficient approaches to monitor and manage complex computing and BI systems. To realize the goal of autonomic management and enable self-management capabilities, we propose to mine system historical log data generated by computing and BI systems, and automatically extract actionable patterns from this data. This dissertation focuses on the development of different data mining techniques to extract actionable patterns from various types of log data in computing and BI systems. Four key problems—Log data categorization and event summarization, Leading indicator identification , Pattern prioritization by exploring the link structures , and Tensor model for three-way log data are studied. Case studies and comprehensive experiments on real application scenarios and datasets are conducted to show the effectiveness of our proposed approaches.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Many systems and applications are continuously producing events. These events are used to record the status of the system and trace the behaviors of the systems. By examining these events, system administrators can check the potential problems of these systems. If the temporal dynamics of the systems are further investigated, the underlying patterns can be discovered. The uncovered knowledge can be leveraged to predict the future system behaviors or to mitigate the potential risks of the systems. Moreover, the system administrators can utilize the temporal patterns to set up event management rules to make the system more intelligent. With the popularity of data mining techniques in recent years, these events grad- ually become more and more useful. Despite the recent advances of the data mining techniques, the application to system event mining is still in a rudimentary stage. Most of works are still focusing on episodes mining or frequent pattern discovering. These methods are unable to provide a brief yet comprehensible summary to reveal the valuable information from the high level perspective. Moreover, these methods provide little actionable knowledge to help the system administrators to better man- age the systems. To better make use of the recorded events, more practical techniques are required. From the perspective of data mining, three correlated directions are considered to be helpful for system management: (1) Provide concise yet comprehensive summaries about the running status of the systems; (2) Make the systems more intelligence and autonomous; (3) Effectively detect the abnormal behaviors of the systems. Due to the richness of the event logs, all these directions can be solved in the data-driven manner. And in this way, the robustness of the systems can be enhanced and the goal of autonomous management can be approached. This dissertation mainly focuses on the foregoing directions that leverage tem- poral mining techniques to facilitate system management. More specifically, three concrete topics will be discussed, including event, resource demand prediction, and streaming anomaly detection. Besides the theoretic contributions, the experimental evaluation will also be presented to demonstrate the effectiveness and efficacy of the corresponding solutions.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Acid drainage influence on the water and sediment quality was investigated in a coal mining area (southern Brazil). Mine drainage showed pH between 3.2 and 4.6 and elevated concentrations of sulfate, As and metals, of which, Fe, Mn and Zn exceeded the limits for the emission of effluents stated in the Brazilian legislation. Arsenic also exceeded the limit, but only slightly. Groundwater monitoring wells from active mines and tailings piles showed pH interval and chemical concentrations similar to those of mine drainage. However, the river and ground water samples of municipal public water supplies revealed a pH range from 7.2 to 7.5 and low chemical concentrations, although Cd concentration slightly exceeded the limit adopted by Brazilian legislation for groundwater. In general, surface waters showed large pH range (6 to 10.8), and changes caused by acid drainage in the chemical composition of these waters were not very significant. Locally, acid drainage seemed to have dissolved carbonate rocks present in the local stratigraphic sequence, attenuating the dispersion of metals and As. Stream sediments presented anomalies of these elements, which were strongly dependent on the proximity of tailings piles and abandoned mines. We found that precipitation processes in sediments and the dilution of dissolved phases were responsible for the attenuation of the concentrations of the metals and As in the acid drainage and river water mixing zone. In general, a larger influence of mining activities on the chemical composition of the surface waters and sediments was observed when enrichment factors in relation to regional background levels were used.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Although cartilaginous tumors have low microvascular density, vessels are important for the provision of nutrition so that the tumor can grow and generate metastasis. The aim of this study was to assess the value of the vascular pattern classification as a prognostic tool in chondrosarcomas (CSs) and its relation with vascular endothelial growth factor (VEGF) expression. This was a retrospective study of 21 enchondromas and 57 conventional CSs. Clinical data and outcome were retrieved from medical files. CSs histologic grades (on a scale of 1 to 3) were determined according to the World Health Organization classification. The vascular pattern (on a scale of A to C) was assessed through CD34, according to Kalinski. CD105 and VEGF were also evaluated. Poor outcome was significantly associated with vascular pattern groups B and C. Higher vascular pattern were 6.5 times more frequent in moderate-grade and high-grade CSs than in grade 1 CS. On multivariate analysis, a clear correlation was found between VEGF overexpression and B/C vascular patterns. Only 18 (benign and malignant) tumors stained for CD105. The results point to the use of the vascular pattern classification as a prognostic tool in CSs and to differentiate low-grade from moderate-grade/high-grade CSs. Vascular pattern might be also used to complement histologic grade, VEGF immunostaining, and microvascular density, for indicating a patient's prognosis. Low-grade CSs develop under low neoangiogenesis, which conforms to the slow growth rate of these tumors.