Biblioteca Digital

986 results for Event Log Mining

Generalized log-gamma regression models with cure fraction

Relevance:

30.00%

Publisher:

Abstract:

In this paper, the generalized log-gamma regression model is modified to allow the possibility that long-term survivors may be present in the data. This modification leads to a generalized log-gamma regression model with a cure rate, encompassing, as special cases, the log-exponential, log-Weibull and log-normal regression models with a cure rate typically used to model such data. The models attempt to simultaneously estimate the effects of explanatory variables on the timing acceleration/deceleration of a given event and the surviving fraction, that is, the proportion of the population for which the event never occurs. The normal curvatures of local influence are derived under some usual perturbation schemes and two martingale-type residuals are proposed to assess departures from the generalized log-gamma error assumption as well as to detect outlying observations. Finally, a data set from the medical area is analyzed.
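To make the "surviving fraction" concrete, the standard mixture cure formulation is sketched below. The symbols π(x), S_0 and σ are our assumed notation, and the paper's exact parameterization (for instance, the link function used for π) may differ.

```latex
% Mixture cure formulation (assumed notation): \pi(\mathbf{x}) is the cured
% fraction and S_0 is the proper survival function of susceptible individuals.
S_{\mathrm{pop}}(t \mid \mathbf{x}) = \pi(\mathbf{x}) + \bigl[1 - \pi(\mathbf{x})\bigr] S_0(t \mid \mathbf{x}),
\qquad
\lim_{t \to \infty} S_{\mathrm{pop}}(t \mid \mathbf{x}) = \pi(\mathbf{x})

% Location-scale (accelerated failure time) form for susceptible individuals,
% with \varepsilon following a generalized log-gamma distribution:
\log T = \mathbf{x}^{\top}\boldsymbol{\beta} + \sigma \varepsilon, \qquad \sigma > 0
```

Under this form, a covariate that increases the linear predictor stretches the time to the event (deceleration), while π(x) is the proportion of the population for which the event never occurs.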

Aircraft interior failure pattern recognition utilizing text mining and neural networks

Relevance:

30.00%

Publisher:

Abstract:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Workflow Mining in an Anatomic Pathology Laboratory

Relevance:

30.00%

Publisher:

Abstract:

This work stems from the goal of identifying statistical tools to investigate, from several angles, the workflow of an Anatomic Pathology Laboratory. The starting point of the study is the working environment of ATHENA, a management software used in Anatomic Pathology and developed by NoemaLife S.p.A., a company specialized in healthcare informatics. Starting from this application, the laboratory workflow was first formalized (Chapter 2), including its characteristics and possible variants, by identifying the main operations as a series of "phases". These phases, together with the additional information associated with them, remain at the center of the study throughout, from several points of view. For completeness, the analysis was developed in two scenarios that take into account different aspects of the available information. The first scenario considers the sequences of phases in their chronological order, including any repetitions or cycles of earlier phases before completion. After converting the data into specific formats, an initial graphical Workflow Mining investigation was carried out (Chapter 3) with EMiT, a software tool that takes a set of process logs and graphically returns the workflow that represents them. This investigation already makes it possible to assess how fully an application is used relative to its potential. Subsequently, the same phases were processed with a specific adaptation of a common global alignment algorithm, the Needleman-Wunsch algorithm (Chapter 4). Applying alignment techniques to process sequences makes it possible, under a specific encoding of the phases, to identify similarities between clinical cases.
The Needleman-Wunsch algorithm identifies matches and mismatches between two character strings and assigns corresponding scores, from which their similarity is assessed. The algorithm was modified so that it recognizes cycles and repetitions and penalizes them differently from missing phases. Still from an alignment perspective, the heuristic Clustal algorithm is used: starting from pairwise comparisons between sequences, it builds a dendrogram that graphically represents how cases aggregate according to their similarity. Thanks to its tree structure, the dendrogram shows intuitively how similarity evolves across a pattern of cases. The second scenario (Chapter 5) adds temporal information to the sequences, namely the execution time of each phase. The analysis thus moves from a domain of phase sequences to one of time series. Times are essential for evaluating a laboratory's performance and for verifying compliance with the required standards. The cases were compared in several ways in order to establish the distance between all pairs from different perspectives: the sequences, represented in a specific reference system, were compared using the Euclidean Distance and Dynamic Time Warping, which express, respectively, their temporal differences and their differences in shape and, therefore, in process. In light of the results and their comparison, initial assessments of the suitability of each distance and of the information that can be derived from it are already presented at this stage. Chapter 6 addresses the search for correlations between characteristic elements of the process and its performance. Various factors, such as the procedures used, the users involved and other specific features, directly or indirectly determine the quality of the service provided.
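The two dynamic-programming comparisons described in this abstract, Needleman-Wunsch global alignment of phase strings and Dynamic Time Warping of timed sequences, can be sketched in Python as below. The scoring values (match = 1, mismatch = -1, gap = -1) and the lighter `repeat_gap` penalty for a gap opposite a phase that repeats the previous one are hypothetical, chosen only to illustrate penalizing repetitions differently from missing phases; they are not the thesis's actual scheme.

```python
import math


def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1, repeat_gap=-0.5):
    """Global alignment score of two phase strings.

    A gap placed opposite a phase that repeats the previous phase of the
    same sequence costs `repeat_gap` instead of `gap` (hypothetical scheme).
    """
    def gap_cost(seq, k):
        # seq[k] is aligned to a gap; cheaper if it merely repeats seq[k - 1].
        return repeat_gap if k > 0 and seq[k] == seq[k - 1] else gap

    n, m = len(a), len(b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap_cost(a, i - 1)
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap_cost(b, j - 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap_cost(a, i - 1)     # a[i-1] vs gap
            left = score[i][j - 1] + gap_cost(b, j - 1)   # b[j-1] vs gap
            score[i][j] = max(diag, up, left)
    return score[n][m]


def euclidean(x, y):
    """Point-by-point distance; requires series of equal length."""
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))


def dtw(x, y):
    """Classic Dynamic Time Warping distance with absolute-difference cost."""
    n, m = len(x), len(y)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]
```

With this scoring, `needleman_wunsch("ABBC", "ABC")` is 2.5 while `needleman_wunsch("ABDC", "ABC")` is 2, so a repeated phase costs less than an extra distinct one. Likewise `dtw([0, 0, 1, 2], [0, 1, 2, 2])` is 0 whereas the Euclidean distance is √2, which shows why DTW tolerates a time shift in an otherwise identical sequence.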
The previously computed distances are then clustered, a technique that identifies families or groups of similar elements within a heterogeneous set. The algorithm used is UPGMA, which is commonly applied because its weighted-average logic yields meaningful clusterings in fields ranging from biology to industry. The resulting clusters can finally be searched for useful correlations, which are identified and interpreted with respect to the laboratory's management activity. This work therefore proposes experimental models adapted to the case at hand but ideally extendable, in whole or in part, to all processes with similar characteristics.
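A minimal sketch of UPGMA clustering over a precomputed distance matrix (such as the Euclidean or DTW distances between cases) is shown below. The function name, the nested-tuple tree representation and the toy matrix are our own illustrative choices, not the thesis's implementation.

```python
def upgma(dist, labels):
    """Agglomerative clustering with UPGMA (average linkage).

    dist: symmetric matrix (list of lists) of pairwise distances.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where merged clusters are nested tuples of labels.
    """
    n = len(labels)
    clusters = {i: (labels[i], 1) for i in range(n)}   # id -> (tree, size)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    merges = []
    while len(clusters) > 1:
        # Merge the closest pair of active clusters.
        (i, j), dij = min(d.items(), key=lambda kv: kv[1])
        tree_i, n_i = clusters.pop(i)
        tree_j, n_j = clusters.pop(j)
        merges.append((tree_i, tree_j, dij))
        # UPGMA update: size-weighted average of the two old distances,
        # which equals the mean distance over all original member pairs.
        new_d = {}
        for k in clusters:
            dik = d[(min(i, k), max(i, k))]
            djk = d[(min(j, k), max(j, k))]
            new_d[k] = (n_i * dik + n_j * djk) / (n_i + n_j)
        # Drop entries touching the merged clusters, then add the new one.
        d = {key: v for key, v in d.items() if i not in key and j not in key}
        for k, v in new_d.items():
            d[(k, next_id)] = v
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    return merges
```

For four cases where A and B are close, C and D are close, and the two pairs are far apart, the last merge joins `("A", "B")` with `("C", "D")` at the mean of the four cross distances, which is exactly the average-linkage property that makes the dendrogram interpretable.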

Applicability of Process Mining Techniques in Business Environments

Relevance:

30.00%

Publisher:

Abstract:

This thesis analyses problems related to the applicability of Process Mining tools and techniques in business environments. The first contribution is a presentation of the state of the art of Process Mining and a characterization of companies in terms of their "process awareness". The work continues by identifying the circumstances in which problems can emerge: data preparation, actual mining, and results interpretation. Other problems are the configuration of parameters by non-expert users and computational complexity. We concentrate on two possible scenarios: "batch" and "on-line" Process Mining. Concerning batch Process Mining, we first investigate the data preparation problem and propose a solution for identifying the "case-ids" whenever this field is not explicitly indicated. After that, we concentrate on problems at mining time and propose a generalization of a well-known control-flow discovery algorithm in order to exploit non-instantaneous events. The use of interval-based recording leads to a significant improvement in performance. Later on, we report our work on parameter configuration for non-expert users. We present two approaches to select the "best" parameter configuration: one is completely autonomous; the other requires human interaction to navigate a hierarchy of candidate models. Concerning data interpretation and results evaluation, we propose two metrics: one model-to-model and one model-to-log. Finally, we present an automatic approach for extending a control-flow model with social information, in order to simplify the analysis of these perspectives. The second part of this thesis deals with control-flow discovery algorithms in on-line settings. We propose a formal definition of the problem and two baseline approaches.
Two actual mining algorithms are proposed: the first adapts a frequency counting algorithm to the control-flow discovery problem; the second is a framework of models that can be used for different kinds of streams (stationary versus evolving).
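The abstract does not name the frequency counting algorithm. As an assumption, the sketch below uses Lossy Counting (Manku and Motwani) to keep approximate counts of directly-follows pairs over an unbounded stream of `(case_id, activity)` events; `LossyCounter`, `mine_stream` and `epsilon` are hypothetical names for illustration only.

```python
import math


class LossyCounter:
    """Lossy Counting: approximate counts with error at most epsilon * n."""

    def __init__(self, epsilon):
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)  # bucket width
        self.n = 0                           # items seen so far
        self.entries = {}                    # item -> [count, max_error]

    def add(self, item):
        self.n += 1
        bucket = math.ceil(self.n / self.width)
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            self.entries[item] = [1, bucket - 1]
        if self.n % self.width == 0:
            # Bucket boundary: prune entries that cannot be frequent.
            self.entries = {k: v for k, v in self.entries.items()
                            if v[0] + v[1] > bucket}

    def frequent(self, support):
        """Items whose true frequency may reach support * n (no false negatives)."""
        threshold = (support - self.epsilon) * self.n
        return {k: v[0] for k, v in self.entries.items() if v[0] >= threshold}


def mine_stream(events, epsilon=0.01):
    """Count directly-follows pairs from (case_id, activity) events in arrival order."""
    last_activity = {}  # case_id -> latest activity (unbounded in this sketch)
    pairs = LossyCounter(epsilon)
    for case_id, activity in events:
        prev = last_activity.get(case_id)
        if prev is not None:
            pairs.add((prev, activity))  # prev is directly followed by activity
        last_activity[case_id] = activity
    return pairs
```

The frequent pairs returned by `frequent()` can feed a dependency-graph style control-flow model. A real on-line miner would also bound the per-case memory, which this sketch leaves unbounded for brevity.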

Analysis and application of data mining processes to the information flow of real-time systems: Adapting a machine learning algorithm for the characterization and search of frequent patterns on automatic machines

Relevance:

30.00%

Publisher:

Abstract:

The thesis I carried out over the last six months was developed at the research laboratories of IMA S.p.a. IMA (Industria Macchine Automatiche) is an Italian company founded in Bologna in 1961 that is today a world leader in the production of automatic machines for pharmaceutical packaging. I would like to point out right away that, in this application context, using data-mining algorithms is difficult because of the two environments involved. The first is that of automatic machines operating with real-time systems, which do not fully provide the resources these algorithms require. The second is drug production, which is governed by very restrictive international regulations that require all events occurring during packaging to be traced but do not allow these sensitive data to be seen by the outside world. There is a clear interest in using this information, which could reveal events attributable to a machine problem or some type of error, in order to improve the effectiveness and efficiency of IMA products. The greatest effort in devising an application strategy went into understanding and interpreting the messages related to software aspects. Since the data are numerous and closed, and the machines have scarce resources for properly applying data-mining algorithms, I adopted different approaches in different application contexts:
• An automatic error identification system, aimed at reducing the time needed to correct errors.
• Modification of an algorithm from the literature to characterize the machine.
The thesis is structured as follows:
• Chapter 1: describes the IMA Adapta automatic machine, whose log files were provided to us. Since it is the object of analysis of this work, the information flows it generates are also reported.
• Chapter 2: presents screenshots of the available data in order to interpret them through an exploratory analysis and to formulate ideas and proposals applicable to Machine Learning algorithms known in the literature.
• Chapter 3 (error identification): presents the application contexts I designed in order to implement an infrastructure that satisfies the requirement named in the chapter title.
• Chapter 4 (machine characterization): defines the algorithm used, FP-Growth, and shows the modifications made so that it can run inside automatic machines while respecting strict limits on CPU time, memory and I/O operations and, above all, the fact that only subsets of the dataset, never the entire dataset, are available. In addition, datasets are generated for testing the modified FP-Growth algorithm.
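For reference, a compact sketch of the standard FP-Growth algorithm (Han et al.) follows: it builds an FP-tree from weighted transactions and mines frequent itemsets recursively from conditional pattern bases. It does not reproduce the thesis's modifications for CPU, memory and partial-data constraints; `min_support` is an absolute count and the example transactions are made up.

```python
from collections import defaultdict


class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}


def build_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs; return header table and supports."""
    freq = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            freq[item] += count
    freq = {i: c for i, c in freq.items() if c >= min_support}
    root = Node(None, None)
    header = defaultdict(list)  # item -> every tree node holding that item
    for items, count in transactions:
        # Keep frequent items only, in descending frequency (ties broken by name).
        path = sorted((i for i in set(items) if i in freq), key=lambda i: (-freq[i], i))
        node = root
        for item in path:
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += count
    return header, freq


def fp_growth(transactions, min_support, suffix=()):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    header, freq = build_tree(transactions, min_support)
    result = {}
    for item in freq:
        itemset = (item,) + suffix
        result[frozenset(itemset)] = freq[item]
        # Conditional pattern base: prefix paths above each node of `item`.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        result.update(fp_growth(base, min_support, itemset))
    return result
```

Because the tree is built from `(items, count)` pairs, the same function can in principle be applied chunk by chunk to subsets of a log, though merging per-chunk results correctly is exactly the kind of adaptation the thesis addresses and is not shown here.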

Mining tissue microarray data to uncover combinations of biomarker expression patterns that improve intermediate staging and grading of clear cell renal cell cancer

Relevance:

30.00%

Publisher:

Abstract:

PURPOSE: Tumor stage and nuclear grade are the most important prognostic parameters of clear cell renal cell carcinoma (ccRCC). The progression risk of ccRCC remains difficult to predict, particularly for tumors with organ-confined stage and intermediate differentiation grade. Elucidating molecular pathways deregulated in ccRCC may point to novel prognostic parameters that facilitate planning of therapeutic approaches. EXPERIMENTAL DESIGN: Using tissue microarrays, expression patterns of 15 different proteins were evaluated in over 800 ccRCC patients to analyze pathways reported to be physiologically controlled by the tumor suppressors von Hippel-Lindau protein and phosphatase and tensin homologue (PTEN). Tumor staging and grading were improved by performing variable selection using Cox regression and a recursive bootstrap elimination scheme. RESULTS: Patients with pT2 and pT3 tumors that were p27 and CAIX positive had a better outcome than those with all remaining marker combinations. Prolonged survival among patients with intermediate grade (grade 2) correlated with both nuclear p27 and cytoplasmic PTEN expression, as well as with inactive, nonphosphorylated ribosomal protein S6. By applying graphical log-linear modeling to over 700 ccRCC cases for which the molecular parameters were available, only a weak conditional dependence was found between the expression of p27, PTEN, CAIX, and p-S6, suggesting that the dysregulation of several independent pathways is crucial for tumor progression. CONCLUSIONS: The use of recursive bootstrap elimination, as well as graphical log-linear modeling for comprehensive tissue microarray (TMA) data analysis, allows the unraveling of complex molecular contexts and may improve predictive evaluations for patients with advanced renal cancer.

Computer network monitoring and abnormal event detection using graph matching and multidimensional scaling

Relevance:

30.00%

Publisher:

Modeling multilevel sleep transitional data via Poisson log-linear multilevel models

Relevance:

30.00%

Publisher:

Abstract:

This paper proposes Poisson log-linear multilevel models to investigate population variability in sleep state transition rates. We specifically propose a Bayesian Poisson regression model that is more flexible, more scalable to larger studies, and more easily fit than other attempts in the literature. We further use hierarchical random effects to account for pairings of individuals and repeated measures within those individuals, because comparing diseased to non-diseased subjects while minimizing bias is of epidemiologic importance. We estimate essentially non-parametric piecewise constant hazards and smooth them, and allow for time-varying covariates and comparisons between segments of the night. The Bayesian Poisson regression is justified through a re-derivation of a classical algebraic likelihood equivalence between Poisson regression with a log(time) offset and survival regression assuming piecewise constant hazards. This relationship allows us to synthesize two methods currently used to analyze sleep transition phenomena: stratified multi-state proportional hazards models and log-linear models with GEE for transition counts. An example data set from the Sleep Heart Health Study is analyzed.
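The likelihood equivalence mentioned above can be sketched as follows, in our own assumed notation rather than the paper's: split follow-up into intervals j, let t_{ij} be the time subject i spends at risk in interval j, d_{ij} ∈ {0, 1} indicate whether the transition occurs there, and λ_{ij} be the piecewise-constant hazard.

```latex
% Piecewise-constant hazard (assumed notation):
\lambda_{ij} = \lambda_j \exp(\mathbf{x}_{ij}^{\top}\boldsymbol{\beta})

% Survival log-likelihood under piecewise-constant hazards:
\ell_{\mathrm{surv}} = \sum_{i,j} \bigl[ d_{ij} \log \lambda_{ij} - \lambda_{ij} t_{ij} \bigr]

% Poisson log-likelihood for d_{ij} with mean \mu_{ij} = \lambda_{ij} t_{ij}:
\ell_{\mathrm{Pois}} = \sum_{i,j} \bigl[ d_{ij} \log \mu_{ij} - \mu_{ij} - \log d_{ij}! \bigr]
  = \ell_{\mathrm{surv}} + \sum_{i,j} d_{ij} \log t_{ij}

% \log d_{ij}! = 0 since d_{ij} \in \{0, 1\}, and the extra term involves neither
% \lambda_j nor \boldsymbol{\beta}, so both likelihoods share the same maximizer:
\log \mu_{ij} = \log t_{ij} + \log \lambda_j + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta}
```

The last line is a Poisson regression with offset log t_{ij} and one intercept per interval, which is why the two modeling approaches can be synthesized.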

Automatic Labeling of Software Components and their Evolution using Log-Likelihood Ratio of Word Frequencies in Source Code

Relevance:

30.00%

Publisher:

Abstract:

As more and more open-source software components become available on the internet we need automatic ways to label and compare them. For example, a developer who searches for reusable software must be able to quickly gain an understanding of retrieved components. This understanding cannot be gained at the level of source code due to the semantic gap between source code and the domain model. In this paper we present a lexical approach that uses the log-likelihood ratios of word frequencies to automatically provide labels for software components. We present a prototype implementation of our labeling/comparison algorithm and provide examples of its application. In particular, we apply the approach to detect trends in the evolution of a software system.
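The abstract does not give the formula. A common form of the log-likelihood ratio for word frequencies is Dunning's G² statistic on a 2×2 contingency table, comparing how often a word occurs in one corpus (for example, a component's source code) against a reference corpus (for example, the rest of the system). The sketch below assumes that formulation; the function name and counts are hypothetical.

```python
import math


def log_likelihood_ratio(k1, n1, k2, n2):
    """Dunning's G^2 for a word seen k1 times in n1 tokens vs k2 times in n2 tokens."""
    # 2x2 table: this word vs. all other tokens, in corpus 1 vs. corpus 2.
    observed = [k1, n1 - k1, k2, n2 - k2]
    total = n1 + n2
    word_total = k1 + k2
    other_total = total - word_total
    expected = [
        n1 * word_total / total, n1 * other_total / total,
        n2 * word_total / total, n2 * other_total / total,
    ]
    # Empty cells contribute 0 (the limit of o * log(o / e) as o -> 0).
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
```

A word with the same relative frequency in both corpora scores 0, while a word that is much more frequent in the component scores high, so ranking words by G² and keeping the top ones gives candidate labels.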

Handwritten drilling log of Hole 302-M0004C

Relevance:

30.00%

Publisher:

Handwritten drilling log of Hole 302-M0001A

Relevance:

30.00%

Publisher:

Log-ratio of silica to aluminium counts (ln(Si/Al)) from ODP site 108-658

Relevance:

30.00%

Publisher:

Abstract:

Growing evidence suggests that the low atmospheric CO2 concentration of the ice ages resulted from enhanced storage of CO2 in the ocean interior, largely as a result of changes in the Southern Ocean (ref. 1). Early in the most recent deglaciation, a reduction in North Atlantic overturning circulation seems to have driven CO2 release from the Southern Ocean (refs 2-5), but the mechanism connecting the North Atlantic and the Southern Ocean remains unclear. Biogenic opal export in the low-latitude ocean relies on silicate from the underlying thermocline, the concentration of which is affected by the circulation of the ocean interior. Here we report a record of biogenic opal export from a coastal upwelling system off the coast of northwest Africa that shows pronounced opal maxima during each glacial termination over the past 550,000 years. These opal peaks are consistent with a strong deglacial reduction in the formation of silicate-poor glacial North Atlantic intermediate water (GNAIW; ref. 2). The loss of GNAIW allowed mixing with underlying silicate-rich deep water to increase the silicate supply to the surface ocean. An increase in westerly-wind-driven upwelling in the Southern Ocean in response to the North Atlantic change has been proposed to drive the deglacial rise in atmospheric CO2 (refs 3, 4). However, such a circulation change would have accelerated the formation of Antarctic intermediate water and sub-Antarctic mode water, which today have as little silicate as North Atlantic Deep Water and would thus have maintained low silicate concentrations in the Atlantic thermocline. The deglacial opal maxima reported here suggest an alternative mechanism for the deglacial CO2 release (refs 5, 6). Just as the reduction in GNAIW led to upward silicate transport, it should also have allowed the downward mixing of warm, low-density surface water to reach into the deep ocean.
The resulting decrease in the density of the deep Atlantic relative to the Southern Ocean surface promoted Antarctic overturning, which released CO2 to the atmosphere.

Petrology of mafic and ultramafic intrusions from the Portneuf-Mauricie Domain, Grenville Province, Canada

Relevance:

30.00%

Publisher:

Abstract:

The Portneuf-Mauricie Domain (PMD), located in the south-central part of the Grenville Province, comprises several mafic and ultramafic intrusions hosting Ni-Cu ± platinum-group element (PGE) prospects and a former small mining operation (Lac Édouard mine). These meter- to kilometer-scale, sulfide-bearing intrusions display diverse forms, such as layered and tabular bodies with no particular internal structure, and zoned plutons. They were injected ~ 1.40 Ga into a mature oceanic arc, before and during accretion of the arc to the Laurentian margin. The pressure-temperature conditions of the magmas at the beginning of their emplacement were 3 kbar and 1319-1200 °C (according to the petrologic modeling results from this study). The PMD mineralized intrusions are interpreted to represent former magma chambers or magma conduits in the roots of the oceanic arc. The parent magmas of the mineralized intrusions resulted mainly from the partial melting of a mantle source composed of spinel-bearing lherzolite. Petrologic modeling and the occurrence of primary amphibole in the plutonic rocks indicate that these parent melts were basaltic and hydrous. In addition, fractional crystallization modeling and Mg/Fe ratios suggest that most of the intrusions may have formed from evolved magmas, with Mg# = 60, resulting from the fractionation of more primitive magmas (primary magmas, with Mg# = 68). Petrologic modeling demonstrates that 30% fractional crystallization resulted in the primitive to evolved characteristics of the studied intrusive rocks (as indicated by the crystallization sequences and mineral chemistry). Exceptions are the Réservoir Blanc, Boivin, and Rochette West parent magmas, which may have undergone more extensive fractional crystallization, since these intrusions contain pyroxenes that are more iron rich and have lower Mg numbers than pyroxenes in the other PMD intrusions. 
The PMD mafic and ultramafic intrusions were intruded into an island arc located offshore from the Laurentian continent. Thus, their presence confirms the existence of a well-developed magmatic network (responsible for the fractionation processes) beneath the Proterozoic arc, which resulted in the wide range of compositions observed in the various plutons.

Tie points and log-adjusted depth scales for ODP Leg 207 Sites

Relevance:

30.00%

Publisher:

Abstract:

Multiple copies of Cretaceous black shales extending from the early Cenomanian to the end of the Santonian were recovered at five sites on Demerara Rise during Leg 207 of the Ocean Drilling Program. These sediments are primarily composed of laminated organic-rich claystones interbedded with coarser, lightly laminated foraminiferal-bearing packstones and wackestones. The black shales represent the local expression of widespread organic-rich sedimentation in the Atlantic during the mid-Cretaceous. However, incomplete recovery prevented construction of continuous composite sections, resulting in uncertainties concerning the correct stratigraphic placement of individual cores. By combining high-resolution measurements of bulk density collected shipboard on the multisensor track with continuous downhole measurements of formation resistivity using the Formation MicroScanner, an equivalent logging depth scale was constructed for black shales recovered from Sites 1258, 1260, and 1261. The integrated depths approach centimeter-scale resolution and are supported by comparisons of coarser-resolution natural gamma ray emissions collected on cores and through downhole logging operations. The new depths highlight the extent of both intra- and intercore gaps and provide an opportunity to further constrain temporal and spatial paleoceanographic changes captured in proxy records from these sediments.
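The abstract does not spell out how samples between tie points are placed on the new scale. Assuming each tie point pairs a core depth with its equivalent logging depth, a common approach is piecewise-linear interpolation between tie points, sketched below; the function name, the constant-offset extrapolation beyond the outermost ties and the example values are all hypothetical.

```python
from bisect import bisect_right


def to_log_depth(core_depth, ties):
    """Map a core depth to logging depth by linear interpolation between tie points.

    ties: list of (core_depth, log_depth) pairs sorted by core_depth.
    Outside the tie-point range, the nearest tie's offset is applied (assumption).
    """
    cores = [c for c, _ in ties]
    k = bisect_right(cores, core_depth)
    if k == 0:                     # above the first tie point
        c0, l0 = ties[0]
        return core_depth + (l0 - c0)
    if k == len(ties):             # at or below the last tie point
        c1, l1 = ties[-1]
        return core_depth + (l1 - c1)
    (c0, l0), (c1, l1) = ties[k - 1], ties[k]
    frac = (core_depth - c0) / (c1 - c0)
    return l0 + frac * (l1 - l0)
```

For example, with hypothetical ties `[(100.0, 101.0), (110.0, 112.0)]`, a sample at 105.0 m core depth maps to 106.5 m logging depth, so the core interval is stretched to match the logged one.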

(Table 4) Volume of natural gas within the downhole log-inferred gas-hydrate occurrences in ODP Leg 164 sites

Relevance:

30.00%

Publisher:
