928 results for sequential frequent pattern
Abstract:
Data mining, frequent pattern mining, database mining, mining algorithms in SQL
Abstract:
We present a method to enhance fault localization for software systems based on a frequent pattern mining algorithm. Our method is based on a large set of test cases for a given set of programs in which faults can be detected. The test executions are recorded as function call trees. Based on test oracles, the tests can be classified into successful and failing tests. A frequent pattern mining algorithm is used to identify frequent subtrees in successful and failing test executions. This information is used to rank functions according to their likelihood of containing a fault. The ranking suggests an order in which to examine the functions during fault analysis. We validate our approach experimentally using a subset of Siemens benchmark programs.
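The ranking step can be illustrated with a deliberately simplified sketch. It makes several assumptions that do not come from the abstract. A call tree is a nested `(function_name, [subtrees])` tuple. Only two-node subtrees (caller/callee edges) are mined, rather than arbitrary frequent subtrees. A function's score is the largest gap between failing-run support and passing-run support among the frequent patterns it occurs in. The scoring rule and all names (`call_edges`, `rank_functions`, `min_support`) are hypothetical and are not the authors' method.

```python
from collections import Counter


def call_edges(tree, parent=None):
    """Flatten a call tree (function_name, [subtrees]) into a set of (caller, callee) edges."""
    name, children = tree
    edges = {(parent, name)} if parent is not None else set()
    for child in children:
        edges |= call_edges(child, name)
    return edges


def rank_functions(passing, failing, min_support=0.5):
    """Rank functions by how much more often their call edges occur in failing runs.

    Each edge is treated as a two-node subtree pattern; its support is the
    fraction of runs (passing or failing) whose call tree contains it.
    """
    # call_edges returns a set, so each edge is counted at most once per run
    pass_count = Counter(e for tree in passing for e in call_edges(tree))
    fail_count = Counter(e for tree in failing for e in call_edges(tree))
    scores = {}
    for edge, n in fail_count.items():
        fail_sup = n / len(failing)
        if fail_sup < min_support:
            continue  # keep only patterns that are frequent in failing runs
        pass_sup = pass_count[edge] / len(passing) if passing else 0.0
        for func in edge:
            scores[func] = max(scores.get(func, float("-inf")), fail_sup - pass_sup)
    # most suspicious first; ties broken alphabetically
    return sorted(scores, key=lambda f: (-scores[f], f))
```

For example, suppose `compute` calls `divide` only in the failing runs. Then `compute` and `divide` move to the top of the ranking, and `parse`, which is called more often by passing runs, drops to the bottom.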
Abstract:
Frequent pattern discovery in structured data is receiving increasing attention in many application areas of science. However, the computational complexity and the large amount of data to be explored often make sequential algorithms unsuitable. In this context, high-performance distributed computing becomes a very interesting and promising approach. In this paper we present a parallel formulation of the frequent subgraph mining problem to discover interesting patterns in molecular compounds. The application is characterized by a highly irregular tree-structured computation. No estimation is available for task workloads, which follow a power-law distribution over a wide range. The proposed approach allows dynamic resource aggregation and provides fault and latency tolerance. These features make the distributed application suitable for multi-domain heterogeneous environments, such as computational Grids. The distributed application has been evaluated on the well-known National Cancer Institute’s HIV-screening dataset.
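When task workloads cannot be estimated in advance, a common remedy is dynamic scheduling from a shared task pool. Idle workers pull the next task, and every expansion pushes its child tasks back into the pool. The sketch below shows only that load-balancing idea. It uses threads in a single process as stand-ins for distributed Grid nodes, so it illustrates scheduling rather than speedup, and it does not model resource aggregation or fault tolerance. `run_task_pool` is a hypothetical name. `fib_expand` is a hypothetical stand-in for a subgraph-extension step that produces an unbalanced tree of tasks.

```python
import queue
import threading


def run_task_pool(root, expand, num_workers=4):
    """Process a tree of tasks whose shape and cost are unknown in advance.

    expand(task) -> (children, result). Children are pushed back into the shared
    pool, so idle workers pick up new work as soon as it is created.
    """
    pool = queue.Queue()
    pool.put(root)
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            task = pool.get()
            if task is None:  # sentinel: no more work for this worker
                pool.task_done()
                return
            children, result = expand(task)
            with lock:
                results.append(result)
            for child in children:  # enqueue children before marking the parent done
                pool.put(child)
            pool.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    pool.join()  # returns once every task, including spawned ones, is done
    for _ in threads:
        pool.put(None)
    for t in threads:
        t.join()
    return results


def fib_expand(n):
    """Hypothetical expansion step producing an irregular, unbalanced task tree."""
    if n < 2:
        return [], 1  # leaf: report one "pattern"
    return [n - 1, n - 2], 0
```

Calling `run_task_pool(10, fib_expand)` processes all 177 nodes of the unbalanced tree, and 89 of them are leaves. The work is shared among whichever workers happen to be idle, and no up-front partitioning is needed.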
Abstract:
Analysis and application of data mining processes to the information flow of real-time systems. Implementation and analysis of a self-adaptive algorithm for the discovery of frequent patterns on automatic machines.
Abstract:
A large epidemic of serogroup B meningococcal disease (MD) has been occurring in greater São Paulo, Brazil, since 1988 [21]. A Cuban-produced vaccine, based on outer membrane protein (OMP) from serogroup B, serotype 4, serosubtype P1.15 (B:4:P1.15) Neisseria meningitidis, was given to about 2.4 million children aged 3 months to 6 years during 1989 and 1990. The administration of the vaccine had little or no measurable effect on this outbreak. To detect clonal changes that could explain the continued increase in the incidence of disease after vaccination, we serotyped isolates recovered between 1990 and 1996 from 834 patients with systemic disease. Strain B:4:P1.15, which was detected in the area as early as 1977, has been the most prevalent phenotype since 1988. These strains are still prevalent in the area and were responsible for about 68% of the 834 serogroup B cases in the last 7 years. We analyzed 438 (52%) of these strains by restriction fragment length polymorphism (RFLP) analysis of rRNA genes (ribotyping). The most frequent pattern obtained was designated Rb1 (68%). We concluded that the same B:4:P1.15-Rb1 clone was the most prevalent strain and was responsible for the continued increase in the incidence of serogroup B MD cases in greater São Paulo during the last 7 years, despite the vaccination trial.
Abstract:
Purpose: In recent years, MRI has emerged as a diagnostic method complementary to US in the diagnosis of congenital lung lesions. Focal homogeneous pulmonary hyperintensity on T2-WI is a frequently observed pattern. Our purpose is to determine whether this finding is associated with a characteristic pulmonary lesion. Materials and methods: Between 1 January 2000 and 31 December 2007, a total of 50 prenatal MRI examinations of fetuses with an echographic diagnosis of thoracic pathology were performed at our institution, including 12 cases of suspected congenital pulmonary lesions. Prenatal images were correlated with the postnatal diagnosis. Results: In 12 cases, fetal MRI detected congenital pulmonary lesions. In 8 patients, typical signs (cystic lesions, septations, anomalous vasculature) clearly suggested a specific pathology. In 4 cases, MRI showed a focal homogeneous increase in signal intensity (SI) on T2-WI of the pathologic lung relative to the normal lung. The final diagnoses in these fetuses were congenital cystic adenomatoid malformation type III (1 patient), segmental emphysema (1 patient) and bronchial atresia (2 cases). In all 4 cases, a significant postnatal reduction in lesion size relative to the prenatal MRI studies was observed. Conclusion: Our study suggests that a focal increase in lung SI on T2-WI is a non-specific sign of congenital lung disease that is present in different pathologies. Therefore, a prospective diagnosis is not possible.
Abstract:
This paper analyses the role of prosody in parenthetical insertions, a type of structure that is extremely common in both speech and writing. The materials under study come from a corpus of spontaneous speech acts in Central Catalan (with varying degrees of spontaneity), from which a corpus of oral parenthetical insertions has been compiled. The prototypical prosodic features of a parenthetical insertion in Catalan are prosodic autonomy, limited length, production between pauses or before a final pause, a tendency towards acceleration, a fall in intensity, a lower pitch range and, finally, a falling or rising melodic pattern. A final fall is the most frequent pattern in spontaneous conversations with a high degree of confidence between interlocutors. A final rise, by contrast, is found in interviews in which the degree of confidence between participants is lower, their roles are unequal, and the interviewee constructs a narrative discourse. We thus suggest that the pitch contour of parenthetical insertions is related to formality and discourse type (in this case, narrative vs. dialogue). Bearing in mind the discursive functions performed by these insertions, we propose a typology that classifies them according to two main functions: completion of information and modalisation.
Abstract:
In this paper, moving flock patterns are mined from spatio-temporal datasets by incorporating a clustering algorithm. A flock is defined as a set of objects that move together for a certain continuous amount of time. Using clustering algorithms to find moving flock patterns is a promising way to discover frequent movement patterns in large trajectory datasets. In this approach, the clustering algorithm used is SPARROW (SPatial clusteRing algoRithm thrOugh sWarm intelligence). The advantage of SPARROW is that it can effectively discover clusters of widely varying sizes and shapes in large databases. Variations of the proposed method are discussed, and the experimental results show that the method addresses the problems of scalability and duplicate pattern formation. The method also reduces the number of patterns produced.
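The flock definition becomes concrete once a clustering step has grouped the objects at each time step. In this sketch, per-snapshot clusters are assumed as input and stand in for SPARROW's output, which is not implemented here. `min_size` and `min_duration` are hypothetical parameters. The sketch tracks intersections of clusters across consecutive snapshots, so a reported group has stayed in a common cluster for at least `min_duration` steps. It makes no attempt at the paper's reduction of redundant patterns.

```python
def find_flocks(snapshots, min_size, min_duration):
    """snapshots[t] is a list of clusters (sets of object ids) at time step t.

    Returns the groups (frozensets) of at least min_size objects that shared a
    cluster for at least min_duration consecutive time steps.
    """
    active = {}  # group -> earliest start step of its current run together
    flocks = set()
    for t, clusters in enumerate(snapshots):
        next_active = {}
        for cluster in clusters:
            cluster = frozenset(cluster)
            if len(cluster) >= min_size:
                next_active.setdefault(cluster, t)  # a new run may start here
            for group, start in active.items():
                common = group & cluster  # members of the group still together
                if len(common) >= min_size:
                    if common not in next_active or start < next_active[common]:
                        next_active[common] = start  # keep the earliest start
        for group, start in next_active.items():
            if t - start + 1 >= min_duration:
                flocks.add(group)
        active = next_active
    return flocks
```

In the test data, objects 1 and 2 share a cluster in all three snapshots, so `{1, 2}` is reported as a flock. Objects 3 and 4 are together for only two steps and are therefore not reported.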
Abstract:
The research in this dissertation covers the development and application of short- and long-term climate predictions. The short-term prediction emphasizes monthly and seasonal climate, i.e., forecasts ranging from the next month through a season to about a year ahead. The long-term predictions pertain to the analysis of inter-annual and decadal climate variations over the whole 21st century. These two climate prediction methods are validated and applied in the study area, namely the Khlong Yai (KY) water basin on the eastern seaboard of Thailand. This basin lies in a major industrial zone of the country and has been suffering from severe drought and water shortage in recent years. Since water resources are essential for further industrial development in this region, a thorough analysis of potential climate change and its subsequent impact on the water supply in the area is at the heart of this thesis. The short-term forecast of next-season climate, such as temperature and rainfall, offers a potential general guideline for water management and reservoir operation. To that end, statistical models are developed and applied in the study region. These comprise models based on autoregressive techniques, i.e., AR, ARIMA and ARIMAex (which includes additional external regressors), and multiple linear regression (MLR) models. Teleconnections between ocean states and the local climate are investigated and used as extra external predictors in the ARIMAex and MLR models, where they are shown to enhance the accuracy of the short-term predictions significantly. However, the ocean state–local climate teleconnections provide a lead time of only one to four months, so the ocean state indices can support only a one-season-ahead forecast. Hence, GCM climate predictors are also suggested as an additional predictor set for a more reliable and somewhat longer short-term forecast.
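An autoregressive model with an external regressor can be sketched as an ARX model fitted by ordinary least squares. This is a simplified stand-in and not the thesis's ARIMAex model, since it has no differencing or moving-average terms. The optional predictor `x` plays the role of an ocean-state index and enters with a one-step lag, in line with the short lead time mentioned above. The function names are hypothetical.

```python
import numpy as np


def fit_arx(y, p, x=None):
    """Least-squares fit of y[t] = c + a_1*y[t-1] + ... + a_p*y[t-p] + b . x[t-1].

    x (optional) holds one external predictor value or vector per time step;
    it enters with a one-step lag, so it must be known one step before the
    target. Returns the coefficient vector [c, a_1..a_p, b...].
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    y = np.asarray(y, dtype=float)
    rows = []
    for t in range(p, len(y)):
        row = [1.0] + [y[t - i] for i in range(1, p + 1)]
        if x is not None:
            row += list(np.atleast_1d(x[t - 1]))
        rows.append(row)
    coef, *_ = np.linalg.lstsq(np.array(rows), y[p:], rcond=None)
    return coef


def forecast_next(y, coef, p, x_last=None):
    """One-step-ahead forecast from the last p observations and the latest predictor."""
    row = [1.0] + [y[-i] for i in range(1, p + 1)]
    if x_last is not None:
        row += list(np.atleast_1d(x_last))
    return float(np.dot(coef, row))
```

On noise-free synthetic data generated as `y[t] = 2 + 0.5*y[t-1] + 3*x[t-1]`, the fit recovers the coefficients `[2, 0.5, 3]`. With real monthly rainfall or temperature series, the residuals and the choice of `p` would need to be checked.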
To prepare “pre-warning” information about possible future climate change with potentially adverse hydrological impacts in the study region, the long-term climate prediction methodology is applied. It is based on downscaling climate predictions from several single- and multi-domain GCMs. Two well-known downscaling methods, SDSM and LARS-WG, are used, together with a newly developed MLR downscaling technique. This technique allows the incorporation of a multitude of monthly or daily climate predictors from one or several (multi-domain) parent GCMs. The numerous downscaling experiments indicate that the MLR method is more accurate than SDSM and LARS-WG in predicting the recent-past 20th-century (1971-2000) long-term monthly climate in the region. Consequently, the MLR model is employed to downscale 21st-century GCM climate predictions under SRES scenarios A1B, A2 and B1. However, the hydrological watershed model requires daily climate input data. A new stochastic daily climate generator is therefore developed to rescale monthly observed or predicted climate series to daily series, while adhering to the statistical and geospatial distributional attributes of observed (past) daily climate series in the calibration phase. Using this daily climate generator, 30 realizations of future daily climate series are produced from the downscaled monthly GCM climate predictor sets. These realizations are used as input to the SWAT distributed watershed model to simulate future streamflow and other hydrological water budget components in the study region in a multi-realization manner. In addition to a general examination of future changes in the hydrological regime of the KY basin, potential future changes in the water budgets of the three main reservoirs in the basin are analysed, as these are a major source of water supply in the study region.
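The abstract does not describe how the thesis's new daily generator works internally. As an explicitly simplified stand-in for rainfall only, the classic method of fragments draws an observed daily month at random and rescales it so that it sums to the target monthly total. This keeps the observed wet/dry day sequence, and repeated draws give multiple realizations. The function names, the template format (equal-length lists of daily values) and the seed handling are all assumptions. The sketch does not attempt to match the geospatial attributes mentioned above.

```python
import random


def disaggregate_month(monthly_total, templates, rng):
    """Method-of-fragments sketch: rescale one randomly drawn observed month of
    daily rainfall so that it sums to monthly_total. Dry days stay dry."""
    n_days = len(templates[0])
    if monthly_total == 0:
        return [0.0] * n_days
    wet = [m for m in templates if sum(m) > 0]  # only months with rain can be rescaled
    if not wet:
        raise ValueError("no template month with non-zero rainfall")
    pattern = rng.choice(wet)
    scale = monthly_total / sum(pattern)
    return [v * scale for v in pattern]


def realizations(monthly_total, templates, n=30, seed=0):
    """Generate n stochastic daily series for one month (a multi-realization run)."""
    rng = random.Random(seed)
    return [disaggregate_month(monthly_total, templates, rng) for _ in range(n)]
```

Rescaling in this way preserves each monthly total exactly. Temperature would need a different, typically additive, adjustment that this sketch does not cover.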
The results of the long-term 21st-century downscaled climate predictions provide evidence that, compared with the 20th-century reference period, the future climate in the study area will be more extreme, particularly under SRES A1B. Temperatures will be higher and will exhibit larger fluctuations. Although the future rainfall intensity is nearly constant, its spatial distribution across the region is partially changing. There is further evidence that sequential rainfall occurrence will decrease, so that short periods of high intensity will be followed by longer dry spells. This change in the sequential rainfall pattern will also lead to seasonal reductions in streamflow and seasonal decreases in the water storage of the reservoirs. In any case, these predicted climate changes and their hydrological impacts should encourage water planners and policy makers to develop adaptation strategies to properly manage the future water supply in this area, following the guidelines suggested in this study.
Abstract:
Traditional dictionary learning algorithms find a sparse representation of high-dimensional data by transforming samples into one-dimensional (1D) vectors. This 1D model loses the inherent spatial structure of the data. An alternative solution is to employ tensor decomposition for dictionary learning on the data's original structural form, a tensor. This approach learns multiple dictionaries, one along each mode, and the corresponding sparse representation with respect to the Kronecker product of these dictionaries. To learn tensor dictionaries along each mode, all existing methods update each dictionary iteratively in an alternating manner. Although atoms from each mode dictionary jointly contribute to the sparsity of the tensor, existing works treat each mode dictionary independently and thus ignore atom correlations between different mode dictionaries. In this paper, we propose a joint multiple dictionary learning method for tensor sparse coding, which explores atom correlations for sparse representation and updates multiple atoms from each mode dictionary simultaneously. In this algorithm, the Frequent-Pattern Tree (FP-tree) mining algorithm is employed to exploit frequent atom patterns in the sparse representation. Inspired by the idea of K-SVD, we develop a new dictionary update method that jointly updates the elements in each pattern. Experimental results demonstrate that our method outperforms other tensor-based dictionary learning algorithms.
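FP-tree mining can find atoms that are frequently active together. The following is a minimal FP-growth sketch, written under the assumption that each sample's sparse code is turned into a transaction containing the indices of its non-zero atoms. `atom_transactions` is a hypothetical helper and not the paper's code. The sketch does not show how the mined patterns then drive the K-SVD-style joint update.

```python
from collections import defaultdict


class FPNode:
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """transactions: list of (items, count). Returns the header table and item supports."""
    support = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            support[item] += count
    support = {i: c for i, c in support.items() if c >= min_support}
    root = FPNode(None, None)
    header = defaultdict(list)  # item -> list of tree nodes holding that item
    for items, count in transactions:
        # keep frequent items only, most frequent first (ties broken by item id)
        ordered = sorted((i for i in set(items) if i in support),
                         key=lambda i: (-support[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = FPNode(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += count
    return header, support


def fp_growth(transactions, min_support, suffix=()):
    """Return {itemset (sorted tuple): support} for all frequent itemsets."""
    header, support = build_fp_tree(transactions, min_support)
    patterns = {}
    for item in sorted(support, key=lambda i: (support[i], i)):
        itemset = suffix + (item,)
        patterns[tuple(sorted(itemset))] = support[item]
        # conditional pattern base: the prefix path above every node holding `item`
        conditional = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        patterns.update(fp_growth(conditional, min_support, itemset))
    return patterns


def atom_transactions(codes, tol=1e-12):
    """Hypothetical helper: each sparse code (list of coefficients) becomes its set of active atoms."""
    return [([k for k, c in enumerate(code) if abs(c) > tol], 1) for code in codes]
```

With five codes whose active atoms are `{0,1,2}`, `{0,1}`, `{0,2}`, `{1,2}` and `{0,1,2}`, a minimum support of 3 yields every single atom and every atom pair. The triple `{0,1,2}` is excluded because it occurs only twice.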
Abstract:
Final project of the Integrated Master's Degree in Medicine (Mestrado Integrado em Medicina), Faculdade de Medicina, Universidade de Lisboa, 2014
Abstract:
Many systems and applications continuously produce events. These events record the status of the systems and trace their behavior. By examining these events, system administrators can identify potential problems in the systems. If the temporal dynamics of the systems are investigated further, the underlying patterns can be discovered. The uncovered knowledge can be leveraged to predict future system behavior or to mitigate potential risks. Moreover, system administrators can use the temporal patterns to set up event management rules that make the systems more intelligent. With the popularity of data mining techniques in recent years, these events have gradually become more and more useful. Despite recent advances in data mining techniques, their application to system event mining is still at a rudimentary stage. Most works still focus on episode mining or frequent pattern discovery. These methods are unable to provide a brief yet comprehensible summary that reveals valuable information from a high-level perspective. Moreover, they provide little actionable knowledge to help system administrators manage the systems better. To make better use of the recorded events, more practical techniques are required. From the perspective of data mining, three correlated directions are considered helpful for system management: (1) providing concise yet comprehensive summaries of the running status of the systems; (2) making the systems more intelligent and autonomous; (3) effectively detecting abnormal system behavior. Owing to the richness of event logs, all these directions can be pursued in a data-driven manner. In this way, the robustness of the systems can be enhanced and the goal of autonomous management can be approached.
This dissertation mainly focuses on the foregoing directions, leveraging temporal mining techniques to facilitate system management. More specifically, three concrete topics will be discussed, including event, resource demand prediction, and streaming anomaly detection. Besides the theoretical contributions, an experimental evaluation will also be presented to demonstrate the effectiveness and efficacy of the corresponding solutions.
Abstract:
Gas-liquid two-phase flow is very common in industrial applications, especially in the oil and gas, chemical, and nuclear industries. As operating conditions such as the phase flow rates, the pipe diameter and the physical properties of the fluids change, different configurations, called flow patterns, take place. In oil production, the most frequent pattern is slug flow, in which continuous liquid plugs (liquid slugs) and gas-dominated regions (elongated bubbles) alternate. Offshore scenarios in which the pipe lies on the seabed with slight changes of direction are extremely common. With these scenarios and issues in mind, this work presents an experimental study of two-phase gas-liquid slug flow in a duct with a slight change of direction, represented by a horizontal section followed by a downward-sloping pipe stretch. The experiments were carried out at NUEM (Núcleo de Escoamentos Multifásicos UTFPR). The flow was initiated and developed under controlled conditions, and its characteristic parameters were measured with resistive sensors installed at four pipe sections. Two high-speed cameras were also used. The measured results were used to evaluate the influence of a slight change of direction on the slug flow structures and on the transition between slug flow and stratified flow in the downward section.
Abstract:
This paper investigates how sequential bilingual (L2) Turkish-English children comprehend English reflexives and pronouns and tests whether they pattern similarly to monolingual (L1) children, L2 adults, or children with Specific Language Impairment (SLI). Thirty-nine 6- to 9-year-old L2 children with an age of onset of 30-48 months and exposure to English of 30-72 months, and 33 age-matched L1 control children, completed the Advanced Syntactic Test of Pronominal Reference-Revised (van der Lely, 1997). The L2 children’s performance was compared to that of L2 adults from Demirci (2001) and of children with SLI from van der Lely & Stollwerck (1997). The L2 children’s performance in the comprehension of reflexives was almost identical to that of their age-matched controls and differed from that of L2 adults and children with SLI. In the comprehension of pronouns, L2 children showed an asymmetry between referential and quantificational NPs, a pattern attested in younger L1 children and children with SLI. Our study provides evidence that the development of comprehension of reflexives and pronouns in these children resembles monolingual L1 acquisition and not adult L2 acquisition or acquisition by children with SLI.
Abstract:
Sequential pattern mining is an important subject in data mining with broad applications in many different areas. However, previous sequential mining algorithms mostly aimed to calculate the number of occurrences (the support) without regard to the degree of importance of different data items. In this paper, we propose to explore the search space of subsequences with normalized weights. We are interested not only in the number of occurrences of the sequences (their supports) but also in their importance (their weights). When generating subsequence candidates, we use both the support and the weight of the candidates while maintaining the downward closure property of these patterns, which makes it possible to accelerate candidate generation.
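Weighted support is usually not downward closed by itself. A longer pattern can gain a higher average weight even though its support can only fall. One common workaround, assumed here and not necessarily the paper's exact scheme, prunes with the bound `support * max_weight`, which never increases as a pattern is extended. This sketch makes further simplifying assumptions. Sequences are lists of single items. A pattern's weight is the average of its normalized item weights. Candidates grow depth-first by appending one item at a time. All names are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if the items of pattern occur in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)  # `in` consumes the iterator


def weighted_sequential_patterns(db, weights, min_wsup):
    """Depth-first pattern growth with a weighted-support threshold.

    Weighted support = support * average item weight (an assumed definition).
    Because that measure is not anti-monotone, pruning uses support * max_weight,
    an upper bound on the weighted support of every extension, which is.
    """
    if min_wsup <= 0:
        raise ValueError("min_wsup must be positive")
    items = sorted({item for seq in db for item in seq})
    max_w = max(weights[i] for i in items)
    found = {}

    def grow(prefix):
        for item in items:
            candidate = prefix + (item,)
            sup = sum(1 for seq in db if is_subsequence(candidate, seq))
            if sup * max_w < min_wsup:
                continue  # no extension of candidate can reach the threshold
            avg_w = sum(weights[i] for i in candidate) / len(candidate)
            if sup * avg_w >= min_wsup:
                found[candidate] = sup * avg_w
            grow(candidate)

    grow(())
    return found
```

In the test data, `("a", "b")` and `("a", "c")` both have support 2. Only `("a", "c")` passes the weighted threshold, because `c` carries more weight than `b`. The bound still lets the search extend `("a", "b")` until the support falls too low.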