970 results for extended frequent pattern tree (EFPTree)
Abstract:
Data mining, frequent pattern mining, database mining, mining algorithms in SQL
Abstract:
Traditional dictionary learning algorithms find a sparse representation of high-dimensional data by transforming samples into one-dimensional (1D) vectors. This 1D model loses the inherent spatial structure of the data. An alternative is to employ tensor decomposition to learn dictionaries on the data in their original structural form (a tensor). Such methods learn multiple dictionaries along each mode, together with the corresponding sparse representation with respect to the Kronecker product of these dictionaries. To learn tensor dictionaries along each mode, all existing methods update each dictionary iteratively in an alternating manner. Although atoms from each mode dictionary jointly contribute to the sparsity of the tensor, existing works ignore atom correlations between different mode dictionaries by treating each mode dictionary independently. In this paper, we propose a joint multiple dictionary learning method for tensor sparse coding, which exploits atom correlations for sparse representation and updates multiple atoms from each mode dictionary simultaneously. In this algorithm, the Frequent-Pattern Tree (FP-tree) mining algorithm is employed to discover frequent atom patterns in the sparse representation. Inspired by the idea of K-SVD, we develop a new dictionary update method that jointly updates the elements in each pattern. Experimental results demonstrate that our method outperforms other tensor-based dictionary learning algorithms.
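The abstract does not detail how atom patterns are extracted from the sparse representation. One rough illustration is to treat each sample's sparse code as a "transaction" listing its active atoms, and then mine frequent atom sets from those transactions. The Python sketch below is a minimal, hypothetical example of that idea. It assumes the sparse codes are given as plain per-sample coefficient vectors. For brevity it uses an Apriori-style level-wise search in place of the FP-tree. Both approaches return the same frequent itemsets, but this is not the authors' implementation. The names `active_atoms` and `frequent_atom_patterns` are illustrative.

```python
from itertools import combinations

def active_atoms(codes, tol=1e-8):
    """Turn each sample's sparse code (a list of coefficients, one per atom)
    into the set of atom indices whose coefficients are non-negligible."""
    return [frozenset(i for i, v in enumerate(sample) if abs(v) > tol)
            for sample in codes]

def frequent_atom_patterns(transactions, min_support):
    """Apriori-style level-wise search (used here instead of an FP-tree) for
    atom sets contained in at least `min_support` transactions.
    Returns a dict mapping frozenset(atoms) -> support count."""
    # Level 1: count individual atoms.
    counts = {}
    for t in transactions:
        for atom in t:
            key = frozenset([atom])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Candidate k-sets: unions of two frequent (k-1)-sets with exactly k atoms.
        candidates = {a | b for a, b in combinations(frequent, 2) if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        # Count support by scanning the transactions.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result
```

For example, suppose the codes have active atoms {0, 1}, {0}, {2} and {0, 1}, and the minimum support is 2. The sketch then reports atom 0 (support 3), atom 1 (support 2) and the pair {0, 1} (support 2) as frequent.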
Abstract:
We present a method to enhance fault localization for software systems based on a frequent pattern mining algorithm. Our method relies on a large set of test cases for a given set of programs in which faults can be detected. The test executions are recorded as function call trees. Based on test oracles, the tests can be classified as successful or failing. A frequent pattern mining algorithm is used to identify frequent subtrees in successful and failing test executions. This information is used to rank functions according to their likelihood of containing a fault. The ranking suggests an order in which to examine the functions during fault analysis. We validate our approach experimentally using a subset of the Siemens benchmark programs.
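The ranking step can be illustrated with a small, hypothetical Python sketch. It makes two simplifications:

- Each call tree is reduced to the set of functions it contains, so only one-node subtrees are considered.
- Each function is scored with a Tarantula-style suspiciousness measure: the fraction of failing runs that contain it, divided by the sum of its failing and passing fractions.

This is a swapped-in, simpler technique, not the subtree-based ranking described in the abstract. The `(name, [children])` tree format and the function names are assumptions.

```python
from collections import Counter

def functions_in(tree):
    """Collect the names of all functions in a call tree given as
    nested (name, [children]) tuples (a hypothetical representation)."""
    name, children = tree
    names = {name}
    for child in children:
        names |= functions_in(child)
    return names

def rank_functions(passing, failing):
    """Rank functions by a Tarantula-style suspiciousness score:
    fail_ratio / (fail_ratio + pass_ratio), where each ratio is the share
    of failing (or passing) runs whose call tree contains the function."""
    pass_counts = Counter(f for t in passing for f in functions_in(t))
    fail_counts = Counter(f for t in failing for f in functions_in(t))
    n_pass, n_fail = max(len(passing), 1), max(len(failing), 1)
    scores = {}
    for f in set(pass_counts) | set(fail_counts):
        fail_ratio = fail_counts[f] / n_fail
        pass_ratio = pass_counts[f] / n_pass
        # The denominator is positive because f occurs in at least one run.
        scores[f] = fail_ratio / (fail_ratio + pass_ratio)
    # Most suspicious first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Consider a function `div` that appears only in the failing run. It ranks first with a score of 1.0. By contrast, `main` and `parse` appear in every run and score 0.5, so they would be examined last.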
Abstract:
Frequent pattern discovery in structured data is receiving increasing attention in many scientific application areas. However, the computational complexity and the large amount of data to be explored often make sequential algorithms unsuitable. In this context, high-performance distributed computing becomes a very interesting and promising approach. In this paper we present a parallel formulation of the frequent subgraph mining problem to discover interesting patterns in molecular compounds. The application is characterized by a highly irregular tree-structured computation. No estimate is available for task workloads, which follow a power-law distribution over a wide range. The proposed approach allows dynamic resource aggregation and provides fault and latency tolerance. These features make the distributed application suitable for multi-domain heterogeneous environments, such as computational Grids. The distributed application has been evaluated on the well-known National Cancer Institute's HIV-screening dataset.
Abstract:
Analysis and application of data mining processes to the information flow of real-time systems. Implementation and analysis of a self-adaptive algorithm for mining frequent patterns on automatic machines.
Abstract:
A large epidemic of serogroup B meningococcal disease (MD) has been occurring in greater São Paulo, Brazil, since 1988 [21]. A Cuban-produced vaccine was given to about 2.4 million children aged 3 months to 6 years during 1989 and 1990. The vaccine was based on outer membrane protein (OMP) from serogroup B, serotype 4, serosubtype P1.15 (B:4:P1.15) Neisseria meningitidis. Its administration had little or no measurable effect on this outbreak. To detect clonal changes that could explain the continued increase in the incidence of disease after vaccination, we serotyped isolates recovered between 1990 and 1996 from 834 patients with systemic disease. Strain B:4:P1.15, which was detected in the area as early as 1977, has been the most prevalent phenotype since 1988. These strains are still prevalent in the area and were responsible for about 68% of the 834 serogroup B cases in the last 7 years. We analyzed 438 (52%) of these strains by restriction fragment length polymorphism (RFLP) analysis of rRNA genes (ribotyping). The most frequent pattern obtained was designated Rb1 (68%). We concluded that the same clone, B:4:P1.15-Rb1, was the most prevalent strain. Despite the vaccination trial, this clone was responsible for the continued increase in the incidence of serogroup B MD in greater São Paulo during the last 7 years.
Abstract:
Purpose: In recent years, MRI has emerged as a method complementary to ultrasound (US) in the diagnosis of congenital lung lesions. Focal homogeneous pulmonary hyperintensity on T2-weighted images (T2-WI) is a frequently observed pattern. Our purpose is to determine whether this finding is associated with a characteristic pulmonary lesion.

Materials and methods: Between 1 January 2000 and 31 December 2007, a total of 50 prenatal MRI examinations were performed at our institution in fetuses with an echographic diagnosis of thoracic pathology. These included 12 cases of suspected congenital pulmonary lesions. Prenatal images were correlated with the post-natal diagnosis.

Results: In 12 cases, fetal MRI detected congenital pulmonary lesions. In 8 patients, typical signs (cystic lesions, septations, anomalous vasculature) clearly suggested a specific pathology. In 4 cases, MRI showed a focal homogeneous increase in signal intensity (SI) on T2-WI in the pathologic lung relative to the normal lung. The final diagnoses in these 4 fetuses were congenital cystic adenomatoid malformation type III (1 patient), segmental emphysema (1 patient) and bronchial atresia (2 cases). In all 4 cases, a significant post-natal reduction in lesion size relative to the prenatal MRI studies was observed.

Conclusion: Our study suggests that a focal increase in lung SI on T2-WI is a non-specific sign of congenital lung disease, present in different pathologies. Therefore, a prospective diagnosis is not possible.
Abstract:
This paper analyses the role of prosody in parenthetical insertions, a type of structure that is extremely common in both speech and writing. The materials under study come from a corpus of spontaneous speech acts in Central Catalan (with varying degrees of spontaneity), from which a corpus of oral parenthetical insertions has been compiled. The prototypical prosodic features of a parenthetical insertion in Catalan are:

- prosodic autonomy;
- limited length;
- production between pauses or before a final pause;
- a tendency towards acceleration;
- a fall in intensity;
- a lower pitch range;
- finally, a falling or rising melodic pattern.

A final fall is the most frequent pattern in spontaneous conversations with a high degree of familiarity between interlocutors. A final rise, by contrast, is found in interviews where the degree of familiarity between participants is lower, their roles are unequal, and the interviewee constructs a narrative discourse. We thus suggest that the pitch contour of parenthetical insertions is related to formality and discourse type (in this case, narrative vs. dialogue). Bearing in mind the discursive functions performed by these insertions, we propose a typology that classifies them according to two main functions: completion of information and modalisation.
Abstract:
In this paper, moving flock patterns are mined from spatio-temporal datasets by incorporating a clustering algorithm. A flock is defined as a set of objects that move together for a certain continuous amount of time. Using clustering algorithms to find moving flock patterns is a promising way to discover frequent movement patterns in large trajectory datasets. In this approach, the clustering algorithm used is SPatial clusteRing algoRithm thrOugh sWarm intelligence (SPARROW). The advantage of SPARROW is that it can effectively discover clusters of widely varying sizes and shapes in large databases. Variations of the proposed method are discussed. The experimental results show that the method addresses the problems of scalability and duplicate pattern formation. The method also reduces the number of patterns produced.
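To make the flock definition concrete, the following Python sketch checks whether a given group of objects forms a flock. It rests on two assumptions. First, trajectories are time-aligned, with one `(x, y)` position per object per timestamp. Second, "moving together" means every member stays within a radius `r` of the group's centroid for at least `min_duration` consecutive timestamps; this is a simplification of the usual disk-based flock definition. The sketch only tests a candidate group. It does not perform the SPARROW clustering used in the abstract to find candidates, and all names are hypothetical.

```python
import math

def within_radius(points, r):
    """True if every point lies within distance r of the points' centroid."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return all(math.hypot(x - cx, y - cy) <= r for x, y in points)

def is_flock(trajectories, group, r, min_duration):
    """Return True if the objects in `group` stay within radius r of their
    centroid for at least `min_duration` consecutive timestamps.
    `trajectories` maps object id -> list of (x, y), one entry per timestamp."""
    n_steps = min(len(trajectories[obj]) for obj in group)
    run = 0  # length of the current run of "together" timestamps
    for t in range(n_steps):
        if within_radius([trajectories[obj][t] for obj in group], r):
            run += 1
            if run >= min_duration:
                return True
        else:
            run = 0  # the group split up, so the continuous run restarts
    return False
```

The consecutive-run counter is what encodes the "continuous amount of time" requirement. A group that is close together at scattered, non-adjacent timestamps does not qualify.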
Abstract:
The ferric uptake regulator protein Fur regulates iron-dependent gene expression in bacteria. In the human pathogen Helicobacter pylori, Fur has been shown to regulate both iron-induced and iron-repressed genes. Herein we investigate the molecular mechanisms that control this differential iron-responsive Fur regulation. Hydroxyl radical footprinting showed that Fur has different binding architectures, which characterize distinct operator typologies.

On operators recognized with higher affinity by holo-Fur, the protein binds to a continuous AT-rich stretch of about 20 bp, displaying an extended protection pattern. This is indicative of the protein wrapping around the DNA helix. DNA binding interference assays with the minor-groove-binding drug distamycin A indicate that holo-operators are recognized through the minor groove of the DNA. By contrast, on apo-operators, Fur binds primarily to thymine dimers within a newly identified TCATTn10TT consensus element. This is indicative of Fur binding to one side of the DNA, in the major groove of the double helix. Reconstituting the TCATTn10TT motif within a holo-operator swaps its binding features from those of a holo-Fur-recognized operator to those of an apo-Fur-recognized one. The swap affects both the affinity and the binding architecture of Fur, and confers apo-Fur repression features in vivo.

Size exclusion chromatography indicated that Fur is a dimer in solution. However, in the presence of divalent metal ions the protein is able to multimerize. Accordingly, apo-Fur binds DNA as a dimer in gel shift assays, while in the presence of iron, higher-order complexes are formed. Stoichiometric Ferguson analysis indicates that these complexes correspond to one or two Fur tetramers, each bound to an operator element.
Together, these data suggest that the apo- and holo-Fur repression mechanisms rely on two distinct modes of operator recognition. Apo-operators are recognized through readout of a specific nucleotide consensus motif in the major groove, and holo-operators through recognition of AT-rich stretches in the minor groove. The iron-responsive binding affinity, in turn, is controlled through metal-dependent shaping of the protein structure, so that it preferentially matches either the major or the minor groove.
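The TCATTn10TT notation denotes TCATT, then any ten nucleotides (n10), then TT. A minimal Python sketch of scanning a sequence for this element is shown below. It assumes exact matches on the given strand only, whereas real binding sites may tolerate mismatches. `find_motif` is a hypothetical helper, not a tool from the study.

```python
import re

# TCATT, then any ten nucleotides (the "n10"), then TT.
# The zero-width lookahead (?=...) lets overlapping occurrences be reported.
MOTIF = re.compile(r"(?=(TCATT[ACGT]{10}TT))")

def find_motif(seq):
    """Return (start, site) pairs for every exact TCATTn10TT occurrence
    on the given strand of a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(seq)]
```

For instance, `find_motif("GGTCATTACGTACGTACTTGG")` finds one site starting at position 2. A spacer of nine nucleotides instead of ten yields no match.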
Abstract:
Independent and-parallelism, dependent and-parallelism and or-parallelism are the three main forms of implicit parallelism present in logic programs. In this paper we present a model, IDIOM, which exploits all three forms of parallelism in a single framework. IDIOM is based on a combination of the Basic Andorra Model and the Extended And-Or Tree Model. Our model supports both Prolog and the flat concurrent logic languages. We discuss the issues that arise in combining the three forms of parallelism, and our solutions to them. We also present an implementation scheme for IDIOM based on binding arrays.
Abstract:
Final project of the Integrated Master's Degree in Medicine, Faculdade de Medicina, Universidade de Lisboa, 2014
Abstract:
Many systems and applications continuously produce events. These events record the status of the systems and trace their behavior. By examining these events, system administrators can identify potential problems in these systems. If the temporal dynamics of the systems are investigated further, underlying patterns can be discovered. The uncovered knowledge can be leveraged to predict future system behavior or to mitigate potential risks. Moreover, system administrators can use the temporal patterns to set up event management rules that make the system more intelligent.

With the popularity of data mining techniques in recent years, these events have gradually become more and more useful. Despite recent advances in data mining techniques, their application to system event mining is still at a rudimentary stage. Most works still focus on episode mining or frequent pattern discovery. These methods are unable to provide a brief yet comprehensible summary that reveals the valuable information from a high-level perspective. Moreover, they provide little actionable knowledge to help system administrators manage the systems better. To make better use of the recorded events, more practical techniques are required.

From the perspective of data mining, three correlated directions are considered helpful for system management:

(1) providing concise yet comprehensive summaries of the running status of the systems;
(2) making the systems more intelligent and autonomous;
(3) effectively detecting abnormal behaviors of the systems.

Owing to the richness of the event logs, all these directions can be addressed in a data-driven manner. In this way, the robustness of the systems can be enhanced and the goal of autonomous management can be approached.
This dissertation mainly focuses on the foregoing directions, leveraging temporal mining techniques to facilitate system management. More specifically, three concrete topics will be discussed: event summarization, resource demand prediction, and streaming anomaly detection. Besides the theoretical contributions, an experimental evaluation will also be presented to demonstrate the effectiveness and efficacy of the corresponding solutions.
Abstract:
Gas-liquid two-phase flow is very common in industrial applications, especially in the oil and gas, chemical, and nuclear industries. As conditions such as the phase flow rates, the pipe diameter and the physical properties of the fluids change, different configurations called flow patterns arise. In oil production, the most frequent pattern is slug flow, in which continuous liquid plugs (liquid slugs) and gas-dominated regions (elongated bubbles) alternate. Offshore scenarios in which the pipe lies on the seabed with slight changes of direction are extremely common. With these scenarios and issues in mind, this work presents an experimental study of two-phase gas-liquid slug flow in a duct with a slight change of direction. The duct consists of a horizontal section followed by a downward-sloping pipe stretch. The experiments were carried out at NUEM (Núcleo de Escoamentos Multifásicos UTFPR). The flow was initiated and developed under controlled conditions, and its characteristic parameters were measured with resistive sensors installed at four pipe sections. Two high-speed cameras were also used. The measurements were used to evaluate how a slight change of direction influences the slug flow structures. They were also used to evaluate its influence on the transition between slug flow and stratified flow in the downward section.
Abstract:
In contrast with mammals and birds, most poikilothermic vertebrates feature structurally undifferentiated sex chromosomes. This may result either from frequent turnovers or from occasional events of XY recombination. The latter mechanism was recently suggested to be responsible for sex-chromosome homomorphy in European tree frogs (Hyla arborea). However, not a single case of male recombination has been identified in large-scale laboratory crosses. Moreover, populations from NW Europe consistently display sex-specific allelic frequencies with male-diagnostic alleles, suggesting the absence of recombination in their recent history. To address this apparent paradox, we extended the phylogeographic scope of investigations by analyzing the sequences of three sex-linked markers throughout the whole species distribution. Refugial populations (southern Balkans and Adriatic coast) show a mix of X and Y alleles in haplotypic networks. They also show no more within-individual pairwise nucleotide differences in males than in females, testifying to recurrent XY recombination. In contrast, populations of NW Europe, which originated from a recent postglacial expansion, show a clear pattern of XY differentiation. In these populations, the X and Y gametologs of the sex-linked gene Med15 present different alleles, likely fixed by drift on the wave front of expansions and kept differentiated since. Our results support the view that sex-chromosome homomorphy in H. arborea is maintained by occasional or historical events of recombination. Whether the frequency of these events indeed differs between populations remains to be clarified.