776 results for Pattern discovery
Abstract:
Pattern discovery in temporal event sequences is of great importance in many application domains, such as telecommunication network fault analysis. In reality, not every type of event has an accurate timestamp. Some of them, defined as inaccurate events, may have only an interval as their possible time of occurrence. The existence of inaccurate events may cause uncertainty in event ordering. The traditional support model cannot deal with this uncertainty, so some interesting patterns would be missed. A new concept, precise support, is introduced to evaluate the probability that a pattern is contained in a sequence. Based on this new metric, we define the uncertainty model and present an algorithm to discover interesting patterns in a sequence database that has one type of inaccurate event. In our model, the number of types of inaccurate events can readily be extended to k, although at the cost of increased computational complexity.
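As an illustration of how interval-valued timestamps make event ordering probabilistic, the minimal sketch below computes the probability that an accurately timestamped event A precedes an inaccurate event B, and then averages that probability over a set of sequences. The abstract does not state the definition of precise support, so the uniform distribution over the interval, the averaging step, and the names `prob_a_before_b` and `average_containment_probability` are illustrative assumptions, not the authors' model.

```python
from typing import Sequence, Tuple

Interval = Tuple[float, float]


def prob_a_before_b(t_a: float, b_interval: Interval) -> float:
    """Probability that event A (exact time t_a) occurs before event B.

    Assumption: B's true time is uniformly distributed over b_interval.
    """
    lo, hi = b_interval
    if hi <= lo:
        # Degenerate interval: B effectively has an exact timestamp.
        return 1.0 if t_a < lo else 0.0
    # Fraction of the interval that lies after t_a, clamped to [0, 1].
    return min(1.0, max(0.0, (hi - t_a) / (hi - lo)))


def average_containment_probability(
    sequences: Sequence[Tuple[float, Interval]]
) -> float:
    """Average probability that the pattern "A then B" holds per sequence.

    Assumption: each sequence contributes one (t_a, b_interval) pair, and
    the database-level score is the plain mean of per-sequence probabilities.
    """
    if not sequences:
        return 0.0
    total = sum(prob_a_before_b(t_a, iv) for t_a, iv in sequences)
    return total / len(sequences)
```

A traditional support count would have to treat each of these sequences as either containing the pattern or not; the probability-weighted score keeps the partial evidence from sequences where the order is uncertain.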
Abstract:
Driven by the growth of the Internet and the Semantic Web, together with faster communication and rapidly increasing storage capacity, the volume of data and information rises considerably every day. Because of this, in the last few years there has been growing interest in structures for formal representation with suitable characteristics, such as the ability to organize data and information and to reuse their contents for the generation of new knowledge. Controlled vocabularies, specifically ontologies, stand out as one such representation structure with high potential. They allow not only data representation but also the reuse of such data for knowledge extraction, coupled with its subsequent storage through relatively simple formalisms. However, to ensure that the knowledge in an ontology is always up to date, ontologies need maintenance. Ontology Learning is the area that studies the details of updating and maintaining ontologies. The relevant literature already presents first results on the automatic maintenance of ontologies, but this work is still at a very early stage. Human-based processes are still the current way to update and maintain an ontology, which makes this a cumbersome task. The generation of new knowledge for ontology growth can be based on Data Mining, an area that studies techniques for data processing, pattern discovery and knowledge extraction in IT systems. This work proposes a novel semi-automatic method for knowledge extraction from unstructured data sources using Data Mining techniques, namely pattern discovery, focused on improving the precision of the concepts and semantic relations present in an ontology. To verify the applicability of the proposed method, a proof of concept was developed and applied to the building and construction sector, and its results are presented.
Abstract:
Frequent pattern discovery in structured data is receiving increasing attention in many areas of science. However, the computational complexity and the large amount of data to be explored often make sequential algorithms unsuitable. In this context, high-performance distributed computing becomes a very interesting and promising approach. In this paper we present a parallel formulation of the frequent subgraph mining problem to discover interesting patterns in molecular compounds. The application is characterized by a highly irregular tree-structured computation. No estimation is available for task workloads, which show a power-law distribution over a wide range. The proposed approach allows dynamic resource aggregation and provides fault and latency tolerance. These features make the distributed application suitable for multi-domain heterogeneous environments, such as computational Grids. The distributed application has been evaluated on the well-known National Cancer Institute’s HIV-screening dataset.
Abstract:
Pattern discovery in a long temporal event sequence is of great importance in many application domains. Most previous work focuses on identifying positive associations among time-stamped event types. In this paper, we introduce the problem of defining and discovering negative associations that, like positive rules, may also serve as a source of knowledge discovery. In general, an event-oriented pattern is a pattern that is associated with a selected type of event, called a target event. As a counterpart to previous research, we identify patterns that have a negative relationship with the target events. A set of criteria is defined to evaluate the interestingness of patterns associated with such negative relationships. For counting the frequency of a pattern, we propose a new approach, called unique minimal occurrence, which guarantees that the Apriori property holds for all patterns in a long sequence. Based on the interestingness measures, algorithms are proposed to discover potentially interesting patterns for this negative rule problem. Finally, experiments are conducted on a real application.
Abstract:
A major task of traditional temporal event sequence mining is to predict the occurrences of a special type of event (called the target event) in a long temporal sequence. Our previous work defined a new type of pattern, called an event-oriented pattern, which can potentially predict the target event within a certain period of time. However, in event-oriented pattern discovery, because the size of the prediction interval is predefined, the mining results may be inaccurate and carry misleading information. In this paper, we introduce a new concept, called the temporal feature, to rectify this shortcoming. Generally, for any event-oriented pattern discovered under a given interval size, the temporal feature is the minimal interval size that makes the pattern interesting. Thus, by further investigating the temporal features of discovered event-oriented patterns, we can refine the knowledge used for target event prediction.
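The idea of a minimal interval size can be sketched as follows. The abstract does not say which interestingness measure is used, so this example assumes a simple confidence measure: the fraction of pattern occurrences that are followed by a target event within the interval. The function name `temporal_feature`, its parameters, and the confidence threshold are hypothetical and serve only to illustrate the concept.

```python
import bisect
import math
from typing import Optional, Sequence


def temporal_feature(
    pattern_times: Sequence[float],
    target_times: Sequence[float],
    max_interval: float,
    min_confidence: float,
) -> Optional[float]:
    """Smallest interval T <= max_interval such that at least min_confidence
    of the pattern occurrences are followed by a target event within T.

    Assumption: "interesting" means confidence >= min_confidence.
    Returns None if no interval up to max_interval reaches the threshold.
    """
    targets = sorted(target_times)
    gaps = []
    for t in pattern_times:
        # Index of the first target event strictly after the pattern occurrence.
        i = bisect.bisect_right(targets, t)
        if i < len(targets):
            gap = targets[i] - t
            if gap <= max_interval:
                gaps.append(gap)

    # Number of pattern occurrences that must be followed by a target event.
    needed = math.ceil(min_confidence * len(pattern_times))
    if needed == 0 or len(gaps) < needed:
        return None

    # The needed-th smallest gap is the tightest interval meeting the threshold.
    gaps.sort()
    return gaps[needed - 1]
```

Under this assumption, a pattern found with a predefined interval of 10 time units might turn out to need only 6, which is the kind of refinement the abstract describes.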
Abstract:
The synthesis and biological evaluation of novel 1-aryl-3-[2-, 3- or 4-(thieno[3,2-b]pyridin-7-ylthio)phenyl]ureas 3, 4 and 5 as VEGFR-2 tyrosine kinase inhibitors are reported. The 1-aryl-3-[3-(thieno[3,2-b]pyridin-7-ylthio)phenyl]ureas 4a-4h, with the arylurea in the meta position to the thioether, showed the lowest IC50 values in enzymatic assays (10-206 nM), the most potent compounds 4d-4h (IC50 10-28 nM) bearing hydrophobic groups (Me, F, CF3 and Cl) in the terminal phenyl ring. A convincing rationalization was achieved for the most potent compounds 4 as type II VEGFR-2 inhibitors, based on the simultaneous presence of: (1) the thioether linker and (2) the arylurea moiety in the meta position. For compounds 4, significant inhibition of Human Umbilical Vein Endothelial Cell (HUVEC) proliferation (BrdU assay), migration (wound-healing assay) and tube formation was observed at low concentrations. These compounds were also shown to increase apoptosis in the TUNEL assay. Immunostaining for total and phosphorylated (active) VEGFR-2 was performed by Western blotting. The phosphorylation of the receptor was significantly inhibited at 1.0 and 2.5 µM for the most promising compounds. Altogether, these findings point to an antiangiogenic effect in HUVECs.
Abstract:
Macro- and microarrays are well-established technologies to determine gene functions through repeated measurements of transcript abundance. We constructed a chicken skeletal muscle-associated array based on a muscle-specific EST database, which was used to generate a tissue expression dataset of approximately 4500 chicken genes across 5 adult tissues (skeletal muscle, heart, liver, brain, and skin). Only a small number of ESTs were sufficiently well characterized by BLAST searches to determine their probable cellular functions. Evidence of a particular tissue-characteristic expression can be considered an indication that the transcript is likely to be functionally significant. The skeletal muscle macroarray platform was first used to search for evidence of tissue-specific expression, focusing on the biological function of genes/transcripts, since gene expression profiles generated across tissues were found to be reliable and consistent. Hierarchical clustering analysis revealed consistent clustering among genes assigned to 'developmental growth', such as the ontology genes and germ layers. Accuracy of the expression data was supported by comparing information from known transcripts and the tissue from which each transcript was derived with the macroarray data. Hybridization assays resulted in consistent tissue expression profiles, which will be useful to dissect tissue-regulatory networks and to predict functions of novel genes identified after extensive sequencing of the genomes of model organisms. Screening our skeletal-muscle platform using 5 chicken adult tissues allowed us to identify 43 'tissue-specific' transcripts and 112 co-expressed uncharacterized transcripts with 62 putative motifs. This platform is also an important tool for the functional investigation of novel genes, for determining expression patterns according to developmental stage, for evaluating differences in muscular growth potential between chicken lines, and for identifying tissue-specific genes.
Abstract:
Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies
Abstract:
BACKGROUND: Genes involved in arbuscular mycorrhizal (AM) symbiosis have been identified primarily by mutant screens, followed by identification of the mutated genes (forward genetics). In addition, a number of AM-related genes have been identified by their AM-related expression patterns, and their function has subsequently been elucidated by knock-down or knock-out approaches (reverse genetics). However, genes that are members of functionally redundant gene families, or genes that have a vital function and therefore result in lethal mutant phenotypes, are difficult to identify. If such genes are constitutively expressed and therefore escape differential expression analyses, they remain elusive. The goal of this study was to systematically search for AM-related genes with a bioinformatics strategy that is insensitive to these problems. The central element of our approach is based on the fact that many AM-related genes are conserved only among AM-competent species. RESULTS: Our approach involves genome-wide comparisons at the proteome level of AM-competent host species with non-mycorrhizal species. Using a clustering method, we first established orthologous/paralogous relationships and subsequently identified protein clusters that contain members only of the AM-competent species. Proteins of these clusters were then analyzed in an extended set of 16 plant species and ranked based on their relatedness among AM-competent monocot and dicot species, relative to non-mycorrhizal species. In addition, we combined the information on the protein-coding sequence with gene expression data and with promoter analysis. As a result, we present a list of as yet uncharacterized proteins that show a strongly AM-related pattern of sequence conservation, indicating that the respective genes may have been under selection for a function in AM. Among the top candidates are three genes that encode a small family of similar receptor-like kinases that are related to the S-locus receptor kinases involved in sporophytic self-incompatibility. CONCLUSIONS: We present a new systematic strategy of gene discovery based on conservation of the protein-coding sequence that complements classical forward and reverse genetics. This strategy can be applied to other diverse biological phenomena if species with established genome sequences fall into distinct groups that differ in a defined functional trait of interest.
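The cluster-selection step described under RESULTS can be sketched as a simple set filter: keep only those protein clusters whose members all come from AM-competent species. This is a minimal illustration under stated assumptions, not the authors' pipeline. The input format, the function name `am_specific_clusters`, and the ordering by number of species covered (a crude stand-in for the relatedness-based ranking mentioned in the abstract) are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# A cluster maps to its members, each given as (species, protein_id).
Clusters = Dict[str, List[Tuple[str, str]]]


def am_specific_clusters(clusters: Clusters, am_species: Set[str]) -> List[str]:
    """Return ids of clusters whose members come only from AM-competent species.

    Assumption: results are ordered by how many AM-competent species a cluster
    covers (most first), then by cluster id for a stable order.
    """
    coverage: Dict[str, Set[str]] = {}
    for cluster_id, members in clusters.items():
        species = {sp for sp, _protein in members}
        # Keep non-empty clusters with no member from a non-mycorrhizal species.
        if species and species <= am_species:
            coverage[cluster_id] = species
    return sorted(coverage, key=lambda cid: (-len(coverage[cid]), cid))
```

For example, with AM-competent species `{"medicago", "rice"}`, a cluster containing one Medicago and one rice protein is kept, while a cluster that also contains an Arabidopsis protein is discarded. The protein identifiers in such an input would be placeholders.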
Abstract:
The paper analyzes the effects of strategic behavior by an insider in a price discovery process, akin to an information tatonnement, in the presence of a competitive informed sector. Such processes are used in the preopening period of continuous trading systems in several exchanges. It is found that the insider manipulates the market using a contrarian strategy in order to neutralize the effect of the trades of competitive informed agents. Furthermore, consistent with the available empirical evidence, we find that information revelation accelerates close to the opening, that the market price does not converge to the fundamental value no matter how many rounds the tatonnement has, and that the expected trading volume displays a U-shaped pattern. We also find that a market with a larger competitive sector (smaller insider) has improved informational efficiency and increased trading volume. The insider provides a public good (a lower informativeness of the price) for the competitive informed sector.
Abstract:
Apoptotic beta cell death is an underlying cause mainly of type I and, to a lesser extent, of type II diabetes. Recently, MST1 kinase was identified as a key apoptotic agent in diabetic conditions. In this study, I have examined MST1 and the closely related kinases MST2, MST3 and MST4, aiming to tackle diabetes by exploring ways to selectively block MST1 kinase activity. The first investigation was directed towards evaluating the possibility of selectively blocking the ATP binding site of MST1 kinase, which is essential for the activity of the enzyme. Structure and sequence analyses of this site, however, revealed near-absolute conservation among the MSTs and very few changes relative to other kinases. The observed residue variations also displayed similar physicochemical properties, making selective inhibition of the enzyme difficult. Second, the possibility of allosteric inhibition of the enzyme was evaluated. Analysis of the recognized allosteric site posed the same problem, as the MSTs share almost all of the same residues. The third analysis was made on the SARAH domain, which is required for the dimerization and activation of MST1 and MST2 kinases. MST3 and MST4 lack this domain, hence selectivity against these two kinases can be achieved. Other proteins with SARAH domains, such as the RASSF proteins, were also examined. Their interactions with the MST1 SARAH domain were evaluated in order to mimic their binding pattern and design a peptide inhibitor that interferes with MST1 SARAH dimerization. In molecular simulations, the RASSF5 SARAH domain was shown to interact strongly with the MST1 SARAH domain and possibly to prevent MST1 SARAH dimerization. Based on this, a peptide inhibitor based on the sequence of the RASSF5 SARAH domain is suggested. Since the MST2 kinase also interacts with the RASSF5 SARAH domain, absolute selectivity might not be achieved.
Abstract:
Growth of the maize (Zea mays) endosperm is tightly regulated by maternal zygotic and sporophytic genes, some of which are subject to a parent-of-origin effect. We report here a novel gene, maternally expressed gene1 (meg1), which shows a maternal parent-of-origin expression pattern during early stages of endosperm development but biallelic expression at later stages. Interestingly, a stable reporter fusion containing the meg1 promoter exhibits a similar pattern of expression. meg1 is exclusively expressed in the basal transfer region of the endosperm. Further, we show that the putatively processed MEG1 protein is glycosylated and subsequently localized to the labyrinthine ingrowths of the transfer cell walls. Hence, the discovery of a parent-of-origin gene expressed solely in the basal transfer region opens the door to epigenetic mechanisms operating in the endosperm to regulate certain aspects of nutrient trafficking from the maternal tissue into the developing seed.
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
This paper describes a data mining environment for knowledge discovery in bioinformatics applications. The system has a generic kernel that implements the mining functions to be applied to primary input databases of biomedical information that have a warehouse architecture. Both supervised and unsupervised classification can be implemented within the kernel and applied to data extracted from the primary database, with the results being suitably stored in a complex object database for knowledge discovery. The kernel also includes a specific high-performance library that allows the mining functions to be designed for and applied on parallel machines. The experimental results obtained by applying the kernel functions are reported. © 2003 Elsevier Ltd. All rights reserved.