895 results for sequence data mining


Relevance:

90.00%

Publisher:

Abstract:

Data mining of the Eucalyptus EST genome database found four clusters (EGCEST2257E11.g, EGBGRT3213F11.g, and EGCCFB1223H11.g) belonging to the highly conserved 14-3-3 protein family, which modulates a wide variety of cellular processes. Multiple alignments were built from twenty-four 14-3-3 protein sequences retrieved from the GenBank databases and from the four pools of the Eucalyptus genome programs. The alignment revealed two highly conserved regions in the sequences corresponding to protein phosphorylation motifs, and nine highly conserved regions corresponding to the linkage regions of the alpha-helix structure, based on the three-dimensional structure of the functional dimer. Amino acid differences within the structural and functional domains of plant 14-3-3 proteins were identified and can explain the functional diversity of the different isoforms. Phylogenetic protein trees were built by the maximum parsimony and neighbor-joining methods from Clustal X alignments, using PAUP software for the phylogenetic analysis.
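The abstract does not say how conservation was scored along the alignment. As a minimal sketch only, assuming a simple per-column identity score rather than whatever criterion the authors applied to their Clustal X output, conserved regions in a set of aligned protein sequences could be flagged as below. The function names, the 0.9 threshold, the 3-column minimum run and the toy sequences are all hypothetical.

```python
from collections import Counter


def column_identity(alignment):
    """Per-column fraction of sequences sharing the most common residue.

    Gaps ('-') never count towards the most common residue.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    scores = []
    for i in range(length):
        residues = [seq[i] for seq in alignment if seq[i] != "-"]
        if not residues:
            scores.append(0.0)
            continue
        _, count = Counter(residues).most_common(1)[0]
        scores.append(count / len(alignment))
    return scores


def conserved_regions(scores, threshold=0.9, min_length=3):
    """Half-open (start, end) runs of columns whose score is >= threshold."""
    regions, start = [], None
    for i, score in enumerate(scores + [float("-inf")]):  # sentinel closes a final run
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            if i - start >= min_length:
                regions.append((start, i))
            start = None
    return regions


# Toy alignment (invented sequences, not real 14-3-3 data).
toy = ["MKRSLA", "MKRTLA", "MKRSLG"]
print(conserved_regions(column_identity(toy)))  # [(0, 3)]
```

A column-identity score treats all substitutions alike; scores based on a substitution matrix such as BLOSUM would rank conservative replacements higher, at the cost of extra machinery.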

Relevance:

90.00%

Publisher:

Abstract:

Oxidative stress, which generates active oxygen species, has been shown to be one of the underlying causes of tissue injury after Eucalyptus (Eucalyptus spp.) plants are exposed to a wide variety of stress conditions. The objective of this study was to use data mining to identify favorable genes and alleles associated with the enzyme systems superoxide dismutase, catalase, peroxidases, and glutathione S-transferase, which are related to tolerance of environmental stresses and of damage caused by pests, diseases, herbicides, and weeds themselves. This was done using the eucalyptus expressed sequence tag (EST) database (https://forests.esalq.usp.br). Alignments between amino acid and nucleotide sequences indicated that the studied enzymes are adequately represented in the EST database of the FORESTs project.

Relevance:

90.00%

Publisher:

Abstract:

In this study, we report the cloning and nucleotide sequence of PCR-generated 5S rDNA from the tilapiine cichlid fish Oreochromis niloticus. Two types of 5S rDNA were detected, differing by insertions and/or deletions and by base substitutions within the non-transcribed spacer (NTS). Two 5S rDNA loci were observed by fluorescence in situ hybridization (FISH) in metaphase spreads of tilapia chromosomes. FISH with an 18S rDNA probe, together with sequential silver nitrate staining of the 5S-FISH slides, revealed three 18S rDNA loci that are not syntenic with the 5S rDNA loci.

Relevance:

90.00%

Publisher:

Abstract:

Chromobacterium violaceum is one of millions of species of free-living microorganisms that populate the soil and water in the extant areas of tropical biodiversity around the world. Its complete genome sequence reveals (i) extensive alternative pathways for energy generation, (ii) ≈500 ORFs for transport-related proteins, (iii) complex and extensive systems for stress adaptation and motility, and (iv) widespread use of quorum sensing to control inducible systems, all of which underpin the versatility and adaptability of the organism. The genome also contains extensive but incomplete arrays of ORFs coding for proteins associated with mammalian pathogenicity, possibly involved in the occasional but often fatal cases of human C. violaceum infection. In addition, there is a series of previously unknown but important enzymes and secondary metabolites, including paraquat-inducible proteins, drug- and heavy-metal-resistance proteins, multiple chitinases, and proteins for the detoxification of xenobiotics, that may have biotechnological applications.
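Genome-wide counts such as "≈500 ORFs" come from ORF prediction. The annotation pipeline used for C. violaceum is not described here; purely as an illustrative sketch, assuming the standard genetic code and scanning only the three forward reading frames, a naive ATG-to-stop scan looks like this. `find_orfs`, its 100-codon default cutoff and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=100):
    """Forward-strand ORFs as half-open (start, end) coordinates.

    Each ORF runs from an ATG to the first in-frame stop codon, stop included.
    ATGs nested inside an ORF already found in the same frame are skipped.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            j = i
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):
                break  # no in-frame stop codon before the end of the sequence
            if (j + 3 - i) // 3 >= min_codons:
                orfs.append((i, j + 3))
            i = j + 3
    return sorted(orfs)


print(find_orfs("CCATGCCCTAGGG", min_codons=2))  # [(2, 11)]
```

Real gene finders also scan the reverse complement, allow alternative start codons and use statistical models of coding sequence, so a scan this naive would both miss and over-call genes.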

Relevance:

90.00%

Publisher:

Abstract:

The genome sequence of Leifsonia xyli subsp. xyli, which causes ratoon stunting disease and affects sugarcane worldwide, was determined. The single circular chromosome of Leifsonia xyli subsp. xyli CTCB07 was 2.6 Mb long, with a GC content of 68% and 2,044 predicted open reading frames. The analysis also revealed 307 predicted pseudogenes, more than in any other bacterial plant pathogen sequenced to date. Many of these pseudogenes, if functional, would likely be involved in the degradation of plant heteropolysaccharides, uptake of free sugars, and synthesis of amino acids. Although L. xyli subsp. xyli has been found colonizing only the xylem vessels of sugarcane, its numbers of predicted regulatory genes and sugar transporters are similar to those of free-living organisms. Some of the predicted pathogenicity genes appear to have been acquired by lateral transfer; these include genes for cellulase, pectinase, wilt-inducing protein, lysozyme, and desaturase. The desaturase may contribute to stunting, since it is likely involved in the synthesis of abscisic acid, a hormone that arrests growth. Our findings are consistent with the nutritionally fastidious behavior of L. xyli subsp. xyli and suggest an ongoing adaptation to the restricted ecological niche it inhabits.
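GC content is the share of G and C among the sequenced bases. A minimal sketch, assuming ambiguous symbols such as N are simply excluded from the denominator (a choice the abstract does not specify); `gc_content` is a hypothetical helper:

```python
def gc_content(seq):
    """Percentage of G + C among the unambiguous bases A, C, G and T."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


print(gc_content("GGCAATNN"))  # 50.0
```

For scale, 68% of a 2.6 Mb chromosome corresponds to roughly 1.77 million G and C bases.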

Relevance:

90.00%

Publisher:

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevance:

90.00%

Publisher:

Abstract:

Cysticercosis is one of the most important zoonoses, not only because of its effects on animal health and its economic consequences, but also because of the serious danger it poses to humans. The two main parasites involved in the taeniasis-cysticercosis complex in Brazil are Taenia saginata and Taenia solium. Differentiating between these two parasites is important both for disease control and for epidemiological studies. The purpose of this work was to identify genetic markers that could be used to differentiate these parasites. Of 120 oligonucleotide decamers tested in random amplified polymorphic DNA (RAPD) assays, 107 discriminated between the two Taenia species. Twenty-one species-specific DNA fragments were chosen for cloning and sequencing. Seven RAPD markers were converted into sequence characterized amplified region (SCAR) markers, two specific for T. saginata and five specific for T. solium, as shown by agarose gel electrophoresis. These markers were developed as potential tools for differentiating T. solium from T. saginata in epidemiological studies. © 2007 Elsevier Inc. All rights reserved.

Relevance:

90.00%

Publisher:

Abstract:

The Molossidae species Cynomops abrasus (2n = 34, fundamental number FN = 64), Eumops auripendulus (2n = 42, FN = 62), Molossus rufus (2n = 48, FN = 64), Molossops temminckii (2n = 48, FN = 64), and Nyctinomops laticaudatus (2n = 48, FN = 64), and the Phyllostomidae species Phyllostomus discolor (2n = 32, FN = 60), have karyotypes with different chromosome and fundamental numbers, different localization of constitutive heterochromatin, and different numbers and locations of nucleolar organizer regions (NORs). Fluorescence in situ hybridization with a human probe of the telomeric sequence (TTAGGG)n produced fluorescent signals in the telomeric regions of the chromosomes of all six bat species; in E. auripendulus, pericentromeric signals were also observed in the acrocentric and subtelocentric chromosomes. An association of telomeric sequences with NORs, and of telomeric sequences with constitutive heterochromatin, was detected in the NOR-bearing chromosomes of C. abrasus, M. temminckii, N. laticaudatus, and P. discolor. No interstitial signal was observed in the meta- or submetacentric chromosomes of these species. ©FUNPEC-RP.
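FISH detects the (TTAGGG)n repeat cytologically, not in sequence data. As an in-silico analogue only, which is not part of this study, tandem runs of the telomeric motif (or of its reverse complement CCCTAA on the opposite strand) can be located in a DNA sequence with a regular expression; `telomeric_runs` and the three-copy minimum are hypothetical choices.

```python
import re


def telomeric_runs(seq, min_copies=3):
    """(start, end, copies) for tandem runs of TTAGGG or its reverse complement CCCTAA."""
    pattern = re.compile(
        "(?:TTAGGG){%d,}|(?:CCCTAA){%d,}" % (min_copies, min_copies)
    )
    return [
        (m.start(), m.end(), (m.end() - m.start()) // 6)
        for m in pattern.finditer(seq.upper())
    ]


print(telomeric_runs("AC" + "TTAGGG" * 4 + "GT"))  # [(2, 26, 4)]
```

In an assembled chromosome, interstitial telomeric-like sequences would appear as runs located away from the sequence ends.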

Relevance:

90.00%

Publisher:

Abstract:

The multi-relational data mining approach has emerged as an alternative for analyzing structured data, such as relational databases. Unlike traditional algorithms, multi-relational approaches mine multiple tables directly, avoiding costly join operations. This paper presents a comparative study of the traditional Patricia Mine algorithm and its multi-relational counterpart, MR-Radix, in order to evaluate the performance of the two approaches for mining association rules in relational databases. The study makes two original contributions: the proposal of MR-Radix, a multi-relational algorithm that is efficient on relational databases in terms of both execution time and memory usage; and an empirical demonstration of the performance advantage of the multi-relational approach over several tables, which avoids costly join operations across multiple tables. © 2011 IEEE.
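Neither Patricia Mine nor MR-Radix is reproduced here. As background only, and assuming the standard support and confidence measures for a rule X -> Y, both quantities can be computed over a list of transactions as in this sketch; the function names and the data are invented for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of itemset."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(X union Y) / support(X) for the rule X -> Y (0.0 if X never occurs)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


# Invented example rows, e.g. items bought per customer.
rows = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
print(support(rows, {"bread", "milk"}))       # 0.5
print(confidence(rows, {"bread"}, {"milk"}))  # about 0.667 (0.5 / 0.75)
```

In a multi-relational setting, each "transaction" would be assembled from rows in several tables; avoiding that explicit join is what the multi-relational algorithms aim for, and this sketch does not attempt it.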

Relevance:

90.00%

Publisher:

Abstract:

Multi-relational data mining enables patterns to be mined from multiple tables. Existing multi-relational association rule mining algorithms cannot process large volumes of data, because the memory they require exceeds the memory available. The proposed algorithm, MR-Radix, provides a framework that optimizes memory usage. It also uses partitioning to handle large volumes of data. The original contribution of this proposal is superior performance compared with related algorithms, together with successful completion of association rule mining in large databases, bypassing the problem of limited available memory. In one of the tests, MR-Radix used fourteen times less memory than GFP-growth. © 2011 IEEE.
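The abstract mentions partitioning without detailing how MR-Radix applies it. For orientation only, the sketch below implements the classic two-pass Partition scheme for frequent itemsets, a different and well-known technique, not MR-Radix: each partition is mined locally, and the union of the local results is then counted once over all the data. It assumes small itemsets (up to `max_size` items, enumerated by brute force); all names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def local_frequent(partition, min_support, max_size):
    """Itemsets (sorted tuples, up to max_size items) frequent within one partition."""
    counts = Counter()
    for t in partition:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    threshold = min_support * len(partition)
    return {s for s, c in counts.items() if c >= threshold}


def partitioned_frequent(transactions, min_support, n_parts, max_size=2):
    """Classic two-pass Partition scheme for frequent itemsets."""
    size = math.ceil(len(transactions) / n_parts)
    parts = [transactions[i:i + size] for i in range(0, len(transactions), size)]
    # Pass 1: in a disk-based version only one partition would be in memory at a
    # time; any globally frequent itemset is locally frequent in some partition.
    candidates = set()
    for part in parts:
        candidates |= local_frequent(part, min_support, max_size)
    # Pass 2: one scan over all transactions counts only those candidates.
    counts = Counter()
    for t in transactions:
        items = set(t)
        for c in candidates:
            if items.issuperset(c):
                counts[c] += 1
    threshold = min_support * len(transactions)
    return {c for c in candidates if counts[c] >= threshold}


baskets = [["a", "b"], ["a", "b"], ["a", "c"], ["b", "c"]]
print(sorted(partitioned_frequent(baskets, 0.5, 2)))  # [('a',), ('a', 'b'), ('b',), ('c',)]
```

The scheme relies on a pigeonhole argument: if an itemset reaches the support threshold over all the data, it must reach the same fractional threshold in at least one partition, so pass 1 never loses a frequent itemset.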

Relevance:

90.00%

Publisher:

Abstract:

In soil surveys, several sampling systems can be used to define the most representative sites for sample collection and description of soil profiles. In recent years, the conditioned Latin hypercube sampling system has gained prominence in soil surveys. In Brazil, most soil maps are at small scales and in paper format, which hinders their refinement. The objectives of this work were: (i) to compare two conditioned Latin hypercube sampling systems for mapping soil classes and soil properties; (ii) to retrieve information from a detailed-scale soil map of a pilot watershed for its refinement, comparing two data mining tools, and to validate the new soil map; and (iii) to create and validate a soil map of a much larger, similar area by extrapolating information extracted from the existing soil map. Two sampling systems were created: the conditioned Latin hypercube and the cost-constrained conditioned Latin hypercube. At each survey point, the soil was classified and the thickness of the A horizon was measured. Maps were generated and validated for each sampling system to compare the efficiency of these methods. The conditioned Latin hypercube captured greater variability of soils and soil properties than the cost-constrained conditioned Latin hypercube, although it made field work more difficult. The conditioned Latin hypercube can capture greater soil variability, and the cost-constrained conditioned Latin hypercube has great potential for use in soil surveys, especially in areas of difficult access. From an existing detailed-scale soil map of a pilot watershed, topographic information for each soil class was extracted from a Digital Elevation Model and its derivatives using two data mining tools. Maps were generated with each tool. The more accurate tool was used to extrapolate soil information to a much larger, similar area, and the resulting map was validated.
It was possible to retrieve the information in the existing soil map and apply it to a larger area with similar soil-forming factors at much lower financial cost. The KnowledgeMiner data mining tool, together with ArcSIE (used to create the soil map), gave better results and made it possible to extract soil information from the existing map and apply it to similar, larger areas at reduced cost. This is especially important in developing countries, such as Brazil, that have limited financial resources for such activities.
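The abstract does not give the sampling algorithm. Below is a simplified sketch of the idea behind conditioned Latin hypercube sampling, assuming continuous covariates (for instance, Digital Elevation Model derivatives) known at every candidate site: choose n existing sites so that each covariate's n quantile strata contain exactly one sampled site. The published cLHS method optimizes this with simulated annealing and adds terms for categorical covariates and correlations, and the cost-constrained variant adds an access-cost term; here plain hill climbing over random swaps stands in for all of that, and every name and the toy covariates are hypothetical.

```python
import random


def strata_edges(values, n):
    """n - 1 quantile cut points splitting one covariate into n strata."""
    ordered = sorted(values)
    return [ordered[len(ordered) * k // n] for k in range(1, n)]


def stratum(x, edges):
    """Index (0 .. n - 1) of the stratum containing x."""
    return sum(1 for e in edges if x >= e)


def objective(sample, sites, all_edges):
    """Sum over covariates of |sites in stratum - 1|; 0 is a perfect Latin hypercube."""
    total = 0
    for j, edges in enumerate(all_edges):
        counts = [0] * len(sample)
        for i in sample:
            counts[stratum(sites[i][j], edges)] += 1
        total += sum(abs(c - 1) for c in counts)
    return total


def clhs(sites, n, iterations=2000, seed=0):
    """Choose n site indices by random swaps that never worsen the objective."""
    if len(sites) <= n:
        raise ValueError("need more candidate sites than samples")
    rng = random.Random(seed)
    all_edges = [strata_edges([s[j] for s in sites], n) for j in range(len(sites[0]))]
    sample = rng.sample(range(len(sites)), n)
    best = objective(sample, sites, all_edges)
    for _ in range(iterations):
        pos = rng.randrange(n)
        candidate = rng.choice([i for i in range(len(sites)) if i not in sample])
        trial = sample[:pos] + [candidate] + sample[pos + 1:]
        score = objective(trial, sites, all_edges)
        if score <= best:  # accept ties so the search can cross plateaus
            sample, best = trial, score
    return sorted(sample), best


# Two hypothetical covariates measured at 10 candidate sites.
sites = [(i, (7 * i) % 10) for i in range(10)]
print(clhs(sites, 5))
```

Hill climbing can stall in a local optimum, which is one reason the published method uses annealing instead.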

Relevance:

90.00%

Publisher:

Abstract:

Animal behavioral parameters can be used to assess welfare status in commercial broiler breeders. Behavioral parameters can be monitored with a variety of sensing devices; for instance, video cameras allow comprehensive assessment of animal behavioral expressions. Nevertheless, efficient methods and algorithms are still needed to continuously identify and differentiate animal behavior patterns. The objective of this study was to provide a methodology for identifying the behavior of white broiler breeder hens using combined image processing and computer vision techniques. These techniques were applied to differentiate body shapes in a sequence of frames as the birds expressed their behaviors. The method comprised four stages: (1) identification of body positions and their relationship with typical behaviors, including determination of the number of frames required to identify each behavior; (2) collection of image samples, isolating the birds that expressed a behavior of interest; (3) image processing and analysis using a filter developed to separate white birds from the dark background; and (4) construction and validation of a behavioral classification tree using the Weka software tool (J48 model). The resulting tree had 8 levels and 27 leaves and was validated in two modes: the training-set mode, with an overall success rate of 96.7%, and the cross-validation mode, with an overall success rate of 70.3%. These results confirm the feasibility of the method for identifying white broiler breeder behavior in the particular group studied. Nevertheless, the method can be further improved to increase the overall validation success rate. © 2013 Elsevier B.V. All rights reserved.
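A gap between the training-set (96.7%) and cross-validation (70.3%) success rates is typical: scoring a model on the data it was built from is optimistic. A minimal sketch of k-fold cross-validation, using a 1-nearest-neighbour classifier as a hypothetical stand-in for the Weka decision tree and invented feature vectors (not the study's image features), makes the effect visible; all names are hypothetical.

```python
import random


def predict_1nn(train, x):
    """Label of the training example nearest to x (squared Euclidean distance)."""
    _, label = min(train, key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], x)))
    return label


def accuracy(train, test):
    """Fraction of test examples whose predicted label is correct."""
    return sum(1 for x, y in test if predict_1nn(train, x) == y) / len(test)


def k_fold_accuracy(data, k=10, seed=0):
    """Mean accuracy over k folds; each fold is scored by a model that never saw it."""
    if len(data) < k:
        raise ValueError("need at least k examples")
    items = list(data)
    random.Random(seed).shuffle(items)
    folds = [items[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        scores.append(accuracy(train, test))
    return sum(scores) / k


# Invented 2-D feature vectors with two well-separated classes.
data = [((i, 0), "standing") for i in range(5)] + [((i + 100, 0), "lying") for i in range(5)]
print(accuracy(data, data), k_fold_accuracy(data, k=5))  # 1.0 1.0

# Deliberately noisy toy labels: alternating classes along one feature.
noisy = [((i,), "a" if i % 2 == 0 else "b") for i in range(10)]
print(accuracy(noisy, noisy))       # 1.0 (resubstitution is always perfect here)
print(k_fold_accuracy(noisy, k=5))  # far lower: no better than 0.2 on this toy set
```

With distinct training points, a 1-nearest-neighbour model always scores 100% on its own training set, which is why training-set accuracy says little about performance on unseen birds.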

Relevance:

90.00%

Publisher:

Abstract:

Concept drift, which refers to learning problems that are non-stationary over time, is of increasing importance in machine learning and data mining. Many concept drift applications require a fast response, which means an algorithm must always be (re)trained with the latest available data. However, labeling data is usually expensive and/or time-consuming compared with acquiring unlabeled data, so usually only a small fraction of the incoming data can be labeled. Semi-supervised learning methods may help in this scenario, as they use both labeled and unlabeled data in the training process. However, most of them assume that the data are static. Therefore, semi-supervised learning with concept drift is still an open and challenging task in machine learning. Recently, a particle competition and cooperation approach has been developed to perform graph-based semi-supervised learning from static data. We have extended that approach to handle data streams and concept drift. The result is a passive algorithm that uses a single-classifier approach and adapts naturally to concept changes without any explicit drift detection mechanism. Its built-in mechanisms provide a natural way of learning from new data, gradually "forgetting" older knowledge as older data items stop being useful for classifying newer ones. The proposed algorithm was applied to the KDD Cup 1999 network intrusion data, demonstrating its effectiveness.
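The particle competition and cooperation algorithm is not reproduced here, and the sketch below is neither graph-based nor semi-supervised. Under that simplification, it only illustrates what gradual "forgetting" can look like in a stream learner: a hypothetical nearest-centroid classifier whose class centroids move towards each new labelled example by a factor alpha, so older examples lose influence geometrically. The class name, alpha and the "normal"/"attack" toy stream are invented.

```python
class DecayingCentroidClassifier:
    """Nearest-centroid stream classifier with gradual forgetting.

    Each labelled example moves its class centroid a fraction alpha towards it,
    so an example's influence shrinks geometrically as newer examples arrive.
    """

    def __init__(self, alpha=0.1):
        self.alpha = alpha
        self.centroids = {}

    def learn(self, x, label):
        old = self.centroids.get(label)
        if old is None:
            self.centroids[label] = list(x)
        else:
            self.centroids[label] = [
                (1 - self.alpha) * c + self.alpha * v for c, v in zip(old, x)
            ]

    def predict(self, x):
        return min(
            self.centroids,
            key=lambda lab: sum((c - v) ** 2 for c, v in zip(self.centroids[lab], x)),
        )


clf = DecayingCentroidClassifier(alpha=0.5)
clf.learn((0.0,), "normal")
clf.learn((10.0,), "attack")
print(clf.predict((16.0,)))  # attack
for _ in range(3):  # the "normal" concept drifts towards 20
    clf.learn((20.0,), "normal")
print(clf.predict((16.0,)))  # normal
```

A larger alpha tracks drift faster but makes the centroids noisier; a smaller one is steadier but slower to adapt.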

Relevance:

90.00%

Publisher:

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)