25 results for sequence data mining

in Repositório Institucional UNESP - Universidade Estadual Paulista "Julio de Mesquita Filho"


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Variations in the phenotypic expression of heterozygous beta-thalassemia reflect the formation of different populations. To better understand the profile of heterozygous beta-thalassemia in the Brazilian population, we aimed to establish parameters to guide the diagnosis of carriers and to calculate its frequency from information stored in an electronic database. Using a data mining tool, we evaluated information on 10,960 blood samples stored in a relational database. Over the years, improved diagnostic technology has facilitated the elucidation of suspected beta-thalassemia heterozygote cases, with an average frequency of 3.5% of referred cases. We also found that the Brazilian beta-thalassemia trait shows the classic increases of Hb A2 and Hb F (60%), mainly caused by beta-zero thalassemia mutations, especially in the southeast of the country.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This article introduces the software program EthoSeq, which is designed to extract probabilistic behavioral sequences (tree-generated sequences, or TGSs) from observational data and to prepare a TGS-species matrix for phylogenetic analysis. The program uses graph theory algorithms to automatically detect behavioral patterns within the observational sessions. It includes filtering tools to adjust the search procedure to user-specified statistical needs. Preliminary analyses of data sets, such as grooming sequences in birds and foraging tactics in spiders, uncover a large number of TGSs which together yield single phylogenetic trees. An example of the use of the program is our analysis of felid grooming sequences, in which we obtained 1,386 felid grooming TGSs for seven species, resulting in a single phylogeny. These results show that behavior is definitely useful in phylogenetic analysis. EthoSeq simplifies and automates such analyses, uncovers many of the hidden patterns of long behavioral sequences, and prepares these data for further analysis with standard phylogenetic programs. We hope it will encourage many empirical studies on the evolution of behavior.
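The abstract does not describe EthoSeq's algorithms, so the following is only a minimal sketch of one generic way to turn observation sessions into a probabilistic graph: count first-order transitions between behaviors and keep the edges whose estimated probability passes a threshold. The function name `transition_graph`, the `min_prob` parameter, and the behavior labels are all hypothetical and are not taken from EthoSeq.

```python
from collections import Counter, defaultdict


def transition_graph(sessions, min_prob=0.3):
    """Build a first-order transition graph from behavioral sessions.

    sessions: list of lists of behavior labels, one list per session.
    Returns {behavior: {next_behavior: probability}}, keeping only edges
    whose estimated transition probability is at least min_prob.
    """
    counts = defaultdict(Counter)
    for session in sessions:
        # Count each observed pair (current behavior, following behavior).
        for current, following in zip(session, session[1:]):
            counts[current][following] += 1

    graph = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        kept = {b: n / total for b, n in followers.items() if n / total >= min_prob}
        if kept:
            graph[current] = kept
    return graph


# Hypothetical grooming sessions (labels invented for illustration).
sessions = [
    ["lick_paw", "wipe_face", "lick_paw", "wipe_face", "scratch"],
    ["lick_paw", "wipe_face", "scratch"],
]
graph = transition_graph(sessions, min_prob=0.5)
```

Here the threshold plays the role of a simple statistical filter: raising `min_prob` keeps only the most stereotyped transitions.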

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We sequenced 912 bp of the cytochrome-b gene to examine phylogenetic relationships of the enigmatic Saw-billed Hermit (Ramphodon naevius), a large and distinctive hummingbird endemic to tropical forests of southeastern Brazil. Bootstrapped maximum parsimony and maximum likelihood analyses of sequence data from 11 hummingbirds and several outgroups (two swifts, one goatsucker) support: (a) monophyly of the traditional hermit (Phaethornithinae) and nonhermit (Trochilinae) subfamilies, (b) placement of Ramphodon among hermits, and (c) a sister relationship between Ramphodon and an exemplar of the widespread polytypic hermit genus Glaucis. The association of Ramphodon with derived hermit lineages is concordant with subfamilial patterns of wing anatomy and nest architecture. However, the unusual plumages (striped underparts) and male bills (long, serrated, hooked) shared by Ramphodon and the Tooth-billed Hummingbird (Androdon aequatorialis) appear to have evolved within separate hermit and nonhermit tooth-billed clades. Distal placement of the Ramphodon-Glaucis clade within hermits implies that even distinctive Brazilian endemics such as Ramphodon are derived forms that evolved relatively recently.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The growing volume of spatial data collected has motivated the development of geovisualisation techniques, which aim to provide an important resource to support knowledge extraction and decision making. One of these techniques is the 3D graph, which provides a dynamic and flexible enhancement of the analysis of results obtained by spatial data mining algorithms, particularly when several georeferenced objects occur at the same location. The original contribution of this work is the enhancement of visual resources in a computational environment for spatial data mining; the efficiency of these techniques is then demonstrated on a real database. The application proved very useful for interpreting the results obtained, such as patterns that occurred at the same location, and for supporting activities that can be carried out on the basis of the visualised results. © 2013 Springer-Verlag.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Background: The multi-relational approach has emerged as an alternative for analyzing structured data such as relational databases, since it allows data mining to be applied directly to multiple tables and thus avoids expensive join operations and semantic losses. This work proposes an algorithm based on the multi-relational approach. Methods: To compare the performance of the traditional and multi-relational approaches to mining association rules, this paper presents an empirical study of PatriciaMine, a traditional algorithm, and its proposed multi-relational counterpart, MR-Radix. Results: This work showed the performance advantages of the multi-relational approach over several tables, which avoids the high cost of join operations across multiple tables as well as semantic losses. MR-Radix ran faster than PatriciaMine, despite handling complex multi-relational patterns. Memory usage followed a more conservative growth curve for MR-Radix than for PatriciaMine, which shows that a growing number of frequent items in MR-Radix does not lead to the significant growth in memory usage seen in PatriciaMine. Conclusion: The comparative study of PatriciaMine and MR-Radix confirmed the efficacy of the multi-relational approach in the data mining process, in terms of both execution time and memory usage. In addition, the proposed multi-relational algorithm, unlike other algorithms of this approach, is efficient for use in large relational databases.
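For readers unfamiliar with association rule mining, the sketch below shows the two quantities such algorithms compute, support and confidence, using a brute-force search over a single small table. It is not PatriciaMine or MR-Radix, whose data structures the abstract does not describe; the function names and the toy transactions are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for itemsets meeting min_support (brute force)."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            # Support = fraction of transactions containing every item.
            support = sum(cset <= t for t in transactions) / n
            if support >= min_support:
                result[cset] = support
                found = True
        if not found:
            break  # supersets of infrequent itemsets cannot be frequent
    return result


def confidence(itemsets, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent (both must be frequent)."""
    a = frozenset(antecedent)
    return itemsets[a | frozenset(consequent)] / itemsets[a]


# Toy transactions, invented for illustration.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
itemsets = frequent_itemsets(transactions, min_support=0.5)
rule_conf = confidence(itemsets, {"bread"}, {"milk"})
```

In a single-table setting like this one, data spread over several relational tables would first have to be joined into `transactions`; that join is the cost the multi-relational approach is said to avoid.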

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The growing number of new electronic devices has considerably increased the amount of spatial data being collected; hence these data are becoming more and more widely used. As with conventional data, spatial data need to be analyzed so that interesting information can be retrieved from them. Data clustering techniques can therefore be used to extract clusters from a set of spatial data. However, current approaches do not consider the implicit semantics that exist between a region and an object's attributes. This paper presents an approach that enhances the spatial data mining process so that it can use the semantics that exist within a region. A framework, OntoSDM, was developed, which enables spatial data mining algorithms to communicate with ontologies in order to enhance the algorithm's result. The experiments demonstrated a semantically improved result, generating more interesting clusters and thereby reducing the manual analysis work of an expert.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Open reading frame expressed sequence tags (ORESTES) differ from conventional ESTs by providing sequence data from the central protein coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used a subset of the data that correspond to a set of 15,095 full-length mRNAs as a means of assessing the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% of all highly and moderately expressed, and between 40% and 50% of rarely expressed, human genes. In our most thoroughly sequenced tissue, the breast, the 130,000 ORESTES generated are derived from transcripts from an estimated 70% of all genes expressed in that tissue, with an equally efficient representation of both highly and poorly expressed genes. In this respect, we find that the capacity of the ORESTES strategy both for gene discovery and shotgun transcript sequence generation significantly exceeds that of conventional ESTs. The distribution of ORESTES is such that many human transcripts are now represented by a scaffold of partial sequences distributed along the length of each gene product. The experimental joining of the scaffold components, by reverse transcription-PCR, represents a direct route to transcript finishing that may represent a useful alternative to full-length cDNA cloning.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Cultivated peanut (Arachis hypogaea) is an important crop, widely grown in tropical and subtropical regions of the world. It is highly susceptible to several biotic and abiotic stresses to which wild species are resistant. As a first step towards the introgression of these resistance genes into cultivated peanut, a linkage map based on microsatellite markers was constructed, using an F2 population obtained from a cross between two diploid wild species with AA genome (A. duranensis and A. stenosperma). A total of 271 new microsatellite markers were developed in the present study from SSR-enriched genomic libraries, expressed sequence tags (ESTs), and by data-mining sequences available in GenBank. Of these, 66 were polymorphic for cultivated peanut. The 271 new markers plus another 162 published for peanut were screened against both progenitors and 204 of these (47.1%) were polymorphic, with 170 codominant and 34 dominant markers. The 80 codominant markers segregating 1:2:1 (P < 0.05) were initially used to establish the linkage groups. Distorted and dominant markers were subsequently included in the map. The resulting linkage map consists of 11 linkage groups covering 1,230.89 cM of total map distance, with an average distance of 7.24 cM between markers. This is the first microsatellite-based map published for Arachis, and the first map based on sequences that are all currently publicly available. Because most markers used were derived from ESTs and genomic libraries made using methylation-sensitive restriction enzymes, about one-third of the mapped markers are genic. Linkage group ordering is being validated in other mapping populations, with the aim of constructing a transferable reference map for Arachis.
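The 1:2:1 segregation check mentioned above is conventionally a chi-square goodness-of-fit test on the genotype counts of a codominant marker in the F2 population. The sketch below assumes that convention (the abstract does not name the test used); the function name and the genotype counts are hypothetical. With two degrees of freedom the chi-square survival function is exactly exp(-x/2), so no statistics library is needed. The last line rechecks the polymorphism percentage reported in the abstract.

```python
import math


def chi_square_1_2_1(n_aa, n_ab, n_bb):
    """Goodness-of-fit test of F2 genotype counts against a 1:2:1 ratio.

    Returns (chi-square statistic, p-value) with 2 degrees of freedom.
    """
    total = n_aa + n_ab + n_bb
    expected = (total / 4, total / 2, total / 4)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For df = 2 the chi-square survival function is exactly exp(-x / 2).
    p_value = math.exp(-stat / 2)
    return stat, p_value


# Hypothetical genotype counts for two markers (not from the study).
stat_ok, p_ok = chi_square_1_2_1(24, 52, 24)    # close to 1:2:1
stat_bad, p_bad = chi_square_1_2_1(45, 40, 15)  # strongly distorted

# Polymorphic markers reported in the abstract: 204 of 271 + 162 screened.
polymorphic_percent = round(100 * 204 / (271 + 162), 1)
```

A marker whose p-value falls below the chosen significance level would usually be classed as showing distorted segregation.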

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Data mining of the Eucalyptus genome ESTs found four clusters (EGCEST2257E11.g, EGBGRT3213F11.g, and EGCCFB1223H11.g) from the highly conserved 14-3-3 protein family, which modulates a wide variety of cellular processes. Multiple alignments were built from twenty-four 14-3-3 protein sequences retrieved from the GenBank databases and from the four pools of the Eucalyptus genome programs. The alignment showed two highly conserved regions corresponding to protein phosphorylation motifs and nine highly conserved regions corresponding to the linker regions of the alpha-helix structure, based on the three-dimensional structure of the functional dimer. Amino acid differences within the structural and functional domains of plant 14-3-3 proteins were identified and may explain the functional diversity of the different isoforms. Phylogenetic protein trees were built by the maximum parsimony and neighbor-joining procedures from Clustal X alignments, using PAUP software for phylogenetic analysis.
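As an illustration of what locating highly conserved regions in a multiple alignment involves, the sketch below scans an alignment column by column and reports runs of identical, gap-free columns. It is a simplified stand-in, not the procedure used in the study; the function name, the `min_length` parameter, and the toy sequences are hypothetical, and the strict-identity criterion is an assumption.

```python
def conserved_regions(alignment, min_length=3):
    """Find runs of fully conserved, gap-free columns in an alignment.

    alignment: list of equal-length aligned sequences ('-' marks a gap).
    Returns a list of (start, end) column indices, end exclusive.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    # A column is conserved when every sequence has the same non-gap residue.
    conserved = [
        alignment[0][i] != "-" and len({seq[i] for seq in alignment}) == 1
        for i in range(length)
    ]

    regions = []
    start = None
    # The trailing False closes a run that reaches the end of the alignment.
    for i, flag in enumerate(conserved + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_length:
                regions.append((start, i))
            start = None
    return regions


# Toy alignment, invented for illustration (not real 14-3-3 sequences).
alignment = [
    "MKRSELVQ",
    "MKRAELVQ",
    "MKR-ELVQ",
]
regions = conserved_regions(alignment, min_length=3)
```

Real analyses usually relax strict identity to a similarity score per column, so that conservative substitutions do not break a region.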

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Oxidative stress generating active oxygen species has been shown to be one of the underlying agents causing tissue injury after the exposure of Eucalyptus (Eucalyptus spp.) plants to a wide variety of stress conditions. The objective of this study was to perform data mining to identify favorable genes and alleles associated with the enzyme systems superoxide dismutase, catalase, peroxidases, and glutathione S-transferase that are related to tolerance to environmental stresses and to damage caused by pests, diseases, herbicides, and by weeds themselves. This was undertaken using the eucalyptus expressed-sequence database (https://forests.esalq.usp.br). The alignment results between amino acid and nucleotide sequences indicated that the studied enzymes were adequately represented in the EST database of the FORESTs project.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

In this study, we report the cloning and nucleotide sequence of PCR-generated 5S rDNA from the Tilapiine cichlid fish, Oreochromis niloticus. Two types of 5S rDNA were detected that differed by insertions and/or deletions and base substitutions within the non-transcribed spacer (NTS). Two 5S rDNA loci were observed by fluorescent in situ hybridization (FISH) in metaphase spreads of tilapia chromosomes. FISH using an 18S rDNA probe and silver nitrate sequential staining of 5S-FISH slides showed three 18S rDNA loci that are not syntenic to the 5S rDNA loci.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Chromobacterium violaceum is one of millions of species of free-living microorganisms that populate the soil and water in the extant areas of tropical biodiversity around the world. Its complete genome sequence reveals (i) extensive alternative pathways for energy generation, (ii) ≈500 ORFs for transport-related proteins, (iii) complex and extensive systems for stress adaptation and motility, and (iv) widespread utilization of quorum sensing for control of inducible systems, all of which underpin the versatility and adaptability of the organism. The genome also contains extensive but incomplete arrays of ORFs coding for proteins associated with mammalian pathogenicity, possibly involved in the occasional but often fatal cases of human C. violaceum infection. There is, in addition, a series of previously unknown but important enzymes and secondary metabolites including paraquat-inducible proteins, drug and heavy-metal-resistance proteins, multiple chitinases, and proteins for the detoxification of xenobiotics that may have biotechnological applications.