15 results for Genetic clustering analysis
in Biblioteca Digital da Produção Intelectual da Universidade de São Paulo
Abstract:
Background and Aim: The identification of gastric carcinomas (GC) has traditionally been based on histomorphology. Recently, DNA microarrays have successfully been used to identify tumors through clustering of the expression profiles. Random forest clustering is widely used for tissue microarrays and other immunohistochemical data, because it handles highly skewed tumor marker expressions well, and weighs the contribution of each marker according to its relatedness with other tumor markers. In the present study, we identified biologically and clinically meaningful groups of GC by hierarchical clustering analysis of immunohistochemical protein expression. Methods: We selected 28 proteins (p16, p27, p21, cyclin D1, cyclin A, cyclin B1, pRb, p53, c-met, c-erbB-2, vascular endothelial growth factor, transforming growth factor [TGF]-beta I, TGF-beta II, MutS homolog-2, bcl-2, bax, bak, bcl-x, adenomatous polyposis coli, clathrin, E-cadherin, beta-catenin, mucin [MUC] 1, MUC2, MUC5AC, MUC6, matrix metalloproteinase [MMP]-2, and MMP-9) to be investigated by immunohistochemistry in 482 GC. The data were analyzed using a random forest clustering method. Results: Proteins related to cell cycle, growth factor, cell motility, cell adhesion, apoptosis, and matrix remodeling were highly expressed in GC. We identified protein expressions associated with poor survival in diffuse-type GC. Conclusions: Based on the expression analysis of 28 proteins, we identified two groups of GC that could not be explained by any clinicopathological variables, and a subgroup of long-surviving diffuse-type GC patients with a distinct molecular profile. These results provide not only a new molecular basis for understanding the biological properties of GC, but also better prediction of survival than the classic pathological grouping.
Abstract:
Background: Banana cultivars are mostly derived from hybridization between wild diploid subspecies of Musa acuminata (A genome) and M. balbisiana (B genome), and they exhibit various levels of ploidy and genomic constitution. The Embrapa ex situ Musa collection contains over 220 accessions, of which only a few have been genetically characterized. Knowledge regarding the genetic relationships and diversity between modern cultivars and wild relatives would assist in conservation and breeding strategies. Our objectives were to determine the genomic constitution based on Internal Transcribed Spacer (ITS) region polymorphism and the ploidy of all accessions by flow cytometry, and to investigate the population structure of the collection using Simple Sequence Repeat (SSR) loci as co-dominant markers with the Structure software, an analysis not previously performed in Musa. Results: From the 221 accessions analyzed by flow cytometry, the correct ploidy was confirmed or established for 212 (95.9%), whereas digestion of the ITS region confirmed the genomic constitution of 209 (94.6%). Neighbor-joining clustering analysis derived from SSR binary data allowed the detection of two major groups, essentially distinguished by the presence or absence of the B genome, while subgroups were formed according to the genomic composition and commercial classification. The co-dominant nature of SSR was explored to analyze the structure of the population based on a Bayesian approach, detecting 21 subpopulations. Most of the subpopulations were in agreement with the clustering analysis. Conclusions: The data generated by flow cytometry, ITS and SSR supported the hypothesis about the occurrence of homeologue recombination between A and B genomes, leading to discrepancies in the number of sets or portions from each parental genome. These phenomena have been largely disregarded in the evolution of banana, as the “single-step domestication” hypothesis had long predominated. These findings will have an impact on future breeding approaches. Structure analysis enabled the efficient detection of ancestry of tetraploid hybrids recently developed by breeding programs, and of some triploids. However, for the main commercial subgroups, Structure appeared to be less efficient at detecting ancestry in diploid groups, possibly due to sampling restrictions. The possibility of inferring the membership among accessions to correct the effects of genetic structure opens possibilities for its use in marker-assisted selection by association mapping.
Abstract:
Coccidiosis of the domestic fowl is a worldwide disease caused by seven species of protozoan parasites of the genus Eimeria. The genome of the model species, Eimeria tenella, presents a complexity of 55-60 Mb distributed in 14 chromosomes. Relatively few studies have been undertaken to unravel the complexity of the transcriptome of Eimeria parasites. We report here the generation of more than 45,000 open reading frame expressed sequence tag (ORESTES) cDNA reads of E. tenella, Eimeria maxima and Eimeria acervulina, covering several developmental stages: unsporulated oocysts, sporoblastic oocysts, sporulated oocysts, sporozoites and second generation merozoites. All reads were assembled to constitute gene indices and submitted to a comprehensive functional annotation pipeline. In the case of E. tenella, we also incorporated publicly available ESTs to generate an integrated body of information. Orthology analyses have identified genes conserved across different apicomplexan parasites, as well as genes restricted to the genus Eimeria. Digital expression profiles obtained from ORESTES/EST counts, submitted to clustering analyses, revealed a high conservation pattern across the three Eimeria spp. Distance trees showed that unsporulated and sporoblastic oocysts constitute a distinct clade in all species, with sporulated oocysts forming a more external branch. This latter stage also shows a close relationship with sporozoites, whereas first and second generation merozoites are more closely related to each other than to sporozoites. The profiles were unambiguously associated with the distinct developmental stages and strongly correlated with the order of the stages in the parasite life cycle. Finally, we present The Eimeria Transcript Database (http://www.coccidia.icb.usp.br/eimeriatdb), a website that provides open access to all sequencing data, annotation and comparative analysis. We expect this repository to represent a useful resource to the Eimeria scientific community, helping to define potential candidates for the development of new strategies to control coccidiosis of the domestic fowl. (C) 2011 Australian Society for Parasitology Inc. Published by Elsevier Ltd. All rights reserved.
Abstract:
Background: Transcript enumeration methods such as SAGE, MPSS, and sequencing-by-synthesis EST "digital northern" are important high-throughput techniques for digital gene expression measurement. Like other counting or voting processes, these measurements constitute compositional data exhibiting properties particular to the simplex space, where the summation of the components is constrained. These properties are not present in regular Euclidean spaces, on which hybridization-based microarray data is often modeled. Therefore, pattern recognition methods commonly used for microarray data analysis may be non-informative for the data generated by transcript enumeration techniques, since they ignore certain fundamental properties of this space. Results: Here we present a software tool, Simcluster, designed to perform clustering analysis for data on the simplex space. We present Simcluster as a stand-alone command-line C package and as a user-friendly on-line tool. Both versions are available at: http://xerad.systemsbiology.net/simcluster. Conclusion: Simcluster is designed in accordance with a well-established mathematical framework for compositional data analysis, which provides principled procedures for dealing with the simplex space, and is thus applicable in a number of contexts, including enumeration-based gene expression data.
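The abstract above does not say which distance Simcluster computes, so the following is only an illustrative sketch of one standard tool from compositional data analysis: the centered log-ratio (clr) transform and the Aitchison distance derived from it. The function names, the tag counts and the `pseudocount` parameter (a simple way of handling zero counts) are assumptions made for this example and are not taken from Simcluster.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of a vector of tag counts.

    `pseudocount` is an assumption of this sketch: it is added to every
    count so that zero counts do not produce log(0).
    """
    shifted = [c + pseudocount for c in counts]
    total = sum(shifted)
    # Close the data: turn counts into proportions that sum to 1.
    logs = [math.log(s / total) for s in shifted]
    mean_log = sum(logs) / len(logs)
    # Subtracting the mean log centers the vector (components sum to 0).
    return [value - mean_log for value in logs]


def aitchison_distance(x, y, pseudocount=0.5):
    """Euclidean distance between the clr-transformed vectors."""
    cx = clr(x, pseudocount)
    cy = clr(y, pseudocount)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(cx, cy)))


# Hypothetical libraries: same composition, tenfold different depth.
shallow = [10, 20, 70]
deep = [100, 200, 700]
other = [70, 20, 10]

same = aitchison_distance(shallow, deep, pseudocount=0.0)
different = aitchison_distance(shallow, other, pseudocount=0.0)
```

With `pseudocount=0.0` the distance between `shallow` and `deep` is zero up to floating-point error, because only the relative proportions matter on the simplex. A plain Euclidean distance on the raw counts would instead treat the two libraries as far apart, which is the kind of mismatch the abstract describes.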
Abstract:
Melipona scutellaris Latreille has great economic and ecological importance, especially because it is a pollinator of native plant species. Despite the importance of this species, there is little information about the conservation status of its populations. The objective of this study was to assess the diversity in populations of M. scutellaris coming from a Semideciduous Forest Fragment and an Atlantic Forest Fragment in Northeast Brazil, through geometric morphometric analysis of wings in worker bees. In each area, worker bees were collected from 10 colonies, 10 workers per colony. To assess the diversity in the right wings of worker bees, 15 landmarks were plotted and the measures were used in analysis of variance and multivariate analysis, principal component analysis, discriminant analysis and clustering analysis. There were significant differences in the shape of the wing venation patterns between colonies of the two sites (Wilks' lambda = 0.000006; p < 0.000001), which is probably due to the geographical distance between places of origin, which impedes gene flow between them. This indicates that inter- and intrapopulation morphometric variability exists (p < 0.000001) in M. scutellaris coming from two different biomes, revealing the existence of diversity in these populations, which is necessary for the conservation of this bee species.
Abstract:
Premise of the study: Microsatellite markers were developed and characterized to investigate genetic diversity and gene flow and to help in conservation efforts for the endangered timber species Plathymenia reticulata. Methods and Results: Eleven microsatellite loci were characterized using 60 adult trees of two populations of P. reticulata from the Atlantic Forest of southern Bahia, Brazil. Of these, nine loci were polymorphic, with an average of 4.39 alleles per locus. The average expected heterozygosity per population ranged from 0.47 to 0.55. The combined exclusion probability was 0.99996. Conclusions: Our results reveal that the microsatellite markers developed in this study are an effective tool for paternity and genetic structure analysis that may be useful for conservation strategies.
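For readers unfamiliar with the statistic reported above, expected heterozygosity at a locus is commonly computed as one minus the sum of squared allele frequencies. The sketch below illustrates that textbook formula only. The allele frequencies are hypothetical, the function name is an assumption of this example, and the abstract does not state which estimator (for instance, with or without a small-sample correction) the authors used.

```python
def expected_heterozygosity(allele_freqs):
    """Expected heterozygosity at one locus: 1 - sum(p_i ** 2).

    `allele_freqs` must be the frequencies of all alleles at the locus
    and must sum to 1.
    """
    total = sum(allele_freqs)
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"allele frequencies sum to {total}, expected 1")
    return 1.0 - sum(p * p for p in allele_freqs)


# Hypothetical loci, not data from the study.
two_equal_alleles = expected_heterozygosity([0.5, 0.5])
three_uneven_alleles = expected_heterozygosity([0.7, 0.2, 0.1])
```

Two equally frequent alleles give a value of 0.5, and the uneven three-allele locus gives 0.46, so a locus with more alleles is not automatically more heterozygous if one allele dominates.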
Abstract:
The endemic marine sponge Arenosclera brasiliensis (Porifera, Demospongiae, Haplosclerida) is a known source of secondary metabolites such as arenosclerins A-C. In the present study, we established the composition of the A. brasiliensis microbiome and the metabolic pathways associated with this community. We used 454 shotgun pyrosequencing to generate approximately 640,000 high-quality sponge-derived sequences (~150 Mb). Clustering analysis including sponge, seawater and twenty-three other metagenomes derived from marine animal microbiomes shows that A. brasiliensis contains a specific microbiome. Fourteen bacterial phyla (including Proteobacteria, Cyanobacteria, Actinobacteria, Bacteroidetes, Firmicutes and Chloroflexi) were consistently found in the A. brasiliensis metagenomes. The A. brasiliensis microbiome is enriched for Betaproteobacteria (e.g., Burkholderia) and Gammaproteobacteria (e.g., Pseudomonas and Alteromonas) compared with the surrounding planktonic microbial communities. Functional analysis based on Rapid Annotation using Subsystem Technology (RAST) indicated that the A. brasiliensis microbiome is enriched for sequences associated with membrane transport and one-carbon metabolism. In addition, there was an overrepresentation of sequences associated with aerobic and anaerobic metabolism as well as the synthesis and degradation of secondary metabolites. This study represents the first analysis of sponge-associated microbial communities via shotgun pyrosequencing, a strategy commonly applied in similar analyses in other marine invertebrate hosts, such as corals and algae. We demonstrate that A. brasiliensis has a unique microbiome that is distinct from that of the surrounding planktonic microbes and from other marine organisms, indicating a species-specific microbiome.
Abstract:
Background: Tnt1 was the first active plant retrotransposon identified in tobacco after nitrate reductase gene disruption. The Tnt1 superfamily comprises elements from Nicotiana (Tnt1 and Tto1) and Lycopersicon (Retrolyc1 and Tlc1) species. The study presented here was conducted to characterise Tnt1-related sequences in 20 wild species of Solanum and five cultivars of Solanum tuberosum. Results: Tnt1-related sequences were amplified from total genomic DNA using a PCR-based approach. Purified fragments were cloned and sequenced, and clustering analysis revealed three groups that differ in their U3 region. Using a network approach with a total of 453 non-redundant sequences isolated from Solanum (197), Nicotiana (140) and Lycopersicon (116) species, it is demonstrated that the Tnt1 superfamily can be treated as a population to resolve previous phylogenetic multifurcations. The resulting RNAseH network revealed that sequences group according to the Solanaceae genus, supporting a strong association with the host genome, whereas tracing the U3 region sequence association characterises the modular evolutionary pattern within the Tnt1 superfamily. Within each genus, and irrespective of species, nearly 20% of Tnt1 sequences analysed are identical, indicative of being part of an active copy. The network approach enabled the identification of putative "master" sequences and provided evidence that within a genus these master sequences are associated with distinct U3 regions. Conclusion: The results presented here support the hypothesis that the Tnt1 superfamily was present early in the evolution of Solanaceae. The evidence also suggests that the RNAseH region of Tnt1 became fixed at the host genus level whereas, within each genus, propagation was ensured by the diversification of the U3 region. Different selection pressures seem to have acted on the U3 and RNAseH modules of ancestral Tnt1 elements, probably due to the distinct functions of these regions in the retrotransposon life cycle, resulting in both coevolution and adaptation of the element population with its host.
Abstract:
We investigated the color vision pattern in Cebus apella monkeys by means of electroretinogram measurements (ERG) and genetic analysis. Based on ERG we could discriminate among three types of dichromatic males. Among females, this classification is more complex and requires additional genetic analysis. We found five among 10 possible different phenotypes, two trichromats and three dichromats. We also found that Cebus present a new allele with spectral peak near 552 nm, with the amino acid combination SFT at positions 180, 277 and 285 of the opsin gene, in addition to the previously described SYT, AFT and AFA alleles. (C) 2009 Elsevier Ltd. All rights reserved.
Abstract:
Background: The development of sugarcane as a sustainable crop has unlimited applications. The crop is one of the most economically viable for renewable energy production and CO2 balance. Linkage maps are valuable tools for understanding genetic and genomic organization, particularly in sugarcane due to its complex polyploid genome of multispecific origins. The overall objective of our study was to construct a novel sugarcane linkage map, compiling AFLP and EST-SSR markers, and to generate data on the distribution of markers anchored to sequences of scIvana_1, a complete sugarcane transposable element and member of the Copia superfamily. Results: The mapping population parents ('IAC66-6' and 'TUC71-7') contributed equally to polymorphisms, independent of marker type, and generated markers that were distributed into nearly the same number of co-segregation groups (or CGs). Bi-parentally inherited alleles provided the integration of 19 CGs. The marker number per CG ranged from two to 39. The total map length was 4,843.19 cM, with a marker density of 8.87 cM. Markers were assembled into 92 CGs that ranged in length from 1.14 to 404.72 cM, with an estimated average length of 52.64 cM. The greatest distance between two adjacent markers was 48.25 cM. The scIvana_1-based markers (56) were positioned on 21 CGs, but were not regularly distributed. Interestingly, the distance between adjacent scIvana_1-based markers was less than 5 cM, and was observed on five CGs, suggesting a clustered organization. Conclusions: Results indicated that the use of an NBS-profiling technique was efficient for developing retrotransposon-based markers in sugarcane. The strategy based on simultaneous maximum-likelihood estimates of linkage and linkage phase confirmed the suitability of this approach to estimate linkage and construct the linkage map. Interestingly, using our genetic data it was possible to calculate the number of copies of the retrotransposon scIvana_1 (~60) in the sugarcane genome, confirming previously reported molecular results. In addition, this research may have indirect implications for crop economics, e.g., productivity enhancement via QTL studies, as the mapping population parents differ in response to an important fungal disease.
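The summary figures in this abstract can be cross-checked with simple arithmetic. The short sketch below recomputes the average co-segregation group length from the reported total map length and number of groups. The variable and function names are assumptions of this example, and only numbers stated in the abstract are used.

```python
def mean_group_length(total_length_cm, n_groups):
    """Average length of a co-segregation group, in cM."""
    if n_groups <= 0:
        raise ValueError("n_groups must be positive")
    return total_length_cm / n_groups


# Values reported in the abstract.
TOTAL_MAP_LENGTH_CM = 4843.19
N_COSEGREGATION_GROUPS = 92

average_cm = mean_group_length(TOTAL_MAP_LENGTH_CM, N_COSEGREGATION_GROUPS)
print(f"{average_cm:.2f}")  # prints 52.64
```

The result agrees with the reported average of 52.64 cM, so the total length and the group count are mutually consistent.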
Abstract:
Oil content and grain yield in maize are negatively correlated, and so far the development of high-oil, high-yielding hybrids has not been accomplished. Thus, a full understanding of the inheritance of kernel oil content is necessary to implement a breeding program to improve both traits simultaneously. Conventional and molecular marker analyses of design III were carried out on a reference population developed from two tropical inbred lines divergent for kernel oil content. The results showed that the additive variance was considerably larger than the dominance variance, and the heritability coefficient was very high. Sixteen QTL were mapped; they were not evenly distributed along the chromosomes, and accounted for 30.91% of the genetic variance. The average level of dominance computed from both conventional and QTL analysis was partial dominance. The overall results indicated that the additive effects were more important than the dominance effects; the latter were not unidirectional, and therefore heterosis could not be exploited in crosses. Most of the favorable alleles of the QTL were in the high-oil parental inbred, and could be transferred to other inbreds via marker-assisted backcross selection. Our results, coupled with reported information, indicated that the development of high-oil hybrids with acceptable yields could be accomplished by using marker-assisted selection involving oil content, grain yield and its components. Finally, to exploit the xenia effect to increase the oil content even further, these hybrids should be used in the Top Cross™ procedure.
Abstract:
Rare variants are becoming the new candidates in the search for genetic variants that predispose individuals to a phenotype of interest. Their low prevalence in a population requires the development of dedicated detection and analytical methods. A family-based approach could greatly enhance their detection and interpretation because rare variants are nearly family specific. In this report, we test several distinct approaches for analyzing the information provided by rare and common variants and how they can be effectively used to pinpoint putative candidate genes for follow-up studies. The analyses were performed on the mini-exome data set provided by Genetic Analysis Workshop 17. Eight approaches were tested, four using the trait’s heritability estimates and four using QTDT models. The sensitivity, specificity, and positive and negative predictive values of these methods were compared in light of the simulation parameters. Our results highlight important limitations of current methods in dealing with rare and common variants: all methods showed reduced specificity and were consequently prone to false-positive associations. Methods analyzing common-variant information showed enhanced sensitivity when compared to rare-variant methods. Furthermore, our limited knowledge of the use of biological databases for gene annotations, possibly for use as covariates in regression models, imposes a barrier to further research.
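The four measures used to compare the methods above follow directly from the counts of true and false positives and negatives. The sketch below shows those standard definitions. The function name and the gene counts are hypothetical and chosen only to illustrate how reduced specificity goes together with many false-positive associations. They are not results from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts.

    A ratio whose denominator is zero is returned as NaN.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # causal genes that were flagged
        "specificity": ratio(tn, tn + fp),  # non-causal genes left unflagged
        "ppv": ratio(tp, tp + fp),          # flagged genes that are causal
        "npv": ratio(tn, tn + fn),          # unflagged genes that are non-causal
    }


# Hypothetical screen: 10 causal and 90 non-causal genes.
metrics = diagnostic_metrics(tp=8, fp=30, tn=60, fn=2)
```

In this made-up example the sensitivity is 0.8, yet the specificity is only about 0.67 and the positive predictive value about 0.21, so most flagged genes would be false positives.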
Abstract:
Human endogenous retroviruses (HERVs) arise from ancient infections of the host germline cells by exogenous retroviruses, constituting 8% of the human genome. Elevated levels of envelope transcripts from HERV-W have been detected in CSF, plasma and brain tissues from patients with Multiple Sclerosis (MS), most of them from the Xq22.3, 15q21.3, and 6q21 loci. However, since the locus Xq22.3 (ERVWE2) lacks the 5' LTR promoter and the putative protein should be truncated due to a stop codon, we investigated the ERVWE2 genomic loci from 84 individuals, including MS patients with active HERV-W expression detected in PBMC. In addition, an automated search for promoter sequences in the 20 kb region near the ERVWE2 reference sequence was performed. Several putative binding sites for cellular cofactors and enhancers were found, suggesting that transcription may occur via alternative promoters. However, ERVWE2 DNA sequencing of MS and healthy individuals revealed that all of them harbor a stop codon at site 39, undermining the expression of a full-length protein. Finally, since plaque formation in the central nervous system (CNS) of MS patients is attributed to immunological mechanisms triggered by autoimmune attack against myelin, we also investigated the level of similarity between the envelope protein and myelin oligodendrocyte glycoprotein (MOG). Comparison of MOG to the envelope identified five retroviral regions similar to the Ig-like domain of MOG. Interestingly, one of them includes T and B cell epitopes, capable of inducing T effector functions and circulating Abs in rats. In sum, although no DNA substitutions that would link ERVWE2 to MS pathogenesis were found, the similarity between the envelope protein and MOG extends the idea that ERVWE2 may be involved in the immunopathogenesis of MS, possibly by facilitating recognition of MOG by the immune system. Although experimental evidence is still awaited, the data presented here may expand the scope of endogenous retrovirus involvement in MS pathogenesis.
Abstract:
Introduction: Enterococcus faecalis is a member of the mammalian gastrointestinal microbiota but has been considered a leading cause of hospital-acquired infections. In the oral cavity, it is commonly detected in root canals of teeth with failed endodontic treatment. However, little is known about the virulence and genetic relatedness among E. faecalis isolates from different clinical sources. This study compared the presence of enterococcal virulence factors among root canal strains and clinical isolates from hospitalized patients to identify virulent clusters of E. faecalis. Methods: Multilocus sequence typing analysis was used to determine genetic lineages of 40 E. faecalis clinical isolates from different sources. Virulence clusters were determined by evaluating capsule (cps) locus polymorphisms, pathogenicity island gene content, and antibiotic resistance genes by polymerase chain reaction. Results: The clinical isolates from hospitalized patients formed a phylogenetically separate group and were mostly grouped in the clonal complex 2, which is a known virulent cluster of E. faecalis that has caused infection outbreaks globally. The clonal complex 2 group comprised capsule-producing strains harboring multiple antibiotic resistance and pathogenicity island genes. On the other hand, the endodontic isolates were more diverse and harbored few virulence and antibiotic resistance genes. In particular, although more closely related to isolates from hospitalized patients, capsule-producing E. faecalis strains from root canals did not carry more virulence/antibiotic genes than other endodontic isolates. Conclusions: E. faecalis isolates from endodontic infections have a genetic and virulence profile different from pathogenic clusters of hospitalized patients’ isolates, which is most likely due to niche specialization conferred mainly by variable regions in the genome.