191 resultados para BIoinformàtica


Relevância:

10.00% 10.00%

Publicador:

Resumo:

The number of existing protein sequences spans a very small fraction of sequence space. Natural proteins have overcome a strong negative selective pressure to avoid the formation of insoluble aggregates. Stably folded globular proteins and intrinsically disordered proteins (IDP) use alternative solutions to the aggregation problem. While in globular proteins folding minimizes the access to aggregation prone regions IDPs on average display large exposed contact areas. Here, we introduce the concept of average meta-structure correlation map to analyze sequence space. Using this novel conceptual view we show that representative ensembles of folded and ID proteins show distinct characteristics and responds differently to sequence randomization. By studying the way evolutionary constraints act on IDPs to disable a negative function (aggregation) we might gain insight into the mechanisms by which function - enabling information is encoded in IDPs.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We propose a novel multifactor dimensionality reduction method for epistasis detection in small or extended pedigrees, FAM-MDR. It combines features of the Genome-wide Rapid Association using Mixed Model And Regression approach (GRAMMAR) with Model-Based MDR (MB-MDR). We focus on continuous traits, although the method is general and can be used for outcomes of any type, including binary and censored traits. When comparing FAM-MDR with Pedigree-based Generalized MDR (PGMDR), which is a generalization of Multifactor Dimensionality Reduction (MDR) to continuous traits and related individuals, FAM-MDR was found to outperform PGMDR in terms of power, in most of the considered simulated scenarios. Additional simulations revealed that PGMDR does not appropriately deal with multiple testing and consequently gives rise to overly optimistic results. FAM-MDR adequately deals with multiple testing in epistasis screens and is in contrast rather conservative, by construction. Furthermore, simulations show that correcting for lower order (main) effects is of utmost importance when claiming epistasis. As Type 2 Diabetes Mellitus (T2DM) is a complex phenotype likely influenced by gene-gene interactions, we applied FAM-MDR to examine data on glucose area-under-the-curve (GAUC), an endophenotype of T2DM for which multiple independent genetic associations have been observed, in the Amish Family Diabetes Study (AFDS). This application reveals that FAM-MDR makes more efficient use of the available data than PGMDR and can deal with multi-generational pedigrees more easily. In conclusion, we have validated FAM-MDR and compared it to PGMDR, the current state-of-the-art MDR method for family data, using both simulations and a practical dataset. FAM-MDR is found to outperform PGMDR in that it handles the multiple testing issue more correctly, has increased power, and efficiently uses all available information.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Assessing the contribution of promoters and coding sequences to gene evolution is an important step toward discovering the major genetic determinants of human evolution. Many specific examples have revealed the evolutionary importance of cis-regulatory regions. However, the relative contribution of regulatory and coding regions to the evolutionary process and whether systemic factors differentially influence their evolution remains unclear. To address these questions, we carried out an analysis at the genome scale to identify signatures of positive selection in human proximal promoters. Next, we examined whether genes with positively selected promoters (Prom+ genes) show systemic differences with respect to a set of genes with positively selected protein-coding regions (Cod+ genes). We found that the number of genes in each set was not significantly different (8.1% and 8.5%, respectively). Furthermore, a functional analysis showed that, in both cases, positive selection affects almost all biological processes and only a few genes of each group are located in enriched categories, indicating that promoters and coding regions are not evolutionarily specialized with respect to gene function. On the other hand, we show that the topology of the human protein network has a different influence on the molecular evolution of proximal promoters and coding regions. Notably, Prom+ genes have an unexpectedly high centrality when compared with a reference distribution (P = 0.008, for Eigenvalue centrality). Moreover, the frequency of Prom+ genes increases from the periphery to the center of the protein network (P = 0.02, for the logistic regression coefficient). This means that gene centrality does not constrain the evolution of proximal promoters, unlike the case with coding regions, and further indicates that the evolution of proximal promoters is more efficient in the center of the protein network than in the periphery. These results show that proximal promoters have had a systemic contribution to human evolution by increasing the participation of central genes in the evolutionary process.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Chromosomal anomalies, like Robertsonian and reciprocal translocations represent a big problem in cattle breeding as their presence induces, in the carrier subjects, a well documented fertility reduction. In cattle reciprocal translocations (RCPs, a chromosome abnormality caused by an exchange of material between nonhomologous chromosomes) are considered rare as to date only 19 reciprocal translocations have been described. In cattle it is common knowledge that the Robertsonian translocations represent the most common cytogenetic anomalies, and this is probably due to the existence of the endemic 1;29 Robertsonian translocation. However, these considerations are based on data obtained using techniques that are unable to identify all reciprocal translocations and thus their frequency is clearly underestimated. The purpose of this work is to provide a first realistic estimate of the impact of RCPs in the cattle population studied, trying to eliminate the factors which have caused an underestimation of their frequency so far. We performed this work using a mathematical as well as a simulation approach and, as biological data, we considered the cytogenetic results obtained in the last 15 years. The results obtained show that only 16% of reciprocal translocations can be detected using simple Giemsa techniques and consequently they could be present in no less than 0,14% of cattle subjects, a frequency five times higher than that shown by de novo Robertsonian translocations. This data is useful to open a debate about the need to introduce a more efficient method to identify RCP in cattle.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Abstract Background: The CWxP motif of transmembrane helix 6 (x: any residue) is highly conserved in class A GPCRs. Within this motif, W6.48 is a big star in the theory of the global “toggle switch” because of its key role in the activation mechanism of GPCRs upon ligand binding. With all footlights focused on W6.48, the reason why the preceding residue, C6.47, is largely conserved is still unknown. The present study is aimed to fill up this lack of knowledge by characterizing the role of C6.47 of the CWxP motif. Results: A complete analysis of available crystal structures has been made alongside with molecular dynamics simulations of model peptides to explore a possible structural role for C6.47. Conclusions: We conclude that C6.47 does not modulate the conformation of the TM6 proline kink and propose that C6.47 participates in the rearrangement of the TM6 and TM7 interface accompanying activation.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Background: Research in epistasis or gene-gene interaction detection for human complex traits has grown over the last few years. It has been marked by promising methodological developments, improved translation efforts of statistical epistasis to biological epistasis and attempts to integrate different omics information sources into the epistasis screening to enhance power. The quest for gene-gene interactions poses severe multiple-testing problems. In this context, the maxT algorithm is one technique to control the false-positive rate. However, the memory needed by this algorithm rises linearly with the amount of hypothesis tests. Gene-gene interaction studies will require a memory proportional to the squared number of SNPs. A genome-wide epistasis search would therefore require terabytes of memory. Hence, cache problems are likely to occur, increasing the computation time. In this work we present a new version of maxT, requiring an amount of memory independent from the number of genetic effects to be investigated. This algorithm was implemented in C++ in our epistasis screening software MBMDR-3.0.3. We evaluate the new implementation in terms of memory efficiency and speed using simulated data. The software is illustrated on real-life data for Crohn’s disease. Results: In the case of a binary (affected/unaffected) trait, the parallel workflow of MBMDR-3.0.3 analyzes all gene-gene interactions with a dataset of 100,000 SNPs typed on 1000 individuals within 4 days and 9 hours, using 999 permutations of the trait to assess statistical significance, on a cluster composed of 10 blades, containing each four Quad-Core AMD Opteron(tm) Processor 2352 2.1 GHz. In the case of a continuous trait, a similar run takes 9 days. Our program found 14 SNP-SNP interactions with a multiple-testing corrected p-value of less than 0.05 on real-life Crohn’s disease (CD) data. Conclusions: Our software is the first implementation of the MB-MDR methodology able to solve large-scale SNP-SNP interactions problems within a few days, without using much memory, while adequately controlling the type I error rates. A new implementation to reach genome-wide epistasis screening is under construction. In the context of Crohn’s disease, MBMDR-3.0.3 could identify epistasis involving regions that are well known in the field and could be explained from a biological point of view. This demonstrates the power of our software to find relevant phenotype-genotype higher-order associations.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The relationship between inflammation and cancer is well established in several tumor types, including bladder cancer. We performed an association study between 886 inflammatory-gene variants and bladder cancer risk in 1,047 cases and 988 controls from the Spanish Bladder Cancer (SBC)/EPICURO Study. A preliminary exploration with the widely used univariate logistic regression approach did not identify any significant SNP after correcting for multiple testing. We further applied two more comprehensive methods to capture the complexity of bladder cancer genetic susceptibility: Bayesian Threshold LASSO (BTL), a regularized regression method, and AUC-Random Forest, a machine-learning algorithm. Both approaches explore the joint effect of markers. BTL analysis identified a signature of 37 SNPs in 34 genes showing an association with bladder cancer. AUC-RF detected an optimal predictive subset of 56 SNPs. 13 SNPs were identified by both methods in the total population. Using resources from the Texas Bladder Cancer study we were able to replicate 30% of the SNPs assessed. The associations between inflammatory SNPs and bladder cancer were reexamined among non-smokers to eliminate the effect of tobacco, one of the strongest and most prevalent environmental risk factor for this tumor. A 9 SNP-signature was detected by BTL. Here we report, for the first time, a set of SNP in inflammatory genes jointly associated with bladder cancer risk. These results highlight the importance of the complex structure of genetic susceptibility associated with cancer risk.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We have developed numerical simulations of three dimensional suspensions of active particles to characterize the capabilities of the hydrodynamic stresses induced by active swimmers to promote global order and emergent structures in active suspensions. We have considered squirmer suspensions embedded in a fluid modeled under a Lattice Boltzmann scheme. We have found that active stresses play a central role to decorrelate the collective motion of squirmers and that contractile squirmers develop significant aggregates.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

O objetivo deste trabalho foi confrontar as sequências parciais do gene 16S rRNA de estirpes padrão de rizóbios com as de estirpes recomendadas para a produção de inoculantes no Brasil, com vistas à verificação da confiabilidade do sequenciamento parcial desse gene para a identificação rápida de estirpes. Foram realizados sequenciamentos através de reação em cadeia da polimerase (PCR) com iniciadores relativos à região codificadora do gene 16S rRNA entre as bactérias estudadas. Os resultados foram analisados pela consulta de similaridade de nucleotídeos aos do "Basic Local Alignment Search Tool" (Blastn) e por meio da interpretação de árvores filogenéticas geradas usando ferramentas de bioinformática. A classificação taxonômica das estirpes Semia recomendadas para inoculação de leguminosas com base em propriedades morfológicas e especificidade hospedeira não foi confirmada em todas as estirpes. A maioria das estirpes estudadas, consultadas no Blastn, é consistente com a classificação proposta pela construção de árvores filogenéticas das sequências destas estirpes, com base na similaridade pelo sequenciamento parcial do gene considerado.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Although approximately 50% of Down Syndrome (DS) patients have heart abnormalities, they exhibit an overprotection against cardiac abnormalities related with the connective tissue, for example a lower risk of coronary artery disease. A recent study reported a case of a person affected by DS who carried mutations in FBN1, the gene causative for a connective tissue disorder called Marfan Syndrome (MFS). The fact that the person did not have any cardiac alterations suggested compensation effects due to DS. This observation is supported by a previous DS meta-analysis at the molecular level where we have found an overall upregulation of FBN1 (which is usually downregulated in MFS). Additionally, that result was cross-validated with independent expression data from DS heart tissue. The aim of this work is to elucidate the role of FBN1 in DS and to establish a molecular link to MFS and MFS-related syndromes using a computational approach. To reach that, we conducted different analytical approaches over two DS studies (our previous meta-analysis and independent expression data from DS heart tissue) and revealed expression alterations in the FBN1 interaction network, in FBN1 co-expressed genes and FBN1-related pathways. After merging the significant results from different datasets with a Bayesian approach, we prioritized 85 genes that were able to distinguish control from DS cases. We further found evidence for several of these genes (47%), such as FBN1, DCN, and COL1A2, being dysregulated in MFS and MFS-related diseases. Consequently, we further encourage the scientific community to take into account FBN1 and its related network for the study of DS cardiovascular characteristics.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Background Nowadays, combining the different sources of information to improve the biological knowledge available is a challenge in bioinformatics. One of the most powerful methods for integrating heterogeneous data types are kernel-based methods. Kernel-based data integration approaches consist of two basic steps: firstly the right kernel is chosen for each data set; secondly the kernels from the different data sources are combined to give a complete representation of the available data for a given statistical task. Results We analyze the integration of data from several sources of information using kernel PCA, from the point of view of reducing dimensionality. Moreover, we improve the interpretability of kernel PCA by adding to the plot the representation of the input variables that belong to any dataset. In particular, for each input variable or linear combination of input variables, we can represent the direction of maximum growth locally, which allows us to identify those samples with higher/lower values of the variables analyzed. Conclusions The integration of different datasets and the simultaneous representation of samples and variables together give us a better understanding of biological knowledge.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Motivation: The comparative analysis of gene gain and loss rates is critical for understanding the role of natural selection and adaptation in shaping gene family sizes. Studying complete genome data from closely related species allows accurate estimation of gene family turnover rates. Current methods and software tools, however, are not well designed for dealing with certain kinds of functional elements, such as microRNAs or transcription factor binding sites. Results: Here, we describe BadiRate, a new software tool to estimate family turnover rates, as well as the number of elements in internal phylogenetic nodes, by likelihood-based methods and parsimony. It implements two stochastic population models, which provide the appropriate statistical framework for testing hypothesis, such as lineage-specific gene family expansions or contractions. We have assessed the accuracy of BadiRate by computer simulations, and have also illustrated its functionality by analyzing a representative empirical dataset.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

BACKGROUND: DNA sequence polymorphisms analysis can provide valuable information on the evolutionary forces shaping nucleotide variation, and provides an insight into the functional significance of genomic regions. The recent ongoing genome projects will radically improve our capabilities to detect specific genomic regions shaped by natural selection. Current available methods and software, however, are unsatisfactory for such genome-wide analysis. RESULTS: We have developed methods for the analysis of DNA sequence polymorphisms at the genome-wide scale. These methods, which have been tested on a coalescent-simulated and actual data files from mouse and human, have been implemented in the VariScan software package version 2.0. Additionally, we have also incorporated a graphical-user interface. The main features of this software are: i) exhaustive population-genetic analyses including those based on the coalescent theory; ii) analysis adapted to the shallow data generated by the high-throughput genome projects; iii) use of genome annotations to conduct a comprehensive analyses separately for different functional regions; iv) identification of relevant genomic regions by the sliding-window and wavelet-multiresolution approaches; v) visualization of the results integrated with current genome annotations in commonly available genome browsers. CONCLUSION: VariScan is a powerful and flexible suite of software for the analysis of DNA polymorphisms. The current version implements new algorithms, methods, and capabilities, providing an important tool for an exhaustive exploratory analysis of genome-wide DNA polymorphism data.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

DnaSP is a software package for a comprehensive analysis of DNA polymorphism data. Version 5 implements a number of new features and analytical methods allowing extensive DNA polymorphism analyses on large datasets. Among other features, the newly implemented methods allow for: (i) analyses on multiple data files; (ii) haplotype phasing; (iii) analyses on insertion/deletion polymorphism data; (iv) visualizing sliding window results integrated with available genome annotations in the UCSC browser.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

O objetivo deste trabalho foi identificar marcadores moleculares relacionados à resistência do cafeeiro (Coffea arabica) à ferrugem (Hemileia vastatrix). Foram identificadas sequências de DNA potencialmente envolvidas na resistência do cafeeiro a doenças, por meio de análise "in silico", a partir das informações geradas pelo Projeto Brasileiro do Genoma Café. A partir das sequências mineradas, foram desenhados 59 pares de iniciadores para amplificá-las. Os 59 iniciadores foram testados em 12 cafeeiros resistentes e 12 susceptíveis a H. vastatrix. Vinte e sete iniciadores resultaram em bandas únicas e bem definidas, enquanto um deles amplificou fragmento de DNA em todos os cafeeiros resistentes, mas não nos suscetíveis. Esse marcador molecular polimórfico amplificou uma região do DNA que corresponde a uma janela aberta de leitura parcial do genoma de C. arabica que codifica uma proteína de resistência a doenças. O marcador CARF 005 é capaz de diferenciar os cafeeiros analisados em resistentes e susceptíveis a H. vastatrix.