931 results for bioinformatics


Relevance:

10.00% 10.00%

Publisher:

Abstract:

It has been previously described that p21 functions not only as a CDK inhibitor but also as a transcriptional co-repressor in some systems. To investigate the roles of p21 in transcriptional control, we studied the gene expression changes in two human cell systems. Using a human leukemia cell line (K562) with inducible p21 expression and human primary keratinocytes with adenoviral-mediated p21 expression, we carried out microarray-based gene expression profiling. We found that p21 rapidly and strongly repressed the mRNA levels of a number of genes involved in cell cycle and mitosis. One of the most strongly down-regulated genes was CCNE2 (cyclin E2 gene). Mutational analysis in K562 cells showed that the N-terminal region of p21 is required for repression of gene expression of CCNE2 and other genes. Chromatin immunoprecipitation assays indicated that p21 was bound to human CCNE2 and other p21-repressed genes in the vicinity of the transcription start site. Moreover, p21 repressed human CCNE2 promoter-luciferase constructs in K562 cells. Bioinformatic analysis revealed that the CDE motif is present in most of the promoters of the p21-regulated genes. Altogether, the results suggest that p21 exerts a repressive effect on a relevant number of genes controlling S phase and mitosis. Thus, p21 activity as an inhibitor of cell cycle progression would be mediated not only by the inhibition of CDKs but also by the transcriptional down-regulation of key genes.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Recent advances in machine learning methods increasingly enable the automatic construction of various types of computer-assisted methods that have been difficult or laborious for human experts to program. The tasks for which such tools are needed arise in many areas, here especially in the fields of bioinformatics and natural language processing. Machine learning methods may not work satisfactorily if they are not appropriately tailored to the task in question. However, their learning performance can often be improved by taking advantage of deeper insight into the application domain or the learning problem at hand. This thesis considers developing kernel-based learning algorithms that incorporate this kind of prior knowledge of the task in question in an advantageous way. Moreover, computationally efficient algorithms for training the learning machines for specific tasks are presented. In the context of kernel-based learning methods, the incorporation of prior knowledge is often done by designing appropriate kernel functions. Another well-known way is to develop cost functions that fit the task under consideration. For disambiguation tasks in natural language, we develop kernel functions that take account of the positional information and the mutual similarities of words. It is shown that the use of this information significantly improves the disambiguation performance of the learning machine. Further, we design a new cost function that is better suited to the task of information retrieval and to more general ranking problems than the cost functions designed for regression and classification. We also consider other applications of the kernel-based learning algorithms, such as text categorization and pattern recognition in differential display. We develop computationally efficient algorithms for training the considered learning machines with the proposed kernel functions.
We also design a fast cross-validation algorithm for regularized least-squares type learning algorithms. Further, an efficient version of the regularized least-squares algorithm that can be used together with the new cost function for preference learning and ranking tasks is proposed. In summary, we demonstrate that the incorporation of prior knowledge is possible and beneficial, and that novel advanced kernels and cost functions can be used in algorithms efficiently.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

BACKGROUND: Available methods to simulate nucleotide or amino acid data typically use Markov models to simulate each position independently. These approaches are not appropriate to assess the performance of combinatorial and probabilistic methods that look for coevolving positions in nucleotide or amino acid sequences. RESULTS: We have developed a web-based platform that gives user-friendly access to two phylogenetic-based methods implementing the Coev model: the evaluation of coevolving scores and the simulation of coevolving positions. We have also extended the capabilities of the Coev model to allow for the generalization of the alphabet used in the Markov model, which can now analyse both nucleotide and amino acid data sets. The simulation of coevolving positions is novel and builds upon the developments of the Coev model. It allows users to simulate pairs of dependent nucleotide or amino acid positions. CONCLUSIONS: The main focus of our paper is the new simulation method we present for coevolving positions. The implementation of this method is embedded within the web platform Coev-web that is freely accessible at http://coev.vital-it.ch/, and was tested in most modern web browsers.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Although approximately 50% of Down Syndrome (DS) patients have heart abnormalities, they exhibit an overprotection against cardiac abnormalities related to connective tissue, for example a lower risk of coronary artery disease. A recent study reported a case of a person affected by DS who carried mutations in FBN1, the causative gene for a connective tissue disorder called Marfan Syndrome (MFS). The fact that the person did not have any cardiac alterations suggested compensation effects due to DS. This observation is supported by a previous DS meta-analysis at the molecular level in which we found an overall upregulation of FBN1 (which is usually downregulated in MFS). Additionally, that result was cross-validated with independent expression data from DS heart tissue. The aim of this work is to elucidate the role of FBN1 in DS and to establish a molecular link to MFS and MFS-related syndromes using a computational approach. To that end, we applied different analytical approaches to two DS studies (our previous meta-analysis and independent expression data from DS heart tissue) and revealed expression alterations in the FBN1 interaction network, in FBN1 co-expressed genes and in FBN1-related pathways. After merging the significant results from different datasets with a Bayesian approach, we prioritized 85 genes that were able to distinguish control from DS cases. We further found evidence for several of these genes (47%), such as FBN1, DCN, and COL1A2, being dysregulated in MFS and MFS-related diseases. Consequently, we further encourage the scientific community to take into account FBN1 and its related network for the study of DS cardiovascular characteristics.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Reduced glomerular filtration rate defines chronic kidney disease and is associated with cardiovascular and all-cause mortality. We conducted a meta-analysis of genome-wide association studies for estimated glomerular filtration rate (eGFR), combining data across 133,413 individuals with replication in up to 42,166 individuals. We identify 24 new and confirm 29 previously identified loci. Of these 53 loci, 19 associate with eGFR among individuals with diabetes. Using bioinformatics, we show that identified genes at eGFR loci are enriched for expression in kidney tissues and in pathways relevant for kidney development and transmembrane transporter activity, kidney structure, and regulation of glucose metabolism. Chromatin state mapping and DNase I hypersensitivity analyses across adult tissues demonstrate preferential mapping of associated variants to regulatory regions in kidney but not extra-renal tissues. These findings suggest that genetic determinants of eGFR are mediated largely through direct effects within the kidney and highlight important cell types and biological pathways.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Construction of multiple sequence alignments is a fundamental task in Bioinformatics. Multiple sequence alignments are used as a prerequisite in many Bioinformatics methods, and subsequently the quality of such methods can be critically dependent on the quality of the alignment. However, automatic construction of a multiple sequence alignment for a set of remotely related sequences does not always provide biologically relevant alignments. Therefore, there is a need for an objective approach for evaluating the quality of automatically aligned sequences. The profile hidden Markov model is a powerful approach in comparative genomics. In the profile hidden Markov model, the symbol probabilities are estimated at each conserved alignment position. This can increase the dimension of parameter space and cause an overfitting problem. These two research problems are both related to conservation. We have developed statistical measures for quantifying the conservation of multiple sequence alignments. Two types of methods are considered: those identifying conserved residues in an alignment position, and those calculating positional conservation scores. The positional conservation score was exploited in a statistical prediction model for assessing the quality of multiple sequence alignments. The residue conservation score was used as part of the emission probability estimation method proposed for profile hidden Markov models. The predicted alignment quality scores correlated highly with the correct alignment quality scores, indicating that our method is reliable for assessing the quality of any multiple sequence alignment. The comparison of the emission probability estimation method with the maximum likelihood method showed that the number of estimated parameters in the model was dramatically decreased, while the same level of accuracy was maintained.
To conclude, we have shown that conservation can be successfully used in the statistical model for alignment quality assessment and in the estimation of emission probabilities in the profile hidden Markov models.
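
To make the idea of a positional conservation score concrete, here is a minimal Python sketch. It is not the measure developed in the thesis, which the abstract does not specify; it assumes one common choice, one minus the normalized Shannon entropy of each alignment column. The function name `column_conservation`, the default alphabet size of 21 (20 amino acids plus the gap symbol) and the toy alignment are all illustrative assumptions.

```python
import math
from collections import Counter
from typing import List, Sequence


def column_conservation(alignment: Sequence[str], alphabet_size: int = 21) -> List[float]:
    """Score each alignment column as 1 - H / log2(alphabet_size).

    H is the Shannon entropy of the symbols in the column. Gaps are treated
    as ordinary symbols (an assumption made for simplicity).
    """
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    max_entropy = math.log2(alphabet_size)
    scores = []
    for column in zip(*alignment):  # iterate over alignment positions
        counts = Counter(column)
        total = len(column)
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        scores.append(1.0 - entropy / max_entropy)
    return scores


# Hypothetical toy alignment: column 0 is fully conserved, column 2 is not.
toy_alignment = ["ACD", "ACE", "ACF", "AGG"]
scores = column_conservation(toy_alignment)
```

A fully conserved column scores 1.0, and the score falls as the column becomes more variable. A prediction model of alignment quality could take such per-column scores as input features, although the features actually used in the thesis may differ.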

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Learning of preference relations has recently received significant attention in the machine learning community. It is closely related to classification and regression analysis and can be reduced to these tasks. However, preference learning involves predicting an ordering of the data points rather than a single numerical value, as in regression, or a class label, as in classification. Therefore, studying preference relations within a separate framework not only facilitates better theoretical understanding of the problem but also motivates the development of efficient algorithms for the task. Preference learning has many applications in domains such as information retrieval, bioinformatics and natural language processing. For example, algorithms that learn to rank are frequently used in search engines for ordering documents retrieved by a query. Preference learning methods have also been applied to collaborative filtering problems for predicting individual customer choices from the vast amount of user-generated feedback. In this thesis we propose several algorithms for learning preference relations. These algorithms stem from the well-founded and robust class of regularized least-squares methods and have many attractive computational properties. In order to improve the performance of our methods, we introduce several non-linear kernel functions. Thus, the contribution of this thesis is twofold: kernel functions for structured data that are used to take advantage of various non-vectorial data representations, and preference learning algorithms that are suitable for different tasks, namely efficient learning of preference relations, learning with large amounts of training data, and semi-supervised preference learning. The proposed kernel-based algorithms and kernels are applied to the parse ranking task in natural language processing, document ranking in information retrieval, and remote homology detection in the bioinformatics domain.
Training of kernel-based ranking algorithms can be infeasible when the size of the training set is large. This problem is addressed by proposing a preference learning algorithm whose computational complexity scales linearly with the number of training data points. We also introduce a sparse approximation of the algorithm that can be efficiently trained with large amounts of data. For situations when a small amount of labeled data but a large amount of unlabeled data is available, we propose a co-regularized preference learning algorithm. To conclude, the methods presented in this thesis address not only the problem of efficient training of the algorithms but also fast regularization parameter selection, multiple output prediction, and cross-validation. Furthermore, the proposed algorithms lead to notably better performance in many of the preference learning tasks considered.
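
The distinction between predicting an ordering and predicting a value can be illustrated with a pairwise evaluation measure. The Python sketch below is not taken from the thesis; it assumes a commonly used pairwise disagreement error, the fraction of comparable pairs that a scoring function orders incorrectly. The function name `pairwise_disagreement` and the example scores are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def pairwise_disagreement(true_scores: Sequence[float],
                          predicted_scores: Sequence[float]) -> float:
    """Fraction of pairs with different true scores that are ordered wrongly.

    A pair tied in the prediction counts as half an error; pairs tied in the
    true scores are skipped (both conventions are assumptions).
    """
    if len(true_scores) != len(predicted_scores):
        raise ValueError("score lists must have the same length")

    errors = 0.0
    comparable = 0
    for i, j in combinations(range(len(true_scores)), 2):
        true_diff = true_scores[i] - true_scores[j]
        if true_diff == 0:
            continue  # no preference between i and j
        comparable += 1
        pred_diff = predicted_scores[i] - predicted_scores[j]
        if pred_diff == 0:
            errors += 0.5
        elif (true_diff > 0) != (pred_diff > 0):
            errors += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return errors / comparable


# The absolute predicted values are irrelevant; only their order matters.
perfect = pairwise_disagreement([1, 2, 3], [10, 20, 30])
reversed_order = pairwise_disagreement([1, 2, 3], [3, 2, 1])
one_swap = pairwise_disagreement([1, 2, 3], [1, 3, 2])
```

The predictions `[10, 20, 30]` would be poor regression estimates of `[1, 2, 3]`, yet they give a perfect ranking, which is why ranking tasks are usually evaluated on pairs rather than on individual values.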

Relevance:

10.00% 10.00%

Publisher:

Abstract:

This thesis belongs to the field of string algorithmics. A string S is a common subsequence of the strings X[1..m] and Y[1..n] if it can be formed by deleting 0..m characters from X and 0..n characters from Y at arbitrary positions. If no common subsequence of X and Y is longer than S, S is said to be a longest common subsequence (LCS) of X and Y. This work focuses on solving the LCS of two strings, but the problem can also be generalized to more strings. The LCS problem has applications not only in computer science but also in areas of bioinformatics. The best known of these are text and image compression, file version control, pattern recognition, and comparative research on the structure of DNA and protein chains. Solving the problem is made difficult by the dependence of the solution algorithms on several different parameters of the input strings. In addition to the lengths of the input strings, these include the size of the input alphabet, the character distribution of the inputs, the length of the LCS relative to the length of the shorter input string, and the number of matching character pairs. It is therefore difficult to develop an algorithm that would work efficiently for all instances of the problem. On the one hand, the thesis is intended to serve as a handbook that, after describing the basic concepts of the problem, presents previously developed exact LCS algorithms. They are grouped according to their mode of operation into algorithms that process one row, one contour, or one diagonal at a time, and algorithms that process in multiple directions. In addition to exact methods, heuristic methods that compute an upper or lower bound on the length of the LCS are presented; their results can be used either as such or to guide the execution of an exact algorithm. This part is based on articles published by our research group. They address, for the first time, exact methods enhanced with heuristics. On the other hand, the work contains a fairly extensive empirical part whose aim has been to improve the running time and memory usage of existing exact algorithms.
This goal has been pursued at the implementation level by introducing data structures that support the algorithms' mode of operation well, and by limiting fruitless computation by improving the algorithms' ability to detect and exploit intermediate results obtained during execution. As a general conclusion, heuristic preprocessing of exact LCS algorithms almost systematically reduces their running time and especially their memory requirements. In addition, the data structure used by an algorithm has a decisive effect on computational efficiency: the more local the search and update operations are, the more efficient the computation.
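
The longest common subsequence problem defined in this abstract (abbreviated PYA in the Finnish original) can be illustrated with a minimal Python sketch. It shows the textbook row-by-row dynamic programming solution together with one simple upper-bound heuristic based on symbol counts. Neither is claimed to be an algorithm from the thesis; the function names `lcs` and `lcs_upper_bound` and the example strings are assumptions chosen for illustration.

```python
from collections import Counter


def lcs(x: str, y: str) -> str:
    """Return one longest common subsequence of x and y (O(m*n) time and space)."""
    m, n = len(x), len(y)
    # table[i][j] holds the LCS length of the prefixes x[:i] and y[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):  # process the table one row at a time
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Trace back from the bottom-right corner to recover one solution.
    result = []
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            result.append(x[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(result))


def lcs_upper_bound(x: str, y: str) -> int:
    """Cheap upper bound on the LCS length: each symbol can be matched at most
    min(count in x, count in y) times."""
    return sum((Counter(x) & Counter(y)).values())


example = lcs("AGGTAB", "GXTXAYB")
bound = lcs_upper_bound("AGGTAB", "GXTXAYB")
```

If a bound of this kind equals the length of a common subsequence already found, an exact algorithm can stop early, which is one way a heuristic result can guide an exact computation.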

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Large-scale genome projects have generated a rapidly increasing number of DNA sequences. Therefore, development of computational methods to rapidly analyze these sequences is essential for progress in genomic research. Here we present an automatic annotation system for preliminary analysis of DNA sequences. The gene annotation tool (GATO) is a Bioinformatics pipeline designed to facilitate routine functional annotation and easy access to annotated genes. It was designed in view of the frequent need of genomic researchers to access data pertaining to a common set of genes. In the GATO system, annotation is generated by querying some of the Web-accessible resources and the information is stored in a local database, which keeps a record of all previous annotation results. GATO may be accessed from anywhere through the internet or may be run locally if a large number of sequences are going to be annotated. It is implemented in PHP and Perl and may be run on any suitable Web server. Usually, installation and application of annotation systems require experience and are time consuming, but GATO is simple and practical, allowing anyone with basic skills in informatics to access it without any special training. GATO can be downloaded at [http://mariwork.iq.usp.br/gato/]. The minimum free disk space required is 2 MB.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Our objective was to clone, express and characterize adult Dermatophagoides farinae group 1 (Der f 1) allergens to further produce recombinant allergens for future clinical applications in order to eliminate side reactions from crude extracts of mites. Based on GenBank data, we designed primers and amplified the cDNA fragment coding for Der f 1 by nested-PCR. After purification and recovery, the cDNA fragment was cloned into the pMD19-T vector. The fragment was then sequenced, subcloned into the plasmid pET28a(+), expressed in Escherichia coli BL21 and identified by Western blotting. The cDNA coding for Der f 1 was cloned, sequenced and expressed successfully. Sequence analysis showed the presence of an open reading frame containing 966 bp that encodes a protein of 321 amino acids. Interestingly, homology analysis showed that Der p 1 shared more than 87% identity in amino acid sequence with Eur m 1 but only 80% with Der f 1. Furthermore, phylogenetic analyses suggested that D. pteronyssinus was evolutionarily closer to Euroglyphus maynei than to D. farinae, even though D. pteronyssinus and D. farinae belong to the same Dermatophagoides genus. A total of three cysteine peptidase active sites were found in the predicted amino acid sequence, including 127-138 (QGGCGSCWAFSG), 267-277 (NYHAVNIVGYG) and 284-303 (YWIVRNSWDTTWGDSGYGYF). Moreover, secondary structure analysis revealed that Der f 1 contained α helix (33.96%), extended strand (17.13%), β turn (5.61%), and random coil (43.30%). A simple three-dimensional model of this protein was constructed using the SWISS-MODEL server. The cDNA coding for Der f 1 was cloned, sequenced and expressed successfully. Alignment and phylogenetic analysis suggests that D. pteronyssinus is evolutionarily more similar to E. maynei than to D. farinae.
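
The figures reported in this abstract can be checked with a few lines of arithmetic. The Python sketch below assumes that the 966 bp open reading frame includes the stop codon, which is the reading under which 966 bp corresponds to 321 amino acids; the abstract does not state this explicitly. The function name `protein_length` is hypothetical.

```python
def protein_length(orf_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an open reading frame of orf_bp bases."""
    if orf_bp <= 0 or orf_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_bp // 3
    # The stop codon terminates translation and encodes no amino acid.
    return codons - 1 if includes_stop else codons


# 966 bp / 3 = 322 codons; minus one stop codon = 321 amino acids.
der_f_1_residues = protein_length(966)

# The four secondary structure fractions from the abstract, in percent.
secondary_structure = {
    "alpha helix": 33.96,
    "extended strand": 17.13,
    "beta turn": 5.61,
    "random coil": 43.30,
}
total_percent = round(sum(secondary_structure.values()), 2)
```

Under that assumption the numbers agree with each other: the ORF length matches the stated protein length, and the four secondary structure classes together account for the whole sequence.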

Relevance:

10.00% 10.00%

Publisher:

Abstract:

MicroRNAs (miRNAs) have gradually been recognized as regulators of embryonic development; however, relatively few miRNAs have been identified that regulate cardiac development. A series of recent papers have established an essential role for the miRNA-17-92 (miR-17-92) cluster of miRNAs in the development of the heart. Previous research has shown that the Friend of Gata-2 (FOG-2) is critical for cardiac development. To investigate the possibility that the miR-17-92 cluster regulates FOG-2 expression and inhibits proliferation in mouse embryonic cardiomyocytes we initially used bioinformatics to analyze 3’ untranslated regions (3’UTR) of FOG-2 to predict the potential of miR-17-92 to target it. We used luciferase assays to demonstrate that miR-17-5p and miR-20a of miR-17-92 interact with the predicted target sites in the 3’UTR of FOG-2. Furthermore, RT-PCR and Western blot were used to demonstrate the post-transcriptional regulation of FOG-2 by miR-17-92 in embryonic cardiomyocytes from E12.5-day pregnant C57BL/6J mice. Finally, EdU cell assays together with the FOG-2 rescue strategy were employed to evaluate the effect of proliferation on embryonic cardiomyocytes. We first found that the miR-17-5p and miR-20a of miR-17-92 directly target the 3’UTR of FOG-2 and post-transcriptionally repress the expression of FOG-2. Moreover, our findings demonstrated that over-expression of miR-17-92 may inhibit cell proliferation via post-transcriptional repression of FOG-2 in embryonic cardiomyocytes. These results indicate that the miR-17-92 cluster regulates the expression of FOG-2 protein and suggest that the miR-17-92 cluster might play an important role in heart development.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Hypertrophy is a major predictor of progressive heart disease and has an adverse prognosis. MicroRNAs (miRNAs) that accumulate during the course of cardiac hypertrophy may participate in the process. However, the nature of any interaction between a hypertrophy-specific signaling pathway and aberrant expression of miRNAs remains unclear. In this study, male Sprague Dawley rats underwent transverse aortic constriction (TAC) surgery to mimic pathological hypertrophy. Hearts were isolated from TAC and sham-operated rats (n=5 for each group at 5, 10, 15, and 20 days after surgery) for miRNA microarray assay. The miRNAs aberrantly expressed during hypertrophy were further analyzed using a combination of bioinformatics algorithms in order to predict possible targets. Increased expression of the target genes identified in diverse signaling pathways was also analyzed. Two sets of miRNAs were identified, showing different expression patterns during hypertrophy. Bioinformatics analysis suggested that the miRNAs may regulate multiple hypertrophy-specific signaling pathways by targeting the member genes, and that the interaction of miRNA and mRNA might form a network that leads to cardiac hypertrophy. In addition, the multifold changes in several miRNAs suggested that upregulation of rno-miR-331*, rno-miR-3596b, rno-miR-3557-5p and downregulation of rno-miR-10a, miR-221, miR-190, miR-451 could be seen as biomarkers of prognosis in clinical therapy of heart failure. This study described, for the first time, a potential mechanism of cardiac hypertrophy involving multiple signaling pathways that control up- and downregulation of miRNAs. It represents a first step in the systematic discovery of miRNA function in cardiovascular hypertrophy.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Shellfish are a source of food allergens, and their consumption is the cause of severe allergic reactions in humans. Tropomyosins, a family of muscle proteins, have been identified as the major allergens in shellfish and mollusk species. Nevertheless, few experimentally determined three-dimensional structures are available in the Protein Data Bank (PDB). In this study, 3D models of several tropomyosin homologs present in marine shellfish and mollusk species (Chaf 1, Met e1, Hom a1, Per v1, and Pen a1) were constructed and validated, and their immunoglobulin E binding epitopes were identified using bioinformatics tools. All protein models for these allergens consisted of long alpha-helices. Chaf 1, Met e1, and Hom a1 had six conserved regions with sequence similarities to known epitopes, whereas Per v1 and Pen a1 contained only one. Lipophilic potentials of identified epitopes revealed a high propensity of hydrophobic amino acids in the immunoglobulin E binding site. This information could be useful to design tropomyosin-specific immunotherapy for seafood allergies.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The recent rapid development of biotechnological approaches has enabled the production of large whole genome level biological data sets. In order to handle these data sets, reliable and efficient automated tools and methods for data processing and result interpretation are required. Bioinformatics, as the field of studying and processing biological data, tries to answer this need by combining methods and approaches across computer science, statistics, mathematics and engineering to study and process biological data. The need is also increasing for tools that can be used by the biological researchers themselves who may not have a strong statistical or computational background, which requires creating tools and pipelines with intuitive user interfaces, robust analysis workflows and strong emphasis on result reporting and visualization. Within this thesis, several data analysis tools and methods have been developed for analyzing high-throughput biological data sets. These approaches, covering several aspects of high-throughput data analysis, are specifically aimed for gene expression and genotyping data although in principle they are suitable for analyzing other data types as well. Coherent handling of the data across the various data analysis steps is highly important in order to ensure robust and reliable results. Thus, robust data analysis workflows are also described, putting the developed tools and methods into a wider context. The choice of the correct analysis method may also depend on the properties of the specific data set and therefore guidelines for choosing an optimal method are given. The data analysis tools, methods and workflows developed within this thesis have been applied to several research studies, of which two representative examples are included in the thesis. The first study focuses on spermatogenesis in murine testis and the second one examines cell lineage specification in mouse embryonic stem cells.