17 results for protein sequence classification

in Biblioteca Digital da Produção Intelectual da Universidade de São Paulo


Relevance: 100.00%

Abstract:

Background: A large number of probabilistic models used in sequence analysis assign non-zero probability values to most input sequences. The most common way to decide whether a given probability is sufficient is Bayesian binary classification, in which the probability of the model characterizing the sequence family of interest is compared to that of an alternative probability model. A null model can serve as this alternative. This is the scoring technique used by sequence analysis tools such as HMMER, SAM and INFERNAL. The most prevalent null models are position-independent residue distributions, which include the uniform distribution, the genomic distribution, the family-specific distribution and the target sequence distribution. This paper presents a study evaluating the impact of the choice of null model on the final classification result. In particular, we are interested in minimizing the number of false predictions in a classification, a crucial issue for reducing the costs of biological validation. Results: In all tests, the target null model produced the lowest number of false positives when random sequences were used as a test. The study was performed on DNA sequences using GC content as the measure of content bias, but the results should also be valid for protein sequences. To broaden the application of the results, the study was performed using randomly generated sequences. Previous studies were performed on amino acid sequences, using only one probabilistic model (HMM) and a specific benchmark, and lack more general conclusions about the performance of null models. Finally, a benchmark test with P. falciparum confirmed these results. Conclusions: Of the evaluated models, the best suited for classification are the uniform model and the target model. However, the uniform model presents a GC bias that can cause more false positives for candidate sequences with extreme compositional bias, a characteristic not described in previous studies. In these cases the target model is more dependable for biological validation due to its higher specificity.
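
The log-odds scoring described in this abstract can be illustrated with a minimal sketch. It assumes a toy position-independent family model in place of an HMM, and the names `uniform_null`, `target_null`, `log_odds` and `gc_rich_family` are hypothetical, not taken from the paper or from HMMER, SAM or INFERNAL. It only shows how a GC-rich sequence can score above zero against a uniform null model and below zero against a null model built from the target sequence itself.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def uniform_null(alphabet=ALPHABET):
    """Uniform null model: every residue is equally likely."""
    return {r: 1.0 / len(alphabet) for r in alphabet}

def target_null(seq, alphabet=ALPHABET, pseudocount=1.0):
    """Target null model: residue frequencies estimated from the scored
    sequence itself (the pseudocount is an assumption that avoids zeros)."""
    counts = Counter(seq)
    total = len(seq) + pseudocount * len(alphabet)
    return {r: (counts[r] + pseudocount) / total for r in alphabet}

def log_odds(seq, model, null):
    """Log-odds score in bits: log2 P(seq | model) - log2 P(seq | null)."""
    return sum(math.log2(model[r] / null[r]) for r in seq)

# Hypothetical GC-rich family model, for illustration only.
gc_rich_family = {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}
candidate = "GCGCGCGC"

score_uniform = log_odds(candidate, gc_rich_family, uniform_null())
score_target = log_odds(candidate, gc_rich_family, target_null(candidate))
```

With these toy numbers the uniform null rewards the candidate merely for being GC-rich, whereas the target null absorbs the composition and leaves a slightly negative score, which is the kind of bias the abstract attributes to the uniform model.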

Relevance: 80.00%

Abstract:

Bananas (Musa spp.) are highly perishable fruit of notable economic and nutritional relevance. Because the identification of proteins involved in metabolic pathways could help to extend green-life and improve the quality of the fruit, this study aimed to compare the proteins of banana pulp at the pre-climacteric and climacteric stages. Two-dimensional fluorescence difference gel electrophoresis (2D-DIGE) revealed 50 differentially expressed proteins, and comparing those proteins to the Mass Spectrometry Protein Sequence Database (MSDB) identified 26 known proteins. Chitinases were the most abundant types of proteins in unripe bananas, and two isoforms in the ripe fruit have been implicated in the stress/defense response. In this regard, three heat shock proteins and isoflavone reductase were also abundant at the climacteric stage. Concerning fruit quality, pectate lyase, malate dehydrogenase, and starch phosphorylase accumulated during ripening. In addition to the ethylene-forming enzyme 1-aminocyclopropane-1-carboxylic acid oxidase, the accumulation of S-adenosyl-L-homocysteine hydrolase was needed because of the increased ethylene synthesis and DNA methylation that occurred in ripening bananas. Differential analysis provided information on the ripening-associated changes that occurred in proteins involved in banana flavor, texture, defense, synthesis of ethylene, regulation of expression, and protein folding, and this analysis validated previous data on the transcripts during ripening. In this regard, the differential proteomics of fruit pulp enlarged our understanding of the process of banana ripening. (C) 2012 Elsevier B.V. All rights reserved.

Relevance: 40.00%

Abstract:

Intron splicing is one of the most important steps involved in the maturation process of a pre-mRNA. Although the sequence profiles around the splice sites have been studied extensively, the levels of sequence identity between the exonic sequences preceding the donor sites and the intronic sequences preceding the acceptor sites have not been examined as thoroughly. In this study we investigated identity patterns between the last 15 nucleotides of the exonic sequence preceding the 5' splice site and the intronic sequence preceding the 3' splice site in a set of human protein-coding genes that do not exhibit intron retention. We found that almost 60% of consecutive exons and introns in human protein-coding genes share at least two identical nucleotides at their 3' ends and that, on average, the sequence identity length is 2.47 nucleotides. Based on our findings we conclude that the 3' ends of exons and introns from the same gene tend to share longer identical sequences than those taken from different genes. Our results hold even if the pairs are non-consecutive in the transcription order. (C) 2012 Elsevier Ltd. All rights reserved.
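
The identity measure in this abstract can be sketched as follows, assuming that "sequence identity length" means the number of identical nucleotides counted backwards from the last base of each sequence until the first mismatch. That reading, the function name `shared_3prime_identity` and the example sequences are assumptions made for illustration, not details taken from the paper.

```python
def shared_3prime_identity(exon_tail: str, intron_tail: str) -> int:
    """Count identical nucleotides shared at the 3' ends, reading
    backwards from the last base until the first mismatch."""
    length = 0
    for a, b in zip(reversed(exon_tail.upper()), reversed(intron_tail.upper())):
        if a != b:
            break
        length += 1
    return length

# Hypothetical tails: the exon ends in ...AAG, the intron in ...CAG.
identity = shared_3prime_identity("ACGTTCAAG", "TTTTTCCAG")
```

Applied to the last 15 nucleotides of every exon and intron pair in a gene set, the mean of this value would give a figure comparable in kind to the 2.47 nucleotides reported above.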

Relevance: 30.00%

Abstract:

Objective: To investigate risk factors associated with the acquisition of antibodies against Plasmodium vivax Duffy binding protein (PvDBP), a leading malaria vaccine candidate, in a well-consolidated agricultural settlement of the Brazilian Amazon Region, and to determine the sequence diversity of the PvDBP ligand domain (DBPII) within the local malaria parasite population. Methods: Demographic, epidemiological and clinical data were collected from 541 volunteers using a structured questionnaire. Malaria parasites were detected by conventional microscopy and PCR, and blood samples were collected for antibody assays and molecular characterisation of DBPII. Results: The frequency of malaria infection was 7% (6% for P. vivax and 1% for P. falciparum), with malaria cases clustered near mosquito breeding sites. Nearly 50% of settlers had anti-PvDBP IgG antibodies, as detected by enzyme-linked immunosorbent assay (ELISA), with the subject's age being the only strong predictor of seropositivity to PvDBP. Unexpectedly, low levels of DBPII diversity were found within the local malaria parasites, suggesting low gene flow between P. vivax populations, probably due to the relative isolation of the studied settlement. Conclusion: The recognition of PvDBP by a significant proportion of the community, associated with low levels of DBPII diversity among local P. vivax, reinforces the variety of malaria transmission patterns in communities from frontier settlements. Such studies should provide baseline information for antimalarial vaccines now in development.

Relevance: 30.00%

Abstract:

OBJECTIVE: Differentiation between benign and malignant ovarian neoplasms is essential for creating a system for patient referrals. Therefore, the contributions of the tumor markers CA125 and human epididymis protein 4 (HE4), as well as the Risk of Ovarian Malignancy Algorithm (ROMA) and Risk of Malignancy Index (RMI) values, were considered individually and in combination to evaluate their utility for establishing this type of patient referral system. METHODS: Patients who had been diagnosed with ovarian masses through imaging analyses (n = 128) were assessed for their expression of the tumor markers CA125 and HE4. The ROMA and RMI values were also determined. The sensitivity and specificity of each parameter were calculated using receiver operating characteristic curves according to the area under the curve (AUC) for each method. RESULTS: The sensitivities of CA125, HE4, ROMA, and RMI for distinguishing malignant from benign ovarian masses were 70.4%, 79.6%, 74.1%, and 63%, respectively. Among carcinomas, the sensitivities of CA125, HE4, ROMA (pre- and post-menopausal), and RMI were 93.5%, 87.1%, 80%, 95.2%, and 87.1%, respectively. The most accurate numerical values were obtained with RMI, although the four parameters were shown to be statistically equivalent. CONCLUSION: There were no differences in accuracy between CA125, HE4, ROMA, and RMI for differentiating between types of ovarian masses. RMI had the lowest sensitivity but was the most numerically accurate method. HE4 demonstrated the best overall sensitivity for the evaluation of malignant ovarian tumors and the differential diagnosis of endometriosis. All of the parameters demonstrated increased sensitivity when tumors with low malignancy potential were considered low-risk, which may be used as an acceptable assessment method for referring patients to reference centers.
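
As a reminder of what the reported sensitivities measure, here is a minimal sketch that applies a single cutoff to a marker value and compares the result with the reference diagnosis. The function name `sensitivity_specificity`, the cutoff and all values are hypothetical; they are not the study's data and the sketch does not reproduce its ROC analysis.

```python
def sensitivity_specificity(values, is_malignant, cutoff):
    """Call a case positive when its marker value exceeds the cutoff, then
    return (sensitivity, specificity) against the reference diagnosis.
    Assumes at least one malignant and one benign case are present."""
    tp = fn = tn = fp = 0
    for value, malignant in zip(values, is_malignant):
        positive = value > cutoff
        if malignant and positive:
            tp += 1
        elif malignant:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical marker values and diagnoses, for illustration only.
values = [10, 50, 80, 20, 90, 30]
is_malignant = [False, True, True, False, True, True]
sens, spec = sensitivity_specificity(values, is_malignant, cutoff=35)
```

Sweeping the cutoff over all observed values and plotting sensitivity against 1 - specificity yields the receiver operating characteristic curve mentioned in the methods.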

Relevance: 30.00%

Abstract:

A total of 3,631 expressed sequence tags (ESTs) were established from two size-selected cDNA libraries made from the tetrasporophytic phase of the agarophytic red alga Gracilaria tenuistipitata. The average sizes of the inserts in the two libraries were 1,600 bp and 600 bp, with an average length of the edited sequences of 850 bp. Clustering gave 2,387 assembled sequences with a redundancy of 53%. Of the ESTs, 65% had significant matches to sequences deposited in public databases, 11% to proteins without known function, and 35% were novel. The most represented ESTs were a Na/K-transporting ATPase, a hedgehog-like protein, a glycine dehydrogenase and an actin. Most of the identified genes were involved in primary metabolism and housekeeping. The largest functional group was thus genes involved in metabolism with 14% of the ESTs; other large functional categories included energy, transcription, and protein synthesis and destination. The codon usage was examined using a subset of the data, and the codon bias was found to be limited with all codon combinations used.

Relevance: 30.00%

Abstract:

Talisin is a seed-storage protein from Talisia esculenta that presents lectin-like activities, as well as proteinase-inhibitor properties. The present study aims to provide new in vitro and in silico biochemical information about this protein, shedding some light on its mechanistic inhibitory strategies. A theoretical three-dimensional structure of Talisin bound to trypsin was constructed in order to determine the relative interaction mode. Since the structure of non-competitive inhibition has not been elucidated, Talisin-trypsin docking was carried out using Hex v5.1. The predicted non-coincidence of the trypsin binding site is completely different from that previously proposed for Kunitz-type inhibitors, which demonstrate a substitution of an Arg(64) for the Glu(64) residue. The data therefore provide more information regarding the mechanisms of non-competitive plant proteinase inhibitors. Bioassays with Talisin also showed a strong insecticidal effect on the larval development of Diatraea saccharalis, with LD50 and ED50 of ca. 2.0% and 1.5%, respectively. (C) 2011 Elsevier Inc. All rights reserved.

Relevance: 30.00%

Abstract:

Background: RNA interference (RNAi) is a post-transcriptional gene silencing process in which double-stranded RNA (dsRNA) directs the degradation of a specific corresponding target mRNA. The mediators of this process are small dsRNAs of approximately 21 to 23 bp in length, called small interfering RNAs (siRNAs), which can be prepared in vitro and used to direct the degradation of specific mRNAs inside cells. Hence, siRNAs represent a powerful tool to study and control gene and cell function. Rapid progress has been made in the use of siRNA as a means to attenuate the expression of any protein for which the cDNA sequence is known. Individual siRNAs can be chemically synthesized, in vitro-transcribed, or expressed in cells from siRNA expression vectors. However, screening for the most efficient siRNAs for post-transcriptional gene silencing in cells in culture is a laborious and expensive process. In this study, the effectiveness of two siRNA production strategies for the attenuation of abundant DNA repair proteins was compared in human cells: (a) the in vitro production of siRNA mixtures by the Dicer enzyme (Diced siRNAs); and (b) the chemical synthesis of very specific and unique siRNA sequences (Stealth RNAi (TM)). Materials, Methods & Results: For in vitro-produced siRNAs, two segments of the human Ku70 (167 bp in exon 5; and 249 bp in exon 13; NM001469) and Xrcc4 (172 bp in exon 2; and 108 bp in exon 6; NM003401) genes were chosen to generate dsRNA for subsequent "Dicing" to create mixtures of siRNAs. The Diced fragments of siRNA for each gene sequence were pooled and stored at -80 °C. Alternatively, chemically synthesized Stealth siRNAs were designed and generated to match two very specific gene sequence regions for each target gene of interest (Ku70 and Xrcc4). HCT116 cells were plated at 30% confluence in 24- or 6-well culture plates. The next day, cells were transfected by lipofection with either Diced or Stealth siRNAs for Ku70 or Xrcc4, in duplicate, at various doses, with blank and sham transfections used as controls. Cells were harvested at 0, 24, 48, 72 and 96 h post-transfection for protein determination. The knockdown of specific targeted gene products was quantified by Western blot using GAPDH as control. Transfection of gene-specific siRNA to either Ku70 or Xrcc4 with both Diced and Stealth siRNAs resulted in a down-regulation of the targeted proteins to approximately 10 to 20% of control levels 48 h after transfection, with recovery to pre-treatment levels by 96 h. Discussion: By transfecting cells with Diced or chemically synthesized Stealth siRNAs, Ku70 and Xrcc4, two highly expressed proteins in cells, were effectively attenuated, demonstrating the great potential of both siRNA production strategies as tools to perform loss-of-function experiments in mammalian cells. In fact, down-regulation of Ku70 and Xrcc4 has been shown to reduce the activity of the non-homologous end joining DNA pathway, a very desirable approach for the use of homologous recombination technology for gene targeting or knockout studies. Stealth RNAi (TM) was developed to achieve high specificity and greater stability when compared with mixtures of enzymatically-produced (Diced) siRNA fragments. In this study, both siRNA approaches inhibited the expression of Ku70 and Xrcc4 gene products, with no detectable toxic effects to the cells in culture. However, similar knockdown effects using Diced siRNAs were only attained at concentrations 10-fold higher than with Stealth siRNAs. The application of RNAi technology will expand and continue to provide new insights into gene regulation, as well as potential applications for new therapies, transgenic animal production and basic research.

Relevance: 30.00%

Abstract:

Aims. We studied four young star clusters to characterise their anomalous extinction or variable reddening and assess whether they could be due to contamination by either dense clouds or circumstellar effects. Methods. We evaluated the extinction law (R_V) by adopting two methods: (i) the use of theoretical expressions based on the colour-excess of stars with known spectral type; and (ii) the analysis of two-colour diagrams, where the slope of the observed colour distribution was compared to the normal distribution. An algorithm to reproduce the zero-age main-sequence (ZAMS) reddened colours was developed to derive the average visual extinction (A_V) that provides the closest fit to the observational data. The structure of the clouds was evaluated by means of a statistical fractal analysis, designed to compare their geometric structure with the spatial distribution of the cluster members. Results. The cluster NGC 6530 is the only object of our sample affected by anomalous extinction. On average, the other clusters suffer normal extinction, but several of their members, mainly in NGC 2264, seem to have high R_V, probably because of circumstellar effects. The ZAMS fitting provides A_V values that are in good agreement with those found in the literature. The fractal analysis shows that NGC 6530 has a centrally concentrated distribution of stars that differs from the substructures found in the density distribution of the cloud projected in the A_V map, suggesting that the original cloud was changed by the cluster formation. However, the fractal dimension and statistical parameters of Berkeley 86, NGC 2244, and NGC 2264 indicate that there is a good cloud-cluster correlation, when compared to other works based on an artificial distribution of points.

Relevance: 30.00%

Abstract:

Citrus leprosis, caused by Citrus leprosis virus C (CiLV-C), is currently considered the most important viral disease in the Brazilian citrus industry due to the high costs required for the chemical control of its vector, the mite Brevipalpus phoenicis. The pathogen induces a non-systemic infection and the disease is characterized by the appearance of localized lesions on citrus leaves, stems and fruits, premature fruit and leaf drop and dieback of stems. Attempts were made to promote in vitro expression of the putative cell-to-cell movement protein of CiLV-C in Escherichia coli and to produce a specific polyclonal antibody against this protein as a tool to investigate the virus-plant-vector relationship. In ELISA, the antibody reacted strongly with the homologous protein expressed in vitro, but poorly with the native protein present in extracts of CiLV-C leaf lesions from sweet orange. Reactions from old lesions were more intense than those from young lesions. Western blot and in situ immunolocalization assays failed to detect the native protein. These results suggest low expression of the movement protein (MP) in host tissues. Moreover, it is possible that the conformation of the protein expressed in vitro and used to produce the antibody differs from that of the native MP, hindering full recognition of the latter.

Relevance: 30.00%

Abstract:

Cowpea aphid-borne mosaic virus (CABMV) causes major diseases in cowpea and passion flower plants in Brazil and also in other countries. CABMV has also been isolated from leguminous species including Cassia hoffmannseggii, Canavalia rosea, Crotalaria juncea and Arachis hypogaea in Brazil. The virus seems to be adapted to two distinct families, the Passifloraceae and Fabaceae. Aiming to identify CABMV and elucidate a possible host adaptation of this virus species, isolates from cowpea, passion flower and C. hoffmannseggii collected in the states of Pernambuco and Rio Grande do Norte were analysed by sequencing the complete coat protein genes. A phylogenetic tree was constructed based on the obtained sequences and those available in public databases. Major Brazilian isolates from passion flower, independently of the geographical distances among them, were grouped in three different clusters. The possible host adaptation was also observed in fabaceous-infecting CABMV Brazilian isolates. These host adaptations possibly occurred independently within Brazil, so all these clusters belong to a bigger Brazilian cluster. Nevertheless, African passion flower- or cowpea-infecting isolates formed totally different clusters. These results showed that host adaptation could be one factor for CABMV evolution, although geographical isolation is a stronger factor.

Relevance: 30.00%

Abstract:

In protein databases, a substantial number of proteins have determined structures but no functional annotation. Understanding the relationship between function and structure can be useful to predict function on a large scale. We have analyzed the similarities in global physicochemical parameters for a set of enzymes which were classified according to the four Enzyme Commission (EC) hierarchical levels. Using relevance theory we introduced a distance between proteins in the space of physicochemical characteristics. This was done by minimizing a cost function of the metric tensor built to reflect the EC classification system. Using an unsupervised clustering method on a set of 1025 enzymes, we obtained no relevant clusters compatible with EC classification. The distance distributions between enzymes from the same EC group and from different EC groups were compared by histograms. Such analysis was also performed using sequence alignment similarity as a distance. Our results suggest that global structure parameters are not sufficient to segregate enzymes according to EC hierarchy. This indicates that features essential for function are local rather than global. Consequently, methods for predicting function based on global attributes should not obtain high accuracy in main EC class prediction without relying on similarities between enzymes from training and validation datasets. Furthermore, these results are consistent with a substantial number of studies suggesting that function evolves fundamentally by recruitment, i.e., the same protein motif or fold can be used to perform different enzymatic functions and a few specific amino acids (AAs) are actually responsible for enzyme activity. These essential amino acids should belong to active sites and an effective method for predicting function should be able to recognize them. (C) 2012 Elsevier Ltd. All rights reserved.

Relevance: 30.00%

Abstract:

The low efficiency of gene transfer is a recurrent problem in DNA vaccine development and gene therapy studies using non-viral vectors such as plasmid DNA (pDNA). This is mainly because, on their way to the target cell's nucleus, plasmid vectors must overcome a series of physical, enzymatic and diffusional barriers. The main objective of this work is the development of recombinant proteins specifically designed for pDNA delivery, which take advantage of molecular motors like dynein for the transport of cargos from the periphery to the centrosome of mammalian cells. A DNA binding sequence was fused to the N-terminus of the recombinant human dynein light chain LC8. Expression studies indicated that the fusion protein was correctly expressed in soluble form using the E. coli BL21(DE3) strain. As expected, gel permeation assays found the purified protein mainly present as dimers, the functional oligomeric state of LC8. Gel retardation assays and atomic force microscopy proved the ability of the fusion protein to interact with and condense pDNA. Zeta potential measurements indicated that LC8 with a DNA binding domain (LD4) has an enhanced capacity to interact with and condense pDNA, generating positively charged complexes. Transfection of cultured HeLa cells confirmed the ability of LD4 to facilitate pDNA uptake and indicated the involvement of retrograde transport in the intracellular trafficking of pDNA:LD4 complexes. Finally, cytotoxicity studies demonstrated a very low toxicity of the fusion protein vector, indicating the potential for in vivo applications. The study presented here is part of an effort to develop new modular shuttle proteins able to take advantage of strategies used by viruses to infect mammalian cells, aiming to provide new tools for gene therapy and DNA vaccination studies. (C) 2012 Elsevier B.V. All rights reserved.

Relevance: 30.00%

Abstract:

Background and Aim: The identification of gastric carcinomas (GC) has traditionally been based on histomorphology. Recently, DNA microarrays have successfully been used to identify tumors through clustering of the expression profiles. Random forest clustering is widely used for tissue microarrays and other immunohistochemical data, because it handles highly-skewed tumor marker expressions well, and weighs the contribution of each marker according to its relatedness with other tumor markers. In the present study, we identified biologically- and clinically-meaningful groups of GC by hierarchical clustering analysis of immunohistochemical protein expression. Methods: We selected 28 proteins (p16, p27, p21, cyclin D1, cyclin A, cyclin B1, pRb, p53, c-met, c-erbB-2, vascular endothelial growth factor, transforming growth factor [TGF]-beta I, TGF-beta II, MutS homolog-2, bcl-2, bax, bak, bcl-x, adenomatous polyposis coli, clathrin, E-cadherin, beta-catenin, mucin [MUC] 1, MUC2, MUC5AC, MUC6, matrix metalloproteinase [MMP]-2, and MMP-9) to be investigated by immunohistochemistry in 482 GC. The data were analyzed using a random forest clustering method. Results: Proteins related to cell cycle, growth factor, cell motility, cell adhesion, apoptosis, and matrix remodeling were highly expressed in GC. We identified protein expressions associated with poor survival in diffuse-type GC. Conclusions: Based on the expression analysis of 28 proteins, we identified two groups of GC that could not be explained by any clinicopathological variables, and a subgroup of long-surviving diffuse-type GC patients with a distinct molecular profile. These results provide not only a new molecular basis for understanding the biological properties of GC, but also better prediction of survival than the classic pathological grouping.

Relevance: 30.00%

Abstract:

Background: The gene coding for the uncharacterized protein PAB1135 in the archaeon Pyrococcus abyssi is in the same operon as the ribonuclease P (RNase P) subunit Rpp30. Findings: Here we report the expression, purification and structural analysis of PAB1135. We analyzed the interaction of PAB1135 with RNA and show that it efficiently binds double-stranded RNAs in a non-sequence-specific manner. We also performed molecular modeling of the PAB1135 structure using the crystal structure of the protein Af2318 from Archaeoglobus fulgidus (2OGK) as the template. Conclusions: Comparison of this model has led to the identification of a region in PAB1135 that could be involved in recognizing double-stranded RNA.