208 results for Functional Classification Trees
at Indian Institute of Science - Bangalore - India
Abstract:
Background: The hot dog fold has been found in more than sixty proteins since the first report of its existence about a decade ago. The fold appears to have a strong association with fatty acid biosynthesis, its regulation and metabolism, as the proteins with this fold are predominantly coenzyme A-binding enzymes with a variety of substrates located at their active sites. Results: We have analyzed the structural features and sequences of proteins having the hot dog fold. This study reveals that though the basic architecture of the fold is well conserved in these proteins, significant differences exist in their sequence, nature of substrate and oligomerization. Segments with certain conserved sequence motifs seem to play crucial structural and functional roles in various classes of these proteins. Conclusion: The analysis led to predictions regarding the functional classification and identification of possible catalytic residues of a number of hot dog fold-containing hypothetical proteins whose structures were determined in high throughput structural genomics projects.
Abstract:
Background: The genome of a wide variety of prokaryotes contains the luxS gene homologue, which encodes the protein S-ribosylhomocysteine lyase (LuxS). This protein is responsible for the production of the quorum sensing molecule, AI-2 and has been implicated in a variety of functions such as flagellar motility, metabolic regulation, toxin production and even in pathogenicity. A high structural similarity is present in the LuxS structures determined from a few species. In this study, we have modelled the structures from several other species and have investigated their dimer interfaces. We have attempted to correlate the interface features of LuxS with the phenotypic nature of the organisms. Results: The protein structure networks (PSN) are constructed and graph theoretical analysis is performed on the structures obtained from X-ray crystallography and on the modelled ones. The interfaces, which are known to contain the active site, are characterized from the PSNs of these homodimeric proteins. The key features presented by the protein interfaces are investigated for the classification of the proteins in relation to their function. From our analysis, structural interface motifs are identified for each class in our dataset, which showed a distinctly different pattern at the interface of LuxS for the probiotics and some extremophiles. Our analysis also reveals potential sites of mutation and geometric patterns at the interface that were not evident from conventional sequence alignment studies. Conclusion: The structure network approach employed in this study for the analysis of dimeric interfaces in LuxS has brought out certain structural details at the side-chain interaction level, which were elusive from the conventional structure comparison methods. The results from this study provide a better understanding of the relation between the luxS gene and its functional role in the prokaryotes. 
This study also makes it possible to explore the potential direction towards the design of inhibitors of LuxS and thus towards a wide range of antimicrobials.
Abstract:
Protein Kinase-Like Non-kinases (PKLNKs), which are closely related to protein kinases but lack the crucial catalytic aspartate in the catalytic loop and hence cannot function as protein kinases, have been analysed. Using various sensitive sequence analysis methods, we have recognized 82 PKLNKs from four higher eukaryotic organisms, namely, Homo sapiens, Mus musculus, Rattus norvegicus, and Drosophila melanogaster. On the basis of their domain combination and function, PKLNKs have been classified mainly into four categories: (1) Ligand binding PKLNKs, (2) PKLNKs with extracellular protein-protein interaction domain, (3) PKLNKs involved in dimerization, and (4) PKLNKs with cytoplasmic protein-protein interaction module. While members of the first two classes of PKLNKs have a transmembrane domain tethered to the PKLNK domain, members of the other two classes of PKLNKs are cytoplasmic in nature. The current classification scheme is intended to provide a convenient framework to classify the PKLNKs from other eukaryotes, which would be helpful in deciphering their roles in cellular processes.
Abstract:
While plants of a single species emit a diversity of volatile organic compounds (VOCs) to attract or repel interacting organisms, these specific messages may be lost in the midst of the hundreds of VOCs produced by sympatric plants of different species, many of which may have no signal content. Receivers must be able to reduce the babel or noise in these VOCs in order to correctly identify the message. For chemical ecologists faced with vast amounts of data on volatile signatures of plants in different ecological contexts, it is imperative to employ accurate methods of classifying messages, so that suitable bioassays may then be designed to understand message content. We demonstrate the utility of 'Random Forests' (RF), a machine-learning algorithm, for the task of classifying volatile signatures and choosing the minimum set of volatiles for accurate discrimination, using data from sympatric Ficus species as a case study. We demonstrate the advantages of RF over conventional classification methods such as principal component analysis (PCA), as well as data-mining algorithms such as support vector machines (SVM), diagonal linear discriminant analysis (DLDA) and k-nearest neighbour (KNN) analysis. We show why a tree-building method such as RF, which is increasingly being used by the bioinformatics, food technology and medical community, is particularly advantageous for the study of plant communication using volatiles, dealing, as it must, with abundant noise.
Abstract:
Establishing functional relationships between multi-domain protein sequences is a non-trivial task. Traditionally, delineating functional assignment and relationships of proteins requires domain assignments as a prerequisite. This process is sensitive to alignment quality and domain definitions. In multi-domain proteins due to multiple reasons, the quality of alignments is poor. We report the correspondence between the classification of proteins represented as full-length gene products and their functions. Our approach differs fundamentally from traditional methods in not performing the classification at the level of domains. Our method is based on an alignment free local matching scores (LMS) computation at the amino-acid sequence level followed by hierarchical clustering. As there are no gold standards for full-length protein sequence classification, we resorted to Gene Ontology and domain-architecture based similarity measures to assess our classification. The final clusters obtained using LMS show high functional and domain architectural similarities. Comparison of the current method with alignment based approaches at both domain and full-length protein showed superiority of the LMS scores. Using this method we have recreated objective relationships among different protein kinase sub-families and also classified immunoglobulin containing proteins where sub-family definitions do not exist currently. This method can be applied to any set of protein sequences and hence will be instrumental in analysis of large numbers of full-length protein sequences.
Abstract:
Background: The function of a protein can be deciphered with higher accuracy from its structure than from its amino acid sequence. Due to the huge gap in the available protein sequence and structural space, tools that can generate functionally homogeneous clusters using only the sequence information, hold great importance. For this, traditional alignment-based tools work well in most cases and clustering is performed on the basis of sequence similarity. But, in the case of multi-domain proteins, the alignment quality might be poor due to varied lengths of the proteins, domain shuffling or circular permutations. Multi-domain proteins are ubiquitous in nature, hence alignment-free tools, which overcome the shortcomings of alignment-based protein comparison methods, are required. Further, existing tools classify proteins using only domain-level information and hence miss out on the information encoded in the tethered regions or accessory domains. Our method, on the other hand, takes into account the full-length sequence of a protein, consolidating the complete sequence information to understand a given protein better. Results: Our web-server, CLAP (Classification of Proteins), is one such alignment-free software for automatic classification of protein sequences. It utilizes a pattern-matching algorithm that assigns local matching scores (LMS) to residues that are a part of the matched patterns between two sequences being compared. CLAP works on full-length sequences and does not require prior domain definitions. Pilot studies undertaken previously on protein kinases and immunoglobulins have shown that CLAP yields clusters, which have high functional and domain architectural similarity. Moreover, parsing at a statistically determined cut-off resulted in clusters that corroborated with the sub-family level classification of that particular domain family. 
Conclusions: CLAP is a useful protein-clustering tool, independent of domain assignment, domain order, sequence length and domain diversity. Our method can be used for any set of protein sequences, yielding functionally relevant clusters with high domain architectural homogeneity. The CLAP web server is freely available for academic use at http://nslab.mbu.iisc.ernet.in/clap/.
Abstract:
A gradient in the density of hyperpolarization-activated cyclic-nucleotide gated (HCN) channels is necessary for the emergence of several functional maps within hippocampal pyramidal neurons. Here, we systematically analyzed the impact of dendritic atrophy on nine such functional maps, related to input resistance and local/transfer impedance properties, using conductance-based models of hippocampal pyramidal neurons. We introduced progressive dendritic atrophy in a CA1 pyramidal neuron reconstruction through a pruning algorithm, measured all functional maps in each pruned reconstruction, and arrived at functional forms for the dependence of underlying measurements on dendritic length. We found that, across frequencies, atrophied neurons responded with higher efficiency to incoming inputs, and the transfer of signals across the dendritic tree was more effective in an atrophied reconstruction. Importantly, despite the presence of identical HCN-channel density gradients, spatial gradients in input resistance, local/transfer resonance frequencies and impedance profiles were significantly constricted in reconstructions with dendrite atrophy, where these physiological measurements across dendritic locations converged to similar values. These results revealed that, in atrophied dendritic structures, the presence of an ion channel density gradient alone was insufficient to sustain homologous functional maps along the same neuronal topograph. We assessed the biophysical basis for these conclusions and found that this atrophy-induced constriction of functional maps was mediated by an enhanced spatial spread of the influence of an HCN-channel cluster in atrophied trees. These results demonstrated that the influence fields of ion channel conductances need to be localized for channel gradients to express themselves as homologous functional maps, suggesting that ion channel gradients are necessary but not sufficient for the emergence of functional maps within single neurons.
Abstract:
In this paper, we present a machine learning approach to measure the visual quality of JPEG-coded images. The features for predicting the perceived image quality are extracted by considering key human visual sensitivity (HVS) factors such as edge amplitude, edge length, background activity and background luminance. Image quality assessment involves estimating the functional relationship between HVS features and subjective test scores. The quality of the compressed images is obtained without referring to their original images ('No Reference' metric). Here, the problem of quality estimation is transformed to a classification problem and solved using the extreme learning machine (ELM) algorithm. In ELM, the input weights and the bias values are randomly chosen and the output weights are analytically calculated. The generalization performance of the ELM algorithm for classification problems with imbalance in the number of samples per quality class depends critically on the input weights and the bias values. Hence, we propose two schemes, namely the k-fold selection scheme (KS-ELM) and the real-coded genetic algorithm (RCGA-ELM), to select the input weights and the bias values such that the generalization performance of the classifier is maximized. Results indicate that the proposed schemes significantly improve the performance of the ELM classifier under imbalanced conditions for image quality assessment. The experimental results show that the visual quality estimated by the proposed RCGA-ELM emulates the mean opinion score very well. The experimental results are compared with the existing JPEG no-reference image quality metric and full-reference structural similarity image quality metric.
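The basic ELM step described in the abstract above (random input weights and biases, output weights obtained analytically) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tanh activation, the small ridge term, the toy XOR data and the names `train_elm`, `predict` and `solve_linear` are all assumptions made for the example, and neither the KS-ELM nor the RCGA-ELM selection scheme is reproduced.

```python
import math
import random


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def hidden(x, W, b):
    """Hidden-layer outputs for one input vector (tanh is an assumed choice)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]


def train_elm(X, t, n_hidden, ridge=1e-3, seed=0):
    """Train a single-output ELM.

    Input weights W and biases b are drawn at random and never updated.
    Output weights beta solve the ridge-regularised normal equations
    (H^T H + ridge * I) beta = H^T t; the ridge term is an assumption
    added here for numerical stability.
    """
    rng = random.Random(seed)
    d = len(X[0])
    W = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n_hidden)]
    b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    H = [hidden(x, W, b) for x in X]
    A = [[sum(h[i] * h[j] for h in H) + (ridge if i == j else 0.0)
          for j in range(n_hidden)] for i in range(n_hidden)]
    rhs = [sum(h[i] * ti for h, ti in zip(H, t)) for i in range(n_hidden)]
    beta = solve_linear(A, rhs)
    return W, b, beta


def predict(x, model):
    """Real-valued output; threshold it to obtain a class label."""
    W, b, beta = model
    return sum(bi * hi for bi, hi in zip(beta, hidden(x, W, b)))


# Toy two-class data (XOR), purely illustrative.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
t = [0.0, 1.0, 1.0, 0.0]
model = train_elm(X, t, n_hidden=10)
sse = sum((predict(x, model) - ti) ** 2 for x, ti in zip(X, t))
```

Because only the output weights are fitted, a different random draw of `W` and `b` (a different `seed` here) yields a different classifier, which is why the choice of input weights and biases matters for generalization.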
Abstract:
When freshly starved amoebae of Dictyostelium discoideum are loaded with the Ca2+-specific dye indo-1/AM and analyzed in a fluorescence-activated cell sorter, they exhibit a quasi-bimodal distribution of fluorescence. This permits a separation of the population into two classes: H, or "high Ca2+-indo-1 fluorescence," and L, or "low Ca2+-indo-1 fluorescence." Simultaneous monitoring of Ca2+-indo-1 and Ca2+-chlortetracycline fluorescence shows that by and large the same cells tend to have high (or low) levels of both cytoplasmic and sequestered Ca2+. Next we label H cells with tetramethylrhodamine isothiocyanate (TRITC) and mix them in a 1:4 ratio with L cells. In the slugs that result, TRITC fluorescence is confined mainly to the anterior prestalk region. This implies that amoebae with relatively high Ca2+ at the vegetative stage tend to develop into prestalk cells and those with low Ca2+ into prespores. Polysphondylium violaceum, a cellular slime mold that does not possess prestalk and prespore cells, also does not display a Ca2+-dependent heterogeneity at the vegetative stage or in slugs. Finally, confirming earlier findings with the fluorophore fura-2 (Azhar et al., Curr. Sci. 68, 337-342 (1995)), a prestalk-prespore difference in cellular Ca2+ is present in the cells of the slug in vivo. These findings are discussed in light of the possible roles of Ca2+ for cell differentiation in D. discoideum.
Abstract:
(1) Species-accumulation curves for woody plants were calculated in three tropical forests, based on fully mapped 50-ha plots in wet, old-growth forest in Peninsular Malaysia, in moist, old-growth forest in central Panama, and in dry, previously logged forest in southern India. A total of 610 000 stems were identified to species and mapped to < 1 m accuracy. Mean species number and stem number were calculated in quadrats as small as 5 m x 5 m to as large as 1000 m x 500 m, for a variety of stem sizes above 10 mm in diameter. Species-area curves were generated by plotting species number as a function of quadrat size; species-individual curves were generated from the same data, but using stem number as the independent variable rather than area. (2) Species-area curves had different forms for stems of different diameters, but species-individual curves were nearly independent of diameter class. With < 10^4 stems, species-individual curves were concave downward on log-log plots, with curves from different forests diverging, but beyond about 10^4 stems, the log-log curves became nearly linear, with all three sites having a similar slope. This indicates an asymptotic difference in richness between forests: the Malaysian site had 2.7 times as many species as Panama, which in turn was 3.3 times as rich as India. (3) Other details of the species-accumulation relationship were remarkably similar between the three sites. Rectangular quadrats had 5-27% more species than square quadrats of the same area, with longer and narrower quadrats increasingly diverse. Random samples of stems drawn from the entire 50 ha had 10-30% more species than square quadrats with the same number of stems. At both Pasoh and BCI, but not Mudumalai, species richness was slightly higher among intermediate-sized stems (50-100 mm in diameter) than in either smaller or larger sizes. These patterns reflect aggregated distributions of individual species, plus weak density-dependent forces that tend to smooth the species abundance distribution and 'loosen' aggregations as stems grow. (4) The results provide support for the view that within each tree community, many species have their abundance and distribution guided more by random drift than deterministic interactions. The drift model predicts that the species-accumulation curve will have a declining slope on a log-log plot, reaching a slope of 0.1 in about 50 ha. No other model of community structure can make such a precise prediction. (5) The results demonstrate that diversity studies based on different stem diameters can be compared by sampling identical numbers of stems. Moreover, they indicate that stem counts < 1000 in tropical forests will underestimate the percentage difference in species richness between two diverse sites. Fortunately, standard diversity indices (Fisher's α, Shannon-Wiener) captured diversity differences in small stem samples more effectively than raw species richness, but both were sample size dependent. Two nonparametric richness estimators (Chao, jackknife) performed poorly, greatly underestimating true species richness.
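A species-individual curve built from random samples of stems, as mentioned in the abstract above, can be sketched in a few lines. This is only an illustration under stated assumptions: the stem list is a made-up toy community, and the function name `species_individual_curve`, the number of random draws and the sample sizes are hypothetical choices, not values from the study.

```python
import random


def species_individual_curve(stems, sample_sizes, n_draws=200, seed=0):
    """Mean number of species found in random samples of n stems.

    stems        : list with one species label per stem
    sample_sizes : stem counts at which to evaluate the curve
    Returns a list of (n, mean_species) pairs.
    """
    rng = random.Random(seed)
    curve = []
    for n in sample_sizes:
        # Sample n stems without replacement and count distinct species.
        counts = [len(set(rng.sample(stems, n))) for _ in range(n_draws)]
        curve.append((n, sum(counts) / n_draws))
    return curve


# Toy community: a few common species and several rare ones (invented data).
stems = ["A"] * 50 + ["B"] * 30 + ["C"] * 10 + ["D"] * 5 + ["E", "F", "G", "H", "I"]
curve = species_individual_curve(stems, [1, 10, 50, len(stems)])
```

Plotting the logarithm of mean species number against the logarithm of stem number gives the log-log form discussed in the abstract. Sampling stems at random ignores their spatial position, so it does not capture the effect of aggregation that quadrat-based sampling would.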
Abstract:
In routine industrial design, fatigue life estimation is largely based on S-N curves and ad hoc cycle counting algorithms used with Miner's rule for predicting life under complex loading. However, there are well known deficiencies of the conventional approach. Of the many cumulative damage rules that have been proposed, Manson's Double Linear Damage Rule (DLDR) has been the most successful. Here we follow up, through comparisons with experimental data from many sources, on a new approach to empirical fatigue life estimation ('A Constructive Empirical Theory for Metal Fatigue Under Block Cyclic Loading', Proceedings of the Royal Society A, in press). The basic modeling approach is first described: it depends on enforcing mathematical consistency between predictions of simple empirical models that include indeterminate functional forms, and published fatigue data from handbooks. This consistency is enforced through setting up and (with luck) solving a functional equation with three independent variables and six unknown functions. The model, after eliminating or identifying various parameters, retains three fitted parameters; for the experimental data available, one of these may be set to zero. On comparison against data from several different sources, with two fitted parameters, we find that our model works about as well as the DLDR and much better than Miner's rule. We finally discuss some ways in which the model might be used, beyond the scope of the DLDR.
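For reference, the conventional baseline named in the abstract above, Miner's rule, sums the fraction of life consumed at each load level and predicts failure when the sum reaches 1. The sketch below shows only that baseline; it is not the authors' model or the DLDR. The function names and the cycle counts are illustrative assumptions, and the cycles-to-failure values would normally be read from an S-N curve.

```python
def miner_damage(blocks):
    """Cumulative damage under Miner's rule.

    blocks: list of (cycles_applied, cycles_to_failure) pairs, one pair
    per load level. Returns D = sum(n_i / N_i); failure is predicted
    when D reaches 1.
    """
    return sum(n / N for n, N in blocks)


def repeats_to_failure(blocks):
    """Number of times the whole loading block can repeat before D = 1."""
    return 1.0 / miner_damage(blocks)


# Hypothetical two-level loading block (values invented for illustration).
block = [(1000, 10000), (500, 2000)]
damage = miner_damage(block)
repeats = repeats_to_failure(block)
```

Because the sum is linear and order-independent, Miner's rule cannot represent load-sequence effects, which is one of the well known deficiencies that alternative cumulative damage rules try to address.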
Abstract:
Background: Phosphorylation by protein kinases is a common event in many cellular processes. Further, many kinases perform specialized roles and are regulated by non-kinase domains tethered to the kinase domain. Perturbation in the regulation of kinases leads to malignancy. We have identified and analysed putative protein kinases encoded in the genome of the chimpanzee, which is a close evolutionary relative of human. Results: The shared core biology between chimpanzee and human is characterized by many orthologous protein kinases which are involved in conserved pathways. Domain architectures specific to chimp/human kinases have been observed. Chimp kinases with unique domain architectures are characterized by deletion of one or more non-kinase domains in the human kinases. Interestingly, counterparts of some of the multi-domain human kinases in chimp are characterized by identical domain architectures but with a kinase-like non-kinase domain. Remarkably, out of 587 chimpanzee kinases no human orthologue with greater than 95% sequence identity could be identified for 160 kinases. Variations in chimpanzee kinases compared to human kinases are brought about also by differences in functions of domains tethered to the catalytic kinase domain. For example, the heterodimer forming PB1 domain related to the fold of ubiquitin/Ras-binding domain is seen uniquely tethered to PKC-like chimpanzee kinase. Conclusion: Though the chimpanzee and human are evolutionarily very close, there are chimpanzee kinases with no close counterpart in the human, suggesting differences in their functions. This analysis provides a direction for experimental analysis of human and chimpanzee protein kinases in order to enhance our understanding of their specific biological roles.
Abstract:
SHMT (serine hydroxymethyltransferase), a type I pyridoxal 5'-phosphate-dependent enzyme, catalyses the conversion of L-serine and THF (tetrahydrofolate) into glycine and 5,10-methylene THF. SHMT also catalyses several THF-independent side reactions such as cleavage of β-hydroxy amino acids, transamination, racemization and decarboxylation. In the present study, the residues Asn(341), Tyr(60) and Phe(351), which are likely to influence THF binding, were mutated to alanine, alanine and glycine respectively, to elucidate the role of these residues in THF-dependent and -independent reactions catalysed by SHMT. The N341A and Y60A bsSHMT (Bacillus stearothermophilus SHMT) mutants were inactive for the THF-dependent activity, while the mutations had no effect on THF-independent activity. However, mutation of Phe(351) to glycine did not have any effect on either of the activities. The crystal structures of the glycine binary complexes of the mutants showed that N341A bsSHMT forms an external aldimine as in bsSHMT, whereas Y60A and F351G bsSHMTs exist as a mixture of internal/external aldimine and gem-diamine forms. Crystal structures of all three mutants obtained in the presence of L-allo-threonine were similar to the respective glycine binary complexes. The structure of the ternary complex of F351G bsSHMT with glycine and FTHF (5-formyl THF) showed that the monoglutamate side chain of FTHF is ordered in both the subunits of the asymmetric unit, unlike in the wild-type bsSHMT. The present studies demonstrate that the residues Asn(341) and Tyr(60) are pivotal for the binding of THF/FTHF, whereas Phe(351) is responsible for the asymmetric binding of FTHF in the two subunits of the dimer.
Abstract:
Remote sensing provides a lucid and effective means for crop coverage identification. Crop coverage identification is a very important technique, as it provides vital information on the type and extent of crop cultivated in a particular area. This information has immense potential in the planning for further cultivation activities and for optimal usage of the available fertile land. As the frontiers of space technology advance, the knowledge derived from the satellite data has also grown in sophistication. Further, image classification forms the core of the solution to the crop coverage identification problem. No single classifier can prove to satisfactorily classify all the basic crop cover mapping problems of a cultivated region. We present in this paper the experimental results of multiple classification techniques for the problem of crop cover mapping of a cultivated region. A detailed comparison of the algorithms inspired by social behaviour of insects and conventional statistical method for crop classification is presented in this paper. These include the Maximum Likelihood Classifier (MLC), Particle Swarm Optimisation (PSO) and Ant Colony Optimisation (ACO) techniques. The high resolution satellite image has been used for the experiments.
Abstract:
A number of studies in yeast have shown that DNA topoisomerase II is essential for chromosome condensation and disjunction during mitosis at the metaphase/anaphase transition and meiosis I. Accordingly, kinetic and mechanistic studies have implied a role for topoisomerase II in chromosome disjunction. As a step toward understanding the nature and role of topoisomerase II in a mammalian germline in vivo, we have purified topoisomerase II from rat testis to homogeneity and ascertained several of its catalytic activities in conjunction with that of the purified enzyme from liver. The purified enzymes appeared to be monomers under denaturing conditions; however, they differed in their relative molecular mass. Topoisomerase II from testis and liver have apparent molecular masses of 150 +/- 10 kDa and 160 +/- 10 kDa, respectively. The native molecular mass of testis topoisomerase II as assayed by immunoblot analysis of cell-free extracts, prepared in the presence of SDS and a number of protease inhibitors, corroborated the size of the purified enzyme. Both enzymes are able to promote decatenation and relax supercoiled DNA substrates in an ATP and Mg2+-dependent manner. However, quantitative comparison of catalytic properties of topoisomerase II from testis with that of the enzyme from liver displayed significant differences in their efficiencies. Optimal pH values for the testis enzyme are 6.5 to 8.5 while they are 6 to 7.5 for the liver enzyme. Intriguingly, the relaxation activity of liver topoisomerase II was inhibited by potassium glutamate at 1 M, whereas the testis enzyme required about half that concentration. These findings argue that topoisomerase II from rat testis is structurally distinct from its somatic form and that the functional differences between the two enzymes parallel the physiological environment that is unique to these two tissues.