Digital Library

16 results for recall

in Indian Institute of Science - Bangalore - India

High-quality annotation of promoter regions for 913 bacterial genomes

Relevance:

10.00%

Abstract:

Motivation: The number of bacterial genomes being sequenced is increasing very rapidly and hence, it is crucial to have procedures for rapid and reliable annotation of their functional elements such as promoter regions, which control the expression of each gene or each transcription unit of the genome. The present work addresses this requirement and presents a generic method applicable across organisms. Results: Relative stability of the DNA double helical sequences has been used to discriminate promoter regions from non-promoter regions. Based on the difference in stability between neighboring regions, an algorithm has been implemented to predict promoter regions on a large scale over 913 microbial genome sequences. The average free energy values for the promoter regions as well as their downstream regions are found to differ, depending on their GC content. Threshold values to identify promoter regions have been derived using sequences flanking a subset of translation start sites from all microbial genomes and then used to predict promoters over the complete genome sequences. An average recall value of 72% (which indicates the percentage of protein and RNA coding genes with predicted promoter regions assigned to them) and a precision of 56% are achieved over the 913 microbial genome dataset.
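As an aid to reading the recall and precision figures quoted above, here is a minimal sketch of how the two measures are computed from prediction counts. The function name and the counts are hypothetical; the counts were chosen only so that the output lands near the reported averages and are not data from the paper.

```python
from typing import Tuple


def precision_recall(true_positives: int,
                     false_positives: int,
                     false_negatives: int) -> Tuple[float, float]:
    """Return (precision, recall) from raw prediction counts."""
    predicted = true_positives + false_positives   # everything the tool predicted
    actual = true_positives + false_negatives      # everything that should be found
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts (not from the paper): 100 genes, 72 of them with a
# correctly assigned promoter prediction, plus 56 spurious predictions.
p, r = precision_recall(true_positives=72, false_positives=56, false_negatives=28)
print(f"precision={p:.2f} recall={r:.2f}")
```

Recall rises when more genes receive a prediction, while precision falls when more of the predictions are spurious, which is why the two are reported together.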

Neural network modeling of associative memory: Beyond the Hopfield model

Relevance:

10.00%

Abstract:

A number of neural network models, in which fixed-point and limit-cycle attractors of the underlying dynamics are used to store and associatively recall information, are described. In the first class of models, a hierarchical structure is used to store an exponentially large number of strongly correlated memories. The second class of models uses limit cycles to store and retrieve individual memories. A neurobiologically plausible network that generates low-amplitude periodic variations of activity, similar to the oscillations observed in electroencephalographic recordings, is also described. Results obtained from analytic and numerical studies of the properties of these networks are discussed.
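For orientation, the sketch below shows the classic Hopfield network that the title takes as its starting point: a pattern is stored as a fixed-point attractor and recalled from a corrupted cue. It is not an implementation of the hierarchical or limit-cycle models the abstract describes; the function names and the example pattern are assumptions made for illustration.

```python
def train(patterns):
    """Hebbian weights for a list of +1/-1 patterns (no self-connections)."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for pattern in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += pattern[i] * pattern[j] / n
    return weights


def recall(weights, cue, max_sweeps=10):
    """Asynchronously update units until the state stops changing."""
    state = list(cue)
    n = len(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            field = sum(weights[i][j] * state[j] for j in range(n))
            new_value = 1 if field >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:  # reached a fixed point
            break
    return state


stored = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train([stored])
cue = list(stored)
cue[2] = -cue[2]                 # corrupt one unit of the stored pattern
recovered = recall(weights, cue)
print(recovered == stored)
```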

DNA Free Energy-Based Promoter Prediction and Comparative Analysis of Arabidopsis and Rice Genomes

Relevance:

10.00%

Abstract:

The cis-regulatory regions on DNA serve as binding sites for proteins such as transcription factors and RNA polymerase. The combinatorial interaction of these proteins plays a crucial role in transcription initiation, which is an important point of control in the regulation of gene expression. We present here an analysis of the performance of an in silico method for predicting cis-regulatory regions in the plant genomes of Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa) on the basis of free energy of DNA melting. For protein-coding genes, we achieve recall and precision of 96% and 42% for Arabidopsis and 97% and 31% for rice, respectively. For noncoding RNA genes, the program gives recall and precision of 94% and 75% for Arabidopsis and 95% and 90% for rice, respectively. Moreover, 96% of the false-positive predictions were located in noncoding regions of primary transcripts, out of which 20% were found in the first intron alone, indicating possible regulatory roles. The predictions for orthologous genes from the two genomes showed a good correlation with respect to prediction scores and promoter organization. Comparison of our results with an existing program for promoter prediction in plant genomes indicates that our method shows improved prediction capability.

Characterization of structural and free energy properties of promoters associated with Primary and Operon TSS in Helicobacter pylori genome and their orthologs

Relevance:

10.00%

Abstract:

Promoter regions in the genomes of all domains of life show similar trends in several structural properties such as stability, bendability, curvature, etc. In the current study, we analysed the stability and bendability of various classes of promoter regions (based on the recent identification of different classes of transcription start sites) of the Helicobacter pylori 26695 strain. It is found that primary TSS and operon-associated TSS promoters show significantly strong features in their promoter regions. The DNA free-energy-based promoter prediction tool PromPredict was used to annotate promoters of different classes, and very high recall values (~80%) are obtained for primary TSS. Orthologous genes from other strains of H. pylori show conservation of structural properties in promoter regions as well as coding regions. PromPredict annotates promoters of orthologous genes with very high recall and precision.

Multi-script and multi-oriented text localization from scene images

Relevance:

10.00%

Abstract:

This paper describes a new method of color text localization from generic scene images containing text of different scripts and with arbitrary orientations. A representative set of colors is first identified using the edge information to initiate an unsupervised clustering algorithm. Text components are identified from each color layer using a combination of a support vector machine and a neural network classifier trained on a set of low-level features derived from the geometric, boundary, stroke and gradient information. Experiments on camera-captured images that contain variable fonts, size, color, irregular layout, non-uniform illumination and multiple scripts illustrate the robustness of the method. The proposed method yields precision and recall of 0.8 and 0.86 respectively on a database of 100 images. The method is also compared with others in the literature using the ICDAR 2003 robust reading competition dataset.

OTCYMIST: Otsu-Canny minimal spanning tree for born-digital images

Relevance:

10.00%

Abstract:

Text segmentation and localization algorithms are proposed for the born-digital image dataset. Binarization and edge detection are separately carried out on the three colour planes of the image. Connected components (CCs) obtained from the binarized image are thresholded based on their area and aspect ratio. CCs which contain sufficient edge pixels are retained. A novel approach is presented, where the text components are represented as nodes of a graph. Nodes correspond to the centroids of the individual CCs. Long edges are broken from the minimum spanning tree of the graph. Pairwise height ratio is also used to remove likely non-text components. A new minimum spanning tree is created from the remaining nodes. Horizontal grouping is performed on the CCs to generate bounding boxes of text strings. Overlapping bounding boxes are removed using an overlap area threshold. Non-overlapping and minimally overlapping bounding boxes are used for text segmentation. Vertical splitting is applied to generate bounding boxes at the word level. The proposed method is applied on all the images of the test dataset and values of precision, recall and H-mean are obtained using different approaches.
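The graph step described above (centroids as nodes, a minimum spanning tree, then breaking long edges) can be sketched as follows. This is a minimal illustration using Prim's algorithm on Euclidean distances, not the authors' implementation; the function names, the centroid coordinates and the `max_edge_length` threshold are all assumptions.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns a list of (i, j, length) edges over point indices.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n
    best_parent = [-1] * n
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the closest node that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_parent[u] >= 0:
            edges.append((best_parent[u], u, best_dist[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_parent[v] = u
    return edges


def group_components(points, max_edge_length):
    """Break MST edges longer than the threshold and return index groups."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j, length in minimum_spanning_tree(points):
        if length <= max_edge_length:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical centroids: three components on the left, two far to the right.
centroids = [(0, 0), (10, 0), (20, 1), (200, 0), (210, 2)]
groups = group_components(centroids, max_edge_length=50)
print(groups)
```

Breaking the one long edge splits the tree into two groups, which corresponds to separating two text strings before bounding boxes are formed.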

Evaluation of document binarization using eigen value decomposition

Relevance:

10.00%

Abstract:

A necessary step for the recognition of scanned documents is binarization, which is essentially the segmentation of the document. Several algorithms for binarizing a scanned document can be found in the literature. What is the best binarization result for a given document image? To answer this question, a user needs to check different binarization algorithms for suitability, since different algorithms may work better for different types of documents. Manually choosing the best from a set of binarized documents is time-consuming. To automate the selection of the best segmented document, we need either to use the ground truth of the document or to propose an evaluation metric. If ground truth is available, then precision and recall can be used to choose the best binarized document. What if ground truth is not available? Can we come up with a metric which evaluates these binarized documents? We propose a metric to evaluate binarized document images using eigen value decomposition. We have evaluated this measure on DIBCO and H-DIBCO datasets. The proposed method chooses the best binarized document that is close to the ground truth of the document.

A context aware collaborative service provisioning system for mobile-commerce

Relevance:

10.00%

Abstract:

In service provisioning, providing the required service to the user without user intervention, while reducing cognitive overload, is a real challenge. In this paper we propose a user-centred, context-aware collaborative service provisioning system, which makes use of context along with collaboration to provide the required service to the user dynamically. The system uses a novel approach of query expansion along with interactive and rating-matrix-based collaboration. Performance of the system is evaluated in a Mobile-Commerce environment. The results show that the system is time efficient and performs with better precision and recall than a context-aware system.

A reexamination of some puzzling results in linearized elasticity

Relevance:

10.00%

Abstract:

In this paper, we analyse three commonly discussed 'flaws' of linearized elasticity theory and attempt to resolve them. The first 'flaw' concerns cylindrically orthotropic material models. Since the work of Lekhnitskii (1968), a growing body of work, continuing to this day, has shown that infinite stresses arise with the use of a cylindrically orthotropic material model even in the case of linearized elasticity. Besides infinite stresses, interpenetration of matter is also shown to occur. These infinite stresses and interpenetration occur when the ratio of the circumferential Young modulus to the radial Young modulus is less than one. If the ratio is greater than one, then the stresses at the center of a spinning disk are found to be zero (recall that for an isotropic material model, the stresses are maximum at the center). Thus, the stresses go abruptly from a maximum value to a value of zero as the ratio is increased to a value even slightly above one! One of the explanations provided for this extremely anomalous behaviour is the failure of linearized elasticity to satisfy material frame-indifference. However, if this is the true cause, then the anomalous behaviour should also occur with the use of an isotropic material model, where no such anomalies are observed. We show that the real cause of the problem is elsewhere and also show how these anomalies can be resolved. We also discuss how the formulation of linearized elastodynamics in the case of small deformations superposed on a rigid motion can be given in a succinct manner. Finally, we show how the long-standing problem of devising three compatibility relations instead of six can be resolved.

Using Relationships for Matching Textual Domain Models with Existing Code

Relevance:

10.00%

Abstract:

We address the task of mapping a given textual domain model (e.g., an industry-standard reference model) for a given domain (e.g., ERP) to the source code of an independently developed application in the same domain. This has applications in improving the understandability of an existing application, migrating it to a more flexible architecture, or integrating it with other related applications. We use the vector-space model to abstractly represent domain model elements as well as source-code artifacts. The key novelty in our approach is to leverage the relationships between source-code artifacts in a principled way to improve the mapping process. We describe experiments wherein we apply our approach to the task of matching two real, open-source applications to corresponding industry-standard domain models. We demonstrate the overall usefulness of our approach, as well as the role of our propagation techniques in improving the precision and recall of the mapping task.
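The vector-space representation mentioned above can be illustrated with a small sketch that scores a domain-model description against source-code artifacts by cosine similarity of term counts. It covers only plain term matching, not the relationship-based propagation that the abstract names as its key novelty; the tokenizer, the function names and the two example artifacts are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words vector; camelCase identifiers are split into words."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return Counter(re.findall(r"[a-z]+", spaced.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(model_element, artifacts):
    """Name of the artifact whose text is most similar to the model element."""
    query = term_vector(model_element)
    return max(artifacts,
               key=lambda name: cosine_similarity(query, term_vector(artifacts[name])))


# Hypothetical artifacts: identifiers extracted from two source files.
artifacts = {
    "PurchaseOrderService": "class PurchaseOrderService createPurchaseOrder approveOrder supplier",
    "InvoicePrinter": "class InvoicePrinter printInvoice formatTotal customer",
}
match = best_match("Purchase order: a request sent to a supplier", artifacts)
print(match)
```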

Predicting gene ontology annotations of orphan GWAS genes using protein-protein interactions

Relevance:

10.00%

Abstract:

Background: The number of genome-wide association studies (GWAS) has increased rapidly in the past couple of years, resulting in the identification of genes associated with different diseases. The next step in translating these findings into biomedically useful information is to find out the mechanism of the action of these genes. However, GWAS studies often implicate genes whose functions are currently unknown; for example, MYEOV, ANKLE1, TMEM45B and ORAOV1 are found to be associated with breast cancer, but their molecular function is unknown. Results: We carried out Bayesian inference of Gene Ontology (GO) term annotations of genes by employing the directed acyclic graph structure of GO and the network of protein-protein interactions (PPIs). The approach is designed based on the fact that two proteins that interact biophysically would be in physical proximity of each other, would possess complementary molecular function, and play a role in related biological processes. Predicted GO terms were ranked according to their relative association scores and the approach was evaluated quantitatively by plotting the precision versus recall values and F-scores (the harmonic mean of precision and recall) versus varying thresholds. Precisions of ~58% and ~40% for localization and functions respectively of proteins were determined at a threshold of ~30 (top 30 GO terms in the ranked list). Comparison with function prediction based on semantic similarity among nodes in an ontology and incorporation of those similarities in a k nearest neighbor classifier confirmed that our results compared favorably. Conclusions: This approach was applied to predict the cellular component and molecular function GO terms of all human proteins that have interacting partners possessing at least one known GO annotation. The list of predictions is available at http://severus.dbmi.pitt.edu/engo/GOPRED.html.
We present the algorithm, evaluations and the results of the computational predictions, especially for genes identified in GWAS studies to be associated with diseases, which are of translational interest.
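The evaluation described above, precision, recall and F-score computed over the top-k entries of a ranked list, can be sketched as follows. The term labels are placeholders rather than real GO identifiers, and the function name is an assumption.

```python
def scores_at_k(ranked_terms, true_terms, k):
    """Precision, recall and F-score when only the top-k ranked terms are kept."""
    top_k = ranked_terms[:k]
    hits = sum(1 for term in top_k if term in true_terms)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(true_terms) if true_terms else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-score: harmonic mean of precision and recall.
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Placeholder labels (not real GO identifiers), ranked by predicted score.
ranked = ["GO:A", "GO:B", "GO:C", "GO:D", "GO:E"]
truth = {"GO:A", "GO:C", "GO:F"}

for k in (1, 3, 5):
    p, r, f = scores_at_k(ranked, truth, k)
    print(f"k={k}: precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

Sweeping k trades precision against recall, which is what a precision-versus-recall plot over varying thresholds summarises.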

The opportunity for sampling: the ecological context of female mate choice

Relevance:

10.00%

Abstract:

Female mate choice decisions, which influence sexual selection, involve complex interactions between the 2 sexes and the environment. Theoretical models predict that male movement and spacing in the field should influence female sampling tactics, and in turn, females should drive the evolution of male movement and spacing to sample them optimally. Theoretically, simultaneous sampling of males using the best-of-n or comparative Bayes strategy should yield maximum mating benefits to females. We examined the ecological context of female mate sampling based on acoustic signals in the tree cricket Oecanthus henryi to determine whether the conditions for such optimal strategies were met in the field. These strategies involve recall of the quality and location of individual males, which in turn requires male positions to be stable within a night. Calling males rarely moved within a night, potentially enabling female sampling strategies that require recall. To examine the possibility of simultaneous acoustic sampling of males, we estimated male acoustic active spaces using information on male spacing, call transmission, and female hearing threshold. Males were found to be spaced far apart, and active space overlap was rare. We then examined female sampling scenarios by studying female spacing relative to male acoustic active spaces. Only 15% of sampled females could hear multiple males, suggesting that simultaneous mate sampling is rare in the field. Moreover, the relatively large distances between calling males suggest high search costs, which may favor threshold strategies that do not require memory.

De novo inference of protein function from coarse-grained dynamics

Relevance:

10.00%

Abstract:

Inference of molecular function of proteins is the fundamental task in the quest for understanding cellular processes. The task is getting increasingly difficult with thousands of new proteins discovered each day. The difficulty arises primarily due to the lack of high-throughput experimental techniques for assessing protein molecular function, a lacuna that computational approaches are trying hard to fill. The latter, too, face a major bottleneck in the absence of clear evidence based on evolutionary information. Here we propose a de novo approach to annotate protein molecular function through structural dynamics match for a pair of segments from two dissimilar proteins, which may share even <10% sequence identity. To screen these matches, corresponding 1 μs coarse-grained (CG) molecular dynamics trajectories were used to compute normalized root-mean-square-fluctuation graphs and select mobile segments, which were, thereafter, matched for all pairs using unweighted three-dimensional autocorrelation vectors. Our in-house custom-built forcefield (FF), extensively validated against dynamics information obtained from experimental nuclear magnetic resonance data, was specifically used to generate the CG dynamics trajectories. The test for correspondence of dynamics-signature of protein segments and function revealed 87% true positive rate and 93.5% true negative rate, on a dataset of 60 experimentally validated proteins, including moonlighting proteins and those with novel functional motifs. A random test against 315 unique fold/function proteins for a negative test gave >99% true recall. A blind prediction on a novel protein appears consistent with additional evidence retrieved therein. This is the first proof-of-principle of generalized use of structural dynamics for inferring protein molecular function leveraging our custom-made CG FF, useful to all. © 2014 Wiley Periodicals, Inc.

Mining Unit Tests for Discovery and Migration of Math APIs

Relevance:

10.00%

Abstract:

Today's programming languages are supported by powerful third-party APIs. For a given application domain, it is common to have many competing APIs that provide similar functionality. Programmer productivity therefore depends heavily on the programmer's ability to discover suitable APIs both during an initial coding phase, as well as during software maintenance. The aim of this work is to support the discovery and migration of math APIs. Math APIs are at the heart of many application domains ranging from machine learning to scientific computations. Our approach, called MATHFINDER, combines executable specifications of mathematical computations with unit tests (operational specifications) of API methods. Given a math expression, MATHFINDER synthesizes pseudo-code comprised of API methods to compute the expression by mining unit tests of the API methods. We present a sequential version of our unit test mining algorithm and also design a more scalable data-parallel version. We perform extensive evaluation of MATHFINDER (1) for API discovery, where math algorithms are to be implemented from scratch and (2) for API migration, where client programs utilizing a math API are to be migrated to another API. We evaluated the precision and recall of MATHFINDER on a diverse collection of math expressions, culled from algorithms used in a wide range of application areas such as control systems and structural dynamics. In a user study to evaluate the productivity gains obtained by using MATHFINDER for API discovery, the programmers who used MATHFINDER finished their programming tasks twice as fast as their counterparts who used the usual techniques like web and code search, IDE code completion, and manual inspection of library documentation. For the problem of API migration, as a case study, we used MATHFINDER to migrate Weka, a popular machine learning library. 
Overall, our evaluation shows that MATHFINDER is easy to use, provides highly precise results across several math APIs and application domains even with a small number of unit tests per method, and scales to large collections of unit tests.

Identifying functionally important cis-peptide containing segments in proteins and their utility in molecular function annotation

Relevance:

10.00%

Abstract:

Cis-peptide embedded segments are rare in proteins but often highlight their important role in molecular function when they do occur. The high evolutionary conservation of these segments illustrates this observation almost universally, although no attempt has been made to systematically use this information for the purpose of function annotation. In the present study, we demonstrate how geometric clustering and level-specific Gene Ontology molecular-function terms (also known as annotations) can be used in a statistically significant manner to identify cis-embedded segments in a protein linked to its molecular function. The present study identifies novel cis-peptide fragments, which are subsequently used for fragment-based function annotation. Annotation recall benchmarks interpreted using the receiver-operator characteristic plot returned an area-under-curve >0.9, corroborating the utility of the annotation method. In addition, we identified cis-peptide fragments occurring in conjunction with functionally important trans-peptide fragments, providing additional insights into molecular function. We further illustrate the applicability of our method in function annotation where homology-based annotation transfer is not possible. The findings of the present study add to the repertoire of function annotation approaches and also facilitate engineering, design and allied studies around the cis-peptide neighborhood of proteins.
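The area-under-curve figure quoted above comes from a receiver-operator characteristic plot. As a rough illustration of what that number measures, the sketch below computes AUC as the fraction of (positive, negative) pairs in which the positive example receives the higher score. The scores and labels are made up for the example and are unrelated to the study's benchmark.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    Equals the probability that a random positive outranks a random
    negative, with ties counted as half.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up annotation scores; label 1 marks a correct annotation.
scores = [0.9, 0.8, 0.7, 0.4, 0.3]
labels = [1, 1, 0, 1, 0]
auc = roc_auc(scores, labels)
print(f"AUC={auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect one, so a value above 0.9 indicates that correct annotations are ranked ahead of incorrect ones in most pairs.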
