24 results for Biology, Bioinformatics|Computer Science


Relevance: 100.00%

Abstract:

In computational molecular biology, the aim of restriction mapping is to locate the restriction sites of a given enzyme on a DNA molecule. Double digest and partial digest are two well-studied techniques for restriction mapping. While double digest is NP-complete, there is no known polynomial-time algorithm for partial digest. Another disadvantage of both techniques is that the reconstruction can admit multiple solutions. In this paper, we study a simple technique called labeled partial digest for restriction mapping. We give a fast polynomial-time (O(n^2 log n) worst-case) algorithm for finding all n sites of a DNA molecule using this technique. An important advantage of the algorithm is the unique reconstruction of the DNA molecule from the digest. The technique is also robust in handling the errors in fragment lengths that arise in the laboratory: we give a robust O(n^4) worst-case algorithm that provably tolerates an absolute error of O(Δ/n) (where Δ is the minimum inter-site distance) while giving a unique reconstruction. We test our theoretical results by simulating the performance of the algorithm on a real DNA molecule. Motivated by the similarity to the labeled partial digest problem, we also address a related problem of interest, the de novo peptide sequencing problem (ACM-SIAM Symposium on Discrete Algorithms (SODA), 2000, pp. 389-398), which arises in the reconstruction of the peptide sequence of a protein molecule. We give a simple and efficient algorithm for the problem without using dynamic programming; it runs in O(k log k) time, where k is the number of ions, and improves on the algorithm of Chen et al.
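
To make the setting concrete, here is a minimal, idealized sketch (with hypothetical names) of why labeled data pins down a unique map: the labeled fragment lengths are exactly the distances from the labeled left end to each site, and the full digest serves as a consistency check. This assumes exact fragment lengths and a complete digest; it illustrates the technique itself, not the paper's error-tolerant algorithms.

```python
from collections import Counter

def reconstruct_sites(labeled, all_fragments):
    """Illustrative labeled-partial-digest reconstruction (idealized).

    labeled:        lengths of fragments containing the labeled left end,
                    i.e. distances from position 0 to each site and to the
                    right end of the molecule.
    all_fragments:  lengths of every fragment in the partial digest
                    (one per pair of cut positions, including both ends).

    Returns the sorted site positions, or None if the digest is
    inconsistent.  Reconstruction is O(n log n); the verification over
    all pairs costs O(n^2).
    """
    positions = sorted(set(labeled))          # distances from the left end
    sites = [0] + positions                   # include the labeled end

    # Verification: pairwise differences must reproduce the full digest.
    expected = Counter(sites[j] - sites[i]
                       for i in range(len(sites))
                       for j in range(i + 1, len(sites)))
    if expected != Counter(all_fragments):
        return None
    return sites

# Toy molecule with sites at 0, 3, 8, 10 (length 10).
sites = [0, 3, 8, 10]
frags = sorted(b - a for i, a in enumerate(sites) for b in sites[i + 1:])
labeled = [s for s in sites if s > 0]         # fragments touching position 0
print(reconstruct_sites(labeled, frags))      # [0, 3, 8, 10]
```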

Relevance: 100.00%

Abstract:

Background: Temporal analysis of gene expression data has been limited to identifying genes whose expression varies with time and/or correlations between genes that have similar temporal profiles. Often, these methods do not consider the underlying network constraints that connect the genes. It is becoming increasingly evident that interactions change substantially with time. Thus far, there is no systematic method to relate the temporal changes in gene expression to the dynamics of the interactions between genes. Information on interaction dynamics would open up possibilities for discovering new mechanisms of regulation, providing valuable insight for identifying time-sensitive interactions and permitting studies of the effect of a genetic perturbation.

Results: We present NETGEM, a tractable model rooted in Markov dynamics, for analyzing the dynamics of the interactions between proteins based on the dynamics of the expression changes of the genes that encode them. The model treats the interaction strengths as random variables that are modulated by suitable priors. This approach is necessitated by the extremely small sample size of the datasets relative to the number of interactions. The model is amenable to a linear-time algorithm for efficient inference. Using temporal gene expression data, NETGEM successfully identified (i) temporal interactions and their strengths, (ii) functional categories of the actively interacting partners, and (iii) the dynamics of interactions in perturbed networks.

Conclusions: NETGEM represents an optimal trade-off between model complexity and data requirements. It was able to deduce actively interacting genes and functional categories from temporal gene expression data, and it permits inference by incorporating the information available in perturbed networks. Given that the inputs to NETGEM are only the network and the temporal variation of the nodes, the algorithm promises to have widespread applications beyond biological systems. The source code for NETGEM is available at https://github.com/vjethava/NETGEM.
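
NETGEM's actual model and inference are specified in the paper; purely as a hypothetical illustration of Markov-style edge dynamics, the sketch below decodes a discrete interaction-strength trajectory for a single edge from the temporal expression of its two endpoint genes. The state set, the Gaussian co-expression emission, and the persistence prior `stay` are all assumptions made for this toy; the Viterbi decoding is linear in the number of time points, echoing the linear-time inference claim.

```python
import numpy as np

def edge_strength_path(x_u, x_v, states=(-1.0, 0.0, 1.0), stay=0.9):
    """Viterbi decoding of a per-edge interaction-strength trajectory.

    x_u, x_v : 1-D arrays of (standardized) expression values over time for
               the two genes joined by the edge.
    states   : discrete interaction strengths (a modeling assumption here).
    stay     : prior probability of keeping the same state between steps,
               standing in for a Markov persistence prior.
    Runs in O(T * |states|^2), i.e. linear in the number of time points.
    """
    co = x_u * x_v                          # co-expression proxy per time point
    S, T = len(states), len(co)
    # Gaussian emission: co-expression should track the current strength.
    logp = -0.5 * (co[None, :] - np.array(states)[:, None]) ** 2
    trans = np.full((S, S), np.log((1 - stay) / (S - 1)))
    np.fill_diagonal(trans, np.log(stay))

    score = logp[:, 0].copy()
    back = np.zeros((S, T), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + trans       # cand[i, j]: come from i, go to j
        back[:, t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + logp[:, t]

    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[path[-1], t]))
    return [states[s] for s in reversed(path)]

rng = np.random.default_rng(0)
t = np.arange(20)
u = np.where(t < 10, 1.0, -1.0) + 0.3 * rng.standard_normal(20)
v = np.ones(20) + 0.3 * rng.standard_normal(20)
print(edge_strength_path(u, v))  # strength flips sign around t = 10
```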

Relevance: 100.00%

Abstract:

A computer-aided procedure is described for analyzing the reliability of complicated networks. The procedure breaks a network down into small subnetworks whose reliability can be calculated more readily. The subnetworks searched for are those with only two nodes; this allows the original network to be simplified considerably.
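
The flavor of this kind of simplification can be illustrated with classical series-parallel reduction, where two-terminal subnetworks are repeatedly collapsed into single equivalent links. The sketch below is a generic illustration under that assumption, not the paper's specific procedure, and it only handles networks that are fully series-parallel reducible.

```python
def st_reliability(edges, s, t):
    """Compute s-t reliability by repeated series/parallel reduction.

    edges: list of (u, v, p) links, p = probability the link works.
    Keeps collapsing two-terminal subnetworks into single links until
    only a link between the terminals s and t remains.
    """
    edges = list(edges)
    while True:
        # Parallel reduction: merge duplicate (u, v) links.
        seen, changed = {}, False
        for u, v, p in edges:
            key = frozenset((u, v))
            if key in seen:
                q = seen[key][2]
                seen[key] = (u, v, 1 - (1 - p) * (1 - q))
                changed = True
            else:
                seen[key] = (u, v, p)
        edges = list(seen.values())

        # Series reduction: splice out any degree-2 node other than s, t.
        degree = {}
        for u, v, _ in edges:
            degree[u] = degree.get(u, 0) + 1
            degree[v] = degree.get(v, 0) + 1
        node = next((n for n, d in degree.items()
                     if d == 2 and n not in (s, t)), None)
        if node is not None:
            inc = [e for e in edges if node in (e[0], e[1])]
            (a, b, p1), (c, d, p2) = inc
            ends = [x for x in (a, b, c, d) if x != node]
            edges = [e for e in edges if e not in inc]
            edges.append((ends[0], ends[1], p1 * p2))
            changed = True

        if not changed:
            break

    if len(edges) == 1:
        return edges[0][2]
    raise ValueError("network is not fully series-parallel reducible")

# Example: two parallel two-link paths from A to D.
links = [("A", "B", 0.9), ("B", "D", 0.9), ("A", "C", 0.8), ("C", "D", 0.8)]
print(st_reliability(links, "A", "D"))  # 1 - (1 - 0.81)*(1 - 0.64) = 0.9316
```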

Relevance: 100.00%

Abstract:

This paper presents a novel algorithm for compression of single-lead electrocardiogram (ECG) signals. The method is based on pole-zero modelling of the Discrete Cosine Transform (DCT) of the signal. An extension is proposed to the well-known Steiglitz-McBride algorithm to model the higher-frequency components of the input signal more accurately. This is achieved by weighting the error function minimized by the algorithm to estimate the model parameters. The data compression achieved by the parametric model is further enhanced by Differential Pulse Code Modulation (DPCM) of the model parameters. The method accomplishes a compression ratio in the range of 1:20 to 1:40, which far exceeds those achieved by most current methods.
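
The weighted Steiglitz-McBride fit itself is beyond a short sketch (and has no standard SciPy implementation), so the toy below illustrates only the surrounding pipeline: a DCT of the signal, coefficient truncation standing in for the pole-zero parametric model, and DPCM with uniform quantization of the retained parameters. All names and parameter choices are assumptions for illustration.

```python
import numpy as np
from scipy.fft import dct, idct

def compress(ecg, keep=64, step=0.05):
    """Toy DCT + DPCM compressor standing in for the paper's pipeline.

    The pole-zero model of the DCT spectrum is replaced here by simple
    coefficient truncation; DPCM of the retained parameters mirrors the
    paper's final stage.
    """
    coef = dct(ecg, norm="ortho")[:keep]            # low-order DCT "parameters"
    diffs = np.diff(coef, prepend=0.0)              # DPCM: encode differences
    return np.round(diffs / step).astype(np.int32)  # uniform quantization

def decompress(codes, n, step=0.05):
    coef = np.cumsum(codes.astype(float) * step)    # undo quantization + DPCM
    full = np.zeros(n)
    full[:len(coef)] = coef
    return idct(full, norm="ortho")

# Synthetic "ECG": periodic beats plus baseline wander.
t = np.arange(1024)
ecg = np.exp(-((t % 128 - 64) ** 2) / 200.0) + 0.1 * np.sin(2 * np.pi * t / 512)
codes = compress(ecg)
rec = decompress(codes, len(ecg))
print("compression ratio ~ %.1f:1" % (len(ecg) / len(codes)))
print("PRD = %.2f%%" % (100 * np.linalg.norm(ecg - rec) / np.linalg.norm(ecg)))
```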

Relevance: 100.00%

Abstract:

In many real world prediction problems the output is a structured object like a sequence or a tree or a graph. Such problems range from natural language processing to computational biology or computer vision and have been tackled using algorithms referred to as structured output learning algorithms. We consider the problem of structured classification. In the last few years, large margin classifiers like support vector machines (SVMs) have shown much promise for structured output learning. The related optimization problem is a convex quadratic program (QP) with a large number of constraints, which makes the problem intractable for large data sets. This paper proposes a fast sequential dual method (SDM) for structural SVMs. The method makes repeated passes over the training set and optimizes the dual variables associated with one example at a time. The use of additional heuristics makes the proposed method more efficient. We present an extensive empirical evaluation of the proposed method on several sequence learning problems. Our experiments on large data sets demonstrate that the proposed method is an order of magnitude faster than state-of-the-art methods like the cutting-plane method and the stochastic gradient descent (SGD) method. Further, SDM reaches steady-state generalization performance faster than the SGD method. The proposed SDM is thus a useful alternative for large scale structured output learning.
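
The sequential dual idea is easiest to see in the simpler binary linear SVM, where sweeping over examples and optimizing one dual variable at a time (while maintaining the weight vector incrementally) is the classic dual coordinate descent scheme. The sketch below shows that simpler analogue, not the paper's structural-SVM method, whose examples each carry one dual variable per candidate output.

```python
import numpy as np

def dcd_linear_svm(X, y, C=1.0, epochs=10):
    """Dual coordinate descent for a binary linear SVM (hinge loss).

    Sequential dual sweep: visit examples one at a time, solve the
    one-variable dual subproblem for alpha_i in closed form, and keep
    the primal vector w = sum_i alpha_i * y_i * x_i up to date.
    """
    n, d = X.shape
    alpha = np.zeros(n)
    w = np.zeros(d)
    qii = (X ** 2).sum(axis=1) + 1e-12      # diagonal of the Gram matrix
    for _ in range(epochs):
        for i in np.random.permutation(n):  # one pass = one sequential sweep
            g = y[i] * (X[i] @ w) - 1       # gradient of the dual at alpha_i
            new = min(max(alpha[i] - g / qii[i], 0.0), C)
            w += (new - alpha[i]) * y[i] * X[i]
            alpha[i] = new
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = np.sign(X[:, 0] + 0.1 * rng.standard_normal(200))
w = dcd_linear_svm(X, y)
print("train accuracy:", np.mean(np.sign(X @ w) == y))
```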

Relevance: 100.00%

Abstract:

Ranking problems have become increasingly important in machine learning and data mining in recent years, with applications ranging from information retrieval and recommender systems to computational biology and drug discovery. In this paper, we describe a new ranking algorithm that directly maximizes the number of relevant objects retrieved at the absolute top of the list. The algorithm is a support vector style algorithm, but due to the different objective, it no longer leads to a quadratic programming problem. Instead, the dual optimization problem involves l1,∞ constraints; we solve this dual problem using the recent l1,∞ projection method of Quattoni et al. (2009). Our algorithm can be viewed as an l∞-norm extreme of the lp-norm based algorithm of Rudin (2009) (albeit in a support vector setting rather than a boosting setting); thus we refer to the algorithm as the ‘Infinite Push’. Experiments on real-world data sets confirm the algorithm’s focus on accuracy at the absolute top of the list.
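
The paper's solver works in the dual via l1,∞ projections; purely as a rough primal-side illustration of the objective's shape, the sketch below runs subgradient descent on an l∞-style "push" loss in which the single worst (highest-scoring) negative dominates, so the optimizer concentrates on the very top of the ranked list. All names and the regularization setup are our assumptions, not the paper's method.

```python
import numpy as np

def infinite_push_loss(w, X_pos, X_neg):
    """l_infinity 'push' objective: the hinge mass against the worst
    negative dominates, pushing relevant items above it at the top."""
    s_pos, s_neg = X_pos @ w, X_neg @ w
    margins = 1.0 - (s_pos[:, None] - s_neg[None, :])   # pos i vs neg j
    per_neg = np.maximum(margins, 0.0).sum(axis=0)      # hinge mass per negative
    return per_neg.max(), margins, int(per_neg.argmax())

def train(X_pos, X_neg, lr=0.01, lam=0.01, steps=500):
    """Subgradient descent on the max-over-negatives push objective."""
    w = np.zeros(X_pos.shape[1])
    for _ in range(steps):
        _, margins, j = infinite_push_loss(w, X_pos, X_neg)
        active = margins[:, j] > 0          # positives still below negative j
        grad = lam * w
        if active.any():
            grad += X_neg[j] * active.sum() - X_pos[active].sum(axis=0)
        w -= lr * grad
    return w

rng = np.random.default_rng(1)
X_pos = rng.standard_normal((50, 4)) + [1.0, 0, 0, 0]
X_neg = rng.standard_normal((200, 4))
w = train(X_pos, X_neg)
print("negatives above best positive:",
      int((X_neg @ w >= (X_pos @ w).max()).sum()))
```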

Relevance: 100.00%

Abstract:

Background: Aligning similar molecular structures is an important step in the process of bio-molecular structure and function analysis. Molecular surfaces are simple representations of molecular structure that are easily constructed from various forms of molecular data such as 3D atomic coordinates (PDB) and Electron Microscopy (EM) data.

Methods: We present a Multi-Scale Morse-Smale Molecular-Surface Alignment tool, MS3ALIGN, which aligns molecular surfaces based on significant protrusions on the molecular surface. The input is a pair of molecular surfaces represented as triangle meshes. A key advantage of MS3ALIGN is computational efficiency, achieved because it processes only a few carefully chosen protrusions on the molecular surface. Furthermore, the alignments are partial in nature and therefore allow inexact surfaces to be aligned.

Results: The method is evaluated in four settings. First, we establish performance using known alignments with varying overlap and noise values. Second, we compare the method with SurfComp, an existing surface alignment method. We show that we are able to determine alignments reported by SurfComp, as well as report relevant alignments not found by SurfComp. Third, we validate the ability of MS3ALIGN to determine alignments in the case of structurally dissimilar binding sites. Fourth, we demonstrate the ability of MS3ALIGN to align iso-surfaces derived from cryo-electron microscopy scans.

Conclusions: We have presented an algorithm that aligns molecular surfaces based on the topology of surface curvature. A webserver and standalone software implementation of the algorithm are available at http://vgl.serc.iisc.ernet.in/ms3align.
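
MS3ALIGN's contribution is selecting and matching a few significant protrusions via the topology of surface curvature; once correspondences between protrusion points are fixed, the rigid superposition itself can be computed with the standard Kabsch algorithm. The sketch below shows only that final superposition step, under the assumption that the matched points are already given.

```python
import numpy as np

def kabsch(P, Q):
    """Optimal rigid alignment (rotation + translation) of matched points.

    P, Q: (n, 3) arrays of corresponding 3-D points, e.g. protrusion tips
    picked from two molecular surfaces.  Returns R, t with Q ~ P @ R.T + t.
    """
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)                # cross-covariance of centered sets
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                       # proper rotation (det = +1)
    t = cq - R @ cp
    return R, t

rng = np.random.default_rng(2)
P = rng.standard_normal((6, 3))              # "protrusions" on surface 1
theta = 0.7
R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                   [np.sin(theta),  np.cos(theta), 0],
                   [0, 0, 1]])
Q = P @ R_true.T + np.array([1.0, -2.0, 0.5])
R, t = kabsch(P, Q)
print("residual:", np.linalg.norm(P @ R.T + t - Q))  # ~0
```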

Relevance: 100.00%

Abstract:

Computer Assisted Assessment (CAA) has existed for several years now. While some forms of CAA do not require sophisticated text understanding (e.g., multiple choice questions), there are also student answers that consist of free text and require analysis of the text in the answer. Research on the latter to date has concentrated on two main sub-tasks: (i) grading of essays, done mainly by checking the style, correctness of grammar, and coherence of the essay, and (ii) assessment of short free-text answers. In this paper, we present a structured view of relevant research in automated assessment techniques for short free-text answers. We review papers spanning the last 15 years of research, with emphasis on recent papers. Our main objectives are twofold. First, we present the survey in a structured way by segregating information on datasets, problem formulations, techniques, and evaluation measures. Second, we discuss some potential future directions in this domain, which we hope will be helpful for researchers.
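
As a concrete (and deliberately naive) example of the short-answer assessment task the survey covers, here is a lexical-overlap baseline: score each student answer by TF-IDF cosine similarity to a reference answer. The function and example data are hypothetical; systems surveyed in the paper add paraphrase handling, entailment, and concept matching on top of (or instead of) such overlap measures.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def grade(reference, answers, full_marks=5.0):
    """Naive similarity baseline for short-answer grading: scale the
    TF-IDF cosine similarity to the reference answer to full_marks."""
    vec = TfidfVectorizer().fit([reference] + answers)
    ref = vec.transform([reference])
    sims = cosine_similarity(vec.transform(answers), ref).ravel()
    return [round(full_marks * s, 1) for s in sims]

reference = "Photosynthesis converts light energy into chemical energy."
answers = ["Plants turn light into chemical energy via photosynthesis.",
           "Mitochondria are the powerhouse of the cell."]
print(grade(reference, answers))   # higher score for the on-topic answer
```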