964 resultados para alignment-free methods


Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this study, we carried out a comparative analysis between two classical methodologies to prospect residue contacts in proteins: the traditional cutoff dependent (CD) approach and cutoff free Delaunay tessellation (DT). In addition, two alternative coarse-grained forms to represent residues were tested: using alpha carbon (CA) and side chain geometric center (GC). A database was built, comprising three top classes: all alpha, all beta, and alpha/beta. We found that the cutoff value? at about 7.0 A emerges as an important distance parameter.? Up to 7.0 A, CD and DT properties are unified, which implies that at this distance all contacts are complete and legitimate (not occluded). We also have shown that DT has an intrinsic missing edges problem when mapping the first layer of neighbors. In proteins, it may produce systematic errors affecting mainly the contact network in beta chains with CA. The almost-Delaunay (AD) approach has been proposed to solve this DT problem. We found that even AD may not be an advantageous solution. As a consequence, in the strict range up ? to 7.0 A, the CD approach revealed to be a simpler, more complete, and reliable technique than DT or AD. Finally, we have shown that coarse-grained residue representations may introduce bias in the analysis of neighbors in cutoffs up to ? 6.8 A, with CA favoring alpha proteins and GC favoring beta proteins. This provides an additional argument pointing to ? the value of 7.0 A as an important lower bound cutoff to be used in contact analysis of proteins.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Proteins are biochemical entities consisting of one or more blocks typically folded in a 3D pattern. Each block (a polypeptide) is a single linear sequence of amino acids that are biochemically bonded together. The amino acid sequence in a protein is defined by the sequence of a gene or several genes encoded in the DNA-based genetic code. This genetic code typically uses twenty amino acids, but in certain organisms the genetic code can also include two other amino acids. After linking the amino acids during protein synthesis, each amino acid becomes a residue in a protein, which is then chemically modified, ultimately changing and defining the protein function. In this study, the authors analyze the amino acid sequence using alignment-free methods, aiming to identify structural patterns in sets of proteins and in the proteome, without any other previous assumptions. The paper starts by analyzing amino acid sequence data by means of histograms using fixed length amino acid words (tuples). After creating the initial relative frequency histograms, they are transformed and processed in order to generate quantitative results for information extraction and graphical visualization. Selected samples from two reference datasets are used, and results reveal that the proposed method is able to generate relevant outputs in accordance with current scientific knowledge in domains like protein sequence/proteome analysis.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We have developed an alignment-free method that calculates phylogenetic distances using a maximum-likelihood approach for a model of sequence change on patterns that are discovered in unaligned sequences. To evaluate the phylogenetic accuracy of our method, and to conduct a comprehensive comparison of existing alignment-free methods (freely available as Python package decaf+py at http://www.bioinformatics.org.au), we have created a data set of reference trees covering a wide range of phylogenetic distances. Amino acid sequences were evolved along the trees and input to the tested methods; from their calculated distances we infered trees whose topologies we compared to the reference trees. We find our pattern-based method statistically superior to all other tested alignment-free methods. We also demonstrate the general advantage of alignment-free methods over an approach based on automated alignments when sequences violate the assumption of collinearity. Similarly, we compare methods on empirical data from an existing alignment benchmark set that we used to derive reference distances and trees. Our pattern-based approach yields distances that show a linear relationship to reference distances over a substantially longer range than other alignment-free methods. The pattern-based approach outperforms alignment-free methods and its phylogenetic accuracy is statistically indistinguishable from alignment-based distances.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In real optimization problems, usually the analytical expression of the objective function is not known, nor its derivatives, or they are complex. In these cases it becomes essential to use optimization methods where the calculation of the derivatives, or the verification of their existence, is not necessary: the Direct Search Methods or Derivative-free Methods are one solution. When the problem has constraints, penalty functions are often used. Unfortunately the choice of the penalty parameters is, frequently, very difficult, because most strategies for choosing it are heuristics strategies. As an alternative to penalty function appeared the filter methods. A filter algorithm introduces a function that aggregates the constrained violations and constructs a biobjective problem. In this problem the step is accepted if it either reduces the objective function or the constrained violation. This implies that the filter methods are less parameter dependent than a penalty function. In this work, we present a new direct search method, based on simplex methods, for general constrained optimization that combines the features of the simplex method and filter methods. This method does not compute or approximate any derivatives, penalty constants or Lagrange multipliers. The basic idea of simplex filter algorithm is to construct an initial simplex and use the simplex to drive the search. We illustrate the behavior of our algorithm through some examples. The proposed methods were implemented in Java.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The filter method is a technique for solving nonlinear programming problems. The filter algorithm has two phases in each iteration. The first one reduces a measure of infeasibility, while in the second the objective function value is reduced. In real optimization problems, usually the objective function is not differentiable or its derivatives are unknown. In these cases it becomes essential to use optimization methods where the calculation of the derivatives or the verification of their existence is not necessary: direct search methods or derivative-free methods are examples of such techniques. In this work we present a new direct search method, based on simplex methods, for general constrained optimization that combines the features of simplex and filter methods. This method neither computes nor approximates derivatives, penalty constants or Lagrange multipliers.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Dissertação apresentada para obtenção do Grau de Doutor em Bioquímica pela Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia.A presente dissertação foi preparada no âmbito do convénio bilateral existente entre a Universidade Nova de Lisboa e a Universidade de Vigo.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Els brassinoesteroides són productes naturals que actuen com a potents reguladors del creixement vegetal. Presenten aplicacions prometedores en l’agricultura degut a que, aplicats exògenament, augmenten la qualitat i la quantitat de les collites. Ara bé, el seu ús s’ha vist restringit degut a la seva costosa obtenció. Aquest fet ha motivat la recerca de nous compostos actius més assequibles. En aquest projecte es planteja el disseny i obtenció de nous anàlegs seguint diferents estratègies que impliquen tant l’ús de mètodes de modelització molecular com de síntesi orgànica. La primera d’aquestes estratègies consisteix en buscar compostos actius en bases de dades de compostos comercials a través de processos de Virtual Screening desenvolupats amb mètodes computacionals basats en Camps d’Interacció Molecular. Així, es van establir i interpretar models de Relacions Quantitatives Estructura-Activitat (QSAR) emprant descriptors independents de l’alineament (GRIND) i, amb col•laboració amb la Universitat de Perugia, aquest criteri de cerca es va ampliar amb l’aplicació de descriptors FLAP de nova generació. Una altra estratègia es va basar en intentar substituir l’esquelet esteroide dels brassinoesteroides per una estructura equivalent, fixant com a cadena lateral el grup (R)-hexahidromandelil. S’han aplicat dos criteris: mètodes computacionals basats en models QSAR establerts amb descriptors GRIND i també en la metodologia SHOP (scaffold hopping), i, per altra banda, anàlegs proposats racionalment a partir d’un estudi efectuat sobre disruptors endocrins no esteroïdals. Sobre les estructures trobades s’hi va unir la cadena lateral comercial esmentada per via sintètica, en la qual s’ha hagut de fer un èmfasi especial en grups protectors. En total, 49 estructures es proposen per a ser obtingudes sintèticament. També s’ha treballat en l’obtenció un agonista derivat de l’hipotètic antagonista KM-01. Totes les molècules candidates, ja siguin comercials o obtingudes sintèticament, estant sent avaluades en el test d’inclinació de la làmina d’arròs (RLIT).

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Statistical methods of inference typically require the likelihood function to be computable in a reasonable amount of time. The class of “likelihood-free” methods termed Approximate Bayesian Computation (ABC) is able to eliminate this requirement, replacing the evaluation of the likelihood with simulation from it. Likelihood-free methods have gained in efficiency and popularity in the past few years, following their integration with Markov Chain Monte Carlo (MCMC) and Sequential Monte Carlo (SMC) in order to better explore the parameter space. They have been applied primarily to estimating the parameters of a given model, but can also be used to compare models. Here we present novel likelihood-free approaches to model comparison, based upon the independent estimation of the evidence of each model under study. Key advantages of these approaches over previous techniques are that they allow the exploitation of MCMC or SMC algorithms for exploring the parameter space, and that they do not require a sampler able to mix between models. We validate the proposed methods using a simple exponential family problem before providing a realistic problem from human population genetics: the comparison of different demographic models based upon genetic data from the Y chromosome.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Digital signal processing (DSP) techniques for biological sequence analysis continue to grow in popularity due to the inherent digital nature of these sequences. DSP methods have demonstrated early success for detection of coding regions in a gene. Recently, these methods are being used to establish DNA gene similarity. We present the inter-coefficient difference (ICD) transformation, a novel extension of the discrete Fourier transformation, which can be applied to any DNA sequence. The ICD method is a mathematical, alignment-free DNA comparison method that generates a genetic signature for any DNA sequence that is used to generate relative measures of similarity among DNA sequences. We demonstrate our method on a set of insulin genes obtained from an evolutionarily wide range of species, and on a set of avian influenza viral sequences, which represents a set of highly similar sequences. We compare phylogenetic trees generated using our technique against trees generated using traditional alignment techniques for similarity and demonstrate that the ICD method produces a highly accurate tree without requiring an alignment prior to establishing sequence similarity.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Si una red inalámbrica de sensores se implementa en un entorno hostil, las limitaciones intrínsecas a los nodos conllevan muchos problemas de seguridad. En este artículo se aborda un ataque particular a los protocolos de localización y descubrimiento de vecinos, llevada a cabo por dos nodos que actúan en connivencia y establecen un "agujero de gusano" para tratar de engañar a un nodo aislado, haciéndole creer que se encuentra en la vecindad de un conjunto de nodos locales. Para contrarrestar este tipo de amenazas, se presenta un marco de actuación genéricamente denominado "detection of wormhole attacks using range-free methods" (DWARF) dentro del cual derivamos dos estrategias para de detección de agujeros de gusano: el primer enfoque (DWARFLoc) realiza conjuntamente la localización y la detección de ataques, mientras que el otro (DWARFTest) valida la posición estimada por el nodo una vez finalizado el protocolo de localización. Las simulaciones muestran que ambas estrategias son eficaces en la detección de ataques tipo "agujero de gusano", y sus prestaciones se comparan con las de un test convencional basado en la razón de verosimilitudes.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

G-protein coupled receptors (GPCRs) are a superfamily of membrane integral proteins responsible for a large number of physiological functions. Approximately 50% of marketed drugs are targeted toward a GPCR. Despite showing a high degree of structural homology, there is a large variance in sequence within the GPCR superfamily which has lead to difficulties in identifying and classifying potential new GPCR proteins. Here the various computational techniques that can be used to characterize a novel GPCR protein are discussed, including both alignment-based and alignment-free approaches. In addition, the application of homology modeling to building the three-dimensional structures of GPCRs is described.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

MOTIVATION: G protein-coupled receptors (GPCRs) play an important role in many physiological systems by transducing an extracellular signal into an intracellular response. Over 50% of all marketed drugs are targeted towards a GPCR. There is considerable interest in developing an algorithm that could effectively predict the function of a GPCR from its primary sequence. Such an algorithm is useful not only in identifying novel GPCR sequences but in characterizing the interrelationships between known GPCRs. RESULTS: An alignment-free approach to GPCR classification has been developed using techniques drawn from data mining and proteochemometrics. A dataset of over 8000 sequences was constructed to train the algorithm. This represents one of the largest GPCR datasets currently available. A predictive algorithm was developed based upon the simplest reasonable numerical representation of the protein's physicochemical properties. A selective top-down approach was developed, which used a hierarchical classifier to assign sequences to subdivisions within the GPCR hierarchy. The predictive performance of the algorithm was assessed against several standard data mining classifiers and further validated against Support Vector Machine-based GPCR prediction servers. The selective top-down approach achieves significantly higher accuracy than standard data mining methods in almost all cases.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The G-protein coupled receptor (GPCR) superfamily fulfils various metabolic functions and interacts with a diverse range of ligands. There is a lack of sequence similarity between the six classes that comprise the GPCR superfamily. Moreover, most novel GPCRs found have low sequence similarity to other family members which makes it difficult to infer properties from related receptors. Many different approaches have been taken towards developing efficient and accurate methods for GPCR classification, ranging from motif-based systems to machine learning as well as a variety of alignment-free techniques based on the physiochemical properties of their amino acid sequences. This review describes the inherent difficulties in developing a GPCR classification algorithm and includes techniques previously employed in this area.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Background - Vaccine development in the post-genomic era often begins with the in silico screening of genome information, with the most probable protective antigens being predicted rather than requiring causative microorganisms to be grown. Despite the obvious advantages of this approach – such as speed and cost efficiency – its success remains dependent on the accuracy of antigen prediction. Most approaches use sequence alignment to identify antigens. This is problematic for several reasons. Some proteins lack obvious sequence similarity, although they may share similar structures and biological properties. The antigenicity of a sequence may be encoded in a subtle and recondite manner not amendable to direct identification by sequence alignment. The discovery of truly novel antigens will be frustrated by their lack of similarity to antigens of known provenance. To overcome the limitations of alignment-dependent methods, we propose a new alignment-free approach for antigen prediction, which is based on auto cross covariance (ACC) transformation of protein sequences into uniform vectors of principal amino acid properties. Results - Bacterial, viral and tumour protein datasets were used to derive models for prediction of whole protein antigenicity. Every set consisted of 100 known antigens and 100 non-antigens. The derived models were tested by internal leave-one-out cross-validation and external validation using test sets. An additional five training sets for each class of antigens were used to test the stability of the discrimination between antigens and non-antigens. The models performed well in both validations showing prediction accuracy of 70% to 89%. The models were implemented in a server, which we call VaxiJen. Conclusion - VaxiJen is the first server for alignment-independent prediction of protective antigens. It was developed to allow antigen classification solely based on the physicochemical properties of proteins without recourse to sequence alignment. The server can be used on its own or in combination with alignment-based prediction methods.