12 results for Multiple sequence alignment
in Aston University Research Archive
Abstract:
Motivation: Within bioinformatics, the textual alignment of amino acid sequences has long dominated the determination of similarity between proteins, with all that implies for shared structure, function, and evolutionary descent. Despite the relative success of modern-day sequence alignment algorithms, so-called alignment-free approaches offer a complementary means of determining and expressing similarity, with potential benefits in certain key applications, such as regression analysis of protein structure-function studies, where alignment-based similarity has performed poorly. Results: Here, we offer a fresh, statistical physics-based perspective focusing on the question of alignment-free comparison, in the process adapting results from “first passage probability distribution” to summarize statistics of ensemble averaged amino acid propensity values. In this paper, we introduce and elaborate this approach.
Abstract:
Subunit vaccine discovery is an accepted clinical priority. The empirical approach is time- and labor-consuming and can often end in failure. Rational information-driven approaches can overcome these limitations in a fast and efficient manner. However, informatics solutions require reliable algorithms for antigen identification. All known algorithms use sequence similarity to identify antigens. However, antigenicity may be encoded subtly in a sequence and may not be directly identifiable by sequence alignment. We propose a new alignment-independent method for antigen recognition based on the principal chemical properties of protein amino acid sequences. The method is tested by cross-validation on a training set of bacterial antigens and external validation on a test set of known antigens. The prediction accuracy is 83% for the cross-validation and 80% for the external test set. Our approach is accurate and robust, and provides a potent tool for the in silico discovery of medically relevant subunit vaccines.
Abstract:
Background - Vaccine development in the post-genomic era often begins with the in silico screening of genome information, with the most probable protective antigens being predicted rather than requiring causative microorganisms to be grown. Despite the obvious advantages of this approach – such as speed and cost efficiency – its success remains dependent on the accuracy of antigen prediction. Most approaches use sequence alignment to identify antigens. This is problematic for several reasons. Some proteins lack obvious sequence similarity, although they may share similar structures and biological properties. The antigenicity of a sequence may be encoded in a subtle and recondite manner not amenable to direct identification by sequence alignment. The discovery of truly novel antigens will be frustrated by their lack of similarity to antigens of known provenance. To overcome the limitations of alignment-dependent methods, we propose a new alignment-free approach for antigen prediction, which is based on auto cross covariance (ACC) transformation of protein sequences into uniform vectors of principal amino acid properties. Results - Bacterial, viral and tumour protein datasets were used to derive models for prediction of whole protein antigenicity. Every set consisted of 100 known antigens and 100 non-antigens. The derived models were tested by internal leave-one-out cross-validation and external validation using test sets. An additional five training sets for each class of antigens were used to test the stability of the discrimination between antigens and non-antigens. The models performed well in both validations showing prediction accuracy of 70% to 89%. The models were implemented in a server, which we call VaxiJen. Conclusion - VaxiJen is the first server for alignment-independent prediction of protective antigens. It was developed to allow antigen classification solely based on the physicochemical properties of proteins without recourse to sequence alignment. The server can be used on its own or in combination with alignment-based prediction methods.
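The ACC transformation mentioned in the abstract above turns a sequence of any length into a vector of fixed length, which is what allows alignment-free comparison. The following is a minimal sketch of one common formulation (the mean lagged product of property values), not the VaxiJen implementation. The property table `TOY_PROPS` holds made-up numbers for illustration only, and the function name `acc_transform` and the choice of lag range are assumptions.

```python
from itertools import product

# Hypothetical toy property table: two invented numeric scales per residue.
# These are NOT published amino acid descriptors; they only illustrate the shape.
TOY_PROPS = {
    "A": (0.1, -1.0),
    "G": (2.0, -4.0),
    "K": (2.3, 1.0),
    "L": (-4.0, -1.0),
}


def acc_transform(seq, props, max_lag):
    """Return a fixed-length auto/cross covariance vector for a sequence.

    For each lag and each ordered property pair (j, k), the feature is the
    mean over positions i of props[seq[i]][j] * props[seq[i + lag]][k].
    Pairs with j == k are auto covariance terms; the rest are cross terms.
    """
    n = len(seq)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    n_props = len(next(iter(props.values())))
    vectors = [props[residue] for residue in seq]
    features = []
    for lag in range(1, max_lag + 1):
        for j, k in product(range(n_props), repeat=2):
            total = sum(
                vectors[i][j] * vectors[i + lag][k] for i in range(n - lag)
            )
            features.append(total / (n - lag))
    return features
```

Under these assumptions the output length is `max_lag * n_props ** 2` regardless of sequence length, so `acc_transform("AGKL", TOY_PROPS, 2)` and `acc_transform("AGKLAGK", TOY_PROPS, 2)` both return eight values that could be passed to a standard classifier.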
Abstract:
The rapidly increasing demand for cellular telephony is placing greater demand on the limited bandwidth resources available. This research is concerned with techniques which enhance the capacity of a Direct-Sequence Code-Division-Multiple-Access (DS-CDMA) mobile telephone network. The capacities of both Private Mobile Radio (PMR) and cellular networks are derived, and the many techniques which are currently available are reviewed. Areas which may be further investigated are identified. One technique which is developed is the sectorisation of a cell into toroidal rings. This is shown to provide an increased system capacity when the cell is split into these concentric rings and this is compared with cell clustering and other sectorisation schemes. Another technique for increasing the capacity is achieved by adding to the amount of inherent randomness within the transmitted signal so that the system is better able to extract the wanted signal. A system model has been produced for a cellular DS-CDMA network and the results are presented for two possible strategies. One of these strategies is the variation of the chip duration over a signal bit period. Several different variation functions are tried and a sinusoidal function is shown to provide the greatest increase in the maximum number of system users for any given signal-to-noise ratio. The other strategy considered is the use of additive amplitude modulation together with data/chip phase-shift-keying. The amplitude variations are determined by a sparse code so that the average system power is held near its nominal level. This strategy is shown to provide no further capacity since the system is sensitive to amplitude variations. When both strategies are employed, however, the sensitivity to amplitude variations is shown to reduce, thus indicating that the first strategy increases both the capacity and the ability to handle fluctuations in the received signal power.
Abstract:
A chip shooter machine in printed circuit board (PCB) assembly has three movable mechanisms: an X-Y table carrying a PCB, a feeder carrier with several feeders holding components, and a rotary turret with multiple assembly heads to pick up and place components. To minimise the placement or assembly time for a PCB on the machine, all the components on the board should be placed in a perfect sequence; the components should be set up on the right feeder, or feeders, since two feeders can hold the same type of component; and the assembly head should retrieve or pick up each component from the right feeder. The entire problem is very complicated, and this paper presents a genetic algorithm approach to tackle it.
Abstract:
The G-protein coupled receptors, or GPCRs, comprise simultaneously one of the largest and one of the most multi-functional protein families known to modern-day molecular bioscience. From a drug discovery and pharmaceutical industry perspective, the GPCRs constitute one of the most commercially and economically important groups of proteins known. The GPCRs undertake numerous vital metabolic functions and interact with a hugely diverse range of small and large ligands. Many different methodologies have been developed to efficiently and accurately classify the GPCRs. These range from motif-based techniques to machine learning as well as a variety of alignment-free techniques based on the physicochemical properties of sequences. We review here the available methodologies for the classification of GPCRs. Part of this work focuses on how we have tried to build the intrinsically hierarchical nature of sequence relations, implicit within the family, into an adaptive approach to classification. Importantly, we also allude to some of the key innate problems in developing an effective approach to classifying the GPCRs: the lack of sequence similarity between the six classes that comprise the GPCR family and the low sequence similarity to other family members evinced by many newly revealed members of the family.
Abstract:
We examined the effect of grouping by the alignment of implicit axes on the perception of multiple shapes, using a patient (GK) who shows simultanagnosia as part of Bálint's syndrome. Five experiments demonstrated that: (1) GK was better able to judge the orientation of a global configuration if the constituent local shapes were aligned with their major axes than if they were aligned with their edges; (2) this axis information was used implicitly, since GK was unable to discriminate between configurations of axis-aligned and edge-aligned shapes; (3) GK's sensitivity to axis-alignment persisted even when the orientations of local shapes were kept constant, indicating some form of cooperative effect between the local elements; (4) axis-alignment of shapes also facilitated his ability to discriminate single-item from multi-item configurations; (5) the effect of axis-alignment could be attributed, at least partially, to the degree to which there was matching between the orientations of local shapes and the global configuration. Taken together, the results suggest that axis-based grouping can support the selection of multiple objects.
Abstract:
In this work we propose the hypothesis that replacing the current system of representing the chemical entities known as amino acids using Latin letters with one of several possible alternative symbolic representations will bring significant benefits to the human construction, modification, and analysis of multiple protein sequence alignments. We propose ways in which this might be done without prescribing the choice of actual scripts used. Specifically, we propose and explore three ways to encode amino acid texts using novel symbolic alphabets free from precedents. Primary orthographic encoding is the direct substitution of a new alphabet for the standard, Latin-based amino acid code. Secondary encoding imposes static residue groupings onto the orthography of the alphabet by manipulating the shape and/or orientation of amino acid symbols. Tertiary encoding renders each residue as a composite symbol, each such symbol thus representing several alternative amino acid groupings simultaneously. We also propose that the use of a new group-focussed alphabet will free the colouring of amino acid residues, often used as a tool to facilitate the representation or construction of multiple alignments, for other purposes, possibly to indicate dynamic properties of an alignment such as position-wise residue conservation.
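The position-wise residue conservation mentioned at the end of the abstract above can be computed in several ways. The sketch below assumes one simple definition (the share of rows in a column that carry the most frequent residue, with gaps written as `-` counting against the score); this definition and the function name `column_conservation` are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def column_conservation(alignment):
    """Return one conservation score in [0, 1] per alignment column.

    `alignment` is a list of equal-length strings; '-' marks a gap.
    The score is the count of the most common non-gap residue divided
    by the number of rows, so gapped columns score lower.
    """
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all rows of an alignment must have the same length")
    scores = []
    for column in zip(*alignment):
        residues = [symbol for symbol in column if symbol != "-"]
        if not residues:
            scores.append(0.0)  # a column made only of gaps
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores
```

For the three-row toy alignment `["ACD", "ACE", "A-E"]`, the first column is fully conserved and scores 1.0, while the other two columns each score two thirds. Scores of this kind could drive the colouring of an alignment viewer.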
Abstract:
Background - Modelling the interaction between potentially antigenic peptides and Major Histocompatibility Complex (MHC) molecules is a key step in identifying potential T-cell epitopes. For Class II MHC alleles, the binding groove is open at both ends, causing ambiguity in the positional alignment between the groove and peptide, as well as creating uncertainty as to what parts of the peptide interact with the MHC. Moreover, the antigenic peptides have variable lengths, making naive modelling methods difficult to apply. This paper introduces a kernel method that can handle variable length peptides effectively by quantifying similarities between peptide sequences and integrating these into the kernel. Results - The kernel approach presented here shows increased prediction accuracy with a significantly higher number of true positives and negatives on multiple MHC class II alleles, when testing data sets from MHCPEP [1], MHCBN [2], and MHCBench [3]. Evaluation by cross validation, when segregating binders and non-binders, produced an average of 0.824 AROC for the MHCBench data sets (up from 0.756), and an average of 0.96 AROC for multiple alleles of the MHCPEP database. Conclusion - The method improves performance over existing state-of-the-art methods of MHC class II peptide binding predictions by using a custom, knowledge-based representation of peptides. Similarity scores, in contrast to a fixed-length, pocket-specific representation of amino acids, provide a flexible and powerful way of modelling MHC binding, and can easily be applied to other dynamic sequence problems.
Abstract:
We have previously identified a phosphorothioate oligonucleotide (PS-ODN) that inhibited epidermal growth factor receptor tyrosine kinase (TK) activity both in cell fractions and in intact A431 cells. Since ODN-based TK inhibitors may have anti-cancer applications and may also help in understanding the non-antisense mediated effects of PS-ODNs, we have further studied the sequence and chemistry requirements of the parent PS-ODN (sequence: 5′-GGA GGG TCG CAT CGC-3′) as a sequence-dependent TK inhibitor. Sequence deletion and substitution studies revealed that the 5′-terminal GGA GGG hexamer sequence in the parent compound was essential for anti-TK activity in A431 cells. Site-specific substitution of any G with a T in this 5′-terminal motif within the parent compound caused a significant loss in anti-TK activity. The fully PS-modified hexameric motif alone exhibited activity equipotent to that of the parent 15-mer, whereas phosphodiester (PO) or 2′-O-methyl-modified versions of this motif had significantly reduced anti-TK activity. Further, T substitutions at the two 5′-terminal G residues of the hexameric PS-ODN yielded a sequence, TTA GGG (representing the telomeric repeats in human chromosomes), that likewise did not exhibit significant anti-TK activity. Multiple repeats of the active hexameric motif in PS-ODNs resulted in more potent inhibitors of TK activity than the parent ODN. These results suggested that PS-ODNs, but not PO or 2′-O-methyl modified ODNs, containing the GGA GGG motif can exert potent anti-TK activity which may be desirable in some anti-tumor applications. Additionally, the presence of this previously unidentified motif in antisense PS-ODN constructs may contribute to their biological effects in vitro and in vivo and should be accounted for in the design of the PS-modified antisense ODNs. © 2002 Published by Elsevier Science Inc.
Abstract:
Kernel methods provide a convenient way to apply a wide range of learning techniques to complex and structured data by shifting the representational problem from one of finding an embedding of the data to that of defining a positive semidefinite kernel. One problem with the most widely used kernels is that they neglect the locational information within the structures, resulting in less discrimination. Correspondence-based kernels, on the other hand, are in general more discriminating, at the cost of sacrificing positive-definiteness due to their inability to guarantee transitivity of the correspondences between multiple graphs. In this paper we generalize a recent structural kernel based on the Jensen-Shannon divergence between quantum walks over the structures by introducing a novel alignment step which, rather than permuting the nodes of the structures, aligns the quantum states of their walks. This results in a novel kernel that maintains localization within the structures, but still guarantees positive definiteness. Experimental evaluation validates the effectiveness of the kernel for several structural classification tasks. © 2014 Springer-Verlag Berlin Heidelberg.
Abstract:
Switched mode power supplies (SMPSs) are essential components in many applications, and electromagnetic interference is an important consideration in SMPS design. Spread spectrum based PWM strategies have been used in SMPS designs to reduce the switching harmonics. This paper proposes a novel method to integrate a communication function into a spread spectrum based PWM strategy without extra hardware costs. Direct sequence spread spectrum (DSSS) and phase shift keying (PSK) data modulation are applied to the PWM of the SMPS, so that it has reduced switching harmonics and the input and output power line voltage ripples contain data. A data demodulation algorithm has been developed for receivers, and the code division multiple access (CDMA) concept is employed as the communication method for a system with multiple SMPSs. The proposed method has been implemented in both Buck and Boost converters. The experimental results validated the proposed DSSS based PWM strategy for both harmonic reduction and communication.
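The combination of DSSS, PSK and CDMA described in the abstract above can be illustrated at baseband, independently of any power converter. The sketch below assumes binary PSK (bits mapped to +1 and -1), two hypothetical senders with mutually orthogonal four-chip codes, and an ideal noiseless channel that simply adds their signals. It does not model the PWM carrier or the demodulation algorithm of the paper, and all names in it are illustrative.

```python
# Two orthogonal spreading codes (their element-wise products sum to zero).
CODE_A = [1, -1, 1, -1]
CODE_B = [1, 1, -1, -1]


def spread(bits, code):
    """Map bits to +1/-1 (binary PSK) and multiply each symbol by the chip code."""
    symbols = [1 if bit else -1 for bit in bits]
    return [symbol * chip for symbol in symbols for chip in code]


def despread(signal, code):
    """Correlate each code-length block with the code and decide each bit."""
    chips_per_bit = len(code)
    bits = []
    for start in range(0, len(signal), chips_per_bit):
        block = signal[start:start + chips_per_bit]
        correlation = sum(x * chip for x, chip in zip(block, code))
        bits.append(1 if correlation > 0 else 0)
    return bits


def shared_channel(signal_a, signal_b):
    """Idealised shared medium: the two transmitted signals add sample by sample."""
    return [a + b for a, b in zip(signal_a, signal_b)]
```

In this idealised setting, if sender A transmits `[1, 0, 1]` and sender B transmits `[0, 0, 1]`, despreading the summed signal with `CODE_A` recovers A's bits and despreading with `CODE_B` recovers B's bits, because the cross-correlation between the two codes is zero.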