145 resultados para Applied Mathematics|Computer Engineering|Computer science

em University of Queensland eSpace - Australia


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Bellerophon is a program for detecting chimeric sequences in multiple sequence datasets by an adaption of partial treeing analysis. Bellerophon was specifically developed to detect 16S rRNA gene chimeras in PCR-clone libraries of environmental samples but can be applied to other nucleotide sequence alignments.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Computer Science is a subject which has difficulty in marketing itself. Further, pinning down a standard curriculum is difficult-there are many preferences which are hard to accommodate. This paper argues the case that part of the problem is the fact that, unlike more established disciplines, the subject does not clearly distinguish the study of principles from the study of artifacts. This point was raised in Curriculum 2001 discussions, and debate needs to start in good time for the next curriculum standard. This paper provides a starting point for debate, by outlining a process by which principles and artifacts may be separated, and presents a sample curriculum to illustrate the possibilities. This sample curriculum has some positive points, though these positive points are incidental to the need to start debating the issue. Other models, with a less rigorous ordering of principles before artifacts, would still gain from making it clearer whether a specific concept was fundamental, or a property of a specific technology. (C) 2003 Elsevier Ltd. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Motivation: Prediction methods for identifying binding peptides could minimize the number of peptides required to be synthesized and assayed, and thereby facilitate the identification of potential T-cell epitopes. We developed a bioinformatic method for the prediction of peptide binding to MHC class II molecules. Results: Experimental binding data and expert knowledge of anchor positions and binding motifs were combined with an evolutionary algorithm (EA) and an artificial neural network (ANN): binding data extraction --> peptide alignment --> ANN training and classification. This method, termed PERUN, was implemented for the prediction of peptides that bind to HLA-DR4(B1*0401). The respective positive predictive values of PERUN predictions of high-, moderate-, low- and zero-affinity binder-a were assessed as 0.8, 0.7, 0.5 and 0.8 by cross-validation, and 1.0, 0.8, 0.3 and 0.7 by experimental binding. This illustrates the synergy between experimentation and computer modeling, and its application to the identification of potential immunotheraaeutic peptides.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Sausage is a protein sequence threading program, but with remarkable run-time flexibility. Using different scripts, it can calculate protein sequence-structure alignments, search structure libraries, swap force fields, create models form alignments, convert file formats and analyse results. There are several different force fields which might be classed as knowledge-based, although they do not rely on Boltzmann statistics. Different force fields are used for alignment calculations and subsequent ranking of calculated models.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Sum: Plant biologists in fields of ecology, evolution, genetics and breeding frequently use multivariate methods. This paper illustrates Principal Component Analysis (PCA) and Gabriel's biplot as applied to microarray expression data from plant pathology experiments. Availability: An example program in the publicly distributed statistical language R is available from the web site (www.tpp.uq.edu.au) and by e-mail from the contact. Contact: scott.chapman@csiro.au.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The QU-GENE Computing Cluster (QCC) is a hardware and software solution to the automation and speedup of large QU-GENE (QUantitative GENEtics) simulation experiments that are designed to examine the properties of genetic models, particularly those that involve factorial combinations of treatment levels. QCC automates the management of the distribution of components of the simulation experiments among the networked single-processor computers to achieve the speedup.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Motivation: This paper introduces the software EMMIX-GENE that has been developed for the specific purpose of a model-based approach to the clustering of microarray expression data, in particular, of tissue samples on a very large number of genes. The latter is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. A feasible approach is provided by first selecting a subset of the genes relevant for the clustering of the tissue samples by fitting mixtures of t distributions to rank the genes in order of increasing size of the likelihood ratio statistic for the test of one versus two components in the mixture model. The imposition of a threshold on the likelihood ratio statistic used in conjunction with a threshold on the size of a cluster allows the selection of a relevant set of genes. However, even this reduced set of genes will usually be too large for a normal mixture model to be fitted directly to the tissues, and so the use of mixtures of factor analyzers is exploited to reduce effectively the dimension of the feature space of genes. Results: The usefulness of the EMMIX-GENE approach for the clustering of tissue samples is demonstrated on two well-known data sets on colon and leukaemia tissues. For both data sets, relevant subsets of the genes are able to be selected that reveal interesting clusterings of the tissues that are either consistent with the external classification of the tissues or with background and biological knowledge of these sets.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Motivation: A consensus sequence for a family of related sequences is, as the name suggests, a sequence that captures the features common to most members of the family. Consensus sequences are important in various DNA sequencing applications and are a convenient way to characterize a family of molecules. Results: This paper describes a new algorithm for finding a consensus sequence, using the popular optimization method known as simulated annealing. Unlike the conventional approach of finding a consensus sequence by first forming a multiple sequence alignment, this algorithm searches for a sequence that minimises the sum of pairwise distances to each of the input sequences. The resulting consensus sequence can then be used to induce a multiple sequence alignment. The time required by the algorithm scales linearly with the number of input sequences and quadratically with the length of the consensus sequence. We present results demonstrating the high quality of the consensus sequences and alignments produced by the new algorithm. For comparison, we also present similar results obtained using ClustalW. The new algorithm outperforms ClustalW in many cases.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We present a novel maximum-likelihood-based algorithm for estimating the distribution of alignment scores from the scores of unrelated sequences in a database search. Using a new method for measuring the accuracy of p-values, we show that our maximum-likelihood-based algorithm is more accurate than existing regression-based and lookup table methods. We explore a more sophisticated way of modeling and estimating the score distributions (using a two-component mixture model and expectation maximization), but conclude that this does not improve significantly over simply ignoring scores with small E-values during estimation. Finally, we measure the classification accuracy of p-values estimated in different ways and observe that inaccurate p-values can, somewhat paradoxically, lead to higher classification accuracy. We explain this paradox and argue that statistical accuracy, not classification accuracy, should be the primary criterion in comparisons of similarity search methods that return p-values that adjust for target sequence length.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Motivation: A major issue in cell biology today is how distinct intracellular regions of the cell, like the Golgi Apparatus, maintain their unique composition of proteins and lipids. The cell differentially separates Golgi resident proteins from proteins that move through the organelle to other subcellular destinations. We set out to determine if we could distinguish these two types of transmembrane proteins using computational approaches. Results: A new method has been developed to predict Golgi membrane proteins based on their transmembrane domains. To establish the prediction procedure, we took the hydrophobicity values and frequencies of different residues within the transmembrane domains into consideration. A simple linear discriminant function was developed with a small number of parameters derived from a dataset of Type II transmembrane proteins of known localization. This can discriminate between proteins destined for Golgi apparatus or other locations (post-Golgi) with a success rate of 89.3% or 85.2%, respectively on our redundancy-reduced data sets.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Motivation: Targeting peptides direct nascent proteins to their specific subcellular compartment. Knowledge of targeting signals enables informed drug design and reliable annotation of gene products. However, due to the low similarity of such sequences and the dynamical nature of the sorting process, the computational prediction of subcellular localization of proteins is challenging. Results: We contrast the use of feed forward models as employed by the popular TargetP/SignalP predictors with a sequence-biased recurrent network model. The models are evaluated in terms of performance at the residue level and at the sequence level, and demonstrate that recurrent networks improve the overall prediction performance. Compared to the original results reported for TargetP, an ensemble of the tested models increases the accuracy by 6 and 5% on non-plant and plant data, respectively.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Motivation: An important problem in microarray experiments is the detection of genes that are differentially expressed in a given number of classes. We provide a straightforward and easily implemented method for estimating the posterior probability that an individual gene is null. The problem can be expressed in a two-component mixture framework, using an empirical Bayes approach. Current methods of implementing this approach either have some limitations due to the minimal assumptions made or with more specific assumptions are computationally intensive. Results: By converting to a z-score the value of the test statistic used to test the significance of each gene, we propose a simple two-component normal mixture that models adequately the distribution of this score. The usefulness of our approach is demonstrated on three real datasets.