876 results for Computer Science, Theory
Abstract:
An important and common problem in microarray experiments is the detection of genes that are differentially expressed in a given number of classes. As this problem concerns the selection of significant genes from a large pool of candidate genes, it needs to be carried out within the framework of multiple hypothesis testing. In this paper, we focus on the use of mixture models to handle the multiplicity issue. With this approach, a measure of the local FDR (false discovery rate) is provided for each gene. An attractive feature of the mixture model approach is that it provides a framework for the estimation of the prior probability that a gene is not differentially expressed, and this probability can subsequently be used in forming a decision rule. The rule can also be formed to take the false negative rate into account. We apply this approach to a well-known publicly available data set on breast cancer, and discuss our findings with reference to other approaches.
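As an illustration of the mixture-model idea described above (not the paper's exact procedure), the following sketch fits a two-component Gaussian mixture to gene-wise test statistics and reads off a local FDR for each gene; the data, variable names, and the use of scikit-learn's GaussianMixture are assumptions made for illustration only.

```python
# Illustrative sketch (not the paper's exact method): estimate a local FDR for
# each gene by fitting a two-component Gaussian mixture to gene-wise
# z-statistics. The component closest to zero plays the role of the "not
# differentially expressed" (null) genes; its mixing weight estimates pi0.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Hypothetical data: 950 null genes around 0, 50 differentially expressed genes.
z = np.concatenate([rng.normal(0.0, 1.0, 950), rng.normal(3.0, 1.0, 50)])

gm = GaussianMixture(n_components=2, random_state=0).fit(z.reshape(-1, 1))
null_idx = int(np.argmin(np.abs(gm.means_.ravel())))   # component nearest 0 acts as the null

post = gm.predict_proba(z.reshape(-1, 1))
local_fdr = post[:, null_idx]        # posterior prob. that a gene is not differentially expressed
pi0 = gm.weights_[null_idx]          # estimated prior prob. of "not differentially expressed"

selected = np.where(local_fdr < 0.2)[0]   # a simple decision rule: call genes with local FDR < 0.2
print(f"estimated pi0 = {pi0:.2f}, genes called = {len(selected)}")
```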
Abstract:
Normal mixture models are often used to cluster continuous data. However, conventional approaches for fitting these models will have problems in producing nonsingular estimates of the component-covariance matrices when the dimension of the observations is large relative to the number of observations. In this case, methods such as principal components analysis (PCA) and the mixture of factor analyzers model can be adopted to avoid these estimation problems. We examine these approaches applied to the Cabernet wine data set of Ashenfelter (1999), considering the clustering of both the wines and the judges, and comparing our results with another analysis. The mixture of factor analyzers model proves particularly effective in clustering the wines, accurately classifying many of the wines by location.
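A minimal sketch of the general strategy mentioned above (reduce the dimension before normal mixture clustering), using scikit-learn's PCA and Gaussian mixture rather than the mixture of factor analyzers software used in the paper; the data and parameter choices are synthetic placeholders.

```python
# Illustrative sketch: when p is large relative to n, fitting a normal mixture
# directly can produce singular component-covariance estimates. One common
# workaround, as discussed above, is to reduce dimension first (here via PCA)
# and then fit the mixture in the reduced space. The data are synthetic.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
n, p = 30, 100                       # few observations, many variables
X = rng.normal(size=(n, p))
X[:15] += 2.0                        # two loose synthetic groups

scores = PCA(n_components=3).fit_transform(X)                      # low-dimensional projection
labels = GaussianMixture(n_components=2, random_state=0).fit_predict(scores)
print(labels)
```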
Abstract:
Finding motifs that can elucidate the rules governing peptide binding to medically important receptors is important for screening targets for drugs and vaccines. This paper focuses on the elucidation of peptide binding to the I-A(g7) molecule of the non-obese diabetic (NOD) mouse - an animal model for insulin-dependent diabetes mellitus (IDDM). A number of motifs that describe peptide binding to I-A(g7) have been proposed. These motifs result from independent experimental studies carried out on small data sets. Testing with multiple data sets showed that each of the motifs at best describes only a subset of the solution space, and these motifs therefore lack generalization ability. This study seeks a motif with higher generalization ability, so that it can predict binders in all I-A(g7) data sets with high accuracy. A binding score matrix representing the peptide binding motif to I-A(g7) was derived using a genetic algorithm (GA). The evolved score matrix significantly outperformed previously reported motifs.
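To make the "binding score matrix" idea concrete, here is a small illustrative sketch (not the paper's matrix, GA, or data): a position-specific score matrix assigns a score to each residue at each position of a fixed-length binding core, and in a GA-based approach each individual would encode such a matrix, with fitness measured by how well its scores separate known binders from non-binders.

```python
# Illustrative sketch: scoring a peptide with a position-specific score matrix
# (PSSM). The core length, score range, and example peptide are made up.
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CORE_LENGTH = 9  # assumed length of the binding core

def random_matrix():
    """One GA individual: a score for every (position, residue) pair."""
    return [{aa: random.uniform(-1, 1) for aa in AMINO_ACIDS} for _ in range(CORE_LENGTH)]

def score_peptide(matrix, peptide):
    """Best score over all length-9 windows of the peptide."""
    windows = [peptide[i:i + CORE_LENGTH] for i in range(len(peptide) - CORE_LENGTH + 1)]
    return max(sum(matrix[pos][aa] for pos, aa in enumerate(w)) for w in windows)

matrix = random_matrix()
print(score_peptide(matrix, "GLSRYVARLSSNSRAY"))   # hypothetical peptide
```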
Abstract:
A biologically realizable, unsupervised learning rule is described for the online extraction of object features, suitable for solving a range of object recognition tasks. Alterations to the basic learning rule are proposed which allow the rule to better suit the parameters of a given input space. One negative consequence of such modifications is the potential for learning instability. The criteria for such instability are modeled using digital filtering techniques, and the predicted regions of stability and instability are tested. The result is a family of learning rules which can be tailored to the specific environment, improving both convergence times and accuracy over the standard learning rule, while simultaneously ensuring learning stability.
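The abstract does not state the rule itself; purely as a generic stand-in, the sketch below shows a classical online, unsupervised feature-extraction rule (Oja's Hebbian rule), where the learning rate plays exactly the kind of stability-controlling role discussed above.

```python
# Generic illustration (not the paper's rule): Oja's online Hebbian rule
# extracts the first principal component of a stream of inputs. The learning
# rate eta controls convergence speed and, if set too large, stability.
import numpy as np

rng = np.random.default_rng(2)
w = rng.normal(size=3)
eta = 0.01

for _ in range(5000):
    x = rng.multivariate_normal(np.zeros(3), [[3, 1, 0], [1, 2, 0], [0, 0, 1]])
    y = w @ x
    w += eta * y * (x - y * w)     # Oja's update: Hebbian term with weight decay

print(w / np.linalg.norm(w))        # approximates the leading eigenvector of the input covariance
```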
Abstract:
This paper describes an ongoing collaboration between Boeing Australia Limited and the University of Queensland to develop and deliver an introductory course on software engineering. The aims of the course are to provide a common understanding of the nature of software engineering for all Boeing Australia's engineering staff, and to ensure they understand the practices used throughout the company. The course is designed so that it can be presented to people with varying backgrounds, such as recent software engineering graduates, systems engineers, quality assurance personnel, etc. The paper describes the structure and content of the course, and the evaluation techniques used to collect feedback from the participants and the corresponding results. The immediate feedback on the course indicates that it has been well received by the participants, but also indicates a need for more advanced courses in specific areas. The long-term feedback from participants is less positive, and the long-term feedback from the managers of the course participants indicates a need to expand on the coverage of the Boeing-specific processes and methods.
Abstract:
We show how to communicate Heisenberg-limited continuous (quantum) variables between Alice and Bob in the case where they occupy two inertial reference frames that differ by an unknown Lorentz boost. There are two effects that need to be overcome: the Doppler shift and the absence of synchronized clocks. Furthermore, we show how Alice and Bob can share Doppler-invariant entanglement, and we demonstrate that the protocol is robust under photon loss.
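For reference, the relativistic Doppler shift that such a protocol must compensate is the standard result below (included here as background, not as a formula from the paper), for two parties receding at relative speed v along the line of sight.

```latex
% Standard relativistic Doppler factor for observers receding at speed v
% (beta = v/c); an approaching pair uses the reciprocal factor.
\[
  \omega' \;=\; \omega\,\sqrt{\frac{1-\beta}{1+\beta}},
  \qquad \beta = \frac{v}{c}.
\]
```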
Abstract:
What is the computational power of a quantum computer? We show that determining the output of a quantum computation is equivalent to counting the number of solutions to an easily computed set of polynomials defined over the finite field Z_2. This connection allows simple proofs to be given for two known relationships between quantum and classical complexity classes, namely BQP ⊆ P^#P and BQP ⊆ PP.
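To illustrate the kind of counting problem referred to above (as a toy example only, not the reduction constructed in the paper), the sketch below counts by brute force the assignments over Z_2 that simultaneously zero a small, arbitrary system of polynomials.

```python
# Toy illustration of the counting problem mentioned above: count assignments
# in {0,1}^n that make every polynomial in a small system evaluate to 0 mod 2.
# The example polynomials are arbitrary and not taken from the paper.
from itertools import product

def count_solutions(polynomials, n):
    """Brute-force count of x in {0,1}^n with p(x) = 0 (mod 2) for all p."""
    return sum(
        all(p(x) % 2 == 0 for p in polynomials)
        for x in product((0, 1), repeat=n)
    )

# Example system over Z_2 in three variables:
polys = [
    lambda x: x[0] * x[1] + x[2],        # x0*x1 + x2
    lambda x: x[0] + x[1] + x[2] + 1,    # x0 + x1 + x2 + 1
]
print(count_solutions(polys, 3))
```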
Abstract:
Evolutionary algorithms perform optimization using a population of sample solution points. An interesting development has been to view population-based optimization as the process of evolving an explicit, probabilistic model of the search space. This paper investigates a formal basis for continuous, population-based optimization in terms of a stochastic gradient descent on the Kullback-Leibler divergence between the model probability density and the objective function, represented as an unknown density of assumed form. This leads to an update rule that is related and compared with previous theoretical work, a continuous version of the population-based incremental learning algorithm, and the generalized mean shift clustering framework. Experimental results are presented that demonstrate the dynamics of the new algorithm on a set of simple test problems.
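As a rough illustration of the kind of continuous, model-based optimizer analysed above (a generic Gaussian estimation-of-distribution loop, not the paper's KL-gradient update rule), the sketch below repeatedly samples from a Gaussian search model, selects the better points, and moves the model towards them.

```python
# Illustrative sketch: a simple continuous estimation-of-distribution /
# PBIL-style loop. A Gaussian search model is sampled, the best points are
# kept, and the model parameters are shifted towards them. This is a generic
# stand-in for the stochastic-gradient update derived in the paper.
import numpy as np

def sphere(x):
    return np.sum(x ** 2, axis=1)       # simple test objective (to be minimised)

rng = np.random.default_rng(3)
mean, std = np.full(2, 5.0), np.full(2, 2.0)
alpha = 0.3                              # learning rate for the model update

for _ in range(50):
    pop = rng.normal(mean, std, size=(100, 2))
    elite = pop[np.argsort(sphere(pop))[:20]]            # keep the best 20 samples
    mean = (1 - alpha) * mean + alpha * elite.mean(axis=0)
    std = (1 - alpha) * std + alpha * elite.std(axis=0)

print(mean)   # should approach the optimum at the origin
```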
Abstract:
Arguably the deepest fact known about the von Neumann entropy, the strong subadditivity inequality is a potent hammer in the quantum information theorist's toolkit. This short tutorial describes a simple proof of strong subadditivity due to Petz [Rep. on Math. Phys. 23 (1), 57-65 (1986)]. It assumes only knowledge of elementary linear algebra and quantum mechanics.
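The inequality in question is the standard statement below (included for reference), for a tripartite state ρ_ABC with reduced states denoted by subscripts.

```latex
% Strong subadditivity of the von Neumann entropy S(rho) = -Tr(rho log rho):
\[
  S(\rho_{ABC}) + S(\rho_{B}) \;\le\; S(\rho_{AB}) + S(\rho_{BC}).
\]
```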
Abstract:
The research literature on metaheuristic and evolutionary computation has proposed a large number of algorithms for the solution of challenging real-world optimization problems. It is often not possible to study theoretically the performance of these algorithms unless significant assumptions are made on either the algorithm itself or the problems to which it is applied, or both. As a consequence, metaheuristics are typically evaluated empirically using a set of test problems. Unfortunately, relatively little attention has been given to the development of methodologies and tools for the large-scale empirical evaluation and/or comparison of metaheuristics. In this paper, we propose a landscape (test-problem) generator that can be used to generate optimization problem instances for continuous, bound-constrained optimization problems. The landscape generator is parameterized by a small number of parameters, and the values of these parameters have a direct and intuitive interpretation in terms of the geometric features of the landscapes that they produce. An experimental space is defined over algorithms and problems, via a tuple of parameters for any specified algorithm and problem class (here determined by the landscape generator). An experiment is then clearly specified as a point in this space, in a way that is analogous to other areas of experimental algorithmics, and more generally in experimental design. Experimental results are presented, demonstrating the use of the landscape generator. In particular, we analyze some simple, continuous estimation of distribution algorithms, and gain new insights into the behavior of these algorithms using the landscape generator.
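The sketch below shows one minimal way such a parameterized landscape generator could look (a random landscape built as the maximum over a set of Gaussian components on a bounded domain); the specific parameterization is illustrative and should not be read as the generator proposed in the paper.

```python
# Illustrative sketch of a parameterized test-problem (landscape) generator for
# bound-constrained continuous optimisation: a random landscape is built as the
# maximum over a set of Gaussian "bumps". The dimension, number of components,
# and their width act as the intuitive generator parameters.
import numpy as np

def make_landscape(dim, n_components, width, bounds=(-1.0, 1.0), seed=0):
    rng = np.random.default_rng(seed)
    centres = rng.uniform(*bounds, size=(n_components, dim))
    heights = rng.uniform(0.5, 1.0, size=n_components)

    def landscape(x):
        d2 = np.sum((centres - x) ** 2, axis=1)
        return np.max(heights * np.exp(-d2 / (2 * width ** 2)))   # value to be maximised

    return landscape

f = make_landscape(dim=2, n_components=10, width=0.2)
print(f(np.zeros(2)))
```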
Abstract:
What is the minimal size quantum circuit required to exactly implement a specified n-qubit unitary operation, U, without the use of ancilla qubits? We show that a lower bound on the minimal size is provided by the length of the minimal geodesic between U and the identity, I, where length is defined by a suitable Finsler metric on the manifold SU(2(n)). The geodesic curves on these manifolds have the striking property that once an initial position and velocity are set, the remainder of the geodesic is completely determined by a second order differential equation known as the geodesic equation. This is in contrast with the usual case in circuit design, either classical or quantum, where being given part of an optimal circuit does not obviously assist in the design of the rest of the circuit. Geodesic analysis thus offers a potentially powerful approach to the problem of proving quantum circuit lower bounds. In this paper we construct several Finsler metrics whose minimal length geodesics provide lower bounds on quantum circuit size. For each Finsler metric we give a procedure to compute the corresponding geodesic equation. We also construct a large class of solutions to the geodesic equation, which we call Pauli geodesics, since they arise from isometries generated by the Pauli group. For any unitary U diagonal in the computational basis, we show that: (a) provided the minimal length geodesic is unique, it must be a Pauli geodesic; (b) finding the length of the minimal Pauli geodesic passing from I to U is equivalent to solving an exponential size instance of the closest vector in a lattice problem (CVP); and (c) all but a doubly exponentially small fraction of such unitaries have minimal Pauli geodesics of exponential length.
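For reference, the second-order differential equation mentioned above takes the familiar form below in the Riemannian case, with Christoffel symbols Γ; the Finsler case treated in the paper has the same second-order structure but a more involved connection.

```latex
% Geodesic equation in local coordinates (Riemannian case, shown for reference):
\[
  \frac{d^{2}x^{\mu}}{dt^{2}}
  + \Gamma^{\mu}_{\;\alpha\beta}\,
    \frac{dx^{\alpha}}{dt}\,\frac{dx^{\beta}}{dt} \;=\; 0 .
\]
```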
Abstract:
In this paper, we present ICICLE (Image ChainNet and Incremental Clustering Engine), a prototype system that we have developed to efficiently and effectively retrieve WWW images based on image semantics. ICICLE has two distinguishing features. First, it employs a novel image representation model called Weight ChainNet to capture the semantics of the image content. A new formula, called list space model, for computing semantic similarities is also introduced. Second, to speed up retrieval, ICICLE employs an incremental clustering mechanism, ICC (Incremental Clustering on ChainNet), to cluster images with similar semantics into the same partition. Each cluster has a summary representative and all clusters' representatives are further summarized into a balanced and full binary tree structure. We conducted an extensive performance study to evaluate ICICLE. Compared with some recently proposed methods, our results show that ICICLE provides better recall and precision. Our clustering technique ICC facilitates speedy retrieval of images without sacrificing recall and precision significantly.
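As a rough illustration of incremental clustering of the kind ICC performs (a generic scheme, not ICICLE's actual algorithm or its ChainNet-based similarity), each incoming item either joins the most similar existing cluster or starts a new one, with the cluster representative updated as items arrive.

```python
# Generic incremental clustering sketch (not ICICLE's ICC algorithm): each new
# item joins the most similar existing cluster if its cosine similarity to the
# cluster representative exceeds a threshold, otherwise it starts a new
# cluster. Items are plain vectors here rather than ChainNet image semantics.
import numpy as np

def incremental_cluster(items, threshold=0.8):
    reps = []                                   # one representative vector per cluster
    assignments = []
    for x in items:
        sims = [x @ r / (np.linalg.norm(x) * np.linalg.norm(r)) for r in reps]
        if sims and max(sims) >= threshold:
            k = int(np.argmax(sims))
            reps[k] = (reps[k] + x) / 2         # update the cluster representative
        else:
            k = len(reps)
            reps.append(x.astype(float))
        assignments.append(k)
    return assignments

data = np.random.default_rng(4).normal(size=(20, 5))
print(incremental_cluster(data))
```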
Abstract:
We revisit the one-unit gradient ICA algorithm derived from the kurtosis function. By carefully studying the properties of the stationary points of the discrete-time one-unit gradient ICA algorithm, convergence can be proved under a suitable condition on the learning rate. This condition on the learning rate helps alleviate the guesswork that accompanies the problem of choosing a suitable learning rate in practical computation. These results may be useful for extracting independent source signals on-line.
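The sketch below shows a standard textbook form of a one-unit gradient ICA update based on the kurtosis contrast, applied to whitened synthetic mixtures; it is an illustration of the general algorithm family and not necessarily the exact discrete-time update analysed in the paper.

```python
# Illustrative sketch of a one-unit gradient ICA update derived from the
# kurtosis contrast. Data are whitened mixtures of two non-Gaussian sources.
import numpy as np

rng = np.random.default_rng(5)
n = 5000
S = np.vstack([rng.uniform(-1, 1, n), np.sign(rng.normal(size=n))])   # two independent sources
X = np.array([[2.0, 1.0], [1.0, 3.0]]) @ S                             # observed mixtures

# Whiten the mixtures.
X -= X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(X))
Z = E @ np.diag(d ** -0.5) @ E.T @ X

w = rng.normal(size=2)
w /= np.linalg.norm(w)
eta = 0.1                                    # learning rate; its size governs convergence

for _ in range(200):
    y = w @ Z
    kurt = np.mean(y ** 4) - 3 * np.mean(y ** 2) ** 2
    grad = np.mean(Z * y ** 3, axis=1) - 3 * w           # gradient of the kurtosis contrast
    w += eta * np.sign(kurt) * grad                       # ascend |kurtosis|
    w /= np.linalg.norm(w)                                # keep the unit-norm constraint

print(w)   # approximates one row of the unmixing matrix (up to sign)
```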
Abstract:
Fuzzy data has grown to be an important factor in data mining. Whenever uncertainty exists, simulation can be used as a model. Simulation is very flexible, although it can involve significant levels of computation. This article discusses fuzzy decision-making using the grey related analysis method. Fuzzy models are expected to better reflect decision-making uncertainty, at some cost in accuracy relative to crisp models. Monte Carlo simulation is used to incorporate experimental levels of uncertainty into the data and to measure the impact of fuzzy decision tree models using categorical data. Results are compared with decision tree models based on crisp continuous data.
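A minimal sketch of the general experiment described above (not an implementation of grey related analysis): Monte Carlo noise is injected into the training data to mimic fuzzy, uncertain inputs, and the resulting decision trees are compared with a tree built on the original crisp data. The data set and noise level are placeholders.

```python
# Illustrative sketch: compare a decision tree trained on crisp data with trees
# trained on Monte Carlo perturbed ("fuzzified") copies of the same data.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

crisp_acc = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr).score(X_te, y_te)

rng = np.random.default_rng(6)
noisy_accs = []
for _ in range(30):                                          # Monte Carlo replications
    X_noisy = X_tr + rng.normal(0.0, 0.2, size=X_tr.shape)   # injected uncertainty
    clf = DecisionTreeClassifier(random_state=0).fit(X_noisy, y_tr)
    noisy_accs.append(clf.score(X_te, y_te))

print(crisp_acc, np.mean(noisy_accs))
```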
Abstract:
Machine learning techniques have been recognized as powerful tools for learning from data. One of the most popular learning techniques, the Back-Propagation (BP) Artificial Neural Network, can be used as a computer model to predict peptides binding to the Human Leukocyte Antigens (HLA). The major advantage of computational screening is that it reduces the number of wet-lab experiments that need to be performed, significantly reducing the cost and time. A recently developed method, the Extreme Learning Machine (ELM), which has superior properties over BP, has been investigated to accomplish such tasks. In our work, we found that the ELM is as good as, if not better than, the BP in terms of time complexity, accuracy deviations across experiments, and, most importantly, prevention of over-fitting for the prediction of peptide binding to HLA.
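To make the comparison concrete, here is a minimal sketch of an Extreme Learning Machine in its standard single-hidden-layer formulation (random, fixed hidden weights; output weights solved by least squares); the data are synthetic placeholders rather than encoded peptide/HLA data.

```python
# Minimal Extreme Learning Machine sketch: hidden-layer weights are random and
# fixed, and only the output weights are fitted, via least squares.
import numpy as np

rng = np.random.default_rng(7)
n, d, hidden = 500, 20, 100

X = rng.normal(size=(n, d))
y = (X[:, 0] * X[:, 1] > 0).astype(float)              # toy binary target

W = rng.normal(size=(d, hidden))                        # random input-to-hidden weights
b = rng.normal(size=hidden)                             # random hidden biases
H = np.tanh(X @ W + b)                                  # hidden-layer activations
beta, *_ = np.linalg.lstsq(H, y, rcond=None)            # output weights by least squares

pred = (H @ beta > 0.5).astype(float)
print("training accuracy:", (pred == y).mean())
```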