4 results for Data fusion applications
at Massachusetts Institute of Technology
Abstract:
Modeling and predicting co-occurrences of events is a fundamental problem of unsupervised learning. In this contribution we develop a statistical framework for analyzing co-occurrence data in a general setting where elementary observations are joint occurrences of pairs of abstract objects from two finite sets. The main challenge for statistical models in this context is to overcome the inherent data sparseness and to estimate the probabilities for pairs which were rarely observed or even unobserved in a given sample set. Moreover, it is often of considerable interest to extract grouping structure or to find a hierarchical data organization. A novel family of mixture models is proposed which explains the observed data by a finite number of shared aspects or clusters. This provides a common framework for statistical inference and structure discovery and also includes several recently proposed models as special cases. Adopting the maximum likelihood principle, EM algorithms are derived to fit the model parameters. We develop improved versions of EM which largely avoid overfitting problems and overcome the inherent locality of EM-based optimization. Among the broad variety of possible applications, e.g., in information retrieval, natural language processing, data mining, and computer vision, we have chosen document retrieval, the statistical analysis of noun/adjective co-occurrence, and the unsupervised segmentation of textured images to test and evaluate the proposed algorithms.
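To make the idea of "shared aspects" concrete, the following is a minimal sketch of plain EM for one simple mixture of this kind, assuming the joint probability of a pair factorizes as P(x, y) = sum over a of P(a) P(x|a) P(y|a). The factorization, the function names `fit_aspect_model` and `pair_probability`, and the count-matrix input format are illustrative assumptions, not taken from the abstract, and the improved EM variants the abstract mentions are not implemented here.

```python
import random


def fit_aspect_model(counts, n_aspects, n_iters=50, seed=0):
    """Plain EM for the assumed model P(x, y) = sum_a P(a) P(x|a) P(y|a).

    counts[x][y] is the number of times the pair (x, y) was observed.
    """
    rng = random.Random(seed)
    n_x, n_y = len(counts), len(counts[0])

    def random_dist(n):
        # Strictly positive random weights, normalized to sum to 1.
        weights = [rng.random() + 0.1 for _ in range(n)]
        total_weight = sum(weights)
        return [w / total_weight for w in weights]

    p_a = [1.0 / n_aspects] * n_aspects
    p_x_a = [random_dist(n_x) for _ in range(n_aspects)]  # p_x_a[a][x]
    p_y_a = [random_dist(n_y) for _ in range(n_aspects)]  # p_y_a[a][y]
    total = float(sum(sum(row) for row in counts))

    for _ in range(n_iters):
        new_a = [0.0] * n_aspects
        new_x = [[0.0] * n_x for _ in range(n_aspects)]
        new_y = [[0.0] * n_y for _ in range(n_aspects)]
        for x in range(n_x):
            for y in range(n_y):
                n = counts[x][y]
                if n == 0:
                    continue
                # E-step: posterior over aspects for the observed pair (x, y).
                joint = [p_a[a] * p_x_a[a][x] * p_y_a[a][y]
                         for a in range(n_aspects)]
                z = sum(joint)
                for a in range(n_aspects):
                    r = n * joint[a] / z  # expected count assigned to aspect a
                    new_a[a] += r
                    new_x[a][x] += r
                    new_y[a][y] += r
        # M-step: re-estimate parameters from the expected counts.
        for a in range(n_aspects):
            if new_a[a] > 0:
                p_x_a[a] = [v / new_a[a] for v in new_x[a]]
                p_y_a[a] = [v / new_a[a] for v in new_y[a]]
            p_a[a] = new_a[a] / total
    return p_a, p_x_a, p_y_a


def pair_probability(model, x, y):
    """Model probability of the pair (x, y), observed or not."""
    p_a, p_x_a, p_y_a = model
    return sum(p_a[a] * p_x_a[a][x] * p_y_a[a][y] for a in range(len(p_a)))
```

Calling `fit_aspect_model(counts, n_aspects=2)` on a small count matrix and then `pair_probability(model, x, y)` for a pair with zero count shows how the shared aspects let the model score pairs that never appeared in the sample.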
Abstract:
The Support Vector Machine (SVM) is a new and very promising classification technique developed by Vapnik and his group at AT&T Bell Labs. This new learning algorithm can be seen as an alternative training technique for Polynomial, Radial Basis Function and Multi-Layer Perceptron classifiers. An interesting property of this approach is that it is an approximate implementation of the Structural Risk Minimization (SRM) induction principle. The derivation of Support Vector Machines, their relationship with SRM, and their geometrical insight are discussed in this paper. Training an SVM is equivalent to solving a quadratic programming problem with linear and box constraints in a number of variables equal to the number of data points. When the number of data points exceeds a few thousand the problem is very challenging, because the quadratic form is completely dense, so the memory needed to store the problem grows with the square of the number of data points. Therefore, training problems arising in some real applications with large data sets are impossible to load into memory, and cannot be solved using standard non-linear constrained optimization algorithms. We present a decomposition algorithm that can be used to train SVMs over large data sets. The main idea behind the decomposition is the iterative solution of sub-problems and the evaluation of optimality conditions, which also establish the stopping criteria for the algorithm. We present previous approaches, as well as results and important details of our implementation of the algorithm using a second-order variant of the Reduced Gradient Method as the solver of the sub-problems. As an application of SVMs, we present preliminary results obtained by applying SVMs to the problem of detecting frontal human faces in real images.
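For reference, the quadratic program mentioned above is usually written in the following textbook dual form. The notation is an assumption and does not come from the abstract: \ell data points x_i with labels y_i in {-1, +1}, kernel K, multipliers \alpha_i, and upper bound C.

```latex
\max_{\alpha}\; W(\alpha) = \sum_{i=1}^{\ell} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{\ell} \sum_{j=1}^{\ell}
    \alpha_i \alpha_j \, y_i y_j \, K(x_i, x_j)
\quad \text{subject to} \quad
\sum_{i=1}^{\ell} \alpha_i y_i = 0, \qquad 0 \le \alpha_i \le C .
```

The matrix with entries y_i y_j K(x_i, x_j) has \ell^2 elements and is generally dense. As an illustrative figure, \ell = 50,000 points stored as 8-byte floating-point values would need about 50,000^2 x 8 = 2 x 10^10 bytes (roughly 20 GB), which is the quadratic memory growth the abstract describes.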
Abstract:
It is widely known that a significant fraction of bits is useless or even unused during program execution. Bit-width analysis aims to find the minimum number of bits needed for each variable in the program, ensuring correct execution while saving resources. In this paper, we propose a static analysis method for bit-widths in general applications, which approximates conservatively at compile time and is independent of runtime conditions. While most related work focuses on integer applications, our method is also tailored and applicable to floating-point variables, and could be extended to transform floating-point numbers into fixed-point numbers together with precision analysis. We use more precise representations for the data value ranges of both scalar and array variables. Element-level analysis is carried out for arrays. We also suggest an alternative to the standard fixed-point iterations in bi-directional range analysis. These techniques are implemented on the Trimaran compiler structure and tested on a set of benchmarks to show the results.
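The link between a variable's value range and its bit-width can be sketched with simple interval arithmetic. This is only an illustration of forward range propagation for integers, not the paper's method: the `Interval` class and `bits_needed` function are hypothetical names, and floating-point variables, arrays, and the bi-directional analysis are not covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Conservative integer value range [lo, hi] of a variable."""
    lo: int
    hi: int

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        # The extremes of a product lie among the four corner products.
        products = [a * b for a in (self.lo, self.hi)
                    for b in (other.lo, other.hi)]
        return Interval(min(products), max(products))


def bits_needed(r):
    """Smallest width that holds every integer in r.

    Unsigned if r.lo >= 0, otherwise two's complement.
    """
    if r.lo >= 0:
        return max(1, r.hi.bit_length())
    # Two's complement with n bits covers -2**(n-1) .. 2**(n-1) - 1.
    return max((-r.lo - 1).bit_length(), r.hi.bit_length()) + 1


# Ranges assumed known at compile time for two 4-bit unsigned inputs.
x = Interval(0, 15)
y = Interval(0, 15)
result = x * y + x  # conservative range of the expression x * y + x
width = bits_needed(result)
```

Here `result` is the range [0, 240], so `width` is 8 even if the variable was declared as a 32-bit integer.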
Abstract:
In this work we have made significant contributions in three different areas of interest: therapeutic protein stabilization, thermodynamics of natural gas clathrate-hydrates, and zeolite catalysis. In all three fields, using our various computational techniques, we have been able to elucidate phenomena that are difficult or impossible to explain experimentally. More specifically, in mixed solvent systems for proteins we developed a statistical-mechanical method to model the thermodynamic effects of additives in molecular-level detail. It was the first method demonstrated to have truly predictive (no adjustable parameters) capability for real protein systems. We also describe a novel mechanism that slows protein association reactions, called the “gap effect.” We developed a comprehensive picture of methionine oxidation by hydrogen peroxide that allows for accurate prediction of protein oxidation and provides a rationale for developing strategies to control oxidation. The method of solvent accessible area (SAA) was shown not to correlate well with oxidation rates. A new property, the averaged two-shell water coordination number (2SWCN), was identified and shown to correlate well with oxidation rates. Reference parameters for the van der Waals-Platteeuw model of clathrate-hydrates were found for structure I and structure II. These reference parameters are independent of the potential form (unlike the commonly used parameters) and have been validated by calculating phase behavior and structural transitions for mixed hydrate systems. These calculations are validated with experimental data for both structures and for systems that undergo transitions from one structure to another. This is the first method of calculating hydrate thermodynamics to demonstrate predictive capability for phase equilibria, structural changes, and occupancy in pure and mixed hydrate systems.
We have computed a new mechanism for the methanol coupling reaction to form ethanol and water in the zeolite chabazite. The mechanism at 400°C proceeds via stable intermediates of water, methane, and protonated formaldehyde.