20 resultados para semantic leveling
Resumo:
Non-negative matrix factorization [5](NMF) is a well known tool for unsupervised machine learning. It can be viewed as a generalization of the K-means clustering, Expectation Maximization based clustering and aspect modeling by Probabilistic Latent Semantic Analysis (PLSA). Specifically PLSA is related to NMF with KL-divergence objective function. Further it is shown that K-means clustering is a special case of NMF with matrix L2 norm based error function. In this paper our objective is to analyze the relation between K-means clustering and PLSA by examining the KL-divergence function and matrix L2 norm based error function.
Resumo:
There are many popular models available for classification of documents like Naïve Bayes Classifier, k-Nearest Neighbors and Support Vector Machine. In all these cases, the representation is based on the “Bag of words” model. This model doesn't capture the actual semantic meaning of a word in a particular document. Semantics are better captured by proximity of words and their occurrence in the document. We propose a new “Bag of Phrases” model to capture this discriminative power of phrases for text classification. We present a novel algorithm to extract phrases from the corpus using the well known topic model, Latent Dirichlet Allocation(LDA), and to integrate them in vector space model for classification. Experiments show a better performance of classifiers with the new Bag of Phrases model against related representation models.
Resumo:
Background: The number of genome-wide association studies (GWAS) has increased rapidly in the past couple of years, resulting in the identification of genes associated with different diseases. The next step in translating these findings into biomedically useful information is to find out the mechanism of the action of these genes. However, GWAS studies often implicate genes whose functions are currently unknown; for example, MYEOV, ANKLE1, TMEM45B and ORAOV1 are found to be associated with breast cancer, but their molecular function is unknown. Results: We carried out Bayesian inference of Gene Ontology (GO) term annotations of genes by employing the directed acyclic graph structure of GO and the network of protein-protein interactions (PPIs). The approach is designed based on the fact that two proteins that interact biophysically would be in physical proximity of each other, would possess complementary molecular function, and play role in related biological processes. Predicted GO terms were ranked according to their relative association scores and the approach was evaluated quantitatively by plotting the precision versus recall values and F-scores (the harmonic mean of precision and recall) versus varying thresholds. Precisions of similar to 58% and similar to 40% for localization and functions respectively of proteins were determined at a threshold of similar to 30 (top 30 GO terms in the ranked list). Comparison with function prediction based on semantic similarity among nodes in an ontology and incorporation of those similarities in a k nearest neighbor classifier confirmed that our results compared favorably. Conclusions: This approach was applied to predict the cellular component and molecular function GO terms of all human proteins that have interacting partners possessing at least one known GO annotation. The list of predictions is available at http://severus.dbmi.pitt.edu/engo/GOPRED.html. We present the algorithm, evaluations and the results of the computational predictions, especially for genes identified in GWAS studies to be associated with diseases, which are of translational interest.
Resumo:
The characteristics of surface roughness span a range of length scales determined by the nature of the surface generation process. The mechanism by which material is removed at a length scale determines the roughness at that scale. Electropolishing preferentially reduces the peaks of surface protuberances at sub-micron length scales to produce smooth surfaces. The material removal in electropolishing occurs by two different mechanisms of anodic leveling and microsmoothing. Due to insufficient lateral resolution, individual contribution of these two mechanisms could not be measured by conventional roughness measurement techniques and parameters. In this work, we utilize the high lateral resolution offered by Atomic force microscopy along with the power spectral density method of characterization, to study the evolution of roughness during electropolishing. The power spectral density show two corner frequencies indicating the length scales over which the two mechanisms operate. These characteristic frequencies are found to be a function of the electropolishing time and hence can be used to optimize the electropolishing process.
Resumo:
We apply the objective method of Aldous to the problem of finding the minimum-cost edge cover of the complete graph with random independent and identically distributed edge costs. The limit, as the number of vertices goes to infinity, of the expected minimum cost for this problem is known via a combinatorial approach of Hessler and Wastlund. We provide a proof of this result using the machinery of the objective method and local weak convergence, which was used to prove the (2) limit of the random assignment problem. A proof via the objective method is useful because it provides us with more information on the nature of the edge's incident on a typical root in the minimum-cost edge cover. We further show that a belief propagation algorithm converges asymptotically to the optimal solution. This can be applied in a computational linguistics problem of semantic projection. The belief propagation algorithm yields a near optimal solution with lesser complexity than the known best algorithms designed for optimality in worst-case settings.