25 results for Latent semantic indexing
Abstract:
Automatic identification of software faults has enormous practical significance. This requires characterizing program execution behavior and applying appropriate data mining techniques to the chosen representation. In this paper, we use the sequence of system calls to characterize program execution. The data mining tasks addressed are learning to map system call streams to fault labels and automatic identification of fault causes. Spectrum kernels and SVMs are used for the former, while latent semantic analysis is used for the latter. The techniques are demonstrated on the intrusion dataset containing system call traces. The results show that kernel techniques are as accurate as the best available results but are faster by orders of magnitude. We also show that latent semantic indexing is capable of revealing fault-specific features.
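As a rough illustration of the spectrum kernel mentioned in the abstract above, the following minimal sketch counts the length-k subsequences shared by two system call traces. The function names and the default value of k are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def spectrum_features(calls, k):
    """Count every contiguous length-k subsequence (k-mer) of a call sequence."""
    return Counter(tuple(calls[i:i + k]) for i in range(len(calls) - k + 1))


def spectrum_kernel(x, y, k=3):
    """k-spectrum kernel: inner product of the two k-mer count vectors.

    x and y are sequences of system call names (hypothetical inputs).
    """
    fx = spectrum_features(x, k)
    fy = spectrum_features(y, k)
    # Counter returns 0 for k-mers that never occur in y.
    return sum(count * fy[kmer] for kmer, count in fx.items())
```

A Gram matrix built from such kernel values is the kind of input that an SVM accepting a precomputed kernel can consume; how the paper itself sets k or normalizes the kernel is not stated in the abstract.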
Abstract:
Non-negative matrix factorization (NMF) [5] is a well-known tool for unsupervised machine learning. It can be viewed as a generalization of K-means clustering, Expectation Maximization based clustering, and aspect modeling by Probabilistic Latent Semantic Analysis (PLSA). Specifically, PLSA is related to NMF with a KL-divergence objective function. Further, it is shown that K-means clustering is a special case of NMF with a matrix L2 norm based error function. In this paper, our objective is to analyze the relation between K-means clustering and PLSA by examining the KL-divergence function and the matrix L2 norm based error function.
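The two error functions contrasted in the abstract above can be written down concretely. The sketch below is only an illustration under stated assumptions: the "matrix L2 norm based error" is read here as the squared Frobenius norm of V - WH, the KL objective is the generalized KL divergence commonly used for NMF, and the names V, W, H and the helper functions are hypothetical rather than the paper's notation.

```python
import math


def reconstruct(W, H):
    """Multiply W (n x r) by H (r x m) using plain nested lists."""
    cols = range(len(H[0]))
    return [[sum(w * H[k][j] for k, w in enumerate(row)) for j in cols] for row in W]


def frobenius_error(V, W, H):
    """Squared Frobenius norm of V - WH (assumed reading of the L2-norm error)."""
    A = reconstruct(W, H)
    return sum((v - a) ** 2 for rv, ra in zip(V, A) for v, a in zip(rv, ra))


def generalized_kl(V, W, H):
    """Generalized KL divergence D(V || WH); assumes (WH)_ij > 0 wherever V_ij > 0."""
    A = reconstruct(W, H)
    total = 0.0
    for rv, ra in zip(V, A):
        for v, a in zip(rv, ra):
            if v > 0:  # the term 0 * log(0 / a) is taken as 0
                total += v * math.log(v / a)
            total += a - v
    return total
```

Both functions return zero when WH reproduces V exactly, so either can serve as the quantity an NMF routine drives down; they differ in how they penalize the residual.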
Abstract:
Facet-based sentiment analysis involves discovering the latent facets, sentiments and their associations. Traditional facet-based sentiment analysis algorithms typically perform the various tasks in sequence, and fail to take advantage of the mutual reinforcement of the tasks. Additionally, inferring sentiment levels typically requires domain knowledge or human intervention. In this paper, we propose a series of probabilistic models that jointly discover latent facets and sentiment topics, and also order the sentiment topics with respect to a multi-point scale, in a language- and domain-independent manner. This is achieved by simultaneously capturing both short-range syntactic structure and long-range semantic dependencies between the sentiment and facet words. The models further incorporate coherence in reviews, where reviewers dwell on one facet or sentiment level before moving on, for more accurate facet and sentiment discovery. For reviews that are supplemented with ratings, our models automatically order the latent sentiment topics, without requiring seed words or domain knowledge. To the best of our knowledge, our work is the first attempt to combine the notions of syntactic and semantic dependencies in the domain of review mining. Further, the concept of facet and sentiment coherence has not been explored earlier either. Extensive experimental results on real-world review data show that the proposed models outperform various state-of-the-art baselines for facet-based sentiment analysis.
Abstract:
In this paper, we present a novel approach that makes use of topic models based on Latent Dirichlet Allocation (LDA) for generating single-document summaries. Our approach is distinguished from other LDA-based approaches in that we identify the summary topics which best describe a given document and only extract sentences from those paragraphs within the document which are highly correlated given the summary topics. This ensures that our summaries always highlight the crux of the document without paying any attention to the grammar and the structure of the documents. Finally, we evaluate our summaries on the DUC 2002 single-document summarization data corpus using ROUGE measures. Our summaries had higher ROUGE values and better semantic similarity with the documents than the DUC summaries.
Abstract:
Time-frequency analyses of various simulated and experimental signals due to elastic wave scattering from damage are performed using the wavelet transform (WT) and the Hilbert-Huang transform (HHT), and their performances are compared in the context of quantifying the damage. The spectral finite element method is employed for numerical simulation of wave scattering. An analytical study is carried out on the effects of higher-order damage parameters on the wave reflected from a damage. Based on this study, error bounds are computed for the signals in the spectral and time-frequency domains. It is shown how such an error bound can provide an estimate of the error in modelling wave propagation in a structure with damage. Measures of damage based on WT and HHT are derived to quantify the damage information hidden in the signal. The aim of this study is to obtain detailed insights into the problems of (1) identifying localised damage, (2) dispersion of multifrequency non-stationary signals after they interact with various types of damage, and (3) quantifying the damage. Sensitivity analysis of the scattered-wave signal based on its time-frequency representation helps to correlate the variation of damage index measures with damage parameters such as damage size and material degradation factors.
Abstract:
Laser-mediated stimulation of biological processes was among the very first effects of lasers documented by Mester et al., but the ambiguous and tissue- and cell-context-specific biological effects of laser radiation are now termed ‘Photobiomodulation’. We found many parallels between the reported biological effects of lasers and a multifaceted growth factor, Transforming Growth Factor-β (TGF-β). This review outlines the interesting parallels between the two fields and our rationale for pursuing their potential causal correlation. We explored this correlation using in vitro assay systems and a human clinical trial on healing wound extraction sockets that we reported in a recent publication. In conclusion, we report that low power laser irradiation can activate latent TGF-β1 and β3 complexes and suggest that this might be one of the major modes of the photobiomodulatory effects of low power lasers.
Abstract:
A semitheoretical equation for latent heat of vaporization has been derived and tested. The average error in predicting the value at the normal boiling point for about 90 compounds, including polar and nonpolar liquids, is about 1.8%. A relation between latent heat of vaporization and surface tension is also derived and is shown to lead to Watson's empirical relation, which gives the change of latent heat of vaporization with temperature. This gives a physico-chemical justification for Watson's empirical relation and provides a rapid method of determining latent heats by measuring surface tension.
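For readers unfamiliar with Watson's empirical relation, the sketch below shows its commonly quoted form, in which a latent heat known at one temperature is scaled to another using reduced temperatures and an exponent of about 0.38. This is an illustration only: the function and parameter names are hypothetical, the exponent is the textbook value rather than anything stated in the abstract, and the paper's own semitheoretical equation is not reproduced here.

```python
def watson_latent_heat(h_ref, t_ref, t, t_crit, exponent=0.38):
    """Scale a known latent heat of vaporization to another temperature.

    h_ref  : latent heat at the reference temperature (any consistent unit)
    t_ref  : reference temperature in kelvin, below the critical temperature
    t      : target temperature in kelvin, at or below the critical temperature
    t_crit : critical temperature in kelvin
    exponent : assumed textbook value of Watson's exponent
    """
    if not (0 < t_ref < t_crit and 0 < t <= t_crit):
        raise ValueError("temperatures must be positive and not exceed t_crit")
    tr_ref = t_ref / t_crit  # reduced reference temperature
    tr = t / t_crit          # reduced target temperature
    return h_ref * ((1.0 - tr) / (1.0 - tr_ref)) ** exponent
```

The relation predicts a latent heat that grows as the temperature falls below the reference point and vanishes at the critical temperature.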
Abstract:
Knowledge-based clusters are studied from the structural point of view. Generalized descriptions for such clusters are stated and illustrated. Peculiarities of certain knowledge-based cluster configurations are highlighted. The adequacy of the connectives “and” and “exclusive-or” in describing such clusters is justified. The definition of “concept” is elaborated from the clustering point of view and used to establish the equivalence between descriptions of clusters and concepts. The order-independence of the semantic-directed clustering approach is established formally based on axiomatic considerations.
Abstract:
It is important to identify the "correct" number of topics in mechanisms like Latent Dirichlet Allocation (LDA), as it determines the quality of the features that are presented to classifiers like SVM. In this work we propose a measure to identify the correct number of topics and offer empirical evidence in its favor in terms of classification accuracy and the number of topics that are naturally present in the corpus. We show the merit of the measure by applying it to real-world as well as synthetic data sets (both text and images). In proposing this measure, we view LDA as a matrix factorization mechanism, wherein a given corpus C is split into two matrix factors, $M_1$ and $M_2$, as given by $C_{d \times w} = M_{1,\,d \times t} \times Q_{t \times w}$, where d is the number of documents present in the corpus and w is the size of the vocabulary. The quality of the split depends on t, the number of topics chosen. The measure is computed in terms of the symmetric KL-divergence of salient distributions that are derived from these matrix factors. We observe that the divergence values are higher for non-optimal numbers of topics; the right value of t shows up as a "dip".
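The selection rule described in the abstract above can be sketched as follows: compute a symmetric KL divergence for each candidate number of topics and keep the candidate at the dip. The abstract does not say how the "salient distributions" are derived from the matrix factors, so this sketch simply takes two discrete distributions as given; all function names are hypothetical.

```python
import math


def kl(p, q):
    """KL(p || q) for discrete distributions; assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    """Symmetric KL divergence: KL(p || q) + KL(q || p)."""
    return kl(p, q) + kl(q, p)


def pick_num_topics(divergence_by_t):
    """Return the candidate t whose divergence is smallest (the 'dip').

    divergence_by_t maps each candidate number of topics to a divergence
    value computed elsewhere (hypothetical input).
    """
    return min(divergence_by_t, key=divergence_by_t.get)
```

In practice one would fit LDA once per candidate t, derive the two distributions from the resulting factors, and pass the collected divergences to `pick_num_topics`.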
Abstract:
The decision to patent a technology is a difficult one to make for the top management of any organization. The expected value that the patent might deliver in the market is an important factor that impacts this judgement. Earlier researchers have suggested that patent prices are better indicators of the value of a patent and that auction prices are the best way of determining value. However, the lack of public data on pricing has prevented research on understanding the dynamics of patent pricing. Our paper uses singleton patent auction price data of Ocean Tomo LLC to study the prices of patents. We describe price characteristics of these patents. The price of these patents was correlated with their age, and a significant correlation was found. A price-age matrix was developed, and we describe the price characteristics of patents using four quadrants of the matrix, namely young and old patents with low and high prices. We also found that patents owned by small firms are transacted more often and that inventor-owned patents attracted a better price than assignee-owned patents.
Abstract:
Indexing of a decagonal quasicrystal using the scheme utilizing five planar vectors and one perpendicular to them is examined in detail. A method for determining the indices of zone axes that a reciprocal vector would make in a decagonal phase of any periodicity has been proposed. By this method, the location of the zone axes made by any reciprocal vector can be predicted. The orthogonality condition has been simplified for the zone axes containing twofold vectors. The locations of zone axes have also been determined by an alternative method, utilizing spherical trigonometric calculations, which confirm the zone-axis locations given by the indices. The effect of one-dimensional periodicity on the indices and the accuracy of the zone-axis determination is discussed. Rules for the formation of zone axes between several reciprocal vectors and the prediction of all the reciprocal vectors in a zone are evolved.
Abstract:
The least path criterion or least path length in the context of redundant basis vector systems is discussed and a mathematical proof is presented of the uniqueness of indices obtained by applying the least path criterion. Though the method has greater generality, this paper concentrates on the two-dimensional decagonal lattice. The order of redundancy is also discussed; this will help eventually to correlate with other redundant but desirable indexing sets.