56 resultados para Latent Dirichlet Allocation

em Indian Institute of Science - Bangalore - Índia


Relevância:

100.00% 100.00%

Publicador:

Resumo:

It is important to identify the ``correct'' number of topics in mechanisms like Latent Dirichlet Allocation(LDA) as they determine the quality of features that are presented as features for classifiers like SVM. In this work we propose a measure to identify the correct number of topics and offer empirical evidence in its favor in terms of classification accuracy and the number of topics that are naturally present in the corpus. We show the merit of the measure by applying it on real-world as well as synthetic data sets(both text and images). In proposing this measure, we view LDA as a matrix factorization mechanism, wherein a given corpus C is split into two matrix factors M-1 and M-2 as given by C-d*w = M1(d*t) x Q(t*w).Where d is the number of documents present in the corpus anti w is the size of the vocabulary. The quality of the split depends on ``t'', the right number of topics chosen. The measure is computed in terms of symmetric KL-Divergence of salient distributions that are derived from these matrix factors. We observe that the divergence values are higher for non-optimal number of topics - this is shown by a `dip' at the right value for `t'.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Song-selection and mood are interdependent. If we capture a song’s sentiment, we can determine the mood of the listener, which can serve as a basis for recommendation systems. Songs are generally classified according to genres, which don’t entirely reflect sentiments. Thus, we require an unsupervised scheme to mine them. Sentiments are classified into either two (positive/negative) or multiple (happy/angry/sad/...) classes, depending on the application. We are interested in analyzing the feelings invoked by a song, involving multi-class sentiments. To mine the hidden sentimental structure behind a song, in terms of “topics”, we consider its lyrics and use Latent Dirichlet Allocation (LDA). Each song is a mixture of moods. Topics mined by LDA can represent moods. Thus we get a scheme of collecting similar-mood songs. For validation, we use a dataset of songs containing 6 moods annotated by users of a particular website.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, we present a novel approach that makes use of topic models based on Latent Dirichlet allocation(LDA) for generating single document summaries. Our approach is distinguished from other LDA based approaches in that we identify the summary topics which best describe a given document and only extract sentences from those paragraphs within the document which are highly correlated given the summary topics. This ensures that our summaries always highlight the crux of the document without paying any attention to the grammar and the structure of the documents. Finally, we evaluate our summaries on the DUC 2002 Single document summarization data corpus using ROUGE measures. Our summaries had higher ROUGE values and better semantic similarity with the documents than the DUC summaries.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

There are many popular models available for classification of documents like Naïve Bayes Classifier, k-Nearest Neighbors and Support Vector Machine. In all these cases, the representation is based on the “Bag of words” model. This model doesn't capture the actual semantic meaning of a word in a particular document. Semantics are better captured by proximity of words and their occurrence in the document. We propose a new “Bag of Phrases” model to capture this discriminative power of phrases for text classification. We present a novel algorithm to extract phrases from the corpus using the well known topic model, Latent Dirichlet Allocation(LDA), and to integrate them in vector space model for classification. Experiments show a better performance of classifiers with the new Bag of Phrases model against related representation models.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Cum ./LSTA_A_8828879_O_XML_IMAGES/LSTA_A_8828879_O_ILM0001.gif rule [Singh (1975)] has been suggested in the literature for finding approximately optimum strata boundaries for proportional allocation, when the stratification is done on the study variable. This paper shows that for the class of density functions arising from the Wang and Aggarwal (1984) representation of the Lorenz Curve (or DBV curves in case of inventory theory), the cum ./LSTA_A_8828879_O_XML_IMAGES/LSTA_A_8828879_O_ILM0002.gif rule in place of giving approximately optimum strata boundaries, yields exactly optimum boundaries. It is also shown that the conjecture of Mahalanobis (1952) “. . .an optimum or nearly optimum solutions will be obtained when the expected contribution of each stratum to the total aggregate value of Y is made equal for all strata” yields exactly optimum strata boundaries for the case considered in the paper.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Fuzzy Waste Load Allocation Model (FWLAM), developed in an earlier study, derives the optimal fractional levels, for the base flow conditions, considering the goals of the Pollution Control Agency (PCA) and dischargers. The Modified Fuzzy Waste Load Allocation Model (MFWLAM) developed subsequently is a stochastic model and considers the moments (mean, variance and skewness) of water quality indicators, incorporating uncertainty due to randomness of input variables along with uncertainty due to imprecision. The risk of low water quality is reduced significantly by using this modified model, but inclusion of new constraints leads to a low value of acceptability level, A, interpreted as the maximized minimum satisfaction in the system. To improve this value, a new model, which is a combination Of FWLAM and MFWLAM, is presented, allowing for some violations in the constraints of MFWLAM. This combined model is a multiobjective optimization model having the objectives, maximization of acceptability level and minimization of violation of constraints. Fuzzy multiobjective programming, goal programming and fuzzy goal programming are used to find the solutions. For the optimization model, Probabilistic Global Search Lausanne (PGSL) is used as a nonlinear optimization tool. The methodology is applied to a case study of the Tunga-Bhadra river system in south India. The model results in a compromised solution of a higher value of acceptability level as compared to MFWLAM, with a satisfactory value of risk. Thus the goal of risk minimization is achieved with a comparatively better value of acceptability level.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents design of a Low power 256x72 bit TCAM in 0.13um CMOS technology. In contrast to conventional Match line (ML) sensing scheme in which equal power is consumed irrespective of match or mismatch, the ML scheme employed in this design allocates less power to match decisions involving a large number of mismatched bits. Typically, the probability of mismatch is high so this scheme results in significant CAM power reduction. We propose to use this technique along with pipelining of search operation in which the MLs are broken into several segments. Since most words fail to match in first segment, the search operation for subsequent segments is discontinued, resulting in further reduction in power consumption. The above architecture provides 70% power reduction while performing search in 3ns.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents a detailed analysis of a model for military conflicts where the defending forces have to determine an optimal partitioning of available resources to counter attacks from an adversary in two different fronts in an area fire situation. Lanchester linear law attrition model is used to develop the dynamical equations governing the variation in force strength. Here we address a static resource allocation problem namely, Time-Zero-Allocation (TZA) where the resource allocation is done only at the initial time. Numerical examples are given to support the analytical results.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Laser mediated stimulation of biological process was amongst its very first effects documented by Mester et al. but the ambiguous and tissue-cell context specific biological effects of laser radiation is now termed ‘Photobiomodulation’. We found many parallels between the reported biological effects of lasers and a multiface-ted growth factor, Transforming Growth Factor-β (TGF-β). This review outlines the interestingparallelsbetween the twofieldsand our rationalefor pursuingtheir potential causal correlation. We explored this correlation using an in vitro assay systems and a human clinical trial on healing wound extraction sockets that we reported in a recent publication. In conclusion we report that low power laser irradiation can activate latent TGF-β1 and β3 complexes and suggest that this might be one of the major modes of the photobiomodulatory effects of low power lasers.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Bandwidth allocation for multimedia applications in case of network congestion and failure poses technical challenges due to bursty and delay sensitive nature of the applications. The growth of multimedia services on Internet and the development of agent technology have made us to investigate new techniques for resolving the bandwidth issues in multimedia communications. Agent technology is emerging as a flexible promising solution for network resource management and QoS (Quality of Service) control in a distributed environment. In this paper, we propose an adaptive bandwidth allocation scheme for multimedia applications by deploying the static and mobile agents. It is a run-time allocation scheme that functions at the network nodes. This technique adaptively finds an alternate patchup route for every congested/failed link and reallocates the bandwidth for the affected multimedia applications. The designed method has been tested (analytical and simulation)with various network sizes and conditions. The results are presented to assess the performance and effectiveness of the approach. This work also demonstrates some of the benefits of the agent based schemes in providing flexibility, adaptability, software reusability, and maintainability. (C) 2004 Elsevier Inc. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Bandwidth allocation for multimedia applications in case of network congestion and failure poses technical challenges due to bursty and delay sensitive nature of the applications. The growth of multimedia services on Internet and the development of agent technology have made us to investigate new techniques for resolving the bandwidth issues in multimedia communications. Agent technology is emerging as a flexible promising solution for network resource management and QoS (Quality of Service) control in a distributed environment. In this paper, we propose an adaptive bandwidth allocation scheme for multimedia applications by deploying the static and mobile agents. It is a run-time allocation scheme that functions at the network nodes. This technique adaptively finds an alternate patchup route for every congested/failed link and reallocates the bandwidth for the affected multimedia applications. The designed method has been tested (analytical and simulation)with various network sizes and conditions. The results are presented to assess the performance and effectiveness of the approach. This work also demonstrates some of the benefits of the agent based schemes in providing flexibility, adaptability, software reusability, and maintainability. (C) 2004 Elsevier Inc. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A semitheoretical equation for latent heat of vaporization has been derived and tested. The average error in predicting the value at the normal boiling point in the case of about 90 compounds, which includes polar and nonpolar liquids, is about 1.8%. A relation between latent heat of vaporization and surface tension is also derived and is shown to lead to Watson's empirical relation which gives the change of latent heat of vaporization with temperature. This gives a physico-chemical justification for Watson's empirical relation and provides a rapid method of determining latent heats by measuring surface tension.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The decision to patent a technology is a difficult one to make for the top management of any organization. The expected value that the patent might deliver in the market is an important factor that impacts this judgement. Earlier researchers have suggested that patent prices are better indicators of value of a patent and that auction prices are the best way of determining value. However, the lack of public data on pricing has prevented research on understanding the dynamics of patent pricing. Our paper uses singleton patent auction price data of Ocean Tomo LLC to study the prices of patents. We describe price characteristics of these patents. The price of these patents was correlated with their age, and a significant correlation was found. A price - age matrix was developed and we describe the price characteristics of patents using four quadrants of the matrix, namely young and old patents with low and high prices. We also found that patents owned by small firms get transacted more often and inventor owned patents attracted a better price than assignee owned patents.

Relevância:

20.00% 20.00%

Publicador: