311 results for singular value decomposition (SVD)
Abstract:
This article explores two matrix methods for inducing the "shades of meaning" (SoM) of a word. A matrix representation of a word is computed from a corpus of traces based on the given word. Non-negative Matrix Factorisation (NMF) and Singular Value Decomposition (SVD) each compute a set of vectors, each corresponding to a potential shade of meaning. The two methods were evaluated by the loss of conditional entropy with respect to two sets of manually tagged data: one set reflects concepts generally appearing in text, and the second comprises words used in investigations of word sense disambiguation. Results show that NMF consistently outperforms SVD for inducing both the SoM of general concepts and word senses. The problem of inducing the shades of meaning of a word is more subtle than that of word sense induction, and hence relevant to thematic analysis of opinion, where nuances of opinion can arise.
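A minimal sketch of the two factorisations being compared, assuming a toy word-by-context co-occurrence matrix and an invented number of components (illustrative Python, not the authors' pipeline):

    import numpy as np
    from sklearn.decomposition import NMF, TruncatedSVD

    rng = np.random.default_rng(0)
    X = rng.poisson(1.0, size=(50, 200)).astype(float)  # hypothetical co-occurrence counts

    k = 5  # number of candidate shades of meaning (a free parameter)
    W = NMF(n_components=k, init="nndsvd", max_iter=500).fit_transform(X)  # nonnegative parts
    U = TruncatedSVD(n_components=k).fit_transform(X)                      # signed factors

    # NMF components are nonnegative and additive, which makes them easier to read
    # as distinct "shades"; SVD components mix signs and cancel, one intuition
    # behind the reported NMF advantage.
    print(W.shape, U.shape)  # (50, 5) (50, 5)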
Abstract:
The computation of compact and meaningful representations of high dimensional sensor data has recently been addressed through the development of Nonlinear Dimensional Reduction (NLDR) algorithms. The numerical implementation of spectral NLDR techniques typically leads to a symmetric eigenvalue problem that is solved by traditional batch eigensolution algorithms. The application of such algorithms in real-time systems necessitates the development of sequential algorithms that perform feature extraction online. This paper presents an efficient online NLDR scheme, Sequential-Isomap, based on incremental singular value decomposition (SVD) and the Isomap method. Example simulations demonstrate the validity and significant potential of this technique in real-time applications such as autonomous systems.
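A hedged sketch of the core building block, a rank-one incremental SVD update applied when a new data column arrives; the update rule is a standard one, and the details here are illustrative rather than the authors' Sequential-Isomap internals:

    import numpy as np

    def svd_append_column(U, s, Vt, c, k):
        """Update a rank-k SVD  X ~= U @ diag(s) @ Vt  after appending column c."""
        p = U.T @ c                        # component of c inside the current subspace
        r = c - U @ p                      # residual orthogonal to the subspace
        rho = np.linalg.norm(r)
        q = r / rho if rho > 1e-12 else np.zeros_like(c)
        # SVD of a small (k+1) x (k+1) core matrix re-diagonalises the update.
        K = np.block([[np.diag(s), p[:, None]],
                      [np.zeros((1, len(s))), np.array([[rho]])]])
        Uk, sk, Vkt = np.linalg.svd(K)
        U_new = np.hstack([U, q[:, None]]) @ Uk
        V_new = np.block([[Vt.T, np.zeros((Vt.shape[1], 1))],
                          [np.zeros((1, Vt.shape[0])), np.ones((1, 1))]]) @ Vkt.T
        return U_new[:, :k], sk[:k], V_new[:, :k].T   # truncate back to rank k

    # Usage: batch SVD of the first few samples, then stream in the rest.
    X = np.random.default_rng(1).normal(size=(20, 10))
    U, s, Vt = np.linalg.svd(X[:, :5], full_matrices=False)
    for j in range(5, 10):
        U, s, Vt = svd_append_column(U, s, Vt, X[:, j], k=5)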
Abstract:
Internet chatrooms are a common means of interaction and communication, and they carry valuable information about formal or ad hoc formation of groups with diverse objectives. This work presents a fully automated surveillance system for data collection and analysis in Internet chatrooms. The system has two components: first, an eavesdropping tool that collects statistics on individual (chatter) and chatroom behavior, which can be used to profile a chatroom and its chatters; second, a computational discovery algorithm based on Singular Value Decomposition (SVD) that locates hidden communities and communication patterns within a chatroom. The eavesdropping tool is used for fine-tuning the SVD-based discovery algorithm, which can be deployed in real time and requires no semantic information processing. Evaluation of the system on real data shows that (i) the statistical properties of different chatrooms vary significantly, so profiling is possible, and (ii) the SVD-based algorithm discovers groups of chatters with up to 70-80% accuracy.
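The flavour of such a discovery step can be pictured with a generic spectral sketch (the activity matrix, window counts, and number of groups below are invented, and k-means on singular vectors is a stand-in; the paper's actual algorithm may differ):

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(2)
    A = (rng.random((30, 100)) < 0.05).astype(float)  # toy: which chatter spoke in which time window

    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    embedding = U[:, :3] * s[:3]                      # chatters in a 3-D spectral space
    groups = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(embedding)
    print(groups)                                     # candidate hidden communities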
Abstract:
A description of a patient's injuries is recorded in narrative text form by hospital emergency departments. For statistical reporting, this text data needs to be mapped to pre-defined codes. Existing research in this field uses the Naïve Bayes probabilistic method to build classifiers for the mapping. In this paper, we focus on providing guidance on the selection of a classification method. We build a number of classifiers belonging to different classification families: decision tree, probabilistic, neural network, instance-based, ensemble-based, and kernel-based linear classifiers. Extensive pre-processing is carried out to ensure the quality of the data and, hence, of the classification outcome. Records with a null entry in the injury description are removed. Misspellings are corrected by finding and replacing each misspelt word with a sound-alike word. Meaningful phrases are identified and kept intact, rather than having parts of a phrase removed as stop words. Abbreviations appearing in many forms of entry are manually identified, and only one form of each abbreviation is used. Clustering is utilised to discriminate between non-frequent and frequent terms. This process reduced the number of text features dramatically, from about 28,000 to 5,000. The medical narrative text injury dataset under consideration is composed of many short documents. The data can be characterized as high-dimensional and sparse; few features are irrelevant, but features are correlated with one another. Therefore, matrix factorization techniques such as Singular Value Decomposition (SVD) and Non-negative Matrix Factorization (NNMF) have been used to map the processed feature space to a lower-dimensional feature space, and classifiers have been built on this reduced feature space. In experiments, a set of tests is conducted to determine which classification method is best for this medical text classification task. Non-negative Matrix Factorization combined with a Support Vector Machine achieves 93% precision, higher than all the traditional classifiers tested. We also found that TF/IDF weighting, which works well for long-text classification, is inferior to binary weighting for short-document classification. Another finding is that the top-n terms should be removed in consultation with medical experts, as their removal affects classification performance.
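The best-performing combination reported above (NNMF features feeding an SVM, with binary weighting) can be sketched as follows; the documents and target codes are hypothetical stand-ins for the injury corpus:

    from sklearn.pipeline import Pipeline
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import NMF
    from sklearn.svm import LinearSVC

    docs = ["fell from ladder fractured wrist", "burn from hot oil left hand",
            "dog bite right forearm", "sprained ankle playing football"]
    codes = ["fall", "burn", "bite", "fall"]          # hypothetical target codes

    clf = Pipeline([
        ("bow", CountVectorizer(binary=True)),        # binary weighting, per the finding above
        ("nmf", NMF(n_components=2, max_iter=500)),   # reduced, non-negative feature space
        ("svm", LinearSVC()),
    ])
    clf.fit(docs, codes)
    print(clf.predict(["fell off bike hurt ankle"]))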
Abstract:
Two decades after its inception, Latent Semantic Analysis (LSA) has become part and parcel of every modern introduction to Information Retrieval. For any tool that matures so quickly, it is important to check its lore and limitations, or else stagnation will set in. We focus here on the three main aspects of LSA that are well accepted, the gist of which can be summarized as follows: (1) LSA recovers latent semantic factors underlying the document space; (2) this can be accomplished through lossy compression of the document space by eliminating lexical noise; and (3) the latter is best achieved by Singular Value Decomposition. For each aspect we performed experiments analogous to those reported in the LSA literature and compared the evidence brought to bear in each case. On the negative side, we show that the above claims about LSA are much more limited than commonly believed. Even a simple example shows that LSA does not recover the optimal semantic factors intended in the pedagogical example used in many LSA publications. Additionally, and remarkably deviating from LSA lore, LSA does not scale up well: the larger the document space, the more unlikely it is that LSA recovers an optimal set of semantic factors. On the positive side, we describe new algorithms that replace LSA (and more recent alternatives such as pLSA, LDA, and kernel methods) by trading its l2 space for an l1 space, thereby guaranteeing an optimal set of semantic factors. These algorithms seem to salvage the spirit of LSA as we think it was initially conceived.
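For reference, the textbook LSA construction the article scrutinises is only a few lines: a rank-k truncated SVD of a weighted term-document matrix, whose k components are the claimed "latent semantic factors" (toy documents below):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD

    docs = ["human machine interface", "user interface system",
            "graph of trees", "minors of a graph"]
    X = TfidfVectorizer().fit_transform(docs)     # weighted term-document matrix
    lsa = TruncatedSVD(n_components=2).fit(X)     # k = 2 latent "semantic factors"
    print(lsa.transform(X).round(2))              # documents in the 2-factor space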
Abstract:
The aim of this paper is to provide a comparison of various algorithms and parameters for building reduced semantic spaces. The effect of dimension reduction, the stability of the representation, and the effect of word order are examined in the context of five algorithms for building semantic vectors: random projection (RP), singular value decomposition (SVD), non-negative matrix factorization (NMF), permutations, and holographic reduced representations (HRR). The quality of the semantic representation was tested by means of a synonym-finding task, using the TOEFL test on the TASA corpus. Dimension reduction was found to improve the quality of the semantic representation, but it is hard to find the optimal parameter settings. Even though dimension reduction by RP was found to be more generally applicable than SVD, the semantic vectors produced by RP are somewhat unstable. Encoding word order into the semantic vector representation via HRR did not lead to any increase in scores over vectors constructed from word co-occurrence-in-context information. In this regard, very small context windows resulted in better semantic vectors for the TOEFL test.
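Two of the five methods compared above can be contrasted in a few lines; the matrix size and target dimensionality are invented for illustration:

    import numpy as np
    from sklearn.decomposition import TruncatedSVD
    from sklearn.random_projection import GaussianRandomProjection

    X = np.random.default_rng(3).poisson(1.0, size=(500, 2000)).astype(float)  # toy co-occurrences

    X_rp = GaussianRandomProjection(n_components=100).fit_transform(X)   # cheap, data-independent
    X_svd = TruncatedSVD(n_components=100).fit_transform(X)              # optimal rank-100 l2 fit
    print(X_rp.shape, X_svd.shape)   # both (500, 100), with different stability properties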
Abstract:
This paper analyzes the performance of some of the widely used voltage stability indices, namely the singular value, the eigenvalue, and the loading margin, with different static load models. The well-known ZIP model is used to represent loads having components with different power-to-voltage sensitivities. Studies are carried out on a 10-bus power system and the New England 39-bus power system models. The effects of varying the load model on the performance of the voltage stability indices are discussed, and the choice of voltage stability index in the context of load modelling is also suggested.
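The singular-value index mentioned above reduces to one line of linear algebra: the smallest singular value of the power-flow Jacobian, which approaches zero as the operating point nears voltage collapse. The matrix below is a hypothetical stand-in, not a real network Jacobian:

    import numpy as np

    J = np.array([[10.0, -2.0, -1.0],
                  [-2.0,  8.0, -3.0],
                  [-1.0, -3.0,  6.0]])   # hypothetical reduced power-flow Jacobian

    sigma_min = np.linalg.svd(J, compute_uv=False).min()
    print(f"voltage stability index (minimum singular value): {sigma_min:.3f}")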
Abstract:
The results of a numerical investigation into the errors for least squares estimates of function gradients are presented. The underlying algorithm is obtained by constructing a least squares problem using a truncated Taylor expansion. An error bound associated with this method contains in its numerator terms related to the Taylor series remainder, while its denominator contains the smallest singular value of the least squares matrix. Perhaps for this reason the error bounds are often found to be pessimistic by several orders of magnitude. The circumstance under which these poor estimates arise is elucidated and an empirical correction of the theoretical error bounds is conjectured and investigated numerically. This is followed by an indication of how the conjecture is supported by a rigorous argument.
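A sketch of the estimator under discussion, assuming the stated first-order Taylor truncation: differences f(x_i) - f(x0) are regressed on displacements x_i - x0, and the smallest singular value of the displacement matrix is the denominator of the error bound (the function and sample points are invented):

    import numpy as np

    def ls_gradient(f, x0, neighbours):
        D = neighbours - x0                            # displacement rows x_i - x0
        df = np.array([f(x) for x in neighbours]) - f(x0)
        g, *_ = np.linalg.lstsq(D, df, rcond=None)     # least squares gradient estimate
        sigma_min = np.linalg.svd(D, compute_uv=False).min()
        return g, sigma_min                            # sigma_min enters the bound's denominator

    f = lambda x: x[0] ** 2 + 3.0 * x[1]               # true gradient at (1, 1) is (2, 3)
    x0 = np.array([1.0, 1.0])
    pts = x0 + 1e-3 * np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 1.0]])
    print(ls_gradient(f, x0, pts))                     # g ~= [2, 3], tiny sigma_min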
Abstract:
A fixed-bed pyrolysis system has been designed and fabricated for obtaining liquid fuel from Mahogany seeds. The major components of the system are the fixed-bed pyrolysis reactor, a liquid condenser, and liquid collectors. Mahogany seed in particle form is pyrolysed in an externally heated fixed-bed reactor, 10 cm in diameter and 36 cm high, with nitrogen as the carrier gas. The reactor is heated by means of a biomass-source cylindrical heater from 450 °C to 600 °C. The products are oil, char, and gas. The reactor bed temperature, running time, and feed particle size are considered as process parameters. A maximum liquid yield of 54 wt% of the biomass feed is obtained with a particle size of 1.18 mm at a reactor bed temperature of 550 °C and a running time of 90 minutes. The oil is found to possess a favorable flash point and reasonable density and viscosity. The higher calorific value is found to be 39.9 MJ/kg, which is higher than that of other biomass-derived pyrolysis oils.
Abstract:
The morphological and chemical changes occurring during the thermal decomposition of weddellite, CaC2O4·2H2O, have been followed in real time in a heating stage attached to an Environmental Scanning Electron Microscope operating at a pressure of 2 Torr, with a heating rate of 10 °C/min and an equilibration time of approximately 10 min. The dehydration step around 120 °C and the loss of CO around 425 °C do not involve changes in morphology, but changes in composition were observed. The final reaction of CaCO3 to CaO, evolving CO2 around 600 °C, involved the formation of chains of very small oxide particles pseudomorphic to the original oxalate crystals. The change in chemical composition could only be observed after cooling the sample to 350 °C because of the effects of thermal radiation.
Abstract:
The thermal stability and thermal decomposition pathways of synthetic iowaite have been determined using thermogravimetry in conjunction with evolved gas mass spectrometry. Chemical analysis showed the formula of the synthesised iowaite to be Mg6.27Fe1.73(Cl)1.07(OH)16(CO3)0.33·6.1H2O, and X-ray diffraction confirmed the layered structure. Dehydration of the iowaite occurred at 35 and 79 °C; dehydroxylation occurred at 254 and 291 °C. Both steps were associated with the loss of CO2. Hydrogen chloride gas was evolved in two steps, at 368 and 434 °C. The products of the thermal decomposition were MgO and the spinel MgFe2O4. Experimentally, it proved difficult to keep CO2 from being included in the interlayer during synthesis, and in this way the synthesised iowaite resembled the natural mineral.