86 results for Latent semantic indexing
in CentAUR: Central Archive University of Reading - UK
Abstract:
In any data mining application, automated retrieval of text, and of text and images, is needed. This becomes essential with the growth of the Internet and digital libraries. Our approach is based on latent semantic indexing (LSI) and the corresponding term-by-document matrix suggested by Berry and his co-authors. Instead of using deterministic methods to find the required number of first "k" singular triplets, we propose a stochastic approach. First, we use a Monte Carlo method to sample and build a much smaller term-by-document matrix (e.g. a k × k matrix), from which we then find the first "k" triplets using standard deterministic methods. Second, we investigate how the problem can be reduced to finding the "k" largest eigenvalues using parallel Monte Carlo methods. We apply these methods to the initial matrix and also to the reduced one. The algorithms run on a cluster of workstations under MPI. We present results of experiments in textual retrieval of Web documents, together with a comparison of the proposed stochastic methods. (C) 2003 IMACS. Published by Elsevier Science B.V. All rights reserved.
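For orientation, the deterministic step that the abstract takes as its starting point (finding leading singular triplets of a term-by-document matrix) can be sketched with plain power iteration. This is an illustrative sketch only, not the authors' Monte Carlo method; the function name `top_singular_triplet` and the toy matrix are assumptions introduced here.

```python
import math
import random


def top_singular_triplet(A, iters=200, seed=0):
    """Estimate the largest singular triplet (sigma, u, v) of A.

    Uses power iteration on A^T A. A is a list of rows
    (terms x documents). Hypothetical helper for illustration.
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    # Start from a strictly positive vector so it is not orthogonal
    # to the leading right singular vector of a non-negative matrix.
    v = [rng.random() + 0.1 for _ in range(n)]
    for _ in range(iters):
        u = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]  # u = A v
        w = [sum(A[i][j] * u[i] for i in range(m)) for j in range(n)]  # w = A^T u
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    u = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
    sigma = math.sqrt(sum(x * x for x in u))
    u = [x / sigma for x in u]
    return sigma, u, v


# Toy term-by-document matrix: rows are terms, columns are documents.
toy = [
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0],
]
sigma, u, v = top_singular_triplet(toy)
print(round(sigma, 4))
```

Further triplets could be obtained by deflating the matrix and repeating, which is where the cost grows and where a sampled, smaller matrix as proposed in the abstract would help.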
Abstract:
A novel framework for multimodal semantic-associative collateral image labelling, aiming at associating image regions with textual keywords, is described. Both the primary image and collateral textual modalities are exploited in a cooperative and complementary fashion. The collateral content and context based knowledge is used to bias the mapping from the low-level region-based visual primitives to the high-level visual concepts defined in a visual vocabulary. We introduce the notion of collateral context, which is represented as a co-occurrence matrix of the visual keywords. A collaborative mapping scheme is devised using statistical methods such as Gaussian distribution or Euclidean distance, together with a collateral content and context-driven inference mechanism. Finally, we use Self Organising Maps to examine the classification and retrieval effectiveness of the proposed high-level image feature vector model, which is constructed from the image labelling results.
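A keyword co-occurrence matrix of the kind mentioned above can be illustrated with a simple count over labelled images. This is a minimal sketch under assumed inputs (each image given as a list of keywords); the function name `cooccurrence` and the sample labels are hypothetical and do not come from the paper.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence(label_sets):
    """Count how often each pair of keywords labels the same image.

    Returns a symmetric sparse matrix as a dict keyed by (a, b).
    """
    counts = defaultdict(int)
    for labels in label_sets:
        # Deduplicate so a keyword repeated within one image counts once.
        for a, b in combinations(sorted(set(labels)), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return dict(counts)


# Hypothetical labelled images.
images = [["sky", "sea", "sand"], ["sky", "sea"], ["sky", "grass"]]
matrix = cooccurrence(images)
print(matrix[("sea", "sky")])
```

Dividing each row by its total would turn the counts into conditional frequencies, one plausible way to use such a matrix as contextual bias.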
Abstract:
A large volume of visual content is inaccessible until effective and efficient indexing and retrieval of such data is achieved. In this paper, we introduce the DREAM system, which is a knowledge-assisted semantic-driven context-aware visual information retrieval system applied in the film post production domain. We mainly focus on the automatic labelling and topic map related aspects of the framework. The use of the context-related collateral knowledge, represented by a novel probabilistic visual keyword co-occurrence matrix, has been proven effective in the experiments conducted during system evaluation. The automatically generated semantic labels were fed into the Topic Map Engine, which can automatically construct ontological networks using Topic Maps technology. This dramatically enhances the indexing and retrieval performance of the system towards an even higher semantic level.
Abstract:
There are still major challenges in the area of automatic indexing and retrieval of digital data. The main problem arises from the ever increasing mass of digital media and the lack of efficient methods for indexing and retrieval of such data based on the semantic content rather than keywords. To enable intelligent web interactions or even web filtering, we need to be capable of interpreting the information base in an intelligent manner. Research has been ongoing for a few years in the field of ontological engineering with the aim of using ontologies to add knowledge to information. In this paper we describe the architecture of a system designed to automatically and intelligently index huge repositories of special effects video clips, based on their semantic content, using a network of scalable ontologies to enable intelligent retrieval.
Abstract:
Scene classification based on latent Dirichlet allocation (LDA) is a more general modeling method known as a bag of visual words, in which the construction of a visual vocabulary is a crucial quantization process to ensure success of the classification. A framework is developed using the following new aspects: Gaussian mixture clustering for the quantization process, the use of an integrated visual vocabulary (IVV), which is built as the union of all centroids obtained from the separate quantization process of each class, and the usage of some features, including edge orientation histogram, CIELab color moments, and gray-level co-occurrence matrix (GLCM). The experiments are conducted on IKONOS images with six semantic classes (tree, grassland, residential, commercial/industrial, road, and water). The results show that the use of an IVV increases the overall accuracy (OA) by 11 to 12% and 6% when it is implemented on the selected and all features, respectively. The selected features of CIELab color moments and GLCM provide a better OA than the implementation over CIELab color moment or GLCM as individuals. The latter increases the OA by only ∼2 to 3%. Moreover, the results show that the OA of LDA outperforms the OA of C4.5 and naive Bayes tree by ∼20%. © 2014 Society of Photo-Optical Instrumentation Engineers (SPIE) [DOI: 10.1117/1.JRS.8.083690]
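One of the features named above, the gray-level co-occurrence matrix (GLCM), can be sketched as a count of gray-level pairs at a fixed pixel offset. The code below is an unnormalised, non-symmetric illustration on a made-up 3 × 3 patch; the function name `glcm` and its parameters are assumptions, not taken from the paper.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count pairs (i, j) where a pixel of gray level i has a pixel of
    gray level j at offset (dx, dy). image is a list of rows of ints
    in the range 0..levels-1."""
    rows, cols = len(image), len(image[0])
    matrix = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            # Skip neighbours that fall outside the image.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


# Hypothetical patch quantised to three gray levels.
patch = [
    [0, 0, 1],
    [1, 2, 2],
    [0, 1, 2],
]
horizontal = glcm(patch, levels=3)
print(horizontal)
```

Texture statistics such as contrast or homogeneity are then typically computed from the normalised matrix.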
Abstract:
The flow dynamics of crystal-rich high-viscosity magma is likely to be strongly influenced by viscous and latent heat release. Viscous heating is observed to play an important role in the dynamics of fluids with temperature-dependent viscosities. The growth of microlite crystals and the accompanying release of latent heat should play a similar role in raising fluid temperatures. Earlier models of viscous heating in magmas have shown the potential for unstable (thermal runaway) flow as described by a Gruntfest number, using an Arrhenius temperature dependence for the viscosity, but have not considered crystal growth or latent heating. We present a theoretical model for magma flow in an axisymmetric conduit and consider both heating effects using Finite Element Method techniques. We consider a constant mass flux in a 1-D infinitesimal conduit segment with isothermal and adiabatic boundary conditions and Newtonian and non-Newtonian magma flow properties. We find that the growth of crystals acts to stabilize the flow field and make the magma less likely to experience a thermal runaway. The additional heating influences crystal growth and can counteract supercooling from degassing-induced crystallization and drive the residual melt composition back towards the liquidus temperature. We illustrate the models with results generated using parameters appropriate for the andesite lava dome-forming eruption at Soufriere Hills Volcano, Montserrat. These results emphasize the radial variability of the magma. Both viscous and latent heating effects are shown to be capable of playing a significant role in the eruption dynamics of Soufriere Hills Volcano. Latent heating is a factor in the top two kilometres of the conduit and may be responsible for relatively short-term (days) transients. Viscous heating is less restricted spatially, but because thermal runaway requires periods of hundreds of days to be achieved, the process is likely to be interrupted. 
Our models show that thermal evolution of the conduit walls could lead to an increase in the effective diameter of flow and an increase in flux at constant magma pressure.
Abstract:
In this paper, we introduce a novel high-level visual content descriptor which is devised for performing semantic-based image classification and retrieval. The work can be treated as an attempt to bridge the so-called “semantic gap”. The proposed image feature vector model is fundamentally underpinned by the image labelling framework, called Collaterally Confirmed Labelling (CCL), which incorporates the collateral knowledge extracted from the collateral texts of the images with the state-of-the-art low-level image processing and visual feature extraction techniques for automatically assigning linguistic keywords to image regions. Two different high-level image feature vector models are developed based on the CCL labelling results, for the purposes of image data clustering and retrieval respectively. A subset of the Corel image collection has been used for evaluating our proposed method. The experimental results to date already indicate that our proposed semantic-based visual content descriptors outperform both traditional visual and textual image feature models.
Abstract:
The storage and processing capacity realised by computing has led to an explosion of data retention. We have now reached the point of information overload and must begin to use computers to process more complex information. In particular, the proposition of the Semantic Web has given structure to this problem, but has yet to be realised practically. The largest of its problems is that of ontology construction; without a suitable automatic method most ontologies will have to be encoded by hand. In this paper we discuss the current methods for semi- and fully automatic construction and their current shortcomings. In particular we pay attention to the application of ontologies to products and the practical application of the ontologies.
Abstract:
Many ontologies are currently available for addressing different domains. However, it is not always possible to deploy such ontologies to support collaborative working, so that their full potential can be exploited to implement intelligent cooperative applications capable of reasoning over a network of context-specific ontologies. The main problem arises from the fact that ontologies are presently created in isolation to address specific needs. However, we foresee the need for a network of ontologies which will support the next generation of intelligent applications/devices and the vision of Ambient Intelligence. The main objective of this paper is to motivate the design of a networked ontology (Meta) model which formalises ways of connecting available ontologies so that they are easy to search, to characterise and to maintain. The aim is to make explicit the virtual and implicit network of ontologies serving the Semantic Web.
Abstract:
Sensible and latent heat fluxes are often calculated from bulk transfer equations combined with the energy balance. For spatial estimates of these fluxes, a combination of remotely sensed and standard meteorological data from weather stations is used. The success of this approach depends on the accuracy of the input data and on the accuracy of two variables in particular: aerodynamic and surface conductance. This paper presents a Bayesian approach to improve estimates of sensible and latent heat fluxes by using a priori estimates of aerodynamic and surface conductance alongside remote measurements of surface temperature. The method is validated for time series of half-hourly measurements in a fully grown maize field, a vineyard and a forest. It is shown that the Bayesian approach yields more accurate estimates of sensible and latent heat flux than traditional methods.
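For readers unfamiliar with the terms, the bulk transfer and energy balance relations referred to above are commonly written as follows. The symbols are the conventional ones and are assumptions here, since the abstract does not define them: $H$ sensible heat flux, $\lambda E$ latent heat flux, $R_n$ net radiation, $G$ ground heat flux, $\rho$ air density, $c_p$ specific heat of air, $\gamma$ psychrometric constant, $T_s$ and $T_a$ surface and air temperature, $e_s(T_s)$ saturation vapour pressure at surface temperature, $e_a$ air vapour pressure, and $g_a$, $g_s$ the aerodynamic and surface conductances. The paper's exact formulation may differ.

```latex
\begin{align}
  H &= \rho\, c_p\, g_a\, (T_s - T_a) \\
  \lambda E &= \frac{\rho\, c_p}{\gamma}\,
               \frac{e_s(T_s) - e_a}{1/g_a + 1/g_s} \\
  R_n - G &= H + \lambda E
\end{align}
```

Written this way, it is visible why errors in $g_a$ and $g_s$ propagate directly into both flux estimates, which is the sensitivity the Bayesian approach is meant to reduce.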
Abstract:
Chess endgame tables should provide efficiently the value and depth of any required position during play. The indexing of an endgame’s positions is crucial to meeting this objective. This paper updates Heinz’ previous review of approaches to indexing and describes the latest approach by the first and third authors. Heinz’ and Nalimov’s endgame tables (EGTs) encompass the en passant rule and have the most compact index schemes to date. Nalimov’s EGTs, to the Distance-to-Mate (DTM) metric, require only 30.6 × 10^9 elements in total for all the 3-to-5-man endgames and are individually more compact than previous tables. His new index scheme has proved itself while generating the tables and in the 1999 World Computer Chess Championship where many of the top programs used the new suite of EGTs.
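To see why index compactness matters, consider a deliberately naive index that packs each piece's square and the side to move into one integer. This sketch is a baseline for comparison only and is not Heinz's or Nalimov's scheme; the function names and the square numbering (0 to 63) are assumptions.

```python
def naive_index(squares, white_to_move):
    """Pack piece squares (each 0..63) and the side to move into one
    integer, treating the squares as base-64 digits."""
    index = 0
    for sq in squares:
        if not 0 <= sq < 64:
            raise ValueError("square out of range")
        index = index * 64 + sq
    return index * 2 + (0 if white_to_move else 1)


def naive_table_size(num_pieces):
    """Number of slots the naive scheme reserves for one endgame."""
    return 2 * 64 ** num_pieces


# A three-man endgame such as KQK under the naive scheme.
print(naive_table_size(3))
print(naive_index([4, 3, 60], white_to_move=True))
```

The naive scheme reserves slots for placements that can never occur (two pieces on one square, adjacent kings) and for positions that are equivalent under board symmetry; compact schemes gain their savings by excluding these.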