24 results for cross-language information retrieval
in CentAUR: Central Archive University of Reading - UK
Abstract:
In any data mining application, automated text and text-and-image retrieval of information is needed. This becomes essential with the growth of the Internet and digital libraries. Our approach is based on latent semantic indexing (LSI) and the corresponding term-by-document matrix suggested by Berry and his co-authors. Instead of using deterministic methods to find the required number of first "k" singular triplets, we propose a stochastic approach. First, we use a Monte Carlo method to sample and build a much smaller term-by-document matrix (e.g. a k x k matrix), from which we then find the first "k" triplets using standard deterministic methods. Second, we investigate how we can reduce the problem to finding the "k" largest eigenvalues using parallel Monte Carlo methods. We apply these methods to the initial matrix and also to the reduced one. The algorithms run on a cluster of workstations under MPI. We present results of experiments on textual retrieval of Web documents, as well as a comparison of the proposed stochastic methods. (C) 2003 IMACS. Published by Elsevier Science B.V. All rights reserved.
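The reduction to an eigenvalue problem mentioned in this abstract can be illustrated with a small sketch. It is not the authors' Monte Carlo algorithm: it uses plain deterministic power iteration on A^T A to recover the leading singular triplet of a tiny term-by-document matrix. The function names and the iteration count are assumptions chosen for illustration only.

```python
import math


def matvec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def transpose(A):
    """Return the transpose of A as a list of rows."""
    return [list(col) for col in zip(*A)]


def top_singular_triplet(A, iters=200):
    """Approximate the leading singular triplet (u, sigma, v) of A.

    Hypothetical helper: deterministic power iteration on A^T A,
    used here as a stand-in for the stochastic methods in the abstract.
    """
    At = transpose(A)
    n = len(A[0])
    # Uniform start vector; for a nonnegative term-by-document matrix
    # it is not orthogonal to the leading right singular vector.
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = matvec(At, matvec(A, v))      # w = (A^T A) v
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    Av = matvec(A, v)
    sigma = math.sqrt(sum(x * x for x in Av))
    u = [x / sigma for x in Av] if sigma > 0 else Av
    return u, sigma, v


# Rows are terms, columns are documents (toy data, not from the paper).
term_doc = [
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
u, sigma, v = top_singular_triplet(term_doc)
print(round(sigma, 6))
```

Further triplets could be obtained by deflation (subtracting sigma * u * v^T and repeating), which is one conventional route to the first "k" triplets.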
Abstract:
A large volume of visual content is inaccessible until effective and efficient indexing and retrieval of such data is achieved. In this paper, we introduce the DREAM system, a knowledge-assisted, semantic-driven, context-aware visual information retrieval system applied in the film post-production domain. We focus mainly on the automatic labelling and topic map related aspects of the framework. The use of context-related collateral knowledge, represented by a novel probabilistic visual keyword co-occurrence matrix, proved effective in the experiments conducted during system evaluation. The automatically generated semantic labels were fed into the Topic Map Engine, which can automatically construct ontological networks using Topic Maps technology, dramatically enhancing the indexing and retrieval performance of the system towards an even higher semantic level.
Abstract:
In general, ranking entities (resources) on the Semantic Web (SW) is subject to importance, relevance, and query length. Few existing SW search systems cover all of these aspects. Moreover, many existing efforts simply reuse technologies from conventional Information Retrieval (IR), which are not designed for SW data. This paper proposes a ranking mechanism that includes all three categories of ranking and is tailored to SW data.
Abstract:
This study investigated the relationships between phonological awareness and reading in Oriya and English. Oriya is the official language of Orissa, an eastern state of India. The writing system is an alphasyllabary. Ninety-nine fifth grade children (mean age 9 years 7 months) were assessed on measures of phonological awareness, word reading and pseudo-word reading in both languages. Forty-eight of the children attended Oriya-medium schools where they received literacy instruction in Oriya from grade 1 and learned English from grade 2. Fifty-one children attended English-medium schools where they received literacy instruction in English from grade 1 and in Oriya from grade 2. The results showed that phonological awareness in Oriya contributed significantly to reading Oriya and English words and pseudo-words for the children in the Oriya-medium schools. However, it only contributed to Oriya pseudo-word reading and English word reading for children in the English-medium schools. Phonological awareness in English contributed to English word and pseudo-word reading for both groups. Further analyses investigated the contribution of awareness of large phonological units (syllable, onsets and rimes) and small phonological units (phonemes) to reading in each language. The data suggest that cross-language transfer and facilitation of phonological awareness to word reading is not symmetrical across languages and may depend both on the characteristics of the different orthographies of the languages being learned and whether the first literacy language is also the first spoken language.
Abstract:
Search has become a hot topic in Internet computing, with rival search engines battling to become the de facto Web portal, harnessing search algorithms to wade through information on a scale undreamed of by early information retrieval (IR) pioneers. This article examines how search has matured from its roots in specialized IR systems to become a key foundation of the Web. The authors describe new challenges posed by the Web's scale, and show how search is changing the nature of the Web as much as the Web has changed the nature of search.
Abstract:
The Web's link structure (termed the Web Graph) is a richly connected set of Web pages. Current applications use this graph for indexing and information retrieval purposes. In contrast the relationship between Web Graph and application is reversed by letting the structure of the Web Graph influence the behaviour of an application. Presents a novel Web crawling agent, AlienBot, the output of which is orthogonally coupled to the enemy generation strategy of a computer game. The Web Graph guides AlienBot, causing it to generate a stochastic process. Shows the effectiveness of such unorthodox coupling to both the playability of the game and the heuristics of the Web crawler. In addition, presents the results of the sample of Web pages collected by the crawling process. In particular, shows: how AlienBot was able to identify the power law inherent in the link structure of the Web; that 61.74 per cent of Web pages use some form of scripting technology; that the size of the Web can be estimated at just over 5.2 billion pages; and that less than 7 per cent of Web pages fully comply with some variant of (X)HTML.
Abstract:
A quasi-optical interferometric technique capable of measuring antenna phase patterns without the need for a heterodyne receiver is presented. It is particularly suited to the characterization of terahertz antennas feeding power detectors or mixers employing quasi-optical local oscillator injection. Examples of recorded antenna phase patterns at frequencies of 1.4 and 2.5 THz using homodyne detectors are presented. To our knowledge, these are the highest frequency antenna phase patterns ever recovered. Knowledge of both the amplitude and phase patterns in the far field enables a Gauss-Hermite or Gauss-Laguerre beam-mode analysis to be carried out for the antenna, of importance in performance optimization calculations, such as antenna gain and beam efficiency parameters at the design and prototype stage of antenna development. A full description of the beam would also be required if the antenna is to be used to feed a quasi-optical system in the near-field to far-field transition region. This situation could often arise when the device is fitted directly at the back of telescopes in flying observatories. A further benefit of the proposed technique is its simplicity for characterizing systems in situ, an advantage of considerable importance because, in many situations, the components may not be removable for further characterization once assembled. The proposed methodology is generic and should be useful across the wider sensing community, e.g., in single detector acoustic imaging or in adaptive imaging array applications. Furthermore, it is applicable across other frequencies of the EM spectrum, provided adequate spatial and temporal phase stability of the source can be maintained throughout the measurement process. Phase information retrieval is also of importance to emergent research areas, such as band-gap structure characterization, meta-materials research, electromagnetic cloaking, slow light, super-lens design, as well as near-field and virtual imaging applications.
Abstract:
Information systems integration becomes critical in enhancing organisational competitiveness through effective use of the information resources provided by the whole host of information systems. Information systems integration is by nature a process of bringing about the capability of communication and information exchange between systems, while interoperability, often the result of systems integration, is such a capability. However, there is currently a lack of theoretical foundation for the representation and measurement of interoperability in organisations. Organisational semiotics provides a theoretical foundation for systems interoperability. A notion of ‘semiotic interoperability’ is proposed in this paper as a paradigm for guiding systems integration and measuring the degree of interoperability, covering aspects ranging from physical properties and the transmission structure of signs, with emphasis on communicating meaning and intention, to the social consequences of information.
Abstract:
We report multi-instrument observations during an isolated substorm on 17 October 1989. The EISCAT radar operated in the SP-UK-POLI mode measuring ionospheric convection at latitudes 71°-78°. SAMNET and the EISCAT Magnetometer Cross provide information on the timing of substorm expansion phase onset and subsequent intensifications, as well as the location of the field aligned and ionospheric currents associated with the substorm current wedge. IMP-8 magnetic field data are also included. Evidence of a substorm growth phase is provided by the equatorward motion of a flow reversal boundary across the EISCAT radar field of view at 2130 MLT, following a southward turning of the interplanetary magnetic field (IMF). We infer that the polar cap expanded as a result of the addition of open magnetic flux to the tail lobes during this interval. The flow reversal boundary, which is a lower limit to the polar cap boundary, reached an invariant latitude equatorward of 71° by the time of the expansion phase onset. A westward electrojet, centred at 65.4°, occurred at the onset of the expansion phase. This electrojet subsequently moved poleward to a maximum of 68.1° at 2000 UT and also widened. During the expansion phase, there is evidence of bursts of plasma flow which are spatially localised at longitudes within the substorm current wedge and which occurred well poleward of the westward electrojet. We conclude that the substorm onset region in the ionosphere, defined by the westward electrojet, mapped to a part of the tail radially earthward of the boundary between open and closed magnetic flux, the “distant” neutral line. Thus the substorm was not initiated at the distant neutral line, although there is evidence that it remained active during the expansion phase. It is not obvious whether the electrojet mapped to a near-Earth neutral line, but at its most poleward, the expanded electrojet does not reach the estimated latitude of the polar cap boundary.
Abstract:
Social networks have gained remarkable attention in the last decade. Accessing social network sites such as Twitter, Facebook, LinkedIn and Google+ through the internet and Web 2.0 technologies has become more affordable. People are becoming more interested in, and reliant on, social networks for information, news and the opinions of other users on diverse subject matters. The heavy reliance on social network sites causes them to generate massive data characterised by three computational issues, namely size, noise and dynamism. These issues often make social network data very complex to analyse manually, resulting in the pertinent use of computational means of analysing them. Data mining provides a wide range of techniques for detecting useful knowledge, such as trends, patterns and rules, from massive datasets [44]. Data mining techniques are used for information retrieval, statistical modelling and machine learning. These techniques employ data pre-processing, data analysis, and data interpretation processes in the course of data analysis. This survey discusses different data mining techniques used in mining diverse aspects of the social network over the decades, from historical techniques to up-to-date models, including our novel technique named TRCM. All the techniques covered in this survey are listed in Table 1, along with the tools employed and the names of their authors.