24 resultados para World Wide Web (Information Retrieval System)

em Aston University Research Archive


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Web document cluster analysis plays an important role in information retrieval by organizing large amounts of documents into a small number of meaningful clusters. Traditional web document clustering is based on the Vector Space Model (VSM), which takes into account only two-level (document and term) knowledge granularity but ignores the bridging paragraph granularity. However, this two-level granularity may lead to unsatisfactory clustering results with “false correlation”. In order to deal with the problem, a Hierarchical Representation Model with Multi-granularity (HRMM), which consists of five-layer representation of data and a twophase clustering process is proposed based on granular computing and article structure theory. To deal with the zero-valued similarity problemresulted from the sparse term-paragraphmatrix, an ontology based strategy and a tolerance-rough-set based strategy are introduced into HRMM. By using granular computing, structural knowledge hidden in documents can be more efficiently and effectively captured in HRMM and thus web document clusters with higher quality can be generated. Extensive experiments show that HRMM, HRMM with tolerancerough-set strategy, and HRMM with ontology all outperform VSM and a representative non VSM-based algorithm, WFP, significantly in terms of the F-Score.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The main argument of this paper is that Natural Language Processing (NLP) does, and will continue to, underlie the Semantic Web (SW), including its initial construction from unstructured sources like the World Wide Web (WWW), whether its advocates realise this or not. Chiefly, we argue, such NLP activity is the only way up to a defensible notion of meaning at conceptual levels (in the original SW diagram) based on lower level empirical computations over usage. Our aim is definitely not to claim logic-bad, NLP-good in any simple-minded way, but to argue that the SW will be a fascinating interaction of these two methodologies, again like the WWW (which has been basically a field for statistical NLP research) but with deeper content. Only NLP technologies (and chiefly information extraction) will be able to provide the requisite RDF knowledge stores for the SW from existing unstructured text databases in the WWW, and in the vast quantities needed. There is no alternative at this point, since a wholly or mostly hand-crafted SW is also unthinkable, as is a SW built from scratch and without reference to the WWW. We also assume that, whatever the limitations on current SW representational power we have drawn attention to here, the SW will continue to grow in a distributed manner so as to serve the needs of scientists, even if it is not perfect. The WWW has already shown how an imperfect artefact can become indispensable.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper summarizes the scientific work presented at the 32nd European Conference on Information Retrieval. It demonstrates that information retrieval (IR) as a research area continues to thrive with progress being made in three complementary sub-fields, namely IR theory and formal methods together with indexing and query representation issues, furthermore Web IR as a primary application area and finally research into evaluation methods and metrics. It is the combination of these areas that gives IR its solid scientific foundations. The paper also illustrates that significant progress has been made in other areas of IR. The keynote speakers addressed three such subject fields, social search engines using personalization and recommendation technologies, the renewed interest in applying natural language processing to IR, and multimedia IR as another fast-growing area.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, we propose a text mining method called LRD (latent relation discovery), which extends the traditional vector space model of document representation in order to improve information retrieval (IR) on documents and document clustering. Our LRD method extracts terms and entities, such as person, organization, or project names, and discovers relationships between them by taking into account their co-occurrence in textual corpora. Given a target entity, LRD discovers other entities closely related to the target effectively and efficiently. With respect to such relatedness, a measure of relation strength between entities is defined. LRD uses relation strength to enhance the vector space model, and uses the enhanced vector space model for query based IR on documents and clustering documents in order to discover complex relationships among terms and entities. Our experiments on a standard dataset for query based IR shows that our LRD method performed significantly better than traditional vector space model and other five standard statistical methods for vector expansion.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Microposts are small fragments of social media content that have been published using a lightweight paradigm (e.g. Tweets, Facebook likes, foursquare check-ins). Microposts have been used for a variety of applications (e.g., sentiment analysis, opinion mining, trend analysis), by gleaning useful information, often using third-party concept extraction tools. There has been very large uptake of such tools in the last few years, along with the creation and adoption of new methods for concept extraction. However, the evaluation of such efforts has been largely consigned to document corpora (e.g. news articles), questioning the suitability of concept extraction tools and methods for Micropost data. This report describes the Making Sense of Microposts Workshop (#MSM2013) Concept Extraction Challenge, hosted in conjunction with the 2013 World Wide Web conference (WWW'13). The Challenge dataset comprised a manually annotated training corpus of Microposts and an unlabelled test corpus. Participants were set the task of engineering a concept extraction system for a defined set of concepts. Out of a total of 22 complete submissions 13 were accepted for presentation at the workshop; the submissions covered methods ranging from sequence mining algorithms for attribute extraction to part-of-speech tagging for Micropost cleaning and rule-based and discriminative models for token classification. In this report we describe the evaluation process and explain the performance of different approaches in different contexts.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This thesis describes the development of an operational river basin water resources information management system. The river or drainage basin is the fundamental unit of the system; in both the modelling and prediction of hydrological processes, and in the monitoring of the effect of catchment management policies. A primary concern of the study is the collection of sufficient and sufficiently accurate information to model hydrological processes. Remote sensing, in combination with conventional point source measurement, can be a valuable source of information, but is often overlooked by hydrologists, due to the cost of acquisition and processing. This thesis describes a number of cost effective methods of acquiring remotely sensed imagery, from airborne video survey to real time ingestion of meteorological satellite data. Inexpensive micro-computer systems and peripherals are used throughout to process and manipulate the data. Spatial information systems provide a means of integrating these data with topographic and thematic cartographic data, and historical records. For the system to have any real potential the data must be stored in a readily accessible format and be easily manipulated within the database. The design of efficient man-machine interfaces and the use of software enginering methodologies are therefore included in this thesis as a major part of the design of the system. The use of low cost technologies, from micro-computers to video cameras, enables the introduction of water resources information management systems into developing countries where the potential benefits are greatest.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This thesis explores how the world-wide-web can be used to support English language teachers doing further studies at a distance. The future of education worldwide is moving towards a requirement that we, as teacher educators, use the latest web technology not as a gambit, but as a viable tool to improve learning. By examining the literature on knowledge, teacher education and web training, a model of teacher knowledge development, along with statements of advice for web developers based upon the model are developed. Next, the applicability and viability of both the model and statements of advice are examined by developing a teacher support site (bttp://www. philseflsupport. com) according to these principles. The data collected from one focus group of users from sixteen different countries, all studying on the same distance Masters programme, is then analysed in depth. The outcomes from the research are threefold: A functioning website that is averaging around 15, 000 hits a month provides a professional contribution. An expanded model of teacher knowledge development that is based upon five theoretical principles that reflect the ever-expanding cyclical nature of teacher learning provides an academic contribution. A series of six statements of advice for developers of teacher support sites. These statements are grounded in the theoretical principles behind the model of teacher knowledge development and incorporate nine keys to effective web facilitation. Taken together, they provide a forward-looking contribution to the praxis of web supported teacher education, and thus to the potential dissemination of the research presented here. The research has succeeded in reducing the proliferation of terminology in teacher knowledge into a succinct model of teacher knowledge development. The model may now be used to further our understanding of how teachers learn and develop as other research builds upon the individual study here. NB: Appendix 4 is only available only available for consultation at Aston University Library with prior arrangement.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The World Wide Web provides plentiful contents for Web-based learning, but its hyperlink-based architecture connects Web resources for browsing freely rather than for effective learning. To support effective learning, an e-learning system should be able to discover and make use of the semantic communities and the emerging semantic relations in a dynamic complex network of learning resources. Previous graph-based community discovery approaches are limited in ability to discover semantic communities. This paper first suggests the Semantic Link Network (SLN), a loosely coupled semantic data model that can semantically link resources and derive out implicit semantic links according to a set of relational reasoning rules. By studying the intrinsic relationship between semantic communities and the semantic space of SLN, approaches to discovering reasoning-constraint, rule-constraint, and classification-constraint semantic communities are proposed. Further, the approaches, principles, and strategies for discovering emerging semantics in dynamic SLNs are studied. The basic laws of the semantic link network motion are revealed for the first time. An e-learning environment incorporating the proposed approaches, principles, and strategies to support effective discovery and learning is suggested.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The World Wide Web is opening up access to documents and data for scholars. However it has not yet impacted on one of the primary activities in research: assessing new findings in the light of current knowledge and debating it with colleagues. The ClaiMaker system uses a directed graph model with similarities to hypertext, in which new ideas are published as nodes, which other contributors can build on or challenge in a variety of ways by linking to them. Nodes and links have semantic structure to facilitate the provision of specialist services for interrogating and visualizing the emerging network. By way of example, this paper is grounded in a ClaiMaker model to illustrate how new claims can be described in this structured way.