997 results for Link Mining


Relevance: 100.00%

Abstract:

At NTCIR-10 we participated in the cross-lingual link discovery (CrossLink-2) task. In this paper we describe our systems for discovering cross-lingual links between the Chinese, Japanese, and Korean (CJK) Wikipedia and the English Wikipedia. The evaluation results show that our implementation of the cross-lingual linking method achieved promising results.

Relevance: 70.00%

Abstract:

In this paper, we discuss our participation in the INEX 2008 Link-the-Wiki track. We utilized a sliding-window-based algorithm to extract the frequent terms and phrases. Using the extracted phrases and terms as descriptive vectors, the anchors and relevant links (both incoming and outgoing) are recognized efficiently.
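
The abstract does not spell out the sliding-window algorithm, so the following is only a minimal sketch of the general idea: slide windows of one to `max_len` tokens over the text and keep the terms and phrases that recur. The function name, the whitespace tokenization, and the `max_len` and `min_count` parameters are assumptions made for illustration, not details from the paper.

```python
from collections import Counter

def frequent_phrases(text, max_len=3, min_count=2):
    """Count every run of 1..max_len consecutive tokens; keep the frequent ones.

    Hypothetical helper: whitespace tokenization and the thresholds are
    illustrative assumptions, not the track participants' actual settings.
    """
    tokens = text.lower().split()
    counts = Counter()
    for size in range(1, max_len + 1):
        # Slide a window of `size` tokens across the document.
        for start in range(len(tokens) - size + 1):
            counts[" ".join(tokens[start:start + size])] += 1
    return {phrase: n for phrase, n in counts.items() if n >= min_count}

sample = "link mining finds links and link mining ranks links"
phrases = frequent_phrases(sample)
```

The resulting phrase counts could then serve as a simple descriptive vector for a page when candidate anchors are compared against other documents.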

Relevance: 70.00%

Abstract:

At NTCIR-9, we participated in the cross-lingual link discovery (Crosslink) task. In this paper we describe our approaches to discovering Chinese, Japanese, and Korean (CJK) cross-lingual links for English documents in Wikipedia. Our experimental results show that a link mining approach that mines the existing link structure for anchor probabilities and relies on the “translation” using cross-lingual document name triangulation performs very well. The evaluation shows encouraging results for our system.
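
One common way to mine anchor probabilities from an existing link structure is to divide the number of times a phrase links to a given target by the number of times the phrase occurs at all. The abstract does not give the exact formula used, so the sketch below assumes that definition; the function name and the toy data are hypothetical.

```python
from collections import Counter

def anchor_probabilities(existing_links, phrase_occurrences):
    """Estimate P(target | phrase) from existing links.

    existing_links: list of (anchor_text, target_title) pairs already in the wiki.
    phrase_occurrences: dict mapping anchor_text to how often the phrase
    appears in the collection, linked or not.
    Both inputs are illustrative assumptions, not the paper's data format.
    """
    link_counts = Counter(existing_links)
    probabilities = {}
    for (anchor, target), n in link_counts.items():
        total = phrase_occurrences.get(anchor, 0)
        if total > 0:
            probabilities[(anchor, target)] = n / total
    return probabilities

links = [
    ("jaguar", "Jaguar (animal)"),
    ("jaguar", "Jaguar Cars"),
    ("jaguar", "Jaguar (animal)"),
]
probs = anchor_probabilities(links, {"jaguar": 6})
```

Candidate anchors in a new document can then be ranked by these probabilities before their targets are mapped to pages in another language.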

Relevance: 70.00%

Abstract:

Nowadays people heavily rely on the Internet for information and knowledge. Wikipedia is an online multilingual encyclopaedia that contains a very large number of detailed articles covering most written languages. It is often considered to be a treasury of human knowledge. It includes extensive hypertext links between documents of the same language for easy navigation. However, the pages in different languages are rarely cross-linked except for direct equivalent pages on the same subject in different languages. This could pose serious difficulties to users seeking information or knowledge from different lingual sources, or where there is no equivalent page in one language or another. In this thesis, a new information retrieval task—cross-lingual link discovery (CLLD) is proposed to tackle the problem of the lack of cross-lingual anchored links in a knowledge base such as Wikipedia. In contrast to traditional information retrieval tasks, cross language link discovery algorithms actively recommend a set of meaningful anchors in a source document and establish links to documents in an alternative language. In other words, cross-lingual link discovery is a way of automatically finding hypertext links between documents in different languages, which is particularly helpful for knowledge discovery in different language domains. This study is specifically focused on Chinese / English link discovery (C/ELD). Chinese / English link discovery is a special case of cross-lingual link discovery task. It involves tasks including natural language processing (NLP), cross-lingual information retrieval (CLIR) and cross-lingual link discovery. To justify the effectiveness of CLLD, a standard evaluation framework is also proposed. The evaluation framework includes topics, document collections, a gold standard dataset, evaluation metrics, and toolkits for run pooling, link assessment and system evaluation. 
With the evaluation framework, performance of CLLD approaches and systems can be quantified. This thesis contributes to the research on natural language processing and cross-lingual information retrieval in CLLD: 1) a new simple but effective Chinese segmentation method, n-gram mutual information, is presented for determining the boundaries of Chinese text; 2) a voting mechanism for named entity translation is demonstrated for achieving high precision in English / Chinese machine translation; 3) a link mining approach that mines the existing link structure for anchor probabilities achieves encouraging results in suggesting cross-lingual Chinese / English links in Wikipedia. This approach was examined in the experiments on better automatic generation of cross-lingual links that were carried out as part of the study. The overall major contribution of this thesis is the provision of a standard evaluation framework for cross-lingual link discovery research. Such a framework is important in CLLD evaluation because it helps in benchmarking the performance of various CLLD systems and in identifying good CLLD realisation approaches. The evaluation methods and the evaluation framework described in this thesis have been utilised to quantify system performance in the NTCIR-9 Crosslink task, which is the first information retrieval track of this kind.

Relevance: 60.00%

Abstract:

In this paper we examine automated Chinese to English link discovery in Wikipedia and the effects of Chinese segmentation and Chinese to English translation on the hyperlink recommendation. Our experimental results show that the implemented link discovery framework can effectively recommend Chinese-to-English cross-lingual links. The techniques described here can assist bi-lingual users where a particular topic is not covered in Chinese, is not equally covered in both languages, or is biased in one language; as well as for language learning.

Relevance: 60.00%

Abstract:

E-learning platforms are increasingly used in distance education, a fact directly related to their ability to let students attend courses from anywhere. Among e-learning platforms there is one especially interesting group: adaptive platforms, which tend to replace the (in-person) teacher through interactivity, content variability, automation, and the capacity for problem solving and simulation of educational behaviours. The ADAPT project (adaptive e-learning platform) consists of building one such platform, implementing intelligent tutoring, problem solving based on past experience, genetic algorithms, and link mining. This dissertation falls within the link-mining area and documents the development of four distinct modules: the first module is a search engine for suggesting alternative content; the second identifies changes in learning style; the third is a data analysis platform that implements several data mining and statistical techniques to give teachers/tutors important information that would not be visible without such techniques; finally, the last module is a recommender system that suggests the most suitable articles to students based on what students with similar profiles have consulted. This thesis documents the development of the prototypes for each of these modules. The tests carried out for each module show that the methodologies used are valid and viable.

Relevance: 30.00%

Abstract:

With the advent of Service Oriented Architecture, Web services have gained tremendous popularity. Due to the availability of a large number of Web services, finding an appropriate Web service according to the requirement of the user is a challenge. This warrants the need to establish an effective and reliable process of Web service discovery. A considerable body of research has emerged to develop methods to improve the accuracy of Web service discovery to match the best service. The process of Web service discovery results in suggesting many individual services that partially fulfil the user’s interest. Considering the semantic relationships of the words used to describe the services, as well as their input and output parameters, can lead to accurate Web service discovery. Appropriate linking of individual matched services should fully satisfy the requirements which the user is looking for. This research proposes to integrate a semantic model and a data mining technique to enhance the accuracy of Web service discovery. A novel three-phase Web service discovery methodology has been proposed. The first phase performs match-making to find semantically similar Web services for a user query. In order to perform semantic analysis on the content present in the Web service description language document, the support-based latent semantic kernel is constructed using an innovative concept of binning and merging on a large quantity of text documents covering diverse domains of knowledge. The use of a generic latent semantic kernel constructed with a large number of terms helps to find the hidden meaning of the query terms which otherwise could not be found. Sometimes a single Web service is unable to fully satisfy the requirement of the user. In such cases, a composition of multiple inter-related Web services is presented to the user. The task of checking the possibility of linking multiple Web services is done in the second phase.
Once the feasibility of linking Web services is checked, the objective is to provide the user with the best composition of Web services. In the link analysis phase, the Web services are modelled as nodes of a graph and an all-pairs shortest-path algorithm is applied to find the optimum path at the minimum cost for traversal. The third phase, system integration, integrates the results from the preceding two phases by using an original fusion algorithm in the fusion engine. Finally, the recommendation engine, which is an integral part of the system integration phase, makes the final recommendations, including individual and composite Web services, to the user. In order to evaluate the performance of the proposed method, extensive experimentation has been performed. Results of the proposed support-based semantic kernel method of Web service discovery are compared with the results of the standard keyword-based information-retrieval method and a clustering-based machine-learning method of Web service discovery. The proposed method outperforms both information-retrieval and machine-learning based methods. Experimental results and statistical analysis also show that the best Web service compositions are obtained by considering 10 to 15 Web services that are found in phase-I for linking. Empirical results also ascertain that the fusion engine boosts the accuracy of Web service discovery by combining the inputs from both the semantic analysis (phase-I) and the link analysis (phase-II) in a systematic fashion. Overall, the accuracy of Web service discovery with the proposed method shows a significant improvement over traditional discovery methods.
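
The link analysis step, in which services are graph nodes and an all-pairs shortest-path algorithm finds the cheapest composition, can be illustrated with Floyd-Warshall. The abstract does not name the specific algorithm or the cost function, so this is a sketch under those assumptions; the service names and edge costs are hypothetical.

```python
import math

def all_pairs_shortest_paths(nodes, edges):
    """Floyd-Warshall over a directed graph of linkable Web services.

    nodes: list of service names.
    edges: dict mapping (source, target) to a non-negative linking cost.
    The cost model is an illustrative assumption, not the thesis's definition.
    """
    dist = {u: {v: (0.0 if u == v else math.inf) for v in nodes} for u in nodes}
    for (u, v), cost in edges.items():
        dist[u][v] = min(dist[u][v], cost)
    # Allow each node in turn to act as an intermediate hop.
    for k in nodes:
        for i in nodes:
            for j in nodes:
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist

services = ["geocode", "weather", "alert"]
costs = {
    ("geocode", "weather"): 1.0,
    ("weather", "alert"): 2.0,
    ("geocode", "alert"): 5.0,
}
dist = all_pairs_shortest_paths(services, costs)
```

In this toy graph the composition geocode, weather, alert costs 3.0 and beats the direct link at 5.0, while services with no feasible chain between them keep an infinite distance.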

Relevance: 30.00%

Abstract:

This report demonstrates:
• Development of software agents for data mining
• Link data mining to building model in virtual environments
• Link knowledge development with building model in virtual environments
• Demonstration of software agents for data mining
• Populate with maintenance data

Relevance: 30.00%

Abstract:

Speaker(s): Jon Hare
Organiser:
Time: 25/06/2014 11:00-11:50
Location: B32/3077

Abstract: The aggregation of items from social media streams, such as Flickr photos and Twitter tweets, into meaningful groups can help users contextualise and effectively consume the torrents of information on the social web. This task is challenging due to the scale of the streams and the inherently multimodal nature of the information being contextualised. In this talk I'll describe some of our recent work on trend and event detection in multimedia data streams. We focus on scalable streaming algorithms that can be applied to multimedia data streams from the web and the social web. The talk will cover two particular aspects of our work: mining Twitter for trending images by detecting near duplicates; and detecting social events in multimedia data with streaming clustering algorithms. I will describe our techniques in detail, and explore open questions and areas of potential future work, in both these tasks.

Relevance: 30.00%

Abstract:

Distributed and collaborative data stream mining in a mobile computing environment is referred to as Pocket Data Mining (PDM). The large number of available data streams that smart phones can subscribe to or sense, coupled with the increasing computational power of handheld devices, motivates the development of PDM as a decision making system. This emerging area of study has been shown to be feasible in an earlier study using the technological enablers of mobile software agents and stream mining techniques [1]. A typical PDM process would start by having mobile agents roam the network to discover relevant data streams and resources. Then other (mobile) agents encapsulating stream mining techniques visit the relevant nodes in the network in order to build evolving data mining models. Finally, a third type of mobile agent roams the network consulting the mining agents for a final collaborative decision, when required by one or more users. In this paper, we propose the use of distributed Hoeffding trees and Naive Bayes classifiers in the PDM framework over vertically partitioned data streams. Mobile policing, health monitoring and stock market analysis are among the possible applications of PDM. An extensive experimental study is reported showing the effectiveness of collaborative data mining with the two classifiers.
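
Hoeffding trees decide when to split a node by using the Hoeffding bound, which limits how far an observed mean can be from the true mean after n observations. The sketch below shows that standard bound and the split test it supports; the abstract does not describe which tree variant or parameters the PDM agents use, so the function names and the example values are assumptions.

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Return epsilon such that, with probability 1 - delta, the observed mean
    of n samples lies within epsilon of the true mean.

    value_range is the range R of the measured quantity, such as information gain.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain, second_gain, value_range, delta, n):
    """Split once the best attribute leads the runner-up by more than epsilon.

    Illustrative helper; real implementations usually add a tie-breaking threshold.
    """
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)

epsilon = hoeffding_bound(1.0, 1e-7, 1000)
split_now = should_split(0.30, 0.15, 1.0, 1e-7, 1000)
```

Because the bound shrinks as more stream instances arrive, a node on a resource-limited device can hold only summary statistics and still commit to a split once the evidence is strong enough.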

Relevance: 30.00%

Abstract:

Purpose – This paper aims to propose a conceptual framework to explore the link between strategic human resource management (SHRM) and firm performance of the coal mining companies in Central Queensland (CQ), Australia.

Design/methodology/approach – The paper reviews literature relating to the process and issues of transforming human resource practices and industrial relations of the coal industry in Australia for the past decade. Theoretical development and empirical studies on the SHRM-performance linkage are discussed. Based on the literature review, the paper develops an integrated model for testing the relationship between SHRM and firm performance in the context of CQ's coalmines and proposes a number of research propositions.

Findings – Three perceivable outcomes are likely to be derived from applying this framework in the field. First, testing the linkage between strategic HRM and firm performance in the coal industry, using an integrated approach, would compensate for the empirical deficiencies in treatments of prior SHRM models. Second, data at firm level could be collected to develop a better understanding of how the adoption of strategic HRM practices in coal companies can affect firm performance. Third, the extent of flexibility practices, use of contractors and associated management practices could be identified.

Originality/value – The coal industry is central to economic development of regional Queensland. The industry contributes substantially to GDP via employment, investment and product export. An exploration of the impact of SHRM on the coal industry will likely result in identifying some best practices that could be potentially adopted in the wider business community to foster regional economic development in Australia and worldwide.

Relevance: 30.00%

Abstract:

Conflicts between resources in stockyards cost mining companies millions of dollars a year. An effective planning strategy needs to be established in order to reduce these operational conflicts. In this research a stockyard simulation model of a mining operation is proposed. The simulation uses discrete event and continuous strategies to create a highly detailed visualization and animation that closely resemble actual stockyard operation. The proposed simulation model is tightly integrated with a stockpile planner and is used to evaluate the feasibility of a given production plan. The highly detailed visualization of the simulation model allows the planner to determine the source of a conflict, which can be used to guide the elimination of these conflicts.

Relevance: 30.00%

Abstract:

The increase in new electronic devices has generated a considerable increase in the acquisition of spatial data; hence these data are becoming more and more widely used. As with conventional data, spatial data need to be analyzed so that interesting information can be retrieved from them. Therefore, data clustering techniques can be used to extract clusters from a set of spatial data. However, current approaches do not consider the implicit semantics that exist between a region and an object’s attributes. This paper presents an approach that enhances the spatial data mining process so that it can use the semantics that exist within a region. A framework, OntoSDM, was developed which enables spatial data mining algorithms to communicate with ontologies in order to enhance the algorithm’s result. The experiments demonstrated a semantically improved result, generating more interesting clusters and therefore reducing the manual analysis work of an expert.

Relevance: 30.00%

Abstract:

The data set shows energy consumption per hour of work (in MJ/hour) and labour productivity (in USD/hour) in the PS economic sector (Energy & Mining + Industry + Construction) for the period 1970-2009 and for the following countries: Germany, Spain, USA, Canada, Italy, UK, France, Japan. The intention is to examine energy consumption as a driver of improvements in the productivity of labour. This is of particular relevance for the discussion of reducing working time in the context of the 'degrowth' debate, as is done in the article that this data set supplements.