882 resultados para knowlede discovery
Resumo:
The enforcement of Intellectual Property rights poses one of the greatest current threats to the privacy of individuals online. Recent trends have shown that the balance between privacy and intellectual property enforcement has been shifted in favour of intellectual property owners. This article discusses the ways in which the scope of preliminary discovery and Anton Piller orders have been overly expanded in actions where large amounts of electronic information is available, especially against online intermediaries (service providers and content hosts). The victim in these cases is usually the end user whose privacy has been infringed without a right of reply and sometimes without notice. This article proposes some ways in which the delicate balance can be restored, and considers some safeguards for user privacy. These safeguards include restructuring the threshold tests for discovery, limiting the scope of information disclosed, distinguishing identity discovery from information discovery, and distinguishing information preservation from preliminary discovery.
Resumo:
Recent research has begun to address and even compare nascent entrepreneurship and nascent corporate entrepreneurship. An opportunity based view holds great potential to integrate both streams of research, but also presents challenges in how we define corporate entrepreneurship. We extend (corporate) entrepreneurship literature to the opportunity identification phase by providing a framework to classify different types of corporate entrepreneurship. Through analysis of a large dataset on nascent (corporate) entrepreneurship (PSEDII) we show that these corporate entrepreneurs differ largely from each other in terms of human capital. Prior studies have indicated that independent and corporate entrepreneurs pursue different types of opportunities and utilize different strategies. Our findings from the opportunity identification phase challenge those differences and seem to indicate a difference between the opportunities corporate entrepreneurs identify versus the opportunities they exploit.
Resumo:
In today’s electronic world vast amounts of knowledge is stored within many datasets and databases. Often the default format of this data means that the knowledge within is not immediately accessible, but rather has to be mined and extracted. This requires automated tools and they need to be effective and efficient. Association rule mining is one approach to obtaining knowledge stored with datasets / databases which includes frequent patterns and association rules between the items / attributes of a dataset with varying levels of strength. However, this is also association rule mining’s downside; the number of rules that can be found is usually very big. In order to effectively use the association rules (and the knowledge within) the number of rules needs to be kept manageable, thus it is necessary to have a method to reduce the number of association rules. However, we do not want to lose knowledge through this process. Thus the idea of non-redundant association rule mining was born. A second issue with association rule mining is determining which ones are interesting. The standard approach has been to use support and confidence. But they have their limitations. Approaches which use information about the dataset’s structure to measure association rules are limited, but could yield useful association rules if tapped. Finally, while it is important to be able to get interesting association rules from a dataset in a manageable size, it is equally as important to be able to apply them in a practical way, where the knowledge they contain can be taken advantage of. Association rules show items / attributes that appear together frequently. Recommendation systems also look at patterns and items / attributes that occur together frequently in order to make a recommendation to a person. It should therefore be possible to bring the two together. In this thesis we look at these three issues and propose approaches to help. For discovering non-redundant rules we propose enhanced approaches to rule mining in multi-level datasets that will allow hierarchically redundant association rules to be identified and removed, without information loss. When it comes to discovering interesting association rules based on the dataset’s structure we propose three measures for use in multi-level datasets. Lastly, we propose and demonstrate an approach that allows for association rules to be practically and effectively used in a recommender system, while at the same time improving the recommender system’s performance. This especially becomes evident when looking at the user cold-start problem for a recommender system. In fact our proposal helps to solve this serious problem facing recommender systems.
Resumo:
Many data mining techniques have been proposed for mining useful patterns in text documents. However, how to effectively use and update discovered patterns is still an open research issue, especially in the domain of text mining. Since most existing text mining methods adopted term-based approaches, they all suffer from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern (or phrase) based approaches should perform better than the term-based ones, but many experiments did not support this hypothesis. This paper presents an innovative technique, effective pattern discovery which includes the processes of pattern deploying and pattern evolving, to improve the effectiveness of using and updating discovered patterns for finding relevant and interesting information. Substantial experiments on RCV1 data collection and TREC topics demonstrate that the proposed solution achieves encouraging performance.
Resumo:
This study examined the effect that temporal order within the entrepreneurial discovery-exploitation process has on the outcomes of venture creation. Consistent with sequential theories of discovery-exploitation, the general flow of venture creation was found to be directed from discovery toward exploitation in a random sample of nascent ventures. However, venture creation attempts which specifically follow this sequence derive poor outcomes. Moreover, simultaneous discovery-exploitation was the most prevalent temporal order observed, and venture attempts that proceed in this manner more likely become operational. These findings suggest that venture creation is a multi-scale phenomenon that is at once directional in time, and simultaneously driven by symbiotically coupled discovery and exploitation.
Resumo:
It is a big challenge to guarantee the quality of discovered relevance features in text documents for describing user preferences because of the large number of terms, patterns, and noise. Most existing popular text mining and classification methods have adopted term-based approaches. However, they have all suffered from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern-based methods should perform better than term-based ones in describing user preferences, but many experiments do not support this hypothesis. The innovative technique presented in paper makes a breakthrough for this difficulty. This technique discovers both positive and negative patterns in text documents as higher level features in order to accurately weight low-level features (terms) based on their specificity and their distributions in the higher level features. Substantial experiments using this technique on Reuters Corpus Volume 1 and TREC topics show that the proposed approach significantly outperforms both the state-of-the-art term-based methods underpinned by Okapi BM25, Rocchio or Support Vector Machine and pattern based methods on precision, recall and F measures.
Resumo:
We consider the problem of choosing, sequentially, a map which assigns elements of a set A to a few elements of a set B. On each round, the algorithm suffers some cost associated with the chosen assignment, and the goal is to minimize the cumulative loss of these choices relative to the best map on the entire sequence. Even though the offline problem of finding the best map is provably hard, we show that there is an equivalent online approximation algorithm, Randomized Map Prediction (RMP), that is efficient and performs nearly as well. While drawing upon results from the "Online Prediction with Expert Advice" setting, we show how RMP can be utilized as an online approach to several standard batch problems. We apply RMP to online clustering as well as online feature selection and, surprisingly, RMP often outperforms the standard batch algorithms on these problems.
Resumo:
The Wikipedia has become the most popular online source of encyclopedic information. The English Wikipedia collection, as well as some other languages collections, is extensively linked. However, as a multilingual collection the Wikipedia is only very weakly linked. There are few cross-language links or cross-dialect links (see, for example, Chinese dialects). In order to link the multilingual-Wikipedia as a single collection, automated cross language link discovery systems are needed – systems that identify anchor-texts in one language and targets in another. The evaluation of Link Discovery approaches within the English version of the Wikipedia has been examined in the INEX Link the-Wiki track since 2007, whilst both CLEF and NTCIR emphasized the investigation and the evaluation of cross-language information retrieval. In this position paper we propose a new virtual evaluation track: Cross Language Link Discovery (CLLD). The track will initially examine cross language linking of Wikipedia articles. This virtual track will not be tied to any one forum; instead we hope it can be connected to each of (at least): CLEF, NTCIR, and INEX as it will cover ground currently studied by each. The aim is to establish a virtual evaluation environment supporting continuous assessment and evaluation, and a forum for the exchange of research ideas. It will be free from the difficulties of scheduling and synchronizing groups of collaborating researchers and alleviate the necessity to travel across the globe in order to share knowledge. We aim to electronically publish peer-reviewed publications arising from CLLD in a similar fashion: online, with open access, and without fixed submission deadlines.
Resumo:
Information has no value unless it is accessible. Information must be connected together so a knowledge network can then be built. Such a knowledge base is a key resource for Internet users to interlink information from documents. Information retrieval, a key technology for knowledge management, guarantees access to large corpora of unstructured text. Collaborative knowledge management systems such as Wikipedia are becoming more popular than ever; however, their link creation function is not optimized for discovering possible links in the collection and the quality of automatically generated links has never been quantified. This research begins with an evaluation forum which is intended to cope with the experiments of focused link discovery in a collaborative way as well as with the investigation of the link discovery application. The research focus was on the evaluation strategy: the evaluation framework proposal, including rules, formats, pooling, validation, assessment and evaluation has proved to be efficient, reusable for further extension and efficient for conducting evaluation. The collection-split approach is used to re-construct the Wikipedia collection into a split collection comprising single passage files. This split collection is proved to be feasible for improving relevant passages discovery and is devoted to being a corpus for focused link discovery. Following these experiments, a mobile client-side prototype built on iPhone is developed to resolve the mobile Search issue by using focused link discovery technology. According to the interview survey, the proposed mobile interactive UI does improve the experience of mobile information seeking. Based on this evaluation framework, a novel cross-language link discovery proposal using multiple text collections is developed. A dynamic evaluation approach is proposed to enhance both the collaborative effort and the interacting experience between submission and evaluation. A realistic evaluation scheme has been implemented at NTCIR for cross-language link discovery tasks.
Resumo:
Discovering proper search intents is a vi- tal process to return desired results. It is constantly a hot research topic regarding information retrieval in recent years. Existing methods are mainly limited by utilizing context-based mining, query expansion, and user profiling techniques, which are still suffering from the issue of ambiguity in search queries. In this pa- per, we introduce a novel ontology-based approach in terms of a world knowledge base in order to construct personalized ontologies for identifying adequate con- cept levels for matching user search intents. An iter- ative mining algorithm is designed for evaluating po- tential intents level by level until meeting the best re- sult. The propose-to-attempt approach is evaluated in a large volume RCV1 data set, and experimental results indicate a distinct improvement on top precision after compared with baseline models.
Resumo:
It is a big challenge to guarantee the quality of discovered relevance features in text documents for describing user preferences because of the large number of terms, patterns, and noise. Most existing popular text mining and classification methods have adopted term-based approaches. However, they have all suffered from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern-based methods should perform better than term- based ones in describing user preferences, but many experiments do not support this hypothesis. This research presents a promising method, Relevance Feature Discovery (RFD), for solving this challenging issue. It discovers both positive and negative patterns in text documents as high-level features in order to accurately weight low-level features (terms) based on their specificity and their distributions in the high-level features. The thesis also introduces an adaptive model (called ARFD) to enhance the exibility of using RFD in adaptive environment. ARFD automatically updates the system's knowledge based on a sliding window over new incoming feedback documents. It can efficiently decide which incoming documents can bring in new knowledge into the system. Substantial experiments using the proposed models on Reuters Corpus Volume 1 and TREC topics show that the proposed models significantly outperform both the state-of-the-art term-based methods underpinned by Okapi BM25, Rocchio or Support Vector Machine and other pattern-based methods.
Resumo:
Interaction with a mobile device remains difficult due to inherent physical limitations. This dif-ficulty is particularly evident for search, which re-quires typing. We extend the One-Search-Only search paradigm by adding a novel link-browsing scheme built on top of automatic link discovery. A prototype was built for iPhone and tested with 12 subjects. A post-use interview survey suggests that the extended paradigm improves the mobile information seeking experience.