913 resultados para Knowledge Discovery Tools


Relevância:

80.00% 80.00%

Publicador:

Resumo:

Automatic keyword or keyphrase extraction is concerned with assigning keyphrases to documents based on words from within the document. Previous studies have shown that in a significant number of cases author-supplied keywords are not appropriate for the document to which they are attached. This can either be because they represent what the author believes the paper is about not what it actually is, or because they include keyphrases which are more classificatory than explanatory e.g., “University of Poppleton” instead of “Knowledge Discovery in Databases”. Thus, there is a need for a system that can generate appropriate and diverse range of keyphrases that reflect the document. This paper proposes a solution that examines the synonyms of words and phrases in the document to find the underlying themes, and presents these as appropriate keyphrases. The primary method explores taking n-grams of the source document phrases, and examining the synonyms of these, while the secondary considers grouping outputs by their synonyms. The experiments undertaken show the primary method produces good results and that the secondary method produces both good results and potential for future work.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Automatic keyword or keyphrase extraction is concerned with assigning keyphrases to documents based on words from within the document. Previous studies have shown that in a significant number of cases author-supplied keywords are not appropriate for the document to which they are attached. This can either be because they represent what the author believes a paper is about not what it actually is, or because they include keyphrases which are more classificatory than explanatory e.g., “University of Poppleton” instead of “Knowledge Discovery in Databases”. Thus, there is a need for a system that can generate an appropriate and diverse range of keyphrases that reflect the document. This paper proposes two possible solutions that examine the synonyms of words and phrases in the document to find the underlying themes, and presents these as appropriate keyphrases. Using three different freely available thesauri, the work undertaken examines two different methods of producing keywords and compares the outcomes across multiple strands in the timeline. The primary method explores taking n-grams of the source document phrases, and examining the synonyms of these, while the secondary considers grouping outputs by their synonyms. The experiments undertaken show the primary method produces good results and that the secondary method produces both good results and potential for future work. In addition, the different qualities of the thesauri are examined and it is concluded that the more entries in a thesaurus, the better it is likely to perform. The age of the thesaurus or the size of each entry does not correlate to performance.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In the recent years, the area of data mining has been experiencing considerable demand for technologies that extract knowledge from large and complex data sources. There has been substantial commercial interest as well as active research in the area that aim to develop new and improved approaches for extracting information, relationships, and patterns from large datasets. Artificial neural networks (NNs) are popular biologically-inspired intelligent methodologies, whose classification, prediction, and pattern recognition capabilities have been utilized successfully in many areas, including science, engineering, medicine, business, banking, telecommunication, and many other fields. This paper highlights from a data mining perspective the implementation of NN, using supervised and unsupervised learning, for pattern recognition, classification, prediction, and cluster analysis, and focuses the discussion on their usage in bioinformatics and financial data analysis tasks. © 2012 Wiley Periodicals, Inc.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Information systems integration becomes critical in enhancing organisational competitiveness through effective use of information resource provided by the whole host of information systems. Information systems integration in its nature is a process of bringing about the capability of communication and information exchange between systems; while interoperability, often as the result of systems integration, is such a capability. However currently there is a lack of theoretical foundation for representation and measure of the interoperability in organisations. Organisational semiotics provides a theoretical foundation for systems interoperability. A notion of ‘semiotic interoperability’ is proposed in this paper as a paradigm, guiding systems integration and measuring degree of interoperability, covering aspects from physical properties, transmission structure of signs, placing emphasis on communicating meaning, intention to social consequence of information.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In this article, we review the state-of-the-art techniques in mining data streams for mobile and ubiquitous environments. We start the review with a concise background of data stream processing, presenting the building blocks for mining data streams. In a wide range of applications, data streams are required to be processed on small ubiquitous devices like smartphones and sensor devices. Mobile and ubiquitous data mining target these applications with tailored techniques and approaches addressing scarcity of resources and mobility issues. Two categories can be identified for mobile and ubiquitous mining of streaming data: single-node and distributed. This survey will cover both categories. Mining mobile and ubiquitous data require algorithms with the ability to monitor and adapt the working conditions to the available computational resources. We identify the key characteristics of these algorithms and present illustrative applications. Distributed data stream mining in the mobile environment is then discussed, presenting the Pocket Data Mining framework. Mobility of users stimulates the adoption of context-awareness in this area of research. Context-awareness and collaboration are discussed in the Collaborative Data Stream Mining, where agents share knowledge to learn adaptive accurate models.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Twitter is both a micro-blogging service and a platform for public conversation. Direct conversation is facilitated in Twitter through the use of @’s (mentions) and replies. While the conversational element of Twitter is of particular interest to the marketing sector, relatively few data-mining studies have focused on this area. We analyse conversations associated with reciprocated mentions that take place in a data-set consisting of approximately 4 million tweets collected over a period of 28 days that contain at least one mention. We ignore tweet content and instead use the mention network structure and its dynamical properties to identify and characterise Twitter conversations between pairs of users and within larger groups. We consider conversational balance, meaning the fraction of content contributed by each party. The goal of this work is to draw out some of the mechanisms driving conversation in Twitter, with the potential aim of developing conversational models.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Flexibility of information systems (IS) have been studied to improve the adaption in support of the business agility as the set of capabilities to compete more effectively and adapt to rapid changes in market conditions (Glossary of business agility terms, 2003). However, most of work on IS flexibility has been limited to systems architecture, ignoring the analysis of interoperability as a part of flexibility from the requirements. This paper reports a PhD project, which proposes an approach to develop IS with flexibility features, considering some challenges of flexibility in small and medium enterprises (SMEs) such as the lack of interoperability and the agility of their business. The motivation of this research are the high prices of IS in developing countries and the usefulness of organizational semiotics to support the analysis of requirements for IS. (Liu, 2005).

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Sunflower trypsin inhibitor-1 (SFI-1), a natural 14-residue cyclic peptide, and some of its synthetic acyclic variants are potent protease inhibitors displaying peculiar inhibitory profiles. Here we describe the synthesis and use of affinity sorbents prepared by coupling SFTI-1 analogues to agarose resin. Chymotrypsinand trypsin-like proteases could then be selectively isolated from pancreatin; similarly, other proteases were obtained from distinct biological sources. The binding capacity of [Lys5]-SFTI-1-agarose for trypsin was estimated at over 10 mg/mL of packed gel. SFTI-1-based resins could find application either to improve the performance of current purification protocols or as novel protease-discovery tools in different areas of biological investigation. (C) 2009 Elsevier B.V. All rights reserved.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The present volume is the fruit of a research initiative on Access to Knowledge begun in 2004 by Yochai Benkler, Eddan Katz, and myself. Access to Knowledge is both a social movement and an approach to international and domestic policy. In the present era of globalization, intellectual property and information and communications technology are major determinants of wealth and power. The principle of access to knowledge argues that we best serve both human rights and economic development through policies that make knowledge, knowledge-creating tools, and nowledgeembedded goods as widely available as possible for decentralized innovation and use. Open technological standards, a balanced approach to intellectual property rights, and expansion of an open telecommunications infrastructure enable ordinary people around the world to benefit from the technological advances of the information age and allow them to generate a vibrant, participatory and democratic culture. Law plays a crucial role in securing access to knowledge, determining whether knowledge and knowledge goods are shared widely for the benefit of all, or controlled and monopolized for the benefit of a few.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Este trabalho minera as informações coletadas no processo de vestibular entre 2009 e 2012 para o curso de graduação de administração de empresas da FGV-EAESP, para estimar classificadores capazes de calcular a probabilidade de um novo aluno ter bom desempenho. O processo de KDD (Knowledge Discovery in Database) desenvolvido por Fayyad et al. (1996a) é a base da metodologia adotada e os classificadores serão estimados utilizando duas ferramentas matemáticas. A primeira é a regressão logística, muito usada por instituições financeiras para avaliar se um cliente será capaz de honrar com seus pagamentos e a segunda é a rede Bayesiana, proveniente do campo de inteligência artificial. Este estudo mostre que os dois modelos possuem o mesmo poder discriminatório, gerando resultados semelhantes. Além disso, as informações que influenciam a probabilidade de o aluno ter bom desempenho são a sua idade no ano de ingresso, a quantidade de vezes que ele prestou vestibular da FGV/EAESP antes de ser aprovado, a região do Brasil de onde é proveniente e as notas das provas de matemática fase 01 e fase 02, inglês, ciências humanas e redação. Aparentemente o grau de formação dos pais e o grau de decisão do aluno em estudar na FGV/EAESP não influenciam nessa probabilidade.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Telecommunication is one of the most dynamic and strategic areas in the world. Many technological innovations has modified the way information is exchanged. Information and knowledge are now shared in networks. Broadband Internet is the new way of sharing contents and information. This dissertation deals with performance indicators related to maintenance services of telecommunications networks and uses models of multivariate regression to estimate churn, which is the loss of customers to other companies. In a competitive environment, telecommunications companies have devised strategies to minimize the loss of customers. Loosing customers presents a higher cost than obtaining new ones. Corporations have plenty of data stored in a diversity of databases. Usually the data are not explored properly. This work uses the Knowledge Discovery in Databases (KDD) to establish rules and new models to explain how churn, as a dependent variable, are related to a diversity of service indicators, such as time to deploy the service (in hours), time to repair (in hours), and so on. Extraction of meaningful knowledge is, in many cases, a challenge. Models were tested and statistically analyzed. The work also shows results that allows the analysis and identification of which quality services indicators influence the churn. Actions are also proposed to solve, at least in part, this problem

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Nowadays, telecommunications is one of the most dynamic and strategic areas in the world. Organizations are always seeking to find new management practices within an ever increasing competitive environment where resources are getting scarce. In this scenario, data obtained from business and corporate processes have even greater importance, although this data is not yet adequately explored. Knowledge Discovery in Databases (KDD) appears then, as an option to allow the study of complex problems in different areas of management. This work proposes both a systematization of KDD activities using concepts from different methodologies, such as CRISP-DM, SEMMA and FAYYAD approaches and a study concerning the viability of multivariate regression analysis models to explain corporative telecommunications sales using performance indicators. Thus, statistical methods were outlined to analyze the effects of such indicators on the behavior of business productivity. According to business and standard statistical analysis, equations were defined and fit to their respective determination coefficients. Tests of hypotheses were also conducted on parameters with the purpose of validating the regression models. The results show that there is a relationship between these development indicators and the amount of sales