945 results for: text mining clusterizzazione clustering auto-organizzazione conoscenza MoK


Relevance:

100.00%

Publisher:

Abstract:

The management and sharing of complex data, information and knowledge is a fundamental and growing concern in the Water and other Industries for a variety of reasons. For example, risks and uncertainties associated with climate and other changes require knowledge to prepare for a range of future scenarios and potential extreme events. Formal ways of establishing and managing knowledge can deliver efficiencies in acquisition, structuring and filtering, so that only the essential aspects of the knowledge that is really needed are provided. Ontologies are a key technology for this knowledge management. The construction of ontologies is a considerable overhead on any knowledge management programme. Hence current computer science research is investigating the automatic generation of ontologies from documents using text mining and natural language techniques. As an example of this, results from the application of the Text2Onto tool to stakeholder documents for a project on sustainable water cycle management in new developments are presented. It is concluded that by adopting ontological representations sooner rather than later in an analytical process, decision makers will be able to make better use of highly knowledgeable systems containing automated services that ensure sustainability considerations are included. © 2010 The authors.
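The abstract does not detail the extraction pipeline, but the general idea behind ontology learning from text — harvesting candidate concepts from a document collection by their salience — can be illustrated with a minimal sketch. The snippet below is a toy term-frequency approach, not the Text2Onto algorithm; the sample documents, stopword list and threshold are hypothetical.

```python
# Toy sketch of concept-candidate extraction for ontology learning.
# This is NOT Text2Onto; it only illustrates ranking terms harvested
# from stakeholder documents by corpus salience.
from collections import Counter
import re

STOPWORDS = {"the", "and", "of", "to", "in", "for", "a", "is", "on", "with"}

def candidate_concepts(documents, top_n=20):
    """Return the top_n most frequent non-stopword terms across documents."""
    counts = Counter()
    for text in documents:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS and len(t) > 3)
    return counts.most_common(top_n)

# Hypothetical stakeholder documents on water cycle management.
docs = [
    "Sustainable water cycle management in new developments requires rainwater harvesting.",
    "Stakeholders assess risks of extreme rainfall events on drainage infrastructure.",
]
print(candidate_concepts(docs, top_n=5))
```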

Relevance:

100.00%

Publisher:

Abstract:

This paper describes the methodology followed to automatically generate titles for a corpus of questions belonging to sociological opinion polls. Titles for questions have a twofold function: (1) they are the input of user searches, and (2) they inform about the whole content of the question and its possible answer options. Thus, generation of titles can be considered a case of automatic summarization. However, the fact that summarization had to be performed over very short texts, together with the aforementioned quality conditions imposed on the newly generated titles, led the authors to follow knowledge-rich and domain-dependent strategies for summarization, disregarding the more common extractive summarization techniques.
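The paper's knowledge-rich, domain-dependent pipeline is not spelled out in the abstract; the sketch below only illustrates the general idea of template-based (rather than extractive) title generation for poll questions. The question structure and templates are hypothetical.

```python
# Minimal sketch of template-based (knowledge-rich) title generation for
# poll questions, as opposed to extractive summarization. The question
# fields and templates below are hypothetical illustrations.

def generate_title(question: dict) -> str:
    """Build a short title from the question topic and its answer options."""
    topic = question["topic"]
    options = question["answer_options"]
    if len(options) == 2 and {o.lower() for o in options} == {"yes", "no"}:
        return f"Opinion on {topic} (yes/no)"
    return f"{topic}: {len(options)} answer options"

q = {
    "text": "Do you approve of the current government's economic policy?",
    "topic": "government economic policy",
    "answer_options": ["Yes", "No"],
}
print(generate_title(q))  # -> "Opinion on government economic policy (yes/no)"
```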

Relevance:

100.00%

Publisher:

Abstract:

In a highly connected society, avid for information and technological innovation and with constantly changing consumption patterns, brand management strategy occupies a growing place. Combined with increased competition among companies, the brand that manages to differentiate itself in consumers' minds becomes strong. This aspect is even more important in the service industry, where the consumer experience and the definition and support of the brand's values are vital to the continued strength of both its identity and its image. These aspects are seen as a communication process in which the way the image develops in the minds of consumers derives from how the identity is constructed and transmitted to them (DE CHERNATONY; DRURY; SEGAL-HORN, 2004). Considering this dynamic and complex scenario, this study aims to identify and analyze possible convergences or divergences between the identity built by the organization and the brand image perceived by consumers of a telecommunications services company. To achieve this objective, the model proposed by De Chernatony, Drury and Segal-Horn (2004), which addresses the transformation of identity into brand image, was used as the theoretical basis, specifically under the perspective of Pontes (2009). For him, customers are more motivated to buy and consume products that they believe complement the image they have of themselves, and he proposes the existence of multiple selves: the perceived self, which refers to the opinions of employees and the organization's management about the brand; the ideal self, which deals with the effective brand identity conceived by its leaders, the vision of what it should be; the social self, which shows how managers think consumers see the brand; the apparent self, formed by the image of the brand held by customers; and finally the real self, an integrated composite of all of these visions. In this regard, a case study was conducted in a telecommunications company with regional operations, using a qualitative and quantitative approach. The company's vision was identified through semi-structured interviews with marketing managers and the analysis of documents related to the brand strategy. The consumers' point of view was addressed through text mining techniques applied to unstructured data collected from posts made on Facebook and Twitter related to the brand and from customer interactions with the company through these social networks. The results showed the importance of the concepts of brand identity and image and how they are interrelated. Moreover, the qualitative analysis showed that the vision of the marketing executives is quite close to and in line with the Brand Book, indicating a cohesive and well-disseminated discourse within the organization. On the other hand, when evaluating the customers' point of view, no specific comments on the brand were found, and it was not possible to identify how consumers evaluate the Algar Telecom image. Nevertheless, other aspects relevant to the consolidation of the brand identity could be identified, such as the occurrence of a number of complaints, especially regarding the internet service, as well as customers' concern for the quality of service provision.
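The abstract does not specify which text mining techniques were applied to the Facebook and Twitter posts; as a purely illustrative aside, one common first step is a simple keyword rule that flags brand-related posts as complaints, as sketched below. The keyword list and sample posts are invented, not taken from the study.

```python
# Illustrative sketch (not the study's actual pipeline): a simple keyword
# rule that flags brand-related social media posts as complaints.
COMPLAINT_TERMS = {"down", "slow", "terrible", "broken", "outage", "complaint"}

def is_complaint(post: str) -> bool:
    """Return True if the post contains any complaint keyword."""
    words = set(post.lower().split())
    return bool(words & COMPLAINT_TERMS)

posts = [  # invented examples of customer posts
    "My internet has been down all day, terrible service",
    "Great support today, my problem was solved quickly",
]
print([is_complaint(p) for p in posts])  # [True, False]
```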

Relevance:

100.00%

Publisher:

Abstract:

Background and aims: Machine learning techniques for the text mining of cancer-related clinical documents have not been sufficiently explored. Here some techniques are presented for the pre-processing of free-text breast cancer pathology reports, with the aim of facilitating the extraction of information relevant to cancer staging.

Materials and methods: The first technique was implemented using the freely available software RapidMiner to classify the reports according to their general layout: ‘semi-structured’ and ‘unstructured’. The second technique was developed using the open source language engineering framework GATE and aimed at predicting the chunks of report text containing information pertaining to the cancer morphology, the tumour size, the hormone receptor status and the number of positive nodes. The classifiers were trained and tested on sets of 635 and 163 manually classified or annotated reports, respectively, from the Northern Ireland Cancer Registry.

Results: The best result of 99.4% accuracy – with only one semi-structured report predicted as unstructured – was produced by the layout classifier with the k-nearest-neighbours algorithm, using the binary term occurrence word vector type with a stopword filter and pruning. For chunk recognition, the best results were obtained with the PAUM algorithm using the same parameters for all cases except the prediction of chunks containing cancer morphology. For semi-structured reports, precision ranged from 0.97 to 0.94 and recall from 0.92 to 0.83, while for unstructured reports precision ranged from 0.91 to 0.64 and recall from 0.68 to 0.41. Poor results were obtained when the classifier was trained on semi-structured reports but tested on unstructured ones.

Conclusions: These results show that it is possible and beneficial to predict the layout of reports and that the accuracy of prediction of which segments of a report may contain certain information is sensitive to the report layout and the type of information sought.
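The layout-classification configuration reported above (binary term-occurrence vectors, stopword filtering, k-nearest-neighbour classification) was built in RapidMiner; the snippet below is only a rough scikit-learn analogue of that setup on invented toy reports, not the authors' actual workflow.

```python
# Rough scikit-learn analogue of the RapidMiner layout classifier described
# above: binary term-occurrence vectors + English stopword filtering + k-NN.
# The example "reports" are invented placeholders, not real pathology text.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

reports = [
    "TUMOUR SIZE: 22 mm. MORPHOLOGY: ductal carcinoma. NODES POSITIVE: 2.",
    "The specimen shows an invasive ductal carcinoma measuring about 22 mm "
    "with two of the sampled lymph nodes involved.",
]
labels = ["semi-structured", "unstructured"]

clf = make_pipeline(
    CountVectorizer(binary=True, stop_words="english"),  # binary term occurrence
    KNeighborsClassifier(n_neighbors=1),
)
clf.fit(reports, labels)
print(clf.predict(["MORPHOLOGY: lobular carcinoma. TUMOUR SIZE: 15 mm."]))
```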

Relevance:

100.00%

Publisher:

Abstract:

A history of specialties in economics since the late 1950s is constructed on the basis of a large corpus of documents from economics journals. The production of this history relies on a combination of algorithmic methods that avoid subjective assessments of the boundaries of specialties: bibliographic coupling, automated community detection in dynamic networks, and text mining. These methods uncover a structuring of economics around recognizable specialties with some significant changes over the period covered (1956–2014). Among our results, especially noteworthy are (1) the clear-cut existence of ten families of specialties, (2) the disappearance in the late 1970s of a specialty focused on general economic theory, (3) the dispersal of the econometrics-centered specialty in the early 1990s and the ensuing importance of specific econometric methods for the identity of many specialties since the 1990s, and (4) the low level of specialization of individual economists throughout the period in contrast to physicists as early as the late 1960s.
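The abstract names bibliographic coupling and community detection but not an implementation; the sketch below shows, on invented data, how bibliographic-coupling weights (numbers of shared references between papers) can be computed and fed to an off-the-shelf community-detection routine in networkx. It is not the authors' pipeline.

```python
# Illustrative sketch (not the authors' pipeline): bibliographic coupling on
# an invented mini-corpus, followed by standard community detection.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Paper -> set of references it cites (all invented).
references = {
    "paper_A": {"r1", "r2", "r3"},
    "paper_B": {"r2", "r3", "r4"},
    "paper_C": {"r7", "r8"},
    "paper_D": {"r7", "r8", "r9"},
}

G = nx.Graph()
papers = list(references)
for i, p in enumerate(papers):
    for q in papers[i + 1:]:
        shared = len(references[p] & references[q])  # coupling strength
        if shared:
            G.add_edge(p, q, weight=shared)

communities = greedy_modularity_communities(G, weight="weight")
print([sorted(c) for c in communities])  # two clusters: {A, B} and {C, D}
```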

Relevance:

100.00%

Publisher:

Abstract:

As a way to gain greater insights into the operation of online communities, this dissertation applies automated text mining techniques to text-based communication to identify, describe and evaluate underlying social networks among online community members. The main thrust of the study is to automate the discovery of social ties that form between community members, using only the digital footprints left behind in their online forum postings. Currently, one of the most common but time consuming methods for discovering social ties between people is to ask questions about their perceived social ties. However, such a survey is difficult to collect due to the high investment in time associated with data collection and the sensitive nature of the types of questions that may be asked. To overcome these limitations, the dissertation presents a new, content-based method for automated discovery of social networks from threaded discussions, referred to as ‘name network’. As a case study, the proposed automated method is evaluated in the context of online learning communities. The results suggest that the proposed ‘name network’ method for collecting social network data is a viable alternative to costly and time-consuming collection of users’ data using surveys. The study also demonstrates how social networks produced by the ‘name network’ method can be used to study online classes and to look for evidence of collaborative learning in online learning communities. For example, educators can use name networks as a real time diagnostic tool to identify students who might need additional help or students who may provide such help to others. Future research will evaluate the usefulness of the ‘name network’ method in other types of online communities.
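The dissertation's ‘name network’ algorithm is not detailed in the abstract; the sketch below only illustrates its core intuition — inferring a tie whenever a forum post mentions another community member's name — on invented posts, and should not be read as the actual method.

```python
# Illustrative sketch of the 'name network' intuition (not the dissertation's
# actual algorithm): add a directed tie when a poster mentions another
# community member's name in a forum post. Posts below are invented.
import networkx as nx

posts = [
    {"author": "alice", "text": "Thanks Bob, your explanation of recursion helped a lot."},
    {"author": "bob",   "text": "Good question Carol, see my earlier reply."},
    {"author": "carol", "text": "Alice and Bob, could you review my solution?"},
]
members = {p["author"] for p in posts}

G = nx.DiGraph()
for post in posts:
    mentioned = {w.strip(",.?!").lower() for w in post["text"].split()} & members
    for name in mentioned - {post["author"]}:
        G.add_edge(post["author"], name)  # author -> mentioned member

print(sorted(G.edges()))
```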

Relevance:

100.00%

Publisher:

Abstract:

Institutions are widely regarded as important, even ultimate, drivers of economic growth and performance. A recent mainstream of institutional economics has concentrated on the effect of persisting, often imprecisely measured institutions and on cataclysmic events as agents of noteworthy institutional change. As a consequence, institutional change without large-scale shocks has received little attention. In this dissertation I apply a complementary, quantitative-descriptive approach that relies on measures of actually enforced institutions to study institutional persistence and change over a long time period undisturbed by the typically studied cataclysmic events. By placing institutional change at the center of attention, one can recognize different speeds of institutional innovation and the continuous coexistence of institutional persistence and change. Specifically, I combine text mining procedures, network analysis techniques and statistical approaches to study persistence and change in England’s common law over the Industrial Revolution (1700-1865). Based on the doctrine of precedent - a peculiarity of common law systems - I construct and analyze what appears to be the first citation network reflecting lawmaking in England. Most strikingly, I find large-scale change in the making of English common law around the turn of the 19th century - a period free from the typically studied cataclysmic events. Within a few decades, a legal innovation process with low depreciation rates (1 to 2 percent) and strong past-persistence transitioned to a present-focused innovation process with significantly higher depreciation rates (4 to 6 percent) and weak past-persistence. Comparison with U.S. Supreme Court data reveals a similar U.S. transition towards the end of the 19th century. The English and U.S. transitions appear to have unfolded in a very specific manner: a new body of law arose during the transitions and developed in a self-referential manner while the existing body of law lost influence but remained prominent. Additional findings suggest that Parliament doubled its influence on the making of case law within the first decades after the Glorious Revolution and that England’s legal rules manifested a high degree of long-term persistence. The latter allows for the possibility that the often-noted persistence of institutional outcomes derives from the actual persistence of institutions.
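The abstract quantifies legal innovation through depreciation rates of citations to precedent. As a purely illustrative aside, one simple way to estimate such a rate from a citation network is to fit an exponential decay to the distribution of citation ages, as sketched below with invented data; this is not the dissertation's estimator.

```python
# Illustrative estimate of a citation "depreciation rate" (not the
# dissertation's estimator): fit an exponential decay to citation ages,
# i.e. the gap in years between a citing case and the precedent it cites.
import math
from statistics import mean

# Invented citation ages (citing year minus cited year) for a sample of cases.
citation_ages = [1, 2, 2, 3, 5, 8, 10, 15, 22, 40]

# For an exponential distribution with rate d, the mean age is 1/d,
# so a crude estimate of the annual depreciation rate is 1/mean(age).
depreciation_rate = 1.0 / mean(citation_ages)
half_life = math.log(2) / depreciation_rate

print(f"estimated depreciation rate: {depreciation_rate:.3f} per year")
print(f"implied citation half-life: {half_life:.1f} years")
```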

Relevance:

100.00%

Publisher:

Abstract:

New developments in higher education and research are having repercussions in daily licensing practice. Examples are: demands for perpetual access; usage of licensed content in course packs or virtual research environments; text mining; and open access to publications. At the Knowledge Exchange workshop on Licensing Practice, twenty experts discussed how these new developments could be incorporated in licensing. The workshop consisted of four presentations on current developments in licensing, followed by three parallel breakout sessions on the topics of open access, new developments, and data and text mining. This led to a lively exchange of ideas. The aspect of data and text mining in particular provided valuable insights into how this could be incorporated in licensing. The Knowledge Exchange Licensing expert group will work on how to implement the model provisions discussed. Input from the workshop was collected for a workshop with publishers to take place in March 2012, which will include these provisions in its licences. The various suggestions will also be shared with other international organisations working in this field.

Relevance:

100.00%

Publisher:

Abstract:

Teaching presentation for the course "Ingeniería del conocimiento biomédico y del producto, I+D en investigación traslacional" of the Master Universitario Investigación Traslacional y Medicina Personalizada (Transmed) at the Universidad de Granada.

Relevance:

100.00%

Publisher:

Abstract:

Non-steroidal anti-inflammatory drugs (NSAIDs) are widely used in equine veterinary practice. These drugs exert their effect by inhibiting cyclooxygenase (COX) enzymes, which control the production of prostaglandins, major regulators of tissue perfusion. Two isoforms of COX enzymes exist: COX-1 is physiologically present in tissues, while COX-2 is up-regulated during inflammation and has been indicated as responsible for the negative effects of an inflammatory response. Evidence suggests that NSAIDs that inhibit only COX-2, preserving the physiological function of COX-1, might have a safer profile. Studies that evaluate the effect of NSAIDs on COX enzymes have all been performed under experimental conditions, and none uses actual clinical patients. The biochemical investigations in this work focus on describing the effect on COX enzyme activity of flunixin meglumine and phenylbutazone, two non-selective COX inhibitors, and firocoxib, a COX-2 selective inhibitor, in clinical patients undergoing elective surgery. A separate epidemiological investigation was aimed at describing the impact of the biochemical findings on a large population of equids. Electronic medical records (EMRs) from 454,153 equids were obtained from practices in the United Kingdom, the United States of America and Canada. Information on the prevalence of and indications for NSAID use was extracted from the EMRs via a text mining technique, improved from the literature and described and validated within this thesis. Further, the prevalence of a clinical sign compatible with NSAID toxicity, such as diarrhoea, is reported, along with analyses evaluating NSAID administration in light of concurrent administration of other drugs and of comorbidities. This work confirms findings from experimental settings that the NSAID firocoxib is COX-2 selective and that flunixin meglumine and phenylbutazone are non-selective COX inhibitors, whose administration therefore carries a greater risk of toxicity. However, the impact of this finding needs to be interpreted with caution, as the epidemiological data suggest that the prevalence of toxicity is in fact small and that the use of these drugs at the labelled dose is quite safe.
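The thesis's validated text-mining technique for the EMRs is not described in the abstract; the snippet below is only a naive illustration of the general idea — matching drug-name patterns in free-text records to flag NSAID administrations — with invented records and a deliberately incomplete synonym list.

```python
# Naive illustration (not the thesis's validated technique): flag NSAID
# mentions in free-text veterinary records via a small synonym dictionary.
import re

NSAID_PATTERNS = {  # incomplete, illustrative synonym list only
    "flunixin meglumine": r"\bflunixin( meglumine)?\b",
    "phenylbutazone": r"\bphenylbutazone\b|\bbute\b",
    "firocoxib": r"\bfirocoxib\b",
}

records = [  # invented EMR free-text entries
    "Administered bute 2.2 mg/kg PO BID for lameness.",
    "Colic work-up; flunixin meglumine given IV, re-examine tomorrow.",
    "Routine dental float, no medication dispensed.",
]

for rec in records:
    hits = [drug for drug, pat in NSAID_PATTERNS.items() if re.search(pat, rec, re.I)]
    print(hits or "no NSAID mention", "->", rec)
```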

Relevance:

100.00%

Publisher:

Abstract:

Prior research shows that electronic word of mouth (eWOM) wields considerable influence over consumer behavior. However, as the volume and variety of eWOM grows, firms are faced with challenges in analyzing and responding to this information. In this dissertation, I argue that to meet the new challenges and opportunities posed by the expansion of eWOM and to more accurately measure its impacts on firms and consumers, we need to revisit our methodologies for extracting insights from eWOM. This dissertation consists of three essays that further our understanding of the value of social media analytics, especially with respect to eWOM. In the first essay, I use machine learning techniques to extract semantic structure from online reviews. These semantic dimensions describe the experiences of consumers in the service industry more accurately than traditional numerical variables. To demonstrate the value of these dimensions, I show that they can be used to substantially improve the accuracy of econometric models of firm survival. In the second essay, I explore the effects on eWOM of online deals, such as those offered by Groupon, the value of which to both consumers and merchants is controversial. Through a combination of Bayesian econometric models and controlled lab experiments, I examine the conditions under which online deals affect online reviews and provide strategies to mitigate the potential negative eWOM effects resulting from online deals. In the third essay, I focus on how eWOM can be incorporated into efforts to reduce foodborne illness, a major public health concern. I demonstrate how machine learning techniques can be used to monitor hygiene in restaurants through crowd-sourced online reviews. I am able to identify instances of moral hazard within the hygiene inspection scheme used in New York City by leveraging a dictionary specifically crafted for this purpose. To the extent that online reviews provide some visibility into the hygiene practices of restaurants, I show how losses from information asymmetry may be partially mitigated in this context. Taken together, this dissertation contributes by revisiting and refining the use of eWOM in the service sector through a combination of machine learning and econometric methodologies.
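The first essay's machine learning approach to extracting semantic dimensions is not specified in the abstract; a common, purely illustrative stand-in is a small topic model over review text, as sketched below with invented reviews (scikit-learn LDA, not the dissertation's method).

```python
# Illustrative stand-in (not the dissertation's method): extract a few latent
# "semantic dimensions" from review text with a small LDA topic model.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

reviews = [  # invented review snippets
    "The staff were friendly and the service was fast",
    "Rude waiter, slow service, will not return",
    "Beautiful decor and a cozy, quiet atmosphere",
    "Loud music but the ambience was stylish",
]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(reviews)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

terms = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-4:][::-1]]
    print(f"dimension {k}:", ", ".join(top))  # e.g. service vs. atmosphere
```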

Relevance:

100.00%

Publisher:

Abstract:

This paper deals with the interaction between fictitious capital and the neoliberal model of growth and distribution, inspired by the classical economic tradition. Our renewed interest in this literature is closely connected with the recent international crisis in the capitalist economy. However, this discussion takes as its point of departure the fact that standard economic theory teaches that financial capital, in this world of increasing globalization, leads to new investment opportunities which improve levels of growth, employment, income distribution, and equilibrium. Accordingly, it is said that such financial resources expand the welfare of people and countries worldwide. Here we examine some illusions and paradoxes of such a paradigm. We show some theoretical and empirical consequences of this vision, which are quite different from what it promises and which entail harmful constraints.

Relevance:

100.00%

Publisher:

Abstract:

Master's dissertation — Universidade de Brasília, Instituto de Ciências Exatas, Departamento de Ciência da Computação, 2015.

Relevance:

100.00%

Publisher:

Abstract:

A history of specialties in economics since the late 1950s is constructed on the basis of a large corpus of documents from economics journals. The production of this history relies on a combination of algorithmic methods that avoid subjective assessments of the boundaries of specialties: bibliographic coupling, automated community detection in dynamic networks, and text mining. These methods uncover a structuring of economics around recognizable specialties with some significant changes over the time period covered (1956-2014). Among our results, especially noteworthy are (a) the clear-cut existence of 10 families of specialties, (b) the disappearance in the late 1970s of a specialty focused on general economic theory, (c) the dispersal of the econometrics-centered specialty in the early 1990s and the ensuing importance of specific econometric methods for the identity of many specialties since the 1990s, and (d) the low level of specialization of individual economists throughout the period in contrast to physicists as early as the late 1960s.

Relevance:

100.00%

Publisher:

Abstract:

In this work we address Zipf's law from both an applied and a theoretical point of view. This empirical law states that the rank-frequency (RF) distribution of the words in a text follows a power law with exponent -1. On the theoretical side we treat two classes of models capable of producing power laws in their probability distributions. In particular, we consider generalizations of Polya urns and SSR (Sample Space Reducing) processes. For the latter we give a formalization in terms of Markov chains. Finally, we propose a population-dynamics model capable of unifying and reproducing the results of the three SSR processes found in the literature. We then turn to a quantitative analysis of the rank-frequency behaviour of the words in a corpus of texts. In this case the RF does not follow a pure power law but shows a double regime that can be represented by a power law with a changing exponent. We investigated whether the analysis of the RF behaviour can be linked to the topological properties of a graph. In particular, starting from a corpus of texts we built an adjacency network in which each word is connected by a link to the following word. A topological analysis of the graph structure yielded results that seem to confirm the hypothesis that its structure is related to the change of slope of the RF. This result may lead to developments in the study of language and of the human mind. Moreover, since the graph structure appears to contain components that group words by meaning, a deeper investigation along these lines could lead to developments in automatic text comprehension (text mining).
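As a purely illustrative aside (not the thesis's analysis), the rank-frequency relation discussed above can be computed for any small corpus as sketched below; on a real corpus one would inspect the power-law regime(s), and their change of slope, on a log-log plot. The tiny corpus here is invented.

```python
# Illustrative rank-frequency (Zipf) computation on a tiny invented corpus;
# the thesis performs this analysis on a real corpus and studies the change
# of slope of the rank-frequency relation on a log-log scale.
from collections import Counter
import math

corpus = (
    "the cat sat on the mat and the dog sat on the rug "
    "the cat and the dog looked at the mat"
)
freqs = sorted(Counter(corpus.split()).values(), reverse=True)
ranks = range(1, len(freqs) + 1)

# Crude least-squares fit of log(freq) = a + s*log(rank); under an ideal
# Zipf law the slope s is close to -1.
xs = [math.log(r) for r in ranks]
ys = [math.log(f) for f in freqs]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
print("fitted Zipf exponent:", round(slope, 2))
```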