925 results for Text Mining





One way to organize knowledge and make its search and retrieval easier is to create a structural representation divided by hierarchically related topics. Once this structure is built, it is necessary to find labels for each of the obtained clusters. In many cases the labels have to be built using only the terms in the documents of the collection. This paper presents the SeCLAR (Selecting Candidate Labels using Association Rules) method, which explores the use of association rules for the selection of good candidates for labels of hierarchical document clusters. The candidates are processed by a classical method to generate the labels. The idea of the proposed method is to process each parent-child relationship of the nodes as an antecedent-consequent relationship of association rules. The experimental results show that the proposed method can improve the precision and recall of labels obtained by classical methods. © 2010 Springer-Verlag.
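The idea of reading a parent-child relationship as an antecedent-consequent rule can be illustrated with a small sketch. This is not the SeCLAR implementation: the function names, the toy documents, and the choice of support and confidence as rule measures are assumptions made only for illustration.

```python
def support(term_set, documents):
    """Fraction of documents (each a set of terms) that contain every term in term_set."""
    hits = sum(1 for doc in documents if term_set <= doc)
    return hits / len(documents)


def rule_confidence(antecedent, consequent, documents):
    """Confidence of the rule antecedent -> consequent: supp(A and C) / supp(A)."""
    base = support({antecedent}, documents)
    if base == 0:
        return 0.0
    return support({antecedent, consequent}, documents) / base


# Hypothetical toy collection: each document is reduced to its set of terms.
docs = [
    {"mining", "text", "cluster"},
    {"mining", "text", "label"},
    {"mining", "data"},
    {"text", "label"},
]

# Assumption: a term from a parent node plays the antecedent,
# and a term from one of its child nodes plays the consequent.
conf = rule_confidence("mining", "text", docs)
print(f"confidence(mining -> text) = {conf:.3f}")
```

In a sketch like this, terms that take part in high-confidence rules would be kept as label candidates and then handed to a classical labeling method.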





One way to organize knowledge and make its search and retrieval easier is to create a structural representation divided by hierarchically related topics. Once this structure is built, it is necessary to find labels for each of the obtained clusters. In many cases the labels must be built using all the terms in the documents of the collection. This paper presents the SeCLAR method, which explores the use of association rules in the selection of good candidates for labels of hierarchical document clusters. The purpose of this method is to select a subset of terms by exploring the relationship among the terms of each document. These candidates can then be processed by a classical method to generate the labels. An experimental study demonstrates the potential of the proposed approach to improve the precision and recall of labels obtained by classical methods by considering only the terms that are potentially more discriminative. © 2012 - IOS Press and the authors. All rights reserved.





The aim of this thesis, titled “Analisi di tecniche per l’estrazione di informazioni da documenti testuali e non strutturati” (Analysis of techniques for extracting information from textual and unstructured documents), is to present computational techniques and methodologies for deriving information and knowledge from data in textual form. The topics covered include the analysis of software for information extraction, the Semantic Web, and the importance of data, in particular Big Data, Open Data and Linked Data. Data mining and text mining are also discussed.





Communication is shifting from a centralized scenario, where media such as newspapers, radio and TV programs produce information and people are merely consumers, to a decentralized one, where everyone is potentially an information producer through social networks, blogs and forums that allow real-time worldwide information exchange. As a result of their widespread diffusion, these new instruments have started to play an important socio-economic role. They are the most used communication media and, as a consequence, they constitute the main source of information that enterprises, political parties and other organizations can rely on. Analyzing data stored in servers all over the world is feasible by means of Text Mining techniques such as Sentiment Analysis, which aims to extract opinions from huge amounts of unstructured text. This could make it possible to determine, for instance, the degree of user satisfaction with products, services, politicians and so on. In this context, this dissertation presents new Document Sentiment Classification methods based on the mathematical theory of Markov Chains. All these approaches rely on a Markov Chain based model, which is language independent and whose key features are simplicity and generality, making it attractive compared with previous, more sophisticated techniques. Every technique discussed has been tested in both Single-Domain and Cross-Domain Sentiment Classification, and its performance compared with that of two previous works. The analysis shows that some of the examined algorithms produce results comparable with the best methods in the literature, for both single-domain and cross-domain tasks, in $2$-class (i.e. positive and negative) Document Sentiment Classification.
However, there is still room for improvement: this work also indicates how performance could be enhanced, namely that a good novel feature selection process would be enough to outperform the state of the art. Furthermore, since some of the proposed approaches show promising results in $2$-class Single-Domain Sentiment Classification, future work will also validate these results in tasks with more than $2$ classes.





This thesis concerns the development of an application that exploits Semantic Web and Text Mining technologies. The application extends work from a previous thesis by adding semantic search functionality. This functionality allows the retrieval of information that a normal search method would not consider. To achieve this, the application uses WordNet, a lexical-semantic database, and a library for Latent Semantic Analysis, a Text Mining technique.
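How a semantic search can surface documents that a plain keyword search misses can be sketched as follows. This is only an illustration, not the thesis application: the small `synonyms` dictionary is a hypothetical stand-in for WordNet lookups, the documents are invented, and Latent Semantic Analysis is not shown.

```python
def expand_query(terms, synonyms):
    """Return the query terms plus any synonyms listed for them."""
    expanded = set(terms)
    for term in terms:
        expanded |= synonyms.get(term, set())
    return expanded


def search(query_terms, documents, synonyms=None):
    """Return ids of documents sharing at least one word with the (optionally expanded) query."""
    terms = set(query_terms)
    if synonyms is not None:
        terms = expand_query(terms, synonyms)
    return [
        doc_id
        for doc_id, text in documents.items()
        if terms & set(text.lower().split())
    ]


# Hypothetical toy data; a real system would look synonyms up in WordNet.
documents = {
    "d1": "the car is red",
    "d2": "an automobile was parked",
    "d3": "text mining thesis",
}
synonyms = {"car": {"automobile"}}

plain = search(["car"], documents)               # keyword match only
semantic = search(["car"], documents, synonyms)  # with synonym expansion
print(plain, semantic)
```

The expanded query reaches the document that mentions only "automobile", which the plain keyword search does not return.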





In recent years web documents have attracted a great deal of attention, since they are seen as a new medium that carries an individual's experiences and opinions from one side of the world to the other, reaching people who will never meet. It is precisely with the proliferation of Web 2.0 that attention has focused on content generated by users of the network, who have various platforms on which to share their thoughts and opinions or to look for those of others, perhaps to weigh the purchase of one smartphone over another or to consider changing telephone operator, assessing the advantages and disadvantages of changing their current situation. This wide availability of information is very valuable to individuals and organizations, which must nevertheless contend with the great difficulty of finding the sources of such opinions, extracting them, and expressing them in a standard format. These operations would be almost impossible to carry out by hand, hence the need to automate them, and Sentiment Analysis is the answer to this need. Sentiment analysis (or Opinion Mining, as it is sometimes called) is one of the many computational fields of study that deal with natural language processing oriented toward the extraction of opinions. In recent years it has proved to be one of the new trending fields in the social media sector, with a range of applications in the economic, political and social spheres. This thesis aims to provide an overview of the state of this field of study, presenting methods and techniques and their application in some studies carried out in recent years.





The short technology cycles in the IT industry confront companies with the problem of giving employees further training that is timely and relevant in subject matter. For education providers, this creates the challenge of identifying relevant training topics as early as possible, evaluating them economically, and bringing selected topics to market in the form of suitable training offerings. To address this problem, an innovative analysis instrument was developed at the Hochschule für Telekommunikation Leipzig (HfTL), which is sponsored by Deutsche Telekom AG. With this instrument, the IT-KompetenzBarometer, job advertisements published online in job portals are extracted and examined using text mining methods. In this way, information can be obtained that gives differentiated insight into the qualitative competence requirements of central occupational profiles in the IT sector. This contribution presents results obtained by analyzing more than 40,000 job advertisements for IT professionals from job portals in the period from June to September 2012. These results provide an information basis for identifying market-relevant training topics, so that training offerings can be successfully designed and further developed.





Antimicrobial drugs may be used to treat diarrheal illness in companion animals. It is important to monitor antimicrobial use to better understand trends and patterns in antimicrobial resistance. There is no monitoring of antimicrobial use in companion animals in Canada. To explore how the use of electronic medical records could contribute to the ongoing, systematic collection of antimicrobial use data in companion animals, anonymized electronic medical records were extracted from 12 participating companion animal practices and warehoused at the University of Calgary. We used the pre-diagnostic, clinical features of diarrhea as the case definition in this study. Using text-mining technologies, cases of diarrhea were described by each of the following variables: diagnostic laboratory tests performed, the etiological diagnosis and antimicrobial therapies. The ability of the text miner to accurately describe the cases for each of the variables was evaluated. It could not reliably classify cases in terms of diagnostic tests or etiological diagnosis; a manual review of a random sample of 500 diarrhea cases determined that 88/500 (17.6%) of the target cases underwent diagnostic testing of which 36/88 (40.9%) had an etiological diagnosis. Text mining, compared to a human reviewer, could accurately identify cases that had been treated with antimicrobials with high sensitivity (92%, 95% confidence interval, 88.1%-95.4%) and specificity (85%, 95% confidence interval, 80.2%-89.1%). Overall, 7400/15,928 (46.5%) of pets presenting with diarrhea were treated with antimicrobials. Some temporal trends and patterns of the antimicrobial use are described. The results from this study suggest that informatics and the electronic medical records could be useful for monitoring trends in antimicrobial use.
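The evaluation measures quoted in the abstract above can be made concrete with a short sketch. The proportions use the counts reported in the abstract; the confusion-matrix counts are hypothetical, chosen only to reproduce the reported point estimates of 92% and 85%, because the abstract does not give the underlying counts. Function names are illustrative.

```python
def sensitivity(tp, fn):
    """True positive rate: share of truly treated cases that the text miner flags."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: share of untreated cases that the text miner leaves unflagged."""
    return tn / (tn + fp)


def percent(numerator, denominator, digits=1):
    """Proportion expressed as a rounded percentage."""
    return round(100 * numerator / denominator, digits)


# Proportions recomputed from the counts stated in the abstract.
tested = percent(88, 500)           # manually reviewed cases with diagnostic testing
diagnosed = percent(36, 88)         # tested cases with an etiological diagnosis
treated = percent(7400, 15928)      # diarrhea cases treated with antimicrobials

# HYPOTHETICAL confusion-matrix counts (not from the study).
sens = sensitivity(tp=92, fn=8)
spec = specificity(tn=85, fp=15)
print(tested, diagnosed, treated, sens, spec)
```

Here the human reviewer is treated as the reference standard, which matches how the abstract describes the comparison.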





This project continues genetic criticism projects that have been carried out, or are under way, at the Secretaría de Investigación of the Facultad de Humanidades of UNaM, which take manuscripts of provincial literature as their object. The work of this project involves a network of initial theoretical, critical and methodological agreements, the tracing and identification of documents in the region, and the arrangement of loans from the current holders of the manuscripts, to which is added an interdisciplinary dimension through the dialogue between genetic criticism and computer science. In light of this dialogue, the project proposes in this first stage to promote three actions: a) develop a virtual institutional site that facilitates online access to the archives of regional writers being studied at UNaM; b) survey the manuscript archives that are currently scattered and invisible to research in order, through that gesture, to recover them and encourage their study; c) design and build a database and a digital repository of manuscripts, using Open Source software for this task; d) lay the foundations for a study of the feasibility of implementing a Text Mining process that automates the retrieval of relevant information, categorizes the documents, and groups them according to common characteristics; e) strengthen institutional ties with other existing projects in Argentina (UNLP), France (CRLA-Archivos), Belgium (UCLovaina) and Spain (Universidad de Castilla La Mancha), and with UNNE and UNLa, with which we already have a collaboration agreement on data mining.





This project continues genetic criticism projects that have been carried out, or are under way, at the Secretaría de Investigación of the Facultad de Humanidades of UNaM, which take manuscripts of provincial literature as their object. The work of this project involves a network of initial theoretical, critical and methodological agreements, the tracing and identification of documents in the region, and the arrangement of loans from the current holders of the manuscripts, to which is added an interdisciplinary dimension through the dialogue between genetic criticism and computer science. The project proposes in this first stage to promote three actions: a) develop a virtual institutional site that facilitates online access to the archives of regional writers being studied at UNaM; b) survey the manuscript archives that are currently scattered and invisible to research in order, through that gesture, to recover them and encourage their study; c) design and build a database and a digital repository of manuscripts, using Open Source software for this task; d) lay the foundations for a study of the feasibility of implementing a Text Mining process that automates the retrieval of relevant information, categorizes the documents, and groups them according to common characteristics; e) strengthen institutional ties with other existing projects in Argentina (UNLP), and with UNNE and UNLa, with which we already have a collaboration agreement on data mining, as well as with France (CRLA-Archivos), Belgium (UCLovaina) and Spain (Universidad de Castilla La Mancha).





Over the last few years, and particularly in the context of the COMBIOMED network, our biomedical informatics (BMI) group at the Universidad Politecnica de Madrid has pursued several approaches to address a fundamental issue: facilitating open access to, and retrieval of, BMI resources, including software, databases and services. In this regard, we have followed various directions: a) a text mining-based approach to automatically build a “resourceome”, an inventory of open resources; b) methods for heterogeneous database integration, including clinical, -omics and nanoinformatics sources; c) creating various services to provide access to different resources to African users and professionals; and d) an approach to facilitate access to open resources from research projects.





Access to medical literature collections such as PubMed, MedScape or Cochrane has increased notably in recent years thanks to web-based tools that provide instant access to the information. However, more sophisticated methodologies are needed to exploit all that information efficiently. Because advanced search methods are lacking in the clinical domain, clinicians receive too many results even when using well-defined questions for a particular disease. Since no information analysis is applied afterwards, relevant results that do not appear at the top of the result list may be overlooked by the expert, causing a significant loss of information. In this work we present a new method to improve scientific article search using patient information for query generation. Using a federated search strategy, it can search different resources simultaneously and present a single collection of relevant literature. By applying NLP techniques, it presents semantically similar publications together, making it easier for clinicians to identify relevant information. This method aims to be the foundation of a collaborative environment for sharing clinical knowledge related to patients and scientific publications.
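The two steps named in the abstract above, federated search and grouping of similar publications, can be sketched as follows. This is not the authors' method: the source functions are hypothetical stand-ins for real literature services, and word-overlap cosine similarity is a crude substitute for the NLP techniques the abstract refers to.

```python
import math
from collections import Counter


def federated_search(query, sources):
    """Query every source and merge the results, dropping case-insensitive duplicate titles."""
    seen, merged = set(), []
    for search_fn in sources:
        for title in search_fn(query):
            key = title.lower()
            if key not in seen:
                seen.add(key)
                merged.append(title)
    return merged


def cosine(a, b):
    """Cosine similarity between the bag-of-words vectors of two titles."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def group_similar(titles, threshold=0.5):
    """Greedy grouping: a title joins the first group whose first member is similar enough."""
    groups = []
    for title in titles:
        for group in groups:
            if cosine(title, group[0]) >= threshold:
                group.append(title)
                break
        else:
            groups.append([title])
    return groups


# Hypothetical sources returning invented titles; real ones would call remote services.
def source_a(query):
    return ["Diabetes treatment in adults", "Asthma in children"]


def source_b(query):
    return ["diabetes treatment in adults", "Diabetes treatment in elderly adults"]


merged = federated_search("diabetes", [source_a, source_b])
groups = group_similar(merged)
print(groups)
```

The threshold of 0.5 is an arbitrary illustrative value; a real system would need to tune it, or replace the similarity measure entirely.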





In recent years the online presence of people involved in politics has increased owing to the proliferation of social networks, with Twitter having the greatest media impact in this field. Studying the behaviour of politicians on Twitter, and how citizens receive them, provides very valuable information for analysing electoral campaigns. It makes it possible to study the real impact of their messages on election results and to identify which behaviours are better accepted by citizens. Thanks to advances in the field of text mining, tools are available to analyse a large volume of texts and extract useful information from them. This project aims to collect a significant sample of Twitter messages from the candidates of the main political parties standing in the 2015 regional elections in Madrid. These messages, together with the replies from other users, have been analysed using machine learning algorithms and the most suitable text mining techniques. The results obtained for each politician have been examined in depth and presented in tables and graphs to make them easier to understand.