38 results for Sentiment Analysis Opinion Mining Text Mining Twitter

in Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland


Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this thesis we study the field of opinion mining by giving a comprehensive review of the available research on the topic. Using this knowledge, we also present a case study of a multilevel opinion mining system for a student organization's sales management system. We describe the field of opinion mining by discussing its historical roots, its motivations and applications, and the different scientific approaches that have been used to solve the challenging problem of mining opinions. To deal with this huge subfield of natural language processing, we first give an abstraction of the problem of opinion mining and describe the theoretical frameworks available for dealing with appraisal language. Then we discuss the relation between opinion mining and computational linguistics, which provides a pre-processing step that is crucial for the accuracy of the subsequent steps of opinion mining. The second part of the thesis deals with the semantics of opinions: we describe the different ways of collecting lists of opinion words as well as the methods and techniques available for extracting knowledge from opinions present in unstructured textual data. In the part on collecting lists of opinion words, we describe manual, semi-manual and automatic approaches and review the available lists that are used as gold standards in opinion mining research. For the methods and techniques of opinion mining, we divide the task into three levels: the document, sentence and feature levels. The techniques presented at the document and sentence levels are divided into supervised and unsupervised approaches, which are used to determine the subjectivity and polarity of texts and sentences. At the feature level, we describe the techniques available for finding the opinion targets, the polarity of the opinions about these targets, and the opinion holders.
Also at the feature level, we discuss the various ways to summarize and visualize the results of the analysis. In the third part of the thesis we present a case study of a sales management system that uses free-form text and that can benefit from an opinion mining system. Using the knowledge gathered in the review of the field, we propose a theoretical multilevel opinion mining system (MLOM) that can perform most of the tasks required of an opinion mining system. Based on previous research, we suggest that such a system could support many of the laborious market research tasks carried out by the sales force that uses this sales management system, improving its insight into its partners and thereby the quality of its sales services and its overall results.
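
The unsupervised, lexicon-based route to sentence- and document-level polarity mentioned in the abstract above can be illustrated with a minimal sketch. It is not the thesis's MLOM system: the tiny word list, the negation rule and the function names `sentence_polarity` and `document_polarity` are all assumptions made for illustration.

```python
import re

# Hypothetical toy lexicon; real systems rely on curated opinion-word lists.
OPINION_LEXICON = {
    "good": 1, "great": 1, "helpful": 1,
    "bad": -1, "slow": -1, "poor": -1,
}
NEGATIONS = {"not", "no", "never"}


def sentence_polarity(sentence):
    """Return 'positive', 'negative' or 'neutral' for one sentence."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    score = 0
    negate = False
    for token in tokens:
        if token in NEGATIONS:
            negate = True  # flips the next opinion word that follows
            continue
        value = OPINION_LEXICON.get(token, 0)
        if value:
            score += -value if negate else value
            negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def document_polarity(text):
    """Aggregate sentence labels into a document label by simple majority."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    labels = [sentence_polarity(s) for s in sentences]
    balance = labels.count("positive") - labels.count("negative")
    if balance > 0:
        return "positive"
    if balance < 0:
        return "negative"
    return "neutral"
```

Calling `document_polarity` on a free-form note classifies each sentence first and then takes the majority label. Feature-level analysis (opinion targets and holders) would need more than this word-counting approach.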

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This work, Identification of Research Portfolio for Development of Filtration Equipment, presents a novel approach to identifying promising research topics in the design and development of filtration equipment and processes. The proposed approach consists of identifying technological problems often encountered in filtration processes. The sources of information for problem retrieval were patent documents and scientific papers that discuss filtration equipment and processes. The problem identification method adopted in this work focused on the semantic nature of a sentence in order to generate a series of subject-action-object structures. This was achieved with software called Knowledgist. Lists of problems often encountered in filtration processes and mentioned in patent documents and scientific papers were generated. These problems were carefully studied and categorized. Suggestions were made on which classes of these problems need further investigation in order to propose a research portfolio. The uses and importance of other methods of information retrieval were also highlighted.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This thesis examines visitor tracking methods and applies them in practice. It reviews how web analytics software works, focusing mainly on Google Analytics. The aim is to determine the usage volumes of the tourism terminals in Lappeenranta and to break them down by device. A literature review of web analytics is conducted, and visitor tracking data from two different websites is analyzed and compared. In addition, the logs of the tourism terminals' website are examined by data mining, using a Python application developed for the purpose. The work shows that, with the current implementation, the usage of the tourism terminals cannot be broken down by device. The number of sessions and events can nevertheless be tracked. Several problems are identified in the visitor tracking of the terminals, such as the distorting effect of the terminals' automatic page refresh on the results, the partial Google Analytics integration and, most importantly, the lack of an identifier that uniquely identifies each terminal. The thesis proposes solutions that enable effective use of visitor tracking and device-specific tracking. The results emphasize the importance of systematic planning when implementing visitor tracking.
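
As a rough illustration of the kind of log mining this abstract describes, the sketch below counts sessions in web server log lines with an inactivity threshold. The thesis's own Python application is not shown in the source, so everything here is an assumption: the Common Log Format timestamp, the 30-minute gap, and the names `parse_time` and `count_sessions`.

```python
import re
from datetime import datetime, timedelta

# Assumed Common Log Format timestamp, e.g. [12/Mar/2015:10:15:32 +0200]
TIMESTAMP = re.compile(r"\[([^\]]+)\]")


def parse_time(line):
    """Extract the request time from one log line, or None if absent."""
    match = TIMESTAMP.search(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")


def count_sessions(lines, gap=timedelta(minutes=30)):
    """Count sessions: a new one starts after more than `gap` of inactivity."""
    times = sorted(t for t in map(parse_time, lines) if t is not None)
    sessions = 0
    previous = None
    for current in times:
        if previous is None or current - previous > gap:
            sessions += 1
        previous = current
    return sessions
```

Because the sketch treats all requests as one stream, it cannot separate individual terminals, which matches the limitation reported in the abstract. Automatic page refreshes would also keep a session open indefinitely under this heuristic, so such requests would have to be filtered out first.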

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This thesis studies the operation of Merima Oy's production plant by means of a controllability analysis. The purpose of the work is to find potential areas for development in the company's production. The company's production is under pressure to change, and it needs quantitative information to guide development. The work began by refining the data obtained from the company's production control system into a form that could be analyzed. The types of analysis to be used were then chosen. In the analysis phase, the refined data was brought into an informative form, after which the results could be analyzed and conclusions drawn. The work yields a set of development targets on which development effort should be focused. In addition, ways of improving these targets are discussed at the end of the report.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Newspaper debate about the "innovation university", that is, the Helsinki-based project to merge Helsinki University of Technology, the Helsinki School of Economics and the University of Art and Design Helsinki, has been lively. Despite the working title, substantive discussion of innovativeness has been scarce. The research method of this thesis is discourse analysis, which is used to examine how innovativeness acquires meaning in the context of the university merger. From an extensive body of articles published in six newspapers in 2005-2008, a main data set representing the innovation discourse was selected and interpreted from social and linguistic perspectives. Innovativeness in the university context acquired meaning mainly through innovation systems. The important role of basic research in the innovation chain featured prominently.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Despite the complexity of Chinese culture, consumer product businesses should take it into account when building brands for Chinese markets. There are reasons to believe that cultural values and attitudes affect the buying behavior of Chinese consumers. Companies that wish to create brands in China should therefore be aware of the prevailing cultural values and consumer attitudes. This thesis examines which values and attitudes most affect Chinese consumers of health food products. The study is conducted as a netnography. Because netnography is a collection of methods rather than a single method, auxiliary methods are also applied: emotion, language and sentiment analysis (ELS analysis). Emotion analysis is conducted because cultural values are mostly built on an emotional basis. Sentiment analysis assists in recognizing the key factors that help to locate values and attitudes. Because the netnography is conducted in Chinese web forums by a non-native researcher, linguistic aspects are also analyzed in parallel with emotion and sentiment analysis. The study shows that Chinese consumers of health food products place great importance on functional, analytical and collectivistic attitudes as well as social and psychological values. Of the twelve cultural values defined, the role of family rose above all others. Perseverance, frugality, guanxi and harmony were also strongly represented. The attitudes were found by recognizing certain attitude factors. Of all the factors, the functional benefits and aesthetic content of health foods, together with consumers' value consciousness, surpassed the others. Besides foreign health food companies wishing to enter Chinese consumer markets, which can apply these results, academia can also benefit from this new approach to conducting ethnographies online.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

During the 2000s, the debate on the place of entrepreneurship within the mission of universities has become part of the societal and worldwide transformation of higher education. The purpose of this study is to increase understanding of the social construction of university entrepreneurialism. I approach the polyphonic entrepreneurship talk produced within the university as a representation of the reality of the university's entrepreneurialism, and seek to make sense of it through an analysis focused on communication. The empirical examination of the phenomenon is based on textual data collected in 2006–2013 at Aalto University and its predecessor, Helsinki University of Technology (TKK). Interpretation of the data yields multilayered knowledge of the phenomenon at both the individual and the organizational level. In my study I examine what meanings are given to entrepreneurship and entrepreneurialism within the university, viewed from different perspectives, while the university undergoes significant organizational and institutional change. The four perspectives of the research design are based on the talk of doctoral students in technology, the curriculum text of TKK, the talk of Aalto University teachers, and strategy texts produced by management. According to the results, there has been a time and space of expanding academic values and value systems in which entrepreneurialism enters the arena, but its narrow interpretations offer no opportunities for grassroots-level engagement from within academic culture. The economic and commercial meanings attached to entrepreneurship and entrepreneurialism are perceived as a threat, and the situation is not eased by the university's structural and societal transition phase, which is generally experienced as exhausting. Dominant storylines, such as talk of competitiveness and intensifying competition, produce a framework of expectations with which the university's activities are framed from outside.
The opportunities contained in entrepreneurship in its broad sense likewise remain unused, and numerous promotion activities founder on socially constructed barriers. The extended framework of university entrepreneurialism constructed in this study aims to serve the purpose of polyphony at the conceptual level and to open up new opportunities for promoting university entrepreneurialism.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Choosing among industrial development options and allocating research funds accordingly are becoming more and more difficult because of increasing R&D costs and pressure for shorter development periods. Forecasting research progress is based on analysis of publication activity in the field of interest and of the dynamics of its change. Moreover, the allocation of funds is hindered by exponential growth in the number of publications and patents. Thematic clusters are becoming more and more difficult to identify, and their evolution hard to follow. The existing approaches to structuring a research field and identifying its development are very limited. They do not identify thematic clusters with adequate precision, and the identified trends are often ambiguous. Therefore, there is a clear need for methods and tools that are able to identify developing fields of research. The main objective of this thesis is to develop tools and methods that help identify promising research topics in the field of separation processes. Two structuring methods and three approaches for identifying development trends are proposed. The proposed methods have been applied to the analysis of research on distillation and filtration. The results show that the developed methods are universal and could be used to study various fields of research. The identified thematic clusters and the forecasted trends of their development were confirmed in almost all tested cases, which supports the universality of the proposed methods. The results allow the identification of fast-growing scientific fields as well as topics characterized by stagnant or diminishing research activity.

Relevance:

70.00% 70.00%

Publisher:

Abstract:

Biomedical research is currently facing a new type of challenge: an excess of information, both in terms of raw data from experiments and in the number of scientific publications describing their results. Mirroring the focus on data mining techniques to address the issues of structured data, there has recently been great interest in the development and application of text mining techniques to make more effective use of the knowledge contained in biomedical scientific publications, accessible only in the form of natural human language. This thesis describes research done in the broader scope of projects aiming to develop methods, tools and techniques for text mining tasks in general and for the biomedical domain in particular. The work described here involves more specifically the goal of extracting information from statements concerning relations of biomedical entities, such as protein-protein interactions. The approach taken is one using full parsing (syntactic analysis of the entire structure of sentences) and machine learning, aiming to develop reliable methods that can further be generalized to apply also to other domains. The five papers at the core of this thesis describe research on a number of distinct but related topics in text mining. In the first of these studies, we assessed the applicability of two popular general English parsers to biomedical text mining and, finding their performance limited, identified several specific challenges to accurate parsing of domain text. In a follow-up study focusing on parsing issues related to specialized domain terminology, we evaluated three lexical adaptation methods. We found that the accurate resolution of unknown words can considerably improve parsing performance and introduced a domain-adapted parser that reduced the error rate of the original by 10% while also roughly halving parsing time.
To establish the relative merits of parsers that differ in the applied formalisms and the representation given to their syntactic analyses, we have also developed evaluation methodology, considering different approaches to establishing comparable dependency-based evaluation results. We introduced a methodology for creating highly accurate conversions between different parse representations, demonstrating the feasibility of unification of diverse syntactic schemes under a shared, application-oriented representation. In addition to allowing formalism-neutral evaluation, we argue that such unification can also increase the value of parsers for domain text mining. As a further step in this direction, we analysed the characteristics of publicly available biomedical corpora annotated for protein-protein interactions and created tools for converting them into a shared form, thus contributing also to the unification of text mining resources. The introduced unified corpora allowed us to perform a task-oriented comparative evaluation of biomedical text mining corpora. This evaluation established clear limits on the comparability of results for text mining methods evaluated on different resources, prompting further efforts toward standardization. To support this and other research, we have also designed and annotated BioInfer, the first domain corpus of its size combining annotation of syntax and biomedical entities with a detailed annotation of their relationships. The corpus represents a major design and development effort of the research group, with manual annotation that identifies over 6000 entities, 2500 relationships and 28,000 syntactic dependencies in 1100 sentences. In addition to combining these key annotations for a single set of sentences, BioInfer was also the first domain resource to introduce a representation of entity relations that is supported by ontologies and able to capture complex, structured relationships.
Part I of this thesis presents a summary of this research in the broader context of a text mining system, and Part II contains reprints of the five included publications.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Recent advances in machine learning methods increasingly enable the automatic construction of various types of computer-assisted methods that have been difficult or laborious for human experts to program. The tasks for which tools of this kind are needed arise in many areas, here especially in the fields of bioinformatics and natural language processing. Machine learning methods may not work satisfactorily if they are not appropriately tailored to the task in question. However, their learning performance can often be improved by taking advantage of deeper insight into the application domain or the learning problem at hand. This thesis considers the development of kernel-based learning algorithms that incorporate this kind of prior knowledge of the task in an advantageous way. Moreover, computationally efficient algorithms for training the learning machines for specific tasks are presented. In the context of kernel-based learning methods, prior knowledge is often incorporated by designing appropriate kernel functions. Another well-known way is to develop cost functions that fit the task under consideration. For disambiguation tasks in natural language, we develop kernel functions that take account of the positional information and the mutual similarities of words. It is shown that the use of this information significantly improves the disambiguation performance of the learning machine. Further, we design a new cost function that is better suited to the task of information retrieval and to more general ranking problems than the cost functions designed for regression and classification. We also consider other applications of kernel-based learning algorithms, such as text categorization and pattern recognition in differential display. We develop computationally efficient algorithms for training the considered learning machines with the proposed kernel functions.
We also design a fast cross-validation algorithm for learning algorithms of the regularized least-squares type. Further, we propose an efficient version of the regularized least-squares algorithm that can be used together with the new cost function for preference learning and ranking tasks. In summary, we demonstrate that the incorporation of prior knowledge is possible and beneficial, and that novel advanced kernels and cost functions can be used efficiently in algorithms.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Raw measurement data does not always immediately convey useful information, but applying statistical analysis tools to the data can improve the situation. Data analysis offers benefits such as gaining meaningful insight from the dataset, basing critical decisions on the findings, and ruling out human bias through proper statistical treatment. In this thesis we analyze data from an industrial mineral processing plant to study whether the quality of the final product, given by one variable, can be forecast with a model based on the other variables. The study uses tools such as Qlucore Omics Explorer (QOE) and sparse Bayesian regression (SB). Linear regression is then used to build a model based on a subset of the variables that appear to have the most significant weights in the SB model. The results obtained from QOE show that the variable representing the desired final product does not correlate with the other variables. The results for SB and linear regression show that models of both types built on 1-day averaged data seriously underestimate the variance of the true data, whereas the two models built on 1-month averaged data are reliable and able to explain a larger proportion of the variability in the available data, making them suitable for prediction. However, it is concluded that no single model fits the whole available dataset well. For future work it is therefore proposed either to build piecewise nonlinear regression models if the same dataset is used, or to have the plant provide another dataset, collected more systematically than the present one, for further analysis.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Global concern about sustainability has been growing, and the mining industry is being questioned about its environmental and social performance. Corporate social responsibility (CSR) is an important issue for the extractive industries. The main objective of this study was to investigate the relationship between the CSR performance and the financial performance of selected mining companies. The study was conducted by identifying a selection of available CSR performance indicators and comparing them with financial performance indicators. Based on the results, the relationship between CSR performance and financial performance is unclear for the selected group of companies. The results are mixed, and no realistic, industry-specific way to measure CSR performance uniformly is available. The results as a whole are contradictory and vary both at company level and with the selected indicators. The study confirms that the relationship between CSR performance and financial performance is complicated and difficult to determine. As an outcome, the benefits of CSR in the mining sector could be better evaluated on the basis of different attributes.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

This thesis introduces heat demand forecasting models generated with data mining algorithms. The forecast spans one full day and can be used in regulating the heat consumption of buildings. To train the data mining models, two years of heat consumption data from a case building and weather measurement data from the Finnish Meteorological Institute are used. The thesis uses the Microsoft SQL Server Analysis Services data mining tools to generate the models and the CRISP-DM process framework to carry out the research. The results show that, at best, the models predict heat demand with mean average percentage errors of 3.8% for the 24-h profile and 5.9% for the full day. A deployment model for integrating the generated data mining models into an existing building energy management system is also discussed.
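
The error figures quoted in this abstract can be made concrete with a small sketch. It assumes that the "mean average percentage error" is the usual mean absolute percentage error (MAPE); the source does not define the metric, and the function name and sample values below are hypothetical.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Return the MAPE in percent for two equal-length sequences.

    Assumption: the abstract's metric is the standard MAPE,
    100 / n * sum(|actual - predicted| / |actual|).
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        if a == 0:
            # MAPE is undefined when a measured value is zero.
            raise ValueError("actual values must be non-zero")
        total += abs((a - p) / a)
    return 100.0 * total / len(actual)
```

For hourly heat demand, `actual` and `predicted` would each hold 24 values for the forecast day. With hypothetical values `[100, 200]` and `[90, 210]` the per-point errors are 10% and 5%, giving a MAPE of 7.5%.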