32 resultados para text and data mining
Resumo:
This master's thesis coversthe concepts of knowledge discovery, data mining and technology forecasting methods in telecommunications. It covers the various aspects of knowledge discoveryin data bases and discusses in detail the methods of data mining and technologyforecasting methods that are used in telecommunications. Main concern in the overall process of this thesis is to emphasize the methods that are being used in technology forecasting for telecommunications and data mining. It tries to answer to some extent to the question of do forecasts create a future? It also describes few difficulties that arise in technology forecasting. This thesis was done as part of my master's studies in Lappeenranta University of Technology.
Resumo:
Yritysten syvällinen ymmärrys työntekijöistä vaatii yrityksiltä monipuolista panostusta tiedonhallintaan. Tämän yhdistäminen ennakoivaan analytiikkaan ja tiedonlouhintaan mahdollistaa yrityksille uudenlaisen ulottuvuuden kehittää henkilöstöhallinnon toimintoja niin työntekijöiden kuin yrityksen etujen mukaisesti. Tutkielman tavoitteena oli selvittää tiedonlouhinnan hyödyntämistä henkilöstöhallinnossa. Tutkielma toteutettiin konstruktiivistä menetelmää hyödyntäen. Teoreettinen viitekehys keskittyi ennakoivan analytiikan ja tiedonlouhinnan konseptin ymmärtämiseen. Tutkielman empiriaosuus rakentui kvalitatiiviseen ja kvantitatiiviseen osiin. Kvalitatiivinen osa koostui tutkielman esitutkimuksesta, jossa käsiteltiin ennakoivan analytiikan ja tiedonlouhinnan hyödyntämistä. Kvantitatiivinen osa rakentui tiedonlouhintaprojektiin, joka toteutettiin henkilöstöhallintoon tutkien henkilöstövaihtuvuutta. Esitutkimuksen tuloksena tiedonlouhinnan hyödyntämisen haasteiksi ilmeni muun muassa tiedon omistajuus, osaaminen ja ymmärrys mahdollisuuksista. Tiedonlouhintaprojektin tuloksena voidaan todeta, että tutkimuksessa sovelletuista korrelaatioiden tutkimisista ja logistisesta regressioanalyysistä oli havaittavissa tilastollisia riippuvuuksia vapaaehtoisesti poistuvien työntekijöiden osalta.
Resumo:
Leveraging cloud services, companies and organizations can significantly improve their efficiency, as well as building novel business opportunities. Cloud computing offers various advantages to companies while having some risks for them too. Advantages offered by service providers are mostly about efficiency and reliability while risks of cloud computing are mostly about security problems. Problems with security of the cloud still demand significant attention in order to tackle the potential problems. Security problems in the cloud as security problems in any area of computing, can not be fully tackled. However creating novel and new solutions can be used by service providers to mitigate the potential threats to a large extent. Looking at the security problem from a very high perspective, there are two focus directions. Security problems that threaten service user’s security and privacy are at one side. On the other hand, security problems that threaten service provider’s security and privacy are on the other side. Both kinds of threats should mostly be detected and mitigated by service providers. Looking a bit closer to the problem, mitigating security problems that target providers can protect both service provider and the user. However, the focus of research community mostly is to provide solutions to protect cloud users. A significant research effort has been put in protecting cloud tenants against external attacks. However, attacks that are originated from elastic, on-demand and legitimate cloud resources should still be considered seriously. The cloud-based botnet or botcloud is one of the prevalent cases of cloud resource misuses. Unfortunately, some of the cloud’s essential characteristics enable criminals to form reliable and low cost botclouds in a short time. In this paper, we present a system that helps to detect distributed infected Virtual Machines (VMs) acting as elements of botclouds. Based on a set of botnet related system level symptoms, our system groups VMs. Grouping VMs helps to separate infected VMs from others and narrows down the target group under inspection. Our system takes advantages of Virtual Machine Introspection (VMI) and data mining techniques.
Resumo:
Biomedical research is currently facing a new type of challenge: an excess of information, both in terms of raw data from experiments and in the number of scientific publications describing their results. Mirroring the focus on data mining techniques to address the issues of structured data, there has recently been great interest in the development and application of text mining techniques to make more effective use of the knowledge contained in biomedical scientific publications, accessible only in the form of natural human language. This thesis describes research done in the broader scope of projects aiming to develop methods, tools and techniques for text mining tasks in general and for the biomedical domain in particular. The work described here involves more specifically the goal of extracting information from statements concerning relations of biomedical entities, such as protein-protein interactions. The approach taken is one using full parsing—syntactic analysis of the entire structure of sentences—and machine learning, aiming to develop reliable methods that can further be generalized to apply also to other domains. The five papers at the core of this thesis describe research on a number of distinct but related topics in text mining. In the first of these studies, we assessed the applicability of two popular general English parsers to biomedical text mining and, finding their performance limited, identified several specific challenges to accurate parsing of domain text. In a follow-up study focusing on parsing issues related to specialized domain terminology, we evaluated three lexical adaptation methods. We found that the accurate resolution of unknown words can considerably improve parsing performance and introduced a domain-adapted parser that reduced the error rate of theoriginal by 10% while also roughly halving parsing time. To establish the relative merits of parsers that differ in the applied formalisms and the representation given to their syntactic analyses, we have also developed evaluation methodology, considering different approaches to establishing comparable dependency-based evaluation results. We introduced a methodology for creating highly accurate conversions between different parse representations, demonstrating the feasibility of unification of idiverse syntactic schemes under a shared, application-oriented representation. In addition to allowing formalism-neutral evaluation, we argue that such unification can also increase the value of parsers for domain text mining. As a further step in this direction, we analysed the characteristics of publicly available biomedical corpora annotated for protein-protein interactions and created tools for converting them into a shared form, thus contributing also to the unification of text mining resources. The introduced unified corpora allowed us to perform a task-oriented comparative evaluation of biomedical text mining corpora. This evaluation established clear limits on the comparability of results for text mining methods evaluated on different resources, prompting further efforts toward standardization. To support this and other research, we have also designed and annotated BioInfer, the first domain corpus of its size combining annotation of syntax and biomedical entities with a detailed annotation of their relationships. The corpus represents a major design and development effort of the research group, with manual annotation that identifies over 6000 entities, 2500 relationships and 28,000 syntactic dependencies in 1100 sentences. In addition to combining these key annotations for a single set of sentences, BioInfer was also the first domain resource to introduce a representation of entity relations that is supported by ontologies and able to capture complex, structured relationships. Part I of this thesis presents a summary of this research in the broader context of a text mining system, and Part II contains reprints of the five included publications.
Resumo:
In this thesis we study the field of opinion mining by giving a comprehensive review of the available research that has been done in this topic. Also using this available knowledge we present a case study of a multilevel opinion mining system for a student organization's sales management system. We describe the field of opinion mining by discussing its historical roots, its motivations and applications as well as the different scientific approaches that have been used to solve this challenging problem of mining opinions. To deal with this huge subfield of natural language processing, we first give an abstraction of the problem of opinion mining and describe the theoretical frameworks that are available for dealing with appraisal language. Then we discuss the relation between opinion mining and computational linguistics which is a crucial pre-processing step for the accuracy of the subsequent steps of opinion mining. The second part of our thesis deals with the semantics of opinions where we describe the different ways used to collect lists of opinion words as well as the methods and techniques available for extracting knowledge from opinions present in unstructured textual data. In the part about collecting lists of opinion words we describe manual, semi manual and automatic ways to do so and give a review of the available lists that are used as gold standards in opinion mining research. For the methods and techniques of opinion mining we divide the task into three levels that are the document, sentence and feature level. The techniques that are presented in the document and sentence level are divided into supervised and unsupervised approaches that are used to determine the subjectivity and polarity of texts and sentences at these levels of analysis. At the feature level we give a description of the techniques available for finding the opinion targets, the polarity of the opinions about these opinion targets and the opinion holders. Also at the feature level we discuss the various ways to summarize and visualize the results of this level of analysis. In the third part of our thesis we present a case study of a sales management system that uses free form text and that can benefit from an opinion mining system. Using the knowledge gathered in the review of this field we provide a theoretical multi level opinion mining system (MLOM) that can perform most of the tasks needed from an opinion mining system. Based on the previous research we give some hints that many of the laborious market research tasks that are done by the sales force, which uses this sales management system, can improve their insight about their partners and by that increase the quality of their sales services and their overall results.
Resumo:
Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014
Resumo:
After sales business is an effective way to create profit and increase customer satisfaction in manufacturing companies. Despite this, some special business characteristics that are linked to these functions, make it exceptionally challenging in its own way. This Master’s Thesis examines the current situation of the data and inventory management in the case company regarding possibilities and challenges related to the consolidation of current business operations. The research examines process steps, procedures, data requirements, data mining practices and data storage management of spare part sales process, whereas the part focusing on inventory management is reviewing the current stock value and examining current practices and operational principles. There are two global after sales units which supply spare parts and issues reviewed in this study are examined from both units’ perspective. The analysis is focused on the operations of that unit where functions would be centralized by default, if change decisions are carried out. It was discovered that both data and inventory management include clear shortcomings, which result from lack of internal instructions and established processes as well as lack of cooperation with other stakeholders related to product’s lifecycle. The main product of data management was a guideline for consolidating the functions, tailored for the company’s needs. Additionally, potentially scrapped spare part were listed and a proposal of inventory management instructions was drafted. If the suggested spare part materials will be scrapped, stock value will decrease 46 percent. A guideline which was reviewed and commented in this thesis was chosen as the basis of the inventory management instructions.
Resumo:
Kiihtyvä kilpailu yritysten välillä on tuonut yritykset vaikeidenhaasteiden eteen. Tuotteet pitäisi saada markkinoille nopeammin, uusien tuotteiden pitäisi olla parempia kuin vanhojen ja etenkin parempia kuin kilpailijoiden vastaavat tuotteet. Lisäksi tuotteiden suunnittelu-, valmistus- ja muut kustannukset eivät saisi olla suuria. Näiden haasteiden toteuttamisessa yritetään usein käyttää apuna tuotetietoja, niiden hallintaa ja vaihtamista. Andritzin, kuten muidenkin yritysten, on otettava nämä asiat huomioon pärjätäkseen kilpailussa. Tämä työ on tehty Andritzille, joka on maailman johtavia paperin ja sellun valmistukseen tarkoitettujen laitteiden valmistajia ja huoltopalveluiden tarjoajia. Andritz on ottamassa käyttöön ERP-järjestelmän kaikissa toimipisteissään. Sitä halutaan hyödyntää mahdollisimman tehokkaasti, joten myös tuotetiedot halutaan järjestelmään koko elinkaaren ajalta. Osan tuotetiedoista luo Andritzin kumppanit ja alihankkijat, joten myös tietojen vaihto partnereiden välillä halutaan hoitaasiten, että tiedot saadaan suoraan ERP-järjestelmään. Tämän työn tavoitteena onkin löytää ratkaisu, jonka avulla Andritzin ja sen kumppaneiden välinen tietojenvaihto voidaan hoitaa. Tämä diplomityö esittelee tuotetietojen, niiden hallinnan ja vaihtamisen tarkoituksen ja tärkeyden. Työssä esitellään erilaisia ratkaisuvaihtoehtoja tiedonvaihtojärjestelmän toteuttamiseksi. Osa niistä perustuu yleisiin ja toimialakohtaisiin standardeihin. Myös kaksi kaupallista tuotetta esitellään. Tarkasteltavana onseuraavat standardit: PaperIXI, papiNet, X-OSCO, PSK-standardit sekä RosettaNet. Lisäksi työssä tarkastellaan ERP-järjestelmän toimittajan, SAP:in ratkaisuja tietojenvaihtoon. Näistä vaihtoehdoista parhaimpia tarkastellaan vielä yksityiskohtaisemmin ja lopuksi eri ratkaisuja vertaillaan keskenään, jotta löydettäisiin Andritzin tarpeisiin paras vaihtoehto.
Resumo:
Tutkimuksen ensisijaisena tavoitteena oli tarkastella luottamuksen rakentumista virtuaalitiimissä. Keskeistä tarkastelussa olivat luottamuksen lähteiden löytäminen, suhteen rakentuminen sekä teknologiavälitteinen kommunikaatio. Myös käytännön keinoja ja sovelluksia etsittiin. Tässä tutkimuksessa luottamus nähtiin tärkeänä yhteistyön mahdollistajana sekä keskeisenä elementtinä ihmisten välisten suhteiden rakentumisessa. Tämä tutkimus oli empiirinen ja kuvaileva tapaustutkimus. Tutkimuksessa kvalitatiivista aineistoa kerättiin pääasiassa web-pohjaisen kyselyn sekä puhelinhaastattelun avulla. Aineistonkeruu toteutettiin siis pääasiassa virtuaalisesti. Saatu aineisto analysoitiin teemoittelun avulla. Tässä työssä teemoja etsittiin tekstistä pääasiassa teoriasta johdettujen oletusten perusteella. Tutkimuksen tuloksena oli, että luottamusta rakentavia mekanismeja ovat, karkeasti luokiteltuna, yhteiset päämäärät ja vastuut, kommunikaatio, sosiaalinen kanssakäyminen ja informaation jakaminen, toisten huomioiminen ja henkilökohtaiset ominaisuudet. Mekanismit eivät suuresti eronneet luottamuksen rakentumisen mekanismeista perinteisessä kontekstissa. Virtuaalitiimityön alkuvaiheessa luottamus pohjautui käsityksille toisten tiimin jäsenten kyvykkyydestä. Myös institutionaalinen identifioituminen loi pohjaa luottamukselle alkuvaiheessa. Muuten luottamus rakentui vähän kerrassaan tehtävään liittyvän kommunikaation ja sosiaalisen kommunikaation kautta. Tekojen merkitys korostui erityisesti ajan myötä. Työssä esitettiin myös käytännön keinoja luottamuksen rakentamiseksi. Olemassa olevien teknologioiden havaittiin tukevan hyvin suhteen rakentumista tiedon jakamiseen ja sen varastoimiseen liittyvissä tehtävissä. Sen sijaan vuorovaikutuksen näkökulmasta tuen ei nähty olevan yhtä kattavaa. Kaiken kaikkiaan kuitenkin parannuksella sosiaalisissa suhteissa voitaneen saada enemmän aikaan kuin parannuksilla teknologian suhteen.
Resumo:
Fluent health information flow is critical for clinical decision-making. However, a considerable part of this information is free-form text and inabilities to utilize it create risks to patient safety and cost-effective hospital administration. Methods for automated processing of clinical text are emerging. The aim in this doctoral dissertation is to study machine learning and clinical text in order to support health information flow.First, by analyzing the content of authentic patient records, the aim is to specify clinical needs in order to guide the development of machine learning applications.The contributions are a model of the ideal information flow,a model of the problems and challenges in reality, and a road map for the technology development. Second, by developing applications for practical cases,the aim is to concretize ways to support health information flow. Altogether five machine learning applications for three practical cases are described: The first two applications are binary classification and regression related to the practical case of topic labeling and relevance ranking.The third and fourth application are supervised and unsupervised multi-class classification for the practical case of topic segmentation and labeling.These four applications are tested with Finnish intensive care patient records.The fifth application is multi-label classification for the practical task of diagnosis coding. It is tested with English radiology reports.The performance of all these applications is promising. Third, the aim is to study how the quality of machine learning applications can be reliably evaluated.The associations between performance evaluation measures and methods are addressed,and a new hold-out method is introduced.This method contributes not only to processing time but also to the evaluation diversity and quality. The main conclusion is that developing machine learning applications for text requires interdisciplinary, international collaboration. Practical cases are very different, and hence the development must begin from genuine user needs and domain expertise. The technological expertise must cover linguistics,machine learning, and information systems. Finally, the methods must be evaluated both statistically and through authentic user-feedback.
Resumo:
After decades of mergers and acquisitions and successive technology trends such as CRM, ERP and DW, the data in enterprise systems is scattered and inconsistent. Global organizations face the challenge of addressing local uses of shared business entities, such as customer and material, and at the same time have a consistent, unique, and consolidate view of financial indicators. In addition, current enterprise systems do not accommodate the pace of organizational changes and immense efforts are required to maintain data. When it comes to systems integration, ERPs are considered “closed” and expensive. Data structures are complex and the “out-of-the-box” integration options offered are not based on industry standards. Therefore expensive and time-consuming projects are undertaken in order to have required data flowing according to business processes needs. Master Data Management (MDM) emerges as one discipline focused on ensuring long-term data consistency. Presented as a technology-enabled business discipline, it emphasizes business process and governance to model and maintain the data related to key business entities. There are immense technical and organizational challenges to accomplish the “single version of the truth” MDM mantra. Adding one central repository of master data might prove unfeasible in a few scenarios, thus an incremental approach is recommended, starting from areas most critically affected by data issues. This research aims at understanding the current literature on MDM and contrasting it with views from professionals. The data collected from interviews revealed details on the complexities of data structures and data management practices in global organizations, reinforcing the call for more in-depth research on organizational aspects of MDM. The most difficult piece of master data to manage is the “local” part, the attributes related to the sourcing and storing of materials in one particular warehouse in The Netherlands or a complex set of pricing rules for a subsidiary of a customer in Brazil. From a practical perspective, this research evaluates one MDM solution under development at a Finnish IT solution-provider. By means of applying an existing assessment method, the research attempts at providing the company with one possible tool to evaluate its product from a vendor-agnostics perspective.
Resumo:
Technological developments in microprocessors and ICT landscape have made a shift to a new era where computing power is embedded in numerous small distributed objects and devices in our everyday lives. These small computing devices are ne-tuned to perform a particular task and are increasingly reaching our society at every level. For example, home appliances such as programmable washing machines, microwave ovens etc., employ several sensors to improve performance and convenience. Similarly, cars have on-board computers that use information from many di erent sensors to control things such as fuel injectors, spark plug etc., to perform their tasks e ciently. These individual devices make life easy by helping in taking decisions and removing the burden from their users. All these objects and devices obtain some piece of information about the physical environment. Each of these devices is an island with no proper connectivity and information sharing between each other. Sharing of information between these heterogeneous devices could enable a whole new universe of innovative and intelligent applications. The information sharing between the devices is a diffcult task due to the heterogeneity and interoperability of devices. Smart Space vision is to overcome these issues of heterogeneity and interoperability so that the devices can understand each other and utilize services of each other by information sharing. This enables innovative local mashup applications based on shared data between heterogeneous devices. Smart homes are one such example of Smart Spaces which facilitate to bring the health care system to the patient, by intelligent interconnection of resources and their collective behavior, as opposed to bringing the patient into the health system. In addition, the use of mobile handheld devices has risen at a tremendous rate during the last few years and they have become an essential part of everyday life. Mobile phones o er a wide range of different services to their users including text and multimedia messages, Internet, audio, video, email applications and most recently TV services. The interactive TV provides a variety of applications for the viewers. The combination of interactive TV and the Smart Spaces could give innovative applications that are personalized, context-aware, ubiquitous and intelligent by enabling heterogeneous systems to collaborate each other by sharing information between them. There are many challenges in designing the frameworks and application development tools for rapid and easy development of these applications. The research work presented in this thesis addresses these issues. The original publications presented in the second part of this thesis propose architectures and methodologies for interactive and context-aware applications, and tools for the development of these applications. We demonstrated the suitability of our ontology-driven application development tools and rule basedapproach for the development of dynamic, context-aware ubiquitous iTV applications.
Resumo:
The global concern about sustainability has been growing and the mining industry is questioned about its environmental and social performance. Corporate social responsibility (CSR) is an important issue for the extractive industries. The main objective of this study was to investigate the relationship between CSR performance and financial performance of selected mining companies. The study was conducted by identifying and comparing a selection of available CSR performance indicators with financial performance indicators. Based on the result of the study, the relationship between CSR performance and financial performance is unclear for the selected group of companies. The result is mixed and no industry specific realistic way to measure CSR performance uniformly is available. The result as a whole is contradictory and varies at company level as well as based on the selected indicators. The result of this study confirms that the relationship between CSR performance and financial performance is complicated and difficult to determine. As an outcome, evaluation of benefits of CSR in the mining sector could better be analyzed based on different attributes.
Resumo:
Poster at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014
Resumo:
Poster at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014