774 results for outlier detection, data mining, gpgpu, gpu computing, supercomputing
Abstract:
Processing and refining of materials. Presentation for Liikearkistopäivät 2015.
Abstract:
The case company in this study is a large industrial engineering company whose business is largely based on delivering a wide range of engineering projects. The aim of this study is to create and develop a fairly simple Excel-based tool for the sales department. The tool's main function is to estimate and visualize the profitability of various small projects. The study also aims to identify other possible, more long-term solutions for tackling the problem in the future. The study is highly constructive and descriptive, as it focuses on the development task and on the creation of a new operating model. The developed tool focuses on estimating the profitability of small orders in the selected project portfolio that are currently in the bidding phase (prospects), and it will help the case company in the monthly reporting of sales figures. The tool analyses the profitability of a given project by calculating its fixed and variable costs and, from these, the gross margin and operating profit. The bidding phase of small projects has not been fully covered by the existing tools within the case company. The project portfolio tool can be taken into use immediately within the case company, and it will provide a fairly accurate estimate of the profitability figures of recently sold small projects.
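As a rough illustration of the calculation the tool is described as performing, the sketch below derives gross margin and operating profit from a project's price and its variable and fixed costs. The abstract does not give the tool's formulas, so the definitions used here (gross margin as price minus variable costs, operating profit as gross margin minus allocated fixed costs), the class and function names, and all figures are assumptions.

```python
from dataclasses import dataclass


@dataclass
class SmallProject:
    """Hypothetical bid-phase project; field names are illustrative only."""
    name: str
    price: float           # offered sales price
    variable_costs: float  # costs that scale with the delivery
    fixed_costs: float     # overhead allocated to the project


def gross_margin(p: SmallProject) -> float:
    # Assumed definition: price minus variable costs.
    return p.price - p.variable_costs


def operating_profit(p: SmallProject) -> float:
    # Assumed definition: gross margin minus allocated fixed costs.
    return gross_margin(p) - p.fixed_costs


def margin_pct(p: SmallProject) -> float:
    # Operating profit as a share of price; 0.0 guards against a zero price.
    return operating_profit(p) / p.price if p.price else 0.0


# Invented example figures, not data from the study.
portfolio = [
    SmallProject("prospect A", price=100_000, variable_costs=60_000, fixed_costs=25_000),
    SmallProject("prospect B", price=40_000, variable_costs=30_000, fixed_costs=12_000),
]

for p in portfolio:
    print(p.name, gross_margin(p), operating_profit(p), f"{margin_pct(p):.1%}")
```

In this toy portfolio, prospect B has a positive gross margin but a negative operating profit, which is the kind of case a bid-phase estimate would be expected to flag.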
Abstract:
A company seeking competitive advantage must be able to refine information and use it to identify new future opportunities. To form images of the future, a company must know its operating environment and be sensitive to trends of change and other signals in that environment. The vital signals of the environment relate to competitors, technological development, changes in values, global demographic trends, or even environmental change. Spatial relationships are fundamental pillars of how we conceptualize our world. Pitney (2015) has estimated that 80% of all business data contains some kind of reference to location information. Nevertheless, location data is still poorly utilized in support of companies' strategic decisions. Advances in technology, fast data transfer, and the integration of positioning techniques into various devices mean that services and solutions using location data will increasingly be seen in the business field. The aim of the study was to determine whether location intelligence can support strategic decision-making and, if so, how. The work was carried out using the constructive research method, which aims to solve a relevant problem. The constructive study was conducted in close cooperation with three SMEs, and six people responsible for strategy were interviewed for it. The study found that location intelligence can support strategic decision-making at several levels. In the simplest map solution, the desired data is placed on a map to create a visual presentation that makes it easier to draw conclusions. A second-level map solution contains both location and attribute data combined from different sources. This second-level solution is often descriptive analytics, which enables the analysis of various phenomena. The third, highest-level map solution offers predictive analytics and models of the future. In this case, intelligence is coded into the software, with the relationships within the information defined using either data mining or statistical analysis. The study concludes that location intelligence can add value in support of strategic decision-making if it is useful for the company to understand geographical differences in phenomena, customer needs, competitors, and market changes. At its best, a location intelligence solution provides a reliable analysis in which information passes unchanged from one decision-maker to another and the reasons behind a conclusion can be revisited when needed.
Abstract:
The strongest wish of the customer concerning chemical pulp features is consistent, uniform quality. Variation may be controlled and reduced by using statistical methods. However, studies addressing the application and benefits of statistical methods in the forest product sector are scarce. This customer wish is therefore the motivation behind this dissertation. The research problem addressed is that companies in the chemical forest product sector require new knowledge to improve their utilization of statistical methods. To gain this new knowledge, the research problem is studied from five complementary viewpoints: challenges and success factors, organizational learning, problem solving, economic benefit, and statistical methods as management tools. The five research questions generated on the basis of these viewpoints are answered in four research papers, which are case studies based on empirical data collection. This research as a whole complements the literature dealing with the use of statistical methods in the forest products industry. Practical examples of the application of statistical process control, case-based reasoning, the cross-industry standard process for data mining, and performance measurement methods in the context of chemical forest products manufacturing are made available to the scientific community. The benefit of applying these methods is estimated or demonstrated. The purpose of this dissertation is to find pragmatic ideas that help companies in the chemical forest product sector improve their utilization of statistical methods. The main practical implications of this doctoral dissertation can be summarized in four points: 1. It is beneficial to reduce variation in chemical forest product manufacturing processes. 2. Statistical tools can be used to reduce this variation. 3. Problem-solving in chemical forest product manufacturing processes can be intensified through the use of statistical methods. 4. There are certain success factors and challenges that need to be addressed when implementing statistical methods.
Abstract:
This thesis examines visitor-tracking methods and applies them in practice. It reviews how web analytics software works, focusing mainly on Google Analytics. The goal is to determine the usage volumes of the tourism terminals in Lappeenranta and to break them down by device. A literature review of web analytics is carried out, and visitor-tracking data from two different websites is analyzed and compared. In addition, the logs of the terminals' website are examined with data mining techniques using a Python application developed for the purpose. The work shows that, with the current implementation, terminal usage cannot be broken down by device. The number of sessions and events can nevertheless be tracked. Several problems are identified in the terminals' visitor tracking, such as the distorting effect of the terminals' automatic page refresh on the results, a partial Google Analytics integration, and, most importantly, the lack of an identifier that uniquely identifies each terminal. The thesis proposes solutions that enable effective use of visitor tracking and per-device tracking. The results underline the importance of careful planning when implementing visitor tracking.
Abstract:
This study examines the efficiency of search engine advertising strategies employed by firms. The research setting is the online retailing industry, which is characterized by extensive use of Web technologies and high competition for market share and profitability. For Internet retailers, search engines are increasingly serving as an information gateway for many decision-making tasks. In particular, search engine advertising (SEA) has opened a new marketing channel for retailers to attract new customers and improve their performance. In addition to natural (organic) search marketing strategies, search engine advertisers compete for top advertisement slots provided by search brokers such as Google and Yahoo! through keyword auctions. The rationale is that greater visibility on a search engine during a keyword search will capture customers' interest in a business and its product or service offerings. Search engines account for most online activities today. Compared with the slow growth of traditional marketing channels, online search volumes continue to grow at a steady rate. According to the Search Engine Marketing Professional Organization, spending on search engine marketing by North American firms in 2008 was estimated at $13.5 billion. Despite the significant role SEA plays in Web retailing, scholarly research on the topic is limited. Prior studies in SEA have focused on search engine auction mechanism design. In contrast, research on the business value of SEA has been limited by the lack of empirical data on search advertising practices. Recent advances in search and retail technologies have created data-rich environments that enable new research opportunities at the interface of marketing and information technology. This research uses extensive data from Web retailing and Google-based search advertising and evaluates Web retailers' use of resources, search advertising techniques, and other relevant factors that contribute to business performance across different metrics. The methods used include Data Envelopment Analysis (DEA), data mining, and multivariate statistics. This research contributes to empirical research by analyzing several Web retail firms in different industry sectors and product categories. One of the key findings is that the dynamics of sponsored search advertising vary between multi-channel and Web-only retailers. While the key performance metrics for multi-channel retailers include measures such as online sales, conversion rate (CR), click-through rate (CTR), and impressions, the key performance metrics for Web-only retailers focus on organic and sponsored ad ranks. These results provide a useful contribution to our organizational-level understanding of search engine advertising strategies, both for multi-channel and Web-only retailers. These results also contribute to current knowledge in technology-driven marketing strategies and provide managers with a better understanding of sponsored search advertising and its impact on various performance metrics in Web retailing.
Abstract:
Rough Set Data Analysis (RSDA) is a non-invasive data analysis approach that relies solely on the data to find patterns and decision rules. Despite its non-invasive approach and its ability to generate human-readable rules, classical RSDA has not been used successfully in commercial data mining and rule-generating engines. The reason is its scalability: classical RSDA slows down considerably on larger data sets and takes much longer to generate the rules. This research aims to address the scalability of rough sets by improving the performance of the attribute reduction step of classical RSDA, which is the root cause of its slow performance. We propose to move the entire attribute reduction process into the database. We defined a new schema to store the initial data set. We then defined SQL queries on this new schema to find the attribute reducts correctly and faster than the traditional RSDA approach. We tested our technique on two typical data sets and compared our results with the traditional RSDA approach for attribute reduction. Finally, we highlight some issues with our proposed approach that could lead to future research.
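To make the idea of pushing attribute reduction into the database more concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The abstract does not describe the authors' schema or queries, so the single wide table `cases`, its column names, and the toy rows are all assumptions. The one SQL query checks whether a set of condition attributes is consistent, meaning that no two rows agree on those attributes yet differ on the decision; a reduct is then a minimal consistent subset. The brute-force enumeration over subsets is only suitable for toy data and is not the method of the paper.

```python
import sqlite3
from itertools import combinations

ROWS = [
    # (headache, temp, cough, flu) -- toy decision table, invented for illustration
    ("yes", "high", "yes", "yes"),
    ("yes", "normal", "no", "no"),
    ("no", "high", "yes", "yes"),
    ("no", "normal", "yes", "no"),
    ("no", "high", "no", "yes"),
]
CONDITIONS = ["headache", "temp", "cough"]  # "flu" is the decision attribute

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cases (headache TEXT, temp TEXT, cough TEXT, flu TEXT)")
conn.executemany("INSERT INTO cases VALUES (?, ?, ?, ?)", ROWS)


def is_consistent(attrs):
    """True if no two rows agree on attrs but differ on the decision."""
    cols = ", ".join(attrs)  # attrs come from the fixed CONDITIONS list, not user input
    sql = (
        f"SELECT COUNT(*) FROM (SELECT 1 FROM cases GROUP BY {cols} "
        "HAVING COUNT(DISTINCT flu) > 1)"
    )
    return conn.execute(sql).fetchone()[0] == 0


def reducts():
    """Minimal attribute subsets that keep the table consistent (brute force)."""
    found = []
    for size in range(1, len(CONDITIONS) + 1):
        for subset in combinations(CONDITIONS, size):
            # Skip supersets of an already found reduct: they are not minimal.
            if any(set(r) <= set(subset) for r in found):
                continue
            if is_consistent(subset):
                found.append(subset)
    return found


print(reducts())  # [('temp',)]
```

The grouping and counting run inside the database engine, so only a single number per candidate subset crosses into Python; that is the general appeal of an in-database formulation, whatever schema is actually used.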
Abstract:
Mobile augmented reality applications are increasingly utilized as a medium for enhancing learning and engagement in history education. Although these digital devices facilitate learning through immersive and appealing experiences, their design should be driven by theories of learning and instruction. We provide an overview of an evidence-based approach to optimizing the development of mobile augmented reality applications that teach students about history. Our research aims to evaluate and model the impact of design parameters on learning and engagement. The research program is interdisciplinary in that we apply techniques derived from design-based experiments and educational data mining. We outline the methodological and analytical techniques and discuss the implications of the anticipated findings.
Abstract:
The curse of dimensionality is a major problem in the fields of machine learning, data mining and knowledge discovery. Exhaustive search for the optimal subset of relevant features from a high-dimensional dataset is NP-hard. Sub-optimal population-based stochastic algorithms such as GP and GA are good choices for searching through large search spaces, and are usually more feasible than exhaustive and deterministic search algorithms. On the other hand, population-based stochastic algorithms often suffer from premature convergence on mediocre sub-optimal solutions. The Age Layered Population Structure (ALPS) is a novel metaheuristic for overcoming the problem of premature convergence in evolutionary algorithms, and for improving search in the fitness landscape. The ALPS paradigm uses an age measure to control breeding and competition between individuals in the population. This thesis uses a modification of the ALPS GP strategy called Feature Selection ALPS (FSALPS) for feature subset selection and classification of varied supervised learning tasks. FSALPS uses a novel frequency count system to rank features in the GP population based on evolved feature frequencies. The ranked features are translated into probabilities, which are used to control evolutionary processes such as terminal-symbol selection for the construction of GP trees and sub-trees. The FSALPS metaheuristic continuously refines the feature subset selection process while simultaneously evolving efficient classifiers through a non-converging evolutionary process that favors selection of features with high discrimination of class labels. We investigated and compared the performance of canonical GP, ALPS and FSALPS on high-dimensional benchmark classification datasets, including a hyperspectral image. Using Tukey's HSD ANOVA test at a 95% confidence level, ALPS and FSALPS dominated canonical GP in evolving smaller but efficient trees with less bloat. FSALPS significantly outperformed canonical GP and ALPS, as well as some feature selection strategies reported in the related literature on dimensionality reduction.
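The frequency-count idea described above (count how often each feature appears in the population, turn the counts into probabilities, and use those to bias terminal selection) can be sketched as follows. This is an illustration under stated assumptions, not the FSALPS implementation: the population is reduced to flat lists of feature names, the additive `floor` smoothing is my own choice so that unused features remain selectable, and age layering is omitted entirely.

```python
import random
from collections import Counter

# Hypothetical population: each individual is reduced to the list of feature
# terminals appearing in its GP tree (tree structure omitted for brevity).
population = [
    ["x3", "x7", "x3"],
    ["x1", "x3"],
    ["x7", "x3", "x9"],
]
ALL_FEATURES = ["x1", "x3", "x7", "x9", "x12"]


def feature_probabilities(pop, features, floor=1.0):
    """Turn feature usage counts into selection probabilities.

    `floor` is an assumed smoothing constant so unused features keep a
    small chance of being picked; it is not taken from the thesis.
    """
    counts = Counter(f for individual in pop for f in individual)
    weights = [counts[f] + floor for f in features]
    total = sum(weights)
    return {f: w / total for f, w in zip(features, weights)}


def pick_terminal(probs, rng):
    """Sample one feature terminal according to the frequency-based probabilities."""
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=1)[0]


probs = feature_probabilities(population, ALL_FEATURES)
rng = random.Random(0)  # fixed seed so repeated runs draw the same terminals
print(sorted(probs.items(), key=lambda kv: -kv[1]))
print([pick_terminal(probs, rng) for _ in range(5)])
```

Here `x3` occurs four times in the population and so receives the largest share (5/13 after smoothing), while the unused `x12` keeps a 1/13 chance of being drawn when a new tree or sub-tree is built.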
Abstract:
Feature selection plays an important role in knowledge discovery and data mining. In traditional rough set theory, feature selection using reducts (minimal discerning sets of attributes) is an important area. Nevertheless, the original definition of a reduct is restrictive, so earlier research proposed taking into account not only the horizontal reduction of information by feature selection, but also a vertical reduction that considers suitable subsets of the original set of objects. Following that work, a new approach to generating bireducts using a multi-objective genetic algorithm was proposed. Although genetic algorithms were used to calculate reducts in some previous works, we did not find any work in which genetic algorithms were adopted to calculate bireducts. Compared to earlier work in this area, the proposed method has less randomness in generating bireducts. The genetic algorithm system estimated the quality of each bireduct by the values of two objective functions as evolution progressed, so that a set of bireducts with optimized values of these objectives was obtained. Different fitness evaluation methods and genetic operators, such as crossover and mutation, were applied and the prediction accuracies were compared. Five datasets were used to test the proposed method and two datasets were used to perform a comparison study. Statistical analysis using the one-way ANOVA test was performed to determine whether the differences between the results were significant. The experiment showed that the proposed method was able to reduce the number of bireducts necessary to obtain good prediction accuracy. The influence of different genetic operators and fitness evaluation strategies on the prediction accuracy was also analyzed. It was shown that the prediction accuracies of the proposed method are comparable with the best results in the machine learning literature, and in some cases exceed them.
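The combination of horizontal and vertical reduction can be illustrated with a small membership test. The abstract does not spell out the definition, so the sketch assumes the commonly used one: a pair (B, X) of attributes and objects is a bireduct if B discerns every pair of objects in X that have different decisions, no attribute can be dropped from B, and no object can be added to X without breaking that property. The toy table and function names are hypothetical, and the genetic search itself is not shown.

```python
from itertools import combinations

# Toy decision table, invented for illustration: object id -> (attributes, decision)
TABLE = {
    0: ({"a": 1, "b": 0, "c": 1}, "yes"),
    1: ({"a": 1, "b": 1, "c": 0}, "no"),
    2: ({"a": 0, "b": 0, "c": 1}, "yes"),
    3: ({"a": 1, "b": 0, "c": 0}, "no"),
    4: ({"a": 0, "b": 0, "c": 1}, "no"),  # conflicts with object 2
}


def discerns(attrs, objects):
    """True if attrs tell apart every pair in objects with different decisions."""
    for i, j in combinations(sorted(objects), 2):
        (vi, di), (vj, dj) = TABLE[i], TABLE[j]
        if di != dj and all(vi[a] == vj[a] for a in attrs):
            return False
    return True


def is_bireduct(attrs, objects):
    """Check the assumed bireduct conditions for the pair (attrs, objects)."""
    attrs, objects = set(attrs), set(objects)
    if not discerns(attrs, objects):
        return False
    # Attribute side: dropping any single attribute must break discernibility.
    if any(discerns(attrs - {a}, objects) for a in attrs):
        return False
    # Object side: adding any single left-out object must break it too.
    return not any(discerns(attrs, objects | {o}) for o in set(TABLE) - objects)


print(is_bireduct({"c"}, {0, 1, 2, 3}))       # True
print(is_bireduct({"a", "c"}, {0, 1, 2, 3}))  # False: "a" is redundant
print(is_bireduct({"c"}, {0, 1, 3}))          # False: object 2 could be added
```

Checking single removals and single additions is enough because discernibility is monotone: adding attributes or removing objects can never turn a discerning pair into a non-discerning one.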
Predicting attrition at the renewal date in automobile insurance with Gaussian processes (Prédiction de l'attrition en date de renouvellement en assurance automobile avec processus gaussiens)
Abstract:
The automobile insurance industry moves in cycles, with phases of profitability and phases of unprofitability. In unprofitable phases, insurance companies generally respond by raising premiums in an attempt to reduce losses. However, very large increases can drive customers en masse to competitors. An excessively high attrition rate could harm the company's long-term profitability. Sound management of rate increases is therefore essential for an insurance company. This thesis aims to build a tool that simulates the shape of an insurer's portfolio as a function of the rate change proposed to each policyholder. A procedure using regressions based on univariate Gaussian processes is developed. This procedure outperforms logistic regression, the model generally used for this kind of task.
Abstract:
Thesis digitized by the Division de la gestion de documents et des archives of the Université de Montréal
Abstract:
Master's thesis digitized by the Division de la gestion de documents et des archives of the Université de Montréal