39 resultados para Data mining and knowledge discovery

em Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Presentation of Kristiina Hormia-Poutanen at the 25th Anniversary Conference of The National Repository Library of Finland, Kuopio 22th of May 2015.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Data mining, as a heatedly discussed term, has been studied in various fields. Its possibilities in refining the decision-making process, realizing potential patterns and creating valuable knowledge have won attention of scholars and practitioners. However, there are less studies intending to combine data mining and libraries where data generation occurs all the time. Therefore, this thesis plans to fill such a gap. Meanwhile, potential opportunities created by data mining are explored to enhance one of the most important elements of libraries: reference service. In order to thoroughly demonstrate the feasibility and applicability of data mining, literature is reviewed to establish a critical understanding of data mining in libraries and attain the current status of library reference service. The result of the literature review indicates that free online data resources other than data generated on social media are rarely considered to be applied in current library data mining mandates. Therefore, the result of the literature review motivates the presented study to utilize online free resources. Furthermore, the natural match between data mining and libraries is established. The natural match is explained by emphasizing the data richness reality and considering data mining as one kind of knowledge, an easy choice for libraries, and a wise method to overcome reference service challenges. The natural match, especially the aspect that data mining could be helpful for library reference service, lays the main theoretical foundation for the empirical work in this study. Turku Main Library was selected as the case to answer the research question: whether data mining is feasible and applicable for reference service improvement. In this case, the daily visit from 2009 to 2015 in Turku Main Library is considered as the resource for data mining. In addition, corresponding weather conditions are collected from Weather Underground, which is totally free online. Before officially being analyzed, the collected dataset is cleansed and preprocessed in order to ensure the quality of data mining. Multiple regression analysis is employed to mine the final dataset. Hourly visits are the independent variable and weather conditions, Discomfort Index and seven days in a week are dependent variables. In the end, four models in different seasons are established to predict visiting situations in each season. Patterns are realized in different seasons and implications are created based on the discovered patterns. In addition, library-climate points are generated by a clustering method, which simplifies the process for librarians using weather data to forecast library visiting situation. Then the data mining result is interpreted from the perspective of improving reference service. After this data mining work, the result of the case study is presented to librarians so as to collect professional opinions regarding the possibility of employing data mining to improve reference services. In the end, positive opinions are collected, which implies that it is feasible to utilizing data mining as a tool to enhance library reference service.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Recent advances in machine learning methods enable increasingly the automatic construction of various types of computer assisted methods that have been difficult or laborious to program by human experts. The tasks for which this kind of tools are needed arise in many areas, here especially in the fields of bioinformatics and natural language processing. The machine learning methods may not work satisfactorily if they are not appropriately tailored to the task in question. However, their learning performance can often be improved by taking advantage of deeper insight of the application domain or the learning problem at hand. This thesis considers developing kernel-based learning algorithms incorporating this kind of prior knowledge of the task in question in an advantageous way. Moreover, computationally efficient algorithms for training the learning machines for specific tasks are presented. In the context of kernel-based learning methods, the incorporation of prior knowledge is often done by designing appropriate kernel functions. Another well-known way is to develop cost functions that fit to the task under consideration. For disambiguation tasks in natural language, we develop kernel functions that take account of the positional information and the mutual similarities of words. It is shown that the use of this information significantly improves the disambiguation performance of the learning machine. Further, we design a new cost function that is better suitable for the task of information retrieval and for more general ranking problems than the cost functions designed for regression and classification. We also consider other applications of the kernel-based learning algorithms such as text categorization, and pattern recognition in differential display. We develop computationally efficient algorithms for training the considered learning machines with the proposed kernel functions. We also design a fast cross-validation algorithm for regularized least-squares type of learning algorithm. Further, an efficient version of the regularized least-squares algorithm that can be used together with the new cost function for preference learning and ranking tasks is proposed. In summary, we demonstrate that the incorporation of prior knowledge is possible and beneficial, and novel advanced kernels and cost functions can be used in algorithms efficiently.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The present thesis in focused on the minimization of experimental efforts for the prediction of pollutant propagation in rivers by mathematical modelling and knowledge re-use. Mathematical modelling is based on the well known advection-dispersion equation, while the knowledge re-use approach employs the methods of case based reasoning, graphical analysis and text mining. The thesis contribution to the pollutant transport research field consists of: (1) analytical and numerical models for pollutant transport prediction; (2) two novel techniques which enable the use of variable parameters along rivers in analytical models; (3) models for the estimation of pollutant transport characteristic parameters (velocity, dispersion coefficient and nutrient transformation rates) as functions of water flow, channel characteristics and/or seasonality; (4) the graphical analysis method to be used for the identification of pollution sources along rivers; (5) a case based reasoning tool for the identification of crucial information related to the pollutant transport modelling; (6) and the application of a software tool for the reuse of information during pollutants transport modelling research. These support tools are applicable in the water quality research field and in practice as well, as they can be involved in multiple activities. The models are capable of predicting pollutant propagation along rivers in case of both ordinary pollution and accidents. They can also be applied for other similar rivers in modelling of pollutant transport in rivers with low availability of experimental data concerning concentration. This is because models for parameter estimation developed in the present thesis enable the calculation of transport characteristic parameters as functions of river hydraulic parameters and/or seasonality. The similarity between rivers is assessed using case based reasoning tools, and additional necessary information can be identified by using the software for the information reuse. Such systems represent support for users and open up possibilities for new modelling methods, monitoring facilities and for better river water quality management tools. They are useful also for the estimation of environmental impact of possible technological changes and can be applied in the pre-design stage or/and in the practical use of processes as well.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Availability, Data Privacy and Copyrights – Opening Knowledge via Contracts and Pilots, discusses how in Aviisi-project of National Library of Finland, the digital contents, and their availability topics dealt together with pilot organizations

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This master's thesis coversthe concepts of knowledge discovery, data mining and technology forecasting methods in telecommunications. It covers the various aspects of knowledge discoveryin data bases and discusses in detail the methods of data mining and technologyforecasting methods that are used in telecommunications. Main concern in the overall process of this thesis is to emphasize the methods that are being used in technology forecasting for telecommunications and data mining. It tries to answer to some extent to the question of do forecasts create a future? It also describes few difficulties that arise in technology forecasting. This thesis was done as part of my master's studies in Lappeenranta University of Technology.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Työn tarkoituksena oli kerätä käyttövarmuustietoa savukaasulinjasta kahdelta suomalaiselta sellutehtaalta niiden käyttöönotosta aina tähän päivään asti. Käyttövarmuustieto koostuu luotettavuustiedoista sekä kunnossapitotiedoista. Kerätyn tiedon avulla on mahdollista kuvata tarkasti laitoksen käyttövarmuutta seuraavilla tunnusluvuilla: suunnittelemattomien häiriöiden lukumäärä ja korjausajat, laitteiden seisokkiaika, vikojen todennäköisyys ja korjaavan kunnossapidon kustannukset suhteessa savukaasulinjan korjaavan kunnossapidon kokonaiskustannuksiin. Käyttövarmuustiedon keräysmetodi on esitelty. Savukaasulinjan kriittisten laitteiden määrittelyyn käytetty metodi on yhdistelmä kyselytutkimuksesta ja muunnellusta vian vaikutus- ja kriittisyysanalyysistä. Laitteiden valitsemiskriteerit lopulliseen kriittisyysanalyysiin päätettiin käyttövarmuustietojen sekä kyselytutkimuksen perusteella. Kriittisten laitteiden määrittämisen tarkoitus on löytää savukaasulinjasta ne laitteet, joiden odottamaton vikaantuminen aiheuttaa vakavimmat seuraukset savukaasulinjan luotettavuuteen, tuotantoon, turvallisuuteen, päästöihin ja kustannuksiin. Tiedon avulla rajoitetut kunnossapidon resurssit voidaan suunnata oikein. Kriittisten laitteiden määrittämisen tuloksena todetaan, että kolme kriittisintä laitetta savukaasulinjassa ovat molemmille sellutehtaille yhteisesti: savukaasupuhaltimet, laahakuljettimet sekä ketjukuljettimet. Käyttövarmuustieto osoittaa, että laitteiden luotettavuus on tehdaskohtaista, mutta periaatteessa samat päälinjat voidaan nähdä suunnittelemattomien vikojen todennäköisyyttä esittävissä kuvissa. Kustannukset, jotka esitetään laitteen suunnittelemattomien kunnossapitokustannusten suhteena savukaasulinjan kokonaiskustannuksiin, noudattelevat hyvin pitkälle luotettavuuskäyrää, joka on laskettu laitteen seisokkiajan suhteena käyttötunteihin. Käyttövarmuustiedon keräys yhdistettynä kriittisten laitteiden määrittämiseen mahdollistavat ennakoivan kunnossapidon oikean kohdistamisen ja ajoittamisen laitteiston elinaikana siten, että luotettavuus- ja kustannustehokkuusvaatimukset saavutetaan.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Visual data mining (VDM) tools employ information visualization techniques in order to represent large amounts of high-dimensional data graphically and to involve the user in exploring data at different levels of detail. The users are looking for outliers, patterns and models – in the form of clusters, classes, trends, and relationships – in different categories of data, i.e., financial, business information, etc. The focus of this thesis is the evaluation of multidimensional visualization techniques, especially from the business user’s perspective. We address three research problems. The first problem is the evaluation of projection-based visualizations with respect to their effectiveness in preserving the original distances between data points and the clustering structure of the data. In this respect, we propose the use of existing clustering validity measures. We illustrate their usefulness in evaluating five visualization techniques: Principal Components Analysis (PCA), Sammon’s Mapping, Self-Organizing Map (SOM), Radial Coordinate Visualization and Star Coordinates. The second problem is concerned with evaluating different visualization techniques as to their effectiveness in visual data mining of business data. For this purpose, we propose an inquiry evaluation technique and conduct the evaluation of nine visualization techniques. The visualizations under evaluation are Multiple Line Graphs, Permutation Matrix, Survey Plot, Scatter Plot Matrix, Parallel Coordinates, Treemap, PCA, Sammon’s Mapping and the SOM. The third problem is the evaluation of quality of use of VDM tools. We provide a conceptual framework for evaluating the quality of use of VDM tools and apply it to the evaluation of the SOM. In the evaluation, we use an inquiry technique for which we developed a questionnaire based on the proposed framework. The contributions of the thesis consist of three new evaluation techniques and the results obtained by applying these evaluation techniques. The thesis provides a systematic approach to evaluation of various visualization techniques. In this respect, first, we performed and described the evaluations in a systematic way, highlighting the evaluation activities, and their inputs and outputs. Secondly, we integrated the evaluation studies in the broad framework of usability evaluation. The results of the evaluations are intended to help developers and researchers of visualization systems to select appropriate visualization techniques in specific situations. The results of the evaluations also contribute to the understanding of the strengths and limitations of the visualization techniques evaluated and further to the improvement of these techniques.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This pilot project aims examine the factors of the Finnish subsidiaries local embeddedness, their knowledge creation capabilities and the transfer mechanisms of new practices in the context of the Russian market. The research is designed as a multiple case study conducted with a qualitative approach. The empirical data consists of the interviews of the four Finnish case companies operating in the Kaluga region and three local partner companies. The deductive and inductive approaches were employed to conduct the analysis of the data. The propositions for the future study were developed in the conclusive chapters of the research, where we propose that the factor of the economy growth and industrialization matters in terms of subsidiaries’ role dedication, their knowledge creation capabilities, and direction of the knowledge flow within the local environment.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Choice of industrial development options and the relevant allocation of the research funds become more and more difficult because of the increasing R&D costs and pressure for shorter development period. Forecast of the research progress is based on the analysis of the publications activity in the field of interest as well as on the dynamics of its change. Moreover, allocation of funds is hindered by exponential growth in the number of publications and patents. Thematic clusters become more and more difficult to identify, and their evolution hard to follow. The existing approaches of research field structuring and identification of its development are very limited. They do not identify the thematic clusters with adequate precision while the identified trends are often ambiguous. Therefore, there is a clear need to develop methods and tools, which are able to identify developing fields of research. The main objective of this Thesis is to develop tools and methods helping in the identification of the promising research topics in the field of separation processes. Two structuring methods as well as three approaches for identification of the development trends have been proposed. The proposed methods have been applied to the analysis of the research on distillation and filtration. The results show that the developed methods are universal and could be used to study of the various fields of research. The identified thematic clusters and the forecasted trends of their development have been confirmed in almost all tested cases. It proves the universality of the proposed methods. The results allow for identification of the fast-growing scientific fields as well as the topics characterized by stagnant or diminishing research activity.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Poster at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Workshop at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014