Biblioteca Digital

45 resultados para object mining

em Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland

Unsupervised visual object categorization

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The number of digital images has been increasing exponentially in the last few years. People have problems managing their image collections and finding a specific image. An automatic image categorization system could help them to manage images and find specific images. In this thesis, an unsupervised visual object categorization system was implemented to categorize a set of unknown images. The system is unsupervised, and hence, it does not need known images to train the system which needs to be manually obtained. Therefore, the number of possible categories and images can be huge. The system implemented in the thesis extracts local features from the images. These local features are used to build a codebook. The local features and the codebook are then used to generate a feature vector for an image. Images are categorized based on the feature vectors. The system is able to categorize any given set of images based on the visual appearance of the images. Images that have similar image regions are grouped together in the same category. Thus, for example, images which contain cars are assigned to the same cluster. The unsupervised visual object categorization system can be used in many situations, e.g., in an Internet search engine. The system can categorize images for a user, and the user can then easily find a specific type of image.

The syntactic and semantic structure of sentences with a position-filler "it" as a formal object

Relevância:

20.00% 20.00%

Publicador:

Identification of Research Portfolio for the Development of Filtration Equipment

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This piece of work which is Identification of Research Portfolio for Development of Filtration Equipment aims at presenting a novel approach to identify promising research topics in the field of design and development of filtration equipment and processes. The projected approach consists of identifying technological problems often encountered in filtration processes. The sources of information for the problem retrieval were patent documents and scientific papers that discussed filtration equipments and processes. The problem identification method adopted in this work focussed on the semantic nature of a sentence in order to generate series of subject-action-object structures. This was achieved with software called Knowledgist. List of problems often encountered in filtration processes that have been mentioned in patent documents and scientific papers were generated. These problems were carefully studied and categorized. Suggestions were made on the various classes of these problems that need further investigation in order to propose a research portfolio. The uses and importance of other methods of information retrieval were also highlighted in this work.

Local and Global Feature Extraction for Invariant Object Recognition

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Perceiving the world visually is a basic act for humans, but for computers it is still an unsolved problem. The variability present innatural environments is an obstacle for effective computer vision. The goal of invariant object recognition is to recognise objects in a digital image despite variations in, for example, pose, lighting or occlusion. In this study, invariant object recognition is considered from the viewpoint of feature extraction. Thedifferences between local and global features are studied with emphasis on Hough transform and Gabor filtering based feature extraction. The methods are examined with respect to four capabilities: generality, invariance, stability, and efficiency. Invariant features are presented using both Hough transform and Gabor filtering. A modified Hough transform technique is also presented where the distortion tolerance is increased by incorporating local information. In addition, methods for decreasing the computational costs of the Hough transform employing parallel processing and local information are introduced.

Web-pohjaisen videoeditorin toteutus ja Object-Relational Mapping

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Tässä insinöörityössä esitellään Stadian verkkoviestinnän VIDEOS-hankkeeseen liittyvän web-pohjaisen videoeditorin kehitys ja käytetyt teknologiat. Fooga-nimiseksi nimetty videoeditorin käyttämät tekniikat ovat Ruby, Ruby on Rails, FFmpeg, Mencoder, ImageMagick ja FLVTool2. Ruby on olio-pohjainen skriptikieli, Ruby on Rails on websovelluskehys ja muut tekniikat ovat komentorivipohjaisia työkaluja, jotka tarjoavat tärkeimmät toiminnallisuudet Foogalle. Tavoitteina oli tämän työn yhteydessä ohjelmoida Foogaan perustoiminnallisuudet, jotka mahdollistavat minimaaliset käyttömahdollisuudet kevääseen 2007 mennessä. Kehitystyö jatkuu vuoteen 2009 asti tarjoamalla samalla mahdollisuuden usealle insinöörityölle tekniikan ja liikenteen koulutusohjelmasta. Tämän lisäksi tässä insinöörityössä perehdytään Object-Relational Mapping-tekniikan perusteisiiin ja verrataan Ruby on Railsin ja Javan ORM-ominaisuuksia. Ruby on Railsin osalta esitellään ActiveRecord-luokka ja Javan osalta Hibernate, jonka johdantona on DAO/DTO-sunnittelumalli.

Object-oriented development of client / server application: case study of service control software

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Tässä työssä on esitetty sen ohjelmiston kehittämisen prosessi, joka on tarkoitettu annettavien palveluiden valvottavaksi käyttäen prototyyppimallia. Raportti sisältää vaatimusten, kohteisiin suunnatun analyysin ja suunnittelun, realisointiprosessien kuvauksen ja prototyypin testauksen. Ohjelmiston käyttöala – antavien palveluiden valvonta. Vaatimukset sovellukselle analysoitiin ohjelmistomarkkinoiden perusteella sekä ohjelmiston engineeringin periaatteiden mukaisesti. Ohjelmiston prototyyppi on realisoitu käyttäen asiakas-/palvelinhybridimallia sekä ralaatiokantaa. Kehitetty ohjelmisto on tarkoitettu venäläisille tietokonekerhoille, jotka erikoistuvat pelipalvelinten antamiseen.

Electric Motor and Frequency Converter Markets in Latin America and Especially in the Mining Industry of Chile

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Latinalaisen Amerikan osuus maailmantaloudesta on pieni verrattuna sen maantieteelliseen kokoon, väkilukuun ja luonnonvaroihin. Aluetta pidetään kuitenkin yhtenä tulevaisuuden merkittävistä kasvumarkkinoista. Useissa Latinalaisen Amerikan maissa on teollisuutta, joka hyödyntää luonnonvaroja ja tuottaa raaka-aineita sekä kotimaan että ulkomaiden markkinoille. Tällaisia tyypillisiä teollisuudenaloja Latinalaisessa Amerikassa ovat kaivos- ja metsäteollisuus sekä öljyn ja maakaasun tuotanto. Näiden teollisuudenalojen tuotantolaitteiden ja koneiden valmistusta ei Latinalaisessa Amerikassa juurikaan ole. Ne tuodaan yleensä Pohjois-Amerikasta ja Euroopasta. Tässä diplomityössä tutkitaan sähkömoottorien ja taajuusmuuttajien markkinapotentiaalia Latinalaisessa Amerikassa. Tutkimuksessa perehdytään Latinalaisen Amerikan maiden kansantalouksien tilaan sekä arvioidaan sähkömoottorien ja taajuusmuuttajien markkinoiden kokoa tullitilastojen avulla. Chilen kaivosteollisuudessa arvioidaan olevan erityistä potentiaalia. Diplomityössä selvitetään ostoprosessin kulkua Chilen kaivosteollisuudessa ja eri asiakastyyppien roolia siinä sekä tärkeimpiä päätöskriteerejä toimittaja- ja teknologiavalinnoissa.

Wollastoniittimarkkinat ja suomalaisen tuottajan asemoituminen markkinoilla

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Työn tarkoituksena oli tehdä yleiskuvaus wollastoniittimarkkinoista, ja siten auttaa yritystä sisäisesti määrittämään asemansa kyseisillä markkinoillaWollastoniitin suurimmat kuluttajasektorit ovat keraaminen-, muovi-, maali- ja pinnoite-, sekä metallurginen teollisuus. Eniten mineraalin kulutuksen odotetaan kasvavan muoviteollisuudessa. Suurimmat wollastoniitin tuottajamaat ovat Kiina, Intia, Yhdysvallat, Meksiko ja Suomi. Kiinassa aktiivisia kaivoksia on tällä hetkellä noin 60 kappaletta, ja koko maan tuotanto kattaa noin puolet mineraalin kokonaistuotannosta. Kiinan ulkopuolella toimii yhteensä viisi suurta tuottajaa.Suurimmat wollastoniitin tuottajat ovat intialainen Wolkem India Ltd., kiinalaiset Dalian Huanqui Minerals Corp. ja Jilin Shanwei Wollastonite Mining Co. Ltd, yhdysvaltalaiset Nyco Minerals Inc. ja R.T. Vanderbilt Co., meksikolainen Nyco Minera S.A. de C.V. ja suomalainen Nordkalk Oyj Abp. Uusia tuotantolaitoksia suunnitellaan rakennettavaksi Venäjälle, Kanadaan ja Kazakstaniin.

Object-Oriented Modelling of Multilayer Networks

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Tämä diplomityökuuluu tietoliikenneverkkojen suunnittelun tutkimukseen ja pohjimmiltaan kohdistuu verkon mallintamiseen. Tietoliikenneverkkojen suunnittelu on monimutkainen ja vaativa ongelma, joka sisältää mutkikkaita ja aikaa vieviä tehtäviä. Tämä diplomityö esittelee ”monikerroksisen verkkomallin”, jonka tarkoitus on auttaa verkon suunnittelijoita selviytymään ongelmien monimutkaisuudesta ja vähentää verkkojen suunnitteluun kuluvaa aikaa. Monikerroksinen verkkomalli perustuu yleisille objekteille, jotka ovat yhteisiä kaikille tietoliikenneverkoille. Tämä tekee mallista soveltuvan mielivaltaisille verkoille, välittämättä verkkokohtaisista ominaisuuksista tai verkon toteutuksessa käytetyistä teknologioista. Malli määrittelee tarkan terminologian ja käyttää kolmea käsitettä: verkon jakaminen tasoihin (plane separation), kerrosten muodostaminen (layering) ja osittaminen (partitioning). Nämä käsitteet kuvataan yksityiskohtaisesti tässä työssä. Monikerroksisen verkkomallin sisäinen rakenne ja toiminnallisuus ovat määritelty käyttäen Unified Modelling Language (UML) -notaatiota. Tämä työ esittelee mallin use case- , paketti- ja luokkakaaviot. Diplomityö esittelee myös tulokset, jotka on saatu vertailemalla monikerroksista verkkomallia muihin verkkomalleihin. Tulokset osoittavat, että monikerroksisella verkkomallilla on etuja muihin malleihin verrattuna.

New kernel functions and learning methods for text and data mining

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Recent advances in machine learning methods enable increasingly the automatic construction of various types of computer assisted methods that have been difficult or laborious to program by human experts. The tasks for which this kind of tools are needed arise in many areas, here especially in the fields of bioinformatics and natural language processing. The machine learning methods may not work satisfactorily if they are not appropriately tailored to the task in question. However, their learning performance can often be improved by taking advantage of deeper insight of the application domain or the learning problem at hand. This thesis considers developing kernel-based learning algorithms incorporating this kind of prior knowledge of the task in question in an advantageous way. Moreover, computationally efficient algorithms for training the learning machines for specific tasks are presented. In the context of kernel-based learning methods, the incorporation of prior knowledge is often done by designing appropriate kernel functions. Another well-known way is to develop cost functions that fit to the task under consideration. For disambiguation tasks in natural language, we develop kernel functions that take account of the positional information and the mutual similarities of words. It is shown that the use of this information significantly improves the disambiguation performance of the learning machine. Further, we design a new cost function that is better suitable for the task of information retrieval and for more general ranking problems than the cost functions designed for regression and classification. We also consider other applications of the kernel-based learning algorithms such as text categorization, and pattern recognition in differential display. We develop computationally efficient algorithms for training the considered learning machines with the proposed kernel functions. We also design a fast cross-validation algorithm for regularized least-squares type of learning algorithm. Further, an efficient version of the regularized least-squares algorithm that can be used together with the new cost function for preference learning and ranking tasks is proposed. In summary, we demonstrate that the incorporation of prior knowledge is possible and beneficial, and novel advanced kernels and cost functions can be used in algorithms efficiently.

A Dependency Parsing Approach to Biomedical Text Mining

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Biomedical research is currently facing a new type of challenge: an excess of information, both in terms of raw data from experiments and in the number of scientific publications describing their results. Mirroring the focus on data mining techniques to address the issues of structured data, there has recently been great interest in the development and application of text mining techniques to make more effective use of the knowledge contained in biomedical scientific publications, accessible only in the form of natural human language. This thesis describes research done in the broader scope of projects aiming to develop methods, tools and techniques for text mining tasks in general and for the biomedical domain in particular. The work described here involves more specifically the goal of extracting information from statements concerning relations of biomedical entities, such as protein-protein interactions. The approach taken is one using full parsing—syntactic analysis of the entire structure of sentences—and machine learning, aiming to develop reliable methods that can further be generalized to apply also to other domains. The five papers at the core of this thesis describe research on a number of distinct but related topics in text mining. In the first of these studies, we assessed the applicability of two popular general English parsers to biomedical text mining and, finding their performance limited, identified several specific challenges to accurate parsing of domain text. In a follow-up study focusing on parsing issues related to specialized domain terminology, we evaluated three lexical adaptation methods. We found that the accurate resolution of unknown words can considerably improve parsing performance and introduced a domain-adapted parser that reduced the error rate of theoriginal by 10% while also roughly halving parsing time. To establish the relative merits of parsers that differ in the applied formalisms and the representation given to their syntactic analyses, we have also developed evaluation methodology, considering different approaches to establishing comparable dependency-based evaluation results. We introduced a methodology for creating highly accurate conversions between different parse representations, demonstrating the feasibility of unification of idiverse syntactic schemes under a shared, application-oriented representation. In addition to allowing formalism-neutral evaluation, we argue that such unification can also increase the value of parsers for domain text mining. As a further step in this direction, we analysed the characteristics of publicly available biomedical corpora annotated for protein-protein interactions and created tools for converting them into a shared form, thus contributing also to the unification of text mining resources. The introduced unified corpora allowed us to perform a task-oriented comparative evaluation of biomedical text mining corpora. This evaluation established clear limits on the comparability of results for text mining methods evaluated on different resources, prompting further efforts toward standardization. To support this and other research, we have also designed and annotated BioInfer, the first domain corpus of its size combining annotation of syntax and biomedical entities with a detailed annotation of their relationships. The corpus represents a major design and development effort of the research group, with manual annotation that identifies over 6000 entities, 2500 relationships and 28,000 syntactic dependencies in 1100 sentences. In addition to combining these key annotations for a single set of sentences, BioInfer was also the first domain resource to introduce a representation of entity relations that is supported by ontologies and able to capture complex, structured relationships. Part I of this thesis presents a summary of this research in the broader context of a text mining system, and Part II contains reprints of the five included publications.

Proceedings of LOUHI '08. The First Conference on Text and Data Mining of Clinical Documents

Relevância:

20.00% 20.00%

Publicador:

Proceedings of SMBM 2008. The Third International Symposium on Semantic Mining in Biomedicine. September 1st - 3rd, 2008, Turku, Finland

Relevância:

20.00% 20.00%

Publicador:

Opinion Mining

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this thesis we study the field of opinion mining by giving a comprehensive review of the available research that has been done in this topic. Also using this available knowledge we present a case study of a multilevel opinion mining system for a student organization's sales management system. We describe the field of opinion mining by discussing its historical roots, its motivations and applications as well as the different scientific approaches that have been used to solve this challenging problem of mining opinions. To deal with this huge subfield of natural language processing, we first give an abstraction of the problem of opinion mining and describe the theoretical frameworks that are available for dealing with appraisal language. Then we discuss the relation between opinion mining and computational linguistics which is a crucial pre-processing step for the accuracy of the subsequent steps of opinion mining. The second part of our thesis deals with the semantics of opinions where we describe the different ways used to collect lists of opinion words as well as the methods and techniques available for extracting knowledge from opinions present in unstructured textual data. In the part about collecting lists of opinion words we describe manual, semi manual and automatic ways to do so and give a review of the available lists that are used as gold standards in opinion mining research. For the methods and techniques of opinion mining we divide the task into three levels that are the document, sentence and feature level. The techniques that are presented in the document and sentence level are divided into supervised and unsupervised approaches that are used to determine the subjectivity and polarity of texts and sentences at these levels of analysis. At the feature level we give a description of the techniques available for finding the opinion targets, the polarity of the opinions about these opinion targets and the opinion holders. Also at the feature level we discuss the various ways to summarize and visualize the results of this level of analysis. In the third part of our thesis we present a case study of a sales management system that uses free form text and that can benefit from an opinion mining system. Using the knowledge gathered in the review of this field we provide a theoretical multi level opinion mining system (MLOM) that can perform most of the tasks needed from an opinion mining system. Based on the previous research we give some hints that many of the laborious market research tasks that are done by the sales force, which uses this sales management system, can improve their insight about their partners and by that increase the quality of their sales services and their overall results.

Analysis of patterns in mining time series using Qlucore

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Raw measurement data does not always immediately convey useful information, but applying mathematical statistical analysis tools into measurement data can improve the situation. Data analysis can offer benefits like acquiring meaningful insight from the dataset, basing critical decisions on the findings, and ruling out human bias through proper statistical treatment. In this thesis we analyze data from an industrial mineral processing plant with the aim of studying the possibility of forecasting the quality of the final product, given by one variable, with a model based on the other variables. For the study mathematical tools like Qlucore Omics Explorer (QOE) and Sparse Bayesian regression (SB) are used. Later on, linear regression is used to build a model based on a subset of variables that seem to have most significant weights in the SB model. The results obtained from QOE show that the variable representing the desired final product does not correlate with other variables. For SB and linear regression, the results show that both SB and linear regression models built on 1-day averaged data seriously underestimate the variance of true data, whereas the two models built on 1-month averaged data are reliable and able to explain a larger proportion of variability in the available data, making them suitable for prediction purposes. However, it is concluded that no single model can fit well the whole available dataset and therefore, it is proposed for future work to make piecewise non linear regression models if the same available dataset is used, or the plant to provide another dataset that should be collected in a more systematic fashion than the present data for further analysis.

«
1
2
3
»