36 results for Invariant Object Recognition
in Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland
Abstract:
Perceiving the world visually is a basic act for humans, but for computers it is still an unsolved problem. The variability present in natural environments is an obstacle to effective computer vision. The goal of invariant object recognition is to recognise objects in a digital image despite variations in, for example, pose, lighting or occlusion. In this study, invariant object recognition is considered from the viewpoint of feature extraction. The differences between local and global features are studied with emphasis on Hough transform and Gabor filtering based feature extraction. The methods are examined with respect to four capabilities: generality, invariance, stability, and efficiency. Invariant features are presented using both the Hough transform and Gabor filtering. A modified Hough transform technique is also presented in which the distortion tolerance is increased by incorporating local information. In addition, methods for decreasing the computational cost of the Hough transform by employing parallel processing and local information are introduced.
Abstract:
Local features are used in many computer vision tasks, including visual object categorization, content-based image retrieval and object recognition, to mention a few. Local features are points, blobs or regions in images that are extracted using a local feature detector. To make use of the extracted local features, the localized interest points are described using a local feature descriptor. A descriptor histogram vector is a compact representation of an image and can be used for searching and matching images in databases. In this thesis, the performance of local feature detectors and descriptors is evaluated for the object class detection task. Features are extracted from image samples belonging to several object classes. Matching features are then searched for using random image pairs of the same class. The goal of this thesis is to find the best detector and descriptor methods for this task in terms of detector repeatability and descriptor matching rate.
Abstract:
Many industrial applications need object recognition and tracking capabilities. The algorithms developed for those purposes are computationally expensive. Yet real-time performance, high accuracy and low power consumption are essential measures of the system. When all these requirements are combined, hardware acceleration of these algorithms becomes a feasible solution. The purpose of this study is to analyze the current state of these hardware acceleration solutions: which algorithms have been implemented in hardware and what modifications have been made to adapt these algorithms to hardware.
Abstract:
The usage of digital content, such as video clips and images, has increased dramatically during the last decade. Local image features have been applied increasingly in various image and video retrieval applications. This thesis evaluates local features and applies them to image and video processing tasks. The results of the study show that 1) the performance of different local feature detector and descriptor methods varies significantly in object class matching, 2) local features can be applied in image alignment with results superior to the state of the art, 3) the local feature based shot boundary detection method produces promising results, and 4) the local feature based hierarchical video summarization method points to a promising new research direction. In conclusion, this thesis presents local features as a powerful tool in many applications, and near-term future work should concentrate on improving the quality of the local features.
Abstract:
Human activity recognition in everyday environments is a critical but challenging task in Ambient Intelligence applications that aim at proper Ambient Assisted Living, and key challenges remain before robust methods can be realized. One of the major limitations of today's Ambient Intelligence systems is the lack of semantic models of activities in the environment that would let the system recognize the specific activity being performed by the user(s) and act accordingly. In this context, this thesis addresses the general problem of knowledge representation in Smart Spaces. The main objective is to develop knowledge-based models, equipped with semantics, to learn, infer and monitor human behaviours in Smart Spaces. Moreover, some aspects of this problem have a high degree of uncertainty, and therefore the developed models must be equipped with mechanisms to manage this type of information. A fuzzy ontology and a semantic hybrid system are presented to allow modelling and recognition of a set of complex real-life scenarios in which vagueness and uncertainty are inherent to the human nature of the users who perform them. The handling of uncertain, incomplete and vague data (i.e., missing sensor readings and activity execution variations, since human behaviour is non-deterministic) is approached for the first time through a fuzzy ontology validated in real-time settings within a hybrid data-driven and knowledge-based architecture. The semantics of activities, sub-activities and real-time object interaction are taken into consideration. The proposed framework consists of two main modules: the low-level sub-activity recognizer and the high-level activity recognizer. The first module detects sub-activities (i.e., actions or basic activities), taking its input data directly from a depth sensor (Kinect).
The main contribution of this thesis addresses the second component of the hybrid system, which lies on top of the first at a higher level of abstraction. It acquires its input data from the first module's output and executes ontological inference to provide users, activities and their influence on the environment with semantics. This component is thus knowledge-based, and a fuzzy ontology was designed to model the high-level activities. Since activity recognition requires context-awareness and the ability to discriminate among activities in different environments, the semantic framework allows common-sense knowledge to be modelled as a rule-based system that supports expressions close to natural language in the form of fuzzy linguistic labels. The advantages of the framework have been evaluated with a challenging new public dataset, CAD-120, achieving an accuracy of 90.1% and 91.1% for low-level and high-level activities, respectively. This is an improvement over both entirely data-driven approaches and purely ontology-based approaches. As an added value, so that the system is sufficiently simple and flexible to be managed by non-expert users and thus facilitates the transfer of research to industry, a development framework was built, composed of a programming toolbox, a hybrid crisp and fuzzy architecture, and graphical models to represent and configure human behaviour in Smart Spaces, in order to make the framework more usable in the final application. As a result, human behaviour recognition can help assist people with special needs in areas such as healthcare, independent elderly living, remote rehabilitation monitoring, industrial process guideline control, and many other cases. This thesis shows use cases in these areas.
Abstract:
This Master’s thesis discusses the problem of automatically recognizing fish in video sequences. This is an urgent issue for many organizations engaged in fish farming in Finland and Russia, because automating the control and counting of individual species is a turning point for the industry. The difficulties and specific features of the problem have been identified in order to find a solution and to propose recommendations for the components of an automated fish recognition system. Background subtraction, Kalman filtering and the Viola-Jones method were implemented during this work for detection, tracking and estimation of fish parameters. Both the results of the experiments and the choice of appropriate methods depend strongly on the quality and type of the video used as input data. Practical experiments have demonstrated that not all methods produce good results on real data, whereas on synthetic data they operate satisfactorily.
Abstract:
In this thesis, we propose to infer pixel-level labelling in video by utilising only object category information, exploiting the intrinsic structure of video data. Our motivation is the observation that image-level labels are much easier to acquire than pixel-level labels, and that it is natural to seek a link between image-level recognition and pixel-level classification in video data, which would transfer learned recognition models from one domain to the other. To this end, this thesis proposes two domain adaptation approaches that adapt a deep convolutional neural network (CNN) image recognition model trained on labelled image data to the target domain, exploiting both the semantic evidence learned by the CNN and the intrinsic structures of unlabelled video data. Our proposed approaches explicitly model and compensate for the domain adaptation from the source domain to the target domain, which in turn underpins a robust semantic object segmentation method for natural videos. We demonstrate the superior performance of our methods through extensive evaluations on challenging datasets in comparison with state-of-the-art methods.
Abstract:
This thesis is about the detection of local image features. The research topic belongs to the wider area of object detection, which is a machine vision and pattern recognition problem where an object must be detected (located) in an image. State-of-the-art object detection methods often divide the problem into separate interest point detection and local image description steps, but in this thesis a different technique is used, leading to higher quality image features which enable more precise localization. Instead of using interest point detection, the landmark positions are marked manually. Therefore, the quality of the image features is not limited by the interest point detection phase and the learning of image features is simplified. The approach combines both interest point detection and local description into one phase for detection. Computational efficiency of the descriptor is therefore important, which rules out many of the commonly used descriptors as unsuitably heavy. Multiresolution Gabor features have been the main descriptor in this thesis, and improving their efficiency is a significant part of the work. Actual image features are formed from descriptors by using a classifier which can then recognize similar-looking patches in new images. The main classifier is based on Gaussian mixture models. Classifiers are used in a one-class configuration where there are only positive training samples without an explicit background class. The local image feature detection method has been tested with two freely available face detection databases and a proprietary license plate database. The localization performance was very good in these experiments. Other applications of the same underlying techniques are also presented, including object categorization and fault detection.
Abstract:
This engineering thesis presents the development of, and the technologies used in, a web-based video editor related to the VIDEOS project of Stadia's online communication unit. The video editor, named Fooga, uses Ruby, Ruby on Rails, FFmpeg, Mencoder, ImageMagick and FLVTool2. Ruby is an object-oriented scripting language, Ruby on Rails is a web application framework, and the other technologies are command-line tools that provide Fooga's most important functionality. The goal of this work was to program Fooga's basic functionality, enabling minimal usability by spring 2007. Development continues until 2009, which also offers an opportunity for several engineering theses from the degree programmes in technology and transport. In addition, this thesis examines the basics of Object-Relational Mapping and compares the ORM features of Ruby on Rails and Java. For Ruby on Rails the ActiveRecord class is presented, and for Java Hibernate, with the DAO/DTO design pattern as its introduction.
Abstract:
This work presents the development process, based on the prototype model, of software intended for monitoring provided services. The report covers the requirements, object-oriented analysis and design, a description of the implementation process, and testing of the prototype. The application area of the software is the monitoring of provided services. The requirements for the application were analysed on the basis of the software market and in accordance with the principles of software engineering. The software prototype was implemented using a hybrid client/server model and a relational database. The developed software is intended for Russian computer clubs that specialise in providing game servers.
Abstract:
This Master's thesis belongs to research on telecommunication network design and focuses fundamentally on network modelling. Telecommunication network design is a complex and demanding problem that involves intricate and time-consuming tasks. This thesis presents a "multilayer network model" whose purpose is to help network designers cope with the complexity of the problems and to reduce the time spent on network design. The multilayer network model is based on generic objects that are common to all telecommunication networks. This makes the model applicable to arbitrary networks, regardless of network-specific features or the technologies used to implement the network. The model defines a precise terminology and uses three concepts: plane separation, layering and partitioning. These concepts are described in detail in this work. The internal structure and functionality of the multilayer network model are defined using Unified Modelling Language (UML) notation. This work presents the use case, package and class diagrams of the model. The thesis also presents results obtained by comparing the multilayer network model with other network models. The results show that the multilayer network model has advantages over the other models.
Abstract:
The main objective of the study is to form a framework that provides tools to recognise and classify items whose demand is not smooth but varies greatly in size and/or frequency. The framework will then be combined with two other classification methods in order to form a three-dimensional classification model. Forecasting and inventory control of these abnormal demand items is difficult. Therefore, another objective of this study is to find out which statistical forecasting method is most suitable for forecasting abnormal demand items. The accuracy of the different methods is measured by comparing the forecast to the actual demand. Moreover, the study also aims at finding proper alternatives for the inventory control of abnormal demand items. The study is quantitative and the methodology is a case study. The research methods consist of theory, numerical data, current state analysis and testing of the framework in the case company. The results of the study show that the framework makes it possible to recognise and classify the abnormal demand items. It is also observed that the inventory performance of abnormal demand items differs significantly from the performance of items with smooth demand. This makes the recognition of abnormal demand items very important.
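To make the idea of classifying demand by variability in size and frequency concrete, the following is a minimal sketch only. The abstract does not state which measures or thresholds the thesis uses; this sketch assumes the widely cited scheme based on the average inter-demand interval (ADI) and the squared coefficient of variation (CV²) of non-zero demand sizes, with the commonly quoted cut-offs 1.32 and 0.49. The function name `classify_demand` and the category labels are illustrative.

```python
from statistics import mean, pstdev

def classify_demand(history, adi_cutoff=1.32, cv2_cutoff=0.49):
    """Classify one item's per-period demand history.

    Assumption: the ADI / CV^2 scheme with cut-offs 1.32 and 0.49,
    which is not necessarily the framework used in the thesis.
    """
    sizes = [d for d in history if d > 0]
    if not sizes:
        return "no demand"
    # Frequency: average number of periods per period with demand.
    adi = len(history) / len(sizes)
    # Size: squared coefficient of variation of the non-zero demands.
    cv2 = (pstdev(sizes) / mean(sizes)) ** 2
    if adi < adi_cutoff:
        return "smooth" if cv2 < cv2_cutoff else "erratic"
    return "intermittent" if cv2 < cv2_cutoff else "lumpy"

if __name__ == "__main__":
    print(classify_demand([10, 11, 9, 10, 10, 12]))
    print(classify_demand([0, 0, 1, 0, 0, 40, 0, 3]))
```

Under this assumed scheme, every category other than "smooth" would count as abnormal demand in the sense of the abstract.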
Abstract:
This Master's thesis addresses the design and implementation of an optical character recognition (OCR) system for a mobile device running the Symbian operating system. The developed OCR system, named OCRCapriccio, emphasizes modularity, effective extensibility and reuse. The system consists of two parts: the graphical user interface and the OCR engine, which was implemented as a plug-in. The plug-in includes two implementations of the OCR engine, enabling two types of recognition: bitmap comparison based recognition and statistical recognition. The implementation results have shown that the approach based on bitmap comparison is more suitable for the Symbian environment because of its nature. Although the current implementation of bitmap comparison is lacking in accuracy, further development should proceed in this direction. The biggest challenges of this work were related to developing an OCR scheme suitable for Symbian OS smartphones, which have limited computational power and restricted resources.
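As an illustration of what bitmap comparison based recognition can look like in general, here is a minimal sketch. It is not the OCRCapriccio engine: the 3x5 templates, the Hamming-distance score and the function names are all assumptions chosen to keep the example small.

```python
# Hypothetical 3x5 glyph templates; a real engine would use larger bitmaps
# and normalise the size and position of the input glyph first.
TEMPLATES = {
    "I": ["111", "010", "010", "010", "111"],
    "L": ["100", "100", "100", "100", "111"],
    "T": ["111", "010", "010", "010", "010"],
}

def hamming_distance(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))

def recognise(glyph, templates=TEMPLATES):
    """Return the label of the template with the fewest differing pixels."""
    return min(templates,
               key=lambda label: hamming_distance(glyph, templates[label]))

if __name__ == "__main__":
    noisy_t = ["111", "010", "010", "110", "010"]  # a "T" with one extra pixel
    print(recognise(noisy_t))
```

A comparison of this kind needs only integer operations and a small template table, which fits the abstract's remark that the bitmap approach suits devices with limited computational power.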
Abstract:
The number of digital images has been increasing exponentially in the last few years. People have problems managing their image collections and finding a specific image. An automatic image categorization system could help them to manage images and find specific images. In this thesis, an unsupervised visual object categorization system was implemented to categorize a set of unknown images. Because the system is unsupervised, it does not need labelled training images, which would have to be obtained manually. Therefore, the number of possible categories and images can be huge. The system implemented in the thesis extracts local features from the images. These local features are used to build a codebook. The local features and the codebook are then used to generate a feature vector for each image. Images are categorized based on the feature vectors. The system is able to categorize any given set of images based on the visual appearance of the images. Images that have similar image regions are grouped together in the same category. Thus, for example, images which contain cars are assigned to the same cluster. The unsupervised visual object categorization system can be used in many situations, e.g., in an Internet search engine. The system can categorize images for a user, and the user can then easily find a specific type of image.
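The pipeline described in this abstract (local features, codebook, per-image feature vector, grouping) can be sketched as follows. This is a toy illustration under stated assumptions, not the thesis implementation: the descriptors are made-up 2-D points standing in for real local feature descriptors, the codebook and the final grouping both use a plain k-means with farthest-point seeding, and all names and image labels are hypothetical.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(point, centres):
    """Index of the centre closest to point."""
    return min(range(len(centres)), key=lambda i: distance(point, centres[i]))

def kmeans(points, k, iterations=10):
    """Plain k-means with deterministic farthest-point seeding."""
    centres = [list(points[0])]
    while len(centres) < k:
        far = max(points, key=lambda p: min(distance(p, c) for c in centres))
        centres.append(list(far))
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centres)].append(p)
        for i, group in enumerate(groups):
            if group:  # keep the old centre if a cluster becomes empty
                centres[i] = [sum(c) / len(group) for c in zip(*group)]
    return centres

def histogram(descriptors, codebook):
    """Feature vector: normalised count of nearest codewords."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest(d, codebook)] += 1
    return [c / len(descriptors) for c in counts]

# Toy "local features" per image (assumed data, not real descriptors).
images = {
    "car_1": [(0, 0), (10, 10), (0, 1), (1, 0)],
    "car_2": [(1, 1), (0, 0), (1, 0), (9, 10)],
    "tree_1": [(10, 9), (9, 9), (10, 10), (0, 1)],
    "tree_2": [(9, 10), (10, 10), (1, 1), (10, 9)],
}

all_descriptors = [d for descs in images.values() for d in descs]
codebook = kmeans(all_descriptors, k=2)                # build the codebook
vectors = {name: histogram(descs, codebook) for name, descs in images.items()}
category_centres = kmeans(list(vectors.values()), k=2) # group the images
categories = {name: nearest(v, category_centres) for name, v in vectors.items()}

if __name__ == "__main__":
    print(vectors)
    print(categories)
```

No labels are used at any step; the category indices are arbitrary, and only the grouping of images carries meaning.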