941 results for Visual Speaker Recognition, Visual Speech Recognition, Cascading Appearance-Based Features


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Previous work has reported that it is not difficult to give people the illusion of ownership over an artificial body, providing a powerful tool for the investigation of the neural and cognitive mechanisms underlying body perception and self-consciousness. We present an experimental study that uses immersive virtual reality (IVR) focused on identifying the perceptual building blocks of this illusion. We systematically manipulated visuotactile and visual sensorimotor contingencies, visual perspective, and the appearance of the virtual body in order to assess their relative role and mutual interaction. Consistent results from subjective reports and physiological measures showed that a first-person perspective over a fake humanoid body is essential for eliciting a body ownership illusion. We found that the illusion of ownership can be generated when the virtual body has a realistic skin tone and spatially substitutes the real body seen from a first-person perspective. In this case there is no need for an additional contribution of congruent visuotactile or sensorimotor cues. Additionally, we found that the processing of incongruent perceptual cues can be modulated by the level of the illusion: when the illusion is strong, incongruent cues are not experienced as incorrect. Participants exposed to asynchronous visuotactile stimulation can experience the ownership illusion and perceive touch as originating from an object seen to contact the virtual body. Analogously, when the level of realism of the virtual body is not high enough and/or when there is no spatial overlap between the two bodies, then the contribution of congruent multisensory and/or sensorimotor cues is required for evoking the illusion. On the basis of these results and inspired by findings from neurophysiological recordings in the monkey, we propose a model that accounts for many of the results reported in the literature.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We report two unrelated patients with a multisystem disease involving liver, eye, immune system, connective tissue, and bone, caused by biallelic mutations in the neuroblastoma amplified sequence (NBAS) gene. Both presented as infants with recurrent episodes triggered by fever with vomiting, dehydration, and elevated transaminases. They had frequent infections, hypogammaglobulinemia, reduced natural killer cells, and the Pelger-Huët anomaly of their granulocytes. Their facial features were similar with a pointed chin and proptosis; loose skin and reduced subcutaneous fat gave them a progeroid appearance. Skeletal features included short stature, slender bones, epiphyseal dysplasia with multiple phalangeal pseudo-epiphyses, and small C1-C2 vertebrae causing cervical instability and myelopathy. Retinal dystrophy and optic atrophy were present in one patient. NBAS is a component of the syntaxin-18 complex and is involved in nonsense-mediated mRNA decay control. Putative loss-of-function mutations in NBAS are already known to cause disease in humans. A specific founder mutation has been associated with short stature, optic nerve atrophy and Pelger-Huët anomaly of granulocytes (SOPH) in the Siberian Yakut population. A more recent report associates NBAS mutations with recurrent acute liver failure in infancy in a group of patients of European descent. Our observations indicate that the phenotypic spectrum of NBAS deficiency is wider than previously known and includes skeletal, hepatic, metabolic, and immunologic aspects. Early recognition of the skeletal phenotype is important for preventive management of cervical instability. © 2015 Wiley Periodicals, Inc.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This dissertation considers the segmental durations of speech from the viewpoint of speech technology, especially speech synthesis. The idea is that better models of segmental durations lead to higher naturalness and better intelligibility. These features are the key factors for better usability and generality of synthesized speech technology. Even though the studies are based on a Finnish corpus, the approaches apply to all other languages as well. This is possibly because most of the studies included in this dissertation concern universal effects taking place at utterance boundaries. The methods developed and used here are also suitable for studies of other languages. This study is based on two corpora of news reading speech and sentences read aloud. One corpus is read aloud by a 39-year-old male, whilst the other consists of several speakers in various situations. The purpose of using two corpora is twofold: it allows a comparison of the corpora and gives a broader view on the matters of interest. The dissertation begins with an overview of the phonemes and the quantity system of the Finnish language. In particular, we cover the intrinsic durations of phonemes and phoneme categories, as well as the difference in duration between short and long phonemes. The phoneme categories are presented to help deal with the variability of speech segments. In this dissertation we cover the boundary-adjacent effects on segmental durations. In utterance-initial positions we find that there seems to be initial shortening in Finnish, but the result depends on the level of detail and on the individual phoneme. On the phoneme level we find that the shortening or lengthening only affects the very first phonemes at the beginning of an utterance. However, on the word level, the effect on average seems to shorten the whole first word. We establish the effect of final lengthening in Finnish. 
The effect in Finnish has long been an open question, and Finnish has been the last missing piece for final lengthening to be considered a universal phenomenon. Final lengthening is studied from various angles, and it is also shown that it is not a mere effect of prominence or of a speech corpus with high inter- and intra-speaker variation. The effect of final lengthening seems to extend from the final to the penultimate word. On the phoneme level it reaches a much wider area than the initial effect. We also present a normalization method suitable for corpus studies on segmental durations. The method uses an utterance-level normalization approach to capture the pattern of segmental durations within each utterance. This reduces the impact of various problematic variations within the corpora. The normalization is used in a study on final lengthening to show that the results on the effect are not caused by variation in the material. The dissertation presents an implementation of speech synthesis on a mobile platform and evaluates its performance. We find that the rule-based method of speech synthesis runs in real time, but the signal generation process slows the system down beyond real time. Future aspects of speech synthesis on limited platforms are discussed. The dissertation also considers ethical issues in the development of speech technology. The main focus is on the development of speech synthesis with high naturalness, but the problems and solutions are applicable to other speech technology approaches.
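
The abstract does not specify how the utterance-level normalization is computed. As one plausible reading, the sketch below z-scores segment durations within each utterance, so that overall tempo differences between utterances cancel out. The function name and the example durations are hypothetical and are not taken from the dissertation.

```python
from statistics import mean, pstdev


def normalize_utterance(durations_ms):
    """Return z-scores of segment durations computed within one utterance.

    Assumption: z-scoring stands in for the dissertation's normalization,
    whose exact definition is not given in the abstract.
    """
    mu = mean(durations_ms)
    sigma = pstdev(durations_ms)
    if sigma == 0:
        # All segments equally long: no within-utterance pattern to capture.
        return [0.0 for _ in durations_ms]
    return [(d - mu) / sigma for d in durations_ms]


# Hypothetical data: the same segment sequence spoken slowly and quickly.
slow = [120.0, 80.0, 200.0, 100.0]
fast = [60.0, 40.0, 100.0, 50.0]

slow_z = normalize_utterance(slow)
fast_z = normalize_utterance(fast)
```

After normalization the slow and fast renditions share the same duration pattern, which is the kind of speaker- and tempo-related variation such a method is meant to suppress.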

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Response Surface Methodology (RSM) was applied to evaluate the chromatic features and sensory acceptance of emulsions that combine Soy Protein (SP) and red Guava Juice (GJ). The parameters analyzed were: instrumental color based on the coordinates a* (redness), b* (yellowness), L* (lightness), C* (chromaticity), h* (hue angle), visual color, acceptance, and appearance. The results showed that GJ was responsible for the high measured values of redness, hue angle, chromaticity, acceptance, and visual color, whereas SP was the variable that increased the yellowness of the assays. Redness (R²adj = 74.86%, p < 0.01) and hue angle (R²adj = 80.96%, p < 0.01) were related to the independent variables by linear models, while the sensory data (color and acceptance) could not be modeled due to high variability. The models of yellowness, lightness, and chromaticity did not present lack of fit but had adjusted determination coefficients below 70%. In addition, the linear correlations between sensory and instrumental data were not significant (p > 0.05), and low Pearson coefficients were obtained. The results showed that RSM is a useful tool for developing soy-based emulsions and for modeling some chromatic features of guava-based emulsions.
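
For readers unfamiliar with the color coordinates, chroma and hue angle are derived from a* and b* by the standard CIELAB relations C* = sqrt(a*² + b*²) and h = atan2(b*, a*). The short sketch below computes both; the function name is an illustrative choice and the inputs are made-up values, not measurements from the study.

```python
import math


def chroma_and_hue(a_star, b_star):
    """Return (C*, hue angle in degrees) from CIELAB a* and b*."""
    chroma = math.hypot(a_star, b_star)
    # atan2 handles all four quadrants; the modulo maps the result to [0, 360).
    hue_deg = math.degrees(math.atan2(b_star, a_star)) % 360.0
    return chroma, hue_deg


# Hypothetical reading: equal redness and yellowness.
chroma, hue = chroma_and_hue(1.0, 1.0)
```

A sample with equal a* and b* sits at a hue angle of 45 degrees, halfway between the red and yellow axes.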

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Biometrics is an efficient technology with great possibilities in the area of security system development for official and commercial applications. Biometrics has recently become a significant part of any efficient person authentication solution. The advantage of using biometric traits is that they cannot be stolen, shared or forgotten. The thesis addresses one of the emerging topics in authentication systems, viz., the implementation of an Improved Biometric Authentication System using Multimodal Cue Integration, since operator-assisted identification is tedious, laborious and time-consuming. In order to derive the best performance for the authentication system, appropriate feature selection criteria have been developed. It has been observed that selecting too many features leads to deterioration in authentication performance and efficiency. In the work reported in this thesis, various judiciously chosen components of the biometric traits and their feature vectors are used for realizing the newly proposed Biometric Authentication System using Multimodal Cue Integration. The feature vectors generated from the noisy biometric traits are compared with the feature vectors available in the knowledge base, and the best-matching pattern is identified for the purpose of user authentication. In an attempt to improve the success rate of the feature-vector-based authentication system, the proposed system has been augmented with a user-dependent weighted fusion technique.
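
The abstract does not describe the fusion rule itself. A common form of user-dependent weighted fusion is a weighted sum of per-modality match scores, with a separate weight vector stored for each enrolled user. The sketch below illustrates that idea under this assumption; the modality names, weights, scores and threshold are all hypothetical.

```python
def fuse_scores(scores, weights):
    """Weighted average of per-modality match scores in [0, 1].

    `scores` and `weights` map modality name -> value. Only modalities
    present in `scores` are used, and their weights are renormalized.
    """
    total = sum(weights[m] for m in scores)
    if total <= 0:
        raise ValueError("weights for the observed modalities must sum to > 0")
    return sum(weights[m] * s for m, s in scores.items()) / total


def authenticate(scores, user_weights, threshold=0.6):
    """Accept the claim when the fused score reaches the threshold."""
    return fuse_scores(scores, user_weights) >= threshold


# Hypothetical users: one whose face is the more reliable trait,
# one whose voice is.
weights_face_user = {"face": 0.75, "voice": 0.25}
weights_voice_user = {"face": 0.25, "voice": 0.75}
observed = {"face": 0.9, "voice": 0.4}

fused_a = fuse_scores(observed, weights_face_user)
fused_b = fuse_scores(observed, weights_voice_user)
```

The same observed scores lead to different decisions for the two users, which is the point of making the weights user-dependent.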

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper presents an enhanced hypothesis verification strategy for 3D object recognition. A new learning methodology is presented which integrates the traditional dichotomic object-centred and appearance-based representations in computer vision giving improved hypothesis verification under iconic matching. The "appearance" of a 3D object is learnt using an eigenspace representation obtained as it is tracked through a scene. The feature representation implicitly models the background and the objects observed enabling the segmentation of the objects from the background. The method is shown to enhance model-based tracking, particularly in the presence of clutter and occlusion, and to provide a basis for identification. The unified approach is discussed in the context of the traffic surveillance domain. The approach is demonstrated on real-world image sequences and compared to previous (edge-based) iconic evaluation techniques.
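
As a rough illustration of an eigenspace appearance model, the sketch below learns a low-dimensional basis from flattened image patches with an SVD and scores a new patch by its reconstruction error, which could serve as a simple verification measure. This is a generic PCA sketch, not the paper's method; the function names and the synthetic data are assumptions.

```python
import numpy as np


def learn_eigenspace(images, k):
    """Learn a k-dimensional eigenspace from flattened images.

    images: array of shape (n_samples, n_pixels).
    Returns the mean image and a (k, n_pixels) orthonormal basis.
    """
    mean = images.mean(axis=0)
    _, _, vt = np.linalg.svd(images - mean, full_matrices=False)
    return mean, vt[:k]


def reconstruction_error(image, mean, basis):
    """Distance between an image and its projection onto the eigenspace."""
    centred = image - mean
    coeffs = basis @ centred
    return float(np.linalg.norm(centred - basis.T @ coeffs))


# Synthetic stand-in for tracked object views: 30 patches of 16 pixels
# that all lie in a two-dimensional appearance subspace.
rng = np.random.default_rng(0)
true_basis = rng.normal(size=(2, 16))
views = rng.normal(size=(30, 2)) @ true_basis + 5.0
mean, basis = learn_eigenspace(views, k=2)
```

A patch consistent with the learned appearance reconstructs almost exactly, while an unrelated patch leaves a large residual, so thresholding the error gives a basic accept or reject test for a hypothesis.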

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper presents a novel, fast and accurate appearance-based method for infrared face recognition. By introducing the Optimum-Path Forest classifier, our objective is to obtain good recognition rates while effectively reducing the computational effort. Feature extraction is carried out by PCA, and the results are compared to those of two other well-known supervised learning classifiers: Artificial Neural Networks and Support Vector Machines. The achieved performance asserts the promise of the proposed framework. ©2009 IEEE.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Graduate Program in Letters - FCLAS

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This text is part of the theoretical reflections from our postdoctoral research, carried out at the Laboratório de Antropologia Visual (Visual Anthropology Laboratory) of the Universidade Aberta de Portugal, which addressed intercultural aspects of the photo-ethnographic study of advertising and food consumption in Brazil and Portugal. Here we highlight the contributions of semiotics to the study of food advertising communications. The proposal is to understand models of semiotic analysis of advertising as a means of operationalizing thick description, from an ethnographic perspective, through the interdisciplinary interface with the production of meaning in advertising images in the field of food, presenting the analysis of an advertisement for Gallo olive oil as an example.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The present thesis can be divided into three main parts. In all parts, new polymer architectures were synthesized and characterized with respect to their special features. The first part emphasizes the advantage of a polystyrene-block-(hyperbranched polyglycerol) copolymer in comparison to an analogous polystyrene-block-(linear polyglycerol) copolymer. To this end, a synthetic route to prepare linear block copolymers was developed. Two strategies were examined: one was based on classic sequential anionic polymerization; the second was based on a “click chemistry” coupling reaction. In a following step, glycidol was hypergrafted from these block copolymers. The behavior of the synthesized amphiphilic block copolymers was studied in different solvents. Furthermore, the polarity of the solvent was changed to form the corresponding inverse micelles. DLS, SLS, SEC-MALLS-VISCO, AFM and cryo-TEM measurements were performed to obtain a visual image of the appearance of the aggregates. It was found that a linear-hyperbranched architecture is necessary if well-defined, monodisperse aggregates are required, e.g. for the preparation of ordered nanoarrays. Linear-linear block copolymers formed only polydisperse aggregates. Additionally, it was found that the size distribution could be improved dramatically by passing the aggregates through a SEC column with large pores. The SEC columns acted like a template in which the aggregates adopt a more stable conformation. In the second part, anionic polymerization was employed to synthesize silane-end-functionalized macromonomers with different molecular weights based on polybutadiene and polyisoprene. These were polymerized by a hydrosilylation reaction in bulk to obtain branched polymers, using Karstedt’s catalyst. 
Surprisingly, the addition of monofunctional silanes during the polymerization had only a minimal effect on the degree of polymerization. It was possible to introduce silanes without increasing the overall number of reaction steps by a very convenient “pseudo-copolymerization” method. All branched polymers were analyzed by SEC, SEC-MALLS, SEC-viscometry, 1H-NMR spectroscopy and DSC with respect to their branching ratio. The branching parameters for the branched polymers exhibited characteristics similar to those of hyperbranched polymers based on AB2 monomers. A detailed kinetic study showed that the polymerization occurred very rapidly in comparison to the hydrosilylation polymerization of classical AB2-type carbosilane monomers. The last part deals with ferrocenyl-functionalized polymers. On the one hand, ferrocenyl-functionalized polyglycerols (PG) were studied. Esterification of PGs of different molecular weights with ferrocenemonocarboxylic acid gave the ferrocenyl-functionalized polymers in high yields. On the other hand, three different block copolymers were prepared with different ratios of styrene to butadiene units (10:1, 4:1, 2:1). The double bonds of the 1,2-PB block were hydrosilylated using silanes bearing one (HSiMe2Fc) or two (HSiMeFc2) ferrocene units. High degrees of functionalization were obtained (up to 83%). In this manner, six different ferrocenyl-rich block copolymers with different fractions of ferrocene were prepared and analyzed, employing NMR spectroscopy, SEC, SEC/MALLS/viscometry, DLS and cyclic voltammetry. The redox properties of the studied polymers varied primarily with the nature of the silane unit attached. Additionally, the redox properties of the studied polymers in solution were influenced by the block length ratio of the block copolymers. 
Unexpectedly, with increasing block length of the ferrocenyl block, the fraction of active ferrocenes decreased. Nevertheless, in the case of thin monolayer films this behaviour was not observed. All polymers (PG and PS-b-PB based) exhibited good electrochemical properties in a wide range of solvents, which renders them very interesting for biosensor applications.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

PURPOSE: This study evaluated the long-term effect of pars plana vitrectomy (PPV) in children and adolescents with chronic uveitis on visual function, anatomical outcome, and the requirement of systemic treatment. Further, predictive preoperative factors associated with a beneficial visual outcome were assessed. METHODS: Retrospective review of 29 eyes of 23 consecutive paediatric and juvenile patients below 20 years of age with chronic uveitis who underwent a PPV for visually significant opacities in 25 eyes, vitreous haemorrhage in three eyes, and retinal detachment in one eye. The clinical diagnosis was chronic intermediate uveitis in 22 eyes and retinal vasculitis of different origin in seven eyes. RESULTS: LogMAR visual acuity improved from an average of 0.91 to 0.33 (P<0.001). Cystoid macular oedema (CME) was significantly reduced in eight of 10 eyes postoperatively (P=0.021). In the multiple regression analysis, a low preoperative logMAR visual acuity and the presence of a CME had a negative influence on the final logMAR visual acuity. Furthermore, the appearance of chronic uveitis relapses was significantly reduced from 15 eyes before to seven eyes after surgery (P=0.042). CONCLUSIONS: PPV has a beneficial effect on the course and the complications of chronic uveitis in paediatric and juvenile patients with respect to the anatomical and visual outcome. Preoperative logMAR visual acuity and clinically significant CME were the most accurate predictors for the functional outcome.
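
The logMAR scale used above is the base-10 logarithm of the minimum angle of resolution, so decimal acuity equals 10 raised to the negative logMAR value and lower values mean better vision. The helper functions below, whose names are illustrative, convert between the two scales and to a Snellen denominator.

```python
import math


def logmar_to_decimal(logmar):
    """Decimal visual acuity corresponding to a logMAR value."""
    return 10.0 ** (-logmar)


def decimal_to_logmar(decimal_acuity):
    """logMAR value corresponding to a decimal visual acuity."""
    if decimal_acuity <= 0:
        raise ValueError("decimal acuity must be positive")
    return -math.log10(decimal_acuity)


def snellen_denominator(logmar, numerator=20.0):
    """Denominator x of the Snellen fraction numerator/x."""
    return numerator * 10.0 ** logmar
```

Applied to the reported averages, a logMAR of 0.91 corresponds to about 20/163 and 0.33 to about 20/43, although converting an averaged logMAR value only approximates the underlying individual acuities.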

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The aim of automatic pathological voice detection systems is to serve as tools for medical specialists, enabling a more objective, less invasive and improved diagnosis of diseases. In this respect, the gold standard for such systems includes the use of an optimized representation of the spectral envelope for characterization, based either on cepstral coefficients from the mel-scaled Fourier spectral envelope (Mel-Frequency Cepstral Coefficients) or on an all-pole estimation (Linear Prediction Coding Cepstral Coefficients), and Gaussian Mixture Models (GMMs) for subsequent classification. However, recently proposed GMM-based classifiers and nuisance mitigation techniques, such as those employed in speaker recognition, have not been widely considered in pathology detection tasks. The present work tests whether such speaker recognition tools can improve the performance of pathology detection systems, specifically in the automatic detection of Obstructive Sleep Apnea. The testing procedure employs an Obstructive Sleep Apnea database in conjunction with GMM-based classifiers. The results show that improved performance might be obtained with this approach.
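
In GMM-based speaker recognition, a common decision rule is the average per-frame log-likelihood ratio between a target model and a background model. The sketch below shows that scoring step for diagonal-covariance GMMs, assuming feature frames such as cepstral vectors have already been extracted. The toy model parameters are hypothetical and the abstract does not state that this exact rule was used.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, weights, means, variances):
    """Log likelihood of x under a GMM, using log-sum-exp for stability."""
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def llr_score(frames, target_gmm, background_gmm):
    """Average per-frame log-likelihood ratio (target vs. background)."""
    total = sum(
        gmm_log_likelihood(f, *target_gmm) - gmm_log_likelihood(f, *background_gmm)
        for f in frames
    )
    return total / len(frames)


# Hypothetical two-dimensional models: (weights, means, variances).
target = ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]])
background = ([1.0], [[5.0, 5.0]], [[1.0, 1.0]])
```

A positive score favors the target class (for example, pathological speech) and a negative score favors the background class; the decision threshold would be tuned on held-out data.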

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this paper we propose an innovative method for the automatic detection and tracking of road traffic signs using an onboard stereo camera. It combines monocular and stereo analysis strategies to increase the reliability of the detections so that it can boost the performance of any traffic sign recognition scheme. Firstly, an adaptive color- and appearance-based detection is applied at the single-camera level to generate a set of traffic sign hypotheses. In turn, stereo information allows for sparse 3D reconstruction of potential traffic signs through a SURF-based matching strategy. Namely, the plane that best fits the cloud of 3D points traced back from feature matches is estimated using a RANSAC-based approach to improve robustness to outliers. Temporal consistency of the 3D information is ensured through a Kalman-based tracking stage. This also allows for the generation of a predicted 3D traffic sign model, which is in turn used to enhance the previously mentioned color-based detector through a feedback loop, thus improving detection accuracy. The proposed solution has been tested with real sequences under several illumination conditions and in both urban areas and highways, achieving very high detection rates in challenging environments, including rapid motion and significant perspective distortion.
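
The RANSAC plane-fitting step can be sketched as follows: repeatedly sample three 3D points, build the plane through them, and keep the plane supported by the most points within a distance threshold. This is a minimal generic version under assumed parameter values (threshold, iteration count, seed); the paper's actual settings and any final refinement step are not given in the abstract.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def plane_from_points(p, q, r):
    """Return (unit normal, offset d) with normal . x + d = 0, or None if collinear."""
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    n = [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    ]
    norm = math.sqrt(dot(n, n))
    if norm < 1e-12:
        return None
    n = [c / norm for c in n]
    return n, -dot(n, p)


def ransac_plane(points, threshold=0.05, iterations=200, seed=0):
    """Return the best (normal, offset) found and its inlier points."""
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue  # degenerate sample, try again
        n, d = plane
        inliers = [pt for pt in points if abs(dot(n, pt) + d) <= threshold]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers
```

In a pipeline like the one described, the input points would be the 3D positions triangulated from stereo feature matches, and the inlier set would define the candidate sign plane passed on to tracking.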

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Through the engravings published in the pages of newspapers that included graphic humour in their editions during the War of the Pacific (1879-1883), Chilean caricaturists deployed an aggressive visual discourse in a patriotic and warmongering key, presenting their readers with a critical and contemptuous image of Chile's adversaries. They stressed the adversaries' supposed lack of spirit and fighting courage at the mere presence of Chilean military forces, both at sea and on land. Thus ink and paper became another of the weapons that took part in Chile's conflict against Peru and Bolivia over possession of the rich nitrate territories of Tarapacá and Antofagasta. The images were interpreted according to the postulates of the Warburg School, especially those of Erwin Panofsky, who proposes three levels of study of the meaning of each work, namely the “pre-iconographic description”, then the “iconographic study” as such and, finally, the “iconological interpretation”.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

With the rise of smartphones, lifelogging devices (e.g. Google Glass) and the popularity of image sharing websites (e.g. Flickr), users are capturing and sharing every aspect of their life online, producing a wealth of visual content. Of these uploaded images, the majority are poorly annotated or exist in complete semantic isolation, making the process of building retrieval systems difficult, as one must first understand the meaning of an image in order to retrieve it. To alleviate this problem, many image sharing websites offer manual annotation tools which allow the user to “tag” their photos. However, these techniques are laborious and as a result have been poorly adopted; Sigurbjörnsson and van Zwol (2008) showed that 64% of images uploaded to Flickr are annotated with < 4 tags. Due to this, an entire body of research has focused on the automatic annotation of images (Hanbury, 2008; Smeulders et al., 2000; Zhang et al., 2012a), where one attempts to bridge the semantic gap between an image’s appearance and its meaning, e.g. the objects present. Despite two decades of research, the semantic gap still largely exists, and as a result automatic annotation models often offer unsatisfactory performance for industrial implementation. Further, these techniques can only annotate what they see, thus ignoring the “bigger picture” surrounding an image (e.g. its location, the event, the people present, etc.). Much work has therefore focused on building photo tag recommendation (PTR) methods which aid the user in the annotation process by suggesting tags related to those already present. These works have mainly focused on computing relationships between tags based on historical images, e.g. that NY and timessquare co-exist in many images and are therefore highly correlated. However, tags are inherently noisy, sparse and ill-defined, often resulting in poor PTR accuracy, e.g. does NY refer to New York or New Year? 
This thesis proposes the exploitation of an image’s context which, unlike textual evidence, is always present, in order to alleviate this ambiguity in the tag recommendation process. Specifically, we exploit the “what, who, where, when and how” of the image capture process in order to complement textual evidence in various photo tag recommendation and retrieval scenarios. In part II, we combine text, content-based (e.g. number of faces present) and contextual (e.g. day of the week taken) signals for tag recommendation purposes, achieving up to a 75% improvement in precision@5 in comparison to a text-only TF-IDF baseline. We then consider external knowledge sources (i.e. Wikipedia and Twitter) as an alternative to (slower moving) Flickr on which to build recommendation models, showing that similar accuracy could be achieved on these faster moving, yet entirely textual, datasets. In part II, we also highlight the merits of diversifying tag recommendation lists before discussing at length various problems with existing automatic image annotation and photo tag recommendation evaluation collections. In part III, we propose three new image retrieval scenarios, namely “visual event summarisation”, “image popularity prediction” and “lifelog summarisation”. In the first scenario, we attempt to produce a ranking of relevant and diverse images for various news events by (i) removing irrelevant images such as memes and visual duplicates, before (ii) semantically clustering images based on the tweets in which they were originally posted. Using this approach, we were able to achieve over 50% precision for images in the top 5 ranks. In the second retrieval scenario, we show that by combining contextual and content-based features from images, we are able to predict whether an image will become “popular” (or not) with 74% accuracy, using an SVM classifier. 
Finally, in chapter 9 we employ blur detection and perceptual-hash clustering in order to remove noisy images from lifelogs, before combining visual and geo-temporal signals in order to capture a user’s “key moments” within their day. We believe that the results of this thesis represent an important step towards building effective image retrieval models when sufficient textual content is lacking (i.e. a cold start).
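
The text-only baseline idea described above, recommending tags that historically co-occur with those already present, and the precision@k measure used to evaluate it can be sketched as follows. This is a simplified co-occurrence counter, not the thesis's TF-IDF baseline or its context-aware models; the photo collection is made up for illustration.

```python
from collections import Counter
from itertools import permutations


def build_cooccurrence(tagged_photos):
    """Count, for each tag, how often every other tag appears on the same photo."""
    counts = {}
    for tags in tagged_photos:
        for a, b in permutations(set(tags), 2):
            counts.setdefault(a, Counter())[b] += 1
    return counts


def recommend(existing_tags, cooccurrence, k=5):
    """Suggest up to k new tags ranked by summed co-occurrence with existing tags."""
    scores = Counter()
    for tag in existing_tags:
        for other, n in cooccurrence.get(tag, {}).items():
            if other not in existing_tags:
                scores[other] += n
    # Sort by descending score, then alphabetically, so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:k]]


def precision_at_k(recommended, relevant, k=5):
    """Fraction of the top-k recommendations that are relevant."""
    return sum(1 for tag in recommended[:k] if tag in relevant) / k


# Hypothetical historical photos and their tags.
photos = [
    ["ny", "timessquare", "night"],
    ["ny", "timessquare"],
    ["ny", "newyear", "fireworks"],
    ["paris", "night"],
]
cooc = build_cooccurrence(photos)
suggestions = recommend(["ny"], cooc, k=2)
```

In this toy collection the tag "ny" co-occurs with both "timessquare" and "newyear", which reproduces the ambiguity the thesis aims to resolve with contextual signals such as capture time and location.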