941 results for Visual Speaker Recognition, Visual Speech Recognition, Cascading Appearance-Based Features
Abstract:
The media tend to represent female athletes as women first and athletes second (Koivula, 1999). The present study investigated whether this same trend was present for female sportscasters, using a self-presentational framework. Self-presentation is the process by which people try to control how others see them (Leary, 1995). One factor that may influence the type of image they try to project is the roles they hold in society, including gender roles. The gender roles for a man include dominance, assertiveness, and masculinity, while the gender roles for a woman include nurturance, femininity, and attractiveness (Deaux & Major, 1987). By contrast, sports broadcasters are expected to be knowledgeable, assertive, and competent. Research suggests that female sports broadcasters are seen as less competent and less persuasive than male sports broadcasters (Mitrook & Dorr, 2001; Ordman & Zillmann, 1994; Toro, 2005). One reason for this difference may be that the gender roles for a man are much more similar to those of a sportscaster than are those for a woman. Thus, there may be a conflict between the two roles for women. The present study investigated whether the gender and perceived attractiveness of sportscasters influenced the audience's perceptions of the level of competence that a sportscaster demonstrates. Two hundred and four male (n = 75) and female (n = 129) undergraduate students were recruited from a southern Ontario university to participate in the study. The average age of the male participants was 21.23 years (SD = 1.60), and the average age of the female participants was 20.67 years (SD = 1.31). The age range for all participants was 19 to 30 years (M = 20.87 years, SD = 1.45). After providing informed consent, participants randomly received one of four possible questionnaire packages. The participants answered the demographic questionnaire and then viewed the picture and read the script of a sports newscast.
Next, based on the picture and script, the participants answered the competence questionnaire, assessing the general, sport-specific, and overall competence of the sportscaster. Once participants had finished, they returned the package to the researcher and were thanked for their time. Data were analyzed using an ANOVA to determine whether general sport competence differed with respect to the gender and attractiveness of the sportscaster. Overall, the ANOVA was non-significant (p > .05), indicating no differences on the dependent variable based on gender (F(3, 194) = .631, p = .426), attractiveness (F(3, 194) = .070, p = .791), or the interaction of the two (F(3, 194) = .043, p = .836). Although none of the study hypotheses were supported, the study provided some insight into the perceived competence of female sportscasters. It is possible that female sportscasters are now seen as competent in the area of sports. Sample characteristics could also have influenced these results; the participants in the current study were primarily physical education and kinesiology students, who had experience participating in physical activity with both men and women. Future research should investigate this issue further by using a video sportscast. It is possible that delivery characteristics such as voice quality or eye contact may also affect perceptions of sportscasters.
Abstract:
In this paper, we describe how multidimensional wavelet neural networks based on Polynomial Powers of Sigmoid (PPS) can be constructed, trained, and applied in image processing tasks. In this context, a novel and uniform framework for face verification is presented. The framework is based on a family of PPS wavelets, generated from linear combinations of sigmoid functions, and can be considered appearance-based in that features are extracted from the face image. The feature vectors are then subjected to PPS-wavelet subspace projection. The design of PPS-wavelet neural networks, which is seldom reported in the literature, is also discussed. The Stirling University face database was used to generate the results. Our method achieved a 92% correct detection rate and a 5% false detection rate on the database.
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
A set of algorithms, which allows a computer to determine the answers of simulated patients during pure tone and speech audiometry, is presented. Based on these algorithms, a computer program for training in audiometry was written and found to be useful for teaching purposes.
Abstract:
In this paper we present an innovative technique to tackle the problem of automatic road sign detection and tracking using an on-board stereo camera. It involves a continuous 3D analysis of the road sign during the whole tracking process. First, a color- and appearance-based model is applied to generate road sign candidates in both stereo images. A sparse disparity map between the left and right images is then created for each candidate by using contour-based matching at far range and SURF-based matching at short range. Once the map has been computed, the correspondences are back-projected to generate a cloud of 3D points, and the best-fit plane is computed through RANSAC, ensuring robustness to outliers. Temporal consistency is enforced by means of a Kalman filter, which exploits the intrinsic smoothness of the 3D camera motion in traffic environments. Additionally, the estimated plane makes it possible to correct deformations due to perspective, thus easing subsequent sign classification.
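The RANSAC plane-fitting step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the distance threshold, the iteration count, and the fixed random seed are all assumptions chosen for illustration, and the input is assumed to be a list of back-projected 3D points given as `(x, y, z)` tuples.

```python
import random

def plane_from_points(p, q, r):
    """Return (unit normal, d) for the plane n.x + d = 0 through three points,
    or None when the points are (nearly) collinear."""
    ux, uy, uz = q[0] - p[0], q[1] - p[1], q[2] - p[2]
    vx, vy, vz = r[0] - p[0], r[1] - p[1], r[2] - p[2]
    # The cross product of the two edge vectors gives the plane normal.
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    norm = (nx * nx + ny * ny + nz * nz) ** 0.5
    if norm < 1e-12:
        return None
    n = (nx / norm, ny / norm, nz / norm)
    d = -(n[0] * p[0] + n[1] * p[1] + n[2] * p[2])
    return n, d

def ransac_plane(points, threshold=0.05, iterations=200, seed=0):
    """Fit a plane to 3D points while ignoring outliers (hypothetical parameters).

    Repeatedly fits a plane to three random points and keeps the candidate
    supported by the largest number of points within `threshold` of the plane."""
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue  # degenerate sample, try again
        n, d = plane
        inliers = [pt for pt in points
                   if abs(n[0] * pt[0] + n[1] * pt[1] + n[2] * pt[2] + d) <= threshold]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers
```

In practice the winning inlier set would usually be refit by least squares, and the resulting plane parameters passed on to the temporal filter; both steps are omitted here to keep the sketch short.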
Abstract:
Evidence-based practice has become the dominant paradigm in the delivery of rehabilitation programmes. However, occupational therapists in Australia and New Zealand have been slow in making the transition to become evidence-based practitioners. Collaboration between the university/tertiary institute and the clinical setting is one way that clinicians can be assisted with incorporating research into their practice. Two case examples are presented outlining how collaborative practice can result in improved outcomes for all concerned.
Abstract:
The subject of this thesis was the acquisition of difficult non-native vowels by speakers of two different languages. In order to study the subject, a group of Finnish speakers and another group of American English speakers were recruited and they underwent a short listen-and-repeat training that included as stimuli the semisynthetically created pseudowords /ty:ti/ and /tʉ:ti/. The aim was to study the effect of the training method on the subjects as well as the possible influence of the speakers’ native language on the process of acquisition. The selection of the target vowels /y/ and /ʉ/ was made according to the Speech Learning Model and Perceptual Assimilation Model, both of which predict that second language speech sounds that share similar features with sounds of a person’s native language are most difficult for the person to learn. The vowel /ʉ/ is similar to Finnish vowels as well as to vowels of English, whereas /y/ exists in Finnish but not in English, although it is similar to other English vowels. Therefore, it can be hypothesized that /ʉ/ is a difficult vowel for both groups to learn and /y/ is difficult for English speakers. The effect of training was tested with a pretest-training-posttest protocol in which the stimuli were played alternately and the subjects’ task was to repeat the heard stimuli. The training method was thought to improve the production of non-native sounds by engaging different feedback mechanisms, such as auditory and somatosensory. These, according to Template Theory, modify the production of speech by altering the motor commands from the internal speech system or the feedforward signal which translates the motoric commands into articulatory movements. The subjects’ productions during the test phases were recorded and an acoustic analysis was performed in which the formant values of the target vowels were extracted. 
Statistical analyses showed a significant difference between groups in the first formant, signaling a possible effect of native motor commands. Furthermore, a significant difference between groups was observed in the standard deviation of the formants in the production of /y/, showing the uniformity of native production. The training had no observable effect, possibly due to the brevity of the training protocol.
Abstract:
To recognize a previously seen object, the visual system must overcome the variability in the object's appearance caused by factors such as illumination and pose. Developments in computer vision suggest that it may be possible to counter the influence of these factors, by learning to interpolate between stored views of the target object, taken under representative combinations of viewing conditions. Daily life situations, however, typically require categorization, rather than recognition, of objects. Due to the open-ended character both of natural kinds and of artificial categories, categorization cannot rely on interpolation between stored examples. Nonetheless, knowledge of several representative members, or prototypes, of each of the categories of interest can still provide the necessary computational substrate for the categorization of new instances. The resulting representational scheme based on similarities to prototypes appears to be computationally viable, and is readily mapped onto the mechanisms of biological vision revealed by recent psychophysical and physiological studies.
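As a rough illustration of categorization by similarity to prototypes, the sketch below represents a new instance by its similarities to a few stored prototypes per category and assigns it to the most similar category. The Gaussian similarity measure, the rule of taking the best-matching prototype per category, and all names are assumptions made for this example; the abstract does not specify them.

```python
import math

def similarity(a, b, scale=1.0):
    """Gaussian similarity in (0, 1]; 1.0 means identical feature vectors.
    The choice of a Gaussian and of `scale` is an assumption for illustration."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-squared_distance / (2 * scale ** 2))

def prototype_signature(x, prototypes):
    """Describe x by its best similarity to the prototypes of each category."""
    return {label: max(similarity(x, p) for p in members)
            for label, members in prototypes.items()}

def categorize(x, prototypes):
    """Assign x to the category whose prototypes it resembles most."""
    signature = prototype_signature(x, prototypes)
    return max(signature, key=signature.get)

# Hypothetical two-dimensional feature vectors for two categories.
prototypes = {
    "cat": [(0.0, 0.0), (1.0, 0.0)],
    "dog": [(5.0, 5.0)],
}
label = categorize((0.5, 0.2), prototypes)
```

The point of the design is that a new instance never has to match a stored view exactly: the vector of similarities itself serves as the representation.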
Abstract:
During grasping and intelligent robotic manipulation tasks, the camera position relative to the scene changes dramatically, because the camera is mounted on the robot effector and the robot moves to adapt its path and grasp objects correctly. For this reason, in this type of environment, a visual recognition system must be implemented to recognize objects in the scene and obtain their positions automatically and autonomously. Furthermore, in industrial environments, all objects that are manipulated by robots are made of the same material and cannot be differentiated by features such as texture or color. In this work, first, 3D recognition descriptors are studied and analyzed for application in these environments. Second, a visual recognition system built on a specific distributed client-server architecture is proposed for recognizing industrial objects that lack these appearance features. Our system has been implemented to overcome recognition problems that arise when objects can be recognized only by geometric shape and the simplicity of the shapes could create ambiguity. Finally, real tests are performed and illustrated to verify the satisfactory performance of the proposed system.
Abstract:
Visual recognition is a fundamental research topic in computer vision. This dissertation explores datasets, features, learning, and models used for visual recognition. In order to train visual models and evaluate different recognition algorithms, this dissertation develops an approach to collect object image datasets on web pages using an analysis of text around the image and of image appearance. This method exploits established online knowledge resources (Wikipedia pages for text; Flickr and Caltech data sets for images). The resources provide rich text and object appearance information. This dissertation describes results on two datasets. The first is Berg’s collection of 10 animal categories; on this dataset, we significantly outperform previous approaches. On an additional set of 5 categories, experimental results show the effectiveness of the method. Images are represented as features for visual recognition. This dissertation introduces a text-based image feature and demonstrates that it consistently improves performance on hard object classification problems. The feature is built using an auxiliary dataset of images annotated with tags, downloaded from the Internet. Image tags are noisy. The method obtains the text features of an unannotated image from the tags of its k-nearest neighbors in this auxiliary collection. A visual classifier presented with an object viewed under novel circumstances (say, a new viewing direction) must rely on its visual examples. This text feature may not change, because the auxiliary dataset likely contains a similar picture. While the tags associated with images are noisy, they are more stable when appearance changes. The performance of this feature is tested using PASCAL VOC 2006 and 2007 datasets. This feature performs well; it consistently improves the performance of visual object classifiers, and is particularly effective when the training dataset is small. 
As more training data is collected, computational cost becomes a bottleneck, especially when training sophisticated classifiers such as kernelized SVMs. This dissertation proposes a fast training algorithm called Stochastic Intersection Kernel Machine (SIKMA). The proposed training method should be useful for many vision problems, as it can produce a kernel classifier that is more accurate than a linear classifier and can be trained on tens of thousands of examples in two minutes. It processes training examples one by one in sequence, so memory cost is no longer the bottleneck when processing large-scale datasets. This dissertation applies this approach to train classifiers of Flickr groups with many group training examples. The resulting Flickr group prediction scores can be used to measure the similarity between two images. Experimental results on the Corel dataset and a PASCAL VOC dataset show that the learned Flickr features perform better on image matching, retrieval, and classification than conventional visual features. Visual models are usually trained to best separate positive and negative training examples. However, when recognizing a large number of object categories, there may not be enough training examples for most objects, due to the intrinsic long-tailed distribution of objects in the real world. This dissertation proposes an approach that uses comparative object similarity. The key insight is that, given a set of object categories which are similar and a set of categories which are dissimilar, a good object model should respond more strongly to examples from similar categories than to examples from dissimilar categories. This dissertation develops a regularized kernel machine algorithm that uses this category-dependent similarity regularization. Experiments on hundreds of categories show that the method yields significant improvements for categories with few or even no positive examples.
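To make the idea of an intersection-kernel classifier trained one example at a time more concrete, here is a minimal sketch. The histogram intersection kernel is the standard definition (the sum of element-wise minima). The training loop, however, is a plain kernel perceptron used as a stand-in; it is not SIKMA, whose update rule is not given in the abstract. All function names and the toy histograms are hypothetical.

```python
def intersection_kernel(h1, h2):
    """Histogram intersection kernel: the sum of element-wise minima."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))

def decision_score(support, hist):
    """Signed kernel score of `hist` against the stored (label, histogram) pairs."""
    return sum(label * intersection_kernel(sv, hist) for label, sv in support)

def train_kernel_perceptron(examples, epochs=5):
    """Stand-in online learner (NOT SIKMA): visits examples one by one and
    stores an example only when the current classifier gets it wrong."""
    support = []
    for _ in range(epochs):
        for hist, label in examples:
            if label * decision_score(support, hist) <= 0:
                support.append((label, hist))
    return support

def predict(support, hist):
    """Return +1 or -1 for a histogram."""
    return 1 if decision_score(support, hist) > 0 else -1

# Hypothetical normalized 3-bin histograms with labels +1 / -1.
examples = [
    ([0.8, 0.1, 0.1], 1),
    ([0.7, 0.2, 0.1], 1),
    ([0.1, 0.1, 0.8], -1),
    ([0.2, 0.1, 0.7], -1),
]
support = train_kernel_perceptron(examples)
```

Because each update touches a single example, the whole training set never has to be held in a kernel matrix, which is the property the abstract highlights for large-scale datasets.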
Abstract:
Previous work examining context effects in children has been limited to semantic context. The current research examined the effects of grammatical priming of word-naming in fourth-grade children. In Experiment 1, children named both inflected and uninflected noun and verb target words faster when they were preceded by grammatically constraining primes than when they were preceded by neutral primes. Experiment 1 used a long stimulus onset asynchrony (SOA) interval of 750 msec. Experiment 2 replicated the grammatical priming effect at two SOA intervals (400 msec and 700 msec), suggesting that the grammatical priming effect does not reflect the operation of any gross strategic effects directly attributable to the long SOA interval employed in Experiment 1. Grammatical context appears to facilitate target word naming by constraining target word class. Further work is required to elucidate the loci of this effect.