984 results for Audiovisual speech recognition
Abstract:
In this paper, we present the Melodic Analysis of Speech method (MAS) that enables us to carry out complete and objective descriptions of a language's intonation, from a phonetic (melodic) point of view as well as from a phonological point of view. It is based on the acoustic-perceptive method by Cantero (2002), which has already been used in research on prosody in different languages. In this case, we present the results of its application in Spanish and Catalan.
Abstract:
The research group Gre-TICE (Grupo de investigación en tecnologías de la Información y la Comunicación en Educación) has, as one of its lines of research, the acquisition of multimedia language and its use as a form of expression. During the academic year 2002-2003, following previous work on the use of ICT in education, the group commenced the project “The acquisition of visual and sound codes and the processes related to the visual media”. The intention of this project is to study how formal and non-formal educational contexts can help young adults and children to acquire visual and sound codes, to become ‘critical consumers’ of the media and to use the tools in a creative way. To achieve this objective, the project team has developed a partner group which includes professionals from different European regions, including teachers and managers from across the age spectrum, government institutions and cultural organisations. Whilst the project will draw on qualitative analysis of previous projects and research, it will seek to develop ‘Good Practice’ guides and other resources and materials to be disseminated to project partners (and others) to build innovative actions throughout the European region.
Abstract:
This dissertation considers the segmental durations of speech from the viewpoint of speech technology, especially speech synthesis. The idea is that better models of segmental durations lead to higher naturalness and better intelligibility. These features are the key factors for better usability and generality of synthesized speech technology. Even though the studies are based on a Finnish corpus, the approaches apply to all other languages as well. This is possibly due to the fact that most of the studies included in this dissertation are about universal effects taking place at utterance boundaries. The methods developed and used here are also suitable for studies of other languages. This study is based on two corpora of news-reading speech and sentences read aloud. One corpus is read aloud by a 39-year-old male, whilst the other consists of several speakers in various situations. The purpose of using two corpora is twofold: it allows a comparison of the corpora and gives a broader view of the matters of interest. The dissertation begins with an overview of the phonemes and the quantity system of the Finnish language. In particular, we cover the intrinsic durations of phonemes and phoneme categories, as well as the difference in duration between short and long phonemes. The phoneme categories are presented to help manage the variability of speech segments. In this dissertation we cover the boundary-adjacent effects on segmental durations. In utterance-initial positions we find that there seems to be initial shortening in Finnish, but the result depends on the level of detail and on the individual phoneme. On the phoneme level we find that the shortening or lengthening only affects the very first phonemes at the beginning of an utterance. However, on the word level, the effect on average seems to shorten the whole first word. We establish the effect of final lengthening in Finnish.
The effect in Finnish has long been an open question, and Finnish has been the last missing piece for final lengthening to be considered a universal phenomenon. Final lengthening is studied from various angles, and it is also shown that it is not a mere effect of prominence or an effect of a speech corpus with high inter- and intra-speaker variation. The effect of final lengthening seems to extend from the final to the penultimate word. On the phoneme level it reaches a much wider area than the initial effect. We also present a normalization method suitable for corpus studies on segmental durations. The method uses an utterance-level normalization approach to capture the pattern of segmental durations within each utterance. This prevents various problematic variations within the corpora from affecting the results. The normalization is used in a study on final lengthening to show that the results on the effect are not caused by variation in the material. The dissertation presents an implementation of speech synthesis on a mobile platform and evaluates its performance. We find that the rule-based method of speech synthesis is a real-time software solution, but the signal generation process slows the system down beyond real time. Future aspects of speech synthesis on limited platforms are discussed. The dissertation also considers ethical issues in the development of speech technology. The main focus is on the development of speech synthesis with high naturalness, but the problems and solutions are applicable to any other speech technology approach.
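The abstract above does not give the formula behind its utterance-level normalization. As a minimal sketch only, assuming the normalization is a z-score of each segment duration relative to the mean and standard deviation of its own utterance (an assumption, not the dissertation's stated method), the idea could look like this in Python. The function name and the duration values are hypothetical.

```python
from statistics import mean, pstdev

def normalize_utterance(durations_ms):
    """Return z-scores of segment durations relative to their own utterance.

    Assumption: z-scoring stands in for the unspecified normalization.
    """
    mu = mean(durations_ms)
    sigma = pstdev(durations_ms)
    if sigma == 0:
        # All segments equally long: there is no within-utterance pattern.
        return [0.0 for _ in durations_ms]
    return [(d - mu) / sigma for d in durations_ms]

# Hypothetical data: the same utterance spoken at two different tempos.
slow = [80, 120, 100, 160]
fast = [40, 60, 50, 80]

slow_z = normalize_utterance(slow)
fast_z = normalize_utterance(fast)
```

Under this assumption, the two tempos give the same normalized pattern, so a relatively long final segment stays visible regardless of overall speaking rate.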
Abstract:
In modern warfare, robotic warfare is an actively developing trend. One of the critical elements of robotic warfare systems is an automatic target recognition system, which recognizes objects based on data received from sensors. This work considers aspects of an optical realization of such a system by means of NIR target scanning at fixed wavelengths. An algorithm was designed, an experimental setup was built, and samples of various modern gear and apparel materials were tested. For pattern testing, camouflage samples from armies actively engaged in armed conflict were chosen. Tests were performed both in a clear atmosphere and in an artificial, extremely humid and hot atmosphere to simulate field conditions.
Abstract:
Speaker diarization is the process of sorting speech segments according to the speaker. Diarization helps to search and retrieve what a certain speaker uttered in a meeting. Applications of diarization systems extend to domains other than meetings, for example lectures, telephone, television, and radio. In addition, diarization enhances the performance of several speech technologies such as speaker recognition, automatic transcription, and speaker tracking. Methodologies previously used in developing diarization systems are discussed. Prior results and techniques are studied and compared. Methods such as Hidden Markov Models and Gaussian Mixture Models, which are used in speaker recognition and other speech technologies, are also used in speaker diarization. The objective of this thesis is to develop a speaker diarization system for the meeting domain. The experimental part of this work indicates that the zero-crossing rate can be used effectively to break the audio stream into segments, and that adaptive Gaussian models fit short audio segments adequately. Results show that 35 Gaussian models and an average segment length of one second are the optimum values for building a diarization system for the tested data. Segments uttered by the same speaker are united by bottom-up clustering, using a new approach of categorizing the mixture weights.
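To illustrate how a zero-crossing rate can break an audio stream into segments, here is a minimal Python sketch. It is not the thesis's segmentation procedure: the frame length, the jump threshold, the helper names and the synthetic two-tone signal are all assumptions chosen for illustration.

```python
import math

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def frame_zcr(samples, frame_len):
    """Zero-crossing rate of each non-overlapping frame of frame_len samples."""
    return [zero_crossing_rate(samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def change_points(zcrs, jump):
    """Indices of frames whose rate differs from the previous frame by more than jump."""
    return [i for i in range(1, len(zcrs)) if abs(zcrs[i] - zcrs[i - 1]) > jump]

# Synthetic stand-in for two sound sources: a 100 Hz tone, then a 1000 Hz tone.
rate = 8000
low = [math.sin(2 * math.pi * 100 * n / rate) for n in range(rate // 2)]
high = [math.sin(2 * math.pi * 1000 * n / rate) for n in range(rate // 2)]

zcrs = frame_zcr(low + high, frame_len=400)   # 20 frames of 50 ms each
boundaries = change_points(zcrs, jump=0.1)    # hypothetical threshold
```

Here `boundaries` marks the frame where the rate jumps, which is where the second tone starts. Real speech would need smoothing and a tuned threshold, and the resulting segments would still have to be modelled and clustered by speaker.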
Abstract:
This study analyzes the production and reception of the educational video Lição de Anatomia in order to understand what meanings are produced by students of the Medical Psychology course. The video was analyzed, its producers were interviewed, and an experimental screening was held, followed by a discussion group with medical students. The study of the production showed that the video was addressed mainly to medical students. It was intended to provoke discussion and to draw attention to medical training as a source of trauma and anguish. The study of the video's reception showed that the viewers were aware at all times of the manipulation of the video's aesthetic resources, did not find the narrative credible, and adopted a negotiated ideological position, although they understood and discussed some of the themes proposed by the video.
Abstract:
The flow of information within modern information society has increased rapidly over the last decade. The major part of this information flow relies on the individual’s ability to handle text or speech input. For the majority of us this presents no problems, but some individuals would benefit from other means of conveying information, e.g. signed information flow. During the last decades, new results from various disciplines have all pointed towards a common background and processing for sign and speech, and this was one of the key issues that I wanted to investigate further in this thesis. The basis of this thesis is firmly within speech research, which is why I wanted to design test batteries for signers analogous to widely used speech perception tests, to find out whether the results for signers would be the same as in speakers’ perception tests. One of the key findings within biology, and more precisely in its effects on speech and communication research, is the mirror neuron system. That finding has enabled us to form new theories about the evolution of communication, and it all seems to converge on the hypothesis that all communication has a common core within humans. In this thesis speech and sign are discussed as equal and analogous counterparts of communication, and all research methods used in speech are modified for sign. Both speech and sign are thus investigated using similar test batteries. Furthermore, both production and perception of speech and sign are studied separately. An additional framework for studying production is given by gesture research using cry sounds. Results of cry sound research are then compared to results from children acquiring sign language. These results show that individuality manifests itself from very early on in human development. Articulation in adults, both in speech and sign, is studied from two perspectives: normal production and re-learning production when the apparatus has been changed.
Normal production is studied both in speech and sign, and the effects of changed articulation are studied with regard to speech. Both of these studies are done using carrier sentences. Furthermore, sign production is studied by giving the informants the opportunity for spontaneous speech. The production data from the signing informants is also used as the basis for input in the sign synthesis stimuli used in the sign perception test battery. Speech and sign perception were studied using the informants’ answers to forced-choice questions in identification and discrimination tasks. These answers were then compared across language modalities. Three different informant groups participated in the sign perception tests: native signers, sign language interpreters and Finnish adults with no knowledge of any signed language. This gave a chance to investigate which of the characteristics found in the results were due to the language per se and which were due to the changes in modality itself. As the analogous test batteries yielded similar results over different informant groups, some common threads of results could be observed. Starting from very early on in acquiring speech and sign, the results were highly individual. However, the results were the same within one individual when the same test was repeated. This individuality of results followed the same patterns across different language modalities and, on some occasions, across language groups. As both modalities yield similar answers to analogous study questions, this has led us to provide methods for basic input for sign language applications, i.e. signing avatars. This has also given us answers to questions on the precision of the animation and intelligibility for the users: what are the parameters that govern the intelligibility of synthesised speech or sign, and how precise must the animation or synthetic speech be in order for it to be intelligible?
The results also give additional support to the well-known fact that intelligibility is not the same as naturalness. In some cases, as shown within the sign perception test battery design, naturalness decreases intelligibility. This also has to be taken into consideration when designing applications. All in all, results from each of the test batteries, be they for signers or speakers, yield strikingly similar patterns, which would indicate yet further support for the common core for all human communication. Thus, we can modify and deepen the phonetic framework models for human communication based on the knowledge obtained from the results of the test batteries within this thesis.
Abstract:
OBJECTIVE: To evaluate the relation between medical research output and the participation of prominent plastic surgeons in congresses. METHODS: We reviewed the scientific programs of the last 3 Brazilian Congresses of Surgery and selected 21 Brazilian plastic surgeons invited to serve as panelists or speakers in roundtable sessions (Group 1). We randomly selected and paired other members (associates) of the Brazilian Society of Plastic Surgery with no participation in the congresses as speakers (Group 2). We searched for articles published in journals indexed in Medline, Lilacs and SciELO by all selected doctors, over their entire academic career and over the last 5 years, from March 2007 until March 2012. We assessed research activity by simply counting the number of publications in indexed journals for each professional. The numbers of publications of the two groups were compared. RESULTS: Articles produced throughout career: Group 1, 639 articles (average of 30.42 articles each); Group 2, 79 articles (mean of 3.95 articles each). Difference between means: p < 0.001. CONCLUSION: The results demonstrate that the Brazilian Society of Plastic Surgery seeks professionals with a greater number of publications and with publications in journals of higher impact. This approach encourages new members to pursue higher qualification and gives congress attendees assurance that they can rely on the existence of a technical criterion in the choice of speakers.
Abstract:
During a possible loss-of-coolant accident in BWRs, a large amount of steam will be released from the reactor pressure vessel to the suppression pool. The steam will condense in the suppression pool, causing dynamic and structural loads on the pool. The formation and break-up of bubbles can be measured by visual observation using a suitable pattern recognition algorithm. The aim of this study was to improve, using MATLAB, the preliminary pattern recognition algorithm developed by Vesa Tanskanen in his doctoral dissertation. Video material from the PPOOLEX test facility, recorded during thermal stratification and mixing experiments, was used as a reference in the development of the algorithm. The developed algorithm consists of two parts: the pattern recognition of the bubbles and the analysis of the recognized bubble images. The bubble recognition works well, but some errors appear due to the complex structure of the pool. The results of the image analysis were reasonable. The volume and the surface area of the bubbles were not evaluated. Chugging frequencies calculated using FFT agreed well with the oscillation frequencies measured in the experiments. The pattern recognition algorithm works in the conditions it is designed for. If the measurement configuration is changed, some modifications will have to be made. Numerous improvements are proposed for the future 3D equipment.
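The abstract reports chugging frequencies obtained with an FFT in MATLAB but gives no implementation details. As a rough illustration of the idea only, the Python sketch below finds the dominant oscillation frequency of a time series with a textbook radix-2 FFT. The function names, the sampling rate and the synthetic "bubble area" series are hypothetical and are not taken from the study.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out

def dominant_frequency(signal, sample_rate):
    """Frequency in Hz of the largest spectral peak, ignoring the DC bin."""
    n = len(signal)
    average = sum(signal) / n
    spectrum = fft([s - average for s in signal])
    peak = max(range(1, n // 2), key=lambda k: abs(spectrum[k]))
    return peak * sample_rate / n

# Hypothetical series: bubble area oscillating at 2 Hz, sampled at 64 Hz for 4 s.
sample_rate = 64
series = [5.0 + math.sin(2 * math.pi * 2.0 * i / sample_rate) for i in range(256)]
freq = dominant_frequency(series, sample_rate)
```

With 256 samples at 64 Hz the frequency resolution is 0.25 Hz, so a measured series would only be resolved to the nearest bin unless it is recorded for longer.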
Abstract:
The target of any immunization is to activate and expand lymphocyte clones with the desired recognition specificity and the necessary effector functions. In gene, recombinant and peptide vaccines, the immunogen is a single protein or a small assembly of epitopes from antigenic proteins. Since most immune responses against protein and peptide antigens are T-cell dependent, the molecular target of such vaccines is to generate at least 50-100 complexes between MHC molecule and the antigenic peptide per antigen-presenting cell, sensitizing a T cell population of appropriate clonal size and effector characteristics. Thus, the immunobiology of antigen recognition by T cells must be taken into account when designing new generation peptide- or gene-based vaccines. Since T cell recognition is MHC-restricted, and given the wide polymorphism of the different MHC molecules, distinct epitopes may be recognized by different individuals in the population. Therefore, whether immunization will be effective in inducing a protective immune response covering the entire target population becomes an important question. Many pathogens have evolved molecular mechanisms to escape recognition by the immune system by variation of antigenic protein sequences. In this short review, we discuss several concepts related to the selection of amino acid sequences to be included in DNA and peptide vaccines.
Abstract:
The genome of Mycobacterium tuberculosis H37Rv contains three contiguous genes (plc-a, plc-b and plc-c) which are similar to the Pseudomonas aeruginosa phospholipase C (PLC) genes. Expression of mycobacterial PLC-a and PLC-b in E. coli and M. smegmatis has been reported, whereas expression of the native proteins in M. tuberculosis H37Rv has not been demonstrated. The objective of the present study was to demonstrate that native PLC-a is expressed in M. tuberculosis H37Rv. Sera from mice immunized with recombinant PLC-a expressed in E. coli were used in immunoblots to evaluate PLC-a expression. The immune serum recognized a 49-kDa protein in immunoblots against M. tuberculosis extracts. No bands were visible in M. tuberculosis culture supernatants or extracts from M. avium, M. bovis and M. smegmatis. A 550-bp DNA fragment upstream of plc-a was cloned in the pJEM12 vector and the existence of a functional promoter was evaluated by detection of β-galactosidase activity. β-Galactosidase activity was detected in M. smegmatis transformed with recombinant pJEM12 grown in vitro and inside macrophages. The putative promoter was active both in vitro and in vivo, suggesting that expression is constitutive. In conclusion, expression of non-secreted native PLC-a was demonstrated in M. tuberculosis.
Abstract:
Human activity recognition in everyday environments is a critical, but challenging task in Ambient Intelligence applications to achieve proper Ambient Assisted Living, and key challenges still remain to be dealt with to realize robust methods. One of the major limitations of the Ambient Intelligence systems today is the lack of semantic models of those activities on the environment, so that the system can recognize the specific activity being performed by the user(s) and act accordingly. In this context, this thesis addresses the general problem of knowledge representation in Smart Spaces. The main objective is to develop knowledge-based models, equipped with semantics to learn, infer and monitor human behaviours in Smart Spaces. Moreover, it is easy to recognize that some aspects of this problem have a high degree of uncertainty, and therefore, the developed models must be equipped with mechanisms to manage this type of information. A fuzzy ontology and a semantic hybrid system are presented to allow modelling and recognition of a set of complex real-life scenarios where vagueness and uncertainty are inherent to the human nature of the users that perform it. The handling of uncertain, incomplete and vague data (i.e., missing sensor readings and activity execution variations, since human behaviour is non-deterministic) is approached for the first time through a fuzzy ontology validated on real-time settings within a hybrid data-driven and knowledge-based architecture. The semantics of activities, sub-activities and real-time object interaction are taken into consideration. The proposed framework consists of two main modules: the low-level sub-activity recognizer and the high-level activity recognizer. The first module detects sub-activities (i.e., actions or basic activities) that take input data directly from a depth sensor (Kinect).
The main contribution of this thesis tackles the second component of the hybrid system, which lies on top of the previous one, at a higher level of abstraction, acquires its input data from the first module's output, and executes ontological inference to provide users, activities and their influence in the environment with semantics. This component is thus knowledge-based, and a fuzzy ontology was designed to model the high-level activities. Since activity recognition requires context-awareness and the ability to discriminate among activities in different environments, the semantic framework allows for modelling common-sense knowledge in the form of a rule-based system that supports expressions close to natural language in the form of fuzzy linguistic labels. The framework's advantages have been evaluated with a challenging and new public dataset, CAD-120, achieving an accuracy of 90.1% and 91.1% for low- and high-level activities, respectively. This entails an improvement over both entirely data-driven approaches and merely ontology-based approaches. As an added value, for the system to be sufficiently simple and flexible to be managed by non-expert users, and thus facilitate the transfer of research to industry, a development framework composed of a programming toolbox, a hybrid crisp and fuzzy architecture, and graphical models to represent and configure human behaviour in Smart Spaces was developed in order to provide the framework with more usability in the final application. As a result, human behaviour recognition can help assist people with special needs, such as in healthcare, independent elderly living, remote rehabilitation monitoring, industrial process guideline control, and many other cases. This thesis shows use cases in these areas.
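To make the idea of rules written with fuzzy linguistic labels more concrete, here is a minimal Python sketch. It does not reproduce the thesis's fuzzy ontology or its rule language: the labels "near" and "long", their breakpoints, the example rule and the use of the minimum as the fuzzy AND are all assumptions chosen for illustration.

```python
def falling(x, full, zero):
    """Membership that is 1 up to `full` and falls linearly to 0 at `zero`."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)

def rising(x, zero, full):
    """Membership that is 0 up to `zero` and rises linearly to 1 at `full`."""
    if x <= zero:
        return 0.0
    if x >= full:
        return 1.0
    return (x - zero) / (full - zero)

# Hypothetical linguistic labels over sensor-derived quantities.
def near(distance_cm):
    return falling(distance_cm, full=10, zero=30)

def long_lasting(duration_s):
    return rising(duration_s, zero=2, full=6)

def drinking_degree(hand_to_cup_cm, contact_s):
    """Hypothetical rule: IF hand is near cup AND contact is long THEN drinking."""
    return min(near(hand_to_cup_cm), long_lasting(contact_s))

degree = drinking_degree(hand_to_cup_cm=20, contact_s=5)
```

The rule returns a degree between 0 and 1 instead of a yes/no answer, which is what lets a rule-based recognizer tolerate imprecise sensor readings and variations in how an activity is carried out.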
Abstract:
This study discusses how audiovisual content can influence brand quality perceptions. The purpose of this study is to explore how audiovisual content creation can increase brand quality perceptions. This research problem is addressed with three sub-questions, which aim at clarifying the role of emotions between content marketing and brand quality perception, explaining how different functions of audiovisual content can increase brand quality perception, and identifying and comparing the key differences in content creation in business-to-consumer and business-to-business contexts. The theoretical background of the study is in brand personality, consumer emotions, consumer-brand relationships, content marketing and B2B branding literature. The empirical research part includes a single-case study. The case company was a Swiss startup that wished to build a high-quality brand for both B2C and B2B segments. The empirical data was collected in September 2014. Eight interviews were conducted: seven with target segment representatives and one with an existing customer of the case company. The empirical findings were analyzed with thematic analysis, and finally a 5-stage framework was created based on the findings of the research, offering a guideline for high-quality content creation. This study finds that emotions play an important role in brand quality perceptions. Psychological processes, emotion, cognition and conation, influence the engagement process of the target segment, which ultimately can lead to activation and electronic word-of-mouth. Brand quality perception is the result of the overall emotion of the brand. The overall emotion derives from brand personality, brand concept, product attributes and utilitarian benefits of the brand. The entertaining and educational functions of the audiovisual content can target and evoke these emotional processes, and result in increased quality perceptions.
In the B2B context, emotions are found to play a relatively smaller role in the quality perception processes. However, the significance of emotions cannot be ignored, since they can emphasize the value for the buying organization, and build on the trust and loyalty among the potential customers. The final framework presents five stages of content creation that ultimately improve brand quality perceptions. These stages help marketers to design and implement their content and evoke positive emotions in their target segment as part of a quality-based marketing strategy. Further research is warranted to quantitatively test the generalizability of the framework. Further research is also suggested to make the framework adaptable to different stages of the brand life cycle.
Abstract:
The problem of automatic recognition of fish from video sequences is discussed in this Master’s Thesis. This is an urgent issue for many organizations engaged in fish farming in Finland and Russia, because automating the control and counting of individual species is a turning point for the industry. The difficulties and specific features of the problem have been identified in order to find a solution and propose recommendations for the components of an automated fish recognition system. Methods such as background subtraction, Kalman filtering and the Viola-Jones method were implemented during this work for detection, tracking and estimation of fish parameters. Both the results of the experiments and the choice of appropriate methods strongly depend on the quality and type of the video used as input data. Practical experiments have demonstrated that not all methods produce good results on real data, whereas on synthetic data they operate satisfactorily.
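Of the methods named in the abstract, background subtraction is the simplest to show in a few lines. The Python sketch below is an illustration under stated assumptions, not the thesis's implementation: it uses a fixed background image and a fixed threshold, and the tiny synthetic frames, the function names and the threshold value are hypothetical.

```python
def foreground_mask(frame, background, threshold):
    """Mark pixels whose absolute difference from the background exceeds threshold."""
    return [[abs(p - b) > threshold for p, b in zip(frame_row, bg_row)]
            for frame_row, bg_row in zip(frame, background)]

def centroid(mask):
    """Mean (row, column) of the foreground pixels, or None if there are none."""
    points = [(r, c) for r, row in enumerate(mask)
              for c, is_foreground in enumerate(row) if is_foreground]
    if not points:
        return None
    return (sum(r for r, _ in points) / len(points),
            sum(c for _, c in points) / len(points))

# Hypothetical 4x6 grayscale frames: a uniform background and a bright 2x2 blob.
background = [[10] * 6 for _ in range(4)]
frame = [row[:] for row in background]
for r in (1, 2):
    for c in (3, 4):
        frame[r][c] = 200

mask = foreground_mask(frame, background, threshold=50)
position = centroid(mask)
```

A centroid like `position` is the kind of per-frame measurement a tracker such as a Kalman filter could then smooth over time. As the abstract notes, real video is much harder than synthetic data like this, since lighting changes and moving water also differ from the background.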