904 results for Audio-Visual Automatic Speech Recognition
Abstract:
This dissertation presents the implementation of virtual-environment navigation, gesture recognition, and interface control, performed with the Kinect device, in the ITV System: a training system for operators and maintenance personnel of hydroelectric power plants and electrical substations. It also describes several recent improvements, such as video conversion, audible and visual alarm screens, three-dimensional sound ambience, and process narration. In addition to the ITV System, it introduces the Kinect device and the algorithm used to compare movement patterns, DTW. The design and implementation of navigation, gesture recognition, and interface control are then covered in detail. As a case study, a Virtual Technical Instruction (Instrução Técnica Virtual, ITV) is shown, created specifically to test and evaluate the proposed new interface. The results, considered satisfactory, were obtained by analyzing qualitative questionnaires administered to students of the Universidade Federal do Pará. Finally, concluding remarks are made and ideas for future work are outlined.
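The abstract names DTW (commonly expanded as dynamic time warping) as the algorithm for comparing movement patterns but gives no details. The following is a minimal sketch of the textbook dynamic-programming formulation, assuming one-dimensional sequences and an absolute-difference local cost; the function name `dtw_distance` is hypothetical, and nothing here is taken from the dissertation's actual implementation, which would operate on Kinect joint data.

```python
def dtw_distance(a, b):
    """Textbook DTW cost between two 1-D numeric sequences (illustrative only)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local cost (assumption: absolute difference)
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# A sequence and a time-stretched copy of it align at zero cost.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # → 0.0
```

Because the warping path may repeat samples, a gesture performed more slowly can still match a stored template, which is the usual reason for choosing DTW over a point-by-point distance.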
Abstract:
This research is a perceptual study of prosody as a segmentation cue in spontaneous oral narratives, and aims to confirm whether or not prosody helps lay, inexperienced listeners perceive the structure of a narrative text. The study investigates whether pitch difference is a relevant prosodic element. The corpus consists of four spontaneous narratives that are part of the corpus analyzed by Oliveira Jr. (2000), author of the project that inspired this research. To find out whether participants can delimit narrative structure based on perception alone, a perception test was conducted with 112 volunteers recruited at the Universidade Federal do Pará and the Universidade Federal de Alagoas. Participants were asked to indicate the points at which the speaker intended to end a communicative unit in the narratives. The interpretation of what counts as a communicative unit was left subjective. Each narrative was presented in four different conditions: (i) a transcript without punctuation or paragraph breaks; (ii) a transcript accompanied by audio; (iii) audio only; and (iv) filtered audio of the narrative, yielding a delexicalized version (unintelligible speech) that preserves the prosodic structure of the discourse. In the first two conditions, segmentation was marked on the transcribed text with slashes (/); in the other two, a computer program called ELAN was used. Data analysis relied on tables, graphs, statistical analysis (chi-square test), and acoustic analysis (using the PRAAT program). The results indicate that prosody helps lay listeners perceive the basic structure of narrative discourse. Regarding the weight of pitch reset in helping listeners mark boundaries, the chi-square test found evidence supporting that role.
In this context, the study confirms the relevant role of prosody in recognizing the structure of spontaneous oral narratives and identifies the effect of pitch difference on participants' perception.
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Pós-graduação em Engenharia Mecânica - FEG
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
The attributes describing a data set may often be arranged in meaningful subsets, each of which corresponds to a different aspect of the data. An unsupervised algorithm (SCAD) that simultaneously performs fuzzy clustering and aspect weighting was proposed in the literature. However, SCAD may fail and halt under certain conditions. To fix this problem, its steps are modified and then reordered to reduce the number of parameters that the user must set. In this paper we prove that each step of the resulting algorithm, named ASCAD, globally minimizes its cost function with respect to the argument being optimized. The asymptotic analysis of ASCAD leads to a time complexity that is the same as that of fuzzy c-means. A hard version of the algorithm and a novel validity criterion that considers aspect weights in order to estimate the number of clusters are also described. The proposed method is assessed over several artificial and real data sets.
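For reference, the fuzzy c-means baseline that the abstract compares against can be sketched as below. This is standard fuzzy c-means only, not SCAD or ASCAD: it has no aspect weights, and the function name, the fixed iteration count, and the small `eps` guard are assumptions made for illustration.

```python
import math


def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    """Standard fuzzy c-means on equal-length tuples (illustrative sketch).

    points  : list of data vectors
    centers : initial cluster centers (a hypothetical starting guess)
    m       : fuzzifier, must be > 1
    Returns (centers, memberships); memberships are those computed in the
    final iteration, one row per point, each row summing to 1.
    """
    eps = 1e-12  # avoids division by zero when a point coincides with a center
    memberships = []
    for _ in range(iterations):
        # Membership step: u_ik is proportional to d_ik ** (-2 / (m - 1)).
        memberships = []
        for p in points:
            inv = [(math.dist(p, c) + eps) ** (-2.0 / (m - 1.0)) for c in centers]
            total = sum(inv)
            memberships.append([v / total for v in inv])
        # Center step: mean of the points weighted by u_ik ** m.
        new_centers = []
        for k in range(len(centers)):
            weights = [u[k] ** m for u in memberships]
            wsum = sum(weights)
            new_centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / wsum
                for d in range(len(points[0]))
            ))
        centers = new_centers
    return centers, memberships


# Hypothetical 1-D data with two well-separated groups.
data = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
final_centers, u = fuzzy_c_means(data, centers=[(1.0,), (4.0,)])
```

Each iteration touches every point-center pair once per dimension, so the cost per iteration grows with the product of the number of points, clusters, and attributes; this is the baseline complexity the abstract says ASCAD matches.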
Abstract:
The purpose of this study was to evaluate the visual outcome of chronic occupational exposure to a mixture of organic solvents by measuring color discrimination, achromatic contrast sensitivity and visual fields in a group of gas station workers. We tested 25 workers (20 males) and 25 controls with no history of chronic exposure to solvents (10 males). All participants had normal ophthalmologic exams. Subjects had worked in gas stations for an average of 9.6 ± 6.2 years. Color vision was evaluated with the Lanthony D15d and the Cambridge Colour Test (CCT). Visual field assessment consisted of white-on-white 24-2 automated perimetry (Humphrey II-750i). Contrast sensitivity was measured for sinusoidal gratings of 0.2, 0.5, 1.0, 2.0, 5.0, 10.0 and 20.0 cycles per degree (cpd). Results from both groups were compared using the Mann-Whitney U test. The number of errors in the D15d was higher for workers than for controls (p<0.01). Their CCT color discrimination thresholds were elevated compared to the control group along the protan, deutan and tritan confusion axes (p<0.01), and their ellipse area and ellipticity were higher (p<0.01). Genetic analysis of subjects with very elevated color discrimination thresholds excluded congenital causes for the visual losses. Automated perimetry thresholds showed elevation at 9, 15 and 21 degrees of eccentricity (p<0.01) and in the MD and PSD indices (p<0.01). Contrast sensitivity losses were found for all spatial frequencies measured (p<0.01) except for 0.5 cpd. Significant correlations were found between previous working years and deutan axis thresholds (rho = 0.59; p<0.05), indices of the Lanthony D15d (rho = 0.52; p<0.05), perimetry results in the fovea (rho = -0.51; p<0.05) and at 3, 9 and 15 degrees of eccentricity (rho = -0.46; p<0.05). Extensive and diffuse visual changes were found, suggesting that specific occupational limits should be created.
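The group comparisons above rely on the Mann-Whitney U test. As a rough illustration of what that statistic measures, the sketch below computes U for two samples using midranks for ties. The function name is hypothetical, the sample values in the usage line are invented and are not data from the study, and the p-value step (normal approximation or exact tables) is deliberately left out.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two samples, using midranks for tied values."""
    # Pool both samples, remembering which group (0 = x, 1 = y) each value came from.
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of the 1-based ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2.0
    return u_x, len(x) * len(y) - u_x


# Invented example: every value in the first sample is below every value in the second.
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))  # → (0.0, 9.0)
```

U_x counts the pairs in which a value from the first sample exceeds one from the second (ties count one half), so a U near zero or near the product of the sample sizes indicates strongly separated groups.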
Abstract:
The aims of this study were to investigate work conditions, to estimate the prevalence of Computer Vision Syndrome and to describe the risk factors associated with it among operators of two call centers in São Paulo (n = 476). The methods include a quantitative cross-sectional observational study and an ergonomic work analysis, using work observation, interviews and questionnaires. The case definition was the presence of one or more specific ocular symptoms answered as always, often or sometimes. The multiple logistic regression models were built using the forward stepwise likelihood method, retaining the variables with significance levels below 5% (p < 0.05). The operators were mainly female and young (from 15 to 24 years old). The call center was open 24 hours a day, and the operators worked 36 hours per week with break time of 21 to 35 minutes per day. The symptoms reported were eye fatigue (73.9%), "weight" in the eyes (68.2%), "burning" eyes (54.6%), tearing (43.9%) and weakening of vision (43.5%). The prevalence of Computer Vision Syndrome was 54.6%. The associations verified were: being female (OR 2.6, 95% CI 1.6 to 4.1), lack of recognition at work (OR 1.4, 95% CI 1.1 to 1.8), organization of work in the call center (OR 1.4, 95% CI 1.1 to 1.7) and high demand at work (OR 1.1, 95% CI 1.0 to 1.3). Organizational and psychosocial factors at work should be included in programs to prevent visual syndrome among call center operators.
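To make the "OR, 95% CI" notation above concrete, here is a minimal sketch of an unadjusted odds ratio with a Wald confidence interval from a 2x2 table. This is a simplification: the study reports odds ratios from multiple logistic regression, which adjusts for other variables, and the counts in the usage line are invented for illustration only.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald CI from a 2x2 table.

    a: exposed cases      b: exposed non-cases
    c: unexposed cases    d: unexposed non-cases
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Invented counts, not taken from the study.
or_, lo, hi = odds_ratio_ci(20, 10, 10, 20)
print(round(or_, 2), round(lo, 2), round(hi, 2))
```

An interval whose lower bound sits at or near 1.0, as reported for high demand at work, indicates an association that is only marginally distinguishable from no effect.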
Abstract:
Hearing loss is one of the most common clinical findings in subjects with malformations of the ear. Treatment consists of surgery and/or fitting a bone-conduction hearing aid (HA VO). Early intervention is critical for auditory stimulation and the development of speech and language. OBJECTIVE: To characterize the audiological profile of subjects with congenital malformation of the external and/or middle ear and to evaluate the benefit of and satisfaction with using HA VO. METHOD: A descriptive study of subjects with bilateral congenital malformations of the external and/or middle ear and moderate or severe conductive or mixed hearing loss who were HA VO users. Benefit was evaluated with a sentence recognition test in noise and measures of functional gain, and satisfaction was assessed with the international questionnaire IQ - HA. RESULTS: 13 subjects were evaluated; 61% were male and 80% had moderate or severe conductive hearing loss. Performance in the proposed evaluation was better with the HA than without it. CONCLUSION: HA VO showed advantages for the population studied and should be considered as an option for intervention. Satisfaction was confirmed by the high scores obtained on the IQ - HA.
Abstract:
Studies of cortical auditory evoked potentials using speech stimuli in normal-hearing individuals are important for understanding how the complexity of the stimulus influences the characteristics of the cortical potential generated. OBJECTIVE: To characterize the cortical auditory evoked potential and the P3 auditory cognitive potential with vocalic and consonantal contrast stimuli in normal-hearing individuals. METHOD: 31 individuals with no risk of hearing, neurological or language disorders, aged between 7 and 30 years, participated in this study. The cortical auditory evoked potentials and the P3 auditory cognitive potential were recorded in the Fz and Cz active channels using consonantal (/ba/-/da/) and vocalic (/i/-/a/) speech contrasts. Design: A cross-sectional prospective cohort study. RESULTS: We found a statistically significant difference between the speech contrast used and the latencies of the N2 (p = 0.00) and P3 (p = 0.00) components, as well as between the active channel considered (Fz/Cz) and the P3 latency and amplitude values. These differences did not occur for the exogenous components N1 and P2. CONCLUSION: The speech stimulus contrast, vocalic or consonantal, must be taken into account in the analysis of the cortical auditory evoked potential, N2 component, and auditory cognitive P3 potential.
Abstract:
The occurrence of a weak auditory warning stimulus increases the speed of the response to a subsequent visual target stimulus that must be identified. This facilitatory effect has been attributed to the temporal expectancy automatically induced by the warning stimulus. It has not been determined whether this results from a modulation of the stimulus identification process, the response selection process or both. The present study examined these possibilities. A group of 12 young adults performed a reaction time location identification task and another group of 12 young adults performed a reaction time shape identification task. A visual target stimulus was presented 1850 to 2350 ms plus a fixed interval (50, 100, 200, 400, 800, or 1600 ms, depending on the block) after the appearance of a fixation point, on its left or right side, above or below a virtual horizontal line passing through it. In half of the trials, a weak auditory warning stimulus (S1) appeared 50, 100, 200, 400, 800, or 1600 ms (according to the block) before the target stimulus (S2). Twelve trials were run for each condition. The S1 produced a facilitatory effect for the 200, 400, 800, and 1600 ms stimulus onset asynchronies (SOA) in the case of the side stimulus-response (S-R) corresponding condition, and for the 100 and 400 ms SOA in the case of the side S-R non-corresponding condition. Since these two conditions differ mainly by their response selection requirements, it is reasonable to conclude that automatic temporal expectancy influences the response selection process.
Abstract:
The strength and durability of materials produced from aggregates (e.g., concrete bricks, concrete, and ballast) are critically affected by the weathering of the particles, which is closely related to their mineral composition. It is possible to infer the degree of weathering from visual features derived from the surface of the aggregates. By using sound pattern recognition methods, this study shows that the characterization of the visual texture of particles, performed by using texture-related features of gray scale images, allows the effective differentiation between weathered and nonweathered aggregates. The selection of the most discriminative features is also performed by taking into account a feature ranking method. The evaluation of the methodology in the presence of noise suggests that it can be used in stone quarries for automatic detection of weathered materials.