992 results for Inventory-style speech enhancement
Abstract:
Background: Neuropsychiatric symptoms (NPS) affect almost all patients with dementia and are a major focus of study and treatment. Accurate assessment of NPS through valid, sensitive and reliable measures is crucial. Although current NPS measures have many strengths, they also have some limitations (e.g. acquisition of data is limited to informants or caregivers as respondents, limited depth of items specific to moderate dementia). Therefore, we developed a revised version of the NPI, known as the NPI-C. The NPI-C includes expanded domains and items, and a clinician-rating methodology. This study evaluated the reliability and convergent validity of the NPI-C at ten international sites (seven languages). Methods: Face validity for 78 new items was obtained through a Delphi panel. A total of 128 dyads (caregivers/patients) from three severity categories of dementia (mild = 58, moderate = 49, severe = 21) were interviewed separately by two trained raters using two rating methods: the original NPI interview and a clinician-rated method. Rater 1 also administered four additional, established measures: the Apathy Evaluation Scale, the Brief Psychiatric Rating Scale, the Cohen-Mansfield Agitation Index, and the Cornell Scale for Depression in Dementia. Intraclass correlations were used to determine inter-rater reliability. Pearson correlations between the four relevant NPI-C domains and their corresponding outside measures were used for convergent validity. Results: Inter-rater reliability was strong for most items. Convergent validity was moderate (apathy and agitation) to strong (hallucinations and delusions; agitation and aberrant vocalization; and depression) for clinician ratings in NPI-C domains. Conclusion: Overall, the NPI-C shows promise as a versatile tool which can accurately measure NPS and which uses a uniform scale system to facilitate data comparisons across studies. Copyright © 2010 International Psychogeriatric Association.
Abstract:
This study aimed to contribute to the evaluation of the effects of a behavioral intervention directed at parents/guardians of children with Attention Deficit Hyperactivity Disorder (ADHD), investigating the effects of this parent training model under two conditions, therapeutic setting (Condition 1) and home environment (Condition 2), on the occurrence of hyperactive versus self-control behaviors. The participants were parents of four children aged between five and nine years. The following instruments were used: Informed Consent Form (TCLE), Parenting Styles Inventory (IEP), Child Behavior Checklist for Children and Adolescents (CBCL/TRF), ADHD Scale teacher version, Initial Interview Script, Assessment Interview Script, Final Interview Script, and the Brazil Economic Classification Criterion (CCEB). The research procedure consisted of: (a) contact with a pediatric neurologist; (b) screening and invitation of participants; (c) assignment of two participants to each intervention condition; (d) initial assessment, including an interview with the guardians and administration of the TCLE, IEP and CBCL; (e) a school visit and administration of the TCLE, the ADHD Scale and the TRF, teacher versions; (f) five intervention sessions, recorded in audio and video, two for baseline, one for habituation to the rules, and two for maintenance of the rules and establishment of self-observation behavior, all involving interaction in rule-based games with the participation of the therapist-researcher, the mother and the child; (g) an assessment interview for the first phase; (h) reversal of contexts for the participants; and (i) final assessment, carried out through an interview with the guardians and re-administration to parents and teachers of the standardized instruments used previously, plus the CCEB.
The data obtained with the standardized instruments were processed as indicated in their manuals. Two systems of behavior analysis categories were used, one to describe the mothers' behaviors and the other for behaviors observed in the children. The main results suggest that children in the office setting showed a higher occurrence of self-control behaviors than those in the home setting, who in turn showed a prevalence of hyperactive/impulsive behaviors. Likewise, mothers in the office setting obtained higher scores for positive parenting practices and lower scores for negative ones, compared with mothers in the home group. Positive parenting practices increased for most mothers. The office context is discussed as an effective intervention environment, although it is acknowledged that parents have greater difficulty controlling inappropriate behaviors at home, so interventions in the natural environment should be considered in the therapeutic process. On the other hand, parent training proved effective in the acquisition, strengthening and maintenance of positive parenting practices in all mothers, which may beneficially influence the behaviors of children with ADHD.
Abstract:
Aim: To evaluate the association of antenatal depressive symptomatology (AD) with life events and coping styles; the hypothesis was that certain coping strategies are associated with depressive symptomatology. Methods: We performed a cross-sectional study of 312 women attending a private clinic in the city of Osasco, Sao Paulo, from 27/05/1998 to 13/05/2002. The following instruments were used: Beck Depression Inventory (BDI), Holmes and Rahe Schedule of Recent Events (SSRS), Folkman and Lazarus Ways of Coping Questionnaire, and a questionnaire with socio-demographic and obstetric data. Inclusion criteria: women with no past history of depression, psychiatric treatment, alcohol or drug abuse and no clinical-obstetrical complications. Odds ratios and 95% CI were used to examine the association between AD (according to BDI) and exposure variables. Hypothesis testing was done with chi-square tests at a significance level of p < .05. Results: AD occurred in 21.1% of pregnant women. In the univariate analyses, education, number of pregnancies, previous abortion, husband's income, marital status and SSRS score were associated with AD. All coping styles were associated with AD, except seeking support and positive reappraisal. In the multivariate analyses, four coping styles were kept in the final model: confrontation (p = .039), accepting responsibility (p < .001), escape-avoidance (p = .002), and problem-solving (p = .005). Conclusions: AD was highly prevalent and was associated with maladaptive coping styles.
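The abstract above reports odds ratios with 95% confidence intervals and chi-square tests but does not give the underlying counts. As a rough illustration only, the sketch below computes these quantities for a single 2x2 table using the standard log (Woolf) interval. The function names and the counts in the example are hypothetical and are not taken from the study.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log) confidence interval for a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    z=1.96 gives an approximate 95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 degree of freedom, no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical counts, for illustration only (not data from the study).
or_value, ci_low, ci_high = odds_ratio_ci(20, 10, 10, 20)
chi2 = chi_square_2x2(20, 10, 10, 20)
print(f"OR = {or_value:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f}), chi2 = {chi2:.2f}")
```

An interval that excludes 1, together with a chi-square statistic above 3.84 (the 1-degree-of-freedom critical value at p = .05), would be read as a significant association under this kind of analysis.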
Abstract:
Open-ended 90-minute interviews with 38 patients were analyzed with respect to speech stylistics, which Schucker and Jacobs showed to differentiate individuals with type A personality features from those with type B. In our patients, type A/B had been assessed with the Bortner Personality Inventory. The stylistics studied were: repeated words, swallowed words, interruptions, simultaneous speech, silence latency between question and answer (SL), speed of speech, uneven speed of speech (USS), explosive words (PW), uneven speech volume (USV), and speech volume. Correlations between the two raters were high for all speech categories. Positive correlations were found between extent of type A and SL (r = 0.33; p = 0.022), USS (r = 0.51; p = 0.002), PW (r = 0.46; p = 0.003) and USV (r = 0.39; p = 0.012). Our results indicate that the speech of type A individuals in nonstress open-ended interviews tends to show higher emotional tension (positive correlations for USS, PW and USV) and is more controlled in conversation (positive correlation for SL).
Abstract:
Among daily computer users who are proficient, some are flexible at accomplishing unfamiliar tasks on their own and others have difficulty. Software designers and evaluators involved with Human Computer Interaction (HCI) should account for any group of proficient daily users that are shown to stumble over unfamiliar tasks. We define "Just Enough" (JE) users as proficient daily computer users with predominantly extrinsic motivation style who know just enough to get what they want or need from the computer. We hypothesize that JE users have difficulty with unfamiliar computer tasks and skill transfer, whereas intrinsically motivated daily users accomplish unfamiliar tasks readily. Intrinsic motivation can be characterized by interest, enjoyment, and choice and extrinsic motivation is externally regulated. In our study we identified users by motivation style and then did ethnographic observations. Our results confirm that JE users do have difficulty accomplishing unfamiliar tasks on their own but had fewer problems with near skill transfer. In contrast, intrinsically motivated users had no trouble with unfamiliar tasks nor with near skill transfer. This supports our assertion that JE users know enough to get routine tasks done and can transfer that knowledge, but become unproductive when faced with unfamiliar tasks. This study combines quantitative and qualitative methods. We identified 66 daily users by motivation style using an inventory adapted from Deci and Ryan (Ryan and Deci 2000) and from Guay, Vallerand, and Blanchard (Guay et al. 2000). We used qualitative ethnographic methods with a think aloud protocol to observe nine extrinsic users and seven intrinsic users. Observation sessions had three customized phases where the researcher directed the participant to: 1) confirm the participant's proficiency; 2) test the participant accomplishing unfamiliar tasks; and 3) test transfer of existing skills to unfamiliar software.
Abstract:
The Internet has affected our lives and society in manifold and, in part, fundamental ways. It is therefore no surprise that one of the affected areas is language and communication itself. Over the last few years, online social networks have become a widespread and continuously expanding medium of communication. As a new medium of social interaction, online social networks produce their own communication style, which in many cases differs considerably from real speech and is also perceived differently. The focus of analysis of my PhD thesis is how social network users from the city of Malaga create this virtual style by means of phonic features typical of the Andalusian variety of Spanish, and how the users' language attitude influences the use of these phonic features. The data collection was fourfold: 1) a main corpus was compiled from 240 informants' utterances on Facebook and Tuenti; 2) a corpus consisting of broad transcriptions of recordings of 120 people from Malaga served as a comparison; 3) a survey was carried out in which 240 participants rated the use of said phonetic variants on the following axes: "good–bad", "correct–incorrect" and "beautiful–ugly"; 4) a survey was conducted in which 240 participants estimated how frequently the analysed features are used in Malaga. For the analysis, which is quantitative and qualitative, ten variables were chosen. Results show that the studied variants are employed differently in virtual and real speech depending on how people perceive these variants. In addition, the use of the features is constrained by social factors. In general, people from Malaga have a more positive attitude towards non-standard features when they are used in virtual speech than in real speech. Thus, virtual communication is seen as a style serving to create social meaning and to express linguistic identity.
These stylistic practices reflect an amalgam of social presuppositions about usage conventions and individual strategies for handling a new medium. In sum, the virtual style is an initiative deliberately taken by the users to create their real and virtual identities and to define their language attitudes towards the features of their variety of speech.
Abstract:
Theory: Interpersonal factors play a major role in causing and maintaining depression. It is unclear, however, to what degree significant others of the patient need to be involved for characterizing the patient's interpersonal style. Therefore, our study sought to investigate how impact messages as perceived by the patients' significant others add to the prediction of psychotherapy process and outcome above and beyond routine assessments, and therapist factors. Method: 143 outpatients with major depressive disorder were treated by 24 therapists with CBT or Exposure-Based Cognitive Therapy. Interpersonal style was measured pre and post therapy with the informant‐based Impact Message Inventory (IMI), in addition to the self‐report Inventory of Interpersonal Problems (IIP‐32). Indicators for the patients' dominance and affiliation as well as interpersonal distress were calculated from these measures. Depressive and general symptomatology was assessed at pre, post, and at three months follow‐up, and by process measures after every session. Results: Whereas significant other's reports did not add significantly to the prediction of the early therapeutic alliance, central mechanisms of change, or post‐therapy outcome including therapist factors, the best predictor of outcome 3 months post therapy was an increase in dominance as perceived by significant others. Conclusions: The patients' significant others seem to provide important additional information about the patients' interpersonal style and therefore should be included in the diagnostic process. Moreover, practitioners should specifically target interpersonal change as a potential mechanism of change in psychotherapy for depression.
Abstract:
Current text-to-speech systems are developed using studio-recorded speech in a neutral style or based on acted emotions. However, the proliferation of media sharing sites would allow developing a new generation of speech-based systems which could cope with spontaneous and styled speech. This paper proposes an architecture to deal with realistic recordings and carries out some experiments on unsupervised speaker diarization. In order to maximize the speaker purity of the clusters while keeping a high speaker coverage, the paper evaluates the F-measure of a diarization module, achieving high scores (>85%) especially when the clusters are longer than 30 seconds, even for the more spontaneous and expressive styles (such as talk shows or sports).
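The abstract above does not state how purity, coverage and the F-measure are computed. A common frame-level formulation is sketched below, under the assumption that purity is the share of frames belonging to the dominant speaker of each cluster, coverage is the share of frames falling in the dominant cluster of each speaker, and F is their harmonic mean. The function name and the toy labels are hypothetical, and the paper's exact definitions may differ.

```python
from collections import Counter

def purity_coverage_f(reference, hypothesis):
    """Frame-level cluster purity, speaker coverage and their harmonic mean.

    reference:  true speaker label for each frame
    hypothesis: cluster label assigned to each frame by the diarization module
    """
    if len(reference) != len(hypothesis) or not reference:
        raise ValueError("label sequences must be non-empty and of equal length")
    total = len(reference)
    overlap = Counter(zip(hypothesis, reference))  # (cluster, speaker) -> frame count
    best_in_cluster = {}   # largest single-speaker overlap within each cluster
    best_for_speaker = {}  # largest single-cluster overlap for each speaker
    for (cluster, speaker), count in overlap.items():
        best_in_cluster[cluster] = max(best_in_cluster.get(cluster, 0), count)
        best_for_speaker[speaker] = max(best_for_speaker.get(speaker, 0), count)
    purity = sum(best_in_cluster.values()) / total
    coverage = sum(best_for_speaker.values()) / total
    f_measure = 2 * purity * coverage / (purity + coverage)
    return purity, coverage, f_measure

# Toy example: two speakers, two clusters, one frame of speaker A leaks into cluster 1.
ref = ["A", "A", "A", "B", "B", "B"]
hyp = [0, 0, 1, 1, 1, 1]
purity, coverage, f_measure = purity_coverage_f(ref, hyp)
print(purity, coverage, f_measure)
```

Using the harmonic mean penalizes systems that reach high purity only by splitting each speaker across many small clusters, which is the trade-off the abstract describes.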
Abstract:
This Thesis analyzes the possibilities that speech technologies currently offer for the detection of clinical pathologies associated with the upper airway. The study of speech, which traditionally covers both production and the process of transformation of the message and the signals involved, from the sender to the receiver, offers an alternative way of studying these pathologies. The fact that the emitted signal contains not only this message but also information about the speaker has motivated the development of systems oriented to the identification and verification of speakers' identity. This work has recently received a new impetus, turning both toward the characterization of traits that are common to several speakers and toward the differences between recordings of the same speaker. The former are especially relevant to this Thesis, since these traits could reveal the presence of characteristics related to a certain condition common to several speakers, independent of their identity. Such is the case addressed in this Thesis, where the identified traits would relate to a particular pathology directly linked to the physical system of speech production. The case of Sleep Apnea-Hypopnea Syndrome (SAHS) is paradigmatic. It is a pathology with a high worldwide prevalence, which increases with age. Patients with this pathology experience episodes of involuntary cessation of breathing during sleep, which last several seconds and recur throughout the night, preventing proper rest. In the case of obstructive apnea, these episodes are due to the inability to keep a path open through the airway, so that the air flow is interrupted.
At present, the diagnosis of these patients is made through a polysomnographic study, which focuses on the analysis of apnea episodes during sleep and requires the patient to remain in hospital for one night. The complexity and high cost of these procedures, together with growing waiting lists, have highlighted the need for rapid detection techniques which, although they might not achieve such high rates, would make it possible to reorganize waiting lists according to the degree of severity of the pathology in each patient. Among others, diagnostic imaging systems, as well as the anthropometric characterization of patients, have revealed the existence of anatomical patterns that would have a direct influence on speech. Studies devoted to how SAHS affects speech have been scarce and some of them even contradictory. However, the existence of specific patterns relating to articulation, phonation and resonance has been known since the late 1980s. Their description was difficult to exploit through an automatic recognition system, but it pointed to the existence of a link between voice and SAHS. In recent years, automatic processing techniques have enabled the development of automatic systems that are already able to identify significant differences in the speech of SAHS patients, distinguishing them from healthy speakers. By contrast, little is known about the connection between these new results, those obtained in the past, and the pathogenesis of SAHS.
This Thesis continues the work carried out in this field by specifically considering: the study of the way in which SAHS affects patients' speech, the improvement of automatic classification rates, and the combination of the information obtained with the predictors used by clinical specialists in their preliminary assessments. The first two tasks pose symbiotic but different problems. While studying the connection between SAHS and speech requires constrained models that can be easily interpreted, recognition systems rely on a large number of dimensions for the characterization and subsequent identification of patterns. Thus, the first task should allow us to make progress on the second, as should the incorporation of the predictors used by clinical specialists. The Thesis addresses the study of both continuous and sustained speech, in order to exploit the synergies and differences between the two. In the analysis of continuous speech, a previously evaluated scheme was taken as the starting point, on which the evaluation and optimization of the speech representation were addressed, as well as the characterization of the specific patterns associated with SAHS. This has revealed the connection between SAHS and the fundamental elements of the voice signal: the formants. The results obtained show that the success of these systems is due fundamentally to the ability of these representations to describe those components while ignoring dimensions that are noisy or have little discriminative power. The resulting scheme offers an error rate below 18%, using classifiers notably less complex than those described in the state of the art and a single short voice recording.
Regarding the connection between SAHS and the observed patterns, it was necessary to consider inter- and intra-group differences, focusing on the speaker's characteristic articulation and replacing the complex classification models with the study of spectral averages. The result clearly points to certain regions of the frequency axis, suggesting the existence of a systematic narrowing of the tract cross-section in the oropharyngeal region, already anticipated in the pathogenesis of this syndrome. As for sustained speech, the studies carried out on continuous speech were reproduced on recordings of the sustained vowel /a/. The results are qualitatively analogous to the previous ones, although in this case the classification rates turn out to be lower. To identify the meaning of this result, the study of spectral averages and of inter- and intra-group variability was reproduced. Both studies showed important differences from the previous ones that could explain these results. However, sustained speech offers other opportunities by establishing a controlled environment for the study of phonation, which had also been identified as a source of information for the detection of SAHS. Its study showed that, in the available data set, there are no variations that could easily be associated with phonation. Only those dimensions that describe the distribution of energy along the frequency axis showed significant differences, pointing once again in the direction of spectral resonances. Having analyzed the above results, the Thesis tackles the fusion of both sources of information into a single classification system. This makes it possible to improve classification rates, under the hypothesis that the information present in continuous speech and in sustained speech is fundamentally different.
This task was carried out through a simple fusion scheme that achieved 88.6% correct classification (error rate of 11.4%), which represents a significant improvement over the state of the art. Finally, combining this classifier with the predictors used by clinical specialists gave a rate of 91.3% (error rate of 8.7%), which lies within the range offered by more costly and intrusive schemes that, unlike the one proposed, cannot be used in the preliminary assessment of patients. Overall, the Thesis offers a clear view of the relationship between SAHS and speech, demonstrating the degree of maturity reached by speech technology in the characterization and detection of SAHS, showing that its use for patient assessment would already be possible, and leaving the door open to future research that continues the work begun here. ABSTRACT This Thesis explores the potential of speech technologies for the detection of clinical disorders connected to the upper airway. The study of speech traditionally covers both the production process and the post-processing of the signals involved, from the speaker to the listener, offering an alternative path to study these pathologies. The fact that utterances embed not just the encoded message but also information about the speaker has motivated the development of automatic systems oriented to the identification and verification of the speaker's identity. These have recently been boosted and reoriented either towards the characterization of traits that are common to several speakers, or towards the differences between records of the same speaker collected under different conditions. The former are particularly relevant to this Thesis, as these patterns could reveal the presence of features that are related to a common condition shared among different speakers, regardless of their identity.
Such is the case faced in this Thesis, where the traits identified would relate to a particular pathology, directly connected to the speech production system. The Obstructive Sleep Apnea syndrome (OSA) is a paradigmatic case for analysis. It is a disorder with high prevalence among adults, affecting a larger number of them as they grow older. Patients suffering from this disorder experience episodes of involuntary cessation of breath during sleep that may last a few seconds and recur throughout the night, preventing proper rest. In the case of obstructive apnea, these episodes are related to the collapse of the pharynx, which interrupts the air flow. Currently, OSA diagnosis is done through a polysomnographic study, which focuses on the analysis of apnea episodes during sleep, requiring the patient to stay at the hospital for the whole night. The complexity and high cost of the procedures involved, combined with the waiting lists, have evidenced the need for screening techniques, which perhaps would not achieve outstanding performance rates but would allow clinicians to reorganize these lists, ranking patients according to the severity of their condition. Among others, imaging diagnosis and anthropometric characterization of patients have evidenced the existence of anatomical patterns related to OSA that have direct influence on speech. Contributions devoted to the study of how this disorder affects speech are scarce and somewhat contradictory. However, the existence of specific patterns related to articulation, phonation and resonance has been known since the late 1980s. At that time these descriptions were virtually useless for the development of an automatic system, but they pointed to the existence of a link between speech and OSA. In recent years automatic processing techniques have evolved and are now able to identify significant differences in the speech of OSA patients when compared to records from healthy subjects.
Nevertheless, little is known about the connection between these new results, those published in the past, and the pathogenesis of the OSA syndrome. This Thesis aims to progress beyond the previous research done in this area by addressing: the study of how OSA affects patients' speech, the enhancement of automatic OSA classification based on speech analysis, and its integration with the information embedded in the predictors generally used by clinicians in the preliminary examination of patients. The first two tasks, though they may appear symbiotic at first, are quite different. While studying the connection between speech and OSA requires simple, narrow models that can be easily interpreted, classification requires larger models including a large number of dimensions for the characterization and subsequent identification of the observed patterns. In any case, it is clear that any progress made in the first task should allow us to improve our performance on the second one, and that the incorporation of the predictors used by clinicians should contribute in this same direction. The Thesis considers both continuous and sustained speech analysis, to exploit the synergies and differences between them. For continuous speech analysis, a conventional speech processing scheme, designed and evaluated before this Thesis, was taken as a baseline. On top of this initial system, several alternative representations of the speech information were proposed, optimized and tested to select those most suitable for the characterization of OSA-specific patterns. Evidence was found of a connection between OSA and the fundamental constituents of speech: the formants. Experimental results showed that the success of the proposed solution is well explained by the ability of speech representations to describe these specific OSA-related components, ignoring the noisy ones as well as those with low discrimination capabilities.
The resulting scheme obtained an 18% error rate, with a classification scheme significantly less complex than those described in the literature and operating on a single speech record. Regarding the connection between OSA and the observed patterns, it was necessary to consider inter- and intra-group differences for this analysis, and to focus on the articulation, replacing the complex classification models with the long-term average spectra. Results clearly point to certain regions on the frequency axis, suggesting the existence of a systematic narrowing in the vocal tract section at the oropharynx. This was already described in the pathogenesis of this syndrome. Regarding sustained speech, experiments similar to those conducted on continuous speech were reproduced on sustained phonations of the vowel /a/. Results were qualitatively similar to the previous ones, though in this case performance rates were found to be noticeably lower. To derive further knowledge from this result, experiments on the long-term average spectra and intra- and inter-group variability ratios were also reproduced on sustained speech records. Results of both experiments showed significant differences from those obtained on continuous speech, which could explain the differences observed in performance. However, sustained speech also provided the opportunity to study phonation within the controlled framework it provides. Phonation was also identified in the literature as a source of information for the detection of OSA. In this study it was found that, for the available dataset, no systematic differences related to phonation could be found between the two groups of speakers. Only those dimensions which relate to energy distribution along the frequency axis provided significant differences, pointing once again towards resonant components.
Once classification schemes on both continuous and sustained speech were developed, the Thesis addressed their combination into a single classification system. Under the assumption that the information in continuous and sustained speech is fundamentally different, it should be possible to merge the two successfully. This was tested through a simple fusion scheme which obtained 88.6% correct classification (11.4% error rate), a significant improvement over the state of the art. Finally, the combination of this classifier with the variables used by clinicians obtained 91.3% accuracy (8.7% error rate). This is within the range of alternative, but costly and intrusive, schemes which, unlike the one proposed, cannot be used in the preliminary assessment of patients' condition. In the end, this Thesis has shed new light on the underlying connection between OSA and speech, and evidenced the degree of maturity reached by speech technology in OSA characterization and detection, leaving the door open for future research in the multiple directions that have been pointed out and left as future work.
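The abstract describes the fusion scheme only as "simple" and gives no further detail. One simple possibility, shown here purely as an assumption, is score-level fusion: each classifier outputs a score per speaker, the two scores are averaged with a weight, and the fused score is thresholded. The function names, the weight, the threshold and the toy scores below are all hypothetical and are not results or methods from the Thesis.

```python
def fuse_scores(continuous_scores, sustained_scores, weight=0.5):
    """Weighted average of per-speaker scores from two classifiers (score-level fusion)."""
    return [weight * c + (1 - weight) * s
            for c, s in zip(continuous_scores, sustained_scores)]

def accuracy(scores, labels, threshold=0.5):
    """Fraction of speakers whose thresholded score matches the true label (1 = OSA)."""
    predictions = [1 if score >= threshold else 0 for score in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# Toy scores for four speakers (illustrative only, not data from the Thesis).
labels = [1, 1, 0, 0]
continuous = [0.9, 0.4, 0.3, 0.2]   # misses the second speaker
sustained = [0.7, 0.8, 0.2, 0.7]    # false alarm on the fourth speaker
fused = fuse_scores(continuous, sustained)

acc_continuous = accuracy(continuous, labels)
acc_sustained = accuracy(sustained, labels)
acc_fused = accuracy(fused, labels)
print(acc_continuous, acc_sustained, acc_fused, 1 - acc_fused)
```

The toy case is built so that the two classifiers err on different speakers, which is the situation in which fusion helps. The error rate is simply one minus the accuracy, matching the way the abstract pairs 88.6% with 11.4% and 91.3% with 8.7%.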
Abstract:
One of the biggest challenges in speech synthesis is the production of contextually-appropriate naturally sounding synthetic voices. This means that a Text-To-Speech system must be able to analyze a text beyond the sentence limits in order to select, or even modulate, the speaking style according to a broader context. Our current architecture is based on a two-step approach: text genre identification and speaking style synthesis according to the detected discourse genre. For the final implementation, a set of four genres and their corresponding speaking styles were considered: broadcast news, live sport commentaries, interviews and political speeches. In the final TTS evaluation, the four speaking styles were transplanted to the neutral voices of other speakers not included in the training database. When the transplanted styles were compared to the neutral voices, transplantation was significantly preferred and the similarity to the target speaker was as high as 78%.
Abstract:
The main objective of this study was to describe the outcomes of a communication education program for older people with hearing impairment using the International Outcome Inventory - Alternative Interventions (IOI-AI) and the version for significant others (IOI-AI-SO). Ninety-six people aged 58 to 94 years participated in an interactive group education program for two hours per week for five weeks. The IOI-AI was administered one to two weeks after the last educational session, and 29 significant others also completed the IOI-AI-SO at this time. Overall, positive results were obtained using both questionnaires, and satisfaction with the program was particularly high. Findings also compared favourably to reports of outcomes for other audiological interventions (i.e., another communication training program and hearing aid fitting). Principal components analysis of the IOI-AI revealed a somewhat different factor structure from that of the original IOI-HA. The two versions of the IOI applied in this study are recommended as simple and effective measures of the outcomes of alternative interventions.
Resumo:
This thesis addresses the viability of automatic speech recognition for control room systems; with careful system design, automatic speech recognition (ASR) devices can be a useful means of human-computer interaction in specific types of task. These tasks can be defined as complex verbal activities, such as command and control, and can be paired with spatial tasks, such as monitoring, without detriment. It is suggested that ASR use be confined to routine plant operation, as opposed to critical incidents, due to possible effects of stress on the operators' speech. It is proposed that using ASR will require operators to adapt a commonly used skill to cater for a novel use of speech. Before using the ASR device, new operators will require some form of training. It is shown that a demonstration by an experienced user of the device can lead to better performance than instructions. Thus, a relatively cheap and very efficient form of operator training can be supplied through demonstration by experienced ASR operators. From a series of studies into speech-based interaction with computers, it is concluded that the interaction should be designed to capitalise upon the tendency of operators to use short, succinct, task-specific styles of speech. From studies comparing different types of feedback, it is concluded that operators should be given screen-based feedback, rather than auditory feedback, for control room operation. Feedback will take two forms: the use of the ASR device will require recognition feedback, which will be best supplied using text; the performance of a process control task will require task feedback integrated into the mimic display. This latter feedback can be either textual or symbolic, but it is suggested that symbolic feedback will be more beneficial. Related to both interaction style and feedback is the issue of handling recognition errors. These should be corrected by simple command repetition practices, rather than by error-handling dialogues. This method of error correction is held to be non-intrusive to primary command and control operations. This thesis also addresses some of the problems of user error in ASR use, and provides a number of recommendations for its reduction.
Resumo:
This investigation studied the differences in learning styles among ethnically diverse secondary science students from a multicultural urban high school. It examined whether there were learning style differences among samples based on ethnicity, gender, academic grouping, and academic achievement. The learning style elements were based on scores of the Dunn, Dunn, and Price Learning Style Inventory (LSI) (1997). The sample (n = 476) consisted of students enrolled in Life Science courses. The data were analyzed using one-way analysis of variance (ANOVA) and multivariate analysis of variance (MANOVA). Significant differences were found among students for three of the four groups tested. The largest number of differences in learning style element preference was in academic grouping, with eight significant differences showing small or medium effect sizes. There were four significant differences between genders and one significant difference among ethnic groups. Effect size was small. The data analyses showed that individual differences have a much bigger effect than group differences on learning style, and that proportions in learning style element categories reveal more information than means of groups. This study implied the need to increase awareness of differences in learning styles among students and to help educators understand them. Other predictors of learning styles might account for a large amount of the unexplained variation. Overall, this study reinforces the body of existing literature.
Resumo:
A study was conducted to test the therapeutic effects of assessment feedback on rapport-building and self-enhancement variables (self-verification, self-discovery, self-esteem), as well as symptomatology. Assessment feedback was provided in the form of interpretive information based on the results of the Millon Clinical Multiaxial Inventory-III (MCMI-III). Participants (N = 89) were randomly assigned to three groups: a Feedback group, a Reflective-Counseling group, and a No-Feedback group. The Feedback group was provided with assessment feedback, the Reflective-Counseling group was asked to comment on the meaning of taking the MCMI-III, and the No-Feedback group received general information about the MCMI-III. Results revealed that assessment feedback, when provided in the form of interpretive information, positively affects rapport-building and self-enhancement variables (self-verification and self-discovery). No significant results were found in terms of self-esteem or symptom decrease as a function of feedback. However, a significant decrease in symptoms across groups was found. Results indicate that assessment feedback in the form of interpretive information can be used as a starting point in therapy. Implications of the findings are discussed with respect to theory and clinical practice.
Resumo:
More information is now readily available to computer users than at any time in human history; however, much of this information is often inaccessible to people with blindness or low vision, for whom information must be presented non-visually. Currently, screen readers are able to verbalize on-screen text using text-to-speech (TTS) synthesis; however, much of this vocalization is inadequate for browsing the Internet. An auditory interface that incorporates auditory-spatial orientation was created and tested. For information that can be structured as a two-dimensional table, links can be semantically grouped as cells in a row within an auditory table, which provides a consistent structure for auditory navigation. An auditory display prototype was tested. Sixteen legally blind subjects participated in this research study. Results demonstrated that stereo panning was an effective technique for audio-spatially orienting non-visual navigation in a five-row, six-column HTML table as compared to a centered, stationary synthesized voice. These results were based on measuring the time-to-target (TTT), or the amount of time elapsed from the first prompting to the selection of each tabular link. Preliminary analysis of the TTT values recorded during the experiment showed that the populations did not conform to the ANOVA requirements of normality and equality of variances. Therefore, the data were transformed using the natural logarithm. The repeated-measures two-factor ANOVA results show that the logarithmically transformed TTTs were significantly affected by the tonal variation method, F(1,15) = 6.194, p = 0.025. Similarly, the results show that the logarithmically transformed TTTs were marginally affected by the stereo spatialization method, F(1,15) = 4.240, p = 0.057. The results show that the logarithmically transformed TTTs were not significantly affected by the interaction of both methods, F(1,15) = 1.381, p = 0.258. These results suggest that some confusion may be caused in the subject when employing both of these methods simultaneously. The significant effect of tonal variation indicates that the effect is actually increasing the average TTT. In other words, the presence of preceding tones increases task completion time on average. The marginally significant effect of stereo spatialization decreases the average log(TTT) from 2.405 to 2.264.
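The figures reported in this abstract can be put in more concrete terms with a short calculation. The sketch below is illustrative only and is not the authors' analysis code: it recomputes a p-value from each reported F(1,15) statistic using the identity that an F(1, v) variable is the square of a t(v) variable, and it back-transforms the two mean log(TTT) values into geometric means. The function names are hypothetical, the numerical integration is a simple stand-in for a statistics library, and the abstract does not state the time unit of TTT, so the back-transformed values are in whatever unit the study used.

```python
import math


def t_pdf(t: float, df: int) -> float:
    """Density of Student's t distribution with `df` degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def p_value_f_1_df(f_stat: float, df_error: int, steps: int = 2000) -> float:
    """Upper-tail p-value for an F(1, df_error) statistic (hypothetical helper).

    Uses P(F > f) = P(|T| > sqrt(f)) = 1 - 2 * integral of t_pdf from 0 to sqrt(f),
    with the integral approximated by composite Simpson's rule (`steps` must be even).
    """
    upper = math.sqrt(f_stat)
    h = upper / steps
    total = t_pdf(0.0, df_error) + t_pdf(upper, df_error)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df_error)
    return 1 - 2 * (total * h / 3)


def geometric_mean_from_log(mean_log: float) -> float:
    """Back-transform a mean of natural-log values into a geometric mean."""
    return math.exp(mean_log)


# F statistics as reported in the abstract, all with (1, 15) degrees of freedom.
reported_f = {
    "tonal variation": 6.194,
    "stereo spatialization": 4.240,
    "interaction": 1.381,
}
for effect, f_stat in reported_f.items():
    print(f"{effect}: F(1,15) = {f_stat}, p = {p_value_f_1_df(f_stat, 15):.3f}")

# Mean log(TTT) without and with stereo spatialization, as reported.
without_stereo = geometric_mean_from_log(2.405)
with_stereo = geometric_mean_from_log(2.264)
print(f"geometric mean TTT: {without_stereo:.2f} -> {with_stereo:.2f}")
print(f"ratio with/without stereo: {with_stereo / without_stereo:.3f}")
```

Because the analysis was run on log-transformed times, a difference in mean log(TTT) corresponds to a ratio of geometric means rather than a difference of arithmetic means, which is why the back-transformed comparison is expressed as a ratio.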