806 results for Bantu-speaking peoples
Abstract:
Current text-to-speech systems are developed using studio-recorded speech in a neutral style or based on acted emotions. However, the proliferation of media-sharing sites would allow the development of a new generation of speech-based systems that could cope with spontaneous and styled speech. This paper proposes an architecture for dealing with realistic recordings and carries out experiments on unsupervised speaker diarization. To maximize the speaker purity of the clusters while keeping speaker coverage high, the paper evaluates the F-measure of a diarization module, which achieves high scores (>85%), especially when the clusters are longer than 30 seconds, even for the more spontaneous and expressive styles (such as talk shows or sports).
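The abstract does not say how its F-measure is computed. The following is a minimal sketch, assuming it is the usual weighted harmonic mean of cluster purity and speaker coverage; the function name `f_measure`, the `beta` parameter and the example values are hypothetical and not taken from the paper.

```python
def f_measure(purity: float, coverage: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of cluster purity and speaker coverage.

    Assumption: both inputs are fractions in [0, 1]. beta > 1 weights
    coverage more heavily, beta < 1 weights purity more heavily.
    """
    if not (0.0 <= purity <= 1.0 and 0.0 <= coverage <= 1.0):
        raise ValueError("purity and coverage must lie in [0, 1]")
    if purity == 0.0 and coverage == 0.0:
        return 0.0  # avoid division by zero when both scores are zero
    b2 = beta * beta
    return (1.0 + b2) * purity * coverage / (b2 * purity + coverage)


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    print(round(f_measure(0.90, 0.80), 4))
```

Because a harmonic mean is pulled toward the smaller of its two inputs, a system cannot score well by producing very pure clusters that cover only a small part of each speaker's speech, or the reverse.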
Abstract:
One of the biggest challenges in speech synthesis is the production of natural-sounding synthetic voices. This means that the resulting voice must not only be of high enough quality but must also capture the natural expressiveness of human speech. This paper focuses on the expressiveness problem, proposing a set of techniques for extrapolating the expressiveness of proven high-quality speaking style models to neutral speakers in HMM-based synthesis. As an additional advantage, the proposed techniques are based on adaptation approaches, so they can be used with little training data (around 15 minutes per style in this paper). For the final implementation, a set of 4 speaking styles was considered: news broadcasts, live sports commentary, interviews and parliamentary speech. Finally, the implementation of the 5 techniques was tested through a perceptual evaluation, which shows that the deviations between neutral and speaking style average models can be learned and used to imbue target neutral speakers with expressiveness, as intended.
Abstract:
Voice therapies for muscle tension dysphonia in Germany need to become more effective through intensive, manualized procedures and standardized assessment protocols. The same holds true for therapies for disturbed singers' voices. According to a Cochrane review on the effectiveness of therapies for functional dysphonia, neither direct nor indirect voice therapy alone is effective; only combinations of both elements are (Ruotsalainen et al., 2007).
Abstract:
The universal use of speech synthesis in different applications would require new voices to be easy to develop, with little manual intervention. Given the amount of multimedia data available on the Internet and in the media, an interesting goal is to develop tools and methods for automatically building voices in several styles from such data. In previous work, a methodology for building this kind of tool was outlined and preliminary experiments with a multi-style database were presented. In this paper we investigate the task further and propose several improvements based on selecting the appropriate number of initial speakers, using or not using noise reduction filters, using F0, and using a music detection algorithm. We show that the best system, which uses a music detection algorithm, reduces the precision error by 22.36% relative on the development set and by 39.64% relative on the test set compared with the baseline system, without degrading the figure of merit. The average precision on the test set is 90.62%, ranging from 76.18% for news reports to 99.93% for weather reports.
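The relative figures reported above can be read as relative reductions in precision error. The sketch below assumes that precision error is simply 1 minus precision, which the abstract does not state; the function name and the example values are hypothetical and are not the paper's data.

```python
def relative_error_reduction(baseline_precision: float,
                             improved_precision: float) -> float:
    """Fraction by which the precision error shrinks relative to the baseline.

    Assumption: precision error = 1 - precision, with precision in [0, 1].
    """
    baseline_error = 1.0 - baseline_precision
    improved_error = 1.0 - improved_precision
    if baseline_error <= 0.0:
        raise ValueError("baseline has no error left to reduce")
    return (baseline_error - improved_error) / baseline_error


if __name__ == "__main__":
    # Hypothetical example: precision rises from 80% to 90%, so the
    # error falls from 20% to 10%, a relative reduction of about 50%.
    print(f"{relative_error_reduction(0.80, 0.90):.1%}")
```

A relative figure depends on the baseline: the same absolute gain of 10 points is a much larger relative reduction when the baseline error is already small.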
Abstract:
One of the biggest challenges in speech synthesis is the production of contextually appropriate, natural-sounding synthetic voices. This means that a text-to-speech (TTS) system must be able to analyze a text beyond sentence boundaries in order to select, or even modulate, the speaking style according to a broader context. Our current architecture is based on a two-step approach: text genre identification, followed by speaking style synthesis according to the detected discourse genre. For the final implementation, a set of four genres and their corresponding speaking styles was considered: broadcast news, live sports commentary, interviews and political speeches. In the final TTS evaluation, the four speaking styles were transplanted to the neutral voices of other speakers not included in the training database. When the transplanted styles were compared with the neutral voices, transplantation was significantly preferred, and the similarity to the target speaker was as high as 78%.
Abstract:
The genetic history of a group of populations is usually analyzed by reconstructing a tree of their origins. The reliability of the reconstruction depends on the validity of the hypothesis that genetic differentiation of the populations is mostly due to population fissions followed by independent evolution. If necessary, adjustments can be made for major population admixtures. Dating the fissions requires comparison with paleoanthropological and paleontological dates, which are few and uncertain. A recently introduced method of absolute genetic dating uses mutation rates as molecular clocks; it was applied to human evolution using microsatellites, which have a sufficiently high mutation rate. The results are comparable with those of other methods and agree with a recent expansion of modern humans from Africa. An alternative method of analysis, useful when geographic coverage of regions is adequate, is the geographic study of allele or haplotype frequencies. As with trees, data from many loci must be summarized for conclusions to be acceptable, and results must be independent of the loci used. Multivariate analyses such as principal components or multidimensional scaling reveal a number of hidden patterns and evaluate their relative importance. Most patterns found in the analysis of living human populations are likely to be consequences of demographic expansions, driven by technological developments affecting food availability, transportation, or military power. During such expansions, both genes and languages are spread over potentially vast areas. In principle, this tends to create a correlation between the respective evolutionary trees. The correlation is usually positive and often remarkably high. It can be reduced or hidden by language replacement and also by gene replacement, usually partial, due to gene flow.
Abstract:
This paper presents the teaching experience of a multidisciplinary network of researchers (Red I+Do+i), in which teaching staff and students have taken part. The main objective of the "Speaking Library" teaching experience is twofold. On the one hand, it aims to produce working documents specialized in teaching research and in curricular subjects relevant to students, in formats that are more accessible, attractive and useful for the educational community. Special attention has been paid to students with Specific Educational Support Needs (NEAE), and for this reason the materials created have been mainly audiovisual. On the other hand, it aims to manage these materials through university repositories (Universidad de Alicante and Universidad Miguel Hernández) and a YouTube teaching channel (canal IDOi), for subsequent national and international dissemination to suitable databases and portals (OCW, blogs UA, VUALA, external blogs, etc.) that will make them easier to consult. The results and final reflections present several documents converted to a friendly, visual format that is valuable for students with NEAE. In addition, a protocol for preparing such documents has been designed and a teaching channel has been created on YouTube.