34 resultados para speech segmentation
em Consorci de Serveis Universitaris de Catalunya (CSUC), Spain
Resumo:
In image segmentation, clustering algorithms are very popular because they are intuitive and, some of them, easy to implement. For instance, the k-means is one of the most used in the literature, and many authors successfully compare their new proposal with the results achieved by the k-means. However, it is well known that clustering image segmentation has many problems. For instance, the number of regions of the image has to be known a priori, as well as different initial seed placement (initial clusters) could produce different segmentation results. Most of these algorithms could be slightly improved by considering the coordinates of the image as features in the clustering process (to take spatial region information into account). In this paper we propose a significant improvement of clustering algorithms for image segmentation. The method is qualitatively and quantitative evaluated over a set of synthetic and real images, and compared with classical clustering approaches. Results demonstrate the validity of this new approach
Resumo:
In this paper, an information theoretic framework for image segmentation is presented. This approach is based on the information channel that goes from the image intensity histogram to the regions of the partitioned image. It allows us to define a new family of segmentation methods which maximize the mutual information of the channel. Firstly, a greedy top-down algorithm which partitions an image into homogeneous regions is introduced. Secondly, a histogram quantization algorithm which clusters color bins in a greedy bottom-up way is defined. Finally, the resulting regions in the partitioning algorithm can optionally be merged using the quantized histogram
Real-Time implementation of a blind authentication method using self-synchronous speech watermarking
Resumo:
A blind speech watermarking scheme that meets hard real-time deadlines is presented and implemented. In addition, one of the key issues in these block-oriented watermarking techniques is to preserve the synchronization. Namely, to recover the exact position of each block in the mark extract process. In fact, the presented scheme can be split up into two distinguished parts, the synchronization and the information mark methods. The former is embedded into the time domain and it is fast enough to be run meeting real-time requirements. The latter contains the authentication information and it is embedded into the wavelet domain. The synchronization and information mark techniques are both tunable in order to allow a con gurable method. Thus, capacity, transparency and robustness can be con gured depending on the needs. It makes the scheme useful for professional applications, such telephony authentication or even sending information throw radio applications.
Resumo:
A novel technique for estimating the rank of the trajectory matrix in the local subspace affinity (LSA) motion segmentation framework is presented. This new rank estimation is based on the relationship between the estimated rank of the trajectory matrix and the affinity matrix built with LSA. The result is an enhanced model selection technique for trajectory matrix rank estimation by which it is possible to automate LSA, without requiring any a priori knowledge, and to improve the final segmentation
Resumo:
In this paper a novel rank estimation technique for trajectories motion segmentation within the Local Subspace Affinity (LSA) framework is presented. This technique, called Enhanced Model Selection (EMS), is based on the relationship between the estimated rank of the trajectory matrix and the affinity matrix built by LSA. The results on synthetic and real data show that without any a priori knowledge, EMS automatically provides an accurate and robust rank estimation, improving the accuracy of the final motion segmentation
Resumo:
Los hablantes bilingües tienen un acceso al léxico más lento y menos robusto que los monolingües, incluso cuando hablan en su lengua materna y dominante. Este fenómeno, comúnmente llamado “la desventaja bilingüe” también se observa en hablantes de una segunda lengua en comparación con hablantes de una primera lengua. Una causa que posiblemente contribuya a estas desventajas es el uso de control inhibitorio durante la producción del lenguaje: la inhibición de palabras coactivadas de la lengua actualmente no en uso puede prevenir intrusiones de dicha lengua, pero al mismo tiempo ralentizar la producción del lenguaje. El primer objetivo de los estudios descritos en este informe era testear esta hipótesis mediante diferentes predicciones generadas por teorías de control inhibitorio del lenguaje. Un segundo objetivo era investigar la extensión de la desventaja bilingüe dentro y fuera de la producción de palabras aisladas, así como avanzar en el conocimiento de las variables que la modulan. En lo atingente al primer objetivo, la evidencia obtenida es incompatible con un control inhibitorio global, desafiando la idea de mecanismos específicos en el hablante bilingüe utilizados para la selección léxica. Esto implica que una explicación común para el control de lenguaje y la desventaja bilingüe en el acceso al léxico es poco plausible. En cuanto al segundo objetivo, los resultados muestran que (a) la desventaja bilingüe no tiene un impacto al acceso a la memoria; (b) la desventaja bilingüe extiende a la producción del habla conectada; y (c) similitudes entre lenguas a diferentes niveles de representación así como la frecuencia de uso son factores que modulan la desventaja bilingüe.
Resumo:
The project aims at advancing the state of the art in the use of context information for classification of image and video data. The use of context in the classification of images has been showed of great importance to improve the performance of actual object recognition systems. In our project we proposed the concept of Multi-scale Feature Labels as a general and compact method to exploit the local and global context. The feature extraction from the discriminative probability or classification confidence label field is of great novelty. Moreover the use of a multi-scale representation of the feature labels lead to a compact and efficient description of the context. The goal of the project has been also to provide a general-purpose method and prove its suitability in different image/video analysis problem. The two-year project generated 5 journal publications (plus 2 under submission), 10 conference publications (plus 2 under submission) and one patent (plus 1 pending). Of these publications, a relevant number make use of the main result of this project to improve the results in detection and/or segmentation of objects.
Resumo:
Purpose: To evaluate the suitability of an improved version of an automatic segmentation method based on geodesic active regions (GAR) for segmenting cerebral vasculature with aneurysms from 3D X-ray reconstruc-tion angiography (3DRA) and time of °ight magnetic resonance angiography (TOF-MRA) images available in the clinical routine.Methods: Three aspects of the GAR method have been improved: execution time, robustness to variability in imaging protocols and robustness to variability in image spatial resolutions. The improved GAR was retrospectively evaluated on images from patients containing intracranial aneurysms in the area of the Circle of Willis and imaged with two modalities: 3DRA and TOF-MRA. Images were obtained from two clinical centers, each using di®erent imaging equipment. Evaluation included qualitative and quantitative analyses ofthe segmentation results on 20 images from 10 patients. The gold standard was built from 660 cross-sections (33 per image) of vessels and aneurysms, manually measured by interventional neuroradiologists. GAR has also been compared to an interactive segmentation method: iso-intensity surface extraction (ISE). In addition, since patients had been imaged with the two modalities, we performed an inter-modality agreement analysis with respect to both the manual measurements and each of the two segmentation methods. Results: Both GAR and ISE di®ered from the gold standard within acceptable limits compared to the imaging resolution. GAR (ISE, respectively) had an average accuracy of 0.20 (0.24) mm for 3DRA and 0.27 (0.30) mm for TOF-MRA, and had a repeatability of 0.05 (0.20) mm. Compared to ISE, GAR had a lower qualitative error in the vessel region and a lower quantitative error in the aneurysm region. The repeatabilityof GAR was superior to manual measurements and ISE. The inter-modality agreement was similar between GAR and the manual measurements. Conclusions: The improved GAR method outperformed ISE qualitatively as well as quantitatively and is suitable for segmenting 3DRA and TOF-MRA images from clinical routine.
Resumo:
This case study presents corpus data gathered from a Spanish-English bilingual child with expressive language delay. Longitudinal data on the child’s linguistic development was collected from the onset of productive speech at age 1;1 until age 4 over the course of 28 video-taped sessions with the child’s principal caregivers. A literature review focused on the relationship between language delay and persisting disorders—including a discussion of the frequent difficulty in distinguishing between the two at early stages of bilingual development—is followed by an analysis of the child’s productive development in 2 distinct phases. An attempt is made to assess the child’s speech at age 4 for preliminary signs of SLI and to consider techniques for identifying ‘at risk’ bilingual children (that is, those with productive language delay, poor oral fluency, and family history of language problems) based on samples of recorded and transcribed speech.
Resumo:
Background Accurate automatic segmentation of the caudate nucleus in magnetic resonance images (MRI) of the brain is of great interest in the analysis of developmental disorders. Segmentation methods based on a single atlas or on multiple atlases have been shown to suitably localize caudate structure. However, the atlas prior information may not represent the structure of interest correctly. It may therefore be useful to introduce a more flexible technique for accurate segmentations. Method We present Cau-dateCut: a new fully-automatic method of segmenting the caudate nucleus in MRI. CaudateCut combines an atlas-based segmentation strategy with the Graph Cut energy-minimization framework. We adapt the Graph Cut model to make it suitable for segmenting small, low-contrast structures, such as the caudate nucleus, by defining new energy function data and boundary potentials. In particular, we exploit information concerning the intensity and geometry, and we add supervised energies based on contextual brain structures. Furthermore, we reinforce boundary detection using a new multi-scale edgeness measure. Results We apply the novel CaudateCut method to the segmentation of the caudate nucleus to a new set of 39 pediatric attention-deficit/hyperactivity disorder (ADHD) patients and 40 control children, as well as to a public database of 18 subjects. We evaluate the quality of the segmentation using several volumetric and voxel by voxel measures. Our results show improved performance in terms of segmentation compared to state-of-the-art approaches, obtaining a mean overlap of 80.75%. Moreover, we present a quantitative volumetric analysis of caudate abnormalities in pediatric ADHD, the results of which show strong correlation with expert manual analysis. Conclusion CaudateCut generates segmentation results that are comparable to gold-standard segmentations and which are reliable in the analysis of differentiating neuroanatomical abnormalities between healthy controls and pediatric ADHD.
Resumo:
This special issue aims to cover some problems related to non-linear and nonconventional speech processing. The origin of this volume is in the ISCA Tutorial and Research Workshop on Non-Linear Speech Processing, NOLISP’09, held at the Universitat de Vic (Catalonia, Spain) on June 25–27, 2009. The series of NOLISP workshops started in 2003 has become a biannual event whose aim is to discuss alternative techniques for speech processing that, in a sense, do not fit into mainstream approaches. A selected choice of papers based on the presentations delivered at NOLISP’09 has given rise to this issue of Cognitive Computation.
Resumo:
The work presented here is part of a larger study to identify novel technologies and biomarkers for early Alzheimer disease (AD) detection and it focuses on evaluating the suitability of a new approach for early AD diagnosis by non-invasive methods. The purpose is to examine in a pilot study the potential of applying intelligent algorithms to speech features obtained from suspected patients in order to contribute to the improvement of diagnosis of AD and its degree of severity. In this sense, Artificial Neural Networks (ANN) have been used for the automatic classification of the two classes (AD and control subjects). Two human issues have been analyzed for feature selection: Spontaneous Speech and Emotional Response. Not only linear features but also non-linear ones, such as Fractal Dimension, have been explored. The approach is non invasive, low cost and without any side effects. Obtained experimental results were very satisfactory and promising for early diagnosis and classification of AD patients.
Resumo:
Alzheimer's disease is the most prevalent form of progressive degenerative dementia; it has a high socio-economic impact in Western countries. Therefore it is one of the most active research areas today. Alzheimer's is sometimes diagnosed by excluding other dementias, and definitive confirmation is only obtained through a post-mortem study of the brain tissue of the patient. The work presented here is part of a larger study that aims to identify novel technologies and biomarkers for early Alzheimer's disease detection, and it focuses on evaluating the suitability of a new approach for early diagnosis of Alzheimer’s disease by non-invasive methods. The purpose is to examine, in a pilot study, the potential of applying Machine Learning algorithms to speech features obtained from suspected Alzheimer sufferers in order help diagnose this disease and determine its degree of severity. Two human capabilities relevant in communication have been analyzed for feature selection: Spontaneous Speech and Emotional Response. The experimental results obtained were very satisfactory and promising for the early diagnosis and classification of Alzheimer’s disease patients.
Resumo:
Alzheimer’s disease (AD) is the most prevalent form of progressive degenerative dementia and it has a high socio-economic impact in Western countries, therefore is one of the most active research areas today. Its diagnosis is sometimes made by excluding other dementias, and definitive confirmation must be done trough a post-mortem study of the brain tissue of the patient. The purpose of this paper is to contribute to im-provement of early diagnosis of AD and its degree of severity, from an automatic analysis performed by non-invasive intelligent methods. The methods selected in this case are Automatic Spontaneous Speech Analysis (ASSA) and Emotional Temperature (ET), that have the great advantage of being non invasive, low cost and without any side effects.
Resumo:
This article analyses different factors that influence the purchasing behaviour of online supermarket customers. These factors are related to both the appearance of the website as well as the processes that take place when making the purchase. Based on these analyses, the various groups of consumers with homogenous behaviour are studied and positioned according to their attitudes. The analysis also allows the quality of the service offered by this kind of establishment to be defined, as well as the main dimensions in which it develops. In the conclusions, factors which should influence the manager of an online supermarket to improve the quality of its service are given