914 results for robust speech recognition
Abstract:
Spoken word recognition, during gating, appears intact in specific language impairment (SLI). This study used gating to investigate the process in adolescents with autism spectrum disorders plus language impairment (ALI). Adolescents with ALI, SLI, and typical language development (TLD), matched on nonverbal IQ, listened to gated words that varied in frequency (low/high) and number of phonological onset neighbors (low/high density). Adolescents with ALI required more speech input to initially identify low-frequency words with low competitor density than those with SLI and those with TLD, who did not differ. These differences may be due to less well-specified word form representations in ALI.
Abstract:
Background and aims: In addition to the well-known linguistic processing impairments in aphasia, oro-motor skills and the articulatory implementation of speech segments are reported to be compromised to some degree in most types of aphasia. This study aimed to identify differences in the characteristics and coordination of lip movements in the production of a bilabial closure gesture between speech-like and nonspeech tasks in individuals with aphasia and healthy control subjects. Method and procedure: Upper and lower lip movement data were collected for a speech-like and a nonspeech task using an AG 100 EMMA system from five individuals with aphasia and five age- and gender-matched control subjects. Each task was produced at two rate conditions (normal and fast), and in a familiar and a less-familiar manner. Single-articulator kinematic parameters (peak velocity, amplitude, duration, and cyclic spatio-temporal index) and multi-articulator coordination indices (average relative phase and variability of relative phase) were measured to characterize lip movements. Outcome and results: The results showed that when the two lips had similar task goals (bilabial closure) in the speech-like versus the nonspeech task, kinematic and coordination characteristics were not found to be different. However, when changes in rate were imposed on the bilabial gesture, only the speech-like task showed functional adaptations, indicated by a greater decrease in amplitude and duration at fast rates. In terms of group differences, individuals with aphasia showed smaller amplitudes and longer movement durations for the upper lip, higher spatio-temporal variability for both lips, and higher variability in lip coordination than the control speakers. Rate was an important factor in distinguishing the two groups, and individuals with aphasia were limited in implementing the rate changes.
Conclusion and implications: The findings support the notion of subtle but robust differences in motor control characteristics between individuals with aphasia and the control participants, even in the context of producing bilabial closing gestures for a relatively simple speech-like task. The findings also highlight the functional differences between speech-like and nonspeech tasks, despite a common movement coordination goal for bilabial closure.
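The single-articulator measures named above (amplitude, duration, peak velocity) can be sketched for one sampled lip trace. The half-cosine gesture, sampling rate, and units below are illustrative assumptions, not the study's data:

```python
import math

def kinematics(trace, fs):
    """Single-articulator measures for one lip closing gesture.
    trace: sampled lip positions (mm); fs: sampling rate (Hz).
    Returns (amplitude_mm, duration_s, peak_velocity_mm_s)."""
    amplitude = max(trace) - min(trace)
    duration = (len(trace) - 1) / fs
    # first-difference velocity estimate between adjacent samples
    velocity = [(b - a) * fs for a, b in zip(trace, trace[1:])]
    peak_velocity = max(abs(v) for v in velocity)
    return amplitude, duration, peak_velocity

# synthetic half-cosine closing gesture: 10 mm excursion over 0.2 s at 200 Hz
fs, n = 200, 40
trace = [5.0 * (1.0 - math.cos(math.pi * i / n)) for i in range(n + 1)]
amp, dur, pv = kinematics(trace, fs)
```

The cyclic spatio-temporal index and relative-phase measures of the study build on the same sampled traces but require multiple repetitions and a second articulator.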
Abstract:
This paper presents a video surveillance framework that robustly and efficiently detects abandoned objects in surveillance scenes. The framework is based on a novel threat assessment algorithm which combines the concept of ownership with automatic understanding of social relations in order to infer abandonment of objects. Implementation is achieved through the development of a logic-based inference engine in Prolog. Threat detection performance is evaluated by testing against a range of datasets describing realistic situations, and demonstrates a reduction in the number of false alarms generated. The proposed system represents the approach employed in the EU SUBITO project (Surveillance of Unattended Baggage and the Identification and Tracking of the Owner).
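As a rough illustration of the kind of rule such an inference engine encodes: an item is abandoned only when neither the owner nor anyone in the owner's social group stays near it. The distance and time thresholds below are invented for illustration; the actual SUBITO engine expresses such rules in Prolog:

```python
def is_abandoned(group_distances, dist_thresh=3.0, time_thresh=10):
    """Flag an item as abandoned when the minimum distance (metres) from
    the item to its owner or anyone in the owner's social group has
    exceeded dist_thresh for at least time_thresh consecutive frames.
    group_distances: one minimum-distance value per video frame."""
    run = 0
    for d in group_distances:
        run = run + 1 if d > dist_thresh else 0
        if run >= time_thresh:
            return True
    return False
```

Conditioning the alarm on the whole social group, not just the tracked owner, is what suppresses false alarms when a companion is left watching the bag.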
Abstract:
Bowen and colleagues’ methods and conclusions raise concerns.1 At best, the trial evaluates the variability in current practice. In no way is it a robust test of treatment. Two communication impairments (aphasia and dysarthria) were included. In the post-acute stage spontaneous recovery is highly unpredictable, and changes in the profile of impairment during this time are common.2 Both impairments manifest in different forms,3 which may be more or less responsive to treatment. A third kind of impairment, apraxia of speech, was not excluded but was not targeted in therapy. All three impairments can and do co-occur. Whether randomised controlled trial designs can effectively cope with such complex disorders has been discussed elsewhere.4 Treatment was defined within terms of current practice but was unconstrained. Therefore, the treatment group would have received a variety of therapeutic approaches and protocols, some of which may indeed be ineffective. Only 53% of the contact time with a speech and language therapist was direct (one to one); the rest was impairment-based therapy. In contrast, all of the visitors’ time was direct contact, usually in conversation. In both groups, the frequency and length of contact time varied. We already know that the transfer from impairment-based therapy to functional communication can be limited and varies across individuals.5 However, it is not possible to conclude from this trial that one-to-one impairment-based therapy should be replaced. For that, a well-defined impairment therapy protocol must be directly compared with a similarly well-defined functional communication therapy, with an attention control.
Abstract:
A new, healable, supramolecular nanocomposite material has been developed and evaluated. The material comprises a blend of three components: a pyrene-functionalized polyamide, a polydiimide and pyrene-functionalized gold nanoparticles (P-AuNPs). The polymeric components interact by forming well-defined π–π stacked complexes between π-electron-rich pyrenyl residues and π-electron-deficient polydiimide residues. Solution studies in the mixed solvent chloroform–hexafluoroisopropanol (6 : 1, v/v) show that mixing the three components (each of which is soluble in isolation) results in the precipitation of a supramolecular, polymer nanocomposite network. The precipitate thus formed can be re-dissolved on heating, with the thermoreversible dissolution/precipitation procedure repeatable over at least 5 cycles. Robust, self-supporting composite films containing up to 15 wt% P-AuNPs could be cast from 2,2,2-trichloroethanol. Addition of as little as 1.25 wt% P-AuNPs resulted in significantly enhanced mechanical properties compared to the supramolecular blend without nanoparticles. The nanocomposites showed a linear increase in both tensile moduli and ultimate tensile strength with increasing P-AuNP content. All compositions up to 10 wt% P-AuNPs exhibited essentially quantitative healing efficiencies. Control experiments on an analogous nanocomposite material containing dodecylamine-functionalized AuNPs (5 wt%) exhibited a tensile modulus approximately half that of the corresponding nanocomposite that incorporated 5 wt% pyrene-functionalized AuNPs, clearly demonstrating the importance of the designed interactions between the gold filler and the supramolecular polymer matrix.
Abstract:
Anti-spoofing is attracting growing interest in biometrics, considering the variety of fake materials and new means to attack biometric recognition systems. New unseen materials continuously challenge state-of-the-art spoofing detectors, suggesting the need for additional systematic approaches to anti-spoofing. By incorporating liveness scores into the biometric fusion process, recognition accuracy can be enhanced, but traditional sum-rule based fusion algorithms are known to be highly sensitive to single spoofed instances. This paper investigates 1-median filtering as a spoofing-resistant generalised alternative to the sum-rule, targeting the problem of partial multibiometric spoofing where m out of n biometric sources to be combined are attacked. Augmenting previous work, this paper investigates the dynamic detection and rejection of liveness-recognition pair outliers for spoofed samples in a true multi-modal configuration, with its inherent challenge of normalisation. As a further contribution, a bootstrap aggregating (bagging) classifier for fingerprint spoof detection is presented. Experiments on the latest face video databases (Idiap Replay-Attack Database and CASIA Face Anti-Spoofing Database) and a fingerprint spoofing database (Fingerprint Liveness Detection Competition 2013) illustrate the efficiency of the proposed techniques.
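The sensitivity of sum-rule fusion to a single spoofed instance can be seen on toy match scores. The score values below are invented, and the plain median stands in for the paper's more general 1-median filtering:

```python
from statistics import median

def sum_rule(scores):
    """Classical sum-rule fusion: mean of the n per-source match scores."""
    return sum(scores) / len(scores)

def median_rule(scores):
    """Median-based fusion: robust while fewer than half the sources are spoofed."""
    return median(scores)

# invented impostor match scores for four biometric sources (higher = better match)
impostor = [0.10, 0.15, 0.10, 0.20]
attacked = [0.10, 0.15, 0.95, 0.20]   # same impostor, one source spoofed

mean_shift = sum_rule(attacked) - sum_rule(impostor)
median_shift = median_rule(attacked) - median_rule(impostor)
```

The single spoofed score drags the mean far more than the median, which is the robustness property the paper generalises for m-out-of-n attacks.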
Abstract:
For the last two decades, researchers have been working on developing systems that can assist drivers in the best way possible and make driving safe. Computer vision has played a crucial part in the design of these systems. With the introduction of vision techniques, various autonomous and robust real-time traffic automation systems have been designed, such as traffic monitoring, traffic-related parameter estimation, and intelligent vehicles. Among these, automatic detection and recognition of road signs has become an interesting research topic. Such a system can alert drivers to signs they do not recognize before passing them. The aim of this research project is to present an intelligent road sign recognition system based on a state-of-the-art technique, the Support Vector Machine. The project is an extension of the work done at the ITS research platform at Dalarna University [25]. The focus of this research work is on the recognition of the road signs under analysis. When classifying an image, its location, size, and orientation in the image plane are irrelevant features, and one way to remove this ambiguity is to extract features which are invariant under the above-mentioned transformations. These invariant features are then used in a Support Vector Machine for classification. The Support Vector Machine is a supervised learning machine that solves problems in higher dimensions with the help of kernel functions and is best known for classification problems.
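One classical example of a feature invariant under translation, rotation, and scaling is Hu's first moment invariant, built from normalized central moments. The sketch below is a generic illustration of that idea on a binary image, not the project's exact feature set:

```python
def hu1(image):
    """First Hu invariant, phi1 = eta20 + eta02, of a binary image
    (nested lists of 0/1). Invariant to translation and rotation, and
    approximately to scale on a pixel grid."""
    pts = [(y, x) for y, row in enumerate(image)
                  for x, v in enumerate(row) if v]
    m00 = len(pts)                                # area
    ybar = sum(y for y, _ in pts) / m00           # centroid
    xbar = sum(x for _, x in pts) / m00
    mu20 = sum((x - xbar) ** 2 for _, x in pts)   # central moments
    mu02 = sum((y - ybar) ** 2 for y, _ in pts)
    return (mu20 + mu02) / m00 ** 2               # normalized sum

# the same 3x3 square at two image positions yields the same feature
square = [[0] * 8 for _ in range(8)]
shifted = [[0] * 8 for _ in range(8)]
for y in range(3):
    for x in range(3):
        square[y][x] = 1
        shifted[y + 4][x + 4] = 1
```

A vector of such invariants per sign image would then be fed to the SVM for classification.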
Abstract:
The project introduces an application using computer vision for hand gesture recognition. A camera records a live video stream, from which a snapshot is taken with the help of an interface. The system is trained for each type of counting hand gesture (one, two, three, four, and five) at least once. After that, a test gesture is given to it and the system tries to recognize it. Research was carried out on a number of algorithms that could best differentiate hand gestures. It was found that the diagonal sum algorithm gave the highest accuracy rate. In the preprocessing phase, a self-developed algorithm removes the background of each training gesture. After that, the image is converted into a binary image and the sums of all diagonal elements of the picture are taken. These sums help in differentiating and classifying different hand gestures. Previous systems have used data gloves or markers for input; this system has no such constraints, and the user can give hand gestures in view of the camera naturally. A completely robust hand gesture recognition system is still under heavy research and development; the implemented system serves as an extendible foundation for future work.
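The diagonal sum feature described above can be sketched directly on a binary image represented as nested lists. This is a minimal sketch of the idea, not the project's code:

```python
def diagonal_sums(binary_img):
    """Sum the pixels along every diagonal of a binary image.
    Returns h + w - 1 sums, one per diagonal of constant (x - y)."""
    h, w = len(binary_img), len(binary_img[0])
    sums = [0] * (h + w - 1)
    for y in range(h):
        for x in range(w):
            sums[x - y + h - 1] += binary_img[y][x]
    return sums

# toy 3x3 "gesture" after background removal and binarisation
img = [[1, 0, 1],
       [0, 1, 0],
       [1, 0, 1]]
feature = diagonal_sums(img)
```

The resulting vector of diagonal sums is what gets compared across gesture classes.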
Abstract:
This thesis presents a system to recognise and classify road and traffic signs for the purpose of developing an inventory of them which could assist the highway engineers’ tasks of updating and maintaining them. It uses images taken by a camera from a moving vehicle. The system is based on three major stages: colour segmentation, recognition, and classification. Four colour segmentation algorithms are developed and tested: a shadow and highlight invariant algorithm, a dynamic threshold algorithm, a modification of de la Escalera’s algorithm, and a fuzzy colour segmentation algorithm. All algorithms are tested using hundreds of images, and the shadow-highlight invariant algorithm is eventually chosen as the best performer because it is immune to shadows and highlights. It is also robust, as it was tested in different lighting conditions, weather conditions, and times of day. Approximately a 97% successful segmentation rate was achieved using this algorithm. Recognition of traffic signs is carried out using a fuzzy shape recogniser. Based on four shape measures (rectangularity, triangularity, ellipticity, and octagonality), fuzzy rules were developed to determine the shape of the sign. Among these shape measures, octagonality has been introduced in this research. The final decision of the recogniser is based on the combination of both the colour and shape of the sign. The recogniser was tested in a variety of conditions, giving an overall performance of approximately 88%. Classification was undertaken using a Support Vector Machine (SVM) classifier. The classification is carried out in two stages: classification of the rim’s shape, followed by classification of the interior of the sign. The classifier was trained and tested using binary images in addition to five different types of moments, which are geometric moments, Zernike moments, Legendre moments, orthogonal Fourier-Mellin moments, and binary Haar features.
The performance of the SVM was tested using different features, kernels, SVM types, SVM parameters, and moment orders. The average classification rate achieved is about 97%. Binary images show the best testing results, followed by Legendre moments. The linear kernel gives the best testing results, followed by RBF. C-SVM shows very good performance, but ν-SVM gives better results in some cases.
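One of the four shape measures, rectangularity, can be sketched as region area over bounding-box area; the axis-aligned box below is a simplification of the thesis's measure, and the triangularity, ellipticity, and octagonality analogues follow the same pattern with different reference shapes:

```python
def rectangularity(pixels):
    """Region area divided by axis-aligned bounding-box area.
    Close to 1 for rectangles, near 0.5 for triangles."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    bbox_area = (max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1)
    return len(pixels) / bbox_area

# synthetic filled shapes on a pixel grid
rect = {(x, y) for x in range(4) for y in range(6)}      # 4x6 rectangle
tri = {(x, y) for y in range(6) for x in range(y + 1)}   # right triangle
```

Fuzzy rules over such measures then vote for the most plausible sign shape before the SVM classifies the sign's content.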
Abstract:
The purpose of this study was to determine the influence of hearing protection devices (HPDs) on the understanding of speech in young adults with normal hearing, both in a silent situation and in the presence of ambient noise. The experimental research was carried out with the following variables: five different conditions of HPD use (without protectors, with two types of earplugs, and with two types of earmuffs); a type of noise (pink noise); 4 test levels (60, 70, 80 and 90 dB[A]); 6 signal/noise ratios (no noise, +5, +10, 0, -5 and -10 dB); and 5 repetitions for each case, totalling 600 tests with 10 monosyllables in each one. The measured variable was the percentage of correctly heard (monosyllabic) words in the test. The results revealed that, at the lowest levels (60 and 70 dB), the protectors reduced the intelligibility of speech (compared to the tests without protectors) while, in the presence of ambient noise levels of 80 and 90 dB and unfavourable signal/noise ratios (0, -5 and -10 dB), the HPDs improved the intelligibility. A comparison of the effectiveness of earplugs versus earmuffs showed that the former offer greater efficiency with respect to the recognition of speech, providing a 30% improvement over situations in which no protection is used. As might be expected, this study confirmed that the protectors' influence on speech intelligibility is related directly to the spectral curve of the protector's attenuation. (C) 2003 Elsevier B.V. Ltd. All rights reserved.
Abstract:
Dental recognition is very important for forensic human identification, mainly in mass disasters, which have frequently happened due to tsunamis, airplane crashes, etc. Algorithms for automatic, precise, and robust teeth segmentation from radiograph images are crucial for dental recognition. In this work we propose the use of a graph-based algorithm to extract the teeth contours from panoramic dental radiographs, which are used as dental features. In order to assess our proposal, we have carried out experiments using a database of 1126 tooth images, obtained from 40 panoramic dental radiograph images from 20 individuals. The results of the graph-based algorithm were qualitatively assessed by a human expert, who reported excellent scores. For dental recognition we propose the use of the teeth shapes as biometric features, by means of BAS (Beam Angle Statistics) and Shape Context descriptors. The BAS descriptors showed, on the same database, a better performance (EER 14%) than the Shape Context (EER 20%). © 2012 IEEE.
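The EER figures quoted can be reproduced in spirit with a small threshold sweep over genuine and impostor distance scores. The scores below are synthetic, not the paper's data:

```python
def eer(genuine, impostor):
    """Equal error rate from genuine and impostor *distance* scores
    (smaller distance = better match), via a simple threshold sweep."""
    best = None
    for t in sorted(set(genuine + impostor)):
        frr = sum(g > t for g in genuine) / len(genuine)     # false rejects
        far = sum(i <= t for i in impostor) / len(impostor)  # false accepts
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)   # EER approximated at the crossing
    return best[1]

# synthetic shape-matching distances
separable = eer([0.10, 0.20, 0.30], [0.70, 0.80, 0.90])   # well separated
overlapping = eer([0.10, 0.50], [0.40, 0.90])             # distributions overlap
```

Well-separated descriptor distances yield an EER of zero; the 14% versus 20% gap in the paper reflects how much the BAS and Shape Context distance distributions overlap.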
Abstract:
Cognitive dysfunction is found in patients with brain tumors, and there is a need to determine whether it can be replicated in an experimental model. In the present study, the object recognition (OR) paradigm was used to investigate cognitive performance in nude mice, which represent one of the most important animal models available to study human tumors in vivo. Mice with orthotopic xenografts of the human U87MG glioblastoma cell line were trained at 9, 14, and 18 days (D9, D14, and D18, respectively) after implantation of 5×10^5 cells. At D9, the mice showed normal behavior when tested 90 min or 24 h after training and compared to control nude mice. Animals at D14 were still able to discriminate between familiar and novel objects, but exhibited a lower performance than animals at D9. Total impairment in OR memory was observed when animals were evaluated on D18. These alterations were detected earlier than any other clinical symptoms, which were observed only 22-24 days after tumor implantation. There was a significant correlation between the discrimination index (d2) and time after tumor implantation, as well as between d2 and tumor volume. These data indicate that the OR task is a robust test to identify early behavioral alterations caused by glioblastoma in nude mice. In addition, these results suggest that the OR task can be a reliable tool to test the efficacy of new therapies against these tumors.
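The discrimination index d2 used here is conventionally computed from the exploration times of the novel and familiar objects; the sketch below assumes that standard definition:

```python
def discrimination_index(t_novel, t_familiar):
    """OR discrimination index d2 = (TN - TF) / (TN + TF), from exploration
    times of the novel (TN) and familiar (TF) objects. Positive values
    indicate novelty preference; 0 indicates no recognition memory."""
    return (t_novel - t_familiar) / (t_novel + t_familiar)
```

A d2 drifting from positive values toward zero over D9, D14, and D18 is exactly the progressive impairment pattern the study reports.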
Abstract:
As tumour specimens and biopsy specimens become smaller, recognition of anatomical structures relevant for staging is increasingly challenging. So far no marker is known that reliably discriminates between muscularis propria (MP) and muscularis mucosae (MM) of the gastrointestinal tract. Recently, smoothelin expression has been shown to differ in MP and MM of the urinary bladder. We aimed to analyse the expression of smoothelin in the gastrointestinal tract in MP and MM in order to define a novel diagnostic tool to identify MM bundles.
Abstract:
We present a new approach for corpus-based speech enhancement that significantly improves over a method published by Xiao and Nickel in 2010. Corpus-based enhancement systems do not merely filter an incoming noisy signal, but resynthesize its speech content via an inventory of pre-recorded clean signals. The goal of the procedure is to perceptually improve the sound of speech signals in background noise. The proposed new method modifies Xiao's method in four significant ways. Firstly, it employs a Gaussian mixture model (GMM) instead of a vector quantizer in the phoneme recognition front-end. Secondly, the state decoding of the recognition stage is supported with an uncertainty modeling technique. With the GMM and the uncertainty modeling it is possible to eliminate the need for noise-dependent system training. Thirdly, the post-processing of the original method via sinusoidal modeling is replaced with a powerful cepstral smoothing operation. And lastly, due to these improvements, it is possible to extend the operational bandwidth of the procedure from 4 kHz to 8 kHz. The performance of the proposed method was evaluated across different noise types and different signal-to-noise ratios. The new method was able to significantly outperform traditional methods, including the one by Xiao and Nickel, in terms of PESQ scores and other objective quality measures. Results of subjective CMOS tests over a smaller set of test samples support our claims.
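The recognition front-end assigns each acoustic frame to the phoneme model that scores it highest. A single diagonal Gaussian per phoneme (with invented 2-D features and parameters) stands in below for the paper's per-phoneme GMMs:

```python
import math

def log_gauss(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def classify_frame(frame, models):
    """Assign the frame to the phoneme whose model gives it the highest
    log-likelihood (a one-component stand-in for per-phoneme GMMs)."""
    return max(models, key=lambda p: log_gauss(frame, *models[p]))

# invented 2-D acoustic features and model parameters for two phonemes
models = {
    "aa": ([1.0, 0.0], [0.5, 0.5]),
    "s":  ([-1.0, 2.0], [0.5, 0.5]),
}
```

In the full system these frame likelihoods feed a state decoder with uncertainty modeling rather than being used as hard per-frame decisions.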
Abstract:
Similarity measure is one of the main factors that affect the accuracy of intensity-based 2D/3D registration of X-ray fluoroscopy to CT images. Information theory has been used to derive similarity measures for image registration, leading to the introduction of mutual information, an accurate similarity measure for multi-modal and mono-modal image registration tasks. However, it is known that the standard mutual information measure only takes intensity values into account without considering spatial information, and its robustness is questionable. Previous attempts to incorporate spatial information into mutual information either require computing the entropy of higher-dimensional probability distributions, or are not robust to outliers. In this paper, we show how to incorporate spatial information into mutual information without suffering from these problems. Using a variational approximation derived from the Kullback-Leibler bound, spatial information can be effectively incorporated into mutual information via energy minimization. The resulting similarity measure has a least-squares form and can be effectively minimized by a multi-resolution Levenberg-Marquardt optimizer. Experimental results are presented on datasets from two applications: (a) intra-operative patient pose estimation from a few (e.g. 2) calibrated fluoroscopic images, and (b) post-operative cup alignment estimation from a single X-ray radiograph with gonadal shielding.
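The standard mutual information measure discussed above is estimated from the joint intensity histogram of the two images, which is why it ignores spatial arrangement. A minimal histogram-based sketch (bin count and intensity lists are illustrative):

```python
import math
from collections import Counter

def mutual_information(a, b, bins=8):
    """Mutual information (in nats) between two equal-length intensity
    lists, estimated from a joint histogram with `bins` bins per image."""
    def quantise(img):
        lo, hi = min(img), max(img)
        return [min(int((v - lo) / (hi - lo + 1e-12) * bins), bins - 1)
                for v in img]
    qa, qb = quantise(a), quantise(b)
    n = len(a)
    pab, pa, pb = Counter(zip(qa, qb)), Counter(qa), Counter(qb)
    # sum over occupied joint-histogram cells of p(i,j) * log(p(i,j) / (p(i)p(j)))
    return sum((c / n) * math.log(c * n / (pa[i] * pb[j]))
               for (i, j), c in pab.items())

vals = list(range(16))
mi_self = mutual_information(vals, vals)          # identical images: maximal MI
mi_const = mutual_information(vals, [5.0] * 16)   # uninformative image: zero MI
```

Because only intensity co-occurrence enters the estimate, any spatial permutation applied identically to neither image changes nothing, which motivates the paper's spatially informed variant.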