967 results for Speaker Recognition, Text-constrained, Multilingual, Speaker Verification, HMMs
Abstract:
This study is concerned with the linguistic situation in the town of Kirkuk in north-eastern Iraq. In this town there are three main ethnic groups: Kurds, Arabs and Turkmans, with some very small minorities such as Chaldeans, Assyrians and Armenians. The languages spoken by these three ethnic groups belong to different language family groups. In the first part of the study, the historical background of the population, a review of the literature, both of the present linguistic situation in Kirkuk and of relevant sociolinguistics in general, and the theoretical framework have been discussed in detail in order to provide background to this study, which is mainly concerned with the following areas: 1. The relationships existing between ethnic background and language usage and language loyalty in Kirkuk. 2. The attitudes of Kirkukians towards language maintenance and language shift in Kirkuk. 3. The bilingual and multilingual communicative competence of individual Kurds, Arabs and Turkmans in the languages concerned, including the degree to which such a speaker is bilingual or multilingual and the nature of bilingualism or multilingualism in different domains and situations in Kirkuk. To throw light on these areas a situationally-oriented language survey was conducted; the relevant data were collected by randomly distributed questionnaire, by personal interview, and by personal observation of language use and language attitudes in this town. The data were subjected to computer analysis, and the results showed that there were no significant and substantial correlations between language use, attitudes and competence and the socio-economic status of respondents in this town; on the other hand, the correlations between ethnic background and language use, attitudes and competence are indisputable.
Abstract:
This thesis addresses the viability of automatic speech recognition for control room systems; with careful system design, automatic speech recognition (ASR) devices can be a useful means of human-computer interaction for specific types of task. These tasks can be defined as complex verbal activities, such as command and control, and can be paired with spatial tasks, such as monitoring, without detriment. It is suggested that ASR use be confined to routine plant operation, as opposed to critical incidents, due to possible problems of stress on the operators' speech. It is proposed that using ASR will require operators to adapt a commonly used skill to cater for a novel use of speech. Before using the ASR device, new operators will require some form of training. It is shown that a demonstration by an experienced user of the device can lead to superior performance compared with instructions alone. Thus, a relatively cheap and very efficient form of operator training can be supplied by demonstration by experienced ASR operators. From a series of studies into speech-based interaction with computers, it is concluded that the interaction be designed to capitalise upon the tendency of operators to use short, succinct, task-specific styles of speech. From studies comparing different types of feedback, it is concluded that operators be given screen-based feedback, rather than auditory feedback, for control room operation. Feedback will take two forms: the use of the ASR device will require recognition feedback, which is best supplied using text; the performance of a process control task will require task feedback integrated into the mimic display. This latter feedback can be either textual or symbolic, but it is suggested that symbolic feedback will be more beneficial. Related to both interaction style and feedback is the issue of handling recognition errors. These should be corrected by simple command repetition, rather than by error-handling dialogues. This method of error correction is held to be non-intrusive to primary command and control operations. This thesis also addresses some of the problems of user error in ASR use, and provides a number of recommendations for its reduction.
Abstract:
Three experiments assessed the development of children's part and configural (part-relational) processing in object recognition during adolescence. In total, 312 school children aged 7-16 years and 80 adults were tested in 3-alternative forced choice (3-AFC) tasks. They judged the correct appearance of familiar animals, artifacts, and newly learned multipart objects, presented upright and inverted, which had been manipulated either in terms of individual parts or part relations. Manipulation of part relations was constrained to either metric (animals, artifacts, and multipart objects) or categorical (multipart objects only) changes. For animals and artifacts, even the youngest children were close to adult levels for the correct recognition of an individual part change. By contrast, it was not until 11-12 years of age that they achieved similar levels of performance with regard to altered metric part relations. For the newly learned multipart objects, performance was equivalent throughout the tested age range for upright-presented stimuli in the case of categorical part-specific and part-relational changes. In the case of metric manipulations, the results confirmed the data pattern observed for animals and artifacts. Together, the results provide converging evidence, with studies of face recognition, for a surprisingly late consolidation of configural-metric relative to part-based object recognition.
Abstract:
Full text: Several Lancet publications have questioned the value of glycaemic control in diabetic patients. For example, in their Comment (Sept 29, p 1103),1 John Cleland and Stephen Atkin state that “Improved glycaemic control is not a surrogate for effective care of patients who have diabetes”, and Victor Montori and colleagues (p 1104)2 claim that “HbA1c loses its validity as a surrogate marker when patients have a constellation of metabolic abnormalities”. We are concerned that the reaction against “glucocentricity” in the field of diabetes has gone too far. Even the UK's National Prescribing Centre website, carrying the National Health Service logo, includes comments that undermine the value of glycaemic control. For example, referring to the United Kingdom Prospective Diabetes Study (UKPDS), this site states that “Compared with ‘conventional control’ there was no benefit from tight control of blood glucose with sulphonylureas or insulin with regard to total mortality, diabetes-related death, macrovascular outcomes or microvascular outcomes, including all the most serious ones such as blindness or kidney failure”.3 It is well established that better glycaemic control reduces long-term microvascular complications in type 1 and type 2 diabetes.4 In type 2 diabetes, the UKPDS reported that a composite microvascular endpoint (retinopathy requiring photocoagulation, vitreous haemorrhage, and fatal or non-fatal renal failure) was reduced by 25% in patients randomised to intensive glucose control (p=0·0099).4 To imply that these are not patient-relevant outcomes is to distort the evidence. Many studies have also found that improved glycaemic control reduces macrovascular complications.5 Do not be misled: glycaemic control remains a crucial component in the care of people with diabetes. The authors have received research support and undertaken ad hoc consultancies and speaker engagements for several pharmaceutical companies.
Abstract:
In this paper, a new method for offline handwriting recognition is presented. A robust algorithm for handwriting segmentation is described, with the help of which individual characters can be segmented from a word selected from a paragraph of handwritten text image given as input to the module. Each segmented character is then converted into a column vector of 625 values, which is later fed into the advanced neural network setup that has been designed in the form of text files. The setup consists of four quadruple-layered neural networks, each with 625 input neurons and 26 output neurons, one corresponding to each character from a-z. The outputs of all four networks are fed into a genetic algorithm developed using the concept of correlation; with its help the overall network is optimized, providing recognized outputs with an efficiency of 71%.
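A minimal sketch (not the authors' code) of the vectorization and classification steps the abstract describes: each segmented character is flattened into a 625-value column vector and scored by a feed-forward network with 625 inputs and 26 outputs. The 25x25 input size is inferred from 625 = 25 x 25, and the hidden-layer widths are invented for illustration.

```python
import numpy as np
from PIL import Image

def to_column_vector(char_img):
    """Resize a segmented character image to 25x25 and flatten to a 625x1 vector."""
    img = Image.fromarray(char_img).convert("L").resize((25, 25))
    return (np.asarray(img, dtype=np.float32) / 255.0).reshape(625, 1)

class QuadLayerNet:
    """Four-layer feed-forward net: 625 inputs -> two hidden layers -> 26 outputs."""
    def __init__(self, hidden=(128, 64), seed=0):
        rng = np.random.default_rng(seed)
        sizes = [625, *hidden, 26]
        self.W = [rng.normal(0.0, 0.1, (m, n)) for n, m in zip(sizes, sizes[1:])]

    def classify(self, v):
        for W in self.W[:-1]:
            v = np.tanh(W @ v)
        scores = self.W[-1] @ v               # one score per letter a-z
        return chr(ord("a") + int(scores.argmax()))

net = QuadLayerNet()
# In the paper's setup, four such networks are trained and a correlation-based
# genetic algorithm combines their outputs; that combination step is omitted here.
```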
Abstract:
2000 Mathematics Subject Classification: 62H30
Abstract:
As one of the most popular deep learning models, the convolutional neural network (CNN) has achieved huge success in image information extraction. Traditionally, a CNN is trained by a supervised learning method with labeled data and used as a classifier by adding a classification layer at the end. Its capability of extracting image features is largely limited by the difficulty of setting up a large training dataset. In this paper, we propose a new unsupervised learning CNN model, which uses a so-called convolutional sparse auto-encoder (CSAE) algorithm to pre-train the CNN. Instead of using labeled natural images for CNN training, the CSAE algorithm can be used to train the CNN with unlabeled artificial images, which enables easy expansion of the training data and unsupervised learning. The CSAE algorithm is especially designed for extracting complex features from specific objects such as Chinese characters. After the features of artificial images are extracted by the CSAE algorithm, the learned parameters are used to initialize the first CNN convolutional layer, and the CNN model is then fine-tuned on scene image patches with a linear classifier. The new CNN model is applied to Chinese scene text detection and is evaluated on a multilingual image dataset, which labels Chinese, English and numeral texts separately. A detection precision gain of more than 10% is observed over two comparison CNN models.
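The following is an illustrative PyTorch sketch, not the paper's implementation, of the CSAE idea the abstract outlines: pre-train a convolutional sparse auto-encoder on unlabeled image patches, then copy its encoder weights into a CNN's first convolutional layer before supervised fine-tuning. All layer sizes, the sparsity weight, and the synthetic data stand-in are assumptions.

```python
import torch
import torch.nn as nn

class CSAE(nn.Module):
    """Convolutional sparse auto-encoder: conv encoder + transposed-conv decoder."""
    def __init__(self, n_filters=64, kernel=9):
        super().__init__()
        self.encoder = nn.Conv2d(1, n_filters, kernel)
        self.decoder = nn.ConvTranspose2d(n_filters, 1, kernel)

    def forward(self, x):
        code = torch.relu(self.encoder(x))
        return self.decoder(code), code

csae = CSAE()
opt = torch.optim.Adam(csae.parameters(), lr=1e-3)
sparsity_weight = 1e-3

# Stand-in for a loader of unlabeled (e.g. artificial character) image patches.
unlabeled_loader = [torch.randn(8, 1, 32, 32) for _ in range(10)]

for patches in unlabeled_loader:
    recon, code = csae(patches)
    loss = nn.functional.mse_loss(recon, patches) \
         + sparsity_weight * code.abs().mean()    # L1 sparsity on the code
    opt.zero_grad(); loss.backward(); opt.step()

# Transfer: initialize the CNN's first conv layer from the CSAE encoder,
# then fine-tune the whole CNN on labeled scene-image patches.
cnn_first_conv = nn.Conv2d(1, 64, 9)
cnn_first_conv.weight.data.copy_(csae.encoder.weight.data)
cnn_first_conv.bias.data.copy_(csae.encoder.bias.data)
```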
Abstract:
We propose a novel template matching approach for the discrimination of handwritten and machine-printed text. We first pre-process the scanned document images by performing denoising, circle/line exclusion and word-block level segmentation. We then align and match characters in a flexibly sized gallery with the segmented regions, using parallelised normalised cross-correlation. The experimental results over the Pattern Recognition & Image Analysis Research Lab-Natural History Museum (PRImA-NHM) dataset show remarkably high robustness of the algorithm in classifying cluttered, occluded and noisy samples, in addition to those with significant amounts of missing data. The algorithm, which achieves an 84.0% classification rate with a false positive rate of 0.16 over the dataset, does not require training samples and generates compelling results as opposed to training-based approaches that have used the same benchmark.
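As a minimal sketch of the core operation, zero-mean normalised cross-correlation (NCC) can be written in plain NumPy as below. This ignores the paper's parallelisation; `gallery` (a name-to-template mapping) and `region` are hypothetical grayscale arrays assumed to be pre-aligned to the same shape.

```python
import numpy as np

def ncc(template, region):
    """Zero-mean normalised cross-correlation, in [-1, 1]."""
    t = template - template.mean()
    r = region - region.mean()
    denom = np.sqrt((t ** 2).sum() * (r ** 2).sum())
    return float((t * r).sum() / denom) if denom > 0 else 0.0

def best_match(gallery, region):
    """Score a segmented region against every character template in the gallery;
    a high best score would indicate a machine-printed match."""
    scores = {name: ncc(tpl, region) for name, tpl in gallery.items()}
    return max(scores.items(), key=lambda kv: kv[1])
```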
Abstract:
Biometrics is a field of study which pursues the association of a person's identity with his/her physiological or behavioral characteristics. As one aspect of biometrics, face recognition has attracted special attention because it is a natural and noninvasive means of identifying individuals. Most previous studies in face recognition are based on two-dimensional (2D) intensity images. Face recognition based on 2D intensity images, however, is sensitive to changes in environmental illumination and subject orientation, which affect the recognition results. With the development of three-dimensional (3D) scanners, 3D face recognition is being explored as an alternative to the traditional 2D methods. This dissertation proposes a method in which the expression and the identity of a face are determined in an integrated fashion from 3D scans. In this framework, a front-end expression recognition module sorts the incoming 3D face according to the expression detected in the scans. Scans with neutral expressions are then processed by a corresponding 3D neutral face recognition module. Alternatively, if a scan displays a non-neutral expression, e.g., a smiling expression, it is routed to an appropriate specialized recognition module for smiling face recognition. The expression recognition method proposed in this dissertation is innovative in that it uses information from 3D scans to perform the classification task. A smiling face recognition module was developed, based on statistical modeling of the variance between faces with a neutral expression and faces with a smiling expression. The proposed expression and face recognition framework was tested with a database containing 120 3D scans from 30 subjects (half neutral faces and half smiling faces). It is shown that the proposed framework achieves a recognition rate 10% higher than attempting the identification with only the neutral face recognition module.
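A hedged sketch of the routing logic described above: an expression-recognition front end classifies each incoming 3D scan and dispatches it to the recognition module specialised for that expression. The function and module names here are stand-ins, not the dissertation's actual interfaces.

```python
def identify_face(scan_3d, classify_expression,
                  neutral_recognizer, smiling_recognizer):
    """Front-end expression classification followed by expression-specific
    identification, mirroring the framework's two-stage design."""
    expression = classify_expression(scan_3d)   # e.g. "neutral" or "smiling"
    if expression == "neutral":
        return neutral_recognizer(scan_3d)
    return smiling_recognizer(scan_3d)          # specialized non-neutral module
```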
Abstract:
Ensuring the correctness of software has been the major motivation in software research, constituting a Grand Challenge. Due to its impact on the final implementation, one critical aspect of software is its architectural design. By guaranteeing a correct architectural design, major and costly flaws can be caught early in the development cycle. Software architecture design has received a lot of attention in the past years, with several methods, techniques and tools developed. However, there is still more to be done, such as providing adequate formal analysis of software architectures. In this regard, a framework to ensure system dependability from design to implementation has been developed at FIU (Florida International University). This framework is based on SAM (Software Architecture Model), an ADL (Architecture Description Language) that allows hierarchical compositions of components and connectors, defines an architectural modeling language for the behavior of components and connectors, and provides a specification language for behavioral properties. The behavioral model of a SAM model is expressed in the form of Petri nets and the properties in first-order linear temporal logic. This dissertation presents a formal verification and testing approach to guarantee the correctness of software architectures. The software architectures studied are expressed in SAM. For the formal verification approach, the technique applied was model checking, and the model checker of choice was Spin. As part of the approach, a SAM model is formally translated into a model in the input language of Spin and verified for correctness with respect to temporal properties. In terms of testing, a testing approach for SAM architectures was defined which includes the evaluation of test cases based on Petri net testing theory, to be used in the testing process at the design level. Additionally, the information at the design level is used to derive test cases for the implementation level. Finally, a modeling and analysis tool (SAM tool) was implemented to support the design and analysis of SAM models. The results show the applicability of the approach to the testing and verification of SAM models with the aid of the SAM tool.
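As a small illustration of the Petri net semantics underlying SAM's behavioral models (not the FIU SAM tool itself), the basic enabling and firing rules can be sketched as follows; the place and transition names are invented for the example.

```python
def enabled(marking, transition):
    """transition = (inputs, outputs), each a dict mapping place -> token count.
    A transition is enabled when every input place holds enough tokens."""
    inputs, _ = transition
    return all(marking.get(p, 0) >= n for p, n in inputs.items())

def fire(marking, transition):
    """Consume tokens from input places and produce tokens in output places."""
    inputs, outputs = transition
    m = dict(marking)
    for p, n in inputs.items():
        m[p] -= n
    for p, n in outputs.items():
        m[p] = m.get(p, 0) + n
    return m

# Example: a connector passing a request token from one component's output
# port to another component's input port.
t_pass = ({"out_port": 1}, {"in_port": 1})
m0 = {"out_port": 1, "in_port": 0}
m1 = fire(m0, t_pass) if enabled(m0, t_pass) else m0
# m1 == {"out_port": 0, "in_port": 1}
```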
Abstract:
The purpose of this study was to investigate the ontogeny of auditory learning via operant contingency in Northern bobwhite (Colinus virginianus) hatchlings and the possible interaction between attention, orienting and learning during early development. Chicks received individual 5 min training sessions in which they received a playback of a bobwhite maternal call at a single delay following each vocalization they emitted. Playback was either from a single randomly chosen speaker or switched back and forth semi-randomly between two speakers during training. Chicks were tested 24 hrs later in a simultaneous choice test between the familiar and an unfamiliar maternal call. It was found that day-old chicks showed a significant time-specific decrement in auditory learning when trained with delays in the range of 470–910 ms between their vocalizations and call playback, but only when training involved two speakers. Two-day-old birds showed an even more sustained disruption of learning than day-old chicks, whereas three-day-old chicks showed a pattern of intermittent interference with their learning when trained at such delays. A similar but less severe decrement in auditory learning was found when chicks were provided with motor training in which playback was contingent upon chicks entering and exiting one of two colored squares placed on the floor of the arena. Chicks provided with playback of the call at randomly chosen delays each time they vocalized exhibited large fluctuations in their responsivity to the auditory stimulus as a function of delay; these fluctuations were correlated significantly with measures of chick learning, particularly at two days of age. When playback was limited to a single location, chicks no longer showed a time-specific disruption of their learning of the auditory stimulus. Sequential analyses revealed several patterns suggesting that an attentional process similar or analogous to attentional blink may have contributed both to the observed fluctuations in chick responsivity to the auditory stimulus as a function of delay and to the time-specific learning deficit shown by chicks provided with two-speaker training. The study highlights that learning can be substantially modulated by processes of orienting and attention, and it has a number of important implications for research within cognitive neuroscience, animal behavior and learning.
Abstract:
This dissertation introduces a novel automated book reader as an assistive technology tool for persons with blindness. The literature shows extensive work in the area of optical character recognition, but the current methodologies available for the automated reading of books or bound volumes remain inadequate and are severely constrained during document scanning or image acquisition processes. The goal of the book reader design is to automate and simplify the task of reading a book while providing a user-friendly environment with a realistic but affordable system design. This design responds to the main concerns of (a) providing a method of image acquisition that maintains the integrity of the source, (b) overcoming optical character recognition errors created by inherent imaging issues such as curvature effects and barrel distortion, and (c) determining a suitable method for accurate recognition of characters that yields an interface with the ability to read from any open book with a high reading accuracy nearing 98%. The initial aim of this research endeavor is the development of an assistive technology tool to help persons with blindness in the reading of books and other bound volumes. Its secondary and broader aim is to find in this design the perfect platform for the digitization of bound documentation, in line with the mission of the Open Content Alliance (OCA), a nonprofit alliance aimed at making reading materials available in digital form. The theoretical contribution of this research lies in the mathematical developments made in order to resolve both the inherent distortions due to the properties of the camera lens and the anticipated distortions of the changing page curvature as one leafs through the book. This is evidenced by the significant increase in the character recognition rate and a high-accuracy read-out through text-to-speech processing. This reasonably priced interface, with its high performance and its compatibility with any computer or laptop through universal serial bus connectors, greatly extends the prospects for universal accessibility to documentation.
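As an illustration of the kind of lens-distortion correction such a design depends on, the standard radial model x_u = x_d(1 + k1 r^2 + k2 r^4) can be applied with OpenCV's undistort; the camera matrix, distortion coefficients, and synthetic input below are placeholders, not values from the dissertation.

```python
import cv2
import numpy as np

# Classic radial (barrel) model: corrected = distorted * (1 + k1*r^2 + k2*r^4).
camera_matrix = np.array([[800.0,   0.0, 320.0],
                          [  0.0, 800.0, 240.0],
                          [  0.0,   0.0,   1.0]])
dist_coeffs = np.array([-0.25, 0.08, 0.0, 0.0])   # k1, k2, p1, p2 (placeholders)

# Stand-in for a captured frame of an open book page.
page = np.zeros((480, 640, 3), dtype=np.uint8)

flattened = cv2.undistort(page, camera_matrix, dist_coeffs)
# Page-curvature correction (as one leafs through the book) would follow here,
# before the image is passed to OCR and text-to-speech processing.
```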
Abstract:
Perception and recognition of faces are fundamental cognitive abilities that form a basis for our social interactions. Research has investigated face perception using a variety of methodologies across the lifespan. Habituation, novelty preference, and visual paired comparison paradigms are typically used to investigate face perception in young infants. Storybook recognition tasks and eyewitness lineup paradigms are generally used to investigate face perception in young children. These methodologies have introduced systematic differences, including the use of linguistic information for children but not infants, greater memory load for children than infants, and longer exposure times to faces for infants than for older children, making comparisons across age difficult. Thus, research investigating infant and child perception of faces using common methods, measures, and stimuli is needed to better understand how face perception develops. According to predictions of the Intersensory Redundancy Hypothesis (IRH; Bahrick & Lickliter, 2000, 2002), in early development, perception of faces is enhanced in unimodal visual (i.e., silent dynamic face) rather than bimodal audiovisual (i.e., dynamic face with synchronous speech) stimulation. The current study investigated the development of face recognition in children of three ages: 5–6 months, 18–24 months, and 3.5–4 years, using the novelty preference paradigm and the same stimuli for all age groups. It also assessed the role of modality (unimodal visual versus bimodal audiovisual) and memory load (low versus high) in face recognition. It was hypothesized that face recognition would improve with age and would be enhanced in unimodal visual stimulation with a low memory load. Results demonstrated a developmental trend (F(2, 90) = 5.00, p = 0.009), with older children showing significantly better recognition of faces than younger children. In contrast to predictions, no differences were found as a function of modality of presentation (bimodal audiovisual versus unimodal visual) or memory load (low versus high). This study was the first to demonstrate a developmental improvement in face recognition from infancy through childhood using common methods, measures and stimuli consistent across age.
Abstract:
Race in Argentina played a significant role as a highly durable construct, identifying and advancing subjects (1776–1810) and citizens (1811–1853). My dissertation explores the intricacies of power relations by focusing on the ways in which race informed the legal process during the transition from a colonial to a national State. It argues that the State's development in both the colonial and national periods depended upon defining and classifying African descendants. In response, people of African descent used the State's assigned definitions and classifications to advance their legal identities. The dissertation employs race and culture as operative concepts, and law as a representation of the sometimes tense relationship between social practices and the State's concern for social peace. It examines the dynamic nature of the court, utilizing the theoretical concepts of multicentric legal orders, analyzed through weak and strong legal pluralisms, and of jurisdictional politics, from the late eighteenth to the early nineteenth centuries. The dissertation juxtaposes various levels of jurisdiction (canon/state law and colonial/national law) to illuminate how people of color used the legal system to ameliorate their social condition. In each chapter the primary source materials are state-generated documents, which include criminal, ecclesiastical, civil, and marriage dissent court cases, along with notarial and census records. Though it might appear that these documents provide only a superficial understanding of people of color, my analysis offers both a top-down and a bottom-up approach that reflects African descendants' continuous negotiation for State recognition. These approaches allow for the implicit or explicit negotiation of a legal identity that transformed slaves and free African descendants into active agents of their own destinies.