924 results for acoustic speech recognition system
Abstract:
Handwriting is an acquired tool used for communicating one's observations or feelings. The factors that influence a person's handwriting include not only the individual's bio-mechanical constraints, the handwriting education received, the writing instrument, the type of paper and the background, but also stress, motivation and the purpose of the handwriting. Despite the high variation in a person's handwriting, recent results from different writer identification studies have shown that it possesses sufficient individual traits to be used as an identification method. Handwriting as a behavioral biometric has interested researchers for a long time, and it has recently attracted new interest due to an increased need and effort to deal with problems ranging from white-collar crime to terrorist threats. Identifying the writer from a piece of handwriting is a challenging task for pattern recognition. The main objective of this thesis is to develop a text-independent writer identification system for Malayalam handwriting. The study also extends to developing a framework for online character recognition of Grantha script and Malayalam characters.
Abstract:
This work presents a Named Entity based Question Answering system for the Malayalam language. Although a vast amount of information is available today in digital form, no effective mechanism exists to give people convenient access to it. Information Retrieval and Question Answering systems are the two mechanisms now available for information access. Information Retrieval systems typically return a long list of documents in response to a user's query, which the user must skim to determine whether they contain an answer. A Question Answering system, by contrast, allows the user to state an information need as a natural language question and returns the most appropriate answer as a word, a sentence or a paragraph. This system is based on Named Entity Tagging and Question Classification. Document tagging extracts useful information from the documents, which is then used to find the answer to the question. Question Classification extracts useful information from the question to determine its type and the way in which it is to be answered. Various machine learning methods are used to tag the documents, and a rule-based approach is used for Question Classification. Malayalam belongs to the Dravidian family of languages and is one of the four major languages of this family. It is one of the 22 Scheduled Languages of India, with official language status in the state of Kerala, and is spoken by 40 million people. Malayalam is a morphologically rich agglutinative language with relatively free word order. It also has a productive morphology that allows the creation of complex words, which are often highly ambiguous. Document tagging tools such as a Parts-of-Speech Tagger, Phrase Chunker, Named Entity Tagger and Compound Word Splitter are developed as part of this research work. No such tools were previously available for the Malayalam language.
Finite State Transducers, High Order Conditional Random Fields, Artificial Immunity System principles and Support Vector Machines are the techniques used in the design of these document preprocessing tools. This research work describes how Named Entities are used to represent the documents. Single-sentence questions are used to test the system. The overall precision and recall obtained are 88.5% and 85.9% respectively. This work can be extended in several directions: the coverage of non-factoid questions can be increased, and the system can be extended to open domain applications. Reference Resolution and Word Sense Disambiguation techniques are suggested as future enhancements.
Abstract:
This thesis investigated the potential use of Linear Predictive Coding in speech communication applications. A Modified Block Adaptive Predictive Coder is developed, which reduces the computational burden and complexity without sacrificing speech quality, as compared to the conventional adaptive predictive coding (APC) system. For this, changes in the evaluation methods have been developed. The method differs from the usual APC system in that the difference between the true and the predicted value is not transmitted. This allows the high order predictor in the transmitter section of a predictive coding system to be replaced by a simple delay unit, which makes the transmitter quite simple. Also, the block length used in processing the speech signal is adjusted relative to the pitch period of the signal being processed, rather than being fixed at a constant length as in earlier work. The efficiency of the proposed coder is supported by results of computer simulation using real speech data. Three methods for voiced/unvoiced/silent/transition classification are presented. The first is based on energy, zero-crossing rate and the periodicity of the waveform. The second uses the normalised correlation coefficient as the main parameter, while the third utilizes a pitch-dependent correlation factor. The third algorithm, which gives the minimum error probability, is used in a later chapter to design the modified coder. The thesis also presents a comparative study between the autocorrelation and the covariance methods used in the evaluation of the predictor parameters. It is shown that the autocorrelation method is superior to the covariance method with respect to filter stability and also in an SNR sense, though the increase in gain is only small.
The Modified Block Adaptive Coder switches from pitch prediction to spectrum prediction when the speech segment changes from a voiced or transition region to an unvoiced region. The experiments conducted in coding, transmission and simulation used speech samples from Malayalam and English phrases. Proposals for a speaker recognition system and a phoneme identification system are also outlined towards the end of the thesis.
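The autocorrelation method for evaluating predictor parameters, mentioned in the abstract above, can be illustrated with the Levinson-Durbin recursion. This is a minimal sketch in Python under stated assumptions, not the thesis's coder: the function names `autocorrelation` and `levinson_durbin` are hypothetical, and the frame is taken to be an already windowed list of samples.

```python
def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one (already windowed) frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations for a linear predictor.

    Convention (an assumption of this sketch):
        x_hat[n] = sum_{k=1..order} a[k] * x[n - k]
    Returns (predictor coefficients a[1..order],
             reflection coefficients, residual energy).
    """
    if r[0] <= 0:
        raise ValueError("frame has no energy; nothing to predict")
    a = [0.0] * (order + 1)
    reflection = []
    err = r[0]
    for i in range(1, order + 1):
        # Correlation not yet explained by the order-(i-1) predictor.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        reflection.append(k)
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)  # residual energy shrinks at each order
    return a[1:], reflection, err
```

Reflection coefficients whose magnitudes are all below one correspond to a stable synthesis filter, which is the usual reason the autocorrelation method is preferred on stability grounds.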
Abstract:
Motion instability is an important issue that occurs during the operation of towed underwater vehicles (TUV), and it considerably affects the accuracy of the high-precision acoustic instrumentation housed inside them. Of the various parameters responsible for this, the disturbances from the tow-ship are the most significant. The present study focuses on the motion dynamics of an underwater towing system with ship-induced disturbances as the input, and in particular on an innovative system called two-part towing. The methodology involves numerical modeling of the tow system, which consists of modeling the tow-cables and formulating the vehicles. A previous study in this direction used a segmental approach to model the cable. Even though the model was successful in predicting the heave response of the tow-body, instabilities were observed in the numerical solution. The present study devises a simple approach called the lumped mass spring model (LMSM) for the cable formulation. In this work, the traditional LMSM has been modified in two ways: first, by implementing advanced time integration procedures, and second, by using a modified beam model which uses only translational degrees of freedom to solve the beam equation. A number of time integration procedures, such as Euler, Houbolt, Newmark and HHT-α, were implemented in the traditional LMSM, and the strengths and weaknesses of each scheme were numerically estimated. In most previous studies, hydrodynamic forces acting on the tow-system, such as drag and lift, are approximated as analytical expressions of velocity. This approach restricts these models to simple cylindrical towed bodies and may not be applicable to modern tow systems, which are diverse in shape and complexity. Hence, in this study, hydrodynamic parameters such as the drag and lift of the tow-system are estimated using CFD techniques. To achieve this, a RANS-based CFD code has been developed.
Further, a new convection interpolation scheme for CFD simulation, called BNCUS, which is a blend of cell-based and node-based formulations, was proposed in the study and numerically tested. Because solving the fluid dynamic equations takes considerable simulation time, a dedicated parallel computing setup has been developed. Two types of computational parallelism are explored in the current study, namely the models for shared memory processors and for distributed memory processors. The shared memory model was used for the structural dynamic analysis of the towing system, while the distributed memory model was used for solving the fluid dynamic equations.
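Of the time integration procedures named in the abstract above, the Newmark scheme is easy to show on a single lumped mass. The sketch below, in Python, integrates one mass-spring-damper with the average-acceleration variant (beta = 1/4, gamma = 1/2). It is an illustration under stated assumptions, not the study's cable model: a real LMSM couples many such masses, and the function name `newmark_sdof` and its parameters are hypothetical.

```python
def newmark_sdof(m, c, k, x0, v0, dt, steps, force=lambda t: 0.0,
                 beta=0.25, gamma=0.5):
    """Integrate m*x'' + c*x' + k*x = force(t) with the Newmark method.

    Returns the list of displacements at t = 0, dt, ..., steps*dt.
    """
    a = (force(0.0) - c * v0 - k * x0) / m  # initial acceleration
    x, v = x0, v0
    history = [x]
    # Effective stiffness of the implicit update (constant for a linear system).
    k_eff = k + gamma / (beta * dt) * c + m / (beta * dt * dt)
    for n in range(1, steps + 1):
        t = n * dt
        rhs = (force(t)
               + m * (x / (beta * dt * dt) + v / (beta * dt)
                      + (1.0 / (2.0 * beta) - 1.0) * a)
               + c * (gamma / (beta * dt) * x + (gamma / beta - 1.0) * v
                      + dt * (gamma / (2.0 * beta) - 1.0) * a))
        x_new = rhs / k_eff
        a_new = ((x_new - x) / (beta * dt * dt) - v / (beta * dt)
                 - (1.0 / (2.0 * beta) - 1.0) * a)
        v_new = v + dt * ((1.0 - gamma) * a + gamma * a_new)
        x, v, a = x_new, v_new, a_new
        history.append(x)
    return history
```

With these parameter values the scheme is implicit and introduces no numerical damping for a linear undamped system, so a free oscillation keeps its amplitude. That is one of the properties a comparison against Euler or Houbolt would examine.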
Abstract:
This paper presents the design and development of a frame-based approach for a speech to sign language machine translation system in the domain of railways and banking. This work aims to use the capabilities of artificial intelligence to improve the lives of physically challenged, deaf-mute people. Our work concentrates on the sign language used by the deaf community of the Indian subcontinent, which is called Indian Sign Language (ISL). The input to the system is the clerk's speech, and the output is a 3D virtual human character playing the signs for the uttered phrases. The system builds up 3D animation from pre-recorded motion capture data. Our work proposes to build a Malayalam to ISL
Abstract:
Optical Character Recognition plays an important role in Digital Image Processing and Pattern Recognition. Even though extensive study has been performed on foreign languages such as Chinese and Japanese, work on Indian scripts is still immature. OCR in the Malayalam language is more complex because it has the largest number of characters among all Indian languages. The challenge of character recognition is even greater in the handwritten domain, due to the varying writing style of each individual. In this paper we propose a system for recognition of offline handwritten Malayalam vowels. The proposed method uses chain code and image centroid for feature extraction, and a two-layer feed-forward network trained with scaled conjugate gradient for classification.
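The two feature types named in the abstract above, chain code and image centroid, can be sketched as follows. This is a minimal Python illustration under stated assumptions: the boundary is given as an already traced, ordered list of 8-connected pixel coordinates, the image is a binary list of rows, and all function names are hypothetical. How the paper combines these features is not stated in the abstract.

```python
# Freeman 8-direction codes keyed by (d_row, d_col); rows grow downward.
DIRECTIONS = {(0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
              (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7}


def chain_code(path):
    """Freeman chain code of an ordered list of 8-connected (row, col) points."""
    return [DIRECTIONS[(r1 - r0, c1 - c0)]
            for (r0, c0), (r1, c1) in zip(path, path[1:])]


def chain_code_histogram(codes):
    """Normalised 8-bin histogram, a fixed-length feature for a classifier."""
    counts = [0] * 8
    for code in codes:
        counts[code] += 1
    total = len(codes) or 1
    return [n / total for n in counts]


def centroid(image):
    """(row, col) centroid of the foreground pixels of a binary image."""
    points = [(r, c) for r, row in enumerate(image)
              for c, value in enumerate(row) if value]
    if not points:
        raise ValueError("image has no foreground pixels")
    n = len(points)
    return (sum(r for r, _ in points) / n, sum(c for _, c in points) / n)
```

The histogram makes the chain code independent of the boundary length, which is convenient when the features feed a fixed-size network input.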
Abstract:
Content Based Image Retrieval is one of the prominent areas in Computer Vision and Image Processing. Recognition of handwritten characters has been a popular area of research for many years and still remains an open problem. The proposed system uses visual image queries to retrieve similar images from a database of Malayalam handwritten characters. Local Binary Pattern (LBP) descriptors of the query images are extracted, and those features are compared with the features of the images in the database to retrieve the desired characters. The system with local binary patterns gives excellent retrieval performance.
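The retrieval pipeline described above, extracting LBP descriptors and comparing them against a database, can be sketched with the basic 3x3 LBP operator. The Python code below is an illustration under stated assumptions: images are grayscale lists of rows, histograms are compared with a chi-square distance (the abstract does not name the distance used), and the function names are hypothetical.

```python
# Clockwise 3x3 neighbourhood, starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, r, c):
    """8-bit LBP code: one bit per neighbour that is >= the centre pixel."""
    center = image[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Normalised 256-bin histogram of LBP codes over the interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    total = sum(hist) or 1
    return [h / total for h in hist]


def chi_square(h1, h2):
    """Chi-square distance between two histograms (0 means identical)."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def retrieve(query, database, top_k=3):
    """Names of the top_k database images closest to the query image."""
    q = lbp_histogram(query)
    ranked = sorted((chi_square(q, lbp_histogram(img)), name)
                    for name, img in database.items())
    return [name for _, name in ranked[:top_k]]
```

In practice the database histograms would be computed once and stored, rather than recomputed for every query as in this sketch.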
Abstract:
Modeling nonlinear systems using Volterra series is a century-old method, but practical realizations were hampered by hardware inadequate to handle the increased computational complexity stemming from its use. Interest has recently been renewed in designing and implementing filters that can model much of the polynomial nonlinearity inherent in practical systems. The key advantage of resorting to Volterra power series for this purpose is that nonlinear filters so designed can be made to work in parallel with existing LTI systems, yielding improved performance. This paper describes the inclusion of a quadratic predictor (with nonlinearity order 2) alongside a linear predictor in an analog source coding system. Analog coding schemes generally ignore the source generation mechanisms and focus on high-fidelity reconstruction at the receiver. The widely used method of differential pulse code modulation (DPCM) for speech transmission uses a linear predictor to estimate the next value of the input speech signal. But this linear system does not account for the inherent nonlinearities in speech signals arising from multiple reflections in the vocal tract. So a quadratic predictor is designed and implemented in parallel with the linear predictor to yield improved mean square error performance. The augmented speech coder is tested on speech signals transmitted over an additive white Gaussian noise (AWGN) channel.
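The idea of running a quadratic predictor in parallel with a linear one can be illustrated with a second-order Volterra predictor. The Python sketch below is a toy under stated assumptions: the kernels `h1` and `h2` are supplied by hand (the abstract does not say how the paper designs them), quantisation and the channel are omitted, and the function names are hypothetical.

```python
def volterra_predict(past, h1, h2):
    """Predict the next sample from past samples (past[0] is the most recent).

    h1 is the linear kernel; h2 is an upper-triangular quadratic kernel,
    so each product past[i] * past[j] with i <= j is counted once.
    """
    linear = sum(h1[i] * past[i] for i in range(len(h1)))
    quadratic = sum(h2[i][j] * past[i] * past[j]
                    for i in range(len(h2)) for j in range(i, len(h2)))
    return linear + quadratic


def prediction_mse(signal, h1, h2):
    """Mean square prediction error of the combined predictor over a signal."""
    memory = max(len(h1), len(h2))
    errors = []
    for n in range(memory, len(signal)):
        past = signal[n - memory:n][::-1]  # most recent sample first
        errors.append(signal[n] - volterra_predict(past, h1, h2))
    return sum(e * e for e in errors) / len(errors)
```

Passing an empty `h2` reduces this to the plain linear predictor of DPCM, so the two configurations can be compared on the same signal. In a DPCM coder the prediction error, rather than the sample itself, is what gets quantised and transmitted, so a lower mean square error directly reduces what the quantiser has to represent.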
Abstract:
Biometrics is an efficient technology with great potential for security system development in official and commercial applications, and it has recently become a significant part of any efficient person authentication solution. The advantage of biometric traits is that they cannot be stolen, shared or forgotten. The thesis addresses one of the emerging topics in authentication systems, namely the implementation of an Improved Biometric Authentication System using Multimodal Cue Integration, since operator-assisted identification is tedious, laborious and time consuming. In order to derive the best performance from the authentication system, appropriate feature selection criteria have been developed. It has been seen that selecting too many features leads to deterioration in authentication performance and efficiency. In the work reported in this thesis, various judiciously chosen components of the biometric traits and their feature vectors are used to realize the proposed Biometric Authentication System using Multimodal Cue Integration. The feature vectors generated from the noisy biometric traits are compared with the feature vectors available in the knowledge base, and the best-matching pattern is identified for the purpose of user authentication. To improve the success rate of the feature vector based authentication system, the proposed system has been augmented with a user-dependent weighted fusion technique.
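A user-dependent weighted fusion of match scores, as mentioned at the end of the abstract above, might look like the following. This Python sketch rests on several assumptions that the abstract does not confirm: fusion happens at score level, each modality's score is already normalised to the range 0 to 1, and the modality names, weights and threshold are hypothetical.

```python
def fuse_scores(scores, weights):
    """Weighted average of per-modality match scores.

    scores  -- dict: modality name -> normalised match score
    weights -- dict: modality name -> non-negative weight for this user
    """
    total = sum(weights[m] for m in scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[m] * scores[m] for m in scores) / total


def authenticate(user_id, scores, user_weights, threshold=0.5):
    """Accept the claimed identity if the fused score reaches the threshold.

    user_weights maps each enrolled user to that user's own modality weights,
    which is what makes the fusion user dependent.
    """
    return fuse_scores(scores, user_weights[user_id]) >= threshold
```

For example, a user whose voice samples are usually noisy could be given a larger weight on another modality, so that one poor voice score does not cause a false rejection.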
Abstract:
The thermal processing of food affects its quality and nutritional properties. In the household, monitoring the temperature inside the food is very difficult. In addition, knowledge of the optimal temperature and time parameters for different dishes is often insufficient. Optimal control of thermal preparation depends largely on the type of food and on the external and internal temperature exposure during cooking. The aim of this work was to develop an automatic oven that is able to recognize the type of food and to calculate the temperature inside the food during baking. The data required for the temperature calculation were acquired with several sensors. Inside the oven, an infrared thermometer, an infrared distance sensor, a camera, a temperature sensor and a lambda probe were used. In addition, a load cell, current and voltage sensors, and a temperature sensor were used outside the oven. The data sets recorded during the heating phase made it possible to train several artificial neural networks (ANNs) that could assign the different foods to the appropriate categories in order to select the optimal baking program. Several ANNs were also trained to estimate the thermal diffusivity of the food, which depends on its composition (carbohydrates, fat, protein, water). With the exception of the fat content, all components could be estimated with sufficient accuracy by different ANNs with a maximum of 8 hidden neurons, so that the temperature inside the food could be calculated on that basis.
This work shows that, with the help of a wide variety of sensors for direct or indirect measurement of the external properties of food, together with ANNs for categorization and for estimating food composition, automatic recognition of a wide range of foods and calculation of their internal temperature are possible.
Abstract:
Measurements of feed intake, feeding time and rumination time, summarized by the term feeding behavior, are helpful indicators for the early recognition of animals that show deviations in their behavior. The overall objective of this work was the development of an early warning system for inadequate feeding rations and for digestive and metabolic disorders, the prevention of which constitutes the basis for health, performance and reproduction. In a literature review, the current state of the art and the suitability of different measurement tools for determining the feeding behavior of ruminants were discussed. Five measurement methods based on different methodological approaches (visual observation, pressure transducers, electrical switches, electrical deformation sensors and acoustic biotelemetry) and three selected measurement techniques (the IGER Behavior Recorder, the Hi-Tag rumination monitoring system and the RumiWatchSystem) were described, assessed and compared with each other in this review. In the second study, a new system for measuring the feeding behavior of dairy cows was evaluated. The system measures feeding behavior through electromyography (EMG). For validation, the feeding behavior of 14 cows was determined both by the EMG system and by visual observation. The high correlation coefficients indicate that the current system is a reliable and suitable tool for monitoring the feeding behavior of dairy cows. The aim of a further study was to compare the DairyCheck (DC) system with two additional systems for measuring rumination behavior in terms of efficiency, reliability and reproducibility. The two additional systems were the Lely Qwes HR (HR) sensor and the RumiWatchSystem (RW). Agreement between RW and DC was high. The last study examined whether rumination time (RT) is affected by the onset of calving and whether it might be a useful indicator for predicting imminent birth.
Data analysis referred to the final 72h before the onset of calving, which were divided into twelve 6h-blocks. The results showed that RT was significantly reduced in the final 6h before imminent birth.
Abstract:
This thesis describes the development of a model-based vision system that exploits hierarchies of both object structure and object scale. The focus of the research is to use these hierarchies to achieve robust recognition based on effective organization and indexing schemes for model libraries. The goal of the system is to recognize parameterized instances of non-rigid model objects contained in a large knowledge base despite the presence of noise and occlusion. Robustness is achieved by developing a system that can recognize viewed objects that are scaled or mirror-image instances of the known models, or that contain component sub-parts with different relative scaling, rotation or translation than in the models. The approach taken in this thesis is to develop an object shape representation that incorporates a component sub-part hierarchy, which allows efficient and correct indexing into an automatically generated model library as well as relative parameterization among sub-parts, and a scale hierarchy, which allows a general-to-specific recognition procedure. After analysis of the issues and inherent tradeoffs in the recognition process, a system is implemented using a representation based on significant contour curvature changes and a recognition engine based on geometric constraints of feature properties. Examples of the system's performance are given, followed by an analysis of the results. In conclusion, the system's benefits and limitations are presented.
Abstract:
Object recognition is complicated by clutter, occlusion, and sensor error. Since pose hypotheses are based on image feature locations, these effects can lead to false negatives and positives. In a typical recognition algorithm, pose hypotheses are tested against the image, and a score is assigned to each hypothesis. We use a statistical model to determine the score distribution associated with correct and incorrect pose hypotheses, and use binary hypothesis testing techniques to distinguish between them. Using this approach we can compare algorithms and noise models, and automatically choose values for internal system thresholds to minimize the probability of making a mistake.
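Choosing a threshold that minimizes the probability of a mistake, as described in the abstract above, can be sketched for the simplest case. The Python code below assumes that the scores of incorrect and correct pose hypotheses are each Gaussian with known mean and standard deviation. That is a simplifying assumption for illustration only: the abstract says the distributions are derived from a statistical model, without giving their form. All function names are hypothetical.

```python
import math


def normal_cdf(x, mean, std):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def error_probability(threshold, incorrect, correct, prior_correct):
    """P(mistake) when hypotheses scoring above the threshold are accepted.

    incorrect, correct -- (mean, std) of the score under each hypothesis
    prior_correct      -- prior probability that a hypothesis is correct
    """
    false_positive = 1.0 - normal_cdf(threshold, *incorrect)
    false_negative = normal_cdf(threshold, *correct)
    return ((1.0 - prior_correct) * false_positive
            + prior_correct * false_negative)


def best_threshold(incorrect, correct, prior_correct, steps=10000):
    """Grid search for the threshold with the lowest error probability."""
    lo = min(incorrect[0] - 4 * incorrect[1], correct[0] - 4 * correct[1])
    hi = max(incorrect[0] + 4 * incorrect[1], correct[0] + 4 * correct[1])
    candidates = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(candidates,
               key=lambda t: error_probability(t, incorrect, correct,
                                               prior_correct))
```

With equal priors and equal variances the best threshold falls midway between the two means. Unequal priors shift it toward the mean of the less likely hypothesis, which is the kind of automatic adjustment the abstract refers to.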
Abstract:
Humans distinguish materials such as metal, plastic, and paper effortlessly at a glance. Traditional computer vision systems cannot solve this problem at all. Recognizing surface reflectance properties from a single photograph is difficult because the observed image depends heavily on the amount of light incident from every direction. A mirrored sphere, for example, produces a different image in every environment. To make matters worse, two surfaces with different reflectance properties could produce identical images. The mirrored sphere simply reflects its surroundings, so in the right artificial setting, it could mimic the appearance of a matte ping-pong ball. Yet, humans possess an intuitive sense of what materials typically "look like" in the real world. This thesis develops computational algorithms with a similar ability to recognize reflectance properties from photographs under unknown, real-world illumination conditions. Real-world illumination is complex, with light typically incident on a surface from every direction. We find, however, that real-world illumination patterns are not arbitrary. They exhibit highly predictable spatial structure, which we describe largely in the wavelet domain. Although they differ in several respects from typical photographs, illumination patterns share much of the regularity described in the natural image statistics literature. These properties of real-world illumination lead to predictable image statistics for a surface with given reflectance properties. We construct a system that classifies a surface according to its reflectance from a single photograph under unknown illumination. Our algorithm learns relationships between surface reflectance and certain statistics computed from the observed image. Like the human visual system, we solve the otherwise underconstrained inverse problem of reflectance estimation by taking advantage of the statistical regularity of illumination.
For surfaces with homogeneous reflectance properties and known geometry, our system rivals human performance.
Abstract:
This paper describes a general, trainable architecture for object detection that has previously been applied to face and people detection, with a new application to car detection in static images. Our technique is a learning-based approach that uses a set of labeled training data from which an implicit model of an object class (here, cars) is learned. Instead of pixel representations, which may be noisy and therefore not provide a compact representation for learning, our training images are transformed from pixel space to that of Haar wavelets that respond to local, oriented, multiscale intensity differences. These feature vectors are then used to train a support vector machine classifier. The detection of cars in images is an important step in applications such as traffic monitoring, driver assistance systems, and surveillance, among others. We show several examples of car detection on out-of-sample images and show an ROC curve that highlights the performance of our system.
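The transformation from pixel space to Haar wavelet responses described above can be sketched as follows. This Python code computes the three oriented Haar responses (vertical, horizontal, diagonal) on square windows at several scales and concatenates them into a feature vector. It is a simplified illustration under stated assumptions: the window sizes, the half-window shift and the function names are hypothetical, and the support vector machine training step is not shown.

```python
def block_sum(image, r, c, h, w):
    """Sum of the pixel values in the h-by-w block whose top-left is (r, c)."""
    return sum(image[i][j] for i in range(r, r + h) for j in range(c, c + w))


def haar_responses(image, r, c, size):
    """Vertical, horizontal and diagonal Haar responses of one square window."""
    half = size // 2
    tl = block_sum(image, r, c, half, half)
    tr = block_sum(image, r, c + half, half, half)
    bl = block_sum(image, r + half, c, half, half)
    br = block_sum(image, r + half, c + half, half, half)
    vertical = (tl + bl) - (tr + br)    # left vs right: vertical edges
    horizontal = (tl + tr) - (bl + br)  # top vs bottom: horizontal edges
    diagonal = (tl + br) - (tr + bl)    # diagonal intensity differences
    return vertical, horizontal, diagonal


def haar_feature_vector(image, sizes=(2, 4)):
    """Concatenate Haar responses over all windows at each scale.

    Windows are shifted by half their size, so neighbouring windows overlap.
    """
    rows, cols = len(image), len(image[0])
    features = []
    for size in sizes:
        step = max(1, size // 2)
        for r in range(0, rows - size + 1, step):
            for c in range(0, cols - size + 1, step):
                features.extend(haar_responses(image, r, c, size))
    return features
```

Each response depends only on differences between neighbouring blocks, so a uniform change in brightness leaves the features unchanged. That is one reason such features can be more compact for learning than raw pixels.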