954 results for recognition system


Relevance:

60.00% 60.00%

Publisher:

Abstract:

We describe our work on developing a speech recognition system for multi-genre media archives. The high diversity of the data makes this a challenging recognition task, which may benefit from systems trained on a combination of in-domain and out-of-domain data. Working with tandem HMMs, we present Multi-level Adaptive Networks (MLAN), a novel technique for incorporating information from out-of-domain posterior features using deep neural networks. We show that it provides a substantial reduction in WER over other systems, with relative WER reductions of 15% over a PLP baseline, 9% over in-domain tandem features and 8% over the best out-of-domain tandem features. © 2012 IEEE.
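The relative reductions quoted above follow the usual definition, (baseline WER - system WER) / baseline WER. A minimal sketch of that arithmetic is given here; the function name and the WER values are hypothetical placeholders and are not figures from the paper.

```python
def relative_wer_reduction(baseline_wer: float, system_wer: float) -> float:
    """Relative WER reduction of a system over a baseline, as a fraction."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - system_wer) / baseline_wer


# Hypothetical WER values, for illustration only (not taken from the paper).
example = relative_wer_reduction(baseline_wer=40.0, system_wer=34.0)
print(f"{example:.1%}")
```

Note that a relative reduction is not the same as an absolute one: in this made-up case the absolute drop is 6 WER points.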

Relevance:

60.00% 60.00%

Publisher:

Abstract:

In this paper, a novel mathematical model of the neuron, the Double Synaptic Weight Neuron (DSWN), is presented. The DSWN can simulate many kinds of neuron architectures, including Radial-Basis-Function (RBF), Hyper Sausage and Hyper Ellipsoid models. Moreover, this new model has been implemented in the new CASSANN-II neurocomputer, which can be used to form various types of neural networks with multiple mathematical models of neurons. The flexibility of the DSWN in constructing neural networks is also described. Based on the theory of Biomimetic Pattern Recognition (BPR) and high-dimensional space covering, a recognition system for omnidirectionally oriented rigid objects on a horizontal surface and a face recognition system were implemented on the CASSANN-II neurocomputer. In these two cases, the results showed that the DSWN neural network has great potential in pattern recognition.
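To give a feel for how a single neuron with two weights per input could cover several architectures, here is a loose illustration. It is not the DSWN formula from the paper: the function `double_weight_neuron`, its parameters (`w` as a scaling weight, `c` as a centre weight, exponent `p`, threshold `theta`) and all numeric values are assumptions chosen for the sketch.

```python
import math


def double_weight_neuron(x, w, c, p, theta, activation):
    """Hypothetical two-weight neuron: activation(sum_j w_j * (x_j - c_j)**p - theta)."""
    s = sum(wj * (xj - cj) ** p for xj, wj, cj in zip(x, w, c))
    return activation(s - theta)


x = [1.0, 2.0]

# p = 1 and c = 0: an ordinary weighted sum w . x - theta (hyperplane-style neuron).
linear = double_weight_neuron(x, w=[0.5, -1.0], c=[0.0, 0.0], p=1, theta=0.0,
                              activation=lambda s: s)

# p = 2 and w = 1: squared distance to the centre c, fed to a Gaussian (RBF-style neuron).
rbf = double_weight_neuron(x, w=[1.0, 1.0], c=[1.0, 2.0], p=2, theta=0.0,
                           activation=lambda s: math.exp(-s))

# p = 2 and unequal w: an axis-aligned ellipsoid; a positive output means x lies inside.
inside = double_weight_neuron(x, w=[0.25, 1.0], c=[0.0, 2.0], p=2, theta=1.0,
                              activation=lambda s: -s)
```

Changing only the parameters, not the code, switches between the three behaviours, which is the kind of flexibility the abstract attributes to the DSWN.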

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Following an introduction to the traditional mathematical models of neurons used in general-purpose neurocomputers, a novel all-purpose mathematical model, the Double Synaptic Weight Neuron (DSWN), is presented. It can simulate all kinds of neuron architectures, including Radial-Basis-Function (RBF) and Back-propagation (BP) models. The new model is realized in hardware and implemented in the new CASSANN-II neurocomputer, which can be used to form various types of neural networks with multiple mathematical models of neurons. The paper also describes the flexibility of the new model in constructing neural networks. Based on the theory of Biomimetic Pattern Recognition (BPR) and high-dimensional space covering, a recognition system for omnidirectionally oriented rigid objects on a horizontal surface and a face recognition system were implemented on the CASSANN-II neurocomputer. The results showed that the DSWN neural network has great potential in pattern recognition.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

In the present work, a sensitive spectroscopic assay based on surface-enhanced Raman spectroscopy (SERS), using gold nanoparticles as substrates, was developed for the rapid detection of protein-protein interactions. Detection is achieved by the specific binding of biotin-modified antibodies to protein-stabilized 30 nm gold nanoparticles, followed by the attachment of avidin-modified Raman-active dyes. As a proof-of-principle experiment, a well-known biomolecular recognition system, IgG with protein A, was chosen to establish this new spectroscopic assay. Highly selective recognition of IgG down to 1 ng/ml in solution has been demonstrated.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

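This paper discusses hybrid pattern recognition systems built by combining modules based on multiple classification methods; such systems differ from ensemble systems that vote on the outputs of multiple classifiers. Two systems are presented: one for printed Chinese character text recognition and one for unconstrained handwritten digit recognition. By combining multiple features and multiple classification methods, using partial recognition information to control the discrimination of easily confused characters, and applying the proposed dynamic template library matching post-processing method, the systems achieve a clear performance improvement over traditional single-classifier systems. Experiments show that mixing multiple methods and multiple strategies is one way to handle complexity and strengthen system robustness.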

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Handheld communication devices have developed rapidly in recent years, gaining users and spreading into new application fields, and have a promising future. This study investigated the acceptance of the multimodal text entry method and users' behavioral characteristics when using it. Based on the general information processing model of a bimodal system and human factors studies of multimodal map systems, the present study focused mainly on the hand-speech bimodal text entry method. To assess acceptance, the first study investigated the subjective perception of speech recognition accuracy using a Wizard of Oz (WOz) experiment and a questionnaire. Results showed a linear relationship between speech recognition accuracy and subjective accuracy. Furthermore, as familiarity increased, the difference between the acceptable accuracy and the subjective accuracy gradually decreased. In addition, the similarity of meaning between the speech recognition output and the correct sentences was an important reference criterion. The second study investigated three aspects of the bimodal text entry method: input, error recovery and modal shifts. The first experiment aimed to identify the behavioral characteristics of users performing error recovery tasks. Results indicated that participants preferred to correct errors by handwriting, regardless of the input modality. The second experiment aimed to identify the behavioral characteristics of users entering various types of text. Results showed that users preferred speech input in both the word and sentence conditions, a preference that was highly consistent across individuals, while no significant difference was found between handwriting and speech input in the character condition. Participants used the direct strategy more often than the jumping strategy to handle mixed text, especially Chinese-English mixed text.
The third experiment examined cognitive load across the different modal shifts; the results suggested significant differences between shifts. Moreover, relatively little time was needed to shift from speech input to hand input. Based on the main findings, the following implications were discussed. First, when evaluating a speech recognition system, it should be kept in mind that speech recognition accuracy is not equal to subjective accuracy. Second, to make a speech input system more acceptable, a good approach is to train users and provide feedback on accuracy during training, which improves familiarity with and sensitivity to the system. Third, both universal and individual behavioral patterns should be taken into account to improve the error recovery method. Fourth, the operations of speech input should be simpler, to ease learning and use. Fifth, a more convenient input method for non-Chinese text entry should be provided. Finally, the shifting time between hand input and speech input provides an important parameter for the design of automatically evoked speech recognition systems.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

We discuss a strategy for visual recognition by forming groups of salient image features, and then using these groups to index into a database to find all of the matching groups of model features. We discuss the most space-efficient method possible for representing 3-D models for indexing from 2-D data, and show how to account for sensing error when indexing. We also present a convex grouping method that is robust and efficient, both theoretically and in practice. Finally, we combine these modules into a complete recognition system, and test its performance on many real images.
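The general idea of indexing with tolerance for sensing error can be pictured with a small hash-table sketch. This is a generic illustration and not the paper's representation of 3-D models: the descriptor values, the group names, the bin size, and the functions `quantize`, `build_index` and `lookup` are all assumptions. Each model group is stored under a quantized descriptor, and a query also probes the neighbouring bins so that a small measurement error does not cause a miss.

```python
from collections import defaultdict
from itertools import product


def quantize(descriptor, bin_size):
    """Map a real-valued descriptor to an integer bin key."""
    return tuple(int(v // bin_size) for v in descriptor)


def build_index(model_groups, bin_size):
    """Store every model group name under the bin of its descriptor."""
    index = defaultdict(list)
    for name, descriptor in model_groups:
        index[quantize(descriptor, bin_size)].append(name)
    return index


def lookup(index, descriptor, bin_size):
    """Collect model groups from the descriptor's bin and all adjacent bins."""
    key = quantize(descriptor, bin_size)
    hits = []
    for offset in product((-1, 0, 1), repeat=len(key)):
        neighbour = tuple(k + o for k, o in zip(key, offset))
        hits.extend(index.get(neighbour, []))
    return hits


# Hypothetical model groups with made-up 2-D descriptors.
model_groups = [("chair-legs", (0.95, 2.10)), ("table-top", (3.40, 0.20))]
index = build_index(model_groups, bin_size=0.5)

# A noisy measurement of the first group falls in a different bin but is still found.
matches = lookup(index, (1.02, 1.98), bin_size=0.5)
```

Probing adjacent bins trades extra lookups for robustness; a larger bin size would reduce the number of probes needed but return more false candidates.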

Relevance:

60.00% 60.00%

Publisher:

Abstract:

This paper introduces BoostMap, a method that can significantly reduce retrieval time in image and video database systems that employ computationally expensive distance measures, metric or non-metric. Database and query objects are embedded into a Euclidean space, in which similarities can be rapidly measured using a weighted Manhattan distance. Embedding construction is formulated as a machine learning task, where AdaBoost is used to combine many simple, 1D embeddings into a multidimensional embedding that preserves a significant amount of the proximity structure in the original space. Performance is evaluated in a hand pose estimation system, and a dynamic gesture recognition system, where the proposed method is used to retrieve approximate nearest neighbors under expensive image and video similarity measures. In both systems, BoostMap significantly increases efficiency, with minimal losses in accuracy. Moreover, the experiments indicate that BoostMap compares favorably with existing embedding methods that have been employed in computer vision and database applications, i.e., FastMap and Bourgain embeddings.
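The retrieval pipeline described above can be sketched as follows. The 1D embeddings used here are of the reference-object kind, F_r(x) = D(x, r), and the per-dimension weights, which BoostMap would learn with AdaBoost, are simply set by hand. The stand-in distance, the toy data, the reference objects and all function names are assumptions made for illustration.

```python
def expensive_distance(a, b):
    """Stand-in for a costly distance measure; here a plain L1 distance."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))


def embed(obj, references, distance):
    """Map obj to a vector of 1D embeddings F_r(obj) = distance(obj, r)."""
    return [distance(obj, r) for r in references]


def weighted_manhattan(u, v, weights):
    """Weighted L1 distance between two embedded vectors."""
    return sum(w * abs(ui - vi) for w, ui, vi in zip(weights, u, v))


def filter_and_refine(query, database, db_vectors, references, weights, distance, keep):
    """Rank by the cheap embedded distance, then re-rank the top `keep` exactly."""
    q = embed(query, references, distance)
    order = sorted(range(len(database)),
                   key=lambda i: weighted_manhattan(q, db_vectors[i], weights))
    candidates = order[:keep]
    return min(candidates, key=lambda i: distance(query, database[i]))


database = [(0, 0), (5, 5), (9, 1), (2, 3)]
references = [(0, 0), (10, 0)]   # hypothetical reference objects
weights = [1.0, 1.0]             # hand-picked; BoostMap would learn these
db_vectors = [embed(o, references, expensive_distance) for o in database]

best = filter_and_refine((2, 2), database, db_vectors, references, weights,
                         expensive_distance, keep=2)
```

The database embeddings are computed once offline, so at query time the expensive measure is evaluated only against the reference objects and the `keep` shortlisted candidates rather than against the whole database.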

Relevance:

60.00% 60.00%

Publisher:

Abstract:

A common design of an object recognition system has two steps, a detection step followed by a foreground within-class classification step. For example, consider face detection by a boosted cascade of detectors followed by face ID recognition via one-vs-all (OVA) classifiers. Another example is human detection followed by pose recognition. Although the detection step can be quite fast, the foreground within-class classification process can be slow and becomes a bottleneck. In this work, we formulate a filter-and-refine scheme, where the binary outputs of the weak classifiers in a boosted detector are used to identify a small number of candidate foreground state hypotheses quickly via Hamming distance or weighted Hamming distance. The approach is evaluated in three applications: face recognition on the FRGC V2 data set, hand shape detection and parameter estimation on a hand data set and vehicle detection and view angle estimation on a multi-view vehicle data set. On all data sets, our approach has comparable accuracy and is at least five times faster than the brute force approach.
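The filtering step can be pictured as a nearest-code lookup: each candidate foreground state has a stored binary code, the weak classifiers' outputs on a detected window form a query code, and only the closest states are passed on to the slower classifier. The sketch below shows that lookup only; the state names, codes and weights are hypothetical, and the refinement stage is omitted.

```python
def hamming(a, b, weights=None):
    """(Weighted) Hamming distance between two equal-length bit sequences."""
    if weights is None:
        weights = [1.0] * len(a)
    return sum(w for w, ai, bi in zip(weights, a, b) if ai != bi)


def shortlist(query_code, state_codes, k, weights=None):
    """Return the k state labels whose stored codes are closest to query_code."""
    ranked = sorted(state_codes,
                    key=lambda s: hamming(query_code, state_codes[s], weights))
    return ranked[:k]


# Hypothetical codes: one bit per weak classifier in the boosted detector.
state_codes = {
    "frontal": (1, 1, 0, 0, 1),
    "left":    (1, 0, 1, 0, 0),
    "right":   (0, 1, 0, 1, 1),
}
query = (1, 1, 0, 1, 1)

# Only these candidates would be handed to the expensive within-class classifier.
candidates = shortlist(query, state_codes, k=2)
```

Because the bits are already produced by the detector, the shortlist costs only a few comparisons per state; the weighted variant lets more reliable weak classifiers count for more.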

Relevance:

60.00% 60.00%

Publisher:

Abstract:

CONFIGR (CONtour FIgure GRound) is a computational model based on principles of biological vision that completes sparse and noisy image figures. Within an integrated vision/recognition system, CONFIGR posits an initial recognition stage which identifies figure pixels from spatially local input information. The resulting, and typically incomplete, figure is fed back to the “early vision” stage for long-range completion via filling-in. The reconstructed image is then re-presented to the recognition system for global functions such as object recognition. In the CONFIGR algorithm, the smallest independent image unit is the visible pixel, whose size defines a computational spatial scale. Once pixel size is fixed, the entire algorithm is fully determined, with no additional parameter choices. Multi-scale simulations illustrate the vision/recognition system. Open-source CONFIGR code is available online, but all examples can be derived analytically, and the design principles applied at each step are transparent. The model balances filling-in as figure against complementary filling-in as ground, which blocks spurious figure completions. Lobe computations occur on a subpixel spatial scale. Originally designed to fill-in missing contours in an incomplete image such as a dashed line, the same CONFIGR system connects and segments sparse dots, and unifies occluded objects from pieces locally identified as figure in the initial recognition stage. The model self-scales its completion distances, filling-in across gaps of any length, where unimpeded, while limiting connections among dense image-figure pixel groups that already have intrinsic form. Long-range image completion promises to play an important role in adaptive processors that reconstruct images from highly compressed video and still camera images.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

A neural network theory of 3-D vision, called FACADE Theory, is described. The theory proposes a solution of the classical figure-ground problem for biological vision. It does so by suggesting how boundary representations and surface representations are formed within a Boundary Contour System (BCS) and a Feature Contour System (FCS). The BCS and FCS interact reciprocally to form 3-D boundary and surface representations that are mutually consistent. Their interactions generate 3-D percepts wherein occluding and occluded objects are completed and grouped. The theory clarifies how preattentive processes of 3-D perception and figure-ground separation interact reciprocally with attentive processes of spatial localization, object recognition, and visual search. A new theory of stereopsis is proposed that predicts how cells sensitive to multiple spatial frequencies, disparities, and orientations are combined by context-sensitive filtering, competition, and cooperation to form coherent BCS boundary segmentations. Several factors contribute to figure-ground pop-out, including: boundary contrast between spatially contiguous boundaries, whether due to scenic differences in luminance, color, spatial frequency, or disparity; partially ordered interactions from larger spatial scales and disparities to smaller scales and disparities; and surface filling-in restricted to regions surrounded by a connected boundary. Phenomena such as 3-D pop-out from a 2-D picture, DaVinci stereopsis, 3-D neon color spreading, completion of partially occluded objects, and figure-ground reversals are analysed. The BCS and FCS sub-systems model aspects of how the two parvocellular cortical processing streams that join the Lateral Geniculate Nucleus to prestriate cortical area V4 interact to generate a multiplexed representation of Form-And-Color-And-Depth, or FACADE, within area V4. Area V4 is suggested to support figure-ground separation and to interact with cortical mechanisms of spatial attention, attentive object learning, and visual search. Adaptive Resonance Theory (ART) mechanisms model aspects of how prestriate visual cortex interacts reciprocally with a visual object recognition system in inferotemporal cortex (IT) for purposes of attentive object learning and categorization. Object attention mechanisms of the What cortical processing stream through IT cortex are distinguished from spatial attention mechanisms of the Where cortical processing stream through parietal cortex. Parvocellular BCS and FCS signals interact with the model What stream. Parvocellular FCS and magnocellular Motion BCS signals interact with the model Where stream. Reciprocal interactions between these visual, What, and Where mechanisms are used to discuss data about visual search and saccadic eye movements, including fast search of conjunctive targets, search of 3-D surfaces, selective search of like-colored targets, attentive tracking of multi-element groupings, and recursive search of simultaneously presented targets.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

We describe a 42.6 Gbit/s all-optical pattern recognition system which uses semiconductor optical amplifiers (SOAs). A circuit with three SOA-based logic gates is used to identify the presence of specific port numbers in an optical packet header.
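In software terms, recognising a specific port number in a header amounts to a bitwise comparison: each header bit is compared with the corresponding target bit, and the per-bit results are combined with AND. The sketch below is only a Boolean picture of that idea. It does not model the SOA circuit, its timing, or the actual arrangement of its three gates, and the 16-bit field width and function names are assumptions.

```python
def xnor(a: int, b: int) -> int:
    """1 when the two bits agree, 0 otherwise."""
    return 1 - (a ^ b)


def port_to_bits(port: int, width: int = 16):
    """Most-significant-bit-first list of bits for a port number."""
    return [(port >> i) & 1 for i in reversed(range(width))]


def matches(header_bits, target_bits) -> bool:
    """AND together the bit-by-bit agreements between header and target."""
    if len(header_bits) != len(target_bits):
        raise ValueError("bit sequences must have the same length")
    result = 1
    for h, t in zip(header_bits, target_bits):
        result &= xnor(h, t)
    return bool(result)


target = port_to_bits(80)              # hypothetical target port
hit = matches(port_to_bits(80), target)
miss = matches(port_to_bits(443), target)
```

The running AND means a single disagreeing bit is enough to reject the header, which is why the comparison can be carried out serially as the bits arrive.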

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Despite over seven decades of speciation research and 25 years of phylogeographic studies, a comprehensive understanding of mechanisms that generate biological species remains elusive. In temperate zones, the pervasiveness of range fragmentation and subsequent range expansions suggests that secondary contact between diverging lineages may be important in the evolution of species. Thus, such contact zones provide compelling opportunities to investigate evolutionary processes, particularly the roles of geographical isolation in initiating, and indirect selection against hybrids in completing (reinforcement), the evolution of reproductive isolation and speciation. The spring peeper (Pseudacris crucifer) has six well-supported mitochondrial lineages many of which are now in secondary contact. Here I investigate the evolutionary consequences of secondary contact of two such lineages (Eastern and Interior) in Southwestern Ontario using genetic, morphological, acoustical, experimental, and behavioural evidence to show accentuated divergence of the mate recognition system in sympatry. Mitochondrial and microsatellite data distinguish these two lineages but also show ongoing hybridization. Bayesian assignment tests and cline analysis imply asymmetrical introgression of Eastern lineage nuclear markers into Interior populations. Male calls are divergent between Eastern and Interior allopatric populations and show asymmetrical reproductive character displacement in sympatry. Female preference of pure lineage individuals is also exaggerated in sympatry, with hybrids showing intermediate traits and preference. I suggest that these patterns are most consistent with secondary reinforcement. I assessed levels of post-zygotic isolation between the Eastern and Interior lineages using a laboratory hybridization experiment. Hybrid tadpoles showed equal to or greater fitness than their pure lineage counterparts, but this may be countered through competition. 
More deformities and developmental anomalies in hybrid tadpoles further suggest post-zygotic isolation. Despite evidence for pre-mating isolation between the two lineages, isolation appears incomplete (i.e. hybridization is ongoing). I hypothesize that potentially less attractive hybrids may circumvent female choice by adopting satellite behaviour. Although mating tactics are related to body size, genetic status may play a role. I show that pure Eastern males almost always engage in calling, while hybrids adopt a satellite tactic. An absence of assortative mating, despite evidence of female preference, suggests successful satellite interception possibly facilitating introgression.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

This paper presents a new approach to speech enhancement from single-channel measurements involving both noise and channel distortion (i.e., convolutional noise), and demonstrates its applications for robust speech recognition and for improving noisy speech quality. The approach is based on finding longest matching segments (LMS) from a corpus of clean, wideband speech. The approach adds three novel developments to our previous LMS research. First, we address the problem of channel distortion as well as additive noise. Second, we present an improved method for modeling noise for speech estimation. Third, we present an iterative algorithm which updates the noise and channel estimates of the corpus data model. In experiments using speech recognition as a test with the Aurora 4 database, the use of our enhancement approach as a preprocessor for feature extraction significantly improved the performance of a baseline recognition system. In another comparison against conventional enhancement algorithms, both the PESQ and the segmental SNR ratings of the LMS algorithm were superior to the other methods for noisy speech enhancement.
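The notion of a longest matching segment can be illustrated with a much simplified search. The approach in the abstract matches noisy speech against a clean corpus under noise and channel distortion, which this sketch does not attempt; it only shows the search for the longest shared run, using exact matches between sequences of discrete frame labels. The labels, the sequences and the function name are hypothetical.

```python
def longest_matching_segment(noisy, corpus):
    """Return (start_in_noisy, start_in_corpus, length) of the longest shared run."""
    best = (0, 0, 0)
    prev = [0] * (len(corpus) + 1)
    for i in range(1, len(noisy) + 1):
        curr = [0] * (len(corpus) + 1)
        for j in range(1, len(corpus) + 1):
            if noisy[i - 1] == corpus[j - 1]:
                # Extend the run that ended at the previous frame of both sequences.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best[2]:
                    best = (i - curr[j], j - curr[j], curr[j])
        prev = curr
    return best


# Hypothetical frame labels; "x" stands for a frame corrupted beyond recognition.
noisy = ["a", "b", "c", "x", "d", "e"]
corpus = ["q", "a", "b", "c", "d", "e", "f"]
segment = longest_matching_segment(noisy, corpus)
```

Preferring longer matches is the point of the idea: a long run of agreeing frames is less likely to arise by chance than a single matching frame, so it is a more trustworthy basis for estimating the underlying clean speech.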

Relevance:

60.00% 60.00%

Publisher:

Abstract:

This paper presents a new approach to single-channel speech enhancement involving both noise and channel distortion (i.e., convolutional noise). The approach is based on finding longest matching segments (LMS) from a corpus of clean, wideband speech. The approach adds three novel developments to our previous LMS research. First, we address the problem of channel distortion as well as additive noise. Second, we present an improved method for modeling noise. Third, we present an iterative algorithm for improved speech estimates. In experiments using speech recognition as a test with the Aurora 4 database, the use of our enhancement approach as a preprocessor for feature extraction significantly improved the performance of a baseline recognition system. In another comparison against conventional enhancement algorithms, both the PESQ and the segmental SNR ratings of the LMS algorithm were superior to the other methods for noisy speech enhancement. Index Terms: corpus-based speech model, longest matching segment, speech enhancement, speech recognition