10 results for Inventory-style speech enhancement
at Massachusetts Institute of Technology
Abstract:
Sketches are commonly used in the early stages of design. Our previous system allows users to sketch mechanical systems that the computer interprets. However, some parts of the mechanical system might be too hard or too complicated to express in the sketch. Adding speech recognition to create a multimodal system would move us toward our goal of creating a more natural user interface. This thesis examines the relationship between the verbal and sketch input, particularly how to segment and align the two inputs. Toward this end, subjects were recorded while they sketched and talked. These recordings were transcribed, and a set of rules to perform segmentation and alignment was created. These rules represent the knowledge that the computer needs to perform segmentation and alignment. The rules successfully interpreted the 24 data sets that they were given.
Abstract:
We present an unsupervised learning algorithm that acquires a natural-language lexicon from raw speech. The algorithm is based on the optimal encoding of symbol sequences in an MDL framework, and uses a hierarchical representation of language that overcomes many of the problems that have stymied previous grammar-induction procedures. The forward mapping from symbol sequences to the speech stream is modeled using features based on articulatory gestures. We present results on the acquisition of lexicons and language models from raw speech, text, and phonetic transcripts, and demonstrate that our algorithm compares very favorably to other reported results with respect to segmentation performance and statistical efficiency.
Abstract:
We present MikeTalk, a text-to-audiovisual speech synthesizer which converts input text into an audiovisual speech stream. MikeTalk is built using visemes, which are a small set of images spanning a large range of mouth shapes. The visemes are acquired from a recorded visual corpus of a human subject which is specifically designed to elicit one instantiation of each viseme. Using optical flow methods, correspondence from every viseme to every other viseme is computed automatically. By morphing along this correspondence, a smooth transition between viseme images may be generated. A complete visual utterance is constructed by concatenating viseme transitions. Finally, phoneme and timing information extracted from a text-to-speech synthesizer is exploited to determine which viseme transitions to use, and the rate at which the morphing process should occur. In this manner, we are able to synchronize the visual speech stream with the audio speech stream, and hence give the impression of a photorealistic talking face.
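The last step described above, using phoneme and timing information to pick viseme transitions and their morphing rate, can be sketched roughly as follows. This is only an illustration: the phoneme-to-viseme table, the viseme names, the frame rate, and the choice to let the target phoneme's duration set the transition length are all assumptions, not details taken from MikeTalk.

```python
# Hypothetical phoneme-to-viseme table; the real system's viseme set is not given here.
PHONEME_TO_VISEME = {
    "m": "closed", "b": "closed", "p": "closed",
    "aa": "open", "iy": "spread", "uw": "rounded",
}


def transition_schedule(phonemes, fps=30):
    """Build a list of viseme transitions from timed phonemes.

    phonemes: list of (phoneme, duration_in_seconds) pairs, as a
              text-to-speech front end might supply them.
    Returns a list of (from_viseme, to_viseme, alphas), where alphas are
    morph parameters in (0, 1], one per video frame of that transition.
    """
    visemes = [(PHONEME_TO_VISEME[p], d) for p, d in phonemes]
    schedule = []
    # Each consecutive pair of visemes yields one transition.
    for (v_from, _), (v_to, duration) in zip(visemes, visemes[1:]):
        # Assumption: the target phoneme's duration sets the morphing rate.
        n_frames = max(1, round(duration * fps))
        alphas = [(k + 1) / n_frames for k in range(n_frames)]
        schedule.append((v_from, v_to, alphas))
    return schedule


if __name__ == "__main__":
    for step in transition_schedule([("m", 0.1), ("aa", 0.1), ("uw", 0.2)], fps=10):
        print(step)
```

Each alpha would then drive the morph between the two viseme images along their precomputed optical-flow correspondence; that image-warping step is omitted here.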
Abstract:
With many visual speech animation techniques now available, there is a clear need for systematic perceptual evaluation schemes. We describe here our scheme and its application to a new video-realistic (potentially indistinguishable from real recorded video) visual-speech animation system, called Mary 101. Two types of experiments were performed: a) distinguishing visually between real and synthetic image-sequences of the same utterances ("Turing tests"), and b) gauging visual speech recognition by comparing lip-reading performance on the real and synthetic image-sequences of the same utterances ("Intelligibility tests"). Subjects who were presented randomly with either real or synthetic image-sequences could not tell the synthetic from the real sequences above chance level. The same subjects, when asked to lip-read the utterances from the same image-sequences, recognized speech from real image-sequences significantly better than from synthetic ones. However, performance for both real and synthetic sequences was at levels suggested in the literature on lip-reading. We conclude from the two experiments that the animation of Mary 101 is adequate for providing a percept of a talking head. However, additional effort is required to improve the animation for lip-reading purposes such as rehabilitation and language learning. In addition, these two tasks could be considered as explicit and implicit perceptual discrimination tasks. In the explicit task (a), each stimulus is classified directly as a synthetic or real image-sequence by detecting a possible difference between the synthetic and the real image-sequences. The implicit perceptual discrimination task (b) consists of a comparison between visual recognition of speech of real and synthetic image-sequences. Our results suggest that implicit perceptual discrimination is a more sensitive method for discrimination between synthetic and real image-sequences than explicit perceptual discrimination.
Abstract:
The Lean Aircraft Initiative began in the summer of 1992 as a “quick look” into the feasibility of applying manufacturing principles that had been pioneered in the automobile industry, most notably the Toyota Production System, to the U.S. defense aircraft industry. Once it was established that “lean principles” (the term coined to describe the new paradigm in automobile manufacturing) were indeed applicable to aircraft manufacturing as well, the Initiative was broadened to include other segments of the defense aerospace industry. These consisted of electronics/avionics, engines, electro-mechanical systems, missiles, and space systems manufacturers. In early 1993, a formal framework was established in which 21 defense firms and the Air Force formed a consortium to support and participate in the Initiative at M.I.T.
Abstract:
We enhance photographs shot in dark environments by combining a picture taken with the available light and one taken with the flash. We preserve the ambiance of the original lighting and insert the sharpness from the flash image. We use the bilateral filter to decompose the images into detail and large scale. We reconstruct the image using the large scale of the available lighting and the detail of the flash. We detect and correct flash shadows. This combines the advantages of available illumination and flash photography.
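The decompose-and-recombine idea above can be illustrated on a one-dimensional signal. The sketch below is a simplification under several assumptions that are not stated in the abstract: it works on a single row of intensities rather than an image, uses an additive split (detail = signal minus large scale), picks arbitrary filter parameters, and leaves out the flash-shadow detection and correction step entirely.

```python
import math


def bilateral_1d(signal, sigma_space=2.0, sigma_range=0.1, radius=4):
    """Edge-preserving smoothing of a 1-D list of intensities.

    Each output sample is a weighted mean of its neighbours, where the
    weight falls off with both spatial distance and intensity difference.
    """
    n = len(signal)
    out = []
    for i in range(n):
        num = 0.0
        den = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            w_space = math.exp(-((i - j) ** 2) / (2 * sigma_space ** 2))
            w_range = math.exp(-((signal[i] - signal[j]) ** 2) / (2 * sigma_range ** 2))
            w = w_space * w_range
            num += w * signal[j]
            den += w
        out.append(num / den)  # den >= 1 because j == i contributes weight 1
    return out


def combine_flash_ambient(ambient, flash, **filter_params):
    """Large scale from the available-light signal, detail from the flash signal."""
    base_ambient = bilateral_1d(ambient, **filter_params)
    base_flash = bilateral_1d(flash, **filter_params)
    # Assumed additive decomposition: detail is what the filter removed.
    detail_flash = [f - b for f, b in zip(flash, base_flash)]
    return [b + d for b, d in zip(base_ambient, detail_flash)]


if __name__ == "__main__":
    ambient = [0.20, 0.22, 0.19, 0.21, 0.60, 0.61, 0.59, 0.60]  # dim, noisy
    flash = [0.80, 0.85, 0.78, 0.83, 0.95, 0.90, 0.97, 0.92]    # bright, detailed
    print(combine_flash_ambient(ambient, flash))
```

The result follows the overall brightness levels of the available-light input while carrying the fine variation of the flash input.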
Abstract:
We analyze a finite horizon, single product, periodic review model in which pricing and production/inventory decisions are made simultaneously. Demands in different periods are random variables that are independent of each other and their distributions depend on the product price. Pricing and ordering decisions are made at the beginning of each period and all shortages are backlogged. Ordering cost includes both a fixed cost and a variable cost proportional to the amount ordered. The objective is to find an inventory policy and a pricing strategy maximizing expected profit over the finite horizon. We show that when the demand model is additive, the profit-to-go functions are k-concave and hence an (s,S,p) policy is optimal. In such a policy, the period inventory is managed based on the classical (s,S) policy and price is determined based on the inventory position at the beginning of each period. For more general demand functions, i.e., multiplicative plus additive functions, we demonstrate that the profit-to-go function is not necessarily k-concave and an (s,S,p) policy is not necessarily optimal. We introduce a new concept, symmetric k-concave functions, and apply it to provide a characterization of the optimal policy.
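To make the shape of an (s,S,p) policy concrete, here is a minimal simulation sketch. It does not compute optimal parameters; s, S, and the price rule are supplied by the caller. The strict-inequality reorder trigger, the additive demand form a - b*price + noise, the default coefficients, and looking up the price from the post-order inventory level are all illustrative assumptions.

```python
def ssp_decision(inventory, s, S, price_for_level):
    """One period of an (s, S, p) policy.

    If inventory is below the reorder point s, order up to S; otherwise
    order nothing. The price is then looked up from the resulting level.
    price_for_level is a caller-supplied (hypothetical) pricing rule.
    """
    order = S - inventory if inventory < s else 0
    level = inventory + order
    return order, price_for_level(level)


def simulate(inventory, s, S, price_for_level, noise, a=20.0, b=1.0):
    """Run the policy over len(noise) periods with additive demand.

    Demand in each period is a - b * price + eps (an assumed form).
    Shortages are backlogged, so inventory may go negative.
    Returns a list of (order, price, end_of_period_inventory).
    """
    history = []
    for eps in noise:
        order, price = ssp_decision(inventory, s, S, price_for_level)
        demand = a - b * price + eps
        inventory = inventory + order - demand
        history.append((order, price, inventory))
    return history


if __name__ == "__main__":
    # Flat price of 10 regardless of inventory, and no demand noise.
    for row in simulate(0, s=5, S=20, price_for_level=lambda y: 10.0, noise=[0, 0, 0]):
        print(row)
```

With a flat price and zero noise, the run alternates between ordering up to S and ordering nothing, which is the familiar (s,S) sawtooth.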
Abstract:
We analyze an infinite horizon, single product, periodic review model in which pricing and production/inventory decisions are made simultaneously. Demands in different periods are identically distributed random variables that are independent of each other and their distributions depend on the product price. Pricing and ordering decisions are made at the beginning of each period and all shortages are backlogged. Ordering cost includes both a fixed cost and a variable cost proportional to the amount ordered. The objective is to maximize expected discounted or expected average profit over the infinite planning horizon. We show that a stationary (s,S,p) policy is optimal for both the discounted and average profit models with general demand functions. In such a policy, the period inventory is managed based on the classical (s,S) policy and price is determined based on the inventory position at the beginning of each period.
Abstract:
Traditional inventory models focus on risk-neutral decision makers, i.e., characterizing replenishment strategies that maximize expected total profit, or equivalently, minimize expected total cost over a planning horizon. In this paper, we propose a framework for incorporating risk aversion in multi-period inventory models as well as multi-period models that coordinate inventory and pricing strategies. In each case, we characterize the optimal policy for various measures of risk that have been commonly used in the finance literature. In particular, we show that the structure of the optimal policy for a decision maker with exponential utility functions is almost identical to the structure of the optimal risk-neutral inventory (and pricing) policies. Computational results demonstrate the importance of this approach not only to risk-averse decision makers, but also to risk-neutral decision makers with limited information on the demand distribution.
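As a small illustration of how an exponential-utility decision maker differs from a risk-neutral one, the sketch below compares the mean of a set of profit outcomes with their certainty equivalent under the standard exponential utility U(x) = -exp(-gamma * x). The sample profits and the risk-aversion coefficient gamma are made-up values, and this is not the paper's multi-period framework, only the single-number comparison that underlies it.

```python
import math


def expected_profit(profits):
    """Risk-neutral criterion: the plain average of equally likely outcomes."""
    return sum(profits) / len(profits)


def certainty_equivalent(profits, gamma):
    """Certainty equivalent under exponential utility U(x) = -exp(-gamma * x).

    This is the sure amount whose utility equals the expected utility of the
    equally likely outcomes: CE = -(1 / gamma) * ln(mean(exp(-gamma * x))).
    gamma > 0 is the (assumed) risk-aversion coefficient.
    """
    mean_exp = sum(math.exp(-gamma * x) for x in profits) / len(profits)
    return -math.log(mean_exp) / gamma


if __name__ == "__main__":
    risky = [0.0, 20.0]   # hypothetical profit outcomes, equally likely
    safe = [10.0, 10.0]
    print(expected_profit(risky), certainty_equivalent(risky, gamma=0.1))
    print(expected_profit(safe), certainty_equivalent(safe, gamma=0.1))
```

Both sets of outcomes have the same mean, but the risky one has a lower certainty equivalent, so a risk-averse decision maker ranks them differently while a risk-neutral one does not.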