968 results for Word recognition.
Abstract:
The use of the PC and Internet for placing telephone calls will present new opportunities to capture vast amounts of un-transcribed speech for a particular speaker. This paper investigates how to best exploit this data for speaker-dependent speech recognition. Supervised and unsupervised experiments in acoustic model and language model adaptation are presented. Using one hour of automatically transcribed speech per speaker with a word error rate of 36.0%, unsupervised adaptation resulted in an absolute gain of 6.3%, equivalent to 70% of the gain from the supervised case, with additional adaptation data likely to yield further improvements. LM adaptation experiments suggested that although there seems to be a small degree of speaker idiolect, adaptation to the speaker alone, without considering the topic of the conversation, is in itself unlikely to improve transcription accuracy.
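The figures in this abstract can be made concrete with a small sketch. The function below computes word error rate in the usual way (word-level edit distance divided by reference length); the abstract does not say how its scoring was done, so this is an assumption. The helper `implied_supervised_gain` is a hypothetical name, and the roughly 9.0-point supervised gain it returns is only what the quoted 6.3% and 70% imply, not a number reported in the abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def implied_supervised_gain(unsupervised_gain: float, fraction: float) -> float:
    """Hypothetical helper: supervised gain implied by 'unsupervised = fraction * supervised'."""
    return unsupervised_gain / fraction


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    print(implied_supervised_gain(6.3, 0.70))
```

Because the 70% figure is presumably rounded, the implied supervised gain should be read as approximate.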
Abstract:
Written Thai is one of the languages that does not mark word boundaries. To discover the meaning of a document, the text must first be separated into syllables, words, sentences, and paragraphs. This paper develops a novel method for segmenting Thai text that combines a non-dictionary-based technique with a dictionary-based technique. The method first applies Thai grammar rules to the text to identify syllables. A hidden Markov model is then used to merge candidate syllables into words. The identified words are verified against a lexical dictionary, and a decision tree is employed to discover words that the dictionary does not identify. Documents used in the litigation process of Thai court proceedings were used in the experiments. The word segmentations obtained by the proposed method outperform those obtained by other existing methods.
Abstract:
The increasing diversity of the Internet has created a vast number of multilingual resources on the Web. A huge number of these documents are written in languages other than English. Consequently, the demand for searching in non-English languages is growing exponentially. It is desirable that a search engine can search for information over collections of documents in other languages. This research investigates techniques for developing high-quality Chinese information retrieval systems. A distinctive feature of Chinese text is that a Chinese document is a sequence of Chinese characters with no space or boundary between Chinese words. This feature makes Chinese information retrieval more difficult, since a retrieved document that contains the query term as a sequence of Chinese characters may not really be relevant to the query: the query term (as a sequence of Chinese characters) may not be a valid Chinese word in that document. On the other hand, a document that is actually relevant may not be retrieved because it does not contain the query sequence but contains other relevant words. In this research, we propose two approaches to deal with these problems. In the first approach, we propose a hybrid Chinese information retrieval model that incorporates word-based techniques into the traditional character-based techniques. The aim of this approach is to investigate the influence of Chinese segmentation on the performance of Chinese information retrieval. Two ranking methods are proposed to rank retrieved documents by their relevance to the query, calculated by combining character-based ranking and word-based ranking. Our experimental results show that Chinese segmentation can improve the performance of Chinese information retrieval, but the improvement is not significant if only Chinese segmentation is incorporated into the traditional character-based approach.
In the second approach, we propose a novel query expansion method that applies text mining techniques to find the most relevant words with which to extend the query. Unlike most existing query expansion methods, which generally select highly frequent indexing terms from the retrieved documents to expand the query, our approach uses text mining techniques to find patterns in the retrieved documents that correlate highly with the query term and then uses the relevant words in those patterns to expand the original query. This research project develops and implements a Chinese information retrieval system for evaluating the proposed approaches. There are two stages in the experiments. The first stage investigates whether high-accuracy segmentation can improve Chinese information retrieval. In the second stage, a text mining based query expansion approach is implemented, and a further experiment compares the performance of the standard Rocchio approach with that of the proposed text mining based query expansion method. The NTCIR5 Chinese collections are used in the experiments. The experimental results show that incorporating the text mining based query expansion into the hybrid model achieves a significant improvement in both precision and recall.
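For readers unfamiliar with the Rocchio baseline mentioned above, the following is a minimal sketch of the textbook Rocchio update over sparse term-weight vectors. It is not the text mining based method proposed in the abstract. The function name `rocchio_expand`, the dictionary representation, and the default values of `alpha`, `beta` and `gamma` are assumptions chosen for illustration.

```python
from collections import defaultdict


def rocchio_expand(query, relevant_docs, nonrelevant_docs,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Textbook Rocchio update on sparse vectors (dicts mapping term -> weight).

    new_query = alpha * query
              + beta  * mean(relevant_docs)
              - gamma * mean(nonrelevant_docs)
    Terms whose final weight is not positive are dropped.
    """
    new_query = defaultdict(float)
    for term, weight in query.items():
        new_query[term] += alpha * weight
    for doc in relevant_docs:
        for term, weight in doc.items():
            new_query[term] += beta * weight / len(relevant_docs)
    for doc in nonrelevant_docs:
        for term, weight in doc.items():
            new_query[term] -= gamma * weight / len(nonrelevant_docs)
    return {term: weight for term, weight in new_query.items() if weight > 0}


if __name__ == "__main__":
    expanded = rocchio_expand(
        {"car": 1.0},
        relevant_docs=[{"car": 1.0, "engine": 1.0}, {"car": 1.0, "engine": 1.0}],
        nonrelevant_docs=[{"boat": 1.0}],
    )
    print(expanded)
```

In this toy call the term "engine" enters the expanded query because it occurs in the documents judged relevant, while "boat" is pushed below zero and discarded. Rocchio selects expansion terms by their weights in the retrieved documents, which is the behaviour the abstract contrasts with its pattern-based selection.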
Abstract:
Despite the increasing recognition of the importance of word of mouth as an integral component of a firm's marketing efforts, there has been little emphasis on developing suitable guidelines for entrepreneurs who wish to leverage scarce resources by pursuing more innovative marketing techniques. In addition, although there has been a great deal of research into the nature of social networks and interpersonal communication via word of mouth, there have been few attempts to link this research with the firm's marketing strategy. In this paper, we consider the diffusion of innovation literature and recent research into social network structure and propose a framework that may be useful for enhancing the marketing efforts of entrepreneurial firms.
Abstract:
In an automotive environment, the performance of a speech recognition system is affected by environmental noise if the speech signal is acquired directly from a microphone. Speech enhancement techniques are therefore necessary to improve the speech recognition performance. In this paper, a field-programmable gate array (FPGA) implementation of dual-microphone delay-and-sum beamforming (DASB) for speech enhancement is presented. As the first step towards a cost-effective solution, the implementation described in this paper uses a relatively high-end FPGA device to facilitate the verification of various design strategies and parameters. Experimental results show that the proposed design can produce output waveforms close to those generated by a theoretical (floating-point) model with modest usage of FPGA resources. Speech recognition experiments are also conducted on enhanced in-car speech waveforms produced by the FPGA in order to compare recognition performance with the floating-point representation running on a PC.
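The idea behind dual-microphone delay-and-sum beamforming can be sketched in a few lines: one channel is shifted so that the speech arrives in both channels at the same sample index, and the aligned channels are averaged, which reinforces the speech relative to uncorrelated noise. The sketch below assumes an integer sample delay that is already known and uses ordinary Python floats. It is not the FPGA design described in the abstract, and the function name `delay_and_sum` is hypothetical.

```python
def delay_and_sum(mic_a, mic_b, delay_samples):
    """Two-microphone delay-and-sum with a known integer delay.

    delay_samples > 0 means the source reaches mic_b that many samples
    later than mic_a, so mic_b[i + delay_samples] lines up with mic_a[i].
    Samples that fall outside the recording are treated as zero.
    """
    if len(mic_a) != len(mic_b):
        raise ValueError("both channels must have the same length")
    n = len(mic_a)
    output = []
    for i in range(n):
        j = i + delay_samples
        aligned_b = mic_b[j] if 0 <= j < n else 0.0
        output.append(0.5 * (mic_a[i] + aligned_b))
    return output


if __name__ == "__main__":
    source = [0, 1, 0, -1, 0, 1, 0, -1]
    mic_a = source
    mic_b = [0, 0] + source[:-2]   # same signal arriving two samples later
    print(delay_and_sum(mic_a, mic_b, 2))
```

The last `delay_samples` output values are attenuated because the shifted channel has run out of data there. A practical design would also need fractional delays and fixed-point arithmetic, which this sketch leaves out.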
Abstract:
The purpose of this chapter is to describe the use of caricatured contrasting scenarios (Bødker, 2000) and how they can be used to consider potential designs for disruptive technologies. The disruptive technology in this case is Automatic Speech Recognition (ASR) software in workplace settings. The particular workplace is the Magistrates Court of the Australian Capital Territory. Caricatured contrasting scenarios are ideally suited to exploring how ASR might be implemented in a particular setting because they allow potential implementations to be “sketched” quickly and with little effort. This sketching of potential interactions and the emphasis of both positive and negative outcomes allows the benefits and pitfalls of design decisions to become apparent. A brief description of the Court is given, describing the reasons for choosing the Court for this case study. The work of the Court is framed as taking place in two modes: front of house, where the courtroom itself is, and backstage, where documents are processed and the business of the court is recorded and encoded into various systems. Caricatured contrasting scenarios describing the introduction of ASR to the front of house are presented and then analysed. These scenarios show that the introduction of ASR to the court would be highly problematic. The final section describes how ASR could be re-imagined in order to make it useful for the court. A final scenario is presented that describes how this re-imagined ASR could be integrated into both the front of house and backstage of the court in a way that could strengthen both processes.
Abstract:
Identifying an individual from surveillance video is a difficult, time-consuming and labour-intensive process. The proposed system aims to streamline this process by filtering out unwanted scenes and enhancing an individual's face through super-resolution. An automatic face recognition system is then used to identify the subject or present the human operator with likely matches from a database. A person tracker is used to speed up the subject detection and super-resolution process by tracking moving subjects and cropping a region of interest around the subject's face, which reduces both the number and the size of the image frames to be super-resolved. In this paper, experiments have been conducted to demonstrate how the optical flow super-resolution method used improves surveillance imagery for visual inspection as well as automatic face recognition on an Eigenface and Elastic Bunch Graph Matching system. The optical flow based method has also been benchmarked against the "hallucination" algorithm, interpolation methods and the original low-resolution images. Results show that both super-resolution algorithms improved recognition rates significantly. Although the hallucination method resulted in slightly higher recognition rates, the optical flow method produced fewer artifacts and more visually correct images suitable for human consumption.
Abstract:
Principal Topic: Project structures are often created by entrepreneurs and large corporate organizations to develop new products. Since new product development projects (NPDP) are more often situated within a larger organization, intrapreneurship or corporate entrepreneurship plays an important role in bringing these projects to fruition. Since NPDP often involve the development of a new product using immature technology, we describe the development of one such technology. The Joint Strike Fighter (JSF) F-35 aircraft is being developed by the U.S. Department of Defense and eight allied nations. In 2001 Lockheed Martin won a $19 billion contract to develop an affordable, stealthy and supersonic all-weather strike fighter designed to replace a wide range of aging fighter aircraft. In this research we define a complex project as one that demonstrates a number of sources of uncertainty to a degree, or level of severity, that makes it extremely difficult to predict project outcomes or to control or manage the project (Remington & Zolin, Forthcoming). Project complexity has been conceptualized by Remington and Pollock (2007) in terms of four major sources of complexity: temporal, directional, structural and technological complexity (see Figure 1). Temporal complexity exists when projects experience significant environmental change outside the direct influence or control of the project. The Global Economic Crisis of 2008-2009 is a good example of the type of environmental change that can make a project complex; in the JSF project, for example, project managers attempt to respond to changes in interest rates, international currency exchange rates and commodity prices. Directional complexity exists in a project where stakeholders' goals are unclear or undefined, where progress is hindered by unknown political agendas, or where stakeholders disagree about or misunderstand project goals.
In the JSF project all the services and all nine countries have to agree to the specifications of the three variants of the aircraft: Conventional Take Off and Landing (CTOL), Short Take Off/Vertical Landing (STOVL) and the Carrier Variant (CV). Because the Navy requires a plane that can take off from and land on an aircraft carrier, a special variant of the aircraft design was needed, adding complexity to the project. Technical complexity occurs in a project using technology that is immature or where design characteristics are unknown or untried. Developing a plane that can take off on a very short runway and land vertically created many highly interdependent technological challenges: correctly locating, directing and balancing the lift fans, modulating the airflow, and providing an equivalent amount of thrust from the downward-vectored rear exhaust to lift the aircraft while controlling engine temperatures. These technological challenges make costing and scheduling equally challenging. Structural complexity in a project comes from the sheer numbers of elements such as the number of people, teams or organizations involved, ambiguity regarding the elements, and the massive degree of interconnectedness between them. While Lockheed Martin is the prime contractor, it is assisted in major aspects of the JSF development by Northrop Grumman, BAE Systems, Pratt & Whitney, the GE/Rolls-Royce Fighter Engine Team and innumerable subcontractors. In addition to identifying opportunities to achieve project goals, complex projects also need to identify and exploit opportunities to increase agility in response to changing stakeholder demands or to reduce project risks. Complexity Leadership Theory contends that in complex environments adaptive and enabling leadership are needed (Uhl-Bien, Marion and McKelvey, 2007).
Adaptive leadership facilitates creativity, learning and adaptability, while enabling leadership handles the conflicts that inevitably arise between adaptive leadership and traditional administrative leadership (Uhl-Bien and Marion, 2007). Hence, adaptive leadership involves the recognition of opportunities to adapt, while enabling leadership involves the exploitation of these opportunities. Our research questions revolve around the type or source of complexity and its relationship to opportunity recognition and exploitation. For example, is it only external environmental complexity that creates the need for entrepreneurial behaviours such as opportunity recognition and opportunity exploitation? Do the internal dimensions of project complexity, such as technological and structural complexity, also create the need for opportunity recognition and opportunity exploitation? The Kropp, Zolin and Lindsay model (2009) describes a relationship between entrepreneurial orientation (EO), opportunity recognition (OR), and opportunity exploitation (OX) in complex projects, with environmental and organizational contextual variables as moderators. We extend their model by defining the effects of external complexity and internal complexity on OR and OX. Methodology/Key Propositions: When the environment is complex, EO is more likely to result in OR because project members will be actively looking for solutions to problems created by environmental change. But in projects that are technologically or structurally complex, project leaders and members may try to make the minimum changes possible to reduce the risk of creating new problems due to delays or schedule changes. In projects with environmental or technological complexity, project leaders who encourage the innovativeness dimension of EO will increase OR.
But in projects with technical or structural complexity, innovativeness will not necessarily result in the recognition and exploitation of opportunities, because of the overriding importance of maintaining stability in the highly intricate and interconnected project structure. We propose that in projects where environmental complexity creates the need for change and innovation, project leaders who are willing to accept and manage risk are more likely to identify opportunities to increase project effectiveness and efficiency. In contrast, in projects with internal complexity a much higher willingness to accept risk will be necessary to trigger opportunity recognition. In structurally complex projects we predict that a relationship between risk taking and OP is less likely to be found. When the environment is complex and a project has autonomy, its leaders will be motivated to execute opportunities to improve the project's performance. In contrast, when the project has high internal complexity, they will be more cautious in execution. When a project experiences high competitive aggressiveness and its environment is complex, project leaders will be motivated to execute opportunities to improve the project's performance. In contrast, when the project has high internal complexity, they will be more cautious in execution. This paper reports the first stage of a three-year study into the behaviours of managers, leaders and team members of complex projects. We conduct a qualitative study involving a Group Discussion with experienced project leaders. The objective is to determine how leaders of large and potentially complex projects perceive that external and internal complexity will influence the effects of EO on OR. Results and Implications: These results will help identify and distinguish the impact of external and internal complexity on entrepreneurial behaviours in NPDP.
Project managers will be better able to quickly decide how and when to respond to changes in the environment and internal project events.
Abstract:
Recovering position from sensor information is an important problem in mobile robotics, known as localisation. Localisation requires a map or some other description of the environment to provide the robot with a context in which to interpret sensor data. The mobile robot system under discussion uses an artificial neural representation of position. Building a geometrical map of the environment with a single camera and artificial neural networks is difficult. Instead, it would be simpler to learn position as a function of the visual input. Usually when learning images, an intermediate representation is employed. An appropriate starting point for biologically plausible image representation is the complex cells of the visual cortex, which have invariance properties that appear useful for localisation. The effectiveness of two different complex cell models for localisation is evaluated. Finally, the ability of a simple neural network with single-shot learning to recognise these representations and localise a robot is examined.
Abstract:
The paper presents a fast and robust stereo object recognition method. The method is currently unable to identify the rotation of objects. This makes it well suited to locating spheres, whose appearance is independent of rotation. Approximate methods for locating non-spherical objects have been developed. Fundamental to the method is that the correspondence problem is solved using information about the dimensions of the object being located. This is in contrast to previous stereo object recognition systems, in which the scene is first reconstructed by point matching techniques. The method is suitable for real-time application on low-power devices.
Abstract:
Acoustically, car cabins are extremely noisy and, as a consequence, audio-only in-car voice recognition systems perform poorly. As the visual modality is immune to acoustic noise, using the visual lip information from the driver is seen as a viable strategy for circumventing this problem by means of audio visual automatic speech recognition (AVASR). However, implementing AVASR requires a system that can accurately locate and track the driver's face and lip area in real time. In this paper we present such an approach using the Viola-Jones algorithm. Using the AVICAR [1] in-car database, we show that the Viola-Jones approach is a suitable method of locating and tracking the driver's lips for an audio-visual speech recognition system, despite the visual variability of illumination and head pose.
Abstract:
Several approaches have been proposed to recognize handwritten Bengali characters using different curve fitting algorithms and curvature analysis. In this paper, a new algorithm (Curve-fitting Algorithm) to identify the various strokes of a handwritten character is developed. The curve-fitting algorithm helps recognize strokes of different patterns (line, quadratic curve) precisely. This greatly reduces the burden of error elimination. Implementation of this Modified Syntactic Method demonstrates significant improvement in the recognition of Bengali handwritten characters.
Abstract:
Acoustically, car cabins are extremely noisy and, as a consequence, existing audio-only speech recognition systems for voice-based control of vehicle functions, such as the GPS-based navigator, perform poorly. Audio-only speech recognition systems fail to make use of the visual modality of speech (e.g. lip movements). As the visual modality is immune to acoustic noise, utilising this visual information in conjunction with an audio-only speech recognition system has the potential to improve the accuracy of the system. The field of recognising speech using both auditory and visual inputs is known as Audio Visual Automatic Speech Recognition (AVASR). Research in the AVASR field has been ongoing for the past twenty-five years, with notable progress being made. However, the practical deployment of AVASR systems in a variety of real-world applications has not yet emerged. The main reason is that most research to date has neglected to address variabilities in the visual domain, such as illumination and viewpoint, in the design of the visual front-end of the AVASR system. In this paper we present an AVASR system in a real-world car environment using the AVICAR database [1], a publicly available in-car database, and we show that using visual speech in conjunction with the audio modality is a better approach to improving the robustness and effectiveness of voice-only recognition systems in car cabin environments.