82 resultados para automatic speech recognition


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Abstract

Relevância:

20.00% 20.00%

Publicador:

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Perceiving the world visually is a basic act for humans, but for computers it is still an unsolved problem. The variability present innatural environments is an obstacle for effective computer vision. The goal of invariant object recognition is to recognise objects in a digital image despite variations in, for example, pose, lighting or occlusion. In this study, invariant object recognition is considered from the viewpoint of feature extraction. Thedifferences between local and global features are studied with emphasis on Hough transform and Gabor filtering based feature extraction. The methods are examined with respect to four capabilities: generality, invariance, stability, and efficiency. Invariant features are presented using both Hough transform and Gabor filtering. A modified Hough transform technique is also presented where the distortion tolerance is increased by incorporating local information. In addition, methods for decreasing the computational costs of the Hough transform employing parallel processing and local information are introduced.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Recent advances in machine learning methods enable increasingly the automatic construction of various types of computer assisted methods that have been difficult or laborious to program by human experts. The tasks for which this kind of tools are needed arise in many areas, here especially in the fields of bioinformatics and natural language processing. The machine learning methods may not work satisfactorily if they are not appropriately tailored to the task in question. However, their learning performance can often be improved by taking advantage of deeper insight of the application domain or the learning problem at hand. This thesis considers developing kernel-based learning algorithms incorporating this kind of prior knowledge of the task in question in an advantageous way. Moreover, computationally efficient algorithms for training the learning machines for specific tasks are presented. In the context of kernel-based learning methods, the incorporation of prior knowledge is often done by designing appropriate kernel functions. Another well-known way is to develop cost functions that fit to the task under consideration. For disambiguation tasks in natural language, we develop kernel functions that take account of the positional information and the mutual similarities of words. It is shown that the use of this information significantly improves the disambiguation performance of the learning machine. Further, we design a new cost function that is better suitable for the task of information retrieval and for more general ranking problems than the cost functions designed for regression and classification. We also consider other applications of the kernel-based learning algorithms such as text categorization, and pattern recognition in differential display. We develop computationally efficient algorithms for training the considered learning machines with the proposed kernel functions. We also design a fast cross-validation algorithm for regularized least-squares type of learning algorithm. Further, an efficient version of the regularized least-squares algorithm that can be used together with the new cost function for preference learning and ranking tasks is proposed. In summary, we demonstrate that the incorporation of prior knowledge is possible and beneficial, and novel advanced kernels and cost functions can be used in algorithms efficiently.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This Master's thesis addresses the design and implementation of the optical character recognition (OCR) system for a mobile device working on the Symbian operating system. The developed OCR system, named OCRCapriccio, emphasizes the modularity, effective extensibility and reuse. The system consists of two parts which are the graphical user interface and the OCR engine that was implemented as a plug-in. In fact, the plug-in includes two implementations of the OCR engine for enabling two types of recognition: the bitmap comparison based recognition and statistical recognition. The implementation results have shown that the approach based on bitmap comparison is more suitable for the Symbian environment because of its nature. Although the current implementation of bitmap comparison is lacking in accuracy, further development should be done in its direction. The biggest challenges of this work were related to developing an OCR scheme that would be suitable for Symbian OS Smartphones that have limited computational power and restricted resources.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In liberalized electricity markets, which have taken place in many countries over the world, the electricity distribution companies operate in the competitive conditions. Therefore, accurate information about the customers’ energy consumption plays an essential role for the budget keeping of the distribution company and for correct planning and operation of the distribution network. This master’s thesis is focused on the description of the possible benefits for the electric utilities and residential customers from the automatic meter reading system usage. Major benefits of the AMR, illustrated in the thesis, are distribution network management, power quality monitoring, load modelling, and detection of the illegal usage of the electricity. By the example of the power system state estimation, it was illustrated that even the partial installation of the AMR in the customer side leads to more accurate data about the voltage and power levels in the whole network. The thesis also contains the description of the present situation of the AMR integration in Russia.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This dissertation considers the segmental durations of speech from the viewpoint of speech technology, especially speech synthesis. The idea is that better models of segmental durations lead to higher naturalness and better intelligibility. These features are the key factors for better usability and generality of synthesized speech technology. Even though the studies are based on a Finnish corpus the approaches apply to all other languages as well. This is possibly due to the fact that most of the studies included in this dissertation are about universal effects taking place on utterance boundaries. Also the methods invented and used here are suitable for any other study of another language. This study is based on two corpora of news reading speech and sentences read aloud. The other corpus is read aloud by a 39-year-old male, whilst the other consists of several speakers in various situations. The use of two corpora is twofold: it involves a comparison of the corpora and a broader view on the matters of interest. The dissertation begins with an overview to the phonemes and the quantity system in the Finnish language. Especially, we are covering the intrinsic durations of phonemes and phoneme categories, as well as the difference of duration between short and long phonemes. The phoneme categories are presented to facilitate the problem of variability of speech segments. In this dissertation we cover the boundary-adjacent effects on segmental durations. In initial positions of utterances we find that there seems to be initial shortening in Finnish, but the result depends on the level of detail and on the individual phoneme. On the phoneme level we find that the shortening or lengthening only affects the very first ones at the beginning of an utterance. However, on average, the effect seems to shorten the whole first word on the word level. We establish the effect of final lengthening in Finnish. The effect in Finnish has been an open question for a long time, whilst Finnish has been the last missing piece for it to be a universal phenomenon. Final lengthening is studied from various angles and it is also shown that it is not a mere effect of prominence or an effect of speech corpus with high inter- and intra-speaker variation. The effect of final lengthening seems to extend from the final to the penultimate word. On a phoneme level it reaches a much wider area than the initial effect. We also present a normalization method suitable for corpus studies on segmental durations. The method uses an utterance-level normalization approach to capture the pattern of segmental durations within each utterance. This prevents the impact of various problematic variations within the corpora. The normalization is used in a study on final lengthening to show that the results on the effect are not caused by variation in the material. The dissertation shows an implementation and prowess of speech synthesis on a mobile platform. We find that the rule-based method of speech synthesis is a real-time software solution, but the signal generation process slows down the system beyond real time. Future aspects of speech synthesis on limited platforms are discussed. The dissertation considers ethical issues on the development of speech technology. The main focus is on the development of speech synthesis with high naturalness, but the problems and solutions are applicable to any other speech technology approaches.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Lujabetoni Oy:n kappaletavaran varastonhallinnassa on todettu paljon kehitettävää. Kappaletavaran varasto-otot eivät perustu todellisiin määriin, varastonkierto on hidasta, varastointi sitoo paljon pääomaa ja inventointi on työlästä. Tähän ongelmaan haetaan ratkaisua RFID –teknologiasta. Työn teoriaosassa on käsitelty RFID –teknologian teoriaa ja nykytilaa, sekä varastoinnin peruskäsitteitä ja varastonohjausmenetelmiä. Empiirisessä osassa on laadittu RFID –järjestelmän toteutussuunnitelma. Työn tuloksena on RFID –järjestelmän kuvaus sekä toteutussuunnitelma, joilla saavutetaan tavoitellut hyödyt. Työssä löydettiin menetelmät joilla komponentit voidaan luotettavasti tunnistaa, sekä suunniteltiin todellisiin tapahtumiin perustuva, automaattiset vastaanotto- ja varasto-otto –kirjaukset mahdollistava RFID –järjestelmä. Näiden ansiosta varastotyö ja varastoihin sitoutuva pääoma vähenevät, jolloin saavutetaan säästöjä varastointikustannuksissa.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The skill of programming is a key asset for every computer science student. Many studies have shown that this is a hard skill to learn and the outcomes of programming courses have often been substandard. Thus, a range of methods and tools have been developed to assist students’ learning processes. One of the biggest fields in computer science education is the use of visualizations as a learning aid and many visualization based tools have been developed to aid the learning process during last few decades. Studies conducted in this thesis focus on two different visualizationbased tools TRAKLA2 and ViLLE. This thesis includes results from multiple empirical studies about what kind of effects the introduction and usage of these tools have on students’ opinions and performance, and what kind of implications there are from a teacher’s point of view. The results from studies in this thesis show that students preferred to do web-based exercises, and felt that those exercises contributed to their learning. The usage of the tool motivated students to work harder during their course, which was shown in overall course performance and drop-out statistics. We have also shown that visualization-based tools can be used to enhance the learning process, and one of the key factors is the higher and active level of engagement (see. Engagement Taxonomy by Naps et al., 2002). The automatic grading accompanied with immediate feedback helps students to overcome obstacles during the learning process, and to grasp the key element in the learning task. These kinds of tools can help us to cope with the fact that many programming courses are overcrowded with limited teaching resources. These tools allows us to tackle this problem by utilizing automatic assessment in exercises that are most suitable to be done in the web (like tracing and simulation) since its supports students’ independent learning regardless of time and place. In summary, we can use our course’s resources more efficiently to increase the quality of the learning experience of the students and the teaching experience of the teacher, and even increase performance of the students. There are also methodological results from this thesis which contribute to developing insight into the conduct of empirical evaluations of new tools or techniques. When we evaluate a new tool, especially one accompanied with visualization, we need to give a proper introduction to it and to the graphical notation used by tool. The standard procedure should also include capturing the screen with audio to confirm that the participants of the experiment are doing what they are supposed to do. By taken such measures in the study of the learning impact of visualization support for learning, we can avoid drawing false conclusion from our experiments. As computer science educators, we face two important challenges. Firstly, we need to start to deliver the message in our own institution and all over the world about the new – scientifically proven – innovations in teaching like TRAKLA2 and ViLLE. Secondly, we have the relevant experience of conducting teaching related experiment, and thus we can support our colleagues to learn essential know-how of the research based improvement of their teaching. This change can transform academic teaching into publications and by utilizing this approach we can significantly increase the adoption of the new tools and techniques, and overall increase the knowledge of best-practices. In future, we need to combine our forces and tackle these universal and common problems together by creating multi-national and multiinstitutional research projects. We need to create a community and a platform in which we can share these best practices and at the same time conduct multi-national research projects easily.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Diabetes is a rapidly increasing worldwide problem which is characterised by defective metabolism of glucose that causes long-term dysfunction and failure of various organs. The most common complication of diabetes is diabetic retinopathy (DR), which is one of the primary causes of blindness and visual impairment in adults. The rapid increase of diabetes pushes the limits of the current DR screening capabilities for which the digital imaging of the eye fundus (retinal imaging), and automatic or semi-automatic image analysis algorithms provide a potential solution. In this work, the use of colour in the detection of diabetic retinopathy is statistically studied using a supervised algorithm based on one-class classification and Gaussian mixture model estimation. The presented algorithm distinguishes a certain diabetic lesion type from all other possible objects in eye fundus images by only estimating the probability density function of that certain lesion type. For the training and ground truth estimation, the algorithm combines manual annotations of several experts for which the best practices were experimentally selected. By assessing the algorithm’s performance while conducting experiments with the colour space selection, both illuminance and colour correction, and background class information, the use of colour in the detection of diabetic retinopathy was quantitatively evaluated. Another contribution of this work is the benchmarking framework for eye fundus image analysis algorithms needed for the development of the automatic DR detection algorithms. The benchmarking framework provides guidelines on how to construct a benchmarking database that comprises true patient images, ground truth, and an evaluation protocol. The evaluation is based on the standard receiver operating characteristics analysis and it follows the medical practice in the decision making providing protocols for image- and pixel-based evaluations. During the work, two public medical image databases with ground truth were published: DIARETDB0 and DIARETDB1. The framework, DR databases and the final algorithm, are made public in the web to set the baseline results for automatic detection of diabetic retinopathy. Although deviating from the general context of the thesis, a simple and effective optic disc localisation method is presented. The optic disc localisation is discussed, since normal eye fundus structures are fundamental in the characterisation of DR.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A district heating system comprises production facilities, a distribution network, and heat consumers. The utilization of new energy metering and reading system (AMR) is increasing constantly in district heating systems. This heuristic study shows how the AMR system can be exploited in finding optimization opportunities in district heating system. In this study, the district heating system is mainly considered from the viewpoint of operational optimization. The focus is on the core processes, heat production and distribution. Three objectives were set to this study. The first one was to examine general optimization opportunities in district heating systems. Second, to figure out the benefits of AMR for general optimization opportunities. Finally, to define a methodology for process improvement endeavors. This study shows, through a case study, the usefulness of AMR in specifying current deficiencies in a district heating system. Based on a literature review, the methodology for the improvement of business processes is presented. Additionally, some issues related to future competitiveness of district heating are concerned. As a conclusion, some optimization objectives are considered more desirable than others. Study shows that AMR is useful in the specification of optimization targets in the district heating system. Further steps in optimization process were not examined in detail. That would seem to be interesting topic for further studies.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The flow of information within modern information society has increased rapidly over the last decade. The major part of this information flow relies on the individual’s abilities to handle text or speech input. For the majority of us it presents no problems, but there are some individuals who would benefit from other means of conveying information, e.g. signed information flow. During the last decades the new results from various disciplines have all suggested towards the common background and processing for sign and speech and this was one of the key issues that I wanted to investigate further in this thesis. The basis of this thesis is firmly within speech research and that is why I wanted to design analogous test batteries for widely used speech perception tests for signers – to find out whether the results for signers would be the same as in speakers’ perception tests. One of the key findings within biology – and more precisely its effects on speech and communication research – is the mirror neuron system. That finding has enabled us to form new theories about evolution of communication, and it all seems to converge on the hypothesis that all communication has a common core within humans. In this thesis speech and sign are discussed as equal and analogical counterparts of communication and all research methods used in speech are modified for sign. Both speech and sign are thus investigated using similar test batteries. Furthermore, both production and perception of speech and sign are studied separately. An additional framework for studying production is given by gesture research using cry sounds. Results of cry sound research are then compared to results from children acquiring sign language. These results show that individuality manifests itself from very early on in human development. Articulation in adults, both in speech and sign, is studied from two perspectives: normal production and re-learning production when the apparatus has been changed. Normal production is studied both in speech and sign and the effects of changed articulation are studied with regards to speech. Both these studies are done by using carrier sentences. Furthermore, sign production is studied giving the informants possibility for spontaneous speech. The production data from the signing informants is also used as the basis for input in the sign synthesis stimuli used in sign perception test battery. Speech and sign perception were studied using the informants’ answers to questions using forced choice in identification and discrimination tasks. These answers were then compared across language modalities. Three different informant groups participated in the sign perception tests: native signers, sign language interpreters and Finnish adults with no knowledge of any signed language. This gave a chance to investigate which of the characteristics found in the results were due to the language per se and which were due to the changes in modality itself. As the analogous test batteries yielded similar results over different informant groups, some common threads of results could be observed. Starting from very early on in acquiring speech and sign the results were highly individual. However, the results were the same within one individual when the same test was repeated. This individuality of results represented along same patterns across different language modalities and - in some occasions - across language groups. As both modalities yield similar answers to analogous study questions, this has lead us to providing methods for basic input for sign language applications, i.e. signing avatars. This has also given us answers to questions on precision of the animation and intelligibility for the users – what are the parameters that govern intelligibility of synthesised speech or sign and how precise must the animation or synthetic speech be in order for it to be intelligible. The results also give additional support to the well-known fact that intelligibility in fact is not the same as naturalness. In some cases, as shown within the sign perception test battery design, naturalness decreases intelligibility. This also has to be taken into consideration when designing applications. All in all, results from each of the test batteries, be they for signers or speakers, yield strikingly similar patterns, which would indicate yet further support for the common core for all human communication. Thus, we can modify and deepen the phonetic framework models for human communication based on the knowledge obtained from the results of the test batteries within this thesis.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Centrifugal pumps are a notable end-consumer of electrical energy. Typical application of a centrifugal pump is the filling or emptying of a reservoir tank, where the pump is often operated at a constant speed until the process is completed. Installing a frequency converter to control the motor substitutes the traditional fixed-speed pumping system, allows the optimization of rotational speed profile for the pumping tasks and enables the estimation of rotational speed and shaft torque of an induction motor without any additional measurements from the motor shaft. Utilization of variable-speed operation provides the possibility to decrease the overall energy consumption of the pumping task. The static head of the pumping process may change during the pumping task. In such systems, the minimum rotational speed changes during reservoir filling or emptying, and the minimum energy consumption can’t be achieved with a fixed rotational speed. This thesis presents embedded algorithms to automatically identify, optimize and monitor pumping processes between supply and destination reservoirs, and evaluates the changing static head –based optimization method.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The general trend towards increasing e ciency and energy density drives the industry to high-speed technologies. Active Magnetic Bearings (AMBs) are one of the technologies that allow contactless support of a rotating body. Theoretically, there are no limitations on the rotational speed. The absence of friction, low maintenance cost, micrometer precision, and programmable sti ness have made AMBs a viable choice for highdemanding applications. Along with the advances in power electronics, such as signi cantly improved reliability and cost, AMB systems have gained a wide adoption in the industry. The AMB system is a complex, open-loop unstable system with multiple inputs and outputs. For normal operation, such a system requires a feedback control. To meet the high demands for performance and robustness, model-based control techniques should be applied. These techniques require an accurate plant model description and uncertainty estimations. The advanced control methods require more e ort at the commissioning stage. In this work, a methodology is developed for an automatic commissioning of a subcritical, rigid gas blower machine. The commissioning process includes open-loop tuning of separate parts such as sensors and actuators. The next step is to apply a system identi cation procedure to obtain a model for the controller synthesis. Finally, a robust model-based controller is synthesized and experimentally evaluated in the full operating range of the system. The commissioning procedure is developed by applying only the system components available and a priori knowledge without any additional hardware. Thus, the work provides an intelligent system with a self-diagnostics feature and an automatic commissioning.