904 results for audio-visual automatic speech recognition


Relevance:

30.00%

Publisher:

Abstract:

The number of digital images has been increasing exponentially in the last few years. People have problems managing their image collections and finding a specific image. An automatic image categorization system could help them to manage images and find specific images. In this thesis, an unsupervised visual object categorization system was implemented to categorize a set of unknown images. Because the system is unsupervised, it does not need known training images, which would have to be obtained manually. Therefore, the number of possible categories and images can be huge. The system implemented in the thesis extracts local features from the images. These local features are used to build a codebook. The local features and the codebook are then used to generate a feature vector for each image. Images are categorized based on the feature vectors. The system is able to categorize any given set of images based on the visual appearance of the images. Images that have similar image regions are grouped together in the same category. Thus, for example, images which contain cars are assigned to the same cluster. The unsupervised visual object categorization system can be used in many situations, e.g., in an Internet search engine. The system can categorize images for a user, and the user can then easily find a specific type of image.
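The pipeline in this abstract (local features, codebook, per-image feature vector, grouping) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the 2-D toy "descriptors", the plain k-means used for both the codebook and the final grouping, the deterministic farthest-point initialisation, and all function names. The abstract does not say which detector, descriptor, or clustering method the thesis used.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))


def init_centers(points, k):
    """Deterministic farthest-point initialisation (chosen here for repeatability)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iters=20):
    """Plain k-means; an empty cluster keeps its previous center."""
    centers = init_centers(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centers)
        ]
    return centers


def bow_histogram(descriptors, codebook):
    """Normalised histogram of codeword assignments: the image's feature vector."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest(d, codebook)] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]


def categorize(images, codebook_size, n_categories):
    """Build a codebook from all descriptors, then cluster the image histograms."""
    all_descriptors = [d for img in images for d in img]
    codebook = kmeans(all_descriptors, codebook_size)
    histograms = [bow_histogram(img, codebook) for img in images]
    category_centers = kmeans(histograms, n_categories)
    return [nearest(h, category_centers) for h in histograms]


# Toy data: each image is a list of hypothetical 2-D local descriptors.
images = [
    [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [9.9, 10.0]],
    [[0.1, 0.0], [0.0, 0.3], [10.1, 9.8], [0.3, 0.1]],
    [[10.0, 10.2], [9.8, 10.1], [10.2, 9.9], [0.2, 0.2]],
    [[9.9, 9.7], [10.3, 10.0], [10.0, 10.1], [10.1, 10.3]],
]
labels = categorize(images, codebook_size=2, n_categories=2)
print(labels)
```

In this toy run the first two images share most of their descriptors and end up in one category, while the last two end up in the other. No labels are supplied at any point, which is the sense in which the approach is unsupervised.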

Relevance:

30.00%

Publisher:

Abstract:

The skill of programming is a key asset for every computer science student. Many studies have shown that this is a hard skill to learn and the outcomes of programming courses have often been substandard. Thus, a range of methods and tools have been developed to assist students’ learning processes. One of the biggest fields in computer science education is the use of visualizations as a learning aid, and many visualization-based tools have been developed to aid the learning process during the last few decades. The studies conducted in this thesis focus on two different visualization-based tools, TRAKLA2 and ViLLE. This thesis includes results from multiple empirical studies about what kind of effects the introduction and usage of these tools have on students’ opinions and performance, and what kind of implications there are from a teacher’s point of view. The results from studies in this thesis show that students preferred to do web-based exercises, and felt that those exercises contributed to their learning. The usage of the tool motivated students to work harder during their course, which was shown in overall course performance and drop-out statistics. We have also shown that visualization-based tools can be used to enhance the learning process, and one of the key factors is the higher and active level of engagement (see the Engagement Taxonomy by Naps et al., 2002). Automatic grading accompanied by immediate feedback helps students to overcome obstacles during the learning process, and to grasp the key element in the learning task. These kinds of tools can help us to cope with the fact that many programming courses are overcrowded and have limited teaching resources. These tools allow us to tackle this problem by utilizing automatic assessment in exercises that are most suitable to be done on the web (like tracing and simulation), since it supports students’ independent learning regardless of time and place.
In summary, we can use our course’s resources more efficiently to increase the quality of the learning experience of the students and the teaching experience of the teacher, and even increase the performance of the students. There are also methodological results from this thesis which contribute to developing insight into the conduct of empirical evaluations of new tools or techniques. When we evaluate a new tool, especially one accompanied by visualization, we need to give a proper introduction to it and to the graphical notation used by the tool. The standard procedure should also include capturing the screen with audio to confirm that the participants of the experiment are doing what they are supposed to do. By taking such measures in the study of the learning impact of visualization support for learning, we can avoid drawing false conclusions from our experiments. As computer science educators, we face two important challenges. Firstly, we need to start to deliver the message in our own institutions and all over the world about the new, scientifically proven innovations in teaching like TRAKLA2 and ViLLE. Secondly, we have the relevant experience of conducting teaching-related experiments, and thus we can support our colleagues in learning the essential know-how of research-based improvement of their teaching. This change can transform academic teaching into publications, and by utilizing this approach we can significantly increase the adoption of the new tools and techniques, and overall increase the knowledge of best practices. In the future, we need to combine our forces and tackle these universal and common problems together by creating multi-national and multi-institutional research projects. We need to create a community and a platform in which we can share these best practices and at the same time conduct multi-national research projects easily.

Relevance:

30.00%

Publisher:

Abstract:

Diabetes is a rapidly increasing worldwide problem which is characterised by defective metabolism of glucose that causes long-term dysfunction and failure of various organs. The most common complication of diabetes is diabetic retinopathy (DR), which is one of the primary causes of blindness and visual impairment in adults. The rapid increase of diabetes pushes the limits of the current DR screening capabilities for which the digital imaging of the eye fundus (retinal imaging), and automatic or semi-automatic image analysis algorithms provide a potential solution. In this work, the use of colour in the detection of diabetic retinopathy is statistically studied using a supervised algorithm based on one-class classification and Gaussian mixture model estimation. The presented algorithm distinguishes a certain diabetic lesion type from all other possible objects in eye fundus images by only estimating the probability density function of that certain lesion type. For the training and ground truth estimation, the algorithm combines manual annotations of several experts for which the best practices were experimentally selected. By assessing the algorithm’s performance while conducting experiments with the colour space selection, both illuminance and colour correction, and background class information, the use of colour in the detection of diabetic retinopathy was quantitatively evaluated. Another contribution of this work is the benchmarking framework for eye fundus image analysis algorithms needed for the development of the automatic DR detection algorithms. The benchmarking framework provides guidelines on how to construct a benchmarking database that comprises true patient images, ground truth, and an evaluation protocol. The evaluation is based on the standard receiver operating characteristics analysis and it follows the medical practice in the decision making providing protocols for image- and pixel-based evaluations. 
During the work, two public medical image databases with ground truth were published: DIARETDB0 and DIARETDB1. The framework, the DR databases and the final algorithm are made public on the web to set the baseline results for automatic detection of diabetic retinopathy. Although deviating from the general context of the thesis, a simple and effective optic disc localisation method is presented. The optic disc localisation is discussed, since normal eye fundus structures are fundamental in the characterisation of DR.
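The one-class idea described in this abstract, where only the probability density of the target lesion colour is estimated and everything else is rejected by a threshold, can be sketched as follows. This is a deliberately simplified illustration and not the thesis algorithm: it fits a single diagonal-covariance Gaussian instead of a Gaussian mixture model, the RGB values are invented, and the threshold rule (lowest log-density among the training samples) is an assumption.

```python
import math


def fit_diag_gaussian(samples):
    """Fit per-channel mean and variance (a single Gaussian, not a full GMM)."""
    n = len(samples)
    dims = len(samples[0])
    means = [sum(s[d] for s in samples) / n for d in range(dims)]
    variances = [
        max(sum((s[d] - means[d]) ** 2 for s in samples) / n, 1e-6)
        for d in range(dims)
    ]
    return means, variances


def log_density(x, model):
    """Log of the diagonal-Gaussian density at x."""
    means, variances = model
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        for xi, m, v in zip(x, means, variances)
    )


def is_target(x, model, threshold):
    """One-class decision: accept x if its density under the target model is high enough."""
    return log_density(x, model) >= threshold


# Hypothetical RGB colours of annotated lesion pixels (invented values).
lesion_pixels = [[200, 180, 60], [210, 175, 70], [195, 185, 65], [205, 170, 55]]
model = fit_diag_gaussian(lesion_pixels)

# Assumed threshold rule: accept anything at least as likely as the least likely training pixel.
threshold = min(log_density(p, model) for p in lesion_pixels)

print(is_target([202, 178, 62], model, threshold))  # colour close to the training pixels
print(is_target([120, 40, 30], model, threshold))   # colour far from the training pixels
```

No background samples are needed to train the model, which is what distinguishes this from a two-class classifier. In practice the threshold would be chosen from an evaluation such as the ROC analysis the abstract mentions.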

Relevance:

30.00%

Publisher:

Abstract:

Local features are used in many computer vision tasks, including visual object categorization, content-based image retrieval and object recognition, to mention a few. Local features are points, blobs or regions in images that are extracted using a local feature detector. To make use of extracted local features, the localized interest points are described using a local feature descriptor. A descriptor histogram vector is a compact representation of an image and can be used for searching and matching images in databases. In this thesis, the performance of local feature detectors and descriptors is evaluated for the object class detection task. Features are extracted from image samples belonging to several object classes. Matching features are then searched for using random image pairs of the same class. The goal of this thesis is to find out which detector and descriptor methods are best for such a task in terms of detector repeatability and descriptor matching rate.

Relevance:

30.00%

Publisher:

Abstract:

During a possible loss-of-coolant accident in BWRs, a large amount of steam will be released from the reactor pressure vessel to the suppression pool. Steam will be condensed in the suppression pool, causing dynamic and structural loads on the pool. The formation and break-up of bubbles can be measured by visual observation using a suitable pattern recognition algorithm. The aim of this study was to improve the preliminary pattern recognition algorithm, developed by Vesa Tanskanen in his doctoral dissertation, by using MATLAB. Video material from the PPOOLEX test facility, recorded during thermal stratification and mixing experiments, was used as a reference in the development of the algorithm. The developed algorithm consists of two parts: the pattern recognition of the bubbles and the analysis of the recognized bubble images. The bubble recognition works well, but some errors appear due to the complex structure of the pool. The results of the image analysis were reasonable. The volume and the surface area of the bubbles were not evaluated. Chugging frequencies calculated using FFT agreed well with the oscillation frequencies measured in the experiments. The pattern recognition algorithm works in the conditions it was designed for. If the measurement configuration is changed, some modifications will be needed. Numerous improvements are proposed for the future 3D equipment.
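The frequency step mentioned in this abstract can be illustrated with a small sketch. The study used FFT in MATLAB; the code below is a plain discrete Fourier transform in Python, written for illustration only. The input signal (a per-frame bubble size), the 50 Hz frame rate, and the function name are all assumptions, not details taken from the study.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest non-DC component of a real signal."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the constant offset
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):  # positive-frequency bins only
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n


# Hypothetical bubble-size signal: 4 s of video at an assumed 50 frames per second,
# oscillating at 2.5 Hz with a weaker 10 Hz component on top.
frame_rate = 50.0
signal = [
    1.0
    + 0.5 * math.sin(2 * math.pi * 2.5 * t / frame_rate)
    + 0.1 * math.sin(2 * math.pi * 10.0 * t / frame_rate)
    for t in range(200)
]
print(dominant_frequency(signal, frame_rate))
```

The frequency resolution is the frame rate divided by the number of samples (0.25 Hz here), so longer recordings resolve the oscillation frequency more finely. A real FFT routine computes the same coefficients far faster than this quadratic-time loop.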

Relevance:

30.00%

Publisher:

Abstract:

The early facilitatory effect of a peripheral spatially visual prime stimulus described in the literature for simple reaction time tasks has usually been smaller than that described for complex (go/no-go, choice) reaction time tasks. In the present study we investigated the reason for this difference. In a first and a second experiment we tested the participants in both a simple task and a go/no-go task, half of them beginning with one of these tasks and half with the other one. We observed that the prime stimulus had an early effect, inhibitory for the simple task and facilitatory for the go/no-go task, when the task was performed first. No early effect appeared when the task was performed second. In a third and a fourth experiment the participants were, respectively, tested in the simple task and in the go/no-go task for four sessions (the prime stimulus was presented in the second, third and fourth sessions). The early effects of the prime stimulus did not change across the sessions, suggesting that a habituatory process was not the cause of the disappearance of these effects in the first two experiments. Our findings are compatible with the idea that different attentional strategies are adopted in simple and complex reaction time tasks. In the former tasks the gain of automatic attention mechanisms may be adjusted to a low level and in the latter tasks, to a high level. The attentional influence of the prime stimulus may be antagonized by another influence, possibly a masking one.

Relevance:

30.00%

Publisher:

Abstract:

Several methods are used to estimate anaerobic threshold (AT) during exercise. The aim of the present study was to compare AT obtained by a graphic visual method for the estimate of ventilatory and metabolic variables (gold standard), to a bi-segmental linear regression mathematical model of Hinkley's algorithm applied to heart rate (HR) and carbon dioxide output (VCO2) data. Thirteen young (24 ± 2.63 years old) and 16 postmenopausal (57 ± 4.79 years old) healthy and sedentary women were submitted to a continuous ergospirometric incremental test on an electromagnetic braking cycloergometer with 10 to 20 W/min increases until physical exhaustion. The ventilatory variables were recorded breath-to-breath and HR was obtained beat-to-beat over real time. Data were analyzed by the nonparametric Friedman test and Spearman correlation test with the level of significance set at 5%. Power output (W), HR (bpm), oxygen uptake (VO2; mL kg⁻¹ min⁻¹), VO2 (mL/min), VCO2 (mL/min), and minute ventilation (VE; L/min) data observed at the AT level were similar for both methods and groups studied (P > 0.05). The VO2 (mL kg⁻¹ min⁻¹) data showed significant correlation (P < 0.05) between the gold standard method and the mathematical model when applied to HR (r_s = 0.75) and VCO2 (r_s = 0.78) data for the subjects as a whole (N = 29). The proposed mathematical method for the detection of changes in response patterns of VCO2 and HR was adequate and promising for AT detection in young and middle-aged women, representing a semi-automatic, non-invasive and objective AT measurement.
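The idea of a bi-segmental linear regression, where a response is fitted with two straight lines and the point where the pattern changes is located, can be sketched as below. This is a brute-force stand-in that tries every split and keeps the one with the smallest total squared error. It is not Hinkley's algorithm as used in the study, and the data are synthetic values chosen so that the slope changes between the 11th and 12th samples.

```python
def line_sse(xs, ys):
    """Sum of squared residuals of the least-squares line through (xs, ys)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def two_segment_breakpoint(xs, ys, min_points=3):
    """Return the x value that starts the second segment of the best two-line fit."""
    best_split, best_err = None, float("inf")
    for split in range(min_points, len(xs) - min_points + 1):
        err = line_sse(xs[:split], ys[:split]) + line_sse(xs[split:], ys[split:])
        if err < best_err:
            best_split, best_err = split, err
    return xs[best_split]


# Synthetic response: slope 2 up to x = 10, slope 5 from x = 11 onward.
xs = list(range(21))
ys = [2.0 * x if x <= 10 else 5.0 * x - 31.5 for x in xs]
print(two_segment_breakpoint(xs, ys))
```

With real HR or VCO2 recordings the two segments would not fit exactly, so the minimum-error split is an estimate of the change point rather than an exact location. The `min_points` guard keeps either segment from being fitted to too few samples.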

Relevance:

30.00%

Publisher:

Abstract:

A long-standing debate in the literature is whether attention can form two or more independent spatial foci in addition to the well-known unique spatial focus. There is evidence that voluntary visual attention divides in space. The possibility that this also occurs for automatic visual attention was investigated here. Thirty-six female volunteers were tested. In each trial, a prime stimulus was presented in the left or right visual hemifield. This stimulus was characterized by the blinking of a superior, middle or inferior ring, the blinking of all these rings, or the blinking of the superior and inferior rings. A target stimulus to which the volunteer should respond with the same side hand or a target stimulus to which she should not respond was presented 100 ms later in a primed location, a location between two primed locations or a location in the contralateral hemifield. Reaction time to the positive target stimulus in a primed location was consistently shorter than reaction time in the horizontally corresponding contralateral location. This attentional effect was significantly smaller or absent when the positive target stimulus appeared in the middle location after the double prime stimulus. These results suggest that automatic visual attention can focus on two separate locations simultaneously, to some extent sparing the region in between.

Relevance:

30.00%

Publisher:

Abstract:

Motivated by a recently proposed biologically inspired face recognition approach, we investigated the relation between human behavior and a computational model based on Fourier-Bessel (FB) spatial patterns. We measured human recognition performance of FB filtered face images using an 8-alternative forced-choice method. Test stimuli were generated by converting the images from the spatial to the FB domain, filtering the resulting coefficients with a band-pass filter, and finally taking the inverse FB transformation of the filtered coefficients. The performance of the computational models was tested using a simulation of the psychophysical experiment. In the FB model, face images were first filtered by simulated V1-type neurons and later analyzed globally for their content of FB components. In general, there was a higher human contrast sensitivity to radially than to angularly filtered images, but both functions peaked at the 11.3-16 frequency interval. The FB-based model presented similar behavior with regard to peak position and relative sensitivity, but had a wider frequency band width and a narrower response range. The response pattern of two alternative models, based on local FB analysis and on raw luminance, strongly diverged from the human behavior patterns. These results suggest that human performance can be constrained by the type of information conveyed by polar patterns, and consequently that humans might use FB-like spatial patterns in face processing.

Relevance:

30.00%

Publisher:

Abstract:

The occurrence of a weak auditory warning stimulus increases the speed of the response to a subsequent visual target stimulus that must be identified. This facilitatory effect has been attributed to the temporal expectancy automatically induced by the warning stimulus. It has not been determined whether this results from a modulation of the stimulus identification process, the response selection process or both. The present study examined these possibilities. A group of 12 young adults performed a reaction time location identification task and another group of 12 young adults performed a reaction time shape identification task. A visual target stimulus was presented 1850 to 2350 ms plus a fixed interval (50, 100, 200, 400, 800, or 1600 ms, depending on the block) after the appearance of a fixation point, on its left or right side, above or below a virtual horizontal line passing through it. In half of the trials, a weak auditory warning stimulus (S1) appeared 50, 100, 200, 400, 800, or 1600 ms (according to the block) before the target stimulus (S2). Twelve trials were run for each condition. The S1 produced a facilitatory effect for the 200, 400, 800, and 1600 ms stimulus onset asynchronies (SOA) in the case of the side stimulus-response (S-R) corresponding condition, and for the 100 and 400 ms SOA in the case of the side S-R non-corresponding condition. Since these two conditions differ mainly by their response selection requirements, it is reasonable to conclude that automatic temporal expectancy influences the response selection process.

Relevance:

30.00%

Publisher:

Abstract:

The visualization of tools and manipulable objects activates motor-related areas in the cortex, facilitating possible actions toward them. This pattern of activity may underlie the phenomenon of object affordance. Some cortical motor neurons are also covertly activated during the recognition of body parts such as hands. One hypothesis is that different subpopulations of motor neurons in the frontal cortex are activated in each motor program; for example, canonical neurons in the premotor cortex are responsible for the affordance of visual objects, while mirror neurons support motor imagery triggered during handedness recognition. However, the question remains whether these subpopulations work independently. This hypothesis can be tested with a manual reaction time (MRT) task with a priming paradigm to evaluate whether the view of a manipulable object interferes with the motor imagery of the subject's hand. The MRT provides a measure of the course of information processing in the brain and allows indirect evaluation of cognitive processes. Our results suggest that canonical and mirror neurons work together to create a motor plan involving hand movements to facilitate successful object manipulation.

Relevance:

30.00%

Publisher:

Abstract:

Many industrial applications need object recognition and tracking capabilities. The algorithms developed for those purposes are computationally expensive. Yet real-time performance, high accuracy and low power consumption are essential measures of the system. When all these requirements are combined, hardware acceleration of these algorithms becomes a feasible solution. The purpose of this study is to analyze the current state of these hardware acceleration solutions: which algorithms have been implemented in hardware and what modifications have been made in order to adapt these algorithms to hardware.

Relevance:

30.00%

Publisher:

Abstract:

The impact of automatic and manual shelling methods during manual/visual sorting of different batches of Brazil nuts from the 2010 and 2011 harvests was evaluated in order to investigate aflatoxin prevention. The samples were tested as follows: in-shell, shell, shelled, and pieces, in order to evaluate the moisture content (mc), water activity (Aw), and total aflatoxin (LOD = 0.3 µg/kg and LOQ = 0.85 µg/kg) at the Brazil nut processing plant. The aflatoxin levels ranged from 3.0 to 60.3 µg/g for the manually shelled nut samples and from 2.0 to 31.0 µg/g for the automatically shelled samples. All samples showed levels of mc below the limit of 15%; on the other hand, shelled samples from both harvests showed levels of Aw above the limit. There were no significant differences between the manual and automatic shelling results during the sorting stages. However, the visual sorting was effective in decreasing the aflatoxin contamination in both methods.

Relevance:

30.00%

Publisher:

Abstract:

Forty students from regular grade five classes were divided into two groups of twenty, a good reader group and a poor reader group, on the basis of their reading scores on Canadian Achievement Tests. The subjects took part in four experimental conditions in which they learned lists of pronounceable and unpronounceable pseudowords, some with semantic referents, and responded to questions designed to test visual perceptual learning and lexical and semantic association learning. It was hypothesized that the good reader group would be able to make use of graphemic and phonemic redundancy patterns in order to improve visual perceptual learning and lexical and semantic association learning to a greater extent than would the poor reader group. The data supported this hypothesis, and also indicated that, although the poor readers were less adept at using familiar sound and letter patterns, they were more dependent on such patterns as an aid to visual recognition memory and semantic recall than were the good readers. It was postulated that poor readers are in a double-bind situation of having to choose between using weak graphemic-semantic associations or grapheme-phoneme associations which are also weak and which have hindered them in developing automaticity in reading.

Relevance:

30.00%

Publisher:

Abstract:

This action research observes a second-year Japanese class at a university where foreign language courses are elective for undergraduate students. In this study, using the six strategies to teach Japanese speech acts that Ishihara and Cohen (2006) suggested, I conducted three classes and analyzed my teaching practice with a critical friend. These strategies help learners develop their understanding of the following Japanese speech acts and use them in a manner appropriate to the context: (1) invitation and refusal; (2) compliments; and (3) asking for permission. The aim of this research is not only to improve my instruction in relation to second language (L2) pragmatic development, but also to raise further questions and to develop future research. The findings are analyzed and the data derived from my journals, artifacts, students' work, observation sheets, interviews with my critical friend, and pretests and posttests are coded and presented. The analysis shows that (1) after my critical friend encouraged my study and my students gave me some positive comments after each lesson, I gained confidence in teaching the suggested speech acts; (2) teaching involved explaining concepts and strategies, creating the visual material (a video) showing the strategies, and explaining the relationship between the strategy and grammatical forms and samples of misusing the forms; (3) students' background and learning styles influenced lessons; and (4) pretests and posttests showed that the students' level of appropriate L2 pragmatics improved dramatically after each instruction. However, after careful observation, it was noted that some factors prevented students from producing the correct output even though they understood the speech act differences.