927 results for 3D object recognition


Relevance:

20.00%

Abstract:

In this paper we propose a new method for face recognition using fractal codes. Fractal codes represent local contractive affine transformations which, when iteratively applied to range-domain pairs in an arbitrary initial image, result in a fixed point close to a given image. The transformation parameters, such as brightness offset, contrast factor, orientation and the address of the corresponding domain for each range, are used directly as features in our method. Features of an unknown face image are compared with those pre-computed for images in a database. The proposed method does not need to iterate or to use fractal neighbor distances or fractal dimensions for comparison. The method is robust to scale change, frame size change and rotations, as well as to some noise, facial expressions and blur distortion in the image.
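The brightness offset, contrast factor and domain address mentioned above can be illustrated with a minimal sketch. It assumes the usual least-squares formulation of fractal block coding, which the abstract does not spell out: for a range block and a candidate domain block already resampled to the same size, fit contrast `s` and offset `o` so that `s * domain + o` approximates the range. The function names, the flattened block layout and the omission of the orientation parameter are assumptions made for illustration only.

```python
from typing import List, Optional, Sequence, Tuple


def fit_contrast_offset(
    domain: Sequence[float], range_block: Sequence[float]
) -> Tuple[float, float, float]:
    """Least-squares fit of range_block ~ s * domain + o.

    Both blocks are flattened pixel lists of equal length (assumed).
    Returns (s, o, squared_error).
    """
    n = len(domain)
    sum_d = sum(domain)
    sum_r = sum(range_block)
    sum_dd = sum(d * d for d in domain)
    sum_dr = sum(d * r for d, r in zip(domain, range_block))
    denom = n * sum_dd - sum_d * sum_d
    # A flat domain block carries no contrast information; fall back to s = 0.
    s = 0.0 if denom == 0 else (n * sum_dr - sum_d * sum_r) / denom
    o = (sum_r - s * sum_d) / n
    err = sum((s * d + o - r) ** 2 for d, r in zip(domain, range_block))
    return s, o, err


def best_domain(
    range_block: Sequence[float], domains: List[Sequence[float]]
) -> Optional[Tuple[int, float, float, float]]:
    """Return (domain_address, s, o, error) for the best-matching domain.

    Practical coders usually also restrict |s| < 1 to keep the map
    contractive; that constraint is left out of this sketch.
    """
    best = None
    for address, domain in enumerate(domains):
        s, o, err = fit_contrast_offset(domain, range_block)
        if best is None or err < best[3]:
            best = (address, s, o, err)
    return best
```

Under these assumptions, the tuple `(domain_address, s, o)` collected for every range block would form the feature vector that is compared against the pre-computed database entries.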

Relevance:

20.00%

Abstract:

Visual noise insensitivity is important to audio-visual speech recognition (AVSR). Visual noise can take a number of forms, such as varying frame rate, occlusion, lighting or speaker variabilities. The use of a high dimensional secondary classifier on the word likelihood scores from both the audio and video modalities is investigated for the purposes of adaptive fusion. Preliminary results are presented demonstrating performance above the catastrophic fusion boundary for our confidence measure irrespective of the type of visual noise presented to it. Our experiments were restricted to small vocabulary applications.

Relevance:

20.00%

Abstract:

The performance of automatic speech recognition systems deteriorates in the presence of noise. One known solution is to incorporate video information with an existing acoustic speech recognition system. We investigate the performance of the individual acoustic and visual sub-systems and then examine different ways in which the integration of the two systems may be performed. The system is to be implemented in real time on a Texas Instruments' TMS320C80 DSP.

Relevance:

20.00%

Abstract:

A system to segment and recognize Australian 4-digit postcodes from address labels on parcels is described. Images of address labels are preprocessed and adaptively thresholded to reduce noise. Projections are used to segment the line and then the characters comprising the postcode. Individual digits are recognized using bispectral features extracted from their parallel beam projections. These features are insensitive to translation, scaling and rotation, and robust to noise. Results on scanned images are presented. The system is currently being improved and implemented to work on-line.

Relevance:

20.00%

Abstract:

A new technique is proposed for learning the dynamic characteristics of a deformable object, applied in particular to the problem of lip-tracking. Experimental results are given which demonstrate that the use of dynamic models allows the system to track more robustly under adverse conditions and to correct spurious, poorly tracked frames.

Relevance:

20.00%

Abstract:

Characteristics of surveillance video generally include low resolution and poor quality due to environmental, storage and processing limitations. It is extremely difficult for computers and human operators to identify individuals from these videos. To overcome this problem, super-resolution can be used in conjunction with an automated face recognition system to enhance the spatial resolution of video frames containing the subject and narrow down the number of manual verifications performed by the human operator by presenting a list of most likely candidates from the database. As the super-resolution reconstruction process is ill-posed, visual artifacts are often generated as a result. These artifacts can be visually distracting to humans and/or affect machine recognition algorithms. While it is intuitive that higher resolution should lead to improved recognition accuracy, the effects of super-resolution and such artifacts on face recognition performance have not been systematically studied. This paper aims to address this gap while illustrating that super-resolution allows more accurate identification of individuals from low-resolution surveillance footage. The proposed optical flow-based super-resolution method is benchmarked against Baker et al.’s hallucination and Schultz et al.’s super-resolution techniques on images from the Terrascope and XM2VTS databases. Ground truth and interpolated images were also tested to provide a baseline for comparison. Results show that a suitable super-resolution system can improve the discriminability of surveillance video and enhance face recognition accuracy. The experiments also show that Schultz et al.’s method fails when dealing with surveillance footage due to its assumption of rigid objects in the scene. The hallucination and optical flow-based methods performed comparably, with the optical flow-based method producing less visually distracting artifacts that interfered with human recognition.

Relevance:

20.00%

Abstract:

This paper argues that teachers’ recognition of children’s cultural practices is an important positive step in helping socio-economically disadvantaged children engage with school literacies. Based on twenty-one longitudinal case studies of children’s literacy development over a three-year period, the authors demonstrate that when children’s knowledges and practices assembled in home and community spheres are treated as valuable material for school learning, children are more likely to invest in the work of acquiring school literacies. However, they also show that whilst some children benefit greatly from being allowed to draw on their knowledge of popular culture, sports and the outdoors, other children’s interests may be ignored or excluded. Some differences in teachers’ valuing of home and community cultures appeared to relate to gender dimensions.

Relevance:

20.00%

Abstract:

Orthopaedic fracture fixation implants are increasingly being designed using accurate 3D models of long bones based on computed tomography (CT). Unlike CT, magnetic resonance imaging (MRI) does not involve ionising radiation and is therefore a desirable alternative to CT. This study aims to quantify the accuracy of MRI-based 3D models compared to CT-based 3D models of long bones. The femora of five intact cadaver ovine limbs were scanned using a 1.5 T MRI scanner and a CT scanner. Image segmentation of CT and MRI data was performed using a multi-threshold segmentation method. Reference models were generated by digitising the bone surfaces free of soft tissue with a mechanical contact scanner. The MRI- and CT-derived models were validated against the reference models. The results demonstrated that the CT-based models contained an average error of 0.15 mm while the MRI-based models contained an average error of 0.23 mm. Statistical validation showed no significant differences between 3D models based on CT and MRI data. These results indicate that the geometric accuracy of MRI-based 3D models is comparable to that of CT-based models and that MRI is therefore a potential alternative to CT for generation of 3D models with high geometric accuracy.

Relevance:

20.00%

Abstract:

The use of visual features in the form of lip movements to improve the performance of acoustic speech recognition has been shown to work well, particularly in noisy acoustic conditions. However, it is not known whether this technique can outperform speech recognition incorporating well-known acoustic enhancement techniques, such as spectral subtraction or multi-channel beamforming. This is an important question to answer, especially in an automotive environment, for the design of an efficient human-vehicle computer interface. We perform a variety of speech recognition experiments on a challenging automotive speech dataset, and the results show that synchronous HMM-based audio-visual fusion can outperform traditional single-channel as well as multi-channel acoustic speech enhancement techniques. We also show that further improvement in recognition performance can be obtained by fusing speech-enhanced audio with the visual modality, demonstrating the complementary nature of the two robust speech recognition approaches.

Relevance:

20.00%

Abstract:

Autonomous development of sensorimotor coordination enables a robot to adapt and change its action choices to interact with the world throughout its lifetime. The Experience Network is a structure that rapidly learns coordination between visual and haptic inputs and motor action. This paper presents methods which handle the high dimensionality of the network state-space which occurs due to the simultaneous detection of multiple sensory features. The methods provide no significant increase in the complexity of the underlying representations and also allow emergent, task-specific, semantic information to inform action selection. Experimental results show rapid learning in a real robot, which progresses from having no sensorimotor mappings to being a mobile robot capable of wall avoidance and target acquisition.

Relevance:

20.00%

Abstract:

In automatic facial expression recognition, an increasing number of techniques have been proposed in the literature that exploit the temporal nature of facial expressions. As all facial expressions are known to evolve over time, it is crucially important for a classifier to be capable of modelling their dynamics. We establish that the method of sparse representation (SR) classifiers is a suitable candidate for this purpose, and subsequently propose a framework for expression dynamics to be efficiently incorporated into its current formulation. We additionally show that for the SR method to be applied effectively, a certain threshold on image dimensionality must be enforced (unlike in facial recognition problems). Thirdly, we determine that recognition rates may be significantly influenced by the size of the projection matrix Φ. To demonstrate these findings, a battery of experiments was conducted on the CK+ dataset for the recognition of the seven prototypic expressions (anger, contempt, disgust, fear, happiness, sadness and surprise), and comparisons were made between the proposed temporal-SR framework, the static-SR framework and a state-of-the-art support vector machine.

Relevance:

20.00%

Abstract:

Robust speaker verification on short utterances remains a key consideration when deploying automatic speaker recognition, as many real world applications often have access to only limited duration speech data. This paper explores how the recent technologies focused around total variability modeling behave when training and testing utterance lengths are reduced. Results are presented which provide a comparison of Joint Factor Analysis (JFA) and i-vector based systems including various compensation techniques: Within-Class Covariance Normalization (WCCN), LDA, Scatter Difference Nuisance Attribute Projection (SDNAP) and Gaussian Probabilistic Linear Discriminant Analysis (GPLDA). Speaker verification performance for utterances with as little as 2 sec of data taken from the NIST Speaker Recognition Evaluations is presented to provide a clearer picture of the current performance characteristics of these techniques in short utterance conditions.

Relevance:

20.00%

Abstract:

Compressive Sensing (CS) is a popular signal processing technique that can exactly reconstruct a signal given a small number of random projections of the original signal, provided that the signal is sufficiently sparse. We demonstrate the applicability of CS in the field of gait recognition as a very effective dimensionality reduction technique, using the gait energy image (GEI) as the feature extraction process. We compare the CS-based approach to principal component analysis (PCA) and show that the proposed method outperforms this baseline, particularly in situations where there are appearance changes in the subject. Because it uses a generalised random projection, applying CS to the gait features also avoids the need to train the models.
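The idea of a training-free random projection used for dimensionality reduction can be sketched as below. This is a minimal illustration under stated assumptions: a Gaussian projection matrix is one common choice and is not confirmed by the abstract, the nearest-neighbour matching step is added only to make the example complete, and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def random_projection_matrix(m: int, n: int, seed: int = 0) -> List[List[float]]:
    """m x n matrix with i.i.d. Gaussian entries scaled by 1/sqrt(m).

    No training data is involved; the same seed must be reused for
    gallery and probe features so they share one projection.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(m)]


def project(phi: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Compute y = phi @ x, reducing x from n to m dimensions."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]


def nearest_index(gallery: Sequence[Sequence[float]], probe: Sequence[float]) -> int:
    """Index of the gallery vector closest to probe (squared Euclidean)."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(range(len(gallery)), key=lambda i: sq_dist(gallery[i], probe))
```

In a gait setting, `x` would be a flattened GEI; each gallery GEI is projected once with the shared matrix, and a probe is projected the same way before matching.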

Relevance:

20.00%

Abstract:

A new method for the detection of abnormal vehicle trajectories is proposed. It couples optical flow extraction of vehicle velocities with a neural network classifier. Abnormal trajectories are indicative of drunk or sleepy drivers. A single feature of the vehicle, e.g., a tail light, is isolated and the optical flow is computed only around this feature rather than at each pixel in the image.

Relevance:

20.00%

Abstract:

This paper reports an investigation of primary school children’s understandings about "square". Twelve students participated in a small group teaching experiment session, where they were interviewed and guided to construct a square in a 3D virtual reality learning environment (VRLE). Main findings include mixed levels of "quasi" geometrical understandings, misconceptions about length and angles, and ambiguous uses of geometrical language for location, direction, and movement. These have implications for future teaching and learning about 2D shapes with particular reference to VRLE.