924 results for IMAGE PROCESSING COMPUTER-ASSISTED
Abstract:
Current state-of-the-art techniques for landmine detection in ground penetrating radar (GPR) utilize statistical methods to identify characteristics of a landmine response. This research makes use of 2-D slices of data in which subsurface landmine responses have hyperbolic shapes. Various methods from the field of visual image processing are adapted to the 2-D GPR data, producing superior landmine detection results. This research goes on to develop a physics-based GPR augmentation method motivated by current advances in visual object detection. This GPR-specific augmentation is used to mitigate issues caused by insufficient training sets. This work shows that augmentation improves detection performance under training conditions that are normally very difficult. Finally, this work introduces the use of convolutional neural networks as a method to learn feature extraction parameters. These learned convolutional features outperform hand-designed features in GPR detection tasks. This work presents a number of methods, both borrowed from and motivated by the substantial work in visual image processing. The methods developed and presented in this work show an improvement in overall detection performance and introduce a method to improve the robustness of statistical classification.
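As a rough illustration of why a buried point target traces a hyperbola in a 2-D GPR slice, the sketch below computes the two-way travel time seen by an antenna moving along the surface. The function name, the wave velocity, and the target geometry are assumptions chosen for illustration; they are not taken from the work summarised above.

```python
import math

def two_way_travel_time(x, x0, depth, velocity):
    """Two-way travel time (s) from an antenna at surface position x to a
    point target buried at horizontal position x0 and the given depth."""
    return 2.0 * math.sqrt(depth ** 2 + (x - x0) ** 2) / velocity

# Assumed values: target 0.10 m deep at x0 = 0.5 m, wave speed 1e8 m/s in soil.
positions = [i * 0.1 for i in range(11)]
times = [two_way_travel_time(x, 0.5, 0.10, 1e8) for x in positions]

# The shortest travel time occurs directly above the target (the hyperbola apex).
apex = min(range(len(times)), key=times.__getitem__)
```

Plotting `times` against `positions` gives a curve that is symmetric about the apex, which is the shape a detector would look for in each slice.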
Abstract:
Introduction: Computer-Aided Design (CAD) and Computer-Aided Manufacture (CAM) have been developed to fabricate fixed dental restorations more accurately and quickly, and to improve the cost-effectiveness of manufacture, when compared to the conventional method. Two main methods exist in dental CAD/CAM technology: the subtractive and additive methods. While the fitting accuracy of both methods has been explored, no study has yet compared the fabricated restoration (CAM output) to its CAD in terms of accuracy. The aim of the present study was to compare the output of various dental CAM routes to a sole initial CAD and establish the accuracy of fabrication. The internal fit of the various CAM routes was also investigated. The null hypotheses tested were: 1) no significant differences between the CAM output and the CAD and 2) no significant differences between the various CAM routes. Methods: An aluminium master model of a standard premolar preparation was scanned with a contact dental scanner (Incise, Renishaw, UK). A single CAD was created on the scanned master model (InciseCAD software, V2.5.0.140, UK). Twenty copings were then fabricated by sending the single CAD to a multitude of CAM routes. The copings were grouped (n=5) as: Laser sintered CoCrMo (LS), 5-axis milled CoCrMo (MCoCrMo), 3-axis milled zirconia (ZAx3) and 4-axis milled zirconia (ZAx4). All copings were micro-CT scanned (Phoenix X-Ray, Nanotom-S, Germany, power: 155 kV, current: 60 µA, 3600 projections) to produce three-dimensional (3D) models. A novel methodology was created to superimpose the micro-CT scans with the CAD (GOM Inspect software, V7.5SR2, Germany) to indicate inaccuracies in manufacturing. The accuracy in terms of coping volume was explored. The distances from the surfaces of the micro-CT 3D models to the surfaces of the CAD model (CAD Deviation) were investigated after creating surface colour deviation maps.
Localised digital sections of the deviations (Occlusal, Axial and Cervical) and selected focussed areas were then quantitatively measured using software (GOM Inspect software, Germany). A novel methodology was also explored to digitally align (Rhino software, V5, USA) the micro-CT scans with the master model to investigate internal fit. Fifty digital cross sections of the aligned scans were created. Point-to-point distances were measured at 5 levels at each cross section. The five levels were: Vertical Marginal Fit (VF), Absolute Marginal Fit (AM), Axio-margin Fit (AMF), Axial Fit (AF) and Occlusal Fit (OF). Results: The results of the volume measurement were summarised as: V(M-CoCrMo) (62.8 mm³) > V(ZAx3) (59.4 mm³) > V(CAD) (57 mm³) > V(ZAx4) (56.1 mm³) > V(LS) (52.5 mm³) and were all significantly different (p […] presented as areas with different colour. No significant differences were observed at the internal aspect of the cervical aspect between all groups of copings. Significant differences (p< […] M-CoCrMo: Internal Occlusal, Internal Axial and External Axial; (2) ZAx3 > ZAx4: External Occlusal, External Cervical; (3) ZAx3 < ZAx4: Internal Occlusal; (4) M-CoCrMo > ZAx4: Internal Occlusal and Internal Axial. The mean values of AMF and AF were significantly (p […] M-CoCrMo and CAD > ZAx4. Only VF of M-CoCrMo was comparable with the CAD Internal Fit. All VF and AM values were within the clinically acceptable fit (120 µm). Conclusion: The investigated CAM methods reproduced the CAD accurately at the internal cervical aspect of the copings. However, localised deviations at axial and occlusal aspects of the copings may suggest the need for modifications in these areas prior to fitting and veneering with porcelain. The CAM groups evaluated also showed different levels of Internal Fit, thus rejecting the null hypotheses. The novel non-destructive methodologies for CAD/CAM accuracy and internal fit testing presented in this thesis may be a useful evaluation tool for similar applications.
Abstract:
Interacting with a computer system in the operating room (OR) can be a frustrating experience for a surgeon, who currently has to verbally delegate every computer interaction task to an assistant. This indirect mode of interaction is time-consuming and error-prone, and can lead to poor usability of OR computer systems. This thesis describes the design and evaluation of a joystick-like device that allows direct surgeon control of the computer in the OR. The device was tested extensively in comparison to a mouse and delegated dictation with seven surgeons, eleven residents, and five graduate students. The device contains no electronic parts, is easy to use, is unobtrusive, has no physical connection to the computer and makes use of an existing tool in the OR. We performed a user study to determine its effectiveness in allowing a user to perform all the tasks they would be expected to perform on an OR computer system during a computer-assisted surgery. Dictation was found to be superior to the joystick in qualitative measures, but the joystick was preferred over dictation in user satisfaction responses. The mouse outperformed both joystick and dictation, but it is not a readily accepted modality in the OR.
Abstract:
With security and surveillance, there is an increasing need to process image data efficiently and effectively either at source or in a large data network. Whilst a Field-Programmable Gate Array (FPGA) has been seen as a key technology for enabling this, the design process has been viewed as problematic in terms of the time and effort needed for implementation and verification. The work here proposes a different approach of using optimized FPGA-based soft-core processors which allows the user to exploit the task and data level parallelism to achieve the quality of dedicated FPGA implementations whilst reducing design time. The paper also reports some preliminary progress on the design flow to program the structure. An implementation for a Histogram of Gradients algorithm is also reported which shows that a performance of 328 fps can be achieved with this design approach, whilst avoiding the long design time, verification and debugging steps associated with conventional FPGA implementations.
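For readers unfamiliar with gradient-histogram features, the sketch below shows the core per-cell computation in plain Python: central-difference gradients are binned by unsigned orientation and weighted by magnitude. It is a simplified illustration with hard binning and no block normalisation, the function name and the 9-bin choice are assumptions, and it says nothing about how the FPGA soft-core implementation described above is organised.

```python
import math

def cell_orientation_histogram(cell, bins=9):
    """Histogram of unsigned gradient orientations (0-180 degrees) for one
    cell, weighted by gradient magnitude. Border pixels are skipped."""
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for r in range(1, len(cell) - 1):
        for c in range(1, len(cell[0]) - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]   # horizontal central difference
            gy = cell[r + 1][c] - cell[r - 1][c]   # vertical central difference
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # The extra modulo guards against an angle that rounds to 180.0.
            hist[int(angle // bin_width) % bins] += magnitude
    return hist

# Intensity increases left to right, so every gradient points along the x axis.
cell = [[float(c) for c in range(6)] for _ in range(6)]
hist = cell_orientation_histogram(cell)
```

Each pixel's contribution is independent of the others, which is the kind of data-level parallelism the abstract refers to.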
Abstract:
[No abstract]
Abstract:
This thesis addresses the problem of detecting and describing the same scene points in different wide-angle images taken by the same camera at different viewpoints. This is a core competency of many vision-based localisation tasks including visual odometry and visual place recognition. Wide-angle cameras have a large field of view that can exceed a full hemisphere, and the images they produce contain severe radial distortion. When compared to traditional narrow field of view perspective cameras, more accurate estimates of camera egomotion can be found using the images obtained with wide-angle cameras. The ability to accurately estimate camera egomotion is a fundamental primitive of visual odometry, and this is one of the reasons for the increased popularity in the use of wide-angle cameras for this task. Their large field of view also enables them to capture images of the same regions in a scene taken at very different viewpoints, and this makes them suited for visual place recognition. However, the ability to estimate the camera egomotion and recognise the same scene in two different images is dependent on the ability to reliably detect and describe the same scene points, or ‘keypoints’, in the images. Most algorithms used for this purpose are designed almost exclusively for perspective images. Applying algorithms designed for perspective images directly to wide-angle images is problematic as no account is made for the image distortion. The primary contribution of this thesis is the development of two novel keypoint detectors, and a method of keypoint description, designed for wide-angle images. Both reformulate the Scale-Invariant Feature Transform (SIFT) as an image processing operation on the sphere. As the image captured by any central projection wide-angle camera can be mapped to the sphere, applying these variants to an image on the sphere enables keypoints to be detected in a manner that is invariant to image distortion.
Each of the variants is required to find the scale-space representation of an image on the sphere, and they differ in the approaches they use to do this. Extensive experiments using real and synthetically generated wide-angle images are used to validate the two new keypoint detectors and the method of keypoint description. The better of these two new keypoint detectors is applied to vision-based localisation tasks including visual odometry and visual place recognition using outdoor wide-angle image sequences. As part of this work, the effect of keypoint coordinate selection on the accuracy of egomotion estimates using the Direct Linear Transform (DLT) is investigated, and a simple weighting scheme is proposed which attempts to account for the uncertainty of keypoint positions during detection. A word reliability metric is also developed for use within a visual ‘bag of words’ approach to place recognition.
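To make the idea of mapping a wide-angle image to the sphere concrete, the sketch below converts a pixel to a unit direction vector. It assumes an equidistant fisheye model (radius proportional to the angle from the optical axis); the model, the function name, and the principal point and focal length values are illustrative assumptions, not the camera model used in the thesis.

```python
import math

def pixel_to_sphere(u, v, cx, cy, f):
    """Map pixel (u, v) to a unit vector, assuming an equidistant fisheye
    model r = f * theta about the principal point (cx, cy)."""
    dx, dy = u - cx, v - cy
    r = math.hypot(dx, dy)
    if r == 0.0:
        return (0.0, 0.0, 1.0)           # the pixel lies on the optical axis
    theta = r / f                         # angle from the optical axis (radians)
    s = math.sin(theta) / r
    return (dx * s, dy * s, math.cos(theta))

# Hypothetical intrinsics: principal point (320, 240), focal length 200 pixels.
centre = pixel_to_sphere(320.0, 240.0, 320.0, 240.0, 200.0)
# A pixel 200 * pi / 2 pixels from the centre views 90 degrees off-axis.
edge = pixel_to_sphere(320.0 + 200.0 * math.pi / 2, 240.0, 320.0, 240.0, 200.0)
```

Once every pixel has a direction on the sphere, operations such as smoothing can be defined in terms of angular distance rather than distorted image-plane distance.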
Abstract:
Camera calibration information is required in order for multiple camera networks to deliver more than the sum of many single camera systems. Methods exist for manually calibrating cameras with high accuracy. Manually calibrating networks with many cameras is, however, time-consuming, expensive and impractical for networks that undergo frequent change. For this reason, automatic calibration techniques have been vigorously researched in recent years. Fully automatic calibration methods depend on the ability to automatically find point correspondences between overlapping views. In typical camera networks, cameras are placed far apart to maximise coverage. This is referred to as a wide baseline scenario. Finding sufficient correspondences for camera calibration in wide baseline scenarios presents a significant challenge. This thesis focuses on developing more effective and efficient techniques for finding correspondences in uncalibrated, wide baseline, multiple-camera scenarios. The project consists of two major areas of work. The first is the development of more effective and efficient view-covariant local feature extractors. The second area involves finding methods to extract scene information using the information contained in a limited set of matched affine features. Several novel affine adaptation techniques for salient features have been developed. A method is presented for efficiently computing the discrete scale space primal sketch of local image features. A scale selection method was implemented that makes use of the primal sketch. The primal sketch-based scale selection method has several advantages over the existing methods. It allows greater freedom in how the scale space is sampled, enables more accurate scale selection, is more effective at combining different functions for spatial position and scale selection, and leads to greater computational efficiency.
Existing affine adaptation methods make use of the second moment matrix to estimate the local affine shape of local image features. In this thesis, it is shown that the Hessian matrix can be used in a similar way to estimate local feature shape. The Hessian matrix is effective for estimating the shape of blob-like structures, but is less effective for corner structures. It is simpler to compute than the second moment matrix, leading to a significant reduction in computational cost. A wide baseline dense correspondence extraction system, called WiDense, is presented in this thesis. It allows the extraction of large numbers of additional accurate correspondences, given only a few initial putative correspondences. It consists of the following algorithms: an affine region alignment algorithm that ensures accurate alignment between matched features; a method for extracting more matches in the vicinity of a matched pair of affine features, using the alignment information contained in the match; and an algorithm for extracting large numbers of highly accurate point correspondences from an aligned pair of feature regions. Experiments show that the correspondences generated by the WiDense system improve the success rate of computing the epipolar geometry of very widely separated views. This new method is successful in many cases where the features produced by the best wide baseline matching algorithms are insufficient for computing the scene geometry.
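The following sketch illustrates, under simplifying assumptions, how a Hessian can reveal the shape of a blob-like structure. It evaluates finite-difference second derivatives at the centre of a synthetic anisotropic Gaussian blob and recovers the ratio of its axes. The blob function, step size, and names are hypothetical, and this is not the affine adaptation procedure developed in the thesis.

```python
import math

def hessian_at(f, x, y, h=1e-3):
    """Finite-difference Hessian entries (fxx, fxy, fyy) of f at (x, y)."""
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / (h * h)
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / (h * h)
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h * h)
    return fxx, fxy, fyy

def blob(x, y, a=2.0, b=1.0):
    """Axis-aligned Gaussian blob with standard deviations a (x) and b (y)."""
    return math.exp(-(x * x / (2 * a * a) + y * y / (2 * b * b)))

fxx, fxy, fyy = hessian_at(blob, 0.0, 0.0)
# At the blob centre the Hessian is diag(-1/a^2, -1/b^2), so this ratio is a/b.
axis_ratio = math.sqrt(fyy / fxx)
```

Only second derivatives at a single point are needed here, whereas a second moment matrix requires products of first derivatives averaged over a window, which is consistent with the lower computational cost noted above.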
Abstract:
Recent modelling of socio-economic costs by the Australian railway industry in 2010 has estimated the cost of level crossing accidents to exceed AU$116 million annually. To better understand causal factors that contribute to these accidents, the Cooperative Research Centre for Rail Innovation is running a project entitled Baseline Level Crossing Video. The project aims to improve the recording of level crossing safety data by developing an intelligent system capable of detecting near-miss incidents and capturing quantitative data around these incidents. To detect near-miss events at railway level crossings, a video analytics module is being developed to analyse video footage obtained from forward-facing cameras installed on trains. This paper presents a vision-based approach for the detection of these near-miss events. The video analytics module comprises object detectors and a rail detection algorithm, allowing the distance between a detected object and the rail to be determined. An existing publicly available Histograms of Oriented Gradients (HOG) based object detector is used to detect various types of vehicles in each video frame. As vehicles are usually seen side-on from the cabin’s perspective, the results of the vehicle detector are verified using an algorithm that can detect the wheels of each detected vehicle. Rail detection is facilitated using a projective transformation of the video, such that the forward-facing view becomes a bird’s eye view. The Line Segment Detector is employed as the feature extractor and a sliding window approach is developed to track a pair of rails. Localisation of the vehicles is done by projecting the results of the vehicle and rail detectors on the ground plane, allowing the distance between the vehicle and rail to be calculated. The resultant vehicle positions and distances are logged to a database for further analysis.
We present preliminary results regarding the performance of a prototype video analytics module on a data set of videos containing more than 30 different railway level crossings. The video data is captured from a journey of a train that has passed through these level crossings.
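The ground-plane localisation step described above can be sketched as a homography followed by a point-to-line distance. The homography used here is a hypothetical pure scaling chosen so the arithmetic is easy to follow, and the function names and pixel coordinates are assumptions; a real system would estimate the homography from the camera geometry.

```python
import math

def apply_homography(H, x, y):
    """Map image point (x, y) through a 3x3 homography H (row-major lists)."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (xh / w, yh / w)

def point_to_line_distance(p, a, b):
    """Perpendicular distance from point p to the infinite line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
    return abs(cross) / math.hypot(bx - ax, by - ay)

# Hypothetical homography: 0.01 ground metres per image pixel, no perspective.
H = [[0.01, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 1.0]]
vehicle = apply_homography(H, 500.0, 300.0)   # assumed vehicle contact point
rail_a = apply_homography(H, 200.0, 0.0)      # two assumed points on a rail
rail_b = apply_homography(H, 200.0, 600.0)
distance_m = point_to_line_distance(vehicle, rail_a, rail_b)
```

With these assumed values the vehicle lies about 3 m from the rail in ground coordinates; a record of this kind is what would be logged for later near-miss analysis.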
Abstract:
Recent modelling of socio-economic costs by the Australian railway industry in 2010 has estimated the cost of level crossing accidents to exceed AU$116 million annually. To better understand the causal factors of these accidents, a video analytics application is being developed to automatically detect near-miss incidents using forward facing videos from trains. As near-miss events occur more frequently than collisions, by detecting these occurrences there will be more safety data available for analysis. The application that is being developed will improve the objectivity of near-miss reporting by providing quantitative data about the position of vehicles at level crossings through the automatic analysis of video footage. In this paper we present a novel method for detecting near-miss occurrences at railway level crossings from video data of trains. Our system detects and localizes vehicles at railway level crossings. It also detects the position of railways to calculate the distance of the detected vehicles to the railway centerline. The system logs the information about the position of the vehicles and railway centerline into a database for further analysis by the safety data recording and analysis system, to determine whether or not the event is a near-miss. We present preliminary results of our system on a dataset of videos taken from a train that passed through 14 railway level crossings. We demonstrate the robustness of our system by showing the results of our system on day and night videos.
Abstract:
This paper presents 'vSpeak', the first initiative taken in Pakistan for ICT-enabled conversion of dynamic Sign Urdu gestures into natural language sentences. To realize this, vSpeak has adopted a novel approach for feature extraction using edge detection and image compression, which gives input to the Artificial Neural Network that recognizes the gesture. This technique caters for blurred images as well. Training and testing are currently being performed on a dataset of 200 patterns of 20 words from Sign Urdu, with a target accuracy of 90% and above.
Abstract:
The bilateral filter is a versatile non-linear filter that has found diverse applications in image processing, computer vision, computer graphics, and computational photography. A common form of the filter is the Gaussian bilateral filter, in which both the spatial and range kernels are Gaussian. A direct implementation of this filter requires O(σ²) operations per pixel, where σ is the standard deviation of the spatial Gaussian. In this paper, we propose an accurate approximation algorithm that can cut down the computational complexity to O(1) per pixel for any arbitrary σ (constant-time implementation). This is based on the observation that the range kernel operates via the translations of a fixed Gaussian over the range space, and that these translated Gaussians can be accurately approximated using the so-called Gauss-polynomials. The overall algorithm emerging from this approximation involves a series of spatial Gaussian filtering operations, which can be efficiently implemented (in parallel) using separability and recursion. We present some preliminary results to demonstrate that the proposed algorithm compares favorably with some of the existing fast algorithms in terms of speed and accuracy.
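For reference, the direct implementation whose cost the abstract describes can be sketched as follows. This is the brute-force baseline only, not the constant-time approximation proposed in the paper; the window radius of three spatial standard deviations and the function name are assumptions made for the sketch.

```python
import math

def bilateral_filter(image, sigma_s, sigma_r):
    """Direct Gaussian bilateral filter on a 2-D list of floats.
    The window radius is 3 * sigma_s, so the work per pixel grows with
    the square of the spatial standard deviation."""
    rows, cols = len(image), len(image[0])
    radius = int(math.ceil(3 * sigma_s))
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total, weight_sum = 0.0, 0.0
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    if not (0 <= rr < rows and 0 <= cc < cols):
                        continue  # ignore samples outside the image
                    diff = image[rr][cc] - image[r][c]
                    # Product of the spatial and range Gaussian kernels.
                    w = math.exp(-(dr * dr + dc * dc) / (2 * sigma_s ** 2)
                                 - diff * diff / (2 * sigma_r ** 2))
                    total += w * image[rr][cc]
                    weight_sum += w
            out[r][c] = total / weight_sum
    return out

# A step edge: each side is smoothed internally while the edge is preserved.
step = [[0.0] * 4 + [100.0] * 4 for _ in range(8)]
filtered = bilateral_filter(step, sigma_s=1.0, sigma_r=10.0)
```

The two inner loops visit (2·radius + 1)² neighbours for every pixel, which is where the quadratic dependence on the spatial standard deviation comes from.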
Abstract:
While cochlear implants (CIs) usually provide high levels of speech recognition in quiet, speech recognition in noise remains challenging. To overcome these difficulties, it is important to understand how implanted listeners separate a target signal from interferers. Stream segregation has been studied extensively in both normal and electric hearing, as a function of place of stimulation. However, the effects of pulse rate, independent of place, on the perceptual grouping of sequential sounds in electric hearing have not yet been investigated. A rhythm detection task was used to measure stream segregation. The results of this study suggest that while CI listeners can segregate streams based on differences in pulse rate alone, the amount of stream segregation observed decreases as the base pulse rate increases. Further investigation of the perceptual dimensions encoded by the pulse rate and the effect of sequential presentation of different stimulation rates on perception could be beneficial for the future development of speech processing strategies for CIs.