387 results for Spectral projected gradient method

at Queensland University of Technology - ePrints Archive


Relevance:

100.00%

Publisher:

Abstract:

Acoustic recordings of the environment provide an effective means to monitor bird species diversity. To facilitate exploration of acoustic recordings, we describe a content-based birdcall retrieval algorithm. A query birdcall is a region of a spectrogram bounded by frequency and time. Retrieval depends on a similarity measure derived from the orientation and distribution of spectral ridges. The spectral ridge detection method caters for a broad range of birdcall structures. In this paper, we extend previous work by incorporating a spectrogram scaling step in order to improve the detection of spectral ridges. Compared to an existing approach based on MFCC features, our feature representation achieves better retrieval performance for multiple bird species in noisy recordings.
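As a hedged sketch of this kind of similarity measure (not the authors' exact algorithm: the ridge detector is approximated with plain image gradients, and the distance is a cosine similarity between orientation histograms):

```python
import numpy as np
from scipy.signal import spectrogram
from scipy.ndimage import sobel

def orientation_histogram(audio, fs, bins=8):
    """Approximate ridge-orientation distribution of a call's spectrogram."""
    _, _, S = spectrogram(audio, fs=fs)
    log_s = np.log(S + 1e-10)
    g_time = sobel(log_s, axis=1)          # gradient along time
    g_freq = sobel(log_s, axis=0)          # gradient along frequency
    theta = np.arctan2(g_freq, g_time).ravel()
    weight = np.hypot(g_freq, g_time).ravel()
    hist, _ = np.histogram(theta, bins=bins, range=(-np.pi, np.pi), weights=weight)
    return hist / (hist.sum() + 1e-10)

def call_similarity(query, candidate, fs):
    """Cosine similarity between two calls' orientation distributions."""
    hq, hc = orientation_histogram(query, fs), orientation_histogram(candidate, fs)
    return float(hq @ hc / (np.linalg.norm(hq) * np.linalg.norm(hc) + 1e-10))
```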

Relevance:

40.00%

Publisher:

Abstract:

In this paper, a new alternating direction implicit Galerkin--Legendre spectral method for the two-dimensional Riesz space fractional nonlinear reaction-diffusion equation is developed. The temporal component is discretized by the Crank--Nicolson method. The detailed implementation of the method is presented. Stability and convergence are rigorously proved, showing that the derived method is stable and convergent of order $2$ in time. An optimal error estimate in space is also obtained by introducing a new orthogonal projector. The present method is extended to solve the fractional FitzHugh--Nagumo model. Numerical results are provided to verify the theoretical analysis.
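For illustration (notation assumed here, not taken from the paper), a Crank--Nicolson step of size $\tau$ for a semi-discrete reaction-diffusion equation $u_t = \mathcal{L}_\alpha u + f(u)$, with $\mathcal{L}_\alpha$ the Riesz space fractional diffusion operator, reads

\[
\frac{u^{n+1} - u^{n}}{\tau} = \frac{1}{2}\,\mathcal{L}_\alpha\!\left(u^{n+1} + u^{n}\right) + f\!\left(\frac{u^{n+1} + u^{n}}{2}\right),
\]

which is second-order accurate in time, matching the order stated in the abstract.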

Relevance:

40.00%

Publisher:

Abstract:

The fractional Fokker-Planck equation is an important physical model for simulating anomalous diffusion with external forces. Because of the non-local property of the fractional derivative, an interesting problem is to explore high-accuracy numerical methods for fractional differential equations. In this paper, a space-time spectral method is presented for the numerical solution of the time fractional Fokker-Planck initial-boundary value problem. The proposed method employs Jacobi polynomials for the temporal discretization and Fourier-like basis functions for the spatial discretization. The diagonalizable trait of the Fourier-like basis functions leads to a reduced representation of the inner product in the Galerkin analysis. We prove that, using the present method, the time fractional Fokker-Planck equation attains the same approximation order as the time fractional diffusion equation developed in [23]. This indicates that exponential decay of the error may be achieved if the exact solution is sufficiently smooth. Finally, some numerical results are given to demonstrate the high-order accuracy and efficiency of the new numerical scheme. The results show that the errors of the numerical solutions obtained by the space-time spectral method decay exponentially.
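For reference (the standard formulation; the paper's exact statement may differ), the time fractional Fokker-Planck equation with a Caputo derivative of order $0<\alpha<1$ is commonly written

\[
{}^{C}_{0}D_{t}^{\alpha}\,u(x,t)
= \left[\frac{\partial}{\partial x}\,\frac{V'(x)}{\eta_{\alpha}}
+ K_{\alpha}\,\frac{\partial^{2}}{\partial x^{2}}\right] u(x,t),
\]

where $V(x)$ is the external potential and $K_{\alpha}$ the generalized diffusion coefficient.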

Relevance:

30.00%

Publisher:

Abstract:

The application of object-based approaches to the problem of extracting vegetation information from images requires accurate delineation of individual tree crowns. This paper presents an automated method for individual tree crown detection and delineation that applies a simplified PCNN model in spectral feature space, followed by post-processing using morphological reconstruction. The algorithm was tested on high-resolution multi-spectral aerial images, and the results are compared with two existing image segmentation algorithms. The results demonstrate that our algorithm outperforms the other two solutions with an average accuracy of 81.8%.
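A minimal sketch of a simplified PCNN iteration of the kind referred to above (a generic formulation; the kernel and parameter values are illustrative assumptions, not the paper's):

```python
import numpy as np
from scipy.signal import convolve2d

def simplified_pcnn(stimulus, iterations=20, beta=0.2, decay=0.7, v_theta=20.0):
    """Run a simplified pulse coupled neural network over an image.

    stimulus: 2-D array of normalised pixel intensities (the feeding input).
    Returns the binary pulse image produced at each iteration.
    """
    kernel = np.array([[0.5, 1.0, 0.5],
                       [1.0, 0.0, 1.0],
                       [0.5, 1.0, 0.5]])              # assumed linking weights
    theta = np.ones_like(stimulus)                    # dynamic threshold
    pulses = np.zeros_like(stimulus)
    outputs = []
    for _ in range(iterations):
        linking = convolve2d(pulses, kernel, mode="same")
        internal = stimulus * (1.0 + beta * linking)  # linking-modulated activity
        pulses = (internal > theta).astype(float)     # neurons that fire
        theta = decay * theta + v_theta * pulses      # refractory threshold rise
        outputs.append(pulses.copy())
    return outputs
```

Connected regions of synchronously firing neurons would then be cleaned up with morphological reconstruction to yield candidate crowns, as described above.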

Relevance:

30.00%

Publisher:

Abstract:

A method is presented for the development of a regional Landsat-5 Thematic Mapper (TM) and Landsat-7 Enhanced Thematic Mapper plus (ETM+) spectral greenness index, coherent with a six-dimensional index set, based on a single ETM+ spectral image of a reference landscape. The first three indices of the set are determined by a polar transformation of the first three principal components of the reference image and relate to scene brightness, percent foliage projective cover (FPC) and water-related features. The remaining three principal components, of diminishing significance with respect to the reference image, complete the set. The reference landscape, a 2200 km² area containing a mix of cattle pasture, native woodland and forest, is located near Injune in South East Queensland, Australia. The indices developed from the reference image were tested using TM spectral images from 19 regionally dispersed areas in Queensland, representative of dissimilar landscapes containing woody vegetation ranging from tall closed forest to low open woodland. Examples of image transformations and two-dimensional feature space plots are used to demonstrate image interpretations related to the first three indices. Coherent, sensible interpretations of landscape features in images composed of the first three indices can be made in terms of brightness (red), foliage cover (green) and water (blue). A limited comparison is made with similar existing indices. The proposed greenness index was found to be very strongly related to FPC and insensitive to smoke. A novel Bayesian bounded-space modelling method was used to validate the greenness index as a good predictor of FPC. Airborne LiDAR (Light Detection and Ranging) estimates of FPC along transects of the 19 sites provided the training and validation data. Other spectral indices from the set were found to be useful as model covariates that could improve FPC predictions. They act to adjust the greenness/FPC relationship to suit different spectral backgrounds. The inclusion of an external meteorological covariate showed that further improvements to regional-scale predictions of FPC could be gained over those based on spectral indices alone.
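A heavily hedged sketch of the index construction as we read it (assuming the polar transformation re-expresses the first three principal components in spherical form, with the radius acting as the brightness index and the angles as the foliage- and water-related indices):

```python
import numpy as np

def polar_index_set(pixels):
    """pixels: (n, 6) array of TM/ETM+ band reflectances.

    Returns an (n, 3) array: a brightness-like radius plus two angular
    indices derived from the first three principal components.
    """
    centred = pixels - pixels.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    pc = centred @ vt[:3].T                          # first three components
    r = np.linalg.norm(pc, axis=1)                   # brightness-like radius
    azimuth = np.arctan2(pc[:, 1], pc[:, 0])         # greenness-like angle
    elevation = np.arcsin(pc[:, 2] / (r + 1e-10))    # water-like angle
    return np.column_stack([r, azimuth, elevation])
```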

Relevance:

30.00%

Publisher:

Abstract:

Surveillance systems such as object tracking and abandoned object detection systems typically rely on a single modality of colour video for their input. These systems work well in controlled conditions but often fail when low lighting, shadowing, smoke, dust or unstable backgrounds are present, or when the objects of interest are a similar colour to the background. Thermal images are not affected by lighting changes or shadowing, and are not overtly affected by smoke, dust or unstable backgrounds. However, thermal images lack colour information, which makes distinguishing between different people or objects of interest within the same scene difficult.

By using modalities from both the visible and thermal infrared spectra, we are able to obtain more information from a scene and overcome the problems associated with using either modality individually. We evaluate four approaches for fusing visual and thermal images for use in a person tracking system (two early fusion methods, one mid fusion and one late fusion method), in order to determine the most appropriate method for fusing multiple modalities. We also evaluate two of these approaches for use in abandoned object detection, and propose an abandoned object detection routine that utilises multiple modalities. To aid in the tracking and fusion of the modalities we propose a modified condensation filter that can dynamically change the particle count and features used according to the needs of the system.

We compare tracking and abandoned object detection performance for the proposed fusion schemes and the visual and thermal domains on their own. Testing is conducted using the OTCBVS database to evaluate object tracking, and data captured in-house to evaluate the abandoned object detection. Our results show that significant improvement can be achieved, and that a middle fusion scheme is most effective.
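As an illustrative sketch of the simplest of the fusion options evaluated here, early fusion by channel stacking of registered frames (the paper's exact fusion operators are not reproduced):

```python
import numpy as np

def early_fusion(rgb_frame, thermal_frame):
    """Stack registered colour and thermal frames into one 4-channel image.

    rgb_frame: (h, w, 3) uint8; thermal_frame: (h, w) uint8, already
    registered to the colour view. Downstream background modelling and
    tracking then operate on the fused array.
    """
    colour = rgb_frame.astype(np.float32) / 255.0
    thermal = thermal_frame[..., np.newaxis].astype(np.float32) / 255.0
    return np.concatenate([colour, thermal], axis=-1)  # (h, w, 4)
```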

Relevance:

30.00%

Publisher:

Abstract:

Automatic Speech Recognition (ASR) has matured into a technology which is becoming more common in our everyday lives, and is emerging as a necessity to minimise driver distraction when operating in-car systems such as navigation and infotainment. In “noise-free” environments, word recognition performance of these systems has been shown to approach 100%; however, this performance degrades rapidly as the level of background noise is increased.

Speech enhancement is a popular method for making ASR systems more robust. Single-channel spectral subtraction was originally designed to improve human speech intelligibility, and many attempts have been made to optimise this algorithm in terms of signal-based metrics such as maximised Signal-to-Noise Ratio (SNR) or minimised speech distortion. Such metrics are used to assess enhancement performance for intelligibility, not speech recognition, therefore making them sub-optimal for ASR applications.

This research investigates two methods for closely coupling subtractive-type enhancement algorithms with ASR: (a) a computationally-efficient Mel-filterbank noise subtraction technique based on likelihood-maximisation (LIMA), and (b) introducing phase spectrum information to enable spectral subtraction in the complex frequency domain.

Likelihood-maximisation uses gradient-descent to optimise parameters of the enhancement algorithm to best fit the acoustic speech model given a word sequence known a priori. Whilst this technique is shown to improve the ASR word accuracy performance, it is also identified to be particularly sensitive to non-noise mismatches between the training and testing data.

Phase information has long been ignored in spectral subtraction as it is deemed to have little effect on human intelligibility. In this work it is shown that phase information is important in obtaining highly accurate estimates of clean speech magnitudes which are typically used in ASR feature extraction. Phase Estimation via Delay Projection is proposed based on the stationarity of sinusoidal signals, and demonstrates the potential to produce improvements in ASR word accuracy across a wide range of SNRs.

Throughout the dissertation, consideration is given to practical implementation in vehicular environments, which resulted in two novel contributions: a LIMA framework which takes advantage of the grounding procedure common to speech dialogue systems, and a resource-saving formulation of frequency-domain spectral subtraction for realisation in field-programmable gate array hardware.

The techniques proposed in this dissertation were evaluated using the Australian English In-Car Speech Corpus, which was collected as part of this work. This database is the first of its kind within Australia and captures real in-car speech of 50 native Australian speakers in seven driving conditions common to Australian environments.
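To ground the discussion, a minimal single-channel magnitude spectral subtraction sketch (the textbook baseline that LIMA and the phase-aware variants above refine; parameter values are illustrative):

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(noisy, fs, noise_frames=10, alpha=2.0, floor=0.02):
    """Basic magnitude spectral subtraction.

    The first `noise_frames` STFT frames are assumed noise-only and give the
    noise magnitude estimate; `alpha` over-subtracts and `floor` limits
    musical-noise artefacts. The noisy phase is reused unchanged, which is
    precisely the simplification the phase-estimation work above addresses.
    """
    _, _, X = stft(noisy, fs=fs)
    mag, phase = np.abs(X), np.angle(X)
    noise_mag = mag[:, :noise_frames].mean(axis=1, keepdims=True)
    clean_mag = np.maximum(mag - alpha * noise_mag, floor * mag)
    _, enhanced = istft(clean_mag * np.exp(1j * phase), fs=fs)
    return enhanced
```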

Relevance:

30.00%

Publisher:

Abstract:

Robust image hashing seeks to transform a given input image into a shorter hashed version using a key-dependent non-invertible transform. These image hashes can be used for watermarking, image integrity authentication or image indexing for fast retrieval. This paper introduces a new method of generating image hashes based on extracting Higher Order Spectral features from the Radon projection of an input image. The feature extraction process is non-invertible and non-linear, and different hashes can be produced from the same image through the use of random permutations of the input. We show that the transform is robust to typical image transformations such as JPEG compression, noise, scaling, rotation, smoothing and cropping. We evaluate our system using a verification-style framework based on calculating false match and false non-match likelihoods, using the publicly available Uncompressed Colour Image Database (UCID) of 1320 images. We also compare our results to Swaminathan’s Fourier-Mellin based hashing method, achieving at least 1% EER improvement under noise, scaling and sharpening.
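A sketch of the feature extraction described above, under stated assumptions (the key selects a permutation of the Radon projection angles, and one fixed bispectrum sample is taken per projection; the paper's exact construction is not reproduced):

```python
import numpy as np
from skimage.transform import radon

def hos_hash_features(image, key_perm, angles=32, k1=3, k2=5):
    """Higher-order-spectra hash features from Radon projections.

    key_perm: key-dependent permutation of range(angles), an illustrative
    stand-in for the key-dependent randomisation step.
    """
    theta = np.linspace(0.0, 180.0, angles, endpoint=False)[key_perm]
    sinogram = radon(image, theta=theta, circle=False)  # one projection per angle
    feats = []
    for proj in sinogram.T:                             # iterate over angles
        X = np.fft.fft(proj - proj.mean())
        b = X[k1] * X[k2] * np.conj(X[k1 + k2])         # one bispectrum sample
        feats.append(np.angle(b))                       # phase-based feature
    return np.array(feats)
```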

Relevance:

30.00%

Publisher:

Abstract:

This thesis aimed to investigate the way in which distance runners modulate their speed in an effort to understand the key processes and determinants of speed selection when encountering hills in natural outdoor environments. One factor which has limited the expansion of knowledge in this area has been a reliance on the motorized treadmill, which constrains runners to constant speeds and gradients and only linear paths. Conversely, limits in the portability or storage capacity of available technology have restricted field research to brief durations and level courses. Therefore another aim of this thesis was to evaluate the capacity of lightweight, portable technology to measure running speed in outdoor undulating terrain.

The first study of this thesis assessed the validity of a non-differential GPS to measure speed, displacement and position during human locomotion. Three healthy participants walked and ran over straight and curved courses for 59 and 34 trials respectively. A non-differential GPS receiver provided speed data by Doppler shift and change in GPS position over time, which were compared with actual speeds determined by chronometry. Displacement data from the GPS were compared with a surveyed 100m section, while static positions were collected for 1 hour and compared with the known geodetic point. GPS speed values on the straight course were found to be closely correlated with actual speeds (Doppler shift: r = 0.9994, p < 0.001; Δ GPS position/time: r = 0.9984, p < 0.001). Actual speed errors were lowest using the Doppler shift method (90.8% of values within ±0.1 m.sec⁻¹). Speed was slightly underestimated on a curved path, though still highly correlated with actual speed (Doppler shift: r = 0.9985, p < 0.001; Δ GPS distance/time: r = 0.9973, p < 0.001). Distance measured by GPS was 100.46 ± 0.49m, while 86.5% of static points were within 1.5m of the actual geodetic point (mean error: 1.08 ± 0.34m, range 0.69-2.10m). Non-differential GPS demonstrated a highly accurate estimation of speed across a wide range of human locomotion velocities using only the raw signal data, with a minimal decrease in accuracy around bends. This high level of resolution was matched by accurate displacement and position data. Coupled with reduced size, cost and ease of use, the use of a non-differential receiver offers a valid alternative to differential GPS in the study of overground locomotion.

The second study of this dissertation examined speed regulation during overground running on a hilly course. Following an initial laboratory session to calculate physiological thresholds (VO2 max and ventilatory thresholds), eight experienced long distance runners completed a self-paced time trial over three laps of an outdoor course involving uphill, downhill and level sections. A portable gas analyser, GPS receiver and activity monitor were used to collect physiological, speed and stride frequency data. Participants ran 23% slower on uphills and 13.8% faster on downhills compared with level sections. Speeds on level sections were significantly different for 78.4 ± 7.0 seconds following an uphill and 23.6 ± 2.2 seconds following a downhill. Speed changes were primarily regulated by stride length, which was 20.5% shorter uphill and 16.2% longer downhill, while stride frequency was relatively stable. Oxygen consumption averaged 100.4% of runners’ individual ventilatory thresholds on uphills, 78.9% on downhills and 89.3% on level sections.
Group level speed was highly predicted using a modified gradient factor (r² = 0.89). Individuals adopted distinct pacing strategies, both across laps and as a function of gradient. Speed was best predicted using a weighted factor to account for prior and current gradients. Oxygen consumption (VO2) limited runners’ speeds only on uphill sections, and was maintained in line with individual ventilatory thresholds. Running speed showed larger individual variation on downhill sections, while speed on the level was systematically influenced by the preceding gradient. Runners who varied their pace more as a function of gradient showed a more consistent level of oxygen consumption. These results suggest that optimising time on the level sections after hills offers the greatest potential to minimise overall time when running over undulating terrain.

The third study of this thesis investigated the effect of implementing an individualised pacing strategy on running performance over an undulating course. Six trained distance runners completed three trials involving four laps (9968m) of an outdoor course involving uphill, downhill and level sections. The initial trial was self-paced in the absence of any temporal feedback. For the second and third field trials, runners were paced for the first three laps (7476m) according to two different regimes (Intervention or Control) by matching desired goal times for subsections within each gradient. The fourth lap (2492m) was completed without pacing. Goals for the Intervention trial were based on findings from study two, using a modified gradient factor and elapsed distance to predict the time for each section. To maintain the same overall time across all paced conditions, times were proportionately adjusted according to split times from the self-paced trial. The alternative pacing strategy (Control) used the original split times from this initial trial. Five of the six runners increased their range of uphill to downhill speeds on the Intervention trial by more than 30%, but this was unsuccessful in achieving a more consistent level of oxygen consumption, with only one runner showing a change of more than 10%. Group level adherence to the Intervention strategy was lowest on downhill sections. Three runners successfully adhered to the Intervention pacing strategy, which was gauged by a low Root Mean Square error across subsections and gradients. Of these three, the two who had the largest change in uphill-downhill speeds ran their fastest overall time. This suggests that for some runners the strategy of varying speeds systematically to account for gradients and transitions may benefit race performances on courses involving hills.

In summary, a non-differential receiver was found to offer highly accurate measures of speed, distance and position across the range of human locomotion speeds. Self-selected speed was found to be best predicted using a weighted factor to account for prior and current gradients. Oxygen consumption limited runners’ speeds only on uphills, speed on the level was systematically influenced by preceding gradients, while there was a much larger individual variation on downhill sections. Individuals were found to adopt distinct but unrelated pacing strategies as a function of durations and gradients, while runners who varied pace more as a function of gradient showed a more consistent level of oxygen consumption.
Finally, the implementation of an individualised pacing strategy to account for gradients and transitions greatly increased runners’ range of uphill-downhill speeds and was able to improve performance in some runners. The efficiency of various gradient-speed trade-offs and the factors limiting faster downhill speeds will however require further investigation to further improve the effectiveness of the suggested strategy.
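For concreteness, the Δ GPS position/time speed estimate validated in the first study amounts to successive great-circle distances divided by elapsed time; a generic sketch (not the thesis code):

```python
import numpy as np

EARTH_RADIUS_M = 6371000.0

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between GPS fixes given in degrees."""
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dphi = p2 - p1
    dlam = np.radians(lon2 - lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))

def position_speed(lats, lons, times_s):
    """Speed (m/s) from successive fixes: the Δ position / Δ time method."""
    step = haversine_m(lats[:-1], lons[:-1], lats[1:], lons[1:])
    return step / np.diff(times_s)
```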

Relevance:

30.00%

Publisher:

Abstract:

This thesis investigates aspects of encoding the speech spectrum at low bit rates, with extensions to the effect of such coding on automatic speaker identification. Vector quantization (VQ) is a technique for jointly quantizing a block of samples at once, in order to reduce the bit rate of a coding system. The major drawback in using VQ is the complexity of the encoder. Recent research has indicated the potential applicability of the VQ method to speech when product code vector quantization (PCVQ) techniques are utilized. The focus of this research is the efficient representation, calculation and utilization of the speech model as stored in the PCVQ codebook.

In this thesis, several VQ approaches are evaluated, and the efficacy of two training algorithms is compared experimentally. It is then shown that these product-code vector quantization algorithms may be augmented with lossless compression algorithms, thus yielding an improved overall compression rate. An approach using a statistical model for the vector codebook indices for subsequent lossless compression is introduced. This coupling of lossy compression and lossless compression enables further compression gain. It is demonstrated that this approach is able to reduce the bit rate requirement from the current 24 bits per 20 millisecond frame to below 20, using a standard spectral distortion metric for comparison. Several fast-search VQ methods for use in speech spectrum coding have been evaluated. The usefulness of fast-search algorithms is highly dependent upon the source characteristics and, although previous research has been undertaken for coding of images using VQ codebooks trained with the source samples directly, the product-code structured codebooks for speech spectrum quantization place new constraints on the search methodology.

The second major focus of the research is an investigation of the effect of low-rate spectral compression methods on the task of automatic speaker identification. The motivation for this aspect of the research arose from a need to simultaneously preserve the speech quality and intelligibility and to provide for machine-based automatic speaker recognition using the compressed speech. This is important because there are several emerging applications of speaker identification where compressed speech is involved. Examples include mobile communications where the speech has been highly compressed, or where a database of speech material has been assembled and stored in compressed form. Although these two application areas have the same objective - that of maximizing the identification rate - the starting points are quite different. On the one hand, the speech material used for training the identification algorithm may or may not be available in compressed form. On the other hand, the new test material on which identification is to be based may only be available in compressed form. Using the spectral parameters which have been stored in compressed form, two main classes of speaker identification algorithm are examined. Some studies have been conducted in the past on bandwidth-limited speaker identification, but the use of short-term spectral compression deserves separate investigation. Combining the major aspects of the research, some important design guidelines for the construction of an identification model when based on the use of compressed speech are put forward.
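A minimal sketch of the product-code VQ idea central to this work (structure assumed: independent subvector codebooks; the thesis's trained codebooks, fast-search methods and index entropy coding are not reproduced):

```python
import numpy as np

def pcvq_encode(vector, codebooks):
    """Product-code VQ: quantize each subvector against its own codebook.

    codebooks: list of (size_i, subdim_i) arrays whose subdims sum to
    len(vector). Returns one index per subvector; the bit cost is
    sum(log2(len(cb))) rather than log2 of one huge joint codebook.
    """
    indices, start = [], 0
    for cb in codebooks:
        sub = vector[start:start + cb.shape[1]]
        dists = np.sum((cb - sub) ** 2, axis=1)   # exhaustive squared-error search
        indices.append(int(np.argmin(dists)))
        start += cb.shape[1]
    return indices

def pcvq_decode(indices, codebooks):
    """Reconstruct the vector by concatenating the selected codewords."""
    return np.concatenate([cb[i] for i, cb in zip(indices, codebooks)])
```

The stream of indices produced by `pcvq_encode` is what the statistical model described above would then compress losslessly.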

Relevance:

30.00%

Publisher:

Abstract:

The use of appropriate features to represent an output class or object is critical for all classification problems. In this paper, we propose a biologically inspired object descriptor to represent the spectral-texture patterns of image-objects. The proposed feature descriptor is generated from the pulse spectral frequencies (PSF) of a pulse coupled neural network (PCNN), and is invariant to rotation, translation and small scale changes. The proposed method is first evaluated in rotation- and scale-invariant texture classification using the USC-SIPI texture database. It is further evaluated in an application of vegetation species classification for power line corridor monitoring using airborne multi-spectral imagery. The results from the two experiments demonstrate that the PSF feature is effective in representing the spectral-texture patterns of objects and shows better results than classic color histogram and texture features.
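A hedged reading of the descriptor (assuming the PSF is the time signature of firing activity across PCNN iterations, as its invariance properties suggest):

```python
import numpy as np

def pulse_spectral_frequencies(pulse_outputs):
    """PSF-style descriptor: firing counts per PCNN iteration, normalised.

    pulse_outputs: list of binary pulse images from a PCNN run (for example,
    the simplified_pcnn sketch given earlier for tree crown detection).
    Summing over space makes the descriptor insensitive to rotation and
    translation of the input pattern.
    """
    counts = np.array([p.sum() for p in pulse_outputs], dtype=float)
    return counts / (counts.sum() + 1e-10)
```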

Relevance:

30.00%

Publisher:

Abstract:

The tear film plays an important role in preserving the health of the ocular surface and maintaining the optimal refractive power of the cornea. Moreover, dry eye syndrome is one of the most commonly reported eye health problems. This syndrome is caused by abnormalities in the properties of the tear film. Current clinical tools to assess the tear film properties have shown certain limitations. The traditional invasive methods for the assessment of tear film quality, which are used by most clinicians, have been criticized for their lack of reliability and/or repeatability. A range of non-invasive methods of tear assessment have been investigated, but these also present limitations. Hence, no “gold standard” test is currently available to assess tear film integrity. Therefore, improving techniques for the assessment of tear film quality is of clinical significance and the main motivation for the work described in this thesis.

In this study, tear film surface quality (TFSQ) changes were investigated by means of high-speed videokeratoscopy (HSV). In this technique, a set of concentric rings formed in an illuminated cone or bowl is projected onto the anterior cornea, and their reflection from the ocular surface is imaged on a charge-coupled device (CCD). The reflection of the light is produced at the outermost layer of the cornea, the tear film. Hence, when the tear film is smooth, the reflected image presents a well-structured pattern. In contrast, when the tear film surface presents irregularities, the pattern also becomes irregular due to light scatter and deviation of the reflected light. The videokeratoscope provides an estimate of the corneal topography associated with each Placido disk image. Topographical estimates, which have been used in the past to quantify tear film changes, may not always be suitable for the evaluation of all the dynamic phases of the tear film. However, the Placido disk image itself, which contains the reflected pattern, may be more appropriate for assessing the tear film dynamics.

A set of novel routines has been purposely developed to quantify the changes of the reflected pattern and to extract a time series estimate of the TFSQ from the video recording. The routine extracts from each frame of the video recording a maximized area of analysis, within which a metric of the TFSQ is calculated. Initially, two metrics based on Gabor filter and Gaussian gradient-based techniques were used to quantify the consistency of the pattern’s local orientation as a measure of TFSQ. These metrics have helped to demonstrate the applicability of HSV to assess the tear film, and the influence of contact lens wear on TFSQ. The results suggest that the dynamic-area analysis method of HSV was able to distinguish and quantify the subtle but systematic degradation of tear film surface quality in the inter-blink interval during contact lens wear. It was also able to clearly show a difference between bare eye and contact lens wearing conditions. Thus, the HSV method appears to be a useful technique for quantitatively investigating the effects of contact lens wear on the TFSQ.

Subsequently, a larger clinical study was conducted to compare HSV with two other non-invasive techniques, lateral shearing interferometry (LSI) and dynamic wavefront sensing (DWS). Of these non-invasive techniques, the HSV appeared to be the most precise method for measuring TFSQ, by virtue of its lower coefficient of variation,
while the LSI appeared to be the most sensitive method for analyzing the tear build-up time (TBUT). The capability of each of the non-invasive methods to discriminate dry eye from normal subjects was also investigated. Receiver operating characteristic (ROC) curves were calculated to assess the ability of each method to predict dry eye syndrome. The LSI technique gave the best results under both natural blinking conditions and suppressed blinking conditions, closely followed by HSV. The DWS did not perform as well as LSI or HSV.

The main limitation of the HSV technique, identified during the former clinical study, was a lack of sensitivity to quantify the build-up/formation phase of the tear film cycle. For that reason an extra metric based on image transformation and block processing was proposed. In this metric, the area of analysis is transformed from Cartesian to polar coordinates, converting the concentric circle pattern into a quasi-straight-line image from which a block statistics value is extracted. This metric has shown better sensitivity under low pattern disturbance and has improved the performance of the ROC curves.

Additionally, a theoretical study based on ray-tracing techniques and topographical models of the tear film was undertaken to fully comprehend the HSV measurement and the instrument’s potential limitations. Of special interest was the assessment of the instrument’s sensitivity to subtle topographic changes. The theoretical simulations have helped to provide some understanding of the tear film dynamics; for instance, the model extracted for the build-up phase has helped to provide some insight into the dynamics during this initial phase.

Finally, some aspects of the mathematical modeling of TFSQ time series are reported in this thesis. Over the years, different functions have been used to model the time series as well as to extract the key clinical parameters (i.e., timing). Unfortunately, those techniques for modeling the tear film time series do not simultaneously consider the underlying physiological mechanism and the parameter extraction methods. A set of guidelines is proposed to meet both criteria. Special attention was given to a commonly used fit, the polynomial function, and to considerations for selecting the appropriate model order to ensure the true derivative of the signal is accurately represented.

The work described in this thesis has shown the potential of using high-speed videokeratoscopy to assess tear film surface quality. A set of novel image and signal processing techniques have been proposed to quantify different aspects of tear film assessment, analysis and modeling. The dynamic-area HSV has shown good performance in a broad range of conditions (i.e., contact lens, normal and dry eye subjects). As a result, this technique could be a useful clinical tool to assess tear film surface quality in the future.
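A sketch of the Cartesian-to-polar block metric described above (grid sizes and the block statistic are illustrative assumptions):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def polar_block_metric(image, centre, n_r=64, n_theta=256, block=8):
    """Remap a Placido ring image to polar coordinates and score regularity.

    Concentric rings become near-straight lines in the polar image; the mean
    per-block standard deviation then rises as the rings become disturbed.
    centre: (row, col) of the ring pattern.
    """
    cy, cx = centre
    r_max = min(cy, cx, image.shape[0] - cy, image.shape[1] - cx) - 1
    r = np.linspace(0, r_max, n_r)
    th = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)
    rr, tt = np.meshgrid(r, th, indexing="ij")
    rows, cols = cy + rr * np.sin(tt), cx + rr * np.cos(tt)
    polar = map_coordinates(image.astype(float), [rows, cols], order=1)
    crop = polar[:n_r - n_r % block, :n_theta - n_theta % block]
    blocks = crop.reshape(n_r // block, block, -1, block)
    return float(blocks.std(axis=(1, 3)).mean())
```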

Relevance:

30.00%

Publisher:

Abstract:

We consider the problem of structured classification, where the task is to predict a label y from an input x, and y has meaningful internal structure. Our framework includes supervised training of Markov random fields and weighted context-free grammars as special cases. We describe an algorithm that solves the large-margin optimization problem defined in [12], using an exponential-family (Gibbs distribution) representation of structured objects. The algorithm is efficient—even in cases where the number of labels y is exponential in size—provided that certain expectations under Gibbs distributions can be calculated efficiently. The method for structured labels relies on a more general result, specifically the application of exponentiated gradient updates [7, 8] to quadratic programs.
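In its generic form (shown here for the per-example dual distributions over labels, with step size $\eta$; the bracketed references give the exact variant), the exponentiated gradient update is

\[
\alpha_{i,y} \;\leftarrow\;
\frac{\alpha_{i,y}\,\exp\!\left(-\eta\,\nabla_{i,y}\right)}
{\sum_{y'} \alpha_{i,y'}\,\exp\!\left(-\eta\,\nabla_{i,y'}\right)},
\]

so each dual vector remains a probability distribution; representing these distributions in Gibbs form is what keeps the update tractable even when the number of labels is exponential.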

Relevance:

30.00%

Publisher:

Abstract:

The uncertainty associated with how projected climate change will affect global C cycling could have a large impact on predictions of soil C stocks. The purpose of our study was to determine how various soil decomposition and chemistry characteristics relate to soil organic matter (SOM) temperature sensitivity. We accomplished this objective using long-term soil incubations at three temperatures (15, 25, and 35°C) and pyrolysis molecular beam mass spectrometry (py-MBMS) on 12 soils from 6 sites along a mean annual temperature (MAT) gradient (2–25.6°C). The Q10 values calculated from the CO2 respired during a long-term incubation using the Q10-q method showed decomposition of the more resistant fraction to be more temperature sensitive, with a Q10-q of 1.95 ± 0.08 for the labile fraction and a Q10-q of 3.33 ± 0.04 for the more resistant fraction. We compared the fit of soil respiration data using a two-pool model (active and slow) with first-order kinetics against a three-pool model and found that the two- and three-pool models statistically fit the data equally well. The three-pool model changed the size and rate constant of the more resistant pool. The size of the active pool in these soils, calculated using the two-pool model, increased with incubation temperature and ranged from 0.1 to 14.0% of initial soil organic C. Sites with an intermediate MAT and the lowest C/N ratio had the largest active pool. Pyrolysis molecular beam mass spectrometry showed declines in carbohydrates with conversion from grassland to wheat cultivation, and a greater amount of protected carbohydrates in allophanic soils, which may have led to the differences found between the total amount of CO2 respired, the size of the active pool, and the Q10-q values of the soils.
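For reference (standard textbook forms; the Q10-q method itself is a refinement not reproduced here), the temperature sensitivity and the two-pool first-order model used in such incubation studies are typically written

\[
Q_{10} = \left(\frac{R_2}{R_1}\right)^{10/(T_2 - T_1)},
\qquad
C(t) = C_a\,e^{-k_a t} + C_s\,e^{-k_s t},
\]

where $R_1, R_2$ are respiration rates measured at temperatures $T_1, T_2$, and $C_a, C_s$ are the active and slow pool sizes with first-order rate constants $k_a, k_s$.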