601 resultados para redistribution paradox
Resumo:
The performance of automatic speech recognition systems deteriorates in the presence of noise. One known solution is to incorporate video information with an existing acoustic speech recognition system. We investigate the performance of the individual acoustic and visual sub-systems and then examine different ways in which the integration of the two systems may be performed. The system is to be implemented in real time on a Texas Instruments' TMS320C80 DSP.
Resumo:
A system to segment and recognize Australian 4-digit postcodes from address labels on parcels is described. Images of address labels are preprocessed and adaptively thresholded to reduce noise. Projections are used to segment the line and then the characters comprising the postcode. Individual digits are recognized using bispectral features extracted from their parallel beam projections. These features are insensitive to translation, scaling and rotation, and robust to noise. Results on scanned images are presented. The system is currently being improved and implemented to work on-line.
An approach to statistical lip modelling for speaker identification via chromatic feature extraction
Resumo:
This paper presents a novel technique for the tracking of moving lips for the purpose of speaker identification. In our system, a model of the lip contour is formed directly from chromatic information in the lip region. Iterative refinement of contour point estimates is not required. Colour features are extracted from the lips via concatenated profiles taken around the lip contour. Reduction of order in lip features is obtained via principal component analysis (PCA) followed by linear discriminant analysis (LDA). Statistical speaker models are built from the lip features based on the Gaussian mixture model (GMM). Identification experiments performed on the M2VTS1 database, show encouraging results
Resumo:
This paper investigates the use of lip information, in conjunction with speech information, for robust speaker verification in the presence of background noise. It has been previously shown in our own work, and in the work of others, that features extracted from a speaker's moving lips hold speaker dependencies which are complementary with speech features. We demonstrate that the fusion of lip and speech information allows for a highly robust speaker verification system which outperforms the performance of either sub-system. We present a new technique for determining the weighting to be applied to each modality so as to optimize the performance of the fused system. Given a correct weighting, lip information is shown to be highly effective for reducing the false acceptance and false rejection error rates in the presence of background noise
Resumo:
Investigates the use of temporal lip information, in conjunction with speech information, for robust, text-dependent speaker identification. We propose that significant speaker-dependent information can be obtained from moving lips, enabling speaker recognition systems to be highly robust in the presence of noise. The fusion structure for the audio and visual information is based around the use of multi-stream hidden Markov models (MSHMM), with audio and visual features forming two independent data streams. Recent work with multi-modal MSHMMs has been performed successfully for the task of speech recognition. The use of temporal lip information for speaker identification has been performed previously (T.J. Wark et al., 1998), however this has been restricted to output fusion via single-stream HMMs. We present an extension to this previous work, and show that a MSHMM is a valid structure for multi-modal speaker identification
Resumo:
Investigates the use of lip information, in conjunction with speech information, for robust speaker verification in the presence of background noise. We have previously shown (Int. Conf. on Acoustics, Speech and Signal Proc., vol. 6, pp. 3693-3696, May 1998) that features extracted from a speaker's moving lips hold speaker dependencies which are complementary with speech features. We demonstrate that the fusion of lip and speech information allows for a highly robust speaker verification system which outperforms either subsystem individually. We present a new technique for determining the weighting to be applied to each modality so as to optimize the performance of the fused system. Given a correct weighting, lip information is shown to be highly effective for reducing the false acceptance and false rejection error rates in the presence of background noise
Resumo:
Characteristics of surveillance video generally include low resolution and poor quality due to environmental, storage and processing limitations. It is extremely difficult for computers and human operators to identify individuals from these videos. To overcome this problem, super-resolution can be used in conjunction with an automated face recognition system to enhance the spatial resolution of video frames containing the subject and narrow down the number of manual verifications performed by the human operator by presenting a list of most likely candidates from the database. As the super-resolution reconstruction process is ill-posed, visual artifacts are often generated as a result. These artifacts can be visually distracting to humans and/or affect machine recognition algorithms. While it is intuitive that higher resolution should lead to improved recognition accuracy, the effects of super-resolution and such artifacts on face recognition performance have not been systematically studied. This paper aims to address this gap while illustrating that super-resolution allows more accurate identification of individuals from low-resolution surveillance footage. The proposed optical flow-based super-resolution method is benchmarked against Baker et al.’s hallucination and Schultz et al.’s super-resolution techniques on images from the Terrascope and XM2VTS databases. Ground truth and interpolated images were also tested to provide a baseline for comparison. Results show that a suitable super-resolution system can improve the discriminability of surveillance video and enhance face recognition accuracy. The experiments also show that Schultz et al.’s method fails when dealing surveillance footage due to its assumption of rigid objects in the scene. The hallucination and optical flow-based methods performed comparably, with the optical flow-based method producing less visually distracting artifacts that interfered with human recognition.
Resumo:
Recently, user tagging systems have grown in popularity on the web. The tagging process is quite simple for ordinary users, which contributes to its popularity. However, free vocabulary has lack of standardization and semantic ambiguity. It is possible to capture the semantics from user tagging into some form of ontology, but the application of the resulted ontology for recommendation making has not been that flourishing. In this paper we discuss our approach to learn domain ontology from user tagging information and apply the extracted tag ontology in a pilot tag recommendation experiment. The initial result shows that by using the tag ontology to re-rank the recommended tags, the accuracy of the tag recommendation can be improved.
Resumo:
Distributed Denial-of-Service (DDoS) attacks continue to be one of the most pernicious threats to the delivery of services over the Internet. Not only are DDoS attacks present in many guises, they are also continuously evolving as new vulnerabilities are exploited. Hence accurate detection of these attacks still remains a challenging problem and a necessity for ensuring high-end network security. An intrinsic challenge in addressing this problem is to effectively distinguish these Denial-of-Service attacks from similar looking Flash Events (FEs) created by legitimate clients. A considerable overlap between the general characteristics of FEs and DDoS attacks makes it difficult to precisely separate these two classes of Internet activity. In this paper we propose parameters which can be used to explicitly distinguish FEs from DDoS attacks and analyse two real-world publicly available datasets to validate our proposal. Our analysis shows that even though FEs appear very similar to DDoS attacks, there are several subtle dissimilarities which can be exploited to separate these two classes of events.
Resumo:
Epoxy-multiwall carbon nanotube nanocomposite thin films were prepared by spin casting. High power air plasma was used to preferentially etch a coating of epoxy and expose the underlying carbon nanotube network. Scanning electron microscopy (SEM) examination revealed well distributed and spatially connected carbon nanotube network in both the longitudinal direction (plasma etched surface) and traverse direction (through-thickness fractured surface). Topographical examination and conductive mode imaging of the plasma etched surface using atomic force microscope (AFM) in the contact mode enabled direct imaging of topography and current maps of the embedded carbon nanotube network. Bundles consisting of at least three single carbon nanotubes form part of the percolating network observed under high resolution current maps. Predominantly non-ohmic response is obtained in this study; behaviour attributed to less than effective polymer material removal when using air plasma etching.
Resumo:
Acoustic sensors play an important role in augmenting the traditional biodiversity monitoring activities carried out by ecologists and conservation biologists. With this ability however comes the burden of analysing large volumes of complex acoustic data. Given the complexity of acoustic sensor data, fully automated analysis for a wide range of species is still a significant challenge. This research investigates the use of citizen scientists to analyse large volumes of environmental acoustic data in order to identify bird species. Specifically, it investigates ways in which the efficiency of a user can be improved through the use of species identification tools and the use of reputation models to predict the accuracy of users with unidentified skill levels. Initial experimental results are reported.
Resumo:
Micro aerial vehicles (MAVs) are a rapidly growing area of research and development in robotics. For autonomous robot operations, localization has typically been calculated using GPS, external camera arrays, or onboard range or vision sensing. In cluttered indoor or outdoor environments, onboard sensing is the only viable option. In this paper we present an appearance-based approach to visual SLAM on a flying MAV using only low quality vision. Our approach consists of a visual place recognition algorithm that operates on 1000 pixel images, a lightweight visual odometry algorithm, and a visual expectation algorithm that improves the recall of place sequences and the precision with which they are recalled as the robot flies along a similar path. Using data gathered from outdoor datasets, we show that the system is able to perform visual recognition with low quality, intermittent visual sensory data. By combining the visual algorithms with the RatSLAM system, we also demonstrate how the algorithms enable successful SLAM.
Resumo:
By December 2010 total superannuation assets had reached $1.3 trillion, covering 94% of all Australians. This substantial growth was not a natural evolution. Rather it can be directly traced to three decades of bipartisan reform strategies based on a claimed public interest ideology. This article investigates the concerns raised by Superannuation Select Committees, consumer and union organisations, independent researchers and actuarial experts that, in contrast to the public interest rhetoric, the regulatory reforms have primarily achieved major private interest gains for powerful lobbyists. The findings of this analysis indicate that the democratic power of Australian governments to set economic policy agendas has been progressively eclipsed by the power of the financial services industry's producer groups. Rather than producing a best practice governance structure, fund members remain trapped in a post-reform cost paradox: no right of exit regardless of the deepening cost burden imposed. In an industry set to control a projected nominal figure of $6.7 trillion in superannuation assets by 2035, these findings suggest that the real change necessary to improve the deepening cost burden faced by fund members within a life-long, mandatory superannuation investment is now beyond any government's reach.
Resumo:
People all over the world are regularly hit by floods, cyclones, and other natural disasters. Many use smart phones and social media to stay connected, seek help, improvise, and cope with crises or challenging situations. This column discusses these practices after dark or during disasters to unveil challenges and opportunities for innovative designs that increase resilience and safety.
Resumo:
Probabilistic topic models have recently been used for activity analysis in video processing, due to their strong capacity to model both local activities and interactions in crowded scenes. In those applications, a video sequence is divided into a collection of uniform non-overlaping video clips, and the high dimensional continuous inputs are quantized into a bag of discrete visual words. The hard division of video clips, and hard assignment of visual words leads to problems when an activity is split over multiple clips, or the most appropriate visual word for quantization is unclear. In this paper, we propose a novel algorithm, which makes use of a soft histogram technique to compensate for the loss of information in the quantization process; and a soft cut technique in the temporal domain to overcome problems caused by separating an activity into two video clips. In the detection process, we also apply a soft decision strategy to detect unusual events.We show that the proposed soft decision approach outperforms its hard decision counterpart in both local and global activity modelling.