250 resultados para image coding
Resumo:
This thesis presents an original approach to parametric speech coding at rates below 1 kbitsjsec, primarily for speech storage applications. Essential processes considered in this research encompass efficient characterization of evolutionary configuration of vocal tract to follow phonemic features with high fidelity, representation of speech excitation using minimal parameters with minor degradation in naturalness of synthesized speech, and finally, quantization of resulting parameters at the nominated rates. For encoding speech spectral features, a new method relying on Temporal Decomposition (TD) is developed which efficiently compresses spectral information through interpolation between most steady points over time trajectories of spectral parameters using a new basis function. The compression ratio provided by the method is independent of the updating rate of the feature vectors, hence allows high resolution in tracking significant temporal variations of speech formants with no effect on the spectral data rate. Accordingly, regardless of the quantization technique employed, the method yields a high compression ratio without sacrificing speech intelligibility. Several new techniques for improving performance of the interpolation of spectral parameters through phonetically-based analysis are proposed and implemented in this research, comprising event approximated TD, near-optimal shaping event approximating functions, efficient speech parametrization for TD on the basis of an extensive investigation originally reported in this thesis, and a hierarchical error minimization algorithm for decomposition of feature parameters which significantly reduces the complexity of the interpolation process. Speech excitation in this work is characterized based on a novel Multi-Band Excitation paradigm which accurately determines the harmonic structure in the LPC (linear predictive coding) residual spectra, within individual bands, using the concept 11 of Instantaneous Frequency (IF) estimation in frequency domain. The model yields aneffective two-band approximation to excitation and computes pitch and voicing with high accuracy as well. New methods for interpolative coding of pitch and gain contours are also developed in this thesis. For pitch, relying on the correlation between phonetic evolution and pitch variations during voiced speech segments, TD is employed to interpolate the pitch contour between critical points introduced by event centroids. This compresses pitch contour in the ratio of about 1/10 with negligible error. To approximate gain contour, a set of uniformly-distributed Gaussian event-like functions is used which reduces the amount of gain information to about 1/6 with acceptable accuracy. The thesis also addresses a new quantization method applied to spectral features on the basis of statistical properties and spectral sensitivity of spectral parameters extracted from TD-based analysis. The experimental results show that good quality speech, comparable to that of conventional coders at rates over 2 kbits/sec, can be achieved at rates 650-990 bits/sec.
Resumo:
This paper proposes a generic decoupled imagebased control scheme for cameras obeying the unified projection model. The scheme is based on the spherical projection model. Invariants to rotational motion are computed from this projection and used to control the translational degrees of freedom. Importantly we form invariants which decrease the sensitivity of the interaction matrix to object depth variation. Finally, the proposed results are validated with experiments using a classical perspective camera as well as a fisheye camera mounted on a 6-DOF robotic platform.
Resumo:
This paper reports on the empirical comparison of seven machine learning algorithms in texture classification with application to vegetation management in power line corridors. Aiming at classifying tree species in power line corridors, object-based method is employed. Individual tree crowns are segmented as the basic classification units and three classic texture features are extracted as the input to the classification algorithms. Several widely used performance metrics are used to evaluate the classification algorithms. The experimental results demonstrate that the classification performance depends on the performance matrix, the characteristics of datasets and the feature used.
Resumo:
Advances in digital technology have caused a radical shift in moving image culture. This has occurred in both modes of production and sites of exhibition, resulting in a blurring of boundaries that previously defined a range of creative disciplines. Re-Imagining Animation: The Changing Face of the Moving Image, by Paul Wells and Johnny Hardstaff, argues that as a result of these blurred disciplinary boundaries, the term “animation” has become a “catch all” for describing any form of manipulated moving image practice. Understanding animation predicates the need to (re)define the medium within contemporary moving image culture. Via a series of case studies, the book engages with a range of moving image works, interrogating “how the many and varied approaches to making film, graphics, visual artefacts, multimedia and other intimations of motion pictures can now be delineated and understood” (p. 7). The structure and clarity of content make this book ideally suited to any serious study of contemporary animation which accepts animation as a truly interdisciplinary medium.