389 results for Markup Language for Manuscript Images
Abstract:
The topic of the present work is the relationship between the power of learning algorithms on the one hand, and the expressive power of the logical language used to represent the problems to be learned on the other. The central question is whether enriching the language results in more learning power. In order to make the question relevant and nontrivial, it is required that both texts (sequences of data) and hypotheses (guesses) be translatable from the “rich” language into the “poor” one. The issue is considered for several logical languages suitable for describing structures whose domain is the set of natural numbers. It is shown that enriching the language gives no advantage for those languages which define a monadic second-order language that is decidable in the following sense: there is a fixed interpretation in the structure of natural numbers such that the set of sentences of this extended language true in that structure is decidable. But enriching the original language by even one constant gives an advantage if the language contains a binary function symbol (which will be interpreted as addition). Furthermore, it is shown that behaviourally correct learning has exactly the same power as learning in the limit for those languages which define a monadic second-order language with the property given above, but has more power in the case of languages containing a binary function symbol. Adding the natural requirement that the set of all structures to be learned is recursively enumerable, it is shown that it pays off to enrich the language of arithmetic for both finite learning and learning in the limit, but it does not pay off to enrich the language for behaviourally correct learning.
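For reference, the two learning criteria compared above are usually formalised along the following Gold-style lines; this is a generic formulation, and the notation is assumed rather than taken from the paper itself.

```latex
% Standard Gold-style criteria (generic notation, assumed for illustration).
% T is a text (an enumeration of the data of the structure to be learned),
% T[n] its initial segment of length n, and M a learner mapping finite
% data sequences to hypotheses.
%
% Learning in the limit (Ex): the learner converges to one correct hypothesis.
\[
\exists h\ \exists n_0\ \forall n \ge n_0 :\; M(T[n]) = h \ \text{and } h \text{ is correct.}
\]
% Behaviourally correct learning (BC): almost all hypotheses are correct,
% but they need not be syntactically equal, which is why BC can be strictly
% more powerful than Ex.
\[
\exists n_0\ \forall n \ge n_0 :\; M(T[n]) \ \text{is correct.}
\]
```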
Abstract:
Automatic spoken Language Identification (LID) is the process of identifying the language spoken within an utterance. The challenge that this task presents is that no prior information is available indicating the content of the utterance or the identity of the speaker. The trend of globalization and the pervasive popularity of the Internet will amplify the need for the capabilities spoken language identification systems provide. A prominent application arises in call centers dealing with speakers of different languages. Another important application is to index or search huge speech data archives and corpora that contain multiple languages. The aim of this research is to develop techniques targeted at producing a faster and more accurate automatic spoken LID system compared to those evaluated in the previous National Institute of Standards and Technology (NIST) Language Recognition Evaluation. Acoustic and phonetic speech information are targeted as the most suitable features for representing the characteristics of a language. To model the acoustic speech features, a Gaussian Mixture Model based approach is employed. Phonetic speech information is extracted using existing speech recognition technology. Various techniques to improve LID accuracy are also studied. One approach examined is the employment of Vocal Tract Length Normalization to reduce the speech variation caused by different speakers. A linear data fusion technique is adopted to combine the various aspects of information extracted from speech. As a result of this research, a LID system was implemented and presented for evaluation in the 2003 Language Recognition Evaluation conducted by the NIST.
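As a rough illustration of the GMM-based acoustic approach mentioned above, the sketch below trains one Gaussian mixture per language and classifies an utterance by the highest average log-likelihood. The feature representation, mixture size, and use of scikit-learn are assumptions made for the example, not the system evaluated in the thesis.

```python
# Minimal GMM language-ID sketch (illustrative only; not the thesis system).
import numpy as np
from sklearn.mixture import GaussianMixture

def train_language_models(features_by_language, n_components=64):
    """Fit one GMM per language on its pooled acoustic feature vectors."""
    models = {}
    for lang, feats in features_by_language.items():   # feats: (n_frames, n_dims)
        gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
        gmm.fit(feats)
        models[lang] = gmm
    return models

def identify_language(models, utterance_feats):
    """Score an utterance against every language model and pick the best."""
    scores = {lang: gmm.score(utterance_feats)          # mean log-likelihood per frame
              for lang, gmm in models.items()}
    return max(scores, key=scores.get), scores
```

In a fuller system, the per-language scores returned here would be one input to the linear fusion step that also combines the phonetic evidence.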
Abstract:
The task addressed in this thesis is the automatic alignment of an ensemble of misaligned images in an unsupervised manner. This capability is especially useful in computer vision applications where annotations of the shape of an object of interest present in a collection of images are required. Performing this task manually is a slow, tedious, expensive and error-prone process which hinders the progress of research laboratories and businesses. Most recently, the unsupervised removal of geometric variation present in a collection of images has been referred to as congealing, based on the seminal work of Learned-Miller [21]. The only assumption made in congealing is that the parametric nature of the misalignment is known a priori (e.g. translation, similarity, affine, etc.) and that the object of interest is guaranteed to be present in each image. The capability to congeal an ensemble of misaligned images stemming from the same object class has numerous applications in object recognition, detection and tracking. This thesis concerns itself with the construction of a congealing algorithm titled least-squares congealing, which is inspired by the well-known image-to-image alignment algorithm developed by Lucas and Kanade [24]. The algorithm is shown to have superior performance characteristics when compared to previously established methods: canonical congealing by Learned-Miller [21] and stochastic congealing by Zöllei [39].
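To make the congealing idea concrete, here is a heavily simplified, translation-only sketch in the Lucas-Kanade spirit: each image is iteratively warped to reduce its squared difference from the current ensemble mean. The function name, the restriction to translations, and the simple Gauss-Newton update are illustrative assumptions; the thesis algorithm handles richer warps and a more careful formulation.

```python
# Translation-only congealing sketch (illustration, not least-squares congealing itself).
import numpy as np
from scipy.ndimage import shift as nd_shift

def congeal_translations(images, n_iters=50, step=1.0):
    """images: equally sized 2-D float arrays. Returns a (dy, dx) shift per image."""
    params = np.zeros((len(images), 2))              # current translation per image
    for _ in range(n_iters):
        warped = [nd_shift(im, params[i]) for i, im in enumerate(images)]
        mean = np.mean(warped, axis=0)               # current ensemble estimate
        for i, im in enumerate(warped):
            gy, gx = np.gradient(im)                 # image gradients (rows, cols)
            J = np.stack([gy.ravel(), gx.ravel()], axis=1)
            err = (mean - im).ravel()                # residual toward the mean
            delta, *_ = np.linalg.lstsq(J, -err, rcond=None)
            params[i] += step * delta                # Gauss-Newton style update
        params -= params.mean(axis=0)                # keep the ensemble anchored
    return params
```

Anchoring the mean translation at zero prevents the trivial solution in which every image drifts by the same amount.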
Abstract:
Component software has many benefits, most notably increased software re-use; however, the component software process places heavy burdens on programming language technology, which modern object-oriented programming languages do not address. In particular, software components require specifications that are both sufficiently expressive and sufficiently abstract, and, where possible, these specifications should be checked formally by the programming language. This dissertation presents a programming language called Mentok that provides two novel programming language features enabling improved specification of stateful component roles. Negotiable interfaces are interface types extended with protocols, and allow specification of changing method availability, including some patterns of out-calls and re-entrance. Type layers are extensions to module signatures that allow specification of abstract control flow constraints through the interfaces of a component-based application. Development of Mentok's unique language features included creation of MentokC, the Mentok compiler, and formalization of key properties of Mentok in mini-languages called MentokP and MentokL.
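Mentok's negotiable interfaces express such protocols in the type system and check them statically. Purely as an informal analogue, the Python sketch below enforces a "changing method availability" protocol at run time; the component, its states, and its transitions are invented for illustration and are not Mentok syntax.

```python
# Run-time analogue of a stateful component protocol (not Mentok; invented example).
class ProtocolError(RuntimeError):
    pass

class FileLikeComponent:
    # protocol: method -> (states in which it is available, state after the call)
    _protocol = {
        "open":  ({"closed"}, "open"),
        "read":  ({"open"},   "open"),
        "close": ({"open"},   "closed"),
    }

    def __init__(self):
        self._state = "closed"

    def _transition(self, method):
        allowed, next_state = self._protocol[method]
        if self._state not in allowed:
            raise ProtocolError(f"{method}() not available in state {self._state!r}")
        self._state = next_state

    def open(self):
        self._transition("open")

    def read(self):
        self._transition("read")
        return b""

    def close(self):
        self._transition("close")
```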
Abstract:
This thesis addresses the problem of detecting and describing the same scene points in different wide-angle images taken by the same camera at different viewpoints. This is a core competency of many vision-based localisation tasks, including visual odometry and visual place recognition. Wide-angle cameras have a large field of view that can exceed a full hemisphere, and the images they produce contain severe radial distortion. When compared to traditional narrow field of view perspective cameras, more accurate estimates of camera egomotion can be found using the images obtained with wide-angle cameras. The ability to accurately estimate camera egomotion is a fundamental primitive of visual odometry, and this is one of the reasons for the increased popularity of wide-angle cameras for this task. Their large field of view also enables them to capture images of the same regions in a scene taken at very different viewpoints, and this makes them well suited to visual place recognition. However, the ability to estimate the camera egomotion and recognise the same scene in two different images depends on the ability to reliably detect and describe the same scene points, or ‘keypoints’, in the images. Most algorithms used for this purpose are designed almost exclusively for perspective images. Applying algorithms designed for perspective images directly to wide-angle images is problematic, as no account is made for the image distortion. The primary contribution of this thesis is the development of two novel keypoint detectors, and a method of keypoint description, designed for wide-angle images. Both reformulate the Scale-Invariant Feature Transform (SIFT) as an image processing operation on the sphere. As the image captured by any central projection wide-angle camera can be mapped to the sphere, applying these variants to an image on the sphere enables keypoints to be detected in a manner that is invariant to image distortion. Each of the variants is required to find the scale-space representation of an image on the sphere, and they differ in the approaches they use to do this. Extensive experiments using real and synthetically generated wide-angle images are used to validate the two new keypoint detectors and the method of keypoint description. The better of the two new keypoint detectors is applied to vision-based localisation tasks, including visual odometry and visual place recognition, using outdoor wide-angle image sequences. As part of this work, the effect of keypoint coordinate selection on the accuracy of egomotion estimates using the Direct Linear Transform (DLT) is investigated, and a simple weighting scheme is proposed which attempts to account for the uncertainty of keypoint positions during detection. A word reliability metric is also developed for use within a visual ‘bag of words’ approach to place recognition.
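The mapping to the sphere that the detectors rely on can be illustrated with a small back-projection routine. The sketch below assumes an idealised equidistant fisheye model (r = f·θ) and invented parameter names; the thesis addresses general central wide-angle cameras, so this specific model is only an example.

```python
# Back-project fisheye pixels onto the unit viewing sphere
# (equidistant model assumed for illustration only).
import numpy as np

def pixels_to_sphere(u, v, f, cx, cy):
    """Map pixel coordinates (u, v) to unit direction vectors on the sphere."""
    x, y = u - cx, v - cy
    r = np.hypot(x, y)                 # radial distance from the principal point
    theta = r / f                      # angle from the optical axis (equidistant model)
    phi = np.arctan2(y, x)             # azimuth around the optical axis
    return np.stack([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)], axis=-1)
```

Once pixels are expressed as directions on the unit sphere, scale-space filtering and keypoint detection can be formulated with spherical operators, independent of the distortion in the original image.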
Abstract:
In this study, the feasibility of difference imaging for improving the contrast of electronic portal imaging device (EPID) images is investigated. The difference imaging technique consists of the acquisition of two EPID images (with and without the placement of an additional layer of attenuating medium on the surface of the EPID) and the subtraction of one of these images from the other. The resulting difference image shows improved contrast, compared to a standard EPID image, since it is generated by lower-energy photons. Results of this study show that, firstly, this method can produce images exhibiting greater contrast than is seen in standard megavoltage EPID images and that, secondly, the optimal thickness of attenuating material for producing a maximum contrast enhancement may vary with phantom thickness and composition. Further studies of the possibilities and limitations of the difference imaging technique, and the physics behind it, are therefore recommended.
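The core processing step described above is simple enough to sketch directly: subtract the attenuated acquisition from the open one and compare contrast in matching regions. Array names, normalisation, and the contrast definition below are assumptions made for illustration, not the study's exact processing chain.

```python
# Minimal difference-imaging sketch (illustrative only).
import numpy as np

def difference_image(open_image, attenuated_image):
    """Pixel-wise difference of two co-registered EPID acquisitions."""
    return open_image.astype(float) - attenuated_image.astype(float)

def simple_contrast(image, roi_object, roi_background):
    """Contrast between an object region and a background region (boolean masks)."""
    obj = image[roi_object].mean()
    bkg = image[roi_background].mean()
    return abs(obj - bkg) / ((obj + bkg) / 2.0)
```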
Abstract:
Interactional competence has emerged as a focal point for language testing researchers in recent years. In spoken communication involving two or more interlocutors, the co-construction of discourse is central to successful interaction. The acknowledgement of co-construction has led to concern over the impact of the interlocutor and the separability of performances in speaking tests involving interaction. The purpose of this article is to review recent studies of direct relevance to the construct of interactional competence and its operationalisation by raters in the context of second language speaking tests. The review begins by tracing the emergence of interaction as a criterion in speaking tests from a theoretical perspective, and then focuses on research salient to interactional effectiveness that has been carried out in the context of language testing interviews and group and paired speaking tests.
Abstract:
The MPEG-21 Multimedia Framework provides for controlled distribution of multimedia works through its Intellectual Property Management and Protection ("IPMP") Components and Rights Expression Language ("MPEG REL"). The IPMP Components provide a framework by which the components of an MPEG-21 digital item can be protected from undesired access, while MPEG REL provides a mechanism for describing the conditions under which a component of a digital item may be used and distributed. This chapter describes how the IPMP Components and MPEG REL were used to implement a series of digital rights management applications at the Cooperative Research Centre for Smart Internet Technology in Australia. While the IPMP Components and MPEG REL were initially designed to facilitate the protection of copyright, the applications also show how the technology can be adapted to the protection of private personal information and sensitive corporate information.
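At its core, a rights expression of this kind pairs a principal, a right, a resource, and a condition. The toy evaluator below captures that structure in Python; the field names and the evaluation rule are generic illustrations, not actual MPEG REL element names or syntax.

```python
# Toy rights-grant evaluator (generic structure; not MPEG REL syntax).
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional

@dataclass
class Grant:
    principal: str                                    # who is granted the right
    right: str                                        # e.g. "play", "print", "distribute"
    resource: str                                     # identifier of the digital item component
    condition: Optional[Callable[[], bool]] = None    # e.g. a validity interval

def is_permitted(grant: Grant, principal: str, right: str, resource: str) -> bool:
    """A request is permitted if it matches a grant whose condition holds."""
    return (grant.principal == principal
            and grant.right == right
            and grant.resource == resource
            and (grant.condition is None or grant.condition()))

# Example: playing is allowed only before the end of 2025 (invented licence).
licence = Grant("alice", "play", "urn:item:track-1",
                condition=lambda: datetime.now() < datetime(2026, 1, 1))
```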
Abstract:
Embedded generalized markup, as applied by digital humanists to the recording and studying of our textual cultural heritage, suffers from a number of serious technical drawbacks. As a result of its evolution from early printer control languages, generalized markup can only express a document’s ‘logical’ structure via a repertoire of permissible printed format structures. In addition to the well-researched overlap problem, the embedding of markup codes into texts that never had them when written leads to a number of further difficulties: the inclusion of potentially obsolescent technical and subjective information into texts that are supposed to be archivable for the long term, the manual encoding of information that could be better computed automatically, and the obscuring of the text by highly complex technical data. Many of these problems can be alleviated by asserting a separation between the versions of which many cultural heritage texts are composed, and their content. In this way the complex inter-connections between versions can be handled automatically, leaving only simple markup for individual versions to be handled by the user.
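The separation argued for here can be pictured with a tiny sketch: each version stores only its own plain text (plus whatever light markup it needs), while the correspondences between versions are computed by a tool rather than hand-encoded. The data layout and the use of difflib below are illustrative assumptions, not the authors' format.

```python
# Versions kept as plain content; inter-version links computed automatically.
import difflib

work = {
    "A": "The quick brown fox jumps over the dog.",
    "B": "The quick red fox jumped over the lazy dog.",
}

def version_links(a, b):
    """Compute matching spans between two versions automatically."""
    sm = difflib.SequenceMatcher(a=a, b=b)
    return [(a[i:i + n], i, j) for i, j, n in sm.get_matching_blocks() if n]

links = version_links(work["A"], work["B"])   # shared spans and their positions
```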