895 results for Semantic gap
Abstract:
This paper presents a domain ontology, the FeelingTheMusic Ontology (FTMOntology). FTMOntology is designed to represent the complex domain of music and how it relates to other domains such as mood, personality and physiology. This includes representing the main concepts and relations of the music domain with each of the above-mentioned domains. The main contribution of this work is to model and relate these different domains in a consistent ontology. © 2011 Springer-Verlag.
Abstract:
In this paper, we introduce a novel high-level visual content descriptor which is devised for performing semantic-based image classification and retrieval. The work can be treated as an attempt to bridge the so-called “semantic gap”. The proposed image feature vector model is fundamentally underpinned by the image labelling framework, called Collaterally Confirmed Labelling (CCL), which incorporates the collateral knowledge extracted from the collateral texts of the images with the state-of-the-art low-level image processing and visual feature extraction techniques for automatically assigning linguistic keywords to image regions. Two different high-level image feature vector models are developed based on the CCL labelling results, for the purposes of image data clustering and retrieval respectively. A subset of the Corel image collection has been used for evaluating our proposed method. The experimental results to date already indicate that our proposed semantic-based visual content descriptors outperform both traditional visual and textual image feature models.
Abstract:
In this paper, we introduce a novel high-level visual content descriptor devised for performing semantic-based image classification and retrieval. The work can be treated as an attempt to bridge the so-called "semantic gap". The proposed image feature vector model is fundamentally underpinned by an automatic image labelling framework, called Collaterally Cued Labelling (CCL), which incorporates the collateral knowledge extracted from the collateral texts accompanying the images with the state-of-the-art low-level visual feature extraction techniques for automatically assigning textual keywords to image regions. A subset of the Corel image collection was used for evaluating the proposed method. The experimental results indicate that our semantic-level visual content descriptors outperform both conventional visual and textual image feature models.
Abstract:
Threat detection is a challenging problem, because threats appear in many variations and differences from normal behaviour can be very subtle. In this paper, we consider threats on a parking lot, where theft of a truck’s cargo occurs. The threats range from explicit, e.g. a person attacking the truck driver, to implicit, e.g. somebody loitering and then fiddling with the exterior of the truck in order to open it. Our goal is a system that is able to recognize threats instantaneously as they develop. Typical observables of the threats are a person’s activity, presence in a particular zone and the trajectory. The novelty of this paper is an encoding of these threat observables in a semantic, intermediate-level representation, based on low-level visual features that have no intrinsic semantic meaning themselves. The aim of this representation is to bridge the semantic gap between the low-level tracks and motion and the higher-level notion of threats. In our experiments, we demonstrate that our semantic representation is more descriptive for threat detection than directly using low-level features. We find that a person’s activities are the most important elements of this semantic representation, followed by the person’s trajectory. The proposed threat detection system is very accurate: 96.6% of the tracks are correctly interpreted when considering the temporal context.
Abstract:
Work presented as part of the Master's programme in Informatics Engineering, as a partial requirement for obtaining the degree of Master in Informatics Engineering.
Abstract:
Lecture 4: Ontological Hypertext and the Semantic Web. Contains PowerPoint lecture slides and hypertext research papers: Conceptual linking: Ontology-based Open Hypermedia (Carr et al., 2001); CS AKTiveSpace: Building a Semantic Web Application (Glaser et al., 2004); The Semantic Web Revisited (Shadbolt, Hall and Berners-Lee, 2006); Mind the Semantic Gap (Millard et al., 2005).
Abstract:
As more and more open-source software components become available on the internet, we need automatic ways to label and compare them. For example, a developer who searches for reusable software must be able to quickly gain an understanding of retrieved components. This understanding cannot be gained at the level of source code due to the semantic gap between source code and the domain model. In this paper we present a lexical approach that uses the log-likelihood ratios of word frequencies to automatically provide labels for software components. We present a prototype implementation of our labeling/comparison algorithm and provide examples of its application. In particular, we apply the approach to detect trends in the evolution of a software system.
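The labelling idea in this abstract can be illustrated with a minimal sketch. It assumes the standard log-likelihood ratio (G²) for comparing a word's frequency in two corpora; the abstract does not give the exact formulation, so the function names, the over-representation filter and the `top_n` parameter are assumptions made for illustration only.

```python
import math
from collections import Counter


def log_likelihood_ratio(k1: int, n1: int, k2: int, n2: int) -> float:
    """G^2 statistic for a word seen k1 times among n1 tokens in one corpus
    and k2 times among n2 tokens in another (n1 and n2 must be positive)."""
    def term(observed: float, expected: float) -> float:
        # A cell with zero observations contributes nothing to the sum.
        return observed * math.log(observed / expected) if observed > 0 else 0.0

    p = (k1 + k2) / (n1 + n2)  # pooled probability of the word
    return 2.0 * (
        term(k1, n1 * p) + term(n1 - k1, n1 * (1 - p))
        + term(k2, n2 * p) + term(n2 - k2, n2 * (1 - p))
    )


def label_component(component_words, other_words, top_n=3):
    """Return the words most characteristic of one component (hypothetical
    helper): those over-represented relative to the other components,
    ranked by decreasing G^2. Both word lists must be non-empty."""
    comp, rest = Counter(component_words), Counter(other_words)
    n1, n2 = sum(comp.values()), sum(rest.values())
    scored = []
    for word, k1 in comp.items():
        k2 = rest[word]
        if k1 / n1 > k2 / n2:  # keep only words that are more frequent here
            scored.append((log_likelihood_ratio(k1, n1, k2, n2), word))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in scored[:top_n]]
```

Calling `label_component` with the identifiers of one component and the identifiers of all others yields a short list of candidate labels. Generic words such as `get` and `set`, which are common everywhere, are filtered out because they are not over-represented in any single component.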
Abstract:
In order to bridge the “semantic gap”, a number of relevance feedback (RF) mechanisms have been applied to content-based image retrieval (CBIR). However, current RF techniques in most existing CBIR systems still lack satisfactory user interaction, although some work has been done to improve both the interaction and the search accuracy. In this paper, we propose a four-factor user interaction model and investigate its effects on CBIR by an empirical evaluation. Whilst the model was developed for our research purposes, we believe the model could be adapted to any content-based search system.
Abstract:
A new distance function for comparing arbitrary partitions is proposed. Clustering of image collections and image segmentation yield the objects to be matched. The proposed metric is intended to combine visual features with metadata analysis in order to bridge the semantic gap between low-level visual features and high-level human concepts.
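The abstract does not define its distance function. As a stand-in, the sketch below computes variation of information, a well-known metric on partitions of the same set of elements, to show what comparing two partitions (for example, two clusterings of one image collection) can look like. It is not the function proposed in the paper, and the function name is an assumption.

```python
import math
from collections import Counter


def variation_of_information(labels_a, labels_b) -> float:
    """Distance between two partitions of the same elements, each given as a
    list of cluster labels (element i belongs to labels_a[i] and labels_b[i]).
    Returns H(A|B) + H(B|A) in nats; 0 means the partitions are identical."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must cover the same elements")
    n = len(labels_a)
    if n == 0:
        return 0.0
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))  # sizes of cluster intersections
    distance = 0.0
    for (a, b), n_ab in joint.items():
        p_ab = n_ab / n
        distance -= p_ab * (
            math.log(p_ab / (count_a[a] / n)) + math.log(p_ab / (count_b[b] / n))
        )
    return distance
```

Because only the grouping matters, renaming the clusters leaves the distance unchanged: `[0, 0, 1, 1]` and `["x", "x", "y", "y"]` are at distance zero.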
Abstract:
The main challenges of multimedia data retrieval lie in the effective mapping between low-level features and high-level concepts, and in the individual users' subjective perceptions of multimedia content. The objectives of this dissertation are to develop an integrated multimedia indexing and retrieval framework with the aim to bridge the gap between semantic concepts and low-level features. To achieve this goal, a set of core techniques has been developed, including image segmentation, content-based image retrieval, object tracking, video indexing, and video event detection. These core techniques are integrated in a systematic way to enable the semantic search for images/videos, and can be tailored to solve the problems in other multimedia related domains. In image retrieval, two new methods of bridging the semantic gap are proposed: (1) For general content-based image retrieval, a stochastic mechanism is utilized to enable the long-term learning of high-level concepts from a set of training data, such as user access frequencies and access patterns of images. (2) In addition to whole-image retrieval, a novel multiple instance learning framework is proposed for object-based image retrieval, by which a user is allowed to more effectively search for images that contain multiple objects of interest. An enhanced image segmentation algorithm is developed to extract the object information from images. This segmentation algorithm is further used in video indexing and retrieval, by which a robust video shot/scene segmentation method is developed based on low-level visual feature comparison, object tracking, and audio analysis. Based on shot boundaries, a novel data mining framework is further proposed to detect events in soccer videos, while fully utilizing the multi-modality features and object information obtained through video shot/scene detection.
Another contribution of this dissertation is the potential of the above techniques to be tailored and applied to other multimedia applications. This is demonstrated by their utilization in traffic video surveillance applications. The enhanced image segmentation algorithm, coupled with an adaptive background learning algorithm, improves the performance of vehicle identification. A sophisticated object tracking algorithm is proposed to track individual vehicles, while the spatial and temporal relationships of vehicle objects are modeled by an abstract semantic model.
Abstract:
With the proliferation of multimedia data and ever-growing requests for multimedia applications, there is an increasing need for efficient and effective indexing, storage and retrieval of multimedia data, such as graphics, images, animation, video, audio and text. Due to the special characteristics of multimedia data, Multimedia Database Management Systems (MMDBMSs) have emerged and attracted great research attention in recent years. Though much research effort has been devoted to this area, it is still far from maturity and many open issues remain. This dissertation focuses on three of the essential challenges in developing an MMDBMS, namely the semantic gap, perception subjectivity and data organization, and proposes a systematic and integrated framework with a video database and an image database serving as the testbed. In particular, the framework addresses these challenges separately yet coherently from three main aspects of an MMDBMS: multimedia data representation, indexing and retrieval. In terms of multimedia data representation, the key to addressing the semantic gap issue is to intelligently and automatically model the mid-level representation and/or semi-semantic descriptors besides the extraction of the low-level media features. The data organization challenge is mainly addressed by the aspect of media indexing, where various levels of indexing are required to support the diverse query requirements. In particular, the focus of this study is to facilitate high-level video indexing by proposing a multimodal event mining framework associated with temporal knowledge discovery approaches. With respect to the perception subjectivity issue, advanced techniques are proposed to support users' interaction and to effectively model users' perception from the feedback at both the image level and the object level.
Abstract:
Software engineering researchers are challenged to provide increasingly more powerful levels of abstractions to address the rising complexity inherent in software solutions. One new development paradigm that places models as abstraction at the forefront of the development process is Model-Driven Software Development (MDSD). MDSD considers models as first class artifacts, extending the capability for engineers to use concepts from the problem domain of discourse to specify apropos solutions. A key component in MDSD is domain-specific modeling languages (DSMLs), which are languages with focused expressiveness, targeting a specific taxonomy of problems. The de facto approach used is to first transform DSML models to an intermediate artifact in a high-level language (HLL), e.g., Java or C++, then execute that resulting code. Our research group has developed a class of DSMLs, referred to as interpreted DSMLs (i-DSMLs), where models are directly interpreted by a specialized execution engine with semantics based on model changes at runtime. This execution engine uses a layered architecture and is referred to as a domain-specific virtual machine (DSVM). As the domain-specific model being executed descends the layers of the DSVM, the semantic gap between the user-defined model and the services being provided by the underlying infrastructure is closed. The focus of this research is the synthesis engine, the layer in the DSVM which transforms i-DSML models into executable scripts for the next lower layer to process. The appeal of an i-DSML is constrained as it possesses unique semantics contained within the DSVM. Existing DSVMs for i-DSMLs exhibit tight coupling between the implicit model of execution and the semantics of the domain, making it difficult to develop DSVMs for new i-DSMLs without a significant investment in resources. At the onset of this research, only one i-DSML had been created, for the user-centric communication domain, using the aforementioned approach.
This i-DSML is the Communication Modeling Language (CML) and its DSVM is the Communication Virtual Machine (CVM). A major problem with the CVM's synthesis engine is that the domain-specific knowledge (DSK) and the model of execution (MoE) are tightly interwoven; consequently, subsequent DSVMs would need to be developed from inception with no reuse of expertise. This dissertation investigates how to decouple the DSK from the MoE and subsequently produce a generic model of execution (GMoE) from the remaining application logic. This GMoE can be reused to instantiate synthesis engines for DSVMs in other domains. The generalized approach to developing the model synthesis component of i-DSML interpreters utilizes a reusable framework loosely coupled to DSK as swappable framework extensions. This approach involves first creating an i-DSML and its DSVM for a second domain, demand-side smart grid, or microgrid, energy management, and designing the synthesis engine so that the DSK and MoE are easily decoupled. To validate the utility of the approach, the synthesis engines (SEs) are instantiated using the GMoE and DSKs of the two aforementioned domains, and an empirical study to support our claim of reduced developmental effort is performed.
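The decoupling described in this abstract can be pictured with a small sketch: a generic engine that turns model changes into a script, with the domain-specific knowledge supplied as a swappable extension. Every class, method and command name below is hypothetical and chosen only for illustration; none of them comes from the CVM or the dissertation's actual framework.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class ModelChange:
    kind: str     # "add" or "remove"
    element: str  # name of the model element that changed


class DomainKnowledge(Protocol):
    """Hypothetical DSK extension point: maps one model change to commands."""

    def commands_for(self, change: ModelChange) -> list[str]: ...


class GenericSynthesisEngine:
    """Hypothetical generic model of execution: it knows how to diff two
    models and assemble a script, but nothing about any particular domain."""

    def __init__(self, dsk: DomainKnowledge) -> None:
        self._dsk = dsk

    def synthesize(self, old_model: set[str], new_model: set[str]) -> list[str]:
        changes = [ModelChange("remove", e) for e in sorted(old_model - new_model)]
        changes += [ModelChange("add", e) for e in sorted(new_model - old_model)]
        script: list[str] = []
        for change in changes:
            script.extend(self._dsk.commands_for(change))
        return script


class CommunicationDSK:
    """Illustrative domain knowledge for a communication domain."""

    def commands_for(self, change: ModelChange) -> list[str]:
        verb = "openConnection" if change.kind == "add" else "closeConnection"
        return [f"{verb}({change.element})"]


class MicrogridDSK:
    """Illustrative domain knowledge for a microgrid energy domain."""

    def commands_for(self, change: ModelChange) -> list[str]:
        verb = "powerOn" if change.kind == "add" else "powerOff"
        return [f"{verb}({change.element})"]
```

The same `GenericSynthesisEngine` serves both domains; only the object passed to its constructor changes. That is the kind of reuse the abstract claims for the GMoE, shown here in a deliberately simplified form.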
Abstract:
The advent of smart TVs has reshaped the TV-consumer interaction by combining TVs with mobile-like applications and access to the Internet. However, consumers are still unable to seamlessly interact with the contents being streamed. An example of such limitation is TV shopping, in which a consumer makes a purchase of a product or item displayed in the current TV show. Currently, consumers can only stop the current show and attempt to find a similar item on the Web or in an actual store. It would be more convenient if the consumer could interact with the TV to purchase interesting items. Towards the realization of TV shopping, this dissertation proposes a scalable multimedia content processing framework. Two main challenges in TV shopping are addressed: the efficient detection of products in the content stream, and the retrieval of similar products given a consumer-selected product. The proposed framework consists of three components. The first component performs computationally and temporally aware multimedia abstraction to select a reduced number of frames that summarize the important information in the video stream. By both reducing the number of frames and taking into account the computational cost of the subsequent detection phase, this component allows the efficient detection of products in the stream. The second component realizes the detection phase. It executes scalable product detection using multi-cue optimization. Additional information cues are formulated into an optimization problem that allows the detection of complex products, i.e., those that do not have a rigid form and can appear in various poses. After the second component identifies products in the video stream, the consumer can select an interesting one for which similar ones must be located in a product database. To this end, the third component of the framework consists of an efficient, multi-dimensional, tree-based indexing method for multimedia databases.
The proposed index mechanism serves as the backbone of the search. Moreover, it is able to efficiently address the semantic gap and perception subjectivity issues during the retrieval process to provide more relevant results.
Abstract:
Software engineering researchers are challenged to provide increasingly more powerful levels of abstractions to address the rising complexity inherent in software solutions. One new development paradigm that places models as abstraction at the forefront of the development process is Model-Driven Software Development (MDSD). MDSD considers models as first class artifacts, extending the capability for engineers to use concepts from the problem domain of discourse to specify apropos solutions. A key component in MDSD is domain-specific modeling languages (DSMLs), which are languages with focused expressiveness, targeting a specific taxonomy of problems. The de facto approach used is to first transform DSML models to an intermediate artifact in a high-level language (HLL), e.g., Java or C++, then execute that resulting code. Our research group has developed a class of DSMLs, referred to as interpreted DSMLs (i-DSMLs), where models are directly interpreted by a specialized execution engine with semantics based on model changes at runtime. This execution engine uses a layered architecture and is referred to as a domain-specific virtual machine (DSVM). As the domain-specific model being executed descends the layers of the DSVM, the semantic gap between the user-defined model and the services being provided by the underlying infrastructure is closed. The focus of this research is the synthesis engine, the layer in the DSVM which transforms i-DSML models into executable scripts for the next lower layer to process. The appeal of an i-DSML is constrained as it possesses unique semantics contained within the DSVM. Existing DSVMs for i-DSMLs exhibit tight coupling between the implicit model of execution and the semantics of the domain, making it difficult to develop DSVMs for new i-DSMLs without a significant investment in resources. At the onset of this research, only one i-DSML had been created, for the user-centric communication domain, using the aforementioned approach.
This i-DSML is the Communication Modeling Language (CML) and its DSVM is the Communication Virtual Machine (CVM). A major problem with the CVM’s synthesis engine is that the domain-specific knowledge (DSK) and the model of execution (MoE) are tightly interwoven; consequently, subsequent DSVMs would need to be developed from inception with no reuse of expertise. This dissertation investigates how to decouple the DSK from the MoE and subsequently produce a generic model of execution (GMoE) from the remaining application logic. This GMoE can be reused to instantiate synthesis engines for DSVMs in other domains. The generalized approach to developing the model synthesis component of i-DSML interpreters utilizes a reusable framework loosely coupled to DSK as swappable framework extensions. This approach involves first creating an i-DSML and its DSVM for a second domain, demand-side smart grid, or microgrid, energy management, and designing the synthesis engine so that the DSK and MoE are easily decoupled. To validate the utility of the approach, the synthesis engines (SEs) are instantiated using the GMoE and DSKs of the two aforementioned domains, and an empirical study to support our claim of reduced developmental effort is performed.
Abstract:
A strategy for document analysis is presented which uses Portable Document Format (PDF, the underlying file structure for Adobe Acrobat software) as its starting point. This strategy examines the appearance and geometric position of text and image blocks distributed over an entire document. A blackboard system is used to tag the blocks as a first stage in deducing the fundamental relationships existing between them. PDF is shown to be a useful intermediate stage in the bottom-up analysis of document structure. Its information on line spacing and font usage gives important clues in bridging the semantic gap between the scanned bitmap page and its fully analysed, block-structured form. Analysis of PDF can yield not only accurate page decomposition but also sufficient document information for the later stages of structural analysis and document understanding.
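To make the role of font and line-spacing clues concrete, here is a minimal heuristic sketch. It is not the blackboard system from the abstract: it assumes the text lines have already been extracted from a PDF page with their font size and vertical position, and the `TextLine` type, the tag names and both thresholds are illustrative assumptions.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class TextLine:
    text: str
    font_size: float  # in points
    top: float        # distance from the top of the page, in points


def tag_lines(lines, heading_ratio=1.2, gap_ratio=1.5):
    """Tag each line as 'heading', 'paragraph-start' or 'paragraph-body'
    using only font size and vertical spacing (hypothetical thresholds)."""
    if not lines:
        return []
    ordered = sorted(lines, key=lambda line: line.top)
    # The most frequent font size is taken to be the body text size.
    body_size = Counter(line.font_size for line in ordered).most_common(1)[0][0]
    # The median distance between consecutive lines approximates line spacing.
    gaps = sorted(b.top - a.top for a, b in zip(ordered, ordered[1:]))
    typical_gap = gaps[len(gaps) // 2] if gaps else 0.0

    tagged = []
    previous = None
    for line in ordered:
        if line.font_size >= heading_ratio * body_size:
            tag = "heading"
        elif (previous is None or tagged[-1][0] == "heading"
              or line.top - previous.top > gap_ratio * typical_gap):
            tag = "paragraph-start"
        else:
            tag = "paragraph-body"
        tagged.append((tag, line.text))
        previous = line
    return tagged
```

A larger-than-usual font marks a heading, and a larger-than-usual vertical gap marks the start of a new paragraph. Tags of this kind are the sort of block-level evidence a fuller analysis could then combine to recover document structure.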