841 resultados para Natural language techniques, Semantic spaces, Random projection, Documents


Relevância:

50.00% 50.00%

Publicador:

Resumo:

Starting from the premise that human communication is predicated on translational phenomena, this paper applies theoretical insights and practical findings from Translation Studies to a critique of Natural Semantic Metalanguage (NSM), a theory of semantic analysis developed by Anna Wierzbicka. Key tenets of NSM, i.e. (1) culture-specificity of complex concepts; (2) the existence of a small set of universal semantic primes; and (3) definition by reductive paraphrase, are discussed critically with reference to the notions of untranslatability, equivalence, and intra-lingual translation, respectively. It is argued that a broad spectrum of research and theoretical reflection in Translation Studies may successfully feed into the study of cognition, meaning, language, and communication. The interdisciplinary exchange between Translation Studies and linguistics may be properly balanced, with the former not only being informed by but also informing and interrogating the latter.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

A retrieval model describes the transformation of a query into a set of documents. The question is: what drives this transformation? For semantic information retrieval type of models this transformation is driven by the content and structure of the semantic models. In this case, Knowledge Organization Systems (KOSs) are the semantic models that encode the meaning employed for monolingual and cross-language retrieval. The focus of this research is the relationship between these meanings’ representations and their role and potential in augmenting existing retrieval models effectiveness. The proposed approach is unique in explicitly interpreting a semantic reference as a pointer to a concept in the semantic model that activates all its linked neighboring concepts. It is in fact the formalization of the information retrieval model and the integration of knowledge resources from the Linguistic Linked Open Data cloud that is distinctive from other approaches. The preprocessing of the semantic model using Formal Concept Analysis enables the extraction of conceptual spaces (formal contexts)that are based on sub-graphs from the original structure of the semantic model. The types of conceptual spaces built in this case are limited by the KOSs structural relations relevant to retrieval: exact match, broader, narrower, and related. They capture the definitional and relational aspects of the concepts in the semantic model. Also, each formal context is assigned an operational role in the flow of processes of the retrieval system enabling a clear path towards the implementations of monolingual and cross-lingual systems. By following this model’s theoretical description in constructing a retrieval system, evaluation results have shown statistically significant results in both monolingual and bilingual settings when no methods for query expansion were used. The test suite was run on the Cross-Language Evaluation Forum Domain Specific 2004-2006 collection with additional extensions to match the specifics of this model.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Information is nowadays a key resource: machine learning and data mining techniques have been developed to extract high-level information from great amounts of data. As most data comes in form of unstructured text in natural languages, research on text mining is currently very active and dealing with practical problems. Among these, text categorization deals with the automatic organization of large quantities of documents in priorly defined taxonomies of topic categories, possibly arranged in large hierarchies. In commonly proposed machine learning approaches, classifiers are automatically trained from pre-labeled documents: they can perform very accurate classification, but often require a consistent training set and notable computational effort. Methods for cross-domain text categorization have been proposed, allowing to leverage a set of labeled documents of one domain to classify those of another one. Most methods use advanced statistical techniques, usually involving tuning of parameters. A first contribution presented here is a method based on nearest centroid classification, where profiles of categories are generated from the known domain and then iteratively adapted to the unknown one. Despite being conceptually simple and having easily tuned parameters, this method achieves state-of-the-art accuracy in most benchmark datasets with fast running times. A second, deeper contribution involves the design of a domain-independent model to distinguish the degree and type of relatedness between arbitrary documents and topics, inferred from the different types of semantic relationships between respective representative words, identified by specific search algorithms. The application of this model is tested on both flat and hierarchical text categorization, where it potentially allows the efficient addition of new categories during classification. Results show that classification accuracy still requires improvements, but models generated from one domain are shown to be effectively able to be reused in a different one.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

This PhD thesis contributes to the problem of resource and service discovery in the context of the composable web. In the current web, mashup technologies allow developers reusing services and contents to build new web applications. However, developers face a problem of information flood when searching for appropriate services or resources for their combination. To contribute to overcoming this problem, a framework is defined for the discovery of services and resources. In this framework, three levels are defined for performing discovery at content, discovery and agente levels. The content level involves the information available in web resources. The web follows the Representational Stateless Transfer (REST) architectural style, in which resources are returned as representations from servers to clients. These representations usually employ the HyperText Markup Language (HTML), which, along with Content Style Sheets (CSS), describes the markup employed to render representations in a web browser. Although the use of SemanticWeb standards such as Resource Description Framework (RDF) make this architecture suitable for automatic processes to use the information present in web resources, these standards are too often not employed, so automation must rely on processing HTML. This process, often referred as Screen Scraping in the literature, is the content discovery according to the proposed framework. At this level, discovery rules indicate how the different pieces of data in resources’ representations are mapped onto semantic entities. By processing discovery rules on web resources, semantically described contents can be obtained out of them. The service level involves the operations that can be performed on the web. The current web allows users to perform different tasks such as search, blogging, e-commerce, or social networking. To describe the possible services in RESTful architectures, a high-level feature-oriented service methodology is proposed at this level. This lightweight description framework allows defining service discovery rules to identify operations in interactions with REST resources. The discovery is thus performed by applying discovery rules to contents discovered in REST interactions, in a novel process called service probing. Also, service discovery can be performed by modelling services as contents, i.e., by retrieving Application Programming Interface (API) documentation and API listings in service registries such as ProgrammableWeb. For this, a unified model for composable components in Mashup-Driven Development (MDD) has been defined after the analysis of service repositories from the web. The agent level involves the orchestration of the discovery of services and contents. At this level, agent rules allow to specify behaviours for crawling and executing services, which results in the fulfilment of a high-level goal. Agent rules are plans that allow introspecting the discovered data and services from the web and the knowledge present in service and content discovery rules to anticipate the contents and services to be found on specific resources from the web. By the definition of plans, an agent can be configured to target specific resources. The discovery framework has been evaluated on different scenarios, each one covering different levels of the framework. Contenidos a la Carta project deals with the mashing-up of news from electronic newspapers, and the framework was used for the discovery and extraction of pieces of news from the web. Similarly, in Resulta and VulneraNET projects the discovery of ideas and security knowledge in the web is covered, respectively. The service level is covered in the OMELETTE project, where mashup components such as services and widgets are discovered from component repositories from the web. The agent level is applied to the crawling of services and news in these scenarios, highlighting how the semantic description of rules and extracted data can provide complex behaviours and orchestrations of tasks in the web. The main contributions of the thesis are the unified framework for discovery, which allows configuring agents to perform automated tasks. Also, a scraping ontology has been defined for the construction of mappings for scraping web resources. A novel first-order logic rule induction algorithm is defined for the automated construction and maintenance of these mappings out of the visual information in web resources. Additionally, a common unified model for the discovery of services is defined, which allows sharing service descriptions. Future work comprises the further extension of service probing, resource ranking, the extension of the Scraping Ontology, extensions of the agent model, and contructing a base of discovery rules. Resumen La presente tesis doctoral contribuye al problema de descubrimiento de servicios y recursos en el contexto de la web combinable. En la web actual, las tecnologías de combinación de aplicaciones permiten a los desarrolladores reutilizar servicios y contenidos para construir nuevas aplicaciones web. Pese a todo, los desarrolladores afrontan un problema de saturación de información a la hora de buscar servicios o recursos apropiados para su combinación. Para contribuir a la solución de este problema, se propone un marco de trabajo para el descubrimiento de servicios y recursos. En este marco, se definen tres capas sobre las que se realiza descubrimiento a nivel de contenido, servicio y agente. El nivel de contenido involucra a la información disponible en recursos web. La web sigue el estilo arquitectónico Representational Stateless Transfer (REST), en el que los recursos son devueltos como representaciones por parte de los servidores a los clientes. Estas representaciones normalmente emplean el lenguaje de marcado HyperText Markup Language (HTML), que, unido al estándar Content Style Sheets (CSS), describe el marcado empleado para mostrar representaciones en un navegador web. Aunque el uso de estándares de la web semántica como Resource Description Framework (RDF) hace apta esta arquitectura para su uso por procesos automatizados, estos estándares no son empleados en muchas ocasiones, por lo que cualquier automatización debe basarse en el procesado del marcado HTML. Este proceso, normalmente conocido como Screen Scraping en la literatura, es el descubrimiento de contenidos en el marco de trabajo propuesto. En este nivel, un conjunto de reglas de descubrimiento indican cómo los diferentes datos en las representaciones de recursos se corresponden con entidades semánticas. Al procesar estas reglas sobre recursos web, pueden obtenerse contenidos descritos semánticamente. El nivel de servicio involucra las operaciones que pueden ser llevadas a cabo en la web. Actualmente, los usuarios de la web pueden realizar diversas tareas como búsqueda, blogging, comercio electrónico o redes sociales. Para describir los posibles servicios en arquitecturas REST, se propone en este nivel una metodología de alto nivel para descubrimiento de servicios orientada a funcionalidades. Este marco de descubrimiento ligero permite definir reglas de descubrimiento de servicios para identificar operaciones en interacciones con recursos REST. Este descubrimiento es por tanto llevado a cabo al aplicar las reglas de descubrimiento sobre contenidos descubiertos en interacciones REST, en un nuevo procedimiento llamado sondeo de servicios. Además, el descubrimiento de servicios puede ser llevado a cabo mediante el modelado de servicios como contenidos. Es decir, mediante la recuperación de documentación de Application Programming Interfaces (APIs) y listas de APIs en registros de servicios como ProgrammableWeb. Para ello, se ha definido un modelo unificado de componentes combinables para Mashup-Driven Development (MDD) tras el análisis de repositorios de servicios de la web. El nivel de agente involucra la orquestación del descubrimiento de servicios y contenidos. En este nivel, las reglas de nivel de agente permiten especificar comportamientos para el rastreo y ejecución de servicios, lo que permite la consecución de metas de mayor nivel. Las reglas de los agentes son planes que permiten la introspección sobre los datos y servicios descubiertos, así como sobre el conocimiento presente en las reglas de descubrimiento de servicios y contenidos para anticipar contenidos y servicios por encontrar en recursos específicos de la web. Mediante la definición de planes, un agente puede ser configurado para descubrir recursos específicos. El marco de descubrimiento ha sido evaluado sobre diferentes escenarios, cada uno cubriendo distintos niveles del marco. El proyecto Contenidos a la Carta trata de la combinación de noticias de periódicos digitales, y en él el framework se ha empleado para el descubrimiento y extracción de noticias de la web. De manera análoga, en los proyectos Resulta y VulneraNET se ha llevado a cabo un descubrimiento de ideas y de conocimientos de seguridad, respectivamente. El nivel de servicio se cubre en el proyecto OMELETTE, en el que componentes combinables como servicios y widgets se descubren en repositorios de componentes de la web. El nivel de agente se aplica al rastreo de servicios y noticias en estos escenarios, mostrando cómo la descripción semántica de reglas y datos extraídos permiten proporcionar comportamientos complejos y orquestaciones de tareas en la web. Las principales contribuciones de la tesis son el marco de trabajo unificado para descubrimiento, que permite configurar agentes para realizar tareas automatizadas. Además, una ontología de extracción ha sido definida para la construcción de correspondencias y extraer información de recursos web. Asimismo, un algoritmo para la inducción de reglas de lógica de primer orden se ha definido para la construcción y el mantenimiento de estas correspondencias a partir de la información visual de recursos web. Adicionalmente, se ha definido un modelo común y unificado para el descubrimiento de servicios que permite la compartición de descripciones de servicios. Como trabajos futuros se considera la extensión del sondeo de servicios, clasificación de recursos, extensión de la ontología de extracción y la construcción de una base de reglas de descubrimiento.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

This paper shows the influence of the semantic content of urban sounds in the subjective evaluation of outer spaces. The study is based on the analysis conducted in three neighboring and integrated urban spaces with a different form of social ownership in the city of Cordoba, Argentina. It shows that the type of sound source present at each site influence, by its semantic content, in the user´s identification and permanence in the place. The noise present in a soundscape is able to have a high semantic content, and therefore the sound has a particular meaning for the perceiver. Every particular social group influences the production of their own sounds and how they perceive them. This allows to consider the sound as one of the factors that define the sense of "place" or "no place" of a certain urban space. Evidently the sounds, and their ability to evoke and characterize the environment, cannot be ignored in the construction and recovery of anthropological sites. This urban culture is unique and specific to every society. Thepublic spaces, with their soundscape, are part of the construction of the urban identity of a city. It is shown that for identical general sound levels present in each of the spaces, the level of annoyance or discomfort, in relation to the subjective acoustic quality, is different. This is the result of the influence of semantic content of the sounds present in each urban space. Coinciding with other similar research, the level of discomfort or annoyance decreases as the presence of natural sounds such as water, the wind in the trees or the birds singing increases, even when the objective values of noise level of natural sounds are higher.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Enterprise Application Integration (EAI) is a challenging area that is attracting growing attention from the software industry and the research community. A landscape of languages and techniques for EAI has emerged and is continuously being enriched with new proposals from different software vendors and coalitions. However, little or no effort has been dedicated to systematically evaluate and compare these languages and techniques. The work reported in this paper is a first step in this direction. It presents an in-depth analysis of a language, namely the Business Modeling Language, specifically developed for EAI. The framework used for this analysis is based on a number of workflow and communication patterns. This framework provides a basis for evaluating the advantages and drawbacks of EAI languages with respect to recurrent problems and situations.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

With the advent of Service Oriented Architecture, Web Services have gained tremendous popularity. Due to the availability of a large number of Web services, finding an appropriate Web service according to the requirement of the user is a challenge. This warrants the need to establish an effective and reliable process of Web service discovery. A considerable body of research has emerged to develop methods to improve the accuracy of Web service discovery to match the best service. The process of Web service discovery results in suggesting many individual services that partially fulfil the user’s interest. By considering the semantic relationships of words used in describing the services as well as the use of input and output parameters can lead to accurate Web service discovery. Appropriate linking of individual matched services should fully satisfy the requirements which the user is looking for. This research proposes to integrate a semantic model and a data mining technique to enhance the accuracy of Web service discovery. A novel three-phase Web service discovery methodology has been proposed. The first phase performs match-making to find semantically similar Web services for a user query. In order to perform semantic analysis on the content present in the Web service description language document, the support-based latent semantic kernel is constructed using an innovative concept of binning and merging on the large quantity of text documents covering diverse areas of domain of knowledge. The use of a generic latent semantic kernel constructed with a large number of terms helps to find the hidden meaning of the query terms which otherwise could not be found. Sometimes a single Web service is unable to fully satisfy the requirement of the user. In such cases, a composition of multiple inter-related Web services is presented to the user. The task of checking the possibility of linking multiple Web services is done in the second phase. Once the feasibility of linking Web services is checked, the objective is to provide the user with the best composition of Web services. In the link analysis phase, the Web services are modelled as nodes of a graph and an allpair shortest-path algorithm is applied to find the optimum path at the minimum cost for traversal. The third phase which is the system integration, integrates the results from the preceding two phases by using an original fusion algorithm in the fusion engine. Finally, the recommendation engine which is an integral part of the system integration phase makes the final recommendations including individual and composite Web services to the user. In order to evaluate the performance of the proposed method, extensive experimentation has been performed. Results of the proposed support-based semantic kernel method of Web service discovery are compared with the results of the standard keyword-based information-retrieval method and a clustering-based machine-learning method of Web service discovery. The proposed method outperforms both information-retrieval and machine-learning based methods. Experimental results and statistical analysis also show that the best Web services compositions are obtained by considering 10 to 15 Web services that are found in phase-I for linking. Empirical results also ascertain that the fusion engine boosts the accuracy of Web service discovery by combining the inputs from both the semantic analysis (phase-I) and the link analysis (phase-II) in a systematic fashion. Overall, the accuracy of Web service discovery with the proposed method shows a significant improvement over traditional discovery methods.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Cultural objects are increasingly generated and stored in digital form, yet effective methods for their indexing and retrieval still remain an important area of research. The main problem arises from the disconnection between the content-based indexing approach used by computer scientists and the description-based approach used by information scientists. There is also a lack of representational schemes that allow the alignment of the semantics and context with keywords and low-level features that can be automatically extracted from the content of these cultural objects. This paper presents an integrated approach to address these problems, taking advantage of both computer science and information science approaches. We firstly discuss the requirements from a number of perspectives: users, content providers, content managers and technical systems. We then present an overview of our system architecture and describe various techniques which underlie the major components of the system. These include: automatic object category detection; user-driven tagging; metadata transform and augmentation, and an expression language for digital cultural objects. In addition, we discuss our experience on testing and evaluating some existing collections, analyse the difficulties encountered and propose ways to address these problems.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Business practices vary from one company to another and business practices often need to be changed due to changes of business environments. To satisfy different business practices, enterprise systems need to be customized. To keep up with ongoing business practice changes, enterprise systems need to be adapted. Because of rigidity and complexity, the customization and adaption of enterprise systems often takes excessive time with potential failures and budget shortfall. Moreover, enterprise systems often drag business behind because they cannot be rapidly adapted to support business practice changes. Extensive literature has addressed this issue by identifying success or failure factors, implementation approaches, and project management strategies. Those efforts were aimed at learning lessons from post implementation experiences to help future projects. This research looks into this issue from a different angle. It attempts to address this issue by delivering a systematic method for developing flexible enterprise systems which can be easily tailored for different business practices or rapidly adapted when business practices change. First, this research examines the role of system models in the context of enterprise system development; and the relationship of system models with software programs in the contexts of computer aided software engineering (CASE), model driven architecture (MDA) and workflow management system (WfMS). Then, by applying the analogical reasoning method, this research initiates a concept of model driven enterprise systems. The novelty of model driven enterprise systems is that it extracts system models from software programs and makes system models able to stay independent of software programs. In the paradigm of model driven enterprise systems, system models act as instructors to guide and control the behavior of software programs. Software programs function by interpreting instructions in system models. This mechanism exposes the opportunity to tailor such a system by changing system models. To make this true, system models should be represented in a language which can be easily understood by human beings and can also be effectively interpreted by computers. In this research, various semantic representations are investigated to support model driven enterprise systems. The significance of this research is 1) the transplantation of the successful structure for flexibility in modern machines and WfMS to enterprise systems; and 2) the advancement of MDA by extending the role of system models from guiding system development to controlling system behaviors. This research contributes to the area relevant to enterprise systems from three perspectives: 1) a new paradigm of enterprise systems, in which enterprise systems consist of two essential elements: system models and software programs. These two elements are loosely coupled and can exist independently; 2) semantic representations, which can effectively represent business entities, entity relationships, business logic and information processing logic in a semantic manner. Semantic representations are the key enabling techniques of model driven enterprise systems; and 3) a brand new role of system models; traditionally the role of system models is to guide developers to write system source code. This research promotes the role of system models to control the behaviors of enterprise.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In this study, a discussion of the fluid dynamics in the attic space is reported, focusing on its transient response to sudden and linear changes of temperature along the two inclined walls. The transient behaviour of an attic space is relevant to our daily life. The instantaneous and non-instantaneous (ramp) heating boundary condition is applied on the sloping walls of the attic space. A theoretical understanding of the transient behaviour of the flow in the enclosure is performed through scaling analysis. A proper identification of the timescales, the velocity and the thickness relevant to the flow that develops inside the cavity makes it possible to predict theoretically the basic flow features that will survive once the thermal flow in the enclosure reaches a steady state. A time scale for the heating-up of the whole cavity together with the heat transfer scales through the inclined walls has also been obtained through scaling analysis. All scales are verified by the numerical simulations.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In this paper we describe a body of work aimed at extending the reach of mobile navigation and mapping. We describe how running topological and metric mapping and pose estimation processes concurrently, using vision and laser ranging, has produced a full six-degree-of-freedom outdoor navigation system. It is capable of producing intricate three-dimensional maps over many kilometers and in real time. We consider issues concerning the intrinsic quality of the built maps and describe our progress towards adding semantic labels to maps via scene de-construction and labeling. We show how our choices of representation, inference methods and use of both topological and metric techniques naturally allow us to fuse maps built from multiple sessions with no need for manual frame alignment or data association.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Two decades after its inception, Latent Semantic Analysis(LSA) has become part and parcel of every modern introduction to Information Retrieval. For any tool that matures so quickly, it is important to check its lore and limitations, or else stagnation will set in. We focus here on the three main aspects of LSA that are well accepted, and the gist of which can be summarized as follows: (1) that LSA recovers latent semantic factors underlying the document space, (2) that such can be accomplished through lossy compression of the document space by eliminating lexical noise, and (3) that the latter can best be achieved by Singular Value Decomposition. For each aspect we performed experiments analogous to those reported in the LSA literature and compared the evidence brought to bear in each case. On the negative side, we show that the above claims about LSA are much more limited than commonly believed. Even a simple example may show that LSA does not recover the optimal semantic factors as intended in the pedagogical example used in many LSA publications. Additionally, and remarkably deviating from LSA lore, LSA does not scale up well: the larger the document space, the more unlikely that LSA recovers an optimal set of semantic factors. On the positive side, we describe new algorithms to replace LSA (and more recent alternatives as pLSA, LDA, and kernel methods) by trading its l2 space for an l1 space, thereby guaranteeing an optimal set of semantic factors. These algorithms seem to salvage the spirit of LSA as we think it was initially conceived.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This paper demonstrates an experimental study that examines the accuracy of various information retrieval techniques for Web service discovery. The main goal of this research is to evaluate algorithms for semantic web service discovery. The evaluation is comprehensively benchmarked using more than 1,700 real-world WSDL documents from INEX 2010 Web Service Discovery Track dataset. For automatic search, we successfully use Latent Semantic Analysis and BM25 to perform Web service discovery. Moreover, we provide linking analysis which automatically links possible atomic Web services to meet the complex requirements of users. Our fusion engine recommends a final result to users. Our experiments show that linking analysis can improve the overall performance of Web service discovery. We also find that keyword-based search can quickly return results but it has limitation of understanding users’ goals.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Heat transfer through an attic space into or out of buildings is an important issue for attic-shaped houses in both hot and cold climates. One of the important objectives for design and construction of houses is to provide thermal comfort for occupants. In the present energy-conscious society, it is also a requirement for houses to be energy efficient, i.e. the energy consumption for heating or air-conditioning houses must be minimized. Relevant to these objectives, research into heat transfer in attics has been conducted for about three decades. The transient behaviour of an attic space is directly relevant to our daily life. Certain periods of the day or night may be considered as having a constant ambient temperature (e.g. during 11am - 2pm or 11pm - 2am). However, at other times during the day or night the ambient temperature changes with time (e.g. between 5am - 9am or 5pm - 9pm). Therefore, the analysis of steady state solution is not sufficient to describe the fluid flow and heat transfer in the attic space. The discussion of the transient development of the boundary is required. A theoretical understanding of the transient behaviour of the flow in the enclosure is performed through scaling analysis for sudden and ramp heating conditions. A proper identification of the timescales, the velocity and the thickness relevant to the flow that develops inside the cavity makes it possible to predict theoretically the basic flow features that will survive once the thermal flow in the enclosure reaches a steady state. Those scaling predictions have been verified by a series of numerical simulations.