940 results for Semantic Analysis
Abstract:
With the advent of Service Oriented Architecture, Web Services have gained tremendous popularity. Due to the availability of a large number of Web services, finding an appropriate Web service according to the requirement of the user is a challenge. This warrants the need to establish an effective and reliable process of Web service discovery. A considerable body of research has emerged to develop methods to improve the accuracy of Web service discovery to match the best service. The process of Web service discovery results in suggesting many individual services that partially fulfil the user’s interest. Considering the semantic relationships of the words used to describe the services, together with their input and output parameters, can lead to accurate Web service discovery. Appropriate linking of individual matched services should fully satisfy the requirements which the user is looking for. This research proposes to integrate a semantic model and a data mining technique to enhance the accuracy of Web service discovery. A novel three-phase Web service discovery methodology has been proposed. The first phase performs match-making to find semantically similar Web services for a user query. In order to perform semantic analysis on the content present in the Web service description language document, the support-based latent semantic kernel is constructed using an innovative concept of binning and merging on a large quantity of text documents covering diverse domains of knowledge. The use of a generic latent semantic kernel constructed with a large number of terms helps to find the hidden meaning of the query terms which otherwise could not be found. Sometimes a single Web service is unable to fully satisfy the requirement of the user. In such cases, a composition of multiple inter-related Web services is presented to the user. The task of checking the possibility of linking multiple Web services is done in the second phase.
Once the feasibility of linking Web services is checked, the objective is to provide the user with the best composition of Web services. In the link analysis phase, the Web services are modelled as nodes of a graph and an all-pairs shortest-path algorithm is applied to find the optimum path at the minimum cost for traversal. The third phase, system integration, integrates the results from the preceding two phases by using an original fusion algorithm in the fusion engine. Finally, the recommendation engine, which is an integral part of the system integration phase, makes the final recommendations, including individual and composite Web services, to the user. In order to evaluate the performance of the proposed method, extensive experimentation has been performed. Results of the proposed support-based semantic kernel method of Web service discovery are compared with the results of the standard keyword-based information-retrieval method and a clustering-based machine-learning method of Web service discovery. The proposed method outperforms both information-retrieval and machine-learning based methods. Experimental results and statistical analysis also show that the best Web service compositions are obtained by considering 10 to 15 Web services that are found in phase-I for linking. Empirical results also ascertain that the fusion engine boosts the accuracy of Web service discovery by combining the inputs from both the semantic analysis (phase-I) and the link analysis (phase-II) in a systematic fashion. Overall, the accuracy of Web service discovery with the proposed method shows a significant improvement over traditional discovery methods.
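The link analysis phase described above can be illustrated with a small sketch. The abstract does not say which all-pairs shortest-path algorithm is used, so Floyd-Warshall is assumed here; the service names and linking costs are hypothetical and serve only as an example.

```python
from math import inf


def all_pairs_shortest_paths(nodes, edges):
    """Floyd-Warshall over a directed graph with non-negative edge costs.

    nodes: list of service names (hypothetical examples below).
    edges: dict mapping (source, target) -> cost of linking the two services.
    Returns (dist, nxt): dist[u][v] is the cheapest cost from u to v, and
    nxt[u][v] is the first hop on that cheapest path (None if unreachable).
    """
    dist = {u: {v: (0 if u == v else inf) for v in nodes} for u in nodes}
    nxt = {u: {v: None for v in nodes} for u in nodes}
    for (u, v), cost in edges.items():
        if cost < dist[u][v]:
            dist[u][v] = cost
            nxt[u][v] = v
    # Allow each node k in turn to act as an intermediate service.
    for k in nodes:
        for i in nodes:
            for j in nodes:
                through_k = dist[i][k] + dist[k][j]
                if through_k < dist[i][j]:
                    dist[i][j] = through_k
                    nxt[i][j] = nxt[i][k]
    return dist, nxt


def rebuild_path(nxt, source, target):
    """Return the cheapest chain of services, or [] if none exists."""
    if source == target:
        return [source]
    if nxt[source][target] is None:
        return []
    path = [source]
    while source != target:
        source = nxt[source][target]
        path.append(source)
    return path


# Hypothetical services and linking costs, for illustration only.
services = ["GeoCode", "Weather", "Currency", "TripPlanner"]
link_costs = {
    ("GeoCode", "Weather"): 1.0,
    ("Weather", "TripPlanner"): 2.0,
    ("GeoCode", "TripPlanner"): 5.0,
    ("Currency", "TripPlanner"): 1.5,
}
dist, nxt = all_pairs_shortest_paths(services, link_costs)
best = rebuild_path(nxt, "GeoCode", "TripPlanner")
```

In this toy graph the composition through `Weather` is cheaper than the direct link, so it is the one a recommender built on this sketch would return.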
Abstract:
Unstructured text data, such as emails, blogs, contracts, academic publications, organizational documents, transcribed interviews, and even tweets, are important sources of data in Information Systems research. Various forms of qualitative analysis of the content of these data exist and have revealed important insights. Yet, to date, these analyses have been hampered by limitations of human coding of large data sets, and by bias due to human interpretation. In this paper, we compare and combine two quantitative analysis techniques to demonstrate the capabilities of computational analysis for content analysis of unstructured text. Specifically, we seek to demonstrate how two quantitative analytic methods, viz., Latent Semantic Analysis and data mining, can aid researchers in revealing core content topic areas in large (or small) data sets, and in visualizing how these concepts evolve, migrate, converge or diverge over time. We exemplify the complementary application of these techniques through an examination of a 25-year sample of abstracts from selected journals in Information Systems, Management, and Accounting disciplines. Through this work, we explore the capabilities of two computational techniques, and show how these techniques can be used to gather insights from a large corpus of unstructured text.
Abstract:
Broadcast soccer video is usually recorded by one main camera, which constantly follows the part of the playfield where a highlight event is happening. The camera parameters and their variation are therefore closely related to the semantic information of soccer video, and camera calibration for soccer video has attracted much interest. Previous calibration methods either deal only with goal scenes or have strict calibration conditions and high complexity, so they do not properly handle non-goal scenes such as midfield or center-forward scenes. In this paper, based on a new soccer field model, a field symbol extraction algorithm is proposed to extract the calibration information. A two-stage calibration approach is then developed that can calibrate the camera not only for goal scenes but also for non-goal scenes. Preliminary experimental results demonstrate its robustness and accuracy. (c) 2010 Elsevier B.V. All rights reserved.
Turning the tide: A critique of Natural Semantic Metalanguage from a translation studies perspective
Abstract:
Starting from the premise that human communication is predicated on translational phenomena, this paper applies theoretical insights and practical findings from Translation Studies to a critique of Natural Semantic Metalanguage (NSM), a theory of semantic analysis developed by Anna Wierzbicka. Key tenets of NSM, i.e. (1) culture-specificity of complex concepts; (2) the existence of a small set of universal semantic primes; and (3) definition by reductive paraphrase, are discussed critically with reference to the notions of untranslatability, equivalence, and intra-lingual translation, respectively. It is argued that a broad spectrum of research and theoretical reflection in Translation Studies may successfully feed into the study of cognition, meaning, language, and communication. The interdisciplinary exchange between Translation Studies and linguistics may be properly balanced, with the former not only being informed by but also informing and interrogating the latter.
Abstract:
Semiotics is the study of signs. Application of semiotics in information systems design is based on the notion that information systems are organizations within which agents deploy signs in the form of actions according to a set of norms. An analysis of the relationships among the agents, their actions and the norms would give a better specification of the system. A distributed multimedia system (DMMS) can be viewed as a system consisting of many dynamic, self-controlled normative agents engaging in complex interaction and processing of multimedia information. This paper reports the work of applying the semiotic approach to the design and modeling of DMMS, with emphasis on using semantic analysis under the semiotic framework. A semantic model of DMMS describing various components and their ontological dependencies is presented, which then serves as a design model and is implemented in a semantic database. Benefits of using the semantic database are discussed with reference to various design scenarios.
Abstract:
Grigorij Kreidlin (Russia). A Comparative Study of Two Semantic Systems: Body Russian and Russian Phraseology. Mr. Kreidlin teaches in the Department of Theoretical and Applied Linguistics of the State University of Humanities in Moscow and worked on this project from August 1996 to July 1998. The classical approach to non-verbal and verbal oral communication is based on a traditional separation of body and mind. Linguists studied words and phrasemes, the products of mind activities, while gestures, facial expressions, postures and other forms of body language were left to anthropologists, psychologists, physiologists, and indeed to anyone but linguists. Only recently have linguists begun to turn their attention to gestures, and semiotic and cognitive paradigms are now appearing that raise the question of designing an integral model for the unified description of non-verbal and verbal communicative behaviour. This project attempted to elaborate lexical and semantic fragments of such a model, producing a co-ordinated semantic description of the main Russian gestures (including gestures proper, postures and facial expressions) and their natural language analogues. The concept of emblematic gestures and gestural phrasemes and of their semantic links permitted an appropriate description of the transformation of a body as a purely physical substance into a body as a carrier of essential attributes of Russian culture - the semiotic process called the culturalisation of the human body. Here the human body embodies a system of cultural values and displays them in a text within the area of phraseology and some other important language domains. The goal of this research was to develop a theory that would account for the fundamental peculiarities of the process. The model proposed is based on the unified lexicographic representation of verbal and non-verbal units in the Dictionary of Russian Gestures, which Mr. Kreidlin had earlier compiled in collaboration with a group of his students. The Dictionary was originally oriented only towards reflecting how the lexical competence of Russian body language is represented in the Russian mind. Now a special type of phraseological zone has been designed to reflect explicitly semantic relationships between the gestures in the entries and phrasemes and to provide the necessary information for a detailed description of these. All the definitions, rules of usage and the established correlations are written in a semantic meta-language. Several classes of Russian gestural phrasemes were identified, including those phrasemes and idioms with semantic definitions close to those of the corresponding gestures, those phraseological units that have lost touch with the related gestures (although etymologically they are derived from gestures that have gone out of use), and phrasemes and idioms which have semantic traces or reflexes inherited from the meaning of the related gestures. The basic assumptions and practical considerations underlying the work were as follows. (1) To compare meanings one has to be able to state them. To state the meaning of a gesture or a phraseological expression, one needs a formal semantic meta-language of propositional character that represents the cognitive and mental aspects of the codes. (2) The semantic contrastive analysis of any semiotic codes used in person-to-person communication also requires a single semantic meta-language, i.e. a formal semantic language of description. This language must be as linguistically and culturally independent as possible and yet must be open to interpretation through any culture and code. Another possible method of conducting comparative verbal-non-verbal semantic research is to work with different semantic meta-languages and semantic nets and to learn how to combine them, translate from one to another, etc.
in order to reach a common basis for the subsequent comparison of units. (3) The practical work in defining phraseological units and organising the phraseological zone in the Dictionary of Russian Gestures unexpectedly showed that semantic links between gestures and gestural phrasemes are reflected not only in common semantic elements and syntactic structure of semantic propositions, but also in general and partial cognitive operations that are made over semantic definitions. (4) In comparative semantic analysis one should take into account different values and roles of inner form and image components in the semantic representation of non-verbal and verbal units. (5) For the most part, gestural phrasemes are direct semantic derivatives of gestures. The cognitive and formal techniques can be regarded as typological features for the future functional-semantic classification of gestural phrasemes: two phrasemes whose meaning can be obtained by the same cognitive or purely syntactic operations (or types of operations) over the meanings of the corresponding gestures, belong by definition to one and the same class. The nature of many cognitive operations has not been studied well so far, but the first steps towards its comprehension and description have been taken. The research identified 25 logically possible classes of relationships between a gesture and a gestural phraseme. The calculation is based on theoretically possible formal (set-theory) correlations between signifiers and signified of the non-verbal and verbal units. However, in order to examine which of them are realised in practice a complete semantic and lexicographic description of all (not only central) everyday emblems and gestural phrasemes is required and this unfortunately does not yet exist. Mr. Kreidlin suggests that the results of the comparative analysis of verbal and non-verbal units could also be used in other research areas such as the lexicography of emotions.
Abstract:
A semantic approach towards political conflict first emerged in the 1930s and provides the methodological foundations for the description of political conflicts, in particular as the correlation between the language of description and reality. Any military or political confrontation presupposes axiological, conceptual and ideological confrontation. The form of adequate description can only be comprehended if the characteristic features of its language (structure) and thesaurus are revealed. Admitting the possibility of different descriptions implies the necessity of analysing this possible ambiguity, i.e. the characteristic features of the language which enable us to form various statements, including mutually exclusive ones. The insoluble task of finding a middle ground between the viewpoints of the conflicting parties should be replaced by soluble procedures of explaining and assessing the conflicting axiologies. For the description of conflict situations, when it is essential to represent various positions within a uniform system, an apparatus of model semantics seems to be the most appropriate one both for generating alternatives and for bringing them together in a modal system of a world in which procedures of transition from one world to another (i.e. the transworld compatibility between them) are also reflected. Reality is reconstructed not as a sort of middle ground between the mutually exclusive approaches nor as their sum, but as a result of the overlapping of various worlds and the procedures of transition from one state of affairs to another. The description of a conflict is therefore seen as a system of worlds connected by modal relations, with a system of worlds emerging as a reality to be described. This approach makes it possible to describe the processes from the points of view of the participating parties and, at the same time, to reveal their basic attitudes. 
The main idea of this research is shown by the problems analysed: the description of conflict as methodology; language and behaviour (general problems of semiotic description), the logico-semantic analysis of the notions of "problem and conflict", "Genesis and Chronology", "the recurrent model of the (historical) explanation and interpretation of the conflict". Zolyan used data on the Karabagh conflict to demonstrate the dependence of the structure of semio-cultural codes on current political development and considered post-soviet history as a semio-cultural problem. He sought to consider and reveal the logic of manipulations with history, and proposed the logic of preferences as a possible instrument for achieving compromise.
Abstract:
OBJECTIVE: To characterize PubMed usage over a typical day and compare it to previous studies of user behavior on Web search engines. DESIGN: We performed a lexical and semantic analysis of 2,689,166 queries issued on PubMed over 24 consecutive hours on a typical day. MEASUREMENTS: We measured the number of queries, number of distinct users, queries per user, terms per query, common terms, Boolean operator use, common phrases, result set size, MeSH categories, used semantic measurements to group queries into sessions, and studied the addition and removal of terms from consecutive queries to gauge search strategies. RESULTS: The size of the result sets from a sample of queries showed a bimodal distribution, with peaks at approximately 3 and 100 results, suggesting that a large group of queries was tightly focused and another was broad. Like Web search engine sessions, most PubMed sessions consisted of a single query. However, PubMed queries contained more terms. CONCLUSION: PubMed's usage profile should be considered when educating users, building user interfaces, and developing future biomedical information retrieval systems.
Abstract:
Sensor network deployments have become a primary source of big data about the real world that surrounds us, measuring a wide range of physical properties in real time. With such large amounts of heterogeneous data, a key challenge is to describe and annotate sensor data with high-level metadata, using and extending models, for instance with ontologies. However, to automate this task there is a need for enriching the sensor metadata using the actual observed measurements and extracting useful meta-information from them. This paper proposes a novel approach to the characterization and extraction of semantic metadata through the analysis of raw sensor observations. The approach consists of using approximations, based on distributions of the observation slopes, to represent the raw sensor measurements; building a classification scheme to automatically infer sensor metadata such as the type of observed property; and integrating the semantic analysis results with existing sensor network metadata.
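As a rough illustration of characterising a sensor by the distribution of its observation slopes, the sketch below bins the slope angle between consecutive observations into a normalised histogram and assigns the label of the closest reference histogram. The binning by angle, the L1 distance, the nearest-reference classifier, and the reference labels are all assumptions made for this example; the abstract does not specify them.

```python
from math import atan, pi


def slope_histogram(timestamps, values, bins=6):
    """Normalised histogram of slope angles between consecutive observations.

    Angles lie in (-pi/2, pi/2) and are split into `bins` equal-width bins.
    Timestamps must be strictly increasing.
    """
    if len(values) < 2 or len(values) != len(timestamps):
        raise ValueError("need at least two aligned observations")
    counts = [0] * bins
    points = list(zip(timestamps, values))
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        angle = atan((v1 - v0) / (t1 - t0))
        index = min(int((angle + pi / 2) / pi * bins), bins - 1)
        counts[index] += 1
    total = sum(counts)
    return [count / total for count in counts]


def nearest_label(histogram, references):
    """Pick the reference label whose histogram is closest in L1 distance."""
    def l1(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))
    return min(references, key=lambda label: l1(histogram, references[label]))


# Hypothetical reference profiles: a slowly varying signal concentrates
# around angle zero, a steadily rising counter in the steepest bin.
references = {
    "slowly varying": [0, 0, 0, 1, 0, 0],
    "steadily rising": [0, 0, 0, 0, 0, 1],
}
flat = slope_histogram([0, 1, 2, 3], [20.0, 20.1, 19.9, 20.0])
rising = slope_histogram([0, 1, 2, 3], [0, 10, 20, 30])
labels = (nearest_label(flat, references), nearest_label(rising, references))
```

A real system would learn the reference distributions from labelled sensors rather than hard-coding them as done here.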
Abstract:
Discovery Driven Analysis (DDA) is a common feature of OLAP technology to analyze structured data. In essence, DDA helps analysts to discover anomalous data by highlighting 'unexpected' values in the OLAP cube. By giving indications to the analyst on what dimensions to explore, DDA speeds up the process of discovering anomalies and their causes. However, Discovery Driven Analysis (and OLAP in general) is only applicable to structured data, such as records in databases. We propose a system to extend DDA technology to semi-structured text documents, that is, text documents with some structured data. Our system pipeline consists of two stages: first, the text part of each document is structured around user-specified dimensions, using the semi-PLSA algorithm; then, we adapt DDA to these fully structured documents, thus enabling DDA on text documents. We present some applications of this system in OLAP analysis and show how scalability issues are solved. Results show that our system can handle reasonable datasets of documents, in real time, without any need for pre-computation.
Abstract:
This dissertation research points out major challenging problems with current Knowledge Organization (KO) systems, such as subject gateways or web directories: (1) the current systems use traditional knowledge organization systems based on controlled vocabulary which is not very well suited to web resources, and (2) information is organized by professionals not by users, which means it does not reflect intuitively and instantaneously expressed users’ current needs. In order to explore users’ needs, I examined social tags which are user-generated uncontrolled vocabulary. As investment in professionally-developed subject gateways and web directories diminishes (support for both BUBL and Intute, examined in this study, is being discontinued), understanding characteristics of social tagging becomes even more critical. Several researchers have discussed social tagging behavior and its usefulness for classification or retrieval; however, further research is needed to qualitatively and quantitatively investigate social tagging in order to verify its quality and benefit. This research particularly examined the indexing consistency of social tagging in comparison to professional indexing to examine the quality and efficacy of tagging. The data analysis was divided into three phases: analysis of indexing consistency, analysis of tagging effectiveness, and analysis of tag attributes. Most indexing consistency studies have been conducted with a small number of professional indexers, and they tended to exclude users. Furthermore, the studies mainly have focused on physical library collections. This dissertation research bridged these gaps by (1) extending the scope of resources to various web documents indexed by users and (2) employing the Information Retrieval (IR) Vector Space Model (VSM) - based indexing consistency method since it is suitable for dealing with a large number of indexers. 
As a second phase, an analysis of tagging effectiveness with tagging exhaustivity and tag specificity was conducted to ameliorate the drawbacks of consistency analysis based on only the quantitative measures of vocabulary matching. Finally, to investigate tagging pattern and behaviors, a content analysis on tag attributes was conducted based on the FRBR model. The findings revealed that there was greater consistency over all subjects among taggers compared to that for two groups of professionals. The analysis of tagging exhaustivity and tag specificity in relation to tagging effectiveness was conducted to ameliorate difficulties associated with limitations in the analysis of indexing consistency based on only the quantitative measures of vocabulary matching. Examination of exhaustivity and specificity of social tags provided insights into particular characteristics of tagging behavior and its variation across subjects. To further investigate the quality of tags, a Latent Semantic Analysis (LSA) was conducted to determine to what extent tags are conceptually related to professionals’ keywords and it was found that tags of higher specificity tended to have a higher semantic relatedness to professionals’ keywords. This leads to the conclusion that the term’s power as a differentiator is related to its semantic relatedness to documents. The findings on tag attributes identified the important bibliographic attributes of tags beyond describing subjects or topics of a document. The findings also showed that tags have essential attributes matching those defined in FRBR. Furthermore, in terms of specific subject areas, the findings originally identified that taggers exhibited different tagging behaviors representing distinctive features and tendencies on web documents characterizing digital heterogeneous media resources. 
These results have led to the conclusion that there should be an increased awareness of diverse user needs by subject in order to improve metadata in practical applications. This dissertation research is the first necessary step to utilize social tagging in digital information organization by verifying the quality and efficacy of social tagging. This dissertation research combined both quantitative (statistics) and qualitative (content analysis using FRBR) approaches to vocabulary analysis of tags which provided a more complete examination of the quality of tags. Through the detailed analysis of tag properties undertaken in this dissertation, we have a clearer understanding of the extent to which social tagging can be used to replace (and in some cases to improve upon) professional indexing.
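The Vector Space Model based consistency measure mentioned above can be sketched as follows. Each group of indexers is represented by a term vector that counts how many of its members assigned each term to a document, and consistency between two groups is taken as the cosine of the two vectors. The raw-count weighting and the sample terms are assumptions for illustration; the dissertation's exact weighting scheme is not given in the abstract.

```python
from collections import Counter
from math import sqrt


def term_vector(assignments):
    """Count how many indexers assigned each term to a single document.

    assignments: one list of terms per indexer (tagger or professional).
    """
    return Counter(term.lower() for terms in assignments for term in terms)


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical terms assigned to one web document.
taggers = [["python", "tutorial"], ["Python", "programming"], ["python"]]
professionals = [["programming", "python"]]

consistency = cosine(term_vector(taggers), term_vector(professionals))
```

Because every indexer contributes to a single aggregated vector, the measure scales to many indexers without comparing them pair by pair, which matches the motivation stated in the abstract.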
Abstract:
This paper presents an experimental study that examines the accuracy of various information retrieval techniques for Web service discovery. The main goal of this research is to evaluate algorithms for semantic Web service discovery. The evaluation is comprehensively benchmarked using more than 1,700 real-world WSDL documents from the INEX 2010 Web Service Discovery Track dataset. For automatic search, we successfully use Latent Semantic Analysis and BM25 to perform Web service discovery. Moreover, we provide linking analysis which automatically links possible atomic Web services to meet the complex requirements of users. Our fusion engine recommends a final result to users. Our experiments show that linking analysis can improve the overall performance of Web service discovery. We also find that keyword-based search can return results quickly but is limited in its ability to understand users’ goals.
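For readers unfamiliar with BM25, the following is a minimal sketch of the scoring function applied to short service descriptions. It uses the common non-negative IDF variant and the conventional defaults k1 = 1.5 and b = 0.75; the paper's actual parameters, tokenisation and WSDL preprocessing are not stated in the abstract, and the three descriptions are hypothetical.

```python
from collections import Counter
from math import log


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25.

    Tokenisation is plain lowercase whitespace splitting (an assumption).
    """
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    avg_len = sum(len(tokens) for tokens in tokenised) / n_docs
    # Number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    scores = []
    for tokens in tokenised:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


# Hypothetical one-line service descriptions.
descriptions = [
    "get weather forecast by city",
    "convert currency amount between countries",
    "get city coordinates by address",
]
scores = bm25_scores("weather city", descriptions)
ranking = sorted(range(len(descriptions)), key=lambda i: scores[i], reverse=True)
```

The second description shares no term with the query and scores zero, which illustrates the limitation of keyword matching noted in the abstract: a service described with synonyms would be missed entirely.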
Abstract:
Chatrooms, for example Internet Relay Chat, are generally multi-user, multi-channel and multi-server chat systems which run over the Internet and provide a protocol for real-time text-based conferencing between users all over the world. While a well-trained human observer is able to understand who is chatting with whom, there are no efficient and accurate automated tools to determine the groups of users conversing with each other. A precursor to analysing evolving cyber-social phenomena is to first determine what the conversations are and which groups of chatters are involved in each conversation. We consider this problem in this paper. We propose an algorithm to discover all groups of users that are engaged in conversation. Our algorithm is based on a statistical model of a chatroom that is founded on our experience with real chatrooms. Our approach does not require any semantic analysis of the conversations; rather, it is based purely on the statistical information contained in the sequence of posts. We improve the accuracy by applying some graph algorithms to clean the statistical information. We present some experimental results which indicate that one can automatically determine the conversing groups in a chatroom, purely on the basis of statistical analysis.
A tag-based personalized item recommendation system using tensor modeling and topic model approaches
Abstract:
This research falls in the area of enhancing the quality of tag-based item recommendation systems. It aims to achieve this by employing a multi-dimensional user profile approach and by analyzing the semantic aspects of tags. Tag-based recommender systems have two characteristics that need to be carefully studied in order to build a reliable system. Firstly, the multi-dimensional correlation, called as tag assignment
Abstract:
This article presents and evaluates a model to automatically derive word association networks from text corpora. Two aspects were evaluated: To what degree can corpus-based word association networks (CANs) approximate human word association networks with respect to (1) their ability to quantitatively predict word associations and (2) their structural network characteristics. Word association networks are the basis of the human mental lexicon. However, extracting such networks from human subjects is laborious, time consuming and thus necessarily limited in relation to the breadth of human vocabulary. Automatic derivation of word associations from text corpora would address these limitations. In both evaluations corpus-based processing provided vector representations for words. These representations were then employed to derive CANs using two measures: (1) the well known cosine metric, which is a symmetric measure, and (2) a new asymmetric measure computed from orthogonal vector projections. For both evaluations, the full set of 4068 free association networks (FANs) from the University of South Florida word association norms were used as baseline human data. Two corpus based models were benchmarked for comparison: a latent topic model and latent semantic analysis (LSA). We observed that CANs constructed using the asymmetric measure were slightly less effective than the topic model in quantitatively predicting free associates, and slightly better than LSA. The structural networks analysis revealed that CANs do approximate the FANs to an encouraging degree.
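The contrast between a symmetric and an asymmetric association measure can be made concrete with a small sketch. Cosine similarity is standard. The asymmetric function below, the coefficient of the orthogonal projection of one vector onto another, is only one plausible projection-based measure and is an assumption for illustration; the article's own definition is not given in the abstract. The word vectors are hypothetical.

```python
from math import sqrt


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Symmetric: cosine(a, b) == cosine(b, a)."""
    return dot(a, b) / (sqrt(dot(a, a)) * sqrt(dot(b, b)))


def projection_coefficient(a, b):
    """Asymmetric: scale factor c such that c * b is the orthogonal
    projection of a onto b. Normalising by |b|^2 alone means swapping
    the arguments generally changes the result."""
    return dot(a, b) / dot(b, b)


# Hypothetical two-dimensional word vectors.
cue = [2.0, 0.0]
target = [1.0, 1.0]

symmetric = (cosine(cue, target), cosine(target, cue))
forward = projection_coefficient(cue, target)
backward = projection_coefficient(target, cue)
```

An asymmetric measure of this kind can reflect that human free association is directional: one word may strongly evoke another without the reverse being true.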