27 resultados para latent semantic analysis


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Simplification of texts has traditionally been carried out by replacing words and structures with appropriate semantic equivalents in the learner's interlanguage, omitting whichever items prove intractable, and thereby bringing the language of the original within the scope of the learner's transitional linguistic competence. This kind of simplification focuses mainly on the formal features of language. The simplifier can, on the other hand, concentrate on making explicit the propositional content and its presentation in the original in order to bring what is communicated in the original within the scope of the learner's transitional communicative competence. In this case, simplification focuses on the communicative function of the language. Up to now, however, approaches to the problem of simplification have been mainly concerned with the first kind, using the simplifier’s intuition as to what constitutes difficulty for the learner. There appear to be few objective principles underlying this process. The main aim of this study is to investigate the effect of simplification on the communicative aspects of narrative texts, which includes the manner in which narrative units at higher levels of organisation are structured and presented and also the temporal and logical relationships between lower level structures such as sentences/clauses, with the intention of establishing an objective approach to the problem of simplification based on a set of principled procedures which could be used as a guideline in the simplification of material for foreign students at an advanced level.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In a series of experiments, we tested category-specific activation in normal parti¬cipants using magnetoencephalography (MEG). Our experiments explored the temporal processing of objects, as MEG characterises neural activity on the order of milliseconds. Our experiments explored object-processing, including assessing the time-course of ob¬ject naming, early differences in processing living compared with nonliving objects and processing objects at the basic compared with the domain level, and late differences in processing living compared with nonliving objects and processing objects at the basic compared with the domain level. In addition to studies using normal participants, we also utilised MEG to explore category-specific processing in a patient with a deficit for living objects. Our findings support the cascade model of object naming (Humphreys et al., 1988). In addition, our findings using normal participants demonstrate early, category-specific perceptual differences. These findings are corroborated by our patient study. In our assessment of the time-course of category-specific effects as well as a separate analysis designed to measure semantic differences between living and nonliving objects, we found support for the sensory/motor model of object naming (Martin, 1998), in addition to support for the cascade model of object naming. Thus, object processing in normal participants appears to be served by a distributed network in the brain, and there are both perceptual and semantic differences between living and nonliving objects. A separate study assessing the influence of the level at which you are asked to identify an object on processing in the brain found evidence supporting the convergence zone hypothesis (Damasio, 1989). Taken together, these findings indicate the utility of MEG in exploring the time-course of object processing, isolating early perceptual and later semantic effects within the brain.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The topic of this thesis is the development of knowledge based statistical software. The shortcomings of conventional statistical packages are discussed to illustrate the need to develop software which is able to exhibit a greater degree of statistical expertise, thereby reducing the misuse of statistical methods by those not well versed in the art of statistical analysis. Some of the issues involved in the development of knowledge based software are presented and a review is given of some of the systems that have been developed so far. The majority of these have moved away from conventional architectures by adopting what can be termed an expert systems approach. The thesis then proposes an approach which is based upon the concept of semantic modelling. By representing some of the semantic meaning of data, it is conceived that a system could examine a request to apply a statistical technique and check if the use of the chosen technique was semantically sound, i.e. will the results obtained be meaningful. Current systems, in contrast, can only perform what can be considered as syntactic checks. The prototype system that has been implemented to explore the feasibility of such an approach is presented, the system has been designed as an enhanced variant of a conventional style statistical package. This involved developing a semantic data model to represent some of the statistically relevant knowledge about data and identifying sets of requirements that should be met for the application of the statistical techniques to be valid. Those areas of statistics covered in the prototype are measures of association and tests of location.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This thesis presents the results from an investigation into the merits of analysing Magnetoencephalographic (MEG) data in the context of dynamical systems theory. MEG is the study of both the methods for the measurement of minute magnetic flux variations at the scalp, resulting from neuro-electric activity in the neocortex, as well as the techniques required to process and extract useful information from these measurements. As a result of its unique mode of action - by directly measuring neuronal activity via the resulting magnetic field fluctuations - MEG possesses a number of useful qualities which could potentially make it a powerful addition to any brain researcher's arsenal. Unfortunately, MEG research has so far failed to fulfil its early promise, being hindered in its progress by a variety of factors. Conventionally, the analysis of MEG has been dominated by the search for activity in certain spectral bands - the so-called alpha, delta, beta, etc that are commonly referred to in both academic and lay publications. Other efforts have centred upon generating optimal fits of "equivalent current dipoles" that best explain the observed field distribution. Many of these approaches carry the implicit assumption that the dynamics which result in the observed time series are linear. This is despite a variety of reasons which suggest that nonlinearity might be present in MEG recordings. By using methods that allow for nonlinear dynamics, the research described in this thesis avoids these restrictive linearity assumptions. A crucial concept underpinning this project is the belief that MEG recordings are mere observations of the evolution of the true underlying state, which is unobservable and is assumed to reflect some abstract brain cognitive state. Further, we maintain that it is unreasonable to expect these processes to be adequately described in the traditional way: as a linear sum of a large number of frequency generators. One of the main objectives of this thesis will be to prove that much more effective and powerful analysis of MEG can be achieved if one were to assume the presence of both linear and nonlinear characteristics from the outset. Our position is that the combined action of a relatively small number of these generators, coupled with external and dynamic noise sources, is more than sufficient to account for the complexity observed in the MEG recordings. Another problem that has plagued MEG researchers is the extremely low signal to noise ratios that are obtained. As the magnetic flux variations resulting from actual cortical processes can be extremely minute, the measuring devices used in MEG are, necessarily, extremely sensitive. The unfortunate side-effect of this is that even commonplace phenomena such as the earth's geomagnetic field can easily swamp signals of interest. This problem is commonly addressed by averaging over a large number of recordings. However, this has a number of notable drawbacks. In particular, it is difficult to synchronise high frequency activity which might be of interest, and often these signals will be cancelled out by the averaging process. Other problems that have been encountered are high costs and low portability of state-of-the- art multichannel machines. The result of this is that the use of MEG has, hitherto, been restricted to large institutions which are able to afford the high costs associated with the procurement and maintenance of these machines. In this project, we seek to address these issues by working almost exclusively with single channel, unaveraged MEG data. We demonstrate the applicability of a variety of methods originating from the fields of signal processing, dynamical systems, information theory and neural networks, to the analysis of MEG data. It is noteworthy that while modern signal processing tools such as independent component analysis, topographic maps and latent variable modelling have enjoyed extensive success in a variety of research areas from financial time series modelling to the analysis of sun spot activity, their use in MEG analysis has thus far been extremely limited. It is hoped that this work will help to remedy this oversight.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Component-based development (CBD) has become an important emerging topic in the software engineering field. It promises long-sought-after benefits such as increased software reuse, reduced development time to market and, hence, reduced software production cost. Despite the huge potential, the lack of reasoning support and development environment of component modeling and verification may hinder its development. Methods and tools that can support component model analysis are highly appreciated by industry. Such a tool support should be fully automated as well as efficient. At the same time, the reasoning tool should scale up well as it may need to handle hundreds or even thousands of components that a modern software system may have. Furthermore, a distributed environment that can effectively manage and compose components is also desirable. In this paper, we present an approach to the modeling and verification of a newly proposed component model using Semantic Web languages and their reasoning tools. We use the Web Ontology Language and the Semantic Web Rule Language to precisely capture the inter-relationships and constraints among the entities in a component model. Semantic Web reasoning tools are deployed to perform automated analysis support of the component models. Moreover, we also proposed a service-oriented architecture (SOA)-based semantic web environment for CBD. The adoption of Semantic Web services and SOA make our component environment more reusable, scalable, dynamic and adaptive.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Structural analysis in handwritten mathematical expressions focuses on interpreting the recognized symbols using geometrical information such as relative sizes and positions of the symbols. Most existing approaches rely on hand-crafted grammar rules to identify semantic relationships among the recognized mathematical symbols. They could easily fail when writing errors occurred. Moreover, they assume the availability of the whole mathematical expression before being able to analyze the semantic information of the expression. To tackle these problems, we propose a progressive structural analysis (PSA) approach for dynamic recognition of handwritten mathematical expressions. The proposed PSA approach is able to provide analysis result immediately after each written input symbol. This has an advantage that users are able to detect any recognition errors immediately and correct only the mis-recognized symbols rather than the whole expression. Experiments conducted on 57 most commonly used mathematical expressions have shown that the PSA approach is able to achieve very good performance results.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Previous research into formulaic language has focussed on specialised groups of people (e.g. L1 acquisition by infants and adult L2 acquisition) with ordinary adult native speakers of English receiving less attention. Additionally, whilst some features of formulaic language have been used as evidence of authorship (e.g. the Unabomber’s use of you can’t eat your cake and have it too) there has been no systematic investigation into this as a potential marker of authorship. This thesis reports the first full-scale study into the use of formulaic sequences by individual authors. The theory of formulaic language hypothesises that formulaic sequences contained in the mental lexicon are shaped by experience combined with what each individual has found to be communicatively effective. Each author’s repertoire of formulaic sequences should therefore differ. To test this assertion, three automated approaches to the identification of formulaic sequences are tested on a specially constructed corpus containing 100 short narratives. The first approach explores a limited subset of formulaic sequences using recurrence across a series of texts as the criterion for identification. The second approach focuses on a word which frequently occurs as part of formulaic sequences and also investigates alternative non-formulaic realisations of the same semantic content. Finally, a reference list approach is used. Whilst claiming authority for any reference list can be difficult, the proposed method utilises internet examples derived from lists prepared by others, a procedure which, it is argued, is akin to asking large groups of judges to reach consensus about what is formulaic. The empirical evidence supports the notion that formulaic sequences have potential as a marker of authorship since in some cases a Questioned Document was correctly attributed. Although this marker of authorship is not universally applicable, it does promise to become a viable new tool in the forensic linguist’s tool-kit.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Sentiment analysis or opinion mining aims to use automated tools to detect subjective information such as opinions, attitudes, and feelings expressed in text. This paper proposes a novel probabilistic modeling framework based on Latent Dirichlet Allocation (LDA), called joint sentiment/topic model (JST), which detects sentiment and topic simultaneously from text. Unlike other machine learning approaches to sentiment classification which often require labeled corpora for classifier training, the proposed JST model is fully unsupervised. The model has been evaluated on the movie review dataset to classify the review sentiment polarity and minimum prior information have also been explored to further improve the sentiment classification accuracy. Preliminary experiments have shown promising results achieved by JST.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

While much of a company's knowledge can be found in text repositories, current content management systems have limited capabilities for structuring and interpreting documents. In the emerging Semantic Web, search, interpretation and aggregation can be addressed by ontology-based semantic mark-up. In this paper, we examine semantic annotation, identify a number of requirements, and review the current generation of semantic annotation systems. This analysis shows that, while there is still some way to go before semantic annotation tools will be able to address fully all the knowledge management needs, research in the area is active and making good progress.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper, we explore the idea of social role theory (SRT) and propose a novel regularized topic model which incorporates SRT into the generative process of social media content. We assume that a user can play multiple social roles, and each social role serves to fulfil different duties and is associated with a role-driven distribution over latent topics. In particular, we focus on social roles corresponding to the most common social activities on social networks. Our model is instantiated on microblogs, i.e., Twitter and community question-answering (cQA), i.e., Yahoo! Answers, where social roles on Twitter include "originators" and "propagators", and roles on cQA are "askers" and "answerers". Both explicit and implicit interactions between users are taken into account and modeled as regularization factors. To evaluate the performance of our proposed method, we have conducted extensive experiments on two Twitter datasets and two cQA datasets. Furthermore, we also consider multi-role modeling for scientific papers where an author's research expertise area is considered as a social role. A novel application of detecting users' research interests through topical keyword labeling based on the results of our multi-role model has been presented. The evaluation results have shown the feasibility and effectiveness of our model.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Sentiment lexicons for sentiment analysis offer a simple, yet effective way to obtain the prior sentiment information of opinionated words in texts. However, words’ sentiment orientations and strengths often change throughout various contexts in which the words appear. In this paper, we propose a lexicon adaptation approach that uses the contextual semantics of words to capture their contexts in tweet messages and update their prior sentiment orientations and/or strengths accordingly. We evaluate our approach on one state-of-the-art sentiment lexicon using three different Twitter datasets. Results show that the sentiment lexicons adapted by our approach outperform the original lexicon in accuracy and F-measure in two datasets, but give similar accuracy and slightly lower F-measure in one dataset.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The Center for Epidemiologic Studies-Depression Scale (CES-D) is the most frequently used scale for measuring depressive symptomatology in caregiving research. The aim of this study is to test its construct structure and measurement equivalence between caregivers from two Spanish-speaking countries. Face-to-face interviews were carried out with 595 female dementia caregivers from Madrid, Spain, and from Coahuila, Mexico. The structure of the CES-D was analyzed using exploratory and confirmatory factor analysis (EFA and CFA, respectively). Measurement invariance across samples was analyzed comparing a baseline model with a more restrictive model. Significant differences between means were found for 7 items. The results of the EFA clearly supported a four-factor solution. The CFA for the whole sample with the four factors revealed high and statistically significant loading coefficients for all items (except item number 4). When equality constraints were imposed to test for the invariance between countries, the change in chi-square was significant, indicating that complete invariance could not be assumed. Significant between-countries differences were found for three of the four latent factor mean scores. Although the results provide general support for the original four-factor structure, caution should be exercised on reporting comparisons of depression scores between Spanish-speaking countries.