941 resultados para Semantic Errors
Resumo:
* Work done under partial support of Mexican Government (CONACyT, SNI), IPN (CGPI, COFAA) and Korean Government (KIPA Professorship for Visiting Faculty Positions). The second author is currently on Sabbatical leave at Chung-Ang University.
Resumo:
In a system where tens of thousands of words are made up of a limited number of phonemes, many words are bound to sound alike. This similarity of the words in the lexicon as characterized by phonological neighbourhood density (PhND) has been shown to affect speed and accuracy of word comprehension and production. Whereas there is a consensus about the interfering nature of neighbourhood effects in comprehension, the language production literature offers a more contradictory picture with mainly facilitatory but also interfering effects reported on word production. Here we report both of these two types of effects in the same study. Multiple regression mixed models analyses were conducted on PhND effects on errors produced in a naming task by a group of 21 participants with aphasia. These participants produced more formal errors (interfering effect) for words in dense phonological neighbourhoods, but produced fewer nonwords and semantic errors (a facilitatory effect) with increasing density. In order to investigate the nature of these opposite effects of PhND, we further analysed a subset of formal errors and nonword errors by distinguishing errors differing on a single phoneme from the target (corresponding to the definition of phonological neighbours) from those differing on two or more phonemes. This analysis confirmed that only formal errors that were phonological neighbours of the target increased in dense neighbourhoods, while all other errors decreased. Based on additional observations favouring a lexical origin of these formal errors (they exceeded the probability of producing a real-word error by chance, were of a higher frequency, and preserved the grammatical category of the targets), we suggest that the interfering effect of PhND is due to competition between lexical neighbours and target words in dense neighbourhoods.
Resumo:
Cette recherche vise à décrire 1) les erreurs lexicales commises en production écrite par des élèves francophones de 3e secondaire et 2) le rapport à l’erreur lexicale d’enseignants de français (conception de l’erreur lexicale, pratiques d’évaluation du vocabulaire en production écrite, modes de rétroaction aux erreurs lexicales). Le premier volet de la recherche consiste en une analyse d’erreurs à trois niveaux : 1) une description linguistique des erreurs à l’aide d’une typologie, 2) une évaluation de la gravité des erreurs et 3) une explication de leurs sources possibles. Le corpus analysé est constitué de 300 textes rédigés en classe de français par des élèves de 3e secondaire. L’analyse a révélé 1144 erreurs lexicales. Les plus fréquentes sont les problèmes sémantiques (30%), les erreurs liées aux propriétés morphosyntaxiques des unités lexicales (21%) et l’utilisation de termes familiers (17%). Cette répartition démontre que la moitié des erreurs lexicales sont attribuables à une méconnaissance de propriétés des mots autres que le sens et la forme. L’évaluation de la gravité des erreurs repose sur trois critères : leur acceptation linguistique selon les dictionnaires, leur impact sur la compréhension et leur degré d’intégration à l’usage. Les problèmes liés aux registres de langue sont généralement ceux qui sont considérés comme les moins graves et les erreurs sémantiques représentent la quasi-totalité des erreurs graves. Le troisième axe d’analyse concerne la source des erreurs et fait ressortir trois sources principales : l’influence de la langue orale, la proximité sémantique et la parenté formelle entre le mot utilisé et celui visé. Le second volet de la thèse concerne le rapport des enseignants de français à l’erreur lexicale et repose sur l’analyse de 224 rédactions corrigées ainsi que sur une série de huit entrevues menées avec des enseignants de 3e secondaire. Lors de la correction, les enseignants relèvent surtout les erreurs orthographiques ainsi que celles relevant des propriétés morphosyntaxiques des mots (genre, invariabilité, régime), qu’ils classent parmi les erreurs de grammaire. Les erreurs plus purement lexicales, c’est-à-dire les erreurs sémantiques, l’emploi de termes familiers et les erreurs de collocation, demeurent peu relevées, et les annotations des enseignants concernant ces types d’erreurs sont vagues et peu systématiques, donnant peu de pistes aux élèves pour la correction. L’évaluation du vocabulaire en production écrite est toujours soumise à une appréciation qualitative, qui repose sur l’impression générale des enseignants plutôt que sur des critères précis, le seul indicateur clair étant la répétition. Les explications des enseignants concernant les erreurs lexicales reposent beaucoup sur l’intuition, ce qui témoigne de certaines lacunes dans leur formation en lien avec le vocabulaire. Les enseignants admettent enseigner très peu le vocabulaire en classe au secondaire et expliquent ce choix par le manque de temps et d’outils adéquats. L’enseignement du vocabulaire est toujours subordonné à des tâches d’écriture ou de lecture et vise davantage l’acquisition de mots précis que le développement d’une réelle compétence lexicale.
Resumo:
The schema of an information system can significantly impact the ability of end users to efficiently and effectively retrieve the information they need. Obtaining quickly the appropriate data increases the likelihood that an organization will make good decisions and respond adeptly to challenges. This research presents and validates a methodology for evaluating, ex ante, the relative desirability of alternative instantiations of a model of data. In contrast to prior research, each instantiation is based on a different formal theory. This research theorizes that the instantiation that yields the lowest weighted average query complexity for a representative sample of information requests is the most desirable instantiation for end-user queries. The theory was validated by an experiment that compared end-user performance using an instantiation of a data structure based on the relational model of data with performance using the corresponding instantiation of the data structure based on the object-relational model of data. Complexity was measured using three different Halstead metrics: program length, difficulty, and effort. For a representative sample of queries, the average complexity using each instantiation was calculated. As theorized, end users querying the instantiation with the lower average complexity made fewer semantic errors, i.e., were more effective at composing queries. (c) 2005 Elsevier B.V. All rights reserved.
Resumo:
Semantic data models provide a map of the components of an information system. The characteristics of these models affect their usefulness for various tasks (e.g., information retrieval). The quality of information retrieval has obvious important consequences, both economic and otherwise. Traditionally, data base designers have produced parsimonious logical data models. In spite of their increased size, ontologically clearer conceptual models have been shown to facilitate better performance for both problem solving and information retrieval tasks in experimental settings. The experiments producing evidence of enhanced performance for ontologically clearer models have, however, used application domains of modest size. Data models in organizational settings are likely to be substantially larger than those used in these experiments. This research used an experiment to investigate whether the benefits of improved information retrieval performance associated with ontologically clearer models are robust as the size of the application domains increase. The experiment used an application domain of approximately twice the size as tested in prior experiments. The results indicate that, relative to the users of the parsimonious implementation, end users of the ontologically clearer implementation made significantly more semantic errors, took significantly more time to compose their queries, and were significantly less confident in the accuracy of their queries.
Resumo:
We describe the case of a dysgraphic aphasic individual-S.G.W.-who, in writing to dictation, produced high rates of formally related errors consisting of both lexical substitutions and what we call morphological-compound errors involving legal or illegal combinations of morphemes. These errors were produced in the context of a minimal number of semantic errors. We could exclude problems with phonological discrimination and phonological short-term memory. We also excluded rapid decay of lexical information and/or weak activation of word forms and letter representations since S.G.W.'s spelling showed no effect of delay and no consistent length effects, but, instead, paradoxical complexity effects with segmental, lexical, and morphological errors that were more complex than the target. The case of S.G.W. strongly resembles that of another dysgraphic individual reported in the literature-D.W.-suggesting that this pattern of errors can be replicated across patients. In particular, both patients show unusual errors resulting in the production of neologistic compounds (e.g., "bed button" in response to "bed"). These patterns can be explained if we accept two claims: (a) Brain damage can produce both a reduction and an increase in lexical activation; and (b) there are direct connections between phonological and orthographic lexical representations (a third spelling route). We suggest that both patients are suffering from a difficulty of lexical selection resulting from excessive activation of formally related lexical representations. This hypothesis is strongly supported by S.G.W.'s worse performance in spelling to dictation than in written naming, which shows that a phonological input, activating a cohort of formally related lexical representations, increases selection difficulties. © 2014 Taylor & Francis.
Resumo:
This research aims to determine the dimensions of motivation and satisfaction, acquired, through the perception in context of job training, by future technicians (students) in the hospitality and tourism industry, particularly by technical courses in the hotel and restaurant sector. The methodology comprises three distinct stages. First were recovered instruments (questionnaires), already validated by other authors of motivation and satisfaction, which had the intention to replicate studies conducted in other scientific knowledge fields, such as tourism. Those instruments were recovered from the reviewed literature conducted about other themes. On second place the measuring instruments were submitted to a pre-test, or rather, were subject of a pioneer study, in order to verify other assumptions such as semantic errors or see if there was the possibility of some prepared questions to be consider invalidated by poor formulation or interpretation. Finally, were applied in three educational institutions who agreed to cooperate on the research, with the reservation that the interviewed needed a mandatory pre-requirement that consisted in conducting a minimum training in work context (TWC). Then, proceed the statistical analysis to support all the empirical part. The results show that, in general, motivation and satisfaction were present during the period of training in work context. To some people it meant a very important period of personal and professional life, concerning the interactions, emotions and involvement with touristic organizations but also the personal and social relationships.
Resumo:
In order to explore the impact of a degraded semantic system on the structure of language production, we analysed transcripts from autobiographical memory interviews to identify naturally-occurring speech errors by eight patients with semantic dementia (SD) and eight age-matched normal speakers. Relative to controls, patients were significantly more likely to (a) substitute and omit open class words, (b) substitute (but not omit) closed class words, (c) substitute incorrect complex morphological forms and (d) produce semantically and/or syntactically anomalous sentences. Phonological errors were scarce in both groups. The study confirms previous evidence of SD patients’ problems with open class content words which are replaced by higher frequency, less specific terms. It presents the first evidence that SD patients have problems with closed class items and make syntactic as well as semantic speech errors, although these grammatical abnormalities are mostly subtle rather than gross. The results can be explained by the semantic deficit which disrupts the representation of a pre-verbal message, lexical retrieval and the early stages of grammatical encoding.
Resumo:
We explored the impact of a degraded semantic system on lexical, morphological and syntactic complexity in language production. We analysed transcripts from connected speech samples from eight patients with semantic dementia (SD) and eight age-matched healthy speakers. The frequency distributions of nouns and verbs were compared for hand-scored data and data extracted using text-analysis software. Lexical measures showed the predicted pattern for nouns and verbs in hand-scored data, and for nouns in software-extracted data, with fewer low frequency items in the speech of the patients relative to controls. The distribution of complex morpho-syntactic forms for the SD group showed a reduced range, with fewer constructions that required multiple auxiliaries and inflections. Finally, the distribution of syntactic constructions also differed between groups, with a pattern that reflects the patients’ characteristic anomia and constraints on morpho-syntactic complexity. The data are in line with previous findings of an absence of gross syntactic errors or violations in SD speech. Alterations in the distributions of morphology and syntax, however, support constraint satisfaction models of speech production in which there is no hard boundary between lexical retrieval and grammatical encoding.
Resumo:
OntoTag - A Linguistic and Ontological Annotation Model Suitable for the Semantic Web
1. INTRODUCTION. LINGUISTIC TOOLS AND ANNOTATIONS: THEIR LIGHTS AND SHADOWS
Computational Linguistics is already a consolidated research area. It builds upon the results of other two major ones, namely Linguistics and Computer Science and Engineering, and it aims at developing computational models of human language (or natural language, as it is termed in this area). Possibly, its most well-known applications are the different tools developed so far for processing human language, such as machine translation systems and speech recognizers or dictation programs.
These tools for processing human language are commonly referred to as linguistic tools. Apart from the examples mentioned above, there are also other types of linguistic tools that perhaps are not so well-known, but on which most of the other applications of Computational Linguistics are built. These other types of linguistic tools comprise POS taggers, natural language parsers and semantic taggers, amongst others. All of them can be termed linguistic annotation tools.
Linguistic annotation tools are important assets. In fact, POS and semantic taggers (and, to a lesser extent, also natural language parsers) have become critical resources for the computer applications that process natural language. Hence, any computer application that has to analyse a text automatically and ‘intelligently’ will include at least a module for POS tagging. The more an application needs to ‘understand’ the meaning of the text it processes, the more linguistic tools and/or modules it will incorporate and integrate.
However, linguistic annotation tools have still some limitations, which can be summarised as follows:
1. Normally, they perform annotations only at a certain linguistic level (that is, Morphology, Syntax, Semantics, etc.).
2. They usually introduce a certain rate of errors and ambiguities when tagging. This error rate ranges from 10 percent up to 50 percent of the units annotated for unrestricted, general texts.
3. Their annotations are most frequently formulated in terms of an annotation schema designed and implemented ad hoc.
A priori, it seems that the interoperation and the integration of several linguistic tools into an appropriate software architecture could most likely solve the limitations stated in (1). Besides, integrating several linguistic annotation tools and making them interoperate could also minimise the limitation stated in (2). Nevertheless, in the latter case, all these tools should produce annotations for a common level, which would have to be combined in order to correct their corresponding errors and inaccuracies. Yet, the limitation stated in (3) prevents both types of integration and interoperation from being easily achieved.
In addition, most high-level annotation tools rely on other lower-level annotation tools and their outputs to generate their own ones. For example, sense-tagging tools (operating at the semantic level) often use POS taggers (operating at a lower level, i.e., the morphosyntactic) to identify the grammatical category of the word or lexical unit they are annotating. Accordingly, if a faulty or inaccurate low-level annotation tool is to be used by other higher-level one in its process, the errors and inaccuracies of the former should be minimised in advance. Otherwise, these errors and inaccuracies would be transferred to (and even magnified in) the annotations of the high-level annotation tool.
Therefore, it would be quite useful to find a way to
(i) correct or, at least, reduce the errors and the inaccuracies of lower-level linguistic tools;
(ii) unify the annotation schemas of different linguistic annotation tools or, more generally speaking, make these tools (as well as their annotations) interoperate.
Clearly, solving (i) and (ii) should ease the automatic annotation of web pages by means of linguistic tools, and their transformation into Semantic Web pages (Berners-Lee, Hendler and Lassila, 2001). Yet, as stated above, (ii) is a type of interoperability problem. There again, ontologies (Gruber, 1993; Borst, 1997) have been successfully applied thus far to solve several interoperability problems. Hence, ontologies should help solve also the problems and limitations of linguistic annotation tools aforementioned.
Thus, to summarise, the main aim of the present work was to combine somehow these separated approaches, mechanisms and tools for annotation from Linguistics and Ontological Engineering (and the Semantic Web) in a sort of hybrid (linguistic and ontological) annotation model, suitable for both areas. This hybrid (semantic) annotation model should (a) benefit from the advances, models, techniques, mechanisms and tools of these two areas; (b) minimise (and even solve, when possible) some of the problems found in each of them; and (c) be suitable for the Semantic Web. The concrete goals that helped attain this aim are presented in the following section.
2. GOALS OF THE PRESENT WORK
As mentioned above, the main goal of this work was to specify a hybrid (that is, linguistically-motivated and ontology-based) model of annotation suitable for the Semantic Web (i.e. it had to produce a semantic annotation of web page contents). This entailed that the tags included in the annotations of the model had to (1) represent linguistic concepts (or linguistic categories, as they are termed in ISO/DCR (2008)), in order for this model to be linguistically-motivated; (2) be ontological terms (i.e., use an ontological vocabulary), in order for the model to be ontology-based; and (3) be structured (linked) as a collection of ontology-based
Resumo:
The creation of language resources is a time-consuming process requiring the efforts of many people. The use of resources collaboratively created by non-linguists can potentially ameliorate this situation. However, such resources often contain more errors compared to resources created by experts. For the particular case of lexica, we analyse the case of Wiktionary, a resource created along wiki principles and argue that through the use of a principled lexicon model, namely lemon, the resulting data could be better understandable to machines. We then present a platform called lemon source that supports the creation of linked lexical data along the lemon model. This tool builds on the concept of a semantic wiki to enable collaborative editing of the resources by many users concurrently. In this paper, we describe the model, the tool and present an evaluation of its usability based on a small group of users.
Resumo:
Semantic Web Service, one of the most significant research areas within the Semantic Web vision, has attracted increasing attention from both the research community and industry. The Web Service Modelling Ontology (WSMO) has been proposed as an enabling framework for the total/partial automation of the tasks (e.g., discovery, selection, composition, mediation, execution, monitoring, etc.) involved in both intra- and inter-enterprise integration of Web services. To support the standardisation and tool support of WSMO, a formal model of the language is highly desirable. As several variants of WSMO have been proposed by the WSMO community, which are still under development, the syntax and semantics of WSMO should be formally defined to facilitate easy reuse and future development. In this paper, we present a formal Object-Z formal model of WSMO, where different aspects of the language have been precisely defined within one unified framework. This model not only provides a formal unambiguous model which can be used to develop tools and facilitate future development, but as demonstrated in this paper, can be used to identify and eliminate errors present in existing documentation.
Resumo:
The Semantic Web relies on carefully structured, well defined, data to allow machines to communicate and understand one another. In many domains (e.g. geospatial) the data being described contains some uncertainty, often due to incomplete knowledge; meaningful processing of this data requires these uncertainties to be carefully analysed and integrated into the process chain. Currently, within the SemanticWeb there is no standard mechanism for interoperable description and exchange of uncertain information, which renders the automated processing of such information implausible, particularly where error must be considered and captured as it propagates through a processing sequence. In particular we adopt a Bayesian perspective and focus on the case where the inputs / outputs are naturally treated as random variables. This paper discusses a solution to the problem in the form of the Uncertainty Markup Language (UncertML). UncertML is a conceptual model, realised as an XML schema, that allows uncertainty to be quantified in a variety of ways i.e. realisations, statistics and probability distributions. UncertML is based upon a soft-typed XML schema design that provides a generic framework from which any statistic or distribution may be created. Making extensive use of Geography Markup Language (GML) dictionaries, UncertML provides a collection of definitions for common uncertainty types. Containing both written descriptions and mathematical functions, encoded as MathML, the definitions within these dictionaries provide a robust mechanism for defining any statistic or distribution and can be easily extended. Universal Resource Identifiers (URIs) are used to introduce semantics to the soft-typed elements by linking to these dictionary definitions. The INTAMAP (INTeroperability and Automated MAPping) project provides a use case for UncertML. This paper demonstrates how observation errors can be quantified using UncertML and wrapped within an Observations & Measurements (O&M) Observation. The interpolation service uses the information within these observations to influence the prediction outcome. The output uncertainties may be encoded in a variety of UncertML types, e.g. a series of marginal Gaussian distributions, a set of statistics, such as the first three marginal moments, or a set of realisations from a Monte Carlo treatment. Quantifying and propagating uncertainty in this way allows such interpolation results to be consumed by other services. This could form part of a risk management chain or a decision support system, and ultimately paves the way for complex data processing chains in the Semantic Web.
Resumo:
Because metadata that underlies semantic web applications is gathered from distributed and heterogeneous data sources, it is important to ensure its quality (i.e., reduce duplicates, spelling errors, ambiguities). However, current infrastructures that acquire and integrate semantic data have only marginally addressed the issue of metadata quality. In this paper we present our metadata acquisition infrastructure, ASDI, which pays special attention to ensuring that high quality metadata is derived. Central to the architecture of ASDI is a verification engine that relies on several semantic web tools to check the quality of the derived data. We tested our prototype in the context of building a semantic web portal for our lab, KMi. An experimental evaluation comparing the automatically extracted data against manual annotations indicates that the verification engine enhances the quality of the extracted semantic metadata.
Resumo:
Purpose: To establish the prevalence of refractive errors and ocular disorders in preschool and schoolchildren of Ibiporã, Brazil. Methods: A survey of 6 to 12-year-old children from public and private elementary schools was carried out in Ibiporã between 1989 and 1996. Visual acuity measurements were performed by trained teachers using Snellen's chart. Children with visual acuity <0.7 in at least one eye were referred to a complete ophthalmologic examination. Results: 35,936 visual acuity measurements were performed in 13,471 children. 1.966 children (14.59%) were referred to an ophthalmologic examination. Amblyopia was diagnosed in 237 children (1.76%), whereas strabismus was observed in 114 cases (0.84%). Cataract (n=17) (0.12%), chorioretinitis (n=38) (0.28%) and eyelid ptosis (n=6) (0.04%) were also diagnosed. Among the 614 (4.55%) children who were found to have refractive errors, 284 (46.25%) had hyperopia (hyperopia or hyperopic astigmatism), 206 (33.55%) had myopia (myopia or myopic astigmatism) and 124 (20.19%) showed mixed astigmatism. Conclusions: The study determined the local prevalence of amblyopia, refractive errors and eye disorders among preschool and schoolchildren.