883 resultados para corpus diacrônico


Relevância:

10.00% 10.00%

Publicador:

Resumo:

RÉSUMÉ. La prise en compte des troubles de la communication dans l’utilisation des systèmes de recherche d’information tels qu’on peut en trouver sur le Web est généralement réalisée par des interfaces utilisant des modalités n’impliquant pas la lecture et l’écriture. Peu d’applications existent pour aider l’utilisateur en difficulté dans la modalité textuelle. Nous proposons la prise en compte de la conscience phonologique pour assister l’utilisateur en difficulté d’écriture de requêtes (dysorthographie) ou de lecture de documents (dyslexie). En premier lieu un système de réécriture et d’interprétation des requêtes entrées au clavier par l’utilisateur est proposé : en s’appuyant sur les causes de la dysorthographie et sur les exemples à notre disposition, il est apparu qu’un système combinant une approche éditoriale (type correcteur orthographique) et une approche orale (système de transcription automatique) était plus approprié. En second lieu une méthode d’apprentissage automatique utilise des critères spécifiques , tels que la cohésion grapho-phonémique, pour estimer la lisibilité d’une phrase, puis d’un texte. ABSTRACT. Most applications intend to help disabled users in the information retrieval process by proposing non-textual modalities. This paper introduces specific parameters linked to phonological awareness in the textual modality. This will enhance the ability of systems to deal with orthographic issues and with the adaptation of results to the reader when for example the reader is dyslexic. We propose a phonology based sentence level rewriting system that combines spelling correction, speech synthesis and automatic speech recognition. This has been evaluated on a corpus of questions we get from dyslexic children. We propose a specific sentence readability measure that involves phonetic parameters such as grapho-phonemic cohesion. This has been learned on a corpus of reading time of sentences read by dyslexic children.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Since manually constructing domain-specific sentiment lexicons is extremely time consuming and it may not even be feasible for domains where linguistic expertise is not available. Research on the automatic construction of domain-specific sentiment lexicons has become a hot topic in recent years. The main contribution of this paper is the illustration of a novel semi-supervised learning method which exploits both term-to-term and document-to-term relations hidden in a corpus for the construction of domain specific sentiment lexicons. More specifically, the proposed two-pass pseudo labeling method combines shallow linguistic parsing and corpusbase statistical learning to make domain-specific sentiment extraction scalable with respect to the sheer volume of opinionated documents archived on the Internet these days. Another novelty of the proposed method is that it can utilize the readily available user-contributed labels of opinionated documents (e.g., the user ratings of product reviews) to bootstrap the performance of sentiment lexicon construction. Our experiments show that the proposed method can generate high quality domain-specific sentiment lexicons as directly assessed by human experts. Moreover, the system generated domain-specific sentiment lexicons can improve polarity prediction tasks at the document level by 2:18% when compared to other well-known baseline methods. Our research opens the door to the development of practical and scalable methods for domain-specific sentiment analysis.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Models of word meaning, built from a corpus of text, have demonstrated success in emulating human performance on a number of cognitive tasks. Many of these models use geometric representations of words to store semantic associations between words. Often word order information is not captured in these models. The lack of structural information used by these models has been raised as a weakness when performing cognitive tasks. This paper presents an efficient tensor based approach to modelling word meaning that builds on recent attempts to encode word order information, while providing flexible methods for extracting task specific semantic information.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Information has no value unless it is accessible. Information must be connected together so a knowledge network can then be built. Such a knowledge base is a key resource for Internet users to interlink information from documents. Information retrieval, a key technology for knowledge management, guarantees access to large corpora of unstructured text. Collaborative knowledge management systems such as Wikipedia are becoming more popular than ever; however, their link creation function is not optimized for discovering possible links in the collection and the quality of automatically generated links has never been quantified. This research begins with an evaluation forum which is intended to cope with the experiments of focused link discovery in a collaborative way as well as with the investigation of the link discovery application. The research focus was on the evaluation strategy: the evaluation framework proposal, including rules, formats, pooling, validation, assessment and evaluation has proved to be efficient, reusable for further extension and efficient for conducting evaluation. The collection-split approach is used to re-construct the Wikipedia collection into a split collection comprising single passage files. This split collection is proved to be feasible for improving relevant passages discovery and is devoted to being a corpus for focused link discovery. Following these experiments, a mobile client-side prototype built on iPhone is developed to resolve the mobile Search issue by using focused link discovery technology. According to the interview survey, the proposed mobile interactive UI does improve the experience of mobile information seeking. Based on this evaluation framework, a novel cross-language link discovery proposal using multiple text collections is developed. A dynamic evaluation approach is proposed to enhance both the collaborative effort and the interacting experience between submission and evaluation. A realistic evaluation scheme has been implemented at NTCIR for cross-language link discovery tasks.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Unstructured text data, such as emails, blogs, contracts, academic publications, organizational documents, transcribed interviews, and even tweets, are important sources of data in Information Systems research. Various forms of qualitative analysis of the content of these data exist and have revealed important insights. Yet, to date, these analyses have been hampered by limitations of human coding of large data sets, and by bias due to human interpretation. In this paper, we compare and combine two quantitative analysis techniques to demonstrate the capabilities of computational analysis for content analysis of unstructured text. Specifically, we seek to demonstrate how two quantitative analytic methods, viz., Latent Semantic Analysis and data mining, can aid researchers in revealing core content topic areas in large (or small) data sets, and in visualizing how these concepts evolve, migrate, converge or diverge over time. We exemplify the complementary application of these techniques through an examination of a 25-year sample of abstracts from selected journals in Information Systems, Management, and Accounting disciplines. Through this work, we explore the capabilities of two computational techniques, and show how these techniques can be used to gather insights from a large corpus of unstructured text.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This study examined the everyday practices of families within the context of family mealtime to investigate how members accomplished mealtime interactions. Using an ethnomethodological approach, conversation analysis and membership categorization analysis, the study investigated the interactional resources that family members used to assemble their social orders moment by moment during family mealtimes. While there is interest in mealtimes within educational policy, health research and the media, there remain few studies that provide fine-grained detail about how members produce the social activity of having a family meal. Findings from this study contribute empirical understandings about families and family mealtime. Two families with children aged 2 to 10 years were observed as they accomplished their everyday mealtime activities. Data collection took place in the family homes where family members video recorded their naturally occurring mealtimes. Each family was provided with a video camera for a one-month period and they decided which mealtimes they recorded, a method that afforded participants greater agency in the data collection process and made available to the analyst a window into the unfolding of the everyday lives of the families. A total of 14 mealtimes across the two families were recorded, capturing 347 minutes of mealtime interactions. Selected episodes from the data corpus, which includes centralised breakfast and dinnertime episodes, were transcribed using the Jeffersonian system. Three data chapters examine extended sequences of family talk at mealtimes, to show the interactional resources used by members during mealtime interactions. The first data chapter explores multiparty talk to show how the uniqueness of the occasion of having a meal influences turn design. It investigates the ways in which members accomplish two-party talk within a multiparty setting, showing how one child "tells" a funny story to accomplish the drawing together of his brothers as an audience. As well, this chapter identifies the interactional resources used by the mother to cohort her children to accomplish the choralling of grace. The second data chapter draws on sequential and categorical analysis to show how members are mapped to a locally produced membership category. The chapter shows how the mapping of members into particular categories is consequential for social order; for example, aligning members who belong to the membership category "had haircuts" and keeping out those who "did not have haircuts". Additional interactional resources such as echoing, used here to refer to the use of exactly the same words, similar prosody and physical action, and increasing physical closeness, are identified as important to the unfolding talk particularly as a way of accomplishing alignment between the grandmother and grand-daughter. The third and final data analysis chapter examines topical talk during family mealtimes. It explicates how members introduce topics of talk with an orientation to their co-participant and the way in which the take up of a topic is influenced both by the sequential environment in which it is introduced and the sensitivity of the topic. Together, these three data chapters show aspects of how family members participated in family mealtimes. The study contributes four substantive themes that emerged during the analytic process and, as such, the themes reflect what the members were observed to be doing. The first theme identified how family knowledge was relevant and consequential for initiating and sustaining interaction during mealtime with, for example, members buying into the talk of other members or being requested to help out with knowledge about a shared experience. Knowledge about members and their activities was evident with the design of questions evidencing an orientation to coparticipant’s knowledge. The second theme found how members used topic as a resource for social interaction. The third theme concerned the way in which members utilised membership categories for producing and making sense of social action. The fourth theme, evident across all episodes selected for analysis, showed how children’s competence is an ongoing interactional accomplishment as they manipulated interactional resources to manage their participation in family mealtime. The way in which children initiated interactions challenges previous understandings about children’s restricted rights as conversationalists. As well as making a theoretical contribution, the study offers methodological insight by working with families as research participants. The study shows the procedures involved as the study moved from one where the researcher undertook the decisions about what to videorecord to offering this decision making to the families, who chose when and what to videorecord of their mealtime practices. Evident also are the ways in which participants orient both to the video-camera and to the absent researcher. For the duration of the mealtime the video-camera was positioned by the adults as out of bounds to the children; however, it was offered as a "treat" to view after the mealtime was recorded. While situated within family mealtimes and reporting on the experiences of two families, this study illuminates how mealtimes are not just about food and eating; they are social. The study showed the constant and complex work of establishing and maintaining social orders and the rich array of interactional resources that members draw on during family mealtimes. The family’s interactions involved members contributing to building the social orders of family mealtime. With mealtimes occurring in institutional settings involving young children, such as long day care centres and kindergartens, the findings of this study may help educators working with young children to see the rich interactional opportunities mealtimes afford children, the interactional competence that children demonstrate during mealtimes, and the important role/s that adults may assume as co-participants in interactions with children within institutional settings.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

It is a big challenge to guarantee the quality of discovered relevance features in text documents for describing user preferences because of the large number of terms, patterns, and noise. Most existing popular text mining and classification methods have adopted term-based approaches. However, they have all suffered from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern-based methods should perform better than term- based ones in describing user preferences, but many experiments do not support this hypothesis. This research presents a promising method, Relevance Feature Discovery (RFD), for solving this challenging issue. It discovers both positive and negative patterns in text documents as high-level features in order to accurately weight low-level features (terms) based on their specificity and their distributions in the high-level features. The thesis also introduces an adaptive model (called ARFD) to enhance the exibility of using RFD in adaptive environment. ARFD automatically updates the system's knowledge based on a sliding window over new incoming feedback documents. It can efficiently decide which incoming documents can bring in new knowledge into the system. Substantial experiments using the proposed models on Reuters Corpus Volume 1 and TREC topics show that the proposed models significantly outperform both the state-of-the-art term-based methods underpinned by Okapi BM25, Rocchio or Support Vector Machine and other pattern-based methods.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

It is a big challenge to acquire correct user profiles for personalized text classification since users may be unsure in providing their interests. Traditional approaches to user profiling adopt machine learning (ML) to automatically discover classification knowledge from explicit user feedback in describing personal interests. However, the accuracy of ML-based methods cannot be significantly improved in many cases due to the term independence assumption and uncertainties associated with them. This paper presents a novel relevance feedback approach for personalized text classification. It basically applies data mining to discover knowledge from relevant and non-relevant text and constraints specific knowledge by reasoning rules to eliminate some conflicting information. We also developed a Dempster-Shafer (DS) approach as the means to utilise the specific knowledge to build high-quality data models for classification. The experimental results conducted on Reuters Corpus Volume 1 and TREC topics support that the proposed technique achieves encouraging performance in comparing with the state-of-the-art relevance feedback models.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Purpose This chapter investigates an episode where a supervising teacher on playground duty asks two boys to each give an account of their actions over an incident that had just occurred on some climbing equipment in the playground. Methodology This paper employs an ethnomethodological approach using conversation analysis. The data are taken from a corpus of video recorded interactions of children, aged 7-9 years, and the teacher, in school playgrounds during the lunch recess. Findings The findings show the ways that children work up accounts of their playground practices when asked by the teacher. The teacher initially provided interactional space for each child to give their version of the events. Ultimately, the teacher’s version of how to act in the playground became the sanctioned one. The children and the teacher formulated particular social orders of behavior in the playground through multi-modal devices, direct reported speech and scripts. Such public displays of talk work as socialization practices that frame teacher-sanctioned morally appropriate actions in the playground. Value of paper This chapter shows the pervasiveness of the teacher’s social order, as she presented an institutional social order of how to interact in the playground, showing clearly the disjunction of adult-child orders between the teacher and children.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Electronic services are a leitmotif in ‘hot’ topics like Software as a Service, Service Oriented Architecture (SOA), Service oriented Computing, Cloud Computing, application markets and smart devices. We propose to consider these in what has been termed the Service Ecosystem (SES). The SES encompasses all levels of electronic services and their interaction, with human consumption and initiation on its periphery in much the same way the ‘Web’ describes a plethora of technologies that eventuate to connect information and expose it to humans. Presently, the SES is heterogeneous, fragmented and confined to semi-closed systems. A key issue hampering the emergence of an integrated SES is Service Discovery (SD). A SES will be dynamic with areas of structured and unstructured information within which service providers and ‘lay’ human consumers interact; until now the two are disjointed, e.g., SOA-enabled organisations, industries and domains are choreographed by domain experts or ‘hard-wired’ to smart device application markets and web applications. In a SES, services are accessible, comparable and exchangeable to human consumers closing the gap to the providers. This requires a new SD with which humans can discover services transparently and effectively without special knowledge or training. We propose two modes of discovery, directed search following an agenda and explorative search, which speculatively expands knowledge of an area of interest by means of categories. Inspired by conceptual space theory from cognitive science, we propose to implement the modes of discovery using concepts to map a lay consumer’s service need to terminologically sophisticated descriptions of services. To this end, we reframe SD as an information retrieval task on the information attached to services, such as, descriptions, reviews, documentation and web sites - the Service Information Shadow. The Semantic Space model transforms the shadow's unstructured semantic information into a geometric, concept-like representation. We introduce an improved and extended Semantic Space including categorization calling it the Semantic Service Discovery model. We evaluate our model with a highly relevant, service related corpus simulating a Service Information Shadow including manually constructed complex service agendas, as well as manual groupings of services. We compare our model against state-of-the-art information retrieval systems and clustering algorithms. By means of an extensive series of empirical evaluations, we establish optimal parameter settings for the semantic space model. The evaluations demonstrate the model’s effectiveness for SD in terms of retrieval precision over state-of-the-art information retrieval models (directed search) and the meaningful, automatic categorization of service related information, which shows potential to form the basis of a useful, cognitively motivated map of the SES for exploratory search.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The quality of discovered features in relevance feedback (RF) is the key issue for effective search query. Most existing feedback methods do not carefully address the issue of selecting features for noise reduction. As a result, extracted noisy features can easily contribute to undesirable effectiveness. In this paper, we propose a novel feature extraction method for query formulation. This method first extract term association patterns in RF as knowledge for feature extraction. Negative RF is then used to improve the quality of the discovered knowledge. A novel information filtering (IF) model is developed to evaluate the proposed method. The experimental results conducted on Reuters Corpus Volume 1 and TREC topics confirm that the proposed model achieved encouraging performance compared to state-of-the-art IF models.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper develops and evaluates an enhanced corpus based approach for semantic processing. Corpus based models that build representations of words directly from text do not require pre-existing linguistic knowledge, and have demonstrated psychologically relevant performance on a number of cognitive tasks. However, they have been criticised in the past for not incorporating sufficient structural information. Using ideas underpinning recent attempts to overcome this weakness, we develop an enhanced tensor encoding model to build representations of word meaning for semantic processing. Our enhanced model demonstrates superior performance when compared to a robust baseline model on a number of semantic processing tasks.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Recent Australian early childhood policy and curriculum guidelines promoting the use of technologies invite investigations of young children’s practices in classrooms. This study examined the practices of one preparatory year classroom, to show teacher and child interactions as they engaged in Web searching. The study investigated the in situ practices of the teacher and children to show how they accomplished the Web search. The data corpus consists of eight hours of videorecorded interactions over three days where children and teachers engaged in Web searching. One episode was selected that showed a teacher and two children undertaking a Web search. The episode is shown to consist of four phases: deciding on a new search subject, inputting the search query, considering the result options, and exploring the selected result. The sociological perspectives of ethnomethodology and conversation analysis were employed as the conceptual and methodological frameworks of the study, to analyse the video-recorded teacher and child interactions as they co-constructed a Web search. Ethnomethodology is concerned with how people make ‘sense’ in everyday interactions, and conversation analysis focuses on the sequential features of interaction to show how the interaction unfolds moment by moment. This extended single case analysis showed how the Web search was accomplished over multiple turns, and how the children and teacher collaboratively engaged in talk. There are four main findings. The first was that Web searching featured sustained teacher-child interaction, requiring a particular sort of classroom organisation to enable the teacher to work in this sustained way. The second finding was that the teacher’s actions recognised the children’s interactional competence in situ, orchestrating an interactional climate where everyone was heard. The third finding was that the teacher drew upon a range of interactional resources designed to progress the activity at hand, that of accomplishing the Web search. The teacher drew upon the interactional resources of interrogatives, discourse markers, and multi-unit turns during the Web search, and these assisted the teacher and children to co-construct their discussion, decide upon and co-ordinate their future actions, and accomplish the Web search in a timely way. The fourth finding explicates how particular social and pedagogic orders are accomplished through talk, where children collaborated with each other and with the teacher to complete the Web search. The study makes three key recommendations for the field of early childhood education. The study’s first recommendation is that fine-grained transcription and analysis of interaction aids in understanding interactional practices of Web searching. This study offers material for use in professional development, such as using transcribed and videorecorded interactions to highlight how teachers strategically engage with children, that is, how talk works in classroom settings. Another strategy is to focus on the social interactions of members engaging in Web searches, which is likely to be of interest to teachers as they work to engage with children in an increasingly online environment. The second recommendation involves classroom organisation; how teachers consider and plan for extended periods of time for Web searching, and how teachers accommodate children’s prior knowledge of Web searching in their classrooms. The third recommendation is in relation to future empirical research, with suggested possible topics focusing on the social interactions of children as they engage with peers as they Web search, as well as investigations of techno-literacy skills as children use the Internet in the early years.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper outlines a novel approach for modelling semantic relationships within medical documents. Medical terminologies contain a rich source of semantic information critical to a number of techniques in medical informatics, including medical information retrieval. Recent research suggests that corpus-driven approaches are effective at automatically capturing semantic similarities between medical concepts, thus making them an attractive option for accessing semantic information. Most previous corpus-driven methods only considered syntagmatic associations. In this paper, we adapt a recent approach that explicitly models both syntagmatic and paradigmatic associations. We show that the implicit similarity between certain medical concepts can only be modelled using paradigmatic associations. In addition, the inclusion of both types of associations overcomes the sensitivity to the training corpus experienced by previous approaches, making our method both more effective and more robust. This finding may have implications for researchers in the area of medical information retrieval.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Allegations of child sexual abuse in Family Court cases have gained increasing attention. The study investigates factors involved in Family Court cases involving allegations of child sexual abuse. A qualitative methodology was employed to examine Records of Judgement and Psychiatric Reports for 20 cases distilled from the data corpus of 102 cases. A seven-stage methodology was developed utilising a thematic analysis process informed by principles of grounded theory and phenomenology. The explication of eight thematic clusters was undertaken. The findings point to complex issues and dynamics in which child sexual abuse allegations have been raised. The alleging parent’s allegations of sexual abuse against their ex-partner may be: the expression of unconscious deep fears for their children’s welfare, or an action to meet their needs for personal affirmation in the context of the painful upheaval of a relationship break-up. Implications of the findings are discussed.