889 results for Corpus parallèles


Relevance: 10.00% 10.00%

Abstract:

Guaranteeing the quality of relevance features discovered in text documents to describe user preferences is a major challenge because of the large number of terms, patterns, and noise. Most popular text mining and classification methods adopt term-based approaches, but these all suffer from the problems of polysemy and synonymy. It has long been hypothesized that pattern-based methods should describe user preferences better than term-based ones, yet many experiments do not support this hypothesis. This research presents a promising method, Relevance Feature Discovery (RFD), for addressing this issue. It discovers both positive and negative patterns in text documents as high-level features in order to accurately weight low-level features (terms) based on their specificity and their distributions in the high-level features. The thesis also introduces an adaptive model (called ARFD) to enhance the flexibility of using RFD in adaptive environments. ARFD automatically updates the system's knowledge based on a sliding window over new incoming feedback documents, and it can efficiently decide which incoming documents bring new knowledge into the system. Substantial experiments using the proposed models on Reuters Corpus Volume 1 and TREC topics show that they significantly outperform both state-of-the-art term-based methods underpinned by Okapi BM25, Rocchio, or Support Vector Machines and other pattern-based methods.

Relevance: 10.00% 10.00%

Abstract:

Acquiring correct user profiles for personalized text classification is a major challenge, since users may be unsure when describing their interests. Traditional approaches to user profiling adopt machine learning (ML) to automatically discover classification knowledge from explicit user feedback describing personal interests. However, in many cases the accuracy of ML-based methods cannot be significantly improved because of the term independence assumption and the uncertainties associated with them. This paper presents a novel relevance feedback approach for personalized text classification. It applies data mining to discover knowledge from relevant and non-relevant text, and constrains specific knowledge with reasoning rules to eliminate some conflicting information. We also developed a Dempster-Shafer (DS) approach as the means to utilise the specific knowledge to build high-quality data models for classification. Experiments conducted on Reuters Corpus Volume 1 and TREC topics show that the proposed technique achieves encouraging performance in comparison with state-of-the-art relevance feedback models.

Relevance: 10.00% 10.00%

Abstract:

Purpose: This chapter investigates an episode where a supervising teacher on playground duty asks two boys to each give an account of their actions over an incident that had just occurred on some climbing equipment in the playground.
Methodology: This paper employs an ethnomethodological approach using conversation analysis. The data are taken from a corpus of video recorded interactions of children, aged 7-9 years, and the teacher, in school playgrounds during the lunch recess.
Findings: The findings show the ways that children work up accounts of their playground practices when asked by the teacher. The teacher initially provided interactional space for each child to give their version of the events. Ultimately, the teacher’s version of how to act in the playground became the sanctioned one. The children and the teacher formulated particular social orders of behavior in the playground through multi-modal devices, direct reported speech and scripts. Such public displays of talk work as socialization practices that frame teacher-sanctioned morally appropriate actions in the playground.
Value of paper: This chapter shows the pervasiveness of the teacher’s social order, as she presented an institutional social order of how to interact in the playground, showing clearly the disjunction of adult-child orders between the teacher and children.

Relevance: 10.00% 10.00%

Abstract:

Electronic services are a leitmotif in ‘hot’ topics like Software as a Service, Service Oriented Architecture (SOA), Service Oriented Computing, Cloud Computing, application markets and smart devices. We propose to consider these in what has been termed the Service Ecosystem (SES). The SES encompasses all levels of electronic services and their interaction, with human consumption and initiation on its periphery, in much the same way that the ‘Web’ describes a plethora of technologies that ultimately connect information and expose it to humans. Presently, the SES is heterogeneous, fragmented and confined to semi-closed systems. A key issue hampering the emergence of an integrated SES is Service Discovery (SD). An SES will be dynamic, with areas of structured and unstructured information within which service providers and ‘lay’ human consumers interact; until now the two have been disjointed, e.g., SOA-enabled organisations, industries and domains are choreographed by domain experts or ‘hard-wired’ to smart device application markets and web applications. In an SES, services are accessible, comparable and exchangeable to human consumers, closing the gap to the providers. This requires a new SD with which humans can discover services transparently and effectively without special knowledge or training. We propose two modes of discovery: directed search, which follows an agenda, and explorative search, which speculatively expands knowledge of an area of interest by means of categories. Inspired by conceptual space theory from cognitive science, we propose to implement the modes of discovery using concepts to map a lay consumer’s service need to terminologically sophisticated descriptions of services. To this end, we reframe SD as an information retrieval task on the information attached to services, such as descriptions, reviews, documentation and web sites, which we call the Service Information Shadow.
The Semantic Space model transforms the shadow's unstructured semantic information into a geometric, concept-like representation. We introduce an improved and extended Semantic Space that includes categorization, calling it the Semantic Service Discovery model. We evaluate our model with a highly relevant, service-related corpus simulating a Service Information Shadow, including manually constructed complex service agendas as well as manual groupings of services. We compare our model against state-of-the-art information retrieval systems and clustering algorithms. By means of an extensive series of empirical evaluations, we establish optimal parameter settings for the semantic space model. The evaluations demonstrate the model’s effectiveness for SD in terms of retrieval precision over state-of-the-art information retrieval models (directed search) and the meaningful, automatic categorization of service-related information, which shows potential to form the basis of a useful, cognitively motivated map of the SES for exploratory search.

Relevance: 10.00% 10.00%

Abstract:

The quality of the features discovered in relevance feedback (RF) is the key issue for effective search queries. Most existing feedback methods do not carefully address feature selection for noise reduction. As a result, noisy extracted features can easily degrade effectiveness. In this paper, we propose a novel feature extraction method for query formulation. The method first extracts term association patterns in RF as knowledge for feature extraction. Negative RF is then used to improve the quality of the discovered knowledge. A novel information filtering (IF) model is developed to evaluate the proposed method. Experimental results on Reuters Corpus Volume 1 and TREC topics confirm that the proposed model achieves encouraging performance compared to state-of-the-art IF models.

Relevance: 10.00% 10.00%

Abstract:

This paper develops and evaluates an enhanced corpus-based approach for semantic processing. Corpus-based models that build representations of words directly from text do not require pre-existing linguistic knowledge, and have demonstrated psychologically relevant performance on a number of cognitive tasks. However, they have been criticised in the past for not incorporating sufficient structural information. Using ideas underpinning recent attempts to overcome this weakness, we develop an enhanced tensor encoding model to build representations of word meaning for semantic processing. Our enhanced model demonstrates superior performance when compared to a robust baseline model on a number of semantic processing tasks.

Relevance: 10.00% 10.00%

Abstract:

Recent Australian early childhood policy and curriculum guidelines promoting the use of technologies invite investigations of young children’s practices in classrooms. This study examined the practices of one preparatory year classroom, to show teacher and child interactions as they engaged in Web searching. The study investigated the in situ practices of the teacher and children to show how they accomplished the Web search. The data corpus consists of eight hours of videorecorded interactions over three days where children and teachers engaged in Web searching. One episode was selected that showed a teacher and two children undertaking a Web search. The episode is shown to consist of four phases: deciding on a new search subject, inputting the search query, considering the result options, and exploring the selected result. The sociological perspectives of ethnomethodology and conversation analysis were employed as the conceptual and methodological frameworks of the study, to analyse the video-recorded teacher and child interactions as they co-constructed a Web search. Ethnomethodology is concerned with how people make ‘sense’ in everyday interactions, and conversation analysis focuses on the sequential features of interaction to show how the interaction unfolds moment by moment. This extended single case analysis showed how the Web search was accomplished over multiple turns, and how the children and teacher collaboratively engaged in talk. There are four main findings. The first was that Web searching featured sustained teacher-child interaction, requiring a particular sort of classroom organisation to enable the teacher to work in this sustained way. The second finding was that the teacher’s actions recognised the children’s interactional competence in situ, orchestrating an interactional climate where everyone was heard. 
The third finding was that the teacher drew upon a range of interactional resources designed to progress the activity at hand, that of accomplishing the Web search. The teacher drew upon the interactional resources of interrogatives, discourse markers, and multi-unit turns during the Web search, and these assisted the teacher and children to co-construct their discussion, decide upon and co-ordinate their future actions, and accomplish the Web search in a timely way. The fourth finding explicates how particular social and pedagogic orders are accomplished through talk, where children collaborated with each other and with the teacher to complete the Web search. The study makes three key recommendations for the field of early childhood education. The study’s first recommendation is that fine-grained transcription and analysis of interaction aids in understanding interactional practices of Web searching. This study offers material for use in professional development, such as using transcribed and videorecorded interactions to highlight how teachers strategically engage with children, that is, how talk works in classroom settings. Another strategy is to focus on the social interactions of members engaging in Web searches, which is likely to be of interest to teachers as they work to engage with children in an increasingly online environment. The second recommendation involves classroom organisation; how teachers consider and plan for extended periods of time for Web searching, and how teachers accommodate children’s prior knowledge of Web searching in their classrooms. The third recommendation is in relation to future empirical research, with suggested possible topics focusing on the social interactions of children as they engage with peers as they Web search, as well as investigations of techno-literacy skills as children use the Internet in the early years.

Relevance: 10.00% 10.00%

Abstract:

This paper outlines a novel approach for modelling semantic relationships within medical documents. Medical terminologies contain a rich source of semantic information critical to a number of techniques in medical informatics, including medical information retrieval. Recent research suggests that corpus-driven approaches are effective at automatically capturing semantic similarities between medical concepts, thus making them an attractive option for accessing semantic information. Most previous corpus-driven methods only considered syntagmatic associations. In this paper, we adapt a recent approach that explicitly models both syntagmatic and paradigmatic associations. We show that the implicit similarity between certain medical concepts can only be modelled using paradigmatic associations. In addition, the inclusion of both types of associations overcomes the sensitivity to the training corpus experienced by previous approaches, making our method both more effective and more robust. This finding may have implications for researchers in the area of medical information retrieval.
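
The distinction between the two association types can be illustrated with a toy sketch. This is not the model described in the abstract: the four-sentence corpus, the function names `syntagmatic` and `paradigmatic`, and the use of sentence-level co-occurrence with cosine similarity are all assumptions chosen for brevity. It shows how two terms that never appear together can still be judged similar because they share the same neighbours.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Toy corpus, invented for illustration (not the paper's data).
corpus = [
    "aspirin relieves headache",
    "ibuprofen relieves headache",
    "aspirin reduces fever",
    "ibuprofen reduces fever",
]

# Count how often each pair of words occurs in the same sentence.
cooccurrence = defaultdict(Counter)
for sentence in corpus:
    for a, b in combinations(sentence.split(), 2):
        cooccurrence[a][b] += 1
        cooccurrence[b][a] += 1


def syntagmatic(a, b):
    """Syntagmatic association: the two words appear together."""
    return cooccurrence[a][b]


def paradigmatic(a, b):
    """Paradigmatic association: cosine similarity of the words' neighbour profiles."""
    va, vb = cooccurrence[a], cooccurrence[b]
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = sum(c * c for c in va.values()) ** 0.5
    norm_b = sum(c * c for c in vb.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


print(syntagmatic("aspirin", "ibuprofen"))   # never co-occur in this corpus
print(paradigmatic("aspirin", "ibuprofen"))  # but share every neighbour
```

In this toy corpus the two drug names have no syntagmatic association at all, so only the paradigmatic measure relates them.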

Relevance: 10.00% 10.00%

Abstract:

Allegations of child sexual abuse in Family Court cases have gained increasing attention. The study investigates factors involved in Family Court cases involving allegations of child sexual abuse. A qualitative methodology was employed to examine Records of Judgement and Psychiatric Reports for 20 cases distilled from the data corpus of 102 cases. A seven-stage methodology was developed utilising a thematic analysis process informed by principles of grounded theory and phenomenology. The explication of eight thematic clusters was undertaken. The findings point to complex issues and dynamics in which child sexual abuse allegations have been raised. The alleging parent’s allegations of sexual abuse against their ex-partner may be: the expression of unconscious deep fears for their children’s welfare, or an action to meet their needs for personal affirmation in the context of the painful upheaval of a relationship break-up. Implications of the findings are discussed.

Relevance: 10.00% 10.00%

Abstract:

Children’s Literature Digital Resources incorporates primary texts published from white settlement to 1945, including children’s and young adult fiction, poetry, short stories, and picture books. This collection is supported by selected secondary material. The objective is to provide a centralised access point for information about Australian children's literature and writers and a growing body of full-text primary resources. Four key aims are:
* To establish an important digital facility for research, teaching, and information provision around Australian children’s literature;
* To provide access to a wide range of high-quality full-text data, both primary and secondary resources;
* To provide access to essential library and research information infrastructure and facilities for established and emerging researchers in the fields of Humanities and Education;
* To enable research while preserving important heritage material.
The collection contains texts digitised for AustLit through cooperation with various Australian libraries. The collection includes children’s and young adult fiction, poetry, picture books, short stories, and critical articles relating to relevant primary texts. Authors of primary sources include Irene Cheyne, E. W. Cole, Richard Rowe, Lillian M. Pyke, and Dorothy Wall. Secondary sources include critical works by Clare Bradford, Heather Scutter, Kerry White, Sharyn Pearce, and Marcie Muir. These full-text materials are keyword searchable (both within individual texts and across the CLDR corpus) and can be downloaded for research purposes. As well as digitising primary and secondary material, the project locates and provides pathways to existing online resources or internet publications to enhance AustLit's Children's Literature subset. These resources include both primary and secondary texts.

Relevance: 10.00% 10.00%

Abstract:

The aim of this paper is to compare various algorithms and parameters for building reduced semantic spaces. The effect of dimension reduction, the stability of the representation and the effect of word order are examined in the context of five algorithms bearing on semantic vectors: random projection (RP), singular value decomposition (SVD), non-negative matrix factorization (NMF), permutations and holographic reduced representations (HRR). The quality of semantic representation was tested by means of a synonym-finding task using the TOEFL test on the TASA corpus. Dimension reduction was found to improve the quality of semantic representation, but optimal parameter settings are hard to find. Even though dimension reduction by RP was found to be more generally applicable than SVD, the semantic vectors produced by RP are somewhat unstable. Encoding word order into the semantic vector representation via HRR did not lead to any increase in scores over vectors constructed from word co-occurrence in context information. In this regard, very small context windows resulted in better semantic vectors for the TOEFL test.
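
As a rough illustration of one of the techniques named above, random projection can be sketched in a few lines of plain Python. The toy co-occurrence counts, the function names, the choice of a sign-based projection matrix, and the target dimensionality of 3 are all assumptions made for this example; the abstract does not specify the paper's actual settings.

```python
import random


def random_projection_matrix(n_features, n_dims, seed=0):
    """Return an n_features x n_dims matrix with entries +/- 1/sqrt(n_dims)."""
    rng = random.Random(seed)
    scale = 1.0 / (n_dims ** 0.5)
    return [[rng.choice((-scale, scale)) for _ in range(n_dims)]
            for _ in range(n_features)]


def project(vector, matrix):
    """Map a high-dimensional count vector into the reduced space."""
    n_dims = len(matrix[0])
    return [sum(v * row[d] for v, row in zip(vector, matrix))
            for d in range(n_dims)]


# Toy co-occurrence counts over a 6-word context vocabulary (invented data).
word_vectors = {
    "car":  [4, 3, 0, 0, 1, 0],
    "auto": [3, 4, 0, 1, 0, 0],
    "tree": [0, 0, 5, 3, 0, 2],
}

matrix = random_projection_matrix(n_features=6, n_dims=3, seed=42)
reduced = {word: project(vec, matrix) for word, vec in word_vectors.items()}
print(reduced)
```

Changing `seed` produces a different projection matrix and therefore different reduced vectors, so any experiment built on a sketch like this would need to fix or average over seeds.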

Relevance: 10.00% 10.00%

Abstract:

In this paper, we propose an approach which attempts to solve the problem of surveillance event detection, assuming that we know the definition of the events. To facilitate the discussion, we first define two concepts. The event of interest refers to the event that the user requests the system to detect, and the background activities are any other events in the video corpus. This is an unsolved problem due to many factors, as listed below:
1) Occlusions and clustering: Surveillance scenes of significant interest, at locations such as airports, railway stations and shopping centers, are often crowded, and occlusions and clustering of people are frequently encountered. This significantly affects the feature extraction step; for instance, trajectories generated by object tracking algorithms are usually not robust in such situations.
2) The requirement for real-time detection: The system should process the video fast enough in both the feature extraction and the detection steps to facilitate real-time operation.
3) Massive size of the training data set: Suppose there is an event that lasts for 1 minute in a video with a frame rate of 25 fps; the number of frames for this event is 60 × 25 = 1500. If we want a training data set with many positive instances of the event, the video is likely to be very large (i.e. hundreds of thousands of frames or more). How to handle such a large data set is a problem frequently encountered in this application.
4) Difficulty in separating the event of interest from background activities: The events of interest often co-exist with a set of background activities. Temporal ground truth is typically very ambiguous, as it does not distinguish the event of interest from a wide range of co-existing background activities. However, it is not practical to annotate the locations of the events in large amounts of video data. This problem becomes more serious in the detection of multi-agent interactions, since the location of these events often cannot be constrained to within a bounding box.
5) Challenges in determining the temporal boundaries of the events: An event can occur at any arbitrary time with an arbitrary duration. The temporal segmentation of events is difficult and ambiguous, and is also affected by other factors such as occlusions.
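
The frame-count arithmetic in item 3 can be reproduced with a short Python sketch. The function name `frames_for_event` and the figure of 200 positive instances are assumptions introduced for illustration; only the 1-minute duration and the 25 fps frame rate come from the text.

```python
def frames_for_event(duration_seconds: float, fps: float) -> int:
    """Number of video frames spanned by an event of the given duration."""
    return round(duration_seconds * fps)


# The example from the text: a 1-minute event at 25 fps.
one_event = frames_for_event(60, 25)

# Hypothetical training set with 200 positive instances of that event.
positive_instances = 200
total_positive_frames = positive_instances * one_event

print(one_event, total_positive_frames)
```

Under these assumed numbers the positive instances alone already span hundreds of thousands of frames, before any background footage is counted.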

Relevance: 10.00% 10.00%

Abstract:

With the overwhelming increase in the amount of text on the web, it is almost impossible for people to keep abreast of up-to-date information. Text mining is a process by which interesting information is derived from text through the discovery of patterns and trends. Text mining algorithms are used to guarantee the quality of extracted knowledge. However, the patterns extracted by text or data mining algorithms are often noisy and inconsistent. Thus, different challenges arise, such as how to understand these patterns, whether the model that has been used is suitable, and whether all the extracted patterns are relevant. Furthermore, the research raises the question of how to give a correct weight to the extracted knowledge. To address these issues, this paper presents a text post-processing method, which uses a pattern co-occurrence matrix to find the relations between extracted patterns in order to reduce noisy patterns. The main objective of this paper is not only to reduce the number of closed sequential patterns but also to improve the performance of pattern mining. The experimental results on the Reuters Corpus Volume 1 data collection and TREC filtering topics show that the proposed method is promising.
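
One way to picture a pattern co-occurrence matrix is the following minimal sketch. It is not the paper's method: patterns are simplified to unordered term sets rather than closed sequential patterns, the documents and patterns are invented, and the pruning rule in `prune` (drop a pattern that never co-occurs with any other pattern) is a hypothetical stand-in for whatever noise criterion the authors use.

```python
from itertools import combinations

# Invented toy data: each document is a set of terms.
documents = [
    {"stock", "market", "crash"},
    {"stock", "market", "rally"},
    {"stock", "market", "crash", "bank"},
    {"weather", "rain"},
]

# Invented patterns, simplified to unordered term sets.
patterns = [
    frozenset({"stock", "market"}),
    frozenset({"market", "crash"}),
    frozenset({"weather", "rain"}),
]


def supporting_docs(pattern):
    """Indices of the documents that contain every term of the pattern."""
    return {i for i, doc in enumerate(documents) if pattern <= doc}


support = {p: supporting_docs(p) for p in patterns}

# cooc[p][q] = number of documents in which both patterns occur.
cooc = {p: {q: 0 for q in patterns} for p in patterns}
for p, q in combinations(patterns, 2):
    shared = len(support[p] & support[q])
    cooc[p][q] = cooc[q][p] = shared


def prune(min_cooccurrence=1):
    """Keep patterns that co-occur with at least one other pattern (assumed rule)."""
    return [p for p in patterns if max(cooc[p].values()) >= min_cooccurrence]


print(prune())
```

With this toy data the isolated weather pattern is dropped, while the two related finance patterns, which share two documents, are kept.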

Relevance: 10.00% 10.00%

Abstract:

Finding and labelling semantic feature patterns of documents in a large, spatial corpus is a challenging problem. Text documents have characteristics that make semantic labelling difficult, and the rapidly increasing volume of online documents creates a bottleneck in finding meaningful textual patterns. To deal with these issues, we propose an unsupervised document labelling approach based on semantic content and feature patterns. A world ontology with extensive topic coverage is exploited to supply controlled, structured subjects for labelling. An algorithm is also introduced to reduce dimensionality based on the study of ontological structure. The proposed approach was evaluated against typical machine learning methods including SVMs, Rocchio, and kNN, with promising results.

Relevance: 10.00% 10.00%

Abstract:

The disparity that exists between the highest and lowest achievers together with deficit approaches to teaching, learning and assessment raise serious equity issues related to fairness, validity, culture and access, which were analysed in a recent Australian Research Council funded project. This chapter explores the potential that exists for teachers to work with Indigenous Teacher Assistants (ITAs) to secure cultural connectedness in teaching, learning and assessment of Indigenous students. The study was a design experiment, which was conducted in seven Catholic and Independent primary schools in northern Queensland and involved semi-structured focus group interviews with Year 4 and 6 Indigenous students, principals, teachers and Indigenous Teacher Assistants. Classroom observations and document analyses were also conducted. This corpus of data was analysed using a sociocultural theoretical lens. The use of a sociocultural analysis helped to identify cultural influences, Indigenous students’ funds of knowledge and values. The information from this analysis was made explicit to teachers to demonstrate how they could enhance their pedagogic and assessment practices by embracing and extending the cultural spaces for learning and teaching of Indigenous students. The way in which teachers construct their interactions for greater cultural connectedness and enhanced learning would appear to rely on relationship building with Indigenous staff, Indigenous students’ cultural knowledge, and improved understanding of assessment and related equity issues.