883 results for corpus diacrônico


Relevance:

10.00%

Publisher:

Abstract:

Children’s Literature Digital Resources incorporates primary texts published from white settlement to 1945, including children’s and young adult fiction, poetry, short stories, and picture books. This collection is supported by selected secondary material. The objective is to provide a centralised access point for information about Australian children’s literature and writers and a growing body of full-text primary resources. The four key aims are:

* To establish an important digital facility for research, teaching, and information provision around Australian children’s literature;
* To provide access to a wide range of high-quality full-text data, both primary and secondary resources;
* To provide access to essential library and research information infrastructure and facilities for established and emerging researchers in the fields of Humanities and Education;
* To enable research while preserving important heritage material.

The collection contains texts digitised for AustLit through cooperation with various Australian libraries. The collection includes children’s and young adult fiction, poetry, picture books, short stories, and critical articles relating to relevant primary texts. Authors of primary sources include Irene Cheyne, E. W. Cole, Richard Rowe, Lillian M. Pyke, and Dorothy Wall. Secondary sources include critical works by Clare Bradford, Heather Scutter, Kerry White, Sharyn Pearce, and Marcie Muir. These full-text materials are keyword searchable (both within individual texts and across the CLDR corpus) and can be downloaded for research purposes. As well as digitising primary and secondary material, the project locates and provides pathways to existing online resources or internet publications to enhance AustLit's Children's Literature subset. These resources include both primary and secondary texts.

Relevance:

10.00%

Publisher:

Abstract:

The aim of this paper is to provide a comparison of various algorithms and parameter settings for building reduced semantic spaces. The effect of dimension reduction, the stability of the representation, and the effect of word order are examined in the context of five algorithms for producing semantic vectors: random projection (RP), singular value decomposition (SVD), non-negative matrix factorization (NMF), permutations, and holographic reduced representations (HRR). The quality of the semantic representation was tested by means of a synonym-finding task using the TOEFL test on the TASA corpus. Dimension reduction was found to improve the quality of the semantic representation, but it is hard to find optimal parameter settings. Even though dimension reduction by RP was found to be more generally applicable than SVD, the semantic vectors produced by RP are somewhat unstable. Encoding word order into the semantic vector representation via HRR did not lead to any increase in scores over vectors constructed from word co-occurrence-in-context information. In this regard, very small context windows resulted in better semantic vectors for the TOEFL test.
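As a rough illustration of the RP/SVD contrast described above, the following Python sketch builds both kinds of reduced vectors from a toy word-by-context co-occurrence matrix and scores a synonym pair by cosine similarity. The vocabulary, counts, and dimensions are invented for illustration and are not the paper's experimental setup.

```python
# Sketch: reduced semantic vectors via random projection (RP) vs. truncated SVD,
# compared on a TOEFL-style synonym check. Toy data; illustrative only.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["car", "automobile", "flower", "petal"]
# Toy word-by-context co-occurrence counts (rows: words, cols: contexts).
counts = np.array([
    [4, 3, 0, 1],
    [3, 4, 1, 0],
    [0, 1, 5, 4],
    [1, 0, 4, 5],
], dtype=float)

k = 2  # reduced dimensionality

# Random projection: multiply by a random matrix; cheap, but different seeds
# give different (approximately distance-preserving) spaces -- the instability
# the paper notes.
R = rng.standard_normal((counts.shape[1], k)) / np.sqrt(k)
rp_vectors = counts @ R

# Truncated SVD: keep the top-k singular directions.
U, s, Vt = np.linalg.svd(counts, full_matrices=False)
svd_vectors = U[:, :k] * s[:k]

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# The synonym pair should score higher than an unrelated pair in either space.
print(cosine(svd_vectors[0], svd_vectors[1]))  # car vs. automobile (high)
print(cosine(svd_vectors[0], svd_vectors[2]))  # car vs. flower (low)
```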

Relevance:

10.00%

Publisher:

Abstract:

In this paper, we propose an approach which attempts to solve the problem of surveillance event detection, assuming that we know the definition of the events. To facilitate the discussion, we first define two concepts: the event of interest refers to the event that the user requests the system to detect, and the background activities are any other events in the video corpus. This is an unsolved problem due to the many factors listed below:

1) Occlusions and clustering: Surveillance scenes of significant interest, at locations such as airports, railway stations, and shopping centres, are often crowded, and occlusions and clustering of people are frequently encountered. This significantly affects the feature extraction step; for instance, trajectories generated by object tracking algorithms are usually not robust in such situations.

2) The requirement for real-time detection: The system should process the video fast enough, in both the feature extraction and detection steps, to facilitate real-time operation.

3) Massive size of the training data set: Suppose an event lasts for 1 minute in a video with a frame rate of 25 fps; the number of frames for this event is 60 × 25 = 1500. If we want a training data set with many positive instances of the event, the video is likely to be very large (i.e., hundreds of thousands of frames or more). How to handle such a large data set is a problem frequently encountered in this application.

4) Difficulty in separating the event of interest from background activities: The events of interest often co-exist with a set of background activities. Temporal groundtruth is typically very ambiguous, as it does not distinguish the event of interest from a wide range of co-existing background activities. However, it is not practical to annotate the locations of the events in large amounts of video data. This problem becomes more serious in the detection of multi-agent interactions, since the location of these events often cannot be constrained to within a bounding box.

5) Challenges in determining the temporal boundaries of the events: An event can occur at any arbitrary time with an arbitrary duration. The temporal segmentation of events is difficult and ambiguous, and is also affected by other factors such as occlusions.

Relevance:

10.00%

Publisher:

Abstract:

With the overwhelming increase in the amount of text on the web, it is almost impossible for people to keep abreast of up-to-date information. Text mining is a process by which interesting information is derived from text through the discovery of patterns and trends, and text mining algorithms are used to guarantee the quality of the extracted knowledge. However, the patterns extracted by text or data mining methods are often noisy and inconsistent. Thus, different challenges arise, such as how to understand these patterns, whether the model that has been used is suitable, and whether all the patterns that have been extracted are relevant. Furthermore, the research raises the question of how to assign a correct weight to the extracted knowledge. To address these issues, this paper presents a text post-processing method, which uses a pattern co-occurrence matrix to find the relations between extracted patterns in order to reduce noisy patterns. The main objective of this paper is not only to reduce the number of closed sequential patterns, but also to improve the performance of pattern mining. The experimental results on the Reuters Corpus Volume 1 data collection and TREC filtering topics show that the proposed method is promising.
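As a minimal sketch of the general idea (not the paper's exact method), the Python snippet below builds a pattern co-occurrence matrix over documents and prunes patterns that never co-occur strongly with any other pattern. The pattern names, documents, and threshold are all hypothetical.

```python
# Sketch: post-processing mined patterns with a pattern co-occurrence matrix.
# Patterns that rarely co-occur with any other pattern are treated as noise.
from itertools import combinations
import numpy as np

# Each document is represented by the set of patterns discovered in it.
docs_patterns = [
    {"p1", "p2"},
    {"p1", "p2", "p3"},
    {"p3", "p4"},
    {"p5"},          # p5 never co-occurs with anything: a noise candidate
]
patterns = sorted({p for doc in docs_patterns for p in doc})
idx = {p: i for i, p in enumerate(patterns)}

# Symmetric co-occurrence matrix C: C[i, j] counts the documents containing
# both pattern i and pattern j.
C = np.zeros((len(patterns), len(patterns)), dtype=int)
for doc in docs_patterns:
    for a, b in combinations(sorted(doc), 2):
        C[idx[a], idx[b]] += 1
        C[idx[b], idx[a]] += 1

# Prune patterns whose strongest co-occurrence falls below a threshold.
min_cooccurrence = 1
kept = [p for p in patterns if C[idx[p]].max() >= min_cooccurrence]
print(kept)  # ['p1', 'p2', 'p3', 'p4'] -- p5 is dropped as noise
```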

Relevance:

10.00%

Publisher:

Abstract:

Finding and labelling semantic feature patterns of documents in a large, spatial corpus is a challenging problem. Text documents have characteristics that make semantic labelling difficult, and the rapidly increasing volume of online documents creates a bottleneck in finding meaningful textual patterns. To deal with these issues, we propose an unsupervised document labelling approach based on semantic content and feature patterns. A world ontology with extensive topic coverage is exploited to supply controlled, structured subjects for labelling. An algorithm is also introduced to reduce dimensionality based on the study of the ontological structure. The proposed approach was evaluated, with promising results, against typical machine learning methods including SVMs, Rocchio, and kNN.
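To make the general idea of labelling against controlled subjects concrete, here is a hedged Python sketch: each ontology subject is represented by a small bag of terms (the two subjects below are hypothetical stand-ins, not the world ontology used in the paper), and a document is assigned its nearest subject by cosine similarity over TF-IDF vectors. This illustrates the flavour of unsupervised subject labelling only, not the paper's specific algorithm.

```python
# Sketch: unsupervised labelling against controlled subjects via TF-IDF cosine.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

subjects = {
    "Astronomy": "telescope star planet orbit galaxy",
    "Medicine": "patient clinical treatment disease therapy",
}
document = "the clinical trial assessed a new treatment for the disease"

vectorizer = TfidfVectorizer()
# Fit on subjects plus the document so all vectors share one vocabulary.
matrix = vectorizer.fit_transform(list(subjects.values()) + [document])
subject_vecs, doc_vec = matrix[:-1], matrix[-1]

scores = cosine_similarity(doc_vec, subject_vecs)[0]
best = max(zip(subjects, scores), key=lambda x: x[1])
print(best)  # ('Medicine', <similarity score>)
```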

Relevance:

10.00%

Publisher:

Abstract:

The disparity that exists between the highest and lowest achievers, together with deficit approaches to teaching, learning and assessment, raises serious equity issues related to fairness, validity, culture and access, which were analysed in a recent Australian Research Council funded project. This chapter explores the potential that exists for teachers to work with Indigenous Teacher Assistants (ITAs) to secure cultural connectedness in the teaching, learning and assessment of Indigenous students. The study was a design experiment conducted in seven Catholic and Independent primary schools in northern Queensland, and involved semi-structured focus group interviews with Year 4 and 6 Indigenous students, principals, teachers and Indigenous Teacher Assistants. Classroom observations and document analyses were also conducted. This corpus of data was analysed using a sociocultural theoretical lens. The use of a sociocultural analysis helped to identify cultural influences, Indigenous students’ funds of knowledge, and values. The information from this analysis was made explicit to teachers to demonstrate how they could enhance their pedagogic and assessment practices by embracing and extending the cultural spaces for learning and teaching of Indigenous students. The way in which teachers construct their interactions for greater cultural connectedness and enhanced learning would appear to rely on relationship building with Indigenous staff, Indigenous students’ cultural knowledge, and improved understanding of assessment and related equity issues.

Relevance:

10.00%

Publisher:

Abstract:

In this thesis, I contribute to the study of how arrangements are made in social interaction. Using conversation analysis, I examine a corpus of 375 telephone calls between employees and clients of three Community Home Care (CHC) service agencies in metropolitan Adelaide, South Australia. My analysis of the CHC data corpus draws upon existing empirical findings within conversation analysis in order to generate novel findings about how people make arrangements with one another, and some of the attendant considerations that parties to such an activity can engage in:

Prospective informings as remote proposals for a future arrangement – Focusing on how employees make arrangements with clients, I show how the employees in the CHC data corpus use ‘prospective informings’ to detail a future course of action that will involve the recipient of that informing. These informings routinely occasion a double-paired sequence, where informers pursue a response to their informing. This pursuit often occurs even after recipients have provided an initial response. This practice for making arrangements has been previously described by Houtkoop (1987) as ‘remote proposing.’ I develop Houtkoop’s analysis to show how an informing of a future arrangement can be recompleted, with response solicitation, as a proposal that is contingent upon a recipient’s acceptance.

Participants’ understanding of references to non-present third parties – In the process of making arrangements, references are routinely made to non-present third parties. In the CHC data corpus, these third parties are usually care workers. Prior research (e.g., Sacks & Schegloff, 1979; Schegloff, 1996b) explains how the use of ‘recognitional references’ (such as the bare name ‘Kerry’) conveys to recipients that they should be able to locate the referent from amongst their acquaintances. Conversely, the use of ‘non-recognitional references’ (such as the description ‘a lady called Kerry’) conveys that recipients are unacquainted with the referent. I examine instances where the selection of a recognitional or non-recognitional reference form is followed by a recipient initiating repair on that reference. My analysis provides further evidence that the existing analytic account of these references corresponds to the way in which participants themselves make sense of them. My analysis also advances an understanding of how repair can be used, by recipients, to indicate the inappositeness of a prior turn.

Post-possible-completion accounts – In a case study of a problematic interaction, I examine a misunderstanding that is not resolved within the repair space, the usual defence of intersubjectivity in interaction (cf. Schegloff, 1992b). Rather, I explore how the source of trouble is addressed, outside of the sequence of its production, with a ‘post-possible-completion account.’ This account specifies the basis of a misunderstanding and yet, unlike repair, does so without occasioning a revised response to a trouble-source turn.

By considering various aspects of making arrangements in social interaction, I highlight some of the rich order that underpins the maintenance of human relationships across time. In the concluding section of this thesis I review this order, while also discussing practical implications of this analysis for CHC practice.

Relevance:

10.00%

Publisher:

Abstract:

This project was a step forward in developing and evaluating a novel mathematical model that can deduce the meaning of words based on their use in language. This model can be applied to a wide range of natural language applications, including the information-seeking process most of us undertake on a daily basis.

Relevance:

10.00%

Publisher:

Abstract:

A user’s query is considered to be an imprecise description of their information need. Automatic query expansion is the process of reformulating the original query with the goal of improving retrieval effectiveness. Many successful query expansion techniques ignore information about the dependencies that exist between words in natural language. However, more recent approaches have demonstrated that by explicitly modeling associations between terms significant improvements in retrieval effectiveness can be achieved over those that ignore these dependencies. State-of-the-art dependency-based approaches have been shown to primarily model syntagmatic associations. Syntagmatic associations infer a likelihood that two terms co-occur more often than by chance. However, structural linguistics relies on both syntagmatic and paradigmatic associations to deduce the meaning of a word. Given the success of dependency-based approaches and the reliance on word meanings in the query formulation process, we argue that modeling both syntagmatic and paradigmatic information in the query expansion process will improve retrieval effectiveness. This article develops and evaluates a new query expansion technique that is based on a formal, corpus-based model of word meaning that models syntagmatic and paradigmatic associations. We demonstrate that when sufficient statistical information exists, as in the case of longer queries, including paradigmatic information alone provides significant improvements in retrieval effectiveness across a wide variety of data sets. More generally, when our new query expansion approach is applied to large-scale web retrieval it demonstrates significant improvements in retrieval effectiveness over a strong baseline system, based on a commercial search engine.
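To make the two association types concrete, here is a minimal Python sketch under simplified textbook definitions (not the article's formal model): syntagmatic neighbours co-occur with the query term in the same window, while paradigmatic neighbours share co-occurrence profiles with it and could substitute for it. The corpus and window size are toy choices.

```python
# Sketch: syntagmatic vs. paradigmatic expansion candidates from co-occurrence.
from collections import Counter, defaultdict

corpus = [
    "the doctor treated the patient",
    "the physician treated the patient",
    "the doctor examined the wound",
]
window = 2
cooc = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for i, w in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if i != j:
                cooc[w][words[j]] += 1

def syntagmatic(term, k=3):
    # Terms that frequently appear alongside the query term.
    return cooc[term].most_common(k)

def paradigmatic(term, k=3):
    # Terms whose co-occurrence profile overlaps with the query term's.
    scores = {
        w: sum((cooc[term] & cooc[w]).values())
        for w in cooc if w != term
    }
    return sorted(scores.items(), key=lambda x: -x[1])[:k]

print(syntagmatic("doctor"))   # co-occurrents such as "treated"
print(paradigmatic("doctor"))  # "physician" should rank highly
```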

Relevance:

10.00%

Publisher:

Abstract:

Our task is to consider the evolving perspectives around curriculum documented in the Theory Into Practice (TIP) corpus to date. The 50 years in question, 1962–2012, account for approximately half the history of mass institutionalized schooling. Over this time, the upper age of compulsory schooling has crept up, stretching the school curriculum's reach, purpose, and clientele. These years also span remarkable changes in the social fabric, challenging deep senses of the nature and shelf-life of knowledge, whose knowledge counts, what science can and cannot deliver, and the very purpose of education. The school curriculum is a key social site where these challenges have to be addressed in a very practical sense, through a design on the future implemented within the resources and politics of the present. The task's metaphor of ‘evolution’ may invoke a sense of gradual cumulative improvement, but equally connotes mutation, hybridization, extinction, survival of the fittest, and environmental pressures. Viewed in this way, curriculum theory and practice cannot be isolated and studied in laboratory conditions—there is nothing natural, neutral, or self-evident about what knowledge gets selected into the curriculum. Rather, the process of selection unfolds as a series of messy, politically contaminated, lived experiments; thus curriculum studies require field work in dynamic open systems. We subscribe to Raymond Williams' approach to social change, which he argues is not absolute and abrupt, one set of ideas neatly replacing the other. For Williams, newly emergent ideas have to compete against the dominant mindset and residual ideas “still active in the cultural process” (Williams, 1977, p. 122). This means ongoing debates. For these reasons, we join Schubert (1992) in advocating “continuous reconceptualising of the flow of experience” (p. 238) by both researchers and practitioners.

Relevance:

10.00%

Publisher:

Abstract:

Football, or soccer as it is more commonly referred to in Australia and the US, is arguably the world’s most popular sport, and it generates a proportionate volume of related writing. Within this landscape, works of novel-length fiction are seemingly rare. This paper establishes and maps a substantial body of football fiction and explores the elements and qualities these works exhibit individually and collectively. In bringing together current, limited surveys of the field, it presents the first rigorous definition of football fiction and captures the first historiography of the corpus. Drawing on distant reading methods developed in conjunction with closer textual analyses, the historiography and subsequent taxonomy represent the first articulation of relationships across the body of work, identify growth areas, and establish a number of movements and trends. In advancing the understanding of football fiction as a collective body, the paper lays foundations for further research and consideration of the works in generic terms.

Relevance:

10.00%

Publisher:

Abstract:

This paper details the participation of the Australian e-Health Research Centre (AEHRC) in the ShARe/CLEF 2013 eHealth Evaluation Lab, Task 3. This task aims to evaluate the use of information retrieval (IR) systems to aid consumers (e.g. patients and their relatives) in seeking health advice on the Web. Our submissions to the ShARe/CLEF challenge are based on language models generated from the web corpus provided by the organisers. Our baseline system is a standard Dirichlet-smoothed language model. We enhance the baseline by identifying and correcting spelling mistakes in queries, as well as expanding acronyms using AEHRC's Medtex medical text analysis platform. We then consider the readability and the authoritativeness of web pages to further enhance the quality of the document ranking. Measures of readability are integrated into the language models used for retrieval via prior probabilities. Prior probabilities are also used to encode authoritativeness information derived from a list of top-100 consumer health websites. Empirical results show that correcting spelling mistakes and expanding acronyms found in queries significantly improves the effectiveness of the language model baseline. Readability priors seem to increase retrieval effectiveness for graded relevance at early ranks (nDCG@5, but not precision), but no improvements are found at later ranks or when considering binary relevance. The authoritativeness prior does not appear to provide retrieval gains over the baseline; this is likely because of the small overlap between websites in the corpus and those in the top-100 consumer-health websites we acquired.
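The general recipe described above can be sketched in a few lines of Python: standard Dirichlet-smoothed query likelihood, with a document prior entering the score as log P(d). The toy collection, mu value, and prior values below are illustrative stand-ins (here the prior mimics a normalised readability score); this is not the AEHRC system itself.

```python
# Sketch: Dirichlet-smoothed query likelihood with a document prior.
import math
from collections import Counter

docs = {
    "d1": "diabetes treatment and diet advice for patients".split(),
    "d2": "clinical pharmacokinetics of antidiabetic agents".split(),
}
priors = {"d1": 0.7, "d2": 0.3}  # e.g. normalised readability scores (toy)
mu = 2000.0

collection = Counter(w for d in docs.values() for w in d)
coll_len = sum(collection.values())

def score(query, doc_id):
    tf = Counter(docs[doc_id])
    dlen = len(docs[doc_id])
    s = math.log(priors[doc_id])  # the prior enters as log P(d)
    for q in query.split():
        p_coll = collection[q] / coll_len
        if p_coll == 0:
            continue  # term unseen in the collection; skipped in this toy example
        # Dirichlet smoothing: p(q|d) = (tf + mu * p(q|C)) / (|d| + mu)
        s += math.log((tf[q] + mu * p_coll) / (dlen + mu))
    return s

for d in docs:
    print(d, score("diabetes diet", d))  # d1 should outrank d2
```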

Relevance:

10.00%

Publisher:

Abstract:

We propose a cluster ensemble method to map the corpus documents into the semantic space embedded in Wikipedia and group them using multiple types of feature space. A heterogeneous cluster ensemble is constructed with multiple types of relations, i.e., document-term, document-concept, and document-category. A final clustering solution is obtained by exploiting associations between document pairs and the hubness of the documents. Empirical analysis with various real data sets reveals that the proposed method outperforms state-of-the-art text clustering approaches.
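One common way to realise a cluster ensemble over multiple feature spaces is via a co-association matrix, sketched below in Python (requires scikit-learn ≥ 1.2 for the `metric` parameter). The random matrices stand in for document-term, document-concept, and document-category representations; hubness weighting and the paper's specific combination scheme are omitted, so this is a generic illustration rather than the proposed method.

```python
# Sketch: consensus clustering via a co-association matrix over feature spaces.
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering

rng = np.random.default_rng(0)
n_docs = 6
# Toy stand-ins for document-term, document-concept, document-category matrices.
feature_spaces = [rng.random((n_docs, d)) for d in (20, 10, 5)]

coassoc = np.zeros((n_docs, n_docs))
for X in feature_spaces:
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    # Count how often each document pair lands in the same cluster.
    coassoc += (labels[:, None] == labels[None, :]).astype(float)
coassoc /= len(feature_spaces)

# Final clustering on the ensemble's distance (1 - co-association).
final = AgglomerativeClustering(
    n_clusters=2, metric="precomputed", linkage="average"
).fit_predict(1.0 - coassoc)
print(final)
```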

Relevance:

10.00%

Publisher:

Abstract:

Over the last decade, the majority of existing search techniques have been either keyword-based or category-based, resulting in unsatisfactory effectiveness. Meanwhile, studies have illustrated that more than 80% of users prefer personalized search results. As a result, many studies have devoted a great deal of effort (in what is referred to as collaborative filtering) to investigating personalized notions for enhancing retrieval performance. One of the fundamental yet most challenging steps is to capture precise user information needs. Most Web users are inexperienced or lack the capability to express their needs properly, whereas existing retrieval systems are highly sensitive to vocabulary. Researchers have increasingly proposed the utilization of ontology-based techniques to improve current mining approaches. The related techniques are not only able to refine search intentions among specific generic domains, but also to access new knowledge by tracking semantic relations. In recent years, some researchers have attempted to build ontological user profiles according to discovered user background knowledge. The knowledge is considered in both global and local analyses, which aim to produce tailored ontologies from a group of concepts. However, a key problem that has not been addressed is how to accurately match diverse local information to universal global knowledge. This research conducts a theoretical study on the use of personalized ontologies to enhance text mining performance. The objective is to understand user information needs via a "bag-of-concepts" rather than "words". The concepts are gathered from a general world knowledge base named the Library of Congress Subject Headings. To return desirable search results, a novel ontology-based mining approach is introduced to discover accurate search intentions and learn personalized ontologies as user profiles. The approach can not only pinpoint users' individual intentions in a rough hierarchical structure, but can also interpret their needs via a set of acknowledged concepts. Along with the global and local analyses, a solid concept matching approach is carried out to address the mismatch between local information and world knowledge. Relevance features, produced by the Relevance Feature Discovery model, are used as representatives of local information. These features have been proven to be the best alternative to user queries for avoiding ambiguity, and they consistently outperform the features extracted by other filtering models. The two proposed approaches are both evaluated in a scientific evaluation with the standard Reuters Corpus Volume 1 testing set. A comprehensive comparison is made with a number of state-of-the-art baseline models, including TF-IDF, Rocchio, Okapi BM25, the deploying Pattern Taxonomy Model, and an ontology-based model. The gathered results indicate that the top precision can be improved remarkably with the proposed ontology mining approach, and that the matching approach is successful and achieves significant improvements on most information filtering measures. This research contributes to the fields of ontological filtering, user profiling, and knowledge representation. The related outputs are critical when systems are expected to return proper mining results and provide personalized services. The scientific findings have the potential to facilitate the design of advanced preference mining models, which impact people's daily lives.
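As a hedged illustration of the "bag-of-concepts" idea, the Python sketch below maps weighted relevance features extracted from a user's local documents onto subjects from a world knowledge base. The two subjects and their term sets are tiny hypothetical stand-ins for Library of Congress Subject Headings entries, and the summed-weight scoring is illustrative rather than the thesis's actual matching model.

```python
# Sketch: matching local relevance features to global subjects (bag-of-concepts).
subjects = {
    "Machine learning": {"learning", "classifier", "training"},
    "Information retrieval": {"query", "retrieval", "ranking"},
}
# Weighted relevance features, e.g. from a relevance feature discovery step.
local_features = {"query": 0.9, "ranking": 0.7, "training": 0.4}

def concept_profile(features, subjects):
    """Score each subject by the summed weight of its overlapping features."""
    profile = {}
    for subject, terms in subjects.items():
        score = sum(w for t, w in features.items() if t in terms)
        if score > 0:
            profile[subject] = score
    return profile

print(concept_profile(local_features, subjects))
# {'Machine learning': 0.4, 'Information retrieval': 1.6}
```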

Relevance:

10.00%

Publisher:

Abstract:

Objective: To determine stage-specific and average disability weights (DWs) of malignant neoplasms, and to provide support and evidence for studies on the burden of cancer and policy development in Shandong province. Methods: The health status of each cancer patient identified during the 2007 cancer prevalence survey in Shandong was investigated. In line with the GBD methodology for estimating DWs, the extent of disability of every case was classified and evaluated according to the six-class disability classification, and the stage-specific and average DWs, with their 95% confidence intervals, were calculated using SAS software. Results: A total of 11,757 cancer cases were investigated and evaluated. The DWs for the therapy, remission, metastasis, and terminal stages of all cancers were 0.310, 0.218, 0.450, and 0.653 respectively. The average DW of all cancers was 0.317 (95% CI: 0.312-0.321). Weights varied significantly across stages and cancers, while no significant differences were found between males and females. DWs were higher (>0.4) for liver cancer, bone cancer, lymphoma, and pancreatic cancer; lower DWs (<0.3) were found for breast cancer, cervix uteri, corpus uteri, ovarian cancer, larynx cancer, and mouth and oropharynx cancer. Conclusion: Stage-specific and average DWs for various cancers were estimated based on a large-sample survey. The average DW of 0.317 for all cancers indicates that about one-third of a healthy year is lost for each year survived. The differences in DWs between cancers and stages provide scientific evidence for the development of cancer prevention strategies.
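The arithmetic behind an average DW and its confidence interval can be illustrated with a short Python sketch: a mean of per-case weights with a normal-approximation 95% CI. The per-case weights below are hypothetical; the study's own estimates were produced in SAS following the GBD methodology.

```python
# Sketch: average disability weight with a normal-approximation 95% CI.
import math

# Hypothetical per-case DWs (stage-specific weights assigned to each case).
case_weights = [0.310] * 400 + [0.218] * 300 + [0.450] * 200 + [0.653] * 100

n = len(case_weights)
mean = sum(case_weights) / n
var = sum((w - mean) ** 2 for w in case_weights) / (n - 1)  # sample variance
half_width = 1.96 * math.sqrt(var / n)                      # CI half-width

print(f"average DW = {mean:.3f} "
      f"(95% CI: {mean - half_width:.3f}-{mean + half_width:.3f})")
```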