Abstract:
A user’s query is considered to be an imprecise description of their information need. Automatic query expansion is the process of reformulating the original query with the goal of improving retrieval effectiveness. Many successful query expansion techniques ignore information about the dependencies that exist between words in natural language. However, more recent approaches have demonstrated that, by explicitly modeling associations between terms, significant improvements in retrieval effectiveness can be achieved over those that ignore these dependencies. State-of-the-art dependency-based approaches have been shown to primarily model syntagmatic associations. Syntagmatic associations capture the likelihood that two terms co-occur more often than by chance. However, structural linguistics relies on both syntagmatic and paradigmatic associations to deduce the meaning of a word. Given the success of dependency-based approaches and the reliance on word meanings in the query formulation process, we argue that modeling both syntagmatic and paradigmatic information in the query expansion process will improve retrieval effectiveness. This article develops and evaluates a new query expansion technique that is based on a formal, corpus-based model of word meaning that models syntagmatic and paradigmatic associations. We demonstrate that when sufficient statistical information exists, as in the case of longer queries, including paradigmatic information alone provides significant improvements in retrieval effectiveness across a wide variety of data sets. More generally, when our new query expansion approach is applied to large-scale web retrieval it demonstrates significant improvements in retrieval effectiveness over a strong baseline system, based on a commercial search engine.
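To illustrate the two association types the abstract distinguishes, the toy sketch below computes a document-level PMI score (syntagmatic: terms co-occurring more often than chance) and a shared-context cosine (paradigmatic: terms that can substitute for one another in similar contexts). This is a minimal construction over an invented three-document corpus, not the article's formal model of word meaning.

```python
from collections import Counter, defaultdict
from itertools import combinations
import math

# Invented toy corpus for illustration only.
docs = [
    "query expansion improves retrieval effectiveness",
    "query reformulation improves search effectiveness",
    "search retrieval uses query terms",
]
tokenized = [d.split() for d in docs]
n_docs = len(tokenized)

# Syntagmatic association: pointwise mutual information over
# document-level co-occurrence.
term_freq = Counter(t for doc in tokenized for t in set(doc))
pair_freq = Counter()
for doc in tokenized:
    for a, b in combinations(sorted(set(doc)), 2):
        pair_freq[(a, b)] += 1

def pmi(a, b):
    a, b = sorted((a, b))
    joint = pair_freq[(a, b)] / n_docs
    if joint == 0:
        return float("-inf")
    return math.log(joint / ((term_freq[a] / n_docs) * (term_freq[b] / n_docs)))

# Paradigmatic association: cosine similarity over the immediate
# left/right neighbour contexts of each term.
contexts = defaultdict(Counter)
for doc in tokenized:
    for i, t in enumerate(doc):
        for j in (i - 1, i + 1):
            if 0 <= j < len(doc):
                contexts[t][doc[j]] += 1

def paradigmatic(a, b):
    ca, cb = contexts[a], contexts[b]
    num = sum(ca[w] * cb[w] for w in set(ca) & set(cb))
    den = (math.sqrt(sum(v * v for v in ca.values()))
           * math.sqrt(sum(v * v for v in cb.values())))
    return num / den if den else 0.0
```

Here "expansion" and "reformulation" never co-occur, yet they share identical contexts ("query … improves"), so they score highly on the paradigmatic measure while PMI says nothing about them — exactly the complementary signal the article argues for.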
Abstract:
Our task is to consider the evolving perspectives around curriculum documented in the Theory Into Practice (TIP) corpus to date. The 50 years in question, 1962–2012, account for approximately half the history of mass institutionalized schooling. Over this time, the upper age of compulsory schooling has crept up, stretching the school curriculum's reach, purpose, and clientele. These years also span remarkable changes in the social fabric, challenging deep senses of the nature and shelf-life of knowledge, whose knowledge counts, what science can and cannot deliver, and the very purpose of education. The school curriculum is a key social site where these challenges have to be addressed in a very practical sense, through a design on the future implemented within the resources and politics of the present. The task's metaphor of ‘evolution’ may invoke a sense of gradual cumulative improvement, but equally connotes mutation, hybridization, extinction, survival of the fittest, and environmental pressures. Viewed in this way, curriculum theory and practice cannot be isolated and studied in laboratory conditions—there is nothing natural, neutral, or self-evident about what knowledge gets selected into the curriculum. Rather, the process of selection unfolds as a series of messy, politically contaminated, lived experiments; thus curriculum studies require field work in dynamic open systems. We subscribe to Raymond Williams' approach to social change, which he argues is not absolute and abrupt, one set of ideas neatly replacing the other. For Williams, newly emergent ideas have to compete against the dominant mindset and residual ideas “still active in the cultural process” (Williams, 1977, p. 122). This means ongoing debates. For these reasons, we join Schubert (1992) in advocating “continuous reconceptualising of the flow of experience” (p. 238) by both researchers and practitioners.
Abstract:
Football, or soccer as it is more commonly referred to in Australia and the US, is arguably the world’s most popular sport, and it generates a proportionate volume of related writing. Within this landscape, works of novel-length fiction are seemingly rare. This paper establishes and maps a substantial body of football fiction works, and explores the elements and qualities they exhibit individually and collectively. In bringing together the current, limited surveys of the field, it presents the first rigorous definition of football fiction and captures the first historiography of the corpus. Drawing on distant reading methods developed in conjunction with closer textual analyses, the historiography and subsequent taxonomy represent the first articulation of relationships across the body of work, identify growth areas, and establish a number of movements and trends. In advancing the understanding of football fiction as a collective body, the paper lays foundations for further research and consideration of the works in generic terms.
Abstract:
This paper details the participation of the Australian e-Health Research Centre (AEHRC) in the ShARe/CLEF 2013 eHealth Evaluation Lab, Task 3. This task aims to evaluate the use of information retrieval (IR) systems to aid consumers (e.g. patients and their relatives) in seeking health advice on the Web. Our submissions to the ShARe/CLEF challenge are based on language models generated from the web corpus provided by the organisers. Our baseline system is a standard Dirichlet smoothed language model. We enhance the baseline by identifying and correcting spelling mistakes in queries, as well as expanding acronyms using AEHRC's Medtex medical text analysis platform. We then consider the readability and the authoritativeness of web pages to further enhance the quality of the document ranking. Measures of readability are integrated in the language models used for retrieval via prior probabilities. Prior probabilities are also used to encode authoritativeness information derived from a list of top-100 consumer health websites. Empirical results show that correcting spelling mistakes and expanding acronyms found in queries significantly improves the effectiveness of the language model baseline. Readability priors seem to increase retrieval effectiveness for graded relevance at early ranks (nDCG@5, but not precision), but no improvements are found at later ranks and when considering binary relevance. The authoritativeness prior does not appear to provide retrieval gains over the baseline: this is likely to be because of the small overlap between websites in the corpus and those in the top-100 consumer-health websites we acquired.
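A minimal sketch of the kind of ranking function described here: a Dirichlet-smoothed query-likelihood score with an additive log-prior slot into which a readability or authoritativeness prior can be plugged. The µ value and the toy collection are illustrative assumptions, not the AEHRC submission's actual configuration.

```python
import math
from collections import Counter

MU = 2000  # Dirichlet smoothing parameter (a common default, assumed here)

def score(query_terms, doc_terms, collection_tf, collection_len, log_prior=0.0):
    """Dirichlet-smoothed query likelihood plus a log document prior
    (the prior slot is where readability/authoritativeness evidence
    would be encoded)."""
    tf = Counter(doc_terms)
    dl = len(doc_terms)
    s = log_prior
    for q in query_terms:
        # Background (collection) probability; 0.5 pseudo-count for unseen terms.
        p_c = collection_tf.get(q, 0.5) / collection_len
        s += math.log((tf[q] + MU * p_c) / (dl + MU))
    return s
```

A document matching the query outscores one that does not, and a sufficiently large log-prior (e.g. for a highly readable page) can reorder the ranking, which is exactly the mechanism the abstract evaluates.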
Abstract:
We propose a cluster ensemble method to map the corpus documents into the semantic space embedded in Wikipedia and group them using multiple types of feature space. A heterogeneous cluster ensemble is constructed with multiple types of relations, i.e. document-term, document-concept and document-category. A final clustering solution is obtained by exploiting associations between document pairs and hubness of the documents. Empirical analysis with various real data sets reveals that the proposed method outperforms state-of-the-art text clustering approaches.
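The cluster-ensemble idea can be sketched with a co-association matrix: each base clustering (e.g. one per feature space: terms, concepts, categories) votes on whether two documents belong together, and a consensus partition is read off the accumulated votes. This generic evidence-accumulation sketch stands in for, and is much simpler than, the hubness-based combination the abstract describes.

```python
def co_association(labelings):
    """Fraction of base clusterings in which each pair of documents
    falls in the same cluster. `labelings` is a list of label lists,
    one per feature space."""
    n = len(labelings[0])
    m = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    m[i][j] += 1
    k = len(labelings)
    return [[v / k for v in row] for row in m]

def consensus_clusters(m, threshold=0.5):
    """Consensus partition: connected components over document pairs
    that co-associate in at least `threshold` of the base clusterings."""
    n = len(m)
    assigned = [-1] * n
    c = 0
    for i in range(n):
        if assigned[i] == -1:
            stack = [i]
            while stack:
                j = stack.pop()
                if assigned[j] == -1:
                    assigned[j] = c
                    stack.extend(k for k in range(n)
                                 if assigned[k] == -1 and m[j][k] >= threshold)
            c += 1
    return assigned
```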
Abstract:
Over the last decade, the majority of existing search techniques have been either keyword-based or category-based, resulting in unsatisfactory effectiveness. Meanwhile, studies have illustrated that more than 80% of users prefer personalized search results. As a result, many studies have devoted a great deal of effort (referred to as collaborative filtering) to investigating personalized notions for enhancing retrieval performance. One of the fundamental yet most challenging steps is to capture precise user information needs. Most Web users are inexperienced or lack the capability to express their needs properly, whereas existing retrieval systems are highly sensitive to vocabulary. Researchers have increasingly proposed the utilization of ontology-based techniques to improve current mining approaches. The related techniques are not only able to refine search intentions within specific generic domains, but also to access new knowledge by tracking semantic relations. In recent years, some researchers have attempted to build ontological user profiles according to discovered user background knowledge. The knowledge is drawn from both global and local analyses, which aim to produce tailored ontologies from a group of concepts. However, a key problem that has not been addressed is how to accurately match diverse local information to universal global knowledge. This research conducts a theoretical study on the use of personalized ontologies to enhance text mining performance. The objective is to understand user information needs through a “bag-of-concepts” rather than “words”. The concepts are gathered from a general world knowledge base, the Library of Congress Subject Headings. To return desirable search results, a novel ontology-based mining approach is introduced to discover accurate search intentions and learn personalized ontologies as user profiles.
The approach can not only pinpoint users' individual intentions in a rough hierarchical structure, but can also interpret their needs through a set of acknowledged concepts. Alongside the global and local analyses, a solid concept-matching approach is carried out to address the mismatch between local information and world knowledge. Relevance features produced by the Relevance Feature Discovery model are taken as representatives of local information. These features have been proven the best alternative to user queries for avoiding ambiguity, and consistently outperform the features extracted by other filtering models. The two proposed approaches are both evaluated in a scientific evaluation with the standard Reuters Corpus Volume 1 testing set. A comprehensive comparison is made with a number of state-of-the-art baseline models, including TF-IDF, Rocchio, Okapi BM25, the deployed Pattern Taxonomy Model, and an ontology-based model. The results indicate that top precision can be improved remarkably with the proposed ontology mining approach, and that the matching approach is successful, achieving significant improvements on most information filtering measures. This research contributes to the fields of ontological filtering, user profiling, and knowledge representation. The related outputs are critical when systems are expected to return proper mining results and provide personalized services. The scientific findings have the potential to facilitate the design of advanced preference mining models that impact people's daily lives.
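The local-to-global matching step can be caricatured in a few lines: weighted relevance features (terms, the local information) are mapped onto the global concepts that mention them, yielding a weighted concept profile. The index and weights below are invented for illustration; the actual approach against the Library of Congress Subject Headings is far richer.

```python
def match_concepts(features, concept_index):
    """Toy local-to-global matching: aggregate the weights of relevance
    features (terms) onto the subject-heading concepts indexing them.
    `features` maps term -> weight; `concept_index` maps term -> concepts."""
    profile = {}
    for term, w in features.items():
        for concept in concept_index.get(term, []):
            profile[concept] = profile.get(concept, 0.0) + w
    return profile
```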
Abstract:
Objective: To determine stage-specific and average disability weights (DWs) of malignant neoplasms, and to provide support and evidence for the study of cancer burden and policy development in Shandong province. Methods: The health status of each cancer patient identified during the 2007 cancer prevalence survey in Shandong was investigated. In line with the GBD methodology for estimating DWs, the disability extent of every case was classified and evaluated according to the Six-class Disability Classification, and the stage-specific and average DWs with their 95% confidence intervals were calculated using SAS software. Results: A total of 11,757 cancer cases were investigated and evaluated. The DWs for the therapy, remission, metastasis, and terminal stages across all cancers were 0.310, 0.218, 0.450 and 0.653 respectively. The average DW across all cancers was 0.317 (95% CI: 0.312-0.321). Weights varied significantly across stages and cancers, while no significant differences were found between males and females. DWs were higher (>0.4) for liver cancer, bone cancer, lymphoma and pancreatic cancer; lower DWs (<0.3) were found for cancers of the breast, cervix uteri, corpus uteri, ovary, larynx, and mouth and oropharynx. Conclusion: Stage-specific and average DWs for various cancers were estimated from a large-sample survey. The average DW of 0.317 for all cancers indicates that about one-third of a healthy year is lost for each year a patient survives. The differences in DWs across cancers and stages provide scientific evidence for developing cancer prevention strategies.
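The summary statistics reported above reduce to a mean over per-case weights with a normal-approximation 95% confidence interval; a minimal sketch follows (the survey's actual SAS procedures and the Six-class classification are not reproduced, and the input weights are invented):

```python
import math

def average_dw(weights):
    """Mean disability weight over surveyed cases, with a
    normal-approximation 95% confidence interval."""
    n = len(weights)
    mean = sum(weights) / n
    var = sum((w - mean) ** 2 for w in weights) / (n - 1)  # sample variance
    se = math.sqrt(var / n)                                # standard error
    return mean, (mean - 1.96 * se, mean + 1.96 * se)
```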
Abstract:
It’s hard not to be somewhat cynical about the self-congratulatory ‘diversity’ at the centre of the growing calendar of art bi/tri-ennials. The -ennial has proven expedient to the global tourism circuit, keeping regional economies and a relatively moderate pool of transnational artists afloat, and the Asia Pacific Triennial is no exception. The mediation of representation that is imperative to the ‘best of’ formats of these transnational art shows hinges on a categorical backwardness that can feel more like a Miss World competition than a progressive art show, because the little tag in parentheses after each artist’s name seems just as politically precarious now as it did forty years ago. Despite a weighty corpus of practical and critical work to the contrary, identity politics are so intrinsic to art capitalization, for both artists and institutions, that extricating ourselves from the particular and strategic politics of identification is seemingly impossible. Not that everyone wants to, of course.
Abstract:
Topic modelling, such as Latent Dirichlet Allocation (LDA), was proposed to generate statistical models that represent multiple topics in a collection of documents, and has been widely utilized in fields such as machine learning and information retrieval. But its effectiveness in information filtering is rarely explored. Patterns are generally thought to be more representative than single terms for representing documents. In this paper, a novel information filtering model, the Pattern-based Topic Model (PBTM), is proposed to represent text documents using not only topic distributions at a general level but also semantic pattern representations at a detailed, specific level, both of which contribute to accurate document representation and document relevance ranking. Extensive experiments are conducted to evaluate the effectiveness of PBTM using the TREC data collection Reuters Corpus Volume 1. The results show that the proposed model achieves outstanding performance.
Abstract:
Women’s Experimental Cinema provides lively introductions to the work of fifteen avant-garde women filmmakers, some of whom worked as early as the 1950s and many of whom are still working today. In each essay in this collection, a leading film scholar considers a single filmmaker, supplying biographical information, analyzing various influences on her work, examining the development of her corpus, and interpreting a significant number of individual films. The essays rescue the work of critically neglected but influential women filmmakers for teaching, further study, and, hopefully, restoration and preservation. Just as importantly, they enrich the understanding of feminism in cinema and expand the terrain of film history, particularly the history of the American avant-garde.
Abstract:
Guaranteeing the quality of extracted features that describe relevant knowledge to users or topics is a challenge because of the large number of extracted features. Most popular existing term-based feature selection methods suffer from extracting noisy features that are irrelevant to user needs. One popular alternative is to extract phrases or n-grams to describe the relevant knowledge. However, extracted n-grams and phrases usually contain a lot of noise. This paper proposes a method for reducing the noise in n-grams. The method first extracts more specific features (terms) to remove noisy features. It then uses an extended random set to accurately weight n-grams based on their distribution in the documents and the distribution of their terms within n-grams. The proposed approach not only reduces the number of extracted n-grams but also improves performance. Experimental results on the Reuters Corpus Volume 1 (RCV1) data collection and TREC topics show that the proposed method significantly outperforms state-of-the-art methods underpinned by Okapi BM25, tf*idf and Rocchio.
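The weighting idea can be illustrated with a toy scheme that combines an n-gram's distribution over documents with the weights of its constituent terms. This is a deliberately simplified stand-in, not the paper's extended-random-set estimator; the corpus below is invented.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def weight_ngrams(docs, n=2):
    """Toy weighting: an n-gram's weight combines the fraction of
    documents it appears in with the average document frequency of
    its member terms."""
    term_df = Counter(t for d in docs for t in set(d))
    gram_df = Counter(g for d in docs for g in set(ngrams(d, n)))
    n_docs = len(docs)
    weights = {}
    for g, df in gram_df.items():
        term_w = sum(term_df[t] / n_docs for t in g) / n
        weights[g] = (df / n_docs) * term_w
    return weights
```

Under this scheme an n-gram made of frequent, widely shared terms outweighs one containing a rare (potentially noisy) term, which is the intuition behind weighting n-grams by their terms' distribution.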
Abstract:
Topic modelling has been widely used in fields such as information retrieval, text mining, and machine learning. In this paper, we propose a novel model, the Pattern Enhanced Topic Model (PETM), which improves topic modelling by semantically representing topics with discriminative patterns, and makes an innovative contribution to information filtering by using PETM to determine document relevance based on topic distributions and the maximum matched patterns proposed in this paper. Extensive experiments are conducted to evaluate the effectiveness of PETM using the TREC data collection Reuters Corpus Volume 1. The results show that the proposed model significantly outperforms both state-of-the-art term-based models and pattern-based models.
Abstract:
Chinese modal particles feature prominently in Chinese people’s daily use of the language, but their pragmatic and semantic functions are elusive, as is commonly recognised by Chinese linguists and teachers of Chinese as a foreign language. This book originates from an extensive and intensive empirical study of the Chinese modal particle a (啊), one of the most frequently used modal particles in Mandarin Chinese. In order to capture all the uses and underlying meanings of the particle, the author transcribed the first 20 episodes, about 20 hours in length, of the popular Chinese TV drama series Kewang ‘Expectations’, which yielded a corpus of more than 142,000 Chinese characters with a total of 1,829 instances of the particle, all used in meaningful communicative situations. Within its context of use, every single occurrence of the particle was analysed in terms of its pragmatic and semantic contributions to the hosting utterance. On this basis, the core meanings were identified, which were seen as constituting the modal nature of the particle.
Abstract:
Many mature term-based or pattern-based approaches have been used in the field of information filtering to generate users’ information needs from a collection of documents. A fundamental assumption for these approaches is that the documents in the collection are all about one topic. However, in reality users’ interests can be diverse and the documents in the collection often involve multiple topics. Topic modelling, such as Latent Dirichlet Allocation (LDA), was proposed to generate statistical models that represent multiple topics in a collection of documents, and this has been widely utilized in fields such as machine learning and information retrieval. But its effectiveness in information filtering has not been well explored. Patterns are generally thought to be more discriminative than single terms for describing documents. However, the enormous number of discovered patterns hinders their effective and efficient use in real applications; therefore, selecting the most discriminative and representative patterns from the huge number discovered becomes crucial. To deal with the above-mentioned limitations and problems, in this paper a novel information filtering model, the Maximum matched Pattern-based Topic Model (MPBTM), is proposed. The main distinctive features of the proposed model are: (1) user information needs are generated in terms of multiple topics; (2) each topic is represented by patterns; (3) patterns are generated from topic models and are organized in terms of their statistical and taxonomic features; and (4) the most discriminative and representative patterns, called Maximum Matched Patterns, are used to estimate document relevance to the user’s information needs in order to filter out irrelevant documents. Extensive experiments are conducted to evaluate the effectiveness of the proposed model using the TREC data collection Reuters Corpus Volume 1.
The results show that the proposed model significantly outperforms both state-of-the-art term-based models and pattern-based models.
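The document-scoring idea described above can be caricatured in a few lines: for each topic, take the largest pattern fully contained in the document (the "maximum matched" pattern) and weight it by the topic's probability in the user's profile. The pattern weights, taxonomy, and estimation details of the actual MPBTM are omitted, and the topics, patterns, and numbers below are invented for illustration.

```python
def max_matched_score(doc_terms, topic_dist, topic_patterns):
    """Toy relevance score: per topic, find the largest pattern whose
    terms all occur in the document, and weight its size by the topic
    probability. Returns 0.0 if no pattern matches."""
    doc = set(doc_terms)
    score = 0.0
    for topic, prob in topic_dist.items():
        matched = [p for p in topic_patterns.get(topic, []) if set(p) <= doc]
        if matched:
            best = max(matched, key=len)  # the maximum matched pattern
            score += prob * len(best)
    return score
```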
Abstract:
Term-based approaches can extract many features from text documents, but most include noise. Many popular text-mining strategies have been adapted to reduce noisy information among the extracted features; however, such techniques still suffer from the low-frequency problem. The key issue is how to discover relevance features in text documents that fulfil user information needs. To address this issue, we propose a new method to extract specific features from user relevance feedback. The proposed approach includes two stages. The first stage extracts topics (or patterns) from text documents to focus on interesting topics. In the second stage, topics are deployed to lower-level terms to address the low-frequency problem and find specific terms. The specific terms are determined based on their appearances in relevance feedback and their distribution in topics or higher-level patterns. We test our proposed method with extensive experiments on the Reuters Corpus Volume 1 dataset and TREC topics. Results show that our proposed approach significantly outperforms state-of-the-art models.
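The two-stage idea can be sketched as follows: high-level patterns (stage one) are deployed down to their member terms, and each term is weighted by how often it appears in relevant feedback documents and how it is distributed across the patterns containing it. This is a simplified stand-in for the proposed method; the patterns, documents, and weighting formula are invented for illustration.

```python
from collections import Counter, defaultdict

def specific_terms(patterns, relevant_docs, top_k=3):
    """Toy two-stage selection: deploy high-level patterns to terms,
    weighting each term by pattern support in the relevance feedback
    and by the term's frequency there, normalised by pattern length."""
    doc_tf = Counter(t for d in relevant_docs for t in d)
    term_weight = defaultdict(float)
    for pattern in patterns:
        # Support: relevant documents containing the whole pattern.
        support = sum(1 for d in relevant_docs if set(pattern) <= set(d))
        for t in pattern:
            term_weight[t] += support * doc_tf[t] / len(pattern)
    return sorted(term_weight, key=term_weight.get, reverse=True)[:top_k]
```

A term shared by several well-supported patterns ("learning" below) rises to the top, which is the intended effect of deploying topics to lower-level terms.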