889 results for vignette in-text
Abstract:
This article explores two matrix methods to induce the "shades of meaning" (SoM) of a word. A matrix representation of a word is computed from a corpus of traces based on the given word. Non-negative Matrix Factorisation (NMF) and Singular Value Decomposition (SVD) compute a set of vectors, each corresponding to a potential shade of meaning. The two methods were evaluated based on loss of conditional entropy with respect to two sets of manually tagged data. One set reflects concepts generally appearing in text, and the second set comprises words used for investigations into word sense disambiguation. Results show that NMF consistently outperforms SVD for inducing both the SoM of general concepts and word senses. The problem of inducing the shades of meaning of a word is more subtle than that of word sense induction, and hence relevant to thematic analysis of opinion, where nuances of opinion can arise.
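A minimal sketch of the comparison the abstract describes, assuming a toy word-context matrix and scikit-learn's NMF and TruncatedSVD as the two factorisation methods; the matrix, component count, and inspection step are illustrative assumptions, not the paper's setup:

```python
# Sketch (not the authors' code): factor a word-context matrix with NMF and
# SVD so each basis vector can be read as a candidate shade of meaning.
import numpy as np
from sklearn.decomposition import NMF, TruncatedSVD

rng = np.random.default_rng(0)
X = rng.random((100, 50))   # rows: traces mentioning the target word; cols: context terms

k = 5                       # assumed number of candidate shades of meaning
nmf_shades = NMF(n_components=k, init="nndsvd").fit(X).components_
svd_shades = TruncatedSVD(n_components=k, random_state=0).fit(X).components_

# Each row is a vector over context terms; its largest entries suggest the
# terms that characterise one induced shade of meaning.
for shade in nmf_shades:
    print(np.argsort(shade)[-5:])  # indices of the five most salient context terms
```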
Abstract:
This silent swarm of stylized crickets is downloading data from Internet and catalogue searches being undertaken by the public at the State Library of Queensland. These searches are displayed on the screens on the crickets' backs. Each cricket downloads the searches and shares this information with other crickets. Commonly found searches spread like a meme through the swarm. In this work, memes replace the crickets' song, washing like a wave through the swarm and changing on the whim of Internet users. When one cricket begins calling to others, the swarm may respond to produce emergent patterns of text. When traffic is slow or of no interest to the crickets, they display onomatopoeia. The work is inspired by R. Murray Schafer's research into acoustic ecologies. In the 1960s, Schafer proposed that many species develop calls that fit niches within their acoustic environment. An increasing background of white noise dominates the acoustic environment of urban human habitats, leaving few acoustic niches for other species to communicate. The popularity of headphones and portable music may be seen as an evolution of our acoustic ecology, driven by our desire to hear expressive, meaningful sound above the din of our cities. Similarly, the crickets in this work are hypothetical creatures that have evolved to survive in a noisy human environment. This speculative species replaces auditory calls with onomatopoeia and information memes, communicating with the swarm via radio-frequency chirps instead of sound. Whilst these crickets cannot make sound, each individual has been programmed to respond to sound generated by the audience by making onomatopoeic calls in text. Try talking to a cricket, blowing on its tail, or making other sounds to trigger a call.
Abstract:
Script for non-verbal performance. ----- ----- ----- Research Component: Silent Treatment: Creating Non-verbal Performance Works for Children ----- ----- ----- The research field of theatre for young people draws on theories of child development and popular culture. SHOW explored personal and social development, friendship and creative play through the lens of the experience of girls aged 8-12. This project consolidated and refined innovative approaches to creating non-verbal theatre performance, and addressed challenges inherent in the creation of a performance by adults for young audiences. A significant finding of the project was the unanticipated convergence of creative practice and research into child behaviour and development: the congruence of content (female bullying) and theatrical form (non-verbal performance): "Within the hidden culture of aggression, girls fight with body language and relationships instead of fists and knives. In this world, friendship is a weapon, and the sting of a shout pales in comparison to a day of someone's silence. There is no gesture more devastating than the back turning away." (Simmons, Rachel (2002:3) Odd Girl Out: The Hidden Culture of Aggression in Girls, Schwartz Books.) The creative development and drafting process focussed on negotiating the conceptual design and practical constraints of incorporating diegetic music and video sources into the narrative, and on the authorial (and production) challenges of creating a script that could facilitate the re-mounting of a non-verbal work by a company specialising in text-based theatre. ----- ----- ----- SHOW was commissioned by the Queensland Theatre Company in 2003, toured to Queensland schools by the Queensland Arts Council, and in 2004 was performed at the Sydney Opera House.
Abstract:
It is a big challenge to guarantee the quality of discovered relevance features in text documents for describing user preferences, because of the large number of terms, patterns, and noise. Most existing popular text mining and classification methods have adopted term-based approaches. However, they have all suffered from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern-based methods should perform better than term-based ones in describing user preferences, but many experiments do not support this hypothesis. The innovative technique presented in this paper makes a breakthrough on this difficulty. This technique discovers both positive and negative patterns in text documents as higher-level features in order to accurately weight low-level features (terms) based on their specificity and their distributions in the higher-level features. Substantial experiments using this technique on Reuters Corpus Volume 1 and TREC topics show that the proposed approach significantly outperforms both the state-of-the-art term-based methods underpinned by Okapi BM25, Rocchio or Support Vector Machine, and pattern-based methods, on precision, recall and F measures.
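The following sketch illustrates the general idea of weighting low-level terms by their distribution in higher-level patterns; the pattern miner (frequent term pairs) and the weighting formula are illustrative assumptions, not the paper's algorithm:

```python
# Sketch under stated assumptions: mine simple "patterns" (co-occurring term
# pairs) from relevant documents, then weight each term by the support of
# the patterns it appears in, spread over pattern length as a stand-in for
# specificity.
from collections import Counter
from itertools import combinations

relevant_docs = [
    {"oil", "price", "market"},
    {"oil", "market", "shares"},
    {"price", "market", "trade"},
]

# Treat every term pair occurring in at least two documents as a pattern.
pattern_support = Counter(
    pair for doc in relevant_docs for pair in combinations(sorted(doc), 2)
)
patterns = {p: s for p, s in pattern_support.items() if s >= 2}

# A term inherits weight from the patterns that contain it.
term_weight = Counter()
for pattern, support in patterns.items():
    for term in pattern:
        term_weight[term] += support / len(pattern)

print(term_weight.most_common())
```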
Abstract:
Term-based approaches can extract many features in text documents, but most of these features are noisy. Many popular text-mining strategies have been adapted to reduce noisy information from extracted features; however, text-mining techniques still suffer from the low-frequency problem. The key issue is how to discover relevance features in text documents to fulfil user information needs. To address this issue, we propose a new method to extract specific features from user relevance feedback. The proposed approach includes two stages. The first stage extracts topics (or patterns) from text documents to focus on interesting topics. In the second stage, topics are deployed to lower-level terms to address the low-frequency problem and find specific terms. The specific terms are determined based on their appearances in relevance feedback and their distribution in topics or high-level patterns. We test our proposed method with extensive experiments on the Reuters Corpus Volume 1 dataset and TREC topics. Results show that our proposed approach significantly outperforms the state-of-the-art models.
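A minimal two-stage sketch of the idea, using scikit-learn's LDA as a stand-in for the paper's topic/pattern extraction; the documents, parameters, and deployment formula are illustrative assumptions:

```python
# Sketch: stage 1 extracts topics from feedback documents; stage 2 deploys
# topics to terms so low-frequency but specific terms inherit weight from
# the topics they belong to.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "oil price rises as market tightens",
    "oil market shares fall on weak trade",
    "price of crude oil climbs in early trade",
]

vec = CountVectorizer()
X = vec.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Deploy topics to terms: sum each term's topic-conditional probability,
# weighted by how prominent that topic is in the feedback documents.
topic_weight = lda.transform(X).sum(axis=0)
topic_term = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
term_weight = topic_weight @ topic_term

terms = vec.get_feature_names_out()
for i in np.argsort(term_weight)[::-1][:5]:
    print(terms[i], round(float(term_weight[i]), 3))
```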
Abstract:
In this paper we introduce a formalization of Logical Imaging applied to IR in terms of Quantum Theory, through the use of an analogy between the states of a quantum system and the terms in text documents. Our formalization relies upon the Schrödinger Picture, creating an analogy between the dynamics of a physical system and the kinematics of the probabilities generated by Logical Imaging. By using Quantum Theory, it is possible to model contextual information more precisely, in a seamless and principled fashion, within the Logical Imaging process. While further work is needed to validate this empirically, the foundations for doing so are provided.
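For reference, the classical logical-imaging kinematics (after Crestani and van Rijsbergen) that such a formalization builds on can be written as follows; the notation is an illustrative assumption, not the paper's quantum formulation:

```latex
% Logical imaging on A: each possible world's probability is moved to the
% world closest to it in which A is true, before B is evaluated.
\[
  P_A(B) \;=\; \sum_{w \in W} P(w)\,\tau\!\left(w_A \models B\right)
\]
% Here $W$ is the set of possible worlds (terms), $w_A$ is the world closest
% to $w$ in which $A$ holds, and $\tau(\cdot)$ is 1 if its argument holds
% and 0 otherwise.
```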
Abstract:
With the overwhelming increase in the amount of data on the web and in databases, many text mining techniques have been proposed for mining useful patterns in text documents. Extracting closed sequential patterns using the Pattern Taxonomy Model (PTM) is one of the pruning methods used to remove noisy, inconsistent, and redundant patterns. However, the PTM model treats each extracted pattern as a whole, without considering its constituent terms, which can affect the quality of the extracted patterns. This paper proposes an innovative and effective method that extends random sets to accurately weight patterns based on their distribution in the documents and the distribution of their terms within patterns. The proposed approach then finds the specific closed sequential patterns (SCSP) based on the newly calculated weights. Experimental results on the Reuters Corpus Volume 1 (RCV1) data collection and TREC topics show that the proposed method significantly outperforms other state-of-the-art methods on several popular measures.
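A minimal sketch of weighting mined patterns by document support and term distribution; the toy pattern list and the weighting formula are illustrative assumptions, not the paper's SCSP definition:

```python
# Sketch: weight closed sequential patterns (here, pre-mined tuples with
# document supports) by support and by how their terms spread across all
# mined patterns, so patterns of concentrated terms count as more specific.
from collections import Counter

# (pattern, document support) pairs, as if produced by a PTM-style miner.
mined = [
    (("oil", "price"), 4),
    (("oil", "price", "rise"), 3),
    (("market", "trade"), 2),
]

# How often each term appears across all patterns.
term_spread = Counter(t for pattern, _ in mined for t in pattern)

def pattern_weight(pattern, support):
    # Terms occurring in fewer patterns contribute more specificity.
    specificity = sum(1.0 / term_spread[t] for t in pattern) / len(pattern)
    return support * specificity

for pattern, support in mined:
    print(pattern, round(pattern_weight(pattern, support), 3))
```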
Abstract:
Objective: To synthesise recent research on the use of machine learning approaches to mining textual injury surveillance data. Design: Systematic review. Data sources: The electronic databases searched included PubMed, Cinahl, Medline, Google Scholar, and Proquest. The bibliography of every relevant article was examined, and associated articles were identified using a snowballing technique. Selection criteria: For inclusion, articles were required to meet the following criteria: (a) used a health-related database, (b) focused on injury-related cases, and (c) used machine learning approaches to analyse textual data. Methods: The papers identified through the search were screened, resulting in 16 papers selected for review. Articles were reviewed to describe the databases and methodology used, the strengths and limitations of different techniques, and the quality assurance approaches used. Due to heterogeneity between studies, meta-analysis was not performed. Results: Occupational injuries were the focus of half of the machine learning studies, and the most common methods described were Bayesian probability or Bayesian network based methods, used either to predict injury categories or to extract common injury scenarios. Models were evaluated through comparison with gold standard data, content expert evaluation, or statistical measures of quality. Machine learning was found to provide high precision and accuracy when predicting a small number of categories, and was valuable for visualisation of injury patterns and prediction of future outcomes. However, difficulties related to generalisability, source data quality, complexity of models, and integration of content and technical knowledge were discussed. Conclusions: The use of narrative text for injury surveillance has grown in popularity, complexity and quality over recent years. With advances in data mining techniques, increased capacity for analysis of large databases, the involvement of computer scientists in the injury prevention field, and more comprehensive use and description of quality assurance methods in text mining approaches, it is likely that we will see continued growth and advancement in knowledge of text mining in the injury field.
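As a concrete illustration of the kind of Bayesian text classifier the review describes, here is a minimal sketch that predicts an injury category from a free-text narrative; the narratives, labels, and pipeline are invented examples, not any reviewed study's method:

```python
# Sketch: a naive Bayes classifier over TF-IDF features of injury
# narratives, predicting a coarse injury category.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

narratives = [
    "worker fell from ladder while painting ceiling",
    "slipped on wet floor in kitchen",
    "cut finger on box cutter unpacking stock",
    "fell from scaffolding on building site",
]
labels = ["fall", "fall", "cut", "fall"]

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(narratives, labels)

print(model.predict(["tripped and fell on loose cable"]))  # expected: ['fall']
```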
Abstract:
The literacy demands of tables and graphs are different from those of prose texts such as narrative. This paper draws from part of a qualitative case study which sought to investigate strategies that scaffold and enhance the teaching and learning of varied representations in text. As indicated in the paper, the method focused on the teaching and learning of tables and graphs using Freebody and Luke's (1990) four resources model from literacy education.
Abstract:
The work is based on the assumption that words with similar syntactic usage have similar meaning, as proposed by Zellig S. Harris (1954, 1968). We study his assumption from two aspects: firstly, different meanings (word senses) of a word should manifest themselves in different usages (contexts), and secondly, similar usages (contexts) should lead to similar meanings (word senses). If we start with the different meanings of a word, we should be able to find distinct contexts for the meanings in text corpora. We separate the meanings by grouping and labeling contexts in an unsupervised or weakly supervised manner (Publications 1, 2 and 3). We are confronted with the question of how best to represent contexts in order to induce effective classifiers of contexts, because differences in context are the only means we have to separate word senses. If we start with words in similar contexts, we should be able to discover similarities in meaning. We can do this monolingually or multilingually. In the monolingual material, we find synonyms and other related words in an unsupervised way (Publication 4). In the multilingual material, we find translations by supervised learning of transliterations (Publication 5). In both the monolingual and multilingual cases, we first discover words with similar contexts, i.e., synonym or translation lists. In the monolingual case we also aim at finding structure in the lists by discovering groups of similar words, e.g., synonym sets. In this introduction to the publications of the thesis, we consider the larger background issues of how meaning arises, how it is quantized into word senses, and how it is modeled. We also consider how to define, collect and represent contexts. We discuss how to evaluate the trained context classifiers and the discovered word sense classifications, and finally we present the word sense discovery and disambiguation methods of the publications. This work supports Harris' hypothesis by implementing three new methods modeled on it. The methods have practical consequences for creating thesauruses and translation dictionaries, e.g., for information retrieval and machine translation purposes. Keywords: Word senses, Context, Evaluation, Word sense disambiguation, Word sense discovery.
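A minimal sketch of unsupervised sense induction by clustering contexts, in the spirit of Harris's hypothesis; the example contexts, vectoriser, and cluster count are illustrative assumptions, not the thesis's methods:

```python
# Sketch: represent each occurrence of a target word ("bank") by its
# bag-of-words context, then cluster the contexts so that each cluster
# plays the role of one induced sense.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

contexts = [
    "deposited the money at the bank before noon",
    "the bank approved the loan application",
    "fished from the grassy bank of the river",
    "the river bank eroded after the flood",
]

X = TfidfVectorizer(stop_words="english").fit_transform(contexts)
senses = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(senses)  # contexts with the same label are taken as the same sense
```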
Abstract:
In this paper, I look into a grammatical phenomenon found among speakers of the Cambridgeshire dialect of English. According to my hypothesis, the phenomenon is a new entry into the past BE verb paradigm of the English language. I claim that the structure I have found complements the existing two verb forms, was and were, with a third verb form that I have labelled 'intermediate past BE'. The paper is divided into two parts. In the first section, I introduce the theoretical ground for the study of variation, which is founded on empiricist principles. In variationist linguistics, the main claim is that heterogeneous language use is structured and ordered. This claim has been controversial throughout the last 50 years of modern linguistics. In the 1960s, the generativist movement spearheaded by Noam Chomsky diverted attention away from grammatical theories based on empirical observations. The generativists steered away from language diversity, variation and change in favour of generalisations, abstractions and universalist claims. The theoretical part of my paper goes through the main points of the variationist agenda and concludes that abandoning the concept of language variation in linguistics is harmful for both theory and methodology. In the method part of the paper, I present the Helsinki Archive of Regional English Speech (HARES) corpus. It is an audio archive that contains interviews conducted in England in the 1970s and 1980s. The interviews were done in accordance with methods generally used in traditional dialectology. The informants are mostly elderly men who have lived in the same region throughout their lives and who left school at an early age. The interviews are actually conversations: the interviewer allowed the informant to pick the topic of conversation, to induce a maximally relaxed and comfortable atmosphere and thus allow the most natural dialect variant to emerge in the informant's speech. The corpus chapter introduces some of the transcription and annotation problems associated with spoken-language corpora (especially those containing dialectal speech). Questions surrounding the concept of variation are present in this part of the paper too, since transcription work in particular is troubled by the fundamental problem of having to describe the fluctuations of everyday speech in text. In the empirical section of the paper, I use HARES to analyse the speech of four informants, with special focus on the emergence of the intermediate past BE variant. My observations and the subsequent analysis permit me to claim that my hypothesis seems to hold. The intermediate variant occupies almost all contexts where one would expect was or were in the informants' speech. This means that the new variant is integrated into the speakers' grammars and exemplifies the kind of variation that is at the heart of this paper.
Abstract:
Texts in the work of a city department: A study of the language and context of benefit decisions. This dissertation examines documents granting or denying access to municipal services. The data consist of decisions on transport services made by the Social Services Department of the City of Helsinki. The circumstances surrounding official texts, and their language and production, are studied through textual analysis and interviews. The dissertation describes the textual features of the above decisions, and seeks to explain such features. Also explored are the topics and methods of genre studies, especially the relationship between text and context. Although the approach is linguistic, the dissertation also touches on research in social work and administrative decision making, and contributes to a more general discussion on the language and duties of public administration. My key premise is that a text is more than a mere psycholinguistic phenomenon. Rather, a text is also a physical object and the result of certain production processes. This dissertation thus not only describes genre-specific features, but also sheds light on the work that generates the texts examined. Textual analysis and analyses of discursive practices are linked through an analysis of intertextuality: written decisions are compared with other application documents, such as expert statements and the applications themselves. The study shows that decisions are texts governed by strict rules and written with modest resources. Textwork is organised as hierarchical mass production. The officials who write decisions rely on standard phrases extracted from a computer system. This allows them to produce texts of uniform quality which have been approved by the department's legal experts. Using a computer system in text production does not, however, serve all the needs of the writers. This leads to many problems in the texts themselves. Intertextual analysis indicates that medical argumentation weighs most heavily in an application process, although a social appraisal should be carried out when deciding on applications for transport services. The texts reflect a hierarchy in which a physician ranks above the applicant, and the department's own expert physician ranks above the applicant's physician. My analysis also highlights good, but less obvious, practices. The social workers and secretaries who write decisions must balance conflicting demands. They use delicate linguistic means to adjust the standard phrases to suit individual cases, and employ subtle strategies of politeness. The dissertation suggests that the customer contact staff who write official texts should be allowed to make better use of their professional competence. A more general concern is that legislation and new management strategies require more and more documentation, yet textwork is only rarely taken into account in the allocation of resources. Keywords: (Critical) text analysis, genre analysis, administration, social work, administrative language, texts, genres, context, intertextuality, discursive practices
Abstract:
Remediation of Reading Difficulties in Grade 1: Three Pedagogical Interventions. Keywords: initial teaching, learning to read, reading difficulties, intervention, dyslexia, remediation of dyslexia, home reading, computerized training. In this study, three different reading interventions were tested on first-graders at risk of reading difficulties at school commencement. The intervention groups were compared with each other and with a control group receiving special education provided by the school. The first intervention was a new approach called syllable rhythmics, in which syllabic rhythm, phonological knowledge and letter-phoneme correspondence are emphasized. Syllable rhythmics is based on multi-sensory training elements aimed at finding the most functional modality for every child. The second intervention was computerized training of letter-sound correspondence with the Ekapeli learning game. The third intervention was home-based shared book reading, in which every family was given a story book, and dialogic-reading-style reading and writing exercises were prepared for each chapter of the book. The participants were 80 first-graders in 19 classes in nine schools. The children were matched in four groups according to pre-test results: three intervention and one control. The interventions took ten weeks, starting in September of grade 1. The first post-test, including several measures of reading abilities, was administered in December. The first delayed post-test was administered in March, the second in September of grade 2, and the third, the ALLU test (a reading test for primary school), in March of grade 2. The intervention and control groups differed only slightly from each other in grade 1. However, girls progressed significantly more than boys in both word reading and reading comprehension in December, and this difference remained in March. The children who had been cited as inattentive by their teachers also lagged behind the others in the post-tests in December and March. When participants were divided into two groups according to their initial letter knowledge at school entry, the weaker group (at most 17 correctly named letters in the pre-test) progressed more slowly in both word reading and reading comprehension in grade 1. Intervention group and gender had no interaction effect in grade 1. Instead, intervention group and attentiveness had an interaction effect on most test measures, with the inattentive students in the syllable rhythmics group doing worst and the attentive students in the control group doing best in grade 1. The smallest difference between the results of attentive and inattentive students was in the Ekapeli group. In grade 2, still only minor differences were found between the intervention groups and the control group. The only significant difference was in non-word reading, with the syllable rhythmics group outperforming the other groups in the fall. The difference between girls' and boys' performances in both technical reading and text comprehension disappeared in grade 2. The difference between the inattentive and attentive students could no longer be found in technical reading, and the difference became smaller in text comprehension as well. The difference between the two groups divided according to their initial letter knowledge disappeared in technical reading but remained significant in the text comprehension measures of the ALLU test in the spring of grade 2. In all, the children in the study did better in the ALLU test than expected according to the ALLU test norms. Despite being the weakest readers in their classes in the pre-test, 52.3% reached the normal reading ability level; in the norm group, 72.3% of all students attained normal reading ability. The results of this study indicate that different types of remediation programs can be effective, and that special education has apparently been useful. The results suggest careful consideration of first-graders' initial reading abilities (especially letter knowledge) and possible attention problems; remediation should be individually targeted while flexibly using different methods.
Abstract:
MLDB (macromolecule ligand database) is a knowledge base containing ligands co-crystallized with the three-dimensional structures available in the Protein Data Bank. The proposed knowledge base serves as an open resource for the analysis and visualization of all ligands and their interactions with macromolecular structures. MLDB can be used to search for ligands, and their interactions can be visualized in both text and graphical formats. MLDB will be updated at regular (weekly) intervals by automated Perl scripts. The knowledge base is intended to serve the scientific community working in the areas of molecular and structural biology. It is available free to users around the clock and can be accessed at http://dicsoft2.physics.iisc.ernet.in/mldb/.
Abstract:
This research constructed a readability measurement for French speakers who have English as a second language. It identified true cognates, words that are similar in the two languages, as an indicator of the difficulty of an English text for French readers. A multilingual lexical resource is used to detect true cognates in text, and statistical language modelling to predict the readability level. The proposed enhanced statistical language model takes a step in the right direction by improving the accuracy of readability predictions for French speakers by up to 10% compared with state-of-the-art approaches. The outcome of this study could accelerate the learning process for French speakers who are studying English. More importantly, this study also benefits the readability estimation research community, presenting an approach and evaluation at sentence level as well as innovating with the use of cognates as a new text feature.
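A minimal sketch of the cognate idea, assuming a toy French lexicon and plain string similarity in place of the thesis's multilingual lexical resource and statistical language model; the lexicon, threshold, and density feature are illustrative assumptions:

```python
# Sketch: flag likely French-English true cognates by string similarity and
# use cognate density as one readability signal for French readers.
from difflib import SequenceMatcher

# Tiny stand-in for the multilingual lexical resource the abstract mentions.
french_lexicon = {"table", "important", "information", "difficile", "nation"}

def is_cognate(english_word: str, threshold: float = 0.8) -> bool:
    """True if the word is close enough to some French lexicon entry."""
    return any(
        SequenceMatcher(None, english_word.lower(), fr).ratio() >= threshold
        for fr in french_lexicon
    )

def cognate_density(sentence: str) -> float:
    words = sentence.split()
    return sum(is_cognate(w) for w in words) / max(len(words), 1)

# Higher density suggests the sentence is easier for a French speaker.
print(cognate_density("The nation shared important information"))
```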