56 results for Text Analysis
Abstract:
Information behaviour (IB) is an area within Library and Information Science that studies the totality of human behaviour in relation to information, both active and passive, along with the explicit and the tacit mental states related to information. This study reports on a recently completed dissertation research project that integrates the different models of information behaviour using a diary study in which 34 participants maintained a daily journal for two weeks through a web log or paper diary. This resulted in thick descriptions of IB, which were manually analysed using the Grounded Theory method of inquiry, and then cross-referenced through both text-analysis and statistical analysis programs. Among the many key findings of this study, one is the focus of this paper: how participants express their feelings about the information seeking process and their mental and affective states related specifically to the sense-making component, which co-occurs with almost every other aspect of information behaviour. The paper title – Down the Rabbit Hole and Through the Looking Glass – refers to an observation that some of the participants made in their journals when they searched for, or avoided, information: they wrote that they felt as if they had fallen into a rabbit hole where nothing made sense, and reported both positive feelings of surprise and amazement, and negative feelings of confusion, puzzlement, apprehensiveness, frustration, stress, ambiguity, and fatigue. The study situates this sense-making aspect of IB within an overarching model of information behaviour that includes IB concepts such as monitoring information, encountering information, information seeking and searching, flow, multitasking, information grounds, information horizons, and more, and proposes an integrated model of information behaviour illuminating how these different concepts are interleaved and interconnected with each other, along with its implications for information services.
Abstract:
To detect and annotate the key events of live sports videos, we need to tackle the semantic gaps of audio-visual information. Previous work has successfully extracted semantics from time-stamped web match reports, which are synchronized with the video contents. However, web and social media articles with no time-stamps have not been fully leveraged, even though they are increasingly used to complement the coverage of major sporting tournaments. This paper aims to address this limitation using a novel multimodal summarization framework based on sentiment analysis and players' popularity. It uses audiovisual contents, web articles, blogs, and commentators' speech to automatically annotate and visualize the key events and key players in a sports tournament coverage. The experimental results demonstrate that the automatically generated video summaries are aligned with the events identified from the official website match reports.
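The ranking idea behind such a framework can be illustrated with a minimal sketch that scores candidate events by combining the sentiment intensity of associated text (articles, blogs, commentary transcripts) with player-mention popularity. The data structures, field names and weighting below are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter

def rank_key_events(events, alpha=0.6):
    """Rank events by a weighted mix of sentiment intensity and player popularity.

    Each event is assumed to be a dict with a 'sentiment' score in [-1, 1]
    (aggregated from the associated articles/commentary) and a list of
    'players' mentioned in that coverage.
    """
    mention_counts = Counter(p for e in events for p in e["players"])
    top = max(mention_counts.values(), default=1)

    def score(event):
        popularity = max((mention_counts[p] for p in event["players"]), default=0) / top
        return alpha * abs(event["sentiment"]) + (1 - alpha) * popularity

    return sorted(events, key=score, reverse=True)

events = [
    {"id": "goal_73min", "sentiment": 0.9, "players": ["Player A"]},
    {"id": "corner_12min", "sentiment": 0.1, "players": ["Player B"]},
]
print([e["id"] for e in rank_key_events(events)])  # ['goal_73min', 'corner_12min']
```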
Abstract:
Corporate reputation is viewed as fundamental to firm performance, growth and survival, and the maintenance and enhancement of that reputation is a key responsibility of senior executives. However, relatively little is known about the main dimensions of corporate reputation and the amount of attention given to them by senior executives. Based on the corporate reputation and intangible resources literatures, thirteen reputational elements were identified, and the amount of attention given to those elements in a large, longitudinal sample of annual reports from Australian firms was measured using computer-aided text analysis. This identified five main reputational dimensions that were both stable over time and related to firms’ future financial performance.
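Computer-aided text analysis of this kind is usually dictionary-based. The sketch below shows one plausible way to score the attention an annual report gives to reputational elements by counting dictionary hits; the element names and word lists are invented for illustration and do not reproduce the paper's thirteen elements.

```python
import re
from collections import Counter

# Hypothetical dictionaries for two reputational elements; the paper's thirteen
# elements and their actual word lists are not reproduced here.
ELEMENT_DICTIONARIES = {
    "product_quality": {"quality", "reliable", "reliability", "excellence"},
    "financial_performance": {"profit", "earnings", "revenue", "growth"},
}

def attention_scores(annual_report_text):
    """Share of report words falling in each element's dictionary (CATA-style)."""
    words = re.findall(r"[a-z]+", annual_report_text.lower())
    counts = Counter(words)
    total = max(len(words), 1)
    return {
        element: sum(counts[w] for w in vocab) / total
        for element, vocab in ELEMENT_DICTIONARIES.items()
    }

print(attention_scores("Revenue growth reflected our continued focus on product quality."))
```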
Abstract:
This paper details the participation of the Australian e-Health Research Centre (AEHRC) in the ShARe/CLEF 2013 eHealth Evaluation Lab – Task 3. This task aims to evaluate the use of information retrieval (IR) systems to aid consumers (e.g. patients and their relatives) in seeking health advice on the Web. Our submissions to the ShARe/CLEF challenge are based on language models generated from the web corpus provided by the organisers. Our baseline system is a standard Dirichlet smoothed language model. We enhance the baseline by identifying and correcting spelling mistakes in queries, as well as expanding acronyms using AEHRC's Medtex medical text analysis platform. We then consider the readability and the authoritativeness of web pages to further enhance the quality of the document ranking. Measures of readability are integrated in the language models used for retrieval via prior probabilities. Prior probabilities are also used to encode authoritativeness information derived from a list of top-100 consumer health websites. Empirical results show that correcting spelling mistakes and expanding acronyms found in queries significantly improves the effectiveness of the language model baseline. Readability priors seem to increase retrieval effectiveness for graded relevance at early ranks (nDCG@5, but not precision), but no improvements are found at later ranks and when considering binary relevance. The authoritativeness prior does not appear to provide retrieval gains over the baseline: this is likely to be because of the small overlap between websites in the corpus and those in the top-100 consumer-health websites we acquired.
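As a rough illustration of the retrieval setup described above, the following sketch scores a document with a Dirichlet-smoothed query-likelihood model and folds a document prior (for example, a readability estimate) in as an additive log-prior. The function names, the pseudo-count for unseen terms and the default mu are assumptions for illustration, not the AEHRC system.

```python
import math
from collections import Counter

def dirichlet_lm_score(query_terms, doc_terms, collection_tf, collection_len, mu=2500):
    """Dirichlet-smoothed query-likelihood score (in log space) for one document."""
    doc_tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        # Background probability; unseen terms get a small pseudo-count (assumption).
        p_coll = collection_tf.get(term, 0.5) / collection_len
        p_doc = (doc_tf[term] + mu * p_coll) / (doc_len + mu)
        score += math.log(p_doc)
    return score

def score_with_prior(query_terms, doc_terms, collection_tf, collection_len, doc_prior):
    """Fold a document prior (e.g. a readability estimate in (0, 1]) into the score."""
    lm = dirichlet_lm_score(query_terms, doc_terms, collection_tf, collection_len)
    return lm + math.log(doc_prior)

collection_tf = {"diabetes": 1200, "symptoms": 800}   # toy collection statistics
collection_len = 1_000_000
doc = "diabetes symptoms include thirst and fatigue".split()
print(score_with_prior(["diabetes", "symptoms"], doc, collection_tf, collection_len, 0.8))
```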
Abstract:
Aims: Pathology notification for a Cancer Registry is regarded as the most valid information for the confirmation of a diagnosis of cancer. In view of the importance of pathology data, an automatic medical text analysis system (Medtex) is being developed to perform electronic Cancer Registry data extraction and coding of important clinical information embedded within pathology reports. Methods: The system automatically scans HL7 messages received from a Queensland pathology information system and analyses the reports for terms and concepts relevant to a cancer notification. A multitude of data items for cancer notification, such as primary site, histological type, stage, and other synoptic data, are classified by the system. The underlying extraction and classification technology is based on SNOMED CT. The Queensland Cancer Registry business rules and the International Classification of Diseases – Oncology – Version 3 have been incorporated. Results: The cancer notification services show that the classification of notifiable reports can be achieved with sensitivities of 98% and specificities of 96%, while the coding of cancer notification items such as basis of diagnosis, histological type and grade, primary site and laterality can be extracted with an overall accuracy of 80%. In the case of lung cancer staging, the automated stages produced were accurate enough for the purposes of population-level research and indicative staging prior to multi-disciplinary team meetings. Medtex also allows for detailed tumour stream synoptic reporting. Conclusions: Medtex demonstrates how medical free-text processing could enable the automation of some Cancer Registry processes. Over 70% of Cancer Registry coding resources are devoted to information acquisition. The development of a clinical decision support system to unlock information from medical free-text could significantly reduce costs arising from duplicated processes and enable improved decision support, enhancing efficiency and timeliness of cancer information for Cancer Registries.
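Medtex itself performs SNOMED CT-based concept extraction and applies registry business rules; as a much simpler illustration of pulling cancer-notification items out of free-text pathology reports, the keyword-pattern sketch below is a hypothetical toy, not the Medtex pipeline.

```python
import re

# Illustrative keyword patterns only; Medtex's SNOMED CT-based extraction is
# far richer than this toy.
NOTIFIABLE_PATTERNS = {
    "histological_type": re.compile(
        r"\b(adenocarcinoma|squamous cell carcinoma|melanoma)\b", re.IGNORECASE),
    "primary_site": re.compile(r"\b(lung|breast|prostate|colon|skin)\b", re.IGNORECASE),
    "laterality": re.compile(r"\b(left|right|bilateral)\b", re.IGNORECASE),
}

def extract_notification_items(report_text):
    """Return the first match found for each cancer-notification item in a report."""
    items = {}
    for item, pattern in NOTIFIABLE_PATTERNS.items():
        match = pattern.search(report_text)
        if match:
            items[item] = match.group(0).lower()
    return items

print(extract_notification_items(
    "Right upper lobe of lung: adenocarcinoma, moderately differentiated."))
# {'histological_type': 'adenocarcinoma', 'primary_site': 'lung', 'laterality': 'right'}
```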
Abstract:
Feminism in Indonesian society is closely tied to the notion of emancipation, an issue that women are still raising today. However, the film Arisan 2! marked a shift in film discourse regarding the representation of cosmopolitan women in Indonesia. This research examines how Arisan 2!, as a media text, portrays feminism in Jakarta society. The film appears to foreground liberal feminism in today’s modern society through several issues of women’s emancipation, specifically in the areas of marriage, work, and social life.
Abstract:
The rapid increase in the number of text documents available on the Internet has created pressure to use effective cleaning techniques, which are needed to convert these documents into structured documents. Text cleaning techniques are one of the key mechanisms in typical text mining application frameworks. In this paper, we explore the role of text cleaning in the 20 Newsgroups dataset, and report on experimental results.
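The abstract does not spell out the cleaning steps used, but a typical baseline pipeline for the 20 Newsgroups dataset looks like the sketch below: scikit-learn's loader strips headers, signature footers and quoted replies, and simple regular expressions (illustrative assumptions here, not the paper's exact techniques) normalise the remaining text.

```python
import re
from sklearn.datasets import fetch_20newsgroups

def clean(text):
    """Basic cleaning: drop e-mail addresses and non-letters, normalise whitespace."""
    text = re.sub(r"\S+@\S+", " ", text)       # e-mail addresses
    text = re.sub(r"[^A-Za-z\s]", " ", text)   # digits, punctuation, markup remnants
    return re.sub(r"\s+", " ", text).strip().lower()

# 'remove' strips newsgroup headers, signature footers and quoted replies,
# which are major sources of noise in this dataset.
raw = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes"))
cleaned_docs = [clean(doc) for doc in raw.data]
print(cleaned_docs[0][:80])
```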
Abstract:
Much has been written on Michel Foucault’s reluctance to clearly delineate a research method, particularly with respect to genealogy (Harwood 2000; Meadmore, Hatcher, & McWilliam 2000; Tamboukou 1999). Foucault (1994, p. 288) himself disliked prescription, stating, “I take care not to dictate how things should be”, and wrote provocatively to disrupt equilibrium and certainty, so that “all those who speak for others or to others” no longer know what to do. It is doubtful, however, that Foucault ever intended for researchers to be stricken by that malaise to the point of being unwilling to make an intellectual commitment to methodological possibilities. Taking criticism of “Foucauldian” discourse analysis as a convenient point of departure to discuss the objectives of poststructural analyses of language, this paper develops what might be called a discursive analytic: a methodological plan to approach the analysis of discourses through the location of statements that function with constitutive effects.
Abstract:
Isolating the impact of a colour, or a combination of colours, is extremely difficult to achieve because it is difficult to remove other environmental elements such as sound, odours, light, and occasion from the experience of being in a place. In order to ascertain the impact of colour on how we interpret the world in day-to-day situations, the current study records participant responses to achromatic scenes of the built environment prior to viewing the same scenes in colour. A number of environments were photographed in colour or copied from design books, and copies of the images were saved as both colour and black/grey/white versions. An overview of the study is introduced by first providing examples of studies which have linked colour to meaning and emotions. For example, yellow is said to be connected to happiness, while red is said to evoke feelings of anger or passion. A link between colour and the way we understand and/or feel is thus established; however, there is a further need for knowledge of colour in context. In response to this need, the current achromatic/chromatic environmental study is described and discussed in light of the findings. Finally, suggestions for future research are posed. Based on previous research, the authors hypothesised that a shift in environmental perception by participants would occur. It was found that the impact of colour includes a shift in the perception of aspects such as a place's atmosphere and youthfulness. Through studio-class discussions it was also noted that, when colour was added (or removed), the predicted age of the place, its function, and, in association, its potential users were often challenged. It is posited that the ability of a designer (for example, an interior designer, architect, or landscape architect) to design for a particular target group of users and/or clients will be enhanced through more targeted studies relating colour in situ. The importance of noting the perceptual shift for the participants in our study, who were young designers, is the realisation that colour potentially holds the power to impact the identity of an architectural form, an interior space, and/or particular elements such as doorways, furniture settings, and the like.
Abstract:
The recent focus on literacy in Social Studies has been on linguistic design, particularly that related to the grammar of written and spoken text. When students are expected to produce complex hybridized genres such as timelines, a focus on the teaching and learning of linguistic design is necessary but not sufficient to complete the task. Theorizations of new literacies identify five interrelated meaning-making designs for text deconstruction and reproduction: linguistic, spatial, visual, gestural, and audio design. Homing in on the complexity of timelines, this paper casts a lens on the linguistic, visual, spatial, and gestural designs of three pairs of primary-school-aged Social Studies learners. Drawing on a functional metalanguage, we analyze the linguistic, visual, spatial, and gestural designs of their work. We also offer suggestions of their effect, and from there consider the importance of explicit instruction in text design choices for this Social Studies task. We conclude the analysis by suggesting the foci of explicit instruction for future lessons.
Abstract:
Objective: To summarise the extent to which narrative text fields in administrative health data are used to gather information about the event resulting in presentation to a health care provider for treatment of an injury, and to highlight best practice approaches to conducting narrative text interrogation for injury surveillance purposes.----- Design: Systematic review.----- Data sources: Electronic databases searched included CINAHL, Google Scholar, Medline, Proquest, PubMed and PubMed Central. Snowballing strategies were employed by searching the bibliographies of retrieved references to identify relevant associated articles.----- Selection criteria: Papers were selected if the study used a health-related database and if the study objectives were to a) use text fields to identify injury cases or to extract additional information on injury circumstances not available from coded data, b) use text fields to assess the accuracy of coded data fields for injury-related cases, or c) describe methods/approaches for extracting injury information from text fields.----- Methods: The papers identified through the search were independently screened by two authors for inclusion, resulting in 41 papers selected for review. Due to heterogeneity between studies, meta-analysis was not performed.----- Results: The majority of papers reviewed focused on describing injury epidemiology trends using coded data and text fields to supplement coded data (28 papers), with these studies demonstrating the value of text data for providing more specific information beyond what had been coded, enabling case selection or providing circumstantial information. Caveats were expressed in terms of the consistency and completeness of recording of text information, resulting in underestimates when using these data. Four coding validation papers were reviewed, with these studies showing the utility of text data for validating and checking the accuracy of coded data. Seven studies (9 papers) described methods for interrogating injury text fields for systematic extraction of information, with a combination of manual and semi-automated methods used to refine and develop algorithms for the extraction and classification of coded data from text. Quality assurance approaches to assessing the robustness of the methods for extracting text data were only discussed in 8 of the epidemiology papers and 1 of the coding validation papers. All of the text interrogation methodology papers described systematic approaches to ensuring the quality of the approach.----- Conclusions: Manual review and coding approaches, text search methods, and statistical tools have been utilised to extract data from narrative text and translate it into usable, detailed injury event information. These techniques can be, and have been, applied to administrative datasets to identify specific injury types and add value to previously coded injury datasets. Only a few studies thoroughly described the methods used for text mining, and less than half of the studies reviewed used or described quality assurance methods for ensuring the robustness of the approach. New techniques utilising semi-automated computerised approaches and Bayesian/clustering statistical methods offer the potential to further develop and standardise the analysis of narrative text for injury surveillance.
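A minimal example of the text search methods referred to in the conclusions is a keyword/regular-expression pass over narrative fields to flag candidate cases of one injury mechanism; the pattern and record layout below are hypothetical, and real surveillance algorithms in the reviewed studies are refined iteratively against manually coded data.

```python
import re

# Hypothetical keyword pattern for one injury mechanism (falls); algorithms in
# the reviewed studies are refined iteratively against manually coded data.
FALL_PATTERN = re.compile(r"\b(fell|fall|slipped|tripped)\b", re.IGNORECASE)

def flag_fall_cases(records):
    """Yield IDs of records whose narrative text suggests a fall-related injury."""
    for record_id, narrative in records:
        if narrative and FALL_PATTERN.search(narrative):
            yield record_id

records = [
    (1, "Pt slipped on wet floor at home, landed on left wrist."),
    (2, "Struck by cricket ball during training."),
]
print(list(flag_fall_cases(records)))  # [1]
```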
Abstract:
This article explores two matrix methods to induce the “shades of meaning” (SoM) of a word. A matrix representation of a word is computed from a corpus of traces based on the given word. Non-negative Matrix Factorisation (NMF) and Singular Value Decomposition (SVD) each compute a set of vectors, each corresponding to a potential shade of meaning. The two methods were evaluated based on loss of conditional entropy with respect to two sets of manually tagged data. One set reflects concepts generally appearing in text, and the second set comprises words used for investigations into word sense disambiguation. Results show that NMF consistently outperforms SVD for inducing both SoM of general concepts and word senses. The problem of inducing the shades of meaning of a word is more subtle than that of word sense induction and is hence relevant to thematic analysis of opinion, where nuances of opinion can arise.
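A rough sketch of the comparison, assuming a small trace-by-term count matrix built for a single target word (the toy traces, vectoriser and component count below are illustrative, not the paper's corpus or evaluation):

```python
from sklearn.decomposition import NMF, TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer

# Toy "traces" (context snippets) for one target word; in the paper a matrix
# is built per target word from a corpus of such traces.
traces = [
    "bank river water fishing",
    "bank loan interest account",
    "bank account money deposit",
    "river bank flood water",
]
X = CountVectorizer().fit_transform(traces)  # trace-by-term count matrix

k = 2  # number of candidate shades of meaning
nmf_shades = NMF(n_components=k, init="nndsvd", random_state=0).fit(X).components_
svd_shades = TruncatedSVD(n_components=k, random_state=0).fit(X).components_
# Each row of nmf_shades / svd_shades is one candidate shade-of-meaning vector
# over the vocabulary; these would then be scored against manually tagged data.
```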
Abstract:
To date, most applications of algebraic analysis and attacks on stream ciphers are on those based on linear feedback shift registers (LFSRs). In this paper, we extend algebraic analysis to non-LFSR based stream ciphers. Specifically, we perform an algebraic analysis on the RC4 family of stream ciphers, an example of stream ciphers based on dynamic tables, and investigate its implications to potential algebraic attacks on the cipher. This is, to our knowledge, the first paper that evaluates the security of RC4 against algebraic attacks through providing a full set of equations that describe the complex word manipulations in the system. For an arbitrary word size, we derive algebraic representations for the three main operations used in RC4, namely state extraction, word addition and state permutation. Equations relating the internal states and keystream of RC4 are then obtained from each component of the cipher based on these algebraic representations, and analysed in terms of their contributions to the security of RC4 against algebraic attacks. Interestingly, it is shown that each of the three main operations contained in the components has its own unique algebraic properties, and when their respective equations are combined, the resulting system becomes infeasible to solve. This results in a high level of security being achieved by RC4 against algebraic attacks. On the other hand, the removal of an operation from the cipher could compromise this security. Experiments on reduced versions of RC4 have been performed, which confirms the validity of our algebraic analysis and the conclusion that the full RC4 stream cipher seems to be immune to algebraic attacks at present.
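For reference, the three operations analysed in the paper all appear in the standard RC4 algorithm; the sketch below implements key scheduling and keystream generation for a configurable word size n (n = 8 gives the usual cipher), with the word addition, state permutation (swap) and state extraction steps marked in comments.

```python
def rc4_ksa(key, n=8):
    """Key scheduling: build and key-mix the permutation table S of 2**n words."""
    size = 2 ** n
    S = list(range(size))
    j = 0
    for i in range(size):
        j = (j + S[i] + key[i % len(key)]) % size   # word addition modulo 2**n
        S[i], S[j] = S[j], S[i]                     # state permutation (swap)
    return S

def rc4_keystream(S, length, n=8):
    """PRGA: produce keystream words via addition, permutation and extraction."""
    size = 2 ** n
    i = j = 0
    out = []
    for _ in range(length):
        i = (i + 1) % size
        j = (j + S[i]) % size                       # word addition
        S[i], S[j] = S[j], S[i]                     # state permutation (swap)
        out.append(S[(S[i] + S[j]) % size])         # state extraction
    return out

print(rc4_keystream(rc4_ksa([0x01, 0x02, 0x03, 0x04, 0x05]), 8))
```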