56 results for Text Analysis
Abstract:
Trivium is a bit-based stream cipher in the final portfolio of the eSTREAM project. In this paper, we apply the algebraic attack approach of Berbain et al. to Trivium-like ciphers and perform new analyses on them. We demonstrate a new algebraic attack on Bivium-A. This attack requires less time and memory than previous techniques to recover Bivium-A's initial state. Though our attacks on Bivium-B, Trivium and Trivium-N are worse than exhaustive key search, the systems of equations which are constructed are smaller and less complex than those in previous algebraic analyses. We also answer an open question posed by Berbain et al. on the feasibility of applying their technique to Trivium-like ciphers. Factors which can affect the complexity of our attack on Trivium-like ciphers are discussed in detail. The analyses of Bivium-B and Trivium-N are omitted from this manuscript. The full paper is available on the IACR ePrint Archive.
Abstract:
Background Managing large student cohorts can be a challenge for university academics coordinating these units. Bachelor of Nursing programmes have the added challenge of managing multiple groups of students and clinical facilitators whilst completing clinical placement. Clear, time-efficient and effective communication between coordinating academics and clinical facilitators is needed to ensure consistency between student and teaching groups and prompt management of emerging issues. Methods This study used a descriptive survey to explore the use of text messaging via a mobile phone, sent from coordinating academics to off-campus clinical facilitators, as an approach to providing direction and support. Results The response rate was 47.8% (n = 22). Correlations were found between the approachability of the coordinating academic and the clinical facilitators' perceptions that a) the coordinating academic understood issues on clinical placement (r = 0.785, p < 0.001), and b) they were part of the teaching team (r = 0.768, p < 0.001). Analysis of responses to qualitative questions revealed three themes: connection, approachability and collaboration. Conclusions This study demonstrates that the use of regular text messages improves communication between coordinating academics and clinical facilitators. Findings suggest improved connection, approachability and collaboration between the coordinating academic and clinical facilitation staff.
Abstract:
Genomic sequences are fundamentally text documents, admitting various representations according to need and tokenization. Gene expression depends crucially on binding of enzymes to the DNA sequence at small, poorly conserved binding sites, limiting the utility of standard pattern search. However, one may exploit the regular syntactic structure of the enzyme's component proteins and the corresponding binding sites, framing the problem as one of detecting grammatically correct genomic phrases. In this paper we propose new kernels based on weighted tree structures, traversing the paths within them to capture the features which underpin the task. Experimentally, we find that these kernels provide performance comparable with state-of-the-art approaches for this problem, while offering significant computational advantages over earlier methods. The methods proposed may be applied to a broad range of sequence or tree-structured data in molecular biology and other domains.
Abstract:
Background The purpose of this study was to estimate the incidence of fatal and non-fatal Low Speed Vehicle Run Over (LSVRO) events among children aged 0–15 years in Queensland, Australia, at a population level. Methods Fatal and non-fatal LSVRO events that occurred in children resident in Queensland over eleven calendar years (1999–2009) were identified using ICD codes, text description, word searches and medical notes clarification, obtained from five health-related databases across the continuum of care (pre-hospital to fatality). Data were manually linked. Population data provided by the Australian Bureau of Statistics were used to calculate crude incidence rates for fatal and non-fatal LSVRO events. Results There were 1611 LSVROs between 1999 and 2009 (IR = 16.87/100,000/annum). Incidence of non-fatal events (IR = 16.60/100,000/annum) was 61.5 times higher than that of fatal events (IR = 0.27/100,000/annum). LSVRO events were more common in boys (IR = 20.97/100,000/annum) than girls (IR = 12.55/100,000/annum), and among younger children aged 0–4 years (IR = 21.45/100,000/annum; 39% of all events) than older children (5–9 years: IR = 16.47/100,000/annum; 10–15 years: IR = 13.59/100,000/annum). A total of 896 (56.8%) children were admitted to hospital for 24 hours or more following an LSVRO event (IR = 9.38/100,000/annum). Total LSVROs increased from 1999 (IR = 14.79/100,000) to 2009 (IR = 18.56/100,000), but not significantly. Over the 11-year period, there was a slight (non-significant) increase in fatalities (IR = 0.37–0.42/100,000/annum), a significant decrease in admissions (IR = 12.39–5.36/100,000/annum), and a significant increase in non-admissions (IR = 2.02–12.77/100,000/annum). Trends over time differed by age, gender and severity. Conclusion This is the most comprehensive, population-based epidemiological study on fatal and non-fatal LSVRO events to date. Results from this study indicate that LSVROs incur a substantial burden. Further research is required on the characteristics and risk factors associated with these events, in order to adequately inform injury prevention. Strategies are urgently required in order to prevent these events, especially among young children aged 0–4 years.
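The crude rates quoted in this abstract follow the usual events-per-person-years form. The sketch below illustrates that arithmetic only. The person-years value is an assumption, back-calculated from the reported overall rate rather than taken from the study, and the function names are hypothetical.

```python
def crude_incidence_rate(events, person_years, per=100_000):
    """Crude incidence rate: events divided by person-years at risk, scaled to `per`."""
    return events / person_years * per


def rate_ratio(rate_a, rate_b):
    """How many times larger rate_a is than rate_b."""
    return rate_a / rate_b


# Figures reported in the abstract.
total_events = 1611
overall_rate = 16.87  # per 100,000 per annum

# ASSUMPTION: person-years are not given in the abstract, so this value is
# back-calculated from the reported rate purely to illustrate the formula.
implied_person_years = total_events / overall_rate * 100_000

# Ratio of the reported non-fatal rate to the reported fatal rate.
nonfatal_to_fatal = rate_ratio(16.60, 0.27)

print(round(crude_incidence_rate(total_events, implied_person_years), 2))
print(round(nonfatal_to_fatal, 1))
```

The ratio of the two reported rates reproduces the "61.5 times higher" figure stated in the results.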
Abstract:
Background Maternity care reform plans have been proposed at state and national levels in Australia, but the extent to which these respond to maternity care consumers’ expressed needs is unclear. This study examines open-text survey comments to identify women’s unmet needs and priorities for maternity care. It then considers whether these needs and priorities are addressed in current reform plans. Methods Women who had a live single or multiple birth in Queensland, Australia, in 2010 (n = 3,635) were invited to complete a retrospective self-report survey. In addition to questions about clinical and interpersonal maternity care experiences from pregnancy to postpartum, women were asked an open-ended question: “Is there anything else you’d like to tell us about having your baby?” This paper describes a detailed thematic analysis of open-ended responses from a random selection of 150 women (10% of the 1,510 who responded to the question). Results Four broad themes emerged relevant to improving women’s experiences of maternity care: quality of care (interpersonal and technical); access to choices and involvement in decision-making; unmet information needs; and dissatisfaction with the care environment. Some of these topics are reflected in current reform goals, while others provide evidence of the need for further reforms. Conclusions The findings reinforce the importance of some existing maternity reform objectives, and describe how these might best be met. Findings affirm the importance of information provision to enable informed choices, a goal of Queensland and national reform agendas. Improvement opportunities not currently specified in reform agendas were also identified, including the quality of interpersonal relationships between women and staff, particular unmet information needs (e.g., breastfeeding), and concerns regarding the care environment (e.g., crowding and long waiting times).
Abstract:
Efficient Error-Propagating Block Chaining (EPBC) is a block cipher mode intended to simultaneously provide both confidentiality and integrity protection for messages. Mitchell’s analysis pointed out a weakness in the EPBC integrity mechanism that can be used in a forgery attack. This paper identifies and corrects a flaw in Mitchell’s analysis of EPBC, and presents other attacks on the EPBC integrity mechanism.
Abstract:
Background The requirement for dual screening of titles and abstracts to select papers to examine in full text can create a huge workload, not least when the topic is complex and a broad search strategy is required, resulting in a large number of results. An automated system to reduce this burden, while still assuring high accuracy, has the potential to provide huge efficiency savings within the review process. Objectives To undertake a direct comparison of manual screening with a semi‐automated process (priority screening) using a machine classifier. The research is being carried out as part of the current update of a population‐level public health review. Methods Authors have hand selected studies for the review update, in duplicate, using the standard Cochrane Handbook methodology. A retrospective analysis, simulating a quasi‐‘active learning’ process (whereby a classifier is repeatedly trained based on ‘manually’ labelled data) will be completed, using different starting parameters. Tests will be carried out to see how far different training sets, and the size of the training set, affect the classification performance; i.e. what percentage of papers would need to be manually screened to locate 100% of those papers included as a result of the traditional manual method. Results From a search retrieval set of 9555 papers, authors excluded 9494 papers at title/abstract and 52 at full text, leaving 9 papers for inclusion in the review update. The ability of the machine classifier to reduce the percentage of papers that need to be manually screened to identify all the included studies, under different training conditions, will be reported. Conclusions The findings of this study will be presented along with an estimate of any efficiency gains for the author team if the screening process can be semi‐automated using text mining methodology, along with a discussion of the implications for text mining in screening papers within complex health reviews.
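The outcome measure described in this abstract, the percentage of papers that must be manually screened to locate 100% of the included papers, can be computed from any ranked screening order. The sketch below shows only that measure. It does not train a classifier, and the toy ranking and function name are hypothetical.

```python
def fraction_screened_for_full_recall(ranked_labels):
    """Fraction of a ranked list that must be read to find every included paper.

    ranked_labels: 0/1 flags in screening order (highest priority first),
    where 1 marks a paper that the manual review ultimately included.
    """
    total_includes = sum(ranked_labels)
    if total_includes == 0:
        return 0.0  # nothing to find, so nothing needs screening
    found = 0
    for position, label in enumerate(ranked_labels, start=1):
        found += label
        if found == total_includes:
            return position / len(ranked_labels)


# Hypothetical ranking of 20 papers; the 3 included papers sit at ranks 1, 3 and 6.
toy_ranking = [1, 0, 1, 0, 0, 1] + [0] * 14

workload = fraction_screened_for_full_recall(toy_ranking)
print(f"{workload:.0%} of papers screened to reach 100% recall")
```

With a random ordering the expected workload approaches the whole list, so the gap between that baseline and the value above is one way to express an efficiency gain.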
Abstract:
This article presents and evaluates a model to automatically derive word association networks from text corpora. Two aspects were evaluated: To what degree can corpus-based word association networks (CANs) approximate human word association networks with respect to (1) their ability to quantitatively predict word associations and (2) their structural network characteristics. Word association networks are the basis of the human mental lexicon. However, extracting such networks from human subjects is laborious, time-consuming and thus necessarily limited in relation to the breadth of human vocabulary. Automatic derivation of word associations from text corpora would address these limitations. In both evaluations corpus-based processing provided vector representations for words. These representations were then employed to derive CANs using two measures: (1) the well-known cosine metric, which is a symmetric measure, and (2) a new asymmetric measure computed from orthogonal vector projections. For both evaluations, the full set of 4068 free association networks (FANs) from the University of South Florida word association norms was used as baseline human data. Two corpus-based models were benchmarked for comparison: a latent topic model and latent semantic analysis (LSA). We observed that CANs constructed using the asymmetric measure were slightly less effective than the topic model in quantitatively predicting free associates, and slightly better than LSA. The structural network analysis revealed that CANs do approximate the FANs to an encouraging degree.
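The contrast between a symmetric and an asymmetric association measure can be illustrated with two small vectors. The abstract does not define the projection-based measure, so `projection_ratio` below is an assumed stand-in (the projection of one vector onto another, relative to the target's length) and not the article's formula. The vectors are hypothetical.

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def cosine(u, v):
    """Symmetric: cosine(u, v) == cosine(v, u)."""
    return dot(u, v) / (norm(u) * norm(v))


def projection_ratio(u, v):
    """ASSUMED asymmetric measure: signed length of the orthogonal projection
    of u onto v, divided by the length of v. Swapping the arguments changes
    the denominator, so the result generally differs."""
    return dot(u, v) / dot(v, v)


# Hypothetical word vectors of different lengths.
cue = [3.0, 0.0]
target = [1.0, 1.0]

print(cosine(cue, target), cosine(target, cue))
print(projection_ratio(cue, target), projection_ratio(target, cue))
```

Asymmetry matters for free association data because the strength with which a cue evokes a target need not match the strength in the reverse direction.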
Abstract:
The world is rich with information such as signage and maps to help humans navigate. We present a method to extract topological spatial information from a generic bitmap floor plan and build a topometric graph that can be used by a mobile robot for tasks such as path planning and guided exploration. The algorithm first detects and extracts text in an image of the floor plan. Using the locations of the extracted text, flood fill is used to find the rooms and hallways. Doors are found by matching SURF features, and these form the connections between rooms, which are the edges of the topological graph. Our system is able to automatically detect doors and differentiate between hallways and rooms, which is important for effective navigation. We show that our method can extract a topometric graph from a floor plan and is robust against ambiguous cases most commonly seen in floor plans, including elevators and stairwells.
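The flood-fill step described in this abstract can be sketched on a character grid, with each text label's location used as the seed. This is an illustration under simplifying assumptions: the grid, the label names and the seed coordinates are hypothetical, and text detection and SURF-based door matching are outside the sketch.

```python
from collections import deque


def flood_fill(grid, seed):
    """Return the set of free cells 4-connected to `seed`; '#' marks a wall."""
    rows, cols = len(grid), len(grid[0])
    region = set()
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        if (r, c) in region:
            continue  # already visited
        if not (0 <= r < rows and 0 <= c < cols):
            continue  # outside the image
        if grid[r][c] == "#":
            continue  # wall pixel stops the fill
        region.add((r, c))
        queue.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])
    return region


# Hypothetical floor plan: two rooms separated by a wall.
FLOOR_PLAN = [
    "#########",
    "#...#...#",
    "#...#...#",
    "#########",
]

# Hypothetical positions where text labels were detected.
label_seeds = {"Office": (1, 1), "Lab": (1, 5)}

rooms = {name: flood_fill(FLOOR_PLAN, seed) for name, seed in label_seeds.items()}
for name, cells in rooms.items():
    print(name, len(cells))
```

Each filled region becomes a node of the graph. Seeding from label positions means that unlabelled enclosed areas are never filled.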
Abstract:
Objective Death certificates provide an invaluable source for cancer mortality statistics; however, this value can only be realised if accurate, quantitative data can be extracted from certificates, an aim hampered by both the volume and variable nature of certificates written in natural language. This paper proposes an automatic classification system for identifying cancer-related causes of death from death certificates. Methods Detailed features, including terms, n-grams and SNOMED CT concepts, were extracted from a collection of 447,336 death certificates. These features were used to train Support Vector Machine classifiers (one classifier for each cancer type). The classifiers were deployed in a cascaded architecture: the first level identified the presence of cancer (i.e., binary cancer/nocancer) and the second level identified the type of cancer (according to the ICD-10 classification system). A held-out test set was used to evaluate the effectiveness of the classifiers according to precision, recall and F-measure. In addition, detailed feature analysis was performed to reveal the characteristics of a successful cancer classification model. Results The system was highly effective at identifying cancer as the underlying cause of death (F-measure 0.94). The system was also effective at determining the type of cancer for common cancers (F-measure 0.7). Rare cancers, for which there was little training data, were difficult to classify accurately (F-measure 0.12). Factors influencing performance were the amount of training data and certain ambiguous cancers (e.g., those in the stomach region). The feature analysis revealed that a combination of features was important for cancer type classification, with SNOMED CT concept and oncology-specific morphology features proving the most valuable. Conclusion The system proposed in this study provides automatic identification and characterisation of cancers from large collections of free-text death certificates. This allows organisations such as Cancer Registries to monitor and report on cancer mortality in a timely and accurate manner. In addition, the methods and findings are generally applicable beyond cancer classification and to other sources of medical text besides death certificates.
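The evaluation measures named in this abstract (precision, recall and F-measure) can be computed from paired gold and predicted labels. The sketch below covers the first, binary cancer/nocancer level only. The toy labels are hypothetical and do not reproduce the study's data or classifiers.

```python
def precision_recall_f1(gold, predicted, positive):
    """Precision, recall and balanced F-measure for one positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical held-out labels for the binary first level of the cascade.
gold = ["cancer", "cancer", "cancer", "nocancer", "nocancer", "cancer"]
predicted = ["cancer", "cancer", "nocancer", "cancer", "nocancer", "cancer"]

precision, recall, f1 = precision_recall_f1(gold, predicted, positive="cancer")
print(precision, recall, f1)
```

For the second level, the same function can be called once per cancer type, which is why rare types with few positive examples tend to show unstable scores.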
Abstract:
In this paper, we present the results of an exploratory study that examined the problem of automating content analysis of student online discussion transcripts. We looked at the problem of coding discussion transcripts for the levels of cognitive presence, one of the three main constructs in the Community of Inquiry (CoI) model of distance education. Using Coh-Metrix and LIWC features, together with a set of custom features developed to capture discussion context, we developed a random forest classification system that achieved 70.3% classification accuracy and 0.63 Cohen's kappa, which is significantly higher than values reported in previous studies. Besides the improvement in classification accuracy, the developed system is also less prone to overfitting, as it uses only 205 classification features, which is around 100 times fewer features than in similar systems based on bag-of-words features. We also provide an overview of the classification features most indicative of the different phases of cognitive presence, which gives additional insights into the nature of the cognitive presence learning cycle. Overall, our results show great potential of the proposed approach, with an added benefit of providing further characterization of the cognitive presence coding scheme.
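Accuracy and Cohen's kappa, the two figures reported in this abstract, differ in that kappa discounts the agreement expected by chance. The sketch below computes both for two label sequences. The labels are hypothetical placeholders, not the study's coding data.

```python
from collections import Counter


def accuracy(labels_a, labels_b):
    """Observed agreement: share of items given the same label."""
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: (observed - chance agreement) / (1 - chance agreement)."""
    n = len(labels_a)
    observed = accuracy(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from the two marginal label distributions.
    expected = sum(
        counts_a[label] * counts_b[label]
        for label in set(labels_a) | set(labels_b)
    ) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned by a human coder and by a classifier.
human = ["phase_1", "phase_1", "phase_2", "phase_2"]
system = ["phase_1", "phase_1", "phase_2", "phase_1"]

print(accuracy(human, system), cohens_kappa(human, system))
```

Here three of four items agree, yet kappa is lower than accuracy because half of that agreement would be expected from the label frequencies alone.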