81 results for Document description


Relevance:

20.00% 20.00%

Publisher:

Abstract:

Descriptions of patients' injuries are recorded in narrative text form by hospital emergency departments. For statistical reporting, this text data needs to be mapped to pre-defined codes. Existing research in this field uses the Naïve Bayes probabilistic method to build classifiers for mapping. In this paper, we focus on providing guidance on the selection of a classification method. We build a number of classifiers belonging to different classification families such as decision tree, probabilistic, neural networks, and instance-based, ensemble-based and kernel-based linear classifiers. Extensive pre-processing is carried out to ensure the quality of the data and, hence, the quality of the classification outcome. Records with a null entry in the injury description are removed. Misspellings are corrected by finding each misspelt word and replacing it with a similar-sounding word. Meaningful phrases have been identified and kept, instead of removing part of a phrase as a stop word. Abbreviations appearing in many forms of entry are manually identified and only one form of each abbreviation is used. Clustering is utilised to discriminate between non-frequent and frequent terms. This process reduced the number of text features dramatically from about 28,000 to 5000. The medical narrative text injury dataset under consideration is composed of many short documents. The data can be characterized as high-dimensional and sparse, i.e., few features are irrelevant but features are correlated with one another. Therefore, matrix factorization techniques such as Singular Value Decomposition (SVD) and Non-negative Matrix Factorization (NNMF) have been used to map the processed feature space to a lower-dimensional feature space. Classifiers with these reduced feature spaces have been built. In experiments, a set of tests is conducted to determine which classification method is best for medical text classification. Non-negative Matrix Factorization with a Support Vector Machine can achieve 93% precision, which is higher than all the tested traditional classifiers. We also found that TF/IDF weighting, which works well for long text classification, is inferior to binary weighting in short document classification. Another finding is that the Top-n terms should be removed in consultation with medical experts, as their removal affects the classification performance.
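To make the contrast between binary and TF/IDF weighting concrete, the following is a minimal sketch in plain Python. It is not the authors' pipeline: the whitespace tokenizer, the function names, the unsmoothed `log(N / df)` form of IDF, and the three example narratives are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lower-casing plus whitespace splitting stands in for
    # the much richer pre-processing described in the abstract.
    return text.lower().split()


def build_vocab(tokenized_docs):
    # Sorted vocabulary so that vector positions are deterministic.
    return sorted({term for tokens in tokenized_docs for term in tokens})


def binary_vectors(docs):
    """Binary weighting: 1 if the term occurs in the document, else 0."""
    tokenized = [tokenize(d) for d in docs]
    vocab = build_vocab(tokenized)
    vectors = []
    for tokens in tokenized:
        present = set(tokens)
        vectors.append([1 if term in present else 0 for term in vocab])
    return vocab, vectors


def tfidf_vectors(docs):
    """TF/IDF weighting: raw term frequency times log(N / document frequency)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = build_vocab(tokenized)
    n_docs = len(docs)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append(
            [term_freq[term] * math.log(n_docs / doc_freq[term]) for term in vocab]
        )
    return vocab, vectors


# Hypothetical short injury narratives, invented for this example.
docs = ["fell off ladder", "cut finger with knife", "fell on wet floor"]
vocab, binary = binary_vectors(docs)
_, tfidf = tfidf_vectors(docs)
```

In very short documents most terms occur at most once, so the term-frequency component carries little information and the two schemes differ mainly through the IDF factor.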

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The use of ‘topic’ concepts has shown improved search performance, given a query, by bringing together relevant documents which use different terms to describe a higher level concept. In this paper, we propose a method for discovering and utilizing concepts in indexing and search for a domain specific document collection being utilized in industry. This approach differs from others in that we only collect focused concepts to build the concept space and that instead of turning a user’s query into a concept based query, we experiment with different techniques of combining the original query with a concept query. We apply the proposed approach to a real-world document collection and the results show that in this scenario the use of concept knowledge at index and search can improve the relevancy of results.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Objective: Individuals with chronic whiplash-associated disorders (WADs) often note driving as a difficult task. This study’s aims were to (1) compare, while driving, neck motor performance, mental effort, and fatigue in individuals with chronic WAD against healthy controls and (2) investigate the relationships of these variables and neck pain to self-reported driving difficulty in the WAD group. Design: This study involved 14 participants in each group (WAD and control). Measures included self-reported driving difficulty and measures of neck pain intensity, overall fatigue, mental effort, and neck motor performance (head rotation and upper trapezius activity) while driving a simulator. Results: The WAD group had greater absolute path of head rotation in a simulated city area and used greater mental effort (P = 0.04), but there were no differences in other measures while driving compared with the controls (all P ≥ 0.05). Self-reported driving difficulty correlated moderately with neck pain intensity, fatigue level, and maximum velocity of head rotation while driving in the WAD group (all P < 0.05). Conclusions: Individuals with chronic WAD do not seem to have impaired neck motor performance while driving yet use greater mental effort. Neck pain, fatigue, and maximum head rotation velocity could be potential contributors to self-reported driving difficulty in this group.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Clustering is an important technique in organising and categorising web-scale documents. The main challenges faced in clustering the billions of documents available on the web are the processing power required and the sheer size of the datasets available. More importantly, it is nigh impossible to generate the labels for a general web document collection containing billions of documents and a vast taxonomy of topics. However, document clusters are most commonly evaluated by comparison to a ground truth set of labels for documents. This paper presents a clustering and labeling solution in which Wikipedia is clustered and hundreds of millions of web documents in ClueWeb12 are mapped onto those clusters. This solution is based on the assumption that Wikipedia contains such a wide range of diverse topics that it represents a small-scale web. We found that it was possible to perform the web-scale document clustering and labeling process on one desktop computer in under a couple of days for the Wikipedia clustering solution containing about 1000 clusters. It takes longer to execute a solution with finer-granularity clusters, such as 10,000 or 50,000. These results were evaluated using a set of external data.
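The mapping step described above, assigning an unseen web document to one of a set of pre-computed clusters, can be sketched as a nearest-centroid lookup. The abstract does not state which similarity measure or document representation is used, so cosine similarity over sparse term-weight dictionaries is an assumption here, and the labels, centroids, and function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as term -> weight dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_to_cluster(doc, centroids):
    """Return the label of the centroid most similar to the document."""
    return max(centroids, key=lambda label: cosine(doc, centroids[label]))


# Hypothetical centroids, standing in for clusters learned from Wikipedia.
centroids = {
    "sport": {"football": 1.0, "goal": 0.5},
    "music": {"guitar": 1.0, "concert": 0.5},
}
web_doc = {"goal": 1.0, "match": 1.0}
label = assign_to_cluster(web_doc, centroids)
```

Because each document is compared only with the centroids and never with other documents, the cost of the mapping grows linearly with the number of documents, which is one reason a coarser solution with about 1000 clusters is cheaper than one with 10,000 or 50,000.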

Relevance:

20.00% 20.00%

Publisher:

Abstract:

A long-held assumption in entrepreneurship research is that normal (i.e., Gaussian) distributions characterize variables of interest for both theory and practice. We challenge this assumption by examining more than 12,000 nascent, young, and hyper-growth firms. Results reveal that variables which play central roles in resource-, cognition-, action-, and environment-based entrepreneurship theories exhibit highly skewed power law distributions, where a few outliers account for a disproportionate amount of the distribution's total output. Our results call for the development of new theory to explain and predict the mechanisms that generate these distributions and the outliers therein. We offer a research agenda, including a description of non-traditional methodological approaches, to answer this call.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The Australian species of the Orthocladiinae genus Cricotopus Wulp (Diptera: Chironomidae) are revised for larval, pupal, adult male and female life stages. Eleven species, ten of which are new, are recognised and keyed, namely Cricotopus acornis Drayson & Cranston sp. nov., Cricotopus albitarsis Hergstrom sp. nov., Cricotopus annuliventris (Skuse), Cricotopus brevicornis Drayson & Cranston sp. nov., Cricotopus conicornis Drayson & Cranston sp. nov., Cricotopus hillmani Drayson & Cranston sp. nov., Cricotopus howensis Cranston sp. nov., Cricotopus parbicinctus Hergstrom sp. nov., Cricotopus tasmania Drayson & Cranston sp. nov., Cricotopus varicornis Drayson & Cranston sp. nov. and Cricotopus wangi Cranston & Krosch sp. nov. Using data from this study, we consider the wider utility of morphological and molecular diagnostic tools in untangling species diversity in the Chironomidae. Morphological support for distinguishing Cricotopus from Paratrichocladius Santo-Abreu in larval and pupal stages appears lacking for Australian taxa, and brief notes are provided concerning this matter.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

We propose the use of optical flow information as a method for detecting and describing changes in the environment, from the perspective of a mobile camera. We analyze the characteristics of the optical flow signal and demonstrate how robust flow vectors can be generated and used for the detection of depth discontinuities and appearance changes at key locations. To successfully achieve this task, a full discussion on camera positioning, distortion compensation, noise filtering, and parameter estimation is presented. We then extract statistical attributes from the flow signal to describe the location of the scene changes. We also employ clustering and dominant shape of vectors to increase the descriptiveness. Once a database of nodes (where a node is a detected scene change) and their corresponding flow features is created, matching can be performed whenever nodes are encountered, such that topological localization can be achieved. We retrieve the most likely node according to the Mahalanobis and Chi-square distances between the current frame and the database. The results illustrate the applicability of the technique for detecting and describing scene changes in diverse lighting conditions, considering indoor and outdoor environments and different robot platforms.
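The retrieval step, picking the database node whose flow features are closest to those of the current frame, can be illustrated with the Chi-square distance. This is a minimal sketch under stated assumptions: each node is reduced to a single normalised histogram, the symmetric form 0.5 · Σ (a − b)² / (a + b) is used (one common definition, not necessarily the one in the paper), and the node names and values are invented.

```python
def chi_square_distance(h1, h2):
    """Symmetric Chi-square distance between two histograms of equal length."""
    total = 0.0
    for a, b in zip(h1, h2):
        if a + b > 0:  # skip bins that are empty in both histograms
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total


def most_likely_node(query, database):
    """Return the name of the stored node with the smallest distance to the query."""
    return min(database, key=lambda name: chi_square_distance(query, database[name]))


# Hypothetical database: node name -> histogram of flow-vector orientations.
database = {
    "doorway": [0.7, 0.2, 0.1],
    "corridor_end": [0.1, 0.3, 0.6],
}
current_frame = [0.6, 0.3, 0.1]
match = most_likely_node(current_frame, database)
```

A Mahalanobis-based variant would follow the same pattern, with the distance function replaced by one that accounts for the covariance of the stored features.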

Relevance:

20.00% 20.00%

Publisher:

Abstract:

For traditional information filtering (IF) models, it is often assumed that the documents in one collection are only related to one topic. However, in reality users’ interests can be diverse and the documents in the collection often involve multiple topics. Topic modelling was proposed to generate statistical models to represent multiple topics in a collection of documents, but in a topic model, topics are represented by distributions over words, which are limited in their ability to distinctively represent the semantics of topics. Patterns are generally thought to be more discriminative than single terms and are able to reveal the inner relations between words. This paper proposes a novel information filtering model, the Significant matched Pattern-based Topic Model (SPBTM). The SPBTM represents user information needs in terms of multiple topics, and each topic is represented by patterns. More importantly, the patterns are organized into groups based on their statistical and taxonomic features, from which the more representative patterns, called Significant Matched Patterns, can be identified and used to estimate the document relevance. Experiments on benchmark data sets demonstrate that the SPBTM significantly outperforms the state-of-the-art models.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Objective: To synthesise recent research on the use of machine learning approaches to mining textual injury surveillance data. Design: Systematic review. Data sources: The electronic databases searched included PubMed, Cinahl, Medline, Google Scholar, and Proquest. The bibliography of all relevant articles was examined and associated articles were identified using a snowballing technique. Selection criteria: For inclusion, articles were required to meet the following criteria: (a) used a health-related database, (b) focused on injury-related cases, and (c) used machine learning approaches to analyse textual data. Methods: The papers identified through the search were screened, resulting in 16 papers selected for review. Articles were reviewed to describe the databases and methodology used, the strengths and limitations of different techniques, and the quality assurance approaches used. Due to heterogeneity between studies, meta-analysis was not performed. Results: Occupational injuries were the focus of half of the machine learning studies, and the most common methods described were Bayesian probability or Bayesian network based methods to either predict injury categories or extract common injury scenarios. Models were evaluated through comparison with gold standard data, content expert evaluation, or statistical measures of quality. Machine learning was found to provide high precision and accuracy when predicting a small number of categories, and was valuable for visualisation of injury patterns and prediction of future outcomes. However, difficulties related to generalizability, source data quality, complexity of models and integration of content and technical knowledge were discussed. Conclusions: The use of narrative text for injury surveillance has grown in popularity, complexity and quality over recent years. With advances in data mining techniques, increased capacity for analysis of large databases, and involvement of computer scientists in the injury prevention field, along with more comprehensive use and description of quality assurance methods in text mining approaches, it is likely that we will see continued growth and advancement in knowledge of text mining in the injury field.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Non-thermal plasma (NTP) has been introduced over the past several years as a promising method for nitrogen oxide (NOx) removal. The intent, when using NTP, is to selectively transfer input electrical energy to the electrons, which generate free radicals through collisions and promote the desired chemical changes in the exhaust gases, rather than expending that energy in heating the entire gas stream. The generated active species react with the pollutant molecules and decompose them. This paper reviews and summarizes relevant literature regarding various aspects of the application of NTP technology to NOx removal from exhaust gases. A comprehensive description of available scientific literature on NOx removal using NTP technology is presented, including various types of NTP, e.g. dielectric barrier discharge, corona discharge and electron beam. Furthermore, the combination of NTP with catalyst and adsorbent for better NOx removal efficiency is presented in detail. The removal of NOx from both simulated gases and real diesel engines is also considered in this review paper. As NTP is a new technique and is not yet commercialized, there is a need for more studies to be performed in this field.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This report documents the recent progress (current as of December 2014) of the research project investigating novice driver safety in Oman. Included in this report is a summary of progress with publications to date, as well as a description of the preliminary results of the first phase of the quantitative survey with young drivers. With regard to the publications which have resulted from this research, two journal articles have been published in print, one is under review, and a fourth is in the late stages of development for submission...

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The rehabilitation programs for bone-anchored prostheses relying on either the OPRA (Integrum, Sweden) or the ILP (Orthodynamics, Germany) fixation involve some form of static load bearing exercises (LBE). So far, most biomechanical studies of these static LBEs have focused on direct measurements of the actual forces and moments applied to the OPRA fixation of individuals with transfemoral amputation (TFA). To date, the proof-of-concept of an apparatus to conduct these kinetic measurements has been presented, along with some preliminary data. Understanding the kinetic data is essential to improve rehabilitation programs as well as the design of upcoming loading frames. However, kinetic information alone is difficult to interpret without concomitant kinematic data. The purpose of this preliminary study was to introduce a qualitative analysis describing the different body postures during LBE for a group of individuals with TFA.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This cross-disciplinary study was conducted as two research and development projects. The outcome is a multimodal and dynamic chronicle, which incorporates the tracking of spatial, temporal and visual elements of performative practice-led and design-led research journeys. The distilled model provides a strong new approach to demonstrating rigour in non-traditional research outputs, including provenance and an 'augmented web of facticity'.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This paper uses discourse analysis techniques associated with Foucauldian archaeology to examine a teacher education accreditation document from Australia to reveal how graduating teachers are constructed through the discourses presented. The findings reveal a discursive site of contestation within the document itself and a mismatch between the identified policy discourses and those from the academic archive. The authors suggest that rather than contradictory representations of what constitutes graduating teacher quality and professionalism, what is needed is an accreditation process that agrees on constructions of graduate identity and professional practice that enact an intellectual and reflexive form of professionalism.