Digital Library

897 results for Voiced or unvoiced classification

Using Web Search Logs to Identify Query Classification Terms

Relevance: 30.00%

Abstract:

Purpose – The work presented in this paper aims to provide an approach to classifying web logs by personal properties of users. Design/methodology/approach – The authors describe an iterative system that begins with a small set of manually labeled terms, which are used to label queries from the log. A set of background knowledge related to these labeled queries is acquired by combining web search results on these queries. This background set is used to obtain many terms that are related to the classification task. The system then ranks each of the related terms, choosing those that best fit the personal properties of the users. These terms are then used to begin the next iteration. Findings – The authors identify the difficulties of classifying web logs by approaching this problem from a machine learning perspective. By applying the approach developed, the authors are able to show that many queries in a large query log can be classified. Research limitations/implications – Testing results in this type of classification work is difficult, as the true personal properties of web users are unknown. Evaluating the classification results by comparing classified queries to well-known age-related sites is a direction that is currently being explored. Practical implications – This research is background work that can be incorporated in search engines or other web-based applications, to help marketing companies and advertisers. Originality/value – This research enhances the current state of knowledge in short-text classification and query log learning.

Keywords: Classification schemes, Computer networks, Information retrieval, Man-machine systems, User interfaces
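The iterative loop described in this abstract can be illustrated with a small sketch. It is not the authors' system: the function name `expand_terms`, the parameters `rounds` and `top_k`, and the use of word co-occurrence inside the log as a stand-in for the web-search background knowledge are all assumptions made for illustration.

```python
from collections import Counter


def expand_terms(queries, seed_terms, rounds=2, top_k=2):
    """Grow a term set from a query log, starting from hand-labeled seeds.

    Hypothetical simplification: candidate terms come from words that
    co-occur with known terms in the log, not from web search results.
    """
    terms = set(seed_terms)
    for _ in range(rounds):
        # Label every query that contains at least one known term.
        labeled = [q for q in queries if terms & set(q.lower().split())]
        # Count candidate words that appear in the labeled queries.
        counts = Counter(
            w for q in labeled for w in q.lower().split() if w not in terms
        )
        # Rank candidates by frequency and keep the best few.
        new_terms = [w for w, _ in counts.most_common(top_k)]
        if not new_terms:
            break  # nothing left to learn from this log
        terms.update(new_terms)  # these terms seed the next iteration
    return terms
```

Each pass labels more queries than the previous one, which is what lets a handful of seed terms reach a large share of the log. A real ranking step would likely score terms by how specific they are to the target user property, not by raw frequency.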

A classification framework for design-build variants from an operational perspective

Relevance: 30.00%

Abstract:

Design-build (DB) is a generic form of construction procurement. Rather than representing a single system, it has evolved in practice into a variety of forms that are similar to, and yet different from, each other. Although the importance of selecting an appropriate DB variant has been widely accepted, difficulties occur in practice due to the multiplicity of terms and concepts used. What is needed is some kind of taxonomy or framework within which the individual variants can be placed and their relative attributes identified and understood. Through a comprehensive literature review and content analysis, this paper establishes a systematic classification framework for DB variants based on their operational attributes. In addition to providing much-needed support for decision-making, this classification framework gives client/owners a way to understand and examine different categories of DB variants from an operational perspective.

Unsupervised multi-label text classification using a world knowledge ontology

Relevance: 30.00%

Abstract:

The development of text classification techniques has been strongly driven in the past decade by the increasing availability and widespread use of digital documents. Usually, the performance of text classification relies on the quality of categories and the accuracy of classifiers learned from samples. When training samples are unavailable or categories are of poor quality, text classification performance is degraded. In this paper, we propose an unsupervised multi-label text classification method to classify documents using a large set of categories stored in a world ontology. The approach was evaluated against typical text classification methods on a real-world document collection, using ground truth encoded by human experts, with promising results.

Strategy taxonomy and classification system development : study of two state governments

Relevance: 30.00%

Abstract:

There is limited understanding of the business strategies of government departments in parliamentary systems. This study focuses on the strategies of departments of two state governments in Australia. The strategies are derived from departmental strategic plans available in the public domain and collected from the respective websites. The results of this research indicate that strategies fall into seven categories: internal, development, political, partnership, environment, reorientation and status quo. The strategies of the departments are mainly internal or development, with development strategy mainly the focus of departments such as transport and infrastructure. Political strategy is prevalent for departments related to communities, and to education and training. Further, three layers of strategies are identified (kernel, cluster and individual), which are mapped to the developed taxonomy.

Distribution feeder loads classification and decomposition

Relevance: 30.00%

Abstract:

Load in distribution networks is normally measured at the 11 kV supply points; little or no information is known about the type of customers and their contributions to the load. This paper proposes statistical methods to decompose an unknown distribution feeder load into its customer load sector/subsector profiles. The approach used in this paper should assist electricity suppliers in economic load management, strategic planning and future network reinforcements.

A new classification and database for stratospheric dust particles

Relevance: 30.00%

Abstract:

With the increasing number of stratospheric particles available for study (via the U2 and/or WB57F collections), it is essential that a simple, yet rational, classification scheme be developed for general use. Such a scheme should be applicable to all particles collected from the stratosphere, rather than limited to only extraterrestrial or chemical sub-groups. Criteria for the efficacy of such a scheme would include: (a) objectivity, (b) ease of use, (c) acceptance within the broader scientific community and (d) how well the classification provides intrinsic categories which are consistent with our knowledge of particle types present in the stratosphere.

Supplementary Information : Hogan, Holland, Holloway, Petit and Read : Read Classification for Next Generation Sequencing, ESANN 2013, April 2013

Relevance: 30.00%

Abstract:

This item provides supplementary materials for the paper mentioned in the title, specifically a range of organisms used in the study. The full abstract for the main paper is as follows: Next Generation Sequencing (NGS) technologies have revolutionised molecular biology, allowing clinical sequencing to become a matter of routine. NGS data sets consist of short sequence reads obtained from the machine, given context and meaning through downstream assembly and annotation. For these techniques to operate successfully, the collected reads must be consistent with the assumed species or species group, and not corrupted in some way. The common bacterium Staphylococcus aureus may cause severe and life-threatening infections in humans, with some strains exhibiting antibiotic resistance. In this paper, we apply an SVM classifier to the important problem of distinguishing S. aureus sequencing projects from alternative pathogens, including closely related Staphylococci. Using a sequence k-mer representation, we achieve precision and recall above 95%, implicating features with important functional associations.
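A sequence k-mer representation turns each read into a fixed-length vector of k-mer frequencies, which a classifier such as an SVM can then consume. The sketch below is only illustrative: the function name `kmer_vector`, the default `k=2` and the normalisation are assumptions, since the abstract does not state the value of k or the preprocessing used.

```python
from collections import Counter
from itertools import product


def kmer_vector(read, k=2, alphabet="ACGT"):
    """Return normalised k-mer frequencies of a read in a fixed k-mer order."""
    # Count every overlapping window of length k in the read.
    counts = Counter(read[i:i + k] for i in range(len(read) - k + 1))
    # Enumerate all possible k-mers so every read maps to the same dimensions.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # Windows containing symbols outside the alphabet (e.g. N) are ignored.
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]
```

With a four-letter alphabet the vector has 4^k entries, so larger k gives more specific features at the cost of a much wider and sparser representation.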

Naming or creating a problem?

Relevance: 30.00%

Abstract:

In this chapter, we are going to consider how language and practice interact in the process of supporting the learning of students with diverse abilities. You will learn that it is necessary for teachers to understand that while labels carry an administrative function in schools, when used carelessly they operate to stigmatise and exclude those whom we are working to include. This chapter will introduce the concept of equity and explain how the dilemma of difference emerges when we try to determine who should receive support and how. The chapter will also explain how an appreciation of language can help to inform and transform our pedagogy. An example of inclusion in action is provided to illustrate how inclusive language in practice can promote deep cultural changes that benefit both students and teachers. The process of determining appropriate and effective education of students with additional support requirements is troubled by what some refer to as the ‘dilemma of difference’. This dilemma derives mainly from the nature of language and our need to use certain words, terms and categories in order to share common understandings. Without these, educators cannot hope to arrive on the same page, yet such words can take on a life of their own; influencing thoughts, perspectives and attitudes in ways that far outstrip original intentions. The drive for clarity, however, through definition and diagnostic classification can ultimately obscure because of the cultural meanings that become invested within these terms through their use over time and in different professional contexts. In effect, trying to define “difference” in order to provide the right support to particular students is a process that entrenches normative boundaries that in turn create, accentuate and stigmatise whatever we have decided constitutes difference. Language is thus a powerful and dangerous weapon but, like other weapons, language can both hurt and defend. 
Understanding the power of language enables educators to use it both wisely and safely to the maximum benefit of their students. This chapter will discuss how teachers can recognise and support their students in ways that avoid stigma and the closure of stereotyping.

Classification of railway bridges based on criticality and vulnerability factors

Relevance: 30.00%

Abstract:

Bridges are currently rated individually for maintenance and repair action according to the structural conditions of their elements. Dealing with thousands of bridges and the many factors that cause deterioration makes this rating process extremely complicated. The current simplified but practical methods are not accurate enough. On the other hand, the sophisticated, more accurate methods are only used for a single bridge or a particular bridge type. It is therefore necessary to develop a practical and accurate rating system for a network of bridges. The first and most important step in achieving this aim is to classify bridges based on the differences in nature and the unique characteristics of the critical factors, and the relationship between them, for a network of bridges. Critical factors and vulnerable elements will be identified and placed in different categories. This classification method will be used to develop a new practical rating method for a network of railway bridges based on criticality and vulnerability analysis. This rating system will be more accurate and economical, and will improve the safety and serviceability of railway bridges.

Read classification for next generation sequencing

Relevance: 30.00%

Abstract:

Next Generation Sequencing (NGS) has revolutionised molecular biology, allowing routine clinical sequencing. NGS data consists of short sequence reads, given context through downstream assembly and annotation, a process requiring reads consistent with the assumed species or species group. The common bacterium Staphylococcus aureus may cause severe and life-threatening infections in humans, with some strains exhibiting antibiotic resistance. Here we apply an SVM classifier to the important problem of distinguishing S. aureus sequencing projects from other pathogens, including closely related Staphylococci. Using a sequence k-mer representation, we achieve precision and recall above 95%, implicating features with important functional associations.

Predicting fault-prone software modules with rank sum classification

Relevance: 30.00%

Abstract:

The detection and correction of defects remains among the most time-consuming and expensive aspects of software development. Extensive automated testing and code inspections may mitigate their effect, but some code fragments are necessarily more likely to be faulty than others, and automated identification of fault-prone modules helps to focus testing and inspections, thus limiting wasted effort and potentially improving detection rates. However, software metrics data is often extremely noisy, with enormous imbalances in the size of the positive and negative classes. In this work, we present a new approach to predictive modelling of fault proneness in software modules, introducing a new feature representation to overcome some of these issues. This rank sum representation offers improved or at worst comparable performance to earlier approaches for standard data sets, and readily allows the user to choose an appropriate trade-off between precision and recall to optimise inspection effort to suit different testing environments. The method is evaluated using the NASA Metrics Data Program (MDP) data sets, and performance is compared with existing studies based on the Support Vector Machine (SVM) and Naïve Bayes (NB) classifiers, and with our own comprehensive evaluation of these methods.

Local inter-session variability modelling for object classification

Relevance: 30.00%

Abstract:

Object classification is plagued by the issue of session variation. Session variation describes any variation that makes one instance of an object look different to another, for instance due to pose or illumination variation. Recent work in the challenging task of face verification has shown that session variability modelling provides a mechanism to overcome some of these limitations. However, for computer vision purposes, it has only been applied in the limited setting of face verification. In this paper we propose a local region-based inter-session variability (ISV) modelling approach, termed Local ISV, so that local session variations can be modelled, and apply it to challenging real-world data. We then demonstrate the efficacy of this technique on a challenging real-world fish image database which includes images taken underwater, providing significant real-world session variations. This Local ISV approach provides a relative performance improvement of, on average, 23% on the challenging MOBIO, Multi-PIE and SCface face databases. It also provides a relative performance improvement of 35% on our challenging fish image dataset.

Classification of pathology reports for cancer registry notifications

Relevance: 30.00%

Abstract:

Objective: To develop a system for the automatic classification of pathology reports for Cancer Registry notifications. Method: A two-pass approach is proposed to classify whether pathology reports are cancer notifiable or not. The first pass queries pathology HL7 messages for known report types that are received by the Queensland Cancer Registry (QCR), while the second pass aims to analyse the free-text reports and identify those that are cancer notifiable. Cancer Registry business rules, natural language processing and symbolic reasoning using the SNOMED CT ontology were adopted in the system. Results: The system was developed on a corpus of 500 histology and cytology reports (with 47% notifiable reports) and evaluated on an independent set of 479 reports (with 52% notifiable reports). Results show that the system can reliably classify cancer notifiable reports with a sensitivity, specificity, and positive predictive value (PPV) of 0.99, 0.95, and 0.95, respectively, for the development set, and 0.98, 0.96, and 0.96 for the evaluation set. High sensitivity can be achieved at a slight expense in specificity and PPV. Conclusion: The system demonstrates how medical free-text processing enables the classification of cancer notifiable pathology reports with high reliability for potential use by Cancer Registries and pathology laboratories.
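For readers unfamiliar with the three measures reported here, the sketch below shows how sensitivity, specificity and PPV follow from the four cells of a confusion matrix. The function name `binary_metrics` is hypothetical, and the abstract gives only the final rates, so any counts passed in are the caller's own and not the study's data.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute (sensitivity, specificity, PPV) from confusion-matrix counts.

    tp/fn: notifiable reports classified correctly/incorrectly.
    tn/fp: non-notifiable reports classified correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn)  # share of notifiable reports that are caught
    specificity = tn / (tn + fp)  # share of non-notifiable reports left alone
    ppv = tp / (tp + fp)          # share of flagged reports that are notifiable
    return sensitivity, specificity, ppv
```

Lowering the decision threshold of a classifier typically moves errors from `fn` to `fp`, which raises sensitivity while lowering specificity and PPV. That is the trade-off the abstract refers to.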

Automatic classification of free-text radiology reports to identify limb fractures using machine learning and the SNOMED CT ontology

Relevance: 30.00%

Abstract:

Objective: To develop and evaluate machine learning techniques that identify limb fractures and other abnormalities (e.g. dislocations) from radiology reports. Materials and Methods: 99 free-text reports of limb radiology examinations were acquired from an Australian public hospital. Two clinicians were employed to identify fractures and abnormalities from the reports; a third senior clinician resolved disagreements. These assessors found that, of the 99 reports, 48 referred to fractures or abnormalities of limb structures. Automated methods were then used to extract features from these reports that could be useful for their automatic classification. The Naive Bayes classification algorithm and two implementations of the support vector machine algorithm were formally evaluated using cross-fold validation over the 99 reports. Results: The Naive Bayes classifier accurately identifies fractures and other abnormalities from the radiology reports. These results were achieved when extracting stemmed token bigram and negation features, as well as when using these features in combination with SNOMED CT concepts related to abnormalities and disorders. The latter feature has not been used in previous works that attempted to classify free-text radiology reports. Discussion: Automated classification methods have proven effective at identifying fractures and other abnormalities from radiology reports (F-Measure up to 92.31%). Key to the success of these techniques are features such as stemmed token bigrams, negations, and SNOMED CT concepts associated with morphologic abnormalities and disorders. Conclusion: This investigation shows early promising results, and future work will further validate and strengthen the proposed approaches.
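The token bigram and negation features mentioned in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the study's pipeline: the function name `report_features` and the cue list `NEGATION_CUES` are hypothetical, stemming is omitted, and the SNOMED CT concept features are not covered.

```python
import re

# Hypothetical, deliberately short list of negation cue words.
NEGATION_CUES = {"no", "not", "without"}


def report_features(text):
    """Extract token-bigram features and a simple negation flag from a report."""
    # Lowercase and keep alphabetic tokens only (no stemming in this sketch).
    tokens = re.findall(r"[a-z]+", text.lower())
    # Pair each token with its successor to form bigrams.
    features = {f"bigram={a}_{b}" for a, b in zip(tokens, tokens[1:])}
    # Flag the report if any negation cue occurs anywhere in it.
    if NEGATION_CUES & set(tokens):
        features.add("has_negation")
    return features
```

Bigrams matter in this setting because "no fracture" and "fracture" point to opposite labels, a distinction that single-token features lose. Feature sets like these could then be fed to a Naive Bayes or SVM classifier.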

Supplementary material : large scale read classification for next generation sequencing

Relevance: 30.00%

Abstract:

Next Generation Sequencing (NGS) has revolutionised molecular biology, resulting in an explosion of data sets and an increasing role in clinical practice. Such applications necessarily require rapid identification of the organism as a prelude to annotation and further analysis. NGS data consist of a substantial number of short sequence reads, given context through downstream assembly and annotation, a process requiring reads consistent with the assumed species or species group. Highly accurate results have been obtained for restricted sets using SVM classifiers, but such methods are difficult to parallelise and success depends on careful attention to feature selection. This work examines the problem at very large scale, using a mix of synthetic and real data with a view to determining the overall structure of the problem and the effectiveness of parallel ensembles of simpler classifiers (principally random forests) in addressing the challenges of large scale genomics.
