904 resultados para SVM (Support Vector Machine)


Relevância:

30.00% 30.00%

Publicador:

Resumo:

The deployment of systems for human-to-machine communication by voice requires overcoming a variety of obstacles that affect the speech-processing technologies. Problems encountered in the field might include variation in speaking style, acoustic noise, ambiguity of language, or confusion on the part of the speaker. The diversity of these practical problems encountered in the "real world" leads to the perceived gap between laboratory and "real-world" performance. To answer the question "What applications can speech technology support today?" the concept of the "degree of difficulty" of an application is introduced. The degree of difficulty depends not only on the demands placed on the speech recognition and speech synthesis technologies but also on the expectations of the user of the system. Experience has shown that deployment of effective speech communication systems requires an iterative process. This paper discusses general deployment principles, which are illustrated by several examples of human-machine communication systems.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Allozyme and molecular sequence data from the malaria vector Anopheles flavirostris (Ludlow) (Diptera: Culicidae) were analysed from 34 sites throughout the Philippines, including the type locality, to test the hypothesis that this taxon is a single panmictic species. A finer-scaled allozyme study, of mainly Luzon samples, revealed no fixed genetic differences in sympatric sites and only low levels of variation. We obtained data from partial sequences for the internal transcribed spacer 2 (ITS2) (483 bp), the third domain (D3) (330 bp) of the 28S ribosomal DNA subunit and cytochrome c oxidase subunit I (COI) of mitochondrial DNA (261 bp). No sequence variation was observed for ITS2, only a one base pair difference was observed between Philippine and Indonesian D3 sequences and An. flavirostris sequences were unique, confirming their diagnostic value for this taxon. Sixteen COI haplotypes were identified, giving 25 parsimony informative sites. Neighbour-Joining, Maximum Parsimony, Maximum Likelihood and Bayesian phylogenetic analysis of COI sequences for An. flavirostris and outgroup taxa revealed strong branch support for the monophyly of An. flavirostris, thus confirming that Philippine populations of this taxon comprise a single separate species within the Minimus Subgroup of the Funestus Group. Variation in the behaviour of An. flavirostris is likely to be intraspecific rather than interspecific in origin. © 2006 The Royal Entomological Society.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Retrospective clinical data presents many challenges for data mining and machine learning. The transcription of patient records from paper charts and subsequent manipulation of data often results in high volumes of noise as well as a loss of other important information. In addition, such datasets often fail to represent expert medical knowledge and reasoning in any explicit manner. In this research we describe applying data mining methods to retrospective clinical data to build a prediction model for asthma exacerbation severity for pediatric patients in the emergency department. Difficulties in building such a model forced us to investigate alternative strategies for analyzing and processing retrospective data. This paper describes this process together with an approach to mining retrospective clinical data by incorporating formalized external expert knowledge (secondary knowledge sources) into the classification task. This knowledge is used to partition the data into a number of coherent sets, where each set is explicitly described in terms of the secondary knowledge source. Instances from each set are then classified in a manner appropriate for the characteristics of the particular set. We present our methodology and outline a set of experiential results that demonstrate some advantages and some limitations of our approach. © 2008 Springer-Verlag Berlin Heidelberg.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Objective: Biomedical events extraction concerns about events describing changes on the state of bio-molecules from literature. Comparing to the protein-protein interactions (PPIs) extraction task which often only involves the extraction of binary relations between two proteins, biomedical events extraction is much harder since it needs to deal with complex events consisting of embedded or hierarchical relations among proteins, events, and their textual triggers. In this paper, we propose an information extraction system based on the hidden vector state (HVS) model, called HVS-BioEvent, for biomedical events extraction, and investigate its capability in extracting complex events. Methods and material: HVS has been previously employed for extracting PPIs. In HVS-BioEvent, we propose an automated way to generate abstract annotations for HVS training and further propose novel machine learning approaches for event trigger words identification, and for biomedical events extraction from the HVS parse results. Results: Our proposed system achieves an F-score of 49.57% on the corpus used in the BioNLP'09 shared task, which is only 2.38% lower than the best performing system by UTurku in the BioNLP'09 shared task. Nevertheless, HVS-BioEvent outperforms UTurku's system on complex events extraction with 36.57% vs. 30.52% being achieved for extracting regulation events, and 40.61% vs. 38.99% for negative regulation events. Conclusions: The results suggest that the HVS model with the hierarchical hidden state structure is indeed more suitable for complex event extraction since it could naturally model embedded structural context in sentences.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

IEEE 802.15.4 standard has been proposed for low power wireless personal area networks. It can be used as an important component in machine to machine (M2M) networks for data collection, monitoring and controlling functions. With an increasing number of machine devices enabled by M2M technology and equipped with 802.15.4 radios, it is likely that multiple 802.15.4 networks may be deployed closely, for example, to collect data for smart metering at residential or enterprise areas. In such scenarios, supporting reliable communications for monitoring and controlling applications is a big challenge. The problem becomes more severe due to the potential hidden terminals when the operations of multiple 802.15.4 networks are uncoordinated. In this paper, we investigate this problem from three typical scenarios and propose an analytic model to reveal how performance of coexisting 802.15.4 networks may be affected by uncoordinated operations under these scenarios. Simulations will be used to validate the analytic model. It is observed that uncoordinated operations may lead to a significant degradation of system performance in M2M applications. With the proposed analytic model, we also investigate the performance limits of the 802.15.4 networks, and the conditions under which coordinated operations may be required to support M2M applications. © 2012 Springer Science + Business Media, LLC.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Clinical decision support systems (CDSSs) often base their knowledge and advice on human expertise. Knowledge representation needs to be in a format that can be easily understood by human users as well as supporting ongoing knowledge engineering, including evolution and consistency of knowledge. This paper reports on the development of an ontology specification for managing knowledge engineering in a CDSS for assessing and managing risks associated with mental-health problems. The Galatean Risk and Safety Tool, GRiST, represents mental-health expertise in the form of a psychological model of classification. The hierarchical structure was directly represented in the machine using an XML document. Functionality of the model and knowledge management were controlled using attributes in the XML nodes, with an accompanying paper manual for specifying how end-user tools should behave when interfacing with the XML. This paper explains the advantages of using the web-ontology language, OWL, as the specification, details some of the issues and problems encountered in translating the psychological model to OWL, and shows how OWL benefits knowledge engineering. The conclusions are that OWL can have an important role in managing complex knowledge domains for systems based on human expertise without impeding the end-users' understanding of the knowledge base. The generic classification model underpinning GRiST makes it applicable to many decision domains and the accompanying OWL specification facilitates its implementation.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The primary aim of this dissertation is to develop data mining tools for knowledge discovery in biomedical data when multiple (homogeneous or heterogeneous) sources of data are available. The central hypothesis is that, when information from multiple sources of data are used appropriately and effectively, knowledge discovery can be better achieved than what is possible from only a single source. ^ Recent advances in high-throughput technology have enabled biomedical researchers to generate large volumes of diverse types of data on a genome-wide scale. These data include DNA sequences, gene expression measurements, and much more; they provide the motivation for building analysis tools to elucidate the modular organization of the cell. The challenges include efficiently and accurately extracting information from the multiple data sources; representing the information effectively, developing analytical tools, and interpreting the results in the context of the domain. ^ The first part considers the application of feature-level integration to design classifiers that discriminate between soil types. The machine learning tools, SVM and KNN, were used to successfully distinguish between several soil samples. ^ The second part considers clustering using multiple heterogeneous data sources. The resulting Multi-Source Clustering (MSC) algorithm was shown to have a better performance than clustering methods that use only a single data source or a simple feature-level integration of heterogeneous data sources. ^ The third part proposes a new approach to effectively incorporate incomplete data into clustering analysis. Adapted from K-means algorithm, the Generalized Constrained Clustering (GCC) algorithm makes use of incomplete data in the form of constraints to perform exploratory analysis. Novel approaches for extracting constraints were proposed. For sufficiently large constraint sets, the GCC algorithm outperformed the MSC algorithm. ^ The last part considers the problem of providing a theme-specific environment for mining multi-source biomedical data. The database called PlasmoTFBM, focusing on gene regulation of Plasmodium falciparum, contains diverse information and has a simple interface to allow biologists to explore the data. It provided a framework for comparing different analytical tools for predicting regulatory elements and for designing useful data mining tools. ^ The conclusion is that the experiments reported in this dissertation strongly support the central hypothesis.^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The convergence of data, audio and video on IP networks is changing the way individuals, groups and organizations communicate. This diversity of communication media presents opportunities for creating synergistic collaborative communications. This form of collaborative communication is however not without its challenges. The increasing number of communication service providers coupled with a combinatorial mix of offered services, varying Quality-of-Service and oscillating pricing of services increases the complexity for the user to manage and maintain ‘always best’ priced or performance services. Consumers have to manually manage and adapt their communication in line with differences in services across devices, networks and media while ensuring that the usage remain consistent with their intended goals. This dissertation proposes a novel user-centric approach to address this problem. The proposed approach aims to reduce the aforementioned complexity to the user by (1) providing high-level abstractions and a policy based methodology for automated selection of the communication services guided by high-level user policies and (2) providing services through the seamless integration of multiple communication service providers and providing an extensible framework to support the integration of multiple communication service providers. The approach was implemented in the Communication Virtual Machine (CVM), a model-driven technology for realizing communication applications. The CVM includes the Network Communication Broker, the layer responsible for providing a network-independent API to the upper layers of CVM. The initial prototype for the NCB supported only a single communication framework which limited the number, quality and types of services available. Experimental evaluation of the approach show the additional overhead of the approach is minimal compared to the individual communication services frameworks. Additionally the automated approach proposed out performed the individual communication services frameworks for cross framework switching.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The volume of Audiovisual Translation (AVT) is increasing to meet the rising demand for data that needs to be accessible around the world. Machine Translation (MT) is one of the most innovative technologies to be deployed in the field of translation, but it is still too early to predict how it can support the creativity and productivity of professional translators in the future. Currently, MT is more widely used in (non-AV) text translation than in AVT. In this article, we discuss MT technology and demonstrate why its use in AVT scenarios is particularly challenging. We also present some potentially useful methods and tools for measuring MT quality that have been developed primarily for text translation. The ultimate objective is to bridge the gap between the tech-savvy AVT community, on the one hand, and researchers and developers in the field of high-quality MT, on the other.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background and aims: Machine learning techniques for the text mining of cancer-related clinical documents have not been sufficiently explored. Here some techniques are presented for the pre-processing of free-text breast cancer pathology reports, with the aim of facilitating the extraction of information relevant to cancer staging.

Materials and methods: The first technique was implemented using the freely available software RapidMiner to classify the reports according to their general layout: ‘semi-structured’ and ‘unstructured’. The second technique was developed using the open source language engineering framework GATE and aimed at the prediction of chunks of the report text containing information pertaining to the cancer morphology, the tumour size, its hormone receptor status and the number of positive nodes. The classifiers were trained and tested respectively on sets of 635 and 163 manually classified or annotated reports, from the Northern Ireland Cancer Registry.

Results: The best result of 99.4% accuracy – which included only one semi-structured report predicted as unstructured – was produced by the layout classifier with the k nearest algorithm, using the binary term occurrence word vector type with stopword filter and pruning. For chunk recognition, the best results were found using the PAUM algorithm with the same parameters for all cases, except for the prediction of chunks containing cancer morphology. For semi-structured reports the performance ranged from 0.97 to 0.94 and from 0.92 to 0.83 in precision and recall, while for unstructured reports performance ranged from 0.91 to 0.64 and from 0.68 to 0.41 in precision and recall. Poor results were found when the classifier was trained on semi-structured reports but tested on unstructured.

Conclusions: These results show that it is possible and beneficial to predict the layout of reports and that the accuracy of prediction of which segments of a report may contain certain information is sensitive to the report layout and the type of information sought.