156 resultados para Aprendizagem automática (Machine Learning)


Relevância:

100.00% 100.00%

Publicador:

Resumo:

In fault detection and diagnostics, limitations coming from the sensor network architecture are one of the main challenges in evaluating a system’s health status. Usually the design of the sensor network architecture is not solely based on diagnostic purposes, other factors like controls, financial constraints, and practical limitations are also involved. As a result, it quite common to have one sensor (or one set of sensors) monitoring the behaviour of two or more components. This can significantly extend the complexity of diagnostic problems. In this paper a systematic approach is presented to deal with such complexities. It is shown how the problem can be formulated as a Bayesian network based diagnostic mechanism with latent variables. The developed approach is also applied to the problem of fault diagnosis in HVAC systems, an application area with considerable modeling and measurement constraints.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The risk, or probability of error, of the classifier produced by the AdaBoost algorithm is investigated. In particular, we consider the stopping strategy to be used in AdaBoost to achieve universal consistency. We show that provided AdaBoost is stopped after n1-ε iterations---for sample size n and ε ∈ (0,1)---the sequence of risks of the classifiers it produces approaches the Bayes risk.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In the ocean science community, researchers have begun employing novel sensor platforms as integral pieces in oceanographic data collection, which have significantly advanced the study and prediction of complex and dynamic ocean phenomena. These innovative tools are able to provide scientists with data at unprecedented spatiotemporal resolutions. This paper focuses on the newly developed Wave Glider platform from Liquid Robotics. This vehicle produces forward motion by harvesting abundant natural energy from ocean waves, and provides a persistent ocean presence for detailed ocean observation. This study is targeted at determining a kinematic model for offline planning that provides an accurate estimation of the vehicle speed for a desired heading and set of environmental parameters. Given the significant wave height, ocean surface and subsurface currents, wind speed and direction, we present the formulation of a system identification to provide the vehicle’s speed over a range of possible directions.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We have used microarray gene expression profiling and machine learning to predict the presence of BRAF mutations in a panel of 61 melanoma cell lines. The BRAF gene was found to be mutated in 42 samples (69%) and intragenic mutations of the NRAS gene were detected in seven samples (11%). No cell line carried mutations of both genes. Using support vector machines, we have built a classifier that differentiates between melanoma cell lines based on BRAF mutation status. As few as 83 genes are able to discriminate between BRAF mutant and BRAF wild-type samples with clear separation observed using hierarchical clustering. Multidimensional scaling was used to visualize the relationship between a BRAF mutation signature and that of a generalized mitogen-activated protein kinase (MAPK) activation (either BRAF or NRAS mutation) in the context of the discriminating gene list. We observed that samples carrying NRAS mutations lie somewhere between those with or without BRAF mutations. These observations suggest that there are gene-specific mutation signals in addition to a common MAPK activation that result from the pleiotropic effects of either BRAF or NRAS on other signaling pathways, leading to measurably different transcriptional changes.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Trees, shrubs and other vegetation are of continued importance to the environment and our daily life. They provide shade around our roads and houses, offer a habitat for birds and wildlife, and absorb air pollutants. However, vegetation touching power lines is a risk to public safety and the environment, and one of the main causes of power supply problems. Vegetation management, which includes tree trimming and vegetation control, is a significant cost component of the maintenance of electrical infrastructure. For example, Ergon Energy, the Australia’s largest geographic footprint energy distributor, currently spends over $80 million a year inspecting and managing vegetation that encroach on power line assets. Currently, most vegetation management programs for distribution systems are calendar-based ground patrol. However, calendar-based inspection by linesman is labour-intensive, time consuming and expensive. It also results in some zones being trimmed more frequently than needed and others not cut often enough. Moreover, it’s seldom practicable to measure all the plants around power line corridors by field methods. Remote sensing data captured from airborne sensors has great potential in assisting vegetation management in power line corridors. This thesis presented a comprehensive study on using spiking neural networks in a specific image analysis application: power line corridor monitoring. Theoretically, the thesis focuses on a biologically inspired spiking cortical model: pulse coupled neural network (PCNN). The original PCNN model was simplified in order to better analyze the pulse dynamics and control the performance. Some new and effective algorithms were developed based on the proposed spiking cortical model for object detection, image segmentation and invariant feature extraction. The developed algorithms were evaluated in a number of experiments using real image data collected from our flight trails. The experimental results demonstrated the effectiveness and advantages of spiking neural networks in image processing tasks. Operationally, the knowledge gained from this research project offers a good reference to our industry partner (i.e. Ergon Energy) and other energy utilities who wants to improve their vegetation management activities. The novel approaches described in this thesis showed the potential of using the cutting edge sensor technologies and intelligent computing techniques in improve power line corridor monitoring. The lessons learnt from this project are also expected to increase the confidence of energy companies to move from traditional vegetation management strategy to a more automated, accurate and cost-effective solution using aerial remote sensing techniques.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Purpose – The work presented in this paper aims to provide an approach to classifying web logs by personal properties of users. Design/methodology/approach – The authors describe an iterative system that begins with a small set of manually labeled terms, which are used to label queries from the log. A set of background knowledge related to these labeled queries is acquired by combining web search results on these queries. This background set is used to obtain many terms that are related to the classification task. The system then ranks each of the related terms, choosing those that most fit the personal properties of the users. These terms are then used to begin the next iteration. Findings – The authors identify the difficulties of classifying web logs, by approaching this problem from a machine learning perspective. By applying the approach developed, the authors are able to show that many queries in a large query log can be classified. Research limitations/implications – Testing results in this type of classification work is difficult, as the true personal properties of web users are unknown. Evaluation of the classification results in terms of the comparison of classified queries to well known age-related sites is a direction that is currently being exploring. Practical implications – This research is background work that can be incorporated in search engines or other web-based applications, to help marketing companies and advertisers. Originality/value – This research enhances the current state of knowledge in short-text classification and query log learning. Classification schemes, Computer networks, Information retrieval, Man-machine systems, User interfaces

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Rule extraction from neural network algorithms have been investigated for two decades and there have been significant applications. Despite this level of success, rule extraction from neural network methods are generally not part of data mining tools, and a significant commercial breakthrough may still be some time away. This paper briefly reviews the state-of-the-art and points to some of the obstacles, namely a lack of evaluation techniques in experiments and larger benchmark data sets. A significant new development is the view that rule extraction from neural networks is an interactive process which actively involves the user. This leads to the application of assessment and evaluation techniques from information retrieval which may lead to a range of new methods.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A rule-based approach for classifying previously identified medical concepts in the clinical free text into an assertion category is presented. There are six different categories of assertions for the task: Present, Absent, Possible, Conditional, Hypothetical and Not associated with the patient. The assertion classification algorithms were largely based on extending the popular NegEx and Context algorithms. In addition, a health based clinical terminology called SNOMED CT and other publicly available dictionaries were used to classify assertions, which did not fit the NegEx/Context model. The data for this task includes discharge summaries from Partners HealthCare and from Beth Israel Deaconess Medical Centre, as well as discharge summaries and progress notes from University of Pittsburgh Medical Centre. The set consists of 349 discharge reports, each with pairs of ground truth concept and assertion files for system development, and 477 reports for evaluation. The system’s performance on the evaluation data set was 0.83, 0.83 and 0.83 for recall, precision and F1-measure, respectively. Although the rule-based system shows promise, further improvements can be made by incorporating machine learning approaches.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

It is a big challenge to acquire correct user profiles for personalized text classification since users may be unsure in providing their interests. Traditional approaches to user profiling adopt machine learning (ML) to automatically discover classification knowledge from explicit user feedback in describing personal interests. However, the accuracy of ML-based methods cannot be significantly improved in many cases due to the term independence assumption and uncertainties associated with them. This paper presents a novel relevance feedback approach for personalized text classification. It basically applies data mining to discover knowledge from relevant and non-relevant text and constraints specific knowledge by reasoning rules to eliminate some conflicting information. We also developed a Dempster-Shafer (DS) approach as the means to utilise the specific knowledge to build high-quality data models for classification. The experimental results conducted on Reuters Corpus Volume 1 and TREC topics support that the proposed technique achieves encouraging performance in comparing with the state-of-the-art relevance feedback models.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Electronic services are a leitmotif in ‘hot’ topics like Software as a Service, Service Oriented Architecture (SOA), Service oriented Computing, Cloud Computing, application markets and smart devices. We propose to consider these in what has been termed the Service Ecosystem (SES). The SES encompasses all levels of electronic services and their interaction, with human consumption and initiation on its periphery in much the same way the ‘Web’ describes a plethora of technologies that eventuate to connect information and expose it to humans. Presently, the SES is heterogeneous, fragmented and confined to semi-closed systems. A key issue hampering the emergence of an integrated SES is Service Discovery (SD). A SES will be dynamic with areas of structured and unstructured information within which service providers and ‘lay’ human consumers interact; until now the two are disjointed, e.g., SOA-enabled organisations, industries and domains are choreographed by domain experts or ‘hard-wired’ to smart device application markets and web applications. In a SES, services are accessible, comparable and exchangeable to human consumers closing the gap to the providers. This requires a new SD with which humans can discover services transparently and effectively without special knowledge or training. We propose two modes of discovery, directed search following an agenda and explorative search, which speculatively expands knowledge of an area of interest by means of categories. Inspired by conceptual space theory from cognitive science, we propose to implement the modes of discovery using concepts to map a lay consumer’s service need to terminologically sophisticated descriptions of services. To this end, we reframe SD as an information retrieval task on the information attached to services, such as, descriptions, reviews, documentation and web sites - the Service Information Shadow. The Semantic Space model transforms the shadow's unstructured semantic information into a geometric, concept-like representation. We introduce an improved and extended Semantic Space including categorization calling it the Semantic Service Discovery model. We evaluate our model with a highly relevant, service related corpus simulating a Service Information Shadow including manually constructed complex service agendas, as well as manual groupings of services. We compare our model against state-of-the-art information retrieval systems and clustering algorithms. By means of an extensive series of empirical evaluations, we establish optimal parameter settings for the semantic space model. The evaluations demonstrate the model’s effectiveness for SD in terms of retrieval precision over state-of-the-art information retrieval models (directed search) and the meaningful, automatic categorization of service related information, which shows potential to form the basis of a useful, cognitively motivated map of the SES for exploratory search.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Monitoring the natural environment is increasingly important as habit degradation and climate change reduce theworld’s biodiversity.We have developed software tools and applications to assist ecologists with the collection and analysis of acoustic data at large spatial and temporal scales.One of our key objectives is automated animal call recognition, and our approach has three novel attributes. First, we work with raw environmental audio, contaminated by noise and artefacts and containing calls that vary greatly in volume depending on the animal’s proximity to the microphone. Second, initial experimentation suggested that no single recognizer could dealwith the enormous variety of calls. Therefore, we developed a toolbox of generic recognizers to extract invariant features for each call type. Third, many species are cryptic and offer little data with which to train a recognizer. Many popular machine learning methods require large volumes of training and validation data and considerable time and expertise to prepare. Consequently we adopt bootstrap techniques that can be initiated with little data and refined subsequently. In this paper, we describe our recognition tools and present results for real ecological problems.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper discusses users’ query reformulation behaviour while searching information on the Web. Query reformulations have emerged as an important component of Web search behaviour and human-computer interaction (HCI) because a user’s success of information retrieval (IR) depends on how he or she formulates queries. There are various factors, such as cognitive styles, that influence users’ query reformulation behaviour. Understanding how users with different cognitive styles formulate their queries while performing Web searches can help HCI researchers and information systems (IS) developers to provide assistance to the users. This paper aims to examine the effects of users’ cognitive styles on their query reformation behaviour. To achieve the goal of the study, a user study was conducted in which a total of 3613 search terms and 872 search queries were submitted by 50 users who engaged in 150 scenario-based search tasks. Riding’s (1991) Cognitive Style Analysis (CSA) test was used to assess users’ cognitive style as wholist or analytic, and verbaliser or imager. The study findings show that users’ query reformulation behaviour is affected by their cognitive styles. The results reveal that analytic users tended to prefer Add queries while all other users preferred New queries. A significant difference was found among wholists and analytics in the manner they performed Remove query reformulations. Future HCI researchers and IS developers can utilize the study results to develop interactive and user-cantered search model, and to provide context-based query suggestions for users.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Exponential growth of genomic data in the last two decades has made manual analyses impractical for all but trial studies. As genomic analyses have become more sophisticated, and move toward comparisons across large datasets, computational approaches have become essential. One of the most important biological questions is to understand the mechanisms underlying gene regulation. Genetic regulation is commonly investigated and modelled through the use of transcriptional regulatory network (TRN) structures. These model the regulatory interactions between two key components: transcription factors (TFs) and the target genes (TGs) they regulate. Transcriptional regulatory networks have proven to be invaluable scientific tools in Bioinformatics. When used in conjunction with comparative genomics, they have provided substantial insights into the evolution of regulatory interactions. Current approaches to regulatory network inference, however, omit two additional key entities: promoters and transcription factor binding sites (TFBSs). In this study, we attempted to explore the relationships among these regulatory components in bacteria. Our primary goal was to identify relationships that can assist in reducing the high false positive rates associated with transcription factor binding site predictions and thereupon enhance the reliability of the inferred transcription regulatory networks. In our preliminary exploration of relationships between the key regulatory components in Escherichia coli transcription, we discovered a number of potentially useful features. The combination of location score and sequence dissimilarity scores increased de novo binding site prediction accuracy by 13.6%. Another important observation made was with regards to the relationship between transcription factors grouped by their regulatory role and corresponding promoter strength. Our study of E.coli ��70 promoters, found support at the 0.1 significance level for our hypothesis | that weak promoters are preferentially associated with activator binding sites to enhance gene expression, whilst strong promoters have more repressor binding sites to repress or inhibit gene transcription. Although the observations were specific to �70, they nevertheless strongly encourage additional investigations when more experimentally confirmed data are available. In our preliminary exploration of relationships between the key regulatory components in E.coli transcription, we discovered a number of potentially useful features { some of which proved successful in reducing the number of false positives when applied to re-evaluate binding site predictions. Of chief interest was the relationship observed between promoter strength and TFs with respect to their regulatory role. Based on the common assumption, where promoter homology positively correlates with transcription rate, we hypothesised that weak promoters would have more transcription factors that enhance gene expression, whilst strong promoters would have more repressor binding sites. The t-tests assessed for E.coli �70 promoters returned a p-value of 0.072, which at 0.1 significance level suggested support for our (alternative) hypothesis; albeit this trend may only be present for promoters where corresponding TFBSs are either all repressors or all activators. Nevertheless, such suggestive results strongly encourage additional investigations when more experimentally confirmed data will become available. Much of the remainder of the thesis concerns a machine learning study of binding site prediction, using the SVM and kernel methods, principally the spectrum kernel. Spectrum kernels have been successfully applied in previous studies of protein classification [91, 92], as well as the related problem of promoter predictions [59], and we have here successfully applied the technique to refining TFBS predictions. The advantages provided by the SVM classifier were best seen in `moderately'-conserved transcription factor binding sites as represented by our E.coli CRP case study. Inclusion of additional position feature attributes further increased accuracy by 9.1% but more notable was the considerable decrease in false positive rate from 0.8 to 0.5 while retaining 0.9 sensitivity. Improved prediction of transcription factor binding sites is in turn extremely valuable in improving inference of regulatory relationships, a problem notoriously prone to false positive predictions. Here, the number of false regulatory interactions inferred using the conventional two-component model was substantially reduced when we integrated de novo transcription factor binding site predictions as an additional criterion for acceptance in a case study of inference in the Fur regulon. This initial work was extended to a comparative study of the iron regulatory system across 20 Yersinia strains. This work revealed interesting, strain-specific difierences, especially between pathogenic and non-pathogenic strains. Such difierences were made clear through interactive visualisations using the TRNDifi software developed as part of this work, and would have remained undetected using conventional methods. This approach led to the nomination of the Yfe iron-uptake system as a candidate for further wet-lab experimentation due to its potential active functionality in non-pathogens and its known participation in full virulence of the bubonic plague strain. Building on this work, we introduced novel structures we have labelled as `regulatory trees', inspired by the phylogenetic tree concept. Instead of using gene or protein sequence similarity, the regulatory trees were constructed based on the number of similar regulatory interactions. While the common phylogentic trees convey information regarding changes in gene repertoire, which we might regard being analogous to `hardware', the regulatory tree informs us of the changes in regulatory circuitry, in some respects analogous to `software'. In this context, we explored the `pan-regulatory network' for the Fur system, the entire set of regulatory interactions found for the Fur transcription factor across a group of genomes. In the pan-regulatory network, emphasis is placed on how the regulatory network for each target genome is inferred from multiple sources instead of a single source, as is the common approach. The benefit of using multiple reference networks, is a more comprehensive survey of the relationships, and increased confidence in the regulatory interactions predicted. In the present study, we distinguish between relationships found across the full set of genomes as the `core-regulatory-set', and interactions found only in a subset of genomes explored as the `sub-regulatory-set'. We found nine Fur target gene clusters present across the four genomes studied, this core set potentially identifying basic regulatory processes essential for survival. Species level difierences are seen at the sub-regulatory-set level; for example the known virulence factors, YbtA and PchR were found in Y.pestis and P.aerguinosa respectively, but were not present in both E.coli and B.subtilis. Such factors and the iron-uptake systems they regulate, are ideal candidates for wet-lab investigation to determine whether or not they are pathogenic specific. In this study, we employed a broad range of approaches to address our goals and assessed these methods using the Fur regulon as our initial case study. We identified a set of promising feature attributes; demonstrated their success in increasing transcription factor binding site prediction specificity while retaining sensitivity, and showed the importance of binding site predictions in enhancing the reliability of regulatory interaction inferences. Most importantly, these outcomes led to the introduction of a range of visualisations and techniques, which are applicable across the entire bacterial spectrum and can be utilised in studies beyond the understanding of transcriptional regulatory networks.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, we describe a machine-translated parallel English corpus for the NTCIR Chinese, Japanese and Korean (CJK) Wikipedia collections. This document collection is named CJK2E Wikipedia XML corpus. The corpus could be used by the information retrieval research community and knowledge sharing in Wikipedia in many ways; for example, this corpus could be used for experimentations in cross-lingual information retrieval, cross-lingual link discovery, or omni-lingual information retrieval research. Furthermore, the translated CJK articles could be used to further expand the current coverage of the English Wikipedia.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Free association norms indicate that words are organized into semantic/associative neighborhoods within a larger network of words and links that bind the net together. We present evidence indicating that memory for a recent word event can depend on implicitly and simultaneously activating related words in its neighborhood. Processing a word during encoding primes its network representation as a function of the density of the links in its neighborhood. Such priming increases recall and recognition and can have long lasting effects when the word is processed in working memory. Evidence for this phenomenon is reviewed in extralist cuing, primed free association, intralist cuing, and single-item recognition tasks. The findings also show that when a related word is presented to cue the recall of a studied word, the cue activates it in an array of related words that distract and reduce the probability of its selection. The activation of the semantic network produces priming benefits during encoding and search costs during retrieval. In extralist cuing recall is a negative function of cue-to-distracter strength and a positive function of neighborhood density, cue-to-target strength, and target-to cue strength. We show how four measures derived from the network can be combined and used to predict memory performance. These measures play different roles in different tasks indicating that the contribution of the semantic network varies with the context provided by the task. We evaluate spreading activation and quantum-like entanglement explanations for the priming effect produced by neighborhood density.