168 resultados para Machine learning methods


Relevância:

80.00% 80.00%

Publicador:

Resumo:

We investigate the use of certain data-dependent estimates of the complexity of a function class, called Rademacher and Gaussian complexities. In a decision theoretic setting, we prove general risk bounds in terms of these complexities. We consider function classes that can be expressed as combinations of functions from basis classes and show how the Rademacher and Gaussian complexities of such a function class can be bounded in terms of the complexity of the basis classes. We give examples of the application of these techniques in finding data-dependent risk bounds for decision trees, neural networks and support vector machines.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We consider the problem of choosing, sequentially, a map which assigns elements of a set A to a few elements of a set B. On each round, the algorithm suffers some cost associated with the chosen assignment, and the goal is to minimize the cumulative loss of these choices relative to the best map on the entire sequence. Even though the offline problem of finding the best map is provably hard, we show that there is an equivalent online approximation algorithm, Randomized Map Prediction (RMP), that is efficient and performs nearly as well. While drawing upon results from the "Online Prediction with Expert Advice" setting, we show how RMP can be utilized as an online approach to several standard batch problems. We apply RMP to online clustering as well as online feature selection and, surprisingly, RMP often outperforms the standard batch algorithms on these problems.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We consider the problem of binary classification where the classifier can, for a particular cost, choose not to classify an observation. Just as in the conventional classification problem, minimization of the sample average of the cost is a difficult optimization problem. As an alternative, we propose the optimization of a certain convex loss function φ, analogous to the hinge loss used in support vector machines (SVMs). Its convexity ensures that the sample average of this surrogate loss can be efficiently minimized. We study its statistical properties. We show that minimizing the expected surrogate loss—the φ-risk—also minimizes the risk. We also study the rate at which the φ-risk approaches its minimum value. We show that fast rates are possible when the conditional probability P(Y=1|X) is unlikely to be close to certain critical values.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The topic of fault detection and diagnostics (FDD) is studied from the perspective of proactive testing. Unlike most research focus in the diagnosis area in which system outputs are analyzed for diagnosis purposes, in this paper the focus is on the other side of the problem: manipulating system inputs for better diagnosis reasoning. In other words, the question of how diagnostic mechanisms can direct system inputs for better diagnosis analysis is addressed here. It is shown how the problem can be formulated as decision making problem coupled with a Bayesian Network based diagnostic mechanism. The developed mechanism is applied to the problem of supervised testing in HVAC systems.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In fault detection and diagnostics, limitations coming from the sensor network architecture are one of the main challenges in evaluating a system’s health status. Usually the design of the sensor network architecture is not solely based on diagnostic purposes, other factors like controls, financial constraints, and practical limitations are also involved. As a result, it quite common to have one sensor (or one set of sensors) monitoring the behaviour of two or more components. This can significantly extend the complexity of diagnostic problems. In this paper a systematic approach is presented to deal with such complexities. It is shown how the problem can be formulated as a Bayesian network based diagnostic mechanism with latent variables. The developed approach is also applied to the problem of fault diagnosis in HVAC systems, an application area with considerable modeling and measurement constraints.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The risk, or probability of error, of the classifier produced by the AdaBoost algorithm is investigated. In particular, we consider the stopping strategy to be used in AdaBoost to achieve universal consistency. We show that provided AdaBoost is stopped after n1-ε iterations---for sample size n and ε ∈ (0,1)---the sequence of risks of the classifiers it produces approaches the Bayes risk.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In the ocean science community, researchers have begun employing novel sensor platforms as integral pieces in oceanographic data collection, which have significantly advanced the study and prediction of complex and dynamic ocean phenomena. These innovative tools are able to provide scientists with data at unprecedented spatiotemporal resolutions. This paper focuses on the newly developed Wave Glider platform from Liquid Robotics. This vehicle produces forward motion by harvesting abundant natural energy from ocean waves, and provides a persistent ocean presence for detailed ocean observation. This study is targeted at determining a kinematic model for offline planning that provides an accurate estimation of the vehicle speed for a desired heading and set of environmental parameters. Given the significant wave height, ocean surface and subsurface currents, wind speed and direction, we present the formulation of a system identification to provide the vehicle’s speed over a range of possible directions.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We have used microarray gene expression profiling and machine learning to predict the presence of BRAF mutations in a panel of 61 melanoma cell lines. The BRAF gene was found to be mutated in 42 samples (69%) and intragenic mutations of the NRAS gene were detected in seven samples (11%). No cell line carried mutations of both genes. Using support vector machines, we have built a classifier that differentiates between melanoma cell lines based on BRAF mutation status. As few as 83 genes are able to discriminate between BRAF mutant and BRAF wild-type samples with clear separation observed using hierarchical clustering. Multidimensional scaling was used to visualize the relationship between a BRAF mutation signature and that of a generalized mitogen-activated protein kinase (MAPK) activation (either BRAF or NRAS mutation) in the context of the discriminating gene list. We observed that samples carrying NRAS mutations lie somewhere between those with or without BRAF mutations. These observations suggest that there are gene-specific mutation signals in addition to a common MAPK activation that result from the pleiotropic effects of either BRAF or NRAS on other signaling pathways, leading to measurably different transcriptional changes.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The literature identifies transition as a key objective for capstone experiences. Capstones should take account of the particular needs of final year students by assisting them to transition from their student to their professional identity. The authors are currently completing an Australian Learning and Teaching Council (ALTC) funded project, “Curriculum Renewal in Legal Education: articulating final year curriculum design principles and a final year program”, which seeks to achieve curriculum renewal for legal education in the Australian context through the articulation of a set of curriculum design principles for the final year and the design of a transferable model for an effective final year program. The project has investigated the contemporary role of capstones in assisting transition out by reviewing the relevant literature and considering feedback from a project reference group, a final year student focus group and a recent graduate’s focus group. Analysis of this extensive research- and evidence-base suggests that capstone experiences should support transition through: • Assisting students to develop a sense of professional identity; • Consolidating students’ lifelong learning skills; • Providing opportunities for consolidation of career development and planning processes; • Enabling students to enhance professional skills and competencies; and • Preparing students as ethical citizens and leaders. This paper will examine the role of capstones in assisting students to transition to their professional identity and will propose learning and teaching approaches and assessment of learning methods that support transition out.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Purpose – The work presented in this paper aims to provide an approach to classifying web logs by personal properties of users. Design/methodology/approach – The authors describe an iterative system that begins with a small set of manually labeled terms, which are used to label queries from the log. A set of background knowledge related to these labeled queries is acquired by combining web search results on these queries. This background set is used to obtain many terms that are related to the classification task. The system then ranks each of the related terms, choosing those that most fit the personal properties of the users. These terms are then used to begin the next iteration. Findings – The authors identify the difficulties of classifying web logs, by approaching this problem from a machine learning perspective. By applying the approach developed, the authors are able to show that many queries in a large query log can be classified. Research limitations/implications – Testing results in this type of classification work is difficult, as the true personal properties of web users are unknown. Evaluation of the classification results in terms of the comparison of classified queries to well known age-related sites is a direction that is currently being exploring. Practical implications – This research is background work that can be incorporated in search engines or other web-based applications, to help marketing companies and advertisers. Originality/value – This research enhances the current state of knowledge in short-text classification and query log learning. Classification schemes, Computer networks, Information retrieval, Man-machine systems, User interfaces

Relevância:

80.00% 80.00%

Publicador:

Resumo:

A rule-based approach for classifying previously identified medical concepts in the clinical free text into an assertion category is presented. There are six different categories of assertions for the task: Present, Absent, Possible, Conditional, Hypothetical and Not associated with the patient. The assertion classification algorithms were largely based on extending the popular NegEx and Context algorithms. In addition, a health based clinical terminology called SNOMED CT and other publicly available dictionaries were used to classify assertions, which did not fit the NegEx/Context model. The data for this task includes discharge summaries from Partners HealthCare and from Beth Israel Deaconess Medical Centre, as well as discharge summaries and progress notes from University of Pittsburgh Medical Centre. The set consists of 349 discharge reports, each with pairs of ground truth concept and assertion files for system development, and 477 reports for evaluation. The system’s performance on the evaluation data set was 0.83, 0.83 and 0.83 for recall, precision and F1-measure, respectively. Although the rule-based system shows promise, further improvements can be made by incorporating machine learning approaches.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Electronic services are a leitmotif in ‘hot’ topics like Software as a Service, Service Oriented Architecture (SOA), Service oriented Computing, Cloud Computing, application markets and smart devices. We propose to consider these in what has been termed the Service Ecosystem (SES). The SES encompasses all levels of electronic services and their interaction, with human consumption and initiation on its periphery in much the same way the ‘Web’ describes a plethora of technologies that eventuate to connect information and expose it to humans. Presently, the SES is heterogeneous, fragmented and confined to semi-closed systems. A key issue hampering the emergence of an integrated SES is Service Discovery (SD). A SES will be dynamic with areas of structured and unstructured information within which service providers and ‘lay’ human consumers interact; until now the two are disjointed, e.g., SOA-enabled organisations, industries and domains are choreographed by domain experts or ‘hard-wired’ to smart device application markets and web applications. In a SES, services are accessible, comparable and exchangeable to human consumers closing the gap to the providers. This requires a new SD with which humans can discover services transparently and effectively without special knowledge or training. We propose two modes of discovery, directed search following an agenda and explorative search, which speculatively expands knowledge of an area of interest by means of categories. Inspired by conceptual space theory from cognitive science, we propose to implement the modes of discovery using concepts to map a lay consumer’s service need to terminologically sophisticated descriptions of services. To this end, we reframe SD as an information retrieval task on the information attached to services, such as, descriptions, reviews, documentation and web sites - the Service Information Shadow. The Semantic Space model transforms the shadow's unstructured semantic information into a geometric, concept-like representation. We introduce an improved and extended Semantic Space including categorization calling it the Semantic Service Discovery model. We evaluate our model with a highly relevant, service related corpus simulating a Service Information Shadow including manually constructed complex service agendas, as well as manual groupings of services. We compare our model against state-of-the-art information retrieval systems and clustering algorithms. By means of an extensive series of empirical evaluations, we establish optimal parameter settings for the semantic space model. The evaluations demonstrate the model’s effectiveness for SD in terms of retrieval precision over state-of-the-art information retrieval models (directed search) and the meaningful, automatic categorization of service related information, which shows potential to form the basis of a useful, cognitively motivated map of the SES for exploratory search.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper discusses users’ query reformulation behaviour while searching information on the Web. Query reformulations have emerged as an important component of Web search behaviour and human-computer interaction (HCI) because a user’s success of information retrieval (IR) depends on how he or she formulates queries. There are various factors, such as cognitive styles, that influence users’ query reformulation behaviour. Understanding how users with different cognitive styles formulate their queries while performing Web searches can help HCI researchers and information systems (IS) developers to provide assistance to the users. This paper aims to examine the effects of users’ cognitive styles on their query reformation behaviour. To achieve the goal of the study, a user study was conducted in which a total of 3613 search terms and 872 search queries were submitted by 50 users who engaged in 150 scenario-based search tasks. Riding’s (1991) Cognitive Style Analysis (CSA) test was used to assess users’ cognitive style as wholist or analytic, and verbaliser or imager. The study findings show that users’ query reformulation behaviour is affected by their cognitive styles. The results reveal that analytic users tended to prefer Add queries while all other users preferred New queries. A significant difference was found among wholists and analytics in the manner they performed Remove query reformulations. Future HCI researchers and IS developers can utilize the study results to develop interactive and user-cantered search model, and to provide context-based query suggestions for users.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In this paper, we describe a machine-translated parallel English corpus for the NTCIR Chinese, Japanese and Korean (CJK) Wikipedia collections. This document collection is named CJK2E Wikipedia XML corpus. The corpus could be used by the information retrieval research community and knowledge sharing in Wikipedia in many ways; for example, this corpus could be used for experimentations in cross-lingual information retrieval, cross-lingual link discovery, or omni-lingual information retrieval research. Furthermore, the translated CJK articles could be used to further expand the current coverage of the English Wikipedia.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Free association norms indicate that words are organized into semantic/associative neighborhoods within a larger network of words and links that bind the net together. We present evidence indicating that memory for a recent word event can depend on implicitly and simultaneously activating related words in its neighborhood. Processing a word during encoding primes its network representation as a function of the density of the links in its neighborhood. Such priming increases recall and recognition and can have long lasting effects when the word is processed in working memory. Evidence for this phenomenon is reviewed in extralist cuing, primed free association, intralist cuing, and single-item recognition tasks. The findings also show that when a related word is presented to cue the recall of a studied word, the cue activates it in an array of related words that distract and reduce the probability of its selection. The activation of the semantic network produces priming benefits during encoding and search costs during retrieval. In extralist cuing recall is a negative function of cue-to-distracter strength and a positive function of neighborhood density, cue-to-target strength, and target-to cue strength. We show how four measures derived from the network can be combined and used to predict memory performance. These measures play different roles in different tasks indicating that the contribution of the semantic network varies with the context provided by the task. We evaluate spreading activation and quantum-like entanglement explanations for the priming effect produced by neighborhood density.