995 results for attribute selection


Relevance:

100.00% 100.00%

Publisher:

Abstract:

R. Jensen and Q. Shen. Fuzzy-Rough Sets Assisted Attribute Selection. IEEE Transactions on Fuzzy Systems, vol. 15, no. 1, pp. 73-89, 2007.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Clustering is a difficult problem, especially when we consider the task in the context of a data stream of categorical attributes. In this paper, we propose σ-SCLOPE, a novel algorithm based on SCLOPE’s intuitive observation about cluster histograms. Unlike SCLOPE, however, our algorithm consumes less memory per window and has a better clustering runtime for the same data stream in a given window. This positions σ-SCLOPE as a more attractive option than SCLOPE if a minor loss of clustering accuracy is insignificant in the application.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Cardiac autonomic neuropathy (CAN) poses an important clinical problem, which often remains undetected owing to the difficulty of conducting the current tests and their lack of sensitivity. CAN has been associated with an increased risk of unexpected death in cardiac patients with diabetes mellitus. Heart rate variability (HRV) attributes have been actively investigated, since they are important for diagnostics in diabetes, Parkinson's disease, and cardiac and renal disease. Because of the adverse effects of CAN, it is important to obtain a robust and highly accurate diagnostic tool for identifying early CAN, when treatment has the best outcome. Using HRV attributes to enhance the effectiveness of diagnosing CAN progression may provide such a tool. In the present paper we propose a new machine learning algorithm, the Multi-Layer Attribute Selection and Classification (MLASC), for the diagnosis of CAN progression based on HRV attributes. It incorporates our new automated attribute selection procedure, Double Wrapper Subset Evaluator with Particle Swarm Optimization (DWSE-PSO). We present the results of experiments that compare MLASC with simpler versions and counterpart methods. The experiments used our large and well-known diabetes complications database. The results demonstrate that MLASC significantly outperformed the simpler techniques.

Relevance:

70.00% 70.00%

Publisher:

Abstract:

Non-market effects of agriculture are often estimated using discrete choice models from stated preference surveys. In this context we propose two ways of modelling attribute non-attendance. The first involves constraining coefficients to zero in a latent class framework, whereas the second is based on stochastic attribute selection and grounded in Bayesian estimation. Their implications are explored in the context of a stated preference survey designed to value landscapes in Ireland. Taking account of attribute non-attendance with these data improves fit and tends to involve two attributes, one of which is likely to be cost, thereby leading to substantive changes in derived welfare estimates.

Relevance:

70.00% 70.00%

Publisher:

Abstract:

A decision-maker, when faced with a limited and fixed budget to collect data in support of a multiple attribute selection decision, must decide how many samples to observe from each alternative and attribute. This allocation decision is of particular importance when the information gained leads to uncertain estimates of the attribute values as with sample data collected from observations such as measurements, experimental evaluations, or simulation runs. For example, when the U.S. Department of Homeland Security must decide upon a radiation detection system to acquire, a number of performance attributes are of interest and must be measured in order to characterize each of the considered systems. We identified and evaluated several approaches to incorporate the uncertainty in the attribute value estimates into a normative model for a multiple attribute selection decision. Assuming an additive multiple attribute value model, we demonstrated the idea of propagating the attribute value uncertainty and describing the decision values for each alternative as probability distributions. These distributions were used to select an alternative. With the goal of maximizing the probability of correct selection we developed and evaluated, under several different sets of assumptions, procedures to allocate the fixed experimental budget across the multiple attributes and alternatives. Through a series of simulation studies, we compared the performance of these allocation procedures to the simple, but common, allocation procedure that distributed the sample budget equally across the alternatives and attributes. We found the allocation procedures that were developed based on the inclusion of decision-maker knowledge, such as knowledge of the decision model, outperformed those that neglected such information. 
Beginning with general knowledge of the attribute values provided by Bayesian prior distributions, and updating this knowledge with each observed sample, the sequential allocation procedure performed particularly well. These observations demonstrate that managing projects focused on a selection decision so that the decision modeling and the experimental planning are done jointly, rather than in isolation, can improve the overall selection results.
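The idea of propagating attribute value uncertainty through an additive value model can be illustrated with a small Monte Carlo sketch. This is not the authors' procedure: the Gaussian form of the uncertainty, the function names, and the example numbers are all assumptions made for illustration. The code estimates, for each alternative, the probability that it has the highest overall decision value.

```python
import random


def additive_value(weights, attribute_values):
    """Additive multiple attribute value: weighted sum of attribute values."""
    return sum(w * v for w, v in zip(weights, attribute_values))


def prob_best(weights, means, std_errors, n_draws=10000, seed=0):
    """Estimate the probability that each alternative has the highest value.

    means[i][j] and std_errors[i][j] describe the (assumed Gaussian)
    uncertainty in the estimate of attribute j for alternative i.
    """
    rng = random.Random(seed)
    wins = [0] * len(means)
    for _ in range(n_draws):
        values = []
        for alt_means, alt_ses in zip(means, std_errors):
            # Draw one plausible set of attribute values for this alternative.
            draw = [rng.gauss(m, s) for m, s in zip(alt_means, alt_ses)]
            values.append(additive_value(weights, draw))
        best = max(range(len(values)), key=values.__getitem__)
        wins[best] += 1
    return [w / n_draws for w in wins]


# Hypothetical example: two alternatives, two attributes.
weights = [0.6, 0.4]
means = [[0.8, 0.7], [0.5, 0.4]]
std_errors = [[0.05, 0.05], [0.05, 0.05]]
print(prob_best(weights, means, std_errors))
```

Collecting more samples for an attribute would shrink the corresponding standard error, which is how a sample allocation could influence these probabilities in such a sketch.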

Relevance:

60.00% 60.00%

Publisher:

Abstract:

This paper argues for a renewed focus on statistical reasoning in the beginning school years, with opportunities for children to engage in data modelling. Results are reported from the first year of a 3-year longitudinal study in which three classes of first-grade children (6-year-olds) and their teachers engaged in data modelling activities. The theme of Looking after our Environment, part of the children’s science curriculum, provided the task context. The goals for the two activities addressed here included engaging children in core components of data modelling, namely, selecting attributes, structuring and representing data, identifying variation in data, and making predictions from given data. Results include the various ways in which children represented and re-represented collected data, including attribute selection, and the metarepresentational competence they displayed in doing so. The “data lenses” through which the children dealt with informal inference (variation and prediction) are also reported.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Parkinson's disease (PD) is a degenerative illness whose cardinal symptoms include rigidity, tremor, and slowness of movement. In addition to its widely recognized effects, PD can have a profound effect on speech and voice. The speech symptoms most commonly demonstrated by patients with PD are reduced vocal loudness, monopitch, disruptions of voice quality, and an abnormally fast rate of speech. This cluster of speech symptoms is often termed Hypokinetic Dysarthria. The disease can be difficult to diagnose accurately, especially in its early stages. For this reason, automatic techniques based on Artificial Intelligence should increase diagnostic accuracy and help doctors make better decisions. The aim of the thesis work is to predict PD based on audio files collected from various patients. The audio files are preprocessed in order to extract the features. The preprocessed data contains 23 attributes and 195 instances. On average there are six voice recordings per person. A data compression technique such as the Discrete Cosine Transform (DCT) is used to reduce the number of instances. After data compression, attribute selection is done using several of WEKA's built-in methods, such as ChiSquared, GainRatio, and Infogain. After identifying the important attributes, we evaluate them one by one using stepwise regression. Based on the selected attributes, we run a cost-sensitive classifier in WEKA with various algorithms such as MultiPass LVQ, Logistic Model Tree (LMT), and K-Star. The classification results average about 80%, and with these features approximately 95% classification of PD is achieved. This shows that, using the audio dataset, PD could be predicted with a higher level of accuracy.
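Ranking attributes by information gain, one of the selection criteria named above, can be sketched in a few lines. This is a plain Python illustration for discrete attributes, not WEKA's implementation; the function names and the toy data are assumptions.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in class entropy obtained by splitting on one attribute."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g)
                    for g in groups.values())
    return entropy(labels) - remainder


def rank_attributes(rows, labels):
    """Return (attribute, gain) pairs sorted from most to least informative."""
    scores = {name: information_gain([row[name] for row in rows], labels)
              for name in rows[0]}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: "a" separates the classes, "b" does not.
rows = [
    {"a": "x", "b": "p"},
    {"a": "x", "b": "q"},
    {"a": "y", "b": "p"},
    {"a": "y", "b": "q"},
]
labels = [1, 1, 0, 0]
print(rank_attributes(rows, labels))
```

Continuous attributes, such as voice measurements, would need to be discretized before a score of this kind can be computed.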

Relevance:

60.00% 60.00%

Publisher:

Abstract:

This paper focuses on the choice of a supervised learning algorithm and possible data preprocessing in the domain of data-driven haptic simulation. This is done through a comparison of the performance of different supervised learning techniques with and without data preprocessing. The simulation of haptic interactions with deformable objects using data-driven methods has emerged as an alternative to parametric methods. The accuracy of the simulation depends on the empirical data and the learning method. Several methods have been suggested in the literature, and here we provide a comparison of their performance and applicability to this domain. We selected four examples to be compared: a single learning mechanism, namely artificial neural networks (ANN); attribute selection followed by an ANN learning process; an ensemble of multiple learning techniques; and attribute selection followed by the learning ensemble. The performance of these methods was compared in the domain of simulating multiple interactions with a deformable object with nonlinear material behavior.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Multi-element analysis of honey samples was carried out with the aim of developing a reliable method of tracing the origin of honey. Forty-two chemical elements were determined (Al, Cu, Pb, Zn, Mn, Cd, Tl, Co, Ni, Rb, Ba, Be, Bi, U, V, Fe, Pt, Pd, Te, Hf, Mo, Sn, Sb, P, La, Mg, I, Sm, Tb, Dy, Sd, Th, Pr, Nd, Tm, Yb, Lu, Gd, Ho, Er, Ce, Cr) by inductively coupled plasma mass spectrometry (ICP-MS). Then, three machine learning tools for classification and two for attribute selection were applied in order to prove that it is possible to use data mining tools to find the region where honey originated. Our results clearly demonstrate the potential of Support Vector Machine (SVM), Multilayer Perceptron (MLP) and Random Forest (RF) chemometric tools for honey origin identification. Moreover, the selection tools allowed a reduction from 42 trace element concentrations to only 5. (C) 2012 Elsevier Ltd. All rights reserved.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

The cyber security threats from phishing emails have been growing, buoyed by the capacity of their distributors to fine-tune their trickery and defeat previously known filtering techniques. The detection of novel phishing emails that have not appeared previously, also known as zero-day phishing emails, remains a particular challenge. This paper proposes a multilayer hybrid strategy (MHS) for zero-day filtering of phishing emails that appear during a separate time span by using training data collected previously during another time span. This strategy creates a large ensemble of classifiers and then applies a novel method for pruning the ensemble. The majority of known pruning algorithms belong to the following three categories: ranking-based, clustering-based, and optimization-based pruning. This paper introduces and investigates a multilayer hybrid pruning. Its application in MHS combines all three approaches in one scheme: ranking, clustering, and optimization. Furthermore, we carry out a thorough empirical study of the performance of the MHS for the filtering of phishing emails. Our empirical study compares the performance of the MHS strategy with other machine learning classifiers. The results of our empirical study demonstrate that MHS achieved the best outcomes and that multilayer hybrid pruning performed better than other pruning techniques.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

IEEE, IEEE Comp Soc, Tech Council Software Engn

Relevance:

40.00% 40.00%

Publisher:

Abstract:

The evaluation and selection of industrial projects before an investment decision is customarily done using marketing, technical, and financial information. Subsequently, environmental impact assessment and social impact assessment are carried out, mainly to satisfy the statutory agencies. Because of stricter environmental regulations in developed and developing countries, impact assessment quite often suggests alternative sites, technologies, designs, and implementation methods as mitigating measures. This considerably delays project feasibility analysis and selection, as the complete analysis has to be repeated until the statutory regulatory authority approves the project. Moreover, project analysis through the above process often results in a sub-optimal project, as financial analysis may eliminate better options because a more environmentally friendly alternative will always be cost intensive. In these circumstances, this study proposes a decision support system that analyses projects with respect to market, technicalities, and social and environmental impact in an integrated framework using the analytic hierarchy process, a multiple-attribute decision-making technique. This not only reduces the duration of project evaluation and selection, but also helps select the optimal project for the organization for sustainable development. The entire methodology has been applied to a cross-country oil pipeline project in India and its effectiveness has been demonstrated. © 2005 Elsevier B.V. All rights reserved.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Petroleum pipelines are the nervous system of the oil industry, as they transport crude oil from sources to refineries and petroleum products from refineries to demand points. Therefore, the efficient operation of these pipelines determines the effectiveness of the entire business. Pipeline route selection plays a major role when designing an effective pipeline system, as the health of the pipeline depends on its terrain. The present practice of route selection for petroleum pipelines is governed by factors such as the shortest distance, constructability, minimal effects on the environment, and approachability. Although this reduces capital expenditure, it often proves to be uneconomical when life cycle costing is considered. This study presents a route selection model that applies the Analytic Hierarchy Process (AHP), a multiple attribute decision making technique. AHP considers all the above factors along with the operability and maintainability factors interactively. This system is demonstrated here through a case study of pipeline route selection, from an Indian perspective. A cost-benefit comparison of the shortest route (conventionally selected) and the optimal route establishes the effectiveness of the model.
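The core AHP step of turning pairwise comparisons into priority weights can be sketched as follows. This is a minimal illustration, not the model from the study: it uses the row geometric mean approximation instead of the principal eigenvector, and the three criteria and their comparison values are hypothetical.

```python
import math


def ahp_weights(matrix):
    """Approximate AHP priority weights from a pairwise comparison matrix.

    matrix[i][j] states how strongly item i is preferred over item j, with
    matrix[j][i] == 1 / matrix[i][j]. The normalized row geometric means
    are used here; they match the principal eigenvector exactly when the
    matrix is perfectly consistent.
    """
    geo_means = [math.prod(row) ** (1.0 / len(row)) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def score_alternatives(criteria_weights, local_priorities):
    """Combine per-criterion priorities of each alternative into one score.

    local_priorities[k][i] is the priority of alternative i under criterion k.
    """
    n_alternatives = len(local_priorities[0])
    return [sum(w * p[i] for w, p in zip(criteria_weights, local_priorities))
            for i in range(n_alternatives)]


# Hypothetical criteria: route length, constructability, maintainability.
comparisons = [
    [1.0, 1 / 2, 1 / 4],
    [2.0, 1.0, 1 / 2],
    [4.0, 2.0, 1.0],
]
print(ahp_weights(comparisons))
```

A fuller treatment would also check the consistency of the judgments before using the weights.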