215 resultados para Wrapper


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Discrete data representations are necessary, or at least convenient, in many machine learning problems. While feature selection (FS) techniques aim at finding relevant subsets of features, the goal of feature discretization (FD) is to find concise (quantized) data representations, adequate for the learning task at hand. In this paper, we propose two incremental methods for FD. The first method belongs to the filter family, in which the quality of the discretization is assessed by a (supervised or unsupervised) relevance criterion. The second method is a wrapper, where discretized features are assessed using a classifier. Both methods can be coupled with any static (unsupervised or supervised) discretization procedure and can be used to perform FS as pre-processing or post-processing stages. The proposed methods attain efficient representations suitable for binary and multi-class problems with different types of data, being competitive with existing methods. Moreover, using well-known FS methods with the features discretized by our techniques leads to better accuracy than with the features discretized by other methods or with the original features. (C) 2013 Elsevier B.V. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Besides optimizing classifier predictive performance and addressing the curse of the dimensionality problem, feature selection techniques support a classification model as simple as possible. In this paper, we present a wrapper feature selection approach based on Bat Algorithm (BA) and Optimum-Path Forest (OPF), in which we model the problem of feature selection as an binary-based optimization technique, guided by BA using the OPF accuracy over a validating set as the fitness function to be maximized. Moreover, we present a methodology to better estimate the quality of the reduced feature set. Experiments conducted over six public datasets demonstrated that the proposed approach provides statistically significant more compact sets and, in some cases, it can indeed improve the classification effectiveness. © 2013 Elsevier Ltd. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Recto of wrapper reads: "Oct. 31, 1740 to June 24, 1742. Papers relative to the charges against & defence made by Nathan Prince in 1741. Examined & done with." Verso reads "Hon. President [Josiah] Quincy." These annotations suggest that the records in this collection were consulted by Quincy, who served as Harvard's President from 1829 to 1845. It is likely that he used them when writing his two-volume History of Harvard University, which includes a lengthy passage about Prince and his trials at Harvard.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Paper wrapper reads: "Nicholas Shapleigh & John Shapleigh / Division of farm at Kittery / Recorded January 31st, 1798 / 17 cents duty." The legal document establishing the division of the land is signed by each of the three surveyors: Nicholas Morrell(?), William Fry, and Daniel Emery.

Relevância:

20.00% 20.00%

Publicador:

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Web wrapper extracts data from HTML document. The accuracy and quality of the information extracted by web wrapper relies on the structure of the HTML document. If an HTML document is changed, the web wrapper may or may not function correctly. This paper presents an Adjacency-Weight method to be used in the web wrapper extraction process or in a wrapper self-maintenance mechanism to validate web wrappers. The algorithm and data structures are illustrated by some intuitive examples.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Cultural objects are increasingly generated and stored in digital form, yet effective methods for their indexing and retrieval still remain an important area of research. The main problem arises from the disconnection between the content-based indexing approach used by computer scientists and the description-based approach used by information scientists. There is also a lack of representational schemes that allow the alignment of the semantics and context with keywords and low-level features that can be automatically extracted from the content of these cultural objects. This paper presents an integrated approach to address these problems, taking advantage of both computer science and information science approaches. We firstly discuss the requirements from a number of perspectives: users, content providers, content managers and technical systems. We then present an overview of our system architecture and describe various techniques which underlie the major components of the system. These include: automatic object category detection; user-driven tagging; metadata transform and augmentation, and an expression language for digital cultural objects. In addition, we discuss our experience on testing and evaluating some existing collections, analyse the difficulties encountered and propose ways to address these problems.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The high morbidity and mortality associated with atherosclerotic coronary vascular disease (CVD) and its complications are being lessened by the increased knowledge of risk factors, effective preventative measures and proven therapeutic interventions. However, significant CVD morbidity remains and sudden cardiac death continues to be a presenting feature for some subsequently diagnosed with CVD. Coronary vascular disease is also the leading cause of anaesthesia related complications. Stress electrocardiography/exercise testing is predictive of 10 year risk of CVD events and the cardiovascular variables used to score this test are monitored peri-operatively. Similar physiological time-series datasets are being subjected to data mining methods for the prediction of medical diagnoses and outcomes. This study aims to find predictors of CVD using anaesthesia time-series data and patient risk factor data. Several pre-processing and predictive data mining methods are applied to this data. Physiological time-series data related to anaesthetic procedures are subjected to pre-processing methods for removal of outliers, calculation of moving averages as well as data summarisation and data abstraction methods. Feature selection methods of both wrapper and filter types are applied to derived physiological time-series variable sets alone and to the same variables combined with risk factor variables. The ability of these methods to identify subsets of highly correlated but non-redundant variables is assessed. The major dataset is derived from the entire anaesthesia population and subsets of this population are considered to be at increased anaesthesia risk based on their need for more intensive monitoring (invasive haemodynamic monitoring and additional ECG leads). Because of the unbalanced class distribution in the data, majority class under-sampling and Kappa statistic together with misclassification rate and area under the ROC curve (AUC) are used for evaluation of models generated using different prediction algorithms. The performance based on models derived from feature reduced datasets reveal the filter method, Cfs subset evaluation, to be most consistently effective although Consistency derived subsets tended to slightly increased accuracy but markedly increased complexity. The use of misclassification rate (MR) for model performance evaluation is influenced by class distribution. This could be eliminated by consideration of the AUC or Kappa statistic as well by evaluation of subsets with under-sampled majority class. The noise and outlier removal pre-processing methods produced models with MR ranging from 10.69 to 12.62 with the lowest value being for data from which both outliers and noise were removed (MR 10.69). For the raw time-series dataset, MR is 12.34. Feature selection results in reduction in MR to 9.8 to 10.16 with time segmented summary data (dataset F) MR being 9.8 and raw time-series summary data (dataset A) being 9.92. However, for all time-series only based datasets, the complexity is high. For most pre-processing methods, Cfs could identify a subset of correlated and non-redundant variables from the time-series alone datasets but models derived from these subsets are of one leaf only. MR values are consistent with class distribution in the subset folds evaluated in the n-cross validation method. For models based on Cfs selected time-series derived and risk factor (RF) variables, the MR ranges from 8.83 to 10.36 with dataset RF_A (raw time-series data and RF) being 8.85 and dataset RF_F (time segmented time-series variables and RF) being 9.09. The models based on counts of outliers and counts of data points outside normal range (Dataset RF_E) and derived variables based on time series transformed using Symbolic Aggregate Approximation (SAX) with associated time-series pattern cluster membership (Dataset RF_ G) perform the least well with MR of 10.25 and 10.36 respectively. For coronary vascular disease prediction, nearest neighbour (NNge) and the support vector machine based method, SMO, have the highest MR of 10.1 and 10.28 while logistic regression (LR) and the decision tree (DT) method, J48, have MR of 8.85 and 9.0 respectively. DT rules are most comprehensible and clinically relevant. The predictive accuracy increase achieved by addition of risk factor variables to time-series variable based models is significant. The addition of time-series derived variables to models based on risk factor variables alone is associated with a trend to improved performance. Data mining of feature reduced, anaesthesia time-series variables together with risk factor variables can produce compact and moderately accurate models able to predict coronary vascular disease. Decision tree analysis of time-series data combined with risk factor variables yields rules which are more accurate than models based on time-series data alone. The limited additional value provided by electrocardiographic variables when compared to use of risk factors alone is similar to recent suggestions that exercise electrocardiography (exECG) under standardised conditions has limited additional diagnostic value over risk factor analysis and symptom pattern. The effect of the pre-processing used in this study had limited effect when time-series variables and risk factor variables are used as model input. In the absence of risk factor input, the use of time-series variables after outlier removal and time series variables based on physiological variable values’ being outside the accepted normal range is associated with some improvement in model performance.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Today, the majority of semiconductor fabrication plants (fabs) conduct equipment preventive maintenance based on statistically-derived time- or wafer-count-based intervals. While these practices have had relative success in managing equipment availability and product yield, the cost, both in time and materials, remains high. Condition-based maintenance has been successfully adopted in several industries, where costs associated with equipment downtime range from potential loss of life to unacceptable affects to companies’ bottom lines. In this paper, we present a method for the monitoring of complex systems in the presence of multiple operating regimes. In addition, the new representation of degradation processes will be used to define an optimization procedure that facilitates concurrent maintenance and operational decision-making in a manufacturing system. This decision-making procedure metaheuristically maximizes a customizable cost function that reflects the benefits of production uptime, and the losses incurred due to deficient quality and downtime. The new degradation monitoring method is illustrated through the monitoring of a deposition tool operating over a prolonged period of time in a major fab, while the operational decision-making is demonstrated using simulated operation of a generic cluster tool.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A new physically based classical continuous potential distribution model, particularly considering the channel center, is proposed for a short-channel undoped body symmetrical double-gate transistor. It involves a novel technique for solving the 2-D nonlinear Poisson's equation in a rectangular coordinate system, which makes the model valid from weak to strong inversion regimes and from the channel center to the surface. We demonstrated, using the proposed model, that the channel potential versus gate voltage characteristics for the devices having equal channel lengths but different thicknesses pass through a single common point (termed ``crossover point''). Based on the potential model, a new compact model for the subthreshold swing is formulated. It is shown that for the devices having very high short-channel effects (SCE), the effective subthreshold slope factor is mainly dictated by the potential close to the channel center rather than the surface. SCEs and drain-induced barrier lowering are also assessed using the proposed model and validated against a professional numerical device simulator.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

An electromagnetically coupled feed arrangement is proposed for simultaneously exciting multiple concentric ring antennas for multi-frequency operation. This has a multi-layer dielectric configuration in which a transmission line is embedded below the layer containing radiating rings. Energy coupled to these rings from the line beneath is optimised by suitably adjusting the location and dimensions of stubs on the line. It has been shown that the resonant frequencies of these rings do not change as several of these single-frequency antennas are combined to form a multi-resonant antenna. Furthermore, all radiators are forced to operate at their primary mode and some harmonics of the lower resonant frequency rings appearing within the frequency range are suppressed when combined. The experimental prototype antenna has three resonant frequencies at which it has good radiation characteristics.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

For point to point multiple input multiple output systems, Dayal-Brehler-Varanasi have proved that training codes achieve the same diversity order as that of the underlying coherent space time block code (STBC) if a simple minimum mean squared error estimate of the channel formed using the training part is employed for coherent detection of the underlying STBC. In this letter, a similar strategy involving a combination of training, channel estimation and detection in conjunction with existing coherent distributed STBCs is proposed for noncoherent communication in Amplify-and-Forward (AF) relay networks. Simulation results show that the proposed simple strategy outperforms distributed differential space-time coding for AF relay networks. Finally, the proposed strategy is extended to asynchronous relay networks using orthogonal frequency division multiplexing.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A pulsewidth modulation (PWM) technique is proposed for minimizing the rms torque ripple in inverter-fed induction motor drives subject to a given average switching frequency of the inverter. The proposed PWM technique is a combination of optimal continuous modulation and discontinuous modulation. The proposed technique is evaluated both theoretically as well as experimentally and is compared with well-known PWM techniques. It is shown that the proposed method reduces the rms torque ripple by about 30% at the rated speed of the motor drive, compared to conventional space vector PWM.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In this paper, we first recast the generalized symmetric eigenvalue problem, where the underlying matrix pencil consists of symmetric positive definite matrices, into an unconstrained minimization problem by constructing an appropriate cost function, We then extend it to the case of multiple eigenvectors using an inflation technique, Based on this asymptotic formulation, we derive a quasi-Newton-based adaptive algorithm for estimating the required generalized eigenvectors in the data case. The resulting algorithm is modular and parallel, and it is globally convergent with probability one, We also analyze the effect of inexact inflation on the convergence of this algorithm and that of inexact knowledge of one of the matrices (in the pencil) on the resulting eigenstructure. Simulation results demonstrate that the performance of this algorithm is almost identical to that of the rank-one updating algorithm of Karasalo. Further, the performance of the proposed algorithm has been found to remain stable even over 1 million updates without suffering from any error accumulation problems.