20 resultados para VARIABLE SELECTION
em Repositório Científico do Instituto Politécnico de Lisboa - Portugal
Resumo:
Tuberculosis (TB) is a worldwide infectious disease that has shown over time extremely high mortality levels. The urgent need to develop new antitubercular drugs is due to the increasing rate of appearance of multi-drug resistant strains to the commonly used drugs, and the longer durations of therapy and recovery, particularly in immuno-compromised patients. The major goal of the present study is the exploration of data from different families of compounds through the use of a variety of machine learning techniques so that robust QSAR-based models can be developed to further guide in the quest for new potent anti-TB compounds. Eight QSAR models were built using various types of descriptors (from ADRIANA.Code and Dragon software) with two publicly available structurally diverse data sets, including recent data deposited in PubChem. QSAR methodologies used Random Forests and Associative Neural Networks. Predictions for the external evaluation sets obtained accuracies in the range of 0.76-0.88 (for active/inactive classifications) and Q(2)=0.66-0.89 for regressions. Models developed in this study can be used to estimate the anti-TB activity of drug candidates at early stages of drug development (C) 2011 Elsevier B.V. All rights reserved.
Resumo:
Motion compensated frame interpolation (MCFI) is one of the most efficient solutions to generate side information (SI) in the context of distributed video coding. However, it creates SI with rather significant motion compensated errors for some frame regions while rather small for some other regions depending on the video content. In this paper, a low complexity Infra mode selection algorithm is proposed to select the most 'critical' blocks in the WZ frame and help the decoder with some reliable data for those blocks. For each block, the novel coding mode selection algorithm estimates the encoding rate for the Intra based and WZ coding modes and determines the best coding mode while maintaining a low encoder complexity. The proposed solution is evaluated in terms of rate-distortion performance with improvements up to 1.2 dB regarding a WZ coding mode only solution.
Resumo:
Reclaimed water from small wastewater treatment facilities in the rural areas of the Beira Interior region (Portugal) may constitute an alternative water source for aquifer recharge. A 21-month monitoring period in a constructed wetland treatment system has shown that 21,500 m(3) year(-1) of treated wastewater (reclaimed water) could be used for aquifer recharge. A GIS-based multi-criteria analysis was performed, combining ten thematic maps and economic, environmental and technical criteria, in order to produce a suitability map for the location of sites for reclaimed water infiltration. The areas chosen for aquifer recharge with infiltration basins are mainly composed of anthrosol with more than 1 m deep and fine sand texture, which allows an average infiltration velocity of up to 1 m d(-1). These characteristics will provide a final polishing treatment of the reclaimed water after infiltration (soil aquifer treatment (SAT)), suitable for the removal of the residual load (trace organics, nutrients, heavy metals and pathogens). The risk of groundwater contamination is low since the water table in the anthrosol areas ranges from 10 m to 50 m. Oil the other hand, these depths allow a guaranteed unsaturated area suitable for SAT. An area of 13,944 ha was selected for study, but only 1607 ha are suitable for reclaimed water infiltration. Approximately 1280 m(2) were considered enough to set up 4 infiltration basins to work in flooding and drying cycles.
Resumo:
Power converters play a vital role in the integration of wind power into the electrical grid. Variable-speed wind turbine generator systems have a considerable interest of application for grid connection at constant frequency. In this paper, comprehensive simulation studies are carried out with three power converter topologies: matrix, two-level and multilevel. A fractional-order control strategy is studied for the variable-speed operation of wind turbine generator systems. The studies are in order to compare power converter topologies and control strategies. The studies reveal that the multilevel converter and the proposed fractional-order control strategy enable an improvement in the power quality, in comparison with the other power converters using a classical integer-order control strategy. (C) 2010 Elsevier Ltd. All rights reserved.
Resumo:
As wind power generation undergoes rapid growth, new technical challenges emerge: dynamic stability and power quality. The influence of wind speed disturbances and a pitch control malfunction on the quality of the energy injected into the electric grid is studied for variable-speed wind turbines with different power-electronic converter topologies. Additionally, a new control strategy is proposed for the variable-speed operation of wind turbines with permanent magnet synchronous generators. The performance of disturbance attenuation and system robustness is ascertained. Simulation results are presented and conclusions are duly drawn. (C) 2010 Elsevier Ltd. All rights reserved.
Resumo:
Trabalho Final de mestrado para obtenção do grau de Mestre em Engenharia Civil
Resumo:
Research on the problem of feature selection for clustering continues to develop. This is a challenging task, mainly due to the absence of class labels to guide the search for relevant features. Categorical feature selection for clustering has rarely been addressed in the literature, with most of the proposed approaches having focused on numerical data. In this work, we propose an approach to simultaneously cluster categorical data and select a subset of relevant features. Our approach is based on a modification of a finite mixture model (of multinomial distributions), where a set of latent variables indicate the relevance of each feature. To estimate the model parameters, we implement a variant of the expectation-maximization algorithm that simultaneously selects the subset of relevant features, using a minimum message length criterion. The proposed approach compares favourably with two baseline methods: a filter based on an entropy measure and a wrapper based on mutual information. The results obtained on synthetic data illustrate the ability of the proposed expectation-maximization method to recover ground truth. An application to real data, referred to official statistics, shows its usefulness.
Resumo:
Trabalho Final de Mestrado para obtenção do grau de Mestre em Engenharia Mecânica
Resumo:
Electrocardiography (ECG) biometrics is emerging as a viable biometric trait. Recent developments at the sensor level have shown the feasibility of performing signal acquisition at the fingers and hand palms, using one-lead sensor technology and dry electrodes. These new locations lead to ECG signals with lower signal to noise ratio and more prone to noise artifacts; the heart rate variability is another of the major challenges of this biometric trait. In this paper we propose a novel approach to ECG biometrics, with the purpose of reducing the computational complexity and increasing the robustness of the recognition process enabling the fusion of information across sessions. Our approach is based on clustering, grouping individual heartbeats based on their morphology. We study several methods to perform automatic template selection and account for variations observed in a person's biometric data. This approach allows the identification of different template groupings, taking into account the heart rate variability, and the removal of outliers due to noise artifacts. Experimental evaluation on real world data demonstrates the advantages of our approach.
Resumo:
In research on Silent Speech Interfaces (SSI), different sources of information (modalities) have been combined, aiming at obtaining better performance than the individual modalities. However, when combining these modalities, the dimensionality of the feature space rapidly increases, yielding the well-known "curse of dimensionality". As a consequence, in order to extract useful information from this data, one has to resort to feature selection (FS) techniques to lower the dimensionality of the learning space. In this paper, we assess the impact of FS techniques for silent speech data, in a dataset with 4 non-invasive and promising modalities, namely: video, depth, ultrasonic Doppler sensing, and surface electromyography. We consider two supervised (mutual information and Fisher's ratio) and two unsupervised (meanmedian and arithmetic mean geometric mean) FS filters. The evaluation was made by assessing the classification accuracy (word recognition error) of three well-known classifiers (knearest neighbors, support vector machines, and dynamic time warping). The key results of this study show that both unsupervised and supervised FS techniques improve on the classification accuracy on both individual and combined modalities. For instance, on the video component, we attain relative performance gains of 36.2% in error rates. FS is also useful as pre-processing for feature fusion. Copyright © 2014 ISCA.
Resumo:
In this paper we develop an appropriate theory of positive definite functions on the complex plane from first principles and show some consequences of positive definiteness for meromorphic functions.
Resumo:
In cluster analysis, it can be useful to interpret the partition built from the data in the light of external categorical variables which are not directly involved to cluster the data. An approach is proposed in the model-based clustering context to select a number of clusters which both fits the data well and takes advantage of the potential illustrative ability of the external variables. This approach makes use of the integrated joint likelihood of the data and the partitions at hand, namely the model-based partition and the partitions associated to the external variables. It is noteworthy that each mixture model is fitted by the maximum likelihood methodology to the data, excluding the external variables which are used to select a relevant mixture model only. Numerical experiments illustrate the promising behaviour of the derived criterion. © 2014 Springer-Verlag Berlin Heidelberg.
Resumo:
Many learning problems require handling high dimensional datasets with a relatively small number of instances. Learning algorithms are thus confronted with the curse of dimensionality, and need to address it in order to be effective. Examples of these types of data include the bag-of-words representation in text classification problems and gene expression data for tumor detection/classification. Usually, among the high number of features characterizing the instances, many may be irrelevant (or even detrimental) for the learning tasks. It is thus clear that there is a need for adequate techniques for feature representation, reduction, and selection, to improve both the classification accuracy and the memory requirements. In this paper, we propose combined unsupervised feature discretization and feature selection techniques, suitable for medium and high-dimensional datasets. The experimental results on several standard datasets, with both sparse and dense features, show the efficiency of the proposed techniques as well as improvements over previous related techniques.
Resumo:
Feature selection is a central problem in machine learning and pattern recognition. On large datasets (in terms of dimension and/or number of instances), using search-based or wrapper techniques can be cornputationally prohibitive. Moreover, many filter methods based on relevance/redundancy assessment also take a prohibitively long time on high-dimensional. datasets. In this paper, we propose efficient unsupervised and supervised feature selection/ranking filters for high-dimensional datasets. These methods use low-complexity relevance and redundancy criteria, applicable to supervised, semi-supervised, and unsupervised learning, being able to act as pre-processors for computationally intensive methods to focus their attention on smaller subsets of promising features. The experimental results, with up to 10(5) features, show the time efficiency of our methods, with lower generalization error than state-of-the-art techniques, while being dramatically simpler and faster.
Resumo:
The study of transient dynamical phenomena near bifurcation thresholds has attracted the interest of many researchers due to the relevance of bifurcations in different physical or biological systems. In the context of saddle-node bifurcations, where two or more fixed points collide annihilating each other, it is known that the dynamics can suffer the so-called delayed transition. This phenomenon emerges when the system spends a lot of time before reaching the remaining stable equilibrium, found after the bifurcation, because of the presence of a saddle-remnant in phase space. Some works have analytically tackled this phenomenon, especially in time-continuous dynamical systems, showing that the time delay, tau, scales according to an inverse square-root power law, tau similar to (mu-mu (c) )(-1/2), as the bifurcation parameter mu, is driven further away from its critical value, mu (c) . In this work, we first characterize analytically this scaling law using complex variable techniques for a family of one-dimensional maps, called the normal form for the saddle-node bifurcation. We then apply our general analytic results to a single-species ecological model with harvesting given by a unimodal map, characterizing the delayed transition and the scaling law arising due to the constant of harvesting. For both analyzed systems, we show that the numerical results are in perfect agreement with the analytical solutions we are providing. The procedure presented in this work can be used to characterize the scaling laws of one-dimensional discrete dynamical systems with saddle-node bifurcations.