843 results for Relevance feature
Abstract:
Feature discretization (FD) techniques often yield adequate and compact representations of the data, suitable for machine learning and pattern recognition problems. These representations usually decrease the training time, yielding higher classification accuracy while allowing humans to better understand and visualize the data, as compared to the use of the original features. This paper proposes two new FD techniques. The first one is based on the well-known Linde-Buzo-Gray quantization algorithm, coupled with a relevance criterion, being able to perform unsupervised, supervised, or semi-supervised discretization. The second technique works in supervised mode, being based on the maximization of the mutual information between each discrete feature and the class label. Our experimental results on standard benchmark datasets show that these techniques scale up to high-dimensional data, attaining in many cases better accuracy than existing unsupervised and supervised FD approaches, while using fewer discretization intervals.
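The quantity that the second technique maximizes, the mutual information between a discrete feature and the class label, can be illustrated with a minimal sketch. This is not the paper's algorithm: the equal-width binning, the function names `equal_width_bins` and `mutual_information`, and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def equal_width_bins(values, n_bins):
    """Map each value to a bin index in [0, n_bins - 1] using equal-width intervals.

    Equal-width binning is an assumed stand-in for a real discretizer.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(feature, labels):
    """Mutual information I(feature; labels), in bits, of two discrete sequences."""
    n = len(feature)
    joint = Counter(zip(feature, labels))
    count_f = Counter(feature)
    count_c = Counter(labels)
    mi = 0.0
    for (f, c), count in joint.items():
        p_joint = count / n
        mi += p_joint * log2(p_joint / ((count_f[f] / n) * (count_c[c] / n)))
    return mi


# Toy data (hypothetical): low values belong to class 0, high values to class 1.
values = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 1]
bins = equal_width_bins(values, 2)
mi = mutual_information(bins, labels)
```

With two bins the toy feature separates the classes perfectly, so the mutual information equals the label entropy of one bit. A supervised discretizer in this spirit would compare such scores across candidate discretizations.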
Abstract:
The study of motor unit action potential (MUAP) activity from electromyographic signals is an important stage in neurological investigations that aim to understand the state of the neuromuscular system. In this context, the identification and clustering of MUAPs that exhibit common characteristics, and the assessment of which data features are most relevant for the definition of such cluster structure, are central issues. In this paper, we propose the application of an unsupervised Feature Relevance Determination (FRD) method to the analysis of experimental MUAPs obtained from healthy human subjects. In contrast to approaches that require the knowledge of a priori information from the data, this FRD method is embedded in a constrained mixture model, known as Generative Topographic Mapping, which simultaneously performs clustering and visualization of MUAPs. The experimental results of the analysis of a data set consisting of MUAPs measured from the surface of the First Dorsal Interosseous, a hand muscle, indicate that the MUAP features corresponding to the hyperpolarization period in the physiological process of generation of muscle fibre action potentials are consistently estimated as the most relevant and, therefore, as those that deserve preferential attention in the interpretation of the MUAP groupings.
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Research on the problem of feature selection for clustering continues to develop. This is a challenging task, mainly due to the absence of class labels to guide the search for relevant features. Categorical feature selection for clustering has rarely been addressed in the literature, with most of the proposed approaches having focused on numerical data. In this work, we propose an approach to simultaneously cluster categorical data and select a subset of relevant features. Our approach is based on a modification of a finite mixture model (of multinomial distributions), where a set of latent variables indicate the relevance of each feature. To estimate the model parameters, we implement a variant of the expectation-maximization algorithm that simultaneously selects the subset of relevant features, using a minimum message length criterion. The proposed approach compares favourably with two baseline methods: a filter based on an entropy measure and a wrapper based on mutual information. The results obtained on synthetic data illustrate the ability of the proposed expectation-maximization method to recover ground truth. An application to real data, referred to official statistics, shows its usefulness.
Abstract:
Discrete data representations are necessary, or at least convenient, in many machine learning problems. While feature selection (FS) techniques aim at finding relevant subsets of features, the goal of feature discretization (FD) is to find concise (quantized) data representations, adequate for the learning task at hand. In this paper, we propose two incremental methods for FD. The first method belongs to the filter family, in which the quality of the discretization is assessed by a (supervised or unsupervised) relevance criterion. The second method is a wrapper, where discretized features are assessed using a classifier. Both methods can be coupled with any static (unsupervised or supervised) discretization procedure and can be used to perform FS as pre-processing or post-processing stages. The proposed methods attain efficient representations suitable for binary and multi-class problems with different types of data, being competitive with existing methods. Moreover, using well-known FS methods with the features discretized by our techniques leads to better accuracy than with the features discretized by other methods or with the original features. (C) 2013 Elsevier B.V. All rights reserved.
Abstract:
Many learning problems require handling high dimensional datasets with a relatively small number of instances. Learning algorithms are thus confronted with the curse of dimensionality, and need to address it in order to be effective. Examples of these types of data include the bag-of-words representation in text classification problems and gene expression data for tumor detection/classification. Usually, among the high number of features characterizing the instances, many may be irrelevant (or even detrimental) for the learning tasks. It is thus clear that there is a need for adequate techniques for feature representation, reduction, and selection, to improve both the classification accuracy and the memory requirements. In this paper, we propose combined unsupervised feature discretization and feature selection techniques, suitable for medium and high-dimensional datasets. The experimental results on several standard datasets, with both sparse and dense features, show the efficiency of the proposed techniques as well as improvements over previous related techniques.
Abstract:
Feature selection is a central problem in machine learning and pattern recognition. On large datasets (in terms of dimension and/or number of instances), using search-based or wrapper techniques can be computationally prohibitive. Moreover, many filter methods based on relevance/redundancy assessment also take a prohibitively long time on high-dimensional datasets. In this paper, we propose efficient unsupervised and supervised feature selection/ranking filters for high-dimensional datasets. These methods use low-complexity relevance and redundancy criteria, applicable to supervised, semi-supervised, and unsupervised learning, being able to act as pre-processors for computationally intensive methods to focus their attention on smaller subsets of promising features. The experimental results, with up to 10^5 features, show the time efficiency of our methods, with lower generalization error than state-of-the-art techniques, while being dramatically simpler and faster.
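The general shape of a relevance/redundancy filter can be sketched as follows. The abstract does not name its criteria, so this sketch assumes variance as the (unsupervised) relevance score and absolute Pearson correlation as the redundancy measure. The function names, the `max_similarity` threshold, and the toy columns are all hypothetical.

```python
from statistics import mean, pvariance


def abs_correlation(a, b):
    """Absolute Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant column is treated as uncorrelated
    return abs(cov) / (var_a * var_b) ** 0.5


def relevance_redundancy_filter(columns, max_features, max_similarity=0.9):
    """Greedy filter: visit features by decreasing relevance, skip redundant ones.

    columns: list of feature columns (each a list of numbers).
    Relevance is the feature variance (an assumed, unsupervised criterion).
    Redundancy is the absolute correlation with any already-kept feature.
    """
    ranked = sorted(range(len(columns)),
                    key=lambda j: pvariance(columns[j]), reverse=True)
    kept = []
    for j in ranked:
        if len(kept) == max_features:
            break
        if all(abs_correlation(columns[j], columns[k]) <= max_similarity
               for k in kept):
            kept.append(j)
    return kept


columns = [
    [1.0, 2.0, 3.0, 4.0],    # feature 0
    [2.0, 4.0, 6.0, 8.0],    # feature 1: scaled copy of feature 0 (redundant)
    [1.0, -1.0, -1.0, 1.0],  # feature 2: uncorrelated with features 0 and 1
    [5.0, 5.0, 5.0, 5.0],    # feature 3: constant, zero variance
]
selected = relevance_redundancy_filter(columns, max_features=2)
```

Here feature 1 is kept first because it has the largest variance, feature 0 is skipped as a redundant copy, and feature 2 fills the second slot. Ranking costs one pass per feature, and each redundancy check only compares against the features already kept, which is what keeps filters of this kind cheap on high-dimensional data.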
Abstract:
In machine learning and pattern recognition tasks, the use of feature discretization techniques may have several advantages. The discretized features may hold enough information for the learning task at hand, while ignoring minor fluctuations that are irrelevant or harmful for that task. The discretized features have more compact representations that may yield both better accuracy and lower training time, as compared to the use of the original features. However, in many cases, mainly with medium and high-dimensional data, the large number of features usually implies that there is some redundancy among them. Thus, we may further apply feature selection (FS) techniques on the discrete data, keeping the most relevant features, while discarding the irrelevant and redundant ones. In this paper, we propose relevance and redundancy criteria for supervised feature selection techniques on discrete data. These criteria are applied to the bin-class histograms of the discrete features. The experimental results, on public benchmark data, show that the proposed criteria can achieve better accuracy than widely used relevance and redundancy criteria, such as mutual information and the Fisher ratio.
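As a rough illustration of the data structure these criteria operate on, the sketch below builds a bin-class histogram for one discrete feature: a table counting how many instances fall in each (bin, class) pair. The interpretation, the function name `bin_class_histogram`, and the toy data are assumptions; the paper's relevance and redundancy criteria themselves are not reproduced here.

```python
def bin_class_histogram(bins, labels, n_bins, n_classes):
    """Count instances per (bin, class) pair for a single discrete feature.

    bins:   bin index of each instance, in [0, n_bins - 1]
    labels: class index of each instance, in [0, n_classes - 1]
    Returns a list of n_bins rows, each holding n_classes counts.
    """
    hist = [[0] * n_classes for _ in range(n_bins)]
    for b, c in zip(bins, labels):
        hist[b][c] += 1
    return hist


# Toy feature (hypothetical) with three bins and two classes.
bins = [0, 0, 1, 2, 2, 2]
labels = [0, 0, 1, 1, 1, 0]
hist = bin_class_histogram(bins, labels, n_bins=3, n_classes=2)
```

A bin whose row concentrates its counts in a single class, such as bin 0 above, carries more class information than a bin whose counts are spread across classes.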
Abstract:
Integrated master's dissertation in Biomedical Engineering (specialization in Clinical Engineering)
Abstract:
Voluntary selective attention can prioritize different features in a visual scene. The frontal eye-fields (FEF) are one potential source of such feature-specific top-down signals, but causal evidence for influences on visual cortex (as was shown for "spatial" attention) has remained elusive. Here, we show that transcranial magnetic stimulation (TMS) applied to right FEF increased the blood oxygen level-dependent (BOLD) signals in visual areas processing "target feature" but not in "distracter feature"-processing regions. TMS-induced BOLD signals increased in motion-responsive visual cortex (MT+) when motion was attended in a display with moving dots superimposed on face stimuli, but in face-responsive fusiform area (FFA) when faces were attended to. These TMS effects on BOLD signal in both regions were negatively related to performance (on the motion task), supporting the behavioral relevance of this pathway. Our findings provide new causal evidence for the human FEF in the control of nonspatial "feature"-based attention, mediated by dynamic influences on feature-specific visual cortex that vary with the currently attended property.
Abstract:
The purpose of this study is to determine the expression of CCL19, CCL21, and CCR7 in samples of oral squamous cell carcinoma (OSCC) and their relationship with clinical and microscopic parameters. A comparative analysis was made of the mRNA expression of these chemokines and receptor in OSCC and normal oral mucosa. The immunoexpression of CCR7, CCL19, and CCL21 was also verified in OSCC and lymph nodes. Statistical significance was accepted at P < 0.05. Similar levels of CCR7, CCL19, and CCL21 mRNA in OSCC and normal oral mucosa were seen. A low expression of CCL19 and CCL21 in the intra- and peritumoral regions was observed. Scarce CCL19+ and CCL21+ cells were also noted in metastatic and non-metastatic lymph nodes. No association was found between the expression of these chemokines and clinical and microscopic parameters. Our findings would suggest that CCL19 and CCL21 may not be associated with cervical lymph node metastasis or other clinical and microscopic factors in OSCC. © 2012 International Society of Oncology and BioMarkers (ISOBM).
Abstract:
The role of platelets as inflammatory cells is demonstrated by the fact that they can release many growth factors and inflammatory mediators, including chemokines, when they are activated. The best known platelet chemokine family members are platelet factor 4 (PF4) and beta-thromboglobulin (beta-TG), which are synthesized in megakaryocytes, stored as preformed proteins in alpha-granules and released from activated platelets. However, platelets also contain many other chemokines such as interleukin-8 (IL-8), growth-regulating oncogene-alpha(GRO-alpha), epithelial neutrophil-activating protein 78 (ENA-78), regulated on activation normal T expressed and secreted (RANTES), macrophage inflammatory protein-1alpha (MIP-1alpha), and monocyte chemotactic protein-3 (MCP-3). They also express chemokine receptors such as CCR4, CXCR4, CCR1 and CCR3. Platelet activation is a feature of many inflammatory diseases such as heparin-induced thrombocytopenia, acquired immunodeficiency syndrome, and congestive heart failure. Substantial amounts of PF4, beta-TG and RANTES are released from platelets on activation, which may occur during storage. Although very few data are available on the in vivo effects of transfused chemokines, it has been suggested that the high incidence of adverse reactions often observed after platelet transfusions may be attributed to the chemokines present in the plasma of stored platelet concentrates.
Abstract:
INTRODUCTION Agonistic antibodies targeting TRAIL-receptors 1 and 2 (TRAIL-R1 and TRAIL-R2) are being developed as a novel therapeutic approach in cancer therapy including pancreatic cancer. However, the cellular distribution of these receptors in primary pancreatic cancer samples has not been sufficiently investigated and no study has yet addressed the issue of their prognostic significance in this tumor entity. AIMS AND METHODS Applying tissue microarray (TMA) analysis, we performed an immunohistochemical assessment of TRAIL-receptors in surgical samples from 84 consecutive patients affected by pancreatic adenocarcinoma and in 26 additional selected specimens from patients with no lymph nodes metastasis at the time of surgery. The prognostic significance of membrane staining and staining intensity for TRAIL-receptors was evaluated. RESULTS The fraction of pancreatic cancer samples with positive membrane staining for TRAIL-R1 and TRAIL-R2 was lower than that of cells from surrounding non-tumor tissues (TRAIL-R1: p<0.001, TRAIL-R2: p = 0.006). In addition, subgroup analyses showed that loss of membrane staining for TRAIL-R2 was associated with poorer prognosis in patients without nodal metastases (multivariate Cox regression analysis, Hazard Ratio: 0.44 [95% confidence interval: 0.22-0.87]; p = 0.019). In contrast, analysis of decoy receptors TRAIL-R3 and -R4 in tumor samples showed an exclusively cytoplasmatic staining pattern and no prognostic relevance. CONCLUSION This is a first report on the prognostic significance of TRAIL-receptors expression in pancreatic cancer showing that TRAIL-R2 might represent a prognostic marker for patients with early stage disease. In addition, our data suggest that loss of membrane-bound TRAIL-receptors could represent a molecular mechanism for therapeutic failure upon administration of TRAIL-receptors-targeting antibodies in pancreatic cancer. This hypothesis should be evaluated in future clinical trials.
Abstract:
Most data stream classification techniques assume that the underlying feature space is static. However, in real-world applications the set of features and their relevance to the target concept may change over time. In addition, when the underlying concepts reappear, reusing previously learnt models can enhance the learning process in terms of accuracy and processing time at the expense of manageable memory consumption. In this paper, we propose mining recurring concepts in a dynamic feature space (MReC-DFS), a data stream classification system to address the challenges of learning recurring concepts in a dynamic feature space while simultaneously reducing the memory cost associated with storing past models. MReC-DFS is able to detect and adapt to concept changes using the performance of the learning process and contextual information. To handle recurring concepts, stored models are combined in a dynamically weighted ensemble. Incremental feature selection is performed to reduce the combined feature space. This contribution allows MReC-DFS to store only the features most relevant to the learnt concepts, which in turn increases the memory efficiency of the technique. In addition, an incremental feature selection method is proposed that dynamically determines the threshold between relevant and irrelevant features. Experimental results demonstrating the high accuracy of MReC-DFS compared with state-of-the-art techniques on a variety of real datasets are presented. The results also show the superior memory efficiency of MReC-DFS.
Abstract:
Nonlinear analysis tools for studying and characterizing the dynamics of physiological signals have gained popularity, mainly because tracking sudden alterations of the inherent complexity of biological processes might be an indicator of altered physiological states. Typically, in order to perform an analysis with such tools, the physiological variables that describe the biological process under study are used to reconstruct the underlying dynamics of the biological processes. For that goal, a procedure called time-delay or uniform embedding is usually employed. Nonetheless, there is evidence of its inability to deal with non-stationary signals, such as those recorded from many physiological processes. To deal with this drawback, this paper evaluates the utility of non-conventional time series reconstruction procedures based on non uniform embedding, applying them to automatic pattern recognition tasks. The paper compares a state of the art non uniform approach with a novel scheme which fuses embedding and feature selection at once, searching for better reconstructions of the dynamics of the system. Moreover, results are also compared with two classic uniform embedding techniques. Thus, the goal is to compare uniform and non uniform reconstruction techniques, including the one proposed in this work, for pattern recognition in biomedical signal processing tasks. Once the state space is reconstructed, the scheme characterizes it with three classic nonlinear dynamic features (Largest Lyapunov Exponent, Correlation Dimension and Recurrence Period Density Entropy), while classification is carried out by means of a simple k-nn classifier. In order to test its generalization capabilities, the approach was tested with three different physiological databases (Speech Pathologies, Epilepsy and Heart Murmurs).
In terms of the accuracy obtained when automatically detecting the presence of pathologies, and for the three types of biosignals analyzed, the non uniform techniques used in this work slightly outperformed the uniform methods, suggesting their usefulness for characterizing non-stationary biomedical signals in pattern recognition applications. Moreover, in view of the results obtained and its low computational load, the proposed technique appears applicable to the applications under study.
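The difference between uniform and non uniform embedding can be shown with a minimal sketch. It uses the forward-lag convention (row t starts at sample t), which matches the backward-lag convention up to an index shift. The function names `delay_embed` and `nonuniform_embed`, the lag set, and the toy series are assumptions. The sketch does not implement the paper's fused embedding and feature selection scheme, nor any procedure for choosing the lags.

```python
def delay_embed(signal, dim, tau):
    """Uniform time-delay embedding.

    Row t is (x[t], x[t + tau], ..., x[t + (dim - 1) * tau]).
    """
    if dim < 1 or tau < 1:
        raise ValueError("dim and tau must be positive integers")
    n_vectors = len(signal) - (dim - 1) * tau
    if n_vectors <= 0:
        raise ValueError("signal too short for this dim and tau")
    return [[signal[t + k * tau] for k in range(dim)]
            for t in range(n_vectors)]


def nonuniform_embed(signal, lags):
    """Non uniform variant: one coordinate per lag; lags need not be evenly spaced."""
    max_lag = max(lags)
    if len(signal) <= max_lag:
        raise ValueError("signal too short for these lags")
    return [[signal[t + lag] for lag in lags]
            for t in range(len(signal) - max_lag)]


series = [0, 1, 2, 3, 4, 5, 6]                  # toy signal (hypothetical)
uniform = delay_embed(series, dim=3, tau=2)     # lags 0, 2, 4
nonuniform = nonuniform_embed(series, [0, 1, 4])
```

Each row is one reconstructed state vector. Nonlinear features such as those named in the abstract would then be computed on the resulting set of vectors.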