865 results for Associative Classifiers
Abstract:
Information is nowadays a key resource: machine learning and data mining techniques have been developed to extract high-level information from large amounts of data. As most data comes in the form of unstructured natural-language text, research on text mining is currently very active and addresses practical problems. Among these, text categorization deals with the automatic organization of large quantities of documents into predefined taxonomies of topic categories, possibly arranged in large hierarchies. In commonly proposed machine learning approaches, classifiers are automatically trained from pre-labeled documents: they can perform very accurate classification, but often require a sizable training set and notable computational effort. Methods for cross-domain text categorization have been proposed, which make it possible to leverage a set of labeled documents from one domain to classify those of another. Most methods use advanced statistical techniques, usually involving parameter tuning. A first contribution presented here is a method based on nearest centroid classification, where profiles of categories are generated from the known domain and then iteratively adapted to the unknown one. Despite being conceptually simple and having easily tuned parameters, this method achieves state-of-the-art accuracy on most benchmark datasets with fast running times. A second, deeper contribution involves the design of a domain-independent model to distinguish the degree and type of relatedness between arbitrary documents and topics, inferred from the different types of semantic relationships between their respective representative words, identified by specific search algorithms. The application of this model is tested on both flat and hierarchical text categorization, where it potentially allows the efficient addition of new categories during classification. Results show that classification accuracy still needs improvement, but that models generated from one domain can be effectively reused in a different one.
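The nearest centroid idea described in this abstract can be sketched as follows. This is a minimal illustration, not the author's method: the abstract does not state the similarity measure or the update rule, so cosine similarity, hard reassignment of target documents, and all function names (`cross_domain_centroids`, `classify`, and so on) are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

def normalize(vec):
    """Scale a sparse term-weight vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else dict(vec)

def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())

def centroid(docs):
    """Category profile: normalized sum of the member documents."""
    total = Counter()
    for doc in docs:
        for term, weight in doc.items():
            total[term] += weight
    return normalize(total)

def classify(doc, centroids):
    """Return the category whose profile is most similar to the document."""
    return max(sorted(centroids), key=lambda c: cosine(doc, centroids[c]))

def cross_domain_centroids(source_docs, source_labels, target_docs, max_iter=10):
    """Build profiles on the labeled source domain, then adapt them to the target."""
    source = [normalize(d) for d in source_docs]
    target = [normalize(d) for d in target_docs]
    groups = defaultdict(list)
    for doc, label in zip(source, source_labels):
        groups[label].append(doc)
    centroids = {label: centroid(docs) for label, docs in groups.items()}
    predictions = None
    for _ in range(max_iter):
        new_predictions = [classify(doc, centroids) for doc in target]
        if new_predictions == predictions:
            break  # assignments are stable, so the profiles would not change
        predictions = new_predictions
        regrouped = defaultdict(list)
        for doc, label in zip(target, predictions):
            regrouped[label].append(doc)
        # Assumption: a category that attracts no target document keeps its old profile.
        centroids = {c: centroid(regrouped[c]) if regrouped[c] else centroids[c]
                     for c in centroids}
    return predictions, centroids

# Toy data: the target domain shares only a few terms with the source domain.
source_docs = [{"match": 2, "goal": 1}, {"team": 1, "goal": 2},
               {"cpu": 2, "code": 1}, {"code": 2, "bug": 1}]
source_labels = ["sports", "sports", "tech", "tech"]
target_docs = [{"goal": 1, "league": 2}, {"league": 1, "striker": 1},
               {"code": 1, "compiler": 2}, {"compiler": 1, "linker": 1}]
predictions, adapted = cross_domain_centroids(source_docs, source_labels, target_docs)
```

In this toy run the last target document shares no term with either source profile, so its first assignment is arbitrary; once the profiles are rebuilt from target documents, the term "compiler" pulls it into the "tech" category.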
Machine Learning Applied to the Semantic Web: Statistical Relational Learning vs Tensor Factorization
Abstract:
The aim of the thesis is to analyze and test the main Machine Learning approaches applicable in semantic contexts, starting from Statistical Relational Learning algorithms, such as Relational Probability Trees, Relational Bayesian Classifiers and Relational Dependency Networks, and then moving on to approaches based on tensor factorization, in particular CANDECOMP/PARAFAC, Tucker and RESCAL.
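For orientation, the three tensor factorizations named in this abstract are commonly written as below. This is the usual textbook notation, not notation taken from the thesis; the symbols (a third-order tensor $\mathcal{X}$, factor matrices $A$, $B$, $C$, core tensor $\mathcal{G}$, relation slices $X_k$) are assumptions made for the illustration.

```latex
% Third-order data tensor
\mathcal{X} \in \mathbb{R}^{I \times J \times K}

% CANDECOMP/PARAFAC: sum of R rank-one tensors
\mathcal{X} \approx \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r

% Tucker: core tensor multiplied by one factor matrix per mode
\mathcal{X} \approx \mathcal{G} \times_1 A \times_2 B \times_3 C

% RESCAL: each frontal slice X_k is the n-by-n adjacency matrix of relation k;
% the entity factor A is shared by all relations
X_k \approx A R_k A^{\top}, \qquad k = 1, \dots, m
```

In the RESCAL form, row $i$ of $A$ is a latent representation of entity $i$, and the entry $\mathbf{a}_i^{\top} R_k \mathbf{a}_j$ scores how plausible it is that relation $k$ holds between entities $i$ and $j$.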
Abstract:
Satellite image classification involves designing and developing efficient image classifiers. With satellite image data and image analysis methods multiplying rapidly, selecting the right mix of data sources and data analysis approaches has become critical to the generation of quality land-use maps. In this study, a new postprocessing information fusion algorithm for the extraction and representation of land-use information based on high-resolution satellite imagery is presented. This approach can produce land-use maps with sharp interregional boundaries and homogeneous regions. The proposed approach is conducted in five steps. First, a GIS layer (ATKIS data) was used to generate two coarse homogeneous regions, i.e. urban and rural areas. Second, a thematic (class) map was generated with a hybrid spectral classifier combining the Gaussian Maximum Likelihood (GML) algorithm and the ISODATA classifier. Third, a probabilistic relaxation algorithm was applied to the thematic map, resulting in a smoothed thematic map. Fourth, edge detection and edge thinning techniques were used to generate a contour map with pixel-width interclass boundaries. Fifth, the contour map was superimposed on the thematic map by a region-growing algorithm that uses the contour map and the smoothed thematic map as two constraints. To run the proposed method, a software package was developed in the C programming language. This software package comprises the GML algorithm, a probabilistic relaxation algorithm, the TBL edge detector, an edge thresholding algorithm, a fast parallel thinning algorithm, and a region-growing information fusion algorithm. The county of Landau in the State of Rheinland-Pfalz, Germany, was selected as a test site. High-resolution IRS-1C imagery was used as the principal input data.
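The Gaussian Maximum Likelihood step mentioned in the second stage can be illustrated with a small sketch. It is not the software package described in the abstract (which was written in C and combines GML with ISODATA); it assumes two spectral bands per pixel so that the covariance matrix can be inverted by hand, and the class names, training values, and function names are hypothetical.

```python
import math

def fit_class(samples):
    """Estimate mean and 2x2 covariance (c00, c01, c11) from two-band training pixels."""
    n = len(samples)
    m0 = sum(s[0] for s in samples) / n
    m1 = sum(s[1] for s in samples) / n
    c00 = sum((s[0] - m0) ** 2 for s in samples) / (n - 1)
    c11 = sum((s[1] - m1) ** 2 for s in samples) / (n - 1)
    c01 = sum((s[0] - m0) * (s[1] - m1) for s in samples) / (n - 1)
    return (m0, m1), (c00, c01, c11)

def log_discriminant(pixel, mean, cov, prior):
    """ln P(c) - 0.5 ln|S| - 0.5 (x - m)^T S^-1 (x - m) for a 2-D Gaussian."""
    c00, c01, c11 = cov
    det = c00 * c11 - c01 * c01
    d0, d1 = pixel[0] - mean[0], pixel[1] - mean[1]
    # Mahalanobis distance using the closed-form inverse of a 2x2 matrix.
    mahalanobis = (c11 * d0 * d0 - 2 * c01 * d0 * d1 + c00 * d1 * d1) / det
    return math.log(prior) - 0.5 * math.log(det) - 0.5 * mahalanobis

def classify_pixel(pixel, models, priors):
    """Assign the class with the largest Gaussian log-discriminant."""
    return max(models, key=lambda c: log_discriminant(pixel, *models[c], priors[c]))

# Hypothetical training pixels (band 1, band 2) for two land-cover classes.
training = {
    "water": [(20, 10), (22, 12), (19, 9), (21, 13)],
    "vegetation": [(40, 90), (45, 95), (42, 88), (38, 92)],
}
models = {name: fit_class(pixels) for name, pixels in training.items()}
priors = {"water": 0.5, "vegetation": 0.5}
labels = [classify_pixel(p, models, priors) for p in [(21, 11), (41, 90)]]
```

A thematic map is obtained by applying `classify_pixel` to every pixel; the later relaxation and region-growing steps would then operate on that map.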
Abstract:
Definition of acute renal allograft rejection (AR) markers remains clinically relevant. Features of T-cell-mediated AR are tubulointerstitial and vascular inflammation associated with excessive extracellular matrix (ECM) remodeling, regulated by metzincins, including matrix metalloproteases (MMP). Our study focused on expression of metzincins (METS), and metzincins and related genes (MARGS) in renal allograft biopsies using four independent microarray data sets. Our own cases included normal histology (N, n = 20), borderline changes (BL, n = 4), AR (n = 10) and AR + IF/TA (n = 7). MARGS enriched in all data sets were further examined on mRNA and/or protein level in additional patients. METS and MARGS differentiated AR from BL, AR + IF/TA and N in a principal component analysis. Their expression changes correlated with Banff t- and i-scores. Two AR classifiers, based on METS (including MMP7, TIMP1) or on MARGS, were established in our own data set and validated in the three additional data sets. Thirteen MARGS were significantly enriched in AR patients of all data sets, comprising MMP7, -9, TIMP1, -2, thrombospondin2 (THBS2) and fibrillin1. RT-PCR using microdissected glomeruli/tubuli confirmed MMP7, -9 and THBS2 microarray results; immunohistochemistry showed augmentation of MMP2, -9 and TIMP1 in AR. TIMP1 and THBS2 were enriched in AR patient serum. Therefore, differentially expressed METS and MARGS, especially TIMP1 and MMP7/-9, represent potential molecular AR markers.
Abstract:
We explored the functional organization of semantic memory for music by comparing priming across familiar songs both within modalities (Experiment 1, tune to tune; Experiment 3, category label to lyrics) and across modalities (Experiment 2, category label to tune; Experiment 4, tune to lyrics). Participants judged whether or not the target tune or lyrics were real (akin to lexical decision tasks). We found significant priming, analogous to linguistic associative-priming effects, in reaction times for related primes as compared to unrelated primes, but primarily for within-modality comparisons. Reaction times to tunes (e.g., "Silent Night") were faster following related tunes ("Deck the Hall") than following unrelated tunes ("God Bless America"). However, a category label (e.g., Christmas) did not prime tunes from within that category. Lyrics were primed by a related category label, but not by a related tune. These results support the conceptual organization of music in semantic memory, but with potentially weaker associations across modalities.
Abstract:
We examined age differences in the effectiveness of multiple repetitions and of providing associative facts on tune memory. For both tune and fact recognition, three presentations were beneficial. Age was irrelevant in fact recognition, but older adults were less successful than younger adults in tune recognition. The associative fact did not affect young adults' performance. Among older people, the neutral association harmed performance; the emotional fact restored performance to baseline. Young adults seemed to rely solely on procedural memory, or repetition, to learn tunes. Older adults benefitted by using emotional associative information to counteract memory burdens imposed by neutral associative information.
Abstract:
Groups preserving a distributive product are encountered often in algebra. Examples include automorphism groups of associative and nonassociative rings, classical groups, and automorphism groups of p-groups. While the great variety of such products precludes any realistic hope of describing the general structure of the groups that preserve them, it is reasonable to expect that insight may be gained from an examination of the universal distributive products: tensor products. We give a detailed description of the groups preserving tensor products over semisimple and semiprimary rings, and present effective algorithms to construct generators for these groups. We also discuss applications of our methods to algorithmic problems for which all currently known methods require an exponential amount of work. (C) 2013 Elsevier B.V. All rights reserved.
Abstract:
This study examined how older and younger adults learn the categories of new tunes. Tunes were presented either one or three times along with a category name to see if multiple repetitions aid category memory. Additionally, to examine whether an association may help some listeners, especially older ones, to better remember category information, some tunes were presented with a short associative fact; this fact was either neutral or emotional. Participants were tested on song recognition, fact recognition, and category memory. For all tasks, there was a benefit of three presentations. There were no age differences in fact recognition. For both song recognition and categorization, the memory burden of a neutral association was lessened when the association was emotional.
Abstract:
Current methods to characterize mesenchymal stem cells (MSCs) are limited to CD marker expression, plastic adherence and their ability to differentiate into adipogenic, osteogenic and chondrogenic precursors. It seems evident that stem cells undergoing differentiation should differ in many aspects, such as morphology and possibly also behaviour; however, such a correlation has not yet been exploited for fate prediction of MSCs. Primary human MSCs from bone marrow were expanded and pelleted to form high-density cultures and were then randomly divided into four groups to differentiate into adipogenic, osteogenic, chondrogenic and myogenic progenitor cells. The cells were expanded as heterogeneous and tracked with time-lapse microscopy to record cell shape, using phase-contrast microscopy. The cells were segmented using a custom-made image-processing pipeline. Seven morphological features were extracted for each of the segmented cells. Statistical analysis was performed on the seven-dimensional feature vectors, using a tree-like classification method. Differentiation of cells was monitored with key marker genes and histology. Cells in differentiation media were expressing the key genes for each of the three pathways after 21 days, i.e. adipogenic, osteogenic and chondrogenic, which was also confirmed by histological staining. Time-lapse microscopy data were obtained and contained new evidence that two cell shape features, eccentricity and filopodia (= 'fingers'), are highly informative for distinguishing myogenic differentiation from all others. However, no robust classifiers could be identified for the other cell differentiation paths. The results suggest that non-invasive automated time-lapse microscopy could potentially be used to predict the stem cell fate of hMSCs for clinical application, based on morphology at earlier time-points. The classification is challenged by cell density, proliferation and possible unknown donor-specific factors, which affect the performance of morphology-based approaches. Copyright © 2012 John Wiley & Sons, Ltd.
Abstract:
In clinical diagnostics, it is of utmost importance to correctly identify the source of a metastatic tumor, especially if no apparent primary tumor is present. Tissue-based proteomics might allow correct tumor classification. We therefore performed MALDI imaging to generate proteomic signatures for different tumors. These signatures were used to classify common cancer types. At first, a cohort comprising tissue samples from six adenocarcinoma entities located at different organ sites (esophagus, breast, colon, liver, stomach, thyroid gland, n = 171) was classified using two algorithms for a training and test set. For the test set, Support Vector Machine and Random Forest yielded overall accuracies of 82.74 and 81.18%, respectively. Then, colon cancer liver metastasis samples (n = 19) were introduced into the classification. The liver metastasis samples could be discriminated with high accuracy from primary tumors of colon cancer and hepatocellular carcinoma. Additionally, colon cancer liver metastasis samples could be successfully classified by using colon cancer primary tumor samples for the training of the classifier. These findings demonstrate that MALDI imaging-derived proteomic classifiers can discriminate between different tumor types at different organ sites and in the same site.
Abstract:
Deep brain stimulation (DBS) for Parkinson's disease often alleviates the motor symptoms, but causes cognitive and emotional side effects in a substantial number of cases. Identification of the motor part of the subthalamic nucleus (STN) as part of the presurgical workup could minimize these adverse effects. In this study, we assessed the STN's connectivity to motor, associative, and limbic brain areas, based on structural and functional connectivity analysis of volunteer data. For the structural connectivity, we used streamline counts derived from HARDI fiber tracking. The resulting tracks supported the existence of the so-called "hyperdirect" pathway in humans. Furthermore, we determined the connectivity of each STN voxel with the motor cortical areas. Functional connectivity was calculated based on functional MRI, as the correlation of the signal within a given brain voxel with the signal in the STN. Also, the signal per STN voxel was explained in terms of the correlation with motor or limbic brain seed ROI areas. Both right and left STN ROIs appeared to be structurally and functionally connected to brain areas that are part of the motor, associative, and limbic circuit. Furthermore, this study enabled us to assess the level of segregation of the STN motor part, which is relevant for the planning of STN DBS procedures.
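The functional connectivity measure described in this abstract, the correlation between a voxel's signal and the STN signal, can be sketched as below. The abstract does not name the correlation coefficient, so Pearson correlation is an assumption, and the time series, voxel names, and function names are hypothetical illustration values, not study data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length time series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def connectivity_map(seed_signal, voxel_signals):
    """Correlate every voxel time series with the seed (here: STN) time series."""
    return {name: pearson(seed_signal, signal)
            for name, signal in voxel_signals.items()}

# Hypothetical, very short time series for one STN seed and three brain voxels.
stn_signal = [1.0, 2.0, 3.0, 4.0, 5.0]
voxels = {
    "voxel_a": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises and falls with the seed
    "voxel_b": [1.0, 3.0, 2.0, 5.0, 4.0],   # partly follows the seed
    "voxel_c": [5.0, 4.0, 3.0, 2.0, 1.0],   # moves opposite to the seed
}
fc = connectivity_map(stn_signal, voxels)
```

The same function could be run in the other direction, with a motor or limbic seed region as `seed_signal` and the individual STN voxels as `voxel_signals`, which matches the per-voxel analysis the abstract describes.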
Abstract:
Pavlovian fear conditioning, a simple form of associative learning, is thought to involve the induction of associative, NMDA receptor-dependent long-term potentiation (LTP) in the lateral amygdala. Using a combined genetic and electrophysiological approach, we show here that lack of a specific GABA(B) receptor subtype, GABA(B(1a,2)), unmasks a nonassociative, NMDA receptor-independent form of presynaptic LTP at cortico-amygdala afferents. Moreover, the level of presynaptic GABA(B(1a,2)) receptor activation, and hence the balance between associative and nonassociative forms of LTP, can be dynamically modulated by local inhibitory activity. At the behavioral level, genetic loss of GABA(B(1a)) results in a generalization of conditioned fear to nonconditioned stimuli. Our findings indicate that presynaptic inhibition through GABA(B(1a,2)) receptors serves as an activity-dependent constraint on the induction of homosynaptic plasticity, which may be important to prevent the generalization of conditioned fear.
Abstract:
With recent advances in mass spectrometry techniques, it is now possible to investigate proteins over a wide range of molecular weights in small biological specimens. This advance has generated data-analytic challenges in proteomics, similar to those created by microarray technologies in genetics, namely, discovery of "signature" protein profiles specific to each pathologic state (e.g., normal vs. cancer) or differential profiles between experimental conditions (e.g., treated by a drug of interest vs. untreated) from high-dimensional data. We propose a data analytic strategy for discovering protein biomarkers based on such high-dimensional mass-spectrometry data. A real biomarker-discovery project on prostate cancer is taken as a concrete example throughout the paper: the project aims to identify proteins in serum that distinguish cancer, benign hyperplasia, and normal states of prostate using the Surface Enhanced Laser Desorption/Ionization (SELDI) technology, a recently developed mass spectrometry technique. Our data analytic strategy takes properties of the SELDI mass-spectrometer into account: the SELDI output of a specimen contains about 48,000 (x, y) points where x is the protein mass divided by the number of charges introduced by ionization and y is the protein intensity of the corresponding mass per charge value, x, in that specimen. Given high coefficients of variation and other characteristics of protein intensity measures (y values), we reduce the measures of protein intensities to a set of binary variables that indicate peaks in the y-axis direction in the nearest neighborhoods of each mass per charge point in the x-axis direction. We then account for a shifting (measurement error) problem of the x-axis in SELDI output. After this pre-analysis processing of the data, we combine the binary predictors to generate classification rules for cancer, benign hyperplasia, and normal states of prostate. Our approach is to apply the boosting algorithm to select binary predictors and construct a summary classifier. We empirically evaluate sensitivity and specificity of the resulting summary classifiers with a test dataset that is independent from the training dataset used to construct the summary classifiers. The proposed method performed nearly perfectly in distinguishing cancer and benign hyperplasia from normal. In the classification of cancer vs. benign hyperplasia, however, an appreciable proportion of the benign specimens were classified incorrectly as cancer. We discuss practical issues associated with our proposed approach to the analysis of SELDI output and its application in cancer biomarker discovery.
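The two core steps of this abstract, reducing intensities to binary peak indicators and boosting over those indicators, can be sketched as follows. The abstract gives neither the exact peak rule nor the boosting variant, so the local-maximum window, discrete AdaBoost with single-feature stumps, the toy spectra, and all names are assumptions; the x-axis shift correction is not modeled here.

```python
import math

def peak_indicators(intensities, half_window=2):
    """1 where a point is the maximum of its neighbourhood (and not on a flat run)."""
    n = len(intensities)
    flags = []
    for i in range(n):
        window = intensities[max(0, i - half_window):min(n, i + half_window + 1)]
        is_peak = intensities[i] == max(window) and intensities[i] > min(window)
        flags.append(1 if is_peak else 0)
    return flags

def stump(row, feature, polarity):
    """Weak rule: vote `polarity` if the peak is present, the opposite otherwise."""
    return polarity if row[feature] == 1 else -polarity

def adaboost(X, y, rounds=3):
    """Discrete AdaBoost over binary predictors; labels are +1 / -1."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        best = None
        for feature in range(len(X[0])):
            for polarity in (1, -1):
                err = sum(w for w, row, label in zip(weights, X, y)
                          if stump(row, feature, polarity) != label)
                if best is None or err < best[0]:
                    best = (err, feature, polarity)
        err, feature, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) on separable data
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, feature, polarity))
        # Increase the weight of misclassified specimens, then renormalize.
        weights = [w * math.exp(-alpha * label * stump(row, feature, polarity))
                   for w, row, label in zip(weights, X, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, row):
    """Summary classifier: sign of the weighted vote of the selected stumps."""
    score = sum(alpha * stump(row, f, p) for alpha, f, p in ensemble)
    return 1 if score >= 0 else -1

# Toy spectra (intensity per mass-per-charge bin); +1 = cancer, -1 = normal.
spectra = [[1, 2, 9, 2, 1, 1, 1, 1], [1, 3, 8, 3, 1, 1, 2, 1],
           [1, 1, 1, 1, 2, 9, 2, 1], [1, 2, 1, 1, 3, 8, 3, 1]]
labels = [1, 1, -1, -1]
X = [peak_indicators(s) for s in spectra]
ensemble = adaboost(X, labels)
fitted = [predict(ensemble, row) for row in X]
```

Because each stump uses a single binary predictor, the list of features chosen across rounds doubles as a short list of candidate peaks, which is one way to read "select binary predictors" in the abstract.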
Abstract:
Advances in computational biology have made simultaneous monitoring of thousands of features possible. High-throughput technologies not only bring about a much richer information context in which to study various aspects of gene function, but also present the challenge of analyzing data with a large number of covariates and few samples. As an integral part of machine learning, classification of samples into two or more categories is almost always of interest to scientists. In this paper, we address the question of classification in this setting by extending partial least squares (PLS), a popular dimension reduction tool in chemometrics, in the context of generalized linear regression based on a previous approach, Iteratively ReWeighted Partial Least Squares, i.e. IRWPLS (Marx, 1996). We compare our results with two-stage PLS (Nguyen and Rocke, 2002A; Nguyen and Rocke, 2002B) and other classifiers. We show that by phrasing the problem in a generalized linear model setting and by applying bias correction to the likelihood to avoid (quasi)separation, we often get lower classification error rates.
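As background for the generalized linear model setting mentioned here, the standard iteratively reweighted least squares (IRLS) iteration for logistic regression is shown below. This is the textbook scheme, not a reproduction of the paper's algorithm; the notation ($X$, $y$, $\beta$, $\eta$, $\mu$, $W$, $z$) is an assumption made for the illustration.

```latex
% Linear predictor and fitted probabilities at iteration t
\eta^{(t)} = X \beta^{(t)}, \qquad
\mu_i^{(t)} = \frac{1}{1 + e^{-\eta_i^{(t)}}}

% Working weights and working response
W^{(t)} = \operatorname{diag}\!\bigl( \mu_i^{(t)} (1 - \mu_i^{(t)}) \bigr), \qquad
z^{(t)} = \eta^{(t)} + \bigl( W^{(t)} \bigr)^{-1} \bigl( y - \mu^{(t)} \bigr)

% Weighted least-squares update
\beta^{(t+1)} = \arg\min_{\beta}\;
  \bigl( z^{(t)} - X\beta \bigr)^{\top} W^{(t)} \bigl( z^{(t)} - X\beta \bigr)
```

As we understand the general idea of IRWPLS, the weighted least-squares update in the last line is replaced by a weighted PLS regression of $z^{(t)}$ on $X$ with a small number of components, which keeps the step well defined when there are far more covariates than samples.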
Abstract:
The amygdala has been studied extensively for its critical role in associative fear conditioning in animals and humans. Noxious stimuli, such as those used for fear conditioning, are most effective in eliciting behavioral responses and amygdala activation when experienced in an unpredictable manner. Here, we show, using a translational approach in mice and humans, that unpredictability per se without interaction with motivational information is sufficient to induce sustained neural activity in the amygdala and to elicit anxiety-like behavior. Exposing mice to mere temporal unpredictability within a time series of neutral sound pulses in an otherwise neutral sensory environment increased expression of the immediate-early gene c-fos and prevented rapid habituation of single neuron activity in the basolateral amygdala. At the behavioral level, unpredictable, but not predictable, auditory stimulation induced avoidance and anxiety-like behavior. In humans, functional magnetic resonance imaging revealed that temporal unpredictability causes sustained neural activity in the amygdala and anxiety-like behavior as quantified by enhanced attention toward emotional faces. Our findings show that unpredictability per se is an important feature of the sensory environment influencing habituation of neuronal activity in the amygdala and emotional behavior, and indicate that regulation of amygdala habituation represents an evolutionarily conserved mechanism for adapting behavior in anticipation of temporally unpredictable events.