21 resultados para Feature space
Resumo:
In this paper, we use the quantum Jensen-Shannon divergence as a means to establish the similarity between a pair of graphs and to develop a novel graph kernel. In quantum theory, the quantum Jensen-Shannon divergence is defined as a distance measure between quantum states. In order to compute the quantum Jensen-Shannon divergence between a pair of graphs, we first need to associate a density operator with each of them. Hence, we decide to simulate the evolution of a continuous-time quantum walk on each graph and we propose a way to associate a suitable quantum state with it. With the density operator of this quantum state to hand, the graph kernel is defined as a function of the quantum Jensen-Shannon divergence between the graph density operators. We evaluate the performance of our kernel on several standard graph datasets from bioinformatics. We use the Principle Component Analysis (PCA) on the kernel matrix to embed the graphs into a feature space for classification. The experimental results demonstrate the effectiveness of the proposed approach. © 2013 Springer-Verlag.
Resumo:
Sentiment classification over Twitter is usually affected by the noisy nature (abbreviations, irregular forms) of tweets data. A popular procedure to reduce the noise of textual data is to remove stopwords by using pre-compiled stopword lists or more sophisticated methods for dynamic stopword identification. However, the effectiveness of removing stopwords in the context of Twitter sentiment classification has been debated in the last few years. In this paper we investigate whether removing stopwords helps or hampers the effectiveness of Twitter sentiment classification methods. To this end, we apply six different stopword identification methods to Twitter data from six different datasets and observe how removing stopwords affects two well-known supervised sentiment classification methods. We assess the impact of removing stopwords by observing fluctuations on the level of data sparsity, the size of the classifier's feature space and its classification performance. Our results show that using pre-compiled lists of stopwords negatively impacts the performance of Twitter sentiment classification approaches. On the other hand, the dynamic generation of stopword lists, by removing those infrequent terms appearing only once in the corpus, appears to be the optimal method to maintaining a high classification performance while reducing the data sparsity and substantially shrinking the feature space
Resumo:
Protein-DNA interactions are involved in many fundamental biological processes essential for cellular function. Most of the existing computational approaches employed only the sequence context of the target residue for its prediction. In the present study, for each target residue, we applied both the spatial context and the sequence context to construct the feature space. Subsequently, Latent Semantic Analysis (LSA) was applied to remove the redundancies in the feature space. Finally, a predictor (PDNAsite) was developed through the integration of the support vector machines (SVM) classifier and ensemble learning. Results on the PDNA-62 and the PDNA-224 datasets demonstrate that features extracted from spatial context provide more information than those from sequence context and the combination of them gives more performance gain. An analysis of the number of binding sites in the spatial context of the target site indicates that the interactions between binding sites next to each other are important for protein-DNA recognition and their binding ability. The comparison between our proposed PDNAsite method and the existing methods indicate that PDNAsite outperforms most of the existing methods and is a useful tool for DNA-binding site identification. A web-server of our predictor (http://hlt.hitsz.edu.cn:8080/PDNAsite/) is made available for free public accessible to the biological research community.
Resumo:
Influential models of edge detection have generally supposed that an edge is detected at peaks in the 1st derivative of the luminance profile, or at zero-crossings in the 2nd derivative. However, when presented with blurred triangle-wave images, observers consistently marked edges not at these locations, but at peaks in the 3rd derivative. This new phenomenon, termed ‘Mach edges’ persisted when a luminance ramp was added to the blurred triangle-wave. Modelling of these Mach edge detection data required the addition of a physiologically plausible filter, prior to the 3rd derivative computation. A viable alternative model was examined, on the basis of data obtained with short-duration, high spatial-frequency stimuli. Detection and feature-making methods were used to examine the perception of Mach bands in an image set that spanned a range of Mach band detectabilities. A scale-space model that computed edge and bar features in parallel provided a better fit to the data than 4 competing models that combined information across scale in a different manner, or computed edge or bar features at a single scale. The perception of luminance bars was examined in 2 experiments. Data for one image-set suggested a simple rule for perception of a small Gaussian bar on a larger inverted Gaussian bar background. In previous research, discriminability (d’) has typically been reported to be a power function of contrast, where the exponent (p) is 2 to 3. However, using bar, grating, and Gaussian edge stimuli, with several methodologies, values of p were obtained that ranged from 1 to 1.7 across 6 experiments. This novel finding was explained by appealing to low stimulus uncertainty, or a near-linear transducer.
Resumo:
Most machine-learning algorithms are designed for datasets with features of a single type whereas very little attention has been given to datasets with mixed-type features. We recently proposed a model to handle mixed types with a probabilistic latent variable formalism. This proposed model describes the data by type-specific distributions that are conditionally independent given the latent space and is called generalised generative topographic mapping (GGTM). It has often been observed that visualisations of high-dimensional datasets can be poor in the presence of noisy features. In this paper we therefore propose to extend the GGTM to estimate feature saliency values (GGTMFS) as an integrated part of the parameter learning process with an expectation-maximisation (EM) algorithm. The efficacy of the proposed GGTMFS model is demonstrated both for synthetic and real datasets.
Resumo:
Africas World Cup: Critical Reflections on Play, Patriotism, Spectatorship, and Space focuses on a remarkable month in the modern history of Africa and in the global history of football. Peter Alegi and Chris Bolsmann are well-known experts on South African football, and they have assembled an impressive team of local and international journalists, academics, and football experts to reflect on the 2010 World Cup and its broader significance, its meanings, complexities, and contradictions. The World Cups sounds, sights, and aesthetics are explored, along with questions of patriotism, nationalism, and spectatorship in Africa and around the world. Experts on urban design and communities write on how the presence of the World Cup worked to refashion urban spaces and negotiate the local struggles in the hosting cities. The volume is richly illustrated by authors photographs, and the essays in this volume feature chronicles of match day experiences; travelogues; ethnographies of fan cultures; analyses of print, broadcast, and electronic media coverage of the tournament; reflections on the World Cups private and public spaces; football exhibits in South African museums; and critiques of the World Cups processes of inclusion and exclusion, as well as its political and economic legacies. The volume concludes with a forum on the World Cup, including Thabo Dladla, Director of Soccer at the University of KwaZulu-Natal, Mohlomi Kekeletso Maubane, a well-known Soweto-based writer and a soccer researcher, and Rodney Reiners, former professional footballer and current chief soccer writer for the Cape Argus newspaper in Cape Town. This collection will appeal to students, scholars, journalists, and fans. Cover illustration: South African fan blowing his vuvuzela at South Africa vs. France, Free State Stadium, Bloemfontein, June 22, 2010. Photo by Chris Bolsmann.