897 results for information bottleneck method


Relevance:

100.00%

Abstract:

In image processing, segmentation algorithms constitute one of the main focuses of research. In this paper, new image segmentation algorithms based on a hard version of the information bottleneck method are presented. The objective of this method is to extract a compact representation of an input variable with minimal loss of mutual information with respect to an output variable. First, we introduce a split-and-merge algorithm based on the definition of an information channel between a set of regions of the image (the input) and the intensity histogram bins (the output). From this channel, the maximization of the mutual information gain is used to optimize the image partitioning. Then, the regions obtained in the splitting phase are merged by minimizing the loss of mutual information. By inverting the above channel, we also present a new histogram clustering algorithm based on the minimization of the mutual information loss, where the input variable now represents the histogram bins and the output is given by the set of regions obtained from the split-and-merge algorithm. Finally, we introduce two new clustering algorithms which show how the information bottleneck method can be applied to the registration channel obtained when two multimodal images are correctly aligned. Experiments on 2-D and 3-D images illustrate the behavior of the proposed algorithms.
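A minimal sketch of the merging phase, assuming the channel is stored as a joint distribution over (region, intensity bin); spatial adjacency is simplified to adjacency of row indices, and all names are illustrative:

import numpy as np

def mutual_information(p_xy):
    """Mutual information I(X; Y) of a joint distribution, in bits."""
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0
    return float((p_xy[mask] * np.log2(p_xy[mask] / (p_x @ p_y)[mask])).sum())

def merge_regions(p_xy, n_regions):
    """Greedily merge adjacent regions (rows), always choosing the merge
    that loses the least mutual information with the intensity bins."""
    while p_xy.shape[0] > n_regions:
        base = mutual_information(p_xy)
        best = None
        for i in range(p_xy.shape[0] - 1):
            merged = np.vstack([p_xy[:i], p_xy[i] + p_xy[i + 1], p_xy[i + 2:]])
            loss = base - mutual_information(merged)
            if best is None or loss < best[0]:
                best = (loss, merged)
        p_xy = best[1]
    return p_xy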

Relevance:

100.00%

Abstract:

The Information Bottleneck method can be used as a dimensionality reduction approach by grouping "similar" features together [1]. In applications, a natural question is how many feature groups are appropriate. This dependency on prior knowledge restricts the applicability of many Information Bottleneck algorithms. In this paper we alleviate this dependency by formulating the parameter determination as a model selection problem, which we solve using the minimum message length principle. An efficient encoding scheme is designed to describe both the information bottleneck solutions and the original data; the minimum message length principle is then applied to automatically determine the optimal cardinality. Empirical results in a document clustering scenario indicate that the proposed method works well for determining the optimal parameter value for the Information Bottleneck method.
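The paper's encoding scheme is not reproduced in the abstract; as a schematic illustration of the two-part idea, each candidate cardinality can be scored by (bits to state the model) plus (bits to state the data given the model), keeping the minimizer. The cost terms below are our assumptions, not the paper's encoding:

import numpy as np

def message_length(assignments, p_y_given_c, counts):
    """Schematic two-part MML score for an IB solution with k groups:
    model part - one group label per feature,
    data part  - code length of the data under the group predictions.
    `counts[x]` is the co-occurrence count vector over Y for feature x."""
    n, k = len(assignments), p_y_given_c.shape[0]
    model_bits = n * np.log2(max(k, 2))
    data_bits = 0.0
    for x, c in enumerate(assignments):
        p = np.clip(p_y_given_c[c], 1e-12, 1.0)
        data_bits -= float((counts[x] * np.log2(p)).sum())
    return model_bits + data_bits

Sweeping k over a candidate range and keeping the minimizing value replaces the user-supplied number of feature groups.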

Relevance:

100.00%

Abstract:

In population studies, most current methods focus on identifying one outcome-related SNP at a time by testing for differences in genotype frequencies between disease and healthy groups or among different population groups. However, testing a great number of SNPs simultaneously raises the problem of multiple testing and yields false-positive results. Although this problem can be dealt with effectively through approaches such as Bonferroni correction, permutation testing and false discovery rates, patterns of joint effects by several genes, each with a weak individual effect, may remain undetected. With the availability of high-throughput genotyping technology, searching for multiple scattered SNPs over the whole genome and modeling their joint effect on the target variable has become possible. Exhaustive search of all SNP subsets is computationally infeasible for the millions of SNPs in a genome-wide study. Several effective feature selection methods combined with classification functions have been proposed to search for an optimal SNP subset in big data sets where the number of feature SNPs far exceeds the number of observations.

In this study, we take two steps to achieve this goal. First, we selected 1000 SNPs through an effective filter method; then we performed a feature selection wrapped around a classifier to identify an optimal SNP subset for predicting disease. We also developed a novel classification method, the sequential information bottleneck method, wrapped inside different search algorithms to identify an optimal subset of SNPs for classifying the outcome variable. This new method was compared with classical linear discriminant analysis in terms of classification performance. Finally, we performed chi-square tests to examine the relationship between each SNP and disease from another point of view.

In general, our results show that filtering features using the harmonic mean of sensitivity and specificity (HMSS) through linear discriminant analysis (LDA) is better than using LDA training accuracy or mutual information in our study. Our results also demonstrate that exhaustive search of small subsets of one, two or three SNPs (based on the best 100 composite 2-SNP combinations) can find an optimal subset, and that further inclusion of more SNPs through a heuristic algorithm does not always increase the performance of SNP subsets. Although sequential forward floating selection can be applied to prevent the nesting effect of forward selection, it does not always outperform the latter, owing to overfitting from observing more complex subset states.

Our results also indicate that HMSS, as a criterion to evaluate the classification ability of a function, can be used on imbalanced data without modifying the original dataset, unlike classification accuracy. Our four studies suggest that the sequential Information Bottleneck (sIB), a new unsupervised technique, can be adopted to predict the outcome, and that its ability to detect the target status is superior to that of traditional LDA in this study.

From our results, the best test probability-HMSS for predicting CVD, stroke, CAD and psoriasis through sIB is 0.59406, 0.641815, 0.645315 and 0.678658, respectively. In terms of group prediction accuracy, the highest test accuracy of sIB for diagnosing a normal status among controls can reach 0.708999, 0.863216, 0.639918 and 0.850275, respectively, in the four studies if the test accuracy among cases is required to be at least 0.4. On the other hand, the highest test accuracy of sIB for diagnosing a disease among cases can reach 0.748644, 0.789916, 0.705701 and 0.749436, respectively, in the four studies if the test accuracy among controls is required to be at least 0.4.

A further genome-wide association study through chi-square testing shows that no significant SNPs were detected at the cut-off level 9.09451E-08 in the Framingham Heart Study of CVD. The WTCCC study detects only two significant SNPs associated with CAD. In the genome-wide study of psoriasis, most of the top 20 SNP markers with impressive classification accuracy are also significantly associated with the disease through chi-square tests at the cut-off value 1.11E-07.

Although our classification methods can achieve high accuracy in the study, complete descriptions of these classification results (95% confidence intervals or statistical tests of differences) require more cost-effective methods or a more efficient computing system, neither of which is currently available for our genome-wide study. We should also note that the purpose of this study is to identify subsets of SNPs with high prediction ability; SNPs with good discriminant power are not necessarily causal markers for the disease.
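The HMSS criterion itself is simple to state; a small sketch (the function name is ours) computing it from binary confusion counts:

def hmss(tp, fn, tn, fp):
    """Harmonic mean of sensitivity and specificity (HMSS). Unlike raw
    accuracy, it penalizes classifiers that ignore the minority class,
    which is why it suits imbalanced case/control data."""
    sensitivity = tp / (tp + fn)   # detection rate among cases
    specificity = tn / (tn + fp)   # detection rate among controls
    if sensitivity + specificity == 0:
        return 0.0
    return 2 * sensitivity * specificity / (sensitivity + specificity)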

Relevance:

100.00%

Abstract:

The Information Bottleneck method aims to extract a compact representation which preserves the maximum relevant information. The sub-optimality of the agglomerative Information Bottleneck (aIB) algorithm restricts the applications of the Information Bottleneck method. In this paper, the concept of density-based chains is adopted to evaluate the information loss among the neighbors of an element, rather than the information loss between pairs of elements. The DaIB algorithm is then presented to alleviate the sub-optimality problem of aIB while keeping its useful hierarchical clustering tree structure. Experimental results on benchmark data sets show that the DaIB algorithm preserves more relevant information and achieves higher precision than the aIB algorithm, and paired t-tests indicate that these improvements are statistically significant.
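For reference, the pairwise cost that standard aIB greedily minimizes is the mutual information lost by merging two clusters, a weighted Jensen-Shannon divergence; a sketch of the usual formulation (names ours):

import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    mask = p > 0
    return float((p[mask] * np.log2(p[mask] / q[mask])).sum())

def aib_merge_cost(p_c1, p_c2, p_y_c1, p_y_c2):
    """Mutual information lost by merging clusters c1 and c2:
    (p(c1) + p(c2)) * JS(p(y|c1), p(y|c2)), with JS weights
    p(ci) / (p(c1) + p(c2))."""
    p = p_c1 + p_c2
    pi1, pi2 = p_c1 / p, p_c2 / p
    p_y_merged = pi1 * p_y_c1 + pi2 * p_y_c2
    return p * (pi1 * kl(p_y_c1, p_y_merged) + pi2 * kl(p_y_c2, p_y_merged))

DaIB replaces this pairwise evaluation with a loss taken over density-based chains of neighbors.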

Relevance:

100.00%

Abstract:

Clustering with the agglomerative Information Bottleneck (aIB) algorithm suffers from a sub-optimality problem: it cannot guarantee to preserve as much relevant information as possible. To handle this problem, we introduce a density connectivity chain, by which we consider not only the information between two data elements but also the information among the neighbors of a data element. Based on this idea, we propose DCIB, a Density Connectivity Information Bottleneck algorithm that applies the Information Bottleneck method to quantify the relevant information during the clustering procedure. As a hierarchical algorithm, DCIB produces a pruned clustering tree structure and yields clustering results of different sizes in a single execution. Experimental results on document clustering indicate that the DCIB algorithm preserves more relevant information and achieves higher precision than the aIB algorithm.
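The abstract does not spell out the chain construction; the sketch below shows one plausible reading, in which a chain grows through eps-neighborhoods (a DBSCAN-style notion of density connectivity). The eps parameter and the names are our assumptions:

import numpy as np

def density_chain(dist, start, eps):
    """Collect all elements reachable from `start` through steps of
    distance below eps, i.e. a density connectivity chain."""
    chain, frontier = {start}, [start]
    while frontier:
        i = frontier.pop()
        for j in np.flatnonzero(dist[i] < eps):
            if int(j) not in chain:
                chain.add(int(j))
                frontier.append(int(j))
    return sorted(chain)

The information loss of a candidate merge would then be accumulated along such chains, using the pairwise cost shown for aIB above, rather than for an isolated pair.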

Relevance:

100.00%

Abstract:

Understanding the guiding principles of sensory coding strategies is a main goal in computational neuroscience. Among others, the principles of predictive coding and slowness appear to capture aspects of sensory processing. Predictive coding postulates that sensory systems are adapted to the structure of their input signals such that information about future inputs is encoded. Slow feature analysis (SFA) is a method for extracting slowly varying components from quickly varying input signals, thereby learning temporally invariant features. Here, we use the information bottleneck method to state an information-theoretic objective function for temporally local predictive coding. We then show that the linear case of SFA can be interpreted as a variant of predictive coding that maximizes the mutual information between the current output of the system and the input signal in the next time step. This demonstrates that the slowness principle and predictive coding are intimately related.
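In IB notation, the trade-off can be summarized as follows (our rendering; $\lambda > 0$ weights compression):

$$\max_{p(y_t \mid x_t)} \; I(Y_t; X_{t+1}) - \lambda\, I(X_t; Y_t)$$

so the representation $Y_t$ of the current input $X_t$ is kept as compressed as possible while retaining information about the next input $X_{t+1}$. The paper's linear-SFA result concerns the first term: the slowest linear features are those maximizing $I(Y_t; X_{t+1})$.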

Relevance:

100.00%

Abstract:

An online transaction often retrieves a large amount of information before making decisions. Currently, parallel methods for retrieving such information provide only performance similar to serial methods. In this paper we first analyze the factors that affect the performance of existing methods, i.e., HQR and EHQR, and show that several of these factors are not considered by those methods. Motivated by this, we propose a new dispatch scheme called AEHQR, which takes the features of parallel dispatching into account. In addition, we provide cost models that determine the optimal performance achievable by any parallel dispatching method. Through experimental comparison, we show that AEHQR significantly outperforms HQR and EHQR under all conditions.
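The cost models themselves are not given in the abstract; as a generic illustration of the kind of bound such models provide, the completion time of any parallel dispatch is limited by the busiest server, so no scheme can beat a perfectly balanced assignment. The names and cost terms below are our assumptions:

def dispatch_makespan(request_costs, assignment, n_servers):
    """Completion time of a parallel dispatch: the busiest server's load."""
    loads = [0.0] * n_servers
    for cost, server in zip(request_costs, assignment):
        loads[server] += cost
    return max(loads)

def optimal_lower_bound(request_costs, n_servers):
    """No dispatch finishes before the heavier of: the largest single
    request, or the total work spread perfectly across all servers."""
    return max(max(request_costs), sum(request_costs) / n_servers)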

Relevance:

100.00%

Abstract:

Satellite image classification involves designing and developing efficient image classifiers. With satellite image data and image analysis methods multiplying rapidly, selecting the right mix of data sources and data analysis approaches has become critical to the generation of quality land-use maps. In this study, a new postprocessing information fusion algorithm for the extraction and representation of land-use information based on high-resolution satellite imagery is presented. This approach can produce land-use maps with sharp interregional boundaries and homogeneous regions. The proposed approach consists of five steps. First, a GIS layer (ATKIS data) was used to generate two coarse homogeneous regions, i.e. urban and rural areas. Second, a thematic (class) map was generated using a hybrid spectral classifier combining the Gaussian Maximum Likelihood algorithm (GML) and the ISODATA classifier. Third, a probabilistic relaxation algorithm was applied to the thematic map, resulting in a smoothed thematic map. Fourth, edge detection and edge thinning techniques were used to generate a contour map with pixel-width interclass boundaries. Fifth, the contour map was superimposed on the thematic map using a region-growing algorithm with the contour map and the smoothed thematic map as two constraints. To operationalize the proposed method, a software package was developed in the C programming language. This package comprises the GML algorithm, a probabilistic relaxation algorithm, the TBL edge detector, an edge thresholding algorithm, a fast parallel thinning algorithm, and a region-growing information fusion algorithm. The county of Landau in the state of Rheinland-Pfalz, Germany, was selected as the test site. High-resolution IRS-1C imagery was used as the principal input data.
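As an illustration of the spectral classification step, the Gaussian Maximum Likelihood rule assigns each pixel spectrum to the class with the highest Gaussian log-likelihood plus log-prior; a sketch with assumed inputs (per-class means, covariances, and priors):

import numpy as np

def gml_classify(x, means, covs, priors):
    """Gaussian Maximum Likelihood: return the class k maximizing
    ln p(x | k) + ln P(k); the class-independent constant is dropped."""
    scores = []
    for mu, cov, prior in zip(means, covs, priors):
        diff = x - mu
        score = (-0.5 * diff @ np.linalg.inv(cov) @ diff
                 - 0.5 * np.linalg.slogdet(cov)[1] + np.log(prior))
        scores.append(score)
    return int(np.argmax(scores))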

Relevance:

100.00%

Abstract:

In this paper, we propose a novel filter for feature selection. The filter relies on estimating the mutual information between features and classes. We bypass the estimation of the probability density function with the aid of the entropic-graph approximation of the Rényi entropy and a subsequent approximation of the Shannon entropy. The complexity of this bypass depends not on the number of dimensions but on the number of patterns/samples, and thus the curse of dimensionality is circumvented. We show that it is then possible to outperform a greedy algorithm based on the maximal relevance and minimal redundancy criterion. We successfully test our method in the contexts of both image classification and microarray data classification.
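A sketch of the entropic-graph idea, following the standard minimal-spanning-tree estimator of Rényi entropy; the additive bias constant is omitted and the extrapolation toward the Shannon entropy is only indicated, so treat this as schematic rather than the paper's exact estimator:

import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial import distance_matrix

def renyi_entropy_mst(points, gamma=1.0):
    """MST estimate of the Renyi entropy of order alpha = (d - gamma) / d:
    H_alpha ~ (d / gamma) * (ln L_gamma - alpha * ln n) + const, where
    L_gamma is the total gamma-weighted MST edge length. No density
    estimate is needed, so the cost grows with samples, not dimensions."""
    n, d = points.shape
    alpha = (d - gamma) / d
    weights = distance_matrix(points, points) ** gamma
    length = minimum_spanning_tree(weights).sum()
    return (d / gamma) * (np.log(length) - alpha * np.log(n))

For high-dimensional data, gamma = 1 already puts alpha close to 1, which is how the Shannon entropy (and hence the mutual information) is approximated.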

Relevance:

90.00%

Abstract:

Introduction: This paper reports on university students' experiences of learning information literacy.

Method: Phenomenography was selected as the research approach as it describes the experience from the perspective of the study participants, which in this case is a mixture of undergraduate and postgraduate students studying education at an Australian university. Semi-structured, one-on-one interviews were conducted with fifteen students.

Analysis: The interview transcripts were iteratively reviewed for similarities and differences in students' experiences of learning information literacy. Categories were constructed from an analysis of the distinct features of the experiences that students reported. The categories were grouped into a hierarchical structure that represents students' increasingly sophisticated experiences of learning information literacy.

Results: The study reveals that students experience learning information literacy in six ways: learning to find information; learning a process to use information; learning to use information to create a product; learning to use information to build a personal knowledge base; learning to use information to advance disciplinary knowledge; and learning to use information to grow as a person and to contribute to others.

Conclusions: Understanding the complexity of the concept of information literacy, and the collective and diverse range of ways students experience learning information literacy, enables academics and librarians to draw on the range of experiences reported by students to design academic curricula and information literacy education that targets more powerful ways of learning to find and use information.

Relevance:

90.00%

Abstract:

This paper outlines a novel information sharing method using Binary Decision Diagrams (BDDs). It is inspired by the work of Al-Shaer and Hamed, who applied BDDs to the modelling of network firewalls. Here, BDDs are applied to an information sharing policy system that optimizes the search for redundancy, shadowing, generalisation and correlation within information sharing rules.
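The anomaly definitions themselves are easy to state over rule match sets; the toy sketch below uses explicit Python sets purely for illustration, whereas a BDD-based system would represent each rule's predicate as a boolean function so the same subset checks become BDD implication tests that scale to large header spaces:

def shadowed(rule, earlier_rules):
    """A rule is shadowed when every packet it matches is already
    matched by earlier rules carrying a different action."""
    covered = set()
    for r in earlier_rules:
        if r["action"] != rule["action"]:
            covered |= r["match"]
    return rule["match"] <= covered

def redundant(rule, earlier_rules):
    """A rule is redundant when earlier rules with the same action
    already match everything it matches."""
    covered = set()
    for r in earlier_rules:
        if r["action"] == rule["action"]:
            covered |= r["match"]
    return rule["match"] <= covered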

Relevance:

90.00%

Abstract:

We use an information-theoretic method developed by Neifeld and Lee [J. Opt. Soc. Am. A 25, C31 (2008)] to analyze the performance of a slow-light system. Slow-light is realized in this system via stimulated Brillouin scattering in a 2 km-long, room-temperature, highly nonlinear fiber pumped by a laser whose spectrum is tailored and broadened to 5 GHz. We compute the information throughput (IT), which quantifies the fraction of information transferred from the source to the receiver and the information delay (ID), which quantifies the delay of a data stream at which the information transfer is largest, for a range of experimental parameters. We also measure the eye-opening (EO) and signal-to-noise ratio (SNR) of the transmitted data stream and find that they scale in a similar fashion to the information-theoretic method. Our experimental findings are compared to a model of the slow-light system that accounts for all pertinent noise sources in the system as well as data-pulse distortion due to the filtering effect of the SBS process. The agreement between our observations and the predictions of our model is very good. Furthermore, we compare measurements of the IT for an optimal flattop gain profile and for a Gaussian-shaped gain profile. For a given pump-beam power, we find that the optimal profile gives a 36% larger ID and somewhat higher IT compared to the Gaussian profile. Specifically, the optimal (Gaussian) profile produces a fractional slow-light ID of 0.94 (0.69) and an IT of 0.86 (0.86) at a pump-beam power of 450 mW and a data rate of 2.5 Gbps. Thus, the optimal profile better utilizes the available pump-beam power, which is often a valuable resource in a system design.
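Read loosely, the two metrics can be sketched as mutual information between transmitted and received bit streams, normalized by the source entropy and swept over candidate delays; this is our simplified rendering of the Neifeld-Lee definitions, not their exact estimator:

import numpy as np

def mi_bits(sent, received):
    """Empirical mutual information, in bits, of two binary sequences."""
    joint = np.zeros((2, 2))
    for s, r in zip(sent, received):
        joint[s, r] += 1
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float((joint[mask] * np.log2(joint[mask] / (px @ py)[mask])).sum())

def throughput_and_delay(sent, received, max_delay):
    """IT: largest fraction of source information recovered at the
    receiver; ID: the delay (in bit slots) at which that transfer peaks."""
    h_source = mi_bits(sent, sent)            # source entropy H(S)
    best_mi, best_d = 0.0, 0
    for d in range(max_delay + 1):
        mi = mi_bits(sent[:len(sent) - d], received[d:])
        if mi > best_mi:
            best_mi, best_d = mi, d
    return best_mi / h_source, best_d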

Relevance:

90.00%

Abstract:

We present an information-theoretic method to measure the structural information content of networks and apply it to chemical graphs. As a result, we find that our entropy measure is more general than classical information indices known in mathematical and computational chemistry. Further, we demonstrate that our measure reflects the essence of molecular branching meaningfully by determining the structural information content of some chemical graphs numerically.
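Classical information indices of the kind referenced here partition the vertices into equivalence classes and take the Shannon entropy of the class-size distribution; a sketch using degree classes (the paper's own measure is more general than this):

import math
from collections import Counter

def degree_entropy(adjacency):
    """Shannon entropy of the vertex partition induced by degree,
    a classical structural information index for a graph."""
    degrees = [sum(row) for row in adjacency]
    n = len(degrees)
    class_sizes = Counter(degrees).values()
    return -sum((c / n) * math.log2(c / n) for c in class_sizes)

For the path graph on four vertices, the degree classes of sizes {2, 2} give an entropy of 1 bit.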

Relevance:

90.00%

Abstract:

The time-of-detection method for aural avian point counts is a new method of estimating abundance that allows for an uncertain probability of detection. The method has been specifically designed to allow for variation in the singing rates of birds. It involves dividing the time interval of the point count into several subintervals and recording, for each bird, the detection history of the subintervals in which it sings. The method can be viewed as generating data equivalent to closed capture-recapture information. It differs from the distance and multiple-observer methods in that not all the birds are required to sing during the point count. As this method is new and there is some concern as to how well individual birds can be followed, we carried out a field test of the method using simulated known populations of singing birds, using a laptop computer to send signals to audio stations distributed around a point. The system mimics actual aural avian point counts, but also allows us to know the size and spatial distribution of the populations we are sampling. Fifty 8-min point counts (broken into four 2-min intervals) using eight species of birds were simulated. The singing rate of an individual bird of a species was simulated following a Markovian process (singing bouts followed by periods of silence), which we felt was more realistic than a truly random process. The main emphasis of our paper is to compare results from species singing at (high and low) homogeneous rates per interval with those singing at (high and low) heterogeneous rates. Population size was estimated accurately for the species simulated with a high homogeneous probability of singing. Populations of simulated species with lower but homogeneous singing probabilities were somewhat underestimated. Populations of species simulated with heterogeneous singing probabilities were substantially underestimated. Underestimation was caused both by the very low detection probabilities of all distant individuals and by individuals with low singing rates also having very low detection probabilities.
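A compact sketch of the data-generating side: each bird's singing is a two-state Markov chain stepped once per second and collapsed to a four-interval detection history (1 = sang at least once in that 2-min subinterval). The transition probabilities are illustrative, not the values used in the simulations:

import random

def detection_history(n_intervals=4, seconds_per_interval=120,
                      p_start=0.02, p_stop=0.05):
    """Simulate one bird's Markovian singing (bouts followed by silence)
    and record, per subinterval, whether it sang at all."""
    singing, history = False, []
    for _ in range(n_intervals):
        detected = False
        for _ in range(seconds_per_interval):
            if singing:
                singing = random.random() > p_stop
            else:
                singing = random.random() < p_start
            detected = detected or singing
        history.append(int(detected))
    return history

Histories of this form are then treated as closed capture-recapture data; birds with low singing rates tend to produce all-zero histories and are never "captured", which is the mechanism behind the underestimation reported above.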