13 resultados para mutual information

em Indian Institute of Science - Bangalore - Índia


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Outlier detection in high dimensional categorical data has been a problem of much interest due to the extensive use of qualitative features for describing the data across various application areas. Though there exist various established methods for dealing with the dimensionality aspect through feature selection on numerical data, the categorical domain is actively being explored. As outlier detection is generally considered as an unsupervised learning problem due to lack of knowledge about the nature of various types of outliers, the related feature selection task also needs to be handled in a similar manner. This motivates the need to develop an unsupervised feature selection algorithm for efficient detection of outliers in categorical data. Addressing this aspect, we propose a novel feature selection algorithm based on the mutual information measure and the entropy computation. The redundancy among the features is characterized using the mutual information measure for identifying a suitable feature subset with less redundancy. The performance of the proposed algorithm in comparison with the information gain based feature selection shows its effectiveness for outlier detection. The efficacy of the proposed algorithm is demonstrated on various high-dimensional benchmark data sets employing two existing outlier detection methods.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The significance of treating rainfall as a chaotic system instead of a stochastic system for a better understanding of the underlying dynamics has been taken up by various studies recently. However, an important limitation of all these approaches is the dependence on a single method for identifying the chaotic nature and the parameters involved. Many of these approaches aim at only analyzing the chaotic nature and not its prediction. In the present study, an attempt is made to identify chaos using various techniques and prediction is also done by generating ensembles in order to quantify the uncertainty involved. Daily rainfall data of three regions with contrasting characteristics (mainly in the spatial area covered), Malaprabha, Mahanadi and All-India for the period 1955-2000 are used for the study. Auto-correlation and mutual information methods are used to determine the delay time for the phase space reconstruction. Optimum embedding dimension is determined using correlation dimension, false nearest neighbour algorithm and also nonlinear prediction methods. The low embedding dimensions obtained from these methods indicate the existence of low dimensional chaos in the three rainfall series. Correlation dimension method is done on th phase randomized and first derivative of the data series to check whether the saturation of the dimension is due to the inherent linear correlation structure or due to low dimensional dynamics. Positive Lyapunov exponents obtained prove the exponential divergence of the trajectories and hence the unpredictability. Surrogate data test is also done to further confirm the nonlinear structure of the rainfall series. A range of plausible parameters is used for generating an ensemble of predictions of rainfall for each year separately for the period 1996-2000 using the data till the preceding year. For analyzing the sensitiveness to initial conditions, predictions are done from two different months in a year viz., from the beginning of January and June. The reasonably good predictions obtained indicate the efficiency of the nonlinear prediction method for predicting the rainfall series. Also, the rank probability skill score and the rank histograms show that the ensembles generated are reliable with a good spread and skill. A comparison of results of the three regions indicates that although they are chaotic in nature, the spatial averaging over a large area can increase the dimension and improve the predictability, thus destroying the chaotic nature. (C) 2010 Elsevier Ltd. All rights reserved.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Guo and Nixon proposed a feature selection method based on maximizing I(x; Y),the multidimensional mutual information between feature vector x and class variable Y. Because computing I(x; Y) can be difficult in practice, Guo and Nixon proposed an approximation of I(x; Y) as the criterion for feature selection. We show that Guo and Nixon's criterion originates from approximating the joint probability distributions in I(x; Y) by second-order product distributions. We remark on the limitations of the approximation and discuss computationally attractive alternatives to compute I(x; Y).

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Cooperative communication using rateless codes, in which the source transmits an infinite number of parity bits to the destination until the receipt of an acknowledgment, has recently attracted considerable interest. It provides a natural and efficient mechanism for accumulating mutual information from multiple transmitting relays. We develop an analysis of queued cooperative relay systems that combines the communication-theoretic transmission aspects of cooperative communication using rateless codes over Rayleigh fading channels with the queuing-theoretic aspects associated with buffering messages at the relays. Relay cooperation combined with queuing reduces the message transmission times and also helps distribute the traffic load in the network, which improves throughput significantly.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We consider Gaussian multiple-input multiple-output (MIMO) channels with discrete input alphabets. We propose a non-diagonal precoder based on X-Codes in to increase the mutual information. The MIMO channel is transformed into a set of parallel subchannels using Singular Value Decomposition (SVD) and X-codes are then used to pair the subchannels. X-Codes are fully characterized by the pairings and the 2 × 2 real rotation matrices for each pair (parameterized with a single angle). This precoding structure enables to express the total mutual information as a sum of the mutual information of all the pairs. The problem of finding the optimal precoder with the above structure, which maximizes the total mutual information, is equivalent to i) optimizing the rotation angle and the power allocation within each pair and ii) finding the optimal pairing and power allocation among the pairs. It is shown that the mutual information achieved with the proposed pairing scheme is very close to that achieved with the optimal precoder by Cruz et al., and significantly better than mercury/waterfilling strategy by Lozano et al.. Our approach greatly simplifies both the precoder optimization and the detection complexity, making it suitable for practical applications.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We consider Gaussian multiple-input multiple-output (MIMO) channels with discrete input alphabets. We propose a non-diagonal precoder based on the X-Codes in 1] to increase the mutual information. The MIMO channel is transformed into a set of parallel subchannels using singular value decomposition (SVD) and X-Codes are then used to pair the subchannels. X-Codes are fully characterized by the pairings and a 2 x 2 real rotation matrix for each pair (parameterized with a single angle). This precoding structure enables us to express the total mutual information as a sum of the mutual information of all the pairs. The problem of finding the optimal precoder with the above structure, which maximizes the total mutual information, is solved by: i) optimizing the rotation angle and the power allocation within each pair and ii) finding the optimal pairing and power allocation among the pairs. It is shown that the mutual information achieved with the proposed pairing scheme is very close to that achieved with the optimal precoder by Cruz et al., and is significantly better than Mercury/waterfilling strategy by Lozano et al. Our approach greatly simplifies both the precoder optimization and the detection complexity, making it suitable for practical applications.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

For an n(t) transmit, n(r) receive antenna system (n(t) x n(r) system), a full-rate space time block code (STBC) transmits at least n(min) = min(n(t), n(r))complex symbols per channel use. The well-known Golden code is an example of a full-rate, full-diversity STBC for two transmit antennas. Its ML-decoding complexity is of the order of M(2.5) for square M-QAM. The Silver code for two transmit antennas has all the desirable properties of the Golden code except its coding gain, but offers lower ML-decoding complexity of the order of M(2). Importantly, the slight loss in coding gain is negligible compared to the advantage it offers in terms of lowering the ML-decoding complexity. For higher number of transmit antennas, the best known codes are the Perfect codes, which are full-rate, full-diversity, information lossless codes (for n(r) >= n(t)) but have a high ML-decoding complexity of the order of M(ntnmin) (for n(r) < n(t), the punctured Perfect codes are considered). In this paper, a scheme to obtain full-rate STBCs for 2(a) transmit antennas and any n(r) with reduced ML-decoding complexity of the order of M(nt)(n(min)-3/4)-0.5 is presented. The codes constructed are also information lossless for >= n(t), like the Perfect codes, and allow higher mutual information than the comparable punctured Perfect codes for n(r) < n(t). These codes are referred to as the generalized Silver codes, since they enjoy the same desirable properties as the comparable Perfect codes (except possibly the coding gain) with lower ML-decoding complexity, analogous to the Silver code and the Golden code for two transmit antennas. Simulation results of the symbol error rates for four and eight transmit antennas show that the generalized Silver codes match the punctured Perfect codes in error performance while offering lower ML-decoding complexity.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Motivated by the recent Coherent Space-Time Shift Keying (CSTSK) philosophy, we construct new dispersion matrices for rotationally invariant PSK signaling sets. Given a specific PSK signal constellation, the dispersion matrices of the existing CSTSK scheme were chosen by maximizing the mutual information over randomly generated sets of dispersion matrices. In this contribution we propose a general method for constructing a set of structured dispersion matrices for arbitrary PSK signaling sets using Field Extension (FE) codes and then study the attainable Symbol Error Rate (SER) performance of some example constructions. We demonstrate that the proposed dispersion scheme is capable of outperforming the existing dispersion arrangement at medium to high SNRs.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this paper, we explore fundamental limits on the number of tests required to identify a given number of ``healthy'' items from a large population containing a small number of ``defective'' items, in a nonadaptive group testing framework. Specifically, we derive mutual information-based upper bounds on the number of tests required to identify the required number of healthy items. Our results show that an impressive reduction in the number of tests is achievable compared to the conventional approach of using classical group testing to first identify the defective items and then pick the required number of healthy items from the complement set. For example, to identify L healthy items out of a population of N items containing K defective items, when the tests are reliable, our results show that O(K(L - 1)/(N - K)) measurements are sufficient. In contrast, the conventional approach requires O(K log(N/K)) measurements. We derive our results in a general sparse signal setup, and hence, they are applicable to other sparse signal-based applications such as compressive sensing also.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper considers linear precoding for the constant channel-coefficient K-user MIMO Gaussian interference channel (MIMO GIC) where each transmitter-i (Tx-i) requires the sending of d(i) independent complex symbols per channel use that take values from fixed finite constellations with uniform distribution to receiver-i (Rx-i) for i = 1, 2, ..., K. We define the maximum rate achieved by Tx-i using any linear precoder as the signal-to-noise ratio (SNR) tends to infinity when the interference channel coefficients are zero to be the constellation constrained saturation capacity (CCSC) for Tx-i. We derive a high-SNR approximation for the rate achieved by Tx-i when interference is treated as noise and this rate is given by the mutual information between Tx-i and Rx-i, denoted as I(X) under bar (i); (Y) under bar (i)]. A set of necessary and sufficient conditions on the precoders under which I(X) under bar (i); (Y) under bar (i)] tends to CCSC for Tx-i is derived. Interestingly, the precoders designed for interference alignment (IA) satisfy these necessary and sufficient conditions. Furthermore, we propose gradient-ascentbased algorithms to optimize the sum rate achieved by precoding with finite constellation inputs and treating interference as noise. A simulation study using the proposed algorithms for a three-user MIMO GIC with two antennas at each node with d(i) = 1 for all i and with BPSK and QPSK inputs shows more than 0.1-b/s/Hz gain in the ergodic sum rate over that yielded by precoders obtained from some known IA algorithms at moderate SNRs.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

T-cell responses in humans are initiated by the binding of a peptide antigen to a human leukocyte antigen (HLA) molecule. The peptide-HLA complex then recruits an appropriate T cell, leading to cell-mediated immunity. More than 2000 HLA class-I alleles are known in humans, and they vary only in their peptide-binding grooves. The polymorphism they exhibit enables them to bind a wide range of peptide antigens from diverse sources. HLA molecules and peptides present a complex molecular recognition pattern, as many peptides bind to a given allele and a given peptide can be recognized by many alleles. A powerful grouping scheme that not only provides an insightful classification, but is also capable of dissecting the physicochemical basis of recognition specificity is necessary to address this complexity. We present a hierarchical classification of 2010 class-I alleles by using a systematic divisive clustering method. All-pair distances of alleles were obtained by comparing binding pockets in the structural models. By varying the similarity thresholds, a multilevel classification was obtained, with 7 supergroups, each further subclassifying to yield 72 groups. An independent clustering performed based only on similarities in their epitope pools correlated highly with pocket-based clustering. Physicochemical feature combinations that best explain the basis of clustering are identified. Mutual information calculated for the set of peptide ligands enables identification of binding site residues contributing to peptide specificity. The grouping of HLA molecules achieved here will be useful for rational vaccine design, understanding disease susceptibilities and predicting risk of organ transplants.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper studies a pilot-assisted physical layer data fusion technique known as Distributed Co-Phasing (DCP). In this two-phase scheme, the sensors first estimate the channel to the fusion center (FC) using pilots sent by the latter; and then they simultaneously transmit their common data by pre-rotating them by the estimated channel phase, thereby achieving physical layer data fusion. First, by analyzing the symmetric mutual information of the system, it is shown that the use of higher order constellations (HOC) can improve the throughput of DCP compared to the binary signaling considered heretofore. Using an HOC in the DCP setting requires the estimation of the composite DCP channel at the FC for data decoding. To this end, two blind algorithms are proposed: 1) power method, and 2) modified K-means algorithm. The latter algorithm is shown to be computationally efficient and converges significantly faster than the conventional K-means algorithm. Analytical expressions for the probability of error are derived, and it is found that even at moderate to low SNRs, the modified K-means algorithm achieves a probability of error comparable to that achievable with a perfect channel estimate at the FC, while requiring no pilot symbols to be transmitted from the sensor nodes. Also, the problem of signal corruption due to imperfect DCP is investigated, and constellation shaping to minimize the probability of signal corruption is proposed and analyzed. The analysis is validated, and the promising performance of DCP for energy-efficient physical layer data fusion is illustrated, using Monte Carlo simulations.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A two-stage iterative algorithm for selecting a subset of a training set of samples for use in a condensed nearest neighbor (CNN) decision rule is introduced. The proposed method uses the concept of mutual nearest neighborhood for selecting samples close to the decision line. The efficacy of the algorithm is brought out by means of an example.