279 results for Alphabet.
Abstract:
Let X_1, ..., X_m be a set of m statistically dependent sources over the common alphabet F_q that are linearly independent when considered as functions over the sample space. We consider a distributed function computation setting in which the receiver is interested in the lossless computation of the elements of an s-dimensional subspace W spanned by the elements of the row vector [X_1, ..., X_m]Γ, in which the (m × s) matrix Γ has rank s. A sequence of three increasingly refined approaches is presented, all based on linear encoders. The first approach uses a common matrix to encode all the sources and a Körner-Marton-like receiver to directly compute W. The second improves upon the first by showing that it is often more efficient to compute a carefully chosen superspace U of W. The superspace is identified by showing that the joint distribution of the {X_i} induces a unique decomposition of the set of all linear combinations of the {X_i} into a chain of subspaces identified by a normalized measure of entropy. This subspace chain also suggests a third approach, one that employs nested codes. For any joint distribution of the {X_i} and any W, the sum-rate of the nested code approach is no larger than that under the Slepian-Wolf (SW) approach. Under the SW approach, W is computed by first recovering each of the {X_i}. For a large class of joint distributions and subspaces W, the nested code approach is shown to improve upon SW. Additionally, a class of source distributions and subspaces is identified for which the nested-code approach is sum-rate optimal.
Abstract:
We consider nonparametric or universal sequential hypothesis testing when the distribution under the null hypothesis is fully known but the alternate hypothesis corresponds to some other unknown distribution. These algorithms are primarily motivated by spectrum sensing in cognitive radios and intruder detection in wireless sensor networks. We use easily implementable universal lossless source codes to propose simple algorithms for such a setup. The algorithms are first proposed for discrete alphabets, and their performance and asymptotic properties are studied theoretically. Later they are extended to continuous alphabets. Their performance with two well-known universal source codes, the Lempel-Ziv code and the KT-estimator with an arithmetic encoder, is compared. These algorithms are also compared with tests using various other nonparametric estimators. Finally, a decentralized version utilizing spatial diversity is also proposed and analysed.
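The abstract does not spell out the test statistic, so the following is only a minimal sketch of how a universal code could drive a sequential test, under stated assumptions: a binary alphabet, a known null probability `p0_one`, the Krichevsky-Trofimov (KT) sequential probability as the universal code, and a hypothetical `threshold` and per-sample `drift`. The statistic accumulates the difference between the code length under the null and the KT code length, and the test stops as soon as it crosses the threshold. The function name and parameters are illustrative, not taken from the paper.

```python
from math import log2
from typing import Iterable, Optional, Tuple


def kt_sequential_test(samples: Iterable[int], p0_one: float,
                       threshold: float, drift: float = 0.0
                       ) -> Tuple[Optional[int], float]:
    """Sequential test on binary samples (each 0 or 1).

    Returns (stopping_time, statistic). stopping_time is None if the
    statistic never reached `threshold` (null hypothesis not rejected).
    `threshold` and `drift` are hypothetical tuning parameters.
    """
    counts = [0, 0]   # occurrences of 0 and 1 seen so far
    stat = 0.0
    for n, x in enumerate(samples, start=1):
        # KT predictive probability of the next symbol given past counts.
        p_kt = (counts[x] + 0.5) / (counts[0] + counts[1] + 1.0)
        p_null = p0_one if x == 1 else 1.0 - p0_one
        # Bits saved by the universal code relative to the null model.
        stat += log2(p_kt) - log2(p_null) - drift
        counts[x] += 1
        if stat >= threshold:
            return n, stat
    return None, stat
```

For example, `kt_sequential_test([1] * 200, 0.5, 10.0)` stops early because a constant sequence is far more compressible than a fair-coin model predicts, whereas a strictly alternating sequence never crosses the threshold.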
Abstract:
Protein structure alignment is a crucial step in protein structure-function analysis. Despite the advances in protein structure alignment algorithms, some of the local conformationally similar regions are mislabeled as structurally variable regions (SVRs). These regions are not well superimposed because of differences in their spatial orientations. The Database of Structural Alignments (DoSA) addresses this gap in identification of local structural similarities obscured in global protein structural alignments by realigning SVRs using an algorithm based on protein blocks. A set of protein blocks is a structural alphabet that abstracts protein structures into 16 unique local structural motifs. DoSA provides unique information about 159 780 conformationally similar and 56 140 conformationally dissimilar SVRs in 74 705 pairwise structural alignments of homologous proteins. The information provided on conformationally similar and dissimilar SVRs can be helpful to model loop regions. It is also conceivable that conformationally similar SVRs with conserved residues could potentially contribute toward functional integrity of homologues, and hence identifying such SVRs could be helpful in understanding the structural basis of protein function.
Abstract:
An n-length block code C is said to be r-query locally correctable if, for any codeword x ∈ C, one can probabilistically recover any one of the n coordinates of the codeword x by querying at most r coordinates of a possibly corrupted version of x. It is known that linear codes whose duals contain 2-designs are locally correctable. In this article, we consider linear codes whose duals contain t-designs for larger t. It is shown here that for such codes, for a given number of queries r, under linear decoding, one can, in general, handle a larger number of corrupted symbols. We exhibit, to our knowledge for the first time, a finite-length code whose dual contains 4-designs and which can tolerate a fraction of up to 0.567/r corrupted symbols, as against a maximum of 0.5/r in prior constructions. We also present an upper bound showing that 0.567 is the best possible for this code length and query complexity over this symbol alphabet, thereby establishing optimality of this code in this respect. A second result in the article is a finite-length bound that relates the number of queries r and the fraction of errors that can be tolerated, for a locally correctable code that employs a randomized algorithm in which each instance of the algorithm involves t-error correction.
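To make the definition of r-query local correction concrete, here is a small sketch using the classical binary Hadamard code with r = 2. This is a textbook example chosen for illustration, not the design-based code constructed in the article; the function names and the `trials` majority-vote parameter are assumptions. Each trial reads only two coordinates of the possibly corrupted word and relies on linearity: c[a XOR b] + c[b] = c[a] over F_2.

```python
import random
from typing import List


def hadamard_encode(message: int, k: int) -> List[int]:
    """Hadamard codeword of a k-bit message: entry a is <message, a> mod 2."""
    return [bin(message & a).count("1") % 2 for a in range(1 << k)]


def locally_correct(word: List[int], index: int,
                    rng: random.Random, trials: int = 1) -> int:
    """Recover coordinate `index` of the underlying codeword.

    Each trial queries two coordinates of `word`; a majority vote over
    `trials` independent trials reduces the error probability.
    """
    n = len(word)
    votes = 0
    for _ in range(trials):
        b = rng.randrange(n)
        votes += word[index ^ b] ^ word[b]
    return int(2 * votes > trials)
```

If a fraction δ of the coordinates is corrupted, a single trial touches a corrupted coordinate with probability at most 2δ, which is why a small enough fraction of errors can be tolerated with two queries.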
Abstract:
Conformational changes in proteins are extremely important for their biochemical functions. The correlation between inherent conformational variations in a protein and conformational differences in its homologues of known structure is still unclear. In this study, we have used a structural alphabet called Protein Blocks (PBs). PBs abstract protein 3-D structures into 1-D strings over an alphabet of 16 letters (a-p), based on the dihedral angles of overlapping pentapeptides. We have analyzed the variations in local conformations, in terms of PBs, represented in the ensembles of 801 protein structures determined using NMR spectroscopy. In the analysis of concatenated data over all the residues in all the NMR ensembles, we observe that the overall nature of inherent local structural variations in NMR ensembles is similar to the nature of local structural differences in homologous proteins, with a high correlation coefficient of 0.94. High correlation at the alignment positions corresponding to helical and beta-sheet regions is to be expected. However, the correlation coefficient obtained by considering only the loop regions is also quite high (0.91). Surprisingly, segregated position-wise analysis shows that this high correlation does not hold for loop regions at the structurally equivalent positions in NMR ensembles and their homologues of known structure. This suggests that the general nature of local structural changes is unique; however, most of the local structural variations in loop regions of NMR ensembles do not correlate with the local structural differences at structurally equivalent positions in homologues.
Abstract:
Regenerating codes and codes with locality are two recently proposed coding schemes which, in addition to ensuring data collection and reliability, also enable efficient node repair. When repairing a failed node, regenerating codes seek to minimize the amount of data downloaded for node repair, while codes with locality attempt to minimize the number of helper nodes accessed. This paper presents results in two directions. In one, this paper extends the notion of codes with locality so as to permit local recovery of an erased code symbol even in the presence of multiple erasures, by employing local codes having minimum distance >2. An upper bound on the minimum distance of such codes is presented, and codes that are optimal with respect to this bound are constructed. The second direction seeks to build codes that combine the advantages of both codes with locality and regenerating codes. These codes, termed here codes with local regeneration, are codes with locality over a vector alphabet, in which the local codes themselves are regenerating codes. We derive an upper bound on the minimum distance of vector-alphabet codes with locality for the case when their constituent local codes have a certain uniform rank accumulation property. This property is possessed by both minimum storage regeneration (MSR) and minimum bandwidth regeneration (MBR) codes. We provide several constructions of codes with local regeneration which achieve this bound, where the local codes are either MSR or MBR codes. Also included in this paper are an upper bound on the minimum distance of a general vector code with locality and a performance comparison of various code constructions of fixed block length and minimum distance.
Abstract:
Spatial modulation (SM) is attractive for multiantenna wireless communications. SM uses multiple transmit antenna elements but only one transmit radio frequency (RF) chain. In SM, in addition to the information bits conveyed through conventional modulation symbols (e.g., QAM), the index of the active transmit antenna also conveys information bits. In this paper, we establish that SM has a significant signal-to-noise ratio (SNR) advantage over conventional modulation in large-scale multiuser multiple-input multiple-output (MIMO) systems. Our new contribution in this paper addresses the key issue of large-dimension signal processing at the base station (BS) receiver (e.g., signal detection) in large-scale multiuser SM-MIMO systems, where each user is equipped with multiple transmit antennas (e.g., 2 or 4 antennas) but only one transmit RF chain, and the BS is equipped with tens to hundreds of (e.g., 128) receive antennas. Specifically, we propose two novel algorithms for detection of large-scale SM-MIMO signals at the BS; one is based on message passing and the other is based on local search. The proposed algorithms achieve very good performance and scale well. For the same spectral efficiency, multiuser SM-MIMO outperforms conventional multiuser MIMO (recently referred to as massive MIMO) by several dB. The SNR advantage of SM-MIMO over massive MIMO can be attributed to two factors: (i) because of the spatial index bits, SM-MIMO can use a lower-order QAM alphabet than massive MIMO to achieve the same spectral efficiency, and (ii) for the same spectral efficiency and QAM size, massive MIMO needs more spatial streams per user, which leads to increased spatial interference.
Abstract:
Generalized spatial modulation (GSM) uses n_t transmit antenna elements but fewer transmit radio frequency (RF) chains, n_rf. Spatial modulation (SM) and spatial multiplexing are special cases of GSM with n_rf = 1 and n_rf = n_t, respectively. In GSM, in addition to conveying information bits through n_rf conventional modulation symbols (for example, QAM), the indices of the n_rf active transmit antennas also convey information bits. In this paper, we investigate GSM for large-scale multiuser MIMO communications on the uplink. Our contributions in this paper include: 1) an average bit error probability (ABEP) analysis for maximum-likelihood detection in multiuser GSM-MIMO on the uplink, where we derive an upper bound on the ABEP, and 2) low-complexity algorithms for GSM-MIMO signal detection and channel estimation at the base station receiver based on message passing. The analytical upper bounds on the ABEP are found to be tight at moderate to high signal-to-noise ratios (SNR). The proposed receiver algorithms are found to scale very well in complexity while achieving near-optimal performance in large dimensions. Simulation results show that, for the same spectral efficiency, multiuser GSM-MIMO can outperform multiuser SM-MIMO as well as conventional multiuser MIMO by about 2 to 9 dB at a bit error rate of 10^-3. Such SNR gains in GSM-MIMO compared to SM-MIMO and conventional MIMO can be attributed to the fact that, because of a larger number of spatial index bits, GSM-MIMO can use a lower-order QAM alphabet, which is more power efficient.
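The trade-off between spatial index bits and QAM order can be checked with a little arithmetic. The sketch below assumes the commonly used bit count for GSM, which the abstract itself does not state: floor(log2 C(n_t, n_rf)) bits from the choice of active antennas plus n_rf·log2(M) bits from the M-QAM symbols. The function name is illustrative.

```python
from math import comb


def gsm_bits_per_channel_use(n_t: int, n_rf: int, qam_order: int) -> int:
    """Bits per channel use for one user, under the assumed GSM bit count."""
    if not 1 <= n_rf <= n_t:
        raise ValueError("need 1 <= n_rf <= n_t")
    if qam_order < 2 or qam_order & (qam_order - 1):
        raise ValueError("qam_order must be a power of two")
    # floor(log2(x)) for a positive integer x equals x.bit_length() - 1.
    index_bits = comb(n_t, n_rf).bit_length() - 1
    symbol_bits = n_rf * (qam_order.bit_length() - 1)
    return index_bits + symbol_bits


if __name__ == "__main__":
    print(gsm_bits_per_channel_use(4, 1, 16))  # SM: 2 index bits + 4 symbol bits
    print(gsm_bits_per_channel_use(4, 2, 4))   # GSM: 2 index bits + 2 * 2 symbol bits
```

Under this assumption, SM with 4 antennas needs 16-QAM to reach 6 bits per channel use, while GSM with n_rf = 2 reaches the same 6 bits with 4-QAM, which illustrates the lower-order alphabet the abstract refers to.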
Abstract:
We consider carrier frequency offset (CFO) estimation in the context of multiple-input multiple-output (MIMO) orthogonal frequency-division multiplexing (OFDM) systems over noisy frequency-selective wireless channels in both single- and multiuser scenarios. We conceive a new approach to parameter estimation by discretizing the continuous-valued CFO parameter into a discrete set of bins and then invoking detection theory, analogous to the minimum-bit-error-ratio optimization framework for detecting the finite-alphabet received signal. Using this radical approach, we propose a novel CFO estimation method and study its performance using both analytical results and Monte Carlo simulations. We obtain expressions for the variance of the CFO estimation error and the resultant BER degradation in the single-user scenario. Our simulations demonstrate that the overall BER performance of a MIMO-OFDM system using the proposed method is substantially improved for all the modulation schemes considered, albeit at increased complexity.
Abstract:
We consider systems of equations of the form X_i = ⋃_{a∈A} a·P_{i,a} ∪ δ_i, where A is the underlying alphabet, the X_i are variables, the P_{i,a} are boolean functions in the variables X_i, and each δ_i is either the empty word or the empty set. The symbols · and ∪ denote concatenation and union of languages over A. We show that any such system has a unique solution which, moreover, is regular. These equations correspond to a type of automaton, called a boolean automaton, which is a generalization of a nondeterministic automaton. The equations are then used to determine the language accepted by a sequential network; they are obtainable directly from the network.
Abstract:
Previous research found that feeling-of-knowing (FOK) and feeling-of-not-knowing (FOnK) are two different cognitive processes. Processing depth had a clear positive effect on FOK judgments but little effect on FOnK judgments; it may even have decreased the accuracy of FOnK judgments. Building on that research, this experiment examined how processing depth and different kinds of memory materials affect FOK and FOnK judgments. The first aim was to determine whether processing depth affects FOK and FOnK judgments differently. The second aim was to determine whether two kinds of memory materials, paired Chinese words and paired Chinese phonetic alphabet items, lead to differences in the grade and accuracy of FOK judgments, and whether the two kinds of materials affect FOK and FOnK judgments differently. The third aim was to test for an interaction between processing depth and the kind of memory material. The experiment used paired Chinese words and paired Chinese phonetic alphabet items as materials, with processing depth at the encoding stage and the kind of memory material as the independent variables. The dependent variables were the validity of memory, the grade of FOK judgments, the accuracy of FOK judgments, and the accuracy of FOnK judgments. The experiment adopted the standard "RJR" (recall-judgment-recognition) paradigm for FOK judgments introduced by Hart. The results showed that the validity of memory, the grade of FOK judgments, and the accuracy of FOK judgments were higher under deep processing at the encoding stage than under superficial processing, whereas processing depth had little effect on the accuracy of FOnK judgments. FOK and FOnK judgments are thus two different cognitive processes.
Different kinds of memory materials led to clear differences in the validity of memory, the grade of FOK judgments, and the accuracy of FOK judgments, and had little effect on the accuracy of FOnK judgments. Processing depth and the kind of memory material interacted in their effects on FOK judgments. With the accuracy of recall, the percentage of "feeling of knowing" responses, the percentage of "feeling of not knowing" responses, and the grade of FOK judgments as the dependent variables, the kind of memory material had little effect under superficial processing at the encoding stage, but under deep processing at the encoding stage, scores for Chinese characters were higher than those for the Chinese phonetic alphabet.
Abstract:
The need to cluster unknown data in order to better understand its relationship to known data is prevalent throughout science. Besides providing a better understanding of the data itself or of a new, unknown object, cluster analysis can help with data processing, data standardization, and outlier detection. Most clustering algorithms are based on known features or expectations, such as the popular partition-based, hierarchical, density-based, grid-based, and model-based algorithms. The choice of algorithm depends on many factors, including the type of data and the reason for clustering, but nearly all rely on some known properties of the data being analyzed. Recently, Li et al. proposed a new universal similarity metric that needs no prior knowledge about the objects. Their similarity metric is based on the Kolmogorov complexity of objects, that is, the length of an object's minimal description. While the Kolmogorov complexity of an object is not computable, in "Clustering by Compression," Cilibrasi and Vitanyi use common compression algorithms to approximate the universal similarity metric and cluster objects with high success. Unfortunately, clustering using compression does not trivially extend to higher dimensions. Here we outline a method to adapt their procedure to images. We test these techniques on images of letters of the alphabet.
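The compression-based approximation mentioned above is usually written as the normalized compression distance, NCD(x, y) = (C(xy) − min(C(x), C(y))) / max(C(x), C(y)), where C(·) is the compressed length. A minimal sketch on one-dimensional byte strings follows, assuming zlib as the compressor; the image adaptation outlined in the work is not reproduced here, and the function names are illustrative.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Compressed length in bytes, used as a stand-in for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    a = b"the quick brown fox jumps over the lazy dog " * 20
    b = b"the quick brown fox jumps over the lazy cat " * 20
    c = bytes(range(256)) * 4
    # Similar inputs should score lower than dissimilar ones.
    print(ncd(a, b), ncd(a, c))
```

A pairwise NCD matrix computed this way can be passed to any standard hierarchical clustering routine; the choice of compressor affects the quality of the approximation.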
Abstract:
The problem of discovering frequent poly-regions (i.e. regions of high occurrence of a set of items or patterns of a given alphabet) in a sequence is studied, and three efficient approaches are proposed to solve it. The first one is entropy-based and applies a recursive segmentation technique that produces a set of candidate segments which may potentially lead to a poly-region. The key idea of the second approach is the use of a set of sliding windows over the sequence. Each sliding window covers a sequence segment and keeps a set of statistics that mainly include the number of occurrences of each item or pattern in that segment. Combining these statistics efficiently yields the complete set of poly-regions in the given sequence. The third approach applies a technique based on the majority vote, achieving linear running time with a minimal number of false negatives. After identifying the poly-regions, the sequence is converted to a sequence of labeled intervals (each one corresponding to a poly-region). An efficient algorithm for mining frequent arrangements of intervals is applied to the converted sequence to discover frequently occurring arrangements of poly-regions in different parts of DNA, including coding regions. The proposed algorithms are tested on various DNA sequences producing results of significant biological meaning.
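The sliding-window idea can be sketched for the simplest case of a single item. The code below is an illustration under assumptions, not the algorithm from the work: it uses one fixed window width, a hypothetical density threshold `min_fraction`, and merges overlapping qualifying windows into regions. The occurrence count is updated incrementally as the window slides, so the scan is linear in the sequence length.

```python
from typing import List, Tuple


def dense_regions(seq: str, item: str, width: int,
                  min_fraction: float) -> List[Tuple[int, int]]:
    """Merged half-open intervals [start, end) covered by windows of the
    given width in which `item` makes up at least `min_fraction` of symbols."""
    if width <= 0 or width > len(seq):
        return []
    regions: List[Tuple[int, int]] = []
    # Occurrences of `item` in the first window.
    count = sum(1 for ch in seq[:width] if ch == item)
    for start in range(len(seq) - width + 1):
        if start > 0:
            # Slide by one: drop the symbol leaving, add the symbol entering.
            count -= int(seq[start - 1] == item)
            count += int(seq[start + width - 1] == item)
        if count / width >= min_fraction:
            end = start + width
            if regions and start <= regions[-1][1]:
                # Overlaps or abuts the previous region: extend it.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions
```

For instance, `dense_regions("CCCCAAAAAACCCC", "A", 4, 1.0)` returns `[(4, 10)]`, the run of six A symbols; the resulting intervals are the kind of labeled segments that a later interval-mining step could consume.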