983 results for Information entropy
Abstract:
We revisit the well-known problem of sorting under partial information: sort a finite set given the outcomes of comparisons between some pairs of elements. The input is a partially ordered set P, and solving the problem amounts to discovering an unknown linear extension of P, using pairwise comparisons. The information-theoretic lower bound on the number of comparisons needed in the worst case is log e(P), the binary logarithm of the number of linear extensions of P. In a breakthrough paper, Jeff Kahn and Jeong Han Kim (STOC 1992) showed that there exists a polynomial-time algorithm for the problem achieving this bound up to a constant factor. Their algorithm invokes the ellipsoid algorithm at each iteration for determining the next comparison, making it impractical. We develop efficient algorithms for sorting under partial information. Like Kahn and Kim, our approach relies on graph entropy. However, our algorithms differ in essential ways from theirs. Rather than resorting to convex programming for computing the entropy, we approximate the entropy, or make sure it is computed only once in a restricted class of graphs, permitting the use of a simpler algorithm. Specifically, we present: an O(n^2) algorithm performing O(log n · log e(P)) comparisons; an O(n^2.5) algorithm performing at most (1+ε) log e(P) + O_ε(n) comparisons; an O(n^2.5) algorithm performing O(log e(P)) comparisons. All our algorithms are simple to implement. © 2010 ACM.
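To make the lower bound log e(P) concrete, the following minimal sketch counts linear extensions by brute force for a tiny poset and takes the binary logarithm. It is an illustration only, not one of the algorithms from the paper; the function name `count_linear_extensions` and the example poset are assumptions introduced here, and the enumeration is exponential in n.

```python
from itertools import permutations
from math import log2

def count_linear_extensions(n, relations):
    """Count orderings of 0..n-1 that respect every pair (a, b), read as a < b.

    Brute force over all n! permutations, so only usable for very small n.
    """
    count = 0
    for perm in permutations(range(n)):
        # position[x] is the rank of element x in this candidate total order
        position = {element: index for index, element in enumerate(perm)}
        if all(position[a] < position[b] for a, b in relations):
            count += 1
    return count

# Hypothetical poset on 4 elements: only 0 < 1 and 2 < 3 are known.
relations = [(0, 1), (2, 3)]
e_p = count_linear_extensions(4, relations)  # 6 linear extensions
lower_bound = log2(e_p)                      # comparisons needed in the worst case
print(e_p, lower_bound)
```

With no known relations the count is n!, which recovers the familiar log(n!) bound for ordinary comparison sorting.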
Abstract:
In attempts to conserve the species diversity of trees in tropical forests, monitoring of diversity in inventories is essential. For effective monitoring it is crucial to be able to make meaningful comparisons between different regions, or comparisons of the diversity of a region at different times. Many species diversity measures have been defined, including the well-known abundance and entropy measures. All such measures share a number of problems in their effective practical use. However, probably the most problematic is that they cannot be used to meaningfully assess changes, since they are only concerned with the number of species or the proportions of the population/sample which they constitute. A natural (though simplistic) model of a species frequency distribution is the multinomial distribution. It is shown that the likelihood analysis of samples from such a distribution is closely related to a number of entropy-type measures of diversity. Hence a comparison of the species distribution on two plots, using the multinomial model and likelihood methods, leads to generalised cross-entropy as the LRT test statistic of the null that the species distributions are the same. Data from 30 contiguous plots in a forest in Sumatra are analysed using these methods. Significance tests between all pairs of plots yield extremely low p-values, indicating strongly that it ought to be "obvious" that the observed species distributions are different on different plots. In terms of how different the plots are, and how these differences vary over the whole study site, a display of the degrees of freedom of the test (equivalent to the number of shared species) seems to be the most revealing indicator, as well as the simplest.
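The link between multinomial likelihood and entropy-type measures can be sketched as follows. The code computes the Shannon entropy of a plot's species counts and the standard likelihood-ratio (G) statistic for the null hypothesis that two plots share one species distribution. This is the textbook form of the test and may differ in detail from the generalised cross-entropy statistic used in the paper; the function names and the example counts are assumptions.

```python
from math import log

def shannon_entropy(counts):
    """Shannon entropy (in nats) of the species proportions in one plot."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)

def g_statistic(counts_a, counts_b):
    """Likelihood-ratio statistic for 'both plots follow the same multinomial'.

    counts_a[i] and counts_b[i] must refer to the same species i.
    """
    total_a, total_b = sum(counts_a), sum(counts_b)
    grand_total = total_a + total_b
    g = 0.0
    for a, b in zip(counts_a, counts_b):
        pooled = a + b
        for observed, plot_total in ((a, total_a), (b, total_b)):
            if observed > 0:
                # expected count under the pooled species proportions
                expected = plot_total * pooled / grand_total
                g += observed * log(observed / expected)
    return 2.0 * g

# Hypothetical counts for three species on two plots.
plot_a = [12, 5, 3]
plot_b = [2, 9, 9]
print(shannon_entropy(plot_a), g_statistic(plot_a, plot_b))
```

Larger values of the statistic indicate stronger evidence that the two species distributions differ.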
Abstract:
This paper presents an analysis of entropy-based molecular descriptors. Specifically, we use real chemical structures, as well as synthetic isomeric structures, and investigate properties of and among descriptors with respect to the used data set by a statistical analysis. Our numerical results provide evidence that synthetic chemical structures are notably different to real chemical structures and, hence, should not be used to investigate molecular descriptors. Instead, an analysis based on real chemical structures is favorable. Further, we find strong hints that molecular descriptors can be partitioned into distinct classes capturing complementary information.
Abstract:
We present an information-theoretic method to measure the structural information content of networks and apply it to chemical graphs. As a result, we find that our entropy measure is more general than classical information indices known in mathematical and computational chemistry. Further, we demonstrate that our measure reflects the essence of molecular branching meaningfully by determining the structural information content of some chemical graphs numerically.
Abstract:
We introduce a novel graph class we call universal hierarchical graphs (UHG), whose topology appears frequently in problems representing, e.g., temporal, spatial or general process structures of systems. For this graph class we show that we can naturally assign two probability distributions, for nodes and for edges, which lead us directly to the definition of the entropy and joint entropy and, hence, mutual information, establishing an information theory for this graph class. Furthermore, we provide some results on the conditions under which these constrained probability distributions maximize the corresponding entropy. Also, we demonstrate that these entropic measures can be computed efficiently, which is a prerequisite for every large-scale practical application, and show some numerical examples. (c) 2007 Elsevier Inc. All rights reserved.
Abstract:
Wavelet entropy assesses the degree of order or disorder in signals and presents this complex information in a simple metric. Relative wavelet entropy assesses the similarity between the spectral distributions of two signals, again in a simple metric. Wavelet entropy is therefore potentially a very attractive tool for waveform analysis. The ability of this method to track the effects of pharmacologic modulation of vascular function on Doppler blood velocity waveforms was assessed. Waveforms were captured from ophthalmic arteries of 10 healthy subjects at baseline, after the administration of glyceryl trinitrate (GTN) and after two doses of N(G)-nitro-L-arginine-methyl ester (L-NAME) to produce vasodilation and vasoconstriction, respectively. Wavelet entropy had a tendency to decrease from baseline in response to GTN, but significantly increased after the administration of L-NAME (mean: 1.60 ± 0.07 after 0.25 mg/kg and 1.72 ± 0.13 after 0.5 mg/kg vs. 1.50 ± 0.10 at baseline, p < 0.05). Relative wavelet entropy had a spectral distribution from increasing doses of L-NAME comparable to baseline, 0.07 ± 0.04 and 0.08 ± 0.03, respectively, whereas GTN had the most dissimilar spectral distribution compared with baseline (0.17 ± 0.08, p = 0.002). Wavelet entropy can detect subtle changes in Doppler blood velocity waveform structure in response to nitric-oxide-mediated changes in arteriolar smooth muscle tone.
Abstract:
Statistical techniques are fundamental in science, and linear regression analysis is perhaps one of the most widely used methodologies. It is well known from the literature that, under certain conditions, linear regression is an extremely powerful statistical tool. Unfortunately, in practice, some of these conditions are rarely satisfied and the regression models become ill-posed, which makes the traditional estimation methods inapplicable. This work presents some contributions to maximum entropy theory in the estimation of ill-posed models, in particular the estimation of linear regression models with small samples affected by collinearity and outliers. The research is developed along three lines, namely the estimation of technical efficiency with state-contingent production frontiers, the estimation of the ridge parameter in ridge regression and, finally, new developments in maximum entropy estimation. In the estimation of technical efficiency with state-contingent production frontiers, the work shows that maximum entropy estimators perform better than the maximum likelihood estimator. This good performance is notable in models with few observations per state and in models with a large number of states, which are commonly affected by collinearity. It is hoped that the use of maximum entropy estimators will contribute to the much-desired increase in empirical work with these production frontiers. In ridge regression the greatest challenge is the estimation of the ridge parameter. Although numerous procedures are available in the literature, none of them outperforms all the others. This work proposes a new estimator of the ridge parameter that combines ridge trace analysis with maximum entropy estimation.
The results obtained in the simulation studies suggest that this new estimator is one of the best procedures available in the literature for estimating the ridge parameter. The Leuven maximum entropy estimator is based on the method of least squares, Shannon entropy and concepts from quantum electrodynamics. This estimator overcomes the main criticism levelled at the generalized maximum entropy estimator, since it dispenses with supports for the parameters and errors of the regression model. This work presents new contributions to maximum entropy theory in the estimation of ill-posed models, based on the Leuven maximum entropy estimator, information theory and robust regression. The estimators developed perform well in linear regression models with small samples affected by collinearity and outliers. Finally, some computer code for maximum entropy estimation is presented, thereby helping to expand the scarce computational resources currently available.
Abstract:
The Montado ecosystem in the Alentejo region, in the south of Portugal, has enormous agro-ecological and economic heterogeneity. Homogeneous sub-units were defined within this heterogeneous ecosystem, but only partial statistical information is available for them on the allocation of soil to agro-forestry activities. The paper proposes to recover the unknown soil allocation in each homogeneous sub-unit by disaggregating a complete data set for the Montado ecosystem area using incomplete information at the sub-unit level. The methodological framework is based on a Generalized Maximum Entropy approach, developed in three steps: the specification of an r-order Markov process, the estimation of aggregate transition probabilities, and the disaggregation of the data to recover the unknown soil allocation in each homogeneous sub-unit. The quality of the results is evaluated using the predicted absolute deviation (PAD) and the "Disaggregation Information Gain" (DIG), and shows very acceptable estimation errors.
Abstract:
This paper presents several combined agricultural data disaggregation models in order to recover the farms' land uses, the livestock numbers and the main crops' production. The proposed approach estimates incomplete information at the disaggregated level through entropy, using an information prior, and generates information that can be used jointly in the estimation of other variables. The models were applied to the region of Algarve, to some rural pilot areas (Salir-Ameixial-Cachopo and Alcoutim) for livestock data, since livestock data for some of the Algarve's inland areas are needed for a European forest fire prevention project, and to the agrarian zones in a more complex framework. The results are promising. They were validated by cross-reference to real data and proved to be valid and reliable. The total error was small and a considerable level of information heterogeneity was recovered. The total error was about 27.9% for the counties' land uses and 21% for the agrarian zones, and for the livestock it was also acceptable. The level of heterogeneity recovered was always higher than 50%, revealing some improvements over previous studies.
Abstract:
Seismic data is difficult to analyze and classical mathematical tools reveal strong limitations in exposing hidden relationships between earthquakes. In this paper, we study earthquake phenomena from the perspective of complex systems. Global seismic data, covering the period from 1962 up to 2011, is analyzed. The events, characterized by their magnitude, geographic location and time of occurrence, are divided into groups, either according to the Flinn-Engdahl (F-E) seismic regions of Earth or using a rectangular grid based on latitude and longitude coordinates. Two methods of analysis are considered and compared in this study. In the first method, the distributions of magnitudes are approximated by Gutenberg-Richter (G-R) distributions and the parameters are used to reveal the relationships among regions. In the second method, the mutual information is calculated and adopted as a measure of similarity between regions. In both cases, using clustering analysis, visualization maps are generated, providing an intuitive and useful representation of the complex relationships that are present among seismic data. Such relationships might not be perceived on classical geographic maps. Therefore, the generated charts are a valid alternative to other visualization tools for understanding the global behavior of earthquakes.
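As a rough illustration of the second method, the sketch below estimates the mutual information between two discrete sequences from their empirical joint distribution. How the paper turns regional event records into comparable sequences is not stated in the abstract, so the inputs here (per-period activity levels already binned into integer codes) and the function name `mutual_information` are assumptions.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete observations."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        # p_xy / (p_x * p_y), written with raw counts to avoid extra divisions
        mi += p_xy * log2(p_xy * n * n / (count_x[x] * count_y[y]))
    return mi

# Hypothetical binned activity levels for two regions over eight periods.
region_a = [0, 1, 1, 2, 0, 2, 1, 0]
region_b = [0, 1, 1, 2, 0, 2, 0, 0]
print(mutual_information(region_a, region_b))
```

A matrix of such pairwise values can then serve as the similarity input to a clustering step, which is the role mutual information plays in the abstract.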
Abstract:
Machine tool chatter is an unfavorable phenomenon during metal cutting, which results in heavy vibration of the cutting tool. With an increase in the depth of cut, the cutting regime changes from chatter-free cutting to one with chatter. In this paper, we propose the use of permutation entropy (PE), a conceptually simple and computationally fast measure, to detect the onset of chatter from the time series of a sound signal recorded with a unidirectional microphone. PE can efficiently distinguish the regular and complex nature of any signal and extract information about the dynamics of the process by indicating a sudden change in its value. In situations where the data sets are huge and there is no time for preprocessing and fine-tuning, PE can effectively detect dynamical changes of the system. This makes PE an ideal choice for online detection of chatter, which is not possible with other conventional nonlinear methods. In the present study, the variation of PE under two cutting conditions is analyzed. Abrupt variation in the value of PE with an increase in the depth of cut indicates the onset of chatter vibrations. The results are verified using frequency spectra of the signals and the nonlinear measure, normalized coarse-grained information rate (NCIR).
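Permutation entropy can be sketched in a few lines: slide a window over the series, record the ordinal pattern of each window, and take the Shannon entropy of the pattern frequencies. The version below normalizes by log2(order!) so the result lies in [0, 1]. The parameter names `order` and `delay` and the test signals are assumptions; the abstract does not state which embedding parameters the authors used.

```python
from collections import Counter
from math import factorial, log2

def permutation_entropy(series, order=3, delay=1):
    """Normalized permutation entropy of a 1-D sequence (0 = regular, 1 = maximally complex)."""
    n_windows = len(series) - (order - 1) * delay
    if n_windows <= 0:
        raise ValueError("series is too short for the chosen order and delay")
    patterns = Counter()
    for start in range(n_windows):
        window = [series[start + k * delay] for k in range(order)]
        # the ordinal pattern is the index order that sorts the window
        pattern = tuple(sorted(range(order), key=lambda k: window[k]))
        patterns[pattern] += 1
    entropy = -sum((c / n_windows) * log2(c / n_windows) for c in patterns.values())
    return entropy / log2(factorial(order))

# A monotone ramp has a single ordinal pattern; an irregular signal has several.
ramp = list(range(20))
irregular = [4, 7, 9, 10, 6, 11, 3, 8, 2, 12, 5, 1]
print(permutation_entropy(ramp), permutation_entropy(irregular))
```

For online monitoring, the same function could be applied to successive blocks of the recorded signal, with an abrupt change in the value flagged as a possible change in dynamics.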
Abstract:
Timely detection of sudden changes in dynamics that adversely affect the performance of systems and the quality of products has great scientific relevance. This work focuses on effective detection of dynamical changes in real-time signals from mechanical as well as biological systems using the fast and robust technique of permutation entropy (PE). The results are used in detecting chatter onset in machine turning and identifying vocal disorders from speech signals. Permutation entropy is a nonlinear complexity measure which can efficiently distinguish the regular and complex nature of any signal and extract information about the change in dynamics of the process by indicating a sudden change in its value. Here we propose the use of PE to detect the dynamical changes in two nonlinear processes: turning, as a mechanical system, and speech, as a biological system. The effectiveness of PE in detecting the change in dynamics in the turning process from the time series generated with samples of audio and current signals is studied. Experiments are carried out on a lathe for a sudden increase in depth of cut and a continuous increase in depth of cut on mild steel workpieces, keeping the speed and feed rate constant. The results are applied to detect chatter onset in machining. These results are verified using frequency spectra of the signals and the nonlinear measure, normalized coarse-grained information rate (NCIR). PE analysis is carried out to investigate the variation in surface texture caused by chatter on the machined workpiece. A statistical parameter from the optical grey-level intensity histogram of the laser speckle pattern recorded using a charge-coupled device (CCD) camera is used to generate the time series required for PE analysis. A standard optical roughness parameter is used to confirm the results. The application of PE in identifying vocal disorders is studied from speech signals recorded using a microphone.
Here analysis is carried out using speech signals of subjects with different pathological conditions and normal subjects, and the results are used for identifying vocal disorders. The standard linear technique of FFT is used to substantiate the results. The results of PE analysis in all three cases clearly indicate that this complexity measure is sensitive to changes in the regularity of a signal and hence can suitably be used for detection of dynamical changes in real-world systems. This work establishes the application of the simple, inexpensive and fast algorithm of PE for the benefit of advanced manufacturing processes as well as clinical diagnosis of vocal disorders.
Abstract:
Recently, cumulative residual entropy (CRE) has been found to be a new measure of information that parallels Shannon’s entropy (see Rao et al. [Cumulative residual entropy: A new measure of information, IEEE Trans. Inform. Theory. 50(6) (2004), pp. 1220–1228] and Asadi and Zohrevand [On the dynamic cumulative residual entropy, J. Stat. Plann. Inference 137 (2007), pp. 1931–1941]). Motivated by this finding, in this paper, we introduce a generalized measure of it, namely cumulative residual Renyi’s entropy, and study its properties. We also examine it in relation to some applied problems such as weighted and equilibrium models. Finally, we extend this measure into the bivariate set-up and prove certain characterizing relationships to identify different bivariate lifetime models.
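For orientation, the CRE of a non-negative random variable replaces the density in Shannon's entropy with the survival function, giving minus the integral of S(x) log S(x), where S(x) = P(X > x). The sketch below evaluates this for the empirical survival function of a sample, which is piecewise constant between order statistics. It covers only the base CRE, not the Renyi generalization introduced in the paper, and the function name `empirical_cre` is an assumption.

```python
from math import log

def empirical_cre(sample):
    """Cumulative residual entropy of the empirical distribution of a sample.

    Between the i-th and (i+1)-th order statistics the empirical survival
    function equals (n - i) / n, so the integral reduces to a finite sum.
    """
    xs = sorted(sample)
    n = len(xs)
    cre = 0.0
    for i in range(1, n):
        survival = (n - i) / n
        gap = xs[i] - xs[i - 1]
        cre -= gap * survival * log(survival)
    return cre

# Hypothetical lifetimes; a wider spread gives a larger CRE.
lifetimes = [0.4, 1.1, 1.3, 2.7, 5.0]
print(empirical_cre(lifetimes))
```

Below the smallest observation the empirical survival function is 1 and above the largest it is 0, so neither region contributes to the sum.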
Abstract:
The valuation of farmland is a perennial issue for agricultural policy, given its importance in the farm investment portfolio. Despite the significance of farmland values to farmer wealth, prediction remains a difficult task. This study develops a dynamic information measure to examine the informational content of farmland values and farm income in explaining the distribution of farmland values over time.