996 results for Diversity entropy
Abstract:
Given a distribution of words (SLUNs or CLUNS) in a text written in the language L(MT), one can fit one of the distribution expressions available in the mathematical literature and treat some parameter of the chosen expression as a measure of diversity. Because the fit is not always perfect, however, it is preferable to select an index that does not postulate a regularity of the distribution expressible by a simple formula. The problem can be approached statistically, without special regard to the organization of the text. Any monotonic function can serve as an index if it takes a minimum value when all elements belong to the same class (all individuals belong to a single symbol) and a maximum value when each element belongs to a different class (each individual is a different symbol). It should also satisfy certain conditions: it should not be very sensitive to the length of the text, and it should be invariant under a certain number of selection operations on the text. These operations can, in theory, be random. The expressions that offer the most advantages are those derived from the Shannon-Weaver theory of information. Building on them, the authors develop a theoretical study of diversity indices to be applied to texts constructed in the modeling language L(MT), although nothing prevents their application to texts written in natural languages.
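The boundary behaviour described above (minimum when every element is the same symbol, maximum when every element is different) can be illustrated with the Shannon entropy of the symbol frequencies. This is a minimal sketch, not the authors' index: the function name `shannon_diversity` and the toy sequences are assumptions made for illustration.

```python
import math
from collections import Counter

def shannon_diversity(symbols):
    """Shannon entropy, in bits, of the symbol frequencies in a sequence."""
    counts = Counter(symbols)
    total = sum(counts.values())
    # Sum of p * log2(1/p) over the observed symbol classes.
    return sum((n / total) * math.log2(total / n) for n in counts.values())

# Hypothetical toy "texts", for illustration only.
same_symbol = ["a", "a", "a", "a"]      # all elements in one class
all_distinct = ["a", "b", "c", "d"]     # each element in its own class

print(shannon_diversity(same_symbol))   # minimum value: 0.0
print(shannon_diversity(all_distinct))  # maximum for 4 elements: log2(4) = 2.0
```

Note that the raw entropy still grows with the number of distinct symbols, so it does not by itself address the abstract's requirement of low sensitivity to text length.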
Abstract:
The properties of complex networks are highly influenced by border effects, frequently found as a consequence of the finite nature of real-world networks as well as network sampling. Therefore, it becomes critical to devise effective means for sound estimation of network topological and dynamical properties while avoiding these types of artifacts. In the current work, an algorithm for minimization of border effects is proposed and discussed, and its potential is illustrated with respect to two real-world networks, namely bone canals and air transportation. © 2009 Elsevier B.V. All rights reserved.
Abstract:
In attempts to conserve the species diversity of trees in tropical forests, monitoring of diversity in inventories is essential. For effective monitoring it is crucial to be able to make meaningful comparisons between different regions, or comparisons of the diversity of a region at different times. Many species diversity measures have been defined, including the well-known abundance and entropy measures. All such measures share a number of problems in their effective practical use. Probably the most problematic is that they cannot be used to meaningfully assess changes, since they are only concerned with the number of species or the proportions of the population/sample which they constitute. A natural (though simplistic) model of a species frequency distribution is the multinomial distribution. It is shown that the likelihood analysis of samples from such a distribution is closely related to a number of entropy-type measures of diversity. Hence a comparison of the species distribution on two plots, using the multinomial model and likelihood methods, leads to generalised cross-entropy as the likelihood ratio test (LRT) statistic for the null hypothesis that the species distributions are the same. Data from 30 contiguous plots in a forest in Sumatra are analysed using these methods. Significance tests between all pairs of plots yield extremely low p-values, indicating strongly that it ought to have been "obvious" that the observed species distributions are different on different plots. In terms of how different the plots are, and how these differences vary over the whole study site, a display of the degrees of freedom of the test (equivalent to the number of shared species) seems to be the most revealing indicator, as well as the simplest.
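The link between multinomial likelihood ratio tests and entropy-type quantities can be seen in the textbook G statistic for comparing two samples of species counts. The sketch below is that standard statistic, not necessarily the paper's generalised cross-entropy formulation; the function name `g_statistic` and the dictionary input format are assumptions, and the choice of degrees of freedom is deliberately left out.

```python
import math

def g_statistic(counts_a, counts_b):
    """Likelihood ratio (G) statistic for equal species distributions on two plots.

    counts_a, counts_b: dicts mapping species name -> number of individuals.
    """
    species = set(counts_a) | set(counts_b)
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())
    n = n_a + n_b
    g = 0.0
    for s in species:
        pooled = counts_a.get(s, 0) + counts_b.get(s, 0)
        for observed, plot_total in ((counts_a.get(s, 0), n_a),
                                     (counts_b.get(s, 0), n_b)):
            if observed > 0:  # empty cells contribute nothing
                expected = plot_total * pooled / n  # count expected under the null
                g += observed * math.log(observed / expected)
    return 2.0 * g

# Hypothetical counts, for illustration only.
plot_1 = {"species_x": 5, "species_y": 5}
plot_2 = {"species_x": 5, "species_y": 5}
print(g_statistic(plot_1, plot_2))  # identical distributions give 0.0
```

Each term has the form observed × log(observed / expected), which is the same shape as a cross-entropy or relative-entropy sum over species.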
Abstract:
Users can rarely reveal their information need in full detail to a search engine within 1-2 words, so search engines need to "hedge their bets" and present diverse results within the precious 10 response slots. Diversity in ranking is of much recent interest. Most existing solutions estimate the marginal utility of an item given a set of items already in the response, and then use variants of greedy set cover. Others design graphs with the items as nodes and choose diverse items based on visit rates (PageRank). Here we introduce a radically new and natural formulation of diversity as finding centers in resistive graphs. Unlike in PageRank, we do not specify the edge resistances (equivalently, conductances) and ask for node visit rates. Instead, we look for a sparse set of center nodes so that the effective conductance from the center to the rest of the graph has maximum entropy. We give a cogent semantic justification for turning PageRank thus on its head. In marked deviation from prior work, our edge resistances are learnt from training data. Inference and learning are NP-hard, but we give practical solutions. In extensive experiments with subtopic retrieval, social network search, and document summarization, our approach convincingly surpasses recently-published diversity algorithms like subtopic cover, max-marginal relevance (MMR), Grasshopper, DivRank, and SVMdiv.
Abstract:
The main interest in the assessment of forest species diversity for conservation purposes is in the rare species. The main problem in the tropical rain forests is that most of the species are rare. Assessment of species diversity in the tropical rain forests is therefore often concerned with estimating that which is not observed in recorded samples. Statistical methodology is therefore required to try to estimate the truncated tail of the species frequency distribution, or to estimate the asymptote of species/diversity-area curves. A Horvitz-Thompson estimator of the number of unobserved (“virtual”) species in each species intensity class is proposed. The approach allows an extended definition of diversity (or generalised Renyi entropy). The paper presents a case study from data collected in Jambi, Sumatra, and the “extended diversity measure” is used on the species data.
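For orientation, the ordinary Renyi entropy of order q on observed species counts can be sketched as follows. This is only the standard definition, H_q = ln(sum of p_i^q) / (1 - q); it does not include the paper's Horvitz-Thompson correction for unobserved species, and the function name `renyi_entropy` is an assumption.

```python
import math

def renyi_entropy(counts, q):
    """Renyi entropy of order q (natural log) from a list of species counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    if abs(q - 1.0) < 1e-12:
        # The limit q -> 1 is the Shannon entropy.
        return sum(p * math.log(1.0 / p) for p in probs)
    return math.log(sum(p ** q for p in probs)) / (1.0 - q)

# Hypothetical counts, for illustration only: one common and three rare species.
counts = [97, 1, 1, 1]
for q in (0, 1, 2):
    print(q, renyi_entropy(counts, q))
```

Order q = 0 reduces to the log of the number of observed species, so it weights rare species most heavily, while larger q increasingly emphasises the dominant species.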
Abstract:
We investigated the soil arthropod communities of urban and suburban holm oak (Quercus ilex L.) stands in a small (Siena) and a large Italian city (Naples) and tested whether the abundance and diversity of higher arthropod taxa are affected by the biotic and abiotic conditions of urban forest soils, including pollution. Acarina and Collembola were the dominant taxa in both cities. In Siena the total number of arthropod individuals collected in the samples was over 1/3 greater than in Naples, but all diversity indices scored higher in Naples than in Siena, probably in response to the higher heterogeneity of microclimatic and pedological conditions found in the Naples study area. Oribatids were twice as abundant in Siena, and so were total mites relative to Collembola. While “taxonomic richness” per site increased with distance from road traffic, entropy and evenness indices scored higher at the two ends of the impact gradient in both cities. The overall variation in basic pedological and microbiological soil parameters correlated positively with the total abundance of arthropods, and negatively with their taxonomic richness. At the resolution employed, no significant relation emerged between anthropogenic factors, such as traffic load and soil pollution, and the density and variety of the arthropod fauna. These results are consistent with conclusions drawn from a previous study on the enchytraeid fauna examined at species level, which is remarkable considering the different taxonomic resolutions of the two studies. CCA results suggest that the higher abundance of Oribatid mites, Protura and Thysanura and the lower abundance of Diplopoda and Symphyla in Siena could depend on a higher fungi/bacteria ratio.
This observation can be interpreted in terms of differences in fungi and bacteria between the two cities: Siena is shifted towards the fungal decomposition channel, which supports taxa such as oribatid mites, while Naples is shifted towards the bacterial channel, which supports chiefly detritivorous groups, such as diplopods.
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
In this paper, we focus on the design of bivariate EDAs for discrete optimization problems and propose a new approach named HSMIEC. Because current EDAs spend much time in the statistical learning process when the relationships among the variables are too complicated, we employ the Selfish Gene theory (SG) in this approach, and a Mutual Information and Entropy based Cluster (MIEC) model is used to optimize the probability distribution of the virtual population. This model uses a hybrid sampling method that considers both clustering accuracy and clustering diversity, and an incremental learning and resampling scheme is used to optimize the parameters of the correlations among the variables. Experimental results on several benchmark problems demonstrate that HSMIEC often performs better than other EDAs such as BMDA, COMIT, MIMIC and ECGA. © 2009 Elsevier B.V. All rights reserved.
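Bivariate EDAs generally rely on pairwise dependency estimates between variables, and mutual information is a common choice. The sketch below computes empirical mutual information between two positions of a discrete population; it is a generic illustration and not the MIEC model itself, whose details the abstract does not give. The function name `mutual_information` and the toy population are assumptions.

```python
import math
from collections import Counter

def mutual_information(population, i, j):
    """Empirical mutual information (bits) between variables i and j.

    population: list of equal-length tuples of discrete values.
    """
    n = len(population)
    joint = Counter((ind[i], ind[j]) for ind in population)
    marg_i = Counter(ind[i] for ind in population)
    marg_j = Counter(ind[j] for ind in population)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts.
        mi += (c / n) * math.log2(c * n / (marg_i[a] * marg_j[b]))
    return mi

# Hypothetical population: variables 0 and 1 are identical, variable 2 is independent.
population = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
print(mutual_information(population, 0, 1))  # 1.0 bit: fully dependent
print(mutual_information(population, 0, 2))  # 0.0 bits: independent
```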
Abstract:
A newly developed framework for quantifying aerosol particle diversity and mixing state based on information-theoretic entropy is applied for the first time to single particle mass spectrometry field data. Single particle mass fraction estimates for black carbon, organic aerosol, ammonium, nitrate and sulfate, derived using single particle mass spectrometer, aerosol mass spectrometer and multi-angle absorption photometer measurements are used to calculate single particle species diversity (Di). The average single particle species diversity (Dα) is then related to the species diversity of the bulk population (Dγ) to derive a mixing state index value (χ) at hourly resolution. The mixing state index is a single parameter representation of how internally/externally mixed a particle population is at a given time. The index describes a continuum, with values of 0 and 100% representing fully external and internal mixing, respectively. This framework was applied to data collected as part of the MEGAPOLI winter campaign in Paris, France, 2010. Di values are low (∼ 2) for fresh traffic and wood-burning particles that contain high mass fractions of black carbon and organic aerosol but low mass fractions of inorganic ions. Conversely, Di values are higher (∼ 4) for aged carbonaceous particles containing similar mass fractions of black carbon, organic aerosol, ammonium, nitrate and sulfate. Aerosol in Paris is estimated to be 59% internally mixed in the size range 150-1067 nm, and mixing state is dependent both upon time of day and air mass origin. Daytime primary emissions associated with vehicular traffic and wood-burning result in low χ values, while enhanced condensation of ammonium nitrate on existing particles at night leads to higher χ values. Advection of particles from continental Europe containing ammonium, nitrate and sulfate leads to increases in Dα, Dγ and χ. 
The mixing state index represents a useful metric by which to compare and contrast ambient particle mixing state at other locations globally.
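The relation between Di, Dα, Dγ and χ can be sketched numerically. The abstract does not spell out the formulas, so the code below assumes the commonly used entropy-based definitions: diversity D = exp(H), Dα from the mass-weighted mean of per-particle entropies, Dγ from the bulk composition, and χ = (Dα − 1) / (Dγ − 1). The function names and the two-species toy populations are hypothetical.

```python
import math

def entropy(fractions):
    """Shannon entropy (natural log) of a list of mass fractions."""
    return sum(p * math.log(1.0 / p) for p in fractions if p > 0)

def mixing_state_index(particles):
    """Return chi as a fraction in [0, 1] (multiply by 100 for percent).

    particles: list of per-particle species masses, same species order in each.
    Undefined (division by zero) if the bulk consists of a single species.
    """
    total_mass = sum(sum(p) for p in particles)
    n_species = len(particles[0])
    # Mass-weighted average of per-particle entropies gives H_alpha.
    h_alpha = 0.0
    for p in particles:
        mass = sum(p)
        h_alpha += (mass / total_mass) * entropy([x / mass for x in p])
    # Bulk mass fractions of each species give H_gamma.
    bulk = [sum(p[a] for p in particles) / total_mass for a in range(n_species)]
    d_alpha = math.exp(h_alpha)
    d_gamma = math.exp(entropy(bulk))
    return (d_alpha - 1.0) / (d_gamma - 1.0)

# Hypothetical two-species populations, for illustration only.
external = [[1.0, 0.0], [0.0, 1.0]]  # each particle is a single species
internal = [[1.0, 1.0], [1.0, 1.0]]  # every particle matches the bulk
print(mixing_state_index(external))
print(mixing_state_index(internal))
```

Under these assumed definitions the fully external population gives χ = 0 and the fully internal one gives χ = 1, matching the 0 and 100% endpoints described in the abstract.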