64 resultados para Log steaming
Resumo:
We study the damage enhanced creep rupture of disordered materials by means of a fiber bundle model. Broken fibers undergo a slow stress relaxation modeled by a Maxwell element whose stress exponent m can vary in a broad range. Under global load sharing we show that due to the strength disorder of fibers, the lifetime ʧ of the bundle has sample-to-sample fluctuations characterized by a log-normal distribution independent of the type of disorder. We determine the Monkman-Grant relation of the model and establish a relation between the rupture life tʄ and the characteristic time tm of the intermediate creep regime of the bundle where the minimum strain rate is reached, making possible reliable estimates of ʧ from short term measurements. Approaching macroscopic failure, the deformation rate has a finite time power law singularity whose exponent is a decreasing function of m. On the microlevel the distribution of waiting times is found to have a power law behavior with m-dependent exponents different below and above the critical load of the bundle. Approaching the critical load from above, the cutoff value of the distributions has a power law divergence whose exponent coincides with the stress exponent of Maxwell elements
Resumo:
GeneID is a program to predict genes in anonymous genomic sequences designed with a hierarchical structure. In the first step, splice sites, and start and stop codons are predicted and scored along the sequence using position weight matrices (PWMs). In the second step, exons are built from the sites. Exons are scored as the sum of the scores of the defining sites, plus the log-likelihood ratio of a Markov model for coding DNA. In the last step, from the set of predicted exons, the gene structure is assembled, maximizing the sum of the scores of the assembled exons. In this paper we describe the obtention of PWMs for sites, and the Markov model of coding DNA in Drosophila melanogaster. We also compare other models of coding DNA with the Markov model. Finally, we present and discuss the results obtained when GeneID is used to predict genes in the Adh region. These results show that the accuracy of GeneID predictions compares currently with that of other existing tools but that GeneID is likely to be more efficient in terms of speed and memory usage.
Resumo:
Biplots are graphical displays of data matrices based on the decomposition of a matrix as the product of two matrices. Elements of these two matrices are used as coordinates for the rows and columns of the data matrix, with an interpretation of the joint presentation that relies on the properties of the scalar product. Because the decomposition is not unique, there are several alternative ways to scale the row and column points of the biplot, which can cause confusion amongst users, especially when software packages are not united in their approach to this issue. We propose a new scaling of the solution, called the standard biplot, which applies equally well to a wide variety of analyses such as correspondence analysis, principal component analysis, log-ratio analysis and the graphical results of a discriminant analysis/MANOVA, in fact to any method based on the singular-value decomposition. The standard biplot also handles data matrices with widely different levels of inherent variance. Two concepts taken from correspondence analysis are important to this idea: the weighting of row and column points, and the contributions made by the points to the solution. In the standard biplot one set of points, usually the rows of the data matrix, optimally represent the positions of the cases or sample units, which are weighted and usually standardized in some way unless the matrix contains values that are comparable in their raw form. The other set of points, usually the columns, is represented in accordance with their contributions to the low-dimensional solution. As for any biplot, the projections of the row points onto vectors defined by the column points approximate the centred and (optionally) standardized data. The method is illustrated with several examples to demonstrate how the standard biplot copes in different situations to give a joint map which needs only one common scale on the principal axes, thus avoiding the problem of enlarging or contracting the scale of one set of points to make the biplot readable. The proposal also solves the problem in correspondence analysis of low-frequency categories that are located on the periphery of the map, giving the false impression that they are important.
Resumo:
The Aitchison vector space structure for the simplex is generalized to a Hilbert space structure A2(P) for distributions and likelihoods on arbitrary spaces. Centralnotations of statistics, such as Information or Likelihood, can be identified in the algebraical structure of A2(P) and their corresponding notions in compositional data analysis, such as Aitchison distance or centered log ratio transform.In this way very elaborated aspects of mathematical statistics can be understoodeasily in the light of a simple vector space structure and of compositional data analysis. E.g. combination of statistical information such as Bayesian updating,combination of likelihood and robust M-estimation functions are simple additions/perturbations in A2(Pprior). Weighting observations corresponds to a weightedaddition of the corresponding evidence.Likelihood based statistics for general exponential families turns out to have aparticularly easy interpretation in terms of A2(P). Regular exponential families formfinite dimensional linear subspaces of A2(P) and they correspond to finite dimensionalsubspaces formed by their posterior in the dual information space A2(Pprior).The Aitchison norm can identified with mean Fisher information. The closing constant itself is identified with a generalization of the cummulant function and shown to be Kullback Leiblers directed information. Fisher information is the local geometry of the manifold induced by the A2(P) derivative of the Kullback Leibler information and the space A2(P) can therefore be seen as the tangential geometry of statistical inference at the distribution P.The discussion of A2(P) valued random variables, such as estimation functionsor likelihoods, give a further interpretation of Fisher information as the expected squared norm of evidence and a scale free understanding of unbiased reasoning
Resumo:
We obtain minimax lower bounds on the regret for the classicaltwo--armed bandit problem. We provide a finite--sample minimax version of the well--known log $n$ asymptotic lower bound of Lai and Robbins. Also, in contrast to the log $n$ asymptotic results on the regret, we show that the minimax regret is achieved by mere random guessing under fairly mild conditions on the set of allowable configurations of the two arms. That is, we show that for {\sl every} allocation rule and for {\sl every} $n$, there is a configuration such that the regret at time $n$ is at least 1 -- $\epsilon$ times the regret of random guessing, where $\epsilon$ is any small positive constant.
Resumo:
We show that the welfare of a representative consumer can be related to observable aggregatedata. To a first order, the change in welfare is summarized by (the present value of) the Solowproductivity residual and by the growth rate of the capital stock per capita. We also show thatproductivity and the capital stock suffice to calculate differences in welfare across countries, withboth variables computed as log level deviations from a reference country. These results hold forarbitrary production technology, regardless of the degree of product market competition, and applyto open economies as well if TFP is constructed using absorption rather than GDP as the measureof output. They require that TFP be constructed using prices and quantities as perceived byconsumers. Thus, factor shares need to be calculated using after-tax wages and rental rates, andwill typically sum to less than one. We apply these results to calculate welfare gaps and growthrates in a sample of developed countries for which high-quality TFP and capital data are available.We find that under realistic scenarios the United Kingdom and Spain had the highest growth ratesof welfare over our sample period of 1985-2005, but the United States had the highest level ofwelfare.
Resumo:
Power transformations of positive data tables, prior to applying the correspondence analysis algorithm, are shown to open up a family of methods with direct connections to the analysis of log-ratios. Two variations of this idea are illustrated. The first approach is simply to power the original data and perform a correspondence analysis this method is shown to converge to unweighted log-ratio analysis as the power parameter tends to zero. The second approach is to apply the power transformation to thecontingency ratios, that is the values in the table relative to expected values based on the marginals this method converges to weighted log-ratio analysis, or the spectral map. Two applications are described: first, a matrix of population genetic data which is inherently two-dimensional, and second, a larger cross-tabulation with higher dimensionality, from a linguistic analysis of several books.
Resumo:
In order to interpret the biplot it is necessary to know which points usually variables are the ones that are important contributors to the solution, and this information is available separately as part of the biplot s numerical results. We propose a new scaling of the display, called the contribution biplot, which incorporates this diagnostic directly into the graphical display, showing visually the important contributors and thus facilitating the biplot interpretation and often simplifying the graphical representation considerably. The contribution biplot can be applied to a wide variety of analyses such as correspondence analysis, principal component analysis, log-ratio analysis and the graphical results of a discriminant analysis/MANOVA, in fact to any method based on the singular-value decomposition. In the contribution biplot one set of points, usually the rows of the data matrix, optimally represent the spatial positions of the cases or sample units, according to some distance measure that usually incorporates some form of standardization unless all data are comparable in scale. The other set of points, usually the columns, is represented by vectors that are related to their contributions to the low-dimensional solution. A fringe benefit is that usually only one common scale for row and column points is needed on the principal axes, thus avoiding the problem of enlarging or contracting the scale of one set of points to make the biplot legible. Furthermore, this version of the biplot also solves the problem in correspondence analysis of low-frequency categories that are located on the periphery of the map, giving the false impression that they are important, when they are in fact contributing minimally to the solution.
Resumo:
We use aggregate GDP data and within-country income shares for theperiod 1970-1998 to assign a level of income to each person in theworld. We then estimate the gaussian kernel density function for theworldwide distribution of income. We compute world poverty rates byintegrating the density function below the poverty lines. The $1/daypoverty rate has fallen from 20% to 5% over the last twenty five years.The $2/day rate has fallen from 44% to 18%. There are between 300 and500 million less poor people in 1998 than there were in the 70s.We estimate global income inequality using seven different popularindexes: the Gini coefficient, the variance of log-income, two ofAtkinson s indexes, the Mean Logarithmic Deviation, the Theil indexand the coefficient of variation. All indexes show a reduction in globalincome inequality between 1980 and 1998. We also find that most globaldisparities can be accounted for by across-country, not within-country,inequalities. Within-country disparities have increased slightly duringthe sample period, but not nearly enough to offset the substantialreduction in across-country disparities. The across-country reductionsin inequality are driven mainly, but not fully, by the large growth rateof the incomes of the 1.2 billion Chinese citizens. Unless Africa startsgrowing in the near future, we project that income inequalities willstart rising again. If Africa does not start growing, then China, India,the OECD and the rest of middle-income and rich countries diverge awayfrom it, and global inequality will rise. Thus, the aggregate GDP growthof the African continent should be the priority of anyone concerned withincreasing global income inequality.
Resumo:
We consider two fundamental properties in the analysis of two-way tables of positive data: the principle of distributional equivalence, one of the cornerstones of correspondence analysis of contingency tables, and the principle of subcompositional coherence, which forms the basis of compositional data analysis. For an analysis to be subcompositionally coherent, it suffices to analyse the ratios of the data values. The usual approach to dimension reduction in compositional data analysis is to perform principal component analysis on the logarithms of ratios, but this method does not obey the principle of distributional equivalence. We show that by introducing weights for the rows and columns, the method achieves this desirable property. This weighted log-ratio analysis is theoretically equivalent to spectral mapping , a multivariate method developed almost 30 years ago for displaying ratio-scale data from biological activity spectra. The close relationship between spectral mapping and correspondence analysis is also explained, as well as their connection with association modelling. The weighted log-ratio methodology is applied here to frequency data in linguistics and to chemical compositional data in archaeology.
Resumo:
Subcompositional coherence is a fundamental property of Aitchison s approach to compositional data analysis, and is the principal justification for using ratios of components. We maintain, however, that lack of subcompositional coherence, that is incoherence, can be measured in an attempt to evaluate whether any given technique is close enough, for all practical purposes, to being subcompositionally coherent. This opens up the field to alternative methods, which might be better suited to cope with problems such as data zeros and outliers, while being only slightly incoherent. The measure that we propose is based on the distance measure between components. We show that the two-part subcompositions, which appear to be the most sensitive to subcompositional incoherence, can be used to establish a distance matrix which can be directly compared with the pairwise distances in the full composition. The closeness of these two matrices can be quantified using a stress measure that is common in multidimensional scaling, providing a measure of subcompositional incoherence. The approach is illustrated using power-transformed correspondence analysis, which has already been shown to converge to log-ratio analysis as the power transform tends to zero.
Resumo:
Let a class $\F$ of densities be given. We draw an i.i.d.\ sample from a density $f$ which may or may not be in $\F$. After every $n$, one must make a guess whether $f \in \F$ or not. A class is almost surely testable if there exists such a testing sequence such that for any $f$, we make finitely many errors almost surely. In this paper, several results are given that allowone to decide whether a class is almost surely testable. For example, continuity and square integrability are not testable, but unimodality, log-concavity, and boundedness by a given constant are.
Resumo:
We consider adaptive sequential lossy coding of bounded individual sequences when the performance is measured by the sequentially accumulated mean squared distortion. Theencoder and the decoder are connected via a noiseless channel of capacity $R$ and both are assumed to have zero delay. No probabilistic assumptions are made on how the sequence to be encoded is generated. For any bounded sequence of length $n$, the distortion redundancy is defined as the normalized cumulative distortion of the sequential scheme minus the normalized cumulative distortion of the best scalarquantizer of rate $R$ which is matched to this particular sequence. We demonstrate the existence of a zero-delay sequential scheme which uses common randomization in the encoder and the decoder such that the normalized maximum distortion redundancy converges to zero at a rate $n^{-1/5}\log n$ as the length of the encoded sequence $n$ increases without bound.
Resumo:
We continue the development of a method for the selection of a bandwidth or a number of design parameters in density estimation. We provideexplicit non-asymptotic density-free inequalities that relate the $L_1$ error of the selected estimate with that of the best possible estimate,and study in particular the connection between the richness of the classof density estimates and the performance bound. For example, our methodallows one to pick the bandwidth and kernel order in the kernel estimatesimultaneously and still assure that for {\it all densities}, the $L_1$error of the corresponding kernel estimate is not larger than aboutthree times the error of the estimate with the optimal smoothing factor and kernel plus a constant times $\sqrt{\log n/n}$, where $n$ is the sample size, and the constant only depends on the complexity of the family of kernels used in the estimate. Further applications include multivariate kernel estimates, transformed kernel estimates, and variablekernel estimates.
Resumo:
L'associació mundial de jugadors de bàsquet ha establert un marc de col·laboració amb la UOC per implementar un sistema de BD per donar resposta a la necessitat dels jugadors de bàsquet a nivell mundial, que volen crear una nova plataforma centralitzada per tal d'unificar la informació de cadascun d'ells i permetre els equips i les federacions disposar d'aquesta informació a l'hora d'escollir els jugadors que integraran els diferents equips. La implementació, a més d'emmagatzemar la informació, ha de suportar els procediments ABM (Alta, Baixa i Modificació) dels jugadors, contractes, partits i les estadístiques dels jugadors; ha d'implementar certs procediments de consulta i cal realitzar un mòdul estadístic, a més, totes les crides a consultes es deuen emmagatzemar en una taula de registre log.