972 results for Maximum entropy statistical estimate
Abstract:
This paper studies the statistical distributions of worldwide earthquakes from 1963 to 2012. A Cartesian grid, dividing the Earth into geographic regions, is considered. Entropy and the Jensen–Shannon divergence are used to analyze and compare real-world data. Hierarchical clustering and multidimensional scaling techniques are adopted for data visualization. Entropy-based indices have the advantage of leading to a single parameter expressing the relationships between the seismic data. Classical and generalized (fractional) entropy and Jensen–Shannon divergence are tested. The generalized measures lead to a clear identification of patterns embedded in the data and contribute to a better understanding of earthquake distributions.
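As a rough illustration of the entropy-based comparison described above, the sketch below computes the Shannon entropy of two yearly event-count distributions over a hypothetical geographic grid and their Jensen–Shannon divergence. The grid size and the Poisson counts are placeholders, not the paper's data.

```python
# A minimal sketch (not the paper's code): Shannon entropy and Jensen-Shannon
# divergence between two yearly earthquake count distributions over a
# hypothetical lat/lon grid, using synthetic placeholder data.
import numpy as np
from scipy.stats import entropy
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(0)
# hypothetical event counts per grid cell for two years (placeholder data)
counts_1963 = rng.poisson(5.0, size=(18, 36)).ravel()
counts_2012 = rng.poisson(5.0, size=(18, 36)).ravel()

p = counts_1963 / counts_1963.sum()   # normalise to probability distributions
q = counts_2012 / counts_2012.sum()

print("Shannon entropy 1963:", entropy(p, base=2))
print("Shannon entropy 2012:", entropy(q, base=2))
# jensenshannon returns the square root of the JS divergence, so square it
print("Jensen-Shannon divergence:", jensenshannon(p, q, base=2) ** 2)
```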
Abstract:
In the last two decades, small-strain shear modulus has become one of the most important geotechnical parameters for characterizing soil stiffness. Finite element analyses have shown that the in-situ stiffness of soils and rocks is much higher than previously thought and that the stress-strain behaviour of these materials is non-linear in most cases at small strain levels, especially in the ground around retaining walls, foundations and tunnels, typically of the order of 10^-2 to 10^-4 of strain. Although the best approach to estimate shear modulus seems to be based on measuring seismic wave velocities, deriving the parameter through correlations with in-situ tests is usually considered very useful for design practice. The use of Neural Networks for modelling systems has become widespread, particularly in areas where the great amount of available data and the complexity of the systems make the problem very difficult to treat with traditional data analysis methodologies. In this work, the use of Neural Networks and Support Vector Regression is proposed to estimate small-strain shear modulus for sedimentary soils from the basic or intermediate parameters derived from the Marchetti Dilatometer Test. The results are discussed and compared with some of the most common methodologies available for this evaluation.
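A minimal sketch of the kind of regression model discussed above: a Support Vector Regression mapping DMT-derived parameters to G0. The feature set (material index ID, horizontal stress index KD, dilatometer modulus ED), the synthetic data and the hyperparameters are assumptions for illustration, not the authors' database or model.

```python
# A minimal sketch (not the authors' model): SVR from Marchetti Dilatometer
# Test parameters to small-strain shear modulus G0, on toy synthetic data.
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
# hypothetical DMT-derived features: material index ID, horizontal stress
# index KD, dilatometer modulus ED (ranges and units are placeholders)
X = rng.uniform([0.1, 1.0, 1.0], [10.0, 30.0, 100.0], size=(200, 3))
g0 = 50.0 * X[:, 2] * (1.0 + 0.1 * X[:, 1]) + rng.normal(0, 50, 200)  # toy target

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0, epsilon=5.0))
model.fit(X, g0)
print("Predicted G0 for one DMT sounding:", model.predict([[2.0, 10.0, 40.0]]))
```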
Abstract:
Integrated master's dissertation in Civil Engineering
Abstract:
Spatial heterogeneity, spatial dependence and spatial scale constitute key features of spatial analysis of housing markets. However, the common practice of modelling spatial dependence as being generated by spatial interactions through a known spatial weights matrix is often not satisfactory. While existing estimators of spatial weights matrices are based on repeat sales or panel data, this paper takes this approach to a cross-section setting. Specifically, based on an a priori definition of housing submarkets and the assumption of a multifactor model, we develop maximum likelihood methodology to estimate hedonic models that facilitate understanding of both spatial heterogeneity and spatial interactions. The methodology, based on statistical orthogonal factor analysis, is applied to the urban housing market of Aveiro, Portugal at two different spatial scales.
Abstract:
To analyze the genetic relatedness and phylogeographic structure of Aedes aegypti, we collected samples from 36 localities throughout the Americas (Brazil, Peru, Venezuela, Guatemala, US), three from Africa (Guinea, Senegal, Uganda), and three from Asia (Singapore, Cambodia, Tahiti). Amplification and sequencing of a fragment of the mitochondrial NADH dehydrogenase subunit 4 gene identified 20 distinct haplotypes, of which 14 are exclusive to the Americas, four to African/Asian countries, one is common to the Americas and Africa, and one to the Americas and Asia. Nested clade analysis (NCA), pairwise distribution, statistical parsimony, and maximum parsimony analyses were used to infer evolutionary and historic processes, and to estimate phylogenetic relationships among haplotypes. Two clusters were found in all the analyses. Haplotypes clustered in the two clades were separated by eight mutational steps. Phylogeographic structure detected by the NCA was consistent with distant colonization within one clade and fragmentation followed by range expansion via long distance dispersal in the other. Three percent of nucleotide divergence between these two clades is suggestive of a gene pool division that may support the hypothesis of occurrence of two subspecies of Ae. aegypti in the Americas.
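For context on the nucleotide-divergence figure quoted above, here is a small, generic sketch (not the study's pipeline) of an uncorrected pairwise p-distance between two aligned sequence fragments; the sequences are made-up placeholders.

```python
# A minimal sketch (not the study's analysis): uncorrected p-distance between
# two aligned mtDNA ND4 haplotype fragments. Sequences are placeholders.
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of sites that differ between two equal-length sequences."""
    assert len(seq_a) == len(seq_b)
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return diffs / len(seq_a)

hap1 = "ATGACCCTAGCATTAGGCCTAATTATCCTA"
hap2 = "ATGACCTTAGCATTAGGTCTAATTATCCTA"
print(f"nucleotide divergence: {p_distance(hap1, hap2):.1%}")
```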
Abstract:
The statistical analysis of literary style is the part of stylometry that compares measurable characteristics in a text that are rarely controlled by the author with those in other texts. When the goal is to settle authorship questions, these characteristics should relate to the author's style and not to the genre, epoch or editor, and they should be such that their variation between authors is larger than the variation within comparable texts from the same author. For an overview of the literature on stylometry and some of the techniques involved, see for example Mosteller and Wallace (1964, 82), Herdan (1964), Morton (1978), Holmes (1985), Oakes (1998) or Lebart, Salem and Berry (1998).

Tirant lo Blanc, a chivalry book, is the main work in Catalan literature and was hailed as "the best book of its kind in the world" by Cervantes in Don Quixote. Considered by writers like Vargas Llosa or Damaso Alonso to be the first modern novel in Europe, it has been translated several times into Spanish, Italian and French, with modern English translations by Rosenthal (1996) and La Fontaine (1993). The main body of this book was written between 1460 and 1465, but it was not printed until 1490. There is an intense and long-lasting debate around its authorship, sprouting from its first edition, where the introduction states that the whole book is the work of Martorell (1413?-1468), while at the end it is stated that the last one-fourth of the book is by Galba (?-1490), after the death of Martorell. Some of the authors that support the theory of single authorship are Riquer (1990), Chiner (1993) and Badia (1993), while some of those supporting double authorship are Riquer (1947), Coromines (1956) and Ferrando (1995). For an overview of this debate, see Riquer (1990).

Neither of the two candidate authors left any text comparable to the one under study, and therefore discriminant analysis cannot be used to help classify chapters by author. By using sample texts encompassing about ten percent of the book, and looking at word length and at the use of 44 conjunctions, prepositions and articles, Ginebra and Cabos (1998) detect heterogeneities that might indicate the existence of two authors. By analyzing the diversity of the vocabulary, Riba and Ginebra (2000) estimate that stylistic boundary to be near chapter 383.

Following the lead of the extensive literature, this paper looks into word length, the use of the most frequent words and the use of vowels in each chapter of the book. Given that the features selected are categorical, this leads to three contingency tables of ordered rows and therefore to three sequences of multinomial observations. Section 2 explores these sequences graphically, observing a clear shift in their distribution. Section 3 describes the problem of estimating a sudden change-point in those sequences, and the following sections propose various ways to estimate change-points in multinomial sequences: the method in Section 4 involves fitting models for polytomous data; the one in Section 5 fits gamma models onto the sequence of chi-square distances between each row profile and the average profile; the one in Section 6 fits models onto the sequence of values taken by the first component of the correspondence analysis, as well as onto sequences of other summary measures like the average word length. In Section 7 we fit models onto the marginal binomial sequences to identify the features that distinguish the chapters before and after that boundary. Most methods rely heavily on the use of generalized linear models.
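As a toy illustration of single change-point estimation of the kind pursued in this paper (not the authors' procedure), the sketch below locates a shift in a simulated sequence of per-chapter summary statistics by least squares over all split points.

```python
# A minimal sketch (not the paper's method): locating one change-point in a
# sequence of per-chapter summary statistics (e.g. average word length) by
# minimising the within-segment sum of squares. The data are simulated.
import numpy as np

rng = np.random.default_rng(2)
n = 487                                   # roughly the number of chapters
x = np.concatenate([rng.normal(4.6, 0.15, 383), rng.normal(4.8, 0.15, n - 383)])

def change_point(seq):
    """Return the split k minimising the within-segment sum of squares."""
    best_k, best_cost = None, np.inf
    for k in range(2, len(seq) - 1):
        cost = ((seq[:k] - seq[:k].mean()) ** 2).sum() + \
               ((seq[k:] - seq[k:].mean()) ** 2).sum()
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k

print("Estimated change-point near chapter:", change_point(x))
```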
Abstract:
Given the significant impact that the use of glucocorticoids can have on fracture risk independent of bone density, their use has been incorporated as one of the clinical risk factors for calculating the 10-year fracture risk in the World Health Organization's Fracture Risk Assessment Tool (FRAX(®)). Like the other clinical risk factors, the use of glucocorticoids is included as a dichotomous variable, with use of steroids defined as past or present exposure to a daily dose of 5 mg or more of prednisolone (or equivalent) for 3 months or more. The purpose of this report is to give clinicians guidance on the adjustments that should be made to the 10-year risk based on the dose, duration of use and mode of delivery of glucocorticoid preparations. A subcommittee of the International Society for Clinical Densitometry and International Osteoporosis Foundation joint Position Development Conference presented its findings to an expert panel, and the following recommendations were selected. 1) There is a dose relationship between glucocorticoid use of greater than 3 months and fracture risk. The average dose exposure captured within FRAX(®) is likely to be a prednisone dose of 2.5-7.5 mg/day or its equivalent. Fracture probability is underestimated when the prednisone dose is greater than 7.5 mg/day and is overestimated when the prednisone dose is less than 2.5 mg/day. 2) Frequent intermittent use of higher doses of glucocorticoids increases fracture risk. Because of the variability in dose and dosing schedule, quantification of this risk is not possible. 3) High-dose inhaled glucocorticoids may be a risk factor for fracture. FRAX(®) may underestimate fracture probability in users of high-dose inhaled glucocorticoids. 4) Appropriate glucocorticoid replacement in individuals with adrenal insufficiency has not been found to increase fracture risk. In such patients, use of glucocorticoids should not be included in FRAX(®) calculations.
Abstract:
Discrete data arise in various research fields, typically when the observations are count data. I propose a robust and efficient parametric procedure for the estimation of discrete distributions. The estimation is done in two phases. First, a very robust, but possibly inefficient, estimate of the model parameters is computed and used to identify outliers. Then the outliers are either removed from the sample or given low weights, and a weighted maximum likelihood estimate (WML) is computed. The weights are determined via an adaptive process such that if the data follow the model, then asymptotically no observation is downweighted. I prove that the final estimator inherits the breakdown point of the initial one, and that its influence function at the model is the same as the influence function of the maximum likelihood estimator, which strongly suggests that it is asymptotically fully efficient. The initial estimator is a minimum disparity estimator (MDE). MDEs can be shown to have full asymptotic efficiency, and some MDEs have very high breakdown points and very low bias under contamination. Several initial estimators are considered, and the performances of the WMLs based on each of them are studied. In a great variety of situations the WML substantially improves the initial estimator, both in terms of finite-sample mean squared error and in terms of bias under contamination. Moreover, the performance of the WML is rather stable under a change of the MDE, even if the MDEs have very different behaviours. Two examples of application of the WML to real data are considered. In both of them, the necessity for a robust estimator is clear: the maximum likelihood estimator is badly corrupted by the presence of a few outliers. This procedure is particularly natural in the discrete distribution setting, but could be extended to the continuous case, for which a possible procedure is sketched.
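The sketch below illustrates the general two-phase idea (initial robust fit, outlier downweighting, weighted maximum likelihood) on a Poisson model. The median-based initial estimate and the crude weighting rule are simplifications for illustration, not the minimum disparity estimators or adaptive weights used in the paper.

```python
# A minimal sketch (not the paper's estimator): two-phase robust fit of a
# Poisson model. A crude, highly robust initial estimate (here the sample
# median, an illustrative stand-in for a minimum disparity estimator) flags
# outliers, which get zero weight before a weighted ML step.
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(3)
x = np.concatenate([rng.poisson(3.0, 195), np.full(5, 40)])   # 5 gross outliers

lam0 = np.median(x)                        # crude, highly robust initial fit
p = poisson.pmf(x, lam0)
w = np.where(p < 1e-4, 0.0, 1.0)           # zero weight for implausible counts
lam_wml = np.sum(w * x) / np.sum(w)        # weighted ML estimate for Poisson

print("plain MLE (mean):", x.mean())
print("weighted ML estimate:", lam_wml)
```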
Abstract:
Nonlinear regression problems can often be reduced to linearity by transforming the response variable (e.g., using the Box-Cox family of transformations). The classic estimates of the parameter defining the transformation as well as of the regression coefficients are based on the maximum likelihood criterion, assuming homoscedastic normal errors for the transformed response. These estimates are nonrobust in the presence of outliers and can be inconsistent when the errors are nonnormal or heteroscedastic. This article proposes new robust estimates that are consistent and asymptotically normal for any unimodal and homoscedastic error distribution. For this purpose, a robust version of conditional expectation is introduced for which the prediction mean squared error is replaced with an M scale. This concept is then used to develop a nonparametric criterion to estimate the transformation parameter as well as the regression coefficients. A finite sample estimate of this criterion based on a robust version of smearing is also proposed. Monte Carlo experiments show that the new estimates compare favorably with respect to the available competitors.
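A minimal sketch of one way to implement the general idea above (not the authors' estimator): choose the Box-Cox parameter by minimizing a robust residual scale after a robust regression fit, using the geometric-mean-standardized transform so residual scales are comparable across candidate values of lambda. The data, the grid and the Huber fit are illustrative choices.

```python
# A minimal sketch (not the authors' method): robust choice of a Box-Cox
# transformation parameter via a robust residual scale (the MAD) after a
# robust regression fit, on synthetic data where the log scale is "true".
import numpy as np
from scipy.stats import median_abs_deviation
from sklearn.linear_model import HuberRegressor

rng = np.random.default_rng(4)
x = rng.uniform(1, 10, 300).reshape(-1, 1)
y = np.exp(0.5 + 0.3 * x.ravel() + rng.normal(0, 0.2, 300))  # log scale is "true"
y[:10] *= 8                                                  # a few gross outliers
gm = np.exp(np.mean(np.log(y)))                              # geometric mean of y

def boxcox_std(y, lam):
    """Geometric-mean standardised Box-Cox transform."""
    if abs(lam) < 1e-8:
        return gm * np.log(y)
    return (y ** lam - 1.0) / (lam * gm ** (lam - 1.0))

best_lam, best_scale = None, np.inf
for lam in np.linspace(-1.0, 1.0, 21):
    z = boxcox_std(y, lam)
    resid = z - HuberRegressor().fit(x, z).predict(x)
    scale = median_abs_deviation(resid)
    if scale < best_scale:
        best_lam, best_scale = lam, scale
print("robustly estimated transformation parameter:", best_lam)
```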
Abstract:
The classical binary classification problem is investigated when it is known in advance that the posterior probability function (or regression function) belongs to some class of functions. We introduce and analyze a method which effectively exploits this knowledge. The method is based on minimizing the empirical risk over a carefully selected "skeleton" of the class of regression functions. The skeleton is a covering of the class based on a data-dependent metric, especially fitted for classification. A new scale-sensitive dimension is introduced which is more useful for the studied classification problem than other, previously defined, dimension measures. This fact is demonstrated by performance bounds for the skeleton estimate in terms of the new dimension.
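A toy sketch of the skeleton idea (not the paper's algorithm): build a data-dependent epsilon-cover of a small candidate class and minimize the empirical risk over that cover only. The candidate class, the metric and the threshold are illustrative assumptions.

```python
# A minimal sketch (not the paper's construction): greedy data-dependent
# epsilon-cover ("skeleton") of a finite class of posterior functions,
# followed by empirical risk minimisation restricted to the skeleton.
import numpy as np

rng = np.random.default_rng(5)
X = rng.uniform(0, 1, 400)
eta = lambda x, a: np.clip(a * x, 0, 1)            # candidate posterior functions
Y = (rng.uniform(size=400) < eta(X, 1.2)).astype(int)

candidates = [lambda x, a=a: eta(x, a) for a in np.linspace(0.5, 2.0, 60)]

def d(f, g):
    """Data-dependent metric: empirical L1 distance between induced classifiers."""
    return np.mean(np.abs((f(X) > 0.5).astype(int) - (g(X) > 0.5).astype(int)))

skeleton, eps = [], 0.05
for f in candidates:                               # greedy epsilon-net
    if all(d(f, g) > eps for g in skeleton):
        skeleton.append(f)

risks = [np.mean((f(X) > 0.5).astype(int) != Y) for f in skeleton]
best = skeleton[int(np.argmin(risks))]
print(f"skeleton size: {len(skeleton)} of {len(candidates)} candidates")
```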
Abstract:
Accurate determination of subpopulation sizes in bimodal populations remains problematic, yet it represents a powerful way to compare cellular heterogeneity under different environmental conditions. So far, most studies have relied on qualitative descriptions of population distribution patterns, on population-independent descriptors, or on arbitrary placement of thresholds distinguishing biological ON from OFF states. We found that all these methods fall short of accurately describing small subpopulation sizes in bimodal populations. Here we propose a simple, statistics-based method for the analysis of small subpopulation sizes for use in the free software environment R and test this method on real as well as simulated data. Four so-called population-splitting methods were designed with different algorithms that can estimate subpopulation sizes from bimodal populations. All four methods proved more precise than previously used methods when analyzing subpopulation sizes of transfer-competent cells arising in populations of the bacterium Pseudomonas knackmussii B13. The methods' resolving powers were further explored by bootstrapping and simulations. Two of the methods were not severely limited by the proportions of subpopulations they could estimate correctly, but the two others only allowed accurate subpopulation quantification when this amounted to less than 25% of the total population. In contrast, only one method was still sufficiently accurate with subpopulations smaller than 1% of the total population. This study proposes a number of rational approximations to quantifying small subpopulations and offers an easy-to-use protocol for their implementation in the open source statistical software environment R.
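The methods described above are implemented in R; as a generic illustration in Python (not the authors' population-splitting functions), a two-component Gaussian mixture can recover a small subpopulation fraction from simulated bimodal data.

```python
# A minimal sketch (not the authors' R protocol): estimating a small ON
# subpopulation in a bimodal (e.g. log-fluorescence) distribution with a
# two-component Gaussian mixture. The simulated ON fraction is 5%.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(6)
off = rng.normal(1.0, 0.3, 9500)          # OFF subpopulation
on = rng.normal(3.0, 0.3, 500)            # ON subpopulation (5% of total)
data = np.concatenate([off, on]).reshape(-1, 1)

gm = GaussianMixture(n_components=2, random_state=0).fit(data)
on_component = int(np.argmax(gm.means_.ravel()))
print("estimated ON fraction:", gm.weights_[on_component])
```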
Abstract:
This paper exploits an unusual transportation setting to estimate the value of a statistical life (VSL). We estimate the trade-offs individuals are willing to make between mortality risk and cost as they travel to and from the international airport in Sierra Leone (which is separated from the capital Freetown by a body of water). Travelers choose from among multiple transport options, namely ferry, helicopter, hovercraft, and water taxi. The setting and original dataset allow us to address some typical omitted variable concerns in order to generate some of the first revealed preference VSL estimates from Africa. The data also allow us to compare VSL estimates for travelers from 56 countries, including 20 African and 36 non-African countries, all facing the same choice situation. The average VSL estimate for African travelers in the sample is US$577,000 compared to US$924,000 for non-Africans. Individual characteristics, particularly job earnings, can largely account for the difference between Africans and non-Africans; Africans in the sample typically earn somewhat less. There is little evidence that individual VSL estimates are driven by a lack of information, predicted life expectancy, or cultural norms around risk-taking or fatalism. The data imply an income elasticity of the VSL of 1.77. These revealed preference VSL estimates from a developing country fill an important gap in the existing literature, and can be used for a variety of public policy purposes, including in current debates within Sierra Leone regarding the desirability of constructing new transportation infrastructure.
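As a stylized illustration of how a VSL can be backed out of mode-choice data (not the paper's econometric specification), the sketch below fits a binary logit on simulated cost and mortality-risk differences and reports the implied coefficient ratio; all numbers are placeholders.

```python
# A minimal sketch (not the paper's model): recover a value of a statistical
# life from simulated binary mode choices trading off fare against mortality
# risk. VSL is the ratio of the risk and cost coefficients.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
n = 5000
d_cost = rng.normal(0, 20, n)      # fare difference (US$), option A minus B
d_risk = rng.normal(0, 0.2, n)     # risk difference, deaths per 10,000 trips
true_vsl = 6e5                     # assumed "true" VSL in US$ (placeholder)
beta_cost = 0.05
utility_diff = -beta_cost * (d_cost + true_vsl / 1e4 * d_risk)
choose_a = (rng.logistic(size=n) < utility_diff).astype(int)

X = np.column_stack([d_cost, d_risk])
fit = LogisticRegression(fit_intercept=False, C=1e6, max_iter=1000).fit(X, choose_a)
b_cost, b_risk = fit.coef_[0]
print("implied VSL (US$):", b_risk / b_cost * 1e4)   # risk units: per 10,000 trips
```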
Abstract:
We present a non-equilibrium theory for a system with heat and radiative fluxes. The obtained expression for the entropy production is applied to a simple one-dimensional climate model based on the first law of thermodynamics. In the model, the dissipative fluxes are assumed to be independent variables, following the criteria of Extended Irreversible Thermodynamics (EIT), which enlarges, with respect to the classical expression, the applicability of a macroscopic thermodynamic theory to systems far from equilibrium. We analyze the second differential of the classical and the generalized entropy as a criterion of stability of the steady states. Finally, the extreme state is obtained using variational techniques, observing that the system is close to the maximum dissipation rate.
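For reference, the standard textbook expressions for the heat-conduction part of the classical entropy production and the EIT generalized entropy (with the heat flux q as an independent variable and relaxation time τ) are shown below; these are generic forms, not necessarily the exact expressions derived in the paper, which also includes radiative fluxes.

```latex
% Generic textbook forms (assumed, not taken from the paper):
% classical entropy production for heat conduction, and the EIT generalized
% entropy in which the heat flux q enters as an independent variable.
\[
  \sigma_{\mathrm{class}} = \mathbf{q}\cdot\nabla\!\left(\tfrac{1}{T}\right) \;\ge\; 0,
  \qquad
  \rho\, s_{\mathrm{EIT}}(u,\mathbf{q}) = \rho\, s_{\mathrm{eq}}(u)
  - \frac{\tau}{2\lambda T^{2}}\,\mathbf{q}\cdot\mathbf{q}.
\]
```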
Abstract:
The second differential of the entropy is used for analysing the stability of a thermodynamic climatic model. A delay time for the heat flux is introduced, whereby it becomes an independent variable. Two different expressions for the second differential of the entropy are used: one follows classical irreversible thermodynamics theory; the second is related to the introduction of a response time and is due to extended irreversible thermodynamics theory. The second differential of the classical entropy leads to unstable solutions for high values of the delay time. The extended expression always implies stable states for an ice-free Earth. When the ice-albedo feedback is included, a discontinuous distribution of stable states is found for high response times. Following the thermodynamic analysis of the model, the maximum rates of entropy production at the steady state are obtained. A latitudinally isothermal Earth produces the extremum in global entropy production. The material contribution to entropy production (by which we mean the production of entropy by material transport of heat) is a maximum when the latitudinal distribution of temperatures becomes less homogeneous than present values.