52 resultados para non separable data
em Consorci de Serveis Universitaris de Catalunya (CSUC), Spain
Resumo:
Starting with logratio biplots for compositional data, which are based on the principle of subcompositional coherence, and then adding weights, as in correspondence analysis, we rediscover Lewi's spectral map and many connections to analyses of two-way tables of non-negative data. Thanks to the weighting, the method also achieves the property of distributional equivalence
Resumo:
Globalization involves several facility location problems that need to be handled at large scale. Location Allocation (LA) is a combinatorial problem in which the distance among points in the data space matter. Precisely, taking advantage of the distance property of the domain we exploit the capability of clustering techniques to partition the data space in order to convert an initial large LA problem into several simpler LA problems. Particularly, our motivation problem involves a huge geographical area that can be partitioned under overall conditions. We present different types of clustering techniques and then we perform a cluster analysis over our dataset in order to partition it. After that, we solve the LA problem applying simulated annealing algorithm to the clustered and non-clustered data in order to work out how profitable is the clustering and which of the presented methods is the most suitable
Resumo:
The low levels of unemployment recorded in the UK in recent years are widely cited asevidence of the country’s improved economic performance, and the apparent convergence of unemployment rates across the country’s regions used to suggest that the longstanding divide in living standards between the relatively prosperous ‘south’ and the more depressed ‘north’ has been substantially narrowed. Dissenters from theseconclusions have drawn attention to the greatly increased extent of non-employment(around a quarter of the UK’s working age population are not in employment) and themarked regional dimension in its distribution across the country. Amongst these dissenters it is generally agreed that non-employment is concentrated amongst oldermales previously employed in the now very much smaller ‘heavy’ industries (e.g. coal,steel, shipbuilding).This paper uses the tools of compositiona l data analysis to provide a much richer picture of non-employment and one which challenges the conventional analysis wisdom about UK labour market performance as well as the dissenters view of the nature of theproblem. It is shown that, associated with the striking ‘north/south’ divide in nonemployment rates, there is a statistically significant relationship between the size of the non-employment rate and the composition of non-employment. Specifically, it is shown that the share of unemployment in non-employment is negatively correlated with the overall non-employment rate: in regions where the non-employment rate is high the share of unemployment is relatively low. So the unemployment rate is not a very reliable indicator of regional disparities in labour market performance. Even more importantly from a policy viewpoint, a significant positive relationship is found between the size ofthe non-employment rate and the share of those not employed through reason of sicknessor disability and it seems (contrary to the dissenters) that this connection is just as strong for women as it is for men
Resumo:
Non crystalline (nc) EuIG and DyIG have been prepared by dc¿sputtering. Mössbauer data on 57Fe, 151Eu and 161Dy reveal sharp magnetic transitions at 62 K and 70 K for nc EuIG and DyIG, respectively. The 57Fe hyperfine (hf) spectra consist of three superpositioned patterns for Fe3+ in tetrahedral and octahedral and for Fe2+ in tetrahedral oxygen coordination. The saturation hf fields at 4.2 K are reduced compared to the values of the corresponding crystalline materials. The induced hf field at 151Eu is only 1/8 of that for crystalline EuIG
Resumo:
Report for the scientific sojourn carried out at the University of New South Wales from February to June the 2007. Two different biogeochemical models are coupled to a three dimensional configuration of the Princeton Ocean Model (POM) for the Northwestern Mediterranean Sea (Ahumada and Cruzado, 2007). The first biogeochemical model (BLANES) is the three-dimensional version of the model described by Bahamon and Cruzado (2003) and computes the nitrogen fluxes through six compartments using semi-empirical descriptions of biological processes. The second biogeochemical model (BIOMEC) is the biomechanical NPZD model described in Baird et al. (2004), which uses a combination of physiological and physical descriptions to quantify the rates of planktonic interactions. Physical descriptions include, for example, the diffusion of nutrients to phytoplankton cells and the encounter rate of predators and prey. The link between physical and biogeochemical processes in both models is expressed by the advection-diffusion of the non-conservative tracers. The similarities in the mathematical formulation of the biogeochemical processes in the two models are exploited to determine the parameter set for the biomechanical model that best fits the parameter set used in the first model. Three years of integration have been carried out for each model to reach the so called perpetual year run for biogeochemical conditions. Outputs from both models are averaged monthly and then compared to remote sensing images obtained from sensor MERIS for chlorophyll.
Resumo:
We present a real data set of claims amounts where costs related to damage are recorded separately from those related to medical expenses. Only claims with positive costs are considered here. Two approaches to density estimation are presented: a classical parametric and a semi-parametric method, based on transformation kernel density estimation. We explore the data set with standard univariate methods. We also propose ways to select the bandwidth and transformation parameters in the univariate case based on Bayesian methods. We indicate how to compare the results of alternative methods both looking at the shape of the overall density domain and exploring the density estimates in the right tail.
Resumo:
This paper analyses the impact of using different correlation assumptions between lines of business when estimating the risk-based capital reserve, the Solvency Capital Requirement (SCR), under Solvency II regulations. A case study is presented and the SCR is calculated according to the Standard Model approach. Alternatively, the requirement is then calculated using an Internal Model based on a Monte Carlo simulation of the net underwriting result at a one-year horizon, with copulas being used to model the dependence between lines of business. To address the impact of these model assumptions on the SCR we conduct a sensitivity analysis. We examine changes in the correlation matrix between lines of business and address the choice of copulas. Drawing on aggregate historical data from the Spanish non-life insurance market between 2000 and 2009, we conclude that modifications of the correlation and dependence assumptions have a significant impact on SCR estimation.
Resumo:
Half-lives of radionuclides span more than 50 orders of magnitude. We characterize the probability distribution of this broad-range data set at the same time that explore a method for fitting power-laws and testing goodness-of-fit. It is found that the procedure proposed recently by Clauset et al. [SIAM Rev. 51, 661 (2009)] does not perform well as it rejects the power-law hypothesis even for power-law synthetic data. In contrast, we establish the existence of a power-law exponent with a value around 1.1 for the half-life density, which can be explained by the sharp relationship between decay rate and released energy, for different disintegration types. For the case of alpha emission, this relationship constitutes an original mechanism of power-law generation.
Resumo:
Land cover classification is a key research field in remote sensing and land change science as thematic maps derived from remotely sensed data have become the basis for analyzing many socio-ecological issues. However, land cover classification remains a difficult task and it is especially challenging in heterogeneous tropical landscapes where nonetheless such maps are of great importance. The present study aims to establish an efficient classification approach to accurately map all broad land cover classes in a large, heterogeneous tropical area of Bolivia, as a basis for further studies (e.g., land cover-land use change). Specifically, we compare the performance of parametric (maximum likelihood), non-parametric (k-nearest neighbour and four different support vector machines - SVM), and hybrid classifiers, using both hard and soft (fuzzy) accuracy assessments. In addition, we test whether the inclusion of a textural index (homogeneity) in the classifications improves their performance. We classified Landsat imagery for two dates corresponding to dry and wet seasons and found that non-parametric, and particularly SVM classifiers, outperformed both parametric and hybrid classifiers. We also found that the use of the homogeneity index along with reflectance bands significantly increased the overall accuracy of all the classifications, but particularly of SVM algorithms. We observed that improvements in producer’s and user’s accuracies through the inclusion of the homogeneity index were different depending on land cover classes. Earlygrowth/degraded forests, pastures, grasslands and savanna were the classes most improved, especially with the SVM radial basis function and SVM sigmoid classifiers, though with both classifiers all land cover classes were mapped with producer’s and user’s accuracies of around 90%. Our approach seems very well suited to accurately map land cover in tropical regions, thus having the potential to contribute to conservation initiatives, climate change mitigation schemes such as REDD+, and rural development policies.
Resumo:
In this work discuss the use of the standard model for the calculation of the solvency capital requirement (SCR) when the company aims to use the specific parameters of the model on the basis of the experience of its portfolio. In particular, this analysis focuses on the formula presented in the latest quantitative impact study (2010 CEIOPS) for non-life underwriting premium and reserve risk. One of the keys of the standard model for premium and reserves risk is the correlation matrix between lines of business. In this work we present how the correlation matrix between lines of business could be estimated from a quantitative perspective, as well as the possibility of using a credibility model for the estimation of the matrix of correlation between lines of business that merge qualitative and quantitative perspective.
Resumo:
This analysis was stimulated by the real data analysis problem of householdexpenditure data. The full dataset contains expenditure data for a sample of 1224 households. The expenditure is broken down at 2 hierarchical levels: 9 major levels (e.g. housing, food, utilities etc.) and 92 minor levels. There are also 5 factors and 5 covariates at the household level. Not surprisingly, there are a small number of zeros at the major level, but many zeros at the minor level. The question is how best to model the zeros. Clearly, models that tryto add a small amount to the zero terms are not appropriate in general as at least some of the zeros are clearly structural, e.g. alcohol/tobacco for households that are teetotal. The key question then is how to build suitable conditional models. For example, is the sub-composition of spendingexcluding alcohol/tobacco similar for teetotal and non-teetotal households?In other words, we are looking for sub-compositional independence. Also, what determines whether a household is teetotal? Can we assume that it is independent of the composition? In general, whether teetotal will clearly depend on the household level variables, so we need to be able to model this dependence. The other tricky question is that with zeros on more than onecomponent, we need to be able to model dependence and independence of zeros on the different components. Lastly, while some zeros are structural, others may not be, for example, for expenditure on durables, it may be chance as to whether a particular household spends money on durableswithin the sample period. This would clearly be distinguishable if we had longitudinal data, but may still be distinguishable by looking at the distribution, on the assumption that random zeros will usually be for situations where any non-zero expenditure is not small.While this analysis is based on around economic data, the ideas carry over tomany other situations, including geological data, where minerals may be missing for structural reasons (similar to alcohol), or missing because they occur only in random regions which may be missed in a sample (similar to the durables)
Resumo:
Human Immunodeficiency Virus continues to be a pandemic. Spain is one of the European countries with the highest incidence of HIV. Within Catalonia, Spain many projects have been implemented with the intention of improving HIV knowledge and lowering the incidence. HIV knowledge is also known to have a positive effect on lowering stigma and discrimination of the people living with HIV. However, few studies study the distribution of HIV knowledge and its association to HIV status, age, sex, geographical zone of origin and level of education within the same study. Objectives: To identify if HIV knowledge is associated with HIV status, age, sex, geographical zone of origin and level of education. Method: Quantitative, cross-sectional, centre-based study comprising of people receiving an HIV test in Catalonia, Spain. Data will be collected from the 11 HIV Non-Governmental Organisations in Catalonia, Spain. The Brief HIV Knowledge Scale will be used to assess HIV knowledge; information from the HIV test session will be used to assess HIV status, age, sex, geographic zone of origin and level of education. The association between HIV knowledge and the afore mentioned variables will then be calculated.
Resumo:
As stated in Aitchison (1986), a proper study of relative variation in a compositional data set should be based on logratios, and dealing with logratios excludes dealing with zeros. Nevertheless, it is clear that zero observations might be present in real data sets, either because the corresponding part is completelyabsent –essential zeros– or because it is below detection limit –rounded zeros. Because the second kind of zeros is usually understood as “a trace too small to measure”, it seems reasonable to replace them by a suitable small value, and this has been the traditional approach. As stated, e.g. by Tauber (1999) and byMartín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000), the principal problem in compositional data analysis is related to rounded zeros. One should be careful to use a replacement strategy that does not seriously distort the general structure of the data. In particular, the covariance structure of the involvedparts –and thus the metric properties– should be preserved, as otherwise further analysis on subpopulations could be misleading. Following this point of view, a non-parametric imputation method isintroduced in Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000). This method is analyzed in depth by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2003) where it is shown that thetheoretical drawbacks of the additive zero replacement method proposed in Aitchison (1986) can be overcome using a new multiplicative approach on the non-zero parts of a composition. The new approachhas reasonable properties from a compositional point of view. In particular, it is “natural” in the sense thatit recovers the “true” composition if replacement values are identical to the missing values, and it is coherent with the basic operations on the simplex. This coherence implies that the covariance structure of subcompositions with no zeros is preserved. As a generalization of the multiplicative replacement, in thesame paper a substitution method for missing values on compositional data sets is introduced
Resumo:
The log-ratio methodology makes available powerful tools for analyzing compositionaldata. Nevertheless, the use of this methodology is only possible for those data setswithout null values. Consequently, in those data sets where the zeros are present, aprevious treatment becomes necessary. Last advances in the treatment of compositionalzeros have been centered especially in the zeros of structural nature and in the roundedzeros. These tools do not contemplate the particular case of count compositional datasets with null values. In this work we deal with \count zeros" and we introduce atreatment based on a mixed Bayesian-multiplicative estimation. We use the Dirichletprobability distribution as a prior and we estimate the posterior probabilities. Then weapply a multiplicative modi¯cation for the non-zero values. We present a case studywhere this new methodology is applied.Key words: count data, multiplicative replacement, composition, log-ratio analysis
Resumo:
One of the tantalising remaining problems in compositional data analysis lies in how to deal with data sets in which there are components which are essential zeros. By anessential zero we mean a component which is truly zero, not something recorded as zero simply because the experimental design or the measuring instrument has not been sufficiently sensitive to detect a trace of the part. Such essential zeros occur inmany compositional situations, such as household budget patterns, time budgets,palaeontological zonation studies, ecological abundance studies. Devices such as nonzero replacement and amalgamation are almost invariably ad hoc and unsuccessful insuch situations. From consideration of such examples it seems sensible to build up amodel in two stages, the first determining where the zeros will occur and the secondhow the unit available is distributed among the non-zero parts. In this paper we suggest two such models, an independent binomial conditional logistic normal model and a hierarchical dependent binomial conditional logistic normal model. The compositional data in such modelling consist of an incidence matrix and a conditional compositional matrix. Interesting statistical problems arise, such as the question of estimability of parameters, the nature of the computational process for the estimation of both the incidence and compositional parameters caused by the complexity of the subcompositional structure, the formation of meaningful hypotheses, and the devising of suitable testing methodology within a lattice of such essential zero-compositional hypotheses. The methodology is illustrated by application to both simulated and real compositional data