51 results for PRINCIPAL COMPONENTS-ANALYSIS
Abstract:
We construct a weighted Euclidean distance that approximates any distance or dissimilarity measure between individuals that is based on a rectangular cases-by-variables data matrix. In contrast to regular multidimensional scaling methods for dissimilarity data, the method leads to biplots of individuals and variables while preserving all the good properties of dimension-reduction methods that are based on the singular-value decomposition. The main benefits are the decomposition of variance into components along principal axes, which provide the numerical diagnostics known as contributions, and the estimation of nonnegative weights for each variable. The idea is inspired by the distance functions used in correspondence analysis and in principal component analysis of standardized data, where the normalizations inherent in the distances can be considered as differential weighting of the variables. In weighted Euclidean biplots we allow these weights to be unknown parameters, which are estimated from the data to maximize the fit to the chosen distances or dissimilarities. These weights are estimated using a majorization algorithm. Once this extra weight-estimation step is accomplished, the procedure follows the classical path in decomposing the matrix and displaying its rows and columns in biplots.
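As a minimal sketch of the weighted Euclidean distance this abstract refers to, the example below uses the special case mentioned in the text, where standardization is viewed as weighting each variable by the inverse of its variance. The function names and the toy data are assumptions made for illustration; the majorization algorithm that estimates the weights from a chosen dissimilarity is not reproduced here.

```python
import math

def weighted_euclidean(x, y, weights):
    """Weighted Euclidean distance: sqrt(sum_k w_k * (x_k - y_k)^2)."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))

def inverse_variance_weights(rows):
    """One weight per column, equal to 1 / (population variance of the column)."""
    n = len(rows)
    weights = []
    for column in zip(*rows):
        mean = sum(column) / n
        variance = sum((v - mean) ** 2 for v in column) / n
        weights.append(1.0 / variance)
    return weights

# Hypothetical cases-by-variables matrix: 3 individuals, 2 variables.
data = [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]]
w = inverse_variance_weights(data)

# Distance between the first two individuals under these weights.
d01 = weighted_euclidean(data[0], data[1], w)
```

With these weights the result equals the ordinary Euclidean distance computed on standardized variables, which is the sense in which the normalization acts as differential weighting.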
Abstract:
The Baix Empordà-Selva-Gavarres aquifer system is related to the fault set that created the tectonic basins of the Empordà and Selva areas (NE Spain) during the Neogene. In this work, we describe the hydrogeological, hydrochemical and isotopic (3H, δD, δ18O and the 87Sr/86Sr ratio) characteristics of groundwater in this system in order to illustrate the relevance of fault zones to groundwater flow paths and recharge. We identify two flow systems with distinct hydrochemistry and isotopic composition. A local flow system originates in the Gavarres Range and flows towards the Baix Empordà and Selva basins, with an approximate residence time of 20 years. A regional flow system has been identified only in the Selva basin. It is related to the main fault zones, which act as preferential flow paths. Its recharge area is located in higher mountain ranges, namely the Transversal and Guilleries Ranges, and its residence times exceed 50 years. Isotopic data also reveal mixing between the two flow systems and rainfall recharge, while principal component analysis identifies the main processes that control the hydrochemistry of each flow system.
Abstract:
In this work we present a simulation of a recognition process that uses the perimeter characterization of simple plant leaves as the only discriminating parameter. Data coding that makes the description independent of leaf size and orientation may penalize recognition performance for some varieties. Border description sequences are then used, and Principal Component Analysis (PCA) is applied to determine the best number of components for the classification task, which is implemented by means of a Support Vector Machine (SVM) system. The results obtained are satisfactory: compared with [4], our system improves the recognition success rate while reducing the variance.
Abstract:
In recent years there has been growing interest in composite indicators as an efficient tool of analysis and a method of prioritizing policies. This paper presents a composite index of intermediary determinants of child health using a multivariate statistical approach. The index shows how specific determinants of child health vary across Colombian departments (administrative subdivisions). We used data collected from the 2010 Colombian Demographic and Health Survey (DHS) for 32 departments and the capital city, Bogotá. Adapting the conceptual framework of the Commission on Social Determinants of Health (CSDH), five dimensions related to child health are represented in the index: material circumstances, behavioural factors, psychosocial factors, biological factors and the health system. To generate the weights of the variables while taking into account the discrete nature of the data, principal component analysis (PCA) using polychoric correlations was employed in constructing the index. Five principal components were selected with this method. The index was estimated as a weighted average of the retained components. A hierarchical cluster analysis was also carried out. The results show that the biggest differences in intermediary determinants of child health are associated with health care before and during delivery.
Abstract:
This paper presents a composite index of early childhood health using a multivariate statistical approach. The index shows how child health varies across Colombian departments (administrative subdivisions). In recent years there has been growing interest in composite indicators as an efficient analysis tool and a way of prioritizing policies. These indicators not only enable multi-dimensional phenomena to be simplified but also make it easier to measure, visualize, monitor and compare a country’s performance on particular issues. We used data collected from the Colombian Demographic and Health Survey (DHS) for 32 departments and the capital city, Bogotá, in 2005 and 2010. The variables included in the index provide a measure of three dimensions related to child health: health status, health determinants and the health system. To generate the weights of the variables while taking into account the discrete nature of the data, we employed principal component analysis (PCA) using polychoric correlations. Five principal components were selected with this method. The index was estimated as a weighted average of the retained components. A hierarchical cluster analysis was also carried out. We observed that the departments ranking in the lowest positions are located on the Colombian periphery. They are departments with low per capita incomes and critical social indicators. The results suggest that regional disparities in child health may be associated with differences in parental characteristics, household conditions and levels of economic development, which makes clear the importance of context in the study of child health in Colombia.
Abstract:
In recent years, new analytical tools have allowed researchers to extract historical information contained in molecular data, which has fundamentally transformed our understanding of processes ruling biological invasions. However, the use of these new analytical tools has been largely restricted to studies of terrestrial organisms despite the growing recognition that the sea contains ecosystems that are amongst the most heavily affected by biological invasions, and that marine invasion histories are often remarkably complex. Here, we studied the routes of invasion and colonisation histories of an invasive marine invertebrate Microcosmus squamiger (Ascidiacea) using microsatellite loci, mitochondrial DNA sequence data and 11 worldwide populations. Discriminant analysis of principal components, clustering methods and approximate Bayesian computation (ABC) methods showed that the most likely source of the introduced populations was a single admixture event that involved populations from two genetically differentiated ancestral regions - the western and eastern coasts of Australia. The ABC analyses revealed that colonisation of the introduced range of M. squamiger consisted of a series of non-independent introductions along the coastlines of Africa, North America and Europe. Furthermore, we inferred that the sequence of colonisation across continents was in line with historical taxonomic records - first the Mediterranean Sea and South Africa from an unsampled ancestral population, followed by sequential introductions in California and, more recently, the NE Atlantic Ocean. We revealed the most likely invasion history for world populations of M. squamiger, which is broadly characterized by the presence of multiple ancestral sources and non-independent introductions within the introduced range. 
The results presented here illustrate the complexity of marine invasion routes and identify a cause-effect relationship between human-mediated transport and the success of widespread marine non-indigenous species, which benefit from stepping-stone invasions and admixture processes involving different sources for the spread and expansion of their range.
Abstract:
A continuous random variable is expanded as a sum of a sequence of uncorrelated random variables. These variables are principal dimensions in continuous scaling on a distance function, as an extension of classic scaling on a distance matrix. For a particular distance, these dimensions are principal components. Then some properties are studied and an inequality is obtained. Diagonal expansions are considered from the same continuous scaling point of view, by means of the chi-square distance. The geometric dimension of a bivariate distribution is defined and illustrated with copulas. It is shown that the dimension can have the power of continuum.
Abstract:
The psychometric properties of the Personal Wellbeing Index are analyzed on a Spanish and Portuguese adolescent sample. We test the reliability of the scale using Cronbach’s alpha and, as a complement, analyze the item-total correlations in the different wellbeing domains included. We carry out an exploratory factor analysis (principal components) and a multigroup Confirmatory Factor Analysis (CFA). The results show that Cronbach’s alpha is 0.79 for the Chilean version and 0.78 for the Brazilian version, confirming the adequate levels of reliability found in previous studies. Correlations between wellbeing domains range from 0.224 to 0.496 for Chile and from 0.24 to 0.46 for Brazil. The results are similar to those obtained in other countries. The single-factor structure of the scale is confirmed, as are the fit of the scale structure to the data of the two samples and the comparability of the means of the global indices. The results suggest the existence of other wellbeing domains that were not considered in the original proposal of the scale.
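The reliability coefficient named in this abstract can be illustrated with a short sketch. It applies the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), with sample variances, to made-up responses. The function name and the data are hypothetical and are not taken from the study.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a table with one row per respondent and one column per item."""
    k = len(scores[0])  # number of items in the scale
    item_variances = [variance(column) for column in zip(*scores)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical responses: 3 respondents x 2 items.
responses = [[1, 2], [2, 1], [3, 3]]
alpha = cronbach_alpha(responses)

# Items that agree perfectly give the maximum value of alpha.
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha_consistent = cronbach_alpha(consistent)
```

Here each item has variance 1 and the total scores have variance 3, so the first table gives an alpha of 2/3, while the perfectly consistent table gives 1.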
Abstract:
Psychometric analysis of the AF5 multidimensional scale of self-concept in a sample of adolescents and adults in Catalonia. The aim of this study is to carry out a psychometric study of the AF5 scale in a sample of 4,825 Catalan subjects aged 11 to 63. They are students in compulsory secondary education (ESO), high school, middle-level vocational training (CFGM) and university. Using principal component analysis (PCA), the theoretical validity of the components is established, and the reliability of the instrument is also analyzed. Differential analyses are performed by gender and normative group using a 2 × 6 factorial design. The normative group variable comprises the different educational levels classified into 6 sub-groups: university, post-compulsory secondary education (high school and CFGM), 4th of ESO, 3rd of ESO, 2nd of ESO and 1st of ESO. The results indicate that the reliability of the Catalan version of the scale is similar to that of the original scale. The factorial structure also fits the original model established beforehand. Significant differences by normative group are observed in the four components of self-concept explored (social, family, academic/occupational and physical). By gender, significant differences appear in the physical, academic and social components of self-concept but not in the family component.
Abstract:
In this paper we examine the out-of-sample forecast performance of high-yield credit spreads regarding real-time and revised data on employment and industrial production in the US. We evaluate models using both a point forecast and a probability forecast exercise. Our main findings suggest the use of a few factors obtained by pooling information from a number of sector-specific high-yield credit spreads. This can be justified by observing that, especially for employment, there is a gain from using a principal components model fitted to high-yield credit spreads compared with the predictions produced by benchmarks such as AR and ARDL models that use either the term spread or the aggregate high-yield spread as an exogenous regressor. Moreover, forecasts based on real-time data are generally comparable to forecasts based on revised data. JEL Classification: C22; C53; E32. Keywords: Credit spreads; Principal components; Forecasting; Real-time data.
Abstract:
The aim of this work is to study the semantic properties of a group of verbs of motion and their corresponding arguments. Information about the type of complement each verb requires is important for understanding the syntactic structure of the sentence and for offering practical solutions in Natural Language Processing tasks. The analysis focuses on the verbs conduir ("to drive"), navegar ("to sail") and volar ("to fly"), starting from the basic senses that the Diccionari d'ús dels verbs catalans (DUVC) describes for each of these verbs and from their selectional restrictions. Using about a hundred sentences extracted from the Corpus d'Ús del Català a la Web of the Universitat Pompeu Fabra and from the Corpus Textual Informatitzat de la Llengua Catalana of the Institut d'Estudis Catalans, we check whether only the senses and uses described in the DUVC occur in the language, and which of them are the most frequent. Finally, we describe the nouns that head the arguments in terms of semantic features.
Abstract:
This study examines how structural determinants influence intermediary factors of child health inequities and how they operate through the communities where children live. In particular, we explore individual, family and community level characteristics associated with a composite indicator that quantitatively measures intermediary determinants of early childhood health in Colombia. We use data from the 2010 Colombian Demographic and Health Survey (DHS). Adopting the conceptual framework of the Commission on Social Determinants of Health (CSDH), three dimensions related to child health are represented in the index: behavioural factors, psychosocial factors and the health system. To generate the weights of the variables while taking into account the discrete nature of the data, principal component analysis (PCA) using polychoric correlations is employed in constructing the index. Weighted multilevel models are used to examine community effects. The results show that the effect of the household’s SES is attenuated when community characteristics are included, indicating the importance that the level of community development may have in mediating individual and family characteristics. The findings indicate that there is significant variance between communities in intermediary determinants of child health, especially for those determinants linked to the health system, even after controlling for individual, family and community characteristics. These results likely reflect that, whilst the community context can exert a greater influence on intermediary factors linked directly to health, the family context can be more important in the case of psychosocial factors and parents’ behaviours. This underlines the importance of distinguishing between community and family intervention programmes.
Abstract:
Three multivariate statistical tools (principal component analysis, factor analysis and discriminant analysis) have been tested to characterize and model the sags registered in distribution substations. The models use several features to represent the magnitude, duration and degree of unbalance of sags. These features have been obtained from voltage and current waveforms. The techniques are tested and compared using 69 registers of sags. The advantages and drawbacks of each technique are listed.
Abstract:
This work proposes a statistical method for classifying sags according to their origin, downstream or upstream of the recording point. The goal is to obtain, from the sag waveforms, a statistical model that characterises one type of sag and discriminates it from the other type. This model is built on the basis of multi-way principal component analysis and is later used to project the available registers into a new space of lower dimension. Thus, a case base of diagnosed sags is built in the projection space. Classification is then done by comparing new sags against those existing in the case base. Similarity is defined in the projection space using a combination of distances to retrieve the nearest neighbours of the new sag. Finally, the method assigns the origin of the new sag according to the origin of its neighbours.
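The final classification step described above can be sketched as a nearest-neighbour vote in the projection space. This is an illustration under stated assumptions, not the authors' method: plain Euclidean distance stands in for the unspecified "combination of distances", the scores are invented rather than produced by multi-way PCA, and the function name and the choice of k = 3 are hypothetical.

```python
import math
from collections import Counter

def nearest_neighbour_origin(new_scores, case_base, k=3):
    """Assign an origin by majority vote among the k nearest diagnosed cases."""
    # Rank the diagnosed cases by Euclidean distance to the new sag's scores.
    ranked = sorted(case_base, key=lambda case: math.dist(new_scores, case[0]))
    votes = Counter(origin for _, origin in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical case base: (scores in the projection space, diagnosed origin).
case_base = [
    ((0.9, 1.1), "upstream"),
    ((1.2, 0.8), "upstream"),
    ((1.0, 1.0), "upstream"),
    ((-1.1, -0.9), "downstream"),
    ((-0.8, -1.2), "downstream"),
    ((-1.0, -1.0), "downstream"),
]

# A new sag whose scores fall near the first group of cases.
origin = nearest_neighbour_origin((0.7, 0.9), case_base)
```

An odd k avoids ties when there are only two possible origins.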
Abstract:
The work presented in this paper belongs to the power quality knowledge area and deals with voltage sags in power transmission and distribution systems. As they propagate through the power network, voltage sags can cause many problems for domestic and industrial loads, with high financial costs. To impose penalties on the responsible party and to improve monitoring and mitigation strategies, sags must be located in the power network. With this objective, the paper proposes a new method for associating a sag waveform with its origin in transmission and distribution networks. It solves this problem by developing hybrid methods that employ multiway principal component analysis (MPCA) as a dimension reduction tool. MPCA re-expresses sag waveforms in a new subspace using just a few scores. We train some well-known classifiers with these scores and use them to classify future sags. The capabilities of the proposed method for dimension reduction and classification are examined using real data gathered from three substations in Catalonia, Spain. The classification rates obtained confirm the effectiveness of the developed hybrid methods as new tools for sag classification.