928 results for improved principal components analysis (IPCA) algorithm
Abstract:
Normal mixture models are being increasingly used to model the distributions of a wide variety of random phenomena and to cluster sets of continuous multivariate data. However, for a set of data containing a group or groups of observations with longer than normal tails or atypical observations, the use of normal components may unduly affect the fit of the mixture model. In this paper, we consider a more robust approach by modelling the data by a mixture of t distributions. The use of the ECM algorithm to fit this t mixture model is described and examples of its use are given in the context of clustering multivariate data in the presence of atypical observations in the form of background noise.
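The mechanism that makes the t mixture robust can be shown in a few lines. Below is a minimal univariate sketch, assuming the degrees of freedom `nu` are held fixed. With `nu` fixed, the ECM iteration reduces to the E-step plus the first conditional-maximization step shown here; the full ECM algorithm also updates `nu`. All function names, starting values and the toy data are hypothetical. The key quantity is the latent weight `u = (nu + 1) / (nu + delta)`, which shrinks toward zero for points far from a component, so atypical observations barely move the location estimates.

```python
import math

def t_logpdf(x, mu, var, nu):
    """Log density of a univariate Student-t with location mu, squared scale var, nu d.f."""
    delta = (x - mu) ** 2 / var  # squared standardized distance
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi * var)
            - (nu + 1) / 2 * math.log1p(delta / nu))

def fit_t_mixture(xs, mus, variances, nu=4.0, iters=200):
    """EM-style fit of a univariate g-component t mixture with nu held fixed (hypothetical helper)."""
    g, n = len(mus), len(xs)
    mus, variances = list(mus), list(variances)
    pis = [1.0 / g] * g
    for _ in range(iters):
        # E-step: posterior memberships tau and latent precision weights u
        tau = [[0.0] * n for _ in range(g)]
        u = [[0.0] * n for _ in range(g)]
        for j, x in enumerate(xs):
            logs = [math.log(pis[i]) + t_logpdf(x, mus[i], variances[i], nu)
                    for i in range(g)]
            m = max(logs)  # log-sum-exp for numerical stability
            total = sum(math.exp(l - m) for l in logs)
            for i in range(g):
                tau[i][j] = math.exp(logs[i] - m) / total
                delta = (x - mus[i]) ** 2 / variances[i]
                u[i][j] = (nu + 1) / (nu + delta)  # small for outlying points
        # CM-step: mixing proportions, then locations, then scales
        for i in range(g):
            t_sum = sum(tau[i])
            tu = [tau[i][j] * u[i][j] for j in range(n)]
            pis[i] = t_sum / n
            mus[i] = sum(w * x for w, x in zip(tu, xs)) / sum(tu)
            variances[i] = max(
                sum(w * (x - mus[i]) ** 2 for w, x in zip(tu, xs)) / t_sum, 1e-6)
    return pis, mus, variances
```

On two tight groups near 0 and 10 plus a single point at 50, the fitted locations stay close to 0 and 10 because the distant point receives a very small `u`.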
Abstract:
This work concerns the influence of industrialized agriculture in the tropics on precipitation chemistry. A total of 264 rain events were sampled using a wet-only collector in central São Paulo State, Brazil, between January 2003 and July 2007. Electroneutrality balance calculations (considering H⁺, K⁺, Na⁺, NH₄⁺, Ca²⁺, Mg²⁺, Cl⁻, NO₃⁻, SO₄²⁻, F⁻, PO₄³⁻, H₃CCOO⁻, HCOO⁻, C₂O₄²⁻ and HCO₃⁻) showed an excess of cations (~15%), which was attributed to the presence of unmeasured organic anions originating from biomass burning and biogenic emissions. On average, the three ions NH₄⁺, NO₃⁻ and H⁺ accounted for >55% of the total ion concentration in the rainwater samples. Concentrations (except H⁺) were significantly higher (t-test; P = 0.05), by a factor of two to six depending on the species, during the winter sugar cane harvest period, due to the practice of pre-harvest burning of the crop. Principal component analysis showed that three components explained 88% of the variance for measurements made throughout the year: PC1 (52%, biomass burning and soil dust resuspension); PC2 (26%, secondary aerosols); PC3 (10%, road transport emissions). Differences between harvest and non-harvest periods appeared to be mainly due to an increased relative importance of road transport/industrial emissions during the summer (non-harvest) period. The volume-weighted mean (VWM) concentrations of ammonium (23.4 µmol L⁻¹) and nitrate (17.5 µmol L⁻¹) in rainwater samples collected during the harvest period were similar to those found in rainwater from São Paulo city, which emphasizes the importance of including rural agro-industrial emissions in regional-scale atmospheric chemistry and transport models.
Since there was evidence of a biomass burning source throughout the year, it appears that rainwater composition will continue to be affected by vegetation fires even after sugar cane burning is phased out, as envisaged by recent São Paulo State legislation. © 2011 Elsevier Ltd. All rights reserved.
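Two of the quantities in this abstract reduce to simple arithmetic. The sketch below assumes the usual definitions: the volume-weighted mean weights each event's concentration by its rain volume, and an electroneutrality check compares the charge-weighted sums of cations and anions (concentration in µmol L⁻¹ times charge gives µeq L⁻¹). The function names and all numbers are hypothetical, not the study's data.

```python
def volume_weighted_mean(concs, volumes):
    """VWM = sum(C_i * V_i) / sum(V_i) over rain events."""
    total_v = sum(volumes)
    return sum(c * v for c, v in zip(concs, volumes)) / total_v

def charge_sums(ions):
    """ions maps name -> (concentration in umol/L, signed charge).

    Returns (cation_sum, anion_sum) in ueq/L; an excess of cations
    suggests unmeasured anions."""
    cations = sum(c * z for c, z in ions.values() if z > 0)
    anions = sum(c * -z for c, z in ions.values() if z < 0)
    return cations, anions
```

For instance, two events with 10 and 20 µmol L⁻¹ falling in 1 and 3 mm of rain give a VWM of 17.5 µmol L⁻¹, because the larger event dominates the average.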
Abstract:
We compare a Bayesian methodology using the freeware BUGS (Bayesian Inference Using Gibbs Sampling) with the traditional structural equation modelling approach based on another freeware package, Mx. Dichotomous and ordinal (three-category) twin data were simulated according to different additive genetic and common environment models for phenotypic variation. We discuss practical issues in using Gibbs sampling, as implemented in BUGS, to fit subject-specific Bayesian generalized linear models in which the components of variation can be estimated directly. The simulation study (based on 2000 twin pairs) indicated a consistent advantage in using the Bayesian method to detect the correct model under certain specifications of additive genetic and common environmental effects. For binary data, both methods had difficulty detecting the correct model when the additive genetic effect was low (between 10 and 20%) or moderate (between 20 and 40%). Furthermore, neither method could adequately detect a correct model that included a modest common environmental effect (20%), even when the additive genetic effect was large (50%). Power was significantly improved with ordinal data in most scenarios, except for low heritability under a true ACE model. We illustrate and compare both methods using data from 1239 twin pairs over the age of 50 years who were registered with the Australian National Health and Medical Research Council Twin Registry (ATR) and presented symptoms associated with osteoarthritis of the hand joints.
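For orientation, the additive genetic (A), common environment (C) and unique environment (E) components of an ACE model can be roughly estimated from twin correlations with Falconer's formulas. This is a classical shortcut, not the BUGS or Mx model-fitting approach compared in the abstract. It also assumes correlations on a continuous (liability) scale, so for binary or ordinal traits they would first have to be tetrachoric or polychoric estimates. The function name and inputs are hypothetical.

```python
def falconer_ace(r_mz, r_dz):
    """Falconer estimates of ACE variance shares from MZ and DZ twin correlations.

    Under the ACE model: r_MZ = a2 + c2 and r_DZ = 0.5 * a2 + c2."""
    a2 = 2 * (r_mz - r_dz)  # additive genetic share (heritability)
    c2 = 2 * r_dz - r_mz    # common (shared) environment share
    e2 = 1 - r_mz           # unique environment share, including error
    return a2, c2, e2
```

With r_MZ = 0.7 and r_DZ = 0.45, this gives a² = 0.5, c² = 0.2 and e² = 0.3, which resembles the "large additive, modest common environment" scenario that the simulations found hard to detect.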
Abstract:
Combining precision agriculture with the Diagnosis and Recommendation Integrated System (DRIS) makes it possible to monitor the nutritional balance of coffee plantations spatially and thus to provide fertilizer recommendations that are better balanced and more economically sound. The objective of this work was to evaluate the spatial variability of the nutritional status of conilon coffee using the Nutritional Balance Index (IBN), and its relationship with yield. The yield of the plants at each sampling point was determined and mapped, taking spatial variability into account; the Nutritional Balance Index (IBN) of the plants at each sampling point was determined and mapped; and principal component analysis (PCA) was used to estimate the IBN of the coffee plants by cokriging. The conilon coffee data were collected on an experimental farm in the municipality of Cachoeiro de Itapemirim-ES. The IBN and yield were analyzed using geostatistics, based on semivariogram models and parameters, with ordinary kriging interpolation used to estimate values at unsampled locations. The Nutritional Balance Index of the conilon coffee field showed spatial dependence but no linear or spatial correlation with yield. The field under study is nutritionally imbalanced: among the macronutrients, potassium showed the greatest imbalance in the area, and among the micronutrients, zinc and iron had the lowest leaf concentrations. The maps made it possible to distinguish regions with greater and lesser nutritional imbalance and yield, which allows differentiated, site-specific management. The multivariate analysis based on principal components yields components that are highly correlated with the original variables P, Ca, Zn, Cu, K and B.
Cokriging using the principal components makes it possible to estimate the IBN and the yield of the area.
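Ordinary kriging, the interpolator used here for the IBN and yield maps, estimates a value at an unsampled point as a weighted sum of the observations. The weights come from a linear system built from the semivariogram, with a Lagrange multiplier that forces them to sum to 1. The sketch below is a minimal version that assumes a spherical semivariogram with hypothetical nugget, sill and range values and solves the system with plain Gaussian elimination. It does not cover cokriging with principal components as a secondary variable.

```python
import math

def spherical(h, nugget, sill, rng):
    """Spherical semivariogram model; gamma(0) = 0 by convention."""
    if h == 0:
        return 0.0
    if h >= rng:
        return nugget + sill
    r = h / rng
    return nugget + sill * (1.5 * r - 0.5 * r ** 3)

def solve(a, b):
    """Gaussian elimination with partial pivoting for small dense systems."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ordinary_kriging(points, values, target, gamma):
    """Estimate at `target`; returns (estimate, weights)."""
    n = len(points)
    # System [Gamma 1; 1^T 0] [w; mu] = [gamma_0; 1]
    a = [[gamma(math.dist(p, q)) for q in points] + [1.0] for p in points]
    a.append([1.0] * n + [0.0])
    b = [gamma(math.dist(p, target)) for p in points] + [1.0]
    weights = solve(a, b)[:n]  # drop the Lagrange multiplier
    return sum(w * v for w, v in zip(weights, values)), weights
```

At the centre of four equally spaced samples, the weights are equal by symmetry. At a sampled location, the estimate reproduces the observation exactly, because kriging is an exact interpolator when gamma(0) = 0.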
Abstract:
Exploratory factor analysis is a widely used statistical technique in the social sciences. It attempts to identify underlying factors that explain the pattern of correlations within a set of observed variables. A statistical software package is needed to perform the calculations. However, popular statistical packages such as SPSS have some limitations. The R programming language is a free software environment for statistical and graphical computing. It offers many packages written by contributors from all over the world, as well as programming resources that allow it to overcome the dialog limitations of SPSS. This paper offers an SPSS dialog written in the R programming language with the help of some packages, so that researchers with little or no programming knowledge, or those who are accustomed to performing their calculations through statistical dialogs, have more options when applying factor analysis to their data and can therefore adopt a better approach when dealing with ordinal, Likert-type data.
Abstract:
Master's in Socio-Organizational Intervention in Health. Specialization area: Quality and Health Technologies.
Abstract:
OBJECTIVE: To identify clusters of the major occurrences of leprosy and their associated socioeconomic and demographic factors. METHODS: Cases of leprosy that occurred between 1998 and 2007 in São José do Rio Preto (southeastern Brazil) were geocoded, and incidence rates were calculated by census tract. A socioeconomic classification score was obtained using principal component analysis of socioeconomic variables. Thematic maps showing the spatial distribution of leprosy incidence with respect to socioeconomic level and demographic density were constructed using geostatistics. RESULTS: While the incidence rate for the entire city was 10.4 cases per 100,000 inhabitants per year between 1998 and 2007, the incidence rates of individual census tracts were heterogeneous, ranging from 0 to 26.9 cases per 100,000 inhabitants per year. Areas with a high leprosy incidence were associated with lower socioeconomic levels. Clusters of leprosy cases were identified; however, there was no association between disease incidence and demographic density. There was a disparity between where most of the affected people lived and where healthcare services were located. CONCLUSIONS: The spatial analysis techniques used identified the poorer neighborhoods of the city as the areas with the highest risk of the disease. These data show that health departments must prioritize politico-administrative policies that minimize the effects of social inequality and improve the population's standards of living, hygiene and education in order to reduce the incidence of leprosy.
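The two quantitative steps in the methods can be sketched briefly: an average annual incidence rate per 100,000 inhabitants, and a socioeconomic score taken as each area's projection on the first principal component of the standardized variables. The sketch below assumes the score is the first-component projection, computed by power iteration on the correlation matrix. The abstract does not say which components or weights the authors used. Function names and inputs are hypothetical.

```python
import math
import statistics

def incidence_rate(cases, population, years, per=100_000):
    """Average annual incidence per `per` inhabitants."""
    return cases / (population * years) * per

def first_pc_scores(columns, iters=500):
    """Scores on the first principal component of standardized variables.

    columns: list of p variables, each a list of n observations (e.g. census tracts).
    Returns (scores, loadings, share of total variance)."""
    n, p = len(columns[0]), len(columns)
    z = []
    for col in columns:
        m, s = statistics.mean(col), statistics.stdev(col)
        z.append([(x - m) / s for x in col])  # z-scores
    # Correlation matrix R = Z^T Z / (n - 1)
    r = [[sum(z[i][k] * z[j][k] for k in range(n)) / (n - 1) for j in range(p)]
         for i in range(p)]
    v = [1.0] * p  # power iteration start (assumed not orthogonal to PC1)
    for _ in range(iters):
        w = [sum(r[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigval = sum(v[i] * sum(r[i][j] * v[j] for j in range(p)) for i in range(p))
    if sum(v) < 0:  # fix the arbitrary sign so loadings sum to a positive value
        v = [-x for x in v]
    scores = [sum(v[i] * z[i][k] for i in range(p)) for k in range(n)]
    return scores, v, eigval / p  # trace of R equals p
```

As a consistency check, 52 cases in a population of 50,000 over 10 years corresponds to 10.4 cases per 100,000 per year.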
Abstract:
Final Master's Project Report submitted to obtain the degree of Master in Electronics and Telecommunications Engineering
Abstract:
Master's in Socio-Organizational Intervention in Health. Specialization branch: Health Services Administration and Management Policies
Abstract:
Pine forests constitute some of the most important renewable resources, supplying the timber, paper and chemical industries, among other functions. Characterization of the volatiles emitted by different Pinus species has proven to be an important tool for decoding the process of host tree selection by herbivorous insects, some of which cause serious economic damage to pines. Variations in the relative composition of the bouquet of semiochemicals determine the outcome of different biological processes, such as mate finding, egg-laying site recognition and host selection. The volatiles present in phloem samples of four pine species, P. halepensis, P. sylvestris, P. pinaster and P. pinea, were identified and characterized with the aim of finding possible host-plant attractants for native pests such as the bark beetle Tomicus piniperda. The volatile compounds emitted by the phloem samples were extracted by headspace solid-phase microextraction using a 2 cm 50/30 µm divinylbenzene/carboxen/polydimethylsiloxane StableFlex solid-phase microextraction fiber, and the extracts were analyzed by high-resolution gas chromatography with flame ionization detection on nonpolar and chiral column phases. The components of the volatile fraction were identified by mass spectrometry using time-of-flight and quadrupole mass analyzers. The estimated relative composition was used to discriminate among the pine species by means of cluster and principal component analysis. It can be concluded that pine species can be discriminated on the basis of the monoterpene emissions of phloem samples.
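The discrimination step works on relative compositions, meaning each compound's peak area as a percentage of the sample total. A cluster analysis then groups samples with similar profiles. The sketch below assumes Euclidean distance and average linkage. The abstract does not state the distance or linkage used. The peak areas are hypothetical.

```python
import math

def relative_composition(areas):
    """Convert raw peak areas to percentages of the sample total."""
    total = sum(areas)
    return [a / total * 100 for a in areas]

def average_linkage(profiles):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance)."""
    clusters = [[i] for i in range(len(profiles))]
    merges = []

    def dist(a, b):
        # Mean Euclidean distance over all cross-cluster pairs
        return sum(math.dist(profiles[i], profiles[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        d, x, y = min((dist(clusters[x], clusters[y]), x, y)
                      for x in range(len(clusters))
                      for y in range(x + 1, len(clusters)))
        merges.append((clusters[x], clusters[y], d))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return merges
```

Two samples with similar profiles (for example 50/30/20% and 52/28/20%) merge first. A sample dominated by a single compound joins only at the last step.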
Abstract:
ABSTRACT OBJECTIVE To validate an instrument designed to assess health promotion in the school environment. METHODS A questionnaire based on guidelines from the World Health Organization and in line with the Brazilian school health context was developed as the research instrument to be validated. The instrument had 60 items: 40 questions for the school manager and 20 items based on direct observation by the interviewer. Item content was validated using the Delphi technique, and the instrument was applied in 53 schools in two medium-sized cities in the South region of Brazil. Reliability (Cronbach's alpha and split-half) and validity (principal component analysis) analyses were performed. RESULTS The final instrument comprised 28 items distributed into three dimensions: pedagogical, structural and relational. The resulting components showed good factor loadings (> 0.4) and acceptable reliability (> 0.6) for most items. The pedagogical dimension covers educational activities regarding drugs and sexuality, violence and prejudice, self-care, and peace and quality of life. The structural dimension comprises access, sanitary structure, and conservation and equipment. The relational dimension includes relationships within the school and with the community. CONCLUSIONS The proposed instrument presents satisfactory validity and reliability values and includes aspects relevant to health promotion in schools. Its use allows the description of the health promotion conditions to which students at each educational institution are exposed. Because the instrument includes items directly observed by the investigator, it should only be used during periods of full, regular activity at the school in question.
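The two reliability measures named in the methods have standard closed forms. Cronbach's alpha is k/(k−1) · (1 − Σ item variances / variance of total scores). Split-half reliability correlates the scores of two halves of the test and corrects the result to full length with the Spearman–Brown formula 2r/(1+r). The sketch below assumes an odd/even item split, which is only one of several possible splits. The item scores are hypothetical.

```python
import statistics

def cronbach_alpha(items):
    """items: list of k item-score lists, each over the same n respondents."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent total score
    item_var = sum(statistics.variance(it) for it in items)
    return k / (k - 1) * (1 - item_var / statistics.variance(totals))

def pearson(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def split_half(items):
    """Odd/even split, corrected to full length with Spearman-Brown."""
    odd = [sum(s) for s in zip(*items[0::2])]
    even = [sum(s) for s in zip(*items[1::2])]
    r = pearson(odd, even)
    return 2 * r / (1 + r)
```

For four items answered by four respondents, with item variances summing to 16/3 and a total-score variance of 40/3, alpha = 4/3 × (1 − 0.4) = 0.8.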
Abstract:
In this paper, a new PCA-based positioning sensor and localization system for mobile robots operating in unstructured environments (e.g., industry, services, domestic...) is proposed and experimentally validated. The inexpensive positioning system relies on principal component analysis (PCA) of images acquired by an onboard video camera pointing upward at the ceiling. This solution has the advantage of avoiding the need to select and extract features. The principal components of the acquired images are compared with those of previously registered images stored in a reduced onboard image database, and the measured position is fused with odometry data. Optimal estimates of position and slippage are provided by Kalman filters with globally stable error dynamics. The experimental validation reported in this work focuses on a set of experiments carried out in a real environment, in which the robot travels along a lawn-mower trajectory. A small position error estimate with bounded covariance was always observed, for arbitrarily long experiments, and slippage was estimated accurately in real time.
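The pipeline described here can be illustrated in two pieces. First, a position measurement is obtained by finding the registered image whose PCA coefficients are closest to those of the current image. Second, that measurement is fused with odometry in a Kalman filter. The sketch below is a simplification under stated assumptions. It uses nearest-neighbour matching in coefficient space and one scalar Kalman filter per axis with no slippage state, whereas the paper's filters also estimate slippage. All names and noise values are hypothetical.

```python
import math

def nearest_position(query, database):
    """database: list of (pc_coefficients, (x, y)) pairs registered offline."""
    _, pos = min(database, key=lambda entry: math.dist(entry[0], query))
    return pos

class ScalarKalman:
    """One-axis position filter: odometry drives the prediction, image matches correct it."""

    def __init__(self, x0, p0, q, r):
        self.x, self.p = x0, p0  # state estimate and its variance
        self.q, self.r = q, r    # odometry (process) and measurement noise variances

    def predict(self, odometry_step):
        self.x += odometry_step
        self.p += self.q

    def update(self, measured):
        k = self.p / (self.p + self.r)  # Kalman gain
        self.x += k * (measured - self.x)
        self.p *= (1 - k)
        return k
```

When the prior and measurement variances are equal, the gain is 0.5 and the update lands halfway between the odometry prediction and the image-based position, with the variance halved.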
Resumo:
Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies
Abstract:
Dissertation submitted to obtain the degree of Doctor in Conservation and Restoration from Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia
Abstract:
Beyond the classical statistical approaches (basic descriptive statistics, regression analysis, ANOVA, etc.), a new set of applications of different statistical techniques has increasingly gained relevance in the analysis, processing and interpretation of data on the characteristics of forest soils. This can be seen in several recent publications in the field of multivariate statistics. These new methods require additional care that is not always included or referred to in some approaches. In the particular case of geostatistical applications, in addition to georeferencing all data acquisition, samples must be collected on regular grids and in sufficient quantity for the variograms to represent the spatial distribution of soil properties. Most multivariate statistical techniques (principal component analysis, correspondence analysis, cluster analysis, etc.) do not require the assumption of a normal distribution in most cases, but they nevertheless need a proper and rigorous strategy for their use. This work presents some reflections on these methodologies, in particular on the main constraints that often arise during data collection and on the various ways these different techniques can be linked. Finally, illustrations of some particular applications of these statistical methods are presented.
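The requirement for enough regularly spaced samples follows from how an empirical variogram is computed. Each lag value is an average over the sample pairs separated by roughly that distance, so a lag with few pairs gives an unreliable estimate. The sketch below uses the classical (Matheron) estimator γ(h) = 1/(2N(h)) · Σ (z_i − z_j)², with a distance tolerance that defines each lag class. It reports the pair count N(h) so that sparse lags are visible. The coordinates, values, lags and tolerance are hypothetical.

```python
import math

def empirical_semivariogram(coords, values, lags, tol):
    """Matheron estimator; returns (lag, gamma or None, pair count) per lag."""
    out = []
    for h in lags:
        sq, count = 0.0, 0
        for i in range(len(coords)):
            for j in range(i + 1, len(coords)):
                # Keep the pair if its separation falls within this lag class
                if abs(math.dist(coords[i], coords[j]) - h) <= tol:
                    sq += (values[i] - values[j]) ** 2
                    count += 1
        out.append((h, sq / (2 * count) if count else None, count))
    return out
```

On four points spaced 1 m apart along a transect with values 0, 1, 2, 3, the 1 m lag has three pairs and γ = 0.5, while the 2 m lag has only two pairs and γ = 2.0.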