937 results for principal components analysis (PCA) algorithm
Abstract:
Background: The HCL-32 is a widely used screening questionnaire for hypomania. We aimed to use a Rasch analysis approach to (i) evaluate the measurement properties, principally unidimensionality, of the HCL-32, and (ii) generate a score table that allows researchers to convert raw HCL-32 scores into interval-level measurements, which will be more appropriate for statistical analyses. Methods: Participants (n=389) with DSM-IV bipolar disorder were drawn from the Bipolar Disorder Research Network (BDRN) study. Multidimensionality was assessed using Rasch fit statistics and principal components analysis (PCA) of the residuals. Item invariance (differential item functioning, DIF) was tested for gender, bipolar diagnosis, and current mental state. Item estimates and reliabilities were calculated. Results: Three items (29, 30, 32) had unacceptable fit to the unidimensional Rasch model. Item 14 displayed significant DIF for gender, and items 8 and 17 for current mental state. Item estimates confirmed that not all items measure hypomania equally. Limitations: This sample was recruited as part of a large ongoing genetic epidemiology study of bipolar disorder and may not be fully representative of the broader clinical population of individuals with bipolar disorder. Conclusion: The HCL-32 is unidimensional in practice, but measurement may be further strengthened by removing four items. Re-scored linear measurements may be more appropriate for clinical research.
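The residual-PCA check named in the Methods can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions: responses are dichotomous (0/1), and the person locations `theta` and item difficulties `b` are simulated and treated as known, whereas a real Rasch analysis first estimates them with dedicated software. The function name `rasch_residual_pca` is hypothetical.

```python
import numpy as np


def rasch_residual_pca(responses, theta, b):
    """Eigenvalues of the correlation matrix of standardized Rasch residuals.

    responses: (n_persons, n_items) array of 0/1 answers
    theta: (n_persons,) person locations; b: (n_items,) item difficulties
    (both assumed known here; in practice they are estimated first)
    """
    # Model-expected probability of a "yes" under the dichotomous Rasch model
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    # Standardized residuals: (observed - expected) / model standard deviation
    z = (responses - p) / np.sqrt(p * (1.0 - p))
    # PCA of residuals = eigendecomposition of their item-by-item correlations
    corr = np.corrcoef(z, rowvar=False)
    return np.linalg.eigvalsh(corr)[::-1]  # largest eigenvalue first


# Simulated data that satisfy the Rasch model by construction (hypothetical)
rng = np.random.default_rng(0)
n_persons, n_items = 389, 32
theta = rng.normal(0.0, 1.0, n_persons)
b = np.linspace(-2.0, 2.0, n_items)
p_true = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
responses = (rng.random((n_persons, n_items)) < p_true).astype(float)

eigvals = rasch_residual_pca(responses, theta, b)
print(f"first contrast eigenvalue: {eigvals[0]:.2f} (of {n_items} total)")
```

Because these simulated data are unidimensional by construction, the first contrast should stay small; in applied work, a first-contrast eigenvalue well above about 2 is often read as a hint of a secondary dimension, although thresholds vary.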
Abstract:
Determining combustion metrics for a diesel engine has the potential to provide feedback for closed-loop combustion phasing control to meet current and upcoming emission and fuel consumption regulations. This thesis focused on estimating combustion metrics including start of combustion (SOC), crank angle location of 50% cumulative heat release (CA50), peak pressure crank angle location (PPCL), peak pressure amplitude (PPA), peak apparent heat release rate crank angle location (PACL), mean absolute pressure error (MAPE), and peak apparent heat release rate amplitude (PAA). In-cylinder pressure has been used in the laboratory as the primary means of characterizing combustion rates and, more recently, in series production vehicles for feedback control. However, intrusive measurement with an in-cylinder pressure sensor is expensive and requires a special mounting process and modifications to the engine structure. As an alternative, this work investigated block-mounted accelerometers for estimating combustion metrics in a 9L I6 diesel engine. This requires modeling the transfer path between the accelerometer signal and the in-cylinder pressure signal. Given a suitable transfer path model, the in-cylinder pressure signal and the combustion metrics can be accurately recovered from the accelerometer signals. The method used to determine the transfer path, and its applicability, are therefore critical when using accelerometers for feedback. The single-input single-output (SISO) frequency response function (FRF) is the most common transfer path model; however, it is shown here to have low robustness across varying engine operating conditions. This thesis examines mechanisms to improve the robustness of the FRF for combustion metrics estimation. First, an adaptation process based on the particle swarm optimization algorithm was developed and added to the SISO model.
Second, a multiple-input single-output (MISO) FRF model coupled with principal component analysis and an offset compensation process was investigated and applied. Both approaches improved the robustness of the FRF. Furthermore, a neural network was investigated as a nonlinear model of the transfer path between the accelerometer signal and the apparent heat release rate. This dissertation also investigated the transfer path between acoustic emissions and the in-cylinder pressure signal on a 1.9L TDI high-pressure common rail (HPCR) diesel engine. Acoustic emissions are an important factor in the powertrain development process. In this part of the research, a transfer path between the two signals was developed and then used to predict the engine noise level with the measured in-cylinder pressure as the input. Three transfer path modeling methods were applied; the method based on cepstral smoothing gave the most accurate results, with an averaged estimation error of 2 dBA and a root mean square error of 1.5 dBA. Finally, a linear model for estimating engine noise level was proposed, with the in-cylinder pressure signal and the engine speed as inputs.
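A SISO FRF of the kind described above is commonly estimated with the H1 estimator: the input-output cross-spectrum divided by the input auto-spectrum. The Python sketch below (NumPy and SciPy) uses synthetic signals only; white noise stands in for the accelerometer signal, and a hypothetical three-tap FIR filter stands in for the transfer path. Neither the filter nor the sampling rate comes from the thesis, and the sketch omits the PSO adaptation and MISO extensions.

```python
import numpy as np
from scipy import signal

fs = 10_000.0                           # sampling rate in Hz (arbitrary)
rng = np.random.default_rng(1)
x = rng.standard_normal(2**16)          # stand-in for the accelerometer signal
b_true = np.array([0.5, 0.3, 0.2])      # hypothetical FIR "transfer path"
y = signal.lfilter(b_true, [1.0], x)    # stand-in for the pressure-related signal

nperseg = 1024
# Welch-averaged auto- and cross-spectra (no detrending, so x -> y stays linear)
f, s_xx = signal.welch(x, fs=fs, nperseg=nperseg, detrend=False)
_, s_xy = signal.csd(x, y, fs=fs, nperseg=nperseg, detrend=False)
h1 = s_xy / s_xx                        # H1 estimate of the FRF at frequencies f

# Compare with the known response of the synthetic transfer path
_, h_true = signal.freqz(b_true, [1.0], worN=f, fs=fs)
max_err = np.max(np.abs(np.abs(h1) - np.abs(h_true)))
print(f"max magnitude error: {max_err:.4f}")
```

In an application, `h1` would be estimated at one operating condition and then applied to new accelerometer data; the abstract's point is that such a fixed SISO estimate loses accuracy as operating conditions change.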
Abstract:
This thesis is concerned with change point analysis for time series, i.e., the detection of structural breaks in time-ordered random data. This long-standing research field has regained popularity over the last few years and, like statistical analysis in general, is still undergoing a transformation toward high-dimensional problems. We focus on the fundamental »change in the mean« problem and extend the classical non-parametric Darling-Erdős-type cumulative sum (CUSUM) testing and estimation theory to high-dimensional Hilbert space settings. In the first part, we contribute to (long-run) principal component based testing methods for Hilbert space valued time series under a rather broad change setting (abrupt, epidemic, gradual, multiple) and under dependence. For the dependence structure, we consider either traditional m-dependence assumptions or more recently developed m-approximability conditions, which cover, e.g., MA, AR, and ARCH models. We derive Gumbel and Brownian bridge type approximations of the distribution of the test statistic under the null hypothesis of no change, as well as consistency conditions under the alternative. A new formulation of the test statistic using projections onto subspaces allows us to simplify the standard proof techniques and to weaken common assumptions on the covariance structure. Furthermore, we propose adjusting the principal components by implicitly estimating a (possible) change direction. This approach adds flexibility to projection-based methods, weakens typical technical conditions, and provides better consistency properties under the alternative. In the second part, we contribute to estimation methods for common changes in the means of panels of Hilbert space valued time series. We analyze weighted CUSUM estimates within a recently proposed »high-dimensional low sample size (HDLSS)« framework, in which the sample size is fixed but the number of panels increases.
We derive sharp conditions on the »pointwise asymptotic accuracy« or »uniform asymptotic accuracy« of these estimates in terms of the weighting function. In particular, we prove that a covariance-based correction of Darling-Erdős-type CUSUM estimates is required to guarantee uniform asymptotic accuracy under moderate dependence conditions within panels, and that these conditions are fulfilled, e.g., by any MA(1) time series. As a counterexample, we show that for AR(1) time series close to the non-stationary case, the dependence is too strong and uniform asymptotic accuracy cannot be ensured. Finally, we conduct simulations to demonstrate that our results are practically applicable and that our methodological suggestions are advantageous.
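The first part of this abstract builds on projecting Hilbert-space data onto principal components and applying a CUSUM to the projections. A minimal Python/NumPy sketch of that idea for curves observed on a common grid follows. It assumes independent observations and a single abrupt change, uses the ordinary (not long-run) covariance, and does not implement the thesis's adjusted components or asymptotic critical values; the function name `pca_cusum_changepoint` is hypothetical.

```python
import numpy as np


def pca_cusum_changepoint(curves, n_components=2):
    """Projection-based CUSUM for one change in the mean of discretized curves.

    curves: (n, p) array; each row is one curve observed on a common grid.
    Returns (k_hat, statistic): the change is estimated to occur after
    observation k_hat (1-based count of pre-change observations).
    """
    n = curves.shape[0]
    centered = curves - curves.mean(axis=0)
    # Leading principal directions of the pooled sample covariance (via SVD)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T             # (n, d) projections
    eigvals = s[:n_components] ** 2 / n                 # sample eigenvalues
    # Partial sums of centered scores equal S_k - (k/n) S_n, the CUSUM process
    cusum = np.cumsum(scores, axis=0)
    # Squared CUSUM standardized by the eigenvalues, summed over components
    stat = (cusum ** 2 / eigvals).sum(axis=1) / n
    k_hat = int(np.argmax(stat[:-1])) + 1               # last entry is always ~0
    return k_hat, float(stat.max())


# Synthetic curves (hypothetical): i.i.d. noise, mean shift after observation 120
rng = np.random.default_rng(3)
n, p, true_k = 200, 50, 120
grid = np.linspace(0.0, 1.0, p)
curves = rng.standard_normal((n, p))
curves[true_k:] += np.sin(np.pi * grid)                 # change in the mean function
k_hat, stat = pca_cusum_changepoint(curves)
print(k_hat, round(stat, 2))
```

Estimating the components from the pooled data lets the change direction dominate the first component here; the abstract's proposal to adjust the components toward an estimated change direction targets cases where this does not happen on its own.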
Abstract:
The current approach to data analysis for the Laser Interferometer Space Antenna (LISA) depends on time delay interferometry (TDI) observables, which have to be generated before any weak-signal detection can be performed. These are linear combinations of the raw data with appropriate time shifts that cancel the laser frequency noises. This is possible because the same noises occur multiple times in the different raw data streams. Originally, these observables were derived by hand, first for LISA as a simple stationary array, and then adjusted to incorporate the antenna's motions. However, none of the observables survived the flexing of the arms, in the sense that they no longer produced cancellation with the same structure. The principal component approach presented by Romano and Woan is another way of handling these noises; it simplifies the data analysis by removing the need to construct the observables before the analysis. This method also depends on the multiple occurrences of the same noises, but instead of using them for cancellation, it exploits the correlations they produce between the different readings. These correlations can be expressed in a noise (data) covariance matrix, which appears in the Bayesian likelihood function when the noises are assumed to be Gaussian. Romano and Woan showed that an eigendecomposition of this matrix produces two distinct sets of eigenvalues, which can be distinguished by the absence of laser frequency noise from one set. Transforming the raw data with the corresponding eigenvectors also produced data free of laser frequency noise. This result led to the idea that the principal components may actually be time delay interferometry observables, since they produce the same outcome: data that are free of laser frequency noise.
The aims here were (i) to investigate the connection between the principal components and these observables, (ii) to prove that data analysis using them is equivalent to analysis using the traditional observables, and (iii) to determine how this method adapts to the real LISA, especially the flexing of the antenna. To test the connection between the principal components and the TDI observables, a 10 × 10 covariance matrix containing integer values was used so that the eigendecomposition could be solved algebraically. The matrix was generated using fixed unequal arm lengths and stationary noises with equal variances for each noise type. The results confirm that all four Sagnac observables can be generated from the eigenvectors of the principal components. However, the observables obtained with this method are tied to the length of the data and are not general expressions like the traditional observables; for example, the Sagnac observables for two different time stamps were generated from different sets of eigenvectors. It was also possible to generate the frequency-domain optimal AET observables from the principal components obtained from the power spectral density matrix. These results indicate that this method is another way of producing the observables, so analysis using principal components should give the same results as analysis using the traditional observables. This was demonstrated by the fact that the same relative likelihoods (within 0.3%) were obtained from the Bayesian estimates of the signal amplitude of a simple sinusoidal gravitational wave using either the principal components or the optimal AET observables. The method fails if the eigenvalues that are free of laser frequency noise are not generated. These eigenvalues are obtained from the covariance matrix, whose computation requires the following properties of LISA: phase-locking, arm lengths, and noise variances.
Preliminary results on the effects of these properties on the principal components indicate that only the absence of phase-locking prevented their production. Flexing of the antenna produces time-varying arm lengths, which appear in the covariance matrix; in our toy model investigations, this did not prevent the principal components from occurring. The difficulty with flexing, and also with non-stationary noises, is that it destroys the Toeplitz structure of the matrix, which affects any computational method that exploits this structure. Separating the two sets of data for the analysis was not necessary, because the laser frequency noises are very large compared with the photodetector noises, which significantly reduced the contribution of the data containing them after the matrix inversion. In the frequency domain, the power spectral density matrices were block diagonal, which simplified the computation of the eigenvalues by allowing each block to be handled separately. In general, the results showed a lack of principal components in the absence of phase-locking, except in the zero-frequency bin. The major difference with the power spectral density matrix is that the time-varying arm lengths and non-stationarity do not show up, because of the summation in the Fourier transform.
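The eigendecomposition argument can be illustrated with a deliberately reduced example in Python/NumPy: two readings share one laser-noise term and carry independent photodetector noise, with no time delays, arm lengths, or phase-locking modeled. This toy model is our assumption, not the 10 × 10 integer matrix used in the work; it only shows why one set of eigenvalues, and the data projected onto its eigenvectors, are free of laser noise.

```python
import numpy as np

# Hypothetical toy model: two readings share one laser-noise term p and have
# independent photodetector noises n1, n2 (no time delays or arm lengths).
var_laser = 1e8     # laser frequency noise variance (dominant)
var_det = 1.0       # photodetector noise variance

# Covariance of (y1, y2) where y1 = p + n1 and y2 = p + n2
cov = np.array([[var_laser + var_det, var_laser],
                [var_laser, var_laser + var_det]])
eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
laser_free = eigvecs[:, 0]                  # eigenvector of the small eigenvalue

# Project simulated readings onto it: the shared laser noise cancels
rng = np.random.default_rng(4)
m = 10_000
p = rng.normal(0.0, np.sqrt(var_laser), m)
readings = p[:, None] + rng.normal(0.0, np.sqrt(var_det), (m, 2))
projected = readings @ laser_free
print(eigvals)               # one eigenvalue equal to var_det, one to 2*var_laser + var_det
print(np.var(projected))     # close to var_det: laser noise removed
```

Analytically, the eigenvector of the small eigenvalue is proportional to (1, -1), so the projection is a difference of the two readings, the same kind of cancellation a TDI combination achieves.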
Abstract:
Facial attractiveness is a particularly salient social cue that influences many important social outcomes. Using a standard key-press task to measure motivational salience of faces and an old/new memory task to measure memory for face photographs, this thesis investigated both within-woman and between-women variations in response to facial attractiveness. The results indicated that within-woman variables, such as fluctuations in hormone levels, influenced the motivational salience of facial attractiveness. However, the between-women variable, romantic relationship status, did not appear to modulate women’s responses to facial attractiveness. In addition to attractiveness, dominance also contributed to both the motivational salience and memorability of faces. This latter result demonstrates that, although attractiveness is an important factor for the motivational salience of faces, other factors might also cause faces to hold motivational salience. In Chapter 2, I investigated the possible effects of women’s salivary hormone levels (estradiol, progesterone, testosterone, and estradiol-to-progesterone ratio) on the motivational salience of facial attractiveness. Physically attractive faces generally hold greater motivational salience, replicating results from previous studies. Importantly, however, the effect of attractiveness on the motivational salience of faces was greater in test sessions where women had high testosterone levels. Additionally, the motivational salience of attractive female faces was greater in test sessions where women had high estradiol-to-progesterone ratios. While results from Chapter 2 suggested that the motivational salience of faces was generally positively correlated with their physical attractiveness, Chapter 3 explored whether physical characteristics other than attractiveness contributed to the motivational salience of faces. To address this issue, I first had the faces rated on multiple traits. 
Principal component analysis of third-party ratings of the faces on these traits revealed two orthogonal components that were highly correlated with trustworthiness and dominance ratings, respectively. Both components were positively and independently related to the motivational salience of faces. While Chapters 2 and 3 did not examine between-women differences in response to facial attractiveness, Chapter 4 examined whether women’s responses to facial attractiveness differed as a function of their romantic partnership status. Because several researchers have proposed that partnership status influences women’s perception of attractiveness, in Chapter 4 I compared the effects of men’s attractiveness on partnered and unpartnered women’s performance on two response measures: memory for face photographs and the motivational salience of faces. Consistent with previous research, women’s memory was poorer for face photographs of more attractive men, and more attractive men’s faces held greater motivational salience. However, in neither study were the effects of attractiveness modulated by women’s partnership status or by partnered women’s reported commitment to, or happiness with, their romantic relationship. A key result from Chapter 4 was that more attractive faces were harder to remember. Building on this result, Chapter 5 investigated the different characteristics that contributed to the memorability of face photographs. While some work emphasizes relationships with typicality, familiarity, and memorability ratings, more recent work suggests that ratings of social traits, such as attractiveness, intelligence, and responsibility, predict the memorability of face photographs independently of typicality, familiarity, and memorability ratings. However, it remains unknown which components underlie these traits and whether these components relate to the actual memorability of face photographs.
Principal component analysis of all these face ratings produced three orthogonal components that were highly correlated with trustworthiness, dominance, and memorability ratings, respectively. Importantly, each of these components also predicted the actual memorability of face photographs.
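The component solutions described in this abstract come from PCA of face ratings. The Python/NumPy sketch below shows the basic computation on simulated ratings: standardize each trait, take the SVD, and read off component scores, loadings, and explained variance. The traits, two-factor structure, and noise level are hypothetical, and the sketch extracts unrotated components; the thesis may have used a rotation or other settings not stated in the abstract.

```python
import numpy as np


def pca_on_ratings(ratings, n_components):
    """Unrotated PCA on z-scored ratings (rows = faces, columns = traits).

    Returns component scores, loadings (traits x components) and the
    proportion of variance explained by each retained component.
    """
    z = (ratings - ratings.mean(axis=0)) / ratings.std(axis=0, ddof=1)
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = z @ vt[:n_components].T
    loadings = vt[:n_components].T
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return scores, loadings, explained


# Hypothetical ratings: 100 faces x 6 traits driven by two latent factors
rng = np.random.default_rng(5)
latent = rng.standard_normal((100, 2))
mixing = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, -0.1],    # "trust-like" traits
                   [0.0, 0.9], [0.1, 0.8], [-0.1, 0.7]])   # "dominance-like" traits
ratings = latent @ mixing.T + 0.4 * rng.standard_normal((100, 6))

scores, loadings, explained = pca_on_ratings(ratings, n_components=2)
print(np.round(explained, 2))
print(np.round(np.corrcoef(scores.T)[0, 1], 6))   # components are uncorrelated
```

The component scores are uncorrelated by construction, which is what "orthogonal components" means here; relating each component to a named trait (trustworthiness, dominance) is then a matter of correlating the scores with the individual trait ratings.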
Abstract:
Blast is a major disease of rice in Brazil, the largest rice-producing country outside Asia. This study aimed to assess the genetic structure and mating-type frequency in a contemporary Pyricularia oryzae population that caused widespread epidemics during the 2012/13 season in the Brazilian lowland subtropical region. Symptomatic leaves and panicles were sampled in flooded rice fields in the states of Rio Grande do Sul (RS, 34 fields) and Santa Catarina (SC, 21 fields). Polymorphism at ten simple sequence repeat (SSR, or microsatellite) loci and the presence of the MAT1-1 or MAT1-2 idiomorph were assessed in a population of 187 isolates. Only the MAT1-2 idiomorph was found, and 162 genotypes were identified by the SSR analysis. A discriminant analysis of principal components (DAPC) of the SSR data resolved four genetic groups, which were strongly associated with the cultivar of origin of the isolates. There was a high level of genotypic diversity and a moderate level of gene diversity regardless of whether isolates were grouped into subpopulations by geographic region, cultivar host, or cultivar within region. While regional subpopulations were weakly differentiated, high genetic differentiation was found among subpopulations composed of isolates from different cultivars. The data suggest that the rice blast pathogen population in southern Brazil consists of clonal lineages that are adapting to specific cultivar hosts. Farmers should avoid planting susceptible cultivars over large areas, and breeders should focus on broadening the genetic basis of new cultivars.
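DAPC, as used above, first reduces the genotype data with PCA and then runs a discriminant analysis on the retained components. The Python sketch below (NumPy and SciPy) mirrors only that two-step idea on simulated numeric data; the data, group sizes, and the numbers of retained PCs and discriminant axes are hypothetical. Published analyses usually rely on dedicated tools such as the `dapc` function in the R package adegenet, and the function name `dapc` below is our own.

```python
import numpy as np
from scipy.linalg import eigh


def dapc(x, groups, n_pcs, n_axes):
    """Minimal DAPC sketch: PCA for dimension reduction, then Fisher LDA.

    x: (n_samples, n_features) numeric matrix (e.g., allele counts per locus)
    groups: (n_samples,) group labels
    Returns discriminant scores (n_samples, n_axes) and their eigenvalues.
    """
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    pcs = centered @ vt[:n_pcs].T                  # step 1: keep leading PCs
    grand_mean = pcs.mean(axis=0)
    s_within = np.zeros((n_pcs, n_pcs))
    s_between = np.zeros((n_pcs, n_pcs))
    for g in np.unique(groups):
        members = pcs[groups == g]
        mean_g = members.mean(axis=0)
        resid = members - mean_g
        s_within += resid.T @ resid                # within-group scatter
        d = (mean_g - grand_mean)[:, None]
        s_between += len(members) * (d @ d.T)      # between-group scatter
    # step 2: discriminant axes solve S_b w = lambda S_w w
    eigvals, eigvecs = eigh(s_between, s_within)
    order = np.argsort(eigvals)[::-1][:n_axes]
    return pcs @ eigvecs[:, order], eigvals[order]


# Hypothetical data: 4 groups x 40 isolates, 30 numeric features
rng = np.random.default_rng(6)
labels = np.repeat(np.arange(4), 40)
centers = rng.normal(0.0, 3.0, (4, 30))
x = centers[labels] + rng.standard_normal((160, 30))
scores, eigvals = dapc(x, labels, n_pcs=10, n_axes=3)

# Nearest-centroid assignment in discriminant space
centroids = np.array([scores[labels == g].mean(axis=0) for g in range(4)])
pred = np.argmin(((scores[:, None, :] - centroids[None]) ** 2).sum(axis=2), axis=1)
print(f"resubstitution accuracy: {np.mean(pred == labels):.2f}")
```

Reducing to a few PCs first keeps the within-group scatter matrix well conditioned when there are many loci relative to isolates, which is the main reason for the PCA step.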
Abstract:
2016
Abstract:
In this work, the volatile chromatographic profiles of roasted Arabica coffees, previously analyzed for their sensory attributes, were explored by principal component analysis. Volatiles were extracted by solid-phase microextraction. The correlation optimized warping algorithm was used to align the gas chromatographic profiles. Fifty-four compounds were found to be related to the sensory attributes investigated. The volatiles pyrrole, 1-methyl-pyrrole, cyclopentanone, dihydro-2-methyl-3-furanone, furfural, 2-ethyl-5-methyl-pyrazine, 2-ethenyl-n-methyl-pyrazine, and 5-methyl-2-propionyl-furan were important for differentiating the coffee beverages according to flavour, cleanliness, and overall quality. Two figures of merit, sensitivity and specificity (or selectivity), were used to interpret the sensory attributes studied.
Abstract:
A modified version of the intruder-resident paradigm was used to investigate whether social recognition memory lasts at least 24 h. One hundred and forty-six adult male Wistar rats were used. Independent groups of rats were exposed to an intruder for 0.083, 0.5, 2, 24, or 168 h and tested 24 h after the first encounter, with either the familiar or a different conspecific. Factor analysis was employed to identify associations between behaviors and treatments. Resident rats exhibited a 24-h social recognition memory, as indicated by a 3- to 5-fold decrease in social behaviors during the second encounter with the same conspecific compared with those observed for a different conspecific, when the first encounter lasted 2 h or longer. Two different categories of social behaviors could be distinguished, and their expression depended on the duration of the first encounter. Sniffing the anogenital area (49.9% of the social behaviors), sniffing the body (17.9%), sniffing the head (3%), and following the conspecific (3.1%), exhibited mostly by resident rats, characterized social investigation and revealed long-term social recognition memory. However, dominance (23.8%) and mild aggression (2.3%), exhibited by both residents and intruders, characterized social agonistic behaviors and were not affected by memory. In contrast, sniffing the environment (76.8% of the non-social behaviors) and rearing (14.3%), both exhibited mostly by adult intruder rats, characterized non-social behaviors. Together, these results show that social recognition memory in rats may last at least 24 h after an exposure to the conspecific of 2 h or longer.
Abstract:
Physicochemical characteristics (color, pH, total titratable acidity, total soluble solids, lipid content, and moisture) and levels of bioactive compounds (ascorbic acid, total phenolics) were determined in fifteen fruit pulp samples from the Amazon region (abiu, acerola, açaí, araçá-boi, bacaba, bacuri, buriti, cajá, cajarana, cashew, cupuaçu, soursop, murici, noni, and tamarind). Free radical scavenging activity was evaluated by the ABTS method. Some pulps showed high antioxidant potential, associated with the free radical scavenging activity obtained and with the contents of bioactive components such as phenolic compounds and ascorbic acid; acerola and açaí stood out. The total phenolic content was correlated with the antioxidant capacity of the pulps.
Abstract:
OBJECTIVES: to identify children's dietary patterns and their association with family socioeconomic status. METHODS: a cross-sectional study of 1260 children aged 4 to 11 years living in Salvador, Bahia, which included a semi-quantitative Food Frequency Questionnaire. Dietary patterns were identified using principal components factor analysis. Socioeconomic status was assessed with a composite socioeconomic indicator. Multivariate logistic regression was used. RESULTS: four patterns were identified, explaining 45.9% of the variability in the food frequency data. Children from the highest socioeconomic level had 1.60 times higher odds (p<0.001) of a higher frequency of consumption of foods from pattern 1 (fruits, vegetables, legumes, cereals, and fish) and 3.09 times higher odds (p<0.001) of a higher frequency of consumption of foods from pattern 2 (milk/dairy products, ketchup/mayonnaise/mustard, and chicken), compared with children from the lowest socioeconomic level. The opposite was observed for pattern 4 (processed meats, eggs, and red meat); that is, the higher the socioeconomic level, the lower the odds of adopting this pattern. A similar trend was noted for pattern 3 (fried foods, sweets, salty snacks, soft drinks/artificial juice). CONCLUSIONS: children's dietary patterns depend on the socioeconomic conditions of their families, and the adoption of healthier food items is associated with higher socioeconomic groups.
Abstract:
The supervised pattern recognition methods K-Nearest Neighbors (KNN), stepwise discriminant analysis (SDA), and soft independent modelling of class analogy (SIMCA) were employed in this work to investigate the relationship between the molecular structure of 27 cannabinoid compounds and their analgesic activity. In previous analyses using two unsupervised pattern recognition methods (principal component analysis, PCA, and hierarchical cluster analysis, HCA), five descriptors were selected as the most relevant for the analgesic activity of the compounds studied: R3 (charge density on the substituent at position C3), Q1 (charge on atom C1), A (surface area), log P (logarithm of the partition coefficient), and MR (molecular refractivity). The supervised methods (SDA, KNN, and SIMCA) were employed to construct a reliable model able to predict the analgesic activity of new cannabinoid compounds and to validate our previous study. The results obtained using SDA, KNN, and SIMCA agree perfectly with our previous model. Comparing the SDA, KNN, and SIMCA results with the PCA and HCA results, all the multivariate statistical methods classified the cannabinoid compounds studied into the same three groups: active, moderately active, and inactive.
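KNN, one of the supervised methods named above, assigns a compound to the class most common among its k nearest neighbors in descriptor space. The Python/NumPy sketch below autoscales the descriptors first, because descriptors such as surface area and charge density sit on very different numeric scales and would otherwise dominate the distance unevenly. The descriptor values, class coding, and k = 3 are hypothetical and do not come from the study.

```python
import numpy as np


def autoscale(train, query):
    """Z-score each descriptor using the training-set mean and standard deviation."""
    mean, std = train.mean(axis=0), train.std(axis=0, ddof=1)
    return (train - mean) / std, (query - mean) / std


def knn_predict(train_x, train_y, query_x, k=3):
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    dists = np.linalg.norm(query_x[:, None, :] - train_x[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    preds = []
    for idx in nearest:
        values, counts = np.unique(train_y[idx], return_counts=True)
        preds.append(values[np.argmax(counts)])   # ties go to the smallest label
    return np.array(preds)


# Hypothetical descriptors on very different scales (two columns shown)
train_x = np.array([[0.10, 300.0], [0.12, 310.0], [0.11, 305.0],
                    [0.50, 500.0], [0.52, 510.0], [0.49, 495.0]])
train_y = np.array([0, 0, 0, 2, 2, 2])            # e.g., 0 = inactive, 2 = active
query_x = np.array([[0.11, 302.0], [0.51, 505.0]])
train_s, query_s = autoscale(train_x, query_x)
print(knn_predict(train_s, train_y, query_s, k=3))  # [0 2]
```

Scaling the query with the training-set statistics, rather than its own, keeps the prediction of a new compound independent of which other compounds happen to be predicted alongside it.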
Abstract:
Normal mixture models are increasingly used to model the distributions of a wide variety of random phenomena and to cluster sets of continuous multivariate data. However, for data containing one or more groups of observations with longer-than-normal tails or atypical observations, the use of normal components may unduly affect the fit of the mixture model. In this paper, we consider a more robust approach: modelling the data with a mixture of t distributions. The use of the ECM algorithm to fit this t mixture model is described, and examples of its use are given in the context of clustering multivariate data in the presence of atypical observations in the form of background noise.
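The robustness argument above rests on the latent weights of the t distribution, which shrink the influence of atypical points on the estimated means and scales. A minimal Python sketch (NumPy and SciPy) for a two-component univariate t mixture follows. It holds the degrees of freedom ν fixed, so each iteration reduces to ordinary EM; as we understand it, the ECM algorithm in the paper is needed mainly when ν is estimated as well. The data, ν = 4, starting values, and iteration count are assumptions, and the paper's setting is multivariate.

```python
import numpy as np
from scipy import stats


def fit_t_mixture(x, nu=4.0, n_iter=200):
    """EM for a two-component univariate t mixture with fixed degrees of freedom nu."""
    pi = np.array([0.5, 0.5])
    mu = np.percentile(x, [25, 75]).astype(float)
    sigma2 = np.full(2, x.var())
    for _ in range(n_iter):
        # E-step: component responsibilities tau and latent weights u
        dens = np.column_stack([
            pi[k] * stats.t.pdf(x, df=nu, loc=mu[k], scale=np.sqrt(sigma2[k]))
            for k in range(2)
        ])
        tau = dens / dens.sum(axis=1, keepdims=True)
        delta2 = (x[:, None] - mu) ** 2 / sigma2
        u = (nu + 1.0) / (nu + delta2)            # small u for outlying points
        # M-step: closed-form updates because nu is not re-estimated
        pi = tau.mean(axis=0)
        w = tau * u
        mu = (w * x[:, None]).sum(axis=0) / w.sum(axis=0)
        sigma2 = (w * (x[:, None] - mu) ** 2).sum(axis=0) / tau.sum(axis=0)
    return pi, mu, sigma2


# Hypothetical data: two groups plus uniform background noise
rng = np.random.default_rng(7)
x = np.concatenate([rng.normal(-3.0, 1.0, 200),
                    rng.normal(3.0, 1.0, 200),
                    rng.uniform(-20.0, 20.0, 20)])
pi, mu, sigma2 = fit_t_mixture(x)
print(np.round(np.sort(mu), 2))
```

With normal components, the background points would inflate the fitted variances; here the weights `u` of far-out points are small, so they contribute little to `mu` and `sigma2`.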
Abstract:
This work concerns the influence of industrialized agriculture in the tropics on precipitation chemistry. A total of 264 rain events were sampled using a wet-only collector in central São Paulo State, Brazil, between January 2003 and July 2007. Electroneutrality balance calculations (considering H⁺, K⁺, Na⁺, NH₄⁺, Ca²⁺, Mg²⁺, Cl⁻, NO₃⁻, SO₄²⁻, F⁻, PO₄³⁻, H₃CCOO⁻, HCOO⁻, C₂O₄²⁻ and HCO₃⁻) showed an excess of cations (approximately 15%), which was attributed to unmeasured organic anion species originating from biomass burning and biogenic emissions. On average, the three ions NH₄⁺, NO₃⁻ and H⁺ accounted for >55% of the total ion concentrations in the rainwater samples. Concentrations (except for H⁺) were significantly higher (t-test; P = 0.05), by two- to six-fold depending on the species, during the winter sugar cane harvest period, owing to the practice of pre-harvest burning of the crop. Principal component analysis showed that three components could explain 88% of the variance for measurements made throughout the year: PC1 (52%, biomass burning and soil dust resuspension); PC2 (26%, secondary aerosols); PC3 (10%, road transport emissions). Differences between harvest and non-harvest periods appeared to be mainly due to an increased relative importance of road transport/industrial emissions during the summer (non-harvest) period. The volume-weighted mean (VWM) concentrations of ammonium (23.4 µmol L⁻¹) and nitrate (17.5 µmol L⁻¹) in rainwater samples collected during the harvest period were similar to those found in rainwater from São Paulo city, which emphasizes the importance of including rural agro-industrial emissions in regional-scale atmospheric chemistry and transport models.
Since there was evidence of a biomass burning source throughout the year, it appears that rainwater composition will continue to be affected by vegetation fires, even after sugar cane burning is phased out as envisaged by recent São Paulo State legislation. © 2011 Elsevier Ltd. All rights reserved.
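The electroneutrality check in this abstract compares charge-weighted sums: each ion's molar concentration is multiplied by its charge to give equivalents, and the cation and anion totals are compared. The Python sketch below uses hypothetical concentrations, and it expresses the cation excess relative to the cation sum; that denominator is our assumption, since the abstract does not state which convention produced its roughly 15% figure.

```python
# Charge numbers for the ions listed in the abstract
CHARGES = {
    "H+": 1, "K+": 1, "Na+": 1, "NH4+": 1, "Ca2+": 2, "Mg2+": 2,
    "Cl-": -1, "NO3-": -1, "SO4^2-": -2, "F-": -1, "PO4^3-": -3,
    "CH3COO-": -1, "HCOO-": -1, "C2O4^2-": -2, "HCO3-": -1,
}


def ion_balance(conc_umol_per_l):
    """Return cation and anion sums (µeq/L) and cation excess as % of the cation sum."""
    cations = sum(c * CHARGES[ion] for ion, c in conc_umol_per_l.items()
                  if CHARGES[ion] > 0)
    anions = sum(-c * CHARGES[ion] for ion, c in conc_umol_per_l.items()
                 if CHARGES[ion] < 0)
    excess_pct = 100.0 * (cations - anions) / cations
    return cations, anions, excess_pct


# Hypothetical concentrations in µmol/L (not data from the study)
sample = {"H+": 10.0, "NH4+": 20.0, "Ca2+": 5.0,
          "NO3-": 15.0, "SO4^2-": 5.0, "Cl-": 5.0}
print(ion_balance(sample))   # (40.0, 30.0, 25.0)
```

Divalent ions such as Ca²⁺ and SO₄²⁻ count twice per mole in this balance, which is why the comparison uses equivalents rather than raw molar concentrations.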
Abstract:
Exploratory factor analysis is a widely used statistical technique in the social sciences. It attempts to identify underlying factors that explain the pattern of correlations within a set of observed variables. A statistical software package is needed to perform the calculations. However, popular statistical software packages such as SPSS have some limitations. R is a free programming language and software environment for statistical computing and graphics. It offers many packages written by contributors from all over the world, as well as programming resources that allow it to overcome the limitations of SPSS dialogs. This paper offers an SPSS dialog written in R with the help of several packages, so that researchers with little or no programming knowledge, or those accustomed to performing their calculations through statistical dialogs, have more options when applying factor analysis to their data and can therefore adopt a better approach when dealing with ordinal, Likert-type data.