939 results for PRINCIPAL-COMPONENTS


Relevance:

70.00% 70.00%

Publisher:

Abstract:

The objectives of this research are to analyze and develop a modified Principal Component Analysis (PCA) and to develop a two-dimensional PCA with applications in image processing. PCA is a classical multivariate technique whose mathematical treatment is based purely on the eigensystem of positive-definite symmetric matrices. Its main function is to statistically transform a set of correlated variables into a new set of uncorrelated variables over $\mathbb{R}^{n}$ while retaining most of the variation present in the original variables. The variances of the Principal Components (PCs) obtained from the modified PCA form a correlation matrix of the original variables. The decomposition of this correlation matrix into a diagonal matrix produces an orthonormal basis that can be used to linearly transform the given PCs. It is this linear transformation that reproduces the original variables. The two-dimensional PCA can be devised as two successive one-dimensional PCAs. It can be shown that, for an $m\times n$ matrix, the PCs obtained from the two-dimensional PCA are the singular values of that matrix. In this research, several applications for image analysis based on PCA are developed, namely edge detection, feature extraction, and multi-resolution PCA decomposition and reconstruction.
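As a rough illustration of the classical PCA that this abstract starts from (not the author's modified or two-dimensional variant), the sketch below diagonalizes a sample covariance matrix with NumPy. The data are synthetic and the function name `pca_eig` is an assumption made for this example. It checks two textbook facts: the transformed variables are uncorrelated, and the eigenvalues equal the squared singular values of the centered data divided by n - 1.

```python
import numpy as np

def pca_eig(X):
    """Classical PCA via the eigensystem of the sample covariance matrix."""
    Xc = X - X.mean(axis=0)                  # center each variable
    cov = Xc.T @ Xc / (X.shape[0] - 1)       # symmetric, positive semi-definite
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                    # principal component scores
    return eigvals, eigvecs, scores

# Synthetic correlated variables (illustrative data only)
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
X = rng.normal(size=(200, 4)) @ A

eigvals, eigvecs, scores = pca_eig(X)

# Covariance of the scores: diagonal, with the eigenvalues on the diagonal
score_cov = np.cov(scores, rowvar=False)

# Singular values of the centered data relate to the same eigenvalues
sing = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
eig_from_svd = sing**2 / (X.shape[0] - 1)
```

The link to singular values shown here is the ordinary one-dimensional relationship; the abstract's statement about two-dimensional PCA is a separate result that this sketch does not attempt to reproduce.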

Relevance:

70.00% 70.00%

Publisher:

Abstract:

Prices of U.S. Treasury securities vary over time and across maturities. When the market in Treasurys is sufficiently complete and frictionless, these prices may be modeled by a function of time and maturity. A cross-section of this function for time held fixed is called the yield curve; the aggregate of these sections is the evolution of the yield curve. This dissertation studies aspects of this evolution. There are two complementary approaches to the study of yield curve evolution here. The first is principal components analysis; the second is wavelet analysis. In both approaches, both the time and maturity variables are discretized. In principal components analysis, the vectors of yield curve shifts are viewed as observations of a multivariate normal distribution. The resulting covariance matrix is diagonalized; the resulting eigenvalues and eigenvectors (the principal components) are used to draw inferences about the yield curve evolution. In wavelet analysis, the vectors of shifts are resolved into hierarchies of localized fundamental shifts (wavelets) that leave specified global properties invariant (average change and duration change). The hierarchies relate to the degree of localization, with movements restricted to a single maturity at the base and general movements at the apex. Second generation wavelet techniques allow better adaptation of the model to economic observables. Statistically, the wavelet approach is inherently nonparametric, while the wavelets themselves are better adapted to describing a complete market. Principal components analysis provides information on the dimension of the yield curve process. While there is no clear demarcation between operative factors and noise, the top six principal components pick up 99% of total interest rate variation 95% of the time. 
An economically justified basis of this process is hard to find; for example, a simple linear model will not suffice for the first principal component, and the shape of this component is nonstationary. Wavelet analysis works more directly with yield curve observations than principal components analysis. In fact, the complete process from bond data to multiresolution is presented, including the dedicated Perl programs and the details of the portfolio metrics and specially adapted wavelet construction. The result is more robust statistics, which provide balance to the more fragile principal components analysis.
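The principal components step described in this abstract (diagonalize the covariance matrix of yield curve shifts, then ask how many components are needed to reach a given share of variance) can be sketched as follows. Everything numeric here is an assumption for illustration: the maturity grid, the three hypothetical factor shapes, and the noise level are synthetic and do not come from the dissertation's Treasury data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical maturity grid (years) and number of daily observations
maturities = np.array([0.25, 0.5, 1, 2, 3, 5, 7, 10, 20, 30])
n_days = 500

# Three assumed factor shapes: parallel shift, tilt, and bend
level = np.ones_like(maturities)
slope = (maturities - maturities.mean()) / maturities.std()
bend = slope**2 - (slope**2).mean()
factors = np.vstack([level, slope, bend])

# Synthetic yield curve shifts: random factor loadings plus small noise
loadings = rng.normal(size=(n_days, 3)) * np.array([5.0, 2.0, 1.0])
noise = rng.normal(scale=0.1, size=(n_days, maturities.size))
shifts = loadings @ factors + noise

# Diagonalize the covariance matrix of the shift vectors
cov = np.cov(shifts, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)            # ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # descending order

# Cumulative share of total variance and components needed for 99%
explained = np.cumsum(eigvals) / eigvals.sum()
k99 = int(np.searchsorted(explained, 0.99) + 1)
```

Because the synthetic data are built from three factors, `k99` is small by construction; on real data the count depends on the sample and the period, which is consistent with the abstract's remark that the boundary between operative factors and noise is not sharp.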

Relevance:

70.00% 70.00%

Publisher:

Abstract:

The current approach to data analysis for the Laser Interferometer Space Antenna (LISA) depends on the time delay interferometry (TDI) observables, which have to be generated before any weak signal detection can be performed. These are linear combinations of the raw data with appropriate time shifts that lead to the cancellation of the laser frequency noises. This is possible because of the multiple occurrences of the same noises in the different raw data. Originally, these observables were manually generated starting with LISA as a simple stationary array and then adjusted to incorporate the antenna's motions. However, none of the observables survived the flexing of the arms, in that they did not lead to cancellation with the same structure. The principal component approach, presented by Romano and Woan, is another way of handling these noises; it simplifies the data analysis by removing the need to create the observables before the analysis. This method also depends on the multiple occurrences of the same noises but, instead of using them for cancellation, it takes advantage of the correlations that they produce between the different readings. These correlations can be expressed in a noise (data) covariance matrix, which occurs in the Bayesian likelihood function when the noises are assumed to be Gaussian. Romano and Woan showed that performing an eigendecomposition of this matrix produced two distinct sets of eigenvalues that can be distinguished by the absence of laser frequency noise from one set. The transformation of the raw data using the corresponding eigenvectors also produced data that were free from the laser frequency noises. This result led to the idea that the principal components may actually be time delay interferometry observables, since they produced the same outcome, that is, data that are free from laser frequency noise. 
The aims here were (i) to investigate the connection between the principal components and these observables, (ii) to prove that the data analysis using them is equivalent to that using the traditional observables and (iii) to determine how this method adapts to real LISA, especially the flexing of the antenna. For testing the connection between the principal components and the TDI observables, a 10 x 10 covariance matrix containing integer values was used in order to obtain an algebraic solution for the eigendecomposition. The matrix was generated using fixed unequal arm lengths and stationary noises with equal variances for each noise type. Results confirm that all four Sagnac observables can be generated from the eigenvectors of the principal components. The observables obtained from this method, however, are tied to the length of the data and are not general expressions like the traditional observables; for example, the Sagnac observables for two different time stamps were generated from different sets of eigenvectors. It was also possible to generate the frequency domain optimal AET observables from the principal components obtained from the power spectral density matrix. These results indicate that this method is another way of producing the observables; therefore, analysis using principal components should give the same results as that using the traditional observables. This was proven by the fact that the same relative likelihoods (within 0.3%) were obtained from the Bayesian estimates of the signal amplitude of a simple sinusoidal gravitational wave using the principal components and the optimal AET observables. This method fails if the eigenvalues that are free from laser frequency noises are not generated. These are obtained from the covariance matrix, and the properties of LISA that are required for its computation are the phase-locking, arm lengths and noise variances. 
Preliminary results of the effects of these properties on the principal components indicate that only the absence of phase-locking prevented their production. The flexing of the antenna results in time-varying arm lengths, which will appear in the covariance matrix and, from our toy model investigations, this did not prevent the occurrence of the principal components. The difficulty with flexing, and also with non-stationary noises, is that the Toeplitz structure of the matrix will be destroyed, which will affect any computation methods that take advantage of this structure. In terms of separating the two sets of data for the analysis, this was not necessary because the laser frequency noises are very large compared to the photodetector noises, which resulted in a significant reduction in the data containing them after the matrix inversion. In the frequency domain, the power spectral density matrices were block diagonal, which simplified the computation of the eigenvalues by allowing them to be done separately for each block. The results in general showed a lack of principal components in the absence of phase-locking except for the zero bin. The major difference with the power spectral density matrix is that the time-varying arm lengths and non-stationarity do not show up because of the summation in the Fourier transform.
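The core idea, that a noise shared across readings shows up as a separate set of eigenvalues of the covariance matrix and can be removed by projecting onto the remaining eigenvectors, can be illustrated with a heavily simplified toy model. This sketch is an assumption-laden stand-in, not the LISA model of the abstract: it has no time delays, arm lengths, or phase-locking, only one large common noise added to every channel plus small independent noise. The channel count and noise amplitudes are made up.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed toy parameters: 4 channels, one large common noise, small local noise
n_channels, n_samples = 4, 20000
sigma_laser, sigma_det = 1e3, 1.0

laser = rng.normal(scale=sigma_laser, size=n_samples)           # shared noise
det = rng.normal(scale=sigma_det, size=(n_samples, n_channels))  # independent noise
data = laser[:, None] + det                                      # raw readings

# Analytic covariance matrix of this toy model
cov = (sigma_laser**2 * np.ones((n_channels, n_channels))
       + sigma_det**2 * np.eye(n_channels))

# eigh returns eigenvalues in ascending order: the last one carries the
# common noise, the others contain only the small independent noise
eigvals, eigvecs = np.linalg.eigh(cov)
clean_vecs = eigvecs[:, :-1]

# Project the raw data onto the eigenvectors with small eigenvalues
projected = data @ clean_vecs

raw_std = data.std()
projected_std = projected.std()
```

In this toy case the small-eigenvalue eigenvectors are orthogonal to the all-ones vector, so the shared noise cancels exactly in `projected`. The real problem is harder because each noise appears with different time delays, which is what ties the covariance matrix to the arm lengths.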

Relevance:

70.00% 70.00%

Publisher:

Abstract:

Vigna unguiculata (L.) Walp (cowpea) is a food crop with high nutritional value that is cultivated throughout tropical and subtropical regions of the world. The main constraint on high productivity of cowpea is water deficit, caused by the long periods of drought that occur in these regions. The aim of the present study was to select elite cowpea genotypes with enhanced drought tolerance, by applying principal component analysis to 219 first-cycle progenies obtained in a recurrent selection program. The experimental design comprised a simple 15 x 15 lattice with 450 plots, each of two rows of 10 plants. Plants were grown under water-deficit conditions by applying a water depth of 205 mm representing one-half of that required by cowpea. Variables assessed were flowering, maturation, pod length, number and mass of beans/pod, mass of 100 beans, and productivity/plot. Ten elite cowpea genotypes were selected, in which principal components 1 and 2 encompassed variables related to yield (pod length, beans/pod, and productivity/plot) and life precocity (flowering and maturation), respectively.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Universidade Estadual de Campinas . Faculdade de Educação Física

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Universidade Estadual de Campinas . Faculdade de Educação Física

Relevance:

60.00% 60.00%

Publisher:

Abstract:

The origin and dispersal of the Tupiguarani peoples have been intensely debated among archaeologists and linguists over the past five decades. In short, the idea that these peoples, who occupied much of Brazilian territory and parts of Bolivia, Paraguay, Uruguay, and Argentina, had their ethnogenesis in Amazonia and from there set out to the east and south around 2,500 years before present is widely accepted among specialists, although a dispersal in the opposite direction, that is, from south to north, originating in the Tietê-Paraná basin, has not been completely ruled out. Among the archaeologists who regard Amazonia as the cradle of these peoples, some believe that this emergence took place in central Amazonia. Others believe that Tupiguarani ethnogenesis occurred in southwestern Amazonia, where the greatest linguistic diversity of the Tupi stock is concentrated today. In this work, the morphology of 19 skulls associated with Tupiguarani ceramics or ethnographically classified as such was compared with several prehistoric and ethnographic Brazilian cranial series by means of multivariate statistics. Two multivariate techniques were employed: Principal Component Analysis, applied to the centroids of each series, and Mahalanobis distances, applied to the individual data. The results suggest an Amazonian origin for the Tupiguarani peoples, above all because of the strong association found between Tupi and Guarani skulls from southeastern and southern Brazil and Tupi skulls from northern Brazil, on the one hand, and the specimens from Marajó Island included in the study, on the other.
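For readers unfamiliar with the second technique named in this abstract, the following minimal sketch computes squared Mahalanobis distances from one specimen to the centroids of two groups, using a pooled within-group covariance matrix. The group names, the three measurements per skull, and all values are hypothetical; the study's actual variables and series are not given in the abstract, and using a pooled covariance is an assumption of this example.

```python
import numpy as np

def mahalanobis_sq(x, centroid, cov_inv):
    """Squared Mahalanobis distance from x to a group centroid."""
    d = x - centroid
    return float(d @ cov_inv @ d)

rng = np.random.default_rng(3)

# Two hypothetical cranial series, 30 individuals x 3 measurements each
series = {
    "series_A": rng.normal(loc=[0.0, 0.0, 0.0], scale=1.0, size=(30, 3)),
    "series_B": rng.normal(loc=[5.0, 5.0, 5.0], scale=1.0, size=(30, 3)),
}

# Centroid (mean vector) of each series
centroids = {name: obs.mean(axis=0) for name, obs in series.items()}

# Pooled within-group covariance (assumes groups share one covariance)
n_total = sum(len(obs) for obs in series.values())
pooled = sum((len(obs) - 1) * np.cov(obs, rowvar=False) for obs in series.values())
pooled = pooled / (n_total - len(series))
cov_inv = np.linalg.inv(pooled)

# Distance from one hypothetical specimen to each centroid
specimen = np.array([4.5, 5.2, 4.8])
dists = {name: mahalanobis_sq(specimen, c, cov_inv) for name, c in centroids.items()}
nearest = min(dists, key=dists.get)
```

Unlike Euclidean distance, this measure rescales each direction by the within-group variability, so correlated measurements are not counted twice.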

Relevance:

60.00% 60.00%

Publisher:

Abstract:

The hedonic level of commercial cachaças was evaluated by consumers and by tasters. The results of the sensory methods, analyzed through Principal Components Analysis, Hierarchical Cluster Analysis and Pearson linear correlation, indicated that the best-classified cachaças were produced in copper stills and aged in oak casks. By contrast, the worst-classified ones were mainly characterized by being unaged and having a high alcohol percentage. The index of preference is positively correlated with the intensity of yellow color, wood flavor, sweetness and fruit aroma. Preference is negatively correlated with acidity, the taste of alcohol and bitterness.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

The concentrations of 14 organic acids in 50 sugarcane spirit samples were determined by gas chromatography with flame ionization detection. The quantitative organic acid profiles of still- and column-distilled spirits from wines obtained from the same must were compared. The comparison was also carried out on the "head", "heart" and "tail" fractions of still-distilled spirits. The experimental data were analyzed by Principal Components Analysis (PCA), which indicated that the distillation process (still or column) strongly influences the spirits' organic acid composition and that the producers' operational cut-offs used to produce the "tail", "heart" and "head" fractions should be optimized.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

OBJECTIVES: to identify the dietary patterns of children and their association with the socioeconomic level of their families. METHODS: a cross-sectional study of 1,260 children aged 4 to 11 years living in Salvador, Bahia, which included administration of a semi-quantitative Food Frequency Questionnaire. Dietary patterns were identified using principal component factor analysis. Socioeconomic level was assessed by means of a composite socioeconomic indicator. Multivariate logistic regression was employed. RESULTS: four patterns were identified, which explained 45.9% of the variability in the food frequency data. Children in the highest socioeconomic level were 1.60 times more likely (p<0.001) to have a higher frequency of consumption of foods in pattern 1 (fruits, vegetables, legumes, cereals, and fish) and 3.09 times more likely (p<0.001) to have a higher frequency of consumption of foods in pattern 2 (milk/dairy products, ketchup/mayonnaise/mustard, and chicken) than children in the lowest socioeconomic level. The opposite result was observed for pattern 4 (processed meats, eggs, and red meats); that is, the higher the socioeconomic level, the lower the chance of adopting this pattern. A similar trend was noted for pattern 3 (fried foods, sweets, snacks, soft drinks/artificial juice). CONCLUSIONS: children's dietary patterns depend on the socioeconomic conditions of their families, and the adoption of healthier food items is associated with the groups of higher socioeconomic level.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Objective: The biochemical alterations between inflammatory fibrous hyperplasia (IFH) and normal tissues of buccal mucosa were probed by using the FT-Raman spectroscopy technique. The aim was to find the minimal set of Raman bands that would furnish the best discrimination. Background: Raman-based optical biopsy is widely recognized as a potential technique for noninvasive real-time diagnosis. However, few studies have been devoted to the discrimination of very common subtle or early pathologic states such as inflammatory processes, which are always present on, for example, cancer lesion borders. Methods: Seventy spectra of IFH from 14 patients were compared with 30 spectra of normal tissues from six patients. The statistical analysis was performed with principal components analysis and soft independent modeling of class analogy, cross-validated by the leave-one-out method. Results: Bands close to 574, 1,100, 1,250 to 1,350, and 1,500 cm$^{-1}$ (mainly amino acid and collagen bands) showed the main intragroup variations, which are due to the acanthosis process in the IFH epithelium. The 1,200 (C-C aromatic/DNA), 1,350 (CH$_2$ bending/collagen I), and 1,730 cm$^{-1}$ (collagen III) regions presented the main intergroup variations. This finding was interpreted as originating in an extracellular matrix-degeneration process occurring in the inflammatory tissues. The statistical analysis results indicated that the best discrimination capability (sensitivity of 95% and specificity of 100%) was found by using the 530-580 cm$^{-1}$ spectral region. Conclusions: The existence of this narrow spectral window enabling normal and inflammatory diagnosis also has useful implications for an in vivo dispersive Raman setup for clinical applications.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Gene clustering is a useful exploratory technique to group together genes with similar expression levels under distinct cell cycle phases or distinct conditions. It helps the biologist to identify potentially meaningful relationships between genes. In this study, we propose a clustering method based on multivariate normal mixture models, where the number of clusters is predicted via sequential hypothesis tests: at each step, the method considers a mixture model of m components (m = 2 in the first step) and tests if in fact it should be m - 1. If the hypothesis is rejected, m is increased and a new test is carried out. The method continues (increasing m) until the hypothesis is accepted. The theoretical core of the method is the full Bayesian significance test, an intuitive Bayesian approach, which needs no model complexity penalization nor positive probabilities for sharp hypotheses. Numerical experiments were based on a cDNA microarray dataset consisting of expression levels of 205 genes belonging to four functional categories, for 10 distinct strains of Saccharomyces cerevisiae. To analyze the method's sensitivity to data dimension, we performed principal components analysis on the original dataset and predicted the number of classes using 2 to 10 principal components. Compared to Mclust (model-based clustering), our method shows more consistent results.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Background: Prostate cancer cells in primary tumors have been typed CD10(-)/CD13(-)/CD24(hi)/CD26(+)/CD38(lo)/CD44(-)/CD104(-). This CD phenotype suggests a lineage relationship between cancer cells and luminal cells. The Gleason grade of a tumor is a descriptor of tumor glandular differentiation. Higher Gleason scores are associated with treatment failure. Methods: CD26(+) cancer cells were isolated from Gleason 3+3 (G3) and Gleason 4+4 (G4) tumors by cell sorting, and their gene expression, or transcriptome, was determined by Affymetrix DNA array analysis. Dataset analysis was used to determine gene expression similarities and differences between G3 and G4, as well as between these and prostate cancer cell lines and histologically normal prostate luminal cells. Results: The G3 and G4 transcriptomes were compared to those of non-cancer prostatic cell types, which included luminal, basal, stromal fibromuscular, and endothelial cells. A principal components analysis of the various transcriptome datasets indicated a closer relationship between luminal and G3 than between luminal and G4. Dataset comparison also showed that the cancer transcriptomes differed substantially from those of prostate cancer cell lines. Conclusions: Genes differentially expressed in cancer are potential biomarkers for cancer detection, and those differentially expressed between G3 and G4 are potential biomarkers for disease stratification, given that G4 cancer is associated with poor outcomes. Differentially expressed genes likely contribute to the prostate cancer phenotype and constitute the signatures of these particular cancer cell types.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Online music databases have increased significantly as a consequence of the rapid growth of the Internet and digital audio, requiring the development of faster and more efficient tools for music content analysis. Musical genres are widely used to organize music collections. In this paper, the problem of automatic single and multi-label music genre classification is addressed by exploring rhythm-based features obtained from a respective complex network representation. A Markov model is built in order to analyse the temporal sequence of rhythmic notation events. Feature analysis is performed by using two multi-variate statistical approaches: principal components analysis (unsupervised) and linear discriminant analysis (supervised). Similarly, two classifiers are applied in order to identify the category of rhythms: parametric Bayesian classifier under the Gaussian hypothesis (supervised) and agglomerative hierarchical clustering (unsupervised). Qualitative results obtained by using the kappa coefficient and the obtained clusters corroborated the effectiveness of the proposed method.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

The flowpaths by which water moves from watersheds to streams have important consequences for the runoff dynamics and biogeochemistry of surface waters in the Amazon Basin. The clearing of Amazon forest for cattle pasture has the potential to change runoff sources to streams by shifting runoff to more surficial flow pathways. We applied end-member mixing analysis (EMMA) to 10 small watersheds throughout the Amazon in which the solute composition of streamwater and groundwater, overland flow, soil solution, throughfall and rainwater was measured, largely as part of the Large-Scale Biosphere-Atmosphere Experiment in Amazonia. We found a range in the extent to which streamwater samples fell within the mixing space determined by potential flowpath end-members, suggesting that some water sources to streams were not sampled. The contribution of overland flow as a source of stream flow was greater in pasture watersheds than in forest watersheds of comparable size. Increases in overland flow contribution to pasture streams ranged in some cases from 0% in forest to 27-28% in pasture and were broadly consistent with results from hydrometric sampling of Amazon forest and pasture watersheds that indicate a 17- to 18-fold increase in the overland flow contribution to stream flow in pastures. In forest, overland flow was an important contribution to stream flow (45-57%) in ephemeral streams where flows were dominated by stormflow. Overland flow contribution to stream flow decreased in importance with increasing watershed area, from 21-57% in forest and 60-89% in pasture watersheds of less than 10 ha to 0% in forest and 27-28% in pastures in watersheds greater than 100 ha. Soil solution contributions to stream flow were similar across watershed area, and groundwater inputs generally increased in proportion to decreases in overland flow. 
Application of EMMA across multiple watersheds indicated patterns across gradients of stream size and land cover that were consistent with patterns determined by detailed hydrometric sampling.
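The mixing calculation at the heart of an end-member analysis can be sketched as a small linear system: a stream sample is treated as a mixture of end-members whose fractions sum to one. The end-member names follow the abstract, but the two tracers and every concentration below are invented for illustration. EMMA is usually carried out in a tracer space reduced by principal components analysis; that projection step is omitted here, which is a simplification of this sketch.

```python
import numpy as np

# Hypothetical end-member concentrations for two tracers (columns)
end_members = np.array([
    [10.0, 1.0],   # groundwater
    [2.0, 8.0],    # overland flow
    [5.0, 4.0],    # soil solution
])

# Build a synthetic stream sample from known fractions (illustrative only)
true_frac = np.array([0.5, 0.3, 0.2])
stream = true_frac @ end_members

# Mass balance for each tracer plus the constraint that fractions sum to 1
A = np.vstack([end_members.T, np.ones(3)])
b = np.append(stream, 1.0)

# Three equations, three unknown fractions
fractions = np.linalg.solve(A, b)
```

A sample that lies outside the region spanned by the end-members yields fractions below 0 or above 1, which is one way to read the abstract's remark that some streamwater samples fell outside the mixing space.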