925 resultados para probabilistic principal component analysis (probabilistic PCA)
Resumo:
Learning Disability (LD) is a neurological condition that affects a child’s brain and impairs his ability to carry out one or many specific tasks. LD affects about 15 % of children enrolled in schools. The prediction of LD is a vital and intricate job. The aim of this paper is to design an effective and powerful tool, using the two intelligent methods viz., Artificial Neural Network and Adaptive Neuro-Fuzzy Inference System, for measuring the percentage of LD that affected in school-age children. In this study, we are proposing some soft computing methods in data preprocessing for improving the accuracy of the tool as well as the classifier. The data preprocessing is performed through Principal Component Analysis for attribute reduction and closest fit algorithm is used for imputing missing values. The main idea in developing the LD prediction tool is not only to predict the LD present in children but also to measure its percentage along with its class like low or minor or major. The system is implemented in Mathworks Software MatLab 7.10. The results obtained from this study have illustrated that the designed prediction system or tool is capable of measuring the LD effectively
Resumo:
A spectral angle based feature extraction method, Spectral Clustering Independent Component Analysis (SC-ICA), is proposed in this work to improve the brain tissue classification from Magnetic Resonance Images (MRI). SC-ICA provides equal priority to global and local features; thereby it tries to resolve the inefficiency of conventional approaches in abnormal tissue extraction. First, input multispectral MRI is divided into different clusters by a spectral distance based clustering. Then, Independent Component Analysis (ICA) is applied on the clustered data, in conjunction with Support Vector Machines (SVM) for brain tissue analysis. Normal and abnormal datasets, consisting of real and synthetic T1-weighted, T2-weighted and proton density/fluid-attenuated inversion recovery images, were used to evaluate the performance of the new method. Comparative analysis with ICA based SVM and other conventional classifiers established the stability and efficiency of SC-ICA based classification, especially in reproduction of small abnormalities. Clinical abnormal case analysis demonstrated it through the highest Tanimoto Index/accuracy values, 0.75/98.8%, observed against ICA based SVM results, 0.17/96.1%, for reproduced lesions. Experimental results recommend the proposed method as a promising approach in clinical and pathological studies of brain diseases
Resumo:
Multispectral analysis is a promising approach in tissue classification and abnormality detection from Magnetic Resonance (MR) images. But instability in accuracy and reproducibility of the classification results from conventional techniques keeps it far from clinical applications. Recent studies proposed Independent Component Analysis (ICA) as an effective method for source signals separation from multispectral MR data. However, it often fails to extract the local features like small abnormalities, especially from dependent real data. A multisignal wavelet analysis prior to ICA is proposed in this work to resolve these issues. Best de-correlated detail coefficients are combined with input images to give better classification results. Performance improvement of the proposed method over conventional ICA is effectively demonstrated by segmentation and classification using k-means clustering. Experimental results from synthetic and real data strongly confirm the positive effect of the new method with an improved Tanimoto index/Sensitivity values, 0.884/93.605, for reproduced small white matter lesions
Resumo:
In this paper an attempt has been made to determine the number of Premature Ventricular Contraction (PVC) cycles accurately from a given Electrocardiogram (ECG) using a wavelet constructed from multiple Gaussian functions. It is difficult to assess the ECGs of patients who are continuously monitored over a long period of time. Hence the proposed method of classification will be helpful to doctors to determine the severity of PVC in a patient. Principal Component Analysis (PCA) and a simple classifier have been used in addition to the specially developed wavelet transform. The proposed wavelet has been designed using multiple Gaussian functions which when summed up looks similar to that of a normal ECG. The number of Gaussians used depends on the number of peaks present in a normal ECG. The developed wavelet satisfied all the properties of a traditional continuous wavelet. The new wavelet was optimized using genetic algorithm (GA). ECG records from Massachusetts Institute of Technology-Beth Israel Hospital (MIT-BIH) database have been used for validation. Out of the 8694 ECG cycles used for evaluation, the classification algorithm responded with an accuracy of 97.77%. In order to compare the performance of the new wavelet, classification was also performed using the standard wavelets like morlet, meyer, bior3.9, db5, db3, sym3 and haar. The new wavelet outperforms the rest
Resumo:
This article present the result from a study of two sediment cores collected from the environmentally distinct zones of CES. Accumulation status of five toxic metals: Cadmium (Cd), Chromium (Cr), Cobalt (Co), Copper (Cu) and Lead (Pb) were analyzed. Besides texture and CHNS were determined to understand the composition of the sediment. Enrichment Factor (EF) and Anthropogenic Factor (AF) were used to differentiate the typical metal sources. Metal enrichment in the cores revealed heavy load at the northern (NS1 ) region compared with the southern zone (SS1). Elevation of metal content in core NS1 showed the industrial input. Statistical analyses were employed to understand the origin of metals in the sediment samples. Principal Component Analysis (PCA) distinguishes the two zones with different metal accumulation capacity: highest at NS1 and lowest at SS1. Correlation analysis revealed positive significant relation only in core NS1, adhering to the exposition of the intensified industrial pollution
Resumo:
Information and communication technology (ICT) projects have a great potential to revolutionise the information delivery system by bridging the gap between farmers and extension personnel. aAQUA (Almost All Questions Answered) portal was launched by the Developmental Informatics Laboratory (DIL) at Indian Institute of Technology (IIT) Mumbai, Maharashtra, India in 2003 as an information providing system to deliver technology options and tailored information for the problems and queries raised by Indian dairy farmers. To measure the effectiveness of this service the attitudinal dimensions of the users of aAQUA e-Agriservice were investigated using a 22 item scale. A simple random sampling technique was used to select 120 dairy farmers from which data were collected and subjected to factor analysis to identify the underlying constructs in this research. From the attitude items, four components were extracted and named as the pessimistic, utility, technical and efficacy perspective, which influenced the development of varied level of attitudinal inclination towards the e-Agriservice. These components explained 64.40 per cent of variation in the attitude of the users towards the aAQUA e-Agriservice. This study provides a framework for technically efficient service provision that might help to reduce the pessimistic attitude of target population to adopt e-Agriservice in their farming system. The results should also be helpful for researchers, academics, ICT based service providers and policy makers to consider these perspectives while planning and implementing ICT projects.
Resumo:
First discussion on compositional data analysis is attributable to Karl Pearson, in 1897. However, notwithstanding the recent developments on algebraic structure of the simplex, more than twenty years after Aitchison’s idea of log-transformations of closed data, scientific literature is again full of statistical treatments of this type of data by using traditional methodologies. This is particularly true in environmental geochemistry where besides the problem of the closure, the spatial structure (dependence) of the data have to be considered. In this work we propose the use of log-contrast values, obtained by a simplicial principal component analysis, as LQGLFDWRUV of given environmental conditions. The investigation of the log-constrast frequency distributions allows pointing out the statistical laws able to generate the values and to govern their variability. The changes, if compared, for example, with the mean values of the random variables assumed as models, or other reference parameters, allow defining monitors to be used to assess the extent of possible environmental contamination. Case study on running and ground waters from Chiavenna Valley (Northern Italy) by using Na+, K+, Ca2+, Mg2+, HCO3-, SO4 2- and Cl- concentrations will be illustrated
Resumo:
In an earlier investigation (Burger et al., 2000) five sediment cores near the Rodrigues Triple Junction in the Indian Ocean were studied applying classical statistical methods (fuzzy c-means clustering, linear mixing model, principal component analysis) for the extraction of endmembers and evaluating the spatial and temporal variation of geochemical signals. Three main factors of sedimentation were expected by the marine geologists: a volcano-genetic, a hydro-hydrothermal and an ultra-basic factor. The display of fuzzy membership values and/or factor scores versus depth provided consistent results for two factors only; the ultra-basic component could not be identified. The reason for this may be that only traditional statistical methods were applied, i.e. the untransformed components were used and the cosine-theta coefficient as similarity measure. During the last decade considerable progress in compositional data analysis was made and many case studies were published using new tools for exploratory analysis of these data. Therefore it makes sense to check if the application of suitable data transformations, reduction of the D-part simplex to two or three factors and visual interpretation of the factor scores would lead to a revision of earlier results and to answers to open questions . In this paper we follow the lines of a paper of R. Tolosana- Delgado et al. (2005) starting with a problem-oriented interpretation of the biplot scattergram, extracting compositional factors, ilr-transformation of the components and visualization of the factor scores in a spatial context: The compositional factors will be plotted versus depth (time) of the core samples in order to facilitate the identification of the expected sources of the sedimentary process. Kew words: compositional data analysis, biplot, deep sea sediments
Resumo:
Many multivariate methods that are apparently distinct can be linked by introducing one or more parameters in their definition. Methods that can be linked in this way are correspondence analysis, unweighted or weighted logratio analysis (the latter also known as "spectral mapping"), nonsymmetric correspondence analysis, principal component analysis (with and without logarithmic transformation of the data) and multidimensional scaling. In this presentation I will show how several of these methods, which are frequently used in compositional data analysis, may be linked through parametrizations such as power transformations, linear transformations and convex linear combinations. Since the methods of interest here all lead to visual maps of data, a "movie" can be made where where the linking parameter is allowed to vary in small steps: the results are recalculated "frame by frame" and one can see the smooth change from one method to another. Several of these "movies" will be shown, giving a deeper insight into the similarities and differences between these methods
Resumo:
The North Atlantic Oscillation (NAO) is an important large-scale atmospheric circulation that influences the European countries climate. This study evaluated NAO impact in air quality in Porto Metropolitan Area (PMA), Portugal, for the period 2002-2006. NAO, air pollutants and meteorological data were statistically analyzed. All data were obtained from PMA Weather Station, PMA Air Quality Stations and NOAA analysis. Two statistical methods were applied in different time scale : principal component and correlation coefficient. Annual time scale, using multivariate analysis (PCA, principal component analysis), were applied in order to identified positive and significant association between air pollutants such as PM10, PM2.5, CO, NO and NO2, with NAO. On the other hand, the correlation coefficient using seasonal time scale were also applied to the same data. The results of PCA analysis present a general negative significant association between the total precipitation and NAO, in Factor 1 and 2 (explaining around 70% of the variance), presented in the years of 2002, 2004 and 2005. During the same years, some air pollutants (such as PM10, PM2.5, SO2, NOx and CO) present also a positive association with NAO. The O3 shows as well a positive association with NAP during 2002 and 2004, at 2nd Factor, explaining 30% of the variance. From the seasonal analysis using correlation coefficient, it was found significant correlation between PM10 (0.72., p<0.05, in 2002), PM2.5 (0 74, p<0.05, in 2004), and SO2 (0.78, p<0.01, in 2002) with NAO during March-December (no winter period) period. Significant associations between air pollutants and NAO were also verified in the winter period (December to April) mainly with ozone (2005, r=-0.55, p.<0.01). Once that human health and hospital morbidities may be affected by air pollution, the results suggest that NAO forecast can be an important tool to prevent them, in the Iberian Peninsula and specially Portugal.
Resumo:
This workshop paper reports recent developments to a vision system for traffic interpretation which relies extensively on the use of geometrical and scene context. Firstly, a new approach to pose refinement is reported, based on forces derived from prominent image derivatives found close to an initial hypothesis. Secondly, a parameterised vehicle model is reported, able to represent different vehicle classes. This general vehicle model has been fitted to sample data, and subjected to a Principal Component Analysis to create a deformable model of common car types having 6 parameters. We show that the new pose recovery technique is also able to operate on the PCA model, to allow the structure of an initial vehicle hypothesis to be adapted to fit the prevailing context. We report initial experiments with the model, which demonstrate significant improvements to pose recovery.
Resumo:
The complexity inherent in climate data makes it necessary to introduce more than one statistical tool to the researcher to gain insight into the climate system. Empirical orthogonal function (EOF) analysis is one of the most widely used methods to analyze weather/climate modes of variability and to reduce the dimensionality of the system. Simple structure rotation of EOFs can enhance interpretability of the obtained patterns but cannot provide anything more than temporal uncorrelatedness. In this paper, an alternative rotation method based on independent component analysis (ICA) is considered. The ICA is viewed here as a method of EOF rotation. Starting from an initial EOF solution rather than rotating the loadings toward simplicity, ICA seeks a rotation matrix that maximizes the independence between the components in the time domain. If the underlying climate signals have an independent forcing, one can expect to find loadings with interpretable patterns whose time coefficients have properties that go beyond simple noncorrelation observed in EOFs. The methodology is presented and an application to monthly means sea level pressure (SLP) field is discussed. Among the rotated (to independence) EOFs, the North Atlantic Oscillation (NAO) pattern, an Arctic Oscillation–like pattern, and a Scandinavian-like pattern have been identified. There is the suggestion that the NAO is an intrinsic mode of variability independent of the Pacific.
Resumo:
Locality to other nodes on a peer-to-peer overlay network can be established by means of a set of landmarks shared among the participating nodes. Each node independently collects a set of latency measures to landmark nodes, which are used as a multi-dimensional feature vector. Each peer node uses the feature vector to generate a unique scalar index which is correlated to its topological locality. A popular dimensionality reduction technique is the space filling Hilbert’s curve, as it possesses good locality preserving properties. However, there exists little comparison between Hilbert’s curve and other techniques for dimensionality reduction. This work carries out a quantitative analysis of their properties. Linear and non-linear techniques for scaling the landmark vectors to a single dimension are investigated. Hilbert’s curve, Sammon’s mapping and Principal Component Analysis have been used to generate a 1d space with locality preserving properties. This work provides empirical evidence to support the use of Hilbert’s curve in the context of locality preservation when generating peer identifiers by means of landmark vector analysis. A comparative analysis is carried out with an artificial 2d network model and with a realistic network topology model with a typical power-law distribution of node connectivity in the Internet. Nearest neighbour analysis confirms Hilbert’s curve to be very effective in both artificial and realistic network topologies. Nevertheless, the results in the realistic network model show that there is scope for improvements and better techniques to preserve locality information are required.
Resumo:
This study clarifies the taxonomic status of Anemone coronaria and segregates the species and A. coronaria infraspecific variants using morphological and morphometric analyses. Principal component analysis of the coronaria group was performed on 25 quantitative and qualitative characters, and morphometric analysis of the A. coronaria infraspecific variants was performed on 21 quantitative and qualitative characters. The results showed that the A. coronaria group clustered into four major groups: A. coronaria L., A. biflora DC, A. bucharica (Regel) Juz.ex Komarov, and a final group including A. eranthioides Regel and A. tschernjaewii Regel. The data on the A. coronaria infraspecific variants clustered into six groups: A. coronaria L. var. coronaria L., var. cyanea Ard., var. albiflora Rouy & Fouc., var. parviflora Regel, var. ventreana Ard., and var. rissoana Ard. © 2007 The Linnean Society of London
Resumo:
The rheological properties of dough and gluten are important for end-use quality of flour but there is a lack of knowledge of the relationships between fundamental and empirical tests and how they relate to flour composition and gluten quality. Dough and gluten from six breadmaking wheat qualities were subjected to a range of rheological tests. Fundamental (small-deformation) rheological characterizations (dynamic oscillatory shear and creep recovery) were performed on gluten to avoid the nonlinear influence of the starch component, whereas large deformation tests were conducted on both dough and gluten. A number of variables from the various curves were considered and subjected to a principal component analysis (PCA) to get an overview of relationships between the various variables. The first component represented variability in protein quality, associated with elasticity and tenacity in large deformation (large positive loadings for resistance to extension and initial slope of dough and gluten extension curves recorded by the SMS/Kieffer dough and gluten extensibility rig, and the tenacity and strain hardening index of dough measured by the Dobraszczyk/Roberts dough inflation system), the elastic character of the hydrated gluten proteins (large positive loading for elastic modulus [G'], large negative loadings for tan delta and steady state compliance [J(e)(0)]), the presence of high molecular weight glutenin subunits (HMW-GS) 5+10 vs. 2+12, and a size distribution of glutenin polymers shifted toward the high-end range. The second principal component was associated with flour protein content. Certain rheological data were influenced by protein content in addition to protein quality (area under dough extension curves and dough inflation curves [W]). The approach made it possible to bridge the gap between fundamental rheological properties, empirical measurements of physical properties, protein composition, and size distribution. The interpretation of this study gave indications of the molecular basis for differences in breadmaking performance.