112 resultados para Sparse component analysis
Resumo:
In this study, the promising metabolomic approach integrating with ingenuity pathway analysis (IPA) was applied to characterize the tissue specific metabolic perturbation of rats that was induced by indomethacin. The selective pattern recognition analyses were applied to analyze global metabolic profiling of urine of rats treated by indomethacin at an acute dosage of reference that has been proven to induce tissue disorders in rats, evaluated throughout the time-course of -24-72 h. The results preliminarily revealed that modifications of amino acid metabolism, fatty acid metabolism and energetically associated metabolic pathways accounted for metabolic perturbation of the rats that was induced by indomethacin. Furthermore, IPA was applied to deeply analyze the biomarkers and their relations with the metabolic perturbations evidenced by pattern recognition analyses. Specific biochemical functions affected by indomethacin suggested that there is an important correlation of its effects in kidney and liver metabolism, based on the determined metabolites and their pathway-based analysis. The IPA correlation of the three major biomarkers, identified as creatinine, prostaglandin E2 and guanosine, suggested that the administration of indomethacin induced certain levels of toxicity in the kidneys and liver. The changes in the levels of biomarker metabolites allowed the phenotypical determination of the metabolic perturbations induced by indomethacin in a time-dependent manner.
Resumo:
Objective To explore the characteristics of regional distribution of cancer deaths in Shandong Province with the principle components analysis. Methods The principle components analysis with co-variance matrix for age-adjusted mortality rates and percentages of 20 types of cancer in 22 counties (cities) were carried out using SAS Software. Results Over 90% of the total information could be reflected by the top 3 principle components and the first principle component alone represented more than half of the overall regional variances. The first component mainly reflected the area differences of esophageal cancer. The second component mainly reflected the area differences of lung cancer, stomach cancer and liver cancer. The value of the first principal component scores showed a clear trend that the west areas possessed higher values and the east the lower values. Based on the top two components,the 22 counties (cities) could be divided into several geographical clusters. Conclusion The overall difference of regional distribution of cancers in Shandong is dominated by several major cancers including esophageal cancer, lung cancer, stomach cancer and liver cancer. Among them,esophageal cancer makes the largest contribution. If the range of counties (cities) analyzed could be further widened, the characteristics of regional distribution of cancer mortality would be better examined. Abstract in Chinese 目的 利用主成分分析探讨山东省恶性肿瘤死亡的地区分布特征. 方法 利用SAS软件对山东省22个县市区2004~2006午的20种恶性肿瘤标化死亡率和构成比分别进行协方差矩阵主成分分析. 结果 前3个主成分就反映了总体差异90%以上的信息,其中仅第1主成分就提供了总体差异一半以上的信息.第1主成分主要反映了食管癌的地区差异,第2主成分主要反映肺癌的地区差异,兼顾胃癌和肝癌.各地区第1主成分得分呈现西高东低的趋势,根据第1和第2主成分可以将调查地区分为若干类别,表现为明显的地理聚集性. 结论 山东省各地区恶性肿瘤死亡的总体差异主要取决于少数高发肿瘤,包括食管癌、肺癌、胃癌、肝癌等,其中以食管癌地位最为突出.如能进一步扩大分析范围,可更好地查明恶性肿瘤死亡的地区特征.
Resumo:
Near-infrared spectroscopy (NIRS) calibrations were developed for the discrimination of Chinese hawthorn (Crataegus pinnatifida Bge. var. major) fruit from three geographical regions as well as for the estimation of the total sugar, total acid, total phenolic content, and total antioxidant activity. Principal component analysis (PCA) was used for the discrimination of the fruit on the basis of their geographical origin. Three pattern recognition methods, linear discriminant analysis, partial least-squares-discriminant analysis, and back-propagation artificial neural networks, were applied to classify and compare these samples. Furthermore, three multivariate calibration models based on the first derivative NIR spectroscopy, partial least-squares regression, back-propagation artificial neural networks, and least-squares-support vector machines, were constructed for quantitative analysis of the four analytes, total sugar, total acid, total phenolic content, and total antioxidant activity, and validated by prediction data sets.
Resumo:
The main purpose of this article is to gain an insight into the relationships between variables describing the environmental conditions of the Far Northern section of the Great Barrier Reef, Australia. Several of the variables describing these conditions had different measurement levels and often they had non-linear relationships. Using non-linear principal component analysis, it was possible to acquire an insight into these relationships. Furthermore, three geographical areas with unique environmental characteristics could be identified.
Resumo:
Data associated with germplasm collections are typically large and multivariate with a considerable number of descriptors measured on each of many accessions. Pattern analysis methods of clustering and ordination have been identified as techniques for statistically evaluating the available diversity in germplasm data. While used in many studies, the approaches have not dealt explicitly with the computational consequences of large data sets (i.e. greater than 5000 accessions). To consider the application of these techniques to germplasm evaluation data, 11328 accessions of groundnut (Arachis hypogaea L) from the International Research Institute for the Semi-Arid Tropics, Andhra Pradesh, India were examined. Data for nine quantitative descriptors measured in the rainy and post-rainy growing seasons were used. The ordination technique of principal component analysis was used to reduce the dimensionality of the germplasm data. The identification of phenotypically similar groups of accessions within large scale data via the computationally intensive hierarchical clustering techniques was not feasible and non-hierarchical techniques had to be used. Finite mixture models that maximise the likelihood of an accession belonging to a cluster were used to cluster the accessions in this collection. The patterns of response for the different growing seasons were found to be highly correlated. However, in relating the results to passport and other characterisation and evaluation descriptors, the observed patterns did not appear to be related to taxonomy or any other well known characteristics of groundnut.
Resumo:
As a sequel to a paper that dealt with the analysis of two-way quantitative data in large germplasm collections, this paper presents analytical methods appropriate for two-way data matrices consisting of mixed data types, namely, ordered multicategory and quantitative data types. While various pattern analysis techniques have been identified as suitable for analysis of the mixed data types which occur in germplasm collections, the clustering and ordination methods used often can not deal explicitly with the computational consequences of large data sets (i.e. greater than 5000 accessions) with incomplete information. However, it is shown that the ordination technique of principal component analysis and the mixture maximum likelihood method of clustering can be employed to achieve such analyses. Germplasm evaluation data for 11436 accessions of groundnut (Arachis hypogaea L.) from the International Research Institute of the Semi-Arid Tropics, Andhra Pradesh, India were examined. Data for nine quantitative descriptors measured in the post-rainy season and five ordered multicategory descriptors were used. Pattern analysis results generally indicated that the accessions could be distinguished into four regions along the continuum of growth habit (or plant erectness). Interpretation of accession membership in these regions was found to be consistent with taxonomic information, such as subspecies. Each growth habit region contained accessions from three of the most common groundnut botanical varieties. This implies that within each of the habit types there is the full range of expression for the other descriptors used in the analysis. Using these types of insights, the patterns of variability in germplasm collections can provide scientists with valuable information for their plant improvement programs.
Resumo:
Information on the variation available for different plant attributes has enabled germplasm collections to be effectively utilised in plant breeding. A world sourced collection of white clover germplasm has been developed at the White Clover Resource Centre at Glen Innes, New South Wales. This collection of 439 accessions was characterised under field conditions as a preliminary study of the genotypic variation for morphological attributes; stolon density, stolon branching, number of nodes. number of rooted nodes, stolon thickness, internode length, leaf length, plant height and plant spread, together with seasonal herbage yield. Characterisation was conducted on different batches of germplasm (subsets of accessions taken from the complete collection) over a period of five years. Inclusion of two check cultivars, Haifa and Huia, in each batch enabled adjustment of the characterisation data for year effects and attribute-by-year interaction effects. The component of variance for seasonal herbage yield among batches was large relative to that for accessions. Accession-by-experiment and accession-by-season interactions for herbage yield were not detected. Accession mean repeatability for herbage yield across seasons was intermediate (0.453). The components of genotypic variance among accessions for all attributes, except plant height, were larger than their respective standard errors. The estimates of accession mean repeatability for the attributes ranged from low (0.277 for plant height) to intermediate (0.544 for internode length). Multivariate techniques of clustering and ordination were used to investigate the diversity present among the accessions in the collection. Both cluster analysis and principal component analysis suggested that seven groups of accessions existed. It was also proposed from the pattern analysis results that accessions from a group characterised by large leaves, tall plants and thick stolons could be crossed with accessions from a group that had above average stolon density and stolon branching. This material could produce breeding populations to be used in recurrent selection for the development of white clover cultivars for dryland summer moisture stress environments in Australia. The germplasm collection was also found to be deficient in genotypes with high stolon density, high number of branches high number of rooted nodes and large leaves. This warrants addition of new germplasm accessions possessing these characteristics to the present germplasm collection.
Resumo:
A catchment-scale multivariate statistical analysis of hydrochemistry enabled assessment of interactions between alluvial groundwater and Cressbrook Creek, an intermittent drainage system in southeast Queensland, Australia. Hierarchical cluster analyses and principal component analysis were applied to time-series data to evaluate the hydrochemical evolution of groundwater during periods of extreme drought and severe flooding. A simple three-dimensional geological model was developed to conceptualise the catchment morphology and the stratigraphic framework of the alluvium. The alluvium forms a two-layer system with a basal coarse-grained layer overlain by a clay-rich low-permeability unit. In the upper and middle catchment, alluvial groundwater is chemically similar to streamwater, particularly near the creek (reflected by high HCO3/Cl and K/Na ratios and low salinities), indicating a high degree of connectivity. In the lower catchment, groundwater is more saline with lower HCO3/Cl and K/Na ratios, notably during dry periods. Groundwater salinity substantially decreased following severe flooding in 2011, notably in the lower catchment, confirming that flooding is an important mechanism for both recharge and maintaining groundwater quality. The integrated approach used in this study enabled effective interpretation of hydrological processes and can be applied to a variety of hydrological settings to synthesise and evaluate large hydrochemical datasets.
Resumo:
Samples of Forsythia suspensa from raw (Laoqiao) and ripe (Qingqiao) fruit were analyzed with the use of HPLC-DAD and the EIS-MS techniques. Seventeen peaks were detected, and of these, twelve were identified. Most were related to the glucopyranoside molecular fragment. Samples collected from three geographical areas (Shanxi, Henan and Shandong Provinces), were discriminated with the use of hierarchical clustering analysis (HCA), discriminant analysis (DA), and principal component analysis (PCA) models, but only PCA was able to provide further information about the relationships between objects and loadings; eight peaks were related to the provinces of sample origin. The supervised classification models-K-nearest neighbor (KNN), least squares support vector machines (LS-SVM), and counter propagation artificial neural network (CP-ANN) methods, indicated successful classification but KNN produced 100% classification rate. Thus, the fruit were discriminated on the basis of their places of origin.
Resumo:
We propose a new information-theoretic metric, the symmetric Kullback-Leibler divergence (sKL-divergence), to measure the difference between two water diffusivity profiles in high angular resolution diffusion imaging (HARDI). Water diffusivity profiles are modeled as probability density functions on the unit sphere, and the sKL-divergence is computed from a spherical harmonic series, which greatly reduces computational complexity. Adjustment of the orientation of diffusivity functions is essential when the image is being warped, so we propose a fast algorithm to determine the principal direction of diffusivity functions using principal component analysis (PCA). We compare sKL-divergence with other inner-product based cost functions using synthetic samples and real HARDI data, and show that the sKL-divergence is highly sensitive in detecting small differences between two diffusivity profiles and therefore shows promise for applications in the nonlinear registration and multisubject statistical analysis of HARDI data.
Resumo:
A combined data matrix consisting of high performance liquid chromatography–diode array detector (HPLC–DAD) and inductively coupled plasma-mass spectrometry (ICP-MS) measurements of samples from the plant roots of the Cortex moutan (CM), produced much better classification and prediction results in comparison with those obtained from either of the individual data sets. The HPLC peaks (organic components) of the CM samples, and the ICP-MS measurements (trace metal elements) were investigated with the use of principal component analysis (PCA) and the linear discriminant analysis (LDA) methods of data analysis; essentially, qualitative results suggested that discrimination of the CM samples from three different provinces was possible with the combined matrix producing best results. Another three methods, K-nearest neighbor (KNN), back-propagation artificial neural network (BP-ANN) and least squares support vector machines (LS-SVM) were applied for the classification and prediction of the samples. Again, the combined data matrix analyzed by the KNN method produced best results (100% correct; prediction set data). Additionally, multiple linear regression (MLR) was utilized to explore any relationship between the organic constituents and the metal elements of the CM samples; the extracted linear regression equations showed that the essential metals as well as some metallic pollutants were related to the organic compounds on the basis of their concentrations
Resumo:
A novel combined near- and mid-infrared (NIR and MIR) spectroscopic method has been researched and developed for the analysis of complex substances such as the Traditional Chinese Medicine (TCM), Illicium verum Hook. F. (IVHF), and its noxious adulterant, Iuicium lanceolatum A.C. Smith (ILACS). Three types of spectral matrix were submitted for classification with the use of the linear discriminant analysis (LDA) method. The data were pretreated with either the successive projections algorithm (SPA) or the discrete wavelet transform (DWT) method. The SPA method performed somewhat better, principally because it required less spectral features for its pretreatment model. Thus, NIR or MIR matrix as well as the combined NIR/MIR one, were pretreated by the SPA method, and then analysed by LDA. This approach enabled the prediction and classification of the IVHF, ILACS and mixed samples. The MIR spectral data produced somewhat better classification rates than the NIR data. However, the best results were obtained from the combined NIR/MIR data matrix with 95–100% correct classifications for calibration, validation and prediction. Principal component analysis (PCA) of the three types of spectral data supported the results obtained with the LDA classification method.
Resumo:
A novel near-infrared spectroscopy (NIRS) method has been researched and developed for the simultaneous analyses of the chemical components and associated properties of mint (Mentha haplocalyx Briq.) tea samples. The common analytes were: total polysaccharide content, total flavonoid content, total phenolic content, and total antioxidant activity. To resolve the NIRS data matrix for such analyses, least squares support vector machines was found to be the best chemometrics method for prediction, although it was closely followed by the radial basis function/partial least squares model. Interestingly, the commonly used partial least squares was unsatisfactory in this case. Additionally, principal component analysis and hierarchical cluster analysis were able to distinguish the mint samples according to their four geographical provinces of origin, and this was further facilitated with the use of the chemometrics classification methods-K-nearest neighbors, linear discriminant analysis, and partial least squares discriminant analysis. In general, given the potential savings with sampling and analysis time as well as with the costs of special analytical reagents required for the standard individual methods, NIRS offered a very attractive alternative for the simultaneous analysis of mint samples.
Resumo:
Frog species have been declining worldwide at unprecedented rates in the past decades. There are many reasons for this decline including pollution, habitat loss, and invasive species [1]. To preserve, protect, and restore frog biodiversity, it is important to monitor and assess frog species. In this paper, a novel method using image processing techniques for analyzing Australian frog vocalisations is proposed. An FFT is applied to audio data to produce a spectrogram. Then, acoustic events are detected and isolated into corresponding segments through image processing techniques applied to the spectrogram. For each segment, spectral peak tracks are extracted with selected seeds and a region growing technique is utilised to obtain the contour of each frog vocalisation. Based on spectral peak tracks and the contour of each frog vocalisation, six feature sets are extracted. Principal component analysis reduces each feature set down to six principal components which are tested for classification performance with a k-nearest neighbor classifier. This experiment tests the proposed method of classification on fourteen frog species which are geographically well distributed throughout Queensland, Australia. The experimental results show that the best average classification accuracy for the fourteen frog species can be up to 87%.