141 results for LEAST-SQUARES METHODS
Abstract:
Spatial organisation of proteins according to their function plays an important role in the specificity of their molecular interactions. Emerging proteomics methods seek to assign proteins to sub-cellular locations by partial separation of organelles and computational analysis of protein abundance distributions among partially separated fractions. Such methods permit simultaneous analysis of unpurified organelles and promise proteome-wide localisation in scenarios wherein perturbation may prompt dynamic re-distribution. Resolving organelles that display similar behavior during a protocol designed to provide partial enrichment represents a possible shortcoming. We employ the Localisation of Organelle Proteins by Isotope Tagging (LOPIT) organelle proteomics platform to demonstrate that combining information from distinct separations of the same material can improve organelle resolution and assignment of proteins to sub-cellular locations. Two previously published experiments, whose distinct gradients are alone unable to fully resolve six known protein-organelle groupings, are subjected to a rigorous analysis to assess protein-organelle association via a contemporary pattern recognition algorithm. Upon straightforward combination of single-gradient data, we observe significant improvement in protein-organelle association via both a non-linear support vector machine algorithm and partial least-squares discriminant analysis. The outcome yields suggestions for further improvements to present organelle proteomics platforms, and a robust analytical methodology via which to associate proteins with sub-cellular organelles.
Abstract:
Near-infrared spectroscopy (NIRS) calibrations were developed for the discrimination of Chinese hawthorn (Crataegus pinnatifida Bge. var. major) fruit from three geographical regions as well as for the estimation of the total sugar, total acid, total phenolic content, and total antioxidant activity. Principal component analysis (PCA) was used for the discrimination of the fruit on the basis of their geographical origin. Three pattern recognition methods (linear discriminant analysis, partial least-squares discriminant analysis, and back-propagation artificial neural networks) were applied to classify and compare these samples. Furthermore, three multivariate calibration models based on first-derivative NIR spectroscopy (partial least-squares regression, back-propagation artificial neural networks, and least-squares support vector machines) were constructed for quantitative analysis of the four analytes (total sugar, total acid, total phenolic content, and total antioxidant activity) and validated with prediction data sets.
Abstract:
This study examined the prevalence of depressive symptoms and elucidated the causal pathway between socioeconomic status and depression in a community in the central region of Vietnam. The study used a combination of qualitative and quantitative research methods. In-depth interviews were conducted with two local psychiatric experts and ten residents for the qualitative research. A cross-sectional survey using a structured interview technique was conducted with 100 residents in the pilot quantitative survey. The Center for Epidemiological Studies-Depression Scale (CES-D) was applied to evaluate depressive symptoms (CES-D score over 21) and depression (CES-D score over 25). Ordinary Least Squares Regression following the three steps of Baron and Kenny's framework was employed for testing mediation models. There was a strong social gradient with respect to depressive symptoms. People with higher education levels reported fewer depressive symptoms (lower CES-D scores). Income was also inversely associated with depressive symptoms, but only for those in the bottom income quartile. Individuals in low-level or unstable occupations had more depressive symptoms than those in the highest occupation group. Employment status showed the strongest gradient with respect to its impact on the burden of depressive symptoms compared with other indicators of SES. Findings from this pilot study suggest a pattern of negative association between socioeconomic status and depression in Vietnamese adults.
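The three regression steps of the Baron and Kenny framework mentioned above can be sketched as follows. This is only an illustration on synthetic data: the variables `x` (an SES indicator), `m` (a mediator) and `y` (a CES-D-like score) and the helper names `ols` and `baron_kenny` are hypothetical and are not taken from the study, and the significance tests that normally accompany each step are omitted.

```python
import numpy as np

def ols(y, *predictors):
    """Return OLS coefficients [intercept, b1, b2, ...] of y on the predictors."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def baron_kenny(x, m, y):
    """Run the three Baron-Kenny regressions and collect the path coefficients."""
    c = ols(y, x)[1]                # step 1: total effect of x on y
    a = ols(m, x)[1]                # step 2: effect of x on the mediator m
    _, c_prime, b = ols(y, x, m)    # step 3: direct effect of x, and effect of m
    return {"c": c, "a": a, "b": b, "c_prime": c_prime, "indirect": a * b}

# Synthetic, purely illustrative data (not from the study).
rng = np.random.default_rng(0)
x = rng.normal(size=200)                                   # hypothetical SES indicator
m = 0.8 * x + rng.normal(scale=0.5, size=200)              # hypothetical mediator
y = -0.6 * m + 0.1 * x + rng.normal(scale=0.5, size=200)   # hypothetical symptom score
result = baron_kenny(x, m, y)
```

With OLS fitted on the same sample, the total effect decomposes exactly as `c = c_prime + a * b`, so a direct effect `c_prime` that is much smaller in magnitude than `c` is the usual sign of mediation.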
Abstract:
Due to knowledge gaps in relation to urban stormwater quality processes, an in-depth understanding of model uncertainty can enhance decision making. Uncertainty in stormwater quality models can originate from a range of sources such as the complexity of urban rainfall-runoff-stormwater pollutant processes and the paucity of observed data. Unfortunately, studies relating to epistemic uncertainty, which arises from the simplification of reality, are limited, and such uncertainty is often deemed mostly unquantifiable. This paper presents a statistical modelling framework for ascertaining epistemic uncertainty associated with pollutant wash-off under a regression modelling paradigm using Ordinary Least Squares Regression (OLSR) and Weighted Least Squares Regression (WLSR) methods with a Bayesian/Gibbs sampling statistical approach. The study results confirmed that WLSR assuming probability distributed data provides more realistic uncertainty estimates of the observed and predicted wash-off values compared to OLSR modelling. It was also noted that the Bayesian/Gibbs sampling approach is superior to the classical statistical and deterministic approaches commonly used in water quality modelling. The study outcomes confirmed that the prediction error associated with wash-off replication is relatively high due to limited data availability. The uncertainty analysis also highlighted the variability of the wash-off modelling coefficient k as a function of complex physical processes, which is primarily influenced by surface characteristics and rainfall intensity.
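The difference between OLSR and WLSR can be illustrated with a small closed-form sketch. It is not the Bayesian/Gibbs sampling framework of the paper: the linear model, the noise level `sigma` that grows with `x`, and the helper names `ols_fit` and `wls_fit` are assumptions made only for illustration.

```python
import numpy as np

def ols_fit(X, y):
    """Ordinary least squares: minimise sum of squared residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def wls_fit(X, y, weights):
    """Weighted least squares: minimise sum of weights[i] * residual[i]**2.

    Scaling each row by sqrt(weight) turns the problem into an ordinary one.
    """
    sw = np.sqrt(weights)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

# Synthetic heteroscedastic data (assumed, not from the paper).
rng = np.random.default_rng(1)
x = np.linspace(1.0, 10.0, 50)
sigma = 0.2 * x                                  # noise grows with x
y = 2.0 + 0.5 * x + rng.normal(scale=sigma)
X = np.column_stack([np.ones_like(x), x])        # intercept and slope columns

beta_ols = ols_fit(X, y)
beta_wls = wls_fit(X, y, 1.0 / sigma**2)         # inverse-variance weights
```

Inverse-variance weights down-weight the noisier observations. With equal weights the two fits coincide.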
Abstract:
Samples of sea water contain phytoplankton taxa in varying amounts, and marine scientists are interested in the relative abundance of each taxon. The relative biomass of the taxa can be ascertained indirectly by measuring the quantity of various pigments using high performance liquid chromatography. However, the conversion from pigment to taxa is mathematically non-trivial, as it is a positive matrix factorisation problem where both matrices are unknown beyond the level of initial estimates. The prior information on the pigment-to-taxa conversion matrix is used to give the problem a unique solution. Iterating two non-negative least squares algorithms gives satisfactory results. Analysis of some sample data indicates that this type of analysis is promising. An alternative, more computationally intensive approach using Bayesian methods is discussed.
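A minimal sketch of alternating between two non-negative least squares problems is given below, using `scipy.optimize.nnls`. The notation is assumed rather than taken from the paper: `P` (samples × pigments) is modelled as `A @ F`, with `A` (samples × taxa) the abundances and `F` (taxa × pigments) the conversion matrix. The initial estimate `F0` is used here only as a starting point, so this sketch does not reproduce how the paper uses prior information to obtain a unique solution.

```python
import numpy as np
from scipy.optimize import nnls

def nnls_rows(B, Y):
    """Solve min ||Z @ B - Y|| subject to Z >= 0, one row of Z at a time."""
    return np.vstack([nnls(B.T, y)[0] for y in Y])

def alternating_nnls(P, F0, n_iter=50):
    """Alternate NNLS updates of A (given F) and F (given A) for P ~ A @ F."""
    F = F0.copy()
    for _ in range(n_iter):
        A = nnls_rows(F, P)            # abundances, conversion matrix fixed
        F = nnls_rows(A.T, P.T).T      # conversion matrix, abundances fixed
    return A, F

# Synthetic example (assumed sizes: 30 samples, 3 taxa, 5 pigments).
rng = np.random.default_rng(0)
A_true = rng.uniform(0.1, 1.0, size=(30, 3))
F_true = rng.uniform(0.1, 1.0, size=(3, 5))
P = A_true @ F_true
F0 = F_true * rng.uniform(0.8, 1.2, size=F_true.shape)   # perturbed initial estimate

A, F = alternating_nnls(P, F0)
```

Each half-step is an exact constrained minimisation, so the reconstruction error never increases. The factorisation itself is still not unique without further constraints, for example because the factors can be rescaled against each other.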
Abstract:
Person re-identification is particularly challenging due to significant appearance changes across separate camera views. In order to re-identify people, a representative human signature should effectively handle differences in illumination, pose and camera parameters. While general appearance-based methods are modelled in Euclidean spaces, it has been argued that some applications in image and video analysis are better modelled via non-Euclidean manifold geometry. To this end, recent approaches represent images as covariance matrices, and interpret such matrices as points on Riemannian manifolds. As direct classification on such manifolds can be difficult, in this paper we propose to represent each manifold point as a vector of similarities to class representers, via a recently introduced form of Bregman matrix divergence known as the Stein divergence. This is followed by using a discriminative mapping of similarity vectors for final classification. The use of similarity vectors is in contrast to the traditional approach of embedding manifolds into tangent spaces, which can suffer from representing the manifold structure inaccurately. Comparative evaluations on benchmark ETHZ and iLIDS datasets for the person re-identification task show that the proposed approach obtains better performance than recent techniques such as Histogram Plus Epitome, Partial Least Squares, and Symmetry-Driven Accumulation of Local Features.
Abstract:
This paper presents new schemes for recursive estimation of the state transition probabilities for hidden Markov models (HMMs) via extended least squares (ELS) and recursive state prediction error (RSPE) methods. Local convergence analysis for the proposed RSPE algorithm is shown using the ordinary differential equation (ODE) approach developed for the more familiar recursive output prediction error (RPE) methods. The presented scheme converges and is relatively well conditioned compared with the ...
Abstract:
In this paper, new online adaptive hidden Markov model (HMM) state estimation schemes are developed, based on extended least squares (ELS) concepts and recursive prediction error (RPE) methods. The best of the new schemes exploit the idempotent nature of Markov chains and work with a least squares prediction error index, using a posteriori estimates, that is more suited to Markov models than those traditionally used in identification of linear systems.
Abstract:
The ambiguity acceptance test is an important quality control procedure in high precision GNSS data processing. Although ambiguity acceptance test methods have been extensively investigated, the threshold determination method is still not well understood. Currently, the threshold is determined with the empirical approach or the fixed failure rate (FF-) approach. The empirical approach is simple but lacks a theoretical basis, while the FF-approach is theoretically rigorous but computationally demanding. Hence, the key to the threshold determination problem is how to determine the threshold efficiently and in a reasonable way. In this study, a new threshold determination method, named the threshold function method, is proposed to reduce the complexity of the FF-approach. The threshold function method simplifies the FF-approach by a modeling procedure and an approximation procedure. The modeling procedure uses a rational function model to describe the relationship between the FF-difference test threshold and the integer least-squares (ILS) success rate. The approximation procedure replaces the ILS success rate with the easy-to-calculate integer bootstrapping (IB) success rate. The corresponding modeling and approximation errors are analysed with simulation data to avoid nuisance biases and the impact of unrealistic stochastic models. The results indicate that the proposed method can greatly simplify the FF-approach without introducing significant modeling error. The threshold function method makes the fixed failure rate threshold determination method feasible for real-time applications.
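The "easy-to-calculate" IB success rate has a well-known closed form, the product over ambiguities of 2Φ(1/(2σ_i)) − 1, where σ_i is the conditional standard deviation of ambiguity i given the ambiguities before it. The sketch below evaluates that formula from a float ambiguity covariance matrix `Q`. The function names and example matrix are hypothetical, the conditioning follows the order in which the ambiguities are given, and any decorrelation step normally applied beforehand is assumed to have been done already. The rational function model of the paper is not reproduced.

```python
import math

def cholesky_lower(Q):
    """Lower-triangular Cholesky factor L of a symmetric positive definite Q (Q = L L^T)."""
    n = len(Q)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(Q[i][i] - s)
            else:
                L[i][j] = (Q[i][j] - s) / L[j][j]
    return L

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ib_success_rate(Q):
    """Integer bootstrapping success rate for ambiguity covariance matrix Q (in cycles^2)."""
    L = cholesky_lower(Q)
    p = 1.0
    for i in range(len(Q)):
        cond_sd = L[i][i]   # std. dev. of ambiguity i conditioned on ambiguities 0..i-1
        p *= 2.0 * normal_cdf(0.5 / cond_sd) - 1.0
    return p

# Hypothetical 2x2 covariance matrix of float ambiguities.
Q_example = [[0.09, 0.03],
             [0.03, 0.04]]
p_example = ib_success_rate(Q_example)
```

Only a Cholesky factorisation and a few evaluations of the normal distribution function are needed, which is why the IB success rate is attractive as a stand-in for the ILS success rate in real-time use.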
Abstract:
Ambiguity validation, an important procedure in integer ambiguity resolution, tests the correctness of the fixed integer ambiguities of phase measurements before they are used for positioning computation. Most existing investigations on ambiguity validation focus on the test statistic. How to determine the threshold more reasonably is less understood, although it is one of the most important topics in ambiguity validation. Currently, there are two threshold determination methods in the ambiguity validation procedure: the empirical approach and the fixed failure rate (FF-) approach. The empirical approach is simple but lacks a theoretical basis. The fixed failure rate approach has a rigorous basis in probability theory, but it employs a more complicated procedure. This paper focuses on how to determine the threshold easily and reasonably. Both the FF-ratio test and the FF-difference test are investigated in this research, and the extensive simulation results show that the FF-difference test can achieve comparable or even better performance than the well-known FF-ratio test. Another benefit of adopting the FF-difference test is that its threshold can be expressed as a function of the integer least-squares (ILS) success rate with a specified failure rate tolerance. Thus, a new threshold determination method, named the threshold function for the FF-difference test, is proposed. The threshold function method preserves the fixed failure rate characteristic and is also easy to apply. The performance of the threshold function is validated with simulated data. The validation results show that with the threshold function method, the impact of the modelling error on the failure rate is less than 0.08%. Overall, the threshold function for the FF-difference test is a very promising threshold validation method, and it makes the FF-approach applicable to real-time GNSS positioning applications.
Abstract:
Samples of Forsythia suspensa from raw (Laoqiao) and ripe (Qingqiao) fruit were analyzed with the use of HPLC-DAD and the EIS-MS techniques. Seventeen peaks were detected, and of these, twelve were identified. Most were related to the glucopyranoside molecular fragment. Samples collected from three geographical areas (Shanxi, Henan and Shandong Provinces) were discriminated with the use of hierarchical clustering analysis (HCA), discriminant analysis (DA), and principal component analysis (PCA) models, but only PCA was able to provide further information about the relationships between objects and loadings; eight peaks were related to the provinces of sample origin. The supervised classification models, K-nearest neighbor (KNN), least squares support vector machines (LS-SVM), and counter propagation artificial neural network (CP-ANN), all indicated successful classification, with KNN producing a 100% classification rate. Thus, the fruit were discriminated on the basis of their places of origin.
Abstract:
A novel differential pulse voltammetry (DPV) method was developed for the simultaneous analysis of herbicides in water. A mixture of four herbicides (atrazine, simazine, propazine and terbuthylazine) was analyzed simultaneously, and the complex, overlapping DPV voltammograms were resolved by several chemometrics methods such as partial least squares (PLS), principal component regression (PCR) and principal component–artificial neural networks (PC–ANN). The complex profiles of the voltammograms collected from a synthetic set of samples were best resolved with the use of the PC–ANN method, which also gave the best predictions of the analyte concentrations (%RPET = 6.1 and average %Recovery = 99.0). The new method was also used for analysis of real samples, and the results compared well with those from the GC-MS technique. These results suggest that the novel method is a viable alternative to other commonly used methods such as GC, HPLC and GC-MS.
Abstract:
This review is focused on the impact of chemometrics for resolving data sets collected from investigations of the interactions of small molecules with biopolymers. These samples have been analyzed with various instrumental techniques, such as fluorescence, ultraviolet–visible spectroscopy, and voltammetry. The impact of two powerful and demonstrably useful multivariate methods for resolution of complex data, multivariate curve resolution–alternating least squares (MCR–ALS) and parallel factor analysis (PARAFAC), is highlighted through analysis of applications involving the interactions of small molecules with the biopolymers serum albumin and deoxyribonucleic acid. The outcomes illustrated that significant information extracted by the chemometric methods was unattainable by simple, univariate data analysis. In addition, although the techniques used to collect data were confined to ultraviolet–visible spectroscopy, fluorescence spectroscopy, circular dichroism, and voltammetry, data profiles produced by other techniques may also be processed. Topics considered include binding sites and modes, cooperative and competitive small molecule binding, kinetics and thermodynamics of ligand binding, and the folding and unfolding of biopolymers. The applications of the MCR–ALS and PARAFAC methods reviewed were primarily published between 2008 and 2013.
Abstract:
Flos Chrysanthemum is a generic name for a particular group of edible plants, which also have medicinal properties. There are, in fact, twenty to thirty different cultivars, which are commonly used in beverages and for medicinal purposes. In this work, four Flos Chrysanthemum cultivars, Hangju, Taiju, Gongju, and Boju, were collected, and chromatographic fingerprints were used to distinguish and assess these cultivars for quality control purposes. Chromatographic fingerprints contain chemical information but also often have baseline drifts and peak shifts, which complicate data processing; adaptive iteratively reweighted penalized least squares and correlation optimized warping were applied to correct the fingerprint peaks. The adjusted data were submitted to unsupervised and supervised pattern recognition methods. Principal component analysis was used to qualitatively differentiate the Flos Chrysanthemum cultivars. Partial least squares, continuum power regression, and K-nearest neighbors were used to predict the unknown samples. Finally, the elliptic joint confidence region method was used to evaluate the prediction ability of these models. The partial least squares and continuum power regression methods were shown to best represent the experimental results.
Abstract:
A combined data matrix consisting of high performance liquid chromatography–diode array detector (HPLC–DAD) and inductively coupled plasma-mass spectrometry (ICP-MS) measurements of samples from the plant roots of Cortex moutan (CM) produced much better classification and prediction results than those obtained from either of the individual data sets. The HPLC peaks (organic components) of the CM samples and the ICP-MS measurements (trace metal elements) were investigated with the use of principal component analysis (PCA) and linear discriminant analysis (LDA); essentially, the qualitative results suggested that discrimination of the CM samples from three different provinces was possible, with the combined matrix producing the best results. Another three methods, K-nearest neighbor (KNN), back-propagation artificial neural network (BP-ANN) and least squares support vector machines (LS-SVM), were applied for the classification and prediction of the samples. Again, the combined data matrix analyzed by the KNN method produced the best results (100% correct; prediction set data). Additionally, multiple linear regression (MLR) was utilized to explore any relationship between the organic constituents and the metal elements of the CM samples; the extracted linear regression equations showed that the essential metals as well as some metallic pollutants were related to the organic compounds on the basis of their concentrations.