3 resultados para Multivariate data analysis
em Repositório Científico do Instituto Politécnico de Lisboa - Portugal
Resumo:
In this paper we present a methodology which enables the graphical representation, in a bi-dimensional Euclidean space, of atmospheric pollutants emissions in European countries. This approach relies on the use of Multidimensional Unfolding (MDU), an exploratory multivariate data analysis technique. This technique illustrates both the relationships between the emitted gases and the gases and their geographical origins. The main contribution of this work concerns the evaluation of MDU solutions. We use simulated data to define thresholds for the model fitting measures, allowing the MDU output quality evaluation. The quality assessment of the model adjustment is thus carried out as a step before interpretation of the gas types and geographical origins results. The MDU maps analysis generates useful insights, with an immediate substantive result and enables the formulation of hypotheses for further analysis and modeling.
Resumo:
Human mesenchymal stem/stromal cells (MSCs) have received considerable attention in the field of cell-based therapies due to their high differentiation potential and ability to modulate immune responses. However, since these cells can only be isolated in very low quantities, successful realization of these therapies requires MSCs ex-vivo expansion to achieve relevant cell doses. The metabolic activity is one of the parameters often monitored during MSCs cultivation by using expensive multi-analytical methods, some of them time-consuming. The present work evaluates the use of mid-infrared (MIR) spectroscopy, through rapid and economic high-throughput analyses associated to multivariate data analysis, to monitor three different MSCs cultivation runs conducted in spinner flasks, under xeno-free culture conditions, which differ in the type of microcarriers used and the culture feeding strategy applied. After evaluating diverse spectral preprocessing techniques, the optimized partial least square (PLS) regression models based on the MIR spectra to estimate the glucose, lactate and ammonia concentrations yielded high coefficients of determination (R2 ≥ 0.98, ≥0.98, and ≥0.94, respectively) and low prediction errors (RMSECV ≤ 4.7%, ≤4.4% and ≤5.7%, respectively). Besides PLS models valid for specific expansion protocols, a robust model simultaneously valid for the three processes was also built for predicting glucose, lactate and ammonia, yielding a R2 of 0.95, 0.97 and 0.86, and a RMSECV of 0.33, 0.57, and 0.09 mM, respectively. Therefore, MIR spectroscopy combined with multivariate data analysis represents a promising tool for both optimization and control of MSCs expansion processes.
Resumo:
Research on cluster analysis for categorical data continues to develop, new clustering algorithms being proposed. However, in this context, the determination of the number of clusters is rarely addressed. We propose a new approach in which clustering and the estimation of the number of clusters is done simultaneously for categorical data. We assume that the data originate from a finite mixture of multinomial distributions and use a minimum message length criterion (MML) to select the number of clusters (Wallace and Bolton, 1986). For this purpose, we implement an EM-type algorithm (Silvestre et al., 2008) based on the (Figueiredo and Jain, 2002) approach. The novelty of the approach rests on the integration of the model estimation and selection of the number of clusters in a single algorithm, rather than selecting this number based on a set of pre-estimated candidate models. The performance of our approach is compared with the use of Bayesian Information Criterion (BIC) (Schwarz, 1978) and Integrated Completed Likelihood (ICL) (Biernacki et al., 2000) using synthetic data. The obtained results illustrate the capacity of the proposed algorithm to attain the true number of cluster while outperforming BIC and ICL since it is faster, which is especially relevant when dealing with large data sets.