15 resultados para infrared spectroscopy,chemometrics,least squares support vector machines
em Repositório Científico do Instituto Politécnico de Lisboa - Portugal
Resumo:
The development of biopharmaceutical manufacturing processes presents critical constraints, with the major constraint being that living cells synthesize these molecules, presenting inherent behavior variability due to their high sensitivity to small fluctuations in the cultivation environment. To speed up the development process and to control this critical manufacturing step, it is relevant to develop high-throughput and in situ monitoring techniques, respectively. Here, high-throughput mid-infrared (MIR) spectral analysis of dehydrated cell pellets and in situ near-infrared (NIR) spectral analysis of the whole culture broth were compared to monitor plasmid production in recombinant Escherichia coil cultures. Good partial least squares (PLS) regression models were built, either based on MIR or NIR spectral data, yielding high coefficients of determination (R-2) and low predictive errors (root mean square error, or RMSE) to estimate host cell growth, plasmid production, carbon source consumption (glucose and glycerol), and by-product acetate production and consumption. The predictive errors for biomass, plasmid, glucose, glycerol, and acetate based on MIR data were 0.7 g/L, 9 mg/L, 0.3 g/L, 0.4 g/L, and 0.4 g/L, respectively, whereas for NIR data the predictive errors obtained were 0.4 g/L, 8 mg/L, 0.3 g/L, 0.2 g/L, and 0.4 g/L, respectively. The models obtained are robust as they are valid for cultivations conducted with different media compositions and with different cultivation strategies (batch and fed-batch). Besides being conducted in situ with a sterilized fiber optic probe, NIR spectroscopy allows building PLS models for estimating plasmid, glucose, and acetate that are as accurate as those obtained from the high-throughput MIR setup, and better models for estimating biomass and glycerol, yielding a decrease in 57 and 50% of the RMSE, respectively, compared to the MIR setup. However, MIR spectroscopy could be a valid alternative in the case of optimization protocols, due to possible space constraints or high costs associated with the use of multi-fiber optic probes for multi-bioreactors. In this case, MIR could be conducted in a high-throughput manner, analyzing hundreds of culture samples in a rapid and automatic mode.
Resumo:
Infrared spectroscopy, either in the near and mid (NIR/MIR) region of the spectra, has gained great acceptance in the industry for bioprocess monitoring according to Process Analytical Technology, due to its rapid, economic, high sensitivity mode of application and versatility. Due to the relevance of cyprosin (mostly for dairy industry), and as NIR and MIR spectroscopy presents specific characteristics that ultimately may complement each other, in the present work these techniques were compared to monitor and characterize by in situ and by at-line high-throughput analysis, respectively, recombinant cyprosin production by Saccharomyces cerevisiae. Partial least-square regression models, relating NIR and MIR-spectral features with biomass, cyprosin activity, specific activity, glucose, galactose, ethanol and acetate concentration were developed, all presenting, in general, high regression coefficients and low prediction errors. In the case of biomass and glucose slight better models were achieved by in situ NIR spectroscopic analysis, while for cyprosin activity and specific activity slight better models were achieved by at-line MIR spectroscopic analysis. Therefore both techniques enabled to monitor the highly dynamic cyprosin production bioprocess, promoting by this way more efficient platforms for the bioprocess optimization and control.
Resumo:
This work provides an assessment of layerwise mixed models using least-squares formulation for the coupled electromechanical static analysis of multilayered plates. In agreement with three-dimensional (3D) exact solutions, due to compatibility and equilibrium conditions at the layers interfaces, certain mechanical and electrical variables must fulfill interlaminar C-0 continuity, namely: displacements, in-plane strains, transverse stresses, electric potential, in-plane electric field components and transverse electric displacement (if no potential is imposed between layers). Hence, two layerwise mixed least-squares models are here investigated, with two different sets of chosen independent variables: Model A, developed earlier, fulfills a priori the interiaminar C-0 continuity of all those aforementioned variables, taken as independent variables; Model B, here newly developed, rather reduces the number of independent variables, but also fulfills a priori the interlaminar C-0 continuity of displacements, transverse stresses, electric potential and transverse electric displacement, taken as independent variables. The predictive capabilities of both models are assessed by comparison with 3D exact solutions, considering multilayered piezoelectric composite plates of different aspect ratios, under an applied transverse load or surface potential. It is shown that both models are able to predict an accurate quasi-3D description of the static electromechanical analysis of multilayered plates for all aspect ratios.
Resumo:
Chronic liver disease (CLD) is most of the time an asymptomatic, progressive, and ultimately potentially fatal disease. In this study, an automatic hierarchical procedure to stage CLD using ultrasound images, laboratory tests, and clinical records are described. The first stage of the proposed method, called clinical based classifier (CBC), discriminates healthy from pathologic conditions. When nonhealthy conditions are detected, the method refines the results in three exclusive pathologies in a hierarchical basis: 1) chronic hepatitis; 2) compensated cirrhosis; and 3) decompensated cirrhosis. The features used as well as the classifiers (Bayes, Parzen, support vector machine, and k-nearest neighbor) are optimally selected for each stage. A large multimodal feature database was specifically built for this study containing 30 chronic hepatitis cases, 34 compensated cirrhosis cases, and 36 decompensated cirrhosis cases, all validated after histopathologic analysis by liver biopsy. The CBC classification scheme outperformed the nonhierachical one against all scheme, achieving an overall accuracy of 98.67% for the normal detector, 87.45% for the chronic hepatitis detector, and 95.71% for the cirrhosis detector.
Resumo:
Chronic Liver Disease is a progressive, most of the time asymptomatic, and potentially fatal disease. In this paper, a semi-automatic procedure to stage this disease is proposed based on ultrasound liver images, clinical and laboratorial data. In the core of the algorithm two classifiers are used: a k nearest neighbor and a Support Vector Machine, with different kernels. The classifiers were trained with the proposed multi-modal feature set and the results obtained were compared with the laboratorial and clinical feature set. The results showed that using ultrasound based features, in association with laboratorial and clinical features, improve the classification accuracy. The support vector machine, polynomial kernel, outperformed the others classifiers in every class studied. For the Normal class we achieved 100% accuracy, for the chronic hepatitis with cirrhosis 73.08%, for compensated cirrhosis 59.26% and for decompensated cirrhosis 91.67%.
Resumo:
In this work the identification and diagnosis of various stages of chronic liver disease is addressed. The classification results of a support vector machine, a decision tree and a k-nearest neighbor classifier are compared. Ultrasound image intensity and textural features are jointly used with clinical and laboratorial data in the staging process. The classifiers training is performed by using a population of 97 patients at six different stages of chronic liver disease and a leave-one-out cross-validation strategy. The best results are obtained using the support vector machine with a radial-basis kernel, with 73.20% of overall accuracy. The good performance of the method is a promising indicator that it can be used, in a non invasive way, to provide reliable information about the chronic liver disease staging.
Resumo:
Trabalho de Projeto para obtenção do grau de Mestre em Engenharia Informática e de Computadores
Resumo:
In research on Silent Speech Interfaces (SSI), different sources of information (modalities) have been combined, aiming at obtaining better performance than the individual modalities. However, when combining these modalities, the dimensionality of the feature space rapidly increases, yielding the well-known "curse of dimensionality". As a consequence, in order to extract useful information from this data, one has to resort to feature selection (FS) techniques to lower the dimensionality of the learning space. In this paper, we assess the impact of FS techniques for silent speech data, in a dataset with 4 non-invasive and promising modalities, namely: video, depth, ultrasonic Doppler sensing, and surface electromyography. We consider two supervised (mutual information and Fisher's ratio) and two unsupervised (meanmedian and arithmetic mean geometric mean) FS filters. The evaluation was made by assessing the classification accuracy (word recognition error) of three well-known classifiers (knearest neighbors, support vector machines, and dynamic time warping). The key results of this study show that both unsupervised and supervised FS techniques improve on the classification accuracy on both individual and combined modalities. For instance, on the video component, we attain relative performance gains of 36.2% in error rates. FS is also useful as pre-processing for feature fusion. Copyright © 2014 ISCA.
Resumo:
Many learning problems require handling high dimensional datasets with a relatively small number of instances. Learning algorithms are thus confronted with the curse of dimensionality, and need to address it in order to be effective. Examples of these types of data include the bag-of-words representation in text classification problems and gene expression data for tumor detection/classification. Usually, among the high number of features characterizing the instances, many may be irrelevant (or even detrimental) for the learning tasks. It is thus clear that there is a need for adequate techniques for feature representation, reduction, and selection, to improve both the classification accuracy and the memory requirements. In this paper, we propose combined unsupervised feature discretization and feature selection techniques, suitable for medium and high-dimensional datasets. The experimental results on several standard datasets, with both sparse and dense features, show the efficiency of the proposed techniques as well as improvements over previous related techniques.
Resumo:
Arguably, the most difficult task in text classification is to choose an appropriate set of features that allows machine learning algorithms to provide accurate classification. Most state-of-the-art techniques for this task involve careful feature engineering and a pre-processing stage, which may be too expensive in the emerging context of massive collections of electronic texts. In this paper, we propose efficient methods for text classification based on information-theoretic dissimilarity measures, which are used to define dissimilarity-based representations. These methods dispense with any feature design or engineering, by mapping texts into a feature space using universal dissimilarity measures; in this space, classical classifiers (e.g. nearest neighbor or support vector machines) can then be used. The reported experimental evaluation of the proposed methods, on sentiment polarity analysis and authorship attribution problems, reveals that it approximates, sometimes even outperforms previous state-of-the-art techniques, despite being much simpler, in the sense that they do not require any text pre-processing or feature engineering.
Resumo:
Reporter genes are routinely used in every laboratory for molecular and cellular biology for studying heterologous gene expression and general cellular biological mechanisms, such as transfection processes. Although well characterized and broadly implemented, reporter genes present serious limitations, either by involving time-consuming procedures or by presenting possible side effects on the expression of the heterologous gene or even in the general cellular metabolism. Fourier transform mid-infrared (FT-MIR) spectroscopy was evaluated to simultaneously analyze in a rapid (minutes) and high-throughput mode (using 96-wells microplates), the transfection efficiency, and the effect of the transfection process on the host cell biochemical composition and metabolism. Semi-adherent HEK and adherent AGS cell lines, transfected with the plasmid pVAX-GFP using Lipofectamine, were used as model systems. Good partial least squares (PLS) models were built to estimate the transfection efficiency, either considering each cell line independently (R 2 ≥ 0.92; RMSECV ≤ 2 %) or simultaneously considering both cell lines (R 2 = 0.90; RMSECV = 2 %). Additionally, the effect of the transfection process on the HEK cell biochemical and metabolic features could be evaluated directly from the FT-IR spectra. Due to the high sensitivity of the technique, it was also possible to discriminate the effect of the transfection process from the transfection reagent on KEK cells, e.g., by the analysis of spectral biomarkers and biochemical and metabolic features. The present results are far beyond what any reporter gene assay or other specific probe can offer for these purposes.
Resumo:
BACKGROUNDWhile the pharmaceutical industry keeps an eye on plasmid DNA production for new generation gene therapies, real-time monitoring techniques for plasmid bioproduction are as yet unavailable. This work shows the possibility of in situ monitoring of plasmid production in Escherichia coli cultures using a near infrared (NIR) fiber optic probe. RESULTSPartial least squares (PLS) regression models based on the NIR spectra were developed for predicting bioprocess critical variables such as the concentrations of biomass, plasmid, carbon sources (glucose and glycerol) and acetate. In order to achieve robust models able to predict the performance of plasmid production processes, independently of the composition of the cultivation medium, cultivation strategy (batch versus fed-batch) and E. coli strain used, three strategies were adopted, using: (i) E. coliDH5 cultures conducted under different media compositions and culture strategies (batch and fed-batch); (ii) engineered E. coli strains, MG1655endArecApgi and MG1655endArecA, grown on the same medium and culture strategy; (iii) diverse E. coli strains, over batch and fed-batch cultivations and using different media compositions. PLS models showed high accuracy for predicting all variables in the three groups of cultures. CONCLUSIONNIR spectroscopy combined with PLS modeling provides a fast, inexpensive and contamination-free technique to accurately monitoring plasmid bioprocesses in real time, independently of the medium composition, cultivation strategy and the E. coli strain used.
Resumo:
Hyperspectral remote sensing exploits the electromagnetic scattering patterns of the different materials at specific wavelengths [2, 3]. Hyperspectral sensors have been developed to sample the scattered portion of the electromagnetic spectrum extending from the visible region through the near-infrared and mid-infrared, in hundreds of narrow contiguous bands [4, 5]. The number and variety of potential civilian and military applications of hyperspectral remote sensing is enormous [6, 7]. Very often, the resolution cell corresponding to a single pixel in an image contains several substances (endmembers) [4]. In this situation, the scattered energy is a mixing of the endmember spectra. A challenging task underlying many hyperspectral imagery applications is then decomposing a mixed pixel into a collection of reflectance spectra, called endmember signatures, and the corresponding abundance fractions [8–10]. Depending on the mixing scales at each pixel, the observed mixture is either linear or nonlinear [11, 12]. Linear mixing model holds approximately when the mixing scale is macroscopic [13] and there is negligible interaction among distinct endmembers [3, 14]. If, however, the mixing scale is microscopic (or intimate mixtures) [15, 16] and the incident solar radiation is scattered by the scene through multiple bounces involving several endmembers [17], the linear model is no longer accurate. Linear spectral unmixing has been intensively researched in the last years [9, 10, 12, 18–21]. It considers that a mixed pixel is a linear combination of endmember signatures weighted by the correspondent abundance fractions. Under this model, and assuming that the number of substances and their reflectance spectra are known, hyperspectral unmixing is a linear problem for which many solutions have been proposed (e.g., maximum likelihood estimation [8], spectral signature matching [22], spectral angle mapper [23], subspace projection methods [24,25], and constrained least squares [26]). In most cases, the number of substances and their reflectances are not known and, then, hyperspectral unmixing falls into the class of blind source separation problems [27]. Independent component analysis (ICA) has recently been proposed as a tool to blindly unmix hyperspectral data [28–31]. ICA is based on the assumption of mutually independent sources (abundance fractions), which is not the case of hyperspectral data, since the sum of abundance fractions is constant, implying statistical dependence among them. This dependence compromises ICA applicability to hyperspectral images as shown in Refs. [21, 32]. In fact, ICA finds the endmember signatures by multiplying the spectral vectors with an unmixing matrix, which minimizes the mutual information among sources. If sources are independent, ICA provides the correct unmixing, since the minimum of the mutual information is obtained only when sources are independent. This is no longer true for dependent abundance fractions. Nevertheless, some endmembers may be approximately unmixed. These aspects are addressed in Ref. [33]. Under the linear mixing model, the observations from a scene are in a simplex whose vertices correspond to the endmembers. Several approaches [34–36] have exploited this geometric feature of hyperspectral mixtures [35]. Minimum volume transform (MVT) algorithm [36] determines the simplex of minimum volume containing the data. The method presented in Ref. [37] is also of MVT type but, by introducing the notion of bundles, it takes into account the endmember variability usually present in hyperspectral mixtures. The MVT type approaches are complex from the computational point of view. Usually, these algorithms find in the first place the convex hull defined by the observed data and then fit a minimum volume simplex to it. For example, the gift wrapping algorithm [38] computes the convex hull of n data points in a d-dimensional space with a computational complexity of O(nbd=2cþ1), where bxc is the highest integer lower or equal than x and n is the number of samples. The complexity of the method presented in Ref. [37] is even higher, since the temperature of the simulated annealing algorithm used shall follow a log( ) law [39] to assure convergence (in probability) to the desired solution. Aiming at a lower computational complexity, some algorithms such as the pixel purity index (PPI) [35] and the N-FINDR [40] still find the minimum volume simplex containing the data cloud, but they assume the presence of at least one pure pixel of each endmember in the data. This is a strong requisite that may not hold in some data sets. In any case, these algorithms find the set of most pure pixels in the data. PPI algorithm uses the minimum noise fraction (MNF) [41] as a preprocessing step to reduce dimensionality and to improve the signal-to-noise ratio (SNR). The algorithm then projects every spectral vector onto skewers (large number of random vectors) [35, 42,43]. The points corresponding to extremes, for each skewer direction, are stored. A cumulative account records the number of times each pixel (i.e., a given spectral vector) is found to be an extreme. The pixels with the highest scores are the purest ones. N-FINDR algorithm [40] is based on the fact that in p spectral dimensions, the p-volume defined by a simplex formed by the purest pixels is larger than any other volume defined by any other combination of pixels. This algorithm finds the set of pixels defining the largest volume by inflating a simplex inside the data. ORA SIS [44, 45] is a hyperspectral framework developed by the U.S. Naval Research Laboratory consisting of several algorithms organized in six modules: exemplar selector, adaptative learner, demixer, knowledge base or spectral library, and spatial postrocessor. The first step consists in flat-fielding the spectra. Next, the exemplar selection module is used to select spectral vectors that best represent the smaller convex cone containing the data. The other pixels are rejected when the spectral angle distance (SAD) is less than a given thresh old. The procedure finds the basis for a subspace of a lower dimension using a modified Gram–Schmidt orthogonalizati on. The selected vectors are then projected onto this subspace and a simplex is found by an MV T pro cess. ORA SIS is oriented to real-time target detection from uncrewed air vehicles using hyperspectral data [46]. In this chapter we develop a new algorithm to unmix linear mixtures of endmember spectra. First, the algorithm determines the number of endmembers and the signal subspace using a newly developed concept [47, 48]. Second, the algorithm extracts the most pure pixels present in the data. Unlike other methods, this algorithm is completely automatic and unsupervised. To estimate the number of endmembers and the signal subspace in hyperspectral linear mixtures, the proposed scheme begins by estimating sign al and noise correlation matrices. The latter is based on multiple regression theory. The signal subspace is then identified by selectin g the set of signal eigenvalue s that best represents the data, in the least-square sense [48,49 ], we note, however, that VCA works with projected and with unprojected data. The extraction of the end members exploits two facts: (1) the endmembers are the vertices of a simplex and (2) the affine transformation of a simplex is also a simplex. As PPI and N-FIND R algorithms, VCA also assumes the presence of pure pixels in the data. The algorithm iteratively projects data on to a direction orthogonal to the subspace spanned by the endmembers already determined. The new end member signature corresponds to the extreme of the projection. The algorithm iterates until all end members are exhausted. VCA performs much better than PPI and better than or comparable to N-FI NDR; yet it has a computational complexity between on e and two orders of magnitude lower than N-FINDR. The chapter is structure d as follows. Section 19.2 describes the fundamentals of the proposed method. Section 19.3 and Section 19.4 evaluate the proposed algorithm using simulated and real data, respectively. Section 19.5 presents some concluding remarks.
Resumo:
Bifunctional Pt-HMOR catalysts were prepared by incipient wetness impregnation of various desilicated MOR obtained by alkaline treatment using NaOH concentrations ranging from 0.1 to 0.5 M. The zeolite structural changes upon modification were investigated by several techniques including powder X-ray diffraction,Al-27 and Si-29 MAS-NMR spectroscopy, N-2 adsorption, pyridine adsorption followed by infrared spectroscopy and the catalytic model reaction of m-xylene transformation. For low alkaline concentration the zeolite acidity is preserved, along with a slight increase of the volume correspondent to the larger micropores due to the removal of extra-framework debris already existent at the parent zeolite. At higher NaOH concentrations there is a significant loss of crystalinity and acidity as well as the formation of mesoporosity. The characterization of the metal function shows similar patterns for Pt-HMOR and Pt-M/0.1 samples, with Pt particles located mainly inside the inner porosity. In contrast, large Pt particles become visible at the intercrystalline mesoporosity of MOR crystals developed during the desilication treatments at severe alkaline conditions. The catalytic results obtained for n-hexane hydroisomerization showed an improved selectivity for dibranched over monobranched isomers for Pt-M/0.1 sample, likely due to the preservation of the support acidity and the slight enlargement of the micropores. This work is a new example in which the mesoporous development does not improve the catalytic efficiency of the zeolites, whereas mild alkaline desilication might be considered as an effective solution to produce customized catalysts with enhanced performance for a given application. (C) 2014 Elsevier B.V. All rights reserved.
Resumo:
Trabalho Final de Mestrado para obtenção do grau de Mestre em Engenharia Química e Biológica