918 results for least squares method
Abstract:
Hyperspectral remote sensing exploits the electromagnetic scattering patterns of different materials at specific wavelengths [2, 3]. Hyperspectral sensors have been developed to sample the scattered portion of the electromagnetic spectrum extending from the visible region through the near-infrared and mid-infrared, in hundreds of narrow contiguous bands [4, 5]. The number and variety of potential civilian and military applications of hyperspectral remote sensing is enormous [6, 7]. Very often, the resolution cell corresponding to a single pixel in an image contains several substances (endmembers) [4]. In this situation, the scattered energy is a mixture of the endmember spectra. A challenging task underlying many hyperspectral imagery applications is then decomposing a mixed pixel into a collection of reflectance spectra, called endmember signatures, and the corresponding abundance fractions [8–10]. Depending on the mixing scales at each pixel, the observed mixture is either linear or nonlinear [11, 12]. The linear mixing model holds approximately when the mixing scale is macroscopic [13] and there is negligible interaction among distinct endmembers [3, 14]. If, however, the mixing scale is microscopic (intimate mixtures) [15, 16] and the incident solar radiation is scattered by the scene through multiple bounces involving several endmembers [17], the linear model is no longer accurate. Linear spectral unmixing has been intensively researched in recent years [9, 10, 12, 18–21]. It considers that a mixed pixel is a linear combination of endmember signatures weighted by the corresponding abundance fractions.
Under this model, and assuming that the number of substances and their reflectance spectra are known, hyperspectral unmixing is a linear problem for which many solutions have been proposed (e.g., maximum likelihood estimation [8], spectral signature matching [22], spectral angle mapper [23], subspace projection methods [24, 25], and constrained least squares [26]). In most cases, the number of substances and their reflectances are not known, and hyperspectral unmixing then falls into the class of blind source separation problems [27]. Independent component analysis (ICA) has recently been proposed as a tool to blindly unmix hyperspectral data [28–31]. ICA is based on the assumption of mutually independent sources (abundance fractions), which does not hold for hyperspectral data, since the sum of abundance fractions is constant, implying statistical dependence among them. This dependence compromises ICA applicability to hyperspectral images, as shown in Refs. [21, 32]. In fact, ICA finds the endmember signatures by multiplying the spectral vectors with an unmixing matrix, which minimizes the mutual information among sources. If sources are independent, ICA provides the correct unmixing, since the minimum of the mutual information is obtained only when sources are independent. This is no longer true for dependent abundance fractions. Nevertheless, some endmembers may be approximately unmixed. These aspects are addressed in Ref. [33]. Under the linear mixing model, the observations from a scene are in a simplex whose vertices correspond to the endmembers. Several approaches [34–36] have exploited this geometric feature of hyperspectral mixtures [35]. The minimum volume transform (MVT) algorithm [36] determines the simplex of minimum volume containing the data. The method presented in Ref. [37] is also of MVT type but, by introducing the notion of bundles, it takes into account the endmember variability usually present in hyperspectral mixtures.
The MVT type approaches are complex from the computational point of view. Usually, these algorithms first find the convex hull defined by the observed data and then fit a minimum volume simplex to it. For example, the gift wrapping algorithm [38] computes the convex hull of n data points in a d-dimensional space with a computational complexity of $O(n^{\lfloor d/2 \rfloor + 1})$, where $\lfloor x \rfloor$ is the largest integer less than or equal to x and n is the number of samples. The complexity of the method presented in Ref. [37] is even higher, since the temperature of the simulated annealing algorithm used shall follow a log( ) law [39] to assure convergence (in probability) to the desired solution. Aiming at a lower computational complexity, some algorithms such as the pixel purity index (PPI) [35] and the N-FINDR [40] still find the minimum volume simplex containing the data cloud, but they assume the presence of at least one pure pixel of each endmember in the data. This is a strong requirement that may not hold in some data sets. In any case, these algorithms find the set of most pure pixels in the data. The PPI algorithm uses the minimum noise fraction (MNF) [41] as a preprocessing step to reduce dimensionality and to improve the signal-to-noise ratio (SNR). The algorithm then projects every spectral vector onto skewers (a large number of random vectors) [35, 42, 43]. The points corresponding to extremes, for each skewer direction, are stored. A cumulative count records the number of times each pixel (i.e., a given spectral vector) is found to be an extreme. The pixels with the highest scores are the purest ones. The N-FINDR algorithm [40] is based on the fact that in p spectral dimensions, the p-volume defined by a simplex formed by the purest pixels is larger than any other volume defined by any other combination of pixels. This algorithm finds the set of pixels defining the largest volume by inflating a simplex inside the data.
ORASIS [44, 45] is a hyperspectral framework developed by the U.S. Naval Research Laboratory consisting of several algorithms organized in six modules: exemplar selector, adaptive learner, demixer, knowledge base or spectral library, and spatial postprocessor. The first step consists in flat-fielding the spectra. Next, the exemplar selection module is used to select spectral vectors that best represent the smaller convex cone containing the data. The other pixels are rejected when the spectral angle distance (SAD) is less than a given threshold. The procedure finds the basis for a subspace of a lower dimension using a modified Gram–Schmidt orthogonalization. The selected vectors are then projected onto this subspace and a simplex is found by an MVT process. ORASIS is oriented to real-time target detection from uncrewed air vehicles using hyperspectral data [46]. In this chapter we develop a new algorithm to unmix linear mixtures of endmember spectra. First, the algorithm determines the number of endmembers and the signal subspace using a newly developed concept [47, 48]. Second, the algorithm extracts the most pure pixels present in the data. Unlike other methods, this algorithm is completely automatic and unsupervised. To estimate the number of endmembers and the signal subspace in hyperspectral linear mixtures, the proposed scheme begins by estimating signal and noise correlation matrices. The latter is based on multiple regression theory. The signal subspace is then identified by selecting the set of signal eigenvalues that best represents the data, in the least-squares sense [48, 49]. We note, however, that VCA works with both projected and unprojected data. The extraction of the endmembers exploits two facts: (1) the endmembers are the vertices of a simplex and (2) the affine transformation of a simplex is also a simplex. Like the PPI and N-FINDR algorithms, VCA also assumes the presence of pure pixels in the data.
The algorithm iteratively projects data onto a direction orthogonal to the subspace spanned by the endmembers already determined. The new endmember signature corresponds to the extreme of the projection. The algorithm iterates until all endmembers are exhausted. VCA performs much better than PPI and better than or comparable to N-FINDR; yet it has a computational complexity between one and two orders of magnitude lower than N-FINDR. The chapter is structured as follows. Section 19.2 describes the fundamentals of the proposed method. Section 19.3 and Section 19.4 evaluate the proposed algorithm using simulated and real data, respectively. Section 19.5 presents some concluding remarks.
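The iterative projection step described in this abstract can be illustrated with a minimal sketch. It is not the authors' full algorithm: it omits the subspace estimation and any noise handling, and it assumes noise-free data containing at least one pure pixel per endmember. The function name `extract_pure_pixels` and its parameters are hypothetical.

```python
import numpy as np

def extract_pure_pixels(Y, p, seed=0):
    """Return indices of p candidate pure pixels.

    Y is a (bands, pixels) array of spectral vectors; p is the assumed
    number of endmembers (must not exceed the number of bands).
    """
    rng = np.random.default_rng(seed)
    bands, _ = Y.shape
    E = np.zeros((bands, 0))  # endmember signatures found so far
    indices = []
    for _ in range(p):
        # Start from a random direction ...
        w = rng.standard_normal(bands)
        if E.shape[1] > 0:
            # ... and remove its component inside span(E), so that w is
            # orthogonal to every endmember already determined.
            Q, _ = np.linalg.qr(E)
            w = w - Q @ (Q.T @ w)
        w /= np.linalg.norm(w)
        # Project all pixels onto w; the extreme of the projection is
        # taken as the next endmember.
        proj = w @ Y
        k = int(np.argmax(np.abs(proj)))
        indices.append(k)
        E = np.column_stack([E, Y[:, k]])
    return indices
```

Because a linear function over a simplex attains its extreme at a vertex, each iteration lands on a pure pixel when one is present; the orthogonality constraint keeps earlier endmembers from being selected again.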
Abstract:
Submitted in partial fulfillment for the Requirements for the Degree of PhD in Mathematics, in the Speciality of Statistics in the Faculdade de Ciências e Tecnologia
Abstract:
Over recent decades, the assessments issued by rating agencies have grown in importance and have become a decisive factor in investors' decision-making. Debt issuers are also strongly affected by changes in the ratings these agencies assign. This research aims, on the one hand, to understand whether these agencies have the power to influence the evolution of public debt and what role they play in the financial market. On the other hand, it seeks to identify the determinants of Portuguese public debt and to carry out a percentile analysis in order to assign it a rating. To analyze the factors that may influence public debt, the methodology used is a multiple linear regression estimated by Ordinary Least Squares (OLS), which in the initial scenario comprised eleven independent variables, with public debt as the dependent variable, for the period from 1996 to 2013. Several tests were performed on the initial model in order to find the model with the greatest explanatory power. We were also able to identify an inverse relationship between the rating assigned by these agencies and the evolution of public debt, in the sense that in periods when the rating falls, debt growth is more pronounced. It was not possible, however, to assign a rating to public debt through a percentile analysis.
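As an illustration of the estimation method named in this abstract, the following is a minimal sketch of a multiple linear regression fitted by OLS. The function name `ols_fit` and any data passed to it are hypothetical; the study's eleven variables and its data are not reproduced here.

```python
import numpy as np

def ols_fit(X, y):
    """Fit y = b0 + X @ b by ordinary least squares.

    X is an (n_obs, n_vars) array of independent variables and y the
    dependent variable. Returns the coefficients (intercept first) and R^2.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the model includes an intercept.
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    # Coefficient of determination: share of variance explained.
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, r2
```

With only 18 annual observations (1996 to 2013) and eleven regressors, such a fit leaves few degrees of freedom, which is one reason a model of this kind is usually reduced through testing.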
Abstract:
In this work, kriging with covariates is used to model and map the spatial distribution of salinity measurements gathered by an autonomous underwater vehicle in a sea outfall monitoring campaign aiming to distinguish the effluent plume from the receiving waters and characterize its spatial variability in the vicinity of the discharge. Four different geostatistical linear models for salinity were assumed, where the distance to diffuser, the west-east positioning, and the south-north positioning were used as covariates. Sample variograms were fitted by Matérn models using weighted least squares and maximum likelihood estimation methods as a way to detect possible discrepancies. Typically, the maximum likelihood method estimated very low ranges, which limited the kriging process. So, at least for these data sets, weighted least squares proved to be the most appropriate estimation method for variogram fitting. The kriged maps clearly show the spatial variation of salinity, and it is possible to identify the effluent plume in the area studied. The results obtained suggest some guidelines for sewage monitoring when a geostatistical analysis of the data is intended. It is important to treat anomalous values properly and to adopt a sampling strategy that includes transects parallel and perpendicular to the effluent dispersion.
Abstract:
Dissertation submitted in partial fulfillment of the requirements for the Degree of Master of Science in Geospatial Technologies.
Abstract:
Madin-Darby Canine Kidney (MDCK) cell lines have been extensively evaluated for their potential as host cells for influenza vaccine production. Recent studies allowed the cultivation of these cells in a fully defined medium and in suspension. However, reaching high cell densities in animal cell cultures still remains a challenge. To address this shortcoming, a combined methodology allied with knowledge from systems biology was reported to study the impact of the cell environment on the flux distribution. An optimization of the medium composition was proposed for both a batch and a continuous system in order to reach higher cell densities. To obtain insight into the metabolic activity of these cells, a detailed metabolic model previously developed by Wahl et al. was used. The experimental data of four cultivations of MDCK suspension cells, grown under different conditions and used in this work, came from the Max Planck Institute, Magdeburg, Germany. Classical metabolic flux analysis (MFA) was used to estimate the intracellular flux distribution of each cultivation and was then combined with the partial least squares (PLS) method to establish a link between the estimated metabolic state and the cell environment. The MFA model was validated and its consistency checked. The resulting PLS model explained almost 70% of the variance present in the flux distribution. The medium optimization for the continuous system and for the batch system resulted in higher biomass growth rates than the ones obtained experimentally, 0.034 h⁻¹ and 0.030 h⁻¹, respectively, thus reducing the doubling time by almost 10 hours. Additionally, the optimal medium obtained for the continuous system contained almost no pyruvate. Overall, the proposed methodology seems to be effective, and both proposed medium optimizations seem promising for reaching high cell densities.
Abstract:
Different oil-containing substrates, namely, used cooking oil (UCO), fatty acids-byproduct from biodiesel production (FAB) and olive oil deodorizer distillate (OODD) were tested as inexpensive carbon sources for the production of polyhydroxyalkanoates (PHA) using twelve bacterial strains, in batch experiments. The OODD and FAB were exploited for the first time as alternative substrates for PHA production. Among the tested bacterial strains, Cupriavidus necator and Pseudomonas resinovorans exhibited the most promising results, producing poly-3-hydroxybutyrate, P(3HB), from UCO and OODD and mcl-PHA mainly composed of 3-hydroxyoctanoate (3HO) and 3-hydroxydecanoate (3HD) monomers from OODD, respectively. Afterwards, these bacterial strains were cultivated in bioreactor. C. necator was cultivated in bioreactor using UCO as carbon source. Different feeding strategies were tested for the bioreactor cultivation of C. necator, namely, batch, exponential feeding and DO-stat mode. The highest overall PHA productivity (12.6±0.78 g L⁻¹ day⁻¹) was obtained using DO-stat mode. Apparently, the different feeding regimes had no impact on polymer thermal properties. However, differences in the polymer's molecular mass distribution were observed. C. necator was also tested in batch and fed-batch modes using a different type of oil-containing substrate, extracted from spent coffee grounds (SCG) by supercritical carbon dioxide (sc-CO2). Under fed-batch mode (DO-stat), the overall PHA productivity was 4.7 g L⁻¹ day⁻¹ with a storage yield of 0.77 g g⁻¹. Results showed that SCG can be a bioresource for production of PHA with interesting properties. Furthermore, P. resinovorans was cultivated using OODD as substrate in bioreactor under fed-batch mode (pulse feeding regime). The polymer was highly amorphous, as shown by its low crystallinity of 6±0.2%, with low melting and glass transition temperatures of 36±1.2 and -16±0.8 ºC, respectively.
Due to its sticky behavior at room temperature, adhesiveness and mechanical properties were also studied. Its shear bond strength for wood (67±9.4 kPa) and glass (65±7.3 kPa) suggests it may be used for the development of biobased glues. Bioreactor operation and monitoring with oil-containing substrates is very challenging, since this substrate is water immiscible. Thus, near-infrared spectroscopy (NIR) was implemented for online monitoring of the C. necator cultivation with UCO, using a transflectance probe. Partial least squares (PLS) regression was applied to relate NIR spectra with biomass, UCO and PHA concentrations in the broth. The NIR predictions were compared with values obtained by offline reference methods. Prediction errors for these parameters were 1.18 g L⁻¹, 2.37 g L⁻¹ and 1.58 g L⁻¹ for biomass, UCO and PHA, respectively, which indicates the suitability of the NIR spectroscopy method for online monitoring and as a method to assist bioreactor control. UCO and OODD are low-cost substrates with potential to be used in PHA batch and fed-batch production. The use of NIR in this bioprocess also opened an opportunity for optimization and control of the PHA production process.
Abstract:
The photometric determination of ascorbic acid with the "E. E. L. portable colorimeter" can be carried out rapidly and conveniently using either 3% HPO3 or 0.4% (COOH)2 as protective agent. The standards contain from 2 to 20 micrograms of ascorbic acid per ml of metaphosphoric or oxalic acid solution. We mix 10 ml of these solutions with 3 ml of the appropriate citrate buffer solution, and we pipet 5 ml of the resulting mixture into a matched test tube containing 5 ml of sodium 2,6-dichlorobenzenoneindophenol (80 mg per liter); we then shake well and, after 15 seconds, read the extinction using a green filter. The readings are subtracted from the blank reading. Designating the differences by x and the concentrations of ascorbic acid per ml in the standards by y, we get, with the aid of the method of least squares, the following regression equations: for metaphosphoric acid, y = 0.543x + 0.629; for oxalic acid, y = 0.516x + 0.422. These permit, by interpolation, the determination of the ascorbic acid content in plant materials.
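The two calibration lines reported in this abstract can be applied as in the short sketch below. The function name `ascorbic_acid_conc` is hypothetical, and the result is assumed to be in micrograms of ascorbic acid per ml, matching the units of the standards. Since the standards span 2 to 20 micrograms per ml, results outside that range would be extrapolations.

```python
# Slope and intercept of each regression line, as given in the abstract.
CALIBRATION = {
    "metaphosphoric": (0.543, 0.629),
    "oxalic": (0.516, 0.422),
}

def ascorbic_acid_conc(x, acid="metaphosphoric"):
    """Estimate ascorbic acid concentration from a reading difference x.

    x is the blank reading minus the sample reading; acid selects which
    protective agent (and therefore which regression line) was used.
    """
    slope, intercept = CALIBRATION[acid]
    return slope * x + intercept
```

For a hypothetical reading difference of x = 10, the metaphosphoric acid line gives 0.543 × 10 + 0.629 = 6.059, and the oxalic acid line gives 0.516 × 10 + 0.422 = 5.582.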
Abstract:
Leaders must scan the internal and external environment, chart strategic and task objectives, and provide performance feedback. These instrumental leadership (IL) functions go beyond the motivational and quid pro quo leader behaviors that comprise the full-range (transformational, transactional, and laissez-faire) leadership model. In four studies we examined the construct validity of IL. We found evidence for a four-factor IL model that was highly prototypical of good leadership. IL predicted top-level leader emergence controlling for the full-range factors, initiating structure, and consideration. It also explained unique variance in outcomes beyond the full-range factors; the effects of transformational leadership were vastly overstated when IL was omitted from the model. We discuss the importance of a "fuller full-range" leadership theory for theory and practice. We also showcase our methodological contributions regarding corrections for common method variance (i.e., endogeneity) bias using two-stage least squares (2SLS) regression and Monte Carlo split-sample designs.
Abstract:
Several methods have been suggested to estimate non-linear models with interaction terms in the presence of measurement error. Structural equation models eliminate measurement error bias, but require large samples. Ordinary least squares regression on summated scales, regression on factor scores and partial least squares are appropriate for small samples but do not correct measurement error bias. Two-stage least squares regression does correct measurement error bias, but the results strongly depend on the instrumental variable choice. This article discusses the old disattenuated regression method as an alternative for correcting measurement error in small samples. The method is extended to the case of interaction terms and is illustrated on a model that examines the interaction effect of innovation and style of use of budgets on business performance. Alternative reliability estimates that can be used to disattenuate the estimates are discussed. A comparison is made with the alternative methods. Methods that do not correct for measurement error bias perform very similarly and considerably worse than disattenuated regression.
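The basic idea of disattenuation can be shown with the classical correction for attenuation. The sketch below covers only the simplest cases (one correlation, and one slope with an error-prone predictor); it does not implement the article's extension to interaction terms. The function names and the reliability values in the usage note are hypothetical.

```python
import math

def disattenuate_correlation(r_xy, rel_x, rel_y):
    """Correct an observed correlation for unreliability in both measures.

    rel_x and rel_y are reliability estimates in (0, 1], e.g. Cronbach's alpha.
    """
    return r_xy / math.sqrt(rel_x * rel_y)

def disattenuate_slope(b_obs, rel_x):
    """Correct a simple-regression slope for measurement error in the predictor.

    With a single predictor and error only in x, the observed slope is
    shrunk by the reliability of x, so dividing by rel_x undoes the bias.
    """
    return b_obs / rel_x
```

For instance, an observed correlation of 0.30 between two scales with reliabilities 0.80 and 0.50 corresponds to a corrected correlation of 0.30 / sqrt(0.40), roughly 0.47.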
Abstract:
OBJECTIVE Our objective was to test the efficacy and tolerability of three doses of flutamide (125, 250, and 375 mg) combined with a triphasic oral contraceptive (ethynylestradiol/levonorgestrel) during 12 months to treat moderate to severe hirsutism in patients with polycystic ovary syndrome or idiopathic hirsutism. DESIGN We conducted a randomized, double-blind, placebo-controlled, parallel clinical trial. PATIENTS A total of 131 premenopausal women, suffering from moderate to severe hirsutism, were randomized to placebo or 125, 250, or 375 mg flutamide daily combined with a triphasic oral contraceptive pill. Hirsutism (Ferriman-Gallwey), acne and seborrhea (Cremoncini), and hormone serum levels were monitored at baseline and at 3 (except hormone serum levels), 6, and 12 months. Side effects and biochemical, hematological, and hepatic parameters were assessed. METHODS We used three-way ANOVA (subject, dose, and visit) with Scheffé adjustment for multiple comparisons or the nonparametric Friedman test and least-squares mean (paired data) and the Kruskal-Wallis test for unpaired data analyses. We used the chi-square or Fisher's test for categorical data. RESULTS A total of 119 patients were included in the intention-to-treat analysis. All flutamide doses induced a significant decrease in hirsutism, acne, and seborrhea scores after 12 months compared with placebo without differences among dose levels. Similar related side effects were observed with placebo and 125 mg flutamide (12.5%), and slightly higher with 250 mg (17.3%) and 375 mg (21.2%). No statistically significant differences were observed either among doses or compared with placebo. CONCLUSIONS Flutamide at 125 mg daily during 12 months was the minimum effective dose to diminish hirsutism in patients with polycystic ovary syndrome or with idiopathic hirsutism.
Abstract:
BACKGROUND Functional brain images such as Single-Photon Emission Computed Tomography (SPECT) and Positron Emission Tomography (PET) have been widely used to guide clinicians in Alzheimer's Disease (AD) diagnosis. However, the subjectivity involved in their evaluation has favoured the development of Computer Aided Diagnosis (CAD) Systems. METHODS A novel combination of feature extraction techniques is proposed to improve the diagnosis of AD. Firstly, Regions of Interest (ROIs) are selected by means of a t-test carried out on 3D Normalised Mean Square Error (NMSE) features restricted to be located within a predefined brain activation mask. In order to address the small sample-size problem, the dimension of the feature space was further reduced by: Large Margin Nearest Neighbours using a rectangular matrix (LMNN-RECT), Principal Component Analysis (PCA) or Partial Least Squares (PLS) (the two latter also analysed with an LMNN transformation). Regarding the classifiers, kernel Support Vector Machines (SVMs) and LMNN using Euclidean, Mahalanobis and Energy-based metrics were compared. RESULTS Several experiments were conducted in order to evaluate the proposed LMNN-based feature extraction algorithms and their benefits as: i) linear transformation of the PLS or PCA reduced data, ii) feature reduction technique, and iii) classifier (with Euclidean, Mahalanobis or Energy-based methodology). The system was evaluated by means of k-fold cross-validation, yielding accuracy, sensitivity and specificity values of 92.78%, 91.07% and 95.12% (for SPECT) and 90.67%, 88% and 93.33% (for PET), respectively, when an NMSE-PLS-LMNN feature extraction method was used in combination with an SVM classifier, thus outperforming recently reported baseline methods. CONCLUSIONS All the proposed methods turned out to be a valid solution for the presented problem.
One of the advances is the robustness of the LMNN algorithm, which not only provides a higher separation rate between the classes but also makes (in combination with NMSE and PLS) this rate variation more stable. In addition, their generalization ability is another advance, since several experiments were performed on two image modalities (SPECT and PET).
Abstract:
Customer satisfaction and retention are key issues for organizations in today’s competitive marketplace. As such, much research and revenue have been invested in developing accurate ways of assessing consumer satisfaction at both the macro (national) and micro (organizational) level, facilitating comparisons in performance both within and between industries. Since the instigation of the national customer satisfaction indices (CSI), partial least squares (PLS) has been used to estimate the CSI models in preference to structural equation models (SEM) because PLS does not rely on strict assumptions about the data. However, this choice was based upon some misconceptions about the use of SEM and does not take into consideration more recent advances in SEM, including estimation methods that are robust to non-normality and missing data. In this paper, both SEM and PLS approaches were compared by evaluating perceptions of the Isle of Man Post Office Products and Customer service using a CSI format. The new robust SEM procedures were found to be advantageous over PLS. Product quality was found to be the only driver of customer satisfaction, while image and satisfaction were the only predictors of loyalty, thus arguing for the specificity of postal services.
Abstract:
For a wide range of environmental, hydrological, and engineering applications there is a fast growing need for high-resolution imaging. In this context, waveform tomographic imaging of crosshole georadar data is a powerful method able to provide images of pertinent electrical properties in near-surface environments with unprecedented spatial resolution. In contrast, conventional ray-based tomographic methods, which consider only a very limited part of the recorded signal (first-arrival traveltimes and maximum first-cycle amplitudes), suffer from inherent limitations in resolution and may prove to be inadequate in complex environments. For a typical crosshole georadar survey the potential improvement in resolution when using waveform-based approaches instead of ray-based approaches is in the range of one order of magnitude. Moreover, the spatial resolution of waveform-based inversions is comparable to that of common logging methods. While in exploration seismology waveform tomographic imaging has become well established over the past two decades, it is comparably still underdeveloped in the georadar domain despite corresponding needs. Recently, different groups have presented finite-difference time-domain waveform inversion schemes for crosshole georadar data, which are adaptations and extensions of Tarantola's seminal nonlinear generalized least-squares approach developed for the seismic case. First applications of these new crosshole georadar waveform inversion schemes on synthetic and field data have shown promising results. However, there is little known about the limits and performance of such schemes in complex environments.
To this end, the general motivation of my thesis is the evaluation of the robustness and limitations of waveform inversion algorithms for crosshole georadar data in order to apply such schemes to a wide range of real world problems. One crucial issue in making any waveform scheme applicable and effective for real-world crosshole georadar problems is the accurate estimation of the source wavelet, which is unknown in reality. Waveform inversion schemes for crosshole georadar data require forward simulations of the wavefield in order to iteratively solve the inverse problem. Therefore, accurate knowledge of the source wavelet is critically important for successful application of such schemes. Relatively small differences in the estimated source wavelet shape can lead to large differences in the resulting tomograms. In the first part of my thesis, I explore the viability and robustness of a relatively simple iterative deconvolution technique that incorporates the estimation of the source wavelet into the waveform inversion procedure rather than adding additional model parameters into the inversion problem. Extensive tests indicate that this source wavelet estimation technique is simple yet effective, and is able to provide remarkably accurate and robust estimates of the source wavelet in the presence of strong heterogeneity in both the dielectric permittivity and electrical conductivity as well as significant ambient noise in the recorded data. Furthermore, our tests also indicate that the approach is insensitive to the phase characteristics of the starting wavelet, which is not the case when directly incorporating the wavelet estimation into the inverse problem. Another critical issue with crosshole georadar waveform inversion schemes which clearly needs to be investigated is the consequence of the common assumption of frequency-independent electromagnetic constitutive parameters.
This is crucial since in reality, these parameters are known to be frequency-dependent and complex and thus recorded georadar data may show significant dispersive behaviour. In particular, in the presence of water, there is a wide body of evidence showing that the dielectric permittivity can be significantly frequency dependent over the GPR frequency range, due to a variety of relaxation processes. The second part of my thesis is therefore dedicated to the evaluation of the reconstruction limits of a non-dispersive crosshole georadar waveform inversion scheme in the presence of varying degrees of dielectric dispersion. I show that the inversion algorithm, combined with the iterative deconvolution-based source wavelet estimation procedure that is partially able to account for the frequency-dependent effects through an "effective" wavelet, performs remarkably well in weakly to moderately dispersive environments and has the ability to provide adequate tomographic reconstructions.
Abstract:
We construct a weighted Euclidean distance that approximates any distance or dissimilarity measure between individuals that is based on a rectangular cases-by-variables data matrix. In contrast to regular multidimensional scaling methods for dissimilarity data, the method leads to biplots of individuals and variables while preserving all the good properties of dimension-reduction methods that are based on the singular-value decomposition. The main benefits are the decomposition of variance into components along principal axes, which provide the numerical diagnostics known as contributions, and the estimation of nonnegative weights for each variable. The idea is inspired by the distance functions used in correspondence analysis and in principal component analysis of standardized data, where the normalizations inherent in the distances can be considered as differential weighting of the variables. In weighted Euclidean biplots we allow these weights to be unknown parameters, which are estimated from the data to maximize the fit to the chosen distances or dissimilarities. These weights are estimated using a majorization algorithm. Once this extra weight-estimation step is accomplished, the procedure follows the classical path in decomposing the matrix and displaying its rows and columns in biplots.
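The distance at the centre of this abstract can be written out as a short sketch. Here the weights are simply supplied by the caller; the abstract's method instead estimates them from the data with a majorization algorithm, which is not reproduced. The function name `weighted_euclidean` is hypothetical.

```python
import numpy as np

def weighted_euclidean(X, w):
    """Pairwise weighted Euclidean distances between the rows of X.

    X is a cases-by-variables matrix and w holds one nonnegative weight
    per variable: d(i, j) = sqrt(sum_k w[k] * (X[i, k] - X[j, k])**2).
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    # diff[i, j, k] is the difference between cases i and j on variable k.
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt(np.einsum("ijk,k->ij", diff ** 2, w))
```

Setting every weight to 1 recovers the ordinary Euclidean distance, while setting each weight to the inverse variance of its variable gives the distance implied by principal component analysis of standardized data, one of the cases the abstract cites as inspiration.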