844 results for partial least square
Abstract:
Complex diseases will have multiple functional sites, and it will be invaluable to understand the cross-locus interaction in terms of linkage disequilibrium (LD) between those sites (epistasis) in addition to the haplotype-LD effects. We investigated the statistical properties of a class of matrix-based statistics to assess this epistasis. These statistical methods include two LD contrast tests (Zaykin et al., 2006) and partial least squares regression (Wang et al., 2008). To estimate Type 1 error rates and power, we simulated multiple two-variant disease models using the SIMLA software package. SIMLA allows for the joint action of up to two disease genes in the simulated data with all possible multiplicative interaction effects between them. Our goal was to detect an interaction between multiple disease-causing variants by means of their LD patterns with other markers. We measured the effects that marginal disease effect size, haplotype LD, disease prevalence and minor allele frequency have on cross-locus interaction (epistasis). In the setting of strong allele effects and strong interaction, the correlation between the two disease genes was weak (r=0.2). In a complex system with multiple correlations (both marginal and interaction), it was difficult to determine the source of a significant result. Despite these complications, the partial least squares and modified LD contrast methods maintained adequate power to detect the epistatic effects; however, in many of the analyses we could not separate interaction from a strong marginal effect. While we did not exhaust the entire parameter space of possible models, we do provide guidance on the effects that population parameters have on cross-locus interaction.
Abstract:
PURPOSE: The role of PM10 in the development of allergic diseases remains controversial among epidemiological studies, partly due to the inability to control for spatial variations in large-scale risk factors. This study aims to investigate spatial correspondence between the level of PM10 and allergic diseases at the sub-district level in Seoul, Korea, in order to evaluate whether the impact of PM10 is observable and spatially varies across the subdistricts. METHODS: PM10 measurements at 25 monitoring stations in the city were interpolated to 424 sub-districts where annual inpatient and outpatient count data for 3 types of allergic diseases (atopic dermatitis, asthma, and allergic rhinitis) were collected. We estimated multiple ordinary least square regression models to examine the association of the PM10 level with each of the allergic diseases, controlling for various sub-district level covariates. Geographically weighted regression (GWR) models were conducted to evaluate how the impact of PM10 varies across the sub-districts. RESULTS: PM10 was found to be a significant predictor of atopic dermatitis patient count (P<0.01), with greater association when spatially interpolated at the sub-district level. No significant effect of PM10 was observed on allergic rhinitis and asthma when socioeconomic factors were controlled for. GWR models revealed spatial variation of PM10 effects on atopic dermatitis across the sub-districts in Seoul. The relationship of PM10 levels to atopic dermatitis patient counts is found to be significant only in the Gangbuk region (P<0.01), along with other covariates including average land value, poverty rate, level of education and apartment rate (P<0.01). CONCLUSIONS: Our findings imply that PM10 effects on allergic diseases might not be consistent throughout Seoul. 
GIS-based spatial modeling techniques could play a role in evaluating spatial variation of air pollution impacts on allergic diseases at the sub-district level, which could provide valuable guidelines for environmental and public health policymakers.
Abstract:
Artificial neural network (ANN) models for water loss (WL) and solid gain (SG) were evaluated as a potential alternative to multiple linear regression (MLR) for osmotic dehydration of apple, banana and potato. The radial basis function (RBF) network with a Gaussian function was used in this study. The RBF employed the orthogonal least square learning method. When predictions of experimental data from MLR and ANN were compared, better agreement was found for ANN models than for MLR models for SG, but not for WL. The coefficient of determination (R²) for SG in MLR models was 0.31, and for ANN was 0.91. The R² in MLR for WL was 0.89, whereas for ANN it was 0.84. Osmotic dehydration experiments found that the amount of WL and SG occurred in the following descending order: Golden Delicious apple > Cox apple > potato > banana. The effect of temperature and concentration of osmotic solution on WL and SG of the plant materials followed a descending order as: 55 > 40 > 32.2 °C and 70 > 60 > 50 > 40%, respectively.
Abstract:
Satellite altimetry has revolutionized our understanding of ocean dynamics thanks to frequent sampling and global coverage. Nevertheless, coastal data have been flagged as unreliable due to land and calm water interference in the altimeter and radiometer footprint and uncertainty in the modelling of high-frequency tidal and atmospheric forcing. Our study addresses the first issue, i.e. altimeter footprint contamination, via retracking, presenting ALES, the Adaptive Leading Edge Subwaveform retracker. ALES is potentially applicable to all the pulse-limited altimetry missions and its aim is to retrack both open ocean and coastal data with the same accuracy using just one algorithm. ALES selects part of each returned echo and models it with a classic "open ocean" Brown functional form, by means of least-squares estimation whose convergence is found through the Nelder-Mead nonlinear optimization technique. By avoiding echoes from bright targets along the trailing edge, it is capable of retrieving more coastal waveforms than the standard processing. By adapting the width of the estimation window according to the significant wave height, it aims at maintaining the accuracy of the standard processing in both the open ocean and the coastal strip. This innovative retracker is validated against tide gauges in the Adriatic Sea and in the Greater Agulhas System for three different missions: Envisat, Jason-1 and Jason-2. Considerations of noise and biases provide a further verification of the strategy. The results show that ALES is able to provide more reliable 20-Hz data for all three missions in areas where even 1-Hz averages are flagged as unreliable in standard products.
Application of the ALES retracker led to roughly half of the analysed tracks showing a marked improvement in correlation with the tide gauge records, with the rms difference being reduced by a factor of 1.5 for Jason-1 and Jason-2 and over 4 for Envisat in the Adriatic Sea (at the closest point to the tide gauge).
Abstract:
The analysis of chironomid taxa and environmental datasets from 46 New Zealand lakes identified temperature (February mean air temperature) and lake production (chlorophyll a (Chl a)) as the main drivers of chironomid distribution. Temperature was the strongest driver of chironomid distribution and consequently produced the most robust inference models. We present two possible temperature transfer functions from this dataset. The most robust model (weighted averaging-partial least squares (WA-PLS), n = 36) was based on a dataset with the most productive (Chl a > 10 µg l⁻¹) lakes removed. This model produced a coefficient of determination (r²_jack) of 0.77, and a root mean squared error of prediction (RMSEP_jack) of 1.31 °C. The Chl a transfer function (partial least squares (PLS), n = 37) was far less reliable, with an r²_jack of 0.49 and an RMSEP_jack of 0.46 log10 µg l⁻¹. Both of these transfer functions could be improved by a revision of the taxonomy for the New Zealand chironomid taxa, particularly the genus Chironomus. The Chironomus morphotype was common in high altitude, cool, oligotrophic lakes and lowland, warm, eutrophic lakes. This could reflect the widespread distribution of one eurythermic species, or the collective distribution of a number of different Chironomus species with more limited tolerances. The Chl a transfer function could also be improved by inputting mean Chl a values into the inference model rather than the spot measurements that were available for this study.
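The jackknifed statistics quoted in this abstract are leave-one-out quantities: each lake is predicted from a model fitted to all the other lakes. The sketch below illustrates that idea only. It assumes a one-predictor straight-line model as a stand-in for WA-PLS (which is not implemented here), takes r²_jack to be the squared correlation between observed and jackknife-predicted values, and uses made-up numbers; none of the names or data come from the study.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def jackknife_predictions(xs, ys):
    """Predict each sample from a model fitted to all the other samples."""
    preds = []
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(a + b * xs[i])
    return preds

def rmsep(obs, preds):
    """Root mean squared error of prediction."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, preds)) / len(obs))

def r_squared(obs, preds):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(obs)
    mo = sum(obs) / n
    mp = sum(preds) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, preds))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in preds)
    return cov * cov / (var_o * var_p)

# Hypothetical data: a biological proxy score and an observed temperature (deg C).
proxy = [0.10, 0.25, 0.32, 0.48, 0.55, 0.71, 0.80, 0.93]
temperature = [8.2, 9.9, 10.1, 12.4, 12.6, 14.9, 15.2, 17.3]

jack_preds = jackknife_predictions(proxy, temperature)
r2_jack = r_squared(temperature, jack_preds)
rmsep_jack = rmsep(temperature, jack_preds)
```

Because every prediction is made for a sample the model has not seen, these values are usually less optimistic than the same statistics computed on the fitted data.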
Abstract:
Raman spectroscopy has been used to predict the abundance of the FA in clarified butterfat that was obtained from dairy cows fed a range of levels of rapeseed oil in their diet. Partial least squares regression of the Raman spectra against FA compositions obtained by GC showed good prediction for the five major (abundance >5%) FA with R²=0.74-0.92 and a root mean SE of prediction (RMSEP) that was 5-7% of the mean. In general, the prediction accuracy fell with decreasing abundance in the sample, but the RMSEP was 1.25%. The Raman method has the best prediction ability for unsaturated FA (R²=0.85-0.92), and in particular trans unsaturated FA (best-predicted FA was 18:1 tΔ9). This enhancement was attributed to the isolation of the unsaturated modes from the saturated modes and the significantly higher spectral response of unsaturated bonds compared with saturated bonds. Raman spectra of the melted butter samples could also be used to predict bulk parameters calculated from standard analyses, such as iodine value (R²=0.80) and solid fat content at low temperature (R²=0.87). For solid fat contents determined at higher temperatures, the prediction ability was significantly reduced (R²=0.42), and this decrease in performance was attributed to the smaller range of values in solid fat content at the higher temperatures. Finally, although the prediction errors for the abundances of each of the FA in a given sample are much larger with Raman than with full GC analysis, the accuracy is acceptably high for quality control applications. This, combined with the fact that Raman spectra can be obtained with no sample preparation and with 60-s data collection times, means that high-throughput, on-line Raman analysis of butter samples should be possible.
Abstract:
This paper introduces the application of linear multivariate statistical techniques, including partial least squares (PLS), canonical correlation analysis (CCA) and reduced rank regression (RRR), into the area of Systems Biology. This new approach aims to extract the important proteins embedded in complex signal transduction pathway models. The analysis is performed on a model of intracellular signalling along the janus-associated kinases/signal transducers and transcription factors (JAK/STAT) and mitogen activated protein kinases (MAPK) signal transduction pathways in interleukin-6 (IL6) stimulated hepatocytes, which produce signal transducer and activator of transcription factor 3 (STAT3). A region of redundancy within the MAPK pathway that does not affect the STAT3 transcription was identified using CCA. This is the core finding of this analysis and cannot be obtained by inspecting the model by eye. In addition, RRR was found to isolate terms that do not significantly contribute to changes in protein concentrations, while the application of PLS does not provide such a detailed picture by virtue of its construction. This analysis has a similar objective to conventional model reduction techniques with the advantage of maintaining the meaning of the states prior to and after the reduction process. A significant model reduction is performed, with a marginal loss in accuracy, offering a more concise model while maintaining the main influencing factors on the STAT3 transcription. The findings offer a deeper understanding of the reaction terms involved, confirm the relevance of several proteins to the production of Acute Phase Proteins and complement existing findings regarding cross-talk between the two signalling pathways.
Abstract:
This paper describes the application of multivariate regression techniques to the Tennessee Eastman benchmark process for modelling and fault detection. Two methods are applied: linear partial least squares, and a nonlinear variant of this procedure using a radial basis function inner relation. The performance of the RBF networks is enhanced through the use of a recently developed training algorithm which uses quasi-Newton optimization to ensure an efficient and parsimonious network; details of this algorithm can be found in this paper. The PLS and PLS/RBF methods are then used to create on-line inferential models of delayed process measurements. As these measurements relate to the final product composition, these models suggest that on-line statistical quality control analysis should be possible for this plant. The generation of 'soft sensors' for these measurements has the further effect of introducing a redundant element into the system, redundancy which can then be used to generate a fault detection and isolation scheme for these sensors. This is achieved by arranging the sensors and models in a manner comparable to the dedicated estimator scheme of Clarke et al. 1975, IEEE Trans. Aero. Elect. Sys., AES-14R, 465-473. The effectiveness of this scheme is demonstrated on a series of simulated sensor and process faults, with full detection and isolation shown to be possible for sensor malfunctions, and detection feasible in the case of process faults. Suggestions for enhancing the diagnostic capacity in the latter case are covered towards the end of the paper.
Abstract:
This study investigates the superposition-based cooperative transmission system. In this system, a key point is for the relay node to detect data transmitted from the source node. This issue has received little attention in the existing literature, as the channel is usually assumed to be flat fading and known a priori. In practice, however, the channel is not only unknown a priori but also subject to frequency-selective fading. Channel estimation is thus necessary. Of particular interest is the channel estimation at the relay node, which imposes extra requirements on the system resources. The authors propose a novel turbo least-square channel estimator by exploiting the superposition structure of the transmission data. The proposed channel estimator not only requires no pilot symbols but also has significantly better performance than the classic approach. The soft-in-soft-out minimum mean square error (MMSE) equaliser is also re-derived to match the superimposed data structure. Finally, computer simulation results are shown to verify the proposed algorithm.
Abstract:
Recently, polymeric adsorbents have been emerging as highly effective alternatives to activated carbons for pollutant removal from industrial effluents. Poly(methyl methacrylate) (PMMA), polymerized using the atom transfer radical polymerization (ATRP) technique, has been investigated for its feasibility to remove phenol from aqueous solution. Adsorption equilibrium and kinetic investigations were undertaken to evaluate the effect of contact time, initial concentration (10-90 mg/L), and temperature (25-55 °C). Phenol uptake was found to increase with increase in initial concentration and agitation time. The adsorption kinetics were found to follow the pseudo-second-order kinetic model. The intra-particle diffusion analysis indicated that film diffusion may be the rate controlling step in the removal process. Experimental equilibrium data were fitted to five different isotherm models, namely Langmuir, Freundlich, Dubinin-Radushkevich, Temkin and Redlich-Peterson, by non-linear least-squares regression and their goodness-of-fit evaluated in terms of mean relative error (MRE) and standard error of estimate (SEE). The adsorption equilibrium data were best represented by Freundlich and Redlich-Peterson isotherms. Thermodynamic parameters such as ΔG° and ΔH° indicated that the sorption process is exothermic and spontaneous in nature and that higher ambient temperature results in more favourable adsorption. © 2011 Elsevier B.V. All rights reserved.
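The isotherm-fitting step described in this abstract can be illustrated with a small sketch. It assumes the Freundlich form q = K·C^m (with m = 1/n), fits it by non-linear least squares using a one-dimensional golden-section search over m (for a fixed m the best K has a closed form), and scores the fit with MRE and SEE. The MRE and SEE formulas are commonly used definitions and are an assumption here, since the abstract does not state them. The data and all function names are hypothetical.

```python
import math

def best_k_for_exponent(conc, uptake, m):
    """For a fixed exponent m, the least-squares K in q = K * C**m is closed form."""
    basis = [c ** m for c in conc]
    return sum(q * b for q, b in zip(uptake, basis)) / sum(b * b for b in basis)

def sse_for_exponent(conc, uptake, m):
    """Sum of squared errors at exponent m, using the best K for that m."""
    k = best_k_for_exponent(conc, uptake, m)
    return sum((q - k * c ** m) ** 2 for c, q in zip(conc, uptake))

def fit_freundlich(conc, uptake, lo=0.01, hi=2.0, tol=1e-8):
    """Golden-section search over m; assumes the error is unimodal on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc = sse_for_exponent(conc, uptake, c)
    fd = sse_for_exponent(conc, uptake, d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = sse_for_exponent(conc, uptake, c)
        else:
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = sse_for_exponent(conc, uptake, d)
    m = (a + b) / 2
    return best_k_for_exponent(conc, uptake, m), m

def mean_relative_error(observed, predicted):
    """Assumed definition: MRE (%) = 100/N * sum(|q_obs - q_pred| / q_obs)."""
    n = len(observed)
    return 100.0 / n * sum(abs(o - p) / o for o, p in zip(observed, predicted))

def standard_error_of_estimate(observed, predicted, n_params):
    """Assumed definition: SEE = sqrt(sum((q_obs - q_pred)**2) / (N - n_params))."""
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sse / (len(observed) - n_params))

# Hypothetical equilibrium data: concentration (mg/L) and uptake (mg/g).
conc_eq = [2.1, 8.4, 17.9, 29.5, 43.0]
uptake_eq = [1.5, 3.1, 4.4, 5.8, 6.9]

k_f, exponent = fit_freundlich(conc_eq, uptake_eq)
fitted = [k_f * c ** exponent for c in conc_eq]
mre = mean_relative_error(uptake_eq, fitted)
see = standard_error_of_estimate(uptake_eq, fitted, n_params=2)
```

Isotherms with more parameters, such as Redlich-Peterson, would need a multi-dimensional optimizer; the one-dimensional search here works only because the Freundlich model is linear in K.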
Abstract:
A study was undertaken to examine a range of sample preparation and near infrared reflectance spectroscopy (NIRS) methodologies, using undried samples, for predicting organic matter digestibility (OMD, g kg⁻¹) and ad libitum intake (g kg⁻¹ W^0.75) of grass silages. A total of eight sample preparation/NIRS scanning methods were examined involving three extents of silage comminution, two liquid extracts and scanning via either external probe (1100-2200 nm) or internal cell (1100-2500 nm). The spectral data (log 1/R) for each of the eight methods were examined by three regression techniques each with a range of data transformations. The 136 silages used in the study were obtained from farms across Northern Ireland, over a two year period, and had in vivo OMD (sheep) and ad libitum intake (cattle) determined under uniform conditions. In the comparisons of the eight sample preparation/scanning methods, and the differing mathematical treatments of the spectral data, the sample population was divided into calibration (n = 91) and validation (n = 45) sets. The standard error of performance (SEP) on the validation set was used in comparisons of prediction accuracy. Across all eight sample preparation/scanning methods, the modified partial least squares (MPLS) technique generally minimized SEPs for both OMD and intake. The accuracy of prediction also increased with degree of comminution of the forage and with scanning by internal cell rather than external probe. The system providing the lowest SEP used the MPLS regression technique on spectra from the finely milled material scanned through the internal cell. This resulted in SEP and R² (variance accounted for in validation set) values of 24 g/kg OM and 0.88 for OMD, and 5.37 g/kg W^0.75 and 0.77 for intake, respectively.
These data indicate that with appropriate techniques NIRS scanning of undried samples of grass silage can produce predictions of intake and digestibility with accuracies similar to those achieved previously using NIRS with dried samples. © 1998 Elsevier Science B.V.
Abstract:
A study combining high resolution mass spectrometry (liquid chromatography-quadrupole time-of-flight-mass spectrometry, UPLC-QTof-MS) and chemometrics for the analysis of post-mortem brain tissue from subjects with Alzheimer’s disease (AD) (n = 15) and healthy age-matched controls (n = 15) was undertaken. The huge potential of this metabolomics approach for distinguishing AD cases is underlined by the correct prediction of disease status in 94–97% of cases. Predictive power was confirmed in a blind test set of 60 samples, reaching 100% diagnostic accuracy. The approach also indicated compounds significantly altered in concentration following the onset of human AD. Using orthogonal partial least-squares discriminant analysis (OPLS-DA), a multivariate model was created for both modes of acquisition explaining the maximum amount of variation between sample groups (Positive Mode-R2 = 97%; Q2 = 93%; root mean squared error of validation (RMSEV) = 13%; Negative Mode-R2 = 99%; Q2 = 92%; RMSEV = 15%). In brain extracts, 1264 and 1457 ions of interest were detected for the different modes of acquisition (positive and negative, respectively). Incorporation of gender into the model increased predictive accuracy and decreased RMSEV values. High resolution UPLC-QTof-MS has not previously been employed to biochemically profile post-mortem brain tissue, and the novel methods described and validated herein prove its potential for making new discoveries related to the etiology, pathophysiology, and treatment of degenerative brain disorders.
Abstract:
The techniques of principal component analysis (PCA) and partial least squares (PLS) are introduced from the point of view of providing a multivariate statistical method for modelling process plants. The advantages and limitations of PCA and PLS are discussed from the perspective of the type of data and problems that might be encountered in this application area. These concepts are exemplified by two case studies dealing first with data from a continuous stirred tank reactor (CSTR) simulation and second a literature source describing a low-density polyethylene (LDPE) reactor simulation.
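As a concrete illustration of the PLS technique this abstract introduces, here is a minimal sketch of single-response PLS (PLS1) in plain Python. It uses the standard NIPALS-style deflation scheme and is not taken from the paper; the function names and the small data set are hypothetical. Each component finds a weight vector along the covariance between the (deflated) predictors and the response, computes scores, and removes the explained part before the next component.

```python
import math

def pls1_fit(X, y, n_components):
    """Fit PLS1 by successive deflation; returns (x_mean, y_mean, components)."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centred predictors
    f = [v - y_mean for v in y]                                # centred response
    components = []
    for _ in range(n_components):
        # Weight vector: direction of maximum covariance between E and f.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # X loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt                         # y loading
        # Deflate: remove what this component explains.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components

def pls1_predict(model, x):
    """Predict one sample by replaying the same deflation steps on it."""
    x_mean, y_mean, components = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat

# Hypothetical process data: two measured variables and one quality variable.
X_demo = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 7.0], [6.0, 5.0]]
y_demo = [3.0 * a - 2.0 * b + 5.0 for a, b in X_demo]

model_demo = pls1_fit(X_demo, y_demo, n_components=2)
y_new = pls1_predict(model_demo, [7.0, 6.0])
```

With as many components as there are independent predictors, PLS1 reproduces the ordinary least-squares fit; the practical interest lies in using fewer components when the predictors are numerous and correlated, as is typical of process plant data.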
Abstract:
Features analysis is an important task which can significantly affect the performance of automatic bacteria colony picking. Unstructured environments also affect the automatic colony screening. This paper presents a novel approach for adaptive colony segmentation in unstructured environments by treating the detected peaks of intensity histograms as a morphological feature of images. In order to avoid disturbing peaks, an entropy based mean shift filter is introduced to smooth images as a preprocessing step. The relevance and importance of these features can be determined in an improved support vector machine classifier using unascertained least square estimation. Experimental results show that the proposed unascertained least square support vector machine (ULSSVM) has better recognition accuracy than the other state-of-the-art techniques, and its training process takes less time than most of the traditional approaches presented in this paper.
Abstract:
The in-line measurement of COD and NH4-N in the WWTP inflow is crucial for the timely monitoring of biological wastewater treatment processes and for the development of advanced control strategies for optimized WWTP operation. As a direct measurement of COD and NH4-N requires expensive and high maintenance in-line probes or analyzers, an approach estimating COD and NH4-N based on standard and spectroscopic in-line inflow measurement systems using Machine Learning Techniques is presented in this paper. The results show that COD estimation using Random Forest Regression with a normalized MSE of 0.3, which is sufficiently accurate for practical applications, can be achieved using only standard in-line measurements. In the case of NH4-N, a good estimation using Partial Least Squares Regression with a normalized MSE of 0.16 is only possible based on a combination of standard and spectroscopic in-line measurements. Furthermore, the comparison of regression and classification methods shows that both methods perform equally well in most cases.
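The abstract does not say how its normalized MSE is defined. One common convention, assumed in the sketch below, divides the mean squared error by the variance of the observed values, so that 0 means a perfect fit and 1 means no better than always predicting the mean. The function name and numbers are hypothetical.

```python
def normalized_mse(observed, predicted):
    """Assumed definition: MSE divided by the (population) variance of the observations."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    variance = sum((o - mean_obs) ** 2 for o in observed) / n
    return mse / variance

# Hypothetical laboratory values and model estimates (mg/L).
lab_values = [410.0, 520.0, 480.0, 630.0, 550.0]
estimates = [430.0, 500.0, 470.0, 600.0, 580.0]
nmse = normalized_mse(lab_values, estimates)
```

Under this convention the reported values of 0.3 and 0.16 would mean that the models leave roughly 30% and 16% of the observed variance unexplained, but that reading holds only if the paper uses the same normalization.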