929 resultados para Multivariate Adaptive Regression Splines (MARS)
Resumo:
Abstract A new LIBS quantitative analysis method based on analytical line adaptive selection and Relevance Vector Machine (RVM) regression model is proposed. First, a scheme of adaptively selecting analytical line is put forward in order to overcome the drawback of high dependency on a priori knowledge. The candidate analytical lines are automatically selected based on the built-in characteristics of spectral lines, such as spectral intensity, wavelength and width at half height. The analytical lines which will be used as input variables of regression model are determined adaptively according to the samples for both training and testing. Second, an LIBS quantitative analysis method based on RVM is presented. The intensities of analytical lines and the elemental concentrations of certified standard samples are used to train the RVM regression model. The predicted elemental concentration analysis results will be given with a form of confidence interval of probabilistic distribution, which is helpful for evaluating the uncertainness contained in the measured spectra. Chromium concentration analysis experiments of 23 certified standard high-alloy steel samples have been carried out. The multiple correlation coefficient of the prediction was up to 98.85%, and the average relative error of the prediction was 4.01%. The experiment results showed that the proposed LIBS quantitative analysis method achieved better prediction accuracy and better modeling robustness compared with the methods based on partial least squares regression, artificial neural network and standard support vector machine.
Resumo:
Thesis (Ph.D.)--University of Washington, 2016-08
Resumo:
Adaptability and invisibility are hallmarks of modern terrorism, and keeping pace with its dynamic nature presents a serious challenge for societies throughout the world. Innovations in computer science have incorporated applied mathematics to develop a wide array of predictive models to support the variety of approaches to counterterrorism. Predictive models are usually designed to forecast the location of attacks. Although this may protect individual structures or locations, it does not reduce the threat—it merely changes the target. While predictive models dedicated to events or social relationships receive much attention where the mathematical and social science communities intersect, models dedicated to terrorist locations such as safe-houses (rather than their targets or training sites) are rare and possibly nonexistent. At the time of this research, there were no publically available models designed to predict locations where violent extremists are likely to reside. This research uses France as a case study to present a complex systems model that incorporates multiple quantitative, qualitative and geospatial variables that differ in terms of scale, weight, and type. Though many of these variables are recognized by specialists in security studies, there remains controversy with respect to their relative importance, degree of interaction, and interdependence. Additionally, some of the variables proposed in this research are not generally recognized as drivers, yet they warrant examination based on their potential role within a complex system. This research tested multiple regression models and determined that geographically-weighted regression analysis produced the most accurate result to accommodate non-stationary coefficient behavior, demonstrating that geographic variables are critical to understanding and predicting the phenomenon of terrorism. This dissertation presents a flexible prototypical model that can be refined and applied to other regions to inform stakeholders such as policy-makers and law enforcement in their efforts to improve national security and enhance quality-of-life.
Resumo:
Neuroimaging research involves analyses of huge amounts of biological data that might or might not be related with cognition. This relationship is usually approached using univariate methods, and, therefore, correction methods are mandatory for reducing false positives. Nevertheless, the probability of false negatives is also increased. Multivariate frameworks have been proposed for helping to alleviate this balance. Here we apply multivariate distance matrix regression for the simultaneous analysis of biological and cognitive data, namely, structural connections among 82 brain regions and several latent factors estimating cognitive performance. We tested whether cognitive differences predict distances among individuals regarding their connectivity pattern. Beginning with 3,321 connections among regions, the 36 edges better predicted by the individuals' cognitive scores were selected. Cognitive scores were related to connectivity distances in both the full (3,321) and reduced (36) connectivity patterns. The selected edges connect regions distributed across the entire brain and the network defined by these edges supports high-order cognitive processes such as (a) (fluid) executive control, (b) (crystallized) recognition, learning, and language processing, and (c) visuospatial processing. This multivariate study suggests that one widespread, but limited number, of regions in the human brain, supports high-level cognitive ability differences. Hum Brain Mapp, 2016. © 2016 Wiley Periodicals, Inc.
Resumo:
The application of laser induced breakdown spectrometry (LIBS) aiming the direct analysis of plant materials is a great challenge that still needs efforts for its development and validation. In this way, a series of experimental approaches has been carried out in order to show that LIBS can be used as an alternative method to wet acid digestions based methods for analysis of agricultural and environmental samples. The large amount of information provided by LIBS spectra for these complex samples increases the difficulties for selecting the most appropriated wavelengths for each analyte. Some applications have suggested that improvements in both accuracy and precision can be achieved by the application of multivariate calibration in LIBS data when compared to the univariate regression developed with line emission intensities. In the present work, the performance of univariate and multivariate calibration, based on partial least squares regression (PLSR), was compared for analysis of pellets of plant materials made from an appropriate mixture of cryogenically ground samples with cellulose as the binding agent. The development of a specific PLSR model for each analyte and the selection of spectral regions containing only lines of the analyte of interest were the best conditions for the analysis. In this particular application, these models showed a similar performance. but PLSR seemed to be more robust due to a lower occurrence of outliers in comparison to the univariate method. Data suggests that efforts dealing with sample presentation and fitness of standards for LIBS analysis must be done in order to fulfill the boundary conditions for matrix independent development and validation. (C) 2009 Elsevier B.V. All rights reserved.
Resumo:
Fourier transform near infrared (FT-NIR) spectroscopy was evaluated as an analytical too[ for monitoring residual Lignin, kappa number and hexenuronic acids (HexA) content in kraft pulps of Eucalyptus globulus. Sets of pulp samples were prepared under different cooking conditions to obtain a wide range of compound concentrations that were characterised by conventional wet chemistry analytical methods. The sample group was also analysed using FT-NIR spectroscopy in order to establish prediction models for the pulp characteristics. Several models were applied to correlate chemical composition in samples with the NIR spectral data by means of PCR or PLS algorithms. Calibration curves were built by using all the spectral data or selected regions. Best calibration models for the quantification of lignin, kappa and HexA were proposed presenting R-2 values of 0.99. Calibration models were used to predict pulp titers of 20 external samples in a validation set. The lignin concentration and kappa number in the range of 1.4-18% and 8-62, respectively, were predicted fairly accurately (standard error of prediction, SEP 1.1% for lignin and 2.9 for kappa). The HexA concentration (range of 5-71 mmol kg(-1) pulp) was more difficult to predict and the SEP was 7.0 mmol kg(-1) pulp in a model of HexA quantified by an ultraviolet (UV) technique and 6.1 mmol kg(-1) pulp in a model of HexA quantified by anion-exchange chromatography (AEC). Even in wet chemical procedures used for HexA determination, there is no good agreement between methods as demonstrated by the UV and AEC methods described in the present work. NIR spectroscopy did provide a rapid estimate of HexA content in kraft pulps prepared in routine cooking experiments.
Resumo:
This paper is part of a large study to assess the adequacy of the use of multivariate statistical techniques in theses and dissertations of some higher education institutions in the area of marketing with theme of consumer behavior from 1997 to 2006. The regression and conjoint analysis are focused on in this paper, two techniques with great potential of use in marketing studies. The objective of this study was to analyze whether the employement of these techniques suits the needs of the research problem presented in as well as to evaluate the level of success in meeting their premisses. Overall, the results suggest the need for more involvement of researchers in the verification of all the theoretical precepts of application of the techniques classified in the category of investigation of dependence among variables.
Resumo:
Fault detection and isolation (FDI) are important steps in the monitoring and supervision of industrial processes. Biological wastewater treatment (WWT) plants are difficult to model, and hence to monitor, because of the complexity of the biological reactions and because plant influent and disturbances are highly variable and/or unmeasured. Multivariate statistical models have been developed for a wide variety of situations over the past few decades, proving successful in many applications. In this paper we develop a new monitoring algorithm based on Principal Components Analysis (PCA). It can be seen equivalently as making Multiscale PCA (MSPCA) adaptive, or as a multiscale decomposition of adaptive PCA. Adaptive Multiscale PCA (AdMSPCA) exploits the changing multivariate relationships between variables at different time-scales. Adaptation of scale PCA models over time permits them to follow the evolution of the process, inputs or disturbances. Performance of AdMSPCA and adaptive PCA on a real WWT data set is compared and contrasted. The most significant difference observed was the ability of AdMSPCA to adapt to a much wider range of changes. This was mainly due to the flexibility afforded by allowing each scale model to adapt whenever it did not signal an abnormal event at that scale. Relative detection speeds were examined only summarily, but seemed to depend on the characteristics of the faults/disturbances. The results of the algorithms were similar for sudden changes, but AdMSPCA appeared more sensitive to slower changes.
Resumo:
A cross-sectional case-control study on the association between the reduced work ability and S. japonicum infection was carried out in a moderate endemic area for schistosomiasis japonica in the southern part of Dongting lake in China. A total of 120 cases with reduced work ability and 240 controls paired to the case by age, sex, occupation and without reduced work ability, participated in the study. The mean age for individuals was 37.6 years old (21-60), the ratio of male: female was 60:40, the prevalence of S. japonicum in the individuals was 28.3%. The results obtained in this study showed that the infection of S. japonicum in case and control groups was 49.2% (59/120) and 17.9% (43/240), respectively. Odds ratio for reduced work ability among those who had schistosomiasis was 4.34 (95%), confidence interval was 2.58-7.34, and among those who had S. japonicum infection (egg per gram > 100) was up to 12.67 (95%), confidence interval was 3.64-46.39. After odds ratio was adjusted by multiple logistic regression, it was confirmed that heavier intensity of S. japonicum infection and splenomegaly due to S. japonicum infection were the main risk factors for reduced work ability in the population studied.
Resumo:
A regression analysis using a linked file of all Swiss births und perinatal deaths 1979-1981 showed a significant relation between birthweight and canton. Sex of infant and multiplicity of birth were significant, too. For live births, marital and socio-economic status of mother and father relate to birthweight. Logistic regressions brought out relationships between the risk of stillbirth and occupation of father, nationality and marital status of mother, apart from birthweight. For live births, only sex and (weakly) marital status and rank of the child were influencial after correction for birthweight.
Resumo:
BACKGROUND AND AIMS: Data from the literature reveal the contrasting influences of family members and friends on the survival of old adults. On one hand, numerous studies have reported a positive association between social relationships and survival. On the other, ties with children may be associated with an increased risk of disability, whereas ties with friends or other relatives tend to improve survival. A five-year prospective, population-based study of 295 Swiss octogenarians tested the hypothesis that having a spouse, siblings or close friends, and regular contacts with relatives or friends are associated with longer survival, even at a very old age. METHODS: Data were collected through individual interviews, and a Cox regression model was applied to assess the effects of kinship and friendship networks on survival, after adjusting for socio-demographic and health-related variables. RESULTS: Our analyses indicate that the presence of a spouse in the household is not significantly related to survival, whereas the presence of siblings at baseline improves the oldest old's chances of surviving five years later. Moreover, the existence of close friends is a central component in the patterns of social relationships of oldest adults, and one which is significantly associated with survival. Overall, the protective effect of social relationships on survival is more related to the quality of those relationships (close friends) than to the frequency of relationships (regular contacts). CONCLUSIONS: We hypothesize that the existence of siblings or close friends may beneficially affect survival, due to the potential influence on the attitudes of octogenarians regarding health practices and adaptive strategies.
Resumo:
Given $n$ independent replicates of a jointly distributed pair $(X,Y)\in {\cal R}^d \times {\cal R}$, we wish to select from a fixed sequence of model classes ${\cal F}_1, {\cal F}_2, \ldots$ a deterministic prediction rule $f: {\cal R}^d \to {\cal R}$ whose risk is small. We investigate the possibility of empirically assessingthe {\em complexity} of each model class, that is, the actual difficulty of the estimation problem within each class. The estimated complexities are in turn used to define an adaptive model selection procedure, which is based on complexity penalized empirical risk.The available data are divided into two parts. The first is used to form an empirical cover of each model class, and the second is used to select a candidate rule from each cover based on empirical risk. The covering radii are determined empirically to optimize a tight upper bound on the estimation error. An estimate is chosen from the list of candidates in order to minimize the sum of class complexity and empirical risk. A distinguishing feature of the approach is that the complexity of each model class is assessed empirically, based on the size of its empirical cover.Finite sample performance bounds are established for the estimates, and these bounds are applied to several non-parametric estimation problems. The estimates are shown to achieve a favorable tradeoff between approximation and estimation error, and to perform as well as if the distribution-dependent complexities of the model classes were known beforehand. In addition, it is shown that the estimate can be consistent,and even possess near optimal rates of convergence, when each model class has an infinite VC or pseudo dimension.For regression estimation with squared loss we modify our estimate to achieve a faster rate of convergence.
Resumo:
Robust estimators for accelerated failure time models with asymmetric (or symmetric) error distribution and censored observations are proposed. It is assumed that the error model belongs to a log-location-scale family of distributions and that the mean response is the parameter of interest. Since scale is a main component of mean, scale is not treated as a nuisance parameter. A three steps procedure is proposed. In the first step, an initial high breakdown point S estimate is computed. In the second step, observations that are unlikely under the estimated model are rejected or down weighted. Finally, a weighted maximum likelihood estimate is computed. To define the estimates, functions of censored residuals are replaced by their estimated conditional expectation given that the response is larger than the observed censored value. The rejection rule in the second step is based on an adaptive cut-off that, asymptotically, does not reject any observation when the data are generat ed according to the model. Therefore, the final estimate attains full efficiency at the model, with respect to the maximum likelihood estimate, while maintaining the breakdown point of the initial estimator. Asymptotic results are provided. The new procedure is evaluated with the help of Monte Carlo simulations. Two examples with real data are discussed.
Resumo:
Panel data can be arranged into a matrix in two ways, called 'long' and 'wide' formats (LFand WF). The two formats suggest two alternative model approaches for analyzing paneldata: (i) univariate regression with varying intercept; and (ii) multivariate regression withlatent variables (a particular case of structural equation model, SEM). The present papercompares the two approaches showing in which circumstances they yield equivalent?insome cases, even numerically equal?results. We show that the univariate approach givesresults equivalent to the multivariate approach when restrictions of time invariance (inthe paper, the TI assumption) are imposed on the parameters of the multivariate model.It is shown that the restrictions implicit in the univariate approach can be assessed bychi-square difference testing of two nested multivariate models. In addition, commontests encountered in the econometric analysis of panel data, such as the Hausman test, areshown to have an equivalent representation as chi-square difference tests. Commonalitiesand differences between the univariate and multivariate approaches are illustrated usingan empirical panel data set of firms' profitability as well as a simulated panel data.
Resumo:
BACKGROUND: The aim of the current study was to assess whether widely used nutritional parameters are correlated with the nutritional risk score (NRS-2002) to identify postoperative morbidity and to evaluate the role of nutritionists in nutritional assessment. METHODS: A randomized trial on preoperative nutritional interventions (NCT00512213) provided the study cohort of 152 patients at nutritional risk (NRS-2002 ≥3) with a comprehensive phenotyping including diverse nutritional parameters (n=17), elaborated by nutritional specialists, and potential demographic and surgical (n=5) confounders. Risk factors for overall, severe (Dindo-Clavien 3-5) and infectious complications were identified by univariate analysis; parameters with P<0.20 were then entered in a multiple logistic regression model. RESULTS: Final analysis included 140 patients with complete datasets. Of these, 61 patients (43.6%) were overweight, and 72 patients (51.4%) experienced at least one complication of any degree of severity. Univariate analysis identified a correlation between few (≤3) active co-morbidities (OR=4.94; 95% CI: 1.47-16.56, p=0.01) and overall complications. Patients screened as being malnourished by nutritional specialists presented less overall complications compared to the not malnourished (OR=0.47; 95% CI: 0.22-0.97, p=0.043). Severe postoperative complications occurred more often in patients with low lean body mass (OR=1.06; 95% CI: 1-1.12, p=0.028). Few (≤3) active co-morbidities (OR=8.8; 95% CI: 1.12-68.99, p=0.008) were related with postoperative infections. Patients screened as being malnourished by nutritional specialists presented less infectious complications (OR=0.28; 95% CI: 0.1-0.78), p=0.014) as compared to the not malnourished. Multivariate analysis identified few co-morbidities (OR=6.33; 95% CI: 1.75-22.84, p=0.005), low weight loss (OR=1.08; 95% CI: 1.02-1.14, p=0.006) and low hemoglobin concentration (OR=2.84; 95% CI: 1.22-6.59, p=0.021) as independent risk factors for overall postoperative complications. Compliance with nutritional supplements (OR=0.37; 95% CI: 0.14-0.97, p=0.041) and supplementation of malnourished patients as assessed by nutritional specialists (OR=0.24; 95% CI: 0.08-0.69, p=0.009) were independently associated with decreased infectious complications. CONCLUSIONS: Nutritional support based upon NRS-2002 screening might result in overnutrition, with potentially deleterious clinical consequences. We emphasize the importance of detailed assessment of the nutritional status by a dedicated specialist before deciding on early nutritional intervention for patients with an initial NRS-2002 score of ≥3.