924 results for Multivariate Adaptive Regression Splines (MARS)


Relevance:

100.00%

Publisher:

Abstract:

2000 Mathematics Subject Classification: 62G08, 62P30.

Relevance:

100.00%

Publisher:

Abstract:

Sediment samples and hydrographic conditions were studied at 28 stations around Iceland. At these sites, Conductivity-Temperature-Depth (CTD) casts were conducted to collect hydrographic data, and multicorer casts were conducted to collect data on sediment characteristics, including grain size distribution, carbon and nitrogen concentration, and chloroplastic pigment concentration. A total of 14 environmental predictors were used to model sediment characteristics around Iceland across regional geographic space. For this, two approaches were used: Multivariate Adaptive Regression Splines (MARS) and randomForest regression models. RandomForest outperformed MARS in predicting grain size distribution. MARS models had a greater tendency to over- and underpredict sediment values in areas outside the environmental envelope defined by the training dataset. We provide the first GIS layers of sediment characteristics around Iceland, which can be used as predictors in future models. Although the models performed well, more samples, especially from the shelf areas, will be needed to improve the models in the future.
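
As a quick illustration of the two approaches, the sketch below fits MARS and a random forest to synthetic stand-ins for the 14 predictors. It assumes the `Earth` class from the `pyearth` package (sklearn-contrib py-earth) and scikit-learn's `RandomForestRegressor`; the data are simulated, not the study's.

```python
# Minimal sketch: MARS vs. random forest regression on synthetic
# stand-ins for the 14 environmental predictors (not the real data).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from pyearth import Earth  # MARS implementation from sklearn-contrib py-earth

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 14))            # 14 hypothetical predictors
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.3, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

mars = Earth(max_degree=2).fit(X_tr, y_tr)  # hinge-function basis
rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X_tr, y_tr)

print("MARS R^2:", r2_score(y_te, mars.predict(X_te)))
print("RF   R^2:", r2_score(y_te, rf.predict(X_te)))
```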

Relevance:

100.00%

Publisher:

Abstract:

Within the regression framework, we show how different levels of nonlinearity influence the instantaneous firing rate prediction of single neurons. Nonlinearity can be achieved in several ways. In particular, we can enrich the predictor set with basis expansions of the input variables (enlarging the number of inputs) or train a simple but different model for each area of the data domain. Spline-based models are popular within the first category. Kernel smoothing methods fall into the second category. Whereas the first choice is useful for globally characterizing complex functions, the second is very handy for temporal data and is able to include inner-state subject variations. Also, interactions among stimuli are considered. We compare state-of-the-art firing rate prediction methods with some more sophisticated spline-based nonlinear methods: multivariate adaptive regression splines and sparse additive models. We also study the impact of kernel smoothing. Finally, we explore the combination of various local models in an incremental learning procedure. Our goal is to demonstrate that appropriate nonlinearity treatment can greatly improve the results. We test our hypothesis on both synthetic data and real neuronal recordings in cat primary visual cortex, giving a plausible explanation of the results from a biological perspective.
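
The two routes to nonlinearity named above can be contrasted in a few lines. The sketch below (all data simulated, names ours) uses scikit-learn's `SplineTransformer` for a global basis expansion and a hand-rolled Nadaraya-Watson smoother for the local kernel route.

```python
# Sketch: global spline basis expansion vs. local kernel smoothing
# on a toy "firing rate" curve (all data simulated).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 10, 300))
rate = np.exp(np.sin(x)) + rng.normal(scale=0.2, size=x.size)

# Route 1: enrich the predictor set with a spline basis, then fit linearly.
spline_model = make_pipeline(SplineTransformer(n_knots=8, degree=3),
                             Ridge(alpha=1.0))
spline_model.fit(x[:, None], rate)

# Route 2: Nadaraya-Watson kernel smoothing (a local model per location).
def kernel_smooth(x_train, y_train, x_eval, bandwidth=0.5):
    w = np.exp(-0.5 * ((x_eval[:, None] - x_train[None, :]) / bandwidth) ** 2)
    return (w * y_train).sum(axis=1) / w.sum(axis=1)

y_spline = spline_model.predict(x[:, None])
y_kernel = kernel_smooth(x, rate, x)
```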

Relevance:

100.00%

Publisher:

Abstract:

1. Recent studies demonstrated the sensitivity of northern forest ecosystems to changes in the amount and duration of snow cover at annual to decadal time scales. However, the consequences of snowfall variability remain uncertain for ecological variables operating at longer time scales, especially the distributions of forest communities.

2. The Great Lakes region of North America offers a unique setting to examine the long-term effects of variable snowfall on forest communities. Lake-effect snow produces a three-fold gradient in annual snowfall over tens of kilometres, and dramatic edaphic variations occur among landform types resulting from Quaternary glaciations. We tested the hypothesis that these factors interact to control the distributions of mesic (dominated by Acer saccharum, Tsuga canadensis and Fagus grandifolia) and xeric forests (dominated by Pinus and Quercus spp.) in northern Lower Michigan.

3. We compiled pre-European-settlement vegetation data and overlaid these data, together with records of climate, water balance and soil, onto Landtype Association polygons in a geographical information system. We then used multivariate adaptive regression splines to model the abundance of mesic vegetation in relation to environmental controls.

4. Snowfall is the most predictive of the five variables retained by our model, and it affects model performance 29% more than soil texture, the second most important variable. The abundance of mesic trees is high on fine-textured soils regardless of snowfall, but it increases with snowfall on coarse-textured substrates. Lake-effect snowfall also determines the species composition within mesic forests. The weighted importance of A. saccharum is significantly greater than that of T. canadensis or F. grandifolia within the lake-effect snowbelt, whereas T. canadensis is more plentiful outside the snowbelt. These patterns are probably driven by the influence of snowfall on soil moisture, nutrient availability and fire return intervals.

5. Our results imply that a key factor dictating the spatio-temporal patterns of forest communities in the vast region around the Great Lakes is how the lake-effect snowfall regime responds to global change. Snowfall reductions will probably cause a major decrease in the abundance of ecologically and economically important species, such as A. saccharum.
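
A hedged sketch of the variable-importance exercise: it assumes the `pyearth` package, whose `Earth` estimator can report GCV-based importances via `feature_importance_type`; the data are simulated and the predictor names are illustrative, not the study's actual layers.

```python
# Sketch: ranking predictors of mesic-vegetation abundance with MARS.
import numpy as np
from pyearth import Earth

rng = np.random.default_rng(2)
names = ["snowfall", "soil_texture", "growing_degree_days",
         "water_balance", "elevation"]
X = rng.normal(size=(400, len(names)))
# Toy response: snowfall dominates and interacts with soil texture.
y = (1.5 * X[:, 0] + 0.8 * np.maximum(X[:, 1], 0) * X[:, 0]
     + rng.normal(scale=0.3, size=400))

model = Earth(max_degree=2, feature_importance_type="gcv").fit(X, y)
for name, imp in zip(names, model.feature_importances_):
    print(f"{name:20s} {imp:.3f}")
```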

Relevance:

100.00%

Publisher:

Abstract:

This paper proposes a numerically simple routine for locally adaptive smoothing. The locally heterogeneous regression function is modelled as a penalized spline whose smoothly varying smoothing parameter is itself modelled as a penalized spline. This is formulated as a hierarchical mixed model, with spline coefficients following a normal distribution that in turn has a smooth structure over the variances. The modelling exercise is in line with Baladandayuthapani, Mallick & Carroll (2005) and Crainiceanu, Ruppert & Carroll (2006). In contrast to those papers, however, Laplace's method is used for estimation based on the marginal likelihood. This is numerically simple and fast, and provides satisfactory results. We also extend the idea to spatial smoothing and to smoothing in the presence of non-normal responses.
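
A compact way to write the hierarchy (our notation; an illustrative sketch, not necessarily the paper's exact formulation):

```latex
% Penalized spline whose coefficient variances are themselves a
% penalized spline in the knot location \kappa.
y_i = \sum_j \beta_j B_j(x_i) + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma_\varepsilon^2),
\qquad \beta_j \sim N\!\big(0, \sigma^2(\kappa_j)\big),
\qquad \log \sigma^2(\kappa) = \sum_k \gamma_k \tilde{B}_k(\kappa),
\quad \gamma_k \sim N(0, \tau^2).
```

Laplace's method then approximates the marginal likelihood obtained by integrating out the spline coefficients, which is what keeps the routine numerically simple.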

Relevance:

100.00%

Publisher:

Abstract:

Genetic polymorphisms in deoxyribonucleic acid coding regions may have a phenotypic effect on the carrier, e.g. by influencing susceptibility to disease. Detection of deleterious mutations via association studies is hampered by the large number of candidate sites; therefore methods are needed to narrow down the search to the most promising sites. For this, a possible approach is to use structural and sequence-based information on the encoded protein to predict whether a mutation at a particular site is likely to disrupt the functionality of the protein itself. We propose a hierarchical Bayesian multivariate adaptive regression spline (BMARS) model for supervised learning in this context and assess its predictive performance by using data from mutagenesis experiments on lac repressor and lysozyme proteins. In these experiments, about 12 amino-acid substitutions were performed at each native amino-acid position and the effect on protein functionality was assessed. The training data thus consist of repeated observations at each position, which the hierarchical framework is needed to account for. The model is trained on the lac repressor data and tested on the lysozyme mutations, and vice versa. In particular, we show that the hierarchical BMARS model, by allowing for the clustered nature of the data, yields lower out-of-sample misclassification rates compared with a BMARS model, a frequentist MARS model, a support vector machine classifier and an optimally pruned classification tree.
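
The clustered structure (roughly twelve substitutions per native position) is the crux. As a hedged illustration of why it must be respected when benchmarking the non-hierarchical baselines, the sketch below evaluates an SVM and a pruned tree with position-level grouping; features, effect sizes and data are invented.

```python
# Sketch: evaluate baseline classifiers with position-level grouping,
# so all substitutions at a site stay in the same CV fold.
import numpy as np
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
n_positions, subs_per_position = 100, 12   # ~12 substitutions per site
groups = np.repeat(np.arange(n_positions), subs_per_position)
X = rng.normal(size=(groups.size, 6))      # structural/sequence features
# Toy label: functionality disrupted when a latent position effect is large.
pos_effect = rng.normal(size=n_positions)[groups]
y = (pos_effect + X[:, 0] + rng.normal(scale=0.5, size=groups.size)) > 0

cv = GroupKFold(n_splits=5)
for clf in (SVC(), DecisionTreeClassifier(ccp_alpha=0.01)):
    acc = cross_val_score(clf, X, y, cv=cv, groups=groups).mean()
    print(type(clf).__name__, f"{acc:.3f}")
```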

Relevance:

100.00%

Publisher:

Abstract:

This paper derives the second-order biases of maximum likelihood estimates from a multivariate normal model where the mean vector and the covariance matrix have parameters in common. We show that the second-order bias can always be obtained by means of ordinary weighted least-squares regressions. We conduct simulation studies which indicate that the bias correction scheme yields nearly unbiased estimators. (C) 2009 Elsevier B.V. All rights reserved.
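
For context, such corrections build on the classical Cox and Snell (1968) expression for the O(n^{-1}) bias of the ML estimator, here in index notation with the kappas denoting cumulants of log-likelihood derivatives and superscripts denoting the inverse Fisher information:

```latex
b(\hat{\theta}_s) \;=\; \sum_{r,t,u} \kappa^{sr}\,\kappa^{tu}
\left\{ \tfrac{1}{2}\,\kappa_{rtu} + \kappa_{rt,u} \right\} \;+\; O(n^{-2}).
```

The paper's contribution is showing that, for this common-parameter multivariate normal model, the quantity above can be computed through ordinary weighted least-squares regressions.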

Relevance:

100.00%

Publisher:

Abstract:

Adaptive autoregressive (AAR) modeling of the EEG time series has been widely used in brain-computer interface (BCI) systems, with the AAR parameters serving as input features for the classification stage. Multivariate adaptive autoregressive (MVAAR) modeling has also been used in the literature. This paper revisits the use of MVAAR models and proposes the use of an adaptive Kalman filter (AKF) for estimating the MVAAR parameters as features in a motor imagery BCI application. The AKF approach is compared with the alternative short-time moving window (STMW) approach to MVAAR parameter estimation. Though the two MVAAR methods show nearly equal classification accuracy, the AKF has the advantage of a higher estimation update rate, making it easily adoptable for on-line BCI systems.
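
A hedged sketch of the idea follows: a generic random-walk Kalman filter for time-varying MVAAR coefficients, not the paper's exact AKF (which adapts the noise covariances); all parameter values are illustrative.

```python
# Sketch: track time-varying MVAAR coefficients with a Kalman filter.
import numpy as np

def mvaar_kalman(y, p=2, q=1e-4, r=1e-2):
    """y: (T, d) multichannel signal; returns (T, d*d*p) coefficient tracks."""
    T, d = y.shape
    n = d * d * p                         # stacked entries of [A_1 ... A_p]
    x, P = np.zeros(n), np.eye(n)         # state mean and covariance
    coeffs = np.zeros((T, n))
    for t in range(p, T):
        past = y[t - p:t][::-1].ravel()   # [y_{t-1}, ..., y_{t-p}]
        H = np.kron(np.eye(d), past)      # d x n observation matrix
        P = P + q * np.eye(n)             # random-walk prediction step
        S = H @ P @ H.T + r * np.eye(d)   # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)    # Kalman gain
        x = x + K @ (y[t] - H @ x)        # measurement update
        P = (np.eye(n) - K @ H) @ P
        coeffs[t] = x
    return coeffs

# e.g. feats = mvaar_kalman(eeg_segment)  # eeg_segment: (T, channels), hypothetical
```

The per-sample coefficient tracks returned here would then serve as the feature vectors for the classification stage.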

Relevance:

100.00%

Publisher:

Abstract:

This paper proposes the use of empirical modeling techniques for building microarchitecture-sensitive models for compiler optimizations. The models we build relate program performance to settings of compiler optimization flags, associated heuristics and key microarchitectural parameters. Unlike traditional analytical modeling methods, this relationship is learned entirely from data obtained by measuring performance at a small number of carefully selected compiler/microarchitecture configurations. We evaluate three different learning techniques in this context, viz. linear regression, adaptive regression splines and radial basis function networks. We use the generated models to a) predict program performance at arbitrary compiler/microarchitecture configurations, b) quantify the significance of complex interactions between optimizations and the microarchitecture, and c) efficiently search for 'optimal' settings of optimization flags and heuristics for any given microarchitectural configuration. Our evaluation using benchmarks from the SPEC CPU2000 suite suggests that accurate models (< 5% average prediction error) can be generated using a reasonable number of simulations. We also find that using compiler settings prescribed by a model-based search can improve program performance by as much as 19% (9.5% on average) over highly optimized binaries.
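
The workflow can be sketched in a few lines: fit a model on a handful of measured flag settings, then search the model instead of the hardware. In the sketch below `measure_cycles` is a hypothetical stand-in for a real measurement or simulation harness, and scikit-learn's `KernelRidge` with an RBF kernel stands in for the paper's RBF network.

```python
# Sketch: learn performance from sampled flag settings, then search
# the learned model for a good configuration.
import itertools
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(4)
n_flags = 10

def measure_cycles(cfg):            # placeholder for a real measurement
    cfg = np.asarray(cfg, dtype=float)
    return 100 - 3 * cfg[0] - 2 * cfg[1] * cfg[2] + rng.normal(scale=0.5)

train_cfgs = rng.integers(0, 2, size=(60, n_flags))   # sampled configs
train_perf = np.array([measure_cycles(c) for c in train_cfgs])

model = KernelRidge(kernel="rbf", gamma=0.1).fit(train_cfgs, train_perf)

all_cfgs = np.array(list(itertools.product([0, 1], repeat=n_flags)))
best = all_cfgs[model.predict(all_cfgs).argmin()]     # predicted best flags
print("predicted best flag setting:", best)
```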

Relevance:

100.00%

Publisher:

Abstract:

The increasing interconnection of information and communication systems leads to ever greater complexity and, with it, a further increase in security vulnerabilities. Classical protection mechanisms such as firewall systems and anti-malware solutions have long ceased to offer adequate protection against intrusions into IT infrastructures. Intrusion detection systems (IDS) have established themselves as a very effective instrument for protection against cyber attacks. Such systems collect and analyze information from network components and hosts in order to detect unusual behavior and security violations automatically. While signature-based approaches can only detect already known attack patterns, anomaly-based IDS are also able to recognize new, previously unknown attacks (zero-day attacks) at an early stage. The core problem of intrusion detection systems, however, lies in processing the enormous volume of network data optimally and in developing an adaptive detection model that works in real time. To address these challenges, this dissertation provides a framework consisting of two main parts. The first part, called OptiFilter, uses a dynamic queuing concept to process the continuously arriving network data, assembles network connections on the fly, and exports structured input data for the IDS. The second part is an adaptive classifier comprising a classifier model based on an Enhanced Growing Hierarchical Self-Organizing Map (EGHSOM), a model of normal network behavior (NNB) and an update model. In OptiFilter, tcpdump and SNMP traps are used to aggregate network packets and host events continuously; these aggregated packets and events are then analyzed further and converted into connection vectors. To improve the detection rate of the adaptive classifier, the artificial neural network GHSOM is investigated intensively and substantially extended. Several approaches are proposed and discussed in this dissertation: a classification-confidence margin threshold is defined to uncover unknown malicious connections; the stability of the growing topology is increased by novel approaches to initializing the weight vectors and by strengthening the winner neurons; and a self-adaptive procedure is introduced to keep the model continuously up to date. Beyond this, the main task of the NNB model is to examine further the unknown connections detected by the EGHSOM and to check whether they are in fact normal. However, network traffic changes constantly as a result of the concept-drift phenomenon, which produces non-stationary network data in real time; this phenomenon is handled by the update model. The EGHSOM model can detect new anomalies effectively, and the NNB model adapts optimally to changes in the network data.
In the experimental evaluation the framework showed promising results. In the first experiment the framework was evaluated in offline mode: OptiFilter was assessed with offline, synthetic and realistic data, and the adaptive classifier was evaluated with 10-fold cross-validation to estimate its accuracy. In the second experiment the framework was deployed on a 1 to 10 Gb network link and evaluated online in real time. OptiFilter successfully converted the enormous volume of network data into structured connection vectors, and the adaptive classifier classified them precisely. A comparative study between the developed framework and other well-known IDS approaches shows that the proposed IDS framework outperforms all of them, which can be attributed to the following key points: the processing of the collected network data, the achievement of the best performance (e.g. overall accuracy), the detection of unknown connections, and the development of a real-time intrusion detection model.
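
A hedged sketch of the classification-confidence margin idea in SOM terms: a connection vector whose distance to its best-matching unit exceeds a margin threshold is flagged as unknown. This is illustrative only; the dissertation's EGHSOM and its thresholding are far richer, and all values below are invented.

```python
# Sketch: margin-threshold check of a connection vector against
# SOM unit prototypes (toy prototypes and threshold).
import numpy as np

def classify_with_margin(x, prototypes, labels, margin):
    """x: connection vector; prototypes: (k, d) unit weight vectors."""
    dist = np.linalg.norm(prototypes - x, axis=1)
    best = dist.argmin()
    if dist[best] > margin:          # low confidence: treat as unknown
        return "unknown"
    return labels[best]              # label of the best-matching unit

prototypes = np.array([[0.1, 0.2], [0.9, 0.8]])
labels = ["normal", "attack"]
print(classify_with_margin(np.array([0.5, 0.5]), prototypes, labels, 0.3))
```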

Relevance:

100.00%

Publisher:

Abstract:

The spatial variability that exists in pavement systems can be conveniently represented by means of random fields; in this study, a probabilistic analysis that considers this spatial variability, including the anisotropic nature of the pavement layer properties, is presented. The integration of the spatially varying log-normal random fields into a linear-elastic finite-difference analysis has been achieved through the expansion optimal linear estimation method. For the estimation of the critical pavement responses, metamodels based on polynomial chaos expansion (PCE) are developed to replace the computationally expensive finite-difference model. A sparse PCE based on an adaptive regression algorithm, enhanced by global sensitivity analysis (GSA), is used, with significant savings in computational effort. The effect of anisotropy in each layer on the pavement responses was studied separately, and an effort is made to identify the pavement layer in which the introduction of anisotropic characteristics has the most significant impact on the critical strains. It is observed that anisotropy in the base layer has a significant but mixed effect on the two critical strains: the compressive strain tends to be considerably higher than that observed for the isotropic section, while the tensile strain shows a decrease in mean value with the introduction of base-layer anisotropy. Furthermore, asphalt-layer anisotropy also tends to decrease the critical tensile strain while having little effect on the critical compressive strain. (C) 2015 American Society of Civil Engineers.
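
As a reminder of the metamodel form (generic notation, not the paper's), a PCE represents a response quantity Y as

```latex
Y \;\approx\; \sum_{\boldsymbol{\alpha} \in \mathcal{A}}
c_{\boldsymbol{\alpha}}\, \Psi_{\boldsymbol{\alpha}}(\boldsymbol{\xi}),
```

where xi collects the independent standard random variables parameterizing the random fields, the Psi are orthonormal multivariate polynomials (Hermite polynomials for Gaussian inputs), and a sparse PCE keeps only the coefficients that the adaptive regression and GSA steps identify as significant.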

Relevance:

100.00%

Publisher:

Abstract:

Software metrics are a key tool in software quality management. In this paper, we propose to use support vector machine regression applied to software metrics to predict software quality. In experiments we compare this method with other regression techniques such as Multivariate Linear Regression, Conjunctive Rule and Locally Weighted Regression. Results on the benchmark dataset MIS, using mean absolute error and correlation coefficient as regression performance measures, indicate that support vector machine regression is a promising technique for software quality prediction. In addition, our investigation of PCA-based metrics extraction shows that using only the first few Principal Components (PCs) we can still obtain relatively good performance.
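
A minimal sketch of the comparison, using the two performance measures named above; the metrics are synthetic stand-ins, since the MIS dataset itself is not bundled with common libraries.

```python
# Sketch: SVR vs. linear regression on simulated software metrics,
# scored by mean absolute error and correlation coefficient.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 11))        # e.g. Halstead/McCabe-style metrics
y = np.exp(0.5 * X[:, 0]) + X[:, 1] + rng.normal(scale=0.2, size=300)

for model in (make_pipeline(StandardScaler(), SVR(C=10.0)),
              LinearRegression()):
    pred = cross_val_predict(model, X, y, cv=5)
    mae = mean_absolute_error(y, pred)
    corr = np.corrcoef(y, pred)[0, 1]
    print(type(model).__name__, f"MAE={mae:.3f}  r={corr:.3f}")
```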

Relevance:

100.00%

Publisher:

Abstract:

The aim of this study was to describe the nonlinear association between body mass index (BMI) and breast cancer outcomes and to determine whether BMI improves prediction of outcomes. A cohort of 906 breast cancer patients diagnosed at Henry Ford Health System, Detroit (1985-1990) was studied. The median follow-up was 10 years. Multivariate logistic regression was used to model breast cancer recurrence/progression and breast cancer-specific death. Restricted cubic splines were used to model nonlinear effects. Receiver operating characteristic areas under the curve (ROC AUC) were used to evaluate prediction. BMI was nonlinearly associated with recurrence/progression and death (p = 0.0230 and 0.0101, respectively). The probability of both outcomes increased as BMI moved away from 25 in either direction. The BMI splines were suggestive of improved prediction of death: the ROC AUCs for nested models with and without BMI were 0.8424 and 0.8331 (p = 0.08). If causally associated, modifying patients' BMI towards 25 may improve outcomes.
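
A hedged sketch of the modelling step: a logistic model with a restricted (natural) cubic spline in BMI, using patsy's `cr()` basis inside a statsmodels formula. The data are simulated, with risk rising as BMI moves away from 25 to mimic the U-shaped association described above.

```python
# Sketch: logistic regression with a restricted cubic spline in BMI.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
bmi = rng.uniform(17, 45, 900)
logit = -1.5 + 0.02 * (bmi - 25) ** 2          # U-shaped risk around BMI 25
death = rng.binomial(1, 1 / (1 + np.exp(-logit)))
dat = pd.DataFrame({"bmi": bmi, "death": death})

# cr() is patsy's natural cubic regression spline basis.
model = smf.logit("death ~ cr(bmi, df=4)", data=dat).fit()
print(model.summary())
```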