971 results for ALS data-set


Relevance:

80.00%

Publisher:

Abstract:

This paper introduces a new neurofuzzy model construction and parameter estimation algorithm from observed finite data sets, based on a Takagi and Sugeno (T-S) inference mechanism and a new extended Gram-Schmidt orthogonal decomposition algorithm, for the modeling of a priori unknown dynamical systems in the form of a set of fuzzy rules. The first contribution of the paper is the introduction of a one-to-one mapping between a fuzzy rule-base and a model matrix feature subspace using the T-S inference mechanism. This link enables the numerical properties associated with a rule-based matrix subspace, the relationships amongst these matrix subspaces, and the correlation between the output vector and a rule-base matrix subspace, to be investigated and extracted as rule-based knowledge to enhance model transparency. The matrix subspace spanned by a fuzzy rule is initially derived as the input regression matrix multiplied by a weighting matrix that consists of the corresponding fuzzy membership functions over the training data set. Model transparency is explored by the derivation of an equivalence between an A-optimality experimental design criterion of the weighting matrix and the average model output sensitivity to the fuzzy rule, so that rule-bases can be effectively measured by their identifiability via the A-optimality experimental design criterion. The A-optimality experimental design criterion of the weighting matrices of fuzzy rules is used to construct an initial model rule-base. An extended Gram-Schmidt algorithm is then developed to estimate the parameter vector for each rule. This new algorithm decomposes the model rule-bases via an orthogonal subspace decomposition approach, so as to enhance model transparency with the capability of interpreting the derived rule-base energy level. This new approach is computationally simpler than the conventional Gram-Schmidt algorithm for resolving high dimensional regression problems, whereby it is computationally desirable to decompose complex models into a few submodels rather than a single model with a large number of input variables and the associated curse of dimensionality problem. Numerical examples are included to demonstrate the effectiveness of the proposed new algorithm.

Relevance:

80.00%

Publisher:

Abstract:

A fundamental principle in practical nonlinear data modeling is the parsimonious principle of constructing the minimal model that explains the training data well. Leave-one-out (LOO) cross validation is often used to estimate generalization errors when choosing amongst different network architectures (M. Stone, "Cross validatory choice and assessment of statistical predictions", J. R. Statist. Soc., Ser. B, 36, pp. 117-147, 1974). Based upon the minimization of LOO criteria of either the mean squares of LOO errors or the LOO misclassification rate respectively, we present two backward elimination algorithms as model post-processing procedures for regression and classification problems. The proposed backward elimination procedures exploit an orthogonalization procedure to ensure orthogonality between the subspace spanned by the pruned model and the deleted regressor. Subsequently, it is shown that the LOO criteria used in both algorithms can be calculated via an analytic recursive formula, as derived in this contribution, without actually splitting the estimation data set, so as to reduce computational expense. Compared to most other model construction methods, the proposed algorithms are advantageous in several aspects: (i) there are no tuning parameters to be optimized through an extra validation data set; (ii) the procedure is fully automatic without an additional stopping criterion; and (iii) the model structure selection is directly based on model generalization performance. The illustrative examples on regression and classification are used to demonstrate that the proposed algorithms are viable post-processing methods to prune a model to gain extra sparsity and improved generalization.

Relevance:

80.00%

Publisher:

Abstract:

This letter introduces a new robust nonlinear identification algorithm using the Predicted REsidual Sums of Squares (PRESS) statistic and forward regression. The major contribution is to compute the PRESS statistic within a framework of a forward orthogonalization process and hence construct a model with a good generalization property. Based on the properties of the PRESS statistic, the proposed algorithm can achieve a fully automated procedure without resort to any other validation data set for iterative model evaluation.

Relevance:

80.00%

Publisher:

Abstract:

An automatic nonlinear predictive model-construction algorithm is introduced based on forward regression and the predicted-residual-sums-of-squares (PRESS) statistic. The proposed algorithm is based on the fundamental concept of evaluating a model's generalisation capability through cross-validation. This is achieved by using the PRESS statistic as a cost function to optimise model structure. In particular, the proposed algorithm is developed with the aim of achieving computational efficiency, such that the computational effort, which would usually be extensive in the computation of the PRESS statistic, is reduced or minimised. The computation of PRESS is simplified by avoiding a matrix inversion through the use of the orthogonalisation procedure inherent in forward regression, and is further reduced significantly by the introduction of a forward-recursive formula. Based on the properties of the PRESS statistic, the proposed algorithm can achieve a fully automated procedure without resort to any other validation data set for iterative model evaluation. Numerical examples are used to demonstrate the efficacy of the algorithm.
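As an illustration of why orthogonalisation removes the matrix inversion, the sketch below relies on the standard identity for linear-in-the-parameters least squares: the leave-one-out residual equals the ordinary residual divided by one minus the leverage, and with mutually orthogonal regressors the leverage is a simple sum. It is not the forward-recursive formula or the term-selection procedure of the abstract; the function names and the quadratic toy data are assumptions made for the example. A brute-force refit is included only to check the shortcut.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def press_orthogonal(columns, y):
    """PRESS from one Gram-Schmidt pass, with no matrix inversion and no refits."""
    ortho = []
    residual = list(y)
    leverage = [0.0] * len(y)
    for col in columns:
        # Orthogonalise the new regressor against those already in the model.
        w = list(col)
        for q in ortho:
            c = dot(q, w) / dot(q, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        ww = dot(w, w)
        g = dot(w, y) / ww  # weight of the orthogonal regressor
        residual = [r - g * wi for r, wi in zip(residual, w)]
        leverage = [h + wi * wi / ww for h, wi in zip(leverage, w)]
        ortho.append(w)
    # Leave-one-out residual = ordinary residual / (1 - leverage).
    return sum((r / (1.0 - h)) ** 2 for r, h in zip(residual, leverage))


def press_brute_force(columns, y):
    """PRESS by actually deleting each sample and refitting (reference only)."""
    n = len(y)
    total = 0.0
    for i in range(n):
        keep = [j for j in range(n) if j != i]
        y_train = [y[j] for j in keep]
        ortho, held_out = [], []
        prediction = 0.0
        for col in columns:
            w = [col[j] for j in keep]
            t = col[i]  # same linear combination, evaluated at the deleted sample
            for q, tq in zip(ortho, held_out):
                c = dot(q, w) / dot(q, q)
                w = [wi - c * qi for wi, qi in zip(w, q)]
                t -= c * tq
            ortho.append(w)
            held_out.append(t)
            prediction += dot(w, y_train) / dot(w, w) * t
        total += (y[i] - prediction) ** 2
    return total


# Hypothetical data: a quadratic model fitted to six samples.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 2.7, 5.8, 11.1, 16.9, 27.2]
columns = [[1.0] * len(x), x, [v * v for v in x]]

press_fast = press_orthogonal(columns, y)
press_slow = press_brute_force(columns, y)
```

The two values agree up to rounding, while the orthogonal version needs a single pass over the regressors instead of one refit per sample.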

Relevance:

80.00%

Publisher:

Abstract:

The River Lugg has particular problems with high sediment loads that have resulted in detrimental impacts on ecology and fisheries. A new dynamic, process-based model of hydrology and sediments (INCA-SED) has been developed and applied to the River Lugg system using an extensive data set from 1995–2008. The model simulates sediment sources and sinks throughout the catchment and gives a good representation of the sediment response at 22 reaches along the River Lugg. A key question considered in using the model is the management of sediment sources so that concentrations and bed loads can be reduced in the river system. Altogether, five sediment management scenarios were selected for testing on the River Lugg, including land use change, contour tillage, hedging and buffer strips. Running the model with parameters altered to simulate these five scenarios produced some interesting results. All scenarios achieved some reduction in sediment levels, with the 40% land use change achieving the best result with a 19% reduction. The other scenarios also achieved significant reductions of between 7% and 9%. Buffer strips produce the best result at close to 9%. The results suggest that if hedge introduction, contour tillage and buffer strips were all applied, sediment reductions would total 24%, considerably improving the current sediment situation. We present a novel cost-effectiveness analysis of our results where we use percentage of land removed from production as our cost function. Given the minimal loss of land associated with contour tillage, hedges and buffer strips, we suggest that these management practices are the most cost-effective combination to reduce sediment loads.

Relevance:

80.00%

Publisher:

Abstract:

Two different ways of performing low-energy electron diffraction (LEED) structure determinations for the p(2 × 2) structure of oxygen on Ni{111} are compared: a conventional LEED-IV structure analysis using integer- and fractional-order IV curves collected at normal incidence, and an analysis using only integer-order IV curves collected at three different angles of incidence. The latter approach discriminates between different adsorption sites as clearly as the former, and the best-fit structures of both analyses are within each other's error bars (all less than 0.1 Å). The conventional analysis is more sensitive to the adsorbate coordinates and lateral parameters of the substrate atoms, whereas the integer-order-based analysis is more sensitive to the vertical coordinates of substrate atoms. Adsorbate-related contributions to the intensities of integer-order diffraction spots are independent of the state of long-range order in the adsorbate layer. These results show, therefore, that for lattice-gas disordered adsorbate layers, for which only integer-order spots are observed, similar accuracy and reliability can be achieved as for ordered adsorbate layers, provided the data set is large enough.

Relevance:

80.00%

Publisher:

Abstract:

An assessment of aerosol-cloud interactions (ACI) from ground-based remote sensing under coastal stratiform clouds is presented. The assessment utilizes a long-term, high temporal resolution data set from the Atmospheric Radiation Measurement (ARM) Program deployment at Pt. Reyes, California, United States, in 2005 to provide statistically robust measures of ACI and to characterize the variability of the measures based on variability in environmental conditions and observational approaches. The average ACI_N (= dlnN_d/dlnα, the change in cloud drop number concentration with aerosol concentration) is 0.48, within a physically plausible range of 0–1.0. Values vary between 0.18 and 0.69 with dependence on (1) the assumption of constant cloud liquid water path (LWP), (2) the relative value of cloud LWP, (3) methods for retrieving N_d, (4) aerosol size distribution, (5) updraft velocity, and (6) the scale and resolution of observations. The sensitivity of the local, diurnally averaged radiative forcing to this variability in ACI_N values, assuming an aerosol perturbation of 500 cm-3 relative to a background concentration of 100 cm-3, ranges between -4 and -9 W m-2. Further characterization of ACI and its variability is required to reduce uncertainties in global radiative forcing estimates.
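Because ACI_N is defined as a logarithmic derivative, one simple way to estimate it from paired observations is the least-squares slope of ln N_d against ln α. The sketch below assumes that estimator; the abstract does not state how the slope was obtained. The function name `aci_n` and the noise-free sample values are hypothetical.

```python
import math


def aci_n(aerosol, drop_number):
    """Least-squares slope of ln(N_d) against ln(alpha)."""
    x = [math.log(a) for a in aerosol]
    y = [math.log(n) for n in drop_number]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical, noise-free data following N_d = 5 * alpha**0.48,
# so the recovered slope should match the exponent.
alpha = [100.0, 200.0, 300.0, 400.0, 600.0]
n_d = [5.0 * a ** 0.48 for a in alpha]
slope = aci_n(alpha, n_d)
```

With real measurements the slope would also depend on how the data are binned, for example by liquid water path, which is one of the sensitivities the abstract lists.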

Relevance:

80.00%

Publisher:

Abstract:

A first step in interpreting the wide variation in trace gas concentrations measured over time at a given site is to classify the data according to the prevailing weather conditions. In order to classify measurements made during two intensive field campaigns at Mace Head, on the west coast of Ireland, an objective method of assigning data to different weather types has been developed. Air-mass back trajectories calculated using winds from ECMWF analyses, arriving at the site in 1995–1997, were allocated to clusters based on a statistical analysis of the latitude, longitude and pressure of the trajectory at 12 h intervals over 5 days. The robustness of the analysis was assessed by using an ensemble of back trajectories calculated for four points around Mace Head. Separate analyses were made for each of the 3 years, and for four 3-month periods. The use of these clusters in classifying ground-based ozone measurements at Mace Head is described, including the need to exclude data which have been influenced by local perturbations to the regional flow pattern, for example, by sea breezes. Even with a limited data set, based on 2 months of intensive field measurements in 1996 and 1997, there are statistically significant differences in ozone concentrations in air from the different clusters. The limitations of this type of analysis for classification and interpretation of ground-based chemistry measurements are discussed.

Relevance:

80.00%

Publisher:

Abstract:

Peak picking is an early key step in MS data analysis. We compare three commonly used approaches to peak picking and discuss their merits by means of statistical analysis. Methods investigated encompass signal-to-noise ratio, continuous wavelet transform, and a correlation-based approach using a Gaussian template. Functionality of the three methods is illustrated and discussed in a practical context using a mass spectral data set created with MALDI-TOF technology. Sensitivity and specificity are investigated using a manually defined reference set of peaks. As an additional criterion, the robustness of the three methods is assessed by a perturbation analysis and illustrated using ROC curves.
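A minimal sketch of the signal-to-noise approach to peak picking is given below. The noise estimate (median absolute deviation scaled by 1.4826), the threshold of 3, and the toy spectrum are assumptions chosen for illustration, not the settings used in the study.

```python
from statistics import median


def pick_peaks(intensities, snr_threshold=3.0):
    """Return indices of local maxima whose height above the baseline
    exceeds snr_threshold times a robust noise estimate."""
    baseline = median(intensities)
    # Median absolute deviation, scaled to approximate a standard deviation.
    noise = 1.4826 * median([abs(v - baseline) for v in intensities])
    peaks = []
    for i in range(1, len(intensities) - 1):
        value = intensities[i]
        is_local_max = value > intensities[i - 1] and value >= intensities[i + 1]
        if is_local_max and (value - baseline) > snr_threshold * noise:
            peaks.append(i)
    return peaks


# Hypothetical spectrum: flat noisy baseline with a broad peak and a narrow one.
spectrum = [1.0, 1.2, 0.9, 1.1, 1.0, 6.0, 9.0, 5.5, 1.1,
            0.8, 1.0, 1.2, 4.0, 1.1, 0.9, 1.0, 1.3, 1.0]
peaks = pick_peaks(spectrum)  # indices 6 and 12
```

Small local maxima in the baseline, such as the value 1.3 near the end, are rejected because they do not clear the noise threshold.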

Relevance:

80.00%

Publisher:

Abstract:

A new structure of Radial Basis Function (RBF) neural network called the Dual-orthogonal RBF Network (DRBF) is introduced for nonlinear time series prediction. The hidden nodes of a conventional RBF network compare the Euclidean distance between the network input vector and the centres, and the node responses are radially symmetrical. But in time series prediction, where the system input vectors are lagged system outputs that are usually highly correlated, the Euclidean distance measure may not be appropriate. The DRBF network modifies the distance metric by introducing a classification function which is based on the estimation data set. Training the DRBF network consists of two stages: learning the classification-related basis functions and the important input nodes, followed by selecting the regressors and learning the weights of the hidden nodes. In both stages, a forward Orthogonal Least Squares (OLS) selection procedure is applied, initially to select the important input nodes and then to select the important centres. Simulation results of single-step and multi-step ahead predictions over a test data set are included to demonstrate the effectiveness of the new approach.

Relevance:

80.00%

Publisher:

Abstract:

Techniques for the coherent generation and detection of electromagnetic radiation in the far infrared, or terahertz, region of the electromagnetic spectrum have recently developed rapidly and may soon be applied for in vivo medical imaging. Both continuous wave and pulsed imaging systems are under development, with terahertz pulsed imaging being the more common method. Typically a pump and probe technique is used, with picosecond pulses of terahertz radiation generated from femtosecond infrared laser pulses, using an antenna or nonlinear crystal. After interaction with the subject either by transmission or reflection, coherent detection is achieved when the terahertz beam is combined with the probe laser beam. Raster scanning of the subject leads to an image data set comprising a time series representing the pulse at each pixel. A set of parametric images may be calculated, mapping the values of various parameters calculated from the shape of the pulses. A safety analysis has been performed, based on current guidelines for skin exposure to radiation of wavelengths 2.6 µm–20 mm (15 GHz–115 THz), to determine the maximum permissible exposure (MPE) for such a terahertz imaging system. The international guidelines for this range of wavelengths are drawn from two U.S. standards documents. The method for this analysis was taken from the American National Standard for the Safe Use of Lasers (ANSI Z136.1), and to ensure a conservative analysis, parameters were drawn from both this standard and from the IEEE Standard for Safety Levels with Respect to Human Exposure to Radio Frequency Electromagnetic Fields (C95.1). The calculated maximum permissible average beam power was 3 mW, indicating that typical terahertz imaging systems are safe according to the current guidelines. Further developments may, however, result in systems that will exceed the calculated limit. Furthermore, the published MPEs for pulsed exposures are based on measurements at shorter wavelengths and with pulses of longer duration than those used in terahertz pulsed imaging systems, so the results should be treated with caution.

Relevance:

80.00%

Publisher:

Abstract:

We provide a unified framework for a range of linear transforms that can be used for the analysis of terahertz spectroscopic data, with particular emphasis on their application to the measurement of leaf water content. The use of linear transforms for filtering, regression, and classification is discussed. For illustration, a classification problem involving leaves at three stages of drought and a prediction problem involving simulated spectra are presented. Issues resulting from scaling the data set are discussed. Using Lagrange multipliers, we arrive at the transform that yields the maximum separation between the spectra and show that this optimal transform is equivalent to computing the Euclidean distance between the samples. The optimal linear transform is compared with the average for all the spectra as well as with the Karhunen–Loève transform to discriminate a wet leaf from a dry leaf. We show that taking several principal components into account is equivalent to defining new axes in which data are to be analyzed. The procedure shows that the coefficients of the Karhunen–Loève transform are well suited to the process of classification of spectra. This is in line with expectations, as these coefficients are built from the statistical properties of the data set analyzed.

Relevance:

80.00%

Publisher:

Abstract:

Changes in climate variability and, in particular, changes in extreme climate events are likely to be of far more significance for environmentally vulnerable regions than changes in the mean state. It is generally accepted that sea-surface temperatures (SSTs) play an important role in modulating rainfall variability. Consequently, SSTs can be prescribed in global and regional climate modelling in order to study the physical mechanisms behind rainfall and its extremes. Using a satellite-based daily rainfall historical data set, this paper describes the main patterns of rainfall variability over southern Africa, identifies the dates when extreme rainfall occurs within these patterns, and shows the effect of resolution in trying to identify the location and intensity of SST anomalies associated with these extremes in the Atlantic and southwest Indian Ocean. Derived from a Principal Component Analysis (PCA), the results also suggest that, for the spatial pattern accounting for the highest amount of variability, extremes extracted at a higher spatial resolution do give a clearer indication regarding the location and intensity of anomalous SST regions. As the amount of variability explained by each spatial pattern defined by the PCA decreases, it would appear that extremes extracted at a lower resolution give a clearer indication of anomalous SST regions.

Relevance:

80.00%

Publisher:

Abstract:

An investigation into the speciation and occurrence of nine haloacetic acids (HAAs) was conducted during the period of April 2007 to March 2008 and involved three drinking water supply systems in England, which were chosen to represent a range of source water conditions; these were an upland surface water, a lowland surface water and a groundwater. Samples were collected seasonally from the water treatment plants and at different locations in the distribution systems. The highest HAA concentrations occurred in the upland surface water system, with an average total HAA concentration of 21.3 μg/L. The lowest HAA levels were observed in the groundwater source, with a mean concentration of 0.6 μg/L. Seasonal variations were significant in the HAA concentrations; the highest total HAA concentrations were found during the autumn, when the concentrations were approximately two times higher than in winter and spring. HAA speciation varied among the water sources, with dichloroacetic acid and trichloroacetic acid dominant in the lowland surface water system and brominated species dominant in the upland surface water system. There was a strong correlation between trihalomethanes and HAAs when considering all samples from the three systems in the same data set (r² = 0.88); however, the correlation was poor to moderate when considering each system independently.

Relevance:

80.00%

Publisher:

Abstract:

The combination of the synthetic minority oversampling technique (SMOTE) and the radial basis function (RBF) classifier is proposed to deal with classification for imbalanced two-class data. In order to enhance the significance of the small and specific region belonging to the positive class in the decision region, the SMOTE is applied to generate synthetic instances for the positive class to balance the training data set. Based on the over-sampled training data, the RBF classifier is constructed by applying the orthogonal forward selection procedure, in which the classifier structure and the parameters of RBF kernels are determined using a particle swarm optimization algorithm based on the criterion of minimizing the leave-one-out misclassification rate. The experimental results on both simulated and real imbalanced data sets are presented to demonstrate the effectiveness of our proposed algorithm.
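The oversampling step can be sketched as follows: each synthetic instance is placed at a random position on the line segment between a minority-class sample and one of its nearest minority-class neighbours. This shows only the SMOTE interpolation idea; the RBF classifier, the orthogonal forward selection and the particle swarm optimization of the abstract are not reproduced. The function name, the default `k=3`, the seed and the sample points are assumptions for the example.

```python
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Generate n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        idx = rng.randrange(len(minority))
        base = minority[idx]
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != idx]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical positive-class samples in two dimensions.
positive = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1), (1.4, 1.2)]
new_points = smote(positive, n_synthetic=10)
```

Because every synthetic point is a convex combination of two existing samples, the new instances stay inside the region already occupied by the positive class.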