971 results for DATA SET
Abstract:
An automatic nonlinear predictive model-construction algorithm is introduced based on forward regression and the predicted-residual-sums-of-squares (PRESS) statistic. The proposed algorithm is based on the fundamental concept of evaluating a model's generalisation capability through cross-validation, achieved by using the PRESS statistic as a cost function to optimise model structure. In particular, the algorithm is developed with the aim of computational efficiency, so that the usually extensive effort of computing the PRESS statistic is reduced or minimised. The computation of PRESS is simplified by avoiding a matrix inversion through the orthogonalisation procedure inherent in forward regression, and is further reduced significantly by the introduction of a forward-recursive formula. Based on the properties of the PRESS statistic, the proposed algorithm achieves a fully automated procedure without recourse to a separate validation data set for iterative model evaluation. Numerical examples demonstrate the efficacy of the algorithm.
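The PRESS idea above can be illustrated with a minimal sketch that uses the standard leave-one-out identity e_i / (1 − h_ii) for linear least squares, rather than the paper's forward-recursive formulation; the data and helper name are hypothetical.

```python
import numpy as np

def press_statistic(X, y):
    """PRESS via the leave-one-out identity e_i / (1 - h_ii),
    which avoids n separate refits (the textbook shortcut,
    not the paper's forward-recursive formula)."""
    H = X @ np.linalg.solve(X.T @ X, X.T)          # hat matrix
    residuals = y - H @ y
    loo_residuals = residuals / (1.0 - np.diag(H))
    return float(np.sum(loo_residuals ** 2))

# Check against brute-force leave-one-out refitting
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=30)

brute = 0.0
for i in range(len(y)):
    mask = np.arange(len(y)) != i
    beta = np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]
    brute += (y[i] - X[i] @ beta) ** 2

print(abs(press_statistic(X, y) - brute))
```

The two quantities agree to numerical precision, which is why the hat-matrix form removes the need for a separate validation set.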
Modelling sediment supply and transport in the River Lugg: strategies for controlling sediment loads
Abstract:
The River Lugg has particular problems with high sediment loads that have resulted in detrimental impacts on ecology and fisheries. A new dynamic, process-based model of hydrology and sediments (INCA-SED) has been developed and applied to the River Lugg system using an extensive data set from 1995–2008. The model simulates sediment sources and sinks throughout the catchment and gives a good representation of the sediment response at 22 reaches along the River Lugg. A key question considered in using the model is the management of sediment sources so that concentrations and bed loads can be reduced in the river system. Altogether, five sediment management scenarios were selected for testing on the River Lugg, including land use change, contour tillage, hedging and buffer strips. Running the model with parameters altered to simulate these five scenarios produced some interesting results. All scenarios achieved some reduction in sediment levels, with the 40% land use change achieving the best result with a 19% reduction. The other scenarios also achieved significant reductions of between 7% and 9%, with buffer strips producing the best result among them at close to 9%. The results suggest that if hedge introduction, contour tillage and buffer strips were all applied, sediment reductions would total 24%, considerably improving the current sediment situation. We present a novel cost-effectiveness analysis of our results where we use percentage of land removed from production as our cost function. Given the minimal loss of land associated with contour tillage, hedges and buffer strips, we suggest that these management practices are the most cost-effective combination to reduce sediment loads.
Abstract:
Two different ways of performing low-energy electron diffraction (LEED) structure determinations for the p(2 x 2) structure of oxygen on Ni {111} are compared: a conventional LEED-IV structure analysis using integer- and fractional-order IV-curves collected at normal incidence, and an analysis using only integer-order IV-curves collected at three different angles of incidence. A clear discrimination between different adsorption sites can be achieved by the latter approach as well as by the former, and the best-fit structures of the two analyses agree within each other's error bars (all less than 0.1 angstrom). The conventional analysis is more sensitive to the adsorbate coordinates and lateral parameters of the substrate atoms, whereas the integer-order-based analysis is more sensitive to the vertical coordinates of substrate atoms. Adsorbate-related contributions to the intensities of integer-order diffraction spots are independent of the state of long-range order in the adsorbate layer. These results show, therefore, that for lattice-gas disordered adsorbate layers, for which only integer-order spots are observed, similar accuracy and reliability can be achieved as for ordered adsorbate layers, provided the data set is large enough.
An assessment of aerosol‐cloud interactions in marine stratus clouds based on surface remote sensing
Abstract:
An assessment of aerosol-cloud interactions (ACI) from ground-based remote sensing under coastal stratiform clouds is presented. The assessment utilizes a long-term, high temporal resolution data set from the Atmospheric Radiation Measurement (ARM) Program deployment at Pt. Reyes, California, United States, in 2005 to provide statistically robust measures of ACI and to characterize the variability of the measures based on variability in environmental conditions and observational approaches. The average ACIN (= dlnNd/dlna, the change in cloud drop number concentration with aerosol concentration) is 0.48, within a physically plausible range of 0–1.0. Values vary between 0.18 and 0.69 with dependence on (1) the assumption of constant cloud liquid water path (LWP), (2) the relative value of cloud LWP, (3) methods for retrieving Nd, (4) aerosol size distribution, (5) updraft velocity, and (6) the scale and resolution of observations. The sensitivity of the local, diurnally averaged radiative forcing to this variability in ACIN values, assuming an aerosol perturbation of 500 cm-3 relative to a background concentration of 100 cm-3, ranges between -4 and -9 W m-2. Further characterization of ACI and its variability is required to reduce uncertainties in global radiative forcing estimates.
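The ACIN metric defined above can be estimated as the slope of a least-squares fit in log-log space. A sketch on synthetic data (not ARM measurements), with a true slope of 0.48 built in:

```python
import numpy as np

# Synthetic illustration: droplet number responding to aerosol
# concentration with a built-in ACI_N of 0.48 plus lognormal scatter.
rng = np.random.default_rng(1)
a = rng.uniform(50.0, 500.0, size=200)                 # aerosol, cm-3
Nd = 5.0 * a ** 0.48 * np.exp(rng.normal(scale=0.05, size=200))

# ACI_N = dlnNd/dlna: the slope of a straight-line fit in log-log space
aci_n, intercept = np.polyfit(np.log(a), np.log(Nd), 1)
print(round(aci_n, 2))
```

In practice the fit would be stratified by LWP, which is one reason the reported values vary between 0.18 and 0.69.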
Abstract:
A first step in interpreting the wide variation in trace gas concentrations measured over time at a given site is to classify the data according to the prevailing weather conditions. In order to classify measurements made during two intensive field campaigns at Mace Head, on the west coast of Ireland, an objective method of assigning data to different weather types has been developed. Air-mass back trajectories calculated using winds from ECMWF analyses, arriving at the site in 1995–1997, were allocated to clusters based on a statistical analysis of the latitude, longitude and pressure of the trajectory at 12 h intervals over 5 days. The robustness of the analysis was assessed by using an ensemble of back trajectories calculated for four points around Mace Head. Separate analyses were made for each of the 3 years, and for four 3-month periods. The use of these clusters in classifying ground-based ozone measurements at Mace Head is described, including the need to exclude data which have been influenced by local perturbations to the regional flow pattern, for example, by sea breezes. Even with a limited data set, based on 2 months of intensive field measurements in 1996 and 1997, there are statistically significant differences in ozone concentrations in air from the different clusters. The limitations of this type of analysis for classification and interpretation of ground-based chemistry measurements are discussed.
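A trajectory cluster allocation of this kind can be sketched with plain k-means on trajectory feature vectors (latitude, longitude and pressure at 12 h steps). The synthetic "air masses" and the farthest-point initialisation below are illustrative choices, not the statistical analysis actually used:

```python
import numpy as np

def kmeans(X, k, iters=100):
    """Plain k-means with farthest-point initialisation (an
    illustrative stand-in for the cluster analysis described above)."""
    centres = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(X[np.argmax(d)])      # farthest point so far
    centres = np.array(centres)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centres) ** 2).sum(axis=-1), axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centres[j] for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels

# Each back trajectory: lat, lon, pressure at 12 h intervals over 5 days
# (10 time steps x 3 variables = 30 features); two synthetic air-mass types.
rng = np.random.default_rng(2)
type_a = rng.normal(loc=0.0, scale=0.3, size=(50, 30))
type_b = rng.normal(loc=3.0, scale=0.3, size=(50, 30))
trajs = np.vstack([type_a, type_b])

labels = kmeans(trajs, k=2)
print(sorted(set(labels[:50].tolist())), sorted(set(labels[50:].tolist())))
```

Well-separated flow regimes fall cleanly into distinct clusters; the harder, realistic cases are the transitional trajectories between regimes.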
Abstract:
Peak picking is an early key step in MS data analysis. We compare three commonly used approaches to peak picking and discuss their merits by means of statistical analysis. Methods investigated encompass signal-to-noise ratio, continuous wavelet transform, and a correlation-based approach using a Gaussian template. Functionality of the three methods is illustrated and discussed in a practical context using a mass spectral data set created with MALDI-TOF technology. Sensitivity and specificity are investigated using a manually defined reference set of peaks. As an additional criterion, the robustness of the three methods is assessed by a perturbation analysis and illustrated using ROC curves.
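The simplest of the three approaches, signal-to-noise thresholding, can be sketched as follows. The spectrum, threshold, and robust noise estimate are illustrative assumptions; real MALDI-TOF pipelines also smooth and baseline-correct:

```python
import numpy as np

def snr_peaks(intensity, snr_threshold=3.0):
    """Local maxima whose signal-to-noise ratio exceeds a threshold,
    with noise estimated robustly via the median absolute deviation."""
    noise = np.median(np.abs(intensity - np.median(intensity))) / 0.6745
    peaks = []
    for i in range(1, len(intensity) - 1):
        if (intensity[i] > intensity[i - 1] and intensity[i] >= intensity[i + 1]
                and intensity[i] / noise >= snr_threshold):
            peaks.append(i)
    return peaks

# Synthetic spectrum: unit-variance noise plus two Gaussian peaks
x = np.arange(500)
rng = np.random.default_rng(3)
spectrum = rng.normal(scale=1.0, size=500)
for centre, height in [(120, 30.0), (340, 20.0)]:
    spectrum += height * np.exp(-0.5 * ((x - centre) / 3.0) ** 2)

found = snr_peaks(spectrum, snr_threshold=5.0)
print(found)
```

The MAD-based noise estimate is what makes the method tolerably robust; sensitivity and specificity then trade off directly through the threshold.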
Abstract:
A new structure of Radial Basis Function (RBF) neural network, called the Dual-orthogonal RBF Network (DRBF), is introduced for nonlinear time series prediction. The hidden nodes of a conventional RBF network compute the Euclidean distance between the network input vector and the centres, and the node responses are radially symmetrical. But in time series prediction, where the system input vectors are lagged system outputs, which are usually highly correlated, the Euclidean distance measure may not be appropriate. The DRBF network modifies the distance metric by introducing a classification function which is based on the estimation data set. Training the DRBF network consists of two stages: learning the classification-related basis functions and the important input nodes, followed by selecting the regressors and learning the weights of the hidden nodes. In both stages a forward Orthogonal Least Squares (OLS) selection procedure is applied, initially to select the important input nodes and then to select the important centres. Simulation results for single-step and multi-step ahead predictions over a test data set are included to demonstrate the effectiveness of the new approach.
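The forward selection step can be sketched as a greedy search that repeatedly picks the candidate regressor giving the largest squared-error reduction and projects it out of the residual. This plain greedy version is a simplified stand-in for the full orthogonalisation used in OLS; all data below are synthetic:

```python
import numpy as np

def forward_ols(P, y, n_select):
    """Greedy forward selection: pick the candidate column with the
    largest squared-error reduction against the current residual, then
    remove its contribution (a sketch; the original orthogonalises the
    remaining candidates at each step)."""
    selected = []
    residual = y.astype(float).copy()
    for _ in range(n_select):
        best, best_gain = None, -np.inf
        for j in range(P.shape[1]):
            if j in selected:
                continue
            p = P[:, j]
            gain = (p @ residual) ** 2 / (p @ p)   # error-reduction score
            if gain > best_gain:
                best, best_gain = j, gain
        selected.append(best)
        p = P[:, best]
        residual -= (p @ residual) / (p @ p) * p   # project out chosen term
    return selected

rng = np.random.default_rng(4)
P = rng.normal(size=(100, 8))                      # candidate regressors
y = 3.0 * P[:, 2] - 2.0 * P[:, 5] + rng.normal(scale=0.1, size=100)
print(forward_ols(P, y, 2))
```

With nearly orthogonal candidates the greedy scores recover the two true regressors; the correlated lagged inputs of time series prediction are exactly where the orthogonalisation matters.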
Abstract:
Techniques for the coherent generation and detection of electromagnetic radiation in the far infrared, or terahertz, region of the electromagnetic spectrum have recently developed rapidly and may soon be applied for in vivo medical imaging. Both continuous wave and pulsed imaging systems are under development, with terahertz pulsed imaging being the more common method. Typically a pump and probe technique is used, with picosecond pulses of terahertz radiation generated from femtosecond infrared laser pulses, using an antenna or nonlinear crystal. After interaction with the subject either by transmission or reflection, coherent detection is achieved when the terahertz beam is combined with the probe laser beam. Raster scanning of the subject leads to an image data set comprising a time series representing the pulse at each pixel. A set of parametric images may be calculated, mapping the values of various parameters calculated from the shape of the pulses. A safety analysis has been performed, based on current guidelines for skin exposure to radiation of wavelengths 2.6 µm–20 mm (15 GHz–115 THz), to determine the maximum permissible exposure (MPE) for such a terahertz imaging system. The international guidelines for this range of wavelengths are drawn from two U.S. standards documents. The method for this analysis was taken from the American National Standard for the Safe Use of Lasers (ANSI Z136.1), and to ensure a conservative analysis, parameters were drawn from both this standard and from the IEEE Standard for Safety Levels with Respect to Human Exposure to Radio Frequency Electromagnetic Fields (C95.1). The calculated maximum permissible average beam power was 3 mW, indicating that typical terahertz imaging systems are safe according to the current guidelines. Further developments may however result in systems that will exceed the calculated limit. 
Furthermore, the published MPEs for pulsed exposures are based on measurements at shorter wavelengths and with pulses of longer duration than those used in terahertz pulsed imaging systems, so the results should be treated with caution.
Abstract:
We provide a unified framework for a range of linear transforms that can be used for the analysis of terahertz spectroscopic data, with particular emphasis on their application to the measurement of leaf water content. The use of linear transforms for filtering, regression, and classification is discussed. For illustration, a classification problem involving leaves at three stages of drought and a prediction problem involving simulated spectra are presented. Issues resulting from scaling the data set are discussed. Using Lagrange multipliers, we arrive at the transform that yields the maximum separation between the spectra and show that this optimal transform is equivalent to computing the Euclidean distance between the samples. The optimal linear transform is compared with the average for all the spectra as well as with the Karhunen–Loève transform to discriminate a wet leaf from a dry leaf. We show that taking several principal components into account is equivalent to defining new axes in which data are to be analyzed. The procedure shows that the coefficients of the Karhunen–Loève transform are well suited to the process of classification of spectra. This is in line with expectations, as these coefficients are built from the statistical properties of the data set analyzed.
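A sketch of the Karhunen–Loève (principal component) coefficients separating two classes of spectra, using synthetic "wet" and "dry" spectra rather than real terahertz data:

```python
import numpy as np

# Synthetic "spectra": wet and dry classes differ in the strength of a
# broad absorption feature (illustrative only, not real measurements).
rng = np.random.default_rng(5)
freq = np.linspace(0.0, 1.0, 64)
feature = np.exp(-0.5 * ((freq - 0.5) / 0.1) ** 2)
wet = 1.0 * feature + rng.normal(scale=0.05, size=(40, 64))
dry = 0.2 * feature + rng.normal(scale=0.05, size=(40, 64))
spectra = np.vstack([wet, dry])

# Karhunen-Loeve transform: project mean-centred data onto the
# eigenvectors of the covariance matrix, obtained here via SVD.
centred = spectra - spectra.mean(axis=0)
U, s, Vt = np.linalg.svd(centred, full_matrices=False)
coeffs = centred @ Vt[0]                  # leading KL coefficient

separated = ((coeffs[:40] > 0).all() and (coeffs[40:] < 0).all()) or \
            ((coeffs[:40] < 0).all() and (coeffs[40:] > 0).all())
print(separated)
```

Because the between-class difference dominates the covariance, the first KL coefficient alone splits the classes, which mirrors why these coefficients are well suited to classification.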
Abstract:
Changes in climate variability and, in particular, changes in extreme climate events are likely to be of far more significance for environmentally vulnerable regions than changes in the mean state. It is generally accepted that sea-surface temperatures (SSTs) play an important role in modulating rainfall variability. Consequently, SSTs can be prescribed in global and regional climate modelling in order to study the physical mechanisms behind rainfall and its extremes. Using a satellite-based daily rainfall historical data set, this paper describes the main patterns of rainfall variability over southern Africa, identifies the dates when extreme rainfall occurs within these patterns, and shows the effect of resolution in trying to identify the location and intensity of SST anomalies associated with these extremes in the Atlantic and southwest Indian Ocean. Derived from a Principal Component Analysis (PCA), the results also suggest that, for the spatial pattern accounting for the highest amount of variability, extremes extracted at a higher spatial resolution do give a clearer indication regarding the location and intensity of anomalous SST regions. As the amount of variability explained by each spatial pattern defined by the PCA decreases, it would appear that extremes extracted at a lower resolution give a clearer indication of anomalous SST regions.
Abstract:
An investigation into the speciation and occurrence of nine haloacetic acids (HAAs) was conducted during the period of April 2007 to March 2008 and involved three drinking water supply systems in England, which were chosen to represent a range of source water conditions: an upland surface water, a lowland surface water and a groundwater. Samples were collected seasonally from the water treatment plants and at different locations in the distribution systems. The highest HAA concentrations occurred in the upland surface water system, with an average total HAA concentration of 21.3 μg/L. The lowest HAA levels were observed in the groundwater source, with a mean concentration of 0.6 μg/L. Seasonal variations in the HAA concentrations were significant; the highest total HAA concentrations were found during the autumn, when the concentrations were approximately two times higher than in winter and spring. HAA speciation varied among the water sources, with dichloroacetic acid and trichloroacetic acid dominant in the lowland surface water system and brominated species dominant in the upland surface water system. There was a strong correlation between trihalomethanes and HAAs when considering all samples from the three systems in the same data set (r2 = 0.88); however, the correlation was poor to moderate when considering each system independently.
Abstract:
The combination of the synthetic minority oversampling technique (SMOTE) and the radial basis function (RBF) classifier is proposed to deal with classification for imbalanced two-class data. In order to enhance the significance of the small and specific region belonging to the positive class in the decision region, the SMOTE is applied to generate synthetic instances for the positive class to balance the training data set. Based on the over-sampled training data, the RBF classifier is constructed by applying the orthogonal forward selection procedure, in which the classifier structure and the parameters of RBF kernels are determined using a particle swarm optimization algorithm based on the criterion of minimizing the leave-one-out misclassification rate. The experimental results on both simulated and real imbalanced data sets are presented to demonstrate the effectiveness of our proposed algorithm.
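The SMOTE step can be sketched in a few lines: each synthetic positive instance interpolates between a minority sample and one of its k nearest minority neighbours. The data and parameter choices below are illustrative, not those of the paper:

```python
import numpy as np

def smote(X_min, n_new, k=5, seed=0):
    """Minimal SMOTE: each synthetic point is a random interpolation
    between a minority sample and one of its k nearest minority
    neighbours (a sketch of the standard algorithm)."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = ((X_min - X_min[i]) ** 2).sum(axis=1)
        neighbours = np.argsort(d)[1:k + 1]   # skip the point itself
        j = rng.choice(neighbours)
        lam = rng.random()                    # interpolation fraction in [0, 1)
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synthetic)

rng = np.random.default_rng(6)
positive = rng.normal(loc=2.0, scale=0.5, size=(10, 2))   # minority class
new_points = smote(positive, n_new=40)
balanced_positive = np.vstack([positive, new_points])
print(balanced_positive.shape)  # → (50, 2)
```

Because synthetic points lie on segments between real minority samples, they thicken the positive-class region of the feature space rather than merely duplicating points, which is what strengthens that region in the classifier's decision surface.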
Abstract:
Urban boundary layers (UBLs) can be highly complex due to the heterogeneous roughness and heating of the surface, particularly at night. Due to a general lack of observations, it is not clear whether canonical models of boundary layer mixing are appropriate in modelling air quality in urban areas. This paper reports Doppler lidar observations of turbulence profiles in the centre of London, UK, as part of the second REPARTEE campaign in autumn 2007. Lidar-measured standard deviation of vertical velocity averaged over 30 min intervals generally compared well with in situ sonic anemometer measurements at 190 m on the BT telecommunications Tower. During calm, nocturnal periods, the lidar underestimated turbulent mixing due mainly to limited sampling rate. Mixing height derived from the turbulence, and aerosol layer height from the backscatter profiles, showed similar diurnal cycles ranging from c. 300 to 800 m, increasing to c. 200 to 850 m under clear skies. The aerosol layer height was sometimes significantly different to the mixing height, particularly at night under clear skies. For convective and neutral cases, the scaled turbulence profiles resembled canonical results; this was less clear for the stable case. Lidar observations clearly showed enhanced mixing beneath stratocumulus clouds reaching down on occasion to approximately half daytime boundary layer depth. On one occasion the nocturnal turbulent structure was consistent with a nocturnal jet, suggesting a stable layer. Given the general agreement between observations and canonical turbulence profiles, mixing timescales were calculated for passive scalars released at street level to reach the BT Tower using existing models of turbulent mixing. It was estimated to take c. 10 min to diffuse up to 190 m, rising to between 20 and 50 min at night, depending on stability. 
Determination of mixing timescales is important when comparing with physico-chemical processes acting on pollutant species measured simultaneously at both the ground and the BT Tower during the campaign. From the 3-week autumnal data set there is evidence for occasional stable layers in central London, effectively decoupling surface emissions from air aloft.
Abstract:
We develop a database of 110 gradual solar energetic particle (SEP) events, over the period 1967–2006, providing estimates of event onset, duration, fluence, and peak flux for protons of energy E > 60 MeV. The database is established mainly from the energetic proton flux data distributed in the OMNI 2 data set; however, we also utilize the McMurdo neutron monitor and the energetic proton flux from GOES missions. To aid the development of the gradual SEP database, we establish a method with which the homogeneity of the energetic proton flux record is improved. A comparison between other SEP databases and the database developed here is presented, along with a discussion of the different algorithms used to define an event. Furthermore, we investigate the variation of gradual SEP occurrence and fluence with solar cycle phase, sunspot number (SSN), and interplanetary magnetic field intensity (Bmag) over solar cycles 20–23. We find that the occurrence and fluence of SEP events vary with the solar cycle phase. Correspondingly, we find a positive correlation between SEP occurrence and solar activity as determined by SSN and Bmag, while the mean fluence in individual events decreases with the same measures of solar activity. Therefore, although the number of events decreases when solar activity is low, the events that do occur at such times have higher fluence. Thus, large events such as the “Carrington flare” may be more likely at lower levels of solar activity. These results are discussed in the context of other similar investigations.
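Event extraction from a flux record of this kind can be illustrated with a toy threshold-crossing finder; the onset and duration rules below are hypothetical and far simpler than the algorithms the database actually compares:

```python
def find_events(flux, threshold, min_gap=3):
    """Toy event finder: an event starts when flux first exceeds the
    threshold and ends once it has stayed at or below it for min_gap
    consecutive samples (hypothetical rules for illustration)."""
    events, start, below = [], None, 0
    for t, f in enumerate(flux):
        if start is None:
            if f > threshold:
                start, below = t, 0
        elif f <= threshold:
            below += 1
            if below >= min_gap:
                events.append((start, t - min_gap + 1))
                start = None
        else:
            below = 0
    if start is not None:
        events.append((start, len(flux)))
    return events

# Two bursts above threshold in an otherwise quiet record
flux = [1, 1, 9, 8, 7, 1, 1, 1, 1, 5, 6, 1, 1, 1, 1, 1]
print(find_events(flux, threshold=3.0))  # → [(2, 5), (9, 11)]
```

The min_gap rule keeps a brief dip inside one event from splitting it in two, which is one of the choices on which event counts between databases genuinely differ.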
Abstract:
This contribution proposes a powerful technique for two-class imbalanced classification problems by combining the synthetic minority over-sampling technique (SMOTE) and the particle swarm optimisation (PSO) aided radial basis function (RBF) classifier. In order to enhance the significance of the small and specific region belonging to the positive class in the decision region, the SMOTE is applied to generate synthetic instances for the positive class to balance the training data set. Based on the over-sampled training data, the RBF classifier is constructed by applying the orthogonal forward selection procedure, in which the classifier's structure and the parameters of RBF kernels are determined using a PSO algorithm based on the criterion of minimising the leave-one-out misclassification rate. The experimental results obtained on a simulated imbalanced data set and three real imbalanced data sets are presented to demonstrate the effectiveness of our proposed algorithm.
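A bare-bones PSO loop of the kind used to tune the classifier, shown here minimising a toy objective standing in for the leave-one-out misclassification rate; all constants are generic textbook choices, not the paper's settings:

```python
import numpy as np

def pso(f, dim, n_particles=30, iters=200, seed=0):
    """Bare-bones particle swarm optimiser: inertia plus the usual
    cognitive (personal-best) and social (global-best) pulls."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5.0, 5.0, size=(n_particles, dim))
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_val = np.array([f(p) for p in x])
    gbest = pbest[np.argmin(pbest_val)].copy()
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = x + v
        vals = np.array([f(p) for p in x])
        improved = vals < pbest_val
        pbest[improved] = x[improved]
        pbest_val[improved] = vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, float(f(gbest))

# Toy objective standing in for the leave-one-out misclassification rate
best, val = pso(lambda p: float(((p - 1.0) ** 2).sum()), dim=3)
print(val)
```

PSO suits this tuning problem because the leave-one-out error is non-differentiable in the kernel parameters, so a population-based search is a natural fit.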