190 results for DATA SET
Abstract:
We provide a unified framework for a range of linear transforms that can be used for the analysis of terahertz spectroscopic data, with particular emphasis on their application to the measurement of leaf water content. The use of linear transforms for filtering, regression, and classification is discussed. For illustration, a classification problem involving leaves at three stages of drought and a prediction problem involving simulated spectra are presented. Issues resulting from scaling the data set are discussed. Using Lagrange multipliers, we arrive at the transform that yields the maximum separation between the spectra and show that this optimal transform is equivalent to computing the Euclidean distance between the samples. The optimal linear transform is compared with the average for all the spectra as well as with the Karhunen–Loève transform to discriminate a wet leaf from a dry leaf. We show that taking several principal components into account is equivalent to defining new axes in which data are to be analyzed. The procedure shows that the coefficients of the Karhunen–Loève transform are well suited to the process of classification of spectra. This is in line with expectations, as these coefficients are built from the statistical properties of the data set analyzed.
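As a rough illustration of the Karhunen–Loève idea in the abstract above, the sketch below finds the leading principal axis of a few spectra and projects each spectrum onto it. The three-point "spectra", the `wet`/`dry` labels, and the use of power iteration are assumptions made for this example, not the authors' data or procedure.

```python
from math import sqrt

def first_principal_axis(samples, iterations=200):
    """Return (mean, unit axis) of the leading principal component."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    centred = [[s[j] - mean[j] for j in range(d)] for s in samples]
    # Sample covariance matrix of the centred data.
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeated multiplication converges to the top eigenvector.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def project(sample, mean, axis):
    """Coefficient of one sample along the principal axis."""
    return sum((s - m) * a for s, m, a in zip(sample, mean, axis))

# Hypothetical toy spectra (three frequency bins each), invented for illustration.
wet = [(1.0, 2.0, 3.0), (1.1, 2.1, 3.2)]
dry = [(2.0, 3.5, 5.0), (2.1, 3.6, 5.1)]
mean, axis = first_principal_axis(wet + dry)
scores = {"wet": [project(s, mean, axis) for s in wet],
          "dry": [project(s, mean, axis) for s in dry]}
```

In this toy case the sign of the first coefficient alone separates the two groups; real spectra would normally need several components and a proper classifier.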
Abstract:
Changes in climate variability and, in particular, changes in extreme climate events are likely to be of far more significance for environmentally vulnerable regions than changes in the mean state. It is generally accepted that sea-surface temperatures (SSTs) play an important role in modulating rainfall variability. Consequently, SSTs can be prescribed in global and regional climate modelling in order to study the physical mechanisms behind rainfall and its extremes. Using a satellite-based daily rainfall historical data set, this paper describes the main patterns of rainfall variability over southern Africa, identifies the dates when extreme rainfall occurs within these patterns, and shows the effect of resolution in trying to identify the location and intensity of SST anomalies associated with these extremes in the Atlantic and southwest Indian Ocean. Derived from a Principal Component Analysis (PCA), the results also suggest that, for the spatial pattern accounting for the highest amount of variability, extremes extracted at a higher spatial resolution do give a clearer indication regarding the location and intensity of anomalous SST regions. As the amount of variability explained by each spatial pattern defined by the PCA decreases, it would appear that extremes extracted at a lower resolution give a clearer indication of anomalous SST regions.
Abstract:
An investigation into the speciation and occurrence of nine haloacetic acids (HAAs) was conducted during the period of April 2007 to March 2008 and involved three drinking water supply systems in England, which were chosen to represent a range of source water conditions; these were an upland surface water, a lowland surface water and a groundwater. Samples were collected seasonally from the water treatment plants and at different locations in the distribution systems. The highest HAA concentrations occurred in the upland surface water system, with an average total HAA concentration of 21.3 μg/L. The lowest HAA levels were observed in the groundwater source, with a mean concentration of 0.6 μg/L. Seasonal variations were significant in the HAA concentrations; the highest total HAA concentrations were found during the autumn, when the concentrations were approximately two times higher than in winter and spring. HAA speciation varied among the water sources, with dichloroacetic acid and trichloroacetic acid dominant in the lowland surface water system and brominated species dominant in the upland surface water system. There was a strong correlation between trihalomethanes and HAAs when considering all samples from the three systems in the same data set (r² = 0.88); however, the correlation was poor to moderate when considering each system independently.
Abstract:
The combination of the synthetic minority oversampling technique (SMOTE) and the radial basis function (RBF) classifier is proposed to deal with classification for imbalanced two-class data. In order to enhance the significance of the small and specific region belonging to the positive class in the decision region, the SMOTE is applied to generate synthetic instances for the positive class to balance the training data set. Based on the over-sampled training data, the RBF classifier is constructed by applying the orthogonal forward selection procedure, in which the classifier structure and the parameters of RBF kernels are determined using a particle swarm optimization algorithm based on the criterion of minimizing the leave-one-out misclassification rate. The experimental results on both simulated and real imbalanced data sets are presented to demonstrate the effectiveness of our proposed algorithm.
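The oversampling step described above can be sketched as follows. This is a minimal, assumption-laden version of SMOTE-style interpolation (each synthetic point lies on the segment between a minority sample and one of its nearest minority neighbours); the function name, the parameter `k`, and the toy points are hypothetical, and the RBF classifier and particle swarm optimization stages are not shown.

```python
import random

def smote(minority, n_synthetic, k=3, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and one of their k nearest minority-class neighbours.
    Assumes at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a)
                               for a, b in zip(base, neighbour)))
    return synthetic

# Hypothetical positive-class samples, invented for illustration.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_synthetic=6)
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the region already occupied by the positive class.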
Abstract:
Urban boundary layers (UBLs) can be highly complex due to the heterogeneous roughness and heating of the surface, particularly at night. Due to a general lack of observations, it is not clear whether canonical models of boundary layer mixing are appropriate in modelling air quality in urban areas. This paper reports Doppler lidar observations of turbulence profiles in the centre of London, UK, as part of the second REPARTEE campaign in autumn 2007. Lidar-measured standard deviation of vertical velocity averaged over 30 min intervals generally compared well with in situ sonic anemometer measurements at 190 m on the BT telecommunications Tower. During calm, nocturnal periods, the lidar underestimated turbulent mixing due mainly to limited sampling rate. Mixing height derived from the turbulence, and aerosol layer height from the backscatter profiles, showed similar diurnal cycles ranging from c. 300 to 800 m, increasing to c. 200 to 850 m under clear skies. The aerosol layer height was sometimes significantly different to the mixing height, particularly at night under clear skies. For convective and neutral cases, the scaled turbulence profiles resembled canonical results; this was less clear for the stable case. Lidar observations clearly showed enhanced mixing beneath stratocumulus clouds reaching down on occasion to approximately half daytime boundary layer depth. On one occasion the nocturnal turbulent structure was consistent with a nocturnal jet, suggesting a stable layer. Given the general agreement between observations and canonical turbulence profiles, mixing timescales were calculated for passive scalars released at street level to reach the BT Tower using existing models of turbulent mixing. It was estimated to take c. 10 min to diffuse up to 190 m, rising to between 20 and 50 min at night, depending on stability. 
Determination of mixing timescales is important when comparing to physico-chemical processes acting on pollutant species measured simultaneously at both the ground and at the BT Tower during the campaign. From the 3 week autumnal data-set there is evidence for occasional stable layers in central London, effectively decoupling surface emissions from air aloft.
Abstract:
We develop a database of 110 gradual solar energetic particle (SEP) events, over the period 1967–2006, providing estimates of event onset, duration, fluence, and peak flux for protons of energy E > 60 MeV. The database is established mainly from the energetic proton flux data distributed in the OMNI 2 data set; however, we also utilize the McMurdo neutron monitor and the energetic proton flux from GOES missions. To aid the development of the gradual SEP database, we establish a method with which the homogeneity of the energetic proton flux record is improved. A comparison between other SEP databases and the database developed here is presented which discusses the different algorithms used to define an event. Furthermore, we investigate the variation of gradual SEP occurrence and fluence with solar cycle phase, sunspot number (SSN), and interplanetary magnetic field intensity (Bmag) over solar cycles 20–23. We find that the occurrence and fluence of SEP events vary with the solar cycle phase. Correspondingly, we find a positive correlation between SEP occurrence and solar activity as determined by SSN and Bmag, while the mean fluence in individual events decreases with the same measures of solar activity. Therefore, although the number of events decreases when solar activity is low, the events that do occur at such times have higher fluence. Thus, large events such as the “Carrington flare” may be more likely at lower levels of solar activity. These results are discussed in the context of other similar investigations.
Abstract:
This contribution proposes a powerful technique for two-class imbalanced classification problems by combining the synthetic minority over-sampling technique (SMOTE) and the particle swarm optimisation (PSO) aided radial basis function (RBF) classifier. In order to enhance the significance of the small and specific region belonging to the positive class in the decision region, the SMOTE is applied to generate synthetic instances for the positive class to balance the training data set. Based on the over-sampled training data, the RBF classifier is constructed by applying the orthogonal forward selection procedure, in which the classifier's structure and the parameters of RBF kernels are determined using a PSO algorithm based on the criterion of minimising the leave-one-out misclassification rate. The experimental results obtained on a simulated imbalanced data set and three real imbalanced data sets are presented to demonstrate the effectiveness of our proposed algorithm.
Abstract:
This paper discusses how numerical gradient estimation methods may be used in order to reduce the computational demands on a class of multidimensional clustering algorithms. The study is motivated by the recognition that several current point-density based cluster identification algorithms could benefit from a reduction of computational demand if approximate a-priori estimates of the cluster centres present in a given data set could be supplied as starting conditions for these algorithms. In this particular presentation, the algorithm shown to benefit from the technique is the Mean-Tracking (M-T) cluster algorithm, but the results obtained from the gradient estimation approach may also be applied to other clustering algorithms and their related disciplines.
Abstract:
The structure and evolution of the Arctic stratospheric polar vortex is assessed during opposing phases of, primarily, the El Niño–Southern Oscillation (ENSO) and the Quasi-Biennial Oscillation (QBO), but the 11 year solar cycle and winters following large volcanic eruptions are also examined. The analysis is performed by taking 2-D moments of vortex potential vorticity (PV) fields which allow the area and centroid of the vortex to be calculated throughout the ERA-40 reanalysis data set (1958–2002). Composites of these diagnostics for the different phases of the natural forcings are then considered. Statistically significant results are found regarding the structure and evolution of the vortex during, in particular, the ENSO and QBO phases. When compared with the more traditional zonal mean zonal wind diagnostic at 60°N, the moment-based diagnostics are far more robust and contain more information regarding the state of the vortex. The study details, for the first time, a comprehensive sequence of events which map the evolution of the vortex during each of the forcings throughout an extended winter period.
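To make the moment-based diagnostics concrete, the sketch below computes the area and centroid of a thresholded region from its zeroth and first 2-D moments. It is a simplified stand-in: the flat Cartesian grid, the binary mask, the threshold value, and the toy field are all assumptions for illustration, whereas the study itself works with potential vorticity fields from a reanalysis.

```python
def vortex_moments(field, threshold, cell_area=1.0):
    """Return (area, (x_centroid, y_centroid)) of the region where
    field > threshold, using zeroth and first moments of a binary mask."""
    m00 = m10 = m01 = 0.0
    for y, row in enumerate(field):
        for x, value in enumerate(row):
            if value > threshold:
                m00 += 1.0   # zeroth moment: number of cells in the region
                m10 += x     # first moment in x
                m01 += y     # first moment in y
    if m00 == 0:
        return 0.0, None     # no cell exceeds the threshold
    return m00 * cell_area, (m10 / m00, m01 / m00)

# Hypothetical 4x4 field with a 2x2 "vortex" in the middle.
field = [
    [0, 0, 0, 0],
    [0, 5, 6, 0],
    [0, 7, 5, 0],
    [0, 0, 0, 0],
]
area, centroid = vortex_moments(field, threshold=4)
```

Tracking `area` and `centroid` through time would give simple series describing the size and displacement of the region.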
A wind-tunnel study of flow distortion at a meteorological sensor on top of the BT Tower, London, UK
Abstract:
High quality wind measurements in cities are needed for numerous applications including wind engineering. Such data-sets are rare and measurement platforms may not be optimal for meteorological observations. Two years' wind data were collected on the BT Tower, London, UK, showing an upward deflection on average for all wind directions. Wind tunnel simulations were performed to investigate flow distortion around two scale models of the Tower. Using a 1:160 scale model it was shown that the Tower causes a small deflection (ca. 0.5°) compared to the lattice on top on which the instruments were placed (ca. 0–4°). These deflections may have been underestimated due to wind tunnel blockage. Using a 1:40 model, the observed flow pattern was consistent with streamwise vortex pairs shed from the upstream lattice edge. Correction factors were derived for different wind directions and reduced deflection in the full-scale data-set by <3°. Instrumental tilt caused a sinusoidal variation in deflection of ca. 2°. The residual deflection (ca. 3°) was attributed to the Tower itself. The correction to the wind-speeds was small (average 1%); it was therefore deduced that flow distortion does not significantly affect the measured wind-speeds and that the wind climate statistics are reliable.
Abstract:
The potential of visible-near infrared spectra, obtained using a light backscatter sensor, in conjunction with chemometrics, to predict curd moisture and whey fat content in a cheese vat was examined. A three-factor (renneting temperature, calcium chloride, cutting time), central composite design was carried out in triplicate. Spectra (300–1,100 nm) of the product in the cheese vat were captured during syneresis using a prototype light backscatter sensor. Stirring followed upon cutting the gel, and samples of curd and whey were removed at 10 min intervals and analyzed for curd moisture and whey fat content. Spectral data were used to develop models for predicting curd moisture and whey fat contents using partial least squares regression. Subjecting the spectral data set to Jack-knifing improved the accuracy of the models. The whey fat models (R = 0.91, 0.95) and curd moisture model (R = 0.86, 0.89) provided good and approximate predictions, respectively. Visible-near infrared spectroscopy was found to have potential for the prediction of important syneresis indices in stirred cheese vats.
Abstract:
In 2007, a General Observation Period (GOP) was performed within the German Priority Program on Quantitative Precipitation Forecasting (PQP). By optimizing the use of existing instrumentation, a large data set from in-situ and remote sensing instruments, with a special focus on water cycle variables, was gathered over the full annual cycle. The area of interest covered central Europe with increasing focus towards the Black Forest where the Convective and Orographically-induced Precipitation Study (COPS) took place from June to August 2007. Thus the GOP includes a variety of precipitation systems in order to relate the COPS results to a larger spatial scale. For a timely use of the data, forecasts of the numerical weather prediction models COSMO-EU and COSMO-DE of the German Meteorological Service were tailored to match the observations and perform model evaluation in a near real-time environment. The ultimate goal is to identify and distinguish between different kinds of model deficits and to improve process understanding.
Abstract:
The aerosol component of the Oxford-Rutherford Aerosol and Cloud (ORAC) combined cloud and aerosol retrieval scheme is described and the theoretical performance of the algorithm is analysed. ORAC is an optimal estimation retrieval scheme for deriving cloud and aerosol properties from measurements made by imaging satellite radiometers and, when applied to cloud free radiances, provides estimates of aerosol optical depth at a wavelength of 550 nm, aerosol effective radius and surface reflectance at 550 nm. The aerosol retrieval component of ORAC has several incarnations – this paper addresses the version which operates in conjunction with the cloud retrieval component of ORAC (described by Watts et al., 1998), as applied in producing the Global Retrieval of ATSR Cloud Parameters and Evaluation (GRAPE) data-set.
Abstract:
The Global Retrieval of ATSR Cloud Parameters and Evaluation (GRAPE) project has produced a global data-set of cloud and aerosol properties from the Along Track Scanning Radiometer-2 (ATSR-2) instrument, covering the time period 1995–2001. This paper presents the validation of aerosol optical depths (AODs) over the ocean from this product against AERONET sun-photometer measurements, as well as a comparison to the Advanced Very High Resolution Radiometer (AVHRR) optical depth product produced by the Global Aerosol Climatology Project (GACP). The GRAPE AOD over ocean is found to be in good agreement with AERONET measurements, with a Pearson's correlation coefficient of 0.79 and a best-fit slope of 1.0±0.1, but with a positive bias of 0.08±0.04. Although the GRAPE and GACP datasets show reasonable agreement, there are significant differences. These discrepancies are explored, and suggest that the downward trend in AOD reported by GACP may arise from changes in sampling due to the orbital drift of the AVHRR instruments.
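The three validation statistics quoted above (Pearson correlation, best-fit slope, bias) can be computed along the following lines. This sketch assumes an ordinary least-squares slope and a bias defined as the mean difference between retrieved and reference values; the abstract does not state which fitting method was used, and the matched pairs below are synthetic values invented for illustration, not GRAPE or AERONET data.

```python
from math import sqrt

def validation_stats(reference, retrieved):
    """Return (pearson_r, ols_slope, mean_bias) for matched value pairs."""
    n = len(reference)
    mean_x = sum(reference) / n
    mean_y = sum(retrieved) / n
    sxx = sum((x - mean_x) ** 2 for x in reference)
    syy = sum((y - mean_y) ** 2 for y in retrieved)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(reference, retrieved))
    r = sxy / sqrt(sxx * syy)   # Pearson correlation coefficient
    slope = sxy / sxx           # least-squares slope of retrieved vs reference
    bias = mean_y - mean_x      # mean(retrieved - reference)
    return r, slope, bias

# Synthetic matched optical depths: retrieved = reference + constant offset.
reference = [0.10, 0.20, 0.30, 0.40]
retrieved = [0.18, 0.28, 0.38, 0.48]
r, slope, bias = validation_stats(reference, retrieved)
```

A slope near one together with a non-zero bias, as in the abstract, indicates a roughly constant offset rather than a scaling error.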
Abstract:
We examined the relationship between blood antioxidant enzyme activities, indices of inflammatory status and a number of lifestyle factors in the Caerphilly prospective cohort study of ischaemic heart disease. The study began in 1979 and is based on a representative male population sample. Initially 2512 men were seen in phase I, and followed-up every 5 years in phases II and III; they have recently been seen in phase IV. Data on social class, smoking habit and alcohol consumption were obtained by questionnaire, and body mass index was measured. Antioxidant enzyme activities and indices of inflammatory status were estimated by standard techniques. Significant associations were observed for: age with α-1-antichymotrypsin (p<0.0001) and with caeruloplasmin, both protein and oxidase (p<0.0001); smoking habit with α-1-antichymotrypsin (p<0.0001), with caeruloplasmin, both protein and oxidase (p<0.0001) and with glutathione peroxidase (GPX) (p<0.0001); social class with α-1-antichymotrypsin (p<0.0001), with caeruloplasmin both protein (p<0.001) and oxidase (p<0.01) and with GPX (p<0.0001); body mass index with α-1-antichymotrypsin (p<0.0001) and with caeruloplasmin protein (p<0.001). There was no significant association between alcohol consumption and any of the blood enzymes measured. Factor analysis produced a three-factor model (explaining 65.9% of the variation in the data set) which appeared to indicate close inter-relationships among antioxidants.