920 results for data sets


Relevance:

70.00%

Publisher:

Abstract:

We propose a new class of neurofuzzy construction algorithms, based on leave-one-out (LOO) cross-validation, that aim to maximize generalization capability specifically for imbalanced data classification problems. The algorithms have two stages. First, an initial rule base is constructed by estimating a Gaussian mixture model with analysis of variance (ANOVA) decomposition from the input data. Second, joint weighted least squares parameter estimation and rule selection are carried out using the orthogonal forward subspace selection (OFSS) procedure. We show how different LOO-based rule selection criteria can be incorporated into OFSS, and advocate either maximizing the leave-one-out area under the receiver operating characteristic (ROC) curve or maximizing the leave-one-out F-measure when the data sets exhibit an imbalanced class distribution. Extensive comparative simulations illustrate the effectiveness of the proposed algorithms.
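LOO criteria of this kind can be evaluated without refitting the model once per sample. The sketch below is not OFSS and ignores the weighting; it assumes an ordinary least-squares model that is linear in its parameters, with a hypothetical regressor matrix `P` (one column per selected rule) and class labels in {-1, +1}. It computes LOO outputs with the standard hat-matrix identity and scores them with the F-measure.

```python
import numpy as np


def loo_outputs(P, y):
    """Leave-one-out outputs of a least-squares model y ~ P @ theta.

    Uses the hat-matrix identity e_loo_i = e_i / (1 - h_ii), so no refitting is needed.
    """
    theta, *_ = np.linalg.lstsq(P, y, rcond=None)
    residuals = y - P @ theta
    h = np.einsum("ij,ji->i", P, np.linalg.pinv(P))  # diagonal of P @ pinv(P)
    return y - residuals / (1.0 - h)


def f_measure(y_true, y_pred, positive=1):
    """F-measure (harmonic mean of precision and recall) for the positive class."""
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def loo_f_measure(P, y):
    """LOO F-measure of a candidate rule set (regressors P, labels in {-1, +1})."""
    labels = np.where(loo_outputs(P, y) >= 0, 1, -1)
    return f_measure(y, labels)
```

In a forward-selection loop, one could evaluate `loo_f_measure` for each candidate column appended to `P` and keep the candidate that maximizes it.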

Abstract:

Within the SPARC Data Initiative, the first comprehensive assessment of the quality of 13 water vapor products from 11 limb-viewing satellite instruments (LIMS, SAGE II, UARS-MLS, HALOE, POAM III, SMR, SAGE III, MIPAS, SCIAMACHY, ACE-FTS, and Aura-MLS) obtained within the time period 1978-2010 has been performed. Each instrument's water vapor profile measurements were compiled into monthly zonal mean time series on a common latitude-pressure grid. These time series serve as the basis for the "climatological" validation approach used within the project. The evaluations include comparisons of monthly or annual zonal mean cross sections and seasonal cycles in the tropical and extratropical upper troposphere and lower stratosphere averaged over one or more years, comparisons of interannual variability, and a study of the time evolution of physical features in water vapor such as the tropical tape recorder and polar vortex dehydration. Our knowledge of the atmospheric mean state in water vapor is best in the lower and middle stratosphere of the tropics and midlatitudes, with a relative uncertainty of ±2-6% (as quantified by the standard deviation of the instruments' multiannual means). The uncertainty increases toward the polar regions (±10-15%), the mesosphere (±15%), and the upper troposphere/lower stratosphere below 100 hPa (±30-50%), where sampling issues add uncertainty due to large gradients and high natural variability in water vapor. The minimum found in multiannual (1998-2008) mean water vapor in the tropical lower stratosphere is 3.5 ppmv (±14%), with slightly larger uncertainties for monthly mean values. The frequently used HALOE water vapor data set shows consistently lower values than most other data sets throughout the atmosphere, with increasing deviations from the multi-instrument mean below 100 hPa in both the tropics and extratropics.
The knowledge gained from these comparisons about the quality of the individual data sets in different regions of the atmosphere will help to improve model-measurement comparisons (e.g., for diagnostics such as the tropical tape recorder or seasonal cycles), data merging activities, and studies of climate variability.
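The uncertainty measure quoted above, the standard deviation of the instruments' multiannual means, can be expressed as a relative spread at a single latitude-pressure grid point. A minimal sketch, assuming the sample standard deviation (the abstract does not say which estimator is used) and normalization by the multi-instrument mean; the function name is illustrative.

```python
from statistics import mean, stdev


def relative_spread_percent(instrument_means):
    """1-sigma spread of instrument multiannual means as a percentage of their mean.

    instrument_means: one multiannual mean (e.g. in ppmv) per instrument at a single
    latitude-pressure grid point. Uses the sample standard deviation (an assumption).
    """
    if len(instrument_means) < 2:
        raise ValueError("need at least two instruments")
    return 100.0 * stdev(instrument_means) / mean(instrument_means)
```

Applying the function at every grid point of the common latitude-pressure grid yields a map of the relative uncertainty.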

Abstract:

A comprehensive quality assessment of the ozone products from 18 limb-viewing satellite instruments is provided by means of a detailed intercomparison. The ozone climatologies in the form of monthly zonal mean time series covering the upper troposphere to lower mesosphere are obtained from LIMS, SAGE I/II/III, UARS-MLS, HALOE, POAM II/III, SMR, OSIRIS, MIPAS, GOMOS, SCIAMACHY, ACE-FTS, ACE-MAESTRO, Aura-MLS, HIRDLS, and SMILES within 1978–2010. The intercomparisons focus on mean biases of annual zonal mean fields, interannual variability, and seasonal cycles. Additionally, the physical consistency of the data is tested through diagnostics of the quasi-biennial oscillation and Antarctic ozone hole. The comprehensive evaluations reveal that the uncertainty in our knowledge of the atmospheric ozone mean state is smallest in the tropical and midlatitude middle stratosphere with a 1σ multi-instrument spread of less than ±5%. While the overall agreement among the climatological data sets is very good for large parts of the stratosphere, individual discrepancies have been identified, including unrealistic month-to-month fluctuations, large biases in particular atmospheric regions, or inconsistencies in the seasonal cycle. Notable differences between the data sets exist in the tropical lower stratosphere (with a spread of ±30%) and at high latitudes (±15%). In particular, large relative differences are identified in the Antarctic during the time of the ozone hole, with a spread between the monthly zonal mean fields of ±50%. The evaluations provide guidance on which data sets are the most reliable for applications such as studies of ozone variability, model-measurement comparisons, detection of long-term trends, and data-merging activities.
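The mean biases of annual zonal mean fields can be expressed as each instrument's relative deviation from a multi-instrument mean (MIM). A small sketch, assuming the MIM is the unweighted mean of the instruments available at one grid point; the instrument names and values passed in are placeholders.

```python
def deviation_from_mim_percent(instrument_values):
    """Relative deviation (%) of each instrument from the multi-instrument mean (MIM).

    instrument_values: dict mapping instrument name to its annual zonal mean value
    at one grid point (names and values are placeholders).
    """
    mim = sum(instrument_values.values()) / len(instrument_values)
    return {name: 100.0 * (value - mim) / mim for name, value in instrument_values.items()}
```

Deviations that stay far from zero over many grid points would correspond to the "large biases in particular atmospheric regions" described above.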

Abstract:

We present the first comprehensive intercomparison of currently available satellite ozone climatologies in the upper troposphere/lower stratosphere (UTLS) (300–70 hPa) as part of the Stratosphere-troposphere Processes and their Role in Climate (SPARC) Data Initiative. The Tropospheric Emission Spectrometer (TES) instrument is the only nadir-viewing instrument in this initiative, as well as the only instrument with a focus on tropospheric composition. We apply the TES observational operator to ozone climatologies from the more highly vertically resolved limb-viewing instruments. This minimizes the impact of differences in vertical resolution among the instruments and allows identification of systematic differences in the large-scale structure and variability of UTLS ozone. We find that the climatologies from most of the limb-viewing instruments show positive differences (ranging from 5 to 75%) with respect to TES in the tropical UTLS, and comparison to a “zonal mean” ozonesonde climatology indicates that these differences likely represent a positive bias for p ≤ 100 hPa. In the extratropics, there is good agreement among the climatologies regarding the timing and magnitude of the ozone seasonal cycle (differences in the peak-to-peak amplitude of <15%) when the TES observational operator is applied, as well as very consistent midlatitude interannual variability. The discrepancies in ozone temporal variability are larger in the tropics, with differences between the data sets of up to 55% in the seasonal cycle amplitude. However, the differences among the climatologies are everywhere much smaller than the range produced by current chemistry-climate models, indicating that the multiple-instrument ensemble is useful for quantitatively evaluating these models.
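Applying an observational operator means smoothing each high-resolution limb profile with the nadir instrument's averaging kernels and a priori profile. A minimal sketch, assuming the common retrieval form x̂ = x_a + A(x − x_a) applied to ln(VMR), and assuming the limb profile has already been interpolated to the TES pressure grid; `A` and `x_prior` stand in for quantities that would come from the TES retrieval products.

```python
import numpy as np


def apply_observation_operator(x_limb, x_prior, A):
    """Smooth a limb-instrument ozone profile to nadir vertical resolution.

    x_limb, x_prior: ozone VMR on the retrieval pressure grid (same length).
    A: averaging-kernel matrix (n x n) for that grid.
    Assumes the operator acts on ln(VMR): ln x_hat = ln x_a + A (ln x - ln x_a).
    """
    ln_xa = np.log(x_prior)
    ln_hat = ln_xa + A @ (np.log(x_limb) - ln_xa)
    return np.exp(ln_hat)
```

With an identity averaging kernel the limb profile is returned unchanged; with a zero kernel only the a priori remains, which is why regions of low TES sensitivity pull all instruments toward the same prior.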

Abstract:

Stratospheric water vapour is a powerful greenhouse gas. The longest available record from balloon observations over Boulder, Colorado, USA shows increases in stratospheric water vapour concentrations that cannot be fully explained by observed changes in the main drivers, tropical tropopause temperatures and methane. Satellite observations could help resolve the issue, but constructing a reliable long-term data record from individual short satellite records is challenging. Here we present an approach to merge satellite data sets with the help of a chemistry–climate model nudged to observed meteorology. We use the model’s water vapour as a transfer function between data sets that overcomes issues arising from instrument drift and short overlap periods. In the lower stratosphere, our water vapour record extends back to 1988 and water vapour concentrations largely follow tropical tropopause temperatures. Lower and mid-stratospheric long-term trends are negative, and the trends from Boulder are shown not to be globally representative. In the upper stratosphere, our record extends back to 1986 and shows positive long-term trends. The altitudinal differences in the trends are explained by methane oxidation together with a strengthened lower-stratospheric and a weakened upper-stratospheric circulation inferred by this analysis. Our results call into question previous estimates of surface radiative forcing based on presumed global long-term increases in water vapour concentrations in the lower stratosphere.
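The idea of a model acting as a transfer function can be illustrated with a deliberately simplified version: each satellite record is compared with the model over its own period only, so records do not need to overlap each other. This sketch assumes a constant offset per instrument and anchors the merged record to one hypothetical reference instrument; it does not correct drift (that would need a time-varying offset) and is not the paper's actual procedure.

```python
from statistics import mean


def merge_via_model(satellites, model, reference):
    """Merge satellite records using a model series as a transfer function.

    satellites: dict name -> dict time -> value (records may not overlap each other)
    model: dict time -> value; must cover every time in every satellite record
    reference: name of the satellite whose absolute level the merged record keeps
    """
    # Mean offset of each satellite from the model over that satellite's own record
    offsets = {
        name: mean(series[t] - model[t] for t in series)
        for name, series in satellites.items()
    }
    merged = {}
    for name, series in satellites.items():
        shift = offsets[reference] - offsets[name]
        for t, value in series.items():
            merged.setdefault(t, []).append(value + shift)
    # Average wherever the adjusted records overlap
    return {t: mean(values) for t, values in sorted(merged.items())}
```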

Abstract:

This paper presents a novel approach to the automatic classification of very large data sets composed of terahertz pulse transient signals, highlighting their potential use in biochemical, biomedical, pharmaceutical and security applications. Two different types of THz spectra are considered in the classification process. Firstly, a binary classification study of poly-A and poly-C ribonucleic acid samples is performed. This is then contrasted with a difficult multi-class classification problem of spectra from six different powder samples that, although they have fairly indistinguishable features in the optical spectrum, possess a few discernible spectral features in the terahertz part of the spectrum. Classification is performed using a complex-valued extreme learning machine algorithm that takes into account features in both the amplitude and the phase of the recorded spectra. Classification speed and accuracy are contrasted with those achieved using a support vector machine classifier. The study systematically compares the classifier performance achieved after adopting different Gaussian kernels when separating amplitude and phase signatures. The two signatures are presented as feature vectors for both training and testing purposes. The study confirms the utility of complex-valued extreme learning machine algorithms for classification of the very large data sets generated with current terahertz imaging spectrometers. The classifier can take into consideration heterogeneous layers within an object, as would be required in a tomographic setting, and is sufficiently robust to detect patterns hidden inside noisy terahertz data sets. The proposed study opens up the opportunity for the establishment of complex-valued extreme learning machine algorithms as new chemometric tools that will assist the wider proliferation of terahertz sensing technology for chemical sensing, quality control, security screening and clinical diagnosis.
Furthermore, the proposed algorithm should also be very useful in other applications requiring the classification of very large datasets.
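An extreme learning machine keeps randomly drawn hidden-layer weights fixed and solves only the output weights by least squares; the complex-valued variant lets amplitude and phase enter together as complex numbers. The sketch below follows that general scheme; the split-complex tanh activation, weight scaling and one-hot targets are assumptions for illustration, not necessarily the choices made in the study.

```python
import numpy as np


class ComplexELM:
    """Minimal complex-valued extreme learning machine classifier (sketch).

    Hidden weights and biases are random complex numbers that are never trained;
    only the output weights are solved in closed form with a pseudo-inverse.
    """

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        Z = X @ self.W + self.b
        # Split-complex activation: tanh applied to real and imaginary parts separately
        return np.tanh(Z.real) + 1j * np.tanh(Z.imag)

    def fit(self, X, y):
        n_features = X.shape[1]
        scale = 1.0 / np.sqrt(n_features)
        shape = (n_features, self.n_hidden)
        self.W = scale * (self.rng.normal(size=shape) + 1j * self.rng.normal(size=shape))
        self.b = self.rng.normal(size=self.n_hidden) + 1j * self.rng.normal(size=self.n_hidden)
        self.classes_ = np.unique(y)
        T = (y[:, None] == self.classes_[None, :]).astype(complex)  # one-hot targets
        self.beta = np.linalg.pinv(self._hidden(X)) @ T
        return self

    def predict(self, X):
        scores = (self._hidden(X) @ self.beta).real
        return self.classes_[np.argmax(scores, axis=1)]
```

Given complex spectra in hypothetical arrays `X_train` (one row per sample, one column per frequency bin) and labels `y_train`, `ComplexELM(n_hidden=100).fit(X_train, y_train).predict(X_test)` returns predicted classes. Training cost is dominated by one pseudo-inverse, which is what makes ELMs attractive for very large data sets.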

Abstract:

With a rapidly increasing fraction of electricity generation being sourced from wind, extreme wind power generation events, such as prolonged periods of low (or high) generation and ramps in generation, are a growing concern for the efficient and secure operation of national power systems. As extreme events occur infrequently, long and reliable meteorological records are required to accurately estimate their characteristics. Recent publications have begun to investigate the use of global meteorological “reanalysis” data sets for power system applications, many of which focus on long-term average statistics such as monthly-mean generation. Here we demonstrate that reanalysis data can also be used to estimate the frequency of relatively short-lived extreme events (including ramping on sub-daily time scales). Verification against 328 surface observation stations across the United Kingdom suggests that near-surface wind variability over spatiotemporal scales greater than around 300 km and 6 h can be faithfully reproduced using reanalysis, with no need for costly dynamical downscaling. A case study is presented in which a state-of-the-art 33-year reanalysis data set (MERRA, from NASA-GMAO) is used to construct an hourly time series of nationally aggregated wind power generation in Great Britain (GB), assuming a fixed, modern distribution of wind farms. The resultant generation estimates are highly correlated with recorded data from National Grid in the recent period, both for instantaneous hourly values and for variability over time intervals greater than around 6 h. This 33-year time series is then used to quantify the frequency with which different extreme GB-wide wind power generation events occur, as well as their seasonal and inter-annual variability.
Several novel insights into the nature of extreme wind power generation events are described, including (i) that the number of prolonged low or high generation events is well approximated by a Poisson-like random process, and (ii) that, while there is generally large seasonal variability, the magnitude of the most extreme ramps is similar in both summer and winter. An up-to-date version of the GB case study data as well as the underlying model are freely available for download from our website: http://www.met.reading.ac.uk/~energymet/data/Cannon2014/.
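Once an hourly, nationally aggregated capacity-factor series is available, statistics of the kind described above reduce to counting. A sketch under assumed definitions: the threshold and window lengths are hypothetical defaults, not the paper's, and the dispersion index (variance over mean of yearly counts, about 1 for a Poisson process) is one simple check of Poisson-like behaviour.

```python
from statistics import mean, variance


def count_low_events(capacity_factor, threshold=0.1, min_hours=24):
    """Count runs of at least min_hours consecutive hours below a capacity-factor threshold."""
    events = run = 0
    for value in capacity_factor:
        if value < threshold:
            run += 1
        else:
            if run >= min_hours:
                events += 1
            run = 0
    if run >= min_hours:  # a run that lasts until the end of the series
        events += 1
    return events


def max_ramp(capacity_factor, window=6):
    """Largest absolute change in capacity factor over `window` hours."""
    return max(
        abs(capacity_factor[i + window] - capacity_factor[i])
        for i in range(len(capacity_factor) - window)
    )


def dispersion_index(annual_counts):
    """Variance-to-mean ratio of yearly event counts; close to 1 for a Poisson process."""
    return variance(annual_counts) / mean(annual_counts)
```

Running `max_ramp` separately on summer and winter subsets of the series is one way to compare seasonal ramp magnitudes.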

Abstract:

Catastrophe risk models used by the insurance industry are likely subject to significant uncertainty, but due to their proprietary nature and strict licensing conditions they are not available for experimentation. In addition, even if such experiments were conducted, they would not be repeatable by other researchers, because commercial confidentiality issues prevent the details of proprietary catastrophe model structures from being described in public domain documents. However, such experimentation is urgently required to improve decision making in both insurance and reinsurance markets. In this paper we therefore construct our own catastrophe risk model for flooding in Dublin, Ireland, in order to assess the impact of typical precipitation data uncertainty on loss predictions. As we consider only a city region rather than a whole territory and have access to detailed data and computing resources typically unavailable to industry modellers, our model is significantly more detailed than most commercial products. The model consists of four components: a stochastic rainfall module, a hydrological and hydraulic flood hazard module, a vulnerability module, and a financial loss module. Using these, we undertake a series of simulations to test the impact of driving the stochastic event generator with four different rainfall data sets: ground gauge data, gauge-corrected rainfall radar, meteorological reanalysis data (European Centre for Medium-Range Weather Forecasts Reanalysis-Interim; ERA-Interim) and a satellite rainfall product (The Climate Prediction Center morphing method; CMORPH). Catastrophe models are unusual because they use the upper three components of the modelling chain to generate a large synthetic database of unobserved and severe loss-driving events for which estimated losses are calculated.
We find the loss estimates to be more sensitive to uncertainties propagated from the driving precipitation data sets than to other uncertainties in the hazard and vulnerability modules, suggesting that the range of uncertainty within catastrophe model structures may be greater than commonly believed.
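The four-component chain can be sketched as function composition: a stochastic event generator feeds a hazard model, whose output feeds a vulnerability curve, whose damage ratio is converted to a financial loss. Every component below is a hypothetical toy stand-in (no hydrology, a single asset); swapping `rainfall_sampler` for one built from each rainfall data set mirrors the experiment described above.

```python
import random


def simulate_annual_losses(n_years, rainfall_sampler, hazard, vulnerability, exposure, seed=0):
    """Run a toy four-stage catastrophe model chain and return one loss per simulated year.

    rainfall_sampler(rng) -> list of event rainfall totals for one year (stochastic module)
    hazard(rainfall) -> flood depth in metres at the asset (hazard module)
    vulnerability(depth) -> damage ratio in [0, 1] (vulnerability module)
    exposure -> insured value; damage ratio * exposure gives the loss (financial module)
    """
    rng = random.Random(seed)
    losses = []
    for _ in range(n_years):
        year_loss = 0.0
        for rainfall in rainfall_sampler(rng):
            depth = hazard(rainfall)
            year_loss += vulnerability(depth) * exposure
        losses.append(year_loss)
    return losses


def resampling_sampler(observed_event_totals, max_events=3):
    """Hypothetical event generator that resamples event totals from one rainfall data set."""
    def sampler(rng):
        n_events = rng.randint(0, max_events)
        return [rng.choice(observed_event_totals) for _ in range(n_events)]
    return sampler
```

The average annual loss is then `sum(losses) / len(losses)`; comparing it across samplers built from different rainfall data sets isolates the sensitivity to the driving precipitation.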

Abstract:

As we enter an era of ‘big data’, asset information is becoming a deliverable of complex projects. Prior research suggests digital technologies enable rapid, flexible forms of project organizing. This research analyses practices of managing change in Airbus, CERN and Crossrail, through desk-based review, interviews, visits and a cross-case workshop. These organizations deliver complex projects, rely on digital technologies to manage large data sets, and use configuration management, a systems engineering approach with mid-20th century origins, to establish and maintain integrity. In these organizations, configuration management has become more, rather than less, important. Asset information is structured, with change managed through digital systems, using relatively hierarchical, asynchronous and sequential processes. The paper contributes by uncovering limits to flexibility in complex projects where integrity is important. Challenges of managing change are discussed, considering the evolving nature of configuration management, the potential use of analytics on complex projects, and implications for research and practice.

Abstract:

This paper reviews the literature concerning the practice of using Online Analytical Processing (OLAP) systems to recall information stored by Online Transactional Processing (OLTP) systems. Such a review provides a basis for discussing the need for information recalled through OLAP systems to maintain the contexts of transactions with the data captured by the respective OLTP system. The paper observes an industry trend in which OLTP systems process information into data that are then stored in databases without the business rules that were used to process them. This trend necessitates a practice whereby sets of business rules are used to extract, cleanse, transform and load data from disparate OLTP systems into OLAP databases to support the requirements for complex reporting and analytics. These sets of business rules are usually not the same as the business rules used to capture data in particular OLTP systems. The paper argues that differences between the business rules used to interpret the same data sets risk creating semantic gaps between information captured by OLTP systems and information recalled through OLAP systems. Literature concerning the modelling of business transaction information as facts with context, as part of the modelling of information systems, was reviewed to identify design trends that contribute to the design quality of OLTP and OLAP systems. The paper then argues that the quality of OLTP and OLAP system design depends critically on the capture of facts with associated context; the encoding of facts with context into data with business rules; the storage and sourcing of data with business rules; the decoding of data with business rules into facts with context; and the recall of facts with associated contexts.
The paper proposes UBIRQ, a design model to aid the co-design of storage for data and business rules for OLTP and OLAP purposes. The proposed design model makes it possible to implement and use multi-purpose databases and business rules stores for OLTP and OLAP systems. Such implementations would enable OLTP systems to record and store data together with the executions of business rules, allowing both OLTP and OLAP systems to query data together with the business rules used to capture them, thereby ensuring that information recalled via OLAP systems preserves the contexts of transactions as per the data captured by the respective OLTP system.
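The abstract does not describe UBIRQ's internal structure, so the following is only an illustration of the underlying idea: each stored value carries a reference to the business rule executed at capture time, and OLAP-side recall decodes it with that same rule instead of a separate ETL rule. The rule store, rule identifiers and tax rates are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical business-rule store: rule id -> function decoding a stored value into a fact.
RULE_STORE = {
    "tax-rule-A": lambda net: {"net": net, "tax_rate": 0.10, "gross": round(net * 1.10, 2)},
    "tax-rule-B": lambda net: {"net": net, "tax_rate": 0.20, "gross": round(net * 1.20, 2)},
}


@dataclass(frozen=True)
class StoredRecord:
    value: float
    rule_id: str  # the business rule executed when the OLTP system captured the value


def capture(value, active_rule_id):
    """OLTP side: store the data together with a reference to the rule that was executed."""
    return StoredRecord(value, active_rule_id)


def recall(record):
    """OLAP side: decode with the same rule used at capture, so the transaction context survives."""
    return RULE_STORE[record.rule_id](record.value)
```

Because the rule reference travels with the data, two records with the same stored value but captured under different rules are still recalled with their original, different contexts.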

Abstract:

The size and complexity of data sets generated within ecosystem-level programmes merit their capture, curation, storage, analysis, synthesis and visualisation using Big Data approaches. This review looks at previous attempts to organise and analyse such data through the International Biological Programme and draws on the mistakes made and the lessons learned to inform effective Big Data approaches for current Research Councils United Kingdom (RCUK) ecosystem-level programmes, using Biodiversity and Ecosystem Service Sustainability (BESS) and Environmental Virtual Observatory Pilot (EVOp) as exemplars. The challenges raised by such data are identified and explored, and suggestions are made for the two major issues of extending analyses across different spatio-temporal scales and effectively integrating quantitative and qualitative data.

Abstract:

A quality assessment of the CFC-11 (CCl3F), CFC-12 (CCl2F2), HF, and SF6 products from limb-viewing satellite instruments is provided by means of a detailed intercomparison. The climatologies in the form of monthly zonal mean time series are obtained from HALOE, MIPAS, ACE-FTS, and HIRDLS within the time period 1991–2010. The intercomparisons focus on the mean biases of the monthly and annual zonal mean fields and aim to identify their vertical, latitudinal and temporal structure. The CFC evaluations (based on MIPAS, ACE-FTS and HIRDLS) reveal that the uncertainty in our knowledge of the atmospheric CFC-11 and CFC-12 mean state, as given by satellite data sets, is smallest in the tropics and mid-latitudes at altitudes below 50 and 20 hPa, respectively, with a 1σ multi-instrument spread of up to ±5 %. For HF, the situation is reversed. The two available data sets (HALOE and ACE-FTS) agree well above 100 hPa, with a spread in this region of ±5 to ±10 %, while at altitudes below 100 hPa the HF annual mean state is less well known, with a spread of ±30 % or larger. The atmospheric SF6 annual mean states derived from two satellite data sets (MIPAS and ACE-FTS) show only very small differences, with a spread of less than ±5 % and often below ±2.5 %. While the overall agreement among the climatological data sets is very good for large parts of the upper troposphere and lower stratosphere (CFCs, SF6) or middle stratosphere (HF), individual discrepancies have been identified. Pronounced deviations between the instrument climatologies exist for particular atmospheric regions, which differ from gas to gas. Notable features are differently shaped isopleths in the subtropics, deviations in the vertical gradients in the lower stratosphere and in the meridional gradients in the upper troposphere, and inconsistencies in the seasonal cycle. Additionally, long-term drifts between the instruments have been identified for the CFC-11 and CFC-12 time series.
The evaluations as a whole provide guidance on which data sets are the most reliable for applications such as studies of atmospheric transport and variability, model–measurement comparisons and detection of long-term trends. The data sets will be publicly available from the SPARC Data Centre and through PANGAEA (doi:10.1594/PANGAEA.849223).
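A long-term drift between two instruments can be estimated as the linear trend of their difference over a common period. A minimal sketch, assuming the two series are already co-located in time (for example, matching monthly zonal means at one grid point) and using ordinary least squares; the abstract does not specify how the drifts were actually quantified.

```python
def drift_per_decade(times_years, series_a, series_b):
    """Least-squares slope of (a - b) versus time, in units per decade.

    times_years: decimal years shared by both series (co-located samples assumed).
    """
    diffs = [a - b for a, b in zip(series_a, series_b)]
    n = len(times_years)
    t_mean = sum(times_years) / n
    d_mean = sum(diffs) / n
    num = sum((t - t_mean) * (d - d_mean) for t, d in zip(times_years, diffs))
    den = sum((t - t_mean) ** 2 for t in times_years)
    return 10.0 * num / den
```

Expressing the result relative to the mean mixing ratio (for example, % per decade) makes drifts comparable between CFC-11 and CFC-12.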

Abstract:

Precipitation and temperature climate indices are calculated using the National Centers for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR) reanalysis and validated against observational data from some stations over Brazil and other data sources. The spatial patterns of the climate index trends are analyzed for the period 1961-1990 over South America. In addition, correlation and linear regression coefficients for some specific stations were obtained in order to compare with the reanalysis data. In general, the results suggest that the NCEP/NCAR reanalysis can provide useful information about the minimum temperature and consecutive dry days indices at individual grid cells in Brazil. However, some regional differences in the climate index trends are observed when different data sets are compared. For instance, the NCEP/NCAR reanalysis shows a reversed signal for all annual rainfall indices and the cold night index over Argentina. Despite these differences, maps of the trends for most of the annual climate indices obtained from the NCEP/NCAR reanalysis and BRANT analysis are generally in good agreement with other available data sources and previous findings in the literature for large areas of southern South America. The pattern of trends for the annual precipitation indices over the 30 years analyzed indicates a change to wetter conditions over southern and southeastern parts of Brazil, Paraguay, Uruguay, central and northern Argentina, and parts of Chile, and a decrease over southwestern South America. All over South America, the climate indices related to minimum temperature (warm or cold nights) clearly show a warming tendency; however, no consistent changes in maximum temperature extremes (warm and cold days) have been observed. Therefore, one must be careful before suggesting any trends for warm or cold days.
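The consecutive dry days index is commonly defined (for example in the ETCCDI index set) as the longest run of days with daily precipitation below 1 mm. A sketch assuming that definition and a complete daily series; the handling of missing values and per-year accounting, which index software treats explicitly, is omitted.

```python
def consecutive_dry_days(daily_precip_mm, wet_threshold=1.0):
    """Longest run of consecutive days with precipitation below wet_threshold (mm)."""
    longest = current = 0
    for p in daily_precip_mm:
        if p < wet_threshold:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest
```

Computing this for each year at each grid cell, then fitting a linear trend over 1961-1990, gives the kind of trend map compared in the study.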

Abstract:

Astronomy has evolved almost exclusively through the use of spectroscopic and imaging techniques, operated separately. With the development of modern technologies, it is possible to obtain data cubes that combine both techniques simultaneously, producing images with spectral resolution. Extracting information from them can be quite complex, and hence the development of new methods of data analysis is desirable. We present a method for analysing data cubes (data from single-field observations, containing two spatial dimensions and one spectral dimension) that uses Principal Component Analysis (PCA) to express the data in a form of reduced dimensionality, facilitating efficient information extraction from very large data sets. PCA transforms the system of correlated coordinates into a system of uncorrelated coordinates ordered by principal components of decreasing variance. The new coordinates are referred to as eigenvectors, and the projections of the data onto these coordinates produce images we will call tomograms. The association of the tomograms (images) with the eigenvectors (spectra) is important for the interpretation of both. The eigenvectors are mutually orthogonal, and this information is fundamental for their handling and interpretation. When the data cube shows objects that present uncorrelated physical phenomena, the eigenvectors' orthogonality may be instrumental in separating and identifying them. By handling eigenvectors and tomograms, one can enhance features, extract noise, compress data, extract spectra, etc. We applied the method, for illustration purposes only, to the central region of the low ionization nuclear emission region (LINER) galaxy NGC 4736, and demonstrate that it has a previously unknown type 1 active nucleus. Furthermore, we show that this nucleus is displaced from the centre of its stellar bulge.
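The PCA step can be sketched with NumPy by treating each spatial pixel as an observation and each wavelength as a variable; the eigenvectors are then spectra and the projections, reshaped back to the field of view, are the tomograms. Assumptions: the cube is stored as (ny, nx, n_lambda), the mean spectrum is subtracted first, and the spectral covariance matrix is diagonalized directly; an actual implementation may differ in normalization or use an SVD.

```python
import numpy as np


def pca_tomography(cube):
    """PCA of a data cube of shape (ny, nx, n_lambda).

    Returns eigenvalues (decreasing variance), eigenvectors (one spectrum per column)
    and tomograms of shape (ny, nx, n_lambda), where tomograms[..., k] is the image
    obtained by projecting every pixel's spectrum onto eigenvector k.
    """
    ny, nx, nl = cube.shape
    data = cube.reshape(ny * nx, nl)             # each spatial pixel is an observation
    data = data - data.mean(axis=0)              # remove the mean spectrum
    cov = data.T @ data / (data.shape[0] - 1)    # spectral covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # reorder by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    tomograms = (data @ eigvecs).reshape(ny, nx, nl)
    return eigvals, eigvecs, tomograms
```

Keeping only the first few tomograms and eigenvectors and multiplying them back together reconstructs a compressed, denoised cube, which is the compression and noise-extraction use mentioned above.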

Abstract:

Phylogenetic analyses of representative species from the five genera of Winteraceae (Drimys, Pseudowintera, Takhtajania, Tasmannia, and Zygogynum s.l.) were performed using ITS nuclear sequences and a combined data-set of ITS + psbA-trnH + rpS16 sequences (sampling of 30 and 15 species, respectively). Indel informativity using simple gap coding or gaps as a fifth character was examined in both data-sets. Parsimony and Bayesian analyses support the monophyly of Drimys, Tasmannia, and Zygogynum s.l., but do not support the monophyly of Belliolum, Zygogynum s.s., and Bubbia. Within Drimys, the combined data-set recovers two subclades. Divergence time estimates suggest that the splitting between Drimys and its sister clade (Pseudowintera + Zygogynum s.l.) occurred around the end of the Cretaceous; in contrast, the divergence between the two subclades within Drimys is more recent (15.5-18.5 MY) and coincides in time with the Andean uplift. Estimates suggest that the earliest divergences within Winteraceae could have predated the first events of Gondwana fragmentation. (C) 2009 Elsevier Inc. All rights reserved.
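Comparing gap treatments comes down to how many alignment columns become parsimony-informative. The sketch below contrasts gaps treated as missing data with gaps treated as a fifth character state; it does not implement simple gap coding, which scores each indel as a separate presence/absence character. Treating '?' and 'N' as missing data is an assumption.

```python
from collections import Counter


def informative_sites(alignment, gaps_as_fifth_state):
    """Count parsimony-informative columns in an aligned set of sequences.

    A column is informative if at least two character states each occur in at
    least two taxa. Gaps ('-') count as a state only when gaps_as_fifth_state is True;
    '?' and 'N' are always treated as missing data.
    """
    count = 0
    for column in zip(*alignment):
        states = Counter(
            c for c in column
            if c not in "?N" and (gaps_as_fifth_state or c != "-")
        )
        if sum(1 for n in states.values() if n >= 2) >= 2:
            count += 1
    return count
```

A column in which two taxa share a gap and two others share a base is informative only under the fifth-state treatment, which is how indels can add phylogenetic signal.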