902 results for Large Data Sets


Relevance: 90.00%

Abstract:

In the past decade, the analysis of data has faced the challenge of dealing with very large and complex datasets and the real-time generation of data. Technologies to store and access these complex and large datasets are in place. However, robust and scalable analysis technologies are needed to extract meaningful information from them. The research field of Information Visualization and Visual Data Analytics addresses this need. Information visualization and data mining are often used to complement each other. Their common goal is the extraction of meaningful information from complex and possibly large data. However, whereas data mining relies on the processing power of silicon hardware, visualization techniques also aim to harness the powerful image-processing capabilities of the human brain. This article highlights research on data visualization and visual analytics techniques, as well as existing visual analytics systems and applications, including a perspective on the field from the chemical process industry.

Relevance: 90.00%

Abstract:

Background: Expression microarrays are increasingly used to obtain large-scale transcriptomic information on a wide range of biological samples. Nevertheless, there is still much debate on the best ways to process data, to design experiments and to analyse the output. Furthermore, many of the more sophisticated mathematical approaches to data analysis in the literature remain inaccessible to much of the biological research community. In this study we examine ways of extracting and analysing a large data set obtained using the Agilent long oligonucleotide transcriptomics platform, applied to a set of human macrophage and dendritic cell samples. Results: We describe and validate a series of data extraction, transformation and normalisation steps which are implemented via a new R function. Analysis of replicate normalised reference data demonstrates that intra-array variability is small (only around 2% of the mean log signal), while inter-array variability from replicate array measurements has a standard deviation (SD) of around 0.5 log(2) units (6% of mean). The common practice of working with ratios of Cy5/Cy3 signal offers little further improvement in terms of reducing error. Comparison to expression data obtained using Arabidopsis samples demonstrates that the large number of genes in each sample showing a low level of transcription reflects the real complexity of the cellular transcriptome. Multidimensional scaling is used to show that the processed data identify an underlying structure which reflects some of the key biological variables which define the data set. This structure is robust, allowing reliable comparison of samples collected over a number of years and by a variety of operators. Conclusions: This study outlines a robust and easily implemented pipeline for extracting, transforming, normalising and visualising transcriptomic array data from the Agilent expression platform. The analysis is used to obtain quantitative estimates of the SD arising from experimental (non-biological) intra- and inter-array variability, and a lower threshold for determining whether an individual gene is expressed. The study provides a reliable basis for further, more extensive studies of the systems biology of eukaryotic cells.
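
The pipeline itself is published as an R function; purely as a rough illustration of the kind of steps the abstract describes (log transformation, normalisation across arrays, multidimensional scaling of inter-array distances), here is a minimal Python sketch with synthetic data and invented names. It is not the authors' code.

```python
# Hypothetical Python analogue of the steps described above: log2 transform,
# quantile normalisation across arrays, then multidimensional scaling of
# inter-array distances. Names and data are invented; the paper's pipeline
# is an R function.
import numpy as np
from sklearn.manifold import MDS

def normalise_arrays(raw: np.ndarray) -> np.ndarray:
    """log2-transform and quantile-normalise a genes x arrays matrix."""
    logged = np.log2(raw + 1.0)
    ranks = logged.argsort(axis=0).argsort(axis=0)    # per-array ranks
    mean_quantiles = np.sort(logged, axis=0).mean(axis=1)
    return mean_quantiles[ranks]                      # ranks -> mean quantiles

rng = np.random.default_rng(0)
raw = rng.lognormal(mean=6.0, sigma=1.5, size=(500, 12))  # 500 genes, 12 arrays
norm = normalise_arrays(raw)

# Classical MDS on inter-array distances to look for sample structure
dist = np.linalg.norm(norm.T[:, None, :] - norm.T[None, :, :], axis=-1)
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(dist)
print(coords.shape)  # (12, 2): one point per array
```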

Relevance: 90.00%

Abstract:

The long observational record is critical to our understanding of the Earth's climate, but most observing systems were not developed with a climate objective in mind. As a result, tremendous efforts have gone into assessing and reprocessing the data records to improve their usefulness in climate studies. The purpose of this paper is both to review recent progress in reprocessing and reanalyzing observations, and to summarize the challenges that must be overcome in order to improve our understanding of climate and variability. Reprocessing improves data quality through closer scrutiny and improved retrieval techniques for individual observing systems, while reanalysis merges many disparate observations with models through data assimilation; both aim to provide a climatology of Earth processes. Many challenges remain, such as tracking the improvement of processing algorithms and limited spatial coverage. Reanalyses have fostered significant research, yet reliable global trends in many physical fields are not yet attainable, despite significant advances in data assimilation and numerical modeling. Oceanic reanalyses have made significant advances in recent years, but are discussed here only in terms of progress toward integrated Earth system analyses. Climate data sets are generally adequate for process studies and large-scale climate variability. Communication of the strengths, limitations and uncertainties of reprocessed observations and reanalysis data, not only among the community of developers but also with the extended research community, including new generations of researchers and decision makers, is crucial for further advancement of the observational data records. It must be emphasized that careful investigation of the data and processing methods is required to use the observations appropriately.

Relevance: 90.00%

Abstract:

Under particular large-scale atmospheric conditions, several windstorms may affect Europe within a short time period. The occurrence of such cyclone families leads to large socioeconomic impacts and cumulative losses. The serial clustering of windstorms is analyzed here for the North Atlantic/western Europe. Clustering is quantified as the dispersion (variance-to-mean ratio) of cyclone passages over a given area. Dispersion statistics are derived for three reanalysis data sets and a 20-run ECHAM5/MPI-OM1 (European Centre Hamburg Model version 5/Max Planck Institute Ocean Model version 1) global climate model (GCM) ensemble. The dependence of serial clustering on cyclone intensity is analyzed. Confirming previous studies, serial clustering is identified in the reanalysis data sets primarily on both flanks and in the downstream regions of the North Atlantic storm track, a pattern that is robust across the reanalysis data sets. For the whole area, extreme cyclones cluster more than non-extreme cyclones. The ECHAM5/MPI-OM1 GCM is generally able to reproduce the spatial patterns of clustering under recent climate conditions, although some biases are identified. Under future climate conditions (A1B scenario), the GCM ensemble indicates that serial clustering may decrease over the North Atlantic storm track area and parts of western Europe. This decrease is associated with an extension of the polar jet toward Europe, which implies a tendency toward a more regular occurrence of cyclones over parts of the North Atlantic Basin poleward of 50°N and over western Europe. An increase of clustering is projected south of Newfoundland. The detected shifts imply a change in the risk of cumulative events over Europe under future climate conditions.
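
A minimal sketch of the dispersion statistic named above: the variance-to-mean ratio of per-period cyclone counts, which equals 1 for a Poisson process and exceeds 1 under serial clustering. The counts below are synthetic.

```python
# The dispersion statistic from the abstract: variance-to-mean ratio of
# per-winter cyclone counts; ~1 for a Poisson process, >1 when clustered.
# Counts are synthetic, for illustration only.
import numpy as np

def dispersion(counts: np.ndarray) -> float:
    """Variance/mean ratio of event counts."""
    return counts.var(ddof=1) / counts.mean()

rng = np.random.default_rng(1)
poisson_counts = rng.poisson(lam=4.0, size=60)                   # regular
clustered_counts = rng.negative_binomial(n=2, p=0.33, size=60)   # clustered

print(f"Poisson-like winters: {dispersion(poisson_counts):.2f}")
print(f"Clustered winters:    {dispersion(clustered_counts):.2f}")
```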

Relevance: 90.00%

Abstract:

We propose a new class of neurofuzzy construction algorithms that aims to maximize generalization capability specifically for imbalanced data classification problems, based on leave-one-out (LOO) cross-validation. The algorithms operate in two stages: first, an initial rule base is constructed by estimating a Gaussian mixture model with analysis-of-variance decomposition from the input data; second, joint weighted least squares parameter estimation and rule selection are carried out using an orthogonal forward subspace selection (OFSS) procedure. We show how different LOO-based rule selection criteria can be incorporated with OFSS, and advocate either maximizing the leave-one-out area under the receiver operating characteristic curve (AUC) or, if the data sets exhibit imbalanced class distribution, maximizing the leave-one-out F-measure. Extensive comparative simulations illustrate the effectiveness of the proposed algorithms.
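
As an illustration of the two leave-one-out selection criteria advocated above, the following sketch computes a LOO AUC and a LOO F-measure for a plain logistic regression on synthetic imbalanced data; it does not reproduce the authors' neurofuzzy/OFSS system.

```python
# Sketch of the two LOO selection criteria on synthetic imbalanced data,
# using a plain logistic regression rather than the authors' neurofuzzy
# OFSS system; sklearn's LeaveOneOut provides the held-out predictions.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict

X, y = make_classification(n_samples=120, weights=[0.9, 0.1], random_state=0)

model = LogisticRegression(max_iter=1000)
# One held-out probability per sample: a leave-one-out estimate
proba = cross_val_predict(model, X, y, cv=LeaveOneOut(),
                          method="predict_proba")[:, 1]

print("LOO AUC:      ", round(roc_auc_score(y, proba), 3))
print("LOO F-measure:", round(f1_score(y, proba > 0.5), 3))
```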

Relevance: 90.00%

Abstract:

Satellite data are increasingly used to provide observation-based estimates of the effects of aerosols on climate. The Aerosol-cci project, part of the European Space Agency's Climate Change Initiative (CCI), was designed to provide essential climate variables for aerosols from satellite data. Eight algorithms, developed for the retrieval of aerosol properties using data from AATSR (4), MERIS (3) and POLDER (1), were evaluated to determine their suitability for climate studies. The primary result from each of these algorithms is the aerosol optical depth (AOD) at several wavelengths, together with the Ångström exponent (AE), which describes the spectral variation of the AOD for a given wavelength pair. Other aerosol parameters that can be retrieved from satellite observations are not considered in this paper. The AOD and AE (AE at Level 2 only) were evaluated against independent collocated observations from the ground-based AERONET sun photometer network and against "reference" satellite data provided by MODIS and MISR. Tools used for the evaluation were developed for daily products as produced by the retrieval with a spatial resolution of 10 × 10 km² (Level 2) and for daily or monthly aggregates (Level 3). These tools include statistics for L2 and L3 products compared with AERONET, as well as scoring based on spatial and temporal correlations. In this paper we describe their use in a round-robin (RR) evaluation of four months of data, one month for each season in 2008. The amount of data was restricted to four months because of the large effort required to improve the algorithms and to evaluate their improvement and current status before larger data sets are processed. Evaluation criteria are discussed. The results presented show the current status of the European aerosol algorithms in comparison to AERONET, MODIS and MISR data. The comparison leads to the preliminary conclusion that the scores are similar, including those for the references, but that the coverage of AATSR needs to be enhanced and further improvements are possible for most algorithms. None of the algorithms, including the references, outperforms all others everywhere. AATSR data can be used for the retrieval of AOD and AE over land and ocean. PARASOL and one of the MERIS algorithms have been evaluated over ocean only, and both provide good results.
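
For reference, the Ångström exponent for a wavelength pair follows the standard relation AE = −ln(AOD₁/AOD₂)/ln(λ₁/λ₂); a small sketch with illustrative values:

```python
# The standard two-wavelength Angstrom exponent relation; values illustrative.
import math

def angstrom_exponent(aod1: float, aod2: float,
                      wl1_nm: float, wl2_nm: float) -> float:
    """AE = -ln(aod1/aod2) / ln(wl1/wl2) for a wavelength pair."""
    return -math.log(aod1 / aod2) / math.log(wl1_nm / wl2_nm)

# Example: AOD 0.30 at 440 nm and 0.15 at 870 nm, typical of fine-mode aerosol
print(round(angstrom_exponent(0.30, 0.15, 440.0, 870.0), 2))  # ~1.02
```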

Relevance: 90.00%

Abstract:

There is renewed interest in immersive visualization to navigate the digital data sets associated with large building and infrastructure projects. Following work with a fully immersive visualization facility at the University, this paper details the development of a complementary mobile visualization environment. It articulates progress on the requirements for this facility, the overall design of hardware and software, and the laboratory testing and planning for user pilots in construction applications. Like our fixed facility, this new lightweight mobile solution enables a group of users to navigate a 3D model at 1:1 scale and to work collaboratively with structured asset information. However, it offers greater flexibility, as two users can assemble and start using it at a new location within an hour. The solution has been developed and tested in a laboratory and will be piloted in engineering design review and stakeholder engagement applications on a major construction project.

Relevance: 90.00%

Abstract:

Stratospheric water vapour is a powerful greenhouse gas. The longest available record, from balloon observations over Boulder, Colorado, USA, shows increases in stratospheric water vapour concentrations that cannot be fully explained by observed changes in the main drivers, tropical tropopause temperatures and methane. Satellite observations could help resolve the issue, but constructing a reliable long-term data record from individual short satellite records is challenging. Here we present an approach to merging satellite data sets with the help of a chemistry–climate model nudged to observed meteorology. We use the model's water vapour as a transfer function between data sets, which overcomes issues arising from instrument drift and short overlap periods. In the lower stratosphere, our water vapour record extends back to 1988 and water vapour concentrations largely follow tropical tropopause temperatures. Lower- and mid-stratospheric long-term trends are negative, and the trends from Boulder are shown not to be globally representative. In the upper stratosphere, our record extends back to 1986 and shows positive long-term trends. The altitudinal differences in the trends are explained by methane oxidation together with a strengthened lower-stratospheric and a weakened upper-stratospheric circulation inferred from this analysis. Our results call into question previous estimates of surface radiative forcing based on presumed global long-term increases in water vapour concentrations in the lower stratosphere.
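
A deliberately simplified sketch of the merging idea: each satellite record is anchored to the nudged model over its own measurement period, so records with little or no overlap can still be combined. The series and offsets below are synthetic, not the paper's data.

```python
# Simplified merging sketch: each satellite record is expressed relative to
# a nudged model over that record's own period, removing the instrument
# bias without needing a direct overlap. All series here are synthetic.
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(360)                                  # months
model = 4.5 + 0.2 * np.sin(2 * np.pi * t / 12)      # model H2O (ppmv)

# Two instruments with different biases, observing disjoint periods
sat_a = model[:200] + 0.3 + rng.normal(0, 0.05, 200)
sat_b = model[220:] - 0.2 + rng.normal(0, 0.05, 140)

# Model as transfer function: subtract each record's mean offset to the model
merged = np.full(t.size, np.nan)
merged[:200] = sat_a - (sat_a - model[:200]).mean()
merged[220:] = sat_b - (sat_b - model[220:]).mean()

print(f"max residual vs model: {np.nanmax(np.abs(merged - model)):.3f} ppmv")
```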

Relevance: 90.00%

Abstract:

During recent decades, several windstorm series hit Europe, leading to large aggregated losses. Such storm series are examples of serial clustering of extreme cyclones and present a considerable risk for the insurance industry. Clustering of events and return periods of storm series for Germany are quantified based on potential losses using empirical models. Two reanalysis data sets and observations from German weather stations are considered for 30 winters. Histograms of events exceeding selected return levels (1, 2 and 5 years) are derived. Return periods of historical storm series are estimated based on the Poisson and the negative binomial distributions. Over 4000 years of general circulation model (GCM) simulations forced with current climate conditions are analysed to provide a better assessment of historical return periods. Estimates differ between the distributions, for example 40 to 65 years for the 1990 series. For such less frequent series, estimates obtained with the Poisson distribution clearly deviate from the empirical data. The negative binomial distribution provides better estimates, although a sensitivity to return level and data set is identified. The consideration of GCM data permits a strong reduction of these uncertainties. The present results support the importance of explicitly considering the clustering of losses for an adequate risk assessment in economic applications.
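
To make the Poisson versus negative binomial comparison concrete, the sketch below computes the return period of a winter with at least k storms above a return level under both count models with the same mean; all parameter values are illustrative, not the paper's fits.

```python
# Sketch of the return-period comparison: probability of a winter with at
# least k storms exceeding a return level, under Poisson and negative
# binomial count models with the same mean. Values are illustrative.
from scipy import stats

mean_events = 1.0   # mean storms per winter above the chosen return level
k = 3               # a "storm series": 3 or more such storms in one winter

pois_tail = 1.0 - stats.poisson.cdf(k - 1, mean_events)

# Negative binomial with the same mean (1.0) but variance 2.0 (clustered)
n, p = 1.0, 0.5     # mean = n(1-p)/p, variance = n(1-p)/p**2
nb_tail = 1.0 - stats.nbinom.cdf(k - 1, n, p)

print(f"Poisson return period:       {1 / pois_tail:5.1f} winters")
print(f"Neg. binomial return period: {1 / nb_tail:5.1f} winters")
```

With identical means, the overdispersed (clustered) model assigns a much shorter return period to multi-storm winters, which is why the choice of distribution matters for the less frequent series.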

Relevance: 90.00%

Abstract:

A statistical–dynamical downscaling (SDD) approach for the regionalization of wind energy output (Eout) over Europe, with a special focus on Germany, is proposed. SDD uses an extended circulation weather type (CWT) analysis of global daily mean sea level pressure fields, with the central point located over Germany. Seventy-seven weather classes are identified based on the associated CWT and the intensity of the geostrophic flow. Representatives of these classes are dynamically downscaled with the regional climate model COSMO-CLM. Using the weather class frequencies of different data sets, the simulated representatives are recombined into probability density functions (PDFs) of near-surface wind speed and finally into Eout of a sample wind turbine for present and future climates. This is performed for reanalysis, decadal hindcasts and long-term future projections. For evaluation purposes, the results of SDD are compared with wind observations and with Eout simulated by purely dynamical downscaling (DD) methods. For the present climate, SDD is able to simulate realistic PDFs of 10-m wind speed for most stations in Germany, and the resulting spatial Eout patterns are similar to DD-simulated Eout. For decadal hindcasts, results of SDD are similar to DD-simulated Eout over Germany, Poland, the Czech Republic and the Benelux countries, with high correlations between annual Eout time series of SDD and DD for selected hindcasts; lower correlations are found for other European countries. It is demonstrated that SDD can be used to downscale the full ensemble of the Max Planck Institute Earth System Model (MPI-ESM) decadal prediction system. Long-term climate change projections for ECHAM5/MPI-OM under Special Report on Emissions Scenarios (SRES) forcing, as obtained by SDD, agree well with the results of other studies using DD methods, with increasing Eout over northern Europe and a negative trend over southern Europe. Despite some biases, it is concluded that SDD is an adequate tool for assessing regional wind energy changes in large model ensembles.
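
The recombination step can be illustrated in a few lines: the Eout of each downscaled class representative is weighted by that class's frequency in the data set of interest. The three-class setup below is illustrative (the study uses 77 classes).

```python
# Minimal sketch of the SDD recombination step: the wind-energy output of
# each downscaled weather-class representative is weighted by that class's
# frequency in a given data set. Three invented classes stand in for the
# study's 77; all numbers are illustrative.
import numpy as np

eout_per_class = np.array([120.0, 310.0, 560.0])  # MWh per representative
class_freq = {
    "reanalysis": np.array([0.50, 0.35, 0.15]),
    "projection": np.array([0.45, 0.35, 0.20]),
}

for name, freq in class_freq.items():
    print(f"{name}: expected Eout = {freq @ eout_per_class:.1f} MWh")
```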

Relevance: 90.00%

Abstract:

We analyze ionospheric convection patterns over the polar regions during the passage of an interplanetary magnetic cloud on January 14, 1988, when the interplanetary magnetic field (IMF) rotated slowly in direction and had a large amplitude. Using the assimilative mapping of ionospheric electrodynamics (AMIE) procedure, we combine simultaneous observations of ionospheric drifts and magnetic perturbations from many different instruments into consistent patterns of high-latitude electrodynamics, focusing on the period of northward IMF. By combining satellite data with ground-based observations, we have generated one of the most comprehensive data sets yet assembled and used it to produce convection maps for both hemispheres. We present evidence that a lobe convection cell was embedded within normal merging convection during a period when the IMF By and Bz components were large and positive. As the IMF became predominantly northward, a strong reversed convection pattern (afternoon-to-morning potential drop of around 100 kV) appeared in the southern (summer) polar cap, while convection in the northern (winter) hemisphere became weak and disordered with a dawn-to-dusk potential drop of the order of 30 kV. These patterns persisted for about 3 hours, until the IMF rotated significantly toward the west. We interpret this behavior in terms of a recently proposed merging model for northward IMF under solstice conditions, for which lobe field lines from the hemisphere tilted toward the Sun (summer hemisphere) drape over the dayside magnetosphere, producing reverse convection in the summer hemisphere and impeding direct contact between the solar wind and field lines connected to the winter polar cap. The positive IMF Bx component present at this time could have contributed to the observed hemispheric asymmetry. Reverse convection in the summer hemisphere broke down rapidly after the ratio |By/Bz| exceeded unity, while convection in the winter hemisphere strengthened. A dominant dawn-to-dusk potential drop was established in both hemispheres when the magnitude of By exceeded that of Bz, with potential drops of the order of 100 kV, even while Bz remained northward. The later transition to southward Bz produced a gradual intensification of the convection, but a greater qualitative change occurred at the transition through |By/Bz| = 1 than at the transition through Bz = 0. The various convection patterns we derive under northward IMF conditions illustrate all possibilities previously discussed in the literature: nearly single-cell and multicell, distorted and symmetric, ordered and unordered, and sunward and antisunward.

Relevance: 90.00%

Abstract:

Contamination of the electroencephalogram (EEG) by artifacts greatly reduces the quality of the recorded signals, and there is a need for automated artifact removal methods. However, such methods are rarely evaluated against one another via rigorous criteria, with results often presented based upon visual inspection alone. This work presents a comparative study of automatic methods for removing blink, electrocardiographic and electromyographic artifacts from the EEG. Three methods are considered: wavelet-, blind source separation (BSS)- and multivariate singular spectrum analysis (MSSA)-based correction. These are applied to data sets containing mixtures of artifacts. Metrics are devised to measure the performance of each method. The BSS method is seen to be the best approach for artifacts of high signal-to-noise ratio (SNR). By contrast, MSSA performs well at low SNRs, but at the expense of a large number of false-positive corrections.
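
As an illustration of the BSS-style correction compared here, the sketch below unmixes synthetic multichannel EEG with ICA, zeroes the most spike-like (blink) component and re-mixes; real pipelines use principled artifact detectors rather than this simple heuristic.

```python
# Illustrative sketch of BSS-style artifact correction: unmix multichannel
# EEG with ICA, zero the most spike-like component (the blink), and re-mix.
# Signals are synthetic; names and the selection heuristic are invented.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(3)
t = np.linspace(0, 10, 2000)
neural = 0.5 * np.sin(2 * np.pi * 10 * t)            # 10 Hz rhythm
blink = 5.0 * (np.abs(t % 3 - 1.5) < 0.1)            # large periodic blinks
mixing = rng.normal(size=(8, 2))                     # 8 channels, 2 sources
eeg = np.column_stack([neural, blink]) @ mixing.T + rng.normal(0, 0.05, (2000, 8))

ica = FastICA(n_components=2, random_state=0)
sources = ica.fit_transform(eeg)                     # unmix into components

# Pick the spikiest component (highest peak-to-std ratio) as the artifact
artifact = np.argmax(np.abs(sources).max(axis=0) / sources.std(axis=0))
sources[:, artifact] = 0.0                           # suppress the blink
cleaned = ica.inverse_transform(sources)             # re-mix to channel space

print(f"max |signal|: raw {np.abs(eeg).max():.1f} -> cleaned {np.abs(cleaned).max():.1f}")
```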

Relevance: 90.00%

Abstract:

A basic data requirement of a river flood inundation model is a Digital Terrain Model (DTM) of the reach being studied. The scale at which modeling is required determines the accuracy required of the DTM. For modeling floods in urban areas, a high-resolution DTM such as that produced by airborne LiDAR (Light Detection And Ranging) is most useful, and large parts of many developed countries have now been mapped using LiDAR. In remoter areas, it is possible to model flooding on a larger scale using a lower-resolution DTM, and in the near future the DTM of choice is likely to be that derived from the TanDEM-X Digital Elevation Model (DEM). A variable-resolution global DTM obtained by combining existing high- and low-resolution data sets would be useful for modeling flood water dynamics globally: at high resolution wherever possible, and at lower resolution over larger rivers in remote areas. A further important data resource used in flood modeling is the flood extent, commonly derived from Synthetic Aperture Radar (SAR) images. Flood extents become more useful if they are intersected with the DTM, since water level observations (WLOs) at the flood boundary can then be estimated at various points along the river reach. To illustrate the utility of such a global DTM, two examples of recent research involving WLOs at opposite ends of the spatial scale are discussed. The first requires high-resolution spatial data and involves the assimilation of WLOs from a real sequence of high-resolution SAR images into a flood model, to update the model state with observations over time and to estimate river discharge and model parameters, including river bathymetry and friction. The results indicate the feasibility of such an Earth Observation-based flood forecasting system. The second example is at a larger scale and uses SAR-derived WLOs to improve the lower-resolution TanDEM-X DEM in the area covered by the flood extents. The resulting reduction in random height error is significant.
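
The intersection of a flood extent with a DTM to yield WLOs can be sketched on toy arrays: heights are read off the DTM at flooded cells that border dry cells. Real workflows use georeferenced rasters and SAR-derived masks.

```python
# Toy sketch: water level observations (WLOs) as DTM heights at the boundary
# of a flood mask. Arrays are synthetic; real workflows use georeferenced
# rasters and SAR-derived flood extents.
import numpy as np

dem = np.add.outer(np.linspace(10, 0, 50), np.linspace(5, 0, 50))  # sloping DTM
flood = dem < 4.0                                    # flooded where terrain low

# Boundary: flooded interior cells with at least one dry 4-neighbour
core = flood[1:-1, 1:-1]
dry_neighbour = (~flood[:-2, 1:-1] | ~flood[2:, 1:-1]
                 | ~flood[1:-1, :-2] | ~flood[1:-1, 2:])
boundary = core & dry_neighbour

wlos = dem[1:-1, 1:-1][boundary]                     # heights along the edge
print(f"{wlos.size} WLOs, level {wlos.mean():.2f} +/- {wlos.std():.2f} m")
```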

Relevance: 90.00%

Abstract:

This paper reviews the literature concerning the practice of using Online Analytical Processing (OLAP) systems to recall information stored by Online Transactional Processing (OLTP) systems. The review provides a basis for discussion on the need for information recalled through OLAP systems to maintain the contexts of transactions within the data captured by the respective OLTP system. The paper observes an industry trend in which OLTP systems process information into data that are then stored in databases without the business rules used to produce them. This necessitates a practice whereby sets of business rules are used to extract, cleanse, transform and load data from disparate OLTP systems into OLAP databases to support the requirements for complex reporting and analytics. These sets of business rules are usually not the same as the business rules used to capture data in particular OLTP systems. The paper argues that differences between the business rules used to interpret the same data sets risk semantic gaps between information captured by OLTP systems and information recalled through OLAP systems. Literature concerning the modelling of business transaction information as facts with context within information systems design was reviewed to identify design trends that contribute to the design quality of OLTP and OLAP systems. The paper then argues that the quality of OLTP and OLAP systems design depends critically on capturing facts with associated context; encoding facts with contexts into data with business rules; storing and sourcing data with business rules; decoding data with business rules back into facts with context; and recalling facts with associated contexts. The paper proposes UBIRQ, a design model to aid the co-design of data-with-business-rules storage for OLTP and OLAP purposes. The proposed design model provides the opportunity to implement and use multi-purpose databases and business-rule stores for OLTP and OLAP systems. Such implementations would enable OLTP systems to record and store data together with executions of business rules, allowing both OLTP and OLAP systems to query data with the business rules used to capture them, thereby ensuring that information recalled via OLAP systems preserves the contexts of transactions as per the data captured by the respective OLTP system.
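
A speculative sketch of the core idea, with names invented here rather than taken from the UBIRQ model: persist each fact together with its transaction context and the business rule used to capture it, so that OLAP-side recall preserves OLTP-side semantics.

```python
# Speculative sketch (names invented, not the UBIRQ model): store each fact
# with its transaction context and the business rule used to capture it, so
# OLAP-side recall preserves OLTP-side semantics.
from dataclasses import dataclass

@dataclass(frozen=True)
class Fact:
    value: float
    context: str        # e.g. the transaction the fact belongs to
    business_rule: str  # rule in force when the fact was captured

ledger = [
    Fact(120.0, "order-1", "gross = net * 1.2 (20% VAT)"),
    Fact(100.0, "order-1", "net = gross / 1.2"),
]

# An OLAP-style query can now recall values together with capture semantics
for fact in ledger:
    print(f"{fact.context}: {fact.value} [{fact.business_rule}]")
```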

Relevance: 90.00%

Abstract:

This paper presents a summary of the work done within the European Union's Seventh Framework Programme project ECLIPSE (Evaluating the Climate and Air Quality Impacts of Short-Lived Pollutants). ECLIPSE had a unique systematic concept for designing a realistic and effective mitigation scenario for short-lived climate pollutants (SLCPs: methane, aerosols and ozone, and their precursor species) and quantifying its climate and air quality impacts, and this paper presents the results in the context of this overarching strategy. The first step in ECLIPSE was to create a new emission inventory based on current legislation (CLE) for the recent past and until 2050. Substantial progress compared to previous work was made by including previously unaccounted-for source types such as flaring of gas associated with oil production, and wick lamps. These emission data were used for present-day reference simulations with four advanced Earth system models (ESMs) and six chemistry transport models (CTMs). The model simulations were compared with a variety of ground-based and satellite observational data sets from Asia, Europe and the Arctic. It was found that the models still underestimate the measured seasonality of aerosols in the Arctic, but to a lesser extent than in previous studies. Problems likely related to the emissions were identified for northern Russia and India in particular. To estimate the climate impacts of SLCPs, ECLIPSE followed two paths of research: the first path calculated radiative forcing (RF) values for a large matrix of SLCP species emissions, for different seasons and regions independently. Based on these RF calculations, the Global Temperature change Potential metric for a time horizon of 20 years (GTP20) was calculated for each SLCP emission type. This climate metric was then used in an integrated assessment model to identify all emission mitigation measures with a beneficial air quality and short-term (20-year) climate impact. These measures together defined an SLCP mitigation (MIT) scenario. Compared to CLE, the MIT scenario would reduce global methane (CH4) and black carbon (BC) emissions by about 50 % and 80 %, respectively. For CH4, measures on shale gas production, waste management and coal mines were most important. For non-CH4 SLCPs, the elimination of high-emitting vehicles and wick lamps, as well as reduced emissions from gas flaring, coal and biomass stoves, agricultural waste, solvents and diesel engines, were most important. These measures lead to large reductions in calculated surface concentrations of ozone and particulate matter. We estimate that in the EU the loss of statistical life expectancy due to air pollution was 7.5 months in 2010, which will be reduced to 5.2 months by 2030 in the CLE scenario. The MIT scenario would reduce this value by another 0.9 to 4.3 months. Substantially larger reductions due to the mitigation are found for China (1.8 months) and India (11–12 months). The climate metrics cannot fully quantify the climate response, and therefore a second research path was taken: transient climate ensemble simulations with the four ESMs were run for the CLE and MIT scenarios to determine the climate impacts of the mitigation. In these simulations, the CLE scenario resulted in a surface temperature increase of 0.70 ± 0.14 K between the years 2006 and 2050.
For the decade 2041–2050, the warming was reduced by 0.22 ± 0.07 K in the MIT scenario, in almost exact agreement with the response calculated from the emission metrics (a reduced warming of 0.22 ± 0.09 K). The metrics calculations suggest that non-CH4 SLCPs contribute ~22 % to this response and CH4 78 %. This could not be fully confirmed by the transient simulations, which attributed about 90 % of the temperature response to CH4 reductions. Attribution of the observed temperature response to non-CH4 SLCP emission reductions, and to BC specifically, is hampered in the transient simulations by the small forcing and the co-emitted species of the emission basket chosen. Nevertheless, an important conclusion is that our mitigation basket as a whole would lead to clear benefits for both air quality and climate. The climate response from BC reductions in our study is smaller than reported previously, possibly because our study is one of the first to use fully coupled climate models, in which unforced variability and sea ice responses cause relatively strong temperature fluctuations that may counteract (and thus mask) the impacts of small emission reductions. The temperature responses to the mitigation were generally stronger over the continents than over the oceans, and the warming reduction was largest over the Arctic, at 0.44 (0.39–0.49) K. Our calculations suggest particularly beneficial climate responses in southern Europe, where surface warming was reduced by about 0.3 K and precipitation rates were increased by about 15 (6–21) mm yr−1 (more than 4 % of total precipitation) from spring to autumn. The mitigation could thus help to alleviate expected future drought and water shortages in the Mediterranean area. We also report other important results of the ECLIPSE project.
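
As a back-of-envelope illustration of how an emission metric such as GTP20 converts a basket of emission changes into a temperature-response estimate (ΔT ≈ Σᵢ ΔEᵢ · GTP20ᵢ), consider the following sketch; all numbers are placeholders, not ECLIPSE values.

```python
# Back-of-envelope use of an emission metric: delta_T ~ sum_i dE_i * GTP20_i.
# All numbers are placeholders, not ECLIPSE's values.
ASSUMED_GTP20 = {"CH4": 5e-13, "BC": 3e-12}        # K per kg emitted (invented)
delta_emissions_kg = {"CH4": -170e9, "BC": -6e9}   # MIT minus CLE (invented)

delta_t = sum(ASSUMED_GTP20[s] * delta_emissions_kg[s] for s in ASSUMED_GTP20)
shares = {s: ASSUMED_GTP20[s] * delta_emissions_kg[s] / delta_t
          for s in ASSUMED_GTP20}

print(f"estimated warming reduction: {-delta_t:.2f} K")
print({s: f"{100 * v:.0f} %" for s, v in shares.items()})
```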