190 results for DATA SET
Abstract:
In this article, we investigate how the choice of the attenuation factor in an extended version of Katz centrality influences the centrality of the nodes in evolving communication networks. For given snapshots of a network, observed over a period of time, recently developed communicability indices aim to identify the best broadcasters and listeners (receivers) in the network. Here we explore the attenuation factor constraint, in relation to the spectral radius (the largest eigenvalue) of the network at any point in time, and its computation in the case of large networks. We compare three different communicability measures: standard, exponential, and relaxed (where the spectral radius bound on the attenuation factor is relaxed and the adjacency matrix is normalised, in order to maintain the convergence of the measure). Furthermore, using a vitality-based measure of both standard and relaxed communicability indices, we look at ways of establishing the most important individuals for broadcasting and receiving of messages related to community bridging roles. We compare those measures with the scores produced by an iterative version of the PageRank algorithm and illustrate our findings with three examples of real-life evolving networks: the MIT reality mining data set, consisting of daily communications between 106 individuals over the period of one year; a UK Twitter mentions network, constructed from the direct tweets between 12.4k individuals during one week; and a subset of the Enron email data set.
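As a minimal illustration of the attenuation-factor constraint discussed above, the sketch below computes Katz-style broadcast and receive scores for a single network snapshot, keeping the attenuation factor strictly below the reciprocal of the spectral radius. It is a toy Python example (the adjacency matrix and the 0.9 safety factor are illustrative choices), not the article's dynamic communicability indices, which combine resolvents across successive snapshots.

```python
import numpy as np

def katz_centrality(A, alpha=None, safety=0.9):
    """Katz-style centrality x = (I - alpha*A)^{-1} 1 - 1 for one snapshot.

    The walk series sum_k (alpha*A)^k converges only if alpha < 1/rho(A),
    where rho(A) is the spectral radius; by default alpha is set to a
    fraction of that bound (the 'safety' factor is an illustrative choice).
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    rho = np.max(np.abs(np.linalg.eigvals(A)))   # spectral radius
    if alpha is None:
        alpha = safety / rho                     # keep alpha below 1/rho
    elif alpha >= 1.0 / rho:
        raise ValueError("alpha must be smaller than 1/spectral radius")
    return np.linalg.solve(np.eye(n) - alpha * A, np.ones(n)) - 1.0

# Toy 4-node snapshot of a directed communication network.
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [1, 0, 0, 0]], dtype=float)
broadcast = katz_centrality(A)     # walks starting at each node: broadcasting ability
receive = katz_centrality(A.T)     # walks ending at each node: receiving ability
print(broadcast, receive)
```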
Abstract:
Atmospheric dust is an important feedback in the climate system, potentially affecting the radiative balance and chemical composition of the atmosphere and providing nutrients to terrestrial and marine ecosystems. Yet the potential impact of dust on the climate system, both in the anthropogenically disturbed future and the naturally varying past, remains to be quantified. The geologic record of dust provides the opportunity to test earth system models designed to simulate dust. Records of dust can be obtained from ice cores, marine sediments, and terrestrial (loess) deposits. Although rarely unequivocal, these records document a variety of processes (source, transport and deposition) in the dust cycle, stored in each archive as changes in clay mineralogy, isotopes, grain size, and concentration of terrigenous materials. Although the extraction of information from each type of archive is slightly different, the basic controls on these dust indicators are the same. Changes in the dust flux and particle size might be controlled by a combination of (a) source area extent, (b) dust emission efficiency (wind speed) and atmospheric transport, (c) atmospheric residence time of dust, and/or (d) relative contributions of dry settling and rainout of dust. Similarly, changes in mineralogy reflect (a) source area mineralogy and weathering and (b) shifts in atmospheric transport. The combination of these geological data with process-based, forward-modelling schemes in global earth system models provides an excellent means of achieving a comprehensive picture of the global pattern of dust accumulation rates, their controlling mechanisms, and how those mechanisms may vary regionally. The Dust Indicators and Records of Terrestrial and MArine Palaeoenvironments (DIRTMAP) data base has been established to provide a global palaeoenvironmental data set that can be used to validate earth system model simulations of the dust cycle over the past 150,000 years.
Abstract:
Seventeen simulations of the Last Glacial Maximum (LGM) climate have been performed using atmospheric general circulation models (AGCMs) in the framework of the Paleoclimate Modeling Intercomparison Project (PMIP). These simulations use the boundary conditions for CO2, insolation and ice sheets; sea surface temperatures (SSTs) are either (a) prescribed using the CLIMAP data set (eight models) or (b) computed by coupling the AGCM with a slab ocean (nine models). The present-day (PD) tropical climate is correctly depicted by all the models, except the coarser-resolution models, and the simulated geographical distribution of annual mean temperature is in good agreement with climatology. Tropical cooling at the LGM is less than at middle and high latitudes, but greatly exceeds the PD temperature variability. The LGM simulations with prescribed SSTs underestimate the observed temperature changes except over equatorial Africa, where the models produce a temperature decrease consistent with the data. Our results confirm previous analyses showing that CLIMAP (1981) SSTs only produce a weak terrestrial cooling. When SSTs are computed, the models depict a cooling over the Pacific and Indian oceans in contrast with CLIMAP, and most models produce cooler temperatures over land. Moreover, four of the nine simulations produce a cooling in good agreement with terrestrial data. Two of these model results over ocean are consistent with new SST reconstructions, whereas two models simulate a homogeneous cooling. Finally, the LGM aridity inferred for most of the tropics from the data is globally reproduced by the models, with a strong underestimation for models using computed SSTs.
Abstract:
This contribution proposes a novel probability density function (PDF) estimation-based over-sampling (PDFOS) approach for two-class imbalanced classification problems. The classical Parzen-window kernel function is adopted to estimate the PDF of the positive class. Then, according to the estimated PDF, synthetic instances are generated as additional training data. The essential concept is to re-balance the class distribution of the original imbalanced data set under the principle that the synthetic data samples follow the same statistical properties. Based on the over-sampled training data, the radial basis function (RBF) classifier is constructed by applying the orthogonal forward selection procedure, in which the classifier’s structure and the parameters of the RBF kernels are determined using a particle swarm optimisation algorithm based on the criterion of minimising the leave-one-out misclassification rate. The effectiveness of the proposed PDFOS approach is demonstrated by an empirical study on several imbalanced data sets.
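A minimal sketch of the core idea (Parzen-window density estimation of the minority class followed by sampling of synthetic instances) is given below, assuming SciPy's Gaussian kernel density estimator as a stand-in for the paper's kernel and smoothing-parameter choices; the RBF classifier construction and PSO tuning are omitted.

```python
import numpy as np
from scipy.stats import gaussian_kde

def pdf_oversample(X_min, n_new, bandwidth=None, seed=0):
    """Draw synthetic minority-class samples from a Parzen-window (Gaussian
    kernel) density estimate fitted to the minority class X_min (n x d).

    gaussian_kde expects variables in rows, hence the transposes. The
    bandwidth choice (Scott's rule by default) is an illustrative stand-in
    for the smoothing-parameter selection discussed in the paper.
    """
    kde = gaussian_kde(X_min.T, bw_method=bandwidth)
    return kde.resample(n_new, seed=seed).T

rng = np.random.default_rng(0)
X_majority = rng.normal(0.0, 1.0, size=(200, 2))   # invented toy data
X_minority = rng.normal(2.0, 0.5, size=(20, 2))

# Re-balance: generate enough synthetic minority samples to match the majority.
X_synth = pdf_oversample(X_minority, n_new=len(X_majority) - len(X_minority))
X_train = np.vstack([X_majority, X_minority, X_synth])
y_train = np.concatenate([np.zeros(len(X_majority)),
                          np.ones(len(X_minority) + len(X_synth))])
```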
Abstract:
Using a transactions costs framework, we examine the impact of information and communication technology (mobile phone and radio) use on market participation in developing-country agricultural markets, using a novel transaction-level data set of Ghanaian farmers. Our analysis of farmers' choice of markets suggests that market information from a broader range of markets may not always induce farmers to sell in more distant markets; instead, farmers may use broader market information to enhance their bargaining power in closer markets. Finally, we find only weak evidence of an impact of mobile phone use on attracting farm gate buyers.
Abstract:
To date there has been no systematic study of the relationship between individuals’ opinions of different institutions and their perceptions of world affairs. This article tries to fill this gap by using a large cross-country data set comprising nine EU members and seven Asian nations, together with instrumental-variable bivariate probit regression analysis. Controlling for a host of factors, the article shows that individuals’ confidence in multilateral institutions affects their perceptions of whether or not their country is being treated fairly in international affairs. This finding expands upon both theoretical work on multilateral institutions, which has focused on state actors’ rationale for engaging in multilateral cooperation, and empirical work that has treated confidence in multilateral institutions as a dependent variable. The article also shows that individuals’ confidence in different international organizations has undifferentiated effects on their perceptions of whether or not their country is being treated fairly in international affairs, though individuals more knowledgeable about international affairs exhibit slightly different attitudes. Finally, the article demonstrates significant differences in opinion across Europe and Asia.
Abstract:
We explore the mutual dependencies and interactions among different groups of species of the plankton population, based on an analysis of the long-term field observations carried out by our group on the north-west coast of the Bay of Bengal. The plankton community is structured into three groups of species, namely non-toxic phytoplankton (NTP), toxic phytoplankton (TPP) and zooplankton. To find the pair-wise dependencies among the three groups of plankton, Pearson and partial correlation coefficients are calculated. To explore the simultaneous interaction among all three groups, a time series analysis is performed. Using an Expectation-Maximization (E-M) algorithm, data points missing due to irregularities in sampling are estimated, and a Vector Auto-Regressive (VAR) model is fitted to the completed data set. The overall analysis demonstrates that toxin-producing phytoplankton play two distinct roles: the inhibition due to consumption of toxic substances reduces the abundance of zooplankton, and the toxic materials released by TPP significantly compensate for the competitive disadvantages among phytoplankton species. Our study suggests that the presence of TPP might be a possible cause of the complex interaction among the large number of phytoplankton and zooplankton species, which might be responsible for the prolonged coexistence of the plankton species with fluctuating biomass.
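A minimal sketch of this analysis pipeline (gap filling followed by VAR estimation) is given below, using scikit-learn's iterative imputer as an EM-like stand-in for the paper's E-M step and invented series in place of the field data.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from statsmodels.tsa.api import VAR

# Toy stand-in for the field data: three plankton groups sampled over time,
# with gaps mimicking irregular sampling.
rng = np.random.default_rng(1)
t = 120
data = pd.DataFrame({
    "NTP": rng.normal(10, 2, t).cumsum() * 0.1 + 50,
    "TPP": rng.normal(5, 1, t).cumsum() * 0.1 + 20,
    "ZOO": rng.normal(8, 2, t).cumsum() * 0.1 + 30,
})
data.iloc[rng.choice(t, 15, replace=False), 1] = np.nan   # missing TPP samples

# Fill the gaps with an iterative (EM-like) imputer, then fit a VAR model.
completed = pd.DataFrame(
    IterativeImputer(max_iter=20, random_state=0).fit_transform(data),
    columns=data.columns)
results = VAR(completed).fit(maxlags=4, ic="aic")   # lag order chosen by AIC
print(results.summary())
```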
Abstract:
An efficient two-level model identification method aiming at maximising a model's generalisation capability is proposed for a large class of linear-in-the-parameters models from observational data. A new elastic net orthogonal forward regression (ENOFR) algorithm is employed at the lower level to carry out simultaneous model selection and elastic net parameter estimation. The two regularisation parameters in the elastic net are optimised using a particle swarm optimisation (PSO) algorithm at the upper level by minimising the leave-one-out (LOO) mean square error (LOOMSE). There are two original contributions. Firstly, an elastic net cost function is defined and applied based on orthogonal decomposition, which facilitates automatic model structure selection without the need for a predetermined error tolerance to terminate the forward selection process. Secondly, it is shown that the LOOMSE based on the resultant ENOFR models can be analytically computed without actually splitting the data set, and the associated computational cost is small due to the ENOFR procedure. Consequently, a fully automated procedure is achieved without resorting to any other validation data set for iterative model evaluation. Illustrative examples are included to demonstrate the effectiveness of the new approaches.
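The analytic LOO computation underpinning the second contribution can be illustrated with the classical PRESS identity for a ridge-penalised (L2-only) linear-in-the-parameters model; the sketch below is a simplified stand-in for the ENOFR formulas, which exploit the orthogonal decomposition and also include the L1 term.

```python
import numpy as np

def ridge_loo_mse(X, y, lam):
    """Analytic leave-one-out MSE for a ridge-penalised linear model,
    using the PRESS identity e_loo_i = e_i / (1 - h_ii).

    This illustrates the principle (LOO error without physically splitting
    the data); the ENOFR-specific formulas in the paper are different and
    cheaper, and the elastic net carries an L1 term omitted here.
    """
    n, d = X.shape
    G = X.T @ X + lam * np.eye(d)
    beta = np.linalg.solve(G, X.T @ y)
    h_diag = np.einsum("ij,jk,ik->i", X, np.linalg.inv(G), X)  # leverages h_ii
    resid = y - X @ beta
    return np.mean((resid / (1.0 - h_diag)) ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = X @ rng.normal(size=8) + 0.3 * rng.normal(size=100)

# Pick the regularisation parameter minimising the analytic LOO MSE
# (the paper tunes the two elastic-net parameters with PSO instead of a grid).
lams = np.logspace(-4, 2, 25)
best = min(lams, key=lambda l: ridge_loo_mse(X, y, l))
print("best lambda:", best)
```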
Abstract:
In addition to CO2, the climate impact of aviation is strongly influenced by non-CO2 emissions, such as nitrogen oxides, influencing ozone and methane, and water vapour, which can lead to the formation of persistent contrails in ice-supersaturated regions. Because these non-CO2 emission effects are characterised by a short lifetime, their climate impact largely depends on emission location and time; that is to say, emissions in certain locations (or times) can lead to a greater climate impact (even on the global average) than the same emission in other locations (or times). Avoiding these climate-sensitive regions might thus be beneficial to climate. Here, we describe a modelling chain for investigating this climate impact mitigation option. This modelling chain forms a multi-step modelling approach, starting with the simulation of the fate of emissions released at a certain location and time (time-region grid points). This is performed with the chemistry–climate model EMAC, extended via the two submodels AIRTRAC (V1.0) and CONTRAIL (V1.0), which describe the contribution of emissions to the composition of the atmosphere and to contrail formation, respectively. The impact of emissions from the large number of time-region grid points is efficiently calculated by applying a Lagrangian scheme. EMAC also includes the calculation of radiative impacts, which are, in a second step, the input to climate metric formulas describing the global climate impact of the emission at each time-region grid point. The result of the modelling chain comprises a four-dimensional data set in space and time, which we call climate cost functions and which describes the global climate impact of an emission at each grid point and each point in time. In a third step, these climate cost functions are used in an air traffic simulator (SAAM) coupled to an emission tool (AEM) to optimise aircraft trajectories for the North Atlantic region. Here, we describe the details of this new modelling approach and show some example results. A number of sensitivity analyses are performed to motivate the settings of individual parameters. A stepwise sanity check of the results of the modelling chain is undertaken to demonstrate the plausibility of the climate cost functions.
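A purely illustrative sketch of how such climate cost functions might be consumed downstream is given below: a gridded 4-D cost field is interpolated along a candidate trajectory and the emission-weighted total is accumulated. All grids, values and function names are hypothetical; the actual chain uses EMAC-derived cost functions within SAAM and AEM.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Hypothetical gridded climate cost function CCF(lat, lon, level, time):
# climate impact per unit emission at each time-region grid point.
lat = np.linspace(30, 70, 9)
lon = np.linspace(-80, 10, 19)
lev = np.linspace(200, 300, 5)       # pressure levels in hPa
time = np.arange(0, 24, 6)           # hours
ccf = np.random.default_rng(0).random((lat.size, lon.size, lev.size, time.size))
ccf_interp = RegularGridInterpolator((lat, lon, lev, time), ccf)

def trajectory_climate_impact(waypoints, emissions):
    """Sum emission-weighted climate cost along a candidate trajectory.

    waypoints: (lat, lon, level, time) tuples; emissions: per-segment emission
    amounts. A trajectory optimiser would trade this total against fuel and
    time costs; the grids and values above are invented for illustration.
    """
    costs = ccf_interp(np.asarray(waypoints))
    return float(np.sum(costs * np.asarray(emissions)))

waypoints = [(50, -60, 250, 6), (52, -40, 250, 9), (53, -20, 240, 12)]
print(trajectory_climate_impact(waypoints, emissions=[1.0, 1.2, 1.1]))
```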
Abstract:
Twitter is both a micro-blogging service and a platform for public conversation. Direct conversation is facilitated in Twitter through the use of @’s (mentions) and replies. While the conversational element of Twitter is of particular interest to the marketing sector, relatively few data-mining studies have focused on this area. We analyse conversations associated with reciprocated mentions that take place in a data set consisting of approximately 4 million tweets, each containing at least one mention, collected over a period of 28 days. We ignore tweet content and instead use the mention network structure and its dynamical properties to identify and characterise Twitter conversations between pairs of users and within larger groups. We consider conversational balance, meaning the fraction of content contributed by each party. The goal of this work is to draw out some of the mechanisms driving conversation in Twitter, with the eventual aim of developing conversational models.
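A minimal sketch of the network-based identification of pairwise conversations is given below: mentions are aggregated into a weighted directed graph, reciprocated pairs are extracted, and a simple balance ratio is computed. The balance measure here (min/max of mention counts) is an illustrative proxy for the fraction-of-content measure used in the paper, and the mention stream is invented.

```python
from collections import Counter
import networkx as nx

# Toy mention stream: (author, mentioned_user) pairs extracted from tweets.
mentions = [("alice", "bob"), ("bob", "alice"), ("alice", "bob"),
            ("carol", "alice"), ("alice", "carol"), ("dave", "alice")]

counts = Counter(mentions)
G = nx.DiGraph()
for (u, v), w in counts.items():
    G.add_edge(u, v, weight=w)

# A conversation is taken here as a reciprocated pair: both u->v and v->u exist.
for u, v in G.edges():
    if u < v and G.has_edge(v, u):
        a, b = G[u][v]["weight"], G[v][u]["weight"]
        balance = min(a, b) / max(a, b)   # 1.0 = perfectly balanced exchange
        print(f"{u} <-> {v}: {a}+{b} mentions, balance={balance:.2f}")
```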
Abstract:
During recent decades, several windstorm series hit Europe, leading to large aggregated losses. Such storm series are examples of serial clustering of extreme cyclones, presenting a considerable risk for the insurance industry. Clustering of events and return periods of storm series for Germany are quantified based on potential losses, using empirical models. Two reanalysis data sets and observations from German weather stations are considered for 30 winters. Histograms of events exceeding selected return levels (1-, 2- and 5-year) are derived. Return periods of historical storm series are estimated based on the Poisson and the negative binomial distributions. Over 4000 years of general circulation model (GCM) simulations forced with current climate conditions are analysed to provide a better assessment of historical return periods. Estimates differ between distributions, for example 40 to 65 years for the 1990 series. For such less frequent series, estimates obtained with the Poisson distribution clearly deviate from empirical data. The negative binomial distribution provides better estimates, even though a sensitivity to return level and data set is identified. The consideration of GCM data permits a strong reduction of uncertainties. The present results support the importance of explicitly considering the clustering of losses for an adequate risk assessment in economic applications.
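A minimal sketch of the return-period estimation is given below: annual counts of damaging storms are fitted with Poisson and (moment-matched) negative binomial distributions, and the return period of a storm series is taken as the reciprocal of the exceedance probability. The counts are invented and the fitting choices are illustrative, not the paper's exact procedure.

```python
import numpy as np
from scipy import stats

# Hypothetical annual counts of storms exceeding the 1-year loss return level
# over 30 winters (the paper uses counts derived from reanalyses/observations).
counts = np.array([0, 1, 0, 2, 0, 0, 1, 3, 0, 1, 0, 0, 2, 0, 1,
                   0, 0, 4, 0, 1, 1, 0, 0, 2, 0, 0, 1, 0, 0, 1])

mean, var = counts.mean(), counts.var(ddof=1)

# Poisson: single parameter lambda = mean.
lam = mean
# Negative binomial via moment matching (var > mean indicates clustering);
# SciPy's parameterisation uses n (dispersion) and p.
n_disp = mean**2 / (var - mean)
p = n_disp / (n_disp + mean)

k = 3  # a "series" of at least 3 damaging storms in one winter
rp_pois = 1.0 / stats.poisson.sf(k - 1, lam)
rp_nbin = 1.0 / stats.nbinom.sf(k - 1, n_disp, p)
print(f"Return period of >= {k} storms: Poisson {rp_pois:.0f} yr, "
      f"neg. binomial {rp_nbin:.0f} yr")
```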
Abstract:
Solar Stormwatch was the first space weather citizen science project, the aim of which was to identify and track coronal mass ejections (CMEs) observed by the Heliospheric Imagers aboard the STEREO satellites. The project has now been running for approximately 4 years, with input from >16000 citizen scientists, resulting in a data set of >38000 time-elongation profiles of CME trajectories, observed over 18 pre-selected position angles. We present our method for reducing this data set into a CME catalogue. The resulting catalogue consists of 144 CMEs over the period January 2007 to February 2010, of which 110 were observed by STEREO-A and 77 were observed by STEREO-B. For each CME, the time-elongation profiles generated by the citizen scientists are averaged into a consensus profile along each position angle at which the event was tracked. We consider this catalogue to be unique, being at present the only citizen science generated CME catalogue, tracking CMEs over an elongation range from 4 degrees out to a maximum of approximately 70 degrees. Using single-spacecraft fitting techniques, we estimate the speed, direction, solar source region and latitudinal width of each CME. This shows that, at present, the Solar Stormwatch catalogue (which covers only solar minimum years) contains almost exclusively slow CMEs, with a mean speed of approximately 350 km s−1. The full catalogue is available for public access at www.met.reading.ac.uk/spate/stormwatch. This includes, for each event, the unprocessed time-elongation profiles generated by Solar Stormwatch, the consensus time-elongation profiles and a set of summary plots, as well as the estimated CME properties.
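By way of illustration of single-spacecraft fitting, the sketch below fits a synthetic consensus time-elongation profile with the fixed-phi approximation to recover a CME speed and propagation direction. The profile, noise level and initial guesses are invented, and the catalogue's actual fitting procedure may differ in detail.

```python
import numpy as np
from scipy.optimize import curve_fit

AU = 1.496e8          # km
D_OBS = 1.0 * AU      # observer heliocentric distance (~1 AU for STEREO)

def fixed_phi_elongation(t, v, phi_deg, t0):
    """Elongation (degrees) of a radially moving point CME under the
    fixed-phi approximation: tan(eps) = r sin(phi) / (d - r cos(phi)),
    with r = v * (t - t0); t in hours, v in km/s."""
    phi = np.radians(phi_deg)
    r = v * (t - t0) * 3600.0                        # km travelled from the Sun
    eps = np.arctan2(r * np.sin(phi), D_OBS - r * np.cos(phi))
    return np.degrees(eps)

# Synthetic consensus time-elongation profile (hours, degrees) standing in
# for a Solar Stormwatch event, with a little noise added.
t_obs = np.arange(0, 40, 2.0)
eps_obs = fixed_phi_elongation(t_obs, 350.0, 60.0, -2.0)
eps_obs += np.random.default_rng(0).normal(0, 0.3, t_obs.size)

popt, _ = curve_fit(fixed_phi_elongation, t_obs, eps_obs, p0=[400.0, 45.0, 0.0])
v_fit, phi_fit, t0_fit = popt
print(f"speed {v_fit:.0f} km/s, direction {phi_fit:.0f} deg from the observer")
```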
Abstract:
We present an analysis of seven primary transit observations of the hot Neptune GJ436b at 3.6, 4.5, and 8 μm obtained with the Infrared Array Camera on the Spitzer Space Telescope. After correcting for systematic effects, we fitted the light curves using the Markov Chain Monte Carlo technique. Combining these new data with the EPOXI, Hubble Space Telescope, and ground-based V, I, H, and Ks published observations, the range 0.5-10 μm can be covered. Due to the low level of activity of GJ436, the effect of starspots on the combination of transits at different epochs is negligible at the accuracy of the data set. Representative climate models were calculated using a three-dimensional, pseudospectral general circulation model with idealized thermal forcing. Simulated transit spectra of GJ436b were generated using line-by-line radiative transfer models including the opacities of the molecular species expected to be present in such a planetary atmosphere. A new, ab-initio-calculated line list for hot ammonia has been used for the first time. The photometric data observed at multiple wavelengths can be interpreted with methane being the dominant absorber after molecular hydrogen, possibly with minor contributions from ammonia, water, and other molecules. No clear evidence of carbon monoxide or carbon dioxide is found from transit photometry. We discuss this result in the light of a recent paper in which photochemical disequilibrium is hypothesized to interpret secondary transit photometric data. We show that the emission photometric data are not incompatible with the presence of abundant methane, but further spectroscopic data are desirable to confirm this scenario.
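As a schematic illustration of MCMC light-curve fitting, the sketch below recovers the depth of a synthetic box-shaped transit with the emcee sampler. The box model, priors and noise level are illustrative stand-ins; the analysis in the paper uses physically realistic transit models and treats the Spitzer systematics explicitly.

```python
import numpy as np
import emcee

def box_transit(t, depth, t0, dur):
    """Very simplified box-shaped transit model (no limb darkening)."""
    flux = np.ones_like(t)
    flux[np.abs(t - t0) < 0.5 * dur] -= depth
    return flux

def log_prob(theta, t, y, yerr):
    depth, t0, dur = theta
    if not (0 < depth < 0.05 and 0 < dur < 0.2):   # simple flat priors
        return -np.inf
    model = box_transit(t, depth, t0, dur)
    return -0.5 * np.sum(((y - model) / yerr) ** 2)

# Synthetic light curve standing in for a single Spitzer transit of GJ436b.
rng = np.random.default_rng(0)
t = np.linspace(-0.1, 0.1, 400)                     # days from mid-transit
y = box_transit(t, 0.007, 0.0, 0.04) + rng.normal(0, 5e-4, t.size)

ndim, nwalkers = 3, 32
p0 = np.array([0.006, 0.0, 0.05]) + 1e-4 * rng.normal(size=(nwalkers, ndim))
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob, args=(t, y, 5e-4))
sampler.run_mcmc(p0, 2000, progress=False)
depth_samples = sampler.get_chain(discard=500, flat=True)[:, 0]
print(f"transit depth = {np.median(depth_samples):.4f}")
```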
Abstract:
The weekly dependence of pollutant aerosols in the urban environment of Lisbon (Portugal) is inferred from the records of the atmospheric electric field at Portela meteorological station (38°47′N, 9°08′W). Measurements were made with a Benndorf electrograph. The data set extends from 1955 to 1990, but due to the contaminating effect of radioactive fallout during the 1960s and 1970s, only the period between 1980 and 1990 is considered here. Using a relative difference method, a weekly dependence of the atmospheric electric field is found in these records, which shows an increasing trend between 1980 and 1990. This is consistent with the growth of population in the Lisbon metropolitan area and consequently of urban activity, mainly traffic. Complementarily, using a Lomb–Scargle periodogram technique, the presence of daily and weekly cycles is also found. Moreover, to follow the evolution of these cycles over the period considered, a colour surface plot of the annual periodograms is presented. Further, a noise analysis of the periodograms is made, which validates the results found. Two data sets were considered: all days in the period, and fair-weather days only.
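A minimal sketch of the Lomb–Scargle step is given below, applied to an invented, irregularly sampled series with a built-in 7-day modulation standing in for the Portela atmospheric electric field record; it simply confirms that the periodogram peaks near the weekly period.

```python
import numpy as np
from scipy.signal import lombscargle

# Synthetic stand-in for the Lisbon record: an irregularly sampled daily
# series with a weekly (7-day) modulation plus noise.
rng = np.random.default_rng(0)
t_days = np.sort(rng.choice(np.arange(0, 3650), size=3000, replace=False)).astype(float)
field = 100 + 10 * np.sin(2 * np.pi * t_days / 7.0) + rng.normal(0, 8, t_days.size)

# Angular frequencies scanned between 2-day and 30-day periods.
periods = np.linspace(2, 30, 2000)
omega = 2 * np.pi / periods
power = lombscargle(t_days, field - field.mean(), omega, normalize=True)

print(f"strongest period: {periods[np.argmax(power)]:.2f} days")  # ~7 days expected
```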
Abstract:
Many of the next generation of global climate models will include aerosol schemes which explicitly simulate the microphysical processes that determine the particle size distribution. These models enable aerosol optical properties and cloud condensation nuclei (CCN) concentrations to be determined by fundamental aerosol processes, which should lead to a more physically based simulation of aerosol direct and indirect radiative forcings. This study examines the global variation in particle size distribution simulated by 12 global aerosol microphysics models to quantify model diversity and to identify any common biases against observations. Evaluation against size distribution measurements from a new European network of aerosol supersites shows that the mean model agrees quite well with the observations at many sites on the annual mean, but there are some seasonal biases common to many sites. In particular, at many of these European sites, the accumulation mode number concentration is biased low during winter and Aitken mode concentrations tend to be overestimated in winter and underestimated in summer. At high northern latitudes, the models strongly underpredict Aitken and accumulation particle concentrations compared to the measurements, consistent with previous studies that have highlighted the poor performance of global aerosol models in the Arctic. In the marine boundary layer, the models capture the observed meridional variation in the size distribution, which is dominated by the Aitken mode at high latitudes, with an increasing concentration of accumulation particles with decreasing latitude. Considering vertical profiles, the models reproduce the observed peak in total particle concentrations in the upper troposphere due to new particle formation, although modelled peak concentrations tend to be biased high over Europe. Overall, the multi-model-mean data set simulates the global variation of the particle size distribution with a good degree of skill, suggesting that most of the individual global aerosol microphysics models are performing well, although the large model diversity indicates that some models are in poor agreement with the observations. Further work is required to better constrain size-resolved primary and secondary particle number sources, and an improved understanding of nucleation and growth (e.g. the role of nitrate and secondary organics) will improve the fidelity of simulated particle size distributions.