221 results for Imbalanced datasets
Abstract:
Tagging provides support for retrieval and categorization of online content depending on users' tag choice. A number of models of tagging behaviour have been proposed to identify factors that are considered to affect taggers, such as users' tagging history. In this paper, we use Semiotics Analysis and Activity Theory to study the effect the system designer has on tagging behaviour. The framework we use shows the components that comprise the tagging system and how they interact to direct tagging behaviour. We analysed two collaborative tagging systems, CiteULike and Delicious, by applying our framework to their components. Using datasets from both systems, we found that 35% of CiteULike users did not provide tags compared to only 0.1% of Delicious users. This was directly linked to the type of tools used by the system designer to support tagging.
Abstract:
The Bollène-2002 Experiment was aimed at developing the use of a radar volume-scanning strategy for conducting radar rainfall estimations in the mountainous regions of France. A developmental radar processing system, called Traitements Régionalisés et Adaptatifs de Données Radar pour l’Hydrologie (Regionalized and Adaptive Radar Data Processing for Hydrological Applications), has been built and several algorithms were specifically produced as part of this project. These algorithms include 1) a clutter identification technique based on the pulse-to-pulse variability of reflectivity Z for noncoherent radar, 2) a coupled procedure for determining a rain partition between convective and widespread rainfall R and the associated normalized vertical profiles of reflectivity, and 3) a method for calculating reflectivity at ground level from reflectivities measured aloft. Several radar processing strategies, including nonadaptive, time-adaptive, and space–time-adaptive variants, have been implemented to assess the performance of these new algorithms. Reference rainfall data were derived from a careful analysis of rain gauge datasets furnished by the Cévennes–Vivarais Mediterranean Hydrometeorological Observatory. The assessment criteria for five intense and long-lasting Mediterranean rain events have proven that good quantitative precipitation estimates can be obtained from radar data alone within 100-km range by using well-sited, well-maintained radar systems and sophisticated, physically based data-processing systems. The basic requirements entail performing accurate electronic calibration and stability verification, determining the radar detection domain, achieving efficient clutter elimination, and capturing the vertical structure(s) of reflectivity for the target event. 
Radar performance was shown to depend on type of rainfall, with better results obtained with deep convective rain systems (Nash coefficients of roughly 0.90 for point radar–rain gauge comparisons at the event time step), as opposed to shallow convective and frontal rain systems (Nash coefficients in the 0.6–0.8 range). In comparison with time-adaptive strategies, the space–time-adaptive strategy yields a very significant reduction in the radar–rain gauge bias while the level of scatter remains basically unchanged. Because the Z–R relationships have not been optimized in this study, results are attributed to an improved processing of spatial variations in the vertical profile of reflectivity. The two main recommendations for future work consist of adapting the rain separation method for radar network operations and documenting Z–R relationships conditional on rainfall type.
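The Nash coefficients quoted above can be illustrated with a short sketch. It assumes they refer to the standard Nash–Sutcliffe efficiency, which compares squared estimation errors with the variance of the reference observations; the abstract does not give the formula. The function name and the sample values are hypothetical and are not taken from the study.

```python
def nash_sutcliffe(observed, estimated):
    """Nash-Sutcliffe efficiency: 1 is a perfect match, 0 is no better
    than always predicting the mean of the observations."""
    if not observed or len(observed) != len(estimated):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Sum of squared differences between reference and estimate.
    residual = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    # Sum of squared deviations of the reference from its own mean.
    spread = sum((o - mean_obs) ** 2 for o in observed)
    if spread == 0:
        raise ValueError("observations have zero variance")
    return 1.0 - residual / spread


# Hypothetical rain gauge totals (mm) and matching radar estimates.
gauge = [12.0, 30.5, 8.2, 45.1, 20.0]
radar = [10.5, 28.0, 9.0, 47.3, 18.2]
score = nash_sutcliffe(gauge, radar)
```

A value close to 1 means the radar estimates track the gauge reference closely, while a value at or below 0 means they do no better than the gauge mean.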
Abstract:
Purpose – Price indices for commercial real estate markets are difficult to construct because assets are heterogeneous, they are spatially dispersed and they are infrequently traded. Appraisal-based indices are one response to these problems, but may understate volatility or fail to capture turning points in a timely manner. This paper estimates “transaction linked indices” for major European markets to see whether these offer a different perspective on market performance. The paper aims to discuss these issues. Design/methodology/approach – The assessed value method is used to construct the indices. This has been recently applied to commercial real estate datasets in the USA and UK. The underlying data comprise appraisals and sale prices for assets monitored by Investment Property Databank (IPD). The indices are compared to appraisal-based series for the countries concerned for Q4 2001 to Q4 2012. Findings – Transaction linked indices show stronger growth and sharper declines over the course of the cycle, but they do not notably lead their appraisal-based counterparts. They are typically two to four times more volatile. Research limitations/implications – Only country-level indicators can be constructed in many cases owing to low trading volumes in the period studied, and this same issue prevented sample selection bias from being analysed in depth. Originality/value – Discussion of the utility of transaction-based price indicators is extended to European commercial real estate markets. The indicators offer alternative estimates of real estate market volatility that may be useful in asset allocation and risk modelling, including in a regulatory context.
Abstract:
Global hydrographic and air–sea freshwater flux datasets are used to investigate ocean salinity changes over 1950–2010 in relation to surface freshwater flux. On multi-decadal timescales, surface salinity increases (decreases) in evaporation (precipitation) dominated regions, the Atlantic–Pacific salinity contrast increases, and the upper thermocline salinity maximum increases while the salinity minimum of intermediate waters decreases. Potential trends in E–P are examined for 1950–2010 (using two reanalyses) and 1979–2010 (using four reanalyses and two blended products). Large differences in the 1950–2010 E–P trend patterns are evident in several regions, particularly the North Atlantic. For 1979–2010 some coherency in the spatial change patterns is evident but there is still a large spread in trend magnitude and sign between the six E–P products. However, a robust pattern of increased E–P in the southern hemisphere subtropical gyres is seen in all products. There is also some evidence in the tropical Pacific for a link between the spatial change patterns of salinity and E–P associated with ENSO. The water cycle amplification rate over specific regions is subsequently inferred from the observed 3-D salinity change field using a salt conservation equation in variable isopycnal volumes, implicitly accounting for the migration of isopycnal surfaces. Inferred global changes of E–P over 1950–2010 amount to an increase of 1 ± 0.6 % in net evaporation across the subtropics and an increase of 4.2 ± 2 % in net precipitation across subpolar latitudes. Amplification rates are approximately doubled over 1979–2010, consistent with accelerated broad-scale warming but also coincident with much improved salinity sampling over the latter period.
Abstract:
This paper presents a neuroscience inspired information theoretic approach to motion segmentation. Robust motion segmentation represents a fundamental first stage in many surveillance tasks. As an alternative to widely adopted individual segmentation approaches, which are challenged in different ways by imagery exhibiting a wide range of environmental variation and irrelevant motion, this paper presents a new biologically-inspired approach which computes the multivariate mutual information between multiple complementary motion segmentation outputs. Performance evaluation across a range of datasets and against competing segmentation methods demonstrates robust performance.
Abstract:
By the mid-1930s the major Hollywood studios had developed extensive networks of distribution subsidiaries across five continents. This article focuses on the operation of American film distributors in Australia – one of Hollywood's largest foreign markets. Drawing on two unique primary datasets, the article compares and investigates film distribution in Sydney's first-run and suburban-run markets. It finds that the subsidiaries of US film companies faced a greater liability of foreignness in the city centre market than in the suburban one. Our data support the argument that film audiences in local or suburban cinema markets were more receptive to Hollywood entertainment than those in metropolitan centres.
Abstract:
Dynamical downscaling is frequently used to investigate the dynamical variables of extra-tropical cyclones, for example, precipitation, using very high-resolution models nested within coarser resolution models to understand the processes that lead to intense precipitation. It is also used in climate change studies, using long timeseries to investigate trends in precipitation, or to look at the small-scale dynamical processes for specific case studies. This study investigates some of the problems associated with dynamical downscaling and looks at the optimum configuration to obtain the distribution and intensity of a precipitation field to match observations. This study uses the Met Office Unified Model run in limited area mode with grid spacings of 12, 4 and 1.5 km, driven by boundary conditions provided by the ECMWF Operational Analysis to produce high-resolution simulations for the Summer of 2007 UK flooding events. The numerical weather prediction model is initiated at varying times before the peak precipitation is observed to test the importance of the initialisation and boundary conditions, and how long the simulation can be run for. The results are compared to raingauge data as verification and show that the model intensities are most similar to observations when the model is initialised 12 hours before the peak precipitation is observed. It was also shown that using non-gridded datasets makes verification more difficult, with the density of observations also affecting the intensities observed. It is concluded that the simulations are able to produce realistic precipitation intensities when driven by the coarser resolution data.
Abstract:
Surface temperature is a key aspect of weather and climate, but the term may refer to different quantities that play interconnected roles and are observed by different means. In a community-based activity in June 2012, the EarthTemp Network brought together 55 researchers from five continents to improve the interaction between scientific communities who focus on surface temperature in particular domains, to exploit the strengths of different observing systems and to better meet the needs of different communities. The workshop identified key needs for progress towards meeting scientific and societal requirements for surface temperature understanding and information, which are presented in this community paper. A "whole-Earth" perspective is required with more integrated, collaborative approaches to observing and understanding Earth's various surface temperatures. It is necessary to build understanding of the relationships between different surface temperatures, where presently inadequate, and undertake large-scale systematic intercomparisons. Datasets need to be easier to obtain and exploit for a wide constituency of users, with the differences and complementarities communicated in readily understood terms, and realistic and consistent uncertainty information provided. Steps were also recommended to curate and make available data that are presently inaccessible, develop new observing systems and build capacities to accelerate progress in the accuracy and usability of surface temperature datasets.
Abstract:
African societies are dependent on rainfall for agricultural and other water-dependent activities, yet rainfall is extremely variable in both space and time and recurring water shocks, such as drought, can have considerable social and economic impacts. To help improve our knowledge of the rainfall climate, we have constructed a 30-year (1983–2012), temporally consistent rainfall dataset for Africa known as TARCAT (TAMSAT African Rainfall Climatology And Time-series) using archived Meteosat thermal infra-red (TIR) imagery, calibrated against rain gauge records collated from numerous African agencies. TARCAT has been produced at 10-day (dekad) scale at a spatial resolution of 0.0375°. An intercomparison of TARCAT from 1983 to 2010 with six long-term precipitation datasets indicates that TARCAT replicates the spatial and seasonal rainfall patterns and interannual variability well, with correlation coefficients of 0.85 and 0.70 with the Climate Research Unit (CRU) and Global Precipitation Climatology Centre (GPCC) gridded-gauge analyses respectively in the interannual variability of the Africa-wide mean monthly rainfall. The design of the algorithm for drought monitoring leads to TARCAT underestimating the Africa-wide mean annual rainfall on average by −0.37 mm day⁻¹ (21%) compared to other datasets. As the TARCAT rainfall estimates are historically calibrated across large climatically homogeneous regions, the data can provide users with robust estimates of climate related risk, even in regions where gauge records are inconsistent in time.
Abstract:
Monte Carlo algorithms often aim to draw from a distribution π by simulating a Markov chain with transition kernel P such that π is invariant under P. However, there are many situations for which it is impractical or impossible to draw from the transition kernel P. For instance, this is the case with massive datasets, where it is prohibitively expensive to calculate the likelihood, and is also the case for intractable likelihood models arising from, for example, Gibbs random fields, such as those found in spatial statistics and network analysis. A natural approach in these cases is to replace P by an approximation P̂. Using theory from the stability of Markov chains we explore a variety of situations where it is possible to quantify how 'close' the chain given by the transition kernel P̂ is to the chain given by P. We apply these results to several examples from spatial statistics and network analysis.
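As a toy illustration of replacing an exact kernel with an approximation, the sketch below runs the same random-walk Metropolis sampler twice: once with an exact log-density and once with a slightly perturbed one, which changes the acceptance step and therefore the kernel. Everything here is an assumption made for illustration: the standard normal target, the 10% scale perturbation, and all function names. None of it comes from the paper, and comparing sample variances is only a crude stand-in for the closeness bounds the abstract describes.

```python
import math
import random


def random_walk_metropolis(log_density, n_steps, step_size=1.0, seed=0):
    """Simulate a random-walk Metropolis chain started at 0."""
    rng = random.Random(seed)
    x = 0.0
    log_p = log_density(x)
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, density(proposal) / density(x)).
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < log_p_new - log_p:
            x, log_p = proposal, log_p_new
        samples.append(x)
    return samples


def exact_log_density(x):
    # Hypothetical exact target: standard normal, up to a constant.
    return -0.5 * x * x


def approx_log_density(x):
    # Hypothetical approximation: same shape with a 10% wider scale.
    return -0.5 * (x / 1.1) ** 2


def mean_and_variance(samples):
    mean = sum(samples) / len(samples)
    variance = sum((s - mean) ** 2 for s in samples) / len(samples)
    return mean, variance


exact_chain = random_walk_metropolis(exact_log_density, 20000, seed=1)
approx_chain = random_walk_metropolis(approx_log_density, 20000, seed=1)
exact_mean, exact_var = mean_and_variance(exact_chain)
approx_mean, approx_var = mean_and_variance(approx_chain)
```

In this toy setting the stationary variances are 1 for the exact chain and 1.21 for the approximate one, so the gap between `exact_var` and `approx_var` gives a rough sense of how far the perturbed chain drifts from the exact one.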
Abstract:
Photographs, videos and datasets related to Morgawr, the Bronze Age-type sewn-plank boat.
Abstract:
The Iberian viticultural regions are convened according to the Denomination of Origin (DO) and present different climates, soils, topography and management practices. All these elements influence the vegetative growth of different varieties throughout the peninsula, and are tied to grape quality and wine type. In the current study, an integrated analysis of climate, soil, topography and vegetative growth was performed for the Iberian DO regions, using state-of-the-art datasets. For climatic assessment, a categorized index, accounting for phenological/thermal development, water availability and grape ripening conditions, was computed. Soil textural classes were established to distinguish soil types. Elevation and aspect (orientation) were also taken into account, as the leading topographic elements. A spectral vegetation index was used to assess grapevine vegetative growth and an integrated analysis of all variables was performed. The results showed that the integrated climate-soil-topography influence on vine performance is evident. Most Iberian vineyards are grown in temperate dry climates with loamy soils, presenting low vegetative growth. Vineyards in temperate humid conditions tend to show higher vegetative growth. Conversely, in cooler/warmer climates, lower vigour vineyards prevail and other factors, such as soil type and precipitation, acquire more important roles in driving vigour. Vines in prevailing loamy soils are grown over a wide climatic diversity, suggesting that precipitation is the primary factor influencing vigour. The present assessment of terroir characteristics allows direct comparison among wine regions and may have great value to viticulturists, particularly under a changing climate.
Abstract:
This paper presents a novel approach to the automatic classification of very large data sets composed of terahertz pulse transient signals, highlighting their potential use in biochemical, biomedical, pharmaceutical and security applications. Two different types of THz spectra are considered in the classification process. Firstly, a binary classification study of poly-A and poly-C ribonucleic acid samples is performed. This is then contrasted with a difficult multi-class classification problem of spectra from six different powder samples which, although they have fairly indistinguishable features in the optical spectrum, possess a few discernible spectral features in the terahertz part of the spectrum. Classification is performed using a complex-valued extreme learning machine algorithm that takes into account features in both the amplitude and the phase of the recorded spectra. Classification speed and accuracy are contrasted with those achieved using a support vector machine classifier. The study systematically compares the classifier performance achieved after adopting different Gaussian kernels when separating amplitude and phase signatures. The two signatures are presented as feature vectors for both training and testing purposes. The study confirms the utility of complex-valued extreme learning machine algorithms for classification of the very large data sets generated with current terahertz imaging spectrometers. The classifier can take into consideration heterogeneous layers within an object as would be required within a tomographic setting and is sufficiently robust to detect patterns hidden inside noisy terahertz data sets. The proposed study opens up the opportunity for the establishment of complex-valued extreme learning machine algorithms as new chemometric tools that will assist the wider proliferation of terahertz sensing technology for chemical sensing, quality control, security screening and clinical diagnosis.
Furthermore, the proposed algorithm should also be very useful in other applications requiring the classification of very large datasets.
Abstract:
Understanding observed changes to the global water cycle is key to predicting future climate changes and their impacts. While many datasets document crucial variables such as precipitation, ocean salinity, runoff, and humidity, most are uncertain for determining long-term changes. In situ networks provide long time-series over land but are sparse in many regions, particularly the tropics. Satellite and reanalysis datasets provide global coverage, but their long-term stability is lacking. However, comparisons of changes among related variables can give insights into the robustness of observed changes. For example, ocean salinity, interpreted with an understanding of ocean processes, can help cross-validate precipitation. Observational evidence for human influences on the water cycle is emerging, but uncertainties resulting from internal variability and observational errors are too large to determine whether the observed and simulated changes are consistent. Improvements to the in situ and satellite observing networks that monitor the changing water cycle are required, yet continued data coverage is threatened by funding reductions. Uncertainty both in the role of anthropogenic aerosols, and due to large climate variability presently limits confidence in attribution of observed changes.
Abstract:
Many studies evaluating model boundary-layer schemes focus either on near-surface parameters or on short-term observational campaigns. This reflects the observational datasets that are widely available for use in model evaluation. In this paper we show how surface and long-term Doppler lidar observations, combined in a way to match model representation of the boundary layer as closely as possible, can be used to evaluate the skill of boundary-layer forecasts. We use a 2-year observational dataset from a rural site in the UK to evaluate a climatology of boundary-layer type forecast by the UK Met Office Unified Model. In addition, we demonstrate the use of a binary skill score (Symmetric Extremal Dependence Index, SEDI) to investigate the dependence of forecast skill on season, horizontal resolution and forecast lead time. A clear diurnal and seasonal cycle can be seen in the climatology of both the model and observations, with the main discrepancies being the model overpredicting cumulus-capped and decoupled stratocumulus-capped boundary layers and underpredicting well-mixed boundary layers. Using the SEDI skill score, the model is most skilful at predicting the surface stability. The skill of the model in predicting cumulus-capped and stratocumulus-capped stable boundary-layer forecasts is low but greater than a 24 hr persistence forecast. In contrast, the skill in predicting decoupled boundary layers and boundary layers with multiple cloud layers is lower than persistence. This process-based evaluation approach has the potential to be applied to other boundary-layer parameterisation schemes with similar decision structures.
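For readers unfamiliar with the Symmetric Extremal Dependence Index, the following sketch computes it from a 2×2 contingency table of binary forecasts using the commonly cited definition in terms of hit rate H and false alarm rate F: SEDI = (ln F − ln H − ln(1 − F) + ln(1 − H)) / (ln F + ln H + ln(1 − F) + ln(1 − H)). The abstract does not state the formula, so this definition, the function name and the example counts are assumptions made for illustration.

```python
import math


def sedi(hits, misses, false_alarms, correct_negatives):
    """Symmetric Extremal Dependence Index from contingency-table counts."""
    hit_rate = hits / (hits + misses)
    false_alarm_rate = false_alarms / (false_alarms + correct_negatives)
    # The logarithms are undefined when either rate is exactly 0 or 1.
    for rate in (hit_rate, false_alarm_rate):
        if not 0.0 < rate < 1.0:
            raise ValueError("hit rate and false alarm rate must lie in (0, 1)")
    ln_f = math.log(false_alarm_rate)
    ln_h = math.log(hit_rate)
    ln_1f = math.log(1.0 - false_alarm_rate)
    ln_1h = math.log(1.0 - hit_rate)
    return (ln_f - ln_h - ln_1f + ln_1h) / (ln_f + ln_h + ln_1f + ln_1h)


# Hypothetical counts for one boundary-layer type over a verification period.
skill = sedi(hits=9, misses=1, false_alarms=1, correct_negatives=9)
```

A score of 1 indicates a perfect forecast and 0 indicates no skill beyond chance, so a forecast with equal hit and false alarm rates scores 0.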