885 results for Incremental Information-content
Abstract:
Search engines exploit the Web's hyperlink structure to help infer information content. The new phenomenon of personal Web logs, or 'blogs', encourages more extensive annotation of Web content. If their resulting link structures bias the Web crawling applications that search engines depend upon, there are implications for another form of annotation rapidly on the rise, the Semantic Web. We conducted a Web crawl of 160 000 pages in which the link structure of the Web was compared with that of several thousand blogs. Results show that the two link structures are significantly different. We analyse the differences and infer the likely effect upon the performance of existing and future Web agents. The Semantic Web offers new opportunities to navigate the Web, but Web agents should be designed to take advantage of the emerging link structures, or their effectiveness will diminish.
Abstract:
The need for consistent assimilation of satellite measurements for numerical weather prediction led operational meteorological centers to assimilate satellite radiances directly using variational data assimilation systems. More recently there has been a renewed interest in assimilating satellite retrievals (e.g., to avoid the use of relatively complicated radiative transfer models as observation operators for data assimilation). The aim of this paper is to provide a rigorous and comprehensive discussion of the conditions for the equivalence between radiance and retrieval assimilation. It is shown that two requirements need to be satisfied for the equivalence: (i) the radiance observation operator needs to be approximately linear in a region of the state space centered at the retrieval and with a radius of the order of the retrieval error; and (ii) any prior information used to constrain the retrieval should not underrepresent the variability of the state, so as to retain the information content of the measurements. Both these requirements can be tested in practice. When these requirements are met, retrievals can be transformed so as to represent only the portion of the state that is well constrained by the original radiance measurements and can be assimilated in a consistent and optimal way, by means of an appropriate observation operator and a unit matrix as error covariance. Finally, specific cases when retrieval assimilation can be more advantageous (e.g., when the estimate sought by the operational assimilation system depends on the first guess) are discussed.
Abstract:
We evaluate a number of real estate sentiment indices to ascertain the current and forward-looking information content that may be useful for forecasting demand and supply activity. Our focus lies on sector-specific surveys targeting players on the supply side of both the residential and non-residential real estate markets. Analyzing the dynamic relationships within a Vector Auto-Regression (VAR) framework, we test the efficacy of these indices by comparing them with other coincident indicators in predicting real estate returns. Overall, our analysis suggests that sentiment indicators convey important information which should be embedded in the modeling exercise to predict real estate market returns. Generally, sentiment indices show better information content than broad economic indicators. The goodness of fit of our models is higher for the residential market than for the non-residential real estate sector. The impulse responses, in general, conform to our theoretical expectations. The variance decompositions and out-of-sample predictions generally show the desired contributions and reasonable improvements, respectively, thus upholding our hypothesis. Quite remarkably, and consistent with theory, predictability swings as we move through different phases of the cycle. This perhaps suggests that, for example during recessions, market players' expectations may be a more accurate predictor of future performance, conceivably indicating a 'negative' information processing bias and thus conforming to the precautionary motive of consumer behaviour.
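To make the modelling framework mentioned in this abstract concrete, the following minimal sketch (not the authors' code) fits a two-variable VAR to quarterly real estate returns and a supply-side sentiment index and produces impulse responses, variance decompositions, and a one-year-ahead forecast. The file name, column names, and lag selection are hypothetical assumptions for illustration.

```python
# Minimal sketch of a VAR analysis of returns and sentiment (illustrative only).
import pandas as pd
from statsmodels.tsa.api import VAR

# hypothetical quarterly data: real estate returns and a supply-side sentiment index
df = pd.read_csv("real_estate_quarterly.csv", index_col=0, parse_dates=True)
data = df[["re_returns", "sentiment_index"]].dropna()

model = VAR(data)
lag_order = model.select_order(maxlags=8).aic      # lag length chosen by AIC
results = model.fit(lag_order)

irf = results.irf(12)                              # impulse responses over 12 quarters
fevd = results.fevd(12)                            # forecast error variance decomposition
forecast = results.forecast(data.values[-lag_order:], steps=4)  # one-year-ahead path
```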
Abstract:
Decadal predictions have a high profile in the climate science community and beyond, yet very little is known about their skill, nor is there any agreed protocol for estimating it. This paper proposes a sound and coordinated framework for verification of decadal hindcast experiments. The framework is illustrated for decadal hindcasts tailored to meet the requirements and specifications of CMIP5 (Coupled Model Intercomparison Project phase 5). The chosen metrics address key questions about the information content in initialized decadal hindcasts. These questions are: (1) Do the initial conditions in the hindcasts lead to more accurate predictions of the climate, compared to uninitialized climate change projections? and (2) Is the prediction model's ensemble spread an appropriate representation of forecast uncertainty on average? The first question is addressed through deterministic metrics that compare the initialized and uninitialized hindcasts. The second question is addressed through a probabilistic metric applied to the initialized hindcasts, comparing different ways to ascribe forecast uncertainty. Verification is advocated at smoothed regional scales that can illuminate broad areas of predictability, as well as at the grid scale, since many users of the decadal prediction experiments who feed the climate data into applications or decision models will use the data at grid scale, or downscale it to even higher resolution. An overall statement on the skill of CMIP5 decadal hindcasts is not the aim of this paper; the results presented are only illustrative of the framework, which would enable such studies. However, broad conclusions that are beginning to emerge from the CMIP5 results include: (1) most predictability at the interannual-to-decadal scale, relative to climatological averages, comes from external forcing, particularly for temperature; (2) though moderate, additional skill is added by the initial conditions over what is imparted by external forcing alone; however, the impact of initialization may result in overall worse predictions in some regions than those provided by uninitialized climate change projections; (3) limited hindcast records and the dearth of climate-quality observational data impede our ability to quantify expected skill as well as model biases; and (4) as is common to seasonal-to-interannual model predictions, the spread of the ensemble members is not necessarily a good representation of forecast uncertainty. The authors recommend that this framework be adopted to serve as a starting point to compare prediction quality across prediction systems. The framework can provide a baseline against which future improvements can be quantified. It also provides guidance on the use of these model predictions, which differ in fundamental ways from the climate change projections that much of the community has become familiar with, including adjustment of mean and conditional biases and consideration of how best to approach forecast uncertainty.
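The paper defines its own metric set; as a hedged illustration of the first, deterministic, kind of comparison described (initialized hindcasts versus uninitialized projections), the sketch below computes a simple mean squared error skill score, MSSS = 1 - MSE_init / MSE_uninit, on hypothetical gridded anomalies. The arrays and file names are assumptions, not CMIP5 outputs.

```python
# Illustrative deterministic skill score comparing initialized and uninitialized runs.
import numpy as np

def msss(obs, initialized, uninitialized):
    """Mean squared error skill score: 1 - MSE(initialized) / MSE(uninitialized)."""
    mse_init = np.mean((initialized - obs) ** 2)
    mse_unin = np.mean((uninitialized - obs) ** 2)
    return 1.0 - mse_init / mse_unin

# hypothetical decadal-mean temperature anomalies on a smoothed regional grid
obs = np.load("obs_anomalies.npy")
hindcasts = np.load("initialized_hindcasts.npy")
projections = np.load("uninitialized_projections.npy")
print(msss(obs, hindcasts, projections))
```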
Abstract:
Global NDVI data are routinely derived from the AVHRR, SPOT-VGT, and MODIS/Terra earth observation records for a range of applications from terrestrial vegetation monitoring to climate change modeling. This has led to a substantial interest in the harmonization of multisensor records. Most evaluations of the internal consistency and continuity of global multisensor NDVI products have focused on time-series harmonization in the spectral domain, often neglecting the spatial domain. We fill this void by applying variogram modeling (a) to evaluate the differences in spatial variability between 8-km AVHRR, 1-km SPOT-VGT, and 1-km, 500-m, and 250-m MODIS NDVI products over eight EOS (Earth Observing System) validation sites, and (b) to characterize the decay of spatial variability as a function of pixel size (i.e. data regularization) for spatially aggregated Landsat ETM+ NDVI products and a real multisensor dataset. First, we demonstrate that the conjunctive analysis of two variogram properties, the sill and the mean length scale metric, provides a robust assessment of the differences in spatial variability between multiscale NDVI products that are due to spatial (nominal pixel size, point spread function, and view angle) and non-spatial (sensor calibration, cloud clearing, atmospheric corrections, and length of multi-day compositing period) factors. Next, we show that as the nominal pixel size increases, the decay of spatial information content follows a logarithmic relationship with a stronger fit for the spatially aggregated NDVI products (R2 = 0.9321) than for the native-resolution AVHRR, SPOT-VGT, and MODIS NDVI products (R2 = 0.5064). This relationship serves as a reference for evaluating the differences in spatial variability and length scales in multiscale datasets at native or aggregated spatial resolutions. The outcomes of this study suggest that multisensor NDVI records cannot be integrated into a long-term data record without proper consideration of all factors affecting their spatial consistency. Hence, we propose an approach for selecting the spatial resolution at which differences in spatial variability between NDVI products from multiple sensors are minimized. This approach provides practical guidance for the harmonization of long-term multisensor datasets.
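As a rough illustration of the variogram-based comparison described above (not the authors' processing chain), the sketch below estimates an empirical semivariogram for a single NDVI image and fits a spherical model to recover the sill and range (length-scale) parameters. The input file, pixel size, and model choice are assumptions.

```python
# Empirical semivariogram of an NDVI image plus a spherical model fit (illustrative).
import numpy as np
from scipy.optimize import curve_fit

def empirical_variogram(ndvi, pixel_size, max_lag_px=20):
    """Semivariance along image rows for lags of 1..max_lag_px pixels."""
    lags, gamma = [], []
    for h in range(1, max_lag_px + 1):
        diffs = ndvi[:, h:] - ndvi[:, :-h]
        gamma.append(0.5 * np.nanmean(diffs ** 2))
        lags.append(h * pixel_size)
    return np.array(lags), np.array(gamma)

def spherical(h, nugget, sill, rng):
    """Spherical variogram model: rises to the sill at lag distance rng."""
    g = nugget + (sill - nugget) * (1.5 * h / rng - 0.5 * (h / rng) ** 3)
    return np.where(h < rng, g, sill)

# hypothetical 250-m MODIS NDVI subset stored as a 2-D array
ndvi = np.load("modis_ndvi_250m.npy")
lags, gamma = empirical_variogram(ndvi, pixel_size=250.0)
(nugget, sill, rng), _ = curve_fit(spherical, lags, gamma,
                                   p0=[0.0, gamma.max(), lags.mean()])
print(f"sill = {sill:.4f}, range = {rng:.0f} m")
```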
Abstract:
This paper examines the impact of changes in the composition of real estate stock indices, considering companies both joining and leaving the indices. Stocks that are newly included not only see a short-term increase in their share price, but their trading volumes increase in a permanent fashion following the event. This highlights the importance of indices not only in a benchmarking context but also in enhancing investor awareness and aiding liquidity. By contrast, as anticipated, the share prices of firms removed from the indices fall around the time of the index change. The fact that the changes in share prices, either upwards for index inclusions or downwards for deletions, are generally not reversed indicates that the movements are not purely due to price pressure, but rather are more consistent with the information content hypothesis. There is no evidence, however, that index changes significantly affect the volatility of price changes or the firms' operating performance as measured by earnings per share.
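The abstract does not spell out the estimation details, but a standard market-model event study of the kind commonly used for such questions can be sketched as follows; the return series, window boundaries, and benchmark index are hypothetical.

```python
# Cumulative abnormal returns around an index-change date (illustrative sketch).
import numpy as np

def cumulative_abnormal_returns(stock_ret, market_ret, est_window, event_window):
    """Fit a market model over the estimation window, then cumulate abnormal
    returns over the event window."""
    beta, alpha = np.polyfit(market_ret[est_window], stock_ret[est_window], 1)
    abnormal = stock_ret[event_window] - (alpha + beta * market_ret[event_window])
    return abnormal.cumsum()

# hypothetical daily returns aligned so that index 250 is the announcement day
stock = np.load("stock_returns.npy")
benchmark = np.load("reit_index_returns.npy")
print(cumulative_abnormal_returns(stock, benchmark,
                                  est_window=slice(0, 200),
                                  event_window=slice(240, 261)))
```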
Abstract:
Prism is a modular classification rule generation method based on the 'separate and conquer' approach, an alternative to the rule induction approach using decision trees, also known as 'divide and conquer'. Prism often achieves a similar level of classification accuracy compared with decision trees, but tends to produce a more compact, noise-tolerant set of classification rules. As with other classification rule generation methods, a principal problem arising with Prism is that of overfitting due to over-specialised rules. In addition, over-specialised rules increase the associated computational complexity. These problems can be addressed by pruning methods. For the Prism method, two pruning algorithms have been introduced recently for reducing overfitting of classification rules: J-pruning and Jmax-pruning. Both algorithms are based on the J-measure, an information-theoretic means of quantifying the theoretical information content of a rule. Jmax-pruning attempts to exploit the J-measure to its full potential, because J-pruning does not actually achieve this and may even lead to underfitting. A series of experiments has shown that Jmax-pruning may outperform J-pruning in reducing overfitting. However, Jmax-pruning is computationally relatively expensive and may also lead to underfitting. This paper reviews the Prism method and the two existing pruning algorithms above. It also proposes a novel pruning algorithm called Jmid-pruning. The latter is based on the J-measure and reduces overfitting to a similar level as the other two algorithms, but is better at avoiding underfitting and unnecessary computational effort. The authors conduct an experimental study of the performance of the Jmid-pruning algorithm in terms of classification accuracy and computational efficiency. The algorithm is also evaluated comparatively with the J-pruning and Jmax-pruning algorithms.
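For reference, the J-measure on which all three pruning algorithms rely can be computed as in the sketch below (the pruning logic itself is omitted); the example probabilities are made up.

```python
# J-measure of a rule "IF x THEN y" (Smyth and Goodman's rule quality measure).
import math

def j_measure(p_x, p_y, p_y_given_x):
    """J(Y; X=x) = p(x) * [ p(y|x) log2(p(y|x)/p(y)) + (1-p(y|x)) log2((1-p(y|x))/(1-p(y))) ]."""
    def term(posterior, prior):
        return 0.0 if posterior == 0.0 else posterior * math.log2(posterior / prior)
    j_x = term(p_y_given_x, p_y) + term(1.0 - p_y_given_x, 1.0 - p_y)
    return p_x * j_x

# example: a rule covering 30% of the instances, class prior 0.5, rule confidence 0.9
print(j_measure(p_x=0.3, p_y=0.5, p_y_given_x=0.9))
```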
Abstract:
We evaluate the predictive power of leading indicators for output growth at horizons of up to one year. We use the MIDAS regression approach, as this allows us to combine multiple individual leading indicators in a parsimonious way and to directly exploit the information content of the monthly series to predict quarterly output growth. When we use real-time vintage data, the indicators are found to have significant predictive ability, and this is further enhanced by the use of monthly data on the quarter at the time the forecast is made.
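A minimal sketch of the MIDAS idea, assuming an exponential Almon lag polynomial (the paper's exact specification and data are not reproduced here): quarterly output growth is regressed on a weighted sum of monthly indicator lags, with the weights and regression coefficients estimated jointly by nonlinear least squares.

```python
# MIDAS regression sketch: quarterly growth on weighted monthly indicator lags.
import numpy as np
from scipy.optimize import minimize

def exp_almon_weights(theta1, theta2, n_lags):
    """Exponential Almon weights normalised to sum to one."""
    k = np.arange(1, n_lags + 1)
    w = np.exp(theta1 * k + theta2 * k ** 2)
    return w / w.sum()

def midas_rss(params, y, x_lags):
    beta0, beta1, t1, t2 = params
    w = exp_almon_weights(t1, t2, x_lags.shape[1])
    fitted = beta0 + beta1 * (x_lags @ w)
    return np.sum((y - fitted) ** 2)

# hypothetical data: y is quarterly output growth, X holds 12 monthly lags per quarter
y = np.load("quarterly_growth.npy")
X = np.load("monthly_indicator_lags.npy")          # shape (n_quarters, 12)
res = minimize(midas_rss, x0=[0.0, 1.0, 0.0, 0.0], args=(y, X), method="Nelder-Mead")
print(res.x)                                        # beta0, beta1, theta1, theta2
```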
Abstract:
Explaining the diversity of languages across the world is one of the central aims of typological, historical, and evolutionary linguistics. We consider the effect of language contact (the number of non-native speakers a language has) on the way languages change and evolve. By analysing hundreds of languages within and across language families, regions, and text types, we show that languages with greater levels of contact typically employ fewer word forms to encode the same information content (a property we refer to as lexical diversity). Based on three types of statistical analyses, we demonstrate that this variance can in part be explained by the impact of non-native speakers on information encoding strategies. Finally, we argue that languages are information encoding systems shaped by the varying needs of their speakers. Language evolution and change should be modeled as the co-evolution of multiple intertwined adaptive systems: on one hand, the structure of human societies and human learning capabilities, and on the other, the structure of language.
Abstract:
We utilized an ecosystem process model (SIPNET, simplified photosynthesis and evapotranspiration model) to estimate carbon fluxes of gross primary productivity and total ecosystem respiration of a high-elevation coniferous forest. The data assimilation routine incorporated aggregated twice-daily measurements of the net ecosystem exchange of CO2 (NEE) and satellite-based reflectance measurements of the fraction of absorbed photosynthetically active radiation (fAPAR) on an eight-day timescale. From these data we conducted a data assimilation experiment with fifteen different combinations of available data using twice-daily NEE, aggregated annual NEE, eight-day fAPAR, and average annual fAPAR. Model parameters were conditioned on three years of NEE and fAPAR data, and results were evaluated to determine the information content from the different combinations of data streams. Across the data assimilation experiments conducted, model selection metrics such as the Bayesian Information Criterion and the Deviance Information Criterion reached minimum values when assimilating average annual fAPAR and twice-daily NEE data. Wavelet coherence analyses showed higher correlations between measured and modeled fAPAR on longer timescales, ranging from 9 to 12 months. There were strong correlations between measured and modeled NEE (coefficient of determination, R2 = 0.86), but correlations between measured and modeled eight-day fAPAR were quite poor (R2 = −0.94). We conclude that this inability to reproduce fAPAR on the eight-day timescale would improve with consideration of the radiative transfer through the plant canopy. Modeled fluxes when assimilating average annual fAPAR and annual NEE were comparable to the corresponding results when assimilating twice-daily NEE, albeit with greater uncertainty. Our results support the conclusion that, for this coniferous forest, twice-daily NEE data are a critical measurement stream for the data assimilation. The results from this modeling exercise indicate that, for this coniferous forest, annual averages of satellite-based fAPAR measurements paired with annual NEE estimates may provide spatial detail to components of ecosystem carbon fluxes in the proximity of eddy covariance towers. Inclusion of other independent data streams in the assimilation would also reduce uncertainty on modeled values.
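As a small illustration of the model-selection step (not the study's code), the Bayesian Information Criterion can be computed from a Gaussian log-likelihood of the NEE residuals as below; the input files and parameter count are assumptions.

```python
# BIC = k*ln(n) - 2*ln(L) for a Gaussian fit of model-observation residuals.
import numpy as np

def gaussian_bic(obs, modelled, n_params):
    resid = obs - modelled
    n = resid.size
    sigma2 = np.mean(resid ** 2)                      # maximum-likelihood variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
    return n_params * np.log(n) - 2.0 * loglik

# hypothetical twice-daily NEE observations and the corresponding SIPNET output
nee_obs = np.load("nee_observed.npy")
nee_mod = np.load("nee_modelled.npy")
print(gaussian_bic(nee_obs, nee_mod, n_params=25))
```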
Abstract:
We present cross-validation of remote sensing measurements of methane profiles in the Canadian high Arctic. Accurate and precise measurements of methane are essential to understand quantitatively its role in the climate system and in global change. Here, we show a cross-validation between three datasets: two from spaceborne instruments and one from a ground-based instrument. All are Fourier Transform Spectrometers (FTSs). We consider the Canadian SCISAT Atmospheric Chemistry Experiment (ACE)-FTS, a solar occultation infrared spectrometer operating since 2004, and the thermal infrared band of the Japanese Greenhouse Gases Observing Satellite (GOSAT) Thermal And Near infrared Sensor for carbon Observation (TANSO)-FTS, a nadir/off-nadir scanning FTS instrument operating at solar and terrestrial infrared wavelengths, since 2009. The ground-based instrument is a Bruker 125HR Fourier Transform Infrared (FTIR) spectrometer, measuring mid-infrared solar absorption spectra at the Polar Environment Atmospheric Research Laboratory (PEARL) Ridge Lab at Eureka, Nunavut (80° N, 86° W) since 2006. For each pair of instruments, measurements are collocated within 500 km and 24 h. An additional criterion based on potential vorticity values was found not to significantly affect differences between measurements. Profiles are regridded to a common vertical grid for each comparison set. To account for differing vertical resolutions, ACE-FTS measurements are smoothed to the resolution of either PEARL-FTS or TANSO-FTS, and PEARL-FTS measurements are smoothed to the TANSO-FTS resolution. Differences for each pair are examined in terms of profiles and partial columns. During the period considered, the number of collocations for each pair is large enough to obtain a good sample size (from several hundred to tens of thousands depending on pair and configuration). Considering full profiles, the degrees of freedom for signal (DOFS) are between 0.2 and 0.7 for TANSO-FTS and between 1.5 and 3 for PEARL-FTS, while ACE-FTS has considerably more information (roughly one degree of freedom per altitude level). We take partial columns between roughly 5 and 30 km for the ACE-FTS–PEARL-FTS comparison, and between 5 and 10 km for the other pairs. The DOFS for the partial columns are between 1.2 and 2 for PEARL-FTS collocated with ACE-FTS, between 0.1 and 0.5 for PEARL-FTS collocated with TANSO-FTS or for TANSO-FTS collocated with either other instrument, while ACE-FTS has much higher information content. For all pairs, the partial column differences are within ±3 × 10^22 molecules cm−2. Expressed as median ± median absolute deviation (in absolute or relative terms), these differences are 0.11 ± 9.60 × 10^20 molecules cm−2 (0.012 ± 1.018 %) for TANSO-FTS–PEARL-FTS, −2.6 ± 2.6 × 10^21 molecules cm−2 (−1.6 ± 1.6 %) for ACE-FTS–PEARL-FTS, and 7.4 ± 6.0 × 10^20 molecules cm−2 (0.78 ± 0.64 %) for TANSO-FTS–ACE-FTS. The differences for ACE-FTS–PEARL-FTS and TANSO-FTS–PEARL-FTS partial columns decrease significantly as a function of PEARL partial columns, whereas the range of partial column values for TANSO-FTS–ACE-FTS collocations is too small to draw any conclusion on its dependence on ACE-FTS partial columns.
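The smoothing step mentioned above is commonly implemented with averaging-kernel smoothing in the sense of Rodgers (2000); the sketch below shows that calculation under the assumption that this is the approach used, with hypothetical input files.

```python
# Degrade a high-resolution CH4 profile to a coarser instrument's vertical resolution.
import numpy as np

def smooth_profile(x_high, x_apriori, averaging_kernel):
    """x_smoothed = x_a + A (x_high - x_a), after interpolation to a common grid."""
    return x_apriori + averaging_kernel @ (x_high - x_apriori)

# hypothetical inputs: ACE-FTS profile interpolated to the TANSO-FTS grid,
# plus the TANSO-FTS a priori profile and averaging-kernel matrix
x_ace = np.load("ace_ch4_on_tanso_grid.npy")
x_apriori = np.load("tanso_apriori.npy")
A = np.load("tanso_averaging_kernel.npy")
print(smooth_profile(x_ace, x_apriori, A))
```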
Abstract:
A challenge for the clinical management of Parkinson's disease (PD) is the large within- and between-patient variability in symptom profiles, as well as the emergence of motor complications, which represent a significant source of disability in patients. This thesis deals with the development and evaluation of methods and systems for supporting the management of PD by using repeated measures, consisting of subjective assessments of symptoms and objective assessments of motor function through fine motor tests (spirography and tapping), collected by means of a telemetry touch screen device. One aim of the thesis was to develop methods for the objective quantification and analysis of the severity of motor impairments represented in spiral drawings and tapping results. This was accomplished by first quantifying the digitized movement data with time series analysis and then using them in data-driven modelling to automate the assessment of symptom severity. The objective measures were then analysed with respect to subjective assessments of motor conditions. Another aim was to develop a method for providing information content comparable to clinical rating scales by combining subjective and objective measures into composite scores, using time series analysis and data-driven methods. The scores represent six symptom dimensions and an overall test score reflecting the global health condition of the patient. In addition, the thesis presents the development of a web-based system providing a visual representation of symptoms over time, allowing clinicians to remotely monitor the symptom profiles of their patients. The quality of the methods was assessed by reporting different metrics of validity, reliability and sensitivity to treatment interventions and natural PD progression over time. Results from two studies demonstrated that the methods developed for the fine motor tests had good metrics, indicating that they are appropriate for quantitatively and objectively assessing the severity of motor impairments of PD patients. The fine motor tests captured different symptoms: spiral drawing impairment and tapping accuracy related to dyskinesias (involuntary movements), whereas tapping speed related to bradykinesia (slowness of movements). A longitudinal data analysis indicated that the six symptom dimensions and the overall test score contained important elements of the information in the clinical scales and can be used to measure the effects of PD treatment interventions and disease progression. A usability evaluation of the web-based system showed that the information presented in the system was comparable to qualitative clinical observations, and the system was recognized as a tool that will assist in the management of patients.
Abstract:
The aim of this work is to evaluate the market's ability to predict future volatility from the information contained in Petrobras and Vale options, and to compare it with GARCH-type and EWMA models. Similar studies have been carried out in the US stock market, whether with a basket of selected stocks or with respect to the S&P 100 index, and the conclusions have been mixed: while for Canina and Figlewski (1993) 'implied volatility has virtually no correlation with future volatility', Christensen and Prabhala (1998) conclude that implied volatility is a good predictor of future volatility. In the Brazilian market, Andrade and Tabak (2001) use US dollar options to study the information content of the options market and compare the predictive power of implied volatility with moving-average and GARCH-type models. The authors conclude that implied volatility is a biased estimator of future volatility, but performs better than the statistical models. Gabe and Portugal (2003) compare the implied volatility of Telemar (TNLP4) options with GARCH-type statistical models; in that case implied volatility is also a biased estimator, whereas the statistical models are not only good predictors but also unbiased. The data for this work were collected throughout 2008 and early 2009, using intraday observations of the implied volatilities of at-the-money Petrobras and Vale options for the two nearest maturities. The implied volatility observed in the market for both assets contains relevant information about future volatility but, as in previous studies, proved to be biased. In the specific case of Petrobras, the GARCH model proved to be an efficient predictor of future volatility.
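For illustration, the two statistical benchmarks mentioned in this abstract can be sketched as follows: a RiskMetrics-style EWMA variance recursion and a GARCH(1,1) fit via the `arch` package. The return series, lambda value, and annualization convention are assumptions, not the study's exact setup.

```python
# EWMA and GARCH(1,1) volatility benchmarks for daily log returns (illustrative).
import numpy as np
from arch import arch_model

returns = np.load("daily_log_returns.npy") * 100      # returns in percent

# EWMA variance recursion: sigma2_t = lam * sigma2_{t-1} + (1 - lam) * r_{t-1}^2
lam, sigma2 = 0.94, np.var(returns)
for r in returns:
    sigma2 = lam * sigma2 + (1 - lam) * r ** 2
ewma_vol = np.sqrt(sigma2 * 252)                       # annualized volatility, percent

# GARCH(1,1) one-step-ahead variance forecast
garch = arch_model(returns, vol="Garch", p=1, q=1).fit(disp="off")
garch_var = garch.forecast(horizon=1).variance.values[-1, 0]
garch_vol = np.sqrt(garch_var * 252)

print(f"EWMA: {ewma_vol:.2f}%   GARCH(1,1): {garch_vol:.2f}%")
```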
Abstract:
We use the information content in the decisions of the NBER Business Cycle Dating Committee to construct coincident and leading indices of economic activity for the United States. We identify the coincident index by assuming that the coincident variables have a common cycle with the unobserved state of the economy, and that the NBER business cycle dates signify the turning points in the unobserved state. This model allows us to estimate our coincident index as a linear combination of the coincident series. We establish that our index performs better than other currently popular coincident indices of economic activity.