969 results for on-disk data layout


Relevance:

100.00%

Publisher:

Abstract:

R, available from http://www.r-project.org/, is ‘GNU S’: a language and environment for statistical computing and graphics. It is an environment in which many classical and modern statistical techniques have been implemented; there are eight standard packages, and many more are available through the CRAN family of Internet sites, http://cran.r-project.org. We have started to develop a library of R functions to support the analysis of mixtures, and our goal is a MixeR package for compositional data analysis that provides: operations on compositions (perturbation and power multiplication, subcomposition with or without residuals, centring of the data, and computation of Aitchison, Euclidean and Bhattacharyya distances and the compositional Kullback-Leibler divergence); graphical presentation of compositions in ternary diagrams and tetrahedra, with additional features such as the barycentre, the geometric mean of the data set, percentile lines, the marking and colouring of subsets of the data set and their geometric means, and annotation of individual data points; handling of zeros and missing values in compositional data sets, with R procedures for the simple and multiplicative replacement strategies; and time-series analysis of compositional data. We will present the current status of MixeR development and illustrate its use on selected data sets.
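The basic compositional operations listed above can be sketched in Python with NumPy (the function names are illustrative and are not MixeR's API, which is in R):

```python
import numpy as np

def closure(x):
    """Rescale a vector of positive parts so they sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum()

def perturb(x, y):
    """Aitchison perturbation: component-wise product, then closure."""
    return closure(np.asarray(x, dtype=float) * np.asarray(y, dtype=float))

def power(x, a):
    """Power multiplication: component-wise power, then closure."""
    return closure(np.asarray(x, dtype=float) ** a)

def aitchison_distance(x, y):
    """Euclidean distance between the centred log-ratio transforms."""
    lx, ly = np.log(x), np.log(y)
    clr_x = lx - lx.mean()
    clr_y = ly - ly.mean()
    return np.linalg.norm(clr_x - clr_y)
```

Closure keeps every result on the simplex, so perturbation and power multiplication play the roles that translation and scalar multiplication play in ordinary Euclidean space.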


Compositional random vectors are fundamental tools in the Bayesian analysis of categorical data. Many of the issues discussed with reference to the statistical analysis of compositional data have a natural counterpart in the construction of a Bayesian statistical model for categorical data. This note builds on the idea of cross-fertilization of the two areas recommended by Aitchison (1986) in his seminal book on compositional data. Particular emphasis is put on the problem of which parameterization to use.


A major obstacle to processing images of the ocean floor comes from the absorption and scattering effects of light in the aquatic environment. Due to the absorption of natural light, underwater vehicles often require artificial light sources attached to them to provide adequate illumination. Unfortunately, these flashlights tend to illuminate the scene in a nonuniform fashion and, as the vehicle moves, induce shadows in the scene. For this reason, the first step towards applying standard computer vision techniques to underwater imaging is dealing with these lighting problems. This paper analyses and compares existing methodologies for dealing with low-contrast, nonuniform illumination in underwater image sequences. The reviewed techniques include: (i) study of the illumination-reflectance model, (ii) local histogram equalization, (iii) homomorphic filtering, and (iv) subtraction of the illumination field. Several experiments on real data have been conducted to compare the different approaches.
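Of the techniques reviewed, homomorphic filtering can be sketched briefly: it assumes the illumination-reflectance model (image = illumination × reflectance), takes logarithms to make the two components additive, and attenuates the low frequencies where the slowly varying illumination field lives. The filter shape and parameter values below are illustrative, not those of any of the reviewed papers:

```python
import numpy as np

def homomorphic_filter(img, sigma=30.0, gamma_low=0.5, gamma_high=1.5):
    """Suppress slowly varying illumination, boost reflectance detail.

    log(img) separates illumination and reflectance additively;
    illumination dominates the low spatial frequencies.
    """
    log_img = np.log1p(img.astype(float))
    F = np.fft.fftshift(np.fft.fft2(log_img))
    rows, cols = img.shape
    u = np.arange(rows) - rows / 2
    v = np.arange(cols) - cols / 2
    D2 = u[:, None] ** 2 + v[None, :] ** 2  # squared distance from DC
    # Gaussian emphasis filter: gain gamma_low at DC, gamma_high far away
    H = gamma_low + (gamma_high - gamma_low) * (1 - np.exp(-D2 / (2 * sigma ** 2)))
    filtered = np.real(np.fft.ifft2(np.fft.ifftshift(F * H)))
    return np.expm1(filtered)
```

With gamma_low < 1 the illumination field is compressed, while gamma_high > 1 sharpens the reflectance detail that survives at higher frequencies.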


This lecture introduces an array of data sources that can be used to create new applications and visualisations, many examples of which are given. Additionally, there are a number of slides on open data standards, freedom of information requests and how to affect the future of open data.


Flood extent maps derived from SAR images are a useful source of data for validating hydraulic models of river flood flow. The accuracy of such maps is reduced by a number of factors, including changes in returns from the water surface caused by different meteorological conditions and the presence of emergent vegetation. The paper describes how improved accuracy can be achieved by modifying an existing flood extent delineation algorithm to use airborne laser altimetry (LiDAR) as well as SAR data. The LiDAR data provide an additional constraint that waterline (land-water boundary) heights should vary smoothly along the flooded reach. The method was tested on a SAR image of a flood for which contemporaneous aerial photography existed, together with LiDAR data of the un-flooded reach. Waterline heights of the SAR flood extent conditioned on both SAR and LiDAR data matched the corresponding heights from the aerial photo waterline significantly more closely than those from the SAR flood extent conditioned only on SAR data.
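The smoothness constraint on waterline heights can be illustrated in miniature (the reach geometry, noise level and the use of a polynomial fit are invented for illustration and are not the authors' algorithm): noisy heights sampled along the reach are regularized toward a smooth longitudinal profile, which pulls individual waterline points back toward the physically plausible water surface.

```python
import numpy as np

# Hypothetical waterline heights (m) sampled along a 10 km reach
chainage = np.linspace(0.0, 10.0, 50)            # km downstream
true_profile = 12.0 - 0.15 * chainage            # gently falling water surface
noisy = true_profile + np.random.default_rng(2).normal(0.0, 0.25, 50)

# Enforce smooth along-reach variation with a low-order polynomial fit
coeffs = np.polyfit(chainage, noisy, deg=2)
smoothed = np.polyval(coeffs, chainage)

rmse_raw = np.sqrt(np.mean((noisy - true_profile) ** 2))
rmse_smooth = np.sqrt(np.mean((smoothed - true_profile) ** 2))
```

Averaging many noisy samples into a few smooth-profile coefficients is what lets the LiDAR-conditioned waterline beat the SAR-only one: individual delineation errors no longer translate directly into height errors.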


This article reports on an investigation into the language learning beliefs of students of French in England, aged 16 to 18. It focuses on qualitative data from two groups of learners (10 in total). While both groups had broadly similar levels of achievement in French in terms of examination success, they differed greatly in the self-image they had of themselves as language learners, with one group displaying low levels of self-efficacy beliefs regarding the possibility of future success. The implications of such beliefs for students' levels of motivation and persistence are discussed, together with their possible causes. The article concludes by suggesting changes in classroom practice that might help students develop a more positive image of themselves as language learners.


This is a report on the data-mining of two chess databases, the objective being to compare their sub-7-man content with perfect play as documented in Nalimov endgame tables. Van der Heijden’s ENDGAME STUDY DATABASE IV is a definitive collection of 76,132 studies in which White should have an essentially unique route to the stipulated goal. Chessbase’s BIG DATABASE 2010 holds some 4.5 million games. Insight gained into both database content and data-mining has led to some delightful surprises and created a further agenda.


The polar winter stratospheric vortex is a coherent structure that undergoes different types of deformation, which can be revealed by geometric invariant moments. Three moments (the aspect ratio, the centroid latitude, and the area of the vortex), computed from stratospheric data from the 40-yr ECMWF Re-Analysis (ERA-40) project, are used to study sudden stratospheric warmings, together with hierarchical clustering combined with data image visualization techniques. Using the gap statistic, three optimal clusters are obtained based on the three geometric moments considered here. The 850-K potential vorticity field, as well as the vertical profiles of polar temperature and zonal wind, provides evidence that the clusters represent, respectively, the undisturbed (U), displaced (D), and split (S) states of the polar vortex. This systematic method for identifying and characterizing the state of the polar vortex using objective methods is useful as a tool for analyzing observations and as a test of climate models' ability to simulate the observations. The method correctly identifies all previously identified major warmings and also identifies significant minor warmings where the atmosphere is substantially disturbed but does not quite meet the criteria to qualify as a major stratospheric warming.
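The clustering step can be sketched with SciPy on synthetic moment data (the cluster centres and spreads below are invented; the gap statistic, which the authors use to choose the number of clusters, is omitted here and the cluster count simply fixed at three):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Synthetic daily moment vectors: [aspect ratio, centroid latitude, area]
undisturbed = rng.normal([1.2, 75.0, 1.0], [0.1, 2.0, 0.05], size=(50, 3))
displaced   = rng.normal([1.5, 60.0, 0.9], [0.1, 2.0, 0.05], size=(20, 3))
split       = rng.normal([3.5, 72.0, 0.8], [0.3, 2.0, 0.05], size=(10, 3))
X = np.vstack([undisturbed, displaced, split])

# Standardize features so centroid latitude does not dominate distances
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Ward hierarchical clustering, cut into three clusters (U, D, S)
labels = fcluster(linkage(Z, method="ward"), t=3, criterion="maxclust")
```

Standardizing before clustering matters here: the three moments live on very different scales, and without it the latitude axis would decide every merge.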


This paper describes a method for dynamic data reconciliation of nonlinear systems that are simulated using the sequential modular approach, and where individual modules are represented by a class of differential algebraic equations. The estimation technique consists of a bank of extended Kalman filters that are integrated with the modules. The paper reports a study based on experimental data obtained from a pilot scale mixing process.
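A single step of the kind of filter used in the bank can be sketched as a generic discrete-time extended Kalman filter (a textbook formulation, not the authors' implementation; the process and measurement models are passed in as callables and would be supplied by the individual modules):

```python
import numpy as np

def ekf_step(x, P, u, z, f, F_jac, h, H_jac, Q, R):
    """One predict/update cycle of a discrete-time extended Kalman filter.

    x, P : prior state estimate and covariance
    u, z : control input and measurement
    f, h : nonlinear process and measurement models
    F_jac, H_jac : their Jacobians, evaluated at the current estimate
    Q, R : process and measurement noise covariances
    """
    # Predict: propagate the estimate and covariance through the model
    x_pred = f(x, u)
    F = F_jac(x, u)
    P_pred = F @ P @ F.T + Q
    # Update: correct the prediction with the measurement
    H = H_jac(x_pred)
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - h(x_pred))
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```

In a sequential modular simulation, one such filter would be attached to each module and the filters executed in sequence as the modules exchange stream data, which is the integration idea the paper describes.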


Summary

1. In recent decades there have been population declines of many UK bird species, which have become the focus of intense research and debate. Recently, as the populations of potential predators have increased, there is concern that increased rates of predation may be contributing to the declines. In this review, we assess the methodologies behind the current published science on the impacts of predators on avian prey in the UK.

2. We identified suitable studies, classified these according to study design (experimental/observational) and assessed the quantity and quality of the data upon which any variation in predation rates was inferred. We then explored whether the underlying study methodology had implications for study outcome.

3. We reviewed 32 published studies and found that observational studies typically monitored significantly fewer predator species comprehensively than experimental studies did. Data for a difference in predator abundance from targeted (i.e. bespoke) census techniques were available for less than half of the 32 predator species studied.

4. The probability of a study detecting an impact on prey abundance was strongly and positively related to the quality and quantity of data upon which the gradient in predation rates was inferred.

5. The findings suggest that a study based on good-quality abundance data for a range of predator species is more likely to detect an effect than one that relies on opportunistic data for a smaller number of predators.

6. We recommend that findings from studies which use opportunistic data for a limited number of predator species should be treated with caution, and that future studies employ bespoke census techniques to monitor predator abundance for an appropriate suite of predators.


Aircraft Maintenance, Repair and Overhaul (MRO) agencies rely largely on raw-data-based quotation systems to select the best suppliers for their customers (airlines). Data quantity and quality thus become key to the success of an MRO job, since cost and quality benchmarks must be met. This paper introduces a data mining approach to create an MRO quotation system that enhances data quantity and data quality and enables significantly more precise MRO job quotations. Regular expressions were utilized to analyse descriptive textual feedback (i.e. engineers' reports) in order to extract more referable, highly normalised data for job quotation. A text-mining-based key-influencer analysis function enables the user to proactively select sub-parts, defects and possible solutions to make queries more accurate. Implementation results show that the system's data would improve cost quotation in 40% of MRO jobs and would reduce service cost without causing a drop in service quality.
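The regular-expression extraction step can be illustrated with a minimal sketch. The report wording, part-number format and field names below are invented for illustration; the paper's actual patterns are not given in the abstract:

```python
import re

# Hypothetical free-text engineer's report
report = (
    "Inspected part P/N 4711-A: corrosion found on flange, "
    "depth 0.3 mm. Recommended action: blend repair. "
    "Part P/N 9902-C: crack at rivet hole, replace part."
)

# Pair each part number with the defect description noted after it
pattern = re.compile(r"P/N\s+(?P<part>[\w-]+):\s+(?P<defect>[^,.]+)")

records = [m.groupdict() for m in pattern.finditer(report)]
```

Each match becomes a small normalised record ({'part': ..., 'defect': ...}) that can feed the quotation database, which is the sense in which regex turns descriptive feedback into "referable" data.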


Previous studies of the place of Property in the multi-asset portfolio have generally relied on historical data, and have been concerned with the supposed risk reduction effects that Property would have on such portfolios. In this paper a different approach has been taken. Not only are expectations data used, but we have also concentrated upon the required return that Property would have to offer to achieve a holding of 15% in typical UK pension fund portfolios. Using two benchmark portfolios for pension funds, we have shown that Property's required return is less than that expected, and therefore it could justify a 15% holding.


Large-scale bottom-up estimates of terrestrial carbon fluxes, whether based on models or inventory, are highly dependent on the assumed land cover. Most current land cover and land cover change maps are based on satellite data and are likely to be so for the foreseeable future. However, these maps show large differences, both at the class level and when transformed into Plant Functional Types (PFTs), and these can lead to large differences in terrestrial CO2 fluxes estimated by Dynamic Vegetation Models. In this study the Sheffield Dynamic Global Vegetation Model is used. We compare PFT maps and the resulting fluxes arising from the use of widely available moderate (1 km) resolution satellite-derived land cover maps (the Global Land Cover 2000 and several MODIS classification schemes), with fluxes calculated using a reference high (25 m) resolution land cover map specific to Great Britain (the Land Cover Map 2000). We demonstrate that uncertainty is introduced into carbon flux calculations by (1) incorrect or uncertain assignment of land cover classes to PFTs; (2) information loss at coarser resolutions; (3) difficulty in discriminating some vegetation types from satellite data. When averaged over Great Britain, modeled CO2 fluxes derived using the different 1 km resolution maps differ from estimates made using the reference map. The ranges of these differences are 254 gC m−2 a−1 in Gross Primary Production (GPP); 133 gC m−2 a−1 in Net Primary Production (NPP); and 43 gC m−2 a−1 in Net Ecosystem Production (NEP). In GPP this accounts for differences of −15.8% to 8.8%. Results for living biomass exhibit a range of 1109 gC m−2. The types of uncertainties due to land cover confusion are likely to be representative of many parts of the world, especially heterogeneous landscapes such as those found in western Europe.


Advances in hardware and software over the past decade have made it possible to capture, record and process fast data streams at a large scale. The research area of data stream mining has emerged as a consequence of these advances, in order to cope with the real-time analysis of potentially large and changing data streams. Examples of data streams include Google searches, credit card transactions, telemetric data and data from continuous chemical production processes. In some cases the data can be processed in batches by traditional data mining approaches. However, some applications require the data to be analysed in real time as soon as it is captured, for example when the data stream is infinite, fast changing, or simply too large to be stored. One of the most important data mining techniques on data streams is classification. This involves training the classifier on the data stream in real time and adapting it to concept drift. Most data stream classifiers are based on decision trees. However, it is well known in the data mining community that there is no single optimal algorithm: an algorithm may work well on one or several datasets but badly on others. This paper introduces eRules, a new rule-based adaptive classifier for data streams, based on an evolving set of rules. eRules induces a set of rules that is constantly evaluated and adapted to changes in the data stream by adding new rules and removing old ones. It differs from the more popular decision-tree-based classifiers in that it tends to leave data instances unclassified rather than forcing a classification that could be wrong. The ongoing development of eRules aims to improve its accuracy further through dynamic parameter setting, which will also address the problem of changing feature domain values.
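The evolving-rule idea can be sketched as a toy adaptive classifier. This is an illustrative reduction, not the eRules algorithm: here a rule simply maps one exact attribute value to a label, rules are scored on the stream, decayed rules are dropped, and an unmatched instance stays unclassified rather than being forced into a class. The threshold and rule form are invented:

```python
from collections import defaultdict

class ToyRuleStreamClassifier:
    """Minimal evolving-rule classifier sketch (not the eRules algorithm)."""

    def __init__(self, min_accuracy=0.6):
        self.min_accuracy = min_accuracy
        self.rules = {}                           # value -> predicted label
        self.stats = defaultdict(lambda: [0, 0])  # value -> [correct, seen]

    def predict(self, value):
        # None means "abstain": no rule covers this instance
        return self.rules.get(value)

    def learn(self, value, label):
        pred = self.rules.get(value)
        c, n = self.stats[value]
        self.stats[value] = [c + (pred == label), n + 1]
        if value not in self.rules:
            self.rules[value] = label     # induce a new rule
        elif n + 1 >= 5 and self.stats[value][0] / self.stats[value][1] < self.min_accuracy:
            del self.rules[value]         # drop a rule whose accuracy decayed
            self.stats[value] = [0, 0]
```

After a concept drift the old rule's running accuracy falls below the threshold, the rule is removed, and the next matching instance re-induces a rule with the new label, which is the add-new/remove-old adaptation loop in miniature.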


We investigate the error dynamics for cycled data assimilation systems, in which the inverse problem of state determination is solved at times t_k, k = 1, 2, 3, ..., with a first guess given by the state propagated via a dynamical system model from time t_{k-1} to time t_k. In particular, for nonlinear dynamical systems that are Lipschitz continuous with respect to their initial states, we provide deterministic estimates for the development of the error ||e_k|| := ||x_k^(a) − x_k^(t)|| between the estimated state x^(a) and the true state x^(t) over time. Clearly, an observation error of size δ > 0 leads to an estimation error in every assimilation step. These errors can accumulate if they are not (a) controlled in the reconstruction and (b) damped by the dynamical system under consideration. A data assimilation method is called stable if the error in the estimate is bounded in time by some constant C. The key task of this work is to provide estimates for the error ||e_k||, depending on the size δ of the observation error, the reconstruction operator R_α, the observation operator H, and the Lipschitz constants K^(1) and K^(2) controlling the damping behaviour of the dynamics on the lower and higher modes. We show that systems can be stabilized by choosing α sufficiently small, but the bound C will then depend on the data error δ in the form c||R_α||δ for some constant c. Since ||R_α|| → ∞ as α → 0, this constant might be large. Numerical examples of this behaviour in the nonlinear case are provided using a (low-dimensional) Lorenz '63 system.
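The error accumulation and damping described here can be illustrated with a toy scalar recursion (a linear stand-in for the Lorenz '63 experiments; the model multiplier, analysis weight and noise level are invented for illustration): each cycle the model amplifies the error by a factor λ > 1, and the analysis step damps it by a weight α while injecting observation noise of size δ.

```python
import numpy as np

def cycled_error(alpha, lam=1.05, delta=0.1, e0=0.5, steps=200, seed=1):
    """Evolve the analysis error e_k = (1 - alpha)*lam*e_{k-1} + alpha*noise.

    lam > 1: the free dynamics amplify errors; the analysis step damps
    them by (1 - alpha) and injects observation noise of size delta.
    """
    rng = np.random.default_rng(seed)
    e = e0
    history = []
    for _ in range(steps):
        e = (1 - alpha) * lam * e + alpha * rng.normal(0.0, delta)
        history.append(abs(e))
    return history

free_run = cycled_error(alpha=0.0)     # no assimilation: error grows like lam**k
assimilated = cycled_error(alpha=0.5)  # (1 - alpha)*lam < 1: error stays bounded
```

This mirrors the stability statement above: choosing α so that (1 − α)λ < 1 bounds the error in time, but the bound scales with the injected observation noise, the analogue of the c||R_α||δ term.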