33 resultados para Data analysis system
Resumo:
The influence matrix is used in ordinary least-squares applications for monitoring statistical multiple-regression analyses. Concepts related to the influence matrix provide diagnostics on the influence of individual data on the analysis - the analysis change that would occur by leaving one observation out, and the effective information content (degrees of freedom for signal) in any sub-set of the analysed data. In this paper, the corresponding concepts have been derived in the context of linear statistical data assimilation in numerical weather prediction. An approximate method to compute the diagonal elements of the influence matrix (the self-sensitivities) has been developed for a large-dimension variational data assimilation system (the four-dimensional variational system of the European Centre for Medium-Range Weather Forecasts). Results show that, in the boreal spring 2003 operational system, 15% of the global influence is due to the assimilated observations in any one analysis, and the complementary 85% is the influence of the prior (background) information, a short-range forecast containing information from earlier assimilated observations. About 25% of the observational information is currently provided by surface-based observing systems, and 75% by satellite systems. Low-influence data points usually occur in data-rich areas, while high-influence data points are in data-sparse areas or in dynamically active regions. Background-error correlations also play an important role: high correlation diminishes the observation influence and amplifies the importance of the surrounding real and pseudo observations (prior information in observation space). Incorrect specifications of background and observation-error covariance matrices can be identified, interpreted and better understood by the use of influence-matrix diagnostics for the variety of observation types and observed variables used in the data assimilation system. Copyright © 2004 Royal Meteorological Society
Resumo:
Increasingly, distributed systems are being used to host all manner of applications. While these platforms provide a relatively cheap and effective means of executing applications, so far there has been little work in developing tools and utilities that can help application developers understand problems with the supporting software, or the executing applications. To fully understand why an application executing on a distributed system is not behaving as would be expected it is important that not only the application, but also the underlying middleware, and the operating system are analysed too, otherwise issues could be missed and certainly overall performance profiling and fault diagnoses would be harder to understand. We believe that one approach to profiling and the analysis of distributed systems and the associated applications is via the plethora of log files generated at runtime. In this paper we report on a system (Slogger), that utilises various emerging Semantic Web technologies to gather the heterogeneous log files generated by the various layers in a distributed system and unify them in common data store. Once unified, the log data can be queried and visualised in order to highlight potential problems or issues that may be occurring in the supporting software or the application itself.
Resumo:
This article reflects on key methodological issues emerging from children and young people's involvement in data analysis processes. We outline a pragmatic framework illustrating different approaches to engaging children, using two case studies of children's experiences of participating in data analysis. The article highlights methods of engagement and important issues such as the balance of power between adults and children, training, support, ethical considerations, time and resources. We argue that involving children in data analysis processes can have several benefits, including enabling a greater understanding of children's perspectives and helping to prioritise children's agendas in policy and practice. (C) 2007 The Author(s). Journal compilation (C) 2007 National Children's Bureau.
Recent developments in genetic data analysis: what can they tell us about human demographic history?
Resumo:
Over the last decade, a number of new methods of population genetic analysis based on likelihood have been introduced. This review describes and explains the general statistical techniques that have recently been used, and discusses the underlying population genetic models. Experimental papers that use these methods to infer human demographic and phylogeographic history are reviewed. It appears that the use of likelihood has hitherto had little impact in the field of human population genetics, which is still primarily driven by more traditional approaches. However, with the current uncertainty about the effects of natural selection, population structure and ascertainment of single-nucleotide polymorphism markers, it is suggested that likelihood-based methods may have a greater impact in the future.
Resumo:
In this paper, we address issues in segmentation Of remotely sensed LIDAR (LIght Detection And Ranging) data. The LIDAR data, which were captured by airborne laser scanner, contain 2.5 dimensional (2.5D) terrain surface height information, e.g. houses, vegetation, flat field, river, basin, etc. Our aim in this paper is to segment ground (flat field)from non-ground (houses and high vegetation) in hilly urban areas. By projecting the 2.5D data onto a surface, we obtain a texture map as a grey-level image. Based on the image, Gabor wavelet filters are applied to generate Gabor wavelet features. These features are then grouped into various windows. Among these windows, a combination of their first and second order of statistics is used as a measure to determine the surface properties. The test results have shown that ground areas can successfully be segmented from LIDAR data. Most buildings and high vegetation can be detected. In addition, Gabor wavelet transform can partially remove hill or slope effects in the original data by tuning Gabor parameters.
Resumo:
The iRODS system, created by the San Diego Supercomputing Centre, is a rule oriented data management system that allows the user to create sets of rules to define how the data is to be managed. Each rule corresponds to a particular action or operation (such as checksumming a file) and the system is flexible enough to allow the user to create new rules for new types of operations. The iRODS system can interface to any storage system (provided an iRODS driver is built for that system) and relies on its’ metadata catalogue to provide a virtual file-system that can handle files of any size and type. However, some storage systems (such as tape systems) do not handle small files efficiently and prefer small files to be packaged up (or “bundled”) into larger units. We have developed a system that can bundle small data files of any type into larger units - mounted collections. The system can create collection families and contains its’ own extensible metadata, including metadata on which family the collection belongs to. The mounted collection system can work standalone and is being incorporated into the iRODS system to enhance the systems flexibility to handle small files. In this paper we describe the motivation for creating a mounted collection system, its’ architecture and how it has been incorporated into the iRODS system. We describe different technologies used to create the mounted collection system and provide some performance numbers.
Resumo:
The principle aim of this research is to elucidate the factors driving the total rate of return of non-listed funds using a panel data analytical framework. In line with previous results, we find that core funds exhibit lower yet more stable returns than value-added and, in particular, opportunistic funds, both cross-sectionally and over time. After taking into account overall market exposure, as measured by weighted market returns, the excess returns of value-added and opportunity funds are likely to stem from: high leverage, high exposure to development, active asset management and investment in specialized property sectors. A random effects estimation of the panel data model largely confirms the findings obtained from the fixed effects model. Again, the country and sector property effect shows the strongest significance in explaining total returns. The stock market variable is negative which hints at switching effects between competing asset classes. For opportunity funds, on average, the returns attributable to gearing are three times higher than those for value added funds and over five times higher than for core funds. Overall, there is relatively strong evidence indicating that country and sector allocation, style, gearing and fund size combinations impact on the performance of unlisted real estate funds.