908 resultados para Data Structure Operations
Resumo:
Exascale systems are the next frontier in high-performance computing and are expected to deliver a performance of the order of 10^18 operations per second using massive multicore processors. Very large- and extreme-scale parallel systems pose critical algorithmic challenges, especially related to concurrency, locality and the need to avoid global communication patterns. This work investigates a novel protocol for dynamic group communication that can be used to remove the global communication requirement and to reduce the communication cost in parallel formulations of iterative data mining algorithms. The protocol is used to provide a communication-efficient parallel formulation of the k-means algorithm for cluster analysis. The approach is based on a collective communication operation for dynamic groups of processes and exploits non-uniform data distributions. Non-uniform data distributions can be either found in real-world distributed applications or induced by means of multidimensional binary search trees. The analysis of the proposed dynamic group communication protocol has shown that it does not introduce significant communication overhead. The parallel clustering algorithm has also been extended to accommodate an approximation error, which allows a further reduction of the communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing elements.
Resumo:
The international response to SARS-CoV has produced an outstanding number of protein structures in a very short time. This review summarizes the findings of functional and structural studies including those derived from cryoelectron microscopy, small angle X-ray scattering, NMR spectroscopy, and X-ray crystallography, and incorporates bioinformatics predictions where no structural data is available. Structures that shed light on the function and biological roles of the proteins in viral replication and pathogenesis are highlighted. The high percentage of novel protein folds identified among SARS-CoV proteins is discussed.
Resumo:
Dispersion in the near-field region of localised releases in urban areas is difficult to predict because of the strong influence of individual buildings. Effects include upstream dispersion, trapping of material into building wakes and enhanced concentration fluctuations. As a result, concentration patterns are highly variable in time and mean profiles in the near field are strongly non-Gaussian. These aspects of near-field dispersion are documented by analysing data from direct numerical simulations in arrays of building-like obstacles and are related to the underlying flow structure. The mean flow structure around the buildings is found to exert a strong influence over the dispersion of material in the near field. Diverging streamlines around buildings enhance lateral dispersion. Entrainment of material into building wakes in the very near field gives rise to secondary sources, which then affect the subsequent dispersion pattern. High levels of concentration fluctuations are also found in this very near field; the fluctuation intensity is of order 2 to 5.
Resumo:
This paper presents and implements a number of tests for non-linear dependence and a test for chaos using transactions prices on three LIFFE futures contracts: the Short Sterling interest rate contract, the Long Gilt government bond contract, and the FTSE 100 stock index futures contract. While previous studies of high frequency futures market data use only those transactions which involve a price change, we use all of the transaction prices on these contracts whether they involve a price change or not. Our results indicate irrefutable evidence of non-linearity in two of the three contracts, although we find no evidence of a chaotic process in any of the series. We are also able to provide some indications of the effect of the duration of the trading day on the degree of non-linearity of the underlying contract. The trading day for the Long Gilt contract was extended in August 1994, and prior to this date there is no evidence of any structure in the return series. However, after the extension of the trading day we do find evidence of a non-linear return structure.
Resumo:
The Bollène-2002 Experiment was aimed at developing the use of a radar volume-scanning strategy for conducting radar rainfall estimations in the mountainous regions of France. A developmental radar processing system, called Traitements Régionalisés et Adaptatifs de Données Radar pour l’Hydrologie (Regionalized and Adaptive Radar Data Processing for Hydrological Applications), has been built and several algorithms were specifically produced as part of this project. These algorithms include 1) a clutter identification technique based on the pulse-to-pulse variability of reflectivity Z for noncoherent radar, 2) a coupled procedure for determining a rain partition between convective and widespread rainfall R and the associated normalized vertical profiles of reflectivity, and 3) a method for calculating reflectivity at ground level from reflectivities measured aloft. Several radar processing strategies, including nonadaptive, time-adaptive, and space–time-adaptive variants, have been implemented to assess the performance of these new algorithms. Reference rainfall data were derived from a careful analysis of rain gauge datasets furnished by the Cévennes–Vivarais Mediterranean Hydrometeorological Observatory. The assessment criteria for five intense and long-lasting Mediterranean rain events have proven that good quantitative precipitation estimates can be obtained from radar data alone within 100-km range by using well-sited, well-maintained radar systems and sophisticated, physically based data-processing systems. The basic requirements entail performing accurate electronic calibration and stability verification, determining the radar detection domain, achieving efficient clutter elimination, and capturing the vertical structure(s) of reflectivity for the target event. Radar performance was shown to depend on type of rainfall, with better results obtained with deep convective rain systems (Nash coefficients of roughly 0.90 for point radar–rain gauge comparisons at the event time step), as opposed to shallow convective and frontal rain systems (Nash coefficients in the 0.6–0.8 range). In comparison with time-adaptive strategies, the space–time-adaptive strategy yields a very significant reduction in the radar–rain gauge bias while the level of scatter remains basically unchanged. Because the Z–R relationships have not been optimized in this study, results are attributed to an improved processing of spatial variations in the vertical profile of reflectivity. The two main recommendations for future work consist of adapting the rain separation method for radar network operations and documenting Z–R relationships conditional on rainfall type.
Resumo:
Owing to continuous advances in the computational power of handheld devices like smartphones and tablet computers, it has become possible to perform Big Data operations including modern data mining processes onboard these small devices. A decade of research has proved the feasibility of what has been termed as Mobile Data Mining, with a focus on one mobile device running data mining processes. However, it is not before 2010 until the authors of this book initiated the Pocket Data Mining (PDM) project exploiting the seamless communication among handheld devices performing data analysis tasks that were infeasible until recently. PDM is the process of collaboratively extracting knowledge from distributed data streams in a mobile computing environment. This book provides the reader with an in-depth treatment on this emerging area of research. Details of techniques used and thorough experimental studies are given. More importantly and exclusive to this book, the authors provide detailed practical guide on the deployment of PDM in the mobile environment. An important extension to the basic implementation of PDM dealing with concept drift is also reported. In the era of Big Data, potential applications of paramount importance offered by PDM in a variety of domains including security, business and telemedicine are discussed.
Resumo:
Much UK research and market practice on portfolio strategy and performance benchmarking relies on a sector‐geography subdivision of properties. Prior tests of the appropriateness of such divisions have generally relied on aggregated or hypothetical return data. However, the results found in aggregate may not hold when individual buildings are considered. This paper makes use of a dataset of individual UK property returns. A series of multivariate exploratory statistical techniques are utilised to test whether the return behaviour of individual properties conforms to their a priori grouping. The results suggest strongly that neither standard sector nor regional classifications provide a clear demarcation of individual building performance. This has important implications for both portfolio strategy and performance measurement and benchmarking. However, there do appear to be size and yield effects that help explain return behaviour at the property level.
Resumo:
The purpose of this study was to develop an understanding of the current state of scientific data sharing that stakeholders could use to develop and implement effective data sharing strategies and policies. The study developed a conceptual model to describe the process of data sharing, and the drivers, barriers, and enablers that determine stakeholder engagement. The conceptual model was used as a framework to structure discussions and interviews with key members of all stakeholder groups. Analysis of data obtained from interviewees identified a number of themes that highlight key requirements for the development of a mature data sharing culture.
Resumo:
Automatic generation of classification rules has been an increasingly popular technique in commercial applications such as Big Data analytics, rule based expert systems and decision making systems. However, a principal problem that arises with most methods for generation of classification rules is the overfit-ting of training data. When Big Data is dealt with, this may result in the generation of a large number of complex rules. This may not only increase computational cost but also lower the accuracy in predicting further unseen instances. This has led to the necessity of developing pruning methods for the simplification of rules. In addition, classification rules are used further to make predictions after the completion of their generation. As efficiency is concerned, it is expected to find the first rule that fires as soon as possible by searching through a rule set. Thus a suit-able structure is required to represent the rule set effectively. In this chapter, the authors introduce a unified framework for construction of rule based classification systems consisting of three operations on Big Data: rule generation, rule simplification and rule representation. The authors also review some existing methods and techniques used for each of the three operations and highlight their limitations. They introduce some novel methods and techniques developed by them recently. These methods and techniques are also discussed in comparison to existing ones with respect to efficient processing of Big Data.
Resumo:
We present the first comprehensive intercomparison of currently available satellite ozone climatologies in the upper troposphere/lower stratosphere (UTLS) (300–70 hPa) as part of the Stratosphere-troposphere Processes and their Role in Climate (SPARC) Data Initiative. The Tropospheric Emission Spectrometer (TES) instrument is the only nadir-viewing instrument in this initiative, as well as the only instrument with a focus on tropospheric composition. We apply the TES observational operator to ozone climatologies from the more highly vertically resolved limb-viewing instruments. This minimizes the impact of differences in vertical resolution among the instruments and allows identification of systematic differences in the large-scale structure and variability of UTLS ozone. We find that the climatologies from most of the limb-viewing instruments show positive differences (ranging from 5 to 75%) with respect to TES in the tropical UTLS, and comparison to a “zonal mean” ozonesonde climatology indicates that these differences likely represent a positive bias for p ≤ 100 hPa. In the extratropics, there is good agreement among the climatologies regarding the timing and magnitude of the ozone seasonal cycle (differences in the peak-to-peak amplitude of <15%) when the TES observational operator is applied, as well as very consistent midlatitude interannual variability. The discrepancies in ozone temporal variability are larger in the tropics, with differences between the data sets of up to 55% in the seasonal cycle amplitude. However, the differences among the climatologies are everywhere much smaller than the range produced by current chemistry-climate models, indicating that the multiple-instrument ensemble is useful for quantitatively evaluating these models.
Resumo:
We present an intuitive geometric approach for analysing the structure and fragility of T1-weighted structural MRI scans of human brains. Apart from computing characteristics like the surface area and volume of regions of the brain that consist of highly active voxels, we also employ Network Theory in order to test how close these regions are to breaking apart. This analysis is used in an attempt to automatically classify subjects into three categories: Alzheimer’s disease, mild cognitive impairment and healthy controls, for the CADDementia Challenge.
Resumo:
The projected hand illusion (PHI) is a variant of the rubber hand illusion (RHI), and both are commonly used to study mechanisms of self-perception. A questionnaire was developed by Longo et al. (2008) to measure qualitative changes in the RHI. Such psychometric analyses have not yet been conducted on the questionnaire for the PHI. The present study is an attempt to validate minor modifications of the questionnaire of Longo et al. to assess the PHI in a community sample (n = 48) and to determine the association with selected demographic (age, sex, years of education), cognitive (Digit Span), and clinical (psychotic-like experiences) variables. Principal components analysis on the questionnaire data extracted four components: Embodiment of “Other” Hand, Disembodiment of Own Hand, Deafference, and Agency—in both synchronous and asynchronous PHI conditions. Questions assessing “Embodiment” and “Agency” loaded onto orthogonal components. Greater illusion ratings were positively associated with being female, being younger, and having higher scores on psychotic-like experiences. There was no association with cognitive performance. Overall, this study confirmed that self-perception as measured with PHI is a multicomponent construct, similar in many respects to the RHI. The main difference lies in the separation of Embodiment and Agency into separate constructs, and this likely reflects the fact that the “live” image of the PHI presents a more realistic picture of the hand and of the stroking movements of the experimenter compared with the RHI.
Resumo:
Data assimilation methods which avoid the assumption of Gaussian error statistics are being developed for geoscience applications. We investigate how the relaxation of the Gaussian assumption affects the impact observations have within the assimilation process. The effect of non-Gaussian observation error (described by the likelihood) is compared to previously published work studying the effect of a non-Gaussian prior. The observation impact is measured in three ways: the sensitivity of the analysis to the observations, the mutual information, and the relative entropy. These three measures have all been studied in the case of Gaussian data assimilation and, in this case, have a known analytical form. It is shown that the analysis sensitivity can also be derived analytically when at least one of the prior or likelihood is Gaussian. This derivation shows an interesting asymmetry in the relationship between analysis sensitivity and analysis error covariance when the two different sources of non-Gaussian structure are considered (likelihood vs. prior). This is illustrated for a simple scalar case and used to infer the effect of the non-Gaussian structure on mutual information and relative entropy, which are more natural choices of metric in non-Gaussian data assimilation. It is concluded that approximating non-Gaussian error distributions as Gaussian can give significantly erroneous estimates of observation impact. The degree of the error depends not only on the nature of the non-Gaussian structure, but also on the metric used to measure the observation impact and the source of the non-Gaussian structure.
Resumo:
Twitter is both a micro-blogging service and a platform for public conversation. Direct conversation is facilitated in Twitter through the use of @’s (mentions) and replies. While the conversational element of Twitter is of particular interest to the marketing sector, relatively few data-mining studies have focused on this area. We analyse conversations associated with reciprocated mentions that take place in a data-set consisting of approximately 4 million tweets collected over a period of 28 days that contain at least one mention. We ignore tweet content and instead use the mention network structure and its dynamical properties to identify and characterise Twitter conversations between pairs of users and within larger groups. We consider conversational balance, meaning the fraction of content contributed by each party. The goal of this work is to draw out some of the mechanisms driving conversation in Twitter, with the potential aim of developing conversational models.
Resumo:
The structure of a ferrofluid under the influence of an external magnetic field is expected to become anisotropic due to the alignment of the dipoles into the direction of the external field, and subsequently to the formation of particle chains due to the attractive head to tail orientations of the ferrofluid particles. Knowledge about the structure of a colloidal ferrofluid can be inferred from scattering data via the measurement of structure factors. We have used molecular-dynamics simulations to investigate the structure of both monodispersed and polydispersed ferrofluids. The results for the isotropic structure factor for monodispersed samples are similar to previous data by Camp and Patey that were obtained using an alternative Monte Carlo simulation technique, but in a different parameter region. Here we look in addition at bidispersed samples and compute the anisotropic structure factor by projecting the q vector onto the XY and XZ planes separately, when the magnetic field was applied along the z axis. We observe that the XY- plane structure factor as well as the pair distribution functions are quite different from those obtained for the XZ plane. Further, the two- dimensional structure factor patterns are investigated for both monodispersed and bidispersed samples under different conditions. In addition, we look at the scaling exponents of structure factors. Our results should be of value to interpret scattering data on ferrofluids obtained under the influence of an external field.