36 resultados para Data-stream balancing


Relevância:

30.00% 30.00%

Publicador:

Resumo:

In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, whereby no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiverinitiated load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening data set, where we were able to show close-to linear speedup in a network of workstations. The proposed approach also allows for dynamic resource aggregation in a non dedicated computational environment. These features make it suitable for large-scale, multi-domain, heterogeneous environments, such as computational grids.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We present a general Multi-Agent System framework for distributed data mining based on a Peer-to-Peer model. Agent protocols are implemented through message-based asynchronous communication. The framework adopts a dynamic load balancing policy that is particularly suitable for irregular search algorithms. A modular design allows a separation of the general-purpose system protocols and software components from the specific data mining algorithm. The experimental evaluation has been carried out on a parallel frequent subgraph mining algorithm, which has shown good scalability performances.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

One among the most influential and popular data mining methods is the k-Means algorithm for cluster analysis. Techniques for improving the efficiency of k-Means have been largely explored in two main directions. The amount of computation can be significantly reduced by adopting geometrical constraints and an efficient data structure, notably a multidimensional binary search tree (KD-Tree). These techniques allow to reduce the number of distance computations the algorithm performs at each iteration. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient k-Means variants in parallel computing environments. In this work, we provide a parallel formulation of the KD-Tree based k-Means algorithm for distributed memory systems and address its load balancing issue. Three solutions have been developed and tested. Two approaches are based on a static partitioning of the data set and a third solution incorporates a dynamic load balancing policy.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper presents a simple Bayesian approach to sample size determination in clinical trials. It is required that the trial should be large enough to ensure that the data collected will provide convincing evidence either that an experimental treatment is better than a control or that it fails to improve upon control by some clinically relevant difference. The method resembles standard frequentist formulations of the problem, and indeed in certain circumstances involving 'non-informative' prior information it leads to identical answers. In particular, unlike many Bayesian approaches to sample size determination, use is made of an alternative hypothesis that an experimental treatment is better than a control treatment by some specified magnitude. The approach is introduced in the context of testing whether a single stream of binary observations are consistent with a given success rate p(0). Next the case of comparing two independent streams of normally distributed responses is considered, first under the assumption that their common variance is known and then for unknown variance. Finally, the more general situation in which a large sample is to be collected and analysed according to the asymptotic properties of the score statistic is explored. Copyright (C) 2007 John Wiley & Sons, Ltd.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In the past decade, airborne based LIght Detection And Ranging (LIDAR) has been recognised by both the commercial and public sectors as a reliable and accurate source for land surveying in environmental, engineering and civil applications. Commonly, the first task to investigate LIDAR point clouds is to separate ground and object points. Skewness Balancing has been proven to be an efficient non-parametric unsupervised classification algorithm to address this challenge. Initially developed for moderate terrain, this algorithm needs to be adapted to handle sloped terrain. This paper addresses the difficulty of object and ground point separation in LIDAR data in hilly terrain. A case study on a diverse LIDAR data set in terms of data provider, resolution and LIDAR echo has been carried out. Several sites in urban and rural areas with man-made structure and vegetation in moderate and hilly terrain have been investigated and three categories have been identified. A deeper investigation on an urban scene with a river bank has been selected to extend the existing algorithm. The results show that an iterative use of Skewness Balancing is suitable for sloped terrain.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The high variability of the intensity of suprathermal electron flux in the solar wind is usually ascribed to the high variability of sources on the Sun. Here we demonstrate that a substantial amount of the variability arises from peaks in stream interaction regions, where fast wind runs into slow wind and creates a pressure ridge at the interface. Superposed epoch analysis centered on stream interfaces in 26 interaction regions previously identified in Wind data reveal a twofold increase in 250 eV flux (integrated over pitch angle). Whether the peaks result from the compression there or are solar signatures of the coronal hole boundary, to which interfaces may map, is an open question. Suggestive of the latter, some cases show a displacement between the electron and magnetic field peaks at the interface. Since solar information is transmitted to 1 AU much more quickly by suprathermal electrons compared to convected plasma signatures, the displacement may imply a shift in the coronal hole boundary through transport of open magnetic flux via interchange reconnection. If so, however, the fact that displacements occur in both directions and that the electron and field peaks in the superposed epoch analysis are nearly coincident indicate that any systematic transport expected from differential solar rotation is overwhelmed by a random pattern, possibly owing to transport across a ragged coronal hole boundary.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Hydrologic transport of dissolved organic carbon (DOC) from peat soils may differ to organo-mineral soils in how they responded to changes in flow, because of differences in soil profile and hydrology. In well-drained organo-mineral soils, low flow is through the lower mineral layer where DOC is absorbed and high flow is through the upper organic layer where DOC is produced. DOC concentrations in streams draining organo-mineral soils typically increase with flow. In saturated peat soils, both high and low flows are through an organic layer where DOC is produced. Therefore, DOC in stream water draining peat may not increase in response to changes in flow as there is no switch in flow path between a mineral and organic layer. To verify this, we conducted a high-resolution monitoring study of soil and stream water at an upland peat catchment in northern England. Our data showed a strong positive correlation between DOC concentrations at − 1 and − 5 cm depth and stream water, and weaker correlations between concentrations at − 20 to − 50 cm depth and stream water. Although near surface organic material appears to be the key source of stream water DOC in both peat and organo-mineral soils, we observed a negative correlation between stream flow and DOC concentrations instead of a positive correlation as DOC released from organic layers during low and high flow was diluted by rainfall. The differences in DOC transport processes between peat and organo-mineral soils have different implications for our understanding of long-term changes in DOC exports. While increased rainfall may cause an increase in DOC flux from peat due to an increase in water volume, it may cause a decrease in concentrations. This response is contrary to expected changes in DOC exports from organo-mineral soils, where increase rainfall is likely to result in an increase in flux and concentration.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Light Detection And Ranging (LIDAR) is an important modality in terrain and land surveying for many environmental, engineering and civil applications. This paper presents the framework for a recently developed unsupervised classification algorithm called Skewness Balancing for object and ground point separation in airborne LIDAR data. The main advantages of the algorithm are threshold-freedom and independence from LIDAR data format and resolution, while preserving object and terrain details. The framework for Skewness Balancing has been built in this contribution with a prediction model in which unknown LIDAR tiles can be categorised as “hilly” or “moderate” terrains. Accuracy assessment of the model is carried out using cross-validation with an overall accuracy of 95%. An extension to the algorithm is developed to address the overclassification issue for hilly terrain. For moderate terrain, the results show that from the classified tiles detached objects (buildings and vegetation) and attached objects (bridges and motorway junctions) are separated from bare earth (ground, roads and yards) which makes Skewness Balancing ideal to be integrated into geographic information system (GIS) software packages.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper examines two hydrochemical time-series derived from stream samples taken in the Upper Hafren catchment, Plynlimon, Wales. One time-series comprises data collected at 7-hour intervals over 22 months (Neal et al., submitted, this issue), while the other is based on weekly sampling over 20 years. A subset of determinands: aluminium, calcium, chloride, conductivity, dissolved organic carbon, iron, nitrate, pH, silicon and sulphate are examined within a framework of non-stationary time-series analysis to identify determinand trends, seasonality and short-term dynamics. The results demonstrate that both long-term and high-frequency monitoring provide valuable and unique insights into the hydrochemistry of a catchment. The long-term data allowed analysis of long-termtrends, demonstrating continued increases in DOC concentrations accompanied by declining SO4 concentrations within the stream, and provided new insights into the changing amplitude and phase of the seasonality of the determinands such as DOC and Al. Additionally, these data proved invaluable for placing the short-term variability demonstrated within the high-frequency data within context. The 7-hour data highlighted complex diurnal cycles for NO3, Ca and Fe with cycles displaying changes in phase and amplitude on a seasonal basis. The high-frequency data also demonstrated the need to consider the impact that the time of sample collection can have on the summary statistics of the data and also that sampling during the hours of darkness provides additional hydrochemical information for determinands which exhibit pronounced diurnal variability. Moving forward, this research demonstrates the need for both long-term and high-frequency monitoring to facilitate a full and accurate understanding of catchment hydrochemical dynamics.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Basic Network transactions specifies that datagram from source to destination is routed through numerous routers and paths depending on the available free and uncongested paths which results in the transmission route being too long, thus incurring greater delay, jitter, congestion and reduced throughput. One of the major problems of packet switched networks is the cell delay variation or jitter. This cell delay variation is due to the queuing delay depending on the applied loading conditions. The effect of delay, jitter accumulation due to the number of nodes along transmission routes and dropped packets adds further complexity to multimedia traffic because there is no guarantee that each traffic stream will be delivered according to its own jitter constraints therefore there is the need to analyze the effects of jitter. IP routers enable a single path for the transmission of all packets. On the other hand, Multi-Protocol Label Switching (MPLS) allows separation of packet forwarding and routing characteristics to enable packets to use the appropriate routes and also optimize and control the behavior of transmission paths. Thus correcting some of the shortfalls associated with IP routing. Therefore MPLS has been utilized in the analysis for effective transmission through the various networks. This paper analyzes the effect of delay, congestion, interference, jitter and packet loss in the transmission of signals from source to destination. In effect the impact of link failures, repair paths in the various physical topologies namely bus, star, mesh and hybrid topologies are all analyzed based on standard network conditions.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A weekly programme of water quality monitoring has been conducted by Slapton Ley Field Centre since 1970. Samples have been collected for the four main streams draining into Slapton Ley, from the Ley itself and from other sites within the catchment. On occasions, more frequent sampling has been undertaken during short-term research projects, usually in relation to nutrient export from the catchment. These water quality data, unparalleled in length for a series of small drainage basins in the British Isles, provide a unique resource for analysis of spatial and temporal variations in stream water quality within an agricultural area. Not surprisingly, given the eutrophic status of the Ley, most attention has focused on the nutrients nitrate and phosphate. A number of approaches to modelling nutrient loss have been attempted, including time series analysis and the application of nutrient export and physically-based models.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Meteorological (met) station data is used as the basis for a number of influential studies into the impacts of the variability of renewable resources. Real turbine output data is not often easy to acquire, whereas meteorological wind data, supplied at a standardised height of 10 m, is widely available. This data can be extrapolated to a standard turbine height using the wind profile power law and used to simulate the hypothetical power output of a turbine. Utilising a number of met sites in such a manner can develop a model of future wind generation output. However, the accuracy of this extrapolation is strongly dependent on the choice of the wind shear exponent alpha. This paper investigates the accuracy of the simulated generation output compared to reality using a wind farm in North Rhins, Scotland and a nearby met station in West Freugh. The results show that while a single annual average value for alpha may be selected to accurately represent the long term energy generation from a simulated wind farm, there are significant differences between simulation and reality on an hourly power generation basis, with implications for understanding the impact of variability of renewables on short timescales, particularly system balancing and the way that conventional generation may be asked to respond to a high level of variable renewable generation on the grid in the future.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Global communicationrequirements andloadimbalanceof someparalleldataminingalgorithms arethe major obstacles to exploitthe computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication costin parallel data mining algorithms and, in particular, in the k-means algorithm for cluster analysis. In the straightforward parallel formulation of the k-means algorithm, data and computation loads are uniformly distributed over the processing nodes. This approach has excellent load balancing characteristics that may suggest it could scale up to large and extreme-scale parallel computing systems. However, at each iteration step the algorithm requires a global reduction operationwhichhinders thescalabilityoftheapproach.Thisworkstudiesadifferentparallelformulation of the algorithm where the requirement of global communication is removed, while maintaining the same deterministic nature ofthe centralised algorithm. The proposed approach exploits a non-uniform data distribution which can be either found in real-world distributed applications or can be induced by means ofmulti-dimensional binary searchtrees. The approachcanalso be extended to accommodate an approximation error which allows a further reduction ofthe communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing element

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The use of ageostrophic flow to infer the presence of vertical circulations in the entrances and exits of the climatological jet streams is questioned. Problems of interpretation arise because of the use of different definitions of geostrophy in theoretical studies and in analyses of atmospheric data. The nature and role of the ageostrophic flow based on constant and variable Coriolis parameter definitions of geostrophy vary. In the latter the geostrophic divergence cannot be neglected, so the vertical motion is not associated solely with the ageostrophic flow. Evidence is presented suggesting that ageostrophic flow in the climatological jet streams is primarily determined by the kinematic requirements of wave retrogression rather than by a forcing process. These requirements are largely met by the rotational flow, with the divergent circulations present being geostrophically forced, and so playing a secondary, restoring role.

Relevância:

30.00% 30.00%

Publicador: