814 results for "data gathering algorithm"


Relevance: 30.00%

Publisher:

Abstract:

Map-matching algorithms that utilise road segment connectivity along with other data (i.e. position, speed and heading) in the process of map-matching are normally suitable for high-frequency (1 Hz or higher) positioning data from GPS. When such map-matching algorithms are applied to low-frequency data (such as data from a fleet of private cars, buses, light-duty vehicles or smartphones), their performance drops to around 70% correct link identification, especially in urban and suburban road networks. This level of performance may be insufficient for some real-time Intelligent Transport System (ITS) applications and services, such as estimating link travel time and speed from low-frequency GPS data. Therefore, this paper develops a new weight-based shortest path and vehicle trajectory aided map-matching (stMM) algorithm that enhances the map-matching of low-frequency positioning data on a road map. The well-known A* search algorithm is employed to derive the shortest path between two points while taking into account both link connectivity and turn restrictions at junctions. In the developed stMM algorithm, two additional weights related to the shortest path and vehicle trajectory are considered: one shortest path-based weight relates the distance along the shortest path to the distance along the vehicle trajectory, while the other is associated with the heading difference of the vehicle trajectory. The developed stMM algorithm is tested on a series of real-world datasets of varying frequencies (1 s, 5 s, 30 s and 60 s sampling intervals). A high-accuracy integrated navigation system (a high-grade inertial navigation system and a carrier-phase GPS receiver) is used to measure the accuracy of the developed algorithm. The results suggest that the algorithm identifies 98.9% of the links correctly for 30 s GPS data. Omitting the shortest path and vehicle trajectory information reduces the accuracy to about 73% correct link identification. The algorithm can process on average 50 positioning fixes per second, making it suitable for real-time ITS applications and services.
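As a rough sketch of the weighting idea described above, a candidate link for a GPS fix can be scored by combining proximity, heading agreement, and the agreement between shortest-path and trajectory distances. The weights and functional forms below are illustrative assumptions, not the paper's:

```python
import math

def candidate_score(perp_dist, heading_diff_deg, sp_dist, traj_dist,
                    w_p=1.0, w_h=1.0, w_s=1.0):
    """Score a candidate road link for one GPS fix (higher is better).

    perp_dist        -- perpendicular distance from the fix to the link (m)
    heading_diff_deg -- |vehicle heading - link bearing| in degrees
    sp_dist          -- shortest-path distance from the previously matched
                        point, e.g. from an A* search over the road graph (m)
    traj_dist        -- distance travelled along the vehicle trajectory (m)
    """
    proximity = 1.0 / (1.0 + perp_dist)                   # closer is better
    heading = math.cos(math.radians(heading_diff_deg))    # aligned is better
    # For the correct link sequence, the trajectory length should be close
    # to the shortest-path length between consecutive matched points.
    agreement = 1.0 - abs(sp_dist - traj_dist) / max(sp_dist, traj_dist, 1.0)
    return w_p * proximity + w_h * heading + w_s * agreement

# Choose the best of two candidate links for a low-frequency fix:
candidates = [
    {"link": "A", "perp_dist": 8.0, "heading_diff_deg": 5.0,
     "sp_dist": 410.0, "traj_dist": 395.0},
    {"link": "B", "perp_dist": 6.0, "heading_diff_deg": 80.0,
     "sp_dist": 650.0, "traj_dist": 395.0},
]
best = max(candidates, key=lambda c: candidate_score(
    c["perp_dist"], c["heading_diff_deg"], c["sp_dist"], c["traj_dist"]))
print(best["link"])  # -> "A"
```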

Relevance: 30.00%

Publisher:

Abstract:

The proliferation of the web presents an unsolved problem of automatically analyzing billions of pages of natural language. We introduce a scalable algorithm that clusters hundreds of millions of web pages into hundreds of thousands of clusters. It does this on a single mid-range machine using efficient algorithms and compressed document representations. It is applied to two web-scale crawls covering tens of terabytes: ClueWeb09 and ClueWeb12 contain 500 and 733 million web pages respectively, and were clustered into 500,000 to 700,000 clusters. To the best of our knowledge, such fine-grained clustering has not been previously demonstrated. Previous approaches clustered a sample, which limits the maximum number of discoverable clusters. The proposed EM-tree algorithm uses the entire collection in clustering and produces several orders of magnitude more clusters than existing algorithms. Fine-grained clustering is necessary for meaningful clustering of massive collections, where the number of distinct topics grows linearly with collection size. These fine-grained clusters show improved cluster quality when assessed with two novel evaluations that use ad hoc search relevance judgments and spam classifications for external validation. These evaluations solve the problem of assessing cluster quality where categorical labeling is unavailable or infeasible.
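The EM-tree itself applies k-means-style updates at every node of an m-way tree of cluster means; the flat, single-level version below is a minimal sketch of that update on random stand-in vectors, not the paper's implementation:

```python
import numpy as np

def kmeans_step(X, means):
    """One EM-style update: assign each document vector to its nearest
    mean (E-step), then recompute the means (M-step)."""
    dists = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    assign = dists.argmin(axis=1)
    for k in range(len(means)):
        members = X[assign == k]
        if len(members):
            means[k] = members.mean(axis=0)
    return assign, means

rng = np.random.default_rng(0)
X = rng.random((1000, 32))     # stand-in for compressed document vectors
means = X[rng.choice(len(X), 8, replace=False)].copy()
for _ in range(5):
    assign, means = kmeans_step(X, means)
```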

Relevance: 30.00%

Publisher:

Abstract:

Distributed systems are widely used for solving large-scale and data-intensive computing problems, including all-to-all comparison (ATAC) problems. However, when used for ATAC problems, existing computational frameworks such as Hadoop focus on load balancing when allocating comparison tasks, without careful consideration of data distribution and storage usage. While Hadoop-based solutions offer simplicity of implementation, their inherent MapReduce computing pattern does not match the ATAC pattern, leading to load imbalance and poor data locality when Hadoop's data distribution strategy is used for ATAC problems. Here we present a data distribution strategy that considers data locality, load balancing and storage savings for ATAC computing problems in homogeneous distributed systems. A simulated annealing algorithm is developed for data distribution and task scheduling. Experimental results show a significant performance improvement for our approach over Hadoop-based solutions.
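A minimal sketch of the simulated-annealing placement idea, with a toy cost function (pure load balancing); the paper's cost model, which also covers data locality and storage, is not reproduced here:

```python
import math
import random

def anneal_placement(num_items, num_nodes, cost, steps=10000, t0=1.0):
    """Assign data items to nodes by simulated annealing: perturb one
    item's placement per step and accept worse placements with a
    probability that shrinks as the temperature cools."""
    place = [random.randrange(num_nodes) for _ in range(num_items)]
    cur = cost(place)
    best, best_cost = place[:], cur
    for step in range(steps):
        t = t0 * (1 - step / steps) + 1e-9          # linear cooling
        i = random.randrange(num_items)
        old = place[i]
        place[i] = random.randrange(num_nodes)
        new = cost(place)
        if new < cur or random.random() < math.exp((cur - new) / t):
            cur = new
            if new < best_cost:
                best, best_cost = place[:], new
        else:
            place[i] = old                          # reject the move
    return best

def load_variance(place, num_nodes=4):
    """Toy cost: variance of per-node load."""
    loads = [place.count(n) for n in range(num_nodes)]
    mean = sum(loads) / num_nodes
    return sum((l - mean) ** 2 for l in loads)

best = anneal_placement(40, 4, lambda p: load_variance(p, 4))
```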

Relevance: 30.00%

Publisher:

Abstract:

Network topology and routing are two important factors in determining the communication costs of big data applications at large scale. For a given Cluster, Cloud, or Grid (CCG) system, the network topology is fixed, and static or dynamic routing protocols are preinstalled to direct the network traffic; users cannot change them once the system is deployed. Hence, it is hard for application developers to identify the optimal network topology and routing algorithm for their applications with distinct communication patterns. In this study, we design a CCG virtual system (CCGVS), which first uses container-based virtualization to allow users to create a farm of lightweight virtual machines on a single host. It then uses software-defined networking (SDN) to control the network traffic among these virtual machines. Users can change the network topology and control the network traffic programmatically, thereby enabling application developers to evaluate their applications on the same system with different network topologies and routing algorithms. Preliminary experimental results with both synthetic big data programs and the NPB benchmarks show that CCGVS can reproduce the application performance variations caused by network topology and routing algorithm.
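CCGVS is the authors' own system; as an illustration of the same container-plus-SDN idea, the sketch below builds a programmable topology with Mininet, a comparable open-source tool that emulates hosts in lightweight containers under an SDN-controlled virtual network. The topology and its parameters are assumptions for the example:

```python
from mininet.net import Mininet
from mininet.topo import Topo

class LineTopo(Topo):
    """n switches in a line, one host per switch; editing this class is
    all it takes to evaluate the same application on another topology."""
    def build(self, n=4):
        last = None
        for i in range(n):
            switch = self.addSwitch('s%d' % (i + 1))
            self.addLink(self.addHost('h%d' % (i + 1)), switch)
            if last:
                self.addLink(last, switch)
            last = switch

net = Mininet(topo=LineTopo(n=4))
net.start()
net.pingAll()      # exercise the emulated network
net.stop()
```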

Relevance: 30.00%

Publisher:

Abstract:

This technical report describes a Light Detection and Ranging (LiDAR)-augmented methodology for optimal path planning at low-level flight for remote sensing and sampling Unmanned Aerial Vehicles (UAVs). The UAV is used to perform remote air sampling and data acquisition from a network of sensors on the ground. The terrain data, in the form of 3D point-cloud maps, is processed by the algorithms to find an optimal path. The results show that the method and algorithm are able to use the LiDAR data to avoid obstacles when planning a path from a start point to a target point. The report compares the performance of the method as the resolution of the LiDAR map is increased and when a Digital Elevation Model (DEM) is included. From a practical point of view, the optimal path plan loads and works seamlessly with the UAV ground station, and the report also shows the UAV ground station software augmented with the more accurate LiDAR data.
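The report's planner works on 3D point clouds; as a simplified 2D stand-in, the sketch below runs A* over an occupancy grid (obstacle cells marked True), which is a standard way LiDAR-derived maps feed a path planner. The grid, costs and heuristic are assumptions for the example:

```python
import heapq

def astar(grid, start, goal):
    """A* over a 2D occupancy grid; returns a list of cells or None."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan
    frontier = [(h(start), 0, start, None)]
    came, g = {}, {start: 0}
    while frontier:
        _, cost, cur, parent = heapq.heappop(frontier)
        if cur in came:
            continue                        # already expanded
        came[cur] = parent
        if cur == goal:                     # walk parents back to the start
            path = []
            while cur is not None:
                path.append(cur)
                cur = came[cur]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and not grid[nxt[0]][nxt[1]] and cost + 1 < g.get(nxt, 1e18)):
                g[nxt] = cost + 1
                heapq.heappush(frontier, (cost + 1 + h(nxt), cost + 1, nxt, cur))
    return None

grid = [[False] * 6 for _ in range(6)]
for r in range(1, 5):
    grid[r][3] = True                       # a wall the path must go around
print(astar(grid, (0, 0), (5, 5)))
```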

Relevance: 30.00%

Publisher:

Abstract:

In this paper, we first recast the generalized symmetric eigenvalue problem, where the underlying matrix pencil consists of symmetric positive definite matrices, into an unconstrained minimization problem by constructing an appropriate cost function. We then extend it to the case of multiple eigenvectors using an inflation technique. Based on this asymptotic formulation, we derive a quasi-Newton adaptive algorithm for estimating the required generalized eigenvectors directly from data. The resulting algorithm is modular and parallel, and it is globally convergent with probability one. We also analyze the effect of inexact inflation on the convergence of this algorithm, and that of inexact knowledge of one of the matrices in the pencil on the resulting eigenstructure. Simulation results demonstrate that the performance of this algorithm is almost identical to that of the rank-one updating algorithm of Karasalo. Further, the performance of the proposed algorithm remains stable over more than 1 million updates without suffering from error accumulation.
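The flavor of the formulation can be seen in a toy version: minimize the Rayleigh quotient x^T A x / x^T B x by plain gradient descent to obtain the smallest generalized eigenpair. The paper's algorithm uses a quasi-Newton update, an inflation step for further eigenvectors, and sample-by-sample adaptation; none of that is reproduced in this sketch:

```python
import numpy as np

def min_generalized_eigpair(A, B, iters=5000, lr=0.01):
    """Gradient descent on the Rayleigh quotient of the pencil (A, B)."""
    x = np.random.default_rng(0).standard_normal(A.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(iters):
        Ax, Bx = A @ x, B @ x
        lam = (x @ Ax) / (x @ Bx)                # current Rayleigh quotient
        grad = 2.0 * (Ax - lam * Bx) / (x @ Bx)  # its gradient in x
        x -= lr * grad
        x /= np.linalg.norm(x)                   # keep the iterate bounded
    return lam, x

rng = np.random.default_rng(1)
M, N = rng.standard_normal((5, 5)), rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)                      # symmetric positive definite
B = N @ N.T + 5 * np.eye(5)
lam, x = min_generalized_eigpair(A, B)           # compare: scipy.linalg.eigh(A, B)
```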

Relevance: 30.00%

Publisher:

Abstract:

A method for reconstructing an object f(x), x = (x, y, z), from a limited set of cone-beam projection data has been developed. The method uses a modified form of convolution back-projection together with projection onto convex sets (POCS) to handle the limited (or incomplete) data problem. In cone-beam tomography, a complete scanning geometry is needed to reconstruct the original three-dimensional object exactly. While complete geometries do exist, they are of little use in practical implementations; the most common trajectory used in practical scanners is circular, which is incomplete. It is, however, possible to recover some of the information in the original signal f(x) based on a priori knowledge of the nature of f(x). If this knowledge can be posed in a convex set framework, then POCS can be utilized. In this report, we express this a priori knowledge as convex set constraints and reconstruct f(x) using POCS. While we demonstrate the effectiveness of our algorithm for circular trajectories, it is essentially geometry independent and will be useful in any limited-view cone-beam reconstruction.
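The POCS component reduces to a simple iteration: cycle through the projection operator of each convex constraint set. Below is a minimal sketch with two typical a priori constraints (non-negativity and known object support); the report's data-consistency projection and modified convolution back-projection are not reproduced:

```python
import numpy as np

def pocs(x0, projectors, iters=50):
    """Repeatedly apply each convex-set projector in turn."""
    x = x0.copy()
    for _ in range(iters):
        for project in projectors:
            x = project(x)
    return x

vol = np.random.default_rng(0).standard_normal((16, 16, 16))
mask = np.zeros_like(vol)
mask[4:12, 4:12, 4:12] = 1                       # known object support
nonneg = lambda x: np.maximum(x, 0)              # intensities are non-negative
result = pocs(vol, [nonneg, lambda x: x * mask])
```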

Relevance: 30.00%

Publisher:

Abstract:

Recognizing similarities and deriving relationships among protein molecules is a fundamental requirement in present-day biology. Similarities can be present at various levels and can be detected through comparison of protein sequences or their structural folds. In some cases, similarities obscured at these levels may be present merely in the substructures at their binding sites. Inferring functional similarities between protein molecules by comparing their binding sites is still largely exploratory and not yet a routine protocol. One of the main reasons for this is the limited choice of analytical tools that can compare binding sites with high sensitivity. To benefit from the enormous amount of structural data that is being rapidly accumulated, it is essential to have high-throughput tools that enable large-scale binding site comparison. Results: Here we present a new algorithm, PocketMatch, for comparing binding sites in a frame-invariant manner. Each binding site is represented by 90 lists of sorted distances capturing the shape and chemical nature of the site. The sorted arrays are then aligned using an incremental alignment method and scored to obtain PMScores for pairs of sites. A comprehensive sensitivity analysis and an extensive validation of the algorithm have been carried out, and a comparison with other site-matching algorithms is also presented. Perturbation studies, in which the geometry of a given site was retained but the residue types were changed randomly, indicated that chance similarities were virtually non-existent. Our analysis also demonstrates that shape information alone is insufficient to discriminate between diverse binding sites unless combined with the chemical nature of the amino acids. Conclusion: A new algorithm has been developed to compare binding sites in an accurate, efficient and high-throughput manner. Though the representation used is conceptually simple, we demonstrate that, along with the new alignment strategy, it is sufficient to enable binding-site comparison with high sensitivity. A novel methodology has also been presented for validating the algorithm's accuracy and sensitivity with respect to the geometry and chemical nature of a site. The method is also fast, taking about 1/250th of a second per comparison on a single processor. A parallel version on BlueGene has also been implemented.
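The core representation-and-alignment step is easy to sketch: build sorted distance lists for two sites and align them incrementally, counting matches within a tolerance. PocketMatch builds 90 such lists (one per pairing of chemical group types) and computes a weighted PMScore; this single-list toy uses assumed parameters:

```python
import numpy as np

def sorted_distances(coords):
    """All pairwise distances within one site, sorted ascending."""
    n = len(coords)
    return sorted(np.linalg.norm(coords[i] - coords[j])
                  for i in range(n) for j in range(i + 1, n))

def align_score(a, b, tol=0.5):
    """Greedy incremental alignment of two sorted lists: fraction of
    elements that can be matched within `tol` angstroms."""
    i = j = matches = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tol:
            matches += 1; i += 1; j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return matches / max(len(a), len(b), 1)

rng = np.random.default_rng(0)
site1 = rng.random((10, 3)) * 10                 # toy atom coordinates (A)
site2 = site1 + rng.normal(0, 0.1, site1.shape)  # a slightly perturbed copy
print(align_score(sorted_distances(site1), sorted_distances(site2)))
```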

Relevance: 30.00%

Publisher:

Abstract:

This paper presents a new approach for assessing power system voltage stability based on an artificial feed-forward neural network (FFNN). The approach uses real and reactive power, as well as voltage vectors for generators and load buses, to train the neural network (NN). The input properties of the NN are generated from offline training data over various simulated loading conditions using a conventional voltage stability algorithm based on the L-index. The performance of the trained NN is investigated on two systems under various voltage stability assessment conditions. The main advantage of the proposed approach is that it is fast, robust and accurate, and can be used online to predict the L-indices of all the power system buses simultaneously. The method can also be used effectively to determine local and global stability margins for further improvement measures.
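A minimal sketch of the train-offline/predict-online pattern using scikit-learn's MLPRegressor on simulated stand-in data; the bus count, features and network size are assumptions, and the targets would in practice come from the conventional L-index algorithm:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n_cases, n_bus = 500, 14
X = rng.random((n_cases, 3 * n_bus))   # P, Q and |V| per bus per loading case
y = rng.random((n_cases, n_bus))       # offline-computed L-index per bus

net = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
net.fit(X, y)                          # offline training
l_indices = net.predict(X[:1])         # online: all bus L-indices at once
```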

Relevance: 30.00%

Publisher:

Abstract:

Neural data are inevitably contaminated by noise, and when such noisy data are subjected to statistical analysis, misleading conclusions can be reached. Here we address this problem by applying a state-space smoothing method, based on the combined use of Kalman filter theory and the Expectation–Maximization algorithm, to denoise two datasets of local field potentials recorded from monkeys performing a visuomotor task. For the first dataset, we found that analysis of high gamma band (60–90 Hz) neural activity in the prefrontal cortex is highly susceptible to noise, and that denoising leads to markedly improved, physiologically interpretable results. For the second dataset, Granger causality between primary motor and primary somatosensory cortices was not consistent across the two monkeys, and the effect of noise was suspected; after denoising, the discrepancy between the two subjects was significantly reduced.
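A minimal single-channel sketch of the state-space smoothing step: a random-walk Kalman filter followed by a Rauch-Tung-Striebel backward pass. In the paper the noise parameters are estimated with the EM algorithm; here q and r are simply assumed:

```python
import numpy as np

def kalman_smooth(y, q=1e-3, r=1.0):
    """Random-walk Kalman filter plus RTS smoother for one channel."""
    n = len(y)
    xf, pf = np.zeros(n), np.zeros(n)   # filtered mean / variance
    xp, pp = np.zeros(n), np.zeros(n)   # one-step predictions
    x, p = y[0], 1.0
    for t in range(n):
        xp[t], pp[t] = x, p + q          # predict (identity dynamics)
        k = pp[t] / (pp[t] + r)          # Kalman gain
        x = xp[t] + k * (y[t] - xp[t])   # update with the observation
        p = (1 - k) * pp[t]
        xf[t], pf[t] = x, p
    xs = xf.copy()
    for t in range(n - 2, -1, -1):       # RTS backward (smoothing) pass
        xs[t] = xf[t] + (pf[t] / pp[t + 1]) * (xs[t + 1] - xp[t + 1])
    return xs

t = np.linspace(0, 1, 500)
clean = np.sin(2 * np.pi * 75 * t) * np.exp(-3 * t)   # toy 75 Hz burst
noisy = clean + 0.5 * np.random.default_rng(0).standard_normal(500)
denoised = kalman_smooth(noisy)
```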

Relevance: 30.00%

Publisher:

Abstract:

To detect errors in decision tables, one needs to decide whether a given set of constraints is feasible. This paper describes an algorithm to do so when the constraints are linear in variables that take only integer values. Decision tables with such constraints occur frequently in business data processing and in nonnumeric applications. The aim of the algorithm is to exploit the abundance of very simple constraints that occur in typical decision table contexts. Essentially, the algorithm is a backtrack procedure in which the solution space is pruned using the set of simple constraints. After some simplifications, the simple constraints are captured in an acyclic directed graph with weighted edges. Further, only those partial vectors are considered for extension which can be extended to assignments that at least satisfy the simple constraints; this is how pruning of the solution space is achieved. For every partial assignment considered, the graph representation of the simple constraints provides a lower bound for each variable that is not yet assigned a value. These lower bounds play a vital role in the algorithm, and they are obtained efficiently by updating older lower bounds. The algorithm also incorporates a check of whether an (m - 2)-ary vector can be extended to a solution vector of m components, thereby reducing backtracking by one component.
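The backbone of such a procedure is ordinary backtracking with pruning by the simple constraints; the graph-derived lower bounds are the paper's refinement and are omitted from this sketch, whose constraint encoding is an assumption for illustration:

```python
def feasible(domains, constraints, partial=()):
    """Extend a partial integer assignment one variable at a time,
    pruning any partial vector that already violates a constraint."""
    i = len(partial)
    if i == len(domains):
        return partial                    # complete feasible assignment
    for v in domains[i]:
        cand = partial + (v,)
        # Only constraints whose variables are all assigned can be checked.
        if all(c(cand) for c in constraints if c.arity <= len(cand)):
            result = feasible(domains, constraints, cand)
            if result is not None:
                return result
    return None                           # this subtree is infeasible

def make(arity, fn):                      # tag a constraint with its arity
    fn.arity = arity
    return fn

# Toy system: x0 + x1 <= 5 and x1 - x2 >= 1 over {0,...,3}^3.
constraints = [make(2, lambda a: a[0] + a[1] <= 5),
               make(3, lambda a: a[1] - a[2] >= 1)]
print(feasible([range(4)] * 3, constraints))   # -> (0, 1, 0)
```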

Relevance: 30.00%

Publisher:

Abstract:

We consider the analysis of longitudinal data when the covariance function is modeled by parameters additional to the mean parameters. In general, inconsistent estimators of the covariance (variance/correlation) parameters are produced when the "working" correlation matrix is misspecified, which may result in a great loss of efficiency for the mean parameter estimators (although their consistency is preserved). We consider using different "working" correlation models for the variance and the mean parameters. In particular, we find that an independence working model should be used for estimating the variance parameters to ensure their consistency when the correlation structure is misspecified, while the designated "working" correlation matrices should be used for estimating the mean and correlation parameters to attain high efficiency for the mean parameter estimators. Simulation studies indicate that the proposed algorithm performs very well. We also apply the different estimation procedures to a data set from a clinical trial for illustration.
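The comparison at the heart of the paper can be sketched with statsmodels' GEE implementation, fitting the same longitudinal model under an independence and an exchangeable working correlation (simulated data, not the clinical-trial data analyzed in the paper):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_subj, n_time = 100, 4
groups = np.repeat(np.arange(n_subj), n_time)   # repeated measures per subject
x = rng.random(n_subj * n_time)
y = 1.0 + 2.0 * x + rng.standard_normal(n_subj * n_time)
X = sm.add_constant(x)

indep = sm.GEE(y, X, groups=groups, cov_struct=sm.cov_struct.Independence()).fit()
exch = sm.GEE(y, X, groups=groups, cov_struct=sm.cov_struct.Exchangeable()).fit()
print(indep.params, exch.params)   # mean-parameter estimates under each model
```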

Relevance: 30.00%

Publisher:

Abstract:

This paper considers the applicability of the least mean fourth (LMF) power gradient adaptation criterion, which offers an advantage for signals contaminated by Gaussian noise whose power is not known. The proposed method, used as an adaptive spectral estimator, is found to provide better performance than least mean square (LMS) adaptation at the same (or even lower) speed of convergence for signals having a sufficiently high signal-to-Gaussian-noise ratio. The results include a comparison of the performance of the LMS tapped-delay-line, LMF tapped-delay-line, LMS lattice and LMF lattice algorithms, with Burg's block data method as reference. Signals such as noisy sinusoids and stochastic signals such as EEG are considered in this study.
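The only difference between the two updates is the power of the error driving the gradient step, as this tapped-delay-line sketch shows (step sizes and signals are illustrative):

```python
import numpy as np

def adapt(x, d, taps=8, mu=1e-2, power=4):
    """Tapped-delay-line adaptive filter. power=2 gives the LMS update
    (step proportional to e); power=4 gives the LMF update (step
    proportional to e**3, from the gradient of the fourth-power cost)."""
    w, buf = np.zeros(taps), np.zeros(taps)
    y = np.zeros(len(x))
    for n in range(len(x)):
        buf = np.roll(buf, 1)
        buf[0] = x[n]                       # shift in the newest sample
        y[n] = w @ buf
        e = d[n] - y[n]
        w += mu * (e if power == 2 else e ** 3) * buf
    return w, y

rng = np.random.default_rng(0)
t = np.arange(2000)
clean = np.sin(0.1 * np.pi * t)
noisy = clean + 0.1 * rng.standard_normal(len(t))   # high SNR favors LMF
w_lmf, y_lmf = adapt(noisy, clean, power=4)
w_lms, y_lms = adapt(noisy, clean, power=2)
```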

Relevance: 30.00%

Publisher:

Abstract:

In this paper, a new incremental algorithm for layout compaction is proposed. In addition to its linear-time performance in terms of the number of rectangles in the layout, we describe how incremental compaction can be a valuable feature in the design of a layout editor, and we present the design of such an editor. In the design of the editor, we describe how arrays can be used to implement the quadtrees that represent VLSI layouts. Such a representation provides fast data access and low storage requirements.
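One way to realize the array-implemented quadtree mentioned above is the heap-style indexing sketched below, where node i's four children sit at 4*i + 1 through 4*i + 4, so the whole tree is a flat list with no child pointers. The indexing scheme is an assumption for illustration, not necessarily the paper's:

```python
def child(i, quadrant):
    """Quadrant 0..3 = NW, NE, SW, SE of node i in the flat array."""
    return 4 * i + 1 + quadrant

def locate(x, y, size, depth):
    """Flat-array index of the depth-`depth` leaf containing (x, y)
    in a size x size layout."""
    i, x0, y0, half = 0, 0, 0, size // 2
    for _ in range(depth):
        q = (1 if x >= x0 + half else 0) + (2 if y >= y0 + half else 0)
        if q & 1:
            x0 += half
        if q & 2:
            y0 += half
        i = child(i, q)
        half //= 2
    return i

# Leaf cells of a depth-2 tree over a 1024x1024 layout store rectangle ids:
depth, size = 2, 1024
tree = [[] for _ in range(sum(4 ** d for d in range(depth + 1)))]
tree[locate(300, 700, size, depth)].append("rect_42")
```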

Relevance: 30.00%

Publisher:

Abstract:

Data-flow analysis is an integral part of any aggressive optimizing compiler. We propose a framework for improving the precision of data-flow analysis in the presence of complex control flow. We initially perform data-flow analysis to determine those control-flow merges which cause the loss of data-flow analysis precision. The control-flow graph of the program is then restructured such that performing data-flow analysis on the restructured graph gives more precise results. The proposed framework is both simple, involving the familiar notion of product automata, and general, since it is applicable to any forward data-flow analysis. Apart from proving that our restructuring process is correct, we also show that restructuring is effective, in that it necessarily leads to more optimization opportunities. Furthermore, the framework handles the trade-off between the increase in data-flow precision and the code-size increase inherent in the restructuring. We show that determining an optimal restructuring is NP-hard, and propose and evaluate a greedy strategy. The framework has been implemented in the Scale research compiler and instantiated for the specific problem of constant propagation. On the SPECINT 2000 benchmark suite, we observe an average speedup of 4% in running time over the Wegman-Zadeck conditional constant propagation algorithm and 2% over a purely path-profile-guided approach.
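The precision loss that the restructuring removes is easy to see in the constant-propagation lattice: the meet of two different constants at a control-flow merge is "not a constant". A toy sketch of that effect (the product-automaton construction itself is not reproduced):

```python
BOTTOM = object()              # lattice bottom: "not a constant"

def meet(a, b):
    """Constant-propagation lattice meet for a single variable."""
    if a is BOTTOM or b is BOTTOM:
        return BOTTOM
    return a if a == b else BOTTOM

# Before restructuring: x = 1 and x = 2 meet at one join node.
print(meet(1, 2) is BOTTOM)    # True: x is unknown after the join

# After restructuring, the join is duplicated per incoming path, so each
# copy sees a single constant and later uses of x can be folded.
for v in (1, 2):
    print(meet(v, v))          # 1 and 2: constant on each duplicate
```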