945 resultados para File processing (Computer science)


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Thesis (M. S.)--University of Illinois at Urbana-Champaign.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Thesis (M. S.)--University of Illinois at Urbana-Champaign.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Includes bibliographical references.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Thesis (Ph.D.)--University of Washington, 2016-06

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A specialised reconfigurable architecture is targeted at wireless base-band processing. It is built to cater for multiple wireless standards. It has lower power consumption than the processor-based solution. It can be scaled to run in parallel for processing multiple channels. Test resources are embedded on the architecture and testing strategies are included. This architecture is functionally partitioned according to the common operations found in wireless standards, such as CRC error correction, convolution and interleaving. These modules are linked via Virtual Wire Hardware modules and route-through switch matrices. Data can be processed in any order through this interconnect structure. Virtual Wire ensures the same flexibility as normal interconnects, but the area occupied and the number of switches needed is reduced. The testing algorithm scans all possible paths within the interconnection network exhaustively and searches for faults in the processing modules. The testing algorithm starts by scanning the externally addressable memory space and testing the master controller. The controller then tests every switch in the route-through switch matrix by making loops from the shared memory to each of the switches. The local switch matrix is also tested in the same way. Next the local memory is scanned. Finally, pre-defined test vectors are loaded into local memory to check the processing modules. This paper compares various base-band processing solutions. It describes the proposed platform and its implementation. It outlines the test resources and algorithm. It concludes with the mapping of Bluetooth and GSM base-band onto the platform.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Multiresolution Triangular Mesh (MTM) models are widely used to improve the performance of large terrain visualization by replacing the original model with a simplified one. MTM models, which consist of both original and simplified data, are commonly stored in spatial database systems due to their size. The relatively slow access speed of disks makes data retrieval the bottleneck of such terrain visualization systems. Existing spatial access methods proposed to address this problem rely on main-memory MTM models, which leads to significant overhead during query processing. In this paper, we approach the problem from a new perspective and propose a novel MTM called direct mesh that is designed specifically for secondary storage. It supports available indexing methods natively and requires no modification to MTM structure. Experiment results, which are based on two real-world data sets, show an average performance improvement of 5-10 times over the existing methods.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

With the rapid increase in both centralized video archives and distributed WWW video resources, content-based video retrieval is gaining its importance. To support such applications efficiently, content-based video indexing must be addressed. Typically, each video is represented by a sequence of frames. Due to the high dimensionality of frame representation and the large number of frames, video indexing introduces an additional degree of complexity. In this paper, we address the problem of content-based video indexing and propose an efficient solution, called the Ordered VA-File (OVA-File) based on the VA-file. OVA-File is a hierarchical structure and has two novel features: 1) partitioning the whole file into slices such that only a small number of slices are accessed and checked during k Nearest Neighbor (kNN) search and 2) efficient handling of insertions of new vectors into the OVA-File, such that the average distance between the new vectors and those approximations near that position is minimized. To facilitate a search, we present an efficient approximate kNN algorithm named Ordered VA-LOW (OVA-LOW) based on the proposed OVA-File. OVA-LOW first chooses possible OVA-Slices by ranking the distances between their corresponding centers and the query vector, and then visits all approximations in the selected OVA-Slices to work out approximate kNN. The number of possible OVA-Slices is controlled by a user-defined parameter delta. By adjusting delta, OVA-LOW provides a trade-off between the query cost and the result quality. Query by video clip consisting of multiple frames is also discussed. Extensive experimental studies using real video data sets were conducted and the results showed that our methods can yield a significant speed-up over an existing VA-file-based method and iDistance with high query result quality. Furthermore, by incorporating temporal correlation of video content, our methods achieved much more efficient performance.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Quantile computation has many applications including data mining and financial data analysis. It has been shown that an is an element of-approximate summary can be maintained so that, given a quantile query d (phi, is an element of), the data item at rank [phi N] may be approximately obtained within the rank error precision is an element of N over all N data items in a data stream or in a sliding window. However, scalable online processing of massive continuous quantile queries with different phi and is an element of poses a new challenge because the summary is continuously updated with new arrivals of data items. In this paper, first we aim to dramatically reduce the number of distinct query results by grouping a set of different queries into a cluster so that they can be processed virtually as a single query while the precision requirements from users can be retained. Second, we aim to minimize the total query processing costs. Efficient algorithms are developed to minimize the total number of times for reprocessing clusters and to produce the minimum number of clusters, respectively. The techniques are extended to maintain near-optimal clustering when queries are registered and removed in an arbitrary fashion against whole data streams or sliding windows. In addition to theoretical analysis, our performance study indicates that the proposed techniques are indeed scalable with respect to the number of input queries as well as the number of items and the item arrival rate in a data stream.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In many advanced applications, data are described by multiple high-dimensional features. Moreover, different queries may weight these features differently; some may not even specify all the features. In this paper, we propose our solution to support efficient query processing in these applications. We devise a novel representation that compactly captures f features into two components: The first component is a 2D vector that reflects a distance range ( minimum and maximum values) of the f features with respect to a reference point ( the center of the space) in a metric space and the second component is a bit signature, with two bits per dimension, obtained by analyzing each feature's descending energy histogram. This representation enables two levels of filtering: The first component prunes away points that do not share similar distance ranges, while the bit signature filters away points based on the dimensions of the relevant features. Moreover, the representation facilitates the use of a single index structure to further speed up processing. We employ the classical B+-tree for this purpose. We also propose a KNN search algorithm that exploits the access orders of critical dimensions of highly selective features and partial distances to prune the search space more effectively. Our extensive experiments on both real-life and synthetic data sets show that the proposed solution offers significant performance advantages over sequential scan and retrieval methods using single and multiple VA-files.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Complex Event processing (CEP) has emerged over the last ten years. CEP systems are outstanding in processing large amount of data and responding in a timely fashion. While CEP applications are fast growing, performance management in this area has not gain much attention. It is critical to meet the promised level of service for both system designers and users. In this paper, we present a benchmark for complex event processing systems: CEPBen. The CEPBen benchmark is designed to evaluate CEP functional behaviours, i.e., filtering, transformation and event pattern detection and provides a novel methodology of evaluating the performance of CEP systems. A performance study by running the CEPBen on Esper CEP engine is described and discussed. The results obtained from performance tests demonstrate the influences of CEP functional behaviours on the system performance. © 2014 Springer International Publishing Switzerland.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Disk drives are the bottleneck in the processing of large amounts of data used in almost all common applications. File systems attempt to reduce this by storing data sequentially on the disk drives, thereby reducing the access latencies. Although this strategy is useful when data is retrieved sequentially, the access patterns in real world workloads is not necessarily sequential and this mismatch results in storage I/O performance degradation. This thesis demonstrates that one way to improve the storage performance is to reorganize data on disk drives in the same way in which it is mostly accessed. We identify two classes of accesses: static, where access patterns do not change over the lifetime of the data and dynamic, where access patterns frequently change over short durations of time, and propose, implement and evaluate layout strategies for each of these. Our strategies are implemented in a way that they can be seamlessly integrated or removed from the system as desired. We evaluate our layout strategies for static policies using tree-structured XML data where accesses to the storage device are mostly of two kinds—parent-to-child or child-to-sibling. Our results show that for a specific class of deep-focused queries, the existing file system layout policy performs better by 5–54X. For the non-deep-focused queries, our native layout mechanism shows an improvement of 3–127X. To improve performance of the dynamic access patterns, we implement a self-optimizing storage system that performs rearranges popular block accesses on a dedicated partition based on the observed workload characteristics. Our evaluation shows an improvement of over 80% in the disk busy times over a range of workloads. These results show that applying the knowledge of data access patterns for allocation decisions can substantially improve the I/O performance.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

As massive data sets become increasingly available, people are facing the problem of how to effectively process and understand these data. Traditional sequential computing models are giving way to parallel and distributed computing models, such as MapReduce, both due to the large size of the data sets and their high dimensionality. This dissertation, as in the same direction of other researches that are based on MapReduce, tries to develop effective techniques and applications using MapReduce that can help people solve large-scale problems. Three different problems are tackled in the dissertation. The first one deals with processing terabytes of raster data in a spatial data management system. Aerial imagery files are broken into tiles to enable data parallel computation. The second and third problems deal with dimension reduction techniques that can be used to handle data sets of high dimensionality. Three variants of the nonnegative matrix factorization technique are scaled up to factorize matrices of dimensions in the order of millions in MapReduce based on different matrix multiplication implementations. Two algorithms, which compute CANDECOMP/PARAFAC and Tucker tensor decompositions respectively, are parallelized in MapReduce based on carefully partitioning the data and arranging the computation to maximize data locality and parallelism.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The advent of smart TVs has reshaped the TV-consumer interaction by combining TVs with mobile-like applications and access to the Internet. However, consumers are still unable to seamlessly interact with the contents being streamed. An example of such limitation is TV shopping, in which a consumer makes a purchase of a product or item displayed in the current TV show. Currently, consumers can only stop the current show and attempt to find a similar item in the Web or an actual store. It would be more convenient if the consumer could interact with the TV to purchase interesting items. ^ Towards the realization of TV shopping, this dissertation proposes a scalable multimedia content processing framework. Two main challenges in TV shopping are addressed: the efficient detection of products in the content stream, and the retrieval of similar products given a consumer-selected product. The proposed framework consists of three components. The first component performs computational and temporal aware multimedia abstraction to select a reduced number of frames that summarize the important information in the video stream. By both reducing the number of frames and taking into account the computational cost of the subsequent detection phase, this component component allows the efficient detection of products in the stream. The second component realizes the detection phase. It executes scalable product detection using multi-cue optimization. Additional information cues are formulated into an optimization problem that allows the detection of complex products, i.e., those that do not have a rigid form and can appear in various poses. After the second component identifies products in the video stream, the consumer can select an interesting one for which similar ones must be located in a product database. To this end, the third component of the framework consists of an efficient, multi-dimensional, tree-based indexing method for multimedia databases. The proposed index mechanism serves as the backbone of the search. Moreover, it is able to efficiently bridge the semantic gap and perception subjectivity issues during the retrieval process to provide more relevant results.^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Postprint

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Abstract Heading into the 2020s, Physics and Astronomy are undergoing experimental revolutions that will reshape our picture of the fabric of the Universe. The Large Hadron Collider (LHC), the largest particle physics project in the world, produces 30 petabytes of data annually that need to be sifted through, analysed, and modelled. In astrophysics, the Large Synoptic Survey Telescope (LSST) will be taking a high-resolution image of the full sky every 3 days, leading to data rates of 30 terabytes per night over ten years. These experiments endeavour to answer the question why 96% of the content of the universe currently elude our physical understanding. Both the LHC and LSST share the 5-dimensional nature of their data, with position, energy and time being the fundamental axes. This talk will present an overview of the experiments and data that is gathered, and outlines the challenges in extracting information. Common strategies employed are very similar to industrial data! Science problems (e.g., data filtering, machine learning, statistical interpretation) and provide a seed for exchange of knowledge between academia and industry. Speaker Biography Professor Mark Sullivan Mark Sullivan is a Professor of Astrophysics in the Department of Physics and Astronomy. Mark completed his PhD at Cambridge, and following postdoctoral study in Durham, Toronto and Oxford, now leads a research group at Southampton studying dark energy using exploding stars called "type Ia supernovae". Mark has many years' experience of research that involves repeatedly imaging the night sky to track the arrival of transient objects, involving significant challenges in data handling, processing, classification and analysis.