Biblioteca Digital

986 resultados para data compression

Efficient RDF Interchange (ERI) format for RDF data streams

Relevância:

30.00% 30.00%

Publicador:

Resumo:

RDF streams are sequences of timestamped RDF statements or graphs, which can be generated by several types of data sources (sensors, social networks, etc.). They may provide data at high volumes and rates, and be consumed by applications that require real-time responses. Hence it is important to publish and interchange them efficiently. In this paper, we exploit a key feature of RDF data streams, which is the regularity of their structure and data values, proposing a compressed, efficient RDF interchange (ERI) format, which can reduce the amount of data transmitted when processing RDF streams. Our experimental evaluation shows that our format produces state-of-the-art streaming compression, remaining efficient in performance.

Optimizing communication by compression for Multi-GPU Scalable Breadth-First Searches

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Debido al creciente aumento del tamaño de los datos en muchos de los actuales sistemas de información, muchos de los algoritmos de recorrido de estas estructuras pierden rendimento para realizar búsquedas en estos. Debido a que la representacion de estos datos en muchos casos se realiza mediante estructuras nodo-vertice (Grafos), en el año 2009 se creó el reto Graph500. Con anterioridad, otros retos como Top500 servían para medir el rendimiento en base a la capacidad de cálculo de los sistemas, mediante tests LINPACK. En caso de Graph500 la medicion se realiza mediante la ejecución de un algoritmo de recorrido en anchura de grafos (BFS en inglés) aplicada a Grafos. El algoritmo BFS es uno de los pilares de otros muchos algoritmos utilizados en grafos como SSSP, shortest path o Betweeness centrality. Una mejora en este ayudaría a la mejora de los otros que lo utilizan. Analisis del Problema El algoritmos BFS utilizado en los sistemas de computación de alto rendimiento (HPC en ingles) es usualmente una version para sistemas distribuidos del algoritmo secuencial original. En esta versión distribuida se inicia la ejecución realizando un particionado del grafo y posteriormente cada uno de los procesadores distribuidos computará una parte y distribuirá sus resultados a los demás sistemas. Debido a que la diferencia de velocidad entre el procesamiento en cada uno de estos nodos y la transfencia de datos por la red de interconexión es muy alta (estando en desventaja la red de interconexion) han sido bastantes las aproximaciones tomadas para reducir la perdida de rendimiento al realizar transferencias. Respecto al particionado inicial del grafo, el enfoque tradicional (llamado 1D-partitioned graph en ingles) consiste en asignar a cada nodo unos vertices fijos que él procesará. Para disminuir el tráfico de datos se propuso otro particionado (2D) en el cual la distribución se haciá en base a las aristas del grafo, en vez de a los vertices. Este particionado reducía el trafico en la red en una proporcion O(NxM) a O(log(N)). Si bien han habido otros enfoques para reducir la transferecnia como: reordemaniento inicial de los vertices para añadir localidad en los nodos, o particionados dinámicos, el enfoque que se va a proponer en este trabajo va a consistir en aplicar técnicas recientes de compression de grandes sistemas de datos como Bases de datos de alto volume o motores de búsqueda en internet para comprimir los datos de las transferencias entre nodos.---ABSTRACT---The Breadth First Search (BFS) algorithm is the foundation and building block of many higher graph-based operations such as spanning trees, shortest paths and betweenness centrality. The importance of this algorithm increases each day due to it is a key requirement for many data structures which are becoming popular nowadays. These data structures turn out to be internally graph structures. When the BFS algorithm is parallelized and the data is distributed into several processors, some research shows a performance limitation introduced by the interconnection network [31]. Hence, improvements on the area of communications may benefit the global performance in this key algorithm. In this work it is presented an alternative compression mechanism. It differs with current existing methods in that it is aware of characteristics of the data which may benefit the compression. Apart from this, we will perform a other test to see how this algorithm (in a dis- tributed scenario) benefits from traditional instruction-based optimizations. Last, we will review the current supercomputing techniques and the related work being done in the area.

An experimental and finite element poroelastic creep response analysis of an intervertebral hydrogel disc model in axial compression

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A hydrogel intervertebral disc (lVD) model consisting of an inner nucleus core and an outer anulus ring was manufactured from 30 and 35% by weight Poly(vinyl alcohol) hydrogel (PVA-H) concentrations and subjected to axial compression in between saturated porous endplates at 200 N for 11 h, 30 min. Repeat experiments (n = 4) on different samples (N = 2) show good reproducibility of fluid loss and axial deformation. An axisymmetric nonlinear poroelastic finite element model with variable permeability was developed using commercial finite element software to compare axial deformation and predicted fluid loss with experimental data. The FE predictions indicate differential fluid loss similar to that of biological IVDs, with the nucleus losing more water than the anulus, and there is overall good agreement between experimental and finite element predicted fluid loss. The stress distribution pattern indicates important similarities with the biological lVD that includes stress transference from the nucleus to the anulus upon sustained loading and renders it suitable as a model that can be used in future studies to better understand the role of fluid and stress in biological IVDs. (C) 2005 Springer Science + Business Media, Inc.

Pulse compression applied to the detection of F.S.K. and P.S.K. signals

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The matched filter detector is well known as the optimum detector for use in communication, as well as in radar systems for signals corrupted by Additive White Gaussian Noise (A.W.G.N.). Non-coherent F.S.K. and differentially coherent P.S.K. (D.P.S.K.) detection schemes, which employ a new approach in realizing the matched filter processor, are investigated. The new approach utilizes pulse compression techniques, well known in radar systems, to facilitate the implementation of the matched filter in the form of the Pulse Compressor Matched Filter (P.C.M.F.). Both detection schemes feature a mixer- P.C.M.F. Compound as their predetector processor. The Compound is utilized to convert F.S.K. modulation into pulse position modulation, and P.S.K. modulation into pulse polarity modulation. The mechanisms of both detection schemes are studied through examining the properties of the Autocorrelation function (A.C.F.) at the output of the P.C.M.F.. The effects produced by time delay, and carrier interference on the output A.C.F. are determined. Work related to the F.S.K. detection scheme is mostly confined to verifying its validity, whereas the D.P.S.K. detection scheme has not been reported before. Consequently, an experimental system was constructed, which utilized combined hardware and software, and operated under the supervision of a microprocessor system. The experimental system was used to develop error-rate models for both detection schemes under investigation. Performances of both F. S. K. and D.P. S. K. detection schemes were established in the presence of A. W. G. N. , practical imperfections, time delay, and carrier interference. The results highlight the candidacy of both detection schemes for use in the field of digital data communication and, in particular, the D.P.S.K. detection scheme, which performed very close to optimum in a background of A.W.G.N.

Applying A Normalized Compression Metric To The Measurement Of Dialect Distance

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The paper discusses the application of a similarity metric based on compression to the measurement of the distance among Bulgarian dia- lects. The similarity metric is de ned on the basis of the notion of Kolmo- gorov complexity of a le (or binary string). The application of Kolmogorov complexity in practice is not possible because its calculation over a le is an undecidable problem. Thus, the actual similarity metric is based on a real life compressor which only approximates the Kolmogorov complexity. To use the metric for distance measurement of Bulgarian dialects we rst represent the dialectological data in such a way that the metric is applicable. We propose two such representations which are compared to a baseline distance between dialects. Then we conclude the paper with an outline of our future work.

Conceptual Information Compression and Efficient Pattern Search

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper introduces an encoding of knowledge representation statements as regular languages and proposes a two-phase approach to processing of explicitly declared conceptual information. The idea is presented for the simple conceptual graphs where conceptual pattern search is implemented by the so called projection operation. Projection calculations are organised into off-line preprocessing and run-time computations. This enables fast run-time treatment of NP-complete problems, given that the intermediate results of the off-line phase are kept in suitable data structures. The experiments with randomly-generated, middle-size knowledge bases support the claim that the suggested approach radically improves the run-time conceptual pattern search.

A Taxonomy of Big Data for Optimal Predictive Machine Learning and Data Mining

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Big data comes in various ways, types, shapes, forms and sizes. Indeed, almost all areas of science, technology, medicine, public health, economics, business, linguistics and social science are bombarded by ever increasing flows of data begging to be analyzed efficiently and effectively. In this paper, we propose a rough idea of a possible taxonomy of big data, along with some of the most commonly used tools for handling each particular category of bigness. The dimensionality p of the input space and the sample size n are usually the main ingredients in the characterization of data bigness. The specific statistical machine learning technique used to handle a particular big data set will depend on which category it falls in within the bigness taxonomy. Large p small n data sets for instance require a different set of tools from the large n small p variety. Among other tools, we discuss Preprocessing, Standardization, Imputation, Projection, Regularization, Penalization, Compression, Reduction, Selection, Kernelization, Hybridization, Parallelization, Aggregation, Randomization, Replication, Sequentialization. Indeed, it is important to emphasize right away that the so-called no free lunch theorem applies here, in the sense that there is no universally superior method that outperforms all other methods on all categories of bigness. It is also important to stress the fact that simplicity in the sense of Ockham’s razor non-plurality principle of parsimony tends to reign supreme when it comes to massive data. We conclude with a comparison of the predictive performance of some of the most commonly used methods on a few data sets.

Modular Digital Watermarking for Image Verification and Secure Data Storage in Web Applications

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Our modular approach to data hiding is an innovative concept in the data hiding research field. It enables the creation of modular digital watermarking methods that have extendable features and are designed for use in web applications. The methods consist of two types of modules – a basic module and an application-specific module. The basic module mainly provides features which are connected with the specific image format. As JPEG is a preferred image format on the Internet, we have put a focus on the achievement of a robust and error-free embedding and retrieval of the embedded data in JPEG images. The application-specific modules are adaptable to user requirements in the concrete web application. The experimental results of the modular data watermarking are very promising. They indicate excellent image quality, satisfactory size of the embedded data and perfect robustness against JPEG transformations with prespecified compression ratios. ACM Computing Classification System (1998): C.2.0.

Classification of Texts' Authorship Using a Regression Model on Compressed Data

Relevância:

30.00% 30.00%

Publicador:

Resumo:

2010 Mathematics Subject Classification: 68T50,62H30,62J05.

Contributions to HEVC Prediction for Medical Image Compression

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Medical imaging technology and applications are continuously evolving, dealing with images of increasing spatial and temporal resolutions, which allow easier and more accurate medical diagnosis. However, this increase in resolution demands a growing amount of data to be stored and transmitted. Despite the high coding efficiency achieved by the most recent image and video coding standards in lossy compression, they are not well suited for quality-critical medical image compression where either near-lossless or lossless coding is required. In this dissertation, two different approaches to improve lossless coding of volumetric medical images, such as Magnetic Resonance and Computed Tomography, were studied and implemented using the latest standard High Efficiency Video Encoder (HEVC). In a first approach, the use of geometric transformations to perform inter-slice prediction was investigated. For the second approach, a pixel-wise prediction technique, based on Least-Squares prediction, that exploits inter-slice redundancy was proposed to extend the current HEVC lossless tools. Experimental results show a bitrate reduction between 45% and 49%, when compared with DICOM recommended encoders, and 13.7% when compared with standard HEVC.

An IP capable data link layer protocol for narrow band half-duplex packet data radio networks

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The wide adaptation of Internet Protocol (IP) as de facto protocol for most communication networks has established a need for developing IP capable data link layer protocol solutions for Machine to machine (M2M) and Internet of Things (IoT) networks. However, the wireless networks used for M2M and IoT applications usually lack the resources commonly associated with modern wireless communication networks. The existing IP capable data link layer solutions for wireless IoT networks provide the necessary overhead minimising and frame optimising features, but are often built to be compatible only with IPv6 and specific radio platforms. The objective of this thesis is to design IPv4 compatible data link layer for Netcontrol Oy's narrow band half-duplex packet data radio system. Based on extensive literature research, system modelling and solution concept testing, this thesis proposes the usage of tunslip protocol as the basis for the system data link layer protocol development. In addition to the functionality of tunslip, this thesis discusses the additional network, routing, compression, security and collision avoidance changes required to be made to the radio platform in order for it to be IP compatible while still being able to maintain the point-to-multipoint and multi-hop network characteristics. The data link layer design consists of the radio application, dynamic Maximum Transmission Unit (MTU) optimisation daemon and the tunslip interface. The proposed design uses tunslip for creating an IP capable data link protocol interface. The radio application receives data from tunslip and compresses the packets and uses the IP addressing information for radio network addressing and routing before forwarding the message to radio network. The dynamic MTU size optimisation daemon controls the tunslip interface maximum MTU size according to the link quality assessment calculated from the radio network diagnostic data received from the radio application. For determining the usability of tunslip as the basis for data link layer protocol, testing of the tunslip interface is conducted with both IEEE 802.15.4 radios and packet data radios. The test cases measure the radio network usability for User Datagram Protocol (UDP) based applications without applying any header or content compression. The test results for the packet data radios reveal that the typical success rate for packet reception through a single-hop link is above 99% with a round-trip-delay of 0.315s for 63B packets.

Iis--integrated Interactome System: A Web-based Platform For The Annotation, Analysis And Visualization Of Protein-metabolite-gene-drug Interactions By Integrating A Variety Of Data Sources And Tools.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

High-throughput screening of physical, genetic and chemical-genetic interactions brings important perspectives in the Systems Biology field, as the analysis of these interactions provides new insights into protein/gene function, cellular metabolic variations and the validation of therapeutic targets and drug design. However, such analysis depends on a pipeline connecting different tools that can automatically integrate data from diverse sources and result in a more comprehensive dataset that can be properly interpreted. We describe here the Integrated Interactome System (IIS), an integrative platform with a web-based interface for the annotation, analysis and visualization of the interaction profiles of proteins/genes, metabolites and drugs of interest. IIS works in four connected modules: (i) Submission module, which receives raw data derived from Sanger sequencing (e.g. two-hybrid system); (ii) Search module, which enables the user to search for the processed reads to be assembled into contigs/singlets, or for lists of proteins/genes, metabolites and drugs of interest, and add them to the project; (iii) Annotation module, which assigns annotations from several databases for the contigs/singlets or lists of proteins/genes, generating tables with automatic annotation that can be manually curated; and (iv) Interactome module, which maps the contigs/singlets or the uploaded lists to entries in our integrated database, building networks that gather novel identified interactions, protein and metabolite expression/concentration levels, subcellular localization and computed topological metrics, GO biological processes and KEGG pathways enrichment. This module generates a XGMML file that can be imported into Cytoscape or be visualized directly on the web. We have developed IIS by the integration of diverse databases following the need of appropriate tools for a systematic analysis of physical, genetic and chemical-genetic interactions. IIS was validated with yeast two-hybrid, proteomics and metabolomics datasets, but it is also extendable to other datasets. IIS is freely available online at: http://www.lge.ibi.unicamp.br/lnbio/IIS/.

[gait Speed, Grip Strength And Self-rated Health Among The Elderly: Data From The Fibra Campinas Network, São Paulo, Brazil].

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The article seeks to investigate patterns of performance and relationships between grip strength, gait speed and self-rated health, and investigate the relationships between them, considering the variables of gender, age and family income. This was conducted in a probabilistic sample of community-dwelling elderly aged 65 and over, members of a population study on frailty. A total of 689 elderly people without cognitive deficit suggestive of dementia underwent tests of gait speed and grip strength. Comparisons between groups were based on low, medium and high speed and strength. Self-related health was assessed using a 5-point scale. The males and the younger elderly individuals scored significantly higher on grip strength and gait speed than the female and oldest did; the richest scored higher than the poorest on grip strength and gait speed; females and men aged over 80 had weaker grip strength and lower gait speed; slow gait speed and low income arose as risk factors for a worse health evaluation. Lower muscular strength affects the self-rated assessment of health because it results in a reduction in functional capacity, especially in the presence of poverty and a lack of compensatory factors.

Correlation Between Cephalometric Data And Severity Of Sleep Apnea.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Obstructive sleep apnea syndrome has a high prevalence among adults. Cephalometric variables can be a valuable method for evaluating patients with this syndrome. To correlate cephalometric data with the apnea-hypopnea sleep index. We performed a retrospective and cross-sectional study that analyzed the cephalometric data of patients followed in the Sleep Disorders Outpatient Clinic of the Discipline of Otorhinolaryngology of a university hospital, from June 2007 to May 2012. Ninety-six patients were included, 45 men, and 51 women, with a mean age of 50.3 years. A total of 11 patients had snoring, 20 had mild apnea, 26 had moderate apnea, and 39 had severe apnea. The distance from the hyoid bone to the mandibular plane was the only variable that showed a statistically significant correlation with the apnea-hypopnea index. Cephalometric variables are useful tools for the understanding of obstructive sleep apnea syndrome. The distance from the hyoid bone to the mandibular plane showed a statistically significant correlation with the apnea-hypopnea index.

Censored Linear Regression Models For Irregularly Observed Longitudinal Data Using The Multivariate-t Distribution.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In acquired immunodeficiency syndrome (AIDS) studies it is quite common to observe viral load measurements collected irregularly over time. Moreover, these measurements can be subjected to some upper and/or lower detection limits depending on the quantification assays. A complication arises when these continuous repeated measures have a heavy-tailed behavior. For such data structures, we propose a robust structure for a censored linear model based on the multivariate Student's t-distribution. To compensate for the autocorrelation existing among irregularly observed measures, a damped exponential correlation structure is employed. An efficient expectation maximization type algorithm is developed for computing the maximum likelihood estimates, obtaining as a by-product the standard errors of the fixed effects and the log-likelihood function. The proposed algorithm uses closed-form expressions at the E-step that rely on formulas for the mean and variance of a truncated multivariate Student's t-distribution. The methodology is illustrated through an application to an Human Immunodeficiency Virus-AIDS (HIV-AIDS) study and several simulation studies.

«
1
2
...
5
6
7
8
9
10
11
...
65
66
»