999 results for Scalable documents


Relevance:

80.00%

Publisher:

Abstract:

Document engineering is the computer science discipline that investigates systems for documents in any form and in all media. As with the relationship between software engineering and software, document engineering is concerned with principles, tools and processes that improve our ability to create, manage, and maintain documents (http://www.documentengineering.org). The ACM Symposium on Document Engineering is an annual meeting of researchers active in document engineering; it is sponsored by ACM through the ACM SIGWEB Special Interest Group. In this editorial, we first point to work carried out in the context of document engineering that is directly related to multimedia tools and applications. We conclude with a summary of the papers presented in this special issue.

Relevance:

30.00%

Publisher:

Abstract:

Constant technology advances have caused a data explosion in recent years. Accordingly, modern statistical and machine learning methods must be adapted to deal with complex and heterogeneous data types. This is particularly true for analyzing biological data. For example, DNA sequence data can be viewed as categorical variables, with each nucleotide taking one of four categories. Gene expression data, depending on the quantification technology, may be continuous values or counts. With the advancement of high-throughput technology, the abundance of such data has become unprecedentedly rich. Efficient statistical approaches are therefore crucial in this big-data era.

Previous statistical methods for big data often aim to find low-dimensional structures in the observed data. For example, a factor analysis model assumes a latent Gaussian-distributed multivariate vector; under this assumption, the factor model produces a low-rank estimate of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents, in which the mixture proportions of topics are represented by a Dirichlet-distributed variable. This dissertation proposes several novel extensions of these statistical methods, developed to address challenges in big data. The new methods are applied in multiple real-world applications, including the construction of condition-specific gene co-expression networks, estimating shared topics among newsgroups, analysis of promoter sequences, analysis of political-economic risk data, and estimating population structure from genotype data.
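
As a brief aside for readers unfamiliar with the first example above, the low-rank structure produced by a factor analysis model can be written as follows (standard notation, not quoted from the dissertation itself):

```latex
x = \mu + \Lambda f + \varepsilon,
\qquad f \sim \mathcal{N}(0, I_k),
\quad \varepsilon \sim \mathcal{N}(0, \Psi)\ (\Psi \text{ diagonal}),
\qquad\Longrightarrow\qquad
\operatorname{Cov}(x) = \Lambda\Lambda^{\top} + \Psi .
```

With a p-dimensional observation x and k much smaller than p, the p x p covariance is represented by a rank-k term plus a diagonal term, which is the low-rank estimation the abstract refers to.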

Relevance:

30.00%

Publisher:

Abstract:

In the past decade, systems that extract information from millions of Internet documents have become commonplace. Knowledge graphs -- structured knowledge bases that describe entities, their attributes and the relationships between them -- are a powerful tool for understanding and organizing this vast amount of information. However, significant obstacles to knowledge graph construction are the unreliability of the extracted information, due to noise and ambiguity in the underlying data or errors made by the extraction system, and the complexity of reasoning about the dependencies between these noisy extractions. My dissertation addresses these challenges by exploiting the interdependencies between facts to improve the quality of the knowledge graph in a scalable framework. I introduce a new approach called knowledge graph identification (KGI), which resolves the entities, attributes and relationships in the knowledge graph by incorporating uncertain extractions from multiple sources, entity co-references, and ontological constraints. I define a probability distribution over possible knowledge graphs and infer the most probable knowledge graph using a combination of probabilistic and logical reasoning. Such probabilistic models are frequently dismissed due to scalability concerns, but my implementation of KGI maintains tractable performance on large problems through the use of hinge-loss Markov random fields, which have a convex inference objective. This allows inference over large knowledge graphs with 4M facts and 20M ground constraints in 2 hours. To further scale the solution, I develop a distributed approach to the KGI problem which runs in parallel across multiple machines, reducing inference time by 90%. Finally, I extend my model to the streaming setting, where a knowledge graph is continuously updated by incorporating newly extracted facts. I devise a general approach for approximately updating inference in convex probabilistic models, and quantify the approximation error by defining and bounding inference regret for online models. Together, my work retains the attractive features of probabilistic models while providing the scalability necessary for large-scale knowledge graph construction. These models have been applied in a number of real-world knowledge graph projects, including the NELL project at Carnegie Mellon and the Google Knowledge Graph.
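
For readers unfamiliar with hinge-loss Markov random fields, the convex objective mentioned above has the following general form (standard HL-MRF notation; the symbols are ours, not quoted from the dissertation):

```latex
P(Y \mid X) \;\propto\; \exp\!\Big(-\sum_{r=1}^{m} w_r\, \phi_r(Y, X)\Big),
\qquad
\phi_r(Y, X) = \big(\max\{\ell_r(Y, X),\, 0\}\big)^{p_r},
```

where the variables Y take continuous truth values in [0, 1], each ℓ_r is a linear function of Y and X encoding a rule such as an ontological constraint or an entity co-reference, w_r ≥ 0 is its weight, and p_r ∈ {1, 2}. Because every potential is a (possibly squared) hinge loss of a linear function, MAP inference is a convex optimization problem, which is what keeps KGI tractable at the scale of millions of facts.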

Relevance:

30.00%

Publisher:

Abstract:

It is just over 20 years since Adobe's PostScript opened a new era in digital documents. PostScript allows most details of rendering to be hidden within the imaging device itself, while providing a rich set of primitives enabling document engineers to think of final-form rendering as being just a sophisticated exercise in computer graphics. The refinement of the PostScript model into PDF has been amazingly successful in creating a near-universal interchange format for complex and graphically rich digital documents, but the PDF format itself is neither easy to create nor to amend. In the meantime a whole new world of digital documents has sprung up centred around XML-based technologies. The most widespread example is XHTML (with optional CSS styling), but more recently we have seen Scalable Vector Graphics (SVG) emerge as an XML-based, low-level rendering language with PostScript-compatible rendering semantics. This paper surveys graphically rich final-form rendering technologies and asks how flexible they can be in allowing adjustments to final appearance without regenerating a whole page or an entire document. Particular attention is focused on the relative merits of SVG and PDF in this regard, and on the desirability, in any document layout language, of being able to manipulate the graphic properties of document components parametrically, and at a level of granularity smaller than an entire page.
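
To make the idea of parametric, component-level manipulation concrete, here is a minimal Python sketch (our own illustration, not code from the paper): a single document component is regenerated from new graphic parameters while the rest of the rendered page is left untouched.

```python
# Minimal illustration (ours, not code from the paper): a document component
# whose graphic properties are plain parameters, so its appearance can be
# adjusted by re-emitting only this SVG fragment rather than regenerating the
# whole page. The attribute names are standard SVG; everything else is ours.
def render_box(x, y, width, height, fill="#cccccc", stroke="#000000",
               stroke_width=1.0, opacity=1.0):
    """Return an SVG fragment for one rectangular component."""
    return (f'<rect x="{x}" y="{y}" width="{width}" height="{height}" '
            f'fill="{fill}" stroke="{stroke}" '
            f'stroke-width="{stroke_width}" opacity="{opacity}"/>')

# Two renderings of the same component that differ only in parameter values;
# nothing else in the final-form document needs to change.
print(render_box(10, 10, 200, 50))
print(render_box(10, 10, 200, 50, fill="#ffdd88", stroke_width=2.5))
```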

Relevance:

20.00%

Publisher:

Abstract:

We show that scalable multipartite entanglement among light fields may be generated by optical parametric oscillators (OPOs). The tripartite entanglement existing among the three bright beams produced by a single OPO (pump, signal, and idler) is scalable to a system of many OPOs by pumping them in cascade with the same optical field, which then serves as an entanglement distributor. The special case of two OPOs is studied, and it is shown that the resulting five bright beams share genuine multipartite entanglement. In addition, the structure of entanglement distribution among the fields can be manipulated to some degree by tuning the incident pump power. The scalability to many fields is straightforward, allowing an alternative implementation of a multipartite quantum information network with continuous variables.

Relevance:

20.00%

Publisher:

Abstract:

One of the challenges in scientific visualization is to generate software libraries suitable for the large-scale data emerging from tera-scale simulations and instruments. We describe the efforts currently under way at SDSC and NPACI to address these challenges. The scope of the SDSC project spans data handling, graphics, visualization, and scientific application domains. Components of the research focus on the following areas: intelligent data storage, layout and handling, using an associated “Floor-Plan” (metadata); performance optimization on parallel architectures; extension of SDSC’s scalable, parallel, direct volume renderer to allow perspective viewing; and interactive rendering of fractional images (“imagelets”), which facilitates the examination of large datasets. These concepts are coordinated within a data-visualization pipeline, which operates on component data blocks sized to fit within the available computing resources. A key feature of the scheme is that the metadata, which tag the data blocks, can be propagated and applied consistently: at the disk level; in distributing the computations across parallel processors; in “imagelet” composition; and in feature tagging. The work reflects the emerging challenges and opportunities presented by the ongoing progress in high-performance computing (HPC) and the deployment of the data, computational, and visualization Grids.
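
The abstract gives no code, but the central idea of tagging data blocks with metadata that every pipeline stage propagates consistently can be sketched roughly as follows (hypothetical names, not SDSC's API):

```python
# Rough sketch (hypothetical names, not SDSC's API) of a block-based
# visualization pipeline in which every data block carries metadata
# ("Floor-Plan"-style tags) that each stage must propagate unchanged,
# so the tags stay usable at the disk level, across parallel processors
# and during imagelet composition.
from dataclasses import dataclass, field

@dataclass
class Block:
    data: list                                  # stand-in for one sub-volume of the dataset
    meta: dict = field(default_factory=dict)    # layout/extent/feature tags

def stage(fn):
    """Wrap a per-block processing function so that metadata is carried along."""
    def run(block: Block) -> Block:
        return Block(fn(block.data), dict(block.meta))   # propagate tags unchanged
    return run

normalize = stage(lambda d: [v / max(d) for v in d])
threshold = stage(lambda d: [v for v in d if v > 0.5])

blk = Block([1.0, 4.0, 9.0], meta={"extent": (0, 0, 0, 2, 0, 0)})
for s in (normalize, threshold):
    blk = s(blk)
print(blk.meta, blk.data)                       # the tags survive every stage
```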

Relevance:

20.00%

Publisher:

Abstract:

We present a scheme which offers a significant reduction in the resources required to implement linear optics quantum computing. The scheme is a variation of the proposal of Knill, Laflamme and Milburn, and makes use of an incremental approach to the error encoding to boost the probability of success.

Relevance:

20.00%

Publisher:

Abstract:

A new method is presented to determine an accurate eigendecomposition of difficult low-temperature unimolecular master equation problems. Based on a generalisation of the Nesbet method, the new method is capable of achieving complete spectral resolution of the master equation matrix with relative accuracy in the eigenvectors. The method is applied to a test case of the decomposition of ethane at 300 K from a microcanonical initial population, with energy transfer modelled by both Ergodic Collision Theory and the exponential-down model. We also demonstrate that quadruple-precision (16-byte) arithmetic is required irrespective of the eigensolution method used. (C) 2001 Elsevier Science B.V. All rights reserved.
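
The point about precision can be illustrated with a toy example (our illustration, not the authors' code): in Python, the mpmath library provides about 34 significant digits, comparable to quadruple precision, and can resolve eigenvalue separations that double precision cannot. The 2x2 matrix below is a deliberately ill-conditioned stand-in, not a real master-equation matrix.

```python
# Illustration only (not the authors' code): mpmath at ~34 significant digits,
# comparable to IEEE quadruple precision, resolves the near-degenerate pair of
# eigenvalues of this toy matrix, whose separation is below what 64-bit double
# precision can distinguish.
from mpmath import mp

mp.dps = 34                                     # roughly quadruple precision

A = mp.matrix([['-1.0',                  '1e-18'],
               ['1e-18', '-1.000000000000000001']])

E, ER = mp.eig(A)                               # eigenvalues and right eigenvectors
print(E)                                        # the near-degenerate pair is resolved
```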

Relevance:

20.00%

Publisher:

Abstract:

In this paper we propose a second linearly scalable method for solving large master equations arising in the context of gas-phase reactive systems. The new method is based on the well-known shift-invert Lanczos iteration, with the inverse of the master equation matrix applied via a GMRES iteration that is preconditioned using the diffusion approximation to the master equation. In this way we avoid the cubic scaling of traditional master equation solution methods while maintaining the speed of a partial spectral decomposition. The method is tested using a master equation modeling the formation of propargyl from the reaction of singlet methylene with acetylene, proceeding through long-lived isomerizing intermediates. (C) 2003 American Institute of Physics.
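
A rough SciPy sketch of the structure described above, shift-invert Lanczos in which the shifted inverse is applied with preconditioned GMRES rather than a direct factorization, is given below. It is our illustration only: the tridiagonal matrix is a toy stand-in for the master equation matrix, and an incomplete LU factorization stands in for the paper's diffusion-approximation preconditioner.

```python
# Sketch: shift-invert Lanczos (via ARPACK's eigsh) where (A - sigma I)^{-1} v
# is applied with preconditioned GMRES instead of a factorization, mirroring
# the method described above. Placeholder problem data throughout.
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 2000
A = sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n), format="csc")  # toy symmetric stand-in
sigma = 0.0                                   # shift near the slow (near-zero) eigenvalues
shifted = (A - sigma * sp.identity(n, format="csc")).tocsc()

ilu = spla.spilu(shifted)                     # stand-in for the diffusion-approximation preconditioner
prec = spla.LinearOperator((n, n), matvec=ilu.solve)

def apply_shift_inverse(v):
    """Apply (A - sigma I)^{-1} v via preconditioned GMRES (no direct factorization of A)."""
    x, info = spla.gmres(shifted, v, M=prec)
    return x

OPinv = spla.LinearOperator((n, n), matvec=apply_shift_inverse)
vals, vecs = spla.eigsh(A, k=5, sigma=sigma, OPinv=OPinv, which="LM")
print(vals)                                   # the five eigenvalues closest to sigma
```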

Relevance:

20.00%

Publisher:

Abstract:

In this paper we propose a novel, fast and linearly scalable method for solving master equations arising in the context of gas-phase reactive systems, based on an existing stiff ordinary differential equation integrator. The required solution of a linear system involving the Jacobian matrix is achieved using a GMRES iteration preconditioned with the diffusion approximation to the master equation. In this way we avoid the cubic scaling of traditional master equation solution methods and maintain the low-temperature robustness of numerical integration. The method is tested using a master equation modelling the formation of propargyl from the reaction of singlet methylene with acetylene, proceeding through long-lived isomerizing intermediates. (C) 2003 American Institute of Physics.
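
The key building block here is the iterative solution of the Jacobian system inside each implicit integration step. The sketch below (our illustration, not the authors' code) uses a single backward-Euler step on a toy linear problem in place of a production stiff integrator, and a simple Jacobi preconditioner in place of the paper's diffusion-approximation preconditioner.

```python
# Sketch (not the authors' code): inside each step of a stiff implicit
# integrator, the linear system involving the Jacobian is solved with
# preconditioned GMRES rather than a direct factorization. Backward Euler on
# dy/dt = J y stands in for the production integrator.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n, h = 1000, 1.0e-3
J = sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n), format="csr") * 1.0e4  # stiff toy Jacobian

A = sp.identity(n, format="csr") - h * J        # matrix of the implicit step
d = A.diagonal()                                # data for the Jacobi (diagonal) preconditioner
M = spla.LinearOperator((n, n), matvec=lambda v: v / d)

def backward_euler_step(y):
    """Advance y by one step of dy/dt = J y, solving (I - h J) y_new = y with GMRES."""
    y_new, info = spla.gmres(A, y, M=M)
    return y_new

y = np.ones(n)
for _ in range(10):
    y = backward_euler_step(y)
print(y[:5])
```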

Relevance:

20.00%

Publisher:

Abstract:

A novel high-throughput and scalable unified architecture for the computation of the transform operations in video codecs for advanced standards is presented in this paper. This structure can be used as a hardware accelerator in modern embedded systems to efficiently compute all the two-dimensional 4 x 4 and 2 x 2 transforms of the H.264/AVC standard. Moreover, its highly flexible design and hardware efficiency allow it to be easily scaled, in terms of performance and hardware cost, to meet the specific requirements of any given video coding application. Experimental results obtained using a Xilinx Virtex-5 FPGA demonstrated the superior performance and hardware efficiency of the proposed structure, which achieves a throughput per unit of area higher than that of other similar, recently published designs targeting the H.264/AVC standard. Such results also showed that, when integrated in a multi-core embedded system, this architecture provides speedup factors of about 120x over pure software implementations of the transform algorithms, therefore allowing all the above-mentioned transforms to be computed in real time for Ultra High Definition Video (UHDV) sequences (4,320 x 7,680 @ 30 fps).
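
As a point of reference for the kind of operation being accelerated, the 4 x 4 forward core transform of H.264/AVC can be written as W = Cf · X · Cf^T with a small integer matrix Cf. The sketch below is a plain software reference (our illustration, not the paper's hardware), with the subsequent scaling/quantisation step omitted.

```python
# Software reference (not the paper's hardware) for one of the transforms the
# architecture above accelerates: the 4x4 forward core transform of H.264/AVC,
# W = Cf . X . Cf^T, scaling/quantisation omitted.
import numpy as np

Cf = np.array([[1,  1,  1,  1],
               [2,  1, -1, -2],
               [1, -1, -1,  1],
               [1, -2,  2, -1]], dtype=np.int32)

def forward_4x4(block: np.ndarray) -> np.ndarray:
    """Apply the integer 4x4 forward core transform to one residual block."""
    return Cf @ block @ Cf.T

residual = np.arange(16, dtype=np.int32).reshape(4, 4)   # toy residual block
print(forward_4x4(residual))
```

Hardware accelerators of this kind are typically benchmarked against exactly such pure software baselines, which is how speedup figures like the 120x quoted above are obtained.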

Relevance:

20.00%

Publisher:

Abstract:

Conference: IEEE 24th International Conference on Application-Specific Systems, Architectures and Processors (ASAP), Jun 05-07, 2013

Relevance:

20.00%

Publisher:

Abstract:

Consider the problem of designing an algorithm for acquiring sensor readings. Consider specifically the problem of obtaining an approximate representation of sensor readings where (i) sensor readings originate from different sensor nodes, (ii) the number of sensor nodes is very large, (iii) all sensor nodes are deployed in a small area (a dense network) and (iv) all sensor nodes communicate over a communication medium where at most one node can transmit at a time (a single broadcast domain). We present an efficient algorithm for this problem with two desirable properties: (i) it obtains an interpolation based on all sensor readings and (ii) it is scalable, that is, its time complexity is independent of the number of sensor nodes. Achieving these two properties is possible thanks to the close interlinking of the information processing algorithm, the communication system and a model of the physical world.
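
The abstract does not spell out the mechanism, but one plausible instantiation of the idea can be sketched as follows (our reconstruction, with names of our own choosing, not the paper's code): prioritised access to the shared medium lets the node whose reading is worst represented by the current interpolation transmit next, and after a fixed number of rounds the sink interpolates from only those readings.

```python
# Hedged reconstruction (ours, not the paper's code). One node transmits per
# round; prioritised medium access lets the node with the largest residual
# against the current interpolation win the channel. After a fixed number of
# rounds k the sink builds an inverse-distance-weighted interpolation from the
# k broadcast readings, so the number of transmissions is independent of the
# number of nodes n. The max() below only simulates the outcome of the
# prioritised medium access; in the real system each node evaluates its own
# residual locally.
import random

def interpolate(x, samples):
    """Inverse-distance-weighted estimate at position x from (pos, value) pairs."""
    if not samples:
        return 0.0
    weights = [(1.0 / (abs(x - p) + 1e-9), v) for p, v in samples]
    return sum(w * v for w, v in weights) / sum(w for w, _ in weights)

def acquire(readings, k=8):
    """readings: dict pos -> value for n nodes; exactly k broadcasts occur."""
    broadcast = []
    for _ in range(k):                 # k rounds, independent of n
        pos = max(readings, key=lambda p: abs(readings[p] - interpolate(p, broadcast)))
        broadcast.append((pos, readings[pos]))
    return broadcast

n = 10_000
readings = {i / n: (i / n) ** 2 + random.gauss(0.0, 0.01) for i in range(n)}
samples = acquire(readings)
print(interpolate(0.5, samples))       # rough reconstruction of the field at x = 0.5
```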