982 results for Data handling


Relevance:

30.00%

Publisher:

Abstract:

Due to the relative transparency of its embryos and larvae, the zebrafish is an ideal model organism for bioimaging approaches in vertebrates. Novel microscope technologies allow the imaging of developmental processes in unprecedented detail, and they enable the use of complex image-based read-outs for high-throughput/high-content screening. Such applications can easily generate terabytes of image data, the handling and analysis of which become a major bottleneck in extracting the targeted information. Here, we describe the current state of the art in computational image analysis in the zebrafish system. We discuss the challenges encountered when handling high-content image data, especially with regard to data quality, annotation, and storage. We survey methods for preprocessing image data for further analysis, and describe selected examples of automated image analysis, including the tracking of cells during embryogenesis, heartbeat detection, identification of dead embryos, recognition of tissues and anatomical landmarks, and quantification of behavioral patterns of adult fish. We review recent examples for applications using such methods, such as the comprehensive analysis of cell lineages during early development, the generation of a three-dimensional brain atlas of zebrafish larvae, and high-throughput drug screens based on movement patterns. Finally, we identify future challenges for the zebrafish image analysis community, notably those concerning the compatibility of algorithms and data formats for the assembly of modular analysis pipelines.
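As an illustration of one of the simpler image-based read-outs mentioned above (heartbeat detection), the sketch below estimates a larval heart rate from the mean pixel intensity of a cropped heart region across a frame sequence. It is a minimal sketch, not taken from any pipeline described in the abstract; the frame stack, frame rate, and physiological frequency band are hypothetical assumptions.

```python
import numpy as np

def estimate_heart_rate(frames: np.ndarray, fps: float) -> float:
    """Estimate beats per minute from a (time, height, width) stack of
    grayscale frames cropped around the heart region."""
    # Mean intensity per frame: periodic contraction modulates brightness.
    trace = frames.reshape(frames.shape[0], -1).mean(axis=1)
    # Remove the constant baseline so the FFT peak reflects the heartbeat.
    trace = trace - trace.mean()
    spectrum = np.abs(np.fft.rfft(trace))
    freqs = np.fft.rfftfreq(len(trace), d=1.0 / fps)
    # Restrict to a plausible physiological band (1-5 Hz here, an assumption).
    band = (freqs >= 1.0) & (freqs <= 5.0)
    dominant = freqs[band][np.argmax(spectrum[band])]
    return dominant * 60.0

# Hypothetical usage: 300 frames at 30 fps with a synthetic 2.5 Hz beat.
t = np.arange(300) / 30.0
fake_frames = (np.sin(2 * np.pi * 2.5 * t)[:, None, None]
               * np.ones((1, 16, 16)) + 10.0)
print(round(estimate_heart_rate(fake_frames, fps=30.0), 1))  # ~150 bpm
```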

Relevance:

30.00%

Publisher:

Abstract:

Ontology-Based Data Access (OBDA) allows accessing different kinds of data sources (traditionally databases) using a more abstract model provided by an ontology. Query rewriting uses such an ontology to rewrite a query into a rewritten query that can be evaluated on the data source. The rewritten queries retrieve the answers that are entailed by the combination of the data explicitly stored in the data source, the original query and the ontology. Because it operates on queries alone, query rewriting enables OBDA over any data source that can be queried, regardless of whether the source can be modified. However, producing and evaluating the rewritten queries are both costly processes that generally become more complex as the expressiveness and size of the ontology and queries increase. In this thesis we explore several optimisations that can be performed both in the rewriting process and in the rewritten queries to improve the applicability of OBDA in real contexts. Our main technical contribution is a query rewriting system that implements the optimisations presented in this thesis. These optimisations are the core contributions of the thesis and can be grouped into three different groups: optimisations that can be applied when considering which predicates in the ontology are actually mapped to the data sources; engineering optimisations that can be applied by handling the query rewriting process in a way that reduces the computational load of the query generation process; and optimisations that can be applied when considering additional meta-information about the characteristics of the ABox. In this thesis we provide formal proofs of the correctness and completeness of the proposed optimisations, and an empirical evaluation of their impact. As an additional contribution, as part of this empirical approach, we propose a benchmark for the evaluation of query rewriting systems, and we provide some guidelines for the creation and expansion of this kind of benchmark.
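To make the rewriting step concrete, here is a minimal illustrative sketch (not the thesis system): a single query atom is expanded into the union of atoms entailed by toy subclass axioms, and rewritings over predicates with no mapping to the data source are pruned, in the spirit of the first group of optimisations. The ontology, predicate names, and mapping set are hypothetical.

```python
from typing import Dict, Set

def rewrite_atom(atom: str,
                 subclass_of: Dict[str, Set[str]],
                 mapped: Set[str]) -> Set[str]:
    """Return the union of atoms whose answers are entailed for `atom`:
    the atom itself plus all (transitive) subclasses, keeping only
    predicates that are actually mapped to the data source."""
    # Invert the subclass axioms: class -> direct subclasses.
    subs: Dict[str, Set[str]] = {}
    for child, parents in subclass_of.items():
        for parent in parents:
            subs.setdefault(parent, set()).add(child)
    # Transitive closure by a simple worklist traversal.
    result, frontier = {atom}, [atom]
    while frontier:
        current = frontier.pop()
        for child in subs.get(current, set()):
            if child not in result:
                result.add(child)
                frontier.append(child)
    # Optimisation: drop rewritings over predicates with no mapping.
    return {a for a in result if a in mapped}

# Toy ontology: Professor ⊑ Teacher ⊑ Person; Student ⊑ Person.
ontology = {"Professor": {"Teacher"}, "Teacher": {"Person"}, "Student": {"Person"}}
mapped_predicates = {"Person", "Student", "Professor"}   # Teacher has no mapping
print(rewrite_atom("Person", ontology, mapped_predicates))
# {'Person', 'Student', 'Professor'} -> evaluated as a UNION over the source
```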

Relevance:

30.00%

Publisher:

Abstract:

EMIR (Balcells et al. 2000) is a near-infrared wide-field camera and multi-object spectrograph being built for the GTC. The Data Reduction Pipeline (DRP) will be optimized for handling and reducing near-infrared data acquired with EMIR.

Relevance:

30.00%

Publisher:

Abstract:

National Highway Traffic Safety Administration, Washington, D.C.

Relevance:

30.00%

Publisher:

Abstract:

National Highway Traffic Safety Administration, Washington, D.C.

Relevance:

30.00%

Publisher:

Abstract:

A concept has been developed where characteristic load cycles of longwall shields can describe most of the interaction between a longwall support and the roof. A characteristic load cycle is the change in support pressure with time from setting the support against the roof to the next release and movement of the support. The concept has been validated through the back-analysis of more than 500 000 individual load cycles in five longwall panels at four mines and seven geotechnical domains. The validation process depended upon the development of new software capable of both handling the large quantity of data emanating from a modern longwall and accurately delineating load cycles. Existing software was found not to be capable of delineating load cycles to a sufficient accuracy. Load-cycle analysis can now be used quantitatively to assess the adequacy of support capacity and the appropriateness of set pressure for the conditions under which a longwall is being operated. When linked to a description of geotechnical conditions, this has allowed the development of a database for support selection for greenfield sites. For existing sites, the load-cycle characteristic concept allows for a diagnosis of strata-support problem areas, enabling changes to be made to set pressure and mining strategies to manage better, or avoid, strata control problems. With further development of the software, there is the prospect of developing a system that is able to respond to changes in strata-support interaction in real time.
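The sketch below illustrates the basic delineation step that such software must perform: splitting a shield leg-pressure time series into load cycles running from the set event to the following release. It is a minimal sketch under assumed conditions; the pressure values, thresholds, and the derived rise-rate metric are illustrative, not the paper's algorithm or parameters.

```python
import numpy as np

def delineate_load_cycles(pressure: np.ndarray,
                          set_threshold: float,
                          release_threshold: float):
    """Split a leg-pressure time series into (start, end) index pairs, where a
    cycle runs from the support being set against the roof (pressure rising
    above `set_threshold`) to its release (pressure dropping below
    `release_threshold`). Thresholds are illustrative assumptions."""
    cycles, start = [], None
    for i in range(1, len(pressure)):
        if start is None and pressure[i - 1] < set_threshold <= pressure[i]:
            start = i                      # support set: cycle begins
        elif start is not None and pressure[i] < release_threshold:
            cycles.append((start, i))      # support released: cycle ends
            start = None
    return cycles

# Synthetic shield pressure (MPa): three set/load/release cycles.
p = np.concatenate([np.r_[5, 25, np.linspace(26, 38, 8), 4] for _ in range(3)])
for s, e in delineate_load_cycles(p, set_threshold=20.0, release_threshold=10.0):
    rate = (p[e - 1] - p[s]) / (e - 1 - s)    # pressure rise over the cycle
    print(f"cycle {s}-{e}: set {p[s]:.0f} MPa, final {p[e-1]:.0f} MPa, "
          f"rise {rate:.2f} MPa/step")
```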

Relevance:

30.00%

Publisher:

Abstract:

Retrieving large amounts of information over wide area networks, including the Internet, is problematic due to issues arising from latency of response, lack of direct memory access to data serving resources, and fault tolerance. This paper describes a design pattern for solving the issues of handling results from queries that return large amounts of data. Typically these queries would be made by a client process across a wide area network (or Internet), with one or more middle-tiers, to a relational database residing on a remote server. The solution involves implementing a combination of data retrieval strategies, including the use of iterators for traversing data sets and providing an appropriate level of abstraction to the client, double-buffering of data subsets, multi-threaded data retrieval, and query slicing. This design has recently been implemented and incorporated into the framework of a commercial software product developed at Oracle Corporation.
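A minimal sketch of the pattern's core ideas, assuming a placeholder `fetch_slice(offset, limit)` call standing in for the remote query: results are exposed to the client through an iterator over fixed-size slices, and the next slice is prefetched on a background thread (double-buffering) so network latency overlaps with local processing. This is not the Oracle implementation, only an illustration of the combined strategies.

```python
import threading
from typing import Callable, Iterator, List, Sequence

def sliced_results(fetch_slice: Callable[[int, int], Sequence],
                   slice_size: int) -> Iterator:
    """Iterate over a large remote result set one slice at a time, always
    prefetching the next slice on a background thread so the client overlaps
    network latency with processing of the current slice."""
    offset, buffer = 0, list(fetch_slice(0, slice_size))
    while buffer:
        next_holder: List[Sequence] = []
        worker = threading.Thread(
            target=lambda: next_holder.append(fetch_slice(offset + slice_size,
                                                          slice_size)))
        worker.start()            # fetch the next slice while this one is consumed
        yield from buffer
        worker.join()
        offset += slice_size
        buffer = list(next_holder[0]) if next_holder else []

# Hypothetical usage against an in-memory stand-in for a remote table.
table = [f"row-{i}" for i in range(10)]
fake_fetch = lambda offset, limit: table[offset:offset + limit]
print(list(sliced_results(fake_fetch, slice_size=4)))
```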

Relevance:

30.00%

Publisher:

Abstract:

Recent developments in service-oriented and distributed computing have created exciting opportunities for the integration of models in service chains to create the Model Web. This offers the potential for orchestrating web data and processing services in complex chains; a flexible approach which exploits the increased access to products and tools, and the scalability offered by the Web. However, the uncertainty inherent in data and models must be quantified and communicated in an interoperable way, in order for its effects to be effectively assessed as errors propagate through complex automated model chains. We describe a proposed set of tools for handling, characterizing and communicating uncertainty in this context, and show how they can be used to 'uncertainty-enable' Web Services in a model chain. An example implementation is presented, which combines environmental and publicly-contributed data to produce estimates of sea-level air pressure, with estimates of uncertainty which incorporate the effects of model approximation as well as the uncertainty inherent in the observational and derived data.
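The sketch below shows one common way such uncertainty can be propagated through a step of a model chain: Monte Carlo sampling of uncertain inputs through a simplified barometric reduction to sea-level pressure, returning a mean and standard deviation that a downstream service could consume. The reduction formula, input values, and uncertainty magnitudes are illustrative assumptions, not the tools described in the paper.

```python
import random
import statistics

def sea_level_pressure_mc(station_hpa: float, station_sd: float,
                          temperature_c: float, temp_sd: float,
                          elevation_m: float, elev_sd: float,
                          n_samples: int = 5000):
    """Propagate observational uncertainty through a simplified sea-level
    pressure reduction by Monte Carlo sampling, returning (mean, sd)."""
    samples = []
    for _ in range(n_samples):
        p = random.gauss(station_hpa, station_sd)      # uncertain observation
        t = random.gauss(temperature_c, temp_sd)
        z = random.gauss(elevation_m, elev_sd)         # uncertain derived datum
        # Simplified barometric reduction of station pressure to sea level.
        p0 = p * (1 - 0.0065 * z / (t + 273.15 + 0.0065 * z)) ** -5.257
        samples.append(p0)
    return statistics.mean(samples), statistics.stdev(samples)

mean_p, sd_p = sea_level_pressure_mc(station_hpa=985.0, station_sd=0.8,
                                     temperature_c=12.0, temp_sd=1.5,
                                     elevation_m=250.0, elev_sd=10.0)
print(f"sea-level pressure: {mean_p:.1f} ± {sd_p:.1f} hPa")
```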

Relevance:

30.00%

Publisher:

Abstract:

Pulse compression techniques originated in radar. The present work is concerned with the utilization of these techniques in general, and the linear FM (LFM) technique in particular, for communications. It introduces these techniques from an optimum communications viewpoint and outlines their capabilities. It also considers the candidacy of the class of LFM signals for digital data transmission and the LFM spectrum. Work related to the utilization of LFM signals for digital data transmission has been mostly experimental and mainly concerned with employing two rectangular LFM pulses (or chirps) with reversed slopes to convey the bits 1 and 0 in an incoherent mode. No systematic theory for LFM signal design and system performance has been available. Accordingly, the present work establishes such a theory, taking into account coherent and noncoherent single-link and multiplex signalling modes. Some new results concerning the slope-reversal chirp pair are obtained. The LFM technique combines the typical capabilities of pulse compression with relative ease of implementation. However, these merits are often hampered by the difficulty of handling the LFM spectrum, which cannot generally be expressed in closed form. The common practice is to obtain a plot of this spectrum with a digital computer for every single set of LFM pulse parameters. Moreover, reported work has been justly confined to the spectrum of an ideally rectangular chirp pulse with no rise or fall times. Accordingly, the present work comprises a systematic study of the LFM spectrum which takes the rise and fall times of the chirp pulse into account and can accommodate any LFM pulse with any parameters. It formulates rather simple and accurate prediction criteria concerning the behaviour of this spectrum in the different frequency regions. These criteria would facilitate the handling of the LFM technique in theory and practice.
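Since the LFM spectrum has no general closed form, the usual numerical approach is illustrated below: a complex baseband chirp with a trapezoidal envelope (finite rise and fall times) is generated and its magnitude spectrum evaluated by FFT. The pulse duration, sweep bandwidth, rise time, and sampling rate are hypothetical example parameters, not values from the thesis.

```python
import numpy as np

def lfm_spectrum(duration: float, bandwidth: float, rise: float,
                 fs: float = 200e6):
    """Numerically evaluate the normalized magnitude spectrum of a linear-FM
    (chirp) pulse whose envelope has finite rise and fall times."""
    t = np.arange(0, duration, 1.0 / fs)
    k = bandwidth / duration                  # chirp rate (Hz per second)
    phase = np.pi * k * t ** 2                # instantaneous phase of the LFM law
    # Trapezoidal envelope: linear rise, flat top, linear fall.
    envelope = np.minimum.reduce([t / rise, (duration - t) / rise,
                                  np.ones_like(t)])
    envelope = np.clip(envelope, 0.0, 1.0)
    pulse = envelope * np.exp(1j * phase)
    spectrum = np.fft.fftshift(np.fft.fft(pulse, 4 * len(t)))
    freqs = np.fft.fftshift(np.fft.fftfreq(4 * len(t), d=1.0 / fs))
    return freqs, np.abs(spectrum) / np.max(np.abs(spectrum))

# Hypothetical 10 µs pulse sweeping 5 MHz with a 0.5 µs rise/fall time.
freqs, mag = lfm_spectrum(duration=10e-6, bandwidth=5e6, rise=0.5e-6)
inband = mag[(freqs > 0) & (freqs < 5e6)].mean()
print(f"mean in-band relative magnitude: {inband:.2f}")
```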

Relevance:

30.00%

Publisher:

Abstract:

Most of the existing work on information integration in the Semantic Web concentrates on resolving schema-level problems. Specific issues of data-level integration (instance coreferencing, conflict resolution, handling uncertainty) are usually tackled by applying the same techniques as for ontology schema matching or by reusing solutions produced in the database domain. However, data structured according to OWL ontologies has specific features: e.g., classes are organized into a hierarchy, properties are inherited, and data constraints differ from those defined by a database schema. This paper describes how these features are exploited in our architecture KnoFuss, designed to support data-level integration of semantic annotations.
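As a rough illustration of how a class hierarchy can be exploited during instance coreferencing (this is not the KnoFuss code), the sketch below only compares instance descriptions whose classes are compatible in a toy hierarchy before scoring their labels. The class names, label similarity measure, and threshold are assumptions for the example.

```python
from difflib import SequenceMatcher

# Toy class hierarchy (subclass -> superclass) standing in for an OWL TBox.
SUPERCLASS = {"ConferencePaper": "Publication", "JournalArticle": "Publication"}

def ancestors(cls: str) -> set:
    """All classes an instance of `cls` also belongs to, via the hierarchy."""
    out = {cls}
    while cls in SUPERCLASS:
        cls = SUPERCLASS[cls]
        out.add(cls)
    return out

def coreferent(a: dict, b: dict, threshold: float = 0.85) -> bool:
    """Decide whether two instance descriptions may refer to the same
    individual: the hierarchy rules out incompatible pairs cheaply, and
    compatible pairs are scored by label similarity."""
    if not ancestors(a["class"]) & ancestors(b["class"]):
        return False
    return SequenceMatcher(None, a["label"].lower(),
                           b["label"].lower()).ratio() >= threshold

x = {"class": "ConferencePaper", "label": "KnoFuss: a framework for fusion"}
y = {"class": "Publication", "label": "KnoFuss: A Framework for Fusion"}
z = {"class": "Person", "label": "KnoFuss"}
print(coreferent(x, y), coreferent(x, z))   # True False
```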

Relevance:

30.00%

Publisher:

Abstract:

Big data comes in various ways, types, shapes, forms and sizes. Indeed, almost all areas of science, technology, medicine, public health, economics, business, linguistics and social science are bombarded by ever increasing flows of data begging to be analyzed efficiently and effectively. In this paper, we propose a rough idea of a possible taxonomy of big data, along with some of the most commonly used tools for handling each particular category of bigness. The dimensionality p of the input space and the sample size n are usually the main ingredients in the characterization of data bigness. The specific statistical machine learning technique used to handle a particular big data set will depend on which category it falls in within the bigness taxonomy. Large p small n data sets, for instance, require a different set of tools from the large n small p variety. Among other tools, we discuss Preprocessing, Standardization, Imputation, Projection, Regularization, Penalization, Compression, Reduction, Selection, Kernelization, Hybridization, Parallelization, Aggregation, Randomization, Replication, and Sequentialization. Indeed, it is important to emphasize right away that the so-called no free lunch theorem applies here, in the sense that there is no universally superior method that outperforms all other methods on all categories of bigness. It is also important to stress that simplicity, in the sense of Ockham's razor principle of parsimony (non-plurality), tends to reign supreme when it comes to massive data. We conclude with a comparison of the predictive performance of some of the most commonly used methods on a few data sets.
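A minimal sketch of the dispatch idea described above, assuming an arbitrary size cut-off and illustrative groupings of the listed tools (the paper's own taxonomy boundaries are not reproduced here): a data set's position in a crude (n, p) grid is mapped to a short list of candidate techniques.

```python
def suggest_tools(n: int, p: int, large: int = 100_000) -> list:
    """Map a data set's position in a crude (n, p) bigness taxonomy to a short
    list of candidate techniques. The cut-off and groupings are illustrative
    assumptions only."""
    big_n, big_p = n >= large, p >= large
    if big_p and not big_n:        # large p, small n: curse of dimensionality
        return ["Regularization", "Penalization", "Selection", "Projection"]
    if big_n and not big_p:        # large n, small p: computational burden
        return ["Parallelization", "Aggregation", "Randomization",
                "Sequentialization"]
    if big_n and big_p:            # large in both directions
        return ["Compression", "Reduction", "Kernelization", "Hybridization"]
    return ["Preprocessing", "Standardization", "Imputation"]

print(suggest_tools(n=200, p=500_000))       # genomics-style large p, small n
print(suggest_tools(n=5_000_000, p=40))      # transactional large n, small p
```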

Relevance:

30.00%

Publisher:

Abstract:

Feature selection is important in the medical field for many reasons. However, selecting important variables is a difficult task in the presence of censoring, a unique feature of survival data analysis. This paper proposed an approach to deal with the censoring problem in endovascular aortic repair survival data through Bayesian networks. It was merged and embedded with a hybrid feature selection process that combines Cox's univariate analysis with machine learning approaches, such as ensemble artificial neural networks, to select the most relevant predictive variables. The proposed algorithm was compared with common survival variable selection approaches such as the least absolute shrinkage and selection operator (LASSO) and Akaike information criterion (AIC) methods. The results showed that it was capable of dealing with high censoring in the datasets. Moreover, ensemble classifiers increased the area under the ROC curves of the two datasets, collected separately from two centers located in the United Kingdom. Furthermore, ensembles constructed with center 1 data enhanced the concordance index of center 2 prediction compared to the model built with a single network. Although the size of the final reduced model using the neural networks and their ensembles is greater than that of the other methods, the model outperformed the others in both concordance index and sensitivity for center 2 prediction. This indicates the reduced model is more powerful for cross-center prediction.
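The sketch below illustrates the censoring-aware screening stage of such a hybrid pipeline under simplifying assumptions: instead of the paper's Cox univariate analysis, each covariate is scored by how far its own Harrell C-index departs from 0.5 (a stand-in univariate measure that still respects censoring), and the strongest covariates are kept for a downstream model (the ensemble stage is omitted). The data, effect size, and number of retained features are hypothetical.

```python
import numpy as np

def concordance_index(time, event, risk):
    """Harrell's C: among comparable pairs (an observed event earlier than the
    other subject's follow-up), the fraction in which the higher risk score
    belongs to the earlier event; censored cases never start a pair."""
    num, den = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue
        for j in range(len(time)):
            if time[i] < time[j]:
                den += 1
                num += 1.0 if risk[i] > risk[j] else 0.5 if risk[i] == risk[j] else 0.0
    return num / den if den else 0.5

def screen_features(X, time, event, keep: int):
    """Univariate, censoring-aware screening: score each covariate by the
    departure of its own C-index from 0.5 and keep the `keep` strongest."""
    scores = [abs(concordance_index(time, event, X[:, k]) - 0.5)
              for k in range(X.shape[1])]
    return np.argsort(scores)[::-1][:keep]

# Hypothetical survival data: 100 patients, 6 covariates, heavy censoring.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
time = np.exp(1.0 - 0.8 * X[:, 0] + rng.normal(scale=0.3, size=100))
event = rng.random(100) < 0.3                   # roughly 70% censored
print(screen_features(X, time, event, keep=2))  # covariate 0 should rank first
```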

Relevance:

30.00%

Publisher:

Abstract:

Methods for accessing data on the Web have been the focus of active research over the past few years. In this thesis we propose a method for representing Web sites as data sources. We designed a Data Extractor data retrieval solution that allows us to define queries to Web sites and process the resulting data sets. Data Extractor is being integrated into the MSemODB heterogeneous database management system. With its help, database queries can be distributed over both local and Web data sources within the MSemODB framework.

Data Extractor treats Web sites as data sources, controlling query execution and data retrieval. It works as an intermediary between the applications and the sites. Data Extractor utilizes a twofold "custom wrapper" approach for information retrieval. Wrappers for the majority of sites are easily built using a powerful and expressive scripting language, while complex cases are processed using Java-based wrappers that utilize a specially designed library of data retrieval, parsing and Web access routines. In addition to wrapper development, we thoroughly investigate issues associated with Web site selection, analysis and processing.

Data Extractor is designed to act as a data retrieval server, as well as an embedded data retrieval solution. We also use it to create mobile agents that are shipped over the Internet to the client's computer to perform data retrieval on behalf of the user. This approach allows Data Extractor to distribute and scale well.

This study confirms the feasibility of building custom wrappers for Web sites. This approach provides accurate data retrieval, and power and flexibility in handling complex cases.
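To illustrate the custom-wrapper idea in the simplest possible terms (this is neither the Data Extractor scripting language nor its Java wrapper library), the sketch below treats a Web page as a data source: the wrapper fetches the page and turns a repeated HTML pattern into rows. The URL and the extraction pattern are placeholders for a real site's structure.

```python
import re
import urllib.request
from typing import List, Tuple

class SiteWrapper:
    """A toy 'custom wrapper': it knows how to fetch one page of a Web site
    and turn its repeated markup into rows of a relation."""
    def __init__(self, url: str, row_pattern: str):
        self.url = url
        self.row_pattern = re.compile(row_pattern, re.S)

    def fetch(self) -> str:
        # Download the page as text (placeholder for real access routines).
        with urllib.request.urlopen(self.url, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")

    def query(self) -> List[Tuple[str, ...]]:
        """Execute the wrapper: download the page and extract all rows."""
        return self.row_pattern.findall(self.fetch())

# Hypothetical wrapper for a page listing books as <li>Title - Author</li>.
wrapper = SiteWrapper("http://example.org/books",
                      r"<li>\s*(.+?)\s*-\s*(.+?)\s*</li>")
# rows = wrapper.query()   # each row becomes a (title, author) tuple
```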

Relevance:

30.00%

Publisher:

Abstract:

In recent years, the Internet has grown exponentially and become more complex. This increased complexity potentially introduces more network-level instability, yet for any end-to-end Internet connection, maintaining throughput and reliability at a certain level is very important, because both directly affect the connection's normal operation. Therefore, a challenging research task is to improve a network's connection performance by optimizing its throughput and reliability. This dissertation proposed an efficient and reliable transport layer protocol, called concurrent TCP (cTCP), an extension of the current TCP protocol, to optimize end-to-end connection throughput and enhance end-to-end connection fault tolerance. The proposed cTCP protocol could aggregate multiple paths' bandwidth by supporting concurrent data transfer (CDT) on a single connection. Here concurrent data transfer was defined as the concurrent transfer of data from local hosts to foreign hosts via two or more end-to-end paths. An RTT-based CDT mechanism, which used a path's RTT (round-trip time) to optimize CDT performance, was developed for the proposed cTCP protocol. This mechanism primarily included an RTT-based load distribution and path management scheme, used to optimize connections' throughput and reliability. A congestion control and retransmission policy based on RTT was also provided. According to experiment results, under different network conditions, our RTT-based CDT mechanism could achieve good CDT performance. Finally, a CWND-based CDT mechanism, which used a path's CWND (congestion window) to optimize CDT performance, was introduced. This mechanism primarily included: a CWND-based load allocation scheme, which assigned data to paths based on their CWND to achieve aggregate bandwidth; CWND-based path management, used to optimize connections' fault tolerance; and a congestion control and retransmission management policy, which handled each path separately, similar to regular TCP. According to the corresponding experiment results, this mechanism could achieve near-optimal CDT performance under different network conditions.
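The sketch below illustrates the flavour of an RTT-based load distribution scheme in a few lines: data segments are split across concurrent paths in inverse proportion to each path's measured RTT, so faster paths carry more of the load. It is only a toy allocation rule with hypothetical path names and RTT values, not the dissertation's cTCP implementation.

```python
def rtt_based_allocation(path_rtts_ms: dict, n_segments: int) -> dict:
    """Split `n_segments` data segments across concurrent paths in inverse
    proportion to each path's RTT, handing any rounding leftovers to the
    fastest paths."""
    weights = {path: 1.0 / rtt for path, rtt in path_rtts_ms.items()}
    total = sum(weights.values())
    # Provisional integer shares, then distribute leftovers to fastest paths.
    shares = {p: int(n_segments * w / total) for p, w in weights.items()}
    leftover = n_segments - sum(shares.values())
    for p in sorted(weights, key=weights.get, reverse=True)[:leftover]:
        shares[p] += 1
    return shares

print(rtt_based_allocation({"path-A": 20.0, "path-B": 60.0, "path-C": 120.0},
                           n_segments=100))
# {'path-A': 67, 'path-B': 22, 'path-C': 11}  (faster paths carry more)
```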