25 resultados para downloading of data


Relevância:

100.00% 100.00%

Publicador:

Resumo:

An important competence of human data analysts is to interpret and explain the meaning of the results of data analysis to end-users. However, existing automatic solutions for intelligent data analysis provide limited help to interpret and communicate information to non-expert users. In this paper we present a general approach to generating explanatory descriptions about the meaning of quantitative sensor data. We propose a type of web application: a virtual newspaper with automatically generated news stories that describe the meaning of sensor data. This solution integrates a variety of techniques from intelligent data analysis into a web-based multimedia presentation system. We validated our approach in a real world problem and demonstrate its generality using data sets from several domains. Our experience shows that this solution can facilitate the use of sensor data by general users and, therefore, can increase the utility of sensor network infrastructures.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Microarray technique is rather powerful, as it allows to test up thousands of genes at a time, but this produces an overwhelming set of data files containing huge amounts of data, which is quite difficult to pre-process, separate, classify and correlate for interesting conclusions to be extracted. Modern machine learning, data mining and clustering techniques based on information theory, are needed to read and interpret the information contents buried in those large data sets. Independent Component Analysis method can be used to correct the data affected by corruption processes or to filter the uncorrectable one and then clustering methods can group similar genes or classify samples. In this paper a hybrid approach is used to obtain a two way unsupervised clustering for a corrected microarray data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

An increasing number of neuroimaging studies are concerned with the identification of interactions or statistical dependencies between brain areas. Dependencies between the activities of different brain regions can be quantified with functional connectivity measures such as the cross-correlation coefficient. An important factor limiting the accuracy of such measures is the amount of empirical data available. For event-related protocols, the amount of data also affects the temporal resolution of the analysis. We use analytical expressions to calculate the amount of empirical data needed to establish whether a certain level of dependency is significant when the time series are autocorrelated, as is the case for biological signals. These analytical results are then contrasted with estimates from simulations based on real data recorded with magnetoencephalography during a resting-state paradigm and during the presentation of visual stimuli. Results indicate that, for broadband signals, 50–100 s of data is required to detect a true underlying cross-correlations coefficient of 0.05. This corresponds to a resolution of a few hundred milliseconds for typical event-related recordings. The required time window increases for narrow band signals as frequency decreases. For instance, approximately 3 times as much data is necessary for signals in the alpha band. Important implications can be derived for the design and interpretation of experiments to characterize weak interactions, which are potentially important for brain processing.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Acquired brain injury (ABI) is one of the leading causes of death and disability in the world and is associated with high health care costs as a result of the acute treatment and long term rehabilitation involved. Different algorithms and methods have been proposed to predict the effectiveness of rehabilitation programs. In general, research has focused on predicting the overall improvement of patients with ABI. The purpose of this study is the novel application of data mining (DM) techniques to predict the outcomes of cognitive rehabilitation in patients with ABI. We generate three predictive models that allow us to obtain new knowledge to evaluate and improve the effectiveness of the cognitive rehabilitation process. Decision tree (DT), multilayer perceptron (MLP) and general regression neural network (GRNN) have been used to construct the prediction models. 10-fold cross validation was carried out in order to test the algorithms, using the Institut Guttmann Neurorehabilitation Hospital (IG) patients database. Performance of the models was tested through specificity, sensitivity and accuracy analysis and confusion matrix analysis. The experimental results obtained by DT are clearly superior with a prediction average accuracy of 90.38%, while MLP and GRRN obtained a 78.7% and 75.96%, respectively. This study allows to increase the knowledge about the contributing factors of an ABI patient recovery and to estimate treatment efficacy in individual patients.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Replication Data Management (RDM) aims at enabling the use of data collections from several iterations of an experiment. However, there are several major challenges to RDM from integrating data models and data from empirical study infrastructures that were not designed to cooperate, e.g., data model variation of local data sources. [Objective] In this paper we analyze RDM needs and evaluate conceptual RDM approaches to support replication researchers. [Method] We adapted the ATAM evaluation process to (a) analyze RDM use cases and needs of empirical replication study research groups and (b) compare three conceptual approaches to address these RDM needs: central data repositories with a fixed data model, heterogeneous local repositories, and an empirical ecosystem. [Results] While the central and local approaches have major issues that are hard to resolve in practice, the empirical ecosystem allows bridging current gaps in RDM from heterogeneous data sources. [Conclusions] The empirical ecosystem approach should be explored in diverse empirical environments.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Una apropiada evaluación de los márgenes de seguridad de una instalación nuclear, por ejemplo, una central nuclear, tiene en cuenta todas las incertidumbres que afectan a los cálculos de diseño, funcionanmiento y respuesta ante accidentes de dicha instalación. Una fuente de incertidumbre son los datos nucleares, que afectan a los cálculos neutrónicos, de quemado de combustible o activación de materiales. Estos cálculos permiten la evaluación de las funciones respuesta esenciales para el funcionamiento correcto durante operación, y también durante accidente. Ejemplos de esas respuestas son el factor de multiplicación neutrónica o el calor residual después del disparo del reactor. Por tanto, es necesario evaluar el impacto de dichas incertidumbres en estos cálculos. Para poder realizar los cálculos de propagación de incertidumbres, es necesario implementar metodologías que sean capaces de evaluar el impacto de las incertidumbres de estos datos nucleares. Pero también es necesario conocer los datos de incertidumbres disponibles para ser capaces de manejarlos. Actualmente, se están invirtiendo grandes esfuerzos en mejorar la capacidad de analizar, manejar y producir datos de incertidumbres, en especial para isótopos importantes en reactores avanzados. A su vez, nuevos programas/códigos están siendo desarrollados e implementados para poder usar dichos datos y analizar su impacto. Todos estos puntos son parte de los objetivos del proyecto europeo ANDES, el cual ha dado el marco de trabajo para el desarrollo de esta tesis doctoral. Por tanto, primero se ha llevado a cabo una revisión del estado del arte de los datos nucleares y sus incertidumbres, centrándose en los tres tipos de datos: de decaimiento, de rendimientos de fisión y de secciones eficaces. A su vez, se ha realizado una revisión del estado del arte de las metodologías para la propagación de incertidumbre de estos datos nucleares. Dentro del Departamento de Ingeniería Nuclear (DIN) se propuso una metodología para la propagación de incertidumbres en cálculos de evolución isotópica, el Método Híbrido. Esta metodología se ha tomado como punto de partida para esta tesis, implementando y desarrollando dicha metodología, así como extendiendo sus capacidades. Se han analizado sus ventajas, inconvenientes y limitaciones. El Método Híbrido se utiliza en conjunto con el código de evolución isotópica ACAB, y se basa en el muestreo por Monte Carlo de los datos nucleares con incertidumbre. En esta metodología, se presentan diferentes aproximaciones según la estructura de grupos de energía de las secciones eficaces: en un grupo, en un grupo con muestreo correlacionado y en multigrupos. Se han desarrollado diferentes secuencias para usar distintas librerías de datos nucleares almacenadas en diferentes formatos: ENDF-6 (para las librerías evaluadas), COVERX (para las librerías en multigrupos de SCALE) y EAF (para las librerías de activación). Gracias a la revisión del estado del arte de los datos nucleares de los rendimientos de fisión se ha identificado la falta de una información sobre sus incertidumbres, en concreto, de matrices de covarianza completas. Además, visto el renovado interés por parte de la comunidad internacional, a través del grupo de trabajo internacional de cooperación para evaluación de datos nucleares (WPEC) dedicado a la evaluación de las necesidades de mejora de datos nucleares mediante el subgrupo 37 (SG37), se ha llevado a cabo una revisión de las metodologías para generar datos de covarianza. Se ha seleccionando la actualización Bayesiana/GLS para su implementación, y de esta forma, dar una respuesta a dicha falta de matrices completas para rendimientos de fisión. Una vez que el Método Híbrido ha sido implementado, desarrollado y extendido, junto con la capacidad de generar matrices de covarianza completas para los rendimientos de fisión, se han estudiado diferentes aplicaciones nucleares. Primero, se estudia el calor residual tras un pulso de fisión, debido a su importancia para cualquier evento después de la parada/disparo del reactor. Además, se trata de un ejercicio claro para ver la importancia de las incertidumbres de datos de decaimiento y de rendimientos de fisión junto con las nuevas matrices completas de covarianza. Se han estudiado dos ciclos de combustible de reactores avanzados: el de la instalación europea para transmutación industrial (EFIT) y el del reactor rápido de sodio europeo (ESFR), en los cuales se han analizado el impacto de las incertidumbres de los datos nucleares en la composición isotópica, calor residual y radiotoxicidad. Se han utilizado diferentes librerías de datos nucleares en los estudios antreriores, comparando de esta forma el impacto de sus incertidumbres. A su vez, mediante dichos estudios, se han comparando las distintas aproximaciones del Método Híbrido y otras metodologías para la porpagación de incertidumbres de datos nucleares: Total Monte Carlo (TMC), desarrollada en NRG por A.J. Koning y D. Rochman, y NUDUNA, desarrollada en AREVA GmbH por O. Buss y A. Hoefer. Estas comparaciones demostrarán las ventajas del Método Híbrido, además de revelar sus limitaciones y su rango de aplicación. ABSTRACT For an adequate assessment of safety margins of nuclear facilities, e.g. nuclear power plants, it is necessary to consider all possible uncertainties that affect their design, performance and possible accidents. Nuclear data are a source of uncertainty that are involved in neutronics, fuel depletion and activation calculations. These calculations can predict critical response functions during operation and in the event of accident, such as decay heat and neutron multiplication factor. Thus, the impact of nuclear data uncertainties on these response functions needs to be addressed for a proper evaluation of the safety margins. Methodologies for performing uncertainty propagation calculations need to be implemented in order to analyse the impact of nuclear data uncertainties. Nevertheless, it is necessary to understand the current status of nuclear data and their uncertainties, in order to be able to handle this type of data. Great eórts are underway to enhance the European capability to analyse/process/produce covariance data, especially for isotopes which are of importance for advanced reactors. At the same time, new methodologies/codes are being developed and implemented for using and evaluating the impact of uncertainty data. These were the objectives of the European ANDES (Accurate Nuclear Data for nuclear Energy Sustainability) project, which provided a framework for the development of this PhD Thesis. Accordingly, first a review of the state-of-the-art of nuclear data and their uncertainties is conducted, focusing on the three kinds of data: decay, fission yields and cross sections. A review of the current methodologies for propagating nuclear data uncertainties is also performed. The Nuclear Engineering Department of UPM has proposed a methodology for propagating uncertainties in depletion calculations, the Hybrid Method, which has been taken as the starting point of this thesis. This methodology has been implemented, developed and extended, and its advantages, drawbacks and limitations have been analysed. It is used in conjunction with the ACAB depletion code, and is based on Monte Carlo sampling of variables with uncertainties. Different approaches are presented depending on cross section energy-structure: one-group, one-group with correlated sampling and multi-group. Differences and applicability criteria are presented. Sequences have been developed for using different nuclear data libraries in different storing-formats: ENDF-6 (for evaluated libraries) and COVERX (for multi-group libraries of SCALE), as well as EAF format (for activation libraries). A revision of the state-of-the-art of fission yield data shows inconsistencies in uncertainty data, specifically with regard to complete covariance matrices. Furthermore, the international community has expressed a renewed interest in the issue through the Working Party on International Nuclear Data Evaluation Co-operation (WPEC) with the Subgroup (SG37), which is dedicated to assessing the need to have complete nuclear data. This gives rise to this review of the state-of-the-art of methodologies for generating covariance data for fission yields. Bayesian/generalised least square (GLS) updating sequence has been selected and implemented to answer to this need. Once the Hybrid Method has been implemented, developed and extended, along with fission yield covariance generation capability, different applications are studied. The Fission Pulse Decay Heat problem is tackled first because of its importance during events after shutdown and because it is a clean exercise for showing the impact and importance of decay and fission yield data uncertainties in conjunction with the new covariance data. Two fuel cycles of advanced reactors are studied: the European Facility for Industrial Transmutation (EFIT) and the European Sodium Fast Reactor (ESFR), and response function uncertainties such as isotopic composition, decay heat and radiotoxicity are addressed. Different nuclear data libraries are used and compared. These applications serve as frameworks for comparing the different approaches of the Hybrid Method, and also for comparing with other methodologies: Total Monte Carlo (TMC), developed at NRG by A.J. Koning and D. Rochman, and NUDUNA, developed at AREVA GmbH by O. Buss and A. Hoefer. These comparisons reveal the advantages, limitations and the range of application of the Hybrid Method.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Nowadays, organizations have plenty of data stored in DB databases, which contain invaluable information. Decision Support Systems DSS provide the support needed to manage this information and planning médium and long-term ?the modus operandi? of these organizations. Despite the growing importance of these systems, most proposals do not include its total evelopment, mostly limiting itself on the development of isolated parts, which often have serious integration problems. Hence, methodologies that include models and processes that consider every factor are necessary. This paper will try to fill this void as it proposes an approach for developing spatial DSS driven by the development of their associated Data Warehouse DW, without forgetting its other components. To the end of framing the proposal different Engineering Software focus (The Software Engineering Process and Model Driven Architecture) are used, and coupling with the DB development methodology, (and both of them adapted to DW peculiarities). Finally, an example illustrates the proposal.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Within the European Union, member states are setting up official data catalogues as entry points to access PSI (Public Sector Information). In this context, it is important to describe the metadata of these data portals, i.e., of data catalogs, and allow for interoperability among them. To tackle these issues, the Government Linked Data Working Group developed DCAT (Data Catalog Vocabulary), an RDF vocabulary for describing the metadata of data catalogs. This topic report analyzes the current use of the DCAT vocabulary in several European data catalogs and proposes some recommendations to deal with an inconsistent use of the metadata across countries. The enrichment of such metadata vocabularies with multilingual descriptions, as well as an account for cultural divergences, is seen as a necessary step to guarantee interoperability and ensure wider adoption.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In the last years, there has been an increase in the amount of real-time data generated. Sensors attached to things are transforming how we interact with our environment. Extracting meaningful information from these streams of data is essential for some application areas and requires processing systems that scale to varying conditions in data sources, complex queries, and system failures. This paper describes ongoing research on the development of a scalable RDF streaming engine.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In the last decade, multi-sensor data fusion has become a broadly demanded discipline to achieve advanced solutions that can be applied in many real world situations, either civil or military. In Defence,accurate detection of all target objects is fundamental to maintaining situational awareness, to locating threats in the battlefield and to identifying and protecting strategically own forces. Civil applications, such as traffic monitoring, have similar requirements in terms of object detection and reliable identification of incidents in order to ensure safety of road users. Thanks to the appropriate data fusion technique, we can give these systems the power to exploit automatically all relevant information from multiple sources to face for instance mission needs or assess daily supervision operations. This paper focuses on its application to active vehicle monitoring in a particular area of high density traffic, and how it is redirecting the research activities being carried out in the computer vision, signal processing and machine learning fields for improving the effectiveness of detection and tracking in ground surveillance scenarios in general. Specifically, our system proposes fusion of data at a feature level which is extracted from a video camera and a laser scanner. In addition, a stochastic-based tracking which introduces some particle filters into the model to deal with uncertainty due to occlusions and improve the previous detection output is presented in this paper. It has been shown that this computer vision tracker contributes to detect objects even under poor visual information. Finally, in the same way that humans are able to analyze both temporal and spatial relations among items in the scene to associate them a meaning, once the targets objects have been correctly detected and tracked, it is desired that machines can provide a trustworthy description of what is happening in the scene under surveillance. Accomplishing so ambitious task requires a machine learning-based hierarchic architecture able to extract and analyse behaviours at different abstraction levels. A real experimental testbed has been implemented for the evaluation of the proposed modular system. Such scenario is a closed circuit where real traffic situations can be simulated. First results have shown the strength of the proposed system.