901 results for "Techniques of data analysis"


Relevance: 100.00%

Abstract:

This data set contains two time series of dissolved phosphorus measurements: dissolved organic, inorganic and total phosphorus at biweekly resolution, and dissolved inorganic phosphorus at seasonal resolution. In addition, it provides phosphorus data from soil samples measured in 2007 and fractionated by different acid extractions (Hedley fractions). All data were measured on the main experiment plots of a large grassland biodiversity experiment (the Jena Experiment; see further details below). In the main experiment, 82 grassland plots of 20 x 20 m were established from a pool of 60 species belonging to four functional groups (grasses, legumes, tall herbs and small herbs). In May 2002, varying numbers of plant species from this pool were sown into the plots to create a gradient of plant species richness (1, 2, 4, 8, 16 and 60 species) and functional richness (1, 2, 3 and 4 functional groups). Plots were maintained by biannual (twice-yearly) weeding and mowing.
1. Dissolved phosphorus in soil solution: Suction plates installed at depths of 10, 20, 30 and 60 cm were used to sample soil pore water. Cumulatively extracted soil solution was collected every two weeks from October 2002 to May 2006. The biweekly samples from 2002, 2003 and 2004 were analyzed for dissolved organic phosphorus (DOP), dissolved inorganic phosphorus (PO4P) and total dissolved phosphorus (TDP) with a Continuous Flow Analyzer (CFA SAN ++, SKALAR [Breda, The Netherlands]).
2. Seasonal values of dissolved inorganic phosphorus in soil solution were calculated as volume-weighted means of the biweekly measurements (spring = March to May, summer = June to August, fall = September to November, winter = December to February).
3. Phosphorus fractions in soil: Five independent soil samples per plot were taken to a depth of 0-15 cm with a soil corer of 1 cm inner diameter and combined into one composite sample per plot. A four-step sequential P fractionation (Hedley fractions) was applied, and the concentrations of the P fractions were measured photometrically (molybdenum blue-reactive P) with a Continuous Flow Analyzer (Bran&Luebbe, Germany).
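The seasonal values in item 2 are volume-weighted means: each biweekly concentration is weighted by the volume of soil solution collected. A minimal Python sketch of that calculation follows. It assumes that each sample is assigned to a season by its collection date and that December counts toward the winter of the following year; neither convention is stated in the data description, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Season boundaries as given in the data description.
SEASON_OF_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "fall", 10: "fall", 11: "fall",
}


def season_key(d: date) -> tuple[int, str]:
    """Return (year, season); December is grouped with the next year's winter (assumption)."""
    year = d.year + 1 if d.month == 12 else d.year
    return year, SEASON_OF_MONTH[d.month]


def seasonal_volume_weighted_means(samples):
    """samples: iterable of (collection_date, volume, concentration).

    Returns {(year, season): sum(volume * concentration) / sum(volume)}.
    """
    weighted = defaultdict(float)
    volume = defaultdict(float)
    for d, v, c in samples:
        key = season_key(d)
        weighted[key] += v * c
        volume[key] += v
    return {k: weighted[k] / volume[k] for k in weighted if volume[k] > 0}


# Hypothetical values: volume in mL, PO4P in mg/L.
samples = [(date(2003, 3, 10), 100.0, 0.02), (date(2003, 4, 7), 300.0, 0.06)]
print(seasonal_volume_weighted_means(samples))
```

Weighting by volume means that a large, dilute sample pulls the seasonal value down more than a small, concentrated one, which a plain arithmetic mean would not capture.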

Relevance: 100.00%

Abstract:

The measurements were obtained during two North Sea-wide, star-shaped cruises in summer 1986 and winter 1987, which were carried out to investigate circulation-induced transport and biologically induced pollutant transfer within the interdisciplinary research project "ZISCH - Zirkulation und Schadstoffumsatz in der Nordsee / Circulation and Contaminant Fluxes in the North Sea (1984-1989)". The inventory presents parameters of hydrodynamics, nutrient dynamics, ecosystem dynamics and pollutant dynamics in the pelagic and benthic realms. The research program aimed to quantify the fluxes within the major budgets of the North Sea, especially those of contaminants. In spring 1986, following the phytoplankton spring bloom, and in late winter 1987, at minimal primary production, the North Sea ecosystem was investigated on a star-shaped station grid covering the whole North Sea. Sampling started in the centre, continued with the northwest section and moved counterclockwise around the North Sea, following the residual currents. With this strategy, a time series was obtained in the central North Sea and more synoptic data sets were obtained in the individual sections. In general, advection has to be considered when comparing data from different stations. Each cruise lasted more than six weeks, so a time lag should be taken into account, especially when comparing data from the eastern and western parts of the central and northern North Sea, where samples were taken at the beginning and at the end of the campaign. The ZISCH investigations represented a qualitatively and quantitatively new approach to North Sea research in several respects:
(1) the first simultaneous blanket coverage of all important biological, chemical and physical parameters in the entire North Sea ecosystem; (2) the first simultaneous measurements of major contaminants (metals and organohalogen compounds) in the different ecosystem compartments; (3) simultaneous determination of atmospheric inputs of momentum, energy and matter as important ecosystem boundary conditions; (4) execution of the complex measurement program in two seasons, namely the spring plankton bloom and the subsequent winter period of minimal biological activity; and (5) support of data analysis and interpretation by oceanographic and meteorological numerical models on the same scales.

Relevance: 100.00%

Abstract:

Background: Several meta-analysis methods can be used to quantitatively combine the results of a group of experiments, including the weighted mean difference (WMD), statistical vote counting (SVC), the parametric response ratio (RR) and the non-parametric response ratio (NPRR). The software engineering (SE) community has focused on the weighted mean difference method. However, other meta-analysis methods have distinct strengths, such as being applicable when variances are not reported. There are as yet no guidelines indicating which method is best in each case. Aim: To compile a set of rules that SE researchers can use to determine which aggregation method is best for the synthesis phase of a systematic review. Method: Monte Carlo simulation varying the number of experiments in the meta-analyses, the number of subjects they include, their variance and the effect size. We empirically calculated the reliability and statistical power in each case. Results: WMD is generally reliable if the variance is low, whereas its power depends on the effect size and the number of subjects per meta-analysis; the reliability of RR is generally unaffected by changes in variance, but it requires more subjects than WMD to be powerful; NPRR is the most reliable method, but it is not very powerful; SVC behaves well when the effect size is moderate, but is less reliable with other effect sizes. Detailed tables of results are annexed. Conclusions: Before undertaking statistical aggregation in software engineering, it is worthwhile checking whether there is any appreciable difference in the reliability and power of the methods. If there is, software engineers should select the method that optimizes both parameters.
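To make two of the compared estimators concrete, the following Python sketch implements a fixed-effect, inverse-variance WMD and a parametric log response ratio, plus a small Monte Carlo loop in the spirit of the simulation described above. The normal data-generating model, the |z| > 1.96 significance rule and the use of the false-positive rate at zero effect as a stand-in for "reliability" are assumptions, not the study's exact protocol; SVC and NPRR are omitted, and all function names are hypothetical.

```python
import math
import random


def _mean_sd(xs):
    """Sample mean and (n - 1) standard deviation."""
    m = sum(xs) / len(xs)
    return m, math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))


def weighted_mean_difference(studies):
    """Fixed-effect, inverse-variance weighted mean difference.

    studies: list of (mean_e, sd_e, n_e, mean_c, sd_c, n_c).
    Returns (pooled estimate, standard error).
    """
    num = den = 0.0
    for m_e, sd_e, n_e, m_c, sd_c, n_c in studies:
        w = 1.0 / (sd_e ** 2 / n_e + sd_c ** 2 / n_c)  # inverse of the variance of m_e - m_c
        num += w * (m_e - m_c)
        den += w
    return num / den, math.sqrt(1.0 / den)


def log_response_ratio(studies):
    """Inverse-variance pooled ln(mean_e / mean_c); requires positive means."""
    num = den = 0.0
    for m_e, sd_e, n_e, m_c, sd_c, n_c in studies:
        var = sd_e ** 2 / (n_e * m_e ** 2) + sd_c ** 2 / (n_c * m_c ** 2)
        num += math.log(m_e / m_c) / var
        den += 1.0 / var
    return num / den, math.sqrt(1.0 / den)


def rejection_rate(method, k, n, delta, sigma=1.0, base=10.0, reps=500, seed=0):
    """Share of simulated meta-analyses whose pooled effect is significant (|z| > 1.96).

    k experiments, n subjects per group, true difference delta.
    With delta > 0 this estimates power; with delta == 0, the false-positive rate.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        studies = []
        for _ in range(k):
            m_e, sd_e = _mean_sd([rng.gauss(base + delta, sigma) for _ in range(n)])
            m_c, sd_c = _mean_sd([rng.gauss(base, sigma) for _ in range(n)])
            studies.append((m_e, sd_e, n, m_c, sd_c, n))
        est, se = method(studies)
        if abs(est / se) > 1.96:
            hits += 1
    return hits / reps
```

Calling `rejection_rate(weighted_mean_difference, k=5, n=10, delta=0.5)` and the same with `log_response_ratio` gives a rough side-by-side power comparison; varying `k`, `n`, `sigma` and `delta` mirrors the factors varied in the study.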

Relevance: 100.00%

Abstract:

The standardization of analytical methods and of the main aspects of the conservation of cultural property began in 2004 with the creation of the European standardization committee CEN/TC 346 Conservation of Cultural Property, which is responsible not only for drafting laboratory test protocols but also for proposing the most suitable recommendations for describing cultural property by consensus and conserving it in the most appropriate way. The paper discusses the origin of these standards, the work carried out, and the fact that many of them, although not aimed specifically at stone, take into account the presence of this material in archaeological objects, works of art, masonry structures and ornamental elements.

Relevance: 100.00%

Abstract:

In this paper we report the results of our experiments on the construction of the opinion ontology. Our aim is to show the benefits of publishing the results of the opinion mining process openly on the Web in a structured form. To this end, we address the research question of to what extent opinion information can be formalized in a unified way. Furthermore, as part of the evaluation, we experiment with Semantic Web technologies and present specific use cases that support our claims.

Relevance: 100.00%

Abstract:

This paper presents a study of the effectiveness of global analysis in the parallelization of logic programs using strict independence. A number of well-known approximation domains are selected and their usefulness for the application at hand is explained. Methods for using the information provided by these domains to improve parallelization are also proposed. Local and global analyses are built using these domains, and the analyses are embedded in a complete parallelizing compiler. The performance of the domains (and of the system in general) for this application is then assessed through a number of experiments. We argue that the results offer significant insight into the characteristics of these domains, the demands of the application, and the trade-offs involved.
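Strict independence can be illustrated with a small check: two goals may run in parallel if, at run time, they share no unbound variable. The Python sketch below assumes that a global analysis has already produced the set of variables known to be ground and a set of variable pairs that may share; the function name and this representation are hypothetical and do not reflect the compiler's actual interface.

```python
from itertools import product


def strictly_independent(goal1_vars, goal2_vars, ground, may_share):
    """Decide whether two goals are strictly independent, given analysis information.

    goal1_vars, goal2_vars: variables occurring in each goal.
    ground: variables the analysis proves are bound to ground terms.
    may_share: set of frozenset({x, y}) pairs whose bindings may contain a common variable.
    """
    free1 = set(goal1_vars) - ground
    free2 = set(goal2_vars) - ground
    # A variable not known to be ground that occurs in both goals breaks independence.
    if free1 & free2:
        return False
    # Distinct variables may still be aliased; any possible sharing pair breaks it too.
    return not any(frozenset((x, y)) in may_share for x, y in product(free1, free2))


# p(X, Y), q(Y, Z) with Y known to be ground and no possible aliasing between X and Z:
print(strictly_independent({"X", "Y"}, {"Y", "Z"}, ground={"Y"}, may_share=set()))  # True
```

The more precise the groundness and sharing information, the more goal pairs pass this test statically; in parallelizing compilers of this kind, conditions that cannot be proved are typically turned into run-time checks instead.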

Relevance: 100.00%

Abstract:

Cross‐lingual link discovery in the Web of Data

Relevance: 100.00%

Abstract:

Data grid services have been used to meet the growing demands of applications in terms of data volume and throughput. The large scale, heterogeneity and dynamism of grid environments often make the management and tuning of these data services very complex. Furthermore, current high-performance I/O approaches are highly complex and have specific features that usually require specialized administrator skills. Autonomic computing can help manage this complexity. This paper describes an autonomic subsystem that provides self-management features aimed at efficiently mitigating the I/O problem in a grid environment, thereby enhancing the quality of service (QoS) of data access and storage services in the grid. Our proposal takes into account that data produced in an I/O system is usually not required immediately. Performance improvements therefore concern not only current but also future I/O accesses, since the actual data access usually occurs later. However, the exact time of the next I/O operations is unknown. Our approach therefore uses long-term prediction to forecast the future workload of grid components. This enables the autonomic subsystem to determine the optimal data placement to improve both current and future I/O operations.
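The link between workload prediction and data placement can be sketched in a few lines of Python. The abstract does not describe the long-term predictor, so a simple exponentially weighted moving average (EWMA) of each node's past load stands in for it here; the node names, load values, smoothing factor and function names are all hypothetical.

```python
def ewma_forecast(history, alpha=0.3):
    """Exponentially weighted moving average of past load samples (oldest first)."""
    level = history[0]
    for load in history[1:]:
        level = alpha * load + (1 - alpha) * level
    return level


def choose_placement(load_history_by_node, alpha=0.3):
    """Place new data on the node with the lowest forecast load; return (node, forecasts)."""
    forecasts = {node: ewma_forecast(h, alpha) for node, h in load_history_by_node.items()}
    return min(forecasts, key=forecasts.get), forecasts


history = {"node-a": [90, 60, 30], "node-b": [20, 40, 60]}  # hypothetical I/O load samples
node, forecasts = choose_placement(history)
print(node, forecasts)
```

An EWMA reacts slowly to trends (here it still prefers the node whose load is rising), which illustrates why a dedicated long-term predictor, as the abstract proposes, matters for placement decisions whose benefit materializes later.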

Relevance: 100.00%

Abstract:

The study of temperature gradients in cold stores and containers is a critical issue in the food industry, both for assuring product quality during transport and for minimizing losses. The objective of this work is to develop a new data analysis methodology based on phase space graphs of temperature and enthalpy, collected by means of multi-distributed, low-cost and autonomous wireless sensors and loggers. A transoceanic refrigerated transport of lemons in a reefer container ship from Montevideo (Uruguay) to Cartagena (Spain) was monitored with a network of 39 semi-passive TurboTag RFID loggers and 13 i-button loggers. The transport included intermodal transit from transoceanic to short-sea shipping vessels and a truck trip. Data analysis is carried out using qualitative phase diagrams computed on the basis of the Takens-Ruelle reconstruction of attractors. Fruit stress is quantified in terms of the phase diagram area, which characterizes the cyclic behaviour of temperature. Areas within the enthalpy phase diagram computed for the short-sea shipping leg were 5 times higher than those computed for the long-sea shipping leg, with coefficients of variation above 100% for both periods. This new data analysis methodology highlights the significant heterogeneity of thermohygrometric conditions at different locations in the container.
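The phase-diagram area can be sketched with a two-dimensional delay embedding, plotting T(t) against T(t + lag) in the spirit of the Takens-Ruelle reconstruction named above, and measuring the area covered by the trajectory. The abstract does not say how the area is computed, so this Python sketch uses the convex hull of the embedded points (Andrew's monotone chain plus the shoelace formula) as an assumed area measure; the lag and function names are hypothetical.

```python
import numpy as np


def delay_embed(series, lag):
    """2-D delay embedding: points (x[t], x[t + lag])."""
    x = np.asarray(series, dtype=float)
    return np.column_stack([x[:-lag], x[lag:]])


def _cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counterclockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counterclockwise order."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula; degenerate polygons have zero area."""
    if len(vertices) < 3:
        return 0.0
    v = np.asarray(vertices, dtype=float)
    x, y = v[:, 0], v[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))


def phase_area(series, lag):
    """Area of the convex hull of the delay-embedded temperature series."""
    return polygon_area(convex_hull(delay_embed(series, lag)))
```

As a sanity check, a pure sinusoidal cycle of amplitude A embedded with a quarter-period lag traces a circle, so its area approaches pi * A**2, while a perfectly constant temperature gives zero area. Larger areas therefore indicate stronger temperature cycling at a logger location.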

Relevance: 100.00%

Abstract:

Advances in hardware make it possible to collect huge volumes of data, giving rise to applications that must provide information in near-real time, e.g., patient monitoring or health monitoring of water pipes. The data streaming model has emerged to meet the needs of these applications, as an alternative to the traditional store-then-process model. In the store-then-process model, data are stored and queried later; in streaming, data are processed on the fly as they arrive, producing continuous responses without necessarily being stored. Processing data on the fly poses three challenges: 1) responses must be produced continuously whenever new data arrive in the system; 2) data are accessed only once and are generally not kept in their entirety; and 3) the processing time per data item needed to produce a response must be low. Two models exist for computing continuous responses: the evolving model and the sliding window model. The latter is better suited to applications in which responses must be computed over the most recent data rather than the entire history. In recent years, research on data stream mining has focused mainly on the evolving model. Less work has addressed the sliding window model, because its algorithms must not only be incremental but must also delete the information that expires as the window slides, while still meeting the three challenges above. Clustering is one of the fundamental data mining techniques: given a data set, it finds representative groups that provide a concise description of the data being processed. Clustering is critical in applications such as network intrusion detection and customer segmentation in marketing and advertising.
Due to the huge amount of data that such applications must process (up to millions of events per second), centralized solutions may be unable to meet timing constraints and resort to load shedding, discarding data during load peaks. To avoid discarding data, stream processing tasks such as clustering must be distributed and adapted to environments in which the data are distributed. Research in streaming focuses not only on designs for general tasks such as clustering, but also on new approaches that better fit particular scenarios. For example, an ad-hoc grouping mechanism turns out to be more suitable than traditional k-means for defending against Distributed Denial of Service (DDoS) attacks. This thesis contributes to data stream clustering in both centralized and distributed environments. We present a centralized clustering algorithm that discovers high-quality clusters in low time, and we compare it with state-of-the-art solutions in an extensive evaluation. We have also developed a data structure that significantly reduces memory requirements while keeping the error of the cluster statistics under control at all times. We further provide two distributed clustering protocols and analyze two key aspects: the impact of distributed computation on clustering quality, and the conditions under which distribution reduces processing time compared with the centralized solution. Finally, with respect to ad-hoc grouping techniques, we have developed a DDoS detection framework based on clustering. We have characterized the types of attacks detected and evaluated the efficiency and effectiveness of mitigating the attack impact.
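The extra difficulty of the sliding window model, namely that expired points must be removed incrementally, can be illustrated with a minimal Python sketch. It keeps a fixed number of clusters, each summarized by a count and a linear sum (a simplified form of cluster feature vectors), adds each arriving point to its nearest cluster, and subtracts the point that leaves a count-based window. The class name, the fixed number of clusters and the nearest-center assignment are illustrative assumptions; this is not the algorithm proposed in the thesis.

```python
import math
from collections import deque


class SlidingWindowClusters:
    """Hypothetical sketch: one-pass clustering over the last `window_size` points.

    Each cluster keeps sufficient statistics (count, linear sum) that are
    incremented when a point arrives and decremented when it expires, so no
    pass over the window is needed to update a center.
    """

    def __init__(self, initial_centers, window_size):
        self.centers = [list(c) for c in initial_centers]
        self.counts = [0] * len(self.centers)
        self.sums = [[0.0] * len(c) for c in self.centers]
        self.window = deque()  # (cluster index, point) in arrival order
        self.window_size = window_size

    def _nearest(self, p):
        return min(range(len(self.centers)), key=lambda i: math.dist(p, self.centers[i]))

    def _update_center(self, i):
        # A cluster that becomes empty keeps its last center.
        if self.counts[i] > 0:
            self.centers[i] = [s / self.counts[i] for s in self.sums[i]]

    def add(self, p):
        """Insert a point, expire the oldest one if the window is full; return its cluster."""
        i = self._nearest(p)
        self.counts[i] += 1
        self.sums[i] = [s + x for s, x in zip(self.sums[i], p)]
        self._update_center(i)
        self.window.append((i, p))
        if len(self.window) > self.window_size:
            j, q = self.window.popleft()  # the expired point leaves its cluster
            self.counts[j] -= 1
            self.sums[j] = [s - x for s, x in zip(self.sums[j], q)]
            self._update_center(j)
        return i
```

Each arrival costs O(k) distance computations plus O(1) bookkeeping for the expiring point. The window itself is still stored here, which is exactly the memory cost that more compact summary structures aim to reduce while bounding the error of the cluster statistics.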

Relevance: 100.00%

Abstract:

It is well known that noise analysis is a suitable method for monitoring sensor performance. In particular, tracking the response time of a sensor is an efficient way to anticipate failures and to have the opportunity to prevent them. In this work, the response times of several sensors of Trillo NPP are estimated by means of noise analysis. The procedure consists of modeling each sensor with autoregressive methods and obtaining the desired parameter by analyzing the response of the model to a simulated ramp input signal. Core-exit thermocouples and in-core self-powered neutron detectors are the main sensors analyzed, but other plant sensors are studied as well. Since several measurement campaigns have been carried out, it has also been possible to analyze the evolution of the estimated parameters over more than one fuel cycle. Sensitivity studies on the sampling frequency of the signals and its influence on the response time are also included. Calculations and analyses were performed under a collaboration agreement between the Trillo NPP operator (CNAT) and the School of Mines of Madrid.
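The ramp-based procedure can be sketched as follows: fit an autoregressive (AR) model to the sensor noise signal, rescale it to unit static gain, drive it with a simulated ramp, and take the steady-state delay between input and output as the response time. The Python sketch below rests on assumptions that the abstract does not state: the AR model is fitted by ordinary least squares, its order is chosen by the user, and the model is treated as a single-input filter 1/A(z) with unit DC gain. For a first-order model with coefficient a and sampling interval dt, the steady-state ramp delay is a * dt / (1 - a), which gives a simple check.

```python
import numpy as np


def fit_ar(x, order):
    """Least-squares AR fit: x[n] = sum_k a_k * x[n - k] + e[n] (mean removed first)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    # Column k holds the samples lagged by k + 1 for every predicted sample x[order:].
    X = np.column_stack([x[order - k - 1: len(x) - k - 1] for k in range(order)])
    coeffs, *_ = np.linalg.lstsq(X, x[order:], rcond=None)
    return coeffs


def ramp_response_time(coeffs, dt, n_steps=2000):
    """Steady-state delay of the unit-gain AR model driven by a unit-slope ramp."""
    p = len(coeffs)
    gain = 1.0 - np.sum(coeffs)  # makes the static (DC) gain of the model equal to 1
    u = dt * np.arange(n_steps)  # ramp input with slope 1 per unit time
    y = np.zeros(n_steps)
    for n in range(n_steps):
        acc = gain * u[n]
        for k in range(1, min(p, n) + 1):
            acc += coeffs[k - 1] * y[n - k]
        y[n] = acc
    return u[-1] - y[-1]  # with slope 1, the vertical gap equals the time delay


# Hypothetical first-order sensor driven by white noise.
rng = np.random.default_rng(0)
dt, a_true = 0.1, 0.9  # sampling interval (s) and sensor pole, both assumed
x = np.zeros(20_000)
noise = rng.normal(size=x.size)
for n in range(1, x.size):
    x[n] = a_true * x[n - 1] + noise[n]
coeffs = fit_ar(x, order=1)
print(coeffs, ramp_response_time(coeffs, dt))
```

For the exact coefficient 0.9 and dt = 0.1 s, the formula above gives a response time of 0.9 s; the estimate from the fitted coefficients should be close to that value. Changing `dt` in such a simulation is one way to explore the sampling-frequency sensitivity mentioned in the abstract.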

Relevance: 100.00%

Abstract:

Dimensionality Reduction (DR) is attracting more attention as a result of the increasing need to handle huge amounts of data effectively. DR methods allow the number of initial features to be reduced considerably, down to a set of features that preserves the original properties of the data. However, their use entails an inherent loss of quality that is likely to affect the understanding of the data in data analysis. Because this loss depends on the nature of each method, it can be decisive when selecting a DR method. In this paper, we propose a methodology for analyzing and comparing DR methods with respect to the loss of quality they produce. The methodology uses the concept of preservation of geometry (quality assessment criteria) to assess the loss of quality. Experiments were carried out with the best-known DR algorithms and quality assessment criteria from the literature, applied to 12 real-world datasets. The results obtained so far show that it is possible to establish a method for selecting the most appropriate DR method in terms of minimum loss of quality. The experiments also highlighted some interesting relationships between the quality assessment criteria. Finally, the methodology makes it possible to establish the appropriate target dimensionality for reducing the data with minimum loss of quality.
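One widely used geometry-preservation criterion is k-nearest-neighbor preservation: the fraction of each point's k nearest neighbors in the original space that remain among its k nearest neighbors after reduction. The Python sketch below uses this criterion, with PCA as the example DR method, to pick the smallest target dimensionality that reaches a quality threshold. Both the criterion and PCA are assumptions chosen for illustration; the abstract does not list which criteria or algorithms were used, and the function names are hypothetical.

```python
import numpy as np


def knn_indices(X, k):
    """Indices of the k nearest neighbors of each row (Euclidean), excluding the point itself."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)
    return np.argsort(d2, axis=1)[:, :k]


def neighborhood_preservation(X_high, X_low, k=10):
    """Mean fraction of each point's k nearest neighbors kept after reduction (1.0 = no loss)."""
    hi = knn_indices(X_high, k)
    lo = knn_indices(X_low, k)
    shared = [len(set(a) & set(b)) for a, b in zip(hi, lo)]
    return float(np.mean(shared)) / k


def pca(X, d):
    """Project centered data onto its first d principal components."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:d].T


def smallest_adequate_dim(X, threshold, k=10):
    """Smallest target dimensionality whose PCA embedding reaches the quality threshold."""
    for d in range(1, X.shape[1] + 1):
        if neighborhood_preservation(X, pca(X, d), k) >= threshold:
            return d
    return X.shape[1]
```

Swapping `pca` for another DR method with the same `(X, d)` signature gives a like-for-like comparison of methods under the same criterion. The pairwise distance matrix makes this sketch O(n^2) in memory, so it suits small samples only.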

Relevance: 100.00%

Abstract:

Ion beam therapy is a valuable method for treating deep-seated and radio-resistant tumors thanks to its favorable depth-dose distribution, characterized by the Bragg peak. Hadrontherapy facilities take advantage of the specific ion range, which yields a highly conformal dose in the target volume while reducing the dose to critical organs compared with photon therapy. The need to monitor delivery precision, i.e. the ion range, is unquestionable, and different approaches have therefore been investigated, such as the detection of prompt photons or of annihilation photons from positron-emitting nuclei created during treatment. Based on the measurement of the induced β+ activity, our group has developed various in-beam PET prototypes: the one under test consists of two planar detector heads, each made of four modules with a total active area of 10 × 10 cm². A single detector module consists of a LYSO crystal matrix coupled to a position-sensitive photomultiplier and is read out by dedicated front-end electronics. A preliminary data acquisition was performed at the Italian National Centre for Oncological Hadron Therapy (CNAO, Pavia), using proton beams in the energy range of 93–112 MeV impinging on a plastic phantom. The measured activity profiles are presented and compared with simulated ones based on the Monte Carlo FLUKA package.
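One approach used in PET-based range verification to compare a measured activity profile with a simulated one is to locate the distal fall-off, i.e. the depth beyond the activity maximum at which the activity drops to a given fraction of that maximum, and to report the shift between the two profiles. The Python sketch below assumes both profiles are sampled on the same depth grid in mm; the 50% level, the linear interpolation and the function names are illustrative choices, not necessarily those used by the authors.

```python
import numpy as np


def distal_falloff(depth_mm, activity, level=0.5):
    """Depth beyond the maximum where activity first drops to `level` * max (linear interpolation)."""
    z = np.asarray(depth_mm, dtype=float)
    a = np.asarray(activity, dtype=float)
    i_max = int(np.argmax(a))
    target = level * a[i_max]
    below = np.nonzero(a[i_max:] < target)[0]
    if below.size == 0:
        raise ValueError("profile never falls below the requested level")
    j = i_max + int(below[0])  # first sample past the crossing
    z0, z1, a0, a1 = z[j - 1], z[j], a[j - 1], a[j]
    return z0 + (a0 - target) * (z1 - z0) / (a0 - a1)


def range_shift(depth_mm, measured, simulated, level=0.5):
    """Measured minus simulated fall-off position (positive = measured profile extends deeper)."""
    return distal_falloff(depth_mm, measured, level) - distal_falloff(depth_mm, simulated, level)
```

Real profiles are noisy, so in practice they would usually be smoothed or fitted before locating the fall-off; the sketch omits that step.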

Relevance: 100.00%

Abstract:

In coffee processing, fermentation is considered one of the critical operations because of its impact on the final quality of the product. However, the level of control over the fermentation process on individual farms is often inadequate, and sensors are rarely used to control coffee fermentation. The objective of this work is to characterize the fermentation temperature in a fermentation tank by applying spatial interpolation and a new data analysis methodology based on phase space diagrams of temperature data, collected by means of multi-distributed, low-cost and autonomous wireless sensors. A real coffee fermentation was monitored in the Cauca region (Colombia) with a network of 24 semi-passive TurboTag RFID temperature loggers with vacuum plastic covers, submerged directly in the fermenting mass. The temporal evolution and spatial distribution of temperature are described in terms of phase diagram areas, which characterize the cyclic behaviour of temperature and highlight the significant heterogeneity of thermal conditions at different locations in the tank. The average fermentation temperature was 21.2 °C, with temperature ranges of 4.6 °C and an average spatial standard deviation of ±1.21 °C. The upper part of the tank showed high temperature heterogeneity, the highest temperatures and therefore the highest fermentation rates. At the bottom, by contrast, the phase diagram area was practically half the area occupied by the sensors in the upper part of the tank, so this location showed greater temperature homogeneity.
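The summary statistics quoted above can be computed from a time × sensor temperature matrix. In this Python sketch, the temperature range is taken over all sensors and times, and the spatial standard deviation is computed across sensors at each time step (sample standard deviation, ddof=1) and then averaged over time. Both definitions, like the function names and the upper/bottom sensor grouping, are assumptions, since the abstract gives no formulas.

```python
import numpy as np


def spatial_summary(temps):
    """Summarize a (n_times, n_sensors) temperature matrix in °C.

    Assumed definitions: 'range' is max - min over all sensors and times;
    'mean_spatial_std' is the across-sensor sample std (ddof=1) averaged over time.
    """
    t = np.asarray(temps, dtype=float)
    return {
        "mean": float(t.mean()),
        "range": float(t.max() - t.min()),
        "mean_spatial_std": float(t.std(axis=1, ddof=1).mean()),
    }


def group_mean(temps, sensor_indices):
    """Mean temperature over time for a subset of sensors (e.g. upper vs bottom of the tank)."""
    t = np.asarray(temps, dtype=float)
    return float(t[:, sensor_indices].mean())
```

Comparing `group_mean` for hypothetical upper and bottom sensor index lists gives a first check of the vertical gradient described above, before computing per-location phase diagram areas.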

Relevance: 100.00%

Abstract:

Recently, the mobile data market has entered a growth stage triggered by two factors: the affordability of mobile broadband and the availability of data-friendly devices. At this stage, market growth no longer depends on push strategies from suppliers; instead, demand is now driving the market. However, it will not be easy for mobile operators to cope with the demand expected in the near future. The infrastructure needed to support this demand is far from complete, and operators are forced to make heavy investments to upgrade and expand their networks. To decide how to handle present and upcoming demand, they need to identify and understand the characteristics of the scenarios they face. This is precisely the aim of this article, which provides figures on the consequences of a widespread uptake of mobile media for mobile infrastructures. Data from the Spanish mobile deployment have been used to obtain practical figures and to illustrate the results, but the conclusions can easily be extended to other countries and regions.