16 results for Distributed data
in Consorci de Serveis Universitaris de Catalunya (CSUC), Spain
Abstract:
As research relies more and more on computers, data storage is becoming a scarce resource for projects and accounts for a large share of their total cost. Some projects try to solve this problem by using distributed storage. It is therefore necessary for some centres to provide large amounts of low-cost mass storage based on magnetic tape. The drawback of this solution is that performance drops, particularly when dealing with large numbers of small files. Our goal is to create a hybrid between a high-cost, high-performance disk-based system and a low-cost, low-performance tape-based system. To this end, we will join dCache, a distributed storage system, with Castor, a hierarchical storage system, creating virtual file systems that hold large numbers of small files in order to improve the overall performance of the system.
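To show why grouping small files helps a tape-backed store, here is a minimal Python sketch that assumes a plain tar archive as the container. Many small files are packed into one object, so they go to tape in a single sequential transfer, and each file can still be read back by name. The functions `pack_small_files` and `read_member` are hypothetical. They do not reflect the actual dCache/Castor integration, which the abstract describes as virtual file systems rather than tar archives.

```python
import io
import tarfile


def pack_small_files(files: dict[str, bytes]) -> bytes:
    """Pack many small files into one in-memory tar archive (hypothetical container).

    Storing this single blob on tape replaces many small tape writes
    with one large sequential write.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    # Closing the TarFile writes the end-of-archive blocks but leaves buf open.
    return buf.getvalue()


def read_member(container: bytes, name: str) -> bytes:
    """Extract a single small file from the container by its name."""
    with tarfile.open(fileobj=io.BytesIO(container), mode="r") as tar:
        member = tar.extractfile(name)  # raises KeyError if name is absent
        if member is None:              # not a regular file
            raise KeyError(name)
        return member.read()
```

For example, packing 100 ten-byte files yields one container that can be staged as a single unit, and `read_member(blob, "run/event_042.dat")` recovers one of them.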
Abstract:
This work analyzes and develops improvements in the speed and scalability of a distributed simulator of fish schools. These results were obtained by using a new communication strategy for the logical processes (LPs) and by changing the neighbor-selection algorithm applied to each fish at every simulation step. The proposed approach lets each logical process anticipate its neighbors' future data needs, which reduces communication time by limiting the number of messages exchanged between LPs. The new neighbor-selection algorithm was designed to avoid unnecessary work. It reduces the number of instructions executed at each simulation step for each simulated fish and thereby significantly reduces simulation time.
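The abstract does not detail the new neighbor-selection algorithm. As a hedged illustration of the kind of saving it describes (fewer instructions per fish per step), the sketch below uses a uniform grid, or cell list. This is a common technique, and it is an assumption here, not necessarily the authors' method. Each fish checks only the fish in its own grid cell and the adjacent cells, instead of all N fish. The function names and the `radius` parameter are hypothetical.

```python
import math
from collections import defaultdict


def build_grid(positions, cell_size):
    """Bucket fish indices by the 2-D grid cell that contains them."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(positions):
        grid[(math.floor(x / cell_size), math.floor(y / cell_size))].append(i)
    return grid


def neighbors_grid(positions, radius):
    """For each fish, list the other fish within `radius` using a cell list.

    With cell_size == radius, every neighbor lies in the fish's own cell
    or one of the 8 surrounding cells, so distant fish are never tested.
    """
    grid = build_grid(positions, radius)
    result = []
    for i, (x, y) in enumerate(positions):
        cx, cy = math.floor(x / radius), math.floor(y / radius)
        found = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), ()):
                    if j != i and math.dist(positions[i], positions[j]) <= radius:
                        found.append(j)
        result.append(sorted(found))
    return result


def neighbors_brute(positions, radius):
    """Reference O(N^2) version that tests every pair of fish."""
    return [
        sorted(j for j, q in enumerate(positions)
               if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(positions)
    ]
```

Both functions return the same neighbor lists. The grid version only avoids distance tests between fish that are provably too far apart.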
Abstract:
One challenge when running applications on a cluster is to improve performance while using resources efficiently, and this challenge is greater in a distributed environment. With this challenge in mind, we propose a set of rules for carrying out the computation on each node, based on an analysis of the applications' computation and communication. We analyze a cell-mapping scheme and a method for scheduling the execution order that uses priority-based execution, in which boundary cells have higher priority than interior cells. The experiments show that interior computation overlaps with the communication of boundary cells: speedup increases, efficiency stays above 85%, and execution time improves. We conclude that it is possible to design an overlapping scheme that allows SPMD applications to run efficiently on a cluster.
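The boundary-first priority scheme described above can be sketched as follows. This is a minimal, single-process illustration that assumes hypothetical `compute` and `send` callbacks. A background thread stands in for the nonblocking communication that an MPI code would typically issue, for example `MPI_Isend` followed later by `MPI_Wait`. It is not the authors' implementation.

```python
import threading


def split_cells(rows, cols):
    """Split a rows x cols block of cells into boundary and interior cells."""
    boundary, interior = [], []
    for r in range(rows):
        for c in range(cols):
            if r in (0, rows - 1) or c in (0, cols - 1):
                boundary.append((r, c))
            else:
                interior.append((r, c))
    return boundary, interior


def run_step(rows, cols, compute, send):
    """One SPMD iteration with boundary-first priority (sketch).

    Boundary cells are computed first because other nodes need them.
    Their values are then "sent" by the hypothetical `send` callback in
    a background thread while interior cells are computed, overlapping
    communication with computation. Returns the computation order.
    """
    boundary, interior = split_cells(rows, cols)
    order = []
    for cell in boundary:              # high priority: needed by neighbors
        compute(cell)
        order.append(cell)
    sender = threading.Thread(target=send, args=(list(boundary),))
    sender.start()                     # communication in flight...
    for cell in interior:              # ...overlapped with interior work
        compute(cell)
        order.append(cell)
    sender.join()                      # wait before the next iteration
    return order
```

For a 4 x 5 block, the 14 boundary cells are computed and handed to `send` before any of the 6 interior cells.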
Abstract:
Consider a model with parameter phi, and an auxiliary model with parameter theta. Let phi be randomly sampled from a given density over the known parameter space. Monte Carlo methods can be used to draw simulated data and compute the corresponding estimate of theta, say theta_tilde. A large set of tuples (phi, theta_tilde) can be generated in this manner. Nonparametric methods may be used to fit the function E(phi|theta_tilde=a) using these tuples. It is proposed to estimate phi using the fitted E(phi|theta_tilde=theta_hat), where theta_hat is the auxiliary estimate computed from the real sample data. Under certain assumptions, this estimator is consistent and asymptotically normally distributed. Monte Carlo results for dynamic panel data and vector autoregressions show that this estimator can have very attractive small-sample properties. Confidence intervals can be constructed using the quantiles of the phi values for which theta_tilde is close to theta_hat. Such confidence intervals are found to have very accurate coverage.
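The estimator can be illustrated with a toy example. Here is a minimal sketch under stated assumptions:

- The model is an AR(1) process with parameter phi.
- The auxiliary estimator is the OLS slope of y_t on y_{t-1}. In this toy case it coincides with the model itself, whereas the abstract allows a different auxiliary model.
- The sampling density is uniform on (-0.95, 0.95).
- The nonparametric fit of E(phi|theta_tilde=theta_hat) is a k-nearest-neighbor average. This is only one possible nonparametric method.

The values of `draws` and `k` are hypothetical tuning choices.

```python
import random
import statistics


def simulate_ar1(phi, n, rng):
    """Simulate n observations of y_t = phi * y_{t-1} + e_t, e_t ~ N(0, 1)."""
    y = [0.0]
    for _ in range(n - 1):
        y.append(phi * y[-1] + rng.gauss(0.0, 1.0))
    return y


def auxiliary_estimate(y):
    """Auxiliary estimator theta: OLS slope of y_t on y_{t-1}, no intercept."""
    num = sum(a * b for a, b in zip(y[1:], y[:-1]))
    den = sum(b * b for b in y[:-1])
    return num / den


def simulated_estimate(theta_hat, n, draws=2000, k=100, seed=0):
    """Estimate phi via a k-nearest-neighbor fit of E(phi | theta_tilde = theta_hat).

    Returns the point estimate and a 90% interval taken from the 5% and
    95% quantiles of the phi draws whose theta_tilde is closest to theta_hat.
    """
    rng = random.Random(seed)
    tuples = []
    for _ in range(draws):
        phi = rng.uniform(-0.95, 0.95)            # density over parameter space
        theta_tilde = auxiliary_estimate(simulate_ar1(phi, n, rng))
        tuples.append((abs(theta_tilde - theta_hat), phi))
    tuples.sort()                                 # closest theta_tilde first
    nearest = [phi for _, phi in tuples[:k]]
    qs = statistics.quantiles(nearest, n=20)      # 5%, 10%, ..., 95% cut points
    return statistics.fmean(nearest), (qs[0], qs[-1])
```

A typical use computes `theta_hat = auxiliary_estimate(y)` from the observed series `y` and then calls `simulated_estimate(theta_hat, len(y))`.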
Abstract:
This paper presents the state of the art in distributed systems and applications, with a focus on how to teach them. It presents different platforms on which distributed applications can run and describes some development toolkits that can be used to build prototypes, practical exercises, and distributed applications. It also presents some existing distributed algorithms that are useful for class exercises, and some tools that help manage distributed environments. Finally, the paper presents some teaching experiences that take different approaches to teaching distributed systems.
Abstract:
In this study, we measure how collective MPI operations behave in virtual and physical clusters, and their impact on application performance. As stated earlier, we use Weather Research and Forecasting (WRF) simulations as a test case.
Abstract:
One of the tantalising remaining problems in compositional data analysis is how to deal with data sets in which some components are essential zeros. By an essential zero we mean a component that is truly zero, not something recorded as zero simply because the experimental design or the measuring instrument was not sensitive enough to detect a trace of the part. Such essential zeros occur in many compositional situations, such as household budget patterns, time budgets, palaeontological zonation studies, and ecological abundance studies. Devices such as nonzero replacement and amalgamation are almost invariably ad hoc and unsuccessful in such situations. These examples suggest that it is sensible to build up a model in two stages: the first determines where the zeros will occur, and the second determines how the unit available is distributed among the non-zero parts. In this paper we suggest two such models, an independent binomial conditional logistic normal model and a hierarchical dependent binomial conditional logistic normal model. The compositional data in such modelling consist of an incidence matrix and a conditional compositional matrix. Several interesting statistical problems arise:

- the estimability of parameters;
- the nature of the computational process for estimating both the incidence and the compositional parameters, given the complexity of the subcompositional structure;
- the formation of meaningful hypotheses;
- the devising of suitable testing methodology within a lattice of such essential zero-compositional hypotheses.

The methodology is illustrated by application to both simulated and real compositional data.
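The two-stage structure can be made concrete with a minimal simulation sketch, loosely following the independent binomial variant. Two simplifications are assumptions, not the paper's specification:

- Stage 1 draws each part's presence independently with probability `p[i]`.
- Stage 2 draws Gaussian log-scale values for the present parts and normalizes their exponentials, so the log-ratios among present parts are normally distributed (a logistic-normal form).

The parameter names `p`, `mu` and `sigma` are hypothetical. The sketch only generates data; it does not perform the paper's estimation.

```python
import math
import random


def sample_composition(p, mu, sigma, rng):
    """Draw one D-part composition with essential zeros in two stages.

    Stage 1 (incidence): part i is present with probability p[i],
    independently; resampled until at least one part is present.
    Stage 2 (composition): z_i ~ N(mu[i], sigma) for present parts,
    mapped to the simplex as exp(z_i) / sum_j exp(z_j).
    Returns (incidence vector, composition vector).
    """
    if not any(p_i > 0 for p_i in p):
        raise ValueError("at least one part must have p_i > 0")
    while True:
        incidence = [1 if rng.random() < p_i else 0 for p_i in p]
        if any(incidence):
            break
    z = {i: rng.gauss(mu[i], sigma)
         for i, present in enumerate(incidence) if present}
    total = sum(math.exp(v) for v in z.values())
    composition = [math.exp(z[i]) / total if i in z else 0.0
                   for i in range(len(p))]
    return incidence, composition
```

Each draw yields one row of the incidence matrix and the matching row of the conditional compositional matrix described in the abstract.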
Abstract:
The increasing volume of data describing human disease processes, and the growing complexity of understanding, managing, and sharing such data, present a huge challenge for clinicians and medical researchers. This paper presents the @neurIST system, which provides an infrastructure for biomedical research while aiding clinical care by bringing together heterogeneous data and complex processing and computing services. Although @neurIST targets the investigation and treatment of cerebral aneurysms, the system's architecture is generic enough that it could be adapted to the treatment of other diseases. Innovations in @neurIST include confining the patient data pertaining to aneurysms inside a single environment that offers clinicians the tools to analyze and interpret patient data and to use knowledge-based guidance in planning their treatment. Medical researchers gain access to a critical mass of aneurysm-related data thanks to the system's ability to federate distributed information sources. A semantically mediated grid infrastructure lets both clinicians and researchers access and work on data distributed across multiple sites, seamlessly and securely. It also provides computing resources on demand for computationally intensive simulations used in treatment planning and research.
Abstract:
Structural equation models (SEM) are commonly used to analyze the relationships between variables, some of which may be latent, such as an individual's "attitude" toward and "behavior" concerning specific issues. A number of difficulties arise when we want to compare a large number of groups, each with a large sample size, and the manifest variables are distinctly non-normally distributed. Using a specific data set, we evaluate the appropriateness of the following alternative SEM approaches: multiple-group versus MIMIC models, continuous versus ordinal variable estimation methods, and normal-theory versus non-normal estimation methods. The approaches are applied to the ISSP-1993 Environmental data set in order to explore how the mean levels of the "attitude" and "behavior" variables concerning environmental issues, and their mutual relationship, vary across countries. Issues of both theoretical and practical relevance arise in the course of this application.
Abstract:
Applying semi-distributed hydrological models to large, heterogeneous watersheds raises several problems. On the one hand, the model parameterization should adequately represent the spatial and temporal variability in catchment features, while keeping model complexity at a level that allows state-of-the-art calibration techniques to be used. On the other hand, greater model complexity increases uncertainty in the adjusted parameter values, and therefore in the water routing across the watershed. This is critical for water quality applications, which need not only streamflow but also a reliable estimate of the surface versus subsurface contributions to runoff. In this study, we show how a regularized inversion procedure combined with a multiobjective calibration strategy successfully solves the parameterization of a complex application of a water-quality-oriented hydrological model. The final values of several optimized parameters showed significant and consistent differences across geological and landscape features. Although the spatial and temporal discretization of adjustable parameters significantly increased the number of optimized parameters, the uncertainty in water routing results remained at reasonable values. In addition, a stepwise numerical analysis showed that the effects of including different data types in the objective function on calibration performance could be inextricably linked. Thus, caution should be taken when adding or removing data from an aggregated objective function.
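The closing caution about aggregated objective functions can be illustrated with a small sketch. It assumes a weighted sum of squared errors per data type plus a Tikhonov-style penalty. The penalty is one common form of regularized inversion, but it is an assumption here, not the study's exact formulation. The one-parameter `toy_model`, the "flow" and "baseflow" data types, and the grid search are all hypothetical. The two data types favor different parameter values, so adding or removing one of them moves the calibrated optimum.

```python
def aggregated_objective(params, model, observations, weights, preferred, lam):
    """Weighted multiobjective misfit plus a Tikhonov-style penalty (sketch).

    observations: data type -> observed series
    weights:      data type -> weight on that type's sum of squared errors
    preferred:    preferred parameter values the penalty pulls toward
    lam:          regularization strength (0 disables the penalty)
    """
    simulated = model(params)
    misfit = sum(
        weights[name] * sum((s - o) ** 2 for s, o in zip(simulated[name], obs))
        for name, obs in observations.items()
    )
    penalty = lam * sum((p - q) ** 2 for p, q in zip(params, preferred))
    return misfit + penalty


def toy_model(params):
    """Hypothetical one-parameter model producing two data types."""
    a = params[0]
    return {"flow": [a * 1, a * 2, a * 3], "baseflow": [a]}


OBS = {"flow": [2.0, 4.0, 6.0], "baseflow": [3.0]}  # flow favors a=2, baseflow a=3
GRID = [i / 1000 for i in range(5001)]              # candidate values of a in [0, 5]


def calibrate(observations, weights, lam=0.0):
    """Grid-search the parameter value that minimizes the aggregated objective."""
    return min(GRID, key=lambda a: aggregated_objective(
        [a], toy_model, observations, weights, [0.0], lam))
```

Calibrating with flow alone gives a = 2.0. Adding the baseflow term shifts the optimum to about 2.067, and a large `lam` pulls it further toward the preferred value of 0.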
Abstract:
Cognitive radio networks (CRN) sense spectrum occupancy and manage themselves to operate in unused bands without disturbing licensed users. The detection capability of a radio system can be enhanced if the sensing process is performed jointly by a group of nodes so that the effects of wireless fading and shadowing can be minimized. However, taking a collaborative approach poses new security threats to the system as nodes can report false sensing data to force a wrong decision. Providing security to the sensing process is also complex, as it usually involves introducing limitations to the CRN applications. The most common limitation is the need for a static trusted node that is able to authenticate and merge the reports of all CRN nodes. This paper overcomes this limitation by presenting a protocol that is suitable for fully distributed scenarios, where there is no static trusted node.
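The security threat in collaborative sensing can be seen with a simple hard-decision fusion rule. The sketch below shows a generic k-out-of-N vote, which is a standard fusion scheme and not the protocol proposed in the paper. A single false report fools the OR rule but not majority voting. Note that this kind of fusion still assumes a central node that collects every report, which is exactly the static trusted node the paper removes. The function name `fuse` is hypothetical.

```python
def fuse(reports, k):
    """k-out-of-N hard-decision fusion of binary sensing reports.

    Declares the band occupied when at least k of the N reports say
    "signal detected": k=1 is the OR rule, k=N//2+1 is majority voting.
    """
    return sum(1 for r in reports if r) >= k


# Six honest nodes see a free band; a seventh, malicious node lies.
honest = [False] * 6
attacked = honest + [True]

print(fuse(attacked, 1))                       # True: OR rule fooled by one liar
print(fuse(attacked, len(attacked) // 2 + 1))  # False: majority not fooled
```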
Abstract:
Flood simulation studies feed spatial-temporal rainfall data into distributed hydrological models. A correct description of rainfall in space and time contributes to improvements in hydrological modelling and design. This work focuses on the analysis of 2-D convective structures (rain cells), whose contribution is especially significant in most flood events. The objective of this paper is to provide statistical descriptors and distribution functions for the convective structure characteristics of precipitation systems producing floods in Catalonia (NE Spain). To this end, heavy rainfall events recorded between 1996 and 2000 have been analysed. Using weather radar and 2-D radar algorithms, convective precipitation is distinguished from stratiform precipitation. These data are then loaded into and analyzed with a GIS, in the following steps:

1. Groups of connected pixels with convective precipitation are identified.
2. Only convective structures with an area greater than 32 km² are selected.
3. Geometric characteristics (area, perimeter, orientation and dimensions of the ellipse) and rainfall statistics (maximum, mean, minimum, range, standard deviation, and sum) of these structures are obtained and stored in a database.
4. Descriptive statistics for selected characteristics are calculated, and statistical distributions are fitted to the observed frequency distributions.

Statistical analyses reveal that the best-fitting distributions are the Generalized Pareto distribution for area and the Generalized Extreme Value distribution for perimeter, dimensions, orientation, and mean areal precipitation. The statistical descriptors and probability distribution functions obtained can be used directly as input to spatial rainfall generators.
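The first selection steps (group connected convective pixels, keep structures larger than 32 km², compute rainfall statistics) can be sketched as a connected-component labeling pass. This is a minimal sketch under several assumptions:

- Pixels are 4-connected.
- Non-convective pixels are marked `None`.
- The pixel area is passed in as `pixel_area_km2`. The abstract does not state it, so 4 km² in the usage below is hypothetical.

The ellipse geometry and the distribution fitting are omitted.

```python
from collections import deque


def convective_structures(grid, pixel_area_km2, min_area_km2=32.0):
    """Group 4-connected convective pixels and keep structures above a size.

    grid: 2-D list of rain rates, with None for non-convective pixels.
    Returns one dict per retained structure with area and rain statistics.
    """
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    structures = []
    for r0 in range(rows):
        for c0 in range(cols):
            if grid[r0][c0] is None or seen[r0][c0]:
                continue
            # Breadth-first search over one connected structure.
            seen[r0][c0] = True
            queue, values = deque([(r0, c0)]), []
            while queue:
                r, c = queue.popleft()
                values.append(grid[r][c])
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc] and grid[nr][nc] is not None):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            area = len(values) * pixel_area_km2
            if area > min_area_km2:          # strictly greater than the threshold
                structures.append({
                    "area_km2": area,
                    "max": max(values),
                    "mean": sum(values) / len(values),
                    "min": min(values),
                    "sum": sum(values),
                })
    return structures
```

With 4 km² pixels, a 9-pixel structure (36 km²) is kept, while an 8-pixel one (exactly 32 km²) is discarded by the strict threshold.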
Abstract:
The performance of a hydrologic model depends on the rainfall input data, both spatially and temporally. Because the spatial distribution of rainfall strongly influences both runoff volumes and peak flows, a distributed hydrologic model can improve results for convective rainfall in a basin where the storm area is smaller than the basin area. The aim of this study was to analyze the sensitivity of a distributed hydrologic model's results to rainfall time resolution in a flash-flood-prone basin. Within such a catchment, floods are produced by heavy rainfall events with a large convective component. A second objective of this paper is to propose a methodology that improves radar rainfall estimation at higher spatial and temporal resolution. Composite radar data from a network of three C-band radars with 6-min temporal and 2 × 2 km² spatial resolution were used to feed the RIBS distributed hydrological model. A modification of the Window Probability Matching Method (a gauge-adjustment method) was applied to four heavy rainfall cases to correct the observed rainfall underestimation by computing new Z/R relationships for both convective and stratiform reflectivities. An advection correction technique based on the cross-correlation between two consecutive images was introduced to obtain several time resolutions from 1 min to 30 min. For each time resolution, the RIBS hydrologic model was calibrated using a probabilistic approach based on a multiobjective methodology. A sensitivity analysis of rainfall time resolution was conducted to find the resolution that best represents the hydrological behaviour of the basin.
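The advection-correction idea (estimate storm motion by cross-correlating two consecutive radar images, then move the first image along that motion to create intermediate frames) can be sketched as follows. This is a minimal illustration, not the authors' technique, and it makes several simplifying assumptions:

- The motion is a pure translation by whole pixels.
- The correlation is unnormalized, so it works best on fields with a zero background; operational methods usually normalize.
- Pixels uncovered by a shift are filled with zero.

The function names and `max_shift` are hypothetical.

```python
def advect(field, dy, dx):
    """Shift a 2-D field by integer (dy, dx), filling uncovered pixels with 0."""
    rows, cols = len(field), len(field[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            pr, pc = r - dy, c - dx
            if 0 <= pr < rows and 0 <= pc < cols:
                out[r][c] = field[pr][pc]
    return out


def best_shift(prev, curr, max_shift):
    """Integer displacement maximizing the (unnormalized) cross-correlation."""
    best, best_score = (0, 0), float("-inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            moved = advect(prev, dy, dx)
            score = sum(m * c for mrow, crow in zip(moved, curr)
                        for m, c in zip(mrow, crow))
            if score > best_score:
                best, best_score = (dy, dx), score
    return best


def interpolate_frames(prev, curr, steps, max_shift=3):
    """Frames at t = 1/steps, ..., (steps-1)/steps between two images,
    obtained by moving prev a fraction of the estimated displacement."""
    dy, dx = best_shift(prev, curr, max_shift)
    return [advect(prev, round(dy * k / steps), round(dx * k / steps))
            for k in range(1, steps)]
```

For example, with `steps=6`, two images 6 minutes apart yield five advected frames at 1-minute spacing.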
Abstract:
Peer-reviewed