915 results for "Data replication processes"
Abstract:
Current scientific applications produce large amounts of data. Processing, handling and analyzing such data requires large-scale computing infrastructures such as clusters and grids. In this area, studies aim to improve the performance of data-intensive applications by optimizing data accesses. To achieve this goal, distributed storage systems have considered techniques such as data replication, migration, distribution, and access parallelism. However, the main drawback of those studies is that they do not take application behavior into account when optimizing data access. This limitation motivated this paper, which applies strategies to support the online prediction of application behavior in order to optimize data access operations on distributed systems, without requiring any information on past executions. To accomplish this, the approach organizes application behaviors as time series and then analyzes and classifies those series according to their properties. Based on these properties, the approach selects modeling techniques to represent the series and make predictions, which are later used to optimize data access operations. The approach was implemented and evaluated using the OptorSim simulator, sponsored by the LHC-CERN project and widely employed by the scientific community. Experiments show that the approach reduces application execution time by about 50 percent, especially when handling large amounts of data.
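The pipeline described above (organize behavior as a time series, classify the series by its properties, then choose a model to forecast it) can be illustrated with a minimal sketch. It assumes the series is, for example, the number of reads of a file per time interval, uses lag-1 autocorrelation as the only classification property with a hypothetical threshold of 0.5, and chooses between an AR(1) model and the sample mean. These choices are illustrative assumptions, not the models used in the paper or in OptorSim.

```python
from statistics import fmean


def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation of a sequence of observations."""
    mean = fmean(series)
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        return 0.0
    return sum(dev[t] * dev[t - 1] for t in range(1, len(dev))) / denom


def fit_ar1(series):
    """Least-squares fit of x_t = c + phi * x_{t-1}; returns (c, phi)."""
    x_prev, x_next = series[:-1], series[1:]
    mx, my = fmean(x_prev), fmean(x_next)
    sxx = sum((a - mx) ** 2 for a in x_prev)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x_prev, x_next))
    phi = sxy / sxx if sxx else 0.0
    return my - phi * mx, phi


def predict_next(series, threshold=0.5):
    """Classify the series by lag-1 autocorrelation, then pick a model.

    Strongly autocorrelated series get an AR(1) one-step forecast;
    weakly autocorrelated ones fall back to the sample mean.
    The threshold is a hypothetical tuning parameter.
    """
    r1 = lag1_autocorrelation(series)
    if abs(r1) >= threshold:
        c, phi = fit_ar1(series)
        return "ar1", c + phi * series[-1]
    return "mean", fmean(series)


# Hypothetical per-interval read counts for one file.
print(predict_next([10, 20, 30, 40, 50, 60, 70, 80]))  # → ('ar1', 90.0)
print(predict_next([10, 10, 12, 10, 10, 12, 10, 10]))  # → ('mean', 10.5)
```

In a replica-management setting, such a forecast could, for instance, be compared against a threshold to decide whether to copy a file closer to the jobs that will read it; that decision rule is likewise an assumption here.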
Abstract:
Traceability is a concept that arose from the need to monitor production processes; it is usually applied in sectors related to food production or in activities that pose some kind of direct risk to people. Agribusiness in the cotton industry lacks a comprehensive infrastructure covering all stages of the production processes. Mapping and defining the data needed for product traceability amounts to assigning responsibilities to everyone involved in production. Data on cotton production are collected in specific, predefined stages, from the choice of variety through processing; this article specifically addresses the production of lint cotton. The paper presents a proposal based on service-oriented architecture (SOA) for data integration processes in the cotton industry; the proposal supports the implementation of platform-independent solutions.
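As a rough illustration of the SOA idea, the sketch below defines one platform-independent service contract for recording and querying per-lot traceability data, plus an in-memory implementation that enforces the stage order from variety selection to lint. The stage names, record fields and ordering rule are hypothetical; a real SOA deployment would expose such a contract over a network protocol (for example SOAP or REST) rather than as an in-process class.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

# Hypothetical production stages, in order (illustrative only).
STAGES = ("variety_selection", "planting", "harvest", "ginning", "lint_baling")


@dataclass(frozen=True)
class StageRecord:
    lot_id: str
    stage: str
    responsible: str  # party accountable for the data recorded at this stage


class TraceabilityService(ABC):
    """Service contract that each participant's system could implement."""

    @abstractmethod
    def record(self, rec: StageRecord) -> None: ...

    @abstractmethod
    def history(self, lot_id: str) -> list[StageRecord]: ...


class InMemoryTraceabilityService(TraceabilityService):
    def __init__(self) -> None:
        self._records: dict[str, list[StageRecord]] = {}

    def record(self, rec: StageRecord) -> None:
        chain = self._records.setdefault(rec.lot_id, [])
        # The next accepted stage is the one following the last recorded stage.
        expected = STAGES[len(chain)] if len(chain) < len(STAGES) else None
        if rec.stage != expected:
            raise ValueError(f"lot {rec.lot_id}: expected {expected!r}, got {rec.stage!r}")
        chain.append(rec)

    def history(self, lot_id: str) -> list[StageRecord]:
        return list(self._records.get(lot_id, []))
```

Rejecting out-of-order records is one simple way to make each stage's responsible party explicit in the lot's history.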
Abstract:
The discovery of the Cosmic Microwave Background (CMB) radiation in 1965 is one of the fundamental milestones supporting the Big Bang theory. The CMB is one of the most important sources of information in cosmology. The excellent accuracy of recent CMB data from the WMAP and Planck satellites confirmed the validity of the standard cosmological model and set a new challenge for data analysis processes and the interpretation of their results. In this thesis we deal with several aspects and useful tools of the data analysis, focusing on their optimization in order to fully exploit the Planck data and contribute to the final published results. The issues investigated are: the change of coordinates of CMB maps using the HEALPix package; the aliasing effect in the generation of low-resolution maps; and the comparison of the Angular Power Spectrum (APS) extraction performance of the optimal QML method, implemented in the code BolPol, and the pseudo-Cl method, implemented in Cromaster. The QML method was then applied to the Planck data at large angular scales to extract the CMB APS. The same method was also applied to analyze the TT parity and Low Variance anomalies in the Planck maps, showing a consistent deviation from the standard cosmological model; possible origins of these results are discussed. The Cromaster code, in turn, was applied to the 408 MHz and 1.42 GHz surveys, focusing on the APS of selected regions of the synchrotron emission. The new generation of CMB experiments will be dedicated to polarization measurements, which require high-accuracy devices to separate the polarizations. Here a new technology, Photonic Crystals, is exploited to develop a new polarization splitter device, and its performance is compared with that of devices currently in use.
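The aliasing problem in low-resolution maps has a simple one-dimensional analogue: if data are degraded to a coarser grid without first removing modes above that grid's Nyquist frequency, those modes reappear at lower frequencies. The sketch below assumes a periodic 1D signal and an ideal low-pass filter; it does not use HEALPix pixelization or the smoothing kernels adopted in the thesis.

```python
import cmath
import math


def dft(x):
    """Direct discrete Fourier transform (O(N^2); fine for tiny examples)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def power(x):
    """One-sided power spectrum |X_k|^2 for k = 0 .. N/2."""
    return [abs(c) ** 2 for c in dft(x)[: len(x) // 2 + 1]]


def lowpass(x, kmax):
    """Ideal low-pass filter: zero every mode with |k| > kmax, then invert the DFT."""
    n = len(x)
    X = [c if min(k, n - k) <= kmax else 0 for k, c in enumerate(dft(x))]
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


N, FACTOR, FREQ = 64, 4, 13  # coarse grid: 16 samples, Nyquist frequency 8
signal = [math.cos(2 * math.pi * FREQ * t / N) for t in range(N)]

naive = signal[::FACTOR]                                    # degrade without smoothing
smoothed = lowpass(signal, N // FACTOR // 2 - 1)[::FACTOR]  # band-limit, then degrade

p_naive = power(naive)
alias_bin = max(range(len(p_naive)), key=p_naive.__getitem__)
print(alias_bin)              # → 3: the k = 13 mode folds to 16 - 13 = 3
print(max(power(smoothed)))   # a value near zero: the mode was removed first
```

The design point is the order of operations: filtering must happen on the fine grid, because once the samples are decimated the aliased power at k = 3 is indistinguishable from a genuine large-scale mode.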
Abstract:
Temperature-sensitive (ts) mutant viruses have helped elucidate replication processes in many viral systems. Several panels of replication-defective ts mutants in which viral RNA synthesis is abolished at the nonpermissive temperature (RNA⁻) have been isolated for Mouse Hepatitis Virus, MHV (Robb et al., 1979; Koolen et al., 1983; Martin et al., 1988; Schaad et al., 1990). However, no one had investigated genetic or phenotypic relationships between these different mutant panels. To determine how the panel of MHV-JHM RNA⁻ ts mutants (Robb et al., 1979) was genetically related to other described MHV RNA⁻ ts mutants, the MHV-JHM mutants were tested for complementation with representatives from two different sets of MHV-A59 ts mutants (Koolen et al., 1983; Schaad et al., 1990). Together, the three ts mutant panels were found to comprise eight genetically distinct complementation groups. Of these eight, three complementation groups are unique to their particular mutant panel; genetically equivalent mutants were not observed within the other two panels. Two complementation groups were common to all three mutant panels. The three remaining complementation groups overlapped two of the three mutant sets. Mutants MHV-JHM tsA204 and MHV-A59 ts261 were shown to be within one of these overlapping complementation groups. The phenotype of the MHV-JHM mutants within this complementation group has been previously characterized (Leibowitz et al., 1982; Leibowitz et al., 1990). When these mutants were grown at the permissive temperature and then shifted up to the nonpermissive temperature at the start of RNA synthesis, genome-length RNA and leader RNA fragments accumulated, but no subgenomic mRNA was synthesized. MHV-A59 ts261 produced leader RNA fragments identical to those observed with MHV-JHM tsA204.
Thus, these two MHV RNA⁻ ts mutants, which were genetically equivalent by complementation testing, were phenotypically similar as well. Recombination frequencies obtained from crosses of MHV-A59 ts261 with several of the gene 1 MHV-A59 mutants indicated that the causal mutation(s) of MHV-A59 ts261 lay near the overlapping junction of ORF1a and ORF1b, in the 3′ end of ORF1a or the 5′ end of ORF1b. Sequence analysis of this junction and of 1400 nucleotides into the 5′ end of ORF1b of MHV-A59 ts261 revealed one nucleotide change from wild-type MHV-A59. This substitution at nucleotide 13,598 (A to G) was silent in the ORF1a reading frame but resulted in an amino acid change (I to V) in the ORF1b gene product. This amino acid change would be expressed only in the readthrough translation product produced upon successful ribosomal frameshifting. A revertant of MHV-A59 ts261 (R2) also retained this guanine residue but had a second substitution at nucleotide 14,475 in ORF1b. This mutation results in the substitution of valine for an isoleucine. The data presented here suggest that the mutation in MHV-A59 ts261 (nucleotide 13,598) is responsible for the MHV-JHM complementation group A phenotype. A second-site reversion at nucleotide 14,475 may correct this defect in the revertant. To test this hypothesis, gene 1 must be sequenced immediately upstream of nucleotide 13,296 and downstream of nucleotide 15,010.
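The key reasoning step, one substitution that is silent in ORF1a yet changes I to V in ORF1b, follows from the two ORFs reading the same nucleotide in different, overlapping frames. The sketch below illustrates this with a hypothetical six-nucleotide fragment, not the actual MHV-A59 sequence around nucleotide 13,598: the A→G change falls at the third position of an ORF1a codon (CAA→CAG, both Gln) but at the first position of a codon in the −1 frame used for ORF1b (ATT→GTT, Ile→Val).

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate_codon(codon: str) -> str:
    """One-letter amino acid for a DNA codon ('*' marks a stop codon)."""
    i, j, k = (BASES.index(b) for b in codon)
    return AMINO[16 * i + 4 * j + k]


def codon_containing(seq: str, pos: int, frame_start: int) -> str:
    """Codon covering 0-based position pos when the frame starts at frame_start."""
    if pos < frame_start:
        raise ValueError("position lies before the start of this frame")
    start = frame_start + 3 * ((pos - frame_start) // 3)
    return seq[start:start + 3]


def compare_frames(wild: str, pos: int, new_base: str, frames: dict) -> dict:
    """Amino acid before and after a point substitution, for each reading frame."""
    mutant = wild[:pos] + new_base + wild[pos + 1:]
    return {
        name: (translate_codon(codon_containing(wild, pos, start)),
               translate_codon(codon_containing(mutant, pos, start)))
        for name, start in frames.items()
    }


# Hypothetical fragment: ORF1a codons start at offset 0, ORF1b codons at offset 2 (the -1 frame).
print(compare_frames("CAATTA", 2, "G", {"ORF1a": 0, "ORF1b": 2}))
# → {'ORF1a': ('Q', 'Q'), 'ORF1b': ('I', 'V')}
```

Because the −1 frame starts each codon one nucleotide earlier, the first position of an ORF1b codon is always the third position of an ORF1a codon, the position where synonymous changes are most common.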
Abstract:
This dissertation contains four essays that share a common purpose: developing new methodologies to exploit the potential of high-frequency data for the measurement, modeling and forecasting of financial asset volatilities and correlations. The first two chapters provide useful tools for univariate applications, while the last two chapters develop multivariate methodologies. In chapter 1, we introduce a new class of univariate volatility models named FloGARCH models. FloGARCH models provide a parsimonious joint model for low-frequency returns and realized measures, and are sufficiently flexible to capture long memory as well as asymmetries related to leverage effects. We analyze the performance of the models in a realistic numerical study and on a data set of 65 equities. Using more than 10 years of high-frequency transactions, we document significant statistical gains for the FloGARCH models in terms of in-sample fit, out-of-sample fit and forecasting accuracy compared with classical and Realized GARCH models. In chapter 2, using 12 years of high-frequency transactions for 55 U.S. stocks, we argue that combining low-frequency exogenous economic indicators with high-frequency financial data improves the ability of conditionally heteroskedastic models to forecast the volatility of returns, their full multi-step-ahead conditional distribution and the multi-period Value-at-Risk. Using a refined version of the Realized LGARCH model that allows for a time-varying intercept and is implemented with realized kernels, we document that nominal corporate profits and term spreads have strong long-run predictive ability and generate accurate risk-measure forecasts over long horizons. The results are based on several loss functions and tests, including the Model Confidence Set. Chapter 3 is joint work with David Veredas.
We study the class of disentangled realized estimators for the integrated covariance matrix of Brownian semimartingales with finite-activity jumps. These estimators separate correlations and volatilities. We analyze different combinations of quantile- and median-based realized volatilities, and four estimators of realized correlations with three synchronization schemes. Their finite-sample properties are studied under four data generating processes, with and without microstructure noise, and under synchronous and asynchronous trading. The main finding is that the pre-averaged version of disentangled estimators based on Gaussian ranks (for the correlations) and median deviations (for the volatilities) provides a precise, computationally efficient and easy alternative for measuring integrated covariances from noisy and asynchronous prices. Along these lines, a minimum variance portfolio application shows the superiority of this disentangled realized estimator in terms of numerous performance metrics. Chapter 4 is co-authored with Niels S. Hansen, Asger Lunde and Kasper V. Olesen, all affiliated with CREATES at Aarhus University. We propose using the Realized Beta GARCH model to exploit the potential of high-frequency data in commodity markets. The model produces high-quality forecasts of pairwise correlations between commodities, which can be used to construct a composite covariance matrix. We evaluate the quality of this matrix in a portfolio context and compare it with models used in the industry. We demonstrate significant economic gains in a realistic setting that includes short-selling constraints and transaction costs.
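The "disentangled" idea, estimating volatilities and correlations separately and recombining them as Σ_ij = ρ_ij σ_i σ_j, can be sketched with one combination of the ingredients named above: the median realized variance (MedRV) of Andersen, Dobrev and Schaumburg (2012) for the volatilities, as one reading of "median-based", and a Gaussian-rank correlation for the correlations. This is a simplified sketch under stated assumptions: it takes already synchronized intraday returns, breaks rank ties arbitrarily, and omits the pre-averaging step the chapter uses to handle microstructure noise.

```python
import math
from statistics import NormalDist, fmean, median

# Scaling constant of the MedRV estimator.
MEDRV_CONST = math.pi / (6 - 4 * math.sqrt(3) + math.pi)


def med_rv(returns):
    """Median realized variance: squared medians of adjacent absolute returns."""
    m = len(returns)
    core = sum(median((abs(returns[i - 1]), abs(returns[i]), abs(returns[i + 1]))) ** 2
               for i in range(1, m - 1))
    return MEDRV_CONST * m / (m - 2) * core


def gaussian_rank_correlation(x, y):
    """Pearson correlation of normal scores Phi^{-1}(rank / (n + 1)); ties broken arbitrarily."""
    n = len(x)

    def scores(v):
        s = [0.0] * n
        for rank, idx in enumerate(sorted(range(n), key=v.__getitem__), start=1):
            s[idx] = NormalDist().inv_cdf(rank / (n + 1))
        return s

    sx, sy = scores(x), scores(y)
    mx, my = fmean(sx), fmean(sy)
    num = sum((a - mx) * (b - my) for a, b in zip(sx, sy))
    den = math.sqrt(sum((a - mx) ** 2 for a in sx) * sum((b - my) ** 2 for b in sy))
    return num / den


def disentangled_covariance(returns_by_asset):
    """Sigma_ij = rho_ij * sigma_i * sigma_j, with sigma_i = sqrt(MedRV_i)."""
    vols = [math.sqrt(med_rv(r)) for r in returns_by_asset]
    k = len(returns_by_asset)
    return [[vols[i] ** 2 if i == j else
             gaussian_rank_correlation(returns_by_asset[i], returns_by_asset[j]) * vols[i] * vols[j]
             for j in range(k)] for i in range(k)]
```

Because the correlation step uses only ranks, it is insensitive to the scale of the returns; the volatility step alone carries the scale information.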
Abstract:
The decision-making process in university libraries is extremely important; however, it faces complications such as the large number of data sources and the large volumes of data to be analyzed. University libraries routinely produce and collect a great deal of information about their data and services. Common data sources include internal systems, online portals and catalogs, quality assessments, and surveys. Unfortunately, these data sources are only partially used for decision-making because of the wide variety of formats and standards and the lack of efficient integration methods and tools. This thesis project presents the analysis, design, and implementation of a Data Warehouse, an integrated decision-making system for the Centro de Documentación Juan Bautista Vázquez. First, the requirements and data analysis are presented following a methodology that incorporates key elements influencing a library decision, including process analysis, estimated quality, relevant information, and user interaction. Next, the architecture and design of the Data Warehouse are proposed, together with its implementation, which supports data integration, processing, and storage. Finally, the stored data are analyzed using analytical processing tools and Bibliomining techniques, helping the documentation center's administrators make optimal decisions about their resources and services.
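A common way to organize a data warehouse is a star schema: a fact table of measurable events joined to descriptive dimension tables, queried with roll-up aggregations for analysis. The sketch below uses Python's built-in sqlite3 module with a hypothetical loan fact table and hypothetical resource and date dimensions; the table names, columns and figures are illustrative assumptions, not the design implemented for the Centro de Documentación Juan Bautista Vázquez.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical star schema: one fact table, two dimension tables.
cur.executescript("""
    CREATE TABLE dim_resource (resource_id INTEGER PRIMARY KEY, title TEXT, category TEXT);
    CREATE TABLE dim_date     (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE fact_loan    (resource_id INTEGER REFERENCES dim_resource,
                               date_id INTEGER REFERENCES dim_date,
                               loans INTEGER);
""")
cur.executemany("INSERT INTO dim_resource VALUES (?, ?, ?)", [
    (1, "Intro to Databases", "Computing"),
    (2, "Organic Chemistry", "Science"),
    (3, "Data Mining", "Computing"),
])
cur.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [(1, 2015, 3), (2, 2015, 4)])
cur.executemany("INSERT INTO fact_loan VALUES (?, ?, ?)", [
    (1, 1, 12), (2, 1, 5), (3, 1, 7), (1, 2, 9), (3, 2, 4),
])

# Roll-up: total loans per category and month, the kind of view a manager might query.
rows = cur.execute("""
    SELECT r.category, d.year, d.month, SUM(f.loans) AS total_loans
    FROM fact_loan f
    JOIN dim_resource r USING (resource_id)
    JOIN dim_date d USING (date_id)
    GROUP BY r.category, d.year, d.month
    ORDER BY r.category, d.year, d.month
""").fetchall()
print(rows)
# → [('Computing', 2015, 3, 19), ('Computing', 2015, 4, 13), ('Science', 2015, 3, 5)]
```

Keeping descriptive attributes in the dimensions means new analyses (by category, by month, by both) need only a different GROUP BY, not a new load of the source data.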
Abstract:
This thesis presents a study of Grid data access patterns in distributed analysis in the CMS experiment at the LHC accelerator. The study ranges from an in-depth analysis of historical access patterns for the most relevant data types in CMS to the use of a supervised Machine Learning classification system to build machinery able to predict future data access patterns, i.e. the so-called “popularity” of CMS datasets on the Grid, with a focus on specific data types. All CMS workflows run on the computing centers (Tiers) of the Worldwide LHC Computing Grid (WLCG), and the distributed analysis system in particular sustains hundreds of users and applications submitted every day. These applications (or “jobs”) access different data types hosted on disk storage systems at a large set of WLCG Tiers. A detailed study of how these data are accessed, in terms of data types, hosting Tiers, and time periods, provides valuable insight into storage occupancy over time and into different access patterns, and ultimately makes it possible to derive suggested actions from this information (e.g. targeted disk clean-up and/or data replication). In this sense, Machine Learning techniques make it possible to learn from past data and to gain predictive power over future CMS data access patterns. Chapter 1 provides an introduction to High Energy Physics at the LHC. Chapter 2 describes the CMS Computing Model, with special focus on data management, and also discusses the concept of dataset popularity. Chapter 3 describes the study of CMS data access patterns at different levels of depth. Chapter 4 offers a brief introduction to basic machine learning concepts and their application in CMS, and discusses the results obtained with this approach in the context of this thesis.
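To illustrate the supervised classification step, the sketch below trains a logistic regression by batch gradient descent to predict whether a dataset will be "popular" in the following week. The two features, the labels, and the choice of logistic regression are hypothetical assumptions for illustration only; they are not the features or classifier used in the thesis.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, x):
    """Probability of the positive class; w[0] is the bias term."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns weights (bias first)."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi      # gradient of log-loss w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


# Hypothetical features per dataset: [log10(accesses last week), distinct users / 10]
X = [[0.5, 0.1], [1.0, 0.2], [0.8, 0.3], [3.0, 1.5], [3.5, 2.0], [2.8, 1.2]]
y = [0, 0, 0, 1, 1, 1]  # 1 = dataset was "popular" the following week
w = train_logistic(X, y)
print(predict_proba(w, [3.2, 1.8]) > 0.5)  # → True
```

A predicted popularity score of this kind could then feed the suggested actions mentioned above, such as replicating likely-popular datasets or cleaning up likely-unused ones.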
Abstract:
Executive Summary
The objective of this report was to use the Sydney Opera House (SOH) as a case study of the application of Building Information Modelling (BIM). The Sydney Opera House is a large, complex building with a very irregular configuration, which makes it a challenging test. A number of key concerns are evident at SOH:
• the building structure is complex, and building service systems, already the major cost of ongoing maintenance, are undergoing technological change, with new computer-based services becoming increasingly important;
• the current “documentation” of the facility comprises several independent, partly overlapping systems and is inadequate to support current and future required services;
• the building has reached a milestone age in terms of the condition and maintainability of key public areas and service systems, the functionality of spaces, and longer-term strategic management;
• many business functions, such as space or event management, require up-to-date information about the facility, which is currently inadequately delivered and is expensive and time-consuming to update and deliver to customers;
• major building upgrades are being planned that will put considerable strain on existing Facilities Portfolio services and on their capacity to manage them effectively.
While some of these concerns are unique to the House, many will be common to larger commercial and institutional portfolios. The work described here supported a complementary task that sought to identify whether a building information model, an integrated building database, could be created to support asset and facility management functions (see Sydney Opera House – FM Exemplar Project, Report Number: 2005-001-C-4 Building Information Modelling for FM at Sydney Opera House), a business strategy that has been well demonstrated. The development of the BIMSS - Open Specification for BIM has been surprisingly straightforward.
The lack of technical difficulties in converting the House’s existing conventions and standards to the new model-based environment can be attributed to three key factors:
• SOH Facilities Portfolio, the internal group responsible for asset and facility management, already has well-established building and documentation policies in place. Its well-considered operational standards, and adherence to them, stem from the need to create an environment that is understood by all users and that addresses the major business needs of the House.
• The second factor is the nature of the IFC Model Specification used to define the BIM protocol. The IFC standard is based on building practice and nomenclature widely used in the construction industry around the globe. For example, the nomenclature of building parts (e.g., IfcWall) corresponds to normal terminology but extends the traditional drawing environment currently used for design and documentation. This demonstrates that the international IFC model accurately represents local practice for building data representation and management.
• A BIM environment creates opportunities for innovative processes that can exploit the rich data in the model and improve services and functions for the House. For example, several high-level processes have been identified that could benefit from standardized Building Information Models: maintenance processes using engineering data; business processes using scheduling, venue access and security data; and benchmarking processes using building performance data. The new technology matches business needs for current and new services.
The adoption of IFC-compliant applications opens the way to shared building model collaboration and new processes, a significant new focus of the BIM standards.
In summary, SOH’s current building standards have been successfully drafted for a BIM environment and are confidently expected to be fully developed when BIM is adopted operationally by SOH. These BIM standards and their application to the Opera House are intended as a template for other organisations to adopt for their own procurement and facility management activities. Appendices provide an overview of the IFC Integrated Object Model and an explanation of IFC model data.
Abstract:
Objectives: This methodological paper reports on the development and validation of a work sampling instrument and data collection processes for a national study of nurse practitioners’ work patterns.
Design: Published work sampling instruments provided the basis for developing and validating a tool for use in a national study of nurse practitioner work activities across diverse contextual and clinical service models. Steps in the approach included the design of a nurse practitioner-specific data collection tool and the development of an innovative web-based program to train a team of data collectors, geographically dispersed across metropolitan, rural and remote health care settings, and to establish their inter-rater reliability.
Setting: The study is part of a large funded study of nurse practitioner service. The Australian Nurse Practitioner Study is a national study phased over three years and designed to provide essential information for Australian health service planners, regulators and consumer groups on the profile, process and outcome of nurse practitioner service.
Results: The outcome of this phase of the study is a set of empirically tested instruments, processes and training materials for use in an international context by investigators interested in conducting a national study of nurse practitioner work practices.
Conclusion: Developing a new approach to describing nurse practitioner practice using work sampling methods provides the groundwork for international collaboration in the evaluation of nurse practitioner service.
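The abstract does not say which statistic was used to establish inter-rater reliability. Cohen's kappa is one common choice when two raters assign categorical codes to the same observations, since it corrects raw agreement for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). The sketch below shows it with hypothetical activity categories.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who coded the same observations."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n   # p_o
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2  # p_e
    return (observed - expected) / (1 - expected)


# Hypothetical work-sampling codes for six observed moments.
a = ["direct", "direct", "indirect", "admin", "direct", "indirect"]
b = ["direct", "indirect", "indirect", "admin", "direct", "indirect"]
print(round(cohens_kappa(a, b), 3))  # → 0.739
```

Here the raters agree on 5 of 6 observations (p_o = 5/6), but chance alone would produce p_e = 13/36, so κ = 17/23, noticeably lower than the raw 83% agreement.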
Abstract:
The term “cloud computing” has emerged as a major ICT trend and was acknowledged by respected industry survey organizations as a key technology and market development theme for the industry and ICT users in 2010. However, one of the major challenges facing the cloud computing concept and its global acceptance is how to secure and protect the data and processes that are the property of the user. The security of the cloud computing environment is a new research area requiring further development by both the academic and industrial research communities. Today, there are many diverse and uncoordinated efforts underway to address security issues in cloud computing, especially identity management issues. This paper introduces an architecture for a new approach to the necessary “mutual protection” in the cloud computing environment, based on a concept of mutual trust and the specification of definable profiles in vector matrix form. The architecture aims to achieve better, more generic and more flexible authentication, authorization and control, based on a concept of mutuality, within the cloud computing environment.
Abstract:
Climate change presents risks to health that must be addressed by both decision-makers and public health researchers. Within the application of Environmental Health Impact Assessment (EHIA), there have been few attempts to incorporate climate change-related health risks as an input to the framework. This study used a focus group design to examine the perceptions of government, industry and academic specialists about the suitability of assessing the health consequences of climate change within an EHIA framework. Practitioners expressed concern over a number of factors relating to the current EHIA methodology and the inclusion of climate change-related health risks. These concerns related to the broad scope of issues that would need to be considered, problems with identifying appropriate health indicators, the lack of relevant qualitative information currently incorporated in assessments, and persistent issues surrounding stakeholder participation. It was suggested that data collection processes need to be improved, particularly in terms of adequate communication between environmental and health practitioners. Concerns were also raised about data privacy and usage and how these could affect the assessment process. These findings may provide guidance for government and industry bodies seeking to improve the assessment of climate change-related health risks.
Abstract:
Grounded theory, first developed by Glaser and Strauss in the 1960s, was introduced into nursing education as a distinct research methodology in the 1970s. The methodology grew out of a critique of the dominant contemporary approach to social inquiry, which imposed "enduring" theoretical propositions onto study data. Rather than starting from a set theoretical framework, grounded theory relies on researchers distinguishing meaningful constructs in the generated data and then identifying an appropriate theory. Grounded theory is therefore particularly useful for investigating complex issues and behaviours not previously addressed, as well as concepts and relationships in particular populations or places that are still undeveloped or weakly connected. Grounded theory data analysis involves three levels of coding: open, axial and selective. The purpose of this article was to explore the grounded theory research process and provide an initial understanding of this methodology.
Abstract:
This thesis opens up the design space for awareness research in CSCW and HCI. By challenging the prevalent understanding of roles in awareness processes and exploring different mechanisms for actively engaging users in the awareness process, this thesis provides a better understanding of the complexity of these processes and suggests practical solutions for designing and implementing systems that support active awareness. Mutual awareness, a prominent research topic in the fields of Computer-Supported Cooperative Work (CSCW) and Human-Computer Interaction (HCI), refers to a fundamental aspect of a person’s work: the ability to gain a better understanding of a situation by perceiving and interpreting co-workers’ actions. Technologically mediated awareness, used to support co-workers across distributed settings, distinguishes between the roles of the actor, whose actions are often limited to being the target of an automated data gathering process, and the receiver, who wants to be made aware of the actors’ actions. This receiver-centric view of awareness, focused on helping receivers deal with complex sets of awareness information, stands in stark contrast to our understanding of awareness as a social process involving complex interactions between actors and receivers. It fails to take into account an actor’s intimate understanding of their own activities and the contribution that this subjective understanding could make to providing richer awareness information. In this thesis I challenge the prevalent receiver-centric notion of awareness and explore the conceptual foundations, design, implementation and evaluation of an alternative, active awareness approach, making the following five contributions. Firstly, I identify the limitations of existing awareness research and solicit further evidence to support the notion of active awareness.
I analyse ethnographic workplace studies that demonstrate how actors engage in an intricate interplay involving monitoring their co-workers’ progress and displaying aspects of their own activities that may be relevant to others. Examining a large body of awareness research reveals that, while disclosing information is a common practice in face-to-face collaborative settings, it has been neglected in implementations of technically mediated awareness. Based on these considerations, I introduce the notion of intentional disclosure to describe users actively and deliberately contributing awareness information. I consider challenges and potential solutions for the design of active awareness. I compare a range of systems, each allowing users to share information about their activities at various levels of detail. I discuss one of the main challenges to active awareness: disclosing information about activities requires some degree of effort. I discuss various representations of effort in collaborative work. These considerations reveal a trade-off between the richness of awareness information and the effort required to provide it. I propose a framework for active awareness, intended to help designers understand the scope and limitations of different types of intentional disclosure. I draw on the identified richness/effort trade-off to develop two types of intentional disclosure, both of which aim to facilitate the disclosure of information while reducing the effort required to do so. For both approaches, direct and indirect disclosure, I delineate how they differ from related approaches and define a set of design criteria intended to guide their implementation. I demonstrate how the framework of active awareness can be applied in practice by building two proof-of-concept prototypes that implement direct and indirect disclosure, respectively.
AnyBiff, implementing direct disclosure, allows users to create, share and use shared representations of activities in order to express their current actions and intentions. SphereX, implementing indirect disclosure, represents shared areas of interest or working context and links sets of activities to these representations. Lastly, I present the results of a qualitative evaluation of the two prototypes and analyse them with regard to the extent to which the prototypes implemented their respective disclosure mechanisms and supported active awareness. Both systems were deployed and tested in real-world environments. The results for AnyBiff showed that users developed a wide range of activity representations, some unanticipated, and actively used the system to disclose information. The results further highlighted a number of design considerations relating to the relationship between awareness and communication, and the role of ambiguity. The evaluation of SphereX validated the feasibility of the indirect disclosure approach. However, the study highlighted the challenges of implementing cross-application awareness support and of conveying the concept to users. The study resulted in design recommendations aimed at improving the implementation of future systems.
Abstract:
Vertebral fracture risk is a heritable complex trait. The aim of this study was to identify genetic susceptibility factors for osteoporotic vertebral fractures using a genome-wide association study (GWAS) approach. The GWAS discovery phase was based on the Rotterdam Study, a population-based study of elderly Dutch individuals aged over 55 years, comprising 329 cases and 2666 controls with radiographic scoring (McCloskey-Kanis) and genetic data. Replication of one top-associated SNP was pursued by de novo genotyping of 15 independent studies across Europe, the United States, and Australia, and one Asian study. Radiographic vertebral fracture assessment was performed using McCloskey-Kanis or Genant semi-quantitative definitions. SNPs were analyzed in relation to vertebral fracture using logistic regression models adjusted for age and sex. Fixed-effects inverse-variance and Han-Eskin alternative random-effects meta-analyses were applied. Genome-wide significance was set at p < 5×10⁻⁸. In the discovery phase, a SNP (rs11645938) on chromosome 16q24 was associated with the risk of vertebral fractures at p = 4.6×10⁻⁸. However, the association was not significant across 5720 cases and 21,791 controls from 14 studies. The fixed-effects meta-analysis summary estimate was 1.06 (95% CI: 0.98–1.14; p = 0.17), with a high degree of heterogeneity (I² = 57%; Qhet p = 0.0006). Under the Han-Eskin alternative random-effects model, the summary effect was significant (p = 0.0005). The SNP maps to a region previously found to be associated with lumbar spine bone mineral density (LS-BMD) in two large meta-analyses from the GEFOS consortium. A false-positive association in the GWAS discovery cannot be excluded; however, the low power of the discovery and replication settings (appropriate for identifying risk effect sizes > 1.25) may still be consistent with an effect size < 1.10, more typical of what is expected for complex traits.
Larger efforts in studies with standardized phenotype definitions are needed to confirm or reject the involvement of this locus in the risk of vertebral fractures.
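The fixed-effects summary (odds ratio, 95% CI, p) and heterogeneity statistics (Cochran's Q, I²) quoted above follow standard inverse-variance formulas. The sketch below applies those formulas to hypothetical per-study log odds ratios and standard errors; it is not the study's analysis code, and it does not implement the Han-Eskin random-effects model.

```python
import math
from statistics import NormalDist


def fixed_effects_meta(betas, ses, level=0.95):
    """Inverse-variance fixed-effects meta-analysis with Cochran's Q and I^2.

    betas: per-study log odds ratios; ses: their standard errors.
    """
    w = [1.0 / s ** 2 for s in ses]                            # inverse-variance weights
    beta = sum(wi * bi for wi, bi in zip(w, betas)) / sum(w)   # pooled log OR
    se = math.sqrt(1.0 / sum(w))
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)             # about 1.96 for 95%
    p = 2 * (1 - NormalDist().cdf(abs(beta / se)))             # two-sided Wald test
    q = sum(wi * (bi - beta) ** 2 for wi, bi in zip(w, betas))  # Cochran's Q
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0              # share of variation due to heterogeneity
    return {
        "OR": math.exp(beta),
        "CI": (math.exp(beta - z_crit * se), math.exp(beta + z_crit * se)),
        "p": p,
        "Q": q,
        "I2": i2,
    }


# Hypothetical studies with the same precision but different effects.
res = fixed_effects_meta([0.0, 0.2, 0.4], [0.1, 0.1, 0.1])
print(round(res["Q"], 6), round(res["I2"], 6))  # → 8.0 0.75
```

With Q = 8 on 2 degrees of freedom, I² = (8 − 2) / 8 = 75%: most of the between-study spread exceeds what sampling error alone would produce, which is why a random-effects model can lead to a different conclusion than the fixed-effects summary.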