956 resultados para Data sets
Resumo:
By the end of the 19th century, geodesy has contributed greatly to the knowledge of regional tectonics and fault movement through its ability to measure, at sub-centimetre precision, the relative positions of points on the Earth’s surface. Nowadays the systematic analysis of geodetic measurements in active deformation regions represents therefore one of the most important tool in the study of crustal deformation over different temporal scales [e.g., Dixon, 1991]. This dissertation focuses on motion that can be observed geodetically with classical terrestrial position measurements, particularly triangulation and leveling observations. The work is divided into two sections: an overview of the principal methods for estimating longterm accumulation of elastic strain from terrestrial observations, and an overview of the principal methods for rigorously inverting surface coseismic deformation fields for source geometry with tests on synthetic deformation data sets and applications in two different tectonically active regions of the Italian peninsula. For the long-term accumulation of elastic strain analysis, triangulation data were available from a geodetic network across the Messina Straits area (southern Italy) for the period 1971 – 2004. From resulting angle changes, the shear strain rates as well as the orientation of the principal axes of the strain rate tensor were estimated. The computed average annual shear strain rates for the time period between 1971 and 2004 are γ˙1 = 113.89 ± 54.96 nanostrain/yr and γ˙2 = -23.38 ± 48.71 nanostrain/yr, with the orientation of the most extensional strain (θ) at N140.80° ± 19.55°E. These results suggests that the first-order strain field of the area is dominated by extension in the direction perpendicular to the trend of the Straits, sustaining the hypothesis that the Messina Straits could represents an area of active concentrated deformation. The orientation of θ agree well with GPS deformation estimates, calculated over shorter time interval, and is consistent with previous preliminary GPS estimates [D’Agostino and Selvaggi, 2004; Serpelloni et al., 2005] and is also similar to the direction of the 1908 (MW 7.1) earthquake slip vector [e.g., Boschi et al., 1989; Valensise and Pantosti, 1992; Pino et al., 2000; Amoruso et al., 2002]. Thus, the measured strain rate can be attributed to an active extension across the Messina Straits, corresponding to a relative extension rate ranges between < 1mm/yr and up to ~ 2 mm/yr, within the portion of the Straits covered by the triangulation network. These results are consistent with the hypothesis that the Messina Straits is an important active geological boundary between the Sicilian and the Calabrian domains and support previous preliminary GPS-based estimates of strain rates across the Straits, which show that the active deformation is distributed along a greater area. Finally, the preliminary dislocation modelling has shown that, although the current geodetic measurements do not resolve the geometry of the dislocation models, they solve well the rate of interseismic strain accumulation across the Messina Straits and give useful information about the locking the depth of the shear zone. Geodetic data, triangulation and leveling measurements of the 1976 Friuli (NE Italy) earthquake, were available for the inversion of coseismic source parameters. From observed angle and elevation changes, the source parameters of the seismic sequence were estimated in a join inversion using an algorithm called “simulated annealing”. The computed optimal uniform–slip elastic dislocation model consists of a 30° north-dipping shallow (depth 1.30 ± 0.75 km) fault plane with azimuth of 273° and accommodating reverse dextral slip of about 1.8 m. The hypocentral location and inferred fault plane of the main event are then consistent with the activation of Periadriatic overthrusts or other related thrust faults as the Gemona- Kobarid thrust. Then, the geodetic data set exclude the source solution of Aoudia et al. [2000], Peruzza et al. [2002] and Poli et al. [2002] that considers the Susans-Tricesimo thrust as the May 6 event. The best-fit source model is then more consistent with the solution of Pondrelli et al. [2001], which proposed the activation of other thrusts located more to the North of the Susans-Tricesimo thrust, probably on Periadriatic related thrust faults. The main characteristics of the leveling and triangulation data are then fit by the optimal single fault model, that is, these results are consistent with a first-order rupture process characterized by a progressive rupture of a single fault system. A single uniform-slip fault model seems to not reproduce some minor complexities of the observations, and some residual signals that are not modelled by the optimal single-fault plane solution, were observed. In fact, the single fault plane model does not reproduce some minor features of the leveling deformation field along the route 36 south of the main uplift peak, that is, a second fault seems to be necessary to reproduce these residual signals. By assuming movements along some mapped thrust located southward of the inferred optimal single-plane solution, the residual signal has been successfully modelled. In summary, the inversion results presented in this Thesis, are consistent with the activation of some Periadriatic related thrust for the main events of the sequence, and with a minor importance of the southward thrust systems of the middle Tagliamento plain.
Resumo:
Data deduplication describes a class of approaches that reduce the storage capacity needed to store data or the amount of data that has to be transferred over a network. These approaches detect coarse-grained redundancies within a data set, e.g. a file system, and remove them.rnrnOne of the most important applications of data deduplication are backup storage systems where these approaches are able to reduce the storage requirements to a small fraction of the logical backup data size.rnThis thesis introduces multiple new extensions of so-called fingerprinting-based data deduplication. It starts with the presentation of a novel system design, which allows using a cluster of servers to perform exact data deduplication with small chunks in a scalable way.rnrnAfterwards, a combination of compression approaches for an important, but often over- looked, data structure in data deduplication systems, so called block and file recipes, is introduced. Using these compression approaches that exploit unique properties of data deduplication systems, the size of these recipes can be reduced by more than 92% in all investigated data sets. As file recipes can occupy a significant fraction of the overall storage capacity of data deduplication systems, the compression enables significant savings.rnrnA technique to increase the write throughput of data deduplication systems, based on the aforementioned block and file recipes, is introduced next. The novel Block Locality Caching (BLC) uses properties of block and file recipes to overcome the chunk lookup disk bottleneck of data deduplication systems. This chunk lookup disk bottleneck either limits the scalability or the throughput of data deduplication systems. The presented BLC overcomes the disk bottleneck more efficiently than existing approaches. Furthermore, it is shown that it is less prone to aging effects.rnrnFinally, it is investigated if large HPC storage systems inhibit redundancies that can be found by fingerprinting-based data deduplication. Over 3 PB of HPC storage data from different data sets have been analyzed. In most data sets, between 20 and 30% of the data can be classified as redundant. According to these results, future work in HPC storage systems should further investigate how data deduplication can be integrated into future HPC storage systems.rnrnThis thesis presents important novel work in different area of data deduplication re- search.
Resumo:
BACKGROUND: Physiologic data display is essential to decision making in critical care. Current displays echo first-generation hemodynamic monitors dating to the 1970s and have not kept pace with new insights into physiology or the needs of clinicians who must make progressively more complex decisions about their patients. The effectiveness of any redesign must be tested before deployment. Tools that compare current displays with novel presentations of processed physiologic data are required. Regenerating conventional physiologic displays from archived physiologic data is an essential first step. OBJECTIVES: The purposes of the study were to (1) describe the SSSI (single sensor single indicator) paradigm that is currently used for physiologic signal displays, (2) identify and discuss possible extensions and enhancements of the SSSI paradigm, and (3) develop a general approach and a software prototype to construct such "extended SSSI displays" from raw data. RESULTS: We present Multi Wave Animator (MWA) framework-a set of open source MATLAB (MathWorks, Inc., Natick, MA, USA) scripts aimed to create dynamic visualizations (eg, video files in AVI format) of patient vital signs recorded from bedside (intensive care unit or operating room) monitors. Multi Wave Animator creates animations in which vital signs are displayed to mimic their appearance on current bedside monitors. The source code of MWA is freely available online together with a detailed tutorial and sample data sets.
Resumo:
This paper introduces a novel approach to making inference about the regression parameters in the accelerated failure time (AFT) model for current status and interval censored data. The estimator is constructed by inverting a Wald type test for testing a null proportional hazards model. A numerically efficient Markov chain Monte Carlo (MCMC) based resampling method is proposed to simultaneously obtain the point estimator and a consistent estimator of its variance-covariance matrix. We illustrate our approach with interval censored data sets from two clinical studies. Extensive numerical studies are conducted to evaluate the finite sample performance of the new estimators.
Resumo:
We propose a novel class of models for functional data exhibiting skewness or other shape characteristics that vary with spatial or temporal location. We use copulas so that the marginal distributions and the dependence structure can be modeled independently. Dependence is modeled with a Gaussian or t-copula, so that there is an underlying latent Gaussian process. We model the marginal distributions using the skew t family. The mean, variance, and shape parameters are modeled nonparametrically as functions of location. A computationally tractable inferential framework for estimating heterogeneous asymmetric or heavy-tailed marginal distributions is introduced. This framework provides a new set of tools for increasingly complex data collected in medical and public health studies. Our methods were motivated by and are illustrated with a state-of-the-art study of neuronal tracts in multiple sclerosis patients and healthy controls. Using the tools we have developed, we were able to find those locations along the tract most affected by the disease. However, our methods are general and highly relevant to many functional data sets. In addition to the application to one-dimensional tract profiles illustrated here, higher-dimensional extensions of the methodology could have direct applications to other biological data including functional and structural MRI.
Resumo:
Functional neuroimaging techniques enable investigations into the neural basis of human cognition, emotions, and behaviors. In practice, applications of functional magnetic resonance imaging (fMRI) have provided novel insights into the neuropathophysiology of major psychiatric,neurological, and substance abuse disorders, as well as into the neural responses to their treatments. Modern activation studies often compare localized task-induced changes in brain activity between experimental groups. One may also extend voxel-level analyses by simultaneously considering the ensemble of voxels constituting an anatomically defined region of interest (ROI) or by considering means or quantiles of the ROI. In this work we present a Bayesian extension of voxel-level analyses that offers several notable benefits. First, it combines whole-brain voxel-by-voxel modeling and ROI analyses within a unified framework. Secondly, an unstructured variance/covariance for regional mean parameters allows for the study of inter-regional functional connectivity, provided enough subjects are available to allow for accurate estimation. Finally, an exchangeable correlation structure within regions allows for the consideration of intra-regional functional connectivity. We perform estimation for our model using Markov Chain Monte Carlo (MCMC) techniques implemented via Gibbs sampling which, despite the high throughput nature of the data, can be executed quickly (less than 30 minutes). We apply our Bayesian hierarchical model to two novel fMRI data sets: one considering inhibitory control in cocaine-dependent men and the second considering verbal memory in subjects at high risk for Alzheimer’s disease. The unifying hierarchical model presented in this manuscript is shown to enhance the interpretation content of these data sets.
Resumo:
PURPOSE: To describe the implementation and use of an electronic patient-referral system as an aid to the efficient referral of patients to a remote and specialized treatment center. METHODS AND MATERIALS: A system for the exchange of radiotherapy data between different commercial planning systems and a specially developed planning system for proton therapy has been developed through the use of the PAPYRUS diagnostic image standard as an intermediate format. To ensure the cooperation of the different TPS manufacturers, the number of data sets defined for transfer has been restricted to the three core data sets of CT, VOIs, and three-dimensional dose distributions. As a complement to the exchange of data, network-wide application-sharing (video-conferencing) technologies have been adopted to provide methods for the interactive discussion and assessment of treatments plans with one or more partner clinics. RESULTS: Through the use of evaluation plans based on the exchanged data, referring clinics can accurately assess the advantages offered by proton therapy on a patient-by-patient basis, while the practicality or otherwise of the proposed treatments can simultaneously be assessed by the proton therapy center. Such a system, along with the interactive capabilities provided by video-conferencing methods, has been found to be an efficient solution to the problem of patient assessment and selection at a specialized treatment center, and is a necessary first step toward the full electronic integration of such centers with their remotely situated referral centers.
Resumo:
BACKGROUND: Gene expression analysis has emerged as a major biological research area, with real-time quantitative reverse transcription PCR (RT-QPCR) being one of the most accurate and widely used techniques for expression profiling of selected genes. In order to obtain results that are comparable across assays, a stable normalization strategy is required. In general, the normalization of PCR measurements between different samples uses one to several control genes (e.g. housekeeping genes), from which a baseline reference level is constructed. Thus, the choice of the control genes is of utmost importance, yet there is not a generally accepted standard technique for screening a large number of candidates and identifying the best ones. RESULTS: We propose a novel approach for scoring and ranking candidate genes for their suitability as control genes. Our approach relies on publicly available microarray data and allows the combination of multiple data sets originating from different platforms and/or representing different pathologies. The use of microarray data allows the screening of tens of thousands of genes, producing very comprehensive lists of candidates. We also provide two lists of candidate control genes: one which is breast cancer-specific and one with more general applicability. Two genes from the breast cancer list which had not been previously used as control genes are identified and validated by RT-QPCR. Open source R functions are available at http://www.isrec.isb-sib.ch/~vpopovic/research/ CONCLUSION: We proposed a new method for identifying candidate control genes for RT-QPCR which was able to rank thousands of genes according to some predefined suitability criteria and we applied it to the case of breast cancer. We also empirically showed that translating the results from microarray to PCR platform was achievable.
Resumo:
A combinatorial protocol (CP) is introduced here to interface it with the multiple linear regression (MLR) for variable selection. The efficiency of CP-MLR is primarily based on the restriction of entry of correlated variables to the model development stage. It has been used for the analysis of Selwood et al data set [16], and the obtained models are compared with those reported from GFA [8] and MUSEUM [9] approaches. For this data set CP-MLR could identify three highly independent models (27, 28 and 31) with Q2 value in the range of 0.632-0.518. Also, these models are divergent and unique. Even though, the present study does not share any models with GFA [8], and MUSEUM [9] results, there are several descriptors common to all these studies, including the present one. Also a simulation is carried out on the same data set to explain the model formation in CP-MLR. The results demonstrate that the proposed method should be able to offer solutions to data sets with 50 to 60 descriptors in reasonable time frame. By carefully selecting the inter-parameter correlation cutoff values in CP-MLR one can identify divergent models and handle data sets larger than the present one without involving excessive computer time.
A global historical ozone data set and prominent features of stratospheric variability prior to 1979
Resumo:
We present a vertically resolved zonal mean monthly mean global ozone data set spanning the period 1901 to 2007, called HISTOZ.1.0. It is based on a new approach that combines information from an ensemble of chemistry climate model (CCM) simulations with historical total column ozone information. The CCM simulations incorporate important external drivers of stratospheric chemistry and dynamics (in particular solar and volcanic effects, greenhouse gases and ozone depleting substances, sea surface temperatures, and the quasi-biennial oscillation). The historical total column ozone observations include ground-based measurements from the 1920s onward and satellite observations from 1970 to 1976. An off-line data assimilation approach is used to combine model simulations, observations, and information on the observation error. The period starting in 1979 was used for validation with existing ozone data sets and therefore only ground-based measurements were assimilated. Results demonstrate considerable skill from the CCM simulations alone. Assimilating observations provides additional skill for total column ozone. With respect to the vertical ozone distribution, assimilating observations increases on average the correlation with a reference data set, but does not decrease the mean squared error. Analyses of HISTOZ.1.0 with respect to the effects of El Niño–Southern Oscillation (ENSO) and of the 11 yr solar cycle on stratospheric ozone from 1934 to 1979 qualitatively confirm previous studies that focussed on the post-1979 period. The ENSO signature exhibits a much clearer imprint of a change in strength of the Brewer–Dobson circulation compared to the post-1979 period. The imprint of the 11 yr solar cycle is slightly weaker in the earlier period. Furthermore, the total column ozone increase from the 1950s to around 1970 at northern mid-latitudes is briefly discussed. Indications for contributions of a tropospheric ozone increase, greenhouse gases, and changes in atmospheric circulation are found. Finally, the paper points at several possible future improvements of HISTOZ.1.0.
Resumo:
The motion of lung tumors during respiration makes the accurate delivery of radiation therapy to the thorax difficult because it increases the uncertainty of target position. The adoption of four-dimensional computed tomography (4D-CT) has allowed us to determine how a tumor moves with respiration for each individual patient. Using information acquired during a 4D-CT scan, we can define the target, visualize motion, and calculate dose during the planning phase of the radiotherapy process. One image data set that can be created from the 4D-CT acquisition is the maximum-intensity projection (MIP). The MIP can be used as a starting point to define the volume that encompasses the motion envelope of the moving gross target volume (GTV). Because of the close relationship that exists between the MIP and the final target volume, we investigated four MIP data sets created with different methodologies (3 using various 4D-CT sorting implementations, and one using all available cine CT images) to compare target delineation. It has been observed that changing the 4D-CT sorting method will lead to the selection of a different collection of images; however, the clinical implications of changing the constituent images on the resultant MIP data set are not clear. There has not been a comprehensive study that compares target delineation based on different 4D-CT sorting methodologies in a patient population. We selected a collection of patients who had previously undergone thoracic 4D-CT scans at our institution, and who had lung tumors that moved at least 1 cm. We then generated the four MIP data sets and automatically contoured the target volumes. In doing so, we identified cases in which the MIP generated from a 4D-CT sorting process under-represented the motion envelope of the target volume by more than 10% than when measured on the MIP generated from all of the cine CT images. The 4D-CT methods suffered from duplicate image selection and might not choose maximum extent images. Based on our results, we suggest utilization of a MIP generated from the full cine CT data set to ensure a representative inclusive tumor extent, and to avoid geometric miss.
Resumo:
INTRODUCTION Even though arthroplasty of the ankle joint is considered to be an established procedure, only about 1,300 endoprostheses are implanted in Germany annually. Arthrodeses of the ankle joint are performed almost three times more often. This may be due to the availability of the procedure - more than twice as many providers perform arthrodesis - as well as the postulated high frequency of revision procedures of arthroplasties in the literature. In those publications, however, there is often no clear differentiation between revision surgery with exchange of components, subsequent interventions due to complications and subsequent surgery not associated with complications. The German Orthopaedic Foot and Ankle Association's (D. A. F.) registry for total ankle replacement collects data pertaining to perioperative complications as well as cause, nature and extent of the subsequent interventions, and postoperative patient satisfaction. MATERIAL AND METHODS The D. A. F.'s total ankle replacement register is a nation-wide, voluntary registry. After giving written informed consent, the patients can be added to the database by participating providers. Data are collected during hospital stay for surgical treatment, during routine follow-up inspections and in the context of revision surgery. The information can be submitted in paper-based or online formats. The survey instruments are available as minimum data sets or scientific questionnaires which include patient-reported outcome measures (PROMs). The pseudonymous clinical data are collected and evaluated at the Institute for Evaluative Research in Medicine, University of Bern/Switzerland (IEFM). The patient-related data remain on the register's module server in North Rhine-Westphalia, Germany. The registry's methodology as well as the results of the revisions and patient satisfaction for 115 patients with a two year follow-up period are presented. Statistical analyses are performed with SAS™ (Version 9.4, SAS Institute, Inc., Cary, NC, USA). RESULTS About 2½ years after the register was launched there are 621 datasets on primary implantations, 1,427 on follow-ups and 121 records on re-operation available. 49 % of the patients received their implants due to post-traumatic osteoarthritis, 27 % because of a primary osteoarthritis and 15 % of patients suffered from a rheumatic disease. More than 90 % of the primary interventions proceeded without complications. Subsequent interventions were recorded for 84 patients, which corresponds to a rate of 13.5 % with respect to the primary implantations. It should be noted that these secondary procedures also include two-stage procedures not due to a complication. "True revisions" are interventions with exchange of components due to mechanical complications and/or infection and were present in 7.6 % of patients. 415 of the patients commented on their satisfaction with the operative result during the last follow-up: 89.9 % of patients evaluate their outcome as excellent or good, 9.4 % as moderate and only 0.7 % (3 patients) as poor. In these three cases a component loosening or symptomatic USG osteoarthritis was present. Two-year follow-up data using the American Orthopedic Foot and Ankle Society Ankle and Hindfoot Scale (AOFAS-AHS) are already available for 115 patients. The median AOFAS-AHS score increased from 33 points preoperatively to more than 80 points three to six months postoperatively. This increase remained nearly constant over the entire two-year follow-up period. CONCLUSION Covering less than 10 % of the approximately 240 providers in Germany and approximately 12 % of the annually implanted total ankle-replacements, the D. A. F.-register is still far from being seen as a national registry. Nevertheless, geographical coverage and inclusion of "high-" (more than 100 total ankle replacements a year) and "low-volume surgeons" (less than 5 total ankle replacements a year) make the register representative for Germany. The registry data show that the number of subsequent interventions and in particular the "true revision" procedures are markedly lower than the 20 % often postulated in the literature. In addition, a high level of patient satisfaction over the short and medium term is recorded. From the perspective of the authors, these results indicate that total ankle arthroplasty - given a correct indication and appropriate selection of patients - is not inferior to an ankle arthrodesis concerning patients' satisfaction and function. First valid survival rates can be expected about 10 years after the register's start.
Resumo:
The OPERA detector, designed to search for νμ → ντ oscillations in the CNGS beam, is located in the underground Gran Sasso laboratory, a privileged location to study TeV-scale cosmic rays. For the analysis here presented, the detector was used to measure the atmospheric muon charge ratio in the TeV region. OPERA collected chargeseparated cosmic ray data between 2008 and 2012. More than 3 million atmospheric muon events were detected and reconstructed, among which about 110000 multiple muon bundles. The charge ratio Rμ ≡ Nμ+/Nμ− was measured separately for single and for multiple muon events. The analysis exploited the inversion of the magnet polarity which was performed on purpose during the 2012 Run. The combination of the two data sets with opposite magnet polarities allowedminimizing systematic uncertainties and reaching an accurate determination of the muon charge ratio. Data were fitted to obtain relevant parameters on the composition of primary cosmic rays and the associated kaon production in the forward fragmentation region. In the surface energy range 1–20 TeV investigated by OPERA, Rμ is well described by a parametric model including only pion and kaon contributions to themuon flux, showing no significant contribution of the prompt component. The energy independence supports the validity of Feynman scaling in the fragmentation region up to 200 TeV/nucleon primary energy.
Resumo:
BACKGROUND Record linkage of existing individual health care data is an efficient way to answer important epidemiological research questions. Reuse of individual health-related data faces several problems: Either a unique personal identifier, like social security number, is not available or non-unique person identifiable information, like names, are privacy protected and cannot be accessed. A solution to protect privacy in probabilistic record linkages is to encrypt these sensitive information. Unfortunately, encrypted hash codes of two names differ completely if the plain names differ only by a single character. Therefore, standard encryption methods cannot be applied. To overcome these challenges, we developed the Privacy Preserving Probabilistic Record Linkage (P3RL) method. METHODS In this Privacy Preserving Probabilistic Record Linkage method we apply a three-party protocol, with two sites collecting individual data and an independent trusted linkage center as the third partner. Our method consists of three main steps: pre-processing, encryption and probabilistic record linkage. Data pre-processing and encryption are done at the sites by local personnel. To guarantee similar quality and format of variables and identical encryption procedure at each site, the linkage center generates semi-automated pre-processing and encryption templates. To retrieve information (i.e. data structure) for the creation of templates without ever accessing plain person identifiable information, we introduced a novel method of data masking. Sensitive string variables are encrypted using Bloom filters, which enables calculation of similarity coefficients. For date variables, we developed special encryption procedures to handle the most common date errors. The linkage center performs probabilistic record linkage with encrypted person identifiable information and plain non-sensitive variables. RESULTS In this paper we describe step by step how to link existing health-related data using encryption methods to preserve privacy of persons in the study. CONCLUSION Privacy Preserving Probabilistic Record linkage expands record linkage facilities in settings where a unique identifier is unavailable and/or regulations restrict access to the non-unique person identifiable information needed to link existing health-related data sets. Automated pre-processing and encryption fully protect sensitive information ensuring participant confidentiality. This method is suitable not just for epidemiological research but also for any setting with similar challenges.
Resumo:
A wide variety of spatial data collection efforts are ongoing throughout local, state and federal agencies, private firms and non-profit organizations. Each effort is established for a different purpose but organizations and individuals often collect and maintain the same or similar information. The United States federal government has undertaken many initiatives such as the National Spatial Data Infrastructure, the National Map and Geospatial One-Stop to reduce duplicative spatial data collection and promote the coordinated use, sharing, and dissemination of spatial data nationwide. A key premise in most of these initiatives is that no national government will be able to gather and maintain more than a small percentage of the geographic data that users want and desire. Thus, national initiatives depend typically on the cooperation of those already gathering spatial data and those using GIs to meet specific needs to help construct and maintain these spatial data infrastructures and geo-libraries for their nations (Onsrud 2001). Some of the impediments to widespread spatial data sharing are well known from directly asking GIs data producers why they are not currently involved in creating datasets that are of common or compatible formats, documenting their datasets in a standardized metadata format or making their datasets more readily available to others through Data Clearinghouses or geo-libraries. The research described in this thesis addresses the impediments to wide-scale spatial data sharing faced by GIs data producers and explores a new conceptual data-sharing approach, the Public Commons for Geospatial Data, that supports user-friendly metadata creation, open access licenses, archival services and documentation of parent lineage of the contributors and value- adders of digital spatial data sets.