934 results for Data anonymization and sanitization
Abstract:
In this paper, we propose nonlinear elliptical models for correlated data with heteroscedastic and/or autoregressive structures. Our aim is to extend the models proposed by Russo et al. [22] by considering a more sophisticated scale structure to deal with variations in data dispersion and/or possible autocorrelation among measurements taken on the same experimental unit over time. Moreover, to mitigate the possible influence of outlying observations or to take into account the non-normal symmetric tails of the data, we assume elliptical contours for the joint distribution of random effects and errors, which allows us to attribute different weights to the observations. We propose an iterative algorithm to obtain the maximum-likelihood estimates of the parameters and derive the local influence curvatures for some specific perturbation schemes. The motivation for this work comes from a pharmacokinetic indomethacin data set, which was previously analysed by Bocheng and Xuping [1] under normality.
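The abstract does not spell out the iterative algorithm, but a minimal sketch can illustrate the kind of observation weighting it alludes to. The sketch below assumes a Student-t error model (one member of the elliptical family) in a plain linear regression; the function name `irls_t`, the default `nu=4`, and the homoscedastic, independent setup are illustrative simplifications, not the paper's model.

```python
# Minimal sketch (not the authors' algorithm): iteratively reweighted
# estimation for a linear model with Student-t (elliptical) errors.
# Observations with a large squared scaled residual d2 get the weight
# (nu + 1) / (nu + d2), which downweights outliers; the coefficient
# update is then ordinary weighted least squares.
import numpy as np

def irls_t(y, X, nu=4.0, n_iter=50):
    n, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS starting values
    sigma2 = np.var(y - X @ beta)
    for _ in range(n_iter):
        r = y - X @ beta
        d2 = r**2 / sigma2                        # squared scaled residuals
        w = (nu + 1.0) / (nu + d2)                # t-model weights
        W = np.diag(w)
        beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
        sigma2 = np.sum(w * (y - X @ beta)**2) / n
    return beta, sigma2
```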
Abstract:
Abstract Background The study and analysis of gene expression measurements is the primary focus of functional genomics. Once expression data are available, biologists are faced with the task of extracting (new) knowledge associated with the underlying biological phenomenon. Most often, in order to perform this task, biologists execute a number of analysis activities on the available gene expression dataset rather than a single analysis activity. The integration of heterogeneous tools and data sources to create an integrated analysis environment represents a challenging and error-prone task. Semantic integration enables the assignment of unambiguous meanings to data shared among different applications in an integrated environment, allowing the exchange of data in a semantically consistent and meaningful way. This work aims at developing an ontology-based methodology for the semantic integration of gene expression analysis tools and data sources. The proposed methodology relies on software connectors to support not only the access to heterogeneous data sources but also the definition of transformation rules on exchanged data. Results We have studied the different challenges involved in the integration of computer systems and the role software connectors play in this task. We have also studied a number of gene expression technologies, analysis tools and related ontologies in order to devise basic integration scenarios and propose a reference ontology for the gene expression domain. Then, we have defined a number of activities and associated guidelines to prescribe how the development of connectors should be carried out. Finally, we have applied the proposed methodology in the construction of three different integration scenarios involving the use of different tools for the analysis of different types of gene expression data. Conclusions The proposed methodology facilitates the development of connectors capable of semantically integrating different gene expression analysis tools and data sources. The methodology can be used in the development of connectors supporting both simple and nontrivial processing requirements, thus ensuring accurate data exchange and information interpretation from exchanged data.
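As a purely hypothetical sketch of the connector idea described above, the snippet below maps fields of one source record onto terms of a shared reference ontology via declarative transformation rules. All names (`GeneExpressionConnector`, `RULES`, the `refont:` terms) are invented for illustration and do not come from the paper.

```python
# Hypothetical software connector: reads records in one source format and
# applies transformation rules that map source fields onto terms of a
# shared reference ontology for the gene expression domain.
from typing import Callable

# Each rule: (source_field, target_ontology_term, value_transformation)
RULES: list[tuple[str, str, Callable[[str], object]]] = [
    ("probe_id",  "refont:Probe",           str),
    ("log_ratio", "refont:ExpressionLevel", float),
    ("organism",  "refont:Taxon",           str.strip),
]

class GeneExpressionConnector:
    def translate(self, record: dict) -> dict:
        """Map one source record onto reference-ontology terms."""
        out = {}
        for src_field, term, transform in RULES:
            if src_field in record:
                out[term] = transform(record[src_field])
        return out

connector = GeneExpressionConnector()
print(connector.translate({"probe_id": "AFFX-1", "log_ratio": "0.73"}))
```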
Abstract:
The miniaturization race in the hardware industry, aimed at a continuous increase of transistor density on a die, no longer brings corresponding application performance improvements. One of the most promising alternatives is to exploit the heterogeneous nature of common applications in hardware. Supported by reconfigurable computation, which has already proved its efficiency in accelerating data-intensive applications, this concept promises a breakthrough in contemporary technology development. Memory organization in such heterogeneous reconfigurable architectures becomes very critical. Two primary aspects introduce a sophisticated trade-off. On the one hand, a memory subsystem should provide a well-organized distributed data structure and guarantee the required data bandwidth. On the other hand, it should hide the heterogeneous hardware structure from the end user, in order to support feasible high-level programmability of the system. This thesis explores heterogeneous reconfigurable hardware architectures and presents possible solutions to cope with the problem of memory organization and data structure. Using the MORPHEUS heterogeneous platform as an example, the discussion follows the complete design cycle, from decision making and justification to hardware realization. Particular emphasis is placed on methods to support high system performance, meet application requirements, and provide a user-friendly programmer interface. As a result, the research introduces a complete heterogeneous platform enhanced with a hierarchical memory organization, which accomplishes its task by separating computation from communication, providing reconfigurable engines with computation and configuration data, and unifying heterogeneous computational devices by means of local storage buffers. It is distinguished from related solutions by its distributed data-flow organization, mechanisms specifically engineered to operate on data in local domains, a communication infrastructure based on a Network-on-Chip, and thorough methods to prevent computation and communication stalls. In addition, a novel technique to accelerate memory access was developed and implemented.
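The "separating computation from communication" idea can be sketched, very loosely, as double buffering: one stage fills local buffers while the other computes. The Python sketch below is only an analogy for the hardware mechanism (a bounded queue stands in for local storage buffers fed over the NoC); nothing in it is taken from the MORPHEUS design.

```python
# Double-buffering analogy: the communication stage fills one buffer slot
# while the computation stage drains the other, so neither side stalls.
import threading, queue

def communication(chunks, fifo):
    for chunk in chunks:            # stands in for NoC / DMA transfers
        fifo.put(chunk)             # fill a local storage buffer
    fifo.put(None)                  # end-of-stream marker

def computation(fifo, results):
    while (chunk := fifo.get()) is not None:
        results.append(sum(chunk))  # stands in for the reconfigurable engine

fifo, results = queue.Queue(maxsize=2), []   # 2 slots ~ double buffering
t = threading.Thread(target=communication, args=([range(4)] * 3, fifo))
t.start()
computation(fifo, results)
t.join()
print(results)   # [6, 6, 6]
```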
Abstract:
The Gaia space mission is a major project for the European astronomical community. As challenging as it is, the processing and analysis of the huge data flow incoming from Gaia is the subject of thorough study and preparatory work by the DPAC (Data Processing and Analysis Consortium), in charge of all aspects of the Gaia data reduction. This PhD thesis was carried out in the framework of the DPAC, within the team based in Bologna. The task of the Bologna team is to define the calibration model and to build a grid of spectro-photometric standard stars (SPSS) suitable for the absolute flux calibration of the Gaia G-band photometry and the BP/RP spectrophotometry. Such a flux calibration can be performed by repeatedly observing each SPSS during the lifetime of the Gaia mission and by comparing the observed Gaia spectra to the spectra obtained by our ground-based observations. Due to both the different observing sites involved and the huge number of frames expected (≃100000), it is essential to maintain the maximum homogeneity in data quality, acquisition and treatment, and particular care has to be taken to test the capabilities of each telescope/instrument combination (through the “instrument familiarization plan”) and to devise methods to keep under control, and where necessary correct for, the typical instrumental effects that can affect the high precision required for the Gaia SPSS grid (a few % with respect to Vega). I contributed to the ground-based survey of Gaia SPSS in many respects: the observations, the instrument familiarization plan, the data reduction and analysis activities (both photometry and spectroscopy), and the maintenance of the data archives. However, the field I was personally responsible for was photometry, and in particular relative photometry for the production of short-term light curves. In this context I defined and tested a semi-automated pipeline which allows for the pre-reduction of imaging SPSS data and the production of aperture photometry catalogues ready to be used for further analysis. A series of semi-automated quality control criteria are included in the pipeline at various levels, from pre-reduction, to aperture photometry, to light-curve production and analysis.
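The pipeline itself is not reproduced here, but a minimal sketch of the aperture-photometry step it automates may help: sum the flux inside a circular aperture and subtract a background estimated in a surrounding annulus. The radii, the synthetic image, and the function name are invented for illustration.

```python
# Minimal aperture photometry on a 2-D image array: background-subtracted
# flux of one source, with the per-pixel sky level taken as the median of
# an annulus around the aperture.
import numpy as np

def aperture_photometry(img, x0, y0, r_ap=5.0, r_in=8.0, r_out=12.0):
    yy, xx = np.indices(img.shape)
    r = np.hypot(xx - x0, yy - y0)
    ap = r <= r_ap                               # source aperture
    ann = (r >= r_in) & (r <= r_out)             # background annulus
    sky = np.median(img[ann])                    # per-pixel background
    return img[ap].sum() - sky * ap.sum()        # background-subtracted flux

img = np.random.default_rng(1).normal(100.0, 5.0, (64, 64))
img[30:34, 30:34] += 500.0                       # synthetic star
print(aperture_photometry(img, 31.5, 31.5))
```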
Abstract:
A study of maar-diatreme volcanoes has been performed by inversion of gravity and magnetic data. The geophysical inverse problem has been solved by means of the damped nonlinear least-squares method. To ensure stability and convergence of the solution of the inverse problem, a mathematical tool consisting of data weighting and model scaling has been worked out. Theoretical gravity and magnetic modeling of maar-diatreme volcanoes has been conducted in order to obtain information that can be used for a simple, rough qualitative and/or quantitative interpretation. This information also serves as a priori information to design models for the inversion and/or to assist the interpretation of inversion results. The results of theoretical modeling have been used to roughly estimate the heights and the dip angles of the walls of eight Eifel maar-diatremes, each taken as a whole. Inverse modeling has been conducted for the Schönfeld Maar (magnetics) and the Hausten-Morswiesen Maar (gravity and magnetics). The geometrical parameters of these maars, as well as the density and magnetic properties of the rocks filling them, have been estimated. For a reliable interpretation of the inversion results, besides the knowledge from theoretical modeling, other tools such as field transformations and spectral analysis were used for complementary information. Geologic models, based on the synthesis of the respective interpretation results, are presented for the two maars mentioned above. The results gave more insight into the genesis, physics and post-eruptive development of maar-diatreme volcanoes. A classification of the maar-diatreme volcanoes into three main types has been elaborated. Relatively high magnetic anomalies are indicative of scoria cones embedded within maar-diatremes, provided they are not caused by a strong remanent component of the magnetization. Smaller (weaker) secondary gravity and magnetic anomalies superimposed on the main anomaly of a maar-diatreme, especially in the boundary areas, are indicative of subsidence processes, which probably occurred in the late sedimentation phase of the post-eruptive development. Contrary to postulates referring to kimberlite pipes, there is no general systematic relationship between diameter and height, nor between the geophysical anomaly and the dimensions of maar-diatreme volcanoes. Although both maar-diatreme volcanoes and kimberlite pipes are products of phreatomagmatism, they probably formed in different thermodynamic and hydrogeological environments. In the case of kimberlite pipes, large amounts of magma and groundwater, certainly supplied by deep and large reservoirs, interacted under high pressure and temperature conditions. This led to a long-lasting phreatomagmatic process and hence to the formation of large structures. In maar-diatreme and tuff-ring-diatreme volcanoes, the phreatomagmatic process takes place through an interaction between magma from small, shallow magma chambers (probably segregated magmas) and small amounts of near-surface groundwater under low pressure and temperature conditions. This leads to shorter eruptions and consequently to structures of smaller size in comparison with kimberlite pipes. Nevertheless, the results show that the diameter-to-height ratio for 50% of the studied maar-diatremes is around 1, and the dip angle of the diatreme walls is similar to that of kimberlite pipes, lying between 70 and 85°.
Note that these numerical characteristics, especially the dip angle, hold for the maars whose diatremes, as estimated by modeling, have the shape of a truncated cone. This indicates that the diatreme cannot be completely resolved by inversion.
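As a generic illustration of the damped nonlinear least-squares method named above, the sketch below performs one Levenberg-Marquardt-style update with a data-weighting matrix `W` and a model-scaling matrix `S` (the stabilizing tool mentioned in the abstract). The forward model `f`, its Jacobian `J`, and the damping value are assumed given; nothing here is taken from the thesis code.

```python
# One damped nonlinear least-squares update: solve the weighted, scaled
# normal equations with a damping term added to the diagonal, then map
# the scaled step back to model space.
import numpy as np

def damped_step(m, d_obs, f, J, W, S, damping=1e-2):
    r = W @ (d_obs - f(m))                 # weighted data residual
    A = W @ J(m) @ S                       # weighted, scaled Jacobian
    dm = np.linalg.solve(A.T @ A + damping * np.eye(A.shape[1]), A.T @ r)
    return m + S @ dm                      # unscale and apply the step
```

The damping term keeps the step well-conditioned when the Jacobian is nearly singular; `W` encodes data uncertainties, and `S` balances parameters of very different sensitivity (e.g., depths versus susceptibilities).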
Abstract:
Data sets describing the state of the earth's atmosphere are of great importance in the atmospheric sciences. Over the last decades, the quality and sheer amount of the available data increased significantly, resulting in a rising demand for new tools capable of handling and analysing these large, multidimensional sets of atmospheric data. The interdisciplinary work presented in this thesis covers the development and the application of practical software tools and efficient algorithms from the field of computer science, aiming at the goal of enabling atmospheric scientists to analyse and to gain new insights from these large data sets. For this purpose, our tools combine novel techniques with well-established methods from different areas such as scientific visualization and data segmentation. In this thesis, three practical tools are presented. Two of these tools are software systems (Insight and IWAL) for different types of processing and interactive visualization of data; the third tool is an efficient algorithm for data segmentation implemented as part of Insight. Insight is a toolkit for the interactive, three-dimensional visualization and processing of large sets of atmospheric data, originally developed as a testing environment for the novel segmentation algorithm. It provides a dynamic system for combining at runtime data from different sources, a variety of different data processing algorithms, and several visualization techniques. Its modular architecture and flexible scripting support led to additional applications of the software, of which two examples are presented: the usage of Insight as a WMS (web map service) server, and the automatic production of a sequence of images for the visualization of cyclone simulations. The core application of Insight is the provision of the novel segmentation algorithm for the efficient detection and tracking of 3D features in large sets of atmospheric data, as well as for the precise localization of the occurring genesis, lysis, merging and splitting events. Data segmentation usually leads to a significant reduction of the size of the considered data. This enables a practical visualization of the data, statistical analyses of the features and their events, and the manual or automatic detection of interesting situations for subsequent detailed investigation. The concepts of the novel algorithm, its technical realization, and several extensions for avoiding under- and over-segmentation are discussed. As example applications, this thesis covers the setup and the results of the segmentation of upper-tropospheric jet streams and cyclones as full 3D objects. Finally, IWAL is presented, which is a web application providing easy interactive access to meteorological data visualizations, primarily aimed at students. As a web application, it avoids the need to retrieve all input data sets and to install and handle complex visualization tools on a local machine. The main challenge in the provision of customizable visualizations to large numbers of simultaneous users was to find an acceptable trade-off between the available visualization options and the performance of the application. Besides the implementation details, benchmarks and the results of a user survey are presented.
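Not the Insight algorithm itself, but a minimal illustration of the two ingredients described: threshold segmentation of a 3D field via connected-component labelling, and a naive overlap test to track a feature between time steps (an empty overlap set would correspond to a lysis event). The threshold and the random field are placeholders.

```python
# Threshold-based 3-D segmentation with SciPy's connected-component
# labelling, plus an overlap test between consecutive time steps.
import numpy as np
from scipy.ndimage import label

def segment(field, threshold):
    mask = field > threshold
    labels, n_features = label(mask)        # 3-D connected components
    return labels, n_features

def overlaps(labels_t0, feat_id, labels_t1):
    """IDs at time t+1 that overlap feature `feat_id` at time t."""
    hits = labels_t1[labels_t0 == feat_id]
    return set(hits[hits > 0].tolist())     # empty set would mean lysis

rng = np.random.default_rng(0)
field = rng.random((20, 20, 20))
labels, n = segment(field, 0.99)
print(n, overlaps(labels, 1, labels))       # trivially tracks against itself
```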
Abstract:
The objective of this study was to characterize empirically the association between vaccination coverage and the size and occurrence of measles epidemics in Germany. In order to achieve this we analysed data routinely collected by the Robert Koch Institute, which comprise the weekly number of reported measles cases at all ages as well as estimates of vaccination coverage at the average age of entry into the school system. Coverage levels within each federal state of Germany are incorporated into a multivariate time-series model for infectious disease counts, which captures occasional outbreaks by means of an autoregressive component. The observed incidence pattern of measles for all ages is best described by using the log proportion of unvaccinated school starters in the autoregressive component of the model.
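A plausible generic form of the model class described (hedged; the exact specification is given in the paper) is the endemic-epidemic count model below, with the coverage effect entering the autoregressive component:

```latex
% Generic endemic-epidemic count time-series model; notation assumed,
% not quoted from the paper:
\[
  y_{it} \mid \text{past} \sim \operatorname{NegBin}(\mu_{it}, \psi), \qquad
  \mu_{it} = \lambda_i \, y_{i,t-1} + \nu_{it}, \qquad
  \log \lambda_i = \alpha + \beta \log u_i ,
\]
% where $y_{it}$ are weekly measles counts in federal state $i$,
% $\nu_{it}$ is an endemic component (e.g. with seasonal terms), and
% $u_i$ is the proportion of unvaccinated school starters entering the
% autoregressive (outbreak) component.
```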
Abstract:
Background The release of quality data from acute care hospitals to the general public aims to inform the public, to provide transparency and to foster quality-based competition among providers. Due to the expected mechanisms of action, and possibly adverse consequences, of public quality comparison, it is a controversial topic. The perspective of physicians and nurses is of particular importance in this context: they are mainly responsible for the collection of quality-control data, and are directly confronted with the results of public comparison. The research focus of this qualitative study was to discover the views and opinions of Swiss physicians and nurses regarding these issues. We investigated how the two professional groups appraised the opportunities as well as the risks of the release of quality data in Switzerland. Methods A qualitative approach was chosen to answer the research question. For data collection, four focus groups were conducted with physicians and nurses who were employed in Swiss acute care hospitals. Qualitative content analysis was applied to the data. Results The results revealed that both occupational groups had a very critical and negative attitude regarding the recent developments. The perceived risks dominated their view. In summary, their main concerns were: the reduction of complexity, the one-sided focus on measurable quality variables, risk selection, the threat of data manipulation and the abuse of published information by the media. An additional concern was that the impression is given that the complex construct of quality can be reduced to a few key figures, creating a false message which then influences society and politics. This critical attitude is associated with the value system and professional self-concept of both physicians and nurses, which differ from the underlying principles of a market-based economy and the economic orientation of the health care business. Conclusions The critical and negative attitude of Swiss physicians and nurses must be heeded and investigated regarding its impact on work motivation and identification with the profession. At the same time, the two professional groups are obliged to reflect upon their critical attitude and take a proactive role in the development of appropriate quality indicators for the publication of quality data in Switzerland.
Abstract:
Data on antimicrobial use play a key role in the development of policies for the containment of antimicrobial resistance. On-farm data could provide a detailed overview of the antimicrobial use, but technical and methodological aspects of data collection and interpretation, as well as data quality need to be further assessed. The aims of this study were (1) to quantify antimicrobial use in the study population using different units of measurement and contrast the results obtained, (2) to evaluate data quality of farm records on antimicrobial use, and (3) to compare data quality of different recording systems. During 1 year, data on antimicrobial use were collected from 97 dairy farms. Antimicrobial consumption was quantified using: (1) the incidence density of antimicrobial treatments; (2) the weight of active substance; (3) the used daily dose and (4) the used course dose for antimicrobials for intestinal, intrauterine and systemic use; and (5) the used unit dose, for antimicrobials for intramammary use. Data quality was evaluated by describing completeness and accuracy of the recorded information, and by comparing farmers' and veterinarians' records. Relative consumption of antimicrobials depended on the unit of measurement: used doses reflected the treatment intensity better than weight of active substance. The use of antimicrobials classified as high priority was low, although under- and overdosing were frequently observed. Electronic recording systems allowed better traceability of the animals treated. Recording drug name or dosage often resulted in incomplete or inaccurate information. Veterinarians tended to record more drugs than farmers. The integration of veterinarian and farm data would improve data quality.
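To make one of these units concrete, here is a hedged sketch of computing used daily doses (UDD) per cow-year from treatment records. The field names, the example records, and the normalization are invented for illustration and do not reproduce the study's exact definitions.

```python
# Hypothetical used-daily-dose calculation: total animal-days treated on a
# farm, normalized to the herd size over one year.
def used_daily_doses(records, herd_size, days=365):
    """Treatment records -> UDD per cow-year for one farm."""
    total_animal_days_treated = sum(
        r["n_animals"] * r["treatment_days"] for r in records
    )
    return total_animal_days_treated / (herd_size * days / 365.0)

records = [
    {"drug": "amoxicillin", "n_animals": 3, "treatment_days": 5},
    {"drug": "ceftiofur",   "n_animals": 1, "treatment_days": 3},
]
print(used_daily_doses(records, herd_size=60))  # 0.3 UDD per cow-year
```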
Abstract:
A time series is a sequence of observations made over time. Examples in public health include daily ozone concentrations, weekly admissions to an emergency department or annual expenditures on health care in the United States. Time series models are used to describe the dependence of the response at each time on predictor variables including covariates and possibly previous values in the series. Time series methods are necessary to account for the correlation among repeated responses over time. This paper gives an overview of time series ideas and methods used in public health research.
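To make the notion of dependence on previous values concrete, the sketch below fits a first-order autoregressive model by least squares, regressing each observation on its predecessor. The data are simulated; any evenly spaced public-health series could be substituted.

```python
# AR(1) fit by least squares: y_t regressed on y_{t-1}.
import numpy as np

rng = np.random.default_rng(42)
y = np.empty(200)
y[0] = 0.0
for t in range(1, 200):                  # simulate y_t = 0.7 * y_{t-1} + e_t
    y[t] = 0.7 * y[t - 1] + rng.normal()

X = np.column_stack([np.ones(199), y[:-1]])
intercept, phi = np.linalg.lstsq(X, y[1:], rcond=None)[0]
print(f"estimated autoregressive coefficient: {phi:.2f}")  # close to 0.7
```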
Abstract:
OBJECTIVE: The aim of this study was to evaluate soft tissue image quality of a mobile cone-beam computed tomography (CBCT) scanner with an integrated flat-panel detector. STUDY DESIGN: Eight fresh human cadavers were used in this study. For evaluation of soft tissue visualization, CBCT data sets and corresponding computed tomography (CT) and magnetic resonance imaging (MRI) data sets were acquired. Evaluation was performed with the help of 10 defined cervical anatomical structures. RESULTS: The statistical analysis of the scoring results of 3 examiners revealed the CBCT images to be of inferior quality regarding the visualization of most of the predefined structures. Visualization without a significant difference was found regarding the demarcation of the vertebral bodies and the pyramidal cartilages, the arteriosclerosis of the carotids (compared with CT), and the laryngeal skeleton (compared with MRI). Regarding arteriosclerosis of the carotids compared with MRI, CBCT proved to be superior. CONCLUSIONS: The integration of a flat-panel detector improves soft tissue visualization using a mobile CBCT scanner.
Abstract:
The recent liberalization of the German energy market has forced the energy industry to develop and install new information systems to support agents on the energy trading floors in their analytical tasks. Besides classical approaches of building a data warehouse giving insight into the time series to understand market and pricing mechanisms, it is crucial to provide a variety of external data from the web. Weather information as well as political news or market rumors are relevant to give the appropriate interpretation to the variables of a volatile energy market. Starting from a multidimensional data model and a collection of buy and sell transactions, a data warehouse is built that gives analytical support to the agents. Following the idea of web farming, we harvest the web, match the external information sources to the data warehouse objects after a filtering and evaluation process, and present this qualified information on a user interface where market values are correlated with those external sources over the time axis.
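The final correlation step can be sketched as aligning time-stamped external events with market values; the snippet below uses `pandas.merge_asof` to attach to each transaction the most recent prior event. The frames and column names are invented.

```python
# Align qualified external events (news, weather) with market values over
# the time axis: each price row receives the latest event at or before it.
import pandas as pd

prices = pd.DataFrame({
    "time":  pd.to_datetime(["2001-05-02 09:00", "2001-05-02 14:00"]),
    "price": [21.4, 24.9],
})
events = pd.DataFrame({
    "time":  pd.to_datetime(["2001-05-02 11:30"]),
    "event": ["heat wave forecast"],
})
print(pd.merge_asof(prices, events, on="time"))  # 14:00 row carries the event
```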
Abstract:
OBJECTIVE To investigate whether it is valid to combine follow-up and change data when conducting meta-analyses of continuous outcomes. STUDY DESIGN AND SETTING Meta-epidemiological study of randomized controlled trials in patients with osteoarthritis of the knee/hip, which assessed patient-reported pain. We calculated standardized mean differences (SMDs) based on follow-up and change data and pooled within-trial differences in SMDs. We also derived pooled SMDs from the estimate indicating the largest treatment effect within a trial (optimistic selection of SMDs) and from the estimate indicating the smallest treatment effect within a trial (pessimistic selection of SMDs). RESULTS A total of 21 meta-analyses with 189 trials with 292 randomized comparisons in 41,256 patients were included. On average, SMDs were 0.04 standard deviation units more beneficial when follow-up values were used (difference in SMDs: -0.04; 95% confidence interval: -0.13, 0.06; P=0.44). In 13 meta-analyses (62%), there was a relevant difference in clinical and/or statistical significance between optimistic and pessimistic pooled SMDs. CONCLUSION On average, there is no relevant difference between follow-up and change data SMDs, and combining these estimates in meta-analysis is generally valid. The decision on which type of data to use when both follow-up and change data are available should be prespecified in the meta-analysis protocol.
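For readers unfamiliar with the estimate being compared, here is a worked sketch of a generic SMD (Cohen's-d style, with a pooled standard deviation) computed once from follow-up scores and once from change scores. The numbers are hypothetical, and the Hedges correction and meta-analytic weighting used in the paper are omitted.

```python
# Generic standardized mean difference: between-group mean difference
# divided by the pooled standard deviation.
import numpy as np

def smd(mean_trt, sd_trt, n_trt, mean_ctl, sd_ctl, n_ctl):
    sd_pooled = np.sqrt(((n_trt - 1) * sd_trt**2 + (n_ctl - 1) * sd_ctl**2)
                        / (n_trt + n_ctl - 2))
    return (mean_trt - mean_ctl) / sd_pooled

# Same hypothetical trial, once with follow-up pain scores, once with
# change-from-baseline scores (negative SMD favors treatment here):
print(smd(32.0, 18.0, 100, 40.0, 19.0, 100))   # follow-up values
print(smd(-18.0, 18.0, 100, -10.0, 19.0, 100)) # change from baseline
```

With equal standard deviations the two estimates coincide, as in this toy case; in practice, follow-up and change scores often have different variances, which is exactly why the two SMDs can diverge within a trial.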