855 results for Automatic Data Processing.
Abstract:
In Western industrialized countries, breast carcinoma is the most common malignant tumor in women. Worldwide, it accounts for about 21% of all cancers in women. Today, one in nine women is at risk of developing breast cancer during her lifetime. The age-standardized mortality rate is currently just under 27%.

Breast carcinoma has a relatively low growth rate. A diagnostic procedure that detected all breast carcinomas smaller than 10 mm in diameter, so that they could be removed, would practically eliminate death from breast cancer, because the 20-year survival rate for initial carcinomas of 5 to 10 mm in size is very high, at over 95%.

Contrast-enhanced MRI is a relatively recent examination method that is sensitive enough to detect carcinomas from 3 mm in diameter. The diagnostic methodology, however, is complex and error-prone, and it requires a long training period and therefore a great deal of experience on the part of the radiologist.

Computer-aided diagnosis software can raise the quality of such a complex diagnosis or at least speed up the process. The goal of this thesis is to develop fully automatic diagnosis software that can be used as a second-opinion system. To my knowledge, no such complete software exists to date.

The software executes a chain of image processing steps modeled on the radiologist's procedure and produces an independent diagnosis for every lesion found. First, as a preprocessing step, a 3D image registration eliminates motion artifacts to improve image quality for the subsequent processing steps. Each contrast-enhancing object is then detected by a rule-based segmentation with adaptive thresholds. Kinetic and morphological features are computed to describe the contrast uptake behavior and the shape, margin, and texture properties of each object. Finally, based on the resulting feature vector, two trained neural networks classify each object as an additional finding or as a benign or malignant lesion.

The performance of the software was tested on image data from 101 female patients containing 141 histologically confirmed lesions. Predicting whether these lesions were benign or malignant yielded a sensitivity of 88% at a specificity of 72%. These values are similar to the predictions of expert radiologists reported in the literature. On average, the predictions contained 2.5 additional malignant findings per patient, which turned out to be misclassified artifacts.
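As a reminder of how the two reported figures are defined, the following minimal sketch computes sensitivity and specificity from confusion-matrix counts. The function name and the counts are hypothetical; the counts were chosen only so that the ratios come out at 88% and 72% and are not the thesis data.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of malignant lesions classified as malignant
    specificity = tn / (tn + fp)  # share of benign lesions classified as benign
    return sensitivity, specificity


# Hypothetical counts for illustration only (not taken from the thesis).
sens, spec = sensitivity_specificity(tp=44, fn=6, tn=36, fp=14)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
# prints: sensitivity=0.88, specificity=0.72
```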
Abstract:
Data deduplication describes a class of approaches that reduce the storage capacity needed to store data or the amount of data that has to be transferred over a network. These approaches detect coarse-grained redundancies within a data set, e.g. a file system, and remove them.

One of the most important applications of data deduplication is backup storage, where these approaches can reduce the storage requirements to a small fraction of the logical backup data size.
This thesis introduces multiple new extensions of so-called fingerprinting-based data deduplication. It starts with the presentation of a novel system design, which allows a cluster of servers to perform exact data deduplication with small chunks in a scalable way.

Afterwards, a combination of compression approaches is introduced for an important but often overlooked data structure in data deduplication systems, the so-called block and file recipes. Using these compression approaches, which exploit unique properties of data deduplication systems, the size of these recipes can be reduced by more than 92% in all investigated data sets. As file recipes can occupy a significant fraction of the overall storage capacity of data deduplication systems, the compression enables significant savings.

A technique to increase the write throughput of data deduplication systems, based on the aforementioned block and file recipes, is introduced next. The novel Block Locality Caching (BLC) uses properties of block and file recipes to overcome the chunk lookup disk bottleneck of data deduplication systems, which limits either their scalability or their throughput. The presented BLC overcomes the disk bottleneck more efficiently than existing approaches. Furthermore, it is shown to be less prone to aging effects.

Finally, it is investigated whether large HPC storage systems exhibit redundancies that can be found by fingerprinting-based data deduplication. Over 3 PB of HPC storage data from different data sets have been analyzed. In most data sets, between 20 and 30% of the data can be classified as redundant. According to these results, future work should further investigate how data deduplication can be integrated into HPC storage systems.

This thesis presents important novel work in different areas of data deduplication research.
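To make the terms "fingerprinting-based deduplication" and "file recipe" concrete, here is a minimal sketch, not the system described in the thesis. It assumes fixed-size chunks and SHA-256 fingerprints for simplicity; `CHUNK_SIZE`, `deduplicate`, and `restore` are hypothetical names, and an in-memory dictionary stands in for the chunk index.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size in bytes


def deduplicate(data: bytes, chunk_store: dict) -> list:
    """Store each unique chunk once and return the file recipe,
    i.e. the ordered list of chunk fingerprints."""
    recipe = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        chunk_store.setdefault(fingerprint, chunk)  # keep only the first copy
        recipe.append(fingerprint)
    return recipe


def restore(recipe: list, chunk_store: dict) -> bytes:
    """Rebuild the original data by following the recipe."""
    return b"".join(chunk_store[fp] for fp in recipe)


store = {}
data = b"A" * CHUNK_SIZE * 3 + b"B" * CHUNK_SIZE
recipe = deduplicate(data, store)
print(len(recipe), len(store))  # prints: 4 2
```

The recipe has one entry per chunk even when the chunk itself is stored only once, which is why recipes can take up a noticeable share of the storage in a highly redundant data set.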
Abstract:
In many industries, for example the automotive industry, digital mock-ups are used to verify the design and function of a product on a virtual prototype. One use case is checking the safety clearances of individual components, the so-called clearance analysis. Engineers determine for specific components whether they maintain a prescribed safety clearance to the surrounding components, both at rest and during motion. If components fall below the safety clearance, their shape or position must be changed. For this, it is important to know exactly which regions of the components violate the safety clearance.

In this thesis, we present a solution for the real-time computation of all regions between two geometric objects that fall below the safety clearance. Each object is given as a set of primitives (e.g., triangles). For every point in time at which a transformation is applied to one of the objects, we compute the set of all primitives that fall below the safety clearance and refer to it as the set of all tolerance-violating primitives. We present a holistic solution that can be divided into the following three main topics.

In the first part, we study algorithms that check whether two triangles are tolerance-violating. We present several approaches to triangle-triangle tolerance tests and show that dedicated tolerance tests are considerably faster than the distance computations used so far. The focus of our work is the development of a novel tolerance test that operates in dual space. In all our benchmarks for computing all tolerance-violating primitives, our dual-space approach consistently proves to be the fastest.

The second part deals with data structures and algorithms for the real-time computation of all tolerance-violating primitives between two geometric objects. We develop a combined data structure composed of a flat hierarchical data structure and several uniform grids. To ensure efficient running times, it is above all important to take the required safety clearance into account in the design of the data structures and query algorithms. We present solutions that quickly determine the set of primitive pairs to be tested. In addition, we develop strategies for recognizing primitives as tolerance-violating without computing an expensive primitive-primitive tolerance test. In our benchmarks, we show that our solutions can compute, in real time, all tolerance-violating primitives between two complex geometric objects, each consisting of many hundreds of thousands of primitives.

In the third part, we present a novel, memory-optimized data structure for managing the cell contents of the uniform grids used earlier. We call this data structure Shrubs. Previous approaches to reducing the memory footprint of uniform grids rely mainly on hashing methods. These, however, do not reduce the memory consumption of the cell contents. In our use case, neighboring cells often have similar contents. By exploiting these redundant cell contents, our approach losslessly compresses the memory required for the cell contents of a uniform grid to one fifth of its previous size and decompresses it at run time.

Finally, we show how our solution for computing all tolerance-violating primitives can be applied in practice. Beyond pure clearance analysis, we show applications to various path-planning problems.
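The idea of narrowing down the primitive pairs that need an exact tolerance test can be illustrated with a generic bounding-box filter. This is a minimal sketch under stated assumptions, not the thesis's data structures or its dual-space test: the distance between two axis-aligned bounding boxes is a lower bound on the distance between the primitives inside them, so a pair whose boxes are farther apart than the tolerance can be skipped. All function names are hypothetical.

```python
import math


def aabb(points):
    """Axis-aligned bounding box of a primitive given as (x, y, z) vertices."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def aabb_distance(box_a, box_b):
    """Euclidean distance between two boxes (0.0 if they overlap)."""
    (amin, amax), (bmin, bmax) = box_a, box_b
    # Per-axis gap between the boxes; zero where the intervals overlap.
    gaps = [max(bmin[i] - amax[i], amin[i] - bmax[i], 0.0) for i in range(3)]
    return math.sqrt(sum(g * g for g in gaps))


def may_violate_tolerance(tri_a, tri_b, tolerance):
    """Conservative filter: False means the pair certainly keeps the clearance;
    True means an exact primitive-primitive test is still required."""
    return aabb_distance(aabb(tri_a), aabb(tri_b)) <= tolerance


tri_a = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]
tri_b = [(0, 0, 5), (1, 0, 5), (0, 1, 5)]
print(may_violate_tolerance(tri_a, tri_b, 1.0))  # prints: False
print(may_violate_tolerance(tri_a, tri_b, 6.0))  # prints: True
```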
Abstract:
Applying location-focused data protection law within the context of a location-agnostic cloud computing framework is fraught with difficulties. While the Proposed EU Data Protection Regulation has introduced a lot of changes to the current data protection framework, the complexities of data processing in the cloud involve various layers and intermediaries of actors that have not been properly addressed. This leaves some gaps in the regulation when analyzed in cloud scenarios. This paper gives a brief overview of the relevant provisions of the regulation that will have an impact on cloud transactions and addresses the missing links. It is hoped that these loopholes will be reconsidered before the final version of the law is passed in order to avoid unintended consequences.
Abstract:
The article proposes granular computing as a theoretical, formal and methodological basis for the newly emerging research field of human–data interaction (HDI). We argue that the ability to represent and reason with information granules is a prerequisite for data legibility. As such, it allows for extending the research agenda of HDI to encompass the topic of collective intelligence amplification, which is seen as an opportunity of today’s increasingly pervasive computing environments. As an example of collective intelligence amplification in HDI, we introduce a collaborative urban planning use case in a cognitive city environment and show how an iterative process of user input and human-oriented automated data processing can support collective decision making. As a basis for automated human-oriented data processing, we use the spatial granular calculus of granular geometry.
Abstract:
Navigation of deep space probes most commonly relies on the spacecraft Doppler tracking technique. Orbital parameters are determined from a series of repeated measurements of the frequency shift of a microwave carrier over a given integration time. Currently, both ESA and NASA operate antennas at several sites around the world to ensure the tracking of deep space probes. Only a small number of software packages are currently used to process Doppler observations. The Astronomical Institute of the University of Bern (AIUB) has recently started the development of Doppler data processing capabilities within the Bernese GNSS Software. This software has been extensively used for Precise Orbit Determination of Earth orbiting satellites using GPS data collected by on-board receivers and for subsequent determination of the Earth gravity field. In this paper, we present the current status of the Doppler data modeling and orbit determination capabilities in the Bernese GNSS Software using GRAIL data. In particular, we focus on the implemented orbit determination procedure used for the combined analysis of Doppler and intersatellite Ka-band data. We show that even at this early stage of the development we can achieve an accuracy of a few mHz on two-way S-band Doppler observations and of 2 µm/s on KBRR data from the GRAIL primary mission phase.
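To relate a Doppler accuracy quoted in mHz to a line-of-sight velocity, the first-order two-way relation Δf ≈ -2 (v/c) f can be inverted. The sketch below is a rough illustration only: it ignores relativistic terms, media delays, and the transponder turnaround ratio, and the 2.3 GHz carrier is an assumed, typical S-band value rather than a figure taken from the paper. The function name is hypothetical.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def two_way_doppler_to_velocity(freq_shift_hz, carrier_hz):
    """Magnitude of the line-of-sight velocity corresponding to a two-way
    Doppler shift, using the first-order relation |df| = 2 * v / c * f."""
    return SPEED_OF_LIGHT * abs(freq_shift_hz) / (2.0 * carrier_hz)


# Assumed S-band carrier of 2.3 GHz (hypothetical value for illustration).
v = two_way_doppler_to_velocity(freq_shift_hz=1e-3, carrier_hz=2.3e9)
print(f"{v * 1e3:.3f} mm/s")  # about 0.065 mm/s per mHz
```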
Abstract:
A wide variety of spatial data collection efforts are ongoing throughout local, state and federal agencies, private firms and non-profit organizations. Each effort is established for a different purpose, but organizations and individuals often collect and maintain the same or similar information. The United States federal government has undertaken many initiatives such as the National Spatial Data Infrastructure, the National Map and Geospatial One-Stop to reduce duplicative spatial data collection and promote the coordinated use, sharing, and dissemination of spatial data nationwide. A key premise in most of these initiatives is that no national government will be able to gather and maintain more than a small percentage of the geographic data that users want. Thus, national initiatives typically depend on the cooperation of those already gathering spatial data and those using GIS to meet specific needs to help construct and maintain these spatial data infrastructures and geo-libraries for their nations (Onsrud 2001). Some of the impediments to widespread spatial data sharing are well known from directly asking GIS data producers why they are not currently involved in creating datasets in common or compatible formats, documenting their datasets in a standardized metadata format or making their datasets more readily available to others through Data Clearinghouses or geo-libraries. The research described in this thesis addresses the impediments to wide-scale spatial data sharing faced by GIS data producers and explores a new conceptual data-sharing approach, the Public Commons for Geospatial Data, that supports user-friendly metadata creation, open access licenses, archival services and documentation of parent lineage of the contributors and value-adders of digital spatial data sets.
Abstract:
Clinical Research Data Quality Literature Review and Pooled Analysis
We present a literature review and secondary analysis of data accuracy in clinical research and related secondary data uses. A total of 93 papers meeting our inclusion criteria were categorized according to the data processing methods. Quantitative data accuracy information was abstracted from the articles and pooled. Our analysis demonstrates that the accuracy associated with data processing methods varies widely, with error rates ranging from 2 errors per 10,000 files to 5019 errors per 10,000 fields. Medical record abstraction was associated with the highest error rates (70–5019 errors per 10,000 fields). Data entered and processed at healthcare facilities had comparable error rates to data processed at central data processing centers. Error rates for data processed with single entry in the presence of on-screen checks were comparable to double entered data. While data processing and cleaning methods may explain a significant amount of the variability in data accuracy, additional factors not resolvable here likely exist.

Defining Data Quality for Clinical Research: A Concept Analysis
Despite notable previous attempts by experts to define data quality, the concept remains ambiguous and subject to the vagaries of natural language. This current lack of clarity continues to hamper research related to data quality issues. We present a formal concept analysis of data quality, which builds on and synthesizes previously published work. We further posit that discipline-level specificity may be required to achieve the desired definitional clarity. To this end, we combine work from the clinical research domain with findings from the general data quality literature to produce a discipline-specific definition and operationalization for data quality in clinical research. While the results are helpful to clinical research, the methodology of concept analysis may be useful in other fields to clarify data quality attributes and to achieve operational definitions.

Medical Record Abstractor’s Perceptions of Factors Impacting the Accuracy of Abstracted Data
Medical record abstraction (MRA) is known to be a significant source of data errors in secondary data uses. Factors impacting the accuracy of abstracted data are not reported consistently in the literature. Two Delphi processes were conducted with experienced medical record abstractors to assess abstractors’ perceptions about the factors. The Delphi process identified 9 factors that were not found in the literature, and differed with the literature by 5 factors in the top 25%. The Delphi results refuted seven factors reported in the literature as impacting the quality of abstracted data. The results provide insight into and indicate content validity of a significant number of the factors reported in the literature. Further, the results indicate general consistency between the perceptions of clinical research medical record abstractors and registry and quality improvement abstractors.

Distributed Cognition Artifacts on Clinical Research Data Collection Forms
Medical record abstraction, a primary mode of data collection in secondary data use, is associated with high error rates. Distributed cognition in medical record abstraction has not been studied as a possible explanation for abstraction errors. We employed the theory of distributed representation and representational analysis to systematically evaluate cognitive demands in medical record abstraction and the extent of external cognitive support employed in a sample of clinical research data collection forms. We show that the cognitive load required for abstraction in 61% of the sampled data elements was high, exceedingly so in 9%. Further, the data collection forms did not support external cognition for the most complex data elements. High working memory demands are a possible explanation for the association of data errors with data elements requiring abstractor interpretation, comparison, mapping or calculation. The representational analysis used here can be used to identify data elements with high cognitive demands.
Abstract:
This article presents and technically describes a new field spectro-goniometer system, developed at the Alfred Wegener Institute (AWI), for the ground-based characterization of surface reflectance anisotropy under natural illumination conditions. The spectro-goniometer consists of a Manual Transportable Instrument platform for ground-based Spectro-directional observations (ManTIS) and a hyperspectral sensor system. The presented measurement strategy shows that the AWI ManTIS field spectro-goniometer can deliver high-quality hemispherical conical reflectance factor (HCRF) measurements with a pointing accuracy of ±6 cm within the constant observation center. Sampling a ManTIS hemisphere (up to 30° viewing zenith, 360° viewing azimuth) takes approx. 18 min. The developed data processing chain, in combination with the software used for the semi-automatic control, provides a reliable method to reduce temporal effects during the measurements. The presented visualization and analysis of HCRF data from a showcase of low-growing Arctic vegetation demonstrate the high quality of the spectro-goniometer measurements. The patented low-cost and lightweight ManTIS instrument platform can be customized for various research needs and is available for purchase.
Abstract:
In recent years, profiling floats, which form the basis of the successful international Argo observatory, have also been considered as platforms for marine biogeochemical research. This study showcases the utility of floats as a novel tool for combined gas measurements of CO2 partial pressure (pCO2) and O2. The float prototypes were equipped with a small, submersible pCO2 sensor and an optode O2 sensor for high-resolution measurements in the surface ocean layer. Four consecutive deployments were carried out between November 2010 and June 2011 near the Cape Verde Ocean Observatory (CVOO) in the eastern tropical North Atlantic. The profiling float performed upcasts every 31 h while measuring pCO2, O2, salinity, temperature, and hydrostatic pressure in the upper 200 m of the water column. To maintain accuracy, regular pCO2 sensor zeroings at depth and surface, as well as optode measurements in air, were performed for each profile. Through the application of data processing procedures (e.g., time-lag correction), accuracies of float-borne pCO2 measurements were greatly improved (10-15 µatm for the water column and 5 µatm for surface measurements). O2 measurements yielded an accuracy of 2 µmol/kg. First results of this pilot study show the possibility of using profiling floats as a platform for detailed and unattended observations of the marine carbon and oxygen cycle dynamics.
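The abstract names time-lag correction without describing it. One common way to correct a slow sensor, shown here as a minimal sketch and not as the authors' procedure, assumes a first-order response with time constant tau, so that the true signal is approximately the measured signal plus tau times its time derivative. The function name, the finite-difference scheme, and the sample values are all hypothetical.

```python
def time_lag_correct(times, measured, tau):
    """Undo an assumed first-order sensor lag:
    true(t) ~ measured(t) + tau * d(measured)/dt.
    Slopes use central differences, one-sided at the two ends.
    Requires at least two samples with distinct times."""
    n = len(times)
    corrected = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        slope = (measured[hi] - measured[lo]) / (times[hi] - times[lo])
        corrected.append(measured[i] + tau * slope)
    return corrected


# Synthetic ramp: the sensor reading rises by 2 units per second, tau = 3 s.
times = [0.0, 1.0, 2.0, 3.0]
measured = [0.0, 2.0, 4.0, 6.0]
print(time_lag_correct(times, measured, tau=3.0))
# prints: [6.0, 8.0, 10.0, 12.0]
```

Differentiating noisy readings amplifies the noise, so in practice the measured series is usually smoothed before such a correction is applied.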