952 results for Dynamic data set visualization
Abstract:
Background: A popular model for gene regulatory networks is the Boolean network model. In this paper, we propose an algorithm to analyze gene regulatory interactions using the Boolean network model and time-series data. The Boolean networks we consider are restricted in the sense that only a subset of all possible Boolean functions is allowed. We exploit some mathematical properties of these restricted Boolean networks in order to avoid a full search. The problem is modeled as a Constraint Satisfaction Problem (CSP) and solved with CSP techniques. Results: We applied the proposed algorithm to two data sets. First, we used an artificial data set obtained from a model of the budding yeast cell cycle. The second data set is derived from experiments performed on HeLa cells. The results show that some interactions can be fully, or at least partially, determined under the Boolean model considered. Conclusions: The proposed algorithm can be used as a first step in the detection of gene/protein interactions. It is able to infer gene relationships from time-series gene expression data, and this inference can be aided by available a priori knowledge.
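A minimal sketch of the inference idea, assuming a toy restricted function class (AND/OR of possibly negated regulators); the brute-force enumeration here stands in for the CSP formulation and propagation techniques of the paper, and all names and the toy time series are illustrative:

```python
from itertools import combinations, product

# Toy binary time series: rows are time points, columns are genes.
series = [
    [0, 0, 1],
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 0],
]

def candidate_functions(k):
    """Restricted Boolean class: AND/OR over k inputs, each possibly negated."""
    for op in (all, any):
        for negations in product((False, True), repeat=k):
            yield op, negations

def consistent_regulators(series, target, max_k=2):
    """Keep every (regulators, function) pair that reproduces all observed
    transitions of `target`; agreement across the surviving solutions tells
    whether an interaction is fully or only partially determined."""
    n_genes = len(series[0])
    solutions = []
    for k in range(1, max_k + 1):
        for regs in combinations(range(n_genes), k):
            for op, negations in candidate_functions(k):
                if all(int(op(series[t][g] ^ n for g, n in zip(regs, negations)))
                       == series[t + 1][target]
                       for t in range(len(series) - 1)):
                    solutions.append((regs, op.__name__, negations))
    return solutions

for gene in range(3):
    print("gene", gene, "->", consistent_regulators(series, gene))
```

An interaction for which every surviving solution names the same regulator corresponds to the "fully determined" case of the abstract; disagreement among solutions corresponds to the partially determined cases.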
Abstract:
The information provided by the International Commission for the Conservation of Atlantic Tunas (ICCAT) on captures of skipjack tuna (Katsuwonus pelamis) in the central-east Atlantic has a number of limitations, such as gaps in the statistics for certain fleets and the limited spatiotemporal detail at which catches are reported. As a result, the quality of these data and their usefulness for providing management advice are limited. In order to reconstruct missing spatiotemporal catch data, the present study uses Data INterpolating Empirical Orthogonal Functions (DINEOF), a technique for missing-data reconstruction, applied here for the first time to fisheries data. DINEOF is based on an Empirical Orthogonal Functions decomposition performed with a Lanczos method. DINEOF was tested with different amounts of missing data, intentionally removing between 3.4% and 95.2% of the values, and the reconstructions were then compared with the complete data set. These validation analyses show that DINEOF is a reliable methodological approach to data reconstruction for the purposes of fishery management advice, even when the amount of missing data is very high.
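A minimal sketch of the DINEOF core loop, assuming a plain truncated SVD in place of the Lanczos solver and a fixed number of EOF modes (the real method also selects the number of modes by cross-validation); all names and the toy field are illustrative:

```python
import numpy as np

def dineof(X, n_modes=3, n_iter=100, tol=1e-6):
    """Iteratively reconstruct missing entries (NaN) of X with a
    truncated EOF (SVD) expansion -- the core DINEOF idea."""
    mask = np.isnan(X)
    filled = np.where(mask, np.nanmean(X), X)  # initial guess: field mean
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        recon = (U[:, :n_modes] * s[:n_modes]) @ Vt[:n_modes]
        change = np.sqrt(np.mean((recon[mask] - filled[mask]) ** 2))
        filled[mask] = recon[mask]  # overwrite only the missing entries
        if change < tol:
            break
    return filled

# Validation as in the paper: hide known values, reconstruct, compare.
rng = np.random.default_rng(0)
truth = np.outer(np.sin(np.linspace(0, 3, 40)), np.cos(np.linspace(0, 5, 25)))
gappy = truth.copy()
gappy[rng.random(truth.shape) < 0.5] = np.nan  # 50% artificial data loss
rec = dineof(gappy)
print("RMS error on missing cells:",
      np.sqrt(np.mean((rec - truth)[np.isnan(gappy)] ** 2)))
```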
Abstract:
In the past decade, the advent of efficient genome sequencing tools and high-throughput experimental biotechnology has led to enormous progress in the life sciences. Among the most important innovations is microarray technology, which allows the expression of thousands of genes to be quantified simultaneously by measuring the hybridization of a tissue of interest to probes on a small glass or plastic slide. The characteristics of these data include a fair amount of random noise, a predictor dimension in the thousands, and a sample size in the dozens. One of the most exciting areas to which microarray technology has been applied is the challenge of deciphering complex diseases such as cancer. In these studies, samples are taken from two or more groups of individuals with heterogeneous phenotypes, pathologies, or clinical outcomes. These samples are hybridized to microarrays in an effort to find a small number of genes that are strongly correlated with the groups of individuals. Even though methods to analyze these data are now well developed and close to reaching a standard organization (through the efforts of international projects like the Microarray Gene Expression Data (MGED) Society [1]), it is not infrequent to come across a clinician's question for which there is no compelling statistical method to answer it. The contribution of this dissertation to deciphering disease is the development of new approaches aimed at handling open problems posed by clinicians in specific experimental designs. Chapter 1, starting from a necessary biological introduction, reviews microarray technologies and all the important steps of an experiment, from the production of the array through quality control to the preprocessing steps used in the data analyses in the rest of the dissertation. Chapter 2 provides a critical review of standard analysis methods, stressing their main open problems. Chapter 3 introduces a method to address the issue of unbalanced design in microarray experiments. In microarray experiments, experimental design is a crucial starting point for obtaining reasonable results. In a two-class problem, an equal or similar number of samples should be collected for the two classes. However, in some cases, e.g. rare pathologies, the approach to be taken is less evident. We propose to address this issue by applying a modified version of SAM [2]. MultiSAM consists of a reiterated application of a SAM analysis, comparing the less populated class (LPC) with 1,000 random samplings of the same size from the more populated class (MPC). A list of the differentially expressed genes is generated for each SAM application. After 1,000 reiterations, each probe is given a "score" ranging from 0 to 1,000 based on its recurrence as differentially expressed in the 1,000 lists. The performance of MultiSAM was compared to that of SAM and LIMMA [3] on two simulated data sets generated via beta and exponential distributions. The results of all three algorithms on low-noise data sets seem acceptable. However, on a real unbalanced two-channel data set regarding Chronic Lymphocytic Leukemia, LIMMA finds no significant probe and SAM finds 23 significantly changed probes but cannot separate the two classes, while MultiSAM finds 122 probes with score >300 and separates the data into two clusters by hierarchical clustering.
We also report extra-assay validation in terms of differentially expressed genes. Although standard algorithms perform well on low-noise simulated data sets, MultiSAM seems to be the only one able to reveal subtle differences in gene expression profiles in real unbalanced data. Chapter 4 describes a method to address the evaluation of similarities in a three-class problem by means of the Relevance Vector Machine [4]. In fact, looking at microarray data in a prognostic and diagnostic clinical framework, not only differences can play a crucial role: in some cases similarities can give useful and sometimes even more important information. The goal, given three classes, could be to establish, with a certain level of confidence, whether the third one is similar to the first or to the second. In this work we show that the Relevance Vector Machine (RVM) [4] could be a possible solution to the limitations of standard supervised classification. In fact, RVM offers many advantages compared, for example, with its well-known precursor, the Support Vector Machine (SVM) [3]. Among these advantages, the estimation of the posterior probability of class membership is a key feature for addressing the similarity issue. This is a highly important, but often overlooked, capability of any practical pattern recognition system. We focused on a three-class tumor-grade problem, with 67 samples of grade 1 (G1), 54 samples of grade 3 (G3) and 100 samples of grade 2 (G2). The goal is to find a model able to separate G1 from G3, and then to evaluate the third class, G2, as a test set, obtaining for each G2 sample the probability of belonging to class G1 or class G3. The analysis showed that breast cancer samples of grade 2 have a molecular profile more similar to that of breast cancer samples of grade 1. This result had been conjectured in the literature, but no measure of significance had been given before.
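A minimal sketch of the MultiSAM resampling scheme described above, assuming a plain two-sample t-test as a stand-in for the SAM statistic; names, thresholds and the toy data are illustrative:

```python
import numpy as np
from scipy import stats

def multisam_scores(X_lpc, X_mpc, n_iter=1000, alpha=0.01, seed=0):
    """Reiterated comparison of the less populated class (LPC) against
    random same-size subsamples of the more populated class (MPC).
    Returns, per probe, how often it is called differentially expressed:
    a score from 0 to n_iter, following the MultiSAM idea."""
    rng = np.random.default_rng(seed)
    n_lpc = X_lpc.shape[1]
    scores = np.zeros(X_lpc.shape[0], dtype=int)
    for _ in range(n_iter):
        idx = rng.choice(X_mpc.shape[1], size=n_lpc, replace=False)
        _, p = stats.ttest_ind(X_lpc, X_mpc[:, idx], axis=1)
        scores += (p < alpha)
    return scores

# Toy unbalanced design: 2000 probes, 5 LPC vs 60 MPC samples,
# with the first 50 probes truly shifted in the LPC.
rng = np.random.default_rng(1)
lpc = rng.normal(size=(2000, 5)); lpc[:50] += 1.5
mpc = rng.normal(size=(2000, 60))
s = multisam_scores(lpc, mpc)
print("probes with score > 300:", np.sum(s > 300))
```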
Abstract:
Since the end of the 19th century, geodesy has contributed greatly to the knowledge of regional tectonics and fault movement through its ability to measure, at sub-centimetre precision, the relative positions of points on the Earth's surface. Nowadays, the systematic analysis of geodetic measurements in actively deforming regions therefore represents one of the most important tools in the study of crustal deformation over different temporal scales [e.g., Dixon, 1991]. This dissertation focuses on motion that can be observed geodetically with classical terrestrial position measurements, particularly triangulation and leveling observations. The work is divided into two sections: an overview of the principal methods for estimating the long-term accumulation of elastic strain from terrestrial observations, and an overview of the principal methods for rigorously inverting surface coseismic deformation fields for source geometry, with tests on synthetic deformation data sets and applications in two different tectonically active regions of the Italian peninsula. For the analysis of the long-term accumulation of elastic strain, triangulation data were available from a geodetic network across the Messina Straits area (southern Italy) for the period 1971-2004. From the resulting angle changes, the shear strain rates as well as the orientation of the principal axes of the strain rate tensor were estimated. The computed average annual shear strain rates for the period between 1971 and 2004 are γ̇1 = 113.89 ± 54.96 nanostrain/yr and γ̇2 = -23.38 ± 48.71 nanostrain/yr, with the orientation of the most extensional strain (θ) at N140.80° ± 19.55°E. These results suggest that the first-order strain field of the area is dominated by extension in the direction perpendicular to the trend of the Straits, sustaining the hypothesis that the Messina Straits could represent an area of active concentrated deformation. The orientation of θ agrees well with GPS deformation estimates calculated over shorter time intervals, is consistent with previous preliminary GPS estimates [D'Agostino and Selvaggi, 2004; Serpelloni et al., 2005], and is also similar to the direction of the slip vector of the 1908 (MW 7.1) earthquake [e.g., Boschi et al., 1989; Valensise and Pantosti, 1992; Pino et al., 2000; Amoruso et al., 2002]. Thus, the measured strain rate can be attributed to active extension across the Messina Straits, corresponding to a relative extension rate ranging between <1 mm/yr and ~2 mm/yr within the portion of the Straits covered by the triangulation network. These results are consistent with the hypothesis that the Messina Straits is an important active geological boundary between the Sicilian and Calabrian domains, and they support previous preliminary GPS-based estimates of strain rates across the Straits, which show that the active deformation is distributed over a broader area. Finally, preliminary dislocation modelling has shown that, although the current geodetic measurements do not resolve the geometry of the dislocation models, they resolve well the rate of interseismic strain accumulation across the Messina Straits and give useful information about the locking depth of the shear zone. Geodetic data, triangulation and leveling measurements of the 1976 Friuli (NE Italy) earthquake, were available for the inversion of coseismic source parameters.
From observed angle and elevation changes, the source parameters of the seismic sequence were estimated in a joint inversion using a simulated annealing algorithm. The computed optimal uniform-slip elastic dislocation model consists of a 30° north-dipping, shallow (depth 1.30 ± 0.75 km) fault plane with an azimuth of 273°, accommodating reverse dextral slip of about 1.8 m. The hypocentral location and inferred fault plane of the main event are thus consistent with the activation of Periadriatic overthrusts or other related thrust faults such as the Gemona-Kobarid thrust. The geodetic data set therefore excludes the source solutions of Aoudia et al. [2000], Peruzza et al. [2002] and Poli et al. [2002], which consider the Susans-Tricesimo thrust as the source of the May 6 event. The best-fit source model is more consistent with the solution of Pondrelli et al. [2001], which proposed the activation of other thrusts located further north of the Susans-Tricesimo thrust, probably on Periadriatic-related thrust faults. The main characteristics of the leveling and triangulation data are fit by the optimal single-fault model; that is, these results are consistent with a first-order rupture process characterized by progressive rupture of a single fault system. A single uniform-slip fault model, however, does not reproduce some minor complexities of the observations, and residual signals not modelled by the optimal single-fault solution were observed. In particular, the single-fault model does not reproduce some minor features of the leveling deformation field along route 36 south of the main uplift peak; a second fault seems necessary to reproduce these residual signals. By assuming movement along some mapped thrusts located south of the inferred optimal single-plane solution, the residual signal has been successfully modelled. In summary, the inversion results presented in this thesis are consistent with the activation of Periadriatic-related thrusts for the main events of the sequence, and with a minor role of the southern thrust systems of the middle Tagliamento plain.
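A minimal sketch of a simulated-annealing inversion of surface deformation on a synthetic data set, assuming a simplified point-source (Mogi-type) forward model as a stand-in for the rectangular elastic dislocation used in the thesis; names, bounds and noise levels are illustrative:

```python
import numpy as np
from scipy.optimize import dual_annealing

def forward_uplift(params, x, y):
    """Simplified point-source uplift: strength * d / R^3 (Mogi-type),
    standing in for the rectangular-dislocation forward model."""
    x0, y0, depth, strength = params
    r2 = (x - x0) ** 2 + (y - y0) ** 2
    return strength * depth / (r2 + depth ** 2) ** 1.5

# Synthetic "observations", as in the thesis' synthetic benchmark tests.
rng = np.random.default_rng(0)
xs, ys = rng.uniform(-10, 10, (2, 40))          # benchmark coordinates, km
true = (1.5, -2.0, 4.0, 50.0)
sigma = 0.002                                    # observation error
obs = forward_uplift(true, xs, ys) + rng.normal(0, sigma, 40)

def misfit(params):
    """Weighted least-squares misfit minimized by simulated annealing."""
    return np.sum(((obs - forward_uplift(params, xs, ys)) / sigma) ** 2)

bounds = [(-10, 10), (-10, 10), (0.5, 15), (1, 200)]
result = dual_annealing(misfit, bounds, seed=1)
print("recovered (x0, y0, depth, strength):", result.x.round(2))
```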
Abstract:
In the past decade, the demand for structural health monitoring expertise has increased dramatically in the United States. The aging issues that most transportation structures are experiencing can seriously jeopardize the economy of a region as well as of a country. At the same time, the monitoring of structures is a central topic of discussion in Europe, where the preservation of historical buildings has been addressed over the last four centuries. More recently, various concerns have arisen about the safety performance of civil structures after tragic events such as the 9/11 attacks or the 2011 Japan earthquake: engineers look for designs able to resist exceptional loadings due to earthquakes, hurricanes and terrorist attacks. After events of this kind, the assessment of the remaining life of the structure is at least as important as the initial performance design. Consequently, it is very clear that the introduction of reliable and accessible damage assessment techniques is crucial for the localization of problems and for correct and immediate rehabilitation. System identification is a branch of the more general control theory. In civil engineering, this field addresses the techniques needed to estimate mechanical characteristics, such as stiffness or mass, from the signals captured by sensors. The objective of Dynamic Structural Identification (DSI) is to determine, from experimental measurements, the fundamental modal parameters of a generic structure in order to characterize its dynamic behavior via a mathematical model. Knowledge of these parameters is helpful in the model updating procedure, which permits the definition of corrected theoretical models through experimental validation. The main aim of this technique is to minimize the differences between the theoretical model results and in situ measurements of dynamic data. The updated model therefore becomes a very effective control tool when it comes to rehabilitation of structures or damage assessment. Instrumenting a whole structure is sometimes unfeasible because of the high cost involved, or because it is not physically possible to reach every point of the structure. Numerous scholars have therefore tried to address this problem, and in general two main methods are involved. Given a limited number of sensors, one approach is to gather time histories at some locations only, then move the instruments to other locations and repeat the procedure. Otherwise, if the number of sensors is sufficient and the structure does not have a complicated geometry, it is usually sufficient to detect only the first principal modes. These two problems are well presented in the works of Balsamo [1], for the application to a simple system, and Jun [2], for the analysis of systems with a limited number of sensors. Once the system identification has been carried out, it is possible to access the actual system characteristics. A frequent practice is to create an updated FEM model and assess whether or not the structure fulfils the requested functions. The objective of this work is to present a general methodology for analyzing large structures using a limited amount of instrumentation while, at the same time, obtaining the most information possible about the identified structure without resorting to methods of difficult interpretation. A general framework for the state-space identification procedure via the OKID/ERA algorithm is developed and implemented in Matlab.
Then, some simple examples are proposed to highlight the principal characteristics and advantages of this methodology. A new algebraic manipulation for the productive use of substructuring results is developed and implemented.
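The thesis' framework is implemented in Matlab; the following illustrative Python/NumPy sketch shows the ERA step at the core of such a framework: a Hankel matrix of impulse-response Markov parameters is truncated by SVD to realize a discrete state-space model whose eigenvalues yield modal frequencies and damping. All sizes and signals are illustrative:

```python
import numpy as np

def era(markov, n_states, rows=20, cols=20):
    """Eigensystem Realization Algorithm on single-output Markov
    parameters markov[k] (impulse-response samples): realize the
    discrete state matrix A from shifted Hankel matrices."""
    H0 = np.array([[markov[i + j] for j in range(cols)] for i in range(rows)])
    H1 = np.array([[markov[i + j + 1] for j in range(cols)] for i in range(rows)])
    U, s, Vt = np.linalg.svd(H0)
    Ur, sr, Vr = U[:, :n_states], np.sqrt(s[:n_states]), Vt[:n_states]
    return np.diag(1 / sr) @ Ur.T @ H1 @ Vr.T @ np.diag(1 / sr)

# Toy data: impulse response of a single mode (f = 2 Hz, zeta = 0.02).
dt, f, zeta = 0.01, 2.0, 0.02
w = 2 * np.pi * f
t = np.arange(200) * dt
h = np.exp(-zeta * w * t) * np.sin(w * np.sqrt(1 - zeta**2) * t)

A = era(h, n_states=2)
lam = np.log(np.linalg.eigvals(A)) / dt          # continuous-time poles
print("identified frequency [Hz]:", (np.abs(lam) / (2 * np.pi)).round(3))
print("identified damping ratio:", (-lam.real / np.abs(lam)).round(4))
```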
Abstract:
The aim of this work was to show that refined analyses of background, low-magnitude seismicity make it possible to delineate the main active faults and to accurately estimate the directions of the regional tectonic stress that characterize the Southern Apennines (Italy), a structurally complex area with high seismic potential. Thanks to the presence in the area of an integrated, dense and wide dynamic network, it was possible to analyze a high-quality microearthquake data set consisting of 1312 events that occurred from August 2005 to April 2011, integrating the data recorded at 42 seismic stations of various networks. The refined seismicity locations and focal mechanisms clearly delineate a system of NW-SE-striking normal faults along the Apenninic chain and an approximately E-W-oriented strike-slip fault transversely cutting the belt. The seismicity along the chain does not occur on a single fault but within a volume, delimited by the faults activated during the 1980 M 6.9 Irpinia earthquake, on predominantly sub-parallel normal faults. Results show that the recent low-magnitude earthquakes belong to the background seismicity and are likely generated along the major fault segments activated during the most recent strong earthquakes, suggesting that these segments are still active today, thirty years after the mainshocks. In this sense, this study gives a new perspective on the use of high-quality records of low-magnitude background seismicity for the identification and characterization of active fault systems. The stress tensor inversion provides two equivalent models, with different geological interpretations, that explain microearthquake generation along both the NW-SE-striking normal faults and the E-W-oriented fault with dominant dextral strike-slip motion. We suggest that the NW-SE-oriented Africa-Eurasia convergence acts in the background of all these structures, playing a primary and unifying role in the seismotectonics of the whole region.
Abstract:
Background: The hygrohalophytic genus Salicornia is represented in Central and Western Europe by four closely related, sympatrically occurring species: the two tetraploid species S. procumbens and S. stricta and the diploid species S. europaea and S. ramosissima. Although the species are difficult to distinguish morphologically, the morphological variation is nevertheless high enough that several distinct species/morphotypes can be recognized. With respect to their distribution in the highly dynamic salt-marsh habitat, the different species/morphotypes are found in overlapping parts of the habitat. Their relatively predictable occurrence along an ecological gradient within their habitat, however, seems to argue for an ecological differentiation of the different species/morphotypes. Given the sympatric occurrence of these apparently ecologically and morphologically differentiated morphotypes, the question arises by which processes they may have originated (genetic and ecological differentiation), and also which processes maintain the lasting coexistence of the species (reproductive isolation mechanisms).
Objective: The aim of this work was to investigate the origin and diversification of the Central and Western European Salicornia species using molecular genetic, ecological and reproductive-biology methods.
Methods: On the basis of an AFLP fragment analysis of 89 accessions from Great Britain, France and Germany, molecular phylogenies were constructed and principal component and cluster analyses were performed. A reciprocal transplantation experiment was carried out to investigate the ecological differentiation and phenotypic plasticity of the four species/morphotypes. Various observations and experiments were conducted to investigate the reproductive isolation mechanisms of the species/morphotypes.
Results: The molecular analyses separated the two species groups (ploidy levels) but yielded neither a taxonomic nor a geographic signal within them. Accessions with identical morphology from the same population fell into different genetic clusters in the analyses, while identical morphotypes from different geographic regions partly grouped together. The transplantation experiment showed a clear ecological differentiation for the two tetraploid species S. procumbens and S. stricta: in S. procumbens in the form of reduced fitness and an accelerated phenology, in S. stricta only in the form of an altered phenology. With respect to plasticity, both tetraploid species showed a constant morphology. The two diploid taxa S. europaea and S. ramosissima showed neither a clear ecological differentiation nor a constant morphology. Regarding reproductive biology, it was confirmed that selfing is the principal mode of reproduction in all taxa. The tetraploid taxa showed a low degree of outcrossing, whereas in the diploid taxa morphological peculiarities lead to a very high degree of selfing.
Conclusions: The Salicornia species occurring in Central and Western Europe do not represent evolutionary units. Owing to their parallel origin and ecological differentiation, the two tetraploid taxa should be regarded as ecotypes.
Both ecotypes show a high dispersal potential and persist as inbred lines with a low proportion of outcrossing. The diploid taxa are neither ecologically differentiated nor morphologically stable and should therefore be regarded as a single, morphologically highly variable taxon consisting of numerous widespread inbred lines.
Abstract:
The major light-harvesting complex (LHCII) of the photosynthetic apparatus of higher plants is among the most abundant membrane proteins on Earth. Its crystal structure is known. The apoprotein can be overexpressed recombinantly in Escherichia coli and thus modified in many ways by molecular biology. In detergent solution, the denatured protein has the remarkable ability to organize itself spontaneously into functional protein-pigment complexes that are structurally almost identical to native LHCII. The folding process takes place in vitro on a time scale of seconds to minutes and depends on the binding of the cofactors chlorophyll a and b as well as various carotenoids.
These properties make LHCII particularly suitable for structural studies by electron paramagnetic resonance (EPR) spectroscopy. EPR requires site-specific spin labelling of LHCII, which was first optimized in this work. Including the contributions of others, a broad selection of more than 40 spin-labelled LHCII mutants was available, including an N-terminal "Cys walk". Neither the exchange of single amino acids required for this nor the attachment of the spin label impaired the function of LHCII. In addition, a protocol was developed for the preparation of heterogeneously spin-labelled LHCII trimers, i.e. trimers that each contain only one spin-labelled monomer.
Spin-labelled samples of detergent-solubilized LHCII were structurally analyzed using various EPR techniques. Measuring the water accessibility of individual amino acid positions by Electron Spin Echo Envelope Modulation (ESEEM) proved particularly informative. In combination with the established Double Electron-Electron Resonance (DEER) technique for detecting distances between two spin labels, the membrane-embedded core region of LHCII in solution was examined in detail and found to be structurally very similar to the crystal structure. Measurements of regions near the N-terminus that are not resolved crystallographically revealed the previously detected structural dynamics of this domain as a function of the oligomerization state. The new, still to be completed data set of distance distributions and ESEEM water accessibilities for monomeric and trimeric samples should in the near future allow very accurate modelling of the N-terminal domain of LHCII.
In a further part of this work, the folding of the LHCII apoprotein during LHCII assembly in vitro was investigated. Previous fluorescence spectroscopy studies had shown that the binding of chlorophyll a and b occurs in successive steps on time scales of less than one minute and of several minutes, respectively. Both the water accessibility of individual amino acid positions and spin-spin distances changed on similar time scales. The data suggest that the formation of the central transmembrane helix accompanies the faster chlorophyll a binding, whereas the superhelix formed by the two other transmembrane helices forms only in the slower step, together with chlorophyll b binding.
Abstract:
The energy released during a seismic crisis in volcanic areas is strictly related to the physical processes within the volcanic structure. In particular, Long Period (LP) seismicity, which seems to be related to the oscillation of a fluid-filled crack (Chouet, 1996; Chouet, 2003; McNutt, 2005), can precede or accompany an eruption. The present doctoral thesis focuses on the study of the LP seismicity recorded at the Campi Flegrei volcano (Campania, Italy) during the October 2006 crisis. Campi Flegrei is an active caldera; the combination of an active magmatic system and a densely populated area makes Campi Flegrei a critical volcano. The source dynamics of LP seismicity are thought to be very different from those of other kinds of seismicity (tectonic or volcano-tectonic): LP events are characterized by a time-sustained source and a low frequency content. This implies that the duration magnitude, which is commonly used for VT events and sometimes for LPs as well, is unsuited to LP magnitude evaluation. The main goal of this doctoral work was to develop a method for determining the magnitude of LP seismicity. It is based on comparing the energies of VT and LP events and linking energy to the VT moment magnitude, so that the magnitude of an LP event is defined as the moment magnitude of a VT event with the same energy. We applied this method to the LP data set recorded at the Campi Flegrei caldera in 2006, to an LP data set recorded at Colima volcano in 2005-2006, and to an event recorded at Etna volcano. By testing the method on many waveforms recorded at different volcanoes, we verified its ease of application and consequently its usefulness in the routine and quasi-real-time work of a volcanological observatory.
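A minimal sketch of the energy-equivalence idea, assuming radiated energy proportional to the integrated squared ground velocity and the standard Gutenberg-Richter energy-magnitude relation log10 E = 1.5 M + 4.8 (E in joules); the constant k and the toy waveform are purely illustrative:

```python
import numpy as np

def radiated_energy(velocity, dt, k):
    """Energy proxy: k * integral of squared ground velocity; k lumps
    density, wave speed and geometrical spreading (illustrative)."""
    return k * np.sum(velocity ** 2) * dt

def energy_magnitude(E):
    """Invert log10(E) = 1.5 M + 4.8 (Gutenberg-Richter, E in joules)."""
    return (np.log10(E) - 4.8) / 1.5

# Toy LP-like waveform: long-duration, low-frequency (2 Hz) signal.
dt = 0.01
t = np.arange(0, 40, dt)
v = 1e-4 * np.exp(-t / 15) * np.sin(2 * np.pi * 2.0 * t)  # m/s

E = radiated_energy(v, dt, k=5e12)   # k chosen only for illustration
print("energy-equivalent magnitude:", round(energy_magnitude(E), 2))
```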
Abstract:
In this work we discuss a project started by the Emilia-Romagna Regional Government regarding the management of public transport. In particular, we perform a data mining analysis on the data set of this project. After introducing the Weka software used for our analysis, we present the most useful data mining techniques and algorithms, and we show how these results can be used to violate the privacy of the public transport operators themselves. At the end, although it is somewhat off topic for this work, we also spend a few words on how this kind of attack can be prevented.
Abstract:
Data deduplication describes a class of approaches that reduce the storage capacity needed to store data or the amount of data that has to be transferred over a network. These approaches detect coarse-grained redundancies within a data set, e.g. a file system, and remove them.

One of the most important applications of data deduplication is backup storage systems, where these approaches are able to reduce the storage requirements to a small fraction of the logical backup data size. This thesis introduces multiple new extensions of so-called fingerprinting-based data deduplication. It starts with the presentation of a novel system design which allows using a cluster of servers to perform exact data deduplication with small chunks in a scalable way.

Afterwards, a combination of compression approaches for an important, but often overlooked, data structure in data deduplication systems, so-called block and file recipes, is introduced. Using these compression approaches, which exploit unique properties of data deduplication systems, the size of these recipes can be reduced by more than 92% in all investigated data sets. As file recipes can occupy a significant fraction of the overall storage capacity of data deduplication systems, the compression enables significant savings.

A technique to increase the write throughput of data deduplication systems, based on the aforementioned block and file recipes, is introduced next. The novel Block Locality Caching (BLC) uses properties of block and file recipes to overcome the chunk lookup disk bottleneck of data deduplication systems, which limits either their scalability or their throughput. The presented BLC overcomes the disk bottleneck more efficiently than existing approaches, and it is shown to be less prone to aging effects.

Finally, it is investigated whether large HPC storage systems exhibit redundancies that can be found by fingerprinting-based data deduplication. Over 3 PB of HPC storage data from different data sets have been analyzed. In most data sets, between 20% and 30% of the data can be classified as redundant. According to these results, future work should further investigate how data deduplication can be integrated into future HPC storage systems.

This thesis presents important novel work in different areas of data deduplication research.
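A minimal sketch of fingerprinting-based deduplication with fixed-size chunks and in-memory structures (production systems use content-defined chunking, an on-disk chunk index and persistent recipes); names are illustrative:

```python
import hashlib
import os

def deduplicate(data, store, chunk_size=4096):
    """Split data into fixed-size chunks, fingerprint each with SHA-256,
    and store every unique chunk once; the returned 'file recipe' is the
    ordered list of fingerprints needed to restore the data."""
    recipe = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        fp = hashlib.sha256(chunk).digest()
        if fp not in store:      # the chunk-lookup step (the disk bottleneck
            store[fp] = chunk    # that Block Locality Caching addresses)
        recipe.append(fp)
    return recipe

def restore(store, recipe):
    return b"".join(store[fp] for fp in recipe)

# Two backup generations sharing most content: the second generation
# adds almost nothing to the chunk store.
store = {}
gen0 = os.urandom(100 * 4096)            # first "backup", 100 chunks
gen1 = gen0 + os.urandom(4096)           # next generation, small change
r0 = deduplicate(gen0, store)
n_after_gen0 = len(store)
r1 = deduplicate(gen1, store)
print("unique chunks after gen0:", n_after_gen0, "after gen1:", len(store))
assert restore(store, r1) == gen1
```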
Abstract:
When estimating the effect of treatment on HIV using data from observational studies, standard methods may produce biased estimates due to the presence of time-dependent confounders. Such confounding can be present when a covariate, itself affected by past exposure, is a predictor of both future exposure and the outcome. One example is the CD4 cell count, which is a marker of disease progression in HIV patients but also a marker for treatment initiation, and is itself influenced by treatment. Fitting a marginal structural model (MSM) using inverse probability weights is one way to adjust appropriately for this type of confounding. In this paper we study a simple and intuitive approach to estimating similar treatment effects, using observational data to mimic several randomized controlled trials. Each 'trial' is constructed from the individuals starting treatment in a certain time interval. An overall effect estimate for all such trials is found using composite likelihood inference. The method offers an alternative to the use of inverse probability of treatment weights, which is unstable in certain situations. The estimated parameter is not identical to that of an MSM; it is conditional on covariate values at the start of each mimicked trial. This allows the study of questions that are not easily addressed by fitting an MSM. The analysis can be performed as a stratified weighted Cox analysis on the joint data set of all the constructed trials, where each trial is one stratum. The model is applied to data from the Swiss HIV Cohort Study.
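A minimal sketch of the mimicked-trials analysis, assuming a subject-level data frame and the lifelines library; the file name, column names, weight column and interval grid are illustrative, and robust standard errors stand in for the paper's composite-likelihood variance:

```python
import pandas as pd
from lifelines import CoxPHFitter

def build_trials(df, intervals):
    """Stack one mimicked 'trial' per time interval: each trial compares
    subjects starting treatment in that interval with subjects still
    untreated, with follow-up measured from the interval start."""
    trials = []
    for k, (t0, t1) in enumerate(intervals):
        at_risk = df[df["treat_start"].isna() | (df["treat_start"] >= t0)].copy()
        at_risk["treated"] = (
            at_risk["treat_start"].between(t0, t1, inclusive="left").astype(int)
        )
        at_risk["time"] = at_risk["event_time"] - t0
        at_risk = at_risk[at_risk["time"] > 0]
        at_risk["trial"] = k            # each trial becomes one stratum
        trials.append(at_risk)
    return pd.concat(trials, ignore_index=True)

# Hypothetical input: one row per subject with columns treat_start
# (NaN if never treated), event_time, event, cd4_baseline, and weight w.
df = pd.read_csv("hiv_cohort.csv")      # illustrative file name
stacked = build_trials(df, intervals=[(0, 6), (6, 12), (12, 18)])

cph = CoxPHFitter()
cph.fit(stacked[["time", "event", "treated", "cd4_baseline", "w", "trial"]],
        duration_col="time", event_col="event",
        weights_col="w", strata=["trial"], robust=True)
cph.print_summary()
```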
Abstract:
Despite the association with lung growth and long-term respiratory morbidity, there is a lack of normative lung function data for unsedated infants conforming to the latest European Respiratory Society/American Thoracic Society standards. Lung function was measured using an ultrasonic flow meter in 342 unsedated, healthy, term-born infants at a mean ± SD age of 5.1 ± 0.8 weeks during natural sleep, according to the latest standards. Tidal breathing flow-volume loops (TBFVL) and exhaled nitric oxide (eNO) measurements were obtained from 100 regular breaths. We aimed for three acceptable measurements for multiple-breath washout and 5-10 acceptable interruption resistance (R(int)) measurements. Acceptable measurements were obtained in up to 285 infants, with high variability. Mean values were 7.48 mL·kg⁻¹ (95% limits of agreement 4.95-10.0 mL·kg⁻¹) for tidal volume, 14.3 ppb (2.6-26.1 ppb) for eNO, 23.9 mL·kg⁻¹ (16.0-31.8 mL·kg⁻¹) for functional residual capacity, 6.75 (5.63-7.87) for lung clearance index and 3.78 kPa·s·L⁻¹ (1.14-6.42 kPa·s·L⁻¹) for R(int). In males, TBFVL outcomes were associated with anthropometric parameters; in females, with maternal smoking during pregnancy, maternal asthma and Caesarean section. This large normative data set in unsedated infants offers reference values for future research, particularly for studies where sedation may put infants at risk. Furthermore, it highlights the impact of maternal and environmental risk factors on neonatal lung function.
Experimental Evaluation of the Influence of Human-Structure Interaction for Vibration Serviceability
Abstract:
The effects of human-structure interaction on the dynamic performance of occupied structures have long been observed. Including the effects of human-structure interaction is important to ensure that the dynamic response of a structure is not overestimated. Previous observations, both in service and in the laboratory, have yielded results indicating that the effects depend on the natural frequency of the structure, the posture of the occupants, and the mass ratio of the occupants to the structure. These results are noteworthy, but are limited in their application because the data are sparse and pertain only to the specific set of characteristics identified in a given study. To examine these characteristics simultaneously and consistently, an experimental test structure was designed with variable properties to replicate a variety of configurations within a controlled setting, focusing on the effects of passive occupants. Experimental modal analysis techniques were applied to both the empty and occupied conditions of the structure, and the dynamic properties associated with each condition were compared. Results similar to previous investigations were observed, including both an increase and a decrease in the natural frequency of the occupied structure with respect to the empty structure, as well as the identification of a second mode of vibration. The damping of the combined system was higher for all configurations. Overall, this study provides a broad data set representing a wide array of configurations. The experimental results of this study were used to assess current recommendations for the dynamic properties of a crowd used to analytically predict the effects of human-structure interaction. The experimental results were also used to select a set of properties for passive, standing occupants and to develop a new model that can more accurately represent the behavior of the human-structure system as experimentally measured in this study.
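A minimal sketch of the empty-versus-occupied comparison, assuming single-mode free-decay records: frequency is read from the FFT peak and damping from the logarithmic decrement, a simplified stand-in for the full experimental modal analysis used in the study. Signal parameters are illustrative:

```python
import numpy as np

def identify_mode(x, fs):
    """Natural frequency from the FFT peak and damping ratio from the
    logarithmic decrement of successive positive peaks of a free decay."""
    spec = np.abs(np.fft.rfft(x))
    f = np.fft.rfftfreq(len(x), 1 / fs)[np.argmax(spec[1:]) + 1]
    peaks = [i for i in range(1, len(x) - 1)
             if x[i] > x[i - 1] and x[i] > x[i + 1] and x[i] > 0]
    delta = np.log(x[peaks[0]] / x[peaks[5]]) / 5   # over 5 cycles
    zeta = delta / np.sqrt(4 * np.pi ** 2 + delta ** 2)
    return f, zeta

fs = 200
t = np.arange(0, 10, 1 / fs)

def free_decay(f, zeta):
    w = 2 * np.pi * f
    return np.exp(-zeta * w * t) * np.cos(w * np.sqrt(1 - zeta ** 2) * t)

# Illustrative records: occupants typically shift frequency and add damping.
records = {"empty": free_decay(6.0, 0.01), "occupied": free_decay(5.7, 0.04)}
for name, sig in records.items():
    f, z = identify_mode(sig, fs)
    print(f"{name}: f = {f:.2f} Hz, zeta = {z:.3f}")
```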
Abstract:
This paper proposes Poisson log-linear multilevel models to investigate population variability in sleep state transition rates. We specifically propose a Bayesian Poisson regression model that is more flexible, more scalable to larger studies, and more easily fitted than other attempts in the literature. We further use hierarchical random effects to account for pairings of individuals and for repeated measures within those individuals, as comparing diseased to non-diseased subjects while minimizing bias is of epidemiologic importance. We estimate essentially non-parametric piecewise constant hazards and smooth them, and we allow for time-varying covariates and segment-of-the-night comparisons. The Bayesian Poisson regression is justified through a re-derivation of a classical algebraic likelihood equivalence between Poisson regression with a log(time) offset and survival regression assuming piecewise constant hazards. This relationship allows us to synthesize two methods currently used to analyze sleep transition phenomena: stratified multi-state proportional hazards models and log-linear models with GEE for transition counts. An example data set from the Sleep Heart Health Study is analyzed.
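A short sketch of the likelihood equivalence the paper re-derives, in illustrative notation: for a subject at risk for time $t_j$ in hazard segment $j$ with constant hazard $\lambda_j$ and event indicator $d_j \in \{0, 1\}$,

```latex
% Survival contribution under a piecewise constant hazard:
L_j = \lambda_j^{\,d_j} e^{-\lambda_j t_j},
\qquad
\ell_j = d_j \log \lambda_j - \lambda_j t_j .

% Poisson log-likelihood for d_j ~ Poisson(mu_j) with mu_j = lambda_j t_j:
\ell_j^{\mathrm{Pois}}
  = d_j \log(\lambda_j t_j) - \lambda_j t_j - \log d_j!
  = \ell_j + \underbrace{d_j \log t_j - \log d_j!}_{\text{free of } \lambda_j} .
```

Since the two log-likelihoods differ only by terms not involving $\lambda_j$, fitting transition counts with a Poisson log-linear model using $\log t_j$ as an offset ($\log \mu_j = \log \lambda_j + \log t_j$) reproduces the piecewise-exponential survival fit, which is the bridge between the stratified proportional hazards and GEE log-linear approaches mentioned above.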