891 resultados para Data Storage Solutions
Resumo:
In many application domains data can be naturally represented as graphs. When the application of analytical solutions for a given problem is unfeasible, machine learning techniques could be a viable way to solve the problem. Classical machine learning techniques are defined for data represented in a vectorial form. Recently some of them have been extended to deal directly with structured data. Among those techniques, kernel methods have shown promising results both from the computational complexity and the predictive performance point of view. Kernel methods allow to avoid an explicit mapping in a vectorial form relying on kernel functions, which informally are functions calculating a similarity measure between two entities. However, the definition of good kernels for graphs is a challenging problem because of the difficulty to find a good tradeoff between computational complexity and expressiveness. Another problem we face is learning on data streams, where a potentially unbounded sequence of data is generated by some sources. There are three main contributions in this thesis. The first contribution is the definition of a new family of kernels for graphs based on Directed Acyclic Graphs (DAGs). We analyzed two kernels from this family, achieving state-of-the-art results from both the computational and the classification point of view on real-world datasets. The second contribution consists in making the application of learning algorithms for streams of graphs feasible. Moreover,we defined a principled way for the memory management. The third contribution is the application of machine learning techniques for structured data to non-coding RNA function prediction. In this setting, the secondary structure is thought to carry relevant information. However, existing methods considering the secondary structure have prohibitively high computational complexity. We propose to apply kernel methods on this domain, obtaining state-of-the-art results.
Resumo:
Beside the traditional paradigm of "centralized" power generation, a new concept of "distributed" generation is emerging, in which the same user becomes pro-sumer. During this transition, the Energy Storage Systems (ESS) can provide multiple services and features, which are necessary for a higher quality of the electrical system and for the optimization of non-programmable Renewable Energy Source (RES) power plants. A ESS prototype was designed, developed and integrated into a renewable energy production system in order to create a smart microgrid and consequently manage in an efficient and intelligent way the energy flow as a function of the power demand. The produced energy can be introduced into the grid, supplied to the load directly or stored in batteries. The microgrid is composed by a 7 kW wind turbine (WT) and a 17 kW photovoltaic (PV) plant are part of. The load is given by electrical utilities of a cheese factory. The ESS is composed by the following two subsystems, a Battery Energy Storage System (BESS) and a Power Control System (PCS). With the aim of sizing the ESS, a Remote Grid Analyzer (RGA) was designed, realized and connected to the wind turbine, photovoltaic plant and the switchboard. Afterwards, different electrochemical storage technologies were studied, and taking into account the load requirements present in the cheese factory, the most suitable solution was identified in the high temperatures salt Na-NiCl2 battery technology. The data acquisition from all electrical utilities provided a detailed load analysis, indicating the optimal storage size equal to a 30 kW battery system. Moreover a container was designed and realized to locate the BESS and PCS, meeting all the requirements and safety conditions. Furthermore, a smart control system was implemented in order to handle the different applications of the ESS, such as peak shaving or load levelling.
Resumo:
Data deduplication describes a class of approaches that reduce the storage capacity needed to store data or the amount of data that has to be transferred over a network. These approaches detect coarse-grained redundancies within a data set, e.g. a file system, and remove them.rnrnOne of the most important applications of data deduplication are backup storage systems where these approaches are able to reduce the storage requirements to a small fraction of the logical backup data size.rnThis thesis introduces multiple new extensions of so-called fingerprinting-based data deduplication. It starts with the presentation of a novel system design, which allows using a cluster of servers to perform exact data deduplication with small chunks in a scalable way.rnrnAfterwards, a combination of compression approaches for an important, but often over- looked, data structure in data deduplication systems, so called block and file recipes, is introduced. Using these compression approaches that exploit unique properties of data deduplication systems, the size of these recipes can be reduced by more than 92% in all investigated data sets. As file recipes can occupy a significant fraction of the overall storage capacity of data deduplication systems, the compression enables significant savings.rnrnA technique to increase the write throughput of data deduplication systems, based on the aforementioned block and file recipes, is introduced next. The novel Block Locality Caching (BLC) uses properties of block and file recipes to overcome the chunk lookup disk bottleneck of data deduplication systems. This chunk lookup disk bottleneck either limits the scalability or the throughput of data deduplication systems. The presented BLC overcomes the disk bottleneck more efficiently than existing approaches. Furthermore, it is shown that it is less prone to aging effects.rnrnFinally, it is investigated if large HPC storage systems inhibit redundancies that can be found by fingerprinting-based data deduplication. Over 3 PB of HPC storage data from different data sets have been analyzed. In most data sets, between 20 and 30% of the data can be classified as redundant. According to these results, future work in HPC storage systems should further investigate how data deduplication can be integrated into future HPC storage systems.rnrnThis thesis presents important novel work in different area of data deduplication re- search.
Resumo:
Virgin olive oil(VOO) is a product characterized by high economic and nutritional values, because of its superior sensory characteristics and minor compounds (phenols and tocopherols) contents. Since the original quality of VOO may change during its storage, this study aimed to investigate the influence of different storage and shipment conditions on the quality of VOO, by studying different solutions such as filtration, dark storage and shipment inside insulated containers to protect it. Different analytical techniques were used to follow-up the quality changes during virgin olive oil storage and simulated shipments, in terms of basic quality parameters, sensory analysis and evaluation of minor components (phenolic compounds, diglycerides, volatile compounds). Four main research streams were presented in this PhD thesis: The results obtained from the first experimental section revealed that the application of filtration and/or clarification can decrease the unavoidable quality loss of the oil samples during storage, in comparison with unfiltered oil samples. The second section indicated that the virgin olive oil freshness, evaluated by diglycerides content, was mainly affected by the storage time and temperature. The third section revealed that fluctuation in temperature during storage may adversely affect the virgin olive oil quality, in terms of hydrolytic rancidity and oxidation quality. The fourth section showed that virgin olive oil shipped inside insulated containers showed lower hydrolytic and oxidation degradation than those without insulation cover. Overall, this PhD thesis highlighted that application of adequate treatment, such as filtration or clarification, in addition to a good protection against other external variables, such as temperature and light, will improve the stability of virgin olive oil during storage.
Resumo:
Nella fisica delle particelle, onde poter effettuare analisi dati, è necessario disporre di una grande capacità di calcolo e di storage. LHC Computing Grid è una infrastruttura di calcolo su scala globale e al tempo stesso un insieme di servizi, sviluppati da una grande comunità di fisici e informatici, distribuita in centri di calcolo sparsi in tutto il mondo. Questa infrastruttura ha dimostrato il suo valore per quanto riguarda l'analisi dei dati raccolti durante il Run-1 di LHC, svolgendo un ruolo fondamentale nella scoperta del bosone di Higgs. Oggi il Cloud computing sta emergendo come un nuovo paradigma di calcolo per accedere a grandi quantità di risorse condivise da numerose comunità scientifiche. Date le specifiche tecniche necessarie per il Run-2 (e successivi) di LHC, la comunità scientifica è interessata a contribuire allo sviluppo di tecnologie Cloud e verificare se queste possano fornire un approccio complementare, oppure anche costituire una valida alternativa, alle soluzioni tecnologiche esistenti. Lo scopo di questa tesi è di testare un'infrastruttura Cloud e confrontare le sue prestazioni alla LHC Computing Grid. Il Capitolo 1 contiene un resoconto generale del Modello Standard. Nel Capitolo 2 si descrive l'acceleratore LHC e gli esperimenti che operano a tale acceleratore, con particolare attenzione all’esperimento CMS. Nel Capitolo 3 viene trattato il Computing nella fisica delle alte energie e vengono esaminati i paradigmi Grid e Cloud. Il Capitolo 4, ultimo del presente elaborato, riporta i risultati del mio lavoro inerente l'analisi comparata delle prestazioni di Grid e Cloud.
Resumo:
Bandlaufwerke waren bisher die vorherrschende Technologie, um die anfallenden Datenmengen in Archivsystemen zu speichern. Mit Zugriffsmustern, die immer aktiver werden, und Speichermedien wie Festplatten die kostenmäßig aufholen, muss die Architektur vor Speichersystemen zur Archivierung neu überdacht werden. Zuverlässigkeit, Integrität und Haltbarkeit sind die Haupteigenschaften der digitalen Archivierung. Allerdings nimmt auch die Zugriffsgeschwindigkeit einen erhöhten Stellenwert ein, wenn aktive Archive ihre gesamten Inhalte für den direkten Zugriff bereitstellen. Ein band-basiertes System kann die hierfür benötigte Parallelität, Latenz und Durchsatz nicht liefern, was in der Regel durch festplattenbasierte Systeme als Zwischenspeicher kompensiert wird.rnIn dieser Arbeit untersuchen wir die Herausforderungen und Möglichkeiten ein festplattenbasiertes Speichersystem zu entwickeln, das auf eine hohe Zuverlässigkeit und Energieeffizienz zielt und das sich sowohl für aktive als auch für kalte Archivumgebungen eignet. Zuerst analysieren wir die Speichersysteme und Zugriffsmuster eines großen digitalen Archivs und präsentieren damit ein mögliches Einsatzgebiet für unsere Architektur. Daraufhin stellen wir Mechanismen vor um die Zuverlässigkeit einer einzelnen Festplatte zu verbessern und präsentieren sowie evaluieren einen neuen, energieeffizienten, zwei- dimensionalen RAID Ansatz der für „Schreibe ein Mal, lese mehrfach“ Zugriffe optimiert ist. Letztlich stellen wir Protokollierungs- und Zwischenspeichermechanismen vor, die die zugrundeliegenden Ziele unterstützen und evaluieren das RAID System in einer Dateisystemumgebung.
Resumo:
In vielen Industriezweigen, zum Beispiel in der Automobilindustrie, werden Digitale Versuchsmodelle (Digital MockUps) eingesetzt, um die Konstruktion und die Funktion eines Produkts am virtuellen Prototypen zu überprüfen. Ein Anwendungsfall ist dabei die Überprüfung von Sicherheitsabständen einzelner Bauteile, die sogenannte Abstandsanalyse. Ingenieure ermitteln dabei für bestimmte Bauteile, ob diese in ihrer Ruhelage sowie während einer Bewegung einen vorgegeben Sicherheitsabstand zu den umgebenden Bauteilen einhalten. Unterschreiten Bauteile den Sicherheitsabstand, so muss deren Form oder Lage verändert werden. Dazu ist es wichtig, die Bereiche der Bauteile, welche den Sicherhabstand verletzen, genau zu kennen. rnrnIn dieser Arbeit präsentieren wir eine Lösung zur Echtzeitberechnung aller den Sicherheitsabstand unterschreitenden Bereiche zwischen zwei geometrischen Objekten. Die Objekte sind dabei jeweils als Menge von Primitiven (z.B. Dreiecken) gegeben. Für jeden Zeitpunkt, in dem eine Transformation auf eines der Objekte angewendet wird, berechnen wir die Menge aller den Sicherheitsabstand unterschreitenden Primitive und bezeichnen diese als die Menge aller toleranzverletzenden Primitive. Wir präsentieren in dieser Arbeit eine ganzheitliche Lösung, welche sich in die folgenden drei großen Themengebiete unterteilen lässt.rnrnIm ersten Teil dieser Arbeit untersuchen wir Algorithmen, die für zwei Dreiecke überprüfen, ob diese toleranzverletzend sind. Hierfür präsentieren wir verschiedene Ansätze für Dreiecks-Dreiecks Toleranztests und zeigen, dass spezielle Toleranztests deutlich performanter sind als bisher verwendete Abstandsberechnungen. Im Fokus unserer Arbeit steht dabei die Entwicklung eines neuartigen Toleranztests, welcher im Dualraum arbeitet. In all unseren Benchmarks zur Berechnung aller toleranzverletzenden Primitive beweist sich unser Ansatz im dualen Raum immer als der Performanteste.rnrnDer zweite Teil dieser Arbeit befasst sich mit Datenstrukturen und Algorithmen zur Echtzeitberechnung aller toleranzverletzenden Primitive zwischen zwei geometrischen Objekten. Wir entwickeln eine kombinierte Datenstruktur, die sich aus einer flachen hierarchischen Datenstruktur und mehreren Uniform Grids zusammensetzt. Um effiziente Laufzeiten zu gewährleisten ist es vor allem wichtig, den geforderten Sicherheitsabstand sinnvoll im Design der Datenstrukturen und der Anfragealgorithmen zu beachten. Wir präsentieren hierzu Lösungen, die die Menge der zu testenden Paare von Primitiven schnell bestimmen. Darüber hinaus entwickeln wir Strategien, wie Primitive als toleranzverletzend erkannt werden können, ohne einen aufwändigen Primitiv-Primitiv Toleranztest zu berechnen. In unseren Benchmarks zeigen wir, dass wir mit unseren Lösungen in der Lage sind, in Echtzeit alle toleranzverletzenden Primitive zwischen zwei komplexen geometrischen Objekten, bestehend aus jeweils vielen hunderttausend Primitiven, zu berechnen. rnrnIm dritten Teil präsentieren wir eine neuartige, speicheroptimierte Datenstruktur zur Verwaltung der Zellinhalte der zuvor verwendeten Uniform Grids. Wir bezeichnen diese Datenstruktur als Shrubs. Bisherige Ansätze zur Speicheroptimierung von Uniform Grids beziehen sich vor allem auf Hashing Methoden. Diese reduzieren aber nicht den Speicherverbrauch der Zellinhalte. In unserem Anwendungsfall haben benachbarte Zellen oft ähnliche Inhalte. Unser Ansatz ist in der Lage, den Speicherbedarf der Zellinhalte eines Uniform Grids, basierend auf den redundanten Zellinhalten, verlustlos auf ein fünftel der bisherigen Größe zu komprimieren und zur Laufzeit zu dekomprimieren.rnrnAbschießend zeigen wir, wie unsere Lösung zur Berechnung aller toleranzverletzenden Primitive Anwendung in der Praxis finden kann. Neben der reinen Abstandsanalyse zeigen wir Anwendungen für verschiedene Problemstellungen der Pfadplanung.
Resumo:
L’esperimento CMS a LHC ha raccolto ingenti moli di dati durante Run-1, e sta sfruttando il periodo di shutdown (LS1) per evolvere il proprio sistema di calcolo. Tra i possibili miglioramenti al sistema, emergono ampi margini di ottimizzazione nell’uso dello storage ai centri di calcolo di livello Tier-2, che rappresentano - in Worldwide LHC Computing Grid (WLCG)- il fulcro delle risorse dedicate all’analisi distribuita su Grid. In questa tesi viene affrontato uno studio della popolarità dei dati di CMS nell’analisi distribuita su Grid ai Tier-2. Obiettivo del lavoro è dotare il sistema di calcolo di CMS di un sistema per valutare sistematicamente l’ammontare di spazio disco scritto ma non acceduto ai centri Tier-2, contribuendo alla costruzione di un sistema evoluto di data management dinamico che sappia adattarsi elasticamente alle diversi condizioni operative - rimuovendo repliche dei dati non necessarie o aggiungendo repliche dei dati più “popolari” - e dunque, in ultima analisi, che possa aumentare l’“analysis throughput” complessivo. Il Capitolo 1 fornisce una panoramica dell’esperimento CMS a LHC. Il Capitolo 2 descrive il CMS Computing Model nelle sue generalità, focalizzando la sua attenzione principalmente sul data management e sulle infrastrutture ad esso connesse. Il Capitolo 3 descrive il CMS Popularity Service, fornendo una visione d’insieme sui servizi di data popularity già presenti in CMS prima dell’inizio di questo lavoro. Il Capitolo 4 descrive l’architettura del toolkit sviluppato per questa tesi, ponendo le basi per il Capitolo successivo. Il Capitolo 5 presenta e discute gli studi di data popularity condotti sui dati raccolti attraverso l’infrastruttura precedentemente sviluppata. L’appendice A raccoglie due esempi di codice creato per gestire il toolkit attra- verso cui si raccolgono ed elaborano i dati.
Resumo:
Nowadays, data handling and data analysis in High Energy Physics requires a vast amount of computational power and storage. In particular, the world-wide LHC Com- puting Grid (LCG), an infrastructure and pool of services developed and deployed by a ample community of physicists and computer scientists, has demonstrated to be a game changer in the efficiency of data analyses during Run-I at the LHC, playing a crucial role in the Higgs boson discovery. Recently, the Cloud computing paradigm is emerging and reaching a considerable adoption level by many different scientific organizations and not only. Cloud allows to access and utilize not-owned large computing resources shared among many scientific communities. Considering the challenging requirements of LHC physics in Run-II and beyond, the LHC computing community is interested in exploring Clouds and see whether they can provide a complementary approach - or even a valid alternative - to the existing technological solutions based on Grid. In the LHC community, several experiments have been adopting Cloud approaches, and in particular the experience of the CMS experiment is of relevance to this thesis. The LHC Run-II has just started, and Cloud-based solutions are already in production for CMS. However, other approaches of Cloud usage are being thought of and are at the prototype level, as the work done in this thesis. This effort is of paramount importance to be able to equip CMS with the capability to elastically and flexibly access and utilize the computing resources needed to face the challenges of Run-III and Run-IV. The main purpose of this thesis is to present forefront Cloud approaches that allow the CMS experiment to extend to on-demand resources dynamically allocated as needed. Moreover, a direct access to Cloud resources is presented as suitable use case to face up with the CMS experiment needs. Chapter 1 presents an overview of High Energy Physics at the LHC and of the CMS experience in Run-I, as well as preparation for Run-II. Chapter 2 describes the current CMS Computing Model, and Chapter 3 provides Cloud approaches pursued and used within the CMS Collaboration. Chapter 4 and Chapter 5 discuss the original and forefront work done in this thesis to develop and test working prototypes of elastic extensions of CMS computing resources on Clouds, and HEP Computing “as a Service”. The impact of such work on a benchmark CMS physics use-cases is also demonstrated.
Resumo:
Big data è il termine usato per descrivere una raccolta di dati così estesa in termini di volume,velocità e varietà da richiedere tecnologie e metodi analitici specifici per l'estrazione di valori significativi. Molti sistemi sono sempre più costituiti e caratterizzati da enormi moli di dati da gestire,originati da sorgenti altamente eterogenee e con formati altamente differenziati,oltre a qualità dei dati estremamente eterogenei. Un altro requisito in questi sistemi potrebbe essere il fattore temporale: sempre più sistemi hanno bisogno di ricevere dati significativi dai Big Data il prima possibile,e sempre più spesso l’input da gestire è rappresentato da uno stream di informazioni continuo. In questo campo si inseriscono delle soluzioni specifiche per questi casi chiamati Online Stream Processing. L’obiettivo di questa tesi è di proporre un prototipo funzionante che elabori dati di Instant Coupon provenienti da diverse fonti con diversi formati e protocolli di informazioni e trasmissione e che memorizzi i dati elaborati in maniera efficiente per avere delle risposte in tempo reale. Le fonti di informazione possono essere di due tipologie: XMPP e Eddystone. Il sistema una volta ricevute le informazioni in ingresso, estrapola ed elabora codeste fino ad avere dati significativi che possono essere utilizzati da terze parti. Lo storage di questi dati è fatto su Apache Cassandra. Il problema più grosso che si è dovuto risolvere riguarda il fatto che Apache Storm non prevede il ribilanciamento delle risorse in maniera automatica, in questo caso specifico però la distribuzione dei clienti durante la giornata è molto varia e ricca di picchi. Il sistema interno di ribilanciamento sfrutta tecnologie innovative come le metriche e sulla base del throughput e della latenza esecutiva decide se aumentare/diminuire il numero di risorse o semplicemente non fare niente se le statistiche sono all’interno dei valori di soglia voluti.
Resumo:
Over the past twenty years, new technologies have required an increasing use of mathematical models in order to understand better the structural behavior: finite element method is the one mostly used. However, the reliability of this method applied to different situations has to be tried each time. Since it is not possible to completely model the reality, different hypothesis must be done: these are the main problems of FE modeling. The following work deals with this problem and tries to figure out a way to identify some of the unknown main parameters of a structure. This main research focuses on a particular path of study and development, but the same concepts can be applied to other objects of research. The main purpose of this work is the identification of unknown boundary conditions of a bridge pier using the data acquired experimentally with field tests and a FEM modal updating process. This work doesn’t want to be new, neither innovative. A lot of work has been done during the past years on this main problem and many solutions have been shown and published. This thesis just want to rework some of the main aspects of the structural optimization process, using a real structure as fitting model.
Resumo:
This paper presents an automated solution for precise detection of fiducial screws from three-dimensional (3D) Computerized Tomography (CT)/Digital Volume Tomography (DVT) data for image-guided ENT surgery. Unlike previously published solutions, we regard the detection of the fiducial screws from the CT/DVT volume data as a pose estimation problem. We thus developed a model-based solution. Starting from a user-supplied initialization, our solution detects the fiducial screws by iteratively matching a computer aided design (CAD) model of the fiducial screw to features extracted from the CT/DVT data. We validated our solution on one conventional CT dataset and on five DVT volume datasets, resulting in a total detection of 24 fiducial screws. Our experimental results indicate that the proposed solution achieves much higher reproducibility and precision than the manual detection. Further comparison shows that the proposed solution produces better results on the DVT dataset than on the conventional CT dataset.
Resumo:
AIM: The purpose of this study was to evaluate the activation of resin-modified glass ionomer restorative material (RMGI, Vitremer-3M-ESPE, A3) by halogen lamp (QTH) or light-emitting diode (LED) by Knoop microhardness (KHN) in two storage conditions: 24hrs and 6 months and in two depths (0 and 2 mm). MATERIALS AND METHODS: The specimens were randomly divided into 3 experimental groups (n=10) according to activation form and evaluated in depth after 24h and after 6 months of storage. Activation was performed with QTH for 40s (700 mW/cm2) and for 40 or 20 s with LED (1,200 mW/scm2). After 24 hrs and 6 months of storage at 37°C in relative humidity in lightproof container, the Knoop microhardness test was performed. Statistics Data were analysed by three-way ANOVA and Tukey post-tests (p<0.05). RESULTS: All evaluated factors showed significant differences (p<0.05). After 24 hrs there were no differences within the experimental groups. KHN at 0 mm was significantly higher than 2 mm. After 6 months, there was an increase of microhardness values for all groups, being the ones activated by LED higher than the ones activated by QTH. CONCLUSION: Light-activation with LED positively influenced the KHN for RMGI evaluated after 6 months.
Resumo:
Animal coloration often serves as a signal to others that may communicate traits about the individual such as toxicity, status, or quality. Colorful ornaments in many animals are often honest signals of quality assessed by mates, and different colors may beproduced by different biochemical pigments. Investigations of the mechanisms responsible for variation in color expression among birds are best when including a geographically and temporally broad sample. In order to obtain such a sample, studies such as this often use museum specimens; however, in order for museum specimens toserve as an accurate replacement, they must accurately represent living birds, or we must understand the ways in which they differ. In this thesis, I investigated the link between feather corticosterone, a hormone secreted in response to stress, and carotenoid-basedcoloration in the Red-winged Blackbird (Agelaius phoeniceus) in order to explore a mechanistic link between physiological state and color expression. Male Red-winged Blackbirds with lower feather corticosterone had significantly brighter red epaulets than birds with higher feather corticosterone, while I found no significant changes in red chroma. I also performed a methodological comparison of color change in museum specimens among different pigment types (carotenoid and psittacofulvin) and pigments in different locations in the body (feather and bill carotenoids) in order to quantify colorchange over time. Carotenoids and psittacofulvins showed significant reductions in red brightness and chroma over time in the collection, and carotenoid color changed significantly faster than psittacofulvin color. Both bill and feather carotenoids showed significant reductions in red brightness and red chroma over time, but change of both red chroma and red brightness occurred at a similar rate in feathers and bills. In order to use museum specimens of ecological research on bird coloration specimen age must be accounted for before the data can be used; however, once this is accomplished, museum- based color data may be used to draw conclusions about wild populations.
Resumo:
The aim of this study was to assess the influence on the infrared laser fluorescence response of some storage methods commonly used in dental research. Forty extracted permanent teeth, selected from a pool of frozen teeth, were divided into four groups of 10. Three groups were stored at 4 degrees C in 1% chloramine, 10% formalin or 0.02% thymol solution. The fourth group was stored at -20 degrees C (no storage solution added). Fluorescence measurements were performed at 14, 77, 113, 168, 232, 486 and 737 days. After 2 years, significant decreases in fluorescence (p<0.01) for the samples in formalin (-60%), chloramine (-72%) and thymol (-54%) were observed. The frozen teeth showed a slight but non-significant increase in fluorescence of 5% (p>0.01). Storing solutions have a significant influence on the fluorescence yield. Samples used for in vitro purposes stored frozen do not significantly change their fluorescence response. Thus, cut-off values obtained under the latter conditions could be extrapolated to the in vivo situation.