919 resultados para genoma, genetica, dna, bioinformatica, mapreduce, snp, gwas, big data, sequenziamento, pipeline


Relevância:

100.00% 100.00%

Publicador:

Resumo:

La Tesi tratta i concetti di Privacy e Protezione dei Dati personali, contestualizzandone il quadro normativo e tecnologico con particolare riferimento ai contesti emergenti rappresentati – per un verso – dalla proposta di nuovo Regolamento generale sulla protezione dei dati personali (redatto dal Parlamento Europeo e dal Consiglio dell’Unione Europea), – per un altro – dalla metodologia di progettazione del Privacy by Design e – per entrambi – dalla previsione di un nuovo attore: il responsabile per la protezione dei dati personali (Privacy Officer). L’elaborato si articola su tre parti oltre introduzione, conclusioni e riferimenti bibliografici. La prima parte descrive il concetto di privacy e le relative minacce e contromisure (tradizionali ed emergenti) con riferimento ai contesti di gestione (aziendale e Big Data) e al quadro normativo vigente. La seconda Parte illustra in dettaglio i principi e le prassi del Privacy by Design e la figura del Privacy Officer formalmente riconosciuta dal novellato giuridico. La terza parte illustra il caso di studio nel quale vengono analizzate tramite una tabella comparativa minacce e contromisure rilevabili in un contesto aziendale.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Sviluppo e analisi di un dataset campione, composto da circa 3 mln di entry ed estratto da un data warehouse di informazioni riguardanti il consumo energetico di diverse smart home.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Il presente elaborato ha come oggetto l’analisi delle prestazioni e il porting di un sistema di SBI sulla distribuzione Hadoop di Cloudera. Nello specifico è stato fatto un porting dei dati del progetto WebPolEU. Successivamente si sono confrontate le prestazioni del query engine Impala con quelle di ElasticSearch che, diversamente da Oracle, sfrutta la stessa componente hardware (cluster).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Nella tesi, inizialmente, viene introdotto il concetto di Big Data, descrivendo le caratteristiche principali, il loro utilizzo, la provenienza e le opportunità che possono apportare. Successivamente, si sono spiegati i motivi che hanno portato alla nascita del movimento NoSQL, come la necessità di dover gestire i Big Data pur mantenendo una struttura flessibile nel tempo. Inoltre, dopo un confronto con i sistemi tradizionali, si è passati al classificare questi DBMS in diverse famiglie, accennando ai concetti strutturali sulle quali si basano, per poi spiegare il funzionamento. In seguito è stato descritto il database MongoDB orientato ai documenti. Sono stati approfonditi i dettagli strutturali, i concetti sui quali si basa e gli obbiettivi che si pone, per poi andare ad analizzare nello specifico importanti funzioni, come le operazioni di inserimento e cancellazione, ma anche il modo di interrogare il database. Grazie alla sue caratteristiche che lo rendono molto performante, MonogDB, è stato utilizzato come supporto di base di dati per la realizzazione di un applicazione web che permette di mostrare la mappa della connettività urbana.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

La quantità di dati che vengono generati e immagazzinati sta aumentando sempre più grazie alle nuove tecnologie e al numero di utenti sempre maggiore. Questi dati, elaborati correttamente, permettono quindi di ottenere delle informazioni di valore strategico che aiutano nell’effettuare decisioni aziendali a qualsiasi livello, dalla produzione fino al marketing. Sono nati soprattutto negli ultimi anni numerosi framework proprietari e open source che permettono l'elaborazione di questi dati sfruttando un cluster. In particolare tra i più utilizzati e attivi in questo momento a livello open source troviamo Hadoop e Spark. Obiettivo di questa tesi è realizzare un modello di Spark per realizzare una funzione di costo che sia non solo implementabile all’interno dell’ottimizzatore di Spark SQL, ma anche per poter effettuare delle simulazioni di esecuzione di query su tale sistema. Si è quindi studiato nel dettaglio con ducumentazione e test il comportamento del sistema per realizzare un modello. I dati ottenuti sono infine stati confrontati con dati sperimentali ottenuti tramite l'utilizzo di un cluster. Con la presenza di tale modello non solo risulta possibile comprendere in maniera più approfondita il reale comportamento di Spark ma permette anche di programmare applicazioni più efficienti e progettare con maggiore precisione sistemi per la gestione dei dataset che sfruttino tali framework.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Ogni giorno vengono generati grandi moli di dati attraverso sorgenti diverse. Questi dati, chiamati Big Data, sono attualmente oggetto di forte interesse nel settore IT (Information Technology). I processi digitalizzati, le interazioni sui social media, i sensori ed i sistemi mobili, che utilizziamo quotidianamente, sono solo un piccolo sottoinsieme di tutte le fonti che contribuiscono alla produzione di questi dati. Per poter analizzare ed estrarre informazioni da questi grandi volumi di dati, tante sono le tecnologie che sono state sviluppate. Molte di queste sfruttano approcci distribuiti e paralleli. Una delle tecnologie che ha avuto maggior successo nel processamento dei Big Data, e Apache Hadoop. Il Cloud Computing, in particolare le soluzioni che seguono il modello IaaS (Infrastructure as a Service), forniscono un valido strumento all'approvvigionamento di risorse in maniera semplice e veloce. Per questo motivo, in questa proposta, viene utilizzato OpenStack come piattaforma IaaS. Grazie all'integrazione delle tecnologie OpenStack e Hadoop, attraverso Sahara, si riesce a sfruttare le potenzialita offerte da un ambiente cloud per migliorare le prestazioni dell'elaborazione distribuita e parallela. Lo scopo di questo lavoro e ottenere una miglior distribuzione delle risorse utilizzate nel sistema cloud con obiettivi di load balancing. Per raggiungere questi obiettivi, si sono rese necessarie modifiche sia al framework Hadoop che al progetto Sahara.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Big Brother Watch and others have filed a complaint against the United Kingdom under the European Convention on Human Rights about a violation of Article 8, the right to privacy. It regards the NSA affair and UK-based surveillance activities operated by secret services. The question is whether it will be declared admissible and, if so, whether the European Court of Human Rights will find a violation. This article discusses three possible challenges for these types of complaints and analyses whether the current privacy paradigm is still adequate in view of the development known as Big Data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Teaching is a dynamic activity. It can be very effective, if its impact is constantly monitored and adjusted to the demands of changing social contexts and needs of learners. This implies that teachers need to be aware about teaching and learning processes. Moreover, they should constantly question their didactical methods and the learning resources, which they provide to their students. They should reflect if their actions are suitable, and they should regulate their teaching, e.g., by updating learning materials based on new knowledge about learners, or by motivating learners to engage in further learning activities. In the last years, a rising interest in ‘learning analytics’ is observable. This interest is motivated by the availability of massive amounts of educational data. Also, the continuously increasing processing power, and a strong motivation for discovering new information from these pools of educational data, is pushing further developments within the learning analytics research field. Learning analytics could be a method for reflective teaching practice that enables and guides teachers to investigate and evaluate their work in future learning scenarios. However, this potentially positive impact has not yet been sufficiently verified by learning analytics research. Another method that pursues these goals is ‘action research’. Learning analytics promises to initiate action research processes because it facilitates awareness, reflection and regulation of teaching activities analogous to action research. Therefore, this thesis joins both concepts, in order to improve the design of learning analytics tools. Central research question of this thesis are: What are the dimensions of learning analytics in relation to action research, which need to be considered when designing a learning analytics tool? How does a learning analytics dashboard impact the teachers of technology-enhanced university lectures regarding ‘awareness’, ‘reflection’ and ‘action’? Does it initiate action research? Which are central requirements for a learning analytics tool, which pursues such effects? This project followed design-based research principles, in order to answer these research questions. The main contributions are: a theoretical reference model that connects action research and learning analytics, the conceptualization and implementation of a learning analytics tool, a requirements catalogue for useful and usable learning analytics design based on evaluations, a tested procedure for impact analysis, and guidelines for the introduction of learning analytics into higher education.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Logistiknetzwerke von Unternehmen wachsen sehr schnell und werden immer komplexer. Unternehmen wissen oft nicht, von welchen anderen Unternehmen sie abhängig sind und welche geschäftskritischen Risiken sich daraus für sie ergeben. Aus diesem Grund wird in diesem Artikel ein Konzept eines proaktiven Ri-sikomanagements in Logistiknetzwerken vorgestellt. Das Konzept basiert auf der Big Data Technologie und verwendet zur Identifikation von Risiken und zum Aufbau eines Logistiknetzwerkes neben internen Unternehmensdaten auch externe Daten, z. B. Social Media Plattformen oder andere Datenportale. Diese Daten werden ausgewertet und mit Risiken behaftete Beziehungen werden dem Bediener grafisch angezeigt. Zusätzlich dazu kann das System dem Benutzer mögliche Alternativen zur Vermeidung dieser Risiken aufzeigen und somit zur Entscheidungsunterstützung genutzt werden.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Simulation techniques are almost indispensable in the analysis of complex systems. Materials- and related information flow processes in logistics often possess such complexity. Further problem arise as the processes change over time and pose a Big Data problem as well. To cope with these issues adaptive simulations are more and more frequently used. This paper presents a few relevant advanced simulation models and intro-duces a novel model structure, which unifies modelling of geometrical relations and time processes. This way the process structure and their geometric relations can be handled in a well understandable and transparent way. Capabilities and applicability of the model is also presented via a demonstrational example.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Lyme disease Borrelia can infect humans and animals for months to years, despite the presence of an active host immune response. The vls antigenic variation system, which expresses the surface-exposed lipoprotein VlsE, plays a major role in B. burgdorferi immune evasion. Gene conversion between vls silent cassettes and the vlsE expression site occurs at high frequency during mammalian infection, resulting in sequence variation in the VlsE product. In this study, we examined vlsE sequence variation in B. burgdorferi B31 during mouse infection by analyzing 1,399 clones isolated from bladder, heart, joint, ear, and skin tissues of mice infected for 4 to 365 days. The median number of codon changes increased progressively in C3H/HeN mice from 4 to 28 days post infection, and no clones retained the parental vlsE sequence at 28 days. In contrast, the decrease in the number of clones with the parental vlsE sequence and the increase in the number of sequence changes occurred more gradually in severe combined immunodeficiency (SCID) mice. Clones containing a stop codon were isolated, indicating that continuous expression of full-length VlsE is not required for survival in vivo; also, these clones continued to undergo vlsE recombination. Analysis of clones with apparent single recombination events indicated that recombinations into vlsE are nonselective with regard to the silent cassette utilized, as well as the length and location of the recombination event. Sequence changes as small as one base pair were common. Fifteen percent of recovered vlsE variants contained "template-independent" sequence changes, which clustered in the variable regions of vlsE. We hypothesize that the increased frequency and complexity of vlsE sequence changes observed in clones recovered from immunocompetent mice (as compared with SCID mice) is due to rapid clearance of relatively invariant clones by variable region-specific anti-VlsE antibody responses.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This chapter presents fuzzy cognitive maps (FCM) as a vehicle for Web knowledge aggregation, representation, and reasoning. The corresponding Web KnowARR framework incorporates findings from fuzzy logic. To this end, a first emphasis is particularly on the Web KnowARR framework along with a stakeholder management use case to illustrate the framework’s usefulness as a second focal point. This management form is to help projects to acceptance and assertiveness where claims for company decisions are actively involved in the management process. Stakeholder maps visually (re-) present these claims. On one hand, they resort to non-public content and on the other they resort to content that is available to the public (mostly on the Web). The Semantic Web offers opportunities not only to present public content descriptively but also to show relationships. The proposed framework can serve as the basis for the public content of stakeholder maps.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The fuzzy analytical network process (FANP) is introduced as a potential multi-criteria-decision-making (MCDM) method to improve digital marketing management endeavors. Today’s information overload makes digital marketing optimization, which is needed to continuously improve one’s business, increasingly difficult. The proposed FANP framework is a method for enhancing the interaction between customers and marketers (i.e., involved stakeholders) and thus for reducing the challenges of big data. The presented implementation takes realities’ fuzziness into account to manage the constant interaction and continuous development of communication between marketers and customers on the Web. Using this FANP framework, the marketers are able to increasingly meet the varying requirements of their customers. To improve the understanding of the implementation, advanced visualization methods (e.g., wireframes) are used.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The era of big data opens up new opportunities in personalised medicine, preventive care, chronic disease management and in telemonitoring and managing of patients with implanted devices. The rich data accumulating within online services and internet companies provide a microscope to study human behaviour at scale, and to ask completely new questions about the interplay between behavioural patterns and health. In this paper, we shed light on a particular aspect of data-driven healthcare: autonomous decision-making. We first look at three examples where we can expect data-driven decisions to be taken autonomously by technology, with no or limited human intervention. We then discuss some of the technical and practical challenges that can be expected, and sketch the research agenda to address them.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We present a novel surrogate model-based global optimization framework allowing a large number of function evaluations. The method, called SpLEGO, is based on a multi-scale expected improvement (EI) framework relying on both sparse and local Gaussian process (GP) models. First, a bi-objective approach relying on a global sparse GP model is used to determine potential next sampling regions. Local GP models are then constructed within each selected region. The method subsequently employs the standard expected improvement criterion to deal with the exploration-exploitation trade-off within selected local models, leading to a decision on where to perform the next function evaluation(s). The potential of our approach is demonstrated using the so-called Sparse Pseudo-input GP as a global model. The algorithm is tested on four benchmark problems, whose number of starting points ranges from 102 to 104. Our results show that SpLEGO is effective and capable of solving problems with large number of starting points, and it even provides significant advantages when compared with state-of-the-art EI algorithms.