670 results for pipeline
Abstract:
Analyzing large-scale gene expression data is a labor-intensive and time-consuming process. To make data analysis easier, we developed a set of pipelines for rapid processing and analysis of poplar gene expression data for knowledge discovery. Among them, the differentially expressed genes (DEGs) pipeline identifies biologically important genes that are differentially expressed at one or multiple time points or conditions. The pathway analysis pipeline identifies differentially expressed metabolic pathways. The protein domain enrichment pipeline identifies enriched protein domains present in the DEGs. Finally, the Gene Ontology (GO) enrichment analysis pipeline identifies enriched GO terms in the DEGs. Our pipeline tools can analyze both microarray and high-throughput sequencing gene expression data, which are obtained with two different technologies. Microarray technology measures gene expression levels via microarray chips, collections of microscopic DNA spots attached to a solid (glass) surface, whereas high-throughput sequencing, also called next-generation sequencing, measures gene expression levels by directly sequencing mRNAs and obtaining each mRNA's copy number in cells or tissues. We also developed a web portal (http://sys.bio.mtu.edu/) that makes all pipelines publicly available so users can analyze their own gene expression data. In addition to the analyses mentioned above, it can also perform GO hierarchy analysis, i.e. construct GO trees from a list of GO terms given as input.
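The GO and domain enrichment steps described above boil down to an over-representation test. As a minimal illustration (not the portal's actual code; the function name and numbers are hypothetical), the hypergeometric test behind a typical GO enrichment pipeline can be sketched in Python:

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of seeing k or more annotated genes among n DEGs,
    when K of the N background genes carry the GO term."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1)) / comb(N, n)

# Toy numbers: 3 of 10 DEGs carry a term annotating 5 of 50 background genes.
p = hypergeom_pvalue(3, 10, 5, 50)
```

A real pipeline would run this test once per GO term and correct the p-values for multiple testing (e.g. Benjamini-Hochberg).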
Abstract:
Given the recent advent of NGS technologies, capable of sequencing entire human genomes at reduced time and cost, the ability to extract information from the data plays a fundamental role in the advancement of research. The computational problems associated with such analyses currently fall under the Big Data topic, with databases containing many types of experimental data of ever-increasing size. This thesis deals with the implementation and benchmarking of the QDANet PRO algorithm, developed by the Biophysics group of the University of Bologna: the method processes high-dimensional data to extract a low-dimensional Signature of features with high classification performance, through an analysis pipeline that includes dimensionality-reduction algorithms. The method also generalizes to non-biological data characterized by the high volume and complexity typical of Big Data. The QDANet PRO algorithm evaluates the performance of every possible pair of features, estimating their discriminating power with a quadratic Naive Bayes classifier and ranking them accordingly. Once a performance threshold is selected, a network of features is built and its connected components are determined. Each subgraph is analyzed separately and reduced by network-theoretical methods until the final Signature is extracted. The method, previously tested with positive results on datasets available to the research group, was compared against results obtained on omics databases available in the literature, which constitute a reference in the field, and against existing algorithms that perform similar tasks. To reduce computation time, the algorithm was implemented in C++ on HPC systems, with the most critical parts parallelized using OpenMP.
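The core of the described procedure — thresholding pairwise scores into a network and splitting it into connected components — can be sketched as follows. This is an illustrative Python reconstruction, not the thesis' C++ implementation; the feature names and scores are invented:

```python
from collections import defaultdict, deque

def connected_components(edges):
    """Connected components of the feature network, found via BFS."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    seen, components = set(), []
    for node in graph:
        if node in seen:
            continue
        comp, queue = set(), deque([node])
        while queue:
            cur = queue.popleft()
            if cur in comp:
                continue
            comp.add(cur)
            queue.extend(graph[cur] - comp)
        seen |= comp
        components.append(comp)
    return components

# Invented pairwise scores standing in for the quadratic-classifier performance
scores = {("f1", "f2"): 0.91, ("f2", "f3"): 0.88,
          ("f4", "f5"): 0.85, ("f1", "f3"): 0.60}
threshold = 0.80
edges = [pair for pair, s in scores.items() if s >= threshold]
comps = connected_components(edges)
```

Each resulting subgraph would then be reduced separately, as the abstract describes, until the final Signature remains.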
Abstract:
Event extraction from texts aims to detect structured information such as what has happened, to whom, where and when. Event extraction and visualization are typically considered as two different tasks. In this paper, we propose a novel approach based on probabilistic modelling to jointly extract and visualize events from tweets where both tasks benefit from each other. We model each event as a joint distribution over named entities, a date, a location and event-related keywords. Moreover, both tweets and event instances are associated with coordinates in the visualization space. The manifold assumption that the intrinsic geometry of tweets is a low-rank, non-linear manifold within the high-dimensional space is incorporated into the learning framework using a regularization. Experimental results show that the proposed approach can effectively deal with both event extraction and visualization and performs remarkably better than both the state-of-the-art event extraction method and a pipeline approach for event extraction and visualization.
Abstract:
Faced with the continued emergence of antibiotic resistance to all known classes of antibiotics, a paradigm shift in approaches toward antifungal therapeutics is required. Well characterized in a broad spectrum of bacterial and fungal pathogens, biofilms are a key factor in limiting the effectiveness of conventional antibiotics. Therefore, therapeutics such as small molecules that prevent or disrupt biofilm formation would render pathogens susceptible to clearance by existing drugs. This is the first report describing the effect of the Pseudomonas aeruginosa alkylhydroxyquinolone interkingdom signal molecules 2-heptyl-3-hydroxy-4-quinolone and 2-heptyl-4-quinolone on biofilm formation in the important fungal pathogen Aspergillus fumigatus. Decoration of the anthranilate ring on the quinolone framework resulted in significant changes in the capacity of these chemical messages to suppress biofilm formation. Addition of methoxy or methyl groups at the C5–C7 positions led to retention of anti-biofilm activity, in some cases dependent on the alkyl chain length at position C2. In contrast, halogenation at either the C3 or C6 positions led to loss of activity, with one notable exception. Microscopic staining provided key insights into the structural impact of the parent and modified molecules, identifying lead compounds for further development.
Abstract:
The primary objective of this work was to establish a technological roadmap for the application of Carbon Capture, Utilization and Storage (CCUS) technologies in Portugal. To this end, the largest stationary industrial CO2 emission sources were identified, adopting a minimum threshold of 1×105 t CO2/year and considering mainland Portugal only. Based on the most recent official data (from 2013), it was estimated that the volume of industrial CO2 emissions that could be captured in Portugal corresponds to about 47% of total industrial emissions, originating from three industrial sectors: cement production, pulp and paper, and coal-fired power plants. Most of the large industrial emission sources are located along the country's coastline, concentrated between Aveiro and Sines. Given the country's geographical constraints and, above all, the advantage of an existing natural gas pipeline network with its associated support infrastructure, the most favourable scenario for transporting the captured CO2 was assumed to be a dedicated CO2 pipeline transport system. As a criterion for matching CO2 emission sources to potential geological storage sites for the captured streams, a maximum distance of 100 km was adopted, considered adequate given the size of the national territory and the characteristics of the national industrial fabric. The available CO2 capture technologies, both commercial and at advanced demonstration stages, were reviewed, and an exploratory analysis was carried out on the suitability of these different capture methods for each of the industrial sectors previously identified as having capturable CO2 emissions.
With a view to better process integration, this preliminary analysis took into account the characteristics of the gas mixtures, as well as the corresponding industrial context and the production process from which they originate. The possibilities for industrial utilization of the captured CO2 were treated generically, since identifying real opportunities for using captured CO2 streams requires matching the effective CO2 needs of potential industrial users, which in turn requires prior characterization of the properties of those streams. This is a very specific type of analysis that presupposes the mutual interest of different stakeholders: CO2 emitters, transport operators and, above all, potential CO2 users, for applications such as feedstock for chemical synthesis, supercritical extraction solvent in the food or pharmaceutical industry, pH-correction agent in effluent treatment, biofixation through photosynthesis, or other identified uses for captured CO2. The last stage of this study assessed the options for geological storage of the captured CO2 and involved identifying, in the national sedimentary basins, geological formations with characteristics recognized as good indicators for permanent and safe CO2 storage. The methodology recommended by international organizations was followed, applying well-established selection and safety criteria to the national context.
The suitability of the pre-selected geological formations for CO2 storage will have to be confirmed by additional studies complementing the existing data on their geological characteristics and, more importantly, by laboratory tests and CO2 injection trials that can provide concrete information to estimate the CO2 storage and retention capacity of these formations and to establish the geological storage models needed to identify and estimate, concretely and objectively, the risks associated with CO2 injection and storage.
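The two screening criteria stated above (emissions of at least 1×105 t CO2/year, storage within 100 km) lend themselves to a simple source-to-sink matching sketch. The following Python fragment is purely illustrative; the names, coordinates, and tonnages are invented, not data from the study:

```python
from math import radians, sin, cos, asin, sqrt

MIN_EMISSIONS_T = 1e5    # screening threshold (t CO2/year)
MAX_DISTANCE_KM = 100.0  # source-to-storage matching criterion

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in km."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))

def eligible(sources, storage_sites):
    """Sources above the emissions threshold with a storage site in range."""
    return [s for s in sources
            if s["t_co2_yr"] >= MIN_EMISSIONS_T
            and any(haversine_km(s["lat"], s["lon"], g["lat"], g["lon"])
                    <= MAX_DISTANCE_KM for g in storage_sites)]

# Invented example data
sources = [
    {"name": "cement plant", "lat": 38.5, "lon": -8.9, "t_co2_yr": 2.0e5},
    {"name": "small boiler", "lat": 38.5, "lon": -8.9, "t_co2_yr": 5.0e4},
]
storage = [{"name": "saline aquifer", "lat": 38.6, "lon": -8.8}]
candidates = eligible(sources, storage)
```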
Abstract:
The migratory endoparasitic nematode Bursaphelenchus xylophilus, the causal agent of pine wilt disease, has phytophagous and mycetophagous phases during its life cycle. This highly unusual feature distinguishes it from other plant-parasitic nematodes and requires profound changes in biology between modes. During the phytophagous stage, the nematode migrates within pine trees, feeding on the contents of parenchymal cells. Like other plant pathogens, B. xylophilus secretes effectors from pharyngeal gland cells into the host during infection. We provide the first description of changes in the morphology of these gland cells between juvenile and adult life stages. Using a comparative transcriptomics approach and an effector identification pipeline, we identify numerous novel parasitism genes which may be important for mediating the interactions of B. xylophilus with its host. In-depth characterization of all parasitism genes using in situ hybridization reveals two major categories of detoxification proteins, specifically expressed in either the pharyngeal gland cells or the digestive system. These data suggest that B. xylophilus deploys effectors in a multilayer detoxification strategy in order to protect itself from host defence responses during phytophagy.
Abstract:
This study develops and applies a methodology in keeping with the reality of the country in order to zone the structural and population vulnerability subject to a technological hazard, in this case the influence of a possible emergency due to a fuel spill from the oil products pipeline. One of the objectives of this study is to zone this vulnerability as an element to consider in determining risk, devising municipal contingency plans and drafting territorial planning proposals.
Keywords: pipeline, RECOPE, vulnerability, risk, fuel spill, technological hazard.
Abstract:
Antigen design is generally driven by the need to obtain enhanced stability, efficiency and safety in vaccines. Unfortunately, antigen modification rarely proceeds in parallel with the development of analytical characterization tools. Such analytical tools are required throughout the vaccine manufacturing pipeline, for production modifications, improvements or regulatory requirements. Despite the relevance of bioconjugate vaccines, robust and consistent analytical tools to evaluate the extent of carrier glycosylation are missing. Bioconjugation is a glycoengineering technology aimed at producing N-glycoproteins in vivo in E. coli cells, based on the PglB-dependent system of C. jejuni, and is applied to the production of several glycoconjugate vaccines. This applicability stems from the ability of glycocompetent E. coli to produce site-selectively glycosylated proteins that, after a few purification steps, can be used as vaccines able to elicit both humoral and cell-mediated immune responses. Here, S. aureus Hla bioconjugated with CP5 was used to perform a rational, analytically driven design of the glycosylation sites for quantification of the glycosylation extent by mass spectrometry. The aim of the study was to develop an MS-based approach to quantify the glycosylation extent for in-process monitoring of bioconjugate production and for final product characterization. The three designed consensus sequences differ by a single amino-acid residue and fulfill the prerequisites for an engineered bioconjugate that is more appropriate from an analytical perspective. We aimed to achieve optimal MS detectability of the peptide carrying the consensus sequences, complying with the well-characterized requirements for N-glycosylation by PglB. Hla carrier isoforms bearing these consensus sequences allowed a recovery of about 20 ng/μg of periplasmic protein, glycosylated at 40%. The SRM-MS method developed here was successfully applied to evaluate differential site occupancy when the carrier protein presents two glycosites. The glycosylation extent at each glycosite was determined, and the differences between the isoforms were influenced both by the overall amount of protein produced and by the position of glycosite insertion. The analytically driven design of the bioconjugated antigen and the development of an accurate, precise and robust analytical method allowed fine characterization of the vaccine.
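The glycosylation extent at a site is commonly computed as the ratio of glycosylated signal to total signal for the site peptide. As a minimal sketch (not the thesis' actual SRM data processing; the peak areas are invented):

```python
def glycosylation_extent(glyco_area, aglyco_area):
    """Fraction of the site carrying the glycan, from SRM peak areas of
    the glycosylated and non-glycosylated forms of the site peptide."""
    total = glyco_area + aglyco_area
    if total == 0:
        raise ValueError("no signal for this glycosite")
    return glyco_area / total

# Invented peak areas yielding the ~40% occupancy mentioned above
extent = glycosylation_extent(4.0e6, 6.0e6)
```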
Abstract:
Network monitoring is of paramount importance for effective network management: it allows operators to constantly observe the network's behavior to ensure it is working as intended, and can trigger both automated and manual remediation procedures in case of failures and anomalies. The concept of SDN decouples the control logic from the legacy network infrastructure to perform centralized control over multiple switches in the network; in this context, the switches' only responsibility is to forward packets according to the flow control instructions provided by the controller. However, as current SDN switches only expose simple per-port and per-flow counters, the controller has to do almost all the processing to determine the network state, which causes significant communication overhead and excessive latency for monitoring purposes. The absence of programmability in the SDN data plane prompted the advent of programmable switches, which allow developers to customize the data-plane pipeline and implement novel programs operating directly in the switches. This means that certain monitoring tasks can be offloaded to programmable data planes, performing fine-grained monitoring even at very high packet processing speeds. Given the central importance of network monitoring exploiting programmable data planes, the goal of this thesis is to enable a wide range of monitoring tasks in programmable switches, with a specific focus on those equipped with programmable ASICs. Indeed, most network monitoring solutions available in the literature do not take the computational and memory constraints of programmable switches into due account, preventing, de facto, their successful implementation in commodity switches. This thesis claims that such network monitoring tasks can nonetheless be executed in programmable switches.
Our evaluations show that the contributions in this thesis could be used by network administrators as well as network security engineers to better understand the network status through different monitoring metrics, and thus prevent network infrastructure and service outages.
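One common way to fit per-flow counting into the tight memory budget the thesis emphasizes is an approximate structure such as a count-min sketch, shown here as an illustrative Python model rather than an actual switch program (the flow key format is invented):

```python
import hashlib

class CountMinSketch:
    """Fixed-memory approximate per-flow counter: width*depth counters in
    total; estimates never undercount, but may overcount on collisions."""
    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.rows = [[0] * width for _ in range(depth)]

    def _indexes(self, key):
        # One independent hash per row, derived by salting blake2b.
        for row in range(self.depth):
            digest = hashlib.blake2b(key.encode(), salt=bytes([row]) * 8).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def update(self, flow_key, count=1):
        for row, idx in self._indexes(flow_key):
            self.rows[row][idx] += count

    def estimate(self, flow_key):
        return min(self.rows[row][idx] for row, idx in self._indexes(flow_key))

cms = CountMinSketch()
for _ in range(5):
    cms.update("10.0.0.1->10.0.0.2")
est = cms.estimate("10.0.0.1->10.0.0.2")
```

In hardware the per-row hashes and counter arrays map naturally onto a switch ASIC's hash units and SRAM register arrays.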
Abstract:
Nowadays robotic applications are widespread and most manipulation tasks are solved efficiently. However, Deformable Objects (DOs) still represent a huge limitation for robots. The main difficulty in DO manipulation is dealing with shape and dynamics uncertainties, which prevents the use of model-based approaches (since they are excessively computationally complex) and makes sensory data difficult to interpret. This thesis reports the research activities aimed at addressing applications in robotic manipulation and sensing of Deformable Linear Objects (DLOs), with a particular focus on electric wires. In all the works, a significant effort was made to devise effective strategies for analyzing sensory signals with various machine learning algorithms. The first part of the document focuses on wire terminals, i.e. their detection, grasping, and insertion. First, a pipeline that integrates vision and tactile sensing is developed; then further improvements are proposed for each module. A novel procedure is proposed to gather and label massive amounts of training images for object detection with minimal human intervention. Together with this strategy, we extend a generic object detector based on Convolutional Neural Networks to orientation prediction. The insertion task is also extended by developing a closed-loop controller capable of guiding the insertion of a longer, curved segment of wire through a hole, where the contact forces are estimated by means of a Recurrent Neural Network. In the latter part of the thesis, the interest shifts to the DLO shape. Robotic reshaping of a DLO is addressed by means of a sequence of pick-and-place primitives, while a decision-making process driven by visual data learns the optimal grasping locations by exploiting Deep Q-learning and finds the best releasing point. The success of the solution relies on a reliable interpretation of the DLO shape. For this reason, further developments are made on the visual segmentation.
Abstract:
This thesis presents the development of a framework for analyzing phishing URLs extracted from malicious documents. Using Python 3 and automated browsers, a pipeline was developed to analyze these malicious campaigns. The pipeline aims to reach the final landing page, avoiding being blocked by anti-bot cloaking techniques, in order to capture a screenshot and save the page locally. All traffic generated during the analysis is stored for future investigation. For each URL visited, information such as its DNS entries, Autonomous System number and status in the Google blocklist is recorded. An initial analysis of the two largest campaigns was carried out, revealing the business model behind them and the techniques used to protect their infrastructure.
Abstract:
Although the debate over what data science is has a long history and has not yet reached complete consensus, data science can be summarized as the process of learning from data. Guided by this vision, this thesis presents two independent data science projects developed in the scope of multidisciplinary applied research. The first part analyzes fluorescence microscopy images typically produced in life science experiments, where the objective is to count how many marked neuronal cells are present in each image. Aiming to automate the task to support research in the area, we propose a neural network architecture tuned specifically for this use case, cell ResUnet (c-ResUnet), and discuss the impact of alternative training strategies in overcoming particular challenges of our data. The approach provides good results in terms of both detection and counting, with performance comparable to the interpretation of human operators. As a meaningful addition, we release the pre-trained model and the Fluorescent Neuronal Cells dataset, which collects pixel-level annotations of where neuronal cells are located. In this way, we hope to help future research in the area and foster innovative methodologies for tackling similar problems. The second part deals with the problem of distributed data management in the context of LHC experiments, with a focus on supporting ATLAS operations concerning data transfer failures. In particular, we analyze error messages produced by failed transfers and propose a machine learning pipeline that leverages the word2vec language model and K-means clustering. This provides groups of similar errors that are presented to human operators as suggestions of potential issues to investigate. The approach is demonstrated on one full day of data, showing promising ability in understanding the message content and providing meaningful groupings, in line with incidents previously reported by human operators.
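The error-grouping step described above can be sketched with a self-contained stand-in: bag-of-words vectors in place of word2vec embeddings, and a tiny hand-rolled K-means. The error messages are invented, not ATLAS data:

```python
import re
from collections import Counter

def tokenize(msg):
    # Mask digits so "site 12" and "site 7" yield identical tokens:
    # failed-transfer messages often differ only in identifiers.
    return re.findall(r"[a-z]+", re.sub(r"\d+", " ", msg.lower()))

def vectorize(msgs, vocab):
    """Bag-of-words count vectors over a fixed vocabulary."""
    return [[Counter(tokenize(m))[w] for w in vocab] for m in msgs]

def kmeans(vecs, k, iters=20):
    """Plain Lloyd's algorithm with a naive spread initialization."""
    centroids = [list(vecs[i * len(vecs) // k]) for i in range(k)]
    labels = [0] * len(vecs)
    for _ in range(iters):
        for i, v in enumerate(vecs):
            labels[i] = min(range(k), key=lambda c: sum(
                (a - b) ** 2 for a, b in zip(v, centroids[c])))
        for c in range(k):
            members = [v for i, v in enumerate(vecs) if labels[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

msgs = [
    "Transfer failed: checksum mismatch at site 12",
    "Transfer failed: checksum mismatch at site 7",
    "Connection timed out after 300 s",
    "Connection timed out after 600 s",
]
vocab = sorted({w for m in msgs for w in tokenize(m)})
labels = kmeans(vectorize(msgs, vocab), k=2)
```

The real pipeline's word2vec embeddings additionally capture similarity between messages that share no literal tokens, which bag-of-words cannot.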
Abstract:
Hereditary optic neuropathies (HON) are a genetic cause of visual impairment characterized by degeneration of retinal ganglion cells. The majority of HON cases are caused by pathogenic variants in mtDNA genes and in the OPA1 gene. However, several other genes can cause optic atrophy and can only be identified by high-throughput genetic analysis. Whole Exome Sequencing (WES) is becoming the primary choice in rare disease molecular diagnosis, being both cost-effective and informative. We performed WES on a cohort of 106 cases, comprising 74 isolated ON patients (ON) and 32 syndromic ON patients (sON). The total diagnostic yield amounts to 27%, slightly higher for syndromic ON (31%) than for isolated ON (26%). The majority of genes found are related to mitochondrial function and have already been reported to harbour HON pathogenic variants: ACO2, AFG3L2, C19orf12, DNAJC30, FDXR, MECR, MTFMT, NDUFAF2, NDUFB11, NDUFV2, OPA1, PDSS1, SDHA, SSBP1, and WFS1. Among these, OPA1, ACO2, and WFS1 were confirmed as the most relevant genetic causes of ON. Moreover, several genes were identified, especially in sON patients, with direct impairment of non-mitochondrial molecular pathways: from autophagy and the ubiquitin system (LYST, SNF8, WDR45, UCHL1), to neural cell development and function (KIF1A, GFAP, EPHB2, CACNA1A, CACNA1F), but also vitamin metabolism (SLC52A2, BTD), cilia structure (USH2A), and nuclear pore shuttling (NUTF2). Functional validation in a yeast model was performed for pathogenic variants detected in the MECR, MTFMT, SDHA, and UCHL1 genes. For SDHA and UCHL1, muscle biopsies and patient-derived fibroblast cell lines were also analysed, pointing to possible pathogenic mechanisms that will be investigated in further studies. In conclusion, WES proved to be an efficient tool when applied to our ON cohort, for both common disease-gene identification and novel gene discovery. It is therefore recommended to consider WES in the ON molecular diagnostic pipeline, as for other rare genetic diseases.
Abstract:
Autism Spectrum Disorder (ASD) is a heterogeneous and highly heritable neurodevelopmental disorder with a complex genetic architecture, consisting of a combination of common low-risk and more penetrant rare variants. This PhD project aimed to explore the contribution of rare variants to ASD susceptibility through NGS approaches in a cohort of 106 ASD families including 125 ASD individuals. Firstly, I explored the contribution of inherited rare variants to the ASD phenotype in a girl with a maternally inherited pathogenic NRXN1 deletion. Whole exome sequencing of the trio identified an increased burden of deleterious variants in the proband that could modulate the CNV penetrance and determine the disease development. In the second part of the project, I investigated the role of rare variants emerging from whole genome sequencing in ASD aetiology. To properly manage and analyse the sequencing data, a robust and efficient variant filtering and prioritization pipeline was developed, and its application yielded a stringent set of rare recessive-acting and ultra-rare variants. As a first follow-up, I performed a preliminary analysis on de novo variants, identifying the most likely deleterious variants and highlighting candidate genes for further analyses. In the third part of the project, considering the well-established involvement of calcium signalling in the molecular bases of ASD, I investigated the role of rare variants in voltage-gated calcium channel genes, which mainly regulate intracellular calcium concentration and whose alterations have been correlated with enhanced ASD risk. Specifically, I functionally tested the effect of rare damaging variants identified in CACNA1H, showing that CACNA1H variation may be involved in ASD development by additively combining with other high-risk variants.
This project highlights the challenges in the analysis and interpretation of variants from NGS analysis in ASD, and underlines the importance of a comprehensive assessment of the genomic landscape of ASD individuals.
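A variant filtering and prioritization pipeline of the kind described in the second part can be caricatured as a rare-and-damaging filter over annotated variant records. The field names, thresholds, and consequence labels below are illustrative, not the thesis' actual criteria:

```python
def prioritize(variants, max_af=0.01,
               damaging=frozenset({"frameshift", "stop_gained", "missense"})):
    """Keep rare variants with a damaging predicted consequence,
    ranked rarest-first by population allele frequency."""
    kept = [v for v in variants
            if v["allele_freq"] <= max_af and v["consequence"] in damaging]
    return sorted(kept, key=lambda v: v["allele_freq"])

# Invented annotated variants (gene names only echo the abstract)
variants = [
    {"gene": "CACNA1H", "allele_freq": 0.0002, "consequence": "missense"},
    {"gene": "BRCA2",   "allele_freq": 0.15,   "consequence": "missense"},
    {"gene": "NRXN1",   "allele_freq": 0.0,    "consequence": "frameshift"},
    {"gene": "GENE_X",  "allele_freq": 0.001,  "consequence": "synonymous"},
]
top = prioritize(variants)
```

Real pipelines layer many more criteria (inheritance model, genotype quality, in silico pathogenicity scores) on top of this basic frequency-and-consequence filter.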
Abstract:
High energy efficiency and high performance are the key requirements for Internet of Things (IoT) end-nodes. Exploiting clusters of multiple programmable processors has recently emerged as a suitable solution to address this challenge. However, one of the main bottlenecks for multi-core architectures is the instruction cache. While private caches suffer from data replication and waste area, fully shared caches lack scalability and form a bottleneck for the operating frequency. Hence we propose a hybrid solution where a larger shared cache (L1.5) serves multiple cores, connected through a low-latency interconnect to small private caches (L1). However, this is still limited by capacity misses in a small L1. Thus, we propose a sequential prefetch from L1 to L1.5 to improve performance with little area overhead. Moreover, to cut the critical path for better timing, we optimized the core instruction fetch stage with non-blocking transfers, adopting a 4 × 32-bit ring-buffer FIFO and adding a pipeline stage for conditional branches. We present a detailed comparison of the performance and energy efficiency of different instruction cache architectures recently proposed for Parallel Ultra-Low-Power clusters. On average, when executing a set of real-life IoT applications, our two-level cache improves performance by up to 20% and loses 7% energy efficiency with respect to the private cache. Compared to a shared cache system, it improves performance by up to 17% while keeping the same energy efficiency. In the end, an up to 20% timing (maximum frequency) improvement and software control enable the two-level instruction cache with prefetch to adapt to various battery-powered use cases, balancing high performance and energy efficiency.
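The benefit of a sequential (next-line) prefetch on a mostly sequential instruction stream can be illustrated with a tiny two-level cache model. This is a behavioral Python sketch of the idea, not the proposed hardware; the cache sizes, line size, and replacement policy are arbitrary:

```python
from collections import OrderedDict

class Cache:
    """Tiny fully-associative cache of whole lines with FIFO replacement."""
    def __init__(self, lines):
        self.capacity = lines
        self.store = OrderedDict()

    def lookup(self, line):
        return line in self.store

    def fill(self, line):
        if line not in self.store:
            if len(self.store) >= self.capacity:
                self.store.popitem(last=False)  # evict oldest line
            self.store[line] = True

def fetch(addr, l1, l15, stats, line_size=16):
    line = addr // line_size
    if l1.lookup(line):
        stats["l1_hit"] += 1
        return
    stats["l1_miss"] += 1
    for target in (line, line + 1):  # demand line + sequential next-line prefetch
        if not l15.lookup(target):
            stats["l15_miss"] += 1   # L1.5 refilled from main memory
            l15.fill(target)
        l1.fill(target)              # refill L1 from L1.5

stats = {"l1_hit": 0, "l1_miss": 0, "l15_miss": 0}
l1, l15 = Cache(4), Cache(64)
for addr in range(0, 256, 4):        # sequential 4-byte instruction fetches
    fetch(addr, l1, l15, stats)
```

On this straight-line stream the prefetch halves the L1 misses: only every other line misses, because each demand miss also brings in its successor.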