868 resultados para data analysis software
Resumo:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Resumo:
Abstract Background Transcript enumeration methods such as SAGE, MPSS, and sequencing-by-synthesis EST "digital northern", are important high-throughput techniques for digital gene expression measurement. As other counting or voting processes, these measurements constitute compositional data exhibiting properties particular to the simplex space where the summation of the components is constrained. These properties are not present on regular Euclidean spaces, on which hybridization-based microarray data is often modeled. Therefore, pattern recognition methods commonly used for microarray data analysis may be non-informative for the data generated by transcript enumeration techniques since they ignore certain fundamental properties of this space. Results Here we present a software tool, Simcluster, designed to perform clustering analysis for data on the simplex space. We present Simcluster as a stand-alone command-line C package and as a user-friendly on-line tool. Both versions are available at: http://xerad.systemsbiology.net/simcluster. Conclusion Simcluster is designed in accordance with a well-established mathematical framework for compositional data analysis, which provides principled procedures for dealing with the simplex space, and is thus applicable in a number of contexts, including enumeration-based gene expression data.
Resumo:
Photodynamic therapy (PDT) is a treatment modality that has advanced rapidly in recent years. It causes tissue and vascular damage with the interaction of a photosensitizing agent (PS), light of a proper wavelength, and molecular oxygen. Evaluation of vessel damage usually relies on histopathology evaluation. Results are often qualitative or at best semi-quantitative based on a subjective system. The aim of this study was to evaluate, using CD31 immunohistochem- istry and image analysis software, the vascular damage after PDT in a well-established rodent model of chemically induced mammary tumor. Fourteen Sprague-Dawley rats received a single dose of 7,12-dimethylbenz(a)anthraxcene (80 mg/kg by gavage), treatment efficacy was evaluated by comparing the vascular density of tumors after treatment with Photogem® as a PS, intraperitoneally, followed by interstitial fiber optic lighting, from a diode laser, at 200 mW/cm and light dose of 100 J/cm directed against his tumor (7 animals), with a control group (6 animals, no PDT). The animals were euthanized 30 hours after the lighting and mammary tumors were removed and samples from each lesion were formalin-fixed. Immunostained blood vessels were quantified by Image Pro-Plus version 7.0. The control group had an average of 3368.6 ± 4027.1 pixels per picture and the treated group had an average of 779 ± 1242.6 pixels per area (P < 0.01), indicating that PDT caused a significant decrease in vascular density of mammary tumors. The combined immu- nohistochemistry using CD31, with selection of representative areas by a trained pathology, followed by quantification of staining using Image Pro-Plus version 7.0 system was a practical and robust methodology for vessel damage evalua- tion, which probably could be used to assess other antiangiogenic treatments.
Resumo:
This thesis is developed in the contest of Ritmare project WP1, which main objective is the development of a sustainable fishery through the identification of populations boundaries in commercially important species in Italian Seas. Three main objectives are discussed in order to help reach the main purpose of identification of stock boundaries in Parapenaeus longirostris: 1 -Development of a representative sampling design for Italian seas; 2 -Evaluation of 2b-RAD protocol; 3 -Investigation of populations through biological data analysis. First of all we defined and accomplished a sampling design which properly represents all Italian seas. Then we used information and data about nursery areas distribution, abundance of populations and importance of P. longirostris in local fishery, to develop an experimental design that prioritize the most important areas to maximize the results with actual project funds. We introduced for the first time the use of 2b-RAD on this species, a genotyping method based on sequencing the uniform fragments produced by type IIB restriction endonucleases. Thanks to this method we were able to move from genetics to the more complex genomics. In order to proceed with 2b-RAD we performed several tests to identify the best DNA extraction kit and protocol and finally we were able to extract 192 high quality DNA extracts ready to be processed. We tested 2b-RAD with five samples and after high-throughput sequencing of libraries we used the software “Stacks” to analyze the sequences. We obtained positive results identifying a great number of SNP markers among the five samples. To guarantee a multidisciplinary approach we used the biological data associated to the collected samples to investigate differences between geographical samples. Such approach assures continuity with other project, for instance STOCKMED, which utilize a combination of molecular and biological analysis as well.
Resumo:
One of the most undervalued problems by smartphone users is the security of data on their mobile devices. Today smartphones and tablets are used to send messages and photos and especially to stay connected with social networks, forums and other platforms. These devices contain a lot of private information like passwords, phone numbers, private photos, emails, etc. and an attacker may choose to steal or destroy this information. The main topic of this thesis is the security of the applications present on the most popular stores (App Store for iOS and Play Store for Android) and of their mechanisms for the management of security. The analysis is focused on how the architecture of the two systems protects users from threats and highlights the real presence of malware and spyware in their respective application stores. The work described in subsequent chapters explains the study of the behavior of 50 Android applications and 50 iOS applications performed using network analysis software. Furthermore, this thesis presents some statistics about malware and spyware present on the respective stores and the permissions they require. At the end the reader will be able to understand how to recognize malicious applications and which of the two systems is more suitable for him. This is how this thesis is structured. The first chapter introduces the security mechanisms of the Android and iOS platform architectures and the security mechanisms of their respective application stores. The Second chapter explains the work done, what, why and how we have chosen the tools needed to complete our analysis. The third chapter discusses about the execution of tests, the protocol followed and the approach to assess the “level of danger” of each application that has been checked. The fourth chapter explains the results of the tests and introduces some statistics on the presence of malicious applications on Play Store and App Store. The fifth chapter is devoted to the study of the users, what they think about and how they might avoid malicious applications. The sixth chapter seeks to establish, following our methodology, what application store is safer. In the end, the seventh chapter concludes the thesis.
Controllo generalizzato via software di dispositivi per l'interconnessione flessibile di data center
Resumo:
La tesi riguarda le gestione via software di dispositivi che interconnettono componenti hardware di forwarding in una rete.
Resumo:
Studio ed analisi delle principali tecniche in ambito di Social Data Analysis. Progettazione e Realizzazione di una soluzione software implementata con linguaggio Java in ambiente Eclipse. Il software realizzato permette di integrare differenti servizi di API REST, per l'estrazione di dati sociali da Twitter, la loro memorizzazione in un database non-relazionale (realizzato con MongoDB), e la loro gestione. Inoltre permette di effettuare operazioni di classificazione di topic, e di analizzare dati complessivi sulle collection di dati estratti. Infine permette di visualizzare un albero delle "ricondivisioni", partendo da singoli tweet selezionati, ed una mappa geo-localizzata, contenente gli utenti coinvolti nella catena di ricondivisioni, e i relativi archi di "retweet".
Resumo:
A time series is a sequence of observations made over time. Examples in public health include daily ozone concentrations, weekly admissions to an emergency department or annual expenditures on health care in the United States. Time series models are used to describe the dependence of the response at each time on predictor variables including covariates and possibly previous values in the series. Time series methods are necessary to account for the correlation among repeated responses over time. This paper gives an overview of time series ideas and methods used in public health research.
Resumo:
Nitrogen and water are essential for plant growth and development. In this study, we designed experiments to produce gene expression data of poplar roots under nitrogen starvation and water deprivation conditions. We found low concentration of nitrogen led first to increased root elongation followed by lateral root proliferation and eventually increased root biomass. To identify genes regulating root growth and development under nitrogen starvation and water deprivation, we designed a series of data analysis procedures, through which, we have successfully identified biologically important genes. Differentially Expressed Genes (DEGs) analysis identified the genes that are differentially expressed under nitrogen starvation or drought. Protein domain enrichment analysis identified enriched themes (in same domains) that are highly interactive during the treatment. Gene Ontology (GO) enrichment analysis allowed us to identify biological process changed during nitrogen starvation. Based on the above analyses, we examined the local Gene Regulatory Network (GRN) and identified a number of transcription factors. After testing, one of them is a high hierarchically ranked transcription factor that affects root growth under nitrogen starvation. It is very tedious and time-consuming to analyze gene expression data. To avoid doing analysis manually, we attempt to automate a computational pipeline that now can be used for identification of DEGs and protein domain analysis in a single run. It is implemented in scripts of Perl and R.
Resumo:
Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of “resources-on-demand” and “pay-as-you-go”, scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client’s site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation.
New methods for quantification and analysis of quantitative real-time polymerase chain reaction data
Resumo:
Quantitative real-time polymerase chain reaction (qPCR) is a sensitive gene quantitation method that has been widely used in the biological and biomedical fields. The currently used methods for PCR data analysis, including the threshold cycle (CT) method, linear and non-linear model fitting methods, all require subtracting background fluorescence. However, the removal of background fluorescence is usually inaccurate, and therefore can distort results. Here, we propose a new method, the taking-difference linear regression method, to overcome this limitation. Briefly, for each two consecutive PCR cycles, we subtracted the fluorescence in the former cycle from that in the later cycle, transforming the n cycle raw data into n-1 cycle data. Then linear regression was applied to the natural logarithm of the transformed data. Finally, amplification efficiencies and the initial DNA molecular numbers were calculated for each PCR run. To evaluate this new method, we compared it in terms of accuracy and precision with the original linear regression method with three background corrections, being the mean of cycles 1-3, the mean of cycles 3-7, and the minimum. Three criteria, including threshold identification, max R2, and max slope, were employed to search for target data points. Considering that PCR data are time series data, we also applied linear mixed models. Collectively, when the threshold identification criterion was applied and when the linear mixed model was adopted, the taking-difference linear regression method was superior as it gave an accurate estimation of initial DNA amount and a reasonable estimation of PCR amplification efficiencies. When the criteria of max R2 and max slope were used, the original linear regression method gave an accurate estimation of initial DNA amount. Overall, the taking-difference linear regression method avoids the error in subtracting an unknown background and thus it is theoretically more accurate and reliable. This method is easy to perform and the taking-difference strategy can be extended to all current methods for qPCR data analysis.^
Resumo:
This research is concerned with the experimental software engineering area, specifically experiment replication. Replication has traditionally been viewed as a complex task in software engineering. This is possibly due to the present immaturity of the experimental paradigm applied to software development. Researchers usually use replication packages to replicate an experiment. However, replication packages are not the solution to all the information management problems that crop up when successive replications of an experiment accumulate. This research borrows ideas from the software configuration management and software product line paradigms to support the replication process. We believe that configuration management can help to manage and administer information from one replication to another: hypotheses, designs, data analysis, etc. The software product line paradigm can help to organize and manage any changes introduced into the experiment by each replication. We expect the union of the two paradigms in replication to improve the planning, design and execution of further replications and their alignment with existing replications. Additionally, this research work will contribute a web support environment for archiving information related to different experiment replications. Additionally, it will provide flexible enough information management support for running replications with different numbers and types of changes. Finally, it will afford massive storage of data from different replications. Experimenters working collaboratively on the same experiment must all have access to the different experiments.
Resumo:
A new version of the TomoRebuild data reduction software package is presented, for the reconstruction of scanning transmission ion microscopy tomography (STIMT) and particle induced X-ray emission tomography (PIXET) images. First, we present a state of the art of the reconstruction codes available for ion beam microtomography. The algorithm proposed here brings several advantages. It is a portable, multi-platform code, designed in C++ with well-separated classes for easier use and evolution. Data reduction is separated in different steps and the intermediate results may be checked if necessary. Although no additional graphic library or numerical tool is required to run the program as a command line, a user friendly interface was designed in Java, as an ImageJ plugin. All experimental and reconstruction parameters may be entered either through this plugin or directly in text format files. A simple standard format is proposed for the input of experimental data. Optional graphic applications using the ROOT interface may be used separately to display and fit energy spectra. Regarding the reconstruction process, the filtered backprojection (FBP) algorithm, already present in the previous version of the code, was optimized so that it is about 10 times as fast. In addition, Maximum Likelihood Expectation Maximization (MLEM) and its accelerated version Ordered Subsets Expectation Maximization (OSEM) algorithms were implemented. A detailed user guide in English is available. A reconstruction example of experimental data from a biological sample is given. It shows the capability of the code to reduce noise in the sinograms and to deal with incomplete data, which puts a new perspective on tomography using low number of projections or limited angle.
Resumo:
High-Performance Computing, Cloud computing and next-generation applications such e-Health or Smart Cities have dramatically increased the computational demand of Data Centers. The huge energy consumption, increasing levels of CO2 and the economic costs of these facilities represent a challenge for industry and researchers alike. Recent research trends propose the usage of holistic optimization techniques to jointly minimize Data Center computational and cooling costs from a multilevel perspective. This paper presents an analysis on the parameters needed to integrate the Data Center in a holistic optimization framework and leverages the usage of Cyber-Physical systems to gather workload, server and environmental data via software techniques and by deploying a non-intrusive Wireless Sensor Net- work (WSN). This solution tackles data sampling, retrieval and storage from a reconfigurable perspective, reducing the amount of data generated for optimization by a 68% without information loss, doubling the lifetime of the WSN nodes and allowing runtime energy minimization techniques in a real scenario.