931 resultados para Topological data analysis


Relevância:

100.00% 100.00%

Publicador:

Resumo:

A time series is a sequence of observations made over time. Examples in public health include daily ozone concentrations, weekly admissions to an emergency department or annual expenditures on health care in the United States. Time series models are used to describe the dependence of the response at each time on predictor variables including covariates and possibly previous values in the series. Time series methods are necessary to account for the correlation among repeated responses over time. This paper gives an overview of time series ideas and methods used in public health research.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Nitrogen and water are essential for plant growth and development. In this study, we designed experiments to produce gene expression data of poplar roots under nitrogen starvation and water deprivation conditions. We found low concentration of nitrogen led first to increased root elongation followed by lateral root proliferation and eventually increased root biomass. To identify genes regulating root growth and development under nitrogen starvation and water deprivation, we designed a series of data analysis procedures, through which, we have successfully identified biologically important genes. Differentially Expressed Genes (DEGs) analysis identified the genes that are differentially expressed under nitrogen starvation or drought. Protein domain enrichment analysis identified enriched themes (in same domains) that are highly interactive during the treatment. Gene Ontology (GO) enrichment analysis allowed us to identify biological process changed during nitrogen starvation. Based on the above analyses, we examined the local Gene Regulatory Network (GRN) and identified a number of transcription factors. After testing, one of them is a high hierarchically ranked transcription factor that affects root growth under nitrogen starvation. It is very tedious and time-consuming to analyze gene expression data. To avoid doing analysis manually, we attempt to automate a computational pipeline that now can be used for identification of DEGs and protein domain analysis in a single run. It is implemented in scripts of Perl and R.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of “resources-on-demand” and “pay-as-you-go”, scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client’s site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Quantitative real-time polymerase chain reaction (qPCR) is a sensitive gene quantitation method that has been widely used in the biological and biomedical fields. The currently used methods for PCR data analysis, including the threshold cycle (CT) method, linear and non-linear model fitting methods, all require subtracting background fluorescence. However, the removal of background fluorescence is usually inaccurate, and therefore can distort results. Here, we propose a new method, the taking-difference linear regression method, to overcome this limitation. Briefly, for each two consecutive PCR cycles, we subtracted the fluorescence in the former cycle from that in the later cycle, transforming the n cycle raw data into n-1 cycle data. Then linear regression was applied to the natural logarithm of the transformed data. Finally, amplification efficiencies and the initial DNA molecular numbers were calculated for each PCR run. To evaluate this new method, we compared it in terms of accuracy and precision with the original linear regression method with three background corrections, being the mean of cycles 1-3, the mean of cycles 3-7, and the minimum. Three criteria, including threshold identification, max R2, and max slope, were employed to search for target data points. Considering that PCR data are time series data, we also applied linear mixed models. Collectively, when the threshold identification criterion was applied and when the linear mixed model was adopted, the taking-difference linear regression method was superior as it gave an accurate estimation of initial DNA amount and a reasonable estimation of PCR amplification efficiencies. When the criteria of max R2 and max slope were used, the original linear regression method gave an accurate estimation of initial DNA amount. Overall, the taking-difference linear regression method avoids the error in subtracting an unknown background and thus it is theoretically more accurate and reliable. This method is easy to perform and the taking-difference strategy can be extended to all current methods for qPCR data analysis.^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Texas State Department of Highways and Public Transportation, Transportation Planning Division, Austin

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This article is aimed primarily at eye care practitioners who are undertaking advanced clinical research, and who wish to apply analysis of variance (ANOVA) to their data. ANOVA is a data analysis method of great utility and flexibility. This article describes why and how ANOVA was developed, the basic logic which underlies the method and the assumptions that the method makes for it to be validly applied to data from clinical experiments in optometry. The application of the method to the analysis of a simple data set is then described. In addition, the methods available for making planned comparisons between treatment means and for making post hoc tests are evaluated. The problem of determining the number of replicates or patients required in a given experimental situation is also discussed. Copyright (C) 2000 The College of Optometrists.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The use of quantitative methods has become increasingly important in the study of neuropathology and especially in neurodegenerative disease. Disorders such as Alzheimer's disease (AD) and the frontotemporal dementias (FTD) are characterized by the formation of discrete, microscopic, pathological lesions which play an important role in pathological diagnosis. This chapter reviews the advantages and limitations of the different methods of quantifying pathological lesions in histological sections including estimates of density, frequency, coverage, and the use of semi-quantitative scores. The sampling strategies by which these quantitative measures can be obtained from histological sections, including plot or quadrat sampling, transect sampling, and point-quarter sampling, are described. In addition, data analysis methods commonly used to analysis quantitative data in neuropathology, including analysis of variance (ANOVA), polynomial curve fitting, multiple regression, classification trees, and principal components analysis (PCA), are discussed. These methods are illustrated with reference to quantitative studies of a variety of neurodegenerative disorders.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents the results of our data mining study of Pb-Zn (lead-zinc) ore assay records from a mine enterprise in Bulgaria. We examined the dataset, cleaned outliers, visualized the data, and created dataset statistics. A Pb-Zn cluster data mining model was created for segmentation and prediction of Pb-Zn ore assay data. The Pb-Zn cluster data model consists of five clusters and DMX queries. We analyzed the Pb-Zn cluster content, size, structure, and characteristics. The set of the DMX queries allows for browsing and managing the clusters, as well as predicting ore assay records. A testing and validation of the Pb-Zn cluster data mining model was developed in order to show its reasonable accuracy before beingused in a production environment. The Pb-Zn cluster data mining model can be used for changes of the mine grinding and floatation processing parameters in almost real-time, which is important for the efficiency of the Pb-Zn ore beneficiation process. ACM Computing Classification System (1998): H.2.8, H.3.3.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Diabetes patients might suffer from an unhealthy life, long-term treatment and chronic complicated diseases. The decreasing hospitalization rate is a crucial problem for health care centers. This study combines the bagging method with base classifier decision tree and costs-sensitive analysis for diabetes patients' classification purpose. Real patients' data collected from a regional hospital in Thailand were analyzed. The relevance factors were selected and used to construct base classifier decision tree models to classify diabetes and non-diabetes patients. The bagging method was then applied to improve accuracy. Finally, asymmetric classification cost matrices were used to give more alternative models for diabetes data analysis.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Systematic, high-quality observations of the atmosphere, oceans and terrestrial environments are required to improve understanding of climate characteristics and the consequences of climate change. The overall aim of this report is to carry out a comparative assessment of approaches taken to addressing the state of European observations systems and related data analysis by some leading actors in the field. This research reports on approaches to climate observations and analyses in Ireland, Switzerland, Germany, The Netherlands and Austria and explores options for a more coordinated approach to national responses to climate observations in Europe. The key aspects addressed are: an assessment of approaches to develop GCOS and provision of analysis of GCOS data; an evaluation of how these countries are reporting development of GCOS; highlighting best practice in advancing GCOS implementation including analysis of Essential Climate Variables (ECVs); a comparative summary of the differences and synergies in terms of the reporting of climate observations; an overview of relevant European initiatives and recommendations on how identified gaps might be addressed in the short to medium term.