926 resultados para visual data analysis


Relevância:

100.00% 100.00%

Publicador:

Resumo:

A time series is a sequence of observations made over time. Examples in public health include daily ozone concentrations, weekly admissions to an emergency department or annual expenditures on health care in the United States. Time series models are used to describe the dependence of the response at each time on predictor variables including covariates and possibly previous values in the series. Time series methods are necessary to account for the correlation among repeated responses over time. This paper gives an overview of time series ideas and methods used in public health research.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Nitrogen and water are essential for plant growth and development. In this study, we designed experiments to produce gene expression data of poplar roots under nitrogen starvation and water deprivation conditions. We found low concentration of nitrogen led first to increased root elongation followed by lateral root proliferation and eventually increased root biomass. To identify genes regulating root growth and development under nitrogen starvation and water deprivation, we designed a series of data analysis procedures, through which, we have successfully identified biologically important genes. Differentially Expressed Genes (DEGs) analysis identified the genes that are differentially expressed under nitrogen starvation or drought. Protein domain enrichment analysis identified enriched themes (in same domains) that are highly interactive during the treatment. Gene Ontology (GO) enrichment analysis allowed us to identify biological process changed during nitrogen starvation. Based on the above analyses, we examined the local Gene Regulatory Network (GRN) and identified a number of transcription factors. After testing, one of them is a high hierarchically ranked transcription factor that affects root growth under nitrogen starvation. It is very tedious and time-consuming to analyze gene expression data. To avoid doing analysis manually, we attempt to automate a computational pipeline that now can be used for identification of DEGs and protein domain analysis in a single run. It is implemented in scripts of Perl and R.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of “resources-on-demand” and “pay-as-you-go”, scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client’s site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Quantitative real-time polymerase chain reaction (qPCR) is a sensitive gene quantitation method that has been widely used in the biological and biomedical fields. The currently used methods for PCR data analysis, including the threshold cycle (CT) method, linear and non-linear model fitting methods, all require subtracting background fluorescence. However, the removal of background fluorescence is usually inaccurate, and therefore can distort results. Here, we propose a new method, the taking-difference linear regression method, to overcome this limitation. Briefly, for each two consecutive PCR cycles, we subtracted the fluorescence in the former cycle from that in the later cycle, transforming the n cycle raw data into n-1 cycle data. Then linear regression was applied to the natural logarithm of the transformed data. Finally, amplification efficiencies and the initial DNA molecular numbers were calculated for each PCR run. To evaluate this new method, we compared it in terms of accuracy and precision with the original linear regression method with three background corrections, being the mean of cycles 1-3, the mean of cycles 3-7, and the minimum. Three criteria, including threshold identification, max R2, and max slope, were employed to search for target data points. Considering that PCR data are time series data, we also applied linear mixed models. Collectively, when the threshold identification criterion was applied and when the linear mixed model was adopted, the taking-difference linear regression method was superior as it gave an accurate estimation of initial DNA amount and a reasonable estimation of PCR amplification efficiencies. When the criteria of max R2 and max slope were used, the original linear regression method gave an accurate estimation of initial DNA amount. Overall, the taking-difference linear regression method avoids the error in subtracting an unknown background and thus it is theoretically more accurate and reliable. This method is easy to perform and the taking-difference strategy can be extended to all current methods for qPCR data analysis.^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Tradicionalmente, el uso de técnicas de análisis de datos ha sido una de las principales vías para el descubrimiento de conocimiento oculto en grandes cantidades de datos, recopilados por expertos en diferentes dominios. Por otra parte, las técnicas de visualización también se han usado para mejorar y facilitar este proceso. Sin embargo, existen limitaciones serias en la obtención de conocimiento, ya que suele ser un proceso lento, tedioso y en muchas ocasiones infructífero, debido a la dificultad de las personas para comprender conjuntos de datos de grandes dimensiones. Otro gran inconveniente, pocas veces tenido en cuenta por los expertos que analizan grandes conjuntos de datos, es la degradación involuntaria a la que someten a los datos durante las tareas de análisis, previas a la obtención final de conclusiones. Por degradación quiere decirse que los datos pueden perder sus propiedades originales, y suele producirse por una reducción inapropiada de los datos, alterando así su naturaleza original y llevando en muchos casos a interpretaciones y conclusiones erróneas que podrían tener serias implicaciones. Además, este hecho adquiere una importancia trascendental cuando los datos pertenecen al dominio médico o biológico, y la vida de diferentes personas depende de esta toma final de decisiones, en algunas ocasiones llevada a cabo de forma inapropiada. Ésta es la motivación de la presente tesis, la cual propone un nuevo framework visual, llamado MedVir, que combina la potencia de técnicas avanzadas de visualización y minería de datos para tratar de dar solución a estos grandes inconvenientes existentes en el proceso de descubrimiento de información válida. El objetivo principal es hacer más fácil, comprensible, intuitivo y rápido el proceso de adquisición de conocimiento al que se enfrentan los expertos cuando trabajan con grandes conjuntos de datos en diferentes dominios. Para ello, en primer lugar, se lleva a cabo una fuerte disminución en el tamaño de los datos con el objetivo de facilitar al experto su manejo, y a la vez preservando intactas, en la medida de lo posible, sus propiedades originales. Después, se hace uso de efectivas técnicas de visualización para representar los datos obtenidos, permitiendo al experto interactuar de forma sencilla e intuitiva con los datos, llevar a cabo diferentes tareas de análisis de datos y así estimular visualmente su capacidad de comprensión. De este modo, el objetivo subyacente se basa en abstraer al experto, en la medida de lo posible, de la complejidad de sus datos originales para presentarle una versión más comprensible, que facilite y acelere la tarea final de descubrimiento de conocimiento. MedVir se ha aplicado satisfactoriamente, entre otros, al campo de la magnetoencefalografía (MEG), que consiste en la predicción en la rehabilitación de lesiones cerebrales traumáticas (Traumatic Brain Injury (TBI) rehabilitation prediction). Los resultados obtenidos demuestran la efectividad del framework a la hora de acelerar y facilitar el proceso de descubrimiento de conocimiento sobre conjuntos de datos reales. ABSTRACT Traditionally, the use of data analysis techniques has been one of the main ways of discovering knowledge hidden in large amounts of data, collected by experts in different domains. Moreover, visualization techniques have also been used to enhance and facilitate this process. However, there are serious limitations in the process of knowledge acquisition, as it is often a slow, tedious and many times fruitless process, due to the difficulty for human beings to understand large datasets. Another major drawback, rarely considered by experts that analyze large datasets, is the involuntary degradation to which they subject the data during analysis tasks, prior to obtaining the final conclusions. Degradation means that data can lose part of their original properties, and it is usually caused by improper data reduction, thereby altering their original nature and often leading to erroneous interpretations and conclusions that could have serious implications. Furthermore, this fact gains a trascendental importance when the data belong to medical or biological domain, and the lives of people depends on the final decision-making, which is sometimes conducted improperly. This is the motivation of this thesis, which proposes a new visual framework, called MedVir, which combines the power of advanced visualization techniques and data mining to try to solve these major problems existing in the process of discovery of valid information. Thus, the main objective is to facilitate and to make more understandable, intuitive and fast the process of knowledge acquisition that experts face when working with large datasets in different domains. To achieve this, first, a strong reduction in the size of the data is carried out in order to make the management of the data easier to the expert, while preserving intact, as far as possible, the original properties of the data. Then, effective visualization techniques are used to represent the obtained data, allowing the expert to interact easily and intuitively with the data, to carry out different data analysis tasks, and so visually stimulating their comprehension capacity. Therefore, the underlying objective is based on abstracting the expert, as far as possible, from the complexity of the original data to present him a more understandable version, thus facilitating and accelerating the task of knowledge discovery. MedVir has been succesfully applied to, among others, the field of magnetoencephalography (MEG), which consists in predicting the rehabilitation of Traumatic Brain Injury (TBI). The results obtained successfully demonstrate the effectiveness of the framework to accelerate and facilitate the process of knowledge discovery on real world datasets.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

La gran cantidad de datos que se registran diariamente en los sistemas de base de datos de las organizaciones ha generado la necesidad de analizarla. Sin embargo, se enfrentan a la complejidad de procesar enormes volúmenes de datos a través de métodos tradicionales de análisis. Además, dentro de un contexto globalizado y competitivo las organizaciones se mantienen en la búsqueda constante de mejorar sus procesos, para lo cual requieren herramientas que les permitan tomar mejores decisiones. Esto implica estar mejor informado y conocer su historia digital para describir sus procesos y poder anticipar (predecir) eventos no previstos. Estos nuevos requerimientos de análisis de datos ha motivado el desarrollo creciente de proyectos de minería de datos. El proceso de minería de datos busca obtener desde un conjunto masivo de datos, modelos que permitan describir los datos o predecir nuevas instancias en el conjunto. Implica etapas de: preparación de los datos, procesamiento parcial o totalmente automatizado para identificar modelos en los datos, para luego obtener como salida patrones, relaciones o reglas. Esta salida debe significar un nuevo conocimiento para la organización, útil y comprensible para los usuarios finales, y que pueda ser integrado a los procesos para apoyar la toma de decisiones. Sin embargo, la mayor dificultad es justamente lograr que el analista de datos, que interviene en todo este proceso, pueda identificar modelos lo cual es una tarea compleja y muchas veces requiere de la experiencia, no sólo del analista de datos, sino que también del experto en el dominio del problema. Una forma de apoyar el análisis de datos, modelos y patrones es a través de su representación visual, utilizando las capacidades de percepción visual del ser humano, la cual puede detectar patrones con mayor facilidad. Bajo este enfoque, la visualización ha sido utilizada en minería datos, mayormente en el análisis descriptivo de los datos (entrada) y en la presentación de los patrones (salida), dejando limitado este paradigma para el análisis de modelos. El presente documento describe el desarrollo de la Tesis Doctoral denominada “Nuevos Esquemas de Visualizaciones para Mejorar la Comprensibilidad de Modelos de Data Mining”. Esta investigación busca aportar con un enfoque de visualización para apoyar la comprensión de modelos minería de datos, para esto propone la metáfora de modelos visualmente aumentados. ABSTRACT The large amount of data to be recorded daily in the systems database of organizations has generated the need to analyze it. However, faced with the complexity of processing huge volumes of data over traditional methods of analysis. Moreover, in a globalized and competitive environment organizations are kept constantly looking to improve their processes, which require tools that allow them to make better decisions. This involves being bettered informed and knows your digital story to describe its processes and to anticipate (predict) unanticipated events. These new requirements of data analysis, has led to the increasing development of data-mining projects. The data-mining process seeks to obtain from a massive data set, models to describe the data or predict new instances in the set. It involves steps of data preparation, partially or fully automated processing to identify patterns in the data, and then get output patterns, relationships or rules. This output must mean new knowledge for the organization, useful and understandable for end users, and can be integrated into the process to support decision-making. However, the biggest challenge is just getting the data analyst involved in this process, which can identify models is complex and often requires experience not only of the data analyst, but also the expert in the problem domain. One way to support the analysis of the data, models and patterns, is through its visual representation, i.e., using the capabilities of human visual perception, which can detect patterns easily in any context. Under this approach, the visualization has been used in data mining, mostly in exploratory data analysis (input) and the presentation of the patterns (output), leaving limited this paradigm for analyzing models. This document describes the development of the doctoral thesis entitled "New Visualizations Schemes to Improve Understandability of Data-Mining Models". This research aims to provide a visualization approach to support understanding of data mining models for this proposed metaphor visually enhanced models.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Texas State Department of Highways and Public Transportation, Transportation Planning Division, Austin

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Today, the data available to tackle many scientific challenges is vast in quantity and diverse in nature. The exploration of heterogeneous information spaces requires suitable mining algorithms as well as effective visual interfaces. miniDVMS v1.8 provides a flexible visual data mining framework which combines advanced projection algorithms developed in the machine learning domain and visual techniques developed in the information visualisation domain. The advantage of this interface is that the user is directly involved in the data mining process. Principled projection methods, such as generative topographic mapping (GTM) and hierarchical GTM (HGTM), are integrated with powerful visual techniques, such as magnification factors, directional curvatures, parallel coordinates, and user interaction facilities, to provide this integrated visual data mining framework. The software also supports conventional visualisation techniques such as principal component analysis (PCA), Neuroscale, and PhiVis. This user manual gives an overview of the purpose of the software tool, highlights some of the issues to be taken care while creating a new model, and provides information about how to install and use the tool. The user manual does not require the readers to have familiarity with the algorithms it implements. Basic computing skills are enough to operate the software.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This article is aimed primarily at eye care practitioners who are undertaking advanced clinical research, and who wish to apply analysis of variance (ANOVA) to their data. ANOVA is a data analysis method of great utility and flexibility. This article describes why and how ANOVA was developed, the basic logic which underlies the method and the assumptions that the method makes for it to be validly applied to data from clinical experiments in optometry. The application of the method to the analysis of a simple data set is then described. In addition, the methods available for making planned comparisons between treatment means and for making post hoc tests are evaluated. The problem of determining the number of replicates or patients required in a given experimental situation is also discussed. Copyright (C) 2000 The College of Optometrists.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Visual mental imagery is a complex process that may be influenced by the content of mental images. Neuropsychological evidence from patients with hemineglect suggests that in the imagery domain environments and objects may be represented separately and may be selectively affected by brain lesions. In the present study, we used functional magnetic resonance imaging (fMRI) to assess the possibility of neural segregation among mental images depicting parts of an object, of an environment (imagined from a first-person perspective), and of a geographical map, using both a mass univariate and a multivariate approach. Data show that different brain areas are involved in different types of mental images. Imagining an environment relies mainly on regions known to be involved in navigational skills, such as the retrosplenial complex and parahippocampal gyrus, whereas imagining a geographical map mainly requires activation of the left angular gyrus, known to be involved in the representation of categorical relations. Imagining a familiar object mainly requires activation of parietal areas involved in visual space analysis in both the imagery and the perceptual domain. We also found that the pattern of activity in most of these areas specifically codes for the spatial arrangement of the parts of the mental image. Our results clearly demonstrate a functional neural segregation for different contents of mental images and suggest that visuospatial information is coded by different patterns of activity in brain areas involved in visual mental imagery. Hum Brain Mapp 36:945-958, 2015.