861 results for Curricular Support Data Analysis
Abstract:
Dimensionality reduction is employed in visual data analysis as a way to obtain reduced spaces for high-dimensional data or to map data directly into 2D or 3D spaces. Although techniques have evolved to improve data segregation in reduced or visual spaces, they have limited capabilities for adjusting the results according to the user's knowledge. In this paper, we propose a novel approach to handling both dimensionality reduction and visualization of high-dimensional data that takes the user's input into account. It employs Partial Least Squares (PLS), a statistical tool, to retrieve latent spaces that focus on the discriminability of the data. The method uses a training set to build a highly precise model that can then be applied very effectively to a much larger data set. The reduced data set can be exhibited using various existing visualization techniques. The training data is important for encoding the user's knowledge into the loop. However, this work also devises a strategy for calculating PLS reduced spaces when no training data is available. The approach produces increasingly precise visual mappings as the user feeds back his or her knowledge, and it is capable of working with small and unbalanced training sets.
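The projection step can be pictured with a short sketch. This is a minimal illustration, not the paper's implementation: scikit-learn's PLSRegression stands in for the paper's PLS formulation, and the data, labels, and set sizes are made-up assumptions.

```python
# A minimal sketch of PLS-based visual mapping, assuming scikit-learn's
# PLSRegression as a stand-in for the paper's PLS formulation.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def pls_projection(X_train, y_train, X_all, n_components=2):
    """Fit PLS on a small labeled training set, then project all data."""
    # One-hot encode class labels so PLS seeks directions that separate classes.
    Y = np.eye(int(y_train.max()) + 1)[y_train]
    pls = PLSRegression(n_components=n_components, scale=True)
    pls.fit(X_train, Y)
    # transform() maps any sample into the learned 2D latent (visual) space.
    return pls.transform(X_all)

# Hypothetical usage: 50 labeled samples guide the projection of 10,000.
rng = np.random.default_rng(0)
X_all = rng.normal(size=(10_000, 200))
idx = rng.choice(10_000, size=50, replace=False)
coords_2d = pls_projection(X_all[idx], rng.integers(0, 3, size=50), X_all)
```

The 2D coordinates returned here can then be fed to any scatter-plot-based visualization technique, which is the role the reduced space plays in the paper.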
Abstract:
In the past decade, the advent of efficient genome sequencing tools and high-throughput experimental biotechnology has led to enormous progress in the life sciences. Among the most important innovations is microarray technology, which allows the expression of thousands of genes to be quantified simultaneously by measuring hybridization from a tissue of interest to probes on a small glass or plastic slide. Characteristics of these data include a fair amount of random noise, a predictor dimension in the thousands, and a sample size in the dozens. One of the most exciting areas to which microarray technology has been applied is the challenge of deciphering complex diseases such as cancer. In these studies, samples are taken from two or more groups of individuals with heterogeneous phenotypes, pathologies, or clinical outcomes. These samples are hybridized to microarrays in an effort to find a small number of genes that are strongly correlated with the groups of individuals. Even though methods to analyse such data are now well developed and close to reaching a standard organization (through the efforts of international projects such as the Microarray Gene Expression Data (MGED) Society [1]), it is not infrequent to encounter a clinician's question for which no compelling statistical method exists to answer it. The contribution of this dissertation to deciphering disease is the development of new approaches aimed at handling open problems posed by clinicians in specific experimental designs. Chapter 1, starting from a necessary biological introduction, reviews microarray technologies and all the important steps of an experiment, from the production of the array through quality controls to the preprocessing steps used in the data analysis in the rest of the dissertation. Chapter 2 provides a critical review of standard analysis methods, stressing their main open problems. Chapter 3 introduces a method to address the issue of unbalanced design in microarray experiments. In microarray experiments, experimental design is a crucial starting point for obtaining reasonable results. In a two-class problem, an equal or similar number of samples should be collected for the two classes. In some cases, however, e.g. rare pathologies, the approach to be taken is less evident. We propose to address this issue by applying a modified version of SAM [2]. MultiSAM consists of a reiterated application of a SAM analysis, comparing the less populated class (LPC) with 1,000 random samplings of the same size from the more populated class (MPC). A list of the differentially expressed genes is generated for each SAM application. After 1,000 reiterations, each probe is given a "score" ranging from 0 to 1,000 based on its recurrence as differentially expressed across the 1,000 lists. The performance of MultiSAM was compared to that of SAM and LIMMA [3] over two simulated data sets generated via beta and exponential distributions. The results of all three algorithms over low-noise data sets seem acceptable. However, on a real unbalanced two-channel data set regarding Chronic Lymphocytic Leukemia, LIMMA finds no significant probe and SAM finds 23 significantly changed probes but cannot separate the two classes, while MultiSAM finds 122 probes with score >300 and separates the data into two clusters by hierarchical clustering.
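A minimal sketch of this resampling scheme follows. SAM itself is not re-implemented here; a Welch t-test stands in for each per-iteration analysis, and the significance threshold is an illustrative assumption.

```python
# A minimal sketch of the MultiSAM-style resampling scheme, with a Welch
# t-test standing in for each per-iteration SAM analysis (an assumption
# made purely for illustration).
import numpy as np
from scipy.stats import ttest_ind

def multisam_scores(lpc, mpc, n_iter=1000, alpha=0.01, seed=0):
    """lpc: (n_lpc, n_probes) less-populated class; mpc: (n_mpc, n_probes).

    Returns per-probe scores in [0, n_iter]: how often each probe is called
    differentially expressed across random balanced subsamplings of the MPC."""
    rng = np.random.default_rng(seed)
    n_lpc, n_probes = lpc.shape
    scores = np.zeros(n_probes, dtype=int)
    for _ in range(n_iter):
        # Draw a random MPC subsample of the same size as the LPC.
        sub = mpc[rng.choice(mpc.shape[0], size=n_lpc, replace=False)]
        _, pvals = ttest_ind(lpc, sub, axis=0, equal_var=False)
        scores += pvals < alpha
    return scores  # e.g. keep probes with score > 300, as in the abstract
```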
We also report extra-assay validation in terms of differentially expressed genes. Although standard algorithms perform well over low-noise simulated data sets, MultiSAM seems to be the only one able to reveal subtle differences in gene expression profiles on real unbalanced data. Chapter 4 describes a method to address the evaluation of similarities in a three-class problem by means of the Relevance Vector Machine [4]. Indeed, looking at microarray data in a prognostic and diagnostic clinical framework, differences are not the only thing that can play a crucial role: in some cases similarities can give useful and sometimes even more important information. Given three classes, the goal could be to establish, with a certain level of confidence, whether the third is similar to the first or to the second. In this work we show that the Relevance Vector Machine (RVM) [4] could be a possible solution to the limitations of standard supervised classification. RVM offers many advantages compared, for example, with its well-known precursor, the Support Vector Machine (SVM). Among these advantages, the estimate of the posterior probability of class membership is a key feature for addressing the similarity issue, and it is a highly important, but often overlooked, option in any practical pattern recognition system. We focused on a three-class tumor-grade problem, with 67 samples of grade 1 (G1), 54 samples of grade 3 (G3), and 100 samples of grade 2 (G2). The goal is to find a model able to separate G1 from G3, and then to evaluate the third class, G2, as a test set to obtain the probability of each G2 sample being a member of class G1 or class G3. The analysis showed that breast cancer samples of grade 2 have a molecular profile more similar to that of breast cancer samples of grade 1. This result had been conjectured in the literature, but no measure of significance had been given before.
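The evaluation protocol can be sketched as follows. Since RVM is not available in scikit-learn, a logistic regression stands in here purely to show the use of posterior class probabilities; it is an assumption for illustration, not the dissertation's actual model.

```python
# A minimal sketch of the G1-vs-G3 / evaluate-G2 protocol, with logistic
# regression as a probabilistic stand-in for RVM (illustrative assumption).
import numpy as np
from sklearn.linear_model import LogisticRegression

def grade_similarity(X_g1, X_g3, X_g2):
    """Train on grade 1 vs grade 3, then score every grade 2 sample."""
    X = np.vstack([X_g1, X_g3])
    y = np.r_[np.zeros(len(X_g1)), np.ones(len(X_g3))]  # 0 = G1, 1 = G3
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    # Posterior probability that each G2 sample belongs to class G1.
    return clf.predict_proba(X_g2)[:, 0]  # values > 0.5 read as "more G1-like"

# Hypothetical expression matrices matching the abstract's sample counts.
rng = np.random.default_rng(4)
p_g1 = grade_similarity(rng.normal(0.0, 1, (67, 500)),
                        rng.normal(1.0, 1, (54, 500)),
                        rng.normal(0.3, 1, (100, 500)))
```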
Abstract:
Soil erosion on sloping agricultural land poses a serious problem for the environment, as well as for production. In areas with highly erodible soils, such as those in loess zones, application of soil and water conservation measures is crucial to sustain agricultural yields and to prevent or reduce land degradation. The present study, carried out in Faizabad, Tajikistan, was designed to evaluate the potential of local conservation measures on cropland using a spatial modelling approach to provide decision-making support for the planning of spatially explicit sustainable land use. A sampling design to support comparative analysis between well-conserved units and other field units was established in order to estimate the factors that determine water erosion, according to the Revised Universal Soil Loss Equation (RUSLE). Such factor-based approaches allow ready application using a geographic information system (GIS) and facilitate straightforward scenario modelling in areas with limited data resources. The study showed, first, that assessment of erosion and conservation in an area with inhomogeneous vegetation cover requires the integration of plot-based vegetation cover. Plot-based vegetation cover can be effectively derived from high-resolution satellite imagery, providing a useful basis for plot-wise conservation planning. Furthermore, thorough field assessments showed that 25.7% of current total cropland is covered by conservation measures (terracing, agroforestry and perennial herbaceous fodder). Assessment of the effectiveness of these local measures, combined with the RUSLE calculations, revealed that current average soil loss could be reduced through low-cost measures such as contouring (by 11%), fodder plants (by 16%), and drainage ditches (by 53%). More expensive measures such as terracing and agroforestry can reduce erosion by as much as 63% (for agroforestry) and 93% (for agroforestry combined with terracing). Indeed, scenario runs for different levels of tolerable erosion rates showed that more cost-intensive and technologically advanced measures would lead to greater reduction of soil loss. However, given economic conditions in Tajikistan, it seems advisable to support the spread of low-cost and labour-extensive measures.
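The factor-based calculation at the core of this approach multiplies the RUSLE factors cell by cell over co-registered grids. The sketch below illustrates it with made-up factor values; representing a conservation scenario as a simple rescaling of the P factor is likewise an assumption for illustration, since the study derives its factors from field sampling and satellite imagery.

```python
# A minimal sketch of factor-based RUSLE soil-loss estimation on gridded data.
import numpy as np

def rusle(R, K, LS, C, P):
    """A = R * K * LS * C * P, elementwise over co-registered factor grids.

    R: rainfall erosivity, K: soil erodibility, LS: slope length/steepness,
    C: cover management, P: support practice. A is soil loss (t/ha/yr)."""
    return R * K * LS * C * P

# Hypothetical 3x3 cropland grid; a conservation scenario rescales P.
R, K, LS = 600.0, 0.4, np.full((3, 3), 2.5)
C_current, P_current = 0.25, 1.0
baseline = rusle(R, K, LS, C_current, P_current)
contoured = rusle(R, K, LS, C_current, P_current * (1 - 0.11))  # ~11% less
```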
Abstract:
The Simulation Automation Framework for Experiments (SAFE) streamlines the design and execution of experiments with the ns-3 network simulator. SAFE ensures that best practices are followed throughout the workflow of a network simulation study, guaranteeing that results are both credible and reproducible by third parties. Data analysis is a crucial part of this workflow, and one where mistakes are often made. Even when appearing in highly regarded venues, scientific graphics in numerous network simulation publications fail to include titles, units, legends, and confidence intervals. After studying the literature in network simulation methodology and information graphics visualization, I developed a visualization component for SAFE to help users avoid these errors in their scientific workflow. The functionality of this new component includes support for interactive visualization through a web-based interface and for the generation of high-quality, static plots that can be included in publications. The overarching goal of my contribution is to help users create graphics that follow best practices in visualization and thereby succeed in conveying the right information about simulation results.
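A minimal sketch of the plotting conventions the component promotes (a title, axis labels with units, a legend, and confidence intervals) might look as follows. The data, names, and normal-approximation interval are illustrative assumptions, not SAFE's actual API.

```python
# A minimal sketch of a publication-quality plot with title, units, legend,
# and confidence intervals, as the component's best practices require.
import numpy as np
import matplotlib.pyplot as plt

runs = np.random.default_rng(1).normal(40.0, 4.0, size=(30, 10))  # 30 replications
x = np.arange(10)                       # e.g. offered load steps
mean = runs.mean(axis=0)
# A t-distribution half-width would be more rigorous; a normal approximation
# of the 95% CI is used here for brevity.
ci = 1.96 * runs.std(axis=0, ddof=1) / np.sqrt(runs.shape[0])

fig, ax = plt.subplots()
ax.errorbar(x, mean, yerr=ci, label="mean of 30 runs (95% CI)")
ax.set_title("Simulated end-to-end delay vs. offered load")
ax.set_xlabel("Offered load (Mb/s)")
ax.set_ylabel("Delay (ms)")
ax.legend()
fig.savefig("delay.png", dpi=300)       # static, publication-quality output
```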
Abstract:
A time series is a sequence of observations made over time. Examples in public health include daily ozone concentrations, weekly admissions to an emergency department or annual expenditures on health care in the United States. Time series models are used to describe the dependence of the response at each time on predictor variables including covariates and possibly previous values in the series. Time series methods are necessary to account for the correlation among repeated responses over time. This paper gives an overview of time series ideas and methods used in public health research.
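As a concrete illustration (not drawn from the paper), a regression of a health outcome on a covariate with autocorrelated AR(1) errors can be fit with statsmodels; the weekly admissions data here are simulated, and the AR(1) choice is an assumption.

```python
# A minimal sketch of a public-health time-series regression: weekly counts
# modeled on a covariate plus serially correlated errors (all data simulated).
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(2)
n = 156                                   # three years of weekly observations
ozone = rng.normal(50, 10, size=n)        # hypothetical covariate
noise = np.zeros(n)
for t in range(1, n):                     # AR(1) errors: this week depends on last
    noise[t] = 0.6 * noise[t - 1] + rng.normal(0, 2)
admissions = 100 + 0.5 * ozone + noise

# SARIMAX estimates the covariate effect and the serial correlation jointly,
# which ordinary regression would get wrong by ignoring the correlation.
model = SARIMAX(admissions, exog=ozone, order=(1, 0, 0), trend="c")
result = model.fit(disp=False)
print(result.summary())
```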
Abstract:
Nitrogen and water are essential for plant growth and development. In this study, we designed experiments to produce gene expression data from poplar roots under nitrogen starvation and water deprivation conditions. We found that a low concentration of nitrogen led first to increased root elongation, followed by lateral root proliferation and eventually increased root biomass. To identify genes regulating root growth and development under nitrogen starvation and water deprivation, we designed a series of data analysis procedures through which we successfully identified biologically important genes. Differentially Expressed Genes (DEGs) analysis identified the genes that are differentially expressed under nitrogen starvation or drought. Protein domain enrichment analysis identified enriched themes (in the same domains) that are highly interactive during the treatment. Gene Ontology (GO) enrichment analysis allowed us to identify biological processes that changed during nitrogen starvation. Based on the above analyses, we examined the local Gene Regulatory Network (GRN) and identified a number of transcription factors; after testing, one of them proved to be a transcription factor ranked high in the hierarchy that affects root growth under nitrogen starvation. Analyzing gene expression data manually is tedious and time-consuming, so we automated the analysis into a computational pipeline that can now be used for identification of DEGs and for protein domain analysis in a single run; it is implemented in Perl and R scripts.
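The DEG step of such a pipeline can be sketched as follows. The dissertation's pipeline is implemented in Perl and R, so this Python version, its test choice, and its thresholds are illustrative assumptions only.

```python
# A minimal sketch of a DEG step: per-gene tests between control and
# nitrogen-starved samples with multiple-testing correction.
import numpy as np
from scipy.stats import ttest_ind
from statsmodels.stats.multitest import multipletests

def find_degs(control, treated, fdr=0.05, min_log2fc=1.0):
    """control/treated: (n_samples, n_genes) log2 expression matrices."""
    _, pvals = ttest_ind(control, treated, axis=0, equal_var=False)
    reject, qvals, _, _ = multipletests(pvals, alpha=fdr, method="fdr_bh")
    log2fc = treated.mean(axis=0) - control.mean(axis=0)
    return np.flatnonzero(reject & (np.abs(log2fc) >= min_log2fc))

# Hypothetical data: 4 replicates, 5,000 genes, 50 genes truly changed.
rng = np.random.default_rng(3)
control = rng.normal(8, 1, size=(4, 5000))
treated = control + rng.normal(0, 1, size=(4, 5000))
treated[:, :50] += 2.0
deg_idx = find_degs(control, treated)
```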
Abstract:
Background. Various psychosocial factors have been demonstrated to be barriers to cervical cancer screening among Latinas in the United States, but few studies have researched whether depression and interpersonal violence act as psychosocial barriers to cervical cancer screening. Methods. The present study assessed whether depression, interpersonal violence, lack of social support, and demographic characteristics such as age, income, education, and years in the United States acted as barriers to cervical cancer screening among cantineras in Houston, TX. This secondary data analysis utilized data from a previous cross-sectional study called Project GIRASOL (Community Outreach to Prevent Cervical Cancer among Latinas). The data from the baseline survey (sample size 331) were analyzed using Pearson chi-square tests and multiple logistic regression. Results. Multiple logistic regression indicates that no or low levels of social support from relatives, depression, and total IPV are significant predictors of non-compliance with cervical cancer screening. Conclusions. Future health interventions or physicians that promote cervical cancer screening among cantineras, or among recently immigrated Latinas with similar socio-demographic characteristics, should try to identify whether these women are suffering from depression, interpersonal violence, or lack of social support, and should provide proper referrals to alleviate the problems and positively influence screening behavior.
Abstract:
Objectives. Obesity is a growing problem among children in the United States. Great efforts are being made to target this problem, both at home and at school. While parents and peers have proven an effective means of distributing information, the influence of teacher encouragement of health behaviors remains largely untapped. The purpose of this study is to assess the association of teacher encouragement with diet and physical activity behaviors and obesity in a sample of eighth-grade students in central Texas. Methods. In the spring of 2011, the Coordinated Approach to Child Health (CATCH) study distributed teacher surveys to each of the teachers in the schools on the grant. In addition to questions concerning the implementation of CATCH, this survey employed social support questions to gauge the prevalence of teacher encouragement of health behaviors in the classroom. During the same time frame, eighth graders in these same schools completed student surveys which assessed dietary and physical activity knowledge, behaviors, and demographics, and participated in objective measures of student height and weight. A cross-sectional secondary data analysis was conducted in order to compare self-reported teacher encouragement with student behaviors and several student obesity measures on a by-school basis. Results. 1150 teachers and 2582 students from 29 of the 30 measurement schools returned completed surveys. No statistically significant relationship was found between the six teacher encouragement measures and their corresponding student-reported health behaviors, nor was one found between the mean support per school and child percent overweight. A minimal positive relationship was found between the mean support per school and child BMI z-scores, BMI, and percent obese (p = 0.035, 0.003 and 0.003, respectively); however, these relationships were not in the predicted direction. Conclusion. While the findings of this investigation are primarily null results, motivating questions as to the impact of teacher encouragement on middle school students' health remain. It is possible that, in order to draw more effective conclusions, more comprehensive studies are warranted which specifically target these relationships.
New methods for quantification and analysis of quantitative real-time polymerase chain reaction data
Abstract:
Quantitative real-time polymerase chain reaction (qPCR) is a sensitive gene quantitation method that has been widely used in the biological and biomedical fields. The currently used methods for PCR data analysis, including the threshold cycle (CT) method and linear and non-linear model fitting methods, all require subtracting background fluorescence. However, the removal of background fluorescence is usually inaccurate and can therefore distort results. Here, we propose a new method, the taking-difference linear regression method, to overcome this limitation. Briefly, for each pair of consecutive PCR cycles, we subtracted the fluorescence in the former cycle from that in the latter cycle, transforming the n-cycle raw data into n-1 cycle data. Then linear regression was applied to the natural logarithm of the transformed data. Finally, amplification efficiencies and the initial DNA molecule numbers were calculated for each PCR run. To evaluate this new method, we compared it, in terms of accuracy and precision, with the original linear regression method under three background corrections: the mean of cycles 1-3, the mean of cycles 3-7, and the minimum. Three criteria, namely threshold identification, max R2, and max slope, were employed to search for target data points. Considering that PCR data are time series data, we also applied linear mixed models. Collectively, when the threshold identification criterion was applied and when the linear mixed model was adopted, the taking-difference linear regression method was superior, as it gave an accurate estimation of the initial DNA amount and a reasonable estimation of the PCR amplification efficiencies. When the criteria of max R2 and max slope were used, the original linear regression method gave an accurate estimation of the initial DNA amount. Overall, the taking-difference linear regression method avoids the error of subtracting an unknown background and is thus theoretically more accurate and reliable. This method is easy to perform, and the taking-difference strategy can be extended to all current methods for qPCR data analysis.
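The reason the method is background-free follows from the exponential model: if F_n = B + F0·E^n, then the difference d_n = F_{n+1} - F_n = F0·(E-1)·E^n, so the constant background B cancels and ln d_n is linear in n. A minimal sketch, with made-up run parameters and assuming the exponential-phase window has already been selected:

```python
# A minimal sketch of the taking-difference linear regression described above.
# If F_n = B + F0 * E**n, then d_n = F_{n+1} - F_n = F0 * (E - 1) * E**n,
# so the background B cancels and ln d_n is linear in the cycle number n.
import numpy as np

def taking_difference_fit(fluorescence, cycles):
    """Estimate amplification efficiency E and initial amount F0.

    fluorescence: raw readings for the given cycles, assumed to lie within
    the exponential phase (window selection is assumed done upstream)."""
    F = np.asarray(fluorescence, dtype=float)
    d = np.diff(F)                          # n readings -> n-1 differences
    n = np.asarray(cycles)[:-1]             # cycle index of each former cycle
    slope, intercept = np.polyfit(n, np.log(d), 1)
    E = np.exp(slope)                       # per-cycle amplification efficiency
    F0 = np.exp(intercept) / (E - 1.0)      # initial fluorescence ~ initial DNA
    return E, F0

# Hypothetical run: E = 1.9, F0 = 0.002, constant background B = 0.5.
cycles = np.arange(10, 20)
F = 0.5 + 0.002 * 1.9 ** cycles
E_hat, F0_hat = taking_difference_fit(F, cycles)  # recovers ~ (1.9, 0.002)
```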
Abstract:
Abstract interpretation has been widely used for the analysis of object-oriented languages and, more precisely, of Java source and bytecode. However, while most of the existing work deals with the problem of finding expressive abstract domains that accurately track the characteristics of a particular concrete property, the underlying fixpoint algorithms have received comparatively less attention. In fact, many existing (abstract interpretation based) fixpoint algorithms rely on relatively inefficient techniques to solve inter-procedural call graphs, or are specific and tied to particular analyses. We argue that the design of an efficient fixpoint algorithm is pivotal to supporting the analysis of large programs. In this paper we introduce a novel algorithm for the analysis of Java bytecode which includes a number of optimizations in order to reduce the number of iterations. The algorithm is parametric, in the sense that it is independent of the abstract domain used, and different domains can be applied as "plug-ins". It is also incremental, in the sense that, if desired, analysis data can be saved so that only a reduced amount of reanalysis is needed after a small program change, which can be instrumental for large programs. The algorithm is also multivariant and flow-sensitive. Finally, another interesting characteristic of the algorithm is that it is based on a program transformation, prior to the analysis, that results in a highly uniform representation of all the features in the language and therefore simplifies the analysis. Detailed descriptions of the decompilation solutions are provided and discussed with an example.
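The core of such an analysis can be pictured as a domain-parametric worklist fixpoint computation: the lattice operations are passed in as "plug-ins", as the paper describes for its domains. The interface and flow graph below are illustrative assumptions, not the paper's actual data structures.

```python
# A minimal sketch of a domain-parametric worklist fixpoint computation.
from collections import deque

def fixpoint(nodes, succs, transfer, bottom, join, leq, entry, init):
    """nodes: program points; succs[n]: successor points;
    transfer(n, v): abstract effect of point n on value v;
    join/leq: lattice operations of the plugged-in abstract domain."""
    state = {n: bottom for n in nodes}
    state[entry] = init
    worklist = deque([entry])
    while worklist:                         # iterate until no state changes
        n = worklist.popleft()
        out = transfer(n, state[n])
        for s in succs.get(n, ()):
            new = join(state[s], out)
            if not leq(new, state[s]):      # only re-queue on a real change
                state[s] = new
                worklist.append(s)
    return state

# Hypothetical plug-in domain: sets of signs over a two-node loop, with
# join = union, leq = subset, and an identity transfer for brevity.
succs = {0: [1], 1: [1]}
res = fixpoint([0, 1], succs, lambda n, v: v, frozenset(),
               frozenset.union, frozenset.issubset, 0, frozenset({"+"}))
```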
Abstract:
The large amount of data recorded daily in organizations' database systems has generated the need to analyze it. However, organizations face the complexity of processing huge volumes of data with traditional methods of analysis. Moreover, in a globalized and competitive environment, organizations are constantly looking to improve their processes, for which they require tools that allow them to make better decisions. This means being better informed and knowing their digital history, in order to describe their processes and to anticipate (predict) unforeseen events. These new data analysis requirements have led to the growing development of data mining projects. The data mining process seeks to obtain, from a massive data set, models that describe the data or predict new instances in the set. It involves stages of data preparation; partially or fully automated processing to identify models in the data; and, as output, patterns, relationships, or rules. This output must represent new knowledge for the organization, useful and understandable for end users, which can be integrated into its processes to support decision-making. The biggest difficulty, however, is precisely that identifying models is a complex task for the data analyst who takes part in this whole process, and it often requires the experience not only of the data analyst but also of the expert in the problem domain. One way to support the analysis of data, models, and patterns is through their visual representation, exploiting the visual perception capabilities of human beings, who can detect patterns more easily this way. Under this approach, visualization has been used in data mining mostly for descriptive analysis of the data (input) and for the presentation of the patterns (output), leaving this paradigm underused for the analysis of models. This document describes the development of the Doctoral Thesis entitled "Nuevos Esquemas de Visualizaciones para Mejorar la Comprensibilidad de Modelos de Data Mining" ("New Visualization Schemes to Improve the Understandability of Data Mining Models"). This research seeks to contribute a visualization approach to support the comprehension of data mining models; to this end, it proposes the metaphor of visually augmented models.
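As an illustration of visualizing the model itself rather than only its inputs and outputs, a fitted decision tree can be rendered directly. The choice of model and of API here is an illustrative assumption, not the thesis's method.

```python
# A minimal sketch of model visualization: drawing the mined model itself.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree
import matplotlib.pyplot as plt

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(max_depth=3).fit(X, y)

fig, ax = plt.subplots(figsize=(10, 6))
# plot_tree draws the fitted model, letting an analyst inspect the decision
# logic visually instead of reading rules as text.
plot_tree(model, filled=True, feature_names=load_iris().feature_names, ax=ax)
fig.savefig("model.png", dpi=200)
```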
Abstract:
This report was produced within the scope of the curricular unit Supervised Teaching Practice (Prática de Ensino Supervisionada, PES), part of the study plan of the Master's in Pre-School Education and Teaching in the 1st Cycle of Basic Education; it describes and analyses teaching/learning experiences carried out in Pre-School and 1st Cycle of Basic Education contexts. Our intention was to promote enriching experiences that would allow us to analyse the process of transition between these two contexts. We sought to involve the children in new realities fostering multiple social interactions, integrated into the school context, with the aim of strengthening educational relationships between the different levels of education. Since our investigative focus centred on the transition between the two educational levels, we sought to outline a question to guide our study: "What strategies can be developed in the contexts of Pre-School Education and the 1st Cycle of Basic Education that promote articulation, from a perspective of educational continuity, between these two contexts?". To answer this question, we defined the following objectives: to develop integrative strategies that would promote children's adaptation to each formative stage; to understand whether the strategies used were adequate and what effects they had on the educational transition; and to understand whether the learning acquired facilitated the articulation between the two educational levels, valuing the development of competences in this domain. To guide the investigation methodologically, we used a qualitative methodology, with data collected through participant observation, field notes, and photographic records, respecting the confidentiality inherent in research practice and requesting prior authorisation for its implementation, with two groups of children: one from Pre-School, aged 5 and 6, and another from the 1st Cycle of Basic Education, aged 6. The data analysis shows the ease with which the children adapted from Pre-School Education to the 1st CEB, demonstrating awareness that the two realities are different. They refer to the contexts as having distinct characteristics and as places where different, diversified activities take place. It is also noteworthy that the interpersonal relationships between and within groups were extremely important in facilitating the transition, with constant unity, companionship, collaboration, and support within the group.