920 results for multivariate data analysis


Relevance:

90.00% 90.00%

Publisher:

Abstract:

Complex diseases such as cancer result from multiple genetic changes and environmental exposures. Owing to the rapid development of genotyping and sequencing technologies, we can now assess the causal effects of many genetic and environmental factors more accurately. Genome-wide association studies have localized many causal genetic variants that predispose to certain diseases. However, these studies explain only a small portion of the heritability of diseases. More advanced statistical models are urgently needed to identify and characterize additional genetic and environmental factors and their interactions, which will enable us to better understand the causes of complex diseases. In the past decade, thanks to increasing computational capabilities and novel statistical developments, Bayesian methods have been widely applied in genetics and genomics research and have demonstrated superiority over some standard approaches in certain research areas. Gene-environment and gene-gene interaction studies are among the areas where Bayesian methods may fully realize their advantages. This dissertation focuses on developing new Bayesian statistical methods for data analysis with complex gene-environment and gene-gene interactions, as well as extending some existing methods for gene-environment interactions to other related areas. It includes three sections: (1) deriving a Bayesian variable selection framework for hierarchical gene-environment and gene-gene interactions; (2) developing Bayesian Natural and Orthogonal Interaction (NOIA) models for gene-environment interactions; and (3) extending the applications of two Bayesian statistical methods, which were developed for gene-environment interaction studies, to other related types of studies, such as adaptive borrowing of historical data.
We propose a Bayesian hierarchical mixture model framework that allows us to investigate genetic and environmental effects, gene-by-gene interactions (epistasis), and gene-by-environment interactions in the same model. It is well known that, in many practical situations, there is a natural hierarchical structure between the main effects and interactions in a linear model. Here we propose a model that incorporates this hierarchical structure into the Bayesian mixture model, so that irrelevant interaction effects can be removed more efficiently, resulting in more robust, parsimonious, and powerful models. We evaluate both the 'strong hierarchical' and 'weak hierarchical' models, which specify that both or one, respectively, of the main effects of the interacting factors must be present for the interaction to be included in the model. Extensive simulation results show that the proposed strong and weak hierarchical mixture models control the proportion of false positive discoveries and yield a powerful approach for identifying the predisposing main effects and interactions in studies with complex gene-environment and gene-gene interactions. We also compare these two models with the 'independent' model, which does not impose this hierarchical constraint, and observe their superior performance in most of the considered situations. The proposed models are applied to real data on gene-environment interactions from case-control studies of lung cancer and cutaneous melanoma. Bayesian statistical models have the advantage of allowing useful prior information to be incorporated in the modeling process. Moreover, the Bayesian mixture model outperforms the multivariate logistic model in parameter estimation and variable selection in most cases.
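
The strong, weak, and independent inclusion rules described above can be illustrated with a minimal Python sketch. It shows only the admissibility constraint on interaction terms, not the Bayesian mixture model itself. The function names and factor names (`G1`, `G2`, `G3`, `E`) are hypothetical, and the weak rule is read here as "at least one main effect present", which is an assumption.

```python
from itertools import combinations


def interaction_allowed(main_a: bool, main_b: bool, rule: str) -> bool:
    """Return True if an interaction may enter the model under the given rule."""
    if rule == "strong":       # both main effects must be in the model
        return main_a and main_b
    if rule == "weak":         # assumed reading: at least one main effect
        return main_a or main_b
    if rule == "independent":  # no hierarchical constraint
        return True
    raise ValueError(f"unknown rule: {rule}")


def admissible_interactions(included_mains: dict, rule: str) -> list:
    """List the pairwise interactions that the rule permits.

    included_mains maps a factor name to whether its main effect is
    currently included in the model.
    """
    names = sorted(included_mains)
    return [
        (a, b)
        for a, b in combinations(names, 2)
        if interaction_allowed(included_mains[a], included_mains[b], rule)
    ]


# Hypothetical model state: main effects of G1 and E are included.
mains = {"G1": True, "G2": False, "G3": False, "E": True}
for rule in ("strong", "weak", "independent"):
    print(rule, admissible_interactions(mains, rule))
```

With four candidate factors, the strong rule leaves a single admissible pair, while the independent rule keeps all six, which is how the constraint shrinks the interaction search space.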
Our proposed models impose hierarchical constraints that further improve the Bayesian mixture model by reducing the proportion of false positive findings among the identified interactions while successfully identifying the reported associations. This is practically appealing for studies that investigate causal factors among a moderate number of candidate genetic and environmental factors along with a relatively large number of interactions. The natural and orthogonal interaction (NOIA) models of genetic effects were previously developed to provide an analysis framework in which the estimates of effects for a quantitative trait are statistically orthogonal regardless of whether Hardy-Weinberg Equilibrium (HWE) holds within loci. Ma et al. (2012) recently developed a NOIA model for gene-environment interaction studies and showed the advantages of using the model for detecting true main effects and interactions, compared with the usual functional model. In this project, we propose a novel Bayesian statistical model that combines the Bayesian hierarchical mixture model with the NOIA statistical model and the usual functional model. The proposed Bayesian NOIA model demonstrates more power for detecting non-null effects, with higher marginal posterior probabilities. We also review two Bayesian statistical models (the Bayesian empirical shrinkage-type estimator and Bayesian model averaging) that were developed for gene-environment interaction studies. Inspired by these Bayesian models, we develop two novel statistical methods that can handle related problems such as borrowing data from historical studies. The proposed methods are analogous to the methods for gene-environment interactions in that they balance statistical efficiency and bias in a unified model.
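
The orthogonality property claimed for NOIA can be checked numerically with a small Python sketch. The abstract does not give the coding, so the one-locus additive and dominance scales below are an assumption, taken from the form in which the NOIA statistical model is usually written. The gene-environment extension of Ma et al. is not reproduced. Genotype frequencies are hypothetical and deliberately not in Hardy-Weinberg proportions.

```python
def noia_statistical_coding(p11: float, p12: float, p22: float):
    """Assumed one-locus NOIA statistical coding.

    p11, p12, p22 are the frequencies of genotypes carrying 0, 1 and 2
    copies of the counted allele. Returns (additive, dominance) scales,
    each a tuple with one value per genotype.
    """
    if abs(p11 + p12 + p22 - 1.0) > 1e-9:
        raise ValueError("genotype frequencies must sum to 1")
    mean_dose = p12 + 2.0 * p22            # mean allele count
    additive = (-mean_dose, 1.0 - mean_dose, 2.0 - mean_dose)
    v = p11 + p22 - (p11 - p22) ** 2       # variance of the allele count
    dominance = (
        -2.0 * p12 * p22 / v,
        4.0 * p11 * p22 / v,
        -2.0 * p11 * p12 / v,
    )
    return additive, dominance


def weighted_inner(freqs, x, y) -> float:
    """Frequency-weighted inner product, i.e. E[x * y] over genotypes."""
    return sum(f * a * b for f, a, b in zip(freqs, x, y))


# Hypothetical genotype frequencies that violate HWE.
freqs = (0.5, 0.2, 0.3)
additive, dominance = noia_statistical_coding(*freqs)
ones = (1.0, 1.0, 1.0)
print(weighted_inner(freqs, ones, additive))       # mean of additive scale
print(weighted_inner(freqs, ones, dominance))      # mean of dominance scale
print(weighted_inner(freqs, additive, dominance))  # cross term
```

All three printed quantities should be zero up to rounding: the intercept, additive, and dominance columns are mutually orthogonal under the observed genotype frequencies, whether or not HWE holds.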
Through extensive simulation studies, we compare the operating characteristics of the proposed models with those of existing models, including the hierarchical meta-analysis model. The results show that the proposed approaches adaptively borrow historical data in a data-driven way. These novel models may have a broad range of statistical applications in both genetic/genomic and clinical studies.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

The current study is a secondary data analysis of a prospective cohort study that examined demographic and psychosocial variables and their associations with physical activity levels in Mexican-American adolescents in Houston, Texas. Body image, subjective social status, and anxiety were the main variables of interest. The sample included 952 unrelated Mexican-American adolescents in Houston, Texas. The majority (84.2%) of the study population did not meet physical activity standards prescribed by the CDC. In a multivariate model controlling for age, socioeconomic status, gender, general body image, preferred body image, subjective social status, and anxiety, gender and subjective social status were found to be the strongest determinants of physical activity levels. Males and those with a high subjective social status were more likely to participate in physical activity than females and those with a low subjective social status. Lower levels of anxiety and a more positive body image were also found to be associated with higher levels of physical activity. In multivariate analyses, gender and subjective social status showed the strongest associations with physical activity.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

The purpose of this study was to understand the scope of breast cancer disparities within the Texas Medical Center. The goal was to increase awareness of breast cancer disparities at the health care organization level, and to foster the development of organizational interventions to reduce breast cancer disparities. The study seeks to answer the following questions: 1. Are hospitals in the Texas Medical Center implementing interventions to reduce breast cancer disparities? 2. What are their interventions for reducing the effects of non-clinical factors on breast cancer treatment disparities? 3. What are their measures for monitoring, continuously improving, and evaluating the success of their interventions? This research project was designed as a mixed methods case study. Quantitative breast cancer data for the years 2000-2009 were obtained from the Texas Cancer Registry (TCR). Qualitative data were collected and analyzed through a total of 20 semi-structured interviews with administrators, physicians, and nurses at five hospitals (A, B, C, D, and E) in the Texas Medical Center (TMC). For quantitative analysis, the study was limited to early-stage breast cancer patients: local and regional. The dependent variable was receipt of standard treatment: Surgery (Yes/No), BCS vs Mastectomy, Chemotherapy (Yes/No), and Radiation after BCS (Yes/No). The main independent variable was race: non-Hispanic White (NHW), non-Hispanic Black (NHB), and Hispanic. Other covariates included age at diagnosis, diagnosis date, percent poverty, grade, stage, and regional nodes. Multivariate logistic regression was used to test the adjusted association between receipt of standard care and race. Qualitative data were analyzed with the Atlas.ti7 software (ATLAS.ti GmbH, Berlin).
Though there were significant differences by race for all dependent variables when the data were analyzed as a single group of all hospitals, at the level of the individual hospitals the results were not consistent by race/ethnicity across all dependent variables for hospitals A, B, and E. There were no racial differences in adjusted analysis for receipt of chemotherapy for the individual hospitals of interest in this study. For hospitals C and D, no racial disparities in treatment were observed in adjusted multivariable analysis. All organizations in this study were aware of the body of research showing that there are disparities in breast cancer outcomes for patient population groups. However, qualitative data analysis found that hospitals differed in their interest in addressing breast cancer disparities in their patient population groups. Some organizations were actively implementing directed measures to reduce the breast cancer disparity gap in outcomes for patients, and others were not. Despite the differences in levels of interest, quantitative data analysis showed that organizations in the Texas Medical Center were making progress in reducing the burden of breast cancer disparities in the patient populations being served.
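
The adjusted association estimated by a multivariate logistic regression, as used in the quantitative analysis above, can be sketched in plain Python. This is an illustration only: the data are simulated, the covariates (`group`, a binary indicator, and `age_z`, a standardized age) are hypothetical stand-ins, and plain gradient ascent on the log-likelihood replaces whatever fitting routine the study's software used.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows, outcomes, n_iter=500, learning_rate=0.5):
    """Fit a logistic regression by gradient ascent on the mean log-likelihood.

    rows: covariate lists without an intercept column.
    Returns [intercept, coefficient_1, coefficient_2, ...].
    """
    n = len(rows)
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept
    beta = [0.0] * len(design[0])
    for _ in range(n_iter):
        grad = [0.0] * len(beta)
        for x, y in zip(design, outcomes):
            residual = y - sigmoid(sum(b * xi for b, xi in zip(beta, x)))
            for j, xi in enumerate(x):
                grad[j] += residual * xi
        beta = [b + learning_rate * g / n for b, g in zip(beta, grad)]
    return beta


def simulate(n=500, seed=1):
    """Simulated (not real) data: outcome = receipt of treatment, yes/no."""
    rng = random.Random(seed)
    rows, outcomes = [], []
    for _ in range(n):
        group = rng.randint(0, 1)      # hypothetical binary covariate
        age_z = rng.gauss(0.0, 1.0)    # hypothetical standardized age
        p = sigmoid(-0.5 + 1.5 * group - 1.0 * age_z)
        rows.append([group, age_z])
        outcomes.append(1 if rng.random() < p else 0)
    return rows, outcomes


rows, outcomes = simulate()
intercept, b_group, b_age = fit_logistic(rows, outcomes)
# exp(coefficient) is the odds ratio for group, adjusted for age.
print("adjusted odds ratio for group:", math.exp(b_group))
```

Because both covariates enter the model together, the exponentiated `group` coefficient is an odds ratio adjusted for age, which is the sense in which the study reports an "adjusted association".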

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Sediments from immediately above basalt basement and from between sections of basalt recovered from Deep Sea Drilling Project Legs 5 and 63 were analyzed by atomic absorption spectroscopy for Mg, Al, Si, Ca, Mn, Fe, Co, Ni, Cu, Zn, and Ba. All of these sediments showed enrichment in Fe and Mn over values typical of detritus supplied to the northeastern Pacific Ocean. X-ray diffractometry and differential chemical leaching indicate that up to 50% of the sediment, by weight, is in amorphous phases and that these phases are rich in Mn, Co, Cu, Ni, and Zn. Multivariate statistical analysis and normative partitioning of the chemical data indicate that much of the excess Fe and other transition elements in the sediment originate from hydrothermal sources.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

New biotechnologies make it possible to obtain information for characterizing genetic materials from multiple markers, whether molecular and/or morphological. The ordination of genetic material through the exploration of multidimensional variability patterns is addressed using various multivariate analysis techniques. Multivariate dimension reduction techniques (DRT) and their graphical representation are of substantial importance for visualizing multivariate data in low-dimensional spaces, since they facilitate the interpretation of interrelationships among the variables (markers) and among the cases or observations under analysis. Principal Component Analysis, Principal Coordinate Analysis, and Generalized Procrustes Analysis are all DRT applicable to data from molecular and/or morphological markers. Minimum Spanning Trees and biplots are techniques for obtaining geometric representations of DRT results. This work describes these multivariate techniques and illustrates their application to two data sets, molecular and morphological, used to characterize fungal genetic material.
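
Of the dimension reduction techniques listed above, Principal Component Analysis is the simplest to sketch. The Python example below extracts only the first principal component by power iteration on the sample covariance matrix. The data are hypothetical marker measurements, not the fungal data sets of the paper, and a full analysis would normally use a linear algebra library rather than this hand-rolled loop.

```python
def first_principal_component(rows, n_iter=200):
    """Return (direction, eigenvalue, scores) of the first principal component.

    rows: list of observations, each a sequence of numeric variables.
    Power iteration starts from a vector of ones, so it assumes that
    vector is not orthogonal to the leading eigenvector.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix of the variables.
    cov = [
        [sum(c[j] * c[k] for c in centered) / (n - 1) for k in range(p)]
        for j in range(p)
    ]
    v = [1.0] * p
    for _ in range(n_iter):
        w = [sum(cov[j][k] * v[k] for k in range(p)) for j in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient gives the variance along the component.
    eigenvalue = sum(
        v[j] * sum(cov[j][k] * v[k] for k in range(p)) for j in range(p)
    )
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return v, eigenvalue, scores, cov


# Hypothetical data: the second marker is twice the first, the third is noise.
markers = [
    (1.0, 2.0, 0.1),
    (2.0, 4.0, -0.1),
    (3.0, 6.0, 0.1),
    (4.0, 8.0, -0.1),
    (5.0, 10.0, 0.0),
]
direction, eigenvalue, scores, cov = first_principal_component(markers)
total_variance = sum(cov[j][j] for j in range(len(cov)))
print("direction:", direction)
print("share of variance:", eigenvalue / total_variance)
```

The `scores` are the one-dimensional coordinates of the observations. Plotting the first two or three such components is what produces the low-dimensional ordination the abstract refers to.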

Relevance:

90.00% 90.00%

Publisher:

Abstract:

This paper presents the results of a survey of UNLP students based on random sampling. First, a general descriptive analysis of the main characteristics of this population is carried out; second, an exploratory analysis attempts to synthesize student profiles according to several socio-demographic, institutional, and ideological dimensions, using the multivariate technique of multiple correspondence analysis.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Assemblages of organic-walled dinoflagellate cysts (dinocysts) from 116 marine surface samples have been analysed to assess the relationship between the spatial distribution of dinocysts and modern local environmental conditions [e.g. sea surface temperature (SST), sea surface salinity (SSS), productivity] in the eastern Indian Ocean. Results from the percentage analysis and from statistical methods such as multivariate ordination analysis and end-member modelling indicate the existence of three distinct environmental and oceanographic regions in the study area. Region 1 is located in western and eastern Indonesia and is controlled by high SSTs and a low nutrient content of the surface waters. The Indonesian Throughflow (ITF) region (Region 2) is dominated by heterotrophic dinocyst species, reflecting the region's high productivity. Region 3 encompasses the area offshore north-west and west Australia, which is characterised by the water masses of the Leeuwin Current, a saline and nutrient-depleted southward current featuring energetic eddies.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Thirty-eight samples from DSDP Sites 549 to 551 were analyzed for major and minor components and trace element abundances. Multivariate statistical analysis of geochemical data groups the samples into two major classes: an organic-carbon-rich group (> 1% TOC) containing high levels of marine organic matter and certain trace elements (Cu, Zn, V, Ni, Co, Ba, and Cr) and an organic-carbon-lean group depleted in these components. The greatest organic and trace metal enrichments occur in the uppermost Albian to Turonian sections of Sites 549 to 551. Carbon-isotopic values of bulk carbonate for the middle Cenomanian section of Site 550 (2.35 to 2.70 per mil) and the upper Cenomanian-Turonian sections of Sites 549 (3.35 to 4.47 per mil) and 551 (3.13 to 3.72 per mil) are similar to coeval values reported elsewhere in the region. The relatively heavy δ13C values from Sites 549 and 551 indicate that this interval was deposited during the global "oceanic anoxic event" that occurred at the Cenomanian/Turonian boundary. Variation in the δ18O of bulk carbonate for Section 550B-18-1 of middle Cenomanian age suggests that paleosalinity and/or paleotemperature variations may have occurred concurrently with periodic anoxia at this site. Climatically controlled increases in surface-water runoff may have caused surface waters to periodically freshen, resulting in stable salinity stratification.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Samples collected at two different depths (ca. 3200 and ca. 4200 m) in the Setúbal and Cascais canyons off the Portuguese coast, during the HERMES RRS Charles Darwin cruise CD179, were analysed for (1) sediment biogeochemistry (TOC, TN) and (2) composition, and structural and trophic diversity of nematode communities. Multivariate PERMANOVA analysis on the nematode community data revealed differences between sediment layers that were greater than differences between canyons, water depths, and stations. This suggests that biogeochemical gradients along the vertical sediment profile are crucial in determining nematode community structure. The interaction between canyon conditions and the nematode community is illustrated by biogeochemical patterns in the sediment and the prevalence of nematode genera that are able to persist in disturbed sediments. Trophic analysis of the nematode community indicated that non-selective deposit feeders are dominant, presumably because of their non-selective feeding behaviour compared to other feeding types, which gives them a competitive advantage in exploiting lower-quality food resources. This study presents a preliminary conceptual scheme for interactions between canyon conditions and the resident fauna.
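
The PERMANOVA test mentioned above compares among-group to within-group dissimilarity and assesses the resulting pseudo-F statistic by permuting group labels. The Python sketch below shows a one-factor version under stated assumptions: the points and the two layer labels are hypothetical, and Euclidean distance stands in for whatever community dissimilarity measure the study used.

```python
import random
from math import dist


def pseudo_f(d, labels):
    """One-factor PERMANOVA pseudo-F from a square distance matrix."""
    n = len(labels)
    groups = sorted(set(labels))
    a = len(groups)
    # Total sum of squares from all pairwise squared distances.
    ss_total = sum(d[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for g in groups:
        idx = [i for i, lab in enumerate(labels) if lab == g]
        ss_within += sum(
            d[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:]
        ) / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(d, labels, n_perm=999, seed=0):
    """Return (observed pseudo-F, permutation p-value)."""
    rng = random.Random(seed)
    observed = pseudo_f(d, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between labels and distances
        if pseudo_f(d, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical samples from two sediment layers, as 2-D coordinates.
points = [
    (0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5),
    (5.0, 5.0), (6.0, 5.0), (5.0, 6.0), (6.0, 6.0), (5.5, 5.5),
]
layers = ["surface"] * 5 + ["deep"] * 5
distances = [[dist(p, q) for q in points] for p in points]
f_stat, p_value = permanova(distances, layers)
print("pseudo-F:", f_stat, "p-value:", p_value)
```

Because the test works from a distance matrix alone, the same code applies to any dissimilarity measure. Comparing pseudo-F values across factors (layer, canyon, depth, station) is the kind of evidence the abstract uses to rank their importance.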

Relevance:

90.00% 90.00%

Publisher:

Abstract:

The analysis of time-dependent data is an important problem in many application domains, and interactive visualization can help in understanding patterns in large time series data. Many effective approaches already exist for visual analysis of univariate time series, supporting tasks such as assessment of data quality, detection of outliers, or identification of periodically or frequently occurring patterns. However, far fewer approaches support multivariate time series. The existence of multiple values per time stamp makes the analysis task itself harder, and existing visualization techniques often do not scale well. We introduce an approach for visual analysis of large multivariate time-dependent data, based on the idea of projecting multivariate measurements to a 2D display and visualizing the time dimension by trajectories. We use visual data aggregation metaphors based on grouping of similar data elements to scale with multivariate time series. Aggregation procedures can be based either on statistical properties of the data or on data clustering routines. Appropriately defined user controls allow users to navigate and explore the data and to interactively steer the parameters of the data aggregation to enhance data analysis. We present an implementation of our approach and apply it to a comprehensive data set from the field of Earth observation, demonstrating the applicability and usefulness of our approach.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Planktic foraminiferal (PF) flux and faunal composition from three sediment trap time series of 2002-2004 in the northeastern Atlantic show pronounced year-to-year variations despite similar sea surface temperature (SST). The averaged fauna in 2002/2003 is dominated by the species Globigerinita glutinata, whereas in 2003/2004 the averaged fauna is dominated by Globigerinoides ruber. We show that PF species respond primarily to productivity, triggered by the seasonal dynamics of vertical stratification of the upper water column. Multivariate statistical analysis reveals three distinct species groups, linked to bulk particle flux, to chlorophyll concentrations, and to summer/fall oligotrophy with high SST and stratification. We speculate that the distinct nutrition strategies of strictly asymbiontic, facultatively symbiontic, and symbiontic species may play a key role in explaining their abundances and temporal succession. Advection of water masses within the Azores Current and species expatriation result in a highly diverse PF assemblage. The Azores Frontal Zone may have influenced the trap site in 2002, as indicated by subsurface water cooling, by the highest PF flux, and by a high flux of the deep-dwelling species Globorotalia scitula. Similarity analyses with core top samples from the global ocean, including 746 sites from the Atlantic, suggest that the trap faunas have only poor analogs in the surface sediments. These differences have to be taken into account when estimating past oceanic properties from sediment PF data in the eastern subtropical North Atlantic.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

An important competence of human data analysts is to interpret and explain the meaning of the results of data analysis to end-users. However, existing automatic solutions for intelligent data analysis provide limited help to interpret and communicate information to non-expert users. In this paper we present a general approach to generating explanatory descriptions about the meaning of quantitative sensor data. We propose a type of web application: a virtual newspaper with automatically generated news stories that describe the meaning of sensor data. This solution integrates a variety of techniques from intelligent data analysis into a web-based multimedia presentation system. We validated our approach in a real world problem and demonstrate its generality using data sets from several domains. Our experience shows that this solution can facilitate the use of sensor data by general users and, therefore, can increase the utility of sensor network infrastructures.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Differences in gene expression patterns have been documented not only between Multiple Sclerosis patients and healthy controls but also during relapse of the disease. Recently, a new gene expression modulator has been identified: the microRNA, or miRNA. The aim of this work is to analyze the possible role of miRNAs in multiple sclerosis, focusing on the relapse stage. We have analyzed the expression patterns of 364 miRNAs in PBMC obtained from multiple sclerosis patients in relapse, patients in remission, and healthy controls. The expression patterns of the miRNAs with significantly different expression were validated in an independent set of samples. To determine the effect of these miRNAs, the expression of some of their predicted target genes was studied by qPCR. Gene interaction networks were constructed to obtain a co-expression and multivariate view of the experimental data. The data analysis and subsequent validation reveal that two miRNAs (hsa-miR-18b and hsa-miR-599) may be relevant at the time of relapse and that another miRNA (hsa-miR-96) may be involved in remission. The genes targeted by hsa-miR-96 are involved in immunological pathways such as interleukin signaling and in other pathways such as wnt signaling. This work highlights the importance of miRNA expression in the molecular mechanisms implicated in the disease. Moreover, the proposed involvement of these small molecules in multiple sclerosis opens up a new therapeutic approach to explore and highlights some candidate biomarker targets in MS.