19 results for Cluster Analysis. Information Theory. Entropy. Cross Information Potential. Complex Data

in DigitalCommons@The Texas Medical Center


Relevance:

100.00%

Publisher:

Abstract:

Microarray technology is a high-throughput method for genotyping and gene expression profiling. Limited sensitivity and specificity are among the essential problems of this technology. Most existing methods of microarray data analysis share an apparent limitation: they deal only with the numerical part of microarray data and make little use of gene sequence information. Because the gene sequences precisely define the physical objects being measured by a microarray, it is natural to make them an essential part of the data analysis. This dissertation focused on the development of free-energy models to integrate sequence information into microarray data analysis. The models were used to characterize the mechanism of hybridization on microarrays and to enhance the sensitivity and specificity of microarray measurements.

Cross-hybridization is a major obstacle to the sensitivity and specificity of microarray measurements. In this dissertation, we evaluated the scope of the cross-hybridization problem on short-oligo microarrays. The results showed that cross-hybridization on arrays is mostly caused by oligo fragments with a run of 10 to 16 nucleotides complementary to the probes. Furthermore, a free-energy-based model was proposed to quantify the amount of cross-hybridization signal on each probe. This model treats cross-hybridization as an integral effect of the interactions between a probe and various off-target oligo fragments. Using public spike-in datasets, the model showed high accuracy in predicting the cross-hybridization signals on probes whose intended targets are absent from the sample.

Several prospective models were also proposed to improve the Positional-Dependent Nearest-Neighbor (PDNN) model for better quantification of gene expression and cross-hybridization.

The problem addressed in this dissertation is fundamental to microarray technology. We expect that this study will help clarify the detailed mechanism that determines sensitivity and specificity on microarrays. Consequently, this research will have a wide impact on how microarrays are designed and how the data are interpreted.
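The abstract does not give the model's equations, so the sketch below only illustrates the general idea of treating a probe's cross-hybridization signal as a sum of contributions from off-target fragments, each weighted by a Langmuir-style affinity derived from its hybridization free energy. The temperature, free energies, and concentrations are hypothetical, and this is not the dissertation's actual model.

```python
import math

R = 1.987e-3   # gas constant, kcal/(mol*K)
T = 318.15     # assumed hybridization temperature (45 C), in kelvin

def binding_affinity(delta_g):
    """Langmuir-style affinity for a duplex with free energy delta_g (kcal/mol)."""
    return math.exp(-delta_g / (R * T))

def cross_hyb_signal(off_targets, scale=1.0):
    """Sum the contributions of off-target fragments to one probe's signal.

    off_targets: list of (concentration, delta_g) pairs, one per fragment
                 sharing a complementary run with the probe.
    """
    return scale * sum(c * binding_affinity(dg) for c, dg in off_targets)

# Hypothetical probe whose intended target is absent: two off-target fragments.
print(cross_hyb_signal([(0.5, -12.0), (1.2, -9.5)]))
```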

Relevance:

100.00%

Publisher:

Abstract:

People often use tools to search for information. In order to improve the quality of an information search, it is important to understand how internal information, stored in the user's mind, and external information, represented by the interface of tools, interact with each other. How information is distributed between internal and external representations significantly affects information search performance. However, few studies have examined the relationship between types of interface and types of search task in the context of information search. For a distributed information search task, how data are distributed, represented, and formatted significantly affects user search performance in terms of response time and accuracy. Guided by UFuRT (User, Function, Representation, Task), a human-centered process, I propose a search model and a task taxonomy. The model defines its relationship with other existing information models. The taxonomy clarifies the legitimate operations for each type of search task over relational data. Based on the model and taxonomy, I have also developed prototype interfaces for search tasks over relational data; these prototypes were used in the experiments. The experiments described in this study used a within-subject design with a sample of 24 participants recruited from the graduate schools located in the Texas Medical Center. Participants performed one-dimensional nominal search tasks over nominal, ordinal, and ratio displays, and performed one-dimensional nominal, ordinal, interval, and ratio search tasks over table and graph displays. Participants also performed the same task and display combinations for two-dimensional searches. Distributed cognition theory was adopted as the theoretical framework for analyzing and predicting search performance over relational data. It was shown that the representation dimensions and data scales, as well as the search task types, are the main factors determining search efficiency and effectiveness. In particular, the more external representations are used, the better the search task performance, and the results suggest that ideal search performance occurs when the question type and the corresponding data scale representation match. The implications of the study lie in contributing to the effective design of search interfaces for relational data, especially laboratory results, which are often used in healthcare activities.

Relevance:

100.00%

Publisher:

Abstract:

Background: Despite almost 40 years of research into the etiology of Kawasaki Syndrome (KS), little has been published on spatial and temporal clustering of KS cases. Previous analyses have found significant spatial and temporal clustering of cases; therefore, cluster analyses were performed to substantiate these findings and provide insight into incident KS cases discharged from a pediatric tertiary care hospital. Identifying clusters from a single institution would allow prospective analysis of risk factors and potential exposures for further insight into KS etiology.

Methods: A retrospective study was carried out to examine the epidemiology and distribution of patients presenting to Texas Children's Hospital in Houston, Texas, with a diagnosis of Acute Febrile Mucocutaneous Lymph Node Syndrome (MCLS) upon discharge from January 1, 2005 to December 31, 2009. Spatial, temporal, and space-time cluster analyses were performed using the Bernoulli model with case and control event data.

Results: 397 of 102,761 total patients admitted to Texas Children's Hospital had a principal or secondary diagnosis of Acute Febrile MCLS upon discharge over the 5-year period. Demographic data for KS cases remained consistent with known disease epidemiology. Spatial, temporal, and space-time analyses of clustering using the Bernoulli model demonstrated no statistically significant clusters.

Discussion: Despite previous findings of spatial-temporal clustering of KS cases, there were no significant clusters of KS cases discharged from a single institution. This indicates the need for an expanded approach to spatial-temporal cluster analysis and KS surveillance, given the limitations of evaluating data from a single institution.
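For readers unfamiliar with the Bernoulli scan model named in the Methods, the sketch below computes the log-likelihood ratio for a single candidate cluster window, with cases being KS discharges and controls the remaining admissions. It is a generic illustration of Kulldorff's Bernoulli statistic, not the study's SaTScan configuration, and the example window counts are invented.

```python
import math

def bernoulli_llr(c, n, C, N):
    """Log-likelihood ratio for one candidate window under the Bernoulli scan
    model: c cases among n subjects inside the window, C cases among N overall
    (here, cases = KS discharges and controls = all other admissions)."""
    def xlogx(x):
        return x * math.log(x) if x > 0 else 0.0

    inside = xlogx(c) + xlogx(n - c) - xlogx(n)
    outside = xlogx(C - c) + xlogx((N - n) - (C - c)) - xlogx(N - n)
    null = xlogx(C) + xlogx(N - C) - xlogx(N)
    llr = inside + outside - null
    # Only windows with an elevated rate inside count as potential clusters;
    # significance would then come from Monte Carlo replication, as in SaTScan.
    return llr if c / n > (C - c) / (N - n) else 0.0

# Hypothetical window: 25 KS cases among 40 admissions inside, against
# 397 cases among 102,761 admissions overall.
print(bernoulli_llr(25, 40, 397, 102_761))
```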

Relevance:

100.00%

Publisher:

Abstract:

Quantitative real-time polymerase chain reaction (qPCR) is a sensitive gene quantitation method that has been widely used in the biological and biomedical fields. The currently used methods for PCR data analysis, including the threshold cycle (CT) method and linear and non-linear model fitting methods, all require subtracting background fluorescence. However, the removal of background fluorescence is usually inaccurate and can therefore distort results. Here, we propose a new method, the taking-difference linear regression method, to overcome this limitation. Briefly, for each pair of consecutive PCR cycles, we subtracted the fluorescence in the former cycle from that in the latter cycle, transforming n cycles of raw data into n-1 cycles of difference data. Linear regression was then applied to the natural logarithm of the transformed data. Finally, amplification efficiencies and the initial numbers of DNA molecules were calculated for each PCR run. To evaluate this new method, we compared it, in terms of accuracy and precision, with the original linear regression method under three background corrections: the mean of cycles 1-3, the mean of cycles 3-7, and the minimum. Three criteria, namely threshold identification, max R2, and max slope, were employed to search for target data points. Considering that PCR data are time series data, we also applied linear mixed models. Collectively, when the threshold identification criterion was applied and when the linear mixed model was adopted, the taking-difference linear regression method was superior, as it gave an accurate estimation of the initial DNA amount and a reasonable estimation of PCR amplification efficiencies. When the criteria of max R2 and max slope were used, the original linear regression method gave an accurate estimation of the initial DNA amount. Overall, the taking-difference linear regression method avoids the error of subtracting an unknown background and is thus theoretically more accurate and reliable. This method is easy to perform, and the taking-difference strategy can be extended to all current methods for qPCR data analysis.
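A minimal sketch of the taking-difference idea as described above: differencing consecutive cycles cancels a constant background, and a linear fit to the log-differences yields the efficiency and initial signal. The exponential-phase window selection (threshold identification, max R2, max slope) and the linear mixed model are omitted, and the simulated run is purely illustrative.

```python
import numpy as np

def taking_difference_fit(fluorescence):
    """Estimate amplification efficiency E and initial signal F0 from raw qPCR
    fluorescence without background subtraction.

    In the exponential phase F_n ~ B + F0 * E**n, so the cycle-to-cycle
    difference D_n = F_{n+1} - F_n = F0 * (E - 1) * E**n is background-free
    and ln(D_n) is linear in the cycle number n.
    """
    f = np.asarray(fluorescence, dtype=float)
    cycles = np.arange(len(f))
    diffs = np.diff(f)
    usable = diffs > 0                              # keep rising cycles only
    slope, intercept = np.polyfit(cycles[:-1][usable], np.log(diffs[usable]), 1)
    efficiency = np.exp(slope)
    f0 = np.exp(intercept) / (efficiency - 1.0)
    return efficiency, f0

# Simulated run: background 50, F0 = 2, efficiency 1.9 per cycle (illustration only).
raw = 50 + 2.0 * 1.9 ** np.arange(40)
print(taking_difference_fit(raw))
```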

Relevance:

100.00%

Publisher:

Abstract:

Background. Over the past 30 years, the prevalence of overweight among children and adolescents has increased across the United States (Barlow et al., 2007; Ogden, Flegal, Carroll, & Johnson, 2002). Childhood obesity is linked with adverse physiological and psychological issues in youth and affects ethnic/minority populations at disproportionate rates (Barlow et al., 2007; Butte et al., 2006; Butte, Cai, Cole, Wilson, Fisher, Zakeri, Ellis, & Comuzzie, 2007). More importantly, overweight in children and youth tends to track into adulthood (McNaughton, Ball, Mishra, & Crawford, 2008; Ogden et al., 2002). Childhood obesity affects body systems such as the cardiovascular, respiratory, gastrointestinal, and endocrine systems, as well as emotional health (Barlow et al., 2007; Ogden et al., 2002). Several dietary factors have been associated with the development of obesity in children; however, these factors have not been fully elucidated, especially in ethnic/minority children. In particular, few studies have examined the effects of different meal patterns on the development of obesity in children.

Purpose. The purpose of this study is to examine the relationships between the daily proportions of energy consumed and of energy derived from fat across breakfast, lunch, dinner, and snacks, and obesity among Hispanic children and adolescents.

Methods. A cross-sectional design was used to evaluate the relationship between dietary patterns and overweight status in Hispanic children and adolescents 4-19 years of age who participated in the Viva La Familia Study. The goal of the Viva La Familia Study was to evaluate genetic and environmental factors affecting childhood obesity and its co-morbidities in the Hispanic population (Butte et al., 2006, 2007). The study enrolled 1030 Hispanic children and adolescents from 319 families and examined factors related to increased body weight through a multilevel analysis of extensive sociodemographic, genetic, metabolic, and behavioral data. Baseline dietary intakes of the children were collected using 24-hour recalls, and body mass index was calculated from measured height and weight and classified using the CDC standards. Dietary data were analyzed using a GEE population-averaged panel-data model with the family identifier as the cluster variable, to account for possible correlations within related data sets. A linear regression model was used to analyze associations of dietary patterns with possible covariates and to examine the percentage of daily energy coming from breakfast, lunch, dinner, and snacks while adjusting for age, sex, and BMI z-score. Random-effects logistic regression models were used to determine the relationship of the dietary variables with obesity status and to assess whether the percent energy intake (%EI) derived from fat at each meal (breakfast, lunch, dinner, and snacks) affected obesity.

Results. Within the 4-19-year age range, older children consumed a higher percentage of energy at lunch and dinner and a lower percentage of energy from snacks than younger children. Age was significantly associated with the percentage of total energy intake (%TEI) from lunch as well as from dinner, while no association was found with gender. The percentage of energy consumed at dinner differed significantly by obesity status, with obese children consuming more energy at dinner (p = 0.03), but no associations were found between percent energy from fat and obesity across any of the meals.

Conclusions. Information from this study can be used to develop interventions that target dietary intake patterns in obesity prevention programs for Hispanic children and adolescents. In particular, intervention programs for children should target dietary patterns in which energy intake is spread throughout the day and shifted earlier in the day. These results indicate that a longitudinal study should be used to further explore the relationship between dietary patterns and BMI in this and other populations (Dubois et al., 2008; Rodriquez & Moreno, 2006; Thompson et al., 2005; Wilson et al., in review, 2008).
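As a rough illustration of the GEE specification described in the Methods (a population-averaged model clustered on family), the sketch below uses statsmodels with placeholder file and column names; it is not the study's actual model, covariate coding, or data.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# One row per child; file and column names are placeholders, not the study's data:
# pct_energy_dinner = % of total energy intake from dinner, bmi_z = BMI z-score,
# family_id = identifier shared by children from the same household.
df = pd.read_csv("viva_la_familia_diet.csv")

gee = smf.gee(
    "pct_energy_dinner ~ age + sex + bmi_z",
    groups="family_id",                        # cluster on family
    data=df,
    family=sm.families.Gaussian(),
    cov_struct=sm.cov_struct.Exchangeable(),   # within-family correlation
)
print(gee.fit().summary())
```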

Relevance:

100.00%

Publisher:

Abstract:

Hypertension in adults is defined by risk for cardiovascular morbidity and mortality, but in children, hypertension is defined using population norms. The diagnosis of hypertension in children and adolescents requires only casual blood pressure measurements, but the use of ambulatory blood pressure monitoring to further evaluate patients with elevated blood pressure has been recommended in the Fourth Report on the Diagnosis, Evaluation, and Treatment of High Blood Pressure in Children and Adolescents. The aim of this study is to assess the association between stage of hypertension (using both casual and 24-hour ambulatory blood pressure measurements) and target organ damage, defined by left ventricular hypertrophy (LVH), in a sample of children and adolescents in Houston, TX. A retrospective analysis was performed on the primary de-identified data from the combined participants of two IRB-approved, cross-sectional studies. The studies collected basic demographic data, height, weight, casual blood pressures, ambulatory blood pressures, and left ventricular measurements by echocardiography on children aged 8 to 18 years. Hypertension was defined and staged using the ambulatory blood pressure criteria reported by Lurbe et al. [1], with some modification. Left ventricular hypertrophy was defined using left ventricular mass index (LVMI) criteria specific for children and adults: the pediatric criterion was LVMI indexed to height^2.7 above the 95th percentile for gender, and the adult criterion was LVMI above 51 g/m^2.7. Participants from the original studies were included in this analysis if they had complete demographic information, anthropometric measures, casual blood pressures, ambulatory blood pressures, and echocardiography data. There were 241 children and adolescents included: 19.1% were normotensive, 17.0% had white coat hypertension, 11.6% had masked hypertension, and 52.4% had confirmed hypertension; 22.4% of participants had stage 1 hypertension, 5.8% had stage 2 hypertension, and 24.1% had stage 3 hypertension. Participants with confirmed hypertension were more likely to have LVH by the pediatric criterion than those who were normotensive [OR 2.19, 95% CI (1.04–4.63)]; LVH defined by the adult criterion did not differ significantly between normotensives and hypertensives [OR 2.08, 95% CI (0.58–7.52)]. However, there was a significant trend of increasing LVH prevalence across the six blood pressure categories for LVH defined by both the pediatric and adult criteria (p < 0.001 and p = 0.02, respectively). Additionally, mean LVM indexed to height^2.7 showed a significantly increasing trend across blood pressure stages from normal to stage 3 hypertension (p < 0.02). Pediatric hypertension is defined using population norms, and although children with mild hypertension are not at increased odds of having target organ damage defined by LVH, those with severe hypertension are more likely to have LVH. Staging hypertension by ambulatory blood pressure further describes an individual's risk for LVH target organ damage.
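The adult LVH criterion above reduces to simple arithmetic on the indexed LV mass. A small sketch with an invented example follows; the pediatric criterion additionally needs age- and sex-specific percentile tables that the abstract does not reproduce.

```python
def lv_mass_index(lv_mass_g, height_m):
    """Left ventricular mass indexed to height^2.7, in g/m^2.7."""
    return lv_mass_g / height_m ** 2.7

def lvh_by_adult_criterion(lvmi, cutoff=51.0):
    """Adult criterion used in the study: LVMI > 51 g/m^2.7."""
    return lvmi > cutoff

# Hypothetical adolescent: LV mass 155 g, height 1.60 m.
lvmi = lv_mass_index(155.0, 1.60)
print(round(lvmi, 1), lvh_by_adult_criterion(lvmi))
# The pediatric criterion (> 95th percentile for gender) additionally requires
# age- and sex-specific reference tables that are not reproduced here.
```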

Relevance:

100.00%

Publisher:

Abstract:

This study retrospectively evaluated the spatial and temporal disease patterns associated with influenza-like illness (ILI), positive rapid influenza antigen detection tests (RIDT), and confirmed H1N1 S-OIV cases reported to the Cameron County Department of Health and Human Services between April 26 and May 13, 2009, using the space-time permutation scan statistic software SaTScan in conjunction with the geographic information system (GIS) software ArcGIS 9.3. The rate and age-adjusted relative risk of each influenza measure were calculated, and a cluster analysis was conducted to determine the geographic regions with statistically higher incidence of disease. A Poisson distribution model was developed to identify the effect that socioeconomic status, population density, and certain population attributes of a census block group had on that area's frequency of confirmed S-OIV cases over the entire outbreak. The spatiotemporal analyses of ILI, RIDT, and S-OIV cases in Cameron County consistently showed a high concentration of cases along the southern border with Mexico. These findings, in conjunction with the slight northward space-time shifts of the ILI and RIDT cluster centers, highlight the southern border as the primary site for public health interventions. Finally, the community-based multiple regression model revealed that three factors (the percentage of the population under age 15, average household size, and the number of high school graduates over age 25) were significantly associated with laboratory-confirmed S-OIV in the Lower Rio Grande Valley. Together, these findings underscore the need for community-based surveillance, improve our understanding of the distribution of the burden of influenza within the community, and have implications for vaccination and community outreach initiatives.
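The block-group Poisson model described above could be specified along the following lines. This is a hedged sketch with invented file and column names, using block-group population as an exposure offset; it is not the study's actual covariate coding.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# One row per census block group; the file and column names are illustrative
# stand-ins for the covariates named in the abstract (age structure, household
# size, education, socioeconomic status, population density).
bg = pd.read_csv("cameron_county_blockgroups.csv")

poisson = smf.glm(
    "soiv_cases ~ pct_under_15 + avg_household_size + hs_grads_over_25 "
    "+ median_income + pop_density",
    data=bg,
    family=sm.families.Poisson(),
    exposure=bg["population"],       # model cases relative to block-group population
).fit()
print(poisson.summary())
```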

Relevance:

100.00%

Publisher:

Abstract:

Clinical Research Data Quality Literature Review and Pooled Analysis

We present a literature review and secondary analysis of data accuracy in clinical research and related secondary data uses. A total of 93 papers meeting our inclusion criteria were categorized according to the data processing methods. Quantitative data accuracy information was abstracted from the articles and pooled. Our analysis demonstrates that the accuracy associated with data processing methods varies widely, with error rates ranging from 2 to 5019 errors per 10,000 fields. Medical record abstraction was associated with the highest error rates (70-5019 errors per 10,000 fields). Data entered and processed at healthcare facilities had error rates comparable to data processed at central data processing centers. Error rates for data processed with single entry in the presence of on-screen checks were comparable to those for double-entered data. While data processing and cleaning methods may explain a significant amount of the variability in data accuracy, additional factors not resolvable here likely exist.

Defining Data Quality for Clinical Research: A Concept Analysis

Despite notable previous attempts by experts to define data quality, the concept remains ambiguous and subject to the vagaries of natural language. This current lack of clarity continues to hamper research related to data quality issues. We present a formal concept analysis of data quality, which builds on and synthesizes previously published work. We further posit that discipline-level specificity may be required to achieve the desired definitional clarity. To this end, we combine work from the clinical research domain with findings from the general data quality literature to produce a discipline-specific definition and operationalization of data quality in clinical research. While the results are helpful to clinical research, the methodology of concept analysis may be useful in other fields to clarify data quality attributes and to achieve operational definitions.

Medical Record Abstractor's Perceptions of Factors Impacting the Accuracy of Abstracted Data

Medical record abstraction (MRA) is known to be a significant source of data errors in secondary data uses. Factors impacting the accuracy of abstracted data are not reported consistently in the literature. Two Delphi processes were conducted with experienced medical record abstractors to assess abstractors' perceptions of these factors. The Delphi process identified 9 factors that were not found in the literature and differed from the literature on 5 factors in the top 25%. The Delphi results refuted 7 factors reported in the literature as impacting the quality of abstracted data. The results provide insight into, and indicate content validity of, a significant number of the factors reported in the literature. Further, the results indicate general consistency between the perceptions of clinical research medical record abstractors and those of registry and quality improvement abstractors.

Distributed Cognition Artifacts on Clinical Research Data Collection Forms

Medical record abstraction, a primary mode of data collection in secondary data use, is associated with high error rates. Distributed cognition in medical record abstraction has not been studied as a possible explanation for abstraction errors. We employed the theory of distributed representation and representational analysis to systematically evaluate cognitive demands in medical record abstraction and the extent of external cognitive support employed in a sample of clinical research data collection forms. We show that the cognitive load required for abstraction in 61% of the sampled data elements was high, and exceedingly so in 9%. Further, the data collection forms did not support external cognition for the most complex data elements. High working memory demands are a possible explanation for the association of data errors with data elements requiring abstractor interpretation, comparison, mapping, or calculation. The representational analysis used here can be applied to identify data elements with high cognitive demands.

Relevance:

100.00%

Publisher:

Abstract:

Many statistical studies feature data with both exact-time and interval-censored events. While a number of methods currently exist to handle interval-censored events and multivariate exact-time events separately, few techniques exist to deal with their combination. This thesis develops a theoretical framework for analyzing a multivariate endpoint comprising a single interval-censored event plus an arbitrary number of exact-time events. The approach fuses the exact-time events, modeled using the marginal method of Wei, Lin, and Weissfeld, with a piecewise-exponential interval-censored component. The resulting model incorporates more of the information in the data and also removes some of the biases associated with excluding interval-censored events. A simulation study demonstrates that our approach produces reliable estimates of the model parameters and their variance-covariance matrix. As a real-world example, we apply this technique to the Systolic Hypertension in the Elderly Program (SHEP) clinical trial, which features three correlated events: clinical non-fatal myocardial infarction and fatal myocardial infarction (two exact-time events), and silent myocardial infarction (one interval-censored event).
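To make the piecewise-exponential interval-censored component concrete, the sketch below computes the log-likelihood contribution of one interval-censored event under a piecewise-constant hazard. The cut points, rates, and visit times are invented, and the fusion with the Wei-Lin-Weissfeld marginal model for the exact-time events is not shown.

```python
import numpy as np

def cum_hazard(t, cuts, hazards):
    """Cumulative hazard at time t under a piecewise-constant hazard.

    cuts:    left endpoints of the intervals, starting at 0, e.g. [0, 1, 3, 5]
    hazards: one rate per interval; the last rate extends beyond cuts[-1]
    """
    cuts = np.asarray(cuts, dtype=float)
    rates = np.asarray(hazards, dtype=float)
    upper = np.append(cuts[1:], np.inf)                 # right endpoints
    exposure = np.clip(np.minimum(t, upper) - cuts, 0.0, None)
    return float(np.sum(rates * exposure))

def interval_censored_loglik(left, right, cuts, hazards):
    """log P(left < T <= right) = log(S(left) - S(right)), with S(t) = exp(-H(t))."""
    s_left = np.exp(-cum_hazard(left, cuts, hazards))
    s_right = np.exp(-cum_hazard(right, cuts, hazards))
    return float(np.log(s_left - s_right))

# Silent MI detected at the year-4 visit but absent at the year-2 visit.
print(interval_censored_loglik(2.0, 4.0, cuts=[0, 1, 3, 5],
                               hazards=[0.05, 0.08, 0.10, 0.12]))
```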

Relevance:

100.00%

Publisher:

Abstract:

Withdrawal reflexes of the mollusk Aplysia exhibit sensitization, a simple form of long-term memory (LTM). Sensitization is due, in part, to long-term facilitation (LTF) of sensorimotor neuron synapses. LTF is induced by the modulatory actions of serotonin (5-HT). Pettigrew et al. developed a computational model of the nonlinear intracellular signaling and gene network that underlies the induction of 5-HT-induced LTF. The model simulated empirical observations that repeated applications of 5-HT induce persistent activation of protein kinase A (PKA) and that this persistent activation requires a suprathreshold exposure to 5-HT. This study extends the analysis of the Pettigrew model by applying bifurcation analysis, singularity theory, and numerical simulation. Using singularity theory, classification diagrams of parameter space were constructed, identifying regions with qualitatively different steady-state behaviors. The graphical representation of these regions illustrates their robustness to changes in model parameters. Because persistent PKA activity correlates with Aplysia LTM, the analysis focuses on a positive feedback loop in the model that tends to maintain PKA activity. In this loop, PKA phosphorylates a transcription factor (TF-1), thereby increasing the expression of a ubiquitin hydrolase (Ap-Uch). Ap-Uch then acts to increase PKA activity, closing the loop. This positive feedback loop exhibits multiple coexisting steady states, or multiplicity, which provides a mechanism for a bistable switch in PKA activity. After the removal of 5-HT, the PKA activity either returns to its basal level (a reversible switch) or remains at a high level (an irreversible switch). Such an irreversible switch might be a mechanism that contributes to the persistence of LTM. The classification diagrams also identify parameters and processes that might be manipulated, perhaps pharmacologically, to enhance the induction of memory. Rational drug design aimed at complex processes such as memory formation can benefit from this type of analysis.
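The Pettigrew model itself is not reproduced in the abstract. The following one-variable toy collapses the PKA to TF-1 to Ap-Uch to PKA loop into a single Hill-type feedback term simply to illustrate how such a loop can act as a bistable switch that stays on after a transient 5-HT-like stimulus; all parameter values are invented.

```python
def dpka_dt(pka, stim, basal=0.01, vmax=1.0, K=0.5, n=4, kdeg=1.0):
    """Toy rate equation: PKA activity sustains itself through a Hill-type
    positive-feedback term standing in for the PKA -> TF-1 -> Ap-Uch -> PKA loop."""
    feedback = vmax * pka ** n / (K ** n + pka ** n)
    return basal + stim + feedback - kdeg * pka

def run(pka0, stimulus, t_end=50.0, dt=0.01):
    """Integrate the toy model with Euler steps; stimulus is a function of time."""
    pka = pka0
    for step in range(int(t_end / dt)):
        pka += dt * dpka_dt(pka, stimulus(step * dt))
    return pka

# A brief 5-HT-like pulse pushes the loop past its threshold; after the pulse
# ends, PKA activity stays near the high steady state (an irreversible switch
# for these parameter values), while an unstimulated run stays near basal.
pulse = lambda t: 0.6 if t < 5.0 else 0.0
print(run(0.02, pulse))              # ends high
print(run(0.02, lambda t: 0.0))      # stays low
```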

Relevance:

100.00%

Publisher:

Abstract:

Enterococcus faecium has emerged as an important nosocomial pathogen worldwide, and this trend has been associated with the dissemination of a genetic lineage designated clonal cluster 17 (CC17). Enterococcal isolates were collected prospectively (2006 to 2008) from 32 hospitals in Colombia, Ecuador, Perú, and Venezuela and subjected to antimicrobial susceptibility testing. Genotyping was performed on all vancomycin-resistant E. faecium (VREfm) isolates by pulsed-field gel electrophoresis (PFGE) and multilocus sequence typing. All VREfm isolates were evaluated for the presence of 16 putative virulence genes (14 fms genes, the esp gene of E. faecium [espEfm], and the hyl gene of E. faecium [hylEfm]) and for plasmids carrying the fms20-fms21 (pilA), hylEfm, and vanA genes. Of 723 enterococcal isolates recovered, E. faecalis was the most common (78%). Vancomycin resistance was detected in 6% of the isolates (74% of which were E. faecium). Eleven distinct PFGE types were found among the VREfm isolates, with most belonging to sequence types 412 and 18. The ebpAEfm-ebpBEfm-ebpCEfm (pilB) and fms11-fms19-fms16 clusters were detected in all VREfm isolates from the region, whereas espEfm and hylEfm were detected in 69% and 23% of the isolates, respectively. The fms20-fms21 (pilA) cluster, which encodes a putative pilus-like protein, was found on plasmids from almost all VREfm isolates and sometimes coexisted with hylEfm and the vanA gene cluster. The population genetics of VREfm in South America appear to resemble those of such strains in the United States in the early years of the CC17 epidemic. The overwhelming presence of plasmids encoding putative virulence factors and vanA genes suggests that E. faecium of the CC17 genogroup may disseminate further in the region in the coming years.

Relevance:

100.00%

Publisher:

Abstract:

Enterococcus faecium recently evolved from a generally avirulent commensal into a multidrug-resistant health care-associated pathogen causing difficult-to-treat infections, but little is known about the factors responsible for this change. We previously showed that some E. faecium strains express a cell wall-anchored collagen adhesin, Acm. Here we analyzed 90 E. faecium isolates (99% acm(+)) and found that the Acm protein was detected predominantly in clinically derived isolates, while the acm gene was present as a transposon-interrupted pseudogene in 12 of 47 isolates of nonclinical origin. A highly significant association between clinical (versus fecal or food) origin and collagen adherence (P

Relevance:

100.00%

Publisher:

Abstract:

Recently it has been proposed that evaluating the effects of pollutants on aquatic organisms can provide an early warning system of potential environmental and human health risks (NRC 1991). Unfortunately, few methods are available to aquatic biologists for assessing the effects of pollutants on aquatic animal community health. The primary goal of this research was to develop and evaluate the feasibility of such a method. Specifically, the primary objective of this study was to develop a prototype rapid bioassessment technique, similar to the Index of Biotic Integrity (IBI), for the upper Texas and northwestern Gulf of Mexico coastal tributaries. The IBI consists of a series of "metrics" describing specific attributes of the aquatic community. Each metric is given a score, and the scores are summed to derive an overall assessment of the "health" of the aquatic community. This IBI procedure may provide an additional assessment tool for professionals in water quality management.

The experimental design consisted primarily of compiling previously collected data from monitoring conducted by the Texas Natural Resource Conservation Commission (TNRCC) at five bayous classified according to potential for anthropogenic impact and salinity regime. Standardized hydrological, chemical, and biological monitoring had been conducted in each of these watersheds. The identification and evaluation of candidate metrics for inclusion in the estuarine IBI were conducted through correlation analysis, cluster analysis, stepwise and normal discriminant analysis, and evaluation of cumulative distribution frequencies. Scores for each included metric were determined based on exceedance of specific percentiles. Individual scores were summed, and a total IBI score and rank for the community were computed.

These analyses yielded the proposed metrics and rankings listed in this report. Based on the results of this study, incorporation of an estuarine IBI method as a water quality assessment tool is warranted. The adopted metrics correlated with seasonal trends and, to a lesser extent, with the salinity gradients observed during the study (0-25 ppt). Further refinement of this method is needed using a larger, more inclusive data set that includes additional habitat types, salinity ranges, and temporal variation.
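The study's actual metrics and percentile cut points are only summarized above, so the following is just a generic sketch of the score-and-sum mechanics of an IBI: each metric is scored 5, 3, or 1 against percentiles of a reference distribution, and the scores are totaled. The metrics, reference values, and site values are invented, and metrics for which lower values indicate better condition would be scored on the reversed scale.

```python
import numpy as np

def score_metric(value, reference):
    """Score a metric 5, 3, or 1 by where the site value falls relative to the
    25th and 50th percentiles of a reference distribution (higher = better)."""
    p25, p50 = np.percentile(reference, [25, 50])
    if value >= p50:
        return 5
    return 3 if value >= p25 else 1

def ibi_total(site_metrics, reference_metrics):
    """Total IBI for one site: the sum of its individual metric scores."""
    return sum(score_metric(site_metrics[m], reference_metrics[m])
               for m in site_metrics)

# Invented two-metric example (both scored as higher-is-better).
site = {"taxa_richness": 18, "pct_intolerant_taxa": 30}
reference = {"taxa_richness": [8, 12, 15, 20, 25],
             "pct_intolerant_taxa": [5, 15, 25, 40, 55]}
print(ibi_total(site, reference))
```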

Relevance:

100.00%

Publisher:

Abstract:

Every x-ray attenuation curve inherently contains all the information necessary to extract the complete energy spectrum of a beam. To date, however, attempts to obtain accurate spectral information from attenuation data have been inadequate.

This investigation presents a mathematical pair model, grounded in physical reality through the Laplace transformation, to describe the attenuation of a photon beam and the corresponding bremsstrahlung spectral distribution. In addition, the Laplace model has been mathematically extended to include characteristic radiation in a physically meaningful way. A method to determine the fraction of characteristic radiation in any diagnostic x-ray beam was introduced for use with the extended model.

This work examined the reconstructive capability of the Laplace pair model over a photon beam range from 50 kVp to 25 MV, using both theoretical and experimental methods.

In the diagnostic region, excellent agreement between a wide variety of experimental spectra and those reconstructed with the Laplace model was obtained when the atomic composition of the attenuators was accurately known. The model successfully reproduced a 2 MV spectrum but had difficulty accurately reconstructing orthovoltage and 6 MV spectra. The 25 MV spectrum was successfully reconstructed, although it agreed poorly with the spectrum obtained by Levy.

The analysis of errors, performed with diagnostic-energy data, demonstrated the relative insensitivity of the model to typical experimental errors and confirmed that the model can be used to derive accurate spectral information from experimental attenuation data.
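The physical relationship behind a pair model of this kind is that the transmission curve is the Laplace transform of the spectrum once the spectrum is expressed as a density in the attenuation coefficient. The sketch below only demonstrates this forward relation numerically with a made-up spectral density; the model's analytic transform pairs and the characteristic-radiation extension are not reproduced.

```python
import numpy as np

# Forward relation: the transmission behind thickness x of attenuator is the
# Laplace transform of the spectrum re-expressed as a density phi(mu) in the
# attenuation coefficient mu:
#     T(x) = integral of phi(mu) * exp(-mu * x) d(mu)
# A pair model inverts this by fitting analytic forms of T(x) whose inverse
# transforms are known; only the forward direction is illustrated here.

mu = np.linspace(0.2, 1.2, 500)              # attenuation coefficients, cm^-1 (assumed)
dmu = mu[1] - mu[0]
phi = np.exp(-((mu - 0.6) / 0.15) ** 2)      # hypothetical spectral density in mu
phi /= (phi * dmu).sum()                     # normalize to unit fluence

def transmission(x_cm):
    """Relative transmitted intensity behind x_cm of attenuator."""
    return float((phi * np.exp(-mu * x_cm) * dmu).sum())

for x in (0.0, 1.0, 2.0, 5.0):
    print(x, round(transmission(x), 4))
```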

Relevance:

100.00%

Publisher:

Abstract:

Improvements in the analysis of microarray images are critical for accurately quantifying gene expression levels. The acquisition of accurate spot intensities directly influences the results and interpretation of statistical analyses. This dissertation discusses the implementation of a novel approach to the analysis of cDNA microarray images. We use a stellar photometric model, the Moffat function, to quantify microarray spots from nylon microarray images. The inherent flexibility of the Moffat shape model makes it well suited to quantifying microarray spots. We apply our approach to a Wilms' tumor microarray study and compare our results with those of a fixed-circle segmentation approach to spot quantification. Our results suggest that different spot feature extraction methods can affect the ability of statistical methods to identify differentially expressed genes. We also used the Moffat function to simulate a series of microarray images under various experimental conditions. These simulations were used to validate the performance of various statistical methods for identifying differentially expressed genes. Our simulation results indicate that tests taking into account the dependency between mean spot intensity and variance estimation, such as the smoothed t-test, can better identify differentially expressed genes, especially when the number of replicates and the mean fold change are low. The analysis of the simulations also showed that, overall, a rank sum test (Mann-Whitney) performed well at identifying differentially expressed genes. Previous work has suggested the strengths of nonparametric approaches for identifying differentially expressed genes. We also show that multivariate approaches, such as hierarchical and k-means cluster analysis along with principal components analysis, are only effective at classifying samples when replicate numbers and mean fold change are high. Finally, we show how our stellar shape model approach can be extended to the analysis of 2D-gel images by adapting the Moffat function to account for the elliptical nature of spots in such images. Our results indicate that stellar shape models offer a previously unexplored approach for the quantification of 2D-gel spots.
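A minimal illustration of fitting a circular Moffat profile to a single simulated spot follows. The dissertation's actual pipeline (spot segmentation on nylon arrays, the elliptical extension for 2D gels, and the downstream statistics) is not reproduced, and all values here are simulated.

```python
import numpy as np
from scipy.optimize import curve_fit

def moffat2d(coords, amp, x0, y0, alpha, beta, background):
    """Circular Moffat profile: amp * (1 + r^2/alpha^2)^(-beta) + background."""
    x, y = coords
    r2 = (x - x0) ** 2 + (y - y0) ** 2
    return amp * (1.0 + r2 / alpha ** 2) ** (-beta) + background

# Simulate one 15x15 spot with noise (illustration only).
yy, xx = np.mgrid[0:15, 0:15]
truth = moffat2d((xx, yy), 1200.0, 7.2, 6.8, 2.5, 2.0, 80.0)
rng = np.random.default_rng(0)
spot = truth + rng.normal(0.0, 10.0, truth.shape)

# Fit the six parameters; the fitted, background-free profile can then serve
# as the spot intensity passed to downstream statistical analysis.
p0 = (spot.max() - spot.min(), 7.0, 7.0, 2.0, 2.0, spot.min())
popt, _ = curve_fit(lambda c, *p: moffat2d(c, *p).ravel(),
                    (xx, yy), spot.ravel(), p0=p0)
print(dict(zip(["amp", "x0", "y0", "alpha", "beta", "background"], popt)))
```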