8 results for Test data generation

in DigitalCommons@The Texas Medical Center


Relevance:

90.00%

Publisher:

Abstract:

High-throughput assays, such as the yeast two-hybrid system, have generated a huge amount of protein-protein interaction (PPI) data in the past decade. This greatly increases the need for reliable methods that systematically and automatically suggest protein functions and the relationships between them. With the available PPI data, it is now possible to study these functions and relationships in the context of a large-scale network. To date, several network-based schemes have been proposed to effectively annotate protein functions on a large scale. However, because of the noise inherent in high-throughput data generation, new methods and algorithms should be developed to increase the reliability of functional annotations. Previous work on a yeast PPI network (Samanta and Liang, 2003) has shown that the local connection topology, particularly two proteins sharing an unusually large number of neighbors, can predict functional associations between proteins and hence suggest their functions. One advantage of that work is that its algorithm is not sensitive to noise (false positives) in high-throughput PPI data. In this study, we improved their prediction scheme by developing a new algorithm and new methods, which we applied to a human PPI network to make a genome-wide functional inference. We used the new algorithm to measure and reduce the influence of hub proteins on the detection of functionally associated proteins. We used the annotations of the Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) as independent and unbiased benchmarks to evaluate our algorithms and methods within the human PPI network. We showed that, compared with the previous work of Samanta and Liang, the algorithm and methods developed in this study improved the overall quality of functional inferences for human proteins. By applying the algorithms to the human PPI network, we obtained 4,233 significant functional associations among 1,754 proteins.
Further comparisons of their KEGG and GO annotations allowed us to assign 466 KEGG pathway annotations to 274 proteins and 123 GO annotations to 114 proteins, with estimated false discovery rates of <21% for KEGG and <30% for GO. We clustered 1,729 proteins by their functional associations and performed pathway analysis to identify several subclusters that are highly enriched in certain signaling pathways. In particular, we performed a detailed analysis of a subcluster enriched in the transforming growth factor β signaling pathway (P < 10^-50), which is important in cell proliferation and tumorigenesis. Analysis of another four subclusters also suggested potential new players in six signaling pathways that are worthy of further experimental investigation. Our study gives clear insight into the common neighbor-based prediction scheme and provides a reliable method for large-scale functional annotation in this post-genomic era.
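
The common-neighbor idea in the abstract above can be illustrated with a small calculation. The sketch below is not the authors' algorithm and not necessarily the exact formula of Samanta and Liang; it assumes a plain hypergeometric null model in which the neighbors of one protein are drawn at random from the network, and it asks how likely it is that two proteins share at least m neighbors by chance. The function name and parameters are hypothetical.

```python
from math import comb


def shared_neighbor_pvalue(n_total, n1, n2, m):
    """Chance probability that two proteins share at least m neighbors.

    Assumption (not from the source): a hypergeometric null model, where
    one protein has n1 neighbors among n_total proteins and the other
    protein's n2 neighbors are drawn uniformly at random without replacement.
    """
    if not (0 <= n1 <= n_total and 0 <= n2 <= n_total):
        raise ValueError("neighbor counts must lie between 0 and n_total")
    upper = min(n1, n2)  # two proteins cannot share more than this many
    if m > upper:
        return 0.0
    m = max(m, 0)
    # Count the ways to pick n2 neighbors so that exactly k of them fall
    # among the first protein's n1 neighbors, for every k >= m.
    favorable = sum(
        comb(n1, k) * comb(n_total - n1, n2 - k) for k in range(m, upper + 1)
    )
    return favorable / comb(n_total, n2)


# Toy network sizes, chosen only for illustration.
p_two_shared = shared_neighbor_pvalue(100, 10, 10, 2)
p_five_shared = shared_neighbor_pvalue(100, 10, 10, 5)
```

A small p-value marks a pair whose overlap is unusually large for its degrees. Under this null model a hub protein with many neighbors needs a larger overlap to reach the same significance, which relates to the influence of hub proteins that the abstract describes reducing.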

Relevance:

80.00%

Publisher:

Abstract:

Genetic anticipation is defined as a decrease in age of onset or an increase in severity as a disorder is transmitted through subsequent generations. Anticipation has been noted in the literature for over a century. Recently, anticipation in several diseases, including Huntington's Disease, Myotonic Dystrophy and Fragile X Syndrome, was shown to be caused by expansion of triplet repeats. Anticipation effects have also been observed in numerous mental disorders (e.g. Schizophrenia, Bipolar Disorder), cancers (Li-Fraumeni Syndrome, Leukemia) and other complex diseases. Several statistical methods have been applied to determine whether anticipation is a true phenomenon in a particular disorder, including standard statistical tests and newly developed affected parent/affected child pair methods. These methods have been shown to be inappropriate for assessing anticipation for a variety of reasons, including familial correlation and low power. Therefore, we have developed family-based likelihood modeling approaches that model the underlying transmission of the disease gene and the penetrance function and hence detect anticipation. These methods can be applied in extended families, thus improving the power to detect anticipation compared with existing methods based only upon parents and children. The first method we have proposed is based on the regressive logistic hazard model. This approach models anticipation by a generational covariate. The second method allows alleles to mutate as they are transmitted from parents to offspring and is appropriate for modeling the known triplet repeat diseases, in which the disease alleles can become more deleterious as they are transmitted across generations. To evaluate how effectively the new methods detect genetic anticipation, we performed extensive simulation studies on data simulated under different conditions.
Analysis by the first method yielded empirical power greater than 87%, based on the 5% type I error critical value identified in each simulation, depending on the method of data generation and current age criteria. Analysis by the second method was not possible due to the current formulation of the software. The application of this method to Huntington's Disease and Li-Fraumeni Syndrome data sets revealed evidence for a generation effect in both cases.

Relevance:

80.00%

Publisher:

Abstract:

Dental caries is a common, preventable childhood disease that leads to severe physical, mental and economic repercussions for children and their families if left untreated. A needs assessment in Harris County reported that 45.9% of second graders had untreated dental caries. To address this growing problem, the School Sealant Program (SSP), a primary preventive initiative, was launched by the Houston Department of Health and Human Services (HDHHS) to provide oral health education and underutilized dental preventive services to second grade children from participating Local School Districts (LSDs). To determine the effectiveness and efficiency of the SSP, a program evaluation was conducted by the HDHHS between September 2007 and June 2008 for the Oral Health Education (OHE) component of the SSP. The objective of the evaluation was to assess short-term changes in oral health knowledge of the participants and determine whether these changes, if any, were due to the OHE sessions. An 8-item multiple choice pre/post test was developed for this purpose and administered to the participants before and immediately after the OHE sessions. The present project analyzed pre and post test data of 1,088 second graders from 22 participating schools. Changes in overall and topic-specific knowledge of the program participants before and after the OHE sessions were analyzed using the Wilcoxon signed-rank test. Results. The overall knowledge assessment showed a statistically significant (p < 0.001) increase in the dental health knowledge of the participants after the oral health education sessions. Participants in the higher scoring category (7-8 correct responses) increased from 9.5% at baseline to 60.8% after the education sessions. Overall knowledge increased in all school regions, with the highest knowledge gains seen in the Central and South regions. Males and females had similar knowledge gains.
Significant knowledge differences were also found for each of the topic-specific categories (functions of teeth, healthy diet, healthy habits, dental sealants; p < 0.001), indicating an increase in topic-specific knowledge of the participants after the health education sessions. Conclusions. The OHE sessions were successful in increasing the short-term oral health knowledge of the participants.

Relevance:

80.00%

Publisher:

Abstract:

The U.S. Air Force, like the other branches of the military services, imposes physical fitness standards on its personnel. These standards ensure a healthy and fit combat force. To meet these standards, Airmen have to maintain a certain level of physical activity in their lifestyle. Objective. This was a cross-sectional (prevalence) study to evaluate the association between Airmen's self-reported physical activity and their performance in the Air Force Physical Fitness Assessment in 2007. Methods. The self-reported physical activity data were obtained from the Air Force Web Health Assessment (AF WEB HA), a web-based health questionnaire completed by the Airmen during their annual Preventive Health Assessment. The physical activity levels were categorized as having met or not having met the Centers for Disease Control and Prevention (CDC) and the American College of Sports Medicine (ACSM) physical activity recommendations. Physical fitness scores were collected from the Air Force Fitness Management System (AFFMS), a repository of physical fitness test data. Results. There were 49,029 Airmen who answered the AF WEB HA in 2007 and also took their physical fitness test. Of these, 94.4% (n = 46,304) met the recommended physical activity guidelines and 79.9% (n = 39,178) passed the fitness test. Airmen who both met the physical activity recommendations and passed the fitness test made up 75.6% (n = 37,088) of the study group. Airmen who did not meet the activity recommendations and also failed the fitness test totaled 635, or 1.3% of the study group. The Mantel-Haenszel chi-square analysis of the relationship between activity level and fitness test result gave χ² = 18.52, df = 1, p < 0.0001. The odds ratio (OR) was 1.22 (95% CI 1.12, 1.34). Conclusion. The study determined that there was a positive association between Airmen's self-reported physical activity and their performance in the physical fitness assessment.
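
The reported statistics can be checked from the counts given in the abstract above. The following sketch rebuilds the 2x2 table from those counts and recomputes the odds ratio, a 95% confidence interval, and the chi-square value. The Wald (log odds ratio) interval and the single-table form of the Mantel-Haenszel statistic are assumptions; the abstract does not say how the study computed them. The function and variable names are hypothetical.

```python
from math import exp, log, sqrt


def two_by_two_summary(a, b, c, d, z=1.96):
    """Odds ratio, Wald CI, and Mantel-Haenszel chi-square for one 2x2 table.

    a: met recommendations and passed     b: met recommendations and failed
    c: did not meet and passed            d: did not meet and failed
    """
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    # Assumption: Wald interval built on the log odds ratio.
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_low = exp(log(odds_ratio) - z * se_log_or)
    ci_high = exp(log(odds_ratio) + z * se_log_or)
    # Single-stratum Mantel-Haenszel chi-square: (n - 1) times the squared
    # phi correlation of the table.
    chi_square = (n - 1) * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d)
    )
    return odds_ratio, ci_low, ci_high, chi_square


# Counts taken from the abstract.
total = 49029
met_recommendations = 46304
passed_test = 39178
met_and_passed = 37088
neither = 635

a = met_and_passed
b = met_recommendations - met_and_passed  # met recommendations, failed test
c = passed_test - met_and_passed          # did not meet, passed test
d = neither
assert a + b + c + d == total             # the four cells account for everyone

odds_ratio, ci_low, ci_high, chi_square = two_by_two_summary(a, b, c, d)
print(f"OR = {odds_ratio:.2f} (95% CI {ci_low:.2f}, {ci_high:.2f}), "
      f"chi-square = {chi_square:.2f}")
```

Under these assumptions the recomputed values agree with the reported OR of 1.22 (95% CI 1.12, 1.34) and χ² of 18.52.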

Relevance:

80.00%

Publisher:

Abstract:

According to the transtheoretical model, consciousness raising and social liberation are processes used to help individuals progress through the stages of change for a given behavior. This study assessed the impact of these two processes on readiness to engage in regular physical activity among a convenience sample of 35 adults in the Magnolia Park/Lawndale-Wayside area of Houston, TX. The duration of the study was approximately 4 weeks. All pre/post-test data were collected via self-administered surveys available in English or Spanish. Baseline data were used to determine the culturally relevant content of a one-dose intervention consisting of a presentation and the dissemination of educational materials as well as a list of local physical activity opportunities. Although the intervention did not improve progression through the stages of change, significant increases were evident in 5 out of 6 processes of change. Based on these results and qualitative data, this study recommended that the Houston Parks and Recreation Department incorporate cultural competency into the design and publication of materials and revise the schedule of available programs (i.e., increase the number of walking programs) to reflect the physical activity preferences of Magnolia Park/Lawndale-Wayside residents.

Relevance:

40.00%

Publisher:

Abstract:

Next-generation DNA sequencing platforms can effectively detect the entire spectrum of genomic variation and are emerging as a major tool for systematic exploration of the universe of variants and interactions in the entire genome. However, the data produced by next-generation sequencing technologies suffer from three basic problems: sequence errors, assembly errors, and missing data. Current statistical methods for genetic analysis are well suited to detecting the association of common variants, but are less suitable for rare variants. This poses a great challenge for sequence-based genetic studies of complex diseases. This dissertation used the genome continuum model as a general principle, and stochastic calculus and functional data analysis as tools, to develop novel and powerful statistical methods for the next generation of association studies of both qualitative and quantitative traits in the context of sequencing data, shifting the paradigm of association analysis from the current locus-by-locus analysis to the collective analysis of genome regions. In this project, functional principal component (FPC) methods coupled with high-dimensional data reduction techniques were used to develop novel and powerful methods for testing the association of the entire spectrum of genetic variation within a segment of the genome or a gene, regardless of whether the variants are common or rare. Classical quantitative genetics methods suffer from high type I error rates and low power for rare variants. To overcome these limitations for resequencing data, this project used functional linear models with scalar response to develop statistics for identifying quantitative trait loci (QTLs) for both common and rare variants. To illustrate their application, the functional linear models were applied to five quantitative traits in the Framingham Heart Study.
This project proposed a novel concept of gene-gene co-association, in which a gene or a genomic region is taken as the unit of association analysis, and used stochastic calculus to develop a unified framework for testing the association of multiple genes or genomic regions for both common and rare alleles. The proposed methods were applied to gene-gene co-association analysis of psoriasis in two independent GWAS datasets, which led to the discovery of networks significantly associated with psoriasis.

Relevance:

40.00%

Publisher:

Abstract:

An interim analysis is usually applied in late phase II or phase III trials to find convincing evidence of a significant treatment difference that may lead to trial termination earlier than originally planned. This can save patient resources and shorten drug development and approval time. Ethical and economic considerations are also reasons to stop a trial early. In clinical trials of eyes, ears, knees, arms, kidneys, lungs, and other clustered treatments, data may include distribution-free random variables with matched and unmatched subjects in one study. It is important to properly include both kinds of subjects in the interim and the final analyses so that the maximum efficiency of statistical and clinical inferences can be obtained at different stages of the trials. So far, no publication has applied a statistical method for distribution-free data with matched and unmatched subjects in the interim analysis of clinical trials. In this simulation study, the hybrid statistic was used to estimate the empirical powers and the empirical type I errors among simulated datasets with different sample sizes, effect sizes, correlation coefficients for matched pairs, and data distributions, in the interim and final analyses with 4 different group sequential methods. Empirical powers and empirical type I errors were also compared to those estimated by using the meta-analysis t-test on the same simulated datasets. Results from this simulation study show that, compared to the meta-analysis t-test commonly used for data with normally distributed observations, the hybrid statistic has greater power for data observed from normally, log-normally, and multinomially distributed random variables with matched and unmatched subjects and with outliers. Powers rose with increases in sample size, effect size, and correlation coefficient for the matched pairs.
In addition, lower type I errors were observed when estimated with the hybrid statistic, which indicates that this test is also conservative for data with outliers in the interim analysis of clinical trials.
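
For readers unfamiliar with the terms, empirical type I error and empirical power are both rejection rates over many simulated datasets: the first with no true effect, the second with a true effect. The sketch below shows that idea only. It does not implement the hybrid statistic, the meta-analysis t-test, or any group sequential method from the abstract; as a stand-in it assumes a two-sided z-test on normally distributed paired differences with known standard deviation. All names and parameter values are hypothetical.

```python
import random
from math import sqrt


def rejection_rate(effect, n, n_sims=2000, sigma=1.0, z_crit=1.96, seed=0):
    """Fraction of simulated studies in which the null hypothesis is rejected.

    Each simulated study draws n paired differences from Normal(effect, sigma)
    and applies a two-sided z-test (stand-in test, known sigma assumed).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    rejections = 0
    for _ in range(n_sims):
        mean_diff = sum(rng.gauss(effect, sigma) for _ in range(n)) / n
        z = mean_diff * sqrt(n) / sigma
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sims


# With no true effect, the rejection rate estimates the type I error.
type_one_error = rejection_rate(effect=0.0, n=50)
# With a true effect, the rejection rate estimates the power.
power = rejection_rate(effect=0.5, n=50)
print(f"empirical type I error: {type_one_error:.3f}, empirical power: {power:.3f}")
```

Repeating such runs over a grid of sample sizes, effect sizes, and data distributions is the general pattern behind the comparisons the abstract describes.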

Relevance:

40.00%

Publisher:

Abstract:

Next-generation sequencing (NGS) technology has become a prominent tool in biological and biomedical research. However, NGS data analysis, such as de novo assembly, mapping and variant detection, is far from mature, and the high sequencing error rate is one of the major problems. To minimize the impact of sequencing errors, we developed a highly robust and efficient method, MTM, to correct the errors in NGS reads. We demonstrated the effectiveness of MTM on both single-cell data with highly non-uniform coverage and normal data with uniformly high coverage, showing that MTM's performance does not rely on the coverage of the sequencing reads. MTM was also compared with Hammer and Quake, the best methods for correcting non-uniform and uniform data respectively. For non-uniform data, MTM outperformed both Hammer and Quake. For uniform data, MTM showed better performance than Quake and comparable results to Hammer. With the better error correction provided by MTM, the quality of downstream analysis, such as mapping and SNP detection, was improved. SNP calling is a major application of NGS technologies. However, the existence of sequencing errors complicates this process, especially for the low coverage (