970 results for Statistical Method
Abstract:
The developmental processes and functions of an organism are controlled by its genes and the proteins derived from those genes. The identification of key genes and the reconstruction of gene networks can provide a model to help us understand the regulatory mechanisms for the initiation and progression of biological processes or functional abnormalities (e.g. diseases) in living organisms. In this dissertation, I have developed statistical methods to identify the genes and transcription factors (TFs) involved in biological processes, constructed their regulatory networks, and also evaluated some existing association methods to find robust methods for coexpression analyses. Two kinds of data sets were used for this work: genotype data and gene expression microarray data. On the basis of these data sets, this dissertation has two major parts, together forming six chapters. The first part deals with developing association methods for rare variants using genotype data (chapters 4 and 5). The second part deals with developing and/or evaluating statistical methods to identify genes and TFs involved in biological processes, and with constructing their regulatory networks using gene expression data (chapters 2, 3, and 6). For the first part, I developed two methods to find the groupwise association of rare variants with given diseases or traits. The first method is based on kernel machine learning and can be applied to both quantitative and qualitative traits. Simulation results showed that the proposed method has improved power over the existing weighted sum (WS) method in most settings. The second method uses multiple phenotypes to select a few top significant genes. It then finds the association of each gene with each phenotype while controlling for population stratification by adjusting the data for ancestry using principal components. This method was applied to GAW 17 data and was able to find several disease risk genes. For the second part, I worked on three problems. The first problem involved the evaluation of eight gene association methods. A comprehensive comparison of these methods, with further analysis, clearly demonstrates where the eight gene association methods perform similarly and where they differ. For the second problem, an algorithm named the bottom-up graphical Gaussian model was developed to identify the TFs that regulate pathway genes and to reconstruct their hierarchical regulatory networks. This algorithm produced highly significant results, and this is the first report to produce such hierarchical networks for these pathways. The third problem dealt with developing another algorithm, called the top-down graphical Gaussian model, that identifies the network governed by a specific TF. The network produced by the algorithm was shown to be highly accurate.
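The kernel-machine association idea can be sketched compactly. Below is a minimal, illustrative variance-component score test for a group of rare variants against a quantitative trait; the MAF-based weights, the linear kernel, and the permutation p-value are simplifying assumptions for exposition, not the dissertation's exact method.

```python
# Hedged sketch of a kernel-machine (variance-component) score test for
# groupwise rare-variant association; illustrative only.
import numpy as np

def kernel_score_test(G, y, n_perm=1000, seed=0):
    """G: (n_samples, n_variants) minor-allele counts (0/1/2); y: quantitative trait.
    Returns the score statistic Q and a permutation p-value."""
    rng = np.random.default_rng(seed)
    G = np.asarray(G, dtype=float)
    y = np.asarray(y, dtype=float)
    maf = G.mean(axis=0) / 2.0
    w = 1.0 / np.sqrt(maf * (1.0 - maf) + 1e-8)  # assumed weighting: upweight rare variants
    Gw = G * w                                   # weighted genotype matrix
    K = Gw @ Gw.T                                # weighted linear kernel
    r = y - y.mean()                             # residuals under an intercept-only null
    Q = float(r @ K @ r)                         # variance-component score statistic
    exceed = 0
    for _ in range(n_perm):                      # permutation null distribution
        rp = rng.permutation(r)
        if float(rp @ K @ rp) >= Q:
            exceed += 1
    return Q, (exceed + 1) / (n_perm + 1)
```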
Abstract:
Our goal was to validate the accuracy, consistency, and reproducibility/reliability of a new method for determining cup orientation in total hip arthroplasty (THA). The method matches the 3D model from CT images or slices with the projected pelvis on an anteroposterior pelvic radiograph using a fully automated registration procedure. Cup orientation (inclination and anteversion) is calculated relative to the anterior pelvic plane, corrected for individual malposition of the pelvis during radiograph acquisition. Measurements on blinded and randomized radiographs of 80 cadaver and 327 patient hips were investigated. The method showed a mean accuracy of 0.7 ± 1.7 degrees (-3.7 to 4.0 degrees) for inclination and 1.2 ± 2.4 degrees (-5.3 to 5.6 degrees) for anteversion in the cadaver trials, and 1.7 ± 1.7 degrees (-4.6 to 5.5 degrees) for inclination and 0.9 ± 2.8 degrees (-5.2 to 5.7 degrees) for anteversion in the clinical data, when compared to CT-based measurements. No systematic errors in accuracy were detected with Bland-Altman analysis. The software consistency and the reproducibility/reliability were very good. This software is an accurate, consistent, reliable, and reproducible method to measure cup orientation in THA using a sophisticated 2D/3D matching technique. Its robust and accurate matching algorithm can be extended to statistical models.
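For readers unfamiliar with the Bland-Altman analysis mentioned above, a minimal sketch follows: it reports the mean difference (bias) and the 95% limits of agreement between paired measurements. The data and variable names are hypothetical.

```python
# Minimal Bland-Altman sketch: bias and 95% limits of agreement between
# radiographic and CT-based angle measurements. Example data are hypothetical.
import numpy as np

def bland_altman(a, b):
    """a, b: paired measurements (e.g., radiographic vs. CT inclination, degrees)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    diff = a - b
    bias = diff.mean()
    loa = 1.96 * diff.std(ddof=1)          # half-width of 95% limits of agreement
    return bias, bias - loa, bias + loa

radiograph = [42.1, 45.3, 39.8, 44.0]       # hypothetical inclination angles (deg)
ct         = [41.5, 44.9, 40.6, 43.1]
print(bland_altman(radiograph, ct))         # (bias, lower LoA, upper LoA)
```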
Abstract:
Accurately gauging consumers' maximum willingness to pay (WTP) for a product is a critical success factor that determines not only market performance but also financial results. A number of approaches have therefore been developed to estimate consumers' willingness to pay accurately. Here, four commonly used measurement approaches are compared using real purchase data as a benchmark. The relative strengths of each method are analyzed on the basis of statistical criteria and, more importantly, on their potential to predict managerially relevant criteria such as optimal price, quantity, and profit. The results show a slight advantage for incentive-aligned approaches, though market settings need to be considered to choose the best-fitting procedure.
Abstract:
PURPOSE Segmentation of the proximal femur in digital antero-posterior (AP) pelvic radiographs is required to create a three-dimensional model of the hip joint for use in planning and treatment. However, manually extracting the femoral contour is tedious and prone to subjective bias, while automatic segmentation must accommodate poor image quality, overlapping anatomical structures, and femur deformity. A new method was developed for femur segmentation in AP pelvic radiographs. METHODS Using manual annotations on 100 AP pelvic radiographs, a statistical shape model (SSM) and a statistical appearance model (SAM) of the femur contour were constructed. The SSM and SAM were used to segment new AP pelvic radiographs with a three-stage approach. At initialization, the mean SSM shape is coarsely registered to the femur in the AP radiograph through a scaled rigid registration. The Mahalanobis distance defined on the SAM is employed as the search criterion for each suggested landmark location. Dynamic programming is used to eliminate ambiguities. After all landmarks are assigned, a regularized non-rigid registration deforms the current mean shape of the SSM to produce a new segmentation of the proximal femur. The second and third stages are executed iteratively until convergence. RESULTS A set of 100 clinical AP pelvic radiographs (not used for training) was evaluated. The mean segmentation error was [Formula: see text], requiring [Formula: see text] s per case when implemented in MATLAB. The influence of the initialization on segmentation results was tested by six clinicians, demonstrating no significant difference. CONCLUSIONS A fast, robust, and accurate method for femur segmentation in digital AP pelvic radiographs was developed by combining an SSM and SAM with dynamic programming. The method can be extended to the segmentation of other bony structures such as the pelvis.
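The landmark search step can be illustrated with a short sketch: candidate positions along a search profile are scored by their Mahalanobis distance to the appearance model, here reduced to a mean intensity profile and a covariance per landmark. This is an illustration under those simplifying assumptions, not the paper's implementation.

```python
# Hedged sketch of SAM-based landmark search by Mahalanobis distance.
import numpy as np

def mahalanobis_best_candidate(candidates, mean_profile, cov):
    """candidates: (k, d) intensity profiles sampled at k candidate positions.
    mean_profile: (d,) SAM mean; cov: (d, d) SAM covariance for this landmark.
    Returns the index of the candidate closest to the appearance model."""
    cov_inv = np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))  # regularized inverse
    dists = [(c - mean_profile) @ cov_inv @ (c - mean_profile) for c in candidates]
    return int(np.argmin(dists))
```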
Abstract:
Calcium levels in spines play a significant role in determining the sign and magnitude of synaptic plasticity. The magnitude of calcium influx into spines depends strongly on influx through N-methyl-D-aspartate (NMDA) receptors, and therefore on the number of postsynaptic NMDA receptors in each spine. We have previously calculated how the number of postsynaptic NMDA receptors determines the mean and variance of calcium transients in the postsynaptic density, and how this alters the shape of plasticity curves. However, the number of postsynaptic NMDA receptors in the postsynaptic density is not well known. Anatomical methods for estimating the number of NMDA receptors produce estimates that are very different from those produced by physiological techniques. The physiological techniques are based on the statistics of synaptic transmission, and it is difficult to estimate their precision experimentally. In this paper we use stochastic simulations to test the validity of a physiological estimation technique based on failure analysis. We find that the method is likely to underestimate the number of postsynaptic NMDA receptors, explain the source of the error, and re-derive a more precise estimation technique. We also show that the original failure analysis, as well as our improved formulas, is not robust to small estimation errors in key parameters.
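For context, the simplest form of the failure-analysis logic being tested can be written down directly. The notation below (N independent receptors, each opening with probability p_o, a "failure" meaning none opens) is an illustrative simplification, not the paper's full stochastic model.

```latex
% Simplified failure-analysis logic (illustrative notation):
% N independent receptors, each opening with probability p_o after release;
% a "failure" means none opens.
\[
  P(\mathrm{failure}) = (1 - p_o)^{N}
  \qquad\Longrightarrow\qquad
  \hat{N} = \frac{\ln P(\mathrm{failure})}{\ln(1 - p_o)}.
\]
% Small errors in the estimated failure rate or in p_o propagate through the
% logarithms, consistent with the robustness concern raised above.
```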
Abstract:
We obtain upper bounds for the total variation distance between the distributions of two Gibbs point processes in a very general setting. Applications are provided to various well-known processes and settings from spatial statistics and statistical physics, including the comparison of two Lennard-Jones processes, hard-core approximation of an area-interaction process, and the approximation of lattice processes by a continuous Gibbs process. Our proof of the main results is based on Stein's method. We construct an explicit coupling between two spatial birth-death processes to obtain Stein factors, and employ the Georgii-Nguyen-Zessin equation for the total bound.
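For reference, the quantity being bounded is the standard total variation distance between the two process distributions (a textbook definition, not a formula reproduced from the paper):

```latex
% Standard definition: for point process distributions P and Q on a common
% measurable space of point configurations with sigma-algebra \mathcal{F},
\[
  d_{\mathrm{TV}}(P, Q) \;=\; \sup_{A \in \mathcal{F}} \bigl|\, P(A) - Q(A) \,\bigr|.
\]
```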
Abstract:
Purpose: Proper delineation of ocular anatomy in 3D imaging is a major challenge, particularly when developing treatment plans for ocular diseases. Magnetic Resonance Imaging (MRI) is now used in clinical practice for diagnosis confirmation and treatment planning of retinoblastoma in infants, where it serves as a source of information complementary to Fundus or Ultrasound imaging. Here we present a framework, based on 3D Active Shape Models (ASM), to fully automatically segment eye anatomy in MRI; we validate the results and present a proof of concept for automatically segmenting pathological eyes. Material and Methods: Manual and automatic segmentation were performed on 24 images of healthy children's eyes (age 3.29±2.15 years). Imaging was performed using a 3T MRI scanner. The ASM comprises the lens, the vitreous humor, the sclera, and the cornea. The model was fitted by first automatically detecting the positions of the eye center, the lens, and the optic nerve, then aligning the model and fitting it to the patient. We validated our segmentation method using leave-one-out cross-validation. The segmentation results were evaluated by measuring overlap using the Dice Similarity Coefficient (DSC) and the mean distance error. Results: We obtained a DSC of 94.90±2.12% for the sclera and the cornea, 94.72±1.89% for the vitreous humor, and 85.16±4.91% for the lens. The mean distance error was 0.26±0.09 mm. The entire process took 14 s on average per eye. Conclusion: We provide a reliable and accurate tool that enables clinicians to automatically segment the sclera, the cornea, the vitreous humor, and the lens using MRI. We additionally present a proof of concept for fully automatically segmenting pathological eyes. This tool reduces the time needed for eye shape delineation and can thus help clinicians when planning eye treatment and confirming the extent of the tumor.
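For reference, the overlap metric used above is the standard Dice Similarity Coefficient:

```latex
% Standard definition, where A and B are the voxel sets of the manual and
% automatic segmentations:
\[
  \mathrm{DSC}(A, B) \;=\; \frac{2\,\lvert A \cap B \rvert}{\lvert A \rvert + \lvert B \rvert},
\]
% so DSC = 1 corresponds to perfect overlap and 0 to no overlap.
```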
Abstract:
We present an application- and sample-independent method for the automatic discrimination of noise and signal in optical coherence tomography B-scans. The proposed algorithm models the observed noise probabilistically and allows for dynamic determination of image noise parameters and the choice of appropriate image rendering parameters. This overcomes observer variability and the need for a priori information about the content of sample images, both of which are challenging to estimate systematically with current systems. As such, our approach has the advantage of automatically determining crucial parameters for evaluating rendered image quality in a systematic and task-independent way. We tested our algorithm on data from four different biological and non-biological samples (index finger, lemon slices, sticky tape, and detector cards) acquired with three different experimental spectral-domain optical coherence tomography (OCT) measurement systems, including a swept-source OCT. The results are compared to parameters determined manually by four experienced OCT users. Overall, our algorithm works reliably regardless of which system and sample are used, and it estimates noise parameters in all cases within the confidence interval of those found by the observers.
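The probabilistic noise-modeling step might look like the following sketch. The abstract does not specify the noise model; Rayleigh-distributed magnitude noise is a common assumption in OCT and is used here purely for illustration, as are the function name and the darkest-decile background heuristic.

```python
# Hedged sketch of probabilistic noise modeling for B-scan rendering:
# fit a noise distribution to presumed signal-free pixels and derive a
# display threshold. The Rayleigh model is an assumption, not the paper's.
import numpy as np
from scipy import stats

def noise_floor_threshold(bscan, q=0.999):
    """bscan: 2D array of magnitude values. Returns an intensity threshold
    separating noise from signal at the q-quantile of the fitted noise model."""
    background = np.sort(bscan, axis=None)[: bscan.size // 10]  # darkest 10%, assumed noise
    loc, scale = stats.rayleigh.fit(background)                 # fit noise distribution
    return stats.rayleigh.ppf(q, loc=loc, scale=scale)          # q-quantile of noise
```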
Abstract:
An efficient and reliable automated model that can map physical Soil and Water Conservation (SWC) structures on cultivated land was developed using very high spatial resolution imagery obtained from Google Earth, together with ArcGIS, ERDAS IMAGINE, the SDC Morphology Toolbox for MATLAB, and statistical techniques. The model was developed using the following procedure: (1) a high-pass spatial filter algorithm was applied to detect linear features, (2) morphological processing was used to remove unwanted linear features, (3) the raster format was vectorized, (4) the vectorized linear features were split per hectare (ha) and each line was classified according to its compass direction, and (5) the sum of all vector lengths per direction class per ha was calculated. Finally, the direction class with the greatest length was selected from each ha to predict the physical SWC structures. The model was calibrated and validated on the Ethiopian Highlands, where it correctly mapped 80% of the existing structures. The model was then tested at different sites with different topography. The results show that the developed model is feasible for automated mapping of physical SWC structures. Therefore, the model is useful for predicting and mapping physical SWC structures across diverse areas.
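Steps (1) and (2) of the procedure can be sketched as follows; the kernel size, threshold, and structuring element are assumptions for illustration rather than the study's calibrated values.

```python
# Illustrative sketch of steps (1)-(2): high-pass filtering to enhance
# linear features, then morphological opening to suppress small non-linear
# responses. Parameter values are assumptions.
import numpy as np
from scipy import ndimage

def detect_linear_features(image, threshold=10.0):
    """image: 2D grayscale array. Returns a binary mask of candidate linear features."""
    lowpass = ndimage.uniform_filter(image.astype(float), size=9)
    highpass = image - lowpass                       # step (1): high-pass filter
    mask = highpass > threshold                      # keep strong local contrasts
    # step (2): opening with a small structuring element removes speckle
    return ndimage.binary_opening(mask, structure=np.ones((3, 3)))
```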
Abstract:
Finite element (FE) analysis is an important computational tool in biomechanics. However, its adoption into clinical practice has been hampered by its computational complexity and the high technical competence it requires of clinicians. In this paper we propose a supervised learning approach to predict the outcome of FE analysis. We demonstrate our approach on clinical CT and X-ray femur images for FE prediction (FEP), with features extracted, respectively, from a statistical shape model and from 2D-based morphometric and density information. Using leave-one-out experiments and sensitivity analysis on a database of 89 clinical cases, our method is capable of predicting the distribution of stress values for a walking loading condition with average correlation coefficients of 0.984 and 0.976 for CT and X-ray images, respectively. These findings suggest that supervised learning approaches have the potential to leverage the clinical integration of mechanical simulations for the treatment of musculoskeletal conditions.
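The surrogate-modeling idea can be sketched briefly: learn a regression from shape-model features to FE stress outputs and evaluate it with leave-one-out correlation, mirroring the validation above. The ridge regressor and the feature layout are assumptions for illustration, not the paper's exact pipeline.

```python
# Minimal sketch of a learned FE surrogate with leave-one-out evaluation.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import LeaveOneOut

def loo_stress_correlation(X, Y):
    """X: (n_cases, n_features) shape-model coefficients (assumed features).
    Y: (n_cases, n_nodes) FE stress values per case.
    Returns the mean leave-one-out correlation between predicted and true stress."""
    corrs = []
    for train, test in LeaveOneOut().split(X):
        model = Ridge(alpha=1.0).fit(X[train], Y[train])  # multi-output regression
        pred = model.predict(X[test])[0]
        corrs.append(np.corrcoef(pred, Y[test][0])[0, 1])
    return float(np.mean(corrs))
```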
Abstract:
Background: It is still unclear whether electronic key feature problems (KFPs) and electronic case-based multiple choice questions (cbMCQs) differ for the assessment of clinical decision making. Summary of Work: Fifth-year medical students completed clerkships, each ending with a summative exam. Assessment of knowledge per exam was done by 6-9 KFPs, 9-20 cbMCQs, and 9-28 MC questions. Each KFP consisted of a case vignette and three key features (KFs) using "long menu" as the question format. We sought students' perceptions of the KFPs and cbMCQs in focus groups (n=39 students). Furthermore, statistical data from 11 exams (n=377 students) concerning the KFPs and (cb)MCQs were compared. Summary of Results: The analysis of the focus groups yielded four themes reflecting students' perceptions of KFPs in comparison with (cb)MCQs: KFPs were perceived as (i) more realistic, (ii) more difficult, and (iii) more motivating for the intense study of clinical reasoning than (cb)MCQs, and (iv) showed overall good acceptance when some preconditions are taken into account. The statistical analysis revealed no difference in difficulty; however, KFPs showed higher discrimination and reliability (G-coefficient), even when corrected for testing time. Correlation of the different exam parts was intermediate. Conclusions: Students perceived the KFPs as more motivating for the study of clinical reasoning. Statistically, KFPs showed higher discrimination and higher reliability than cbMCQs. Take-home messages: Including KFPs with long-menu questions in summative clerkship exams seems to offer positive educational effects.
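As a side note on the psychometrics, item discrimination of the kind reported above is conventionally computed as a corrected item-total correlation; a minimal sketch follows. The study itself used G-coefficients, so this is context, not its exact analysis pipeline.

```python
# Sketch of classical item discrimination: each item's correlation with the
# rest-of-test total score (corrected item-total correlation).
import numpy as np

def item_discrimination(scores):
    """scores: (n_students, n_items) item score matrix.
    Returns one discrimination value per item."""
    scores = np.asarray(scores, dtype=float)
    total = scores.sum(axis=1)
    return np.array([
        np.corrcoef(scores[:, j], total - scores[:, j])[0, 1]  # exclude item j from total
        for j in range(scores.shape[1])
    ])
```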
Abstract:
Purpose. Fluorophotometry is a well-validated method for assessing corneal permeability in human subjects. However, with the growing importance of basic-science animal research in ophthalmology, fluorophotometry's use in animals must be further evaluated. The purpose of this study was to evaluate corneal epithelial permeability following desiccating stress using the modified Fluorotron Master™.

Methods. Corneal permeability was evaluated before and after subjecting 6- to 8-week-old C57BL/6 mice to experimental dry eye (EDE) for 2 and 5 days (n=9/time point). Untreated mice served as controls. Ten microliters of 0.001% sodium fluorescein (NaF) were instilled topically into each mouse's left eye to create an eye bath and left to permeate for 3 minutes. The eye bath was followed by a generous wash with Buffered Saline Solution (BSS) and alignment with the Fluorotron Master™. Seven corneal scans using the Fluorotron Master were performed during 15 minutes (1st post-wash scans), followed by a second wash using BSS and another set of five corneal scans (2nd post-wash scans) during the next 15 minutes. Corneal permeability was calculated using data from the FM™ Mouse software.

Results. Comparing the Post-wash #1 and Post-wash #2 scans within groups using a repeated-measures design, there was a statistical difference in corneal fluorescein permeability of the Post-wash #1 scans after 5 days (1160.21±108.26 vs. 1000.47±75.56 ng/mL, P<0.016 for the UT-5 day comparison [0.008]), but not after only 2 days of EDE compared to untreated mice (1115.64±118.94 vs. 1000.47±75.56 ng/mL, P>0.016 for the UT-2 day comparison [0.050]). There was no statistical difference between the 2-day and 5-day Post-wash #1 scans (P=.299). The Post-wash #2 scans demonstrated that EDE caused significant NaF retention at both 2 and 5 days of EDE compared to baseline, untreated controls (1017.92±116.25, 1015.40±120.68 vs. 528.22±127.85 ng/mL, P<0.05 [0.0001 for both]). There was no statistical difference between the 2-day and 5-day Post-wash #2 scans (P=.503). The comparison of the untreated Post-wash #1 scans with the untreated Post-wash #2 scans using a paired t-test showed a significant difference between the two sets of scans (P=0.000). There was also a significant difference for both the 2-day and the 5-day comparisons (P=0.010 and 0.002, respectively).

Conclusion. Desiccating stress increases the permeability of the corneal epithelium to NaF and increases NaF retention in the corneal stroma. The Fluorotron Master is a useful and sensitive tool for evaluating corneal permeability in murine dry eye, and it will be useful for evaluating the effectiveness of dry eye treatments in animal-model drug trials.
Abstract:
In population studies, most current methods focus on identifying one outcome-related SNP at a time by testing for differences in genotype frequencies between disease and healthy groups or among different population groups. However, testing a great number of SNPs simultaneously poses a multiple-testing problem and produces false-positive results. Although this problem can be dealt with effectively through several approaches, such as Bonferroni correction, permutation testing, and false discovery rates, patterns of joint effects of several genes, each with a weak effect, might not be detectable. With the availability of high-throughput genotyping technology, searching for multiple scattered SNPs over the whole genome and modeling their joint effect on the target variable has become possible. Exhaustive search of all SNP subsets is computationally infeasible for millions of SNPs in a genome-wide study. Several effective feature-selection methods combined with classification functions have been proposed to search for an optimal SNP subset in big data sets where the number of feature SNPs far exceeds the number of observations.

In this study, we took two steps to achieve this goal. First, we selected 1000 SNPs through an effective filter method; then we performed feature selection wrapped around a classifier to identify an optimal SNP subset for predicting disease. We also developed a novel classification method, the sequential information bottleneck (sIB) method, wrapped inside different search algorithms to identify an optimal subset of SNPs for classifying the outcome variable. This new method was compared with classical linear discriminant analysis (LDA) in terms of classification performance. Finally, we performed chi-square tests to examine the relationship between each SNP and disease from another point of view.

In general, our results show that filtering features using the harmonic mean of sensitivity and specificity (HMSS) through LDA is better than using LDA training accuracy or mutual information in our study. Our results also demonstrate that exhaustive search of a small subset with one SNP, two SNPs, or a 3-SNP subset based on the best 100 composite 2-SNPs can find an optimal subset, and that further inclusion of more SNPs through a heuristic algorithm does not always increase the performance of SNP subsets. Although sequential forward floating selection can be applied to prevent the nesting effect of forward selection, it does not always outperform the latter, owing to overfitting from observing more complex subset states.

Our results also indicate that HMSS, as a criterion for evaluating the classification ability of a function, can be used on imbalanced data without modifying the original dataset, unlike classification accuracy. Our four studies suggest that the sequential information bottleneck (sIB), a new unsupervised technique, can be adopted to predict the outcome, and that its ability to detect the target status is superior to that of traditional LDA in this study.

From our results, the best test probability-HMSS for predicting CVD, stroke, CAD, and psoriasis through sIB is 0.59406, 0.641815, 0.645315, and 0.678658, respectively. In terms of group prediction accuracy, the highest test accuracy of sIB for diagnosing a normal status among controls can reach 0.708999, 0.863216, 0.639918, and 0.850275, respectively, in the four studies if the test accuracy among cases is required to be at least 0.4. On the other hand, the highest test accuracy of sIB for diagnosing a disease among cases can reach 0.748644, 0.789916, 0.705701, and 0.749436, respectively, in the four studies if the test accuracy among controls is required to be at least 0.4.

A further genome-wide association study through chi-square testing shows that no significant SNPs were detected at the cut-off level 9.09451E-08 in the Framingham Heart Study of CVD. Results in the WTCCC study detected only two significant SNPs associated with CAD. In the genome-wide study of psoriasis, most of the top 20 SNP markers with impressive classification accuracy were also significantly associated with the disease through the chi-square test at the cut-off value 1.11E-07.

Although our classification methods can achieve high accuracy in this study, complete descriptions of those classification results (95% confidence intervals or statistical tests of differences) require more cost-effective methods or an efficient computing system, neither of which could be accomplished in our genome-wide study. We should also note that the purpose of this study is to identify subsets of SNPs with high prediction ability; SNPs with good discriminant power are not necessarily causal markers for the disease.
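The HMSS criterion used throughout can be sketched in a few lines: it rewards a classifier only when sensitivity and specificity are both high, which is why it tolerates imbalanced data better than raw accuracy. The function below is an illustrative implementation, not the dissertation's code.

```python
# Sketch of the harmonic mean of sensitivity and specificity (HMSS).
import numpy as np

def hmss(y_true, y_pred):
    """y_true, y_pred: binary arrays (1 = case, 0 = control)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    sens = np.mean(y_pred[y_true == 1] == 1)   # true positive rate
    spec = np.mean(y_pred[y_true == 0] == 0)   # true negative rate
    return 2 * sens * spec / (sens + spec) if (sens + spec) > 0 else 0.0
```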
Abstract:
Objectives. This paper seeks to assess the effect of regression model misspecification on statistical power in a variety of situations.

Methods and results. The effect of misspecification in regression can be approximated by evaluating the correlation between the correct specification and the misspecification of the outcome variable (Harris 2010). In this paper, three misspecified models (linear, categorical, and fractional polynomial) were considered. In the first section, the mathematical method of calculating the correlation between correct and misspecified models with simple mathematical forms is derived and demonstrated. In the second section, data from the National Health and Nutrition Examination Survey (NHANES 2007-2008) were used to examine such correlations. Our study shows that, compared with linear or categorical models, the fractional polynomial models, with their higher correlations, provided a better approximation of the true relationship, as illustrated by LOESS regression. In the third section, we present the results of simulation studies demonstrating that misspecification in regression can produce marked decreases in power with small sample sizes. However, the categorical model had the greatest power, ranging from 0.877 to 0.936 depending on sample size and outcome variable used. The power of the fractional polynomial model was close to that of the linear model, ranging from 0.69 to 0.83, and appeared to be affected by the increased degrees of freedom of that model.

Conclusion. Correlations between alternative model specifications can provide a good approximation of the effect of misspecification on statistical power when the sample size is large. When model specifications have known simple mathematical forms, such correlations can be calculated mathematically. Actual public health data from NHANES 2007-2008 were used as examples to demonstrate situations with unknown or complex correct model specifications. Simulation of power for misspecified models confirmed the results based on correlation methods and also illustrated the effect of model degrees of freedom on power.
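A minimal version of the power simulation described in the third section might look like the sketch below: fit a deliberately misspecified linear model to a quadratic truth and estimate power as the rejection rate over repeated simulations. Sample size, effect size, and noise level are arbitrary assumptions, not the paper's settings.

```python
# Power of a misspecified linear fit against a quadratic true relationship.
import numpy as np
from scipy import stats

def power_linear_fit(n=100, n_sim=2000, alpha=0.05, seed=0):
    """Returns the fraction of simulations in which the linear slope test
    rejects the null, i.e., the empirical power of the misspecified model."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sim):
        x = rng.uniform(-1, 1, n)
        y = 0.5 * x**2 + rng.normal(0, 1, n)         # true model is quadratic
        slope, intercept, r, p, se = stats.linregress(x, y)  # misspecified linear fit
        rejections += p < alpha
    return rejections / n_sim  # near alpha here: the linear term misses the curvature
```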