5 resultados para 380305 Knowledge Representation and Machine Learning
em Duke University
Resumo:
Spectral CT using a photon counting x-ray detector (PCXD) shows great potential for measuring material composition based on energy dependent x-ray attenuation. Spectral CT is especially suited for imaging with K-edge contrast agents to address the otherwise limited contrast in soft tissues. We have developed a micro-CT system based on a PCXD. This system enables full spectrum CT in which the energy thresholds of the PCXD are swept to sample the full energy spectrum for each detector element and projection angle. Measurements provided by the PCXD, however, are distorted due to undesirable physical eects in the detector and are very noisy due to photon starvation. In this work, we proposed two methods based on machine learning to address the spectral distortion issue and to improve the material decomposition. This rst approach is to model distortions using an articial neural network (ANN) and compensate for the distortion in a statistical reconstruction. The second approach is to directly correct for the distortion in the projections. Both technique can be done as a calibration process where the neural network can be trained using 3D printed phantoms data to learn the distortion model or the correction model of the spectral distortion. This replaces the need for synchrotron measurements required in conventional technique to derive the distortion model parametrically which could be costly and time consuming. The results demonstrate experimental feasibility and potential advantages of ANN-based distortion modeling and correction for more accurate K-edge imaging with a PCXD. Given the computational eciency with which the ANN can be applied to projection data, the proposed scheme can be readily integrated into existing CT reconstruction pipelines.
Resumo:
Constant technology advances have caused data explosion in recent years. Accord- ingly modern statistical and machine learning methods must be adapted to deal with complex and heterogeneous data types. This phenomenon is particularly true for an- alyzing biological data. For example DNA sequence data can be viewed as categorical variables with each nucleotide taking four different categories. The gene expression data, depending on the quantitative technology, could be continuous numbers or counts. With the advancement of high-throughput technology, the abundance of such data becomes unprecedentedly rich. Therefore efficient statistical approaches are crucial in this big data era.
Previous statistical methods for big data often aim to find low dimensional struc- tures in the observed data. For example in a factor analysis model a latent Gaussian distributed multivariate vector is assumed. With this assumption a factor model produces a low rank estimation of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents. The mixture pro- portions of topics, represented by a Dirichlet distributed variable, is assumed. This dissertation proposes several novel extensions to the previous statistical methods that are developed to address challenges in big data. Those novel methods are applied in multiple real world applications including construction of condition specific gene co-expression networks, estimating shared topics among newsgroups, analysis of pro- moter sequences, analysis of political-economics risk data and estimating population structure from genotype data.
Resumo:
Subspaces and manifolds are two powerful models for high dimensional signals. Subspaces model linear correlation and are a good fit to signals generated by physical systems, such as frontal images of human faces and multiple sources impinging at an antenna array. Manifolds model sources that are not linearly correlated, but where signals are determined by a small number of parameters. Examples are images of human faces under different poses or expressions, and handwritten digits with varying styles. However, there will always be some degree of model mismatch between the subspace or manifold model and the true statistics of the source. This dissertation exploits subspace and manifold models as prior information in various signal processing and machine learning tasks.
A near-low-rank Gaussian mixture model measures proximity to a union of linear or affine subspaces. This simple model can effectively capture the signal distribution when each class is near a subspace. This dissertation studies how the pairwise geometry between these subspaces affects classification performance. When model mismatch is vanishingly small, the probability of misclassification is determined by the product of the sines of the principal angles between subspaces. When the model mismatch is more significant, the probability of misclassification is determined by the sum of the squares of the sines of the principal angles. Reliability of classification is derived in terms of the distribution of signal energy across principal vectors. Larger principal angles lead to smaller classification error, motivating a linear transform that optimizes principal angles. This linear transformation, termed TRAIT, also preserves some specific features in each class, being complementary to a recently developed Low Rank Transform (LRT). Moreover, when the model mismatch is more significant, TRAIT shows superior performance compared to LRT.
The manifold model enforces a constraint on the freedom of data variation. Learning features that are robust to data variation is very important, especially when the size of the training set is small. A learning machine with large numbers of parameters, e.g., deep neural network, can well describe a very complicated data distribution. However, it is also more likely to be sensitive to small perturbations of the data, and to suffer from suffer from degraded performance when generalizing to unseen (test) data.
From the perspective of complexity of function classes, such a learning machine has a huge capacity (complexity), which tends to overfit. The manifold model provides us with a way of regularizing the learning machine, so as to reduce the generalization error, therefore mitigate overfiting. Two different overfiting-preventing approaches are proposed, one from the perspective of data variation, the other from capacity/complexity control. In the first approach, the learning machine is encouraged to make decisions that vary smoothly for data points in local neighborhoods on the manifold. In the second approach, a graph adjacency matrix is derived for the manifold, and the learned features are encouraged to be aligned with the principal components of this adjacency matrix. Experimental results on benchmark datasets are demonstrated, showing an obvious advantage of the proposed approaches when the training set is small.
Stochastic optimization makes it possible to track a slowly varying subspace underlying streaming data. By approximating local neighborhoods using affine subspaces, a slowly varying manifold can be efficiently tracked as well, even with corrupted and noisy data. The more the local neighborhoods, the better the approximation, but the higher the computational complexity. A multiscale approximation scheme is proposed, where the local approximating subspaces are organized in a tree structure. Splitting and merging of the tree nodes then allows efficient control of the number of neighbourhoods. Deviation (of each datum) from the learned model is estimated, yielding a series of statistics for anomaly detection. This framework extends the classical {\em changepoint detection} technique, which only works for one dimensional signals. Simulations and experiments highlight the robustness and efficacy of the proposed approach in detecting an abrupt change in an otherwise slowly varying low-dimensional manifold.
Resumo:
Transcription factors (TFs) control the temporal and spatial expression of target genes by interacting with DNA in a sequence-specific manner. Recent advances in high throughput experiments that measure TF-DNA interactions in vitro and in vivo have facilitated the identification of DNA binding sites for thousands of TFs. However, it remains unclear how each individual TF achieves its specificity, especially in the case of paralogous TFs that recognize distinct target genomic sites despite sharing very similar DNA binding motifs. In my work, I used a combination of high throughput in vitro protein-DNA binding assays and machine-learning algorithms to characterize and model the binding specificity of 11 paralogous TFs from 4 distinct structural families. My work proves that even very closely related paralogous TFs, with indistinguishable DNA binding motifs, oftentimes exhibit differential binding specificity for their genomic target sites, especially for sites with moderate binding affinity. Importantly, the differences I identify in vitro and through computational modeling help explain, at least in part, the differential in vivo genomic targeting by paralogous TFs. Future work will focus on in vivo factors that might also be important for specificity differences between paralogous TFs, such as DNA methylation, interactions with protein cofactors, or the chromatin environment. In this larger context, my work emphasizes the importance of intrinsic DNA binding specificity in targeting of paralogous TFs to the genome.
Resumo:
Background: Worldwide, it is estimated that there are up to 150 million street children. Street children are an understudied, vulnerable population. While many studies have characterized street children’s physical health, few have addressed the circumstances and barriers to their utilization of health services.
Methods: A systematic literature review was conducted to understand the barriers and facilitators that street children face when accessing healthcare in low and middle income countries. Six databases were used to search for peer review literature and one database and Google Search engine were used to find grey literature (theses, dissertations, reports, etc.). There were no exclusions based on study design. Studies were eligible for inclusion if the study population included street children, the study location was a low and middle income country defined by the World Bank, AND whose subject pertained to healthcare.
In addition, a cross-sectional study was conducted between May 2015 and August 2015 with the goal of understanding knowledge, attitudes, and health seeking practices of street children residing in Battambang, Cambodia. Time location and purposive sampling were used to recruit community (control) and street children. Both boys and girls between the ages of 10 and 18 were recruited. Data was collected through a verbally administered survey. The knowledge, attitudes and health seeking practices of community and street children were compared to determine potential differences in healthcare utilization.
Results: Of the 2933 abstracts screened for inclusion in the systematic literature review, eleven articles met all the inclusion criteria and were found to be relevant. Cost and perceived stigma appeared to be the largest barriers street children faced when attempting to seek care. Street children preferred to receive care from a hospital. However, negative experiences and mistreatment by health providers deterred children from going there. Instead, street children would often self treat and/or purchase medicine from a pharmacy or drug vendor. Family and peer support were found to be important for facilitating treatment.
The survey found similar results to the systematic review. Forty one community and thirty four street children were included in the analysis. Both community and street children reported the hospital as their top choice for care. When asked if someone went with them to seek care, both community and street children reported that family members, usually mothers, accompanied them. Community and street children both reported perceived stigma. All children had good knowledge of preventative care.
Conclusions: While most current services lack the proper accommodations for street children, there is a great potential to adapt them to better address street children’s needs. Street children need health services that are sensitive to their situation. Subsidies in health service costs or provision of credit may be ways to reduce constraints street children face when deciding to seek healthcare. Health worker education and interventions to reduce stigma are needed to create a positive environment in which street children are admitted and treated for health concerns.