3 results for R15 - Econometric and Input Output Models

at Duke University


Relevance:

100.00%

Publisher:

Abstract:

Improvements in genomic technology, in both the increased speed and reduced cost of sequencing, have expanded our appreciation of the abundance of human genetic variation. However, the sheer amount of variation, as well as its varying type and genomic content, poses a challenge in understanding the clinical consequence of a single mutation. This work uses several methodologies to interpret the observed variation in the human genome and presents novel strategies for predicting allele pathogenicity.

Using the zebrafish model system as an in vivo assay of allele function, we identified a novel driver of Bardet-Biedl Syndrome (BBS) in CEP76. A combination of targeted sequencing of 785 cilia-associated genes in a cohort of BBS patients and subsequent in vivo functional assays recapitulating the human phenotype gave strong evidence for the role of CEP76 mutations in the pathology of an affected family. This portion of the work demonstrated the necessity of functional testing in validating disease-associated mutations, and added to the catalogue of known BBS disease genes.

Further study into the role of copy-number variations (CNVs) in a cohort of BBS patients showed the significant contribution of CNVs to disease pathology. Using high-density array comparative genomic hybridization (aCGH), we were able to identify pathogenic CNVs as small as several hundred bp. Dissection of constituent genes and in vivo experiments investigating epistatic interactions between affected genes allowed an appreciation of several paradigms by which CNVs can contribute to disease. This study revealed that the contribution of CNVs to disease in BBS patients is much higher than previously expected and demonstrated the necessity of considering CNV contributions in future (and retrospective) investigations of human genetic disease.

Finally, we used a combination of comparative genomics and in vivo complementation assays to identify second-site compensatory modification of pathogenic alleles. These pathogenic alleles, which are found compensated in other species (termed compensated pathogenic deviations [CPDs]), represent a significant fraction (3–10%) of human disease-associated alleles. In silico pathogenicity prediction algorithms, a valuable method of allele prioritization, often misclassify these alleles as benign, leading to the omission of potentially informative variants in studies of human genetic disease. We created a mathematical model that predicts CPDs and putative compensatory sites, and we showed functionally in vivo that second-site mutations can mitigate the pathogenicity of disease alleles. Additionally, we made publicly available an in silico module for the prediction of CPDs and modifier sites.

These studies have advanced our ability to interpret the pathogenicity of multiple types of human variation and have made tools available for others to do the same.

Relevance:

100.00%

Publisher:

Abstract:

The problem of social diffusion has animated sociological thinking on topics ranging from the spread of an idea, an innovation, or a disease to the foundations of collective behavior and political polarization. While network diffusion has been a productive metaphor, the reality of diffusion processes is often muddier. Ideas and innovations diffuse differently from diseases, but, with a few exceptions, the diffusion of ideas and innovations has been modeled under the same assumptions as the diffusion of disease. In this dissertation, I develop two new diffusion models for "socially meaningful" contagions that address two of the most significant limitations of current diffusion models: (1) the assumption that contagions can spread only along observed ties, and (2) the assumption that contagions do not change as they spread between people. I augment insights from these statistical and simulation models with an analysis of an empirical case of diffusion: the use of enterprise collaboration software in a large technology company. I focus the empirical study on when people abandon innovations, a crucial and understudied aspect of the diffusion of innovations. Using timestamped posts, I analyze in fine-grained detail when people abandon the software.

To address the first problem, I suggest a latent space diffusion model. Rather than treating ties as stable conduits for information, the latent space diffusion model treats ties as random draws from an underlying social space and simulates diffusion over that space. To address the second problem, I suggest a diffusion model with schemas. Rather than treating information as though it spreads unchanged, the schema diffusion model allows people to modify the information they receive to fit an underlying mental model before they pass it to others. Theoretically, the social space model integrates actor ties and attributes simultaneously in a single social plane, while incorporating schemas into diffusion processes gives explicit form to the reciprocal influences that cognition and social environment have on each other. Practically, the latent space diffusion model produces statistically consistent diffusion estimates where using the network alone does not, and the diffusion-with-schemas model shows that introducing some cognitive processing into diffusion processes changes the rate and ultimate distribution of the spreading information. Combining the latent space models with a schema notion for actors thus improves our models of social diffusion both theoretically and practically.
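
A minimal sketch of the latent-space intuition, with assumptions that are not in the dissertation: actors receive Gaussian latent positions, tie probabilities decay logistically with latent distance, and adoption spreads as a simple contagion over those probabilities rather than over a fixed observed network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: n actors embedded in a 2-D latent social space.
n, dim = 200, 2
positions = rng.normal(size=(n, dim))  # latent coordinates (assumed Gaussian)

# Tie probability decays with latent distance (an illustrative logistic choice).
dists = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
tie_prob = 1.0 / (1.0 + np.exp(dists - 1.0))
np.fill_diagonal(tie_prob, 0.0)

# Diffusion over the latent space: every current adopter can transmit to every
# other actor with probability weighted by the latent tie probability, instead
# of only along observed ties.
adopted = np.zeros(n, dtype=bool)
adopted[rng.choice(n, size=5, replace=False)] = True  # random seed adopters
transmissibility = 0.05

for step in range(30):
    exposure = tie_prob[adopted].sum(axis=0)           # pressure from adopters
    newly = rng.random(n) < 1.0 - (1.0 - transmissibility) ** exposure
    adopted |= newly

print(f"final adopters: {adopted.sum()} of {n}")
```

A schema variant would additionally transform the transmitted content at each hop (e.g., snapping a message toward each actor's prior category) before passing it on; that step is omitted here.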

The empirical case study focuses on how the changing value of an innovation, introduced by the innovation's network externalities, influences when people abandon the innovation. In it, I find that people are least likely to abandon an innovation when other people in their neighborhood currently use the software as well. The effect is particularly pronounced for supervisors' current use and for the number of supervisory team members who currently use the software. This case study not only points to an important process in the diffusion of innovation, but also suggests a new approach to collecting and analyzing data on organizational processes: computerized collaboration systems.

Relevance:

100.00%

Publisher:

Abstract:

Fitting statistical models is computationally challenging when the sample size or the dimension of the dataset is huge. An attractive approach for down-scaling the problem size is to first partition the dataset into subsets and then fit using distributed algorithms. The dataset can be partitioned either horizontally (in the sample space) or vertically (in the feature space), and the challenge lies in defining an algorithm with low communication cost, theoretical guarantees, and excellent practical performance in general settings. For sample space partitioning, I propose a MEdian Selection Subset AGgregation Estimator (message) algorithm to address these issues. The algorithm applies feature selection in parallel to each subset using regularized regression or a Bayesian variable selection method, calculates the "median" feature inclusion index, estimates coefficients for the selected features in parallel on each subset, and then averages these estimates. The algorithm is simple, involves minimal communication, scales efficiently in sample size, and has theoretical guarantees. I provide extensive experiments showing excellent performance in feature selection, estimation, prediction, and computation time relative to standard competitors.
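
A minimal sketch of the message idea, assuming a plain linear model, the Lasso as the per-subset selection step, and a majority-vote reading of the median inclusion index; the subsets are processed sequentially here for simplicity, whereas the algorithm runs them in parallel, and the actual procedure and its theory are richer than this.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

def message_sketch(X, y, n_subsets=5, seed=0):
    """Illustrative message-style estimator: partition rows, select features
    on each subset, take the median inclusion vote, then refit and average."""
    rng = np.random.default_rng(seed)
    subsets = np.array_split(rng.permutation(len(y)), n_subsets)

    # Step 1: feature selection on each subset (parallelizable).
    inclusion = []
    for rows in subsets:
        lasso = LassoCV(cv=5).fit(X[rows], y[rows])
        inclusion.append(lasso.coef_ != 0)

    # Step 2: "median" feature inclusion index -- keep a feature if it is
    # selected in at least half of the subsets.
    selected = np.median(np.array(inclusion, dtype=float), axis=0) >= 0.5

    # Step 3: estimate coefficients for the selected features on each subset,
    # then average the estimates across subsets.
    coefs = np.zeros((n_subsets, selected.sum()))
    for i, rows in enumerate(subsets):
        coefs[i] = LinearRegression().fit(X[rows][:, selected], y[rows]).coef_
    return selected, coefs.mean(axis=0)

# Hypothetical synthetic example: 5 true signals out of 50 features.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 50))
beta = np.zeros(50); beta[:5] = 2.0
y = X @ beta + rng.normal(size=1000)
selected, beta_hat = message_sketch(X, y)
print(np.flatnonzero(selected), beta_hat.round(2))
```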

While sample space partitioning is useful in handling datasets with a large sample size, feature space partitioning is more effective when the data dimension is high. Existing methods for partitioning features, however, are either vulnerable to high correlations or inefficient in reducing the model dimension. In the thesis, I propose a new embarrassingly parallel framework named DECO for distributed variable selection and parameter estimation. In DECO, variables are first partitioned and allocated to m distributed workers. The decorrelated subset data within each worker are then fitted via any algorithm designed for high-dimensional problems. We show that by incorporating the decorrelation step, DECO can achieve consistent variable selection and parameter estimation on each subset with (almost) no assumptions. In addition, the convergence rate is nearly minimax optimal for both sparse and weakly sparse models and does not depend on the partition number m. Extensive numerical experiments are provided to illustrate the performance of the new framework.
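
A rough sketch of the DECO workflow, with assumptions flagged: the decorrelation operator below (a ridge-regularized whitening of the row space of X) is a simplification of the framework's actual step, and the Lasso is just one possible high-dimensional solver applied to each column block; the blocks are fitted sequentially here but would sit on separate workers in practice.

```python
import numpy as np
from scipy.linalg import sqrtm
from sklearn.linear_model import LassoCV

def deco_sketch(X, y, n_workers=4, ridge=1.0, seed=0):
    """Illustrative DECO-style fit: decorrelate the design globally, then
    partition columns across workers and fit each block independently."""
    n, p = X.shape
    # Decorrelation step (assumed form): whiten the row space of X.
    F = np.real(sqrtm(np.linalg.inv(X @ X.T / p + ridge * np.eye(n))))
    X_t, y_t = F @ X, F @ y

    # Partition the feature space and fit each block with a high-dimensional
    # method; each block could be handled by a separate worker in parallel.
    rng = np.random.default_rng(seed)
    blocks = np.array_split(rng.permutation(p), n_workers)
    beta = np.zeros(p)
    for block in blocks:
        beta[block] = LassoCV(cv=5).fit(X_t[:, block], y_t).coef_
    return beta

# Hypothetical high-dimensional example: p > n.
rng = np.random.default_rng(2)
n, p = 200, 500
X = rng.normal(size=(n, p))
beta_true = np.zeros(p); beta_true[:5] = 3.0
y = X @ beta_true + rng.normal(size=n)
beta_hat = deco_sketch(X, y)
print(np.flatnonzero(np.abs(beta_hat) > 0.5))
```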

For datasets with both large sample sizes and high dimensionality, I propose a new "divide-and-conquer" framework, DEME (DECO-message), that leverages both the DECO and the message algorithms. The new framework first partitions the dataset in the sample space into row cubes using message and then partitions the feature space of the cubes using DECO. This procedure is equivalent to partitioning the original data matrix into multiple small blocks, each of a feasible size that can be stored and fitted on a single computer in parallel. The results are then synthesized via the DECO and message algorithms in reverse order to produce the final output. The whole framework is extremely scalable.
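
To show how the two pieces could compose, here is a toy DEME-style sketch under strong simplifications made for brevity: the DECO decorrelation step is omitted, each block is fitted with a plain Lasso, and the reverse-order synthesis is reduced to reassembling column-block estimates within each row cube and then applying a message-like median vote and average across cubes.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Hypothetical synthetic data: moderately large n and p, 5 true signals.
rng = np.random.default_rng(3)
n, p = 800, 200
X = rng.normal(size=(n, p))
beta_true = np.zeros(p); beta_true[:5] = 3.0
y = X @ beta_true + rng.normal(size=n)

# Partition the data matrix into small (row cube, column block) pieces,
# each of a size one worker could store and fit.
n_row_cubes, n_col_blocks = 4, 4
row_cubes = np.array_split(rng.permutation(n), n_row_cubes)
col_blocks = np.array_split(rng.permutation(p), n_col_blocks)

# Forward pass: fit every block; reassemble a full coefficient vector per cube.
cube_betas = np.zeros((n_row_cubes, p))
for i, rows in enumerate(row_cubes):
    for cols in col_blocks:
        fit = LassoCV(cv=5).fit(X[np.ix_(rows, cols)], y[rows])
        cube_betas[i, cols] = fit.coef_

# Reverse pass, message-like: majority vote on inclusion across row cubes,
# then average the surviving coefficients.
selected = np.median((cube_betas != 0).astype(float), axis=0) >= 0.5
beta_hat = np.where(selected, cube_betas.mean(axis=0), 0.0)
print(np.flatnonzero(selected), beta_hat[selected].round(2))
```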