7 results for causal discovery

at Duke University


Relevance:

30.00%

Publisher:

Abstract:

Human genetics has been experiencing a wave of discoveries thanks to the development of several technologies, such as genome-wide association studies (GWAS), whole-exome sequencing, and whole-genome sequencing. Despite the many newly discovered variants associated with human diseases, several key challenges emerge after a genetic discovery. GWAS is known to be good at identifying the locus associated with the patient phenotype. However, the actual causal variants responsible for the phenotype are often elusive. Another challenge in human genetics is that even when the causal mutations are already known, the underlying biological effect might remain largely ambiguous. Functional evaluation plays a key role in addressing these challenges, both to identify the causal variants responsible for the phenotype and to develop further biological insights from disease-causing mutations.

We adopted various methods to characterize the effects of variants identified in human genetic studies, including patient genetic and phenotypic data, RNA chemistry, molecular biology, virology, multi-electrode arrays, and primary neuronal culture systems. Chapter 1 is a broad introduction to the motivation for and challenges of functional evaluation in human genetic studies, and to the background of several genetic discoveries, such as hepatitis C treatment response, for which we performed functional characterization.

Chapter 2 focuses on the characterization of causal variants following the GWAS for hepatitis C treatment response. We characterized a non-coding SNP (rs4803217) of IL28B (IFNL3) in high linkage disequilibrium (LD) with the discovery SNP identified in the GWAS. In this chapter, we used interdisciplinary approaches to characterize the effect of rs4803217 on RNA structure, disease association, and protein translation.

Chapter 3 describes another avenue of functional characterization following GWAS, focusing on the novel transcripts and proteins identified near the IL28B (IFNL3) locus. It has recently been speculated that this novel protein, named IFNL4, may affect HCV treatment response and clearance. In this chapter, we used molecular biology, virology, and patient genetic and phenotypic data to further characterize and understand the biology of IFNL4. The efforts in Chapters 2 and 3 provided new insights into the candidate causal variant(s) responsible for the GWAS signal for HCV treatment response. However, more evidence is still required before claims can be made about the exact causal roles of these variants in the GWAS association.

Chapter 4 aims to characterize a mutation already known to cause a disease (seizure) in a mouse model. We demonstrate the potential use of a multi-electrode array (MEA) system for functional characterization and drug testing of mutations found in neurological diseases, such as seizure. Functional characterization in neurological diseases is relatively challenging, and the available systematic tools are limited. This chapter presents exploratory research and an example of how to establish a system for broader use in functional characterization, with translational opportunities for mutations found in neurological diseases.

Overall, this dissertation spans a range of challenges in the functional evaluation of human genetic findings. Functional characterization of human mutations is expected to become more central in human genetics, because many biological questions remain to be answered after the explosion of human genetic discoveries. Recent advances in several technologies, including genome editing and pluripotent stem cells, are also expected to make new tools available for functional studies in human diseases.

Relevance:

20.00%

Publisher:

Abstract:

An enterprise information system (EIS) is an integrated data-applications platform characterized by diverse, heterogeneous, and distributed data sources. For many enterprises, a number of business processes still depend heavily on static rule-based methods and extensive human expertise. Enterprises are faced with the need for optimizing operation scheduling, improving resource utilization, discovering useful knowledge, and making data-driven decisions.

This thesis research is focused on real-time optimization and knowledge discovery that addresses workflow optimization, resource allocation, as well as data-driven predictions of process-execution times, order fulfillment, and enterprise service-level performance. In contrast to prior work on data analytics techniques for enterprise performance optimization, the emphasis here is on realizing scalable and real-time enterprise intelligence based on a combination of heterogeneous system simulation, combinatorial optimization, machine-learning algorithms, and statistical methods.

On-demand digital-print service is a representative enterprise requiring a powerful EIS. We use real-life data from Reischling Press, Inc. (RPI), a digital-print-service provider (PSP), to evaluate our optimization algorithms.

In order to handle the increase in volume and diversity of demands, we first present a high-performance, scalable, and real-time production scheduling algorithm for production automation based on an incremental genetic algorithm (IGA). The objective of this algorithm is to optimize the order dispatching sequence and balance resource utilization. Compared to prior work, this solution is scalable for a high volume of orders and it provides fast scheduling solutions for orders that require complex fulfillment procedures. Experimental results highlight its potential benefit in reducing production inefficiencies and enhancing the productivity of an enterprise.

We next discuss analysis and prediction of different attributes involved in hierarchical components of an enterprise. We start from a study of the fundamental processes related to real-time prediction. Our process-execution time and process-status prediction models integrate statistical methods with machine-learning algorithms. In addition to improving prediction accuracy compared to stand-alone machine-learning algorithms, they also provide a probabilistic estimate of the predicted status. An order generally consists of multiple series and parallel processes. We next introduce an order-fulfillment prediction model that combines the advantages of multiple classification models by incorporating flexible decision-integration mechanisms. Experimental results show that adopting due dates recommended by the model can significantly reduce the enterprise late-delivery ratio. Finally, we investigate service-level attributes that reflect the overall performance of an enterprise. We analyze and decompose time-series data into different components according to their hierarchical periodic nature, perform correlation analysis, and develop univariate prediction models for each component as well as multivariate models for correlated components. Predictions for the original time series are aggregated from the predictions of its components. In addition to a significant increase in mid-term prediction accuracy, this distributed modeling strategy also improves short-term time-series prediction accuracy.
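The decompose-then-aggregate idea in the abstract above can be illustrated with a minimal sketch. It assumes an additive series with one linear trend and one periodic component, each forecast by its own simple model. The function names and the choice of component models are illustrative assumptions, not the models used in the thesis.

```python
def fit_linear_trend(y):
    """Ordinary least-squares line through (0, y[0]), (1, y[1]), ...; needs len(y) >= 2."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope  # (intercept, slope)


def seasonal_means(residual, period):
    """Average the detrended values at each position within the period."""
    means = []
    for phase in range(period):
        values = residual[phase::period]
        means.append(sum(values) / len(values))
    return means


def forecast_by_components(y, period, horizon):
    """Forecast each component separately, then add the component forecasts."""
    intercept, slope = fit_linear_trend(y)
    residual = [v - (intercept + slope * t) for t, v in enumerate(y)]
    season = seasonal_means(residual, period)
    n = len(y)
    return [
        intercept + slope * t + season[t % period]  # trend forecast + periodic forecast
        for t in range(n, n + horizon)
    ]
```

A call such as `forecast_by_components(series, period=7, horizon=14)` would forecast two weeks of a daily series. A multi-level version would repeat the periodic step for each period (for example weekly, then yearly) before aggregating.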

In summary, this thesis research has led to a set of characterization, optimization, and prediction tools that allow an EIS to derive insightful knowledge from data and use it as guidance for production management. It is expected to provide solutions for enterprises to increase reconfigurability, automate more procedures, and obtain data-driven recommendations or effective decisions.

Relevance:

20.00%

Publisher:

Abstract:

© Institute of Mathematical Statistics, 2014. Motivated by recent findings in the field of consumer science, this paper evaluates the causal effect of debit cards on household consumption using population-based data from the Italy Survey on Household Income and Wealth (SHIW). Within the Rubin Causal Model, we focus on the estimand of population average treatment effect for the treated (PATT). We consider three existing estimators, based on regression, mixed matching and regression, and propensity score weighting, and propose a new doubly-robust estimator. A semiparametric specification based on power series for the potential outcomes and the propensity score is adopted. Cross-validation is used to select the order of the power series. We conduct a simulation study to compare the performance of the estimators. The key assumptions, overlap and unconfoundedness, are systematically assessed and validated in the application. Our empirical results suggest statistically significant positive effects of debit cards on monthly household spending in Italy.
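To make the doubly-robust idea concrete, here is a minimal sketch of one common textbook form of a doubly-robust estimator of the average treatment effect on the treated. It is not necessarily the estimator proposed in the paper. It assumes the propensity scores and the predicted untreated outcomes have already been fitted elsewhere (for example with the power-series models the abstract mentions), and the function name is hypothetical.

```python
def doubly_robust_att(treated, outcome, propensity, mu0):
    """Doubly-robust ATT estimate (normalized-weights form, an assumed variant).

    treated    : 0/1 treatment indicators
    outcome    : observed outcomes
    propensity : fitted P(treated = 1 | covariates); must be < 1 for controls
    mu0        : fitted outcome under no treatment, given covariates
    """
    n_treated = sum(treated)
    if n_treated == 0:
        raise ValueError("no treated units")

    # Residuals from the outcome model for the untreated potential outcome.
    resid = [y - m for y, m in zip(outcome, mu0)]

    # Mean residual among the treated: observed outcome minus modelled counterfactual.
    treated_part = sum(r for t, r in zip(treated, resid) if t == 1) / n_treated

    # Controls are reweighted by the odds e/(1-e) so they resemble the treated group.
    weights = [e / (1.0 - e) if t == 0 else 0.0 for t, e in zip(treated, propensity)]
    weight_sum = sum(weights)
    control_part = (
        sum(w * r for w, r in zip(weights, resid)) / weight_sum if weight_sum > 0 else 0.0
    )

    # The control term corrects any remaining bias in the outcome model.
    return treated_part - control_part
```

If the outcome model is correct, the control residuals average to zero and the estimate reduces to the regression estimate. If instead the propensity model is correct, the weighted control term removes the outcome model's bias. This is the sense in which the estimator is "doubly robust".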

Relevance:

20.00%

Publisher:

Abstract:

MOTIVATION: Technological advances that allow routine identification of high-dimensional risk factors have led to high demand for statistical techniques that enable full utilization of these rich sources of information for genetics studies. Variable selection for censored outcome data as well as control of false discoveries (i.e. inclusion of irrelevant variables) in the presence of high-dimensional predictors present serious challenges. This article develops a computationally feasible method based on boosting and stability selection. Specifically, we modified the component-wise gradient boosting to improve the computational feasibility and introduced random permutation in stability selection for controlling false discoveries. RESULTS: We have proposed a high-dimensional variable selection method by incorporating stability selection to control false discovery. Comparisons between the proposed method and the commonly used univariate and Lasso approaches for variable selection reveal that the proposed method yields fewer false discoveries. The proposed method is applied to study the associations of 2339 common single-nucleotide polymorphisms (SNPs) with overall survival among cutaneous melanoma (CM) patients. The results have confirmed that BRCA2 pathway SNPs are likely to be associated with overall survival, as reported by previous literature. Moreover, we have identified several new Fanconi anemia (FA) pathway SNPs that are likely to modulate survival of CM patients. AVAILABILITY AND IMPLEMENTATION: The related source code and documents are freely available at https://sites.google.com/site/bestumich/issues. CONTACT: yili@umich.edu.
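The combination of stability selection with random permutation can be sketched as follows. This is one plausible reading of the abstract, not the authors' algorithm. The base selector is swapped for a simple top-k ranking by absolute Pearson correlation rather than component-wise gradient boosting, the outcome is treated as uncensored, and all function names are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def top_k_features(columns, y, rows, k):
    """Stand-in base selector: the k features most correlated with y on `rows`."""
    y_sub = [y[i] for i in rows]
    scores = [(abs(pearson([col[i] for i in rows], y_sub)), j)
              for j, col in enumerate(columns)]
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]


def selection_frequencies(columns, y, k, n_subsamples, rng):
    """Fraction of random half-samples in which each feature is selected."""
    n = len(y)
    counts = [0] * len(columns)
    for _ in range(n_subsamples):
        rows = rng.sample(range(n), n // 2)
        for j in top_k_features(columns, y, rows, k):
            counts[j] += 1
    return [c / n_subsamples for c in counts]


def permutation_threshold(columns, y, k, n_subsamples, n_permutations, rng):
    """Largest selection frequency seen when y is shuffled (no true signal)."""
    null_max = 0.0
    for _ in range(n_permutations):
        y_perm = list(y)
        rng.shuffle(y_perm)  # break any real feature-outcome association
        freqs = selection_frequencies(columns, y_perm, k, n_subsamples, rng)
        null_max = max(null_max, max(freqs))
    return null_max
```

Features whose selection frequency on the real outcome exceeds the permutation threshold would be reported. Taking the maximum over permutations is a conservative choice made for this sketch. A quantile of the null frequencies is another option.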

Relevance:

20.00%

Publisher:

Abstract:

Intratumoral B lymphocytes are an integral part of the lung tumor microenvironment. Interrogation of the antibodies they express may improve our understanding of the host response to cancer and could be useful in elucidating novel molecular targets. We used two strategies to explore the repertoire of intratumoral B cell antibodies. First, we cloned VH and VL genes from single intratumoral B lymphocytes isolated from one lung tumor, expressed the genes as recombinant mAbs, and used the mAbs to identify the cognate tumor antigens. The Igs derived from intratumoral B cells demonstrated class switching, with a mean VH mutation frequency of 4%. Although there was no evidence for clonal expansion, these data are consistent with antigen-driven somatic hypermutation. Individual recombinant antibodies were polyreactive, although one clone demonstrated preferential immunoreactivity with tropomyosin 4 (TPM4). We found that higher levels of TPM4 antibodies were more common in cancer patients, but measurement of TPM4 antibody levels was not a sensitive test for detecting cancer. Second, in an effort to focus our recombinant antibody expression efforts on those B cells that displayed evidence of clonal expansion driven by antigen stimulation, we performed deep sequencing of the Ig genes of B cells collected from seven different tumors. Deep sequencing demonstrated somatic hypermutation but no dominant clones. These strategies may be useful for the study of B cell antibody expression, although identification of a dominant clone and unique therapeutic targets may require extensive investigation.

Relevance:

20.00%

Publisher:

Abstract:

Constitutive biosynthesis of lipid A via the Raetz pathway is essential for the viability and fitness of Gram-negative bacteria, including Chlamydia trachomatis. Although nearly all of the enzymes in the lipid A biosynthetic pathway are highly conserved across Gram-negative bacteria, the cleavage of the pyrophosphate group of UDP-2,3-diacyl-GlcN (UDP-DAGn) to form lipid X is carried out by two unrelated enzymes: LpxH in beta- and gammaproteobacteria and LpxI in alphaproteobacteria. The intracellular pathogen C. trachomatis lacks an ortholog for either of these two enzymes, and yet it synthesizes lipid A and exhibits conservation of genes encoding other lipid A enzymes. Employing a complementation screen against a C. trachomatis genomic library using a conditional-lethal lpxH mutant Escherichia coli strain, we have identified an open reading frame (Ct461, renamed lpxG) encoding a previously uncharacterized enzyme that complements the UDP-DAGn hydrolase function in E. coli and catalyzes the conversion of UDP-DAGn to lipid X in vitro. LpxG shows little sequence similarity to either LpxH or LpxI, highlighting LpxG as the founding member of a third class of UDP-DAGn hydrolases. Overexpression of LpxG results in toxic accumulation of lipid X and profoundly reduces the infectivity of C. trachomatis, validating LpxG as the long-sought-after UDP-DAGn pyrophosphatase in this prominent human pathogen. The complementation approach presented here overcomes the lack of suitable genetic tools for C. trachomatis and should be broadly applicable for the functional characterization of other essential C. trachomatis genes.

IMPORTANCE: Chlamydia trachomatis is a leading cause of infectious blindness and sexually transmitted disease. Due to the lack of robust genetic tools, the functions of many Chlamydia genes remain uncharacterized, including the essential gene encoding the UDP-DAGn pyrophosphatase activity for the biosynthesis of lipid A, the membrane anchor of lipooligosaccharide and the predominant lipid species of the outer leaflet of the bacterial outer membrane. We designed a complementation screen against the C. trachomatis genomic library using a conditional-lethal mutant of E. coli and identified the missing essential gene in the lipid A biosynthetic pathway, which we designated lpxG. We show that LpxG is a member of the calcineurin-like phosphatases and displays robust UDP-DAGn pyrophosphatase activity in vitro. Overexpression of LpxG in C. trachomatis leads to the accumulation of the predicted lipid intermediate and reduces bacterial infectivity, validating the in vivo function of LpxG and highlighting the importance of regulated lipid A biosynthesis in C. trachomatis.