920 results for nonparametric inference


Relevance: 20.00%

Abstract:

Thesis (Ph.D.)--University of Washington, 2016-08

Relevance: 20.00%

Abstract:

Causal inference with a continuous treatment is a relatively under-explored problem. In this dissertation, we adopt the potential outcomes framework. Potential outcomes are responses that would be seen for a unit under all possible treatments. In an observational study where the treatment is continuous, the potential outcomes are an uncountably infinite set indexed by treatment dose. We parameterize this unobservable set as a linear combination of a finite number of basis functions whose coefficients vary across units. This leads to new techniques for estimating the population average dose-response function (ADRF). Some techniques require a model for the treatment assignment given covariates, some require a model for predicting the potential outcomes from covariates, and some require both. We develop these techniques using a framework of estimating functions, compare them to existing methods for continuous treatments, and simulate their performance in a population where the ADRF is linear and the models for the treatment and/or outcomes may be misspecified. We also extend the comparisons to a data set of lottery winners in Massachusetts. Next, we describe the methods and functions in the R package causaldrf using data from the National Medical Expenditure Survey (NMES) and Infant Health and Development Program (IHDP) as examples. Additionally, we analyze the National Growth and Health Study (NGHS) data set and deal with the issue of missing data. Lastly, we discuss future research goals and possible extensions.
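The basis-function parameterization described above can be written out as a short worked formula. This is only a sketch: the symbols (Y_i(t) for the potential outcome of unit i at dose t, b_k for the K basis functions, theta_ik for the unit-specific coefficients, mu for the ADRF) are assumed notation and may differ from the dissertation's.

```latex
Y_i(t) = \sum_{k=1}^{K} \theta_{ik}\, b_k(t),
\qquad
\mu(t) = \mathbb{E}\left[ Y_i(t) \right]
       = \sum_{k=1}^{K} \mathbb{E}\left[ \theta_{ik} \right] b_k(t)
```

Under this assumed form, estimating the ADRF reduces to estimating the K population-mean coefficients. With b_1(t) = 1 and b_2(t) = t the ADRF is linear in the dose, which matches the simulation setting mentioned in the abstract.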

Relevance: 20.00%

Abstract:

Statistical association between a single nucleotide polymorphism (SNP) genotype and a quantitative trait in genome-wide association studies is usually assessed using a linear regression model or, in the case of non-normally distributed trait values, using the Kruskal-Wallis test. While linear regression models assume an additive mode of inheritance via equidistant genotype scores, the Kruskal-Wallis test merely tests for global differences in trait values among the three genotype groups. Both approaches thus exhibit suboptimal power when the underlying inheritance mode is dominant or recessive. Furthermore, these tests do not perform well in the common situations where only a few trait values are available in a rare genotype category (imbalance), or where the values associated with the three genotype categories exhibit unequal variance (variance heterogeneity). We propose a maximum test based on a Marcus-type multiple contrast test for relative effect sizes. This test allows model-specific testing of a dominant, additive or recessive mode of inheritance, and it is robust against variance heterogeneity. We show how to obtain mode-specific simultaneous confidence intervals for the relative effect sizes to aid in interpreting the biological relevance of the results. Further, we discuss the use of a related all-pairwise comparisons contrast test with range-preserving confidence intervals as an alternative to the Kruskal-Wallis heterogeneity test. We applied the proposed maximum test to the Bogalusa Heart Study dataset, and gained a remarkable increase in the power to detect association, particularly for rare genotypes. Our simulation study also demonstrated that the proposed non-parametric tests control the family-wise error rate in the presence of non-normality and variance heterogeneity, contrary to the standard parametric approaches.
We provide a publicly available R library nparcomp that can be used to estimate simultaneous confidence intervals or compatible multiplicity-adjusted p-values associated with the proposed maximum test.
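For orientation, the baseline Kruskal-Wallis test that the abstract compares against can be sketched in a few lines of standard-library Python. This is not the proposed maximum test and not the nparcomp API. The function names and the synthetic genotype groups are hypothetical, and the p-value uses the large-sample chi-square approximation, whose survival function for three groups (2 degrees of freedom) is exp(-H/2).

```python
import math
import random


def midranks(values):
    """Return 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            ranks[idx] = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        i = j + 1
    return ranks


def kruskal_wallis_3(groups):
    """Kruskal-Wallis H statistic and asymptotic p-value for three groups."""
    if len(groups) != 3 or any(len(g) == 0 for g in groups):
        raise ValueError("expected three non-empty groups")
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = midranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3.0 * (n + 1)
    # Standard tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    tie_sizes = [pooled.count(v) for v in set(pooled)]
    correction = 1.0 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    h /= correction
    return h, math.exp(-h / 2.0)  # chi-square survival function, 2 df


# Synthetic, skewed trait values for genotypes AA, AB, BB (made-up data):
# a recessive-style shift that affects only the rare BB group.
random.seed(0)
aa = [random.expovariate(1.0) for _ in range(60)]
ab = [random.expovariate(1.0) for _ in range(35)]
bb = [random.expovariate(1.0) + 1.5 for _ in range(5)]
h_stat, p_value = kruskal_wallis_3([aa, ab, bb])
print(f"H = {h_stat:.3f}, p = {p_value:.4f}")
```

The test only reports whether the three groups differ somewhere. It does not say which inheritance mode fits, which is the gap the contrast-based approach in the abstract is meant to address.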

Relevance: 20.00%

Abstract:

We present an IP-based nonparametric (revealed preference) testing procedure for rational consumption behavior in terms of general collective models, which include consumption externalities and public consumption. An empirical application to data drawn from the Russia Longitudinal Monitoring Survey (RLMS) demonstrates the practical usefulness of the procedure. Finally, we present extensions of the testing procedure to evaluate the goodness-of-fit of the collective model subject to testing, and to quantify and improve the power of the corresponding collective rationality tests.

Relevance: 20.00%

Abstract:

In physics, one attempts to infer the rules governing a system given only the results of imperfect measurements. Hence, microscopic theories may be effectively indistinguishable experimentally. We develop an operationally motivated procedure to identify the corresponding equivalence classes of states, and argue that the renormalization group (RG) arises from the inherent ambiguities associated with the classes: one encounters flow parameters as, e.g., a regulator, a scale, or a measure of precision, which specify representatives in a given equivalence class. This provides a unifying framework and reveals the role played by information in renormalization. We validate this idea by showing that it justifies the use of low-momenta n-point functions as statistically relevant observables around a Gaussian hypothesis. These results enable the calculation of distinguishability in quantum field theory. Our methods also provide a way to extend renormalization techniques to effective models which are not based on the usual quantum-field formalism, and elucidate the relationships between various types of RG.

Relevance: 20.00%

Abstract:

The size of online image datasets is constantly increasing. Considering an image dataset with millions of images, image retrieval becomes a seemingly intractable problem for exhaustive similarity search algorithms. Hashing methods, which encode high-dimensional descriptors into compact binary strings, have become very popular because of their high efficiency in search and storage capacity. In the first part, we propose a multimodal retrieval method based on latent feature models. The procedure consists of a nonparametric Bayesian framework for learning underlying semantically meaningful abstract features in a multimodal dataset, a probabilistic retrieval model that allows cross-modal queries and an extension model for relevance feedback. In the second part, we focus on supervised hashing with kernels. We describe a flexible hashing procedure that treats binary codes and pairwise semantic similarity as latent and observed variables, respectively, in a probabilistic model based on Gaussian processes for binary classification. We present a scalable inference algorithm with the sparse pseudo-input Gaussian process (SPGP) model and distributed computing. In the last part, we define an incremental hashing strategy for dynamic databases where new images are added to the databases frequently. The method is based on a two-stage classification framework using binary and multi-class SVMs. The proposed method also enforces balance in binary codes by an imbalance penalty to obtain higher quality binary codes. We learn hash functions by an efficient algorithm where the NP-hard problem of finding optimal binary codes is solved via cyclic coordinate descent and SVMs are trained in a parallelized incremental manner. For modifications like adding images from an unseen class, we propose an incremental procedure for effective and efficient updates to the previous hash functions.
Experiments on three large-scale image datasets demonstrate that the incremental strategy is capable of efficiently updating hash functions to the same retrieval performance as hashing from scratch.

Relevance: 20.00%

Abstract:

Phylogenetic inference consists of searching for the evolutionary tree that best explains the genealogical relationships among a set of species. Phylogenetic analysis has many applications in areas such as biology, ecology and paleontology. Several criteria have been defined for inferring phylogenies, among them maximum parsimony and maximum likelihood. The first seeks the phylogenetic tree that minimizes the number of evolutionary steps needed to describe the evolutionary history of the species, while the second seeks the tree with the highest probability of producing the observed data under an evolutionary model. The search for a phylogenetic tree can be formulated as a multi-objective optimization problem, which aims to find trees that simultaneously satisfy (as far as possible) both the parsimony and the likelihood criteria. Because these criteria differ, there is no single optimal solution (a single tree) but a set of compromise solutions, which are called "Pareto optimal". Evolutionary algorithms are now used successfully to find these solutions. These algorithms are a family of non-exact techniques inspired by the process of natural selection, and they usually find high-quality solutions to complex optimization problems. They work by manipulating a set of trial solutions (trees, in the phylogenetic case) with operators: some exchange information between solutions, simulating DNA crossover, and others apply random modifications, simulating mutation. The result is an approximation to the Pareto-optimal set, which can be displayed in a graph so that the domain expert (the biologist, in the case of inference) can choose the compromise solution of greatest interest.
For multi-objective optimization applied to phylogenetic inference, there is an open-source software tool called MO-Phylogenetics, designed to solve inference problems with both classic and state-of-the-art evolutionary algorithms. REFERENCES [1] C.A. Coello Coello, G.B. Lamont, D.A. van Veldhuizen. Evolutionary algorithms for solving multi-objective problems. Springer. August 2007. [2] C. Zambrano-Vega, A.J. Nebro, J.F. Aldana-Montes. MO-Phylogenetics: a phylogenetic inference software tool with multi-objective evolutionary metaheuristics. Methods in Ecology and Evolution. In press. February 2016.
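The notion of a Pareto-optimal set of trees can be illustrated with a minimal sketch. It assumes each candidate tree has already been scored, with parsimony as a step count to minimize and likelihood as a log-likelihood to maximize. The tree names and scores are made up, and the filter is a plain non-dominated check, not the evolutionary search used by MO-Phylogenetics.

```python
def dominates(a, b):
    """True if tree a is at least as good as b on both criteria and strictly
    better on one. Each tree is (name, parsimony_steps, log_likelihood)."""
    no_worse = a[1] <= b[1] and a[2] >= b[2]
    strictly_better = a[1] < b[1] or a[2] > b[2]
    return no_worse and strictly_better


def pareto_front(trees):
    """Keep the trees that no other tree dominates."""
    return [t for t in trees if not any(dominates(o, t) for o in trees)]


# Hypothetical candidate trees: (name, parsimony steps, log-likelihood).
candidates = [
    ("T1", 120, -3410.5),
    ("T2", 118, -3422.0),
    ("T3", 125, -3405.1),
    ("T4", 121, -3415.0),  # dominated by T1 on both criteria
    ("T5", 118, -3430.0),  # same steps as T2 but lower likelihood
]
front = pareto_front(candidates)
print([name for name, _, _ in front])
```

The surviving trees trade parsimony against likelihood, so choosing among them is left to the domain expert, as the abstract describes.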

Relevance: 20.00%

Abstract:

Organismal development, homeostasis, and pathology are rooted in inherently probabilistic events. From gene expression to cellular differentiation, rates and likelihoods shape the form and function of biology. Processes ranging from growth to cancer homeostasis to reprogramming of stem cells all require transitions between distinct phenotypic states, and these occur at defined rates. Therefore, measuring the fidelity and dynamics with which such transitions occur is central to understanding natural biological phenomena and is critical for therapeutic interventions.

While these processes may produce robust population-level behaviors, decisions are made by individual cells. In certain circumstances, these minuscule computing units effectively roll dice to determine their fate. And while the 'omics' era has provided vast amounts of data on what these populations are doing en masse, the behaviors of the underlying units of these processes get washed out in averages.

Therefore, in order to understand the behavior of a sample of cells, it is critical to reveal how its underlying components, or mixture of cells in distinct states, each contribute to the overall phenotype. As such, we must first define what states exist in the population, determine what controls the stability of these states, and measure in high dimensionality the dynamics with which these cells transition between states.

To address a specific example of this general problem, we investigate the heterogeneity and dynamics of mouse embryonic stem cells (mESCs). While a number of reports have identified particular genes in ES cells that switch between 'high' and 'low' metastable expression states in culture, it remains unclear how levels of many of these regulators combine to form states in transcriptional space. Using a method called single molecule mRNA fluorescent in situ hybridization (smFISH), we quantitatively measure and fit distributions of core pluripotency regulators in single cells, identifying a wide range of variabilities between genes, but each explained by a simple model of bursty transcription. From this data, we also observed that strongly bimodal genes appear to be co-expressed, effectively limiting the occupancy of transcriptional space to two primary states across genes studied here. However, these states also appear punctuated by the conditional expression of the most highly variable genes, potentially defining smaller substates of pluripotency.

Having defined the transcriptional states, we next asked what might control their stability or persistence. Surprisingly, we found that DNA methylation, a mark normally associated with irreversible developmental progression, was itself differentially regulated between these two primary states. Furthermore, both acute and chronic inhibition of DNA methyltransferase activity led to reduced heterogeneity among the population, suggesting that metastability can be modulated by this strong epigenetic mark.

Finally, because understanding the dynamics of state transitions is fundamental to a variety of biological problems, we sought to develop a high-throughput method for the identification of cellular trajectories without the need for cell-line engineering. We achieved this by combining cell-lineage information gathered from time-lapse microscopy with endpoint smFISH for measurements of final expression states. Applying a simple mathematical framework to these lineage-tree associated expression states enables the inference of dynamic transitions. We apply our novel approach in order to infer temporal sequences of events, quantitative switching rates, and network topology among a set of ESC states.

Taken together, we identify distinct expression states in ES cells, gain fundamental insight into how a strong epigenetic modifier enforces the stability of these states, and develop and apply a new method for the identification of cellular trajectories using scalable in situ readouts of cellular state.

Relevance: 20.00%

Abstract:

Two new methodologies are introduced to improve inference in the evaluation of mutual fund performance against benchmarks. First, the benchmark models are estimated using panel methods with both fund and time effects. Second, the non-normality of individual mutual fund returns is accounted for by using panel bootstrap methods. We also augment the standard benchmark factors with fund-specific characteristics, such as fund size. Using a dataset of UK equity mutual fund returns, we find that fund size has a negative effect on the average fund manager’s benchmark-adjusted performance. Further, when we allow for time effects and the non-normality of fund returns, we find that there is no evidence that even the best performing fund managers can significantly out-perform the augmented benchmarks after fund management charges are taken into account.

Relevance: 20.00%

Abstract:

In this paper, we consider Preference Inference based on a generalised form of Pareto order. Preference Inference aims at reasoning over an incomplete specification of user preferences. We focus on two problems. The Preference Deduction Problem (PDP) asks if another preference statement can be deduced (with certainty) from a set of given preference statements. The Preference Consistency Problem (PCP) asks if a set of given preference statements is consistent, i.e., the statements are not contradicting each other. Here, preference statements are direct comparisons between alternatives (strict and non-strict). It is assumed that a set of evaluation functions is known by which all alternatives can be rated. We consider Pareto models which induce order relations on the set of alternatives in a Pareto manner, i.e., one alternative is preferred to another only if it is preferred on every component of the model. We describe characterisations for deduction and consistency based on an analysis of the set of evaluation functions, and present algorithmic solutions and complexity results for PDP and PCP, based on Pareto models in general and for a special case. Furthermore, a comparison shows that the inference based on Pareto models is less cautious than some other types of well-known preference model.
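The two problems can be made concrete with a brute-force sketch. It rests on several assumptions that are not taken from the paper: a Pareto model is any non-empty subset of the known evaluation functions, higher ratings are better, a non-strict statement holds when every function in the model rates the first alternative at least as high, and a strict statement additionally needs one strictly higher rating. All names are hypothetical, and exhaustive enumeration stands in for the paper's algorithms.

```python
from itertools import combinations


def weakly_prefers(model, a, b):
    """a is at least as good as b on every evaluation function in the model."""
    return all(f(a) >= f(b) for f in model)


def satisfies(model, statement):
    """statement is (a, b, strict): 'a is preferred to b' under the model."""
    a, b, strict = statement
    if not weakly_prefers(model, a, b):
        return False
    return any(f(a) > f(b) for f in model) if strict else True


def models(evaluations):
    """All non-empty subsets of the evaluation functions (assumed model space)."""
    for size in range(1, len(evaluations) + 1):
        yield from combinations(evaluations, size)


def consistent(evaluations, statements):
    """PCP sketch: does some model satisfy every given statement?"""
    return any(all(satisfies(m, s) for s in statements)
               for m in models(evaluations))


def deducible(evaluations, statements, query):
    """PDP sketch: does every model satisfying the statements satisfy the query?"""
    return all(satisfies(m, query) for m in models(evaluations)
               if all(satisfies(m, s) for s in statements))


# Hypothetical alternatives rated on three criteria (higher is better).
evaluations = [lambda x: x[0], lambda x: x[1], lambda x: x[2]]
a, b, c = (3, 1, 2), (2, 2, 2), (1, 1, 1)
given = [(a, b, True)]  # the user states: a is strictly preferred to b

print(consistent(evaluations, given))
print(deducible(evaluations, given, (a, c, True)))
print(deducible(evaluations, given, (b, a, True)))
```

Enumerating subsets is exponential in the number of evaluation functions, so this only serves to fix the definitions on small examples.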

Relevance: 20.00%

Abstract:

A retrospective longitudinal observational study was conducted at a health-care provider institution in the city of Bogotá to evaluate the effectiveness of acupuncture therapy for pain management in the treatment of low back pain. A total of 150 clinical records of patients with low back pain seen from January 2014 to May 2016 were screened against the inclusion criteria defined by the authors, yielding 48 records, which were analyzed with the Friedman test to identify the effect of acupuncture treatment on pain in the selected patients. Additionally, using simple random sampling with a normal distribution from the 48 clinical records evaluated, 25 cases were selected and given an unstructured survey to obtain information on the state of the condition after the end of treatment and to identify possible causes of dropout. The study concludes that acupuncture therapy is effective for managing pain in patients with low back pain, and that further studies are needed to support the inclusion of this therapy in the management of the condition.
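As a hedged illustration of how a Friedman test applies to repeated pain scores, here is a standard-library sketch. It assumes three measurement occasions per patient (the abstract does not state how many were used), so the chi-square approximation has 2 degrees of freedom and its survival function is exp(-Q/2). The function names and the scores are invented for illustration and are not the study's data.

```python
import math


def rank_within(row):
    """1-based ranks of one patient's scores, averaging ranks over ties."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            ranks[idx] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def friedman_3(rows):
    """Friedman statistic Q and asymptotic p-value for three repeated measures."""
    n, k = len(rows), 3
    if n == 0 or any(len(r) != k for r in rows):
        raise ValueError("expected rows of exactly three scores")
    col_rank_sums = [0.0] * k
    ties = 0
    for row in rows:
        for j, r in enumerate(rank_within(row)):
            col_rank_sums[j] += r
        ties += sum(t ** 3 - t for t in (row.count(v) for v in set(row)))
    # Standard tie correction for the Friedman statistic.
    correction = 1.0 - ties / (n * k * (k * k - 1))
    if correction == 0:
        raise ValueError("every row is fully tied")
    q = 12.0 / (n * k * (k + 1)) * sum(s * s for s in col_rank_sums)
    q = (q - 3.0 * n * (k + 1)) / correction
    return q, math.exp(-q / 2.0)  # chi-square survival function, 2 df


# Invented pain scores (0-10) per patient: before, mid-treatment, after.
scores = [
    [8, 6, 3], [7, 7, 4], [9, 5, 5], [6, 4, 2],
    [8, 7, 6], [7, 5, 3], [9, 8, 4], [6, 6, 5],
]
q_stat, p_value = friedman_3(scores)
print(f"Q = {q_stat:.3f}, p = {p_value:.4f}")
```

A small p-value indicates that the scores differ across occasions. It does not by itself show which occasions differ or that the treatment caused the change.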

Relevance: 20.00%

Abstract:

Vulnerability and sustainability studies of an area help to assess both its level of exposure and its capacity to withstand possible environmental impacts. They are of primary importance for proposed legislation on zoning, allotment, and land use/land cover, which aims to stimulate areas suited to urban growth, to discourage growth in overcrowded areas, and to identify sections with restricted use as well as districts for permanent protection. This paper analyzes vulnerability on Maranhão Island using GIS techniques and geospatial inference intersected with relevant socio-environmental indicators. Specifically, it analyzes the index of vulnerability to soil loss on Maranhão Island, based on the methodology proposed by Crepani et al. (2001) and on spatial inference techniques supported by AHP (Analytic Hierarchy Process).

Relevance: 20.00%

Abstract:

This thesis discusses the main machine learning techniques for type inference in dynamically typed languages such as Python. In addition, a dataset of Python projects was created for training models capable of analyzing code