953 results for Statistical inference


Abstract:

A longitudinal investigation of the health effects and reservoirs of Giardia lamblia was undertaken in forty households in a rural Nile Delta region of Egypt. Of the 724 stool specimens obtained once weekly for six months from children two to four years old, 42 percent were cyst- or trophozoite-positive. The mean duration of excretion in all but one Giardia-negative child was seven and one-half weeks, with a range of one to 17 weeks. Clinical symptoms of illness were frequently observed within a month before or after Giardia excretion in the children's stool, but a statistical association could not be demonstrated.
Seventeen percent of the 697 specimens obtained from their mothers were Giardia-positive, with a mean duration of excretion of four weeks and a range of one to 18 weeks. Mothers were observed to excrete Giardia in stool less frequently during pregnancy than during lactation.
Nine hundred sixty-two specimens were collected from 13 species of household livestock. Giardia was detected in a total of 22 specimens from cows, goats, sheep and one duck. Giardia cysts were detected in three of 899 samples of household drinking water.
An ELISA technique for detecting Giardia in human and animal stool was field-tested under variable environmental conditions. For human specimens, the overall sensitivity of the assay was 74 percent and its specificity 97 percent; for animal specimens, the corresponding values were 82 percent and 98 percent.
Surface antigen studies reported by the NIH Laboratory of Parasitic Diseases show that the antigens of three Egyptian human isolates differ from each other and from most other isolates against which they were tested.
The ubiquity of human and animal fecal contamination, combined with the estimated number of days of illness per child per year in this setting, provides substantial arguments for introducing the suggested mass parasite-control program to interrupt the cyclical transmission of agents of enteric disease.
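The sensitivity and specificity quoted above are the usual ratios from a 2x2 table of ELISA results against a reference method (the abstract does not say which reference test was used). A minimal sketch, with hypothetical counts chosen only so that the ratios match the reported human-specimen figures; they are not the study's data:

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Return (sensitivity, specificity) from a 2x2 table against a reference test."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one reference-positive and one reference-negative specimen")
    sensitivity = tp / (tp + fn)   # share of reference-positive specimens the ELISA flags
    specificity = tn / (tn + fp)   # share of reference-negative specimens the ELISA clears
    return sensitivity, specificity


# Hypothetical counts (not from the study): 100 reference positives, 100 reference negatives.
sens, spec = sensitivity_specificity(tp=74, fn=26, tn=97, fp=3)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")  # sensitivity=0.74, specificity=0.97
```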

Abstract:

Of the large clinical trials evaluating the efficacy of screening mammography, none included women aged 75 and older. Recommendations on an upper age limit at which to discontinue screening are based on indirect evidence and are not consistent. Here, screening mammography is evaluated using observational data from the SEER-Medicare linked database. Measuring the benefit of screening mammography is difficult because of lead-time bias, length bias and over-detection. The underlying conceptual model divides the disease into two stages: pre-clinical (T0) and symptomatic (T1) breast cancer. The times spent in these two phases are treated as a pair of dependent observations (t0, t1), and estimates are derived to describe the distribution of this random vector. To quantify the effect of screening mammography, statistical inference is made about the mammography parameters that correspond to the marginal distribution of the symptomatic-phase duration (T1). The analysis shows that the hazard ratio of death from breast cancer, comparing women with screen-detected tumors to those whose tumors were detected at symptom onset, is 0.36 (0.30, 0.42), indicating a benefit among the screen-detected cases.
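To see why lead-time bias alone can make screen-detected cases look better, the toy simulation below follows the same women under both detection routes. Death time is identical either way, so screening has no real benefit in this model, yet survival measured from diagnosis is longer for screen detection. The exponential distributions, their means and the uniform detection point within T0 are all assumptions for illustration; this is not the estimator used in the study.

```python
import random


def lead_time_demo(n: int = 10_000, seed: int = 1) -> tuple[float, float]:
    """Mean survival measured from diagnosis, for the same women detected two ways.

    Assumed toy model: T0 ~ Exp(mean 2 years), T1 ~ Exp(mean 5 years), and a screen
    finds the tumour at a uniformly random point within the pre-clinical phase T0.
    """
    rng = random.Random(seed)
    from_screen, from_symptoms = 0.0, 0.0
    for _ in range(n):
        t0 = rng.expovariate(1 / 2.0)   # pre-clinical (screen-detectable) phase
        t1 = rng.expovariate(1 / 5.0)   # symptomatic phase, from onset to death
        lead = rng.uniform(0.0, t0)     # time from screen detection to symptom onset
        from_symptoms += t1             # clock starts at symptom onset
        from_screen += lead + t1        # clock starts earlier; death time is unchanged
    return from_screen / n, from_symptoms / n


screen, symptoms = lead_time_demo()
print(screen - symptoms)  # close to the mean lead time E[T0]/2 = 1 year
```

Any apparent survival gain here is pure lead time, which is why the study models T0 and T1 jointly rather than comparing survival from diagnosis directly.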

Abstract:

This methodology has been developed within a project that is the subject of the Specific Collaboration Agreement between the Instituto Geográfico Nacional and the Escuela de Topografía of the Universidad Politécnica de Madrid concerning research, development, training and dissemination of knowledge in the field of geographic information technologies (GIT), aimed at researching and developing the technology and methodology needed to optimise the information in the Municipal Boundaries Database (Base de Datos de Líneas Límite) of the Dirección General del Instituto Geográfico Nacional. Its main purpose is to develop a methodology to improve the accuracy of the Instituto Geográfico Nacional's Municipal Boundaries Database. Current requirements for quality and reliability in the geometric description of boundaries make it necessary to optimise that description by applying new technologies that did not exist when the boundaries were surveyed, and to design suitable methodologies that, while minimising execution time and cost, also take into account the various agents involved in defining boundaries in Spain. To develop this methodology, it will first be necessary to digitise the information in the field notebooks and boundary demarcation records held by the Instituto Geográfico Nacional, so that the work can be tackled with current technologies; then, using the digitised field-notebook data, to transfer the boundary information onto 1:5,000-scale orthophotographs. A new system for managing, processing and storing boundaries will be proposed, with information on their lineage (data source, accuracy), together with an output format for the boundaries themselves.
To control the quality of the proposed methodology, it must be validated through a theoretical study of productivity and accuracy, verified by field data collection. In particular, this validation will be carried out on a set of 140 boundary lines from 36 municipalities in the provinces of Ávila and Segovia (those covered by sheets 556 and 457 of the 1:50,000 Mapa Topográfico Nacional). Once the methodology has been tested and suitably refined, the conclusions of the whole project will be drawn up, covering working recommendations and the resulting accuracies, the productivity of the different processes and the costs incurred by using the new methodology.
ABSTRACT: This paper introduces the development of a methodology for optimising the municipal boundaries database of the Instituto Geográfico Nacional. The project arose as part of a collaboration agreement between the Instituto Geográfico Nacional and the Escuela de Topografía of the Universidad Politécnica de Madrid, which seeks to promote research, development and training in Geographic Information Technologies. Current quality requirements demand new technologies to improve the accuracy of the geometric description of municipal boundaries; these technologies did not exist when the municipal boundaries were first drawn up. It is also desirable to design a methodology that minimises both cost and time. The two main steps in the process are: first, converting all the available data (boundary demarcation records and field survey notebooks) into digital format so that they can be integrated into a CAD system; and second, displaying and visually overlaying these digital data on 1:5,000 orthophotographs of the study area to identify the boundary monuments.
A new system will be proposed to manage, process and store municipal boundary information, including its lineage; an output format for these data will also be designed. In addition, a quality-control procedure will be designed to audit this scheme using data analysis and statistical inference techniques. GPS will also be used to obtain the coordinates of some boundary monuments in order to check the results of the proposed methodology. The complete scheme will be tested in a study area in the provinces of Ávila and Segovia comprising 140 boundary segments from 36 municipalities.
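The abstract does not name the accuracy statistic used to compare digitised positions with GPS checkpoints; horizontal root-mean-square error (RMSE) is one common choice and is assumed in this sketch. The coordinates are hypothetical and only illustrate the calculation.

```python
import math


def horizontal_rmse(estimated, reference):
    """Root-mean-square horizontal error between paired (x, y) points in the same units (e.g. metres)."""
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("need two non-empty lists of paired points")
    squared = [
        (xe - xr) ** 2 + (ye - yr) ** 2          # squared planimetric distance per monument
        for (xe, ye), (xr, yr) in zip(estimated, reference)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical check: monument positions read off the orthophoto vs GPS-surveyed positions.
photo = [(1000.0, 2000.0), (1500.0, 2500.0), (1800.0, 2100.0)]
gps = [(1000.6, 2000.8), (1500.0, 2499.0), (1800.3, 2100.4)]
print(round(horizontal_rmse(photo, gps), 2))  # 0.87
```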

Abstract:

Markov Chain Monte Carlo methods are widely used in signal processing and communications for statistical inference and stochastic optimization. In this work, we introduce an efficient adaptive Metropolis-Hastings algorithm to draw samples from generic multimodal and multidimensional target distributions. The proposal density is a mixture of Gaussian densities with all parameters (weights, mean vectors and covariance matrices) updated using all the previously generated samples applying simple recursive rules. Numerical results for the one- and two-dimensional cases are provided.
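A one-dimensional sketch of the general idea: an independent Metropolis-Hastings sampler whose proposal is a Gaussian mixture, with the mixture weights, means and variances updated recursively from the chain. The recursive rule here (credit each state to its most responsible component, then apply 1/n running-mean and running-variance updates) is a hypothetical simplification, not the paper's rules, and the bimodal target is invented for illustration. Because the proposal does not depend on the current state, the acceptance ratio must include the proposal density at both points.

```python
import numpy as np


def log_target(x):
    # Hypothetical unnormalised bimodal target: equal mixture of N(-3, 1) and N(3, 1).
    return np.logaddexp(-0.5 * (x + 3.0) ** 2, -0.5 * (x - 3.0) ** 2)


def mixture_logpdf(x, w, mu, var):
    # log q(x) for a 1-D Gaussian mixture, via log-sum-exp for numerical stability.
    terms = np.log(w) - 0.5 * np.log(2.0 * np.pi * var) - 0.5 * (x - mu) ** 2 / var
    return np.logaddexp.reduce(terms)


def adaptive_mixture_mh(n_iter=20_000, seed=0):
    rng = np.random.default_rng(seed)
    mu = np.array([-5.0, 0.0, 5.0])     # initial component means (assumed)
    var = np.array([4.0, 4.0, 4.0])     # initial component variances (assumed)
    counts = np.ones(3)                 # pseudo-counts that define the mixture weights
    x = 0.0
    samples = np.empty(n_iter)
    for t in range(n_iter):
        w = counts / counts.sum()
        # Independent proposal: choose a component by weight, then draw from it.
        j = rng.choice(3, p=w)
        y = rng.normal(mu[j], np.sqrt(var[j]))
        # Independent Metropolis-Hastings ratio: pi(y) q(x) / (pi(x) q(y)).
        log_alpha = (log_target(y) - log_target(x)
                     + mixture_logpdf(x, w, mu, var) - mixture_logpdf(y, w, mu, var))
        if np.log(rng.uniform()) < log_alpha:
            x = y
        samples[t] = x
        # Hypothetical recursive rule: credit the current state to its most responsible
        # component and update that component's running mean and variance (1/n steps).
        k = int(np.argmax(np.log(w) - 0.5 * np.log(var) - 0.5 * (x - mu) ** 2 / var))
        counts[k] += 1.0
        old_mu = mu[k]
        mu[k] += (x - old_mu) / counts[k]
        var[k] += ((x - old_mu) * (x - mu[k]) - var[k]) / counts[k]
        var[k] = max(var[k], 1e-3)      # keep every component proper
    return samples, mu, var, counts / counts.sum()


samples, mu, var, weights = adaptive_mixture_mh()
print(np.mean(samples > 0), mu, weights)
```

The 1/n step sizes make the adaptation diminish over time, which is the usual reason such adaptive schemes still behave like a valid sampler in the long run.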

Abstract:

Monte Carlo (MC) methods are widely used in signal processing, machine learning and communications for statistical inference and stochastic optimization. A well-known class of MC methods is composed of importance sampling and its adaptive extensions (e.g., population Monte Carlo). In this work, we introduce an adaptive importance sampler using a population of proposal densities. The novel algorithm provides a global estimation of the variables of interest iteratively, using all the samples generated. The cloud of proposals is adapted by learning from a subset of previously generated samples, in such a way that local features of the target density can be better taken into account compared to single global adaptation procedures. Numerical results show the advantages of the proposed sampling scheme in terms of mean absolute error and robustness to initialization.
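A sketch of an adaptive importance sampler with a population of proposals: each iteration draws one sample per proposal, weights it against the average of all proposal densities (deterministic-mixture weighting), and keeps every sample for a global estimate. For the adaptation step it substitutes plain population Monte Carlo resampling of the proposal locations, which is a stand-in and not the paper's local adaptation scheme. The target, its constant Z = 3 and all tuning values are assumptions chosen so the estimates can be checked.

```python
import numpy as np


def log_target(x):
    """Hypothetical unnormalised target: 3 * [0.5 N(-2, 1) + 0.5 N(4, 1)].

    Its normalising constant is Z = 3 and its mean is 1, so the estimates can be checked.
    """
    log_norm = -0.5 * np.log(2.0 * np.pi)
    return (np.log(3.0) + log_norm
            + np.logaddexp(np.log(0.5) - 0.5 * (x + 2.0) ** 2,
                           np.log(0.5) - 0.5 * (x - 4.0) ** 2))


def population_is(n_props=50, n_iter=200, sigma=2.0, seed=0):
    rng = np.random.default_rng(seed)
    mu = rng.uniform(-10.0, 10.0, n_props)           # initial proposal locations (assumed)
    xs, logws = [], []
    for _ in range(n_iter):
        x = rng.normal(mu, sigma)                      # one draw from each proposal q_i
        # Deterministic-mixture weights: divide by the average density of all proposals.
        log_q = (-0.5 * ((x[:, None] - mu[None, :]) / sigma) ** 2
                 - np.log(sigma * np.sqrt(2.0 * np.pi)))
        log_psi = np.logaddexp.reduce(log_q, axis=1) - np.log(n_props)
        logw = log_target(x) - log_psi
        xs.append(x)
        logws.append(logw)
        # Adaptation (PMC-style stand-in): resample proposal locations in proportion to weight.
        w = np.exp(logw - logw.max())
        mu = x[rng.choice(n_props, size=n_props, p=w / w.sum())]
    x_all = np.concatenate(xs)
    logw_all = np.concatenate(logws)
    w_all = np.exp(logw_all - logw_all.max())
    mean_est = float(np.sum(w_all * x_all) / np.sum(w_all))    # global self-normalised estimate
    z_est = float(np.exp(logw_all.max()) * np.mean(w_all))     # estimate of Z from all weights
    return mean_est, z_est


print(population_is())  # both close to the true values (1, 3)
```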

Abstract:

This thesis presents the design and application of a methodology for determining the parameters for planning logistics nodes and infrastructure in a territory, while also considering their impact on the various territorial components, as well as on population development, economic development and the environment, thus representing an advance in integrated territorial planning. The proposed methodology is based on Data Mining, which enables the discovery of patterns in large volumes of previously processed data. The characteristics of data about a territory and its components make territorial studies an ideal field for applying some Data Mining techniques, such as decision trees and Bayesian networks. Decision trees schematically represent and categorise a set of predictor variables that help analyse a target variable. Bayesian networks represent, as a directed acyclic graph, a probabilistic model of variables arranged as parents and children, together with statistical inference that determines the probability that a stated hypothesis holds; that is, they make it possible to build joint probability models that graphically present the relevant dependencies in a dataset. As with decision trees, the division of the territory into different administrative units makes Bayesian networks a potential tool for defining the physical characteristics of a specific type of logistics infrastructure, taking into account the territorial, population and economic characteristics of the area where it is to be developed and the possible synergies with other logistics nodes and infrastructure.
The Republic of Panama was selected as the case study for applying the methodology, since the country has several singular characteristics, notably its high concentration of population in Panama City, which in turn has concentrated the country's economic activity; its high percentage of protected areas, which has limited the territorial integration of the country; and the Panama Canal and the container ports adjacent to it. The methodology is divided into three main phases: Phase 1: Determination of the working scenario. 1. Review of the state of the art. 2. Determination and collection of the study variables. Phase 2: Development of the artificial intelligence model. 3. Construction of the decision trees. 4. Construction of the Bayesian networks. Phase 3: Conclusions. 5. Determination of the conclusions. Applying the methodology to the case study established a planning model of 47 variables that define logistics planning in Panama; the remaining variables are derived from these, that is, once these 47 are known, the rest are determined through them. This planning model, established through the Bayesian network, considers the aspects of sustainable planning (economic, social and environmental) that create synergies with the planning of logistics nodes and infrastructure.
The thesis presents the design and application of a methodology for determining the parameters for planning logistics nodes and infrastructure in a territory, while also considering their impact on the different territorial components as well as on population growth, economic development and the environment. The proposed methodology is based on Data Mining, which allows the discovery of patterns behind large volumes of previously processed data.
The characteristics of territorial data make territorial studies an ideal field for applying Data Mining techniques such as decision trees and Bayesian networks. Decision trees schematically categorise a series of predictor variables for an analysed target variable. Bayesian networks represent, as a directed acyclic graph, a probabilistic model of variables divided into parents and children, and support statistical inference that determines the probability that a hypothesis holds. The case study for the application of the methodology is the Republic of Panama. This country has some unique features: a high population density in Panama City, a concentration of economic activity, a high percentage of protected areas, and the Panama Canal. The methodology is divided into three main phases: Phase 1: definition of the working scenario. 1. Review of the state of the art. 2. Determination of the variables. Phase 2: Development of the artificial intelligence model. 3. Construction of decision trees. 4. Construction of Bayesian networks. Phase 3: Conclusions. 5. Determination of the conclusions. The application of the methodology to the case study established a model composed of 47 variables that define the logistics planning for Panama. This planning model, established through the Bayesian network, considers aspects of sustainable planning and simulates the synergies between the planning of nodes and logistics infrastructure.
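To make the phrase "statistical inference that determines the probability that a hypothesis holds" concrete, here is a minimal Bayesian-network query by exact enumeration. The three-node chain (dense_population, economic_activity, logistics_node) and every probability in it are hypothetical; they are not variables or values from the Panama model.

```python
from itertools import product

# Hypothetical chain network: dense_population -> economic_activity -> logistics_node.
# All conditional probability values below are invented for illustration.
P_DENSE = {True: 0.3, False: 0.7}
P_ACTIVITY_GIVEN_DENSE = {True: 0.8, False: 0.2}   # P(activity=True | dense)
P_NODE_GIVEN_ACTIVITY = {True: 0.9, False: 0.1}    # P(node=True | activity)


def joint(dense: bool, activity: bool, node: bool) -> float:
    """Joint probability factorised along the DAG: P(d) * P(a | d) * P(n | a)."""
    p_act = P_ACTIVITY_GIVEN_DENSE[dense]
    p_node = P_NODE_GIVEN_ACTIVITY[activity]
    return (P_DENSE[dense]
            * (p_act if activity else 1 - p_act)
            * (p_node if node else 1 - p_node))


def p_dense_given_node(node_obs: bool = True) -> float:
    """P(dense_population=True | logistics_node=node_obs), summing out economic_activity."""
    numerator = sum(joint(True, a, node_obs) for a in (True, False))
    evidence = sum(joint(d, a, node_obs) for d, a in product((True, False), repeat=2))
    return numerator / evidence


print(round(p_dense_given_node(True), 3))  # 0.55, up from the prior 0.3
```

Enumeration is exponential in the number of hidden variables, so a 47-variable network like the one described would normally use a dedicated inference algorithm; the sketch only shows what a query computes.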

Abstract:

The controversy over the interpretation of DNA profile evidence in forensic identification can be attributed in part to confusion over the mode(s) of statistical inference appropriate to this setting. Although there has been substantial discussion in the literature of, for example, the role of population genetics issues, few authors have made explicit the inferential framework which underpins their arguments. This lack of clarity has led both to unnecessary debates over ill-posed or inappropriate questions and to the neglect of some issues which can have important consequences. We argue that the mode of statistical inference which seems to underlie the arguments of some authors, based on a hypothesis testing framework, is not appropriate for forensic identification. We propose instead a logically coherent framework in which, for example, the roles both of the population genetics issues and of the nonscientific evidence in a case are incorporated. Our analysis highlights several widely held misconceptions in the DNA profiling debate. For example, the profile frequency is not directly relevant to forensic inference. Further, very small match probabilities may in some settings be consistent with acquittal. Although DNA evidence is typically very strong, our analysis of the coherent approach highlights situations which can arise in practice where alternative methods for assessing DNA evidence may be misleading.
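The claim that very small match probabilities can still be consistent with acquittal follows from Bayes' rule: posterior odds equal prior odds times the likelihood ratio. A sketch under a deliberately simplified model (assumed here, and much cruder than the authors' framework, which also incorporates population-genetics effects and nonscientific evidence): the suspect and n other people are equally plausible sources a priori, each alternative matches independently with probability p, and there is no laboratory error or relatedness.

```python
def posterior_source_probability(match_prob: float, n_alternatives: int) -> float:
    """P(suspect is the source | profile match) under the simplified model above.

    Prior odds are 1 / n_alternatives; the likelihood ratio of a match is 1 / match_prob.
    """
    posterior_odds = (1.0 / n_alternatives) * (1.0 / match_prob)
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical scenarios: 5 million plausible alternative sources.
print(round(posterior_source_probability(1e-6, 5_000_000), 3))  # 0.167
print(round(posterior_source_probability(1e-9, 5_000_000), 3))  # 0.995
```

With a one-in-a-million match probability and millions of plausible alternatives, the posterior probability is only about 1/6, which is why the prior (including the nonscientific evidence) cannot be ignored.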

Abstract:

Many long-lived marine species exhibit life-history traits that make them more vulnerable to overexploitation. Accurate population trend analysis is essential for developing and assessing management plans for these species. However, because many of these species disperse over large geographic areas, have life stages inaccessible to human surveyors, and/or undergo complex developmental migrations, data on trends in abundance are often available for only one stage of the population, usually breeding adults. The green turtle (Chelonia mydas) is one of these long-lived species, for which population trends are based almost exclusively on either the number of females that emerge to nest or the number of nests deposited each year on geographically restricted beaches. In this study, we generated estimates of annual abundance for juvenile green turtles at two foraging grounds in the Bahamas based on long-term capture-mark-recapture (CMR) studies at Union Creek (24 years) and Conception Creek (13 years), using a two-stage approach. First, we estimated recapture probabilities from the CMR data using Cormack-Jolly-Seber models in the software program MARK; second, we estimated the annual abundance of green turtles at both study sites using these recapture probabilities in a Horvitz-Thompson type estimation procedure. Green turtle abundance did not change significantly in Conception Creek, but in Union Creek it went through successive phases of significant increase, significant decrease, and stability. These changes in abundance resulted from changes in immigration, not survival or emigration. The trends in abundance on the foraging grounds did not conform to the significantly increasing trend for the major nesting population at Tortuguero, Costa Rica. This disparity highlights the challenges of assessing population-wide trends of green turtles and other long-lived species.
The best approach for monitoring population trends may be a combination of (1) extensive surveys to provide data for large-scale trends in relative population abundance, and (2) intensive surveys, using CMR techniques, to estimate absolute abundance and evaluate the demographic processes driving the trends.
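The second stage of the approach can be sketched with the basic Horvitz-Thompson form N_t = n_t / p_t: the number of turtles caught in year t divided by the estimated capture probability for that year. This simple form is an assumption (the study's "Horvitz-Thompson type" procedure may include further terms), and the numbers below are hypothetical.

```python
def horvitz_thompson_abundance(captured, capture_prob):
    """Annual abundance estimates N_t = n_t / p_t.

    captured: number of distinct turtles caught in each year t.
    capture_prob: estimated capture probability p_t for that year (e.g. from a CJS model).
    """
    if len(captured) != len(capture_prob):
        raise ValueError("one capture probability per year is required")
    if any(not 0.0 < p <= 1.0 for p in capture_prob):
        raise ValueError("capture probabilities must lie in (0, 1]")
    return [n / p for n, p in zip(captured, capture_prob)]


# Hypothetical numbers, not from the study.
print(horvitz_thompson_abundance([30, 45], [0.5, 0.6]))  # [60.0, 75.0]
```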

Abstract:

Vector error-correction models (VECMs) have become increasingly important in applications to financial markets. Standard full-order VECMs assume non-zero entries in all their coefficient matrices. However, applications of VECMs to financial market data have revealed that zero entries are often a necessary part of efficient modelling. In such cases, using full-order VECMs may lead to incorrect inferences. Specifically, if indirect causality or Granger non-causality exists among the variables, over-parameterised full-order VECMs may weaken the power of statistical inference. This paper argues that the zero–non-zero (ZNZ) patterned VECM is a more straightforward and effective means of testing for both indirect causality and Granger non-causality. For a ZNZ patterned VECM framework for time series integrated of order two, we provide a new algorithm to select cointegrating and loading vectors that can contain zero entries. Two case studies demonstrate the usefulness of the algorithm in tests of purchasing power parity and of a three-variable system involving the stock market.

Abstract:

Statistics is known to be an art as well as a science. The training of mathematical physicists predisposes them towards hypothesising plausible Bayesian priors. Tony Bracken and I were of that mind [1], but in our discussions we also recognised the Bayesian will-o'-the-wisp illustrated below.

Abstract:

Neural networks can be regarded as statistical models and can be analysed in a Bayesian framework. Generalisation is measured by performance on independent test data drawn from the same distribution as the training data. Such performance can be quantified by the posterior average of the information divergence between the true and the model distributions. Averaging over the Bayesian posterior guarantees internal coherence; using information divergence guarantees invariance with respect to representation. The theory generalises the least-mean-squares theory for linear Gaussian models to general problems of statistical estimation. The main results are: (1) the ideal optimal estimate is always given by the average over the posterior; (2) the optimal estimate within a computational model is given by the projection of the ideal estimate onto the model. This incidentally shows that some currently popular methods dealing with hyperpriors are in general unnecessary and misleading. The extension of information divergence to positive normalisable measures reveals a remarkable relation between the δ-dual affine geometry of statistical manifolds and the geometry of the dual pair of Banach spaces Ld and Ldd. It therefore offers a conceptual simplification of information geometry. The general conclusion on evaluating neural network learning rules and other statistical inference methods is that such evaluations are meaningful only under three assumptions: the prior P(p), describing the environment of all the problems; the divergence Dd, specifying the requirements of the task; and the model Q, specifying the available computing resources.
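Result (1) can be checked numerically in the simplest case. Assuming the Kullback-Leibler divergence KL(p || q), with the true distribution p first, as the information divergence (the abstract's divergence family is more general), minimising the posterior expected divergence over q recovers the posterior average of p. The Bernoulli model and the data (7 successes in 10 trials, uniform prior) are hypothetical.

```python
import numpy as np


def bernoulli_kl(p, q):
    """KL(p || q) between Bernoulli(p) and Bernoulli(q), vectorised over p."""
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))


def expected_kl(q, p_samples):
    """Monte Carlo estimate of the posterior expectation E[KL(p || q)]."""
    return float(np.mean(bernoulli_kl(p_samples, q)))


rng = np.random.default_rng(0)
# Hypothetical data: 7 successes in 10 trials, uniform Beta(1, 1) prior -> Beta(8, 4) posterior.
p_samples = rng.beta(8, 4, size=20_000)
grid = np.linspace(0.001, 0.999, 999)
risks = [expected_kl(q, p_samples) for q in grid]
q_best = float(grid[int(np.argmin(risks))])
print(q_best, p_samples.mean(), 8 / 12)   # all three approximately 0.667
```

The minimiser sits at the posterior mean 8/12, not at the posterior mode 0.7 (the maximum-likelihood value here), which has a strictly larger expected divergence.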

Abstract:

The problem of evaluating different learning rules and other statistical estimators is analysed. A new general theory of statistical inference is developed by combining Bayesian decision theory with information geometry. It is coherent and invariant. For each sample a unique ideal estimate exists and is given by an average over the posterior. An optimal estimate within a model is given by a projection of the ideal estimate. The ideal estimate is a sufficient statistic of the posterior, so practical learning rules are functions of the ideal estimator. If the sole purpose of learning is to extract information from the data, the learning rule must also approximate the ideal estimator. This framework is applicable to both Bayesian and non-Bayesian methods, with arbitrary statistical models, and to supervised, unsupervised and reinforcement learning schemes.

Abstract:

Online learning is discussed from the viewpoint of Bayesian statistical inference. By replacing the true posterior distribution with a simpler parametric distribution, one can define an online algorithm by a repetition of two steps: an update of the approximate posterior when a new example arrives, and an optimal projection into the parametric family. Choosing this family to be Gaussian, we show that the algorithm achieves asymptotic efficiency. An application to learning in single-layer neural networks is given.
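A one-dimensional sketch of the two-step scheme with a Gaussian family: for each new example, the current Gaussian N(m, v) is multiplied by the example's likelihood (exact Bayes step), and the result is projected back onto a Gaussian by matching its mean and variance (projection step). It assumes a probit "single-layer" model with one weight w, P(y | x, w) = Phi(y x w), for which both steps have the closed form used below; the prior, the true weight and the data stream are hypothetical.

```python
import math
import random


def norm_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z: float) -> float:
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def adf_probit_update(m: float, v: float, x: float, y: int) -> tuple[float, float]:
    """One online step for a scalar weight w with likelihood Phi(y * x * w).

    Combines the Gaussian N(m, v) with the new likelihood term and returns the mean
    and variance of the result, i.e. its moment-matched Gaussian projection.
    """
    s = math.sqrt(1.0 + x * x * v)
    z = y * x * m / s
    r = norm_pdf(z) / norm_cdf(z)              # inverse Mills ratio
    m_new = m + v * y * x * r / s
    v_new = v - (v * x / s) ** 2 * r * (z + r)  # always stays positive
    return m_new, v_new


def run_online(n: int = 2000, w_true: float = 2.0, seed: int = 0) -> tuple[float, float]:
    rng = random.Random(seed)
    m, v = 0.0, 1.0                            # Gaussian prior N(0, 1) on w (assumed)
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1 if w_true * x + rng.gauss(0.0, 1.0) > 0 else -1   # probit teacher
        m, v = adf_probit_update(m, v, x, y)
    return m, v


print(run_online())  # mean near w_true = 2, variance shrinking roughly like 1/n
```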

Abstract:

Accurate protein structure prediction remains an active objective of research in bioinformatics. Membrane proteins comprise approximately 20% of most genomes. They are, however, poorly tractable targets for experimental structure determination. Their analysis using bioinformatics thus makes an important contribution to their ongoing study. Using a method based on Bayesian Networks, which provides a flexible and powerful framework for statistical inference, we have addressed the alignment-free discrimination of membrane from non-membrane proteins. The method successfully identifies prokaryotic and eukaryotic α-helical membrane proteins with 94.4% accuracy and β-barrel proteins with 72.4% accuracy, and distinguishes assorted non-membranous proteins with 85.9% accuracy. The method described here is an important potential advance in the computational analysis of membrane protein structure. It represents a useful tool for the characterisation of membrane proteins, with a wide variety of potential applications.

Abstract:

Membrane proteins, which constitute approximately 20% of most genomes, are poorly tractable targets for experimental structure determination; thus analysis by prediction and modelling makes an important contribution to their ongoing study. Membrane proteins form two main classes: alpha-helical and beta-barrel transmembrane proteins. Using a method based on Bayesian Networks, which provides a flexible and powerful framework for statistical inference, we addressed alpha-helical topology prediction. This method has accuracies of 77.4% for prokaryotic proteins and 61.4% for eukaryotic proteins. The method described here represents an important advance in the computational determination of membrane protein topology and offers a useful, and complementary, tool for the analysis of membrane proteins for a range of applications.