949 resultados para Statistical model


Relevância:

60.00% 60.00%

Publicador:

Resumo:

Deep bed filtration occurs in several industrial and environmental processes like water filtration and soil contamination. In petroleum industry, deep bed filtration occurs near to injection wells during water injection, causing injectivity reduction. It also takes place during well drilling, sand production control, produced water disposal in aquifers, etc. The particle capture in porous media can be caused by different physical mechanisms (size exclusion, electrical forces, bridging, gravity, etc). A statistical model for filtration in porous media is proposed and analytical solutions for suspended and retained particles are derived. The model, which incorporates particle retention probability, is compared with the classical deep bed filtration model allowing a physical interpretation of the filtration coefficients. Comparison of the obtained analytical solutions for the proposed model with the classical model solutions allows concluding that the larger the particle capture probability, the larger the discrepancy between the proposed and the classical models

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Several deterministic and probabilistic methods are used to evaluate the probability of seismically induced liquefaction of a soil. The probabilistic models usually possess some uncertainty in that model and uncertainties in the parameters used to develop that model. These model uncertainties vary from one statistical model to another. Most of the model uncertainties are epistemic, and can be addressed through appropriate knowledge of the statistical model. One such epistemic model uncertainty in evaluating liquefaction potential using a probabilistic model such as logistic regression is sampling bias. Sampling bias is the difference between the class distribution in the sample used for developing the statistical model and the true population distribution of liquefaction and non-liquefaction instances. Recent studies have shown that sampling bias can significantly affect the predicted probability using a statistical model. To address this epistemic uncertainty, a new approach was developed for evaluating the probability of seismically-induced soil liquefaction, in which a logistic regression model in combination with Hosmer-Lemeshow statistic was used. This approach was used to estimate the population (true) distribution of liquefaction to non-liquefaction instances of standard penetration test (SPT) and cone penetration test (CPT) based most updated case histories. Apart from this, other model uncertainties such as distribution of explanatory variables and significance of explanatory variables were also addressed using KS test and Wald statistic respectively. Moreover, based on estimated population distribution, logistic regression equations were proposed to calculate the probability of liquefaction for both SPT and CPT based case history. Additionally, the proposed probability curves were compared with existing probability curves based on SPT and CPT case histories.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The K-means algorithm is one of the most popular clustering algorithms in current use as it is relatively fast yet simple to understand and deploy in practice. Nevertheless, its use entails certain restrictive assumptions about the data, the negative consequences of which are not always immediately apparent, as we demonstrate. While more flexible algorithms have been developed, their widespread use has been hindered by their computational and technical complexity. Motivated by these considerations, we present a flexible alternative to K-means that relaxes most of the assumptions, whilst remaining almost as fast and simple. This novel algorithm which we call MAP-DP (maximum a-posteriori Dirichlet process mixtures), is statistically rigorous as it is based on nonparametric Bayesian Dirichlet process mixture modeling. This approach allows us to overcome most of the limitations imposed by K-means. The number of clusters K is estimated from the data instead of being fixed a-priori as in K-means. In addition, while K-means is restricted to continuous data, the MAP-DP framework can be applied to many kinds of data, for example, binary, count or ordinal data. Also, it can efficiently separate outliers from the data. This additional flexibility does not incur a significant computational overhead compared to K-means with MAP-DP convergence typically achieved in the order of seconds for many practical problems. Finally, in contrast to K-means, since the algorithm is based on an underlying statistical model, the MAP-DP framework can deal with missing data and enables model testing such as cross validation in a principled way. We demonstrate the simplicity and effectiveness of this algorithm on the health informatics problem of clinical sub-typing in a cluster of diseases known as parkinsonism.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The Dirichlet process mixture model (DPMM) is a ubiquitous, flexible Bayesian nonparametric statistical model. However, full probabilistic inference in this model is analytically intractable, so that computationally intensive techniques such as Gibbs sampling are required. As a result, DPMM-based methods, which have considerable potential, are restricted to applications in which computational resources and time for inference is plentiful. For example, they would not be practical for digital signal processing on embedded hardware, where computational resources are at a serious premium. Here, we develop a simplified yet statistically rigorous approximate maximum a-posteriori (MAP) inference algorithm for DPMMs. This algorithm is as simple as DP-means clustering, solves the MAP problem as well as Gibbs sampling, while requiring only a fraction of the computational effort. (For freely available code that implements the MAP-DP algorithm for Gaussian mixtures see http://www.maxlittle.net/.) Unlike related small variance asymptotics (SVA), our method is non-degenerate and so inherits the “rich get richer” property of the Dirichlet process. It also retains a non-degenerate closed-form likelihood which enables out-of-sample calculations and the use of standard tools such as cross-validation. We illustrate the benefits of our algorithm on a range of examples and contrast it to variational, SVA and sampling approaches from both a computational complexity perspective as well as in terms of clustering performance. We demonstrate the wide applicabiity of our approach by presenting an approximate MAP inference method for the infinite hidden Markov model whose performance contrasts favorably with a recently proposed hybrid SVA approach. Similarly, we show how our algorithm can applied to a semiparametric mixed-effects regression model where the random effects distribution is modelled using an infinite mixture model, as used in longitudinal progression modelling in population health science. Finally, we propose directions for future research on approximate MAP inference in Bayesian nonparametrics.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

The performance of the SAOP potential for the calculation of NMR chemical shifts was evaluated. SAOP results show considerable improvement with respect to previous potentials, like VWN or BP86, at least for the carbon, nitrogen, oxygen, and fluorine chemical shifts. Furthermore, a few NMR calculations carried out on third period atoms (S, P, and Cl) improved when using the SAOP potential

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In this article we present a hybrid approach for automatic summarization of Spanish medical texts. There are a lot of systems for automatic summarization using statistics or linguistics, but only a few of them combining both techniques. Our idea is that to reach a good summary we need to use linguistic aspects of texts, but as well we should benefit of the advantages of statistical techniques. We have integrated the Cortex (Vector Space Model) and Enertex (statistical physics) systems coupled with the Yate term extractor, and the Disicosum system (linguistics). We have compared these systems and afterwards we have integrated them in a hybrid approach. Finally, we have applied this hybrid system over a corpora of medical articles and we have evaluated their performances obtaining good results.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In the scope of the European project Hydroptimet, INTERREG IIIB-MEDOCC programme, limited area model (LAM) intercomparison of intense events that produced many damages to people and territory is performed. As the comparison is limited to single case studies, the work is not meant to provide a measure of the different models' skill, but to identify the key model factors useful to give a good forecast on such a kind of meteorological phenomena. This work focuses on the Spanish flash-flood event, also known as "Montserrat-2000" event. The study is performed using forecast data from seven operational LAMs, placed at partners' disposal via the Hydroptimet ftp site, and observed data from Catalonia rain gauge network. To improve the event analysis, satellite rainfall estimates have been also considered. For statistical evaluation of quantitative precipitation forecasts (QPFs), several non-parametric skill scores based on contingency tables have been used. Furthermore, for each model run it has been possible to identify Catalonia regions affected by misses and false alarms using contingency table elements. Moreover, the standard "eyeball" analysis of forecast and observed precipitation fields has been supported by the use of a state-of-the-art diagnostic method, the contiguous rain area (CRA) analysis. This method allows to quantify the spatial shift forecast error and to identify the error sources that affected each model forecasts. High-resolution modelling and domain size seem to have a key role for providing a skillful forecast. Further work is needed to support this statement, including verification using a wider observational data set.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In this study we propose an evaluation of the angular effects altering the spectral response of the land-cover over multi-angle remote sensing image acquisitions. The shift in the statistical distribution of the pixels observed in an in-track sequence of WorldView-2 images is analyzed by means of a kernel-based measure of distance between probability distributions. Afterwards, the portability of supervised classifiers across the sequence is investigated by looking at the evolution of the classification accuracy with respect to the changing observation angle. In this context, the efficiency of various physically and statistically based preprocessing methods in obtaining angle-invariant data spaces is compared and possible synergies are discussed.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Microsatellite loci mutate at an extremely high rate and are generally thought to evolve through a stepwise mutation model. Several differentiation statistics taking into account the particular mutation scheme of the microsatellite have been proposed. The most commonly used is R(ST) which is independent of the mutation rate under a generalized stepwise mutation model. F(ST) and R(ST) are commonly reported in the literature, but often differ widely. Here we compare their statistical performances using individual-based simulations of a finite island model. The simulations were run under different levels of gene flow, mutation rates, population number and sizes. In addition to the per locus statistical properties, we compare two ways of combining R(ST) over loci. Our simulations show that even under a strict stepwise mutation model, no statistic is best overall. All estimators suffer to different extents from large bias and variance. While R(ST) better reflects population differentiation in populations characterized by very low gene-exchange, F(ST) gives better estimates in cases of high levels of gene flow. The number of loci sampled (12, 24, or 96) has only a minor effect on the relative performance of the estimators under study. For all estimators there is a striking effect of the number of samples, with the differentiation estimates showing very odd distributions for two samples.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

PURPOSE: Proper delineation of ocular anatomy in 3-dimensional (3D) imaging is a big challenge, particularly when developing treatment plans for ocular diseases. Magnetic resonance imaging (MRI) is presently used in clinical practice for diagnosis confirmation and treatment planning for treatment of retinoblastoma in infants, where it serves as a source of information, complementary to the fundus or ultrasonographic imaging. Here we present a framework to fully automatically segment the eye anatomy for MRI based on 3D active shape models (ASM), and we validate the results and present a proof of concept to automatically segment pathological eyes. METHODS AND MATERIALS: Manual and automatic segmentation were performed in 24 images of healthy children's eyes (3.29 ± 2.15 years of age). Imaging was performed using a 3-T MRI scanner. The ASM consists of the lens, the vitreous humor, the sclera, and the cornea. The model was fitted by first automatically detecting the position of the eye center, the lens, and the optic nerve, and then aligning the model and fitting it to the patient. We validated our segmentation method by using a leave-one-out cross-validation. The segmentation results were evaluated by measuring the overlap, using the Dice similarity coefficient (DSC) and the mean distance error. RESULTS: We obtained a DSC of 94.90 ± 2.12% for the sclera and the cornea, 94.72 ± 1.89% for the vitreous humor, and 85.16 ± 4.91% for the lens. The mean distance error was 0.26 ± 0.09 mm. The entire process took 14 seconds on average per eye. CONCLUSION: We provide a reliable and accurate tool that enables clinicians to automatically segment the sclera, the cornea, the vitreous humor, and the lens, using MRI. We additionally present a proof of concept for fully automatically segmenting eye pathology. This tool reduces the time needed for eye shape delineation and thus can help clinicians when planning eye treatment and confirming the extent of the tumor.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

PURPOSE: Statistical shape and appearance models play an important role in reducing the segmentation processing time of a vertebra and in improving results for 3D model development. Here, we describe the different steps in generating a statistical shape model (SSM) of the second cervical vertebra (C2) and provide the shape model for general use by the scientific community. The main difficulties in its construction are the morphological complexity of the C2 and its variability in the population. METHODS: The input dataset is composed of manually segmented anonymized patient computerized tomography (CT) scans. The alignment of the different datasets is done with the procrustes alignment on surface models, and then, the registration is cast as a model-fitting problem using a Gaussian process. A principal component analysis (PCA)-based model is generated which includes the variability of the C2. RESULTS: The SSM was generated using 92 CT scans. The resulting SSM was evaluated for specificity, compactness and generalization ability. The SSM of the C2 is freely available to the scientific community in Slicer (an open source software for image analysis and scientific visualization) with a module created to visualize the SSM using Statismo, a framework for statistical shape modeling. CONCLUSION: The SSM of the vertebra allows the shape variability of the C2 to be represented. Moreover, the SSM will enable semi-automatic segmentation and 3D model generation of the vertebra, which would greatly benefit surgery planning.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This thesis was focussed on statistical analysis methods and proposes the use of Bayesian inference to extract information contained in experimental data by estimating Ebola model parameters. The model is a system of differential equations expressing the behavior and dynamics of Ebola. Two sets of data (onset and death data) were both used to estimate parameters, which has not been done by previous researchers in (Chowell, 2004). To be able to use both data, a new version of the model has been built. Model parameters have been estimated and then used to calculate the basic reproduction number and to study the disease-free equilibrium. Estimates of the parameters were useful to determine how well the model fits the data and how good estimates were, in terms of the information they provided about the possible relationship between variables. The solution showed that Ebola model fits the observed onset data at 98.95% and the observed death data at 93.6%. Since Bayesian inference can not be performed analytically, the Markov chain Monte Carlo approach has been used to generate samples from the posterior distribution over parameters. Samples have been used to check the accuracy of the model and other characteristics of the target posteriors.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

We study the problem of measuring the uncertainty of CGE (or RBC)-type model simulations associated with parameter uncertainty. We describe two approaches for building confidence sets on model endogenous variables. The first one uses a standard Wald-type statistic. The second approach assumes that a confidence set (sampling or Bayesian) is available for the free parameters, from which confidence sets are derived by a projection technique. The latter has two advantages: first, confidence set validity is not affected by model nonlinearities; second, we can easily build simultaneous confidence intervals for an unlimited number of variables. We study conditions under which these confidence sets take the form of intervals and show they can be implemented using standard methods for solving CGE models. We present an application to a CGE model of the Moroccan economy to study the effects of policy-induced increases of transfers from Moroccan expatriates.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In Statistical Machine Translation from English to Malayalam, an unseen English sentence is translated into its equivalent Malayalam sentence using statistical models. A parallel corpus of English-Malayalam is used in the training phase. Word to word alignments has to be set among the sentence pairs of the source and target language before subjecting them for training. This paper deals with certain techniques which can be adopted for improving the alignment model of SMT. Methods to incorporate the parts of speech information into the bilingual corpus has resulted in eliminating many of the insignificant alignments. Also identifying the name entities and cognates present in the sentence pairs has proved to be advantageous while setting up the alignments. Presence of Malayalam words with predictable translations has also contributed in reducing the insignificant alignments. Moreover, reduction of the unwanted alignments has brought in better training results. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations and the results are verified with F measure, BLEU and WER evaluation metrics.