986 resultados para Probability estimation
Resumo:
Mendelian models can predict who carries an inherited deleterious mutation of known disease genes based on family history. For example, the BRCAPRO model is commonly used to identify families who carry mutations of BRCA1 and BRCA2, based on familial breast and ovarian cancers. These models incorporate the age of diagnosis of diseases in relatives and current age or age of death. We develop a rigorous foundation for handling multiple diseases with censoring. We prove that any disease unrelated to mutations can be excluded from the model, unless it is sufficiently common and dependent on a mutation-related disease time. Furthermore, if a family member has a disease with higher probability density among mutation carriers, but the model does not account for it, then the carrier probability is deflated. However, even if a family only has diseases the model accounts for, if the model excludes a mutation-related disease, then the carrier probability will be inflated. In light of these results, we extend BRCAPRO to account for surviving all non-breast/ovary cancers as a single outcome. The extension also enables BRCAPRO to extract more useful information from male relatives. Using 1500 familes from the Cancer Genetics Network, accounting for surviving other cancers improves BRCAPRO’s concordance index from 0.758 to 0.762 (p = 0.046), improves its positive predictive value from 35% to 39% (p < 10−6) without impacting its negative predictive value, and improves its overall calibration, although calibration slightly worsens for those with carrier probability < 10%. Copyright c 2000 John Wiley & Sons, Ltd.
Resumo:
In networks with small buffers, such as optical packet switching based networks, the convolution approach is presented as one of the most accurate method used for the connection admission control. Admission control and resource management have been addressed in other works oriented to bursty traffic and ATM. This paper focuses on heterogeneous traffic in OPS based networks. Using heterogeneous traffic and bufferless networks the enhanced convolution approach is a good solution. However, both methods (CA and ECA) present a high computational cost for high number of connections. Two new mechanisms (UMCA and ISCA) based on Monte Carlo method are proposed to overcome this drawback. Simulation results show that our proposals achieve lower computational cost compared to enhanced convolution approach with an small stochastic error in the probability estimation
Resumo:
In networks with small buffers, such as optical packet switching based networks, the convolution approach is presented as one of the most accurate method used for the connection admission control. Admission control and resource management have been addressed in other works oriented to bursty traffic and ATM. This paper focuses on heterogeneous traffic in OPS based networks. Using heterogeneous traffic and bufferless networks the enhanced convolution approach is a good solution. However, both methods (CA and ECA) present a high computational cost for high number of connections. Two new mechanisms (UMCA and ISCA) based on Monte Carlo method are proposed to overcome this drawback. Simulation results show that our proposals achieve lower computational cost compared to enhanced convolution approach with an small stochastic error in the probability estimation
Resumo:
PURPOSE: The aim of this study was to develop models based on kernel regression and probability estimation in order to predict and map IRC in Switzerland by taking into account all of the following: architectural factors, spatial relationships between the measurements, as well as geological information. METHODS: We looked at about 240,000 IRC measurements carried out in about 150,000 houses. As predictor variables we included: building type, foundation type, year of construction, detector type, geographical coordinates, altitude, temperature and lithology into the kernel estimation models. We developed predictive maps as well as a map of the local probability to exceed 300 Bq/m(3). Additionally, we developed a map of a confidence index in order to estimate the reliability of the probability map. RESULTS: Our models were able to explain 28% of the variations of IRC data. All variables added information to the model. The model estimation revealed a bandwidth for each variable, making it possible to characterize the influence of each variable on the IRC estimation. Furthermore, we assessed the mapping characteristics of kernel estimation overall as well as by municipality. Overall, our model reproduces spatial IRC patterns which were already obtained earlier. On the municipal level, we could show that our model accounts well for IRC trends within municipal boundaries. Finally, we found that different building characteristics result in different IRC maps. Maps corresponding to detached houses with concrete foundations indicate systematically smaller IRC than maps corresponding to farms with earth foundation. CONCLUSIONS: IRC mapping based on kernel estimation is a powerful tool to predict and analyze IRC on a large-scale as well as on a local level. This approach enables to develop tailor-made maps for different architectural elements and measurement conditions and to account at the same time for geological information and spatial relations between IRC measurements.
Resumo:
In clinical trials, it may be of interest taking into account physical and emotional well-being in addition to survival when comparing treatments. Quality-adjusted survival time has the advantage of incorporating information about both survival time and quality-of-life. In this paper, we discuss the estimation of the expected value of the quality-adjusted survival, based on multistate models for the sojourn times in health states. Semiparametric and parametric (with exponential distribution) approaches are considered. A simulation study is presented to evaluate the performance of the proposed estimator and the jackknife resampling method is used to compute bias and variance of the estimator. (C) 2007 Elsevier B.V. All rights reserved.
Resumo:
An extension of some standard likelihood based procedures to heteroscedastic nonlinear regression models under scale mixtures of skew-normal (SMSN) distributions is developed. This novel class of models provides a useful generalization of the heteroscedastic symmetrical nonlinear regression models (Cysneiros et al., 2010), since the random term distributions cover both symmetric as well as asymmetric and heavy-tailed distributions such as skew-t, skew-slash, skew-contaminated normal, among others. A simple EM-type algorithm for iteratively computing maximum likelihood estimates of the parameters is presented and the observed information matrix is derived analytically. In order to examine the performance of the proposed methods, some simulation studies are presented to show the robust aspect of this flexible class against outlying and influential observations and that the maximum likelihood estimates based on the EM-type algorithm do provide good asymptotic properties. Furthermore, local influence measures and the one-step approximations of the estimates in the case-deletion model are obtained. Finally, an illustration of the methodology is given considering a data set previously analyzed under the homoscedastic skew-t nonlinear regression model. (C) 2012 Elsevier B.V. All rights reserved.
Resumo:
Debris flow hazard modelling at medium (regional) scale has been subject of various studies in recent years. In this study, hazard zonation was carried out, incorporating information about debris flow initiation probability (spatial and temporal), and the delimitation of the potential runout areas. Debris flow hazard zonation was carried out in the area of the Consortium of Mountain Municipalities of Valtellina di Tirano (Central Alps, Italy). The complexity of the phenomenon, the scale of the study, the variability of local conditioning factors, and the lacking data limited the use of process-based models for the runout zone delimitation. Firstly, a map of hazard initiation probabilities was prepared for the study area, based on the available susceptibility zoning information, and the analysis of two sets of aerial photographs for the temporal probability estimation. Afterwards, the hazard initiation map was used as one of the inputs for an empirical GIS-based model (Flow-R), developed at the University of Lausanne (Switzerland). An estimation of the debris flow magnitude was neglected as the main aim of the analysis was to prepare a debris flow hazard map at medium scale. A digital elevation model, with a 10 m resolution, was used together with landuse, geology and debris flow hazard initiation maps as inputs of the Flow-R model to restrict potential areas within each hazard initiation probability class to locations where debris flows are most likely to initiate. Afterwards, runout areas were calculated using multiple flow direction and energy based algorithms. Maximum probable runout zones were calibrated using documented past events and aerial photographs. Finally, two debris flow hazard maps were prepared. The first simply delimits five hazard zones, while the second incorporates the information about debris flow spreading direction probabilities, showing areas more likely to be affected by future debris flows. Limitations of the modelling arise mainly from the models applied and analysis scale, which are neglecting local controlling factors of debris flow hazard. The presented approach of debris flow hazard analysis, associating automatic detection of the source areas and a simple assessment of the debris flow spreading, provided results for consequent hazard and risk studies. However, for the validation and transferability of the parameters and results to other study areas, more testing is needed.
Resumo:
En este trabajo se investiga la persistencia de las estimaciones puntuales subjetivas de rendimientos en cultivos anua- les realizadas por un amplio grupo de agricultores. La persistencia en el tiempo es una condición necesaria para la co- herencia y la confiabilidad de las estimaciones subjetivas de variables aleatorias. Los sujetos entrevistados estimaron valores puntuales de rendimientos de cultivos anuales (rendimientos medio, mayor, mínimo y más frecuente). Se han encontrado diferencias relativas poco importantes en todas las variables, excepto en los rendimientos mínimos, donde existe una alta dispersión. Los resultados son interesantes para estimar la adecuación de las técnicas de estimación de probabilidades subjetivas para ser utilizadas en los sistemas de ayuda en la toma de decisiones en agricultura.
Resumo:
The use of the Bayes factor (BF) or likelihood ratio as a metric to assess the probative value of forensic traces is largely supported by operational standards and recommendations in different forensic disciplines. However, the progress towards more widespread consensus about foundational principles is still fragile as it raises new problems about which views differ. It is not uncommon e.g. to encounter scientists who feel the need to compute the probability distribution of a given expression of evidential value (i.e. a BF), or to place intervals or significance probabilities on such a quantity. The article here presents arguments to show that such views involve a misconception of principles and abuse of language. The conclusion of the discussion is that, in a given case at hand, forensic scientists ought to offer to a court of justice a given single value for the BF, rather than an expression based on a distribution over a range of values.
Resumo:
Construction of multiple sequence alignments is a fundamental task in Bioinformatics. Multiple sequence alignments are used as a prerequisite in many Bioinformatics methods, and subsequently the quality of such methods can be critically dependent on the quality of the alignment. However, automatic construction of a multiple sequence alignment for a set of remotely related sequences does not always provide biologically relevant alignments.Therefore, there is a need for an objective approach for evaluating the quality of automatically aligned sequences. The profile hidden Markov model is a powerful approach in comparative genomics. In the profile hidden Markov model, the symbol probabilities are estimated at each conserved alignment position. This can increase the dimension of parameter space and cause an overfitting problem. These two research problems are both related to conservation. We have developed statistical measures for quantifying the conservation of multiple sequence alignments. Two types of methods are considered, those identifying conserved residues in an alignment position, and those calculating positional conservation scores. The positional conservation score was exploited in a statistical prediction model for assessing the quality of multiple sequence alignments. The residue conservation score was used as part of the emission probability estimation method proposed for profile hidden Markov models. The results of the predicted alignment quality score highly correlated with the correct alignment quality scores, indicating that our method is reliable for assessing the quality of any multiple sequence alignment. The comparison of the emission probability estimation method with the maximum likelihood method showed that the number of estimated parameters in the model was dramatically decreased, while the same level of accuracy was maintained. To conclude, we have shown that conservation can be successfully used in the statistical model for alignment quality assessment and in the estimation of emission probabilities in the profile hidden Markov models.
Resumo:
In this paper a layered architecture to spot and characterize vowel segments in running speech is presented. The detection process is based on neuromorphic principles, as is the use of Hebbian units in layers to implement lateral inhibition, band probability estimation and mutual exclusion. Results are presented showing how the association between the acoustic set of patterns and the phonologic set of symbols may be created. Possible applications of this methodology are to be found in speech event spotting, in the study of pathological voice and in speaker biometric characterization, among others.
Resumo:
We present a novel method, called the transform likelihood ratio (TLR) method, for estimation of rare event probabilities with heavy-tailed distributions. Via a simple transformation ( change of variables) technique the TLR method reduces the original rare event probability estimation with heavy tail distributions to an equivalent one with light tail distributions. Once this transformation has been established we estimate the rare event probability via importance sampling, using the classical exponential change of measure or the standard likelihood ratio change of measure. In the latter case the importance sampling distribution is chosen from the same parametric family as the transformed distribution. We estimate the optimal parameter vector of the importance sampling distribution using the cross-entropy method. We prove the polynomial complexity of the TLR method for certain heavy-tailed models and demonstrate numerically its high efficiency for various heavy-tailed models previously thought to be intractable. We also show that the TLR method can be viewed as a universal tool in the sense that not only it provides a unified view for heavy-tailed simulation but also can be efficiently used in simulation with light-tailed distributions. We present extensive simulation results which support the efficiency of the TLR method.
Resumo:
The subject of this thesis is the n-tuple net.work (RAMnet). The major advantage of RAMnets is their speed and the simplicity with which they can be implemented in parallel hardware. On the other hand, this method is not a universal approximator and the training procedure does not involve the minimisation of a cost function. Hence RAMnets are potentially sub-optimal. It is important to understand the source of this sub-optimality and to develop the analytical tools that allow us to quantify the generalisation cost of using this model for any given data. We view RAMnets as classifiers and function approximators and try to determine how critical their lack of' universality and optimality is. In order to understand better the inherent. restrictions of the model, we review RAMnets showing their relationship to a number of well established general models such as: Associative Memories, Kamerva's Sparse Distributed Memory, Radial Basis Functions, General Regression Networks and Bayesian Classifiers. We then benchmark binary RAMnet. model against 23 other algorithms using real-world data from the StatLog Project. This large scale experimental study indicates that RAMnets are often capable of delivering results which are competitive with those obtained by more sophisticated, computationally expensive rnodels. The Frequency Weighted version is also benchmarked and shown to perform worse than the binary RAMnet for large values of the tuple size n. We demonstrate that the main issues in the Frequency Weighted RAMnets is adequate probability estimation and propose Good-Turing estimates in place of the more commonly used :Maximum Likelihood estimates. Having established the viability of the method numerically, we focus on providillg an analytical framework that allows us to quantify the generalisation cost of RAMnets for a given datasetL. For the classification network we provide a semi-quantitative argument which is based on the notion of Tuple distance. It gives a good indication of whether the network will fail for the given data. A rigorous Bayesian framework with Gaussian process prior assumptions is given for the regression n-tuple net. We show how to calculate the generalisation cost of this net and verify the results numerically for one dimensional noisy interpolation problems. We conclude that the n-tuple method of classification based on memorisation of random features can be a powerful alternative to slower cost driven models. The speed of the method is at the expense of its optimality. RAMnets will fail for certain datasets but the cases when they do so are relatively easy to determine with the analytical tools we provide.
Resumo:
This dissertation consists of three separate essays on job search and labor market dynamics. In the first essay, “The Impact of Labor Market Conditions on Job Creation: Evidence from Firm Level Data”, I study how much changes in labor market conditions reduce employment fluctuations over the business cycle. Changes in labor market conditions make hiring more expensive during expansions and cheaper during recessions, creating counter-cyclical incentives for job creation. I estimate firm level elasticities of labor demand with respect to changes in labor market conditions, considering two margins: changes in labor market tightness and changes in wages. Using employer-employee matched data from Brazil, I find that all firms are more sensitive to changes in wages rather than labor market tightness, and there is substantial heterogeneity in labor demand elasticity across regions. Based on these results, I demonstrate that changes in labor market conditions reduce the variance of employment growth over the business cycle by 20% in a median region, and this effect is equally driven by changes along each margin. Moreover, I show that the magnitude of the effect of labor market conditions on employment growth can be significantly affected by economic policy. In particular, I document that the rapid growth of the national minimum wages in Brazil in 1997-2010 amplified the impact of the change in labor market conditions during local expansions and diminished this impact during local recessions.
In the second essay, “A Framework for Estimating Persistence of Local Labor
Demand Shocks”, I propose a decomposition which allows me to study the persistence of local labor demand shocks. Persistence of labor demand shocks varies across industries, and the incidence of shocks in a region depends on the regional industrial composition. As a result, less diverse regions are more likely to experience deeper shocks, but not necessarily more long lasting shocks. Building on this idea, I propose a decomposition of local labor demand shocks into idiosyncratic location shocks and nationwide industry shocks and estimate the variance and the persistence of these shocks using the Quarterly Census of Employment and Wages (QCEW) in 1990-2013.
In the third essay, “Conditional Choice Probability Estimation of Continuous- Time Job Search Models”, co-authored with Peter Arcidiacono and Arnaud Maurel, we propose a novel, computationally feasible method of estimating non-stationary job search models. Non-stationary job search models arise in many applications, where policy change can be anticipated by the workers. The most prominent example of such policy is the expiration of unemployment benefits. However, estimating these models still poses a considerable computational challenge, because of the need to solve a differential equation numerically at each step of the optimization routine. We overcome this challenge by adopting conditional choice probability methods, widely used in dynamic discrete choice literature, to job search models and show how the hazard rate out of unemployment and the distribution of the accepted wages, which can be estimated in many datasets, can be used to infer the value of unemployment. We demonstrate how to apply our method by analyzing the effect of the unemployment benefit expiration on duration of unemployment using the data from the Survey of Income and Program Participation (SIPP) in 1996-2007.
Resumo:
The occurrence frequency of failure events serve as critical indexes representing the safety status of dam-reservoir systems. Although overtopping is the most common failure mode with significant consequences, this type of event, in most cases, has a small probability. Estimation of such rare event risks for dam-reservoir systems with crude Monte Carlo (CMC) simulation techniques requires a prohibitively large number of trials, where significant computational resources are required to reach the satisfied estimation results. Otherwise, estimation of the disturbances would not be accurate enough. In order to reduce the computation expenses and improve the risk estimation efficiency, an importance sampling (IS) based simulation approach is proposed in this dissertation to address the overtopping risks of dam-reservoir systems. Deliverables of this study mainly include the following five aspects: 1) the reservoir inflow hydrograph model; 2) the dam-reservoir system operation model; 3) the CMC simulation framework; 4) the IS-based Monte Carlo (ISMC) simulation framework; and 5) the overtopping risk estimation comparison of both CMC and ISMC simulation. In a broader sense, this study meets the following three expectations: 1) to address the natural stochastic characteristics of the dam-reservoir system, such as the reservoir inflow rate; 2) to build up the fundamental CMC and ISMC simulation frameworks of the dam-reservoir system in order to estimate the overtopping risks; and 3) to compare the simulation results and the computational performance in order to demonstrate the ISMC simulation advantages. The estimation results of overtopping probability could be used to guide the future dam safety investigations and studies, and to supplement the conventional analyses in decision making on the dam-reservoir system improvements. At the same time, the proposed methodology of ISMC simulation is reasonably robust and proved to improve the overtopping risk estimation. The more accurate estimation, the smaller variance, and the reduced CPU time, expand the application of Monte Carlo (MC) technique on evaluating rare event risks for infrastructures.