953 results for Probability Metrics


Relevance:

10.00%

Publisher:

Abstract:

We consider the problem of binary classification where the classifier can, for a particular cost, choose not to classify an observation. Just as in the conventional classification problem, minimization of the sample average of the cost is a difficult optimization problem. As an alternative, we propose the optimization of a certain convex loss function φ, analogous to the hinge loss used in support vector machines (SVMs). Its convexity ensures that the sample average of this surrogate loss can be efficiently minimized. We study its statistical properties. We show that minimizing the expected surrogate loss—the φ-risk—also minimizes the risk. We also study the rate at which the φ-risk approaches its minimum value. We show that fast rates are possible when the conditional probability P(Y=1|X) is unlikely to be close to certain critical values.
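
A hedged sketch of the kind of surrogate involved: a piecewise-linear generalized hinge loss with rejection cost d. The exact loss studied in the paper may differ, and the function name and example margins below are ours.

```python
import numpy as np

def generalized_hinge(z, d=0.3):
    """Convex surrogate for classification with a reject option.
    z = y * f(x) is the margin; d in (0, 1/2) is the abstention cost.
    The slope for z < 0 is steeper than the ordinary hinge, so the
    minimizer abstains (small |f|) where the label is too uncertain."""
    z = np.asarray(z, dtype=float)
    return np.where(z < 0,
                    1.0 - z * (1.0 - d) / d,     # confident mistakes
                    np.maximum(0.0, 1.0 - z))    # ordinary hinge part

print(generalized_hinge([-1.5, -0.2, 0.3, 1.2], d=0.3))
```

Because this surrogate is convex in f, its sample average can be minimized efficiently, exactly as with the SVM hinge loss.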

Relevance:

10.00%

Publisher:

Abstract:

One of the nice properties of kernel classifiers such as SVMs is that they often produce sparse solutions. However, the decision functions of these classifiers cannot always be used to estimate the conditional probability of the class label. We investigate the relationship between these two properties and show that these are intimately related: sparseness does not occur when the conditional probabilities can be unambiguously estimated. We consider a family of convex loss functions and derive sharp asymptotic results for the fraction of data that becomes support vectors. This enables us to characterize the exact trade-off between sparseness and the ability to estimate conditional probabilities for these loss functions.
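
The trade-off can be seen in a small numerical experiment of our own construction: for a conditional probability η = P(Y=1|X=x), minimize the conditional φ-risk η·φ(f) + (1−η)·φ(−f) over f for two classic losses. The hinge loss returns f* = +1 for every η > 1/2, so solutions are sparse but η cannot be recovered; the logistic loss returns f* = log(η/(1−η)), so η is recoverable but sparseness is lost.

```python
import numpy as np

losses = {
    "hinge":    lambda z: np.maximum(0.0, 1.0 - z),
    "logistic": lambda z: np.log1p(np.exp(-z)),
}
f_grid = np.linspace(-4.0, 4.0, 8001)   # candidate decision values f(x)

for name, phi in losses.items():
    for eta in (0.6, 0.8, 0.95):
        # Pointwise conditional phi-risk at a fixed x with P(Y=1|x) = eta.
        risk = eta * phi(f_grid) + (1.0 - eta) * phi(-f_grid)
        print(f"{name:8s} eta={eta:.2f} -> f* = {f_grid[risk.argmin()]:+.2f}")
# hinge gives f* = +1.00 for all three etas; logistic gives
# f* of about +0.41, +1.39, +2.94, i.e. log(eta / (1 - eta)).
```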

Relevance:

10.00%

Publisher:

Abstract:

The risk, or probability of error, of the classifier produced by the AdaBoost algorithm is investigated. In particular, we consider the stopping strategy to be used in AdaBoost to achieve universal consistency. We show that, provided AdaBoost is stopped after n^(1-ε) iterations for sample size n and ε ∈ (0,1), the sequence of risks of the classifiers it produces approaches the Bayes risk.
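
A minimal sketch of the stopping rule with an off-the-shelf AdaBoost implementation (the choice of ε, the dataset, and scikit-learn are ours; the paper analyses the algorithm itself):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

eps = 0.5                                  # any epsilon in (0, 1)
X, y = make_classification(n_samples=2000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

n = len(X_tr)
T = max(1, int(n ** (1 - eps)))            # stop after ~n^(1-eps) rounds
clf = AdaBoostClassifier(n_estimators=T, random_state=0).fit(X_tr, y_tr)
print(f"n = {n}, boosting rounds = {T}, "
      f"test error = {1 - clf.score(X_te, y_te):.3f}")
```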

Relevance:

10.00%

Publisher:

Abstract:

Early models of bankruptcy prediction employed financial ratios drawn from pre-bankruptcy financial statements and performed well both in-sample and out-of-sample. Since then there has been an ongoing effort in the literature to develop models with even greater predictive performance. A significant innovation in the literature was the introduction into bankruptcy prediction models of capital market data such as excess stock returns and stock return volatility, along with the application of the Black–Scholes–Merton option-pricing model. In this note, we test five key bankruptcy models from the literature using an up-to-date data set and find that each contains unique information regarding the probability of bankruptcy, but that their performance varies over time. We build a new model comprising key variables from each of the five models and add a new variable that proxies for the degree of diversification within the firm. The degree of diversification is shown to be negatively associated with the risk of bankruptcy. This more general model outperforms the existing models in a variety of in-sample and out-of-sample tests.
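
For readers unfamiliar with the option-pricing ingredient, the following is a textbook-style sketch of how the Black–Scholes–Merton view turns market data into a bankruptcy probability via the distance to default. The inputs are illustrative and the form shown is the standard Merton one, not necessarily the exact specification of the models tested in the note.

```python
from math import log, sqrt
from scipy.stats import norm

def merton_default_probability(V, D, mu, sigma, T=1.0):
    """Probability that asset value V falls below the face value of
    debt D within horizon T years, under the Merton model.
    mu: expected asset return; sigma: asset volatility."""
    dd = (log(V / D) + (mu - 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    return norm.cdf(-dd)   # P(default) = N(-distance to default)

# Illustrative inputs only (not taken from the paper's data set):
print(merton_default_probability(V=120e6, D=100e6, mu=0.06, sigma=0.25))
```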

Relevance:

10.00%

Publisher:

Abstract:

Gaussian mixture models (GMMs) have become an established means of modeling feature distributions in speaker recognition systems. For experimentation and practical implementation purposes, it is useful to develop and test these models in an efficient manner, particularly when computational resources are limited. A method of combining vector quantization (VQ) with single multi-dimensional Gaussians is proposed to rapidly generate a robust model approximation to the Gaussian mixture model. A fast method of testing these systems is also proposed and implemented. Results on the NIST 1996 Speaker Recognition Database suggest verification performance comparable to, and in some cases better than, the traditional GMM-based analysis scheme. In addition, previous research on the task of speaker identification indicated similar system performance between the VQ Gaussian-based technique and GMMs.
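
A rough sketch of the flavour of model the abstract describes, as we read it: vector quantization partitions the feature space, and a single diagonal-covariance Gaussian is then fitted to each partition, giving a GMM-like scorer without EM training. The function names, the diagonal restriction, and the variance floor are our assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.stats import multivariate_normal

def vq_gaussian_model(features, n_codewords=32, seed=0):
    # VQ step: partition the feature vectors with k-means.
    km = KMeans(n_clusters=n_codewords, n_init=10, random_state=seed)
    labels = km.fit_predict(features)
    comps = []
    for k in range(n_codewords):
        cluster = features[labels == k]
        if len(cluster) == 0:
            continue
        weight = len(cluster) / len(features)
        mean = cluster.mean(axis=0)
        var = np.maximum(cluster.var(axis=0), 1e-3)  # floor for robustness
        comps.append((weight, mean, var))
    return comps

def log_density(comps, x):
    # Score a single feature frame against the mixture-like model.
    return np.log(sum(w * multivariate_normal.pdf(x, m, np.diag(v))
                      for w, m, v in comps))
```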

Relevance:

10.00%

Publisher:

Abstract:

Maternal and infant mortality is a global health issue with a significant social and economic impact. Each year, over half a million women worldwide die due to complications related to pregnancy or childbirth, four million infants die in the first 28 days of life, and eight million infants die in the first year. Ninety-nine percent of maternal and infant deaths occur in developing countries. Reducing maternal and infant mortality is among the key international development goals. In China, the national maternal mortality ratio and infant mortality rate were reduced greatly in the past two decades, yet a large discrepancy remains between urban and rural areas. To address this problem, a large-scale Safe Motherhood Programme was initiated in 2000; the programme was implemented in Guangxi in 2003. Interventions in the programme included both demand-side and supply-side interventions focused on increasing health service use and improving birth outcomes. Little is known about the effects and economic outcomes of the Safe Motherhood Programme in Guangxi, although it has been implemented for seven years. The aim of this research is to estimate the effectiveness and cost-effectiveness of the interventions in the Safe Motherhood Programme in Guangxi, China. The objectives of this research are: 1. to evaluate whether the changes in health service use and birth outcomes are associated with the interventions in the Safe Motherhood Programme; 2. to estimate the cost-effectiveness of the interventions in the Safe Motherhood Programme and quantify the uncertainty surrounding the decision; and 3. to assess the expected value of perfect information associated with both the whole decision and individual parameters, and to interpret the findings to inform priority setting in further research and policy making in this area. A quasi-experimental study design was used in this research to assess the effectiveness of the programme in increasing health service use and improving birth outcomes. The study subjects were 51 intervention counties and 30 control counties. Data on health service use, birth outcomes and socio-economic factors from 2001 to 2007 were collected from the programme database and statistical yearbooks. Based on the profile plots of the data, general linear mixed models were used to evaluate the effectiveness of the programme while controlling for the effects of baseline levels of the response variables, changes in socio-economic factors over time and correlations among repeated measurements from the same county. Redundant multicollinear variables were deleted from the mixed models using the results of the multicollinearity diagnoses. For each response variable, the best covariance structure was selected from 15 alternatives according to fit statistics including the Akaike information criterion, the finite-population corrected Akaike information criterion, and Schwarz's Bayesian information criterion. Residual diagnostics were used to validate the model assumptions. Statistical inferences were made to show the effect of the programme on health service use and birth outcomes. A decision analytic model was developed to evaluate the cost-effectiveness of the programme, quantify the decision uncertainty, and estimate the expected value of perfect information associated with the decision. The model was used to describe the transitions between health states for women and infants and to reflect the changes in both costs and health benefits associated with implementing the programme.
Results gained from the mixed models and other relevant evidence identified were synthesised appropriately to inform the input parameters of the model. Incremental cost-effectiveness ratios of the programme were calculated for the two groups of intervention counties over time. Uncertainty surrounding the parameters was dealt with using probabilistic sensitivity analysis, and uncertainty relating to model assumptions was handled using scenario analysis. Finally, the expected value of perfect information for both the whole model and individual parameters in the model was estimated to inform priority setting in further research in this area. The annual change rates of the antenatal care rate and the institutionalised delivery rate improved significantly in the intervention counties after the programme was implemented. Significant improvements were also found in the annual change rates of the maternal mortality ratio, the infant mortality rate, the incidence rate of neonatal tetanus and the mortality rate of neonatal tetanus in the intervention counties after the implementation of the programme. The annual change rate of the neonatal mortality rate also improved, although the improvement only approached statistical significance. The influences of the socio-economic factors on the health service use indicators and birth outcomes were identified. Rural income per capita had a significant positive impact on the health service use indicators, and a significant negative impact on the birth outcomes. The number of beds in healthcare institutions per 1,000 population and the number of rural telephone subscribers per 1,000 population were found to be significantly positively related to the institutionalised delivery rate. The length of highway per square kilometre negatively influenced the maternal mortality ratio. The percentage of persons employed in primary industry had a significant negative impact on the institutionalised delivery rate, and a significant positive impact on the infant mortality rate and neonatal mortality rate. The incremental costs of implementing the programme over existing practice were US $11.1 million from the societal perspective, and US $13.8 million from the perspective of the Ministry of Health. Overall, 28,711 life years were generated by the programme, producing an overall incremental cost-effectiveness ratio of US $386 per life year from the societal perspective, and US $480 from the perspective of the Ministry of Health, both of which were below the threshold willingness-to-pay ratio of US $675. The expected net monetary benefit generated by the programme was US $8.3 million from the societal perspective, and US $5.5 million from the perspective of the Ministry of Health. The overall probability that the programme was cost-effective was 0.93 and 0.89 from the two perspectives, respectively. The incremental cost-effectiveness ratio of the programme was insensitive to the different estimates of the three parameters relating to the model assumptions. Further research could be conducted to reduce the uncertainty surrounding the decision, for which the upper limit of investment was US $0.6 million from the societal perspective, and US $1.3 million from the perspective of the Ministry of Health. It is also worthwhile to obtain a more precise estimate of the improvement in the infant mortality rate.
The population expected value of perfect information for this individual parameter was US $0.99 million from the societal perspective, and US $1.14 million from the perspective of the Ministry of Health. The findings from this study show that the interventions in the Safe Motherhood Programme were both effective and cost-effective in increasing health service use and improving birth outcomes in rural areas of Guangxi, China. The programme therefore represents a good public health investment and should be adopted and further expanded to an even broader area if possible. This research provides economic evidence to inform efficient decision making in improving maternal and infant health in developing countries.
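
The headline economics above reduce to two standard formulas: ICER = incremental cost / incremental effect, and net monetary benefit (NMB) = incremental effect × willingness-to-pay − incremental cost. A quick arithmetic check against the reported figures (small discrepancies reflect rounding in the published inputs):

```python
life_years = 28_711          # incremental life years generated
wtp = 675                    # willingness-to-pay threshold, US$ per life year
incremental_cost = {"societal": 11.1e6, "Ministry of Health": 13.8e6}

for perspective, cost in incremental_cost.items():
    icer = cost / life_years
    nmb = life_years * wtp - cost
    print(f"{perspective:18s} ICER = ${icer:,.0f}/LY, NMB = ${nmb / 1e6:.1f}M")
# societal: ICER ~ $387/LY, NMB ~ $8.3M; Ministry of Health:
# ICER ~ $481/LY, NMB ~ $5.6M, matching the reported $386/$480
# and $8.3M/$5.5M up to rounding.
```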

Relevance:

10.00%

Publisher:

Abstract:

Stochastic models for competing clonotypes of T cells, based on multivariate, continuous-time, discrete-state Markov processes, have been proposed in the literature by Stirk, Molina-París and van den Berg (2008). A stochastic modelling framework is important because of rare events associated with small populations of some critical cell types. Usually, computational methods for these problems employ a trajectory-based approach based on Monte Carlo simulation. This is partly because the complementary probability density function (PDF) approaches can be expensive, but here we describe some efficient PDF approaches that directly solve the governing equations, known as the Master Equation. These computations are made very efficient through an approximation of the state space by the Finite State Projection and through the use of Krylov subspace methods when evaluating the matrix exponential. These computational methods allow us to explore the evolution of the PDFs associated with these stochastic models, and bimodal distributions arise in some parameter regimes. Time-dependent propensities naturally arise in immunological processes due to, for example, age-dependent effects. Incorporating time-dependent propensities into the framework of the Master Equation significantly complicates the corresponding computational methods, but here we describe an efficient approach via Magnus formulas. Although this contribution focuses on the example of competing clonotypes, the general principles are relevant to multivariate Markov processes and provide fundamental techniques for computational immunology.
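
To make the PDF approach concrete, here is a sketch on a toy one-dimensional birth-death process (our example; the clonotype models in the paper are multivariate). The truncated generator plays the role of the Finite State Projection, and SciPy's expm_multiply stands in for the Krylov-type evaluation of the action of the matrix exponential.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import expm_multiply

# Birth-death Master Equation on the truncated state space {0, ..., N}:
# dp/dt = A p, so p(t) = expm(t A) p(0). Birth rate b, death rate d*n.
N, b, d = 200, 10.0, 0.5
n = np.arange(N + 1)
A = diags([b * np.ones(N),      # inflow from n-1 (birth)
           -(b + d * n),        # total outflow from n
           d * n[1:]],          # inflow from n+1 (death)
          offsets=[-1, 0, 1], format="csc")

p0 = np.zeros(N + 1)
p0[0] = 1.0                                           # start at zero cells
p = expm_multiply(A, p0, start=0.0, stop=5.0, num=6)  # p(t) at t = 0, 1, ..., 5
print("mode at t=5:", p[-1].argmax())                 # approaching b/d = 20
print("FSP truncation error:", 1.0 - p[-1].sum())     # mass lost at boundary
```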

Relevance:

10.00%

Publisher:

Abstract:

The stochastic simulation algorithm was introduced by Gillespie and, in a different form, by Kurtz. There have been many attempts at accelerating the algorithm without deviating from the behavior of the simulated system. The crux of the explicit τ-leaping procedure is the use of Poisson random variables to approximate the number of occurrences of each type of reaction event during a carefully selected time period, τ. This method is acceptable provided the leap condition, that no propensity function changes “significantly” during any time-step, is met. Using this method, there is a possibility that species numbers can artificially become negative. Several recent papers have demonstrated methods that avoid this situation. One such method classifies as critical those reactions in danger of sending species populations negative. At most, one of these critical reactions is allowed to occur in the next time-step. We argue that the criticality of a reactant species and its dependent reaction channels should be related to the probability of the species number becoming negative. This way, only reactions that, if fired, carry a high probability of driving a reactant population negative are labeled critical. The number of firings of more reaction channels can then be approximated using Poisson random variables, thus speeding up the simulation while maintaining accuracy. In implementing this revised method of criticality selection, we make use of the probability distribution from which the random variable describing the change in species number is drawn. We give several numerical examples to demonstrate the effectiveness of our new method.
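
A sketch of the proposed criticality rule as we understand it from the abstract (the function name, the tolerance p_neg, and the per-species bound are our assumptions): a channel is labelled critical only when firing it a Poisson(a_j τ) number of times carries more than a tolerated probability of exhausting some reactant.

```python
import numpy as np
from scipy.stats import poisson

def critical_channels(x, propensities, nu, tau, p_neg=1e-3):
    """x: current species counts; propensities: a_j(x) per channel;
    nu: stoichiometric change vectors; tau: proposed leap time.
    Channel j is critical if P(firings > k_max) > p_neg, where k_max
    is the most firings any consumed species can absorb."""
    critical = []
    for j, (a_j, nu_j) in enumerate(zip(propensities, nu)):
        lam = a_j * tau
        for xi, change in zip(x, nu_j):
            if change < 0 and poisson.sf(xi // (-change), lam) > p_neg:
                critical.append(j)
                break
    return critical

x = np.array([5, 100])
nu = [np.array([-1, 1]), np.array([0, 1])]   # channel 0 consumes species 0
print(critical_channels(x, propensities=[2.0, 50.0], nu=nu, tau=2.0))  # [0]
```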

Relevance:

10.00%

Publisher:

Abstract:

This thesis examines the ways in which citizens find out about socio-political issues. The project set out to discover how audience characteristics such as scepticism towards the media, gratifications sought, need for cognition and political interest influence information selection. While most previous information choice studies have focused on how individuals select from a narrow range of media types, this thesis considered a much wider sweep of the information landscape. This approach was taken to obtain an understanding of information choices in a more authentic context: in everyday life, people are not simply restricted to one or two news sources. Rather, they may obtain political information from a vast range of information sources, including media sources (e.g., radio, television, newspapers) and sources beyond the media (e.g., interpersonal sources, public speaking events, social networking websites). Thus, the study included both news media and non-news media information sources. Data collection for the project consisted of a written, postal survey. The survey was administered to a probability sample in the greater Brisbane region, the third largest city in Australia. Data were collected during March and April 2008, approximately four months after the 2007 Australian Federal Election; hence, the study was conducted in a non-election context. 585 usable surveys were obtained. In addition to measuring the attitudinal characteristics listed above, respondents were surveyed as to which information sources (e.g., television shows, radio stations, websites and festivals) they usually use to find out about socio-political issues. Multiple linear regression analysis was conducted to explore patterns of influence between the audience characteristics and information consumption patterns. The results of this analysis indicated an apparent difference between the way citizens use news media sources and the way they use information sources from beyond the news media. In essence, it appears that non-news media information sources are used very deliberately to seek socio-political information, while media sources are used in a less purposeful way. If media use in a non-election context, such as that of the present study, is not primarily concerned with deliberate information seeking, media use must instead have other primary purposes, with political information acquisition as either a secondary driver or a by-product of that primary purpose. It appears, then, that political information consumption in a media-saturated society is more about routine ‘practices’ than it is about ‘information seeking’. The suggestion that media use is no longer primarily concerned with information seeking, but rather is simply a behaviour which occurs within the broader set of everyday practices, reflects Couldry’s (2004) media-as-practice paradigm. These findings highlight the need for more authentic and holistic contexts for media research. It is insufficient to consider information choices in isolation, or even from a wider range of information sources such as that incorporated in the present study. Future media research must take greater account of the broader social contexts and practices in which media-oriented behaviours occur. The findings also call into question the previously assumed centrality of trust to information selection decisions: citizens regularly use media they do not trust to find out about politics.
If people are willing to use information sources they do not trust for democratically important topics such as politics, it is important that citizens possess the media literacy skills to effectively understand and evaluate the information they are presented with. Without the application of such media literacy skills, a steady diet of ‘fast food’ media may result in uninformed or misinformed voting decisions, which have implications for the effectiveness of democratic processes. This research has emphasised the need for further holistic and authentically contextualised media use research, to better understand how citizens use information sources to find out about important topics such as politics.
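
Schematically, the regression stage of the analysis looks like the following (synthetic data and hypothetical variable names standing in for the survey constructs; the real study had 585 respondents and many information-source outcomes):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 585   # sample size matching the survey
df = pd.DataFrame({
    "media_scepticism":   rng.normal(size=n),
    "need_for_cognition": rng.normal(size=n),
    "political_interest": rng.normal(size=n),
})
# Synthetic outcome: frequency of using one information source.
df["source_use"] = (0.5 * df.political_interest
                    + 0.3 * df.need_for_cognition
                    + rng.normal(size=n))

fit = smf.ols("source_use ~ media_scepticism + need_for_cognition"
              " + political_interest", data=df).fit()
print(fit.params)   # one such model per information source
```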

Relevance:

10.00%

Publisher:

Abstract:

Modelling how a word is activated in human memory is an important requirement for determining the probability of recall of a word in an extra-list cueing experiment. The spreading activation, spooky-action-at-a-distance and entanglement models have all been used to model the activation of a word. Recently a hypothesis was put forward that the mean activation levels of the respective models are ordered as follows: spreading activation ≤ entanglement ≤ spooky-action-at-a-distance. This article investigates this hypothesis by means of a substantial empirical analysis of each model using the University of South Florida word association, rhyme and word fragment norms.
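
For intuition only, here is a toy version of the simplest of the three models (a deliberate simplification of ours; the article's estimators, and the entanglement and spooky-action-at-a-distance models in particular, are more involved): spreading activation accumulates free-association strength flowing into a target word from its associative neighbourhood.

```python
def spreading_activation(target, assoc, neighbourhood):
    """assoc[(cue, response)] holds normed free-association strengths,
    e.g. from the University of South Florida norms; the activation of
    `target` is the summed strength flowing in from its neighbourhood."""
    return sum(assoc.get((w, target), 0.0) for w in neighbourhood)

assoc = {("dog", "cat"): 0.30, ("mouse", "cat"): 0.12, ("pet", "cat"): 0.08}
print(spreading_activation("cat", assoc, ["dog", "mouse", "pet"]))  # 0.5
```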

Relevance:

10.00%

Publisher:

Abstract:

This thesis investigates profiling and differentiating customers through the use of statistical data mining techniques. The business application of our work centres on examining individuals’ seldom studied yet critical consumption behaviour over an extensive time period within the context of the wireless telecommunication industry; consumption behaviour (as opposed to purchasing behaviour) is behaviour that has been performed so frequently that it becomes habitual and involves minimal intention or decision making. Key variables investigated are the activity initialisation timestamp and cell tower location, as well as the activity type and usage quantity (e.g., voice call with duration in seconds); the research focuses on customers’ spatial and temporal usage behaviour. The main methodological emphasis is on the development of clustering models based on Gaussian mixture models (GMMs), which are fitted with the use of the recently developed variational Bayesian (VB) method. VB is an efficient deterministic alternative to the popular but computationally demanding Markov chain Monte Carlo (MCMC) methods. The standard VB-GMM algorithm is extended by allowing component splitting, such that it is robust to initial parameter choices and can automatically and efficiently determine the number of components. The new algorithm we propose allows more effective modelling of individuals’ highly heterogeneous and spiky spatial usage behaviour, or more generally human mobility patterns; the term spiky describes data patterns with large areas of low probability mixed with small areas of high probability. Customers are then characterised and segmented based on the fitted GMM, which corresponds to how each of them uses the products/services spatially in their daily lives; this is essentially their likely lifestyle and occupational traits. Other significant research contributions include fitting GMMs using VB to circular data, i.e., the temporal usage behaviour, and developing clustering algorithms suitable for high-dimensional data based on the use of VB-GMM.
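
For a feel of the modelling approach, the sketch below uses scikit-learn's off-the-shelf variational Bayesian GMM, whose Dirichlet-process prior prunes redundant components and so determines an effective component count automatically. It is a stand-in for, not an implementation of, the split-based VB-GMM algorithm the thesis develops.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(1)
# Toy stand-in for spiky spatial usage data: three tight hotspots.
X = np.vstack([rng.normal(loc, 0.3, size=(300, 2))
               for loc in ([0, 0], [3, 0], [0, 3])])

vbgmm = BayesianGaussianMixture(
    n_components=10,       # generous upper bound on components
    weight_concentration_prior_type="dirichlet_process",
    random_state=1,
).fit(X)
active = int((vbgmm.weights_ > 0.01).sum())
print(f"{active} effective components out of 10 allowed")  # expect 3
```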

Relevance:

10.00%

Publisher:

Abstract:

Biochemical reactions underlying genetic regulation are often modelled as a continuous-time, discrete-state Markov process, and the evolution of the associated probability density is described by the so-called chemical master equation (CME). However, the CME is typically difficult to solve, since the state space involved can be very large or even countably infinite. Recently a finite state projection (FSP) method that truncates the state space was suggested and shown to be effective on an example model of the Pap-pili epigenetic switch. However, in this example both the model and the final time at which the solution was computed were relatively small. Presented here is a Krylov FSP algorithm based on a combination of state-space truncation and inexact matrix-vector product routines. This allows larger-scale models to be studied and solutions for larger final times to be computed in a realistic execution time. Additionally, the new method computes the solution at intermediate times at virtually no extra cost, since it is derived from Krylov-type methods for computing matrix exponentials. For the purpose of comparison, the new algorithm is applied to the model of the Pap-pili epigenetic switch on which the original FSP was first demonstrated. The method is also applied to a more sophisticated model of regulated transcription. Numerical results indicate that the new approach is significantly faster and extendable to larger biological models.
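
The FSP skeleton that the Krylov machinery accelerates can be sketched in a few lines (our skeleton, not the paper's algorithm): grow the truncated state space until the probability mass lost through its boundary, which certifies the FSP error, falls below a tolerance.

```python
import numpy as np
from scipy.sparse.linalg import expm_multiply

def fsp_solve(build_generator, initial_state, t, tol=1e-6, N0=64):
    """build_generator(N) must return the (N+1)x(N+1) truncated CME
    generator A with dp/dt = A p (e.g. the tridiagonal generator of a
    birth-death process). Doubles the truncation until the FSP error
    certificate 1 - sum(p) is below tol."""
    N = N0
    while True:
        A = build_generator(N)
        p0 = np.zeros(N + 1)
        p0[initial_state] = 1.0
        p = expm_multiply(A * t, p0)      # p(t) = expm(t A) p(0)
        if 1.0 - p.sum() <= tol:          # mass kept inside the projection
            return p
        N *= 2                            # enlarge the state space, retry
```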

Relevance:

10.00%

Publisher:

Abstract:

We consider a continuous-time model for election timing in a Majoritarian Parliamentary System where the government maintains a constitutional right to call an early election. Our model is based on the two-party-preferred data that measure the popularity of the government and the opposition over time. We describe the poll process by a Stochastic Differential Equation (SDE) and use a martingale approach to derive a Partial Differential Equation (PDE) for the government’s expected remaining life in office. A comparison is made between a three-year and a four-year maximum term, and we also provide the exercise boundary for calling an election. The impacts of changes in the SDE’s parameters, of the probability of winning the election, and of the maximum term on the call exercise boundaries are discussed and analysed. An application of our model to the Australian Federal Election for the House of Representatives is also given.
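
The paper obtains the expected remaining life by solving a PDE; as a rough Monte Carlo stand-in (every modelling choice below is our assumption, including arithmetic Brownian dynamics for the poll process and a fixed calling threshold in place of the derived exercise boundary):

```python
import numpy as np

def expected_remaining_life(x0, mu, sigma, T=3.0, dt=1 / 52,
                            threshold=0.52, n_paths=20_000, seed=0):
    """Poll process X (two-party-preferred support) follows
    dX = mu dt + sigma dW; the government calls an election the first
    week X exceeds `threshold`, else serves the full term T (years)."""
    rng = np.random.default_rng(seed)
    steps = int(T / dt)
    x = np.full(n_paths, x0)
    life = np.full(n_paths, T)
    alive = np.ones(n_paths, dtype=bool)
    for k in range(1, steps + 1):
        x[alive] += mu * dt + sigma * np.sqrt(dt) * rng.standard_normal(alive.sum())
        called = alive & (x > threshold)
        life[called] = k * dt
        alive &= ~called
    return life.mean()

print(expected_remaining_life(x0=0.50, mu=0.0, sigma=0.05))
```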

Relevance:

10.00%

Publisher:

Abstract:

In this paper we construct a mathematical model for the genetic regulatory network of the lactose operon. This mathematical model contains transcription and translation of the lactose permease (LacY) and a reporter gene GFP. The probability of transcription of LacY is determined by 14 of the 50 possible binding states of the lactose operon, based on the quasi-steady-state assumption for the binding reactions, while the probability of transcription of the reporter gene GFP is calculated from 5 of the 19 possible binding states, because the binding site O2 is missing for this reporter gene. We have tested different mechanisms for the transport of thio-methylgalactoside (TMG) and the effect of different Hill coefficients on the simulated LacY expression levels. Using this mathematical model we have reproduced one of the experimental results with different LacY concentrations induced by different concentrations of TMG.
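
As background for the Hill-coefficient experiments, a sketch of the standard transport nonlinearity usually tested in such models (the constants and the specific mechanisms compared in the paper may differ):

```python
import numpy as np

def hill(x, K, n):
    """Fraction of maximal TMG transport at inducer concentration x,
    with half-saturation constant K and Hill coefficient n
    (n = 1 reduces to Michaelis-Menten kinetics)."""
    return x**n / (K**n + x**n)

tmg = np.array([5.0, 20.0, 80.0])       # illustrative concentrations
for n in (1, 2, 4):
    print(f"n = {n}:", np.round(hill(tmg, K=20.0, n=n), 3))
```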

Relevance:

10.00%

Publisher:

Abstract:

Building on the recommendations of the Bradley Review (2008), the Australian Federal government intends to promote a higher level of penetration of tertiary qualifications across the broader Australian community, which is anticipated to result in increased levels of standardisation across university degrees. In the field of property, tertiary academic programs are very closely aligned to the needs of a range of built environment professions, and there are well-developed synergies between the relevant professional bodies and the educational institutions. The strong nexus between the academic and the professional content is characterised by ongoing industry accreditation, which nominates a range of outcomes that the academic programs must maintain across a range of specified metrics. Commonly, the accrediting bodies focus on a standard of minimum requirements, especially in the specialised subject areas where they require property graduates to demonstrate appropriate learning and attitudes. In addition to the nominated content fields, every undergraduate degree program also includes many other subjects which provide a richer experience for the students beyond the merely professional. This study focuses on the non-specialised knowledge fields, which vary across the universities offering property degree courses, as every university has the freedom to pursue its own policy for these non-specialised units. With universities being sensitive to their role in the appropriate socialisation of new entrants, first-year units have been used as a vehicle to support students’ transition into university education, and final-year units seek to support students’ integration into the professional world. Consequently, many property programs have had to squeeze their property-specific units to accommodate more generic units in both first year and final year, and the resulting diversity is a feature of the current range of property degrees across Australia which this research will investigate. The matrix of knowledge fields nominated by the Australian Property Institute for accreditation of degrees accepted for the Certified Practising Valuer (CPV) educational requirement, together with the complementary requirements of the other major accrediting body (RICS), is used to classify and compare similarities and differences across property degrees in the light of the streamlining anticipated from the Bradley Review.