900 results for Convex functions
Abstract:
In recent years the development and use of crash prediction models for roadway safety analyses have received substantial attention. These models, also known as safety performance functions (SPFs), relate the expected crash frequency of roadway elements (intersections, road segments, on-ramps) to traffic volumes and other geometric and operational characteristics. A commonly practiced approach for applying intersection SPFs is to assume that crash types occur in fixed proportions (e.g., rear-end crashes make up 20% of crashes, angle crashes 35%, and so forth) and then apply these fixed proportions to crash totals to estimate crash frequencies by type. As demonstrated in this paper, such a practice makes questionable assumptions and results in considerable error in estimating crash proportions. Through the use of rudimentary SPFs based solely on the annual average daily traffic (AADT) of major and minor roads, the homogeneity-in-proportions assumption is shown not to hold across AADT, because crash proportions vary as a function of both major and minor road AADT. For example, with minor road AADT of 400 vehicles per day, the proportion of intersecting-direction crashes decreases from about 50% with 2,000 major road AADT to about 15% with 82,000 AADT. Same-direction crashes increase from about 15% to 55% for the same comparison. The homogeneity-in-proportions assumption should be abandoned, and crash type models should be used to predict crash frequency by crash type. SPFs that use additional geometric variables would only exacerbate the problem quantified here. Comparison of models for different crash types using additional geometric variables remains the subject of future research.
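A minimal sketch of why fixed proportions fail, assuming each crash type has its own rudimentary SPF of the common power form exp(a) * AADT_major^b1 * AADT_minor^b2. The coefficients below are hypothetical values chosen only for illustration; they are not the paper's fitted models. Whenever the exponents differ between crash types, the proportions change with AADT.

```python
import math

def spf(aadt_major, aadt_minor, intercept, b_major, b_minor):
    """Expected crash frequency: exp(intercept) * major^b_major * minor^b_minor."""
    return math.exp(intercept) * aadt_major ** b_major * aadt_minor ** b_minor

# Hypothetical coefficients (intercept, b_major, b_minor), for illustration only.
COEFFS = {
    "same_direction": (-12.0, 1.10, 0.20),
    "intersecting_direction": (-7.0, 0.40, 0.45),
}

def crash_proportions(aadt_major, aadt_minor):
    """Share of each crash type implied by the type-specific SPFs."""
    counts = {k: spf(aadt_major, aadt_minor, *c) for k, c in COEFFS.items()}
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

# Same minor road AADT, two very different major road volumes.
low = crash_proportions(2_000, 400)
high = crash_proportions(82_000, 400)
print(low)
print(high)
```

With these assumed exponents the intersecting-direction share falls as major road AADT rises, which is the qualitative pattern the abstract reports.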
Abstract:
This article applies social network analysis techniques to a case study of police corruption in order to produce findings which will assist in corruption prevention and investigation. Police corruption is commonly studied, but sophisticated tools of analysis are rarely engaged to add rigour to the field of study. This article analyses the ‘First Joke’, a systemic and long-lasting corruption network in the Queensland Police Force, a state police agency in Australia. It uses the data obtained from a commission of inquiry which exposed the network and develops hypotheses as to the nature of the network’s structure based on existing literature into dark networks and criminal networks. These hypotheses are tested by entering the data into UCINET and analysing the outcomes through social network analysis measures of average path distance, centrality and density. The conclusions reached show that the network has characteristics not predicted by the literature.
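The three measures named above can be sketched on a toy undirected network. This is an illustration in plain Python, not the UCINET workflow used in the article; the node names and edges are invented, and degree centrality is assumed as the centrality measure since the abstract does not say which one was used.

```python
from collections import deque

def density(adj):
    """Fraction of possible undirected ties that are present."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    return 2 * m / (n * (n - 1))

def degree_centrality(adj):
    """Number of ties per node, normalised by the maximum possible (n - 1)."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

def average_path_distance(adj):
    """Mean shortest-path length over all reachable ordered pairs (BFS)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

# Invented toy network: hub A tied to B, C, D, plus one tie between C and D.
toy = {"A": {"B", "C", "D"}, "B": {"A"}, "C": {"A", "D"}, "D": {"A", "C"}}
print(density(toy), degree_centrality(toy), average_path_distance(toy))
```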
Abstract:
Genomic and proteomic analyses have attracted a great deal of interest in biological research in recent years. Many methods have been applied to discover useful information contained in the enormous databases of genomic sequences and amino acid sequences. The results of these investigations in turn inspire further research in biological fields. These biological sequences, which may be considered as multiscale sequences, have some specific features which need further efforts to characterise using more refined methods. This project aims to study some of these biological challenges with multiscale analysis methods and a stochastic modelling approach. The first part of the thesis aims to cluster some unknown proteins, and classify their families as well as their structural classes. A development in proteomic analysis is concerned with the determination of protein functions. The first step in this development is to classify proteins and predict their families. This motivates us to study some unknown proteins from specific families, and to cluster them into families and structural classes. We select a large number of proteins from the same families or superfamilies, and link them to simulate some unknown large proteins from these families. We use multifractal analysis and the wavelet method to capture the characteristics of these linked proteins. The simulation results show that the method is valid for the classification of large proteins. The second part of the thesis aims to explore the relationship of proteins based on a layered comparison with their components. Many methods are based on homology of proteins because the resemblance at the protein sequence level normally indicates the similarity of functions and structures. However, some proteins may have similar functions with low sequential identity. We consider protein sequences at detail level to investigate the problem of comparison of proteins. 
The comparison is based on the empirical mode decomposition (EMD), and protein sequences are detected with the intrinsic mode functions. A measure of similarity is introduced with a new cross-correlation formula. The similarity results show that the EMD is useful for detection of functional relationships of proteins. The third part of the thesis aims to investigate the transcriptional regulatory network of the yeast cell cycle via stochastic differential equations. As the investigation of genome-wide gene expressions has become a focus in genomic analysis, researchers have tried to understand the mechanisms of the yeast genome for many years. How cells control gene expressions still needs further investigation. We use a stochastic differential equation to model the expression profile of a target gene. We modify the model with a Gaussian membership function. For each target gene, a transcriptional rate is obtained, and the estimated transcriptional rate is also calculated with the information from five possible transcriptional regulators. Some regulators of these target genes are verified with the related references. With these results, we construct a transcriptional regulatory network for the genes from the yeast Saccharomyces cerevisiae. The construction of a transcriptional regulatory network is useful for detecting more mechanisms of the yeast cell cycle.
Abstract:
Optimal design for generalized linear models has primarily focused on univariate data. Often experiments are performed that have multiple dependent responses described by regression type models, and it is of interest and of value to design the experiment for all these responses. This requires a multivariate distribution underlying a pre-chosen model for the data. Here, we consider the design of experiments for bivariate binary data which are dependent. We explore Copula functions which provide a rich and flexible class of structures to derive joint distributions for bivariate binary data. We present methods for deriving optimal experimental designs for dependent bivariate binary data using Copulas, and demonstrate that, by including the dependence between responses in the design process, more efficient parameter estimates are obtained than by the usual practice of simply designing for a single variable only. Further, we investigate the robustness of designs with respect to initial parameter estimates and Copula function, and also show the performance of compound criteria within this bivariate binary setting.
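As a sketch of how a copula turns two binary marginals into a joint distribution, the example below assumes the Farlie-Gumbel-Morgenstern (FGM) copula, C(u, v) = uv(1 + theta(1 - u)(1 - v)) with theta in [-1, 1]. The abstract does not say which copulas were used, so this choice and the numeric marginals are illustrative only. The joint cell probabilities follow from evaluating the copula at the marginal probabilities of a zero response.

```python
def fgm_copula(u, v, theta):
    """Farlie-Gumbel-Morgenstern copula; theta = 0 gives independence."""
    if not -1.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [-1, 1]")
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))

def bivariate_binary(p1, p2, theta):
    """Joint probabilities of (Y1, Y2) given P(Y1=1)=p1 and P(Y2=1)=p2."""
    u, v = 1.0 - p1, 1.0 - p2          # marginal P(Y1=0), P(Y2=0)
    p00 = fgm_copula(u, v, theta)      # P(Y1=0, Y2=0) = C(u, v)
    return {
        (0, 0): p00,
        (0, 1): u - p00,
        (1, 0): v - p00,
        (1, 1): 1.0 - u - v + p00,
    }

# Hypothetical marginals; compare independence with positive dependence.
independent = bivariate_binary(0.3, 0.6, theta=0.0)
dependent = bivariate_binary(0.3, 0.6, theta=0.8)
print(independent)
print(dependent)
```

In a design setting, p1 and p2 would come from the regression models for each response at a given design point; that step is omitted here.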
Abstract:
Many of the classification algorithms developed in the machine learning literature, including the support vector machine and boosting, can be viewed as minimum contrast methods that minimize a convex surrogate of the 0–1 loss function. The convexity makes these algorithms computationally efficient. The use of a surrogate, however, has statistical consequences that must be balanced against the computational virtues of convexity. To study these issues, we provide a general quantitative relationship between the risk as assessed using the 0–1 loss and the risk as assessed using any nonnegative surrogate loss function. We show that this relationship gives nontrivial upper bounds on excess risk under the weakest possible condition on the loss function—that it satisfies a pointwise form of Fisher consistency for classification. The relationship is based on a simple variational transformation of the loss function that is easy to compute in many applications. We also present a refined version of this result in the case of low noise, and show that in this case, strictly convex loss functions lead to faster rates of convergence of the risk than would be implied by standard uniform convergence arguments. Finally, we present applications of our results to the estimation of convergence rates in function classes that are scaled convex hulls of a finite-dimensional base class, with a variety of commonly used loss functions.
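To make the idea of a convex surrogate concrete, the sketch below writes the 0–1 loss and three familiar surrogates as functions of the margin m = y f(x). The specific losses (hinge for the support vector machine, exponential for boosting, base-2 logistic) are standard examples chosen here for illustration; the paper's results cover a more general class.

```python
import math

def zero_one(margin):
    """0-1 loss: 1 when the prediction is wrong (margin <= 0), else 0."""
    return 1.0 if margin <= 0 else 0.0

def hinge(margin):
    """Hinge loss, the surrogate minimised by the support vector machine."""
    return max(0.0, 1.0 - margin)

def exponential(margin):
    """Exponential loss, the surrogate associated with boosting."""
    return math.exp(-margin)

def logistic(margin):
    """Logistic loss in base 2, scaled so that it equals 1 at margin 0."""
    return math.log2(1.0 + math.exp(-margin))

SURROGATES = {"hinge": hinge, "exponential": exponential, "logistic": logistic}

# Evaluate every loss on a grid of margins from -3 to 3.
margins = [i / 10.0 for i in range(-30, 31)]
table = {
    name: [loss(m) for m in margins] for name, loss in SURROGATES.items()
}
zero_one_values = [zero_one(m) for m in margins]
```

Each of these surrogates lies on or above the 0–1 loss at every margin, which is what lets a bound on surrogate risk say something about classification risk.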
Abstract:
We propose new bounds on the error of learning algorithms in terms of a data-dependent notion of complexity. The estimates we establish give optimal rates and are based on a local and empirical version of Rademacher averages, in the sense that the Rademacher averages are computed from the data, on a subset of functions with small empirical error. We present some applications to classification and prediction with convex function classes, and with kernel classes in particular.
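A rough illustration of a local, empirical Rademacher average, assuming a small finite class of ±1-valued predictors described only by their outputs on the sample. The class, the labels, and the radius are invented, and the Monte Carlo estimate stands in for the exact expectation; the paper's definitions and bounds are more general.

```python
import random

def empirical_rademacher(outputs, n_draws=2000, seed=0):
    """Monte Carlo estimate of E_sigma[ max_f (1/n) sum_i sigma_i f(x_i) ]."""
    rng = random.Random(seed)
    n = len(outputs[0])
    total = 0.0
    for _ in range(n_draws):
        sigma = [rng.choice((-1, 1)) for _ in range(n)]
        total += max(sum(s * o for s, o in zip(sigma, row)) / n for row in outputs)
    return total / n_draws

def empirical_error(row, labels):
    """Fraction of sample points on which a predictor disagrees with the label."""
    return sum(o != y for o, y in zip(row, labels)) / len(labels)

def local_class(outputs, labels, radius):
    """Keep only the predictors whose empirical error is at most `radius`."""
    return [row for row in outputs if empirical_error(row, labels) <= radius]

# Invented sample labels and predictor outputs (one row per predictor).
labels = [1, 1, -1, -1, 1, -1]
outputs = [
    [1, 1, -1, -1, 1, -1],    # agrees with every label
    [1, 1, -1, -1, 1, 1],     # one mistake
    [-1, -1, 1, 1, -1, 1],    # always wrong
    [1, -1, 1, -1, 1, -1],    # two mistakes
    [1, 1, 1, 1, 1, 1],       # constant predictor
]

full = empirical_rademacher(outputs)
local = empirical_rademacher(local_class(outputs, labels, radius=0.2))
print(full, local)
```

Because both estimates use the same seed, they see the same sign vectors, so the maximum over the smaller local class can never exceed the maximum over the full class.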
Abstract:
One of the nice properties of kernel classifiers such as SVMs is that they often produce sparse solutions. However, the decision functions of these classifiers cannot always be used to estimate the conditional probability of the class label. We investigate the relationship between these two properties and show that these are intimately related: sparseness does not occur when the conditional probabilities can be unambiguously estimated. We consider a family of convex loss functions and derive sharp asymptotic results for the fraction of data that becomes support vectors. This enables us to characterize the exact trade-off between sparseness and the ability to estimate conditional probabilities for these loss functions.
Abstract:
A number of learning problems can be cast as an Online Convex Game: on each round, a learner makes a prediction x from a convex set, the environment plays a loss function f, and the learner’s long-term goal is to minimize regret. Algorithms have been proposed by Zinkevich, when f is assumed to be convex, and Hazan et al., when f is assumed to be strongly convex, that have provably low regret. We consider these two settings and analyze such games from a minimax perspective, proving minimax strategies and lower bounds in each case. These results prove that the existing algorithms are essentially optimal.
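The convex setting can be sketched with projected online gradient descent, the type of algorithm attributed to Zinkevich above. Everything concrete here is an assumption made for illustration: the convex set is the interval [-1, 1], the losses are f_t(x) = (x - z_t)^2 with randomly drawn targets z_t, and the step size D / (G sqrt(t)) uses the diameter D = 2 and gradient bound G = 4 of this toy problem.

```python
import math
import random

def project(x, lo=-1.0, hi=1.0):
    """Euclidean projection onto the interval [lo, hi]."""
    return max(lo, min(hi, x))

def online_gradient_descent(targets, diameter=2.0, grad_bound=4.0):
    """Play x_t, observe f_t(x) = (x - z_t)^2, then take a projected step."""
    x, plays = 0.0, []
    for t, z in enumerate(targets, start=1):
        plays.append(x)                      # commit to x_t before seeing z_t
        grad = 2.0 * (x - z)                 # gradient of f_t at x_t
        step = diameter / (grad_bound * math.sqrt(t))
        x = project(x - step * grad)
    return plays

def regret(plays, targets):
    """Learner's total loss minus that of the best fixed point in hindsight."""
    learner_loss = sum((x - z) ** 2 for x, z in zip(plays, targets))
    best_fixed = project(sum(targets) / len(targets))
    best_loss = sum((best_fixed - z) ** 2 for z in targets)
    return learner_loss - best_loss

rng = random.Random(0)
targets = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
plays = online_gradient_descent(targets)
total_regret = regret(plays, targets)
print(total_regret)
```

For squared losses the best fixed point in hindsight is the mean of the targets, which is why `regret` can compute the comparator in closed form.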
Abstract:
Multivariate volatility forecasts are an important input in many financial applications, in particular portfolio optimisation problems. Given the number of models available and the range of loss functions to discriminate between them, it is obvious that selecting the optimal forecasting model is challenging. The aim of this thesis is to thoroughly investigate how effective many commonly used statistical (MSE and QLIKE) and economic (portfolio variance and portfolio utility) loss functions are at discriminating between competing multivariate volatility forecasts. An analytical investigation of the loss functions is performed to determine whether they identify the correct forecast as the best forecast. This is followed by an extensive simulation study that examines the ability of the loss functions to consistently rank forecasts, and their statistical power within tests of predictive ability. For the tests of predictive ability, the model confidence set (MCS) approach of Hansen, Lunde and Nason (2003, 2011) is employed. As well, an empirical study investigates whether simulation findings hold in a realistic setting. In light of these earlier studies, a major empirical study seeks to identify the set of superior multivariate volatility forecasting models from 43 models that use either daily squared returns or realised volatility to generate forecasts. This study also assesses how the choice of volatility proxy affects the ability of the statistical loss functions to discriminate between forecasts. Analysis of the loss functions shows that QLIKE, MSE and portfolio variance can discriminate between multivariate volatility forecasts, while portfolio utility cannot. An examination of the effective loss functions shows that they all can identify the correct forecast at a point in time; however, their ability to discriminate between competing forecasts does vary. 
That is, QLIKE is identified as the most effective loss function, followed by portfolio variance which is then followed by MSE. The major empirical analysis reports that the optimal set of multivariate volatility forecasting models includes forecasts generated from daily squared returns and realised volatility. Furthermore, it finds that the volatility proxy affects the statistical loss functions’ ability to discriminate between forecasts in tests of predictive ability. These findings deepen our understanding of how to choose between competing multivariate volatility forecasts.
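The two statistical loss functions can be sketched for 2 x 2 covariance matrices. The forms assumed here are common in the literature but are not taken from the thesis: MSE as the squared Frobenius distance between the forecast H and the proxy S, and QLIKE as log det(H) + trace(H^-1 S). The matrices are invented.

```python
import math

def det2(a):
    """Determinant of a 2 x 2 matrix given as nested lists."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def inv2(a):
    """Inverse of a 2 x 2 matrix (assumes a non-zero determinant)."""
    d = det2(a)
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]

def mse_loss(forecast, proxy):
    """Squared Frobenius distance between forecast and proxy."""
    return sum(
        (forecast[i][j] - proxy[i][j]) ** 2 for i in range(2) for j in range(2)
    )

def qlike_loss(forecast, proxy):
    """log det(H) + trace(H^-1 S) for forecast H and proxy S."""
    h_inv = inv2(forecast)
    trace = sum(h_inv[i][k] * proxy[k][i] for i in range(2) for k in range(2))
    return math.log(det2(forecast)) + trace

# Invented covariance proxy and two competing forecasts.
proxy = [[1.0, 0.3], [0.3, 2.0]]
exact_forecast = [[1.0, 0.3], [0.3, 2.0]]
rough_forecast = [[1.5, 0.0], [0.0, 1.5]]

for name, h in (("exact", exact_forecast), ("rough", rough_forecast)):
    print(name, mse_loss(h, proxy), qlike_loss(h, proxy))
```

Both losses rank the forecast that equals the proxy first in this toy case; how sharply each one separates competing forecasts is the question the thesis studies.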