15 results for Zero-inflated models, Statistical models, Poisson, Negative binomial, Statistical methods
in Aston University Research Archive
Abstract:
Emrouznejad et al. (2010) proposed a Semi-Oriented Radial Measure (SORM) model for assessing the efficiency of Decision Making Units (DMUs) by Data Envelopment Analysis (DEA) with negative data. This paper provides a necessary and sufficient condition for the boundedness of the input- and output-oriented SORM models.
Abstract:
An organism living in water, and present at low density, may be distributed at random and, therefore, samples taken from the water are likely to be distributed according to the Poisson distribution. The distribution of many organisms, however, is not random, individuals being either aggregated into clusters or more uniformly distributed. By fitting a Poisson distribution to data, it is only possible to test the hypothesis that an observed set of frequencies does not deviate significantly from an expected random pattern. Significant deviations from random, either as a result of increasing uniformity or aggregation, may be recognized either by rejection of the random hypothesis or by examining the variance/mean (V/M) ratio of the data. Hence, a V/M ratio not significantly different from unity indicates a random distribution, greater than unity a clustered distribution, and less than unity a regular or uniform distribution. If individual cells are clustered, however, the negative binomial distribution should provide a better description of the data. In addition, a parameter of this distribution, viz., the binomial exponent (k), may be used as a measure of the ‘intensity’ of aggregation present. Hence, this Statnote describes how to fit the negative binomial distribution to counts of a microorganism in samples taken from a freshwater environment.
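As a minimal sketch of the procedure this Statnote describes, the snippet below computes the V/M ratio and a moment estimate of the negative binomial exponent k for a set of counts. The counts are invented for illustration, and the mapping to scipy's (n, p) parameterisation is noted in the comments.

```python
import numpy as np
from scipy import stats

# Hypothetical counts of a micro-organism in 30 water samples (illustrative data only).
counts = np.array([0, 2, 5, 1, 0, 3, 7, 2, 0, 1, 4, 6, 0, 2, 3,
                   1, 0, 8, 2, 1, 0, 5, 3, 2, 1, 0, 4, 2, 6, 1])

mean = counts.mean()
var = counts.var(ddof=1)

# V/M ratio: ~1 suggests a random (Poisson) pattern, >1 aggregation, <1 a regular pattern.
vm_ratio = var / mean
print(f"V/M ratio: {vm_ratio:.2f}")

# Moment estimate of the negative binomial exponent k (the 'intensity' of aggregation).
k = mean ** 2 / (var - mean) if var > mean else float("inf")
print(f"k estimate: {k:.2f}")

# Expected negative binomial frequencies for comparison with the observed counts.
# scipy parameterises nbinom by (n, p) with n = k and p = k / (k + mean).
if np.isfinite(k):
    p = k / (k + mean)
    observed = np.bincount(counts)
    expected = stats.nbinom.pmf(np.arange(len(observed)), k, p) * len(counts)
    print(np.column_stack([observed, expected.round(1)]))
```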
Abstract:
In this thesis, we explore the relationship between absorptive capacity and alliances, and their influence on firms’ competitive advantage in the US and European biopharmaceutical sectors. The study undertaken in this thesis is based on data from a large-scale international survey of over 2,500 biopharmaceutical firms in the US, the UK, Germany, France and Ireland. The thesis advances a conceptual framework, which integrates the multiple dimensions of absorptive capacity, exploration-exploitation alliances, and competitive advantage into a biopharmaceutical firm’s new product development process. The proposed framework is then tested in the empirical analysis, using truncated models to estimate firms’ sales growth, zero-inflated negative binomial models to capture the number of alliances in which firms engage, and ordinal probit models to analyse aspects of realised absorptive capacity. The empirical results suggest that both skill-based and exploitation-based absorptive capacity play crucial roles in shaping firms’ competitive advantage, while neither exploratory nor exploitation alliances contribute to the improvement in firms’ competitive position. In terms of the interaction between firms’ absorptive capacity and alliance behaviour, the results suggest that engagement with exploratory alliances depends more strongly on firms’ assimilation capability (skill levels and continuity of R&D activities), while participation in exploitation alliances is more conditional on firms’ relevant knowledge monitoring capability. The results highlight the major differences between the determinants of firms’ alliance behaviour and competitive advantage in the US and Europe: in the US, firms’ skill levels prove more significant in determining engagement with exploratory alliances, whereas in Europe continuity of R&D proves more important. Correspondingly, while US firms’ engagement with exploitation alliances depends on market monitoring capability, that in Europe is more strongly linked to exploitation-based absorptive capacity. In respect of the determinants of firms’ competitive advantage, in Europe market monitoring capability, engagement with exploitation alliances, and continuous R&D activities prove more important, while in the US it is firms’ market characteristics that matter most.
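As a hedged illustration of the count model mentioned above, the sketch below fits a zero-inflated negative binomial model to simulated alliance counts using statsmodels; the predictor names and coefficients are invented for the example and are not the survey variables analysed in the thesis.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedNegativeBinomialP

rng = np.random.default_rng(0)
n = 500

# Hypothetical firm-level predictors (illustrative stand-ins, not the thesis variables).
rd_continuity = rng.binomial(1, 0.6, n)   # continuous R&D activity (0/1)
skill_level = rng.normal(0, 1, n)         # skill-based absorptive capacity proxy
X = sm.add_constant(np.column_stack([rd_continuity, skill_level]))

# Simulate a zero-inflated negative binomial outcome: the number of alliances.
structural_zero = rng.binomial(1, 0.3, n)              # firms that never form alliances
mu = np.exp(0.2 + 0.5 * rd_continuity + 0.3 * skill_level)
counts = np.where(structural_zero == 1, 0,
                  rng.negative_binomial(2, 2 / (2 + mu)))

# Fit the model: the same covariates enter both the count and the inflation equations here.
model = ZeroInflatedNegativeBinomialP(counts, X, exog_infl=X, inflation='logit')
result = model.fit(maxiter=500, disp=0)
print(result.summary())
```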
Abstract:
We study the performance of Low Density Parity Check (LDPC) error-correcting codes using the methods of statistical physics. LDPC codes are based on the generation of codewords using Boolean sums of the original message bits by employing two randomly constructed sparse matrices. These codes can be mapped onto Ising spin models and studied using common methods of statistical physics. We examine various regular constructions and obtain insight into their theoretical and practical limitations. We also briefly report on results obtained for irregular code constructions, for codes with a non-binary alphabet, and on how a finite system size affects the error probability.
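To make the encoding principle concrete, the toy sketch below builds a small systematic binary linear code and generates a codeword as Boolean (mod-2) sums of the message bits. This is not an actual LDPC construction (real LDPC matrices are large and very sparse); it only illustrates the parity-check/generator relationship under standard notation H = [A | I], G = [I | Aᵀ].

```python
import numpy as np

rng = np.random.default_rng(4)
k, n = 4, 8                      # message length and codeword length (toy sizes)

# A random binary matrix A defines the parity checks; in systematic form
# H = [A | I] and G = [I | A^T], so codewords are mod-2 sums of message bits.
A = rng.integers(0, 2, size=(n - k, k))
H = np.hstack([A, np.eye(n - k, dtype=int)])
G = np.hstack([np.eye(k, dtype=int), A.T])

message = rng.integers(0, 2, size=k)
codeword = message @ G % 2
print("codeword:", codeword)
print("syndrome:", H @ codeword % 2)   # all zeros for a valid codeword
```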
Abstract:
The PC12 and SH-SY5Y cell models have been proposed as potentially realistic models with which to investigate neuronal cell toxicity. The effects of oxidative stress (OS) caused by both H2O2 and Aβ on both cell models were assessed by several methods. Cell toxicity was quantified by measuring cell viability using the 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide (MTT) viability assay, an indicator of the integrity of the electron transfer chain (ETC), and cell morphology by fluorescence and video microscopy, both of which showed OS to cause decreased viability and changes in morphology. Levels of intracellular peroxide production, and changes in glutathione and carbonyl levels, were also assessed; OS was found to increase intracellular peroxide production, glutathione and carbonyl levels. Differentiated SH-SY5Y cells were also employed and observed to exhibit the greatest sensitivity to toxicity. The neurotrophic factor nerve growth factor (NGF) was shown to protect against OS. Cells pre-treated with NGF showed higher viability after OS, generally less apoptotic morphology, fewer apoptotic nucleoids, generally lower levels of intracellular peroxides, and changes in gene expression. The neurotrophic factor brain-derived neurotrophic factor (BDNF) and ascorbic acid (AA) were also investigated. BDNF showed no specific neuroprotection; however, the preliminary data do warrant further investigation. AA showed a 'Janus face', exhibiting either antioxidant action and neuroprotection or pro-oxidant action, depending on the situation. The results showed that the toxic effects of compounds such as Aβ and H2O2 are cell-type dependent, and that OS alters glutathione metabolism in neuronal cells. Following toxic insult, glutathione levels are depleted to low levels. It is suggested here that this lowering triggers an adaptive response causing alterations in glutathione metabolism, as assessed by evaluation of glutathione mRNA biosynthetic enzyme expression and the subsequent increase in glutathione peroxidase (GPX) levels.
Abstract:
This paper describes how modern machine learning techniques can be used in conjunction with statistical methods to forecast short-term movements in exchange rates, producing models suitable for use in trading. It compares the results achieved by two different techniques and shows how they can be used in a complementary fashion. The paper draws on experience of both inter- and intra-day forecasting from earlier studies conducted by Logica, and on the Chemical Bank Quantitative Research and Trading (QRT) group's experience in developing trading models.
Abstract:
The last decade has seen a considerable increase in the application of quantitative methods in the study of histological sections of brain tissue and especially in the study of neurodegenerative disease. These disorders are characterised by the deposition and aggregation of abnormal or misfolded proteins in the form of extracellular protein deposits such as senile plaques (SP) and intracellular inclusions such as neurofibrillary tangles (NFT). Quantification of brain lesions and studying the relationships between lesions and normal anatomical features of the brain, including neurons, glial cells, and blood vessels, has become an important method of elucidating disease pathogenesis. This review describes methods for quantifying the abundance of a histological feature, such as density, frequency, and 'load', and the sampling methods by which quantitative measures can be obtained, including plot/quadrat sampling, transect sampling, and the point-quarter method. In addition, methods for determining the spatial pattern of a histological feature, i.e., whether the feature is distributed at random, regularly, or aggregated into clusters, are described. These methods include the use of the Poisson and binomial distributions, pattern analysis by regression, Fourier analysis, and methods based on mapped point patterns. Finally, the statistical methods available for studying the degree of spatial correlation between pathological lesions and neurons, glial cells, and blood vessels are described.
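One standard statistic for mapped point patterns of the kind mentioned above is the Clark-Evans nearest-neighbour index (chosen here as an example; the review covers several such methods). The sketch below computes it for synthetic lesion coordinates: R near 1 indicates a random pattern, R below 1 clustering, and R above 1 regularity.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical mapped coordinates of lesions (e.g. SP or NFT) in a tissue section,
# in arbitrary units; illustrative data only.
rng = np.random.default_rng(1)
points = rng.uniform(0, 1000, size=(200, 2))
area = 1000 * 1000

# Clark-Evans index: observed mean nearest-neighbour distance divided by the value
# expected under complete spatial randomness, 1 / (2 * sqrt(density)).
tree = cKDTree(points)
distances, _ = tree.query(points, k=2)   # k=2: nearest neighbour other than the point itself
mean_nn = distances[:, 1].mean()
density = len(points) / area
expected_nn = 1 / (2 * np.sqrt(density))
R = mean_nn / expected_nn
print(f"Clark-Evans R = {R:.2f}")
```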
Abstract:
This thesis presents a thorough and principled investigation into the application of artificial neural networks to the biological monitoring of freshwater. It contains original ideas on the classification and interpretation of benthic macroinvertebrates, and aims to demonstrate their superiority over the biotic systems currently used in the UK to report river water quality. The conceptual basis of a new biological classification system is described, and a full review and analysis of a number of river data sets is presented. The biological classification is compared to the common biotic systems using data from the Upper Trent catchment. These data contained 292 expertly classified invertebrate samples identified to mixed taxonomic levels. The neural network experimental work concentrates on the classification of the invertebrate samples into biological class, where only a subset of the sample is used to form the classification. Other experimentation is conducted into the identification of novel input samples, the classification of samples from different biotopes, and the use of prior information in the neural network models. The biological classification is shown to provide an intuitive interpretation of a graphical representation, generated without reference to the class labels, of the Upper Trent data. The selection of key indicator taxa is considered using three different approaches: one novel, one from information theory, and one from classical statistics. Good indicators of quality class based on these analyses are found to be in good agreement with those chosen by a domain expert. The change in information associated with different levels of identification and enumeration of taxa is quantified. The feasibility of using neural network classifiers and predictors to develop numeric criteria for the biological assessment of sediment contamination in the Great Lakes is also investigated.
Abstract:
Objective - This study investigated and compared the prevalence of microalbuminuria and overt proteinuria, and their determinants, in a cohort of UK-resident patients of white European or south Asian ethnicity with type 2 diabetes mellitus. Research design and methods - A total of 1978 patients, comprising 1486 of south Asian and 492 of white European ethnicity, in 25 general practices in Coventry and Birmingham inner-city areas in England were studied in a cross-sectional study. Demographic and risk factor data were collected and the presence of microalbuminuria and overt proteinuria assessed. Main outcome measures - Prevalences of microalbuminuria and overt proteinuria. Results - Urinary albumin:creatinine measurements were available for 1852 (94%) patients. The south Asian group had a lower prevalence of microalbuminuria (19% vs. 23%) and a higher prevalence of overt proteinuria (8% vs. 3%); χ2 = 15.85, 2 df, P = 0.0004. In multiple logistic regression models adjusted for confounding factors, a significantly increased risk of overt proteinuria was shown for the south Asian vs. white European patients; OR (95% CI) 2.17 (1.05, 4.49), P = 0.0365. For microalbuminuria, an interaction effect between ethnicity and duration of diabetes suggested that the risk for south Asian patients was lower in the early years following diagnosis; ORs for SA vs. WH at durations of 0 and 1 year were 0.56 (0.37, 0.86) and 0.59 (0.39, 0.89), respectively. After 20 years’ duration, OR = 1.40 (0.63, 3.08). Limitations - Comparability of ethnicity-defined groups; statistical methods controlled for differences between groups, but residual confounding may remain. Analyses are based on a single measure of the albumin:creatinine ratio. Conclusions - There were significant differences between ethnicity groups in risk factor profiles and in microalbuminuria and overt proteinuria outcomes. Whilst south Asian patients had no excess risk of microalbuminuria, the risk of overt proteinuria was elevated significantly, which might be explained by faster progression of renal dysfunction in patients of south Asian ethnicity.
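To make the reported interaction concrete, the short sketch below shows how duration-specific odds ratios follow from a logistic model with an ethnicity × duration interaction. The coefficients are back-derived from the odds ratios quoted above purely for illustration; they are not the study's estimates.

```python
import numpy as np

# Illustrative coefficients: main effect chosen to give OR = 0.56 at duration 0,
# interaction chosen so that the OR reaches 1.40 at 20 years.
beta_sa = np.log(0.56)                        # south Asian main effect (log odds ratio)
beta_inter = (np.log(1.40) - beta_sa) / 20    # interaction per year of diabetes duration

# OR(SA vs WH | duration t) = exp(beta_sa + beta_inter * t)
for years in (0, 1, 20):
    odds_ratio = np.exp(beta_sa + beta_inter * years)
    print(f"OR (south Asian vs white European) at {years} y duration: {odds_ratio:.2f}")
```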
Abstract:
Solving many scientific problems requires effective regression and/or classification models for large high-dimensional datasets. Experts from these problem domains (e.g. biologists, chemists, financial analysts) have insights into the domain which can be helpful in developing powerful models, but they need a modelling framework that helps them to use these insights. Data visualisation is an effective technique for presenting data and eliciting feedback from the experts. A single global regression model can rarely capture the full behavioural variability of a huge multi-dimensional dataset. Instead, local regression models, each focused on a separate area of the input space, often work better, since the behaviour of different areas may vary. Classical local models such as Mixture of Experts segment the input space automatically, which is not always effective, and they lack the involvement of domain experts needed to guide a meaningful segmentation of the input space. In this paper we address this issue by allowing domain experts to interactively segment the input space using data visualisation. The segmentation obtained is then used to develop effective local regression models.
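The sketch below illustrates the local-modelling idea in its simplest form: separate linear regressions are fitted to two segments of the input space. In the paper the segmentation comes from interactive data visualisation by domain experts; here a simple threshold stands in for that step, and all data are synthetic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic 1-D example in which behaviour differs across two regions of input space.
rng = np.random.default_rng(5)
x = rng.uniform(0, 10, 400).reshape(-1, 1)
y = np.where(x[:, 0] < 5, 2 * x[:, 0], 30 - 3 * x[:, 0]) + rng.normal(0, 1, 400)

# Stand-in for expert-guided segmentation: a threshold on the input variable.
segments = {"low": x[:, 0] < 5, "high": x[:, 0] >= 5}

# Fit one local regression model per segment.
for name, mask in segments.items():
    model = LinearRegression().fit(x[mask], y[mask])
    print(name, "slope:", round(float(model.coef_[0]), 2),
          "intercept:", round(float(model.intercept_), 2))
```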
Abstract:
This paper provides the most comprehensive evidence to date on whether or not monetary aggregates are valuable for forecasting US inflation in the early to mid-2000s. We explore a wide range of different definitions of money, including different methods of aggregation and different collections of included monetary assets. In our forecasting experiment we use two nonlinear techniques, namely recurrent neural networks and kernel recursive least squares regression, techniques that are new to macroeconomics. Recurrent neural networks operate with potentially unbounded input memory, while the kernel regression technique is a finite-memory predictor. The two methodologies compete to find the best-fitting US inflation forecasting models and are then compared to forecasts from a naïve random walk model. The best models were nonlinear autoregressive models based on kernel methods. Our findings do not provide much support for the usefulness of monetary aggregates in forecasting inflation. Beyond its economic findings, our study is in the tradition of physicists' long-standing interest in the interconnections among statistical mechanics, neural networks, and related nonparametric statistical methods, and suggests potential avenues of extension for such studies. © 2010 Elsevier B.V. All rights reserved.
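The sketch below illustrates the style of comparison described above: a nonlinear autoregressive forecast based on a kernel method is set against a naïve random-walk benchmark. Kernel ridge regression is used as a simple stand-in for the kernel recursive least squares technique of the paper, and the series is simulated rather than US inflation data.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

# Hypothetical monthly series (simulated AR(1) noise; illustrative only).
rng = np.random.default_rng(2)
y = np.zeros(240)
for t in range(1, 240):
    y[t] = 0.7 * y[t - 1] + rng.normal(scale=0.3)

# Build a lagged design matrix for a nonlinear autoregressive model.
lags = 3
X = np.column_stack([y[i:len(y) - lags + i] for i in range(lags)])
target = y[lags:]
split = 180                                   # train/test split point

model = KernelRidge(kernel="rbf", alpha=1.0, gamma=0.5)
model.fit(X[:split], target[:split])
pred_kernel = model.predict(X[split:])
pred_rw = X[split:, -1]                       # random walk: forecast = last observed value

def rmse(pred):
    return np.sqrt(np.mean((target[split:] - pred) ** 2))

print(f"kernel AR RMSE: {rmse(pred_kernel):.3f}  random walk RMSE: {rmse(pred_rw):.3f}")
```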
Abstract:
Two new methodologies are introduced to improve inference in the evaluation of mutual fund performance against benchmarks. First, the benchmark models are estimated using panel methods with both fund and time effects. Second, the non-normality of individual mutual fund returns is accounted for by using panel bootstrap methods. We also augment the standard benchmark factors with fund-specific characteristics, such as fund size. Using a dataset of UK equity mutual fund returns, we find that fund size has a negative effect on the average fund manager’s benchmark-adjusted performance. Further, when we allow for time effects and the non-normality of fund returns, we find no evidence that even the best-performing fund managers can significantly outperform the augmented benchmarks after fund management charges are taken into account.
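The bootstrap idea can be sketched in a much-simplified, single-fund form: a residual bootstrap of the estimated alpha avoids assuming normally distributed returns. This is only an illustration of the principle, not the paper's panel bootstrap with fund and time effects, and the data are simulated.

```python
import numpy as np

# Hypothetical monthly excess returns for one fund and a single benchmark factor,
# with heavy-tailed (non-normal) errors; illustrative data only.
rng = np.random.default_rng(3)
T = 120
factor = rng.normal(0.005, 0.04, T)
returns = 0.0 + 1.0 * factor + rng.standard_t(4, T) * 0.02

X = np.column_stack([np.ones(T), factor])
beta_hat = np.linalg.lstsq(X, returns, rcond=None)[0]   # [alpha, factor loading]
resid = returns - X @ beta_hat

# Residual bootstrap of the alpha estimate.
boot_alphas = []
for _ in range(2000):
    y_star = X @ beta_hat + rng.choice(resid, size=T, replace=True)
    boot_alphas.append(np.linalg.lstsq(X, y_star, rcond=None)[0][0])

lo, hi = np.percentile(boot_alphas, [2.5, 97.5])
print(f"alpha = {beta_hat[0]:.4f}, bootstrap 95% CI: [{lo:.4f}, {hi:.4f}]")
```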
Abstract:
The topic of this thesis is the development of knowledge-based statistical software. The shortcomings of conventional statistical packages are discussed to illustrate the need to develop software which is able to exhibit a greater degree of statistical expertise, thereby reducing the misuse of statistical methods by those not well versed in the art of statistical analysis. Some of the issues involved in the development of knowledge-based software are presented, and a review is given of some of the systems that have been developed so far. The majority of these have moved away from conventional architectures by adopting what can be termed an expert systems approach. The thesis then proposes an approach based upon the concept of semantic modelling. By representing some of the semantic meaning of data, it is conceived that a system could examine a request to apply a statistical technique and check whether the use of the chosen technique was semantically sound, i.e. whether the results obtained would be meaningful. Current systems, in contrast, can only perform what can be considered syntactic checks. The prototype system that has been implemented to explore the feasibility of such an approach is presented; the system has been designed as an enhanced variant of a conventional statistical package. This involved developing a semantic data model to represent some of the statistically relevant knowledge about data, and identifying sets of requirements that should be met for the application of the statistical techniques to be valid. The areas of statistics covered in the prototype are measures of association and tests of location.
Abstract:
Background: Allergy is a form of hypersensitivity to normally innocuous substances, such as dust, pollen, foods or drugs. Allergens are small antigens that commonly provoke an IgE antibody response. There are two types of bioinformatics-based allergen prediction. The first approach follows FAO/WHO Codex Alimentarius guidelines and searches for sequence similarity. The second approach is based on identifying conserved allergenicity-related linear motifs. Both approaches assume that allergenicity is a linearly coded property. In the present study, we applied ACC pre-processing to sets of known allergens, developing alignment-independent models for allergen recognition based on the main chemical properties of amino acid sequences. Results: A set of 684 food, 1,156 inhalant and 555 toxin allergens was collected from several databases. A set of non-allergens from the same species was selected to mirror the allergen set. The amino acids in the protein sequences were described by three z-descriptors (z1, z2 and z3) and converted by auto- and cross-covariance (ACC) transformation into uniform vectors. Each protein was presented as a vector of 45 variables. Five machine learning methods for classification were applied in the study to derive models for allergen prediction. The methods were: discriminant analysis by partial least squares (DA-PLS), logistic regression (LR), decision tree (DT), naïve Bayes (NB) and k nearest neighbours (kNN). The best performing model was derived by kNN at k = 3. It was optimized, cross-validated and implemented in a server named AllerTOP, freely accessible at http://www.pharmfac.net/allertop. AllerTOP also predicts the most probable route of exposure. In comparison with other servers for allergen prediction, AllerTOP outperforms them with 94% sensitivity. Conclusions: AllerTOP is the first alignment-free server for in silico prediction of allergens based on the main physicochemical properties of proteins. Significantly, as well as allergenicity, AllerTOP is able to predict the route of allergen exposure: food, inhalant or toxin. © 2013 Dimitrov et al.; licensee BioMed Central Ltd.
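The sketch below illustrates the pipeline described in the abstract: sequences are encoded with three z-descriptors, converted by auto- and cross-covariance (ACC) transformation into uniform 45-dimensional vectors (3 × 3 descriptor pairs × 5 lags), and classified with kNN at k = 3. The z-scale values cover only a subset of residues and should be treated as illustrative, and the toy sequences and labels are invented; this is not the AllerTOP training set or code.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Approximate (z1, z2, z3) values for a subset of amino acids; illustrative only.
Z = {
    "A": (0.07, -1.73, 0.09), "C": (0.71, -0.97, 4.13), "D": (3.64, 1.13, 2.36),
    "E": (3.08, 0.39, -0.07), "G": (2.23, -5.36, 0.30), "K": (2.84, 1.41, -3.14),
    "L": (-4.19, -1.03, -0.98), "S": (1.96, -1.63, 0.57), "V": (-2.69, -2.53, -1.29),
}

def acc_transform(seq, max_lag=5):
    """Auto- and cross-covariance of the three z-descriptors: 3 * 3 * max_lag = 45 values."""
    z = np.array([Z[a] for a in seq])   # shape (sequence length, 3)
    z = z - z.mean(axis=0)              # centre each descriptor
    n = len(seq)
    acc = []
    for lag in range(1, max_lag + 1):
        for j in range(3):
            for k in range(3):
                acc.append(np.sum(z[:n - lag, j] * z[lag:, k]) / (n - lag))
    return np.array(acc)                # uniform 45-dimensional vector

# Hypothetical toy sequences and labels (1 = allergen, 0 = non-allergen).
seqs = ["ACDEGKLSV", "GKLSVACDE", "LLSVACKGE", "DDEEGGKKA", "VVLLSSAAC", "KGECDASLV"]
labels = [1, 1, 1, 0, 0, 0]
X = np.array([acc_transform(s) for s in seqs])

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, labels)
print(knn.predict([acc_transform("ACDGKLSVE")]))
```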