30 results for Genetics Statistical methods
Abstract:
The INTAMAP FP6 project has developed an interoperable framework for real-time automatic mapping of critical environmental variables by extending spatial statistical methods and employing open, web-based, data exchange protocols and visualisation tools. This paper will give an overview of the underlying problem, of the project, and discuss which problems it has solved and which open problems seem to be most relevant to deal with next. The interpolation problem that INTAMAP solves is the generic problem of spatial interpolation of environmental variables without user interaction, based on measurements of e.g. PM10, rainfall or gamma dose rate, at arbitrary locations or over a regular grid covering the area of interest. It deals with problems of varying spatial resolution of measurements, the interpolation of averages over larger areas, and with providing information on the interpolation error to the end-user. In addition, monitoring network optimisation is addressed in a non-automatic context.
Abstract:
A history of government drug regulation and the relationship between the pharmaceutical companies in the U.K. and the licensing authority is outlined. Phases of regulatory stringency are identified with the formation of the Committees on Safety of Drugs and Medicines viewed as watersheds. A study of the impact of government regulation on industrial R&D activities focuses on the effects on the rate and direction of new product innovation. A literature review examines the decline in new chemical entity innovation. Regulations are cited as a major but not singular cause of the decline. Previous research attempting to determine the causes of such a decline on an empirical basis is given and the methodological problems associated with such research are identified. The U.K. owned sector of the British pharmaceutical industry is selected for a study employing a bottom-up approach allowing disaggregation of data. A historical background to the industry is provided, with each company analysed on a case-study basis. Variations between companies regarding the policies adopted for R&D are emphasised. The process of drug innovation is described in order to determine possible indicators of the rate and direction of inventive and innovative activity. All possible indicators are considered and their suitability assessed. R&D expenditure data for the period 1960-1983 is subsequently presented as an input indicator. Intermediate output indicators are treated in a similar way and patent data are identified as a readily-available and useful source. The advantages and disadvantages of using such data are considered. Using interview material, patenting policies for most of the U.K. companies are described providing a background for a patent-based study. Sources of patent data are examined with an emphasis on computerised systems. A number of searches using a variety of sources are presented. Patent family size is examined as a possible indicator of an invention's relative importance.
The patenting activity of the companies over the period 1960-1983 is given and the variation between companies is noted. The relationship between patent data and other indicators used is analysed using statistical methods, resulting in an apparent lack of correlation. An alternative approach taking into account variations in company policy and phases in research activity indicates a stronger relationship between patenting activity, R&D expenditure and NCE output over the period. The relationship is not apparent at an aggregated company level. Some evidence is presented for a relationship between phases of regulatory stringency and inventive and innovative activity, but the importance of other factors is emphasised.
Abstract:
This thesis presents a thorough and principled investigation into the application of artificial neural networks to the biological monitoring of freshwater. It contains original ideas on the classification and interpretation of benthic macroinvertebrates, and aims to demonstrate their superiority over the biotic systems currently used in the UK to report river water quality. The conceptual basis of a new biological classification system is described, and a full review and analysis of a number of river data sets is presented. The biological classification is compared to the common biotic systems using data from the Upper Trent catchment. This data contained 292 expertly classified invertebrate samples identified to mixed taxonomic levels. The neural network experimental work concentrates on the classification of the invertebrate samples into biological class, where only a subset of the sample is used to form the classification. Other experimentation is conducted into the identification of novel input samples, the classification of samples from different biotopes and the use of prior information in the neural network models. The biological classification is shown to provide an intuitive interpretation of a graphical representation, generated without reference to the class labels, of the Upper Trent data. The selection of key indicator taxa is considered using three different approaches; one novel, one from information theory and one from classical statistical methods. Good indicators of quality class based on these analyses are found to be in good agreement with those chosen by a domain expert. The change in information associated with different levels of identification and enumeration of taxa is quantified. The feasibility of using neural network classifiers and predictors to develop numeric criteria for the biological assessment of sediment contamination in the Great Lakes is also investigated.
Abstract:
FL/EFL macro-reading (the effect of the broader context of reading) has been little explored in spite of its importance in FL/EFL reading programmes. This study was designed to build on previous work by explaining in more depth the influence of the socio-educational reading environment in an Arab university (Al-Fateh University in Tripoli, Libya), as reported by students, upon these students' reading ability in English and Arabic (particularly the former). Certain aspects of the lecturers' reading habits and attitudes and classroom operation were also investigated. Written cloze tests in English and Arabic and self-administered questionnaires were given to 125 preliminary-year undergraduates in three faculties of Al-Fateh University on the basis of their use of English as a medium of instruction (one representing the Arts stream and two representing the Science stream). Twenty-two lecturers were interviewed and observed by an inventory technique along with twenty other preliminary-year students. Factor analysis and standard multiple regression were among the statistical methods used to analyse the main data. The findings demonstrate a significant relationship between reading ability in English and the individual and environmental reading variables, as defined in the study. A combination of common and different series of such predictors was found to account for the variation (43% for the first-year English specialists; 48% for the combined Medicine student sample) in the English reading tests. Also found was a significant, though not very large, relationship between reading ability in Arabic and the reading environment.
Non-statistical but objective analyses, based on the present data, also revealed an overall association between English reading performance and an important number of reading environmental variables - where many `poor' users of the reading environment (particularly the academic one) obtained low scores in the English cloze tests. Accepting the limitations of a single study, it is nevertheless clear that the reading environment at the University is in need of improvement and that students' use of it also requires better guidance and training in how to use it effectively. Suggestions are made for appropriate educational changes.
Abstract:
The fluids used in hydraulic systems inevitably contain large numbers of small, solid particles, a phenomenon known as 'fluid contamination'. Particles enter a hydraulic system from the environment, and are generated within it by processes of wear. At the same time, particles are removed from the system fluid by sedimentation and in hydraulic filters. This thesis considers the problems caused by fluid contamination, as they affect a manufacturer of axial piston pumps. The specific project aim was to investigate methods of predicting or determining the effects of fluid contamination on this type of pump. The thesis starts with a theoretical analysis of the contaminated lubrication of a slipper-pad bearing. Statistical methods are used to develop a model of the blocking, by particles, of the control capillaries used in such bearings. The results obtained are compared to published, experimental data. Poor correlation between theory and practice suggests that more research is required in this area before such theoretical analysis can be used in industry. Accelerated wear tests have been developed in the U.S.A. in an attempt to predict pump life when operating on contaminated fluids. An analysis of such tests shows that reliability data can only be obtained from extensive test programmes. The value of contamination testing is suggested to be in determining failure modes, and in identifying those pump components which are susceptible to the effects of contamination. A suitable test is described, and the results of a series of tests on axial piston pumps are presented and discussed. The thesis concludes that pump reliability data can only be obtained from field experience. The level of confidence which can be placed in results from normal laboratory testing is shown to be too low for the data to be of real value. Recommendations are therefore given for the ways in which service data should be collected and analysed.
Abstract:
This work is concerned with the development of techniques for the evaluation of large-scale highway schemes with particular reference to the assessment of their costs and benefits in the context of the current transport planning (T.P.P.) process. It has been carried out in close cooperation with West Midlands County Council, although its application and results are applicable elsewhere. The background to highway evaluation and its development in recent years has been described and the emergence of a number of deficiencies in current planning practice noted. One deficiency in particular stood out, that stemming from inadequate methods of scheme generation, and the research has concentrated upon improving this stage of appraisal, to ensure that subsequent stages of design, assessment and implementation are based upon a consistent and responsive foundation. Deficiencies of scheme evaluation were found to stem from inadequate development of appraisal methodologies suffering from difficulties of valuation, measurement and aggregation of the disparate variables that characterise highway evaluation. A failure to respond to local policy priorities was also noted. A 'problem' rather than 'goals' based approach to scheme generation was taken, as it represented the current and foreseeable resource allocation context more realistically. Techniques with potential for highway problem based scheme generation, which would work within a series of practical and theoretical constraints, were reviewed, and multivariate analysis, classical factor analysis in particular, was selected because it offered considerable application to the difficulties of valuation, measurement and aggregation that existed. Computer programs were written to adapt classical factor analysis to the requirements of T.P.P. highway evaluation, using it to derive a limited number of factors which described the extensive quantity of highway problem data.
From this, a series of composite problem scores for 1979 were derived for a case study area of south Birmingham, based upon the factorial solutions, and used to assess highway sites in terms of local policy issues. The methodology was assessed in the light of its ability to describe highway problems in both aggregate and disaggregate terms, to guide scheme design, coordinate with current scheme evaluation methods, and in general to improve upon current appraisal. Analysis of the results was both in subjective, 'common-sense' terms and using statistical methods to assess the changes in problem definition, distribution and priorities that emerged. Overall, the technique was found to improve upon current scheme generation methods in all respects and in particular in overcoming the problems of valuation, measurement and aggregation without recourse to unsubstantiated and questionable assumptions. A number of deficiencies which remained have been outlined and a series of research priorities described which need to be reviewed in the light of current and future evaluation needs.
Abstract:
Exploratory analysis of petroleum geochemical data seeks to find common patterns to help distinguish between different source rocks, oils and gases, and to explain their source, maturity and any intra-reservoir alteration. However, at the outset, one is typically faced with (a) a large matrix of samples, each with a range of molecular and isotopic properties, (b) a spatially and temporally unrepresentative sampling pattern, (c) noisy data and (d) often, a large number of missing values. This inhibits analysis using conventional statistical methods. Typically, visualisation methods like principal components analysis are used, but these methods are not easily able to deal with missing data nor can they capture non-linear structure in the data. One approach to discovering complex, non-linear structure in the data is through the use of linked plots, or brushing, while ignoring the missing data. In this paper we introduce a complementary approach based on a non-linear probabilistic model. Generative topographic mapping enables the visualisation of the effects of very many variables on a single plot, while also dealing with missing data. We show how using generative topographic mapping also provides an optimal method with which to replace missing values in two geochemical datasets, particularly where a large proportion of the data is missing.
Abstract:
Objective - This study investigated and compared the prevalence of microalbuminuria and overt proteinuria and their determinants in a cohort of UK resident patients of white European or south Asian ethnicity with type 2 diabetes mellitus. Research design and methods - A total of 1978 patients, comprising 1486 of south Asian and 492 of white European ethnicity, in 25 general practices in Coventry and Birmingham inner city areas in England were studied in a cross-sectional study. Demographic and risk factor data were collected and presence of microalbuminuria and overt proteinuria assessed. Main outcome measures - Prevalences of microalbuminuria and overt proteinuria. Results - Urinary albumin:creatinine measurements were available for 1852 (94%) patients. The south Asian group had a lower prevalence of microalbuminuria, 19% vs. 23%, and a higher prevalence of overt proteinuria, 8% vs. 3%, χ² = 15.85, 2 df, P = 0.0004. In multiple logistic regression models, adjusted for confounding factors, significantly increased risk for the south Asian vs. white European patients for overt proteinuria was shown; OR (95% CI) 2.17 (1.05, 4.49), P = 0.0365. For microalbuminuria, an interaction effect for ethnicity and duration of diabetes suggested that risk for south Asian patients was lower in early years following diagnosis; OR for SA vs. WH at durations 0 and 1 year were 0.56 (0.37, 0.86) and 0.59 (0.39, 0.89) respectively. After 20 years’ duration, OR = 1.40 (0.63, 3.08). Limitations - Comparability of ethnicity defined groups; statistical methods controlled for differences between groups, but residual confounding may remain. Analyses are based on a single measure of albumin:creatinine ratio. Conclusions - There were significant differences between ethnicity groups in risk factor profiles and microalbuminuria and overt proteinuria outcomes.
Whilst south Asian patients had no excess risk of microalbuminuria, the risk of overt proteinuria was elevated significantly, which might be explained by faster progression of renal dysfunction in patients of south Asian ethnicity.
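For readers unfamiliar with the OR (95% CI) notation used in the abstract above, the following is a minimal sketch of how an unadjusted odds ratio and its Wald confidence interval can be computed from a 2x2 table. It is not the study's method: the abstract reports ORs from multiple logistic regression adjusted for confounders, whereas this sketch makes no adjustment. The function name `odds_ratio_ci` and the counts are hypothetical and are not taken from the study.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval for a 2x2 table.

    a: group 1 with outcome,  b: group 1 without outcome,
    c: group 2 with outcome,  d: group 2 without outcome.
    z=1.96 gives an approximate 95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Hypothetical counts for illustration only, NOT the study's data.
or_, lo, hi = odds_ratio_ci(a=20, b=80, c=10, d=90)
print(f"OR = {or_:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```

An interval that excludes 1 corresponds to a difference that is significant at roughly the 5% level, which is how the reported interval (1.05, 4.49) for overt proteinuria can be read.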
Abstract:
A major challenge in text mining for biomedicine is automatically extracting protein-protein interactions from the vast amount of biomedical literature. We have constructed an information extraction system based on the Hidden Vector State (HVS) model for protein-protein interactions. The HVS model can be trained using only lightly annotated data whilst simultaneously retaining sufficient ability to capture the hierarchical structure. When applied in extracting protein-protein interactions, we found that it performed better than other established statistical methods and achieved 61.5% in F-score with balanced recall and precision values. Moreover, the statistical nature of the pure data-driven HVS model makes it intrinsically robust and it can be easily adapted to other domains.
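The F-score quoted in the abstract above combines precision and recall into a single figure. The sketch below shows the standard F-beta definition; the function name `f_score` is an illustrative choice, and the abstract does not give the separate precision and recall values, so the example input simply assumes they are exactly equal.

```python
def f_score(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Assumption for illustration: precision and recall exactly balanced.
balanced = f_score(0.615, 0.615)
print(f"F1 = {balanced:.3f}")
```

When precision and recall are equal, the F1 score equals that common value, so a 61.5% F-score with balanced precision and recall implies both were close to 61.5%.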
Abstract:
Richard Armstrong was educated at King’s College London (1968-1971) and subsequently at St. Catherine’s College Oxford (1972-1976). His early research involved the application of statistical methods to problems in botany and ecology. For the last 34 years, he has been a lecturer in Botany, Microbiology, Ecology, Neuroscience, and Optometry at the University of Aston. His current research interests include the application of quantitative methods to the study of neuropathology of neurodegenerative diseases with special reference to vision and the visual system.
Abstract:
In this paper, we propose a text mining method called LRD (latent relation discovery), which extends the traditional vector space model of document representation in order to improve information retrieval (IR) on documents and document clustering. Our LRD method extracts terms and entities, such as person, organization, or project names, and discovers relationships between them by taking into account their co-occurrence in textual corpora. Given a target entity, LRD discovers other entities closely related to the target effectively and efficiently. With respect to such relatedness, a measure of relation strength between entities is defined. LRD uses relation strength to enhance the vector space model, and uses the enhanced vector space model for query based IR on documents and clustering documents in order to discover complex relationships among terms and entities. Our experiments on a standard dataset for query based IR show that our LRD method performed significantly better than the traditional vector space model and five other standard statistical methods for vector expansion.
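The abstract above does not define LRD's relation-strength measure, so the sketch below swaps in a plain Jaccard coefficient over document-level co-occurrence to illustrate the general idea of ranking entities by how often they appear together. The function names, the entity names, and the toy corpus are all hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_strength(docs):
    """Jaccard co-occurrence strength for every entity pair.

    docs: iterable of sets of entity names, one set per document.
    Returns {frozenset({a, b}): strength}. This is a stand-in measure,
    not the relation strength defined by LRD.
    """
    doc_freq = Counter()
    pair_freq = Counter()
    for entities in docs:
        ents = set(entities)
        doc_freq.update(ents)
        pair_freq.update(frozenset(p) for p in combinations(sorted(ents), 2))
    strength = {}
    for pair, n_both in pair_freq.items():
        a, b = tuple(pair)
        # Documents containing both, divided by documents containing either.
        strength[pair] = n_both / (doc_freq[a] + doc_freq[b] - n_both)
    return strength

def related(target, strength, top_k=3):
    """Entities most strongly related to target, strongest first."""
    scores = []
    for pair, s in strength.items():
        if target in pair:
            (other,) = pair - {target}
            scores.append((other, s))
    return sorted(scores, key=lambda x: (-x[1], x[0]))[:top_k]

# Hypothetical toy corpus: each document reduced to its extracted entities.
corpus = [
    {"Alice", "ProjectX"},
    {"Alice", "ProjectX", "AcmeOrg"},
    {"Bob", "AcmeOrg"},
]
strength = cooccurrence_strength(corpus)
print(related("Alice", strength))
```

In a vector-space setting, scores of this kind could be used to add weighted related entities to a document or query vector, which is the role the abstract assigns to relation strength.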
Abstract:
The research presented in this thesis investigates the nature of the relationship between the development of the Knowledge-Based Economy (KBE) and Structural Funds (SF) in European regions. A particular focus is placed on the West Midlands (UK) and Silesia (Poland). The time-frame taken into account in this research is the years 1999 to 2009. This is methodologically addressed by firstly establishing a new way of calculating the General Index of the KBE for all of the EU regions; secondly, applying a number of statistical methods to measure the influence of the Funds on the changes in the regional KBE over time; and finally, by conducting a series of semi-structured stakeholder interviews in the two key case study regions: the West Midlands and Silesia. The three main findings of the thesis are: first, over the examined time-frame, the values of the KBE General Index increased in over 66% of the EU regions; furthermore, the number of the “new” EU regions in which the KBE increased over time is far higher than in the “old” EU. Second, any impact of Structural Funds on the regional KBE occurs only in the minority of the European regions and any form of functional dependency between the two can be observed only in 30% of the regions. Third, although the pattern of development of the regional KBE and the correlation coefficients differ in the cases of Silesia and the West Midlands, the analysis of variance carried out yields identical results for both regions. Furthermore, the qualitative analysis’ results show similarities in the approach towards the Structural Funds in the two key case-study regions.
Abstract:
Objective: Patients with Tourette syndrome (TS) often report characteristic sensory experiences, also called premonitory urges (PUs), which precede tic expression and have high diagnostic relevance. This study investigated the usefulness of a scale developed and validated in children and adolescents, the Premonitory Urge for Tics Scale (PUTS, Woods et al., 2005 [13]), for the assessment of PUs in adult patients with TS. Method: Standard statistical methods were applied to test the psychometric properties of the PUTS in 102 adult TS outpatients recruited from two specialist clinics in the United Kingdom. Results: The PUTS showed good acceptability and endorsement rates, with evenly distributed scores and low floor and ceiling effects. Item-total correlations were moderate to strong; PUTS total scores were significantly correlated with quantitative measures of TS severity. The PUTS showed excellent internal consistency reliability (Cronbach's alpha = 0.85) and Spearman's correlations demonstrated satisfactory convergent and discriminant validity. Conclusions: Although originally devised to assess urges to tic in young patients with TS, the PUTS demonstrated good psychometric properties in a large sample of adults recruited at specialist TS clinics. This instrument is therefore recommended for use across the life span as a valid and reliable self-report measure of sensory experiences accompanying tic expression. © 2013 The Japanese Society of Child Neurology.
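The internal consistency figure in the abstract above is Cronbach's alpha. As a minimal sketch of the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), the code below computes it from a respondents-by-items score table. The function name `cronbach_alpha` and the response data are hypothetical; they are not PUTS data.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(scores[0])                      # number of items
    items = list(zip(*scores))              # transpose to per-item columns
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Hypothetical responses (5 respondents, 3 items), for illustration only.
responses = [
    [3, 4, 3],
    [2, 2, 3],
    [4, 4, 4],
    [1, 2, 1],
    [3, 3, 4],
]
alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Values closer to 1 indicate that the items vary together, which is the sense in which the abstract describes 0.85 as showing good internal consistency.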
Abstract:
Solving many scientific problems requires effective regression and/or classification models for large high-dimensional datasets. Experts from these problem domains (e.g. biologists, chemists, financial analysts) have insights into the domain which can be helpful in developing powerful models, but they need a modelling framework that helps them to use these insights. Data visualisation is an effective technique for presenting data and obtaining feedback from the experts. A single global regression model can rarely capture the full behavioural variability of a huge multi-dimensional dataset. Instead, local regression models, each focused on a separate area of input space, often work better since the behaviour of different areas may vary. Classical local models such as Mixture of Experts segment the input space automatically, which is not always effective and does not involve the domain experts in guiding a meaningful segmentation of the input space. In this paper we address this issue by allowing domain experts to interactively segment the input space using data visualisation. The segmentation output obtained is then further used to develop effective local regression models.
Abstract:
This paper provides the most comprehensive evidence to date on whether or not monetary aggregates are valuable for forecasting US inflation in the early to mid 2000s. We explore a wide range of different definitions of money, including different methods of aggregation and different collections of included monetary assets. In our forecasting experiment we use two nonlinear techniques, namely recurrent neural networks and kernel recursive least squares regression, techniques that are new to macroeconomics. Recurrent neural networks operate with potentially unbounded input memory, while the kernel regression technique is a finite memory predictor. The two methodologies compete to find the best fitting US inflation forecasting models and are then compared to forecasts from a naive random walk model. The best models were nonlinear autoregressive models based on kernel methods. Our findings do not provide much support for the usefulness of monetary aggregates in forecasting inflation. Beyond its economic findings, our study is in the tradition of physicists' long-standing interest in the interconnections among statistical mechanics, neural networks, and related nonparametric statistical methods, and suggests potential avenues of extension for such studies. © 2010 Elsevier B.V. All rights reserved.
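The naive random walk benchmark mentioned in the abstract above forecasts each period's value as the previous period's value. The sketch below shows that baseline and a root-mean-square error for comparing it with another model; it assumes one-step-ahead forecasts, which the abstract does not specify, and the function names and series values are hypothetical, not US inflation data.

```python
import math

def random_walk_forecasts(series):
    """One-step-ahead naive forecasts: the forecast for t is the value at t-1."""
    return series[:-1]

def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical series for illustration only.
series = [2.0, 2.5, 2.0, 3.0]
forecasts = random_walk_forecasts(series)   # forecasts for periods 1..3
baseline_rmse = rmse(series[1:], forecasts)
print(f"random walk RMSE = {baseline_rmse:.4f}")
```

A candidate forecasting model is usually judged useful only if its out-of-sample error is lower than this baseline's on the same evaluation periods.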