23 results for Mixed Type Variables Clustering
in Aston University Research Archive
Abstract:
A multi-chromosome GA (Multi-GA) was developed, based upon concepts from the natural world, allowing improved flexibility in a number of areas including representation, genetic operators, their parameter rates and real world multi-dimensional applications. A series of experiments were conducted, comparing the performance of the Multi-GA to a traditional GA on a number of recognised and increasingly complex test optimisation surfaces, with promising results. Further experiments demonstrated the Multi-GA's flexibility through the use of non-binary chromosome representations and its applicability to dynamic parameterisation. A number of alternative and new methods of dynamic parameterisation were investigated, in addition to a new non-binary 'Quotient crossover' mechanism. Finally, the Multi-GA was applied to two real world problems, demonstrating its ability to handle mixed type chromosomes within an individual, the limited use of a chromosome level fitness function, the introduction of new genetic operators for structural self-adaptation and its viability as a serious real world analysis tool. The first problem involved optimum placement of computers within a building, allowing the Multi-GA to use multiple chromosomes with different type representations and different operators in a single individual. The second problem, commonly associated with Geographical Information Systems (GIS), required a spatial analysis location of the optimum number and distribution of retail sites over two different population grids. In applying the Multi-GA, two new genetic operators (addition and deletion) were developed and explored, resulting in the definition of a mechanism for self-modification of genetic material within the Multi-GA structure and a study of this behaviour.
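The idea of one individual carrying several chromosomes, each with its own representation, operator and rate, can be illustrated with a minimal sketch. This is not the Multi-GA itself: the chromosome names, the two mutation operators and the dictionary layout below are assumptions made purely for illustration.

```python
import random

def flip_bits(genes, rate, rng):
    """Bit-flip mutation for a binary chromosome."""
    return [1 - g if rng.random() < rate else g for g in genes]

def perturb_reals(genes, rate, rng, sigma=0.1):
    """Gaussian perturbation for a real-valued chromosome."""
    return [g + rng.gauss(0.0, sigma) if rng.random() < rate else g for g in genes]

def mutate(individual, rng):
    """Apply each chromosome's own operator at its own rate."""
    return {
        name: dict(chrom, genes=chrom["operator"](chrom["genes"], chrom["rate"], rng))
        for name, chrom in individual.items()
    }

rng = random.Random(42)
# Hypothetical individual: one binary and one real-valued chromosome.
individual = {
    "switches": {"genes": [0, 1, 1, 0], "operator": flip_bits, "rate": 1.0},
    "positions": {"genes": [2.5, 7.0], "operator": perturb_reals, "rate": 0.0},
}
child = mutate(individual, rng)
```

Keeping the operator and rate alongside the genes is one simple way to let each chromosome evolve under different rules within a single individual.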
Abstract:
We investigated the role of local and global information on perceptual encoding of faces in patient HJA, who shows prosopagnosia and visual agnosia following occipito-temporal damage. HJA and an age-matched control were tested in a simultaneous matching task which focused on detection of local changes in faces: the inversion of central parts (eyes and mouth) relative to their context (as in the Thatcher illusion). Same-different judgements were made to normal, “thatcherised” and mixed type face pairs. Whole faces (Experiment 1), or face parts (Experiment 2), were presented in upright and inverted orientations. Compared to the control, HJA was severely impaired at matching whole faces, but he improved dramatically when face parts were presented in isolation. This suggests an inhibitory influence of face context on HJA's processing of local parts and a relatively intact ability to process part-based information from a face (when context cannot interfere). Face inversion did not affect HJA's performance. A control experiment (Experiment 3) with non-face stimuli (houses) suggested that the inhibitory influence of context on HJA's performance was restricted to faces. These results indicate that contextual information in a face can have an adverse influence on the processing of local part-based information in prosopagnosia.
Abstract:
Analysing the molecular polymorphism and interactions of DNA, RNA and proteins is of fundamental importance in biology. Predicting functions of polymorphic molecules is important in order to design more effective medicines. Analysing major histocompatibility complex (MHC) polymorphism is important for mate choice, epitope-based vaccine design and transplantation rejection etc. Most of the existing exploratory approaches cannot analyse these datasets because of the large number of molecules with a high number of descriptors per molecule. This thesis develops novel methods for data projection in order to explore high-dimensional biological datasets by visualising them in a low-dimensional space. With increasing dimensionality, some existing data visualisation methods such as generative topographic mapping (GTM) become computationally intractable. We propose variants of these methods, where we use log-transformations at certain steps of the expectation maximisation (EM) based parameter learning process, to make them tractable for high-dimensional datasets. We demonstrate these proposed variants both on synthetic data and on an electrostatic potential dataset of MHC class-I. We also propose to extend a latent trait model (LTM), suitable for visualising high-dimensional discrete data, to simultaneously estimate feature saliency as an integrated part of the parameter learning process of a visualisation model. This LTM variant not only gives better visualisation by modifying the projection map based on feature relevance, but also helps users to assess the significance of each feature. Another problem which is not addressed much in the literature is the visualisation of mixed-type data. We propose to combine GTM and LTM in a principled way where appropriate noise models are used for each type of data in order to visualise mixed-type data in a single plot. We call this model a generalised GTM (GGTM).
We also propose to extend the GGTM model to estimate feature saliencies while training a visualisation model, and this is called GGTM with feature saliency (GGTM-FS). We demonstrate the effectiveness of these proposed models both for synthetic and real datasets. We evaluate visualisation quality using quality metrics such as the distance distortion measure and rank-based measures: trustworthiness, continuity, and mean relative rank errors with respect to data space and latent space. In cases where the labels are known we also use quality metrics of KL divergence and nearest neighbour classification error in order to determine the separation between classes. We demonstrate the efficacy of these proposed models both for synthetic and real biological datasets with a main focus on the MHC class-I dataset.
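The abstract does not say which EM steps are log-transformed. One common instance of the idea is computing the E-step responsibilities in the log domain with the log-sum-exp trick, sketched below. The function names and the toy numbers are assumptions for illustration, not the thesis's implementation.

```python
import math

def log_sum_exp(values):
    """Return log(sum(exp(v))) without overflow or underflow."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))

def responsibilities(log_likelihoods, log_priors):
    """E-step: posterior probability of each latent component for one data point."""
    log_joint = [ll + lp for ll, lp in zip(log_likelihoods, log_priors)]
    log_norm = log_sum_exp(log_joint)
    return [math.exp(lj - log_norm) for lj in log_joint]

# In high dimensions, log-likelihoods this small are typical; exponentiating
# them directly underflows to 0.0 and the normalisation would divide by zero.
log_lik = [-1200.0, -1201.0, -1205.0]
log_prior = [math.log(1.0 / 3.0)] * 3
resp = responsibilities(log_lik, log_prior)
```

Subtracting the maximum before exponentiating keeps at least one term equal to 1, so the normalising sum never vanishes.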
Abstract:
Most machine-learning algorithms are designed for datasets with features of a single type whereas very little attention has been given to datasets with mixed-type features. We recently proposed a model to handle mixed types with a probabilistic latent variable formalism. This proposed model describes the data by type-specific distributions that are conditionally independent given the latent space and is called generalised generative topographic mapping (GGTM). It has often been observed that visualisations of high-dimensional datasets can be poor in the presence of noisy features. In this paper we therefore propose to extend the GGTM to estimate feature saliency values (GGTMFS) as an integrated part of the parameter learning process with an expectation-maximisation (EM) algorithm. The efficacy of the proposed GGTMFS model is demonstrated both for synthetic and real datasets.
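Conditional independence given the latent space means the log-likelihood of a mixed-type observation is a sum of type-specific terms. The sketch below assumes a Gaussian noise model for continuous features and a Bernoulli model for binary features; these choices, and all names and values, are illustrative assumptions rather than the GGTM parameterisation, which the abstract does not give.

```python
import math

def gaussian_log_pdf(x, mean, variance):
    """Log-density of a univariate Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * variance) + (x - mean) ** 2 / variance)

def bernoulli_log_pmf(x, p):
    """Log-probability of a binary outcome x in {0, 1}."""
    return math.log(p) if x == 1 else math.log(1.0 - p)

def mixed_log_likelihood(continuous, binary, means, variance, probs):
    """Log-likelihood of one observation given one latent point.

    Features are assumed conditionally independent given the latent point,
    so the type-specific log terms simply add up.
    """
    ll = sum(gaussian_log_pdf(x, m, variance) for x, m in zip(continuous, means))
    ll += sum(bernoulli_log_pmf(x, p) for x, p in zip(binary, probs))
    return ll

# Hypothetical observation: two continuous features and two binary features.
ll = mixed_log_likelihood([0.2, -1.0], [1, 0], means=[0.0, -0.5],
                          variance=1.0, probs=[0.8, 0.3])
```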
Abstract:
In multilevel analyses, problems may arise when using Likert-type scales at the lowest level of analysis. Specifically, increases in variance should lead to greater censoring for the groups whose true scores fall at either end of the distribution. The current study used simulation methods to examine the influence of single-item Likert-type scale usage on ICC(1), ICC(2), and group-level correlations. Results revealed substantial underestimation of ICC(1) when using Likert-type scales with common response formats (e.g., 5 points). ICC(2) and group-level correlations were also underestimated, but to a lesser extent. Finally, the magnitude of underestimation was driven in large part by an interaction between Likert-type scale usage and the amounts of within- and between-group variance. © Sage Publications.
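A simulation of this kind can be sketched as follows. The design is an assumption, not the study's: group true scores and individual responses are drawn from normal distributions, responses are rounded and censored to a 5-point scale, and ICC(1) is computed from a one-way ANOVA with equal group sizes.

```python
import random

def icc1(groups):
    """ICC(1) from a one-way ANOVA on equally sized groups."""
    g = len(groups)
    k = len(groups[0])
    grand = sum(sum(grp) for grp in groups) / (g * k)
    means = [sum(grp) / k for grp in groups]
    msb = k * sum((m - grand) ** 2 for m in means) / (g - 1)
    msw = sum((x - m) ** 2 for grp, m in zip(groups, means) for x in grp) / (g * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

def to_likert(x, points=5):
    """Round a continuous score to the nearest scale point, censoring at the ends."""
    return min(points, max(1, round(x)))

def simulate(n_groups=200, group_size=10, between_sd=1.0, within_sd=1.0, seed=0):
    """Return ICC(1) for continuous scores and for their Likert-coded versions."""
    rng = random.Random(seed)
    continuous, likert = [], []
    for _ in range(n_groups):
        true_score = rng.gauss(3.0, between_sd)  # group-level true score
        scores = [rng.gauss(true_score, within_sd) for _ in range(group_size)]
        continuous.append(scores)
        likert.append([to_likert(s) for s in scores])
    return icc1(continuous), icc1(likert)

icc_continuous, icc_likert = simulate()
```

Varying `between_sd` and `within_sd` (both hypothetical parameters here) is one way to probe how the two variance components interact with the coarse response format.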
Abstract:
Objective: Reduced insulin sensitivity associated with fasting hyperproinsulinaemia is common in type 2 diabetes. Proinsulinaemia is an established independent cardiovascular risk factor. The objective was to investigate fasting and postprandial release of insulin, proinsulin (PI) and 32-33 split proinsulin (SPI) before and after sensitization to insulin with pioglitazone compared to a group treated with glibenclamide. Design and patients: A randomized double-blind placebo-controlled trial. Twenty-two type 2 diabetic patients were recruited along with 10 normal subjects. After 4 weeks washout, patients received a mixed meal and were assigned to receive pioglitazone or glibenclamide for 20 weeks, after which patients received another identical test meal. The treatment regimes were designed to maintain glycaemic control (HbA1c) at pretreatment levels so that β-cells received an equivalent glycaemic stimulus for both test meals. Measurements: Plasma insulin, PI, SPI and glucose concentrations were measured over an 8-h postprandial period. The output of PI and SPI was measured as the integrated postprandial response (area under the curve, AUC). Results: Pioglitazone treatment resulted in a significant reduction in fasting levels of PI and SPI compared to those of the controls. Postprandially, pioglitazone treatment had no effect on the insulin AUC response to the meal but significantly reduced the PI and SPI AUCs. Glibenclamide increased fasting insulin and the postprandial insulin AUC but had no effect on the PI and SPI AUCs. Conclusions: Sensitization to insulin with pioglitazone reduces the amount of insulin precursor species present in the fasting state and postprandially and may reduce cardiovascular risk. © 2007 The Authors.
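An integrated postprandial response is typically computed with the trapezoidal rule over the sampling times. The abstract does not state which integration rule was used, so the rule below is an assumption, and the sampling times and concentrations are invented for illustration only.

```python
def auc_trapezoid(times, values):
    """Area under a sampled curve using the trapezoidal rule."""
    total = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        total += width * (values[i] + values[i - 1]) / 2.0
    return total

# Hypothetical sampling times (hours) and concentrations over an 8-h period.
hours = [0, 1, 2, 4, 6, 8]
concentration = [10.0, 30.0, 40.0, 25.0, 15.0, 10.0]
auc = auc_trapezoid(hours, concentration)
```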
Abstract:
Analyzing geographical patterns by collocating events, objects or their attributes has a long history in surveillance and monitoring, and is particularly applied in environmental contexts, such as ecology or epidemiology. The identification of patterns or structures at some scales can be addressed using spatial statistics, particularly marked point process methodologies. Classification and regression trees are also related to this goal of finding "patterns" by deducing the hierarchy of influence of variables on a dependent outcome. Such variable selection methods have been applied to spatial data, but often without explicitly acknowledging the spatial dependence. Many methods routinely used in exploratory point pattern analysis are 2nd-order statistics, used in a univariate context, though there is also a wide literature on modelling methods for multivariate point pattern processes. This paper proposes an exploratory approach for multivariate spatial data using higher-order statistics built from co-occurrences of events or marks given by the point processes. A spatial entropy measure, derived from these multinomial distributions of co-occurrences at a given order, constitutes the basis of the proposed exploratory methods. © 2010 Elsevier Ltd.
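A minimal sketch of the idea, restricted to order 2 (pairs) for brevity: count which marks co-occur within a chosen distance, treat the counts as a multinomial distribution, and take its Shannon entropy. The distance rule, the function names and the toy points are assumptions for illustration; the paper's exact construction and its higher orders are not reproduced here.

```python
import math
from collections import Counter
from itertools import combinations

def cooccurrence_counts(points, radius):
    """Count unordered pairs of marks for points no farther apart than `radius`."""
    counts = Counter()
    for (x1, y1, m1), (x2, y2, m2) in combinations(points, 2):
        if math.hypot(x1 - x2, y1 - y2) <= radius:
            counts[tuple(sorted((m1, m2)))] += 1
    return counts

def entropy(counts):
    """Shannon entropy (in nats) of the empirical co-occurrence distribution."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# Hypothetical marked points: (x, y, mark).
points = [(0, 0, "a"), (1, 0, "b"), (0, 1, "a"), (10, 10, "b")]
counts = cooccurrence_counts(points, radius=1.5)
h = entropy(counts)
```

Low entropy indicates that a few mark combinations dominate the neighbourhoods; high entropy indicates that marks mix evenly at that scale.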
Abstract:
Clustering of ballooned neurons (BN) and tau positive neurons with inclusion bodies (tau+ neurons) was studied in the upper and lower laminae of the frontal, parietal and temporal cortex in 12 patients with corticobasal degeneration (CBD). In a significant proportion of brain areas examined, BN and tau+ neurons exhibited clustering with a regular distribution of clusters parallel to the pia mater. A regular pattern of clustering of BN and tau+ neurons was observed equally frequently in all cortical areas examined and in the upper and lower laminae. No significant correlations were observed between the cluster sizes of BN or tau+ neurons in the upper compared with the lower cortex or between the cluster sizes of BN and tau+ neurons. The results suggest that BN and tau+ neurons in CBD exhibit the same type of spatial pattern as lesions in Alzheimer's disease, Lewy body dementia and Pick's disease. The regular periodicity of the cerebral cortical lesions is consistent with the degeneration of the cortico-cortical projections in CBD.
Abstract:
The clustering pattern of diffuse, primitive and classic β-amyloid (Aβ) deposits was studied in the upper laminae of the frontal cortex of 9 patients with sporadic Alzheimer's disease (AD). Aβ stained tissue was counterstained with collagen type IV antiserum to determine whether the clusters of Aβ deposits were related to blood vessels. In all patients, Aβ deposits and blood vessels were clustered, with, in many patients, a regular periodicity of clusters along the cortex parallel to the pia. The classic Aβ deposit clusters coincided with those of the larger blood vessels in all patients and with clusters of smaller blood vessels in 4 patients. Diffuse deposit clusters were related to blood vessels in 3 patients. Primitive deposit clusters were either unrelated to or negatively correlated with the blood vessels in six patients. Hence, Aβ deposit subtypes differ in their relationship to blood vessels. The data suggest a direct and specific role for the larger blood vessels in the formation of amyloid cores in AD. © 1995.
Abstract:
This thesis describes the design and implementation of a new dynamic simulator called DASP. It is a computer program package written in standard Fortran 77 for the dynamic analysis and simulation of chemical plants. Its main uses include the investigation of a plant's response to disturbances, the determination of the optimal ranges and sensitivities of controller settings and the simulation of the startup and shutdown of chemical plants. The design and structure of the program and a number of features incorporated into it combine to make DASP an effective tool for dynamic simulation. It is an equation-oriented dynamic simulator but the model equations describing the user's problem are generated from an in-built model equation library. A combination of the structuring of the model subroutines, the concept of a unit module, and the use of the connection matrix of the problem given by the user have been exploited to achieve this objective. The Executive program has a structure similar to that of a CSSL-type simulator. DASP solves a system of differential equations coupled to nonlinear algebraic equations using an advanced mixed equation solver. The strategy used in formulating the model equations makes it possible to obtain the steady state solution of the problem using the same model equations. DASP can handle state and time events in an efficient way and this includes the modification of the flowsheet. DASP is highly portable and this has been demonstrated by running it on a number of computers with only trivial modifications. The program runs on a microcomputer with 640 kByte of memory. It is a semi-interactive program, with the bulk of all input data given in pre-prepared data files and communication with the user via an interactive terminal.
Using the features in-built in the package, the user can view or modify the values of any input data, variables and parameters in the model, and modify the structure of the flowsheet of the problem during a simulation session. The program has been demonstrated and verified using a number of example problems.
Abstract:
Background/Aim - People of south Asian origin have an excessive risk of morbidity and mortality from cardiovascular disease. We examined the effect of ethnicity on known risk factors and analysed the risk of cardiovascular events and mortality in UK south Asian and white European patients with type 2 diabetes over a 2 year period. Methods - A total of 1486 south Asian (SA) and 492 white European (WE) subjects with type 2 diabetes were recruited from 25 general practices in Coventry and Birmingham, UK. Baseline data included clinical history, anthropometry and measurements of traditional risk factors – blood pressure, total cholesterol, HbA1c. Multiple linear regression models were used to examine ethnicity differences in individual risk factors. Ten-year cardiovascular risk was estimated using the Framingham and UKPDS equations. All subjects were followed up for 2 years. Cardiovascular events (CVD) and mortality between the two groups were compared. Findings - Significant differences were noted in risk profiles between the two groups. After adjustment for clustering and confounding a significant ethnicity effect remained only for higher HbA1c (0.50 [0.22 to 0.77]; P = 0.0004) and lower HDL (-0.09 [-0.17 to -0.01]; P = 0.0266). Baseline CVD history was predictive of CVD events during follow-up for SA (P < 0.0001) but not WE (P = 0.189). Mean age at death was 66.8 (11.8) for SA vs. 74.2 (12.1) for WE, a difference of 7.4 years (95% CI 1.0 to 13.7 years), P = 0.023. The adjusted odds ratio of CVD event or death from CVD was greater but not significantly so in SA than in WE (OR 1.4 [0.9 to 2.2]). Limitations - Fewer events in both groups and short period of follow-up are key limitations. Longer follow-up is required to see if the observed differences between the ethnic groups persist. Conclusion - South Asian patients with type 2 diabetes in the UK have a higher cardiovascular risk and present with cardiovascular events at a significantly younger age than white Europeans.
Enhanced and ethnicity specific targets and effective treatments are needed if these inequalities are to be reduced.
Abstract:
The prevalence of type 2 diabetes mellitus (T2DM) continues to rise among British Pakistanis. The aim of this project was to explore T2DM perceptions and any preventative intentions among British Pakistani women and to discover whether they are doing anything to prevent the onset in themselves and their families. Initially a systematic review was conducted to investigate 20 existing prevention interventions and to assess their effectiveness (n=12,419). A mixed methods approach was adopted and three studies were conducted. The first study consisted of two focus groups with T2DM mothers (n=8) and three focus groups with non-T2DM mothers (n=17). The second study consisted of four focus groups with young British Pakistani females (n=11). All focus groups were transcribed verbatim and analysed using thematic analysis. Following these, a quantitative study was undertaken comprising a questionnaire survey: 12 prevention-perception items (derived from the qualitative data) and the Illness-Perception Questionnaire Revised (IPQ-R), using participants from the same populations: T2DM mothers (n=41), non-T2DM mothers (n=47) and young women (n=42). Results were analysed using multiple/hierarchical regression. The systematic review highlighted that the most effective prevention programmes focussed on behaviour and lifestyle with a combination of support and education to participants. The research studies demonstrated that T2DM was seen as an older person’s disease to be dealt with if/when it happens. T2DM mothers demonstrated knowledge and prevention understanding. There were non-significant relationships between prevention perceptions and T2DM illness perceptions across all three groups.
The findings of this thesis emphasise that lifestyle interventions are crucial to T2DM prevention, as a good healthy diet and regular physical activity are its key components, and highlight the importance of personal experience in perceived severity and lay-beliefs regarding T2DM, and of family/cultural influences in British Pakistanis. The findings of this project can be used to design culturally specific interventions towards preventing T2DM in the British Pakistani community.
Abstract:
By evolving brands and building on the importance of self-expression, Aaker (1997) developed the brand personality framework as a means to understand brand-consumer relationships. The brand personality framework captures the core values and characteristics described in human personality research in an attempt to humanize brands. Although influential across many streams of brand personality research, the current conceptualization of brand personality only offers a positively-framed approach. To date, no research, both conceptually and empirically, has thoroughly incorporated factors reflective of Negative Brand Personality, despite the fact that almost all researchers in personality are in agreement that factors akin to Extraversion (positive) and Neuroticism (negative) should be in a comprehensive personality scale to accommodate consumers’ expressions. As a result, the study of brand personality is only half complete since the current research trend is to position brand personality under brand image. However, with the brand personality concept being confused with brand identity at the empirical stage, factors reflective of Negative Brand Personality have been neglected. Accordingly, this thesis extends the current conceptualization of brand personality by demarcating the existing typologies of desirable brand personality and incorporating the characteristics reflective of consumers’ discrepant self-meaning to provide a more complete understanding of brand personality. However, it is not enough to interpret negative factors as the absence of positive factors. Negative factors reflect consumers’ anxious and frustrated feelings. 
Therefore, this thesis contributes to the current conceptualization of brand personality by, firstly, presenting a conceptual definition of Negative Brand Personality in order to provide a theoretical basis for the development of a Negative Brand Personality scale, then, secondly, identifying what constitutes Negative Brand Personality and to what extent consumers’ cognitive dissonance explains the nature of Negative Brand Personality, and, thirdly, ascertaining the impact Negative Brand Personality has on attitudinal constructs, namely: Negative Attitude, Detachment, Brand Loyalty and Satisfaction, which have proven to predict behaviors such as choice and (re-)purchasing. In order to deliver on the three main contributions, two comprehensive studies were conducted to a) develop a valid, parsimonious, yet relatively short measure of Negative Brand Personality, and b) ascertain how the Negative Brand Personality measure behaves within a network of related constructs. The mixed methods approach, grounded in theoretical and empirical development, provides evidence to suggest that there are four factors to Negative Brand Personality and, tested through use of a structural equation modeling technique, that these are influenced by Brand Confusion, Price Unfairness, Self-Incongruence and Corporate Hypocrisy. Negative Brand Personality factors mainly determined consumers’ Negative Attitudes and Brand Detachment. The research contributes to the literature on brand personality by improving the consumer-brand relationship by means of engaging in a brand-consumer conversation in order to reduce consumers’ cognitive strain. The study concludes with a discussion on the theoretical and practical implications of the findings, its limitations, and potential directions for future research.
Abstract:
Biological experiments often produce enormous amounts of data, which are usually analyzed by data clustering. Cluster analysis refers to statistical methods that are used to assign data with similar properties into several smaller, more meaningful groups. Two commonly used clustering techniques are introduced in the following section: principal component analysis (PCA) and hierarchical clustering. PCA calculates the variance between variables and groups them into a few uncorrelated groups or principal components (PCs) that are orthogonal to each other. Hierarchical clustering is carried out by separating data into many clusters and merging similar clusters together. Here, we use an example of human leukocyte antigen (HLA) supertype classification to demonstrate the usage of the two methods. Two programs, Generating Optimal Linear Partial Least Square Estimations (GOLPE) and Sybyl, are used for PCA and hierarchical clustering, respectively. However, the reader should bear in mind that the methods have been incorporated into other software as well, such as SIMCA, statistiXL, and R.
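The merging procedure described for hierarchical clustering can be sketched in a few lines. This is a naive illustration, not what Sybyl does: it starts with every point in its own cluster and repeatedly merges the two closest clusters, using average linkage and Euclidean distance as assumed choices, on invented toy points. PCA is not shown.

```python
import math

def average_linkage(points, n_clusters):
    """Agglomerative clustering: merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]  # each point starts alone

    def distance(a, b):
        # Average pairwise Euclidean distance between two clusters.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical 2-D descriptors for five items.
points = [(0, 0), (0, 1), (10, 10), (10, 11), (0, 0.5)]
groups = average_linkage(points, n_clusters=2)
```

The function returns lists of point indices, so the grouping can be mapped back to whatever labels the items carry.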