12 results for Missing values structures
in Aston University Research Archive
Abstract:
Exploratory analysis of data in all sciences seeks to find common patterns to gain insights into the structure and distribution of the data. Typically, visualisation methods like principal components analysis are used, but these methods are not easily able to deal with missing data, nor can they capture non-linear structure in the data. One approach to discovering complex, non-linear structure in the data is through the use of linked plots, or brushing, while ignoring the missing data. In this technical report we discuss a complementary approach based on a non-linear probabilistic model. The generative topographic mapping enables the visualisation of the effects of very many variables on a single plot, which is able to incorporate far more structure than a two-dimensional principal components plot could, and to deal at the same time with missing data. We show that using the generative topographic mapping provides us with an optimal method to explore the data while being able to replace missing values in a dataset, particularly where a large proportion of the data is missing.
Abstract:
Exploratory analysis of petroleum geochemical data seeks to find common patterns to help distinguish between different source rocks, oils and gases, and to explain their source, maturity and any intra-reservoir alteration. However, at the outset, one is typically faced with (a) a large matrix of samples, each with a range of molecular and isotopic properties, (b) a spatially and temporally unrepresentative sampling pattern, (c) noisy data and (d) often, a large number of missing values. This inhibits analysis using conventional statistical methods. Typically, visualisation methods like principal components analysis are used, but these methods are not easily able to deal with missing data nor can they capture non-linear structure in the data. One approach to discovering complex, non-linear structure in the data is through the use of linked plots, or brushing, while ignoring the missing data. In this paper we introduce a complementary approach based on a non-linear probabilistic model. Generative topographic mapping enables the visualisation of the effects of very many variables on a single plot, while also dealing with missing data. We show how using generative topographic mapping also provides an optimal method with which to replace missing values in two geochemical datasets, particularly where a large proportion of the data is missing.
Abstract:
One of the main challenges of classifying clinical data is determining how to handle missing features. Most research favours imputing missing values or discarding records that include missing data, both of which can degrade accuracy when missing values exceed a certain level. In this research we propose a methodology to handle data sets with a large percentage of missing values and with high variability in which particular data are missing. Feature selection is effected by picking variables sequentially in order of maximum correlation with the dependent variable and minimum correlation with variables already selected. Classification models are generated individually for each test case based on its particular feature set and the matching data values available in the training population. The method was applied to real patients' anonymous mental-health data where the task was to predict the suicide risk judgement clinicians would give for each patient's data, with eleven possible outcome classes: zero to ten, representing no risk to maximum risk. The results compare favourably with alternative methods and have the advantage of ensuring explanations of risk are based only on the data given, not imputed data. This is important for clinical decision support systems using human expertise for modelling and explaining predictions.
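The sequential selection rule described in this abstract can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the abstract does not state how relevance and redundancy are combined, so the score used here (absolute correlation with the target minus mean absolute correlation with the features already picked) is an assumption, as are the function names `pairwise_abs_corr` and `greedy_select`. Correlations are computed on pairwise-complete rows so that missing entries (NaN) are skipped rather than imputed.

```python
import numpy as np


def pairwise_abs_corr(a, b):
    """Absolute Pearson correlation over rows where both values are present."""
    mask = ~np.isnan(a) & ~np.isnan(b)
    if mask.sum() < 3:
        return 0.0  # too few shared observations to say anything
    a, b = a[mask], b[mask]
    if a.std() == 0 or b.std() == 0:
        return 0.0  # a constant column carries no correlation
    return float(abs(np.corrcoef(a, b)[0, 1]))


def greedy_select(X, y, k):
    """Pick up to k column indices of X, one at a time.

    Assumed score (not taken from the abstract):
    relevance to y minus mean redundancy with columns already picked.
    """
    n_features = X.shape[1]
    relevance = [pairwise_abs_corr(X[:, j], y) for j in range(n_features)]
    selected = []
    while len(selected) < min(k, n_features):
        best_j, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            if selected:
                redundancy = np.mean(
                    [pairwise_abs_corr(X[:, j], X[:, s]) for s in selected]
                )
            else:
                redundancy = 0.0
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected
```

Calling `greedy_select(X, y, k)` on a feature matrix with NaN for missing entries returns column indices in the order they were chosen. In the setting the abstract describes, the candidate columns for a given test case would presumably be restricted to the features that case actually has.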
Abstract:
We analyse how the Generative Topographic Mapping (GTM) can be modified to cope with missing values in the training data. Our approach is based on an Expectation-Maximisation (EM) method which estimates the parameters of the mixture components and at the same time deals with the missing values. We incorporate this algorithm into a hierarchical GTM. We verify the method on a toy data set (using a single GTM) and a realistic data set (using a hierarchical GTM). The results show our algorithm can help to construct informative visualisation plots, even when some of the training points are corrupted with missing values.
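As a rough illustration of how a mixture model of this kind can handle a missing entry, the sketch below fills NaNs with their expected value under an equal-weight mixture of isotropic Gaussians, computing responsibilities from the observed dimensions only. It is not the algorithm from the paper: the centres and the inverse variance `beta` are taken as given rather than fitted by EM, and the function name `impute_with_mixture` is hypothetical.

```python
import numpy as np


def impute_with_mixture(x, centres, beta):
    """Fill NaNs in x with their expected value under the mixture.

    x:       (D,) array, NaN marks a missing entry
    centres: (K, D) component means (in a GTM these would be the images
             of the latent grid points; here they are simply supplied)
    beta:    inverse variance shared by all components (assumed known)
    """
    obs = ~np.isnan(x)
    # Squared distance to each centre, measured on observed dimensions only.
    sq_dist = ((centres[:, obs] - x[obs]) ** 2).sum(axis=1)
    # Responsibilities; subtracting the largest log-weight keeps exp() stable.
    log_r = -0.5 * beta * sq_dist
    r = np.exp(log_r - log_r.max())
    r /= r.sum()
    filled = x.copy()
    # Expected value of the missing dimensions: responsibility-weighted means.
    filled[~obs] = r @ centres[:, ~obs]
    return filled, r
```

Because the components are isotropic, the missing dimensions are conditionally independent of the observed ones within each component, which is why the fill-in reduces to a weighted average of the centres. A full EM treatment would also account for the extra variance of the missing entries when updating `beta`.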
Abstract:
Heterogeneous and incomplete datasets are common in many real-world visualisation applications. The probabilistic nature of the Generative Topographic Mapping (GTM), which was originally developed for complete continuous data, can be extended to model heterogeneous (i.e. containing both continuous and discrete values) and missing data. This paper describes and assesses the resulting model on both synthetic and real-world heterogeneous data with missing values.
Abstract:
Background: The aim of this study was to describe bilateral visual outcomes and the effect of incomplete follow-up after 3 years of ranibizumab therapy for neovascular age-related macular degeneration. Secondarily, the demands on service provision over a 3-year period were described. Methods: Data on visual acuity, hospital visits, and injections were collected over 36 months on consecutive patients commencing treatment over a 9-month period. Visual outcome was determined for 1) all patients, using last observation carried forward for missed visits due to early discontinuation and 2) only those patients completing full 36-month follow-up. Results: Over 3 years, 120 patients cumulatively attended hospital for 1,823 noninjection visits and 1,365 injection visits. A visual acuity loss of <15 letters (L) was experienced by 78.2% of patients. For all patients (n=120), there was a mean loss of 1.68 L using last observation carried forward for missing values. Excluding five patients who died and 30 who discontinued follow-up, mean gain was 1.47 L. In bilateral cases, final acuity was on average 9 L better in second eyes compared to first eyes. Also, 91% of better-seeing eyes continued to be the better-seeing eye. Conclusion: We have demonstrated our approach to describing the long-term service provision and visual outcomes of ranibizumab therapy for neovascular age-related macular degeneration in a consecutive cohort of patients. Although there was a heavy burden with very frequent injections and clinic visits, patients can expect a good level of visual stability and a very high chance of maintaining their better-seeing eye for up to 3 years. © 2014 Chavan et al. This work is published by Dove Medical Press Limited.
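The two ways of summarising visual outcome mentioned in this abstract (last observation carried forward for all patients, versus completers only) can be sketched as below. The helper names and the small data set are hypothetical and purely illustrative; they are not the study's data, and the sketch assumes every patient has a baseline reading.

```python
def locf(values):
    """Carry the last observed value forward over missed visits (None)."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def mean_change_locf(series_by_patient):
    """Mean change from baseline, using LOCF for every patient."""
    changes = [locf(v)[-1] - v[0] for v in series_by_patient.values()]
    return sum(changes) / len(changes)


def mean_change_completers(series_by_patient):
    """Mean change from baseline, only for patients seen at the final visit."""
    changes = [v[-1] - v[0] for v in series_by_patient.values() if v[-1] is not None]
    return sum(changes) / len(changes)


# Hypothetical letter scores per visit; None marks a missed visit.
visits = {
    "p1": [60, 62, None, None],  # discontinued early
    "p2": [55, 50, 48, 47],
    "p3": [70, None, 72, 75],
}
print(mean_change_locf(visits), mean_change_completers(visits))
```

The two summaries can differ in sign as well as size, which is the pattern the abstract reports (a mean loss with carried-forward values and a mean gain among completers).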
Abstract:
We use molecular dynamics simulations to compare the conformational structure and dynamics of a 21-base pair RNA sequence initially constructed according to the canonical A-RNA and A'-RNA forms in the presence of counterions and explicit water. Our study aims to add a dynamical perspective to the solid-state structural information that has been derived from X-ray data for these two characteristic forms of RNA. Analysis of the three main structural descriptors commonly used to differentiate between the two forms of RNA (namely major groove width, inclination and the number of base pairs in a helical twist) over a 30 ns simulation period reveals a flexible structure in aqueous solution, with fluctuations in the values of these structural parameters encompassing the range between the two crystal forms and more. This provides evidence to suggest that the identification of distinct A-RNA and A'-RNA structures, while relevant in the crystalline form, may not be generally relevant in the context of RNA in the aqueous phase. The apparent structural flexibility observed in our simulations is likely to bear ramifications for the interactions of RNA with biological molecules (e.g. proteins) and non-biological molecules (e.g. non-viral gene delivery vectors). © CSIRO 2009.
Abstract:
A series of ethylene propylene terpolymer vulcanizates, prepared by varying termonomer type, cure system, cure time and cure temperature, are characterized by determining the number and type of cross-links present. The termonomers used represent the types currently available in commercial quantities. Characterization is carried out by measuring the C1 constant of the Mooney-Rivlin-Saunders equation before and after treatment with the chemical probes propane-2-thiol/piperidine and n-hexane thiol/piperidine, thus making it possible to calculate the relative proportions of mono-sulphidic, di-sulphidic and poly-sulphidic cross-links. The cure systems used included both sulphur and peroxide formulations. Specific physical properties are determined for each network and an attempt is made to correlate observed changes in these with variations in network structure. A survey of the economics of each formulation based on a calculated efficiency parameter for each cure system is included. Values of C1 are calculated from compression modulus data after the reliability of the technique when used with ethylene propylene terpolymers had been established. This is carried out by comparing values from both compression and extension stress-strain measurements for natural rubber vulcanizates and by assessing the effects of sample dimensions and the degree of swelling. The technique of compression modulus is much more widely applicable than previously thought. The basic structure of an ethylene propylene terpolymer network appears to be independent of the type of cure system used (sulphur-based systems only), the proportions of constituent cross-links being nearly constant.
Abstract:
This thesis reports a cross-national study carried out in England and India in an attempt to clarify the association of certain cultural and non-cultural characteristics with people's work-related attitudes and values, and with the structure of their work organizations. Three perspectives are considered to be relevant to the objectives of the study. The contingency perspective suggests that a 'fit' between an organization's context and its structural arrangements will be fundamentally necessary for achieving success and survival. The political economy perspective argues for the determining role of the social and economic structures within which the organization operates. The culturalist perspective looks to cultural attitudes and values of organizational members for an explanation for their organization's structure. The empirical investigation was carried out in three stages in each of the two countries involved by means of surveys of cultural attitudes, work-related attitudes and organizational structures and systems. The cultural surveys suggested that Indian and English people were different from one another with regard to fear of, and respect and obedience to, their seniors, ability to cope with ambiguity, honesty, independence, expression of emotions, fatalism, reserve, and care for others; they were similar with regard to tolerance, friendliness, attitude to change, attitude to law, self-control and self-confidence, and attitude to social differentiation. The second stage of the study, involving the employees of fourteen organizations, found that the English ones perceived themselves to have more power at work, expressed more tolerance for ambiguity, and had different expectations from their job than did the Indian equivalents. The two samples were similar with respect to commitment to their company and trust in their colleagues. The findings also suggested that employees' occupations, education and age had some influences on their work-related attitudes. 
The final stage of the research was a study of structures, control systems, and reward and punishment policies of the same fourteen organizations which were matched almost completely on their contextual factors across the two countries. English and Indian organizations were found to be similar in terms of centralization, specialization, chief executive's span of control, height and management control strategies. English organizations, however, were far more formalized, spent more time on consultation and their managers delegated authority lower down the hierarchy than Indian organizations. The major finding of the study was the multiple association that cultural, national and contingency factors had with the structural characteristics of the organizations and with the work-related attitudes of their members. On the basis of this finding, a multi-perspective model for understanding organizational structures and systems is proposed in which the contributions made by contingency, political economy and cultural perspectives are recognized and incorporated.
Abstract:
Exploratory analysis of data seeks to find common patterns to gain insights into the structure and distribution of the data. In geochemistry it is a valuable means to gain insights into the complicated processes making up a petroleum system. Typically, linear visualisation methods like principal components analysis, linked plots, or brushing are used. These methods cannot be employed directly when dealing with missing data, and they struggle to capture global non-linear structures in the data, although they can do so locally. This thesis discusses a complementary approach based on a non-linear probabilistic model. The generative topographic mapping (GTM) enables the visualisation of the effects of very many variables on a single plot, which is able to incorporate more structure than a two-dimensional principal components plot. The model can deal with uncertainty and missing data, and allows for the exploration of the non-linear structure in the data. In this thesis a novel approach to initialise the GTM with arbitrary projections is developed. This makes it possible to combine GTM with algorithms like Isomap and fit complex non-linear structure like the Swiss-roll. Another novel extension is the incorporation of prior knowledge about the structure of the covariance matrix. This extension greatly enhances the modelling capabilities of the algorithm, resulting in a better fit to the data and better imputation capabilities for missing data. Additionally, an extensive benchmark study of the missing data imputation capabilities of GTM is performed. Further, a novel approach based on missing data is introduced to benchmark the fit of probabilistic visualisation algorithms on unlabelled data. Finally, the work is complemented by evaluating the algorithms on real-life datasets from geochemical projects.
Abstract:
This dissertation investigates the very important and current problem of modelling human expertise. This is an apparent issue in any computer system emulating human decision making. It is prominent in Clinical Decision Support Systems (CDSS) due to the complexity of the induction process and the vast number of parameters in most cases. Other issues such as human error and missing or incomplete data present further challenges. In this thesis, the Galatean Risk Screening Tool (GRiST) is used as an example of modelling clinical expertise and parameter elicitation. The tool is a mental health clinical record management system with a top layer of decision support capabilities. It is currently being deployed by several NHS mental health trusts across the UK. The aim of the research is to investigate the problem of parameter elicitation by inducing them from real clinical data rather than from the human experts who provided the decision model. The induced parameters provide an insight into both the data relationships and how experts make decisions themselves. The outcomes help further understand human decision making and, in particular, help GRiST provide more accurate emulations of risk judgements. Although the algorithms and methods presented in this dissertation are applied to GRiST, they can be adopted for other human knowledge engineering domains.
Abstract:
Threshold stress intensity values ranging from ∼6 to 16 MN m^(−3/2) can be obtained in powder-formed Nimonic AP1 by changing the microstructure. The threshold and low crack growth rate behaviour at room temperature of a number of widely differing AP1 microstructures, with both ‘necklace’ and fully recrystallized grain structures of various sizes and uniform and bimodal γ′-distributions, have been investigated. The results indicate that grain size is an important microstructural parameter which can control threshold behaviour, with the value of threshold stress intensity increasing with increasing grain size, but that the γ′-distribution is also important. In this Ni-base alloy, as in many others, near-threshold fatigue crack growth occurs in a crystallographic manner along {111} planes. This is due to the development of a dislocation structure involving persistent slip bands on {111} planes in the plastic zone, caused by the presence of ordered shearable precipitates in the microstructure. However, as the stress intensity range is increased, a striated growth mode takes over. The results presented show that this transition from faceted to striated growth is associated with a sudden increase in crack propagation rate and occurs when the size of the reverse plastic zone at the crack tip becomes equal to the grain size, independent of any other microstructural variables.