32 results for data analysis: algorithms and implementation

in Aston University Research Archive


Relevance: 100.00%

Abstract:

Exploratory analysis of data seeks to find common patterns to gain insights into the structure and distribution of the data. In geochemistry it is a valuable means to gain insights into the complicated processes making up a petroleum system. Typically, linear visualisation methods like principal components analysis, linked plots, or brushing are used. These methods cannot be employed directly when dealing with missing data, and they struggle to capture global non-linear structures in the data, although they can do so locally. This thesis discusses a complementary approach based on a non-linear probabilistic model. The generative topographic mapping (GTM) enables the visualisation of the effects of very many variables on a single plot, which is able to incorporate more structure than a two-dimensional principal components plot. The model can deal with uncertainty and missing data, and allows for the exploration of the non-linear structure in the data. In this thesis a novel approach to initialise the GTM with arbitrary projections is developed. This makes it possible to combine GTM with algorithms like Isomap and fit complex non-linear structures like the Swiss roll. Another novel extension is the incorporation of prior knowledge about the structure of the covariance matrix. This extension greatly enhances the modelling capabilities of the algorithm, resulting in a better fit to the data and better imputation capabilities for missing data. Additionally, an extensive benchmark study of the missing data imputation capabilities of GTM is performed. Further, a novel approach based on missing data is introduced to benchmark the fit of probabilistic visualisation algorithms on unlabelled data. Finally, the work is complemented by evaluating the algorithms on real-life datasets from geochemical projects.

Relevance: 100.00%

Abstract:

Objective: Recently, much research has proposed using nature-inspired algorithms to perform complex machine learning tasks. Ant colony optimization (ACO) is one such algorithm based on swarm intelligence and is derived from a model inspired by the collective foraging behavior of ants. Taking advantage of ACO traits such as self-organization and robustness, this paper investigates ant-based algorithms for gene expression data clustering and associative classification. Methods and material: An ant-based clustering (Ant-C) algorithm and an ant-based association rule mining (Ant-ARM) algorithm are proposed for gene expression data analysis. The proposed algorithms make use of the natural behavior of ants, such as cooperation and adaptation, to allow for a flexible, robust search for a good candidate solution. Results: Ant-C has been tested on three datasets selected from the Stanford Genomic Resource Database and achieved relatively high accuracy compared to other classical clustering methods. Ant-ARM has been tested on the acute lymphoblastic leukemia (ALL)/acute myeloid leukemia (AML) dataset and generated about 30 classification rules with high accuracy. Conclusions: Ant-C can generate an optimal number of clusters without incorporating any other algorithms such as K-means or agglomerative hierarchical clustering. For associative classification, while a few of the well-known algorithms such as Apriori, FP-growth and Magnum Opus are unable to mine any association rules from the ALL/AML dataset within a reasonable period of time, Ant-ARM is able to extract associative classification rules.

Relevance: 100.00%

Abstract:

The use of quantitative methods has become increasingly important in the study of neurodegenerative disease. Disorders such as Alzheimer's disease (AD) are characterized by the formation of discrete, microscopic, pathological lesions which play an important role in pathological diagnosis. This article reviews the advantages and limitations of the different methods of quantifying the abundance of pathological lesions in histological sections, including estimates of density, frequency, coverage, and the use of semiquantitative scores. The major sampling methods by which these quantitative measures can be obtained from histological sections, including plot or quadrat sampling, transect sampling, and point-quarter sampling, are also described. In addition, the data analysis methods commonly used to analyse quantitative data in neuropathology, including analyses of variance (ANOVA) and principal components analysis (PCA), are discussed. These methods are illustrated with reference to particular problems in the pathological diagnosis of AD and dementia with Lewy bodies (DLB).
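
As an illustration of the kind of analysis of variance mentioned above, the following is a minimal sketch of a one-way ANOVA F statistic. The region names and lesion densities are hypothetical values invented for the example, not data from the article, and the function name `one_way_anova_f` is an assumption of this sketch.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups.

    Each group is a list of measurements (e.g. lesion densities
    counted in sample plots from one brain region).
    """
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and n > k observations")

    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)

    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical lesion densities (lesions per plot) in three regions.
densities = {
    "frontal": [12.0, 15.0, 11.0, 14.0],
    "temporal": [18.0, 21.0, 19.0, 22.0],
    "occipital": [9.0, 8.0, 11.0, 10.0],
}
f_value = one_way_anova_f(list(densities.values()))
print(f"F = {f_value:.2f}")
```

The F value would then be compared with the critical value of the F distribution for k - 1 and n - k degrees of freedom; that lookup is left out to keep the sketch dependency-free.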

Relevance: 100.00%

Abstract:

This paper reports on the results of research into the connections between transaction attributes and buyer-supplier relationships (BSRs) in advanced manufacturing technology (AMT) acquisition and implementation. The investigation began by examining the impact of the different patterns of BSR on the performance of the AMT acquisition. In understanding the phenomena, the study drew upon and integrated the literature of transaction cost economics theory, BSRs, and AMT, and used this as the basis for a theoretical framework and hypotheses development. This framework was then empirically tested using data that were gathered through a questionnaire survey with 147 companies and analyzed using a structural equation modeling technique. The results of the analysis indicated that the higher the level of technological specificity and uncertainty, the more firms are likely to engage in a stronger relationship with technology suppliers. However, the complexity of the technology being implemented was associated with BSR only indirectly through its association with the level of uncertainty (which has a direct impact upon BSR). The analysis also provided strong support for the premise that developing strong BSR could lead to an improved performance in acquiring and implementing AMT. The implications of the study are offered for both the academic and practitioner audience.

Relevance: 100.00%

Abstract:

In some studies, the data are not measurements but comprise counts or frequencies of particular events. In such cases, an investigator may be interested in whether one specific event happens more frequently than another or whether an event occurs with a frequency predicted by a scientific model.
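
A common way to test whether observed counts match the frequencies predicted by a model is Pearson's chi-square goodness-of-fit statistic. The sketch below is only an illustration under assumed data: the counts are hypothetical, and the abstract does not say which test the authors use.

```python
def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic for observed vs expected counts."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts: event A occurred 60 times and event B 40 times.
# The model being tested predicts that both events are equally frequent.
observed = [60, 40]
total = sum(observed)
expected = [total / 2, total / 2]

stat = chi_square_statistic(observed, expected)

# With two categories there is 1 degree of freedom; the 5% critical
# value of the chi-square distribution for 1 df is 3.841.
CRITICAL_5_PERCENT_1_DF = 3.841
print(stat, stat > CRITICAL_5_PERCENT_1_DF)  # prints: 4.0 True
```

Here the statistic of 4.0 exceeds 3.841, so under these made-up counts the equal-frequency model would be rejected at the 5% level.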

Relevance: 100.00%

Abstract:

PCA/FA is a method of analyzing complex data sets in which there are no clearly defined X or Y variables. It has multiple uses, including the study of the pattern of variation between individual entities, such as patients with particular disorders, and the detailed study of descriptive variables. In most applications, variables are related to a smaller number of ‘factors’ or PCs that account for the maximum variance in the data and hence may explain important trends among the variables. An increasingly important application of the method is in the ‘validation’ of questionnaires that attempt to relate subjective aspects of a patient's experience with more objective measures of vision.
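
To make the idea of PCs that "account for the maximum variance" concrete, here is a minimal sketch of PCA restricted to two variables, where the eigenvalues of the 2x2 covariance matrix have a closed form. The function name `pca_2d` and the questionnaire and acuity scores are hypothetical; a real analysis would normally use a numerical library and more variables.

```python
import math


def pca_2d(xs, ys):
    """PCA for two variables via the 2x2 sample covariance matrix.

    Returns (variance of PC1, variance of PC2, fraction explained by PC1).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with n >= 2")

    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    syy = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

    # Eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    centre = (sxx + syy) / 2
    radius = math.hypot((sxx - syy) / 2, sxy)
    lam1, lam2 = centre + radius, centre - radius

    total = lam1 + lam2
    if total == 0:
        raise ValueError("both variables are constant")
    return lam1, lam2, lam1 / total


# Hypothetical scores: a subjective questionnaire item and an
# objective visual acuity measure for six patients.
questionnaire = [2.0, 3.0, 5.0, 6.0, 8.0, 9.0]
acuity = [1.1, 1.9, 2.8, 3.5, 4.2, 5.1]

var_pc1, var_pc2, explained = pca_2d(questionnaire, acuity)
print(f"PC1 explains {explained:.1%} of the total variance")
```

When the two variables are strongly correlated, almost all of the variance falls on the first PC, which is the sense in which a single factor can summarise several related measures.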

Relevance: 100.00%

Abstract:

This thesis reports the results of research into the connections between transaction attributes and buyer-supplier relationships (BSR) in advanced manufacturing technology (AMT) acquisition and implementation. It also examines the impact of the different patterns of BSR on performance. Specifically, it addresses how three transaction attributes, namely level of complexity, level of asset specificity, and level of uncertainty, can affect the relationship between the technology buyer and supplier in AMT acquisition and implementation, and then examines the impact of different patterns of BSR on two aspects of performance, namely technology and implementation performance. In understanding the phenomena, the study mainly draws on and integrates the literature of transaction cost economics theory, buyer-supplier relationships and advanced manufacturing technology as a basis for the theoretical framework and hypotheses development. Data were gathered through a questionnaire survey with 147 responses and seven semi-structured interviews of manufacturing firms in Malaysia. Quantitative data were analysed mainly using the AMOS (Analysis of Moment Structures) package for structural equation modeling and SPSS (Statistical Package for the Social Sciences) for analysis of variance (ANOVA). Data from interview sessions were used to develop a case study with the intention of providing a richer and deeper understanding of the subject under investigation and to offer triangulation in the research process.
The results of the questionnaire survey indicate that the higher the level of technological specificity and uncertainty, the more firms are likely to engage in a closer relationship with technology suppliers. However, the complexity of the technology being implemented is associated with BSR only because it is associated with the level of uncertainty that has a direct impact upon BSR. The analysis also provides strong support for the premise that developing strong BSR could lead to an improved performance. However, with high levels of transaction attribute, implementation performance suffers more when firms have weak relationships with technology suppliers than with moderate and low levels of transaction attributes. The implications of the study are offered for both the academic and practitioner audience. The thesis closes with reports on its limitations and suggestions for further research that would address some of these limitations.

Relevance: 100.00%

Abstract:

INTAMAP is a Web Processing Service for the automatic spatial interpolation of measured point data. Requirements were (i) using open standards for spatial data such as developed in the context of the Open Geospatial Consortium (OGC), (ii) using a suitable environment for statistical modelling and computation, and (iii) producing an integrated, open source solution. The system couples an open-source Web Processing Service (developed by 52°North), accepting data in the form of standardised XML documents (conforming to the OGC Observations and Measurements standard) with a computing back-end realised in the R statistical environment. The probability distribution of interpolation errors is encoded with UncertML, a markup language designed to encode uncertain data. Automatic interpolation needs to be useful for a wide range of applications and the algorithms have been designed to cope with anisotropy, extreme values, and data with known error distributions. Besides a fully automatic mode, the system can be used with different levels of user control over the interpolation process.
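
For readers unfamiliar with spatial interpolation of point data, the sketch below shows the general task using inverse distance weighting (IDW). This is a deliberately simple stand-in chosen for illustration: it is not one of INTAMAP's algorithms, it runs in Python rather than the R back-end described above, and it produces no error distribution. The function name and the measurement values are hypothetical.

```python
import math


def idw_interpolate(points, target, power=2.0):
    """Estimate a value at `target` from measured points.

    points: list of (x, y, value) tuples; target: (x, y).
    Each measurement is weighted by 1 / distance**power.
    """
    if not points:
        raise ValueError("at least one measured point is required")

    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in points:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # target coincides with a measurement
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical measurements at three monitoring stations.
stations = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0), (0.0, 2.0, 30.0)]
estimate = idw_interpolate(stations, (1.0, 1.0))
print(estimate)
```

Unlike the statistical approach described in the abstract, IDW yields only a point estimate, which is why it cannot supply the interpolation error distribution that INTAMAP encodes with UncertML.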

Relevance: 100.00%

Abstract:

We introduce a flexible visual data mining framework which combines advanced projection algorithms from the machine learning domain and visual techniques developed in the information visualization domain. The advantage of such an interface is that the user is directly involved in the data mining process. We integrate principled projection algorithms, such as generative topographic mapping (GTM) and hierarchical GTM (HGTM), with powerful visual techniques, such as magnification factors, directional curvatures, parallel coordinates and billboarding, to provide a visual data mining framework. Results on a real-life chemoinformatics dataset using GTM are promising and have been analytically compared with the results from the traditional projection methods. It is also shown that the HGTM algorithm provides additional value for large datasets. The computational complexity of these algorithms is discussed to demonstrate their suitability for the visual data mining framework. Copyright 2006 ACM.

Relevance: 100.00%

Abstract:

A recent novel approach to the visualisation and analysis of datasets, and one which is particularly applicable to those of a high dimension, is discussed in the context of real applications. A feed-forward neural network is utilised to effect a topographic, structure-preserving, dimension-reducing transformation of the data, with an additional facility to incorporate different degrees of associated subjective information. The properties of this transformation are illustrated on synthetic and real datasets, including the 1992 UK Research Assessment Exercise for funding in higher education. The method is compared and contrasted to established techniques for feature extraction, and related to topographic mappings, the Sammon projection and the statistical field of multidimensional scaling.
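
The Sammon projection mentioned above judges a low-dimensional layout by how well it preserves pairwise distances. A minimal sketch of the Sammon stress measure follows; it only evaluates a given layout and is not the neural network method of the paper. It assumes Python 3.8+ for `math.dist` and that no two high-dimensional points coincide. The points are hypothetical.

```python
import math
from itertools import combinations


def sammon_stress(high, low):
    """Sammon stress between original points and their projection.

    high, low: equally long lists of coordinate tuples. Zero stress
    means every pairwise distance is preserved exactly.
    """
    if len(high) != len(low):
        raise ValueError("both point sets must have the same length")

    distance_total = 0.0
    error = 0.0
    for i, j in combinations(range(len(high)), 2):
        d_original = math.dist(high[i], high[j])    # assumed non-zero
        d_projected = math.dist(low[i], low[j])
        distance_total += d_original
        error += (d_original - d_projected) ** 2 / d_original
    return error / distance_total


# Hypothetical 3-D points lying in the z = 0 plane, so dropping the
# z coordinate preserves all pairwise distances.
points_3d = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
points_2d = [(x, y) for x, y, _ in points_3d]
print(sammon_stress(points_3d, points_2d))  # prints: 0.0
```

Because each squared error is divided by the original distance, mismatches between points that are close together in the original space are penalised more heavily than mismatches between distant points.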

Relevance: 100.00%

Abstract:

DUE TO INCOMPLETE PAPERWORK, ONLY AVAILABLE FOR CONSULTATION AT ASTON UNIVERSITY LIBRARY AND INFORMATION SERVICES WITH PRIOR ARRANGEMENT

Relevance: 100.00%

Abstract:

This paper surveys the context of feature extraction by neural network approaches, and compares and contrasts their behaviour as prospective data visualisation tools in a real world problem. We also introduce and discuss a hybrid approach which allows us to control the degree of discriminatory and topographic information in the extracted feature space.

Relevance: 100.00%

Abstract:

This paper explores how transaction attributes of technology affect differences in the relationship between technology buyers and suppliers. It also examines the impact on performance of different patterns of relationship between technology buyers and suppliers. Data obtained from 147 manufacturing firms in Malaysia are used to test several hypotheses, which were derived from a review of the literature on technology, transaction cost theory and buyer–supplier relationships (BSR). The research results indicate that the higher the level of technological complexity, specificity and uncertainty, the more firms are likely to engage in a closer relationship with technology suppliers. Even though the majority of firms reported improvements in their performance, results indicate that firms demonstrating a closer relationship with technology suppliers are more likely to achieve higher levels of performance than those that do not. It is also shown that with high levels of transaction attribute, implementation performance suffers more when firms have weak relationships with technology suppliers than with moderate and low levels of transaction attribute.