919 results for Multi-Factor Model, Missing Data
Abstract:
The fuzzy min–max neural network classifier is a supervised learning method that takes a hybrid approach combining neural networks and fuzzy systems. All of the network's input variables must be continuous-valued, which can be a significant constraint in many real-world situations where the data are not only quantitative but also categorical. The usual way of handling such variables is to replace the categories with numerical values and treat them as if they were continuous. However, this implicitly defines a metric on the categories that may be unsuitable. A number of different procedures have been proposed to tackle the problem. In this article, we present a new method that extends the fuzzy min–max neural network input to categorical variables by introducing new fuzzy sets, a new operation, and a new architecture. This provides greater flexibility and wider applicability. The proposed method is then applied to missing data imputation in voting intention polls. The microdata of this type of poll (the set of respondents' individual answers to the questions) are especially well suited to evaluating the method because they include a large number of numerical and categorical attributes.
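To make the metric problem concrete, the following minimal Python sketch assumes a hypothetical three-category poll attribute (the category names `party_a`, `party_b`, `party_c` and both helper functions are illustrative, not taken from the article). Integer codes make one pair of categories look twice as far apart as another, whereas a one-hot encoding, shown only for contrast and not as the article's proposed method, keeps all distinct categories equidistant.

```python
# Hypothetical categorical attribute, e.g. a party preference in a poll.
categories = ["party_a", "party_b", "party_c"]

# Naive encoding: each category is replaced by an integer code 0, 1, 2, ...
code = {c: float(i) for i, c in enumerate(categories)}


def naive_distance(x: str, y: str) -> float:
    """Distance a numeric-only network implicitly uses once categories become numbers."""
    return abs(code[x] - code[y])


def one_hot(x: str) -> list[float]:
    """One-hot vector for category x (contrast only; not the article's method)."""
    return [1.0 if c == x else 0.0 for c in categories]


def one_hot_distance(x: str, y: str) -> float:
    """L1 distance between one-hot vectors: 2.0 for any two distinct categories."""
    return sum(abs(a - b) for a, b in zip(one_hot(x), one_hot(y)))
```

With the integer codes, `naive_distance("party_a", "party_c")` is 2.0 while `naive_distance("party_a", "party_b")` is 1.0, an ordering the categories themselves do not justify.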
Abstract:
This paper proposes a new methodology for object-based 2-D data fusion with a multiscale character. The methodology is intended for use in agriculture, specifically in characterizing the water status of different crops, so as to support appropriate water management at the farm-holding scale. As a first step in its evaluation, vegetation-cover vigor data were integrated with texture data. For this purpose, NDVI maps were calculated from a multispectral image and lacunarity maps from the panchromatic image. Preliminary results show that the methodology is viable for integrating and managing large volumes of data that characterize the behavior of agricultural covers at the farm-holding scale.
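The two input layers can be sketched as follows. This is a minimal pure-Python illustration, assuming co-registered bands stored as lists of rows: NDVI uses the standard ratio (NIR - Red) / (NIR + Red), and lacunarity uses the common gliding-box estimator Λ(r) = E[M²] / E[M]², where M is the sum of pixel values in an r × r box. The abstract does not say which lacunarity estimator or window sizes were used, so both functions are assumptions; a lacunarity map would be obtained by applying the estimator inside a moving window over the panchromatic image.

```python
def ndvi(nir: list[list[float]], red: list[list[float]]) -> list[list[float]]:
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red); returns 0.0 where both bands are zero."""
    return [
        [(n - r) / (n + r) if (n + r) != 0 else 0.0 for n, r in zip(nrow, rrow)]
        for nrow, rrow in zip(nir, red)
    ]


def gliding_box_lacunarity(img: list[list[float]], r: int) -> float:
    """Gliding-box lacunarity E[M^2] / E[M]^2 for box size r (assumed estimator)."""
    rows, cols = len(img), len(img[0])
    if r < 1 or r > rows or r > cols:
        raise ValueError("box size must fit inside the image")
    # Slide an r x r box over every position and record its mass (sum of values).
    masses = [
        sum(img[i + di][j + dj] for di in range(r) for dj in range(r))
        for i in range(rows - r + 1)
        for j in range(cols - r + 1)
    ]
    mean = sum(masses) / len(masses)
    if mean == 0:
        raise ValueError("lacunarity is undefined for an all-zero window")
    second_moment = sum(m * m for m in masses) / len(masses)
    return second_moment / (mean * mean)
```

A perfectly uniform image gives Λ = 1, and more heterogeneous textures give larger values, which is what makes lacunarity usable as a texture layer alongside NDVI.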
Abstract:
There are many situations in which input feature vectors are incomplete, and methods to tackle the problem have been studied for a long time. A commonly used procedure is to replace each missing value with an imputation. This paper presents a method for imputing missing categorical data from numerical and categorical variables. The imputations are based on Simpson's fuzzy min–max neural networks, in which the input variables for learning and classification are purely numerical. The proposed method extends the input to categorical variables by introducing new fuzzy sets, a new operation and a new architecture. The procedure is tested and compared with other methods using opinion poll data.
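For reference, the numeric-only baseline that the paper extends can be sketched with Simpson's hyperbox membership function, b_j(a) = (1 / 2n) Σ_i [max(0, 1 - max(0, γ·min(1, a_i - w_ji))) + max(0, 1 - max(0, γ·min(1, v_ji - a_i)))], where v_j and w_j are the hyperbox min and max points and γ is a sensitivity parameter. The sketch below assumes inputs already scaled to [0, 1] and an illustrative γ = 4.0; the hyperboxes are supplied by hand rather than learned, and the paper's categorical extension (new fuzzy sets, operation and architecture) is not reproduced.

```python
from dataclasses import dataclass


@dataclass
class Hyperbox:
    v: list[float]  # min point of the hyperbox
    w: list[float]  # max point of the hyperbox
    label: str      # class associated with the hyperbox


def membership(a: list[float], box: Hyperbox, gamma: float = 4.0) -> float:
    """Simpson's membership: 1.0 inside the box, decaying with distance outside it."""
    n = len(a)
    total = 0.0
    for ai, vi, wi in zip(a, box.v, box.w):
        # Penalty for exceeding the max point in this dimension.
        total += max(0.0, 1.0 - max(0.0, gamma * min(1.0, ai - wi)))
        # Penalty for falling below the min point in this dimension.
        total += max(0.0, 1.0 - max(0.0, gamma * min(1.0, vi - ai)))
    return total / (2 * n)


def classify(a: list[float], boxes: list[Hyperbox], gamma: float = 4.0) -> str:
    """Assign the label of the hyperbox with the highest membership."""
    return max(boxes, key=lambda b: membership(a, b, gamma)).label
```

Because every input enters through the subtractions `ai - wi` and `vi - ai`, the function only makes sense for ordered numeric values, which is exactly the limitation the categorical extension addresses.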
Abstract:
Multiple-frequency bioelectrical impedance analysis (MFBIA) may be useful for monitoring fluid balance in newborn infants or for providing early prediction of the outcome following perinatal asphyxia. Reference ranges are needed to identify babies with abnormal impedance values. This was a cross-sectional observational study of 84 healthy term and near-term neonates less than 12 h postpartum. Whole-body and cerebral MFBIA measurements were performed at the bedside in the postnatal ward. Gestational age, postnatal age, gender, birthweight, head circumference and foot length were recorded. Reference values for impedance at the characteristic frequency ($Z_C$) and resistance at zero frequency ($R_0$) are reported for whole-body and cerebral impedance. Significant correlations (p < 0.05) were observed between whole-body impedance and birthweight, foot length and head circumference. Females had a significantly higher whole-body $R_0$ than males. Cerebral impedance did not correlate significantly with any of the demographic measures, and no gender differences were observed for cerebral impedance. The reference range for whole-body multi-frequency bioimpedance values in term and near-term infants within the first 12 h postpartum can be calculated from the foot length (FL) using the following equations: $Z_C = (942.9 - 4.818\,\mathrm{FL}) \pm 124.6\ \Omega$ and $R_0 = (1042 - 4.520\,\mathrm{FL}) \pm 135.5\ \Omega$. For cerebral impedance, the reference range is 29.5–48.7 Ω for $Z_C$ and 33.7–58.0 Ω for $R_0$.
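The reference-range equations are simple enough to evaluate directly. The sketch below encodes them as reported, treating each ± term as the half-width of the range; the abstract does not state the unit of FL, so the functions assume FL is given in whatever unit the original equations were fitted in. The function and constant names are illustrative.

```python
# Cerebral reference ranges in ohms, as reported in the abstract.
CEREBRAL_RANGE = {"Z_C": (29.5, 48.7), "R_0": (33.7, 58.0)}


def whole_body_reference_range(fl: float) -> dict[str, tuple[float, float]]:
    """Whole-body (low, high) ranges in ohms from foot length FL (unit as in the study)."""
    zc = 942.9 - 4.818 * fl  # centre of the Z_C range
    r0 = 1042 - 4.520 * fl   # centre of the R_0 range
    return {"Z_C": (zc - 124.6, zc + 124.6), "R_0": (r0 - 135.5, r0 + 135.5)}


def is_within(value: float, interval: tuple[float, float]) -> bool:
    """True if a measured impedance lies inside the (low, high) reference interval."""
    low, high = interval
    return low <= value <= high
```

For example, `whole_body_reference_range(80)` gives a $Z_C$ interval of approximately (432.86, 682.06) Ω and an $R_0$ interval of approximately (544.9, 815.9) Ω; a measurement outside these bounds would be flagged as abnormal.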
Abstract:
Exploratory analysis of data in all sciences seeks to find common patterns in order to gain insights into the structure and distribution of the data. Typically, visualisation methods such as principal components analysis are used, but these methods cannot easily deal with missing data, nor can they capture non-linear structure in the data. One approach to discovering complex, non-linear structure in the data is to use linked plots, or brushing, while ignoring the missing data. In this technical report, we discuss a complementary approach based on a non-linear probabilistic model. The generative topographic mapping (GTM) enables the effects of very many variables to be visualised on a single plot, which can incorporate far more structure than a two-dimensional principal components plot while also dealing with missing data. We show that the generative topographic mapping provides an optimal method for exploring the data while also replacing missing values in a dataset, particularly where a large proportion of the data is missing.
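The imputation step can be illustrated with a minimal Python sketch. It assumes a GTM has already been trained (the EM fit of the mapping and of the inverse noise variance β is not shown), so that each latent grid node k has a mapped centre y_k in data space and all nodes share the prior 1/K. For a partially observed vector, responsibilities are computed from the observed dimensions only, and each missing value is replaced by the responsibility-weighted mean of the centres, i.e. the conditional mean under the GTM's isotropic Gaussian mixture. The centres, β and the function name `gtm_impute` are illustrative assumptions; the report itself may use a different estimator.

```python
import math
from typing import Optional


def gtm_impute(x: list[Optional[float]], centres: list[list[float]], beta: float) -> list[float]:
    """Fill None entries of x using a trained GTM's mapped centres and inverse variance beta."""
    observed = [d for d, value in enumerate(x) if value is not None]
    # Log of the unnormalised responsibilities, using observed dimensions only.
    logits = [-0.5 * beta * sum((x[d] - y[d]) ** 2 for d in observed) for y in centres]
    # Log-sum-exp stabilisation before normalising.
    m = max(logits)
    weights = [math.exp(logit - m) for logit in logits]
    z = sum(weights)
    resp = [w / z for w in weights]
    # Keep observed values; replace missing ones by the conditional mean.
    return [
        value if value is not None else sum(r * y[d] for r, y in zip(resp, centres))
        for d, value in enumerate(x)
    ]
```

Because the responsibilities are driven by the observed variables, a sample that sits close to one region of the map borrows its missing values mainly from the centres in that region, which is why the approach degrades gracefully when a large proportion of entries is missing.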
Abstract:
Exploratory analysis of data seeks to find common patterns in order to gain insights into the structure and distribution of the data. In geochemistry, it is a valuable means of gaining insights into the complicated processes that make up a petroleum system. Typically, linear visualisation methods such as principal components analysis, linked plots, or brushing are used. These methods cannot be applied directly to data with missing values, and although they can capture non-linear structure locally, they struggle to capture global non-linear structure. This thesis discusses a complementary approach based on a non-linear probabilistic model. The generative topographic mapping (GTM) enables the effects of very many variables to be visualised on a single plot, which can incorporate more structure than a two-dimensional principal components plot. The model can deal with uncertainty and missing data, and it allows the non-linear structure in the data to be explored. This thesis develops a novel approach to initialising the GTM with arbitrary projections, which makes it possible to combine the GTM with algorithms such as Isomap and to fit complex non-linear structures such as the Swiss roll. Another novel extension is the incorporation of prior knowledge about the structure of the covariance matrix. This extension greatly enhances the modelling capabilities of the algorithm, resulting in a better fit to the data and better imputation of missing data. Additionally, an extensive benchmark study of the GTM's missing data imputation capabilities is performed. Furthermore, a novel approach based on missing data is introduced to benchmark the fit of probabilistic visualisation algorithms on unlabelled data. Finally, the work is complemented by evaluating the algorithms on real-life datasets from geochemical projects.
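A generic way to use missing data as a benchmark on unlabelled data is to hide a random subset of the observed entries, impute them, and score the imputations against the hidden true values. The sketch below shows that idea in plain Python under stated assumptions: the masking fraction, the RMSE score and the column-mean stand-in imputer are illustrative choices, not necessarily the protocol used in the thesis; in practice the `imputer` argument would wrap a fitted probabilistic model such as the GTM.

```python
import math
import random
from typing import Callable, Optional

Row = list[Optional[float]]


def column_mean_imputer(rows: list[Row]) -> list[list[float]]:
    """Stand-in imputer: replaces each missing entry with its column mean."""
    means = []
    for c in range(len(rows[0])):
        vals = [r[c] for r in rows if r[c] is not None]
        means.append(sum(vals) / len(vals) if vals else 0.0)
    return [[v if v is not None else means[c] for c, v in enumerate(r)] for r in rows]


def masked_rmse(
    rows: list[Row],
    imputer: Callable[[list[Row]], list[list[float]]],
    frac: float = 0.1,
    seed: int = 0,
) -> float:
    """Hide a fraction of observed entries, impute them, and return the RMSE on those entries."""
    rng = random.Random(seed)
    observed = [(i, c) for i, r in enumerate(rows) for c, v in enumerate(r) if v is not None]
    if not observed:
        raise ValueError("no observed entries to mask")
    k = max(1, int(frac * len(observed)))
    hidden = rng.sample(observed, k)
    masked = [list(r) for r in rows]  # copy so the original data stay intact
    for i, c in hidden:
        masked[i][c] = None
    filled = imputer(masked)
    squared_error = sum((filled[i][c] - rows[i][c]) ** 2 for i, c in hidden)
    return math.sqrt(squared_error / k)
```

Since the true values of the hidden entries are known, a lower score indicates a model that captures the data's structure better, without needing class labels.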
Abstract:
Exploratory analysis of petroleum geochemical data seeks to find common patterns that help distinguish between different source rocks, oils and gases, and explain their source, maturity and any intra-reservoir alteration. However, at the outset, one is typically faced with (a) a large matrix of samples, each with a range of molecular and isotopic properties, (b) a spatially and temporally unrepresentative sampling pattern, (c) noisy data and (d) often, a large number of missing values. This inhibits analysis using conventional statistical methods. Typically, visualisation methods such as principal components analysis are used, but these methods cannot easily deal with missing data, nor can they capture non-linear structure in the data. One approach to discovering complex, non-linear structure in the data is to use linked plots, or brushing, while ignoring the missing data. In this paper, we introduce a complementary approach based on a non-linear probabilistic model. Generative topographic mapping enables the effects of very many variables to be visualised on a single plot while also dealing with missing data. We show how generative topographic mapping also provides an optimal method for replacing missing values in two geochemical datasets, particularly where a large proportion of the data is missing.
Abstract:
Investment in capacity expansion remains one of the most critical decisions for a manufacturing organisation with global production facilities. Multiple factors need to be considered, which makes the decision process very complex. The purpose of this paper is to establish the state of the art in multi-factor models for capacity expansion of manufacturing plants within a corporation. The research programme, consisting of an extensive literature review and a structured assessment of the strengths and weaknesses of current research, is presented. The study found a wealth of mathematical multi-factor models for evaluating capacity expansion decisions; however, no single contribution captures all the different facets of the problem.