986 results for Missing values


Relevance: 30.00%

Abstract:

The fuzzy min–max neural network classifier is a supervised learning method that combines neural networks with fuzzy systems. All input variables in the network are required to be continuously valued, which can be a significant constraint in the many real-world situations where the data are categorical as well as quantitative. The usual way of dealing with categorical variables is to replace the categories with numerical values and treat them as if they were continuously valued. This method, however, implicitly defines a possibly unsuitable metric for the categories. A number of different procedures have been proposed to tackle the problem. In this article, we present a new method. The procedure extends the fuzzy min–max neural network input to categorical variables by introducing new fuzzy sets, a new operation, and a new architecture. This provides greater flexibility and wider application. The proposed method is then applied to missing data imputation in voting intention polls. The micro data of this type of poll (the set of the respondents’ individual answers to the questions) are especially suited for evaluating the method, since they include a large number of numerical and categorical attributes.
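The point about numerical coding implicitly defining a metric can be illustrated with a minimal sketch. The category names below are hypothetical, and the one-hot comparison is only a common baseline used here for contrast; it is not the fuzzy-set construction proposed in the article.

```python
from itertools import combinations

# Hypothetical categories for a voting-intention question (an assumption,
# not taken from the article).
CATEGORIES = ["party_a", "party_b", "party_c"]


def integer_code(category):
    """Replace a category by its position in the list (the 'usual way')."""
    return float(CATEGORIES.index(category))


def one_hot(category):
    """Encode a category as an indicator vector, one slot per category."""
    return [1.0 if c == category else 0.0 for c in CATEGORIES]


def integer_distance(a, b):
    """Distance implied by treating integer codes as continuous values."""
    return abs(integer_code(a) - integer_code(b))


def one_hot_distance(a, b):
    """Half the L1 distance between indicator vectors: 1 if a != b, else 0."""
    return sum(abs(x - y) for x, y in zip(one_hot(a), one_hot(b))) / 2.0


if __name__ == "__main__":
    for a, b in combinations(CATEGORIES, 2):
        print(a, b, integer_distance(a, b), one_hot_distance(a, b))
```

With integer codes, `party_a` ends up twice as far from `party_c` as from `party_b`, purely as a side effect of the arbitrary ordering; the indicator encoding keeps all distinct categories equally distant.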

Relevance: 30.00%

Abstract:

Objective: To estimate cut-off points for the diagnosis of diabetes mellitus (DM) based on individual risk factors. Methods: A subset of the 1991 Oman National Diabetes Survey is used, including all patients with a 2-h post-glucose-load value >= 200 mg/dl (278 subjects) and a control group of 286 subjects. All subjects previously diagnosed as diabetic and all subjects with missing data values were excluded. The data set was analyzed with the SPSS Clementine data mining system. Decision tree learners (C5 and CART) and a method for mining association rules (the GRI algorithm) are used. Fasting plasma glucose (FPG), age, sex, family history of diabetes and body mass index (BMI) are the input risk factors (independent variables), while diabetes onset (2-h post-glucose-load value >= 200 mg/dl) is the output (dependent variable). All three techniques were tested by cross-validation (89.8%). Results: The rules produced for diabetes diagnosis are: (A) GRI algorithm: (1) FPG >= 108.9 mg/dl, (2) FPG >= 107.1 and age > 39.5 years. (B) CART decision tree: FPG >= 110.7 mg/dl. (C) C5 decision tree learner: (1) FPG>=95.5 and 54, (2) FPG>=106 and 25.2 kg/m2. (3) FPG>=106 and =133 mg/dl. The three techniques produced rules that cover a significant number of cases (82%), with confidence between 74 and 100%. Conclusion: Our approach supports the suggestion that the present cut-off value of fasting plasma glucose (126 mg/dl) for the diagnosis of diabetes mellitus needs revision, and that individual risk factors such as age and BMI should be considered in defining the new cut-off value.
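The GRI and CART rules quoted in the Results can be written as simple threshold checks. This is only an illustrative sketch: the function names and rule labels are hypothetical, the C5 rules are omitted because their conditions are incomplete in the text above, and nothing here is a diagnostic tool.

```python
def gri_rule_1(fpg_mg_dl):
    """GRI rule (1): FPG >= 108.9 mg/dl."""
    return fpg_mg_dl >= 108.9


def gri_rule_2(fpg_mg_dl, age_years):
    """GRI rule (2): FPG >= 107.1 and age > 39.5 years."""
    return fpg_mg_dl >= 107.1 and age_years > 39.5


def cart_rule(fpg_mg_dl):
    """CART rule: FPG >= 110.7 mg/dl."""
    return fpg_mg_dl >= 110.7


def fired_rules(fpg_mg_dl, age_years):
    """Return the labels (hypothetical names) of the rules that fire."""
    fired = []
    if gri_rule_1(fpg_mg_dl):
        fired.append("GRI-1")
    if gri_rule_2(fpg_mg_dl, age_years):
        fired.append("GRI-2")
    if cart_rule(fpg_mg_dl):
        fired.append("CART")
    return fired


if __name__ == "__main__":
    # Example values chosen for illustration only.
    print(fired_rules(108.0, 45))
    print(fired_rules(111.0, 30))
```

All three rules fire below the 126 mg/dl cut-off that the Conclusion questions, which is the comparison the abstract draws.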

Relevance: 30.00%

Abstract:

We present a new method for ecologically sustainable land use planning within multiple land use schemes. Our aims were (1) to develop a method that can be used to locate important areas based on their ecological values; (2) to evaluate the quality, quantity, availability, and usability of existing ecological data sets; and (3) to demonstrate the use of the method in Eastern Finland, where nature conservation, tourism, and recreation must be developed simultaneously. We compiled all available ecological data sets from the study area, filled in the missing data using habitat suitability modeling, calculated the total ecological score (TES) for each 1 ha grid cell in the study area, and finally demonstrated the use of TES in assessing how well nature conservation covers ecologically valuable areas and in locating ecologically sustainable areas for tourism and recreational infrastructure. The method operated quite well at the level required for regional and local scale planning. The quality, quantity, availability, and usability of existing data sets were generally high, and they could be further complemented by modeling. There are still constraints that limit the use of the method in practical land use planning. However, as more data become available under open access and modeling tools improve, the usability and applicability of the method will increase.

Relevance: 20.00%

Abstract:

Field and laboratory measurements identified a complex relationship between odour emission rates provided by the US EPA dynamic emission chamber and the University of New South Wales wind tunnel. Using a range of model compounds in an aqueous odour source, we demonstrate that emission rates derived from the wind tunnel and flux chamber are a function of the solubility of the materials being emitted, the concentrations of the materials within the liquid, and the aerodynamic conditions within the device: either velocity in the wind tunnel or flushing rate for the flux chamber. The ratio of wind tunnel to flux chamber odour emission rates (OU m-2 s) ranged from about 60:1 to 112:1. The emission rates of the model odorants varied from about 40:1 to over 600:1. These results may provide, for the first time, a basis for the development of a model allowing an odour emission rate derived from either device to be used for odour dispersion modelling.