942 results for categorical and mix datasets
Abstract:
Research on the problem of feature selection for clustering continues to develop. This is a challenging task, mainly due to the absence of class labels to guide the search for relevant features. Categorical feature selection for clustering has rarely been addressed in the literature, with most of the proposed approaches focusing on numerical data. In this work, we propose an approach to simultaneously cluster categorical data and select a subset of relevant features. Our approach is based on a modification of a finite mixture model (of multinomial distributions), where a set of latent variables indicates the relevance of each feature. To estimate the model parameters, we implement a variant of the expectation-maximization algorithm that simultaneously selects the subset of relevant features, using a minimum message length criterion. The proposed approach compares favourably with two baseline methods: a filter based on an entropy measure and a wrapper based on mutual information. The results obtained on synthetic data illustrate the ability of the proposed expectation-maximization method to recover the ground truth. An application to real data from official statistics shows its usefulness.
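As a rough illustration of the finite mixture of multinomial distributions mentioned above, the following is a minimal sketch of plain EM for categorical data. It does not implement the feature-relevance latent variables or the minimum message length criterion of the proposed method; the function name, the smoothing constant and the random initialisation are assumptions made for the example.

```python
import math
import random

def em_categorical_mixture(data, n_clusters, n_levels, n_iter=50, seed=0):
    """Plain EM for a mixture of independent categorical features.

    data: list of rows, each a list of integer category codes.
    n_levels: number of categories of each feature.
    Returns (weights, theta, resp), with theta[k][j][c] = P(x_j = c | cluster k).
    """
    rng = random.Random(seed)
    n, d = len(data), len(n_levels)
    # Start from random soft assignments (assumption: any initialisation works here).
    resp = []
    for _ in range(n):
        row = [rng.random() + 1e-3 for _ in range(n_clusters)]
        total = sum(row)
        resp.append([r / total for r in row])
    weights, theta = [], []
    for _ in range(n_iter):
        # M-step: mixing weights and smoothed category probabilities.
        nk = [sum(resp[i][k] for i in range(n)) for k in range(n_clusters)]
        weights = [max(v / n, 1e-12) for v in nk]
        theta = []
        for k in range(n_clusters):
            per_feature = []
            for j in range(d):
                counts = [1e-2] * n_levels[j]  # small smoothing constant (assumption)
                for i in range(n):
                    counts[data[i][j]] += resp[i][k]
                total = sum(counts)
                per_feature.append([c / total for c in counts])
            theta.append(per_feature)
        # E-step: posterior cluster probabilities, computed in log space.
        for i in range(n):
            logp = [
                math.log(weights[k])
                + sum(math.log(theta[k][j][data[i][j]]) for j in range(d))
                for k in range(n_clusters)
            ]
            top = max(logp)
            p = [math.exp(v - top) for v in logp]
            total = sum(p)
            resp[i] = [v / total for v in p]
    return weights, theta, resp
```

Calling `em_categorical_mixture(rows, 2, [2, 2, 2])` on rows of three binary features returns the mixing weights, the per-cluster category probabilities and the soft assignments of each row.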
Abstract:
Research on cluster analysis for categorical data continues to develop, with new clustering algorithms being proposed. However, in this context, the determination of the number of clusters is rarely addressed. We propose a new approach in which clustering and the estimation of the number of clusters are done simultaneously for categorical data. We assume that the data originate from a finite mixture of multinomial distributions and use a minimum message length (MML) criterion to select the number of clusters (Wallace and Bolton, 1986). For this purpose, we implement an EM-type algorithm (Silvestre et al., 2008) based on the approach of Figueiredo and Jain (2002). The novelty of the approach rests on the integration of model estimation and selection of the number of clusters in a single algorithm, rather than selecting this number from a set of pre-estimated candidate models. The performance of our approach is compared with the use of the Bayesian Information Criterion (BIC) (Schwarz, 1978) and the Integrated Completed Likelihood (ICL) (Biernacki et al., 2000) using synthetic data. The results illustrate the capacity of the proposed algorithm to attain the true number of clusters while outperforming BIC and ICL in speed, which is especially relevant when dealing with large data sets.
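For contrast with the integrated approach, the BIC baseline scores each pre-estimated candidate model separately. A minimal sketch of that score for a mixture of multinomial distributions follows, assuming independent categorical features within each cluster; the function names are hypothetical and the log-likelihood is taken as given from a separately fitted model.

```python
import math

def n_free_parameters(n_clusters, n_levels):
    """Free parameters: (K - 1) mixing weights plus (C_j - 1) per feature and cluster."""
    return (n_clusters - 1) + n_clusters * sum(c - 1 for c in n_levels)

def bic(log_likelihood, n_clusters, n_levels, n_obs):
    """Schwarz's criterion in the 'smaller is better' form."""
    penalty = n_free_parameters(n_clusters, n_levels) * math.log(n_obs)
    return -2.0 * log_likelihood + penalty
```

Under this baseline, one model is fitted per candidate number of clusters and the one with the smallest score is retained, which is the repeated estimation that the single-algorithm approach avoids.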
Abstract:
OBJECTIVE: To describe the distribution of edentulism and estimate the prevalence of functional dentition and shortened dental arch among the elderly population. METHODS: A population-based epidemiological study was carried out with a sample of 5,349 respondents aged 65 to 74 years obtained from the 2002 and 2003 Brazilian Ministry of Health/Division of Oral Health survey database. The following variables were studied: gender; macroregion of residence; missing teeth; percentage that met the World Health Organization goal for oral health in the age group 65 to 74 years (50% having at least 20 natural teeth); presence of shortened dental arch; number of posterior occluding pairs of teeth. The Chi-square test assessed the association between categorical variables. The Kruskal-Wallis and Mann-Whitney tests were used to assess differences in the number of posterior occluding pairs of teeth by macroregion and gender. RESULTS: The elderly population had an average of 5.49 teeth (SD: 7.93) with a median of 0. The proportion of completely edentulous respondents was 54.7%. Complete edentulism was 18.2% in the upper arch and 1.9% in the lower arch. The World Health Organization goal was achieved in 10% of all respondents studied. However, only 2.7% had acceptable masticatory function and aesthetics (having at least a shortened dental arch) and a mean number of posterior occluding pairs of 6.94 (SD: 2.97). There were significant differences between men and women in the percentage of respondents that met the World Health Organization goal and in the presence of shortened dental arch. There were differences in shortened dental arch between macroregions. CONCLUSIONS: The Brazilian epidemiological oral health survey showed a high rate of edentulism and a low rate of shortened dental arch in the elderly population studied, suggesting significant functional and aesthetic impairment in all Brazilian macroregions, especially among women.
Abstract:
This study puts forward an integrated recycling approach for thermoset-based glass fibre reinforced polymer (GFRP) rejects derived from the pultrusion manufacturing industry. Both the recycling process and the development of a new cost-effective end-use application for the recyclates were considered. For this purpose, i) among the several available recycling techniques for thermoset-based composite materials, the most suitable one for the envisaged application was selected (mechanical recycling); and ii) experimental work was carried out to assess the added value of the obtained recyclates as aggregate and reinforcement replacements in concrete-polymer composite materials. The potential recycling solution was assessed through the mechanical behaviour of the resulting GFRP waste modified concrete-polymer composites relative to unmodified materials. In the mix design process of the new GFRP waste based composite material, the recyclate content and size grade, and the effect of incorporating an adhesion promoter, were considered as material factors and systematically tested within reasonable ranges. The optimization of the modified formulations was supported by the Fuzzy Boolean Nets methodology, which made it possible to find the balance between material parameters that maximizes both the flexural and compressive strengths of the final composite. Compared with related end-use applications of GFRP wastes in cementitious concrete materials, the proposed solution overcomes some of the problems found, namely the possible incompatibilities arising from the alkali-silica reaction and the decrease in mechanical properties due to the high water-cement ratio required to achieve the desired workability. The results obtained were very promising towards a global cost-effective waste management solution for GFRP industrial wastes and end-of-life products that will lead to a more sustainable composite materials industry.
Abstract:
In this study, the effect of incorporating recycled glass fibre reinforced plastics (GFRP) waste materials, obtained by means of shredding and milling processes, on the mechanical behaviour of polyester polymer mortars (PM) was assessed. For this purpose, different contents of GFRP recyclates, from 4% to 12% by weight, were incorporated into polyester PM materials as sand aggregate and filler replacements. The effect of adding a silane coupling agent to the resin binder was also evaluated. The waste material used came from the shredding of leftovers resulting from the cutting and assembly processes of GFRP pultrusion profiles. Currently, these leftovers, as well as non-conforming products and scrap resulting from the pultrusion manufacturing process, are landfilled, with additional costs to producers and suppliers. Hence, besides the evident environmental benefits, a viable and feasible solution for these wastes would also lead to significant economic advantages. Design of experiments and data treatment were accomplished by means of a full factorial design approach and analysis of variance (ANOVA). Experimental results were promising regarding the recyclability of GFRP waste materials as partial replacement of aggregates and reinforcement for PM materials, with significant improvements in the mechanical properties of the resulting mortars relative to waste-free formulations.
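The full factorial design mentioned above tests every combination of factor levels. A minimal sketch of how such a run list can be enumerated is given here; the factor names and the intermediate 8% level are hypothetical, since the abstract only states the 4% to 12% range and the presence or absence of the silane coupling agent.

```python
from itertools import product

def full_factorial(factors):
    """Return one dict per experimental run, covering every combination of levels."""
    names = list(factors)
    level_lists = [factors[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*level_lists)]

# Hypothetical levels: only the 4%-12% range and the silane factor come from the text.
factors = {
    "waste_content_pct": [4, 8, 12],
    "silane_coupling_agent": [False, True],
}
runs = full_factorial(factors)
```

With three content levels and two coupling-agent levels, the design has 3 x 2 = 6 formulations, each of which would then feed the analysis of variance.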
Abstract:
The purpose of this article is to analyse and evaluate the economic, energy and environmental impacts of the increasing penetration of renewable energy and electric vehicles (EVs) in isolated systems, such as Terceira Island in the Azores and Madeira Island. Given that the islands are extremely dependent on imported fossil fuels, not only for energy production but also for the transportation sector, the aim is to analyse how that dependency can be reduced and to determine the resulting reduction in pollutant gas emissions. Different scenarios were analysed, with and without the penetration of EVs. Terceira Island is an interesting case study, where EV charging during off-peak hours could allow an increase in geothermal power, which is limited by the valley of power demand. The share of renewable energy in the electric power mix could reach 74% in 2030, while emissions of pollutant gases could be reduced by 45% and purchases of fossil fuels by 44%. In Madeira, apart from wind, solar and small hydro power, there are few endogenous resources, and the island's emission factor cannot be reduced as much as in Terceira. Nevertheless, it is possible to reduce fossil fuel imports and emissions by 1.8% in 2030 when compared with a BAU scenario with 14% of the LD fleet composed of EVs.
Abstract:
To estimate the mid-point of an open-ended income category and to assess the impact of two equivalence scales on income-health associations. Data were obtained from the 2010 Brazilian Oral Health Survey (Pesquisa Nacional de Saúde Bucal, SBBrasil 2010). Income was converted from a categorical variable to two continuous variables (per capita and equivalized) for each mid-point. The median mid-point was R$ 14,523.50 and the mean was R$ 24,507.10. When per capita income was applied, 53% of the population were below the poverty line, compared with 15% with equivalized income. The magnitude of income-health associations was similar for continuous income, but categorized equivalized income tended to decrease the strength of association.
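To make the difference between the two income measures concrete, here is a minimal sketch comparing per capita income with an equivalized income. The abstract does not name the equivalence scale used, so the square-root scale below is an assumption, as are the function names and the example figures.

```python
import math

def per_capita_income(household_income, household_size):
    """Every member counts fully: income divided by household size."""
    return household_income / household_size

def equivalized_income(household_income, household_size):
    """Square-root scale (assumption): shared costs make extra members 'cheaper'."""
    return household_income / math.sqrt(household_size)

# Hypothetical household: R$ 4,000 shared by 4 people.
pc = per_capita_income(4000.0, 4)
eq = equivalized_income(4000.0, 4)
```

For any household with more than one member the equivalized value is the larger of the two, which is in line with fewer people falling below a fixed poverty line when equivalized income is used.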
Abstract:
The Iberian viticultural regions are organized according to the Denomination of Origin (DO) and present different climates, soils, topography and management practices. All these elements influence the vegetative growth of different varieties throughout the peninsula, and are tied to grape quality and wine type. In the current study, an integrated analysis of climate, soil, topography and vegetative growth was performed for the Iberian DO regions, using state-of-the-art datasets. For climatic assessment, a categorized index accounting for phenological/thermal development, water availability and grape ripening conditions was computed. Soil textural classes were established to distinguish soil types. Elevation and aspect (orientation) were also taken into account, as the leading topographic elements. A spectral vegetation index was used to assess grapevine vegetative growth and an integrated analysis of all variables was performed. The results showed that the integrated climate-soil-topography influence on vine performance is evident. Most Iberian vineyards are grown in temperate dry climates with loamy soils, presenting low vegetative growth. Vineyards in temperate humid conditions tend to show higher vegetative growth. Conversely, in cooler/warmer climates, lower vigour vineyards prevail and other factors, such as soil type and precipitation, play more important roles in driving vigour. Vines in prevailing loamy soils are grown over a wide climatic diversity, suggesting that precipitation is the primary factor influencing vigour. The present assessment of terroir characteristics allows direct comparison among wine regions and may have great value to viticulturists, particularly under a changing climate.
Abstract:
OBJECTIVE: To analyze the association between sleep quality and quality of life of nursing professionals according to their work schedules. METHODS: A prospective, cross-sectional, observational study was conducted between January and December 2010, with 264 nursing professionals, drawn from 989 subjects at Botucatu General Hospital and stratified by professional category. The Pittsburgh Sleep Quality Index and the WHOQOL-bref were administered to evaluate sleep quality and quality of life, respectively. Self-reported demographic data were collected with a standard form. Continuous variables were reported as means and standard deviations, and categorical variables were expressed as proportions. Associations were evaluated using Spearman's correlation coefficient. The association of night-shift work and gender with sleep disturbance was evaluated by logistic regression analysis using a model adjusted for age and considering sleep disturbance the dependent variable. The level of significance was p < 0.05. RESULTS: Night-shift work was associated with severe worsening of at least one component of sleep quality in the model adjusted for age (OR = 1.91; 95%CI 1.04;3.50; p = 0.036). Female gender was associated with sleep disturbance (OR = 3.40; 95%CI 1.37;8.40; p = 0.008). Quality of life and quality of sleep were closely correlated (R = -0.56; p < 0.001). CONCLUSIONS: Characteristics of the nursing profession affect sleep quality and quality of life, and these two variables are associated.
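The reported odds ratios, confidence intervals and p-values can be cross-checked against each other. The sketch below assumes the intervals are Wald intervals, symmetric on the log-odds scale with a 1.96 multiplier; that assumption and the function name are illustrative, not taken from the study.

```python
import math

def p_value_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Recover an approximate two-sided p-value from an OR and its 95% CI.

    Assumes a Wald interval: log(OR) +/- z_crit * SE.
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided tail probability of a standard normal variable.
    return math.erfc(abs(z) / math.sqrt(2.0))

p_night_shift = p_value_from_or_ci(1.91, 1.04, 3.50)
p_female = p_value_from_or_ci(3.40, 1.37, 8.40)
```

Both recovered values land close to the reported p = 0.036 and p = 0.008; small differences are expected because the published figures are rounded.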
Abstract:
OBJECTIVE: To analyze whether dietary patterns during the third gestational trimester are associated with birth weight. METHODS: Longitudinal study conducted in the cities of Petropolis and Queimados, Rio de Janeiro (RJ), Southeastern Brazil, between 2007 and 2008. We analyzed data from the first and second follow-up waves of a prospective cohort. Food consumption of 1,298 pregnant women was assessed using a semi-quantitative food frequency questionnaire. Dietary patterns were obtained by exploratory factor analysis, using the Varimax rotation method. We also applied a multivariate linear regression model to estimate the association between food consumption patterns and birth weight. RESULTS: Four patterns of consumption, which together explain 36.4% of the variability, were identified: (1) prudent pattern (milk, yogurt, cheese, fruit and fresh-fruit juice, cracker, and chicken/beef/fish/liver), which explained 14.9% of the consumption; (2) traditional pattern, consisting of beans, rice, vegetables, breads, butter/margarine and sugar, which explained 8.8% of the variation in consumption; (3) Western pattern (potato/cassava/yams, macaroni, flour/farofa/grits, pizza/hamburger/deep fried pastries, soft drinks/cool drinks and pork/sausages/egg), which accounts for 6.9% of the variance; and (4) snack pattern (sandwich cookie, salty snacks, chocolate, and chocolate drink mix), which explains 5.7% of the consumption variability. The snack dietary pattern was positively associated with birth weight (β = 56.64; p = 0.04) in pregnant adolescents. CONCLUSIONS: For pregnant adolescents, the greater the adherence to the snack pattern during pregnancy, the greater the baby's birth weight.
Abstract:
In cluster analysis, it can be useful to interpret the partition built from the data in the light of external categorical variables which are not directly involved in clustering the data. An approach is proposed in the model-based clustering context to select a number of clusters which both fits the data well and takes advantage of the potential illustrative ability of the external variables. This approach makes use of the integrated joint likelihood of the data and the partitions at hand, namely the model-based partition and the partitions associated with the external variables. It is noteworthy that each mixture model is fitted to the data by maximum likelihood, excluding the external variables, which are used only to select a relevant mixture model. Numerical experiments illustrate the promising behaviour of the derived criterion. © 2014 Springer-Verlag Berlin Heidelberg.
Abstract:
Many learning problems require handling high dimensional datasets with a relatively small number of instances. Learning algorithms are thus confronted with the curse of dimensionality, and need to address it in order to be effective. Examples of these types of data include the bag-of-words representation in text classification problems and gene expression data for tumor detection/classification. Usually, among the high number of features characterizing the instances, many may be irrelevant (or even detrimental) for the learning tasks. It is thus clear that there is a need for adequate techniques for feature representation, reduction, and selection, to improve both the classification accuracy and the memory requirements. In this paper, we propose combined unsupervised feature discretization and feature selection techniques, suitable for medium and high-dimensional datasets. The experimental results on several standard datasets, with both sparse and dense features, show the efficiency of the proposed techniques as well as improvements over previous related techniques.
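As a simple illustration of combining unsupervised discretization with unsupervised selection, the sketch below uses equal-width binning followed by a variance ranking. These are generic stand-ins, not the techniques proposed in the paper, and all function names are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Map each value to a bin index in 0..n_bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # a constant feature carries no information
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def select_top_features(columns, n_keep):
    """Keep the indices of the n_keep columns with the largest variance (no labels used)."""
    ranked = sorted(range(len(columns)), key=lambda j: variance(columns[j]), reverse=True)
    return sorted(ranked[:n_keep])
```

Discretizing first reduces the memory needed per feature, and ranking the discretized columns without class labels drops features that barely vary across instances.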
Abstract:
Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies