29 results for k-means clustering

in Repositório Científico do Instituto Politécnico de Lisboa - Portugal


Relevance: 100.00%

Abstract:

In the present paper we compare clustering solutions using indices of paired agreement. We propose a new method, IADJUST, to correct indices of paired agreement by excluding agreement by chance. This new method overcomes limitations previously reported in the literature, as it permits the correction of any index. We illustrate its use in external clustering validation, to measure the agreement between clusters and an a priori known structure. The adjusted indices are intended to provide a realistic measure of clustering performance that excludes agreement by chance with ground truth. We use simulated data sets under a range of scenarios, considering diverse numbers of clusters, cluster overlaps and balances, to discuss the pertinence and the precision of our proposal. Precision is established through comparisons with the analytical approach to correction; specific indices that can be corrected analytically are used for this purpose. The pertinence of the proposed correction is discussed through a detailed comparison between the performance of two classical clustering approaches, namely the Expectation-Maximization (EM) and K-Means (KM) algorithms. Eight indices of paired agreement are studied and new corrected indices are obtained.
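The idea of removing chance agreement from a paired-agreement index can be sketched as follows. This is a generic illustration, not the IADJUST procedure itself, whose details the abstract does not give. It assumes the Rand index as the example index, assumes the index has a maximum of 1, and estimates the expected value under chance by shuffling one label vector; the function names and the `n_perm` parameter are hypothetical.

```python
import random
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Share of object pairs on which two partitions agree
    (paired together in both, or separated in both)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def chance_corrected(index_fn, labels_a, labels_b, n_perm=200, seed=0):
    """Generic correction (index - E) / (1 - E).

    E is estimated by averaging the index over random shuffles of labels_b.
    Assumption: the index has maximum 1. Illustrative stand-in, not IADJUST.
    """
    rng = random.Random(seed)
    observed = index_fn(labels_a, labels_b)
    shuffled = list(labels_b)
    total = 0.0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        total += index_fn(labels_a, shuffled)
    expected = total / n_perm
    if expected >= 1.0:  # degenerate case, e.g. a single cluster on both sides
        return 0.0
    return (observed - expected) / (1.0 - expected)
```

Because the correction takes the index as a function argument, the same wrapper applies to any paired-agreement index with the same maximum, which is the property the abstract highlights.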

Relevance: 90.00%

Abstract:

Audiometer systems provide enormous amounts of detailed TV watching data. Several relevant and interdependent factors may influence TV viewers' behavior. In this work we focus on the time factor and derive temporal patterns of TV watching, based on panel data. The base attributes for clustering are derived from 1440 binary minute-related attributes that capture the TV watching status (watch/not watch). Since there are around 2500 panel viewers, a data reduction procedure is performed first. The K-Means algorithm is then used to obtain daily clusters of viewers. Weekly patterns are derived from the daily patterns. The obtained solutions are tested for consistency and stability. Temporal TV watching patterns provide new insights concerning Portuguese TV viewers' behavior.
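A minimal sketch of the pipeline described above: reduce each viewer's 1440 binary minute flags to a shorter profile, then cluster the profiles with K-Means. The abstract does not say which data reduction was used, so the hourly share of minutes watched is an assumption here, as are the function names `reduce_minutes` and `k_means`. The clustering step is plain Lloyd's algorithm in pure Python.

```python
import random


def reduce_minutes(minute_flags, block=60):
    """Collapse binary minute flags into the share of minutes watched
    per block (an assumed reduction, not the one used in the paper)."""
    return [
        sum(minute_flags[start:start + block]) / block
        for start in range(0, len(minute_flags), block)
    ]


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, n_iter=50, seed=0):
    """Lloyd's algorithm: assign each point to its nearest centroid,
    recompute centroids as cluster means, stop when labels are stable."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

With `block=60`, each viewer-day becomes a 24-value profile, and the resulting centroids can be read as typical daily viewing curves.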

Relevance: 80.00%

Abstract:

Dissertation submitted for the degree of Master in Electrical Engineering, Energy branch

Relevance: 80.00%

Abstract:

Work carried out by first-year, second-semester students of the RPCE undergraduate degree, 2015, as part of the Multivariate Statistics course unit

Relevance: 30.00%

Abstract:

No literature data above atmospheric pressure could be found for the viscosity of TOTM. As a consequence, the present viscosity results could only be compared upon extrapolation of the vibrating-wire data to 0.1 MPa. Independent viscosity measurements were performed at atmospheric pressure, using an Ubbelohde capillary, in order to compare with the vibrating-wire results extrapolated by means of the above-mentioned correlation. The two data sets agree within ±1%, which is commensurate with the mutual uncertainty of the experimental methods. Comparisons of the literature data obtained at atmospheric pressure with the present extrapolated vibrating-wire viscosity measurements have shown an agreement within ±2% for temperatures up to 339 K and within ±3.3% for temperatures up to 368 K. (C) 2014 Elsevier B.V. All rights reserved.

Relevance: 30.00%

Abstract:

Solvatochromic UV-Vis shifts of four indicators (4-nitroaniline, 4-nitroanisole, 4-nitrophenol and N,N-dimethyl-4-nitroaniline) have been measured at 298.15 K in the ternary mixture methanol/1-propanol/acetonitrile (MeOH/1-PrOH/MeCN) in a total of 22 mole fractions, along with 18 additional mole fractions for each of the corresponding binary mixtures, MeOH/1-PrOH, 1-PrOH/MeCN and MeOH/MeCN. These values, combined with our previous experimental results for 2,6-diphenyl-4-(2,4,6-triphenylpyridinium-1-yl)phenolate (Reichardt's betaine dye) in the same mixtures, permitted the computation of the Kamlet-Taft solvent parameters, alpha, beta, and pi*. The spectroscopic behavior of each probe over each mixture's whole mole fraction range was rationalized using the Bosch and Roses preferential solvation model. The applied model allowed the identification of synergistic behaviors in MeCN/alcohol mixtures and thus the inference that solvent complexes exist in solution. Also, the addition of small amounts of MeCN to the binary mixtures was seen to cause a significant variation in pi*, whereas the addition of alcohol to MeCN mixtures always led to a sudden change in alpha and beta. The behavior of these parameters in the ternary mixture was shown to be mainly determined by the contributions of the underlying binary mixtures. (C) 2014 Elsevier B.V. All rights reserved.

Relevance: 20.00%

Abstract:

Enthalpies of solution of 1-butyl-3-methylimidazolium tetrafluoroborate, [BMIm]BF4, are reported at 298.15 K in a set of 15 hydrogen bond donor and hydrogen bond acceptor solvents, chosen for their diversity, namely, water, methanol, ethanol, 1,2-ethanediol, 2-chloroethanol, 2-methoxyethanol, formamide, propylene carbonate, nitromethane, acetonitrile, dimethyl sulfoxide, acetone, N,N-dimethylformamide, N,N-dimethylacetamide, and aniline. These values are shown to be largely independent of [BMIm]BF4 concentration. The obtained enthalpies of solution vary from very endothermic to quite exothermic, thus showing a very high sensitivity of the enthalpies of solution of [BMIm]BF4 to solvent properties. Solvent effects on the solution process of this ionic liquid (IL) are analyzed by a quantitative structure-property relationship methodology, using the TAKA equation and a modified equation, which significantly improves the model's predictive ability. The observed differences in the enthalpies of solution are rationalized in terms of the solvent properties found to be relevant, that is, pi* and E-T(N).

Relevance: 20.00%

Abstract:

One of the greatest technological challenges today is to build and maintain, efficiently and consistently, a database of multimedia objects, in particular images. The need to develop automatic search methods based on the semantic content of images has become of the utmost importance. MPEG-7 is a standard that describes the content of multimedia data in support of these operational requirements. It adds a set of low-level audiovisual descriptors. The histogram is the most widely used feature for representing the global characteristics of an image. This work uses the "Edge Histogram Descriptor" (EHD), which yields a low-level representation that allows the similarity between images to be computed. A semantic characterization of the image is obtained from this descriptor using two classification methods: the k Nearest Neighbors (k-NN) algorithm and a backpropagation Neural Network (NN). In the k-NN algorithm, the Euclidean distance between the descriptors of two images is used to compute the similarity between different images. The NN requires a prior learning process, which includes responding correctly to the training samples and to the test samples. The work concludes with a study of the results of the two classification methods.
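The k-NN step described above, classifying an image by the Euclidean distance between descriptor vectors, can be sketched in a few lines. The short toy vectors below merely stand in for real EHD descriptors, and the labels, the `knn_classify` name and the choice `k=3` are illustrative assumptions rather than details from the work.

```python
import math
from collections import Counter


def euclidean(u, v):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_classify(query, training_set, k=3):
    """Return the majority label among the k training descriptors nearest to query.

    training_set is a list of (descriptor, label) pairs.
    """
    nearest = sorted(training_set, key=lambda item: euclidean(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy descriptors (hypothetical values, not real EHD output).
training = [
    ([0.9, 0.1, 0.0, 0.0], "building"),
    ([0.8, 0.2, 0.0, 0.0], "building"),
    ([0.7, 0.2, 0.1, 0.0], "building"),
    ([0.1, 0.1, 0.4, 0.4], "landscape"),
    ([0.0, 0.2, 0.4, 0.4], "landscape"),
    ([0.1, 0.0, 0.5, 0.4], "landscape"),
]
predicted = knn_classify([0.85, 0.15, 0.0, 0.0], training, k=3)
```

Unlike the neural network, this classifier needs no training phase: all the work happens at query time, at the cost of one distance computation per stored descriptor.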

Relevance: 20.00%

Abstract:

Dust is a complex mixture of particles of organic and inorganic origin and different gases absorbed in aerosol droplets. In a poultry unit, it includes dried faecal matter and urine, skin flakes, ammonia, carbon dioxide, pollens, feed and litter particles, feathers, grain mites, fungi spores, bacteria, viruses and their constituents. Dust particles vary in size, and differentiating between particle size fractions is important in health studies in order to quantify penetration within the respiratory system. A descriptive study was developed to assess exposure to particles in a poultry unit during different operations, namely routine examination and floor turnover. Direct-reading equipment was used (Lighthouse, model 3016 IAQ). Particle measurement was performed for 5 different sizes (PM0.5; PM1.0; PM2.5; PM5.0; PM10). The chemical composition of poultry litter was also determined by neutron activation analysis. Normally, the litter of poultry pavilions is turned over weekly, and it was during this operation that the highest exposure to particles was observed. In all the tasks considered, PM5.0 and PM10 were the sizes with the highest concentration values; PM10 had the highest values and PM0.5 the lowest. The chemical element with the highest concentration was Mg (5.7E6 mg.kg-1), followed by K (1.5E4 mg.kg-1), Ca (4.8E3 mg.kg-1), Na (1.7E3 mg.kg-1), Fe (2.1E2 mg.kg-1) and Zn (4.2E1 mg.kg-1). This high presence of particles in the respirable range (<5–7μm) means that poultry dust particles can penetrate into the gas exchange region of the lung. Larger particles (PM10) present concentrations ranging from 5.3E5 to 3.0E6 mg/m3.

Relevance: 20.00%

Abstract:

The purpose of this paper was to introduce the symbolic formalism based on kneading theory, which allows us to study the renormalization of non-autonomous periodic dynamical systems.

Relevance: 20.00%

Abstract:

Modern teaching methods require recreating the environment of the language being studied and getting pupils to speak in different situations. In Georgia, foreign-language teaching starts at the age of 6, at the same time as teaching of the mother tongue. Pupils learn to write in French after learning to write in Georgian. By the age of 7 to 10, they already know three different alphabets: Georgian, Latin and Cyrillic. The aim of this article is to propose a method that can make learning French easier for non-French speakers through audiovisual aids, which are very effective, especially at the stage when the child can neither read nor write in the foreign language. However, audiovisual aids must be used in normal doses, without hindering the pupil's own activity.

Relevance: 20.00%

Abstract:

Clustering analysis is a useful tool to detect and monitor disease patterns and, consequently, to contribute to effective population disease management. Portugal has the highest incidence of tuberculosis in the European Union (in 2012, 21.6 cases per 100,000 inhabitants), although it has been decreasing consistently. Two critical PTB (Pulmonary Tuberculosis) areas, the metropolitan Oporto and metropolitan Lisbon regions, were previously identified through spatial and space-time clustering of PTB incidence rate and risk factors. Identifying clusters of temporal trends can further inform policy makers about municipalities showing faster or slower improvement in TB control.

Relevance: 20.00%

Abstract:

Research on the problem of feature selection for clustering continues to develop. This is a challenging task, mainly due to the absence of class labels to guide the search for relevant features. Categorical feature selection for clustering has rarely been addressed in the literature, with most of the proposed approaches having focused on numerical data. In this work, we propose an approach to simultaneously cluster categorical data and select a subset of relevant features. Our approach is based on a modification of a finite mixture model (of multinomial distributions), where a set of latent variables indicates the relevance of each feature. To estimate the model parameters, we implement a variant of the expectation-maximization algorithm that simultaneously selects the subset of relevant features, using a minimum message length criterion. The proposed approach compares favourably with two baseline methods: a filter based on an entropy measure and a wrapper based on mutual information. The results obtained on synthetic data illustrate the ability of the proposed expectation-maximization method to recover ground truth. An application to real data from official statistics shows its usefulness.

Relevance: 20.00%

Abstract:

Research on cluster analysis for categorical data continues to develop, with new clustering algorithms being proposed. However, in this context, the determination of the number of clusters is rarely addressed. We propose a new approach in which clustering and the estimation of the number of clusters are done simultaneously for categorical data. We assume that the data originate from a finite mixture of multinomial distributions and use a minimum message length (MML) criterion to select the number of clusters (Wallace and Bolton, 1986). For this purpose, we implement an EM-type algorithm (Silvestre et al., 2008) based on the approach of Figueiredo and Jain (2002). The novelty of the approach rests on the integration of model estimation and selection of the number of clusters in a single algorithm, rather than selecting this number from a set of pre-estimated candidate models. The performance of our approach is compared with the use of the Bayesian Information Criterion (BIC) (Schwarz, 1978) and the Integrated Completed Likelihood (ICL) (Biernacki et al., 2000) using synthetic data. The obtained results illustrate the capacity of the proposed algorithm to attain the true number of clusters while outperforming BIC and ICL in speed, which is especially relevant when dealing with large data sets.
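For orientation, the model family and the BIC baseline mentioned above can be written out as follows. The notation is an assumption (the abstract fixes none), and the density uses the common latent-class form in which the J categorical features are independent within each component. Here x_{ijc} = 1 when observation i takes category c on feature j, and C_j is the number of categories of feature j. The MML criterion itself is not reproduced, since the abstract does not state it.

```latex
% Finite mixture of multinomials with K components (assumed notation)
f(x_i \mid \Theta) \;=\; \sum_{k=1}^{K} \alpha_k \prod_{j=1}^{J} \prod_{c=1}^{C_j} \theta_{kjc}^{\,x_{ijc}},
\qquad \alpha_k \ge 0,\quad \sum_{k=1}^{K} \alpha_k = 1,\quad \sum_{c=1}^{C_j} \theta_{kjc} = 1 .

% BIC for a fitted K-component model on n observations (smaller is better)
\mathrm{BIC}(K) \;=\; -2 \log L(\hat{\Theta}_K) \;+\; p_K \log n,
\qquad p_K \;=\; (K-1) \;+\; K \sum_{j=1}^{J} (C_j - 1).
```

Selecting K by BIC or ICL requires fitting one model per candidate K and comparing the scores afterwards, which is the extra cost the integrated approach avoids.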

Relevance: 20.00%

Abstract:

In data clustering, the problem of selecting the subset of most relevant features from the data has been an active research topic. Feature selection for clustering is a challenging task due to the absence of class labels to guide the search for relevant features. Most methods proposed for this goal focus on numerical data. In this work, we propose an approach for clustering and selecting categorical features simultaneously. We assume that the data originate from a finite mixture of multinomial distributions and implement an integrated expectation-maximization (EM) algorithm that estimates all the parameters of the model and selects the subset of relevant features simultaneously. The results obtained on synthetic data illustrate the performance of the proposed approach. An application to real data from official statistics shows its usefulness.
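As a point of reference, the basic EM iteration for a mixture of categorical (multinomial) features can be sketched as below. This is only the standard algorithm for a fixed number of components: it does not include the feature-selection step or the integrated model selection that the abstract describes. The function name, the random initialisation and the small smoothing constant are all assumptions made for illustration.

```python
import math
import random


def em_multinomial_mixture(data, n_categories, k, n_iter=100, seed=0):
    """Plain EM for a k-component mixture of independent categorical features.

    data: list of rows, each a list of category codes (ints starting at 0).
    n_categories: number of categories of each feature.
    Returns (weights, theta, responsibilities).
    No feature selection is performed (unlike the approach in the abstract).
    """
    rng = random.Random(seed)
    n, n_features = len(data), len(n_categories)
    weights = [1.0 / k] * k
    # theta[c][j][v]: probability that feature j takes value v in component c
    theta = []
    for _ in range(k):
        component = []
        for j in range(n_features):
            raw = [rng.random() + 0.5 for _ in range(n_categories[j])]
            component.append([r / sum(raw) for r in raw])
        theta.append(component)
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each row
        resp = []
        for row in data:
            joint = [
                weights[c] * math.prod(theta[c][j][v] for j, v in enumerate(row))
                for c in range(k)
            ]
            total = sum(joint)
            resp.append([p / total for p in joint])
        # M-step: re-estimate weights and category probabilities
        for c in range(k):
            weights[c] = sum(r[c] for r in resp) / n
            for j in range(n_features):
                counts = [1e-6] * n_categories[j]  # smoothing keeps probabilities positive
                for row, r in zip(data, resp):
                    counts[row[j]] += r[c]
                theta[c][j] = [x / sum(counts) for x in counts]
    return weights, theta, resp
```

A hard clustering follows by assigning each row to the component with the largest responsibility. Results depend on the random start, so in practice several seeds would be tried.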