902 results for K-MEANS
Abstract:
Background: The validity of ensemble averaging of event-related potential (ERP) data has been questioned, because it assumes that the ERP is identical across trials. There is therefore a need for preliminary testing for cluster structure in the data. New method: We propose a complete pipeline for the cluster analysis of ERP data. To increase the signal-to-noise ratio (SNR) of the raw single trials, we used a denoising method based on Empirical Mode Decomposition (EMD). Next, we used a bootstrap-based method to determine the number of clusters, through a measure called the Stability Index (SI). We then used a clustering algorithm based on a Genetic Algorithm (GA) to define initial cluster centroids for subsequent k-means clustering. Finally, we visualised the clustering results through a scheme based on Principal Component Analysis (PCA). Results: After validating the pipeline on simulated data, we tested it on data from two experiments: a P300 speller paradigm on a single subject and a language processing study on 25 subjects. Results revealed evidence for the existence of 6 clusters in one experimental condition of the language processing study. Further, a two-way chi-square test revealed an influence of subject on cluster membership.
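The bootstrap step above can be sketched in a few lines. The concrete Stability Index used here (mean pairwise co-membership agreement between a reference clustering and clusterings of bootstrap resamples) is an illustrative assumption, not the authors' exact definition, and the toy 1-D k-means stands in for the GA-initialised version:

```python
import random

def kmeans_1d(xs, k, iters=50):
    """Toy 1-D k-means; centroids initialised at spread-out quantiles."""
    s = sorted(xs)
    cents = [s[(2 * i + 1) * len(s) // (2 * k)] for i in range(k)]
    labels = [0] * len(xs)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: (x - cents[c]) ** 2) for x in xs]
        for c in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == c]
            if members:
                cents[c] = sum(members) / len(members)
    return labels

def stability_index(xs, k, n_boot=20, seed=0):
    """Hypothetical SI: mean co-membership agreement between the clustering
    of the full data and clusterings of bootstrap resamples."""
    rng = random.Random(seed)
    ref = kmeans_1d(xs, k)
    n = len(xs)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]       # bootstrap resample
        boot = kmeans_1d([xs[i] for i in idx], k)
        agree = total = 0
        for a in range(n):
            for b in range(a + 1, n):
                # do the two clusterings agree on whether this pair is together?
                agree += (ref[idx[a]] == ref[idx[b]]) == (boot[a] == boot[b])
                total += 1
        scores.append(agree / total)
    return sum(scores) / len(scores)
```

With two well-separated groups the index stays near 1; in the pipeline, an SI computed for a range of candidate k values would indicate the most stable cluster number.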
Abstract:
Extratropical transition (ET) has eluded objective identification since the realisation of its existence in the 1970s. Recent advances in numerical models have provided data of higher resolution than previously available. In conjunction with this, an objective characterisation of the structure of a storm has now become widely accepted in the literature. Here we present a method combining these two advances to provide an objective definition of ET. The approach applies K-means clustering to isolate different life-cycle stages of cyclones and then analyses the progression through these stages. The methodology is tested on five recent years of European Centre for Medium-Range Weather Forecasts operational analyses. It is found that this method is able to determine the general characteristics of ET in the Northern Hemisphere. Between 2008 and 2012, 54% (±7, 32 of 59) of Northern Hemisphere tropical storms are estimated to undergo ET, with great variability across basins and time of year. To fully capture all instances of ET, it is necessary to introduce and characterise multiple pathways through transition. Only one of the three transition types needed has previously been well studied. A brief description of the alternative transition types is given, along with illustrative storms, to assist further study.
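The "progression through stages" step can be sketched as below, assuming each time step of a storm track has already been assigned a cluster label by the k-means stage; the stage names TC/TT/EX are placeholders, not the paper's labels:

```python
def stage_sequence(labels):
    """Collapse consecutive repeats to get the storm's progression of stages."""
    seq = [labels[0]]
    for lab in labels[1:]:
        if lab != seq[-1]:
            seq.append(lab)
    return seq

def undergoes_et(labels, tropical="TC", extratropical="EX"):
    """A storm undergoes ET if an extratropical stage follows a tropical one."""
    seq = stage_sequence(labels)
    if tropical not in seq:
        return False
    return extratropical in seq[seq.index(tropical):]
```

Counting `undergoes_et` over all storms in a season would yield an ET fraction of the kind quoted above (32 of 59); distinct orderings of intermediate stages would correspond to distinct transition pathways.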
Abstract:
Precipitation over western Europe (WE) is projected to increase (decrease) roughly northward (equatorward) of 50°N during the 21st century. These changes are generally attributed to alterations in the regional large-scale circulation, e.g., the jet stream, cyclone activity, and blocking frequencies. A novel weather typing within the sector (30°W–10°E, 25–70°N) is used for a more comprehensive dynamical interpretation of precipitation changes. A k-means clustering on daily mean sea level pressure was undertaken for the ERA-Interim reanalysis (1979–2014). Eight weather types are identified: S1, S2, S3 (summertime types), W1, W2, W3 (wintertime types), B1, and B2 (blocking-like types). Their distinctive dynamical characteristics allow identification of the main large-scale precipitation-driving mechanisms. Simulations with 22 Coupled Model Intercomparison Project phase 5 (CMIP5) models for recent climate conditions show biases in reproducing the observed seasonality of weather types; in particular, an overestimation of the frequencies of weather types associated with zonal airflow is identified. In projections following the Representative Concentration Pathway 8.5 (RCP8.5) scenario over 2071–2100, the frequencies of the three driest types (S1, B2, and W3) are projected to increase (mainly S1, +4%) at the expense of the rainiest types, particularly W1 (−3%). These changes explain most of the precipitation projections over WE. However, a weather-type-independent background signal is identified (an increase/decrease in precipitation over northern/southern WE), suggesting modifications in precipitation-generating processes and/or model inability to accurately simulate these processes. Despite these caveats in the precipitation scenarios for WE, which must be duly taken into account, our approach permits a better understanding of the projected precipitation trends over WE.
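The link between weather-type frequency changes and precipitation projections can be illustrated with a simple decomposition: the circulation-driven part of the precipitation change is the sum over types of (frequency change × mean precipitation while that type prevails). All frequencies and rain rates below are invented for illustration; only the +4% for S1 and −3% for W1 echo the abstract:

```python
# Hypothetical type frequencies (fractions of days) and mean precipitation
# rates (mm/day); only the S1 and W1 frequency changes follow the study.
freq_hist = {"S1": 0.14, "S2": 0.12, "S3": 0.12, "W1": 0.15,
             "W2": 0.13, "W3": 0.12, "B1": 0.12, "B2": 0.10}
freq_rcp85 = {"S1": 0.18, "S2": 0.11, "S3": 0.12, "W1": 0.12,
              "W2": 0.11, "W3": 0.13, "B1": 0.12, "B2": 0.11}
mean_precip = {"S1": 0.5, "S2": 2.0, "S3": 1.5, "W1": 4.0,
               "W2": 3.0, "W3": 1.0, "B1": 1.2, "B2": 0.8}

def type_driven_change(f0, f1, precip):
    """Circulation-driven precipitation change: frequency changes weighted
    by each type's mean precipitation."""
    return sum((f1[t] - f0[t]) * precip[t] for t in f0)

delta = type_driven_change(freq_hist, freq_rcp85, mean_precip)
```

Because the driest types gain frequency at the expense of the rainiest ones, `delta` comes out negative (a mean drying); the weather-type-independent background signal mentioned in the abstract is precisely what such a decomposition cannot capture.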
Abstract:
This paper is concerned with the computational efficiency of fuzzy clustering algorithms when the data set to be clustered is described by a proximity matrix only (relational data) and the number of clusters must be automatically estimated from such data. A fuzzy variant of an evolutionary algorithm for relational clustering is derived and compared against two systematic (pseudo-exhaustive) approaches that can also be used to automatically estimate the number of fuzzy clusters in relational data. An extensive collection of experiments involving 18 artificial and two real data sets is reported and analyzed.
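The fuzzy clustering model behind such algorithms can be illustrated with the standard fuzzy c-means updates, shown here on plain 1-D vector data; the paper's relational variant works from a proximity matrix instead, so this is only a sketch of the soft-membership idea:

```python
def fuzzy_cmeans_1d(xs, c, m=2.0, iters=100):
    """Standard fuzzy c-means on 1-D data: soft memberships U[i][k] in [0,1]
    that sum to 1 over clusters, and membership-weighted centroid updates."""
    s = sorted(xs)
    cents = [s[(2 * i + 1) * len(s) // (2 * c)] for i in range(c)]
    U = []
    for _ in range(iters):
        U = []
        for x in xs:
            d = [abs(x - v) + 1e-12 for v in cents]   # avoid division by zero
            U.append([1.0 / sum((d[k] / d[j]) ** (2.0 / (m - 1.0))
                                for j in range(c))
                      for k in range(c)])
        cents = [sum(U[i][k] ** m * xs[i] for i in range(len(xs)))
                 / sum(U[i][k] ** m for i in range(len(xs)))
                 for k in range(c)]
    return cents, U
```

Memberships near 0.5 flag points sitting between clusters, which is exactly the extra information that a hard k-means assignment discards.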
Abstract:
A large amount of biological data has been produced in recent years, and important knowledge can be extracted from these data through data analysis techniques. Clustering plays an important role in data analysis by organizing similar objects from a dataset into meaningful groups. Several clustering algorithms have been proposed in the literature; however, each algorithm has its own bias and is better suited to particular datasets. This paper presents a mathematical formulation to support the creation of consistent clusters for biological data. Moreover, it presents a clustering algorithm that solves this formulation using GRASP (Greedy Randomized Adaptive Search Procedure). We compared the proposed algorithm with three other well-known algorithms. The proposed algorithm achieved the best clustering results, as confirmed statistically.
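GRASP alternates a greedy randomized construction with local search, keeping the best solution over restarts. Below is a minimal sketch for a k-medoid-style clustering objective (total distance to the nearest centre) on 1-D data; the paper's formulation and operators are richer than this:

```python
import random

def total_cost(xs, centers):
    """Sum of distances from each point to its nearest centre."""
    return sum(min(abs(x - c) for c in centers) for x in xs)

def grasp_cluster(xs, k, iters=20, rcl_size=3, seed=1):
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(iters):
        # Greedy randomized construction: repeatedly pick a centre at random
        # from the few candidates farthest from the current centres (the RCL).
        centers = [rng.choice(xs)]
        while len(centers) < k:
            cand = sorted((x for x in xs if x not in centers),
                          key=lambda x: -min(abs(x - c) for c in centers))
            centers.append(rng.choice(cand[:rcl_size]))
        # Local search: swap a centre with a non-centre while it improves.
        improved = True
        while improved:
            improved = False
            for i in range(k):
                for x in xs:
                    if x in centers:
                        continue
                    trial = centers[:i] + [x] + centers[i + 1:]
                    if total_cost(xs, trial) < total_cost(xs, centers):
                        centers, improved = trial, True
        cost = total_cost(xs, centers)
        if cost < best_cost:
            best, best_cost = sorted(centers), cost
    return best, best_cost
```

The restricted candidate list (`rcl_size`) is what makes the construction "greedy randomized" rather than purely greedy: different restarts explore different starting solutions.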
Abstract:
This paper proposes a novel way to combine different observation models in a particle filter framework. This so-called auto-adjustable observation model enhances particle filter accuracy when the tracked objects overlap, without imposing a large runtime penalty on the whole tracking system. The approach has been tested in two important real-world situations related to animal behavior: mice and larvae tracking. The proposal was compared with several state-of-the-art approaches, and the results show that, on the datasets tested, a good trade-off between accuracy and runtime can be achieved using an auto-adjustable observation model.
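For readers unfamiliar with the underlying machinery, a single predict-update-resample step of a bootstrap (SIR) particle filter looks roughly as follows, here for a 1-D position with a random-walk motion model and a fixed Gaussian observation model. The paper's auto-adjustable observation model would replace the fixed likelihood below with one that adapts when tracked objects overlap:

```python
import math
import random

def pf_step(particles, weights, observation, rng,
            motion_std=0.5, obs_std=1.0):
    """One SIR particle-filter step: predict, reweight, resample."""
    # Predict: propagate each particle through a random-walk motion model.
    particles = [p + rng.gauss(0.0, motion_std) for p in particles]
    # Update: weight each particle by a Gaussian observation likelihood.
    weights = [w * math.exp(-0.5 * ((observation - p) / obs_std) ** 2)
               for w, p in zip(weights, particles)]
    total = sum(weights) or 1e-300
    weights = [w / total for w in weights]
    # Systematic resampling to fight weight degeneracy.
    n = len(particles)
    cum, acc = [], 0.0
    for w in weights:
        acc += w
        cum.append(acc)
    u0, j, new = rng.random() / n, 0, []
    for i in range(n):
        u = u0 + i / n
        while j < n - 1 and cum[j] < u:
            j += 1
        new.append(particles[j])
    return new, [1.0 / n] * n
```

Repeating the step with a fixed observation concentrates the particle cloud around the observed position, which is the basic behaviour any observation model refinement builds on.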
Abstract:
Parkinson's disease (PD) is the second most common neurodegenerative disorder (after Alzheimer's disease) and directly affects up to 5 million people worldwide. The stage of the disease (Hoehn and Yahr scale) has been predicted by many methods, which helps doctors adjust dosage accordingly. These methods were developed on a data set covering about seventy patients at nine clinics in Sweden. The purpose of this work is to compare an unsupervised technique with supervised neural network techniques, in order to establish whether the collected data sets are reliable enough for decision-making. The available data were preprocessed before feature calculation. Wavelet features, which are complex yet efficient, were computed to present the data set to the network, and the dimension of the final feature set was reduced using principal component analysis. Among unsupervised techniques, k-means gives the closest result, around 76%, when compared with the supervised techniques. Backpropagation and J4 were used as supervised models to classify the stages of Parkinson's disease, with backpropagation giving results of 76-82%. The results of both models were analysed, showing that the collected data are reliable for predicting the disease stages of Parkinson's disease.
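The comparison of an unsupervised clustering against known class labels can be sketched as follows: cluster the features, then find the cluster-to-class relabelling that maximises agreement, which is what a figure like "76%" measures. The toy 1-D k-means and the data are illustrative, not the study's wavelet features:

```python
from itertools import permutations

def kmeans_1d(xs, k, iters=50):
    """Toy 1-D k-means with deterministic quantile initialisation."""
    s = sorted(xs)
    cents = [s[(2 * i + 1) * len(s) // (2 * k)] for i in range(k)]
    labels = [0] * len(xs)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: (x - cents[c]) ** 2) for x in xs]
        for c in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == c]
            if members:
                cents[c] = sum(members) / len(members)
    return labels

def agreement(true_labels, pred_labels, k):
    """Best accuracy over all cluster-to-class label permutations."""
    n = len(true_labels)
    return max(sum(perm[p] == t for p, t in zip(pred_labels, true_labels)) / n
               for perm in permutations(range(k)))
```

The permutation step matters because cluster indices are arbitrary: cluster 0 may well correspond to the later disease stage.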
Abstract:
The purpose of this dissertation is to evaluate, from a geographical perspective, the industrial sectors of Brazil over the last three decades. First, the objective is to assess the degree of specialisation and concentration of the Brazilian states in industrial terms, using the Krugman and Gini indices, respectively. Based on these two indices, the Brazilian states are separated into four groups using the K-means clustering method. Through a standard inner product between the vector of the distribution of industrial production across sectors in each state and vectors of selected characteristics of those sectors (called the Industry Characteristics Bias, VCI), we determine in which types of industry the states are specialising and/or concentrating. A multivariate principal component analysis is performed on the VCIs, and these principal components are used to assess the similarity of the states. From another perspective, we investigate the degree of geographic concentration of the Brazilian industrial sectors, using the Gini index and the Venables index; in the latter, the distance between states is not neglected when measuring concentration. The industrial sectors are separated into three groups by the K-means method, in which the variables used are the principal components of the industry characteristics. Using another inner product, the State Characteristics Bias (VCE), we observe in which types of state the industrial sectors are or are not concentrating. To see how these two perspectives, that is, how the characteristics of states and of industries influence the location of industrial sectors within Brazilian territory, a cross-section econometric model following Midelfart-Knarvik et al. (2000) is estimated for the Brazilian case.
In this econometric model, it is possible to investigate how the interaction between industry and state characteristics can determine where industry locates. The main results show that the heavy investments in infrastructure in the 1970s and the trade liberalisation of the 1990s were decisive for the location of Brazilian industry.
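The Krugman specialisation index used in the first step admits a very short implementation: it is the sum of absolute differences between a state's vector of industry shares and a reference (e.g. national) vector, ranging from 0 (identical structure) to 2 (completely disjoint specialisation). A sketch with invented shares:

```python
def krugman_index(state_shares, reference_shares):
    """Krugman specialisation index: sum of absolute share deviations."""
    assert abs(sum(state_shares) - 1.0) < 1e-9       # shares must sum to 1
    assert abs(sum(reference_shares) - 1.0) < 1e-9
    return sum(abs(a - b) for a, b in zip(state_shares, reference_shares))

# Hypothetical industry shares over three sectors (textiles, steel, food):
state = [0.2, 0.5, 0.3]
national = [0.4, 0.3, 0.3]
```

Index values computed for every state would then feed, together with the Gini index, into the K-means grouping described above.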
Abstract:
The main objective of this study is to apply recently developed physical-statistical methods to time series analysis, particularly to electrical induction profiles from oil-well data, in order to study the petrophysical similarity of those wells in a spatial distribution. For this, we used the DFA method to determine whether this technique can be used to characterise the fields spatially. After obtaining the DFA values for all wells, we applied cluster analysis, using the non-hierarchical method known as K-means. Usually based on the Euclidean distance, K-means divides the N elements of a data matrix into k groups, so that the similarity between elements belonging to different groups is as small as possible. To test whether a dataset generated by the K-means method, or a randomly generated dataset, forms spatial patterns, we created the parameter Ω (index of neighbourhood). High values of Ω reveal more aggregated data; low values of Ω indicate scattered data or data without spatial correlation. We conclude that the DFA data from the 54 wells are grouped and can be used to characterise the fields spatially. Applying the contour-level technique, we confirm the results obtained by K-means, confirming that DFA is effective for spatial analysis.
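The abstract does not give a formula for Ω; one plausible formalisation, used here purely for illustration, is the fraction of each well's nearest spatial neighbours that share its cluster label, averaged over wells: 1 for perfectly aggregated clusters and roughly 1/k for spatially random labels.

```python
def omega(points, labels, n_neighbors=3):
    """Hypothetical neighbourhood index: mean fraction of each point's
    nearest neighbours sharing its cluster label."""
    n = len(points)
    total = 0.0
    for i in range(n):
        xi, yi = points[i]
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: (points[j][0] - xi) ** 2
                                     + (points[j][1] - yi) ** 2)
        near = order[:n_neighbors]
        total += sum(labels[j] == labels[i] for j in near) / n_neighbors
    return total / n
```

Comparing Ω for the real labels against Ω for shuffled labels is exactly the kind of contrast a neighbourhood index is meant to expose.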
Abstract:
In recent years, the DFA technique introduced by Peng was established as an important tool capable of detecting long-range autocorrelation in non-stationary time series. It has been successfully applied in various areas, such as econophysics, biophysics, medicine, physics and climatology. In this study, we used the DFA technique to obtain the Hurst exponent (H) of the electric density profile (RHOB) of 53 wells from the Namorado Field School. We want to know whether H can be used to characterise the field spatially. Two cases arise: in the first, the set of H values reflects the local geology, with wells that are geographically closer showing similar H, so that H can be used in geostatistical procedures; in the second, each well has its own H, the well information is uncorrelated, and the profiles show only random fluctuations in H without any spatial structure. Cluster analysis is a widely used method of statistical analysis; here we use the non-hierarchical k-means method. To verify whether a set of data generated by the k-means method shows spatial patterns, we created the parameter Ω (index of neighbourhood): high Ω indicates more aggregated data, low Ω indicates dispersed data or data without spatial correlation. With the help of this index and the Monte Carlo method, we verify that randomly clustered data show a distribution of Ω lower than the Ω of the actual clusters. We therefore conclude that the H data obtained from the 53 wells are grouped and can be used to characterise spatial patterns. The contour-level analysis confirmed the k-means results.
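The Monte Carlo comparison between the actual Ω and randomised data can be phrased as a permutation test: shuffle the cluster labels over the well locations many times and see how often chance matches the observed spatial aggregation. A generic sketch; the aggregation statistic here is a simple 1-D neighbour-agreement fraction, not the study's Ω:

```python
import random

def neighbour_agreement(xs, labels):
    """Fraction of spatially adjacent pairs (in sorted order of position)
    that share a cluster label."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    pairs = list(zip(order, order[1:]))
    return sum(labels[a] == labels[b] for a, b in pairs) / len(pairs)

def monte_carlo_pvalue(xs, labels, stat, n_perm=200, seed=0):
    """How often a random relabelling looks at least as aggregated as the data."""
    rng = random.Random(seed)
    observed = stat(xs, labels)
    hits = 0
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        hits += stat(xs, shuffled) >= observed
    return (hits + 1) / (n_perm + 1)
```

A small p-value means the observed aggregation is very unlikely under spatial randomness, which is the conclusion the study draws for its 53 wells.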
Abstract:
The extent of the Brazilian Atlantic rainforest, a global biodiversity hotspot, has been reduced to less than 7% of its original range. Yet it contains one of the richest butterfly faunas in the world. Butterflies are commonly used as environmental indicators, mostly because of their strict association with host plants, microclimate and resource availability. This research describes the diversity, composition and species richness of frugivorous butterflies in a forest fragment in the Brazilian Northeast, comparing communities across different physiognomies and seasons. The climate in the study area is classified as tropical rainy, with two well-defined seasons. Butterflies were captured with 60 Van Someren-Rydon traps, randomly located within six habitat units (10 traps per unit) ranging from very open (e.g. coconut plantation) to forest interior. Sampling was carried out between January and December 2008, for five days each month. I captured 12,090 individuals of 32 species. The most abundant species were Taygetis laches, Opsiphanes invirae and Hamadryas februa, which accounted for 70% of all captures. Similarity analysis identified two main groups, one of species associated with open or disturbed areas and a second of species associated with shaded areas. There was a strong seasonal component in species composition, with fewer species and lower abundance in the dry season and more species and higher abundance in the rainy season. K-means analysis indicates that the chosen habitat units overestimated the distinctions perceived by the fauna, suggesting fewer distinct units. The species Taygetis virgilia, Hamadryas chloe, Callicore pygas and Morpho achilles were associated with less disturbed habitats, while Yphthimoides sp., Historis odius, H. acheronta, Hamadryas feronia and Siderone marthesia likely indicate open or disturbed habitats.
This research provides important information for the conservation of frugivorous butterflies and will serve as a baseline for future environmental monitoring projects.
Abstract:
Soils under no-till production systems are subject to compaction caused by machinery traffic, making it necessary to monitor changes in the soil physical environment, which, when unfavourable, restricts root growth and may reduce crop yields. The objective of this work was to evaluate the effect of different compaction intensities on the physical quality of a medium-textured Latossolo Vermelho (Oxisol) under maize, located in Jaboticabal (SP), Brazil, using multivariate statistical methods. The experimental design was completely randomised, with six compaction intensities and four replicates. Undisturbed soil samples were collected from the 0.02-0.05, 0.08-0.11 and 0.15-0.18 m layers to determine bulk density (Ds) within the 0-0.20 m layer. The crop traits evaluated were root density, root diameter, root dry matter, plant height, height of first ear insertion, stem diameter and plant dry matter. Cluster and principal component analyses identified three groups of high, medium and low maize yield, according to soil, root system and shoot variables. The classification into groups was performed by three methods: hierarchical clustering, non-hierarchical k-means clustering and principal component analysis. The principal components showed that high maize yields are correlated with good shoot growth under conditions of lower bulk density, providing high root dry matter production, albeit of small root diameter. The physical quality of the Latossolo Vermelho for maize cultivation was maintained up to a bulk density of 1.38 Mg m-3.
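Of the three grouping methods mentioned, the hierarchical one is the least often spelled out. A naive complete-linkage agglomeration looks like this; the bulk-density values are invented, merely echoing the Mg m-3 scale of the study:

```python
def complete_linkage(values, k):
    """Naive agglomerative clustering: repeatedly merge the two clusters
    whose farthest-apart members are closest (complete linkage), until
    k clusters remain."""
    clusters = [[v] for v in values]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = max(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)
```

Agreement between groups obtained this way, the k-means groups, and the principal component scores is the kind of cross-check that supports the three yield classes reported above.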
Abstract:
Erodibility is an extremely important factor in characterising soil loss, representing the processes that govern water infiltration and the soil's resistance to detachment and particle transport. Thus, through spatial dependence analysis of the principal components of erodibility (K factor), this study aimed to estimate soil erodibility in a headwater area of the Córrego do Tijuco watershed, Monte Alto-SP, Brazil, and to analyse the spatial variability of soil particle-size variables across the relief. The mean erodibility of the area was considered high, and k-means cluster analysis indicated five groups: in the first, high contents of coarse (AG) and medium (AM) sand conditioned its distribution over flat areas; the second, characterised by a high fine sand (AF) content, is distributed over the more convex slopes; the third, with high contents of silt and very fine sand (AMF), was concentrated on the steepest slopes and concavities; the fourth, with the highest clay content, followed the water flow zones; and the fifth, with high contents of organic matter (MO) and coarse sand (AG), is distributed near the urban zone. Principal component analysis (PCA) yielded four components accounting for 87.4 % of the information: the first principal component (CP1) is discriminated by selective particle transport, mainly in localised zones of steeper slope and sediment accumulation; the second (CP2), discriminated by low cohesion between particles, shows fine sand accumulating in the lower-lying areas throughout the water concentration zone; the third (CP3), discriminated by greater soil aggregation, is concentrated mainly at the base of steep slopes; and the fourth (CP4), discriminated by very fine sand, is distributed along the slopes at the highest altitudes.
The results characterise the particle-size behaviour of the soil, which proves susceptible to erosion owing to its surface textural conditions and the movement of the relief.
Abstract:
The use of maps obtained from digitally processed remote sensing orbital images has become fundamental for optimising conservation and monitoring of coral reefs. However, the accuracy achieved in mapping submerged areas is limited by variation in the water column, which degrades the signal received by the orbital sensor and introduces errors into the final classification. The limited capacity of traditional methods based on conventional statistical techniques to resolve inter-class confusion motivated the search for alternative strategies in the area of computational intelligence. In this work an ensemble of classifiers was built, combining Support Vector Machines and a Minimum Distance Classifier, with the objective of classifying remotely sensed images of a coral reef ecosystem. The system is composed of three stages, through which the classification is progressively refined: patterns that received an ambiguous classification at one stage were re-evaluated in the subsequent stage, and unambiguous prediction for all the data was achieved through the reduction or elimination of false positives. The images were classified into five bottom types: deep water, underwater corals, inter-tidal corals, algal bottom and sandy bottom. The highest overall accuracy (89%) was obtained from an SVM with a polynomial kernel. The accuracy of the classified image was compared, using an error matrix, with the results obtained by classification methods based on a single classifier (a neural network and the k-means algorithm). Finally, the comparison of results demonstrated the potential of ensemble classifiers as a tool for classifying images of submerged areas subject to noise caused by atmospheric effects and the water column.
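The staged refinement can be illustrated with its simplest component: a Minimum Distance Classifier that defers ambiguous patterns (those nearly equidistant from their two closest class centroids) to the next stage. Class names, centroids and the margin threshold below are invented for the sketch:

```python
import math

# Hypothetical class centroids in a 2-D feature space (e.g. two band ratios).
CENTROIDS = {
    "deep water": (0.0, 0.0),
    "sandy bottom": (10.0, 0.0),
    "algal bottom": (0.0, 10.0),
}

def classify(pattern, centroids=CENTROIDS, reject_margin=0.5):
    """Minimum Distance Classifier with a reject option: return None when
    the two nearest centroids are almost equally close, so the pattern can
    be re-evaluated by the next stage of the ensemble."""
    ranked = sorted((math.dist(pattern, c), label)
                    for label, c in centroids.items())
    (d1, label), (d2, _) = ranked[0], ranked[1]
    if d2 - d1 < reject_margin:
        return None          # ambiguous: defer to the subsequent stage
    return label
```

In the paper's system the deferred patterns go on to an SVM stage; here `None` simply marks them as unresolved.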