825 resultados para means clustering
Resumo:
Radial basis functions can be combined into a network structure that has several advantages over conventional neural network solutions. However, to operate effectively the number and positions of the basis function centres must be carefully selected. Although no rigorous algorithm exists for this purpose, several heuristic methods have been suggested. In this paper a new method is proposed in which radial basis function centres are selected by the mean-tracking clustering algorithm. The mean-tracking algorithm is compared with k means clustering and it is shown that it achieves significantly better results in terms of radial basis function performance. As well as being computationally simpler, the mean-tracking algorithm in general selects better centre positions, thus providing the radial basis functions with better modelling accuracy
Resumo:
Solar-powered vehicle activated signs (VAS) are speed warning signs powered by batteries that are recharged by solar panels. These signs are more desirable than other active warning signs due to the low cost of installation and the minimal maintenance requirements. However, one problem that can affect a solar-powered VAS is the limited power capacity available to keep the sign operational. In order to be able to operate the sign more efficiently, it is proposed that the sign be appropriately triggered by taking into account the prevalent conditions. Triggering the sign depends on many factors such as the prevailing speed limit, road geometry, traffic behaviour, the weather and the number of hours of daylight. The main goal of this paper is therefore to develop an intelligent algorithm that would help optimize the trigger point to achieve the best compromise between speed reduction and power consumption. Data have been systematically collected whereby vehicle speed data were gathered whilst varying the value of the trigger speed threshold. A two stage algorithm is then utilized to extract the trigger speed value. Initially the algorithm employs a Self-Organising Map (SOM), to effectively visualize and explore the properties of the data that is then clustered in the second stage using K-means clustering method. Preliminary results achieved in the study indicate that using a SOM in conjunction with K-means method is found to perform well as opposed to direct clustering of the data by K-means alone. Using a SOM in the current case helped the algorithm determine the number of clusters in the data set, which is a frequent problem in data clustering.
Resumo:
Salamanca has been considered among the most polluted cities in Mexico. The vehicular park, the industry and the emissions produced by agriculture, as well as orography and climatic characteristics have propitiated the increment in pollutant concentration of Particulate Matter less than 10 μg/m3 in diameter (PM10). In this work, a Multilayer Perceptron Neural Network has been used to make the prediction of an hour ahead of pollutant concentration. A database used to train the Neural Network corresponds to historical time series of meteorological variables (wind speed, wind direction, temperature and relative humidity) and air pollutant concentrations of PM10. Before the prediction, Fuzzy c-Means clustering algorithm have been implemented in order to find relationship among pollutant and meteorological variables. These relationship help us to get additional information that will be used for predicting. Our experiments with the proposed system show the importance of this set of meteorological variables on the prediction of PM10 pollutant concentrations and the neural network efficiency. The performance estimation is determined using the Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). The results shown that the information obtained in the clustering step allows a prediction of an hour ahead, with data from past 2 hours
Resumo:
In this work we propose an image acquisition and processing methodology (framework) developed for performance in-field grapes and leaves detection and quantification, based on a six step methodology: 1) image segmentation through Fuzzy C-Means with Gustafson Kessel (FCM-GK) clustering; 2) obtaining of FCM-GK outputs (centroids) for acting as seeding for K-Means clustering; 3) Identification of the clusters generated by K-Means using a Support Vector Machine (SVM) classifier. 4) Performance of morphological operations over the grapes and leaves clusters in order to fill holes and to eliminate small pixels clusters; 5)Creation of a mosaic image by Scale-Invariant Feature Transform (SIFT) in order to avoid overlapping between images; 6) Calculation of the areas of leaves and grapes and finding of the centroids in the grape bunches. Image data are collected using a colour camera fixed to a mobile platform. This platform was developed to give a stabilized surface to guarantee that the images were acquired parallel to de vineyard rows. In this way, the platform avoids the distortion of the images that lead to poor estimation of the areas. Our preliminary results are promissory, although they still have shown that it is necessary to implement a camera stabilization system to avoid undesired camera movements, and also a parallel processing procedure in order to speed up the mosaicking process.
Resumo:
Salamanca, situated in center of Mexico is among the cities which suffer most from the air pollution in Mexico. The vehicular park and the industry, as well as orography and climatic characteristics have propitiated the increment in pollutant concentration of Sulphur Dioxide (SO2). In this work, a Multilayer Perceptron Neural Network has been used to make the prediction of an hour ahead of pollutant concentration. A database used to train the Neural Network corresponds to historical time series of meteorological variables and air pollutant concentrations of SO2. Before the prediction, Fuzzy c-Means and K-means clustering algorithms have been implemented in order to find relationship among pollutant and meteorological variables. Our experiments with the proposed system show the importance of this set of meteorological variables on the prediction of SO2 pollutant concentrations and the neural network efficiency. The performance estimation is determined using the Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). The results showed that the information obtained in the clustering step allows a prediction of an hour ahead, with data from past 2 hours.
Resumo:
In this paper we present an efficient k-Means clustering algorithm for two dimensional data. The proposed algorithm re-organizes dataset into a form of nested binary tree*. Data items are compared at each node with only two nearest means with respect to each dimension and assigned to the one that has the closer mean. The main intuition of our research is as follows: We build the nested binary tree. Then we scan the data in raster order by in-order traversal of the tree. Lastly we compare data item at each node to the only two nearest means to assign the value to the intendant cluster. In this way we are able to save the computational cost significantly by reducing the number of comparisons with means and also by the least use to Euclidian distance formula. Our results showed that our method can perform clustering operation much faster than the classical ones. © Springer-Verlag Berlin Heidelberg 2005
Resumo:
Clustering algorithms, pattern mining techniques and associated quality metrics emerged as reliable methods for modeling learners’ performance, comprehension and interaction in given educational scenarios. The specificity of available data such as missing values, extreme values or outliers, creates a challenge to extract significant user models from an educational perspective. In this paper we introduce a pattern detection mechanism with-in our data analytics tool based on k-means clustering and on SSE, silhouette, Dunn index and Xi-Beni index quality metrics. Experiments performed on a dataset obtained from our online e-learning platform show that the extracted interaction patterns were representative in classifying learners. Furthermore, the performed monitoring activities created a strong basis for generating automatic feedback to learners in terms of their course participation, while relying on their previous performance. In addition, our analysis introduces automatic triggers that highlight learners who will potentially fail the course, enabling tutors to take timely actions.
Resumo:
L'esperimento ATLAS, come gli altri esperimenti che operano al Large Hadron Collider, produce Petabytes di dati ogni anno, che devono poi essere archiviati ed elaborati. Inoltre gli esperimenti si sono proposti di rendere accessibili questi dati in tutto il mondo. In risposta a questi bisogni è stato progettato il Worldwide LHC Computing Grid che combina la potenza di calcolo e le capacità di archiviazione di più di 170 siti sparsi in tutto il mondo. Nella maggior parte dei siti del WLCG sono state sviluppate tecnologie per la gestione dello storage, che si occupano anche della gestione delle richieste da parte degli utenti e del trasferimento dei dati. Questi sistemi registrano le proprie attività in logfiles, ricchi di informazioni utili agli operatori per individuare un problema in caso di malfunzionamento del sistema. In previsione di un maggiore flusso di dati nei prossimi anni si sta lavorando per rendere questi siti ancora più affidabili e uno dei possibili modi per farlo è lo sviluppo di un sistema in grado di analizzare i file di log autonomamente e individuare le anomalie che preannunciano un malfunzionamento. Per arrivare a realizzare questo sistema si deve prima individuare il metodo più adatto per l'analisi dei file di log. In questa tesi viene studiato un approccio al problema che utilizza l'intelligenza artificiale per analizzare i logfiles, più nello specifico viene studiato l'approccio che utilizza dell'algoritmo di clustering K-means.
Resumo:
Este trabalho teve por objetivo estudar as causas de variação nos preços de bovinos da raça nelore pertencentes a rebanhos de seleção, os quais foram comercializados em leilões, para verificar as influências das avaliações genéticas e dos julgamentos de exterior sobre esses preços. Para tanto, foram computados os preços de venda de 426 bovinos da referida raça em 12 leilões ocorridos em diversas localidades brasileiras (regiões Centro-Oeste, Norte e Sudeste), entre os anos de 2002 e 2005. O valor médio foi de R$ 3.325,49, sendo o mínimo de R$ 1.400,00 e o máximo de R$ 10.500,00. Esses dados foram digitados juntamente com outras informações que eram apresentadas nos catálogos dos leilões. As informações registradas incluíram o sexo de cada animal, o nome do leilão e as DEPs informadas nos catálogos. Além da avaliação da influência das informações dos catálogos, também foi avaliada a influência das informações dos reprodutores, pais dos animais vendidos nos leilões, envolvendo suas DEPs publicadas em um sumário de reprodutores da raça e as pontuações de suas progênies em julgamentos. Os métodos estatísticos aplicados foram análises de variâncias e análises de agrupamento (método K-médias). Como resultado, foi observado que animais com superioridade genética em características relacionadas a desempenho ponderal, considerando-se os efeitos diretos e maternos, foram valorizados ao serem comercializados nos leilões. Em contra-partida, a pontuação dos reprodutores nos julgamentos não teve influência significativa sobre os preços médios de venda de suas progênies nos leilões.
Resumo:
Background: Since establishing universal free access to antiretroviral therapy in 1996, the Brazilian Health System has increased the number of centers providing HIV/AIDS outpatient care from 33 to 540. There had been no formal monitoring of the quality of these services until a survey of 336 AIDS health centers across 7 Brazilian states was undertaken in 2002. Managers of the services were asked to assess their clinics according to parameters of service inputs and service delivery processes. This report analyzes the survey results and identifies predictors of the overall quality of service delivery. Methods: The survey involved completion of a multiple-choice questionnaire comprising 107 parameters of service inputs and processes of delivering care, with responses assessed according to their likely impact on service quality using a 3-point scale. K-means clustering was used to group these services according to their scored responses. Logistic regression analysis was performed to identify predictors of high service quality. Results: The questionnaire was completed by 95.8% (322) of the managers of the sites surveyed. Most sites scored about 50% of the benchmark expectation. K-means clustering analysis identified four quality levels within which services could be grouped: 76 services (24%) were classed as level 1 (best), 53 (16%) as level 2 (medium), 113 (35%) as level 3 (poor), and 80 (25%) as level 4 (very poor). Parameters of service delivery processes were more important than those relating to service inputs for determining the quality classification. Predictors of quality services included larger care sites, specialization for HIV/AIDS, and location within large municipalities. Conclusion: The survey demonstrated highly variable levels of HIV/AIDS service quality across the sites. Many sites were found to have deficiencies in the processes of service delivery processes that could benefit from quality improvement initiatives. These findings could have implications for how HIV/AIDS services are planned in Brazil to achieve quality standards, such as for where service sites should be located, their size and staffing requirements. A set of service delivery indicators has been identified that could be used for routine monitoring of HIV/AIDS service delivery for HIV/AIDS in Brazil (and potentially in other similar settings).
Resumo:
Examples from the Murray-Darling basin in Australia are used to illustrate different methods of disaggregation of reconnaissance-scale maps. One approach for disaggregation revolves around the de-convolution of the soil-landscape paradigm elaborated during a soil survey. The descriptions of soil ma units and block diagrams in a soil survey report detail soil-landscape relationships or soil toposequences that can be used to disaggregate map units into component landscape elements. Toposequences can be visualised on a computer by combining soil maps with digital elevation data. Expert knowledge or statistics can be used to implement the disaggregation. Use of a restructuring element and k-means clustering are illustrated. Another approach to disaggregation uses training areas to develop rules to extrapolate detailed mapping into other, larger areas where detailed mapping is unavailable. A two-level decision tree example is presented. At one level, the decision tree method is used to capture mapping rules from the training area; at another level, it is used to define the domain over which those rules can be extrapolated. (C) 2001 Elsevier Science B.V. All rights reserved.
Resumo:
Understanding the ecological role of benthic microalgae, a highly productive component of coral reef ecosystems, requires information on their spatial distribution. The spatial extent of benthic microalgae on Heron Reef (southern Great Barrier Reef, Australia) was mapped using data from the Landsat 5 Thematic Mapper sensor. integrated with field measurements of sediment chlorophyll concentration and reflectance. Field-measured sediment chlorophyll concentrations. 2 ranging from 23-1.153 mg chl a m(2), were classified into low, medium, and high concentration classes (1-170, 171-290, and > 291 mg chl a m(-2)) using a K-means clustering algorithm. The mapping process assumed that areas in the Thematic Mapper image exhibiting similar reflectance levels in red and blue bands would correspond to areas of similar chlorophyll a levels. Regions of homogenous reflectance values corresponding to low, medium, and high chlorophyll levels were identified over the reef sediment zone by applying a standard image classification algorithm to the Thematic Mapper image. The resulting distribution map revealed large-scale ( > 1 km 2) patterns in chlorophyll a levels throughout the sediment zone of Heron Reef. Reef-wide estimates of chlorophyll a distribution indicate that benthic Microalgae may constitute up to 20% of the total benthic chlorophyll a at Heron Reef. and thus contribute significantly to total primary productivity on the reef.
Resumo:
In this paper an approach to extreme event control in wastewater treatment plant operation by use of automatic supervisory control is discussed. The framework presented is based on the fact that different operational conditions manifest themselves as clusters in a multivariate measurement space. These clusters are identified and linked to specific and corresponding events by use of principal component analysis and fuzzy c-means clustering. A reduced system model is assigned to each type of extreme event and used to calculate appropriate local controller set points. In earlier work we have shown that this approach is applicable to wastewater treatment control using look-up tables to determine current set points. In this work we focus on the automatic determination of appropriate set points by use of steady state and dynamic predictions. The performance of a relatively simple steady-state supervisory controller is compared with that of a model predictive supervisory controller. Also, a look-up table approach is included in the comparison, as it provides a simple and robust alternative to the steady-state and model predictive controllers, The methodology is illustrated in a simulation study.
Resumo:
Os avanços tecnológicos e científicos, na área da saúde, têm vindo a aliar áreas como a Medicina e a Matemática, cabendo à ciência adequar de forma mais eficaz os meios de investigação, diagnóstico, monitorização e terapêutica. Os métodos desenvolvidos e os estudos apresentados nesta dissertação resultam da necessidade de encontrar respostas e soluções para os diferentes desafios identificados na área da anestesia. A índole destes problemas conduz, necessariamente, à aplicação, adaptação e conjugação de diferentes métodos e modelos das diversas áreas da matemática. A capacidade para induzir a anestesia em pacientes, de forma segura e confiável, conduz a uma enorme variedade de situações que devem ser levadas em conta, exigindo, por isso, intensivos estudos. Assim, métodos e modelos de previsão, que permitam uma melhor personalização da dosagem a administrar ao paciente e por monitorizar, o efeito induzido pela administração de cada fármaco, com sinais mais fiáveis, são fundamentais para a investigação e progresso neste campo. Neste contexto, com o objetivo de clarificar a utilização em estudos na área da anestesia de um ajustado tratamento estatístico, proponho-me abordar diferentes análises estatísticas para desenvolver um modelo de previsão sobre a resposta cerebral a dois fármacos durante sedação. Dados obtidos de voluntários serão utilizados para estudar a interação farmacodinâmica entre dois fármacos anestésicos. Numa primeira fase são explorados modelos de regressão lineares que permitam modelar o efeito dos fármacos no sinal cerebral BIS (índice bispectral do EEG – indicador da profundidade de anestesia); ou seja estimar o efeito que as concentrações de fármacos têm na depressão do eletroencefalograma (avaliada pelo BIS). Na segunda fase deste trabalho, pretende-se a identificação de diferentes interações com Análise de Clusters bem como a validação do respetivo modelo com Análise Discriminante, identificando grupos homogéneos na amostra obtida através das técnicas de agrupamento. O número de grupos existentes na amostra foi, numa fase exploratória, obtido pelas técnicas de agrupamento hierárquicas, e a caracterização dos grupos identificados foi obtida pelas técnicas de agrupamento k-means. A reprodutibilidade dos modelos de agrupamento obtidos foi testada através da análise discriminante. As principais conclusões apontam que o teste de significância da equação de Regressão Linear indicou que o modelo é altamente significativo. As variáveis propofol e remifentanil influenciam significativamente o BIS e o modelo melhora com a inclusão do remifentanil. Este trabalho demonstra ainda ser possível construir um modelo que permite agrupar as concentrações dos fármacos, com base no efeito no sinal cerebral BIS, com o apoio de técnicas de agrupamento e discriminantes. Os resultados desmontram claramente a interacção farmacodinâmica dos dois fármacos, quando analisamos o Cluster 1 e o Cluster 3. Para concentrações semelhantes de propofol o efeito no BIS é claramente diferente dependendo da grandeza da concentração de remifentanil. Em suma, o estudo demostra claramente, que quando o remifentanil é administrado com o propofol (um hipnótico) o efeito deste último é potenciado, levando o sinal BIS a valores bastante baixos.
Resumo:
In recent decades, all over the world, competition in the electric power sector has deeply changed the way this sector’s agents play their roles. In most countries, electric process deregulation was conducted in stages, beginning with the clients of higher voltage levels and with larger electricity consumption, and later extended to all electrical consumers. The sector liberalization and the operation of competitive electricity markets were expected to lower prices and improve quality of service, leading to greater consumer satisfaction. Transmission and distribution remain noncompetitive business areas, due to the large infrastructure investments required. However, the industry has yet to clearly establish the best business model for transmission in a competitive environment. After generation, the electricity needs to be delivered to the electrical system nodes where demand requires it, taking into consideration transmission constraints and electrical losses. If the amount of power flowing through a certain line is close to or surpasses the safety limits, then cheap but distant generation might have to be replaced by more expensive closer generation to reduce the exceeded power flows. In a congested area, the optimal price of electricity rises to the marginal cost of the local generation or to the level needed to ration demand to the amount of available electricity. Even without congestion, some power will be lost in the transmission system through heat dissipation, so prices reflect that it is more expensive to supply electricity at the far end of a heavily loaded line than close to an electric power generation. Locational marginal pricing (LMP), resulting from bidding competition, represents electrical and economical values at nodes or in areas that may provide economical indicator signals to the market agents. This article proposes a data-mining-based methodology that helps characterize zonal prices in real power transmission networks. To test our methodology, we used an LMP database from the California Independent System Operator for 2009 to identify economical zones. (CAISO is a nonprofit public benefit corporation charged with operating the majority of California’s high-voltage wholesale power grid.) To group the buses into typical classes that represent a set of buses with the approximate LMP value, we used two-step and k-means clustering algorithms. By analyzing the various LMP components, our goal was to extract knowledge to support the ISO in investment and network-expansion planning.