960 results for k-means
Abstract:
The study analyses whether inferential functioning has a structure characteristic of nonlinear dynamical systems, examined through four humorous cartoons. Initial results from the linear statistical treatment with k-means point to profiles of different inferential functioning depending on the joke. Results obtained with the wavelet technique, which comes from nonlinear dynamical systems, show patterns of inferential functioning that reflect its multifractal nature, with no fixed sequence and no apparent organisation. This implies that the conception of fixed sequential stages, such as those that dominate studies of cognitive development, needs to be revised.
Abstract:
The aim is to integrate different units of analysis for the study of personality and to consider these units in predicting satisfaction and academic performance in adolescents. The sample comprises 296 ESO students aged 15 to 18: 162 girls and 134 boys. The tests are administered during tutorial hours within the Plan Acción Tutorial (PAT). Students are told that they are taking part in research on 'goals they intend to pursue in the future' and that the tests administered may help them with future decision-making. The tests are administered in two assessment sessions: personality and satisfaction tests in the first, and personal goals in the second. Academic performance is operationalised as the adolescent's score for the academic year. All students take part voluntarily. The instruments are a personal objectives or goals scale, the life-area satisfaction scale (ESAV), the Millon Adolescent Personality Inventory (MAPI), basic personality styles, and behavioural correlate scales. Data are analysed with the statistical programs SPSS, SPAD and LISREL VIII, and effect sizes are calculated with Statistical Power Computer Analysis. The data analysis techniques centre on Multiple Correspondence Analysis (MCA), k-means cluster analysis, analysis of variance, and differences between correlation coefficients. The results indicate that adolescents who set goals related to the life tasks to be carried out in the near future show higher levels of satisfaction. In addition, differences in personality styles make it possible to understand the personal goal system in four groups of adolescents.
Considering personality styles and personal goals makes it possible to understand how adolescents adapt to their environment in terms of self-perceived satisfaction and academic performance.
Abstract:
Abstract taken from the publication. With financial support from the MIDE department of UNED. Includes an appendix with the questionnaire used to carry out the study.
Abstract:
Abstract taken from the publication.
Abstract:
The k-means cluster technique is used to examine 43 yr of daily winter Northern Hemisphere (NH) polar stratospheric data from the 40-yr ECMWF Re-Analysis (ERA-40). The results show that the NH winter stratosphere exists in two natural well-separated states. In total, 10% of the analyzed days exhibit a warm disturbed state that is typical of sudden stratospheric warming events. The remaining 90% of the days are in a state typical of a colder undisturbed vortex. These states are determined objectively, with no preconceived notion of the groups. The two stratospheric states are described and compared with alternative indicators of the polar winter flow, such as the northern annular mode. It is shown that the zonally averaged zonal winds in the polar upper stratosphere at 7 hPa can best distinguish between the two states, using a threshold value of 4 m s−1, which is remarkably close to the standard WMO criterion for major warming events. The analysis also determines that there are no further divisions within the warm state, indicating that there is no well-designated threshold between major and minor warmings, nor between split and displaced vortex events. These different manifestations are simply members of a continuum of warming events.
Abstract:
Radial basis functions can be combined into a network structure that has several advantages over conventional neural network solutions. However, to operate effectively the number and positions of the basis function centres must be carefully selected. Although no rigorous algorithm exists for this purpose, several heuristic methods have been suggested. In this paper a new method is proposed in which radial basis function centres are selected by the mean-tracking clustering algorithm. The mean-tracking algorithm is compared with k-means clustering and it is shown that it achieves significantly better results in terms of radial basis function performance. As well as being computationally simpler, the mean-tracking algorithm in general selects better centre positions, thus providing the radial basis functions with better modelling accuracy.
Abstract:
A fast backward elimination algorithm is introduced based on a QR decomposition and Givens transformations to prune radial-basis-function networks. Nodes are sequentially removed using an increment of error variance criterion. The procedure is terminated by using a prediction risk criterion so as to obtain a model structure with good generalisation properties. The algorithm can be used to postprocess radial basis centres selected using a k-means routine and, in this mode, it provides a hybrid supervised centre selection approach.
Abstract:
This paper deals with the selection of centres for radial basis function (RBF) networks. A novel mean-tracking clustering algorithm is described as a way in which centres can be chosen based on a batch of collected data. A direct comparison is made between the mean-tracking algorithm and k-means clustering and it is shown how mean-tracking clustering is significantly better in terms of achieving an RBF network which performs accurate function modelling.
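For orientation, the k-means baseline mentioned in the RBF abstracts above can be sketched as follows. This is a minimal illustration of centre selection by plain Lloyd's k-means, not the mean-tracking algorithm and not the authors' code. The function names `kmeans` and `rbf_features`, the Gaussian basis function, its `width` parameter, the first-k-points initialisation, and the toy data are all assumptions made for the example.

```python
import math

def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means; starting from the first k points is an assumption."""
    centres = [list(p) for p in points[:k]]
    for _ in range(iters):
        # Assignment step: group each point with its nearest centre.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centres[j]))
            groups[nearest].append(p)
        # Update step: move each centre to the mean of its group
        # (an empty group keeps its previous centre).
        new_centres = [
            [sum(col) / len(g) for col in zip(*g)] if g else centres[j]
            for j, g in enumerate(groups)
        ]
        if new_centres == centres:
            break
        centres = new_centres
    return centres

def rbf_features(x, centres, width=1.0):
    """Gaussian basis function activations of input x, one per centre."""
    return [math.exp(-math.dist(x, c) ** 2 / (2 * width ** 2)) for c in centres]

# Toy two-blob data, invented for illustration only.
data = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0), (1.0, 0.0), (11.0, 10.0)]
centres = kmeans(data, k=2)
features = rbf_features((0.0, 0.0), centres)
```

The resulting `centres` would serve as the basis function centres, and `features` as the hidden-layer outputs on which the linear output weights of an RBF network are fitted.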
Abstract:
Global communication requirements and load imbalance of some parallel data mining algorithms are the major obstacles to exploiting the computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication cost in parallel data mining algorithms and, in particular, in the k-means algorithm for cluster analysis. In the straightforward parallel formulation of the k-means algorithm, data and computation loads are uniformly distributed over the processing nodes. This approach has excellent load balancing characteristics that may suggest it could scale up to large and extreme-scale parallel computing systems. However, at each iteration step the algorithm requires a global reduction operation which hinders the scalability of the approach. This work studies a different parallel formulation of the algorithm where the requirement of global communication is removed, while maintaining the same deterministic nature of the centralised algorithm. The proposed approach exploits a non-uniform data distribution which can be either found in real-world distributed applications or can be induced by means of multi-dimensional binary search trees. The approach can also be extended to accommodate an approximation error which allows a further reduction of the communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing elements.
Abstract:
A statistical–dynamical regionalization approach is developed to assess possible changes in wind storm impacts. The method is applied to North Rhine-Westphalia (Western Germany) using the FOOT3DK mesoscale model for dynamical downscaling and ECHAM5/OM1 global circulation model climate projections. The method first classifies typical weather developments within the reanalysis period using a K-means cluster algorithm. Most historical wind storms are associated with four weather developments (primary storm-clusters). Mesoscale simulations are performed for representative elements for all clusters to derive regional wind climatology. Additionally, 28 historical storms affecting Western Germany are simulated. Empirical functions are estimated to relate wind gust fields and insured losses. Transient ECHAM5/OM1 simulations show an enhanced frequency of primary storm-clusters and storms for 2060–2100 compared to 1960–2000. Accordingly, wind gusts increase over Western Germany, reaching locally +5% for 98th wind gust percentiles (A2-scenario). Consequently, storm losses are expected to increase substantially (+8% for A1B-scenario, +19% for A2-scenario). Regional patterns show larger changes over north-eastern parts of North Rhine-Westphalia than for western parts. For storms with return periods above 20 yr, loss expectations for Germany may increase by a factor of 2. These results document the method's functionality to assess future changes in loss potentials in regional terms.
Abstract:
Boreal winter wind storm situations over Central Europe are investigated by means of an objective cluster analysis. Surface data from the NCEP-Reanalysis and ECHAM4/OPYC3-climate change GHG simulation (IS92a) are considered. To achieve an optimum separation of clusters of extreme storm conditions, 55 clusters of weather patterns are differentiated. To reduce the computational effort, a PCA is initially performed, leading to a data reduction of about 98 %. The clustering itself was computed on 3-day periods constructed with the first six PCs using the "k-means" clustering algorithm. The applied method enables an evaluation of the time evolution of the synoptic developments. The climate change signal is constructed by a projection of the GCM simulation on the EOFs attained from the NCEP-Reanalysis. Consequently, the same clusters are obtained and frequency distributions can be compared. For Central Europe, four primary storm clusters are identified. These clusters feature almost 72 % of the historical extreme storm events and account for only 5 % of the total relative frequency. Moreover, they show a statistically significant signature in the associated wind fields over Europe. An increased frequency of Central European storm clusters is detected with enhanced GHG conditions, associated with an enhancement of the pressure gradient over Central Europe. Consequently, more intense wind events over Central Europe are expected. The presented algorithm will be highly valuable for the analysis of huge amounts of data, as is required for e.g. multi-model ensemble analysis, particularly because of the enormous data reduction.
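One possible reading of "3-day periods constructed with the first six PCs" is that each clustering sample concatenates the leading PC scores of three consecutive days. The sketch below illustrates that reading only. The function name `three_day_vectors`, the use of overlapping (sliding) windows, and the synthetic scores are assumptions, since the abstract does not specify them.

```python
def three_day_vectors(pc_scores, n_pcs=6, span=3):
    """Concatenate `span` consecutive days of the leading `n_pcs` PC scores.

    Overlapping (sliding) windows are an assumption; the abstract does not
    say whether the 3-day periods overlap.
    """
    truncated = [day[:n_pcs] for day in pc_scores]
    return [
        [value for day in truncated[start:start + span] for value in day]
        for start in range(len(truncated) - span + 1)
    ]

# Synthetic PC scores: 5 days x 8 components, values chosen only for illustration.
scores = [[day * 10 + pc for pc in range(8)] for day in range(5)]
vectors = three_day_vectors(scores)
```

Each entry of `vectors` has 3 x 6 = 18 components and would be one input sample for the k-means step, so a cluster describes a short sequence of weather patterns rather than a single day.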
Abstract:
Exascale systems are the next frontier in high-performance computing and are expected to deliver a performance of the order of 10^18 operations per second using massive multicore processors. Very large- and extreme-scale parallel systems pose critical algorithmic challenges, especially related to concurrency, locality and the need to avoid global communication patterns. This work investigates a novel protocol for dynamic group communication that can be used to remove the global communication requirement and to reduce the communication cost in parallel formulations of iterative data mining algorithms. The protocol is used to provide a communication-efficient parallel formulation of the k-means algorithm for cluster analysis. The approach is based on a collective communication operation for dynamic groups of processes and exploits non-uniform data distributions. Non-uniform data distributions can be either found in real-world distributed applications or induced by means of multidimensional binary search trees. The analysis of the proposed dynamic group communication protocol has shown that it does not introduce significant communication overhead. The parallel clustering algorithm has also been extended to accommodate an approximation error, which allows a further reduction of the communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing elements.
Abstract:
Global communication requirements and load imbalance of some parallel data mining algorithms are the major obstacles to exploiting the computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication cost in iterative parallel data mining algorithms. In particular, the analysis focuses on one of the most influential and popular data mining methods, the k-means algorithm for cluster analysis. The straightforward parallel formulation of the k-means algorithm requires a global reduction operation at each iteration step, which hinders its scalability. This work studies a different parallel formulation of the algorithm where the requirement of global communication can be relaxed while still providing the exact solution of the centralised k-means algorithm. The proposed approach exploits a non-uniform data distribution which can be either found in real world distributed applications or can be induced by means of multi-dimensional binary search trees. The approach can also be extended to accommodate an approximation error which allows a further reduction of the communication costs.
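To make the scalability issue concrete, the straightforward parallel formulation described above can be simulated in a single process. The sketch shows only the baseline with one global reduction per iteration, not the communication-efficient formulation the authors propose. The function names `local_stats`, `global_reduce` and `parallel_kmeans`, the data chunks and the initial centres are all hypothetical, and the "nodes" are ordinary list entries rather than real processes.

```python
import math

def local_stats(chunk, centres):
    """Per-node work: partial coordinate sums and point counts per cluster."""
    k, dim = len(centres), len(centres[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in chunk:
        j = min(range(k), key=lambda c: math.dist(p, centres[c]))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts

def global_reduce(partials):
    """Stand-in for the global reduction: add up every node's partial results."""
    k, dim = len(partials[0][1]), len(partials[0][0][0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for node_sums, node_counts in partials:
        for j in range(k):
            counts[j] += node_counts[j]
            for d in range(dim):
                sums[j][d] += node_sums[j][d]
    return sums, counts

def parallel_kmeans(chunks, centres, iters=50):
    for _ in range(iters):
        # On a real system each chunk would be processed on its own node.
        partials = [local_stats(chunk, centres) for chunk in chunks]
        # Every node must take part in this step once per iteration.
        sums, counts = global_reduce(partials)
        updated = [
            [s / counts[j] for s in sums[j]] if counts[j] else centres[j]
            for j in range(len(centres))
        ]
        if updated == centres:
            break
        centres = updated
    return centres

# Three hypothetical nodes, each holding a uniform share of the data.
chunks = [
    [(0.0, 0.0), (10.0, 10.0)],
    [(0.0, 2.0), (10.0, 12.0)],
    [(2.0, 0.0), (12.0, 10.0)],
]
final_centres = parallel_kmeans(chunks, centres=[[0.0, 0.0], [10.0, 10.0]])
```

Because every node contributes to every centroid update, the reduction involves all nodes at every iteration. That is the communication pattern the abstract identifies as the obstacle to scaling.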
Abstract:
Background: The validity of ensemble averaging on event-related potential (ERP) data has been questioned, due to its assumption that the ERP is identical across trials. Thus, there is a need for preliminary testing for cluster structure in the data. New method: We propose a complete pipeline for the cluster analysis of ERP data. To increase the signal-to-noise ratio (SNR) of the raw single-trials, we used a denoising method based on Empirical Mode Decomposition (EMD). Next, we used a bootstrap-based method to determine the number of clusters, through a measure called the Stability Index (SI). We then used a clustering algorithm based on a Genetic Algorithm (GA) to define initial cluster centroids for subsequent k-means clustering. Finally, we visualised the clustering results through a scheme based on Principal Component Analysis (PCA). Results: After validating the pipeline on simulated data, we tested it on data from two experiments – a P300 speller paradigm on a single subject and a language processing study on 25 subjects. Results revealed evidence for the existence of 6 clusters in one experimental condition from the language processing study. Further, a two-way chi-square test revealed an influence of subject on cluster membership.
Abstract:
Extratropical transition (ET) has eluded objective identification since the realisation of its existence in the 1970s. Recent advances in numerical, computational models have provided data of higher resolution than previously available. In conjunction with this, an objective characterisation of the structure of a storm has now become widely accepted in the literature. Here we present a method of combining these two advances to provide an objective method for defining ET. The approach involves applying K-means clustering to isolate different life-cycle stages of cyclones and then analysing the progression through these stages. This methodology is then tested by applying it to five recent years from the European Centre of Medium-Range Weather Forecasting operational analyses. It is found that this method is able to determine the general characteristics for ET in the Northern Hemisphere. Between 2008 and 2012, 54% (±7, 32 of 59) of Northern Hemisphere tropical storms are estimated to undergo ET. There is great variability across basins and time of year. To fully capture all the instances of ET it is necessary to introduce and characterise multiple pathways through transition. Only one of the three transition types needed has been previously well-studied. A brief description of the alternate types of transitions is given, along with illustrative storms, to assist with further study.