1000 results for K......
Abstract:
K-Means is a popular clustering algorithm which adopts an iterative refinement procedure to determine data partitions and to compute their associated centres of mass, called centroids. The straightforward implementation of the algorithm is often referred to as 'brute force' since it computes a proximity measure from each data point to each centroid at every iteration of the K-Means process. Efficient implementations of the K-Means algorithm have been predominantly based on multi-dimensional binary search trees (KD-Trees). A combination of an efficient data structure and geometrical constraints allows the number of distance computations required at each iteration to be reduced. In this work we present a general space partitioning approach for improving the efficiency and the scalability of the K-Means algorithm. We propose to adopt approximate hierarchical clustering methods to generate binary space partitioning trees, in contrast to KD-Trees. In the experimental analysis, we have tested the performance of the proposed Binary Space Partitioning K-Means (BSP-KM) when a divisive clustering algorithm is used. We have carried out extensive experimental tests to compare the proposed approach to the one based on KD-Trees (KD-KM) over a wide range of the parameter space. BSP-KM is more scalable than KD-KM, while keeping the deterministic nature of the 'brute force' algorithm. In particular, the proposed space partitioning approach has been shown to overcome the well-known limitation of KD-Trees in high-dimensional spaces and can also be adopted to improve the efficiency of other algorithms in which KD-Trees have been used.
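The 'brute force' baseline described in this abstract can be illustrated with a minimal sketch. It is not the BSP-KM or KD-KM method, whose details the abstract does not give. The function name `brute_force_kmeans`, the Euclidean distance and the handling of empty clusters are assumptions made for illustration only.

```python
import math

def brute_force_kmeans(points, centroids, max_iter=100):
    """Plain K-Means: every point is compared with every centroid at each iteration.

    Hypothetical helper for illustration; Euclidean distance is an assumption.
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: n * k distance computations (the 'brute force' cost).
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the centre of mass of its partition.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(old)
        if new_centroids == centroids:
            break  # centroids no longer move, so the partition is stable
        centroids = new_centroids
    return centroids, labels
```

Given the same initial centroids, this procedure always returns the same result, which is the deterministic behaviour the abstract says BSP-KM preserves. Tree-based variants aim to skip many of the distance computations in the assignment step.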
Abstract:
Radial basis function networks can be trained quickly using linear optimisation once centres and other associated parameters have been initialised. The authors propose a small adjustment to a well-accepted initialisation algorithm which improves the network accuracy over a range of problems. The algorithm is described and results are presented.
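The linear training step mentioned in this abstract can be sketched as follows: once the centres and width are fixed, the output weights come from a linear least-squares problem. The abstract does not describe the authors' initialisation adjustment, so none is shown here. The Gaussian basis, the bias column, the small `ridge` term and all function names are assumptions made for illustration.

```python
import math

def gaussian_design(xs, centres, width):
    """Design matrix: one Gaussian basis function per centre, plus a bias column."""
    return [
        [math.exp(-((x - c) ** 2) / (2 * width ** 2)) for c in centres] + [1.0]
        for x in xs
    ]

def solve(a, b):
    """Solve the square system a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w

def fit_output_weights(xs, ys, centres, width, ridge=1e-8):
    """Least-squares output weights from the normal equations.

    The tiny ridge term is an assumption added for numerical stability.
    """
    phi = gaussian_design(xs, centres, width)
    k = len(phi[0])
    gram = [
        [sum(row[i] * row[j] for row in phi) + (ridge if i == j else 0.0)
         for j in range(k)]
        for i in range(k)
    ]
    rhs = [sum(row[i] * y for row, y in zip(phi, ys)) for i in range(k)]
    return solve(gram, rhs)

def predict(xs, centres, width, weights):
    """Network output: weighted sum of the basis activations for each input."""
    return [
        sum(p * w for p, w in zip(row, weights))
        for row in gaussian_design(xs, centres, width)
    ]
```

Because the weights enter the output linearly, no iterative gradient descent is needed for this step. Fit quality then depends mainly on how the centres and widths were initialised.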
Abstract:
Measurement is reported at 4 K (and blocked transmission below 10^-5) of PbTe/ZnS thin-film filters deposited on Ge substrates. The reduced carrier absorption obtained by cooling these PbTe films is found to accord with simple theory. The advantage of cooling for various high-performance multilayers is significant at the longer wavelengths, and has been verified.
Abstract:
Investment risk models with infinite variance provide a better description of distributions of individual property returns in the IPD database over the period 1981 to 2003 than Normally distributed risk models, which mirrors results in the U.S. and Australia using identical methodology. Real estate investment risk is heteroscedastic, but the Characteristic Exponent of the investment risk function is constant across time yet may vary by property type. Asset diversification is far less effective at reducing the impact of non-systematic investment risk on real estate portfolios than in the case of assets with Normally distributed investment risk. Multi-risk factor portfolio allocation models based on measures of investment codependence from finite-variance statistics are ineffectual in the real estate context.
Abstract:
A four-wavelength MAD experiment on a new brominated octanucleotide is reported here. d[ACGTACG(5-BrU)], C77H81BrN30O32P7, (DNA) = 2235, tetragonal, P43212 (No. 96), a = 43.597, c = 26.268 Å, V = 49927.5 Å³, Z = 8, T = 100 K, R = 10.91% for 4312 reflections between 15.0 and 1.46 Å resolution. The self-complementary brominated octanucleotide d[ACGTACG(5-BrU)]2 has been crystallized, and data were measured to 1.45 Å both at 293 K and, from a second flash-frozen crystal, at 100 K. The latter data collection was carried out to the same resolution at the four wavelengths 0.9344, 0.9216, 0.9208 and 0.9003 Å, around the Br K edge at 0.92 Å, and the structure was determined from a map derived from a MAD data analysis using pseudo-MIR methodology, as implemented in the program MLPHARE. This is one of the first successful MAD phasing experiments carried out at Sincrotrone Elettra in Trieste, Italy. The structure was refined using the data measured at 0.9003 Å, anisotropic temperature factors and the restrained least-squares refinement implemented in the program SHELX96, and the helical parameters are compared with those previously determined for the isomorphous d(ACGTACGT)2 analogue. The asymmetric unit consists of a single strand of octamer with 96 water molecules. No countercations were located. The A-DNA helix geometry obtained has been analysed using the CURVES program.
Abstract:
The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or high latency in communication paths. This work proposes a fully decentralised algorithm (Epidemic K-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against the state-of-the-art distributed K-Means algorithms based on sampling methods. The experimental analysis confirms that the proposed algorithm is a practical and accurate distributed K-Means implementation for networked systems of very large and extreme scale.
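The abstract does not specify the Epidemic K-Means protocol, so the following is only a generic sketch of how a centroid might be estimated without global communication. It uses pairwise gossip averaging, which is an assumption and not the authors' method. The function name `gossip_centroid`, the one-dimensional data and the uniformly random choice of peers are likewise illustrative.

```python
import random

def gossip_centroid(local_sums, local_counts, rounds=500, seed=0):
    """Estimate a global centroid by repeated pairwise averaging between nodes.

    Each node starts with the sum and count of its own points in one cluster.
    Hypothetical illustration; not the protocol from the abstract.
    """
    rng = random.Random(seed)
    sums = [float(s) for s in local_sums]
    counts = [float(c) for c in local_counts]
    n = len(sums)
    for _ in range(rounds):
        # Two random nodes exchange state and both keep the average.
        i, j = rng.sample(range(n), 2)
        sums[i] = sums[j] = (sums[i] + sums[j]) / 2
        counts[i] = counts[j] = (counts[i] + counts[j]) / 2
    # Each node's local ratio approaches (global sum) / (global count).
    return [s / c for s, c in zip(sums, counts)]
```

Each exchange preserves the network-wide totals, so every node's estimate drifts towards the centroid a centralised algorithm would compute over the aggregated data. More rounds give a closer approximation.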