3 resultados para algorithm Context

em Universidad de Alicante


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Outliers are objects that show abnormal behavior with respect to their context or that have unexpected values in some of their parameters. In decision-making processes, information quality is of the utmost importance. In specific applications, an outlying data element may represent an important deviation in a production process or a damaged sensor. Therefore, the ability to detect these elements could make the difference between making a correct and an incorrect decision. This task is complicated by the large sizes of typical databases. Due to their importance in search processes in large volumes of data, researchers pay special attention to the development of efficient outlier detection techniques. This article presents a computationally efficient algorithm for the detection of outliers in large volumes of information. This proposal is based on an extension of the mathematical framework upon which the basic theory of detection of outliers, founded on Rough Set Theory, has been constructed. From this starting point, current problems are analyzed; a detection method is proposed, along with a computational algorithm that allows the performance of outlier detection tasks with an almost-linear complexity. To illustrate its viability, the results of the application of the outlier-detection algorithm to the concrete example of a large database are presented.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The Iterative Closest Point algorithm (ICP) is commonly used in engineering applications to solve the rigid registration problem of partially overlapped point sets which are pre-aligned with a coarse estimate of their relative positions. This iterative algorithm is applied in many areas such as the medicine for volumetric reconstruction of tomography data, in robotics to reconstruct surfaces or scenes using range sensor information, in industrial systems for quality control of manufactured objects or even in biology to study the structure and folding of proteins. One of the algorithm’s main problems is its high computational complexity (quadratic in the number of points with the non-optimized original variant) in a context where high density point sets, acquired by high resolution scanners, are processed. Many variants have been proposed in the literature whose goal is the performance improvement either by reducing the number of points or the required iterations or even enhancing the complexity of the most expensive phase: the closest neighbor search. In spite of decreasing its complexity, some of the variants tend to have a negative impact on the final registration precision or the convergence domain thus limiting the possible application scenarios. The goal of this work is the improvement of the algorithm’s computational cost so that a wider range of computationally demanding problems from among the ones described before can be addressed. For that purpose, an experimental and mathematical convergence analysis and validation of point-to-point distance metrics has been performed taking into account those distances with lower computational cost than the Euclidean one, which is used as the de facto standard for the algorithm’s implementations in the literature. In that analysis, the functioning of the algorithm in diverse topological spaces, characterized by different metrics, has been studied to check the convergence, efficacy and cost of the method in order to determine the one which offers the best results. Given that the distance calculation represents a significant part of the whole set of computations performed by the algorithm, it is expected that any reduction of that operation affects significantly and positively the overall performance of the method. As a result, a performance improvement has been achieved by the application of those reduced cost metrics whose quality in terms of convergence and error has been analyzed and validated experimentally as comparable with respect to the Euclidean distance using a heterogeneous set of objects, scenarios and initial situations.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper proposes an adaptive algorithm for clustering cumulative probability distribution functions (c.p.d.f.) of a continuous random variable, observed in different populations, into the minimum homogeneous clusters, making no parametric assumptions about the c.p.d.f.’s. The distance function for clustering c.p.d.f.’s that is proposed is based on the Kolmogorov–Smirnov two sample statistic. This test is able to detect differences in position, dispersion or shape of the c.p.d.f.’s. In our context, this statistic allows us to cluster the recorded data with a homogeneity criterion based on the whole distribution of each data set, and to decide whether it is necessary to add more clusters or not. In this sense, the proposed algorithm is adaptive as it automatically increases the number of clusters only as necessary; therefore, there is no need to fix in advance the number of clusters. The output of the algorithm are the common c.p.d.f. of all observed data in the cluster (the centroid) and, for each cluster, the Kolmogorov–Smirnov statistic between the centroid and the most distant c.p.d.f. The proposed algorithm has been used for a large data set of solar global irradiation spectra distributions. The results obtained enable to reduce all the information of more than 270,000 c.p.d.f.’s in only 6 different clusters that correspond to 6 different c.p.d.f.’s.