854 resultados para data gathering algorithm
Resumo:
The aim of this paper is to propose a mathematical model to determine invariant sets, set covering, orbits and, in particular, attractors in the set of tourism variables. Analysis was carried out based on a pre-designed algorithm and applying our interpretation of chaos theory developed in the context of General Systems Theory. This article sets out the causal relationships associated with tourist flows in order to enable the formulation of appropriate strategies. Our results can be applied to numerous cases. For example, in the analysis of tourist flows, these findings can be used to determine whether the behaviour of certain groups affects that of other groups and to analyse tourist behaviour in terms of the most relevant variables. Unlike statistical analyses that merely provide information on current data, our method uses orbit analysis to forecast, if attractors are found, the behaviour of tourist variables in the immediate future.
Resumo:
In this paper, we present a simple algorithm for assessing the validity of the RVoG model for PolInSAR-based inversion techniques. This approach makes use of two important features characterizing a homogeneous random volume over a ground surface, i.e., the independence on polarization states of wave propagation through the volume and the structure of the polarimetric interferometric coherency matrix. These two features have led to two different methods proposed in the literature for retrieving the topographic phase within natural covers, i.e., the well-known line fitting procedure and the observation of the (1, 2) element of the polarimetric interferometric coherency matrix. We show that differences between outputs from both approaches can be interpreted in terms of the PolInSAR modeling based on the Freeman-Durden concept, and this leads to the definition of a RVoG/non-RVoG test. The algorithm is tested with both indoor and airborne data over agricultural and tropical forest areas.
Resumo:
Outliers are objects that show abnormal behavior with respect to their context or that have unexpected values in some of their parameters. In decision-making processes, information quality is of the utmost importance. In specific applications, an outlying data element may represent an important deviation in a production process or a damaged sensor. Therefore, the ability to detect these elements could make the difference between making a correct and an incorrect decision. This task is complicated by the large sizes of typical databases. Due to their importance in search processes in large volumes of data, researchers pay special attention to the development of efficient outlier detection techniques. This article presents a computationally efficient algorithm for the detection of outliers in large volumes of information. This proposal is based on an extension of the mathematical framework upon which the basic theory of detection of outliers, founded on Rough Set Theory, has been constructed. From this starting point, current problems are analyzed; a detection method is proposed, along with a computational algorithm that allows the performance of outlier detection tasks with an almost-linear complexity. To illustrate its viability, the results of the application of the outlier-detection algorithm to the concrete example of a large database are presented.
Resumo:
The Iterative Closest Point algorithm (ICP) is commonly used in engineering applications to solve the rigid registration problem of partially overlapped point sets which are pre-aligned with a coarse estimate of their relative positions. This iterative algorithm is applied in many areas such as the medicine for volumetric reconstruction of tomography data, in robotics to reconstruct surfaces or scenes using range sensor information, in industrial systems for quality control of manufactured objects or even in biology to study the structure and folding of proteins. One of the algorithm’s main problems is its high computational complexity (quadratic in the number of points with the non-optimized original variant) in a context where high density point sets, acquired by high resolution scanners, are processed. Many variants have been proposed in the literature whose goal is the performance improvement either by reducing the number of points or the required iterations or even enhancing the complexity of the most expensive phase: the closest neighbor search. In spite of decreasing its complexity, some of the variants tend to have a negative impact on the final registration precision or the convergence domain thus limiting the possible application scenarios. The goal of this work is the improvement of the algorithm’s computational cost so that a wider range of computationally demanding problems from among the ones described before can be addressed. For that purpose, an experimental and mathematical convergence analysis and validation of point-to-point distance metrics has been performed taking into account those distances with lower computational cost than the Euclidean one, which is used as the de facto standard for the algorithm’s implementations in the literature. In that analysis, the functioning of the algorithm in diverse topological spaces, characterized by different metrics, has been studied to check the convergence, efficacy and cost of the method in order to determine the one which offers the best results. Given that the distance calculation represents a significant part of the whole set of computations performed by the algorithm, it is expected that any reduction of that operation affects significantly and positively the overall performance of the method. As a result, a performance improvement has been achieved by the application of those reduced cost metrics whose quality in terms of convergence and error has been analyzed and validated experimentally as comparable with respect to the Euclidean distance using a heterogeneous set of objects, scenarios and initial situations.
Resumo:
This paper proposes an adaptive algorithm for clustering cumulative probability distribution functions (c.p.d.f.) of a continuous random variable, observed in different populations, into the minimum homogeneous clusters, making no parametric assumptions about the c.p.d.f.’s. The distance function for clustering c.p.d.f.’s that is proposed is based on the Kolmogorov–Smirnov two sample statistic. This test is able to detect differences in position, dispersion or shape of the c.p.d.f.’s. In our context, this statistic allows us to cluster the recorded data with a homogeneity criterion based on the whole distribution of each data set, and to decide whether it is necessary to add more clusters or not. In this sense, the proposed algorithm is adaptive as it automatically increases the number of clusters only as necessary; therefore, there is no need to fix in advance the number of clusters. The output of the algorithm are the common c.p.d.f. of all observed data in the cluster (the centroid) and, for each cluster, the Kolmogorov–Smirnov statistic between the centroid and the most distant c.p.d.f. The proposed algorithm has been used for a large data set of solar global irradiation spectra distributions. The results obtained enable to reduce all the information of more than 270,000 c.p.d.f.’s in only 6 different clusters that correspond to 6 different c.p.d.f.’s.
Resumo:
In this study, a methodology based in a dynamical framework is proposed to incorporate additional sources of information to normalized difference vegetation index (NDVI) time series of agricultural observations for a phenological state estimation application. The proposed implementation is based on the particle filter (PF) scheme that is able to integrate multiple sources of data. Moreover, the dynamics-led design is able to conduct real-time (online) estimations, i.e., without requiring to wait until the end of the campaign. The evaluation of the algorithm is performed by estimating the phenological states over a set of rice fields in Seville (SW, Spain). A Landsat-5/7 NDVI series of images is complemented with two distinct sources of information: SAR images from the TerraSAR-X satellite and air temperature information from a ground-based station. An improvement in the overall estimation accuracy is obtained, especially when the time series of NDVI data is incomplete. Evaluations on the sensitivity to different development intervals and on the mitigation of discontinuities of the time series are also addressed in this work, demonstrating the benefits of this data fusion approach based on the dynamic systems.
Resumo:
In the wake of the disclosures surrounding PRISM and other US surveillance programmes, this paper assesses the large-scale surveillance practices by a selection of EU member states: the UK, Sweden, France, Germany and the Netherlands. Given the large-scale nature of these practices, which represent a reconfiguration of traditional intelligence gathering, the paper contends that an analysis of European surveillance programmes cannot be reduced to a question of the balance between data protection versus national security, but has to be framed in terms of collective freedoms and democracy. It finds that four of the five EU member states selected for in-depth examination are engaging in some form of large-scale interception and surveillance of communication data, and identifies parallels and discrepancies between these programmes and the NSA-run operations. The paper argues that these programmes do not stand outside the realm of EU intervention but can be analysed from an EU law perspective via i) an understanding of national security in a democratic rule of law framework where fundamental human rights and judicial oversight constitute key norms; ii) the risks posed to the internal security of the Union as a whole as well as the privacy of EU citizens as data owners and iii) the potential spillover into the activities and responsibilities of EU agencies. The paper then presents a set of policy recommendations to the European Parliament.
Resumo:
This study examines the challenges posed to European law by third country access to data held by private companies for the purposes of law enforcement. It pays particular attention to the implications for rule of law and fundamental rights of foreign authorities’ direct access to electronic information falling outside pre-established channels of supranational cooperation. A special focus is given to EU-US relations and the practical issues emerging in transatlantic relations covering mutual legal assistance and evidence gathering for law enforcement purposes in criminal proceedings.
Resumo:
Connection establishment is a fundamental function for any connection-oriented network protocol and the efficiency of this function defines the flexibility and responsiveness of the protocol. This process initializes data transmission and performs transmission parameters negotiation, what makes it mandatory process and integral part of entire transmission. Thus, the duration of the connection establishment will affect the transmission process duration. This paper describes an implementation of a handshake algorithm, designed for connection with multiple peers, that is used in Reliable Multi-Destination Transport(RMDT) protocol, its optimization and testing.
Resumo:
Thesis (Ph.D.)--University of Washington, 2016-06
Resumo:
Mixture models implemented via the expectation-maximization (EM) algorithm are being increasingly used in a wide range of problems in pattern recognition such as image segmentation. However, the EM algorithm requires considerable computational time in its application to huge data sets such as a three-dimensional magnetic resonance (MR) image of over 10 million voxels. Recently, it was shown that a sparse, incremental version of the EM algorithm could improve its rate of convergence. In this paper, we show how this modified EM algorithm can be speeded up further by adopting a multiresolution kd-tree structure in performing the E-step. The proposed algorithm outperforms some other variants of the EM algorithm for segmenting MR images of the human brain. (C) 2004 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
Resumo:
The expectation-maximization (EM) algorithm has been of considerable interest in recent years as the basis for various algorithms in application areas of neural networks such as pattern recognition. However, there exists some misconceptions concerning its application to neural networks. In this paper, we clarify these misconceptions and consider how the EM algorithm can be adopted to train multilayer perceptron (MLP) and mixture of experts (ME) networks in applications to multiclass classification. We identify some situations where the application of the EM algorithm to train MLP networks may be of limited value and discuss some ways of handling the difficulties. For ME networks, it is reported in the literature that networks trained by the EM algorithm using iteratively reweighted least squares (IRLS) algorithm in the inner loop of the M-step, often performed poorly in multiclass classification. However, we found that the convergence of the IRLS algorithm is stable and that the log likelihood is monotonic increasing when a learning rate smaller than one is adopted. Also, we propose the use of an expectation-conditional maximization (ECM) algorithm to train ME networks. Its performance is demonstrated to be superior to the IRLS algorithm on some simulated and real data sets.
Resumo:
Finding motifs that can elucidate rules that govern peptide binding to medically important receptors is important for screening targets for drugs and vaccines. This paper focuses on elucidation of peptide binding to I-A(g7) molecule of the non-obese diabetic (NOD) mouse - an animal model for insulin-dependent diabetes mellitus (IDDM). A number of proposed motifs that describe peptide binding to I-A(g7) have been proposed. These motifs results from independent experimental studies carried out on small data sets. Testing with multiple data sets showed that each of the motifs at best describes only a subset of the solution space, and these motifs therefore lack generalization ability. This study focuses on seeking a motif with higher generalization ability so that it can predict binders in all A(g7) data sets with high accuracy. A binding score matrix representing peptide binding motif to A(g7) was derived using genetic algorithm (GA). The evolved score matrix significantly outperformed previously reported
Resumo:
Background: The multitude of motif detection algorithms developed to date have largely focused on the detection of patterns in primary sequence. Since sequence-dependent DNA structure and flexibility may also play a role in protein-DNA interactions, the simultaneous exploration of sequence-and structure-based hypotheses about the composition of binding sites and the ordering of features in a regulatory region should be considered as well. The consideration of structural features requires the development of new detection tools that can deal with data types other than primary sequence. Results: GANN ( available at http://bioinformatics.org.au/gann) is a machine learning tool for the detection of conserved features in DNA. The software suite contains programs to extract different regions of genomic DNA from flat files and convert these sequences to indices that reflect sequence and structural composition or the presence of specific protein binding sites. The machine learning component allows the classification of different types of sequences based on subsamples of these indices, and can identify the best combinations of indices and machine learning architecture for sequence discrimination. Another key feature of GANN is the replicated splitting of data into training and test sets, and the implementation of negative controls. In validation experiments, GANN successfully merged important sequence and structural features to yield good predictive models for synthetic and real regulatory regions. Conclusion: GANN is a flexible tool that can search through large sets of sequence and structural feature combinations to identify those that best characterize a set of sequences.
Resumo:
We present a new algorithm for detecting intercluster galaxy filaments based upon the assumption that the orientations of constituent galaxies along such filaments are non-isotropic. We apply the algorithm to the 2dF Galaxy Redshift Survey catalogue and find that it readily detects many straight filaments between close cluster pairs. At large intercluster separations (> 15 h(-1) Mpc), we find that the detection efficiency falls quickly, as it also does with more complex filament morphologies. We explore the underlying assumptions and suggest that it is only in the case of close cluster pairs that we can expect galaxy orientations to be significantly correlated with filament direction.