854 results for data gathering algorithm
Abstract:
Master's degree in Radiotherapy
Abstract:
Project work submitted to obtain the degree of Master in Informatics and Computer Engineering
Abstract:
Linear unmixing decomposes a hyperspectral image into a collection of reflectance spectra of the materials present in the scene, called endmember signatures, and the corresponding abundance fractions at each pixel in a spatial area of interest. This paper introduces a new unmixing method, called Dependent Component Analysis (DECA), which overcomes the limitations of unmixing methods based on Independent Component Analysis (ICA) and on geometrical properties of hyperspectral data. DECA models the abundance fractions as mixtures of Dirichlet densities, thus enforcing the constraints on abundance fractions imposed by the acquisition process, namely non-negativity and constant sum. The mixing matrix is inferred by a generalized expectation-maximization (GEM) type algorithm. The performance of the method is illustrated using simulated and real data.
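For context, the linear mixing model underlying such unmixing methods writes each observed pixel spectrum as a non-negative, sum-to-one combination of the endmember signatures plus noise. The short Python sketch below only illustrates this generative model and the abundance constraints that DECA enforces through Dirichlet densities; it is not the authors' GEM implementation, and all names and dimensions are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    n_bands, n_endmembers, n_pixels = 50, 3, 1000

    # Endmember signatures are the columns of the mixing matrix M (random here, for illustration).
    M = rng.uniform(0.0, 1.0, size=(n_bands, n_endmembers))

    # Abundance fractions drawn from a Dirichlet density: non-negative and summing to one,
    # matching the constraints imposed by the acquisition process.
    A = rng.dirichlet(alpha=[2.0, 5.0, 3.0], size=n_pixels).T   # shape (n_endmembers, n_pixels)

    # Linear mixing model: each pixel is M @ a plus additive noise.
    Y = M @ A + 0.01 * rng.standard_normal((n_bands, n_pixels))

    assert np.all(A >= 0) and np.allclose(A.sum(axis=0), 1.0)
    print("observed hyperspectral data:", Y.shape)   # (50, 1000)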
Abstract:
Conference - 16th International Symposium on Wireless Personal Multimedia Communications (WPMC), Jun 24-27, 2013
Abstract:
Research on the problem of feature selection for clustering continues to develop. This is a challenging task, mainly due to the absence of class labels to guide the search for relevant features. Categorical feature selection for clustering has rarely been addressed in the literature, with most of the proposed approaches having focused on numerical data. In this work, we propose an approach that simultaneously clusters categorical data and selects a subset of relevant features. Our approach is based on a modification of a finite mixture model (of multinomial distributions), in which a set of latent variables indicates the relevance of each feature. To estimate the model parameters, we implement a variant of the expectation-maximization algorithm that simultaneously selects the subset of relevant features, using a minimum message length criterion. The proposed approach compares favourably with two baseline methods: a filter based on an entropy measure and a wrapper based on mutual information. The results obtained on synthetic data illustrate the ability of the proposed expectation-maximization method to recover the ground truth. An application to real data from official statistics shows its usefulness.
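To make the model structure concrete, the sketch below draws synthetic categorical data from a finite mixture of multinomials in which a latent indicator decides whether a feature follows cluster-specific parameters (relevant) or a common distribution shared by all clusters (irrelevant). This is a hypothetical illustration of the generative assumption only; the EM / minimum-message-length estimation procedure of the paper is not reproduced.

    import numpy as np

    rng = np.random.default_rng(1)
    n_samples, n_features, n_categories, n_clusters = 500, 6, 4, 2

    relevance = np.array([1, 1, 1, 0, 0, 0])   # latent indicators: 1 = relevant feature
    weights = np.array([0.6, 0.4])             # mixture weights

    # Cluster-specific multinomials for relevant features, one common multinomial otherwise.
    theta = rng.dirichlet(np.ones(n_categories), size=(n_clusters, n_features))
    common = rng.dirichlet(np.ones(n_categories), size=n_features)

    X = np.empty((n_samples, n_features), dtype=int)
    z = rng.choice(n_clusters, size=n_samples, p=weights)      # cluster assignments
    for i in range(n_samples):
        for j in range(n_features):
            p = theta[z[i], j] if relevance[j] else common[j]
            X[i, j] = rng.choice(n_categories, p=p)

    print(X.shape, np.bincount(z))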
Abstract:
Research on cluster analysis for categorical data continues to develop, with new clustering algorithms being proposed. However, in this context, the determination of the number of clusters is rarely addressed. We propose a new approach in which clustering and the estimation of the number of clusters are done simultaneously for categorical data. We assume that the data originate from a finite mixture of multinomial distributions and use a minimum message length (MML) criterion to select the number of clusters (Wallace and Boulton, 1968). For this purpose, we implement an EM-type algorithm (Silvestre et al., 2008) based on the approach of Figueiredo and Jain (2002). The novelty of the approach rests on the integration of model estimation and selection of the number of clusters in a single algorithm, rather than selecting this number from a set of pre-estimated candidate models. The performance of our approach is compared with the Bayesian Information Criterion (BIC) (Schwarz, 1978) and the Integrated Completed Likelihood (ICL) (Biernacki et al., 2000) on synthetic data. The results illustrate the capacity of the proposed algorithm to recover the true number of clusters while outperforming BIC and ICL in speed, which is especially relevant when dealing with large data sets.
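The key ingredient borrowed from Figueiredo and Jain (2002) is an M-step that re-estimates the mixture weights with a penalty tied to the number of parameters per component, so that unsupported components are annihilated during EM rather than compared afterwards across pre-estimated models. A minimal sketch of that weight update (illustrative code, not the authors' implementation) follows; for multinomial components, the parameter count per component would be the number of free category probabilities.

    import numpy as np

    def mml_weight_update(resp, n_params_per_component):
        # Modified M-step for the mixture weights in the Figueiredo-Jain style of EM:
        # components whose effective support falls below n_params/2 get weight zero
        # (they are annihilated), and the surviving weights are renormalized.
        support = resp.sum(axis=0)                        # sum of responsibilities per component
        w = np.maximum(support - n_params_per_component / 2.0, 0.0)
        return w / w.sum()

    # Toy responsibilities for 100 samples and 4 candidate components.
    rng = np.random.default_rng(2)
    resp = rng.dirichlet(np.ones(4), size=100)
    print(mml_weight_update(resp, n_params_per_component=9))   # e.g. one feature with 10 categories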
Abstract:
Cluster analysis for categorical data has been an active area of research. A well-known problem in this area is the determination of the number of clusters, which is unknown and must be inferred from the data. To estimate the number of clusters, one often resorts to information criteria, such as BIC (Bayesian information criterion), MML (minimum message length, proposed by Wallace and Boulton, 1968), and ICL (integrated classification likelihood). In this work, we adopt the approach developed by Figueiredo and Jain (2002) for clustering continuous data. They use an MML criterion to select the number of clusters and a variant of the EM algorithm to estimate the model parameters. This EM variant seamlessly integrates model estimation and selection into a single algorithm. For clustering categorical data, we assume a finite mixture of multinomial distributions and implement a new EM algorithm, following a previous version (Silvestre et al., 2008). Results obtained with synthetic datasets are encouraging. The main advantage of the proposed approach, compared to the criteria referred to above, is its speed of execution, which is especially relevant when dealing with large data sets.
Abstract:
Managing the physical and compute infrastructure of a large data center is an embodiment of a Cyber-Physical System (CPS). The physical parameters of the data center (such as power, temperature, pressure, humidity) are tightly coupled with computations, even more so in upcoming data centers, where the location of workloads can vary substantially due, for example, to workloads being moved in a cloud infrastructure hosted in the data center. In this paper, we describe a data collection and distribution architecture that enables gathering physical parameters of a large data center at very high temporal and spatial resolution of the sensor measurements. We believe this is an important characteristic for enabling more accurate heat-flow models of the data center and, with them, opportunities to optimize energy consumption. Having a high-resolution picture of the data center conditions also enables minimizing local hotspots, performing more accurate predictive maintenance (pending failures in cooling and other infrastructure equipment can be detected more promptly), and more accurate billing. We detail this architecture and define the structure of the underlying messaging system that is used to collect and distribute the data. Finally, we show the results of a preliminary study of a typical data center radio environment.
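The abstract does not give the message format itself, so the following is only a hypothetical sketch of how a reading from such an architecture might be structured and routed through a topic-based messaging system; all field and topic names are invented for illustration and are not the structure defined in the paper.

    import json
    import time
    from dataclasses import dataclass, asdict

    # Hypothetical reading message for a data-center sensor.
    @dataclass
    class SensorReading:
        sensor_id: str
        kind: str          # "temperature", "pressure", "humidity", "power"
        value: float
        unit: str
        rack: str          # spatial location, to support high spatial resolution
        timestamp: float   # seconds since epoch, to support high temporal resolution

    def topic(reading: SensorReading) -> str:
        # Hierarchical topic so consumers (heat-flow models, predictive maintenance,
        # billing) can subscribe to exactly the slice of the data center they need.
        return f"dc/{reading.rack}/{reading.kind}/{reading.sensor_id}"

    r = SensorReading("s-017", "temperature", 27.4, "C", "rack-B12", time.time())
    print(topic(r), json.dumps(asdict(r)))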
Abstract:
Consider the problem of designing an algorithm for acquiring sensor readings. Consider specifically the problem of obtaining an approximate representation of sensor readings where (i) sensor readings originate from different sensor nodes, (ii) the number of sensor nodes is very large, (iii) all sensor nodes are deployed in a small area (a dense network) and (iv) all sensor nodes communicate over a communication medium where at most one node can transmit at a time (a single broadcast domain). We present an efficient algorithm for this problem with two desirable properties: (i) it obtains an interpolation based on all sensor readings and (ii) it is scalable, that is, its time complexity is independent of the number of sensor nodes. Achieving these two properties is possible thanks to the close interlinking of the information-processing algorithm, the communication system and a model of the physical world.
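The abstract does not describe the algorithm's mechanics, so the snippet below shows only a generic stand-in for the interpolation step: inverse-distance weighting over a small set of scattered readings. It is not the paper's scalable algorithm, which additionally relies on the shared broadcast medium and a model of the physical world to keep the time complexity independent of the number of nodes.

    import numpy as np

    def idw_interpolate(points, values, query, power=2.0, eps=1e-9):
        # Inverse-distance-weighted interpolation of scattered sensor readings
        # (a generic placeholder for an approximate interpolation step).
        d = np.linalg.norm(points - query, axis=1)
        if np.any(d < eps):                      # query coincides with a sensor position
            return float(values[np.argmin(d)])
        w = 1.0 / d**power
        return float(np.dot(w, values) / w.sum())

    rng = np.random.default_rng(3)
    points = rng.uniform(0, 10, size=(8, 2))     # a few selected sensor positions
    values = rng.uniform(20, 30, size=8)         # e.g. temperature readings
    print(idw_interpolate(points, values, np.array([5.0, 5.0])))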
Abstract:
Networked control systems (NCSs) are spatially distributed systems in which the communication between sensors, actuators and controllers occurs through a shared band-limited digital communication network. The use of a shared communication network, in contrast to several dedicated independent connections, introduces new challenges, which are even more acute in large-scale and dense networked control systems. In this paper we investigate a recently introduced technique for gathering information from a dense sensor network to be used in networked control applications. Efficiently obtaining an approximate interpolation of the sensed data offers a good tradeoff between accuracy in the measurement of the input signals and the delay to actuation, both important aspects for the quality of control. We introduce a variation of the state-of-the-art algorithms which we prove performs better because it takes into account the changes of the input signal over time within the process of obtaining an approximate interpolation.
Abstract:
A hexagonal wireless sensor network refers to a network topology in which a subset of the nodes have six peer neighbors. These nodes form a backbone for multi-hop communications. In a previous work, we proposed the use of a hexagonal topology in wireless sensor networks and discussed its properties in relation to real-time (bounded-latency) multi-hop communications in large-scale deployments. In that work, we did not consider the problem of forming the hexagonal topology in practice, which is the subject of this research. In this paper, we present a decentralized algorithm that forms the hexagonal topology backbone in an arbitrary but sufficiently dense network deployment. We implemented a prototype of our algorithm in NesC for TinyOS-based platforms. We present data from field tests of our implementation, collected using a deployment of fifty wireless sensor nodes.
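To make the target topology concrete: in axial hexagonal coordinates each backbone node has exactly six peer neighbors, given by the fixed offsets below. This is only the neighbor geometry; the decentralized formation algorithm itself (implemented in NesC for TinyOS) is not reproduced here.

    # The six peer neighbors of a backbone node in axial hexagonal coordinates (q, r).
    HEX_NEIGHBOR_OFFSETS = [(+1, 0), (+1, -1), (0, -1), (-1, 0), (-1, +1), (0, +1)]

    def hex_neighbors(q: int, r: int):
        return [(q + dq, r + dr) for dq, dr in HEX_NEIGHBOR_OFFSETS]

    print(hex_neighbors(0, 0))   # the six neighbors of the origin node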
Abstract:
In visual sensor networks, local feature descriptors can be computed at the sensing nodes, which work collaboratively on the obtained data to perform efficient visual analysis. In fact, with a minimal amount of computational effort, the detection and extraction of local features, such as binary descriptors, can provide a reliable and compact image representation. In this paper, we propose to extract and code binary descriptors to meet the energy and bandwidth constraints at each sensing node. The major contribution is a binary descriptor coding technique that exploits correlation using two different coding modes: Intra, which exploits the correlation between the elements that compose a descriptor; and Inter, which exploits the correlation between descriptors of the same image. The experimental results show bitrate savings of up to 35% without any impact on the performance of the image retrieval task. © 2014 EURASIP.
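As a rough illustration of the Inter mode's rationale (not the paper's coding technique), XOR-predicting a descriptor from another descriptor of the same image yields a sparse residual when the two are correlated, and a sparse residual is cheaper to entropy-code than the raw bits. The toy data below is invented for illustration.

    import numpy as np

    rng = np.random.default_rng(4)

    # Two correlated 256-bit binary descriptors from the same image (toy data).
    d_ref = rng.integers(0, 2, size=256, dtype=np.uint8)
    flips = rng.random(256) < 0.1                       # ~10% of the bits differ
    d_cur = np.where(flips, 1 - d_ref, d_ref).astype(np.uint8)

    # "Inter"-style prediction: XOR the current descriptor against the reference one.
    residual = np.bitwise_xor(d_cur, d_ref)

    # The residual has far fewer set bits, hence lower entropy to code.
    print("raw ones:", int(d_cur.sum()), "residual ones:", int(residual.sum()))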
Abstract:
Objective: To summarize all relevant findings in the published literature regarding the potential dose reduction, in relation to image quality, of Sinogram-Affirmed Iterative Reconstruction (SAFIRE) compared to Filtered Back Projection (FBP). Background: Computed Tomography (CT) is one of the most used radiographic modalities in clinical practice, providing high spatial and contrast resolution. However, it also delivers a relatively high radiation dose to the patient. Reconstructing raw data using Iterative Reconstruction (IR) algorithms has the potential to iteratively reduce image noise while maintaining or improving the image quality of low-dose standard FBP reconstructions. Nevertheless, long reconstruction times made IR impractical for clinical use until recently. Siemens Medical developed a new IR algorithm called SAFIRE, which uses up to 5 different strength levels and poses an alternative to conventional IR with a significant reduction in reconstruction time. Methods: The MEDLINE, ScienceDirect and CINAHL databases were used for gathering literature. Eleven articles were included in this review (from 2012 to July 2014). Discussion: This narrative review summarizes the results of eleven articles (covering studies on both patients and phantoms) and describes SAFIRE's strengths for noise reduction in low-dose acquisitions while providing acceptable image quality. Conclusion: Even though the results differ slightly, the literature gathered for this review suggests that the dose in current CT protocols can be reduced by at least 50% while maintaining or improving image quality. There is, however, a lack of literature concerning the paediatric population (with increased radiation sensitivity). Further studies should also assess the impact of SAFIRE on diagnostic accuracy.
Abstract:
Feature selection is a central problem in machine learning and pattern recognition. On large datasets (in terms of dimension and/or number of instances), using search-based or wrapper techniques can be computationally prohibitive. Moreover, many filter methods based on relevance/redundancy assessment also take a prohibitively long time on high-dimensional datasets. In this paper, we propose efficient unsupervised and supervised feature selection/ranking filters for high-dimensional datasets. These methods use low-complexity relevance and redundancy criteria, applicable to supervised, semi-supervised, and unsupervised learning, and are able to act as pre-processors for computationally intensive methods, focusing their attention on smaller subsets of promising features. The experimental results, with up to 10^5 features, show the time efficiency of our methods, which achieve lower generalization error than state-of-the-art techniques while being dramatically simpler and faster.
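A minimal sketch of the general relevance/redundancy filtering idea is given below; the relevance score (dispersion) and redundancy measure (absolute correlation) are placeholders chosen for brevity, not the specific low-complexity criteria proposed in the paper.

    import numpy as np

    def rank_features(X, max_redundancy=0.9):
        # Greedy relevance/redundancy filter: rank features by dispersion (relevance),
        # then skip any feature highly correlated with an already-selected one (redundancy).
        relevance = X.var(axis=0)
        order = np.argsort(-relevance)
        corr = np.abs(np.corrcoef(X, rowvar=False))
        selected = []
        for j in order:
            if all(corr[j, s] < max_redundancy for s in selected):
                selected.append(int(j))
        return selected

    rng = np.random.default_rng(5)
    X = rng.standard_normal((200, 10))
    X[:, 5] = X[:, 0] + 0.01 * rng.standard_normal(200)   # redundant copy of feature 0
    print(rank_features(X))   # one of the redundant pair is filtered out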
Abstract:
Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies