872 results for heterogeneous data sources


Relevance:

90.00% 90.00%

Publisher:

Abstract:

In recent years, many real-time applications have needed to handle data streams. We consider distributed environments in which remote data sources continuously collect data from the real world or from other data sources and push it to a central stream processor. In such environments, transmitting rapid, high-volume, time-varying data streams induces significant communication cost and also incurs computing overhead at the central processor. In this paper, we develop a novel filter approach, called DTFilter, for evaluating windowed distinct queries in such a distributed system. DTFilter is based on a search algorithm that uses a data structure of two height-balanced trees, and it avoids transmitting duplicate items in data streams, thereby saving substantial network resources. In addition, we provide a theoretical analysis of the search time and of the memory required. Extensive experiments also show that DTFilter achieves high performance.
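The idea of suppressing duplicates at the source can be illustrated with a minimal Python sketch. It is not the DTFilter algorithm: the abstract describes two height-balanced trees, whereas this sketch swaps in an ordered dictionary, and the names `WindowedDistinctFilter`, `offer` and `filter_stream` are hypothetical. It also simplifies the query semantics by only suppressing an item that was already transmitted within the last `window` time units.

```python
from collections import OrderedDict


class WindowedDistinctFilter:
    """Source-side filter: forward an item only if it has not been
    forwarded within the last `window` time units (illustrative only)."""

    def __init__(self, window):
        self.window = window
        # Maps item -> time of its last transmission, oldest entry first.
        self._last_sent = OrderedDict()

    def _expire(self, now):
        # Drop entries whose transmission time has left the window.
        while self._last_sent:
            item, sent_at = next(iter(self._last_sent.items()))
            if now - sent_at < self.window:
                break
            self._last_sent.popitem(last=False)

    def offer(self, item, now):
        """Return True if `item` should be transmitted at time `now`."""
        self._expire(now)
        if item in self._last_sent:
            return False  # duplicate inside the window: suppress it
        self._last_sent[item] = now
        return True


def filter_stream(events, window):
    """events: iterable of (timestamp, item) in non-decreasing time order."""
    f = WindowedDistinctFilter(window)
    return [(t, x) for t, x in events if f.offer(x, t)]
```

Because a suppressed duplicate does not refresh its entry, the dictionary stays ordered by transmission time, so expiry only ever inspects the oldest entry.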

Relevance:

90.00% 90.00%

Publisher:

Abstract:

The use of digital communication systems is increasing very rapidly. This is due to lower system implementation cost compared to analogue transmission and, at the same time, the ease with which several types of data sources (data, digitised speech and video, etc.) can be mixed. The emergence of packet broadcast techniques as an efficient type of multiplexing, especially with the use of contention random multiple access protocols, has led to widespread application of these distributed access protocols in local area networks (LANs) and to their further extension to radio and mobile radio communication applications. This research proposes a modified version of the distributed access contention protocol that uses the packet broadcast switching technique. Carrier sense multiple access with collision avoidance (CSMA/CA) is found to be the most appropriate protocol, able to satisfy equally the operational requirements of local area networks and of radio and mobile radio applications. The suggested version of the protocol is designed so that all desirable features of its precedents are maintained, while all the shortcomings are eliminated and additional features are added to strengthen its ability to work with radio and mobile radio channels. The operational performance of the protocol has been evaluated for both the non-persistent and slotted non-persistent types, through mathematical and simulation modelling of the protocol. The results obtained from the two modelling procedures validate the accuracy of both methods, and the protocol compares favourably with its precedent, CSMA/CD (with collision detection). A further extension of the protocol operation has been suggested to operate with multichannel systems. Two multichannel systems based on the CSMA/CA protocol for medium access are therefore proposed.
These are the dynamic multichannel system, which is based on two types of channel selection, random choice (RC) and idle choice (IC), and the sequential multichannel system. The latter has been proposed in order to suppress the effect of the hidden terminal, which always represents a major problem when contention random multiple access protocols are used with radio and mobile radio channels. The operational performance of the dynamic system has been verified using mathematical modelling, whereas simulation modelling has been chosen for the sequential system. Both systems are found to improve system operation and fault tolerance when compared to single channel operation.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Overlaying maps using a desktop GIS is often the first step of a multivariate spatial analysis. The potential of this operation has increased considerably as data sources and Web services to manipulate them are becoming widely available via the Internet. Standards from the OGC enable such geospatial ‘mashups’ to be seamless and user driven, involving discovery of thematic data. The user is naturally inclined to look for spatial clusters and ‘correlation’ of outcomes. Using classical cluster detection scan methods to identify multivariate associations can be problematic in this context, because of a lack of control over, or knowledge about, background populations. For public health and epidemiological mapping, this limiting factor can be critical, but often the focus is on spatial identification of risk factors associated with health or clinical status. In this article we point out that this association itself can ensure some control on underlying populations, and we develop an exploratory scan statistic framework for multivariate associations. Inference using statistical map methodologies can be used to test the clustered associations. The approach is illustrated with a hypothetical data example and an epidemiological study on community MRSA. Scenarios of potential use for online mashups are introduced, but full implementation is left for further research.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

The number of remote sensing platforms and sensors rises almost every year, yet much work on the interpretation of land cover is still carried out using either single images or images from the same source taken at different dates. Two questions could be asked of this proliferation of images: can the information contained in different scenes be used to improve the classification accuracy, and what is the best way to combine the different imagery? Two of these multiple image sources are MODIS on the Terra platform and ETM+ on board Landsat 7, which are suitably complementary. Daily MODIS images with 36 spectral bands at 250-1000 m spatial resolution are available, as are seven spectral bands of ETM+ at 30 m spatial and 16-day temporal resolution. In the UK, cloud cover may mean that only a few ETM+ scenes are available for any particular year, and these may not be at the time of year of most interest. The MODIS data may provide information on land cover over the growing season, such as harvest dates, that is not present in the ETM+ data. Therefore, the primary objective of this work is to develop a methodology for the integration of medium spatial resolution Landsat ETM+ images with multi-temporal, multi-spectral, low-resolution MODIS/Terra images, with the aim of improving the classification of agricultural land. Other data, such as field boundaries from existing maps, may also be incorporated. When classifying agricultural land cover of the type seen in the UK, where crops are largely sown in homogeneous fields with clear and often mapped boundaries, the classification is greatly improved by using the mapped polygons and taking the classification of the polygon as a whole as an a priori probability when classifying each individual pixel with a Bayesian approach.
When dealing with multiple images from different platforms and dates, it is highly unlikely that the pixels will be exactly co-registered, and these pixels will contain a mixture of different real world land covers. Similarly, the different atmospheric conditions prevailing on the different days will mean that the same emission from the ground will give rise to different sensor reception. Therefore, a method is presented with a model of the instantaneous field of view and atmospheric effects to enable different remotely sensed data sources to be integrated.
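The use of a polygon-level classification as a prior for each pixel can be sketched with Bayes' rule. This is a minimal illustration rather than the thesis method: the function names, the class labels and the numbers in the usage note are hypothetical, and the per-pixel likelihoods are assumed to come from some spectral classifier that is not shown.

```python
def posterior(likelihoods, prior):
    """Combine per-pixel class likelihoods with a polygon-level prior.

    likelihoods: dict class -> p(pixel spectrum | class)
    prior:       dict class -> prior probability taken from the polygon
    Returns a dict class -> normalised posterior probability.
    """
    unnormalised = {c: likelihoods[c] * prior.get(c, 0.0) for c in likelihoods}
    total = sum(unnormalised.values())
    if total == 0:
        raise ValueError("prior and likelihoods share no class with non-zero mass")
    return {c: v / total for c, v in unnormalised.items()}


def classify_pixel(likelihoods, polygon_prior):
    """Return the class with the highest posterior probability."""
    post = posterior(likelihoods, polygon_prior)
    return max(post, key=post.get)
```

For example, a pixel whose spectrum slightly favours barley (likelihoods 0.4 for wheat, 0.6 for barley) is still labelled wheat if its field polygon gives wheat a prior of 0.8.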

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Satellite information, in combination with conventional point source measurements, can be a valuable source of information. This thesis is devoted to the spatial estimation of areal rainfall over a region using both the measurements from a dense and sparse network of rain-gauges and images from the meteorological satellites. A primary concern is to study the effects of such satellite assisted rainfall estimates on the performance of rainfall-runoff models. Low-cost image processing systems and peripherals are used to process and manipulate the data. Both secondary as well as primary satellite images were used for analysis. The secondary data was obtained from the in-house satellite receiver and the primary data was obtained from an outside source. Ground truth data was obtained from the local Water Authority. A number of algorithms are presented that combine the satellite and conventional data sources to produce areal rainfall estimates and the results are compared with some of the more traditional methodologies. The results indicate that the satellite cloud information is valuable in the assessment of the spatial distribution of areal rainfall, for both half-hourly as well as daily estimates of rainfall. It is also demonstrated how the performance of the simple multiple regression rainfall-runoff model is improved when satellite cloud information is used as a separate input in addition to rainfall estimates from conventional means. The use of low-cost equipment, from image processing systems to satellite imagery, makes it possible for developing countries to introduce such systems in areas where the benefits are greatest.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Objectives: Are behavioural interventions effective in reducing the rate of sexually transmitted infections (STIs) among genitourinary medicine (GUM) clinic patients? Design: Systematic review and meta-analysis of published articles. Data sources: Medline, CINAHL, Embase, PsycINFO, Applied Social Sciences Index and Abstracts, Cochrane Library Controlled Clinical Trials Register, National Research Register (1966 to January 2004). Review methods: Randomised controlled trials of behavioural interventions in sexual health clinic patients were included if they reported change to STI rates or self-reported sexual behaviour. Trial quality was assessed using the Jadad score and results were pooled using random effects meta-analyses where outcomes were consistent across studies. Results: 14 trials were included; 12 were based in the United States. Experimental interventions were heterogeneous and most control interventions were more structured than typical UK care. Eight trials reported data on laboratory-confirmed infections, of which four observed a greater reduction in their intervention groups (in two cases this result was statistically significant, p<0.05). Seven trials reported consistent condom use, of which six observed a greater increase among their intervention subjects. Results for other measures of sexual behaviour were inconsistent. Success in reducing STIs was related to trial quality, use of social cognition models, and formative research in the target population. However, effectiveness was not related to intervention format or length. Conclusions: While results were heterogeneous, several trials observed reductions in STI rates. The most effective interventions were developed through extensive formative research. These findings should encourage further research in the United Kingdom, where new approaches to preventing STIs are urgently required.
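Random effects pooling of trial results can be sketched as follows. The abstract does not say which estimator the review used; this sketch assumes the common DerSimonian-Laird estimator, and the function name and inputs are hypothetical (effect estimates such as log risk ratios, with their within-trial variances).

```python
import math


def random_effects_pool(effects, variances):
    """DerSimonian-Laird random-effects pooling (assumed estimator).

    effects:   per-trial effect estimates (e.g. log risk ratios)
    variances: their within-trial variances
    Returns (pooled_effect, standard_error, tau_squared).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q measures heterogeneity around the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add the between-trial variance tau2.
    w_star = [1.0 / (v + tau2) for v in variances]
    sw_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sw_star
    return pooled, math.sqrt(1.0 / sw_star), tau2
```

When the trials agree closely, the between-trial variance is estimated as zero and the result reduces to the fixed-effect inverse-variance average.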

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Linked Data semantic sources, in particular DBpedia, can be used to answer many user queries. PowerAqua is an open multi-ontology Question Answering (QA) system for the Semantic Web (SW). However, the emergence of Linked Data, characterized by its openness, heterogeneity and scale, introduces a new dimension to the Semantic Web scenario, in which exploiting the relevant information to extract answers for Natural Language (NL) user queries is a major challenge. In this paper we discuss the issues and lessons learned from our experience of integrating PowerAqua as a front-end for DBpedia and a subset of Linked Data sources. As such, we go one step beyond the state of the art on end-user interfaces for Linked Data by introducing the mapping and fusion techniques needed to translate a user query by means of multiple sources. Our first informal experiments probe whether it is in fact feasible to obtain answers to user queries by composing information across semantic sources and Linked Data, even in its current form, where the strength of Linked Data is more a by-product of its size than its quality. We believe our experiences can be extrapolated to a variety of end-user applications that wish to scale, open up, exploit and re-use what is possibly the greatest wealth of data about everything in the history of Artificial Intelligence. © 2010 Springer-Verlag.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Questions of forming learning sets for artificial neural networks in lossless data compression problems are considered. Methods of constructing and using learning sets are studied. A way of forming the learning set while training an artificial neural network on a data stream is proposed.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

In this paper, we first overview the French project on heritage called PATRIMA, launched in 2011 as one of the Projets d'investissement pour l'avenir, a French funding program meant to last for the next ten years. The overall purpose of the PATRIMA project is to promote and fund research on various aspects of heritage presentation and preservation. Such research being interdisciplinary, research groups in history, physics, chemistry, biology and computer science are involved in this project. The PATRIMA consortium involves research groups from universities and from the main museums or cultural heritage institutions in Paris and surroundings. More specifically, the main members of the consortium are the two universities of Cergy-Pontoise and Versailles Saint-Quentin and the following famous museums or cultural institutions: Musée du Louvre, Château de Versailles, Bibliothèque nationale de France, Musée du Quai Branly, Musée Rodin. In the second part of the paper, we focus on two projects funded by PATRIMA named EDOP and Parcours and dealing with data integration. The goal of the EDOP project is to provide users with a data space for the integration of heterogeneous information about heritage; Linked Open Data are considered for an effective access to the corresponding data sources. On the other hand, the Parcours project aims at building an ontology on the terminology about the techniques dealing with restoration and/or conservation. Such an ontology is meant to provide a common terminology to researchers using different databases and different vocabularies.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

2010 Mathematics Subject Classification: 94A17.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Recent advances in airborne Light Detection and Ranging (LIDAR) technology allow rapid and inexpensive measurements of topography over large areas. Airborne LIDAR systems usually return a 3-dimensional cloud of point measurements from reflective objects scanned by the laser beneath the flight path. This technology is becoming a primary method for extracting information about different kinds of geometrical objects, such as high-resolution digital terrain models (DTMs), buildings and trees. In the past decade, LIDAR has attracted increasing interest from researchers in the fields of remote sensing and GIS. Compared to traditional data sources, such as aerial photography and satellite images, LIDAR measurements are not influenced by sun shadow and relief displacement. However, the voluminous data pose a new challenge for automated extraction of geometrical information from LIDAR measurements, because many raster image processing techniques cannot be directly applied to irregularly spaced LIDAR points.

In this dissertation, a framework is proposed to automatically filter out information about different kinds of geometrical objects, such as terrain and buildings, from LIDAR data. Such objects are essential to numerous applications such as flood modeling, landslide prediction and hurricane animation. The framework consists of several intuitive algorithms. Firstly, a progressive morphological filter was developed to detect non-ground LIDAR measurements. By gradually increasing the window size and elevation difference threshold of the filter, the measurements of vehicles, vegetation, and buildings are removed, while ground data are preserved. Then, building measurements are identified from non-ground measurements using a region growing algorithm based on the plane-fitting technique.
Raw footprints for segmented building measurements are derived by connecting boundary points and are further simplified and adjusted by several proposed operations to remove noise caused by irregularly spaced LIDAR measurements. To reconstruct 3D building models, the raw 2D topology of each building is first extracted and then further adjusted. Since the adjusting operations for simple building models do not work well on 2D topology, a 2D snake algorithm is proposed to adjust it. The 2D snake algorithm consists of newly defined energy functions for topology adjustment and a linear algorithm to find the minimal energy value of 2D snake problems. Data sets from urbanized areas including large institutional, commercial, and small residential buildings were employed to test the proposed framework. The results demonstrated that the proposed framework achieves very good performance.
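The progressive morphological filter described in this abstract can be illustrated on a one-dimensional elevation profile. This is a simplified sketch under stated assumptions, not the dissertation's implementation: it works on a regularly spaced profile rather than an irregular point cloud, and the window sizes, `slope`, `dh0` and `dh_max` values are hypothetical defaults chosen for illustration.

```python
def erode(z, half):
    """Morphological erosion: minimum over a window of 2*half + 1 cells."""
    n = len(z)
    return [min(z[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def dilate(z, half):
    """Morphological dilation: maximum over a window of 2*half + 1 cells."""
    n = len(z)
    return [max(z[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def progressive_morphological_filter(z, half_windows=(1, 2, 4), slope=0.3,
                                     cell=1.0, dh0=0.2, dh_max=2.5):
    """Flag non-ground cells of an elevation profile `z`.

    The window grows at each pass, and the elevation difference threshold
    grows with the change in window size (capped at dh_max).
    Returns a list of booleans, True where a cell is flagged as non-ground.
    """
    surface = list(z)
    nonground = [False] * len(z)
    prev_half = 0
    for half in half_windows:
        # Opening (erosion then dilation) removes objects narrower than the window.
        opened = dilate(erode(surface, half), half)
        dh = min(dh_max, dh0 + slope * 2 * (half - prev_half) * cell)
        for i, (before, after) in enumerate(zip(surface, opened)):
            if before - after > dh:
                nonground[i] = True
        surface = opened
        prev_half = half
    return nonground
```

Small objects such as vehicles are removed by the first, narrow window, while wider objects such as buildings survive until the window exceeds their width; gently sloping ground changes by less than the threshold and is kept.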

Relevance:

90.00% 90.00%

Publisher:

Abstract:

With the advent of peer to peer networks, and more importantly sensor networks, the desire to extract useful information from continuous and unbounded streams of data has become more prominent. For example, in tele-health applications, sensor based data streaming systems are used to continuously and accurately monitor Alzheimer's patients and their surrounding environment. Typically, the requirements of such applications necessitate the cleaning and filtering of continuous, corrupted and incomplete data streams gathered wirelessly in dynamically varying conditions. Yet, existing data stream cleaning and filtering schemes are incapable of capturing the dynamics of the environment while simultaneously suppressing the losses and corruption introduced by uncertain environmental, hardware, and network conditions. Consequently, existing data cleaning and filtering paradigms are being challenged. This dissertation develops novel schemes for cleaning data streams received from a wireless sensor network operating under non-linear and dynamically varying conditions. The study establishes a paradigm for validating spatio-temporal associations among data sources to enhance data cleaning. To simplify the complexity of the validation process, the developed solution maps the requirements of the application on a geometrical space and identifies the potential sensor nodes of interest. Additionally, this dissertation models a wireless sensor network data reduction system by ascertaining that segregating data adaptation and prediction processes will augment the data reduction rates. The schemes presented in this study are evaluated using simulation and information theory concepts. The results demonstrate that dynamic conditions of the environment are better managed when validation is used for data cleaning. They also show that when a fast convergent adaptation process is deployed, data reduction rates are significantly improved. 
Targeted applications of the developed methodology include machine health monitoring, tele-health, environment and habitat monitoring, intermodal transportation and homeland security.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

The purpose of this project was to evaluate the use of remote sensing 1) to detect and map Everglades wetland plant communities at different scales; and 2) to compare map products delineated and resampled at various scales with the intent to quantify and describe the quantitative and qualitative differences between such products. We evaluated data provided by Digital Globe’s WorldView 2 (WV2) sensor with a spatial resolution of 2m and data from Landsat’s Thematic and Enhanced Thematic Mapper (TM and ETM+) sensors with a spatial resolution of 30m. We were also interested in the comparability and scalability of products derived from these data sources. The adequacy of each data set to map wetland plant communities was evaluated utilizing two metrics: 1) model-based accuracy estimates of the classification procedures; and 2) design-based post-classification accuracy estimates of derived maps.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Constant technology advances have caused a data explosion in recent years. Accordingly, modern statistical and machine learning methods must be adapted to deal with complex and heterogeneous data types. This is particularly true for analyzing biological data. For example, DNA sequence data can be viewed as categorical variables, with each nucleotide taking one of four categories. Gene expression data, depending on the quantitative technology, could be continuous numbers or counts. With the advancement of high-throughput technology, such data have become unprecedentedly abundant. Therefore, efficient statistical approaches are crucial in this big data era.

Previous statistical methods for big data often aim to find low dimensional structures in the observed data. For example, a factor analysis model assumes a latent Gaussian distributed multivariate vector. With this assumption, a factor model produces a low rank estimation of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents, in which the mixture proportions of topics are assumed to be represented by a Dirichlet distributed variable. This dissertation proposes several novel extensions to the previous statistical methods that are developed to address challenges in big data. Those novel methods are applied in multiple real world applications including construction of condition specific gene co-expression networks, estimating shared topics among newsgroups, analysis of promoter sequences, analysis of political-economics risk data and estimating population structure from genotype data.
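The low rank covariance structure of a factor model can be written out explicitly. The notation below is the standard textbook form and is an assumption here, since the abstract does not give its own symbols: x is the p-dimensional observed vector, z the k-dimensional latent Gaussian vector with k < p, Λ the p × k loading matrix and Ψ a diagonal noise covariance.

```latex
\[
\begin{aligned}
x &= \Lambda z + \varepsilon, \qquad z \sim \mathcal{N}(0, I_k), \qquad \varepsilon \sim \mathcal{N}(0, \Psi), \\
\operatorname{Cov}(x) &= \Lambda \Lambda^{\top} + \Psi .
\end{aligned}
\]
```

The term ΛΛᵀ has rank at most k, which is the low rank part of the covariance estimate; the diagonal Ψ accounts for variable-specific noise.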