863 results for Data sources detection
Abstract:
The last few years have seen the advent of high-throughput technologies to analyze various properties of the transcriptome and proteome of several organisms. The congruency of these different data sources, or lack thereof, can shed light on the mechanisms that govern cellular function. A central challenge for bioinformatics research is to develop a unified framework for combining the multiple sources of functional genomics information and testing associations between them, thus obtaining a robust and integrated view of the underlying biology. We present a graph theoretic approach to test the significance of the association between multiple disparate sources of functional genomics data by proposing two statistical tests, namely edge permutation and node label permutation tests. We demonstrate the use of the proposed tests by finding significant association between a Gene Ontology-derived "predictome" and data obtained from mRNA expression and phenotypic experiments for Saccharomyces cerevisiae. Moreover, we employ the graph theoretic framework to recast a surprising discrepancy presented in Giaever et al. (2002) between gene expression and knockout phenotype, using expression data from a different set of experiments.
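As an illustration of the two tests this abstract names, the sketch below measures association between two functional networks on the same genes as the number of shared edges and builds null distributions by degree-preserving edge rewiring (edge permutation) and by shuffling node labels (node label permutation). It is a minimal sketch of the general idea, not the authors' exact formulation; graph contents and parameter values are hypothetical.

```python
# Minimal sketch of edge-permutation and node-label-permutation tests of
# association between two graphs defined on the same gene set.
import random
import networkx as nx

def shared_edges(g1, g2):
    """Number of undirected edges present in both graphs."""
    e1 = {frozenset(e) for e in g1.edges()}
    e2 = {frozenset(e) for e in g2.edges()}
    return len(e1 & e2)

def edge_permutation_pvalue(g1, g2, n_perm=1000, seed=0):
    """Null model: rewire g2 with degree-preserving double edge swaps."""
    rng = random.Random(seed)
    observed = shared_edges(g1, g2)
    hits = 0
    for _ in range(n_perm):
        g_null = g2.copy()
        nx.double_edge_swap(g_null,
                            nswap=2 * g_null.number_of_edges(),
                            max_tries=20 * g_null.number_of_edges(),
                            seed=rng.randint(0, 10**9))
        if shared_edges(g1, g_null) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def node_label_permutation_pvalue(g1, g2, n_perm=1000, seed=0):
    """Null model: randomly permute the node labels of g2."""
    rng = random.Random(seed)
    observed = shared_edges(g1, g2)
    nodes = list(g2.nodes())
    hits = 0
    for _ in range(n_perm):
        shuffled = nodes[:]
        rng.shuffle(shuffled)
        g_null = nx.relabel_nodes(g2, dict(zip(nodes, shuffled)), copy=True)
        if shared_edges(g1, g_null) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Small Monte Carlo p-values from either routine would indicate that the two data sources agree more than expected by chance.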
Abstract:
The near-real-time retrieval of low stratiform cloud (LSC) coverage is of vital interest for disciplines such as meteorology, transport safety, economics and air quality. Within this scope, a novel methodology is proposed which provides LSC occurrence probability estimates for a satellite scene. The algorithm is suited to the 1 × 1 km Advanced Very High Resolution Radiometer (AVHRR) data and was trained and validated against collocated SYNOP observations. Utilisation of these two combined data sources requires a formulation of constraints in order to discriminate cases where the LSC is overlaid by higher clouds. The LSC classification process is based on six features which are first converted to integer form by step functions and then combined by means of bitwise operations. Consequently, a set of values reflecting a unique combination of those features is derived, which is further employed to extract the LSC occurrence probability estimates from precomputed look-up vectors (LUV). Although the validation analyses confirmed good performance of the algorithm, some inevitable misclassifications of other optically thick clouds were reported. Moreover, the comparison against the Polar Platform System (PPS) cloud-type product revealed superior classification accuracy. From the temporal perspective, the results revealed the presence of diurnal and annual LSC probability cycles over Europe.
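The encoding step described above (step functions to integers, then bitwise combination into a look-up key) can be sketched as follows. This is a minimal illustration with made-up thresholds and bit widths, not the published algorithm's actual feature definitions or LUV contents.

```python
# Minimal sketch: discretise each feature with a step function and pack the
# integer codes into a single look-up key with bitwise shifts.
import numpy as np

def step_code(values, thresholds):
    """Map a continuous feature to integer levels 0..len(thresholds)."""
    return np.searchsorted(thresholds, values)

def luv_key(feature_codes, bits_per_feature=4):
    """Pack per-feature integer codes into one look-up key via bit shifts."""
    key = np.zeros_like(feature_codes[0])
    for i, code in enumerate(feature_codes):
        key = key | (code << (i * bits_per_feature))
    return key

# Hypothetical usage for a scene with six AVHRR-derived features:
# codes = [step_code(f, thr) for f, thr in zip(features, threshold_sets)]
# keys = luv_key(codes)
# lsc_probability = precomputed_luv[keys]   # precomputed_luv: 1-D array
```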
Abstract:
Documenting changes in distribution is necessary for understanding species' responses to environmental changes, but data on species distributions are heterogeneous in accuracy and resolution. Combining different data sources and methodological approaches can fill gaps in knowledge about the dynamic processes driving changes in species-rich, but data-poor, regions. We combined recent bird survey data from the Neotropical Biodiversity Mapping Initiative (NeoMaps) with historical distribution records to estimate potential changes in the distribution of eight species of Amazon parrots in Venezuela. Using environmental covariates and presence-only data from museum collections and the literature, we first used maximum likelihood to fit a species distribution model (SDM) estimating a historical maximum probability of occurrence for each species. We then used recent NeoMaps survey data to build single-season occupancy models (OM) with the same environmental covariates, as well as with time- and effort-dependent detectability, resulting in estimates of the current probability of occurrence. Finally, we calculated the disagreement between predictions as a matrix of probabilities of change in the state of occurrence. Our results suggested negative changes for the only restricted, threatened species, Amazona barbadensis, which has been independently confirmed by field studies. Two of the three remaining widespread species that were detected, Amazona amazonica and Amazona ochrocephala, also had a high probability of negative changes in northern Venezuela, but results were not conclusive for Amazona farinosa. The four remaining species were undetected in recent field surveys; three of these were most probably absent from the survey locations (Amazona autumnalis, Amazona mercenaria and Amazona festiva), while the fourth (Amazona dufresniana) requires more intensive targeted sampling to estimate its current status. Our approach is unique in taking full advantage of available, but limited, data and in detecting a high probability of change even for rare and patchily distributed species. However, it is presently limited to species meeting the strong assumptions required for maximum-likelihood estimation with presence-only data, including very high detectability and representative sampling of their historical distributions.
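The final step, turning a historical SDM probability surface and a current occupancy-model surface into per-cell probabilities of change, can be sketched as below. The product form assumes the two surfaces can be treated as independent and is illustrative only, not necessarily the authors' exact calculation.

```python
# Minimal sketch: per-cell probabilities of loss, gain and no change in
# occurrence state, from historical and current occurrence probabilities.
import numpy as np

def change_probabilities(p_hist, p_curr):
    """p_hist, p_curr: arrays of occurrence probabilities on the same grid."""
    p_loss = p_hist * (1.0 - p_curr)      # historically present, now absent
    p_gain = (1.0 - p_hist) * p_curr      # historically absent, now present
    p_no_change = 1.0 - p_loss - p_gain
    return p_loss, p_gain, p_no_change

# Hypothetical values for one species on a 2 x 2 grid:
p_hist = np.array([[0.9, 0.7], [0.2, 0.05]])
p_curr = np.array([[0.3, 0.6], [0.4, 0.02]])
p_loss, p_gain, _ = change_probabilities(p_hist, p_curr)
```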
Abstract:
Land cover is subject to continuous changes on a wide variety of temporal and spatial scales. Those changes produce significant effects on human and natural activities. Maintaining a spatial database updated with the changes that have occurred allows better monitoring of the Earth's resources and management of the environment. Images from different sensors, such as satellite imagery and aerial photographs, have proven to be suitable and reliable data sources from which updated information can be extracted efficiently using change detection (CD) techniques, so that changes can also be inventoried and monitored. In this paper, a multisource CD methodology for multiresolution datasets is applied. First, different change indices are processed; then, different change/no-change thresholding algorithms are applied to these indices in order to better estimate the statistical parameters of these categories; finally, the indices are integrated into a multisource change detection fusion process, which allows a single CD result to be generated from several combinations of indices. This methodology has been applied to datasets with different spectral and spatial resolution properties. The results obtained are then evaluated by means of a quality control analysis, as well as with complementary graphical representations. The suggested methodology has also proved efficient for identifying the change detection index with the highest contribution.
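A schematic version of the index–threshold–fusion pipeline described above is sketched below, with simplified choices: a difference and a log-ratio index between two co-registered images, Otsu thresholding of each index, and a simple vote-based fusion of the binary maps. The actual paper uses several indices, several thresholding algorithms and a multisource fusion process; this is only a stand-in, and the use of scikit-image is an assumption.

```python
# Minimal sketch of a two-index change-detection pipeline with vote fusion.
import numpy as np
from skimage.filters import threshold_otsu

def change_indices(img_t1, img_t2, eps=1e-6):
    diff = np.abs(img_t2.astype(float) - img_t1.astype(float))
    ratio = np.abs(np.log((img_t2 + eps) / (img_t1 + eps)))
    return [diff, ratio]

def threshold_maps(indices):
    """Binary change/no-change map per index via Otsu thresholding."""
    return [idx > threshold_otsu(idx) for idx in indices]

def fuse_votes(binary_maps, min_votes=None):
    """Pixel is 'change' if at least min_votes maps agree (default: half)."""
    stack = np.stack(binary_maps).astype(int)
    if min_votes is None:
        min_votes = len(binary_maps) / 2.0
    return stack.sum(axis=0) >= min_votes
```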
Nesting In The Clouds: Evaluating And Predicting Sea Turtle Nesting Beach Parameters From Lidar Data
Abstract:
Humans' desire for knowledge regarding animal species and their interactions with the natural world has spurred centuries of studies. The relatively new development of remote sensing systems using satellite or aircraft-borne sensors has opened up a wide field of research, which unfortunately remains largely dependent on coarse-scale image spatial resolution, particularly for habitat modeling. For habitat-specialized species, such data may not be sufficient to successfully capture the nuances of their preferred areas. Of particular concern are those species for which topographic feature attributes are a main limiting factor for habitat use. Coarse spatial resolution data can smooth over details that may be essential for habitat characterization. Three studies focusing on sea turtle nesting beaches were completed to serve as an example of how topography can be a main deciding factor for certain species. Light Detection and Ranging (LiDAR) data were used to illustrate that fine spatial scale data can provide information not readily captured by either field work or coarser spatial scale sources. The variables extracted from the LiDAR data could successfully model nesting density for loggerhead (Caretta caretta), green (Chelonia mydas), and leatherback (Dermochelys coriacea) sea turtle species using morphological beach characteristics, highlight beach changes over time and their correlations with nesting success, and provide comparisons for nesting density models across large geographic areas. Comparisons between the LiDAR dataset and other digital elevation models (DEMs) confirmed that fine spatial scale data sources provide more similar habitat information than those with coarser spatial scales. Although these studies focused solely on sea turtles, the underlying principles are applicable to many other wildlife species whose range and behavior may be influenced by topographic features.
Abstract:
An approach and strategy for automatic detection of buildings from aerial images using combined image analysis and interpretation techniques are described in this paper. The process is undertaken in several steps. A dense DSM is obtained by stereo image matching, and then the results of multi-band classification, the DSM, and the Normalized Difference Vegetation Index (NDVI) are used to reveal preliminary building interest areas. From these areas, a shape modeling algorithm is used to precisely delineate their boundaries. The Dempster-Shafer data fusion technique is then applied to detect buildings from the combination of the three data sources by a statistically based classification. A number of test areas, which include buildings of different sizes, shapes, and roof colors, have been investigated. The test results are encouraging and demonstrate that all processes in this system are important for effective building detection.
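The "preliminary building interest area" step can be sketched as a combination of height, vegetation and classification masks. This is a minimal illustration: the thresholds are hypothetical, the terrain model used to normalise the DSM is an assumption not stated in the abstract, and the paper's Dempster-Shafer fusion and shape modelling are not reproduced here.

```python
# Minimal sketch: candidate building pixels are elevated above the terrain,
# not vegetated according to NDVI, and labelled built-up by classification.
import numpy as np

def ndvi(nir, red, eps=1e-6):
    nir = nir.astype(float)
    red = red.astype(float)
    return (nir - red) / (nir + red + eps)

def building_interest_areas(dsm, dtm, nir, red, class_map,
                            building_class=1,
                            min_height=2.5,      # metres above terrain (assumed)
                            max_ndvi=0.3):       # vegetation cut-off (assumed)
    ndsm = dsm - dtm                             # normalised DSM: object heights
    elevated = ndsm > min_height
    not_vegetated = ndvi(nir, red) < max_ndvi
    classified_building = class_map == building_class
    return elevated & not_vegetated & classified_building
```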
Abstract:
The use of digital communication systems is increasing very rapidly. This is due to the lower system implementation cost compared to analogue transmission and, at the same time, the ease with which several types of data sources (data, digitised speech, video, etc.) can be mixed. The emergence of packet broadcast techniques as an efficient type of multiplexing, especially with the use of contention random multiple access protocols, has led to a widespread application of these distributed access protocols in local area networks (LANs) and a further extension of them to radio and mobile radio communication applications. In this research, a modified version of the distributed access contention protocol which uses the packet broadcast switching technique is proposed. Carrier sense multiple access with collision avoidance (CSMA/CA) is found to be the most appropriate protocol, with the ability to satisfy equally the operational requirements for local area networks as well as for radio and mobile radio applications. The suggested version of the protocol is designed so that all desirable features of its predecessors are maintained, while the shortcomings are eliminated and additional features are added to strengthen its ability to work with radio and mobile radio channels. Operational performance evaluation of the protocol has been carried out for the non-persistent and slotted non-persistent variants, through mathematical and simulation modelling of the protocol. The results obtained from the two modelling procedures validate the accuracy of both methods, and the protocol compares favourably with its predecessor, CSMA/CD (with collision detection). A further extension of the protocol operation has been suggested to operate with multichannel systems. Two multichannel systems based on the CSMA/CA protocol for medium access are therefore proposed. These are: the dynamic multichannel system, which is based on two types of channel selection, random choice (RC) and idle choice (IC), and the sequential multichannel system. The latter has been proposed in order to suppress the effect of the hidden terminal, which always represents a major problem with the usage of contention random multiple access protocols over radio and mobile radio channels. Verification of their operational performance has been carried out using mathematical modelling for the dynamic system and simulation modelling for the sequential system. Both systems are found to improve system operation and fault tolerance when compared to single channel operation.
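As a point of reference for the non-persistent and slotted non-persistent analysis mentioned above, the classical Kleinrock-Tobagi throughput expressions for plain CSMA (without collision avoidance) are sketched below. They are a textbook baseline against which such protocols are usually compared, not the modified CSMA/CA model developed in the thesis; G is the offered load and a the normalised propagation delay.

```python
# Minimal sketch: classical throughput curves for non-persistent and slotted
# non-persistent CSMA as a baseline reference.
import numpy as np

def csma_nonpersistent_throughput(G, a):
    """Unslotted non-persistent CSMA: S = G e^{-aG} / (G(1+2a) + e^{-aG})."""
    return (G * np.exp(-a * G)) / (G * (1.0 + 2.0 * a) + np.exp(-a * G))

def csma_slotted_nonpersistent_throughput(G, a):
    """Slotted non-persistent CSMA: S = aG e^{-aG} / (1 - e^{-aG} + a)."""
    return (a * G * np.exp(-a * G)) / (1.0 - np.exp(-a * G) + a)

# Example: throughput versus offered load for a = 0.01
G = np.linspace(0.01, 100.0, 500)
S_np = csma_nonpersistent_throughput(G, a=0.01)
S_slotted = csma_slotted_nonpersistent_throughput(G, a=0.01)
```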
Abstract:
Overlaying maps using a desktop GIS is often the first step of a multivariate spatial analysis. The potential of this operation has increased considerably as data sources and Web services to manipulate them are becoming widely available via the Internet. Standards from the OGC enable such geospatial ‘mashups’ to be seamless and user driven, involving discovery of thematic data. The user is naturally inclined to look for spatial clusters and ‘correlation’ of outcomes. Using classical cluster detection scan methods to identify multivariate associations can be problematic in this context, because of a lack of control over, or knowledge about, background populations. For public health and epidemiological mapping, this limiting factor can be critical, but often the focus is on spatial identification of risk factors associated with health or clinical status. In this article we point out that this association itself can ensure some control over underlying populations, and develop an exploratory scan statistic framework for multivariate associations. Inference using statistical map methodologies can be used to test the clustered associations. The approach is illustrated with a hypothetical data example and an epidemiological study on community MRSA. Scenarios of potential use for online mashups are introduced, but full implementation is left for further research.
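A very simplified exploratory scan of the kind motivated above can be sketched as follows: for every candidate circular window the statistic is the number of locations inside the window where two mapped outcomes co-occur, and significance is assessed by a Monte Carlo permutation of one outcome layer across locations. This is not the authors' scan statistic, only an illustration of scanning for clustered associations without a background population.

```python
# Minimal sketch: circular-window scan for co-occurrence of two binary
# outcomes, with a permutation-based p-value for the maximum window count.
import numpy as np

def scan_cooccurrence(coords, outcome_a, outcome_b, radius, n_perm=999, seed=0):
    rng = np.random.default_rng(seed)
    coords = np.asarray(coords, dtype=float)
    a = np.asarray(outcome_a, dtype=bool)
    b = np.asarray(outcome_b, dtype=bool)

    # Window membership: which points fall within `radius` of each centre
    # (here every observed location is a candidate centre).
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    windows = d2 <= radius ** 2          # windows[i, j]: point j inside window i

    def max_stat(b_layer):
        co = a & b_layer
        return (windows & co[None, :]).sum(axis=1).max()

    observed = max_stat(b)
    exceed = sum(max_stat(rng.permutation(b)) >= observed for _ in range(n_perm))
    return observed, (exceed + 1) / (n_perm + 1)
```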
Abstract:
The primary aim of this dissertation is to develop data mining tools for knowledge discovery in biomedical data when multiple (homogeneous or heterogeneous) sources of data are available. The central hypothesis is that, when information from multiple sources of data is used appropriately and effectively, knowledge discovery can be better achieved than is possible from only a single source. Recent advances in high-throughput technology have enabled biomedical researchers to generate large volumes of diverse types of data on a genome-wide scale. These data include DNA sequences, gene expression measurements, and much more; they provide the motivation for building analysis tools to elucidate the modular organization of the cell. The challenges include efficiently and accurately extracting information from the multiple data sources; representing the information effectively; developing analytical tools; and interpreting the results in the context of the domain. The first part considers the application of feature-level integration to design classifiers that discriminate between soil types. The machine learning tools, SVM and KNN, were used to successfully distinguish between several soil samples. The second part considers clustering using multiple heterogeneous data sources. The resulting Multi-Source Clustering (MSC) algorithm was shown to have better performance than clustering methods that use only a single data source or a simple feature-level integration of heterogeneous data sources. The third part proposes a new approach to effectively incorporate incomplete data into clustering analysis. Adapted from the K-means algorithm, the Generalized Constrained Clustering (GCC) algorithm makes use of incomplete data in the form of constraints to perform exploratory analysis. Novel approaches for extracting constraints were proposed. For sufficiently large constraint sets, the GCC algorithm outperformed the MSC algorithm. The last part considers the problem of providing a theme-specific environment for mining multi-source biomedical data. The database, called PlasmoTFBM, focuses on gene regulation of Plasmodium falciparum, contains diverse information, and has a simple interface to allow biologists to explore the data. It provided a framework for comparing different analytical tools for predicting regulatory elements and for designing useful data mining tools. The conclusion is that the experiments reported in this dissertation strongly support the central hypothesis.
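The general idea behind constraint-based clustering of the kind described for GCC, a K-means variant that respects pairwise constraints derived from incomplete data, can be sketched in the spirit of COP-KMeans. This is an illustration only, handling just cannot-link constraints; it is not the dissertation's GCC algorithm.

```python
# Minimal sketch: K-means-style clustering that assigns each point to the
# nearest centroid that does not violate a cannot-link constraint.
import numpy as np

def constrained_kmeans(X, k, cannot_link, n_iter=100, seed=0):
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.full(len(X), -1)
    cl = {}                                  # point index -> cannot-link partners
    for i, j in cannot_link:
        cl.setdefault(i, set()).add(j)
        cl.setdefault(j, set()).add(i)

    for _ in range(n_iter):
        new_labels = np.full(len(X), -1)
        for i in range(len(X)):
            # Try centroids from nearest to farthest, skipping any cluster that
            # already holds one of i's cannot-link partners in this pass.
            order = np.argsort(((X[i] - centroids) ** 2).sum(axis=1))
            for c in order:
                if all(new_labels[p] != c for p in cl.get(i, ())):
                    new_labels[i] = c
                    break
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for c in range(k):                   # recompute centroids
            members = X[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    return labels, centroids
```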
Abstract:
The generation of heterogeneous big data sources with ever-increasing volumes, velocities and veracities over the last few years has inspired the data science and research community to address the challenge of extracting knowledge from big data. Such a wealth of generated data across the board can be intelligently exploited to advance our knowledge about our environment, public health, critical infrastructure and security. In recent years we have developed generic approaches to process such big data at multiple levels for advancing decision support. These specifically concern data processing with semantic harmonisation, low-level fusion, analytics, knowledge modelling with high-level fusion, and reasoning. Such approaches will be introduced and presented in the context of the TRIDEC project results on critical oil and gas industry drilling operations and also the ongoing large eVacuate project on critical crowd behaviour detection in confined spaces.
Abstract:
In this paper, we implement an anomaly detection system using the Dempster-Shafer method. Using two standard benchmark problems, we show that by combining multiple signals it is possible to achieve better results than by using a single signal. We further show, by applying this approach to a real-world email dataset, that the algorithm works for email worm detection. Dempster-Shafer can be a promising method for anomaly detection problems with multiple features (data sources) and two or more classes.
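The core mechanism, Dempster's rule of combination for a two-class frame, is sketched below for the anomaly detection setting described above: each signal assigns mass to {normal}, {anomalous} and the ignorance set, and the masses are fused pairwise. The mass values are hypothetical; this is a sketch of the standard rule, not the paper's specific system.

```python
# Minimal sketch: Dempster's rule over the frame {normal 'N', anomalous 'A'},
# with 'NA' denoting the ignorance set {N, A}.
def dempster_combine(m1, m2):
    conflict = m1['N'] * m2['A'] + m1['A'] * m2['N']
    norm = 1.0 - conflict
    return {
        'N': (m1['N'] * m2['N'] + m1['N'] * m2['NA'] + m1['NA'] * m2['N']) / norm,
        'A': (m1['A'] * m2['A'] + m1['A'] * m2['NA'] + m1['NA'] * m2['A']) / norm,
        'NA': (m1['NA'] * m2['NA']) / norm,
    }

# Hypothetical masses from three signals for one observation:
signals = [
    {'N': 0.6, 'A': 0.1, 'NA': 0.3},
    {'N': 0.2, 'A': 0.5, 'NA': 0.3},
    {'N': 0.1, 'A': 0.7, 'NA': 0.2},
]
fused = signals[0]
for m in signals[1:]:
    fused = dempster_combine(fused, m)
is_anomalous = fused['A'] > fused['N']   # simple decision rule on fused belief
```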
Abstract:
The increasing number of extreme rainfall events, combined with high population density and the imperviousness of the land surface, makes urban areas particularly vulnerable to pluvial flooding. In order to design and manage cities so that they are able to deal with this issue, the reconstruction of weather phenomena is essential. Among the most interesting data sources showing great potential are the observational networks of private sensors managed by citizens (crowdsourcing). The number of these personal weather stations is consistently increasing, and their spatial distribution roughly follows population density. Precisely for this reason, they are well suited to this detailed study on the modelling of pluvial flooding in urban environments. The uncertainty associated with these measurements of precipitation is still a matter of research. In order to characterise the accuracy and precision of the crowdsourced data, we carried out exploratory data analyses. A comparison between Netatmo hourly precipitation amounts and observations of the same quantity from weather stations managed by national weather services is presented. The crowdsourced stations have very good skill in rain detection but tend to underestimate the reference value. In detail, the accuracy and precision of crowdsourced data change as precipitation increases, with the spread improving towards the extreme values. Then, the ability of this kind of observation to improve the prediction of pluvial flooding is tested. To this aim, the simplified raster-based inundation model incorporated in the Saferplaces web platform is used for simulating pluvial flooding. Different precipitation fields have been produced and tested as input to the model. Two case studies are analysed over Oslo, the most densely populated Norwegian city. The crowdsourced weather station observations, bias-corrected (i.e. increased by 25%), showed very good skill in detecting flooded areas.
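Two of the steps mentioned above, the simple bias correction of the crowdsourced rainfall (an increase of 25%) and the verification of simulated flooded areas against a reference extent, can be sketched as below. The choice of contingency-table scores (POD, FAR, CSI) and the variable names are illustrative assumptions, not necessarily those used in the study.

```python
# Minimal sketch: 25% bias correction and flood-detection skill scores.
import numpy as np

def bias_correct(crowdsourced_rain_mm, factor=1.25):
    """Scale crowdsourced rainfall upward to compensate for underestimation."""
    return crowdsourced_rain_mm * factor

def flood_detection_skill(simulated_flooded, observed_flooded):
    sim = np.asarray(simulated_flooded, dtype=bool)
    obs = np.asarray(observed_flooded, dtype=bool)
    hits = np.sum(sim & obs)
    misses = np.sum(~sim & obs)
    false_alarms = np.sum(sim & ~obs)
    pod = hits / (hits + misses)                 # probability of detection
    far = false_alarms / (hits + false_alarms)   # false alarm ratio
    csi = hits / (hits + misses + false_alarms)  # critical success index
    return pod, far, csi
```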
Abstract:
A sensitive near-resonant four-wave mixing technique based on two-photon parametric four-wave mixing has been developed. Seeded parametric four-wave mixing requires only a single laser, as an additional phase-matched seeder field is generated via parametric four-wave mixing of the pump beam in a high-gain cell. The seeder field travels collinearly with the pump beam, providing efficient nondegenerate four-wave mixing in a second medium. This simple arrangement facilitates the detection of complex molecular spectra by simply scanning the pump laser. Seeded parametric four-wave mixing is demonstrated in both a low-pressure cell and an air/acetylene flame with detection of the two-photon C ²Π(v′ = 0) ← X ²Π(v = 0) spectrum of nitric oxide. From the cell data a detection limit of 10¹² molecules/cm³ is established. A theoretical model of seeded parametric four-wave mixing is developed from existing parametric four-wave mixing theory. The addition of the seeder field significantly modifies the parametric four-wave mixing behaviour such that, in the small-signal regime, the signal intensity can readily be made to scale as the cube of the laser pump power, while the density dependence follows a more familiar square law. In general, we find excellent agreement between theory and experiment. Limitations to the process result from an ac Stark shift of the two-photon resonance in the high-pressure seeder cell caused by the generation of a strong seeder field, as well as a reduction in phase-matching efficiency due to the presence of certain buffer species. Various optimizations are suggested which should overcome these limitations, providing even greater detection sensitivity. © 1998 American Institute of Physics. [S0021-9606(98)01014-9]
Abstract:
Objective: To illustrate methodological issues involved in estimating dietary trends in populations using data obtained from various sources in Australia in the 1980s and 1990s. Methods: Estimates of absolute and relative change in consumption of selected food items were calculated using national food supply data published annually for 1982-83 to 1992-93 and responses to food frequency questions in two population-based risk factor surveys conducted in 1983 and 1994 in the Hunter Region of New South Wales, Australia. The validity of estimated food quantities obtained from these inexpensive sources at the beginning of the period was assessed by comparison with data from a national dietary survey conducted in 1983 using 24-hour recall. Results: Trend estimates from the food supply data and the risk factor survey data were in good agreement for increases in consumption of fresh fruit, vegetables and breakfast food and decreases in butter, margarine, sugar and alcohol. Estimates of trends in milk, egg and bread consumption, however, were inconsistent. Conclusions: Both data sources can be used for monitoring progress towards national nutrition goals based on selected food items, provided that some limitations are recognized. While data collection methods should be consistent over time, they also need to allow for changes in the food supply (for example, the introduction of new varieties such as low-fat dairy products). From time to time, the trends derived from these inexpensive data sources should be compared with more detailed and quantitative estimates of dietary intake.
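The trend estimates described in the Methods reduce to simple absolute and relative (percentage) changes between the two ends of the period, computed identically for the food supply series and the survey series; a minimal sketch follows, with made-up numbers.

```python
# Minimal sketch: absolute and relative change between two period endpoints.
def absolute_change(start, end):
    return end - start

def relative_change(start, end):
    return (end - start) / start * 100.0   # percent change

# Hypothetical per-capita consumption of fresh fruit (kg/person/year):
fresh_fruit_1983, fresh_fruit_1994 = 82.0, 97.0
print(absolute_change(fresh_fruit_1983, fresh_fruit_1994),
      relative_change(fresh_fruit_1983, fresh_fruit_1994))
```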
Abstract:
There are two main types of data sources of income distributions in China: household survey data and grouped data. Household survey data are typically available for isolated years and individual provinces. In comparison, aggregate or grouped data are typically available more frequently and usually have national coverage. In principle, grouped data allow investigation of the change of inequality over longer, continuous periods of time, and the identification of patterns of inequality across broader regions. Nevertheless, a major limitation of grouped data is that only mean (average) income and income shares of quintile or decile groups of the population are reported. Directly using grouped data reported in this format is equivalent to assuming that all individuals in a quintile or decile group have the same income. This potentially distorts the estimate of inequality within each region. The aim of this paper is to apply an improved econometric method designed to use grouped data to study income inequality in China. A generalized beta distribution is employed to model income inequality in China at various levels and periods of time. The generalized beta distribution is more general and flexible than the lognormal distribution that has been used in past research, and also relaxes the assumption of a uniform distribution of income within quintile and decile groups of populations. The paper studies the nature and extent of inequality in rural and urban China over the period 1978 to 2002. Income inequality in the whole of China is then modeled using a mixture of province-specific distributions. The estimated results are used to study the trends in national inequality, and to discuss the empirical findings in the light of economic reforms, regional policies, and globalization of the Chinese economy.
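The limitation of grouped data described above, treating everyone within a quintile or decile group as having the same income, is exactly what the standard trapezoidal Gini calculation from group shares assumes, and it understates inequality; this is the motivation for fitting a flexible parametric form such as the generalized beta instead. The sketch below computes that grouped-data (lower-bound) Gini; the decile shares are hypothetical, and the paper's generalized beta estimation is not reproduced.

```python
# Minimal sketch: Gini coefficient from decile income shares under the
# equal-income-within-group assumption (a lower bound on true inequality).
import numpy as np

def gini_from_group_shares(income_shares):
    """income_shares: income share of each equal-sized population group,
    ordered from poorest to richest (must sum to 1)."""
    shares = np.asarray(income_shares, dtype=float)
    n = len(shares)
    p = np.arange(0, n + 1) / n                        # cumulative population shares
    L = np.concatenate([[0.0], np.cumsum(shares)])     # Lorenz curve ordinates
    # Trapezoidal area under the piecewise-linear Lorenz curve; Gini = 1 - 2*area.
    area = ((L[1:] + L[:-1]) / 2.0 * np.diff(p)).sum()
    return 1.0 - 2.0 * area

# Hypothetical decile shares, poorest to richest:
deciles = [0.02, 0.03, 0.04, 0.05, 0.06, 0.08, 0.10, 0.13, 0.19, 0.30]
print(gini_from_group_shares(deciles))
```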