864 resultados para Multiple data
Resumo:
Forensic analysis requires the acquisition and management of many different types of evidence, including individual disk drives, RAID sets, network packets, memory images, and extracted files. Often the same evidence is reviewed by several different tools or examiners in different locations. We propose a backwards-compatible redesign of the Advanced Forensic Formatdan open, extensible file format for storing and sharing of evidence, arbitrary case related information and analysis results among different tools. The new specification, termed AFF4, is designed to be simple to implement, built upon the well supported ZIP file format specification. Furthermore, the AFF4 implementation has downward comparability with existing AFF files.
Resumo:
Employing multiple base stations is an attractive approach to enhance the lifetime of wireless sensor networks. In this paper, we address the fundamental question concerning the limits on the network lifetime in sensor networks when multiple base stations are deployed as data sinks. Specifically, we derive upper bounds on the network lifetime when multiple base stations are employed, and obtain optimum locations of the base stations (BSs) that maximize these lifetime bounds. For the case of two BSs, we jointly optimize the BS locations by maximizing the lifetime bound using a genetic algorithm based optimization. Joint optimization for more number of BSs is complex. Hence, for the case of three BSs, we optimize the third BS location using the previously obtained optimum locations of the first two BSs. We also provide simulation results that validate the lifetime bounds and the optimum locations of the BSs.
Resumo:
In this paper, we address the fundamental question concerning the limits on the network lifetime in sensor networks when multiple base stations (BSs) are deployed as data sinks. Specifically, we derive upper bounds on the network lifetime when multiple BSs arc employed, and obtain optimum locations of the base stations that maximise these lifetime bounds. For the case of two BSs, we jointly optimise the BS locations by maximising the lifetime bound using genetic algorithm. Joint optimisation for more number of BSs becomes prohibitively complex. Further, we propose a suboptimal approach for higher number of BSs, Individually Optimum method, where we optimise the next BS location using optimum location of previous BSs. Individually Optimum method has advantage of being attractive for solving the problem with more number of BSs at the cost of little compromised accuracy. We show that accuracy degradation is quite small for the case of three BSs.
Resumo:
Variability in the strength of the stratospheric Lagrangian mean meridional or Brewer-Dobson circulation and horizontal mixing into the tropics over the past three decades are examined using observations of stratospheric mean age of air and ozone. We use a simple representation of the stratosphere, the tropical leaky pipe (TLP) model, guided by mean meridional circulation and horizontal mixing changes in several reanalyses data sets and chemistry climate model (CCM) simulations, to help elucidate reasons for the observed changes in stratospheric mean age and ozone. We find that the TLP model is able to accurately simulate multiyear variability in ozone following recent major volcanic eruptions and the early 2000s sea surface temperature changes, as well as the lasting impact on mean age of relatively short-term circulation perturbations. We also find that the best quantitative agreement with the observed mean age and ozone trends over the past three decades is found assuming a small strengthening of the mean circulation in the lower stratosphere, a moderate weakening of the mean circulation in the middle and upper stratosphere, and a moderate increase in the horizontal mixing into the tropics. The mean age trends are strongly sensitive to trends in the horizontal mixing into the tropics, and the uncertainty in the mixing trends causes uncertainty in the mean circulation trends. Comparisons of the mean circulation and mixing changes suggested by the measurements with those from a recent suite of CCM runs reveal significant differences that may have important implications on the accurate simulation of future stratospheric climate.
Resumo:
This paper presents a poverty profile for Brazil, based on three different sources of household data for 1996. We use PPV consumption data to estimate poverty and indigence lines. “Contagem” data is used to allow for an unprecedented refinement of the country’s poverty map. Poverty measures and shares are also presented for a wide range of population subgroups, based on the PNAD 1996, with new adjustments for imputed rents and spatial differences in cost of living. Robustness of the profile is verified with respect to different poverty lines, spatial price deflators, and equivalence scales. Overall poverty incidence ranges from 23% with respect to an indigence line to 45% with respect to a more generous poverty line. More importantly, however, poverty is found to vary significantly across regions and city sizes, with rural areas, small and medium towns and the metropolitan peripheries of the North and Northeast regions being poorest.
Resumo:
The software PanGet is a special tool for the download of multiple data sets from PANGAEA. It uses the PANGAEA data set ID which is unique and part of the DOI. In a first step a list of ID's of those data sets to be downloaded must be created. There are two choices to define this individual collection of sets. Based on the ID list, the tool will download the data sets. Failed downloads are written to the file *_failed.txt. The functionality of PanGet is also part of the program Pan2Applic (choose File > Download PANGAEA datasets...) and PanTool2 (choose Basic tools > Download PANGAEA datasets...).
Resumo:
Acknowledgements. This work was mainly funded by the EU FP7 CARBONES project (contracts FP7-SPACE-2009-1-242316), with also a small contribution from GEOCARBON project (ENV.2011.4.1.1-1-283080). This work used eddy covariance data acquired by the FLUXNET community and in particular by the following networks: AmeriFlux (U.S. Department of Energy, Biological and Environmental Research, Terrestrial Carbon Program; DE-FG02-04ER63917 and DE-FG02-04ER63911), AfriFlux, AsiaFlux, CarboAfrica, CarboEuropeIP, CarboItaly, CarboMont, ChinaFlux, Fluxnet-Canada (supported by CFCAS, NSERC, BIOCAP, Environment Canada, and NRCan), GreenGrass, KoFlux, LBA, NECC, OzFlux, TCOS-Siberia, USCCC. We acknowledge the financial support to the eddy covariance data harmonization provided by CarboEuropeIP, FAO-GTOS-TCO, iLEAPS, Max Planck Institute for Biogeochemistry, National Science Foundation, University of Tuscia, Université Laval and Environment Canada and US Department of Energy and the database development and technical support from Berkeley Water Center, Lawrence Berkeley National Laboratory, Microsoft Research eScience, Oak Ridge National Laboratory, University of California-Berkeley, University of Virginia. Philippe Ciais acknowledges support from the European Research Council through Synergy grant ERC-2013-SyG-610028 “IMBALANCE-P”. The authors wish to thank M. Jung for providing access to the GPP MTE data, which were downloaded from the GEOCARBON data portal (https://www.bgc-jena.mpg.de/geodb/projects/Data.php). The authors are also grateful to computing support and resources provided at LSCE and to the overall ORCHIDEE project that coordinate the development of the code (http://labex.ipsl.fr/orchidee/index.php/about-the-team).
Resumo:
International audience
Resumo:
The primary aim of this dissertation is to develop data mining tools for knowledge discovery in biomedical data when multiple (homogeneous or heterogeneous) sources of data are available. The central hypothesis is that, when information from multiple sources of data are used appropriately and effectively, knowledge discovery can be better achieved than what is possible from only a single source. ^ Recent advances in high-throughput technology have enabled biomedical researchers to generate large volumes of diverse types of data on a genome-wide scale. These data include DNA sequences, gene expression measurements, and much more; they provide the motivation for building analysis tools to elucidate the modular organization of the cell. The challenges include efficiently and accurately extracting information from the multiple data sources; representing the information effectively, developing analytical tools, and interpreting the results in the context of the domain. ^ The first part considers the application of feature-level integration to design classifiers that discriminate between soil types. The machine learning tools, SVM and KNN, were used to successfully distinguish between several soil samples. ^ The second part considers clustering using multiple heterogeneous data sources. The resulting Multi-Source Clustering (MSC) algorithm was shown to have a better performance than clustering methods that use only a single data source or a simple feature-level integration of heterogeneous data sources. ^ The third part proposes a new approach to effectively incorporate incomplete data into clustering analysis. Adapted from K-means algorithm, the Generalized Constrained Clustering (GCC) algorithm makes use of incomplete data in the form of constraints to perform exploratory analysis. Novel approaches for extracting constraints were proposed. For sufficiently large constraint sets, the GCC algorithm outperformed the MSC algorithm. ^ The last part considers the problem of providing a theme-specific environment for mining multi-source biomedical data. The database called PlasmoTFBM, focusing on gene regulation of Plasmodium falciparum, contains diverse information and has a simple interface to allow biologists to explore the data. It provided a framework for comparing different analytical tools for predicting regulatory elements and for designing useful data mining tools. ^ The conclusion is that the experiments reported in this dissertation strongly support the central hypothesis.^
Resumo:
This paper demonstrates the affordances of the work diary as a data collection tool for both pilot studies and qualitative research of social interactions. Observation is the cornerstone of many qualitative, ethnographic research projects (Creswell, 2008). However, determining through observation, the activities of busy school teams could be likened to joining dots of a child’s drawing activity to reveal a complex picture of interactions. Teachers, leaders and support personnel are in different locations within a school, performing diverse tasks for a variety of outcomes, which hopefully achieve a common goal. As a researcher, the quest to observe these busy teams and their interactions with each other was daunting and perhaps unrealistic. The decision to use a diary as part of a wider research project was to overcome the physical impossibility of simultaneously observing multiple team members. One reported advantage of the use of the diary in research was its suitability as a substitute for lengthy researcher observation, because multiple data sets could be collected at once (Lewis et al, 2005; Marelli, 2007).
Resumo:
In recent years, rapid advances in information technology have led to various data collection systems which are enriching the sources of empirical data for use in transport systems. Currently, traffic data are collected through various sensors including loop detectors, probe vehicles, cell-phones, Bluetooth, video cameras, remote sensing and public transport smart cards. It has been argued that combining the complementary information from multiple sources will generally result in better accuracy, increased robustness and reduced ambiguity. Despite the fact that there have been substantial advances in data assimilation techniques to reconstruct and predict the traffic state from multiple data sources, such methods are generally data-driven and do not fully utilize the power of traffic models. Furthermore, the existing methods are still limited to freeway networks and are not yet applicable in the urban context due to the enhanced complexity of the flow behavior. The main traffic phenomena on urban links are generally caused by the boundary conditions at intersections, un-signalized or signalized, at which the switching of the traffic lights and the turning maneuvers of the road users lead to shock-wave phenomena that propagate upstream of the intersections. This paper develops a new model-based methodology to build up a real-time traffic prediction model for arterial corridors using data from multiple sources, particularly from loop detectors and partial observations from Bluetooth and GPS devices.
Resumo:
The core aim of machine learning is to make a computer program learn from the experience. Learning from data is usually defined as a task of learning regularities or patterns in data in order to extract useful information, or to learn the underlying concept. An important sub-field of machine learning is called multi-view learning where the task is to learn from multiple data sets or views describing the same underlying concept. A typical example of such scenario would be to study a biological concept using several biological measurements like gene expression, protein expression and metabolic profiles, or to classify web pages based on their content and the contents of their hyperlinks. In this thesis, novel problem formulations and methods for multi-view learning are presented. The contributions include a linear data fusion approach during exploratory data analysis, a new measure to evaluate different kinds of representations for textual data, and an extension of multi-view learning for novel scenarios where the correspondence of samples in the different views or data sets is not known in advance. In order to infer the one-to-one correspondence of samples between two views, a novel concept of multi-view matching is proposed. The matching algorithm is completely data-driven and is demonstrated in several applications such as matching of metabolites between humans and mice, and matching of sentences between documents in two languages.