861 results for multiple data sources
Nesting In The Clouds: Evaluating And Predicting Sea Turtle Nesting Beach Parameters From Lidar Data
Abstract:
Humans' desire for knowledge regarding animal species and their interactions with the natural world has spurred centuries of studies. The relatively new development of remote sensing systems using satellite or aircraft-borne sensors has opened up a wide field of research, which unfortunately largely remains dependent on coarse-scale image spatial resolution, particularly for habitat modeling. For habitat-specialized species, such data may not be sufficient to successfully capture the nuances of their preferred areas. Of particular concern are those species for which topographic feature attributes are a main limiting factor for habitat use. Coarse spatial resolution data can smooth over details that may be essential for habitat characterization. Three studies focusing on sea turtle nesting beaches were completed to serve as an example of how topography can be a main deciding factor for certain species. Light Detection and Ranging (LiDAR) data were used to illustrate that fine spatial scale data can provide information not readily captured by either field work or coarser spatial scale sources. The variables extracted from the LiDAR data could successfully model nesting density for loggerhead (Caretta caretta), green (Chelonia mydas), and leatherback (Dermochelys coriacea) sea turtle species using morphological beach characteristics, highlight beach changes over time and their correlations with nesting success, and provide comparisons for nesting density models across large geographic areas. Comparisons between the LiDAR dataset and other digital elevation models (DEMs) confirmed that fine spatial scale data sources provide more similar habitat information than those with coarser spatial scales. Although these studies focused solely on sea turtles, the underlying principles are applicable for many other wildlife species whose range and behavior may be influenced by topographic features.
Abstract:
The EPA promulgated the Exceptional Events Rule codifying guidance regarding exclusion of monitoring data from compliance decisions due to uncontrollable natural or exceptional events. This capstone examines documentation systems utilized by agencies requesting data be excluded from compliance decisions due to exceptional events. A screening tool is developed to determine whether an event would meet exceptional event criteria. New data sources are available to enhance analysis but evaluation shows many are unusable in their current form. The EPA and States must collaborate to develop consistent evaluation methodologies documenting exceptional events to improve the efficiency and effectiveness of the new rule. To utilize newer sophisticated data, consistent, user-friendly translation systems must be developed.
Open business intelligence: on the importance of data quality awareness in user-friendly data mining
Abstract:
Citizens demand more and more data for making decisions in their daily life. Therefore, mechanisms that allow citizens to understand and analyze linked open data (LOD) in a user-friendly manner are much needed. To this aim, the concept of Open Business Intelligence (OpenBI) is introduced in this position paper. OpenBI enables non-expert users to (i) analyze and visualize LOD, thus generating actionable information by means of reporting, OLAP analysis, dashboards or data mining; and to (ii) share the newly acquired information as LOD to be reused by anyone. One of the most challenging issues of OpenBI is related to data mining, since non-experts (such as citizens) need guidance during preprocessing and application of mining algorithms due to the complexity of the mining process and the low quality of the data sources. This is even worse when dealing with LOD, not only because of the different kinds of links among data, but also because of its high dimensionality. As a consequence, in this position paper we advocate that data mining for OpenBI requires data quality-aware mechanisms for guiding non-expert users in obtaining and sharing the most reliable knowledge from the available LOD.
Abstract:
Paper presented at the XVI Jornadas de Ingeniería del Software y Bases de Datos, JISBD 2011, A Coruña, 5-7 September 2011.
Abstract:
Among many other problems, the migration, humanitarian and policy crises in the European Union in 2015 and early 2016 have highlighted a pressing need for reliable, timely and comparable statistical data on migration, asylum and arrivals at national borders. In this fast-moving policy field, data production and the timeliness of dissemination have seen some improvements but the sources of data remain largely unchanged at national level. In this paper the author examines the reasons for some of the problems with the data for policy and for public discussion, and makes a set of recommendations that call for a complete and updated inventory of data sources and for an evaluation of the quality of data used for policy-making.
Abstract:
This thesis aims to assess the degree of implementation and use of performance measurement systems (PMS) by decision-makers in rehabilitation organizations and to understand the contextual factors that influenced their implementation. To this end, a multiple case study was conducted using two data sources: individual interviews with senior managers of rehabilitation organizations in Quebec, and organizational documents. The Consolidated Framework for Implementation Research was used as the conceptual framework to guide data collection and analysis. Both a within-case analysis and a cross-case analysis were performed. Our results show that the level of organizational readiness for implementing a PMS was high and that the PMS were implemented successfully and used in several ways. The organizations used them passively (as an information tool), in a targeted way (to try to improve underperforming areas) and politically (as a negotiating tool with government authorities). This diversified use of PMS arises from the complex interaction of factors stemming from the internal context specific to each organization, the characteristics of the PMS, the implementation process applied and the external context in which these organizations operate. Regarding the internal context, the sustained commitment and leadership of senior management were decisive in implementing the PMS through their influence on identifying the need for a PMS, engaging the intended users in the project, the organizational priority given to the PMS and the resources allocated to its implementation, the quality of communications and the organizational learning climate.
However, although some of these factors, such as the resources allocated to implementation, the organizational priority of the PMS and the learning climate, proved to be barriers to implementation, these barriers were ultimately not significant enough to hinder use of the PMS. This study also confirmed the importance of the characteristics of the PMS, particularly the perceived quality and usefulness of the information. On their own, however, these characteristics are insufficient to ensure successful implementation. This implementation analysis also revealed that, even though the implementation process did not follow formal steps, a PMS development plan, the participation and commitment of decision-makers and the designation of a project lead all facilitated implementation. However, the absence of evaluation and collective reflection on the implementation process limited the potential for organizational learning, a prerequisite for improving performance. As for the external context, support from an external body proved to be an indispensable facilitator of PMS implementation by rehabilitation organizations, despite the absence of government policies and incentives to that effect. This study adds to knowledge about contextual factors and their interactions in the use of innovations such as PMS, and confirms the importance of approaching implementation analysis from a systemic perspective.
Abstract:
Cybercrime and related malicious activity in our increasingly digital world have become more prevalent and sophisticated, evading traditional security mechanisms. Digital forensics has been proposed to help investigate, understand and eventually mitigate such attacks. The practice of digital forensics, however, is still fraught with various challenges. Some of the most prominent of these challenges include the increasing amounts of data and the diversity of digital evidence sources appearing in digital investigations. Mobile devices and cloud infrastructures are an interesting case, as they inherently exhibit these challenging circumstances and are becoming more prevalent in digital investigations today. Additionally, they embody further characteristics such as large volumes of data from multiple sources, dynamic sharing of resources, limited individual device capabilities and the presence of sensitive data. This combined set of circumstances makes digital investigations in mobile and cloud environments particularly challenging. This is not aided by the fact that digital forensics today still involves manual, time-consuming tasks within the processes of identifying evidence, performing evidence acquisition and correlating multiple diverse sources of evidence in the analysis phase. Furthermore, industry standard tools developed are largely evidence-oriented, have limited support for evidence integration and only automate certain precursory tasks, such as indexing and text searching. In this study, efficiency, in the form of reducing the time and human labour effort expended, is sought after in digital investigations in highly networked environments through the automation of certain activities in the digital forensic process. To this end, requirements are outlined and an architecture is designed for an automated system that performs digital forensics in highly networked mobile and cloud environments.
Part of the remote evidence acquisition activity of this architecture is built and tested on several mobile devices in terms of speed and reliability. A method for integrating multiple diverse evidence sources in an automated manner, supporting correlation and automated reasoning is developed and tested. Finally the proposed architecture is reviewed and enhancements proposed in order to further automate the architecture by introducing decentralization particularly within the storage and processing functionality. This decentralization also improves machine to machine communication supporting several digital investigation processes enabled by the architecture through harnessing the properties of various peer-to-peer overlays. Remote evidence acquisition helps to improve the efficiency (time and effort involved) in digital investigations by removing the need for proximity to the evidence. Experiments show that a single TCP connection client-server paradigm does not offer the required scalability and reliability for remote evidence acquisition and that a multi-TCP connection paradigm is required. The automated integration, correlation and reasoning on multiple diverse evidence sources demonstrated in the experiments improves speed and reduces the human effort needed in the analysis phase by removing the need for time-consuming manual correlation. Finally, informed by published scientific literature, the proposed enhancements for further decentralizing the Live Evidence Information Aggregator (LEIA) architecture offer a platform for increased machine-to-machine communication thereby enabling automation and reducing the need for manual human intervention.
Abstract:
In simultaneous analyses of multiple data partitions, the trees relevant when measuring support for a clade are the optimal tree, and the best tree lacking the clade (i.e., the most reasonable alternative). The parsimony-based method of partitioned branch support (PBS) forces each data set to arbitrate between the two relevant trees. This value is the amount each data set contributes to clade support in the combined analysis, and can be very different from the support apparent in separate analyses. The approach used in PBS can also be employed in likelihood: a simultaneous analysis of all data retrieves the maximum likelihood tree, and the best tree without the clade of interest is also found. Each data set is fitted to the two trees and the log-likelihood difference calculated, giving partitioned likelihood support (PLS) for each data set. These calculations can be performed regardless of the complexity of the ML model adopted. The significance of PLS can be evaluated using a variety of resampling methods, such as the Kishino-Hasegawa test, the Shimodaira-Hasegawa test, or likelihood weights, although the appropriateness and assumptions of these tests remain debated.
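The PLS calculation described in this abstract comes down to one subtraction per data set. The sketch below only illustrates that arithmetic and is not the authors' software: the function name `partitioned_support`, the partition names and the log-likelihood values are all hypothetical, and the per-partition log-likelihoods are assumed to have been computed beforehand by a phylogenetics package.

```python
def partitioned_support(lnl_best, lnl_alt):
    """Return per-partition support for a clade.

    lnl_best: partition name -> log-likelihood on the optimal (ML) tree.
    lnl_alt:  partition name -> log-likelihood on the best tree lacking the clade.
    A positive value means the partition favours the clade; a negative value
    means it favours the alternative tree.
    """
    if lnl_best.keys() != lnl_alt.keys():
        raise ValueError("both trees must be scored on the same partitions")
    return {name: lnl_best[name] - lnl_alt[name] for name in lnl_best}


# Hypothetical log-likelihoods, for illustration only.
lnl_best = {"gene_a": -1200.0, "gene_b": -980.0, "morphology": -310.0}
lnl_alt = {"gene_a": -1210.0, "gene_b": -978.0, "morphology": -314.0}

pls = partitioned_support(lnl_best, lnl_alt)
total = sum(pls.values())  # support for the clade in the combined analysis

for name, value in pls.items():
    print(f"{name}: {value:+.1f}")
print(f"total: {total:+.1f}")
```

In this made-up example one partition (`gene_b`) contributes negative support even though the combined analysis favours the clade, which is the kind of conflict the abstract says separate analyses can hide.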
Abstract:
In the wake of findings from the Bundaberg Hospital and Forster inquiries in Queensland, periodic public release of hospital performance reports has been recommended. A process for developing and releasing such reports is being established by Queensland Health, overseen by an independent expert panel. This recommendation presupposes that public reports based on routinely collected administrative data are accurate; that the public can access, correctly interpret and act upon report contents; that reports motivate hospital clinicians and managers to improve quality of care; and that there are no unintended adverse effects of public reporting. Available research suggests that primary data sources are often inaccurate and incomplete, that reports have low predictive value in detecting outlier hospitals, and that users experience difficulty in accessing and interpreting reports and tend to distrust their findings.
Abstract:
In recent years, many real-time applications have needed to handle data streams. We consider distributed environments in which remote data sources continuously collect data from the real world or from other data sources, and continuously push the data to a central stream processor. In these kinds of environments, significant communication is induced by the transmission of rapid, high-volume and time-varying data streams. At the same time, computing overhead is also incurred at the central processor. In this paper, we develop a novel filter approach, called the DTFilter approach, for evaluating windowed distinct queries in such a distributed system. The DTFilter approach is based on a searching algorithm using a data structure of two height-balanced trees, and it avoids transmitting duplicate items in data streams, thereby saving substantial network resources. In addition, a theoretical analysis of the time spent in performing the search, and of the amount of memory needed, is provided. Extensive experiments also show that the DTFilter approach achieves high performance.
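The idea of filtering duplicates at the remote source before transmission can be illustrated with a much simpler structure than the one the abstract describes. The sketch below is not DTFilter: it replaces the two height-balanced trees with an ordered dictionary, assumes timestamps arrive in non-decreasing order, and only suppresses an item when the same value was already transmitted within the last `window` time units. It does not reproduce the guarantees of the paper's algorithm, and the class name `WindowedDuplicateFilter` and its `offer` method are hypothetical.

```python
from collections import OrderedDict


class WindowedDuplicateFilter:
    """Source-side filter that drops items already sent within a time window."""

    def __init__(self, window):
        self.window = window
        # item -> timestamp of its last transmission, oldest entry first.
        # Entries are never updated in place, so insertion order is time order.
        self._last_sent = OrderedDict()

    def offer(self, item, timestamp):
        """Return True if the item should be transmitted to the central processor."""
        # Forget transmissions that have fallen out of the window.
        while self._last_sent:
            _, sent_at = next(iter(self._last_sent.items()))
            if timestamp - sent_at < self.window:
                break
            self._last_sent.popitem(last=False)
        if item in self._last_sent:
            return False  # duplicate inside the window: save the bandwidth
        self._last_sent[item] = timestamp
        return True


# Hypothetical stream of (item, timestamp) pairs.
stream = [("a", 0), ("b", 1), ("a", 2), ("a", 5), ("b", 7)]
flt = WindowedDuplicateFilter(window=5)
sent = [(item, ts) for item, ts in stream if flt.offer(item, ts)]
print(sent)
```

Here the second `"a"` (at time 2) is suppressed because `"a"` was sent at time 0, while the occurrences at times 5 and 7 are transmitted again because the earlier transmissions have expired.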
Abstract:
Overlaying maps using a desktop GIS is often the first step of a multivariate spatial analysis. The potential of this operation has increased considerably as data sources and Web services to manipulate them are becoming widely available via the Internet. Standards from the OGC enable such geospatial ‘mashups’ to be seamless and user driven, involving discovery of thematic data. The user is naturally inclined to look for spatial clusters and ‘correlation’ of outcomes. Using classical cluster detection scan methods to identify multivariate associations can be problematic in this context, because of a lack of control on or knowledge about background populations. For public health and epidemiological mapping, this limiting factor can be critical but often the focus is on spatial identification of risk factors associated with health or clinical status. In this article we point out that this association itself can ensure some control on underlying populations, and develop an exploratory scan statistic framework for multivariate associations. Inference using statistical map methodologies can be used to test the clustered associations. The approach is illustrated with a hypothetical data example and an epidemiological study on community MRSA. Scenarios of potential use for online mashups are introduced but full implementation is left for further research.
Abstract:
To capture the genomic profiles for histone modification, chromatin immunoprecipitation (ChIP) is combined with next generation sequencing, which is called ChIP-seq. However, enriched regions generated from the ChIP-seq data are only evaluated on the limited knowledge acquired from manually examining the relevant biological literature. This paper proposes a novel framework, which integrates multiple knowledge sources such as biological literature, Gene Ontology, and microarray data. In order to precisely analyze ChIP-seq data for histone modification, knowledge integration is based on a unified probabilistic model. The model is employed to re-rank the enriched regions generated from peak finding algorithms. Through filtering the re-ranked enriched regions using some predefined threshold, more reliable and precise results could be generated. The combination of the multiple knowledge sources with the peak finding algorithm produces a new paradigm for ChIP-seq data analysis. © (2012) Trans Tech Publications, Switzerland.
Abstract:
Questions of forming learning sets for artificial neural networks in lossless data compression problems are considered. Methods for constructing and using learning sets are studied. A way of forming a learning set while training an artificial neural network on a data stream is proposed.
Abstract:
2010 Mathematics Subject Classification: 94A17.
Abstract:
An implementation of Sem-ODB, a database management system based on the Semantic Binary Model, is presented. A metaschema of the Sem-ODB database as well as the top-level architecture of the database engine is defined. A new benchmarking technique is proposed which allows databases built on different database models to compete fairly. This technique is applied to show that Sem-ODB has excellent efficiency compared to a relational database on a certain class of database applications. A new semantic benchmark is designed which allows evaluation of the performance of the features characteristic of semantic database applications. An application used in the benchmark represents a class of problems requiring databases with sparse data, complex inheritances and many-to-many relations. Such databases can be naturally accommodated by the semantic model. A fixed predefined implementation is not enforced, allowing the database designer to choose the most efficient structures available in the DBMS tested. The results of the benchmark are analyzed.
A new high-level querying model for semantic databases is defined. It is proven adequate to serve as an efficient native semantic database interface, and has several advantages over the existing interfaces. It is optimizable and parallelizable, and supports the definition of semantic user views and the interoperability of semantic databases with other data sources such as the World Wide Web, relational, and object-oriented databases. The query is structured as a semantic database schema graph with interlinking conditionals. The query result is a mini-database, accessible in the same way as the original database. The paradigm supports and utilizes the rich semantics and inherent ergonomics of semantic databases.
The analysis and high-level design are presented for a system that exploits the superiority of the Semantic Database Model over other data models in expressive power and ease of use to allow uniform access, via a common query interface, to heterogeneous data sources such as semantic databases, relational databases, web sites, ASCII files, and others. The Sem-ODB engine is used to control all the data sources combined under a unified semantic schema. A particular application of the system to provide an ODBC interface to the WWW as a data source is discussed.