963 resultados para Count data models


Relevância:

90.00% 90.00%

Publicador:

Resumo:

Past changes in North Pacific sea surface temperatures and sea-ice conditions are proposed to play a crucial role in deglacial climate development and ocean circulation but are less well known than from the North Atlantic. Here, we present new alkenone-based sea surface temperature records from the subarctic northwest Pacific and its marginal seas (Bering Sea and Sea of Okhotsk) for the time interval of the last 15 kyr, indicating millennial-scale sea surface temperature fluctuations similar to short-term deglacial climate oscillations known from Greenland ice-core records. Past changes in sea-ice distribution are derived from relative percentage of specific diatom groups and qualitative assessment of the IP25 biomarker related to sea-ice diatoms. The deglacial variability in sea-ice extent matches the sea surface temperature fluctuations. These fluctuations suggest a linkage to deglacial variations in Atlantic meridional overturning circulation and a close atmospheric coupling between the North Pacific and North Atlantic. During the Holocene the subarctic North Pacific is marked by complex sea surface temperature trends, which do not support the hypothesis of a Holocene seesaw in temperature development between the North Atlantic and the North Pacific.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Detrital modes for 524 deep-marine sand and sandstone samples recovered on circum-Pacific, Caribbean, and Mediterranean legs of the Deep Sea Drilling Project and the Ocean Drilling Program form the basis for an actualistic model for arc-related provenance. This model refines the Dickinson and Suczek (1979) and Dickinson and others (1983) models and can be used to interpret the provenance/tectonic history of ancient arc-related sedimentary sequences. Four provenance groups are defined using QFL, QmKP, LmLvLs, and LvfLvmiLvl ternary plots of site means: (1) intraoceanic arc and remnant arc, (2) continental arc, (3) triple junction, and (4) strike-slip-continental arc. Intraoceanic- and remnant-arc sands are poor in quartz (mean QFL%Q < 5) and rich in lithics (QFL%L > 75); they are predominantly composed of plagioclase feldspar and volcanic lithic fragments. Continental-arc sand can be more quartzofeldspathic than the intraoceanic- and remnant-arc sand (mean QFL%Q values as much as 10, mean QFL%F values as much as 65, and mean QmKP%Qm as much as 20) and has more variable lithic populations, with minor metamorphic and sedimentary components. The triple-junction and strike-slip-continental groups compositionally overlap; both are more quartzofeldspathic than the other groups and show highly variable lithic proportions, but the strike-slip-continental group is more quartzose. Modal compositions of the triple junction group roughly correlate with the QFL transitional-arc field of Dickinson and others (1983), whereas the strike-slip-continental group approximately correlates with their dissected-arc field.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

EURATOM/CIEMAT and Technical University of Madrid (UPM) have been involved in the development of a FPSC [1] (Fast Plant System Control) prototype for ITER, based on PXIe (PCI eXtensions for Instrumentation). One of the main focuses of this project has been data acquisition and all the related issues, including scientific data archiving. Additionally, a new data archiving solution has been developed to demonstrate the obtainable performances and possible bottlenecks of scientific data archiving in Fast Plant System Control. The presented system implements a fault tolerant architecture over a GEthernet network where FPSC data are reliably archived on remote, while remaining accessible to be redistributed, within the duration of a pulse. The storing service is supported by a clustering solution to guaranty scalability, so that FPSC management and configuration may be simplified, and a unique view of all archived data provided. All the involved components have been integrated under EPICS [2] (Experimental Physics and Industrial Control System), implementing in each case the necessary extensions, state machines and configuration process variables. The prototyped solution is based on the NetCDF-4 [3] and [4] (Network Common Data Format) file format in order to incorporate important features, such as scientific data models support, huge size files management, platform independent codification, or single-writer/multiple-readers concurrency. In this contribution, a complete description of the above mentioned solution is presented, together with the most relevant results of the tests performed, while focusing in the benefits and limitations of the applied technologies.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Sensor networks are increasingly becoming one of the main sources of Big Data on the Web. However, the observations that they produce are made available with heterogeneous schemas, vocabularies and data formats, making it difficult to share and reuse these data for other purposes than those for which they were originally set up. In this thesis we address these challenges, considering how we can transform streaming raw data to rich ontology-based information that is accessible through continuous queries for streaming data. Our main contribution is an ontology-based approach for providing data access and query capabilities to streaming data sources, allowing users to express their needs at a conceptual level, independent of implementation and language-specific details. We introduce novel query rewriting and data translation techniques that rely on mapping definitions relating streaming data models to ontological concepts. Specific contributions include: • The syntax and semantics of the SPARQLStream query language for ontologybased data access, and a query rewriting approach for transforming SPARQLStream queries into streaming algebra expressions. • The design of an ontology-based streaming data access engine that can internally reuse an existing data stream engine, complex event processor or sensor middleware, using R2RML mappings for defining relationships between streaming data models and ontology concepts. Concerning the sensor metadata of such streaming data sources, we have investigated how we can use raw measurements to characterize streaming data, producing enriched data descriptions in terms of ontological models. Our specific contributions are: • A representation of sensor data time series that captures gradient information that is useful to characterize types of sensor data. • A method for classifying sensor data time series and determining the type of data, using data mining techniques, and a method for extracting semantic sensor metadata features from the time series.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Due to the advancement of both, information technology in general, and databases in particular; data storage devices are becoming cheaper and data processing speed is increasing. As result of this, organizations tend to store large volumes of data holding great potential information. Decision Support Systems, DSS try to use the stored data to obtain valuable information for organizations. In this paper, we use both data models and use cases to represent the functionality of data processing in DSS following Software Engineering processes. We propose a methodology to develop DSS in the Analysis phase, respective of data processing modeling. We have used, as a starting point, a data model adapted to the semantics involved in multidimensional databases or data warehouses, DW. Also, we have taken an algorithm that provides us with all the possible ways to automatically cross check multidimensional model data. Using the aforementioned, we propose diagrams and descriptions of use cases, which can be considered as patterns representing the DSS functionality, in regard to DW data processing, DW on which DSS are based. We highlight the reusability and automation benefits that this can be achieved, and we think this study can serve as a guide in the development of DSS.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Replication Data Management (RDM) aims at enabling the use of data collections from several iterations of an experiment. However, there are several major challenges to RDM from integrating data models and data from empirical study infrastructures that were not designed to cooperate, e.g., data model variation of local data sources. [Objective] In this paper we analyze RDM needs and evaluate conceptual RDM approaches to support replication researchers. [Method] We adapted the ATAM evaluation process to (a) analyze RDM use cases and needs of empirical replication study research groups and (b) compare three conceptual approaches to address these RDM needs: central data repositories with a fixed data model, heterogeneous local repositories, and an empirical ecosystem. [Results] While the central and local approaches have major issues that are hard to resolve in practice, the empirical ecosystem allows bridging current gaps in RDM from heterogeneous data sources. [Conclusions] The empirical ecosystem approach should be explored in diverse empirical environments.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

La gran cantidad de datos que se registran diariamente en los sistemas de base de datos de las organizaciones ha generado la necesidad de analizarla. Sin embargo, se enfrentan a la complejidad de procesar enormes volúmenes de datos a través de métodos tradicionales de análisis. Además, dentro de un contexto globalizado y competitivo las organizaciones se mantienen en la búsqueda constante de mejorar sus procesos, para lo cual requieren herramientas que les permitan tomar mejores decisiones. Esto implica estar mejor informado y conocer su historia digital para describir sus procesos y poder anticipar (predecir) eventos no previstos. Estos nuevos requerimientos de análisis de datos ha motivado el desarrollo creciente de proyectos de minería de datos. El proceso de minería de datos busca obtener desde un conjunto masivo de datos, modelos que permitan describir los datos o predecir nuevas instancias en el conjunto. Implica etapas de: preparación de los datos, procesamiento parcial o totalmente automatizado para identificar modelos en los datos, para luego obtener como salida patrones, relaciones o reglas. Esta salida debe significar un nuevo conocimiento para la organización, útil y comprensible para los usuarios finales, y que pueda ser integrado a los procesos para apoyar la toma de decisiones. Sin embargo, la mayor dificultad es justamente lograr que el analista de datos, que interviene en todo este proceso, pueda identificar modelos lo cual es una tarea compleja y muchas veces requiere de la experiencia, no sólo del analista de datos, sino que también del experto en el dominio del problema. Una forma de apoyar el análisis de datos, modelos y patrones es a través de su representación visual, utilizando las capacidades de percepción visual del ser humano, la cual puede detectar patrones con mayor facilidad. Bajo este enfoque, la visualización ha sido utilizada en minería datos, mayormente en el análisis descriptivo de los datos (entrada) y en la presentación de los patrones (salida), dejando limitado este paradigma para el análisis de modelos. El presente documento describe el desarrollo de la Tesis Doctoral denominada “Nuevos Esquemas de Visualizaciones para Mejorar la Comprensibilidad de Modelos de Data Mining”. Esta investigación busca aportar con un enfoque de visualización para apoyar la comprensión de modelos minería de datos, para esto propone la metáfora de modelos visualmente aumentados. ABSTRACT The large amount of data to be recorded daily in the systems database of organizations has generated the need to analyze it. However, faced with the complexity of processing huge volumes of data over traditional methods of analysis. Moreover, in a globalized and competitive environment organizations are kept constantly looking to improve their processes, which require tools that allow them to make better decisions. This involves being bettered informed and knows your digital story to describe its processes and to anticipate (predict) unanticipated events. These new requirements of data analysis, has led to the increasing development of data-mining projects. The data-mining process seeks to obtain from a massive data set, models to describe the data or predict new instances in the set. It involves steps of data preparation, partially or fully automated processing to identify patterns in the data, and then get output patterns, relationships or rules. This output must mean new knowledge for the organization, useful and understandable for end users, and can be integrated into the process to support decision-making. However, the biggest challenge is just getting the data analyst involved in this process, which can identify models is complex and often requires experience not only of the data analyst, but also the expert in the problem domain. One way to support the analysis of the data, models and patterns, is through its visual representation, i.e., using the capabilities of human visual perception, which can detect patterns easily in any context. Under this approach, the visualization has been used in data mining, mostly in exploratory data analysis (input) and the presentation of the patterns (output), leaving limited this paradigm for analyzing models. This document describes the development of the doctoral thesis entitled "New Visualizations Schemes to Improve Understandability of Data-Mining Models". This research aims to provide a visualization approach to support understanding of data mining models for this proposed metaphor visually enhanced models.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

O objetivo dessa pesquisa foi avaliar aspectos genéticos que relacionados à produção in vitro de embriões na raça Guzerá. O primeiro estudo focou na estimação de (co) variâncias genéticas e fenotípicas em características relacionadas a produção de embriões e na detecção de possível associação com a idade ao primeiro parto (AFC). Foi detectada baixa e média herdabilidade para características relacionadas à produção de oócitos e embriões. Houve fraca associação genética entre características ligadas a reprodução artificial e a idade ao primeiro parto. O segundo estudo avaliou tendências genéticas e de endogamia em uma população Guzerá no Brasil. Doadoras e embriões produzidos in vitro foram considerados como duas subpopulações de forma a realizar comparações acerca das diferenças de variação anual genética e do coeficiente de endogamia. A tendência anual do coeficiente de endogamia (F) foi superior para a população geral, sendo detectado efeito quadrático. No entanto, a média de F para a sub- população de embriões foi maior do que na população geral e das doadoras. Foi observado ganho genético anual superior para a idade ao primeiro parto e para a produção de leite (305 dias) entre embriões produzidos in vitro do que entre doadoras ou entre a população geral. O terceiro estudo examinou os efeitos do coeficiente de endogamia da doadora, do reprodutor (usado na fertilização in vitro) e dos embriões sobre resultados de produção in vitro de embriões na raça Guzerá. Foi detectado efeito da endogamia da doadora e dos embriões sobre as características estudadas. O quarto (e último) estudo foi elaborado para comparar a adequação de modelos mistos lineares e generalizados sob método de Máxima Verossimilhança Restrita (REML) e sua adequação a variáveis discretas. Quatro modelos hierárquicos assumindo diferentes distribuições para dados de contagem encontrados no banco. Inferência foi realizada com base em diagnósticos de resíduo e comparação de razões entre componentes de variância para os modelos em cada variável. Modelos Poisson superaram tanto o modelo linear (com e sem transformação da variável) quanto binomial negativo à qualidade do ajuste e capacidade preditiva, apesar de claras diferenças observadas na distribuição das variáveis. Entre os modelos testados, a pior qualidade de ajuste foi obtida para o modelo linear mediante transformação logarítmica (Log10 X +1) da variável resposta.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Many studies on birds focus on the collection of data through an experimental design, suitable for investigation in a classical analysis of variance (ANOVA) framework. Although many findings are confirmed by one or more experts, expert information is rarely used in conjunction with the survey data to enhance the explanatory and predictive power of the model. We explore this neglected aspect of ecological modelling through a study on Australian woodland birds, focusing on the potential impact of different intensities of commercial cattle grazing on bird density in woodland habitat. We examine a number of Bayesian hierarchical random effects models, which cater for overdispersion and a high frequency of zeros in the data using WinBUGS and explore the variation between and within different grazing regimes and species. The impact and value of expert information is investigated through the inclusion of priors that reflect the experience of 20 experts in the field of bird responses to disturbance. Results indicate that expert information moderates the survey data, especially in situations where there are little or no data. When experts agreed, credible intervals for predictions were tightened considerably. When experts failed to agree, results were similar to those evaluated in the absence of expert information. Overall, we found that without expert opinion our knowledge was quite weak. The fact that the survey data is quite consistent, in general, with expert opinion shows that we do know something about birds and grazing and we could learn a lot faster if we used this approach more in ecology, where data are scarce. Copyright (c) 2005 John Wiley & Sons, Ltd.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

In recent years there has been a great effort to combine the technologies and techniques of GIS and process models. This project examines the issues of linking a standard current generation 2½d GIS with several existing model codes. The focus for the project has been the Shropshire Groundwater Scheme, which is being developed to augment flow in the River Severn during drought periods by pumping water from the Shropshire Aquifer. Previous authors have demonstrated that under certain circumstances pumping could reduce the soil moisture available for crops. This project follows earlier work at Aston in which the effects of drawdown were delineated and quantified through the development of a software package that implemented a technique which brought together the significant spatially varying parameters. This technique is repeated here, but using a standard GIS called GRASS. The GIS proved adequate for the task and the added functionality provided by the general purpose GIS - the data capture, manipulation and visualisation facilities - were of great benefit. The bulk of the project is concerned with examining the issues of the linkage of GIS and environmental process models. To this end a groundwater model (Modflow) and a soil moisture model (SWMS2D) were linked to the GIS and a crop model was implemented within the GIS. A loose-linked approach was adopted and secondary and surrogate data were used wherever possible. The implications of which relate to; justification of a loose-linked versus a closely integrated approach; how, technically, to achieve the linkage; how to reconcile the different data models used by the GIS and the process models; control of the movement of data between models of environmental subsystems, to model the total system; the advantages and disadvantages of using a current generation GIS as a medium for linking environmental process models; generation of input data, including the use of geostatistic, stochastic simulation, remote sensing, regression equations and mapped data; issues of accuracy, uncertainty and simply providing adequate data for the complex models; how such a modelling system fits into an organisational framework.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

An implementation of Sem-ODB—a database management system based on the Semantic Binary Model is presented. A metaschema of Sem-ODB database as well as the top-level architecture of the database engine is defined. A new benchmarking technique is proposed which allows databases built on different database models to compete fairly. This technique is applied to show that Sem-ODB has excellent efficiency comparing to a relational database on a certain class of database applications. A new semantic benchmark is designed which allows evaluation of the performance of the features characteristic of semantic database applications. An application used in the benchmark represents a class of problems requiring databases with sparse data, complex inheritances and many-to-many relations. Such databases can be naturally accommodated by semantic model. A fixed predefined implementation is not enforced allowing the database designer to choose the most efficient structures available in the DBMS tested. The results of the benchmark are analyzed. ^ A new high-level querying model for semantic databases is defined. It is proven adequate to serve as an efficient native semantic database interface, and has several advantages over the existing interfaces. It is optimizable and parallelizable, supports the definition of semantic userviews and the interoperability of semantic databases with other data sources such as World Wide Web, relational, and object-oriented databases. The query is structured as a semantic database schema graph with interlinking conditionals. The query result is a mini-database, accessible in the same way as the original database. The paradigm supports and utilizes the rich semantics and inherent ergonomics of semantic databases. ^ The analysis and high-level design of a system that exploits the superiority of the Semantic Database Model to other data models in expressive power and ease of use to allow uniform access to heterogeneous data sources such as semantic databases, relational databases, web sites, ASCII files, and others via a common query interface is presented. The Sem-ODB engine is used to control all the data sources combined under a unified semantic schema. A particular application of the system to provide an ODBC interface to the WWW as a data source is discussed. ^

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Crash reduction factors (CRFs) are used to estimate the potential number of traffic crashes expected to be prevented from investment in safety improvement projects. The method used to develop CRFs in Florida has been based on the commonly used before-and-after approach. This approach suffers from a widely recognized problem known as regression-to-the-mean (RTM). The Empirical Bayes (EB) method has been introduced as a means to addressing the RTM problem. This method requires the information from both the treatment and reference sites in order to predict the expected number of crashes had the safety improvement projects at the treatment sites not been implemented. The information from the reference sites is estimated from a safety performance function (SPF), which is a mathematical relationship that links crashes to traffic exposure. The objective of this dissertation was to develop the SPFs for different functional classes of the Florida State Highway System. Crash data from years 2001 through 2003 along with traffic and geometric data were used in the SPF model development. SPFs for both rural and urban roadway categories were developed. The modeling data used were based on one-mile segments that contain homogeneous traffic and geometric conditions within each segment. Segments involving intersections were excluded. The scatter plots of data show that the relationships between crashes and traffic exposure are nonlinear, that crashes increase with traffic exposure in an increasing rate. Four regression models, namely, Poisson (PRM), Negative Binomial (NBRM), zero-inflated Poisson (ZIP), and zero-inflated Negative Binomial (ZINB), were fitted to the one-mile segment records for individual roadway categories. The best model was selected for each category based on a combination of the Likelihood Ratio test, the Vuong statistical test, and the Akaike's Information Criterion (AIC). The NBRM model was found to be appropriate for only one category and the ZINB model was found to be more appropriate for six other categories. The overall results show that the Negative Binomial distribution model generally provides a better fit for the data than the Poisson distribution model. In addition, the ZINB model was found to give the best fit when the count data exhibit excess zeros and over-dispersion for most of the roadway categories. While model validation shows that most data points fall within the 95% prediction intervals of the models developed, the Pearson goodness-of-fit measure does not show statistical significance. This is expected as traffic volume is only one of the many factors contributing to the overall crash experience, and that the SPFs are to be applied in conjunction with Accident Modification Factors (AMFs) to further account for the safety impacts of major geometric features before arriving at the final crash prediction. However, with improved traffic and crash data quality, the crash prediction power of SPF models may be further improved.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Big Data Analytics is an emerging field since massive storage and computing capabilities have been made available by advanced e-infrastructures. Earth and Environmental sciences are likely to benefit from Big Data Analytics techniques supporting the processing of the large number of Earth Observation datasets currently acquired and generated through observations and simulations. However, Earth Science data and applications present specificities in terms of relevance of the geospatial information, wide heterogeneity of data models and formats, and complexity of processing. Therefore, Big Earth Data Analytics requires specifically tailored techniques and tools. The EarthServer Big Earth Data Analytics engine offers a solution for coverage-type datasets, built around a high performance array database technology, and the adoption and enhancement of standards for service interaction (OGC WCS and WCPS). The EarthServer solution, led by the collection of requirements from scientific communities and international initiatives, provides a holistic approach that ranges from query languages and scalability up to mobile access and visualization. The result is demonstrated and validated through the development of lighthouse applications in the Marine, Geology, Atmospheric, Planetary and Cryospheric science domains.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Big Data Analytics is an emerging field since massive storage and computing capabilities have been made available by advanced e-infrastructures. Earth and Environmental sciences are likely to benefit from Big Data Analytics techniques supporting the processing of the large number of Earth Observation datasets currently acquired and generated through observations and simulations. However, Earth Science data and applications present specificities in terms of relevance of the geospatial information, wide heterogeneity of data models and formats, and complexity of processing. Therefore, Big Earth Data Analytics requires specifically tailored techniques and tools. The EarthServer Big Earth Data Analytics engine offers a solution for coverage-type datasets, built around a high performance array database technology, and the adoption and enhancement of standards for service interaction (OGC WCS and WCPS). The EarthServer solution, led by the collection of requirements from scientific communities and international initiatives, provides a holistic approach that ranges from query languages and scalability up to mobile access and visualization. The result is demonstrated and validated through the development of lighthouse applications in the Marine, Geology, Atmospheric, Planetary and Cryospheric science domains.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

No estudo de séries temporais, os processos estocásticos usuais assumem que as distribuições marginais são contínuas e, em geral, não são adequados para modelar séries de contagem, pois as suas características não lineares colocam alguns problemas estatísticos, principalmente na estimação dos parâmetros. Assim, investigou-se metodologias apropriadas de análise e modelação de séries com distribuições marginais discretas. Neste contexto, Al-Osh and Alzaid (1987) e McKenzie (1988) introduziram na literatura a classe dos modelos autorregressivos com valores inteiros não negativos, os processos INAR. Estes modelos têm sido frequentemente tratados em artigos científicos ao longo das últimas décadas, pois a sua importância nas aplicações em diversas áreas do conhecimento tem despertado um grande interesse no seu estudo. Neste trabalho, após uma breve revisão sobre séries temporais e os métodos clássicos para a sua análise, apresentamos os modelos autorregressivos de valores inteiros não negativos de primeira ordem INAR (1) e a sua extensão para uma ordem p, as suas propriedades e alguns métodos de estimação dos parâmetros nomeadamente, o método de Yule-Walker, o método de Mínimos Quadrados Condicionais (MQC), o método de Máxima Verosimilhança Condicional (MVC) e o método de Quase Máxima Verosimilhança (QMV). Apresentamos também um critério automático de seleção de ordem para modelos INAR, baseado no Critério de Informação de Akaike Corrigido, AICC, um dos critérios usados para determinar a ordem em modelos autorregressivos, AR. Finalmente, apresenta-se uma aplicação da metodologia dos modelos INAR em dados reais de contagem relativos aos setores dos transportes marítimos e atividades de seguros de Cabo Verde.