985 resultados para binary data


Relevância:

70.00% 70.00%

Publicador:

Resumo:

This collection contains measurements of abundance and diversity of different groups of aboveground invertebrates sampled on the plots of the different sub-experiments at the field site of a large grassland biodiversity experiment (the Jena Experiment; see further details below). In the main experiment, 82 grassland plots of 20 x 20 m were established from a pool of 60 species belonging to four functional groups (grasses, legumes, tall and small herbs). In May 2002, varying numbers of plant species from this species pool were sown into the plots to create a gradient of plant species richness (1, 2, 4, 8, 16 and 60 species) and functional richness (1, 2, 3, 4 functional groups). Plots were maintained by bi-annual weeding and mowing. The following series of datasets are contained in this collection: 1. Measurements of ant abundance (number of individuals attracted to baits) and ant occurrence (binary data) in the Main Experiment in 2006 and 2013. Ants where sampled using two types of baited traps receiving ~10g of Tuna or ~10g of honey/Sucrose. After 30min the occurrence (presence = 1 / absence = 0) and abundance (number) of ants at the two types of baits was recorded and pooled per plot.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The viscosity of ionic liquids (ILs) has been modeled as a function of temperature and at atmospheric pressure using a new method based on the UNIFAC–VISCO method. This model extends the calculations previously reported by our group (see Zhao et al. J. Chem. Eng. Data 2016, 61, 2160–2169) which used 154 experimental viscosity data points of 25 ionic liquids for regression of a set of binary interaction parameters and ion Vogel–Fulcher–Tammann (VFT) parameters. Discrepancies in the experimental data of the same IL affect the quality of the correlation and thus the development of the predictive method. In this work, mathematical gnostics was used to analyze the experimental data from different sources and recommend one set of reliable data for each IL. These recommended data (totally 819 data points) for 70 ILs were correlated using this model to obtain an extended set of binary interaction parameters and ion VFT parameters, with a regression accuracy of 1.4%. In addition, 966 experimental viscosity data points for 11 binary mixtures of ILs were collected from literature to establish this model. All the binary data consist of 128 training data points used for the optimization of binary interaction parameters and 838 test data points used for the comparison of the pure evaluated values. The relative average absolute deviation (RAAD) for training and test is 2.9% and 3.9%, respectively.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

We investigate methods for data-based selection of working covariance models in the analysis of correlated data with generalized estimating equations. We study two selection criteria: Gaussian pseudolikelihood and a geodesic distance based on discrepancy between model-sensitive and model-robust regression parameter covariance estimators. The Gaussian pseudolikelihood is found in simulation to be reasonably sensitive for several response distributions and noncanonical mean-variance relations for longitudinal data. Application is also made to a clinical dataset. Assessment of adequacy of both correlation and variance models for longitudinal data should be routine in applications, and we describe open-source software supporting this practice.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The method of generalized estimating equation-, (GEEs) has been criticized recently for a failure to protect against misspecification of working correlation models, which in some cases leads to loss of efficiency or infeasibility of solutions. However, the feasibility and efficiency of GEE methods can be enhanced considerably by using flexible families of working correlation models. We propose two ways of constructing unbiased estimating equations from general correlation models for irregularly timed repeated measures to supplement and enhance GEE. The supplementary estimating equations are obtained by differentiation of the Cholesky decomposition of the working correlation, or as score equations for decoupled Gaussian pseudolikelihood. The estimating equations are solved with computational effort equivalent to that required for a first-order GEE. Full details and analytic expressions are developed for a generalized Markovian model that was evaluated through simulation. Large-sample ".sandwich" standard errors for working correlation parameter estimates are derived and shown to have good performance. The proposed estimating functions are further illustrated in an analysis of repeated measures of pulmonary function in children.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The method of generalised estimating equations for regression modelling of clustered outcomes allows for specification of a working matrix that is intended to approximate the true correlation matrix of the observations. We investigate the asymptotic relative efficiency of the generalised estimating equation for the mean parameters when the correlation parameters are estimated by various methods. The asymptotic relative efficiency depends on three-features of the analysis, namely (i) the discrepancy between the working correlation structure and the unobservable true correlation structure, (ii) the method by which the correlation parameters are estimated and (iii) the 'design', by which we refer to both the structures of the predictor matrices within clusters and distribution of cluster sizes. Analytical and numerical studies of realistic data-analysis scenarios show that choice of working covariance model has a substantial impact on regression estimator efficiency. Protection against avoidable loss of efficiency associated with covariance misspecification is obtained when a 'Gaussian estimation' pseudolikelihood procedure is used with an AR(1) structure.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The article describes a generalized estimating equations approach that was used to investigate the impact of technology on vessel performance in a trawl fishery during 1988-96, while accounting for spatial and temporal correlations in the catch-effort data. Robust estimation of parameters in the presence of several levels of clustering depended more on the choice of cluster definition than on the choice of correlation structure within the cluster. Models with smaller cluster sizes produced stable results, while models with larger cluster sizes, that may have had complex within-cluster correlation structures and that had within-cluster covariates, produced estimates sensitive to the correlation structure. The preferred model arising from this dataset assumed that catches from a vessel were correlated in the same years and the same areas, but independent in different years and areas. The model that assumed catches from a vessel were correlated in all years and areas, equivalent to a random effects term for vessel, produced spurious results. This was an unexpected finding that highlighted the need to adopt a systematic strategy for modelling. The article proposes a modelling strategy of selecting the best cluster definition first, and the working correlation structure (within clusters) second. The article discusses the selection and interpretation of the model in the light of background knowledge of the data and utility of the model, and the potential for this modelling approach to apply in similar statistical situations.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In terabit-density magnetic recording, several bits of data can be replaced by the values of their neighbors in the storage medium. As a result, errors in the medium are dependent on each other and also on the data written. We consider a simple 1-D combinatorial model of this medium. In our model, we assume a setting where binary data is sequentially written on the medium and a bit can erroneously change to the immediately preceding value. We derive several properties of codes that correct this type of errors, focusing on bounds on their cardinality. We also define a probabilistic finite-state channel model of the storage medium, and derive lower and upper estimates of its capacity. A lower bound is derived by evaluating the symmetric capacity of the channel, i.e., the maximum transmission rate under the assumption of the uniform input distribution of the channel. An upper bound is found by showing that the original channel is a stochastic degradation of another, related channel model whose capacity we can compute explicitly.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

A zone based systems design framework is described and utilised in the implementation of a message authentication code (MAC) algorithm based on symmetric key block ciphers. The resulting block cipher based MAC algorithm may be used to provide assurance of the authenticity and, hence, the integrity of binary data. Using software simulation to benchmark against the de facto cipher block chaining MAC (CBC-MAC) variant used in the TinySec security protocol for wireless sensor networks and the NIST cipher block chaining MAC standard, CMAC; we show that our zone based systems design framework can lead to block cipher based MAC constructs that point to improvements in message processing efficiency, processing throughput and processing latency.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Application of sensor-based technology within activity monitoring systems is becoming a popular technique within the smart environment paradigm. Nevertheless, the use of such an approach generates complex constructs of data, which subsequently requires the use of intricate activity recognition techniques to automatically infer the underlying activity. This paper explores a cluster-based ensemble method as a new solution for the purposes of activity recognition within smart environments. With this approach activities are modelled as collections of clusters built on different subsets of features. A classification process is performed by assigning a new instance to its closest cluster from each collection. Two different sensor data representations have been investigated, namely numeric and binary. Following the evaluation of the proposed methodology it has been demonstrated that the cluster-based ensemble method can be successfully applied as a viable option for activity recognition. Results following exposure to data collected from a range of activities indicated that the ensemble method had the ability to perform with accuracies of 94.2% and 97.5% for numeric and binary data, respectively. These results outperformed a range of single classifiers considered as benchmarks.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Tese de doutoramento, Estatística e Investigação Operacional (Probabilidades e Estatística), Universidade de Lisboa, Faculdade de Ciências, 2014

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Nos últimos anos, o processo de ensino e aprendizagem tem sofrido significativas alterações graças ao aparecimento da Internet. Novas ferramentas para apoio ao ensino têm surgido, nas quais se destacam os laboratórios remotos. Atualmente, muitas instituições de ensino disponibilizam laboratórios remotos nos seus cursos, que permitem, a professores e alunos, a realização de experiências reais através da Internet. Estes são implementados por diferentes arquiteturas e infraestruturas, suportados por vários módulos de laboratório acessíveis remotamente (e.g. instrumentos de medição). No entanto, a sua inclusão no ensino é ainda deficitária, devido: i) à falta de meios e competências técnicas das instituições de ensino para os desenvolverem, ii) à dificuldade na partilha dos módulos de laboratório por diferentes infraestruturas e, iii) à reduzida capacidade de os reconfigurar com esses módulos. Para ultrapassar estas limitações, foi idealizado e desenvolvido no âmbito de um trabalho de doutoramento [1] um protótipo, cuja arquitetura é baseada na norma IEEE 1451.0 e na tecnologia de FPGAs. Para além de garantir o desenvolvimento e o acesso de forma normalizada a um laboratório remoto, este protótipo promove ainda a partilha de módulos de laboratório por diferentes infraestruturas. Nesse trabalho explorou-se a capacidade de reconfiguração de FPGAs para embutir na infraestrutura do laboratório vários módulos, todos descritos em ficheiros, utilizando linguagens de descrição de hardware estruturados de acordo com a norma IEEE 1451.0. A definição desses módulos obriga à criação de estruturas de dados binárias (Transducer Electronic Data Sheets, TEDSs), bem como de outros ficheiros que possibilitam a sua interligação com a infraestrutura do laboratório. No entanto, a criação destes ficheiros é bastante complexa, uma vez que exige a realização de vários cálculos e conversões. Tendo em consideração essa mesma complexidade, esta dissertação descreve o desenvolvimento de uma aplicação Web para leitura e escrita dos TEDSs. Para além de um estudo sobre os laboratórios remotos, é efetuada uma descrição da norma IEEE 1451.0, com particular atenção para a sua arquitetura e para a estrutura dos diferentes TEDSs. Com o objetivo de enquadrar a aplicação desenvolvida, efetua-se ainda uma breve apresentação de um protótipo de um laboratório remoto reconfigurável, cuja reconfiguração é apoiada por esta aplicação. Por fim, é descrita a verificação da aplicação Web, de forma a tirar conclusões sobre o seu contributo para a simplificação dessa reconfiguração.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

A statistical technique for fault analysis in industrial printing is reported. The method specifically deals with binary data, for which the results of the production process fall into two categories, rejected or accepted. The method is referred to as logistic regression, and is capable of predicting future fault occurrences by the analysis of current measurements from machine parts sensors. Individual analysis of each type of fault can determine which parts of the plant have a significant influence on the occurrence of such faults; it is also possible to infer which measurable process parameters have no significant influence on the generation of these faults. Information derived from the analysis can be helpful in the operator's interpretation of the current state of the plant. Appropriate actions may then be taken to prevent potential faults from occurring. The algorithm is being implemented as part of an applied self-learning expert system.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Despite the growing interest in dietary patterns, there have been few longitudinal investigations. The objective of the present study was to extend an earlier method of dietary pattern assessment to longitudinal binary data and to assess changes in patterns over time and in relation to socio-demographic covariates. A prospective national cohort of 1265 participants completed a 5 d food diary at three time-points during their adult life (at age 36 years in 1982, 43 years in 1989 and 53 years in 1999). Factor analysis identified three dietary patterns for women (fruit, vegetables and dairy; ethnic foods and alcohol; meat, potatoes and sweet foods) and two patterns in men (ethnic foods and alcohol; mixed). Trends in dietary pattern scores were calculated using random effects models. Marked changes were found in scores for all patterns between 1989 and 1999, with only the meat, potatoes and sweet foods pattern in women recording a decline. In a multiple variable model that included the three time-points, socio-demographic variables and BMI time-dependent covariates, both non-manual social class and higher education level were also strongly associated with the consumption of more items from the ethnic foods and alcohol pattern and the mixed pattern for men (P<0[middle dot]0001) and the fruit, vegetables and dairy pattern and the ethnic foods and alcohol pattern for women (P<0[middle dot]01). In conclusion, longitudinal changes in dietary patterns and across socio-economic groups can assist with targeting public health initiatives by identifying stages during adult life when interventions to improve diet would be most beneficial to health.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Regular expressions are used to parse textual data to match patterns and extract variables. They have been implemented in a vast number of programming languages with a significant quantity of research devoted to improving their operational efficiency. However, regular expressions are limited to finding linear matches. Little research has been done in the field of object-oriented results which would allow textual or binary data to be converted to multi-layered objects. This is significantly relevant as many of todaypsilas data formats are object-based. This paper extends our previous work by detailing an algorithmic approach to perform object-oriented parsing, and provides an initial study of benchmarks of the algorithms of our contribution

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This work presents a neural network based on the ART architecture ( adaptive resonance theory), named fuzzy ART& ARTMAP neural network, applied to the electric load-forecasting problem. The neural networks based on the ARTarchitecture have two fundamental characteristics that are extremely important for the network performance ( stability and plasticity), which allow the implementation of continuous training. The fuzzy ART& ARTMAP neural network aims to reduce the imprecision of the forecasting results by a mechanism that separate the analog and binary data, processing them separately. Therefore, this represents a reduction on the processing time and improved quality of the results, when compared to the Back-Propagation neural network, and better to the classical forecasting techniques (ARIMA of Box and Jenkins methods). Finished the training, the fuzzy ART& ARTMAP neural network is capable to forecast electrical loads 24 h in advance. To validate the methodology, data from a Brazilian electric company is used. (C) 2004 Elsevier B.V. All rights reserved.