960 results for Discrete Data Models


Relevance: 100.00%

Abstract:

The schema of an information system can significantly impact the ability of end users to efficiently and effectively retrieve the information they need. Obtaining the appropriate data quickly increases the likelihood that an organization will make good decisions and respond adeptly to challenges. This research presents and validates a methodology for evaluating, ex ante, the relative desirability of alternative instantiations of a model of data. In contrast to prior research, each instantiation is based on a different formal theory. This research theorizes that the instantiation that yields the lowest weighted average query complexity for a representative sample of information requests is the most desirable instantiation for end-user queries. The theory was validated by an experiment that compared end-user performance using an instantiation of a data structure based on the relational model of data with performance using the corresponding instantiation of the data structure based on the object-relational model of data. Complexity was measured using three different Halstead metrics: program length, difficulty, and effort. For a representative sample of queries, the average complexity using each instantiation was calculated. As theorized, end users querying the instantiation with the lower average complexity made fewer semantic errors, i.e., were more effective at composing queries. (c) 2005 Elsevier B.V. All rights reserved.
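For reference, the three Halstead measures named above are simple functions of operator and operand counts. The sketch below is illustrative only: the counts are hand-tallied for a toy SQL query, not taken from the study's instruments, and the tallying convention is an assumption.

```python
import math

def halstead_metrics(n1, n2, N1, N2):
    """Compute the Halstead measures used above from operator/operand counts.

    n1, n2 -- number of distinct operators / distinct operands
    N1, N2 -- total occurrences of operators / operands
    """
    length = N1 + N2                      # program length N
    vocabulary = n1 + n2                  # vocabulary n
    volume = length * math.log2(vocabulary)
    difficulty = (n1 / 2) * (N2 / n2)     # difficulty D
    effort = difficulty * volume          # effort E = D * V
    return {"length": length, "difficulty": difficulty, "effort": effort}

# One possible tally for the toy query
#   SELECT name FROM employee WHERE salary > 50000
# operators: SELECT, FROM, WHERE, >   operands: name, employee, salary, 50000
print(halstead_metrics(n1=4, n2=4, N1=4, N2=4))
```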

Relevance: 100.00%

Abstract:

Conceptual modelling forms an important part of systems analysis. If it is done incorrectly or incompletely, there can be serious implications for the resultant system, specifically in terms of rework and usability. One approach to improving the conceptual modelling process is to evaluate how well the model represents reality. The emergence of the Bunge-Wand-Weber (BWW) ontological model introduced a platform for classifying and comparing the grammars of conceptual modelling languages. This work applies the BWW theory to a real-world example in the health arena. The General Practice Computing Group (GPCG) data model was developed using the Barker Entity Relationship Modelling technique. We describe an experiment, grounded in ontological theory, which evaluates how well the GPCG data model is understood by domain experts. The results show that, with the exception of the use of entities to represent events, the raw model is better understood by domain experts.

Relevance: 100.00%

Abstract:

Even when data repositories exhibit near-perfect data quality, users may formulate queries that do not correspond to the information requested. Users' poor information retrieval performance may arise either from problems understanding the data models that represent the real-world systems or from their query skills. This research focuses on users' understanding of the data structures, i.e., their ability to map the information request to the data model. The Bunge-Wand-Weber ontology was used to formulate three sets of hypotheses. Two laboratory experiments (one using a small data model and one using a larger data model) tested the effect of ontological clarity on users' performance when undertaking component-, record-, and aggregate-level tasks. For the hypotheses concerning different representations with equivalent semantics, the results indicate that participants using the parsimonious data model performed better on component-level tasks, whereas participants using the ontologically clearer data model performed better on record- and aggregate-level tasks.

Relevance: 100.00%

Abstract:

In this paper, we address the problem of robust information embedding in digital data. Such a process is carried out by introducing modifications to the original data, which one would like to keep minimal. It is assumed that the data carrying the embedded information is corrupted before extraction is carried out. We propose a principled way to tailor an efficient embedding process to given data and noise statistics. © Springer-Verlag Berlin Heidelberg 2005.
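The abstract does not spell out the embedding scheme itself. Purely as a generic illustration of robust embedding under additive corruption, the sketch below implements plain additive spread-spectrum marking with correlation-based extraction; the function names, strength parameter, and noise level are illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(host, bit, key, strength=0.5):
    """Additively embed one bit into a host signal using a key-derived
    pseudo-random pattern (spread-spectrum style)."""
    pattern = np.random.default_rng(key).standard_normal(host.size)
    sign = 1.0 if bit else -1.0
    return host + strength * sign * pattern

def extract(received, key):
    """Decide the embedded bit by correlating with the same key-derived pattern."""
    pattern = np.random.default_rng(key).standard_normal(received.size)
    return float(np.dot(received, pattern)) > 0.0

host = rng.standard_normal(4096)                        # original data
marked = embed(host, bit=True, key=42)                  # data with embedded bit
noisy = marked + 0.3 * rng.standard_normal(host.size)   # corruption before extraction
print(extract(noisy, key=42))                           # True with high probability
```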

Relevance: 100.00%

Abstract:

In this study, discrete time one-factor models of the term structure of interest rates and their application to the pricing of interest rate contingent claims are examined theoretically and empirically. The first chapter provides a discussion of the issues involved in the pricing of interest rate contingent claims and a description of the Ho and Lee (1986), Maloney and Byrne (1989), and Black, Derman, and Toy (1990) discrete time models. In the second chapter, a general discrete time model of the term structure from which the Ho and Lee, Maloney and Byrne, and Black, Derman, and Toy models can all be obtained is presented. The general model also provides for the specification of an additional model, the ExtendedMB model. The third chapter illustrates the application of the discrete time models to the pricing of a variety of interest rate contingent claims. In the final chapter, the performance of the Ho and Lee, Black, Derman, and Toy, and ExtendedMB models in the pricing of Eurodollar futures options is investigated empirically. The results indicate that the Black, Derman, and Toy and ExtendedMB models outperform the Ho and Lee model. Little difference in the performance of the Black, Derman, and Toy and ExtendedMB models is detected.
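As a deliberately simplified illustration of how such discrete time models are used for pricing, the sketch below builds a recombining Ho-Lee-style short-rate lattice and prices a zero-coupon bond by backward induction. The per-step drift is taken as a given input rather than calibrated to an observed term structure, and every parameter value is made up for the example.

```python
import numpy as np

def holee_lattice(r0, sigma, drift, dt, steps):
    """Recombining Ho-Lee-style short-rate lattice.

    rates[i][j] is the short rate after i steps with j up-moves; `drift` holds
    the per-step drift (in practice calibrated to the initial yield curve)."""
    rates = []
    for i in range(steps + 1):
        base = r0 + sum(drift[:i]) * dt
        rates.append([base + sigma * np.sqrt(dt) * (2 * j - i) for j in range(i + 1)])
    return rates

def price_zero(rates, dt, steps, face=1.0, q=0.5):
    """Price a zero-coupon bond by backward induction with risk-neutral prob q."""
    values = [face] * (steps + 1)
    for i in range(steps - 1, -1, -1):
        values = [np.exp(-rates[i][j] * dt) * (q * values[j + 1] + (1 - q) * values[j])
                  for j in range(i + 1)]
    return values[0]

rates = holee_lattice(r0=0.05, sigma=0.01, drift=[0.001] * 4, dt=0.25, steps=4)
print(price_zero(rates, dt=0.25, steps=4))   # roughly exp(-0.05) for these inputs
```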

Relevance: 100.00%

Abstract:

An important aspect of constructing discrete velocity models (DVMs) for the Boltzmann equation is to obtain the right number of collision invariants. It is a well-known fact that DVMs can also have extra collision invariants, so-called spurious collision invariants, in addition to the physical ones. A DVM with only physical collision invariants, and thus without spurious ones, is called normal. For binary mixtures, the concept of supernormal DVMs was also introduced, meaning that, in addition to the DVM itself being normal, its restriction to each single species is also normal. Here we introduce generalizations of this concept to DVMs for multicomponent mixtures. We also present some general algorithms for constructing such models and give some concrete examples of such constructions. One of our main results is that, for any given number of species and any given rational mass ratios, we can construct a supernormal DVM. The DVMs are constructed in such a way that for half-space problems, such as the Milne and Kramers problems, but also nonlinear ones, we obtain structures similar to those for the classical discrete Boltzmann equation for a single species, and therefore results obtained for the classical discrete Boltzmann equation can be applied.
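For readers unfamiliar with the terminology, the standard single-species definition of a collision invariant is sketched below in LaTeX; this is textbook material, not a reproduction of the paper's multicomponent definitions, which generalize it species by species.

```latex
% A function \phi on the velocity set is a collision invariant if it is conserved
% in every collision (\mathbf{v}_i,\mathbf{v}_j) \to (\mathbf{v}_k,\mathbf{v}_l)
% admitted by the DVM:
\phi(\mathbf{v}_i) + \phi(\mathbf{v}_j) = \phi(\mathbf{v}_k) + \phi(\mathbf{v}_l).
% The physical invariants are spanned by 1, the velocity components, and
% |\mathbf{v}|^2 (mass, momentum, energy); a normal DVM is one whose collision
% invariants form exactly this span, with no spurious invariants.
```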

Relevance: 100.00%

Abstract:

Ties among event times are often recorded in survival studies. For example, in a two-week laboratory study where event times are measured in days, ties are very likely to occur. The proportional hazards model might be used in this setting with an approximated partial likelihood function. This approximation works well when the number of ties is small. On the other hand, discrete regression models are suggested when the data are heavily tied. However, in many situations it is not clear which approach should be used in practice. In this work, empirical guidelines based on Monte Carlo simulations are provided. These recommendations are based on a measure of the amount of tied data present and the mean square error. An example illustrates the proposed criterion.
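The abstract does not state the paper's exact tie measure. The sketch below computes one simple candidate, the fraction of observed events whose time is shared with at least one other event, on made-up data; the function name and data are illustrative only.

```python
import numpy as np

def tie_fraction(times, events):
    """Fraction of observed events whose time is shared with at least one other
    observed event -- one simple way to quantify how heavily tied the data are
    (the paper's own measure may differ)."""
    event_times = np.asarray(times)[np.asarray(events, dtype=bool)]
    _, counts = np.unique(event_times, return_counts=True)
    return counts[counts > 1].sum() / event_times.size

# Toy data: event times in whole days over a two-week study (1 = event, 0 = censored)
times  = [1, 2, 2, 3, 3, 3, 5, 7, 7, 10, 12, 14]
events = [1, 1, 1, 1, 0, 1, 1, 1, 1,  0,  1,  1]
print(round(tie_fraction(times, events), 2))   # 0.6 of events share their time
```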

Relevance: 100.00%

Abstract:

Background: Multiple logistic regression is precluded from many practical applications in ecology that aim to predict the geographic distributions of species because it requires absence data, which are rarely available or are unreliable. In order to use multiple logistic regression, many studies have simulated "pseudo-absences" through a number of strategies, but it is unknown how the choice of strategy influences models and their geographic predictions of species. In this paper we evaluate the effect of several prevailing pseudo-absence strategies on the predictions of the geographic distribution of a virtual species whose "true" distribution and relationship to three environmental predictors were predefined. We evaluated the effect of using (a) real absences, (b) pseudo-absences selected randomly from the background, and (c) two-step approaches: pseudo-absences selected from low-suitability areas predicted by either Ecological Niche Factor Analysis (ENFA) or BIOCLIM. We compared how the choice of pseudo-absence strategy affected model fit, predictive power, and information-theoretic model selection results.

Results: Models built with true absences had the best predictive power and the best discriminatory power, and the "true" model (the one that contained the correct predictors) was supported by the data according to AIC, as expected. Models based on random pseudo-absences had among the lowest fit, but yielded the second highest AUC value (0.97), and the "true" model was also supported by the data. Models based on two-step approaches had intermediate fit, the lowest predictive power, and the "true" model was not supported by the data.

Conclusion: If ecologists wish to build parsimonious GLM models that will allow them to make robust predictions, a reasonable approach is to use a large number of randomly selected pseudo-absences and to perform model selection based on an information-theoretic approach. However, the resulting models can be expected to have limited fit.
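A minimal sketch of strategy (b) above: draw pseudo-absences at random from the background, fit a logistic GLM, and report AIC for information-theoretic comparison. The data, sample sizes, and variable names are all fabricated for illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

# Hypothetical inputs: presence sites with three environmental predictors, and a
# large background table of candidate cells to draw pseudo-absences from.
presence_X   = rng.normal(1.0, 1.0, size=(200, 3))    # predictors at presence sites
background_X = rng.normal(0.0, 1.0, size=(5000, 3))   # predictors over the study area

# Random pseudo-absence selection from the background.
idx = rng.choice(len(background_X), size=1000, replace=False)
pseudo_absence_X = background_X[idx]

X = np.vstack([presence_X, pseudo_absence_X])
y = np.concatenate([np.ones(len(presence_X)), np.zeros(len(pseudo_absence_X))])

# Fit the logistic GLM; AIC is the basis for the model selection discussed above.
model = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
print(model.params, model.aic)
```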

Relevance: 100.00%

Abstract:

Many European states apply score systems to evaluate the disability severity of non-fatal motor victims under the law of third-party liability. The score is a non-negative integer with an upper bound at 100 that increases with severity. It may be automatically converted into financial terms and thus also reflects the compensation cost for disability. In this paper, discrete regression models are applied to analyze the factors that influence the disability severity score of victims. Standard and zero-altered regression models are compared from two perspectives: an interpretation of the data generating process and the level of statistical fit. The results have implications for traffic safety policy decisions aimed at reducing accident severity. An application using data from Spain is provided.
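The abstract does not identify the specific zero-altered specification used. Purely as an illustration of fitting a zero-modified count model alongside a standard one, the sketch below fits a plain Poisson and a zero-inflated Poisson (a close relative of the zero-altered/hurdle family) to simulated data with excess zeros and compares AIC; all data and parameters are invented, and the bounded 0-100 nature of the score is ignored here.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedPoisson

rng = np.random.default_rng(2)

# Hypothetical victim-level data: two covariates and a non-negative severity score
# with an excess of zeros (many claims involve no permanent disability).
n = 2000
X = sm.add_constant(rng.normal(size=(n, 2)))
lam = np.exp(X @ np.array([1.0, 0.4, -0.3]))
y = rng.poisson(lam)
y[rng.random(n) < 0.35] = 0          # extra zeros beyond the Poisson process

poisson = sm.Poisson(y, X).fit(disp=0)
zip_model = ZeroInflatedPoisson(y, X, exog_infl=np.ones((n, 1))).fit(disp=0)

# The family that accommodates the extra zeros should fit markedly better.
print("Poisson AIC:", round(poisson.aic, 1))
print("ZIP AIC:    ", round(zip_model.aic, 1))
```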

Relevance: 100.00%

Abstract:

We address the effect of solvation on the lowest electronic excitation energy of camphor. The solvents considered represent a large variation in solvent polarity. We consider three conceptually different ways of accounting for the solvent, using either an implicit, a discrete, or an explicit solvation model. The solvatochromic shifts in polar solvents are found to be in good agreement with the experimental data for all three solvent models. However, both the implicit and discrete solvation models are less successful in predicting solvatochromic shifts for solvents of low polarity. The results presented suggest the importance of using explicit solvent molecules in the case of nonpolar solvents. (C) 2009 Elsevier B.V. All rights reserved.

Relevance: 100.00%

Abstract:

In this article, we introduce a semi-parametric Bayesian approach based on Dirichlet process priors for the discrete calibration problem in binomial regression models. An interesting application is the dosimetry problem related to the dose-response model. A hierarchical formulation is provided, from which a Markov chain Monte Carlo approach is developed. The methodology is applied to simulated and real data.

Relevance: 100.00%

Abstract:

This dissertation proposes statistical methods to formulate, estimate, and apply complex transportation models. Two main problems are part of the analyses conducted and presented in this dissertation.

The first method solves an econometric problem and is concerned with the joint estimation of models that contain both discrete and continuous decision variables. The use of ordered models along with a regression is proposed, and their effectiveness is evaluated with respect to unordered models. Procedures to calculate and optimize the log-likelihood functions of both discrete-continuous approaches are derived, and the difficulties associated with the estimation of unordered models are explained. Numerical approximation methods based on the Genz algorithm are implemented in order to solve the multidimensional integral associated with the unordered modeling structure. The problems deriving from the lack of smoothness of the probit model around the maximum of the log-likelihood function, which makes the optimization and the calculation of standard deviations very difficult, are carefully analyzed. A methodology to perform out-of-sample validation in the context of a joint model is proposed. Comprehensive numerical experiments have been conducted on both simulated and real data. In particular, the discrete-continuous models are estimated and applied to vehicle ownership and use models on data extracted from the 2009 National Household Travel Survey.

The second part of this work offers a comprehensive statistical analysis of free-flow speed distribution; the method is applied to data collected on a sample of roads in Italy. A linear mixed model that includes speed quantiles in its predictors is estimated. Results show that there is no road effect in the analysis of free-flow speeds, which is particularly important for model transferability. A very general framework to predict random effects with few observations and incomplete access to model covariates is formulated and applied to predict the distribution of free-flow speed quantiles. The speed distribution of most road sections is successfully predicted; jack-knife estimates are calculated and used to explain why some sections are poorly predicted.

Overall, this work contributes to the literature in transportation modeling by proposing econometric model formulations for discrete-continuous variables, more efficient methods for the calculation of multivariate normal probabilities, and random effects models for free-flow speed estimation that take the survey design into account. All methods are rigorously validated on both real and simulated data.
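As a small aside on the multivariate normal probabilities mentioned above: SciPy's multivariate normal CDF is evaluated with numerical integration based on Genz's algorithms, so an orthant probability of the kind arising in unordered probit likelihoods can be sketched as follows. The correlation matrix and evaluation point are arbitrary illustrations, unrelated to the dissertation's data.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Orthant probability P(X1 <= 0.5, X2 <= 0.0, X3 <= 1.0) for a correlated
# trivariate normal, evaluated via SciPy's Genz-based numerical integration.
mean = np.zeros(3)
cov = np.array([[1.0, 0.5, 0.2],
                [0.5, 1.0, 0.3],
                [0.2, 0.3, 1.0]])
p = multivariate_normal(mean=mean, cov=cov).cdf([0.5, 0.0, 1.0])
print(p)
```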

Relevance: 90.00%

Abstract:

Geographic Data Warehouses (GDW) are one of the main technologies used in decision-making processes and spatial analysis, and the literature proposes several conceptual and logical data models for GDW. However, little effort has been focused on studying how spatial data redundancy affects SOLAP (Spatial On-Line Analytical Processing) query performance over GDW. In this paper, we investigate this issue. Firstly, we compare redundant and non-redundant GDW schemas and conclude that redundancy is related to high performance losses. We also analyze the issue of indexing, aiming at improving SOLAP query performance on a redundant GDW. Comparisons of the SB-index approach, the star-join aided by R-tree, and the star-join aided by GiST indicate that the SB-index significantly improves the elapsed time in query processing, from 25% up to 99%, for SOLAP queries defined over the spatial predicates of intersection, enclosure, and containment and applied to roll-up and drill-down operations. We also investigate the impact of an increase in data volume on performance. The increase did not impair the performance of the SB-index, which continued to improve the elapsed time in query processing substantially. Performance tests also show that the SB-index is far more compact than the star-join, requiring only a small fraction, at most 0.20%, of the volume. Moreover, we propose a specific enhancement of the SB-index to deal with spatial data redundancy. This enhancement improved performance from 80% to 91% for redundant GDW schemas.
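The SB-index data structure itself is not described in this abstract. As a generic illustration of the "star-join aided by R-tree" baseline, the sketch below resolves an intersection predicate against an R-tree over a hypothetical spatial dimension before aggregating a fact table; it relies on the third-party rtree package, and all tables, keys, and values are invented.

```python
from rtree import index   # requires the 'rtree' package (libspatialindex)

# Hypothetical spatial dimension: city_key -> bounding box (minx, miny, maxx, maxy)
city_bbox = {
    1: (0.0, 0.0, 2.0, 2.0),
    2: (5.0, 5.0, 7.0, 7.0),
    3: (1.5, 1.5, 3.0, 3.0),
}
idx = index.Index()
for key, bbox in city_bbox.items():
    idx.insert(key, bbox)

# Hypothetical fact table rows: (city_key, sales)
facts = [(1, 100.0), (2, 250.0), (3, 80.0), (1, 40.0)]

# SOLAP-style query: total sales for cities intersecting a query window.
# Note: the R-tree filters on bounding boxes; a real system would refine the
# surviving candidates against exact geometries before the join.
window = (1.0, 1.0, 2.5, 2.5)
matching_keys = set(idx.intersection(window))
total = sum(sales for key, sales in facts if key in matching_keys)
print(matching_keys, total)   # cities 1 and 3 intersect -> 100 + 80 + 40 = 220
```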