960 results for Discrete Data Models
Abstract:
A comprehensive user model, built by monitoring a user's current use of applications, can be an excellent starting point for building adaptive user-centred applications. The BaranC framework monitors all user interaction with a digital device (e.g. a smartphone) and also collects all available context data (such as from sensors in the digital device itself, in a smart watch, or in smart appliances) in order to build a full model of user application behaviour. The model built from the collected data, called the UDI (User Digital Imprint), is further augmented by analysis services, for example, a service that produces activity profiles from smartphone sensor data. The enhanced UDI model can then be the basis for building an appropriate adaptive application that is user-centred, as it is based on an individual user model. Because BaranC supports continuous user monitoring, an application can adapt dynamically, in real time, to the current context (e.g. time, location or activity). Furthermore, since BaranC continuously augments the user model with more monitored data, the user model changes over time, and the adaptive application can adapt gradually to changing user behaviour patterns. BaranC has been implemented as a service-oriented framework in which the collection of UDI data and all sharing of UDI data are kept strictly under the user's control. In addition, being service-oriented allows (with the user's permission) its monitoring and analysis services to be easily used by third parties in order to provide third-party adaptive assistant services. An example third-party service demonstrator, built on top of BaranC, proactively assists a user by dynamically predicting, based on the current context, which apps and contacts the user is likely to need. BaranC introduces an innovative user-controlled unified service model for monitoring and using personal digital activity data in order to provide adaptive user-centred applications. This aims to improve on the current situation, where the diversity of adaptive applications results in a proliferation of applications monitoring and using personal data, with a consequent lack of clarity, dispersal of data, and diminution of user control.
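The abstract gives no implementation details; as a minimal, hypothetical sketch of the kind of context-conditioned prediction the demonstrator performs (the names and the simple frequency-count rule are assumptions, not BaranC's actual method):

```python
from collections import Counter, defaultdict

# Hypothetical sketch: predict likely apps from monitored (context, app) events.
class AppPredictor:
    def __init__(self):
        self.counts = defaultdict(Counter)  # context key -> app launch counts

    def observe(self, context, app):
        """Record one monitored app launch under a context key (e.g. time slot + location)."""
        self.counts[context][app] += 1

    def predict(self, context, k=3):
        """Return the k apps most frequently launched in this context."""
        return [app for app, _ in self.counts[context].most_common(k)]

predictor = AppPredictor()
predictor.observe(("morning", "home"), "news")
predictor.observe(("morning", "home"), "news")
predictor.observe(("morning", "home"), "mail")
print(predictor.predict(("morning", "home")))  # ['news', 'mail']
```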
Abstract:
Semantic relations are an important element in the construction of ontologies and models of problem domains. Nevertheless, they often remain fuzzy or under-specified; this is a pervasive problem in software engineering and artificial intelligence. Thus, we find semantic links that admit multiple interpretations in wide-coverage ontologies, semantic data models whose abstractions are insufficient to capture the relational richness of problem domains, and improperly structured taxonomies. However, if relations are given precise semantics, some of these problems can be avoided, and meaningful operations can be performed on them. In this paper we discuss key issues in the modeling, representation and usage of relations, including the available taxonomy-structuring methodologies as well as the initiatives aiming to provide relations with precise semantics. Moreover, we explain and propose the control of relations as a key issue for the coherent construction of ontologies.
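As an illustration of what precise semantics for a relation can mean in practice, here is a small, hypothetical example using the rdflib Python library (not drawn from the paper): the partOf relation is declared transitive and given domain/range constraints, so a reasoner can act on it rather than guess its meaning.

```python
from rdflib import Graph, Namespace, RDF
from rdflib.namespace import OWL, RDFS

# Hypothetical mini-ontology: give the partOf relation precise semantics
# (transitivity plus domain/range constraints) instead of leaving it fuzzy.
EX = Namespace("http://example.org/onto#")
g = Graph()
g.add((EX.partOf, RDF.type, OWL.ObjectProperty))
g.add((EX.partOf, RDF.type, OWL.TransitiveProperty))  # a part of a part is a part
g.add((EX.partOf, RDFS.domain, EX.Part))
g.add((EX.partOf, RDFS.range, EX.Whole))
print(g.serialize(format="turtle"))
```

With transitivity declared, wheel partOf axle and axle partOf car lets a reasoner infer wheel partOf car; without it, the link's meaning is left to each consumer's interpretation.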
Abstract:
Effective and efficient implementation of intelligent and/or recently emerged networked manufacturing systems requires enterprise-level integration. Networked manufacturing offers several advantages in the current competitive environment, shortening manufacturing cycle times and maintaining production flexibility by making several feasible process plans available. The first step in this direction is to integrate manufacturing functions such as process planning and scheduling for multiple jobs in a network-based manufacturing system. It is difficult to determine a single plan that meets conflicting objectives simultaneously. This paper describes a mobile-agent-based negotiation approach that integrates manufacturing functions in a distributed manner; its fundamental framework and functions are presented. Moreover, an ontology has been constructed using the Protégé software, which can convert knowledge into Extensible Markup Language (XML) schemas of Web Ontology Language (OWL) documents. The generated XML schemas are used to transfer information throughout the manufacturing network for the intelligent, interoperable integration of product data models and manufacturing resources. To validate the feasibility of the proposed approach, an illustrative example covering varied production environments, including production demand fluctuations, is presented, and the performance and effectiveness of the proposed approach are compared with those of the evolutionary Hybrid Dynamic-DNA (HD-DNA) algorithm. The results show that the proposed scheme is very effective and reasonably acceptable for the integration of manufacturing functions.
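The abstract describes the negotiation only at a high level; a minimal contract-net-style sketch of the idea (the names, bid fields and scoring rule are hypothetical, not the paper's protocol) is:

```python
from dataclasses import dataclass

# Hypothetical sketch: machine agents bid on an operation; the job agent
# awards it to the bid minimizing a weighted sum of time and cost.
@dataclass
class Bid:
    machine: str
    processing_time: float
    cost: float

def award(bids, w_time=0.6, w_cost=0.4):
    """Pick the bid minimizing a weighted time/cost score."""
    return min(bids, key=lambda b: w_time * b.processing_time + w_cost * b.cost)

bids = [Bid("M1", 12.0, 5.0), Bid("M2", 9.0, 8.0), Bid("M3", 15.0, 3.0)]
print(award(bids).machine)  # 'M2' under these weights
```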
Abstract:
Solar radiation data are crucial for the design of energy systems based on the solar resource. Since diffuse radiation measurements are not always available in archived data series, whether due to the absence of measuring equipment, shading-device misplacement or missing data, models to generate these data are needed. In this work, one year of hourly and daily horizontal solar global and diffuse irradiation measurements in Évora is used to establish a new relation between the diffuse radiation and the clearness index. The proposed model includes a fitting parameter, which was adjusted through a simple optimization procedure that minimizes the least-squares error with respect to the measurements. A comparison against several other fitting models from the literature was also carried out using the root mean square error (RMSE) as the statistical indicator, and the present model was found to be more accurate than previous fitting models for the diffuse radiation data in Évora.
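The abstract does not state the functional form of the model; as an illustrative sketch of the least-squares fitting procedure it describes, assuming a logistic diffuse-fraction form (an assumption, not the paper's model) and hypothetical data:

```python
import numpy as np
from scipy.optimize import curve_fit

# Assumed logistic form for the diffuse fraction kd as a function of the
# clearness index kt; a and b play the role of fitting parameters.
def diffuse_fraction(kt, a, b):
    return 1.0 / (1.0 + np.exp(a * (kt - b)))

# Hypothetical hourly measurements (clearness index, diffuse fraction).
kt = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8])
kd = np.array([0.98, 0.95, 0.88, 0.72, 0.48, 0.30, 0.18, 0.12])

# curve_fit minimizes the squared error, mirroring the optimization step.
params, _ = curve_fit(diffuse_fraction, kt, kd, p0=(8.0, 0.45))
rmse = np.sqrt(np.mean((diffuse_fraction(kt, *params) - kd) ** 2))
print(params, rmse)  # fitted parameters and the RMSE used for model comparison
```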
Abstract:
This thesis, entitled "Reliability Modelling and Analysis in Discrete Time: Some Concepts and Models Useful in the Analysis of Discrete Lifetime Data", consists of five chapters. In Chapter II we take up the derivation of some general results useful in reliability modelling that involve two-component mixtures. Expressions for the failure rate, mean residual life and second moment of residual life of the mixture distributions in terms of the corresponding quantities of the component distributions are investigated, and some applications of these results are pointed out. The role of the geometric, Waring and negative hypergeometric distributions as models of life lengths in the discrete time domain has already been discussed; while describing various reliability characteristics, it was found that they can often be considered as a class. The applicability of these models to single populations naturally extends to populations composed of sub-populations, making mixtures of these distributions worth investigating. Accordingly, the general properties, various reliability characteristics and characterizations of these models are discussed in Chapter III. Inference for the parameters of a mixture distribution is usually a difficult problem, because the mass function of the mixture is a linear function of the component masses, which makes manipulation of the likelihood equations, least-squares function, etc., and the resulting computations, very difficult. We show that one of our characterizations helps in inferring the parameters of the geometric mixture without computational hazards. As mentioned in the review of results in the previous sections, partial moments have not been studied extensively in the literature, especially in the case of discrete distributions. Chapters IV and V deal with descending and ascending partial factorial moments. Apart from studying their properties, we prove characterizations of distributions by functional forms of partial moments and establish recurrence relations between successive moments for some well-known families. It is further demonstrated that partial moments are as efficient and convenient as many of the conventional tools for resolving practical problems in reliability modelling and analysis. The study concludes by indicating some new problems that surfaced during the course of the present investigation and could be the subject of future work in this area.
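For reference, the discrete-time quantities involved can be sketched as follows (standard definitions; the thesis's exact conventions, e.g. the choice between \(P(X \ge k)\) and \(P(X > k)\), may differ). For a two-component mixture with mass function

\[
f(k) = p\, f_1(k) + (1-p)\, f_2(k), \qquad k = 0, 1, 2, \dots,
\]

the discrete failure rate and mean residual life are

\[
h(k) = \frac{f(k)}{P(X \ge k)}, \qquad m(k) = \mathbb{E}[X - k \mid X \ge k],
\]

and, writing \(S_i(k) = P(X_i \ge k)\), the mixture failure rate is the survival-weighted combination

\[
h(k) = \frac{p\, S_1(k)\, h_1(k) + (1-p)\, S_2(k)\, h_2(k)}{p\, S_1(k) + (1-p)\, S_2(k)}.
\]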
Abstract:
In this study, regression models are evaluated for grouped survival data when the effect of censoring time is considered in the model and the regression structure is modeled through four link functions. The methodology for grouped survival data is based on life tables, and the times are grouped into k intervals so that ties are eliminated. Thus, the data modeling is performed by considering discrete lifetime regression models. The model parameters are estimated using the maximum likelihood and jackknife methods. To detect influential observations in the proposed models, diagnostic measures based on case deletion, termed global influence, and influence measures based on small perturbations in the data or in the model, referred to as local influence, are used. In addition to those measures, the local influence and the total influential estimate are also employed. Various simulation studies are performed to compare the performance of the four link functions of the regression models for grouped survival data under different parameter settings, sample sizes and numbers of intervals. Finally, a data set is analyzed using the proposed regression models.
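As a hedged sketch of one such link choice (the abstract does not list the four links; the complementary log-log link shown here is the standard choice that yields a grouped proportional hazards model), using statsmodels on hypothetical subject-interval data:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical grouped survival data: one row per subject-interval,
# event = 1 if failure occurred in that interval.
rng = np.random.default_rng(0)
n = 400
interval = rng.integers(1, 6, n)     # which of k = 5 intervals
x = rng.normal(size=n)               # a covariate
lin = -2.0 + 0.3 * interval + 0.8 * x
event = rng.binomial(1, 1 - np.exp(-np.exp(lin)))  # cloglog inverse link

# Binomial GLM with complementary log-log link = grouped PH model; swapping
# the link object (e.g. Logit) gives another member of the four-link setup.
X = sm.add_constant(np.column_stack([interval, x]))
fit = sm.GLM(event, X,
             family=sm.families.Binomial(link=sm.families.links.CLogLog())).fit()
print(fit.params)
```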
Abstract:
The term reliability of an equipment or device is often meant to indicate the probability that it carries out the functions expected of it adequately, without failure and within specified performance limits, at a given age, for a desired mission time, when used under the designated application and operating environmental stress. The approaches employed in reliability studies can be broadly classified as probabilistic and deterministic. The main interest of the former is to devise tools and methods to identify the random mechanism governing the failure process through a proper statistical framework, while the latter addresses the question of finding the causes of failure and the steps to reduce individual failures, thereby enhancing reliability. In the probabilistic approach, to which the present study subscribes, the concept of a life distribution, a mathematical idealisation that describes the failure times, is fundamental, and a basic question a reliability analyst has to settle is the form of the life distribution. It is for no other reason that a major share of the literature on the mathematical theory of reliability is focussed on methods of arriving at reasonable models of failure times and on exhibiting the failure patterns that induce such models. The application of lifetime-distribution methodology is not confined to assessing the endurance of equipment and systems; it ranges over a wide variety of scientific investigations in which the word lifetime may not refer to the length of life in the literal sense, but can be conceived in its most general form as a non-negative random variable. Thus the tools developed in connection with modelling lifetime data have found applications in other areas of research such as actuarial science, engineering, the biomedical sciences, economics, extreme value theory, etc.
Abstract:
This article presents important properties of standard discrete distributions and their conjugate densities. The Bernoulli and Poisson processes are described as generators of such discrete models. A characterization of distributions by mixtures is also introduced. This article adopts a novel singular notation and representation. Singular representations are unusual in statistical texts; nevertheless, the singular notation makes it simpler to extend and generalize theoretical results and greatly facilitates numerical and computational implementation.
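The two conjugate pairs generated by these processes are the standard ones (shown here for reference; the article's singular notation is not reproduced):

\[
x_i \mid \theta \sim \mathrm{Bernoulli}(\theta),\ \theta \sim \mathrm{Beta}(a, b) \;\Rightarrow\; \theta \mid x_{1:n} \sim \mathrm{Beta}\!\Big(a + \textstyle\sum_i x_i,\; b + n - \textstyle\sum_i x_i\Big),
\]
\[
x_i \mid \lambda \sim \mathrm{Poisson}(\lambda),\ \lambda \sim \mathrm{Gamma}(a, b) \;\Rightarrow\; \lambda \mid x_{1:n} \sim \mathrm{Gamma}\!\Big(a + \textstyle\sum_i x_i,\; b + n\Big).
\]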
Abstract:
In this paper, we derive score test statistics to discriminate between proportional hazards and proportional odds models for grouped survival data. These models are embedded within a power transformation family in order to obtain the score tests. In simple cases, some small-sample results for the score statistics are obtained using Monte Carlo simulations; their distributions are well approximated by the chi-squared distribution. Real examples illustrate the proposed tests.
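A common power-family embedding of this kind (one standard choice, the Aranda-Ordaz asymmetric family; the paper's exact transformation may differ) links the two models through a single parameter \(\lambda\):

\[
g_\lambda(\pi) = \log\!\left(\frac{(1-\pi)^{-\lambda} - 1}{\lambda}\right),
\]

which reduces to the complementary log-log link (grouped proportional hazards) as \(\lambda \to 0\) and to the logit link (proportional odds) at \(\lambda = 1\); a score test of \(H_0\colon \lambda = 0\) or \(H_0\colon \lambda = 1\) then discriminates between the models without fitting the full family.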
Abstract:
This work develops a new methodology for discriminating between models for interval-censored data, based on bootstrap residual simulation and on observing the deviance difference of one model in relation to another, following Hinde (1992). Generally, this sort of data can generate a large number of tied observations, in which case survival time can be regarded as discrete. Therefore, the Cox proportional hazards model for grouped data (Prentice & Gloeckler, 1978) and the logistic model (Lawless, 1982) can be fitted by means of generalized linear models. Whitehead (1989) treated censoring as an indicator variable with a binomial distribution and fitted the Cox proportional hazards model using the complementary log-log as the link function; likewise, a logistic model can be fitted using the logit link. The proposed methodology is an alternative to the score tests developed by Colosimo et al. (2000), where such models are obtained for discrete binary data as particular cases of the Aranda-Ordaz asymmetric family, the tests being developed on the basis of the link functions that generate each fit. The example that motivates this study is a dataset from an experiment carried out on a flax cultivar planted on four substrata susceptible to the pathogen Fusarium oxysporum. The response variable, the time until blighting, was observed in intervals over 52 days. The results were compared with the model fits and the AIC values.
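A minimal sketch of the bootstrap deviance-difference idea on hypothetical binary interval data (fitting both candidate GLMs with statsmodels and resampling from the fitted cloglog model; an illustration, not the paper's exact procedure):

```python
import numpy as np
import statsmodels.api as sm

def deviance_diff(y, X):
    """Deviance of the logit fit minus deviance of the cloglog (grouped PH) fit."""
    d_ph = sm.GLM(y, X, family=sm.families.Binomial(
        link=sm.families.links.CLogLog())).fit().deviance
    d_logit = sm.GLM(y, X, family=sm.families.Binomial(
        link=sm.families.links.Logit())).fit().deviance
    return d_logit - d_ph

rng = np.random.default_rng(1)
X = sm.add_constant(rng.normal(size=(300, 1)))
y = rng.binomial(1, 1 - np.exp(-np.exp(X @ np.array([-0.5, 0.9]))))

obs = deviance_diff(y, X)
fit_ph = sm.GLM(y, X, family=sm.families.Binomial(
    link=sm.families.links.CLogLog())).fit()
# Bootstrap the difference under the fitted cloglog model.
boot = [deviance_diff(rng.binomial(1, fit_ph.fittedvalues), X)
        for _ in range(200)]
print(obs, np.mean(np.array(boot) >= obs))  # observed diff and bootstrap tail proportion
```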
Abstract:
The discrete-time Markov chain is commonly used to describe changes of health states for chronic diseases in a longitudinal study. Statistical inference for comparing treatment effects or finding determinants of disease progression usually requires estimation of transition probabilities. In many situations, when the outcome data have missing observations or the variable of interest (a latent variable) cannot be measured directly, the estimation of transition probabilities becomes more complicated. In the latter case, a surrogate variable that is easier to access and can gauge the characteristics of the latent one is usually used for data analysis. This dissertation research proposes methods to analyze longitudinal data (1) that have a categorical outcome with missing observations or (2) that use complete or incomplete surrogate observations to analyze a categorical latent outcome. For (1), different missing-data mechanisms were considered in empirical studies using methods that include the EM algorithm, Monte Carlo EM and a procedure that is not a data-augmentation method. For (2), the hidden Markov model with the forward-backward procedure was applied for parameter estimation; this method was also extended to cover the computation of standard errors. The proposed methods were demonstrated on a schizophrenia example. The relevance to public health, the strengths and limitations, and possible future research are also discussed.
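As a minimal sketch of the forward pass used in the forward-backward procedure (a hypothetical two-state example; the matrices are illustrative, not estimates from the schizophrenia data):

```python
import numpy as np

pi = np.array([0.7, 0.3])      # initial latent-state distribution
A = np.array([[0.9, 0.1],      # transition probabilities between health states
              [0.2, 0.8]])
B = np.array([[0.8, 0.2],      # P(surrogate observation | latent state)
              [0.3, 0.7]])

def forward(obs):
    """alpha[t, i] = P(o_1..o_t, state_t = i); summing the last row gives the likelihood."""
    alpha = np.zeros((len(obs), len(pi)))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, len(obs)):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha

obs = [0, 0, 1, 1, 0]
print(forward(obs)[-1].sum())  # likelihood of the surrogate sequence
```

The backward pass and the usual EM (Baum-Welch) updates then re-estimate A and B; in practice one works with scaled or log-domain probabilities to avoid underflow on long series.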
Abstract:
Mixture modeling is commonly used to model categorical latent variables that represent subpopulations in which population membership is unknown but can be inferred from the data. In recent years, the potential of finite mixture models has been applied to time-to-event data. However, the commonly used survival mixture model assumes that the effects of the covariates on failure times differ across latent classes while the covariate distribution is homogeneous. The aim of this dissertation is to develop a method to examine time-to-event data in the presence of unobserved heterogeneity within a mixture-modeling framework. A joint model is developed that incorporates the latent survival trajectory along with the observed information for the joint analysis of a time-to-event variable, its discrete and continuous covariates, and a latent class variable. It is assumed that both the effects of covariates on survival times and the distribution of covariates vary across latent classes. The unobservable survival trajectories are identified by estimating the probability that a subject belongs to a particular class given the observed information. We applied this method to a Hodgkin lymphoma study with long-term follow-up and observed four distinct latent classes in terms of long-term survival and the distributions of prognostic factors. Our results from simulation studies and from the Hodgkin lymphoma study demonstrate the superiority of our joint model over the conventional survival model. This flexible inference method provides more accurate estimation and accommodates unobservable heterogeneity among individuals while taking interactions between covariates into consideration.
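In generic form (a sketch of the joint mixture the abstract describes, with censoring terms omitted), the joint density of the survival time \(t\) and covariates \(x\) over \(K\) latent classes is

\[
f(t, x) = \sum_{k=1}^{K} \pi_k\, f(t \mid x, k)\, g(x \mid k),
\]

so that, unlike the conventional survival mixture, the covariate distribution \(g(x \mid k)\) also varies by class; posterior class membership is

\[
P(C = k \mid t, x) = \frac{\pi_k\, f(t \mid x, k)\, g(x \mid k)}{\sum_{j=1}^{K} \pi_j\, f(t \mid x, j)\, g(x \mid j)}.
\]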
Abstract:
Recently, we have developed the hierarchical Generative Topographic Mapping (HGTM), an interactive method for visualization of large high-dimensional real-valued data sets. In this paper, we propose a more general visualization system by extending HGTM in three ways, which allows the user to visualize a wider range of data sets and better support the model development process. 1) We integrate HGTM with noise models from the exponential family of distributions. The basic building block is the Latent Trait Model (LTM). This enables us to visualize data of inherently discrete nature, e.g., collections of documents, in a hierarchical manner. 2) We give the user a choice of initializing the child plots of the current plot in either interactive, or automatic mode. In the interactive mode, the user selects "regions of interest," whereas in the automatic mode, an unsupervised minimum message length (MML)-inspired construction of a mixture of LTMs is employed. The unsupervised construction is particularly useful when high-level plots are covered with dense clusters of highly overlapping data projections, making it difficult to use the interactive mode. Such a situation often arises when visualizing large data sets. 3) We derive general formulas for magnification factors in latent trait models. Magnification factors are a useful tool to improve our understanding of the visualization plots, since they can highlight the boundaries between data clusters. We illustrate our approach on a toy example and evaluate it on three more complex real data sets.
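For a smooth mapping \(\psi\) from the latent space to the data space, the local magnification factor takes the standard form (the paper derives the LTM-specific version of this quantity):

\[
\mathrm{MF}(u) = \sqrt{\det\!\big(J(u)^{\top} J(u)\big)}, \qquad J_{dj}(u) = \frac{\partial \psi_d(u)}{\partial u_j},
\]

i.e. the factor by which an infinitesimal latent-space area is stretched when embedded in data space; large values mark boundaries between data clusters.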
Abstract:
In acquired immunodeficiency syndrome (AIDS) studies it is quite common to observe viral load measurements collected irregularly over time. Moreover, these measurements can be subject to upper and/or lower detection limits, depending on the quantification assay. A complication arises when these continuous repeated measures have heavy-tailed behavior. For such data structures, we propose a robust censored linear model based on the multivariate Student's t-distribution. To account for the autocorrelation among irregularly observed measures, a damped exponential correlation structure is employed. An efficient expectation-maximization-type algorithm is developed for computing the maximum likelihood estimates, obtaining as by-products the standard errors of the fixed effects and the log-likelihood function. The proposed algorithm uses closed-form expressions at the E-step that rely on formulas for the mean and variance of a truncated multivariate Student's t-distribution. The methodology is illustrated through an application to a Human Immunodeficiency Virus-AIDS (HIV-AIDS) study and several simulation studies.
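The damped exponential correlation (DEC) structure referred to can be written (in commonly used notation; the paper's symbols may differ) as

\[
\mathrm{Corr}(y_{ij}, y_{ik}) = \phi_1^{\,|t_{ij} - t_{ik}|^{\phi_2}}, \qquad 0 \le \phi_1 < 1,\ \phi_2 \ge 0,
\]

which recovers compound symmetry at \(\phi_2 = 0\) and a continuous-time AR(1) structure at \(\phi_2 = 1\), making it well suited to irregularly timed viral load measurements.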
Abstract:
Often in biomedical research we deal with continuous (clustered) proportion responses, ranging between zero and one, that quantify the disease status of the cluster units. Interestingly, the study population might also consist of relatively disease-free as well as highly diseased subjects, contributing proportion values in the closed interval [0, 1]. Regression on a variety of parametric densities with support in (0, 1), such as beta regression, can assess important covariate effects, but these densities are inappropriate in the presence of zeros and/or ones. To avoid this, we introduce a class of general proportion densities and further augment the probabilities of zero and one to this general proportion density, controlling for the clustering. Our approach is Bayesian and presents a computationally convenient framework amenable to available freeware. Bayesian case-deletion influence diagnostics based on q-divergence measures are automatic from the Markov chain Monte Carlo output. The methodology is illustrated using both simulation studies and an application to a real dataset from a clinical periodontology study.
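In sketch form, the zero-and-one augmented density the abstract describes assigns point masses at the boundaries and a general proportion density \(g\) on the interior:

\[
f(y) =
\begin{cases}
p_0, & y = 0,\\
p_1, & y = 1,\\
(1 - p_0 - p_1)\, g(y), & y \in (0, 1),
\end{cases}
\]

with \(p_0\), \(p_1\) and the parameters of \(g\) typically regressed on covariates, and cluster-level random effects controlling for the clustering (the specific parameterization here is illustrative).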