963 resultados para count data models
Resumo:
This thesis introduces heat demand forecasting models which are generated by using data mining algorithms. The forecast spans one full day and this forecast can be used in regulating heat consumption of buildings. For training the data mining models, two years of heat consumption data from a case building and weather measurement data from Finnish Meteorological Institute are used. The thesis utilizes Microsoft SQL Server Analysis Services data mining tools in generating the data mining models and CRISP-DM process framework to implement the research. Results show that the built models can predict heat demand at best with mean average percentage errors of 3.8% for 24-h profile and 5.9% for full day. A deployment model for integrating the generated data mining models into an existing building energy management system is also discussed.
Resumo:
Multivariate lifetime data arise in various forms including recurrent event data when individuals are followed to observe the sequence of occurrences of a certain type of event; correlated lifetime when an individual is followed for the occurrence of two or more types of events, or when distinct individuals have dependent event times. In most studies there are covariates such as treatments, group indicators, individual characteristics, or environmental conditions, whose relationship to lifetime is of interest. This leads to a consideration of regression models.The well known Cox proportional hazards model and its variations, using the marginal hazard functions employed for the analysis of multivariate survival data in literature are not sufficient to explain the complete dependence structure of pair of lifetimes on the covariate vector. Motivated by this, in Chapter 2, we introduced a bivariate proportional hazards model using vector hazard function of Johnson and Kotz (1975), in which the covariates under study have different effect on two components of the vector hazard function. The proposed model is useful in real life situations to study the dependence structure of pair of lifetimes on the covariate vector . The well known partial likelihood approach is used for the estimation of parameter vectors. We then introduced a bivariate proportional hazards model for gap times of recurrent events in Chapter 3. The model incorporates both marginal and joint dependence of the distribution of gap times on the covariate vector . In many fields of application, mean residual life function is considered superior concept than the hazard function. Motivated by this, in Chapter 4, we considered a new semi-parametric model, bivariate proportional mean residual life time model, to assess the relationship between mean residual life and covariates for gap time of recurrent events. The counting process approach is used for the inference procedures of the gap time of recurrent events. In many survival studies, the distribution of lifetime may depend on the distribution of censoring time. In Chapter 5, we introduced a proportional hazards model for duration times and developed inference procedures under dependent (informative) censoring. In Chapter 6, we introduced a bivariate proportional hazards model for competing risks data under right censoring. The asymptotic properties of the estimators of the parameters of different models developed in previous chapters, were studied. The proposed models were applied to various real life situations.
Resumo:
This thesis entitled Reliability Modelling and Analysis in Discrete time Some Concepts and Models Useful in the Analysis of discrete life time data.The present study consists of five chapters. In Chapter II we take up the derivation of some general results useful in reliability modelling that involves two component mixtures. Expression for the failure rate, mean residual life and second moment of residual life of the mixture distributions in terms of the corresponding quantities in the component distributions are investigated. Some applications of these results are also pointed out. The role of the geometric,Waring and negative hypergeometric distributions as models of life lengths in the discrete time domain has been discussed already. While describing various reliability characteristics, it was found that they can be often considered as a class. The applicability of these models in single populations naturally extends to the case of populations composed of sub-populations making mixtures of these distributions worth investigating. Accordingly the general properties, various reliability characteristics and characterizations of these models are discussed in chapter III. Inference of parameters in mixture distribution is usually a difficult problem because the mass function of the mixture is a linear function of the component masses that makes manipulation of the likelihood equations, leastsquare function etc and the resulting computations.very difficult. We show that one of our characterizations help in inferring the parameters of the geometric mixture without involving computational hazards. As mentioned in the review of results in the previous sections, partial moments were not studied extensively in literature especially in the case of discrete distributions. Chapters IV and V deal with descending and ascending partial factorial moments. Apart from studying their properties, we prove characterizations of distributions by functional forms of partial moments and establish recurrence relations between successive moments for some well known families. It is further demonstrated that partial moments are equally efficient and convenient compared to many of the conventional tools to resolve practical problems in reliability modelling and analysis. The study concludes by indicating some new problems that surfaced during the course of the present investigation which could be the subject for a future work in this area.
Resumo:
Data mining is one of the hottest research areas nowadays as it has got wide variety of applications in common man’s life to make the world a better place to live. It is all about finding interesting hidden patterns in a huge history data base. As an example, from a sales data base, one can find an interesting pattern like “people who buy magazines tend to buy news papers also” using data mining. Now in the sales point of view the advantage is that one can place these things together in the shop to increase sales. In this research work, data mining is effectively applied to a domain called placement chance prediction, since taking wise career decision is so crucial for anybody for sure. In India technical manpower analysis is carried out by an organization named National Technical Manpower Information System (NTMIS), established in 1983-84 by India's Ministry of Education & Culture. The NTMIS comprises of a lead centre in the IAMR, New Delhi, and 21 nodal centres located at different parts of the country. The Kerala State Nodal Centre is located at Cochin University of Science and Technology. In Nodal Centre, they collect placement information by sending postal questionnaire to passed out students on a regular basis. From this raw data available in the nodal centre, a history data base was prepared. Each record in this data base includes entrance rank ranges, reservation, Sector, Sex, and a particular engineering. From each such combination of attributes from the history data base of student records, corresponding placement chances is computed and stored in the history data base. From this data, various popular data mining models are built and tested. These models can be used to predict the most suitable branch for a particular new student with one of the above combination of criteria. Also a detailed performance comparison of the various data mining models is done.This research work proposes to use a combination of data mining models namely a hybrid stacking ensemble for better predictions. A strategy to predict the overall absorption rate for various branches as well as the time it takes for all the students of a particular branch to get placed etc are also proposed. Finally, this research work puts forward a new data mining algorithm namely C 4.5 * stat for numeric data sets which has been proved to have competent accuracy over standard benchmarking data sets called UCI data sets. It also proposes an optimization strategy called parameter tuning to improve the standard C 4.5 algorithm. As a summary this research work passes through all four dimensions for a typical data mining research work, namely application to a domain, development of classifier models, optimization and ensemble methods.
Resumo:
Modeling and predicting co-occurrences of events is a fundamental problem of unsupervised learning. In this contribution we develop a statistical framework for analyzing co-occurrence data in a general setting where elementary observations are joint occurrences of pairs of abstract objects from two finite sets. The main challenge for statistical models in this context is to overcome the inherent data sparseness and to estimate the probabilities for pairs which were rarely observed or even unobserved in a given sample set. Moreover, it is often of considerable interest to extract grouping structure or to find a hierarchical data organization. A novel family of mixture models is proposed which explain the observed data by a finite number of shared aspects or clusters. This provides a common framework for statistical inference and structure discovery and also includes several recently proposed models as special cases. Adopting the maximum likelihood principle, EM algorithms are derived to fit the model parameters. We develop improved versions of EM which largely avoid overfitting problems and overcome the inherent locality of EM--based optimization. Among the broad variety of possible applications, e.g., in information retrieval, natural language processing, data mining, and computer vision, we have chosen document retrieval, the statistical analysis of noun/adjective co-occurrence and the unsupervised segmentation of textured images to test and evaluate the proposed algorithms.
Resumo:
In the last years, the use of every type of Digital Elevation Models has iimproved. The LiDAR (Light Detection and Ranging) technology, based on the scansion of the territory b airborne laser telemeters, allows the construction of digital Surface Models (DSM), in an easy way by a simple data interpolation
Resumo:
Resumen tomado de la publicaci??n. Resumen tambi??n en ingl??s
Resumo:
A combination of satellite data, reanalysis products and climate models are combined to monitor changes in water vapour, clear-sky radiative cooling of the atmosphere and precipitation over the period 1979-2006. Climate models are able to simulate observed increases in column integrated water vapour (CWV) with surface temperature (Ts) over the ocean. Changes in the observing system lead to spurious variability in water vapour and clear-sky longwave radiation in reanalysis products. Nevertheless all products considered exhibit a robust increase in clear-sky longwave radiative cooling from the atmosphere to the surface; clear-sky longwave radiative cooling of the atmosphere is found to increase with Ts at the rate of ~4 Wm-2 K-1 over tropical ocean regions of mean descending vertical motion. Precipitation (P) is tightly coupled to atmospheric radiative cooling rates and this implies an increase in P with warming at a slower rate than the observed increases in CWV. Since convective precipitation depends on moisture convergence, the above implies enhanced precipitation over convective regions and reduced precipitation over convectively suppressed regimes. To quantify this response, observed and simulated changes in precipitation rate are analysed separately over regions of mean ascending and descending vertical motion over the tropics. The observed response is found to be substantially larger than the model simulations and climate change projections. It is currently not clear whether this is due to deficiencies in model parametrizations or errors in satellite retrievals.