893 resultados para Biogeography, Bioregions, Subregion, Statistical Modelling, GIS, Finite Mixture Models
Resumo:
Survival models are being widely applied to the engineering field to model time-to-event data once censored data is here a common issue. Using parametric models or not, for the case of heterogeneous data, they may not always represent a good fit. The present study relays on critical pumps survival data where traditional parametric regression might be improved in order to obtain better approaches. Considering censored data and using an empiric method to split the data into two subgroups to give the possibility to fit separated models to our censored data, we’ve mixture two distinct distributions according a mixture-models approach. We have concluded that it is a good method to fit data that does not fit to a usual parametric distribution and achieve reliable parameters. A constant cumulative hazard rate policy was used as well to check optimum inspection times using the obtained model from the mixture-model, which could be a plus when comparing with the actual maintenance policies to check whether changes should be introduced or not.
Resumo:
Tese apresentada como requisito parcial para obtenção do grau de Doutor em Estatística e Gestão de Informação pelo Instituto Superior de Estatística e Gestão de Informação da Universidade Nova de Lisboa
Resumo:
Many of developing countries are facing crisis in water management due to increasing of population, water scarcity, water contaminations and effects of world economic crisis. Water distribution systems in developing countries are facing many challenges of efficient repair and rehabilitation since the information of water network is very limited, which makes the rehabilitation assessment plans very difficult. Sufficient information with high technology in developed countries makes the assessment for rehabilitation easy. Developing countries have many difficulties to assess the water network causing system failure, deterioration of mains and bad water quality in the network due to pipe corrosion and deterioration. The limited information brought into focus the urgent need to develop economical assessment for rehabilitation of water distribution systems adapted to water utilities. Gaza Strip is subject to a first case study, suffering from severe shortage in the water supply and environmental problems and contamination of underground water resources. This research focuses on improvement of water supply network to reduce the water losses in water network based on limited database using techniques of ArcGIS and commercial water network software (WaterCAD). A new approach for rehabilitation water pipes has been presented in Gaza city case study. Integrated rehabilitation assessment model has been developed for rehabilitation water pipes including three components; hydraulic assessment model, Physical assessment model and Structural assessment model. WaterCAD model has been developed with integrated in ArcGIS to produce the hydraulic assessment model for water network. The model have been designed based on pipe condition assessment with 100 score points as a maximum points for pipe condition. As results from this model, we can indicate that 40% of water pipeline have score points less than 50 points and about 10% of total pipes length have less than 30 score points. By using this model, the rehabilitation plans for each region in Gaza city can be achieved based on available budget and condition of pipes. The second case study is Kuala Lumpur Case from semi-developed countries, which has been used to develop an approach to improve the water network under crucial conditions using, advanced statistical and GIS techniques. Kuala Lumpur (KL) has water losses about 40% and high failure rate, which make severe problem. This case can represent cases in South Asia countries. Kuala Lumpur faced big challenges to reduce the water losses in water network during last 5 years. One of these challenges is high deterioration of asbestos cement (AC) pipes. They need to replace more than 6500 km of AC pipes, which need a huge budget to be achieved. Asbestos cement is subject to deterioration due to various chemical processes that either leach out the cement material or penetrate the concrete to form products that weaken the cement matrix. This case presents an approach for geo-statistical model for modelling pipe failures in a water distribution network. Database of Syabas Company (Kuala Lumpur water company) has been used in developing the model. The statistical models have been calibrated, verified and used to predict failures for both networks and individual pipes. The mathematical formulation developed for failure frequency in Kuala Lumpur was based on different pipeline characteristics, reflecting several factors such as pipe diameter, length, pressure and failure history. Generalized linear model have been applied to predict pipe failures based on District Meter Zone (DMZ) and individual pipe levels. Based on Kuala Lumpur case study, several outputs and implications have been achieved. Correlations between spatial and temporal intervals of pipe failures also have been done using ArcGIS software. Water Pipe Assessment Model (WPAM) has been developed using the analysis of historical pipe failure in Kuala Lumpur which prioritizing the pipe rehabilitation candidates based on ranking system. Frankfurt Water Network in Germany is the third main case study. This case makes an overview for Survival analysis and neural network methods used in water network. Rehabilitation strategies of water pipes have been developed for Frankfurt water network in cooperation with Mainova (Frankfurt Water Company). This thesis also presents a methodology of technical condition assessment of plastic pipes based on simple analysis. This thesis aims to make contribution to improve the prediction of pipe failures in water networks using Geographic Information System (GIS) and Decision Support System (DSS). The output from the technical condition assessment model can be used to estimate future budget needs for rehabilitation and to define pipes with high priority for replacement based on poor condition. rn
Resumo:
The modelling of inpatient length of stay (LOS) has important implications in health care studies. Finite mixture distributions are usually used to model the heterogeneous LOS distribution, due to a certain proportion of patients sustaining-a longer stay. However, the morbidity data are collected from hospitals, observations clustered within the same hospital are often correlated. The generalized linear mixed model approach is adopted to accommodate the inherent correlation via unobservable random effects. An EM algorithm is developed to obtain residual maximum quasi-likelihood estimation. The proposed hierarchical mixture regression approach enables the identification and assessment of factors influencing the long-stay proportion and the LOS for the long-stay patient subgroup. A neonatal LOS data set is used for illustration, (C) 2003 Elsevier Science Ltd. All rights reserved.
Resumo:
Cluster analysis via a finite mixture model approach is considered. With this approach to clustering, the data can be partitioned into a specified number of clusters g by first fitting a mixture model with g components. An outright clustering of the data is then obtained by assigning an observation to the component to which it has the highest estimated posterior probability of belonging; that is, the ith cluster consists of those observations assigned to the ith component (i = 1,..., g). The focus is on the use of mixtures of normal components for the cluster analysis of data that can be regarded as being continuous. But attention is also given to the case of mixed data, where the observations consist of both continuous and discrete variables.
Resumo:
This paper proposes a template for modelling complex datasets that integrates traditional statistical modelling approaches with more recent advances in statistics and modelling through an exploratory framework. Our approach builds on the well-known and long standing traditional idea of 'good practice in statistics' by establishing a comprehensive framework for modelling that focuses on exploration, prediction, interpretation and reliability assessment, a relatively new idea that allows individual assessment of predictions. The integrated framework we present comprises two stages. The first involves the use of exploratory methods to help visually understand the data and identify a parsimonious set of explanatory variables. The second encompasses a two step modelling process, where the use of non-parametric methods such as decision trees and generalized additive models are promoted to identify important variables and their modelling relationship with the response before a final predictive model is considered. We focus on fitting the predictive model using parametric, non-parametric and Bayesian approaches. This paper is motivated by a medical problem where interest focuses on developing a risk stratification system for morbidity of 1,710 cardiac patients given a suite of demographic, clinical and preoperative variables. Although the methods we use are applied specifically to this case study, these methods can be applied across any field, irrespective of the type of response.
Resumo:
Research on the problem of feature selection for clustering continues to develop. This is a challenging task, mainly due to the absence of class labels to guide the search for relevant features. Categorical feature selection for clustering has rarely been addressed in the literature, with most of the proposed approaches having focused on numerical data. In this work, we propose an approach to simultaneously cluster categorical data and select a subset of relevant features. Our approach is based on a modification of a finite mixture model (of multinomial distributions), where a set of latent variables indicate the relevance of each feature. To estimate the model parameters, we implement a variant of the expectation-maximization algorithm that simultaneously selects the subset of relevant features, using a minimum message length criterion. The proposed approach compares favourably with two baseline methods: a filter based on an entropy measure and a wrapper based on mutual information. The results obtained on synthetic data illustrate the ability of the proposed expectation-maximization method to recover ground truth. An application to real data, referred to official statistics, shows its usefulness.
Resumo:
In a recent paper Bermúdez [2009] used bivariate Poisson regression models for ratemaking in car insurance, and included zero-inflated models to account for the excess of zeros and the overdispersion in the data set. In the present paper, we revisit this model in order to consider alternatives. We propose a 2-finite mixture of bivariate Poisson regression models to demonstrate that the overdispersion in the data requires more structure if it is to be taken into account, and that a simple zero-inflated bivariate Poisson model does not suffice. At the same time, we show that a finite mixture of bivariate Poisson regression models embraces zero-inflated bivariate Poisson regression models as a special case. Additionally, we describe a model in which the mixing proportions are dependent on covariates when modelling the way in which each individual belongs to a separate cluster. Finally, an EM algorithm is provided in order to ensure the models’ ease-of-fit. These models are applied to the same automobile insurance claims data set as used in Bermúdez [2009] and it is shown that the modelling of the data set can be improved considerably.
Resumo:
Modeling and predicting co-occurrences of events is a fundamental problem of unsupervised learning. In this contribution we develop a statistical framework for analyzing co-occurrence data in a general setting where elementary observations are joint occurrences of pairs of abstract objects from two finite sets. The main challenge for statistical models in this context is to overcome the inherent data sparseness and to estimate the probabilities for pairs which were rarely observed or even unobserved in a given sample set. Moreover, it is often of considerable interest to extract grouping structure or to find a hierarchical data organization. A novel family of mixture models is proposed which explain the observed data by a finite number of shared aspects or clusters. This provides a common framework for statistical inference and structure discovery and also includes several recently proposed models as special cases. Adopting the maximum likelihood principle, EM algorithms are derived to fit the model parameters. We develop improved versions of EM which largely avoid overfitting problems and overcome the inherent locality of EM--based optimization. Among the broad variety of possible applications, e.g., in information retrieval, natural language processing, data mining, and computer vision, we have chosen document retrieval, the statistical analysis of noun/adjective co-occurrence and the unsupervised segmentation of textured images to test and evaluate the proposed algorithms.
Resumo:
The identification of compositional changes in fumarolic gases of active and quiescent volcanoes is one of the most important targets in monitoring programs. From a general point of view, many systematic (often cyclic) and random processes control the chemistry of gas discharges, making difficult to produce a convincing mathematical-statistical modelling. Changes in the chemical composition of volcanic gases sampled at Vulcano Island (Aeolian Arc, Sicily, Italy) from eight different fumaroles located in the northern sector of the summit crater (La Fossa) have been analysed by considering their dependence from time in the period 2000-2007. Each intermediate chemical composition has been considered as potentially derived from the contribution of the two temporal extremes represented by the 2000 and 2007 samples, respectively, by using inverse modelling methodologies for compositional data. Data pertaining to fumaroles F5 and F27, located on the rim and in the inner part of La Fossa crater, respectively, have been used to achieve the proposed aim. The statistical approach has allowed us to highlight the presence of random and not random fluctuations, features useful to understand how the volcanic system works, opening new perspectives in sampling strategies and in the evaluation of the natural risk related to a quiescent volcano
Resumo:
Background: The purpose of this study is to analyze the tension distribution on bone tissue around implants with different angulations (0 degrees, 17 degrees, and 30 degrees) and connections (external hexagon and tapered) through the use of three-dimensional finite element and statistical analyses.Methods: Twelve different configurations of three-dimensional finite element models, including three inclinations of the implants (0 degrees, 17 degrees, and 30 degrees), two connections (an external hexagon and a tapered), and two load applications (axial and oblique), were simulated. The maximum principal stress values for cortical bone were measured at the mesial, distal, buccal, and lingual regions around the implant for each analyzed situation, totaling 48 groups. Loads of 200 and 100 N were applied at the occlusal surface in the axial and oblique directions, respectively. Maximum principal stress values were measured at the bone crest and statistically analyzed using analysis of variance. Stress patterns in the bone tissue around the implant were analyzed qualitatively.Results: The results demonstrated that under the oblique loading process, the external hexagon connection showed significantly higher stress concentrations in the bone tissue (P < 0.05) compared with the tapered connection. Moreover, the buccal and mesial regions of the cortical bone concentrated significantly higher stress (P < 0.005) to the external hexagon implant type. Under the oblique loading direction, the increased external hexagon implant angulation induced a significantly higher stress concentration (P = 0.045).Conclusions: The study results show that: 1) the oblique load was more damaging to bone tissue, mainly when associated with external hexagon implants; and 2) there was a higher stress concentration on the buccal region in comparison to all other regions under oblique load.
Resumo:
Ecological regions are increasingly used as a spatial unit for planning and environmental management. It is important to define these regions in a scientifically defensible way to justify any decisions made on the basis that they are representative of broad environmental assets. The paper describes a methodology and tool to identify cohesive bioregions. The methodology applies an elicitation process to obtain geographical descriptions for bioregions, each of these is transformed into a Normal density estimate on environmental variables within that region. This prior information is balanced with data classification of environmental datasets using a Bayesian statistical modelling approach to objectively map ecological regions. The method is called model-based clustering as it fits a Normal mixture model to the clusters associated with regions, and it addresses issues of uncertainty in environmental datasets due to overlapping clusters.
Resumo:
Understanding how aquatic species grow is fundamental in fisheries because stock assessment often relies on growth dependent statistical models. Length-frequency-based methods become important when more applicable data for growth model estimation are either not available or very expensive. In this article, we develop a new framework for growth estimation from length-frequency data using a generalized von Bertalanffy growth model (VBGM) framework that allows for time-dependent covariates to be incorporated. A finite mixture of normal distributions is used to model the length-frequency cohorts of each month with the means constrained to follow a VBGM. The variances of the finite mixture components are constrained to be a function of mean length, reducing the number of parameters and allowing for an estimate of the variance at any length. To optimize the likelihood, we use a minorization–maximization (MM) algorithm with a Nelder–Mead sub-step. This work was motivated by the decline in catches of the blue swimmer crab (BSC) (Portunus armatus) off the east coast of Queensland, Australia. We test the method with a simulation study and then apply it to the BSC fishery data.
Resumo:
In a sample of censored survival times, the presence of an immune proportion of individuals who are not subject to death, failure or relapse, may be indicated by a relatively high number of individuals with large censored survival times. In this paper the generalized log-gamma model is modified for the possibility that long-term survivors may be present in the data. The model attempts to separately estimate the effects of covariates on the surviving fraction, that is, the proportion of the population for which the event never occurs. The logistic function is used for the regression model of the surviving fraction. Inference for the model parameters is considered via maximum likelihood. Some influence methods, such as the local influence and total local influence of an individual are derived, analyzed and discussed. Finally, a data set from the medical area is analyzed under the log-gamma generalized mixture model. A residual analysis is performed in order to select an appropriate model.
Resumo:
25th Annual Conference of the European Cetacean Society, Cadiz, Spain 21-23 March 2011.