963 results for count data models
Abstract:
The graphical representation of spatial soil properties in a digital environment is complex because it requires a conversion of data collected in a discrete form onto a continuous surface. The objective of this study was to apply three-dimensional interpolation and visualization techniques to soil texture and fertility properties and to establish relationships with pedogenetic factors and processes in a slope area. The GRASS Geographic Information System was used to generate three-dimensional models and ParaView software to visualize soil volumes. Samples of the A, AB, BA, and B horizons were collected in a regular 122-point grid in an area of 13 ha, in Pinhais, PR, in southern Brazil. Geoprocessing and graphic computing techniques were effective in identifying and delimiting soil volumes of distinct ranges of fertility properties confined within the soil matrix. Both three-dimensional interpolation and the visualization tool facilitated interpretation in a continuous space (volumes) of the cause-effect relationships between soil texture and fertility properties and pedological factors and processes, such as higher clay contents following the drainage lines of the area. The flattest part with more weathered soils (Oxisols) had the highest pH values and lowest Al³⁺ concentrations. These techniques of data interpolation and visualization have great potential for use in diverse areas of soil science, such as identification of soil volumes that occur side by side but exhibit different physical, chemical, and mineralogical conditions for plant root growth, and monitoring of plumes of organic and inorganic pollutants in soils and sediments, among other applications. The methodological details for interpolation and a three-dimensional view of soil data are presented here.
Abstract:
The present research deals with an application of artificial neural networks for multitask learning from spatial environmental data. The real case study (sediment contamination of Lake Geneva) consists of 8 pollutants. There are different relationships between these variables, from linear correlations to strong nonlinear dependencies. The main idea is to construct subsets of pollutants which can be efficiently modeled together within the multitask framework. The proposed two-step approach is based on: 1) the criterion of nonlinear predictability of each variable k, assessed by analyzing all possible models composed of the remaining variables, using a General Regression Neural Network (GRNN) as the model; 2) multitask learning of the best model using a multilayer perceptron, followed by spatial predictions. The results of the study are analyzed using both machine learning and geostatistical tools.
Abstract:
Many of the most interesting questions ecologists ask lead to analyses of spatial data. Yet, perhaps confused by the large number of statistical models and fitting methods available, many ecologists seem to believe this is best left to specialists. Here, we describe the issues that need consideration when analysing spatial data and illustrate these using simulation studies. Our comparative analysis involves using methods including generalized least squares, spatial filters, wavelet revised models, conditional autoregressive models and generalized additive mixed models to estimate regression coefficients from synthetic but realistic data sets, including some which violate standard regression assumptions. We assess the performance of each method using two measures and using statistical error rates for model selection. Methods that performed well included the generalized least squares family of models and a Bayesian implementation of the conditional autoregressive model. Ordinary least squares also performed adequately in the absence of model selection, but had poorly controlled Type I error rates and so did not show the improvements in performance under model selection that the above methods did. Removing large-scale spatial trends in the response led to poor performance. These are empirical results; hence extrapolation of these findings to other situations should be performed cautiously. Nevertheless, our simulation-based approach provides much stronger evidence for comparative analysis than assessments based on single or small numbers of data sets, and should be considered a necessary foundation for statements of this type in future.
Abstract:
Traditionally, the common reserving methods used by non-life actuaries are based on the assumption that future claims will behave in the same way as they did in the past. There are two main sources of variability in the claims development process: the variability of the speed with which the claims are settled and the variability of claim severity between different accident years. Large changes in these processes will distort the estimation of the claims reserves. The main objective of this thesis is to provide an indicator that first identifies and quantifies these two influences and then determines which model is adequate for a specific situation. Two stochastic models were analysed and the predictive distributions of the future claims were obtained. The main advantage of the stochastic models is that they provide measures of variability of the reserve estimates. The first model (PDM) combines the conjugate family Dirichlet-Multinomial with the Poisson distribution. The second model (NBDM) improves on the first by combining two conjugate families: Poisson-Gamma (for the distribution of the ultimate amounts) and Dirichlet-Multinomial (for the distribution of the incremental claims payments). The second model was found to express the variability in reporting speed and the development of claims severity as a function of the parameters of the two above-mentioned distributions. These are the shape parameter of the Gamma distribution and the Dirichlet parameter. Depending on the relation between them, we can decide on the adequacy of the claims reserve estimation method. The parameters were estimated by the Method of Moments and Maximum Likelihood. The results were tested first on selected simulated data and then on real data originating from three lines of business: Property/Casualty, General Liability, and Accident Insurance.
These data include different developments and specificities. The thesis shows that when the Dirichlet parameter is greater than the shape parameter of the Gamma distribution, the model implies a positive correlation between past and future claims payments, which suggests that the Chain-Ladder method is appropriate for estimating the claims reserve. In terms of claims reserves, if the cumulated payments are high, the positive correlation implies high expectations for the future payments and hence high claims reserve estimates. The negative correlation appears when the Dirichlet parameter is lower than the shape parameter of the Gamma distribution, meaning low expected future payments for the same high observed cumulated payments. This corresponds to the situation in which claims are reported rapidly and fewer claims are expected subsequently. The extreme case arises when all claims are reported at the same time, leading to expected future payments of zero or equal to the aggregated amount of the ultimate paid claims. For this latter case, the Chain-Ladder method is not recommended.
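As a point of reference for the Chain-Ladder method named above, the following is a minimal sketch of the classical deterministic version, not the stochastic PDM or NBDM models of the thesis. The function name `chain_ladder` and the three-year triangle are hypothetical and chosen only for illustration.

```python
def chain_ladder(triangle):
    """Project ultimate claims from a cumulative run-off triangle.

    triangle[i][j] is the cumulative amount paid for accident year i after
    j development periods; later accident years have shorter rows.
    """
    n_dev = len(triangle[0])
    factors = []
    for j in range(n_dev - 1):
        # Use only accident years observed at both development j and j + 1.
        rows = [row for row in triangle if len(row) > j + 1]
        factors.append(sum(r[j + 1] for r in rows) / sum(r[j] for r in rows))

    ultimates = []
    for row in triangle:
        value = row[-1]
        # Roll the latest observed payment forward with the remaining factors.
        for j in range(len(row) - 1, n_dev - 1):
            value *= factors[j]
        ultimates.append(value)
    return factors, ultimates


# Hypothetical cumulative payments for three accident years.
triangle = [
    [100.0, 150.0, 165.0],
    [110.0, 165.0],
    [120.0],
]
factors, ultimates = chain_ladder(triangle)
# Reserve = projected ultimate minus what has already been paid.
reserves = [u - row[-1] for u, row in zip(ultimates, triangle)]
print(factors, ultimates, reserves)
```

Because each projection multiplies the latest cumulative payment by the same development factors, high observed payments always produce high projected future payments, which is the positive-correlation behaviour the abstract associates with this method.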
Abstract:
We study theoretical and empirical aspects of the mean exit time (MET) of financial time series. The theoretical modeling is done within the framework of the continuous time random walk. We empirically verify that the mean exit time follows a quadratic scaling law and has an associated prefactor which is specific to the analyzed stock. We perform a series of statistical tests to determine which kinds of correlation are responsible for this specificity. The main contribution is associated with the autocorrelation property of stock returns. We introduce and solve analytically both two-state and three-state Markov chain models. The analytical results obtained with the two-state Markov chain model allow us to obtain a data collapse of the 20 measured MET profiles onto a single master curve.
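The quadratic scaling law mentioned above can be illustrated with a much simpler process than the one studied in the paper. The sketch below assumes an uncorrelated, discrete-time symmetric random walk on the integers (not a continuous time random walk and not the Markov chain models of the abstract) and solves exactly for the mean number of steps needed to leave the interval (-L, L). The function name `mean_exit_times` is hypothetical.

```python
def mean_exit_times(L):
    """Mean exit time from (-L, L) for a simple symmetric random walk.

    Solves T(x) = 1 + 0.5*T(x-1) + 0.5*T(x+1) with T(-L) = T(L) = 0
    for x = -L+1, ..., L-1 using the Thomas (tridiagonal) algorithm.
    """
    n = 2 * L - 1
    sub, diag, sup = -0.5, 1.0, -0.5  # rows of the tridiagonal system
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = sup / diag
    d_prime[0] = 1.0 / diag
    for i in range(1, n):
        denom = diag - sub * c_prime[i - 1]
        c_prime[i] = sup / denom
        d_prime[i] = (1.0 - sub * d_prime[i - 1]) / denom
    t = [0.0] * n
    t[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        t[i] = d_prime[i] - c_prime[i] * t[i + 1]
    return t  # t[k] is the mean exit time starting from x = -L + 1 + k


for L in (5, 10, 20):
    # Index L - 1 corresponds to a start at the centre of the interval.
    print(L, mean_exit_times(L)[L - 1])
```

For this idealized walk the mean exit time from the centre equals L squared, so the prefactor is exactly one. The stock-specific prefactor reported in the abstract is what the authors attribute mainly to the autocorrelation of returns, which this sketch deliberately leaves out.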
Abstract:
Glioblastoma multiforme (GBM) is the most common and lethal of all gliomas. The current standard of care includes surgery followed by concomitant radiation and chemotherapy with the DNA alkylating agent temozolomide (TMZ). O⁶-methylguanine-DNA methyltransferase (MGMT) repairs the most cytotoxic of lesions generated by TMZ, O⁶-methylguanine. Methylation of the MGMT promoter in GBM correlates with increased therapeutic sensitivity to alkylating agent therapy. However, several aspects of TMZ sensitivity are not explained by MGMT promoter methylation. Here, we investigated our hypothesis that the base excision repair enzyme alkylpurine-DNA-N-glycosylase (APNG), which repairs the cytotoxic lesions N³-methyladenine and N⁷-methylguanine, may contribute to TMZ resistance. Silencing of APNG in established and primary TMZ-resistant GBM cell lines endogenously expressing MGMT and APNG attenuated repair of TMZ-induced DNA damage and enhanced apoptosis. Reintroducing expression of APNG in TMZ-sensitive GBM lines conferred resistance to TMZ in vitro and in orthotopic xenograft mouse models. In addition, resistance was enhanced with coexpression of MGMT. Evaluation of APNG protein levels in several clinical datasets demonstrated that in patients, high nuclear APNG expression correlated with poorer overall survival compared with patients lacking APNG expression. Loss of APNG expression in a subset of patients was also associated with increased APNG promoter methylation. Collectively, our data demonstrate that APNG contributes to TMZ resistance in GBM and may be useful in the diagnosis and treatment of the disease.
Abstract:
A startling new development has occurred over the past year: the number of offenders residing in Iowa's correctional institutions has actually dropped. The prison population had been rising steadily, from 3,842 offenders in 1990 to an all-time high of 8,940 offenders on October 3, 2007, an increase of 133% (to 233% of the 1990 count) over 17 years. A significant cause of the increase has been longer stays in prison, due in part to the long-term effect of restrictions on parole eligibility. Over the past nine months, however, the prison population has been declining, to 8,573 on July 15, 2008 (not including 129 jail prisoners temporarily housed at ASP and IMCC due to the flooding). This represents a decrease of 367 offenders, or 4.1%, from the October 3, 2007 high.
Abstract:
Traffic safety engineers are among the early adopters of Bayesian statistical tools for analyzing crash data. As in many other areas of application, empirical Bayes methods were their first choice, perhaps because they represent an intuitively appealing, yet relatively easy-to-implement alternative to purely classical approaches. With the enormous progress in numerical methods made in recent years and with the availability of free, easy-to-use software that permits implementing a fully Bayesian approach, however, there is now ample justification to progress towards fully Bayesian analyses of crash data. The fully Bayesian approach, in particular as implemented via multi-level hierarchical models, has many advantages over the empirical Bayes approach. In a full Bayesian analysis, prior information and all available data are seamlessly integrated into posterior distributions on which practitioners can base their inferences. All uncertainties are thus accounted for in the analyses and there is no need to pre-process data to obtain Safety Performance Functions and other such prior estimates of the effect of covariates on the outcome of interest. In this light, fully Bayesian methods may well be less costly to implement and may result in safety estimates with more realistic standard errors. In this manuscript, we present the full Bayesian approach to analyzing traffic safety data and focus on highlighting the differences between the empirical Bayes and the full Bayes approaches. We use an illustrative example to discuss a step-by-step Bayesian analysis of the data and to show some of the types of inferences that are possible within the full Bayesian framework.
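To make the contrast above concrete, here is a minimal sketch of the shrinkage step that underlies an empirical Bayes estimate, assuming a Poisson crash count with a Gamma prior on the crash rate. The function names and the numbers are hypothetical. In an empirical Bayes workflow the prior parameters would be estimated beforehand (for example from a Safety Performance Function) and then treated as fixed, whereas a fully Bayesian hierarchical model would place its own prior on them and propagate that uncertainty.

```python
def gamma_poisson_posterior(prior_shape, prior_rate, crashes, exposure=1.0):
    """Conjugate update: Gamma(shape, rate) prior, Poisson(rate * exposure) count."""
    return prior_shape + crashes, prior_rate + exposure


def shrinkage_estimate(prior_shape, prior_rate, crashes, exposure=1.0):
    """Posterior mean written as a weighted average of prior mean and observed rate."""
    weight = prior_rate / (prior_rate + exposure)  # weight given to the prior mean
    prior_mean = prior_shape / prior_rate
    observed_rate = crashes / exposure
    return weight * prior_mean + (1.0 - weight) * observed_rate


# Hypothetical site: prior mean of 2 crashes per year, 8 crashes observed in 1 year.
shape, rate = gamma_poisson_posterior(2.0, 1.0, crashes=8, exposure=1.0)
posterior_mean = shape / rate
estimate = shrinkage_estimate(2.0, 1.0, crashes=8, exposure=1.0)
print(posterior_mean, estimate)
```

Both expressions give the same number, which shows that the estimate pulls the observed count toward the prior mean. Treating the prior parameters as known is what leaves their uncertainty out of the resulting standard errors.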
Abstract:
Protein S (ProS) is an important negative regulator of blood coagulation. Its physiological importance is evident in purpura fulminans and other life-threatening thrombotic disorders typical of ProS-deficient patients. Our previous characterization of ProS deficiency in mouse models has shown similarities with the human phenotypes: heterozygous ProS-deficient mice (Pros+/-) had increased thrombotic risk whereas homozygous deficiency in ProS (Pros-/-) was incompatible with life (Blood 2009; 114:2307-2314). In tissues, ProS exerts cellular functions by binding to and activating tyrosine kinase receptors of the Tyro3 family (TAM) on the cell surface. To extend the analysis of coagulation defects beyond the Pros-/- phenotype and add new insights into the sites of synthesis of ProS and its action, we generated mice with inactivated ProS in hepatocytes (Proslox/loxAlbCre+) as well as in endothelial and hematopoietic cells (Proslox/loxTie2Cre+). Both models resulted in a significant reduction of circulating ProS levels and in a remarkably increased thrombotic risk in vivo. In a model of tissue factor (TF)-induced venous thromboembolism (VTE), only 17% of Proslox/loxAlbCre+ mice (n=12) and only 13% of Proslox/loxTie2Cre+ mice (n=14) survived, compared with 86% of Proslox/lox mice (n=14; P<0.001). To mimic a severe acquired ProS deficiency, the ProS gene was inactivated at the adult stage using the polyI:C-inducible Mx1-Cre system (Proslox/loxMx1Cre+). Ten days after polyI:C treatment, Proslox/loxMx1Cre+ mice developed disseminated intravascular coagulation with extensive lung and liver thrombosis. It is worth noting that no skin lesions compatible with purpura fulminans were observed in any of the above-described models of partial ProS deficiency. In order to shed light on the pathogenesis of purpura fulminans, we exposed the different ProS-deficient mice to warfarin (0.2 mg/day).
We observed that Pros+/-, Proslox/loxAlbCre+ and Proslox/loxTie2Cre+ mice developed retiform purpura (characterized by erythematous and necrotic lesions of the genital region and extremities) and died 3 to 5 days after the first warfarin administration. In humans, ProS is also synthesized by megakaryocytes and hence stored at high concentrations in circulating platelets (pProS). The role of pProS has been investigated by generating a megakaryocyte-specific ProS-deficient model using the PF4 promoter as Cre driver (Proslox/loxPf4Cre+). In the TF-induced VTE model, Proslox/loxPf4Cre+ mice (n=15) showed a significantly increased risk of thrombosis compared to Proslox/lox controls (n=14; survival rate 47% and 86%, respectively; P<0.05). Furthermore, preliminary results suggest survival to be associated with higher circulating ProS levels. In order to evaluate the potential role of pProS in thrombus formation, we investigated the thrombotic response to intravenous injection of collagen-epinephrine in vivo and platelet function in vitro. Both in vivo and in vitro experiments showed similar results for Proslox/loxPf4Cre+ and Proslox/lox mice, indicating that platelet reactivity was not influenced by the absence of pProS. These data suggest that pProS is delivered at the site of thrombosis to inhibit thrombin generation. We further investigated the ability of ProS to function as a ligand of TAM receptors by using mice homozygous and heterozygous deficient for both TAM ligands, ProS and Gas6. Gas6-/-Pros-/- mice died in utero and showed a dramatic bleeding and thrombotic phenotype comparable to that described for Pros-/- embryos. In conclusion, like complete ProS deficiency, double deficiency in ProS and Gas6 was lethal, whereas partial ProS deficiency was not. Mice partially deficient in ProS displayed a prothrombotic phenotype, including those deficient only in pProS.
Purpura fulminans did not occur spontaneously in mice with partial ProS deficiency but developed upon warfarin administration. Thus, the use of different mouse models of ProS deficiency can be instrumental in the study of its highly variable thrombotic phenotype and in the investigation of additional roles of ProS in inflammation and autoimmunity through TAM signaling.
Abstract:
Evaluating other individuals with respect to personality characteristics plays a crucial role in human relations and is the focus of attention for research in diverse fields such as psychology and interactive computer systems. In psychology, face perception has been recognized as a key component of this evaluation system. Multiple studies suggest that observers use face information to infer personality characteristics. Interactive computer systems are trying to take advantage of these findings and apply them to make interaction more natural and to improve system performance. Here, we experimentally test whether the automatic prediction of facial trait judgments (e.g. dominance) can be made by using the full appearance information of the face and whether a reduced representation of its structure is sufficient. We evaluate two separate approaches: a holistic representation model using the facial appearance information and a structural model constructed from the relations among facial salient points. State-of-the-art machine learning methods are applied to a) derive a facial trait judgment model from training data and b) predict a facial trait value for any face. Furthermore, we address the issue of whether there are specific structural relations among facial points that predict perception of facial traits. Experimental results over a set of labeled data (9 different trait evaluations) and classification rules (4 rules) suggest that a) prediction of perception of facial traits is learnable by both holistic and structural approaches; b) the most reliable prediction of facial trait judgments is obtained by certain types of holistic descriptions of the face appearance; and c) for some traits such as attractiveness and extroversion, there are relationships between specific structural features and social perceptions.
Abstract:
The human body is composed of a huge number of cells acting together in a concerted manner. The current understanding is that proteins perform most of the activities necessary to keep a cell alive. The DNA, on the other hand, stores in the genome the information on how to produce the different proteins. Regulating gene transcription is thus the first important step that can affect the life of a cell and modify its functions and its responses to the environment. Regulation is a complex operation that involves specialized proteins, the transcription factors. Transcription factors (TFs) can bind to DNA and activate the processes leading to the expression of genes into new proteins. Errors in this process may lead to diseases. In particular, some transcription factors have been associated with a lethal pathological state, commonly known as cancer, associated with uncontrolled cellular proliferation, invasiveness of healthy tissues and abnormal responses to stimuli. Understanding cancer-related regulatory programs is a difficult task, often involving several TFs interacting together and influencing each other's activity. This thesis presents new computational methodologies to study gene regulation. In addition, we present applications of our methods to the understanding of cancer-related regulatory programs. The understanding of transcriptional regulation is a major challenge. We address this difficult question by combining computational approaches with large collections of heterogeneous experimental data. In detail, we design signal processing tools to recover transcription factor binding sites on the DNA from genome-wide surveys like chromatin immunoprecipitation assays on tiling arrays (ChIP-chip). We then use the information about where TFs bind to explain expression levels of regulated genes. In this way we identify a regulatory synergy between two TFs, the oncogene C-MYC and SP1.
C-MYC and SP1 bind preferentially at promoters, and when SP1 binds next to C-MYC on the DNA, the nearby gene is strongly expressed. The association between the two TFs at promoters is reflected by the conservation of the binding sites across mammals and by the permissive underlying chromatin states; it represents an important control mechanism involved in cellular proliferation, and thereby in cancer. Secondly, we identify the characteristics of the target genes of the TF estrogen receptor alpha (hERa) and we study the influence of hERa in regulating transcription. hERa, upon hormone estrogen signaling, binds to DNA to regulate transcription of its targets in concert with its co-factors. To overcome the scarcity of experimental data about the binding sites of other TFs that may interact with hERa, we conduct an in silico analysis of the sequences underlying the ChIP sites using the collection of position weight matrices (PWMs) of the hERa partners, the TFs FOXA1 and SP1. We combine ChIP-chip and ChIP-paired-end-diTags (ChIP-pet) data about hERa binding on DNA with the sequence information to explain gene expression levels in a large collection of cancer tissue samples and also in studies of the response of cells to estrogen. We confirm that hERa binding sites are distributed throughout the genome. However, we distinguish between binding sites near promoters and binding sites along the transcripts. The first group shows weak binding of hERa and high occurrence of SP1 motifs, in particular near estrogen-responsive genes. The second group shows strong binding of hERa and significant correlation between the number of binding sites along a gene and the strength of gene induction in the presence of estrogen. Some binding sites of the second group also show the presence of FOXA1, but the role of this TF still needs to be investigated. Different mechanisms have been proposed to explain hERa-mediated induction of gene expression.
Our work supports the model of hERa activating gene expression from distal binding sites by interacting with promoter-bound TFs, like SP1. hERa has been associated with survival rates of breast cancer patients, though explanatory models are still incomplete: this result is important to better understand how hERa can control gene expression. Thirdly, we address the difficult question of regulatory network inference. We tackle this problem by analyzing time series of biological measurements such as quantification of mRNA levels or protein concentrations. Our approach uses the well-established penalized linear regression models, in which we impose sparseness on the connectivity of the regulatory network. We extend this method by enforcing the coherence of the regulatory dependencies: a TF must coherently behave as an activator, or as a repressor, on all its targets. This requirement is implemented as constraints on the signs of the regressed coefficients in the penalized linear regression model. Our approach is better at reconstructing meaningful biological networks than previous methods based on penalized regression. The method is tested on the DREAM2 challenge of reconstructing a five-gene/TF regulatory network, obtaining the best performance in the "undirected signed excitatory" category. Thus, these bioinformatics methods, which are reliable, interpretable and fast enough to cover large biological datasets, have enabled us to better understand gene regulation in humans.
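The sign constraints described above can be sketched with a small coordinate-descent solver for an L1-penalized least-squares problem. This is an illustration under strong assumptions rather than the method of the thesis: the sign of each regressor is taken as given in advance, a single target is regressed, and the function name `sign_constrained_lasso`, the toy data and the penalty value are all hypothetical.

```python
def sign_constrained_lasso(X, y, signs, lam, n_iter=200):
    """Minimize 0.5*||y - X b||^2 + lam*sum|b_j| subject to signs[j]*b_j >= 0.

    X is a list of rows, signs[j] is +1 (activator) or -1 (repressor).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X b while b is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            z = sum(v * v for v in col)
            if z == 0.0:
                continue
            # Correlation of column j with the residual that ignores beta[j].
            rho = sum(col[i] * residual[i] for i in range(n)) + z * beta[j]
            if signs[j] > 0:
                new = max(0.0, rho - lam) / z  # soft-threshold, clipped at zero
            else:
                new = min(0.0, rho + lam) / z
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= col[i] * delta
                beta[j] = new
    return beta


# Toy data generated as y = 2*x1 - 1*x2 (hypothetical regulator activities).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2.0, -1.0, 1.0, 3.0]
coherent = sign_constrained_lasso(X, y, signs=[+1, -1], lam=0.1)
forced = sign_constrained_lasso(X, y, signs=[+1, +1], lam=0.1)
print(coherent, forced)
```

When the imposed signs match the data, the fit stays close to the generating coefficients apart from the shrinkage due to the penalty. When a regressor is forced to the wrong sign, its coefficient cannot cross zero, which is how such a constraint removes incoherent edges from an inferred network.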
Abstract:
This study aims to improve the accuracy of AASHTO Mechanistic-Empirical Pavement Design Guide (MEPDG) pavement performance predictions for Iowa pavement systems through local calibration of MEPDG prediction models. A total of 130 representative pavement sites across Iowa were selected. The selected pavement sites represent flexible, rigid, and composite pavement systems throughout Iowa. The required MEPDG inputs and the historical performance data for the selected sites were extracted from a variety of sources. The accuracy of the nationally-calibrated MEPDG prediction models for Iowa conditions was evaluated. The local calibration factors of MEPDG performance prediction models were identified to improve the accuracy of model predictions. The identified local calibration coefficients are presented with other significant findings and recommendations for use in MEPDG/DARWin-ME for Iowa pavement systems.
Abstract:
The temporal dynamics of species diversity are shaped by variations in the rates of speciation and extinction, and there is a long history of inferring these rates using first and last appearances of taxa in the fossil record. Understanding diversity dynamics critically depends on unbiased estimates of the unobserved times of speciation and extinction for all lineages, but the inference of these parameters is challenging due to the complex nature of the available data. Here, we present a new probabilistic framework to jointly estimate species-specific times of speciation and extinction and the rates of the underlying birth-death process based on the fossil record. The rates are allowed to vary through time independently of each other, and the probability of preservation and sampling is explicitly incorporated in the model to estimate the true lifespan of each lineage. We implement a Bayesian algorithm to assess the presence of rate shifts by exploring alternative diversification models. Tests on a range of simulated data sets reveal the accuracy and robustness of our approach against violations of the underlying assumptions and various degrees of data incompleteness. Finally, we demonstrate the application of our method with the diversification of the mammal family Rhinocerotidae and reveal a complex history of repeated and independent temporal shifts of both speciation and extinction rates, leading to the expansion and subsequent decline of the group. The estimated parameters of the birth-death process implemented here are directly comparable with those obtained from dated molecular phylogenies. Thus, our model represents a step towards integrating phylogenetic and fossil information to infer macroevolutionary processes.