14 resultados para model order estimation
em Biblioteca Digital da Produção Intelectual da Universidade de São Paulo
Resumo:
The starting point of this article is the question "How to retrieve fingerprints of rhythm in written texts?" We address this problem in the case of Brazilian and European Portuguese. These two dialects of Modern Portuguese share the same lexicon and most of the sentences they produce are superficially identical. Yet they are conjectured, on linguistic grounds, to implement different rhythms. We show that this linguistic question can be formulated as a problem of model selection in the class of variable length Markov chains. To carry on this approach, we compare texts from European and Brazilian Portuguese. These texts are previously encoded according to some basic rhythmic features of the sentences which can be automatically retrieved. This is an entirely new approach from the linguistic point of view. Our statistical contribution is the introduction of the smallest maximizer criterion which is a constant free procedure for model selection. As a by-product, this provides a solution for the problem of optimal choice of the penalty constant when using the BIC to select a variable length Markov chain. Besides proving the consistency of the smallest maximizer criterion when the sample size diverges, we also make a simulation study comparing our approach with both the standard BIC selection and the Peres-Shields order estimation. Applied to the linguistic sample constituted for our case study, the smallest maximizer criterion assigns different context-tree models to the two dialects of Portuguese. The features of the selected models are compatible with current conjectures discussed in the linguistic literature.
Resumo:
Item response theory (IRT) comprises a set of statistical models which are useful in many fields, especially when there is an interest in studying latent variables (or latent traits). Usually such latent traits are assumed to be random variables and a convenient distribution is assigned to them. A very common choice for such a distribution has been the standard normal. Recently, Azevedo et al. [Bayesian inference for a skew-normal IRT model under the centred parameterization, Comput. Stat. Data Anal. 55 (2011), pp. 353-365] proposed a skew-normal distribution under the centred parameterization (SNCP) as had been studied in [R. B. Arellano-Valle and A. Azzalini, The centred parametrization for the multivariate skew-normal distribution, J. Multivariate Anal. 99(7) (2008), pp. 1362-1382], to model the latent trait distribution. This approach allows one to represent any asymmetric behaviour concerning the latent trait distribution. Also, they developed a Metropolis-Hastings within the Gibbs sampling (MHWGS) algorithm based on the density of the SNCP. They showed that the algorithm recovers all parameters properly. Their results indicated that, in the presence of asymmetry, the proposed model and the estimation algorithm perform better than the usual model and estimation methods. Our main goal in this paper is to propose another type of MHWGS algorithm based on a stochastic representation (hierarchical structure) of the SNCP studied in [N. Henze, A probabilistic representation of the skew-normal distribution, Scand. J. Statist. 13 (1986), pp. 271-275]. Our algorithm has only one Metropolis-Hastings step, in opposition to the algorithm developed by Azevedo et al., which has two such steps. This not only makes the implementation easier but also reduces the number of proposal densities to be used, which can be a problem in the implementation of MHWGS algorithms, as can be seen in [R.J. Patz and B.W. Junker, A straightforward approach to Markov Chain Monte Carlo methods for item response models, J. Educ. Behav. Stat. 24(2) (1999), pp. 146-178; R. J. Patz and B. W. Junker, The applications and extensions of MCMC in IRT: Multiple item types, missing data, and rated responses, J. Educ. Behav. Stat. 24(4) (1999), pp. 342-366; A. Gelman, G.O. Roberts, and W.R. Gilks, Efficient Metropolis jumping rules, Bayesian Stat. 5 (1996), pp. 599-607]. Moreover, we consider a modified beta prior (which generalizes the one considered in [3]) and a Jeffreys prior for the asymmetry parameter. Furthermore, we study the sensitivity of such priors as well as the use of different kernel densities for this parameter. Finally, we assess the impact of the number of examinees, number of items and the asymmetry level on the parameter recovery. Results of the simulation study indicated that our approach performed equally as well as that in [3], in terms of parameter recovery, mainly using the Jeffreys prior. Also, they indicated that the asymmetry level has the highest impact on parameter recovery, even though it is relatively small. A real data analysis is considered jointly with the development of model fitting assessment tools. The results are compared with the ones obtained by Azevedo et al. The results indicate that using the hierarchical approach allows us to implement MCMC algorithms more easily, it facilitates diagnosis of the convergence and also it can be very useful to fit more complex skew IRT models.
Discriminating Different Classes of Biological Networks by Analyzing the Graphs Spectra Distribution
Resumo:
The brain's structural and functional systems, protein-protein interaction, and gene networks are examples of biological systems that share some features of complex networks, such as highly connected nodes, modularity, and small-world topology. Recent studies indicate that some pathologies present topological network alterations relative to norms seen in the general population. Therefore, methods to discriminate the processes that generate the different classes of networks (e. g., normal and disease) might be crucial for the diagnosis, prognosis, and treatment of the disease. It is known that several topological properties of a network (graph) can be described by the distribution of the spectrum of its adjacency matrix. Moreover, large networks generated by the same random process have the same spectrum distribution, allowing us to use it as a "fingerprint". Based on this relationship, we introduce and propose the entropy of a graph spectrum to measure the "uncertainty" of a random graph and the Kullback-Leibler and Jensen-Shannon divergences between graph spectra to compare networks. We also introduce general methods for model selection and network model parameter estimation, as well as a statistical procedure to test the nullity of divergence between two classes of complex networks. Finally, we demonstrate the usefulness of the proposed methods by applying them to (1) protein-protein interaction networks of different species and (2) on networks derived from children diagnosed with Attention Deficit Hyperactivity Disorder (ADHD) and typically developing children. We conclude that scale-free networks best describe all the protein-protein interactions. Also, we show that our proposed measures succeeded in the identification of topological changes in the network while other commonly used measures (number of edges, clustering coefficient, average path length) failed.
Resumo:
This paper provides additional validation to the problem of estimating wave spectra based on the first-order motions of a moored vessel. Prior investigations conducted by the authors have attested that even a large-volume ship, such as an FPSO unit, could be adopted for on-board estimation of the wave field. The obvious limitation of the methodology concerns filtering of high-frequency wave components, for which the vessel has no significant response. As a result, the estimation range is directly dependent on the characteristics of the vessel response. In order to extend this analysis, further small-scale tests were performed with a model of a pipe-laying crane-barge. When compared to the FPSO case, the results attest that a broader range of typical sea states can be accurately estimated, including crossed-sea states with low peak periods. (C) 2012 Elsevier Ltd. All rights reserved.
Enhancement of Nematic Order and Global Phase Diagram of a Lattice Model for Coupled Nematic Systems
Resumo:
We use an infinite-range Maier-Saupe model, with two sets of local quadrupolar variables and restricted orientations, to investigate the global phase diagram of a coupled system of two nematic subsystems. The free energy and the equations of state are exactly calculated by standard techniques of statistical mechanics. The nematic-isotropic transition temperature of system A increases with both the interaction energy among mesogens of system B, and the two-subsystem coupling J. This enhancement of the nematic phase is manifested in a global phase diagram in terms of the interaction parameters and the temperature T. We make some comments on the connections of these results with experimental findings for a system of diluted ferroelectric nanoparticles embedded in a nematic liquid-crystalline environment.
Resumo:
An extension of some standard likelihood based procedures to heteroscedastic nonlinear regression models under scale mixtures of skew-normal (SMSN) distributions is developed. This novel class of models provides a useful generalization of the heteroscedastic symmetrical nonlinear regression models (Cysneiros et al., 2010), since the random term distributions cover both symmetric as well as asymmetric and heavy-tailed distributions such as skew-t, skew-slash, skew-contaminated normal, among others. A simple EM-type algorithm for iteratively computing maximum likelihood estimates of the parameters is presented and the observed information matrix is derived analytically. In order to examine the performance of the proposed methods, some simulation studies are presented to show the robust aspect of this flexible class against outlying and influential observations and that the maximum likelihood estimates based on the EM-type algorithm do provide good asymptotic properties. Furthermore, local influence measures and the one-step approximations of the estimates in the case-deletion model are obtained. Finally, an illustration of the methodology is given considering a data set previously analyzed under the homoscedastic skew-t nonlinear regression model. (C) 2012 Elsevier B.V. All rights reserved.
Resumo:
Many recent survival studies propose modeling data with a cure fraction, i.e., data in which part of the population is not susceptible to the event of interest. This event may occur more than once for the same individual (recurrent event). We then have a scenario of recurrent event data in the presence of a cure fraction, which may appear in various areas such as oncology, finance, industries, among others. This paper proposes a multiple time scale survival model to analyze recurrent events using a cure fraction. The objective is analyzing the efficiency of certain interventions so that the studied event will not happen again in terms of covariates and censoring. All estimates were obtained using a sampling-based approach, which allows information to be input beforehand with lower computational effort. Simulations were done based on a clinical scenario in order to observe some frequentist properties of the estimation procedure in the presence of small and moderate sample sizes. An application of a well-known set of real mammary tumor data is provided.
Resumo:
The objective of this study was to investigate, in a population of crossbred cattle, the obtainment of the non-additive genetic effects for the characteristics weight at 205 and 390 days and scrotal circumference, and to evaluate the consideration of these effects in the prediction of breeding values of sires using different estimation methodologies. In method 1, the data were pre-adjusted for the non-additive effects obtained by least squares means method in a model that considered the direct additive, maternal and non-additive fixed genetic effects, the direct and total maternal heterozygosities, and epistasis. In method 2, the non-additive effects were considered covariates in genetic model. Genetic values for adjusted and non-adjusted data were predicted considering additive direct and maternal effects, and for weight at 205 days, also the permanent environmental effect, as random effects in the model. The breeding values of the categories of sires considered for the weight characteristic at 205 days were organized in files, in order to verify alterations in the magnitude of the predictions and ranking of animals in the two methods of correction data for the non-additives effects. The non-additive effects were not similar in magnitude and direction in the two estimation methods used, nor for the characteristics evaluated. Pearson and Spearman correlations between breeding values were higher than 0.94, and the use of different methods does not imply changes in the selection of animals.
Resumo:
This paper introduces a skewed log-Birnbaum-Saunders regression model based on the skewed sinh-normal distribution proposed by Leiva et al. [A skewed sinh-normal distribution and its properties and application to air pollution, Comm. Statist. Theory Methods 39 (2010), pp. 426-443]. Some influence methods, such as the local influence and generalized leverage, are presented. Additionally, we derived the normal curvatures of local influence under some perturbation schemes. An empirical application to a real data set is presented in order to illustrate the usefulness of the proposed model.
Resumo:
Long-term survival models have historically been considered for analyzing time-to-event data with long-term survivors fraction. However, situations in which a fraction (1 - p) of systems is subject to failure from independent competing causes of failure, while the remaining proportion p is cured or has not presented the event of interest during the time period of the study, have not been fully considered in the literature. In order to accommodate such situations, we present in this paper a new long-term survival model. Maximum likelihood estimation procedure is discussed as well as interval estimation and hypothesis tests. A real dataset illustrates the methodology.
Resumo:
In savannah and tropical grasslands, which account for 60% of grasslands worldwide, a large share of ecosystem carbon is located below ground due to high root:shoot ratios. Temporal variations in soil CO2 efflux (R-S) were investigated in a grassland of coastal Congo over two years. The objectives were (1) to identify the main factors controlling seasonal variations in R-S and (2) to develop a semi-empirical model describing R-S and including a heterotrophic component (R-H) and an autotrophic component (R-A). Plant above-ground activity was found to exert strong control over soil respiration since 71% of seasonal R-S variability was explained by the quantity of photosynthetically active radiation absorbed (APAR) by the grass canopy. We tested an additive model including a parameter enabling R-S partitioning into R-A and R-H. Assumptions underlying this model were that R-A mainly depended on the amount of photosynthates allocated below ground and that microbial and root activity was mostly controlled by soil temperature and soil moisture. The model provided a reasonably good prediction of seasonal variations in R-S (R-2 = 0.85) which varied between 5.4 mu mol m(-2) s(-1) in the wet season and 0.9 mu mol m(-2) s(-1) at the end of the dry season. The model was subsequently used to obtain annual estimates of R-S, R-A and R-H. In accordance with results reported for other tropical grasslands, we estimated that R-H accounted for 44% of R-S, which represented a flux similar to the amount of carbon brought annually to the soil from below-ground litter production. Overall, this study opens up prospects for simulating the carbon budget of tropical grasslands on a large scale using remotely sensed data. (C) 2012 Elsevier B.V. All rights reserved.
Resumo:
The objective of this paper is to model variations in test-day milk yields of first lactations of Holstein cows by RR using B-spline functions and Bayesian inference in order to fit adequate and parsimonious models for the estimation of genetic parameters. They used 152,145 test day milk yield records from 7317 first lactations of Holstein cows. The model established in this study was additive, permanent environmental and residual random effects. In addition, contemporary group and linear and quadratic effects of the age of cow at calving were included as fixed effects. Authors modeled the average lactation curve of the population with a fourth-order orthogonal Legendre polynomial. They concluded that a cubic B-spline with seven random regression coefficients for both the additive genetic and permanent environment effects was to be the best according to residual mean square and residual variance estimates. Moreover they urged a lower order model (quadratic B-spline with seven random regression coefficients for both random effects) could be adopted because it yielded practically the same genetic parameter estimates with parsimony. (C) 2012 Elsevier B.V. All rights reserved.
Resumo:
Abstract Background Blood leukocytes constitute two interchangeable sub-populations, the marginated and circulating pools. These two sub-compartments are found in normal conditions and are potentially affected by non-normal situations, either pathological or physiological. The dynamics between the compartments is governed by rate constants of margination (M) and return to circulation (R). Therefore, estimates of M and R may prove of great importance to a deeper understanding of many conditions. However, there has been a lack of formalism in order to approach such estimates. The few attempts to furnish an estimation of M and R neither rely on clearly stated models that precisely say which rate constant is under estimation nor recognize which factors may influence the estimation. Results The returning of the blood pools to a steady-state value after a perturbation (e.g., epinephrine injection) was modeled by a second-order differential equation. This equation has two eigenvalues, related to a fast- and to a slow-component of the dynamics. The model makes it possible to identify that these components are partitioned into three constants: R, M and SB; where SB is a time-invariant exit to tissues rate constant. Three examples of the computations are worked and a tentative estimation of R for mouse monocytes is presented. Conclusions This study establishes a firm theoretical basis for the estimation of the rate constants of the dynamics between the blood sub-compartments of white cells. It shows, for the first time, that the estimation must also take into account the exit to tissues rate constant, SB.
Resumo:
Despite the great importance of soybeans in Brazil, there have been few applications of soybean crop modeling on Brazilian conditions. Thus, the objective of this study was to use modified crop models to estimate the depleted and potential soybean crop yield in Brazil. The climatic variable data used in the modified simulation of the soybean crop models were temperature, insolation and rainfall. The data set was taken from 33 counties (28 Sao Paulo state counties, and 5 counties from other states that neighbor São Paulo). Among the models, modifications in the estimation of the leaf area of the soybean crop, which includes corrections for the temperature, shading, senescence, CO2, and biomass partition were proposed; also, the methods of input for the model's simulation of the climatic variables were reconsidered. The depleted yields were estimated through a water balance, from which the depletion coefficient was estimated. It can be concluded that the adaptation soybean growth crop model might be used to predict the results of the depleted and potential yield of soybeans, and it can also be used to indicate better locations and periods of tillage.