85 results for Mathematics, interdisciplinary applications


Relevance: 80.00%

Abstract:

Categorical data cannot be interpolated directly because they are outcomes of discrete random variables. The categories are therefore transformed into indicator functions that can be handled by interpolation methods, and the interpolated indicator values are then back-transformed to the original categories. However, aspects such as the variability and uncertainty of interpolated values of categorical data have never been considered. In this paper we show that the interpolation variance can be used to map an uncertainty zone around boundaries between categories. Moreover, it is shown that the interpolation variance is a component of the total variance of the categorical variables, as measured by the coefficient of unalikeability. (C) 2011 Elsevier Ltd. All rights reserved.
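
The coefficient of unalikeability mentioned above has a simple closed form, 1 - sum_k p_k^2 over the category proportions p_k. The sketch below is illustrative only (hypothetical data and function names, not the paper's geostatistical code): it shows the indicator transform of a categorical sample and this coefficient in Python.

```python
# Minimal sketch (not the paper's implementation): indicator transform of a
# categorical sample and the coefficient of unalikeability, computed here as
# 1 - sum_k p_k^2 over the category proportions p_k.
import numpy as np

def indicator_transform(labels, categories):
    """Return an (n_samples, n_categories) 0/1 matrix of indicator values."""
    labels = np.asarray(labels)
    return np.stack([(labels == c).astype(float) for c in categories], axis=1)

def coefficient_of_unalikeability(labels):
    """Fraction of pairs of observations that differ: 1 - sum of squared proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

soil_types = ["clay", "sand", "clay", "silt", "sand", "clay"]
ind = indicator_transform(soil_types, ["clay", "sand", "silt"])
print(ind.mean(axis=0))                      # per-category proportions (the interpolation targets)
print(coefficient_of_unalikeability(soil_types))
```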

Relevance: 80.00%

Abstract:

This paper presents a survey of evolutionary algorithms designed for decision-tree induction. Most of the paper focuses on approaches that evolve decision trees as an alternative heuristic to the traditional top-down divide-and-conquer approach. Additionally, we present alternative methods that use evolutionary algorithms to improve particular components of decision-tree classifiers. The paper's original contributions are the following. First, it provides an up-to-date overview that is fully focused on evolutionary algorithms and decision trees and does not concentrate on any specific evolutionary approach. Second, it provides a taxonomy that covers both works that evolve complete decision trees and works that design decision-tree components with evolutionary algorithms. Finally, a number of references are provided that describe applications of evolutionary algorithms for decision-tree induction in different domains. At the end of this paper, we address some important issues and open questions that can be the subject of future research.
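
As a purely illustrative sketch of what "evolving decision trees" involves (not an algorithm from the surveyed literature; all names and data below are hypothetical), the snippet encodes a tree individual as nested dictionaries and scores it by training accuracy, the kind of fitness function such evolutionary approaches typically employ.

```python
# Illustrative only: a decision-tree "individual" and an accuracy-based fitness,
# the building blocks a tree-evolving algorithm would combine with selection,
# crossover and mutation operators.
import random

def random_tree(n_features, depth=2):
    """Grow a random axis-aligned decision tree as nested dicts."""
    if depth == 0:
        return {"leaf": random.randint(0, 1)}
    return {"feature": random.randrange(n_features),
            "threshold": random.random(),
            "left": random_tree(n_features, depth - 1),
            "right": random_tree(n_features, depth - 1)}

def predict(tree, x):
    while "leaf" not in tree:
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]

def fitness(tree, X, y):
    """Fraction of training examples classified correctly."""
    return sum(predict(tree, xi) == yi for xi, yi in zip(X, y)) / len(y)

X = [[random.random(), random.random()] for _ in range(100)]
y = [int(xi[0] > 0.5) for xi in X]
population = [random_tree(n_features=2) for _ in range(20)]
best = max(population, key=lambda t: fitness(t, X, y))
print(fitness(best, X, y))
```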

Relevance: 80.00%

Abstract:

This article describes a real-world production planning and scheduling problem occurring at an integrated pulp and paper (P&P) mill, which manufactures paper for cardboard out of produced pulp. During the cooking of wood chips in the digester, two by-products are obtained: the pulp itself (virgin fibers) and the waste stream known as black liquor. The former is mixed with recycled fibers and processed in a paper machine; here, because of significant sequence-dependent setups in paper type changeovers, the sizing and sequencing of lots have to be decided simultaneously in order to use capacity efficiently. The latter is converted into electrical energy using a set of evaporators, recovery boilers and counter-pressure turbines. The planning challenge is to synchronize the material flow as it moves through the pulp mill, the paper mill and the energy plant, maximizing the customer demand that is met (backlogging is allowed) and minimizing operating costs. Because P&P production is capital intensive, the output of the digester must be maximized. As the production bottleneck is not fixed, we propose a new model that, for the first time, integrates the critical production units associated with the pulp mill, the paper mill and the energy plant. Simple stochastic mixed integer programming-based local search heuristics are developed to obtain good feasible solutions for the problem. The benefits of integrating the three stages are discussed, and the proposed approaches are tested on real-world data. Our work may help P&P companies to increase their competitiveness and responsiveness in dealing with demand pattern oscillations. (C) 2012 Elsevier Ltd. All rights reserved.
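
A heavily simplified, hypothetical sketch of one local search ingredient mentioned above: it only reorders lots on a single paper machine to reduce sequence-dependent setup time, ignoring lot sizing, the digester and the energy plant; the setup values and all names are invented for illustration.

```python
# Hedged sketch (not the authors' heuristic): a simple stochastic local search
# over a paper-machine lot sequence, minimizing sequence-dependent setup time.
import random

setup = {("A", "B"): 3, ("B", "A"): 5, ("A", "C"): 2, ("C", "A"): 4,
         ("B", "C"): 1, ("C", "B"): 6}

def total_setup(sequence):
    """Sum of setup times over consecutive paper-type changeovers."""
    return sum(setup[(a, b)] for a, b in zip(sequence, sequence[1:]) if a != b)

def local_search(lots, iterations=1000, seed=0):
    rng = random.Random(seed)
    best = lots[:]
    for _ in range(iterations):
        i, j = rng.sample(range(len(best)), 2)      # random swap move
        cand = best[:]
        cand[i], cand[j] = cand[j], cand[i]
        if total_setup(cand) < total_setup(best):   # accept improving moves only
            best = cand
    return best

lots = ["A", "B", "A", "C", "B", "C"]
best = local_search(lots)
print(best, total_setup(best))
```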

Relevance: 80.00%

Abstract:

According to recent research carried out in the foundry sector, one of the most important concerns of these industries is to improve their production planning. A foundry production plan involves two dependent stages: (1) determining the alloys to be melted and (2) determining the lots that will be produced. The purpose of this study is to draw up minimum-cost production plans for the lot-sizing problem faced by small foundries. As suggested in the literature, the proposed heuristic addresses the two stages hierarchically: first the alloys are determined and, subsequently, the items to be produced from them. We propose a knapsack problem as a tool to determine the items to be produced from each furnace load, and a genetic algorithm to explore candidate sets of alloys and to determine the production plan for a small foundry. Our method attempts to overcome the difficulties in finding good production plans exhibited by the method proposed in the literature. The computational experiments show that the proposed methods give better results than those reported in the literature. Furthermore, the proposed methods do not require commercial software, which is favorable for small foundries. (C) 2010 Elsevier Ltd. All rights reserved.
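
For the knapsack subproblem, a minimal dynamic-programming sketch is shown below (illustrative only; the weights, values and capacity are hypothetical and the paper's exact formulation may differ): items consume part of one furnace load of a given alloy and contribute a value such as met demand.

```python
# Classical 0/1 knapsack by dynamic programming: choose which items to cast
# from one furnace load; returns the best total value and the chosen items.
def knapsack(weights, values, capacity):
    n = len(weights)
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]
            if weights[i - 1] <= c:
                best[i][c] = max(best[i][c],
                                 best[i - 1][c - weights[i - 1]] + values[i - 1])
    chosen, c = [], capacity
    for i in range(n, 0, -1):                      # backtrack the chosen items
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= weights[i - 1]
    return best[n][capacity], chosen[::-1]

# Hypothetical item weights (kg of alloy) and values against a 500 kg load.
print(knapsack(weights=[120, 300, 180, 90], values=[8, 19, 13, 6], capacity=500))
```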

Relevance: 80.00%

Abstract:

Two versions of the threshold contact process, ordinary and conservative, are studied on a square lattice. In the first, particles are created on active sites, those having at least two nearest-neighbor sites occupied, and are annihilated spontaneously. In the conservative version, a particle jumps from its site to an active site. Mean-field analysis suggests the existence of a first-order phase transition, which is confirmed by Monte Carlo simulations. In the thermodynamic limit, the two versions are found to give the same results. (C) 2012 Elsevier B.V. All rights reserved.
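
A rough, hypothetical Monte Carlo sketch of the ordinary version follows (the update probabilities, lattice size and rates are a simplification of my own, not the paper's simulation protocol): empty sites with at least two occupied nearest neighbors may gain a particle, and occupied sites lose theirs spontaneously.

```python
# Rough sketch of the ordinary threshold contact process on an L x L square
# lattice with periodic boundaries; illustrative dynamics only.
import numpy as np

def step(lattice, creation_rate, rng):
    L = lattice.shape[0]
    i, j = rng.integers(0, L, size=2)                    # pick a random site
    occupied_neighbors = (lattice[(i + 1) % L, j] + lattice[(i - 1) % L, j] +
                          lattice[i, (j + 1) % L] + lattice[i, (j - 1) % L])
    if lattice[i, j] == 1 and rng.random() < 1.0 / (1.0 + creation_rate):
        lattice[i, j] = 0                                # spontaneous annihilation
    elif lattice[i, j] == 0 and occupied_neighbors >= 2:
        if rng.random() < creation_rate / (1.0 + creation_rate):
            lattice[i, j] = 1                            # creation on an active site

rng = np.random.default_rng(1)
lattice = (rng.random((64, 64)) < 0.5).astype(int)
for _ in range(100_000):
    step(lattice, creation_rate=2.0, rng=rng)
print("density:", lattice.mean())
```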

Relevance: 80.00%

Abstract:

Dengue has become a global public health threat, with over 100 million infections annually, and to date there is no specific vaccine or antiviral drug. The structures of the envelope (E) proteins of the four known serotypes of the dengue virus (DENV) are already known, but there are insufficient molecular details of their structural behavior in solution under the distinct environmental conditions to which the DENVs are subjected, from the digestive tract of the mosquito up to replication inside the host cell. Such detailed knowledge is important because of the multifunctional character of the E protein: it mediates the early events of cell entry via receptor-mediated endocytosis and, as a class II fusion protein, plays a decisive role in the process of membrane fusion. The proposed infection mechanism asserts that once in the endosome, at low pH, the E homodimers dissociate and insert into the endosomal lipid membrane after an extensive conformational change, mainly in the relative arrangement of their three domains. In this work we employ all-atom explicit-solvent Molecular Dynamics simulations to determine the thermodynamic conditions under which the E protein is induced to undergo extensive structural changes, such as during the process of decreasing pH. We study the structural behavior of the E protein monomer in acid-pH solutions of distinct ionic strength. Extensive simulations are carried out with all the histidine residues in their fully protonated form at four distinct ionic strengths. The results are analyzed in detail from structural and energetic perspectives, and the large-scale protein motions are described by means of principal component analysis. As the main result, we find that at acid pH and physiological ionic strength the E protein undergoes a major structural change; for lower or higher ionic strengths, the crystal structure is essentially maintained throughout all of the extensive simulations. On the other hand, at basic pH, when all histidine residues are in the unprotonated form, the protein structure is very stable for ionic strengths ranging from 0 to 225 mM. Therefore, our findings support the hypothesis that the histidines constitute the hot spots that induce conformational changes of the E protein at acid pH, and they give extra motivation to the development of new ideas for antiviral compound design.
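
Only the principal component step lends itself to a short sketch; the snippet below uses synthetic coordinates and hypothetical sizes (the MD trajectories themselves would come from a dedicated simulation package) to show how collective motions are extracted from the covariance of atomic coordinates.

```python
# Sketch of the principal-component (essential dynamics) analysis step only.
# Frames are flattened to (n_frames, 3 * n_atoms) and PCA is done on the covariance.
import numpy as np

rng = np.random.default_rng(0)
n_frames, n_atoms = 500, 100
traj = rng.normal(size=(n_frames, n_atoms * 3))           # stand-in for aligned C-alpha coordinates

centered = traj - traj.mean(axis=0)
cov = np.cov(centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)                     # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]
explained = eigvals[order] / eigvals.sum()
projections = centered @ eigvecs[:, order[:2]]             # motion along the two largest PCs
print("variance captured by PC1, PC2:", explained[:2])
print(projections.shape)
```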

Relevance: 80.00%

Abstract:

For any continuous baseline distribution G, Cordeiro and de Castro [A new family of generalized distributions, J. Statist. Comput. Simul. 81 (2011), pp. 883-898] proposed a new generalized distribution (denoted here with the prefix 'Kw-G' (Kumaraswamy-G)) with two extra positive parameters. They studied some of its mathematical properties and presented special sub-models. We derive a simple representation for the Kw-G density function as a linear combination of exponentiated-G distributions. Some new distributions are proposed as sub-models of this family, for example, the Kw-Chen [Z.A. Chen, A new two-parameter lifetime distribution with bathtub shape or increasing failure rate function, Statist. Probab. Lett. 49 (2000), pp. 155-161], Kw-XTG [M. Xie, Y. Tang, and T.N. Goh, A modified Weibull extension with bathtub-shaped failure rate function, Reliab. Eng. System Safety 76 (2002), pp. 279-285] and Kw-Flexible Weibull [M. Bebbington, C.D. Lai, and R. Zitikis, A flexible Weibull extension, Reliab. Eng. System Safety 92 (2007), pp. 719-726]. New properties of the Kw-G distribution are derived, including asymptotes, shapes, moments, the moment generating function, mean deviations, Bonferroni and Lorenz curves, reliability, Rényi entropy and Shannon entropy. New properties of the order statistics are investigated. We discuss estimation of the parameters by maximum likelihood. We provide two applications to real data sets and discuss a bivariate extension of the Kw-G distribution.
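
For reference, the Kw-G construction of Cordeiro and de Castro and the binomial expansion behind the linear-combination representation can be written as follows (standard notation; a and b are the two extra shape parameters).

```latex
% Kw-G cdf and density for a baseline cdf G(x) with density g(x) and a, b > 0:
\[
  F(x) = 1 - \bigl[1 - G(x)^{a}\bigr]^{b}, \qquad
  f(x) = a\,b\,g(x)\,G(x)^{a-1}\bigl[1 - G(x)^{a}\bigr]^{b-1}.
\]
% Expanding [1 - G(x)^a]^{b-1} binomially gives the linear combination of
% exponentiated-G densities mentioned in the abstract:
\[
  f(x) = \sum_{i=0}^{\infty} w_{i}\, g_{a(i+1)}(x), \qquad
  w_{i} = \frac{(-1)^{i}\, b}{i+1}\binom{b-1}{i}, \qquad
  g_{c}(x) = c\, g(x)\, G(x)^{c-1}.
\]
```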

Relevance: 80.00%

Abstract:

This article introduces generalized beta-generated (GBG) distributions. Sub-models include all classical beta-generated, Kumaraswamy-generated and exponentiated distributions. They are maximum entropy distributions under three intuitive conditions, which show that the classical beta generator skewness parameters only control tail entropy and an additional shape parameter is needed to add entropy to the centre of the parent distribution. This parameter controls skewness without necessarily differentiating tail weights. The GBG class also has tractable properties: we present various expansions for moments, generating function and quantiles. The model parameters are estimated by maximum likelihood and the usefulness of the new class is illustrated by means of some real data sets. (c) 2011 Elsevier B.V. All rights reserved.
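
In one common parameterization (written here for orientation; the article's own notation may differ), the GBG density for a baseline cdf G with density g and shapes a, b, c > 0 is:

```latex
% Generalized beta-generated density; B(a,b) is the beta function.
\[
  f(x) = \frac{c}{B(a,b)}\, g(x)\, G(x)^{ac-1}\bigl[1 - G(x)^{c}\bigr]^{b-1}.
\]
% c = 1 recovers the classical beta-generated class, a = 1 the Kumaraswamy-generated
% class, and b = c = 1 the exponentiated-G class.
```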

Relevance: 80.00%

Abstract:

Lemonte and Cordeiro [Birnbaum-Saunders nonlinear regression models, Comput. Stat. Data Anal. 53 (2009), pp. 4441-4452] introduced a class of Birnbaum-Saunders (BS) nonlinear regression models potentially useful in lifetime data analysis. We give a general matrix Bartlett correction formula to improve the likelihood ratio (LR) tests in these models. The formula is simple enough to be used analytically to obtain several closed-form expressions in special cases. Our results generalize those in Lemonte et al. [Improved likelihood inference in Birnbaum-Saunders regressions, Comput. Stat. Data Anal. 54 (2010), pp. 1307-1316], which hold only for BS linear regression models. Monte Carlo simulations show that the corrected tests work better than the usual LR tests.
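
For orientation, the generic shape of a Bartlett correction is sketched below; the article's contribution is the matrix formula that yields the correction term for BS nonlinear regressions, whereas the display here is only the standard general recipe.

```latex
% If LR tests q restrictions and, under H0, E(LR) = q(1 + c) + O(n^{-2}),
% the Bartlett-corrected statistic
\[
  LR^{*} = \frac{LR}{1 + c}
\]
% is better approximated by the chi-square distribution with q degrees of
% freedom than the uncorrected LR statistic.
```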

Relevance: 80.00%

Abstract:

We report a morphology-based approach for the automatic identification of outlier neurons, as well as its application to the NeuroMorpho.org database, with more than 5,000 neurons. Each neuron in a given analysis is represented by a feature vector composed of 20 measurements, which are then projected into a two-dimensional space by applying principal component analysis. Bivariate kernel density estimation is then used to obtain the probability distribution for the group of cells, so that the cells with highest probabilities are understood as archetypes while those with the smallest probabilities are classified as outliers. The potential of the methodology is illustrated in several cases involving uniform cell types as well as cell types for specific animal species. The results provide insights regarding the distribution of cells, yielding single and multi-variate clusters, and they suggest that outlier cells tend to be more planar and tortuous. The proposed methodology can be used in several situations involving one or more categories of cells, as well as for detection of new categories and possible artifacts.
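
A compact sketch of this pipeline on synthetic data is given below (hypothetical feature values; real use would start from measurements extracted from NeuroMorpho.org reconstructions): PCA to two dimensions, bivariate kernel density estimation, and thresholding of the estimated densities.

```python
# Sketch of the outlier pipeline: 20 features per neuron -> PCA to 2-D ->
# bivariate kernel density -> lowest-density cells flagged as outliers.
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
features = rng.normal(size=(500, 20))                 # stand-in for 20 measurements per neuron

projected = PCA(n_components=2).fit_transform(features)
kde = gaussian_kde(projected.T)                       # bivariate kernel density estimate
density = kde(projected.T)                            # estimated density at each cell

cutoff = np.quantile(density, 0.05)                   # e.g. lowest 5% of densities
outliers = np.where(density <= cutoff)[0]
archetypes = np.argsort(density)[-5:]                 # highest-density cells
print(len(outliers), "outliers; archetype indices:", archetypes)
```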

Relevance: 80.00%

Abstract:

This paper introduces a skewed log-Birnbaum-Saunders regression model based on the skewed sinh-normal distribution proposed by Leiva et al. [A skewed sinh-normal distribution and its properties and application to air pollution, Comm. Statist. Theory Methods 39 (2010), pp. 426-443]. Some influence methods, such as local influence and generalized leverage, are presented, and we derive the normal curvatures of local influence under several perturbation schemes. An empirical application to a real data set illustrates the usefulness of the proposed model.
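
For background (a standard result of Rieck and Nedelman, not a derivation from this paper), the log-BS relationship that such regression models build on is:

```latex
% If T ~ BS(alpha, eta), then Y = log T follows a sinh-normal law, i.e.
\[
  Z = \frac{2}{\alpha}\,\sinh\!\left(\frac{Y - \mu}{2}\right) \sim N(0,1), \qquad \mu = \log\eta .
\]
% The skewed variant replaces the N(0,1) law of Z by a skew-normal distribution,
% which is what the skewed sinh-normal construction of Leiva et al. amounts to.
```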

Relevance: 80.00%

Abstract:

Intron splicing is one of the most important steps in the maturation of a pre-mRNA. Although the sequence profiles around the splice sites have been studied extensively, the level of sequence identity between the exonic sequences preceding the donor sites and the intronic sequences preceding the acceptor sites has not been examined as thoroughly. In this study we investigated identity patterns between the last 15 nucleotides of the exonic sequence preceding the 5' splice site and of the intronic sequence preceding the 3' splice site in a set of human protein-coding genes that do not exhibit intron retention. We found that almost 60% of consecutive exons and introns in human protein-coding genes share at least two identical nucleotides at their 3' ends and that, on average, the length of the identical sequence is 2.47 nucleotides. Based on our findings we conclude that the 3' ends of exons and introns tend to have longer identical sequences within a gene than when taken from different genes. Our results hold even if the pairs are non-consecutive in the transcription order. (C) 2012 Elsevier Ltd. All rights reserved.
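
The core measurement reduces to the length of the identical suffix shared by an exon and an intron within their last 15 nucleotides; a minimal sketch with hypothetical sequences is:

```python
# Length of the identical 3'-end sequence shared by an exon and an intron,
# compared within their last `window` nucleotides.
def shared_3prime_identity(exon, intron, window=15):
    exon_end, intron_end = exon[-window:], intron[-window:]
    length = 0
    for a, b in zip(reversed(exon_end), reversed(intron_end)):
        if a != b:
            break
        length += 1
    return length

print(shared_3prime_identity("GATTACAGGTCCCAG", "TTTCTTTTTTTCTAG"))  # -> 2 ("AG")
```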

Relevance: 80.00%

Abstract:

Item response theory (IRT) comprises a set of statistical models which are useful in many fields, especially when there is interest in studying latent variables (or latent traits). Usually such latent traits are assumed to be random variables and a convenient distribution is assigned to them; a very common choice has been the standard normal. Recently, Azevedo et al. [Bayesian inference for a skew-normal IRT model under the centred parameterization, Comput. Stat. Data Anal. 55 (2011), pp. 353-365] proposed a skew-normal distribution under the centred parameterization (SNCP), as studied in [R.B. Arellano-Valle and A. Azzalini, The centred parametrization for the multivariate skew-normal distribution, J. Multivariate Anal. 99(7) (2008), pp. 1362-1382], to model the latent trait distribution. This approach can represent any asymmetric behaviour of the latent trait distribution. They also developed a Metropolis-Hastings within Gibbs sampling (MHWGS) algorithm based on the density of the SNCP and showed that the algorithm recovers all parameters properly. Their results indicated that, in the presence of asymmetry, the proposed model and estimation algorithm perform better than the usual model and estimation methods. Our main goal in this paper is to propose another type of MHWGS algorithm based on a stochastic representation (hierarchical structure) of the SNCP studied in [N. Henze, A probabilistic representation of the skew-normal distribution, Scand. J. Statist. 13 (1986), pp. 271-275]. Our algorithm has only one Metropolis-Hastings step, in contrast to the algorithm developed by Azevedo et al., which has two such steps. This not only makes the implementation easier but also reduces the number of proposal densities to be used, which can be a problem in the implementation of MHWGS algorithms, as can be seen in [R.J. Patz and B.W. Junker, A straightforward approach to Markov chain Monte Carlo methods for item response models, J. Educ. Behav. Stat. 24(2) (1999), pp. 146-178; R.J. Patz and B.W. Junker, The applications and extensions of MCMC in IRT: Multiple item types, missing data, and rated responses, J. Educ. Behav. Stat. 24(4) (1999), pp. 342-366; A. Gelman, G.O. Roberts, and W.R. Gilks, Efficient Metropolis jumping rules, Bayesian Stat. 5 (1996), pp. 599-607]. Moreover, we consider a modified beta prior (which generalizes the one considered in [3]) and a Jeffreys prior for the asymmetry parameter, study the sensitivity of these priors, and examine the use of different kernel densities for this parameter. Finally, we assess the impact of the number of examinees, the number of items and the asymmetry level on parameter recovery. Results of the simulation study indicate that our approach performs as well as that in [3] in terms of parameter recovery, mainly when the Jeffreys prior is used. They also indicate that the asymmetry level has the highest impact on parameter recovery, even though it is relatively small. A real data analysis is presented jointly with the development of model-fit assessment tools, and the results are compared with those obtained by Azevedo et al. They indicate that the hierarchical approach makes MCMC algorithms easier to implement, facilitates convergence diagnostics and can be very useful for fitting more complex skew IRT models.
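
For reference, the stochastic representation of the skew-normal law due to Henze, which underlies the hierarchical structure described above, is (standard notation, with delta derived from the shape parameter lambda):

```latex
% Henze (1986): for Z1, Z2 independent N(0,1) and delta = lambda / sqrt(1 + lambda^2),
\[
  X = \delta\,|Z_1| + \sqrt{1-\delta^{2}}\,Z_2 \;\sim\; \mathrm{SN}(\lambda).
\]
% Conditioning on the latent |Z_1| makes X conditionally normal, which is the
% hierarchical (latent-variable) structure the proposed sampler exploits; the
% centred parameterization is a reparameterization layered on top of this.
```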

Relevance: 80.00%

Abstract:

For the first time, we introduce a generalized form of the exponentiated generalized gamma distribution [Cordeiro et al., The exponentiated generalized gamma distribution with application to lifetime data, J. Statist. Comput. Simul. 81 (2011), pp. 827-842], which serves as the baseline for the log-exponentiated generalized gamma regression model. The new distribution can accommodate increasing, decreasing, bathtub-shaped and unimodal hazard functions. A second advantage is that it includes classical distributions reported in the lifetime literature as special cases. We obtain explicit expressions for the moments of the baseline distribution of the new regression model. The proposed model can be applied to censored data since it includes several widely known regression models as sub-models, and it can therefore be used more effectively in the analysis of survival data. We obtain maximum likelihood estimates for the model parameters from censored data and show that our extended regression model is very useful by means of two applications to real data.
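
In one common parameterization (shown for orientation only; the article's own notation and parameter names may differ), the exponentiated generalized gamma baseline can be written as:

```latex
% Starting from Stacy's generalized gamma with scale alpha and shapes tau, k,
% whose cdf is a regularized incomplete gamma ratio,
\[
  F_{GG}(t) = \frac{\gamma\!\bigl(k, (t/\alpha)^{\tau}\bigr)}{\Gamma(k)}, \qquad
  F_{EGG}(t) = \bigl[F_{GG}(t)\bigr]^{\lambda}, \qquad
  f_{EGG}(t) = \lambda\, f_{GG}(t)\,\bigl[F_{GG}(t)\bigr]^{\lambda-1},
\]
% where gamma(.,.) is the lower incomplete gamma function and f_GG = dF_GG/dt;
% the extra shape lambda, together with tau and k, yields the flexible hazard
% shapes mentioned above.
```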

Relevance: 80.00%

Abstract:

In protein databases there is a substantial number of proteins that have been structurally determined but lack function annotation. Understanding the relationship between function and structure can be useful for predicting function on a large scale. We have analyzed the similarities in global physicochemical parameters for a set of enzymes which were classified according to the four Enzyme Commission (EC) hierarchical levels. Using relevance theory we introduced a distance between proteins in the space of physicochemical characteristics; this was done by minimizing a cost function of the metric tensor built to reflect the EC classification system. Using an unsupervised clustering method on a set of 1025 enzymes, we obtained no relevant cluster formation compatible with the EC classification. The distributions of distances between enzymes from the same EC group and from different EC groups were compared by histograms, and the same analysis was performed using sequence alignment similarity as a distance. Our results suggest that global structure parameters are not sufficient to segregate enzymes according to the EC hierarchy, indicating that the features essential for function are local rather than global. Consequently, methods for predicting function based on global attributes should not achieve high accuracy in predicting main EC classes without relying on similarities between enzymes from the training and validation datasets. Furthermore, these results are consistent with a substantial number of studies suggesting that function evolves fundamentally by recruitment, i.e., the same protein motif or fold can be used to perform different enzymatic functions and a few specific amino acids (AAs) are actually responsible for enzyme activity. These essential amino acids should belong to active sites, and an effective method for predicting function should be able to recognize them. (C) 2012 Elsevier Ltd. All rights reserved.
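
A toy sketch of the within-class versus between-class distance comparison described above is given below (synthetic features and EC labels, and a plain Euclidean distance rather than the paper's relevance-theory metric):

```python
# Compare distances between enzymes of the same EC class and of different EC
# classes in a global feature space; with uninformative features the two
# distance distributions largely overlap, as in the negative result above.
import numpy as np

rng = np.random.default_rng(0)
n_enzymes, n_features = 200, 10
features = rng.normal(size=(n_enzymes, n_features))    # stand-in for global descriptors
ec_class = rng.integers(1, 7, size=n_enzymes)          # stand-in for top-level EC classes

dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
i, j = np.triu_indices(n_enzymes, k=1)
same = dists[i, j][ec_class[i] == ec_class[j]]
diff = dists[i, j][ec_class[i] != ec_class[j]]

print("mean within-class distance:  %.3f" % same.mean())
print("mean between-class distance: %.3f" % diff.mean())
```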