900 results for computational statistics
Abstract:
Motivation: A consensus sequence for a family of related sequences is, as the name suggests, a sequence that captures the features common to most members of the family. Consensus sequences are important in various DNA sequencing applications and are a convenient way to characterize a family of molecules. Results: This paper describes a new algorithm for finding a consensus sequence, using the popular optimization method known as simulated annealing. Unlike the conventional approach of finding a consensus sequence by first forming a multiple sequence alignment, this algorithm searches for a sequence that minimises the sum of pairwise distances to each of the input sequences. The resulting consensus sequence can then be used to induce a multiple sequence alignment. The time required by the algorithm scales linearly with the number of input sequences and quadratically with the length of the consensus sequence. We present results demonstrating the high quality of the consensus sequences and alignments produced by the new algorithm. For comparison, we also present similar results obtained using ClustalW. The new algorithm outperforms ClustalW in many cases.
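The search procedure described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the move set (random substitutions, insertions, and deletions), the linear cooling schedule, and the use of plain Levenshtein distance are all assumptions made for the example.

```python
import math
import random

def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def cost(candidate, seqs):
    # Objective: sum of distances from the candidate to every input sequence.
    return sum(edit_distance(candidate, s) for s in seqs)

def anneal_consensus(seqs, alphabet="ACGT", steps=2000, t0=2.0, seed=0):
    rng = random.Random(seed)
    cand = list(seqs[0])                  # start from an arbitrary input
    cur_cost = cost(cand, seqs)
    best, best_cost = cand[:], cur_cost
    for k in range(steps):
        temp = t0 * (1 - k / steps) + 1e-9        # linear cooling schedule
        new = cand[:]
        op = rng.choice(("sub", "ins", "del")) if len(new) > 1 else "ins"
        pos = rng.randrange(len(new))
        if op == "sub":
            new[pos] = rng.choice(alphabet)
        elif op == "ins":
            new.insert(pos, rng.choice(alphabet))
        else:
            del new[pos]
        new_cost = cost(new, seqs)
        # Metropolis rule: accept improvements always, worsenings sometimes.
        if (new_cost <= cur_cost
                or rng.random() < math.exp((cur_cost - new_cost) / temp)):
            cand, cur_cost = new, new_cost
            if cur_cost < best_cost:
                best, best_cost = cand[:], cur_cost
    return "".join(best), best_cost
```

Each cost evaluation is linear in the number of sequences and quadratic in sequence length, matching the scaling stated in the abstract.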
Abstract:
We present a novel maximum-likelihood-based algorithm for estimating the distribution of alignment scores from the scores of unrelated sequences in a database search. Using a new method for measuring the accuracy of p-values, we show that our maximum-likelihood-based algorithm is more accurate than existing regression-based and lookup table methods. We explore a more sophisticated way of modeling and estimating the score distributions (using a two-component mixture model and expectation maximization), but conclude that this does not improve significantly over simply ignoring scores with small E-values during estimation. Finally, we measure the classification accuracy of p-values estimated in different ways and observe that inaccurate p-values can, somewhat paradoxically, lead to higher classification accuracy. We explain this paradox and argue that statistical accuracy, not classification accuracy, should be the primary criterion in comparisons of similarity search methods that return p-values that adjust for target sequence length.
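The abstract does not give the parametric form of the score distribution; as a generic illustration, the sketch below fits a Gumbel (extreme-value) distribution, commonly assumed for alignment scores of unrelated sequences, by maximum likelihood using the standard fixed-point iteration for the scale parameter, then converts a score into a p-value.

```python
import math

def fit_gumbel(scores, iters=100):
    # Maximum-likelihood fit of a Gumbel (extreme-value) distribution
    # via the standard fixed-point iteration for the scale parameter.
    n = len(scores)
    mean = sum(scores) / n
    beta = (max(scores) - min(scores)) / 4 or 1.0   # crude starting value
    for _ in range(iters):
        w = [math.exp(-x / beta) for x in scores]
        beta = mean - sum(x * wi for x, wi in zip(scores, w)) / sum(w)
    mu = -beta * math.log(sum(math.exp(-x / beta) for x in scores) / n)
    return mu, beta

def p_value(score, mu, beta):
    # Right-tail probability P(X >= score) under the fitted distribution.
    return 1.0 - math.exp(-math.exp(-(score - mu) / beta))
```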
Abstract:
In computer simulations of smooth dynamical systems, the original phase space is replaced by machine arithmetic, which is a finite set. The resulting spatially discretized dynamical systems do not inherit all functional properties of the original systems, such as surjectivity and existence of absolutely continuous invariant measures. This can lead to computational collapse to fixed points or short cycles. The paper studies loss of such properties in spatial discretizations of dynamical systems induced by unimodal mappings of the unit interval. The problem reduces to studying set-valued negative semitrajectories of the discretized system. As the grid is refined, the asymptotic behavior of the cardinality structure of the semitrajectories follows probabilistic laws corresponding to a branching process. The transition probabilities of this process are explicitly calculated. These results are illustrated by the example of the discretized logistic mapping.
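The collapse to fixed points or short cycles under spatial discretization is easy to observe numerically. The sketch below (an illustration, not the paper's method) iterates the logistic map on a uniform grid, rounding each image back onto the grid, and reports the transient length and the eventual cycle.

```python
def discretized_logistic_orbit(start, n_grid, r=4.0):
    # Iterate the logistic map on the uniform grid {0, 1/n, ..., 1},
    # rounding every image back onto the grid (spatial discretization).
    seen = {}                 # grid index -> step at which first visited
    orbit, i = [], start
    while i not in seen:
        seen[i] = len(orbit)
        orbit.append(i)
        x = i / n_grid
        i = round(r * x * (1.0 - x) * n_grid)
    transient = seen[i]       # steps taken before entering the cycle
    cycle = orbit[transient:] # the eventual (often very short) cycle
    return transient, cycle
```

Because the discretized phase space is finite, every orbit is eventually periodic; refining the grid and tabulating cycle lengths over many starting points gives the cardinality statistics the paper studies.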
Abstract:
Motivation: A major issue in cell biology today is how distinct intracellular regions of the cell, like the Golgi Apparatus, maintain their unique composition of proteins and lipids. The cell differentially separates Golgi resident proteins from proteins that move through the organelle to other subcellular destinations. We set out to determine if we could distinguish these two types of transmembrane proteins using computational approaches. Results: A new method has been developed to predict Golgi membrane proteins based on their transmembrane domains. To establish the prediction procedure, we took the hydrophobicity values and frequencies of different residues within the transmembrane domains into consideration. A simple linear discriminant function was developed with a small number of parameters derived from a dataset of Type II transmembrane proteins of known localization. It discriminates between proteins destined for the Golgi apparatus and those destined for other (post-Golgi) locations with success rates of 89.3% and 85.2%, respectively, on our redundancy-reduced data sets.
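The paper's actual parameters and features are not reproduced here; as a hedged sketch, a simple two-class linear discriminant of the kind described (few parameters, pooled per-feature variance) might look like the following, where the feature vectors (e.g. mean hydrophobicity and a residue frequency) are hypothetical.

```python
import statistics

def train_linear_discriminant(pos, neg):
    # Fisher-style linear discriminant for two classes, using a pooled
    # per-feature variance estimate (a small number of parameters).
    d = len(pos[0])
    mp = [statistics.mean(x[j] for x in pos) for j in range(d)]
    mn = [statistics.mean(x[j] for x in neg) for j in range(d)]
    var = [statistics.variance([x[j] for x in pos + neg]) for j in range(d)]
    w = [(mp[j] - mn[j]) / var[j] for j in range(d)]           # weights
    b = -0.5 * sum(w[j] * (mp[j] + mn[j]) for j in range(d))   # threshold
    return w, b

def classify(x, w, b):
    # True -> first class (e.g. Golgi), False -> second (post-Golgi).
    return sum(wj * xj for wj, xj in zip(w, x)) + b > 0
```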
Abstract:
We consider a mixture model approach to the regression analysis of competing-risks data. Attention is focused on inference concerning the effects of factors on both the probability of occurrence and the hazard rate conditional on each of the failure types. These two quantities are specified in the mixture model using the logistic model and the proportional hazards model, respectively. We propose a semi-parametric mixture method to estimate the logistic and regression coefficients jointly, whereby the component-baseline hazard functions are completely unspecified. Estimation is by maximum likelihood, based on the full likelihood and implemented via an expectation-conditional maximization (ECM) algorithm. Simulation studies are performed to compare the performance of the proposed semi-parametric method with a fully parametric mixture approach. The results show that when the component-baseline hazard is monotonic increasing, the semi-parametric and fully parametric mixture approaches are comparable for mildly and moderately censored samples. When the component-baseline hazard is not monotonic increasing, the semi-parametric method consistently provides less biased estimates than a fully parametric approach and is comparable in efficiency in the estimation of the parameters for all levels of censoring. The methods are illustrated using a real data set of prostate cancer patients treated with different dosages of the drug diethylstilbestrol. Copyright (C) 2003 John Wiley & Sons, Ltd.
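The semi-parametric ECM procedure itself is involved; as a deliberately simplified, fully parametric illustration (exponential component hazards, a constant mixing proportion, no covariates — all assumptions made for this example, not the paper's model), the EM updates for competing-risks data with right censoring can be sketched as follows.

```python
import math

def em_competing_risks(data, iters=200):
    # data: list of (time, cause); cause is 1 or 2, or 0 if right-censored.
    # Simplified fully parametric model: exponential hazards lam1, lam2
    # for the two failure types and a constant mixing proportion p.
    p, lam1, lam2 = 0.5, 1.0, 2.0
    for _ in range(iters):
        # E-step: responsibility of component 1 for every observation.
        r = []
        for t, cause in data:
            if cause == 1:
                r.append(1.0)
            elif cause == 2:
                r.append(0.0)
            else:        # censored: survival functions weight the posterior
                s1 = p * math.exp(-lam1 * t)
                s2 = (1.0 - p) * math.exp(-lam2 * t)
                r.append(s1 / (s1 + s2))
        # M-step: closed-form updates given the responsibilities.
        p = sum(r) / len(r)
        ev1 = sum(1 for _, c in data if c == 1)
        ev2 = sum(1 for _, c in data if c == 2)
        lam1 = ev1 / sum(ri * t for ri, (t, _) in zip(r, data))
        lam2 = ev2 / sum((1.0 - ri) * t for ri, (t, _) in zip(r, data))
    return p, lam1, lam2
```

Only censored observations have an unknown failure type here, so the E-step reduces to a posterior weighting by the component survival functions.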
Abstract:
Signal peptides and transmembrane helices both contain a stretch of hydrophobic amino acids. This common feature makes it difficult for signal peptide and transmembrane helix predictors to correctly assign identity to stretches of hydrophobic residues near the N-terminal methionine of a protein sequence. The inability to reliably distinguish between N-terminal transmembrane helix and signal peptide is an error with serious consequences for the prediction of protein secretory status or transmembrane topology. In this study, we report a new method for differentiating protein N-terminal signal peptides and transmembrane helices. Based on the sequence features extracted from hydrophobic regions (amino acid frequency, hydrophobicity, and the start position), we set up discriminant functions and examined them on non-redundant datasets with jackknife tests. This method can incorporate other signal peptide prediction methods and achieve higher prediction accuracy. For Gram-negative bacterial proteins, 95.7% of N-terminal signal peptides and transmembrane helices can be correctly predicted (coefficient 0.90). Given a sensitivity of 90%, transmembrane helices can be identified from signal peptides with a precision of 99% (coefficient 0.92). For eukaryotic proteins, 94.2% of N-terminal signal peptides and transmembrane helices can be correctly predicted with coefficient 0.83. Given a sensitivity of 90%, transmembrane helices can be identified from signal peptides with a precision of 87% (coefficient 0.85). The method can be used to complement current transmembrane protein prediction and signal peptide prediction methods to improve their prediction accuracies. (C) 2003 Elsevier Inc. All rights reserved.
Abstract:
Coffee cultivation is one of the agribusiness activities of greatest socioeconomic importance among the different activities linked to world agricultural trade. One of the greatest contributions of quantitative genetics to genetic improvement is the possibility of predicting genetic gains. When different selection criteria are considered, predicting the gains associated with each criterion is of great importance, as it guides breeders on how to use the available genetic material so as to obtain the maximum possible gains for the traits of interest. The present study was established in July 2004 at the Bananal do Norte Experimental Farm, run by Incaper, in the district of Pacotuba, municipality of Cachoeiro de Itapemirim, in the southern region of the state, with the objective of selecting the best plants between and within half-sib progenies of Coffea canephora using different selection criteria. Individual and joint analyses of variance were performed for 26 half-sib progenies of Coffea canephora. The experimental design was randomized blocks with four additional checks, four replicates, and plots of five plants, at a spacing of 3.0 m x 1.2 m. This study considered data from the last five harvests. The traits measured were: flowering, maturation, grain size, weight, plant stature, vigor, rust, Cercospora leaf spot, branch dieback, general score, percentage of floater fruits, and coffee leaf miner. All statistical analyses were performed with the computational application in genetics and statistics (GENES). Selection gains were estimated using a selection intensity of 20% between and within progenies, kept the same for all traits.
All traits were selected in the positive direction, except flowering, plant stature, rust, Cercospora leaf spot, branch dieback, percentage of floater fruits, and coffee leaf miner, for which a decrease from their original means was sought. The selection criteria studied were: conventional between- and within-family selection, combined selection index, mass selection, and stratified mass selection. This dissertation comprises two chapters, in which biometric analyses were carried out, including the estimation of genetic parameters. For most of the traits studied, significant differences (P<0.05) were found among genotypes which, together with the genotypic coefficients of variation, the genotypic coefficient of determination, and the CVg/CVe ratio, indicate the existence of genetic variability in the genetic materials for most traits and favorable conditions for obtaining genetic gains through selection. These traits were also correlated. The data were subjected to analyses of variance and multivariate analysis, applying clustering with UPGMA, mean comparison tests, and correlation studies. In the clustering analysis, the generalized Mahalanobis distance was used as the dissimilarity measure, and the Tocher method was used to delimit the groups. Genetic diversity was found for the traits associated with physiological quality, seed reserve mobilization, and seedling dimensions and biomass. Four groups of genotypes could be formed. Seed dry matter weight, seed reserve reduction, and seedling dry matter weight are positively correlated with one another, while seed reserve reduction and the efficiency of converting those reserves into seedlings are negatively correlated.
According to the results obtained, all traits showed different levels of genetic variability, and the selection criteria used proved efficient for breeding; the combined selection index was the criterion that gave the best results in terms of gains and is indicated as the most appropriate criterion for the genetic improvement of the population studied. In the correlation studies, the phenotypic correlation exceeded the genotypic correlation in 70% of the cases, showing a greater influence of environmental factors relative to genotypic ones, with conditions nonetheless favorable for the improvement of the various traits. In the genetic divergence study, the clustering of genotypes by the Tocher technique distributed the genotypes into three groups.
Computational evaluation of hydraulic system behaviour with entrapped air under rapid pressurization
Abstract:
The pressurization of hydraulic systems containing entrapped air is considered a critical condition for the infrastructure's security because of the transient pressure variations that often occur. The objective of the present study is the computational evaluation of trends observed in the variation of the maximum surge pressure resulting from rapid pressurizations. The results are also compared with those obtained in previous studies. A brief state of the art in this domain is presented. This research is applied to an experimental system with entrapped air at the top of a vertical pipe section. The evaluation is carried out with the elastic model based on the method of characteristics, considering a moving liquid boundary, and the results are compared with those obtained with the rigid liquid column model.
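The rigid liquid column model mentioned in this abstract admits a compact numerical sketch. All parameter values below are hypothetical, and the scheme (explicit Euler, Darcy-Weisbach friction, polytropic air pocket) is an illustration rather than the study's implementation.

```python
def rigid_column_surge(p_res=2.0e5, p_air0=1.0e5, L=10.0, A=0.01,
                       V_air0=0.05, n=1.2, rho=1000.0, f=0.02, D=0.1,
                       dt=1e-4, t_end=2.0):
    # Rigid liquid column compressing an entrapped air pocket.
    # Air follows a polytropic law p * V**n = const; the column obeys
    # Newton's law with Darcy-Weisbach friction. Returns the peak
    # air-pocket pressure (the maximum surge pressure).
    u = x = 0.0                       # column velocity and displacement
    const = p_air0 * V_air0 ** n
    p_max = p_air0
    for _ in range(int(t_end / dt)):
        V_air = V_air0 - A * x        # pocket shrinks as the column advances
        if V_air <= 1e-9:
            break
        p_air = const / V_air ** n
        fric = f * (L / D) * 0.5 * rho * u * abs(u)
        u += (p_res - p_air - fric) / (rho * L) * dt   # explicit Euler
        x += u * dt
        p_max = max(p_max, p_air)
    return p_max
```

With low friction the column overshoots its equilibrium position, so the peak air pressure exceeds the driving pressure — the surge behaviour the study quantifies.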
Abstract:
Power system planning, control and operation require an adequate use of existing resources so as to increase system efficiency. The use of optimal solutions in power systems allows huge savings, stressing the need for adequate optimization and control methods. These must be able to solve the envisaged optimization problems in time scales compatible with operational requirements. Power systems are complex, uncertain and changing environments that make the use of traditional optimization methodologies impracticable in most real situations. Computational intelligence methods present good characteristics to address this kind of problem and have already proved to be efficient for very diverse power system optimization problems. Evolutionary computation, fuzzy systems, swarm intelligence, artificial immune systems, neural networks, and hybrid approaches are presently seen as the most adequate methodologies to address several planning, control and operation problems in power systems. Future power systems, with intensive use of distributed generation and electricity market liberalization, increase power system complexity and bring huge challenges to the forefront of the power industry. Decentralized intelligence and decision making require more effective optimization and control techniques, so that the involved players can make the most adequate use of existing resources in the new context. The application of computational intelligence methods to several problems of future power systems is presented in this chapter. Four different applications are presented to illustrate the promise and potential of computational intelligence.
Abstract:
Quinoxaline and its derivatives are an important class of heterocyclic compounds, in which the elements N, S and O replace carbon atoms in the ring. The molecular formula of quinoxaline is C8H6N2, formed by two aromatic rings, benzene and pyrazine. It is rare in the natural state, but its synthesis is easy to carry out. Modifications to the quinoxaline structure provide a wide variety of compounds and activities, such as antimicrobial, antiparasitic, antidiabetic, antiproliferative, anti-inflammatory, anticancer, antiglaucoma and antidepressant activities, including AMPA receptor antagonism. These compounds are also important in the industrial field owing, for example, to their power to inhibit metal corrosion. Computational chemistry, a natural branch of theoretical chemistry, is a well-developed method used to represent molecular structures, simulating their behavior with the equations of quantum and classical physics. A wide variety of software tools for computational chemistry is available on the market, allowing the calculation of energies, geometries, vibrational frequencies, transition states, reaction pathways, excited states and a variety of properties based on various uncorrelated and correlated wave functions. In this sense, their application to the study of quinoxalines is important for determining their chemical characteristics, allowing a more complete analysis in less time and at lower cost.
Abstract:
The present study, covering students from public schools and a private school on the island of São Miguel (Azores, Portugal), aims to identify the difficulties of 3rd- and 4th-year primary school students in solving tasks involving the construction, reading and interpretation of tables and statistical graphs, in the context of Organization and Data Handling (ODH). We present the main results obtained with statistical methods, among which we highlight several non-parametric hypothesis tests and Categorical Principal Component Analysis (CatPCA), given the nature of the variables included in the questionnaire (mostly nominal and ordinal).
Abstract:
Microarrays allow thousands of genes to be monitored simultaneously, quantifying the abundance of their transcripts under the same experimental condition at the same time. Among the various available array technologies, two-channel cDNA microarray experiments have arisen in numerous technical protocols associated with genomic studies, and these are the focus of this work. Microarray experiments involve many steps, and each one can affect the quality of the raw data. Background correction and normalization are preprocessing techniques for cleaning and correcting the raw data when undesirable fluctuations arise from technical factors. Several recent studies have shown that no preprocessing strategy outperforms the others in all circumstances, so it seems difficult to provide general recommendations. In this work, we propose using exploratory techniques to visualize the effects of preprocessing methods on the statistical analysis of two-channel cancer microarray data sets in which the cancer types (classes) are known. The arrow plot was used to select differentially expressed genes, and the graph of profiles resulting from correspondence analysis was used to visualize the results. Six background-correction methods and six normalization methods were combined, yielding 36 preprocessing strategies, which were analyzed on a published cDNA microarray database (Liver), available at http://genome-www5.stanford.edu/, whose microarrays were already classified by cancer type. All statistical analyses were performed using the R statistical software.
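As a toy illustration of the preprocessing steps this abstract discusses (not any of the 36 specific method combinations used in the study), the sketch below applies simple background subtraction followed by global median normalization of the log-ratios for one two-channel array.

```python
import math

def preprocess_two_channel(R, G, Rb, Gb):
    # Background-correct each channel, then apply global median
    # normalization to the log-ratios M = log2(R / G).
    Rc = [max(r - rb, 0.5) for r, rb in zip(R, Rb)]   # floor avoids log(<=0)
    Gc = [max(g - gb, 0.5) for g, gb in zip(G, Gb)]
    M = [math.log2(rc / gc) for rc, gc in zip(Rc, Gc)]
    med = sorted(M)[len(M) // 2]                      # median log-ratio
    return [m - med for m in M]
```

Median normalization centres the log-ratio distribution at zero under the usual assumption that most genes are not differentially expressed.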
Abstract:
Computational Intelligence (CI) includes four main areas: Evolutionary Computation (genetic algorithms and genetic programming), Swarm Intelligence, Fuzzy Systems and Neural Networks. This article shows how CI techniques go beyond the strict limits of the Artificial Intelligence field and can help solve real problems from distinct engineering areas: Mechanical, Computer Science and Electrical Engineering.
Abstract:
This paper presents a computational tool (PHEx) developed in Excel VBA for solving sizing and rating design problems involving Chevron-type plate heat exchangers (PHE) with a 1-pass-1-pass configuration. The rating methodology used in the program is outlined, and a case study is presented to show how the program can be used to perform sensitivity analyses on several dimensional parameters of the PHE and to observe their effect on the transferred heat and pressure drop.
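The rating calculation behind such a tool typically rests on the log-mean temperature difference (LMTD). The sketch below is a generic counter-current LMTD sizing step, shown for illustration only; it is not PHEx's actual procedure, and the example numbers are hypothetical.

```python
import math

def lmtd_counterflow(th_in, th_out, tc_in, tc_out):
    # Log-mean temperature difference for a counter-current exchanger.
    dt1 = th_in - tc_out
    dt2 = th_out - tc_in
    if abs(dt1 - dt2) < 1e-12:
        return dt1                   # limit case of equal end differences
    return (dt1 - dt2) / math.log(dt1 / dt2)

def required_area(q, u, th_in, th_out, tc_in, tc_out):
    # Sizing step: heat-transfer area A = Q / (U * LMTD).
    return q / (u * lmtd_counterflow(th_in, th_out, tc_in, tc_out))
```

A rating run inverts the same relation: given the geometry (hence area and U), it predicts the transferred heat and outlet temperatures.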