476 results for Estimateur de Bayes
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Portfolio theory is a field of study devoted to investigating investors' resource-allocation decisions. The purpose of this process is to reduce risk through diversification and thus secure a return. Nevertheless, the classical Mean-Variance (MV) model has been criticized regarding its parameters, since the variance and covariance estimates are sensitive to the market and to parameter-estimation error. To reduce estimation errors, Bayesian models offer more flexibility in modeling, being capable of incorporating quantitative and qualitative information about market behavior. Observing this, the present study aimed to formulate a new matrix model using Bayesian inference to replace the covariance matrix in the MV model, called MCB, the Bayesian covariance model. To evaluate the model, several hypotheses were analyzed using the ex post facto method and sensitivity analysis. The benchmarks used as reference were: (1) the classical Mean-Variance model, (2) the Bovespa market index, and (3) 94 investment funds. The returns earned during the period from May 2002 to December 2009 demonstrated the superiority of the MCB over the classical MV model and the Bovespa index, although it takes on slightly more diversifiable risk than the MV. The robustness analysis of the model, considering the time horizon, found returns near the Bovespa index while taking less risk than the market. Finally, with respect to Mao's index, the model showed satisfactory return and risk, especially over longer maturities. Some considerations were made, as well as suggestions for further work.
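The covariance-driven portfolio machinery the abstract refers to can be sketched as follows. This is a generic minimum-variance allocation with a simple Bayesian-flavored shrinkage blend of the sample covariance toward a prior; it is a stand-in under stated assumptions, not the MCB model from the study, and all data are simulated.

```python
import numpy as np

def min_variance_weights(cov):
    """Global minimum-variance portfolio: w = S^-1 1 / (1' S^-1 1)."""
    ones = np.ones(cov.shape[0])
    w = np.linalg.solve(cov, ones)
    return w / w.sum()

def shrunk_covariance(returns, prior_cov, weight=0.5):
    """Blend the sample covariance with a prior covariance matrix.
    Generic shrinkage sketch; the MCB matrix is not reproduced here."""
    sample_cov = np.cov(returns, rowvar=False)
    return weight * prior_cov + (1 - weight) * sample_cov

rng = np.random.default_rng(0)
returns = rng.normal(0.001, 0.02, size=(250, 4))   # 250 days, 4 assets (simulated)
prior = np.eye(4) * 0.02**2                        # diagonal prior belief (assumption)
cov = shrunk_covariance(returns, prior)
w = min_variance_weights(cov)                      # weights sum to 1
```

The `weight` parameter plays the role of confidence in the prior; a fully Bayesian treatment would derive it from the posterior rather than fix it.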
Abstract:
Since equipment maintenance is the major cost factor in industrial plants, the development of fault-prediction techniques is very important. Three-phase induction motors are key electrical equipment in industrial applications, mainly because they combine low cost with great robustness; nevertheless, they are not protected from fault types such as shorted windings and broken rotor bars. Several acquisition, processing, and signal-analysis approaches are applied to improve their diagnosis. The most efficient techniques use current sensors and current-signature analysis. In this dissertation, starting from these sensors, signal analysis is performed through Park's vector, which provides good visualization capability. Since acquiring fault data is an arduous task, a methodology for database construction is developed. Park's transform in the stationary reference frame is applied to the machine model for solving the machine's differential equations. Fault detection requires a detailed analysis of the variables and their influences, which makes the diagnosis more complex. Pattern-recognition tasks allow systems to be generated automatically, based on patterns and data concepts that are in most cases undetectable by specialists, helping decision tasks. Classification algorithms with diverse learning paradigms (k-Nearest Neighbors, Neural Networks, Decision Trees, and Naïve Bayes) are used for pattern recognition of machine faults. Multi-classifier systems are used to reduce classification errors; homogeneous (Bagging and Boosting) and heterogeneous (Vote, Stacking, and StackingC) algorithms were investigated. The results show the effectiveness of the constructed model for fault modeling, as well as the possibility of using multi-classifier algorithms for fault classification.
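Park's vector, used above for current-signature visualization, is computed from the three stator currents; for a healthy machine under a balanced supply the (id, iq) locus is a circle, and faults distort it. A minimal sketch with an illustrative 60 Hz balanced supply (amplitudes and frequency are assumptions, not from the dissertation):

```python
import numpy as np

def park_vector(ia, ib, ic):
    """Park's vector components from three-phase stator currents."""
    id_ = np.sqrt(2 / 3) * ia - ib / np.sqrt(6) - ic / np.sqrt(6)
    iq_ = ib / np.sqrt(2) - ic / np.sqrt(2)
    return id_, iq_

t = np.linspace(0, 0.1, 1000)
w = 2 * np.pi * 60                       # 60 Hz supply (illustrative)
ia = np.cos(w * t)
ib = np.cos(w * t - 2 * np.pi / 3)
ic = np.cos(w * t + 2 * np.pi / 3)
id_, iq_ = park_vector(ia, ib, ic)
radius = np.sqrt(id_**2 + iq_**2)        # constant (= sqrt(3/2)) when balanced
```

Plotting `iq_` against `id_` gives the circular pattern whose deformation is the diagnostic feature.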
Abstract:
One of the most important goals of bioinformatics is the ability to identify genes in uncharacterized DNA sequences in worldwide databases. Gene expression in prokaryotes initiates when the RNA-polymerase enzyme interacts with DNA regions called promoters, where the main regulatory elements of the transcription process are located. Despite the improvement of in vitro techniques for molecular biology analysis, characterizing and identifying a great number of promoters in a genome is a complex task. Moreover, the main drawback is the absence of a large set of promoters with which to identify conserved patterns among species. Hence, an in silico method to predict them in any species is a challenge. Improved promoter prediction methods can be one step towards developing more reliable ab initio gene prediction methods. In this work, we present an empirical comparison of Machine Learning (ML) techniques such as Naïve Bayes, Decision Trees, Support Vector Machines, Neural Networks, Voted Perceptron, PART, k-NN, and ensemble approaches (Bagging and Boosting) applied to the task of promoter prediction in Bacillus subtilis. To do so, we first built two data sets of promoter and non-promoter sequences: one for B. subtilis and a hybrid one. To evaluate the ML methods, a cross-validation procedure was applied. Good results were obtained with ML methods such as SVM and Naïve Bayes on the B. subtilis data set. However, we did not reach good results on the hybrid database.
Abstract:
Nowadays, classifying proteins into structural classes, which concerns the inference of patterns in their 3D conformation, is one of the most important open problems in Molecular Biology. The main reason for this is that the function of a protein is intrinsically related to its spatial conformation. However, such conformations are very difficult to obtain experimentally in the laboratory. Thus, this problem has drawn the attention of many researchers in Bioinformatics. Considering the great difference between the number of protein sequences already known and the number of three-dimensional structures determined experimentally, the demand for automated techniques for structural classification of proteins is very high. In this context, computational tools, especially Machine Learning (ML) techniques, have become essential to deal with this problem. In this work, ML techniques are used in the recognition of protein structural classes: Decision Trees, k-Nearest Neighbors, Naive Bayes, Support Vector Machines, and Neural Networks. These methods were chosen because they represent different learning paradigms and have been widely used in the Bioinformatics literature. Aiming to improve the performance of these techniques (individual classifiers), homogeneous (Bagging and Boosting) and heterogeneous (Voting, Stacking, and StackingC) multi-classification systems are used. Moreover, since the protein database used in this work presents the problem of imbalanced classes, artificial techniques for class balancing (Random Undersampling, Tomek Links, CNN, NCL, and OSS) are used to minimize this problem. In order to evaluate the ML methods, a cross-validation procedure is applied, in which the accuracy of the classifiers is measured using the mean classification error rate on independent test sets. These means are compared, two by two, by a hypothesis test in order to evaluate whether there is a statistically significant difference between them.
With respect to the results obtained with the individual classifiers, the Support Vector Machine presented the best accuracy. The multi-classification systems (homogeneous and heterogeneous) showed, in general, performance superior or similar to that achieved by the individual classifiers, especially Boosting with Decision Trees and StackingC with Linear Regression as the meta-classifier. The Voting method, despite its simplicity, proved adequate for solving the problem presented in this work. The class-balancing techniques, on the other hand, did not produce a significant improvement in the global classification error. Nevertheless, their use did improve the classification error for the minority class; in this context, the NCL technique proved the most appropriate.
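The evaluation protocol described above (cross-validated comparison of individual and ensemble classifiers) can be sketched with scikit-learn; the synthetic imbalanced data below is a stand-in for the protein data set, and the model list is a small illustrative subset of the methods compared:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced two-class data standing in for the protein set (assumption).
X, y = make_classification(n_samples=400, n_features=20,
                           weights=[0.9, 0.1], random_state=42)

models = {
    "naive_bayes": GaussianNB(),
    "svm": SVC(),
    "bagged_tree": BaggingClassifier(DecisionTreeClassifier(),
                                     n_estimators=25, random_state=42),
}
# Mean accuracy over 10-fold cross-validation, one score per classifier.
scores = {name: cross_val_score(m, X, y, cv=10).mean()
          for name, m in models.items()}
```

Pairwise hypothesis tests on the per-fold errors (as in the study) would then decide whether the differences between these means are statistically significant.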
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Phylogeny is one of the main activities of modern taxonomists and a way to reconstruct the history of life through comparative analysis of the sequences stored in genomes, seeking explanations for their origin and evolution. Among the sequences with a high level of conservation are the repair genes, because they are important for the conservation and maintenance of genetic stability. Hence, variations in repair genes, such as the genes of nucleotide excision repair (NER), may indicate a possible gene transfer between species. This study aimed to examine the evolutionary history of the components of the NER. For this, sequences of UVRA, UVRB, UVRC, and XPB were obtained from GenBank by Blast-p, considering 10^-15 as the cutoff, to create a database. Phylogenetic studies were done using algorithms in the PAUP, BAYES, and PHYLIP packages. Phylogenetic trees were built with protein sequences and with 16S ribosomal RNA sequences for comparative analysis by the parsimony, likelihood, and Bayesian methods. The XPB tree shows that archaeal XPB helicases are similar to eukaryotic helicases. According to these data, we infer that the eukaryotic nucleotide excision repair system first appeared in Archaea. In the UVRA, UVRB, and UVRC trees, a monophyletic group was found comprising three species of the class Epsilonproteobacteria, three species of the class Mollicutes, and archaebacteria of the classes Methanobacteria and Methanococci. This finding is supported by a tree obtained with the concatenated UVRA, UVRB, and UVRC proteins. Thus, although there are arguments in the literature defending the horizontal transfer of the uvrABC system from bacteria to archaebacteria, the analysis made in this study suggests that a vertical transfer from archaebacteria occurred for both sets of NER genes, uvrABC and the XPs.
According to parsimony, this is the most plausible scenario, given the occurrence of monophyletic groups, the divergence times of the classes, and the number of archaebacterial species with the uvrABC system.
Abstract:
The objective of research in artificial intelligence is to enable computers to execute functions that are performed by humans using knowledge and reasoning. This work was developed in the area of machine learning, a branch of artificial intelligence concerned with the design and development of algorithms and techniques that enable computational learning. The objective of this work is to analyze a feature selection method for ensemble systems. The proposed method follows the filter approach to feature selection, using variance and the Spearman correlation to rank the features, and using reward and punishment strategies to measure each feature's importance for the identification of the classes. For each ensemble, several different configurations were used, ranging from non-hybrid (homogeneous) to hybrid (heterogeneous) ensemble structures. They were submitted to five combination methods (voting, sum, weighted sum, multilayer Perceptron, and naïve Bayes), which were applied to six distinct databases (real and artificial). The classifiers applied during the experiments were k-nearest neighbors, multilayer Perceptron, naïve Bayes, and decision trees. Finally, the performance of the ensembles was analyzed comparatively using no feature selection method, using the original filter-based feature selection method, and using the proposed method. For this comparison a statistical test was applied, which demonstrated a significant improvement in the precision of the ensembles.
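A filter-style ranking of features by Spearman correlation with the class label, with variance as a tie-breaker, can be sketched as below. This is a generic sketch on simulated data; the study's reward/punishment weighting scheme is not reproduced here.

```python
import numpy as np
from scipy.stats import spearmanr

def rank_features(X, y):
    """Rank features by |Spearman correlation| with the label (filter approach);
    variance breaks ties. Returns feature indices, best first."""
    corr = np.array([abs(spearmanr(X[:, j], y)[0]) for j in range(X.shape[1])])
    var = X.var(axis=0)
    order = np.lexsort((var, corr))[::-1]   # primary key: correlation, descending
    return order, corr

rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=200)                       # binary class labels
noise = rng.normal(size=(200, 4))                      # irrelevant features
signal = y[:, None] + rng.normal(0, 0.5, size=(200, 1))  # class-correlated feature
X = np.hstack([noise, signal])                         # feature index 4 is informative
order, corr = rank_features(X, y)                      # feature 4 ranks first
```

An ensemble would then be trained on the top-k features of `order`, with k chosen by validation.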
Abstract:
We considered prediction techniques based on accelerated failure time models with random effects for correlated survival data. Besides the Bayesian approach through the empirical Bayes estimator, we also discussed the use of a classical predictor, the Empirical Best Linear Unbiased Predictor (EBLUP). In order to illustrate the use of these predictors, we considered applications to a real data set coming from the oil industry. More specifically, the data set involves the mean time between failures of petroleum-well equipment of the Bacia Potiguar. The goal of this study is to predict the risk/probability of failure in order to support a preventive maintenance program. The results show that both methods are suitable for predicting future failures, providing good decisions regarding the allocation and economy of resources for preventive maintenance.
Abstract:
A Bayesian binary regression model is developed to predict in-hospital death in patients suffering acute myocardial infarction. Markov Chain Monte Carlo (MCMC) methods are used for inference and validation. A model-building strategy based on the Bayes factor is proposed, and validation aspects are extensively discussed in this article, including the posterior distribution of the concordance index and residual analysis. Determining risk factors based on variables available on the patient's arrival at the hospital is very important for decision-making about the course of treatment. The identified model proves highly reliable and accurate, with a correct classification rate of 88% and a concordance index of 83%.
Abstract:
A body of research has developed within the context of nonlinear signal and image processing that deals with the automatic, statistical design of digital window-based filters. Based on pairs of ideal and observed signals, a filter is designed in an effort to minimize the error between the ideal and filtered signals. The goodness of an optimal filter depends on the relation between the ideal and observed signals, but the goodness of a designed filter also depends on the amount of sample data from which it is designed. In order to lessen the design cost, a filter is often chosen from a given class of filters, thereby constraining the optimization and increasing the error of the optimal filter. To a great extent, the problem of filter design concerns striking the correct balance between the degree of constraint and the design cost. From a different perspective and in a different context, the problem of constraint versus sample size has been a major focus of study within the theory of pattern recognition. This paper discusses the design problem for nonlinear signal processing, shows how the issue naturally transitions into pattern recognition, and then provides a review of salient related pattern-recognition theory. In particular, it discusses classification rules, constrained classification, the Vapnik-Chervonenkis theory, and implications of that theory for morphological classifiers and neural networks. The paper closes by discussing some design approaches developed for nonlinear signal processing, and how their nature naturally leads to a decomposition of the error of a designed filter into a sum of the following components: the Bayes error of the unconstrained optimal filter, the cost of constraint, the cost of reducing complexity by compressing the original signal distribution, the design cost, and the contribution of prior knowledge to a decrease in the error.
The main purpose of the paper is to present fundamental principles of pattern recognition theory within the framework of active research in nonlinear signal processing.
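The error decomposition above starts from the Bayes error of the unconstrained optimal classifier. As a hedged illustration (not drawn from the paper), this quantity has a closed form in the textbook case of two equal-prior one-dimensional Gaussian classes N(mu0, sigma^2) and N(mu1, sigma^2):

```python
import math

def bayes_error(mu0, mu1, sigma):
    """Bayes error for two equal-prior 1-D Gaussian classes with common
    variance: Phi(-|mu1 - mu0| / (2 * sigma)), where Phi is the standard
    normal CDF. No designed classifier can do better than this."""
    z = -abs(mu1 - mu0) / (2 * sigma)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))   # standard normal CDF at z

err = bayes_error(0.0, 2.0, 1.0)   # classes separated by two sigma
```

The excess of a designed filter's error over this floor is what the paper splits into constraint cost, compression cost, design cost, and the prior-knowledge contribution.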
Abstract:
Several statistical models can be used for assessing genotype × environment interaction (GEI) and studying genotypic stability. The objectives of this research were to show how (i) to use Bayesian methodology for computing Shukla's phenotypic stability variance and (ii) to incorporate prior information on the parameters for better estimation. Potato [Solanum tuberosum subsp. andigenum (Juz. & Bukasov) Hawkes], wheat (Triticum aestivum L.), and maize (Zea mays L.) multi-environment trials (MET) were used for illustrating the application of the Bayes paradigm. The potato trial included 15 genotypes, but prior information for just three genotypes was used. The wheat trial used prior information on all 10 genotypes included in the trial, whereas for the maize trial, noninformative priors for the nine genotypes were used. Concerning the posterior distribution of the genotypic means, the maize MET with 20 sites gave less disperse posterior distributions of the genotypic means than the other METs, which included fewer environments. The Bayesian approach allows the use of other statistical strategies, such as the truncated normal distribution (used in this study): when analyzing grain yield, a lower bound of zero and an upper bound set by the researcher's experience can be used. The Bayesian paradigm offers plant breeders the possibility of computing the probability of a genotype being the best performer. The results of this study show that although some genotypes may have a very low probability of being the best in all sites, they have a relatively good chance of being among the five highest yielding genotypes.
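Given posterior samples of the genotypic means, the "probability of being the best performer" and the "probability of being among the five highest yielding genotypes" mentioned above are simple Monte Carlo counts. The samples below are simulated stand-ins, not draws from the trial data:

```python
import numpy as np

rng = np.random.default_rng(7)
# Simulated posterior draws of nine genotypic means (rows = draws, cols = genotypes).
true_means = np.array([5.0, 5.2, 4.8, 5.5, 5.1, 4.9, 5.3, 5.0, 5.4])
draws = rng.normal(true_means, 0.3, size=(10_000, 9))

# P(genotype g is the best) = fraction of draws in which g has the largest mean.
p_best = np.bincount(draws.argmax(axis=1), minlength=9) / draws.shape[0]

# P(genotype g is in the top five) = fraction of draws ranking g among the 5 largest.
top5 = np.argsort(draws, axis=1)[:, -5:]
p_top5 = np.array([(top5 == g).any(axis=1).mean() for g in range(9)])
```

Note how a genotype can have a small `p_best` yet a large `p_top5`, which is exactly the pattern the abstract reports.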
Abstract:
Making diagnoses in oral pathology is often difficult and confusing in dental practice, especially for the less experienced dental student. One of the most promising areas in bioinformatics is computer-aided diagnosis, where a computer system is capable of imitating human reasoning ability and provides diagnoses with an accuracy approaching that of expert professionals. This type of system could be an alternative tool for helping dental students overcome the difficulties of the oral pathology learning process, allowing students to define the variables and information important to improving decision-making performance. However, no current open data management system has been integrated with an artificial intelligence system in a user-friendly environment. Such a system could also be used as an educational tool to help students perform diagnoses. The aim of the present study was to develop and test an open case-based decision-support system. Methods: An open decision-support system based on Bayes' theorem and connected to a relational database was developed using the C++ programming language. The software was tested in the computerization of a surgical pathology service and in simulating the diagnosis of 43 known cases of oral bone disease. The simulation was performed after the system was initially filled with data from 401 cases of oral bone disease. Results: The system allowed the authors to construct and manage a pathology database and to simulate diagnoses using the variables from the database. Conclusion: Combining a relational database and an open decision-support system in the same user-friendly environment proved effective in simulating diagnoses based on information from an updated database.
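The Bayes'-theorem core of such a case-based system can be sketched in a few lines: posterior probabilities over diseases given observed findings, estimated from stored case counts with Laplace smoothing. The disease and finding names below are illustrative placeholders, not entries from the study's database (and the study's system was written in C++; Python is used here only for brevity):

```python
from collections import Counter

# Toy case base: (diagnosis, set of observed findings) per stored case.
cases = [
    ("osteoma", {"radiopaque", "well-defined"}),
    ("osteoma", {"radiopaque", "well-defined"}),
    ("ameloblastoma", {"radiolucent", "multilocular"}),
    ("ameloblastoma", {"radiolucent", "well-defined"}),
    ("osteomyelitis", {"radiolucent", "ill-defined"}),
]

def posterior(findings, cases, alpha=1.0):
    """P(disease | findings) via Bayes' theorem with naive independence of
    findings and Laplace smoothing of the per-disease finding frequencies."""
    diseases = Counter(d for d, _ in cases)
    post = {}
    for d, n_d in diseases.items():
        like = 1.0
        for f in findings:
            n_df = sum(1 for dd, ff in cases if dd == d and f in ff)
            like *= (n_df + alpha) / (n_d + 2 * alpha)   # smoothed P(f | d)
        post[d] = like * n_d / len(cases)                # times prior P(d)
    z = sum(post.values())
    return {d: p / z for d, p in post.items()}

probs = posterior({"radiolucent", "well-defined"}, cases)
best = max(probs, key=probs.get)   # most probable diagnosis for these findings
```

In the real system the case base would be the relational database of 401 cases, updated as new confirmed cases are stored.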
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
The advent of molecular markers has created opportunities for a better understanding of quantitative inheritance and for developing novel strategies for genetic improvement of agricultural species, using information on quantitative trait loci (QTL). A QTL analysis relies on accurate genetic marker maps. At present, most statistical methods used for map construction ignore the fact that molecular data may be read with error. Often, however, there is ambiguity about some marker genotypes. A Bayesian MCMC approach for inferences about a genetic marker map when random miscoding of genotypes occurs is presented, and simulated and real data sets are analyzed. The results suggest that unless there is strong reason to believe that genotypes are ascertained without error, the proposed approach provides more reliable inference on the genetic map.