939 results for tree-augmented-Naive Bayes structure


Relevance:

100.00% 100.00%

Publisher:

Abstract:

The accompanying collective research report is the result of a research project carried out in 1986-90 between the Finnish Academy and the former Soviet Academy of Sciences. The project was organized around common field work in Finland and in the former Soviet Union and around theoretical analyses of the processes that determine tree growth. Based on the theoretical analyses, dynamic stand growth models were built and their parameters were determined from the field results. The annual cycle affects tree growth. Our theoretical approach was based on adaptation to local climate conditions from Lapland to South Russia. The initiation of growth was described with a simple model driven by the accumulation of low and high temperatures. Linking the theoretical model with long-term temperature data allowed us to analyze what type of temperature response produces a favorable outcome in different climates. Initiation of growth consumes the carbohydrate reserves in plants. We measured the dynamics of insoluble and soluble sugars in the far northern and Karelian conditions. A clear cyclical pattern was observed, but the differences between locations were surprisingly small. Analysis of field measurements of CO2 exchange showed that irradiance is the dominant factor causing variation in photosynthetic rate in natural conditions during summer. The effect of other factors is so small that they can be omitted without any considerable loss of accuracy. A special experiment carried out in Hyytiälä showed that the needle living space, defined as the ratio between the shoot cylindrical volume and the needle surface area, correlates with shoot photosynthesis. The penetration of irradiance into a Scots pine canopy is a complicated phenomenon because of the movement of the sun across the sky and the complicated structure of branches and needles. A moderately simple but balanced forest radiation regime submodel was constructed. 
It consists of the tree crown and forest structure, the gap probability calculation and the consideration of spatial and temporal variation of radiation inside the forest. The common field excursions in different geographical regions yielded a large amount of experimental data on the regularities of woody structures. Water transport seems to be a good common factor for analysing these properties of tree structure. There are evident regressions between cross-sectional areas measured at different locations along the water pathway from fine roots to needles. The observed regressions have clear geographical trends. For example, the same cross-sectional area can support three times higher needle mass in South Russia than in Lapland. Geographical trends can also be seen in shoot and needle structure. Analysis of data published by several Russian authors shows that one ton of needles transpires 42 tons of water a year. This annual amount of transpiration seems to be independent of geographical location, year and site conditions. The theoretical and experimental material produced is utilised in the development of a stand growth model that describes the growth and development of Scots pine stands in Finland and the former Soviet Union. The core of the model is the carbon and nutrient balances. This means that carbon obtained in photosynthesis is consumed for growth and maintenance, and nutrients are taken up according to metabolic needs. The annual photosynthetic production by trees in the stand is determined as a function of irradiance and shading during the active period. The allocation of the annual photosynthetic production to the growth of different components of trees is based on structural regularities. Since the fundamental metabolic processes are the same in all locations, the same growth model structure can be applied across the large range of Scots pine. 
The annual photosynthetic production and the structural regularities determining the allocation of resources have geographical features. The common field measurements enable the application of the model to the analysis of growth and development of stands growing at the five experimental locations. The model enables the analysis of geographical differences in the growth of Scots pine. For example, the annual photosynthetic production of a 100-year-old stand at Voronez is 3.5 times higher than in Lapland. The share consumed by needle growth (30 %) and by growth of branches (5 %) seems to be the same in all locations. In contrast, the share of fine roots decreases when moving from north to south. It is 20 % in Lapland, 15 % in Hyytiälä (Central Finland) and Kentjärvi (Karelia), and 15 % in Voronez (South Russia). The stem masses (115-113 ton/ha) are rather similar in Hyytiälä, Kentjärvi and Voronez, but rather low (50 ton/ha) in Lapland. In Voronez the trees reach a height of 29 m, compared with 22 m in Hyytiälä and Kentjärvi and only 14 m in Lapland. The present approach enables the utilization of structural and functional knowledge, gained in places of intensive research, in the analysis of growth and development of any stand. This opens new possibilities for growth research and also for applications in forestry practice.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The maximum entropy approach to classification is very well studied in applied statistics and machine learning, and almost all the methods that exist in the literature are discriminative in nature. In this paper, we introduce a maximum entropy classification method with feature selection for high-dimensional data, such as text datasets, that is generative in nature. To tackle the curse of dimensionality of large data sets, we employ the conditional independence assumption (Naive Bayes) and we perform feature selection simultaneously, by enforcing a 'maximum discrimination' between the estimated class conditional densities. For two-class problems, the proposed method uses Jeffreys (J) divergence to discriminate the class conditional densities. To extend our method to the multi-class case, we propose a completely new approach by considering a multi-distribution divergence: we replace Jeffreys divergence by Jensen-Shannon (JS) divergence to discriminate the conditional densities of multiple classes. In order to reduce computational complexity, we employ a modified Jensen-Shannon divergence (JS(GM)), based on the AM-GM inequality. We show that the resulting divergence is a natural generalization of Jeffreys divergence to the case of multiple distributions. As far as the theoretical justification is concerned, we show that when one intends to select the best features in a generative maximum entropy approach, maximum discrimination using J-divergence emerges naturally in binary classification. The performance of the proposed algorithms is demonstrated and compared on high-dimensional text and gene expression datasets, which shows that our methods scale up very well with high-dimensional datasets.
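To make the role of Jeffreys divergence as a discrimination score concrete, here is a minimal Python sketch. It is not the paper's maximum entropy method: it assumes binary features, Laplace-smoothed Bernoulli class-conditional estimates, and hypothetical helper names (`jeffreys_divergence`, `rank_features`) chosen for illustration only.

```python
import math


def jeffreys_divergence(p, q):
    """J(p, q) = KL(p||q) + KL(q||p) = sum_i (p_i - q_i) * log(p_i / q_i).

    Both arguments are discrete distributions with strictly positive entries.
    """
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))


def rank_features(class_a, class_b, alpha=1.0):
    """Rank binary features from most to least discriminating.

    class_a and class_b are lists of 0/1 feature rows, one list per class.
    alpha is a Laplace smoothing constant (an assumption of this sketch) that
    keeps every estimated probability strictly between 0 and 1.
    """
    n_features = len(class_a[0])
    scores = []
    for j in range(n_features):
        pa = (sum(row[j] for row in class_a) + alpha) / (len(class_a) + 2 * alpha)
        pb = (sum(row[j] for row in class_b) + alpha) / (len(class_b) + 2 * alpha)
        score = jeffreys_divergence([pa, 1 - pa], [pb, 1 - pb])
        scores.append((score, j))
    # Highest divergence first: these features separate the two classes best.
    return [j for _, j in sorted(scores, reverse=True)]
```

Because the features are scored independently, this follows the same conditional independence assumption as Naive Bayes; a feature whose estimated distribution is identical in both classes receives a score of zero.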

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The 2011 outburst of the black hole candidate IGR J17091-3624 followed the canonical track of state transitions along with the evolution of quasi-periodic oscillation (QPO) frequencies before it began exhibiting various variability classes similar to GRS 1915+105. We use this canonical evolution of spectral and temporal properties to determine the mass of IGR J17091-3624, using three different methods: photon index (Γ)-QPO frequency (ν) correlation, QPO frequency (ν)-time (day) evolution, and broadband spectral modeling based on two-component advective flow (TCAF). We provide a combined mass estimate for the source using a naive Bayes based joint likelihood approach. This gives a probable mass range of 11.8 M⊙ to 13.7 M⊙. Considering each individual estimate and taking the lowermost and uppermost bounds among all three methods, we get a mass range of 8.7 M⊙ to 15.6 M⊙ with 90% confidence. We discuss the possible implications of our findings in the context of two-component accretion flow.
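The idea of combining several independent estimates through a naive Bayes style joint likelihood can be sketched as follows. This is only an illustration under stated assumptions: each method is assumed to yield a Gaussian likelihood over the mass, the methods are treated as independent, and the function names and any input values passed to them are hypothetical placeholders, not results from the paper.

```python
import math


def log_gaussian(x, mu, sigma):
    """Log of a Gaussian density with mean mu and standard deviation sigma."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def joint_log_likelihood(mass, estimates):
    """Sum of per-method log-likelihoods.

    Treating the methods as independent (the naive Bayes assumption) turns the
    joint likelihood into a product, i.e. a sum in log space.
    estimates is a list of (mean, sigma) pairs, one per method.
    """
    return sum(log_gaussian(mass, mu, sigma) for mu, sigma in estimates)


def most_probable_mass(estimates, grid):
    """Grid value that maximises the joint likelihood."""
    return max(grid, key=lambda m: joint_log_likelihood(m, estimates))
```

For Gaussian likelihoods the maximum coincides with the precision-weighted mean of the individual estimates, so the grid search is mainly useful when the per-method likelihoods are not Gaussian.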

Relevance:

100.00% 100.00%

Publisher:

Abstract:

An increasing number of parameter estimation tasks involve the use of at least two information sources, one complete but limited, the other abundant but incomplete. Standard algorithms such as EM (or em) used in this context are unfortunately not stable in the sense that they can lead to a dramatic loss of accuracy with the inclusion of incomplete observations. We provide a more controlled solution to this problem through differential equations that govern the evolution of locally optimal solutions (fixed points) as a function of the source weighting. This approach permits us to explicitly identify any critical (bifurcation) points leading to choices unsupported by the available complete data. The approach readily applies to any graphical model in O(n^3) time where n is the number of parameters. We use the naive Bayes model to illustrate these ideas and demonstrate the effectiveness of our approach in the context of text classification problems.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

C. Shang and Q. Shen. Aiding classification of gene expression data with feature selection: a comparative study. Computational Intelligence Research, 1(1):68-76.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The effect of different factors (spawning biomass, environmental conditions) on recruitment is a subject of great importance in the management of fisheries, recovery plans and scenario exploration. In this study, recently proposed supervised classification techniques, tested by the machine-learning community, are applied to forecast the recruitment of seven fish species of the North East Atlantic (anchovy, sardine, mackerel, horse mackerel, hake, blue whiting and albacore), using spawning, environmental and climatic data. In addition, the use of the probabilistic flexible naive Bayes classifier (FNBC) is proposed as a modelling approach in order to reduce uncertainty for fisheries management purposes. The aim of these improvements is to obtain better probability estimates of each possible outcome (low, medium and high recruitment) based on kernel density estimation, which is crucial for informed management decision making under high uncertainty. Finally, a comparison between goodness-of-fit and generalization power is provided, in order to assess the reliability of the final forecasting models. It is found that in most cases the proposed methodology provides useful information for management, whereas the case of horse mackerel is an example of the limitations of the approach. The proposed improvements allow for a better probabilistic estimation of the different scenarios, i.e. they reduce the uncertainty in the provided forecasts.
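A flexible naive Bayes classifier replaces the single Gaussian per feature and class with a kernel density estimate. The sketch below is a minimal illustration of that idea, not the study's implementation: the class name, the fixed bandwidth, and the Gaussian kernel are assumptions made for the example.

```python
import math
from collections import defaultdict


class FlexibleNaiveBayes:
    """Naive Bayes with a Gaussian kernel density estimate per feature and class."""

    def __init__(self, bandwidth=1.0):
        # A single fixed bandwidth is an assumption of this sketch.
        self.bandwidth = bandwidth

    def fit(self, X, y):
        self.rows_by_class = defaultdict(list)
        for row, label in zip(X, y):
            self.rows_by_class[label].append(row)
        self.n_samples = len(y)
        return self

    def _kde(self, x, samples):
        """Average of Gaussian kernels centred on the training samples."""
        h = self.bandwidth
        total = sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
        return total / (len(samples) * h * math.sqrt(2 * math.pi))

    def predict_proba(self, row):
        """Posterior probability of each class for one feature row."""
        scores = {}
        for label, rows in self.rows_by_class.items():
            score = len(rows) / self.n_samples  # class prior
            for j, x in enumerate(row):
                score *= self._kde(x, [r[j] for r in rows])
            scores[label] = score
        total = sum(scores.values())
        return {label: s / total for label, s in scores.items()}
```

Returning a full probability distribution over the classes, rather than a single label, is what makes this kind of model usable for decisions under uncertainty. A query point far from all training data can drive every score to zero, which this sketch does not guard against.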

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Classification methods with embedded feature selection capability are very appealing for the analysis of complex processes since they allow the analysis of root causes even when the number of input variables is high. In this work, we investigate the performance of three techniques for classification within a Monte Carlo strategy with the aim of root cause analysis. We consider the naive Bayes classifier and the logistic regression model with two different implementations for controlling model complexity, namely, a LASSO-like implementation with an L1 norm regularization and a fully Bayesian implementation of the logistic model, the so-called relevance vector machine. Several challenges can arise when estimating such models, mainly linked to the characteristics of the data: a large number of input variables, high correlation among subsets of variables, the situation where the number of variables is higher than the number of available data points, and the case of unbalanced datasets. Using an ecological and a semiconductor manufacturing dataset, we show advantages and drawbacks of each method, highlighting the superior performance in terms of classification accuracy of the relevance vector machine with respect to the other classifiers. Moreover, we show how the combination of the proposed techniques and the Monte Carlo approach can be used to get more robust insights into the problem under analysis when faced with challenging modelling conditions.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Credal networks are graph-based statistical models whose parameters take values in a set, instead of being sharply specified as in traditional statistical models (e.g., Bayesian networks). The computational complexity of inferences on such models depends on the irrelevance/independence concept adopted. In this paper, we study inferential complexity under the concepts of epistemic irrelevance and strong independence. We show that inferences under strong independence are NP-hard even in trees with binary variables except for a single ternary one. We prove that under epistemic irrelevance the polynomial-time complexity of inferences in credal trees is not likely to extend to more general models (e.g., singly connected topologies). These results clearly distinguish networks that admit efficient inferences and those where inferences are most likely hard, and settle several open questions regarding their computational complexity. We show that these results remain valid even if we disallow the use of zero probabilities. We also show that the computation of bounds on the probability of the future state in a hidden Markov model is the same whether we assume epistemic irrelevance or strong independence, and we prove an analogous result for inference in Naive Bayes structures. These inferential equivalences are important for practitioners, as hidden Markov models and Naive Bayes networks are used in real applications of imprecise probability.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Dissertation submitted for the degree of Master in Informatics Engineering

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Speech is the most natural means of communication among human beings, and speech processing and recognition have been intensive areas of research for the last five decades. Since speech recognition is a pattern recognition problem, classification is an important part of any speech recognition system. In this work, a speech recognition system is developed for recognizing speaker-independent spoken digits in Malayalam. Voice signals are sampled directly from the microphone. The proposed method is implemented for 1000 speakers uttering 10 digits each. Since the speech signals are affected by background noise, the noise is removed from the signals using a wavelet denoising method based on soft thresholding. The features are extracted from the signals using Discrete Wavelet Transforms (DWT) because they are well suited to processing non-stationary signals like speech, owing to their multi-resolution, multi-scale analysis characteristics. Speech recognition is a multiclass classification problem, so the resulting feature vectors are classified using three classifiers capable of handling multiple classes, namely Artificial Neural Networks (ANN), Support Vector Machines (SVM) and Naive Bayes classifiers. During the classification stage, the classifiers are trained on feature vectors of known patterns and then tested on the test data set. The performance of all these classifiers is evaluated based on recognition accuracy. All three methods produced good recognition accuracy. The combination of DWT and ANN produced a recognition accuracy of 89%, the combination of DWT and SVM produced an accuracy of 86.6%, and the combination of DWT and Naive Bayes produced an accuracy of 83.5%. ANN is found to be the best of the three methods.
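Wavelet denoising by soft thresholding can be illustrated with a single-level Haar transform. The abstract does not state which wavelet, decomposition depth, or threshold rule was used, so the Haar wavelet, the one-level decomposition, and the caller-supplied threshold `t` below are assumptions of this sketch.

```python
import math


def haar_dwt(signal):
    """One-level Haar DWT of an even-length signal.

    Returns (approximation, detail) coefficient lists.
    """
    s = math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal) - 1, 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuilds the signal from both coefficient lists."""
    s = math.sqrt(2)
    signal = []
    for a, d in zip(approx, detail):
        signal.extend([(a + d) / s, (a - d) / s])
    return signal


def soft_threshold(coeffs, t):
    """Shrink each coefficient towards zero by t; values within [-t, t] become zero."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def denoise(signal, t):
    """Threshold only the detail coefficients, then reconstruct the signal."""
    approx, detail = haar_dwt(signal)
    return haar_idwt(approx, soft_threshold(detail, t))
```

Small detail coefficients are treated as noise and removed, while the approximation coefficients, which carry the coarse shape of the signal, are left untouched.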

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We compare Naive Bayes and Support Vector Machines on the task of multiclass text classification. Using a variety of approaches to combine the underlying binary classifiers, we find that SVMs substantially outperform Naive Bayes. We present full multiclass results on two well-known text data sets, including the lowest error to date on both data sets. We develop a new indicator of binary performance to show that the SVM's lower multiclass error is a result of its improved binary performance. Furthermore, we demonstrate and explore the surprising result that one-vs-all classification performs favorably compared to other approaches even though it has no error-correcting properties.
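One-vs-all classification, mentioned above, trains one binary classifier per class and predicts the class whose classifier is most confident. The sketch below shows only the combination step; the linear scorers and the function names are hypothetical stand-ins for whatever binary classifiers (SVM or Naive Bayes) are actually trained.

```python
def linear_scorer(weights, bias=0.0):
    """Build a binary scorer: higher output means 'more likely this class'.

    A stand-in for a trained binary classifier's real-valued output.
    """
    def score(x):
        return sum(w * xi for w, xi in zip(weights, x)) + bias
    return score


def one_vs_all_predict(scorers, x):
    """Return the label whose binary scorer gives x the highest score.

    scorers maps each class label to a function from a feature vector to a
    real-valued confidence.
    """
    return max(scorers, key=lambda label: scorers[label](x))
```

Unlike error-correcting output codes, this scheme has no redundancy: each class depends on a single binary classifier, which is why its strong performance is described as surprising.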

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Accurate single-trial P300 classification lends itself to fast and accurate control of Brain Computer Interfaces (BCIs). Highly accurate classification of single-trial P300 ERPs is achieved by characterizing the EEG via corresponding stationary and time-varying Wackermann parameters. Subsets of maximally discriminating parameters are then selected using the Network Clustering feature selection algorithm and classified with Naive Bayes and Linear Discriminant Analysis classifiers. The method is assessed on two different data sets from BCI competitions and is shown to produce accuracies of between approximately 70% and 85%. This is promising for the use of Wackermann parameters as features in the classification of single-trial ERP responses.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Distributed and collaborative data stream mining in a mobile computing environment is referred to as Pocket Data Mining (PDM). The large number of available data streams that smart phones can subscribe to or sense, coupled with the increasing computational power of handheld devices, motivates the development of PDM as a decision making system. This emerging area of study was shown to be feasible in an earlier study using the technological enablers of mobile software agents and stream mining techniques [1]. A typical PDM process would start by having mobile agents roam the network to discover relevant data streams and resources. Then other (mobile) agents encapsulating stream mining techniques visit the relevant nodes in the network in order to build evolving data mining models. Finally, a third type of mobile agent roams the network, consulting the mining agents for a final collaborative decision when required by one or more users. In this paper, we propose the use of distributed Hoeffding trees and Naive Bayes classifiers in the PDM framework over vertically partitioned data streams. Mobile policing, health monitoring and stock market analysis are among the possible applications of PDM. An extensive experimental study is reported showing the effectiveness of collaborative data mining with the two classifiers.
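Hoeffding trees decide when to split a node by comparing the gap between the two best candidate attributes against the Hoeffding bound. The sketch below shows that standard test in isolation; the function names and the way the gains are supplied are assumptions for illustration, and nothing here reflects the PDM agents or the distributed setting.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    With probability 1 - delta, the true mean of a variable with range R lies
    within epsilon of the mean observed over n independent samples.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n):
    """Split once the best attribute leads the runner-up by more than epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)
```

Because epsilon shrinks as more examples arrive, a node that cannot yet justify a split simply waits for more stream data, which keeps memory use bounded on a handheld device.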

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this paper, a custom classification algorithm based on linear discriminant analysis and probability-based weights is implemented and applied to hippocampus measurements from structural magnetic resonance images of healthy subjects and Alzheimer's Disease sufferers, with the aim of diagnosing them as accurately as possible. The classifier works by classifying each measurement of a hippocampal volume as healthy control-sized or Alzheimer's Disease-sized; these new features are then weighted and used to classify the subject as a healthy control or as suffering from Alzheimer's Disease. The preliminary results reach an accuracy of 85.8%, which is similar to the accuracy of state-of-the-art methods such as a Naive Bayes classifier and a Support Vector Machine. An advantage of the method proposed in this paper over the aforementioned state-of-the-art classifiers is the descriptive ability of the classifications it produces. The descriptive model can be of great help to a doctor in the diagnosis of Alzheimer's Disease, or can even further the understanding of how Alzheimer's Disease affects the hippocampus.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This work describes the specification and implementation of the Feedback Assistant prototype, which helps users adjust the parameters of the message filtering service for e-mail in systems such as Direto. The Feedback Assistant is installed on the Direto user's computer to monitor the user's preferences, as represented by the actions applied to e-mail messages. The work also presents a literature review of the general concepts of probability, Bayesian networks and classifiers. It describes the general characteristics of classifiers, in particular Naive Bayes, its logic, and its performance compared with other classifiers. Concepts related to the user profile model and the Direto environment are also covered. Naive Bayes is attractive for use in the Feedback Assistant because it performs well relative to other classifiers and is efficient at prediction when the attributes are independent of one another. The Feedback Assistant uses a Naive Bayes classifier to predict preferences from the user's actions. It also uses weights that represent the user's satisfaction with the terms extracted from the message body. These weights are associated with the user's actions in order to estimate the most and least interesting terms from the values of their final averages. When the user wishes to change the Direto message filters, they ask the Feedback Assistant for suggestions on possible exclusions of the least interesting terms and possible inclusions of the most interesting ones. The prototype is tested using two evaluation methods to measure the accuracy and the performance of the Feedback Assistant. The results of the accuracy evaluation are satisfactory, considering that the Feedback Assistant's classifier uses five classes. 
The performance test results show that, with more up-to-date machine configurations, users would receive suggestions with more tolerable response times.