999 results for Bayesian nonparametric


Relevance:

20.00%

Abstract:

Clustering of multivariate data is a commonly used technique in ecology, and many approaches to clustering are available. The results from a clustering algorithm are uncertain, but few clustering approaches explicitly acknowledge this uncertainty. One exception is Bayesian mixture modelling, which treats all results probabilistically, and allows comparison of multiple plausible classifications of the same data set. We used this method, implemented in the AutoClass program, to classify catchments (watersheds) in the Murray Darling Basin (MDB), Australia, based on their physiographic characteristics (e.g. slope, rainfall, lithology). The most likely classification found nine classes of catchments. Members of each class were aggregated geographically within the MDB. Rainfall and slope were the two most important variables that defined classes. The second-most likely classification was very similar to the first, but had one fewer class. Increasing the nominal uncertainty of continuous data resulted in a most likely classification with five classes, which were again aggregated geographically. Membership probabilities suggested that a small number of cases could be members of either of two classes. Such cases were located on the edges of groups of catchments that belonged to one class, with a group belonging to the second-most likely class adjacent. A comparison of the Bayesian approach to a distance-based deterministic method showed that the Bayesian mixture model produced solutions that were more spatially cohesive and intuitively appealing. The probabilistic presentation of results from the Bayesian classification allows richer interpretation, including decisions on how to treat cases that are intermediate between two or more classes, and whether to consider more than one classification. The explicit consideration and presentation of uncertainty makes this approach useful for ecological investigations, where both data and expectations are often highly uncertain.
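The membership probabilities described above come from Bayes' rule: once a mixture has been fitted, each case's probability of belonging to class *j* is that class's weight times its likelihood, normalised over all classes. The sketch below is a minimal illustration assuming a hypothetical two-class univariate Gaussian mixture over one attribute. It is not AutoClass's model, which handles many attributes and attribute types, and the weights, means and standard deviations are invented for illustration.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def membership_probabilities(x, weights, means, sds):
    """Posterior probability that x belongs to each mixture component (Bayes' rule)."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical two-class mixture over one attribute (e.g. mean annual rainfall, mm).
weights = [0.6, 0.4]
means = [400.0, 900.0]
sds = [100.0, 150.0]

for rainfall in (350.0, 620.0, 1000.0):
    probs = membership_probabilities(rainfall, weights, means, sds)
    print(rainfall, [round(p, 3) for p in probs])
```

A value that falls between the two class centres (620 mm here) receives substantial probability under both classes. This is the kind of intermediate case the abstract says can be reported rather than forced into one class.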

Relevance:

20.00%

Abstract:

Email overload is a recent problem: people find it increasingly difficult to process the large number of emails they receive daily. The problem is becoming more serious and already affects the normal use of email as a knowledge management tool. Categorizing emails into meaningful groups is recognized to greatly reduce the cognitive load of processing them, and is thus an effective way to manage email overload. However, most current approaches still require significant human input to categorize emails. In this paper we develop an automatic email clustering system, underpinned by a new nonparametric text clustering algorithm. The system requires no predefined input parameters and automatically generates meaningful email clusters. Experiments show that our new algorithm outperforms existing text clustering algorithms in both computational time and clustering quality, as measured by several different metrics.

Relevance:

20.00%

Abstract:

A major challenge facing freshwater ecologists and managers is the development of models that link stream ecological condition to catchment scale effects, such as land use. Previous attempts to make such models have followed two general approaches. The bottom-up approach employs mechanistic models, which can quickly become too complex to be useful. The top-down approach employs empirical models derived from large data sets, and has often suffered from large amounts of unexplained variation in stream condition.

We believe that the lack of success of both modelling approaches may be at least partly explained by scientists considering too wide a breadth of catchment type. Thus, we believe that by stratifying large sets of catchments into groups of similar types prior to modelling, both types of models may be improved. This paper describes preliminary work using a Bayesian classification software package, ‘Autoclass’ (Cheeseman and Stutz 1996) to create classes of catchments within the Murray Darling Basin based on physiographic data.

Autoclass uses a model-based classification method that employs finite mixture modelling and trades off model fit versus complexity, leading to a parsimonious solution. The software provides information on the posterior probability that the classification is ‘correct’ and also probabilities for alternative classifications. The importance of each attribute in defining the individual classes is calculated and presented, assisting description of the classes. Each case is ‘assigned’ to a class based on membership probability, but the probability of membership of other classes is also provided. This feature deals very well with cases that do not fit neatly into a larger class. Lastly, Autoclass requires the user to specify the measurement error of continuous variables.
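The trade-off between model fit and complexity can be illustrated by fitting finite mixtures with different numbers of classes and scoring each one with a penalised likelihood. AutoClass uses its own Bayesian approximation to the marginal likelihood. The sketch below substitutes the simpler Bayesian information criterion (BIC) on a one-dimensional Gaussian mixture fitted by EM, using hypothetical data, so it illustrates the idea rather than reproducing AutoClass.

```python
import math
import random

def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def fit_gmm_1d(data, k, iters=200):
    """Fit a k-component univariate Gaussian mixture by EM.

    Returns (weights, means, sds, log_likelihood)."""
    n = len(data)
    ordered = sorted(data)
    spread = (ordered[-1] - ordered[0]) or 1.0
    # Deterministic start: means at evenly spaced quantiles, equal weights and widths
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    sds = [spread / k] * k
    weights = [1.0 / k] * k
    min_sd = 1e-3 * spread  # floor to keep components from collapsing onto one point
    for _ in range(iters):
        # E-step: resp[i][j] = P(component j | x_i) under the current parameters
        resp = []
        for x in data:
            joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
            total = sum(joint) or 1e-300
            resp.append([p / total for p in joint])
        # M-step: re-estimate each component from its weighted share of the data
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            sds[j] = max(math.sqrt(var), min_sd)
    log_lik = sum(
        math.log(sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)) or 1e-300)
        for x in data
    )
    return weights, means, sds, log_lik

def bic(log_lik, k, n):
    """BIC = -2 log L + p log n with p = 3k - 1 free parameters; lower is better."""
    return -2.0 * log_lik + (3 * k - 1) * math.log(n)

# Hypothetical single-attribute data drawn from two well-separated groups.
rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(100)] + [rng.gauss(8.0, 1.0) for _ in range(100)]
scores = {k: bic(fit_gmm_1d(data, k)[3], k, len(data)) for k in (1, 2, 3)}
print(scores)
```

Adding a class always raises (or keeps) the likelihood, but each class also adds three parameters to the penalty. The preferred classification is therefore the one where the extra fit no longer pays for the extra complexity.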

Catchments were derived from the Australian digital elevation model. Physiographic data were derived from national spatial data sets. There was very little information on measurement errors for the spatial data, so a conservative error of 5% of the data range was adopted for all continuous attributes. The incorporation of uncertainty into spatial data sets remains a research challenge.
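The 5%-of-range rule is simple arithmetic. The sketch below applies it to each continuous attribute. The attribute names and values are hypothetical, and the `fraction` parameter is an illustrative knob for raising or lowering the nominal uncertainty.

```python
def nominal_errors(attributes, fraction=0.05):
    """Nominal measurement error per continuous attribute: a fixed fraction of its observed range."""
    return {name: fraction * (max(values) - min(values)) for name, values in attributes.items()}

# Hypothetical attribute values for a handful of catchments.
attributes = {
    "slope_deg": [0.5, 1.2, 3.8, 7.5, 12.0, 20.5],
    "rainfall_mm": [250.0, 410.0, 620.0, 980.0, 1450.0],
}
print(nominal_errors(attributes))
```

With these values the slope range is 20 degrees and the rainfall range is 1200 mm, so the nominal errors are about 1 degree and 60 mm respectively.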

The results of the classification were very encouraging. The software found nine classes of catchments in the Murray Darling Basin. The classes grouped together geographically, and followed altitude and latitude gradients, despite the fact that these variables were not included in the classification. Descriptions of the classes reveal very different physiographic environments, ranging from dry and flat catchments (i.e. lowlands), through to wet and hilly catchments (i.e. mountainous areas). Rainfall and slope were two important discriminators between classes. These two attributes, in particular, will affect the ways in which the stream interacts with the catchment, and can thus be expected to modify the effects of land use change on ecological condition. Thus, realistic models of the effects of land use change on streams would differ between the different types of catchments, and sound management practices will differ.

A small number of catchments were assigned to their primary class with relatively low probability. These catchments lie on the boundaries of groups of catchments, with the second most likely class being an adjacent group. The locations of these ‘uncertain’ catchments show that the Bayesian classification dealt well with cases that do not fit neatly into larger classes.
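Identifying these "uncertain" catchments amounts to ranking each case's membership probabilities and flagging those whose primary class falls below a cut-off. The sketch below assumes membership probabilities are already available. The 0.8 threshold and the catchment data are hypothetical, since the abstract does not state what counts as "relatively low probability".

```python
def flag_uncertain(memberships, threshold=0.8):
    """Return (case_id, primary_class, primary_prob, second_class, second_prob)
    for every case whose primary membership probability is below `threshold`."""
    flagged = []
    for case_id, probs in memberships.items():
        # Class indices ordered from most to least probable
        ranked = sorted(range(len(probs)), key=lambda j: probs[j], reverse=True)
        first, second = ranked[0], ranked[1]
        if probs[first] < threshold:
            flagged.append((case_id, first, probs[first], second, probs[second]))
    return flagged

# Hypothetical membership probabilities for four catchments over three classes.
memberships = {
    "catchment_A": [0.97, 0.02, 0.01],
    "catchment_B": [0.55, 0.43, 0.02],
    "catchment_C": [0.05, 0.90, 0.05],
    "catchment_D": [0.10, 0.30, 0.60],
}
for row in flag_uncertain(memberships):
    print(row)
```

Reporting the second-most-likely class alongside the primary one makes it possible to check, as the authors did, whether that class corresponds to an adjacent group of catchments.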

Although the results are intuitive, we cannot yet assess whether the classifications described in this paper would assist the modelling of catchment scale effects on stream ecological condition. It is most likely that catchment classification and modelling will be an iterative process, where the needs of the model are used to guide classification, and the results of classifications used to suggest further refinements to models.

Relevance:

20.00%

Abstract:

The thesis examined the inter-rater reliability and procedural validity of four computerised Bayesian belief networks (BBNs) which were developed to assist with the diagnosis of psychotic disorders. The results of this research indicated that BBNs can significantly improve diagnostic reliability and may represent an important advance over current diagnostic methods. The professional portfolio investigated, through the presentation of case studies and review of literature relevant to each case study, how comorbidity and context of depression may impact on cognitive behavioural therapy treatment.
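Inter-rater reliability for categorical diagnoses is commonly summarised with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The abstract does not say which statistic the thesis used, so kappa here is an assumption, and the diagnostic labels below are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters corrected for chance agreement."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions, summed over categories
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical diagnoses assigned to six cases by two raters.
rater_1 = ["schizophrenia", "schizophrenia", "bipolar", "bipolar", "schizoaffective", "schizophrenia"]
rater_2 = ["schizophrenia", "bipolar", "bipolar", "bipolar", "schizoaffective", "schizophrenia"]
print(round(cohens_kappa(rater_1, rater_2), 3))
```

Here the raters agree on 5 of 6 cases, but chance alone would produce 13/36 agreement, giving kappa = 17/23.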

Relevance:

20.00%

Abstract:

In a nonparametric setting, the functional form of the relationship between the response variable and the associated predictor variables is assumed to be unknown when the model is fitted to the data. Nonparametric regression models can be used for the same types of applications as traditional regression models, such as estimation, prediction, calibration, and optimization. The main aim of nonparametric regression is to highlight important structure in the data without any assumptions about the shape of the underlying regression function. Hence the nonparametric approach allows the data to speak for itself. Applications of sequential procedures to a nonparametric regression model at a given point are considered.

The primary goal of sequential analysis is to achieve a given accuracy using the smallest possible sample size. These sequential procedures allow an experimenter to make decisions based on the smallest number of observations without compromising accuracy. In the nonparametric regression model with a random design based on independent and identically distributed pairs of observations $(X_i, Y_i)$, where the regression function is $m(x) = E(Y \mid X = x)$, estimation of the curve $m(x)$ by the Nadaraya–Watson kernel estimator $\hat{m}_{NW}(x)$ and the local linear kernel estimator $\hat{m}_{LL}(x)$ is considered. To obtain asymptotic confidence intervals for $m(x)$, a two-stage sequential procedure is used, under which some asymptotic properties of the Nadaraya–Watson and local linear estimators are obtained.

The proposed methodology is first tested on simulated data from linear and nonlinear functions. Encouraged by the preliminary simulation results, the proposed method is then applied to estimate the nonparametric regression curve of the capital asset pricing model (CAPM).
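The two estimators named above can be written in a few lines. The Nadaraya–Watson estimator is a kernel-weighted average of the responses. The local linear estimator is the intercept of a kernel-weighted least-squares line centred at the target point. The sketch below assumes a Gaussian kernel and a fixed bandwidth `h`. The abstract specifies neither, so both are illustrative choices.

```python
import math

def gaussian_kernel(u):
    """Standard normal density used as the smoothing kernel (an assumed choice)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def nadaraya_watson(x0, xs, ys, h):
    """Kernel-weighted average of the responses around x0."""
    weights = [gaussian_kernel((xi - x0) / h) for xi in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

def local_linear(x0, xs, ys, h):
    """Intercept of a kernel-weighted least-squares line fitted around x0."""
    k = [gaussian_kernel((xi - x0) / h) for xi in xs]
    s1 = sum(ki * (xi - x0) for ki, xi in zip(k, xs))
    s2 = sum(ki * (xi - x0) ** 2 for ki, xi in zip(k, xs))
    # Equivalent-kernel weights w_i = K_i * (S2 - (X_i - x0) * S1)
    w = [ki * (s2 - (xi - x0) * s1) for ki, xi in zip(k, xs)]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)

# Hypothetical noise-free data from the linear function m(x) = 2x + 1 on [0, 1].
xs = [i / 100 for i in range(101)]
ys = [2.0 * x + 1.0 for x in xs]
print(nadaraya_watson(0.1, xs, ys, h=0.1), local_linear(0.1, xs, ys, h=0.1))
```

Near the boundary of the design, the Nadaraya–Watson estimate is pulled towards the interior points, while the local linear estimate reproduces a linear $m(x)$ exactly. This boundary behaviour is one standard reason to consider both estimators.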

Relevance:

20.00%

Abstract:

In nonparametric statistics the functional form of the relationship between the response variable and its associated predictor variables is unspecified, but it is assumed to be a smooth function. We develop a procedure for constructing a fixed-width confidence interval for the predicted value at a specified point of the independent variable. The optimal sample size for constructing this interval is obtained using a two-stage sequential procedure that relies on some asymptotic properties of the Nadaraya–Watson and local linear estimators. Finally, a large-scale simulation study demonstrates the applicability of the developed procedure for small and moderate sample sizes. The procedure developed here should find wide applicability, since many practical problems that arise in industry involve estimating an unknown function.
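The logic of a two-stage procedure can be shown on the simplest case, a fixed-width interval for a population mean. A pilot sample of size $n_0$ estimates the variance, and the total sample size is then $N = \max(n_0, \lceil z_{\alpha/2}^2 s^2 / d^2 \rceil)$ for half-width $d$. This is a simplified normal-approximation analogue, not the authors' procedure. For the Nadaraya–Watson and local linear estimators the variance term is replaced by the estimator's asymptotic variance, which also depends on the bandwidth and the design density.

```python
import math
import random
from statistics import NormalDist, mean, variance

def two_stage_sample_size(pilot, half_width, alpha=0.05):
    """Stage two: choose N = max(n0, ceil(z^2 * s^2 / d^2)) from a pilot of size n0."""
    n0 = len(pilot)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # normal quantile z_{alpha/2}
    s2 = variance(pilot)  # sample variance estimated from the pilot stage
    return max(n0, math.ceil(z * z * s2 / half_width ** 2))

def fixed_width_interval(draw, n0, half_width, alpha=0.05):
    """Run both stages for a population mean; return (N, interval of width 2 * half_width)."""
    pilot = [draw() for _ in range(n0)]
    n_total = two_stage_sample_size(pilot, half_width, alpha)
    extra = [draw() for _ in range(n_total - n0)]
    centre = mean(pilot + extra)
    return n_total, (centre - half_width, centre + half_width)

# Hypothetical population: normal with mean 10 and standard deviation 2.
rng = random.Random(42)
n_total, interval = fixed_width_interval(lambda: rng.gauss(10.0, 2.0), n0=20, half_width=0.5)
print(n_total, interval)
```

The interval width is fixed in advance. Only the sample size is random, and the pilot stage is what lets the procedure adapt that size to the unknown variability.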

Relevance:

20.00%

Abstract:

In multi-agent systems, most of the time, an agent does not have complete information about the preferences and decision making processes of other agents. This prevents even the cooperative agents from making coordinated choices, purely due to their ignorance of what others want. To overcome this problem, traditional coordination methods rely heavily on inter-agent communication, and thus become very inefficient when communication is costly or simply not desirable (e.g. to preserve privacy). In this paper, we propose the use of learning to complement communication in acquiring knowledge about other agents. We augment the communication-intensive negotiating agent architecture with a learning module, implemented as a Bayesian classifier. This allows our agents to incrementally update models of other agents' preferences from past negotiations with them. Based on these models, the agents can make sound predictions about others' preferences, thus reducing the need for communication in their future interactions.
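A Bayesian classifier that is updated incrementally after each negotiation can be as simple as a categorical naive Bayes model whose counts grow over time. The sketch below assumes that design. The abstract does not describe the classifier's features or structure, so the offer attributes (`price`, `delivery`) and the accept/reject labels are hypothetical.

```python
import math
from collections import defaultdict

class IncrementalNaiveBayes:
    """Categorical naive Bayes whose counts are updated one observation at a time."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant
        self.class_counts = defaultdict(int)
        self.feature_counts = defaultdict(int)  # (label, feature, value) -> count
        self.feature_values = defaultdict(set)  # feature -> values seen so far
        self.total = 0

    def update(self, features, label):
        """Record one observed outcome, e.g. another agent's response to an offer."""
        self.total += 1
        self.class_counts[label] += 1
        for name, value in features.items():
            self.feature_counts[(label, name, value)] += 1
            self.feature_values[name].add(value)

    def predict_proba(self, features):
        """Posterior over labels for a new offer, computed in log space for stability."""
        n_classes = len(self.class_counts)
        log_scores = {}
        for label, count in self.class_counts.items():
            score = math.log((count + self.alpha) / (self.total + self.alpha * n_classes))
            for name, value in features.items():
                n_values = len(self.feature_values[name] | {value})
                joint = self.feature_counts[(label, name, value)]
                score += math.log((joint + self.alpha) / (count + self.alpha * n_values))
            log_scores[label] = score
        top = max(log_scores.values())
        exp_scores = {lab: math.exp(s - top) for lab, s in log_scores.items()}
        z = sum(exp_scores.values())
        return {lab: v / z for lab, v in exp_scores.items()}

# Hypothetical history of offers made to one other agent and its responses.
model = IncrementalNaiveBayes()
history = [
    ({"price": "low", "delivery": "fast"}, "accept"),
    ({"price": "low", "delivery": "slow"}, "accept"),
    ({"price": "high", "delivery": "fast"}, "reject"),
    ({"price": "high", "delivery": "slow"}, "reject"),
    ({"price": "low", "delivery": "fast"}, "accept"),
]
for offer, outcome in history:
    model.update(offer, outcome)
print(model.predict_proba({"price": "low", "delivery": "slow"}))
```

Because each update only increments counts, the model can be refreshed after every negotiation without retraining. The agent can then predict a likely response before deciding whether to communicate at all.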

Relevance:

20.00%

Abstract:

This study compares the effectiveness of Bayesian networks versus Decision Trees in modeling the Integral Theory of Female Urinary Incontinence diagnostic algorithm. Bayesian networks and Decision Trees were developed and trained using data from 58 adult women presenting with urinary incontinence symptoms. A Bayesian Network was developed in collaboration with an expert specialist who regularly uses a non-automated diagnostic algorithm in clinical practice. The original Bayesian network was later refined using a more connected approach. Diagnoses determined from all automated approaches were compared with the diagnoses of a single human expert. In most cases, Bayesian networks were found to be at least as accurate as the Decision Tree approach. The refined Connected Bayesian Network was found to be more accurate than the Original Bayesian Network. The Original Bayesian Network accurately discriminated between diagnoses despite the small sample size. In contrast, the Connected and Decision Tree approaches were less able to discriminate between diagnoses. The Original Bayesian Network was found to provide an excellent basis for graphically communicating the correlation between symptoms and laxity defects in a given anatomical zone. Performance measures in both networks indicate that Bayesian networks could provide a useful tool in the management of female pelvic floor dysfunction. Before the technique can be used in practice, well-established learning algorithms should be applied to improve network structure. A larger training data set should also improve network accuracy, sensitivity, and specificity.
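Comparing automated diagnoses with the expert's diagnoses reduces to counting agreements. Sensitivity and specificity are computed for one diagnosis at a time by treating it as the "positive" class and all others as "negative". The sketch below computes overall agreement (accuracy) plus one-vs-rest sensitivity and specificity. The anatomical-zone labels and cases are hypothetical.

```python
def diagnostic_metrics(predicted, reference, positive):
    """Overall accuracy, plus sensitivity and specificity for one diagnosis treated as positive."""
    pairs = list(zip(predicted, reference))
    tp = sum(p == positive and r == positive for p, r in pairs)
    tn = sum(p != positive and r != positive for p, r in pairs)
    fp = sum(p == positive and r != positive for p, r in pairs)
    fn = sum(p != positive and r == positive for p, r in pairs)
    accuracy = sum(p == r for p, r in pairs) / len(pairs)  # exact agreement across all classes
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return accuracy, sensitivity, specificity

# Hypothetical expert (reference) and model diagnoses for six patients.
expert = ["anterior", "middle", "posterior", "anterior", "posterior", "middle"]
model = ["anterior", "middle", "anterior", "anterior", "posterior", "posterior"]
print(diagnostic_metrics(model, expert, positive="anterior"))
```

With only 58 patients, each misclassified case shifts these proportions noticeably. This is consistent with the abstract's point that a larger training set should improve accuracy, sensitivity, and specificity.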

Relevance:

20.00%

Abstract:

This paper presents a novel Bayesian formulation to exploit shared structures across multiple data sources, constructing foundations for effective mining and retrieval across disparate domains. We jointly analyze diverse data sources using a unifying piece of metadata (textual tags). We propose a method based on Bayesian Probabilistic Matrix Factorization (BPMF) which is able to explicitly model the partial knowledge common to the datasets using shared subspaces and the knowledge specific to each dataset using individual subspaces. For the proposed model, we derive an efficient algorithm for learning the joint factorization based on Gibbs sampling. The effectiveness of the model is demonstrated by social media retrieval tasks across single and multiple media. The proposed solution is applicable to a wider context, providing a formal framework suitable for exploiting individual as well as mutual knowledge present across heterogeneous data sources of many kinds.
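The shared/individual subspace idea can be seen in the shape of the factorization. Each data source's tag-by-item matrix is approximated by a basis whose first columns are common to all sources and whose remaining columns belong to that source alone. The sketch below shows only this deterministic structure, $X_j \approx [W_{\text{shared}} \mid W_j] H_j$. It omits the Bayesian priors and the Gibbs sampler entirely, and all factor values and source names are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of a (r x t) and b (t x c)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def hstack(a, b):
    """Concatenate two matrices with the same number of rows side by side."""
    return [row_a + row_b for row_a, row_b in zip(a, b)]

def reconstruct_source(shared_basis, specific_basis, coefficients):
    """Mean reconstruction X_j = [W_shared | W_j] H_j for one data source."""
    return matmul(hstack(shared_basis, specific_basis), coefficients)

# Hypothetical factors: 3 tags, one shared and one source-specific latent dimension.
w_shared = [[1.0], [1.0], [0.0]]     # tag loadings common to every source
w_images = [[0.0], [0.0], [1.0]]     # tag loadings used only by the first source
w_videos = [[1.0], [0.0], [0.0]]     # tag loadings used only by the second source
h_images = [[1.0, 0.0], [0.0, 2.0]]  # (shared + specific dims) x 2 items
h_videos = [[0.5], [1.0]]            # (shared + specific dims) x 1 item

print(reconstruct_source(w_shared, w_images, h_images))
print(reconstruct_source(w_shared, w_videos, h_videos))
```

Because `w_shared` appears in every source's reconstruction, information learned from one source constrains the others through the shared columns. The specific columns absorb what is unique to each source.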