980 results for Approximate Bayesian Computation


Relevance:

20.00%

Publisher:

Abstract:

Most work on association rule mining has focused on the efficiency of the approach; the quality of the derived rules has received less attention. A huge number of rules can often be derived from a dataset, but many of them are redundant with respect to other rules and are thus useless in practice. The extremely large number of rules makes it difficult for end users to comprehend, and therefore effectively use, the discovered rules, which significantly reduces the effectiveness of rule mining algorithms. If the extracted knowledge cannot be used effectively to solve real-world problems, the effort of extracting it is worth little. This is a serious problem that has not yet been solved satisfactorily. In this paper, we propose a concise representation, called the Reliable Approximate basis, for representing non-redundant approximate association rules. We prove that redundancy elimination based on the proposed basis does not reduce the belief in the extracted rules. We also prove that all approximate association rules can be deduced from the Reliable Approximate basis. The basis is therefore a lossless representation of approximate association rules.
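As a rough illustration of what an approximate association rule is, the sketch below computes the support and confidence of a rule over a toy transaction list; a rule whose confidence is below 1 is approximate rather than exact. The function names and the transaction data are hypothetical and are not taken from the paper, and the sketch does not implement the Reliable Approximate basis itself.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent.

    Assumes the antecedent occurs in at least one transaction;
    otherwise the division below would fail.
    """
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / support(antecedent, transactions)


# Hypothetical toy data, for illustration only.
transactions = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
)]

# Rule {bread} -> {milk}: holds in 2 of the 3 transactions containing bread,
# so it is an approximate rule (confidence < 1).
rule_conf = confidence({"bread"}, {"milk"}, transactions)
```

Two rules with the same support and confidence, where one has a more general antecedent, are the kind of pair a redundancy-elimination scheme would need to compare.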

Relevance:

20.00%

Publisher:

Abstract:

Objective: We aimed to predict sub-national spatial variation in the numbers of people infected with Schistosoma haematobium, and the associated uncertainties, in Burkina Faso, Mali and Niger, prior to implementation of national control programmes. Methods: We used national field survey datasets covering a contiguous area of 2,750 × 850 km, from 26,790 school-aged children (5–14 years) in 418 schools. Bayesian geostatistical models were used to predict the prevalence of high- and low-intensity infections and the associated 95% credible intervals (CrI). Numbers infected were determined by multiplying predicted prevalence by the numbers of school-aged children in 1 km² pixels covering the study area. Findings: Numbers of school-aged children with low-intensity infections were 433,268 in Burkina Faso, 872,328 in Mali and 580,286 in Niger. Numbers with high-intensity infections were 416,009 in Burkina Faso, 511,845 in Mali and 254,150 in Niger. The 95% CrIs (indicative of uncertainty) were wide; e.g. the mean number of boys aged 10–14 years infected in Mali was 140,200 (95% CrI 6,200, 512,100). Conclusion: National aggregate estimates of numbers infected mask important local variation; e.g. most S. haematobium infections in Niger occur in the Niger River valley. Prevalence of high-intensity infections was strongly clustered in foci in western and central Mali, north-eastern and north-western Burkina Faso, and the Niger River valley in Niger. Populations in these foci are likely to carry the bulk of the urinary schistosomiasis burden and should receive priority for schistosomiasis control. Uncertainties in predicted prevalence and numbers infected should be acknowledged and taken into consideration by control programme planners.
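The step from predicted prevalence to numbers infected can be sketched as a per-pixel multiplication followed by a sum. The values below are hypothetical and are not taken from the study, and the sketch ignores the posterior uncertainty that the credible intervals capture.

```python
# Hypothetical values for three 1 km^2 pixels (illustration only).
predicted_prevalence = [0.10, 0.25, 0.40]   # predicted infection prevalence per pixel
school_aged_children = [200, 80, 50]        # school-aged children living in each pixel


def numbers_infected(prevalence, population):
    """Expected number infected in each pixel: prevalence times population."""
    return [p * n for p, n in zip(prevalence, population)]


per_pixel = numbers_infected(predicted_prevalence, school_aged_children)

# Aggregating over pixels gives a regional or national estimate.
total_infected = sum(per_pixel)
```

Repeating the same calculation for each posterior draw of the prevalence surface, rather than for a single point estimate, would be one way to obtain an interval for the total.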

Relevance:

20.00%

Publisher:

Abstract:

Definition of disease phenotype is a necessary preliminary to research into genetic causes of a complex disease. Clinical diagnosis of migraine is currently based on diagnostic criteria developed by the International Headache Society. Previously, we examined the natural clustering of these diagnostic symptoms using latent class analysis (LCA) and found that a four-class model was preferred. However, the classes can be ordered such that all symptoms progressively intensify, suggesting that a single continuous variable representing disease severity may provide a better model. Here, we compare two models: item response theory and LCA, each constructed within a Bayesian context. A deviance information criterion is used to assess model fit. We phenotyped our population sample using these models, estimated heritability and conducted genome-wide linkage analysis using Merlin-qtl. LCA with four classes was again preferred. After transformation, phenotypic trait values derived from both models are highly correlated (correlation = 0.99) and consequently results from subsequent genetic analyses were similar. Heritability was estimated at 0.37, while multipoint linkage analysis produced genome-wide significant linkage to chromosome 7q31-q33 and suggestive linkage to chromosomes 1 and 2. We argue that such continuous measures are a powerful tool for identifying genes contributing to migraine susceptibility.

Relevance:

20.00%

Publisher:

Abstract:

Perez-Losada et al. [1] analyzed 72 complete genomes corresponding to nine mammalian (67 strains) and two avian (5 strains) polyomavirus species using maximum likelihood and Bayesian methods of phylogenetic inference. Because data for two of the genomes in their work are no longer available in GenBank, in this work we analyze the phylogenetic relationships of the remaining 70 complete genomes, corresponding to nine mammalian (65 strains) and two avian (5 strains) polyomavirus species, using a dynamical language model approach developed by our group (Yu et al. [26]). This distance method does not require sequence alignment to derive a species phylogeny based on the overall similarity of the complete genomes. Our best tree separates the bird polyomaviruses (avian polyomaviruses and goose hemorrhagic polyomaviruses) from the mammalian polyomaviruses, which supports the idea of splitting the genus into two subgenera. Such a split is consistent with the different viral life strategies of each group. Within the mammalian polyomavirus subgenus, mouse polyomaviruses (MPV), simian viruses 40 (SV40), BK viruses (BKV) and JC viruses (JCV) are grouped into different branches, as expected. The topology of our best tree is quite similar to that of the tree constructed by Perez-Losada et al.

Relevance:

20.00%

Publisher:

Abstract:

Industrial applications of simulated-moving-bed (SMB) chromatographic technology have created an emerging demand to improve SMB process operation for higher efficiency and better robustness. Improved process modelling and more efficient model computation will pave the way to meeting this demand. However, the SMB unit operation exhibits complex dynamics, which leads to challenges in SMB process modelling and model computation. One significant problem is how to quickly obtain the steady state of an SMB process model, as process metrics at the steady state are critical for process design and real-time control. The conventional computation method, which solves the process model cycle by cycle and takes the solution only when a cyclic steady state is reached after a certain number of switchings, is computationally expensive. Adopting the concept of the quasi-envelope (QE), this work treats the SMB operation as a pseudo-oscillatory process because of its large number of continuous switchings. An innovative QE computation scheme is then developed to quickly obtain the steady-state solution of an SMB model for any arbitrary initial condition. The QE computation scheme allows larger steps to be taken when predicting the slow change of the starting state within each switching. Combined with a wavelet-based technique, this scheme is demonstrated to be effective and efficient for an SMB sugar separation process. Investigations are also carried out on when the computation scheme should be activated and how the convergence of the scheme is affected by a variable step size.

Relevance:

20.00%

Publisher:

Abstract:

Object tracking systems require accurate segmentation of objects from the background for effective tracking. Motion segmentation or optical flow can be used to segment incoming images. Whilst optical flow allows multiple moving targets to be separated based on their individual velocities, optical flow techniques are prone to errors caused by changing lighting and occlusions, both of which are common in a surveillance environment. Motion segmentation techniques are more robust to fluctuating lighting and occlusions, but do not provide information on the direction of motion. In this paper we propose a combined motion segmentation/optical flow algorithm for use in object tracking. The proposed algorithm uses the motion segmentation results to inform the optical flow calculations, both to ensure that optical flow is only calculated in regions of motion and to improve the performance of the optical flow around the edges of moving objects. Optical flow is calculated at pixel resolution, and tracking of flow vectors is employed to improve performance and to detect discontinuities, which can indicate the location of overlaps between objects. The algorithm is evaluated by attempting to extract a moving target within the flow images, given the expected horizontal and vertical movement (i.e. the algorithm's intended use for object tracking). Results show that the proposed algorithm outperforms other widely used optical flow techniques for this surveillance application.

Relevance:

20.00%

Publisher:

Abstract:

The main objective of this PhD was to further develop Bayesian spatio-temporal models (specifically the Conditional Autoregressive (CAR) class of models) for the analysis of sparse disease outcomes such as birth defects. The motivation for the thesis arose from problems encountered when analyzing a large birth defect registry in New South Wales. The specific components and related research objectives of the thesis were developed from gaps in the literature on current formulations of the CAR model, and from health service planning requirements. Data from a large probabilistically-linked database covering 1990 to 2004, consisting of fields from two separate registries, the Birth Defect Registry (BDR) and the Midwives Data Collection (MDC), were used in the analyses in this thesis. The main objective was split into smaller goals. The first goal was to determine how the specification of the neighbourhood weight matrix affects the smoothing properties of the CAR model; this is the focus of chapter 6. Secondly, I hoped to evaluate the usefulness of incorporating a zero-inflated Poisson (ZIP) component, as well as a shared-component model, for modelling a sparse outcome; this is carried out in chapter 7. The third goal was to identify optimal sampling and sample size schemes designed to select individual-level data for a hybrid ecological spatial model; this is done in chapter 8. Finally, I wanted to put together the earlier improvements to the CAR model and, along with demographic projections, provide forecasts for birth defects at the Statistical Local Area (SLA) level. Chapter 9 describes how this is done. For the first objective, I examined a series of neighbourhood weight matrices, and showed how smoothing the relative risk estimates according to similarity in an important covariate (i.e. maternal age) helped improve the model's ability to recover the underlying risk, compared with the traditional adjacency (specifically the Queen) method of applying weights.
Next, to address the sparseness and excess zeros commonly encountered in the analysis of rare outcomes such as birth defects, I compared several models, including an extension of the usual Poisson model to encompass excess zeros in the data. This was achieved via a mixture model, which also encompassed the shared component model to improve the estimation of sparse counts by borrowing strength across a shared component (e.g. latent risk factor/s) with the referent outcome (caesarean section was used in this example). Using the Deviance Information Criterion (DIC), I showed how the proposed model performed better than the usual models, but only when both outcomes shared a strong spatial correlation. The next objective involved identifying the optimal sampling and sample size strategy for incorporating individual-level data with areal covariates in a hybrid study design. I performed extensive simulation studies, evaluating thirteen different sampling schemes along with variations in sample size. This was done in the context of an ecological regression model that incorporated spatial correlation in the outcomes and accommodated both individual and areal measures of covariates. Using the Average Mean Squared Error (AMSE), I showed how a simple random sample of 20% of the SLAs, followed by selecting all cases in the SLAs chosen along with an equal number of controls, provided the lowest AMSE. The final objective involved combining the improved spatio-temporal CAR model with population (i.e. women) forecasts to provide 30-year annual estimates of birth defects at the Statistical Local Area (SLA) level in New South Wales, Australia. The projections were illustrated using sixteen different SLAs, representing the various areal measures of socio-economic status and remoteness. A sensitivity analysis of the assumptions used in the projection was also undertaken.
By the end of the thesis, I will show how challenges in the spatial analysis of rare diseases such as birth defects can be addressed, by specifically formulating the neighbourhood weight matrix to smooth according to a key covariate (i.e. maternal age), incorporating a ZIP component to model excess zeros in outcomes and borrowing strength from a referent outcome (i.e. caesarean counts). An efficient strategy to sample individual-level data and sample size considerations for rare disease will also be presented. Finally, projections in birth defect categories at the SLA level will be made.
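To make the zero-inflated Poisson (ZIP) component concrete, the sketch below evaluates the standard ZIP probability mass function: with probability `pi` an observation is a structural zero, and otherwise it follows a Poisson distribution with mean `lam`. The parameter names are assumptions made for illustration; the thesis embeds this kind of component in a spatial mixture model, which is not reproduced here.

```python
import math


def zip_pmf(k, lam, pi):
    """P(Y = k) under a zero-inflated Poisson distribution.

    k   : non-negative integer count
    lam : mean of the Poisson part (hypothetical name)
    pi  : probability of a structural (excess) zero (hypothetical name)
    """
    poisson = math.exp(-lam) * lam**k / math.factorial(k)
    if k == 0:
        # Zeros arise either structurally or from the Poisson part.
        return pi + (1 - pi) * poisson
    return (1 - pi) * poisson


# With 30% structural zeros and a Poisson mean of 2, zeros are far more
# likely than under a plain Poisson(2) model.
p_zero_zip = zip_pmf(0, lam=2.0, pi=0.3)
p_zero_poisson = zip_pmf(0, lam=2.0, pi=0.0)
```

Setting `pi` to 0 recovers the ordinary Poisson model, which is why the ZIP model can be read as an extension of it.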

Relevance:

20.00%

Publisher:

Abstract:

Ecological problems are typically multifaceted and need to be addressed from both a scientific and a management perspective. There is a wealth of modelling and simulation software available, each package designed to address a particular aspect of the issue of concern. Choosing the appropriate tool, making sense of the disparate outputs, and taking decisions when little or no empirical data is available are everyday challenges facing the ecologist and environmental manager. Bayesian Networks (BNs) provide a statistical modelling framework that enables analysis and integration of information in its own right, as well as integration of a variety of models addressing different aspects of a common overall problem. There has been increased interest in the use of BNs to model environmental systems and issues of concern. However, the development of more sophisticated BNs, utilising dynamic and object-oriented (OO) features, is still at the frontier of ecological research. Such features are particularly appealing in an ecological context, since the underlying facts are often spatial and temporal in nature. This thesis focuses on an integrated BN approach which facilitates OO modelling. Our research devises a new heuristic method, the Iterative Bayesian Network Development Cycle (IBNDC), for the development of BN models within a multi-field and multi-expert context. Expert elicitation is a popular method used to quantify BNs when data is sparse but expert knowledge is abundant. The resulting BNs need to be substantiated and validated taking this uncertainty into account. Our research demonstrates the application of the IBNDC approach to support these aspects of BN modelling. The complex nature of environmental issues makes them ideal case studies for the proposed integrated approach to modelling.
Moreover, they lend themselves to a series of integrated sub-networks describing different scientific components, combining scientific and management perspectives, or pooling similar contributions developed in different locations by different research groups. In southern Africa the two largest free-ranging cheetah (Acinonyx jubatus) populations are in Namibia and Botswana, where the majority of cheetahs are located outside protected areas. Consequently, cheetah conservation in these two countries is focussed primarily on the free-ranging populations as well as on the mitigation of conflict between humans and cheetahs. In contrast, in neighbouring South Africa, the majority of cheetahs are found in fenced reserves. Nonetheless, conflict between humans and cheetahs remains an issue there. Conservation effort in South Africa is also focussed on managing the geographically isolated cheetah populations as one large meta-population. Relocation is one option among a suite of tools used to resolve human-cheetah conflict in southern Africa. Successfully relocating captured problem cheetahs and maintaining a viable free-ranging cheetah population are the two environmental issues in cheetah conservation that form the first case study in this thesis. The second case study involves the initiation of blooms of Lyngbya majuscula, a blue-green alga, in Deception Bay, Australia. L. majuscula forms toxic blooms that have severe health, ecological and economic impacts on the community located in their vicinity. Deception Bay is an important tourist destination owing to its proximity to Brisbane, Australia's third largest city. Lyngbya is one of several algae considered to form Harmful Algal Blooms (HABs). This group of algae includes other widespread blooms such as red tides. Lyngbya blooms are not a local phenomenon: blooms of this toxic weed occur in coastal waters worldwide.
With the increase in the frequency and extent of these HABs, it is important to gain a better understanding of the underlying factors contributing to the initiation and sustenance of these blooms. This knowledge will contribute to better management practices and to the identification of management actions that could prevent these blooms or diminish their severity.