802 resultados para Data-driven Methods
Resumo:
Background: IgE is the pivotal-specific effector molecule of allergic reactions yet it remains unclear whether the elevated production of IgE in atopic individuals is due to superantigen activation of B cell populations, increased antibody class switching to IgE or oligoclonal allergen-driven IgE responses. Objectives: To increase our understanding of the mechanisms driving IgE responses in allergic disease we examined immunoglobulin variable regions of IgE heavy chain transcripts from three patients with seasonal rhinitis due to grass pollen allergy. Methods: Variable domain of heavy chain-epsilon constant domain 1 cDNAs were amplified from peripheral blood using a two-step semi-nested PCR, cloned and sequenced. Results: The VH gene family usage in subject A was broadly based, but there were two clusters of sequences using genes VH 3-9 and 3-11 with unusually low levels of somatic mutations, 0-3%. Subject B repeatedly used VH 1-69 and subject C repeatedly used VH 1-02, 1-46 and 5a genes. Most clones were highly mutated being only 86-95% homologous to their germline VH gene counterparts and somatic mutations were more abundant at the complementarity determining rather than framework regions. Multiple sequence alignment revealed both repeated use of particular VH genes as well as clonal relatedness among clusters of IgE transcripts. Conclusion: In contrast to previous studies we observed no preferred VH gene common to IgE transcripts of the three subjects allergic to grass pollen. Moreover, most of the VH gene characteristics of the IgE transcripts were consistent with oligoclonal antigen-driven IgE responses.
Resumo:
Background Spatial analysis is increasingly important for identifying modifiable geographic risk factors for disease. However, spatial health data from surveys are often incomplete, ranging from missing data for only a few variables, to missing data for many variables. For spatial analyses of health outcomes, selection of an appropriate imputation method is critical in order to produce the most accurate inferences. Methods We present a cross-validation approach to select between three imputation methods for health survey data with correlated lifestyle covariates, using as a case study, type II diabetes mellitus (DM II) risk across 71 Queensland Local Government Areas (LGAs). We compare the accuracy of mean imputation to imputation using multivariate normal and conditional autoregressive prior distributions. Results Choice of imputation method depends upon the application and is not necessarily the most complex method. Mean imputation was selected as the most accurate method in this application. Conclusions Selecting an appropriate imputation method for health survey data, after accounting for spatial correlation and correlation between covariates, allows more complete analysis of geographic risk factors for disease with more confidence in the results to inform public policy decision-making.
Resumo:
Cancer is the leading contributor to the disease burden in Australia. This thesis develops and applies Bayesian hierarchical models to facilitate an investigation of the spatial and temporal associations for cancer diagnosis and survival among Queenslanders. The key objectives are to document and quantify the importance of spatial inequalities, explore factors influencing these inequalities, and investigate how spatial inequalities change over time. Existing Bayesian hierarchical models are refined, new models and methods developed, and tangible benefits obtained for cancer patients in Queensland. The versatility of using Bayesian models in cancer control are clearly demonstrated through these detailed and comprehensive analyses.
Resumo:
In analysis of longitudinal data, the variance matrix of the parameter estimates is usually estimated by the 'sandwich' method, in which the variance for each subject is estimated by its residual products. We propose smooth bootstrap methods by perturbing the estimating functions to obtain 'bootstrapped' realizations of the parameter estimates for statistical inference. Our extensive simulation studies indicate that the variance estimators by our proposed methods can not only correct the bias of the sandwich estimator but also improve the confidence interval coverage. We applied the proposed method to a data set from a clinical trial of antibiotics for leprosy.
Resumo:
One difficulty in summarising biological survivorship data is that the hazard rates are often neither constant nor increasing with time or decreasing with time in the entire life span. The promising Weibull model does not work here. The paper demonstrates how bath tub shaped quadratic models may be used in such a case. Further, sometimes due to a paucity of data actual lifetimes are not as certainable. It is shown how a concept from queuing theory namely first in first out (FIFO) can be profitably used here. Another nonstandard situation considered is one in which lifespan of the individual entity is too long compared to duration of the experiment. This situation is dealt with, by using ancilliary information. In each case the methodology is illustrated with numerical examples.
Resumo:
The development of innovative methods of stock assessment is a priority for State and Commonwealth fisheries agencies. It is driven by the need to facilitate sustainable exploitation of naturally occurring fisheries resources for the current and future economic, social and environmental well being of Australia. This project was initiated in this context and took advantage of considerable recent achievements in genomics that are shaping our comprehension of the DNA of humans and animals. The basic idea behind this project was that genetic estimates of effective population size, which can be made from empirical measurements of genetic drift, were equivalent to estimates of the successful number of spawners that is an important parameter in process of fisheries stock assessment. The broad objectives of this study were to 1. Critically evaluate a variety of mathematical methods of calculating effective spawner numbers (Ne) by a. conducting comprehensive computer simulations, and by b. analysis of empirical data collected from the Moreton Bay population of tiger prawns (P. esculentus). 2. Lay the groundwork for the application of the technology in the northern prawn fishery (NPF). 3. Produce software for the calculation of Ne, and to make it widely available. The project pulled together a range of mathematical models for estimating current effective population size from diverse sources. Some of them had been recently implemented with the latest statistical methods (eg. Bayesian framework Berthier, Beaumont et al. 2002), while others had lower profiles (eg. Pudovkin, Zaykin et al. 1996; Rousset and Raymond 1995). Computer code and later software with a user-friendly interface (NeEstimator) was produced to implement the methods. This was used as a basis for simulation experiments to evaluate the performance of the methods with an individual-based model of a prawn population. Following the guidelines suggested by computer simulations, the tiger prawn population in Moreton Bay (south-east Queensland) was sampled for genetic analysis with eight microsatellite loci in three successive spring spawning seasons in 2001, 2002 and 2003. As predicted by the simulations, the estimates had non-infinite upper confidence limits, which is a major achievement for the application of the method to a naturally-occurring, short generation, highly fecund invertebrate species. The genetic estimate of the number of successful spawners was around 1000 individuals in two consecutive years. This contrasts with about 500,000 prawns participating in spawning. It is not possible to distinguish successful from non-successful spawners so we suggest a high level of protection for the entire spawning population. We interpret the difference in numbers between successful and non-successful spawners as a large variation in the number of offspring per family that survive – a large number of families have no surviving offspring, while a few have a large number. We explored various ways in which Ne can be useful in fisheries management. It can be a surrogate for spawning population size, assuming the ratio between Ne and spawning population size has been previously calculated for that species. Alternatively, it can be a surrogate for recruitment, again assuming that the ratio between Ne and recruitment has been previously determined. The number of species that can be analysed in this way, however, is likely to be small because of species-specific life history requirements that need to be satisfied for accuracy. The most universal approach would be to integrate Ne with spawning stock-recruitment models, so that these models are more accurate when applied to fisheries populations. A pathway to achieve this was established in this project, which we predict will significantly improve fisheries sustainability in the future. Regardless of the success of integrating Ne into spawning stock-recruitment models, Ne could be used as a fisheries monitoring tool. Declines in spawning stock size or increases in natural or harvest mortality would be reflected by a decline in Ne. This would be good for data-poor fisheries and provides fishery independent information, however, we suggest a species-by-species approach. Some species may be too numerous or experiencing too much migration for the method to work. During the project two important theoretical studies of the simultaneous estimation of effective population size and migration were published (Vitalis and Couvet 2001b; Wang and Whitlock 2003). These methods, combined with collection of preliminary genetic data from the tiger prawn population in southern Gulf of Carpentaria population and a computer simulation study that evaluated the effect of differing reproductive strategies on genetic estimates, suggest that this technology could make an important contribution to the stock assessment process in the northern prawn fishery (NPF). Advances in the genomics world are rapid and already a cheaper, more reliable substitute for microsatellite loci in this technology is available. Digital data from single nucleotide polymorphisms (SNPs) are likely to super cede ‘analogue’ microsatellite data, making it cheaper and easier to apply the method to species with large population sizes.
Resumo:
Background: Standard methods for quantifying IncuCyte ZOOM™ assays involve measurements that quantify how rapidly the initially-vacant area becomes re-colonised with cells as a function of time. Unfortunately, these measurements give no insight into the details of the cellular-level mechanisms acting to close the initially-vacant area. We provide an alternative method enabling us to quantify the role of cell motility and cell proliferation separately. To achieve this we calibrate standard data available from IncuCyte ZOOM™ images to the solution of the Fisher-Kolmogorov model. Results: The Fisher-Kolmogorov model is a reaction-diffusion equation that has been used to describe collective cell spreading driven by cell migration, characterised by a cell diffusivity, D, and carrying capacity limited proliferation with proliferation rate, λ, and carrying capacity density, K. By analysing temporal changes in cell density in several subregions located well-behind the initial position of the leading edge we estimate λ and K. Given these estimates, we then apply automatic leading edge detection algorithms to the images produced by the IncuCyte ZOOM™ assay and match this data with a numerical solution of the Fisher-Kolmogorov equation to provide an estimate of D. We demonstrate this method by applying it to interpret a suite of IncuCyte ZOOM™ assays using PC-3 prostate cancer cells and obtain estimates of D, λ and K. Comparing estimates of D, λ and K for a control assay with estimates of D, λ and K for assays where epidermal growth factor (EGF) is applied in varying concentrations confirms that EGF enhances the rate of scratch closure and that this stimulation is driven by an increase in D and λ, whereas K is relatively unaffected by EGF. Conclusions: Our approach for estimating D, λ and K from an IncuCyte ZOOM™ assay provides more detail about cellular-level behaviour than standard methods for analysing these assays. In particular, our approach can be used to quantify the balance of cell migration and cell proliferation and, as we demonstrate, allow us to quantify how the addition of growth factors affects these processes individually.
Resumo:
Socio-economic and demographic changes among family forest owners and demands for versatile forestry decision aid motivated this study, which sought grounds for owner-driven forest planning. Finnish family forest owners’ forest-related decision making was analyzed in two interview-based qualitative studies, the main findings of which were surveyed quantitatively. Thereafter, a scheme for adaptively mixing methods in individually tailored decision support processes was constructed. The first study assessed owners’ decision-making strategies by examining varying levels of the sharing of decision-making power and the desire to learn. Five decision-making modes – trusting, learning, managing, pondering, and decisive – were discerned and discussed against conformable decision-aid approaches. The second study conceptualized smooth communication and assessed emotional, practical, and institutional boosters of and barriers to such smoothness in communicative decision support. The results emphasize the roles of trust, comprehension, and contextual services in owners’ communicative decision making. In the third study, a questionnaire tool to measure owners’ attitudes towards communicative planning was constructed by using trusting, learning, and decisive dimensions. Through a multivariate analysis of survey data, three owner groups were identified as fusions of the original decision-making modes: trusting learners (53%), decisive learners (27%), and decisive managers (20%). Differently weighted communicative services are recommended for these compound wishes. The findings of the studies above were synthesized in a form of adaptive decision analysis (ADA), which allows and encourages the decision-maker (owner) to make deliberate choices concerning the phases of a decision aid (planning) process. The ADA model relies on adaptability and feedback management, which foster smooth communication with the owner and (inter-)organizational learning of the planning institution(s). The summarized results indicate that recognizing the communication-related amenity values of family forest owners may be crucial in developing planning and extension services. It is therefore recommended that owners, root-level planners, consultation professionals, and pragmatic researchers collaboratively continue to seek stable change.
Resumo:
Large-scale chromosome rearrangements such as copy number variants (CNVs) and inversions encompass a considerable proportion of the genetic variation between human individuals. In a number of cases, they have been closely linked with various inheritable diseases. Single-nucleotide polymorphisms (SNPs) are another large part of the genetic variance between individuals. They are also typically abundant and their measuring is straightforward and cheap. This thesis presents computational means of using SNPs to detect the presence of inversions and deletions, a particular variety of CNVs. Technically, the inversion-detection algorithm detects the suppressed recombination rate between inverted and non-inverted haplotype populations whereas the deletion-detection algorithm uses the EM-algorithm to estimate the haplotype frequencies of a window with and without a deletion haplotype. As a contribution to population biology, a coalescent simulator for simulating inversion polymorphisms has been developed. Coalescent simulation is a backward-in-time method of modelling population ancestry. Technically, the simulator also models multiple crossovers by using the Counting model as the chiasma interference model. Finally, this thesis includes an experimental section. The aforementioned methods were tested on synthetic data to evaluate their power and specificity. They were also applied to the HapMap Phase II and Phase III data sets, yielding a number of candidates for previously unknown inversions, deletions and also correctly detecting known such rearrangements.
Resumo:
In this Thesis, we develop theory and methods for computational data analysis. The problems in data analysis are approached from three perspectives: statistical learning theory, the Bayesian framework, and the information-theoretic minimum description length (MDL) principle. Contributions in statistical learning theory address the possibility of generalization to unseen cases, and regression analysis with partially observed data with an application to mobile device positioning. In the second part of the Thesis, we discuss so called Bayesian network classifiers, and show that they are closely related to logistic regression models. In the final part, we apply the MDL principle to tracing the history of old manuscripts, and to noise reduction in digital signals.
Resumo:
The Minimum Description Length (MDL) principle is a general, well-founded theoretical formalization of statistical modeling. The most important notion of MDL is the stochastic complexity, which can be interpreted as the shortest description length of a given sample of data relative to a model class. The exact definition of the stochastic complexity has gone through several evolutionary steps. The latest instantation is based on the so-called Normalized Maximum Likelihood (NML) distribution which has been shown to possess several important theoretical properties. However, the applications of this modern version of the MDL have been quite rare because of computational complexity problems, i.e., for discrete data, the definition of NML involves an exponential sum, and in the case of continuous data, a multi-dimensional integral usually infeasible to evaluate or even approximate accurately. In this doctoral dissertation, we present mathematical techniques for computing NML efficiently for some model families involving discrete data. We also show how these techniques can be used to apply MDL in two practical applications: histogram density estimation and clustering of multi-dimensional data.
Resumo:
Matrix decompositions, where a given matrix is represented as a product of two other matrices, are regularly used in data mining. Most matrix decompositions have their roots in linear algebra, but the needs of data mining are not always those of linear algebra. In data mining one needs to have results that are interpretable -- and what is considered interpretable in data mining can be very different to what is considered interpretable in linear algebra. --- The purpose of this thesis is to study matrix decompositions that directly address the issue of interpretability. An example is a decomposition of binary matrices where the factor matrices are assumed to be binary and the matrix multiplication is Boolean. The restriction to binary factor matrices increases interpretability -- factor matrices are of the same type as the original matrix -- and allows the use of Boolean matrix multiplication, which is often more intuitive than normal matrix multiplication with binary matrices. Also several other decomposition methods are described, and the computational complexity of computing them is studied together with the hardness of approximating the related optimization problems. Based on these studies, algorithms for constructing the decompositions are proposed. Constructing the decompositions turns out to be computationally hard, and the proposed algorithms are mostly based on various heuristics. Nevertheless, the algorithms are shown to be capable of finding good results in empirical experiments conducted with both synthetic and real-world data.
Resumo:
Background Ankylosing spondylitis (AS) is an immune-mediated arthritis particularly targeting the spine and pelvis and is characterised by inflammation, osteoproliferation and frequently ankylosis. Current treatments that predominately target inflammatory pathways have disappointing efficacy in slowing disease progression. Thus, a better understanding of the causal association and pathological progression from inflammation to bone formation, particularly whether inflammation directly initiates osteoproliferation, is required. Methods The proteoglycan-induced spondylitis (PGISp) mouse model of AS was used to histopathologically map the progressive axial disease events, assess molecular changes during disease progression and define disease progression using unbiased clustering of semi-quantitative histology. PGISp mice were followed over a 24-week time course. Spinal disease was assessed using a novel semi-quantitative histological scoring system that independently evaluated the breadth of pathological features associated with PGISp axial disease, including inflammation, joint destruction and excessive tissue formation (osteoproliferation). Matrix components were identified using immunohistochemistry. Results Disease initiated with inflammation at the periphery of the intervertebral disc (IVD) adjacent to the longitudinal ligament, reminiscent of enthesitis, and was associated with upregulated tumor necrosis factor and metalloproteinases. After a lag phase, established inflammation was temporospatially associated with destruction of IVDs, cartilage and bone. At later time points, advanced disease was characterised by substantially reduced inflammation, excessive tissue formation and ectopic chondrocyte expansion. These distinct features differentiated affected mice into early, intermediate and advanced disease stages. Excessive tissue formation was observed in vertebral joints only if the IVD was destroyed as a consequence of the early inflammation. Ectopic excessive tissue was predominantly chondroidal with chondrocyte-like cells embedded within collagen type II- and X-rich matrix. This corresponded with upregulation of mRNA for cartilage markers Col2a1, sox9 and Comp. Osteophytes, though infrequent, were more prevalent in later disease. Conclusions The inflammation-driven IVD destruction was shown to be a prerequisite for axial disease progression to osteoproliferation in the PGISp mouse. Osteoproliferation led to vertebral body deformity and fusion but was never seen concurrent with persistent inflammation, suggesting a sequential process. The findings support that early intervention with anti-inflammatory therapies will be needed to limit destructive processes and consequently prevent progression of AS.