868 results for data-types
Abstract:
This dissertation is primarily an applied statistical modelling investigation, motivated by a case study comprising real data and real questions. Theoretical questions on modelling and computation of normalization constants arose from pursuit of these data analytic questions. The essence of the thesis can be described as follows. Consider binary data observed on a two-dimensional lattice. A common problem with such data is the ambiguity of the recorded zeroes. These may represent zero response given some threshold (presence) or that the threshold has not been triggered (absence). Suppose that the researcher wishes to estimate the effects of covariates on the binary responses, whilst taking into account underlying spatial variation, which is itself of some interest. This situation arises in many contexts and the dingo, cypress and toad case studies described in the motivation chapter are examples of this. Two main approaches to modelling and inference are investigated in this thesis. The first is frequentist and based on generalized linear models, with spatial variation modelled by using a block structure or by smoothing the residuals spatially. The EM algorithm can be used to obtain point estimates, coupled with bootstrapping or asymptotic MLE estimates for standard errors. The second approach is Bayesian and based on a three- or four-tier hierarchical model, comprising a logistic regression with covariates for the data layer, a binary Markov random field (MRF) for the underlying spatial process, and suitable priors for parameters in these main models. The three-parameter autologistic model is a particular MRF of interest. Markov chain Monte Carlo (MCMC) methods comprising hybrid Metropolis/Gibbs samplers are suitable for computation in this situation. Model performance can be gauged by MCMC diagnostics. Model choice can be assessed by incorporating another tier in the modelling hierarchy.
This requires evaluation of a normalization constant, a notoriously difficult problem. Difficulty with estimating the normalization constant for the MRF can be overcome by using a path integral approach, although this is a highly computationally intensive method. Different methods of estimating ratios of normalization constants (NCs) are investigated, including importance sampling Monte Carlo (ISMC), dependent Monte Carlo based on MCMC simulations (MCMC), and reverse logistic regression (RLR). I develop an idea present though not fully developed in the literature, and propose the integrated mean canonical statistic (IMCS) method for estimating log NC ratios for binary MRFs. The IMCS method falls within the framework of the newly identified path sampling methods of Gelman & Meng (1998) and outperforms ISMC, MCMC and RLR. It also does not rely on simplifying assumptions, such as ignoring spatio-temporal dependence in the process. A thorough investigation is made of the application of IMCS to the three-parameter autologistic model. This work introduces background computations required for the full implementation of the four-tier model in Chapter 7. Two different extensions of the three-tier model to a four-tier version are investigated. The first extension incorporates temporal dependence in the underlying spatio-temporal process. The second extension allows the successes and failures in the data layer to depend on time. The MCMC computational method is extended to incorporate the extra layer. A major contribution of the thesis is the development of a fully Bayesian approach to inference for these hierarchical models for the first time. Note: The author of this thesis has agreed to make it open access but invites people downloading the thesis to send her an email via the 'Contact Author' function.
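The path sampling idea mentioned above can be illustrated on a toy problem. The sketch below is not the thesis's IMCS implementation. It assumes a simplified two-parameter autologistic model, p(x) proportional to exp(alpha * sum of x_i + beta * sum of x_i x_j over adjacent sites), on a 3 x 3 lattice, and uses the identity that the derivative of the log normalization constant with respect to beta equals the mean of the canonical statistic (the count of adjacent 1-1 pairs). Integrating that mean over beta gives the log NC ratio. To keep the example deterministic, the mean is computed by exact enumeration rather than by MCMC, and the integral uses the trapezoid rule. All function names and parameter values are hypothetical.

```python
import itertools
import math


def lattice_pairs(n):
    """Horizontally and vertically adjacent site pairs on an n x n lattice."""
    pairs = []
    for r in range(n):
        for c in range(n):
            i = r * n + c
            if c + 1 < n:
                pairs.append((i, i + 1))
            if r + 1 < n:
                pairs.append((i, i + n))
    return pairs


def enumerate_statistics(n):
    """(number of ones, number of adjacent 1-1 pairs) for every configuration."""
    pairs = lattice_pairs(n)
    stats = []
    for x in itertools.product((0, 1), repeat=n * n):
        stats.append((sum(x), sum(x[i] * x[j] for i, j in pairs)))
    return stats


def log_nc(alpha, beta, stats):
    """Exact log normalization constant (log-sum-exp over all configurations)."""
    terms = [alpha * s + beta * t for s, t in stats]
    m = max(terms)
    return m + math.log(sum(math.exp(v - m) for v in terms))


def mean_pair_statistic(alpha, beta, stats):
    """Exact mean of the pair statistic; in practice this would come from MCMC."""
    terms = [alpha * s + beta * t for s, t in stats]
    m = max(terms)
    weights = [math.exp(v - m) for v in terms]
    total = sum(weights)
    return sum(w * t for w, (_, t) in zip(weights, stats)) / total


def log_nc_ratio_by_path(alpha, beta0, beta1, stats, grid_points=51):
    """Trapezoid-rule integral of the mean pair statistic from beta0 to beta1."""
    step = (beta1 - beta0) / (grid_points - 1)
    means = [mean_pair_statistic(alpha, beta0 + k * step, stats)
             for k in range(grid_points)]
    return step * (sum(means) - 0.5 * (means[0] + means[-1]))


STATS = enumerate_statistics(3)  # 2**9 = 512 configurations
path_estimate = log_nc_ratio_by_path(-0.5, 0.0, 0.5, STATS)
exact_ratio = log_nc(-0.5, 0.5, STATS) - log_nc(-0.5, 0.0, STATS)
print(path_estimate, exact_ratio)
```

On a lattice this small the two printed values can be compared directly. On realistic lattices exact enumeration is infeasible, which is why the mean canonical statistic would have to be estimated by simulation at each grid point.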
Abstract:
Many large coal mining operations in Australia rely heavily on the rail network to transport coal from mines to coal terminals at ports for shipment. Over the last few years, due to fast-growing demand, the coal rail network has become one of the worst industrial bottlenecks in Australia. This provides a strong incentive to pursue better optimisation and control strategies for the operation of the whole rail transportation system under network and terminal capacity constraints. This PhD research aims to achieve a significant efficiency improvement in a coal rail network on the basis of the development of standard modelling approaches and generic solution techniques. Generally, the train scheduling problem can be modelled as a Blocking Parallel-Machine Job-Shop Scheduling (BPMJSS) problem. In a BPMJSS model for train scheduling, trains and sections respectively are synonymous with jobs and machines and an operation is regarded as the movement/traversal of a train across a section. To begin, an improved shifting bottleneck procedure algorithm combined with metaheuristics has been developed to efficiently solve the Parallel-Machine Job-Shop Scheduling (PMJSS) problems without the blocking conditions. Due to the lack of buffer space, real-life train scheduling should consider blocking or hold-while-wait constraints, which means that a track section cannot release and must hold a train until the next section on the routing becomes available. As a consequence, the problem has been considered as BPMJSS with the blocking conditions. To develop efficient solution techniques for BPMJSS, extensive studies on the non-classical scheduling problems regarding the various buffer conditions (i.e. blocking, no-wait, limited-buffer, unlimited-buffer and combined-buffer) have been done.
In this procedure, an alternative graph as an extension of the classical disjunctive graph is developed and specially designed for the non-classical scheduling problems such as the blocking flow-shop scheduling (BFSS), no-wait flow-shop scheduling (NWFSS), and blocking job-shop scheduling (BJSS) problems. By exploring the blocking characteristics based on the alternative graph, a new algorithm called the topological-sequence algorithm is developed for solving the non-classical scheduling problems. To indicate the preeminence of the proposed algorithm, we compare it with two known algorithms (i.e. Recursive Procedure and Directed Graph) in the literature. Moreover, we define a new type of non-classical scheduling problem, called combined-buffer flow-shop scheduling (CBFSS), which covers four extreme cases: the classical FSS (FSS) with infinite buffer, the blocking FSS (BFSS) with no buffer, the no-wait FSS (NWFSS) and the limited-buffer FSS (LBFSS). After exploring the structural properties of CBFSS, we propose an innovative constructive algorithm named the LK algorithm to construct the feasible CBFSS schedule. Detailed numerical illustrations for the various cases are presented and analysed. By adjusting only the attributes in the data input, the proposed LK algorithm is generic and enables the construction of the feasible schedules for many types of non-classical scheduling problems with different buffer constraints. Inspired by the shifting bottleneck procedure algorithm for PMJSS and characteristic analysis based on the alternative graph for non-classical scheduling problems, a new constructive algorithm called the Feasibility Satisfaction Procedure (FSP) is proposed to obtain the feasible BPMJSS solution. A real-world train scheduling case is used for illustrating and comparing the PMJSS and BPMJSS models. 
Some real-life applications including considering the train length, upgrading the track sections, accelerating a tardy train and changing the bottleneck sections are discussed. Furthermore, the BPMJSS model is generalised to be a No-Wait Blocking Parallel-Machine Job-Shop Scheduling (NWBPMJSS) problem for scheduling the trains with priorities, in which prioritised trains such as express passenger trains are considered simultaneously with non-prioritised trains such as freight trains. In this case, no-wait conditions, which are more restrictive constraints than blocking constraints, arise when considering the prioritised trains that should traverse continuously without any interruption or any unplanned pauses because of the high cost of waiting during travel. In comparison, non-prioritised trains are allowed to enter the next section immediately if possible or to remain in a section until the next section on the routing becomes available. Based on the FSP algorithm, a more generic algorithm called the SE algorithm is developed to solve a class of train scheduling problems in terms of different conditions in train scheduling environments. To construct the feasible train schedule, the proposed SE algorithm consists of many individual modules including the feasibility-satisfaction procedure, time-determination procedure, tune-up procedure and conflict-resolve procedure algorithms. To find a good train schedule, a two-stage hybrid heuristic algorithm called the SE-BIH algorithm is developed by combining the constructive heuristic (i.e. the SE algorithm) and the local-search heuristic (i.e. the Best-Insertion-Heuristic algorithm). To optimise the train schedule, a three-stage algorithm called the SE-BIH-TS algorithm is developed by combining the tabu search (TS) metaheuristic with the SE-BIH algorithm. Finally, a case study is performed for a complex real-world coal rail network under network and terminal capacity constraints.
The computational results indicate that the proposed methodology is promising, because it can be applied as a fundamental tool for modelling and solving many real-world scheduling problems.
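The effect of the blocking (hold-while-wait) constraint described above can be seen in a much simpler setting than BPMJSS. The sketch below is not one of the thesis's algorithms (FSP, SE, LK and the others are not reproduced here). It assumes a flow-shop simplification in which every train traverses the same sections in the same order and the train sequence is fixed, and it compares the standard departure-time recurrence for a blocking flow shop with the completion-time recurrence for unlimited buffers. The traversal times and function names are hypothetical.

```python
def blocking_makespan(times):
    """Makespan when a train must hold its section until the next one is free.

    times[j][k] is the traversal time of train j on section k; trains are
    processed in list order. dep[k] is the time the current train leaves
    section k (1-indexed); dep[0] is the time it enters the first section.
    """
    n_sections = len(times[0])
    prev = [0] * (n_sections + 1)  # departure times of the previous train
    for job in times:
        dep = [0] * (n_sections + 1)
        dep[0] = prev[1]  # enter section 1 once the previous train has left it
        for k in range(1, n_sections):
            # Leave section k when traversal is done AND section k+1 is free.
            dep[k] = max(dep[k - 1] + job[k - 1], prev[k + 1])
        dep[n_sections] = dep[n_sections - 1] + job[n_sections - 1]
        prev = dep
    return prev[n_sections]


def unlimited_buffer_makespan(times):
    """Makespan of the classical flow shop, where finished trains can wait off-section."""
    n_sections = len(times[0])
    prev = [0] * n_sections  # completion times of the previous train
    for job in times:
        done = []
        ready = 0
        for k in range(n_sections):
            ready = max(ready, prev[k]) + job[k]
            done.append(ready)
        prev = done
    return prev[-1]


# Three trains over three sections (hypothetical traversal times).
TRAINS = [[1, 1, 4], [1, 1, 1], [1, 4, 1]]
print(blocking_makespan(TRAINS))          # 11
print(unlimited_buffer_makespan(TRAINS))  # 8
```

In this instance the second train is held in section 2 until the first train clears section 3, which in turn holds the third train in section 1. That chain of holds is what lengthens the blocking schedule relative to the unlimited-buffer one.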
Abstract:
Prostate cancer is an important male health issue. The strategies used to diagnose and treat prostate cancer underscore the cell and molecular interactions that promote disease progression. Prostate cancer is histologically defined by increasingly undifferentiated tumour cells and therapeutically targeted by androgen ablation. Even as the normal glandular architecture of the adult prostate is lost, prostate cancer cells remain dependent on the androgen receptor (AR) for growth and survival. This project focused on androgen-regulated gene expression, altered cellular differentiation, and the nexus between these two concepts. The AR controls prostate development, homeostasis and cancer progression by regulating the expression of downstream genes. Kallikrein-related serine peptidases are prominent transcriptional targets of AR in the adult prostate. Kallikrein 3 (KLK3), which is commonly referred to as prostate-specific antigen, is the current serum biomarker for prostate cancer. Other kallikreins are potential adjunct biomarkers. As secreted proteases, kallikreins act through enzyme cascades that may modulate the prostate cancer microenvironment. Both as a panel of biomarkers and cascade of proteases, the roles of kallikreins are interconnected. Yet the expression and regulation of different kallikreins in prostate cancer have not been compared. In this study, a spectrum of prostate cell lines was used to evaluate the expression profile of all 15 members of the kallikrein family. A cluster of genes was co-ordinately expressed in androgen-responsive cell lines. This group of kallikreins included KLK2, 3, 4 and 15, which are located adjacent to one another at the centromeric end of the kallikrein locus. KLK14 was also of interest, because it was ubiquitously expressed among the prostate cell lines. Immunohistochemistry showed that these 5 kallikreins are co-expressed in benign and malignant prostate tissue.
The androgen-regulated expression of KLK2 and KLK3 is well-characterised, but has not been compared with other kallikreins. Therefore, KLK2, 3, 4, 14 and 15 expression were all measured in time course and dose response experiments with androgens, AR-antagonist treatments, hormone deprivation experiments and cells transfected with AR siRNA. Collectively, these experiments demonstrated that prostatic kallikreins are specifically and directly regulated by the AR. The data also revealed that kallikrein genes are differentially regulated by androgens; KLK2 and KLK3 were strongly up-regulated, KLK4 and KLK15 were modestly up-regulated, and KLK14 was repressed. Notably, KLK14 is located at the telomeric end of the kallikrein locus, far away from the centromeric cluster of kallikreins that are stimulated by androgens. These results show that the expression of KLK2, 3, 4, 14 and 15 is maintained in prostate cancer, but that these genes exhibit different responses to androgens. This makes the kallikrein locus an ideal model to investigate AR signalling. The increasingly dedifferentiated phenotype of aggressive prostate cancer cells is accompanied by the re-expression of signalling molecules that are usually expressed during embryogenesis and foetal tissue development. The Wnt pathway is one developmental cascade that is reactivated in prostate cancer. The canonical Wnt cascade regulates the intracellular levels of β-catenin, a potent transcriptional co-activator of T-cell factor (TCF) transcription factors. Notably, β-catenin can also bind to the AR and synergistically stimulate androgen-mediated gene expression. This is at the expense of typical Wnt/TCF target genes, because the AR:β-catenin and TCF:β-catenin interactions are mutually exclusive. The effect of β-catenin on kallikrein expression was examined to further investigate the role of β-catenin in prostate cancer. 
Stable knockdown of β-catenin in LNCaP prostate cancer cells attenuated the androgen-regulated expression of KLK2, 3, 4 and 15, but not KLK14. To test whether KLK14 is instead a TCF:β-catenin target gene, the endogenous levels of β-catenin were increased by inhibiting its degradation. Although KLK14 expression was up-regulated by these treatments, siRNA knockdown of β-catenin demonstrated that this effect was independent of β-catenin. These results show that β-catenin is required for maximal expression of KLK2, 3, 4 and 15, but not KLK14. Developmental cells and tumour cells express a similar repertoire of signalling molecules, which means that these different cell types are responsive to one another. Previous reports have shown that stem cells and foetal tissues can reprogram aggressive cancer cells to less aggressive phenotypes by restoring the balance to developmental signalling pathways that are highly dysregulated in cancer. To investigate this phenomenon in prostate cancer, DU145 and PC-3 prostate cancer cells were cultured on matrices pre-conditioned with human embryonic stem cells (hESCs). Soft agar assays showed that prostate cancer cells exposed to hESC conditioned matrices had reduced clonogenicity compared with cells harvested from control matrices. A recent study demonstrated that this effect was partially due to hESC-derived Lefty, an antagonist of Nodal. A member of the transforming growth factor β (TGFβ) superfamily, Nodal regulates embryogenesis and is re-expressed in cancer. The role of Nodal in prostate cancer has not previously been reported. Therefore, the expression and function of the Nodal signalling pathway in prostate cancer was investigated. Western blots confirmed that Nodal is expressed in DU145 and PC-3 cells. Immunohistochemistry revealed greater expression of Nodal in malignant versus benign glands. Notably, the Nodal inhibitor, Lefty, was not expressed at the mRNA level in any prostate cell lines tested. 
The Nodal signalling pathway is functionally active in prostate cancer cells. Recombinant Nodal treatments triggered downstream phosphorylation of Smad2 in DU145 and LNCaP cells, and stably-transfected Nodal increased the clonogenicity of LNCaP cells. Nodal was also found to modulate AR signalling. Nodal reduced the activity of an androgen-regulated KLK3 promoter construct in luciferase assays and attenuated the endogenous expression of AR target genes including prostatic kallikreins. These results demonstrate that Nodal is a novel example of a developmental signalling molecule that is re-expressed in prostate cancer and may have a functional role in prostate cancer progression. In summary, this project clarifies the role of androgens and changing cellular differentiation in prostate cancer by characterising the expression and function of the downstream genes encoding kallikrein-related serine proteases and Nodal. Furthermore, this study emphasises the similarities between prostate cancer and early development, and the crosstalk between developmental signalling pathways and the AR axis. The outcomes of this project also affirm the utility of the kallikrein locus as a model system to monitor tumour progression and the phenotype of prostate cancer cells.
Abstract:
While it is commonly accepted that computability on a Turing machine in polynomial time represents a correct formalization of the notion of a feasibly computable function, there is no similar agreement on how to extend this notion to functionals, that is, what functionals should be considered feasible. One possible paradigm was introduced by Mehlhorn, who extended Cobham's definition of feasible functions to type 2 functionals. Subsequently, this class of functionals (with inessential changes of the definition) was studied by Townsend, who calls this class POLY, and by Kapron and Cook, who call the same class basic feasible functionals. Kapron and Cook gave an oracle Turing machine model characterisation of this class. In this article, we demonstrate that the class of basic feasible functionals has recursion theoretic properties which naturally generalise the corresponding properties of the class of feasible functions, thus giving further evidence that the notion of feasibility of functionals mentioned above is correctly chosen. We also improve the Kapron and Cook result on machine representation. Our proofs are based on essential applications of logic. We introduce a weak fragment of second order arithmetic with second order variables ranging over functions from N to N which suitably characterises basic feasible functionals, and show that it is a useful tool for investigating the properties of basic feasible functionals. In particular, we provide an example of how one can extract feasible programs from mathematical proofs that use nonfeasible functions.
Abstract:
In the context of learning paradigms of identification in the limit, we address the question: why is uncertainty sometimes desirable? We use mind change bounds on the output hypotheses as a measure of uncertainty, and interpret ‘desirable’ as reduction in data memorization, also defined in terms of mind change bounds. The resulting model is closely related to iterative learning with bounded mind change complexity, but the dual use of mind change bounds, for hypotheses and for data, is a key distinctive feature of our approach. We show that situations exist where the more mind changes the learner is willing to accept, the smaller the amount of data it needs to remember in order to converge to the correct hypothesis. We also investigate relationships between our model and learning from good examples, set-driven, monotonic and strong-monotonic learners, as well as class-comprising versus class-preserving learnability.
Abstract:
Keyword Spotting is the task of detecting keywords of interest within continuous speech. The applications of this technology range from call centre dialogue systems to covert speech surveillance devices. Keyword spotting is particularly well suited to data mining tasks such as real-time keyword monitoring and unrestricted vocabulary audio document indexing. However, to date, many keyword spotting approaches have suffered from poor detection rates, high false alarm rates, or slow execution times, thus reducing their commercial viability. This work investigates the application of keyword spotting to data mining tasks. The thesis makes a number of major contributions to the field of keyword spotting. The first major contribution is the development of a novel keyword verification method named Cohort Word Verification. This method combines high level linguistic information with cohort-based verification techniques to obtain dramatic improvements in verification performance, in particular for the problematic short duration target word class. The second major contribution is the development of a novel audio document indexing technique named Dynamic Match Lattice Spotting. This technique augments lattice-based audio indexing principles with dynamic sequence matching techniques to provide robustness to erroneous lattice realisations. The resulting algorithm obtains significant improvement in detection rate over lattice-based audio document indexing while still maintaining extremely fast search speeds. The third major contribution is the study of multiple verifier fusion for the task of keyword verification. The reported experiments demonstrate that substantial improvements in verification performance can be obtained through the fusion of multiple keyword verifiers. The research focuses on combinations of speech background model based verifiers and cohort word verifiers.
The final major contribution is a comprehensive study of the effects of limited training data for keyword spotting. This study is performed with consideration as to how these effects impact the immediate development and deployment of speech technologies for non-English languages.
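Dynamic sequence matching of the kind mentioned for Dynamic Match Lattice Spotting can be illustrated with a plain edit distance between phone sequences. The sketch below is only an assumption about the general idea, not the thesis's algorithm. It scores a keyword's phone sequence against a flat list of candidate sequences (standing in for paths read from a lattice) with unit-cost Levenshtein distance and accepts candidates within a cost threshold. The phone labels, candidates and threshold are hypothetical.

```python
def edit_distance(target, observed):
    """Unit-cost Levenshtein distance between two phone sequences."""
    prev = list(range(len(observed) + 1))
    for i, t in enumerate(target, start=1):
        curr = [i]
        for j, o in enumerate(observed, start=1):
            substitution = prev[j - 1] + (0 if t == o else 1)
            deletion = prev[j] + 1       # phone of the target missing in observed
            insertion = curr[j - 1] + 1  # extra phone in observed
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def spot_keyword(target, candidates, max_cost):
    """Indices of candidate sequences within max_cost edits of the target."""
    return [idx for idx, cand in enumerate(candidates)
            if edit_distance(target, cand) <= max_cost]


# Hypothetical phone sequences; the second has one misrecognised vowel.
KEYWORD = ["d", "ey", "t", "ax"]
CANDIDATES = [
    ["d", "ey", "t", "ax"],
    ["d", "ae", "t", "ax"],
    ["b", "ey", "t", "er", "z"],
]
hits = spot_keyword(KEYWORD, CANDIDATES, max_cost=1)
print(hits)  # [0, 1]
```

Allowing a small non-zero cost is what gives tolerance to recognition errors: the second candidate would be missed by exact matching but is accepted here.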
Abstract:
We propose a model-based approach to unify clustering and network modeling using time-course gene expression data. Specifically, our approach uses a mixture model to cluster genes. Genes within the same cluster share a similar expression profile. The network is built over cluster-specific expression profiles using state-space models. We discuss the application of our model to simulated data as well as to time-course gene expression data arising from animal models of prostate cancer progression. The latter application shows that, with a combined statistical/bioinformatics analysis, we are able to extract gene-to-gene relationships supported by the literature as well as new plausible relationships.