562 resultados para 01 Mathematical Sciences
Resumo:
With the increasing number of XML documents in varied domains, it has become essential to identify ways of finding interesting information from these documents. Data mining techniques were used to derive this interesting information. Mining on XML documents is impacted by its model due to the semi-structured nature of these documents. Hence, in this chapter we present an overview of the various models of XML documents, how these models were used for mining and some of the issues and challenges in these models. In addition, this chapter also provides some insights into the future models of XML documents for effectively capturing the two important features namely structure and content of XML documents for mining.
Resumo:
The problem of steady subcritical free surface flow past a submerged inclined step is considered. The asymptotic limit of small Froude number is treated, with particular emphasis on the effect that changing the angle of the step face has on the surface waves. As demonstrated by Chapman & Vanden-Broeck (2006), the divergence of a power series expansion in powers of the square of the Froude number is caused by singularities in the analytic continuation of the free surface; for an inclined step, these singularities may correspond to either the corners or stagnation points of the step, or both, depending on the angle of incline. Stokes lines emanate from these singularities, and exponentially small waves are switched on at the point the Stokes lines intersect with the free surface. Our results suggest that for a certain range of step angles, two wavetrains are switched on, but the exponentially subdominant one is switched on first, leading to an intermediate wavetrain not previously noted. We extend these ideas to the problem of flow over a submerged bump or trench, again with inclined sides. This time there may be two, three or four active Stokes lines, depending on the inclination angles. We demonstrate how to construct a base topography such that wave contributions from separate Stokes lines are of equal magnitude but opposite phase, thus cancelling out. Our asymptotic results are complemented by numerical solutions to the fully nonlinear equations.
Resumo:
In this paper we present a sequential Monte Carlo algorithm for Bayesian sequential experimental design applied to generalised non-linear models for discrete data. The approach is computationally convenient in that the information of newly observed data can be incorporated through a simple re-weighting step. We also consider a flexible parametric model for the stimulus-response relationship together with a newly developed hybrid design utility that can produce more robust estimates of the target stimulus in the presence of substantial model and parameter uncertainty. The algorithm is applied to hypothetical clinical trial or bioassay scenarios. In the discussion, potential generalisations of the algorithm are suggested to possibly extend its applicability to a wide variety of scenarios
Resumo:
The main aim of this thesis is to analyse and optimise a public hospital Emergency Department. The Emergency Department (ED) is a complex system with limited resources and a high demand for these resources. Adding to the complexity is the stochastic nature of almost every element and characteristic in the ED. The interaction with other functional areas also complicates the system as these areas have a huge impact on the ED and the ED is powerless to change them. Therefore it is imperative that OR be applied to the ED to improve the performance within the constraints of the system. The main characteristics of the system to optimise included tardiness, adherence to waiting time targets, access block and length of stay. A validated and verified simulation model was built to model the real life system. This enabled detailed analysis of resources and flow without disruption to the actual ED. A wide range of different policies for the ED and a variety of resources were able to be investigated. Of particular interest was the number and type of beds in the ED and also the shift times of physicians. One point worth noting was that neither of these resources work in isolation and for optimisation of the system both resources need to be investigated in tandem. The ED was likened to a flow shop scheduling problem with the patients and beds being synonymous with the jobs and machines typically found in manufacturing problems. This enabled an analytic scheduling approach. Constructive heuristics were developed to reactively schedule the system in real time and these were able to improve the performance of the system. Metaheuristics that optimised the system were also developed and analysed. An innovative hybrid Simulated Annealing and Tabu Search algorithm was developed that out-performed both simulated annealing and tabu search algorithms by combining some of their features. The new algorithm achieves a more optimal solution and does so in a shorter time.
Resumo:
Bioinformatics involves analyses of biological data such as DNA sequences, microarrays and protein-protein interaction (PPI) networks. Its two main objectives are the identification of genes or proteins and the prediction of their functions. Biological data often contain uncertain and imprecise information. Fuzzy theory provides useful tools to deal with this type of information, hence has played an important role in analyses of biological data. In this thesis, we aim to develop some new fuzzy techniques and apply them on DNA microarrays and PPI networks. We will focus on three problems: (1) clustering of microarrays; (2) identification of disease-associated genes in microarrays; and (3) identification of protein complexes in PPI networks. The first part of the thesis aims to detect, by the fuzzy C-means (FCM) method, clustering structures in DNA microarrays corrupted by noise. Because of the presence of noise, some clustering structures found in random data may not have any biological significance. In this part, we propose to combine the FCM with the empirical mode decomposition (EMD) for clustering microarray data. The purpose of EMD is to reduce, preferably to remove, the effect of noise, resulting in what is known as denoised data. We call this method the fuzzy C-means method with empirical mode decomposition (FCM-EMD). We applied this method on yeast and serum microarrays, and the silhouette values are used for assessment of the quality of clustering. The results indicate that the clustering structures of denoised data are more reasonable, implying that genes have tighter association with their clusters. Furthermore we found that the estimation of the fuzzy parameter m, which is a difficult step, can be avoided to some extent by analysing denoised microarray data. The second part aims to identify disease-associated genes from DNA microarray data which are generated under different conditions, e.g., patients and normal people. We developed a type-2 fuzzy membership (FM) function for identification of diseaseassociated genes. This approach is applied to diabetes and lung cancer data, and a comparison with the original FM test was carried out. Among the ten best-ranked genes of diabetes identified by the type-2 FM test, seven genes have been confirmed as diabetes-associated genes according to gene description information in Gene Bank and the published literature. An additional gene is further identified. Among the ten best-ranked genes identified in lung cancer data, seven are confirmed that they are associated with lung cancer or its treatment. The type-2 FM-d values are significantly different, which makes the identifications more convincing than the original FM test. The third part of the thesis aims to identify protein complexes in large interaction networks. Identification of protein complexes is crucial to understand the principles of cellular organisation and to predict protein functions. In this part, we proposed a novel method which combines the fuzzy clustering method and interaction probability to identify the overlapping and non-overlapping community structures in PPI networks, then to detect protein complexes in these sub-networks. Our method is based on both the fuzzy relation model and the graph model. We applied the method on several PPI networks and compared with a popular protein complex identification method, the clique percolation method. For the same data, we detected more protein complexes. We also applied our method on two social networks. The results showed our method works well for detecting sub-networks and give a reasonable understanding of these communities.
Resumo:
Complex networks have been studied extensively due to their relevance to many real-world systems such as the world-wide web, the internet, biological and social systems. During the past two decades, studies of such networks in different fields have produced many significant results concerning their structures, topological properties, and dynamics. Three well-known properties of complex networks are scale-free degree distribution, small-world effect and self-similarity. The search for additional meaningful properties and the relationships among these properties is an active area of current research. This thesis investigates a newer aspect of complex networks, namely their multifractality, which is an extension of the concept of selfsimilarity. The first part of the thesis aims to confirm that the study of properties of complex networks can be expanded to a wider field including more complex weighted networks. Those real networks that have been shown to possess the self-similarity property in the existing literature are all unweighted networks. We use the proteinprotein interaction (PPI) networks as a key example to show that their weighted networks inherit the self-similarity from the original unweighted networks. Firstly, we confirm that the random sequential box-covering algorithm is an effective tool to compute the fractal dimension of complex networks. This is demonstrated on the Homo sapiens and E. coli PPI networks as well as their skeletons. Our results verify that the fractal dimension of the skeleton is smaller than that of the original network due to the shortest distance between nodes is larger in the skeleton, hence for a fixed box-size more boxes will be needed to cover the skeleton. Then we adopt the iterative scoring method to generate weighted PPI networks of five species, namely Homo sapiens, E. coli, yeast, C. elegans and Arabidopsis Thaliana. By using the random sequential box-covering algorithm, we calculate the fractal dimensions for both the original unweighted PPI networks and the generated weighted networks. The results show that self-similarity is still present in generated weighted PPI networks. This implication will be useful for our treatment of the networks in the third part of the thesis. The second part of the thesis aims to explore the multifractal behavior of different complex networks. Fractals such as the Cantor set, the Koch curve and the Sierspinski gasket are homogeneous since these fractals consist of a geometrical figure which repeats on an ever-reduced scale. Fractal analysis is a useful method for their study. However, real-world fractals are not homogeneous; there is rarely an identical motif repeated on all scales. Their singularity may vary on different subsets; implying that these objects are multifractal. Multifractal analysis is a useful way to systematically characterize the spatial heterogeneity of both theoretical and experimental fractal patterns. However, the tools for multifractal analysis of objects in Euclidean space are not suitable for complex networks. In this thesis, we propose a new box covering algorithm for multifractal analysis of complex networks. This algorithm is demonstrated in the computation of the generalized fractal dimensions of some theoretical networks, namely scale-free networks, small-world networks, random networks, and a kind of real networks, namely PPI networks of different species. Our main finding is the existence of multifractality in scale-free networks and PPI networks, while the multifractal behaviour is not confirmed for small-world networks and random networks. As another application, we generate gene interactions networks for patients and healthy people using the correlation coefficients between microarrays of different genes. Our results confirm the existence of multifractality in gene interactions networks. This multifractal analysis then provides a potentially useful tool for gene clustering and identification. The third part of the thesis aims to investigate the topological properties of networks constructed from time series. Characterizing complicated dynamics from time series is a fundamental problem of continuing interest in a wide variety of fields. Recent works indicate that complex network theory can be a powerful tool to analyse time series. Many existing methods for transforming time series into complex networks share a common feature: they define the connectivity of a complex network by the mutual proximity of different parts (e.g., individual states, state vectors, or cycles) of a single trajectory. In this thesis, we propose a new method to construct networks of time series: we define nodes by vectors of a certain length in the time series, and weight of edges between any two nodes by the Euclidean distance between the corresponding two vectors. We apply this method to build networks for fractional Brownian motions, whose long-range dependence is characterised by their Hurst exponent. We verify the validity of this method by showing that time series with stronger correlation, hence larger Hurst exponent, tend to have smaller fractal dimension, hence smoother sample paths. We then construct networks via the technique of horizontal visibility graph (HVG), which has been widely used recently. We confirm a known linear relationship between the Hurst exponent of fractional Brownian motion and the fractal dimension of the corresponding HVG network. In the first application, we apply our newly developed box-covering algorithm to calculate the generalized fractal dimensions of the HVG networks of fractional Brownian motions as well as those for binomial cascades and five bacterial genomes. The results confirm the monoscaling of fractional Brownian motion and the multifractality of the rest. As an additional application, we discuss the resilience of networks constructed from time series via two different approaches: visibility graph and horizontal visibility graph. Our finding is that the degree distribution of VG networks of fractional Brownian motions is scale-free (i.e., having a power law) meaning that one needs to destroy a large percentage of nodes before the network collapses into isolated parts; while for HVG networks of fractional Brownian motions, the degree distribution has exponential tails, implying that HVG networks would not survive the same kind of attack.
Resumo:
Mixture models are a flexible tool for unsupervised clustering that have found popularity in a vast array of research areas. In studies of medicine, the use of mixtures holds the potential to greatly enhance our understanding of patient responses through the identification of clinically meaningful clusters that, given the complexity of many data sources, may otherwise by intangible. Furthermore, when developed in the Bayesian framework, mixture models provide a natural means for capturing and propagating uncertainty in different aspects of a clustering solution, arguably resulting in richer analyses of the population under study. This thesis aims to investigate the use of Bayesian mixture models in analysing varied and detailed sources of patient information collected in the study of complex disease. The first aim of this thesis is to showcase the flexibility of mixture models in modelling markedly different types of data. In particular, we examine three common variants on the mixture model, namely, finite mixtures, Dirichlet Process mixtures and hidden Markov models. Beyond the development and application of these models to different sources of data, this thesis also focuses on modelling different aspects relating to uncertainty in clustering. Examples of clustering uncertainty considered are uncertainty in a patient’s true cluster membership and accounting for uncertainty in the true number of clusters present. Finally, this thesis aims to address and propose solutions to the task of comparing clustering solutions, whether this be comparing patients or observations assigned to different subgroups or comparing clustering solutions over multiple datasets. To address these aims, we consider a case study in Parkinson’s disease (PD), a complex and commonly diagnosed neurodegenerative disorder. In particular, two commonly collected sources of patient information are considered. The first source of data are on symptoms associated with PD, recorded using the Unified Parkinson’s Disease Rating Scale (UPDRS) and constitutes the first half of this thesis. The second half of this thesis is dedicated to the analysis of microelectrode recordings collected during Deep Brain Stimulation (DBS), a popular palliative treatment for advanced PD. Analysis of this second source of data centers on the problems of unsupervised detection and sorting of action potentials or "spikes" in recordings of multiple cell activity, providing valuable information on real time neural activity in the brain.
Resumo:
Key establishment is a crucial cryptographic primitive for building secure communication channels between two parties in a network. It has been studied extensively in theory and widely deployed in practice. In the research literature a typical protocol in the public-key setting aims for key secrecy and mutual authentication. However, there are many important practical scenarios where mutual authentication is undesirable, such as in anonymity networks like Tor, or is difficult to achieve due to insufficient public-key infrastructure at the user level, as is the case on the Internet today. In this work we are concerned with the scenario where two parties establish a private shared session key, but only one party authenticates to the other; in fact, the unauthenticated party may wish to have strong anonymity guarantees. We present a desirable set of security, authentication, and anonymity goals for this setting and develop a model which captures these properties. Our approach allows for clients to choose among different levels of authentication. We also describe an attack on a previous protocol of Øverlier and Syverson, and present a new, efficient key exchange protocol that provides one-way authentication and anonymity.
Resumo:
Open pit mine operations are complex businesses that demand a constant assessment of risk. This is because the value of a mine project is typically influenced by many underlying economic and physical uncertainties, such as metal prices, metal grades, costs, schedules, quantities, and environmental issues, among others, which are not known with much certainty at the beginning of the project. Hence, mining projects present a considerable challenge to those involved in associated investment decisions, such as the owners of the mine and other stakeholders. In general terms, when an option exists to acquire a new or operating mining project, , the owners and stock holders of the mine project need to know the value of the mining project, which is the fundamental criterion for making final decisions about going ahead with the venture capital. However, obtaining the mine project’s value is not an easy task. The reason for this is that sophisticated valuation and mine optimisation techniques, which combine advanced theories in geostatistics, statistics, engineering, economics and finance, among others, need to be used by the mine analyst or mine planner in order to assess and quantify the existing uncertainty and, consequently, the risk involved in the project investment. Furthermore, current valuation and mine optimisation techniques do not complement each other. That is valuation techniques based on real options (RO) analysis assume an expected (constant) metal grade and ore tonnage during a specified period, while mine optimisation (MO) techniques assume expected (constant) metal prices and mining costs. These assumptions are not totally correct since both sources of uncertainty—that of the orebody (metal grade and reserves of mineral), and that about the future behaviour of metal prices and mining costs—are the ones that have great impact on the value of any mining project. Consequently, the key objective of this thesis is twofold. The first objective consists of analysing and understanding the main sources of uncertainty in an open pit mining project, such as the orebody (in situ metal grade), mining costs and metal price uncertainties, and their effect on the final project value. The second objective consists of breaking down the wall of isolation between economic valuation and mine optimisation techniques in order to generate a novel open pit mine evaluation framework called the ―Integrated Valuation / Optimisation Framework (IVOF)‖. One important characteristic of this new framework is that it incorporates the RO and MO valuation techniques into a single integrated process that quantifies and describes uncertainty and risk in a mine project evaluation process, giving a more realistic estimate of the project’s value. To achieve this, novel and advanced engineering and econometric methods are used to integrate financial and geological uncertainty into dynamic risk forecasting measures. The proposed mine valuation/optimisation technique is then applied to a real gold disseminated open pit mine deposit to estimate its value in the face of orebody, mining costs and metal price uncertainties.
Resumo:
A novel m-ary tree based approach is presented to solve asset management decisions which are combinatorial in nature. The approach introduces a new dynamic constraint based control mechanism which is capable of excluding infeasible solutions from the solution space. The approach also provides a solution to the challenges with ordering of assets decisions.
Resumo:
In this paper, we describe an analysis for data collected on a three-dimensional spatial lattice with treatments applied at the horizontal lattice points. Spatial correlation is accounted for using a conditional autoregressive model. Observations are defined as neighbours only if they are at the same depth. This allows the corresponding variance components to vary by depth. We use the Markov chain Monte Carlo method with block updating, together with Krylov subspace methods, for efficient estimation of the model. The method is applicable to both regular and irregular horizontal lattices and hence to data collected at any set of horizontal sites for a set of depths or heights, for example, water column or soil profile data. The model for the three-dimensional data is applied to agricultural trial data for five separate days taken roughly six months apart in order to determine possible relationships over time. The purpose of the trial is to determine a form of cropping that leads to less moist soils in the root zone and beyond.We estimate moisture for each date, depth and treatment accounting for spatial correlation and determine relationships of these and other parameters over time.
Resumo:
Modern technology now has the ability to generate large datasets over space and time. Such data typically exhibit high autocorrelations over all dimensions. The field trial data motivating the methods of this paper were collected to examine the behaviour of traditional cropping and to determine a cropping system which could maximise water use for grain production while minimising leakage below the crop root zone. They consist of moisture measurements made at 15 depths across 3 rows and 18 columns, in the lattice framework of an agricultural field. Bayesian conditional autoregressive (CAR) models are used to account for local site correlations. Conditional autoregressive models have not been widely used in analyses of agricultural data. This paper serves to illustrate the usefulness of these models in this field, along with the ease of implementation in WinBUGS, a freely available software package. The innovation is the fitting of separate conditional autoregressive models for each depth layer, the ‘layered CAR model’, while simultaneously estimating depth profile functions for each site treatment. Modelling interest also lay in how best to model the treatment effect depth profiles, and in the choice of neighbourhood structure for the spatial autocorrelation model. The favoured model fitted the treatment effects as splines over depth, and treated depth, the basis for the regression model, as measured with error, while fitting CAR neighbourhood models by depth layer. It is hierarchical, with separate onditional autoregressive spatial variance components at each depth, and the fixed terms which involve an errors-in-measurement model treat depth errors as interval-censored measurement error. The Bayesian framework permits transparent specification and easy comparison of the various complex models compared.
Resumo:
With rapid and continuing growth of learning support initiatives in mathematics and statistics found in many parts of the world, and with the likelihood that this trend will continue, there is a need to ensure that robust and coherent measures are in place to evaluate the effectiveness of these initiatives. The nature of learning support brings challenges for measurement and analysis of its effects. After briefly reviewing the purpose, rationale for, and extent of current provision, this article provides a framework for those working in learning support to think about how their efforts can be evaluated. It provides references and specific examples of how workers in this field are collecting, analysing and reporting their findings. The framework is used to structure evaluation in terms of usage of facilities, resources and services provided, and also in terms of improvements in performance of the students and staff who engage with them. Very recent developments have started to address the effects of learning support on the development of deeper approaches to learning, the affective domain and the development of communities of practice of both learners and teachers. This article intends to be a stimulus to those who work in mathematics and statistics support to gather even richer, more valuable, forms of data. It provides a 'toolkit' for those interested in evaluation of learning support and closes by referring to an on-line resource being developed to archive the growing body of evidence. © 2011 Taylor & Francis.
Resumo:
Sfinks is a shift register based stream cipher designed for hardware implementation. The initialisation state update function is different from the state update function used for keystream generation. We demonstrate state convergence during the initialisation process, even though the individual components used in the initialisation are one-to-one. However, the combination of these components is not one-to-one.
Resumo:
The paper presents the results of a study conducted to investigate indoor air quality within residential dwellings in Lao PDR. Results from PM 10, CO, and NO2 measurements inside 167 dwellings in Lao PDR over a five month period (December 2005-April 2006) are discussed as a function of household characteristics and occupant activities. Extremely high PM10 and NO2 concentrations (12 h mean PM10 concentrations 1275 ± 98 μg m-3 and 1183 ± 99 μg m-3 in Vientiane and Bolikhamxay provinces, respectively; 12 h mean NO2 concentrations 1210 ± 94 μg m-3 and 561 ± 45 μg m-3 in Vientiane and Bolikhamxay, respectively) were measured within the dwellings. Correlations, ANOVA analysis (univariate and multivariate), and linear regression results suggest a substantial contribution from cookingandsmoking.The PM10 concentrations were significantly higher in houses without a chimney compared to houses in which cooking occurred on a stove with a chimney. However, no significant differences in pollutantconcentrations were observed as a function of cooking location. Furthermore, PM10 and NO2 concentrations were higher in houses in which smoking occurred, suggestive of a relationship between increased indoor concentrations and smoking (0.05 < p < 0.10). Resuspension of dust from soil floors was another significant source of PM10 inside the house (634 μg m-3, p < 0.05).