241 resultados para graph entropy
Resumo:
Bioinformatics involves analyses of biological data such as DNA sequences, microarrays and protein-protein interaction (PPI) networks. Its two main objectives are the identification of genes or proteins and the prediction of their functions. Biological data often contain uncertain and imprecise information. Fuzzy theory provides useful tools to deal with this type of information, hence has played an important role in analyses of biological data. In this thesis, we aim to develop some new fuzzy techniques and apply them on DNA microarrays and PPI networks. We will focus on three problems: (1) clustering of microarrays; (2) identification of disease-associated genes in microarrays; and (3) identification of protein complexes in PPI networks. The first part of the thesis aims to detect, by the fuzzy C-means (FCM) method, clustering structures in DNA microarrays corrupted by noise. Because of the presence of noise, some clustering structures found in random data may not have any biological significance. In this part, we propose to combine the FCM with the empirical mode decomposition (EMD) for clustering microarray data. The purpose of EMD is to reduce, preferably to remove, the effect of noise, resulting in what is known as denoised data. We call this method the fuzzy C-means method with empirical mode decomposition (FCM-EMD). We applied this method on yeast and serum microarrays, and the silhouette values are used for assessment of the quality of clustering. The results indicate that the clustering structures of denoised data are more reasonable, implying that genes have tighter association with their clusters. Furthermore we found that the estimation of the fuzzy parameter m, which is a difficult step, can be avoided to some extent by analysing denoised microarray data. The second part aims to identify disease-associated genes from DNA microarray data which are generated under different conditions, e.g., patients and normal people. We developed a type-2 fuzzy membership (FM) function for identification of diseaseassociated genes. This approach is applied to diabetes and lung cancer data, and a comparison with the original FM test was carried out. Among the ten best-ranked genes of diabetes identified by the type-2 FM test, seven genes have been confirmed as diabetes-associated genes according to gene description information in Gene Bank and the published literature. An additional gene is further identified. Among the ten best-ranked genes identified in lung cancer data, seven are confirmed that they are associated with lung cancer or its treatment. The type-2 FM-d values are significantly different, which makes the identifications more convincing than the original FM test. The third part of the thesis aims to identify protein complexes in large interaction networks. Identification of protein complexes is crucial to understand the principles of cellular organisation and to predict protein functions. In this part, we proposed a novel method which combines the fuzzy clustering method and interaction probability to identify the overlapping and non-overlapping community structures in PPI networks, then to detect protein complexes in these sub-networks. Our method is based on both the fuzzy relation model and the graph model. We applied the method on several PPI networks and compared with a popular protein complex identification method, the clique percolation method. For the same data, we detected more protein complexes. We also applied our method on two social networks. The results showed our method works well for detecting sub-networks and give a reasonable understanding of these communities.
Resumo:
Complex networks have been studied extensively due to their relevance to many real-world systems such as the world-wide web, the internet, biological and social systems. During the past two decades, studies of such networks in different fields have produced many significant results concerning their structures, topological properties, and dynamics. Three well-known properties of complex networks are scale-free degree distribution, small-world effect and self-similarity. The search for additional meaningful properties and the relationships among these properties is an active area of current research. This thesis investigates a newer aspect of complex networks, namely their multifractality, which is an extension of the concept of selfsimilarity. The first part of the thesis aims to confirm that the study of properties of complex networks can be expanded to a wider field including more complex weighted networks. Those real networks that have been shown to possess the self-similarity property in the existing literature are all unweighted networks. We use the proteinprotein interaction (PPI) networks as a key example to show that their weighted networks inherit the self-similarity from the original unweighted networks. Firstly, we confirm that the random sequential box-covering algorithm is an effective tool to compute the fractal dimension of complex networks. This is demonstrated on the Homo sapiens and E. coli PPI networks as well as their skeletons. Our results verify that the fractal dimension of the skeleton is smaller than that of the original network due to the shortest distance between nodes is larger in the skeleton, hence for a fixed box-size more boxes will be needed to cover the skeleton. Then we adopt the iterative scoring method to generate weighted PPI networks of five species, namely Homo sapiens, E. coli, yeast, C. elegans and Arabidopsis Thaliana. By using the random sequential box-covering algorithm, we calculate the fractal dimensions for both the original unweighted PPI networks and the generated weighted networks. The results show that self-similarity is still present in generated weighted PPI networks. This implication will be useful for our treatment of the networks in the third part of the thesis. The second part of the thesis aims to explore the multifractal behavior of different complex networks. Fractals such as the Cantor set, the Koch curve and the Sierspinski gasket are homogeneous since these fractals consist of a geometrical figure which repeats on an ever-reduced scale. Fractal analysis is a useful method for their study. However, real-world fractals are not homogeneous; there is rarely an identical motif repeated on all scales. Their singularity may vary on different subsets; implying that these objects are multifractal. Multifractal analysis is a useful way to systematically characterize the spatial heterogeneity of both theoretical and experimental fractal patterns. However, the tools for multifractal analysis of objects in Euclidean space are not suitable for complex networks. In this thesis, we propose a new box covering algorithm for multifractal analysis of complex networks. This algorithm is demonstrated in the computation of the generalized fractal dimensions of some theoretical networks, namely scale-free networks, small-world networks, random networks, and a kind of real networks, namely PPI networks of different species. Our main finding is the existence of multifractality in scale-free networks and PPI networks, while the multifractal behaviour is not confirmed for small-world networks and random networks. As another application, we generate gene interactions networks for patients and healthy people using the correlation coefficients between microarrays of different genes. Our results confirm the existence of multifractality in gene interactions networks. This multifractal analysis then provides a potentially useful tool for gene clustering and identification. The third part of the thesis aims to investigate the topological properties of networks constructed from time series. Characterizing complicated dynamics from time series is a fundamental problem of continuing interest in a wide variety of fields. Recent works indicate that complex network theory can be a powerful tool to analyse time series. Many existing methods for transforming time series into complex networks share a common feature: they define the connectivity of a complex network by the mutual proximity of different parts (e.g., individual states, state vectors, or cycles) of a single trajectory. In this thesis, we propose a new method to construct networks of time series: we define nodes by vectors of a certain length in the time series, and weight of edges between any two nodes by the Euclidean distance between the corresponding two vectors. We apply this method to build networks for fractional Brownian motions, whose long-range dependence is characterised by their Hurst exponent. We verify the validity of this method by showing that time series with stronger correlation, hence larger Hurst exponent, tend to have smaller fractal dimension, hence smoother sample paths. We then construct networks via the technique of horizontal visibility graph (HVG), which has been widely used recently. We confirm a known linear relationship between the Hurst exponent of fractional Brownian motion and the fractal dimension of the corresponding HVG network. In the first application, we apply our newly developed box-covering algorithm to calculate the generalized fractal dimensions of the HVG networks of fractional Brownian motions as well as those for binomial cascades and five bacterial genomes. The results confirm the monoscaling of fractional Brownian motion and the multifractality of the rest. As an additional application, we discuss the resilience of networks constructed from time series via two different approaches: visibility graph and horizontal visibility graph. Our finding is that the degree distribution of VG networks of fractional Brownian motions is scale-free (i.e., having a power law) meaning that one needs to destroy a large percentage of nodes before the network collapses into isolated parts; while for HVG networks of fractional Brownian motions, the degree distribution has exponential tails, implying that HVG networks would not survive the same kind of attack.
Resumo:
While the phrase “six degrees of separation” is widely used to characterize a variety of humanderived networks, in this study we show that in patent citation network, related patents are connected with an average distance of 6, whereas an average distance for a random pair of nodes in the graph is approximately 15. We use this information to improve the recall level in prior-art retrieval in the setting of blind relevance feedback without any textual knowledge.
Resumo:
This paper presents a general, global approach to the problem of robot exploration, utilizing a topological data structure to guide an underlying Simultaneous Localization and Mapping (SLAM) process. A Gap Navigation Tree (GNT) is used to motivate global target selection and occluded regions of the environment (called “gaps”) are tracked probabilistically. The process of map construction and the motion of the vehicle alters both the shape and location of these regions. The use of online mapping is shown to reduce the difficulties in implementing the GNT.
Resumo:
In practice, parallel-machine job-shop scheduling (PMJSS) is very useful in the development of standard modelling approaches and generic solution techniques for many real-world scheduling problems. In this paper, based on the analysis of structural properties in an extended disjunctive graph model, a hybrid shifting bottleneck procedure (HSBP) algorithm combined with Tabu Search metaheuristic algorithm is developed to deal with the PMJSS problem. The original-version SBP algorithm for the job-shop scheduling (JSS) has been significantly improved to solve the PMJSS problem with four novelties: i) a topological-sequence algorithm is proposed to decompose the PMJSS problem into a set of single-machine scheduling (SMS) and/or parallel-machine scheduling (PMS) subproblems; ii) a modified Carlier algorithm based on the proposed lemmas and the proofs is developed to solve the SMS subproblem; iii) the Jackson rule is extended to solve the PMS subproblem; iv) a Tabu Search metaheuristic algorithm is embedded under the framework of SBP to optimise the JSS and PMJSS cases. The computational experiments show that the proposed HSBP is very efficient in solving the JSS and PMJSS problems.
Resumo:
In the structure of the title compound C16H26N+ Cl-, the salt of a precursor in the synthesis of an isoindolin-2-yloxyl free-radical trapping agent, the cations and anions form discrete centrosymetric cyclic dimers through N---H...Cl hydrogen-bonding associations [graph set R2/4(8)].
Resumo:
In the title salt, racemic C6H12N2O+ C8H11O4- from the reaction of cis-cyclohexane-1,2-dicarboxylic anhydride with isonipecotamide, the cations are linked into duplex chain substructures through both centrosymmetric cyclic head-to-head 'amide motif' hydrogen-bonding associations [graph set R2/2(8)] and 'side-by-side' R2/2(14) associations. The anions are incorporated into the chains through cyclic R3/4(10) interactions involving amide and piperidinium N-H...O(carboxyl) hydrogen bonds which, together with inter-anion carboxylic acid O-H...O(carboxyl) hydrogen bonds, give a two-dimensional layered structure extending along (011).
Resumo:
Recommender systems are one of the recent inventions to deal with ever growing information overload in relation to the selection of goods and services in a global economy. Collaborative Filtering (CF) is one of the most popular techniques in recommender systems. The CF recommends items to a target user based on the preferences of a set of similar users known as the neighbours, generated from a database made up of the preferences of past users. With sufficient background information of item ratings, its performance is promising enough but research shows that it performs very poorly in a cold start situation where there is not enough previous rating data. As an alternative to ratings, trust between the users could be used to choose the neighbour for recommendation making. Better recommendations can be achieved using an inferred trust network which mimics the real world "friend of a friend" recommendations. To extend the boundaries of the neighbour, an effective trust inference technique is required. This thesis proposes a trust interference technique called Directed Series Parallel Graph (DSPG) which performs better than other popular trust inference algorithms such as TidalTrust and MoleTrust. Another problem is that reliable explicit trust data is not always available. In real life, people trust "word of mouth" recommendations made by people with similar interests. This is often assumed in the recommender system. By conducting a survey, we can confirm that interest similarity has a positive relationship with trust and this can be used to generate a trust network for recommendation. In this research, we also propose a new method called SimTrust for developing trust networks based on user's interest similarity in the absence of explicit trust data. To identify the interest similarity, we use user's personalised tagging information. However, we are interested in what resources the user chooses to tag, rather than the text of the tag applied. The commonalities of the resources being tagged by the users can be used to form the neighbours used in the automated recommender system. Our experimental results show that our proposed tag-similarity based method outperforms the traditional collaborative filtering approach which usually uses rating data.
Resumo:
As organizations reach higher levels of business process management maturity, they often find themselves maintaining very large process model repositories, representing valuable knowledge about their operations. A common practice within these repositories is to create new process models, or extend existing ones, by copying and merging fragments from other models. We contend that if these duplicate fragments, a.k.a. ex- act clones, can be identified and factored out as shared subprocesses, the repository’s maintainability can be greatly improved. With this purpose in mind, we propose an indexing structure to support fast detection of clones in process model repositories. Moreover, we show how this index can be used to efficiently query a process model repository for fragments. This index, called RPSDAG, is based on a novel combination of a method for process model decomposition (namely the Refined Process Structure Tree), with established graph canonization and string matching techniques. We evaluated the RPSDAG with large process model repositories from industrial practice. The experiments show that a significant number of non-trivial clones can be efficiently found in such repositories, and that fragment queries can be handled efficiently.
Resumo:
Here we present a sequential Monte Carlo (SMC) algorithm that can be used for any one-at-a-time Bayesian sequential design problem in the presence of model uncertainty where discrete data are encountered. Our focus is on adaptive design for model discrimination but the methodology is applicable if one has a different design objective such as parameter estimation or prediction. An SMC algorithm is run in parallel for each model and the algorithm relies on a convenient estimator of the evidence of each model which is essentially a function of importance sampling weights. Other methods for this task such as quadrature, often used in design, suffer from the curse of dimensionality. Approximating posterior model probabilities in this way allows us to use model discrimination utility functions derived from information theory that were previously difficult to compute except for conjugate models. A major benefit of the algorithm is that it requires very little problem specific tuning. We demonstrate the methodology on three applications, including discriminating between models for decline in motor neuron numbers in patients suffering from neurological diseases such as Motor Neuron disease.
Resumo:
In the asymmetric unit of the title co-crystal, C12H14N4O2S . C7H5NO4 there are two independent but conformationally similar heterodimers, which are formed through intermolecular N-H...O(carboxy) and carboxyl O-H...N hydrogen-bond pairs, giving a cyclic motif [graph set R2/2(8)]. The dihedral angles between the rings in the sulfonamide molecules are 78.77(8) and 82.33(9)deg. while the dihedral angles between the ring and the CO2H group in the acids are 2.19(9) and 7.02(10)deg. A two-dimensional structure parallel to the ab plane is generated from the heterodimer units through hydrogen-bonding associations between NH2 and sulfone groups. Between neighbouring two-dimensional arrays there are two types of aromatic pi-pi stacking interactions involving either one of the pyrimidine rings and a 4-nitrobenzoic acid molecule [minimum ring centroid separation = 3.5886(9)A] or two acid molecules [minimum ring centroid separation = 3.7236(10)A].
Resumo:
While researchers strive to improve automatic face recognition performance, the relationship between image resolution and face recognition performance has not received much attention. This relationship is examined systematically and a framework is developed such that results from super-resolution techniques can be compared. Three super-resolution techniques are compared with the Eigenface and Elastic Bunch Graph Matching face recognition engines. Parameter ranges over which these techniques provide better recognition performance than interpolated images is determined.
Resumo:
This work presents two UAS See and Avoid approaches using Fuzzy Control. We compare the performance of each controller when a Cross-Entropy method is applied to optimase the parameters for one of the controllers. Each controller receive information from an image processing front-end that detect and track targets in the environment. Visual information is then used under a visual servoing approach to perform autonomous avoidance. Experimental flight trials using a small quadrotor were performed to validate and compare the behaviour of both controllers
Resumo:
The time consuming and labour intensive task of identifying individuals in surveillance video is often challenged by poor resolution and the sheer volume of stored video. Faces or identifying marks such as tattoos are often too coarse for direct matching by machine or human vision. Object tracking and super-resolution can then be combined to facilitate the automated detection and enhancement of areas of interest. The object tracking process enables the automatic detection of people of interest, greatly reducing the amount of data for super-resolution. Smaller regions such as faces can also be tracked. A number of instances of such regions can then be utilized to obtain a super-resolved version for matching. Performance improvement from super-resolution is demonstrated using a face verification task. It is shown that there is a consistent improvement of approximately 7% in verification accuracy, using both Eigenface and Elastic Bunch Graph Matching approaches for automatic face verification, starting from faces with an eye to eye distance of 14 pixels. Visual improvement in image fidelity from super-resolved images over low-resolution and interpolated images is demonstrated on a small database. Current research and future directions in this area are also summarized.
Resumo:
A Cooperative Collision Warning System (CCWS) is an active safety techno- logy for road vehicles that can potentially reduce traffic accidents. It provides a driver with situational awareness and early warnings of any possible colli- sions through an on-board unit. CCWS is still under active research, and one of the important technical problems is safety message dissemination. Safety messages are disseminated in a high-speed mobile environment using wireless communication technology such as Dedicated Short Range Communication (DSRC). The wireless communication in CCWS has a limited bandwidth and can become unreliable when used inefficiently, particularly given the dynamic nature of road traffic conditions. Unreliable communication may significantly reduce the performance of CCWS in preventing collisions. There are two types of safety messages: Routine Safety Messages (RSMs) and Event Safety Messages (ESMs). An RSM contains the up-to-date state of a vehicle, and it must be disseminated repeatedly to its neighbouring vehicles. An ESM is a warning message that must be sent to all the endangered vehi- cles. Existing RSM and ESM dissemination schemes are inefficient, unscalable, and unable to give priority to vehicles in the most danger. Thus, this study investigates more efficient and scalable RSM and ESM dissemination schemes that can make use of the context information generated from a particular traffic scenario. Therefore, this study tackles three technical research prob- lems, vehicular traffic scenario modelling and context information generation, context-aware RSM dissemination, and context-aware ESM dissemination. The most relevant context information in CCWS is the information about possible collisions among vehicles given a current vehicular traffic situation. To generate the context information, this study investigates techniques to model interactions among multiple vehicles based on their up-to-date motion state obtained via RSM. To date, there is no existing model that can represent interactions among multiple vehicles in a speciffic region and at a particular time. The major outcome from the first problem is a new interaction graph model that can be used to easily identify the endangered vehicles and their danger severity. By identifying the endangered vehicles, RSM and ESM dis- semination can be optimised while improving safety at the same time. The new model enables the development of context-aware RSM and ESM dissemination schemes. To disseminate RSM efficiently, this study investigates a context-aware dis- semination scheme that can optimise the RSM dissemination rate to improve safety in various vehicle densities. The major outcome from the second problem is a context-aware RSM dissemination protocol. The context-aware protocol can adaptively adjust the dissemination rate based on an estimated channel load and danger severity of vehicle interactions given by the interaction graph model. Unlike existing RSM dissemination schemes, the proposed adaptive scheme can reduce channel congestion and improve safety by prioritising ve- hicles that are most likely to crash with other vehicles. The proposed RSM protocol has been implemented and evaluated by simulation. The simulation results have shown that the proposed RSM protocol outperforms existing pro- tocols in terms of efficiency, scalability and safety. To disseminate ESM efficiently, this study investigates a context-aware ESM dissemination scheme that can reduce unnecessary transmissions and deliver ESMs to endangered vehicles as fast as possible. The major outcome from the third problem is a context-aware ESM dissemination protocol that uses a multicast routing strategy. Existing ESM protocols use broadcast rout- ing, which is not efficient because ESMs may be sent to a large number of ve- hicles in the area. Using multicast routing improves efficiency because ESMs are sent only to the endangered vehicles. The endangered vehicles can be identified using the interaction graph model. The proposed ESM protocol has been implemented and evaluated by simulation. The simulation results have shown that the proposed ESM protocol can prevent potential accidents from occurring better than existing ESM protocols. The context model and the RSM and ESM dissemination protocols can be implemented in any CCWS development to improve the communication and safety performance of CCWS. In effect, the outcomes contribute to the realisation of CCWS that will ultimately improve road safety and save lives.