166 results for graph


Relevance:

10.00% 10.00%

Publisher:

Abstract:

We present new expected risk bounds for binary and multiclass prediction, and resolve several recent conjectures on sample compressibility due to Kuzmin and Warmuth. By exploiting the combinatorial structure of concept class F, Haussler et al. achieved a VC(F)/n bound for the natural one-inclusion prediction strategy. The key step in their proof is a d = VC(F) bound on the graph density of a subgraph of the hypercube, the one-inclusion graph. The first main result of this paper is a density bound of $n\binom{n-1}{\le d-1}/\binom{n}{\le d} < d$, which positively resolves a conjecture of Kuzmin and Warmuth relating to their unlabeled Peeling compression scheme and also leads to an improved one-inclusion mistake bound. The proof uses a new form of VC-invariant shifting and a group-theoretic symmetrization. Our second main result is an algebraic topological property of maximum classes of VC-dimension d as being d-contractible simplicial complexes, extending the well-known characterization that d = 1 maximum classes are trees. We negatively resolve a minimum degree conjecture of Kuzmin and Warmuth (the second part to a conjectured proof of correctness for Peeling) that every class has one-inclusion minimum degree at most its VC-dimension. Our final main result is a k-class analogue of the d/n mistake bound, replacing the VC-dimension by the Pollard pseudo-dimension and the one-inclusion strategy by its natural hypergraph generalization. This result improves on known PAC-based expected risk bounds by a factor of O(log n) and is shown to be optimal up to an O(log k) factor. The combinatorial technique of shifting takes a central role in understanding the one-inclusion (hyper)graph and is a running theme throughout.
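The density bound quoted in this abstract can be checked numerically. The sketch below assumes the usual reading of the truncated binomial notation, in which "n choose at most d" means the sum of C(n, i) for i from 0 to d; the function names `binom_upto` and `density_bound` are illustrative and do not come from the paper.

```python
from math import comb


def binom_upto(n: int, d: int) -> int:
    """Sum of C(n, i) for i = 0..d (assumed meaning of 'n choose <= d')."""
    if d < 0:
        return 0
    return sum(comb(n, i) for i in range(min(d, n) + 1))


def density_bound(n: int, d: int) -> float:
    """Evaluate n * C(n-1, <=d-1) / C(n, <=d) for sample size n and VC-dimension d."""
    return n * binom_upto(n - 1, d - 1) / binom_upto(n, d)


if __name__ == "__main__":
    # Print the bound next to d for a few small cases.
    for n, d in [(3, 1), (4, 2), (10, 3)]:
        print(n, d, density_bound(n, d))
```

Under this reading the expression stays strictly below d for every 1 <= d <= n, which matches the inequality stated in the abstract.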

Relevance:

10.00% 10.00%

Publisher:

Abstract:

In semisupervised learning (SSL), a predictive model is learned from a collection of labeled data and a typically much larger collection of unlabeled data. This paper presents a framework called multi-view point cloud regularization (MVPCR), which unifies and generalizes several semisupervised kernel methods that are based on data-dependent regularization in reproducing kernel Hilbert spaces (RKHSs). Special cases of MVPCR include coregularized least squares (CoRLS), manifold regularization (MR), and graph-based SSL. An accompanying theorem shows how to reduce any MVPCR problem to standard supervised learning with a new multi-view kernel.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

We present new expected risk bounds for binary and multiclass prediction, and resolve several recent conjectures on sample compressibility due to Kuzmin and Warmuth. By exploiting the combinatorial structure of concept class F, Haussler et al. achieved a VC(F)/n bound for the natural one-inclusion prediction strategy. The key step in their proof is a d = VC(F) bound on the graph density of a subgraph of the hypercube, the one-inclusion graph. The first main result of this report is a density bound of $n\binom{n-1}{\le d-1}/\binom{n}{\le d} < d$, which positively resolves a conjecture of Kuzmin and Warmuth relating to their unlabeled Peeling compression scheme and also leads to an improved one-inclusion mistake bound. The proof uses a new form of VC-invariant shifting and a group-theoretic symmetrization. Our second main result is an algebraic topological property of maximum classes of VC-dimension d as being d-contractible simplicial complexes, extending the well-known characterization that d = 1 maximum classes are trees. We negatively resolve a minimum degree conjecture of Kuzmin and Warmuth (the second part to a conjectured proof of correctness for Peeling) that every class has one-inclusion minimum degree at most its VC-dimension. Our final main result is a k-class analogue of the d/n mistake bound, replacing the VC-dimension by the Pollard pseudo-dimension and the one-inclusion strategy by its natural hypergraph generalization. This result improves on known PAC-based expected risk bounds by a factor of O(log n) and is shown to be optimal up to an O(log k) factor. The combinatorial technique of shifting takes a central role in understanding the one-inclusion (hyper)graph and is a running theme throughout.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

In the structure of the title compound, C5H7N2+ C8H11O4-, the cis-monoanions associate through short carboxylic acid-carboxyl O-H...O hydrogen bonds [graph set C(7)], forming zigzag chains which extend along c and are inter-linked through pyridinium and amine N-H...O(carboxyl) hydrogen bonds giving a three-dimensional network structure.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Web service technology is increasingly being used to build various e-Applications, in domains such as e-Business and e-Science. Characteristic benefits of web service technology are its inter-operability, decoupling and just-in-time integration. Using web service technology, an e-Application can be implemented by web service composition, that is, by composing existing individual web services in accordance with the business process of the application. This means the application is provided to customers in the form of a value-added composite web service. An important and challenging issue of web service composition is how to meet Quality-of-Service (QoS) requirements. This includes customer-focused elements such as response time, price, throughput and reliability as well as how to best provide QoS results for the composites. This in turn best fulfils customers' expectations and achieves their satisfaction. Fulfilling these QoS requirements or addressing the QoS-aware web service composition problem is the focus of this project. From a computational point of view, QoS-aware web service composition can be transformed into diverse optimisation problems. These problems are characterised as complex, large-scale, highly constrained and multi-objective problems. We therefore use genetic algorithms (GAs) to address QoS-based service composition problems. More precisely, this study addresses three important subproblems of QoS-aware web service composition: QoS-based web service selection for a composite web service accommodating constraints on inter-service dependence and conflict, QoS-based resource allocation and scheduling for multiple composite services on hybrid clouds, and performance-driven composite service partitioning for decentralised execution. Based on operations research theory, we model the three problems as a constrained optimisation problem, a resource allocation and scheduling problem, and a graph partitioning problem, respectively.
Then, we present novel GAs to address these problems. We also conduct experiments to evaluate the performance of the new GAs. Finally, verification experiments are performed to show the correctness of the GAs. The major outcomes from the first problem are three novel GAs: a penalty-based GA, a min-conflict hill-climbing repairing GA, and a hybrid GA. These GAs adopt different constraint handling strategies to handle constraints on inter-service dependence and conflict. This is an important factor that has been largely ignored by existing algorithms, an omission that might lead to the generation of infeasible composite services. Experimental results demonstrate the effectiveness of our GAs for handling the QoS-based web service selection problem with constraints on inter-service dependence and conflict, as well as their better scalability than the existing integer programming-based method for large-scale web service selection problems. The major outcomes from the second problem are two GAs: a random-key GA and a cooperative coevolutionary GA (CCGA). Experiments demonstrate the good scalability of the two algorithms. In particular, the CCGA scales well as the number of composite services involved in a problem increases, while no other algorithms demonstrate this ability. The findings from the third problem result in a novel GA for composite service partitioning for decentralised execution. Compared with existing heuristic algorithms, the new GA is more suitable for large-scale composite web service program partitioning problems. In addition, the GA outperforms existing heuristic algorithms, generating a better deployment topology for a composite web service for decentralised execution. These effective and scalable GAs can be integrated into QoS-based management tools to facilitate the delivery of feasible, reliable and high-quality composite web services.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The structures of two hydrated proton-transfer compounds of 4-piperidinecarboxamide (isonipecotamide) with the isomeric heteroaromatic carboxylic acids indole-2-carboxylic acid and indole-3-carboxylic acid, namely 4-carbamoylpiperidinium indole-2-carboxylate dihydrate (1) and 4-carbamoylpiperidinium indole-3-carboxylate hemihydrate (2), have been determined at 200 K. Crystals of both 1 and 2 are monoclinic, space groups P2₁/c and P2/c respectively, with Z = 4 in cells having dimensions a = 10.6811(4), b = 12.2017(4), c = 12.5456(5) Å, β = 96.000(4)° (1) and a = 15.5140(4), b = 10.2908(3), c = 9.7047(3) Å, β = 97.060(3)° (2). Hydrogen-bonding in 1 involves a primary cyclic interaction involving complementary cation amide N-H…O(carboxyl) anion and anion hetero N-H…O(amide) cation hydrogen bonds [graph set R2/2(9)]. Secondary associations involving also the water molecules of solvation give a two-dimensional network structure which includes weak water O-H…π interactions. In the three-dimensional hydrogen-bonded structure of 2, there are classic centrosymmetric cyclic head-to-head hydrogen-bonded amide-amide interactions [graph set R2/2(8)] as well as lateral cyclic amide-O linked amide-amide extensions [graph set R2/4(8)]. The anions and the water molecule, which lies on a twofold rotation axis, are involved in secondary extensions.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Bioinformatics involves analyses of biological data such as DNA sequences, microarrays and protein-protein interaction (PPI) networks. Its two main objectives are the identification of genes or proteins and the prediction of their functions. Biological data often contain uncertain and imprecise information. Fuzzy theory provides useful tools to deal with this type of information, hence has played an important role in analyses of biological data. In this thesis, we aim to develop some new fuzzy techniques and apply them on DNA microarrays and PPI networks. We will focus on three problems: (1) clustering of microarrays; (2) identification of disease-associated genes in microarrays; and (3) identification of protein complexes in PPI networks. The first part of the thesis aims to detect, by the fuzzy C-means (FCM) method, clustering structures in DNA microarrays corrupted by noise. Because of the presence of noise, some clustering structures found in random data may not have any biological significance. In this part, we propose to combine the FCM with the empirical mode decomposition (EMD) for clustering microarray data. The purpose of EMD is to reduce, preferably to remove, the effect of noise, resulting in what is known as denoised data. We call this method the fuzzy C-means method with empirical mode decomposition (FCM-EMD). We applied this method on yeast and serum microarrays, and the silhouette values are used for assessment of the quality of clustering. The results indicate that the clustering structures of denoised data are more reasonable, implying that genes have tighter association with their clusters. Furthermore we found that the estimation of the fuzzy parameter m, which is a difficult step, can be avoided to some extent by analysing denoised microarray data. The second part aims to identify disease-associated genes from DNA microarray data which are generated under different conditions, e.g., patients and normal people. 
We developed a type-2 fuzzy membership (FM) function for identification of disease-associated genes. This approach is applied to diabetes and lung cancer data, and a comparison with the original FM test was carried out. Among the ten best-ranked genes of diabetes identified by the type-2 FM test, seven genes have been confirmed as diabetes-associated genes according to gene description information in Gene Bank and the published literature. An additional gene is further identified. Among the ten best-ranked genes identified in lung cancer data, seven are confirmed to be associated with lung cancer or its treatment. The type-2 FM-d values are significantly different, which makes the identifications more convincing than those of the original FM test. The third part of the thesis aims to identify protein complexes in large interaction networks. Identification of protein complexes is crucial to understand the principles of cellular organisation and to predict protein functions. In this part, we proposed a novel method which combines the fuzzy clustering method and interaction probability to identify the overlapping and non-overlapping community structures in PPI networks, and then to detect protein complexes in these sub-networks. Our method is based on both the fuzzy relation model and the graph model. We applied the method to several PPI networks and compared it with a popular protein complex identification method, the clique percolation method. For the same data, we detected more protein complexes. We also applied our method to two social networks. The results showed that our method works well for detecting sub-networks and gives a reasonable understanding of these communities.
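The role of the fuzzy parameter m in the fuzzy C-means (FCM) method mentioned in this abstract can be illustrated with the standard textbook membership update, in which a point's degree of membership in cluster i is 1 / sum_k (d_i / d_k)^(2/(m-1)). The sketch below is that generic formula only; it is not the thesis's FCM-EMD procedure, and the function name `fcm_memberships` is hypothetical.

```python
def fcm_memberships(distances, m=2.0):
    """Membership degrees of one data point given its distances to each cluster centre.

    Standard FCM update (an assumption; the thesis's exact variant is not given):
    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), with fuzzifier m > 1.
    """
    if m <= 1.0:
        raise ValueError("the fuzzy parameter m must be greater than 1")
    # A point lying exactly on one or more centres belongs fully to those centres.
    if any(d == 0 for d in distances):
        hits = [1.0 if d == 0 else 0.0 for d in distances]
        total = sum(hits)
        return [h / total for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
        for d_i in distances
    ]


if __name__ == "__main__":
    # Larger m makes the memberships softer (closer to uniform).
    print(fcm_memberships([1.0, 3.0], m=2.0))
    print(fcm_memberships([1.0, 3.0], m=4.0))
```

Because the memberships flatten as m grows, the choice of m affects how crisp the resulting clusters look, which is why its estimation is described as a difficult step.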

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Complex networks have been studied extensively due to their relevance to many real-world systems such as the world-wide web, the internet, and biological and social systems. During the past two decades, studies of such networks in different fields have produced many significant results concerning their structures, topological properties, and dynamics. Three well-known properties of complex networks are scale-free degree distribution, small-world effect and self-similarity. The search for additional meaningful properties and the relationships among these properties is an active area of current research. This thesis investigates a newer aspect of complex networks, namely their multifractality, which is an extension of the concept of self-similarity. The first part of the thesis aims to confirm that the study of properties of complex networks can be expanded to a wider field including more complex weighted networks. Those real networks that have been shown to possess the self-similarity property in the existing literature are all unweighted networks. We use the protein-protein interaction (PPI) networks as a key example to show that their weighted networks inherit the self-similarity from the original unweighted networks. Firstly, we confirm that the random sequential box-covering algorithm is an effective tool to compute the fractal dimension of complex networks. This is demonstrated on the Homo sapiens and E. coli PPI networks as well as their skeletons. Our results verify that the fractal dimension of the skeleton is smaller than that of the original network because the shortest distance between nodes is larger in the skeleton; hence, for a fixed box size, more boxes will be needed to cover the skeleton. Then we adopt the iterative scoring method to generate weighted PPI networks of five species, namely Homo sapiens, E. coli, yeast, C. elegans and Arabidopsis thaliana.
By using the random sequential box-covering algorithm, we calculate the fractal dimensions for both the original unweighted PPI networks and the generated weighted networks. The results show that self-similarity is still present in generated weighted PPI networks. This implication will be useful for our treatment of the networks in the third part of the thesis. The second part of the thesis aims to explore the multifractal behavior of different complex networks. Fractals such as the Cantor set, the Koch curve and the Sierpinski gasket are homogeneous since these fractals consist of a geometrical figure which repeats on an ever-reduced scale. Fractal analysis is a useful method for their study. However, real-world fractals are not homogeneous; there is rarely an identical motif repeated on all scales. Their singularity may vary on different subsets, implying that these objects are multifractal. Multifractal analysis is a useful way to systematically characterize the spatial heterogeneity of both theoretical and experimental fractal patterns. However, the tools for multifractal analysis of objects in Euclidean space are not suitable for complex networks. In this thesis, we propose a new box-covering algorithm for multifractal analysis of complex networks. This algorithm is demonstrated in the computation of the generalized fractal dimensions of some theoretical networks, namely scale-free networks, small-world networks and random networks, and of one kind of real network, namely PPI networks of different species. Our main finding is the existence of multifractality in scale-free networks and PPI networks, while the multifractal behaviour is not confirmed for small-world networks and random networks. As another application, we generate gene interaction networks for patients and healthy people using the correlation coefficients between microarrays of different genes. Our results confirm the existence of multifractality in gene interaction networks.
This multifractal analysis then provides a potentially useful tool for gene clustering and identification. The third part of the thesis aims to investigate the topological properties of networks constructed from time series. Characterizing complicated dynamics from time series is a fundamental problem of continuing interest in a wide variety of fields. Recent works indicate that complex network theory can be a powerful tool to analyse time series. Many existing methods for transforming time series into complex networks share a common feature: they define the connectivity of a complex network by the mutual proximity of different parts (e.g., individual states, state vectors, or cycles) of a single trajectory. In this thesis, we propose a new method to construct networks of time series: we define nodes by vectors of a certain length in the time series, and weight of edges between any two nodes by the Euclidean distance between the corresponding two vectors. We apply this method to build networks for fractional Brownian motions, whose long-range dependence is characterised by their Hurst exponent. We verify the validity of this method by showing that time series with stronger correlation, hence larger Hurst exponent, tend to have smaller fractal dimension, hence smoother sample paths. We then construct networks via the technique of horizontal visibility graph (HVG), which has been widely used recently. We confirm a known linear relationship between the Hurst exponent of fractional Brownian motion and the fractal dimension of the corresponding HVG network. In the first application, we apply our newly developed box-covering algorithm to calculate the generalized fractal dimensions of the HVG networks of fractional Brownian motions as well as those for binomial cascades and five bacterial genomes. The results confirm the monoscaling of fractional Brownian motion and the multifractality of the rest. 
As an additional application, we discuss the resilience of networks constructed from time series via two different approaches: visibility graph and horizontal visibility graph. Our finding is that the degree distribution of VG networks of fractional Brownian motions is scale-free (i.e., having a power law) meaning that one needs to destroy a large percentage of nodes before the network collapses into isolated parts; while for HVG networks of fractional Brownian motions, the degree distribution has exponential tails, implying that HVG networks would not survive the same kind of attack.
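The horizontal visibility graph (HVG) construction referred to in this abstract can be sketched as follows. The sketch assumes the commonly used definition, in which two time points i < j are linked when every value strictly between them lies below min(x_i, x_j); the function names are illustrative and not taken from the thesis.

```python
from collections import Counter


def horizontal_visibility_graph(series):
    """Return the HVG edge set {(i, j)} with i < j for a sequence of numbers.

    Assumed rule: i and j are connected iff series[k] < min(series[i], series[j])
    for all i < k < j. Consecutive points are therefore always connected.
    """
    n = len(series)
    edges = set()
    for i in range(n - 1):
        blocker = float("-inf")  # largest value seen strictly between i and j
        for j in range(i + 1, n):
            if blocker < min(series[i], series[j]):
                edges.add((i, j))
            blocker = max(blocker, series[j])
            if blocker >= series[i]:
                break  # no later point can be horizontally visible from i
    return edges


def degree_counts(edges):
    """Map each node to its degree; the input to a degree-distribution study."""
    degrees = Counter()
    for i, j in edges:
        degrees[i] += 1
        degrees[j] += 1
    return degrees


if __name__ == "__main__":
    print(sorted(horizontal_visibility_graph([3, 1, 2, 4])))
```

The degree counts returned by `degree_counts` are the quantity whose tail behaviour (power law versus exponential) the abstract relates to network resilience.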

Relevance:

10.00% 10.00%

Publisher:

Abstract:

While the phrase "six degrees of separation" is widely used to characterize a variety of human-derived networks, in this study we show that in a patent citation network, related patents are connected with an average distance of 6, whereas the average distance for a random pair of nodes in the graph is approximately 15. We use this information to improve the recall level in prior-art retrieval in the setting of blind relevance feedback without any textual knowledge.
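The average distance statistic used in this abstract can be computed with breadth-first search. The sketch below assumes the citation network is treated as an unweighted, undirected graph stored as an adjacency dictionary; that representation and the function names are assumptions made for illustration, not details from the study.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from source to every reachable node of an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_distance(adj, pairs):
    """Mean hop distance over the given node pairs, ignoring disconnected pairs."""
    total, count = 0, 0
    for s, t in pairs:
        d = bfs_distances(adj, s).get(t)
        if d is not None:
            total += d
            count += 1
    return total / count if count else float("nan")


if __name__ == "__main__":
    # Toy path graph 0 - 1 - 2 - 3 standing in for a citation network.
    adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(average_distance(adj, [(0, 3), (1, 2)]))
```

Comparing this value for pairs of known related patents against the value for randomly sampled pairs gives the kind of contrast (6 versus roughly 15) reported above.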

Relevance:

10.00% 10.00%

Publisher:

Abstract:

In practice, parallel-machine job-shop scheduling (PMJSS) is very useful in the development of standard modelling approaches and generic solution techniques for many real-world scheduling problems. In this paper, based on the analysis of structural properties in an extended disjunctive graph model, a hybrid shifting bottleneck procedure (HSBP) algorithm combined with a Tabu Search metaheuristic algorithm is developed to deal with the PMJSS problem. The original-version SBP algorithm for job-shop scheduling (JSS) has been significantly improved to solve the PMJSS problem with four novelties: i) a topological-sequence algorithm is proposed to decompose the PMJSS problem into a set of single-machine scheduling (SMS) and/or parallel-machine scheduling (PMS) subproblems; ii) a modified Carlier algorithm based on the proposed lemmas and proofs is developed to solve the SMS subproblem; iii) the Jackson rule is extended to solve the PMS subproblem; iv) a Tabu Search metaheuristic algorithm is embedded under the framework of SBP to optimise the JSS and PMJSS cases. The computational experiments show that the proposed HSBP is very efficient in solving the JSS and PMJSS problems.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

In the structure of the title compound, C16H26N+ Cl-, the salt of a precursor in the synthesis of an isoindolin-2-yloxyl free-radical trapping agent, the cations and anions form discrete centrosymmetric cyclic dimers through N-H...Cl hydrogen-bonding associations [graph set R2/4(8)].

Relevance:

10.00% 10.00%

Publisher:

Abstract:

In the title salt, racemic C6H12N2O+ C8H11O4- from the reaction of cis-cyclohexane-1,2-dicarboxylic anhydride with isonipecotamide, the cations are linked into duplex chain substructures through both centrosymmetric cyclic head-to-head 'amide motif' hydrogen-bonding associations [graph set R2/2(8)] and 'side-by-side' R2/2(14) associations. The anions are incorporated into the chains through cyclic R3/4(10) interactions involving amide and piperidinium N-H...O(carboxyl) hydrogen bonds which, together with inter-anion carboxylic acid O-H...O(carboxyl) hydrogen bonds, give a two-dimensional layered structure extending along (011).

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Recommender systems are one of the recent inventions to deal with ever-growing information overload in relation to the selection of goods and services in a global economy. Collaborative Filtering (CF) is one of the most popular techniques in recommender systems. CF recommends items to a target user based on the preferences of a set of similar users known as the neighbours, generated from a database made up of the preferences of past users. With sufficient background information on item ratings, its performance is promising, but research shows that it performs very poorly in a cold start situation where there is not enough previous rating data. As an alternative to ratings, trust between the users could be used to choose the neighbours for recommendation making. Better recommendations can be achieved using an inferred trust network which mimics the real-world "friend of a friend" recommendations. To extend the boundaries of the neighbourhood, an effective trust inference technique is required. This thesis proposes a trust inference technique called Directed Series Parallel Graph (DSPG) which performs better than other popular trust inference algorithms such as TidalTrust and MoleTrust. Another problem is that reliable explicit trust data is not always available. In real life, people trust "word of mouth" recommendations made by people with similar interests. This is often assumed in the recommender system. By conducting a survey, we can confirm that interest similarity has a positive relationship with trust and this can be used to generate a trust network for recommendation. In this research, we also propose a new method called SimTrust for developing trust networks based on users' interest similarity in the absence of explicit trust data. To identify the interest similarity, we use users' personalised tagging information. However, we are interested in what resources the user chooses to tag, rather than the text of the tag applied.
The commonalities of the resources being tagged by the users can be used to form the neighbours used in the automated recommender system. Our experimental results show that our proposed tag-similarity based method outperforms the traditional collaborative filtering approach which usually uses rating data.
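The idea of forming neighbours from the resources users choose to tag can be illustrated with a simple set-overlap score. The sketch below uses Jaccard similarity as a stand-in; this is an assumption made for illustration and is not the SimTrust method itself, whose scoring is not described in the abstract. The function names and the toy data are hypothetical.

```python
def resource_overlap(tagged_a, tagged_b):
    """Jaccard similarity between two users' sets of tagged resources (assumed metric)."""
    a, b = set(tagged_a), set(tagged_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nearest_neighbours(target, tagged_by_user, k=2):
    """Return the k users whose tagged resources overlap most with the target's."""
    scores = [
        (resource_overlap(tagged_by_user[target], resources), user)
        for user, resources in tagged_by_user.items()
        if user != target
    ]
    # Highest overlap first; ties broken alphabetically for determinism.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [user for _, user in scores[:k]]


if __name__ == "__main__":
    tagged = {
        "ann": {"r1", "r2", "r3"},
        "bob": {"r2", "r3"},
        "cat": {"r9"},
        "dan": {"r1"},
    }
    print(nearest_neighbours("ann", tagged, k=2))
```

Only the identity of the tagged resources is used here, not the tag text, in line with the emphasis stated in the abstract.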

Relevance:

10.00% 10.00%

Publisher:

Abstract:

As organizations reach higher levels of business process management maturity, they often find themselves maintaining very large process model repositories, representing valuable knowledge about their operations. A common practice within these repositories is to create new process models, or extend existing ones, by copying and merging fragments from other models. We contend that if these duplicate fragments, a.k.a. exact clones, can be identified and factored out as shared subprocesses, the repository's maintainability can be greatly improved. With this purpose in mind, we propose an indexing structure to support fast detection of clones in process model repositories. Moreover, we show how this index can be used to efficiently query a process model repository for fragments. This index, called RPSDAG, is based on a novel combination of a method for process model decomposition (namely the Refined Process Structure Tree), with established graph canonization and string matching techniques. We evaluated the RPSDAG with large process model repositories from industrial practice. The experiments show that a significant number of non-trivial clones can be efficiently found in such repositories, and that fragment queries can be handled efficiently.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

In the asymmetric unit of the title co-crystal, C12H14N4O2S·C7H5NO4, there are two independent but conformationally similar heterodimers, which are formed through intermolecular N-H...O(carboxy) and carboxyl O-H...N hydrogen-bond pairs, giving a cyclic motif [graph set R2/2(8)]. The dihedral angles between the rings in the sulfonamide molecules are 78.77(8) and 82.33(9)°, while the dihedral angles between the ring and the CO2H group in the acids are 2.19(9) and 7.02(10)°. A two-dimensional structure parallel to the ab plane is generated from the heterodimer units through hydrogen-bonding associations between NH2 and sulfone groups. Between neighbouring two-dimensional arrays there are two types of aromatic π-π stacking interactions involving either one of the pyrimidine rings and a 4-nitrobenzoic acid molecule [minimum ring centroid separation = 3.5886(9) Å] or two acid molecules [minimum ring centroid separation = 3.7236(10) Å].