917 resultados para Labelled rooted unordered trees


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Extracting frequent subtrees from the tree structured data has important applications in Web mining. In this paper, we introduce a novel canonical form for rooted labelled unordered trees called the balanced-optimal-search canonical form (BOCF) that can handle the isomorphism problem efficiently. Using BOCF, we define a tree structure guided scheme based enumeration approach that systematically enumerates only the valid subtrees. Finally, we present the balanced optimal search tree miner (BOSTER) algorithm based on BOCF and the proposed enumeration approach, for finding frequent induced subtrees from a database of labelled rooted unordered trees. Experiments on the real datasets compare the efficiency of BOSTER over the two state-of-the-art algorithms for mining induced unordered subtrees, HybridTreeMiner and UNI3. The results are encouraging.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents an algorithm for mining unordered embedded subtrees using the balanced-optimal-search canonical form (BOCF). A tree structure guided scheme based enumeration approach is defined using BOCF for systematically enumerating the valid subtrees only. Based on this canonical form and enumeration technique, the balanced optimal search embedded subtree mining algorithm (BEST) is introduced for mining embedded subtrees from a database of labelled rooted unordered trees. The extensive experiments on both synthetic and real datasets demonstrate the efficiency of BEST over the two state-of-the-art algorithms for mining embedded unordered subtrees, SLEUTH and U3.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Trees are capable of portraying the semi-structured data which is common in web domain. Finding similarities between trees is mandatory for several applications that deal with semi-structured data. Existing similarity methods examine a pair of trees by comparing through nodes and paths of two trees, and find the similarity between them. However, these methods provide unfavorable results for unordered tree data and result in yielding NP-hard or MAX-SNP hard complexity. In this paper, we present a novel method that encodes a tree with an optimal traversing approach first, and then, utilizes it to model the tree with its equivalent matrix representation for finding similarity between unordered trees efficiently. Empirical analysis shows that the proposed method is able to achieve high accuracy even on the large data sets.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Exponential growth of genomic data in the last two decades has made manual analyses impractical for all but trial studies. As genomic analyses have become more sophisticated, and move toward comparisons across large datasets, computational approaches have become essential. One of the most important biological questions is to understand the mechanisms underlying gene regulation. Genetic regulation is commonly investigated and modelled through the use of transcriptional regulatory network (TRN) structures. These model the regulatory interactions between two key components: transcription factors (TFs) and the target genes (TGs) they regulate. Transcriptional regulatory networks have proven to be invaluable scientific tools in Bioinformatics. When used in conjunction with comparative genomics, they have provided substantial insights into the evolution of regulatory interactions. Current approaches to regulatory network inference, however, omit two additional key entities: promoters and transcription factor binding sites (TFBSs). In this study, we attempted to explore the relationships among these regulatory components in bacteria. Our primary goal was to identify relationships that can assist in reducing the high false positive rates associated with transcription factor binding site predictions and thereupon enhance the reliability of the inferred transcription regulatory networks. In our preliminary exploration of relationships between the key regulatory components in Escherichia coli transcription, we discovered a number of potentially useful features. The combination of location score and sequence dissimilarity scores increased de novo binding site prediction accuracy by 13.6%. Another important observation made was with regards to the relationship between transcription factors grouped by their regulatory role and corresponding promoter strength. Our study of E.coli ��70 promoters, found support at the 0.1 significance level for our hypothesis | that weak promoters are preferentially associated with activator binding sites to enhance gene expression, whilst strong promoters have more repressor binding sites to repress or inhibit gene transcription. Although the observations were specific to �70, they nevertheless strongly encourage additional investigations when more experimentally confirmed data are available. In our preliminary exploration of relationships between the key regulatory components in E.coli transcription, we discovered a number of potentially useful features { some of which proved successful in reducing the number of false positives when applied to re-evaluate binding site predictions. Of chief interest was the relationship observed between promoter strength and TFs with respect to their regulatory role. Based on the common assumption, where promoter homology positively correlates with transcription rate, we hypothesised that weak promoters would have more transcription factors that enhance gene expression, whilst strong promoters would have more repressor binding sites. The t-tests assessed for E.coli �70 promoters returned a p-value of 0.072, which at 0.1 significance level suggested support for our (alternative) hypothesis; albeit this trend may only be present for promoters where corresponding TFBSs are either all repressors or all activators. Nevertheless, such suggestive results strongly encourage additional investigations when more experimentally confirmed data will become available. Much of the remainder of the thesis concerns a machine learning study of binding site prediction, using the SVM and kernel methods, principally the spectrum kernel. Spectrum kernels have been successfully applied in previous studies of protein classification [91, 92], as well as the related problem of promoter predictions [59], and we have here successfully applied the technique to refining TFBS predictions. The advantages provided by the SVM classifier were best seen in `moderately'-conserved transcription factor binding sites as represented by our E.coli CRP case study. Inclusion of additional position feature attributes further increased accuracy by 9.1% but more notable was the considerable decrease in false positive rate from 0.8 to 0.5 while retaining 0.9 sensitivity. Improved prediction of transcription factor binding sites is in turn extremely valuable in improving inference of regulatory relationships, a problem notoriously prone to false positive predictions. Here, the number of false regulatory interactions inferred using the conventional two-component model was substantially reduced when we integrated de novo transcription factor binding site predictions as an additional criterion for acceptance in a case study of inference in the Fur regulon. This initial work was extended to a comparative study of the iron regulatory system across 20 Yersinia strains. This work revealed interesting, strain-specific difierences, especially between pathogenic and non-pathogenic strains. Such difierences were made clear through interactive visualisations using the TRNDifi software developed as part of this work, and would have remained undetected using conventional methods. This approach led to the nomination of the Yfe iron-uptake system as a candidate for further wet-lab experimentation due to its potential active functionality in non-pathogens and its known participation in full virulence of the bubonic plague strain. Building on this work, we introduced novel structures we have labelled as `regulatory trees', inspired by the phylogenetic tree concept. Instead of using gene or protein sequence similarity, the regulatory trees were constructed based on the number of similar regulatory interactions. While the common phylogentic trees convey information regarding changes in gene repertoire, which we might regard being analogous to `hardware', the regulatory tree informs us of the changes in regulatory circuitry, in some respects analogous to `software'. In this context, we explored the `pan-regulatory network' for the Fur system, the entire set of regulatory interactions found for the Fur transcription factor across a group of genomes. In the pan-regulatory network, emphasis is placed on how the regulatory network for each target genome is inferred from multiple sources instead of a single source, as is the common approach. The benefit of using multiple reference networks, is a more comprehensive survey of the relationships, and increased confidence in the regulatory interactions predicted. In the present study, we distinguish between relationships found across the full set of genomes as the `core-regulatory-set', and interactions found only in a subset of genomes explored as the `sub-regulatory-set'. We found nine Fur target gene clusters present across the four genomes studied, this core set potentially identifying basic regulatory processes essential for survival. Species level difierences are seen at the sub-regulatory-set level; for example the known virulence factors, YbtA and PchR were found in Y.pestis and P.aerguinosa respectively, but were not present in both E.coli and B.subtilis. Such factors and the iron-uptake systems they regulate, are ideal candidates for wet-lab investigation to determine whether or not they are pathogenic specific. In this study, we employed a broad range of approaches to address our goals and assessed these methods using the Fur regulon as our initial case study. We identified a set of promising feature attributes; demonstrated their success in increasing transcription factor binding site prediction specificity while retaining sensitivity, and showed the importance of binding site predictions in enhancing the reliability of regulatory interaction inferences. Most importantly, these outcomes led to the introduction of a range of visualisations and techniques, which are applicable across the entire bacterial spectrum and can be utilised in studies beyond the understanding of transcriptional regulatory networks.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Universal trees based on sequences of single gene homologs cannot be rooted. Iwabe et al. [Iwabe, N., Kuma, K.-I., Hasegawa, M., Osawa, S. & Miyata, T. (1989) Proc. Natl. Acad. Sci. USA 86, 9355-9359] circumvented this problem by using ancient gene duplications that predated the last common ancestor of all living things. Their separate, reciprocally rooted gene trees for elongation factors and ATPase subunits showed Bacteria (eubacteria) as branching first from the universal tree with Archaea (archaebacteria) and Eucarya (eukaryotes) as sister groups. Given its topical importance to evolutionary biology and concerns about the appropriateness of the ATPase data set, an evaluation of the universal tree root using other ancient gene duplications is essential. In this study, we derive a rooting for the universal tree using aminoacyl-tRNA synthetase genes, an extensive multigene family whose divergence likely preceded that of prokaryotes and eukaryotes. An approximately 1600-bp conserved region was sequenced from the isoleucyl-tRNA synthetases of several species representing deep evolutionary branches of eukaryotes (Nosema locustae), Bacteria (Aquifex pyrophilus and Thermotoga maritima) and Archaea (Pyrococcus furiosus and Sulfolobus acidocaldarius). In addition, a new valyl-tRNA synthetase was characterized from the protist Trichomonas vaginalis. Different phylogenetic methods were used to generate trees of isoleucyl-tRNA synthetases rooted by valyl- and leucyl-tRNA synthetases. All isoleucyl-tRNA synthetase trees showed Archaea and Eucarya as sister groups, providing strong confirmation for the universal tree rooting reported by Iwabe et al. As well, there was strong support for the monophyly (sensu Hennig) of Archaea. The valyl-tRNA synthetase gene from Tr. vaginalis clustered with other eukaryotic ValRS genes, which may have been transferred from the mitochondrial genome to the nuclear genome, suggesting that this amitochondrial trichomonad once harbored an endosymbiotic bacterium.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The maximum M of a critical Bienaymé-Galton-Watson process conditioned on the total progeny N is studied. Imbedding of the process in a random walk is used. A limit theorem for the distribution of M as N → ∞ is proved. The result is trasferred to the non-critical processes. A corollary for the maximal strata of a random rooted labeled tree is obtained.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Let G = (V, E) be a finite, simple and undirected graph. For S subset of V, let delta(S, G) = {(u, v) is an element of E : u is an element of S and v is an element of V - S} be the edge boundary of S. Given an integer i, 1 <= i <= vertical bar V vertical bar, let the edge isoperimetric value of G at i be defined as b(e)(i, G) = min(S subset of V:vertical bar S vertical bar=i)vertical bar delta(S, G)vertical bar. The edge isoperimetric peak of G is defined as b(e)(G) = max(1 <= j <=vertical bar V vertical bar)b(e)(j, G). Let b(v)(G) denote the vertex isoperimetric peak defined in a corresponding way. The problem of determining a lower bound for the vertex isoperimetric peak in complete t-ary trees was recently considered in [Y. Otachi, K. Yamazaki, A lower bound for the vertex boundary-width of complete k-ary trees, Discrete Mathematics, in press (doi: 10.1016/j.disc.2007.05.014)]. In this paper we provide bounds which improve those in the above cited paper. Our results can be generalized to arbitrary (rooted) trees. The depth d of a tree is the number of nodes on the longest path starting from the root and ending at a leaf. In this paper we show that for a complete binary tree of depth d (denoted as T-d(2)), c(1)d <= b(e) (T-d(2)) <= d and c(2)d <= b(v)(T-d(2)) <= d where c(1), c(2) are constants. For a complete t-ary tree of depth d (denoted as T-d(t)) and d >= c log t where c is a constant, we show that c(1)root td <= b(e)(T-d(t)) <= td and c(2)d/root t <= b(v) (T-d(t)) <= d where c(1), c(2) are constants. At the heart of our proof we have the following theorem which works for an arbitrary rooted tree and not just for a complete t-ary tree. Let T = (V, E, r) be a finite, connected and rooted tree - the root being the vertex r. Define a weight function w : V -> N where the weight w(u) of a vertex u is the number of its successors (including itself) and let the weight index eta(T) be defined as the number of distinct weights in the tree, i.e eta(T) vertical bar{w(u) : u is an element of V}vertical bar. For a positive integer k, let l(k) = vertical bar{i is an element of N : 1 <= i <= vertical bar V vertical bar, b(e)(i, G) <= k}vertical bar. We show that l(k) <= 2(2 eta+k k)

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We re-evaluated the larval support for families within majoids using the Wilcoxon signed-rank test with emphasis on Inachoididae. To accomplish our objectives, we added 10 new taxa, two of which are traditionally assigned to the family of special interest, to a previous larval database for majoids, and re-appraised the larval characters used in earlier studies. Phylogenetic analysis was performed with PAUP* using the heuristic search with 50 replicates or the branch-and-bound algorithm when possible. Multi-state transformation series were considered unordered; initially characters were equally weighted followed by successive weighting, and trees were rooted at the Oregoniidae node. Ten different topological constraints were enforced for families to evaluate tree length under the assumption of monophyly for each taxonomic entity. Our results showed that the tree length of most constrained topologies was not considerably greater than that of unconstrained analysis in which most families nested as paraphyletic taxa. This may indicate that the present larval database does not provide strong support for paraphyly of the taxa in question. For Inachoididae, although the Wilcoxon signed-rank test rejected a significant difference between unconstrained and constrained cladograms, we were unable to provide a single synapomorphy for this clade. Except for the conflicting position of Leurocyclus and Stenorhynchus, the two clades correspond to the traditional taxonomic arrangement. Among inachoidids, the clade (Anasimus (Paradasygyius (Collodes + Pyromaia))) is supported, whereas for inachids, the clade (Inachus (Macropodia + Achaeus)) is one of the most supported clades within majids. As often stated, only additional characters will provide a better test for the monophyly of Inachoididae and other families within Majoidea.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The water relations of two tree species in the Euphorbiaceae were compared to test in part a hypothesis that the forest understorey plays an integral role in drought response. At Danum, Sabah, the relatively common species Dimorphocalyx muricatus is associated with ridges whilst another species, Mallotus wrayi, occurs widely both on ridges and lower slopes. Sets of subplots within two 4 -ha permanent plots in this lowland dipterocarp rain forest, were positioned on ridges and lower slopes. Soil water potentials were recorded in 1995-1997, and leaf water potentials were measured on six occasions. Soil water potentials on the ridges (-0.047 MPa) were significantly lower than on the lower slopes (-0.012 MPa), but during the driest period in May 1997 they fell to similarly low levels on both sites (-0.53 MPa). A weighted 40-day accumulated rainfall index was developed to model the soil water potentials. At dry times, D. muricatus (ridge) had significantly higher pre-dawn (-0.21 v. -0.57 MPa) and mid-day (-0.59 v. -1.77 MPa) leaf water potentials than M. wrayi (mean of ridge and lower slope). Leaf osmotic potentials of M. wrayi on the ridges were lower (-1.63 MPa) than on lower slopes (-1.09 MPa), with those for D. muricatus being intermediate (-1.29 MPa): both species adjusted osmotically between wet and dry times. D. muricatus trees were more deeply rooted than M. wrayi trees (97 v. 70 cm). M. wrayi trees had greater lateral root cross-sectional areas than D. muricatus trees although a greater proportion of this sectional area for D. muricatus was further down the soil profile. D. muricatus appeared to maintain relatively high water potentials during dry periods because of its access to deeper water supplies and thus it largely avoided drought effects, but M. wrayi seemed to be more affected yet tolerant of drought and was more plastic in its response. The interaction between water availability and topography determines these species' distributions and provides insights into how rain forests can withstand occasional strong droughts.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper proposes a new method of using foreground silhouette images for human pose estimation. Labels are introduced to the silhouette images, providing an extra layer of information that can be used in the model fitting process. The pixels in the silhouettes are labelled according to the corresponding body part in the model of the current fit, with the labels propagated into the silhouette of the next frame to be used in the fitting for the next frame. Both single and multi-view implementations are detailed, with results showing performance improvements over only using standard unlabelled silhouettes.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The work was both conceived and constructed in-situ within Gnombup Swamp a seasonal water body at Bremer Bay, Western Australia. The work interacts with site-specific conditions including wind patterns and a datum of seasonal water levels marks. The work is the result of collaboration between soil scientist Paula Deegan and Ian Weir. The installation was documented with a series of 30 still digital photographs, later animated in Microsoft Powerpoint.