53 results for duplication tree
Abstract:
The proliferation of the web presents an unsolved problem of automatically analyzing billions of pages of natural language. We introduce a scalable algorithm that clusters hundreds of millions of web pages into hundreds of thousands of clusters. It does this on a single mid-range machine using efficient algorithms and compressed document representations. It is applied to two web-scale crawls covering tens of terabytes: ClueWeb09 and ClueWeb12 contain 500 and 733 million web pages, respectively, and were clustered into 500,000 to 700,000 clusters. To the best of our knowledge, such fine-grained clustering has not been previously demonstrated. Previous approaches clustered only a sample, which limits the maximum number of discoverable clusters. The proposed EM-tree algorithm uses the entire collection in clustering and produces several orders of magnitude more clusters than the existing algorithms. Fine-grained clustering is necessary for meaningful clustering in massive collections where the number of distinct topics grows linearly with collection size. These fine-grained clusters show improved cluster quality when assessed with two novel evaluations using ad hoc search relevance judgments and spam classifications for external validation. These evaluations solve the problem of assessing the quality of clusters where categorical labeling is unavailable and infeasible.
Abstract:
Monte-Carlo Tree Search (MCTS) is a heuristic for searching large trees. We apply it to argumentative puzzles where MCTS pursues the best argumentation with respect to a set of arguments to be argued. To make our ideas as widely applicable as possible, we integrate MCTS into an abstract setting for argumentation where the content of arguments is left unspecified. Experimental results show the pertinence of this integration for learning argumentations by comparing it with a basic reinforcement learning approach.
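As a rough illustration of how MCTS decides which branch to explore next, the sketch below shows one widely used selection rule, UCB1 (the rule behind the UCT variant of MCTS). The abstract does not state which selection rule the authors use, so this is an assumption, and the function name `ucb1_select`, the `(visits, total_reward)` representation, and the default exploration constant are all hypothetical.

```python
import math


def ucb1_select(children, c=math.sqrt(2)):
    """Return the index of the child to explore next under UCB1.

    children: list of (visits, total_reward) tuples, one per child node.
    c: exploration constant (assumed default; not taken from the abstract).
    """
    total_visits = sum(visits for visits, _ in children)
    best_index = None
    best_score = float("-inf")
    for index, (visits, total_reward) in enumerate(children):
        if visits == 0:
            # An unvisited child is always tried before any scoring happens.
            return index
        mean_reward = total_reward / visits
        exploration = c * math.sqrt(math.log(total_visits) / visits)
        score = mean_reward + exploration
        if score > best_score:
            best_index = index
            best_score = score
    return best_index
```

This covers only the selection step; a full MCTS loop would also expand a node, simulate an outcome, and back the reward up the tree.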
Abstract:
Although species of Syzygium are abundant components of the rainforests in Queensland and New South Wales, little is known about the anatomy of the Australian taxa. Here we describe the foliar anatomy and micromorphology of Syzygium floribundum (syn: Waterhousea floribunda) using standard protocols for scanning electron microscopy (SEM) and light microscopy. Syzygium floribundum possesses dorsiventral leaves with cyclo-staurocytic stomata, a single epidermis, internal phloem, rhombus-shaped calcium oxalate crystals and a complex-open midrib. In general, its leaf anatomical and micromorphological characters are shared with some species of the tribe Syzygieae. However, this particular combination of leaf characters has not been reported in a species of the genus. The anatomy of the species is typical of mesophytic taxa.
Abstract:
Being able to accurately predict the risk of falling is crucial in patients with Parkinson’s disease (PD). This is due to the unfavorable effect of falls, which can lower the quality of life as well as directly impact survival. Three methods considered for predicting falls are decision trees (DT), Bayesian networks (BN), and support vector machines (SVM). Data from a 1-year prospective study conducted at IHBI, Australia, for 51 people with PD are used. Data processing is conducted using the rpart and e1071 packages in R for DT and SVM, respectively, and Bayes Server 5.5 for the BN. The results show that BN and SVM produce consistently higher accuracy over the 12 months evaluation time points (average sensitivity and specificity > 92%) than DT (average sensitivity 88%, average specificity 72%). DT is sensitive to imbalanced data and therefore needs an adjustment of the misclassification cost. However, DT provides a straightforward, interpretable result and thus is appealing for helping to identify important items related to falls and to generate fallers’ profiles.
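For readers less familiar with the two metrics reported above, the following minimal sketch computes sensitivity and specificity from binary labels, with 1 meaning "faller" and 0 meaning "non-faller". The function name and the label encoding are assumptions for illustration; none of this is the study's code or data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fallers correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fallers missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-fallers correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-fallers wrongly flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical labels, invented purely to exercise the function.
example_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
example_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
example_sens, example_spec = sensitivity_specificity(example_true, example_pred)
```

When one class is much rarer than the other, a classifier can score well on one of these metrics while doing poorly on the other, which is why both are reported.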
Abstract:
This research is a step forward in discovering knowledge from databases with complex structures such as trees or graphs. Several data mining algorithms are developed based on a novel representation called Balanced Optimal Search for extracting implicit, unknown and potentially useful information such as patterns, similarities and various relationships from tree data; these algorithms are also shown to be advantageous in analysing big data. This thesis focuses on analysing unordered tree data, a data model that is robust to data inconsistency, irregularity and swift information changes and has therefore become popular and widely used in the era of big data.
Abstract:
This paper presents an effective classification method based on Support Vector Machines (SVM) in the context of activity recognition. Local features that capture both spatial and temporal information in activity videos have made significant progress recently. Efficient and effective features, feature representation and classification play a crucial role in activity recognition. For classification, SVMs are popularly used because of their simplicity and efficiency; however, the common multi-class SVM approaches suffer from limitations including having easily confused classes and being computationally inefficient. We propose using a binary tree SVM to address the shortcomings of multi-class SVMs in activity recognition. We propose constructing a binary tree using Gaussian Mixture Models (GMM), where activities are repeatedly allocated to subnodes until every newly created node contains only one activity. Then, for each internal node a separate SVM is learned to classify activities, which significantly reduces the training time and increases the speed of testing compared to the popular `one-against-the-rest' multi-class SVM classifier. Experiments carried out on the challenging and complex Hollywood dataset demonstrate performance comparable to the baseline bag-of-features method.
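The tree structure described above can be sketched without any classifier at all. The code below recursively splits a list of activity labels until every leaf holds one activity, then counts the internal nodes (one binary classifier each) and the tree depth (the most classifiers a test sample passes through). The `halve` split is a hypothetical stand-in for the GMM-based grouping, which the abstract does not detail, and the activity labels are invented.

```python
def build_class_tree(classes, split):
    """Recursively split class labels until each leaf holds a single class.

    Leaves are label strings; internal nodes are (left, right) tuples.
    """
    if len(classes) == 1:
        return classes[0]
    left, right = split(classes)
    return (build_class_tree(left, split), build_class_tree(right, split))


def halve(classes):
    # Hypothetical placeholder for the GMM-based grouping in the abstract:
    # it simply cuts the list in the middle.
    mid = len(classes) // 2
    return classes[:mid], classes[mid:]


def count_internal_nodes(tree):
    """Number of internal nodes, i.e. binary classifiers to train."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + count_internal_nodes(tree[0]) + count_internal_nodes(tree[1])


def depth(tree):
    """Maximum number of classifiers evaluated for one test sample."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth(tree[0]), depth(tree[1]))


# Invented labels, used only to exercise the functions.
activities = ["activity_%d" % i for i in range(8)]
tree = build_class_tree(activities, halve)
num_classifiers = count_internal_nodes(tree)
max_evaluations = depth(tree)
```

A binary tree with N leaves always has N - 1 internal nodes, and when the splits are balanced a test sample visits only about log2(N) of them. This may be one source of the test-time savings the abstract reports, although the abstract itself does not give this breakdown.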
Abstract:
Jarvis et al. (Research Articles, 12 December 2014, p. 1320) presented molecular clock analyses that suggested that most modern bird orders diverged just after the mass extinction event at the Cretaceous-Paleogene boundary (about 66 million years ago). We demonstrate that this conclusion results from the use of a single inappropriate maximum bound, which effectively precludes the Cretaceous diversification overwhelmingly supported by previous molecular studies.