274 results for Select top-k patterns

in Queensland University of Technology - ePrints Archive


Relevance:

100.00% 100.00%

Publisher:

Abstract:

With the overwhelming increase in the amount of data on the web and in databases, many text mining techniques have been proposed for mining useful patterns in text documents. Extracting closed sequential patterns using the Pattern Taxonomy Model (PTM) is one of the pruning methods to remove noisy, inconsistent, and redundant patterns. However, the PTM treats each extracted pattern as a whole without considering the terms it includes, which could affect the quality of extracted patterns. This paper proposes an innovative and effective method that extends the random set to accurately weigh patterns based on their distribution in the documents and the distribution of their terms in patterns. Then, the proposed approach finds the specific closed sequential patterns (SCSP) based on the newly calculated weight. The experimental results on the Reuters Corpus Volume 1 (RCV1) data collection and TREC topics show that the proposed method significantly outperforms other state-of-the-art methods in different popular measures.
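The abstract does not state the weighting formula, so the following is only a hypothetical illustration of the general idea: score a pattern by combining how widely it occurs across documents with how widely its terms occur across patterns, then keep the top-k. The function names, the input layout, and the particular product formula are all assumptions made for this sketch, not the paper's method.

```python
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]


def weigh_patterns(doc_freq: Dict[Pattern, int], num_docs: int) -> Dict[Pattern, float]:
    """Hypothetical weight: document support times mean term share.

    doc_freq maps each (non-empty) pattern to the number of documents
    containing it. This formula is an assumption for illustration only.
    """
    # Count in how many patterns each term appears.
    term_counts: Dict[str, int] = {}
    for pattern in doc_freq:
        for term in set(pattern):
            term_counts[term] = term_counts.get(term, 0) + 1

    n_patterns = len(doc_freq)
    weights: Dict[Pattern, float] = {}
    for pattern, df in doc_freq.items():
        support = df / num_docs  # distribution of the pattern in documents
        terms = set(pattern)
        # Average fraction of patterns that contain each of its terms.
        term_score = sum(term_counts[t] / n_patterns for t in terms) / len(terms)
        weights[pattern] = support * term_score
    return weights


def top_k_patterns(weights: Dict[Pattern, float], k: int) -> List[Pattern]:
    """Return the k highest-weighted patterns."""
    return sorted(weights, key=weights.get, reverse=True)[:k]
```

With three toy patterns over four documents, `weigh_patterns` followed by `top_k_patterns(weights, 1)` returns the single pattern that is both frequent in documents and built from widely shared terms.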

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper describes a new method of indexing and searching large binary signature collections to efficiently find similar signatures, addressing the scalability problem in signature search. Signatures offer efficient computation with an acceptable measure of similarity in numerous applications. However, performing a complete search with a given search argument (a signature) requires a Hamming distance calculation against every signature in the collection. This quickly becomes excessive when dealing with large collections, presenting issues of scalability that limit their applicability. Our method efficiently finds similar signatures in very large collections, trading memory use and precision for greatly improved search speed. Experimental results demonstrate that our approach is capable of finding a set of nearest signatures to a given search argument with a high degree of speed and fidelity.
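To make the scalability problem concrete, here is a minimal sketch of the exhaustive baseline the abstract describes: one Hamming distance calculation per signature in the collection. It does not implement the paper's indexing method, which the abstract does not detail. Representing signatures as Python integers, and the function names, are assumptions of this sketch.

```python
import heapq
from typing import List, Tuple


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


def nearest_signatures(query: int, collection: List[int], k: int) -> List[Tuple[int, int]]:
    """Exhaustive search: return the k closest (distance, index) pairs.

    Every signature is compared with the query, so the cost grows
    linearly with the size of the collection.
    """
    scored = [(hamming_distance(query, sig), idx) for idx, sig in enumerate(collection)]
    return heapq.nsmallest(k, scored)
```

Because every query touches the whole collection, the work per search is proportional to the collection size, which is the cost an index is meant to avoid.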

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The top-k retrieval problem aims to find the optimal set of k documents from a number of relevant documents given the user’s query. The key issue is to balance the relevance and diversity of the top-k search results. In this paper, we address this problem using Facility Location Analysis taken from Operations Research, where the locations of facilities are optimally chosen according to some criteria. We show how this analysis technique is a generalization of state-of-the-art retrieval models for diversification (such as the Modern Portfolio Theory for Information Retrieval), which treat the top-k search results like “obnoxious facilities” that should be dispersed as far as possible from each other. However, Facility Location Analysis suggests that the top-k search results could be treated like “desirable facilities” to be placed as close as possible to their customers. This leads to a new top-k retrieval model where the best representatives of the relevant documents are selected. In a series of experiments conducted on two TREC diversity collections, we show that significant improvements can be made over the current state-of-the-art through this alternative treatment of the top-k retrieval problem.
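The "desirable facilities" view can be illustrated with a greedy heuristic for a facility-location style objective: pick k documents so that every relevant document is close to at least one selected representative. This is a sketch under assumptions, not the paper's model; the objective (sum over documents of the best similarity to any selected document), the greedy strategy, and the function name are all chosen here for illustration.

```python
from typing import List


def greedy_facility_location(sim: List[List[float]], k: int) -> List[int]:
    """Greedily select up to k representative documents.

    sim[i][j] is the similarity between documents i and j. Each step adds
    the document that most increases the total "coverage", where a
    document's coverage is its best similarity to any selected document.
    """
    n = len(sim)
    selected: List[int] = []
    best_cover = [0.0] * n  # best similarity of each document to the selection
    for _ in range(min(k, n)):
        best_gain, best_j = -1.0, -1
        for j in range(n):
            if j in selected:
                continue
            # Marginal improvement in coverage if document j is added.
            gain = sum(max(sim[i][j] - best_cover[i], 0.0) for i in range(n))
            if gain > best_gain:
                best_gain, best_j = gain, j
        selected.append(best_j)
        best_cover = [max(best_cover[i], sim[i][best_j]) for i in range(n)]
    return selected
```

On a collection with one tight cluster and one outlier, this picks the most central document of the cluster first and the outlier second, so the selection represents the relevant set rather than simply dispersing results.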

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In a pilot application based on a web search engine, called Web-based Relation Completion (WebRC), we propose to join two columns of entities linked by a predefined relation by mining knowledge from the web through a web search engine. To achieve this, a novel retrieval task, Relation Query Expansion (RelQE), is modelled: given an entity (query), the task is to retrieve documents containing entities in a predefined relation to the given one. Solving this problem entails expanding the query before submitting it to a web search engine to ensure that mostly documents containing the linked entity are returned in the top K search results. In this paper, we propose a novel Learning-based Relevance Feedback (LRF) approach to solve this retrieval task. Expansion terms are learned from training pairs of entities linked by the predefined relation and applied to new entity-queries to find entities linked by the same relation. After describing the approach, we present experimental results on real-world web data collections, which show that the LRF approach always improves the precision of top-ranked search results, to up to 8.6 times the baseline. Using LRF, WebRC also shows performance well above the baseline.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Spoken word production is assumed to involve stages of processing in which activation spreads through layers of units comprising lexical-conceptual knowledge and their corresponding phonological word forms. Using high-field (4T) functional magnetic resonance imaging (fMRI), we assessed whether the relationship between these stages is strictly serial or involves cascaded-interactive processing, and whether central (decision/control) processing mechanisms are involved in lexical selection. Participants performed the competitor priming paradigm in which distractor words, named from a definition and semantically related to a subsequently presented target picture, slow picture-naming latency compared to that with unrelated words. The paradigm intersperses two trials between the definition and the picture to be named, temporally separating activation in the word perception and production networks. Priming semantic competitors of target picture names significantly increased activation in the left posterior temporal cortex, and to a lesser extent the left middle temporal cortex, consistent with the predictions of cascaded-interactive models of lexical access. In addition, extensive activation was detected in the anterior cingulate and pars orbitalis of the inferior frontal gyrus. The findings indicate that lexical selection during competitor priming is biased by top-down mechanisms to reverse associations between primed distractor words and target pictures to select words that meet the current goal of speech.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Birds represent the most diverse extant tetrapod clade, with ca. 10,000 extant species, and the timing of the crown avian radiation remains hotly debated. The fossil record supports a primarily Cenozoic radiation of crown birds, whereas molecular divergence dating analyses generally imply that this radiation was well underway during the Cretaceous. Furthermore, substantial differences have been noted between published divergence estimates. These have been variously attributed to clock model, calibration regime, and gene type. One underappreciated phenomenon is that disparity between fossil ages and molecular dates tends to be proportionally greater for shallower nodes in the avian Tree of Life. Here, we explore potential drivers of disparity in avian divergence dates through a set of analyses applying various calibration strategies and coding methods to a mitochondrial genome dataset and an 18-gene nuclear dataset, both sampled across 72 taxa. Our analyses support the occurrence of two deep divergences (i.e., the Palaeognathae/Neognathae split and the Galloanserae/Neoaves split) well within the Cretaceous, followed by a rapid radiation of Neoaves near the K-Pg boundary. However, 95% highest posterior density intervals for most basal divergences in Neoaves cross the boundary, and we emphasize that, barring unreasonably strict prior distributions, distinguishing between a rapid Early Paleocene radiation and a Late Cretaceous radiation may be beyond the resolving power of currently favored divergence dating methods. In contrast to recent observations for placental mammals, constraining all divergences within Neoaves to occur in the Cenozoic does not result in unreasonably high inferred substitution rates. 
Comparisons of nuclear DNA (nDNA) versus mitochondrial DNA (mtDNA) datasets and NT- versus RY-coded mitochondrial data reveal patterns of disparity that are consistent with substitution model misspecifications that result in tree compression/tree extension artifacts, which may explain some discordance between previous divergence estimates based on different sequence types. Comparisons of fully calibrated and nominally calibrated trees support a correlation between body mass and apparent dating error. Overall, our results are consistent with (but do not require) a Paleogene radiation for most major clades of crown birds.

Relevance:

30.00% 30.00%

Publisher:

Relevance:

30.00% 30.00%

Publisher:

Abstract:

INTRODUCTION In their target article, Yuri Hanin and Muza Hanina outlined a novel multidisciplinary approach to performance optimisation for sport psychologists called the Identification-Control-Correction (ICC) programme. According to the authors, this empirically verified, psycho-pedagogical strategy is designed to improve the quality of coaching and consistency of performance in highly skilled athletes and involves a number of steps including: (i) identifying and increasing self-awareness of ‘optimal’ and ‘non-optimal’ movement patterns for individual athletes; (ii) learning to deliberately control the process of task execution; and (iii) correcting habitual and random errors and managing radical changes of movement patterns. Although no specific examples were provided, the ICC programme has apparently been successful in enhancing the performance of Olympic-level athletes. In this commentary, we address what we consider to be some important issues arising from the target article. We specifically focus attention on the contentious topic of optimization in neurobiological movement systems, the role of constraints in shaping emergent movement patterns and the functional role of movement variability in producing stable performance outcomes. In our view, the target article and, indeed, the proposed ICC programme, would benefit from a dynamical systems theoretical backdrop rather than the cognitive scientific approach that appears to be advocated. Although Hanin and Hanina made reference to, and attempted to integrate, constructs typically associated with dynamical systems theoretical accounts of motor control and learning (e.g., Bernstein’s problem, movement variability, etc.), these ideas required more detailed elaboration, which we provide in this commentary.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

This paper presents a new approach to improving the effectiveness of autonomous systems that deal with dynamic environments. The basis of the approach is to find repeating patterns of behavior in the dynamic elements of the system, and then to use predictions of the repeating elements to better plan goal-directed behavior. It is a layered approach involving classifying, modeling, predicting and exploiting. Classifying involves using observations to place the moving elements into previously defined classes. Modeling involves recording features of the behavior on a coarse-grained grid. Exploitation is achieved by integrating predictions from the model into the behavior selection module to improve the utility of the robot's actions. This is in contrast to typical approaches that use the model to select between different strategies or plays. Three methods of adaptation to the dynamic features of the environment are explored. The effectiveness of each method is determined using statistical tests over a number of repeated experiments. The work is presented in the context of predicting opponent behavior in the highly dynamic and multi-agent robot soccer domain (RoboCup).

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Landscape-scale environmental gradients present variable spatial patterns and ecological processes caused by climate, topography and soil characteristics and, as such, offer candidate sites to study environmental change. Data are presented on the spatial pattern of dominant species, biomass, and carbon pools and the temporal pattern of fluxes across a transitional zone shifting from Great Basin Desert scrub, up through pinyon-juniper woodlands and into ponderosa pine forest and the ecotones between each vegetation type. The mean annual temperature (MAT) difference across the gradient is approximately 3 degrees C from bottom to top (MAT 8.5-5.5) and annual precipitation averages from 320 to 530 mm/yr, respectively. The stems of the dominant woody vegetation approach a random spatial pattern across the entire gradient, while the canopy cover shows a clustered pattern. The size of the clusters increases with elevation according to available soil moisture, which in turn affects available nutrient resources. The total density of woody species declines with increasing soil moisture along the gradient, but total biomass increases. Belowground carbon and nutrient pools change from a heterogeneous to a homogeneous distribution on either side of the woodlands. Although temperature controls the seasonal patterns of carbon efflux from the soils, soil moisture appears to be the primary driving variable, but response differs underneath the different dominant species. Similarly, decomposition of dominant litter occurs faster at the cooler and more moist sites, but differs within sites due to litter quality of the different species. The spatial pattern of these communities provides information on the direction of future changes. The ecological processes that we documented are not statistically different in the ecotones as compared to the adjoining communities, but are different at sites above the woodland than those below the woodland. 
We speculate that an increase in MAT will have a major impact on C pools and C sequestering and release processes in these semiarid landscapes. However, the impact will be primarily related to moisture availability rather than direct effects of an increase in temperature. (C) 1998 Elsevier Science B.V.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Experimental observations of cell migration often describe the presence of mesoscale patterns within motile cell populations. These patterns can take the form of cells moving as aggregates or in chain-like formation. Here we present a discrete model capable of producing mesoscale patterns. These patterns are formed by biasing movements to favor a particular configuration of agent–agent attachments using a binding function f(K), where K is the scaled local coordination number. This discrete model is related to a nonlinear diffusion equation, where we relate the nonlinear diffusivity D(C) to the binding function f. The nonlinear diffusion equation supports a range of solutions which can be either smooth or discontinuous. Aggregation patterns can be produced with the discrete model, and we show that there is a transition between the presence and absence of aggregation depending on the sign of D(C). A combination of simulation and analysis shows that both the existence of mesoscale patterns and the validity of the continuum model depend on the form of f. Our results suggest that there may be no formal continuum description of a motile system with strong mesoscale patterns.

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Snakehead fishes in the family Channidae are obligate freshwater fishes represented by two extant genera, the African Parachanna and the Asian Channa. These species prefer still or slow flowing water bodies, where they are top predators that exercise high levels of parental care, have the ability to breathe air, can tolerate poor water quality, and interestingly, can aestivate or traverse terrestrial habitat in response to seasonal changes in freshwater habitat availability. These attributes suggest that snakehead fishes may possess high dispersal potential, irrespective of the terrestrial barriers that would otherwise constrain the distribution of most freshwater fishes. A number of biogeographical hypotheses have been developed to account for the modern distributions of snakehead fishes across two continents, including ancient vicariance during Gondwanan break-up, or recent colonisation tracking the formation of suitable climatic conditions. Taxonomic uncertainty also surrounds some members of the Channa genus, as geographical distributions for some taxa across southern and Southeast (SE) Asia are very large, and in one case highly disjunct. The current study adopted a molecular genetics approach to gain an understanding of the evolution of this group of fishes, and in particular how the phylogeography of two Asian species may have been influenced by contemporary versus historical levels of dispersal and vicariance. First, a molecular phylogeny was constructed based on multiple DNA loci and calibrated with fossil evidence to provide a dated chronology of divergence events among extant species, and also within species with widespread geographical distributions. The data provide strong evidence that trans-continental distribution of the Channidae arose as a result of dispersal out of Asia and into Africa in the mid-Eocene. 
Among Asian Channa, deep divergence among lineages indicates that the Oligocene-Miocene boundary was a time of significant species radiation, potentially associated with historical changes in climate and drainage geomorphology. Mid-Miocene divergence among lineages suggests that a taxonomic revision is warranted for two taxa. Deep intra-specific divergence (~8 Mya) was also detected between C. striata lineages that occur sympatrically in the Mekong River Basin. The study then examined the phylogeography and population structure of two major taxa, Channa striata (the chevron snakehead) and C. micropeltes (the giant snakehead), across SE Asia. Species-specific microsatellite loci were developed and used in addition to a mitochondrial DNA marker (Cyt b) to screen neutral genetic variation within and among wild populations. C. striata individuals were sampled across SE Asia (n=988), with the major focus being the Mekong Basin, which is the largest drainage basin in the region. The distributions of two divergent lineages were identified and admixture analysis showed that where they co-occur they are interbreeding, indicating that after long periods of evolution in isolation, divergence has not resulted in reproductive isolation. One lineage is predominantly confined to upland areas of northern Lao PDR to the north of the Khorat Plateau, while the other, which is more closely related to individuals from southern India, has a widespread distribution across mainland SE Asia and Sumatra. The phylogeographical pattern recovered is associated with past river networks, and high diversity and divergence among all populations sampled reveal that contemporary dispersal is very low for this taxon, even where populations occur in contiguous freshwater habitats. C. micropeltes (n=280) were also sampled from across the Mekong River Basin, focusing on the lower basin where it constitutes an important wild fishery resource.
In comparison with C. striata, allelic diversity and genetic divergence among populations were extremely low, suggesting very recent colonisation of the greater Mekong region. Populations were significantly structured into at least three discrete populations in the lower Mekong. Results of this study have implications for establishing effective conservation plans for managing both species, which represent economically important wild fishery resources for the region. For C. micropeltes, it is likely that a single fisheries stock in the Tonle Sap Great Lake is being exploited by multiple fisheries operations, and future management initiatives for this species in this region will need to account for this. For C. striata, conservation of natural levels of genetic variation will require management initiatives designed to promote population persistence at very localised spatial scales, as the high level of population structuring uncovered for this species indicates that significant unique diversity is present at this fine spatial scale.