906 results for Text mining, Classificazione, Stemming, Text categorization
Abstract:
Structured data represented as graphs arises in several fields of science, and the growing amount of available data makes distributed graph mining techniques particularly relevant. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. The problem is characterized by a highly irregular search tree, for which no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated load-balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening dataset, where the approach attains close-to-linear speedup in a network of workstations.
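The receiver-initiated idea mentioned in the abstract above can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the authors' algorithm: the toy search tree in `children`, the round-based scheduling, and the "take half of the busiest queue" policy are all hypothetical choices made for illustration. It shows idle workers asking for work instead of busy workers pushing it.

```python
from collections import deque


def children(node):
    """Toy irregular search tree (an assumption, not the paper's search space).

    A node is (depth, value). The branching factor depends on the value,
    so subtree sizes are uneven and hard to predict in advance.
    """
    depth, value = node
    if depth >= 6:
        return []
    return [(depth + 1, value * 3 + i) for i in range(value % 4)]


def count_nodes(node):
    """Sequential reference: number of nodes in the subtree rooted at node."""
    return 1 + sum(count_nodes(child) for child in children(node))


def simulate_receiver_initiated(num_workers, root):
    """Simulate workers in rounds; an idle worker requests work from a donor.

    Returns the number of nodes each worker processed.
    """
    queues = [deque() for _ in range(num_workers)]
    queues[0].append(root)  # all work starts on worker 0
    processed = [0] * num_workers

    while any(queues):
        for w in range(num_workers):
            if not queues[w]:
                # Receiver-initiated step: the idle worker picks the busiest peer.
                donor = max(range(num_workers), key=lambda i: len(queues[i]))
                if len(queues[donor]) > 1:
                    # Hypothetical policy: hand over half of the donor's oldest tasks.
                    for _ in range(len(queues[donor]) // 2):
                        queues[w].append(queues[donor].popleft())
            if queues[w]:
                node = queues[w].pop()  # depth-first expansion of local work
                processed[w] += 1
                queues[w].extend(children(node))
    return processed


if __name__ == "__main__":
    print(simulate_receiver_initiated(4, (0, 1)))
```

Handing over the oldest queued tasks is a common heuristic in work-stealing schemes, since with depth-first local expansion those tasks tend to sit higher in the tree and root larger subtrees. Whether the paper uses this policy is not stated in the abstract.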
Abstract:
Frequent pattern discovery in structured data is receiving increasing attention in many application areas of science. However, the computational complexity and the large amount of data to be explored often make sequential algorithms unsuitable. In this context, high-performance distributed computing becomes a very interesting and promising approach. In this paper, we present a parallel formulation of the frequent subgraph mining problem to discover interesting patterns in molecular compounds. The application is characterized by a highly irregular tree-structured computation. No estimate is available for task workloads, which follow a power-law distribution over a wide range. The proposed approach allows dynamic resource aggregation and provides fault and latency tolerance. These features make the distributed application suitable for multi-domain heterogeneous environments, such as computational Grids. The distributed application has been evaluated on the well-known National Cancer Institute’s HIV-screening dataset.
Abstract:
An article proposing that the recently discovered Peri Alupias of Galen contains important evidence about the adoption and use of the codex book form.
Abstract:
Real-world text classification tasks often suffer from poor class structure with many overlapping classes and blurred boundaries. Training data pooled from multiple sources tend to be inconsistent and contain erroneous labelling, leading to poor performance of standard text classifiers. The classification of health service products to specialized procurement classes is used to examine and quantify the extent of these problems. A novel method is presented to analyze the labelled data by selectively merging classes where there is not enough information for the classifier to distinguish them. Initial results show the method can identify the most problematic classes, which can be used either as a focus to improve the training data or to merge classes to increase confidence in the predicted results of the classifier.
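One way to picture the class-merging idea in the abstract above is a greedy pass over a confusion matrix. The sketch below is an assumption-laden illustration, not the method presented in the paper: the function name `merge_confusable_classes`, the mutual-confusion rate, the threshold value, and the toy counts are all hypothetical.

```python
def merge_confusable_classes(confusion, threshold=0.3):
    """Greedily merge class groups that a classifier cannot tell apart.

    confusion[a][b] is the number of items with true class a predicted as b.
    Two groups are merged when the share of their items confused between
    them exceeds `threshold` (a hypothetical criterion for illustration).
    Returns a sorted list of sorted class-name lists.
    """
    groups = {name: {name} for name in confusion}

    def count(src, dst):
        # Items whose true class is in src and predicted class is in dst.
        return sum(confusion[a].get(b, 0) for a in src for b in dst)

    while len(groups) > 1:
        best_pair, best_rate = None, threshold
        keys = sorted(groups)
        for i, g in enumerate(keys):
            for h in keys[i + 1:]:
                a, b = groups[g], groups[h]
                cross = count(a, b) + count(b, a)
                total = cross + count(a, a) + count(b, b)
                if total and cross / total > best_rate:
                    best_pair, best_rate = (g, h), cross / total
        if best_pair is None:
            break  # no remaining pair is confusable enough
        g, h = best_pair
        groups[g] |= groups.pop(h)

    return sorted(sorted(members) for members in groups.values())


# Toy counts (invented for illustration): classes A and B are heavily confused.
toy_confusion = {
    "A": {"A": 50, "B": 45, "C": 5},
    "B": {"A": 40, "B": 55, "C": 5},
    "C": {"A": 2, "B": 3, "C": 95},
}

if __name__ == "__main__":
    print(merge_confusable_classes(toy_confusion))
```

The groups that end up merged can be read either as candidates for relabelling in the training data or as coarser classes to predict with more confidence, matching the two uses the abstract mentions.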
Abstract:
This paper provides an extended analysis of livelihood diversification in rural Tanzania, with special emphasis on artisanal and small-scale mining (ASM). Over the past decade, this sector of industry, which is labour-intensive and comprises an array of rudimentary and semi-mechanized operations, has become an indispensable economic activity throughout Sub-Saharan Africa, providing employment to a host of redundant public sector workers, retrenched large-scale mine labourers and poor farmers. In many of the region’s rural areas, it is overtaking subsistence agriculture as the primary industry. Such a pattern appears to be unfolding within the Morogoro and Mbeya regions of southern Tanzania, where findings from recent research suggest that a growing number of smallholder farmers are turning to ASM for employment and financial support. It is imperative that national rural development programmes take this trend into account and provide support to these people.
Abstract:
This is a report on the data-mining of two chess databases, the objective being to compare their sub-7-man content with perfect play as documented in Nalimov endgame tables. Van der Heijden’s ENDGAME STUDY DATABASE IV is a definitive collection of 76,132 studies in which White should have an essentially unique route to the stipulated goal. Chessbase’s BIG DATABASE 2010 holds some 4.5 million games. Insight gained into both database content and data-mining has led to some delightful surprises and created a further agenda.
Abstract:
Howard Barker is a writer who has made several notable excursions into what he calls ‘the charnel house…of European drama.’ David Ian Rabey has observed that a compelling property of these classical works lies in what he calls ‘the incompleteness of [their] prescriptions’, and Barker’s Women Beware Women (1986), Seven Lears (1990) and Gertrude: The Cry (2002) are in turn based around the gaps and interstices found in Thomas Middleton’s Women Beware Women (c1627), Shakespeare’s King Lear (c1604) and Hamlet (c1601) respectively. This extends from representing the missing queen from King Lear, who, Barker observes, ‘is barely quoted even in the depths of rage or pity’, to his new ending for Middleton’s Jacobean tragedy and the erotic revivification of Hamlet’s mother. This paper will argue that each modern reappropriation accentuates a hidden but powerful feature in these Elizabethan and Jacobean plays – namely their clash between obsessive desire, sexual transgression and death against the imposed restitution of a prescribed morality. This contradiction acts as the basis for Barker’s own explorations of eroticism, death and tragedy. The paper will also discuss Barker’s project for these ‘antique texts’, one that goes beyond what he derisively calls ‘relevance’, but attempts instead to recover ‘smothered genius’, whereby the transgressive is ‘concealed within structures that lend an artificial elegance.’ Together with Barker’s own rediscovery of tragedy, the paper will assert that these rewritings of Elizabethan and Jacobean drama expose their hidden, yet unsettling and provocative ideologies concerning the relationship between political corruption / justice through the power of sexuality (notably through the allure and danger of the mature woman), and an erotics of death that produces tragedy for the contemporary age.