137 results for document clustering


Relevance: 20.00%

Abstract:

Finding and labelling semantic feature patterns of documents in a large, spatial corpus is a challenging problem. Text documents have characteristics that make semantic labelling difficult, and the rapidly increasing volume of online documents creates a bottleneck in finding meaningful textual patterns. To address these issues, we propose an unsupervised document labelling approach based on semantic content and feature patterns. A world ontology with extensive topic coverage is exploited to supply controlled, structured subjects for labelling. An algorithm is also introduced to reduce dimensionality based on the study of ontological structure. The proposed approach was evaluated by comparison with typical machine learning methods including SVMs, Rocchio, and kNN, with promising results.

Relevance: 20.00%

Abstract:

Standard differential equation–based models of collective cell behaviour, such as the logistic growth model, invoke a mean–field assumption which is equivalent to assuming that individuals within the population interact with each other in proportion to the average population density. Implementing such assumptions implies that the dynamics of the system are unaffected by spatial structure, such as the formation of patches or clusters within the population. Recent theoretical developments have introduced a class of models, known as moment dynamics models, which aim to account for the dynamics of individuals, pairs of individuals, triplets of individuals and so on. Such models enable us to describe the dynamics of populations with clustering; however, little progress has been made with regard to applying moment dynamics models to experimental data. Here, we report new experimental results describing the formation of a monolayer of cells using two different cell types: 3T3 fibroblast cells and MDA MB 231 breast cancer cells. Our analysis indicates that the 3T3 fibroblast cells are relatively motile and we observe that the 3T3 fibroblast monolayer forms without clustering. In contrast, the MDA MB 231 cells are less motile and we observe that the MDA MB 231 monolayer formation is associated with significant clustering. We calibrate a moment dynamics model and a standard mean–field model to both data sets. Our results indicate that the mean–field and moment dynamics models provide similar descriptions of the 3T3 fibroblast monolayer formation, whereas these two models give very different predictions for the MDA MB 231 monolayer formation. These outcomes indicate that standard mean–field models of collective cell behaviour are not always appropriate and that care ought to be exercised when implementing such a model.
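For reference, the logistic growth model named in this abstract is usually written in the following textbook form. The symbols are assumptions introduced here, not taken from the paper: C(t) is the average cell density, \lambda the proliferation rate, K the carrying capacity and C_0 the initial density.

```latex
\frac{\mathrm{d}C}{\mathrm{d}t} = \lambda\, C \left(1 - \frac{C}{K}\right),
\qquad
C(t) = \frac{K\, C_0}{C_0 + (K - C_0)\, e^{-\lambda t}} .
```

The growth term depends only on the average density C, so any spatial arrangement of the cells (patches, clusters) leaves the prediction unchanged. That is the mean–field assumption the abstract questions.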

Relevance: 20.00%

Abstract:

Enterprise Systems (ES) can be understood as the de facto standard for holistic operational and managerial support within an organization. Most commonly, ES are offered as commercial off-the-shelf packages that require customization in the user organization. This process is a complex and resource-intensive task, which often prevents small and midsize enterprises (SME) from undertaking configuration projects. Especially in the SME market, independent software vendors provide pre-configured ES for a small customer base. The problem of ES configuration is shifted from the customer to the vendor, but remains critical. We argue that the as yet unexplored link between process configuration and business document configuration must be examined more closely, as both types of configuration are closely tied to one another.

Relevance: 20.00%

Abstract:

The feral pig, Sus scrofa, is a widespread and abundant invasive species in Australia. Feral pigs pose a significant threat to the environment, agricultural industry, and human health, and in far north Queensland they endanger World Heritage values of the Wet Tropics. Historical records document the first introduction of domestic pigs into Australia via European settlers in 1788 and subsequent introductions from Asia from 1827 onwards. Since then, domestic pigs have been accidentally and deliberately released into the wild and significant feral pig populations have become established, resulting in the declaration of this species as a class 2 pest in Queensland. The overall objective of this study was to assess the population genetic structure of feral pigs in far north Queensland, in particular to enable delineation of demographically independent management units. The identification of ecologically meaningful management units using molecular techniques can assist in targeting feral pig control to bring about effective long-term management. Molecular genetic analysis was undertaken on 434 feral pigs from 35 localities between Tully and Innisfail. Seven polymorphic and unlinked microsatellite loci were screened and fixation indices (FST and analogues) and Bayesian clustering methods were used to identify population structure and management units in the study area. The hyper-variable mitochondrial control region (D-loop) of 35 feral pigs was also sequenced to identify pig ancestry. Three management units were identified in the study at a scale of 25 to 35 km. Even with the strong pattern of genetic structure identified in the study area, some evidence of long distance dispersal and/or translocation was found, as a small number of individuals exhibited ancestry from a management unit other than the one in which they were sampled.
Overall, gene flow in the study area was found to be influenced by environmental features such as topography and land use, but no distinct or obvious natural or anthropogenic geographic barriers were identified. Furthermore, strong evidence was found for non-random mating between pigs of European and Asian breeds indicating that feral pig ancestry influences their population genetic structure. Phylogenetic analysis revealed two distinct mitochondrial DNA clades, representing Asian domestic pig breeds and European breeds. A significant finding was that pigs of Asian origin living in Innisfail and south Tully were not mating randomly with European breed pigs populating the nearby Mission Beach area. Feral pig control should be implemented in each of the management units identified in this study. The control should be coordinated across properties within each management unit to prevent re-colonisation from adjacent localities. The adjacent rainforest and National Park Estates, as well as the rainforest-crop boundary should be included in a simultaneous control operation for greater success.

Relevance: 20.00%

Abstract:

Mathematical descriptions of birth–death–movement processes are often calibrated to measurements from cell biology experiments to quantify tissue growth rates. Here we describe and analyze a discrete model of a birth–death–movement process applied to a typical two–dimensional cell biology experiment. We present three different descriptions of the system: (i) a standard mean–field description which neglects correlation effects and clustering; (ii) a moment dynamics description which approximately incorporates correlation and clustering effects; and (iii) averaged data from repeated discrete simulations which directly incorporate correlation and clustering effects. Comparing these three descriptions indicates that the mean–field and moment dynamics approaches are valid only for certain parameter regimes, and that both these descriptions fail to make accurate predictions of the system for sufficiently fast birth and death rates where the effects of spatial correlations and clustering are sufficiently strong. Without any method to distinguish between the parameter regimes where these three descriptions are valid, it is possible that either the mean–field or moment dynamics model could be calibrated to experimental data under inappropriate conditions, leading to errors in parameter estimation. In this work we demonstrate that a simple measurement based on coordination number data provides an indirect measure of agent correlation and clustering effects, and can therefore be used to distinguish the regimes in which the different descriptions of the birth–death–movement process are valid.

Relevance: 20.00%

Abstract:

An Application Specific Instruction-set Processor (ASIP) is a specialized processor tailored to run one or more particular applications efficiently. However, when there are multiple candidate applications in the application domain, it is difficult and time consuming to find the optimum set of applications to be implemented. Existing ASIP design approaches perform this selection manually based on a designer’s knowledge. We help to cut down the number of candidate applications by devising a classification method to cluster similar applications based on the special-purpose operations they share. This provides a significant reduction in the comparison overhead while resulting in customized ASIP instruction sets which can benefit a whole family of related applications. Our method gives users the ability to quantify the degree of similarity between the sets of shared operations to control the size of clusters. A case study involving twelve algorithms confirms that our approach can successfully cluster similar algorithms together based on the similarity of their component operations.
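The idea of clustering applications by the operations they share, with a user-set similarity level controlling cluster size, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name its similarity measure or grouping rule, so Jaccard similarity and single-linkage merging are stand-ins, and the application names and operation sets are hypothetical.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: shared operations divided by all operations used."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_applications(ops: dict, threshold: float) -> list:
    """Group applications whose operation sets are at least `threshold` similar.

    Assumption: single-linkage merging via union-find; the paper's actual
    classification method is not described in the abstract.
    """
    parent = {name: name for name in ops}

    def find(x):
        # Follow parent links to the cluster representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(ops, 2):
        if jaccard(ops[a], ops[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in ops:
        groups.setdefault(find(name), set()).add(name)
    # Sort clusters by their member names for a stable result.
    return sorted(groups.values(), key=lambda g: sorted(g))


# Hypothetical applications and the special-purpose operations they use.
APPS = {
    "fft": {"mac", "butterfly", "bitrev"},
    "dct": {"mac", "butterfly"},
    "aes": {"sbox", "xor", "shift"},
    "des": {"sbox", "xor", "permute"},
}

if __name__ == "__main__":
    print(cluster_applications(APPS, 0.5))
    print(cluster_applications(APPS, 0.6))
```

Raising the threshold from 0.5 to 0.6 splits the aes/des pair (similarity 0.5) into singletons, which mirrors how a similarity level can control cluster size.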

Relevance: 20.00%

Abstract:

Clustering identities in a broadcast video is a useful task to aid in video annotation and retrieval. Quality based frame selection is a crucial task in video face clustering, to both improve the clustering performance and reduce the computational cost. We present a framework that selects the highest quality frames available in a video for face clustering. This frame selection technique is based on low level and high level features (face symmetry, sharpness, contrast and brightness) to select the highest quality facial images available in a face sequence for clustering. We also consider the temporal distribution of the faces to ensure that selected faces are taken at times distributed throughout the sequence. Normalized feature scores are fused and frames with high quality scores are used in a Local Gabor Binary Pattern Histogram Sequence based face clustering system. We present a news video database to evaluate the clustering system performance. Experiments on the newly created news database show that the proposed method selects the best quality face images in the video sequence, resulting in improved clustering performance.
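A minimal sketch of fusing normalized feature scores to pick high-quality frames is given below. The abstract does not specify the normalization or the fusion rule, so min-max scaling and an unweighted mean are assumptions, as is the convention that a larger value means better quality for every feature. The temporal-distribution step is omitted and the frame values are hypothetical.

```python
def min_max(values):
    """Scale a list of raw feature values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def fuse_scores(frames):
    """Return one quality score per frame: the mean of its normalized features."""
    names = sorted(frames[0])
    normalised = {n: min_max([f[n] for f in frames]) for n in names}
    return [
        sum(normalised[n][i] for n in names) / len(names)
        for i in range(len(frames))
    ]


def select_top(frames, k):
    """Indices of the k highest-scoring frames, returned in temporal order."""
    scores = fuse_scores(frames)
    order = sorted(range(len(frames)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])


# Hypothetical per-frame measurements for one face sequence.
FRAMES = [
    {"symmetry": 0.9, "sharpness": 120, "contrast": 0.6, "brightness": 0.5},
    {"symmetry": 0.4, "sharpness": 40, "contrast": 0.3, "brightness": 0.2},
    {"symmetry": 0.8, "sharpness": 100, "contrast": 0.7, "brightness": 0.6},
    {"symmetry": 0.5, "sharpness": 60, "contrast": 0.4, "brightness": 0.3},
]

if __name__ == "__main__":
    print(select_top(FRAMES, 2))
```

Normalizing before fusing keeps a feature with a large raw range (sharpness here) from dominating the fused score.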

Relevance: 20.00%

Abstract:

Cognitive impairment and physical disability are common in Parkinson’s disease (PD). As a result, diet can be difficult to measure. This study aimed to evaluate the use of a photographic dietary record (PhDR) in people with PD. During a 12-week nutrition intervention study, 19 individuals with PD kept 3-day PhDRs on three occasions using point-and-shoot digital cameras. Details on food items present in the PhDRs and those not photographed were collected retrospectively during an interview. Following the first use of the PhDR method, the photographer completed a questionnaire (n=18). In addition, the quality of the PhDRs was evaluated at each time point. The person with PD was the sole photographer in 56% of the cases, with the remainder taken by the carer or a combination of the person with PD and the carer. The camera was rated as easy to use by 89%, keeping a PhDR was considered acceptable by 94% and none would rather use a “pen and paper” method. Eighty-three percent felt confident to use the camera again to record intake. Of the photos captured (n=730), 89% were of adequate quality (items visible, in-focus), while only 21% could be used alone (without interview information) to assess intake. Over the study, 22% of eating/drinking occasions were not photographed. PhDRs were considered an easy and acceptable method to measure intake among individuals with PD and their carers. The majority of PhDRs were of adequate quality; however, in order to quantify intake the interview was necessary to obtain sufficient detail and capture missing items.

Relevance: 20.00%

Abstract:

Topic modelling has been widely used in the fields of information retrieval, text mining, machine learning, etc. In this paper, we propose a novel model, Pattern Enhanced Topic Model (PETM), which makes improvements to topic modelling by semantically representing topics with discriminative patterns, and also makes innovative contributions to information filtering by utilising the proposed PETM to determine document relevance based on topics distribution and maximum matched patterns proposed in this paper. Extensive experiments are conducted to evaluate the effectiveness of PETM by using the TREC data collection Reuters Corpus Volume 1. The results show that the proposed model significantly outperforms both state-of-the-art term-based models and pattern-based models.

Relevance: 20.00%

Abstract:

Many mature term-based or pattern-based approaches have been used in the field of information filtering to generate users’ information needs from a collection of documents. A fundamental assumption for these approaches is that the documents in the collection are all about one topic. However, in reality users’ interests can be diverse and the documents in the collection often involve multiple topics. Topic modelling, such as Latent Dirichlet Allocation (LDA), was proposed to generate statistical models to represent multiple topics in a collection of documents, and this has been widely utilized in the fields of machine learning and information retrieval, etc. But its effectiveness in information filtering has not been so well explored. Patterns are always thought to be more discriminative than single terms for describing documents. However, the enormous number of discovered patterns hinders their effective and efficient use in real applications; therefore, selecting the most discriminative and representative patterns from the huge number of discovered patterns becomes crucial. To deal with the above mentioned limitations and problems, in this paper, a novel information filtering model, Maximum matched Pattern-based Topic Model (MPBTM), is proposed. The main distinctive features of the proposed model include: (1) user information needs are generated in terms of multiple topics; (2) each topic is represented by patterns; (3) patterns are generated from topic models and are organized in terms of their statistical and taxonomic features; and (4) the most discriminative and representative patterns, called Maximum Matched Patterns, are proposed to estimate the document relevance to the user’s information needs in order to filter out irrelevant documents. Extensive experiments are conducted to evaluate the effectiveness of the proposed model by using the TREC data collection Reuters Corpus Volume 1. The results show that the proposed model significantly outperforms both state-of-the-art term-based models and pattern-based models.

Relevance: 20.00%

Abstract:

The K-means algorithm is one of the most popular techniques in clustering. Nevertheless, the performance of the K-means algorithm depends highly on initial cluster centers and converges to local minima. This paper proposes a hybrid evolutionary programming based clustering algorithm, called PSO-SA, by combining particle swarm optimization (PSO) and simulated annealing (SA). The basic idea is to search around the global solution by SA and to increase the information exchange among particles using a mutation operator to escape local optima. Three datasets, Iris, Wisconsin Breast Cancer, and Ripley’s Glass, have been considered to show the effectiveness of the proposed clustering algorithm in providing optimal clusters. The simulation results show that the PSO-SA clustering algorithm not only has a better response but also converges more quickly than the K-means, PSO, and SA algorithms.
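The sensitivity to initial cluster centers mentioned in this abstract can be seen in a small example. The sketch below is plain Lloyd-style K-means, not the PSO-SA algorithm the paper proposes; the four data points and both sets of starting centers are hypothetical and chosen so that one start gets stuck in a poor local minimum.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centers, max_iter=100):
    """Run K-means from the given initial centers; return (centers, sse)."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[idx].append(p)
        # Update step: move each center to the mean of its members.
        # An empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(c) / len(members) for c in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    # Sum of squared errors: distance from each point to its nearest center.
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse


# Four corners of a 4-by-1 rectangle: the natural clusters are left and right.
POINTS = [(0, 0), (0, 1), (4, 0), (4, 1)]

if __name__ == "__main__":
    print(kmeans(POINTS, [(2, 0), (2, 1)]))      # poor start: bottom/top split
    print(kmeans(POINTS, [(0, 0.5), (4, 0.5)]))  # good start: left/right split
```

Starting from (2, 0) and (2, 1), the algorithm converges immediately to a bottom/top split with a sum of squared errors of 16, whereas the left/right start reaches 1. Hybrid methods such as the PSO-SA approach described above aim to escape this kind of local optimum.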

Relevance: 20.00%

Abstract:

The assumptions underlying the Probability Ranking Principle (PRP) have led to a number of alternative approaches that cater or compensate for the PRP's limitations. In this poster we focus on the Interactive PRP (iPRP), which rejects the assumption of independence between documents made by the PRP. Although the theoretical framework of the iPRP is appealing, no instantiation has been proposed and investigated. In this poster, we propose a possible instantiation of the principle, performing the first empirical comparison of the iPRP against the PRP. For document diversification, our results show that the iPRP is significantly better than the PRP, and comparable to or better than other methods such as Modern Portfolio Theory.

Relevance: 20.00%

Abstract:

Ranking documents according to the Probability Ranking Principle has been theoretically shown to guarantee optimal retrieval effectiveness in tasks such as ad hoc document retrieval. This ranking strategy assumes independence among document relevance assessments. This assumption, however, often does not hold, for example in scenarios where redundancy in retrieved documents is of major concern, as is the case in the sub–topic retrieval task. In this chapter, we propose a new ranking strategy for sub–topic retrieval that builds upon the interdependent document relevance and topic–oriented models. With respect to the topic–oriented model, we investigate both static and dynamic clustering techniques, aiming to group topically similar documents. Evidence from clusters is then combined with information about document dependencies to form a new document ranking. We compare and contrast the proposed method against state–of–the–art approaches, such as Maximal Marginal Relevance, Portfolio Theory for Information Retrieval, and standard cluster–based diversification strategies. The empirical investigation is performed on the ImageCLEF 2009 Photo Retrieval collection, where images are assessed with respect to sub–topics of a more general query topic. The experimental results show that our approaches outperform the state–of–the–art strategies with respect to a number of diversity measures.
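Maximal Marginal Relevance, one of the baselines named in this abstract, is commonly implemented as a greedy re-ranking that trades query relevance against redundancy with documents already selected. The sketch below shows that baseline only, not the chapter's proposed strategy. The document identifiers, relevance scores, similarity values and the trade-off parameter `lam` are all hypothetical.

```python
def mmr_rank(relevance, similarity, lam=0.5, k=None):
    """Greedy Maximal Marginal Relevance re-ranking.

    relevance:  dict mapping document id -> relevance to the query
    similarity: function (doc_a, doc_b) -> similarity in [0, 1]
    lam:        1.0 ranks by relevance only; lower values favour diversity
    """
    remaining = list(relevance)
    selected = []
    k = len(remaining) if k is None else k
    while remaining and len(selected) < k:
        def score(d):
            # Redundancy is the highest similarity to anything already picked.
            redundancy = max((similarity(d, s) for s in selected), default=0.0)
            return lam * relevance[d] - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical scores: d1 and d2 are near-duplicates, d3 covers another sub-topic.
RELEVANCE = {"d1": 0.9, "d2": 0.85, "d3": 0.5}
SIM = {
    frozenset(("d1", "d2")): 0.95,
    frozenset(("d1", "d3")): 0.1,
    frozenset(("d2", "d3")): 0.1,
}


def sim(a, b):
    """Look up the symmetric similarity of two distinct documents."""
    return SIM[frozenset((a, b))]


if __name__ == "__main__":
    print(mmr_rank(RELEVANCE, sim, lam=0.5))
    print(mmr_rank(RELEVANCE, sim, lam=1.0))
```

With `lam=0.5` the less relevant but novel d3 is promoted above the near-duplicate d2, while `lam=1.0` reduces to a plain relevance ranking that treats documents as independent.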

Relevance: 20.00%

Abstract:

In recent years several works have investigated a formal model for Information Retrieval (IR) based on the mathematical formalism underlying quantum theory. These works have mainly exploited geometric and logical–algebraic features of the quantum formalism, for example entanglement, superposition of states, collapse into basis states, and lattice relationships. In this poster I present an analogy between a typical IR scenario and the double slit experiment. This experiment exhibits the presence of interference phenomena between events in a quantum system, causing the Kolmogorovian law of total probability to fail. The analogy suggests routes for applying quantum probability theory in IR. However, several questions still need to be addressed; they will be the subject of my PhD research.
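The failure of the law of total probability in the double slit experiment can be written out in standard textbook form. The notation is an assumption introduced here, not the poster's: A and B denote the two slits, x a position on the detection screen, and \phi_A(x), \phi_B(x) the complex amplitudes for reaching x through each slit.

```latex
% Classical (Kolmogorovian) prediction with both slits open:
P(x) = P(A)\,P(x \mid A) + P(B)\,P(x \mid B)

% Quantum prediction: amplitudes are added first, then squared:
P_{AB}(x) = \left|\phi_A(x) + \phi_B(x)\right|^2
          = |\phi_A(x)|^2 + |\phi_B(x)|^2
            + 2\,\mathrm{Re}\!\left(\phi_A(x)\,\overline{\phi_B(x)}\right)
```

The last term is the interference term. Whenever it is non-zero, the observed distribution differs from the sum of the two single-slit contributions, which is the violation the abstract refers to.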

Relevance: 20.00%

Abstract:

The aim of this paper is to investigate the role of emotion features in diversifying document rankings to improve the effectiveness of Information Retrieval (IR) systems. For this purpose, two approaches are proposed to consider emotion features for diversification, and they are empirically tested on the TREC 678 Interactive Track collection. The results show that emotion features are capable of enhancing retrieval effectiveness.