905 resultados para Web Mining, Data Mining, User Topic Model, Web User Profiles


Relevância:

100.00% 100.00%

Publicador:

Resumo:

The best accepted method for design of autogenous and semi-autogenous (AG/SAG) mills is to carry out pilot scale test work using a 1.8 m diameter by 0.6 m long pilot scale test mill. The load in such a mill typically contains 250,000-450,000 particles larger than 6 mm, allowing correct representation of more than 90% of the charge in Discrete Element Method (DEM) simulations. Most AG/SAG mills use discharge grate slots which are 15 mm or more in width. The mass in each size fraction usually decreases rapidly below grate size. This scale of DEM model is now within the possible range of standard workstations running an efficient DEM code. This paper describes various ways of extracting collision data front the DEM model and translating it into breakage estimates. Account is taken of the different breakage mechanisms (impact and abrasion) and of the specific impact histories of the particles in order to assess the breakage rates for various size fractions in the mills. At some future time, the integration of smoothed particle hydrodynamics with DEM will allow for the inclusion of slurry within the pilot mill simulation. (C) 2004 Elsevier Ltd. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This article presents two novel approaches for incorporating sentiment prior knowledge into the topic model for weakly supervised sentiment analysis where sentiment labels are considered as topics. One is by modifying the Dirichlet prior for topic-word distribution (LDA-DP), the other is by augmenting the model objective function through adding terms that express preferences on expectations of sentiment labels of the lexicon words using generalized expectation criteria (LDA-GE). We conducted extensive experiments on English movie review data and multi-domain sentiment dataset as well as Chinese product reviews about mobile phones, digital cameras, MP3 players, and monitors. The results show that while both LDA-DP and LDAGE perform comparably to existing weakly supervised sentiment classification algorithms, they are much simpler and computationally efficient, rendering themmore suitable for online and real-time sentiment classification on the Web. We observed that LDA-GE is more effective than LDA-DP, suggesting that it should be preferred when considering employing the topic model for sentiment analysis. Moreover, both models are able to extract highly domain-salient polarity words from text.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A large number of studies have been devoted to modeling the contents and interactions between users on Twitter. In this paper, we propose a method inspired from Social Role Theory (SRT), which assumes that a user behaves differently in different roles in the generation process of Twitter content. We consider the two most distinctive social roles on Twitter: originator and propagator, who respectively posts original messages and retweets or forwards the messages from others. In addition, we also consider role-specific social interactions, especially implicit interactions between users who share some common interests. All the above elements are integrated into a novel regularized topic model. We evaluate the proposed method on real Twitter data. The results show that our method is more effective than the existing ones which do not distinguish social roles. Copyright 2013 ACM.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, we explore the idea of social role theory (SRT) and propose a novel regularized topic model which incorporates SRT into the generative process of social media content. We assume that a user can play multiple social roles, and each social role serves to fulfil different duties and is associated with a role-driven distribution over latent topics. In particular, we focus on social roles corresponding to the most common social activities on social networks. Our model is instantiated on microblogs, i.e., Twitter and community question-answering (cQA), i.e., Yahoo! Answers, where social roles on Twitter include "originators" and "propagators", and roles on cQA are "askers" and "answerers". Both explicit and implicit interactions between users are taken into account and modeled as regularization factors. To evaluate the performance of our proposed method, we have conducted extensive experiments on two Twitter datasets and two cQA datasets. Furthermore, we also consider multi-role modeling for scientific papers where an author's research expertise area is considered as a social role. A novel application of detecting users' research interests through topical keyword labeling based on the results of our multi-role model has been presented. The evaluation results have shown the feasibility and effectiveness of our model.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The fruit is one of the most complex and important structures produced by flowering plants, and understanding the development and maturation process of fruits in different angiosperm species with diverse fruit structures is of immense interest. In the work presented here, molecular genetics and genomic analysis are used to explore the processes that form the fruit in two species: The model organism Arabidopsis and the diploid strawberry Fragaria vesca. One important basic question concerns the molecular genetic basis of fruit patterning. A long-standing model of Arabidopsis fruit (the gynoecium) patterning holds that auxin produced at the apex diffuses downward, forming a gradient that provides apical-basal positional information to specify different tissue types along the gynoecium’s length. The proposed gradient, however, has never been observed and the model appears inconsistent with a number of observations. I present a new, alternative model, wherein auxin acts to establish the adaxial-abaxial domains of the carpel primordia, which then ensures proper development of the final gynoecium. A second project utilizes genomics to identify genes that regulate fruit color by analyzing the genome sequences of Fragaria vesca, a species of wild strawberry. Shared and distinct SNPs among three F. vesca accessions were identified, providing a foundation for locating candidate mutations underlying phenotypic variations among different F. vesca accessions. Through systematic analysis of relevant SNP variants, a candidate SNP in FveMYB10 was identified that may underlie the fruit color in the yellow-fruited accessions, which was subsequently confirmed by functional assays. Our lab has previously generated extensive RNA-sequencing data that depict genome-scale gene expression profiles in F. vesca fruit and flower tissues at different developmental stages. To enhance the accessibility of this dataset, the web-based eFP software was adapted for this dataset, allowing visualization of gene expression in any tissues by user-initiated queries. Together, this thesis work proposes a well-supported new model of fruit patterning in Arabidopsis and provides further resources for F. vesca, including genome-wide variant lists and the ability to visualize gene expression. This work will facilitate future work linking traits of economic importance to specific genes and gaining novel insights into fruit patterning and development.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: High-density tiling arrays and new sequencing technologies are generating rapidly increasing volumes of transcriptome and protein-DNA interaction data. Visualization and exploration of this data is critical to understanding the regulatory logic encoded in the genome by which the cell dynamically affects its physiology and interacts with its environment. Results: The Gaggle Genome Browser is a cross-platform desktop program for interactively visualizing high-throughput data in the context of the genome. Important features include dynamic panning and zooming, keyword search and open interoperability through the Gaggle framework. Users may bookmark locations on the genome with descriptive annotations and share these bookmarks with other users. The program handles large sets of user-generated data using an in-process database and leverages the facilities of SQL and the R environment for importing and manipulating data. A key aspect of the Gaggle Genome Browser is interoperability. By connecting to the Gaggle framework, the genome browser joins a suite of interconnected bioinformatics tools for analysis and visualization with connectivity to major public repositories of sequences, interactions and pathways. To this flexible environment for exploring and combining data, the Gaggle Genome Browser adds the ability to visualize diverse types of data in relation to its coordinates on the genome. Conclusions: Genomic coordinates function as a common key by which disparate biological data types can be related to one another. In the Gaggle Genome Browser, heterogeneous data are joined by their location on the genome to create information-rich visualizations yielding insight into genome organization, transcription and its regulation and, ultimately, a better understanding of the mechanisms that enable the cell to dynamically respond to its environment.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Para obtenção do grau de Doutor pela Universidade de Vigo com menção internacional Departamento de Informática

Relevância:

100.00% 100.00%

Publicador:

Resumo:

To better understand the dynamic behavior of metabolic networks in a wide variety of conditions, the field of Systems Biology has increased its interest in the use of kinetic models. The different databases, available these days, do not contain enough data regarding this topic. Given that a significant part of the relevant information for the development of such models is still wide spread in the literature, it becomes essential to develop specific and powerful text mining tools to collect these data. In this context, this work has as main objective the development of a text mining tool to extract, from scientific literature, kinetic parameters, their respective values and their relations with enzymes and metabolites. The approach proposed integrates the development of a novel plug-in over the text mining framework @Note2. In the end, the pipeline developed was validated with a case study on Kluyveromyces lactis, spanning the analysis and results of 20 full text documents.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Gairebé un 50% dels metges de la sanitat catalana realitzen actualment les seves tasques assistencials en entorns SAP. Donada la naturalesa de la seva feina, és fàcil afirmar que les aplicacions que utilitzen habitualment haurien d'haver estat desenvolupades centrant l'atenció en l'usuari i el context en què aquest realitza la seva feina. Però degut a diversos motius organitzatius, això no ha estat així. Aquest projecte pretén demostrar el valor afegit que pot aportar el Disseny Centrat en l'Usuari en el desenvolupament d'aplicacions assistencials en l'àmbit SAP Sanitat. S'ha realitzat una investigació prèvia de les necessitats reals dels usuaris, un anàlisi del context d'ús i dels diferents perfils de usuari, s'ha desenvolupat un prototip amb la nova tecnologia que aporta SAP per al desenvolupament de interfícies tipus web integrades en el sistema (SAP Webdynpro for ABAP), i finalment s'ha realitzat la corresponent avaluació heurística i el test de usuaris. S'ha arribat principalment a tres conclusions: en primer lloc, ha sorprès la bona disposició dels usuaris metges per a participar en aquesta mena de projectes; en segon lloc, ha quedat demostrada la importància de l'anàlisi del context, així com la rellevància que el dissenyador estigui quan més a prop millor de l'usuari final; i finalment, cal destacar que la tecnologia emprada ha limitat qualitativament diverses opcions de disseny.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this technical report, we approach one of the practical aspects when it comes to represent users' interests from their tagging activity, namely the categorization of tags into high-level categories of interest. The reason is that the representation of user profiles on the basis of the myriad of tags available on the Web is certainly unfeasible from various practical perspectives; mainly concerningthe unavailability of data to reliably, accurately measure interests across such fine-grained categorization, and, should the data be available, its overwhelming computational intractability. Motivated by this, our study presents the results of a categorization process whereby a collection of tags posted at BibSonomy #http://www.bibsonomy.org# are classified into 5 categories of interest. The methodology used to conduct such categorization is in line with other works in the field.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In the scenario of social bookmarking, a user browsing the Web bookmarks web pages and assigns free-text labels (i.e., tags) to them according to their personal preferences. In this technical report, we approach one of the practical aspects when it comes to represent users' interests from their tagging activity, namely the categorization of tags into high-level categories of interest. The reason is that the representation of user profiles on the basis of the myriad of tags available on the Web is certainly unfeasible from various practical perspectives; mainly concerning the unavailability of data to reliably, accurately measure interests across such fine-grained categorisation, and, should the data be available, its overwhelming computational intractability. Motivated by this, our study presents the results of a categorization process whereby a collection of tags posted at Delicious #http://delicious.com# are classified into 200 subcategories of interest.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Intensification of agricultural production without a sound management and regulations can lead to severe environmental problems, as in Western Santa Catarina State, Brazil, where intensive swine production has caused large accumulations of manure and consequently water pollution. Natural resource scientists are asked by decision-makers for advice on management and regulatory decisions. Distributed environmental models are useful tools, since they can be used to explore consequences of various management practices. However, in many areas of the world, quantitative data for model calibration and validation are lacking. The data-intensive distributed environmental model AgNPS was applied in a data-poor environment, the upper catchment (2,520 ha) of the Ariranhazinho River, near the city of Seara, in Santa Catarina State. Steps included data preparation, cell size selection, sensitivity analysis, model calibration and application to different management scenarios. The model was calibrated based on a best guess for model parameters and on a pragmatic sensitivity analysis. The parameters were adjusted to match model outputs (runoff volume, peak runoff rate and sediment concentration) closely with the sparse observed data. A modelling grid cell resolution of 150 m adduced appropriate and computer-fit results. The rainfall runoff response of the AgNPS model was calibrated using three separate rainfall ranges (< 25, 25-60, > 60 mm). Predicted sediment concentrations were consistently six to ten times higher than observed, probably due to sediment trapping along vegetated channel banks. Predicted N and P concentrations in stream water ranged from just below to well above regulatory norms. Expert knowledge of the area, in addition to experience reported in the literature, was able to compensate in part for limited calibration data. Several scenarios (actual, recommended and excessive manure applications, and point source pollution from swine operations) could be compared by the model, using a relative ranking rather than quantitative predictions.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Although research on influenza lasted for more than 100 years, it is still one of the most prominent diseases causing half a million human deaths every year. With the recent observation of new highly pathogenic H5N1 and H7N7 strains, and the appearance of the influenza pandemic caused by the H1N1 swine-like lineage, a collaborative effort to share observations on the evolution of this virus in both animals and humans has been established. The OpenFlu database (OpenFluDB) is a part of this collaborative effort. It contains genomic and protein sequences, as well as epidemiological data from more than 27,000 isolates. The isolate annotations include virus type, host, geographical location and experimentally tested antiviral resistance. Putative enhanced pathogenicity as well as human adaptation propensity are computed from protein sequences. Each virus isolate can be associated with the laboratories that collected, sequenced and submitted it. Several analysis tools including multiple sequence alignment, phylogenetic analysis and sequence similarity maps enable rapid and efficient mining. The contents of OpenFluDB are supplied by direct user submission, as well as by a daily automatic procedure importing data from public repositories. Additionally, a simple mechanism facilitates the export of OpenFluDB records to GenBank. This resource has been successfully used to rapidly and widely distribute the sequences collected during the recent human swine flu outbreak and also as an exchange platform during the vaccine selection procedure. Database URL: http://openflu.vital-it.ch.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Probabilistic inversion methods based on Markov chain Monte Carlo (MCMC) simulation are well suited to quantify parameter and model uncertainty of nonlinear inverse problems. Yet, application of such methods to CPU-intensive forward models can be a daunting task, particularly if the parameter space is high dimensional. Here, we present a 2-D pixel-based MCMC inversion of plane-wave electromagnetic (EM) data. Using synthetic data, we investigate how model parameter uncertainty depends on model structure constraints using different norms of the likelihood function and the model constraints, and study the added benefits of joint inversion of EM and electrical resistivity tomography (ERT) data. Our results demonstrate that model structure constraints are necessary to stabilize the MCMC inversion results of a highly discretized model. These constraints decrease model parameter uncertainty and facilitate model interpretation. A drawback is that these constraints may lead to posterior distributions that do not fully include the true underlying model, because some of its features exhibit a low sensitivity to the EM data, and hence are difficult to resolve. This problem can be partly mitigated if the plane-wave EM data is augmented with ERT observations. The hierarchical Bayesian inverse formulation introduced and used herein is able to successfully recover the probabilistic properties of the measurement data errors and a model regularization weight. Application of the proposed inversion methodology to field data from an aquifer demonstrates that the posterior mean model realization is very similar to that derived from a deterministic inversion with similar model constraints.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Maximum entropy modeling (Maxent) is a widely used algorithm for predicting species distributions across space and time. Properly assessing the uncertainty in such predictions is non-trivial and requires validation with independent datasets. Notably, model complexity (number of model parameters) remains a major concern in relation to overfitting and, hence, transferability of Maxent models. An emerging approach is to validate the cross-temporal transferability of model predictions using paleoecological data. In this study, we assess the effect of model complexity on the performance of Maxent projections across time using two European plant species (Alnus giutinosa (L.) Gaertn. and Corylus avellana L) with an extensive late Quaternary fossil record in Spain as a study case. We fit 110 models with different levels of complexity under present time and tested model performance using AUC (area under the receiver operating characteristic curve) and AlCc (corrected Akaike Information Criterion) through the standard procedure of randomly partitioning current occurrence data. We then compared these results to an independent validation by projecting the models to mid-Holocene (6000 years before present) climatic conditions in Spain to assess their ability to predict fossil pollen presence-absence and abundance. We find that calibrating Maxent models with default settings result in the generation of overly complex models. While model performance increased with model complexity when predicting current distributions, it was higher with intermediate complexity when predicting mid-Holocene distributions. Hence, models of intermediate complexity resulted in the best trade-off to predict species distributions across time. Reliable temporal model transferability is especially relevant for forecasting species distributions under future climate change. Consequently, species-specific model tuning should be used to find the best modeling settings to control for complexity, notably with paleoecological data to independently validate model projections. For cross-temporal projections of species distributions for which paleoecological data is not available, models of intermediate complexity should be selected.