977 resultados para Web mining


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Social bookmark tools are rapidly emerging on the Web. In such systems users are setting up lightweight conceptual structures called folksonomies. These systems provide currently relatively few structure. We discuss in this paper, how association rule mining can be adopted to analyze and structure folksonomies, and how the results can be used for ontology learning and supporting emergent semantics. We demonstrate our approach on a large scale dataset stemming from an online system.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Co-training is a semi-supervised learning method that is designed to take advantage of the redundancy that is present when the object to be identified has multiple descriptions. Co-training is known to work well when the multiple descriptions are conditional independent given the class of the object. The presence of multiple descriptions of objects in the form of text, images, audio and video in multimedia applications appears to provide redundancy in the form that may be suitable for co-training. In this paper, we investigate the suitability of utilizing text and image data from the Web for co-training. We perform measurements to find indications of conditional independence in the texts and images obtained from the Web. Our measurements suggest that conditional independence is likely to be present in the data. Our experiments, within a relevance feedback framework to test whether a method that exploits the conditional independence outperforms methods that do not, also indicate that better performance can indeed be obtained by designing algorithms that exploit this form of the redundancy when it is present.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Search engines - such as Google - have been characterized as "Databases of intentions". This class will focus on different aspects of intentionality on the web, including goal mining, goal modeling and goal-oriented search. Readings: M. Strohmaier, M. Lux, M. Granitzer, P. Scheir, S. Liaskos, E. Yu, How Do Users Express Goals on the Web? - An Exploration of Intentional Structures in Web Search, We Know'07 International Workshop on Collaborative Knowledge Management for Web Information Systems in conjunction with WISE'07, Nancy, France, 2007. [Web link] Readings: Automatic identification of user goals in web search, U. Lee and Z. Liu and J. Cho WWW '05: Proceedings of the 14th International World Wide Web Conference 391--400 (2005) [Web link]

Relevância:

30.00% 30.00%

Publicador:

Resumo:

peaker(s): Jon Hare Organiser: Time: 25/06/2014 11:00-11:50 Location: B32/3077 Abstract The aggregation of items from social media streams, such as Flickr photos and Twitter tweets, into meaningful groups can help users contextualise and effectively consume the torrents of information on the social web. This task is challenging due to the scale of the streams and the inherently multimodal nature of the information being contextualised. In this talk I'll describe some of our recent work on trend and event detection in multimedia data streams. We focus on scalable streaming algorithms that can be applied to multimedia data streams from the web and the social web. The talk will cover two particular aspects of our work: mining Twitter for trending images by detecting near duplicates; and detecting social events in multimedia data with streaming clustering algorithms. I'll will describe in detail our techniques, and explore open questions and areas of potential future work, in both these tasks.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Abstract A frequent assumption in Social Media is that its open nature leads to a representative view of the world. In this talk we want to consider bias occurring in the Social Web. We will consider a case study of liquid feedback, a direct democracy platform of the German pirate party as well as models of (non-)discriminating systems. As a conclusion of this talk we stipulate the need of Social Media systems to bias their working according to social norms and to publish the bias they introduce. Speaker Biography: Prof Steffen Staab Steffen studied in Erlangen (Germany), Philadelphia (USA) and Freiburg (Germany) computer science and computational linguistics. Afterwards he worked as researcher at Uni. Stuttgart/Fraunhofer and Univ. Karlsruhe, before he became professor in Koblenz (Germany). Since March 2015 he also holds a chair for Web and Computer Science at Univ. of Southampton sharing his time between here and Koblenz. In his research career he has managed to avoid almost all good advice that he now gives to his team members. Such advise includes focusing on research (vs. company) or concentrating on only one or two research areas (vs. considering ontologies, semantic web, social web, data engineering, text mining, peer-to-peer, multimedia, HCI, services, software modelling and programming and some more). Though, actually, improving how we understand and use text and data is a good common denominator for a lot of Steffen's professional activities.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

OBJECTIVES: The prediction of protein structure and the precise understanding of protein folding and unfolding processes remains one of the greatest challenges in structural biology and bioinformatics. Computer simulations based on molecular dynamics (MD) are at the forefront of the effort to gain a deeper understanding of these complex processes. Currently, these MD simulations are usually on the order of tens of nanoseconds, generate a large amount of conformational data and are computationally expensive. More and more groups run such simulations and generate a myriad of data, which raises new challenges in managing and analyzing these data. Because the vast range of proteins researchers want to study and simulate, the computational effort needed to generate data, the large data volumes involved, and the different types of analyses scientists need to perform, it is desirable to provide a public repository allowing researchers to pool and share protein unfolding data. METHODS: To adequately organize, manage, and analyze the data generated by unfolding simulation studies, we designed a data warehouse system that is embedded in a grid environment to facilitate the seamless sharing of available computer resources and thus enable many groups to share complex molecular dynamics simulations on a more regular basis. RESULTS: To gain insight into the conformational fluctuations and stability of the monomeric forms of the amyloidogenic protein transthyretin (TTR), molecular dynamics unfolding simulations of the monomer of human TTR have been conducted. Trajectory data and meta-data of the wild-type (WT) protein and the highly amyloidogenic variant L55P-TTR represent the test case for the data warehouse. CONCLUSIONS: Web and grid services, especially pre-defined data mining services that can run on or 'near' the data repository of the data warehouse, are likely to play a pivotal role in the analysis of molecular dynamics unfolding data.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Social network has gained remarkable attention in the last decade. Accessing social network sites such as Twitter, Facebook LinkedIn and Google+ through the internet and the web 2.0 technologies has become more affordable. People are becoming more interested in and relying on social network for information, news and opinion of other users on diverse subject matters. The heavy reliance on social network sites causes them to generate massive data characterised by three computational issues namely; size, noise and dynamism. These issues often make social network data very complex to analyse manually, resulting in the pertinent use of computational means of analysing them. Data mining provides a wide range of techniques for detecting useful knowledge from massive datasets like trends, patterns and rules [44]. Data mining techniques are used for information retrieval, statistical modelling and machine learning. These techniques employ data pre-processing, data analysis, and data interpretation processes in the course of data analysis. This survey discusses different data mining techniques used in mining diverse aspects of the social network over decades going from the historical techniques to the up-to-date models, including our novel technique named TRCM. All the techniques covered in this survey are listed in the Table.1 including the tools employed as well as names of their authors.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Il problema relativo alla predizione, la ricerca di pattern predittivi all‘interno dei dati, è stato studiato ampiamente. Molte metodologie robuste ed efficienti sono state sviluppate, procedimenti che si basano sull‘analisi di informazioni numeriche strutturate. Quella testuale, d‘altro canto, è una tipologia di informazione fortemente destrutturata. Quindi, una immediata conclusione, porterebbe a pensare che per l‘analisi predittiva su dati testuali sia necessario sviluppare metodi completamente diversi da quelli ben noti dalle tecniche di data mining. Un problema di predizione può essere risolto utilizzando invece gli stessi metodi : dati testuali e documenti possono essere trasformati in valori numerici, considerando per esempio l‘assenza o la presenza di termini, rendendo di fatto possibile una utilizzazione efficiente delle tecniche già sviluppate. Il text mining abilita la congiunzione di concetti da campi di applicazione estremamente eterogenei. Con l‘immensa quantità di dati testuali presenti, basti pensare, sul World Wide Web, ed in continua crescita a causa dell‘utilizzo pervasivo di smartphones e computers, i campi di applicazione delle analisi di tipo testuale divengono innumerevoli. L‘avvento e la diffusione dei social networks e della pratica di micro blogging abilita le persone alla condivisione di opinioni e stati d‘animo, creando un corpus testuale di dimensioni incalcolabili aggiornato giornalmente. Le nuove tecniche di Sentiment Analysis, o Opinion Mining, si occupano di analizzare lo stato emotivo o la tipologia di opinione espressa all‘interno di un documento testuale. Esse sono discipline attraverso le quali, per esempio, estrarre indicatori dello stato d‘animo di un individuo, oppure di un insieme di individui, creando una rappresentazione dello stato emotivo sociale. L‘andamento dello stato emotivo sociale può condizionare macroscopicamente l‘evolvere di eventi globali? Studi in campo di Economia e Finanza Comportamentale assicurano un legame fra stato emotivo, capacità nel prendere decisioni ed indicatori economici. Grazie alle tecniche disponibili ed alla mole di dati testuali continuamente aggiornati riguardanti lo stato d‘animo di milioni di individui diviene possibile analizzare tali correlazioni. In questo studio viene costruito un sistema per la previsione delle variazioni di indici di borsa, basandosi su dati testuali estratti dalla piattaforma di microblogging Twitter, sotto forma di tweets pubblici; tale sistema include tecniche di miglioramento della previsione basate sullo studio di similarità dei testi, categorizzandone il contributo effettivo alla previsione.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Ogni giorno enormi quantità di dati sono prodotti come record dettagliati del comportamento di utilizzo del Web, ma l'obiettivo di trarne conoscenza rimane ancora una sfida. In questa trattazione viene descritto EOP(Eye-On-Portal), un framework di monitoring che si propone come strumento per riuscire a catturare informazioni dettagliate sulle componenti della pagina visitata dall'utente e sulle interazioni di quest'ultimo con il portale: i dati raccolti potrebbero avere utilità nell'ottimizzazione del layout e nell'usabilità del portale.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Negli ultimi anni i documenti web hanno attratto molta attenzione, poiché vengono visti come un nuovo mezzo che porta quello che sono le esperienze ed opinioni di un individuo da una parte all'altra del mondo, raggiungendo quindi persone che mai si incontreranno. Ed è proprio con la proliferazione del Web 2.0 che l’attenzione è stata incentrata sul contenuto generato dagli utenti della rete, i quali hanno a disposizione diverse piattaforme sulle quali condividere i loro pensieri, opinioni o andare a cercarne di altrui, magari per valutare l’acquisto di uno smartphone piuttosto che un altro o se valutare l’opzione di cambiare operatore telefonico, ponderando quali potrebbero essere gli svantaggi o i vantaggi che otterrebbe modificando la sia situazione attuale. Questa grande disponibilità di informazioni è molto preziosa per i singoli individui e le organizzazioni, che devono però scontrarsi con la grande difficoltà di trovare le fonti di tali opinioni, estrapolarle ed esprimerle in un formato standard. Queste operazioni risulterebbero quasi impossibili da eseguire a mano, per questo è nato il bisogno di automatizzare tali procedimenti, e la Sentiment Analysis è la risposta a questi bisogni. Sentiment analysis (o Opinion Mining, come è chiamata a volte) è uno dei tanti campi di studio computazionali che affronta il tema dell’elaborazione del linguaggio naturale orientato all'estrapolazione delle opinioni. Negli ultimi anni si è rilevato essere uno dei nuovi campi di tendenza nel settore dei social media, con una serie di applicazioni nel campo economico, politico e sociale. Questa tesi ha come obiettivo quello di fornire uno sguardo su quello che è lo stato di questo campo di studio, con presentazione di metodi e tecniche e di applicazioni di esse in alcuni studi eseguiti in questi anni.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In the paper we report on the results of our experiments on the construction of the opinion ontology. Our aim is to show the benefits of publishing in the open, on the Web, the results of the opinion mining process in a structured form. On the road to achieving this, we attempt to answer the research question to what extent opinion information can be formalized in a unified way. Furthermore, as part of the evaluation, we experiment with the usage of Semantic Web technologies and show particular use cases that support our claims.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Twelve years ago a group of teachers began to work in educational innovation. In 2002 we received an award for educational innovation, undergoing several stages. Recently, we have decided to focus on being teachers of educational innovation. We create a web scheduled in Joomla offering various services, among which we emphasize teaching courses of educational innovation. The “Instituto de Ciencias de la Educacion” in “Universidad Politécnica de Madrid” has recently incorporated two of these courses, which has been highly praised. These courses will be reissued in new calls, and we are going to offer them to more Universities. We are in contact with several institutions, radio programs, the UNESCO Chair of Mining and Industrial Heritage, and we are working with them in the creation of heritage courses using methods that we have developed.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this position paper, we claim that the need for time consuming data preparation and result interpretation tasks in knowledge discovery, as well as for costly expert consultation and consensus building activities required for ontology building can be reduced through exploiting the interplay of data mining and ontology engineering. The aim is to obtain in a semi-automatic way new knowledge from distributed data sources that can be used for inference and reasoning, as well as to guide the extraction of further knowledge from these data sources. The proposed approach is based on the creation of a novel knowledge discovery method relying on the combination, through an iterative ?feedbackloop?, of (a) data mining techniques to make emerge implicit models from data and (b) pattern-based ontology engineering to capture these models in reusable, conceptual and inferable artefacts.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

La nanotecnología es un área de investigación de reciente creación que trata con la manipulación y el control de la materia con dimensiones comprendidas entre 1 y 100 nanómetros. A escala nanométrica, los materiales exhiben fenómenos físicos, químicos y biológicos singulares, muy distintos a los que manifiestan a escala convencional. En medicina, los compuestos miniaturizados a nanoescala y los materiales nanoestructurados ofrecen una mayor eficacia con respecto a las formulaciones químicas tradicionales, así como una mejora en la focalización del medicamento hacia la diana terapéutica, revelando así nuevas propiedades diagnósticas y terapéuticas. A su vez, la complejidad de la información a nivel nano es mucho mayor que en los niveles biológicos convencionales (desde el nivel de población hasta el nivel de célula) y, por tanto, cualquier flujo de trabajo en nanomedicina requiere, de forma inherente, estrategias de gestión de información avanzadas. Desafortunadamente, la informática biomédica todavía no ha proporcionado el marco de trabajo que permita lidiar con estos retos de la información a nivel nano, ni ha adaptado sus métodos y herramientas a este nuevo campo de investigación. En este contexto, la nueva área de la nanoinformática pretende detectar y establecer los vínculos existentes entre la medicina, la nanotecnología y la informática, fomentando así la aplicación de métodos computacionales para resolver las cuestiones y problemas que surgen con la información en la amplia intersección entre la biomedicina y la nanotecnología. Las observaciones expuestas previamente determinan el contexto de esta tesis doctoral, la cual se centra en analizar el dominio de la nanomedicina en profundidad, así como en el desarrollo de estrategias y herramientas para establecer correspondencias entre las distintas disciplinas, fuentes de datos, recursos computacionales y técnicas orientadas a la extracción de información y la minería de textos, con el objetivo final de hacer uso de los datos nanomédicos disponibles. El autor analiza, a través de casos reales, alguna de las tareas de investigación en nanomedicina que requieren o que pueden beneficiarse del uso de métodos y herramientas nanoinformáticas, ilustrando de esta forma los inconvenientes y limitaciones actuales de los enfoques de informática biomédica a la hora de tratar con datos pertenecientes al dominio nanomédico. Se discuten tres escenarios diferentes como ejemplos de actividades que los investigadores realizan mientras llevan a cabo su investigación, comparando los contextos biomédico y nanomédico: i) búsqueda en la Web de fuentes de datos y recursos computacionales que den soporte a su investigación; ii) búsqueda en la literatura científica de resultados experimentales y publicaciones relacionadas con su investigación; iii) búsqueda en registros de ensayos clínicos de resultados clínicos relacionados con su investigación. El desarrollo de estas actividades requiere el uso de herramientas y servicios informáticos, como exploradores Web, bases de datos de referencias bibliográficas indexando la literatura biomédica y registros online de ensayos clínicos, respectivamente. Para cada escenario, este documento proporciona un análisis detallado de los posibles obstáculos que pueden dificultar el desarrollo y el resultado de las diferentes tareas de investigación en cada uno de los dos campos citados (biomedicina y nanomedicina), poniendo especial énfasis en los retos existentes en la investigación nanomédica, campo en el que se han detectado las mayores dificultades. El autor ilustra cómo la aplicación de metodologías provenientes de la informática biomédica a estos escenarios resulta efectiva en el dominio biomédico, mientras que dichas metodologías presentan serias limitaciones cuando son aplicadas al contexto nanomédico. Para abordar dichas limitaciones, el autor propone un enfoque nanoinformático, original, diseñado específicamente para tratar con las características especiales que la información presenta a nivel nano. El enfoque consiste en un análisis en profundidad de la literatura científica y de los registros de ensayos clínicos disponibles para extraer información relevante sobre experimentos y resultados en nanomedicina —patrones textuales, vocabulario en común, descriptores de experimentos, parámetros de caracterización, etc.—, seguido del desarrollo de mecanismos para estructurar y analizar dicha información automáticamente. Este análisis concluye con la generación de un modelo de datos de referencia (gold standard) —un conjunto de datos de entrenamiento y de test anotados manualmente—, el cual ha sido aplicado a la clasificación de registros de ensayos clínicos, permitiendo distinguir automáticamente los estudios centrados en nanodrogas y nanodispositivos de aquellos enfocados a testear productos farmacéuticos tradicionales. El presente trabajo pretende proporcionar los métodos necesarios para organizar, depurar, filtrar y validar parte de los datos nanomédicos existentes en la actualidad a una escala adecuada para la toma de decisiones. Análisis similares para otras tareas de investigación en nanomedicina ayudarían a detectar qué recursos nanoinformáticos se requieren para cumplir los objetivos actuales en el área, así como a generar conjunto de datos de referencia, estructurados y densos en información, a partir de literatura y otros fuentes no estructuradas para poder aplicar nuevos algoritmos e inferir nueva información de valor para la investigación en nanomedicina. ABSTRACT Nanotechnology is a research area of recent development that deals with the manipulation and control of matter with dimensions ranging from 1 to 100 nanometers. At the nanoscale, materials exhibit singular physical, chemical and biological phenomena, very different from those manifested at the conventional scale. In medicine, nanosized compounds and nanostructured materials offer improved drug targeting and efficacy with respect to traditional formulations, and reveal novel diagnostic and therapeutic properties. Nevertheless, the complexity of information at the nano level is much higher than the complexity at the conventional biological levels (from populations to the cell). Thus, any nanomedical research workflow inherently demands advanced information management. Unfortunately, Biomedical Informatics (BMI) has not yet provided the necessary framework to deal with such information challenges, nor adapted its methods and tools to the new research field. In this context, the novel area of nanoinformatics aims to build new bridges between medicine, nanotechnology and informatics, allowing the application of computational methods to solve informational issues at the wide intersection between biomedicine and nanotechnology. The above observations determine the context of this doctoral dissertation, which is focused on analyzing the nanomedical domain in-depth, and developing nanoinformatics strategies and tools to map across disciplines, data sources, computational resources, and information extraction and text mining techniques, for leveraging available nanomedical data. The author analyzes, through real-life case studies, some research tasks in nanomedicine that would require or could benefit from the use of nanoinformatics methods and tools, illustrating present drawbacks and limitations of BMI approaches to deal with data belonging to the nanomedical domain. Three different scenarios, comparing both the biomedical and nanomedical contexts, are discussed as examples of activities that researchers would perform while conducting their research: i) searching over the Web for data sources and computational resources supporting their research; ii) searching the literature for experimental results and publications related to their research, and iii) searching clinical trial registries for clinical results related to their research. The development of these activities will depend on the use of informatics tools and services, such as web browsers, databases of citations and abstracts indexing the biomedical literature, and web-based clinical trial registries, respectively. For each scenario, this document provides a detailed analysis of the potential information barriers that could hamper the successful development of the different research tasks in both fields (biomedicine and nanomedicine), emphasizing the existing challenges for nanomedical research —where the major barriers have been found. The author illustrates how the application of BMI methodologies to these scenarios can be proven successful in the biomedical domain, whilst these methodologies present severe limitations when applied to the nanomedical context. To address such limitations, the author proposes an original nanoinformatics approach specifically designed to deal with the special characteristics of information at the nano level. This approach consists of an in-depth analysis of the scientific literature and available clinical trial registries to extract relevant information about experiments and results in nanomedicine —textual patterns, common vocabulary, experiment descriptors, characterization parameters, etc.—, followed by the development of mechanisms to automatically structure and analyze this information. This analysis resulted in the generation of a gold standard —a manually annotated training or reference set—, which was applied to the automatic classification of clinical trial summaries, distinguishing studies focused on nanodrugs and nanodevices from those aimed at testing traditional pharmaceuticals. The present work aims to provide the necessary methods for organizing, curating and validating existing nanomedical data on a scale suitable for decision-making. Similar analysis for different nanomedical research tasks would help to detect which nanoinformatics resources are required to meet current goals in the field, as well as to generate densely populated and machine-interpretable reference datasets from the literature and other unstructured sources for further testing novel algorithms and inferring new valuable information for nanomedicine.