977 resultados para Web Mining
Resumo:
The mobile apps market is a tremendous success, with millions of apps downloaded and used every day by users spread all around the world. For apps’ developers, having their apps published on one of the major app stores (e.g. Google Play market) is just the beginning of the apps lifecycle. Indeed, in order to successfully compete with the other apps in the market, an app has to be updated frequently by adding new attractive features and by fixing existing bugs. Clearly, any developer interested in increasing the success of her app should try to implement features desired by the app’s users and to fix bugs affecting the user experience of many of them. A precious source of information to decide how to collect users’ opinions and wishes is represented by the reviews left by users on the store from which they downloaded the app. However, to exploit such information the app’s developer should manually read each user review and verify if it contains useful information (e.g. suggestions for new features). This is something not doable if the app receives hundreds of reviews per day, as happens for the very popular apps on the market. In this work, our aim is to provide support to mobile apps developers by proposing a novel approach exploiting data mining, natural language processing, machine learning, and clustering techniques in order to classify the user reviews on the basis of the information they contain (e.g. useless, suggestion for new features, bugs reporting). Such an approach has been empirically evaluated and made available in a web-‐based tool publicly available to all apps’ developers. The achieved results showed that the developed tool: (i) is able to correctly categorise user reviews on the basis of their content (e.g. isolating those reporting bugs) with 78% of accuracy, (ii) produces clusters of reviews (e.g. groups together reviews indicating exactly the same bug to be fixed) that are meaningful from a developer’s point-‐of-‐view, and (iii) is considered useful by a software company working in the mobile apps’ development market.
Resumo:
This thesis is the result of a project whose objective has been to develop and deploy a dashboard for sentiment analysis of football in Twitter based on web components and D3.js. To do so, a visualisation server has been developed in order to present the data obtained from Twitter and analysed with Senpy. This visualisation server has been developed with Polymer web components and D3.js. Data mining has been done with a pipeline between Twitter, Senpy and ElasticSearch. Luigi have been used in this process because helps building complex pipelines of batch jobs, so it has analysed all tweets and stored them in ElasticSearch. To continue, D3.js has been used to create interactive widgets that make data easily accessible, this widgets will allow the user to interact with them and �filter the most interesting data for him. Polymer web components have been used to make this dashboard according to Google's material design and be able to show dynamic data in widgets. As a result, this project will allow an extensive analysis of the social network, pointing out the influence of players and teams and the emotions and sentiments that emerge in a lapse of time.
Resumo:
En esta memoria se presenta el diseño y desarrollo de una aplicación en la nube destinada a la compartición de objetos y servicios. El desarrollo de esta aplicación surge dentro del proyecto de I+D+i, SITAC: Social Internet of Things – Apps by and for the Crowd ITEA 2 11020, que trata de crear una arquitectura integradora y un “ecosistema” que incluya plataformas, herramientas y metodologías para facilitar la conexión y cooperación de entidades de distinto tipo conectadas a la red bien sean sistemas, máquinas, dispositivos o personas con dispositivos móviles personales como tabletas o teléfonos móviles. El proyecto innovará mediante la utilización de un modelo inspirado en las redes sociales para facilitar y unificar las interacciones tanto entre personas como entre personas y dispositivos. En este contexto surge la necesidad de desarrollar una aplicación destinada a la compartición de recursos en la nube que pueden ser tanto lógicos como físicos, y que esté orientada al big data. Ésta será la aplicación presentada en este trabajo, el “Resource Sharing Center”, que ofrece un servicio web para el intercambio y compartición de contenido, y un motor de recomendaciones basado en las preferencias de los usuarios. Con este objetivo, se han usado tecnologías de despliegue en la nube, como Elastic Beanstalk (el PaaS de Amazon Web Services), S3 (el sistema de almacenamiento de Amazon Web Services), SimpleDB (base de datos NoSQL) y HTML5 con JavaScript y Twitter Bootstrap para el desarrollo del front-end, siendo Python y Node.js las tecnologías usadas en el back end, y habiendo contribuido a la mejora de herramientas de clustering sobre big data. Por último, y de cara a realizar el estudio sobre las pruebas de carga de la aplicación se ha usado la herramienta ApacheJMeter.
Resumo:
This dissertation introduces an approach to generate tests to test fail-safe behavior for web applications. We apply the approach to a commercial web application. We build models for both behavioral and mitigation requirements. We create mitigation tests from an existing functional black box test suite by determining failure type and points of failure in the test suite and weaving required mitigation based on weaving rules to generate a test suite that tests proper mitigation of failures. A genetic algorithm (GA) is used to determine points of failure and type of failure that needs to be tested. Mitigation test paths are woven into the behavioral test at the point of failure based on failure specific weaving rules. A simulator was developed to evaluate choice of parameters for the genetic algorithm. We showed how to tune the fitness function and performed tuning experiments for GA to determine what values to use for exploration weight and prospecting weight. We found that higher defect densities make prospecting and mining more successful, while lower mitigation defect densities need more exploration. We compare efficiency and effectiveness of the approach. First, the GA approach is compared to random selection. The results show that the GA performance was better than random selection and that the approach was robust when the search space increased. Second, we compare the GA against four coverage criteria. The results of comparison show that test requirements generated by a genetic algorithm (GA) are more efficient than three of the four coverage criteria for large search spaces. They are equally effective. For small search spaces, the genetic algorithm is less effective than three of the four coverage criteria. The fourth coverage criteria is too weak and unable to find all defects in almost all cases. We also present a large case study of a mortgage system at one of our industrial partners and show how we formalize the approach. We evaluate the use of a GA to create test requirements. The evaluation includes choice of initial population, multiplicity of runs and a discussion of the cost of evaluating fitness. Finally, we build a selective regression testing approach based on types of changes (add, delete, or modify) that could occur in the behavioral model, the fault model, the mitigation models, the weaving rules, and the state-event matrix. We provide a systematic method by showing the formalization steps for each type of change to the various models.
Resumo:
Tesis doctoral con mención europea en procesamiento del lenguaje natural realizada en la Universidad de Alicante por Ester Boldrini bajo la dirección del Dr. Patricio Martínez-Barco. El acto de defensa de la tesis tuvo lugar en la Universidad de Alicante el 23 de enero de 2012 ante el tribunal formado por los doctores Manuel Palomar (Universidad de Alicante), Dr. Paloma Moreda (UA), Dr. Mariona Taulé (Universidad de Barcelona), Dr. Horacio Saggion (Universitat Pompeu Fabra) y Dr. Mike Thelwall (University of Wolverhampton). Calificación: Sobresaliente Cum Laude por unanimidad.
Resumo:
The exponential increase of subjective, user-generated content since the birth of the Social Web, has led to the necessity of developing automatic text processing systems able to extract, process and present relevant knowledge. In this paper, we tackle the Opinion Retrieval, Mining and Summarization task, by proposing a unified framework, composed of three crucial components (information retrieval, opinion mining and text summarization) that allow the retrieval, classification and summarization of subjective information. An extensive analysis is conducted, where different configurations of the framework are suggested and analyzed, in order to determine which is the best one, and under which conditions. The evaluation carried out and the results obtained show the appropriateness of the individual components, as well as the framework as a whole. By achieving an improvement over 10% compared to the state-of-the-art approaches in the context of blogs, we can conclude that subjective text can be efficiently dealt with by means of our proposed framework.
Open business intelligence: on the importance of data quality awareness in user-friendly data mining
Resumo:
Citizens demand more and more data for making decisions in their daily life. Therefore, mechanisms that allow citizens to understand and analyze linked open data (LOD) in a user-friendly manner are highly required. To this aim, the concept of Open Business Intelligence (OpenBI) is introduced in this position paper. OpenBI facilitates non-expert users to (i) analyze and visualize LOD, thus generating actionable information by means of reporting, OLAP analysis, dashboards or data mining; and to (ii) share the new acquired information as LOD to be reused by anyone. One of the most challenging issues of OpenBI is related to data mining, since non-experts (as citizens) need guidance during preprocessing and application of mining algorithms due to the complexity of the mining process and the low quality of the data sources. This is even worst when dealing with LOD, not only because of the different kind of links among data, but also because of its high dimensionality. As a consequence, in this position paper we advocate that data mining for OpenBI requires data quality-aware mechanisms for guiding non-expert users in obtaining and sharing the most reliable knowledge from the available LOD.
Resumo:
Comunicación presentada en las XVI Jornadas de Ingeniería del Software y Bases de Datos, JISBD 2011, A Coruña, 5-7 septiembre 2011.
Resumo:
Web transaction data between Web visitors and Web functionalities usually convey user task-oriented behavior pattern. Mining such type of click-stream data will lead to capture usage pattern information. Nowadays Web usage mining technique has become one of most widely used methods for Web recommendation, which customizes Web content to user-preferred style. Traditional techniques of Web usage mining, such as Web user session or Web page clustering, association rule and frequent navigational path mining can only discover usage pattern explicitly. They, however, cannot reveal the underlying navigational activities and identify the latent relationships that are associated with the patterns among Web users as well as Web pages. In this work, we propose a Web recommendation framework incorporating Web usage mining technique based on Probabilistic Latent Semantic Analysis (PLSA) model. The main advantages of this method are, not only to discover usage-based access pattern, but also to reveal the underlying latent factor as well. With the discovered user access pattern, we then present user more interested content via collaborative recommendation. To validate the effectiveness of proposed approach, we conduct experiments on real world datasets and make comparisons with some existing traditional techniques. The preliminary experimental results demonstrate the usability of the proposed approach.
Resumo:
Categorising visitors based on their interaction with a website is a key problem in Web content usage. The clickstreams generated by various users often follow distinct patterns, the knowledge of which may help in providing customised content. This paper proposes an approach to clustering weblog data, based on ART2 neural networks. Due to the characteristics of the ART2 neural network model, the proposed approach can be used for unsupervised and self-learning data mining, which makes it adaptable to dynamically changing websites.
Resumo:
With the explosive growth of the volume and complexity of document data (e.g., news, blogs, web pages), it has become a necessity to semantically understand documents and deliver meaningful information to users. Areas dealing with these problems are crossing data mining, information retrieval, and machine learning. For example, document clustering and summarization are two fundamental techniques for understanding document data and have attracted much attention in recent years. Given a collection of documents, document clustering aims to partition them into different groups to provide efficient document browsing and navigation mechanisms. One unrevealed area in document clustering is that how to generate meaningful interpretation for the each document cluster resulted from the clustering process. Document summarization is another effective technique for document understanding, which generates a summary by selecting sentences that deliver the major or topic-relevant information in the original documents. How to improve the automatic summarization performance and apply it to newly emerging problems are two valuable research directions. To assist people to capture the semantics of documents effectively and efficiently, the dissertation focuses on developing effective data mining and machine learning algorithms and systems for (1) integrating document clustering and summarization to obtain meaningful document clusters with summarized interpretation, (2) improving document summarization performance and building document understanding systems to solve real-world applications, and (3) summarizing the differences and evolution of multiple document sources.
Resumo:
The software product line engineering brings advantages when compared with the traditional software development regarding the mass customization of the system components. However, there are scenarios that to maintain separated clones of a software system seems to be an easier and more flexible approach to manage their variabilities of a software product line. This dissertation evaluates qualitatively an approach that aims to support the reconciliation of functionalities between cloned systems. The analyzed approach is based on mining data about the issues and source code of evolved cloned web systems. The next step is to process the merge conflicts collected by the approach and not indicated by traditional control version systems to identify potential integration problems from the cloned software systems. The results of the study show the feasibility of the approach to perform a systematic characterization and analysis of merge conflicts for large-scale web-based systems.
Resumo:
In this talk, I will describe various computational modelling and data mining solutions that form the basis of how the office of Deputy Head of Department (Resources) works to serve you. These include lessons I learn about, and from, optimisation issues in resource allocation, uncertainty analysis on league tables, modelling the process of winning external grants, and lessons we learn from student satisfaction surveys, some of which I have attempted to inject into our planning processes.
Resumo:
With the dramatic growth of text information, there is an increasing need for powerful text mining systems that can automatically discover useful knowledge from text. Text is generally associated with all kinds of contextual information. Those contexts can be explicit, such as the time and the location where a blog article is written, and the author(s) of a biomedical publication, or implicit, such as the positive or negative sentiment that an author had when she wrote a product review; there may also be complex context such as the social network of the authors. Many applications require analysis of topic patterns over different contexts. For instance, analysis of search logs in the context of the user can reveal how we can improve the quality of a search engine by optimizing the search results according to particular users; analysis of customer reviews in the context of positive and negative sentiments can help the user summarize public opinions about a product; analysis of blogs or scientific publications in the context of a social network can facilitate discovery of more meaningful topical communities. Since context information significantly affects the choices of topics and language made by authors, in general, it is very important to incorporate it into analyzing and mining text data. In general, modeling the context in text, discovering contextual patterns of language units and topics from text, a general task which we refer to as Contextual Text Mining, has widespread applications in text mining. In this thesis, we provide a novel and systematic study of contextual text mining, which is a new paradigm of text mining treating context information as the ``first-class citizen.'' We formally define the problem of contextual text mining and its basic tasks, and propose a general framework for contextual text mining based on generative modeling of text. This conceptual framework provides general guidance on text mining problems with context information and can be instantiated into many real tasks, including the general problem of contextual topic analysis. We formally present a functional framework for contextual topic analysis, with a general contextual topic model and its various versions, which can effectively solve the text mining problems in a lot of real world applications. We further introduce general components of contextual topic analysis, by adding priors to contextual topic models to incorporate prior knowledge, regularizing contextual topic models with dependency structure of context, and postprocessing contextual patterns to extract refined patterns. The refinements on the general contextual topic model naturally lead to a variety of probabilistic models which incorporate different types of context and various assumptions and constraints. These special versions of the contextual topic model are proved effective in a variety of real applications involving topics and explicit contexts, implicit contexts, and complex contexts. We then introduce a postprocessing procedure for contextual patterns, by generating meaningful labels for multinomial context models. This method provides a general way to interpret text mining results for real users. By applying contextual text mining in the ``context'' of other text information management tasks, including ad hoc text retrieval and web search, we further prove the effectiveness of contextual text mining techniques in a quantitative way with large scale datasets. The framework of contextual text mining not only unifies many explorations of text analysis with context information, but also opens up many new possibilities for future research directions in text mining.