905 resultados para Web Mining, Data Mining, User Topic Model, Web User Profiles
Resumo:
Trabalho apresentado no âmbito do Mestrado em Engenharia Informática, como requisito parcial para obtenção do grau de Mestre em Engenharia Informática
Resumo:
Dissertação apresentada na Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa para obtenção do grau de Mestre em Engenharia Electrotécnica e de Computadores
Resumo:
This paper discusses the results of applied research on the eco-driving domain based on a huge data set produced from a fleet of Lisbon's public transportation buses for a three-year period. This data set is based on events automatically extracted from the control area network bus and enriched with GPS coordinates, weather conditions, and road information. We apply online analytical processing (OLAP) and knowledge discovery (KD) techniques to deal with the high volume of this data set and to determine the major factors that influence the average fuel consumption, and then classify the drivers involved according to their driving efficiency. Consequently, we identify the most appropriate driving practices and styles. Our findings show that introducing simple practices, such as optimal clutch, engine rotation, and engine running in idle, can reduce fuel consumption on average from 3 to 5l/100 km, meaning a saving of 30 l per bus on one day. These findings have been strongly considered in the drivers' training sessions.
Resumo:
More than ever, there is an increase of the number of decision support methods and computer aided diagnostic systems applied to various areas of medicine. In breast cancer research, many works have been done in order to reduce false-positives when used as a double reading method. In this study, we aimed to present a set of data mining techniques that were applied to approach a decision support system in the area of breast cancer diagnosis. This method is geared to assist clinical practice in identifying mammographic findings such as microcalcifications, masses and even normal tissues, in order to avoid misdiagnosis. In this work a reliable database was used, with 410 images from about 115 patients, containing previous reviews performed by radiologists as microcalcifications, masses and also normal tissue findings. Throughout this work, two feature extraction techniques were used: the gray level co-occurrence matrix and the gray level run length matrix. For classification purposes, we considered various scenarios according to different distinct patterns of injuries and several classifiers in order to distinguish the best performance in each case described. The many classifiers used were Naïve Bayes, Support Vector Machines, k-nearest Neighbors and Decision Trees (J48 and Random Forests). The results in distinguishing mammographic findings revealed great percentages of PPV and very good accuracy values. Furthermore, it also presented other related results of classification of breast density and BI-RADS® scale. The best predictive method found for all tested groups was the Random Forest classifier, and the best performance has been achieved through the distinction of microcalcifications. The conclusions based on the several tested scenarios represent a new perspective in breast cancer diagnosis using data mining techniques.
Resumo:
Worldwide electricity markets have been evolving into regional and even continental scales. The aim at an efficient use of renewable based generation in places where it exceeds the local needs is one of the main reasons. A reference case of this evolution is the European Electricity Market, where countries are connected, and several regional markets were created, each one grouping several countries, and supporting transactions of huge amounts of electrical energy. The continuous transformations electricity markets have been experiencing over the years create the need to use simulation platforms to support operators, regulators, and involved players for understanding and dealing with this complex environment. This paper focuses on demonstrating the advantage that real electricity markets data has for the creation of realistic simulation scenarios, which allow the study of the impacts and implications that electricity markets transformations will bring to the participant countries. A case study using MASCEM (Multi-Agent System for Competitive Electricity Markets) is presented, with a scenario based on real data, simulating the European Electricity Market environment, and comparing its performance when using several different market mechanisms.
Resumo:
This paper presents the Realistic Scenarios Generator (RealScen), a tool that processes data from real electricity markets to generate realistic scenarios that enable the modeling of electricity market players’ characteristics and strategic behavior. The proposed tool provides significant advantages to the decision making process in an electricity market environment, especially when coupled with a multi-agent electricity markets simulator. The generation of realistic scenarios is performed using mechanisms for intelligent data analysis, which are based on artificial intelligence and data mining algorithms. These techniques allow the study of realistic scenarios, adapted to the existing markets, and improve the representation of market entities as software agents, enabling a detailed modeling of their profiles and strategies. This work contributes significantly to the understanding of the interactions between the entities acting in electricity markets by increasing the capability and realism of market simulations.
Resumo:
Dissertation presented to obtain the Ph.D degree in Bioinformatics
Resumo:
Apresentam-se os resultados parcelares de um estudo destinado a promover um melhor conhecimento das estratégias que os jovens em idade escolar (12-18 anos) consideram relevantes para avaliar as fontes de informação disponíveis na Internet. Para o efeito, foi aplicado um inquérito distribuído a uma amostra de 195 alunos de uma escola do 3o ciclo e outra do ensino secundário de um concelho do distrito do Porto. São apresentados e discutidos os resultados acerca da perceção destes alunos quanto aos critérios a aplicar na avaliação das fontes de informação disponíveis na Internet, na vertente da credibilidade. Serão apresenta- das as práticas que os jovens declaram ter relativamente ao uso de critérios de autoria, originalidade, estrutura, atualidade e de comparação para avaliar a credibilidade das fontes de informação. Em complemento, estes resultados serão comparados e discutidos com as perceções que os mesmos inquiridos demonstram possuir relativamente aos elementos que compõem cada um destes critérios. A análise dos dados obtidos é enquadrada e sustentada numa revisão da literatura acerca do conceito de credibilidade, aplicado às fontes de informação disponíveis na Internet. São ainda abordados alguns tópicos relaciona- dos com a inclusão de estratégias de avaliação da credibilidade da informação digital no modelo Big6, um dos modelos de desenvolvimento de competências de literacia da informação mais conhecidos e utilizados nas bibliotecas escolares portuguesas.
Resumo:
Thesis submitted to Faculdade de Ciências e Tecnologia of the Universidade Nova de Lisboa, in partial fulfilment of the requirements for the degree of Master in Computer Science
Resumo:
Harnessing idle PCs CPU cycles, storage space and other resources of networked computers to collaborative are mainly fixated on for all major grid computing research projects. Most of the university computers labs are occupied with the high puissant desktop PC nowadays. It is plausible to notice that most of the time machines are lying idle or wasting their computing power without utilizing in felicitous ways. However, for intricate quandaries and for analyzing astronomically immense amounts of data, sizably voluminous computational resources are required. For such quandaries, one may run the analysis algorithms in very puissant and expensive computers, which reduces the number of users that can afford such data analysis tasks. Instead of utilizing single expensive machines, distributed computing systems, offers the possibility of utilizing a set of much less expensive machines to do the same task. BOINC and Condor projects have been prosperously utilized for solving authentic scientific research works around the world at a low cost. In this work the main goal is to explore both distributed computing to implement, Condor and BOINC, and utilize their potency to harness the ideal PCs resources for the academic researchers to utilize in their research work. In this thesis, Data mining tasks have been performed in implementation of several machine learning algorithms on the distributed computing environment.
Resumo:
Data Mining (DM) methods are being increasingly used in prediction with time series data, in addition to traditional statistical approaches. This paper presents a literature review of the use of DM with time series data, focusing on short- time stocks prediction. This is an area that has been attracting a great deal of attention from researchers in the field. The main contribution of this paper is to provide an outline of the use of DM with time series data, using mainly examples related with short-term stocks prediction. This is important to a better understanding of the field. Some of the main trends and open issues will also be introduced.
Resumo:
Dissertation submitted in partial fulfillment of the requirements for the Degree of Master of Science in Geospatial Technologies.
Resumo:
Dissertation submitted in partial fulfillment of the requirements for the Degree of Master of Science in Geospatial Technologies.
Resumo:
Muito se tem falado sobre revolução tecnológica e do aparecimento constante de novas aplicações Web, com novas funcionalidades que visam facilitar o trabalho dos utilizadores. Mas será que estas aplicações garantem que os dados transmitidos são tratados e enviados por canais seguros (protocolos)? Que garantias é que o utilizador tem que mesmo que a aplicação utilize um canal, que prevê a privacidade e integridade de dados, esta não apresente alguma vulnerabilidade pondo em causa a informação sensível do utilizador? Software que não foi devidamente testado, aliado à falta de sensibilização por parte dos responsáveis pelo desenvolvimento de software para questões de segurança, levam ao aumento de vulnerabilidades e assim exponenciam o número de potenciais vítimas. Isto aliado ao efeito de desinibição que o sentimento de invisibilidade pode provocar, conduz ao facilitismo e consequentemente ao aumento do número de vítimas alvos de ataques informáticos. O utilizador, por vezes, não sabe muito bem do que se deve proteger, pois a confiança que depõem no software não pressupõem que os seus dados estejam em risco. Neste contexto foram recolhidos dados históricos relativos a vulnerabilidades nos protocolos SSL/TLS, para perceber o impacto que as mesmas apresentam e avaliar o grau de risco. Para além disso, foram avaliados um número significativo de domínios portugueses para perceber se os mesmos têm uma vulnerabilidade específica do protocolo SSL/TLS.
Resumo:
Trabalho de Projeto apresentado como requisito parcial para obtenção do grau de Mestre em Estatística e Gestão de Informação