859 results for Data mining, Business intelligence, Market forecasts
Abstract:
Due to rapid advances in computing and sensing technologies, enormous amounts of data are generated every day in various applications. The integration of data mining and data visualization has been widely used to analyze these massive and complex data sets and to discover hidden patterns. For both data mining and visualization to be effective, it is important to include visualization techniques in the mining process and to present the discovered patterns in a more comprehensive visual view. This dissertation studies four related problems to explore the integration of data mining and data visualization: dimensionality reduction for visualizing high-dimensional datasets, visualization-based clustering evaluation, interactive document mining, and exploration of multiple clusterings. In particular, we 1) propose an efficient feature selection method (reliefF + mRMR) for preprocessing high-dimensional datasets; 2) present DClusterE, which integrates cluster validation with user interaction and provides rich visualization tools for users to examine document clustering results from multiple perspectives; 3) design two interactive document summarization systems that involve users' efforts and generate customized summaries from 2D sentence layouts; and 4) propose a new framework that organizes the different input clusterings into a hierarchical tree structure and allows interactive exploration of multiple clustering solutions.
Abstract:
Electronic database handling of business information has gradually gained popularity in the hospitality industry. This article provides an overview of the fundamental concepts of a hotel database and investigates the feasibility of incorporating computer-assisted data mining techniques into hospitality database applications. The author also exposes some potential myths associated with data mining in hospitality database applications.
Abstract:
Peer reviewed
Abstract:
Peer reviewed
Abstract:
Data mining can be defined as the extraction of implicit, previously unknown, and potentially useful information from data. Numerous researchers have developed security technology and explored new methods to detect cyber-attacks using the DARPA 1998 dataset for intrusion detection and its modified versions, KDDCup99 and NSL-KDD, but until now no one has examined the performance of the Top 10 data mining algorithms selected by experts in data mining. The classification learning algorithms compared in this thesis are C4.5, CART, k-NN and Naïve Bayes. Their performance is compared in terms of accuracy, error rate and average cost on modified versions of the NSL-KDD train and test datasets, in which the instances are classified as normal or as one of four cyber-attack categories: DoS, Probing, R2L and U2R. Additionally, the most important features for detecting cyber-attacks, both across all categories and within each category, are evaluated with Weka’s Attribute Evaluator and ranked according to Information Gain. The results show that the classification algorithm with the best performance on the dataset is k-NN. The most important features for detecting cyber-attacks are basic features such as the duration of a network connection in seconds, the protocol used for the connection, the network service used, the normal or error status of the connection and the number of data bytes sent. For DoS, Probing and R2L attacks, the most important features are basic features and the least important are content features. For U2R attacks, by contrast, the content features are the most important.
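The Information Gain ranking mentioned in this abstract can be illustrated with a minimal sketch. It is not the thesis's Weka workflow: it handles only categorical features (Weka discretizes numeric attributes first), and the records, feature names and labels below are made-up toy values, not NSL-KDD data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Label entropy minus the weighted entropy left after splitting on a feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(rows, labels):
    """Return (feature name, gain) pairs sorted from most to least informative."""
    names = rows[0].keys()
    gains = {name: information_gain([r[name] for r in rows], labels) for name in names}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Toy, hypothetical connection records (assumption: categorical features only).
rows = [
    {"protocol_type": "tcp", "flag": "SF"},
    {"protocol_type": "tcp", "flag": "S0"},
    {"protocol_type": "udp", "flag": "SF"},
    {"protocol_type": "icmp", "flag": "SF"},
]
labels = ["normal", "dos", "normal", "dos"]

ranking = rank_features(rows, labels)
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

On this toy data the feature that separates the classes more cleanly receives the higher gain; a ranking on the real dataset would depend on how numeric attributes are discretized.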
Abstract:
Data from the World Federation of Exchanges show that Brazil’s São Paulo stock exchange is one of the largest worldwide in terms of market value. The objective of this study is therefore to obtain univariate and bivariate forecasting models based on intraday data from the futures and spot markets of the BOVESPA index. The aim is to verify whether arbitrage opportunities exist in the Brazilian financial market. To this end, three econometric forecasting models were built: ARFIMA, vector autoregressive (VAR), and vector error correction (VEC). The study also presents the results of a Granger causality test for these series. Studies of this type show the importance of identifying arbitrage opportunities in financial markets and, in particular, of applying these models to data of this nature. Among the forecasts made with these models, VEC showed the best results. The causality test shows that the BOVESPA futures index Granger-causes the BOVESPA spot index. This result may indicate arbitrage opportunities in Brazil.
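As a sketch of what the Granger causality result means, consider a bivariate VAR with $p$ lags for the spot series $s_t$ and the futures series $f_t$. The notation and the lag order $p$ are assumptions made here for illustration; the abstract does not give the study's specification.

```latex
\begin{aligned}
s_t &= c_1 + \sum_{i=1}^{p} a_i\, s_{t-i} + \sum_{i=1}^{p} b_i\, f_{t-i} + \varepsilon_{1,t},\\
f_t &= c_2 + \sum_{i=1}^{p} d_i\, s_{t-i} + \sum_{i=1}^{p} e_i\, f_{t-i} + \varepsilon_{2,t},\\
H_0 &: b_1 = b_2 = \dots = b_p = 0 .
\end{aligned}
```

Rejecting $H_0$ means that lagged futures values improve the forecast of the spot index beyond what its own lags provide, which is the sense in which the futures index Granger-causes the spot index.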
Abstract:
Data mining, a much-discussed term, has been studied in various fields. Its potential for refining the decision-making process, revealing hidden patterns and creating valuable knowledge has won the attention of scholars and practitioners. However, fewer studies have sought to combine data mining and libraries, where data are generated all the time. This thesis therefore aims to fill that gap. It also explores the opportunities data mining creates for enhancing one of the most important elements of libraries: reference service. To demonstrate the feasibility and applicability of data mining, the literature is reviewed to establish a critical understanding of data mining in libraries and to capture the current status of library reference service. The literature review indicates that free online data resources, other than data generated on social media, are rarely considered in current library data mining work. This finding motivates the present study to use free online resources. Furthermore, a natural match between data mining and libraries is established. It is explained by emphasizing the data-rich reality of libraries and by considering data mining as one kind of knowledge, an easy choice for libraries, and a wise method for overcoming reference service challenges. This natural match, especially the aspect that data mining could help library reference service, lays the main theoretical foundation for the empirical work in this study. Turku Main Library was selected as the case to answer the research question: is data mining feasible and applicable for improving reference service? In this case, the daily visits to Turku Main Library from 2009 to 2015 are the resource for data mining. In addition, the corresponding weather conditions are collected from Weather Underground, which is freely available online.
Before being analyzed, the collected dataset is cleansed and preprocessed to ensure the quality of the data mining. Multiple regression analysis is employed to mine the final dataset. Hourly visits are the dependent variable, and weather conditions, the Discomfort Index and the seven days of the week are the independent variables. Four models, one per season, are established to predict visits in each season. Patterns are identified in the different seasons, and implications are drawn from the discovered patterns. In addition, library-climate points are generated by a clustering method, which simplifies the process for librarians using weather data to forecast library visits. The data mining result is then interpreted from the perspective of improving reference service. After this data mining work, the result of the case study is presented to librarians to collect professional opinions on the possibility of employing data mining to improve reference services. The opinions collected are positive, which implies that it is feasible to use data mining as a tool to enhance library reference service.
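A regression of this kind could be written as follows. This is only an illustrative sketch: the symbols, the set of weather variables $W_{j,t}$ and the coding of the weekdays are assumptions, since the abstract does not give the fitted specification.

```latex
V_t = \beta_0 + \sum_{j=1}^{m} \beta_j\, W_{j,t} + \gamma\, \mathrm{DI}_t + \sum_{k=1}^{6} \delta_k\, D_{k,t} + \varepsilon_t
```

Here $V_t$ is the number of visits in period $t$, $\mathrm{DI}_t$ is the Discomfort Index, and $D_{k,t}$ are weekday indicator variables. With an intercept, seven weekdays are usually coded as six indicators, with one day as the reference category, to avoid perfect collinearity. Fitting the model separately on each season's data would give four sets of coefficients, matching the four seasonal models described.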
Abstract:
The remarkably rapid growth of air travel to huge volumes, mainly due to the jet airliners that appeared in the 1950s, created the need for systematic aviation safety research and for collecting data about air traffic. Structured data can be analysed easily by querying databases and running the results through graphic tools. However, mining tools are needed to analyse narratives, which often give more accurate information about a case. Computer analysis of textual data was not possible until data mining tools were developed. Their use, at least in aviation, is still at a moderate level. The research aims to discover lethal trends in flight safety reports. The narratives of 1,200 flight safety reports in Finnish from the years 1994 to 1996 were processed with three text mining tools. One was completely language independent, the second had a specific configuration for Finnish, and the third was originally created for English; because encouraging results had been achieved with Spanish, a test with Finnish was undertaken as well. The global accident rate is stabilising and the situation can now be regarded as satisfactory, but because of the growth in air traffic, the absolute number of fatal accidents per year might increase unless flight safety is improved. Data collection and reporting systems have reached their peak. The focal point for increasing flight safety is analysis. Air traffic has generally been forecast to grow 5 to 6 per cent annually over the next two decades. During this period, global air travel will probably double even under relatively conservative expectations of economic growth. This development confronts airline management with growing pressure from increasing competition, a significant rise in fuel prices and the need to reduce the incident rate given the expected growth in air traffic volumes.
All this emphasises the urgent need for new tools and methods. All three systems provided encouraging results and also revealed challenges still to be overcome. Flight safety can be improved through the development and use of sophisticated analysis tools and methods, such as data mining, whose results support the decision process of executives.
Abstract:
The importance of the decision-making process in determining the success of companies creates the need for a reliable source of information that enables the timely generation of knowledge, available to whoever needs it. The purpose of this research is to establish a frame of reference for the use of Business Intelligence as support for tactical, strategic and operational decisions in companies. It begins by describing the evolution of the information systems used in the decision-making process, driven by the technological changes that have paved the way for the establishment of Business Intelligence as a comprehensive solution to the daily challenges of seeking to generate value through the implementation of optimal decisions. It then describes the architecture of a business intelligence system, defining the basic elements required for it to function correctly: data storage, business functions, management systems and user interfaces. It also describes the process and scope of a correct implementation, through which the benefits these systems offer can be obtained. The methodology followed in the research was descriptive and was based on identifying the degree to which Business Intelligence is used by decision makers, represented by former students and graduates of the Maestría en Administración Financiera of the Universidad de El Salvador in the period 2006-2015.
Abstract:
This work aimed to systematize and analyze the information available in the literature on seedling production techniques for six native and exotic forest species in the Amazon biome.
Abstract:
The analysis in this thesis is based on data obtained by directly contacting manufacturing and/or design companies and covers two energy sectors: cogeneration and solar photovoltaics. In the first, 60 complete cogeneration units were studied by constructing plots relating output power to price. These revealed a strong economy-of-scale effect and a varying split of the power produced. For small- and medium-power units, thermal energy production predominates, and its percentage share decreases as unit size increases. In the solar photovoltaic sector the analysis was more diversified and detailed. Seventy panels were analyzed: 35 monocrystalline (14 European and 21 Asian) and 35 polycrystalline (26 European and 9 Asian), for which the trends in yield, specific power and specific cost were plotted. Monocrystalline panels, intrinsically more efficient thanks to their higher degree of purity, show a rather steep rise in cost, although one that is not proportional to the increase in performance. Among polycrystalline panels, by contrast, there is no clearly increasing trend in performance as cost rises. Comparing modules grouped by macro-region of production shows that Asia produces the most efficient and most expensive monocrystalline Si panels, whereas its polycrystalline Si panels are cheaper but perform in line with European ones. The data collected also allowed a limited analysis of thin-film panels. Those using amorphous Si perform so poorly that they cannot truly establish themselves on the market. Market success is instead expected to be possible for panels based on other semiconductor materials, which offer good performance levels but are still too expensive.
Abstract:
The semiarid region of northeastern Brazil, the Caatinga, is extremely important due to its biodiversity and endemism. Measurements of plant physiology are crucial to the calibration of Dynamic Global Vegetation Models (DGVMs), which are currently used to simulate the responses of vegetation in the face of global change. In fieldwork conducted in an area of preserved Caatinga forest located in Petrolina, Pernambuco, measurements of carbon assimilation (in response to light and CO2) were performed on 11 individuals of Poincianella microphylla, a native species that is abundant in this region. These data were used to calibrate the maximum carboxylation velocity (Vcmax) used in the INLAND model. The calibration techniques used were Multiple Linear Regression (MLR) and data mining techniques such as the Classification And Regression Tree (CART) and K-MEANS. The results were compared with the UNCALIBRATED model. Simulated Gross Primary Productivity (GPP) reached 72% of observed GPP when using the calibrated Vcmax values, whereas the UNCALIBRATED approach accounted for 42% of observed GPP. Thus, this work shows the benefits of calibrating DGVMs using field ecophysiological measurements, especially in areas where field data are scarce or non-existent, such as the Caatinga.