847 resultados para Data Mining, Big Data, Consumi energetici, Weka Data Cleaning
Resumo:
Degut al gran interès actual per instal·lar clústers dedicats al tractament de dades amb Hadoop, s'ha dissenyat una distribució de Linux que automatitza totes les tasques associades. Aquesta distribució permet fer el desplegament sobre un clúster i realitzar una configuració bàsica del mateix de la forma més desatesa possible.
Resumo:
En aquest treball es presenta el laboratori que l'eLearn Center posa a disposició del professorat i investigadors en e-learning de la UOC per al disseny sistemàtic d'experiments sota una perspectiva de Learning Analytics, però tambémecanismes per fer seguiment i documentar tot el procés que envolta el disseny d'experiències docents de forma que sigui més senzill transferir-les a altres àmbits.
Resumo:
Este proyecto de final de carrera corresponde al área de inteligencia artificial y representa un caso de uso que pretende utilizar datos reales referentes a accidentes de tráfico (datos de accidentes, muertos, heridos, etc.) y analizarlas conjuntamente con datos que puedan tener una posible relación con los accidentes como el parque de vehículos, las temperaturas de la zona de los accidentes, etc. con la finalidad de poder obtener las posibles relaciones causa-efecto.
Resumo:
Aquest treball de final de carrera vol donar una solució a un suposat encàrrec de la Unió Europea de construir una base de dades relacional que permeti emmagatzemar dades de l'activitat física dels ciutadans, obtingudes a partir de dispositius wearables, i dades de l'estat de salut i malalties diagnosticades, recollides pels sistemes informàtics dels diferents serveis de salut. Amb totes aquestes dades recopilades la nostra base de dades permetrà, a través d'aplicacions d'alt nivell, extreure informació útil que permeti conèixer l'estat de salut real dels ciutadans i dissenyar actuacions i campanyes que permetin la seva millora.
Resumo:
Development of methods to explore data from educational settings, to understand better the learning process.
Resumo:
Marketing scholars have suggested a need for more empirical research on consumer response to malls, in order to have a better understanding of the variables that explain the behavior of the consumers. The segmentation methodology CHAID (Chi-square automatic interaction detection) was used in order to identify the profiles of consumers with regard to their activities at malls, on the basis of socio-demographic variables and behavioral variables (how and with whom they go to the malls). A sample of 790 subjects answered an online questionnaire. The CHAID analysis of the results was used to identify the profiles of consumers with regard to their activities at malls. In the set of variables analyzed the transport used in order to go shopping and the frequency of visits to centers are the main predictors of behavior in malls. The results provide guidelines for the development of effective strategies to attract consumers to malls and retain them there.
Resumo:
Visual data mining (VDM) tools employ information visualization techniques in order to represent large amounts of high-dimensional data graphically and to involve the user in exploring data at different levels of detail. The users are looking for outliers, patterns and models – in the form of clusters, classes, trends, and relationships – in different categories of data, i.e., financial, business information, etc. The focus of this thesis is the evaluation of multidimensional visualization techniques, especially from the business user’s perspective. We address three research problems. The first problem is the evaluation of projection-based visualizations with respect to their effectiveness in preserving the original distances between data points and the clustering structure of the data. In this respect, we propose the use of existing clustering validity measures. We illustrate their usefulness in evaluating five visualization techniques: Principal Components Analysis (PCA), Sammon’s Mapping, Self-Organizing Map (SOM), Radial Coordinate Visualization and Star Coordinates. The second problem is concerned with evaluating different visualization techniques as to their effectiveness in visual data mining of business data. For this purpose, we propose an inquiry evaluation technique and conduct the evaluation of nine visualization techniques. The visualizations under evaluation are Multiple Line Graphs, Permutation Matrix, Survey Plot, Scatter Plot Matrix, Parallel Coordinates, Treemap, PCA, Sammon’s Mapping and the SOM. The third problem is the evaluation of quality of use of VDM tools. We provide a conceptual framework for evaluating the quality of use of VDM tools and apply it to the evaluation of the SOM. In the evaluation, we use an inquiry technique for which we developed a questionnaire based on the proposed framework. The contributions of the thesis consist of three new evaluation techniques and the results obtained by applying these evaluation techniques. The thesis provides a systematic approach to evaluation of various visualization techniques. In this respect, first, we performed and described the evaluations in a systematic way, highlighting the evaluation activities, and their inputs and outputs. Secondly, we integrated the evaluation studies in the broad framework of usability evaluation. The results of the evaluations are intended to help developers and researchers of visualization systems to select appropriate visualization techniques in specific situations. The results of the evaluations also contribute to the understanding of the strengths and limitations of the visualization techniques evaluated and further to the improvement of these techniques.
Resumo:
Anders Söderbäckin esitys Kirjastoverkkopäivillä 26.10.2011 Helsingissä.
Resumo:
This study aimed at identifying different conditions of coffee plants after harvesting period, using data mining and spectral behavior profiles from Hyperion/EO1 sensor. The Hyperion image, with spatial resolution of 30 m, was acquired in August 28th, 2008, at the end of the coffee harvest season in the studied area. For pre-processing imaging, atmospheric and signal/noise effect corrections were carried out using Flaash and MNF (Minimum Noise Fraction Transform) algorithms, respectively. Spectral behavior profiles (38) of different coffee varieties were generated from 150 Hyperion bands. The spectral behavior profiles were analyzed by Expectation-Maximization (EM) algorithm considering 2; 3; 4 and 5 clusters. T-test with 5% of significance was used to verify the similarity among the wavelength cluster means. The results demonstrated that it is possible to separate five different clusters, which were comprised by different coffee crop conditions making possible to improve future intervention actions.
Resumo:
Poster at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014
Resumo:
Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014
Resumo:
The whole research of the current Master Thesis project is related to Big Data transfer over Parallel Data Link and my main objective is to assist the Saint-Petersburg National Research University ITMO research team to accomplish this project and apply Green IT methods for the data transfer system. The goal of the team is to transfer Big Data by using parallel data links with SDN Openflow approach. My task as a team member was to compare existing data transfer applications in case to verify which results the highest data transfer speed in which occasions and explain the reasons. In the context of this thesis work a comparison between 5 different utilities was done, which including Fast Data Transfer (FDT), BBCP, BBFTP, GridFTP, and FTS3. A number of scripts where developed which consist of creating random binary data to be incompressible to have fair comparison between utilities, execute the Utilities with specified parameters, create log files, results, system parameters, and plot graphs to compare the results. Transferring such an enormous variety of data can take a long time, and hence, the necessity appears to reduce the energy consumption to make them greener. In the context of Green IT approach, our team used Cloud Computing infrastructure called OpenStack. It’s more efficient to allocated specific amount of hardware resources to test different scenarios rather than using the whole resources from our testbed. Testing our implementation with OpenStack infrastructure results that the virtual channel does not consist of any traffic and we can achieve the highest possible throughput. After receiving the final results we are in place to identify which utilities produce faster data transfer in different scenarios with specific TCP parameters and we can use them in real network data links.
Resumo:
This thesis introduces heat demand forecasting models which are generated by using data mining algorithms. The forecast spans one full day and this forecast can be used in regulating heat consumption of buildings. For training the data mining models, two years of heat consumption data from a case building and weather measurement data from Finnish Meteorological Institute are used. The thesis utilizes Microsoft SQL Server Analysis Services data mining tools in generating the data mining models and CRISP-DM process framework to implement the research. Results show that the built models can predict heat demand at best with mean average percentage errors of 3.8% for 24-h profile and 5.9% for full day. A deployment model for integrating the generated data mining models into an existing building energy management system is also discussed.