766 results for Data Mining, Big Data, Consumi energetici, Weka Data Cleaning
Abstract:
C3S2E '16 Proceedings of the Ninth International C* Conference on Computer Science & Software Engineering
Abstract:
The aim of the project is to show the features that Oracle offers in the field of data mining, in order to determine whether it can be a suitable platform for research and teaching at the university. The first part of the project studies the "Oracle Data Miner" application and how, through a visual and intuitive workflow, the different mining techniques (classification, regression, clustering, and association) can be applied. To demonstrate these techniques, datasets from the University of California, Irvine were used, which made it possible to observe the behavior of the different algorithms in real situations. For each technique, the project explains how to evaluate its reliability and how to interpret the results obtained from applying it. It also shows how to apply the techniques using the PL/SQL language, which makes it straightforward to integrate data mining into our applications. In the second part of the project, a prototype application that uses data mining was built; specifically, it uses classification to obtain the diagnosis and the probability that a breast tumor is malignant or benign from the results of a cytology test.
Abstract:
International audience
Abstract:
The Exhibitium Project, awarded by the BBVA Foundation, is a data-driven project developed by an international consortium of research groups. One of its main objectives is to build a prototype that will serve as the basis for a platform for recording and exploiting data about art exhibitions available on the Internet. Our proposal therefore sets out the methods, procedures, and decision-making processes that have governed the technological implementation of this prototype, especially with regard to the reuse of WordPress (WP) as a development framework.
Abstract:
With the exponential growth in the usage of web-based map services, web GIS applications have become increasingly popular. Spatial data indexing, search, analysis, visualization, and resource management for such services are becoming increasingly important for delivering the Quality of Service that users expect. First, spatial indexing is typically time-consuming and is not available to end users. To address this, we introduce TerraFly sksOpen, an open-source online indexing and querying system for big geospatial data. Integrated with the TerraFly Geospatial database [1-9], sksOpen is an efficient indexing and query engine for processing Top-k Spatial Boolean Queries. Further, we provide ergonomic visualization of query results on interactive maps to facilitate the user's data analysis. Second, due to the highly complex and dynamic nature of GIS systems, it is quite challenging for end users to quickly understand and analyze spatial data, and to efficiently share their own data and analysis results with others. Built on the TerraFly Geospatial database, TerraFly GeoCloud is an extra layer running on top of the TerraFly map that efficiently supports many different visualization functions and spatial data analysis models. Furthermore, users can create unique URLs to visualize and share the analysis results. TerraFly GeoCloud also enables the MapQL technology, which customizes map visualization using SQL-like statements [10]. Third, map systems often serve dynamic web workloads and involve multiple CPU- and I/O-intensive tiers, which makes it challenging to meet the response time targets of map requests while using resources efficiently. Virtualization facilitates the deployment of web map services and improves their resource utilization through encapsulation and consolidation. Autonomic resource management allows resources to be automatically provisioned to a map service and its internal tiers on demand.
v-TerraFly comprises techniques to predict the demand of map workloads online and optimize resource allocations, considering both response time and data freshness as the QoS target. The proposed v-TerraFly system is prototyped on TerraFly, a production web map service, and evaluated using real TerraFly workloads. The results show that v-TerraFly can accurately predict workload demands (18.91% more accurate) and efficiently allocate resources to meet the QoS target: it improves the QoS by 26.19% and reduces resource usage by 20.83% compared to traditional peak-load-based resource allocation.
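As a rough illustration of what a Top-k Spatial Boolean Query asks for, the sketch below returns the k objects nearest to a query point among those whose keywords satisfy a Boolean predicate. It is a brute-force scan under stated assumptions, not the sksOpen index: the object layout (`location`, `keywords`), the conjunctive "all keywords required" predicate, planar Euclidean distance, and the sample data are all hypothetical.

```python
import heapq
import math


def top_k_spatial_boolean(objects, query_point, required_keywords, k):
    """Return the k nearest objects that contain every required keyword.

    Assumption: each object is a dict with a 2-D "location" tuple and a
    "keywords" set; distance is planar Euclidean, not geodesic.
    """
    required = set(required_keywords)
    # Boolean filter: keep only objects whose keyword set covers the predicate.
    matches = (obj for obj in objects if required <= obj["keywords"])
    # Spatial ranking: the k matches closest to the query point.
    return heapq.nsmallest(
        k, matches, key=lambda obj: math.dist(obj["location"], query_point)
    )


# Hypothetical sample data for illustration only.
places = [
    {"name": "A", "location": (0.0, 1.0), "keywords": {"hotel", "pool"}},
    {"name": "B", "location": (5.0, 5.0), "keywords": {"hotel", "pool", "wifi"}},
    {"name": "C", "location": (0.5, 0.5), "keywords": {"hotel"}},
    {"name": "D", "location": (2.0, 0.0), "keywords": {"hotel", "pool"}},
]

nearest = top_k_spatial_boolean(places, (0.0, 0.0), {"hotel", "pool"}, k=2)
print([p["name"] for p in nearest])
```

A linear scan like this costs time proportional to the number of objects per query; a dedicated index exists precisely to avoid that cost on large geospatial datasets.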
Abstract:
Effective decision making uses various databases, including both micro- and macro-level datasets. In many cases it is a big challenge to ensure the consistency of the two levels. Different types of problems can occur, and several methods can be used to solve them. The paper concentrates on the input alignment of household income for microsimulation, which refers to improving the elements of a micro data survey (EU-SILC) by using macro data from administrative sources. We use a combined micro-macro model called ECONS-TAX for this improvement. We also produced model projections through 2015, which is important because the official EU-SILC micro database will only be available in Hungary in the summer of 2017. The paper presents our estimates of the dynamics of income elements and the changes in income inequality. Results show that the aligned data yield a different level of income inequality but do not affect the direction of change from year to year. However, when we analyzed a policy change, the use of aligned data caused larger differences both in income levels and in their dynamics.
Abstract:
Clustering data streams is an important task in data mining research. Recently, some algorithms have been proposed to cluster data streams as a whole, but only a few of them deal with multivariate data streams. Even so, these algorithms merely aggregate the attributes without addressing the correlation among them. To overcome this issue, we propose a new framework to cluster multivariate data streams based on their evolving behavior over time, exploring the correlations among their attributes by computing the fractal dimension. Experimental results with climate data streams show that the clusters' quality and compactness can be improved compared to the competing method, suggesting that correlations among attributes should not be ignored. In fact, cluster compactness is 7 to 25 times better with our method. Our framework also proves to be a useful tool to assist meteorologists in understanding climate behavior over a period of time.
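To make the role of the fractal dimension concrete, here is a minimal sketch of a box-counting estimate for a set of multivariate points. The abstract does not say which estimator the framework uses, so box counting, the function name, the number of grid levels, and the synthetic data are assumptions for illustration. The idea is that strongly correlated attributes occupy a lower-dimensional region than the number of attributes suggests.

```python
import math


def box_counting_dimension(points, levels=5):
    """Estimate the box-counting dimension of a list of equal-length tuples.

    Each attribute is rescaled to [0, 1]; for grids with 2, 4, ..., 2**levels
    cells per axis, the occupied cells are counted, and the dimension is the
    least-squares slope of log(count) against log(cells per axis).
    """
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    spans = [(hi - lo) or 1.0 for lo, hi in zip(lows, highs)]

    log_scale, log_count = [], []
    for level in range(1, levels + 1):
        cells = 2 ** level
        occupied = set()
        for p in points:
            cell = tuple(
                min(int((p[d] - lows[d]) / spans[d] * cells), cells - 1)
                for d in range(dims)
            )
            occupied.add(cell)
        log_scale.append(math.log(cells))
        log_count.append(math.log(len(occupied)))

    # Ordinary least-squares slope of log_count on log_scale.
    mean_x = sum(log_scale) / levels
    mean_y = sum(log_count) / levels
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_scale, log_count))
    den = sum((x - mean_x) ** 2 for x in log_scale)
    return num / den


# Synthetic data: two perfectly correlated attributes versus two independent ones.
correlated = [(i / 4096, i / 4096) for i in range(4096)]
independent = [(i / 64, j / 64) for i in range(64) for j in range(64)]

print(box_counting_dimension(correlated))   # close to 1
print(box_counting_dimension(independent))  # close to 2
```

Both synthetic sets have two attributes, but the correlated one behaves like a one-dimensional object, which is the kind of signal a correlation-aware clustering method can exploit.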
Abstract:
Recent data indicate that levels of overweight and obesity are increasing at an alarming rate throughout the world. At a population level (and commonly to assess individual health risk), the prevalence of overweight and obesity is calculated using cut-offs of the Body Mass Index (BMI) derived from height and weight. Similarly, the BMI is also used to classify individuals and to provide a notional indication of potential health risk. It is likely that epidemiologic surveys that rely on BMI as a measure of adiposity will overestimate the number of individuals in the overweight (and slightly obese) categories. This tendency to misclassify individuals may be more pronounced in athletic populations or groups in which the proportion of more active individuals is higher. This differential is most pronounced in sports where it is advantageous to have a high BMI (but not necessarily high fatness). To illustrate this point we calculated the BMIs of international professional rugby players from the four teams involved in the semi-finals of the 2003 Rugby Union World Cup. According to the World Health Organisation (WHO) cut-offs for BMI, approximately 65% of the players were classified as overweight and approximately 25% as obese. These findings demonstrate that a high BMI is commonplace (and a potentially desirable attribute for sport performance) in professional rugby players. An unanswered question is what proportion of the wider population, classified as overweight (or obese) according to the BMI, is misclassified according to both fatness and health risk. It is evident that being overweight should not be an obstacle to a physically active lifestyle. Similarly, a reliance on BMI alone may misclassify a number of individuals who might otherwise have been automatically considered fat and/or unfit.
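The classification described above can be sketched in a few lines. BMI is weight in kilograms divided by the square of height in metres, and the standard WHO adult cut-offs are 18.5, 25, and 30. The function names and the example heights and weights below are hypothetical and are not data from the study.

```python
def bmi(weight_kg, height_m):
    """Body Mass Index: weight (kg) divided by height (m) squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def who_category(bmi_value):
    """Classify a BMI value using the standard WHO adult cut-offs."""
    if bmi_value < 18.5:
        return "underweight"
    if bmi_value < 25.0:
        return "normal"
    if bmi_value < 30.0:
        return "overweight"
    return "obese"


# Hypothetical athletes (weight in kg, height in m), for illustration only.
for weight, height in [(105, 1.88), (110, 1.85), (78, 1.80)]:
    value = bmi(weight, height)
    print(f"{weight} kg, {height} m: BMI {value:.1f} ({who_category(value)})")
```

Because the formula sees only total mass and height, a heavily muscled athlete and a person with high body fat can land in the same category, which is the misclassification the abstract describes.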
Abstract:
In this paper, a singularly perturbed ordinary differential equation with non-smooth data is considered. The numerical method is constructed as a Petrov-Galerkin finite element method with piecewise-exponential test functions and piecewise-linear trial functions. A special technique is used at the point where the coefficient is discontinuous. The method is shown to be first-order accurate and to converge uniformly with respect to the singular perturbation parameter. Finally, numerical results are presented that agree with the theoretical results.