916 results for Data Mining and its Application
Abstract:
Advances in hardware and software technology enable us to collect, store and distribute large quantities of data on a very large scale. Automatically discovering and extracting hidden knowledge in the form of patterns from these large data volumes is known as data mining. Data mining technology is not only a part of business intelligence, but is also used in many other application areas such as research, marketing and financial analytics. For example, medical scientists can use patterns extracted from historical patient data to determine whether a new patient is likely to respond positively to a particular treatment; marketing analysts can use patterns extracted from customer data for future advertisement campaigns; finance experts are interested in patterns that forecast the development of certain stock market shares for investment recommendations. However, extracting knowledge in the form of patterns from massive data volumes imposes a number of computational challenges in terms of processing time, memory, bandwidth and power consumption. These challenges have led to the development of parallel and distributed data analysis approaches and the utilisation of Grid and Cloud computing. This chapter gives an overview of parallel and distributed computing approaches and how they can be used to scale up data mining to large datasets.
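As a hedged illustration of the data-parallel pattern such approaches build on, the sketch below partitions a transaction database, counts item support on each worker, and merges the partial counts; the data and names are illustrative, not from the chapter.

```python
# A minimal sketch of the data-parallel pattern: partition the data, mine
# partial results on each worker, then merge. Illustrated with support
# counting for frequent-itemset mining; all names are illustrative.
from collections import Counter
from multiprocessing import Pool

def count_items(partition):
    """Count item occurrences in one data partition (the 'map' step)."""
    counts = Counter()
    for transaction in partition:
        counts.update(set(transaction))
    return counts

def parallel_support(transactions, n_workers=4, min_support=2):
    # Split the transaction database into roughly equal partitions.
    partitions = [transactions[i::n_workers] for i in range(n_workers)]
    with Pool(n_workers) as pool:
        partial = pool.map(count_items, partitions)
    # Merge the partial counts (the 'reduce' step) and filter by support.
    total = sum(partial, Counter())
    return {item: c for item, c in total.items() if c >= min_support}

if __name__ == "__main__":
    db = [["bread", "milk"], ["bread", "beer"], ["milk", "beer", "bread"]]
    print(parallel_support(db, n_workers=2, min_support=2))
```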
Abstract:
This paper presents an Advanced Traveler Information System (ATIS) developed on the Android platform, which is open source and free. The application's main objective is to enable free Vehicle-to-Infrastructure (V2I) communication through the wireless network access points available in urban centers. In addition to sending the information an Intelligent Transportation System (ITS) needs to a central server, the application also receives traffic data for the area close to the vehicle. Once this traffic information is obtained, the application displays it to the driver in a clear and efficient way, allowing the user to make decisions about his route in real time. The application was tested in a real environment and the results are presented in the article. In conclusion, we present the benefits of this application. © 2012 IEEE.
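The paper does not specify its communication protocol, so the following Python sketch only illustrates the kind of V2I exchange described: the client reports probe data to a central ITS server and fetches traffic conditions near the vehicle. The server URL and field names are hypothetical.

```python
# A hedged sketch of a V2I exchange like the one the abstract describes.
# Endpoint URLs and JSON fields are hypothetical, not from the paper.
import requests

SERVER = "http://its-server.example.org/api"  # hypothetical ITS server

def report_position(vehicle_id, lat, lon, speed_kmh):
    # Send the probe data an ITS needs (position and speed) to the server.
    requests.post(f"{SERVER}/probes",
                  json={"id": vehicle_id, "lat": lat, "lon": lon,
                        "speed": speed_kmh},
                  timeout=5)

def nearby_traffic(lat, lon, radius_m=500):
    # Fetch traffic conditions close to the vehicle for display to the driver.
    r = requests.get(f"{SERVER}/traffic",
                     params={"lat": lat, "lon": lon, "radius": radius_m},
                     timeout=5)
    r.raise_for_status()
    return r.json()
```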
Abstract:
In soil surveys, several sampling systems can be used to define the most representative sites for sample collection and description of soil profiles. In recent years, the conditioned Latin hypercube sampling system has gained prominence for soil surveys. In Brazil, most soil maps are at small scales and in paper format, which hinders their refinement. The objectives of this work are: (i) to compare two sampling systems based on the conditioned Latin hypercube to map soil classes and soil properties; (ii) to retrieve information from a detailed-scale soil map of a pilot watershed for its refinement, comparing two data mining tools, and to validate the new soil map; and (iii) to create and validate a soil map of a much larger, similar area by extrapolating the information extracted from the existing soil map. Two sampling schemes were created: one by the conditioned Latin hypercube and one by the cost-constrained conditioned Latin hypercube. At each prospection site, soil classification and measurement of the A horizon thickness were performed. Maps were generated and validated for each sampling scheme, comparing the efficiency of the methods. The conditioned Latin hypercube captured greater variability of soils and properties than the cost-constrained version, although it made field work more difficult. The conditioned Latin hypercube can thus capture greater soil variability, while the cost-constrained conditioned Latin hypercube presents great potential for use in soil surveys, especially in areas of difficult access. From an existing detailed-scale soil map of a pilot watershed, topographical information for each soil class was extracted from a Digital Elevation Model and its derivatives by two data mining tools. Maps were generated using each tool. The more accurate of the tools was used to extrapolate soil information to a much larger, similar area, and the generated map was validated. It was possible to retrieve the existing soil map information and apply it to a larger area with similar soil-forming factors at much lower financial cost. The KnowledgeMiner data mining tool, together with ArcSIE, used to create the soil map, presented the better results and enabled the use of an existing soil map to extract soil information and apply it to similar, larger areas at reduced cost, which is especially important in developing countries with limited financial resources for such activities, such as Brazil.
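For readers unfamiliar with the method, the sketch below is a minimal, illustrative implementation of conditioned Latin hypercube sampling: it selects k sites whose covariate values fill the k quantile strata of each covariate as evenly as possible. Production cLHS implementations use simulated annealing; a simple improvement-only swap search stands in here.

```python
# A minimal sketch of conditioned Latin hypercube sampling (cLHS).
import numpy as np

def clhs(X, k, n_iter=2000, rng=np.random.default_rng(0)):
    n, p = X.shape
    # Stratum index of every site for every covariate (k quantile strata).
    edges = np.quantile(X, np.linspace(0, 1, k + 1)[1:-1], axis=0)
    strata = np.stack([np.searchsorted(edges[:, j], X[:, j])
                       for j in range(p)], axis=1)

    def cost(idx):
        # Deviation from the ideal of one sample per stratum per covariate.
        c = 0
        for j in range(p):
            counts = np.bincount(strata[idx, j], minlength=k)
            c += np.abs(counts - 1).sum()
        return c

    idx = rng.choice(n, size=k, replace=False)
    best = cost(idx)
    for _ in range(n_iter):
        cand = idx.copy()
        out = rng.integers(k)                 # position of sample to drop
        pool = np.setdiff1d(np.arange(n), cand)
        cand[out] = rng.choice(pool)          # unused site to add instead
        c = cost(cand)
        if c <= best:                         # keep only non-worsening swaps
            idx, best = cand, c
    return idx

X = np.random.default_rng(1).normal(size=(500, 3))  # e.g. elevation, slope, NDVI
print(clhs(X, k=20))
```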
Spatial Data Mining to Support Environmental Management and Decision Making - A Case Study in Brazil
Abstract:
We use long instrumental temperature series together with available field reconstructions of sea-level pressure (SLP) and three-dimensional climate model simulations to analyze relations between temperature anomalies and atmospheric circulation patterns over much of Europe and the Mediterranean for the late winter/early spring (January–April, JFMA) season. A Canonical Correlation Analysis (CCA) investigates interannual to interdecadal covariability between a new gridded SLP field reconstruction and seven long instrumental temperature series covering the past 250 years. We then present and discuss prominent atmospheric circulation patterns related to anomalously warm and cold JFMA conditions within different European areas spanning the period 1760–2007. Next, using a data assimilation technique, we link gridded SLP data with a climate model (EC-Bilt-Clio) for a better dynamical understanding of the relationship between large-scale circulation and European climate. We thus present an alternative approach to reconstructing climate for the pre-instrumental period based on the assimilated model simulations. Furthermore, we present an independent method to extend the dynamic circulation analysis for anomalously cold European JFMA conditions back to the sixteenth century. To this end, we use documentary records that are spatially representative of the long instrumental records and derive, through modern analogs, large-scale SLP, surface temperature and precipitation fields. The skill of the analog method is tested in the virtual world of two three-dimensional climate simulations (ECHO-G and HadCM3). This endeavor offers new possibilities both to constrain climate models into a reconstruction mode (through the assimilation approach) and to better assess documentary data in a quantitative way.
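As a hedged illustration of the first analytical step, the snippet below runs a CCA between a gridded SLP field and seven temperature series with scikit-learn; synthetic random data stands in for the actual reconstructions.

```python
# A hedged sketch of the CCA step: covariability between an SLP field and
# long station temperature series. Random data replaces the real fields.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n_years = 248                             # JFMA seasons, 1760-2007
slp = rng.normal(size=(n_years, 100))     # SLP grid, 100 points (illustrative)
temp = rng.normal(size=(n_years, 7))      # 7 long instrumental series

cca = CCA(n_components=2)
slp_scores, temp_scores = cca.fit_transform(slp, temp)

# The correlation of the leading pair of canonical variates measures the
# coupled SLP-temperature covariability discussed in the text.
r = np.corrcoef(slp_scores[:, 0], temp_scores[:, 0])[0, 1]
print(f"first canonical correlation: {r:.2f}")
```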
Abstract:
SDC has been involved in rural development in Cabo Delgado for more than 30 years. Shortly after the independence of Mozambique, projects in water supply and integrated rural development were initiated. The silvoagropastoral project FO9, based in Mueda, was a very early forestry experience in Cabo Delgado. Andreas Kläy was responsible for the forestry sector in FO9 for three years in the early 1980s and had an opportunity to initiate an exchange of ideas and experience in rural development theory and approaches with Yussuf Adam, who was doing research in human anthropology and history in the province. Twenty-five years later, the current situation of forest management in Cabo Delgado was reassessed, with a specific focus on concessions in the North. The opportunity for a partnership between MITI SA, the University of Eduardo Mondlane, and CDE was created on the basis of this preliminary study. The aim of this partnership is to generate knowledge and develop capacity for sustainable forest management. The preliminary study showed that “…we have to face weaknesses and would like to start a learning process with the main institutions, organisations, and stakeholder groups active in forest management and research in the North of Cabo Delgado. This learning process will involve studies supported by competent research institutions and workshops …” The specific objectives of ESAPP project Q804 are the following: 1. Contribute to understanding of the forestry sector; 2. Capacity development for professionals and academics; 3. Support for the private sector and the local forest service; 4. Support for data generation at Cabo Delgado's Provincial Service; 5. Capacity development for Swiss academic institutions (CDE and ETHZ). A conceptual planning platform was elaborated as a basis for cooperation and research in the partnership (cf. Annex 1). The partners agreed to work on two lines of research: biophysical and socio-economic. In order to ensure a transdisciplinary approach, disciplinary research is anchored in a common understanding developed in workshops based on the LforS methods. These workshops integrate the main stakeholders in the local context of the COMADEL concession in Nangade District, managed by MITI SA, and take place in the village of Namiune. The research team observed that current management schemes consist mainly of strategies of nature mining by most of the stakeholders involved. Institutional settings, formal and informal, have little impact owing to weak capacity at the local level and corruption. Local difficulties in a remote rural area facilitate external access to resources and are perpetuated by the loss of benefits: the benefits of logging remain at the top level (economic and political elites). The interest of the concession owners in stopping the loss of resources caused by this regime offers a unique opportunity to intervene in the logic of resource degradation that afflicts rural development and forest management.
Abstract:
Expert systems are built from knowledge traditionally elicited from the human expert. It is precisely knowledge elicitation from the expert that is the bottleneck in expert system construction. On the other hand, a data mining system, which automatically extracts knowledge, needs expert guidance on the successive decisions to be made in each of the system phases. In this context, expert knowledge and data mining discovered knowledge can cooperate, maximizing their individual capabilities: data mining discovered knowledge can be used as a complementary source of knowledge for the expert system, whereas expert knowledge can be used to guide the data mining process. This article summarizes different examples of systems where there is cooperation between expert knowledge and data mining discovered knowledge and reports our experience of such cooperation gathered from a medical diagnosis project called Intelligent Interpretation of Isokinetics Data, which we developed. From that experience, a series of lessons were learned throughout project development. Some of these lessons are generally applicable and others pertain exclusively to certain project types.
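A minimal sketch of this cooperation pattern follows, with illustrative isokinetics-flavored features and an invented expert rule: a classifier mined from past cases provides the default answer, and expert knowledge guards or overrides it.

```python
# A minimal sketch of combining mined and expert knowledge. The features,
# thresholds and labels are illustrative only, not from the cited project.
from sklearn.tree import DecisionTreeClassifier

# Mined knowledge: a classifier trained on historical cases.
X_train = [[30, 110], [55, 80], [62, 70], [25, 120]]   # [age, peak_torque]
y_train = ["normal", "injured", "injured", "normal"]
mined = DecisionTreeClassifier().fit(X_train, y_train)

def expert_override(case, mined_label):
    """Expert knowledge guards the mined model's output."""
    age, peak_torque = case
    if peak_torque < 50:       # expert rule: torque this low is always abnormal
        return "injured"
    return mined_label

case = [40, 45]
print(expert_override(case, mined.predict([case])[0]))
```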
Abstract:
Ubiquitous computing software needs to be autonomous so that essential decisions, such as how to configure its particular execution, are self-determined. Moreover, data mining serves an important role for ubiquitous computing by providing intelligence to several types of ubiquitous computing applications. Thus, automating ubiquitous data mining is also crucial. We focus on the problem of automatically configuring the execution of a ubiquitous data mining algorithm. In our solution, we generate configuration decisions in a resource-aware and context-aware manner, since the algorithm executes in an environment in which the context often changes and computing resources are often severely limited. We propose to analyze the execution behavior of the data mining algorithm by mining its past executions. By doing so, we discover the effects of resource and context states, as well as parameter settings, on the data mining quality. We argue that a classification model is appropriate for predicting the behavior of an algorithm's execution, and we concentrate on the decision tree classifier. We also define a taxonomy of data mining quality so that the trade-off between prediction accuracy and classification specificity of each behavior model, each classifying by a different abstraction of quality, can be scored for model selection. Behavior model constituents and class label transformations are formally defined, and an experimental validation of the proposed approach is also performed.
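The following sketch illustrates the behavior-model idea under stated assumptions: records of past executions, described by resource state, context state and a parameter setting, train a decision tree that predicts the resulting quality class. The features, labels and values are illustrative.

```python
# A hedged sketch of the behavior model: a decision tree trained on past
# executions of the mining algorithm. All features and labels are invented.
from sklearn.tree import DecisionTreeClassifier

# One row per past execution: [free_mem_MB, battery_pct, data_rate_Hz, param_k]
past_runs = [[512, 80, 10, 5], [64, 20, 50, 20], [256, 55, 25, 10],
             [32, 10, 60, 20], [700, 90, 5, 5], [128, 35, 40, 15]]
# Quality labels at one abstraction level of the quality taxonomy.
quality = ["high", "low", "medium", "low", "high", "medium"]

behavior_model = DecisionTreeClassifier(max_depth=3).fit(past_runs, quality)

# At run time, the current resource/context state plus a candidate parameter
# setting is classified; the configuration predicted to yield the best
# quality is selected.
print(behavior_model.predict([[200, 50, 30, 10]]))
```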
Abstract:
A number of factors contribute to the success of dental implant operations. Among them is the choice of the location in which the prosthetic tooth is to be implanted. This project offers a new approach to analysing jaw tissue for the purpose of selecting suitable locations for tooth implant operations. The application developed takes as input a stack of computed tomography slices of the jaw and trims the data outside the jaw area, which is the region of interest. It then reconstructs a three-dimensional model of the jaw, highlighting points of interest on the reconstructed model. In addition, data mining techniques have been utilised to construct a prediction model based on a dataset of previous dental implant operations with observed stability values. The goal is to find patterns within the dataset that would help predict the success likelihood of an implant.
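A hedged sketch of the prediction side of the project follows; the abstract does not list the dataset's attributes, so the features below are invented for illustration, and logistic regression stands in for whatever model the authors built.

```python
# A minimal sketch: estimate the success likelihood of a new implant from
# past operations with observed stability. Features are illustrative.
from sklearn.linear_model import LogisticRegression

# [bone_density_HU, cortical_thickness_mm, implant_length_mm] per past case
X = [[850, 2.1, 10], [420, 1.2, 8], [910, 2.4, 12], [380, 1.0, 8],
     [760, 1.9, 10], [500, 1.4, 9]]
y = [1, 0, 1, 0, 1, 0]            # 1 = stable implant, 0 = failed

model = LogisticRegression().fit(X, y)
new_site = [[640, 1.7, 10]]       # candidate location from the 3-D jaw model
print(f"success likelihood: {model.predict_proba(new_site)[0, 1]:.2f}")
```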
Abstract:
The success of an aquaculture breeding program critically depends on the way in which the base population of breeders is constructed since all the genetic variability for the traits included originally in the breeding goal as well as those to be included in the future is contained in the initial founders. Traditionally, base populations were created from a number of wild strains by sampling equal numbers from each strain. However, for some aquaculture species improved strains are already available and, therefore, mean phenotypic values for economically important traits can be used as a criterion to optimize the sampling when creating base populations. Also, the increasing availability of genome-wide genotype information in aquaculture species could help to refine the estimation of relationships within and between candidate strains and, thus, to optimize the percentage of individuals to be sampled from each strain. This study explores the advantages of using phenotypic and genome-wide information when constructing base populations for aquaculture breeding programs in terms of initial and subsequent trait performance and genetic diversity level. Results show that a compromise solution between diversity and performance can be found when creating base populations. Up to 6% higher levels of phenotypic performance can be achieved at the same level of global diversity in the base population by optimizing the selection of breeders instead of sampling equal numbers from each strain. The higher performance observed in the base population persisted during 10 generations of phenotypic selection applied in the subsequent breeding program.
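As a hedged illustration of the optimization idea, the sketch below searches strain contribution proportions that maximize expected mean performance subject to a floor on global diversity, computed from an illustrative matrix of average genomic relationships; the numbers are invented, not the study's.

```python
# A hedged sketch of optimizing base-population composition: pick the
# proportion of founders from each strain to maximize mean performance
# while keeping diversity (one minus average coancestry) above a floor.
import itertools
import numpy as np

mu = np.array([10.0, 12.0, 15.0])        # strain mean phenotypes (invented)
f = np.array([[0.30, 0.05, 0.02],        # average (co)ancestry within and
              [0.05, 0.25, 0.04],        # between strains, as estimated
              [0.02, 0.04, 0.40]])       # from genome-wide genotypes

best, best_perf = None, -np.inf
for w in itertools.product(np.linspace(0, 1, 21), repeat=3):
    p = np.array(w)
    if abs(p.sum() - 1.0) > 1e-9:
        continue                          # stay on the simplex
    diversity = 1.0 - p @ f @ p           # global diversity of the base pop.
    if diversity >= 0.85 and p @ mu > best_perf:
        best, best_perf = p, p @ mu

print("proportions per strain:", best, "expected performance:", best_perf)
```

With these illustrative numbers, equal sampling gives an expected performance of about 12.3, while the optimizer finds a mix tilted toward the best-performing strain that does better at the same diversity floor, mirroring the compromise between diversity and performance the study reports.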
Abstract:
Diabetes mellitus is a disorder of carbohydrate metabolism characterized by absent or insufficient secretion of insulin (a hormone produced by the pancreas), resulting from a malfunction of the endocrine part of the pancreas or from the body's growing resistance to this hormone. After digestion, the food we eat is transformed into smaller chemical compounds by the exocrine tissues. The absence or low effectiveness of this polypeptide hormone prevents the metabolization of the ingested carbohydrates, with two consequences: the blood glucose concentration rises, since the cells cannot metabolize it; and the liver consumes fatty acids, releasing ketone bodies to supply energy to the cells. This exposes the chronic patient to very high blood glucose concentrations, known as hyperglycemia, which in the medium or long term can produce multiple medical problems: ophthalmological, renal, cardiovascular, cerebrovascular, neurological… Diabetes represents a major public health problem and is the most common disease in developed countries, favored by factors such as obesity and sedentary lifestyles. In this project we work with clinical experimentation data from patients with type 1 diabetes, an autoimmune disease in which the insulin-producing beta cells of the pancreas are destroyed, making the administration of exogenous insulin necessary.
A patient with type 1 diabetes must therefore follow a treatment with subcutaneously administered insulin, adapted to his metabolic needs and life habits. To address the regulation of the patient's metabolic control through insulin therapy, we rely on the “Artificial Endocrine Pancreas” (PEA) project, which comprises an insulin infusion pump, a continuous glucose sensor, and a closed-loop control algorithm. The main objective of the PEA is to provide the patient with precision, efficacy and safety in normalizing glycemic control and reducing the risk of hypoglycemia. The PEA operates through the subcutaneous route, so the delay introduced by the insulin action, the delay of the glucose measurement, and the errors introduced by continuous glucose sensors when they lose calibration all hinder the use of a control algorithm. At this point the patient's glucose must be modeled with predictive systems. A model is any element that allows us to predict the behavior of a system from input variables. What we obtain is thus a prediction of the future states of the patient's glucose, using as inputs the insulin, meal intake and glucose values already known because they occurred earlier in time. When the glucose predictor runs with parameters obtained in real time, it can indicate the future glucose level for the decision-making of the closed-loop (CL) controller. The predictors currently employed in the PEA are not working correctly because of the amount of information and the number of variables they must handle.
Data mining, also referred to as Knowledge Discovery in Databases (KDD), has been defined as the non-trivial extraction of implicit, previously unknown and potentially useful information. The knowledge extraction process comprises the following phases: data selection, pre-processing, transformation, data mining, interpretation of results, evaluation and knowledge acquisition. Through this process we seek to generate a single insulin-glucose model that fits each patient individually and is able, at the same time, to predict future glucose states with real-time calculations from a set of input parameters. This work aims to extract the information contained in a database of type 1 diabetic patients obtained from clinical experimentation, using data mining techniques. To achieve this objective we implemented a graphical interface that guides the user through the KDD process (with graphical and statistical information) at every step. For the data mining stage we used the WEKA tool, controlling all of its functions through Java from within the program we created. The project gains further potential from the possibility of porting the code to Android devices; with such devices, the work presented here could support the implementation of novel and very useful applications in this field. As a conclusion of the project, and after an exhaustive analysis of the results obtained, we show that the insulin-glucose model is successfully obtained for each patient.
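As a hedged illustration of the kind of individualized insulin-glucose predictor the project mines, the sketch below fits an autoregressive linear model that predicts the next glucose sample from recent glucose, insulin and meal inputs. The project drove WEKA from Java; scikit-learn stands in here, and the data is synthetic.

```python
# A hedged sketch of an individualized insulin-glucose predictor: a linear
# autoregressive model over recent glucose, insulin and meal inputs.
# Synthetic data; the project itself used WEKA driven from Java.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
T = 200
glucose = 120 + np.cumsum(rng.normal(0, 2, T))    # CGM readings (mg/dL)
insulin = rng.uniform(0, 1.5, T)                  # infused insulin (U)
carbs   = rng.choice([0, 0, 0, 40], T)            # meal intake (g)

lags = 6   # half an hour of 5-minute samples
X = np.column_stack([np.column_stack([s[i:T - lags + i] for i in range(lags)])
                     for s in (glucose, insulin, carbs)])
y = glucose[lags:]                                # glucose one step ahead

model = LinearRegression().fit(X, y)
print("predicted next glucose:", model.predict(X[-1:]))
```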