13 resultados para Information Mining

em Universidad Politécnica de Madrid


Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper describes a framework for annotation on travel blogs based on subjectivity (FATS). The framework has the capability to auto-annotate -sentence by sentence- sections from blogs (posts) about travelling in the Spanish language. FATS is used in this experiment to annotate com- ponents from travel blogs in order to create a corpus of 300 annotated posts. Each subjective element in a sentence is annotated as positive or negative as appropriate. Currently correct annotations add up to about 95 per cent in our subset of the travel domain. By means of an iterative process of annotation we can create a subjectively annotated domain specific corpus.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

This paper presents an approach to compare two types of data, subjective data (Polarity of Pan American Games 2011 event by country) and objective data (the number of medals won by each participating country), based on the Pearson corre- lation. When dealing with events described by people, knowledge acquisition is difficult because their structure is heterogeneous and subjective. A first step towards knowing the polarity of the information provided by people consists in automatically classifying the posts into clusters according to their polarity. The authors carried out a set of experiments using a corpus that consists of 5600 posts extracted from 168 Internet resources related to a specific event: the 2011 Pan American games. The approach is based on four components: a crawler, a filter, a synthesizer and a polarity analyzer. The PanAmerican approach automatically classifies the polarity of the event into clusters with the following results: 588 positive, 336 neutral, and 76 negative. Our work found out that the polarity of the content produced was strongly influenced by the results of the event with a correlation of .74. Thus, it is possible to conclude that the polarity of content is strongly affected by the results of the event. Finally, the accuracy of the PanAmerican approach is: .87, .90, and .80 according to the precision of the three classes of polarity evaluated.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

La nanotecnología es un área de investigación de reciente creación que trata con la manipulación y el control de la materia con dimensiones comprendidas entre 1 y 100 nanómetros. A escala nanométrica, los materiales exhiben fenómenos físicos, químicos y biológicos singulares, muy distintos a los que manifiestan a escala convencional. En medicina, los compuestos miniaturizados a nanoescala y los materiales nanoestructurados ofrecen una mayor eficacia con respecto a las formulaciones químicas tradicionales, así como una mejora en la focalización del medicamento hacia la diana terapéutica, revelando así nuevas propiedades diagnósticas y terapéuticas. A su vez, la complejidad de la información a nivel nano es mucho mayor que en los niveles biológicos convencionales (desde el nivel de población hasta el nivel de célula) y, por tanto, cualquier flujo de trabajo en nanomedicina requiere, de forma inherente, estrategias de gestión de información avanzadas. Desafortunadamente, la informática biomédica todavía no ha proporcionado el marco de trabajo que permita lidiar con estos retos de la información a nivel nano, ni ha adaptado sus métodos y herramientas a este nuevo campo de investigación. En este contexto, la nueva área de la nanoinformática pretende detectar y establecer los vínculos existentes entre la medicina, la nanotecnología y la informática, fomentando así la aplicación de métodos computacionales para resolver las cuestiones y problemas que surgen con la información en la amplia intersección entre la biomedicina y la nanotecnología. Las observaciones expuestas previamente determinan el contexto de esta tesis doctoral, la cual se centra en analizar el dominio de la nanomedicina en profundidad, así como en el desarrollo de estrategias y herramientas para establecer correspondencias entre las distintas disciplinas, fuentes de datos, recursos computacionales y técnicas orientadas a la extracción de información y la minería de textos, con el objetivo final de hacer uso de los datos nanomédicos disponibles. El autor analiza, a través de casos reales, alguna de las tareas de investigación en nanomedicina que requieren o que pueden beneficiarse del uso de métodos y herramientas nanoinformáticas, ilustrando de esta forma los inconvenientes y limitaciones actuales de los enfoques de informática biomédica a la hora de tratar con datos pertenecientes al dominio nanomédico. Se discuten tres escenarios diferentes como ejemplos de actividades que los investigadores realizan mientras llevan a cabo su investigación, comparando los contextos biomédico y nanomédico: i) búsqueda en la Web de fuentes de datos y recursos computacionales que den soporte a su investigación; ii) búsqueda en la literatura científica de resultados experimentales y publicaciones relacionadas con su investigación; iii) búsqueda en registros de ensayos clínicos de resultados clínicos relacionados con su investigación. El desarrollo de estas actividades requiere el uso de herramientas y servicios informáticos, como exploradores Web, bases de datos de referencias bibliográficas indexando la literatura biomédica y registros online de ensayos clínicos, respectivamente. Para cada escenario, este documento proporciona un análisis detallado de los posibles obstáculos que pueden dificultar el desarrollo y el resultado de las diferentes tareas de investigación en cada uno de los dos campos citados (biomedicina y nanomedicina), poniendo especial énfasis en los retos existentes en la investigación nanomédica, campo en el que se han detectado las mayores dificultades. El autor ilustra cómo la aplicación de metodologías provenientes de la informática biomédica a estos escenarios resulta efectiva en el dominio biomédico, mientras que dichas metodologías presentan serias limitaciones cuando son aplicadas al contexto nanomédico. Para abordar dichas limitaciones, el autor propone un enfoque nanoinformático, original, diseñado específicamente para tratar con las características especiales que la información presenta a nivel nano. El enfoque consiste en un análisis en profundidad de la literatura científica y de los registros de ensayos clínicos disponibles para extraer información relevante sobre experimentos y resultados en nanomedicina —patrones textuales, vocabulario en común, descriptores de experimentos, parámetros de caracterización, etc.—, seguido del desarrollo de mecanismos para estructurar y analizar dicha información automáticamente. Este análisis concluye con la generación de un modelo de datos de referencia (gold standard) —un conjunto de datos de entrenamiento y de test anotados manualmente—, el cual ha sido aplicado a la clasificación de registros de ensayos clínicos, permitiendo distinguir automáticamente los estudios centrados en nanodrogas y nanodispositivos de aquellos enfocados a testear productos farmacéuticos tradicionales. El presente trabajo pretende proporcionar los métodos necesarios para organizar, depurar, filtrar y validar parte de los datos nanomédicos existentes en la actualidad a una escala adecuada para la toma de decisiones. Análisis similares para otras tareas de investigación en nanomedicina ayudarían a detectar qué recursos nanoinformáticos se requieren para cumplir los objetivos actuales en el área, así como a generar conjunto de datos de referencia, estructurados y densos en información, a partir de literatura y otros fuentes no estructuradas para poder aplicar nuevos algoritmos e inferir nueva información de valor para la investigación en nanomedicina. ABSTRACT Nanotechnology is a research area of recent development that deals with the manipulation and control of matter with dimensions ranging from 1 to 100 nanometers. At the nanoscale, materials exhibit singular physical, chemical and biological phenomena, very different from those manifested at the conventional scale. In medicine, nanosized compounds and nanostructured materials offer improved drug targeting and efficacy with respect to traditional formulations, and reveal novel diagnostic and therapeutic properties. Nevertheless, the complexity of information at the nano level is much higher than the complexity at the conventional biological levels (from populations to the cell). Thus, any nanomedical research workflow inherently demands advanced information management. Unfortunately, Biomedical Informatics (BMI) has not yet provided the necessary framework to deal with such information challenges, nor adapted its methods and tools to the new research field. In this context, the novel area of nanoinformatics aims to build new bridges between medicine, nanotechnology and informatics, allowing the application of computational methods to solve informational issues at the wide intersection between biomedicine and nanotechnology. The above observations determine the context of this doctoral dissertation, which is focused on analyzing the nanomedical domain in-depth, and developing nanoinformatics strategies and tools to map across disciplines, data sources, computational resources, and information extraction and text mining techniques, for leveraging available nanomedical data. The author analyzes, through real-life case studies, some research tasks in nanomedicine that would require or could benefit from the use of nanoinformatics methods and tools, illustrating present drawbacks and limitations of BMI approaches to deal with data belonging to the nanomedical domain. Three different scenarios, comparing both the biomedical and nanomedical contexts, are discussed as examples of activities that researchers would perform while conducting their research: i) searching over the Web for data sources and computational resources supporting their research; ii) searching the literature for experimental results and publications related to their research, and iii) searching clinical trial registries for clinical results related to their research. The development of these activities will depend on the use of informatics tools and services, such as web browsers, databases of citations and abstracts indexing the biomedical literature, and web-based clinical trial registries, respectively. For each scenario, this document provides a detailed analysis of the potential information barriers that could hamper the successful development of the different research tasks in both fields (biomedicine and nanomedicine), emphasizing the existing challenges for nanomedical research —where the major barriers have been found. The author illustrates how the application of BMI methodologies to these scenarios can be proven successful in the biomedical domain, whilst these methodologies present severe limitations when applied to the nanomedical context. To address such limitations, the author proposes an original nanoinformatics approach specifically designed to deal with the special characteristics of information at the nano level. This approach consists of an in-depth analysis of the scientific literature and available clinical trial registries to extract relevant information about experiments and results in nanomedicine —textual patterns, common vocabulary, experiment descriptors, characterization parameters, etc.—, followed by the development of mechanisms to automatically structure and analyze this information. This analysis resulted in the generation of a gold standard —a manually annotated training or reference set—, which was applied to the automatic classification of clinical trial summaries, distinguishing studies focused on nanodrugs and nanodevices from those aimed at testing traditional pharmaceuticals. The present work aims to provide the necessary methods for organizing, curating and validating existing nanomedical data on a scale suitable for decision-making. Similar analysis for different nanomedical research tasks would help to detect which nanoinformatics resources are required to meet current goals in the field, as well as to generate densely populated and machine-interpretable reference datasets from the literature and other unstructured sources for further testing novel algorithms and inferring new valuable information for nanomedicine.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This poster raises the issue of a research work oriented to the storage, retrieval, representation and analysis of dynamic GI, taking into account The ultimate objective is the modelling and representation of the dynamic nature of geographic features, establishing mechanisms to store geometries enriched with a temporal structure (regardless of space) and a set of semantic descriptors detailing and clarifying the nature of the represented features and their temporality. the semantic, the temporal and the spatiotemporal components. We intend to define a set of methods, rules and restrictions for the adequate integration of these components into the primary elements of the GI: theme, location, time [1]. We intend to establish and incorporate three new structures (layers) into the core of data storage by using mark-up languages: a semantictemporal structure, a geosemantic structure, and an incremental spatiotemporal structure. Thus, data would be provided with the capability of pinpointing and expressing their own basic and temporal characteristics, enabling them to interact each other according to their context, and their time and meaning relationships that could be eventually established

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Abstract Due to recent scientific and technological advances in information sys¬tems, it is now possible to perform almost every application on a mobile device. The need to make sense of such devices more intelligent opens an opportunity to design data mining algorithm that are able to autonomous execute in local devices to provide the device with knowledge. The problem behind autonomous mining deals with the proper configuration of the algorithm to produce the most appropriate results. Contextual information together with resource information of the device have a strong impact on both the feasibility of a particu¬lar execution and on the production of the proper patterns. On the other hand, performance of the algorithm expressed in terms of efficacy and efficiency highly depends on the features of the dataset to be analyzed together with values of the parameters of a particular implementation of an algorithm. However, few existing approaches deal with autonomous configuration of data mining algorithms and in any case they do not deal with contextual or resources information. Both issues are of particular significance, in particular for social net¬works application. In fact, the widespread use of social networks and consequently the amount of information shared have made the need of modeling context in social application a priority. Also the resource consumption has a crucial role in such platforms as the users are using social networks mainly on their mobile devices. This PhD thesis addresses the aforementioned open issues, focusing on i) Analyzing the behavior of algorithms, ii) mapping contextual and resources information to find the most appropriate configuration iii) applying the model for the case of a social recommender. Four main contributions are presented: - The EE-Model: is able to predict the behavior of a data mining algorithm in terms of resource consumed and accuracy of the mining model it will obtain. - The SC-Mapper: maps a situation defined by the context and resource state to a data mining configuration. - SOMAR: is a social activity (event and informal ongoings) recommender for mobile devices. - D-SOMAR: is an evolution of SOMAR which incorporates the configurator in order to provide updated recommendations. Finally, the experimental validation of the proposed contributions using synthetic and real datasets allows us to achieve the objectives and answer the research questions proposed for this dissertation.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Ubiquitous computing software needs to be autonomous so that essential decisions such as how to configure its particular execution are self-determined. Moreover, data mining serves an important role for ubiquitous computing by providing intelligence to several types of ubiquitous computing applications. Thus, automating ubiquitous data mining is also crucial. We focus on the problem of automatically configuring the execution of a ubiquitous data mining algorithm. In our solution, we generate configuration decisions in a resource aware and context aware manner since the algorithm executes in an environment in which the context often changes and computing resources are often severely limited. We propose to analyze the execution behavior of the data mining algorithm by mining its past executions. By doing so, we discover the effects of resource and context states as well as parameter settings on the data mining quality. We argue that a classification model is appropriate for predicting the behavior of an algorithm?s execution and we concentrate on decision tree classifier. We also define taxonomy on data mining quality so that tradeoff between prediction accuracy and classification specificity of each behavior model that classifies by a different abstraction of quality, is scored for model selection. Behavior model constituents and class label transformations are formally defined and experimental validation of the proposed approach is also performed.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In ubiquitous data stream mining applications, different devices often aim to learn concepts that are similar to some extent. In these applications, such as spam filtering or news recommendation, the data stream underlying concept (e.g., interesting mail/news) is likely to change over time. Therefore, the resultant model must be continuously adapted to such changes. This paper presents a novel Collaborative Data Stream Mining (Coll-Stream) approach that explores the similarities in the knowledge available from other devices to improve local classification accuracy. Coll-Stream integrates the community knowledge using an ensemble method where the classifiers are selected and weighted based on their local accuracy for different partitions of the feature space. We evaluate Coll-Stream classification accuracy in situations with concept drift, noise, partition granularity and concept similarity in relation to the local underlying concept. The experimental results show that Coll-Stream resultant model achieves stability and accuracy in a variety of situations using both synthetic and real world datasets.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Most data stream classification techniques assume that the underlying feature space is static. However, in real-world applications the set of features and their relevance to the target concept may change over time. In addition, when the underlying concepts reappear, reusing previously learnt models can enhance the learning process in terms of accuracy and processing time at the expense of manageable memory consumption. In this paper, we propose mining recurring concepts in a dynamic feature space (MReC-DFS), a data stream classification system to address the challenges of learning recurring concepts in a dynamic feature space while simultaneously reducing the memory cost associated with storing past models. MReC-DFS is able to detect and adapt to concept changes using the performance of the learning process and contextual information. To handle recurring concepts, stored models are combined in a dynamically weighted ensemble. Incremental feature selection is performed to reduce the combined feature space. This contribution allows MReC-DFS to store only the features most relevant to the learnt concepts, which in turn increases the memory efficiency of the technique. In addition, an incremental feature selection method is proposed that dynamically determines the threshold between relevant and irrelevant features. Experimental results demonstrating the high accuracy of MReC-DFS compared with state-of-the-art techniques on a variety of real datasets are presented. The results also show the superior memory efficiency of MReC-DFS.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

There are a number of factors that contribute to the success of dental implant operations. Among others, is the choice of location in which the prosthetic tooth is to be implanted. This project offers a new approach to analyse jaw tissue for the purpose of selecting suitable locations for teeth implant operations. The application developed takes as input jaw computed tomography stack of slices and trims data outside the jaw area, which is the point of interest. It then reconstructs a three dimensional model of the jaw highlighting points of interest on the reconstructed model. On another hand, data mining techniques have been utilised in order to construct a prediction model based on an information dataset of previous dental implant operations with observed stability values. The goal is to find patterns within the dataset that would help predicting the success likelihood of an implant.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Nanotechnology represents an area of particular promise and significant opportunity across multiple scientific disciplines. Ongoing nanotechnology research ranges from the characterization of nanoparticles and nanomaterials to the analysis and processing of experimental data seeking correlations between nanoparticles and their functionalities and side effects. Due to their special properties, nanoparticles are suitable for cellular-level diagnostics and therapy, offering numerous applications in medicine, e.g. development of biomedical devices, tissue repair, drug delivery systems and biosensors. In nanomedicine, recent studies are producing large amounts of structural and property data, highlighting the role for computational approaches in information management. While in vitro and in vivo assays are expensive, the cost of computing is falling. Furthermore, improvements in the accuracy of computational methods (e.g. data mining, knowledge discovery, modeling and simulation) have enabled effective tools to automate the extraction, management and storage of these vast data volumes. Since this information is widely distributed, one major issue is how to locate and access data where it resides (which also poses data-sharing limitations). The novel discipline of nanoinformatics addresses the information challenges related to nanotechnology research. In this paper, we summarize the needs and challenges in the field and present an overview of extant initiatives and efforts.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

La predicción del valor de las acciones en la bolsa de valores ha sido un tema importante en el campo de inversiones, que por varios años ha atraído tanto a académicos como a inversionistas. Esto supone que la información disponible en el pasado de la compañía que cotiza en bolsa tiene alguna implicación en el futuro del valor de la misma. Este trabajo está enfocado en ayudar a un persona u organismo que decida invertir en la bolsa de valores a través de gestión de compra o venta de acciones de una compañía a tomar decisiones respecto al tiempo de comprar o vender basado en el conocimiento obtenido de los valores históricos de las acciones de una compañía en la bolsa de valores. Esta decisión será inferida a partir de un modelo de regresión múltiple que es una de las técnicas de datamining. Para llevar conseguir esto se emplea una metodología conocida como CRISP-DM aplicada a los datos históricos de la compañía con mayor valor actual del NASDAQ.---ABSTRACT---The prediction of the value of shares in the stock market has been a major issue in the field of investments, which for several years has attracted both academics and investors. This means that the information available in the company last traded have any involvement in the future of the value of it. This work is focused on helping an investor decides to invest in the stock market through management buy or sell shares of a company to make decisions with respect to time to buy or sell based on the knowledge gained from the historic values of the shares of a company in the stock market. This decision will be inferred from a multiple regression model which is one of the techniques of data mining. To get this out a methodology known as CRISP-DM applied to historical data of the company with the highest current value of NASDAQ is used.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The mobile apps market is a tremendous success, with millions of apps downloaded and used every day by users spread all around the world. For apps’ developers, having their apps published on one of the major app stores (e.g. Google Play market) is just the beginning of the apps lifecycle. Indeed, in order to successfully compete with the other apps in the market, an app has to be updated frequently by adding new attractive features and by fixing existing bugs. Clearly, any developer interested in increasing the success of her app should try to implement features desired by the app’s users and to fix bugs affecting the user experience of many of them. A precious source of information to decide how to collect users’ opinions and wishes is represented by the reviews left by users on the store from which they downloaded the app. However, to exploit such information the app’s developer should manually read each user review and verify if it contains useful information (e.g. suggestions for new features). This is something not doable if the app receives hundreds of reviews per day, as happens for the very popular apps on the market. In this work, our aim is to provide support to mobile apps developers by proposing a novel approach exploiting data mining, natural language processing, machine learning, and clustering techniques in order to classify the user reviews on the basis of the information they contain (e.g. useless, suggestion for new features, bugs reporting). Such an approach has been empirically evaluated and made available in a web-­‐based tool publicly available to all apps’ developers. The achieved results showed that the developed tool: (i) is able to correctly categorise user reviews on the basis of their content (e.g. isolating those reporting bugs) with 78% of accuracy, (ii) produces clusters of reviews (e.g. groups together reviews indicating exactly the same bug to be fixed) that are meaningful from a developer’s point-­‐of-­‐view, and (iii) is considered useful by a software company working in the mobile apps’ development market.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

La diabetes mellitus es un trastorno en la metabolización de los carbohidratos, caracterizado por la nula o insuficiente segregación de insulina (hormona producida por el páncreas), como resultado del mal funcionamiento de la parte endocrina del páncreas, o de una creciente resistencia del organismo a esta hormona. Esto implica, que tras el proceso digestivo, los alimentos que ingerimos se transforman en otros compuestos químicos más pequeños mediante los tejidos exocrinos. La ausencia o poca efectividad de esta hormona polipéptida, no permite metabolizar los carbohidratos ingeridos provocando dos consecuencias: Aumento de la concentración de glucosa en sangre, ya que las células no pueden metabolizarla; consumo de ácidos grasos mediante el hígado, liberando cuerpos cetónicos para aportar la energía a las células. Esta situación expone al enfermo crónico, a una concentración de glucosa en sangre muy elevada, denominado hiperglucemia, la cual puede producir a medio o largo múltiples problemas médicos: oftalmológicos, renales, cardiovasculares, cerebrovasculares, neurológicos… La diabetes representa un gran problema de salud pública y es la enfermedad más común en los países desarrollados por varios factores como la obesidad, la vida sedentaria, que facilitan la aparición de esta enfermedad. Mediante el presente proyecto trabajaremos con los datos de experimentación clínica de pacientes con diabetes de tipo 1, enfermedad autoinmune en la que son destruidas las células beta del páncreas (productoras de insulina) resultando necesaria la administración de insulina exógena. Dicho esto, el paciente con diabetes tipo 1 deberá seguir un tratamiento con insulina administrada por la vía subcutánea, adaptado a sus necesidades metabólicas y a sus hábitos de vida. Para abordar esta situación de regulación del control metabólico del enfermo, mediante una terapia de insulina, no serviremos del proyecto “Páncreas Endocrino Artificial” (PEA), el cual consta de una bomba de infusión de insulina, un sensor continuo de glucosa, y un algoritmo de control en lazo cerrado. El objetivo principal del PEA es aportar al paciente precisión, eficacia y seguridad en cuanto a la normalización del control glucémico y reducción del riesgo de hipoglucemias. El PEA se instala mediante vía subcutánea, por lo que, el retardo introducido por la acción de la insulina, el retardo de la medida de glucosa, así como los errores introducidos por los sensores continuos de glucosa cuando, se descalibran dificultando el empleo de un algoritmo de control. Llegados a este punto debemos modelar la glucosa del paciente mediante sistemas predictivos. Un modelo, es todo aquel elemento que nos permita predecir el comportamiento de un sistema mediante la introducción de variables de entrada. De este modo lo que conseguimos, es una predicción de los estados futuros en los que se puede encontrar la glucosa del paciente, sirviéndonos de variables de entrada de insulina, ingesta y glucosa ya conocidas, por ser las sucedidas con anterioridad en el tiempo. Cuando empleamos el predictor de glucosa, utilizando parámetros obtenidos en tiempo real, el controlador es capaz de indicar el nivel futuro de la glucosa para la toma de decisones del controlador CL. Los predictores que se están empleando actualmente en el PEA no están funcionando correctamente por la cantidad de información y variables que debe de manejar. Data Mining, también referenciado como Descubrimiento del Conocimiento en Bases de Datos (Knowledge Discovery in Databases o KDD), ha sido definida como el proceso de extracción no trivial de información implícita, previamente desconocida y potencialmente útil. Todo ello, sirviéndonos las siguientes fases del proceso de extracción del conocimiento: selección de datos, pre-procesado, transformación, minería de datos, interpretación de los resultados, evaluación y obtención del conocimiento. Con todo este proceso buscamos generar un único modelo insulina glucosa que se ajuste de forma individual a cada paciente y sea capaz, al mismo tiempo, de predecir los estados futuros glucosa con cálculos en tiempo real, a través de unos parámetros introducidos. Este trabajo busca extraer la información contenida en una base de datos de pacientes diabéticos tipo 1 obtenidos a partir de la experimentación clínica. Para ello emplearemos técnicas de Data Mining. Para la consecución del objetivo implícito a este proyecto hemos procedido a implementar una interfaz gráfica que nos guía a través del proceso del KDD (con información gráfica y estadística) de cada punto del proceso. En lo que respecta a la parte de la minería de datos, nos hemos servido de la denominada herramienta de WEKA, en la que a través de Java controlamos todas sus funciones, para implementarlas por medio del programa creado. Otorgando finalmente, una mayor potencialidad al proyecto con la posibilidad de implementar el servicio de los dispositivos Android por la potencial capacidad de portar el código. Mediante estos dispositivos y lo expuesto en el proyecto se podrían implementar o incluso crear nuevas aplicaciones novedosas y muy útiles para este campo. Como conclusión del proyecto, y tras un exhaustivo análisis de los resultados obtenidos, podemos apreciar como logramos obtener el modelo insulina-glucosa de cada paciente. ABSTRACT. The diabetes mellitus is a metabolic disorder, characterized by the low or none insulin production (a hormone produced by the pancreas), as a result of the malfunctioning of the endocrine pancreas part or by an increasing resistance of the organism to this hormone. This implies that, after the digestive process, the food we consume is transformed into smaller chemical compounds, through the exocrine tissues. The absence or limited effectiveness of this polypeptide hormone, does not allow to metabolize the ingested carbohydrates provoking two consequences: Increase of the glucose concentration in blood, as the cells are unable to metabolize it; fatty acid intake through the liver, releasing ketone bodies to provide energy to the cells. This situation exposes the chronic patient to high blood glucose levels, named hyperglycemia, which may cause in the medium or long term multiple medical problems: ophthalmological, renal, cardiovascular, cerebrum-vascular, neurological … The diabetes represents a great public health problem and is the most common disease in the developed countries, by several factors such as the obesity or sedentary life, which facilitate the appearance of this disease. Through this project we will work with clinical experimentation data of patients with diabetes of type 1, autoimmune disease in which beta cells of the pancreas (producers of insulin) are destroyed resulting necessary the exogenous insulin administration. That said, the patient with diabetes type 1 will have to follow a treatment with insulin, administered by the subcutaneous route, adapted to his metabolic needs and to his life habits. To deal with this situation of metabolic control regulation of the patient, through an insulin therapy, we shall be using the “Endocrine Artificial Pancreas " (PEA), which consists of a bomb of insulin infusion, a constant glucose sensor, and a control algorithm in closed bow. The principal aim of the PEA is providing the patient precision, efficiency and safety regarding the normalization of the glycemic control and hypoglycemia risk reduction". The PEA establishes through subcutaneous route, consequently, the delay introduced by the insulin action, the delay of the glucose measure, as well as the mistakes introduced by the constant glucose sensors when, decalibrate, impede the employment of an algorithm of control. At this stage we must shape the patient glucose levels through predictive systems. A model is all that element or set of elements which will allow us to predict the behavior of a system by introducing input variables. Thus what we obtain, is a prediction of the future stages in which it is possible to find the patient glucose level, being served of input insulin, ingestion and glucose variables already known, for being the ones happened previously in the time. When we use the glucose predictor, using obtained real time parameters, the controller is capable of indicating the future level of the glucose for the decision capture CL controller. The predictors that are being used nowadays in the PEA are not working correctly for the amount of information and variables that it need to handle. Data Mining, also indexed as Knowledge Discovery in Databases or KDD, has been defined as the not trivial extraction process of implicit information, previously unknown and potentially useful. All this, using the following phases of the knowledge extraction process: selection of information, pre- processing, transformation, data mining, results interpretation, evaluation and knowledge acquisition. With all this process we seek to generate the unique insulin glucose model that adjusts individually and in a personalized way for each patient form and being capable, at the same time, of predicting the future conditions with real time calculations, across few input parameters. This project of end of grade seeks to extract the information contained in a database of type 1 diabetics patients, obtained from clinical experimentation. For it, we will use technologies of Data Mining. For the attainment of the aim implicit to this project we have proceeded to implement a graphical interface that will guide us across the process of the KDD (with graphical and statistical information) of every point of the process. Regarding the data mining part, we have been served by a tool called WEKA's tool called, in which across Java, we control all of its functions to implement them by means of the created program. Finally granting a higher potential to the project with the possibility of implementing the service for Android devices, porting the code. Through these devices and what has been exposed in the project they might help or even create new and very useful applications for this field. As a conclusion of the project, and after an exhaustive analysis of the obtained results, we can show how we achieve to obtain the insulin–glucose model for each patient.