789 results for Multimedia Data Mining


Relevance:

80.00% 80.00%

Publisher:

Abstract:

We live in a society in which information has become vitally important. The use of the Internet and the development of new information systems have generated keen interest among companies and institutions in finding new patterns that hold the key to success. Business Analytics (BA) brings together tools, strategies and techniques for exploiting information in order to create useful knowledge within a working framework and to optimize the resources of companies and institutions. This project falls within what is known as Educational Management. It applies an architecture and working model similar to those used in recent years in the business world under Business Intelligence (BI). The aim is to improve the quality of teaching, speed up decision-making within the academic institution, strengthen the skills of the teaching staff and, ultimately, support student learning. To this end, the project follows the stages of Knowledge Discovery in Databases (KDD), one of the best-known methodologies in BI, which describes the procedure that runs from selecting the information and loading it into storage systems to applying data mining techniques to obtain new knowledge. The studies are based on records of user activity on Moodle, the e-learning platform (Learning Management System, LMS) of the Universidad Politécnica de Madrid. The raw database is extracted and pre-processed, and data mining techniques are then applied. One of the most important factors when applying data mining techniques is the type of information to be processed. For this reason, the work uses Educational Data Mining (EDM), which applies mining techniques optimized for the information generated in educational environments. Among the possibilities that EDM offers, the studies focus on predictive analytics. The main objective is to determine how student-platform interactions influence final grades and to discover new rules describing behaviours that help teachers distinguish whether a student is likely to pass or fail the course, so that measures can be taken to improve the student's performance. All the information processed in this project was anonymized beforehand to protect the privacy of the participants in the study.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Stream mining is a set of cutting-edge techniques designed to process data streams in real time in order to extract knowledge. In the particular case of classification, stream mining has to adapt its behaviour to volatile underlying data distributions, a phenomenon known as concept drift. Concept drift may lead to situations where predictive models become invalid and therefore have to be updated to represent the concepts currently present in the data. In this context, there is a specific type of concept drift, known as recurrent concept drift, in which the concepts represented by the data have already appeared in the past. In those cases the learning process can be avoided, or at least minimized, by applying a previously trained model. This could be extremely useful in ubiquitous environments, which are characterized by resource-constrained devices. To deal with this scenario, meta-models can be used to enhance the drift detection mechanisms used by data stream algorithms, by representing and predicting when the change will occur. There are real-world situations where a concept reappears, as in intrusion detection systems (IDS), where the same incidents, or adaptations of them, usually reappear over time. In these environments, early prediction of drift through better knowledge of past models can help to anticipate the change, thus reducing the number of training instances the model needs. Using meta-models as a recurrent drift detection mechanism also opens up the ability to share concept representations among different data mining processes. Such exchanges could improve the accuracy of the resulting local model, as that model may benefit from patterns similar to the local concept that were observed in other scenarios but not yet locally.
They would also make more efficient use of training instances during classification, since the exchange of models would help in applying already trained recurrent models that have previously been seen by any of the collaborating devices. In other words, the scope of recurrence detection and representation is broadened. In fact, the detection, representation and exchange of concept drift patterns would be extremely useful for law enforcement activities against cybercrime. Since information exchange is one of the main pillars of cooperation, national units would benefit from the experience and knowledge gained by third parties. Moreover, in the specific field of critical infrastructure protection it is crucial to have information exchange mechanisms at both the strategic and the technical level. The exchange of concept drift detection schemes in cybersecurity environments would aid in preventing, detecting and effectively responding to threats in cyberspace. Furthermore, as a complement to meta-models, a mechanism to assess the similarity between classification models is also needed when dealing with recurrent concepts. When a previously trained model is reused, a rough comparison between concepts is usually made by applying Boolean logic. Introducing fuzzy logic comparisons between models could lead to more efficient reuse of previously seen concepts, by applying not just identical models but also similar ones. This work addresses these open issues by means of: the MMPRec system, which integrates a meta-model mechanism and a fuzzy similarity function; a collaborative environment for sharing meta-models between different devices; and a recurrent drift generator that makes it possible to test the usefulness of recurrent drift systems such as MMPRec. The thesis also presents an experimental validation of the proposed contributions using synthetic and real datasets.
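The contrast between Boolean and graded comparison of models can be illustrated with a minimal sketch. This is not the MMPRec similarity function, which the abstract does not describe; it assumes that similarity is measured as the fraction of recent instances on which two classifiers agree, and the names `agreement`, `pick_reusable` and `threshold` are hypothetical.

```python
def agreement(model_a, model_b, instances):
    """Fraction of instances on which the two models predict the same label."""
    if not instances:
        raise ValueError("need at least one instance")
    same = sum(1 for x in instances if model_a(x) == model_b(x))
    return same / len(instances)


def pick_reusable(current, stored, instances, threshold=0.9):
    """Return the name of the most similar stored model, or None.

    A Boolean comparison corresponds to threshold=1.0 (only identical
    behaviour counts); a lower threshold also accepts similar models.
    """
    best_name, best_sim = None, 0.0
    for name, model in stored.items():
        sim = agreement(current, model, instances)
        if sim > best_sim:
            best_name, best_sim = name, sim
    return best_name if best_sim >= threshold else None


# Hypothetical models: simple threshold classifiers over integers.
instances = list(range(10))
current = lambda x: x >= 5
stored = {"old_a": lambda x: x >= 6, "old_b": lambda x: x < 5}
```

With these toy models, `old_a` disagrees with `current` on a single instance, so it is reused at a threshold of 0.9 but rejected under strict equality.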

Relevance:

80.00% 80.00%

Publisher:

Abstract:

This Final Degree Project (TFG) is one of the outcomes of a project privately funded by Telefónica, consisting of the development and subsequent deployment of a system for mining data about companies present on the Internet. It stems from a project that the AICU-LABS (Mercator) research group at the Universidad Politécnica de Madrid (UPM) has carried out for Telefónica, and its main element is the development of web agents (also called software robots, "softbots", "crawlers" or "web spiders") able to obtain company data over the Internet from the companies' CIF (tax identification) numbers alone. The list of companies is provided by Telefónica and is made up of companies that are not currently Telefónica customers. Our task is to provide the data needed (mainly each company's telephone number, email address and postal address) to build a database of potential customers. To carry out this task, we have developed an application that, starting from the CIF numbers supplied, searches the Internet for information and extracts the data of interest. We have also developed data validation systems that discard invalid data and rank the remaining data by quality, in order to maximize the quality of the data produced by the robot. Data are searched for both in online databases and, where they can be located, on the companies' own websites.
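The extraction and validation step described above might look roughly like the following sketch. It is not the project's code: the regular expressions, the assumption that phone numbers are nine-digit Spanish numbers with an optional +34 prefix, and the `quality` scoring rule are all illustrative assumptions.

```python
import re

# Assumption: a deliberately simple email pattern, not a full RFC 5322 parser.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Assumption: nine-digit Spanish numbers starting with 6-9, optional +34 prefix,
# digits optionally grouped in threes by spaces, dots or hyphens.
PHONE_RE = re.compile(
    r"(?<!\d)(?:\+34[\s.-]?)?([6-9]\d{2})[\s.-]?(\d{3})[\s.-]?(\d{3})\b"
)


def extract_contacts(text):
    """Return de-duplicated, normalized emails and phone numbers found in text."""
    emails = sorted({m.lower() for m in EMAIL_RE.findall(text)})
    phones = sorted({"".join(groups) for groups in PHONE_RE.findall(text)})
    return {"emails": emails, "phones": phones}


def quality(record):
    """Hypothetical quality score: one point per kind of contact data found."""
    return int(bool(record["emails"])) + int(bool(record["phones"]))


page = "Contacto: Info@Example.com, tel. +34 912 345 678"
record = extract_contacts(page)
```

Records can then be sorted by `quality` so that pages yielding both an email address and a phone number rank above those yielding only one.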

Relevance:

80.00% 80.00%

Publisher:

Abstract:

This thesis presents the design and application of a methodology for determining the parameters for planning logistics nodes and infrastructure in a territory. It also considers their impact on the different territorial components, as well as on population development, economic development and the environment, and thus represents a step forward in comprehensive territorial planning. The proposed methodology is based on Data Mining, which allows the discovery of patterns in large volumes of previously processed data. The particular characteristics of data about a territory and its components make territorial studies an ideal field for applying certain Data Mining techniques, such as decision trees and Bayesian networks. Decision trees represent and categorize, in schematic form, a set of predictor variables that help in analysing a target variable. Bayesian networks represent, as a directed acyclic graph, a probabilistic model of variables arranged as parents and children, together with the statistical inference that makes it possible to determine the probability that a given hypothesis is true; in other words, they allow joint probability models to be built that show graphically the relevant dependencies in a data set. As with decision trees, the division of the territory into administrative units makes Bayesian networks a potential tool for defining the physical characteristics of a specific type of logistics infrastructure, taking into account the territorial, population and economic characteristics of the area where it is to be developed and the possible synergies with other logistics nodes and infrastructure.
The case study selected for applying the methodology is the Republic of Panama, a country with some singular characteristics, notably: a high concentration of population in Panama City, which in turn has concentrated the country's economic activity; a high percentage of protected areas, which has limited the structuring of the territory; and the Panama Canal and the container ports adjacent to it. The methodology is divided into three main phases: Phase 1: Definition of the working scenario. 1. Review of the state of the art. 2. Determination and collection of the study variables. Phase 2: Development of the artificial intelligence model. 3. Construction of the decision trees. 4. Construction of the Bayesian networks. Phase 3: Conclusions. 5. Determination of the conclusions. Applying the methodology to the case study yielded a model of 47 variables that define logistics planning for Panama; the remaining variables are defined from these, that is, once these are known, the rest follow from them. This planning model, established through the Bayesian network, considers the aspects of sustainable planning (economic, social and environmental) that create synergy with the planning of logistics nodes and infrastructure.
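The joint probability model that a Bayesian network encodes can be written with the standard factorization over the directed acyclic graph. The three-variable example is purely illustrative: the variables $D$ (population density), $E$ (economic activity) and $N$ (size of a logistics node) and the arcs between them are assumptions, not the 47-variable model of the thesis.

```latex
% General factorization: each variable depends only on its parents in the graph.
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)

% Hypothetical example with arcs D -> E, D -> N and E -> N:
P(D, E, N) = P(D)\, P(E \mid D)\, P(N \mid D, E)
```

Once the conditional tables are known, any query, such as the probability of a given node size for an observed density, follows from this product by summing over the unobserved variables.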

Relevance:

80.00% 80.00%

Publisher:

Abstract:

This thesis is the result of a project whose objective was to develop and deploy a dashboard for sentiment analysis of football on Twitter, based on web components and D3.js. To that end, a visualisation server was developed to present the data obtained from Twitter and analysed with Senpy. The visualisation server was built with Polymer web components and D3.js. Data mining was carried out with a pipeline between Twitter, Senpy and ElasticSearch. Luigi was used in this process because it helps build complex pipelines of batch jobs; the pipeline analysed all the tweets and stored them in ElasticSearch. D3.js was then used to create interactive widgets that make the data easily accessible; these widgets allow users to interact with them and filter the data that interest them most. Polymer web components were used to make the dashboard conform to Google's Material Design and to show dynamic data in widgets. As a result, this project allows an extensive analysis of the social network, highlighting the influence of players and teams and the emotions and sentiments that emerge over a period of time.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

This dissertation maps the use of information systems theories using information retrieval techniques and data and text mining methodologies. The theories addressed were Transaction Cost Economics (TCE), the Resource-Based View of the Firm (RBV) and Institutional Theory (IT), chosen because they are highly relevant to studies of investment allocation and implementation in information systems. The data set was the textual content (in English) of the abstracts and literature reviews of articles published in the journals Information Systems Research (ISR), Management Information Systems Quarterly (MISQ) and Journal of Management Information Systems (JMIS) between 2000 and 2008. The results of text mining combined with data mining were compared with the EBSCO advanced search tool and showed greater efficiency in identifying content. Articles grounded in the three theories accounted for 10% of all articles in the three journals, and the most prolific publication years were 2001 and 2007. (AU)

Relevance:

80.00% 80.00%

Publisher:

Abstract:

The Biomolecular Interaction Network Database (BIND; http://binddb.org) is a database designed to store full descriptions of interactions, molecular complexes and pathways. Development of the BIND 2.0 data model has led to the incorporation of virtually all components of molecular mechanisms including interactions between any two molecules composed of proteins, nucleic acids and small molecules. Chemical reactions, photochemical activation and conformational changes can also be described. Everything from small molecule biochemistry to signal transduction is abstracted in such a way that graph theory methods may be applied for data mining. The database can be used to study networks of interactions, to map pathways across taxonomic branches and to generate information for kinetic simulations. BIND anticipates the coming large influx of interaction information from high-throughput proteomics efforts including detailed information about post-translational modifications from mass spectrometry. Version 2.0 of the BIND data model is discussed as well as implementation, content and the open nature of the BIND project. The BIND data specification is available as ASN.1 and XML DTD.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

One challenge presented by large-scale genome sequencing efforts is effective display of uniform information to the scientific community. The Comprehensive Microbial Resource (CMR) contains robust annotation of all complete microbial genomes and allows for a wide variety of data retrievals. The bacterial information has been placed on the Web at http://www.tigr.org/CMR for retrieval using standard web browsing technology. Retrievals can be based on protein properties such as molecular weight or hydrophobicity, GC-content, functional role assignments and taxonomy. The CMR also has special web-based tools to allow data mining using pre-run homology searches, whole genome dot-plots, batch downloading and traversal across genomes using a variety of datatypes.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

We set out to define patterns of gene expression during kidney organogenesis by using high-density DNA array technology. Expression analysis of 8,740 rat genes revealed five discrete patterns or groups of gene expression during nephrogenesis. Group 1 consisted of genes with very high expression in the early embryonic kidney, many with roles in protein translation and DNA replication. Group 2 consisted of genes that peaked in midembryogenesis and contained many transcripts specifying proteins of the extracellular matrix. Many additional transcripts allied with groups 1 and 2 had known or proposed roles in kidney development and included LIM1, POD1, GFRA1, WT1, BCL2, Homeobox protein A11, timeless, pleiotrophin, HGF, HNF3, BMP4, TGF-α, TGF-β2, IGF-II, met, FGF7, BMP4, and ganglioside-GD3. Group 3 consisted of transcripts that peaked in the neonatal period and contained a number of retrotransposon RNAs. Group 4 contained genes that steadily increased in relative expression levels throughout development, including many genes involved in energy metabolism and transport. Group 5 consisted of genes with relatively low levels of expression throughout embryogenesis but with markedly higher levels in the adult kidney; this group included a heterogeneous mix of transporters, detoxification enzymes, and oxidative stress genes. The data suggest that the embryonic kidney is committed to cellular proliferation and morphogenesis early on, followed sequentially by extracellular matrix deposition and acquisition of markers of terminal differentiation. The neonatal burst of retrotransposon mRNA was unexpected and may play a role in a stress response associated with birth. Custom analytical tools were developed including “The Equalizer” and “eBlot,” which contain improved methods for data normalization, significance testing, and data mining.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

As global life expectancy rises, the likelihood of natural accidents and physical trauma in everyday life increases, which drives up the demand for rehabilitation. Physical therapy under the paradigm of robotic rehabilitation with serious games offers greater patient motivation and engagement with treatment; its use has been recommended by the American Heart Association (AHA), which gave it its highest rating (Level A) for inpatients and outpatients. However, the analytical potential of the data collected by the robotic devices involved is little explored, so information that could be of great value for treatment goes unextracted. This work focuses on applying knowledge discovery techniques to classify the performance of patients diagnosed with chronic hemiparesis. The patients were placed in a robotic rehabilitation environment using the InMotion ARM, a robotic device for upper-limb rehabilitation and performance data collection. A knowledge discovery in databases workflow was applied to the data, comprising pre-processing, transformation (feature extraction) and then data mining with machine learning algorithms. The strategy culminated in a pattern classifier able to distinguish hemiparetic sides with a precision of 94%, using eight attributes as input. Interpreting this set of attributes, it was observed that force data are the most significant, making up half of the composition of a sample.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Owing to the growing volume of data being processed and the increasing need for high-performance computing, significant changes are taking place in computer architecture design. As a result, there has been a shift from the sequential to the parallel paradigm, with hundreds or thousands of processing cores on a single chip. In this context, power management becomes increasingly important, especially in embedded systems, which are usually battery-powered. According to Moore's Law, processor performance doubles every 18 months, whereas battery capacity doubles only every 10 years. This creates an enormous gap, which can be mitigated by using heterogeneous multi-core architectures. A fundamental challenge that remains open for these architectures is integrating embedded code development, scheduling and hardware for power management. The general objective of this doctoral work is to investigate techniques for optimizing the performance/energy-consumption trade-off in single-ISA heterogeneous multi-core architectures implemented on FPGA. To that end, solutions were sought that achieve the best possible performance at optimal energy consumption. This was done by combining data mining for the analysis of thread-based software with traditional power management techniques, such as dynamic way-shutdown, and a new heterogeneity-aware scheduling policy. The main contributions are the combination of power management techniques at several levels (hardware, scheduling and compilation) and a scheduling policy integrated with a multi-core architecture that is heterogeneous with respect to L1 cache size.
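The size of the gap can be estimated with a short calculation, taking the two doubling periods stated in the abstract at face value (18 months for processor performance, 10 years for battery capacity) and assuming steady exponential growth over a 10-year window.

```latex
% Growth factors over t = 120 months (10 years)
\text{performance: } 2^{120/18} = 2^{6.67} \approx 101.6
\qquad
\text{battery capacity: } 2^{120/120} = 2

% Ratio between the two growth factors
\frac{2^{120/18}}{2^{120/120}} = 2^{5.67} \approx 50.8
```

Under these assumptions, performance grows roughly a hundredfold in a decade while battery capacity merely doubles, leaving a gap of about fifty times.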

Relevance:

80.00% 80.00%

Publisher:

Abstract:

This Master's thesis puts Data Mining into practice through a CRM (Customer Relationship Management) study that analyses the purchasing behaviour of the customers of a company that sells exclusively online. The business is Spanish, and these analyses chiefly reveal how many types of customer it has and what their purchasing habits are, so that they can be classified. To do so, we use RFM (Recency, Frequency, Monetary) segmentation, computed with two methodologies: the Conventional Method and the 2-Tuple Method. In the first method, we perform a numerical classification using quintiles numbered 1 to 5 for each of Recency, Frequency and Monetary Value, from which the purchasing behaviour of each customer can be determined. The second method gives a more precise and more detailed classification of customers, with the advantage of providing a linguistic value that makes it easier to understand which cluster each customer belongs to. Finally, we carry out cluster analyses using the k-means method with different numbers of segments (between 5 and 7), which allow us to distinguish how many types of customer the business has and how they differ in their purchasing habits. The overall aim is to tell the business how each customer behaves, which customers are the most and least important, how many have stopped buying, and so on.
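The conventional quintile scoring described above can be sketched as follows. This is a minimal illustration rather than the thesis's implementation: the data layout (a dictionary mapping each customer to a list of `(date, amount)` purchases), the function names and the rank-based handling of ties are assumptions.

```python
from datetime import date


def quintile_scores(values, higher_is_better=True):
    """Score each value from 1 to 5 by rank-based quintiles.

    Ties are broken by input order (an assumption; other conventions exist).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i], reverse=not higher_is_better)
    scores = [0] * n
    for rank, idx in enumerate(order):
        scores[idx] = rank * 5 // n + 1  # worst fifth gets 1, best fifth gets 5
    return scores


def rfm_scores(customers, today):
    """Map each customer id to an (R, F, M) tuple of 1-5 scores."""
    ids = list(customers)
    recency = [(today - max(d for d, _ in customers[c])).days for c in ids]
    frequency = [len(customers[c]) for c in ids]
    monetary = [sum(a for _, a in customers[c]) for c in ids]
    r = quintile_scores(recency, higher_is_better=False)  # fewer days is better
    f = quintile_scores(frequency)
    m = quintile_scores(monetary)
    return {c: (r[i], f[i], m[i]) for i, c in enumerate(ids)}


# Hypothetical purchase histories for five customers.
customers = {
    "c1": [(date(2024, 6, 29), 100)] * 5,
    "c2": [(date(2024, 5, 1), 50)] * 4,
    "c3": [(date(2024, 3, 1), 40)] * 3,
    "c4": [(date(2024, 1, 1), 30)] * 2,
    "c5": [(date(2023, 1, 1), 10)],
}
scores = rfm_scores(customers, today=date(2024, 6, 30))
```

A customer scoring (5, 5, 5) bought recently, often and for a high total, while (1, 1, 1) marks a customer who has probably stopped buying; the resulting triples could then be fed to a clustering step such as k-means.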

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Geographic knowledge discovery (GKD) is the process of extracting information and knowledge from massive georeferenced databases. The process is usually carried out by two different systems: Geographic Information Systems (GIS) and data mining engines. However, developing those systems is a complex task because it does not follow a systematic, integrated and standard methodology. To overcome these pitfalls, in this paper we propose a modeling framework that addresses the development of the different parts of a multilayer GKD process. The main advantages of our framework are that (i) it reduces the design effort, (ii) it improves the quality of the resulting systems, (iii) it is platform-independent, (iv) it facilitates the use of data mining techniques on georeferenced data, and (v) it improves communication between different users.