9 resultados para data complexity
em Universidad Politécnica de Madrid
Resumo:
Valoración de la transferencia temporal de los modelos de distribución de especies para su aplicación en nuestros días utilizando datos paleobotánicos Corilus avellana y Alnus glutinosa.
Resumo:
Data grid services have been used to deal with the increasing needs of applications in terms of data volume and throughput. The large scale, heterogeneity and dynamism of grid environments often make management and tuning of these data services very complex. Furthermore, current high-performance I/O approaches are characterized by their high complexity and specific features that usually require specialized administrator skills. Autonomic computing can help manage this complexity. The present paper describes an autonomic subsystem intended to provide self-management features aimed at efficiently reducing the I/O problem in a grid environment, thereby enhancing the quality of service (QoS) of data access and storage services in the grid. Our proposal takes into account that data produced in an I/O system is not usually immediately required. Therefore, performance improvements are related not only to current but also to any future I/O access, as the actual data access usually occurs later on. Nevertheless, the exact time of the next I/O operations is unknown. Thus, our approach proposes a long-term prediction designed to forecast the future workload of grid components. This enables the autonomic subsystem to determine the optimal data placement to improve both current and future I/O operations.
Resumo:
During the last years cities around the world have invested important quantities of money in measures for reducing congestion and car-trips. Investments which are nothing but potential solutions for the well-known urban sprawl phenomenon, also called the “development trap” that leads to further congestion and a higher proportion of our time spent in slow moving cars. Over the path of this searching for solutions, the complex relationship between urban environment and travel behaviour has been studied in a number of cases. The main question on discussion is, how to encourage multi-stop tours? Thus, the objective of this paper is to verify whether unobserved factors influence tour complexity. For this purpose, we use a data-base from a survey conducted in 2006-2007 in Madrid, a suitable case study for analyzing urban sprawl due to new urban developments and substantial changes in mobility patterns in the last years. A total of 943 individuals were interviewed from 3 selected neighbourhoods (CBD, urban and suburban). We study the effect of unobserved factors on trip frequency. This paper present the estimation of an hybrid model where the latent variable is called propensity to travel and the discrete choice model is composed by 5 alternatives of tour type. The results show that characteristics of the neighbourhoods in Madrid are important to explain trip frequency. The influence of land use variables on trip generation is clear and in particular the presence of commercial retails. Through estimation of elasticities and forecasting we determine to what extent land-use policy measures modify travel demand. Comparing aggregate elasticities with percentage variations, it can be seen that percentage variations could lead to inconsistent results. The result shows that hybrid models better explain travel behavior than traditional discrete choice models.
Resumo:
This paper reports on an innovative approach that aims to reduce information management costs in data-intensive and cognitively-complex biomedical environments. Recognizing the importance of prominent high-performance computing paradigms and large data processing technologies as well as collaboration support systems to remedy data-intensive issues, it adopts a hybrid approach by building on the synergy of these technologies. The proposed approach provides innovative Web-based workbenches that integrate and orchestrate a set of interoperable services that reduce the data-intensiveness and complexity overload at critical decision points to a manageable level, thus permitting stakeholders to be more productive and concentrate on creative activities.
Resumo:
The Semantics Difficulty Model (SDM) is a model that measures the difficult of introducing semantics technology into a company. SDM manages three descriptions of stages, which we will refer to as ?snapshots?: a company semantic snapshot, data snapshot and semantic application snapshot. Understanding a priory the complexity of introducing semantics into a company is important because it allows the organization to take early decisions, thus saving time and money, mitigating risks and improving innovation, time to market and productivity. SDM works by measuring the distance between each initial snapshot and its reference models (the company semantic snapshots reference model, data snapshots reference model, and the semantic application snapshots reference model) with Euclidian distances. The difficulty level will be "not at all difficult" when the distance is small, and becomes "extremely difficult" when the the distance is large. SDM has been tested experimentally with 2000 simulated companies with arrangements and several initial stages. The output is measured by five linguistic values: "not at all difficult, slightly difficult, averagely difficult, very difficult and extremely difficult". As the preliminary results of our SDM simulation model indicate, transforming a search application into integrated data from different sources with semantics is a "slightly difficult", in contrast with data and opinion extraction applications for which it is "very difficult".
Resumo:
There is an increasing tendency of turning the current power grid, essentially unaware of variations in electricity demand and scattered energy sources, into something capable of bringing a degree of intelligence by using tools strongly related to information and communication technologies, thus turning into the so-called Smart Grid. In fact, it could be considered that the Smart Grid is an extensive smart system that spreads throughout any area where power is required, providing a significant optimization in energy generation, storage and consumption. However, the information that must be treated to accomplish these tasks is challenging both in terms of complexity (semantic features, distributed systems, suitable hardware) and quantity (consumption data, generation data, forecasting functionalities, service reporting), since the different energy beneficiaries are prone to be heterogeneous, as the nature of their own activities is. This paper presents a proposal on how to deal with these issues by using a semantic middleware architecture that integrates different components focused on specific tasks, and how it is used to handle information at every level and satisfy end user requests.
Resumo:
Probabilistic graphical models are a huge research field in artificial intelligence nowadays. The scope of this work is the study of directed graphical models for the representation of discrete distributions. Two of the main research topics related to this area focus on performing inference over graphical models and on learning graphical models from data. Traditionally, the inference process and the learning process have been treated separately, but given that the learned models structure marks the inference complexity, this kind of strategies will sometimes produce very inefficient models. With the purpose of learning thinner models, in this master thesis we propose a new model for the representation of network polynomials, which we call polynomial trees. Polynomial trees are a complementary representation for Bayesian networks that allows an efficient evaluation of the inference complexity and provides a framework for exact inference. We also propose a set of methods for the incremental compilation of polynomial trees and an algorithm for learning polynomial trees from data using a greedy score+search method that includes the inference complexity as a penalization in the scoring function.
Resumo:
La gran cantidad de datos que se registran diariamente en los sistemas de base de datos de las organizaciones ha generado la necesidad de analizarla. Sin embargo, se enfrentan a la complejidad de procesar enormes volúmenes de datos a través de métodos tradicionales de análisis. Además, dentro de un contexto globalizado y competitivo las organizaciones se mantienen en la búsqueda constante de mejorar sus procesos, para lo cual requieren herramientas que les permitan tomar mejores decisiones. Esto implica estar mejor informado y conocer su historia digital para describir sus procesos y poder anticipar (predecir) eventos no previstos. Estos nuevos requerimientos de análisis de datos ha motivado el desarrollo creciente de proyectos de minería de datos. El proceso de minería de datos busca obtener desde un conjunto masivo de datos, modelos que permitan describir los datos o predecir nuevas instancias en el conjunto. Implica etapas de: preparación de los datos, procesamiento parcial o totalmente automatizado para identificar modelos en los datos, para luego obtener como salida patrones, relaciones o reglas. Esta salida debe significar un nuevo conocimiento para la organización, útil y comprensible para los usuarios finales, y que pueda ser integrado a los procesos para apoyar la toma de decisiones. Sin embargo, la mayor dificultad es justamente lograr que el analista de datos, que interviene en todo este proceso, pueda identificar modelos lo cual es una tarea compleja y muchas veces requiere de la experiencia, no sólo del analista de datos, sino que también del experto en el dominio del problema. Una forma de apoyar el análisis de datos, modelos y patrones es a través de su representación visual, utilizando las capacidades de percepción visual del ser humano, la cual puede detectar patrones con mayor facilidad. Bajo este enfoque, la visualización ha sido utilizada en minería datos, mayormente en el análisis descriptivo de los datos (entrada) y en la presentación de los patrones (salida), dejando limitado este paradigma para el análisis de modelos. El presente documento describe el desarrollo de la Tesis Doctoral denominada “Nuevos Esquemas de Visualizaciones para Mejorar la Comprensibilidad de Modelos de Data Mining”. Esta investigación busca aportar con un enfoque de visualización para apoyar la comprensión de modelos minería de datos, para esto propone la metáfora de modelos visualmente aumentados. ABSTRACT The large amount of data to be recorded daily in the systems database of organizations has generated the need to analyze it. However, faced with the complexity of processing huge volumes of data over traditional methods of analysis. Moreover, in a globalized and competitive environment organizations are kept constantly looking to improve their processes, which require tools that allow them to make better decisions. This involves being bettered informed and knows your digital story to describe its processes and to anticipate (predict) unanticipated events. These new requirements of data analysis, has led to the increasing development of data-mining projects. The data-mining process seeks to obtain from a massive data set, models to describe the data or predict new instances in the set. It involves steps of data preparation, partially or fully automated processing to identify patterns in the data, and then get output patterns, relationships or rules. This output must mean new knowledge for the organization, useful and understandable for end users, and can be integrated into the process to support decision-making. However, the biggest challenge is just getting the data analyst involved in this process, which can identify models is complex and often requires experience not only of the data analyst, but also the expert in the problem domain. One way to support the analysis of the data, models and patterns, is through its visual representation, i.e., using the capabilities of human visual perception, which can detect patterns easily in any context. Under this approach, the visualization has been used in data mining, mostly in exploratory data analysis (input) and the presentation of the patterns (output), leaving limited this paradigm for analyzing models. This document describes the development of the doctoral thesis entitled "New Visualizations Schemes to Improve Understandability of Data-Mining Models". This research aims to provide a visualization approach to support understanding of data mining models for this proposed metaphor visually enhanced models.
Resumo:
PURPOSE The decision-making process plays a key role in organizations. Every decision-making process produces a final choice that may or may not prompt action. Recurrently, decision makers find themselves in the dichotomous question of following a traditional sequence decision-making process where the output of a decision is used as the input of the next stage of the decision, or following a joint decision-making approach where several decisions are taken simultaneously. The implication of the decision-making process will impact different players of the organization. The choice of the decision- making approach becomes difficult to find, even with the current literature and practitioners’ knowledge. The pursuit of better ways for making decisions has been a common goal for academics and practitioners. Management scientists use different techniques and approaches to improve different types of decisions. The purpose of this decision is to use the available resources as well as possible (data and techniques) to achieve the objectives of the organization. The developing and applying of models and concepts may be helpful to solve managerial problems faced every day in different companies. As a result of this research different decision models are presented to contribute to the body of knowledge of management science. The first models are focused on the manufacturing industry and the second part of the models on the health care industry. Despite these models being case specific, they serve the purpose of exemplifying that different approaches to the problems and could provide interesting results. Unfortunately, there is no universal recipe that could be applied to all the problems. Furthermore, the same model could deliver good results with certain data and bad results for other data. A framework to analyse the data before selecting the model to be used is presented and tested in the models developed to exemplify the ideas. METHODOLOGY As the first step of the research a systematic literature review on the joint decision is presented, as are the different opinions and suggestions of different scholars. For the next stage of the thesis, the decision-making process of more than 50 companies was analysed in companies from different sectors in the production planning area at the Job Shop level. The data was obtained using surveys and face-to-face interviews. The following part of the research into the decision-making process was held in two application fields that are highly relevant for our society; manufacturing and health care. The first step was to study the interactions and develop a mathematical model for the replenishment of the car assembly where the problem of “Vehicle routing problem and Inventory” were combined. The next step was to add the scheduling or car production (car sequencing) decision and use some metaheuristics such as ant colony and genetic algorithms to measure if the behaviour is kept up with different case size problems. A similar approach is presented in a production of semiconductors and aviation parts, where a hoist has to change from one station to another to deal with the work, and a jobs schedule has to be done. However, for this problem simulation was used for experimentation. In parallel, the scheduling of operating rooms was studied. Surgeries were allocated to surgeons and the scheduling of operating rooms was analysed. The first part of the research was done in a Teaching hospital, and for the second part the interaction of uncertainty was added. Once the previous problem had been analysed a general framework to characterize the instance was built. In the final chapter a general conclusion is presented. FINDINGS AND PRACTICAL IMPLICATIONS The first part of the contributions is an update of the decision-making literature review. Also an analysis of the possible savings resulting from a change in the decision process is made. Then, the results of the survey, which present a lack of consistency between what the managers believe and the reality of the integration of their decisions. In the next stage of the thesis, a contribution to the body of knowledge of the operation research, with the joint solution of the replenishment, sequencing and inventory problem in the assembly line is made, together with a parallel work with the operating rooms scheduling where different solutions approaches are presented. In addition to the contribution of the solving methods, with the use of different techniques, the main contribution is the framework that is proposed to pre-evaluate the problem before thinking of the techniques to solve it. However, there is no straightforward answer as to whether it is better to have joint or sequential solutions. Following the proposed framework with the evaluation of factors such as the flexibility of the answer, the number of actors, and the tightness of the data, give us important hints as to the most suitable direction to take to tackle the problem. RESEARCH LIMITATIONS AND AVENUES FOR FUTURE RESEARCH In the first part of the work it was really complicated to calculate the possible savings of different projects, since in many papers these quantities are not reported or the impact is based on non-quantifiable benefits. The other issue is the confidentiality of many projects where the data cannot be presented. For the car assembly line problem more computational power would allow us to solve bigger instances. For the operation research problem there was a lack of historical data to perform a parallel analysis in the teaching hospital. In order to keep testing the decision framework it is necessary to keep applying more case studies in order to generalize the results and make them more evident and less ambiguous. The health care field offers great opportunities since despite the recent awareness of the need to improve the decision-making process there are many opportunities to improve. Another big difference with the automotive industry is that the last improvements are not spread among all the actors. Therefore, in the future this research will focus more on the collaboration between academia and the health care sector.