Abstract:
This paper describes seagrass species and percentage cover point-based field data sets derived from georeferenced photo transects. Annually or biannually over a ten-year period (2004-2015), data sets were collected using 30-50 transects, 500-800 m in length, distributed across a 142 km² shallow, clear-water seagrass habitat, the Eastern Banks, Moreton Bay, Australia. Each of the eight data sets includes seagrass property information derived from approximately 3000 georeferenced, downward-looking photographs captured at 2-4 m intervals along the transects. Photographs were manually interpreted to estimate seagrass species composition and percentage cover (Coral Point Count with Excel extensions; CPCe). Understanding seagrass biology, ecology and dynamics for scientific and management purposes requires point-based data on species composition and cover. This data set, and the methods used to derive it, are a globally unique example for seagrass ecological applications. It provides the basis for multiple further studies at this site, for regional to global comparative studies, and for the design of similar monitoring programs elsewhere.
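As an illustration of how such point annotations can be aggregated, the following is a minimal Python sketch that turns per-photo point labels into percentage cover and species composition. The column names, file name and species set are hypothetical, not the published data format.

```python
# Minimal sketch: aggregating CPCe-style point annotations into per-photo
# percentage cover and species composition. Column names, file name and the
# species set are hypothetical, not the published data format.
import pandas as pd

points = pd.read_csv("photo_points.csv")   # one row per annotated point
SEAGRASS = ["Zostera muelleri", "Cymodocea serrulata", "Halophila ovalis"]

def summarise_photo(group):
    on_seagrass = group["label"].isin(SEAGRASS)
    out = {"n_points": len(group),
           "percent_cover": 100.0 * on_seagrass.mean()}
    for species in SEAGRASS:                      # per-species percentage of points
        out[species] = 100.0 * (group["label"] == species).mean()
    return pd.Series(out)

summary = (points.groupby(["photo_id", "lat", "lon"])
                 .apply(summarise_photo)
                 .reset_index())
print(summary.head())
```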
Abstract:
Sedimentary sequences in ancient or long-lived lakes can reach several thousand meters in thickness and often provide an unrivalled perspective on the lake's regional climatic, environmental, and biological history. Over the last few years, deep-drilling projects in ancient lakes have become increasingly multi- and interdisciplinary, as seismological, sedimentological, biogeochemical, climatic, environmental, paleontological, and evolutionary information, among others, can be obtained from sediment cores. However, these multi- and interdisciplinary projects pose several challenges. The scientists involved typically approach problems from different scientific perspectives and backgrounds, and setting up the program requires clear communication and the alignment of interests. One of the most challenging tasks, besides the actual drilling operation, is to link diverse datasets with varying resolution, data quality, and age uncertainties to answer interdisciplinary questions synthetically and coherently. These problems are especially relevant when secondary data, i.e., datasets obtained independently of the drilling operation, are incorporated in analyses. Nonetheless, the inclusion of secondary information, such as isotopic data from fossils found in outcrops or genetic data from extant species, may help to achieve synthetic answers. Recent technological and methodological advances in paleolimnology are likely to increase the possibilities of integrating secondary information. Some of the new approaches have started to revolutionize scientific drilling in ancient lakes, but at the same time they also add a new layer of complexity to the generation and analysis of sediment-core data. The enhanced opportunities presented by new scientific approaches to study the paleolimnological history of these lakes therefore come at the expense of higher logistic, communication, and analytical efforts. Here we review the types of data that can be obtained in ancient lake drilling projects and the analytical approaches that can be applied to empirically and statistically link diverse datasets to create an integrative perspective on geological and biological data. In doing so, we highlight strengths and potential weaknesses of new methods and analyses, and provide recommendations for future interdisciplinary deep-drilling projects.
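As a toy illustration of the dataset-linking problem discussed above, the sketch below resamples two hypothetical proxy records of different resolution onto a common age axis before comparing them; the records, grid and ages are invented and not taken from any drilling project.

```python
# Illustrative sketch (not from the paper): resampling two proxy records with
# different native resolutions onto a common age axis so they can be compared.
import numpy as np

# Hypothetical records: (age in kyr BP, value)
ages_a = np.array([0.0, 1.2, 2.9, 4.1, 6.0])      # coarse record
vals_a = np.array([3.1, 3.4, 2.8, 2.5, 2.9])
ages_b = np.linspace(0.0, 6.0, 61)                 # fine record
vals_b = np.sin(ages_b) + 0.1 * np.random.default_rng(0).normal(size=ages_b.size)

common_age = np.arange(0.0, 6.01, 0.5)             # shared 0.5 kyr grid
a_on_grid = np.interp(common_age, ages_a, vals_a)  # linear interpolation
b_on_grid = np.interp(common_age, ages_b, vals_b)

correlation = np.corrcoef(a_on_grid, b_on_grid)[0, 1]
print(f"correlation on common age grid: {correlation:.2f}")
```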
Abstract:
Measures have been developed to understand tendencies in the distribution of economic activity. The merit of these measures lies in the convenience of data collection and processing. In this interim report, investigating the properties of such measures for determining the geographical spread of economic activities, we summarize their merits and limitations, and make clear that caution must be applied in their usage. As a first trial with areal data, this project focuses on administrative areas, not on point data or input-output data; firm-level data are not within the scope of this article. The rest of this article is organized as follows. In Section 2, we touch on the limitations and problems associated with the measures and areal data. Specific measures are introduced in Section 3 and applied in Section 4. The conclusion summarizes the findings and discusses future work.
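By way of illustration, the sketch below computes a Herfindahl-type concentration index from areal employment shares, one simple example of the class of measures discussed here; the area names and figures are invented, and the measures examined in the report may differ.

```python
# Hedged illustration: a Herfindahl-type concentration index over
# administrative areas. The employment figures and area names are made up.
employment = {
    "Area A": 1200,
    "Area B": 300,
    "Area C": 150,
    "Area D": 50,
}

total = sum(employment.values())
shares = {name: v / total for name, v in employment.items()}

# Herfindahl index: sum of squared area shares (1/n = even spread,
# 1 = fully concentrated in a single area).
herfindahl = sum(s ** 2 for s in shares.values())
print(f"Herfindahl index: {herfindahl:.3f}")
```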
Abstract:
This study compares the innovation processes of a privately-owned enterprise and a state-owned enterprise in China using their patent data. Huawei and ZTE were selected for this study because they experienced the same historical environment, in the same industry and the same region of China, leaving their ownership types as the critical difference between them. This study investigates the difference in the R&D innovation process between a privately-owned and a state-owned enterprise by analyzing (1) domestic and international patent application patterns, (2) co-applications and co-applicants, (3) knowledge accumulation inside Huawei and ZTE, and (4) knowledge spillover to domestic and foreign firms.
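For step (2), a co-applicant analysis can be framed as a network built from patent records. The following is a hedged Python sketch using networkx; the record structure and applicant names are illustrative and not taken from the actual patent data.

```python
# Minimal sketch of a co-applicant network built from patent records;
# the records below are invented for illustration only.
import itertools
import networkx as nx

patents = [
    {"id": "CN-001", "applicants": ["Huawei", "University X"]},
    {"id": "CN-002", "applicants": ["ZTE", "Carrier Y", "University X"]},
    {"id": "CN-003", "applicants": ["Huawei"]},
]

G = nx.Graph()
for patent in patents:
    for a, b in itertools.combinations(sorted(set(patent["applicants"])), 2):
        # Edge weight counts the number of joint applications.
        w = G.get_edge_data(a, b, default={"weight": 0})["weight"]
        G.add_edge(a, b, weight=w + 1)

for a, b, data in G.edges(data=True):
    print(f"{a} -- {b}: {data['weight']} joint application(s)")
```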
Abstract:
In an effort to research and document relevant sloshing-type phenomena, a series of experiments has been conducted. The aim of this paper is to describe the setup and data processing of these experiments. A sloshing tank is subjected to angular motion; as a result, pressure records are obtained at several locations, together with the motion data, torque, and a collection of image and video information. The experimental rig and the data acquisition systems are described. Useful information for experimental sloshing research practitioners is provided, related to the liquids used in the experiments, the dyeing techniques, tank building processes, synchronization of acquisition systems, etc. A new procedure for reconstructing experimental data, which takes into account experimental uncertainties, is presented. This procedure is based on a least-squares spline approximation of the data. Based on a deterministic approach to the first sloshing wave impact event in a sloshing experiment, an uncertainty analysis procedure for the associated first pressure peak value is described.
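The least-squares spline reconstruction can be sketched as follows with SciPy; the synthetic pressure signal, knot spacing and noise level are illustrative choices, not the paper's actual settings.

```python
# Sketch of a least-squares spline fit to a noisy pressure record, in the
# spirit of the reconstruction procedure described above (signal and knot
# placement are illustrative).
import numpy as np
from scipy.interpolate import make_lsq_spline

rng = np.random.default_rng(1)
t_s = np.linspace(0.0, 2.0, 400)                       # time [s]
pressure = np.exp(-3 * t_s) * np.sin(40 * t_s) + 0.02 * rng.normal(size=t_s.size)

k = 3                                                  # cubic spline
interior = np.linspace(0.05, 1.95, 40)                 # interior knots
knots = np.r_[(t_s[0],) * (k + 1), interior, (t_s[-1],) * (k + 1)]

spline = make_lsq_spline(t_s, pressure, knots, k=k)
reconstructed = spline(t_s)
residual_rms = np.sqrt(np.mean((pressure - reconstructed) ** 2))
print(f"RMS residual: {residual_rms:.4f}")
```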
Abstract:
The synapses in the cerebral cortex can be classified into two main types, Gray's type I and type II, which correspond to asymmetric (mostly glutamatergic excitatory) and symmetric (inhibitory GABAergic) synapses, respectively. Hence, the quantification and identification of the different types of synapses, and the proportions in which they are found, are extraordinarily important in terms of brain function. The ideal approach to calculate the number of synapses per unit volume is to analyze 3D samples reconstructed from serial sections. However, obtaining serial sections by transmission electron microscopy is an extremely time-consuming and technically demanding task. Using focused ion beam/scanning electron microscopy, we recently showed that virtually all synapses can be accurately identified as asymmetric or symmetric when they are visualized, reconstructed, and quantified from large 3D tissue samples obtained in an automated manner. Nevertheless, the analysis, segmentation, and quantification of synapses are still labor-intensive procedures. Thus, novel solutions are currently necessary to deal with the large volume of data that is being generated by automated 3D electron microscopy. Accordingly, we have developed ESPINA, a software tool that performs the automated segmentation and counting of synapses in a reconstructed 3D volume of the cerebral cortex, and that greatly facilitates and accelerates these processes.
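As a toy stand-in for the counting step that ESPINA automates, the sketch below labels and counts connected objects in a synthetic 3D binary volume with scipy.ndimage; it is not ESPINA's algorithm, only an illustration of 3D object counting.

```python
# Hedged sketch: counting segmented objects in a 3D binary volume.
# The volume here is synthetic, not real microscopy data.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
volume = rng.random((64, 64, 64)) > 0.995      # sparse synthetic "detections"

# Label 26-connected components in 3D and measure their voxel counts.
structure = np.ones((3, 3, 3), dtype=int)
labels, n_objects = ndimage.label(volume, structure=structure)
sizes = ndimage.sum(volume, labels, index=range(1, n_objects + 1))

print(f"objects found: {n_objects}")
print(f"mean size (voxels): {sizes.mean():.1f}")
```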
Abstract:
Sensor networks are increasingly becoming one of the main sources of Big Data on the Web. However, the observations that they produce are made available with heterogeneous schemas, vocabularies and data formats, making it difficult to share and reuse these data for purposes other than those for which they were originally set up. In this thesis we address these challenges, considering how we can transform streaming raw data into rich ontology-based information that is accessible through continuous queries for streaming data. Our main contribution is an ontology-based approach for providing data access and query capabilities to streaming data sources, allowing users to express their needs at a conceptual level, independent of implementation and language-specific details. We introduce novel query rewriting and data translation techniques that rely on mapping definitions relating streaming data models to ontological concepts. Specific contributions include:
• The syntax and semantics of the SPARQLStream query language for ontology-based data access, and a query rewriting approach for transforming SPARQLStream queries into streaming algebra expressions.
• The design of an ontology-based streaming data access engine that can internally reuse an existing data stream engine, complex event processor or sensor middleware, using R2RML mappings for defining relationships between streaming data models and ontology concepts.
Concerning the sensor metadata of such streaming data sources, we have investigated how we can use raw measurements to characterize streaming data, producing enriched data descriptions in terms of ontological models. Our specific contributions are:
• A representation of sensor data time series that captures gradient information useful for characterizing types of sensor data.
• A method for classifying sensor data time series and determining the type of data using data mining techniques, and a method for extracting semantic sensor metadata features from the time series (a minimal sketch of this classification step follows below).
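The sketch below illustrates the time-series classification idea: simple gradient-based features are extracted from synthetic sensor series and fed to an off-the-shelf classifier. The features, series and classifier choice are illustrative assumptions, not the thesis' actual method.

```python
# Illustrative sketch (not the thesis code): gradient-based features from
# sensor time series, used to guess the type of data. Series are synthetic.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)

def features(series):
    grad = np.gradient(series)
    return [grad.mean(), grad.std(), np.abs(grad).max(), series.std()]

# Synthetic training data: smooth "temperature-like" vs. jumpy "wind-like".
X, y = [], []
for _ in range(50):
    temp = 15 + 5 * np.sin(np.linspace(0, 2 * np.pi, 144)) + rng.normal(0, 0.2, 144)
    wind = np.abs(rng.normal(4, 2, 144))
    X += [features(temp), features(wind)]
    y += ["temperature", "wind_speed"]

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
unseen = 10 + 6 * np.sin(np.linspace(0, 2 * np.pi, 144)) + rng.normal(0, 0.3, 144)
print(clf.predict([features(unseen)]))   # likely 'temperature'
```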
Abstract:
The development of new-generation intelligent vehicle technologies will lead to a better level of road safety and to CO2 emission reductions. However, the weak point of all these systems is their need for comprehensive and reliable data. For traffic data acquisition, two sources are currently available: 1) infrastructure sensors and 2) floating vehicles. The former consists of a set of fixed point detectors installed in the roads, and the latter consists of the use of mobile probe vehicles as mobile sensors. However, both systems still have some deficiencies. The infrastructure sensors retrieve information from static points of the road, which are spaced, in some cases, kilometers apart; this means that the picture they give of the actual traffic situation is incomplete. This deficiency is corrected by floating cars, which retrieve dynamic information on the traffic situation. Unfortunately, the number of floating data vehicles currently available is too small to give a complete picture of road traffic. In this paper, we present a floating car data (FCD) augmentation system that combines information from floating data vehicles and infrastructure sensors and that, by using neural networks, is capable of augmenting the amount of FCD with virtual information. This system has been implemented and tested on actual roads, and the results show little difference between the data supplied by the floating vehicles and the virtual vehicles.
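The augmentation idea can be sketched as a regression problem: a small neural network estimates the speed a virtual probe vehicle would report from nearby infrastructure-sensor readings. Everything below (features, units, the toy speed model) is synthetic and only illustrates the setup, not the paper's system.

```python
# Hedged sketch of the FCD-augmentation idea: a neural network estimates
# "virtual" probe-vehicle speed from fixed-detector readings. Data are synthetic.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(7)
n = 2000
# Features: flow and occupancy at two fixed detectors (upstream/downstream).
X = np.column_stack([
    rng.uniform(200, 2000, n),   # flow upstream [veh/h]
    rng.uniform(0.02, 0.5, n),   # occupancy upstream
    rng.uniform(200, 2000, n),   # flow downstream [veh/h]
    rng.uniform(0.02, 0.5, n),   # occupancy downstream
])
# Synthetic target: probe speed decreases with occupancy (toy relationship).
y = 110 - 90 * X[:, [1, 3]].mean(axis=1) + rng.normal(0, 3, n)

model = MLPRegressor(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0)
model.fit(X[:1500], y[:1500])
print("held-out R^2:", round(model.score(X[1500:], y[1500:]), 3))
```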
Abstract:
This paper describes the process followed in order to make some of the public meteorological data from the Agencia Estatal de Meteorología (AEMET, Spanish Meteorological Office) available as Linked Data. The method followed has already been used to publish geographical, statistical, and leisure data. The data selected for publication are generated every ten minutes by the 250 automatic stations that belong to AEMET and that are deployed across Spain. These data are available as spreadsheets in the AEMET data catalog and contain more than twenty types of measurements per station. Spreadsheets are retrieved from the website, processed with Python scripts, transformed to RDF according to an ontology network about meteorology that reuses the W3C SSN Ontology, published in a triple store, and visualized in maps with Map4rdf.
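A minimal rdflib sketch of the spreadsheet-to-RDF step is shown below. The namespace, station identifier and property names are placeholders; the actual pipeline maps the measurements to an ontology network that reuses the W3C SSN Ontology.

```python
# Minimal rdflib sketch of the spreadsheet-to-RDF step; the namespace, station
# identifier and property names are placeholders, not AEMET's actual terms.
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import XSD

EX = Namespace("http://example.org/aemet/")        # placeholder namespace

g = Graph()
g.bind("ex", EX)

obs = EX["obs/3195/2015-06-01T10:00"]              # hypothetical observation IRI
g.add((obs, RDF.type, EX.Observation))             # real pipeline aligns this with the SSN-based ontology network
g.add((obs, EX.station, EX["station/3195"]))
g.add((obs, EX.airTemperature, Literal(23.4, datatype=XSD.decimal)))
g.add((obs, EX.observedAt, Literal("2015-06-01T10:00:00", datatype=XSD.dateTime)))

print(g.serialize(format="turtle"))
```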
Abstract:
Nowadays, with the ongoing and rapid evolution of information technology and computing devices, large volumes of data are continuously collected and stored in different domains and through various real-world applications. Extracting useful knowledge from such a huge amount of data usually cannot be performed manually and requires adequate machine learning and data mining techniques. Classification is one of the most important such techniques and has been successfully applied to several areas. Roughly speaking, classification consists of two main steps: first, learn a classification model or classifier from available training data, and secondly, classify new incoming unseen data instances using the learned classifier. Classification is supervised when all class values are present in the training data (i.e., fully labeled data), semi-supervised when only some class values are known (i.e., partially labeled data), and unsupervised when all class values are missing from the training data (i.e., unlabeled data). In addition, besides this taxonomy, the classification problem can be categorized into uni-dimensional or multi-dimensional depending on the number of class variables (one or more, respectively), or into stationary or streaming depending on the characteristics of the data and the rate of change underlying it. Throughout this thesis, we deal with the classification problem under three different settings, namely supervised multi-dimensional stationary classification, semi-supervised uni-dimensional streaming classification, and supervised multi-dimensional streaming classification. To accomplish this task, we basically use Bayesian network classifiers as models.
The first contribution, addressing the supervised multi-dimensional stationary classification problem, consists of two new methods for learning multi-dimensional Bayesian network classifiers from stationary data. They are proposed from two different points of view. The first method, named CB-MBC, is based on a wrapper greedy forward selection approach, while the second one, named MB-MBC, is a filter constraint-based approach based on Markov blankets. Both methods are applied to two important real-world problems, namely the prediction of human immunodeficiency virus type 1 (HIV-1) reverse transcriptase and protease inhibitors, and the prediction of the European Quality of Life-5 Dimensions (EQ-5D) from the 39-item Parkinson's Disease Questionnaire (PDQ-39). The experimental study includes comparisons of CB-MBC and MB-MBC against state-of-the-art multi-dimensional classification methods, as well as against methods commonly used for solving the Parkinson's disease prediction problem, namely multinomial logistic regression, ordinary least squares, and censored least absolute deviations. For both case studies, results are promising in terms of classification accuracy as well as regarding the analysis of the learned MBC graphical structures, which identify known and novel interactions among variables.
The second contribution, addressing the semi-supervised uni-dimensional streaming classification problem, consists of a novel method (CPL-DS) for classifying partially labeled data streams. Data streams differ from stationary data sets in their highly rapid generation process and their concept-drifting aspect; that is, the learned concepts and/or the underlying distribution are likely to change and evolve over time, which makes the current classification model out of date and requires it to be updated. CPL-DS uses the Kullback-Leibler divergence and bootstrapping to quantify and detect three possible kinds of drift: feature, conditional, or dual. Then, if any drift occurs, a new classification model is learned using the expectation-maximization algorithm; otherwise, the current classification model is kept unchanged. CPL-DS is general, as it can be applied to several classification models. Using two different models, namely the naive Bayes classifier and logistic regression, CPL-DS is tested with synthetic data streams and applied to the real-world problem of malware detection, in which newly received files must be continuously classified as malware or goodware. Experimental results show that our approach is effective for detecting different kinds of drift from partially labeled data streams and achieves good classification performance.
Finally, the third contribution, addressing the supervised multi-dimensional streaming classification problem, consists of two adaptive methods, namely Locally Adaptive-MB-MBC (LA-MB-MBC) and Globally Adaptive-MB-MBC (GA-MB-MBC). Both methods monitor concept drift over time using the average log-likelihood score and the Page-Hinkley test. Then, if a drift is detected, LA-MB-MBC adapts the current multi-dimensional Bayesian network classifier locally around each changed node, whereas GA-MB-MBC learns a new multi-dimensional Bayesian network classifier from scratch. An experimental study carried out using synthetic multi-dimensional data streams shows the merits of both proposed adaptive methods.
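As an illustration of the drift-monitoring component used by the third contribution, the sketch below implements a basic Page-Hinkley test over a synthetic score stream; the thresholds and the scores themselves are invented, and the thesis' exact configuration may differ.

```python
# Hedged sketch of concept-drift monitoring with the Page-Hinkley test
# (synthetic score stream and thresholds; not the thesis' actual settings).
import numpy as np

def page_hinkley(scores, delta=0.005, threshold=2.0):
    """Return the index at which a drift alarm fires, or None."""
    mean, cumulative, minimum = 0.0, 0.0, 0.0
    for t, x in enumerate(scores, start=1):
        mean += (x - mean) / t                  # running mean of the score
        cumulative += x - mean - delta          # Page-Hinkley cumulative sum
        minimum = min(minimum, cumulative)
        if cumulative - minimum > threshold:    # alarm on a sustained increase
            return t
    return None

rng = np.random.default_rng(3)
# Average log-likelihood-like score that drops after sample 300 (drift).
scores = np.concatenate([rng.normal(-1.0, 0.05, 300), rng.normal(-1.4, 0.05, 200)])
# The test above detects increases, so monitor the negated score: a drop in
# average log-likelihood then triggers the alarm.
print("drift detected at sample:", page_hinkley(-scores))
```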
Abstract:
Leaf nitrogen and leaf surface area influence the exchange of gases between terrestrial ecosystems and the atmosphere, and play a significant role in the global cycles of carbon, nitrogen and water. The purpose of this study is to use field-based and satellite remote-sensing-based methods to assess leaf nitrogen pools in five diverse European agricultural landscapes located in Denmark, Scotland (United Kingdom), Poland, the Netherlands and Italy. REGFLEC (REGularized canopy reFLECtance) is an advanced image-based inverse canopy radiative transfer modelling system which has shown proficiency for regional mapping of leaf area index (LAI) and leaf chlorophyll (CHLl) using remote sensing data. In this study, high spatial resolution (10–20 m) remote sensing images acquired from the multispectral sensors aboard the SPOT (Satellite For Observation of Earth) satellites were used to assess the capability of REGFLEC for mapping spatial variations in LAI and CHLl, and their relation to leaf nitrogen (Nl) data, in five diverse European agricultural landscapes. REGFLEC is based on physical laws and includes an automatic model parameterization scheme which makes the tool independent of field data for model calibration. In this study, REGFLEC performance was evaluated using LAI measurements and non-destructive measurements (using a SPAD meter) of leaf-scale CHLl and Nl concentrations in 93 fields representing crop- and grasslands of the five landscapes. Furthermore, empirical relationships between field measurements (LAI, CHLl and Nl) and five spectral vegetation indices (the Normalized Difference Vegetation Index, the Simple Ratio, the Enhanced Vegetation Index-2, the Green Normalized Difference Vegetation Index, and the green chlorophyll index) were used to assess field data coherence and to serve as a comparison basis for assessing REGFLEC model performance. The field measurements showed strong vertical CHLl gradient profiles in 26% of fields, which affected REGFLEC performance as well as the relationships between spectral vegetation indices (SVIs) and field measurements. When the range of surface types increased, the REGFLEC results were in better agreement with field data than the empirical SVI regression models. Selecting only homogeneous canopies with uniform CHLl distributions as reference data for evaluation, REGFLEC was able to explain 69% of LAI observations (rmse = 0.76), 46% of measured canopy chlorophyll contents (rmse = 719 mg m−2) and 51% of measured canopy nitrogen contents (rmse = 2.7 g m−2). Better results were obtained for individual landscapes, except for Italy, where REGFLEC performed poorly due to a lack of dense vegetation canopies at the time of satellite recording; presence of vegetation is needed to parameterize the REGFLEC model. Combining REGFLEC- and SVI-based model results to minimize errors for a "snap-shot" assessment of total leaf nitrogen pools in the five landscapes, results varied from 0.6 to 4.0 t km−2. Differences in leaf nitrogen pools between landscapes are attributed to seasonal variations, extents of agricultural area, species variations, and spatial variations in nutrient availability. In order to facilitate a substantial assessment of variations in Nl pools and their relation to landscape-based nitrogen and carbon cycling processes, time series of satellite data are needed. The upcoming Sentinel-2 satellite mission will provide new multiple narrowband data opportunities at high spatio-temporal resolution, which are expected to further improve remote sensing capabilities for mapping LAI, CHLl and Nl.
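As an illustration of the empirical SVI models used as a comparison baseline, the sketch below computes NDVI from synthetic red and near-infrared reflectances and fits a simple index-to-LAI regression; the numbers are invented and unrelated to the study's measurements.

```python
# Illustrative sketch (synthetic reflectances, not the study's data): NDVI and
# a simple empirical NDVI-to-LAI regression of the kind used by SVI models.
import numpy as np

rng = np.random.default_rng(5)
n_fields = 93
lai = rng.uniform(0.2, 6.0, n_fields)                       # "measured" LAI
red = 0.30 * np.exp(-0.6 * lai) + rng.normal(0, 0.01, n_fields)
nir = 0.15 + 0.06 * lai + rng.normal(0, 0.02, n_fields)

ndvi = (nir - red) / (nir + red)                             # NDVI definition
slope, intercept = np.polyfit(ndvi, lai, deg=1)              # empirical SVI model
predicted = slope * ndvi + intercept
r2 = 1 - np.sum((lai - predicted) ** 2) / np.sum((lai - lai.mean()) ** 2)
rmse = np.sqrt(np.mean((lai - predicted) ** 2))
print(f"LAI ~ {slope:.2f}*NDVI + {intercept:.2f}  (R2 = {r2:.2f}, rmse = {rmse:.2f})")
```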
Abstract:
Replication Data Management (RDM) aims at enabling the use of data collections from several iterations of an experiment. However, there are several major challenges to RDM arising from the integration of data models and data from empirical study infrastructures that were not designed to cooperate, e.g., data model variation across local data sources. [Objective] In this paper we analyze RDM needs and evaluate conceptual RDM approaches to support replication researchers. [Method] We adapted the ATAM evaluation process to (a) analyze RDM use cases and needs of empirical replication study research groups and (b) compare three conceptual approaches to address these RDM needs: central data repositories with a fixed data model, heterogeneous local repositories, and an empirical ecosystem. [Results] While the central and local approaches have major issues that are hard to resolve in practice, the empirical ecosystem allows bridging current gaps in RDM from heterogeneous data sources. [Conclusions] The empirical ecosystem approach should be explored in diverse empirical environments.
Abstract:
By 2050 it is estimated that the number of worldwide Alzheimer's disease (AD) patients will quadruple from the current number of 36 million people. To date, no single test, prior to postmortem examination, can confirm that a person suffers from AD. Therefore, there is a strong need for accurate and sensitive tools for the early diagnosis of AD. The complex etiology and multiple pathogenesis of AD call for a system-level understanding of the currently available biomarkers and the study of new biomarkers via network-based modeling of heterogeneous data types. In this review, we summarize recent research on the study of AD as a connectivity syndrome. We argue that a network-based approach in biomarker discovery will provide key insights to fully understand the network degeneration hypothesis (disease starts in specific network areas and progressively spreads to connected areas of the initial loci-networks) with a potential impact for early diagnosis and disease-modifying treatments. We introduce a new framework for the quantitative study of biomarkers that can help shorten the transition between academic research and clinical diagnosis in AD.
Abstract:
The cloud computing paradigm has risen in popularity within industry and academia. Public cloud infrastructures are enabling new business models and helping to reduce costs. However, a company may wish to host its data and services on its own premises, or may need to abide by data protection laws; these circumstances make private cloud infrastructures desirable, either to complement or even to fully substitute public offerings. Unfortunately, a lack of standardization has prevented private infrastructure management solutions from being developed adequately, and the myriad of different options available has induced the fear of technology lock-in among customers. One of the causes of this problem is the misalignment between academic research and industry offerings, with the former focusing on the study of idealized scenarios dissimilar from real-world situations, and the latter developing solutions without considering how they fit with common standards, or without disseminating their results.
With the aim of solving this problem, I propose a modular management system for private cloud infrastructures that is focused on the applications instead of just the hardware resources. This management system follows the autonomic computing paradigm and is designed around a simple information model developed to be compatible with common standards. This model splits the environment into two views that serve to separate the concerns of the stakeholders, while at the same time enabling traceability between the physical environment and the virtual machines deployed onto it. In this model, cloud applications are classified into three broad types (Services, Big Data Jobs and Instance Reservations), so that the management system can take advantage of each type's features. The information model is paired with a set of atomic, reversible and independent management actions which determine the operations that can be performed on the environment and which are used to realize the cloud environment's scalability. I also describe a management engine that, from the state of the environment and using the aforementioned set of actions, is tasked with resource placement. It is divided into two tiers: the Application Managers layer, concerned only with applications, and the Infrastructure Manager layer, responsible for the actual physical resources. This management engine follows a lifecycle with two phases, to better model the behavior of a real infrastructure. The placement problem is tackled during one phase (consolidation) by an integer programming solver, and during the other (online) by a custom heuristic; tests have demonstrated that this combined approach is superior to other strategies. Finally, the management system is coupled to monitoring and actuator architectures: the former collects the necessary information from the environment, while the latter is modular in design and capable of interfacing with several technologies and offering several access interfaces.
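As a toy illustration of the online placement phase, the sketch below applies a first-fit-decreasing heuristic to place virtual machines on hosts with limited CPU and memory; the host capacities and request sizes are invented, and the thesis' actual heuristic is more elaborate.

```python
# Toy sketch of online VM placement with a first-fit-decreasing heuristic.
# Sizes and host capacities are invented; not the thesis' actual algorithm.
from dataclasses import dataclass, field

@dataclass
class Host:
    name: str
    cpu: int                      # free vCPUs
    mem: int                      # free memory (GB)
    vms: list = field(default_factory=list)

    def fits(self, cpu, mem):
        return self.cpu >= cpu and self.mem >= mem

    def place(self, vm, cpu, mem):
        self.cpu -= cpu
        self.mem -= mem
        self.vms.append(vm)

hosts = [Host("host-1", cpu=16, mem=64), Host("host-2", cpu=8, mem=32)]
requests = [("vm-a", 4, 16), ("vm-b", 8, 32), ("vm-c", 2, 8), ("vm-d", 8, 16)]

# First-fit decreasing: sort requests by size, place each on the first host
# with enough free capacity.
for vm, cpu, mem in sorted(requests, key=lambda r: (r[1], r[2]), reverse=True):
    host = next((h for h in hosts if h.fits(cpu, mem)), None)
    print(f"{vm} -> {host.name if host else 'REJECTED'}")
    if host:
        host.place(vm, cpu, mem)
```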
Abstract:
The existing legislation in our country, Royal Decree 1620/2007 of 7 December, which establishes the legal regime for the reuse of treated water, has not been modified in recent years despite the opinions of experts and operators that it deserves review and improvement. Therefore, a detailed analysis has been conducted of all existing legislation at the national level and in other countries with extensive experience in water reuse, in order to propose improvements or changes based on the information gathered both from specific research studies and from data obtained from operators of wastewater reclamation plants in Spain. From this study, some clear proposals emerge for consideration by the authorities. It has been found that there are not enough studies on the types of controls applied at each step of the process line in tertiary wastewater treatment plants that would provide insight into the performance guarantees of those stages and of the process as a whole. With such data, all stages could be analyzed separately to check whether operation under the envisaged design conditions is appropriate, or whether efficiency could be improved on the basis of these intermediate data, rather than relying only on the usual inlet and outlet checks carried out by the operators of existing plants in the country. There are, however, research results and data for almost all reclamation technologies on the market, which has made it possible to identify their advantages and disadvantages. The application of an HACCP (Hazard Analysis and Critical Control Points) process to an existing plant has shown that it is a practical tool for managing the safety of an industrial production process such as a water reuse system, so it is considered appropriate to recommend its application whenever possible.