11 resultados para in-domain data requirement

em Universidad Politécnica de Madrid


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Abstract. The uptake of Linked Data (LD) has promoted the proliferation of datasets and their associated ontologies for describing different domains. Ac-cording to LD principles, developers should reuse as many available terms as possible to describe their data. Importing ontologies or referring to their terms’ URIs are the two main ways to reuse knowledge from available ontologies. In this paper, we have analyzed 18589 terms appearing within 196 ontologies in-cluded in the Linked Open Vocabularies (LOV) registry with the aim of under-standing the current state of ontology reuse in the LD context. In order to char-acterize the landscape of ontology reuse in this context, we have extracted sta-tistics about currently reused elements, calculated ratios for reuse, and drawn graphs about imports and references between ontologies. Keywords: ontology, vocabulary, reuse, linked data, ontology import

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In the last decade, complex networks have widely been applied to the study of many natural and man-made systems, and to the extraction of meaningful information from the interaction structures created by genes and proteins. Nevertheless, less attention has been devoted to metabonomics, due to the lack of a natural network representation of spectral data. Here we define a technique for reconstructing networks from spectral data sets, where nodes represent spectral bins, and pairs of them are connected when their intensities follow a pattern associated with a disease. The structural analysis of the resulting network can then be used to feed standard data-mining algorithms, for instance for the classification of new (unlabeled) subjects. Furthermore, we show how the structure of the network is resilient to the presence of external additive noise, and how it can be used to extract relevant knowledge about the development of the disease.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Linked Data is not always published with a license. Sometimes a wrong license type is used, like a license for software, or it is not expressed in a standard, machine readable manner. Yet, Linked Data resources may be subject to intellectual property and database laws, may contain personal data subject to privacy restrictions or may even contain important trade secrets. The proper declaration of which rights are held, waived or licensed is a must for the lawful use of Linked Data at its different granularity levels, from the simple RDF statement to a dataset or a mapping. After comparing the current practice with the actual needs, six research questions are posed.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Mass spectrometry (MS) data provide a promising strategy for biomarker discovery. For this purpose, the detection of relevant peakbins in MS data is currently under intense research. Data from mass spectrometry are challenging to analyze because of their high dimensionality and the generally low number of samples available. To tackle this problem, the scientific community is becoming increasingly interested in applying feature subset selection techniques based on specialized machine learning algorithms. In this paper, we present a performance comparison of some metaheuristics: best first (BF), genetic algorithm (GA), scatter search (SS) and variable neighborhood search (VNS). Up to now, all the algorithms, except for GA, have been first applied to detect relevant peakbins in MS data. All these metaheuristic searches are embedded in two different filter and wrapper schemes coupled with Naive Bayes and SVM classifiers.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In ubiquitous data stream mining applications, different devices often aim to learn concepts that are similar to some extent. In these applications, such as spam filtering or news recommendation, the data stream underlying concept (e.g., interesting mail/news) is likely to change over time. Therefore, the resultant model must be continuously adapted to such changes. This paper presents a novel Collaborative Data Stream Mining (Coll-Stream) approach that explores the similarities in the knowledge available from other devices to improve local classification accuracy. Coll-Stream integrates the community knowledge using an ensemble method where the classifiers are selected and weighted based on their local accuracy for different partitions of the feature space. We evaluate Coll-Stream classification accuracy in situations with concept drift, noise, partition granularity and concept similarity in relation to the local underlying concept. The experimental results show that Coll-Stream resultant model achieves stability and accuracy in a variety of situations using both synthetic and real world datasets.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Energy consumption in data centers is nowadays a critical objective because of its dramatic environmental and economic impact. Over the last years, several approaches have been proposed to tackle the energy/cost optimization problem, but most of them have failed on providing an analytical model to target both the static and dynamic optimization domains for complex heterogeneous data centers. This paper proposes and solves an optimization problem for the energy-driven configuration of a heterogeneous data center. It also advances in the proposition of a new mechanism for task allocation and distribution of workload. The combination of both approaches outperforms previous published results in the field of energy minimization in heterogeneous data centers and scopes a promising area of research.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Due to the advancement of both, information technology in general, and databases in particular; data storage devices are becoming cheaper and data processing speed is increasing. As result of this, organizations tend to store large volumes of data holding great potential information. Decision Support Systems, DSS try to use the stored data to obtain valuable information for organizations. In this paper, we use both data models and use cases to represent the functionality of data processing in DSS following Software Engineering processes. We propose a methodology to develop DSS in the Analysis phase, respective of data processing modeling. We have used, as a starting point, a data model adapted to the semantics involved in multidimensional databases or data warehouses, DW. Also, we have taken an algorithm that provides us with all the possible ways to automatically cross check multidimensional model data. Using the aforementioned, we propose diagrams and descriptions of use cases, which can be considered as patterns representing the DSS functionality, in regard to DW data processing, DW on which DSS are based. We highlight the reusability and automation benefits that this can be achieved, and we think this study can serve as a guide in the development of DSS.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

La mayoría de las aplicaciones forestales del escaneo laser aerotransportado (ALS, del inglés airborne laser scanning) requieren la integración y uso simultaneo de diversas fuentes de datos, con el propósito de conseguir diversos objetivos. Los proyectos basados en sensores remotos normalmente consisten en aumentar la escala de estudio progresivamente a lo largo de varias fases de fusión de datos: desde la información más detallada obtenida sobre un área limitada (la parcela de campo), hasta una respuesta general de la cubierta forestal detectada a distancia de forma más incierta pero cubriendo un área mucho más amplia (la extensión cubierta por el vuelo o el satélite). Todas las fuentes de datos necesitan en ultimo termino basarse en las tecnologías de sistemas de navegación global por satélite (GNSS, del inglés global navigation satellite systems), las cuales son especialmente erróneas al operar por debajo del dosel forestal. Otras etapas adicionales de procesamiento, como la ortorectificación, también pueden verse afectadas por la presencia de vegetación, deteriorando la exactitud de las coordenadas de referencia de las imágenes ópticas. Todos estos errores introducen ruido en los modelos, ya que los predictores se desplazan de la posición real donde se sitúa su variable respuesta. El grado por el que las estimaciones forestales se ven afectadas depende de la dispersión espacial de las variables involucradas, y también de la escala utilizada en cada caso. Esta tesis revisa las fuentes de error posicional que pueden afectar a los diversos datos de entrada involucrados en un proyecto de inventario forestal basado en teledetección ALS, y como las propiedades del dosel forestal en sí afecta a su magnitud, aconsejando en consecuencia métodos para su reducción. También se incluye una discusión sobre las formas más apropiadas de medir exactitud y precisión en cada caso, y como los errores de posicionamiento de hecho afectan a la calidad de las estimaciones, con vistas a una planificación eficiente de la adquisición de los datos. La optimización final en el posicionamiento GNSS y de la radiometría del sensor óptico permitió detectar la importancia de este ultimo en la predicción de la desidad relativa de un bosque monoespecífico de Pinus sylvestris L. ABSTRACT Most forestry applications of airborne laser scanning (ALS) require the integration and simultaneous use of various data sources, pursuing a variety of different objectives. Projects based on remotely-sensed data generally consist in upscaling data fusion stages: from the most detailed information obtained for a limited area (field plot) to a more uncertain forest response sensed over a larger extent (airborne and satellite swath). All data sources ultimately rely on global navigation satellite systems (GNSS), which are especially error-prone when operating under forest canopies. Other additional processing stages, such as orthorectification, may as well be affected by vegetation, hence deteriorating the accuracy of optical imagery’s reference coordinates. These errors introduce noise to the models, as predictors displace from their corresponding response. The degree to which forest estimations are affected depends on the spatial dispersion of the variables involved and the scale used. This thesis reviews the sources of positioning errors which may affect the different inputs involved in an ALS-assisted forest inventory project, and how the properties of the forest canopy itself affects their magnitude, advising on methods for diminishing them. It is also discussed how accuracy should be assessed, and how positioning errors actually affect forest estimation, toward a cost-efficient planning for data acquisition. The final optimization in positioning the GNSS and optical image allowed to detect the importance of the latter in predicting relative density in a monospecific Pinus sylvestris L. forest.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, the authors introduce a novel mechanism for data management in a middleware for smart home control, where a relational database and semantic ontology storage are used at the same time in a Data Warehouse. An annotation system has been designed for instructing the storage format and location, registering new ontology concepts and most importantly, guaranteeing the Data Consistency between the two storage methods. For easing the data persistence process, the Data Access Object (DAO) pattern is applied and optimized to enhance the Data Consistency assurance. Finally, this novel mechanism provides an easy manner for the development of applications and their integration with BATMP. Finally, an application named "Parameter Monitoring Service" is given as an example for assessing the feasibility of the system.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

La meta de intercambiabilidad de piezas establecida en los sistemas de producción del siglo XIX, es ampliada en el último cuarto del siglo pasado para lograr la capacidad de fabricación de varios tipos de producto en un mismo sistema de manufactura, requerimiento impulsado por la incertidumbre del mercado. Esta incertidumbre conduce a plantear la flexibilidad como característica importante en el sistema de producción. La presente tesis se ubica en el problema de integración del sistema informático (SI) con el equipo de producción (EP) en la búsqueda de una solución que coadyuve a satisfacer los requerimientos de flexibilidad impuestas por las condiciones actuales de mercado. Se describen antecedentes de los sistemas de producción actuales y del concepto de flexibilidad. Se propone una clasificación compacta y práctica de los tipos de flexibilidad relevantes en el problema de integración SI-EP, con la finalidad de ubicar el significado de flexibilidad en el área de interés. Así mismo, las variables a manejar en la solución son clasificadas en cuatro tipos: Medio físico, lenguajes de programación y controlador, naturaleza del equipo y componentes de acoplamiento. Por otra parte, la característica de reusabilidad como un efecto importante y deseable de un sistema flexible, es planteada como meta en la solución propuesta no solo a nivel aplicación del sistema sino también a nivel de reuso de conceptos de diseño. Se propone un esquema de referencia en tres niveles de abstracción, que permita manejar y reutilizar en forma organizada el conocimiento del dominio de aplicación (integración SI-EP), el desarrollo de sistemas de aplicación genérica así como también la aplicación del mismo en un caso particular. Un análisis del concepto de acoplamiento débil (AD) es utilizado como base en la solución propuesta al problema de integración SI-EP. El desarrollo inicia identificando condiciones para la existencia del acoplamiento débil, compensadores para soportar la operación del sistema bajo AD y los efectos que ocasionan en el sistema informático los cambios en el conjunto de equipos de producción. Así mismo, se introducen como componentes principales del acoplamiento los componentes tecnológico, tarea y rol, a utilizar en el análisis de los requerimientos para el desarrollo de una solución de AD entre SI-EP. La estructura de tres niveles del esquema de referencia propuesto surge del análisis del significado de conceptos de referencia comúnmente reportados en la literatura, tales como arquitectura de referencia, modelo de referencia, marco de trabajo, entre otros. Se presenta un análisis de su significado como base para la definición de cada uno de los niveles de la estructura del esquema, pretendiendo con ello evitar la ambigüedad existente debido al uso indistinto de tales conceptos en la literatura revisada. Por otra parte, la relación entre niveles es definida tomando como base la estructura de cuatro capas planteada en el área de modelado de datos. La arquitectura de referencia, implementada en el primer nivel del esquema propuesto es utilizada como base para el desarrollo del modelo de referencia o marco de trabajo para el acoplamiento débil entre el SI y el EP. La solución propuesta es validada en la integración de un sistema informático de coordinación de flujo y procesamiento de pieza con un conjunto variable de equipos de diferentes tipos, naturaleza y fabricantes. En el ejercicio de validación se abordaron diferentes estándares y técnicas comúnmente empleadas como soporte al problema de integración a nivel componente tecnológico, tales como herramientas de cero configuración (ejemplo: plug and play), estándar OPC-UA, colas de mensajes y servicios web, permitiendo así ubicar el apoyo de estas técnicas en el ámbito del componente tecnológico y su relación con los otros componentes de acoplamiento: tarea y rol. ABSTRACT The interchangeability of parts, as a goal of manufacturing systems at the nineteenth century, is extended into the present to achieve the ability to manufacture various types of products in the same manufacturing system, requirement associated with market uncertainty. This uncertainty raises flexibility as an important feature in the production system. This thesis addresses the problem regarding integration of software system (SS) and the set of production equipment (PE); looking for a solution that contributes to satisfy the requirements of flexibility that the current market conditions impose on manufacturing, particularly to the production floor. Antecedents to actual production systems as well as the concept of flexibility are described and analyzed in detail. A practical and compact classification of flexibility types of relevance to the integration SS-EP problem is proposed with the aim to delimit the meaning of flexibility regarding the area of interest. Also, a classification for the variables involved in the integration problem is presented into four types: Physical media, programming and controller languages, equipment nature and coupling components. In addition, the characteristic of reusability that has been seen as an important and desirable effect of a flexible system is taken as a goal in the proposed solution, not only at system implementation level but also at system design level. In this direction, a reference scheme is proposed consisting of three abstraction levels to systematically support management and reuse of domain knowledge (SS-PE), development of a generic system as well as its application in a particular case. The concept of loose coupling is used as a basis in the development of the proposed solution to the problem of integration SS-EP. The first step of the development process consists of an analysis of the loose coupled concept, identifying conditions for its existence, compensators for system operation under loose coupling conditions as well as effects in the software system caused by modification in the set of production equipment. In addition coupling components: technological, task and role are introduced as main components to support the analysis of requirements regarding loose coupling of SS-PE. The three tier structure of the proposed reference scheme emerges from the analysis of reference concepts commonly reported in the literature, such as reference architecture, reference model and framework, among others. An analysis of these concepts is used as a basis for definition of the structure levels of the proposed scheme, trying to avoid the ambiguity due to the indiscriminate use of such concepts in the reviewed literature. In addition, the relation between adjacent levels of the structure is defined based on the four tiers structure commonly used in the data modelling area. The reference architecture is located as the first level in the structure of the proposed reference scheme and it is utilized as a basis for the development of the reference model or loose coupling framework for SS-PE integration. The proposed solution is validated by integrating a software system (process and piece flow coordination system) with a variable set of production equipment including different types, nature and manufacturers of equipment. Furthermore, in this validation exercise, different standards and techniques commonly used have been taken into account to support the issue of technology coupling component, such as tools for zero configuration (i.e. Plug and Play), message queues, OPC-UA standard, and web services. Through this part of the validation exercise, these integration tools are located as a part of the technological component and they are related to the role and task components of coupling.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

La gran cantidad de datos que se registran diariamente en los sistemas de base de datos de las organizaciones ha generado la necesidad de analizarla. Sin embargo, se enfrentan a la complejidad de procesar enormes volúmenes de datos a través de métodos tradicionales de análisis. Además, dentro de un contexto globalizado y competitivo las organizaciones se mantienen en la búsqueda constante de mejorar sus procesos, para lo cual requieren herramientas que les permitan tomar mejores decisiones. Esto implica estar mejor informado y conocer su historia digital para describir sus procesos y poder anticipar (predecir) eventos no previstos. Estos nuevos requerimientos de análisis de datos ha motivado el desarrollo creciente de proyectos de minería de datos. El proceso de minería de datos busca obtener desde un conjunto masivo de datos, modelos que permitan describir los datos o predecir nuevas instancias en el conjunto. Implica etapas de: preparación de los datos, procesamiento parcial o totalmente automatizado para identificar modelos en los datos, para luego obtener como salida patrones, relaciones o reglas. Esta salida debe significar un nuevo conocimiento para la organización, útil y comprensible para los usuarios finales, y que pueda ser integrado a los procesos para apoyar la toma de decisiones. Sin embargo, la mayor dificultad es justamente lograr que el analista de datos, que interviene en todo este proceso, pueda identificar modelos lo cual es una tarea compleja y muchas veces requiere de la experiencia, no sólo del analista de datos, sino que también del experto en el dominio del problema. Una forma de apoyar el análisis de datos, modelos y patrones es a través de su representación visual, utilizando las capacidades de percepción visual del ser humano, la cual puede detectar patrones con mayor facilidad. Bajo este enfoque, la visualización ha sido utilizada en minería datos, mayormente en el análisis descriptivo de los datos (entrada) y en la presentación de los patrones (salida), dejando limitado este paradigma para el análisis de modelos. El presente documento describe el desarrollo de la Tesis Doctoral denominada “Nuevos Esquemas de Visualizaciones para Mejorar la Comprensibilidad de Modelos de Data Mining”. Esta investigación busca aportar con un enfoque de visualización para apoyar la comprensión de modelos minería de datos, para esto propone la metáfora de modelos visualmente aumentados. ABSTRACT The large amount of data to be recorded daily in the systems database of organizations has generated the need to analyze it. However, faced with the complexity of processing huge volumes of data over traditional methods of analysis. Moreover, in a globalized and competitive environment organizations are kept constantly looking to improve their processes, which require tools that allow them to make better decisions. This involves being bettered informed and knows your digital story to describe its processes and to anticipate (predict) unanticipated events. These new requirements of data analysis, has led to the increasing development of data-mining projects. The data-mining process seeks to obtain from a massive data set, models to describe the data or predict new instances in the set. It involves steps of data preparation, partially or fully automated processing to identify patterns in the data, and then get output patterns, relationships or rules. This output must mean new knowledge for the organization, useful and understandable for end users, and can be integrated into the process to support decision-making. However, the biggest challenge is just getting the data analyst involved in this process, which can identify models is complex and often requires experience not only of the data analyst, but also the expert in the problem domain. One way to support the analysis of the data, models and patterns, is through its visual representation, i.e., using the capabilities of human visual perception, which can detect patterns easily in any context. Under this approach, the visualization has been used in data mining, mostly in exploratory data analysis (input) and the presentation of the patterns (output), leaving limited this paradigm for analyzing models. This document describes the development of the doctoral thesis entitled "New Visualizations Schemes to Improve Understandability of Data-Mining Models". This research aims to provide a visualization approach to support understanding of data mining models for this proposed metaphor visually enhanced models.