893 results for data analysis: algorithms and implementation
Abstract:
The large volume of data recorded every day in organizations' database systems has created the need to analyze it. However, organizations face the complexity of processing huge volumes of data with traditional analysis methods. Moreover, in a globalized and competitive environment, organizations constantly seek to improve their processes, and for this they need tools that help them make better decisions. This means being better informed and knowing their digital history, so that they can describe their processes and anticipate (predict) unforeseen events. These new data analysis requirements have driven the growing development of data mining projects. The data mining process seeks to obtain, from a massive data set, models that describe the data or predict new instances in the set. It involves data preparation, followed by partially or fully automated processing to identify models in the data, and finally produces patterns, relationships or rules as output. This output must represent new knowledge for the organization, be useful and understandable to end users, and be suitable for integration into processes to support decision-making. However, the greatest difficulty is precisely enabling the data analyst, who is involved throughout this process, to identify models; this is a complex task that often requires the experience not only of the data analyst but also of the expert in the problem domain.
One way to support the analysis of data, models and patterns is through their visual representation, which draws on human visual perception and its ability to detect patterns more easily. Under this approach, visualization has been used in data mining mostly for the descriptive analysis of the data (input) and the presentation of patterns (output), leaving the paradigm little used for the analysis of models. This document describes the development of the doctoral thesis entitled "New Visualizations Schemes to Improve Understandability of Data-Mining Models". The research aims to contribute a visualization approach that supports the understanding of data mining models, and to that end it proposes the metaphor of visually augmented models.
Abstract:
Data acquisition systems used in the diagnostics of thermonuclear fusion devices face important challenges posed by long-pulse operation. Even in short-pulse devices, where data are mainly analyzed after the discharge has finished, a large amount of data remains unanalyzed, which means that a great deal of knowledge still lies undiscovered in the existing databases. Over the last decade the fusion community has made a strong effort to improve offline analysis methods to address this problem, but the effort has proved insufficient unless some of these methods can be run in real time. Under this new paradigm, long-pulse devices will have to include data acquisition devices with local processing capabilities, able to run advanced data analysis algorithms.
The research carried out in this thesis aims to determine whether the local real-time processing capacity of such systems can be increased by using GPUs. To that end, during the experimentation period, various proposals were evaluated through real use cases developed for several of the most representative fusion devices: ITER, JET and TCV. The conclusions and experience obtained in that phase made it possible to propose a model and a development methodology for including this technology in acquisition systems for diagnostics of different kinds. The model defines not only the optimal hardware architecture for this integration, but also the incorporation of this new processing resource into one of the Supervisory Control and Data Acquisition (SCADA) systems most relevant at present in the fusion community (EPICS), providing a complete solution. The proposal is complemented by a methodology that addresses the weaknesses detected and traces a path for integrating the solution into existing hardware and software standards. The final evaluation was performed through a use case representative of diagnostics that require image acquisition and processing in the context of the international ITER device, and it was successfully tested on ITER's premises. The solution proposed in this thesis has been included by the ITER IO in its catalog of standard solutions for the development of its future diagnostics. This has been possible thanks to the technology transfer agreement signed with National Instruments, which has permitted us to modify and update one of their core software products targeted at the acquisition systems used in these devices.
Abstract:
This paper presents a consistent analysis of reinforced concrete (RC) two-dimensional (2-D) structures, namely slab structures subjected to in-plane and out-of-plane forces. With this method of analysis, the well-established methodology for dimensioning and verifying RC sections of beam structures is extended to 2-D structures. The validity of the proposed analysis is checked by comparing its results with published experimental test results. Several examples illustrate features of the proposed analysis, such as the influence of the reinforcement layout on the service and ultimate behavior of a slab structure and the non-straightforward problem of optimal dimensioning at a slab point subjected to several loading cases. The examples also discuss the application of the method to design situations such as multiple steel families and non-orthogonal reinforcement layouts.
Abstract:
The amount of genomic and proteomic data that is entered each day into databases and the experimental literature is outstripping the ability of experimental scientists to keep pace. While generic databases derived from automated curation efforts are useful, most biological scientists tend to focus on a class or family of molecules and their biological impact. Consequently, there is a need for molecular class-specific or other specialized databases. Such databases collect and organize data around a single topic or class of molecules. If curated well, such systems are extremely useful as they allow experimental scientists to obtain a large portion of the available data most relevant to their needs from a single source. We are involved in the development of two such databases with substantial pharmacological relevance. These are the GPCRDB and NucleaRDB information systems, which collect and disseminate data related to G protein-coupled receptors and intra-nuclear hormone receptors, respectively. The GPCRDB was a pilot project aimed at building a generic molecular class-specific database capable of dealing with highly heterogeneous data. A first version of the GPCRDB project has been completed and it is routinely used by thousands of scientists. The NucleaRDB was started recently as an application of the concept for the generalization of this technology. The GPCRDB is available via the WWW at http://www.gpcr.org/7tm/ and the NucleaRDB at http://www.receptors.org/NR/.
Abstract:
Controversy still exists over the adaptive nature of variation of enzyme loci. In conifers, random amplified polymorphic DNAs (RAPDs) represent a class of marker loci that is unlikely to fall within or be strongly linked to coding DNA. We have compared the genetic diversity in natural populations of black spruce [Picea mariana (Mill.) B.S.P.] using genotypic data at allozyme loci and RAPD loci as well as phenotypic data from inferred RAPD fingerprints. The genotypic data for both allozymes and RAPDs were obtained from at least six haploid megagametophytes for each of 75 sexually mature individuals distributed in five populations. Heterozygosities and population fixation indices were in complete agreement between allozyme loci and RAPD loci. In black spruce, it is more likely that the similar levels of variation detected at both enzyme and RAPD loci are due to such evolutionary forces as migration and the mating system, rather than to balancing selection and overdominance. Furthermore, we show that biased estimates of expected heterozygosity and among-population differentiation are obtained when using allele frequencies derived from dominant RAPD phenotypes.
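The last point of this abstract can be illustrated with a small numerical sketch. It is not the authors' analysis: it assumes a biallelic locus with one band-present allele and one null allele, assumes the usual dominant-marker estimate in which the null-allele frequency is the square root of the band-absent phenotype frequency (valid only under Hardy-Weinberg proportions), and uses made-up values for the true null-allele frequency and inbreeding coefficient. All function names are hypothetical.

```python
import math


def expected_heterozygosity(q):
    """Expected heterozygosity 2pq for a biallelic locus with allele frequency q."""
    return 2.0 * q * (1.0 - q)


def genotype_freqs(q, f):
    """Genotype frequencies (band/band, band/null, null/null) for a
    null-allele frequency q and inbreeding coefficient f."""
    p = 1.0 - q
    return (p * p + f * p * q, 2.0 * p * q * (1.0 - f), q * q + f * p * q)


def null_freq_from_dominant(null_phenotype_freq):
    """Dominant-marker estimate of q: square root of the band-absent
    phenotype frequency. This assumes Hardy-Weinberg proportions."""
    return math.sqrt(null_phenotype_freq)


# Hypothetical values, chosen only for illustration.
true_q = 0.3
f = 0.2

# Only the band-absent (null/null) class is observable as a distinct phenotype.
_, _, null_null = genotype_freqs(true_q, f)
q_hat = null_freq_from_dominant(null_null)

he_true = expected_heterozygosity(true_q)     # from known allele frequencies
he_dominant = expected_heterozygosity(q_hat)  # from dominant phenotypes

print(f"true q = {true_q:.3f}, estimated q = {q_hat:.3f}")
print(f"He (genotypic) = {he_true:.3f}, He (dominant phenotypes) = {he_dominant:.3f}")
```

When `f` is zero the two estimates coincide; any departure from Hardy-Weinberg proportions makes the phenotype-based estimate drift away from the genotype-based one. Scoring haploid megagametophytes, as the abstract describes, gives genotypes directly and avoids this step.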
Abstract:
The strong presence of religious institutions in Latin America, especially the Roman Catholic Church, and their participation in the creation and implementation of public policy within a sovereign state can be counter-productive for the social development and progress of that country. Argentina and Uruguay, and the controversy there over the social issues of abortion and same-sex marriage, are used as examples to test the accuracy of this statement. Historical, statistical, and legislative information about both topics in both countries shows that the political power of the Roman Catholic Church in the region is more an outdated influence than a reality, and that the principle of secularization appears to be the most stabilizing philosophy for modern nations.
Abstract:
Recent years have witnessed a surge of interest in computational methods for affect, ranging from opinion mining, to subjectivity detection, to sentiment and emotion analysis. This article presents a brief overview of the latest trends in the field and describes the manner in which the articles contained in the special issue contribute to the advancement of the area. Finally, we comment on the current challenges and envisaged developments of the subjectivity and sentiment analysis fields, as well as their application to other Natural Language Processing tasks and related domains.