19 results for automated lexical analysis
at Universidad Politécnica de Madrid
Abstract:
Due to the relative transparency of its embryos and larvae, the zebrafish is an ideal model organism for bioimaging approaches in vertebrates. Novel microscope technologies allow developmental processes to be imaged in unprecedented detail, and they enable the use of complex image-based read-outs for high-throughput/high-content screening. Such applications can easily generate terabytes of image data, whose handling and analysis becomes a major bottleneck in extracting the targeted information. Here, we describe the current state of the art in computational image analysis in the zebrafish system. We discuss the challenges encountered when handling high-content image data, especially with regard to data quality, annotation, and storage. We survey methods for preprocessing image data for further analysis, and describe selected examples of automated image analysis, including the tracking of cells during embryogenesis, heartbeat detection, identification of dead embryos, recognition of tissues and anatomical landmarks, and quantification of behavioral patterns of adult fish. We review recent examples of applications using such methods, such as the comprehensive analysis of cell lineages during early development, the generation of a three-dimensional brain atlas of zebrafish larvae, and high-throughput drug screens based on movement patterns. Finally, we identify future challenges for the zebrafish image analysis community, notably those concerning the compatibility of algorithms and data formats for the assembly of modular analysis pipelines.
Abstract:
In recent decades, process mineralogy has become an essential tool in the mining and metallurgical sector, driven especially by the emergence of geometallurgy. By integrating geological, mining and metallurgical data, this emerging discipline provides the information required to tailor the performance of the concentration circuit efficiently to the mineralogical variability inherent in ore deposits. To contribute to the geometallurgical model, process mineralogy must provide quantitative data on the main mineralogical features that influence the minerallurgical behaviour of minerals, and for this it relies on automated mineralogical analysis systems. These systems can provide a large amount of data quickly and accurately. However, when it comes to characterising texture, mineralogists must resort to qualitative descriptions based on observation, because current systems cannot routinely offer quantitative textural information. This doctoral thesis aims to provide, in a systematic way, textural information relevant to concentration processes. Its main objective is the automated identification and characterisation of intergrowth types in mineral particles. Initially, the seven intergrowth types most relevant to flotation, leaching and grinding are considered. To achieve this goal, a methodology has been developed based on the computation of a set of numerical indices, called minerallurgical indices. These indices serve two purposes: each index provides information for characterising the main mineralogical features that determine particle behaviour during concentration processes, and together the indices are used as discriminant variables for identifying the intergrowth type by Discriminant Analysis. Along with the indices developed in this work, three indices proposed by other authors in mineralogy and other fields of materials science have also been considered, after being adapted to the analysis of mineral particles: the Contiguity Index (Gurland, 1958), the Intergrowth Index (Amstutz and Giger, 1972) and the Coordination Index (Jeulin, 1981). The design of the minerallurgical indices is based on the fundamental principles of stereology and digital image analysis. They are computed using the linear intercepts method, implemented as several MATLAB routines. This stereological method yields a set of measurements from which several parameters, both stereological and geometric, are obtained; the minerallurgical indices are computed from these parameters. To assess the discriminant capacity of the indices, 200 cases were selected in which one of the seven intergrowth types initially considered can be clearly recognised. The minerallurgical indices were computed for each case and used as discriminant variables. Discriminant Analysis classified 95% of the cases correctly, which shows that the proposed indices are reliable identifiers of intergrowth type. Once the discriminant power of the indices had been assessed, the methodology was applied to characterise a copper concentrate sample from the Kansanshi mine (Zambia), in order to quantify the distribution of chalcopyrite by intergrowth type. Several examples of the application of this distribution are given to test the usefulness of the method; in all of them, the proposed indices provide valuable information for characterising the minerallurgical behaviour of mineral particles. The results of both the Discriminant Analysis and the characterisation of the Kansanshi concentrate show the reliability, usefulness and versatility of the methodology. Its integration as a routine tool in current automated mineralogical analysis systems would therefore give minerallurgists a great deal of complementary textural information for treating the ore more efficiently.
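To make the linear intercepts idea concrete, here is a minimal Python sketch (the thesis itself uses MATLAB routines that are not reproduced in the abstract). It computes Gurland's original contiguity, C = 2·N_αα / (2·N_αα + N_αβ), from boundary crossings along horizontal scan lines. The input layout (a grid of grain ids plus a grain-to-mineral map), the function name `contiguity` and the mineral labels are assumptions made for illustration only; the adapted index used for mineral particles in the thesis may differ.

```python
def contiguity(grains, phase_of, target):
    """Gurland-style contiguity of `target` from horizontal linear intercepts.

    grains   : list of rows; each row is a list of grain ids (one per pixel).
    phase_of : dict mapping grain id -> mineral name (hypothetical layout).
    target   : mineral whose contiguity is measured.
    """
    n_same = 0   # crossings between two different grains of the target mineral
    n_other = 0  # crossings between a target grain and any other mineral
    for row in grains:
        # Each row acts as one scan line; compare neighbouring pixels.
        for left, right in zip(row, row[1:]):
            if left == right:
                continue  # still inside the same grain, no boundary crossed
            a, b = phase_of[left], phase_of[right]
            if a == target and b == target:
                n_same += 1
            elif a == target or b == target:
                n_other += 1
    total = 2 * n_same + n_other
    return 2 * n_same / total if total else 0.0


# Toy section: two chalcopyrite grains (1, 2) next to one gangue grain (3).
section = [
    [1, 1, 2, 2, 3, 3],
    [1, 1, 2, 2, 3, 3],
]
minerals = {1: "chalcopyrite", 2: "chalcopyrite", 3: "gangue"}
print(contiguity(section, minerals, "chalcopyrite"))
```

In this toy section the two scan lines cross two chalcopyrite/chalcopyrite boundaries and two chalcopyrite/gangue boundaries, so the ratio is 4/6. A real implementation would normally use scan lines in several directions to reduce orientation bias.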
Abstract:
Traumatic Brain Injury (TBI) [1] is defined as an acute event that causes damage to areas of the brain. TBI may result in a significant impairment of an individual's physical, cognitive and psychosocial functioning. The main consequence of TBI is a dramatic change in the individual's daily life, involving a profound disruption of the family, a loss of future income capacity and an increase in lifetime cost. One of the main challenges of TBI neuroimaging is to develop robust automated image analysis methods to detect signatures of TBI, such as hyper-intensity areas and changes in image contrast and brain shape. The final goal of this research is to develop a method that identifies altered brain structures by automatically detecting landmarks on the image where the signal changes, and that provides the clinician with comprehensive information about them. These landmarks identify injured structures by co-registering the patient's image with an atlas in which landmarks have been previously detected. The research began by identifying brain structures in healthy subjects to validate the proposed method. Later, this method will be used to identify modified structures in TBI imaging studies.
Abstract:
New digital artifacts are emerging in data-intensive science. For example, scientific workflows are executable descriptions of scientific procedures that define the sequence of computational steps in an automated data analysis, supporting reproducible research and the sharing and replication of best-practice and know-how through reuse. Workflows are specified at design time and interpreted through their execution in a variety of situations, environments, and domains. Hence it is essential to preserve both their static and dynamic aspects, along with the research context in which they are used. To achieve this, we propose the use of multidimensional digital objects (Research Objects) that aggregate the resources used and/or produced in scientific investigations, including workflow models, provenance of their executions, and links to the relevant associated resources, along with the provision of technological support for their preservation and efficient retrieval and reuse. In this direction, we specified a software architecture for the design and implementation of a Research Object preservation system, and realized this architecture with a set of services and clients, drawing together practices in digital libraries, preservation systems, workflow management, social networking and Semantic Web technologies. In this paper, we describe the backbone system of this realization, a digital library system built on top of dLibra.
Abstract:
The properties of data and activities in business processes can be used to greatly facilitate several relevant tasks performed at design- and run-time, such as fragmentation, compliance checking, or top-down design. Business processes are often described using workflows. We present an approach for mechanically inferring business domain-specific attributes of workflow components (including data items, activities, and elements of sub-workflows), taking as a starting point known attributes of workflow inputs and the structure of the workflow. We achieve this by modeling these components as concepts and applying sharing analysis to a Horn clause-based representation of the workflow. The analysis is applicable to workflows featuring complex control and data dependencies, embedded control constructs, such as loops and branches, and embedded component services.
Abstract:
The study presented in this paper aims to describe telecommunication blogs as a genre. Lexical phrases are analysed in order to draw conclusions about the nature of the language in these texts and the extent to which the results are comparable with those for other written or conversational discourse types. Although the starting hypothesis is that blog articles are basically transactional, with the main objective of transferring information, the conclusions point to interaction as a distinctive characteristic of this type of discourse, and also to a more careful organization than expected in the comments on blog entries.
Abstract:
The aim of this study was to compare automated ribosomal intergenic spacer analysis (ARISA) and denaturing gradient gel electrophoresis (DGGE) techniques to assess bacterial diversity in the rumen of sheep. Sheep were fed 2 diets with 70% of either alfalfa hay or grass hay, and the solid (SOL) and liquid (LIQ) phases of the rumen were sampled immediately before feeding (0 h) and at 4 and 8 h postfeeding. Both techniques detected similar differences between forages, with alfalfa hay promoting greater (P < 0.05) bacterial diversity than grass hay. In contrast, whereas ARISA analysis showed a decrease (P < 0.05) of bacterial diversity in SOL at 4 h postfeeding compared with 0 and 8 h samplings, no variations (P > 0.05) over the postfeeding period were detected by DGGE. The ARISA technique showed lower (P < 0.05) bacterial diversity in SOL than in LIQ samples at 4 h postfeeding, but no differences (P > 0.05) in bacterial diversity between both rumen phases were detected by DGGE. Under the conditions of this study, the DGGE was not sensitive enough to detect some changes in ruminal bacterial communities, and therefore ARISA was considered more accurate for assessing bacterial diversity of ruminal samples. The results highlight the influence of the fingerprinting technique used to draw conclusions on factors affecting ruminal bacterial diversity.
Abstract:
Fixed-point arithmetic is one of the most common design choices for systems where area, power or throughput are heavily constrained. To produce implementations where cost is minimized without negatively impacting the accuracy of the results, word-lengths must be assigned carefully. Finding the optimal combination of fixed-point word-lengths for a given system is an NP-hard combinatorial problem to which developers devote between 25 and 50% of the design-cycle time. Reconfigurable hardware platforms such as FPGAs also benefit from fixed-point arithmetic, as it compensates for their slower clock frequencies and less efficient area utilization with respect to ASICs. As FPGAs become commonly used for scientific computation, designs grow larger and more complex, up to the point where they cannot be handled efficiently by current signal and quantization noise modelling and word-length optimization methodologies. In this Ph.D. thesis we explore different aspects of the quantization problem and present new methodologies for each of them. Techniques based on extensions of intervals have made it possible to obtain accurate models of signal and quantization noise propagation in systems with non-linear operations. We take this approach a step further by introducing elements of Multi-Element Generalized Polynomial Chaos (ME-gPC) and combining them with a state-of-the-art methodology based on statistical Modified Affine Arithmetic (MAA) in order to model systems that contain control-flow structures. Our methodology generates the different execution paths automatically, determines the regions of the input domain that will exercise each of them, and extracts the statistical moments of the system from the partial results. We use this technique to estimate both the dynamic range and the round-off noise in systems with such control-flow structures. We show the accuracy of our approach, which in some case studies with non-linear operators deviates by only 0.04% from the simulation-based reference values. A known drawback of techniques based on extensions of intervals is the combinatorial explosion of terms as the size of the targeted systems grows, which leads to scalability problems. To address this issue we present a clustered noise injection technique that groups the signals in the system, introduces the noise sources for each group separately, and then combines the results. In this way the number of noise sources in the system at any given time is kept under control and the combinatorial explosion is minimized. We also present a multi-way partitioning algorithm aimed at minimizing the deviation of the results caused by the loss of correlation between noise terms, in order to keep the results as accurate as possible. This thesis also covers the development of word-length optimization methodologies based on Monte-Carlo simulations that run in reasonable times. We present two novel techniques that reduce the execution time from different angles. First, the interpolative method applies a simple but precise interpolator to estimate the sensitivity of each signal, which is later used to guide the optimization. Second, the incremental method builds on the fact that, although a given confidence level must be guaranteed for the final results of the search, more relaxed levels, which require considerably fewer samples per simulation, can be used in the initial stages of the search, when we are still far from the optimized solution. With these two approaches we demonstrate that the execution time of classical greedy techniques can be accelerated by factors of up to ×240 for small and medium-sized problems. Finally, this book introduces HOPLITE, an automated, flexible and modular quantization framework that includes implementations of the previous techniques and is publicly available. Its aim is to offer developers and researchers a common ground for easily prototyping and verifying new techniques for system modelling and word-length optimization. We describe its workflow, justify the design decisions taken, explain its public API and give a step-by-step demonstration of its execution. We also show, through a simple example, how new extensions should be connected to the existing interfaces in order to expand and improve the capabilities of HOPLITE.
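As a rough illustration of simulation-based word-length selection, the following Python sketch estimates round-off noise power by Monte-Carlo simulation for a toy datapath that contains a branch and a non-linear operation, and then picks the smallest uniform number of fractional bits that meets a noise budget. It is not HOPLITE's API and it implements none of the thesis's techniques (ME-gPC, MAA, clustered noise injection, interpolative or incremental search). The names `quantize`, `system`, `noise_power` and `min_frac_bits`, the toy datapath and the uniform input distribution are all assumptions.

```python
import random


def quantize(x, frac_bits):
    """Round x to the nearest multiple of 2**-frac_bits (ties go to even)."""
    step = 2.0 ** -frac_bits
    return round(x / step) * step


def system(x, q=lambda v: v):
    """Hypothetical toy datapath: one branch plus a non-linear product.

    `q` is applied after every operation; the default leaves values
    unquantized and so gives the floating-point reference.
    """
    y = q(x * x) if x >= 0 else q(-0.5 * x)
    return q(y + 0.25)


def noise_power(frac_bits, samples=10_000, seed=0):
    """Monte-Carlo estimate of the mean squared output error."""
    rng = random.Random(seed)  # fixed seed keeps the estimate repeatable

    def q(v):
        return quantize(v, frac_bits)

    total = 0.0
    for _ in range(samples):
        x = rng.uniform(-1.0, 1.0)  # assumed input distribution
        err = system(q(x), q) - system(x)
        total += err * err
    return total / samples


def min_frac_bits(budget, max_bits=24):
    """Smallest uniform fractional word-length whose noise meets the budget."""
    for bits in range(1, max_bits + 1):
        if noise_power(bits) <= budget:
            return bits
    return None  # budget not reachable within max_bits


print(min_frac_bits(1e-6))
```

A real optimizer would assign a separate word-length to each signal instead of one uniform value, which is what turns the search into the combinatorial problem described in the abstract.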
Abstract:
An important competence of human data analysts is to interpret and explain the meaning of the results of data analysis to end-users. However, existing automatic solutions for intelligent data analysis provide limited help in interpreting and communicating information to non-expert users. In this paper we present a general approach to generating explanatory descriptions of the meaning of quantitative sensor data. We propose a type of web application: a virtual newspaper with automatically generated news stories that describe the meaning of sensor data. This solution integrates a variety of techniques from intelligent data analysis into a web-based multimedia presentation system. We validated our approach on a real-world problem and demonstrated its generality using data sets from several domains. Our experience shows that this solution can facilitate the use of sensor data by general users and, therefore, can increase the utility of sensor network infrastructures.
Abstract:
A Near Infrared Spectroscopy (NIRS) industrial application was developed by the LPF-Tagralia team and transferred to a Spanish dehydrator company (Agrotécnica Extremeña S.L.) for the classification of dehydrator onion bulbs for breeding purposes. The automated operation of the system has allowed the classification of more than one million onion bulbs during seasons 2004 to 2008 (Table 1). The performance achieved by the original model (R² = 0.65; SEC = 2.28 °Brix) was sufficient for qualitative classification thanks to the broad range of variation of the initial population (18 °Brix). Nevertheless, the classification performance of the model has declined over the seasons. One reason put forward is the reduction in the range of variation that naturally occurs during a breeding process; another is variation in parameters other than the variable of interest whose effects are probably affecting the measurements [1]. This study addresses the application of Independent Component Analysis (ICA) to this highly variable dataset from a NIRS industrial application in order to identify the different sources of variation present across seasons.
Abstract:
The synapses in the cerebral cortex can be classified into two main types, Gray’s type I and type II, which correspond to asymmetric (mostly glutamatergic excitatory) and symmetric (inhibitory GABAergic) synapses, respectively. Hence, identifying the different types and quantifying the proportions in which they are found is extraordinarily important in terms of brain function. The ideal approach to calculating the number of synapses per unit volume is to analyze 3D samples reconstructed from serial sections. However, obtaining serial sections by transmission electron microscopy is an extremely time-consuming and technically demanding task. Using focused ion beam/scanning electron microscopy, we recently showed that virtually all synapses can be accurately identified as asymmetric or symmetric when they are visualized, reconstructed, and quantified from large 3D tissue samples obtained in an automated manner. Nevertheless, the analysis, segmentation, and quantification of synapses is still a labor-intensive procedure. Thus, novel solutions are needed to deal with the large volume of data generated by automated 3D electron microscopy. Accordingly, we have developed ESPINA, a software tool that performs the automated segmentation and counting of synapses in a reconstructed 3D volume of the cerebral cortex, and that greatly facilitates and accelerates these processes.
Abstract:
Goal independent analysis of logic programs is commonly discussed in the context of the bottom-up approach. However, while the literature is rich in descriptions of top-down analysers and their application, practical experience with bottom-up analysis is still in a preliminary stage. Moreover, the practical use of existing top-down frameworks for goal independent analysis has not been addressed in a practical system. We illustrate the efficient use of existing goal dependent, top-down frameworks for abstract interpretation in performing goal independent analyses of logic programs much the same as those usually derived from bottom-up frameworks. We present several optimizations for this flavour of top-down analysis. The approach is fully implemented within an existing top-down framework. Several implementation tradeoffs are discussed as well as the influence of domain characteristics. An experimental evaluation including a comparison with a bottom-up analysis for the domain Prop is presented. We conclude that the technique can offer advantages with respect to standard goal dependent analyses.
Abstract:
Automatic cost analysis of programs has traditionally concentrated on a reduced number of resources such as execution steps, time, or memory. However, the increasing relevance of analysis applications such as static debugging and/or certification of user-level properties (including for mobile code) makes it interesting to develop analyses for resource notions that are actually application-dependent. This may include, for example, bytes sent or received by an application, number of files left open, number of SMSs sent or received, number of accesses to a database, money spent, energy consumption, etc. We present a fully automated analysis for inferring upper bounds on the usage that a Java bytecode program makes of a set of application programmer-definable resources. In our context, a resource is defined by programmer-provided annotations which state the basic consumption that certain program elements make of that resource. From these definitions our analysis derives functions which return an upper bound on the usage that the whole program (and individual blocks) make of that resource for any given set of input data sizes. The analysis proposed is independent of the particular resource. We also present some experimental results from a prototype implementation of the approach covering a significant set of interesting resources.
Abstract:
Automatic cost analysis of programs has traditionally been studied in terms of a number of concrete, predefined resources such as execution steps, time, or memory. However, the increasing relevance of analysis applications such as static debugging and/or certification of user-level properties (including for mobile code) makes it interesting to develop analyses for resource notions that are actually application-dependent. This may include, for example, bytes sent or received by an application, number of files left open, number of SMSs sent or received, number of accesses to a database, money spent, energy consumption, etc. We present a fully automated analysis for inferring upper bounds on the usage that a Java bytecode program makes of a set of application programmer-definable resources. In our context, a resource is defined by programmer-provided annotations which state the basic consumption that certain program elements make of that resource. From these definitions our analysis derives functions which return an upper bound on the usage that the whole program (and individual blocks) make of that resource for any given set of input data sizes. The analysis proposed is independent of the particular resource. We also present some experimental results from a prototype implementation of the approach covering an ample set of interesting resources.
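The idea of annotation-defined resources can be pictured with a small Python sketch. It is only a toy: the analysis described in the two abstracts above works on Java bytecode, whereas this sketch evaluates an upper bound over a hand-written program tree. The `COST` table, the tuple-based node format, the `upper_bound` function and the example `PROGRAM` are all hypothetical.

```python
# Hypothetical annotations: what a single call consumes of each resource.
COST = {
    "open_file": {"open_files": 1},
    "send_sms": {"sms": 1, "bytes_sent": 160},
    "log": {"bytes_sent": 20},
}


def upper_bound(node, resource, n):
    """Upper bound on `resource` usage of `node` for input size `n`.

    Node kinds (assumed toy format):
      ("call", name)                 one annotated call
      ("seq", child, ...)            children run one after another
      ("if", child, ...)             exactly one child runs
      ("loop", max_iterations, body) body runs at most max_iterations(n) times
    """
    kind = node[0]
    if kind == "call":
        return COST.get(node[1], {}).get(resource, 0)
    if kind == "seq":
        return sum(upper_bound(child, resource, n) for child in node[1:])
    if kind == "if":
        # The branch taken is unknown statically, so assume the costliest.
        return max(upper_bound(child, resource, n) for child in node[1:])
    if kind == "loop":
        _, max_iterations, body = node
        return max_iterations(n) * upper_bound(body, resource, n)
    raise ValueError(f"unknown node kind: {kind!r}")


# Open one file, then for each of the n inputs either send an SMS or log.
PROGRAM = (
    "seq",
    ("call", "open_file"),
    ("loop", lambda n: n, ("if", ("call", "send_sms"), ("call", "log"))),
)

for res in ("sms", "bytes_sent", "open_files"):
    print(res, upper_bound(PROGRAM, res, 10))
```

The same traversal serves every resource because the resource-specific part lives entirely in the annotation table, which mirrors the abstracts' point that the analysis is independent of the particular resource.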
Abstract:
While workflow technology has gained momentum in the last decade as a means for specifying and enacting computational experiments in modern science, reusing and repurposing existing workflows to build new scientific experiments is still a daunting task. This is partly due to the difficulty that scientists experience when attempting to understand existing workflows, which contain several data preparation and adaptation steps in addition to the scientifically significant analysis steps. One way to tackle the understandability problem is through providing abstractions that give a high-level view of activities undertaken within workflows. As a first step towards abstractions, we report in this paper on the results of a manual analysis performed over a set of real-world scientific workflows from Taverna and Wings systems. Our analysis has resulted in a set of scientific workflow motifs that outline i) the kinds of data intensive activities that are observed in workflows (data oriented motifs), and ii) the different manners in which activities are implemented within workflows (workflow oriented motifs). These motifs can be useful to inform workflow designers on the good and bad practices for workflow development, to inform the design of automated tools for the generation of workflow abstractions, etc.