969 results for Data Flow Algorithm
Abstract:
A deep theoretical analysis of the graph cut image segmentation framework presented in this paper simultaneously translates into important contributions in several directions. The most important practical contribution of this work is a full theoretical description, and implementation, of a novel powerful segmentation algorithm, GC(max). The output of GC(max) coincides with a version of a segmentation algorithm known as Iterative Relative Fuzzy Connectedness, IRFC. However, GC(max) is considerably faster than the classic IRFC algorithm, which we prove theoretically and show experimentally. Specifically, we prove that, in the worst case scenario, the GC(max) algorithm runs in linear time with respect to the variable $M=|C|+|Z|$, where $|C|$ is the image scene size and $|Z|$ is the size of the allowable range, $Z$, of the associated weight/affinity function. For most implementations, $Z$ is identical to the set of allowable image intensity values, and its size can be treated as small with respect to $|C|$, meaning that $O(M)=O(|C|)$. In such a situation, GC(max) runs in linear time with respect to the image size $|C|$. We show that the output of GC(max) constitutes a solution of a graph cut energy minimization problem, in which the energy is defined as the $\ell_\infty$ norm $\|F_P\|_\infty$ of the map $F_P$ that associates, with every element $e$ from the boundary of an object $P$, its weight $w(e)$. This formulation brings IRFC algorithms to the realm of the graph cut energy minimizers, with energy functions $\|F_P\|_q$ for $q\in[1,\infty]$. Of these, the best known minimization problem is for the energy $\|F_P\|_1$, which is solved by the classic min-cut/max-flow algorithm, referred to often as the Graph Cut algorithm. We notice that a minimization problem for $\|F_P\|_q$, $q\in[1,\infty)$, is identical to that for $\|F_P\|_1$, when the original weight function $w$ is replaced by $w^q$. 
Thus, any algorithm GC(sum) solving the $\|F_P\|_1$ minimization problem, solves also one for $\|F_P\|_q$ with $q\in[1,\infty)$, so just two algorithms, GC(sum) and GC(max), are enough to solve all $\|F_P\|_q$-minimization problems. We also show that, for any fixed weight assignment, the solutions of the $\|F_P\|_q$-minimization problems converge to a solution of the $\|F_P\|_\infty$-minimization problem ($\|F_P\|_\infty=\lim_{q\to\infty}\|F_P\|_q$ is not enough to deduce that). An experimental comparison of the performance of GC(max) and GC(sum) algorithms is included. This concentrates on comparing the actual (as opposed to provable worst scenario) algorithms' running time, as well as the influence of the choice of the seeds on the output.
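The relationship between the sum-type and max-type energies described in this abstract can be checked by brute force on a toy graph. The sketch below is illustrative only: the graph, its weights, the seeds, and all function names are hypothetical and not taken from the paper, and it enumerates every admissible object rather than implementing GC(max) or GC(sum).

```python
from itertools import combinations

# Hypothetical toy graph: undirected edges with weights (affinities) w(e).
EDGES = {(0, 1): 0.9, (1, 2): 0.4, (2, 3): 0.8, (0, 2): 0.3, (1, 3): 0.2}
NODES = [0, 1, 2, 3]
SEED_OBJECT, SEED_BACKGROUND = 0, 3  # assumed seeds


def boundary_weights(obj, weights):
    """Weights of the edges with exactly one endpoint in obj (the boundary map)."""
    return [w for (u, v), w in weights.items() if (u in obj) != (v in obj)]


def energy(obj, weights, q):
    """l_q norm of the boundary weights; q = float('inf') gives the maximum."""
    ws = boundary_weights(obj, weights)
    if q == float("inf"):
        return max(ws)
    return sum(w ** q for w in ws) ** (1.0 / q)


def candidate_objects():
    """All node sets containing the object seed but not the background seed."""
    free = [n for n in NODES if n not in (SEED_OBJECT, SEED_BACKGROUND)]
    for k in range(len(free) + 1):
        for extra in combinations(free, k):
            yield frozenset((SEED_OBJECT,) + extra)


def minimizers(weights, q):
    """Set of objects whose boundary has the smallest l_q energy."""
    objs = list(candidate_objects())
    best = min(energy(o, weights, q) for o in objs)
    return {o for o in objs if abs(energy(o, weights, q) - best) < 1e-12}


# Minimizing the l_3 energy for w matches minimizing the l_1 energy for w**3.
powered = {e: w ** 3 for e, w in EDGES.items()}
print(minimizers(EDGES, 3) == minimizers(powered, 1))
```

On this toy instance the minimizer for a large finite exponent also coincides with the max-norm minimizer, which is consistent with the convergence statement in the abstract but is of course not a proof of it.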
Abstract:
This work proposes a system for the classification of industrial steel pieces by means of a magnetic nondestructive device. The proposed classification system has two main stages: an online stage and an off-line stage. In the online stage, the system classifies inputs and saves misclassification information for later analysis. In the off-line optimization stage, the topology of a Probabilistic Neural Network is optimized by a Feature Selection algorithm combined with the Probabilistic Neural Network to increase the classification rate. The proposed Feature Selection algorithm searches the signal spectrogram by combining three basic elements: a Sequential Forward Selection algorithm, a Feature Cluster Grow algorithm with classification rate gradient analysis, and a Sequential Backward Selection algorithm. In addition, a trash-data recycling algorithm is proposed to obtain the optimal feedback samples selected from the misclassified ones.
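Of the three elements named above, Sequential Forward Selection is the simplest to illustrate. The following is a minimal generic sketch, not the combined algorithm of the paper: the `score` callback stands in for a classification rate (for example one obtained from a Probabilistic Neural Network), and `toy_score` is a made-up function used only to make the example runnable.

```python
def sequential_forward_selection(n_features, score, max_features=None):
    """Greedy SFS: repeatedly add the feature that most improves score(subset)."""
    selected = []
    best_score = float("-inf")
    remaining = set(range(n_features))
    limit = n_features if max_features is None else max_features
    while remaining and len(selected) < limit:
        # Try each remaining feature on top of the current subset.
        candidate, candidate_score = None, best_score
        for f in sorted(remaining):
            s = score(selected + [f])
            if s > candidate_score:
                candidate, candidate_score = f, s
        if candidate is None:  # no feature improves the score: stop
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return selected, best_score


def toy_score(subset):
    """Hypothetical stand-in for a classification rate.

    Features 2 and 5 are informative; every selected feature costs a small penalty.
    """
    useful = {2: 0.5, 5: 0.3}
    return sum(useful.get(f, 0.0) for f in subset) - 0.01 * len(subset)


features, rate = sequential_forward_selection(8, toy_score)
print(features, round(rate, 2))
```

A Sequential Backward Selection pass would run the same loop in reverse, starting from a full subset and removing the feature whose removal helps most.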
Abstract:
The miniaturization race in the hardware industry, aimed at continuously increasing transistor density on a die, no longer brings corresponding improvements in application performance. One of the most promising alternatives is to exploit the heterogeneous nature of common applications in hardware. Supported by reconfigurable computation, which has already proved its efficiency in accelerating data-intensive applications, this concept promises a breakthrough in contemporary technology development. Memory organization in such heterogeneous reconfigurable architectures becomes critical. Two primary aspects introduce a sophisticated trade-off. On the one hand, a memory subsystem should provide a well-organized distributed data structure and guarantee the required data bandwidth. On the other hand, it should hide the heterogeneous hardware structure from the end user in order to support feasible high-level programmability of the system. This thesis explores heterogeneous reconfigurable hardware architectures and presents possible solutions to the problem of memory organization and data structure. Using the MORPHEUS heterogeneous platform as an example, the discussion follows the complete design cycle, from decision making and justification to hardware realization. Particular emphasis is placed on methods to support high system performance, meet application requirements, and provide a user-friendly programmer interface. As a result, the research introduces a complete heterogeneous platform enhanced with a hierarchical memory organization, which copes with its task by separating computation from communication, providing reconfigurable engines with computation and configuration data, and unifying heterogeneous computational devices using local storage buffers. 
It is distinguished from the related solutions by distributed data-flow organization, specifically engineered mechanisms to operate with data on local domains, particular communication infrastructure based on Network-on-Chip, and thorough methods to prevent computation and communication stalls. In addition, a novel advanced technique to accelerate memory access was developed and implemented.
Abstract:
We present a new method, based on the idea of the meccano method and a novel T-mesh optimization procedure, to construct a T-spline parameterization of 2D geometries for the application of isogeometric analysis. The proposed method only requires a boundary representation of the geometry as input data. The algorithm produces a high-quality parametric transformation between 2D objects and the parametric domain, the unit square. First, we define a parametric mapping between the input boundary of the object and the boundary of the parametric domain. Then, we build a T-mesh adapted to the geometric singularities of the domain in order to preserve the features of the object boundary with a desired tolerance…
Abstract:
The Gaia space mission is a major project for the European astronomical community. Challenging as it is, the processing and analysis of the huge data flow incoming from Gaia is the subject of thorough study and preparatory work by the DPAC (Data Processing and Analysis Consortium), which is in charge of all aspects of the Gaia data reduction. This PhD thesis was carried out in the framework of the DPAC, within the team based in Bologna. The task of the Bologna team is to define the calibration model and to build a grid of spectro-photometric standard stars (SPSS) suitable for the absolute flux calibration of the Gaia G-band photometry and the BP/RP spectrophotometry. Such a flux calibration can be performed by repeatedly observing each SPSS during the lifetime of the Gaia mission and by comparing the observed Gaia spectra to the spectra obtained by our ground-based observations. Due to both the different observing sites involved and the huge number of frames expected (≃100000), it is essential to maintain maximum homogeneity in data quality, acquisition and treatment. Particular care must be taken to test the capabilities of each telescope/instrument combination (through the “instrument familiarization plan”) and to devise methods to keep under control, and eventually to correct for, the typical instrumental effects that can affect the high precision required for the Gaia SPSS grid (a few % with respect to Vega). I contributed to the ground-based survey of Gaia SPSS in many respects: the observations, the instrument familiarization plan, the data reduction and analysis activities (both photometry and spectroscopy), and the maintenance of the data archives. However, the field I was personally responsible for was photometry, and in particular relative photometry for the production of short-term light curves. 
In this context I defined and tested a semi-automated pipeline which allows for the pre-reduction of imaging SPSS data and the production of aperture photometry catalogues ready to be used for further analysis. A series of semi-automated quality control criteria are included in the pipeline at various levels, from pre-reduction, to aperture photometry, to light-curve production and analysis.
Abstract:
This work presents exact algorithms for the Resource Allocation and Cyclic Scheduling Problems (RA&CSPs). Cyclic Scheduling Problems arise in a number of application areas, such as hoist scheduling, mass production, compiler design (scheduling loops on parallel architectures), software pipelining, and embedded system design. The RA&CS problem concerns the assignment of time and resources to a set of activities, to be repeated indefinitely, subject to precedence and resource capacity constraints. In this work we present two constraint programming frameworks addressing two different types of cyclic problems. First, we consider the disjunctive RA&CSP, where the allocation problem considers unary resources. Instances are described through the Synchronous Data-Flow (SDF) Model of Computation. The key problem of finding a maximum-throughput allocation and scheduling of Synchronous Data-Flow graphs (SDFGs) onto a multi-core architecture is NP-hard and has traditionally been solved by means of heuristic (incomplete) algorithms. We propose an exact (complete) algorithm for the computation of a maximum-throughput mapping of applications specified as SDFGs onto multi-core architectures. Results show that the approach can handle realistic instances in terms of size and complexity. Next, we tackle the Cyclic Resource-Constrained Scheduling Problem (CRCSP). We propose a Constraint Programming approach based on modular arithmetic: in particular, we introduce a modular precedence constraint and a global cumulative constraint along with their filtering algorithms. Many traditional approaches to cyclic scheduling operate by fixing the period value and then solving a linear problem in a generate-and-test fashion. Conversely, our technique is based on a non-linear model and tackles the problem as a whole: the period value is inferred from the scheduling decisions. 
The proposed approaches have been tested on a number of non-trivial synthetic instances and on a set of realistic industrial instances, achieving good results on problems of practical size.
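To make the role of modular arithmetic in a periodic schedule concrete, here is a small feasibility checker. It is a sketch under stated assumptions, not the constraint model or the filtering algorithms of this work: it assumes integer times, a single cumulative resource, that iteration k of activity i starts at `start[i] + k * period`, and that a precedence `(i, j, delta)` means iteration k + delta of j may start only after iteration k of i has finished. All names are hypothetical.

```python
def is_feasible(period, start, dur, req, capacity, precedences):
    """Check a periodic schedule where iteration k of activity i starts at start[i] + k*period."""
    # Precedence (i, j, delta): iteration k + delta of j starts after iteration k of i ends.
    for i, j, delta in precedences:
        if start[j] + delta * period < start[i] + dur[i]:
            return False
    # Resource capacity: fold every execution window into one period (assumes dur[i] <= period).
    for t in range(period):
        usage = sum(
            req[i]
            for i in range(len(start))
            if (t - start[i]) % period < dur[i]
        )
        if usage > capacity:
            return False
    return True


def smallest_feasible_period(start, dur, req, capacity, precedences, max_period):
    """Smallest period up to max_period for which the given start times are feasible."""
    for period in range(max(dur), max_period + 1):
        if is_feasible(period, start, dur, req, capacity, precedences):
            return period
    return None


# Three activities on one unary resource, with a loop-carried dependency 2 -> 0.
start, dur, req = [0, 2, 3], [2, 1, 2], [1, 1, 1]
precedences = [(0, 1, 0), (1, 2, 0), (2, 0, 1)]
print(smallest_feasible_period(start, dur, req, 1, precedences, max_period=10))
```

The helper `smallest_feasible_period` only scans candidate periods for fixed start times; a real solver would search over start times and the period together, which is the point made in the abstract about inferring the period from the scheduling decisions.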
Abstract:
An optimizing compiler's internal representation fundamentally affects the clarity, efficiency and feasibility of the optimization algorithms employed by the compiler. Static Single Assignment (SSA), as a state-of-the-art program representation, has great advantages but can still be improved. This dissertation explores the domain of single assignment beyond SSA and presents two novel program representations: Future Gated Single Assignment (FGSA) and Recursive Future Predicated Form (RFPF). Both FGSA and RFPF embed control flow and data flow information, enabling efficient traversal of program information and thus leading to better and simpler optimizations. We introduce the future value concept, the design basis of both FGSA and RFPF, which permits a consumer instruction to be encountered before the producer of its source operand(s) in a control flow setting. We show that FGSA is efficiently computable by using a series of T1/T2/TR transformations, yielding an expected linear-time algorithm that combines the construction of the pruned single assignment form with live analysis for both reducible and irreducible graphs. The approach yields an average reduction of 7.7%, with a maximum of 67%, in the number of gating functions compared to the pruned SSA form on the SPEC2000 benchmark suite. We present a solid and near-optimal framework to perform the inverse transformation from single assignment programs. We demonstrate the importance of unrestricted code motion and present RFPF. We develop algorithms that enable instruction movement in acyclic as well as cyclic regions, and show how easily optimizations such as Partial Redundancy Elimination can be performed on RFPF.
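For readers unfamiliar with the T1/T2 transformations mentioned above, the sketch below shows the classic interval-style versions: T1 deletes a self-loop, and T2 merges a node that has a unique predecessor into that predecessor. It is only an illustration of those two textbook rules used as a reducibility test; the TR transformation and the FGSA construction of the dissertation are not shown, and the function name and graph encoding are assumptions made for this example.

```python
def collapses_under_t1_t2(succ, entry):
    """Return True if T1/T2 reduce the flow graph to a single node (a reducible graph).

    succ maps every node to the set of its successors; all nodes are assumed
    to be reachable from entry.
    """
    graph = {n: set(s) for n, s in succ.items()}  # private copy
    changed = True
    while changed and len(graph) > 1:
        changed = False
        # T1: delete self-loops.
        for n in graph:
            if n in graph[n]:
                graph[n].discard(n)
                changed = True
        # T2: merge a non-entry node with a unique predecessor into that predecessor.
        for n in list(graph):
            if n == entry:
                continue
            preds = [p for p in graph if n in graph[p]]
            if len(preds) == 1:
                p = preds[0]
                graph[p].discard(n)
                graph[p] |= graph.pop(n)  # p inherits the successors of n
                changed = True
                break  # predecessor sets changed, so rescan from the start
    return len(graph) == 1


# A simple while-loop shape (reducible) and a two-entry cycle (irreducible).
while_loop = {"A": {"B"}, "B": {"C", "D"}, "C": {"B"}, "D": set()}
two_entry_cycle = {"A": {"B", "C"}, "B": {"C"}, "C": {"B"}}
print(collapses_under_t1_t2(while_loop, "A"), collapses_under_t1_t2(two_entry_cycle, "A"))
```

This version rescans the whole graph after every merge, so it is quadratic or worse; it is meant to show the rules, not the expected linear-time behaviour claimed for the dissertation's algorithm.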
Abstract:
Due to recent scientific and technological advances in information systems, it is now possible to perform almost every application on a mobile device. The need to make such devices more intelligent opens an opportunity to design data mining algorithms that are able to execute autonomously on local devices to provide the device with knowledge. The problem behind autonomous mining is the proper configuration of the algorithm to produce the most appropriate results. Contextual information, together with resource information of the device, has a strong impact on both the feasibility of a particular execution and the production of the proper patterns. On the other hand, the performance of the algorithm, expressed in terms of efficacy and efficiency, depends heavily on the features of the dataset to be analyzed together with the values of the parameters of a particular implementation of an algorithm. However, few existing approaches deal with autonomous configuration of data mining algorithms, and in any case they do not deal with contextual or resource information. Both issues are of particular significance, in particular for social network applications. In fact, the widespread use of social networks, and consequently the amount of information shared, has made the need to model context in social applications a priority. Resource consumption also plays a crucial role on such platforms, as users access social networks mainly on their mobile devices. This PhD thesis addresses the aforementioned open issues, focusing on i) analyzing the behavior of algorithms, ii) mapping contextual and resource information to find the most appropriate configuration, and iii) applying the model to the case of a social recommender. Four main contributions are presented:
- The EE-Model: predicts the behavior of a data mining algorithm in terms of the resources consumed and the accuracy of the mining model it will obtain.
- The SC-Mapper: maps a situation defined by the context and resource state to a data mining configuration.
- SOMAR: a social activity (event and informal ongoings) recommender for mobile devices.
- D-SOMAR: an evolution of SOMAR which incorporates the configurator in order to provide updated recommendations.
Finally, the experimental validation of the proposed contributions using synthetic and real datasets allows us to achieve the objectives and answer the research questions proposed for this dissertation.
Abstract:
Analysis of river flow using hydraulic modelling and its implications in derived environmental applications are inextricably connected with the way in which the river boundary shape is represented. This relationship is scale-dependent upon the modelling resolution, which in turn determines the importance of the subscale performance of the model and the way subscale (surface and flow) processes are parameterised. Commonly, the subscale behaviour of the model relies upon a roughness parameterisation whose meaning depends on the dimensionality of the hydraulic model and the resolution of the topographic representation scale. The latter is, in turn, dependent on the resolution of the computational mesh as well as on the detail of the measured topographic data. Flow results are affected by these interactions between scale and subscale parameterisation according to the dimensionality approach. The aim of this dissertation is to evaluate these interactions upon hydraulic modelling results. The current availability of high-resolution topographic sources motivates this research, which is tackled using a roughness approach suited to each dimensionality in order to assess the interactions. A 1D HEC-RAS model, a 2D raster-based diffusion-wave model with a scale-dependent distributed roughness parameterisation, and a 3D finite volume scheme with a porosity algorithm approach to incorporate complex topography have been used. Different topographic sources are assessed using a 1D scheme. LiDAR data are used to isolate the effects of mesh resolution from those of the topographic content of the DEM upon 2D and 3D flow results. A distributed roughness parameterisation, using a roughness height approach dependent upon both mesh resolution and topographic content, is developed and evaluated for the 2D scheme. Grain-size data and fractal methods are used for the reconstruction of topography with microscale information, required for some applications but not easily available. 
Sensitivity of hydraulic parameters to this topographic parameterisation is evaluated in a 3D scheme at different mesh resolutions. Finally, the structural variability of simulated flow is analysed and related to scale interactions. Model simulations demonstrate (i) the importance of the topographic source in a 1D model; (ii) that the mesh resolution approach is dominant in 2D and 3D simulations, whereas in a 1D model the impacts of the topographic source and even the roughness parameterisation are more critical; (iii) the increased sensitivity to roughness parameterisation in 1D and 2D schemes with detailed topographic sources and finer mesh resolutions; and (iv) that topographic content and microtopography affect the vertical profile of computed 3D velocity in a depth-dependent way, whereas 2D results are not affected by variations in topographic content. Finally, the spatial analysis shows that mesh resolution controls high-resolution model scale results; roughness parameterisation controls 2D simulation results for a constant mesh resolution; and variations in topographic content and microtopography affect the organisation of flow results in a depth-dependent way in a 3D scheme.
Summary: Topography plays a fundamental role in the distribution of water and energy in natural landscapes (Beven and Kirkby 1979; Wood et al. 1997). Hydraulic simulation combined with remote-sensing terrain measurement methods constitutes a powerful research tool for understanding the behaviour of water flows as a result of the variability of the surface over which they flow. The representation and incorporation of topography in the hydraulic scheme is of crucial importance for the results and determines the development of their applications in the environmental field. Any simulation is a simplification of a real-world process, and therefore the degree of simplification determines the meaning of the simulated results. This reasoning is particularly difficult to carry over to hydraulic simulation, where aspects of scale as different as the scale of the flow processes and that of the boundary representation are considered jointly, even in the parameterisation stages (e.g. roughness parameterisation). On the one hand, this is because scale decisions condition one another (e.g. the dimensionality of the model conditions the scale of boundary representation) and therefore interact closely in their results. On the other hand, it is due to the high numerical and computational requirements of an explicit high-resolution representation of flow processes and mesh discretisation. Moreover, prior to hydraulic modelling, the terrain surface over which the water flows must itself be modelled and therefore has its own scale of representation, which in turn depends on the scale of the measured topographic data from which the model is built. Ultimately, this topography is what determines the spatial behaviour of the flow. Therefore, the scale of the topography in its measurement and modelling stages (data resolution and topographic representation) prior to its incorporation into the hydraulic model will in turn produce an impact that adds to the overall impact resulting from the computational scale of the hydraulic model and its dimension. Understanding the interactions between complex boundary geometries and flow structure using hydraulic modelling depends on the scales considered in the simplification of the hydraulic and terrain processes (model dimension, computational scale size and scale of the topographic data). The nature of the application of the hydraulic model (e.g. physical habitat, flood risk analysis, sediment transport) determines in the first place the scale of the study and therefore the detail of the processes to be simulated in the model (i.e. the dimensionality) and, consequently, the computational scale at which the calculations will be performed (i.e. computational resolution). The latter in turn determines the geographic detail with which the boundary must be represented in accordance with the resolution of the computational mesh. Parameterisation seeks to incorporate into the hydraulic model the quantification of the processes and physical conditions of the natural system, and must therefore include not only the processes that take place at the modelling scale but also those that take place at a subscale level and that must be defined through scaling relationships with the explicitly modelled variables. In practice, this parameterisation is implemented by supplying data to the model; therefore the scale of the geographic data used to parameterise the model will not only influence the results but will also determine the importance of the subscale behaviour of the model and the way in which these processes must be parameterised (e.g. the natural variability of the terrain within the discretisation cell, or flow in the lateral and vertical directions in a one-dimensional model). In this thesis, the one-dimensional HEC-RAS model (HEC 1998b), a two-dimensional raster wave-propagation model (Yu 2005) and a three-dimensional finite volume scheme with a porosity algorithm to incorporate topography (Lane et al. 2004; Hardy et al. 2005) have been used. The boundary geometry is defined by the scale of topographic representation (mesh resolution and topographic content), which in turn depends on the scale of the cartographic source. All these scale factors interact in the response of the hydraulic model to topography. In recent years, methods such as fractal analysis and geostatistical techniques used to represent and analyse geographic features (e.g. in surface characterisation (Herzfeld and Overbeck 1999; Butler et al. 2001)) have been promoting new approaches to quantifying scale effects (Lam et al. 2004; Atkinson and Tate 2000; Lam et al. 2006) through the analysis of the spatial structure of the variable (e.g. Bishop et al. 2006; Ju et al. 2005; Myint et al. 2004; Weng 2002; Bian and Xie 2004; Southworth et al. 2006; Pozdnyakova et al. 2005; Kyriakidis and Goodchild 2006). These methods quantify both the range of values of the variable present at different scales and the homogeneity or heterogeneity of the spatially distributed variable (Lam et al. 2004). In this thesis, these techniques have been used to analyse the impact of topography on the structure of the simulated hydraulic results. High-resolution remote sensing data and GIS techniques are also being used to better understand scale effects in environmental models (Marceau 1999; Skidmore 2002; Goodchild 2003) and are used in this thesis. As a body of research, this thesis addresses the interactions of these scales in hydraulic modelling from a global and interrelated point of view. However, the structure and main focus of the experiments relate to the spatial notions of the scale of representation in relation to a global view of the interactions between scales. In theory, the topographic representation should characterise the surface over which the water runs at an appropriate discretisation scale (in keeping with the purpose and dimension of the model), so that it reflects the processes of interest. The roughness parameterisation should reflect the effects of surface variability at scales finer than those represented explicitly in the topographic mesh (i.e. discretisation scale). Clearly, both concepts are physically related by a
Abstract:
Ubiquitous computing software needs to be autonomous so that essential decisions, such as how to configure its particular execution, are self-determined. Moreover, data mining plays an important role in ubiquitous computing by providing intelligence to several types of ubiquitous computing applications. Thus, automating ubiquitous data mining is also crucial. We focus on the problem of automatically configuring the execution of a ubiquitous data mining algorithm. In our solution, we generate configuration decisions in a resource-aware and context-aware manner, since the algorithm executes in an environment in which the context often changes and computing resources are often severely limited. We propose to analyze the execution behavior of the data mining algorithm by mining its past executions. By doing so, we discover the effects of resource and context states as well as parameter settings on the data mining quality. We argue that a classification model is appropriate for predicting the behavior of an algorithm's execution, and we concentrate on the decision tree classifier. We also define a taxonomy on data mining quality so that the tradeoff between prediction accuracy and classification specificity of each behavior model, which classifies by a different abstraction of quality, is scored for model selection. Behavior model constituents and class label transformations are formally defined, and experimental validation of the proposed approach is also performed.
Abstract:
PAMELA (Phased Array Monitoring for Enhanced Life Assessment) SHM™ System is an integrated embedded system based on ultrasonic guided waves, consisting of several electronic devices and one system manager controller. The data collected by all PAMELA devices in the system must be transmitted to the controller, which is responsible for carrying out the advanced signal processing to obtain SHM maps. PAMELA devices consist of hardware based on a Virtex 5 FPGA with a PowerPC 440 running an embedded Linux distribution. Therefore, in addition to performing tests and transmitting the collected data to the controller, PAMELA devices are capable of performing local data processing or pre-processing (reduction, normalization, pattern recognition, feature extraction, etc.). Local data processing decreases the data traffic over the network and reduces the CPU load of the external computer. PAMELA devices can even run autonomously, performing scheduled tests and communicating with the controller only when structural damage is detected or when programmed to do so. Each PAMELA device integrates a software management application (SMA) that allows the developer to download their own algorithm code and add the new data processing algorithm to the device. The SMA is developed in a virtual machine with an Ubuntu Linux distribution that includes all the software tools needed for the entire development cycle. The Eclipse IDE (Integrated Development Environment) is used to develop the SMA project and to write the code of each data processing algorithm. This paper presents the developed software architecture and describes the steps needed to add new data processing algorithms to the SMA in order to increase the processing capabilities of PAMELA devices. An example of basic damage index estimation using a delay-and-sum algorithm is provided.
Abstract:
Paper presented at the XVI Jornadas de Ingeniería del Software y Bases de Datos, JISBD 2011, A Coruña, 5-7 September 2011.
Abstract:
In this paper, the utilization of high data-rate channels by threading the sending and receiving of data is studied. As communication technology evolves, higher speeds are used more and more in various applications. But generating traffic at Gbps data rates also brings some complications, especially if the UDP protocol is used and packet fragmentation must be avoided, for example for high-speed reliable transport protocols based on UDP. In such a situation the Ethernet network packet size has to correspond to the standard 1500-byte MTU[1], which is widely used in the Internet. A system may not have enough capacity to send messages at the necessary rate in single-threaded mode. A possible solution is to use more threads, which can be efficient on widespread multicore systems. In addition, the fact that a non-constant data flow can be expected in a real network brings another object of study: automatic adaptation to traffic that changes during runtime. Cases investigated in this paper include adjusting the number of threads to a given speed and keeping the speed at a given rate when the CPU becomes heavily loaded by other processes while sending data.
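The packet rates implied by a 1500-byte MTU can be worked out with a few lines of arithmetic. The sketch below assumes IPv4 without options and standard Ethernet framing overhead; the per-thread sending rate passed to `threads_needed` is a hypothetical measurement, not a figure from the paper, and the function names are made up for this example.

```python
import math

MTU = 1500               # bytes, the standard Ethernet MTU named in the abstract
IPV4_HEADER = 20         # bytes, assuming IPv4 without options
UDP_HEADER = 8           # bytes
ETHERNET_OVERHEAD = 38   # bytes: preamble and SFD 8, header 14, FCS 4, inter-frame gap 12


def max_udp_payload(mtu=MTU):
    """Largest UDP payload that fits in one unfragmented packet."""
    return mtu - IPV4_HEADER - UDP_HEADER


def packets_per_second(link_bps, mtu=MTU):
    """Full-size packets per second needed to saturate a link of link_bps bits/s."""
    wire_bits = (mtu + ETHERNET_OVERHEAD) * 8
    return link_bps / wire_bits


def threads_needed(target_bps, per_thread_pps, mtu=MTU):
    """Sender threads needed if one thread sustains per_thread_pps packets/s (assumed)."""
    return math.ceil(packets_per_second(target_bps, mtu) / per_thread_pps)


print(max_udp_payload())                  # payload bytes per packet
print(round(packets_per_second(1e9)))     # packets per second at 1 Gbps
print(threads_needed(10e9, 300_000))      # threads for 10 Gbps at 300k pps per thread
```

An adaptive sender of the kind the abstract describes would re-evaluate something like `threads_needed` at runtime, using the currently measured per-thread rate instead of a constant.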
Abstract:
The modelling of inpatient length of stay (LOS) has important implications in health care studies. Finite mixture distributions are usually used to model the heterogeneous LOS distribution, due to a certain proportion of patients sustaining a longer stay. However, because the morbidity data are collected from hospitals, observations clustered within the same hospital are often correlated. The generalized linear mixed model approach is adopted to accommodate the inherent correlation via unobservable random effects. An EM algorithm is developed to obtain residual maximum quasi-likelihood estimation. The proposed hierarchical mixture regression approach enables the identification and assessment of factors influencing the long-stay proportion and the LOS for the long-stay patient subgroup. A neonatal LOS data set is used for illustration. © 2003 Elsevier Science Ltd. All rights reserved.
Abstract:
Hannenhalli and Pevzner developed the first polynomial-time algorithm for the combinatorial problem of sorting signed genomic data. Their algorithm computes the minimum number of reversals required to rearrange one genome into another when there is no gene duplication. In this paper, we show how to extend the Hannenhalli-Pevzner approach to genomes with multigene families. We propose a new heuristic algorithm that computes the reversal distance between two genomes with multigene families via binary integer programming, without removing gene duplicates. Experimental results on simulated and real biological data demonstrate that the proposed algorithm is able to find the reversal distance accurately. ©2005 IEEE
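A signed reversal and the reversal distance it induces can be illustrated with a brute-force search. This sketch is neither the Hannenhalli-Pevzner algorithm nor the integer-programming heuristic proposed in the paper: it is an exponential breadth-first search that is only usable on very small genomes, with genomes encoded as tuples of signed integers (an assumption made for this example).

```python
from collections import deque


def reverse(perm, i, j):
    """Apply a signed reversal to perm[i..j]: reverse the order and flip the signs."""
    return perm[:i] + tuple(-g for g in reversed(perm[i:j + 1])) + perm[j + 1:]


def reversal_distance(source, target):
    """Minimum number of signed reversals turning source into target (brute force)."""
    source, target = tuple(source), tuple(target)
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        perm, dist = queue.popleft()
        if perm == target:
            return dist
        # Try every contiguous segment as the next reversal.
        for i in range(len(perm)):
            for j in range(i, len(perm)):
                nxt = reverse(perm, i, j)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    return None  # target is not reachable, e.g. different gene content


print(reverse((1, 2, 3), 0, 1))
print(reversal_distance((2, 1), (1, 2)))
```

Because the search works on tuples rather than permutations, repeated gene labels (as in multigene families) are handled without special cases, although the state space still grows too quickly for realistic genome sizes.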