836 resultados para Data fusion applications
Resumo:
Background The use of the knowledge produced by sciences to promote human health is the main goal of translational medicine. To make it feasible we need computational methods to handle the large amount of information that arises from bench to bedside and to deal with its heterogeneity. A computational challenge that must be faced is to promote the integration of clinical, socio-demographic and biological data. In this effort, ontologies play an essential role as a powerful artifact for knowledge representation. Chado is a modular ontology-oriented database model that gained popularity due to its robustness and flexibility as a generic platform to store biological data; however it lacks supporting representation of clinical and socio-demographic information. Results We have implemented an extension of Chado – the Clinical Module - to allow the representation of this kind of information. Our approach consists of a framework for data integration through the use of a common reference ontology. The design of this framework has four levels: data level, to store the data; semantic level, to integrate and standardize the data by the use of ontologies; application level, to manage clinical databases, ontologies and data integration process; and web interface level, to allow interaction between the user and the system. The clinical module was built based on the Entity-Attribute-Value (EAV) model. We also proposed a methodology to migrate data from legacy clinical databases to the integrative framework. A Chado instance was initialized using a relational database management system. The Clinical Module was implemented and the framework was loaded using data from a factual clinical research database. Clinical and demographic data as well as biomaterial data were obtained from patients with tumors of head and neck. We implemented the IPTrans tool that is a complete environment for data migration, which comprises: the construction of a model to describe the legacy clinical data, based on an ontology; the Extraction, Transformation and Load (ETL) process to extract the data from the source clinical database and load it in the Clinical Module of Chado; the development of a web tool and a Bridge Layer to adapt the web tool to Chado, as well as other applications. Conclusions Open-source computational solutions currently available for translational science does not have a model to represent biomolecular information and also are not integrated with the existing bioinformatics tools. On the other hand, existing genomic data models do not represent clinical patient data. A framework was developed to support translational research by integrating biomolecular information coming from different “omics” technologies with patient’s clinical and socio-demographic data. This framework should present some features: flexibility, compression and robustness. The experiments accomplished from a use case demonstrated that the proposed system meets requirements of flexibility and robustness, leading to the desired integration. The Clinical Module can be accessed in http://dcm.ffclrp.usp.br/caib/pg=iptrans webcite.
Resumo:
Abstract Background The study and analysis of gene expression measurements is the primary focus of functional genomics. Once expression data is available, biologists are faced with the task of extracting (new) knowledge associated to the underlying biological phenomenon. Most often, in order to perform this task, biologists execute a number of analysis activities on the available gene expression dataset rather than a single analysis activity. The integration of heteregeneous tools and data sources to create an integrated analysis environment represents a challenging and error-prone task. Semantic integration enables the assignment of unambiguous meanings to data shared among different applications in an integrated environment, allowing the exchange of data in a semantically consistent and meaningful way. This work aims at developing an ontology-based methodology for the semantic integration of gene expression analysis tools and data sources. The proposed methodology relies on software connectors to support not only the access to heterogeneous data sources but also the definition of transformation rules on exchanged data. Results We have studied the different challenges involved in the integration of computer systems and the role software connectors play in this task. We have also studied a number of gene expression technologies, analysis tools and related ontologies in order to devise basic integration scenarios and propose a reference ontology for the gene expression domain. Then, we have defined a number of activities and associated guidelines to prescribe how the development of connectors should be carried out. Finally, we have applied the proposed methodology in the construction of three different integration scenarios involving the use of different tools for the analysis of different types of gene expression data. Conclusions The proposed methodology facilitates the development of connectors capable of semantically integrating different gene expression analysis tools and data sources. The methodology can be used in the development of connectors supporting both simple and nontrivial processing requirements, thus assuring accurate data exchange and information interpretation from exchanged data.
Resumo:
O cenário atual de repositórios digitais mundialmente distribuídos estimula estudos diversificados com os quais esse trabalho visa contribuir, objetivando um levantamento dos repositórios de instituições de ensino superior no Brasil, verificando a eficácia de uma ferramenta experimental no tratamento e análise dos dados e usando como fontes os diretórios Registry of Open Access Repositories (ROAR), Directory of Open Access Repositories (OpenDOAR), Diretório Luso-Brasileiro de Periódicos e Repositórios de Acesso Livre e a lista L_repositories. A ferramenta experimental Google Fusion Tables foi aplicada nos dados dos repositórios institucionais pesquisados, categorizando suas principais características: Instituição mantenedora, Natureza da instituição, Local, Região geográfica, Software adotado e sua versão, adoção do padrão Dublin Core e quantidade de trabalhos disponibilizados na data do estudo. Foram identificados 49 repositórios que em agosto de 2013 disponibilizavam 396.881 itens, sendo as instituições federais as com maior povoamento e o repositório LUME o primeiro em volume de itens; a região Sudeste com o maior número de repositórios e volume de itens disponibilizados; o DSpace o software predominante, com maior utilização da versão 1.6.2 e o padrão de metadados Dublin Core em todas as aplicações desse software. Este estudo comprovou a eficácia e utilidade do FusionTables, permitindo caracterizar o panorama atual de repositórios de instituições de ensino superior no Brasil. Os resultados foram disponibilizados em um Catálogo de Repositórios de Instituições de Ensino Superior no Brasil e um Mapa interativo dos Repositórios de Instituições de Ensino Superior no Brasil.
Resumo:
The synthesis of zirconia-based ordered mesoporous structures for catalytic applications is a research area under development. These systems are also potential candidates as anodes in intermediate temperature solid oxide fuel cells (it-SOFC) due to an enhancement on their surface area [1-4]. The structural features of mesoporous zirconia-ceria materials in combination with oxygen storage/release capacity (OSC) are crucial for various catalytic reactions. The direct use of hydrocarbons as fuel for the SOFC (instead of pure H2), without the necessity of reforming and purification reactors can improve global efficiency of these systems [4]. The X-ray diffraction data showed that ZrO2-x%CeO2 samples with x>50 are formed by a larger fraction of the cubic phase (spatial group Fm3m), while for x<50 the major crystalline structure is the tetragonal phase (spatial group P42/nmc). The crystallite size of the cubic phase increases with increase in ceria content. The tetragonal crystallite size decreases when ceria content increases. After impregnation, the Rietveld analysis showed a NiO content around 60wt.% for all samples. The lattice parameters for the ZrO2 tetragonal phase are lower for higher ZrO2 contents, while for all samples the cubic NiO and CeO2 parameters do not present changes. The calculated densities are higher for higher ceria content, as expected. The crystallite size of NiO are similar (~20nm) for all samples and 55nm for the NiO standard. Nitrogen adsorption experiments revealed a broader particle size distribution for higher CeO2 content. The superficial area values were around 35m2/g for all samples, the average pore diameter and pore volumes were higher when increasing ceria content. After NiO impregnation the particle size distribution was the same for all samples, with two pore sizes, the first around 3nm and a broader peak around 10nm. The superficial area increased to approximately 45m2/g for all samples, and the pore volume was also higher after impregnation and increased when ceria content increased. These results point up that the impregnation of NiO improves the textural characteristics of the pristine material. The complementary TEM/EDS images present a homogeneous coating of NiO particles over the ZrO2-x%CeO2 support, showing that these samples are excellent for catalysis applications. [1] D. Y. Zhao, J. Feng, Q. Huo, N. Melosh, G. H. Fredrickson, B. F. Chmelka, G. D. Stucky, Science 279, 548-552 (1998). [2] C. Yu, Y. Yu, D. Zhao, Chem. Comm. 575-576 (2000). [3] A. Trovarelli, M. Boaro, E. Rocchini, C. de Leitenburg, G. Dolcetti, J. Alloys Compd. 323-324 (2001) 584-591. [4] S. Larrondo, M. A. Vidal, B. Irigoyen, A. F. Craievich, D. G. Lamas, I. O. Fábregas, et al. Catal. Today 107–108 (2005) 53-59.
Resumo:
This work proposes a system for classification of industrial steel pieces by means of magnetic nondestructive device. The proposed classification system presents two main stages, online system stage and off-line system stage. In online stage, the system classifies inputs and saves misclassification information in order to perform posterior analyses. In the off-line optimization stage, the topology of a Probabilistic Neural Network is optimized by a Feature Selection algorithm combined with the Probabilistic Neural Network to increase the classification rate. The proposed Feature Selection algorithm searches for the signal spectrogram by combining three basic elements: a Sequential Forward Selection algorithm, a Feature Cluster Grow algorithm with classification rate gradient analysis and a Sequential Backward Selection. Also, a trash-data recycling algorithm is proposed to obtain the optimal feedback samples selected from the misclassified ones.
Resumo:
An important feature in computer systems developed for the agricultural sector is to satisfy the heterogeneity of data generated in different processes. Most problems related with this heterogeneity arise from the lack of standard for different computing solutions proposed. An efficient solution for that is to create a single standard for data exchange. The study on the actual process involved in cotton production was based on a research developed by the Brazilian Agricultural Research Corporation (EMBRAPA) that reports all phases as a result of the compilation of several theoretical and practical researches related to cotton crop. The proposition of a standard starts with the identification of the most important classes of data involved in the process, and includes an ontology that is the systematization of concepts related to the production of cotton fiber and results in a set of classes, relations, functions and instances. The results are used as a reference for the development of computational tools, transforming implicit knowledge into applications that support the knowledge described. This research is based on data from the Midwest of Brazil. The choice of the cotton process as a study case comes from the fact that Brazil is one of the major players and there are several improvements required for system integration in this segment.
Resumo:
[EN] We discuss the processing of data recorded with multimonochromatic x-ray imagers (MMI) in inertial confinement fusion experiments. The MMI records hundreds of gated, spectrally resolved images that can be used to unravel the spatial structure of the implosion core. In particular, we present a new method to determine the centers in all the array of images, a better reconstruction technique of narrowband implosion core images, two algorithms to determine the shape and size of the implosion core volume based on reconstructed broadband images recorded along three-quasiorthogonal lines of sight, and the removal of artifacts from the space-integrated spectra.
Resumo:
Although Recovery is often defined as the less studied and documented phase of the Emergency Management Cycle, a wide literature is available for describing characteristics and sub-phases of this process. Previous works do not allow to gain an overall perspective because of a lack of systematic consistent monitoring of recovery utilizing advanced technologies such as remote sensing and GIS technologies. Taking into consideration the key role of Remote Sensing in Response and Damage Assessment, this thesis is aimed to verify the appropriateness of such advanced monitoring techniques to detect recovery advancements over time, with close attention to the main characteristics of the study event: Hurricane Katrina storm surge. Based on multi-source, multi-sensor and multi-temporal data, the post-Katrina recovery was analysed using both a qualitative and a quantitative approach. The first phase was dedicated to the investigation of the relation between urban types, damage and recovery state, referring to geographical and technological parameters. Damage and recovery scales were proposed to review critical observations on remarkable surge- induced effects on various typologies of structures, analyzed at a per-building level. This wide-ranging investigation allowed a new understanding of the distinctive features of the recovery process. A quantitative analysis was employed to develop methodological procedures suited to recognize and monitor distribution, timing and characteristics of recovery activities in the study area. Promising results, gained by applying supervised classification algorithms to detect localization and distribution of blue tarp, have proved that this methodology may help the analyst in the detection and monitoring of recovery activities in areas that have been affected by medium damage. The study found that Mahalanobis Distance was the classifier which provided the most accurate results, in localising blue roofs with 93.7% of blue roof classified correctly and a producer accuracy of 70%. It was seen to be the classifier least sensitive to spectral signature alteration. The application of the dissimilarity textural classification to satellite imagery has demonstrated the suitability of this technique for the detection of debris distribution and for the monitoring of demolition and reconstruction activities in the study area. Linking these geographically extensive techniques with expert per-building interpretation of advanced-technology ground surveys provides a multi-faceted view of the physical recovery process. Remote sensing and GIS technologies combined to advanced ground survey approach provides extremely valuable capability in Recovery activities monitoring and may constitute a technical basis to lead aid organization and local government in the Recovery management.
Resumo:
The main aim of this Ph.D. dissertation is the study of clustering dependent data by means of copula functions with particular emphasis on microarray data. Copula functions are a popular multivariate modeling tool in each field where the multivariate dependence is of great interest and their use in clustering has not been still investigated. The first part of this work contains the review of the literature of clustering methods, copula functions and microarray experiments. The attention focuses on the K–means (Hartigan, 1975; Hartigan and Wong, 1979), the hierarchical (Everitt, 1974) and the model–based (Fraley and Raftery, 1998, 1999, 2000, 2007) clustering techniques because their performance is compared. Then, the probabilistic interpretation of the Sklar’s theorem (Sklar’s, 1959), the estimation methods for copulas like the Inference for Margins (Joe and Xu, 1996) and the Archimedean and Elliptical copula families are presented. In the end, applications of clustering methods and copulas to the genetic and microarray experiments are highlighted. The second part contains the original contribution proposed. A simulation study is performed in order to evaluate the performance of the K–means and the hierarchical bottom–up clustering methods in identifying clusters according to the dependence structure of the data generating process. Different simulations are performed by varying different conditions (e.g., the kind of margins (distinct, overlapping and nested) and the value of the dependence parameter ) and the results are evaluated by means of different measures of performance. In light of the simulation results and of the limits of the two investigated clustering methods, a new clustering algorithm based on copula functions (‘CoClust’ in brief) is proposed. The basic idea, the iterative procedure of the CoClust and the description of the written R functions with their output are given. The CoClust algorithm is tested on simulated data (by varying the number of clusters, the copula models, the dependence parameter value and the degree of overlap of margins) and is compared with the performance of model–based clustering by using different measures of performance, like the percentage of well–identified number of clusters and the not rejection percentage of H0 on . It is shown that the CoClust algorithm allows to overcome all observed limits of the other investigated clustering techniques and is able to identify clusters according to the dependence structure of the data independently of the degree of overlap of margins and the strength of the dependence. The CoClust uses a criterion based on the maximized log–likelihood function of the copula and can virtually account for any possible dependence relationship between observations. Many peculiar characteristics are shown for the CoClust, e.g. its capability of identifying the true number of clusters and the fact that it does not require a starting classification. Finally, the CoClust algorithm is applied to the real microarray data of Hedenfalk et al. (2001) both to the gene expressions observed in three different cancer samples and to the columns (tumor samples) of the whole data matrix.
Resumo:
This artwork reports on two different projects that were carried out during the three years of Doctor of the Philosophy course. In the first years a project regarding Capacitive Pressure Sensors Array for Aerodynamic Applications was developed in the Applied Aerodynamic research team of the Second Faculty of Engineering, University of Bologna, Forlì, Italy, and in collaboration with the ARCES laboratories of the same university. Capacitive pressure sensors were designed and fabricated, investigating theoretically and experimentally the sensor’s mechanical and electrical behaviours by means of finite elements method simulations and by means of wind tunnel tests. During the design phase, the sensor figures of merit are considered and evaluated for specific aerodynamic applications. The aim of this work is the production of low cost MEMS-alternative devices suitable for a sensor network to be implemented in air data system. The last two year was dedicated to a project regarding Wireless Pressure Sensor Network for Nautical Applications. Aim of the developed sensor network is to sense the weak pressure field acting on the sail plan of a full batten sail by means of instrumented battens, providing a real time differential pressure map over the entire sail surface. The wireless sensor network and the sensing unit were designed, fabricated and tested in the faculty laboratories. A static non-linear coupled mechanical-electrostatic simulation, has been developed to predict the pressure versus capacitance static characteristic suitable for the transduction process and to tune the geometry of the transducer to reach the required resolution, sensitivity and time response in the appropriate full scale pressure input A time dependent viscoelastic error model has been inferred and developed by means of experimental data in order to model, predict and reduce the inaccuracy bound due to the viscolelastic phenomena affecting the Mylar® polyester film used for the sensor diaphragm. The development of the two above mentioned subjects are strictly related but presently separately in this artwork.
Resumo:
This work provides a forward step in the study and comprehension of the relationships between stochastic processes and a certain class of integral-partial differential equation, which can be used in order to model anomalous diffusion and transport in statistical physics. In the first part, we brought the reader through the fundamental notions of probability and stochastic processes, stochastic integration and stochastic differential equations as well. In particular, within the study of H-sssi processes, we focused on fractional Brownian motion (fBm) and its discrete-time increment process, the fractional Gaussian noise (fGn), which provide examples of non-Markovian Gaussian processes. The fGn, together with stationary FARIMA processes, is widely used in the modeling and estimation of long-memory, or long-range dependence (LRD). Time series manifesting long-range dependence, are often observed in nature especially in physics, meteorology, climatology, but also in hydrology, geophysics, economy and many others. We deepely studied LRD, giving many real data examples, providing statistical analysis and introducing parametric methods of estimation. Then, we introduced the theory of fractional integrals and derivatives, which indeed turns out to be very appropriate for studying and modeling systems with long-memory properties. After having introduced the basics concepts, we provided many examples and applications. For instance, we investigated the relaxation equation with distributed order time-fractional derivatives, which describes models characterized by a strong memory component and can be used to model relaxation in complex systems, which deviates from the classical exponential Debye pattern. Then, we focused in the study of generalizations of the standard diffusion equation, by passing through the preliminary study of the fractional forward drift equation. Such generalizations have been obtained by using fractional integrals and derivatives of distributed orders. In order to find a connection between the anomalous diffusion described by these equations and the long-range dependence, we introduced and studied the generalized grey Brownian motion (ggBm), which is actually a parametric class of H-sssi processes, which have indeed marginal probability density function evolving in time according to a partial integro-differential equation of fractional type. The ggBm is of course Non-Markovian. All around the work, we have remarked many times that, starting from a master equation of a probability density function f(x,t), it is always possible to define an equivalence class of stochastic processes with the same marginal density function f(x,t). All these processes provide suitable stochastic models for the starting equation. Studying the ggBm, we just focused on a subclass made up of processes with stationary increments. The ggBm has been defined canonically in the so called grey noise space. However, we have been able to provide a characterization notwithstanding the underline probability space. We also pointed out that that the generalized grey Brownian motion is a direct generalization of a Gaussian process and in particular it generalizes Brownain motion and fractional Brownain motion as well. Finally, we introduced and analyzed a more general class of diffusion type equations related to certain non-Markovian stochastic processes. We started from the forward drift equation, which have been made non-local in time by the introduction of a suitable chosen memory kernel K(t). The resulting non-Markovian equation has been interpreted in a natural way as the evolution equation of the marginal density function of a random time process l(t). We then consider the subordinated process Y(t)=X(l(t)) where X(t) is a Markovian diffusion. The corresponding time-evolution of the marginal density function of Y(t) is governed by a non-Markovian Fokker-Planck equation which involves the same memory kernel K(t). We developed several applications and derived the exact solutions. Moreover, we considered different stochastic models for the given equations, providing path simulations.
Resumo:
The main problem connected to cone beam computed tomography (CT) systems for industrial applications employing 450 kV X-ray tubes is the high amount of scattered radiation which is added to the primary radiation (signal). This stray radiation leads to a significant degradation of the image quality. A better understanding of the scattering and methods to reduce its effects are therefore necessary to improve the image quality. Several studies have been carried out in the medical field at lower energies, whereas studies in industrial CT, especially for energies up to 450 kV, are lacking. Moreover, the studies reported in literature do not consider the scattered radiation generated by the CT system structure and the walls of the X-ray room (environmental scatter). In order to investigate the scattering on CT projections a GEANT4-based Monte Carlo (MC) model was developed. The model, which has been validated against experimental data, has enabled the calculation of the scattering including the environmental scatter, the optimization of an anti-scatter grid suitable for the CT system, and the optimization of the hardware components of the CT system. The investigation of multiple scattering in the CT projections showed that its contribution is 2.3 times the one of primary radiation for certain objects. The results of the environmental scatter showed that it is the major component of the scattering for aluminum box objects of front size 70 x 70 mm2 and that it strongly depends on the thickness of the object and therefore on the projection. For that reason, its correction is one of the key factors for achieving high quality images. The anti-scatter grid optimized by means of the developed MC model was found to reduce the scatter-toprimary ratio in the reconstructed images by 20 %. The object and environmental scatter calculated by means of the simulation were used to improve the scatter correction algorithm which could be patented by Empa. The results showed that the cupping effect in the corrected image is strongly reduced. The developed CT simulation is a powerful tool to optimize the design of the CT system and to evaluate the contribution of the scattered radiation to the image. Besides, it has offered a basis for a new scatter correction approach by which it has been possible to achieve images with the same spatial resolution as state-of-the-art well collimated fan-beam CT with a gain in the reconstruction time of a factor 10. This result has a high economic impact in non-destructive testing and evaluation, and reverse engineering.
Resumo:
By the end of the 19th century, geodesy has contributed greatly to the knowledge of regional tectonics and fault movement through its ability to measure, at sub-centimetre precision, the relative positions of points on the Earth’s surface. Nowadays the systematic analysis of geodetic measurements in active deformation regions represents therefore one of the most important tool in the study of crustal deformation over different temporal scales [e.g., Dixon, 1991]. This dissertation focuses on motion that can be observed geodetically with classical terrestrial position measurements, particularly triangulation and leveling observations. The work is divided into two sections: an overview of the principal methods for estimating longterm accumulation of elastic strain from terrestrial observations, and an overview of the principal methods for rigorously inverting surface coseismic deformation fields for source geometry with tests on synthetic deformation data sets and applications in two different tectonically active regions of the Italian peninsula. For the long-term accumulation of elastic strain analysis, triangulation data were available from a geodetic network across the Messina Straits area (southern Italy) for the period 1971 – 2004. From resulting angle changes, the shear strain rates as well as the orientation of the principal axes of the strain rate tensor were estimated. The computed average annual shear strain rates for the time period between 1971 and 2004 are γ˙1 = 113.89 ± 54.96 nanostrain/yr and γ˙2 = -23.38 ± 48.71 nanostrain/yr, with the orientation of the most extensional strain (θ) at N140.80° ± 19.55°E. These results suggests that the first-order strain field of the area is dominated by extension in the direction perpendicular to the trend of the Straits, sustaining the hypothesis that the Messina Straits could represents an area of active concentrated deformation. The orientation of θ agree well with GPS deformation estimates, calculated over shorter time interval, and is consistent with previous preliminary GPS estimates [D’Agostino and Selvaggi, 2004; Serpelloni et al., 2005] and is also similar to the direction of the 1908 (MW 7.1) earthquake slip vector [e.g., Boschi et al., 1989; Valensise and Pantosti, 1992; Pino et al., 2000; Amoruso et al., 2002]. Thus, the measured strain rate can be attributed to an active extension across the Messina Straits, corresponding to a relative extension rate ranges between < 1mm/yr and up to ~ 2 mm/yr, within the portion of the Straits covered by the triangulation network. These results are consistent with the hypothesis that the Messina Straits is an important active geological boundary between the Sicilian and the Calabrian domains and support previous preliminary GPS-based estimates of strain rates across the Straits, which show that the active deformation is distributed along a greater area. Finally, the preliminary dislocation modelling has shown that, although the current geodetic measurements do not resolve the geometry of the dislocation models, they solve well the rate of interseismic strain accumulation across the Messina Straits and give useful information about the locking the depth of the shear zone. Geodetic data, triangulation and leveling measurements of the 1976 Friuli (NE Italy) earthquake, were available for the inversion of coseismic source parameters. From observed angle and elevation changes, the source parameters of the seismic sequence were estimated in a join inversion using an algorithm called “simulated annealing”. The computed optimal uniform–slip elastic dislocation model consists of a 30° north-dipping shallow (depth 1.30 ± 0.75 km) fault plane with azimuth of 273° and accommodating reverse dextral slip of about 1.8 m. The hypocentral location and inferred fault plane of the main event are then consistent with the activation of Periadriatic overthrusts or other related thrust faults as the Gemona- Kobarid thrust. Then, the geodetic data set exclude the source solution of Aoudia et al. [2000], Peruzza et al. [2002] and Poli et al. [2002] that considers the Susans-Tricesimo thrust as the May 6 event. The best-fit source model is then more consistent with the solution of Pondrelli et al. [2001], which proposed the activation of other thrusts located more to the North of the Susans-Tricesimo thrust, probably on Periadriatic related thrust faults. The main characteristics of the leveling and triangulation data are then fit by the optimal single fault model, that is, these results are consistent with a first-order rupture process characterized by a progressive rupture of a single fault system. A single uniform-slip fault model seems to not reproduce some minor complexities of the observations, and some residual signals that are not modelled by the optimal single-fault plane solution, were observed. In fact, the single fault plane model does not reproduce some minor features of the leveling deformation field along the route 36 south of the main uplift peak, that is, a second fault seems to be necessary to reproduce these residual signals. By assuming movements along some mapped thrust located southward of the inferred optimal single-plane solution, the residual signal has been successfully modelled. In summary, the inversion results presented in this Thesis, are consistent with the activation of some Periadriatic related thrust for the main events of the sequence, and with a minor importance of the southward thrust systems of the middle Tagliamento plain.
Resumo:
Bioinformatics is a recent and emerging discipline which aims at studying biological problems through computational approaches. Most branches of bioinformatics such as Genomics, Proteomics and Molecular Dynamics are particularly computationally intensive, requiring huge amount of computational resources for running algorithms of everincreasing complexity over data of everincreasing size. In the search for computational power, the EGEE Grid platform, world's largest community of interconnected clusters load balanced as a whole, seems particularly promising and is considered the new hope for satisfying the everincreasing computational requirements of bioinformatics, as well as physics and other computational sciences. The EGEE platform, however, is rather new and not yet free of problems. In addition, specific requirements of bioinformatics need to be addressed in order to use this new platform effectively for bioinformatics tasks. In my three years' Ph.D. work I addressed numerous aspects of this Grid platform, with particular attention to those needed by the bioinformatics domain. I hence created three major frameworks, Vnas, GridDBManager and SETest, plus an additional smaller standalone solution, to enhance the support for bioinformatics applications in the Grid environment and to reduce the effort needed to create new applications, additionally addressing numerous existing Grid issues and performing a series of optimizations. The Vnas framework is an advanced system for the submission and monitoring of Grid jobs that provides an abstraction with reliability over the Grid platform. In addition, Vnas greatly simplifies the development of new Grid applications by providing a callback system to simplify the creation of arbitrarily complex multistage computational pipelines and provides an abstracted virtual sandbox which bypasses Grid limitations. Vnas also reduces the usage of Grid bandwidth and storage resources by transparently detecting equality of virtual sandbox files based on content, across different submissions, even when performed by different users. BGBlast, evolution of the earlier project GridBlast, now provides a Grid Database Manager (GridDBManager) component for managing and automatically updating biological flatfile databases in the Grid environment. GridDBManager sports very novel features such as an adaptive replication algorithm that constantly optimizes the number of replicas of the managed databases in the Grid environment, balancing between response times (performances) and storage costs according to a programmed cost formula. GridDBManager also provides a very optimized automated management for older versions of the databases based on reverse delta files, which reduces the storage costs required to keep such older versions available in the Grid environment by two orders of magnitude. The SETest framework provides a way to the user to test and regressiontest Python applications completely scattered with side effects (this is a common case with Grid computational pipelines), which could not easily be tested using the more standard methods of unit testing or test cases. The technique is based on a new concept of datasets containing invocations and results of filtered calls. The framework hence significantly accelerates the development of new applications and computational pipelines for the Grid environment, and the efforts required for maintenance. An analysis of the impact of these solutions will be provided in this thesis. This Ph.D. work originated various publications in journals and conference proceedings as reported in the Appendix. Also, I orally presented my work at numerous international conferences related to Grid and bioinformatics.
Resumo:
In fluid dynamics research, pressure measurements are of great importance to define the flow field acting on aerodynamic surfaces. In fact the experimental approach is fundamental to avoid the complexity of the mathematical models for predicting the fluid phenomena. It’s important to note that, using in-situ sensor to monitor pressure on large domains with highly unsteady flows, several problems are encountered working with the classical techniques due to the transducer cost, the intrusiveness, the time response and the operating range. An interesting approach for satisfying the previously reported sensor requirements is to implement a sensor network capable of acquiring pressure data on aerodynamic surface using a wireless communication system able to collect the pressure data with the lowest environmental–invasion level possible. In this thesis a wireless sensor network for fluid fields pressure has been designed, built and tested. To develop the system, a capacitive pressure sensor, based on polymeric membrane, and read out circuitry, based on microcontroller, have been designed, built and tested. The wireless communication has been performed using the Zensys Z-WAVE platform, and network and data management have been implemented. Finally, the full embedded system with antenna has been created. As a proof of concept, the monitoring of pressure on the top of the mainsail in a sailboat has been chosen as working example.