925 resultados para data-types
Resumo:
This thesis presents a study of the Grid data access patterns in distributed analysis in the CMS experiment at the LHC accelerator. This study ranges from the deep analysis of the historical patterns of access to the most relevant data types in CMS, to the exploitation of a supervised Machine Learning classification system to set-up a machinery able to eventually predict future data access patterns - i.e. the so-called dataset “popularity” of the CMS datasets on the Grid - with focus on specific data types. All the CMS workflows run on the Worldwide LHC Computing Grid (WCG) computing centers (Tiers), and in particular the distributed analysis systems sustains hundreds of users and applications submitted every day. These applications (or “jobs”) access different data types hosted on disk storage systems at a large set of WLCG Tiers. The detailed study of how this data is accessed, in terms of data types, hosting Tiers, and different time periods, allows to gain precious insight on storage occupancy over time and different access patterns, and ultimately to extract suggested actions based on this information (e.g. targetted disk clean-up and/or data replication). In this sense, the application of Machine Learning techniques allows to learn from past data and to gain predictability potential for the future CMS data access patterns. Chapter 1 provides an introduction to High Energy Physics at the LHC. Chapter 2 describes the CMS Computing Model, with special focus on the data management sector, also discussing the concept of dataset popularity. Chapter 3 describes the study of CMS data access patterns with different depth levels. Chapter 4 offers a brief introduction to basic machine learning concepts and gives an introduction to its application in CMS and discuss the results obtained by using this approach in the context of this thesis.
Resumo:
In this paper we demonstrate a refinement calculus for logic programs, which is a framework for developing logic programs from specifications. The paper is written in a tutorial-style, using a running example to illustrate how the refinement calculus is used to develop logic programs. The paper also presents an overview of some of the advanced features of the calculus, including the introduction of higher-order procedures and the refinement of abstract data types.
Resumo:
O desenvolvimento de software orientado a modelos defende a utilização dos modelos como um artefacto que participa activamente no processo de desenvolvimento. O modelo ocupa uma posição que se encontra ao mesmo nível do código. Esta é uma abordagem importante que tem sido alvo de atenção crescente nos últimos tempos. O Object Management Group (OMG) é o responsável por uma das principais especificações utilizadas na definição da arquitectura dos sistemas cujo desenvolvimento é orientado a modelos: o Model Driven Architecture (MDA). Os projectos que têm surgido no âmbito da modelação e das linguagens específicas de domínio para a plataforma Eclipse são um bom exemplo da atenção dada a estas áreas. São projectos totalmente abertos à comunidade, que procuram respeitar os standards e que constituem uma excelente oportunidade para testar e por em prática novas ideias e abordagens. Nesta dissertação foram usadas ferramentas criadas no âmbito do Amalgamation Project, desenvolvido para a plataforma Eclipse. Explorando o UML e usando a linguagem QVT, desenvolveu-se um processo automático para extrair elementos da arquitectura do sistema a partir da definição de requisitos. Os requisitos são representados por modelos UML que são transformados de forma a obter elementos para uma aproximação inicial à arquitectura do sistema. No final, obtêm-se um modelo UML que agrega os componentes, interfaces e tipos de dados extraídos a partir dos modelos dos requisitos. É uma abordagem orientada a modelos que mostrou ser exequível, capaz de oferecer resultados práticos e promissora no que concerne a trabalho futuro.
Resumo:
This paper describes a communication model to integrate repositories of programming problems with other e-Learning software components. The motivation for this work comes from the EduJudge project that aims to connect an existing repository of programming problems to learning management systems. When trying to use the existing repositories of learning objects we realized that they are mainly specialized search engines and lack features for integration with other e-Learning systems. With this model we intend to clarify the main features of a programming problem repository, in order to enable the design and development of software components that use it. The two main points of this model are the definition of programming problems as learning objects and the definition of the core functions exposed by the repository. In both cases, this model follows the existing specifications of the IMS standard and proposes extensions to deal with the special requirements of automatic evaluation and grading of programming exercises. In the definition of programming problems as learning objects we introduced a new schema for meta-data. This schema is used to represent meta-data related to automatic evaluation that cannot be conveniently represented using the standard: the type of automatic evaluation; the requirements of the evaluation engine; or the roles of different assets - tests cases, program solutions, etc. In the definition of the core functions we used two different web services flavours - SOAP and REST - and described each function as an operation for each type of interface. We describe also the data types of the arguments of each operation. These data types consist mainly on learning objects and their identifications, but include also usage reports and queries using XQuery.
Resumo:
XML Schema is one of the most used specifications for defining types of XML documents. It provides an extensive set of primitive data types, ways to extend and reuse definitions and an XML syntax that simplifies automatic manipulation. However, many features that make XML Schema Definitions (XSD) so interesting also make them rather cumbersome to read. Several tools to visualize and browse schema definitions have been proposed to cope with this issue. The novel approach proposed in this paper is to base XSD visualization and navigation on the XML document itself, using solely the web browser, without requiring a pre-processing step or an intermediate representation. We present the design and implementation of a web-based XML Schema browser called schem@Doc that operates over the XSD file itself. With this approach, XSD visualization is synchronized with the source file and always reflects its current state. This tool fits well in the schema development process and is easy to integrate in web repositories containing large numbers of XSD files.
Resumo:
Dissertação para obtenção do Grau de Mestre em Engenharia Informática
Resumo:
Information systems are widespread and used by anyone with computing devices as well as corporations and governments. It is often the case that security leaks are introduced during the development of an application. Reasons for these security bugs are multiple but among them one can easily identify that it is very hard to define and enforce relevant security policies in modern software. This is because modern applications often rely on container sharing and multi-tenancy where, for instance, data can be stored in the same physical space but is logically mapped into different security compartments or data structures. In turn, these security compartments, to which data is classified into in security policies, can also be dynamic and depend on runtime data. In this thesis we introduce and develop the novel notion of dependent information flow types, and focus on the problem of ensuring data confidentiality in data-centric software. Dependent information flow types fit within the standard framework of dependent type theory, but, unlike usual dependent types, crucially allow the security level of a type, rather than just the structural data type itself, to depend on runtime values. Our dependent function and dependent sum information flow types provide a direct, natural and elegant way to express and enforce fine grained security policies on programs. Namely programs that manipulate structured data types in which the security level of a structure field may depend on values dynamically stored in other fields The main contribution of this work is an efficient analysis that allows programmers to verify, during the development phase, whether programs have information leaks, that is, it verifies whether programs protect the confidentiality of the information they manipulate. As such, we also implemented a prototype typechecker that can be found at http://ctp.di.fct.unl.pt/DIFTprototype/.
Resumo:
BACKGROUND: DNA sequence integrity, mRNA concentrations and protein-DNA interactions have been subject to genome-wide analyses based on microarrays with ever increasing efficiency and reliability over the past fifteen years. However, very recently novel technologies for Ultra High-Throughput DNA Sequencing (UHTS) have been harnessed to study these phenomena with unprecedented precision. As a consequence, the extensive bioinformatics environment available for array data management, analysis, interpretation and publication must be extended to include these novel sequencing data types. DESCRIPTION: MIMAS was originally conceived as a simple, convenient and local Microarray Information Management and Annotation System focused on GeneChips for expression profiling studies. MIMAS 3.0 enables users to manage data from high-density oligonucleotide SNP Chips, expression arrays (both 3'UTR and tiling) and promoter arrays, BeadArrays as well as UHTS data using MIAME-compliant standardized vocabulary. Importantly, researchers can export data in MAGE-TAB format and upload them to the EBI's ArrayExpress certified data repository using a one-step procedure. CONCLUSION: We have vastly extended the capability of the system such that it processes the data output of six types of GeneChips (Affymetrix), two different BeadArrays for mRNA and miRNA (Illumina) and the Genome Analyzer (a popular Ultra-High Throughput DNA Sequencer, Illumina), without compromising on its flexibility and user-friendliness. MIMAS, appropriately renamed into Multiomics Information Management and Annotation System, is currently used by scientists working in approximately 50 academic laboratories and genomics platforms in Switzerland and France. MIMAS 3.0 is freely available via http://multiomics.sourceforge.net/.
Resumo:
Background: The analysis and usage of biological data is hindered by the spread of information across multiple repositories and the difficulties posed by different nomenclature systems and storage formats. In particular, there is an important need for data unification in the study and use of protein-protein interactions. Without good integration strategies, it is difficult to analyze the whole set of available data and its properties.Results: We introduce BIANA (Biologic Interactions and Network Analysis), a tool for biological information integration and network management. BIANA is a Python framework designed to achieve two major goals: i) the integration of multiple sources of biological information, including biological entities and their relationships, and ii) the management of biological information as a network where entities are nodes and relationships are edges. Moreover, BIANA uses properties of proteins and genes to infer latent biomolecular relationships by transferring edges to entities sharing similar properties. BIANA is also provided as a plugin for Cytoscape, which allows users to visualize and interactively manage the data. A web interface to BIANA providing basic functionalities is also available. The software can be downloaded under GNU GPL license from http://sbi.imim.es/web/BIANA.php.Conclusions: BIANA's approach to data unification solves many of the nomenclature issues common to systems dealing with biological data. BIANA can easily be extended to handle new specific data repositories and new specific data types. The unification protocol allows BIANA to be a flexible tool suitable for different user requirements: non-expert users can use a suggested unification protocol while expert users can define their own specific unification rules.
Resumo:
We present an open-source ITK implementation of a directFourier method for tomographic reconstruction, applicableto parallel-beam x-ray images. Direct Fourierreconstruction makes use of the central-slice theorem tobuild a polar 2D Fourier space from the 1D transformedprojections of the scanned object, that is resampled intoa Cartesian grid. Inverse 2D Fourier transform eventuallyyields the reconstructed image. Additionally, we providea complex wrapper to the BSplineInterpolateImageFunctionto overcome ITKâeuro?s current lack for image interpolatorsdealing with complex data types. A sample application ispresented and extensively illustrated on the Shepp-Loganhead phantom. We show that appropriate input zeropaddingand 2D-DFT oversampling rates together with radial cubicb-spline interpolation improve 2D-DFT interpolationquality and are efficient remedies to reducereconstruction artifacts.
Resumo:
Because of the increase in workplace automation and the diversification of industrial processes, workplaces have become more and more complex. The classical approaches used to address workplace hazard concerns, such as checklists or sequence models, are, therefore, of limited use in such complex systems. Moreover, because of the multifaceted nature of workplaces, the use of single-oriented methods, such as AEA (man oriented), FMEA (system oriented), or HAZOP (process oriented), is not satisfactory. The use of a dynamic modeling approach in order to allow multiple-oriented analyses may constitute an alternative to overcome this limitation. The qualitative modeling aspects of the MORM (man-machine occupational risk modeling) model are discussed in this article. The model, realized on an object-oriented Petri net tool (CO-OPN), has been developed to simulate and analyze industrial processes in an OH&S perspective. The industrial process is modeled as a set of interconnected subnets (state spaces), which describe its constitutive machines. Process-related factors are introduced, in an explicit way, through machine interconnections and flow properties. While man-machine interactions are modeled as triggering events for the state spaces of the machines, the CREAM cognitive behavior model is used in order to establish the relevant triggering events. In the CO-OPN formalism, the model is expressed as a set of interconnected CO-OPN objects defined over data types expressing the measure attached to the flow of entities transiting through the machines. Constraints on the measures assigned to these entities are used to determine the state changes in each machine. Interconnecting machines implies the composition of such flow and consequently the interconnection of the measure constraints. This is reflected by the construction of constraint enrichment hierarchies, which can be used for simulation and analysis optimization in a clear mathematical framework. The use of Petri nets to perform multiple-oriented analysis opens perspectives in the field of industrial risk management. It may significantly reduce the duration of the assessment process. But, most of all, it opens perspectives in the field of risk comparisons and integrated risk management. Moreover, because of the generic nature of the model and tool used, the same concepts and patterns may be used to model a wide range of systems and application fields.
Resumo:
Application of semi-distributed hydrological models to large, heterogeneous watersheds deals with several problems. On one hand, the spatial and temporal variability in catchment features should be adequately represented in the model parameterization, while maintaining the model complexity in an acceptable level to take advantage of state-of-the-art calibration techniques. On the other hand, model complexity enhances uncertainty in adjusted model parameter values, therefore increasing uncertainty in the water routing across the watershed. This is critical for water quality applications, where not only streamflow, but also a reliable estimation of the surface versus subsurface contributions to the runoff is needed. In this study, we show how a regularized inversion procedure combined with a multiobjective function calibration strategy successfully solves the parameterization of a complex application of a water quality-oriented hydrological model. The final value of several optimized parameters showed significant and consistentdifferences across geological and landscape features. Although the number of optimized parameters was significantly increased by the spatial and temporal discretization of adjustable parameters, the uncertainty in water routing results remained at reasonable values. In addition, a stepwise numerical analysis showed that the effects on calibration performance due to inclusion of different data types in the objective function could be inextricably linked. Thus caution should be taken when adding or removing data from an aggregated objective function.
Resumo:
Recent years have seen a significant increase in understanding of the host genetic and genomic determinants of susceptibility to HIV-1 infection and disease progression, driven in large part by candidate gene studies, genome-wide association studies, genome-wide transcriptome analyses, and large-scale in vitro genome screens. These studies have identified common variants in some host loci that clearly influence disease progression, characterized the scale and dynamics of gene and protein expression changes in response to infection, and provided the first comprehensive catalogs of genes and pathways involved in viral replication. Experimental models of AIDS and studies in natural hosts of primate lentiviruses have complemented and in some cases extended these findings. As the relevant technology continues to progress, the expectation is that such studies will increase in depth (e.g., to include host whole exome and whole genome sequencing) and in breadth (in particular, by integrating multiple data types).
Resumo:
The integrated system of design for manufacturing and assembly (DFMA) and internet based collaborative design are presented to support product design, manufacturing process, and assembly planning for axial eccentric oil-pump design. The presented system manages and schedules group oriented collaborative activities. The design guidelines of internet based collaborative design & DFMA are expressed. The components and the manufacturing stages of axial eccentric oil-pump are expressed in detail. The file formats of the presented system include the data types of collaborative design of the product, assembly design, assembly planning and assembly system design. Product design and assembly planning can be operated synchronously and intelligently and they are integrated under the condition of internet based collaborative design and DFMA. The technologies of collaborative modelling, collaborative manufacturing, and internet based collaborative assembly for the specific pump construction are developed. A seven-security level is presented to ensure the security of the internet based collaborative design system.
Resumo:
Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014