824 results for decentralised data fusion framework
Abstract:
The recently announced Higgs boson discovery marks the dawn of the direct probing of the electroweak symmetry breaking sector. Sorting out the dynamics responsible for electroweak symmetry breaking now requires probing the Higgs boson interactions and searching for additional states connected to this sector. In this work, we analyze the constraints on Higgs boson couplings to the standard model gauge bosons using the available data from the Tevatron and the LHC. We work in a model-independent framework, expressing the departures of the Higgs boson couplings to gauge bosons by dimension-six operators. This allows for independent modifications of its couplings to gluons, photons, and weak gauge bosons while still preserving Standard Model (SM) gauge invariance. Our results indicate that the best overall agreement with data is obtained if the cross section of Higgs boson production via gluon fusion is suppressed with respect to its SM value and the Higgs boson branching ratio into two photons is enhanced, while keeping the production and decays associated with couplings to weak gauge bosons close to their SM predictions.
Abstract:
In multi-label classification, examples can be associated with multiple labels simultaneously. The task of learning from multi-label data can be addressed by methods that transform the multi-label classification problem into several single-label classification problems. The binary relevance approach is one of these methods: the multi-label learning task is decomposed into several independent binary classification problems, one for each label in the label set, and the final labels for each example are determined by aggregating the predictions from all binary classifiers. However, this approach fails to consider any dependency among the labels. Aiming to accurately predict label combinations, in this paper we propose a simple approach that enables the binary classifiers to discover existing label dependencies by themselves. An experimental study using decision trees, a kernel method and Naive Bayes as base learning techniques shows the potential of the proposed approach to improve multi-label classification performance.
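As a rough illustration of the binary relevance decomposition, and of one simple way to let the binary classifiers see the other labels, here is a hedged Python sketch; the stacking scheme, function names and use of scikit-learn decision trees are assumptions for illustration, not the paper's exact method:

```python
# Minimal sketch of binary relevance plus a stacked second pass that lets each
# binary classifier see the other labels (one way to expose label dependency).
# Illustrative only; not the paper's exact approach.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_br(X, Y):
    """Train one binary classifier per label (plain binary relevance)."""
    return [DecisionTreeClassifier().fit(X, Y[:, j]) for j in range(Y.shape[1])]

def predict_br(models, X):
    return np.column_stack([m.predict(X) for m in models])

def fit_br_stacked(X, Y):
    """Second pass: augment the features with the other labels' predictions."""
    base = fit_br(X, Y)
    Y_hat = predict_br(base, X)
    stacked = []
    for j in range(Y.shape[1]):
        others = np.delete(Y_hat, j, axis=1)      # predictions for the other labels
        stacked.append(DecisionTreeClassifier().fit(np.hstack([X, others]), Y[:, j]))
    return base, stacked

def predict_br_stacked(base, stacked, X):
    Y_hat = predict_br(base, X)
    cols = []
    for j, m in enumerate(stacked):
        others = np.delete(Y_hat, j, axis=1)
        cols.append(m.predict(np.hstack([X, others])))
    return np.column_stack(cols)
```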
Abstract:
Abstract Background One goal of gene expression profiling is to identify signature genes that robustly distinguish different types or grades of tumors. Several tumor classifiers based on expression profiling have been proposed using the microarray technique. Due to important differences in the probabilistic models of microarray and SAGE technologies, it is important to develop suitable techniques to select specific genes from SAGE measurements. Results A new framework to select specific genes that distinguish different biological states based on the analysis of SAGE data is proposed. The new framework applies the bolstered error for the identification of strong genes that separate the biological states in a feature space defined by the gene expression of a training set. Credibility intervals defined from a probabilistic model of SAGE measurements are used to identify the genes that distinguish the different states with more reliability among all gene groups selected by the strong genes method. A score that takes into account the credibility and the bolstered error values in order to rank the groups of considered genes is proposed. Results obtained using SAGE data from gliomas are presented, corroborating the introduced methodology. Conclusion The model representing counting data, such as SAGE, provides additional statistical information that allows a more robust analysis. This additional statistical information provided by the probabilistic model is incorporated in the methodology described in the paper. The introduced method is suitable for identifying signature genes that lead to a good separation of the biological states using SAGE, and may be adapted to other counting methods such as Massive Parallel Signature Sequencing (MPSS) or the more recent Sequencing-By-Synthesis (SBS) technique. Some of the genes identified by the proposed method may be useful for generating classifiers.
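For context, a Monte Carlo version of the bolstered resubstitution error for a fixed classifier over a two-gene feature space can be sketched as follows; the classifier, kernel width and names are illustrative assumptions, and the paper's credibility-interval scoring is not reproduced here:

```python
# Hedged sketch of a Monte Carlo bolstered resubstitution error: for each
# training point, sample from a Gaussian "bolstering" kernel centred on it and
# count how often the designed classifier misclassifies those samples.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def bolstered_error(clf, X, y, sigma=0.5, n_mc=200, seed=0):
    rng = np.random.default_rng(seed)
    errs = []
    for xi, yi in zip(X, y):
        pts = xi + sigma * rng.standard_normal((n_mc, X.shape[1]))  # bolstering kernel
        errs.append(np.mean(clf.predict(pts) != yi))
    return float(np.mean(errs))

# Toy example: expression of a hypothetical gene pair in two biological states
X = np.array([[2.0, 0.5], [1.8, 0.7], [0.3, 2.1], [0.4, 1.9]])
y = np.array([0, 0, 1, 1])
clf = LinearDiscriminantAnalysis().fit(X, y)
print(bolstered_error(clf, X, y))
```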
Abstract:
Abstract Background Transcript enumeration methods such as SAGE, MPSS, and sequencing-by-synthesis EST "digital northern" are important high-throughput techniques for digital gene expression measurement. As with other counting or voting processes, these measurements constitute compositional data exhibiting properties particular to the simplex space, where the summation of the components is constrained. These properties are not present in regular Euclidean spaces, on which hybridization-based microarray data is often modeled. Therefore, pattern recognition methods commonly used for microarray data analysis may be non-informative for the data generated by transcript enumeration techniques, since they ignore certain fundamental properties of this space. Results Here we present a software tool, Simcluster, designed to perform clustering analysis for data on the simplex space. We present Simcluster as a stand-alone command-line C package and as a user-friendly on-line tool. Both versions are available at: http://xerad.systemsbiology.net/simcluster. Conclusion Simcluster is designed in accordance with a well-established mathematical framework for compositional data analysis, which provides principled procedures for dealing with the simplex space, and is thus applicable in a number of contexts, including enumeration-based gene expression data.
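Simcluster itself is a C package; purely as an illustration of the compositional-data idea it builds on, the following hedged Python sketch closes count vectors to the simplex, applies a centred log-ratio transform and clusters in the transformed space. This is not Simcluster's actual algorithm:

```python
# Hedged sketch: close counts to proportions, apply a centred log-ratio (clr)
# transform, then cluster in the transformed space. Illustrative assumption of
# the general compositional workflow, not Simcluster's implementation.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def clr(counts, pseudocount=0.5):
    """Centred log-ratio transform of a count matrix (rows = libraries)."""
    x = counts + pseudocount                      # avoid log(0) for unseen tags
    p = x / x.sum(axis=1, keepdims=True)          # closure: map to the simplex
    logp = np.log(p)
    return logp - logp.mean(axis=1, keepdims=True)

def cluster_counts(counts, n_clusters=2):
    z = linkage(clr(counts), method="average")    # hierarchical clustering in clr space
    return fcluster(z, t=n_clusters, criterion="maxclust")

# Example: three SAGE-like libraries over five tags
labels = cluster_counts(np.array([[120, 30, 5, 0, 2],
                                  [110, 28, 7, 1, 3],
                                  [10, 80, 60, 5, 40]]))
print(labels)
```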
Abstract:
OBJECTIVE: To evaluate tools for the fusion of images generated by tomography and structural and functional magnetic resonance imaging. METHODS: Magnetic resonance and functional magnetic resonance imaging were performed while a volunteer who had previously undergone cranial tomography performed motor and somatosensory tasks in a 3-Tesla scanner. Image data were analyzed with different programs, and the results were compared. RESULTS: We constructed a flow chart of computational processes that allowed measurement of the spatial congruence between the methods. There was no single computational tool that contained the entire set of functions necessary to achieve the goal. CONCLUSION: The fusion of the images from the three methods proved to be feasible with the use of four free-access software programs (OsiriX, Register, MRIcro and FSL). Our results may serve as a basis for building software that will be useful as a virtual tool prior to neurosurgery.
Abstract:
With the increasing production of information from e-government initiatives, there is also the need to transform a large volume of unstructured data into useful information for society. All this information should be easily accessible and made available in a meaningful and effective way in order to achieve semantic interoperability in electronic government services, which is a challenge to be pursued by governments around the world. Our aim is to discuss the context of e-Government Big Data and to present a framework to promote semantic interoperability through automatic generation of ontologies from unstructured information found on the Internet. We propose the use of fuzzy mechanisms to deal with natural language terms and present some related works in this area. The results achieved in this study comprise the architectural definition and the major components and requirements needed to compose the proposed framework. With this, it is possible to take advantage of the large volume of information generated by e-Government initiatives and use it to benefit society.
Abstract:
In electronic commerce, systems development is based on two fundamental types of models: business models and process models. A business model is concerned with value exchanges among business partners, while a process model focuses on operational and procedural aspects of business communication. Thus, a business model defines the what in an e-commerce system, while a process model defines the how. Business process design can be facilitated and improved by a method for systematically moving from a business model to a process model. Such a method would provide support for traceability, evaluation of design alternatives, and a seamless transition from analysis to realization. This work proposes a unified framework that can be used as a basis to analyze, interpret and understand the different concepts associated with the different stages of e-commerce system development. In this thesis, we illustrate how UN/CEFACT’s recommended metamodels for business and process design can be analyzed, extended and then integrated into final solutions based on the proposed unified framework. Also, as an application of the framework, we demonstrate how process-modeling tasks can be facilitated in e-commerce system design. The proposed methodology is called BP3, which stands for Business Process Patterns Perspective. The BP3 methodology uses a question-answer interface to capture different business requirements from the designers. It is based on pre-defined process patterns, and the final solution is generated by applying the captured business requirements, by means of a set of production rules, to complete the inter-process communication among these patterns.
Abstract:
Programming software for controlling robotic systems in order to build working systems that perform adequately according to their design requirements remains a task that requires significant development effort. Currently, there are no clear paradigms for programming robotic systems, and the programming techniques in common use today are not adequate to deal with the complexity associated with these systems. The work presented in this document describes a programming tool, concretely a framework, that must be considered as a first step towards a tool for dealing with the complexity present in robotic systems. In this framework the software that controls a system is viewed as a dynamic network of units of execution interconnected by means of data paths. Each of these units of execution, called a component, is a port automaton which provides a given functionality, hidden behind an external interface specifying clearly which data it needs and which data it produces. Components, once defined and built, may be instantiated, integrated and used as many times as needed in other systems. The framework provides the infrastructure necessary to support this component concept and the intercommunication between components by means of data paths (port connections), which can be established and removed dynamically. Moreover, considering that the more robust the components that make up a system are, the more robust the system is, the framework also provides the infrastructure necessary to control and monitor the components that integrate a system at any given instant of time.
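A minimal sketch of the component/port-connection idea, with data paths established and removed at run time, might look as follows; class and method names are illustrative assumptions, not the framework's actual API:

```python
# Illustrative sketch: components expose named output ports, and data paths
# (port connections) can be established and torn down dynamically.
class Port:
    def __init__(self):
        self.subscribers = []              # callbacks connected to this output

    def connect(self, callback):
        self.subscribers.append(callback)  # establish a data path

    def disconnect(self, callback):
        self.subscribers.remove(callback)  # remove the data path

    def send(self, data):
        for cb in self.subscribers:
            cb(data)

class Component:
    """A unit of execution with declared output ports and an input handler."""
    def __init__(self, name):
        self.name = name
        self.outputs = {}

    def output(self, port_name):
        return self.outputs.setdefault(port_name, Port())

    def on_input(self, data):              # override in concrete components
        pass

class Logger(Component):
    def on_input(self, data):
        print(f"{self.name} received {data}")

# Example: a sensor component dynamically wired to a logger component
sensor, logger = Component("sensor"), Logger("logger")
sensor.output("range").connect(logger.on_input)      # establish a data path
sensor.output("range").send(0.42)                    # push data through it
sensor.output("range").disconnect(logger.on_input)   # tear it down again
```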
Abstract:
We discuss the processing of data recorded with multimonochromatic x-ray imagers (MMI) in inertial confinement fusion experiments. The MMI records hundreds of gated, spectrally resolved images that can be used to unravel the spatial structure of the implosion core. In particular, we present a new method to determine the centers of all the images in the array, an improved reconstruction technique for narrowband implosion core images, two algorithms to determine the shape and size of the implosion core volume based on reconstructed broadband images recorded along three quasi-orthogonal lines of sight, and the removal of artifacts from the space-integrated spectra.
Abstract:
In the past decade, the advent of efficient genome sequencing tools and high-throughput experimental biotechnology has led to enormous progress in the life sciences. Among the most important innovations is microarray technology. It allows the expression of thousands of genes to be quantified simultaneously by measuring the hybridization from a tissue of interest to probes on a small glass or plastic slide. The characteristics of these data include a fair amount of random noise, a predictor dimension in the thousands, and a sample size in the dozens. One of the most exciting areas to which microarray technology has been applied is the challenge of deciphering complex diseases such as cancer. In these studies, samples are taken from two or more groups of individuals with heterogeneous phenotypes, pathologies, or clinical outcomes. These samples are hybridized to microarrays in an effort to find a small number of genes which are strongly correlated with the groups of individuals. Even though the methods to analyze such data are now well developed and close to reaching a standard organization (through the effort of international projects such as the Microarray Gene Expression Data (MGED) Society [1]), it is not infrequent to encounter a clinician's question for which no compelling statistical method is available. The contribution of this dissertation to deciphering disease is the development of new approaches aimed at handling open problems posed by clinicians in specific experimental designs. In Chapter 1, starting from a necessary biological introduction, we review microarray technologies and all the important steps of an experiment, from the production of the array to quality controls, ending with the preprocessing steps used in the data analysis in the rest of the dissertation. In Chapter 2 a critical review of standard analysis methods is provided, stressing the main problems that remain open. In Chapter 3 a method is introduced to address the issue of unbalanced designs in microarray experiments. In microarray experiments, experimental design is a crucial starting point for obtaining reasonable results. In a two-class problem, an equal or similar number of samples should be collected for the two classes. However, in some cases, e.g. rare pathologies, the approach to be taken is less evident. We propose to address this issue by applying a modified version of SAM [2]. MultiSAM consists in a reiterated application of a SAM analysis, comparing the less populated class (LPC) with 1,000 random samplings of the same size from the more populated class (MPC). A list of the differentially expressed genes is generated for each SAM application. After 1,000 reiterations, each single probe is given a "score" ranging from 0 to 1,000 based on its recurrence in the 1,000 lists as differentially expressed. The performance of MultiSAM was compared to the performance of SAM and LIMMA [3] over two simulated data sets generated via beta and exponential distributions. The results of all three algorithms over low-noise data sets seem acceptable. However, on a real unbalanced two-channel data set regarding Chronic Lymphocytic Leukemia, LIMMA finds no significant probe, SAM finds 23 significantly changed probes but cannot separate the two classes, while MultiSAM finds 122 probes with score >300 and separates the data into two clusters by hierarchical clustering.
We also report extra-assay validation in terms of differentially expressed genes. Although standard algorithms perform well over low-noise simulated data sets, MultiSAM seems to be the only one able to reveal subtle differences in gene expression profiles on real unbalanced data. In Chapter 4 a method to address similarity evaluation in a three-class problem by means of the Relevance Vector Machine [4] is described. In fact, looking at microarray data in a prognostic and diagnostic clinical framework, not only differences can play a crucial role. In some cases similarities can give useful, and sometimes even more important, information. The goal, given three classes, could be to establish, with a certain level of confidence, whether the third one is similar to the first or to the second. In this work we show that the Relevance Vector Machine (RVM) [2] could be a possible solution to the limitations of standard supervised classification. In fact, RVM offers many advantages compared, for example, with its well-known precursor, the Support Vector Machine (SVM) [3]. Among these advantages, the estimate of the posterior probability of class membership represents a key feature for addressing the similarity issue. This is a highly important, but often overlooked, capability of any practical pattern recognition system. We focused on a three-class tumor-grade problem, with 67 samples of grade 1 (G1), 54 samples of grade 3 (G3) and 100 samples of grade 2 (G2). The goal is to find a model able to separate G1 from G3, and then evaluate the third class, G2, as a test set to obtain the probability for samples of G2 of being members of class G1 or class G3. The analysis showed that breast cancer samples of grade 2 have a molecular profile more similar to that of breast cancer samples of grade 1. This result had been conjectured in the literature, but no measure of significance had been given before.
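The MultiSAM resampling scheme described above can be sketched as follows, with an ordinary two-sample t-test standing in for SAM; the significance threshold and names are illustrative assumptions:

```python
# Hedged sketch of the reiterated, balanced-subsampling scheme: each probe is
# scored by how often it is called differentially expressed across repeated
# same-size subsamples of the more populated class. A plain t-test replaces SAM.
import numpy as np
from scipy.stats import ttest_ind

def multisam_scores(lpc, mpc, n_iter=1000, alpha=0.01, seed=0):
    """lpc, mpc: arrays of shape (samples, probes); returns a score per probe."""
    rng = np.random.default_rng(seed)
    n_lpc, n_probes = lpc.shape
    scores = np.zeros(n_probes, dtype=int)
    for _ in range(n_iter):
        idx = rng.choice(mpc.shape[0], size=n_lpc, replace=False)  # balanced subsample
        _, pvals = ttest_ind(lpc, mpc[idx], axis=0)
        scores += (pvals < alpha)          # recurrence in the "significant" list
    return scores                          # ranges from 0 to n_iter, as in the text
```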
Abstract:
Machine learning comprises a series of techniques for the automatic extraction of meaningful information from large collections of noisy data. In many real-world applications, data is naturally represented in structured form. Since traditional methods in machine learning deal with vectorial information, they require an a priori form of preprocessing. Among all the learning techniques for dealing with structured data, kernel methods are recognized to have a strong theoretical background and to be effective approaches. They do not require an explicit vectorial representation of the data in terms of features, but rely on a measure of similarity between any pair of objects of a domain, the kernel function. Designing fast and good kernel functions is a challenging problem. In the case of tree-structured data two issues become relevant: kernels for trees should not be sparse and should be fast to compute. The sparsity problem arises when, given a dataset and a kernel function, most structures of the dataset are completely dissimilar to one another. In those cases the classifier has too little information to make correct predictions on unseen data. In fact, it tends to produce a discriminating function behaving like the nearest neighbour rule. Sparsity is likely to arise for some standard tree kernel functions, such as the subtree and subset tree kernels, when they are applied to datasets with node labels belonging to a large domain. A second drawback of using tree kernels is the time complexity required in both the learning and classification phases. Such complexity can sometimes prevent the application of the kernel in scenarios involving large amounts of data. This thesis proposes three contributions for resolving the above issues of kernels for trees. A first contribution aims at creating kernel functions which adapt to the statistical properties of the dataset, thus reducing its sparsity with respect to traditional tree kernel functions. Specifically, we propose to encode the input trees by an algorithm able to project the data onto a lower dimensional space with the property that similar structures are mapped similarly. By building kernel functions on the lower dimensional representation, we are able to perform inexact matchings between different inputs in the original space. A second contribution is the proposal of a novel kernel function based on the convolution kernel framework. A convolution kernel measures the similarity of two objects in terms of the similarities of their subparts. Most convolution kernels are based on counting the number of shared substructures, partially discarding information about their position in the original structure. The kernel function we propose is, instead, especially focused on this aspect. A third contribution is devoted to reducing the computational burden related to the calculation of a kernel function between a tree and a forest of trees, which is a typical operation in the classification phase and, for some algorithms, also in the learning phase. We propose a general methodology applicable to convolution kernels. Moreover, we show an instantiation of our technique when kernels such as the subtree and subset tree kernels are employed. In those cases, Directed Acyclic Graphs can be used to compactly represent shared substructures in different trees, thus reducing the computational burden and storage requirements.
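To make the sparsity and complexity discussion concrete, here is a hedged sketch of a classic subtree (ST) kernel that counts shared complete subtrees; it is not one of the kernels proposed in the thesis:

```python
# Sketch of a subtree (ST) kernel: K(T1, T2) counts pairs of nodes whose
# complete subtrees are identical, via canonical subtree strings.
from collections import Counter

def subtree_signatures(tree):
    """tree = (label, [children]); returns (canonical string, Counter of all subtree strings)."""
    label, children = tree
    child_results = [subtree_signatures(c) for c in children]
    sig = "(" + label + "".join(s for s, _ in child_results) + ")"
    counts = Counter([sig])
    for _, c in child_results:
        counts.update(c)
    return sig, counts

def subtree_kernel(t1, t2):
    _, c1 = subtree_signatures(t1)
    _, c2 = subtree_signatures(t2)
    return sum(n * c2[s] for s, n in c1.items())   # shared complete subtrees

# Example: two small labelled (parse-like) trees
a = ("S", [("NP", [("D", []), ("N", [])]), ("VP", [("V", [])])])
b = ("S", [("NP", [("D", []), ("N", [])]), ("VP", [("V", []), ("NP", [])])])
print(subtree_kernel(a, b))   # counts shared fragments such as the NP(D,N) subtree
```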
Abstract:
The Gaia space mission is a major project for the European astronomical community. As challenging as it is, the processing and analysis of the huge data flow incoming from Gaia is the subject of thorough study and preparatory work by the DPAC (Data Processing and Analysis Consortium), in charge of all aspects of the Gaia data reduction. This PhD Thesis was carried out in the framework of the DPAC, within the team based in Bologna. The task of the Bologna team is to define the calibration model and to build a grid of spectro-photometric standard stars (SPSS) suitable for the absolute flux calibration of the Gaia G-band photometry and the BP/RP spectrophotometry. Such a flux calibration can be performed by repeatedly observing each SPSS during the lifetime of the Gaia mission and by comparing the observed Gaia spectra to the spectra obtained by our ground-based observations. Due to both the different observing sites involved and the huge number of frames expected (≃100,000), it is essential to maintain the maximum homogeneity in data quality, acquisition and treatment, and particular care has to be taken to test the capabilities of each telescope/instrument combination (through the “instrument familiarization plan”) and to devise methods to keep under control, and eventually correct for, the typical instrumental effects that can affect the high precision required for the Gaia SPSS grid (a few % with respect to Vega). I contributed to the ground-based survey of Gaia SPSS in many respects: the observations, the instrument familiarization plan, the data reduction and analysis activities (both photometry and spectroscopy), and the maintenance of the data archives. However, the field I was personally responsible for was photometry, and in particular relative photometry for the production of short-term light curves. In this context I defined and tested a semi-automated pipeline which allows for the pre-reduction of imaging SPSS data and the production of aperture photometry catalogues ready to be used for further analysis. A series of semi-automated quality control criteria are included in the pipeline at various levels, from pre-reduction, to aperture photometry, to light curve production and analysis.
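As a rough, self-contained illustration of the relative-photometry step behind such light curves, the following Python sketch performs simple aperture photometry with a sky annulus and forms a differential magnitude against a comparison star; radii, names and the overall procedure are assumptions, far simpler than the actual SPSS pipeline:

```python
# Hedged sketch: background-subtracted aperture counts and a differential
# (target minus comparison) magnitude, the core quantity of a relative light curve.
import numpy as np

def aperture_flux(img, x0, y0, r_ap=5.0, r_in=8.0, r_out=12.0):
    yy, xx = np.indices(img.shape)
    r = np.hypot(xx - x0, yy - y0)
    sky = np.median(img[(r >= r_in) & (r < r_out)])   # local background per pixel
    ap = r < r_ap
    return img[ap].sum() - sky * ap.sum()             # background-subtracted counts

def differential_mag(img, target_xy, comp_xy):
    f_t = aperture_flux(img, *target_xy)
    f_c = aperture_flux(img, *comp_xy)
    return -2.5 * np.log10(f_t / f_c)                  # target minus comparison star

# Example on a synthetic frame with two point-like "stars"
img = np.zeros((64, 64)); img[20, 20] = 5000.0; img[40, 45] = 8000.0
print(differential_mag(img, target_xy=(20, 20), comp_xy=(45, 40)))
```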
Abstract:
This PhD thesis concerns geochemical constraints on recycling and partial melting of Archean continental crust. A natural example of such processes was found in the Iisalmi area of Central Finland. The rocks from this area are Middle to Late Archean in age and experienced metamorphism and partial melting between 2.7-2.63 Ga. The work is based on extensive field work. It is furthermore founded on bulk rock geochemical data as well as in-situ analyses of minerals. All geochemical data were obtained at the Institute of Geosciences, University of Mainz, using X-ray fluorescence, solution ICP-MS and laser ablation ICP-MS for bulk rock geochemical analyses. Mineral analyses were accomplished by electron microprobe and laser ablation ICP-MS. Fluid inclusions were studied by microscope on a heating-freezing stage at the Geoscience Center, University of Göttingen. Part I focuses on the development of a new analytical method for bulk rock trace element determination by laser ablation ICP-MS using homogeneous glasses fused from rock powder on an iridium strip heater. This method is applicable to mafic rock samples whose melts have low viscosities and homogenize quickly at temperatures of ~1200°C. Highly viscous melts of felsic samples prevent melting and homogenization at comparable temperatures. Fusion of felsic samples can be enabled by the addition of MgO to the rock powder and adjustment of the melting temperature and melting duration to the rock composition. Advantages of the fusion method are low detection limits compared to XRF analyses, avoidance of wet-chemical processing and of the strong acids used in solution ICP-MS, as well as smaller sample volumes compared to the other methods. Part II of the thesis uses bulk rock geochemical data and results from fluid inclusion studies to discriminate the melting processes observed in different rock types. Fluid inclusion studies demonstrate a major change in fluid composition from CO2-dominated fluids in granulites to aqueous fluids in TTG gneisses and amphibolites. Partial melts were generated in the dry, CO2-rich environment by dehydration melting reactions of amphibole which, in addition to tonalitic melts, produced the anhydrous mineral assemblages of granulites (grt + cpx + pl ± amph or opx + cpx + pl + amph). Trace element modeling showed that mafic granulites are residues of 10-30 % melt extraction from amphibolitic precursor rocks. The maximum degree of melting in intermediate granulites was ~10 %, as inferred from modal abundances of amphibole, clinopyroxene and orthopyroxene. Carbonic inclusions are absent in upper-amphibolite-facies migmatites, whereas aqueous inclusions with up to 20 wt% NaCl are abundant. This suggests that melting within TTG gneisses and amphibolites took place in the presence of an aqueous fluid phase that enabled melting at the wet solidus at temperatures of 700-750°C. The strong disruption of pre-metamorphic structures in some outcrops suggests that the maximum amount of melt in TTG gneisses was ~25 vol%. The presence of leucosomes in all rock types is taken as the principal evidence for melt formation. However, the mineralogical appearance as well as the major and trace element compositions of many leucosomes imply that leucosomes seldom represent frozen in-situ melts. They are better considered as remnants of the melt channel network, i.e. pathways along which melts escaped from the system.
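For reference, residue modeling of this kind is commonly based on the batch melting equations; assuming that form (the thesis may use a different melting model), the melt and residue concentrations relative to the source are

\[
\frac{C_{\mathrm{melt}}}{C_0} = \frac{1}{D + F\,(1 - D)},
\qquad
\frac{C_{\mathrm{residue}}}{C_0} = \frac{D}{D + F\,(1 - D)},
\]

where \(F\) is the melt fraction and \(D\) the bulk solid/melt partition coefficient. For example, for an incompatible element with \(D = 0.1\), extracting \(F = 0.2\) melt leaves the residue at \(0.1/0.28 \approx 0.36\) of its initial concentration, which illustrates how residue compositions constrain the degree of melt extraction.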
Part III of the thesis describes how analyses of minerals from a specific rock type (granulite) can be used to determine partition coefficients between different minerals and between minerals and melt suitable for lower crustal conditions. The trace element analyses by laser ablation-ICP-MS show coherent distribution among the principal mineral phases independent of rock composition. REE contents in amphibole are about 3 times higher than REE contents in clinopyroxene from the same sample. This consistency has to be taken into consideration in models of lower crustal melting where amphibole is replaced by clinopyroxene in the course of melting. A lack of equilibrium is observed between matrix clinopyroxene / amphibole and garnet porphyroblasts which suggests a late stage growth of garnet and slow diffusion and equilibration of the REE during metamorphism. The data provide a first set of distribution coefficients of the transition metals (Sc, V, Cr, Ni) in the lower crust. In addition, analyses of ilmenite and apatite demonstrate the strong influence of accessory phases on trace element distribution. Apatite contains high amounts of REE and Sr while ilmenite incorporates about 20-30 times higher amounts of Nb and Ta than amphibole. Furthermore, trace element mineral analyses provide evidence for magmatic processes such as melt depletion, melt segregation, accumulation and fractionation as well as metasomatism having operated in this high-grade anatectic area.
Abstract:
The aim of this work is to contribute to the development of new multifunctional nanocarriers for improved encapsulation and delivery of anticancer and antiviral drugs. The work focused on water-soluble and biocompatible oligosaccharides, the cyclodextrins (CyDs), and a new family of nanostructured, biodegradable carrier materials made of porous metal-organic frameworks (nanoMOFs). The drugs of choice were the anticancer doxorubicin (DOX), azidothymidine (AZT) and its phosphate derivatives, and artemisinin (ART). DOX possesses a pharmacological drawback due to its self-aggregation tendency in water. The non-covalent binding of DOX to a series of CyD derivatives, such as γ-CyD, an epichlorohydrin-crosslinked β-CyD polymer (pβ-CyD) and a citric acid-crosslinked γ-CyD polymer (pγ-CyD), was studied by UV-visible absorption, circular dichroism and fluorescence. Multivariate global analysis of multiwavelength data from spectroscopic titrations allowed identification and characterization of the stable complexes. pγ-CyD proved to be the best carrier, showing both high association constants and the ability to monomerize DOX. AZT is an important antiretroviral drug. The active form is AZT-triphosphate (AZT-TP), formed in metabolic paths of low efficiency. Direct administration of AZT-TP is limited by its poor stability in biological media, so the development of suitable carriers is highly important. In this context we studied the binding of some phosphorylated derivatives to nanoMOFs by spectroscopic methods. The results obtained with iron(III)-trimesate nanoMOFs allowed us to prove that the binding of these drugs mainly occurs by strong iono-covalent bonds to iron(III) centers. On the basis of these and other results obtained in partner laboratories, it was possible to propose this highly versatile and “green” carrier system for the delivery of phosphorylated nucleoside analogues. The interaction of DOX with nanoMOFs was also studied. Finally, the binding of the antimalarial drug artemisinin (ART) with two cyclodextrin-based carriers, pβ-CyD and a light-responsive bis(β-CyD) host, was also studied.
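As a hedged illustration of how association constants of this kind are extracted from titration data, the sketch below fits a simple 1:1 host-guest binding isotherm at a single wavelength to synthetic data; the actual work uses multivariate global analysis of multiwavelength data, and the concentrations, names and values here are made up:

```python
# Hedged sketch: exact 1:1 binding isotherm (quadratic solution for [HG]) fitted
# to a single-wavelength titration signal. Illustrative assumption only.
import numpy as np
from scipy.optimize import curve_fit

def isotherm_1to1(H0, K, dA_max, G0=2.0e-5):
    """Signal change vs. total host concentration H0 for a fixed guest G0 (mol/L)."""
    b = H0 + G0 + 1.0 / K
    HG = (b - np.sqrt(b * b - 4.0 * H0 * G0)) / 2.0   # exact 1:1 complex concentration
    return dA_max * HG / G0                           # signal proportional to bound fraction

H0 = np.linspace(0, 5e-4, 12)                         # hypothetical host titration points
dA = isotherm_1to1(H0, K=3.0e4, dA_max=0.25) \
     + np.random.default_rng(1).normal(0, 0.003, H0.size)
popt, _ = curve_fit(isotherm_1to1, H0, dA, p0=[1e4, 0.2], bounds=(0, np.inf))
print(f"K ≈ {popt[0]:.2e} M^-1")
```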
Abstract:
The research aims at developing a framework for semantic-based digital survey of architectural heritage. Rooted in knowledge-based modeling, which extracts mathematical constraints of geometry from architectural treatises, as-built information about the architecture obtained from image-based modeling is integrated with the ideal model in a BIM platform. The knowledge-based modeling transforms the geometry and parametric relations of architectural components from 2D drawings into 3D digital models, and creates a large number of variations based on shape grammars in real time thanks to parametric modeling. It also provides prior knowledge for semantically segmenting unorganized survey data. The emergence of SfM (Structure from Motion) provides access to the reconstruction of large, complex architectural scenes with high flexibility, low cost and full automation, but low reliability of metric accuracy. We address this problem by combining photogrammetric approaches, which consist of camera configuration, image enhancement, bundle adjustment, etc. Experiments show that the accuracy of image-based modeling following our workflow is comparable to that of range-based modeling. We also demonstrate positive results of our optimized approach in the digital reconstruction of a portico, where a low-texture vault and dramatic transitions of illumination bring huge difficulties to the workflow without optimization. Once the as-built model is obtained, it is integrated with the ideal model in a BIM platform, which allows multiple forms of data enrichment. In spite of its promising prospects in the AEC industry, BIM has been developed with limited consideration of reverse engineering from survey data. Besides representing the architectural heritage in parallel ways (ideal model and as-built model) and comparing their differences, we address how to create the as-built model in BIM software, which is still an open problem. The research is intended to be fundamental for research on architectural history, documentation and conservation of architectural heritage, and renovation of existing buildings.