19 results for Metadata schema
Abstract:
When hosting XML information on relational backends, a mapping has to be established between the schemas of the information source and the target storage repositories. A rich body of recent literature exists for mapping isolated components of XML Schema to their relational counterparts, especially with regard to table configurations. In this paper, we present the Elixir system for designing industrial-strength mappings for real-world applications. Specifically, it produces an information-preserving holistic mapping that transforms the complete XML world-view (XML schema with constraints, XML documents, XQuery queries including triggers and views) into a full-scale relational mapping (table definitions, integrity constraints, indices, triggers and views) that is tuned to the application workload. A key design feature of Elixir is that it performs all its mapping-related optimizations in the XML source space, rather than in the relational target space. Further, unlike the XML mapping tools of commercial database systems, which rely heavily on user inputs, Elixir takes a principled cost-based approach to automatically find an efficient relational mapping. A prototype of Elixir is operational and we quantitatively demonstrate its functionality and efficacy on a variety of real-life XML schemas.
Abstract:
The problem of scaling up data integration, such that new sources can be quickly utilized as they are discovered, remains elusive: global schemas for integrated data are difficult to develop and expand, and schema and record matching techniques are limited by the fact that data and metadata are often under-specified and must be disambiguated by data experts. One promising approach is to avoid using a global schema, and instead to develop keyword search-based data integration, where the system lazily discovers associations enabling it to join together matches to keywords, and return ranked results. The user is expected to understand the data domain and provide feedback about answers' quality. The system generalizes such feedback to learn how to correctly integrate data. A major open challenge is that under this model, the user only sees and offers feedback on a few "top-k" results: this result set must be carefully selected to include answers of high relevance and answers that are highly informative when feedback is given on them. Existing systems merely focus on predicting relevance, by composing the scores of various schema and record matching algorithms. In this paper, we show how to predict the uncertainty associated with a query result's score, as well as how informative feedback is on a given result. We build upon these foundations to develop an active learning approach to keyword search-based data integration, and we validate the effectiveness of our solution over real data from several very different domains.
Abstract:
Background: Tuberculosis remains one of the deadliest infectious diseases, warranting the identification of newer targets and drugs. Identification and validation of appropriate targets for designing drugs are critical steps in drug discovery, which are at present major bottlenecks. A majority of drugs in current clinical use for many diseases have been designed without the knowledge of the targets, perhaps because standard methodologies to identify such targets in a high-throughput fashion do not really exist. With different kinds of 'omics' data that are now available, computational approaches can be powerful means of obtaining short-lists of possible targets for further experimental validation. Results: We report a comprehensive in silico target identification pipeline, targetTB, for Mycobacterium tuberculosis. The pipeline incorporates a network analysis of the protein-protein interactome, a flux balance analysis of the reactome, experimentally derived phenotype essentiality data, sequence analyses and a structural assessment of targetability, using novel algorithms recently developed by us. Using flux balance analysis and network analysis, proteins critical for survival of M. tuberculosis are first identified, followed by comparative genomics with the host, finally incorporating a novel structural analysis of the binding sites to assess the feasibility of a protein as a target. Further analyses include correlation with expression data and non-similarity to gut flora proteins as well as 'anti-targets' in the host, leading to the identification of 451 high-confidence targets. Through phylogenetic profiling against 228 pathogen genomes, shortlisted targets have been further explored to identify broad-spectrum antibiotic targets, while also identifying those specific to tuberculosis. Targets that address mycobacterial persistence and drug resistance mechanisms are also analysed.
Conclusion: The pipeline developed provides a rational schema for identifying drug targets that are likely to have high rates of success, which is expected to save enormous amounts of money, resources and time in the drug discovery process. A thorough comparison with previously suggested targets in the literature demonstrates the usefulness of the integrated approach used in our study, highlighting the importance of systems-level analyses in particular. The method has the potential to be used as a general strategy for target identification and validation and hence significantly impact most drug discovery programmes.
Abstract:
In this paper we study two problems in feedback stabilization. The first is the simultaneous stabilization problem, which can be stated as follows. Given plants G_{0}, G_{1}, ..., G_{l}, does there exist a single compensator C that stabilizes all of them? The second is that of stabilization by a stable compensator, or more generally, a "least unstable" compensator. Given a plant G, we would like to know whether or not there exists a stable compensator C that stabilizes G; if not, what is the smallest number of right half-plane poles (counted according to their McMillan degree) that any stabilizing compensator must have? We show that the two problems are equivalent in the following sense. The problem of simultaneously stabilizing l + 1 plants can be reduced to the problem of simultaneously stabilizing l plants using a stable compensator, which in turn can be stated as the following purely algebraic problem. Given 2l matrices A_{1}, ..., A_{l}, B_{1}, ..., B_{l}, where A_{i}, B_{i} are right-coprime for all i, does there exist a matrix M such that A_{i} + MB_{i} is unimodular for all i? Conversely, the problem of simultaneously stabilizing l plants using a stable compensator can be formulated as one of simultaneously stabilizing l + 1 plants. The problem of determining whether or not there exists an M such that A + BM is unimodular, given a right-coprime pair (A, B), turns out to be a special case of a question concerning a matrix division algorithm in a proper Euclidean domain. We give an answer to this question, and we believe this result might be of some independent interest. We show that, given two n × m plants G_{0} and G_{1}, we can generically stabilize them simultaneously provided either n or m is greater than one. In contrast, simultaneous stabilizability of two single-input-single-output plants, g_{0} and g_{1}, is not generic.
Abstract:
It has been shown that it is possible to extend the validity of the Townsend breakdown criterion for evaluating the breakdown voltages in the complete pd range in which Paschen curves are available. Evaluation of the breakdown voltages for air (pd = 0.0133 to 1400 kPa · cm), N2 (pd = 0.0313 to 1400 kPa · cm) and SF6 (pd = 0.3000 to 1200 kPa · cm) has been done and in most cases the computed values are accurate to ±3% of the measured values. The computations show that it is also possible to estimate the secondary ionization coefficient γ in the pd ranges mentioned above.
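For reference, the Townsend criterion named in this abstract is usually written in the following textbook form for a uniform field, where α is the primary ionization coefficient, γ the secondary ionization coefficient, η the attachment coefficient and d the gap spacing. The abstract does not say which form the authors evaluated, so this is a general statement of the criterion and not necessarily their exact expression.

```latex
% Non-attaching gases (for example N2):
\gamma \left( e^{\alpha d} - 1 \right) = 1

% Electron-attaching gases (for example SF6), with attachment coefficient \eta:
\gamma \, \frac{\alpha}{\alpha - \eta} \left( e^{(\alpha - \eta) d} - 1 \right) = 1
```

Because α/p and η/p depend on E/p, and E = V/d in a uniform field, solving either condition for V gives the breakdown voltage as a function of pd, which is the Paschen curve.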
Abstract:
It is well known that the notions of normal forms and acyclicity capture many practical desirable properties for database schemes. The basic schema design problem is to develop design methodologies that strive toward these ideals. The usual approach is to first normalize the database scheme as far as possible. If the resulting scheme is cyclic, then one tries to transform it into an acyclic scheme. In this paper, we argue in favor of carrying out these two phases of design concurrently. In order to do this efficiently, we need to be able to incrementally analyze the acyclicity status of a database scheme as it is being designed. To this end, we propose the formalism of "binary decompositions". Using this, we characterize design sequences that exactly generate θ-acyclic schemes, for θ = α, β. We then show how our results can be put to use in database design. Finally, we also show that our formalism above can be effectively used as a proof tool in dependency theory. We demonstrate its power by showing that it leads to a significant simplification of the proofs of some previous results connecting sets of multivalued dependencies and acyclic join dependencies.
Abstract:
GEODERM, a microcomputer-based solid modeller that incorporates the parametric object model, is discussed. The entity-relationship model, which is used to describe the conceptual schema of the geometric database, is also presented. The three of GEODERM's four modules that have been implemented are described in some detail: the Solid Definition Language (SDL), the Solid Manipulation Language (SML) and the User-System Interface.
Abstract:
A divide-and-correct algorithm is described for multiple-precision division in the negative base number system. In this algorithm an initial quotient estimate is obtained from suitable segmented operands; this is then corrected by simple rules to arrive at the true quotient.
Abstract:
An algorithm is described for developing a hierarchy among a set of elements having certain precedence relations. This algorithm, which is based on tracing a path through the graph, is easily implemented by a computer.
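The abstract gives no details of the algorithm, so the following is only a minimal sketch of one common way to derive a hierarchy from precedence relations by tracing paths through the graph. It is not the authors' method. The function name `hierarchy_levels` and the `(before, after)` pair format are assumptions made for illustration. Each element gets level 0 if nothing precedes it, and otherwise one more than the highest level among its predecessors.

```python
def hierarchy_levels(precedes):
    """Assign a hierarchy level to every element named in `precedes`.

    `precedes` is an iterable of (before, after) pairs (hypothetical format).
    Level 0 means no predecessors; otherwise level = 1 + max predecessor level.
    """
    # Build the set of direct predecessors for every element.
    preds = {}
    for before, after in precedes:
        preds.setdefault(before, set())
        preds.setdefault(after, set()).add(before)

    levels = {}      # memoized results
    on_path = set()  # elements on the path currently being traced

    def level(node):
        if node in levels:
            return levels[node]
        if node in on_path:
            # Coming back to a node on the current path means a cycle.
            raise ValueError(f"cycle through {node!r}")
        on_path.add(node)
        # Trace back through all predecessors; default=-1 yields level 0.
        result = 1 + max((level(p) for p in preds[node]), default=-1)
        on_path.discard(node)
        levels[node] = result
        return result

    for node in preds:
        level(node)
    return levels


relations = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
print(hierarchy_levels(relations))
```

For the sample relations, `a` lands on level 0, `b` and `c` on level 1, `d` on level 2 and `e` on level 3. Contradictory relations such as `("x", "y")` together with `("y", "x")` raise a `ValueError`. For very long precedence chains, an iterative traversal would avoid Python's recursion limit.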
Abstract:
It is shown that a method based on the principle of analytic continuation can be used to solve a set of inhomogeneous infinite simultaneous equations encountered in the analysis of surface acoustic wave propagation along the periodically perturbed surface of a piezoelectric medium.
Abstract:
Data mining involves a nontrivial process of extracting knowledge or patterns from large databases. Genetic Algorithms are efficient and robust searching and optimization methods that are used in data mining. In this paper we propose a Self-Adaptive Migration Model GA (SAMGA), where parameters of population size, the number of points of crossover and mutation rate for each population are adaptively fixed. Further, the migration of individuals between populations is decided dynamically. This paper gives a mathematical schema analysis of the method stating and showing that the algorithm exploits previously discovered knowledge for a more focused and concentrated search of heuristically high yielding regions while simultaneously performing a highly explorative search on the other regions of the search space. The effective performance of the algorithm is then shown using standard testbed functions and a set of actual data mining classification problems. A Michigan-style classifier was used to build the classifier, and the system was tested with machine learning databases, including the Pima Indian Diabetes database, the Wisconsin Breast Cancer database and a few others. Our algorithm performs better than the others.
Abstract:
In this paper, we propose a Self-Adaptive Migration Model for Genetic Algorithms, where parameters of population size, the number of points of crossover and mutation rate for each population are fixed adaptively. Further, the migration of individuals between populations is decided dynamically. This paper gives a mathematical schema analysis of the method stating and showing that the algorithm exploits previously discovered knowledge for a more focused and concentrated search of heuristically high yielding regions while simultaneously performing a highly explorative search on the other regions of the search space. The effective performance of the algorithm is then shown using standard testbed functions, when compared with the Island model GA (IGA) and the Simple GA (SGA).
Abstract:
Many of the research institutions and universities across the world are facilitating open access (OA) to their intellectual outputs through their respective OA institutional repositories (IRs) or through centralized subject-based repositories. The Registry of Open Access Repositories (ROAR) lists more than 2850 such repositories across the world. Awareness of the benefits of OA to scholarly literature and of OA publishing is picking up in India, too. As per the ROAR statistics, to date, there are more than 90 OA repositories in the country. India is doing particularly well in publishing open-access journals (OAJs). As per the Directory of Open Access Journals (DOAJ), to date, India, with 390 OAJs, is ranked 5th in the world in terms of the number of OAJs being published. Much of the research done in India is reported in journals published from India. These journals have limited readership and many of them are not indexed by Web of Science, Scopus or other leading international abstracting and indexing databases. Consequently, research done in the country gets hidden not only from fellow countrymen, but also from the international community. This situation can be easily overcome if all researchers facilitate OA to their publications. One of the easiest ways to facilitate OA to scientific literature is through institutional repositories. If every research institution and university in India sets up an open-access IR and ensures that copies of the final accepted versions of all research publications are uploaded to the IR, then the research done in India will get far better visibility. The federation of metadata from all the distributed, interoperable OA repositories in the country will serve as a window to the research done across the country. Federation of metadata from the distributed OAI-compliant repositories can be easily achieved by setting up harvesting software like the PKP Harvester.
In this paper, we share our experience in setting up a prototype metadata harvesting service using the PKP harvesting software for the OAI-compliant repositories in India.
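To make the harvesting step concrete, here is a minimal sketch of what an OAI-PMH harvester does with one repository: build a `ListRecords` request for Dublin Core metadata, then pull identifiers and titles out of the XML response. It is an illustration only and not how the PKP Harvester is implemented. The helper names, the base URL `https://repo.example.org/oai` and the sample response are all hypothetical; only the `ListRecords` verb, the `oai_dc` metadata prefix, the `resumptionToken` mechanism and the XML namespaces come from the OAI-PMH specification.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper)."""
    if resumption_token:
        # A resumption token replaces every argument except the verb.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (records, next_token) from one ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        titles = [t.text or "" for t in rec.iterfind(".//dc:title", NS)]
        records.append({"identifier": identifier, "titles": titles})
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, (token.strip() or None)


# Made-up sample response, trimmed to the elements used above.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.org:123</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, next_token = parse_list_records(SAMPLE)
print(list_records_url("https://repo.example.org/oai"))
print(records, next_token)
```

A full harvester would fetch each URL over HTTP, repeat the request with the returned resumption token until none is returned, and store the merged records from every repository in one searchable index.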
Abstract:
Practical usage of machine learning is gaining strategic importance in enterprises looking for business intelligence. However, most enterprise data is distributed in multiple relational databases with expert-designed schemas. Using traditional single-table machine learning techniques over such data not only incurs a computational penalty for converting to a flat form (mega-join), but also loses the human-specified semantic information present in the relations. In this paper, we present a practical, two-phase hierarchical meta-classification algorithm for relational databases with a semantic divide-and-conquer approach. We propose a recursive prediction aggregation technique over heterogeneous classifiers applied on individual database tables. The proposed algorithm was evaluated on three diverse datasets, namely the TPCH, PKDD and UCI benchmarks, and showed a considerable reduction in classification time without any loss of prediction accuracy. (C) 2012 Elsevier Ltd. All rights reserved.