889 results for XML, Schema matching
Abstract:
In this paper, we classify, review, and experimentally compare the major methods used to define, adopt, and apply element similarity measures in the context of XML schema matching. We aim to present a unified view that is useful when developing a new element similarity measure, implementing an XML schema matching component, using an XML schema matching system, or comparing XML schema matching systems.
Abstract:
Authors from Burrough (1992) to Heuvelink et al. (2007) have highlighted the importance of GIS frameworks which can handle incomplete knowledge in data inputs, in decision rules and in the geometries and attributes modelled. It is particularly important for this uncertainty to be characterised and quantified when GI data is used for spatial decision making. Despite a substantial and valuable literature on means of representing and encoding uncertainty and its propagation in GI (e.g., Hunter and Goodchild 1993; Duckham et al. 2001; Couclelis 2003), no framework yet exists to describe and communicate uncertainty in an interoperable way. This limits the usability of the ever-increasing Internet resources of geospatial data, which are based on specifications that provide frameworks for the ‘GeoWeb’ (Botts and Robin 2007; Cox 2006). In this paper we present UncertML, an XML schema which provides a framework for describing uncertainty as it propagates through many applications, including online risk management chains. This uncertainty description ranges from simple summary statistics (e.g., mean and variance) to complex representations such as parametric, multivariate distributions at each point of a regular grid. The philosophy adopted in UncertML is that all data values are inherently uncertain (i.e., they are random variables, rather than values with defined quality metadata).
Abstract:
In the last few years we have observed a proliferation of approaches for clustering XML documents and schemas based on their structure and content. So many approaches exist because different applications require XML data to be clustered in different ways. These applications need data grouped by similar contents, tags, paths, structures and semantics. In this paper, we first outline the application contexts in which clustering is useful, then we survey the approaches proposed so far according to the abstract representation of data (instances or schema), the similarity measure identified, and the clustering algorithm. This presentation leads to a taxonomy in which the current approaches can be classified and compared. We aim to introduce an integrated view that is useful when comparing XML data clustering approaches, when developing a new clustering algorithm, and when implementing an XML clustering component. Finally, the paper describes future trends and research issues that still need to be addressed.
Abstract:
The continuous growth of XML data poses a great concern in the area of XML data management. The need to process large amounts of XML data complicates many applications, such as information retrieval and data integration. One way of simplifying this problem is to break the massive amount of data into smaller groups by applying clustering techniques. However, XML clustering is an intricate task that may involve processing both the structure and the content of XML data in order to identify similar XML data. This research presents four clustering methods: two that utilize the structure of XML documents and two that utilize both the structure and the content. The two structural clustering methods have different data models. One is based on a path model and the other is based on a tree model. These methods employ rigid similarity measures which aim to identify corresponding elements between documents with different or similar underlying structure. The two clustering methods that utilize both structural and content information differ in how the structure and content similarities are combined. One clustering method calculates the document similarity as a linear weighted combination of the structure and content similarities. The content similarity in this clustering method is based on a semantic kernel. The other method calculates the distance between documents by a non-linear combination of the structure and content of XML documents using a semantic kernel. Empirical analysis shows that the structure-only clustering method based on the tree model is more scalable than the structure-only clustering method based on the path model, as the tree similarity measure for the tree model does not need to visit the parents of an element many times. Experimental results also show that the clustering methods perform better with the inclusion of the content information on most test document collections.
To further the research, the structural clustering method based on tree model is extended and employed in XML transformation. The results from the experiments show that the proposed transformation process is faster than the traditional transformation system that translates and converts the source XML documents sequentially. Also, the schema matching process of XML transformation produces a better matching result in a shorter time.
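The linear weighted combination of structure and content similarity mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the method of the thesis: the structure score here is an assumed Jaccard overlap of root-to-node tag paths (standing in for the path-model measure), the content score is a plain cosine over term counts (standing in for the semantic kernel), and the weight `alpha` and all function names are hypothetical.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def tag_paths(root):
    """Collect the set of root-to-node tag paths, e.g. '/book/title'."""
    found = set()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        found.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return found


def structure_sim(a, b):
    """Assumed structure measure: Jaccard overlap of the two path sets."""
    pa, pb = tag_paths(a), tag_paths(b)
    return len(pa & pb) / len(pa | pb)


def content_sim(a, b):
    """Assumed content measure: cosine similarity of lowercase term counts."""
    ta = Counter(" ".join(a.itertext()).lower().split())
    tb = Counter(" ".join(b.itertext()).lower().split())
    dot = sum(ta[t] * tb[t] for t in ta)
    na = math.sqrt(sum(v * v for v in ta.values()))
    nb = math.sqrt(sum(v * v for v in tb.values()))
    return dot / (na * nb) if na and nb else 0.0


def combined_sim(a, b, alpha=0.5):
    """Linear combination: alpha weights structure, (1 - alpha) weights content."""
    return alpha * structure_sim(a, b) + (1 - alpha) * content_sim(a, b)


doc1 = ET.fromstring("<book><title>XML clustering</title><author>Ann</author></book>")
doc2 = ET.fromstring("<book><title>XML matching</title><year>2010</year></book>")
score = combined_sim(doc1, doc2, alpha=0.5)
```

With `alpha=1.0` the score reduces to the structure-only case, and with `alpha=0.0` to content only, which makes the effect of adding content information easy to inspect on a small collection.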
Abstract:
When hosting XML information on relational backends, a mapping has to be established between the schemas of the information source and the target storage repositories. A rich body of recent literature exists for mapping isolated components of XML Schema to their relational counterparts, especially with regard to table configurations. In this paper, we present the Elixir system for designing industrial-strength mappings for real-world applications. Specifically, it produces an information-preserving holistic mapping that transforms the complete XML world-view (XML schema with constraints, XML documents, XQuery queries including triggers and views) into a full-scale relational mapping (table definitions, integrity constraints, indices, triggers and views) that is tuned to the application workload. A key design feature of Elixir is that it performs all its mapping-related optimizations in the XML source space, rather than in the relational target space. Further, unlike the XML mapping tools of commercial database systems, which rely heavily on user inputs, Elixir takes a principled cost-based approach to automatically find an efficient relational mapping. A prototype of Elixir is operational and we quantitatively demonstrate its functionality and efficacy on a variety of real-life XML schemas.
Abstract:
Matching query interfaces is a crucial step in data integration across multiple Web databases. The problem is closely related to schema matching, which typically exploits different features of schemas. Relying on a single feature of schemas is not sufficient. We propose an evidential approach to combining multiple matchers using the Dempster-Shafer theory of evidence. First, our approach views the match results of an individual matcher as a source of evidence that provides a level of confidence in the validity of each candidate attribute correspondence. Second, it combines multiple sources of evidence to get a combined mass function that represents the overall level of confidence, taking into account the match results of different matchers. Our combination mechanism does not require weighting parameters, so none need to be set or tuned. Third, it selects the top k attribute correspondences of each source attribute from the target schema based on the combined mass function. Finally, it uses some heuristics to resolve any conflicts between the attribute correspondences of different source attributes. Our experimental results show that our approach is highly accurate and effective.
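The combination and top-k steps described in this abstract can be sketched in Python with the standard Dempster rule of combination. This is an illustration under stated assumptions, not the authors' implementation: the two matchers, their mass values, the target attribute names, and the helper names `combine` and `top_k` are all hypothetical, and the conflict-resolution heuristics are not shown.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps frozenset focal elements to masses that sum to 1.
    Products falling on empty intersections count as conflict and are
    normalised away.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


def top_k(mass, k):
    """Rank single-candidate focal elements by combined mass, keep the best k."""
    singles = [(next(iter(s)), w) for s, w in mass.items() if len(s) == 1]
    singles.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in singles[:k]]


# Hypothetical frame: candidate target attributes for one source attribute.
frame = frozenset({"title", "name", "author"})

# Hypothetical matcher outputs; mass on the whole frame expresses ignorance.
matcher1 = {frozenset({"title"}): 0.6, frozenset({"name"}): 0.1, frame: 0.3}
matcher2 = {frozenset({"title"}): 0.5, frozenset({"author"}): 0.2, frame: 0.3}

combined = combine(matcher1, matcher2)
best = top_k(combined, k=2)
```

No weighting parameters appear in `combine`: each matcher's own confidence is carried by its mass assignment, including the mass it leaves on the whole frame.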
Abstract:
In this thesis, we present the problems of exchanging business documents and propose a method to remedy them. We propose a methodology for adapting XML-based business standards to Semantic Web technologies by transforming documents defined in DTD or XML Schema into an ontological representation in OWL 2. Next, we propose an approach based on formal concept analysis to group ontology classes that share some semantics, with the aim of improving the quality, readability, and representation of the ontology. Finally, we propose ontology alignment to determine the semantic links between the heterogeneous business ontologies generated by the transformation process, in order to help companies communicate successfully.
Abstract:
Today, XML (Extensible Markup Language) is one of the most widely used formats for exchanging and storing structured information on the World Wide Web. Applications that use XML files commonly assume a particular structure in them, and errors can occur if documents that do not conform to it are used. To express such constraints and verify that a document satisfies them, the DTD was defined within the XML standard itself, although it soon proved rather limited in expressive power. For this reason, XML Schema was created, an XML language for defining the structure that other XML documents must have. Having a schema has multiple advantages, the main one being the ability to validate documents against it to check whether their structure is correct, with others including automatic code generation. However, defining a structure common to several XML documents in an optimal way can become an arduous task if done manually. This problem can be overcome with a tool that automates the creation of such XSDs. In this project, we will develop a tool in Java that, from a set of input XML documents, automatically infers a schema against which all of them validate, expressing their structure completely and concisely. The tool will allow several inference parameters to be chosen, so that the generated schema fits the user's purposes as closely as possible. It will also generate a set of additional statistics that provide more information about the input files.
Abstract:
Schema heterogeneity issues often represent an obstacle to discovering coreference links between individuals in semantic data repositories. In this paper we present an approach that performs ontology schema matching in order to improve instance coreference resolution performance. A novel feature of the approach is its use of existing instance-level coreference links defined in third-party repositories as background knowledge for schema matching techniques. In our tests of this approach we obtained encouraging results, in particular a substantial increase in recall in comparison with existing sets of coreference links.
Abstract:
Information technology is of great importance in the global strategy for preserving the genetic resources of farm animals. In this regard, platform-independent information tools and approaches for data exchange are needed in order to obtain aggregate values for the regions and countries over which a given breed is spread. This paper presents an XML-based solution for data exchange in the management of genetic resources of small farm animal populations. The exchanged documents must meet specific requirements that stem from the goals of the data analysis. Three main types of documents are distinguished and their XML formats are discussed. A DTD and an XML Schema are suggested for each type. Some examples of XML documents are also given.