848 results for XML (SEMI-STRUCTURED) DATA
Abstract:
We consider the problem of resource selection in clustered Peer-to-Peer Information Retrieval (P2P IR) networks with cooperative peers. The clustered P2P IR framework presents a significant departure from general P2P IR architectures by employing clustering to ensure content coherence between resources at the resource selection layer, without disturbing document allocation. We propose that such a property could be leveraged in resource selection by adapting well-studied and popular inverted lists for centralized document retrieval. Accordingly, we propose the Inverted PeerCluster Index (IPI), an approach that adapts the inverted lists, in a straightforward manner, for resource selection in clustered P2P IR. IPI also encompasses a strikingly simple peer-specific scoring mechanism that exploits the said index for resource selection. Through an extensive empirical analysis on P2P IR testbeds, we establish that IPI competes well with the sophisticated state-of-the-art methods in virtually every parameter of interest for the resource selection task, in the context of clustered P2P IR.
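The idea of reusing inverted lists for resource selection can be illustrated with a minimal sketch. It assumes each peer cluster is summarised by the terms of its documents, and it ranks clusters with a TF-IDF-style sum. The function names, the toy data, and the scoring rule are all hypothetical and are not taken from the IPI paper.

```python
from collections import defaultdict
import math


def build_index(cluster_docs):
    """Build an inverted list per term: term -> {cluster_id: term frequency}."""
    index = defaultdict(dict)
    for cluster_id, docs in cluster_docs.items():
        for doc in docs:
            for term in doc.lower().split():
                index[term][cluster_id] = index[term].get(cluster_id, 0) + 1
    return index


def rank_clusters(index, query, num_clusters):
    """Rank clusters for a query with a TF-IDF-style sum (assumed scoring rule)."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue  # term unknown to every cluster
        # Terms held by fewer clusters weigh more.
        idf = math.log(1 + num_clusters / len(postings))
        for cluster_id, tf in postings.items():
            scores[cluster_id] += tf * idf
    # Highest score first; ties broken by cluster id for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))


# Hypothetical toy collection: three peer clusters with a few documents each.
clusters = {
    "c1": ["xml schema validation", "xml query languages"],
    "c2": ["peer to peer routing", "distributed hash tables"],
    "c3": ["xml routing in xml peer networks"],
}
index = build_index(clusters)
ranking = rank_clusters(index, "xml routing", len(clusters))
print(ranking)
```

A query is then forwarded only to the top-ranked clusters, which is the resource selection step in this simplified setting.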
Abstract:
Computers employing some degree of data flow organisation are now well established as providing a possible vehicle for concurrent computation. Although data-driven computation frees the architecture from the constraints of the single program counter, processor and global memory, inherent in the classic von Neumann computer, there can still be problems with the unconstrained generation of fresh result tokens if a pure data flow approach is adopted. The advantages of allowing serial processing for those parts of a program which are inherently serial, and of permitting a demand-driven, as well as data-driven, mode of operation are identified and described. The MUSE machine described here is a structured architecture supporting both serial and parallel processing which allows the abstract structure of a program to be mapped onto the machine in a logical way.
Abstract:
The purpose of this study is to determine what effect corporate image has on a company's recruitment process. Based on the corporate image and recruitment literature, a framework was created for the study and used to analyse the relationship between corporate image and the recruitment process. The empirical part of the study was carried out as a case study using qualitative research methods. The case company was Gigantti Oy Ab, and data were collected through semi-structured thematic interviews at a total of three Gigantti stores and at the head office. The results show that corporate image affects a company's attractiveness and that a positive corporate image increases the number of applicants. The way a company conducts its recruitment also affects its corporate image, because it shapes applicants' perceptions of the company.
Abstract:
My doctoral research concerns the modelling of symbolism in the cultural heritage domain and the connection of artworks based on their symbolism through knowledge extraction and representation techniques. In particular, I participated in the design of two ontologies: one models the relationships between a symbol, its symbolic meaning, and the cultural context in which the symbol carries that meaning; the second models artistic interpretations of a cultural heritage object from an iconographic and iconological (thus also symbolic) perspective. I also converted several sources of unstructured data (a dictionary of symbols and an encyclopaedia of symbolism) and semi-structured data (DBpedia and WordNet) to create HyperReal, the first knowledge graph dedicated to conventional cultural symbolism. By making use of HyperReal's content, I showed how linked open data about cultural symbolism could be utilized to initiate a series of quantitative studies that analyse (i) similarities between cultural contexts based on their symbologies, (ii) broad symbolic associations, and (iii) specific case studies of symbolism such as the relationship between symbols, their colours, and their symbolic meanings. Moreover, I developed a system that can infer symbolic, cultural context-dependent interpretations from artworks according to what they depict, envisioning potential use cases for museum curation. I then re-engineered the iconographic and iconological statements of Wikidata, a widely used general-domain knowledge base, creating ICONdata: an iconographic and iconological knowledge graph. ICONdata was then enriched with automatic symbolic interpretations. Subsequently, I demonstrated the significance of enhancing artwork information through alignment with linked open data related to symbolism, resulting in the discovery of novel connections between artworks. Finally, I contributed to the creation of a software application. 
This application leverages established connections, allowing users to investigate the symbolic expression of a concept across different cultural contexts through the generation of a three-dimensional exhibition of artefacts symbolising the chosen concept.
Abstract:
Thesis submitted to Faculdade de Ciências e Tecnologia of the Universidade Nova de Lisboa, in partial fulfilment of the requirements for the degree of Master in Computer Science
Abstract:
The current research project is both a process and impact evaluation of community policing in Switzerland's five major urban areas - Basel, Bern, Geneva, Lausanne, and Zurich. Community policing is both a philosophy and an organizational strategy that promotes a renewed partnership between the police and the community to solve problems of crime and disorder. The process evaluation data on police internal reforms were obtained through semi-structured interviews with key administrators from the five police departments as well as from police internal documents and additional public sources. 
The impact evaluation uses official crime records and census statistics as contextual variables as well as Swiss Crime Survey (SCS) data on fear of crime, perceptions of disorder, and public attitudes towards the police as outcome measures. The SCS is a standing survey instrument that has polled residents of the five urban areas repeatedly since the mid-1980s. The process evaluation produced a "Calendar of Action" to create panel data to measure community policing implementation progress over six evaluative dimensions in intervals of five years between 1990 and 2010. The impact evaluation, carried out ex post facto, uses an observational design that analyzes the impact of the different community policing models between matched comparison areas across the five cities. Using ZIP code districts as proxies for urban neighborhoods, geospatial data mining algorithms serve to develop a neighborhood typology in order to match the comparison areas. To this end, both unsupervised and supervised algorithms are used to analyze high-dimensional data on crime, the socio-economic and demographic structure, and the built environment in order to classify urban neighborhoods into clusters of similar type. In a first step, self-organizing maps serve as tools to develop a clustering algorithm that reduces the within-cluster variance in the contextual variables and simultaneously maximizes the between-cluster variance in survey responses. The random forests algorithm then serves to assess the appropriateness of the resulting neighborhood typology and to select the key contextual variables in order to build a parsimonious model that makes a minimum of classification errors. 
Finally, for the impact analysis, propensity score matching methods are used to match the survey respondents of the pretest and posttest samples on age, gender, and their level of education for each neighborhood type identified within each city, before conducting a statistical test of the observed difference in the outcome measures. Moreover, all significant results were subjected to a sensitivity analysis to assess the robustness of these findings in the face of potential bias due to some unobserved covariates. The study finds that over the last fifteen years, all five police departments have undertaken major reforms of their internal organization and operating strategies and forged strategic partnerships in order to implement community policing. The resulting neighborhood typology reduced the within-cluster variance of the contextual variables and accounted for a significant share of the between-cluster variance in the outcome measures prior to treatment, suggesting that geocomputational methods help to balance the observed covariates and hence to reduce threats to the internal validity of an observational design. Finally, the impact analysis revealed that fear of crime dropped significantly over the 2000-2005 period in the neighborhoods in and around the urban centers of Bern and Zurich. These improvements are fairly robust in the face of bias due to some unobserved covariate and covary temporally and spatially with the implementation of community policing. The alternative hypothesis that the observed reductions in fear of crime were at least in part a result of community policing interventions thus appears at least as plausible as the null hypothesis of absolutely no effect, even if the observational design cannot completely rule out selection and regression to the mean as alternative explanations.
Abstract:
The purpose of this master's thesis is to review the possibilities that XML offers for integrating a heterogeneous service network. The thesis describes the general theory of the XML language and focuses especially on the features that matter for communication between applications. It also reviews how application development environments have become more heterogeneous, how service architectures have evolved as a result, and how these changes affect the use of XML. In this work, the Fuse service platform for natural-language service development was designed and implemented. The thesis describes the architecture of the platform and examines the use of XML in integrating a natural-language interpreter with a service. Other possible uses of XML for improving the Fuse service platform are also assessed.
Abstract:
XML-based data representation is increasingly used to present structured information. The aim is to provide a generally useful and reusable way to share common information across different interfaces. XML techniques are also used to fix shortcomings in previously built applications and to improve their operation. This master's thesis presents a driver redesign for Teleste's LabView-based test application environment. The work improved the earlier driver model by adding functions that use XML techniques. The aim was to reduce the programming work required in test application development by replacing features hard-coded in the applications with XML-based configuration files. The system is built on a general-purpose driver that uses Teleste's own EMS protocol to communicate with the products under test. The driver model uses XML-based configuration files to define the properties of the products under test. XML schema files describe the message types of the communication protocol used by the driver and their structures. The work succeeded in creating a new kind of driver model that uses XML techniques. The model, based on a single common driver, unifies the implementation of test applications and reduces the programming work needed. Use of the driver was made easier by implementing a dedicated editor in the test application development environment, with which functions that use the driver can be created easily.
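The approach of moving hard-coded product properties into XML configuration files can be sketched as follows. The element names, attributes, and device shown here are invented for illustration; the thesis's actual configuration format, its schema files, and the EMS protocol are not described in the abstract and are not reproduced.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration for one product under test.
CONFIG_XML = """
<device name="ExampleAmplifier">
  <parameter id="gain" type="int" min="0" max="30" unit="dB"/>
  <parameter id="temperature" type="float" min="-40" max="85" unit="C"/>
</device>
"""


def load_parameters(xml_text):
    """Parse the device name and its parameter limits from the XML text."""
    root = ET.fromstring(xml_text.strip())
    params = {}
    for node in root.findall("parameter"):
        # Choose the numeric type declared in the configuration.
        cast = float if node.get("type") == "float" else int
        params[node.get("id")] = {
            "min": cast(node.get("min")),
            "max": cast(node.get("max")),
            "unit": node.get("unit"),
        }
    return root.get("name"), params


def in_range(params, param_id, value):
    """Check a measured value against the limits read from the configuration."""
    spec = params[param_id]
    return spec["min"] <= value <= spec["max"]


device_name, parameters = load_parameters(CONFIG_XML)
print(device_name, in_range(parameters, "gain", 12))
```

With this layout, supporting a new product means writing a new configuration file rather than changing the test application code.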
Abstract:
This paper presents preliminary results from an ethnoarchaeological study of animal husbandry in the modern village of Bestansur, situated in the lower Zagros Mountains of Iraqi Kurdistan. This research explores how modern families use and manage their livestock within the local landscape and identifies traces of this use. The aim is to provide the groundwork for future archaeological investigations focusing on the nearby Neolithic site of Bestansur. This is based on the premise that modern behaviours can suggest testable patterns for past practices within the same functional and ecological domains. Semi-structured interviews conducted with villagers from several households provided large amounts of information on modern behaviours that helped direct data collection, and which also illustrate notable shifts in practices and use of the local landscape over time. Strontium isotope analysis of modern plant material demonstrates that a measurable variation exists between the alluvial floodplain and the lower foothills, while analysis of modern dung samples shows clear variation between sheep/goat and cow dung, in terms of numbers of faecal spherulites. These results are specific to the local environment of Bestansur and can be used for evaluating and contextualising archaeological evidence as well as providing modern reference material for comparative purposes.
Abstract:
Purpose: To investigate the relationship between research data management (RDM) and data sharing in the formulation of RDM policies and development of practices in higher education institutions (HEIs). Design/methodology/approach: Two strands of work were undertaken sequentially: firstly, content analysis of 37 RDM policies from UK HEIs; secondly, two detailed case studies of institutions with different approaches to RDM based on semi-structured interviews with staff involved in the development of RDM policy and services. The data are interpreted using insights from Actor Network Theory. Findings: RDM policy formation and service development has created a complex set of networks within and beyond institutions involving different professional groups with widely varying priorities shaping activities. Data sharing is considered an important activity in the policies and services of HEIs studied, but its prominence can in most cases be attributed to the positions adopted by large research funders. Research limitations/implications: The case studies, as research based on qualitative data, cannot be assumed to be universally applicable but do illustrate a variety of issues and challenges experienced more generally, particularly in the UK. Practical implications: The research may help to inform development of policy and practice in RDM in HEIs and funder organisations. Originality/value: This paper makes an early contribution to the RDM literature on the specific topic of the relationship between RDM policy and services, and openness – a topic which to date has received limited attention.
Abstract:
In parallel with the effort to create Open Linked Data for the World Wide Web, a number of projects aim to develop the same technologies for use in closed environments such as private enterprises. In this paper, we present results of research on interlinking structured data for use in Idea Management Systems, a still rare breed of knowledge management systems dedicated to innovation management. In our study, we show the process of extending an ontology that initially covers only the Idea Management System structure towards the concept of linking with distributed enterprise data and public data using Semantic Web technologies. Furthermore, we point out how the established links can help to solve the key problems of contemporary Idea Management Systems.
Abstract:
Compile-time program analysis techniques can be applied to Web service orchestrations to prove or check various properties. In particular, service orchestrations can be subjected to resource analysis, in which safe approximations of upper and lower resource usage bounds are deduced. A uniform analysis can be simultaneously performed for different generalized resources that can be directly correlated with cost- and performance-related quality attributes, such as invocations of partners, network traffic, number of activities, iterations, and data accesses. The resulting safe upper and lower bounds do not depend on probabilistic assumptions, and are expressed as functions of size or length of data components from an initiating message, using a fine-grained structured data model that corresponds to the XML-style of information structuring. The analysis is performed by transforming a BPEL-like representation of an orchestration into an equivalent program in another programming language for which the appropriate analysis tools already exist.
Abstract:
In the global strategy for the preservation of farm animal genetic resources, the implementation of information technology is of great importance. In this regard, platform-independent information tools and approaches for data exchange are needed in order to obtain aggregate values for the regions and countries in which a given breed is spread. This paper presents an XML-based solution for data exchange in the management of genetic resources of small farm animal populations. The exchanged documents must meet specific requirements that stem from the goal of the data analysis. Three main types of documents are distinguished and their XML formats are discussed. A DTD and an XML Schema are suggested for each type. Some examples of XML documents are also given.
Abstract:
A substantial amount of information on the Internet is present in the form of text. The value of this semi-structured and unstructured data has been widely acknowledged, with consequent scientific and commercial exploitation. The ever-increasing data production, however, pushes data analytic platforms to their limit. This thesis proposes techniques for more efficient textual big data analysis suitable for the Hadoop analytic platform. This research explores the direct processing of compressed textual data. The focus is on developing novel compression methods with a number of desirable properties to support text-based big data analysis in distributed environments. The novel contributions of this work include the following. Firstly, a Content-aware Partial Compression (CaPC) scheme is developed. CaPC makes a distinction between informational and functional content in which only the informational content is compressed. Thus, the compressed data is made transparent to existing software libraries which often rely on functional content to work. Secondly, a context-free bit-oriented compression scheme (Approximated Huffman Compression) based on the Huffman algorithm is developed. This uses a hybrid data structure that allows pattern searching in compressed data in linear time. Thirdly, several modern compression schemes have been extended so that the compressed data can be safely split with respect to logical data records in distributed file systems. Furthermore, an innovative two layer compression architecture is used, in which each compression layer is appropriate for the corresponding stage of data processing. Peripheral libraries are developed that seamlessly link the proposed compression schemes to existing analytic platforms and computational frameworks, and also make the use of the compressed data transparent to developers. The compression schemes have been evaluated for a number of standard MapReduce analysis tasks using a collection of real-world datasets. 
In comparison with existing solutions, they have shown substantial improvement in performance and significant reduction in system resource requirements.
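For context, the classic Huffman algorithm that the second scheme builds on can be sketched in a few lines. This is plain textbook Huffman coding over characters, not the thesis's Approximated Huffman Compression, and it does not show the hybrid data structure or the splittable formats mentioned in the abstract.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Return a prefix-free code table {symbol: bit string} for the text."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, partial code table).
    heap = [(count, i, {sym: ""})
            for i, (sym, count) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, codes1 = heapq.heappop(heap)
        w2, _, codes2 = heapq.heappop(heap)
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(text, codes):
    """Concatenate the code of every symbol in the text."""
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    """Decode by growing a bit buffer until it matches a code word."""
    reverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return "".join(out)


sample = "abracadabra"
codes = huffman_codes(sample)
bits = encode(sample, codes)
print(len(bits), decode(bits, codes) == sample)
```

Frequent symbols receive shorter codes, which is why the encoded bit string is shorter than a fixed 8-bit-per-character encoding of the same text.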
Abstract:
Truck drivers are one of the largest occupational groups in Iran. Evidence from previous studies suggests that working and living conditions on the road engender many concerns for truck drivers, and their families and communities. This research aimed to explore the experiences of Iranian truck drivers regarding life on the road. This qualitative study was conducted among Iranian truck drivers working in the inter-state transportation sector. A purposeful sample of 20 truck drivers took part in this research. Data were collected through semi-structured interviews and analyzed based on qualitative content analysis. After analysis of the data, three main themes emerged: "Individual impacts related to the hardships of road life", "Family impacts related to the hardships of road life", and "Having positive attitude towards work and road". These findings represent the dimensions of perspectives in the road-life of truck drivers. Although truck drivers possess positive beliefs about their occupation and life on the road, they and their families face many hardships which should be well understood. They also need support to be better able to solve the road-life concerns they face. This study's findings are useful for occupational programming and in the promotion of health for truck drivers.