975 results for Structured data
Abstract:
XML-based data representation is increasingly used to represent structured data. The aim is to provide a general-purpose, reusable way to share common information across different interfaces. XML technologies are also used to fix shortcomings found in earlier applications and to improve their operation. This Master's thesis presents a driver redesign for Teleste's LabView-based test application environment. The work improved the earlier driver model by adding functions that make use of XML technologies. The goal was to reduce the programming work required in test application development by replacing features hard-coded into the applications with XML-based configuration files. The system is built on a general-purpose driver that uses Teleste's own EMS protocol to communicate with the products under test. The driver model uses XML-based configuration files to define the properties of the products under test. XML schema files describe the message types of the communication protocol used by the driver, together with their structures. As a result of the work, a new kind of driver model that makes use of XML technologies was created. A model based on a single shared driver harmonizes the implementation of test applications and reduces the programming work needed. Use of the driver was made easier by implementing a dedicated editor in the test application development environment, with which functions that use the driver can be created easily.
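The idea of replacing hard-coded product properties with an XML configuration file can be illustrated with a minimal Python sketch. The thesis itself targets a LabView environment and its real file format is not given in the abstract, so the element names, attribute names, and the `load_parameters` helper below are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration: element and attribute names are illustrative
# only and are not taken from the thesis.
CONFIG_XML = """
<product name="ExampleAmplifier">
  <parameter id="gain" type="int" min="0" max="30"/>
  <parameter id="enabled" type="bool"/>
</product>
"""


def load_parameters(xml_text):
    """Return the product name and a dict of parameter definitions."""
    root = ET.fromstring(xml_text.strip())
    params = {}
    for node in root.findall("parameter"):
        attrs = dict(node.attrib)
        param_id = attrs.pop("id")  # the remaining attributes describe the parameter
        params[param_id] = attrs
    return root.get("name"), params


name, params = load_parameters(CONFIG_XML)
```

With this layout, supporting a new product would mean writing a new configuration file rather than changing the test application code.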
Abstract:
Machine learning provides tools for automated construction of predictive models in data-intensive areas of engineering and science. The family of regularized kernel methods has in recent years become one of the mainstream approaches to machine learning, due to a number of advantages the methods share. The approach provides theoretically well-founded solutions to the problems of under- and overfitting, allows learning from structured data, and has been empirically demonstrated to yield high predictive performance on a wide range of application domains. Historically, the problems of classification and regression have gained the majority of attention in the field. In this thesis we focus on another type of learning problem, that of learning to rank. In learning to rank, the aim is to learn, from a set of past observations, a ranking function that can order new objects according to how well they match some underlying criterion of goodness. As an important special case of the setting, we can recover the bipartite ranking problem, corresponding to maximizing the area under the ROC curve (AUC) in binary classification. Ranking applications appear in a large variety of settings; examples encountered in this thesis include document retrieval in web search, recommender systems, information extraction, and automated parsing of natural language. We consider the pairwise approach to learning to rank, where ranking models are learned by minimizing the expected probability of ranking any two randomly drawn test examples incorrectly. The development of computationally efficient kernel methods based on this approach has in the past proven to be challenging. Moreover, it is not clear which techniques for estimating the predictive performance of learned models are the most reliable in the ranking setting, and how the techniques can be implemented efficiently. The contributions of this thesis are as follows.
First, we develop RankRLS, a computationally efficient kernel method for learning to rank that is based on minimizing a regularized pairwise least-squares loss. In addition to training methods, we introduce a variety of algorithms for tasks such as model selection, multi-output learning, and cross-validation, based on computational shortcuts from matrix algebra. Second, we improve the fastest known training method for the linear version of the RankSVM algorithm, which is one of the best-established methods for learning to rank. Third, we study the combination of the empirical kernel map and reduced set approximation, which allows the large-scale training of kernel machines using linear solvers, and propose computationally efficient solutions to cross-validation when using the approach. Next, we explore the problem of reliable cross-validation when using AUC as a performance criterion, through an extensive simulation study. We demonstrate that the proposed leave-pair-out cross-validation approach leads to more reliable performance estimation than commonly used alternative approaches. Finally, we present a case study on applying machine learning to information extraction from biomedical literature, which combines several of the approaches considered in the thesis. The thesis is divided into two parts. Part I provides the background for the research work and summarizes the most central results; Part II consists of the five original research articles that are the main contribution of this thesis.
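The pairwise view of ranking described in this abstract can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the thesis's implementation: `pairwise_auc` computes AUC as the fraction of positive-negative pairs ordered correctly (ties count as half), and `pairwise_squared_loss` shows one common unregularized form of a pairwise least-squares loss. Both function names are made up here, and the efficient matrix-algebra shortcuts mentioned in the abstract are not reproduced.

```python
from itertools import combinations


def pairwise_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5  # a tie counts as half a correct pair
    return correct / (len(pos) * len(neg))


def pairwise_squared_loss(scores, targets):
    """Sum over all pairs of ((y_i - y_j) - (f_i - f_j))^2, without regularization."""
    return sum(
        ((targets[i] - targets[j]) - (scores[i] - scores[j])) ** 2
        for i, j in combinations(range(len(scores)), 2)
    )


auc = pairwise_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

This naive version enumerates every pair, so its cost grows quadratically with the number of examples, which is the kind of cost that efficient training methods aim to avoid.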
Abstract:
Metadata in increasing levels of sophistication has been the most powerful concept used in the management of unstructured information ever since the first librarian used the Dewey decimal system for library classification. It remains to be seen, however, what the best approach is to implementing metadata to manage huge volumes of unstructured information in a large organization. Also, once metadata is implemented, how is it possible to track whether it is adding value to the company, and whether the implementation has been successful? Existing literature on metadata seems either to focus too much on technical and quality aspects or to describe issues with respect to adoption for general information management initiatives. This research therefore strives to fill these gaps by giving a consolidated framework for understanding the value added by implementing metadata. The basic methodology is a case study, which incorporates aspects of design science, surveys, and interviews in order to provide a holistic approach to quantitative and qualitative analysis of the case. The research identifies the various approaches to implementing metadata, particularly studying the one followed by the unit of analysis of the case study, a large company in the oil and gas sector. Of the three approaches identified, the selected company already follows an approach that appears to be superior. The researcher further explores its shortcomings and proposes a slightly modified approach that can handle them. The research categorically and thoroughly (in context) identifies the top effectiveness criteria and the corresponding key performance indicators (KPIs) that can be measured to understand the level of advancement of the metadata management initiative in the company. To provide a basis of comparison for the findings, the research also includes views from information managers dealing with core structured data stored in ERPs and other databases.
In addition, the results include the basic criteria that can be used to evaluate metrics, in order to classify a metric as a KPI.
Abstract:
The purpose of the study is to determine what effect corporate image has on a company's recruitment process. Based on the corporate image and recruitment literature, a framework was built for the study and used to analyse the relationship between corporate image and the recruitment process. The empirical part of the study was carried out as a case study using qualitative research methods. The case company was Gigantti Oy Ab, and data were collected through semi-structured thematic interviews in a total of three Gigantti stores and at the head office. The results show that corporate image affects a company's attractiveness and that a positive corporate image increases the number of applicants. The way a company conducts its recruitment also affects its corporate image, because it shapes applicants' perceptions of the company.
Abstract:
Academic libraries worldwide have witnessed a number of trends and paradigm shifts over the last decade. It is vital for university libraries to develop a high-standard collection that satisfies academics and researchers and supports the vision and mission of the university. Collection development and management is the most important part of any library. This paper reports on the problems and prospects of collection and asset management in the University Library of Cochin University of Science and Technology (CUSAT). The insight for the paper comes from the authors’ first-hand experience, supported by a literature review. Detailed information regarding the purchase of books and serials, acquisition policies, and changing trends and problems was collected from the official records with the help of a structured data sheet. The study identifies the current trends in collection and asset management at CUSAT and points out the changes likely to be adopted in the future.
Abstract:
Structured data represented in the form of graphs arises in several fields of science, and the growing amount of available data makes distributed graph mining techniques particularly relevant. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. The problem is characterized by a highly irregular search tree, for which no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening dataset, where the approach attains close-to-linear speedup in a network of workstations.
Abstract:
Frequent pattern discovery in structured data is receiving increasing attention in many application areas of science. However, the computational complexity and the large amount of data to be explored often make sequential algorithms unsuitable. In this context, high performance distributed computing becomes a very interesting and promising approach. In this paper we present a parallel formulation of the frequent subgraph mining problem to discover interesting patterns in molecular compounds. The application is characterized by a highly irregular tree-structured computation. No estimation is available for task workloads, which show a power-law distribution over a wide range. The proposed approach allows dynamic resource aggregation and provides fault and latency tolerance. These features make the distributed application suitable for multi-domain heterogeneous environments, such as computational Grids. The distributed application has been evaluated on the well-known National Cancer Institute’s HIV-screening dataset.
Abstract:
The multi-relational data mining approach has emerged as an alternative for the analysis of structured data, such as relational databases. Unlike traditional algorithms, multi-relational proposals allow multiple tables to be mined directly, avoiding costly join operations. This paper presents a comparative study of the traditional Patricia Mine algorithm and its proposed multi-relational counterpart, MR-Radix, in order to evaluate the performance of the two approaches for mining association rules in relational databases. The study makes two original contributions: the proposal of a multi-relational algorithm, MR-Radix, which is efficient for use in relational databases in terms of both execution time and memory usage; and empirical evidence of the performance advantage of the multi-relational approach over several tables, which avoids the costly join operations across multiple tables. © 2011 IEEE.
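For readers unfamiliar with association rules, the two standard measures used to evaluate them, support and confidence, can be sketched in a few lines of Python. This is a generic illustration over a single in-memory list of transactions; it is not Patricia Mine or MR-Radix, and the function names and sample data are made up.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    base = support(transactions, antecedent)
    if base == 0:
        raise ValueError("antecedent never occurs in the transactions")
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base


# Made-up transactions for illustration.
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
rule_support = support(transactions, {"a", "b"})
rule_confidence = confidence(transactions, {"a"}, {"b"})
```

In a relational setting the items of one transaction may be spread over several tables, which is why a single-table algorithm needs the join that the multi-relational approach is designed to avoid.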
Abstract:
Metadata is data that fully describes data and the areas they represent, allowing users to decide how best to use them. It also makes it possible to report on the existence of a data set linked to specific needs. Metadata is used to document and organize structured organizational data in order to minimize duplicated effort in locating the data and to facilitate maintenance. It also supports the administration of large amounts of data, along with discovery, retrieval, and editing features. The global use of metadata is regulated by a technical group or task force composed of several segments, such as industries, universities, and research firms. Agriculture in particular is a good example: typical applications using metadata include the integration of systems and equipment, which enables the techniques used in precision agriculture, and the integration of different computer systems via web services or other solutions, which requires the integration of structured data. The purpose of this paper is to present an overview of the established metadata standards in agricultural areas.
Abstract:
XML similarity evaluation has become a central issue in the database and information communities, its applications ranging over document clustering, version control, data integration and ranked retrieval. Various algorithms for comparing hierarchically structured data, XML documents in particular, have been proposed in the literature. Most of them make use of techniques for finding the edit distance between tree structures, XML documents being commonly modeled as Ordered Labeled Trees. Yet, a thorough investigation of current approaches led us to identify several similarity aspects, i.e., sub-tree related structural and semantic similarities, which are not sufficiently addressed while comparing XML documents. In this paper, we provide an integrated and fine-grained comparison framework to deal with both structural and semantic similarities in XML documents (detecting the occurrences and repetitions of structurally and semantically similar sub-trees), and to allow the end-user to adjust the comparison process according to her requirements. Our framework consists of four main modules for (i) discovering the structural commonalities between sub-trees, (ii) identifying sub-tree semantic resemblances, (iii) computing tree-based edit operations costs, and (iv) computing tree edit distance. Experimental results demonstrate higher comparison accuracy with respect to alternative methods, while timing experiments reflect the impact of semantic similarity on overall system performance.
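As background for the tree edit distance mentioned in this abstract, the following Python sketch computes a plain unit-cost edit distance between ordered labeled trees using the textbook forest recursion with memoization. It is not the framework proposed in the paper: it ignores semantic similarity and sub-tree repetitions, and the tuple-based tree encoding and function names are assumptions made for the example.

```python
from functools import lru_cache

# A tree is a tuple (label, children), where children is a tuple of trees.
# A forest is a tuple of trees.


def size(forest):
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Unit-cost edit distance between two ordered forests."""
    if not f1 and not f2:
        return 0
    if not f1:
        return size(f2)  # insert every remaining node
    if not f2:
        return size(f1)  # delete every remaining node
    (label1, kids1), (label2, kids2) = f1[-1], f2[-1]
    # Delete the rightmost root of f1; its children take its place.
    delete = forest_distance(f1[:-1] + kids1, f2) + 1
    # Insert the rightmost root of f2.
    insert = forest_distance(f1, f2[:-1] + kids2) + 1
    # Match the two rightmost roots, relabeling if the labels differ.
    match = (
        forest_distance(kids1, kids2)
        + forest_distance(f1[:-1], f2[:-1])
        + (0 if label1 == label2 else 1)
    )
    return min(delete, insert, match)


def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))


doc_a = ("a", (("b", ()), ("c", ())))
doc_b = ("a", (("b", ()),))
distance = tree_edit_distance(doc_a, doc_b)
```

The recursion is simple but explores many sub-forests, so it is only practical for small trees; dedicated algorithms organize the same computation more efficiently.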
Abstract:
Introduction: Over the last decades, Swiss sports clubs have lost their "monopoly" in the market for sports-related services and are increasingly in competition with other sports providers. For many sports clubs, long-term membership cannot be taken as a matter of course. Current research on sports clubs in Switzerland – as well as in other European countries – confirms the increasing difficulties in achieving long-term member commitment. Looking at recent findings of the Swiss sport clubs report (Lamprecht, Fischer & Stamm, 2012), it can be noted that a decrease in memberships does not affect all clubs equally. There are sports clubs that, because of their specific situational and structural conditions, have few problems with member fluctuation, while other clubs show considerable declines in membership. Therefore, a clear understanding of the individual and structural factors that trigger and sustain member commitment would help sports clubs to tackle this problem more effectively. This situation poses the question: What are the individual and structural determinants that influence the tendency to continue or to quit the membership? Methods: Existing research has extensively investigated the drivers of members’ commitment at an individual level. As commitment of members usually occurs within an organizational context, the characteristics of the organisation should also be considered. However, this context has been largely neglected in current research. This presentation addresses both the individual characteristics of members and the corresponding structural conditions of sports clubs, resulting in a multi-level framework for investigating the factors of members’ commitment in sports clubs. Multilevel analysis permits adequate handling of hierarchically structured data (e.g., Hox, 2002).
The influences of both the individual and context level on the stability of memberships are estimated in multi-level models based on a sample of n = 1,434 sport club members from 36 sports clubs. Results: Results of these multi-level analyses indicate that commitment of members is not just an outcome of individual characteristics, such as strong identification with the club, positively perceived communication and cooperation, satisfaction with sports clubs’ offers, or voluntary engagement. It is also influenced by club-specific structural conditions: stable memberships are more probable in rural sports clubs, and in clubs that explicitly support sociability, whereas sporting-success oriented goals in clubs have a destabilizing effect. Discussion/Conclusion: The proposed multi-level framework and the multi-level analysis can open new perspectives for research concerning commitment of members to sports clubs and other topics and problems of sport organisation research, especially in assisting to understand individual behavior within organizational contexts. References: Hox, J. J. (2002). Multilevel analysis: Techniques and applications. Mahwah: Lawrence Erlbaum. Lamprecht, M., Fischer, A., & Stamm, H.-P. (2012). Die Schweizer Sportvereine – Strukturen, Leistungen, Herausforderungen. Zurich: Seismo.
Does context matter? Analysing structural and individual factors of member commitment in sport clubs
Abstract:
This article addresses factors that influence member commitment in sport clubs. Based on the theory of social action and the economic behaviour theory, it focuses not only on individual characteristics of club members but also on the corresponding structural conditions of sport clubs. Accordingly, a multilevel framework is developed for explaining member commitment in sport clubs. Different multilevel models were estimated in order to analyse the influences of both the individual and the corresponding context level in a sample of n = 1,699 members of 42 Swiss and German sport clubs. The multilevel analysis permitted an adequate handling of hierarchically structured data. Results of these multilevel analyses indicated that the commitment of members is not just an outcome of individual characteristics such as strong identification with their club, positively perceived (collective) solidarity, satisfaction with their sport club, or voluntary engagement. It is also determined by club-specific structural conditions: commitment proves to be more probable in rural sport clubs and clubs that explicitly support sociability. Furthermore, cross-level effects in relation to member commitment were also found between the context variable sociability and the individual variable identification.
Abstract:
In the information society, large amounts of information are being generated and transmitted constantly, especially in the most natural way for humans, i.e., natural language. Social networks, blogs, forums, and Q&A sites form a dynamic Large Knowledge Repository. Web 2.0 thus contains structured data, but the largest amount of information is still expressed in natural language. Linguistic structures for text recognition enable the extraction of structured information from texts. However, the expressiveness of the current structures is limited, as they have been designed with a strict order in their phrases, which limits their applicability to other languages and makes them more sensitive to grammatical errors. To overcome these limitations, in this paper we present a linguistic structure named “linguistic schema”, with a richer expressiveness that introduces fewer implicit constraints over annotations.
Abstract:
This paper presents a focused crawler for getting Semantic Web resources (CSR). Structured data on the web is available in formats such as Extensible Markup Language (XML), Resource Description Framework (RDF) and Web Ontology Language (OWL) that can be used for processing. One of the main challenges of manually searching for and downloading Semantic Web resources is that the task consumes a lot of time. Our research work proposes a focused crawler that downloads these resources automatically and stores them on disk, in order to build a collection that will be used for data processing. CSR consists of three layers: (a) the User Interface Layer, (b) the Focus Crawler Layer and (c) the Base Crawler Layer. CSR uses the Shark-Search method as its selection policy. CSR was evaluated in two experiments. The first started on December 15, 2012 at 7:11 am and ended on December 16, 2012 at 4:01, yielding 448,123,537 bytes of data. CSR ended by itself after analyzing 80,4375 seeds with an unlimited depth. It obtained 16,576 semantic resource files, of which 89% were RDF, 10% were XML and 1% were OWL. The second experiment was based on the Web Data Commons work of the Research Group Data and Web Science at the University of Mannheim and the Institute AIFB at the Karlsruhe Institute of Technology. It ran from 4:46 am on June 2, 2013 to 1:37 am on June 9, 2013. After 162.51 hours of execution, the result was 285,279 semantic resources, predominantly XML resources at 99%, with OWL and RDF at 1% each.
Abstract:
Decision support systems (DSS) support business or organizational decision-making activities, which require access to information that is stored internally in databases or data warehouses, and externally on the Web, accessed by Information Retrieval (IR) or Question Answering (QA) systems. Graphical interfaces for querying these sources of information make it easy to constrain query formulation dynamically based on user selections, but they lack flexibility in query formulation, since expressive power is limited by the user interface design. Natural language interfaces (NLI) are regarded as the optimal solution. However, especially for non-expert users, truly natural communication is the most difficult to realize effectively. In this paper, we propose an NLI that improves the interaction between the user and the DSS by means of referencing previous questions or their answers (i.e. anaphora such as the pronoun reference in “What traits are affected by them?”), or by eliding parts of the question (i.e. ellipsis such as “And to glume colour?” after the question “Tell me the QTLs related to awn colour in wheat”). Moreover, in order to overcome one of the main problems of NLIs, the difficulty of adapting an NLI to a new domain, our proposal is based on ontologies that are obtained semi-automatically from a framework that allows the integration of internal and external, structured and unstructured information. Therefore, our proposal can interface with databases, data warehouses, QA and IR systems. Because of the high natural language ambiguity of the resolution process, our proposal is presented as an authoring tool that helps the user to query efficiently in natural language. Finally, our proposal is tested on a DSS case scenario about Biotechnology and Agriculture, whose knowledge base is the CEREALAB database as internal structured data, and the Web (e.g. PubMed) as external unstructured information.