862 resultados para data processing
Resumo:
Semantic Web Mining aims at combining the two fast-developing research areas Semantic Web and Web Mining. This survey analyzes the convergence of trends from both areas: an increasing number of researchers is working on improving the results of Web Mining by exploiting semantic structures in the Web, and they make use of Web Mining techniques for building the Semantic Web. Last but not least, these techniques can be used for mining the Semantic Web itself. The Semantic Web is the second-generation WWW, enriched by machine-processable information which supports the user in his tasks. Given the enormous size even of today’s Web, it is impossible to manually enrich all of these resources. Therefore, automated schemes for learning the relevant information are increasingly being used. Web Mining aims at discovering insights about the meaning of Web resources and their usage. Given the primarily syntactical nature of the data being mined, the discovery of meaning is impossible based on these data only. Therefore, formalizations of the semantics of Web sites and navigation behavior are becoming more and more common. Furthermore, mining the Semantic Web itself is another upcoming application. We argue that the two areas Web Mining and Semantic Web need each other to fulfill their goals, but that the full potential of this convergence is not yet realized. This paper gives an overview of where the two areas meet today, and sketches ways of how a closer integration could be profitable.
Resumo:
Association rules are a popular knowledge discovery technique for warehouse basket analysis. They indicate which items of the warehouse are frequently bought together. The problem of association rule mining has first been stated in 1993. Five years later, several research groups discovered that this problem has a strong connection to Formal Concept Analysis (FCA). In this survey, we will first introduce some basic ideas of this connection along a specific algorithm, TITANIC, and show how FCA helps in reducing the number of resulting rules without loss of information, before giving a general overview over the history and state of the art of applying FCA for association rule mining.
Resumo:
Ein wichtiger Baustein des neu entdeckten World Wide Web - des "Web 2.0" - stellen Folksonomies dar. In diesen Systemen können Benutzer gemeinsam Ressourcen verwalten und mit Schlagwörtern versehen. Die dadurch entstehenden begrifflichen Strukturen stellen ein interessantes Forschungsfeld dar. Dieser Artikel untersucht Ansätze und Wege zur Entdeckung und Strukturierung von Nutzergruppen ("Communities") in Folksonomies.
Resumo:
As the number of resources on the web exceeds by far the number of documents one can track, it becomes increasingly difficult to remain up to date on ones own areas of interest. The problem becomes more severe with the increasing fraction of multimedia data, from which it is difficult to extract some conceptual description of their contents. One way to overcome this problem are social bookmark tools, which are rapidly emerging on the web. In such systems, users are setting up lightweight conceptual structures called folksonomies, and overcome thus the knowledge acquisition bottleneck. As more and more people participate in the effort, the use of a common vocabulary becomes more and more stable. We present an approach for discovering topic-specific trends within folksonomies. It is based on a differential adaptation of the PageRank algorithm to the triadic hypergraph structure of a folksonomy. The approach allows for any kind of data, as it does not rely on the internal structure of the documents. In particular, this allows to consider different data types in the same analysis step. We run experiments on a large-scale real-world snapshot of a social bookmarking system.
Resumo:
Recently, research projects such as PADLR and SWAP have developed tools like Edutella or Bibster, which are targeted at establishing peer-to-peer knowledge management (P2PKM) systems. In such a system, it is necessary to obtain provide brief semantic descriptions of peers, so that routing algorithms or matchmaking processes can make decisions about which communities peers should belong to, or to which peers a given query should be forwarded. This paper proposes the use of graph clustering techniques on knowledge bases for that purpose. Using this clustering, we can show that our strategy requires up to 58% fewer queries than the baselines to yield full recall in a bibliographic P2PKM scenario.
Resumo:
Social resource sharing systems like YouTube and del.icio.us have acquired a large number of users within the last few years. They provide rich resources for data analysis, information retrieval, and knowledge discovery applications. A first step towards this end is to gain better insights into content and structure of these systems. In this paper, we will analyse the main network characteristics of two of the systems. We consider their underlying data structures – socalled folksonomies – as tri-partite hypergraphs, and adapt classical network measures like characteristic path length and clustering coefficient to them. Subsequently, we introduce a network of tag co-occurrence and investigate some of its statistical properties, focusing on correlations in node connectivity and pointing out features that reflect emergent semantics within the folksonomy. We show that simple statistical indicators unambiguously spot non-social behavior such as spam.
Resumo:
Social resource sharing systems like YouTube and del.icio.us have acquired a large number of users within the last few years. They provide rich resources for data analysis, information retrieval, and knowledge discovery applications. A first step towards this end is to gain better insights into content and structure of these systems. In this paper, we will analyse the main network characteristics of two of these systems. We consider their underlying data structures â so-called folksonomies â as tri-partite hypergraphs, and adapt classical network measures like characteristic path length and clustering coefficient to them. Subsequently, we introduce a network of tag cooccurrence and investigate some of its statistical properties, focusing on correlations in node connectivity and pointing out features that reflect emergent semantics within the folksonomy. We show that simple statistical indicators unambiguously spot non-social behavior such as spam.
Resumo:
A key argument for modeling knowledge in ontologies is the easy re-use and re-engineering of the knowledge. However, beside consistency checking, current ontology engineering tools provide only basic functionalities for analyzing ontologies. Since ontologies can be considered as (labeled, directed) graphs, graph analysis techniques are a suitable answer for this need. Graph analysis has been performed by sociologists for over 60 years, and resulted in the vivid research area of Social Network Analysis (SNA). While social network structures in general currently receive high attention in the Semantic Web community, there are only very few SNA applications up to now, and virtually none for analyzing the structure of ontologies. We illustrate in this paper the benefits of applying SNA to ontologies and the Semantic Web, and discuss which research topics arise on the edge between the two areas. In particular, we discuss how different notions of centrality describe the core content and structure of an ontology. From the rather simple notion of degree centrality over betweenness centrality to the more complex eigenvector centrality based on Hermitian matrices, we illustrate the insights these measures provide on two ontologies, which are different in purpose, scope, and size.
Resumo:
Distributed systems are one of the most vital components of the economy. The most prominent example is probably the internet, a constituent element of our knowledge society. During the recent years, the number of novel network types has steadily increased. Amongst others, sensor networks, distributed systems composed of tiny computational devices with scarce resources, have emerged. The further development and heterogeneous connection of such systems imposes new requirements on the software development process. Mobile and wireless networks, for instance, have to organize themselves autonomously and must be able to react to changes in the environment and to failing nodes alike. Researching new approaches for the design of distributed algorithms may lead to methods with which these requirements can be met efficiently. In this thesis, one such method is developed, tested, and discussed in respect of its practical utility. Our new design approach for distributed algorithms is based on Genetic Programming, a member of the family of evolutionary algorithms. Evolutionary algorithms are metaheuristic optimization methods which copy principles from natural evolution. They use a population of solution candidates which they try to refine step by step in order to attain optimal values for predefined objective functions. The synthesis of an algorithm with our approach starts with an analysis step in which the wanted global behavior of the distributed system is specified. From this specification, objective functions are derived which steer a Genetic Programming process where the solution candidates are distributed programs. The objective functions rate how close these programs approximate the goal behavior in multiple randomized network simulations. The evolutionary process step by step selects the most promising solution candidates and modifies and combines them with mutation and crossover operators. This way, a description of the global behavior of a distributed system is translated automatically to programs which, if executed locally on the nodes of the system, exhibit this behavior. In our work, we test six different ways for representing distributed programs, comprising adaptations and extensions of well-known Genetic Programming methods (SGP, eSGP, and LGP), one bio-inspired approach (Fraglets), and two new program representations called Rule-based Genetic Programming (RBGP, eRBGP) designed by us. We breed programs in these representations for three well-known example problems in distributed systems: election algorithms, the distributed mutual exclusion at a critical section, and the distributed computation of the greatest common divisor of a set of numbers. Synthesizing distributed programs the evolutionary way does not necessarily lead to the envisaged results. In a detailed analysis, we discuss the problematic features which make this form of Genetic Programming particularly hard. The two Rule-based Genetic Programming approaches have been developed especially in order to mitigate these difficulties. In our experiments, at least one of them (eRBGP) turned out to be a very efficient approach and in most cases, was superior to the other representations.
Resumo:
In recent years, progress in the area of mobile telecommunications has changed our way of life, in the private as well as the business domain. Mobile and wireless networks have ever increasing bit rates, mobile network operators provide more and more services, and at the same time costs for the usage of mobile services and bit rates are decreasing. However, mobile services today still lack functions that seamlessly integrate into users’ everyday life. That is, service attributes such as context-awareness and personalisation are often either proprietary, limited or not available at all. In order to overcome this deficiency, telecommunications companies are heavily engaged in the research and development of service platforms for networks beyond 3G for the provisioning of innovative mobile services. These service platforms are to support such service attributes. Service platforms are to provide basic service-independent functions such as billing, identity management, context management, user profile management, etc. Instead of developing own solutions, developers of end-user services such as innovative messaging services or location-based services can utilise the platform-side functions for their own purposes. In doing so, the platform-side support for such functions takes away complexity, development time and development costs from service developers. Context-awareness and personalisation are two of the most important aspects of service platforms in telecommunications environments. The combination of context-awareness and personalisation features can also be described as situation-dependent personalisation of services. The support for this feature requires several processing steps. The focus of this doctoral thesis is on the processing step, in which the user’s current context is matched against situation-dependent user preferences to find the matching user preferences for the current user’s situation. However, to achieve this, a user profile management system and corresponding functionality is required. These parts are also covered by this thesis. Altogether, this thesis provides the following contributions: The first part of the contribution is mainly architecture-oriented. First and foremost, we provide a user profile management system that addresses the specific requirements of service platforms in telecommunications environments. In particular, the user profile management system has to deal with situation-specific user preferences and with user information for various services. In order to structure the user information, we also propose a user profile structure and the corresponding user profile ontology as part of an ontology infrastructure in a service platform. The second part of the contribution is the selection mechanism for finding matching situation-dependent user preferences for the personalisation of services. This functionality is provided as a sub-module of the user profile management system. Contrary to existing solutions, our selection mechanism is based on ontology reasoning. This mechanism is evaluated in terms of runtime performance and in terms of supported functionality compared to other approaches. The results of the evaluation show the benefits and the drawbacks of ontology modelling and ontology reasoning in practical applications.
Resumo:
In dieser Arbeit wird ein modulares Agentensystem entworfen und auf einer marktüblichen PC-basierten SPS-Steuerung implementiert, welches in der Lage ist, den logistischen Anlagenteil eines Hybriden Prozessmodells abhängig von einem dynamischen Energiepreis zu steuern. Dies wird durch die Rekonfiguration und energetische Optimierung der Funktionen einzelner Module auf Grundlage eines Umweltmodells durch Softwareagenten erreicht. Dieses Umweltmodell wird zunächst in verschiedenen Diagrammen der objektorientierten Modellierungssprachen UML und SysML modelliert. Hierfür werden im Rahmen dieser Arbeit dazu notwendige Erweiterungen des Timing-Diagramms der UML entworfen. Das Agentensystem wird mit der Methode Gaia entworfen. Durch das in dieser Methode enthaltene Rollenkonzept werden Möglichkeiten zur Wiederverwendung von Teilen des Entwurfs auch innerhalb von anderen Arbeiten ermöglicht. Die Echtzeitanforderungen, welche sich an die Steuerungssoftware des logistischen Prozesses ergeben, können bei dem Entwurf durch Gaia modelliert werden. Das im Vorgehen nach Gaia enthaltene Bekanntschaftsmodell wird um die Darstellung weiterer wichtiger Informationen erweitert.
Resumo:
We introduce a new mode of operation for CD-systems of restarting automata by providing explicit enable and disable conditions in the form of regular constraints. We show that, for each CD-system M of restarting automata and each mode m of operation considered by Messerschmidt and Otto, there exists a CD-system M' of restarting automata of the same type as M that, working in the new mode ed, accepts the language that M accepts in mode m. Further, we prove that in mode ed, a locally deterministic CD-system of restarting automata of type RR(W)(W) can be simulated by a locally deterministic CD-system of restarting automata of the more restricted type R(W)(W). This is the first time that a non-monotone type of R-automaton without auxiliary symbols is shown to be as expressive as the corresponding type of RR-automaton.
Resumo:
Die Bedeutung des Dienstgüte-Managements (SLM) im Bereich von Unternehmensanwendungen steigt mit der zunehmenden Kritikalität von IT-gestützten Prozessen für den Erfolg einzelner Unternehmen. Traditionell werden zur Implementierung eines wirksamen SLMs Monitoringprozesse in hierarchischen Managementumgebungen etabliert, die einen Administrator bei der notwendigen Rekonfiguration von Systemen unterstützen. Auf aktuelle, hochdynamische Softwarearchitekturen sind diese hierarchischen Ansätze jedoch nur sehr eingeschränkt anwendbar. Ein Beispiel dafür sind dienstorientierte Architekturen (SOA), bei denen die Geschäftsfunktionalität durch das Zusammenspiel einzelner, voneinander unabhängiger Dienste auf Basis deskriptiver Workflow-Beschreibungen modelliert wird. Dadurch ergibt sich eine hohe Laufzeitdynamik der gesamten Architektur. Für das SLM ist insbesondere die dezentrale Struktur einer SOA mit unterschiedlichen administrativen Zuständigkeiten für einzelne Teilsysteme problematisch, da regelnde Eingriffe zum einen durch die Kapselung der Implementierung einzelner Dienste und zum anderen durch das Fehlen einer zentralen Kontrollinstanz nur sehr eingeschränkt möglich sind. Die vorliegende Arbeit definiert die Architektur eines SLM-Systems für SOA-Umgebungen, in dem autonome Management-Komponenten kooperieren, um übergeordnete Dienstgüteziele zu erfüllen: Mithilfe von Selbst-Management-Technologien wird zunächst eine Automatisierung des Dienstgüte-Managements auf Ebene einzelner Dienste erreicht. Die autonomen Management-Komponenten dieser Dienste können dann mithilfe von Selbstorganisationsmechanismen übergreifende Ziele zur Optimierung von Dienstgüteverhalten und Ressourcennutzung verfolgen. Für das SLM auf Ebene von SOA Workflows müssen temporär dienstübergreifende Kooperationen zur Erfüllung von Dienstgüteanforderungen etabliert werden, die sich damit auch über mehrere administrative Domänen erstrecken können. Eine solche zeitlich begrenzte Kooperation autonomer Teilsysteme kann sinnvoll nur dezentral erfolgen, da die jeweiligen Kooperationspartner im Vorfeld nicht bekannt sind und – je nach Lebensdauer einzelner Workflows – zur Laufzeit beteiligte Komponenten ausgetauscht werden können. In der Arbeit wird ein Verfahren zur Koordination autonomer Management-Komponenten mit dem Ziel der Optimierung von Antwortzeiten auf Workflow-Ebene entwickelt: Management-Komponenten können durch Übertragung von Antwortzeitanteilen untereinander ihre individuellen Ziele straffen oder lockern, ohne dass das Gesamtantwortzeitziel dadurch verändert wird. Die Übertragung von Antwortzeitanteilen wird mithilfe eines Auktionsverfahrens realisiert. Technische Grundlage der Kooperation bildet ein Gruppenkommunikationsmechanismus. Weiterhin werden in Bezug auf die Nutzung geteilter, virtualisierter Ressourcen konkurrierende Dienste entsprechend geschäftlicher Ziele priorisiert. Im Rahmen der praktischen Umsetzung wird die Realisierung zentraler Architekturelemente und der entwickelten Verfahren zur Selbstorganisation beispielhaft für das SLM konkreter Komponenten vorgestellt. Zur Untersuchung der Management-Kooperation in größeren Szenarien wird ein hybrider Simulationsansatz verwendet. Im Rahmen der Evaluation werden Untersuchungen zur Skalierbarkeit des Ansatzes durchgeführt. Schwerpunkt ist hierbei die Betrachtung eines Systems aus kooperierenden Management-Komponenten, insbesondere im Hinblick auf den Kommunikationsaufwand. Die Evaluation zeigt, dass ein dienstübergreifendes, autonomes Performance-Management in SOA-Umgebungen möglich ist. Die Ergebnisse legen nahe, dass der entwickelte Ansatz auch in großen Umgebungen erfolgreich angewendet werden kann.
Resumo:
The 21st century has brought new challenges for forest management at a time when globalization in world trade is increasing and global climate change is becoming increasingly apparent. In addition to various goods and services like food, feed, timber or biofuels being provided to humans, forest ecosystems are a large store of terrestrial carbon and account for a major part of the carbon exchange between the atmosphere and the land surface. Depending on the stage of the ecosystems and/or management regimes, forests can be either sinks, or sources of carbon. At the global scale, rapid economic development and a growing world population have raised much concern over the use of natural resources, especially forest resources. The challenging question is how can the global demands for forest commodities be satisfied in an increasingly globalised economy, and where could they potentially be produced? For this purpose, wood demand estimates need to be integrated in a framework, which is able to adequately handle the competition for land between major land-use options such as residential land or agricultural land. This thesis is organised in accordance with the requirements to integrate the simulation of forest changes based on wood extraction in an existing framework for global land-use modelling called LandSHIFT. Accordingly, the following neuralgic points for research have been identified: (1) a review of existing global-scale economic forest sector models (2) simulation of global wood production under selected scenarios (3) simulation of global vegetation carbon yields and (4) the implementation of a land-use allocation procedure to simulate the impact of wood extraction on forest land-cover. Modelling the spatial dynamics of forests on the global scale requires two important inputs: (1) simulated long-term wood demand data to determine future roundwood harvests in each country and (2) the changes in the spatial distribution of woody biomass stocks to determine how much of the resource is available to satisfy the simulated wood demands. First, three global timber market models are reviewed and compared in order to select a suitable economic model to generate wood demand scenario data for the forest sector in LandSHIFT. The comparison indicates that the ‘Global Forest Products Model’ (GFPM) is most suitable for obtaining projections on future roundwood harvests for further study with the LandSHIFT forest sector. Accordingly, the GFPM is adapted and applied to simulate wood demands for the global forestry sector conditional on selected scenarios from the Millennium Ecosystem Assessment and the Global Environmental Outlook until 2050. Secondly, the Lund-Potsdam-Jena (LPJ) dynamic global vegetation model is utilized to simulate the change in potential vegetation carbon stocks for the forested locations in LandSHIFT. The LPJ data is used in collaboration with spatially explicit forest inventory data on aboveground biomass to allocate the demands for raw forest products and identify locations of deforestation. Using the previous results as an input, a methodology to simulate the spatial dynamics of forests based on wood extraction is developed within the LandSHIFT framework. The land-use allocation procedure specified in the module translates the country level demands for forest products into woody biomass requirements for forest areas, and allocates these on a five arc minute grid. In a first version, the model assumes only actual conditions through the entire study period and does not explicitly address forest age structure. Although the module is in a very preliminary stage of development, it already captures the effects of important drivers of land-use change like cropland and urban expansion. As a first plausibility test, the module performance is tested under three forest management scenarios. The module succeeds in responding to changing inputs in an expected and consistent manner. The entire methodology is applied in an exemplary scenario analysis for India. A couple of future research priorities need to be addressed, particularly the incorporation of plantation establishments; issue of age structure dynamics; as well as the implementation of a new technology change factor in the GFPM which can allow the specification of substituting raw wood products (especially fuelwood) by other non-wood products.