165 resultados para Natural language processing (Computer science) -- TFC
Resumo:
We study several extensions of the notion of alternation from context-free grammars to context-sensitive and arbitrary phrase-structure grammars. Thereby new grammatical characterizations are obtained for the class of languages that are accepted by alternating pushdown automata.
Resumo:
The process of developing software that takes advantage of multiple processors is commonly referred to as parallel programming. For various reasons, this process is much harder than the sequential case. For decades, parallel programming has been a problem for a small niche only: engineers working on parallelizing mostly numerical applications in High Performance Computing. This has changed with the advent of multi-core processors in mainstream computer architectures. Parallel programming in our days becomes a problem for a much larger group of developers. The main objective of this thesis was to find ways to make parallel programming easier for them. Different aims were identified in order to reach the objective: research the state of the art of parallel programming today, improve the education of software developers about the topic, and provide programmers with powerful abstractions to make their work easier. To reach these aims, several key steps were taken. To start with, a survey was conducted among parallel programmers to find out about the state of the art. More than 250 people participated, yielding results about the parallel programming systems and languages in use, as well as about common problems with these systems. Furthermore, a study was conducted in university classes on parallel programming. It resulted in a list of frequently made mistakes that were analyzed and used to create a programmers' checklist to avoid them in the future. For programmers' education, an online resource was setup to collect experiences and knowledge in the field of parallel programming - called the Parawiki. Another key step in this direction was the creation of the Thinking Parallel weblog, where more than 50.000 readers to date have read essays on the topic. For the third aim (powerful abstractions), it was decided to concentrate on one parallel programming system: OpenMP. Its ease of use and high level of abstraction were the most important reasons for this decision. Two different research directions were pursued. The first one resulted in a parallel library called AthenaMP. It contains so-called generic components, derived from design patterns for parallel programming. These include functionality to enhance the locks provided by OpenMP, to perform operations on large amounts of data (data-parallel programming), and to enable the implementation of irregular algorithms using task pools. AthenaMP itself serves a triple role: the components are well-documented and can be used directly in programs, it enables developers to study the source code and learn from it, and it is possible for compiler writers to use it as a testing ground for their OpenMP compilers. The second research direction was targeted at changing the OpenMP specification to make the system more powerful. The main contributions here were a proposal to enable thread-cancellation and a proposal to avoid busy waiting. Both were implemented in a research compiler, shown to be useful in example applications, and proposed to the OpenMP Language Committee.
Resumo:
Die vorliegende Arbeit behandelt Restartautomaten und Erweiterungen von Restartautomaten. Restartautomaten sind ein Werkzeug zum Erkennen formaler Sprachen. Sie sind motiviert durch die linguistische Methode der Analyse durch Reduktion und wurden 1995 von Jancar, Mráz, Plátek und Vogel eingeführt. Restartautomaten bestehen aus einer endlichen Kontrolle, einem Lese/Schreibfenster fester Größe und einem flexiblen Band. Anfänglich enthält dieses sowohl die Eingabe als auch Bandbegrenzungssymbole. Die Berechnung eines Restartautomaten läuft in so genannten Zyklen ab. Diese beginnen am linken Rand im Startzustand, in ihnen wird eine lokale Ersetzung auf dem Band durchgeführt und sie enden mit einem Neustart, bei dem das Lese/Schreibfenster wieder an den linken Rand bewegt wird und der Startzustand wieder eingenommen wird. Die vorliegende Arbeit beschäftigt sich hauptsächlich mit zwei Erweiterungen der Restartautomaten: CD-Systeme von Restartautomaten und nichtvergessende Restartautomaten. Nichtvergessende Restartautomaten können einen Zyklus in einem beliebigen Zustand beenden und CD-Systeme von Restartautomaten bestehen aus einer Menge von Restartautomaten, die zusammen die Eingabe verarbeiten. Dabei wird ihre Zusammenarbeit durch einen Operationsmodus, ähnlich wie bei CD-Grammatik Systemen, geregelt. Für beide Erweiterungen zeigt sich, dass die deterministischen Modelle mächtiger sind als deterministische Standardrestartautomaten. Es wird gezeigt, dass CD-Systeme von Restartautomaten in vielen Fällen durch nichtvergessende Restartautomaten simuliert werden können und andererseits lassen sich auch nichtvergessende Restartautomaten durch CD-Systeme von Restartautomaten simulieren. Des Weiteren werden Restartautomaten und nichtvergessende Restartautomaten untersucht, die nichtdeterministisch sind, aber keine Fehler machen. Es zeigt sich, dass diese Automaten durch deterministische (nichtvergessende) Restartautomaten simuliert werden können, wenn sie direkt nach der Ersetzung einen neuen Zyklus beginnen, oder ihr Fenster nach links und rechts bewegen können. Außerdem gilt, dass alle (nichtvergessenden) Restartautomaten, die zwar Fehler machen dürfen, diese aber nach endlich vielen Zyklen erkennen, durch (nichtvergessende) Restartautomaten simuliert werden können, die keine Fehler machen. Ein weiteres wichtiges Resultat besagt, dass die deterministischen monotonen nichtvergessenden Restartautomaten mit Hilfssymbolen, die direkt nach dem Ersetzungsschritt den Zyklus beenden, genau die deterministischen kontextfreien Sprachen erkennen, wohingegen die deterministischen monotonen nichtvergessenden Restartautomaten mit Hilfssymbolen ohne diese Einschränkung echt mehr, nämlich die links-rechts regulären Sprachen, erkennen. Damit werden zum ersten Mal Restartautomaten mit Hilfssymbolen, die direkt nach dem Ersetzungsschritt ihren Zyklus beenden, von Restartautomaten desselben Typs ohne diese Einschränkung getrennt. Besonders erwähnenswert ist hierbei, dass beide Automatentypen wohlbekannte Sprachklassen beschreiben.
Resumo:
The nonforgetting restarting automaton is a generalization of the restarting automaton that, when executing a restart operation, changes its internal state based on the current state and the actual contents of its read/write window instead of resetting it to the initial state. Another generalization of the restarting automaton is the cooperating distributed system (CD-system) of restarting automata. Here a finite system of restarting automata works together in analyzing a given sentence, where they interact based on a given mode of operation. As it turned out, CD-systems of restarting automata of some type X working in mode =1 are just as expressive as nonforgetting restarting automata of the same type X. Further, various types of determinism have been introduced for CD-systems of restarting automata called strict determinism, global determinism, and local determinism, and it has been shown that globally deterministic CD-systems working in mode =1 correspond to deterministic nonforgetting restarting automata. Here we derive some lower bound results for some types of nonforgetting restarting automata and for some types of CD-systems of restarting automata. In this way we establish separations between the corresponding language classes, thus providing detailed technical proofs for some of the separation results announced in the literature.
Resumo:
This paper contributes to the study of Freely Rewriting Restarting Automata (FRR-automata) and Parallel Communicating Grammar Systems (PCGS), which both are useful models in computational linguistics. For PCGSs we study two complexity measures called 'generation complexity' and 'distribution complexity', and we prove that a PCGS Pi, for which the generation complexity and the distribution complexity are both bounded by constants, can be transformed into a freely rewriting restarting automaton of a very restricted form. From this characterization it follows that the language L(Pi) generated by Pi is semi-linear, that its characteristic analysis is of polynomial size, and that this analysis can be computed in polynomial time.
Resumo:
Software Defined Radio (SDR) hardware platforms use parallel architectures. Current concepts of developing applications (such as WLAN) for these platforms are complex, because developers describe an application with hardware-specifics that are relevant to parallelism such as mapping and scheduling. To reduce this complexity, we have developed a new programming approach for SDR applications, called Virtual Radio Engine (VRE). VRE defines a language for describing applications, and a tool chain that consists of a compiler kernel and other tools (such as a code generator) to generate executables. The thesis presents this concept, as well as describes the language and the compiler kernel that have been developed by the author. The language is hardware-independent, i.e., developers describe tasks and dependencies between them. The compiler kernel performs automatic parallelization, i.e., it is capable of transforming a hardware-independent program into a hardware-specific program by solving hardware-specifics, in particular mapping, scheduling and synchronizations. Thus, VRE simplifies programming tasks as developers do not solve hardware-specifics manually.
Resumo:
A conceptual information system consists of a database together with conceptual hierarchies. The management system TOSCANA visualizes arbitrary combinations of conceptual hierarchies by nested line diagrams and allows an on-line interaction with a database to analyze data conceptually. The paper describes the conception of conceptual information systems and discusses the use of their visualization techniques for on-line analytical processing (OLAP).
Resumo:
Conceptual Information Systems provide a multi-dimensional conceptually structured view on data stored in relational databases. On restricting the expressiveness of the retrieval language, they allow the visualization of sets of realted queries in conceptual hierarchies, hence supporting the search of something one does not have a precise description, but only a vague idea of. Information Retrieval is considered as the process of finding specific objects (documents etc.) out of a large set of objects which fit to some description. In some data analysis and knowledge discovery applications, the dual task is of interest: The analyst needs to determine, for a subset of objects, a description for this subset. In this paper we discuss how Conceptual Information Systems can be extended to support also the second task.
Resumo:
Among many other knowledge representations formalisms, Ontologies and Formal Concept Analysis (FCA) aim at modeling ‘concepts’. We discuss how these two formalisms may complement another from an application point of view. In particular, we will see how FCA can be used to support Ontology Engineering, and how ontologies can be exploited in FCA applications. The interplay of FCA and ontologies is studied along the life cycle of an ontology: (i) FCA can support the building of the ontology as a learning technique. (ii) The established ontology can be analyzed and navigated by using techniques of FCA. (iii) Last but not least, the ontology may be used to improve an FCA application.
Resumo:
In this paper we study two orthogonal extensions of the classical data mining problem of mining association rules, and show how they naturally interact. The first is the extension from a propositional representation to datalog, and the second is the condensed representation of frequent itemsets by means of Formal Concept Analysis (FCA). We combine the notion of frequent datalog queries with iceberg concept lattices (also called closed itemsets) of FCA and introduce two kinds of iceberg query lattices as condensed representations of frequent datalog queries. We demonstrate that iceberg query lattices provide a natural way to visualize relational association rules in a non-redundant way.
Resumo:
About ten years ago, triadic contexts were presented by Lehmann and Wille as an extension of Formal Concept Analysis. However, they have rarely been used up to now, which may be due to the rather complex structure of the resulting diagrams. In this paper, we go one step back and discuss how traditional line diagrams of standard (dyadic) concept lattices can be used for exploring and navigating triadic data. Our approach is inspired by the slice & dice paradigm of On-Line-Analytical Processing (OLAP). We recall the basic ideas of OLAP, and show how they may be transferred to triadic contexts. For modeling the navigation patterns a user might follow, we use the formalisms of finite state machines. In order to present the benefits of our model, we show how it can be used for navigating the IT Baseline Protection Manual of the German Federal Office for Information Security.
Resumo:
We introduce a new mode of operation for CD-systems of restarting automata by providing explicit enable and disable conditions in the form of regular constraints. We show that, for each CD-system M of restarting automata and each mode m of operation considered by Messerschmidt and Otto, there exists a CD-system M' of restarting automata of the same type as M that, working in the new mode ed, accepts the language that M accepts in mode m. Further, we prove that in mode ed, a locally deterministic CD-system of restarting automata of type RR(W)(W) can be simulated by a locally deterministic CD-system of restarting automata of the more restricted type R(W)(W). This is the first time that a non-monotone type of R-automaton without auxiliary symbols is shown to be as expressive as the corresponding type of RR-automaton.
Resumo:
The 21st century has brought new challenges for forest management at a time when globalization in world trade is increasing and global climate change is becoming increasingly apparent. In addition to various goods and services like food, feed, timber or biofuels being provided to humans, forest ecosystems are a large store of terrestrial carbon and account for a major part of the carbon exchange between the atmosphere and the land surface. Depending on the stage of the ecosystems and/or management regimes, forests can be either sinks, or sources of carbon. At the global scale, rapid economic development and a growing world population have raised much concern over the use of natural resources, especially forest resources. The challenging question is how can the global demands for forest commodities be satisfied in an increasingly globalised economy, and where could they potentially be produced? For this purpose, wood demand estimates need to be integrated in a framework, which is able to adequately handle the competition for land between major land-use options such as residential land or agricultural land. This thesis is organised in accordance with the requirements to integrate the simulation of forest changes based on wood extraction in an existing framework for global land-use modelling called LandSHIFT. Accordingly, the following neuralgic points for research have been identified: (1) a review of existing global-scale economic forest sector models (2) simulation of global wood production under selected scenarios (3) simulation of global vegetation carbon yields and (4) the implementation of a land-use allocation procedure to simulate the impact of wood extraction on forest land-cover. Modelling the spatial dynamics of forests on the global scale requires two important inputs: (1) simulated long-term wood demand data to determine future roundwood harvests in each country and (2) the changes in the spatial distribution of woody biomass stocks to determine how much of the resource is available to satisfy the simulated wood demands. First, three global timber market models are reviewed and compared in order to select a suitable economic model to generate wood demand scenario data for the forest sector in LandSHIFT. The comparison indicates that the ‘Global Forest Products Model’ (GFPM) is most suitable for obtaining projections on future roundwood harvests for further study with the LandSHIFT forest sector. Accordingly, the GFPM is adapted and applied to simulate wood demands for the global forestry sector conditional on selected scenarios from the Millennium Ecosystem Assessment and the Global Environmental Outlook until 2050. Secondly, the Lund-Potsdam-Jena (LPJ) dynamic global vegetation model is utilized to simulate the change in potential vegetation carbon stocks for the forested locations in LandSHIFT. The LPJ data is used in collaboration with spatially explicit forest inventory data on aboveground biomass to allocate the demands for raw forest products and identify locations of deforestation. Using the previous results as an input, a methodology to simulate the spatial dynamics of forests based on wood extraction is developed within the LandSHIFT framework. The land-use allocation procedure specified in the module translates the country level demands for forest products into woody biomass requirements for forest areas, and allocates these on a five arc minute grid. In a first version, the model assumes only actual conditions through the entire study period and does not explicitly address forest age structure. Although the module is in a very preliminary stage of development, it already captures the effects of important drivers of land-use change like cropland and urban expansion. As a first plausibility test, the module performance is tested under three forest management scenarios. The module succeeds in responding to changing inputs in an expected and consistent manner. The entire methodology is applied in an exemplary scenario analysis for India. A couple of future research priorities need to be addressed, particularly the incorporation of plantation establishments; issue of age structure dynamics; as well as the implementation of a new technology change factor in the GFPM which can allow the specification of substituting raw wood products (especially fuelwood) by other non-wood products.
Resumo:
We study cooperating distributed systems (CD-systems) of restarting automata that are very restricted: they are deterministic, they cannot rewrite, but only delete symbols, they restart immediately after performing a delete operation, they are stateless, and they have a read/write window of size 1 only, that is, these are stateless deterministic R(1)-automata. We study the expressive power of these systems by relating the class of languages that they accept by mode =1 computations to other well-studied language classes, showing in particular that this class only contains semi-linear languages, and that it includes all rational trace languages. In addition, we investigate the closure and non-closure properties of this class of languages and some of its algorithmic properties.