934 resultados para Computer Science(all)
Resumo:
The development of correct programs is a core problem in computer science. Although formal verification methods for establishing correctness with mathematical rigor are available, programmers often find these difficult to put into practice. One hurdle is deriving the loop invariants and proving that the code maintains them. So called correct-by-construction methods aim to alleviate this issue by integrating verification into the programming workflow. Invariant-based programming is a practical correct-by-construction method in which the programmer first establishes the invariant structure, and then incrementally extends the program in steps of adding code and proving after each addition that the code is consistent with the invariants. In this way, the program is kept internally consistent throughout its development, and the construction of the correctness arguments (proofs) becomes an integral part of the programming workflow. A characteristic of the approach is that programs are described as invariant diagrams, a graphical notation similar to the state charts familiar to programmers. Invariant-based programming is a new method that has not been evaluated in large scale studies yet. The most important prerequisite for feasibility on a larger scale is a high degree of automation. The goal of the Socos project has been to build tools to assist the construction and verification of programs using the method. This thesis describes the implementation and evaluation of a prototype tool in the context of the Socos project. The tool supports the drawing of the diagrams, automatic derivation and discharging of verification conditions, and interactive proofs. It is used to develop programs that are correct by construction. The tool consists of a diagrammatic environment connected to a verification condition generator and an existing state-of-the-art theorem prover. Its core is a semantics for translating diagrams into verification conditions, which are sent to the underlying theorem prover. We describe a concrete method for 1) deriving sufficient conditions for total correctness of an invariant diagram; 2) sending the conditions to the theorem prover for simplification; and 3) reporting the results of the simplification to the programmer in a way that is consistent with the invariantbased programming workflow and that allows errors in the program specification to be efficiently detected. The tool uses an efficient automatic proof strategy to prove as many conditions as possible automatically and lets the remaining conditions be proved interactively. The tool is based on the verification system PVS and i uses the SMT (Satisfiability Modulo Theories) solver Yices as a catch-all decision procedure. Conditions that were not discharged automatically may be proved interactively using the PVS proof assistant. The programming workflow is very similar to the process by which a mathematical theory is developed inside a computer supported theorem prover environment such as PVS. The programmer reduces a large verification problem with the aid of the tool into a set of smaller problems (lemmas), and he can substantially improve the degree of proof automation by developing specialized background theories and proof strategies to support the specification and verification of a specific class of programs. We demonstrate this workflow by describing in detail the construction of a verified sorting algorithm. Tool-supported verification often has little to no presence in computer science (CS) curricula. Furthermore, program verification is frequently introduced as an advanced and purely theoretical topic that is not connected to the workflow taught in the early and practically oriented programming courses. Our hypothesis is that verification could be introduced early in the CS education, and that verification tools could be used in the classroom to support the teaching of formal methods. A prototype of Socos has been used in a course at Åbo Akademi University targeted at first and second year undergraduate students. We evaluate the use of Socos in the course as part of a case study carried out in 2007.
Resumo:
Tämä tutkielma kuuluu merkkijonoalgoritmiikan piiriin. Merkkijono S on merkkijonojen X[1..m] ja Y[1..n] yhteinen alijono, mikäli se voidaan muodostaa poistamalla X:stä 0..m ja Y:stä 0..n kappaletta merkkejä mielivaltaisista paikoista. Jos yksikään X:n ja Y:n yhteinen alijono ei ole S:ää pidempi, sanotaan, että S on X:n ja Y:n pisin yhteinen alijono (lyh. PYA). Tässä työssä keskitytään kahden merkkijonon PYAn ratkaisemiseen, mutta ongelma on yleistettävissä myös useammalle jonolle. PYA-ongelmalle on sovelluskohteita – paitsi tietojenkäsittelytieteen niin myös bioinformatiikan osa-alueilla. Tunnetuimpia niistä ovat tekstin ja kuvien tiivistäminen, tiedostojen versionhallinta, hahmontunnistus sekä DNA- ja proteiiniketjujen rakennetta vertaileva tutkimus. Ongelman ratkaisemisen tekee hankalaksi ratkaisualgoritmien riippuvuus syötejonojen useista eri parametreista. Näitä ovat syötejonojen pituuden lisäksi mm. syöttöaakkoston koko, syötteiden merkkijakauma, PYAn suhteellinen osuus lyhyemmän syötejonon pituudesta ja täsmäävien merkkiparien lukumäärä. Täten on vaikeaa kehittää algoritmia, joka toimisi tehokkaasti kaikille ongelman esiintymille. Tutkielman on määrä toimia yhtäältä käsikirjana, jossa esitellään ongelman peruskäsitteiden kuvauksen jälkeen jo aikaisemmin kehitettyjä tarkkoja PYAalgoritmeja. Niiden tarkastelu on ryhmitelty algoritmin toimintamallin mukaan joko rivi, korkeuskäyrä tai diagonaali kerrallaan sekä monisuuntaisesti prosessoiviin. Tarkkojen menetelmien lisäksi esitellään PYAn pituuden ylä- tai alarajan laskevia heuristisia menetelmiä, joiden laskemia tuloksia voidaan hyödyntää joko sellaisinaan tai ohjaamaan tarkan algoritmin suoritusta. Tämä osuus perustuu tutkimusryhmämme julkaisemiin artikkeleihin. Niissä käsitellään ensimmäistä kertaa heuristiikoilla tehostettuja tarkkoja menetelmiä. Toisaalta työ sisältää laajahkon empiirisen tutkimusosuuden, jonka tavoitteena on ollut tehostaa olemassa olevien tarkkojen algoritmien ajoaikaa ja muistinkäyttöä. Kyseiseen tavoitteeseen on pyritty ohjelmointiteknisesti esittelemällä algoritmien toimintamallia hyvin tukevia tietorakenteita ja rajoittamalla algoritmien suorittamaa tuloksetonta laskentaa parantamalla niiden kykyä havainnoida suorituksen aikana saavutettuja välituloksia ja hyödyntää niitä. Tutkielman johtopäätöksinä voidaan yleisesti todeta tarkkojen PYA-algoritmien heuristisen esiprosessoinnin lähes systemaattisesti pienentävän niiden suoritusaikaa ja erityisesti muistintarvetta. Lisäksi algoritmin käyttämällä tietorakenteella on ratkaiseva vaikutus laskennan tehokkuuteen: mitä paikallisempia haku- ja päivitysoperaatiot ovat, sitä tehokkaampaa algoritmin suorittama laskenta on.
Resumo:
The ongoing global financial crisis has demonstrated the importance of a systemwide, or macroprudential, approach to safeguarding financial stability. An essential part of macroprudential oversight concerns the tasks of early identification and assessment of risks and vulnerabilities that eventually may lead to a systemic financial crisis. Thriving tools are crucial as they allow early policy actions to decrease or prevent further build-up of risks or to otherwise enhance the shock absorption capacity of the financial system. In the literature, three types of systemic risk can be identified: i ) build-up of widespread imbalances, ii ) exogenous aggregate shocks, and iii ) contagion. Accordingly, the systemic risks are matched by three categories of analytical methods for decision support: i ) early-warning, ii ) macro stress-testing, and iii ) contagion models. Stimulated by the prolonged global financial crisis, today's toolbox of analytical methods includes a wide range of innovative solutions to the two tasks of risk identification and risk assessment. Yet, the literature lacks a focus on the task of risk communication. This thesis discusses macroprudential oversight from the viewpoint of all three tasks: Within analytical tools for risk identification and risk assessment, the focus concerns a tight integration of means for risk communication. Data and dimension reduction methods, and their combinations, hold promise for representing multivariate data structures in easily understandable formats. The overall task of this thesis is to represent high-dimensional data concerning financial entities on lowdimensional displays. The low-dimensional representations have two subtasks: i ) to function as a display for individual data concerning entities and their time series, and ii ) to use the display as a basis to which additional information can be linked. The final nuance of the task is, however, set by the needs of the domain, data and methods. The following ve questions comprise subsequent steps addressed in the process of this thesis: 1. What are the needs for macroprudential oversight? 2. What form do macroprudential data take? 3. Which data and dimension reduction methods hold most promise for the task? 4. How should the methods be extended and enhanced for the task? 5. How should the methods and their extensions be applied to the task? Based upon the Self-Organizing Map (SOM), this thesis not only creates the Self-Organizing Financial Stability Map (SOFSM), but also lays out a general framework for mapping the state of financial stability. This thesis also introduces three extensions to the standard SOM for enhancing the visualization and extraction of information: i ) fuzzifications, ii ) transition probabilities, and iii ) network analysis. Thus, the SOFSM functions as a display for risk identification, on top of which risk assessments can be illustrated. In addition, this thesis puts forward the Self-Organizing Time Map (SOTM) to provide means for visual dynamic clustering, which in the context of macroprudential oversight concerns the identification of cross-sectional changes in risks and vulnerabilities over time. Rather than automated analysis, the aim of visual means for identifying and assessing risks is to support disciplined and structured judgmental analysis based upon policymakers' experience and domain intelligence, as well as external risk communication.
Resumo:
Video transcoding refers to the process of converting a digital video from one format into another format. It is a compute-intensive operation. Therefore, transcoding of a large number of simultaneous video streams requires a large amount of computing resources. Moreover, to handle di erent load conditions in a cost-e cient manner, the video transcoding service should be dynamically scalable. Infrastructure as a Service Clouds currently offer computing resources, such as virtual machines, under the pay-per-use business model. Thus the IaaS Clouds can be leveraged to provide a coste cient, dynamically scalable video transcoding service. To use computing resources e ciently in a cloud computing environment, cost-e cient virtual machine provisioning is required to avoid overutilization and under-utilization of virtual machines. This thesis presents proactive virtual machine resource allocation and de-allocation algorithms for video transcoding in cloud computing. Since users' requests for videos may change at di erent times, a check is required to see if the current computing resources are adequate for the video requests. Therefore, the work on admission control is also provided. In addition to admission control, temporal resolution reduction is used to avoid jitters in a video. Furthermore, in a cloud computing environment such as Amazon EC2, the computing resources are more expensive as compared with the storage resources. Therefore, to avoid repetition of transcoding operations, a transcoded video needs to be stored for a certain time. To store all videos for the same amount of time is also not cost-e cient because popular transcoded videos have high access rate while unpopular transcoded videos are rarely accessed. This thesis provides a cost-e cient computation and storage trade-o strategy, which stores videos in the video repository as long as it is cost-e cient to store them. This thesis also proposes video segmentation strategies for bit rate reduction and spatial resolution reduction video transcoding. The evaluation of proposed strategies is performed using a message passing interface based video transcoder, which uses a coarse-grain parallel processing approach where video is segmented at group of pictures level.
Resumo:
End-user development is a very common but often largely overlooked phenomenon in information systems research and practice. End-user development means that regular people, the end-users of software, and not professional developers are doing software development. A large number of people are directly or indirectly impacted by the results of these non-professional development activities. The numbers of users performing end-user development activities are difficult to ascertain precisely. But it is very large, and still growing. Computer adoption is growing towards 100% and many new types of computational devices are continually introduced. In addition, other devices not previously programmable are becoming so. This means that, at this very moment, hundreds of millions of people are likely struggling with development problems. Furthermore, software itself is continually being adapted for more flexibility, enabling users to change the behaviour of their software themselves. New software and services are helping to transform users from consumers to producers. Much of this is now found on-line. The problem for the end-user developer is that little of this development is supported by anyone. Often organisations do not notice end-user development and consequently neither provide support for it, nor are equipped to be able to do so. Many end-user developers do not belong to any organisation at all. Also, the end-user development process may be aggravating the problem. End-users are usually not really committed to the development process, which tends to be more iterative and ad hoc. This means support becomes a distant third behind getting the job done and figuring out the development issues to get the job done. Sometimes the software itself may exacerbate the issue by simplifying the development process, deemphasising the difficulty of the task being undertaken. On-line support could be the lifeline the end-user developer needs. Going online one can find all the knowledge one could ever need. However, that does still not help the end-user apply this information or knowledge in practice. A virtual community, through its ability to adopt the end-user’s specific context, could surmount this final obstacle. This thesis explores the concept of end-user development and how it could be supported through on-line sources, in particular virtual communities, which it is argued here, seem to fit the end-user developer’s needs very well. The experiences of real end-user developers and prior literature were used in this process. Emphasis has been on those end-user developers, e.g. small business owners, who may have literally nowhere to turn to for support. Adopting the viewpoint of the end-user developer, the thesis examines the question of how an end-user could use a virtual community effectively, improving the results of the support process. Assuming the common situation where the demand for support outstrips the supply.
Resumo:
A web service is a software system that provides a machine-processable interface to the other machines over the network using different Internet protocols. They are being increasingly used in the industry in order to automate different tasks and offer services to a wider audience. The REST architectural style aims at producing scalable and extensible web services using technologies that play well with the existing tools and infrastructure of the web. It provides a uniform set of operation that can be used to invoke a CRUD interface (create, retrieve, update and delete) of a web service. The stateless behavior of the service interface requires that every request to a resource is independent of the previous ones facilitating scalability. Automated systems, e.g., hotel reservation systems, provide advanced scenarios for stateful services that require a certain sequence of requests that must be followed in order to fulfill the service goals. Designing and developing such services for advanced scenarios with REST constraints require rigorous approaches that are capable of creating web services that can be trusted for their behavior. Systems that can be trusted for their behavior can be termed as dependable systems. This thesis presents an integrated design, analysis and validation approach that facilitates the service developer to create dependable and stateful REST web services. The main contribution of this thesis is that we provide a novel model-driven methodology to design behavioral REST web service interfaces and their compositions. The behavioral interfaces provide information on what methods can be invoked on a service and the pre- and post-conditions of these methods. The methodology uses Unified Modeling Language (UML), as the modeling language, which has a wide user base and has mature tools that are continuously evolving. We have used UML class diagram and UML state machine diagram with additional design constraints to provide resource and behavioral models, respectively, for designing REST web service interfaces. These service design models serve as a specification document and the information presented in them have manifold applications. The service design models also contain information about the time and domain requirements of the service that can help in requirement traceability which is an important part of our approach. Requirement traceability helps in capturing faults in the design models and other elements of software development environment by tracing back and forth the unfulfilled requirements of the service. The information about service actors is also included in the design models which is required for authenticating the service requests by authorized actors since not all types of users have access to all the resources. In addition, following our design approach, the service developer can ensure that the designed web service interfaces will be REST compliant. The second contribution of this thesis is consistency analysis of the behavioral REST interfaces. To overcome the inconsistency problem and design errors in our service models, we have used semantic technologies. The REST interfaces are represented in web ontology language, OWL2, that can be part of the semantic web. These interfaces are used with OWL 2 reasoners to check unsatisfiable concepts which result in implementations that fail. This work is fully automated thanks to the implemented translation tool and the existing OWL 2 reasoners. The third contribution of this thesis is the verification and validation of REST web services. We have used model checking techniques with UPPAAL model checker for this purpose. The timed automata of UML based service design models are generated with our transformation tool that are verified for their basic characteristics like deadlock freedom, liveness, reachability and safety. The implementation of a web service is tested using a black-box testing approach. Test cases are generated from the UPPAAL timed automata and using the online testing tool, UPPAAL TRON, the service implementation is validated at runtime against its specifications. Requirement traceability is also addressed in our validation approach with which we can see what service goals are met and trace back the unfulfilled service goals to detect the faults in the design models. A final contribution of the thesis is an implementation of behavioral REST interfaces and service monitors from the service design models. The partial code generation tool creates code skeletons of REST web services with method pre and post-conditions. The preconditions of methods constrain the user to invoke the stateful REST service under the right conditions and the post condition constraint the service developer to implement the right functionality. The details of the methods can be manually inserted by the developer as required. We do not target complete automation because we focus only on the interface aspects of the web service. The applicability of the approach is demonstrated with a pedagogical example of a hotel room booking service and a relatively complex worked example of holiday booking service taken from the industrial context. The former example presents a simple explanation of the approach and the later worked example shows how stateful and timed web services offering complex scenarios and involving other web services can be constructed using our approach.
Resumo:
The advancement of science and technology makes it clear that no single perspective is any longer sufficient to describe the true nature of any phenomenon. That is why the interdisciplinary research is gaining more attention overtime. An excellent example of this type of research is natural computing which stands on the borderline between biology and computer science. The contribution of research done in natural computing is twofold: on one hand, it sheds light into how nature works and how it processes information and, on the other hand, it provides some guidelines on how to design bio-inspired technologies. The first direction in this thesis focuses on a nature-inspired process called gene assembly in ciliates. The second one studies reaction systems, as a modeling framework with its rationale built upon the biochemical interactions happening within a cell. The process of gene assembly in ciliates has attracted a lot of attention as a research topic in the past 15 years. Two main modelling frameworks have been initially proposed in the end of 1990s to capture ciliates’ gene assembly process, namely the intermolecular model and the intramolecular model. They were followed by other model proposals such as templatebased assembly and DNA rearrangement pathways recombination models. In this thesis we are interested in a variation of the intramolecular model called simple gene assembly model, which focuses on the simplest possible folds in the assembly process. We propose a new framework called directed overlap-inclusion (DOI) graphs to overcome the limitations that previously introduced models faced in capturing all the combinatorial details of the simple gene assembly process. We investigate a number of combinatorial properties of these graphs, including a necessary property in terms of forbidden induced subgraphs. We also introduce DOI graph-based rewriting rules that capture all the operations of the simple gene assembly model and prove that they are equivalent to the string-based formalization of the model. Reaction systems (RS) is another nature-inspired modeling framework that is studied in this thesis. Reaction systems’ rationale is based upon two main regulation mechanisms, facilitation and inhibition, which control the interactions between biochemical reactions. Reaction systems is a complementary modeling framework to traditional quantitative frameworks, focusing on explicit cause-effect relationships between reactions. The explicit formulation of facilitation and inhibition mechanisms behind reactions, as well as the focus on interactions between reactions (rather than dynamics of concentrations) makes their applicability potentially wide and useful beyond biological case studies. In this thesis, we construct a reaction system model corresponding to the heat shock response mechanism based on a novel concept of dominance graph that captures the competition on resources in the ODE model. We also introduce for RS various concepts inspired by biology, e.g., mass conservation, steady state, periodicity, etc., to do model checking of the reaction systems based models. We prove that the complexity of the decision problems related to these properties varies from P to NP- and coNP-complete to PSPACE-complete. We further focus on the mass conservation relation in an RS and introduce the conservation dependency graph to capture the relation between the species and also propose an algorithm to list the conserved sets of a given reaction system.
Resumo:
Subshifts are sets of configurations over an infinite grid defined by a set of forbidden patterns. In this thesis, we study two-dimensional subshifts offinite type (2D SFTs), where the underlying grid is Z2 and the set of for-bidden patterns is finite. We are mainly interested in the interplay between the computational power of 2D SFTs and their geometry, examined through the concept of expansive subdynamics. 2D SFTs with expansive directions form an interesting and natural class of subshifts that lie between dimensions 1 and 2. An SFT that has only one non-expansive direction is called extremely expansive. We prove that in many aspects, extremely expansive 2D SFTs display the totality of behaviours of general 2D SFTs. For example, we construct an aperiodic extremely expansive 2D SFT and we prove that the emptiness problem is undecidable even when restricted to the class of extremely expansive 2D SFTs. We also prove that every Medvedev class contains an extremely expansive 2D SFT and we provide a characterization of the sets of directions that can be the set of non-expansive directions of a 2D SFT. Finally, we prove that for every computable sequence of 2D SFTs with an expansive direction, there exists a universal object that simulates all of the elements of the sequence. We use the so called hierarchical, self-simulating or fixed-point method for constructing 2D SFTs which has been previously used by Ga´cs, Durand, Romashchenko and Shen.
Resumo:
This thesis will introduce a new strongly typed programming language utilizing Self types, named Win--*Foy, along with a suitable user interface designed specifically to highlight language features. The need for such a programming language is based on deficiencies found in programming languages that support both Self types and subtyping. Subtyping is a concept that is taken for granted by most software engineers programming in object-oriented languages. Subtyping supports subsumption but it does not support the inheritance of binary methods. Binary methods contain an argument of type Self, the same type as the object itself, in a contravariant position, i.e. as a parameter. There are several arguments in favour of introducing Self types into a programming language (11. This rationale led to the development of a relation that has become known as matching [4, 5). The matching relation does not support subsumption, however, it does support the inheritance of binary methods. Two forms of matching have been proposed (lJ. Specifically, these relations are known as higher-order matching and I-bound matching. Previous research on these relations indicates that the higher-order matching relation is both reflexive and transitive whereas the f-bound matching is reflexive but not transitive (7]. The higher-order matching relation provides significant flexibility regarding inheritance of methods that utilize or return values of the same type. This flexibility, in certain situations, can restrict the programmer from defining specific classes and methods which are based on constant values [21J. For this reason, the type This is used as a second reference to the type of the object that cannot, contrary to Self, be specialized in subclasses. F-bound matching allows a programmer to define a function that will work for all types of A', a subtype of an upper bound function of type A, with the result type being dependent on A'. The use of parametric polymorphism in f-bound matching provides a connection to subtyping in object-oriented languages. This thesis will contain two main sections. Firstly, significant details concerning deficiencies of the subtype relation and the need to introduce higher-order and f-bound matching relations into programming languages will be explored. Secondly, a new programming language named Win--*Foy Functional Object-Oriented Programming Language has been created, along with a suitable user interface, in order to facilitate experimentation by programmers regarding the matching relation. The construction of the programming language and the user interface will be explained in detail.
Resumo:
Bioinformatics applies computers to problems in molecular biology. Previous research has not addressed edit metric decoders. Decoders for quaternary edit metric codes are finding use in bioinformatics problems with applications to DNA. By using side effect machines we hope to be able to provide efficient decoding algorithms for this open problem. Two ideas for decoding algorithms are presented and examined. Both decoders use Side Effect Machines(SEMs) which are generalizations of finite state automata. Single Classifier Machines(SCMs) use a single side effect machine to classify all words within a code. Locking Side Effect Machines(LSEMs) use multiple side effect machines to create a tree structure of subclassification. The goal is to examine these techniques and provide new decoders for existing codes. Presented are ideas for best practices for the creation of these two types of new edit metric decoders.
Resumo:
The hyper-star interconnection network was proposed in 2002 to overcome the drawbacks of the hypercube and its variations concerning the network cost, which is defined by the product of the degree and the diameter. Some properties of the graph such as connectivity, symmetry properties, embedding properties have been studied by other researchers, routing and broadcasting algorithms have also been designed. This thesis studies the hyper-star graph from both the topological and algorithmic point of view. For the topological properties, we try to establish relationships between hyper-star graphs with other known graphs. We also give a formal equation for the surface area of the graph. Another topological property we are interested in is the Hamiltonicity problem of this graph. For the algorithms, we design an all-port broadcasting algorithm and a single-port neighbourhood broadcasting algorithm for the regular form of the hyper-star graphs. These algorithms are both optimal time-wise. Furthermore, we prove that the folded hyper-star, a variation of the hyper-star, to be maixmally fault-tolerant.
Resumo:
Relation algebras and categories of relations in particular have proven to be extremely useful as a fundamental tool in mathematics and computer science. Since relation algebras are Boolean algebras with some well-behaved operations, every such algebra provides an atom structure, i.e., a relational structure on its set of atoms. In the case of complete and atomic structure (e.g. finite algebras), the original algebra can be recovered from its atom structure by using the complex algebra construction. This gives a representation of relation algebras as the complex algebra of a certain relational structure. This property is of particular interest because storing the atom structure requires less space than the entire algebra. In this thesis I want to introduce and implement three structures representing atom structures of integral heterogeneous relation algebras, i.e., categorical versions of relation algebras. The first structure will simply embed a homogeneous atom structure of a relation algebra into the heterogeneous context. The second structure is obtained by splitting all symmetric idempotent relations. This new algebra is in almost all cases an heterogeneous structure having more objects than the original one. Finally, I will define two different union operations to combine two algebras into a single one.
Resumo:
This research focuses on generating aesthetically pleasing images in virtual environments using the particle swarm optimization (PSO) algorithm. The PSO is a stochastic population based search algorithm that is inspired by the flocking behavior of birds. In this research, we implement swarms of cameras flying through a virtual world in search of an image that is aesthetically pleasing. Virtual world exploration using particle swarm optimization is considered to be a new research area and is of interest to both the scientific and artistic communities. Aesthetic rules such as rule of thirds, subject matter, colour similarity and horizon line are all analyzed together as a multi-objective problem to analyze and solve with rendered images. A new multi-objective PSO algorithm, the sum of ranks PSO, is introduced. It is empirically compared to other single-objective and multi-objective swarm algorithms. An advantage of the sum of ranks PSO is that it is useful for solving high-dimensional problems within the context of this research. Throughout many experiments, we show that our approach is capable of automatically producing images satisfying a variety of supplied aesthetic criteria.
Resumo:
Complex networks can arise naturally and spontaneously from all things that act as a part of a larger system. From the patterns of socialization between people to the way biological systems organize themselves, complex networks are ubiquitous, but are currently poorly understood. A number of algorithms, designed by humans, have been proposed to describe the organizational behaviour of real-world networks. Consequently, breakthroughs in genetics, medicine, epidemiology, neuroscience, telecommunications and the social sciences have recently resulted. The algorithms, called graph models, represent significant human effort. Deriving accurate graph models is non-trivial, time-intensive, challenging and may only yield useful results for very specific phenomena. An automated approach can greatly reduce the human effort required and if effective, provide a valuable tool for understanding the large decentralized systems of interrelated things around us. To the best of the author's knowledge this thesis proposes the first method for the automatic inference of graph models for complex networks with varied properties, with and without community structure. Furthermore, to the best of the author's knowledge it is the first application of genetic programming for the automatic inference of graph models. The system and methodology was tested against benchmark data, and was shown to be capable of reproducing close approximations to well-known algorithms designed by humans. Furthermore, when used to infer a model for real biological data the resulting model was more representative than models currently used in the literature.
Resumo:
Complex networks have recently attracted a significant amount of research attention due to their ability to model real world phenomena. One important problem often encountered is to limit diffusive processes spread over the network, for example mitigating pandemic disease or computer virus spread. A number of problem formulations have been proposed that aim to solve such problems based on desired network characteristics, such as maintaining the largest network component after node removal. The recently formulated critical node detection problem aims to remove a small subset of vertices from the network such that the residual network has minimum pairwise connectivity. Unfortunately, the problem is NP-hard and also the number of constraints is cubic in number of vertices, making very large scale problems impossible to solve with traditional mathematical programming techniques. Even many approximation algorithm strategies such as dynamic programming, evolutionary algorithms, etc. all are unusable for networks that contain thousands to millions of vertices. A computationally efficient and simple approach is required in such circumstances, but none currently exist. In this thesis, such an algorithm is proposed. The methodology is based on a depth-first search traversal of the network, and a specially designed ranking function that considers information local to each vertex. Due to the variety of network structures, a number of characteristics must be taken into consideration and combined into a single rank that measures the utility of removing each vertex. Since removing a vertex in sequential fashion impacts the network structure, an efficient post-processing algorithm is also proposed to quickly re-rank vertices. Experiments on a range of common complex network models with varying number of vertices are considered, in addition to real world networks. The proposed algorithm, DFSH, is shown to be highly competitive and often outperforms existing strategies such as Google PageRank for minimizing pairwise connectivity.