921 results for Processing wikipedia data


Relevance: 30.00%

Abstract:

About ten years ago, triadic contexts were presented by Lehmann and Wille as an extension of Formal Concept Analysis. However, they have rarely been used up to now, which may be due to the rather complex structure of the resulting diagrams. In this paper, we go one step back and discuss how traditional line diagrams of standard (dyadic) concept lattices can be used for exploring and navigating triadic data. Our approach is inspired by the slice & dice paradigm of Online Analytical Processing (OLAP). We recall the basic ideas of OLAP and show how they may be transferred to triadic contexts. To model the navigation patterns a user might follow, we use the formalism of finite state machines. To present the benefits of our model, we show how it can be used for navigating the IT Baseline Protection Manual of the German Federal Office for Information Security.
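
The "slice" step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a triadic context stored as a set of (object, attribute, condition) triples; fixing one condition yields an ordinary dyadic context on which the usual derivation operator works. The function names and the toy data are hypothetical and are not taken from the paper, whose navigation model (finite state machines) is not reproduced here.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (object, attribute, condition)
Pair = Tuple[str, str]         # (object, attribute)


def slice_by_condition(triples: Set[Triple], condition: str) -> Set[Pair]:
    """Fix one condition and keep the (object, attribute) pairs that hold under it."""
    return {(g, m) for (g, m, b) in triples if b == condition}


def extent(context: Set[Pair], attributes: Set[str]) -> Set[str]:
    """Objects of the dyadic context that have every attribute in `attributes`."""
    objects = {g for (g, _) in context}
    return {g for g in objects if all((g, m) in context for m in attributes)}


# Hypothetical toy data: which safeguard applies to which asset in which year.
triples = {
    ("server", "firewall", "2004"),
    ("server", "backup", "2004"),
    ("laptop", "firewall", "2004"),
    ("laptop", "backup", "2005"),
}

ctx_2004 = slice_by_condition(triples, "2004")
shared = extent(ctx_2004, {"firewall", "backup"})
```

Each slice is a standard dyadic context, so a conventional concept lattice and line diagram can be drawn for it.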

Relevance: 30.00%

Abstract:

The growing interconnection of information and communication systems further increases their complexity and, with it, the number of security vulnerabilities. Classical protection mechanisms such as firewall systems and anti-malware solutions have long ceased to offer protection against intrusion attempts into IT infrastructures. Intrusion detection systems (IDS) have established themselves as a very effective instrument for protection against cyber attacks. Such systems collect and analyse information from network components and hosts in order to detect unusual behaviour and security violations automatically. While signature-based approaches can only detect attack patterns that are already known, anomaly-based IDS can also detect new, previously unknown attacks (zero-day attacks) at an early stage. The core problem of intrusion detection systems, however, lies in the optimal processing of the massive volume of network data and in the development of an adaptive detection model that works in real time. To address these challenges, this dissertation provides a framework consisting of two main parts. The first part, called OptiFilter, uses a dynamic "queuing concept" to process the large volume of incoming network data, continuously constructs network connections, and exports structured input data for the IDS. The second part is an adaptive classifier comprising a classifier model based on the "Enhanced Growing Hierarchical Self Organizing Map" (EGHSOM), a model of the normal network state (NNB), and an "update model". In OptiFilter, Tcpdump and SNMP traps are used to continuously aggregate network packets and host events. These aggregated network packets and host events are then analysed further and converted into connection vectors.
To improve the detection rate of the adaptive classifier, the artificial neural network GHSOM is studied intensively and substantially extended. Several approaches are proposed and discussed in this dissertation. A classification-confidence margin threshold is defined to uncover unknown malicious connections; the stability of the growth topology is increased through novel approaches to initialising the weight vectors and by reinforcing the winner neurons; and a self-adaptive procedure is introduced so that the model can be updated continuously. In addition, the main task of the NNB model is to examine further the unknown connections detected by the EGHSOM and to check whether they are normal. However, network traffic data change constantly because of the concept drift phenomenon, which produces non-stationary network data in real time. The update model keeps this phenomenon under better control. The EGHSOM model can detect new anomalies effectively, and the NNB model adapts optimally to changes in the network data. In the experimental investigations the framework showed promising results. In the first experiment the framework was evaluated in offline mode. OptiFilter was evaluated with offline, synthetic and realistic data. The adaptive classifier was evaluated with 10-fold cross-validation to estimate its accuracy. In the second experiment the framework was installed on a 1 to 10 GB network link and evaluated online in real time. OptiFilter successfully converted the massive volume of network data into structured connection vectors, and the adaptive classifier classified them precisely.
The comparative study between the developed framework and other well-known IDS approaches shows that the proposed IDS framework outperforms all other approaches. This can be attributed to the following key points: processing of the collected network data, achieving the best performance (such as overall accuracy), detection of unknown connections, and development of a detection model for intrusion attempts that works in real time.

Relevance: 30.00%

Abstract:

Modeling and predicting co-occurrences of events is a fundamental problem of unsupervised learning. In this contribution we develop a statistical framework for analyzing co-occurrence data in a general setting where elementary observations are joint occurrences of pairs of abstract objects from two finite sets. The main challenge for statistical models in this context is to overcome the inherent data sparseness and to estimate the probabilities for pairs which were rarely observed or even unobserved in a given sample set. Moreover, it is often of considerable interest to extract grouping structure or to find a hierarchical data organization. A novel family of mixture models is proposed which explain the observed data by a finite number of shared aspects or clusters. This provides a common framework for statistical inference and structure discovery and also includes several recently proposed models as special cases. Adopting the maximum likelihood principle, EM algorithms are derived to fit the model parameters. We develop improved versions of EM which largely avoid overfitting problems and overcome the inherent locality of EM-based optimization. Among the broad variety of possible applications, e.g., in information retrieval, natural language processing, data mining, and computer vision, we have chosen document retrieval, the statistical analysis of noun/adjective co-occurrence and the unsupervised segmentation of textured images to test and evaluate the proposed algorithms.
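
As an illustration of fitting a mixture model to co-occurrence counts by maximum likelihood, the following is a minimal sketch of plain EM for a symmetric latent-class model, P(x, y) = sum over a of P(a) P(x|a) P(y|a). It is an assumption that this parameterization corresponds to one member of the family described in the abstract. The improved EM variants the abstract mentions are not implemented here, and the function name and toy counts are hypothetical.

```python
import math
import random


def fit_aspect_model(counts, n_aspects, n_iter=50, seed=0):
    """Plain EM for P(x, y) = sum_a P(a) P(x|a) P(y|a).

    counts maps (x, y) index pairs to observed co-occurrence counts.
    Returns P(a), P(x|a), P(y|a) and the log-likelihood per iteration.
    """
    rng = random.Random(seed)
    n_x = 1 + max(x for x, _ in counts)
    n_y = 1 + max(y for _, y in counts)

    def random_dist(n):
        weights = [rng.random() + 0.1 for _ in range(n)]
        total = sum(weights)
        return [w / total for w in weights]

    p_a = random_dist(n_aspects)
    p_x = [random_dist(n_x) for _ in range(n_aspects)]
    p_y = [random_dist(n_y) for _ in range(n_aspects)]
    history = []

    for _ in range(n_iter):
        new_a = [0.0] * n_aspects
        new_x = [[0.0] * n_x for _ in range(n_aspects)]
        new_y = [[0.0] * n_y for _ in range(n_aspects)]
        log_lik = 0.0
        for (x, y), n in counts.items():
            # E-step: posterior responsibility of each aspect for the pair (x, y).
            joint = [p_a[a] * p_x[a][x] * p_y[a][y] for a in range(n_aspects)]
            total = sum(joint)
            log_lik += n * math.log(total)
            for a in range(n_aspects):
                r = n * joint[a] / total
                new_a[a] += r
                new_x[a][x] += r
                new_y[a][y] += r
        history.append(log_lik)
        # M-step: renormalise the expected counts into probabilities.
        grand = sum(new_a)
        p_a = [v / grand for v in new_a]
        p_x = [[v / new_a[a] for v in new_x[a]] for a in range(n_aspects)]
        p_y = [[v / new_a[a] for v in new_y[a]] for a in range(n_aspects)]
    return p_a, p_x, p_y, history


# Hypothetical toy counts with two loose groups of co-occurring items.
toy_counts = {
    (0, 0): 5, (0, 1): 4, (1, 0): 4, (1, 1): 5,
    (2, 2): 6, (2, 3): 5, (3, 2): 5, (3, 3): 6,
    (1, 2): 1,
}
p_a, p_x, p_y, history = fit_aspect_model(toy_counts, n_aspects=2)
```

Because the fitted joint distribution is a product of smoothed factors, it assigns non-zero probability to pairs that were never observed, which is the sparseness issue the abstract raises.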

Relevance: 30.00%

Abstract:

Information sheet for mature students participating in the Emotional processing study. Please read before you attend the data collection session you have scheduled. Many thanks.

Relevance: 30.00%

Abstract:

Slides describing streaming data, data stream processing systems and stream reasoning, along with some description of CSPARQL.

Relevance: 30.00%

Abstract:

Title: Data-Driven Text Generation using Neural Networks
Speaker: Pavlos Vougiouklis, University of Southampton
Abstract: Recent work on neural networks shows their great potential for tackling a wide variety of Natural Language Processing (NLP) tasks. This talk will focus on the Natural Language Generation (NLG) problem and, more specifically, on the extent to which neural network language models could be employed for context-sensitive and data-driven text generation. In addition, a neural network architecture for response generation in social media will be discussed, along with the training methods that enable it to capture contextual information and effectively participate in public conversations.
Speaker Bio: Pavlos Vougiouklis obtained his 5-year Diploma in Electrical and Computer Engineering from the Aristotle University of Thessaloniki in 2013. He was awarded an MSc degree in Software Engineering from the University of Southampton in 2014. In 2015, he joined the Web and Internet Science (WAIS) research group of the University of Southampton, where he is currently working towards a PhD in the field of Neural Network Approaches for Natural Language Processing.

Title: Provenance is Complicated and Boring: Is there a solution?
Speaker: Darren Richardson, University of Southampton
Abstract: Paper trails, auditing, and accountability: arguably not the sexiest terms in computer science. But then you discover that you've possibly been eating horse-meat, and the importance of provenance becomes almost palpable. Having accepted that we should be creating provenance-enabled systems, the challenge of then communicating that provenance to casual users is not trivial: users should not have to have a detailed working knowledge of your system, and they certainly shouldn't be expected to understand the data model. So how, then, do you give users an insight into the provenance, without having to build a bespoke system for each and every different provenance installation?
Speaker Bio: Darren is a final year Computer Science PhD student. He completed his undergraduate degree in Electronic Engineering at Southampton in 2012.

Relevance: 30.00%

Abstract:

An emerging consensus in cognitive science views the biological brain as a hierarchically-organized predictive processing system. This is a system in which higher-order regions are continuously attempting to predict the activity of lower-order regions at a variety of (increasingly abstract) spatial and temporal scales. The brain is thus revealed as a hierarchical prediction machine that is constantly engaged in the effort to predict the flow of information originating from the sensory surfaces. Such a view seems to afford a great deal of explanatory leverage when it comes to a broad swathe of seemingly disparate psychological phenomena (e.g., learning, memory, perception, action, emotion, planning, reason, imagination, and conscious experience). In the most positive case, the predictive processing story seems to provide our first glimpse at what a unified (computationally-tractable and neurobiologically plausible) account of human psychology might look like. This obviously marks out one reason why such models should be the focus of current empirical and theoretical attention. Another reason, however, is rooted in the potential of such models to advance the current state-of-the-art in machine intelligence and machine learning. Interestingly, the vision of the brain as a hierarchical prediction machine is one that establishes contact with work that goes under the heading of 'deep learning'. Deep learning systems thus often attempt to make use of predictive processing schemes and (increasingly abstract) generative models as a means of supporting the analysis of large data sets. But are such computational systems sufficient (by themselves) to provide a route to general human-level analytic capabilities? I will argue that they are not and that closer attention to a broader range of forces and factors (many of which are not confined to the neural realm) may be required to understand what it is that gives human cognition its distinctive (and largely unique) flavour.
The vision that emerges is one of 'homomimetic deep learning systems', systems that situate a hierarchically-organized predictive processing core within a larger nexus of developmental, behavioural, symbolic, technological and social influences. Relative to that vision, I suggest that we should see the Web as a form of 'cognitive ecology', one that is as much involved with the transformation of machine intelligence as it is with the progressive reshaping of our own cognitive capabilities.

Relevance: 30.00%

Abstract:

Recent interest in the validation of general circulation models (GCMs) has been devoted to objective methods. A small number of authors have used the direct synoptic identification of phenomena together with a statistical analysis to perform the objective comparison between various datasets. This paper describes a general method for performing the synoptic identification of phenomena that can be used for an objective analysis of atmospheric, or oceanographic, datasets obtained from numerical models and remote sensing. Methods usually associated with image processing have been used to segment the scene and to identify suitable feature points to represent the phenomena of interest. This is performed for each time level. A technique from dynamic scene analysis is then used to link the feature points to form trajectories. The method is fully automatic and should be applicable to a wide range of geophysical fields. An example will be shown of results obtained from this method using data obtained from a run of the Universities Global Atmospheric Modelling Project GCM.

Relevance: 30.00%

Abstract:

Flood modelling of urban areas is still at an early stage, partly because until recently topographic data of sufficiently high resolution and accuracy have been lacking in urban areas. However, Digital Surface Models (DSMs) generated from airborne scanning laser altimetry (LiDAR) having sub-metre spatial resolution have now become available, and these are able to represent the complexities of urban topography. The paper describes the development of a LiDAR post-processor for urban flood modelling based on the fusion of LiDAR and digital map data. The map data are used in conjunction with LiDAR data to identify different object types in urban areas, though pattern recognition techniques are also employed. Post-processing produces a Digital Terrain Model (DTM) for use as model bathymetry, and also a friction parameter map for use in estimating spatially-distributed friction coefficients. In vegetated areas, friction is estimated from LiDAR-derived vegetation height, and (unlike most vegetation removal software) the method copes with short vegetation less than ~1m high, which may occupy a substantial fraction of even an urban floodplain. The DTM and friction parameter map may also be used to help to generate an unstructured mesh of a vegetated urban floodplain for use by a 2D finite element model. The mesh is decomposed to reflect floodplain features having different frictional properties to their surroundings, including urban features such as buildings and roads as well as taller vegetation features such as trees and hedges. This allows a more accurate estimation of local friction. The method produces a substantial node density due to the small dimensions of many urban features.

Relevance: 30.00%

Abstract:

The long-term stability, high accuracy, all-weather capability, high vertical resolution, and global coverage of Global Navigation Satellite System (GNSS) radio occultation (RO) suggest it as a promising tool for global monitoring of atmospheric temperature change. To investigate and quantify how well a GNSS RO observing system is able to detect climate trends, we are currently performing a (climate) observing system simulation experiment over the 25-year period 2001 to 2025, which involves quasi-realistic modeling of the neutral atmosphere and the ionosphere. We carried out two climate simulations with the general circulation model MAECHAM5 (Middle Atmosphere European Centre/Hamburg Model Version 5) of the MPI-M Hamburg, covering the period 2001–2025: one control run with natural variability only and one run also including anthropogenic forcings due to greenhouse gases, sulfate aerosols, and tropospheric ozone. On the basis of this, we perform quasi-realistic simulations of RO observables for a small GNSS receiver constellation (six satellites), state-of-the-art data processing for atmospheric profile retrieval, and a statistical analysis of temperature trends in both the “observed” climatology and the “true” climatology. Here we describe the setup of the experiment and results from a test bed study conducted to obtain a basic set of realistic estimates of observational errors (instrument- and retrieval processing-related errors) and sampling errors (due to spatial-temporal undersampling). The test bed results, obtained for a typical summer season and compared to the climatic 2001–2025 trends from the MAECHAM5 simulation including anthropogenic forcing, were found encouraging for performing the full 25-year experiment.
They indicated that observational and sampling errors (both contributing about 0.2 K) are consistent with recent estimates of these errors from real RO data and that they should be sufficiently small for monitoring expected temperature trends in the global atmosphere over the next 10 to 20 years in most regions of the upper troposphere and lower stratosphere (UTLS). Inspection of the MAECHAM5 trends in different RO-accessible atmospheric parameters (microwave refractivity and pressure/geopotential height in addition to temperature) indicates complementary climate change sensitivity in different regions of the UTLS so that optimized climate monitoring shall combine information from all climatic key variables retrievable from GNSS RO data.

Relevance: 30.00%

Abstract:

We construct a mapping from complex recursive linguistic data structures to spherical wave functions using Smolensky's filler/role bindings and tensor product representations. Syntactic language processing is then described by the transient evolution of these spherical patterns whose amplitudes are governed by nonlinear order parameter equations. Implications of the model in terms of brain wave dynamics are indicated.
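
The filler/role binding step named in the abstract can be sketched as follows: each filler vector is bound to a role vector by an outer (tensor) product, and a structure is the sum of its bindings. This minimal example assumes orthonormal role vectors, which makes unbinding exact. The vectors and names are hypothetical, and the paper's mapping to spherical wave functions is not reproduced.

```python
def outer(filler, role):
    """Tensor (outer) product binding of one filler vector to one role vector."""
    return [[f * r for r in role] for f in filler]


def add(a, b):
    """Superimpose two bindings by element-wise addition."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def unbind(tensor, role):
    """Contract the tensor with a role vector; exact when roles are orthonormal."""
    return [sum(t * r for t, r in zip(row, role)) for row in tensor]


# Hypothetical filler and role vectors for a two-word phrase.
fillers = {"the": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0]}
roles = {"left": [1.0, 0.0], "right": [0.0, 1.0]}

phrase = add(
    outer(fillers["the"], roles["left"]),
    outer(fillers["dog"], roles["right"]),
)
left_filler = unbind(phrase, roles["left"])
right_filler = unbind(phrase, roles["right"])
```

Recursive structures can be represented in the same way by letting a filler itself be a flattened tensor, at the cost of a growing dimension.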

Relevance: 30.00%

Abstract:

1. There is concern over the possibility of unwanted environmental change following transgene movement from genetically modified (GM) rapeseed Brassica napus to its wild and weedy relatives.
2. The aim of this research was to develop a remote sensing-assisted methodology to help quantify gene flow from crops to their wild relatives over wide areas. Emphasis was placed on locating sites of sympatry, where the frequency of gene flow is likely to be highest, and on measuring the size of rapeseed fields to allow spatially explicit modelling of wind-mediated pollen-dispersal patterns.
3. Remote sensing was used as a tool to locate rapeseed fields, and a variety of image-processing techniques was adopted to facilitate the compilation of a spatially explicit profile of sympatry between the crop and Brassica rapa.
4. Classified satellite images containing rapeseed fields were first used to infer the spatial relationship between donor rapeseed fields and recipient riverside B. rapa populations. Such images also have utility for improving the efficiency of ground surveys by identifying probable sites of sympatry. The same data were then also used for the calculation of mean field size.
5. This paper forms a companion paper to Wilkinson et al. (2003), in which these elements were combined to produce a spatially explicit profile of hybrid formation over the UK. The current paper demonstrates the value of remote sensing and image processing for large-scale studies of gene flow, and describes a generic method that could be applied to a variety of crops in many countries.
6. Synthesis and applications. The decision to approve or prevent the release of a GM cultivar is made at a national rather than regional level. It is highly desirable that data relating to the decision-making process are collected at the same scale, rather than relying on extrapolation from smaller experiments designed at the plot, field or even regional scale. It would be extremely difficult and labour intensive to attempt to carry out such large-scale investigations without the use of remote-sensing technology. This study used rapeseed in the UK as a model to demonstrate the value of remote sensing in assembling empirical information at a national level.

Relevance: 30.00%

Abstract:

An information processor for rendering input data compatible with standard video recording and/or display equipment, comprising means for digitizing the input data over periods which are synchronous with the fields of a standard video signal, a store adapted to store the digitized data and release stored digitized data in correspondence with the line scan of a standard video monitor, the store having two halves which correspond to the interlaced fields of a standard video signal and being so arranged that one half is filled while the other is emptied, and means for converting the released stored digitized data into video luminance signals. The input signals may be in digital or analogue form. A second stage which reconstitutes the recorded data is also described.
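
The two-half store described above, in which one half is filled while the other is emptied, resembles a double (ping-pong) buffer. The following is a minimal software sketch under that assumption. The class and method names are hypothetical, and the abstract's hardware details (digitizer timing, line-scan release, luminance conversion) are not modelled.

```python
class PingPongStore:
    """Two-half store: one half is written while the other is read."""

    def __init__(self):
        self._halves = [[], []]
        self._write = 0  # index of the half currently being filled

    def write(self, sample):
        """Append one digitized sample to the half being filled."""
        self._halves[self._write].append(sample)

    def swap(self):
        """At a field boundary, expose the filled half and start filling the other."""
        self._write ^= 1
        self._halves[self._write] = []

    def read(self):
        """Return the samples of the half that is currently being emptied."""
        return list(self._halves[self._write ^ 1])


store = PingPongStore()
for sample in (1, 2, 3):   # first field
    store.write(sample)
store.swap()
first_field = store.read()
for sample in (4, 5):      # second field, written while the first is read
    store.write(sample)
still_first = store.read()
store.swap()
second_field = store.read()
```

Swapping at field boundaries means a reader never sees a half that is still being written.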