984 results for heterogeneous sources
Abstract:
The research presented in this thesis addresses inherent problems in signature-based intrusion detection systems (IDSs) operating in heterogeneous environments. The research proposes a solution to address the difficulties associated with multi-step attack scenario specification and detection for such environments. The research has focused on two distinct problems: the representation of events derived from heterogeneous sources and multi-step attack specification and detection. The first part of the research investigates the application of an event abstraction model to event logs collected from a heterogeneous environment. The event abstraction model comprises a hierarchy of events derived from different log sources such as system audit data, application logs, captured network traffic, and intrusion detection system alerts. Unlike existing event abstraction models where low-level information may be discarded during the abstraction process, the event abstraction model presented in this work preserves all low-level information as well as providing high-level information in the form of abstract events. The event abstraction model presented in this work was designed independently of any particular IDS and thus may be used by any IDS, intrusion forensic tools, or monitoring tools. The second part of the research investigates the use of unification for multi-step attack scenario specification and detection. Multi-step attack scenarios are hard to specify and detect as they often involve the correlation of events from multiple sources which may be affected by time uncertainty. The unification algorithm provides a simple and straightforward scenario matching mechanism by using variable instantiation where variables represent events as defined in the event abstraction model. The third part of the research investigates how to address time uncertainty. Clock synchronisation is crucial for detecting multi-step attack scenarios which involve logs from multiple hosts.
Issues involving time uncertainty have been largely neglected by intrusion detection research. The system presented in this research introduces two techniques for addressing time uncertainty issues: clock skew compensation and clock drift modelling using linear regression. An off-line IDS prototype for detecting multi-step attacks has been implemented. The prototype comprises two modules: implementation of the abstract event system architecture (AESA) and of the scenario detection module. The scenario detection module implements our signature language developed based on the Python programming language syntax and the unification-based scenario detection engine. The prototype has been evaluated using a publicly available dataset of real attack traffic and event logs and a synthetic dataset. The distinctive features of the public dataset are that it contains multi-step attacks and that these attacks involve multiple hosts with clock skew and clock drift. These features allow us to demonstrate the application and the advantages of the contributions of this research. All instances of multi-step attacks in the dataset have been correctly identified even though significant clock skew and drift exist in the dataset. Future work identified by this research would be to develop a refined unification algorithm suitable for processing streams of events to enable on-line detection. In terms of time uncertainty, identified future work would be to develop mechanisms which allow automatic clock skew and clock drift identification and correction. The immediate application of the research presented in this thesis is the framework of an off-line IDS which processes events from heterogeneous sources using abstraction and which can detect multi-step attack scenarios which may involve time uncertainty.
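As an illustration of the clock drift modelling mentioned in the abstract above, the following is a minimal sketch of fitting a linear clock model by ordinary least squares. It is not the thesis's implementation: the function names, the pairing of local and reference timestamps, and the reading of "skew" as a constant offset and "drift" as a linear rate are all assumptions made for this example.

```python
def fit_clock_model(local_times, reference_times):
    """Fit offset = skew + drift * local_time by ordinary least squares.

    Assumption: each local timestamp is paired with a trusted reference
    timestamp for the same event (hypothetical input, not from the thesis).
    """
    n = len(local_times)
    if n < 2:
        raise ValueError("need at least two observations")
    offsets = [ref - loc for loc, ref in zip(local_times, reference_times)]
    mean_t = sum(local_times) / n
    mean_o = sum(offsets) / n
    var_t = sum((t - mean_t) ** 2 for t in local_times)
    if var_t == 0:
        raise ValueError("local times must not all be equal")
    cov = sum((t - mean_t) * (o - mean_o)
              for t, o in zip(local_times, offsets))
    drift = cov / var_t              # offset change per unit of local time
    skew = mean_o - drift * mean_t   # constant offset at local time zero
    return skew, drift


def correct_timestamp(local_time, skew, drift):
    """Map a host-local timestamp onto the reference timeline."""
    return local_time + skew + drift * local_time


# Hypothetical host whose clock is 5 s behind and loses 1 ms per second.
local = [0.0, 100.0, 200.0, 300.0]
reference = [5.0, 105.1, 205.2, 305.3]
skew, drift = fit_clock_model(local, reference)
corrected = correct_timestamp(1000.0, skew, drift)
```

Once every host's timestamps are mapped onto a common timeline in this way, events from different logs can be ordered before scenario matching is attempted.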
Abstract:
Many online services access a large number of autonomous data sources and at the same time need to meet different user requirements. It is essential for these services to achieve semantic interoperability among these information exchange entities. In the presence of an increasing number of proprietary business processes, heterogeneous data standards, and diverse user requirements, it is critical that the services are implemented using adaptable, extensible, and scalable technology. The COntext INterchange (COIN) approach, inspired by similar goals of the Semantic Web, provides a robust solution. In this paper, we describe how COIN can be used to implement dynamic online services where semantic differences are reconciled on the fly. We show that COIN is flexible and scalable by comparing it with several conventional approaches. With a given ontology, the number of conversions in COIN is quadratic in the number of distinctions of the semantic aspect that has the most distinctions. These semantic aspects are modeled as modifiers in a conceptual ontology; in most cases the number of conversions is linear in the number of modifiers, which is significantly smaller than in the traditional hard-wiring middleware approach, where the number of conversion programs is quadratic in the number of sources and data receivers. In the example scenario in the paper, the COIN approach needs only 5 conversions to be defined while traditional approaches require 20,000 to 100 million. COIN achieves this scalability by automatically composing all the comprehensive conversions from a small number of declaratively defined sub-conversions.
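The scaling argument in the abstract above can be made concrete with a small counting sketch. This is an illustrative reading only: the function names and the example numbers are hypothetical, the formulas are a simplified interpretation of "quadratic" and "linear" growth, and the sketch does not attempt to reproduce the figures reported in the paper.

```python
def hardwired_conversions(num_parties):
    """One conversion program per ordered (source, receiver) pair."""
    return num_parties * (num_parties - 1)


def modifier_conversions_worst_case(values_per_modifier):
    """Assumed worst case: every ordered pair of distinct values of a
    modifier needs its own conversion, summed over all modifiers."""
    return sum(v * (v - 1) for v in values_per_modifier)


def modifier_conversions_parameterized(values_per_modifier):
    """Assumed common case: one parameterized conversion per modifier
    (for example, a single currency conversion taking a rate)."""
    return len(values_per_modifier)


# Hypothetical scenario: 100 sources and receivers whose contexts differ
# along three modifiers with 3, 2 and 4 distinct values respectively.
pairwise = hardwired_conversions(100)                      # 9900
worst = modifier_conversions_worst_case([3, 2, 4])         # 20
typical = modifier_conversions_parameterized([3, 2, 4])    # 3
```

The point of the comparison is that the modifier-based counts depend on the ontology rather than on how many sources and receivers take part.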
Abstract:
The primary aim of this dissertation is to develop data mining tools for knowledge discovery in biomedical data when multiple (homogeneous or heterogeneous) sources of data are available. The central hypothesis is that, when information from multiple sources of data is used appropriately and effectively, knowledge discovery can be better achieved than what is possible from only a single source. Recent advances in high-throughput technology have enabled biomedical researchers to generate large volumes of diverse types of data on a genome-wide scale. These data include DNA sequences, gene expression measurements, and much more; they provide the motivation for building analysis tools to elucidate the modular organization of the cell. The challenges include efficiently and accurately extracting information from the multiple data sources, representing the information effectively, developing analytical tools, and interpreting the results in the context of the domain. The first part considers the application of feature-level integration to design classifiers that discriminate between soil types. The machine learning tools, SVM and KNN, were used to successfully distinguish between several soil samples. The second part considers clustering using multiple heterogeneous data sources. The resulting Multi-Source Clustering (MSC) algorithm was shown to perform better than clustering methods that use only a single data source or a simple feature-level integration of heterogeneous data sources. The third part proposes a new approach to effectively incorporate incomplete data into clustering analysis. Adapted from the K-means algorithm, the Generalized Constrained Clustering (GCC) algorithm makes use of incomplete data in the form of constraints to perform exploratory analysis. Novel approaches for extracting constraints were proposed. For sufficiently large constraint sets, the GCC algorithm outperformed the MSC algorithm.
The last part considers the problem of providing a theme-specific environment for mining multi-source biomedical data. The database called PlasmoTFBM, focusing on gene regulation of Plasmodium falciparum, contains diverse information and has a simple interface to allow biologists to explore the data. It provided a framework for comparing different analytical tools for predicting regulatory elements and for designing useful data mining tools. The conclusion is that the experiments reported in this dissertation strongly support the central hypothesis.
Abstract:
Modern enterprise knowledge management systems typically require distributed approaches and the integration of numerous heterogeneous sources of information. A powerful foundation for these tasks can be Topic Maps, which not only provide a semantic net-like knowledge representation means and the possibility to use ontologies for modelling knowledge structures, but also offer concepts to link these knowledge structures with unstructured data stored in files, external documents etc. In this paper, we present the architecture and prototypical implementation of a Topic Map application infrastructure, the ‘Topic Grid’, which enables transparent, node-spanning access to different Topic Maps distributed in a network.
Abstract:
Effective management of chronic diseases is a global health priority. A healthcare information system offers opportunities to address challenges of chronic disease management. However, the requirements of health information systems are often not well understood. The accuracy of requirements has a direct impact on the successful design and implementation of a health information system. Our research describes methods used to understand the requirements of health information systems for advanced prostate cancer management. The research conducted a survey to identify heterogeneous sources of clinical records. Our research showed that the General Practitioner was the most common source of patients' clinical records (41%), followed by the Urologist (14%) and other clinicians (14%). Our research describes a method to identify diverse data sources and proposes a novel patient journey browser prototype that integrates disparate data sources.
Abstract:
BACKGROUND: A hierarchical taxonomy of organisms is a prerequisite for semantic integration of biodiversity data. Ideally, there would be a single, expansive, authoritative taxonomy that includes extinct and extant taxa, information on synonyms and common names, and monophyletic supraspecific taxa that reflect our current understanding of phylogenetic relationships. DESCRIPTION: As a step towards development of such a resource, and to enable large-scale integration of phenotypic data across vertebrates, we created the Vertebrate Taxonomy Ontology (VTO), a semantically defined taxonomic resource derived from the integration of existing taxonomic compilations, and freely distributed under a Creative Commons Zero (CC0) public domain waiver. The VTO includes both extant and extinct vertebrates and currently contains 106,947 taxonomic terms, 22 taxonomic ranks, 104,736 synonyms, and 162,400 cross-references to other taxonomic resources. Key challenges in constructing the VTO included (1) extracting and merging names, synonyms, and identifiers from heterogeneous sources; (2) structuring hierarchies of terms based on evolutionary relationships and the principle of monophyly; and (3) automating this process as much as possible to accommodate updates in source taxonomies. CONCLUSIONS: The VTO is the primary source of taxonomic information used by the Phenoscape Knowledgebase (http://phenoscape.org/), which integrates genetic and evolutionary phenotype data across both model and non-model vertebrates. The VTO is useful for inferring phenotypic changes on the vertebrate tree of life, which enables queries for candidate genes for various episodes in vertebrate evolution.
Abstract:
Correctly modelling and reasoning with uncertain information from heterogeneous sources in large-scale systems is critical when the reliability is unknown and we still want to derive adequate conclusions. To this end, context-dependent merging strategies have been proposed in the literature. In this paper we investigate how one such context-dependent merging strategy (originally defined for possibility theory), called largely partially maximal consistent subsets (LPMCS), can be adapted to Dempster-Shafer (DS) theory. We identify those measures for the degree of uncertainty and internal conflict that are available in DS theory and show how they can be used for guiding LPMCS merging. A simplified real-world power distribution scenario illustrates our framework. We also briefly discuss how our approach can be incorporated into a multi-agent programming language, thus leading to better plan selection and decision making.
Abstract:
To provide in-time reactions to a large volume of surveillance data, uncertainty-enabled event reasoning frameworks for CCTV and sensor-based intelligent surveillance systems have been integrated to model and infer events of interest. However, most existing works do not consider decision making under uncertainty, which is important for surveillance operators. In this paper, we extend an event reasoning framework for decision support, which enables our framework to predict, rank and alarm threats from multiple heterogeneous sources.
Abstract:
There has been much interest in the belief–desire–intention (BDI) agent-based model for developing scalable intelligent systems, e.g. using the AgentSpeak framework. However, reasoning from sensor information in these large-scale systems remains a significant challenge. For example, agents may be faced with information from heterogeneous sources which is uncertain and incomplete, while the sources themselves may be unreliable or conflicting. In order to derive meaningful conclusions, it is important that such information be correctly modelled and combined. In this paper, we choose to model uncertain sensor information in Dempster–Shafer (DS) theory. Unfortunately, as in other uncertainty theories, simple combination strategies in DS theory are often too restrictive (losing valuable information) or too permissive (resulting in ignorance). For this reason, we investigate how a context-dependent strategy originally defined for possibility theory can be adapted to DS theory. In particular, we use the notion of largely partially maximal consistent subsets (LPMCSes) to characterise the context for when to use Dempster’s original rule of combination and for when to resort to an alternative. To guide this process, we identify existing measures of similarity and conflict for finding LPMCSes along with quality of information heuristics to ensure that LPMCSes are formed around high-quality information. We then propose an intelligent sensor model for integrating this information into the AgentSpeak framework which is responsible for applying evidence propagation to construct compatible information, for performing context-dependent combination and for deriving beliefs for revising an agent’s belief base. Finally, we present a power grid scenario inspired by a real-world case study to demonstrate our work.
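For readers unfamiliar with the rule of combination named in the abstract above, here is a minimal sketch of Dempster's rule for two mass functions. It illustrates only the standard rule, not the paper's LPMCS-based strategy; the dictionary representation, the function name and the two example sensors are assumptions made for this example.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses (a focal element)
    to its mass; the masses of each function are assumed to sum to 1.
    Returns the normalised combined mass function and the conflict mass.
    """
    combined = {}
    conflict = 0.0
    for focal_a, mass_a in m1.items():
        for focal_b, mass_b in m2.items():
            product = mass_a * mass_b
            intersection = focal_a & focal_b
            if intersection:
                combined[intersection] = combined.get(intersection, 0.0) + product
            else:
                conflict += product  # mass assigned to the empty set
    if 1.0 - conflict < 1e-12:
        raise ValueError("totally conflicting sources: rule is undefined")
    normalised = {s: m / (1.0 - conflict) for s, m in combined.items()}
    return normalised, conflict


# Hypothetical sensors reporting on the state of a power line.
OK = frozenset({"ok"})
FAULT = frozenset({"fault"})
EITHER = frozenset({"ok", "fault"})  # total ignorance
sensor_1 = {OK: 0.6, EITHER: 0.4}
sensor_2 = {FAULT: 0.5, EITHER: 0.5}
merged, conflict = dempster_combine(sensor_1, sensor_2)
```

A large conflict value is the kind of signal that motivates falling back to an alternative combination rule, since normalisation then discards much of the original evidence.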
Abstract:
Doctoral thesis, Informatics (Bioinformatics), Universidade de Lisboa, Faculdade de Ciências, 2014
Abstract:
Scientific dissertation submitted to obtain the degree of Master in Informatics and Computer Engineering
Abstract:
Many projects, e.g. VIKEF [13] and KIM [7], present grounded approaches for the use of entities as a means of indexing and retrieval of multimedia resources from heterogeneous sources. In this paper, we discuss the state of the art of entity-centric approaches for multimedia indexing and retrieval. A summary of projects employing entity-centric repositories is presented. This paper also looks at the current state-of-the-art authoring environment, Macromedia Authorware, and the possibility of extending this environment for entity-based multimedia authoring.
Abstract:
Pervasive applications use context provision middleware as infrastructure to provide context information. Typically, those applications use publish/subscribe communication to eliminate direct coupling between components and to allow selective information dissemination based on the interests of the communicating elements. Composite event mechanisms are used together with such middleware to aggregate individual low-level events, originating from heterogeneous sources, into high-level context information relevant to the application. CES (Composite Event System) is a composite event mechanism that works simultaneously in cooperation with several context provision middleware systems. With that integration, applications use CES to subscribe to composite events and CES, in turn, subscribes to the primitive events in the appropriate underlying middleware and notifies the applications when the composite events happen. Furthermore, CES offers a language with a group of operators for the definition of composite events that also allows context information sharing.
Abstract:
Microstratigraphic, sedimentological, and taphonomic features of the Ferraz Shell Bed, from the Upper Permian (Kazanian-Tatarian?) Corumbatai Formation of Rio Claro Region (the Parana Basin, Brazil), indicate that the bed consists of four distinct microstratigraphic units. They include, from bottom to top, a lag concentration (Unit 1), a partly reworked storm deposit (Unit 2), a rapidly deposited sandstone unit with three thin horizons recording episodes of reworking (Unit 3), and a shell-rich horizon generated by reworking/winnowing that was subsequently buried by storm-induced obrution deposit (Unit 4). The bioclasts of the Ferraz Shell Bed represent exclusively bivalve mollusks. Pinzonella illusa and Terraia aequilateralis are the dominant species. Taphonomic analysis indicates that mollusks are heavily time-averaged (except for some parts of Unit 3). Moreover, different species are time-averaged to a different degree (disharmonious time-averaging). The units differ statistically from one another in their taxonomic and ecological composition, in their taphonomic pattern, and in the size-frequency distributions of the two most common species. Other Permian shell beds of the Parana Basin are similar to the Ferraz Shell Bed in their faunal composition (they typically contain similar sets of 5 to 10 bivalve species) and in their taphonomic, sedimentologic, and microstratigraphic characteristics. However, rare shell beds that include 2-3 species only and are dominated by articulated shells preserved in life position also occur. Diversity levels in the Permian benthic associations of the Parana Basin were very low, with the point diversity of 2-3 species and with the within-habitat and basin-wide (alpha and gamma) diversities of 10 species, at most. The Parana Basin benthic communities may have thus been analogous to low-diversity bivalve-dominated associations of the present-day Baltic Sea. 
The 'Ferraz-type' shell beds of the Parana Basin represent genetically complex and highly heterogeneous sources of paleontological data. They are cumulative records of spectra of benthic ecosystems time-averaged over long periods of time (10^2 to 10^4 years judging from actualistic research). Detailed biostratinomic reconstructions of shell beds can not only offer useful insights into their depositional histories, but may also allow paleoecologists to optimize their sampling designs, and consequently, refine paleoecological and paleoenvironmental interpretations.
Abstract:
This thesis focusses on the tectonic evolution and geochronology of part of the Kaoko orogen, which is part of a network of Pan-African orogenic belts in NW Namibia. By combining geochemical, isotopic and structural analysis, the aim was to gain more information about how and when the Kaoko Belt formed. The first chapter gives a general overview of the studied area and the second one describes the basis of the Electron Probe Microanalysis dating method. The reworking of Palaeo- to Mesoproterozoic basement during the Pan-African orogeny as part of the assembly of West Gondwana is discussed in Chapter 3. In the study area, high-grade rocks occupy a large area, and the belt is marked by several large-scale structural discontinuities. The two major discontinuities, the Sesfontein Thrust (ST) and the Puros Shear Zone (PSZ), subdivide the orogen into three tectonic units: the Eastern Kaoko Zone (EKZ), the Central Kaoko Zone (CKZ) and the Western Kaoko Zone (WKZ). An important lineament, the Village Mylonite Zone (VMZ), has been identified in the WKZ. Since plutonic rocks play an important role in understanding the evolution of a mountain belt, zircons from granitoid gneisses were dated by conventional U-Pb, SHRIMP and Pb-Pb techniques to identify different age provinces. Four different age provinces were recognized within the Central and Western part of the belt, which occur in different structural positions. The VMZ seems to mark the limit between Pan-African granitic rocks east of the lineament and Palaeo- to Mesoproterozoic basement to the west. In Chapter 4 the tectonic processes are discussed that led to the Neoproterozoic architecture of the orogen. The data suggest that the Kaoko Belt experienced three main phases of deformation, D1-D3, during the Pan-African orogeny. 
Early structures in the central part of the study area indicate that the initial stage of collision was governed by underthrusting of the medium-grade Central Kaoko zone below the high-grade Western Kaoko zone, resulting in the development of an inverted metamorphic gradient. The early structures were overprinted by a second phase D2, which was associated with the development of the PSZ and extensive partial melting and intrusion of ~550 Ma granitic bodies in the high-grade WKZ. Transcurrent deformation continued during cooling of the entire belt, giving rise to the localized low-temperature VMZ that separates a segment of elevated Mesoproterozoic basement from the rest of the Western zone in which only Pan-African ages have so far been observed. The data suggest that the boundary between the Western and Central Kaoko zones represents a modified thrust zone, controlling the tectonic evolution of the Kaoko belt. The geodynamic evolution and the processes that generated this belt system are discussed in Chapter 5. Nd mean crustal residence ages of granitoid rocks permit subdivision of the belt into four provinces. Province I is characterised by mean crustal residence ages <1.7 Ga and is restricted to the Neoproterozoic granitoids. A wide range of initial Sr isotopic values ((87Sr/86Sr)i = 0.7075 to 0.7225) suggests heterogeneous sources for these granitoids. The second province consists of Mesoproterozoic (1516-1448 Ma) and late Palaeoproterozoic (1776-1701 Ma) rocks and is probably related to the Eburnian cycle with Nd model ages of 1.8-2.2 Ga. The εNd(i) values of these granitoids are around zero and suggest a predominantly juvenile source. Late Archaean and middle Palaeoproterozoic rocks with model ages of 2.5 to 2.8 Ga make up Province III in the central part of the belt and are distinct from two early Proterozoic samples taken near the PSZ which show even older TDM ages of ~3.3 Ga (Province IV).
There is no clear geological evidence for the involvement of oceanic lithosphere in the formation of the Kaoko-Dom Feliciano orogen. Chapter 6 presents the results of isotopic analyses of garnet porphyroblasts from high-grade meta-igneous and metasedimentary rocks of the sillimanite-K-feldspar zone. Minimum P-T conditions for peak metamorphism were calculated at 731±10 °C and 6.7±1.2 kbar, substantially lower than those previously reported. A Sm-Nd garnet-whole rock errorchron obtained on a single meta-igneous rock yielded an unexpectedly old age of 692±13 Ma, which is interpreted as an inherited metamorphic age reflecting an early Pan-African granulite-facies event. The dated garnets survived a younger high-grade metamorphism that occurred between ca. 570 and 520 Ma and apparently maintained their old Sm-Nd isotopic systematics, implying that the closure temperature for garnet in this sample was higher than 730 °C. The metamorphic peak of the younger event was dated by electron microprobe on monazite at 567±5 Ma. From a regional viewpoint, it is possible that these granulites of igneous origin may be unrelated to the early Pan-African metamorphic evolution of the Kaoko Belt and may represent a previously unrecognised exotic terrane.