711 resultados para Localization real-world challenges
Resumo:
Kernel-level malware is one of the most dangerous threats to the security of users on the Internet, so there is an urgent need for its detection. The most popular detection approach is misuse-based detection. However, it cannot catch up with today's advanced malware that increasingly apply polymorphism and obfuscation. In this thesis, we present our integrity-based detection for kernel-level malware, which does not rely on the specific features of malware. ^ We have developed an integrity analysis system that can derive and monitor integrity properties for commodity operating systems kernels. In our system, we focus on two classes of integrity properties: data invariants and integrity of Kernel Queue (KQ) requests. ^ We adopt static analysis for data invariant detection and overcome several technical challenges: field-sensitivity, array-sensitivity, and pointer analysis. We identify data invariants that are critical to system runtime integrity from Linux kernel 2.4.32 and Windows Research Kernel (WRK) with very low false positive rate and very low false negative rate. We then develop an Invariant Monitor to guard these data invariants against real-world malware. In our experiment, we are able to use Invariant Monitor to detect ten real-world Linux rootkits and nine real-world Windows malware and one synthetic Windows malware. ^ We leverage static and dynamic analysis of kernel and device drivers to learn the legitimate KQ requests. Based on the learned KQ requests, we build KQguard to protect KQs. At runtime, KQguard rejects all the unknown KQ requests that cannot be validated. We apply KQguard on WRK and Linux kernel, and extensive experimental evaluation shows that KQguard is efficient (up to 5.6% overhead) and effective (capable of achieving zero false positives against representative benign workloads after appropriate training and very low false negatives against 125 real-world malware and nine synthetic attacks). ^ In our system, Invariant Monitor and KQguard cooperate together to protect data invariants and KQs in the target kernel. By monitoring these integrity properties, we can detect malware by its violation of these integrity properties during execution.^
Resumo:
Data integration systems offer uniform access to a set of autonomous and heterogeneous data sources. One of the main challenges in data integration is reconciling semantic differences among data sources. Approaches that been used to solve this problem can be categorized as schema-based and attribute-based. Schema-based approaches use schema information to identify the semantic similarity in data; furthermore, they focus on reconciling types before reconciling attributes. In contrast, attribute-based approaches use statistical and structural information of attributes to identify the semantic similarity of data in different sources. This research examines an approach to semantic reconciliation based on integrating properties expressed at different levels of abstraction or granularity using the concept of property precedence. Property precedence reconciles the meaning of attributes by identifying similarities between attributes based on what these attributes represent in the real world. In order to use property precedence for semantic integration, we need to identify the precedence of attributes within and across data sources. The goal of this research is to develop and evaluate a method and algorithms that will identify precedence relations among attributes and build property precedence graph (PPG) that can be used to support integration.
Resumo:
This thesis stems from the project with real-time environmental monitoring company EMSAT Corporation. They were looking for methods to automatically ag spikes and other anomalies in their environmental sensor data streams. The problem presents several challenges: near real-time anomaly detection, absence of labeled data and time-changing data streams. Here, we address this problem using both a statistical parametric approach as well as a non-parametric approach like Kernel Density Estimation (KDE). The main contribution of this thesis is extending the KDE to work more effectively for evolving data streams, particularly in presence of concept drift. To address that, we have developed a framework for integrating Adaptive Windowing (ADWIN) change detection algorithm with KDE. We have tested this approach on several real world data sets and received positive feedback from our industry collaborator. Some results appearing in this thesis have been presented at ECML PKDD 2015 Doctoral Consortium.
Resumo:
The successful performance of a hydrological model is usually challenged by the quality of the sensitivity analysis, calibration and uncertainty analysis carried out in the modeling exercise and subsequent simulation results. This is especially important under changing climatic conditions where there are more uncertainties associated with climate models and downscaling processes that increase the complexities of the hydrological modeling system. In response to these challenges and to improve the performance of the hydrological models under changing climatic conditions, this research proposed five new methods for supporting hydrological modeling. First, a design of experiment aided sensitivity analysis and parameterization (DOE-SAP) method was proposed to investigate the significant parameters and provide more reliable sensitivity analysis for improving parameterization during hydrological modeling. The better calibration results along with the advanced sensitivity analysis for significant parameters and their interactions were achieved in the case study. Second, a comprehensive uncertainty evaluation scheme was developed to evaluate three uncertainty analysis methods, the sequential uncertainty fitting version 2 (SUFI-2), generalized likelihood uncertainty estimation (GLUE) and Parameter solution (ParaSol) methods. The results showed that the SUFI-2 performed better than the other two methods based on calibration and uncertainty analysis results. The proposed evaluation scheme demonstrated that it is capable of selecting the most suitable uncertainty method for case studies. Third, a novel sequential multi-criteria based calibration and uncertainty analysis (SMC-CUA) method was proposed to improve the efficiency of calibration and uncertainty analysis and control the phenomenon of equifinality. The results showed that the SMC-CUA method was able to provide better uncertainty analysis results with high computational efficiency compared to the SUFI-2 and GLUE methods and control parameter uncertainty and the equifinality effect without sacrificing simulation performance. Fourth, an innovative response based statistical evaluation method (RESEM) was proposed for estimating the uncertainty propagated effects and providing long-term prediction for hydrological responses under changing climatic conditions. By using RESEM, the uncertainty propagated from statistical downscaling to hydrological modeling can be evaluated. Fifth, an integrated simulation-based evaluation system for uncertainty propagation analysis (ISES-UPA) was proposed for investigating the effects and contributions of different uncertainty components to the total propagated uncertainty from statistical downscaling. Using ISES-UPA, the uncertainty from statistical downscaling, uncertainty from hydrological modeling, and the total uncertainty from two uncertainty sources can be compared and quantified. The feasibility of all the methods has been tested using hypothetical and real-world case studies. The proposed methods can also be integrated as a hydrological modeling system to better support hydrological studies under changing climatic conditions. The results from the proposed integrated hydrological modeling system can be used as scientific references for decision makers to reduce the potential risk of damages caused by extreme events for long-term water resource management and planning.
Resumo:
Aligned single-walled carbon nanotubes (SWNTs) synthesized by the chemical vapor deposition (CVD) method have exceptional potential for next-generation nanoelectronics. However, there are considerable challenges in the preparation of semiconducting (s-) SWNTs with controlled properties (e.g., density, selectivity, and diameter) for their application in solving real-world problems. This dissertation describes research that aims to overcome the limitations by novel synthesis strategies and post-growth treatment. The application of as-prepared SWNTs as functional devices is also demonstrated. The dissertation includes the following parts: 1) decoupling the conflict between density and selectivity of s-SWNTs in CVD growth; 2) investigating the importance of diameter control for the selective synthesis of s-SWNTs; 3) synthesizing highly conductive SWNT thin film by thiophene-assisted CVD method; 4) eliminating metallic pathways in SWNT crossbars by gate-free electrical breakdown method; 5) enhancing the density of SWNT arrays by strain-release method; 6) studying the sensing mechanism of SWNT crossbar chemical sensors.
Resumo:
Constant technology advances have caused data explosion in recent years. Accord- ingly modern statistical and machine learning methods must be adapted to deal with complex and heterogeneous data types. This phenomenon is particularly true for an- alyzing biological data. For example DNA sequence data can be viewed as categorical variables with each nucleotide taking four different categories. The gene expression data, depending on the quantitative technology, could be continuous numbers or counts. With the advancement of high-throughput technology, the abundance of such data becomes unprecedentedly rich. Therefore efficient statistical approaches are crucial in this big data era.
Previous statistical methods for big data often aim to find low dimensional struc- tures in the observed data. For example in a factor analysis model a latent Gaussian distributed multivariate vector is assumed. With this assumption a factor model produces a low rank estimation of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents. The mixture pro- portions of topics, represented by a Dirichlet distributed variable, is assumed. This dissertation proposes several novel extensions to the previous statistical methods that are developed to address challenges in big data. Those novel methods are applied in multiple real world applications including construction of condition specific gene co-expression networks, estimating shared topics among newsgroups, analysis of pro- moter sequences, analysis of political-economics risk data and estimating population structure from genotype data.
Resumo:
Kernel-level malware is one of the most dangerous threats to the security of users on the Internet, so there is an urgent need for its detection. The most popular detection approach is misuse-based detection. However, it cannot catch up with today's advanced malware that increasingly apply polymorphism and obfuscation. In this thesis, we present our integrity-based detection for kernel-level malware, which does not rely on the specific features of malware. We have developed an integrity analysis system that can derive and monitor integrity properties for commodity operating systems kernels. In our system, we focus on two classes of integrity properties: data invariants and integrity of Kernel Queue (KQ) requests. We adopt static analysis for data invariant detection and overcome several technical challenges: field-sensitivity, array-sensitivity, and pointer analysis. We identify data invariants that are critical to system runtime integrity from Linux kernel 2.4.32 and Windows Research Kernel (WRK) with very low false positive rate and very low false negative rate. We then develop an Invariant Monitor to guard these data invariants against real-world malware. In our experiment, we are able to use Invariant Monitor to detect ten real-world Linux rootkits and nine real-world Windows malware and one synthetic Windows malware. We leverage static and dynamic analysis of kernel and device drivers to learn the legitimate KQ requests. Based on the learned KQ requests, we build KQguard to protect KQs. At runtime, KQguard rejects all the unknown KQ requests that cannot be validated. We apply KQguard on WRK and Linux kernel, and extensive experimental evaluation shows that KQguard is efficient (up to 5.6% overhead) and effective (capable of achieving zero false positives against representative benign workloads after appropriate training and very low false negatives against 125 real-world malware and nine synthetic attacks). In our system, Invariant Monitor and KQguard cooperate together to protect data invariants and KQs in the target kernel. By monitoring these integrity properties, we can detect malware by its violation of these integrity properties during execution.
Resumo:
Full paper presented at EC-TEL 2016
Resumo:
Salman, M. et al. (2016). Integrating Scientific Publication into an Applied Gaming Ecosystem. GSTF Journal on Computing (JoC), Volume 5 (Issue 1), pp. 45-51.
Resumo:
Cybercriminals ramp up their efforts with sophisticated techniques while defenders gradually update their typical security measures. Attackers often have a long-term interest in their targets. Due to a number of factors such as scale, architecture and nonproductive traffic however it makes difficult to detect them using typical intrusion detection techniques. Cyber early warning systems (CEWS) aim at alerting such attempts in their nascent stages using preliminary indicators. Design and implementation of such systems involves numerous research challenges such as generic set of indicators, intelligence gathering, uncertainty reasoning and information fusion. This paper discusses such challenges and presents the reader with compelling motivation. A carefully deployed empirical analysis using a real world attack scenario and a real network traffic capture is also presented.
Resumo:
Future pervasive environments will take into consideration not only individual user’s interest, but also social relationships. In this way, pervasive communities can lead the user to participate beyond traditional pervasive spaces, enabling the cooperation among groups and taking into account not only individual interests, but also the collective and social context. Social applications in CSCW (Computer Supported Cooperative Work) field represent new challenges and possibilities in terms of use of social context information for adaptability in pervasive environments. In particular, the research describes the approach in the design and development of a context.aware framework for collaborative applications (CAFCA), utilizing user’s context social information for proactive adaptations in pervasive environments. In order to validate the proposed framework an evaluation was conducted with a group of users based on enterprise scenario. The analysis enabled to verify the impact of the framework in terms of functionality and efficiency in real-world conditions. The main contribution of this thesis was to provide a context-aware framework to support collaborative applications in pervasive environments. The research focused on providing an innovative socio-technical approach to exploit collaboration in pervasive communities. Finally, the main results reside in social matching capabilities for session formation, communication and coordinations of groupware for collaborative activities.
Resumo:
In today’s big data world, data is being produced in massive volumes, at great velocity and from a variety of different sources such as mobile devices, sensors, a plethora of small devices hooked to the internet (Internet of Things), social networks, communication networks and many others. Interactive querying and large-scale analytics are being increasingly used to derive value out of this big data. A large portion of this data is being stored and processed in the Cloud due the several advantages provided by the Cloud such as scalability, elasticity, availability, low cost of ownership and the overall economies of scale. There is thus, a growing need for large-scale cloud-based data management systems that can support real-time ingest, storage and processing of large volumes of heterogeneous data. However, in the pay-as-you-go Cloud environment, the cost of analytics can grow linearly with the time and resources required. Reducing the cost of data analytics in the Cloud thus remains a primary challenge. In my dissertation research, I have focused on building efficient and cost-effective cloud-based data management systems for different application domains that are predominant in cloud computing environments. In the first part of my dissertation, I address the problem of reducing the cost of transactional workloads on relational databases to support database-as-a-service in the Cloud. The primary challenges in supporting such workloads include choosing how to partition the data across a large number of machines, minimizing the number of distributed transactions, providing high data availability, and tolerating failures gracefully. I have designed, built and evaluated SWORD, an end-to-end scalable online transaction processing system, that utilizes workload-aware data placement and replication to minimize the number of distributed transactions that incorporates a suite of novel techniques to significantly reduce the overheads incurred both during the initial placement of data, and during query execution at runtime. In the second part of my dissertation, I focus on sampling-based progressive analytics as a means to reduce the cost of data analytics in the relational domain. Sampling has been traditionally used by data scientists to get progressive answers to complex analytical tasks over large volumes of data. Typically, this involves manually extracting samples of increasing data size (progressive samples) for exploratory querying. This provides the data scientists with user control, repeatable semantics, and result provenance. However, such solutions result in tedious workflows that preclude the reuse of work across samples. On the other hand, existing approximate query processing systems report early results, but do not offer the above benefits for complex ad-hoc queries. I propose a new progressive data-parallel computation framework, NOW!, that provides support for progressive analytics over big data. In particular, NOW! enables progressive relational (SQL) query support in the Cloud using unique progress semantics that allow efficient and deterministic query processing over samples providing meaningful early results and provenance to data scientists. NOW! enables the provision of early results using significantly fewer resources thereby enabling a substantial reduction in the cost incurred during such analytics. Finally, I propose NSCALE, a system for efficient and cost-effective complex analytics on large-scale graph-structured data in the Cloud. The system is based on the key observation that a wide range of complex analysis tasks over graph data require processing and reasoning about a large number of multi-hop neighborhoods or subgraphs in the graph; examples include ego network analysis, motif counting in biological networks, finding social circles in social networks, personalized recommendations, link prediction, etc. These tasks are not well served by existing vertex-centric graph processing frameworks whose computation and execution models limit the user program to directly access the state of a single vertex, resulting in high execution overheads. Further, the lack of support for extracting the relevant portions of the graph that are of interest to an analysis task and loading it onto distributed memory leads to poor scalability. NSCALE allows users to write programs at the level of neighborhoods or subgraphs rather than at the level of vertices, and to declaratively specify the subgraphs of interest. It enables the efficient distributed execution of these neighborhood-centric complex analysis tasks over largescale graphs, while minimizing resource consumption and communication cost, thereby substantially reducing the overall cost of graph data analytics in the Cloud. The results of our extensive experimental evaluation of these prototypes with several real-world data sets and applications validate the effectiveness of our techniques which provide orders-of-magnitude reductions in the overheads of distributed data querying and analysis in the Cloud.
Resumo:
In the past decade, systems that extract information from millions of Internet documents have become commonplace. Knowledge graphs -- structured knowledge bases that describe entities, their attributes and the relationships between them -- are a powerful tool for understanding and organizing this vast amount of information. However, a significant obstacle to knowledge graph construction is the unreliability of the extracted information, due to noise and ambiguity in the underlying data or errors made by the extraction system and the complexity of reasoning about the dependencies between these noisy extractions. My dissertation addresses these challenges by exploiting the interdependencies between facts to improve the quality of the knowledge graph in a scalable framework. I introduce a new approach called knowledge graph identification (KGI), which resolves the entities, attributes and relationships in the knowledge graph by incorporating uncertain extractions from multiple sources, entity co-references, and ontological constraints. I define a probability distribution over possible knowledge graphs and infer the most probable knowledge graph using a combination of probabilistic and logical reasoning. Such probabilistic models are frequently dismissed due to scalability concerns, but my implementation of KGI maintains tractable performance on large problems through the use of hinge-loss Markov random fields, which have a convex inference objective. This allows the inference of large knowledge graphs using 4M facts and 20M ground constraints in 2 hours. To further scale the solution, I develop a distributed approach to the KGI problem which runs in parallel across multiple machines, reducing inference time by 90%. Finally, I extend my model to the streaming setting, where a knowledge graph is continuously updated by incorporating newly extracted facts. I devise a general approach for approximately updating inference in convex probabilistic models, and quantify the approximation error by defining and bounding inference regret for online models. Together, my work retains the attractive features of probabilistic models while providing the scalability necessary for large-scale knowledge graph construction. These models have been applied on a number of real-world knowledge graph projects, including the NELL project at Carnegie Mellon and the Google Knowledge Graph.
Resumo:
Cache-coherent non uniform memory access (ccNUMA) architecture is a standard design pattern for contemporary multicore processors, and future generations of architectures are likely to be NUMA. NUMA architectures create new challenges for managed runtime systems. Memory-intensive applications use the system’s distributed memory banks to allocate data, and the automatic memory manager collects garbage left in these memory banks. The garbage collector may need to access remote memory banks, which entails access latency overhead and potential bandwidth saturation for the interconnection between memory banks. This dissertation makes five significant contributions to garbage collection on NUMA systems, with a case study implementation using the Hotspot Java Virtual Machine. It empirically studies data locality for a Stop-The-World garbage collector when tracing connected objects in NUMA heaps. First, it identifies a locality richness which exists naturally in connected objects that contain a root object and its reachable set— ‘rooted sub-graphs’. Second, this dissertation leverages the locality characteristic of rooted sub-graphs to develop a new NUMA-aware garbage collection mechanism. A garbage collector thread processes a local root and its reachable set, which is likely to have a large number of objects in the same NUMA node. Third, a garbage collector thread steals references from sibling threads that run on the same NUMA node to improve data locality. This research evaluates the new NUMA-aware garbage collector using seven benchmarks of an established real-world DaCapo benchmark suite. In addition, evaluation involves a widely used SPECjbb benchmark and Neo4J graph database Java benchmark, as well as an artificial benchmark. The results of the NUMA-aware garbage collector on a multi-hop NUMA architecture show an average of 15% performance improvement. Furthermore, this performance gain is shown to be as a result of an improved NUMA memory access in a ccNUMA system. Fourth, the existing Hotspot JVM adaptive policy for configuring the number of garbage collection threads is shown to be suboptimal for current NUMA machines. The policy uses outdated assumptions and it generates a constant thread count. In fact, the Hotspot JVM still uses this policy in the production version. This research shows that the optimal number of garbage collection threads is application-specific and configuring the optimal number of garbage collection threads yields better collection throughput than the default policy. Fifth, this dissertation designs and implements a runtime technique, which involves heuristics from dynamic collection behavior to calculate an optimal number of garbage collector threads for each collection cycle. The results show an average of 21% improvements to the garbage collection performance for DaCapo benchmarks.
Resumo:
Automatic analysis of human behaviour in large collections of videos is gaining interest, even more so with the advent of file sharing sites such as YouTube. However, challenges still exist owing to several factors such as inter- and intra-class variations, cluttered backgrounds, occlusion, camera motion, scale, view and illumination changes. This research focuses on modelling human behaviour for action recognition in videos. The developed techniques are validated on large scale benchmark datasets and applied on real-world scenarios such as soccer videos. Three major contributions are made. The first contribution is in the area of proper choice of a feature representation for videos. This involved a study of state-of-the-art techniques for action recognition, feature extraction processing and dimensional reduction techniques so as to yield the best performance with optimal computational requirements. Secondly, temporal modelling of human behaviour is performed. This involved frequency analysis and temporal integration of local information in the video frames to yield a temporal feature vector. Current practices mostly average the frame information over an entire video and neglect the temporal order. Lastly, the proposed framework is applied and further adapted to real-world scenario such as soccer videos. A dataset consisting of video sequences depicting events of players falling is created from actual match data to this end and used to experimentally evaluate the proposed framework.