24 resultados para Data detection


Relevância:

30.00% 30.00%

Publicador:

Resumo:

With the rapid growth of the Internet, computer attacks are increasing at a fast pace and can easily cause millions of dollar in damage to an organization. Detecting these attacks is an important issue of computer security. There are many types of attacks and they fall into four main categories, Denial of Service (DoS) attacks, Probe, User to Root (U2R) attacks, and Remote to Local (R2L) attacks. Within these categories, DoS and Probe attacks continuously show up with greater frequency in a short period of time when they attack systems. They are different from the normal traffic data and can be easily separated from normal activities. On the contrary, U2R and R2L attacks are embedded in the data portions of the packets and normally involve only a single connection. It becomes difficult to achieve satisfactory detection accuracy for detecting these two attacks. Therefore, we focus on studying the ambiguity problem between normal activities and U2R/R2L attacks. The goal is to build a detection system that can accurately and quickly detect these two attacks. In this dissertation, we design a two-phase intrusion detection approach. In the first phase, a correlation-based feature selection algorithm is proposed to advance the speed of detection. Features with poor prediction ability for the signatures of attacks and features inter-correlated with one or more other features are considered redundant. Such features are removed and only indispensable information about the original feature space remains. In the second phase, we develop an ensemble intrusion detection system to achieve accurate detection performance. The proposed method includes multiple feature selecting intrusion detectors and a data mining intrusion detector. The former ones consist of a set of detectors, and each of them uses a fuzzy clustering technique and belief theory to solve the ambiguity problem. The latter one applies data mining technique to automatically extract computer users’ normal behavior from training network traffic data. The final decision is a combination of the outputs of feature selecting and data mining detectors. The experimental results indicate that our ensemble approach not only significantly reduces the detection time but also effectively detect U2R and R2L attacks that contain degrees of ambiguous information.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Traffic incidents are non-recurring events that can cause a temporary reduction in roadway capacity. They have been recognized as a major contributor to traffic congestion on our national highway systems. To alleviate their impacts on capacity, automatic incident detection (AID) has been applied as an incident management strategy to reduce the total incident duration. AID relies on an algorithm to identify the occurrence of incidents by analyzing real-time traffic data collected from surveillance detectors. Significant research has been performed to develop AID algorithms for incident detection on freeways; however, similar research on major arterial streets remains largely at the initial stage of development and testing. This dissertation research aims to identify design strategies for the deployment of an Artificial Neural Network (ANN) based AID algorithm for major arterial streets. A section of the US-1 corridor in Miami-Dade County, Florida was coded in the CORSIM microscopic simulation model to generate data for both model calibration and validation. To better capture the relationship between the traffic data and the corresponding incident status, Discrete Wavelet Transform (DWT) and data normalization were applied to the simulated data. Multiple ANN models were then developed for different detector configurations, historical data usage, and the selection of traffic flow parameters. To assess the performance of different design alternatives, the model outputs were compared based on both detection rate (DR) and false alarm rate (FAR). The results show that the best models were able to achieve a high DR of between 90% and 95%, a mean time to detect (MTTD) of 55-85 seconds, and a FAR below 4%. The results also show that a detector configuration including only the mid-block and upstream detectors performs almost as well as one that also includes a downstream detector. In addition, DWT was found to be able to improve model performance, and the use of historical data from previous time cycles improved the detection rate. Speed was found to have the most significant impact on the detection rate, while volume was found to contribute the least. The results from this research provide useful insights on the design of AID for arterial street applications.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In the last decade, large numbers of social media services have emerged and been widely used in people's daily life as important information sharing and acquisition tools. With a substantial amount of user-contributed text data on social media, it becomes a necessity to develop methods and tools for text analysis for this emerging data, in order to better utilize it to deliver meaningful information to users. ^ Previous work on text analytics in last several decades is mainly focused on traditional types of text like emails, news and academic literatures, and several critical issues to text data on social media have not been well explored: 1) how to detect sentiment from text on social media; 2) how to make use of social media's real-time nature; 3) how to address information overload for flexible information needs. ^ In this dissertation, we focus on these three problems. First, to detect sentiment of text on social media, we propose a non-negative matrix tri-factorization (tri-NMF) based dual active supervision method to minimize human labeling efforts for the new type of data. Second, to make use of social media's real-time nature, we propose approaches to detect events from text streams on social media. Third, to address information overload for flexible information needs, we propose two summarization framework, dominating set based summarization framework and learning-to-rank based summarization framework. The dominating set based summarization framework can be applied for different types of summarization problems, while the learning-to-rank based summarization framework helps utilize the existing training data to guild the new summarization tasks. In addition, we integrate these techneques in an application study of event summarization for sports games as an example of how to better utilize social media data. ^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Kernel-level malware is one of the most dangerous threats to the security of users on the Internet, so there is an urgent need for its detection. The most popular detection approach is misuse-based detection. However, it cannot catch up with today's advanced malware that increasingly apply polymorphism and obfuscation. In this thesis, we present our integrity-based detection for kernel-level malware, which does not rely on the specific features of malware. ^ We have developed an integrity analysis system that can derive and monitor integrity properties for commodity operating systems kernels. In our system, we focus on two classes of integrity properties: data invariants and integrity of Kernel Queue (KQ) requests. ^ We adopt static analysis for data invariant detection and overcome several technical challenges: field-sensitivity, array-sensitivity, and pointer analysis. We identify data invariants that are critical to system runtime integrity from Linux kernel 2.4.32 and Windows Research Kernel (WRK) with very low false positive rate and very low false negative rate. We then develop an Invariant Monitor to guard these data invariants against real-world malware. In our experiment, we are able to use Invariant Monitor to detect ten real-world Linux rootkits and nine real-world Windows malware and one synthetic Windows malware. ^ We leverage static and dynamic analysis of kernel and device drivers to learn the legitimate KQ requests. Based on the learned KQ requests, we build KQguard to protect KQs. At runtime, KQguard rejects all the unknown KQ requests that cannot be validated. We apply KQguard on WRK and Linux kernel, and extensive experimental evaluation shows that KQguard is efficient (up to 5.6% overhead) and effective (capable of achieving zero false positives against representative benign workloads after appropriate training and very low false negatives against 125 real-world malware and nine synthetic attacks). ^ In our system, Invariant Monitor and KQguard cooperate together to protect data invariants and KQs in the target kernel. By monitoring these integrity properties, we can detect malware by its violation of these integrity properties during execution.^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In the last decade, large numbers of social media services have emerged and been widely used in people's daily life as important information sharing and acquisition tools. With a substantial amount of user-contributed text data on social media, it becomes a necessity to develop methods and tools for text analysis for this emerging data, in order to better utilize it to deliver meaningful information to users. Previous work on text analytics in last several decades is mainly focused on traditional types of text like emails, news and academic literatures, and several critical issues to text data on social media have not been well explored: 1) how to detect sentiment from text on social media; 2) how to make use of social media's real-time nature; 3) how to address information overload for flexible information needs. In this dissertation, we focus on these three problems. First, to detect sentiment of text on social media, we propose a non-negative matrix tri-factorization (tri-NMF) based dual active supervision method to minimize human labeling efforts for the new type of data. Second, to make use of social media's real-time nature, we propose approaches to detect events from text streams on social media. Third, to address information overload for flexible information needs, we propose two summarization framework, dominating set based summarization framework and learning-to-rank based summarization framework. The dominating set based summarization framework can be applied for different types of summarization problems, while the learning-to-rank based summarization framework helps utilize the existing training data to guild the new summarization tasks. In addition, we integrate these techneques in an application study of event summarization for sports games as an example of how to better utilize social media data.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Kernel-level malware is one of the most dangerous threats to the security of users on the Internet, so there is an urgent need for its detection. The most popular detection approach is misuse-based detection. However, it cannot catch up with today's advanced malware that increasingly apply polymorphism and obfuscation. In this thesis, we present our integrity-based detection for kernel-level malware, which does not rely on the specific features of malware. We have developed an integrity analysis system that can derive and monitor integrity properties for commodity operating systems kernels. In our system, we focus on two classes of integrity properties: data invariants and integrity of Kernel Queue (KQ) requests. We adopt static analysis for data invariant detection and overcome several technical challenges: field-sensitivity, array-sensitivity, and pointer analysis. We identify data invariants that are critical to system runtime integrity from Linux kernel 2.4.32 and Windows Research Kernel (WRK) with very low false positive rate and very low false negative rate. We then develop an Invariant Monitor to guard these data invariants against real-world malware. In our experiment, we are able to use Invariant Monitor to detect ten real-world Linux rootkits and nine real-world Windows malware and one synthetic Windows malware. We leverage static and dynamic analysis of kernel and device drivers to learn the legitimate KQ requests. Based on the learned KQ requests, we build KQguard to protect KQs. At runtime, KQguard rejects all the unknown KQ requests that cannot be validated. We apply KQguard on WRK and Linux kernel, and extensive experimental evaluation shows that KQguard is efficient (up to 5.6% overhead) and effective (capable of achieving zero false positives against representative benign workloads after appropriate training and very low false negatives against 125 real-world malware and nine synthetic attacks). In our system, Invariant Monitor and KQguard cooperate together to protect data invariants and KQs in the target kernel. By monitoring these integrity properties, we can detect malware by its violation of these integrity properties during execution.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The presence of inhibitory substances in biological forensic samples has, and continues to affect the quality of the data generated following DNA typing processes. Although the chemistries used during the procedures have been enhanced to mitigate the effects of these deleterious compounds, some challenges remain. Inhibitors can be components of the samples, the substrate where samples were deposited or chemical(s) associated to the DNA purification step. Therefore, a thorough understanding of the extraction processes and their ability to handle the various types of inhibitory substances can help define the best analytical processing for any given sample. A series of experiments were conducted to establish the inhibition tolerance of quantification and amplification kits using common inhibitory substances in order to determine if current laboratory practices are optimal for identifying potential problems associated with inhibition. DART mass spectrometry was used to determine the amount of inhibitor carryover after sample purification, its correlation to the initial inhibitor input in the sample and the overall effect in the results. Finally, a novel alternative at gathering investigative leads from samples that would otherwise be ineffective for DNA typing due to the large amounts of inhibitory substances and/or environmental degradation was tested. This included generating data associated with microbial peak signatures to identify locations of clandestine human graves. Results demonstrate that the current methods for assessing inhibition are not necessarily accurate, as samples that appear inhibited in the quantification process can yield full DNA profiles, while those that do not indicate inhibition may suffer from lowered amplification efficiency or PCR artifacts. The extraction methods tested were able to remove >90% of the inhibitors from all samples with the exception of phenol, which was present in variable amounts whenever the organic extraction approach was utilized. Although the results attained suggested that most inhibitors produce minimal effect on downstream applications, analysts should practice caution when selecting the best extraction method for particular samples, as casework DNA samples are often present in small quantities and can contain an overwhelming amount of inhibitory substances.^

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The development cost of any civil infrastructure is very high; during its life span, the civil structure undergoes a lot of physical loads and environmental effects which damage the structure. Failing to identify this damage at an early stage may result in severe property loss and may become a potential threat to people and the environment. Thus, there is a need to develop effective damage detection techniques to ensure the safety and integrity of the structure. One of the Structural Health Monitoring methods to evaluate a structure is by using statistical analysis. In this study, a civil structure measuring 8 feet in length, 3 feet in diameter, embedded with thermocouple sensors at 4 different levels is analyzed under controlled and variable conditions. With the help of statistical analysis, possible damage to the structure was analyzed. The analysis could detect the structural defects at various levels of the structure.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Thanks to the advanced technologies and social networks that allow the data to be widely shared among the Internet, there is an explosion of pervasive multimedia data, generating high demands of multimedia services and applications in various areas for people to easily access and manage multimedia data. Towards such demands, multimedia big data analysis has become an emerging hot topic in both industry and academia, which ranges from basic infrastructure, management, search, and mining to security, privacy, and applications. Within the scope of this dissertation, a multimedia big data analysis framework is proposed for semantic information management and retrieval with a focus on rare event detection in videos. The proposed framework is able to explore hidden semantic feature groups in multimedia data and incorporate temporal semantics, especially for video event detection. First, a hierarchical semantic data representation is presented to alleviate the semantic gap issue, and the Hidden Coherent Feature Group (HCFG) analysis method is proposed to capture the correlation between features and separate the original feature set into semantic groups, seamlessly integrating multimedia data in multiple modalities. Next, an Importance Factor based Temporal Multiple Correspondence Analysis (i.e., IF-TMCA) approach is presented for effective event detection. Specifically, the HCFG algorithm is integrated with the Hierarchical Information Gain Analysis (HIGA) method to generate the Importance Factor (IF) for producing the initial detection results. Then, the TMCA algorithm is proposed to efficiently incorporate temporal semantics for re-ranking and improving the final performance. At last, a sampling-based ensemble learning mechanism is applied to further accommodate the imbalanced datasets. In addition to the multimedia semantic representation and class imbalance problems, lack of organization is another critical issue for multimedia big data analysis. In this framework, an affinity propagation-based summarization method is also proposed to transform the unorganized data into a better structure with clean and well-organized information. The whole framework has been thoroughly evaluated across multiple domains, such as soccer goal event detection and disaster information management.