871 resultados para Pattern mining, Information filtering, User profile, Threshold


Relevância:

40.00% 40.00%

Publicador:

Resumo:

We develop a new autoregressive conditional process to capture both the changes and the persistency of the intraday seasonal (U-shape) pattern of volatility in essay 1. Unlike other procedures, this approach allows for the intraday volatility pattern to change over time without the filtering process injecting a spurious pattern of noise into the filtered series. We show that prior deterministic filtering procedures are special cases of the autoregressive conditional filtering process presented here. Lagrange multiplier tests prove that the stochastic seasonal variance component is statistically significant. Specification tests using the correlogram and cross-spectral analyses prove the reliability of the autoregressive conditional filtering process. In essay 2 we develop a new methodology to decompose return variance in order to examine the informativeness embedded in the return series. The variance is decomposed into the information arrival component and the noise factor component. This decomposition methodology differs from previous studies in that both the informational variance and the noise variance are time-varying. Furthermore, the covariance of the informational component and the noisy component is no longer restricted to be zero. The resultant measure of price informativeness is defined as the informational variance divided by the total variance of the returns. The noisy rational expectations model predicts that uninformed traders react to price changes more than informed traders, since uninformed traders cannot distinguish between price changes caused by information arrivals and price changes caused by noise. This hypothesis is tested in essay 3 using intraday data with the intraday seasonal volatility component removed, as based on the procedure in the first essay. The resultant seasonally adjusted variance series is decomposed into components caused by unexpected information arrivals and by noise in order to examine informativeness.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Due to the rapid advances in computing and sensing technologies, enormous amounts of data are being generated everyday in various applications. The integration of data mining and data visualization has been widely used to analyze these massive and complex data sets to discover hidden patterns. For both data mining and visualization to be effective, it is important to include the visualization techniques in the mining process and to generate the discovered patterns for a more comprehensive visual view. In this dissertation, four related problems: dimensionality reduction for visualizing high dimensional datasets, visualization-based clustering evaluation, interactive document mining, and multiple clusterings exploration are studied to explore the integration of data mining and data visualization. In particular, we 1) propose an efficient feature selection method (reliefF + mRMR) for preprocessing high dimensional datasets; 2) present DClusterE to integrate cluster validation with user interaction and provide rich visualization tools for users to examine document clustering results from multiple perspectives; 3) design two interactive document summarization systems to involve users efforts and generate customized summaries from 2D sentence layouts; and 4) propose a new framework which organizes the different input clusterings into a hierarchical tree structure and allows for interactive exploration of multiple clustering solutions.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Physiological signals, which are controlled by the autonomic nervous system (ANS), could be used to detect the affective state of computer users and therefore find applications in medicine and engineering. The Pupil Diameter (PD) seems to provide a strong indication of the affective state, as found by previous research, but it has not been investigated fully yet. ^ In this study, new approaches based on monitoring and processing the PD signal for off-line and on-line affective assessment ("relaxation" vs. "stress") are proposed. Wavelet denoising and Kalman filtering methods are first used to remove abrupt changes in the raw Pupil Diameter (PD) signal. Then three features (PDmean, PDmax and PDWalsh) are extracted from the preprocessed PD signal for the affective state classification. In order to select more relevant and reliable physiological data for further analysis, two types of data selection methods are applied, which are based on the paired t-test and subject self-evaluation, respectively. In addition, five different kinds of the classifiers are implemented on the selected data, which achieve average accuracies up to 86.43% and 87.20%, respectively. Finally, the receiver operating characteristic (ROC) curve is utilized to investigate the discriminating potential of each individual feature by evaluation of the area under the ROC curve, which reaches values above 0.90. ^ For the on-line affective assessment, a hard threshold is implemented first in order to remove the eye blinks from the PD signal and then a moving average window is utilized to obtain the representative value PDr for every one-second time interval of PD. There are three main steps for the on-line affective assessment algorithm, which are preparation, feature-based decision voting and affective determination. The final results show that the accuracies are 72.30% and 73.55% for the data subsets, which were respectively chosen using two types of data selection methods (paired t-test and subject self-evaluation). ^ In order to further analyze the efficiency of affective recognition through the PD signal, the Galvanic Skin Response (GSR) was also monitored and processed. The highest affective assessment classification rate obtained from GSR processing is only 63.57% (based on the off-line processing algorithm). The overall results confirm that the PD signal should be considered as one of the most powerful physiological signals to involve in future automated real-time affective recognition systems, especially for detecting the "relaxation" vs. "stress" states.^

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Online Social Network (OSN) services provided by Internet companies bring people together to chat, share the information, and enjoy the information. Meanwhile, huge amounts of data are generated by those services (they can be regarded as the social media ) every day, every hour, even every minute, and every second. Currently, researchers are interested in analyzing the OSN data, extracting interesting patterns from it, and applying those patterns to real-world applications. However, due to the large-scale property of the OSN data, it is difficult to effectively analyze it. This dissertation focuses on applying data mining and information retrieval techniques to mine two key components in the social media data — users and user-generated contents. Specifically, it aims at addressing three problems related to the social media users and contents: (1) how does one organize the users and the contents? (2) how does one summarize the textual contents so that users do not have to go over every post to capture the general idea? (3) how does one identify the influential users in the social media to benefit other applications, e.g., Marketing Campaign? The contribution of this dissertation is briefly summarized as follows. (1) It provides a comprehensive and versatile data mining framework to analyze the users and user-generated contents from the social media. (2) It designs a hierarchical co-clustering algorithm to organize the users and contents. (3) It proposes multi-document summarization methods to extract core information from the social network contents. (4) It introduces three important dimensions of social influence, and a dynamic influence model for identifying influential users.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Protecting confidential information from improper disclosure is a fundamental security goal. While encryption and access control are important tools for ensuring confidentiality, they cannot prevent an authorized system from leaking confidential information to its publicly observable outputs, whether inadvertently or maliciously. Hence, secure information flow aims to provide end-to-end control of information flow. Unfortunately, the traditionally-adopted policy of noninterference, which forbids all improper leakage, is often too restrictive. Theories of quantitative information flow address this issue by quantifying the amount of confidential information leaked by a system, with the goal of showing that it is intuitively "small" enough to be tolerated. Given such a theory, it is crucial to develop automated techniques for calculating the leakage in a system. ^ This dissertation is concerned with program analysis for calculating the maximum leakage, or capacity, of confidential information in the context of deterministic systems and under three proposed entropy measures of information leakage: Shannon entropy leakage, min-entropy leakage, and g-leakage. In this context, it turns out that calculating the maximum leakage of a program reduces to counting the number of possible outputs that it can produce. ^ The new approach introduced in this dissertation is to determine two-bit patterns, the relationships among pairs of bits in the output; for instance we might determine that two bits must be unequal. By counting the number of solutions to the two-bit patterns, we obtain an upper bound on the number of possible outputs. Hence, the maximum leakage can be bounded. We first describe a straightforward computation of the two-bit patterns using an automated prover. We then show a more efficient implementation that uses an implication graph to represent the two- bit patterns. It efficiently constructs the graph through the use of an automated prover, random executions, STP counterexamples, and deductive closure. The effectiveness of our techniques, both in terms of efficiency and accuracy, is shown through a number of case studies found in recent literature. ^

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Many systems and applications are continuously producing events. These events are used to record the status of the system and trace the behaviors of the systems. By examining these events, system administrators can check the potential problems of these systems. If the temporal dynamics of the systems are further investigated, the underlying patterns can be discovered. The uncovered knowledge can be leveraged to predict the future system behaviors or to mitigate the potential risks of the systems. Moreover, the system administrators can utilize the temporal patterns to set up event management rules to make the system more intelligent. With the popularity of data mining techniques in recent years, these events grad- ually become more and more useful. Despite the recent advances of the data mining techniques, the application to system event mining is still in a rudimentary stage. Most of works are still focusing on episodes mining or frequent pattern discovering. These methods are unable to provide a brief yet comprehensible summary to reveal the valuable information from the high level perspective. Moreover, these methods provide little actionable knowledge to help the system administrators to better man- age the systems. To better make use of the recorded events, more practical techniques are required. From the perspective of data mining, three correlated directions are considered to be helpful for system management: (1) Provide concise yet comprehensive summaries about the running status of the systems; (2) Make the systems more intelligence and autonomous; (3) Effectively detect the abnormal behaviors of the systems. Due to the richness of the event logs, all these directions can be solved in the data-driven manner. And in this way, the robustness of the systems can be enhanced and the goal of autonomous management can be approached. This dissertation mainly focuses on the foregoing directions that leverage tem- poral mining techniques to facilitate system management. More specifically, three concrete topics will be discussed, including event, resource demand prediction, and streaming anomaly detection. Besides the theoretic contributions, the experimental evaluation will also be presented to demonstrate the effectiveness and efficacy of the corresponding solutions.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

We develop a new autoregressive conditional process to capture both the changes and the persistency of the intraday seasonal (U-shape) pattern of volatility in essay 1. Unlike other procedures, this approach allows for the intraday volatility pattern to change over time without the filtering process injecting a spurious pattern of noise into the filtered series. We show that prior deterministic filtering procedures are special cases of the autoregressive conditional filtering process presented here. Lagrange multiplier tests prove that the stochastic seasonal variance component is statistically significant. Specification tests using the correlogram and cross-spectral analyses prove the reliability of the autoregressive conditional filtering process. In essay 2 we develop a new methodology to decompose return variance in order to examine the informativeness embedded in the return series. The variance is decomposed into the information arrival component and the noise factor component. This decomposition methodology differs from previous studies in that both the informational variance and the noise variance are time-varying. Furthermore, the covariance of the informational component and the noisy component is no longer restricted to be zero. The resultant measure of price informativeness is defined as the informational variance divided by the total variance of the returns. The noisy rational expectations model predicts that uninformed traders react to price changes more than informed traders, since uninformed traders cannot distinguish between price changes caused by information arrivals and price changes caused by noise. This hypothesis is tested in essay 3 using intraday data with the intraday seasonal volatility component removed, as based on the procedure in the first essay. The resultant seasonally adjusted variance series is decomposed into components caused by unexpected information arrivals and by noise in order to examine informativeness.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Essai doctoral d'intégration présenté à la Faculté des Études Supérieures et Postdoctorales en vue de l'obtention du grade de Docteur en psychologie (D.Psy.), en psychologie clinique

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Essai doctoral d'intégration présenté à la Faculté des Études Supérieures et Postdoctorales en vue de l'obtention du grade de Docteur en psychologie (D.Psy.), en psychologie clinique

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In this paper, we consider a multiuser downlink wiretap network consisting of one base station (BS) equipped with AA antennas, NB single-antenna legitimate users, and NE single-antenna eavesdroppers over Nakagami-m fading channels. In particular, we introduce a joint secure transmission scheme that adopts transmit antenna selection (TAS) at the BS and explores threshold-based selection diversity (tSD) scheduling over legitimate users to achieve a good secrecy performance while maintaining low implementation complexity. More specifically, in an effort to quantify the secrecy performance of the considered system, two practical scenarios are investigated, i.e., Scenario I: the eavesdropper’s channel state information (CSI) is unavailable at the BS, and Scenario II: the eavesdropper’s CSI is available at the BS. For Scenario I, novel exact closed-form expressions of the secrecy outage probability are derived, which are valid for general networks with an arbitrary number of legitimate users, antenna configurations, number of eavesdroppers, and the switched threshold. For Scenario II, we take into account the ergodic secrecy rate as the principle performance metric, and derive novel closed-form expressions of the exact ergodic secrecy rate. Additionally, we also provide simple and asymptotic expressions for secrecy outage probability and ergodic secrecy rate under two distinct cases, i.e., Case I: the legitimate user is located close to the BS, and Case II: both the legitimate user and eavesdropper are located close to the BS. Our important findings reveal that the secrecy diversity order is AAmA and the slope of secrecy rate is one under Case I, while the secrecy diversity order and the slope of secrecy rate collapse to zero under Case II, where the secrecy performance floor occurs. Finally, when the switched threshold is carefully selected, the considered scheduling scheme outperforms other well known existing schemes in terms of the secrecy performance and complexity tradeoff

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Recommendation systems aim to help users make decisions more efficiently. The most widely used method in recommendation systems is collaborative filtering, of which, a critical step is to analyze a user's preferences and make recommendations of products or services based on similarity analysis with other users' ratings. However, collaborative filtering is less usable for recommendation facing the "cold start" problem, i.e. few comments being given to products or services. To tackle this problem, we propose an improved method that combines collaborative filtering and data classification. We use hotel recommendation data to test the proposed method. The accuracy of the recommendation is determined by the rankings. Evaluations regarding the accuracies of Top-3 and Top-10 recommendation lists using the 10-fold cross-validation method and ROC curves are conducted. The results show that the Top-3 hotel recommendation list proposed by the combined method has the superiority of the recommendation performance than the Top-10 list under the cold start condition in most of the times.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

With the dramatic growth of text information, there is an increasing need for powerful text mining systems that can automatically discover useful knowledge from text. Text is generally associated with all kinds of contextual information. Those contexts can be explicit, such as the time and the location where a blog article is written, and the author(s) of a biomedical publication, or implicit, such as the positive or negative sentiment that an author had when she wrote a product review; there may also be complex context such as the social network of the authors. Many applications require analysis of topic patterns over different contexts. For instance, analysis of search logs in the context of the user can reveal how we can improve the quality of a search engine by optimizing the search results according to particular users; analysis of customer reviews in the context of positive and negative sentiments can help the user summarize public opinions about a product; analysis of blogs or scientific publications in the context of a social network can facilitate discovery of more meaningful topical communities. Since context information significantly affects the choices of topics and language made by authors, in general, it is very important to incorporate it into analyzing and mining text data. In general, modeling the context in text, discovering contextual patterns of language units and topics from text, a general task which we refer to as Contextual Text Mining, has widespread applications in text mining. In this thesis, we provide a novel and systematic study of contextual text mining, which is a new paradigm of text mining treating context information as the ``first-class citizen.'' We formally define the problem of contextual text mining and its basic tasks, and propose a general framework for contextual text mining based on generative modeling of text. This conceptual framework provides general guidance on text mining problems with context information and can be instantiated into many real tasks, including the general problem of contextual topic analysis. We formally present a functional framework for contextual topic analysis, with a general contextual topic model and its various versions, which can effectively solve the text mining problems in a lot of real world applications. We further introduce general components of contextual topic analysis, by adding priors to contextual topic models to incorporate prior knowledge, regularizing contextual topic models with dependency structure of context, and postprocessing contextual patterns to extract refined patterns. The refinements on the general contextual topic model naturally lead to a variety of probabilistic models which incorporate different types of context and various assumptions and constraints. These special versions of the contextual topic model are proved effective in a variety of real applications involving topics and explicit contexts, implicit contexts, and complex contexts. We then introduce a postprocessing procedure for contextual patterns, by generating meaningful labels for multinomial context models. This method provides a general way to interpret text mining results for real users. By applying contextual text mining in the ``context'' of other text information management tasks, including ad hoc text retrieval and web search, we further prove the effectiveness of contextual text mining techniques in a quantitative way with large scale datasets. The framework of contextual text mining not only unifies many explorations of text analysis with context information, but also opens up many new possibilities for future research directions in text mining.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

The search for patterns or motifs in data represents an area of key interest to many researchers. In this paper we present the Motif Tracking Algorithm, a novel immune inspired pattern identification tool that is able to identify unknown motifs which repeat within time series data. The power of the algorithm is derived from its use of a small number of parameters with minimal assumptions. The algorithm searches from a completely neutral perspective that is independent of the data being analysed and the underlying motifs. In this paper the motif tracking algorithm is applied to the search for patterns within sequences of low level system calls between the Linux kernel and the operating system’s user space. The MTA is able to compress data found in large system call data sets to a limited number of motifs which summarise that data. The motifs provide a resource from which a profile of executed processes can be built. The potential for these profiles and new implications for security research are highlighted. A higher level system call language for measuring similarity between patterns of such calls is also suggested.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

The Library of the Institute of Alajuela made an induction experience and training of users and ventured into the information literacy and engaged in the work of the teaching-learning as an integral part of the curriculum. The actions of the library in developing search strategies, location, selection and use of information brought inthe health service, changes to the role of the library, the librarian, the book and the information in the educational environment.By sharing this experience is intended to provide information that can motivate staff of educational institutions that wish toenter the field of information literacy as a strategy to support the development oflifelong independent learning skills and meaningful learning. Currently, the library should be a proactive part in the education of students but also teachers, administrative and family.This will result in a benefit to Costa Rica: the development of youth and their proper integration into the workplace.