13 results for Data Mining and its Application

in Digital Commons at Florida International University


Relevance:

100.00%

Abstract:

Metagenomics is the culture-independent study of genetic material obtained directly from environmental samples. It has become a realistic approach to understanding microbial communities thanks to advances in high-throughput DNA sequencing technologies over the past decade. Current research has shown that different sites of the human body house varied bacterial communities. There is a strong correlation between an individual’s microbial community profile at a given site and disease. Metagenomics is being applied more often as a means of comparing microbial profiles in biomedical studies. The analysis of the data collected using metagenomics can be quite challenging and there exist a plethora of tools for interpreting the results. An automatic analytical workflow for metagenomic analyses has been implemented and tested using synthetic datasets of varying quality. It is able to accurately classify bacteria by taxa and correctly estimate the richness and diversity of each set. The workflow was then applied to the study of the airways microbiome in Chronic Obstructive Pulmonary Disease (COPD). COPD is a progressive lung disease resulting in narrowing of the airways and restricted airflow. Despite being the third leading cause of death in the United States, little is known about the differences in the lung microbial community profiles of healthy individuals and COPD patients. Bronchoalveolar lavage (BAL) samples were collected from COPD patients, active or ex-smokers, and never smokers and sequenced by 454 pyrosequencing. A total of 56 individuals were recruited for the study. Substantial colonization of the lungs was found in all subjects and differentially abundant genera in each group were identified. These discoveries are promising and may further our understanding of how the structure of the lung microbiome is modified as COPD progresses. It is also anticipated that the results will eventually lead to improved treatments for COPD.
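The richness and diversity estimates mentioned above correspond to standard ecological indices. As a hedged illustration only (the abstract does not name the estimators the workflow uses), here is a minimal Python sketch computing observed richness and the Shannon diversity index from per-taxon read counts; the function name and the toy sample are hypothetical:

```python
import math

def richness_and_shannon(taxon_counts):
    """Observed richness (number of taxa with nonzero counts) and
    Shannon diversity, H = -sum(p * ln p), from a mapping of
    taxon name -> read count. Hypothetical helper; the workflow
    may use different estimators (e.g., Chao1 for richness)."""
    counts = [c for c in taxon_counts.values() if c > 0]
    total = sum(counts)
    richness = len(counts)
    shannon = -sum((c / total) * math.log(c / total) for c in counts)
    return richness, shannon

# Toy airway sample with three genera
sample = {"Streptococcus": 120, "Haemophilus": 45, "Prevotella": 35}
print(richness_and_shannon(sample))  # -> (3, ~0.95)
```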

Relevance:

100.00%

Abstract:

In essay 1 we develop a new autoregressive conditional process to capture both the changes and the persistence of the intraday seasonal (U-shape) pattern of volatility. Unlike other procedures, this approach allows the intraday volatility pattern to change over time without the filtering process injecting a spurious pattern of noise into the filtered series. We show that prior deterministic filtering procedures are special cases of the autoregressive conditional filtering process presented here. Lagrange multiplier tests show that the stochastic seasonal variance component is statistically significant. Specification tests using the correlogram and cross-spectral analyses confirm the reliability of the autoregressive conditional filtering process. In essay 2 we develop a new methodology to decompose return variance in order to examine the informativeness embedded in the return series. The variance is decomposed into an information arrival component and a noise component. This decomposition differs from previous studies in that both the informational variance and the noise variance are time-varying, and the covariance of the informational and noise components is no longer restricted to be zero. The resulting measure of price informativeness is defined as the informational variance divided by the total variance of the returns. The noisy rational expectations model predicts that uninformed traders react to price changes more than informed traders, since uninformed traders cannot distinguish between price changes caused by information arrivals and price changes caused by noise. This hypothesis is tested in essay 3 using intraday data with the intraday seasonal volatility component removed, based on the procedure in the first essay. The resulting seasonally adjusted variance series is decomposed into components caused by unexpected information arrivals and by noise in order to examine informativeness.
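One plausible formalization of the essay-2 decomposition (the notation is my assumption, not taken from the dissertation): with time-varying informational variance $\sigma_{I,t}^2$, noise variance $\sigma_{N,t}^2$, and an unrestricted covariance term, the total return variance and the informativeness measure are

$$
\sigma_t^2 = \sigma_{I,t}^2 + \sigma_{N,t}^2 + 2\,\mathrm{Cov}(I_t, N_t),
\qquad
\pi_t = \frac{\sigma_{I,t}^2}{\sigma_t^2},
$$

so $\pi_t$ rises toward 1 when information arrivals dominate noise in driving price changes.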


Relevance:

100.00%

Abstract:

This research is part of continued efforts to correlate the hydrology of East Fork Poplar Creek (EFPC) and Bear Creek (BC) with the long-term distribution of mercury within the overland, subsurface, and river sub-domains. The main objective of this study was to add a sedimentation module (ECO Lab) capable of simulating the reactive-transport mercury exchange mechanisms between sediments and porewater throughout the watershed. The enhanced model was then applied to a Total Maximum Daily Load (TMDL) mercury analysis for EFPC. That application used historical precipitation, groundwater-level, river-discharge, and mercury-concentration data retrieved from government databases as model input. The model was streamlined to reduce computational time and executed to predict flow discharges, total mercury concentrations, and flow duration and mercury mass rate curves at key monitoring stations under various hydrological and environmental conditions and scenarios. The computational results provide insight into the relationship between discharges and mercury mass rate curves at stations throughout EFPC, which is important for understanding and supporting the management of mercury contamination and remediation efforts within EFPC.
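The flow duration curves referenced above rank discharges against the percent of time each value is equaled or exceeded. A minimal Python sketch of that standard construction (synthetic data; this is not the dissertation's ECO Lab/watershed model code):

```python
import numpy as np

def flow_duration_curve(discharges):
    """Rank discharges in descending order against the percent of
    time each value is equaled or exceeded, using the Weibull
    plotting position P = rank / (n + 1). Hypothetical helper."""
    q = np.sort(np.asarray(discharges, dtype=float))[::-1]
    n = len(q)
    exceedance = 100.0 * np.arange(1, n + 1) / (n + 1)
    return exceedance, q

# Synthetic daily discharges (m^3/s) for one year
rng = np.random.default_rng(0)
daily_q = rng.lognormal(mean=1.0, sigma=0.6, size=365)
p, q = flow_duration_curve(daily_q)
print(f"median (Q50) flow: {q[np.searchsorted(p, 50)]:.2f} m^3/s")
```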

Relevance:

100.00%

Abstract:

With the explosive growth in the volume and complexity of document data (e.g., news, blogs, web pages), it has become a necessity to semantically understand documents and deliver meaningful information to users. The areas dealing with these problems cross data mining, information retrieval, and machine learning. For example, document clustering and summarization are two fundamental techniques for understanding document data and have attracted much attention in recent years. Given a collection of documents, document clustering aims to partition them into different groups to provide efficient document browsing and navigation mechanisms. One open problem in document clustering is how to generate a meaningful interpretation for each document cluster resulting from the clustering process. Document summarization is another effective technique for document understanding; it generates a summary by selecting sentences that deliver the major or topic-relevant information in the original documents. How to improve automatic summarization performance and how to apply it to newly emerging problems are two valuable research directions. To help people capture the semantics of documents effectively and efficiently, this dissertation focuses on developing effective data mining and machine learning algorithms and systems for (1) integrating document clustering and summarization to obtain meaningful document clusters with summarized interpretations, (2) improving document summarization performance and building document understanding systems to solve real-world applications, and (3) summarizing the differences and evolution of multiple document sources.
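As a hedged sketch of the cluster-plus-interpretation idea described above (using scikit-learn on toy documents; the dissertation's own algorithms are not reproduced here), one can cluster TF-IDF vectors and label each cluster with the top-weighted terms of its centroid:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus; any real document collection would substitute here
docs = [
    "stock market volatility and trading volume",
    "intraday returns and price noise",
    "lung microbiome sequencing study",
    "bacterial community profiles in COPD patients",
]

# Vectorize and cluster the documents
vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(docs)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Interpret each cluster by the highest-weight terms of its centroid
terms = np.array(vec.get_feature_names_out())
for c in range(2):
    top = km.cluster_centers_[c].argsort()[::-1][:3]
    print(f"cluster {c}: {', '.join(terms[top])}")
```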

Relevance:

100.00%

Abstract:

Due to rapid advances in computing and sensing technologies, enormous amounts of data are generated every day in various applications. The integration of data mining and data visualization has been widely used to analyze these massive and complex data sets and discover hidden patterns. For both data mining and visualization to be effective, it is important to include visualization techniques in the mining process and to present the discovered patterns in a more comprehensive visual view. In this dissertation, four related problems are studied to explore the integration of data mining and data visualization: dimensionality reduction for visualizing high-dimensional datasets, visualization-based clustering evaluation, interactive document mining, and exploration of multiple clusterings. In particular, we (1) propose an efficient feature selection method (reliefF + mRMR) for preprocessing high-dimensional datasets; (2) present DClusterE, which integrates cluster validation with user interaction and provides rich visualization tools for users to examine document clustering results from multiple perspectives; (3) design two interactive document summarization systems that involve user effort and generate customized summaries from 2D sentence layouts; and (4) propose a new framework that organizes different input clusterings into a hierarchical tree structure and allows interactive exploration of multiple clustering solutions.
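For item (1), a simplified Python sketch of the mRMR half of the reliefF + mRMR method, assuming discrete features and a greedy relevance-minus-redundancy criterion (the exact scoring and its combination with reliefF are not specified in the abstract):

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def greedy_mrmr(X, y, k):
    """Greedy mRMR on discrete features: at each step pick the
    feature maximizing (relevance - mean redundancy), where
    relevance is MI(feature, label) and redundancy is the mean MI
    with the features already selected. Simplified sketch only."""
    n_features = X.shape[1]
    relevance = np.array(
        [mutual_info_score(X[:, j], y) for j in range(n_features)]
    )
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        candidates = [j for j in range(n_features) if j not in selected]
        scores = [
            relevance[j]
            - np.mean([mutual_info_score(X[:, j], X[:, s])
                       for s in selected])
            for j in candidates
        ]
        selected.append(candidates[int(np.argmax(scores))])
    return selected

# Toy example: 5 discrete features, feature 0 mirrors the label
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 5))
y = X[:, 0]
print(greedy_mrmr(X, y, k=2))  # feature 0 selected first
```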

Relevance:

100.00%

Abstract:

The organizational authority of the Papacy in the Roman Catholic Church and the permanent membership (P-5) of the UN Security Council are distinct from institutions commonly compared with the UN, such as the Concert of Europe and the League of Nations, in that these institutional organs possessed strong authoritative and veto powers. Both organs also owe the strong authority they held at their founding to a need for stability: the Papacy after the crippling of the Western Roman Empire, and the P-5 in dealing with the insecurities of the post-WWII world. While the P-5 still possesses authoritative powers within the Council similar to those it held after WWII, the historical authoritative powers of the Papacy within the Church were debilitated to such a degree that by the time of the Reformation in Europe, condemnations of practices within the Church itself were no longer effective. This paper analyzes major challenges to the authoritative powers of the Papacy, from the crowning of Charlemagne to the beginning of the Reformation, and compares that analysis to challenges affecting the authoritative powers of the P-5 since its creation. From the research conducted thus far, I hypothesize that common themes affecting the authoritative powers of the P-5 and the Papacy include: major changes in the institution's organization (e.g., the Avignon Papacy and Japan's bid to become a permanent member); the decline in power of actors supporting the institutional organ (e.g., the Holy Roman Empire and the P-5 members); and ideological clashes affecting the institution's normative power (e.g., the Great Western Schism and Cold War politics).


Relevance:

100.00%

Abstract:

Electronic database handling of business information has gradually gained popularity in the hospitality industry. This article provides an overview of the fundamental concepts of a hotel database and investigates the feasibility of incorporating computer-assisted data mining techniques into hospitality database applications. The author also exposes some potential myths associated with data mining in hospitality database applications.

Relevance:

100.00%

Abstract:

Online Social Network (OSN) services provided by Internet companies bring people together to chat and share information. Meanwhile, huge amounts of data are generated by those services (which can be regarded as social media) every day, every hour, even every minute and every second. Researchers are currently interested in analyzing OSN data, extracting interesting patterns from it, and applying those patterns to real-world applications. However, due to the large scale of OSN data, it is difficult to analyze effectively. This dissertation focuses on applying data mining and information retrieval techniques to mine two key components of social media data: users and user-generated content. Specifically, it aims at addressing three problems related to social media users and content: (1) how does one organize the users and the content? (2) how does one summarize the textual content so that users do not have to go over every post to capture the general idea? (3) how does one identify the influential users in social media to benefit other applications, e.g., marketing campaigns? The contributions of this dissertation are briefly summarized as follows. (1) It provides a comprehensive and versatile data mining framework to analyze users and user-generated content from social media. (2) It designs a hierarchical co-clustering algorithm to organize the users and content. (3) It proposes multi-document summarization methods to extract core information from social network content. (4) It introduces three important dimensions of social influence and a dynamic influence model for identifying influential users.
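For contribution (2), a flat spectral co-clustering of a user-by-term matrix illustrates the general co-clustering idea; the dissertation's algorithm is hierarchical, which this scikit-learn sketch does not reproduce:

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering

# Block-structured toy user-by-term matrix: users 0-3 favor terms
# 0-5, users 4-7 favor terms 6-11. Rows and columns are clustered
# simultaneously so user groups align with their vocabulary.
matrix = np.ones((8, 12))
matrix[:4, :6] += 5.0
matrix[4:, 6:] += 5.0

model = SpectralCoclustering(n_clusters=2, random_state=0).fit(matrix)
print("user clusters:", model.row_labels_)
print("term clusters:", model.column_labels_)
```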

Relevance:

100.00%

Abstract:

The nation's freeway systems are becoming increasingly congested. A major contributor to freeway congestion is traffic incidents: non-recurring events such as accidents or stranded vehicles that cause a temporary reduction in roadway capacity, and they can account for as much as 60 percent of all traffic congestion on freeways. One major freeway incident management strategy involves diverting traffic away from incident locations by relaying timely information through Intelligent Transportation Systems (ITS) devices such as dynamic message signs or real-time traveler information systems. The decision to divert traffic depends foremost on the expected duration of an incident, which is difficult to predict. In addition, the duration of an incident is affected by many contributing factors; determining and understanding these factors can help identify and develop better strategies to reduce incident durations and alleviate traffic congestion. A number of research studies have attempted to develop models to predict incident durations, yet with limited success. This dissertation research attempts to improve on these previous efforts by applying data mining techniques to a comprehensive incident database maintained by the District 4 ITS Office of the Florida Department of Transportation (FDOT). Two categories of incident duration prediction models were developed: "offline" models designed for use in the performance evaluation of incident management programs, and "online" models for real-time prediction of incident duration to aid decision making about traffic diversion during an ongoing incident. Multiple data mining techniques were applied and evaluated in the research: multiple linear regression analysis and a decision-tree-based method were used to develop the offline models, and a rule-based method and a tree algorithm called M5P were used to develop the online models. The results show that the models can in general achieve high prediction accuracy within acceptable time intervals of the actual durations. The research also identifies some new contributing factors that have not been examined in past studies. As part of the research effort, software code was developed to implement the models in District 4 FDOT's existing software system for actual applications.
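A minimal sketch of an offline decision-tree duration model in the spirit described above, trained on synthetic incident records (the feature set and data are illustrative assumptions, not the FDOT database or the dissertation's models):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic incident records: [lanes_blocked, is_peak_hour, severity],
# with duration (minutes) loosely increasing in each factor.
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    rng.integers(0, 4, n),   # lanes blocked
    rng.integers(0, 2, n),   # peak-hour flag
    rng.integers(1, 4, n),   # severity level
])
duration = (15 + 10 * X[:, 0] + 8 * X[:, 1] + 12 * X[:, 2]
            + rng.normal(0, 5, n))

# Fit a shallow regression tree and check held-out accuracy
X_tr, X_te, y_tr, y_te = train_test_split(X, duration, random_state=0)
tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X_tr, y_tr)
print(f"R^2 on held-out incidents: {tree.score(X_te, y_te):.2f}")
```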

Relevance:

100.00%

Abstract:

Ecotourism, a new term for low-impact nature travel, is receiving increasing attention. The author researched the development of the U.S. ecotourism market from 1980 to 1989 in order to obtain data on the growth of this market segment. Factors involved in the growth of the U.S. ecotourism market are then examined in order to project the growth of this market during the 1990s.
