4 results for multiple data

in Digital Commons at Florida International University


Relevance:

80.00%

Publisher:

Abstract:

The primary aim of this dissertation is to develop data mining tools for knowledge discovery in biomedical data when multiple (homogeneous or heterogeneous) sources of data are available. The central hypothesis is that, when information from multiple sources of data is used appropriately and effectively, knowledge discovery can be better achieved than is possible from only a single source.

Recent advances in high-throughput technology have enabled biomedical researchers to generate large volumes of diverse types of data on a genome-wide scale. These data include DNA sequences, gene expression measurements, and much more; they provide the motivation for building analysis tools to elucidate the modular organization of the cell. The challenges include efficiently and accurately extracting information from the multiple data sources, representing the information effectively, developing analytical tools, and interpreting the results in the context of the domain.

The first part considers the application of feature-level integration to design classifiers that discriminate between soil types. The machine learning tools SVM and KNN were used to successfully distinguish between several soil samples.

The second part considers clustering using multiple heterogeneous data sources. The resulting Multi-Source Clustering (MSC) algorithm was shown to perform better than clustering methods that use only a single data source or a simple feature-level integration of heterogeneous data sources.

The third part proposes a new approach to effectively incorporate incomplete data into clustering analysis. Adapted from the K-means algorithm, the Generalized Constrained Clustering (GCC) algorithm makes use of incomplete data in the form of constraints to perform exploratory analysis. Novel approaches for extracting constraints were proposed. For sufficiently large constraint sets, the GCC algorithm outperformed the MSC algorithm.

The last part considers the problem of providing a theme-specific environment for mining multi-source biomedical data. The database called PlasmoTFBM, focusing on gene regulation of Plasmodium falciparum, contains diverse information and has a simple interface to allow biologists to explore the data. It provided a framework for comparing different analytical tools for predicting regulatory elements and for designing useful data mining tools.

The conclusion is that the experiments reported in this dissertation strongly support the central hypothesis.
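The feature-level integration mentioned in the first part of this abstract can be illustrated with a minimal sketch: feature vectors from two sources describing the same samples are concatenated, and a K-nearest-neighbor classifier votes on the combined vectors. This is not the dissertation's implementation. The function names, the toy data, and the class labels are all hypothetical, and a real pipeline would normally rescale each source before concatenation.

```python
from collections import Counter
from math import dist


def integrate(source_a, source_b):
    """Feature-level integration: concatenate per-sample feature vectors."""
    return [list(a) + list(b) for a, b in zip(source_a, source_b)]


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two sources describing the same four samples.
source_a = [[0.1, 0.2], [0.0, 0.3], [0.9, 0.8], [1.0, 0.7]]
source_b = [[5.0], [5.2], [1.0], [1.1]]
labels = ["type_1", "type_1", "type_2", "type_2"]

train_x = integrate(source_a, source_b)

# A new sample measured by both sources, integrated the same way.
query = integrate([[0.95, 0.75]], [[1.05]])[0]
predicted = knn_predict(train_x, labels, query, k=3)
print(predicted)
```

Concatenation is the simplest form of integration; it treats every feature from every source as equally scaled, which is why the abstract's later parts explore clustering methods that combine sources differently.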

Relevance:

70.00%

Publisher:

Abstract:

The nation's freeway systems are becoming increasingly congested, and traffic incidents are a major contributor to that congestion. Traffic incidents are non-recurring events such as accidents or stranded vehicles that cause a temporary roadway capacity reduction, and they can account for as much as 60 percent of all traffic congestion on freeways. One major freeway incident management strategy involves diverting traffic to avoid incident locations by relaying timely information through Intelligent Transportation Systems (ITS) devices such as dynamic message signs or real-time traveler information systems. The decision to divert traffic depends foremost on the expected duration of an incident, which is difficult to predict. In addition, the duration of an incident is affected by many contributing factors. Determining and understanding these factors can help in identifying and developing better strategies to reduce incident durations and alleviate traffic congestion. A number of research studies have attempted to develop models to predict incident durations, with limited success.

This dissertation research attempts to improve on these previous efforts by applying data mining techniques to a comprehensive incident database maintained by the District 4 ITS Office of the Florida Department of Transportation (FDOT). Two categories of incident duration prediction models were developed: "offline" models designed for use in the performance evaluation of incident management programs, and "online" models for real-time prediction of incident duration to aid decisions about traffic diversion during an ongoing incident. Multiple data mining analysis techniques were applied and evaluated in the research. Multiple linear regression analysis and a decision-tree-based method were applied to develop the offline models, and a rule-based method and a tree algorithm called M5P were used to develop the online models.

The results show that the models in general can achieve high prediction accuracy within acceptable time intervals of the actual durations. The research also identifies some new contributing factors that have not been examined in past studies. As part of the research effort, software code was developed to implement the models in the existing software system of District 4 FDOT for actual applications.
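One plausible reading of "prediction accuracy within acceptable time intervals" is the share of predictions that fall within a fixed tolerance of the actual duration. The sketch below illustrates that reading only; the abstract does not state how accuracy was defined, and the function name, the tolerance value, and the durations are hypothetical.

```python
def accuracy_within_tolerance(actual, predicted, tolerance_minutes):
    """Return the fraction of predictions within +/- tolerance of the actual value."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    hits = sum(abs(a - p) <= tolerance_minutes for a, p in zip(actual, predicted))
    return hits / len(actual)


# Hypothetical incident durations in minutes (not taken from the dissertation).
actual = [30, 45, 60, 120]
predicted = [35, 70, 55, 100]

share = accuracy_within_tolerance(actual, predicted, tolerance_minutes=15)
print(f"{share:.0%} of predictions fall within 15 minutes of the actual duration")
```

Reporting the share at several tolerances (for example 15, 30, and 60 minutes) gives a fuller picture than a single error average, since a diversion decision mainly depends on whether the estimate is close enough to act on.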

Relevance:

60.00%

Publisher:

Abstract:

The purpose of this study was to develop, explicate, and validate a comprehensive model to more effectively assess community injury prevention needs, plan and target efforts, identify potential interventions, and provide a framework for an outcome-based evaluation of the effectiveness of interventions. A systems model approach was developed to conceptualize the major components of inputs, efforts, outcomes, and feedback within a community setting. Profiling of multiple data sources demonstrated a community feedback mechanism that increased awareness of priority issues and elicited support from traditional as well as non-traditional injury prevention partners. Injury countermeasures including education, enforcement, engineering, and economic incentives were presented for their potential synergistic effect on the knowledge, attitudes, or behaviors of a targeted population. Levels of outcome data were classified into ultimate, intermediate, and immediate indicators to assist with determining the effectiveness of intervention efforts. A collaboration between business and health care succeeded in gaining access to emergency-department-level injury data and using it to monitor the impact of community interventions. Evaluation of injury events and preventive efforts within the context of a dynamic community systems environment was applied to a study community, with examples detailing actual profiling and trending of injuries. The resulting model of community injury prevention was validated using a community focus group, community injury prevention coordinators, and national injury prevention experts.

Relevance:

60.00%

Publisher:

Abstract:

Since the 1980s, governments and organizations have promoted cash transfers in education as a tool for motivating elementary-aged children to attend school. Oftentimes, the monthly payments supplemented the income a child would be making in the labor market. In Brazil, where these Bolsa or grant programs were pioneered, there has been much success in removing children from harsh labor conditions and increasing enrollment rates among the poorest families. However, the capacity of Bolsa Escola programs to meet other objectives, such as improving educational outcomes and reducing the incidence of poverty, continues to be examined. As these programs continue to be adopted globally, funding millions of children and families, evidence that demonstrates such success becomes ever more imperative. This study, therefore, examined evidence to determine whether Bolsa Escola programs have a significant impact on the academic performance of beneficiaries in Brazil.

Over the course of three data collection phases, multiple data sources were used to demonstrate the academic performance of fourth- and eighth-grade Brazilian students who were eligible to participate in either an NGO or the federal cash transfer program. MANOVAs were conducted separately for fourth- and eighth-grade data to determine whether significant differences existed between measures of academic performance of Bolsa and non-Bolsa students. In every case and for both grade levels, significant effects were found for participation.

The limited qualitative data collected did not support drawing conclusions. Thematic analysis of the limited interview data pointed to possible dependency on Bolsa monthly stipends, and to a reallocation of responsibilities in the home in cases where children shifted from being breadwinners to students.