12 resultados para data integration
em Digital Commons at Florida International University
Resumo:
An Automatic Vehicle Location (AVL) system is a computer-based vehicle tracking system that is capable of determining a vehicle's location in real time. As a major technology of the Advanced Public Transportation System (APTS), AVL systems have been widely deployed by transit agencies for purposes such as real-time operation monitoring, computer-aided dispatching, and arrival time prediction. AVL systems make a large amount of transit performance data available that are valuable for transit performance management and planning purposes. However, the difficulties of extracting useful information from the huge spatial-temporal database have hindered off-line applications of the AVL data. ^ In this study, a data mining process, including data integration, cluster analysis, and multiple regression, is proposed. The AVL-generated data are first integrated into a Geographic Information System (GIS) platform. The model-based cluster method is employed to investigate the spatial and temporal patterns of transit travel speeds, which may be easily translated into travel time. The transit speed variations along the route segments are identified. Transit service periods such as morning peak, mid-day, afternoon peak, and evening periods are determined based on analyses of transit travel speed variations for different times of day. The seasonal patterns of transit performance are investigated by using the analysis of variance (ANOVA). Travel speed models based on the clustered time-of-day intervals are developed using important factors identified as having significant effects on speed for different time-of-day periods. ^ It has been found that transit performance varied from different seasons and different time-of-day periods. The geographic location of a transit route segment also plays a role in the variation of the transit performance. The results of this research indicate that advanced data mining techniques have good potential in providing automated techniques of assisting transit agencies in service planning, scheduling, and operations control. ^
Resumo:
The mediator software architecture design has been developed to provide data integration and retrieval in distributed, heterogeneous environments. Since the initial conceptualization of this architecture, many new technologies have emerged that can facilitate the implementation of this design. The purpose of this thesis was to show that a mediator framework supporting users of mobile devices could be implemented using common software technologies available today. In addition, the prototype was developed with a view to providing a better understanding of what a mediator is and to expose issues that will have to be addressed in full, more robust designs. The prototype developed for this thesis was implemented using various technologies including: Java, XML, and Simple Object Access Protocol (SOAP) among others. SOAP was used to accomplish inter-process communication. In the end, it is expected that more data intensive software applications will be possible in a world with ever-increasing demands for information.
Resumo:
This study analyzed the health and overall landcover of citrus crops in Florida. The analysis was completed using Landsat satellite imagery available free of charge from the University of Maryland Global Landcover Change Facility. The project hypothesized that combining citrus production (economic) data with citrus area per county derived from spectral signatures would yield correlations between observable spectral reflectance throughout the year, and the fiscal impact of citrus on local economies. A positive correlation between these two data types would allow us to predict the economic impact of citrus using spectral data analysis to determine final crop harvests.
Resumo:
Due to the rapid advances in computing and sensing technologies, enormous amounts of data are being generated everyday in various applications. The integration of data mining and data visualization has been widely used to analyze these massive and complex data sets to discover hidden patterns. For both data mining and visualization to be effective, it is important to include the visualization techniques in the mining process and to generate the discovered patterns for a more comprehensive visual view. In this dissertation, four related problems: dimensionality reduction for visualizing high dimensional datasets, visualization-based clustering evaluation, interactive document mining, and multiple clusterings exploration are studied to explore the integration of data mining and data visualization. In particular, we 1) propose an efficient feature selection method (reliefF + mRMR) for preprocessing high dimensional datasets; 2) present DClusterE to integrate cluster validation with user interaction and provide rich visualization tools for users to examine document clustering results from multiple perspectives; 3) design two interactive document summarization systems to involve users efforts and generate customized summaries from 2D sentence layouts; and 4) propose a new framework which organizes the different input clusterings into a hierarchical tree structure and allows for interactive exploration of multiple clustering solutions.
Resumo:
Today, databases have become an integral part of information systems. In the past two decades, we have seen different database systems being developed independently and used in different applications domains. Today's interconnected networks and advanced applications, such as data warehousing, data mining & knowledge discovery and intelligent data access to information on the Web, have created a need for integrated access to such heterogeneous, autonomous, distributed database systems. Heterogeneous/multidatabase research has focused on this issue resulting in many different approaches. However, a single, generally accepted methodology in academia or industry has not emerged providing ubiquitous intelligent data access from heterogeneous, autonomous, distributed information sources. ^ This thesis describes a heterogeneous database system being developed at High-performance Database Research Center (HPDRC). A major impediment to ubiquitous deployment of multidatabase technology is the difficulty in resolving semantic heterogeneity. That is, identifying related information sources for integration and querying purposes. Our approach considers the semantics of the meta-data constructs in resolving this issue. The major contributions of the thesis work include: (i) providing a scalable, easy-to-implement architecture for developing a heterogeneous multidatabase system, utilizing Semantic Binary Object-oriented Data Model (Sem-ODM) and Semantic SQL query language to capture the semantics of the data sources being integrated and to provide an easy-to-use query facility; (ii) a methodology for semantic heterogeneity resolution by investigating into the extents of the meta-data constructs of component schemas. This methodology is shown to be correct, complete and unambiguous; (iii) a semi-automated technique for identifying semantic relations, which is the basis of semantic knowledge for integration and querying, using shared ontologies for context-mediation; (iv) resolutions for schematic conflicts and a language for defining global views from a set of component Sem-ODM schemas; (v) design of a knowledge base for storing and manipulating meta-data and knowledge acquired during the integration process. This knowledge base acts as the interface between integration and query processing modules; (vi) techniques for Semantic SQL query processing and optimization based on semantic knowledge in a heterogeneous database environment; and (vii) a framework for intelligent computing and communication on the Internet applying the concepts of our work. ^
Resumo:
The primary aim of this dissertation is to develop data mining tools for knowledge discovery in biomedical data when multiple (homogeneous or heterogeneous) sources of data are available. The central hypothesis is that, when information from multiple sources of data are used appropriately and effectively, knowledge discovery can be better achieved than what is possible from only a single source. ^ Recent advances in high-throughput technology have enabled biomedical researchers to generate large volumes of diverse types of data on a genome-wide scale. These data include DNA sequences, gene expression measurements, and much more; they provide the motivation for building analysis tools to elucidate the modular organization of the cell. The challenges include efficiently and accurately extracting information from the multiple data sources; representing the information effectively, developing analytical tools, and interpreting the results in the context of the domain. ^ The first part considers the application of feature-level integration to design classifiers that discriminate between soil types. The machine learning tools, SVM and KNN, were used to successfully distinguish between several soil samples. ^ The second part considers clustering using multiple heterogeneous data sources. The resulting Multi-Source Clustering (MSC) algorithm was shown to have a better performance than clustering methods that use only a single data source or a simple feature-level integration of heterogeneous data sources. ^ The third part proposes a new approach to effectively incorporate incomplete data into clustering analysis. Adapted from K-means algorithm, the Generalized Constrained Clustering (GCC) algorithm makes use of incomplete data in the form of constraints to perform exploratory analysis. Novel approaches for extracting constraints were proposed. For sufficiently large constraint sets, the GCC algorithm outperformed the MSC algorithm. ^ The last part considers the problem of providing a theme-specific environment for mining multi-source biomedical data. The database called PlasmoTFBM, focusing on gene regulation of Plasmodium falciparum, contains diverse information and has a simple interface to allow biologists to explore the data. It provided a framework for comparing different analytical tools for predicting regulatory elements and for designing useful data mining tools. ^ The conclusion is that the experiments reported in this dissertation strongly support the central hypothesis.^
Resumo:
Natural and man-made disasters have gained attention at all levels of policy-making in recent years. Emergency management tasks are inherently complex and unpredictable, and often require coordination among multiple organizations across different levels and locations. Effectively managing various knowledge areas and the organizations involved has become a critical emergency management success factor. However, there is a general lack of understanding about how to describe and assess the complex nature of emergency management tasks and how knowledge integration can help managers improve emergency management task performance. ^ The purpose of this exploratory research was first, to understand how emergency management operations are impacted by tasks that are complex and inter-organizational and second, to investigate how knowledge integration as a particular knowledge management strategy can improve the efficiency and effectiveness of the emergency tasks. Three types of specific knowledge were considered: context-specific, technology-specific, and context-and-technology-specific. ^ The research setting was the Miami-Dade Emergency Operations Center (EOC) and the study was based on the survey responses from the participants in past EOC activations related to their emergency tasks and knowledge areas. The data included task attributes related to complexity, knowledge area, knowledge integration, specificity of knowledge, and task performance. The data was analyzed using multiple linear regressions and path analyses, to (1) examine the relationships between task complexity, knowledge integration, and performance, (2) the moderating effects of each type of specific knowledge on the relationship between task complexity and performance, and (3) the mediating role of knowledge integration. ^ As per theory-based propositions, the results indicated that overall component complexity and interactive complexity tend to have a negative effect on task performance. But surprisingly, procedural rigidity tended to have a positive effect on performance in emergency management tasks. Also as per our expectation, knowledge integration had a positive relationship with task performance. Interestingly, the moderating effects of each type of specific knowledge on the relationship between task complexity and performance were varied and the extent of mediation of knowledge integration depended on the dimension of task complexity. ^
Resumo:
The No Child Left Behind Act of 2001 (NCLB) brought many significant changes to American schools including accessibility to technology. Through an extensive literature review of the relationship between technology leadership and student achievement, five major themes emerged from data that support the need for more effective computer-based education in schools.
Resumo:
This dissertation examines the relationship between the degree of openness to international trade and the level of growth experienced by a country. More precisely, it explores how trade liberalization and economic integration can lead to specialization in production, affect national levels of welfare, productivity, and competition and at the same time reinforce deflationary efforts. A large part of this investigation is carried out using industry-level data from Spain. ^
Resumo:
Today, databases have become an integral part of information systems. In the past two decades, we have seen different database systems being developed independently and used in different applications domains. Today's interconnected networks and advanced applications, such as data warehousing, data mining & knowledge discovery and intelligent data access to information on the Web, have created a need for integrated access to such heterogeneous, autonomous, distributed database systems. Heterogeneous/multidatabase research has focused on this issue resulting in many different approaches. However, a single, generally accepted methodology in academia or industry has not emerged providing ubiquitous intelligent data access from heterogeneous, autonomous, distributed information sources. This thesis describes a heterogeneous database system being developed at Highperformance Database Research Center (HPDRC). A major impediment to ubiquitous deployment of multidatabase technology is the difficulty in resolving semantic heterogeneity. That is, identifying related information sources for integration and querying purposes. Our approach considers the semantics of the meta-data constructs in resolving this issue. The major contributions of the thesis work include: (i.) providing a scalable, easy-to-implement architecture for developing a heterogeneous multidatabase system, utilizing Semantic Binary Object-oriented Data Model (Sem-ODM) and Semantic SQL query language to capture the semantics of the data sources being integrated and to provide an easy-to-use query facility; (ii.) a methodology for semantic heterogeneity resolution by investigating into the extents of the meta-data constructs of component schemas. This methodology is shown to be correct, complete and unambiguous; (iii.) a semi-automated technique for identifying semantic relations, which is the basis of semantic knowledge for integration and querying, using shared ontologies for context-mediation; (iv.) resolutions for schematic conflicts and a language for defining global views from a set of component Sem-ODM schemas; (v.) design of a knowledge base for storing and manipulating meta-data and knowledge acquired during the integration process. This knowledge base acts as the interface between integration and query processing modules; (vi.) techniques for Semantic SQL query processing and optimization based on semantic knowledge in a heterogeneous database environment; and (vii.) a framework for intelligent computing and communication on the Internet applying the concepts of our work.
Resumo:
Natural and man-made disasters have gained attention at all levels of policy-making in recent years. Emergency management tasks are inherently complex and unpredictable, and often require coordination among multiple organizations across different levels and locations. Effectively managing various knowledge areas and the organizations involved has become a critical emergency management success factor. However, there is a general lack of understanding about how to describe and assess the complex nature of emergency management tasks and how knowledge integration can help managers improve emergency management task performance. The purpose of this exploratory research was first, to understand how emergency management operations are impacted by tasks that are complex and inter-organizational and second, to investigate how knowledge integration as a particular knowledge management strategy can improve the efficiency and effectiveness of the emergency tasks. Three types of specific knowledge were considered: context-specific, technology-specific, and context-and-technology-specific. The research setting was the Miami-Dade Emergency Operations Center (EOC) and the study was based on the survey responses from the participants in past EOC activations related to their emergency tasks and knowledge areas. The data included task attributes related to complexity, knowledge area, knowledge integration, specificity of knowledge, and task performance. The data was analyzed using multiple linear regressions and path analyses, to (1) examine the relationships between task complexity, knowledge integration, and performance, (2) the moderating effects of each type of specific knowledge on the relationship between task complexity and performance, and (3) the mediating role of knowledge integration. As per theory-based propositions, the results indicated that overall component complexity and interactive complexity tend to have a negative effect on task performance. But surprisingly, procedural rigidity tended to have a positive effect on performance in emergency management tasks. Also as per our expectation, knowledge integration had a positive relationship with task performance. Interestingly, the moderating effects of each type of specific knowledge on the relationship between task complexity and performance were varied and the extent of mediation of knowledge integration depended on the dimension of task complexity.
Resumo:
This research is part of continued efforts to correlate the hydrology of East Fork Poplar Creek (EFPC) and Bear Creek (BC) with the long term distribution of mercury within the overland, subsurface, and river sub-domains. The main objective of this study was to add a sedimentation module (ECO Lab) capable of simulating the reactive transport mercury exchange mechanisms within sediments and porewater throughout the watershed. The enhanced model was then applied to a Total Maximum Daily Load (TMDL) mercury analysis for EFPC. That application used historical precipitation, groundwater levels, river discharges, and mercury concentrations data that were retrieved from government databases and input to the model. The model was executed to reduce computational time, predict flow discharges, total mercury concentration, flow duration and mercury mass rate curves at key monitoring stations under various hydrological and environmental conditions and scenarios. The computational results provided insight on the relationship between discharges and mercury mass rate curves at various stations throughout EFPC, which is important to best understand and support the management mercury contamination and remediation efforts within EFPC.