17 results for Databases and Information Systems
in Digital Commons at Florida International University
Abstract:
Today, databases have become an integral part of information systems. In the past two decades, we have seen different database systems being developed independently and used in different application domains. Today's interconnected networks and advanced applications, such as data warehousing, data mining & knowledge discovery and intelligent data access to information on the Web, have created a need for integrated access to such heterogeneous, autonomous, distributed database systems. Heterogeneous/multidatabase research has focused on this issue, resulting in many different approaches. However, no single, generally accepted methodology has emerged in academia or industry that provides ubiquitous intelligent data access from heterogeneous, autonomous, distributed information sources. This thesis describes a heterogeneous database system being developed at the High-performance Database Research Center (HPDRC). A major impediment to ubiquitous deployment of multidatabase technology is the difficulty in resolving semantic heterogeneity, that is, identifying related information sources for integration and querying purposes. Our approach considers the semantics of the meta-data constructs in resolving this issue. The major contributions of the thesis work include: (i.) providing a scalable, easy-to-implement architecture for developing a heterogeneous multidatabase system, utilizing the Semantic Binary Object-oriented Data Model (Sem-ODM) and the Semantic SQL query language to capture the semantics of the data sources being integrated and to provide an easy-to-use query facility; (ii.) a methodology for semantic heterogeneity resolution that investigates the extents of the meta-data constructs of component schemas; this methodology is shown to be correct, complete and unambiguous; (iii.) a semi-automated technique for identifying semantic relations, which are the basis of semantic knowledge for integration and querying, using shared ontologies for context mediation; (iv.) resolutions for schematic conflicts and a language for defining global views from a set of component Sem-ODM schemas; (v.) the design of a knowledge base for storing and manipulating meta-data and knowledge acquired during the integration process; this knowledge base acts as the interface between the integration and query-processing modules; (vi.) techniques for Semantic SQL query processing and optimization based on semantic knowledge in a heterogeneous database environment; and (vii.) a framework for intelligent computing and communication on the Internet applying the concepts of our work.
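To make the extent-based methodology in contribution (ii.) concrete, the following minimal sketch (my illustration, not the HPDRC implementation; all names and data are hypothetical) classifies the relation between two meta-data constructs by comparing their extents, the sets of real-world entities they describe:

```python
# A minimal sketch (not the thesis's implementation) of extent-based
# semantic-relation classification between two meta-data constructs.
def semantic_relation(extent_a: set, extent_b: set) -> str:
    """Classify the relation between two constructs by comparing
    the sets of real-world entities (extents) they describe."""
    if extent_a == extent_b:
        return "EQUIVALENT"   # candidates for merging in the global view
    if extent_a < extent_b:
        return "SUBSET"       # A may be modeled as a subcategory of B
    if extent_b < extent_a:
        return "SUPERSET"
    if extent_a & extent_b:
        return "OVERLAPPING"  # partial integration; needs context mediation
    return "DISJOINT"

# Example: 'Employee' in one component schema vs. 'Staff' in another.
print(semantic_relation({"ann", "bob"}, {"ann", "bob", "eve"}))  # SUBSET
```

An EQUIVALENT or SUBSET verdict would suggest merging the constructs or modeling one as a subcategory of the other in the global Sem-ODM view; OVERLAPPING extents are where shared ontologies and context mediation come in.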
Abstract:
Many systems and applications are continuously producing events. These events are used to record the status of the systems and trace their behaviors. By examining these events, system administrators can check for potential problems in these systems. If the temporal dynamics of the systems are further investigated, the underlying patterns can be discovered. The uncovered knowledge can be leveraged to predict future system behaviors or to mitigate potential risks. Moreover, system administrators can utilize the temporal patterns to set up event management rules that make the system more intelligent. With the popularity of data mining techniques in recent years, these events gradually become more and more useful. Despite the recent advances of data mining techniques, their application to system event mining is still in a rudimentary stage. Most works still focus on episode mining or frequent pattern discovery. These methods are unable to provide a brief yet comprehensible summary that reveals the valuable information from a high-level perspective. Moreover, these methods provide little actionable knowledge to help system administrators better manage the systems. To make better use of the recorded events, more practical techniques are required. From the perspective of data mining, three correlated directions are considered to be helpful for system management: (1) provide concise yet comprehensive summaries about the running status of the systems; (2) make the systems more intelligent and autonomous; (3) effectively detect the abnormal behaviors of the systems. Due to the richness of the event logs, all these directions can be pursued in a data-driven manner. In this way, the robustness of the systems can be enhanced and the goal of autonomous management can be approached. This dissertation mainly focuses on the foregoing directions that leverage temporal mining techniques to facilitate system management. More specifically, three concrete topics will be discussed, including event, resource demand prediction, and streaming anomaly detection. Besides the theoretical contributions, experimental evaluation will also be presented to demonstrate the effectiveness and efficiency of the corresponding solutions.
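As a concrete illustration of the kind of temporal pattern such techniques target, the sketch below (toy data and window of my own choosing; not the dissertation's algorithm) counts how often one event type is followed by another within a fixed time window, the simplest temporal dependency on which event management rules could be built:

```python
# A minimal sketch, not the dissertation's method: discover simple temporal
# patterns "A is followed by B within w seconds" from an event log.
from collections import Counter
from bisect import bisect_right

def followed_by_counts(events, window):
    """events: list of (timestamp, event_type) sorted by timestamp."""
    times = {}
    for ts, etype in events:
        times.setdefault(etype, []).append(ts)    # per-type sorted timestamps
    counts = Counter()
    for ts, a in events:
        for b, blist in times.items():
            if b == a:
                continue
            lo = bisect_right(blist, ts)           # first B strictly after ts
            hi = bisect_right(blist, ts + window)  # last B inside the window
            if hi > lo:                            # at least one B in (ts, ts+w]
                counts[(a, b)] += 1
    return counts

log = [(0, "disk_warn"), (5, "disk_warn"), (7, "io_error"), (30, "reboot")]
print(followed_by_counts(log, window=10))  # Counter({('disk_warn', 'io_error'): 2})
```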
Abstract:
In the last decade, large numbers of social media services have emerged and been widely used in people's daily life as important information sharing and acquisition tools. With a substantial amount of user-contributed text data on social media, it becomes a necessity to develop methods and tools for analyzing this emerging type of text, in order to better utilize it to deliver meaningful information to users. Previous work on text analytics in the last several decades has mainly focused on traditional types of text like emails, news and academic literature, and several issues critical to text data on social media have not been well explored: 1) how to detect sentiment from text on social media; 2) how to make use of social media's real-time nature; 3) how to address information overload for flexible information needs. In this dissertation, we focus on these three problems. First, to detect sentiment of text on social media, we propose a non-negative matrix tri-factorization (tri-NMF) based dual active supervision method to minimize human labeling efforts for the new type of data. Second, to make use of social media's real-time nature, we propose approaches to detect events from text streams on social media. Third, to address information overload for flexible information needs, we propose two summarization frameworks: a dominating-set-based summarization framework and a learning-to-rank-based summarization framework. The dominating-set-based framework can be applied to different types of summarization problems, while the learning-to-rank-based framework helps utilize existing training data to guide new summarization tasks. In addition, we integrate these techniques in an application study of event summarization for sports games as an example of how to better utilize social media data.
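The dominating-set idea can be sketched compactly: build a similarity graph over sentences, then greedily pick sentences until every sentence is adjacent to (dominated by) a chosen one. The fragment below is a simplified illustration under stated assumptions (cosine similarity over bags of words, an arbitrary 0.3 threshold), not the dissertation's framework:

```python
# A minimal sketch of dominating-set-based summarization: greedily pick
# sentences until every sentence is "covered" by a similar chosen one.
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def summarize(sentences, threshold=0.3):
    vecs = [Counter(s.lower().split()) for s in sentences]
    uncovered, summary = set(range(len(sentences))), []
    while uncovered:
        # pick the sentence that covers the most uncovered neighbors
        best = max(uncovered, key=lambda i: sum(
            cosine(vecs[i], vecs[j]) >= threshold for j in uncovered))
        summary.append(sentences[best])
        uncovered -= {j for j in uncovered
                      if cosine(vecs[best], vecs[j]) >= threshold}
    return summary

tweets = ["goal scored by messi", "messi scores a goal", "rain delay at stadium"]
print(summarize(tweets))  # one sentence per "dominated" cluster
```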
Abstract:
With hundreds of millions of users reporting locations and embracing mobile technologies, Location-Based Services (LBSs) are raising new challenges. In this dissertation, we address three emerging problems in location services, where geolocation data plays a central role. First, to handle the unprecedented growth of generated geolocation data, existing location services rely on geospatial database systems. However, their inability to leverage combined geographical and textual information in analytical queries (e.g. spatial similarity joins) remains an open problem. To address this, we introduce SpsJoin, a framework for computing spatial set-similarity joins. SpsJoin handles combined similarity queries that involve textual and spatial constraints simultaneously. LBSs use this system to tackle different types of problems, such as deduplication, geolocation enhancement and record linkage. We define the spatial set-similarity join problem in a general case and propose an algorithm for its efficient computation. Our solution utilizes parallel computing with MapReduce to handle scalability issues in large geospatial databases. Second, applications that use geolocation data are seldom concerned with ensuring the privacy of participating users. To motivate participation and address privacy concerns, we propose iSafe, a privacy-preserving algorithm for computing safety snapshots of co-located mobile devices as well as geosocial network users. iSafe combines geolocation data extracted from crime datasets and geosocial networks such as Yelp. In order to enhance iSafe's ability to compute safety recommendations, even when crime information is incomplete or sparse, we need to identify relationships between Yelp venues and crime indices at their locations. To achieve this, we use SpsJoin on two datasets (Yelp venues and geolocated businesses) to find venues that have not been reviewed and to further compute the crime indices of their locations. Our results show a statistically significant dependence between location crime indices and Yelp features. Third, review-centered LBSs (e.g., Yelp) are increasingly becoming targets of malicious campaigns that aim to bias the public image of represented businesses. Although Yelp actively attempts to detect and filter fraudulent reviews, our experiments showed that Yelp is still vulnerable. Fraudulent LBS information also impacts the ability of iSafe to provide correct safety values. We take steps toward addressing this problem by proposing SpiDeR, an algorithm that takes advantage of the richness of information available in Yelp to detect abnormal review patterns. We propose a fake-venue detection solution that applies SpsJoin on Yelp and U.S. housing datasets. We validate the proposed solutions using ground truth data extracted through our experiments and reviews filtered by Yelp.
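To make the combined constraint concrete, here is a naive single-machine sketch of a spatial set-similarity join (SpsJoin itself is a parallel MapReduce algorithm; the thresholds and data below are hypothetical): a pair matches only if the token sets are sufficiently Jaccard-similar and the coordinates fall within a distance bound:

```python
# A minimal sketch, not the SpsJoin implementation: pair records whose token
# sets are Jaccard-similar AND whose coordinates are within a distance bound.
import math

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def spatial_sim_join(r, s, min_sim=0.5, max_km=1.0):
    """r, s: lists of (name_tokens, (lat, lon)). Returns matching index pairs."""
    out = []
    for i, (ta, (lat_a, lon_a)) in enumerate(r):
        for j, (tb, (lat_b, lon_b)) in enumerate(s):
            # rough equirectangular distance; fine at city scale
            dx = (lon_a - lon_b) * 111.32 * math.cos(math.radians(lat_a))
            dy = (lat_a - lat_b) * 110.57
            if math.hypot(dx, dy) <= max_km and jaccard(ta, tb) >= min_sim:
                out.append((i, j))
    return out

yelp = [({"joes", "pizza"}, (25.77, -80.19))]
biz  = [({"joes", "pizza", "inc"}, (25.771, -80.19))]
print(spatial_sim_join(yelp, biz))  # [(0, 0)] -> likely the same venue
```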
Abstract:
The Chihuahuan Desert is one of the most biologically diverse ecosystems in the world, but it suffers serious degradation because of changes in fire regimes that result in large catastrophic fires. My study was conducted in the Sierra La Mojonera (SLM) natural protected area in Mexico. The purpose of this study was to implement the use of FARSITE fire modeling as a fire management tool to develop an integrated fire management plan at SLM. Firebreaks proved to contain 100% of wildfire outbreaks. The rosetophilous scrub experienced the fastest rate of fire spread and lowland creosote bush scrub the slowest. March experienced the fastest rate of fire spread, while September experienced the slowest. The results of my study provide a tool for wildfire management through the use of geospatial technologies and, in particular, FARSITE fire modeling in SLM and Mexico.
Abstract:
Geographic Information Systems (GIS) is an emerging information technology (IT) which promises to have a large-scale influence on how spatially distributed resources are managed. It has had applications in the management of issues as diverse as recovering from the disaster of Hurricane Andrew and aiding military operations in Desert Storm. Implementation of GIS is an important issue because of the high cost and time involved in setting such systems up. An important component of the implementation problem is the "meaning" that the different groups of people influencing the implementation give to the technology. The research was based on the theory of "Social Construction of Knowledge" systems, which assumes knowledge systems are subject to sociological analysis both in usage and in content. An interpretive research approach was adopted to inductively derive a model that explains how the "meanings" of a GIS are socially constructed. The research design entailed a comparative case analysis over two county sites that were using the same GIS for a variety of purposes. A total of 75 in-depth interviews were conducted to elicit interpretations of the GIS. Results indicate that differences in how geographers and data processors view the technology lead to different implementation patterns in the two sites.
Abstract:
Automated information system design and implementation is one of the fastest changing aspects of the hospitality industry. During the past several years nothing has increased the professionalism or improved the productivity within the industry more than the application of computer technology. Intuitive software applications, deemed the first step toward making computers more people-literate, object-oriented programming, intended to more accurately model reality, and wireless communications are expected to play a significant role in future technological advancement.
Abstract:
This research analyzed the spatial relationship between a mega-scale fracture network and the occurrence of vegetation in an arid region. High-resolution aerial photographs of Arches National Park, Utah, were used for digital image processing. Four sets of large-scale joints were digitized from the rectified color photograph in order to characterize the geospatial properties of the fracture network with the aid of a Geographic Information System. An unsupervised landcover classification was carried out to identify the spatial distribution of vegetation on the fractured outcrop. Results of this study confirm that the WNW-ESE alignment of vegetation is predominantly controlled by the spatial distribution of the systematic joint set, which in turn parallels the regional fold axis. This research provides insight into the spatial heterogeneity inherent to fracture networks, as well as the effects of jointing on the distribution of surface vegetation in desert environments.
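The abstract names an unsupervised landcover classification without specifying the classifier; a common minimal choice is k-means clustering over pixel values. The sketch below is purely illustrative (toy RGB pixels, my own parameter choices), not the study's actual processing chain:

```python
# A minimal sketch of unsupervised landcover classification via k-means over
# RGB pixels; clusters are labeled (e.g., vegetation vs. rock) by inspection.
import random

def kmeans(pixels, k=2, iters=20, seed=0):
    random.seed(seed)
    centers = random.sample(pixels, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in pixels:
            # assign each pixel to its nearest center (squared distance)
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            groups[i].append(p)
        # recompute centers; keep the old center if a cluster goes empty
        centers = [tuple(sum(d) / len(g) for d in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers

# toy pixels: (R, G, B) -- darker green vegetation vs. light sandstone
pix = [(60, 120, 50), (55, 130, 60), (200, 180, 150), (210, 190, 160)]
print(kmeans(pix, k=2))
```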
Abstract:
This dissertation examines the consequences of Electronic Data Interchange (EDI) use on interorganizational relations (IR) in the retail industry. EDI is a type of interorganizational information system that facilitates the exchange of business documents in structured, machine-processable form. The research model links EDI use and three IR dimensions: structural, behavioral, and outcome. Based on relevant literature from organizational theory and marketing channels, fourteen hypotheses were proposed for the relationships among EDI use and the three IR dimensions. Data were collected through self-administered questionnaires from key informants in 97 retail companies (19% response rate). The hypotheses were tested using multiple regression analysis. The analysis supports the following hypotheses: (a) EDI use is positively related to information intensity and formalization, (b) formalization is positively related to cooperation, (c) information intensity is positively related to cooperation, (d) conflict is negatively related to performance and satisfaction, (e) cooperation is positively related to performance, and (f) performance is positively related to satisfaction. The results support the general premise of the model that the relationship between EDI use and satisfaction among channel members has to be viewed within an interorganizational context. Research on EDI is still in a nascent stage. By identifying and testing relevant interorganizational variables, this study offers insights for practitioners managing boundary-spanning activities in organizations using or planning to use EDI. Further, the thesis provides avenues for future research aimed at understanding the consequences of this interorganizational information technology.
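For readers unfamiliar with the method, the sketch below shows the shape of one such regression test on synthetic data (variable names and effect sizes are hypothetical; only the sample size of 97 comes from the abstract):

```python
# A minimal sketch of a multiple-regression hypothesis test, e.g. testing
# whether EDI use predicts formalization, as hypothesis (a) posits.
import numpy as np

rng = np.random.default_rng(42)
n = 97                                    # matches the study's sample size
edi_use = rng.normal(size=n)
# synthetic outcome with a built-in positive effect of EDI use
formalization = 0.5 * edi_use + rng.normal(scale=0.8, size=n)

X = np.column_stack([np.ones(n), edi_use])         # intercept + predictor
beta, *_ = np.linalg.lstsq(X, formalization, rcond=None)
resid = formalization - X @ beta
se = np.sqrt(np.sum(resid**2) / (n - 2) * np.linalg.inv(X.T @ X)[1, 1])
print(f"slope={beta[1]:.3f}, t={beta[1]/se:.2f}")  # t > ~2 -> significant
```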
Abstract:
The ultimate intent of this dissertation was to broaden and strengthen our understanding of IT implementation by emphasizing research efforts on the dynamic nature of the implementation process. More specifically, efforts were directed toward opening the "black box" and providing the story that explains how and why contextual conditions and implementation tactics interact to produce project outcomes. In pursuit of this objective, the dissertation was aimed at theory building and adopted a case study methodology combining qualitative and quantitative evidence. Specifically, it examined the implementation process, use and consequences of three clinical information systems at Jackson Memorial Hospital, a large tertiary care teaching hospital. As a preliminary step toward the development of a more realistic model of system implementation, the study proposes a new set of research propositions reflecting the dynamic nature of the implementation process. Findings clearly reveal that successful implementation projects are likely to be those where key actors envision end goals, anticipate challenges ahead, and recognize the presence of and seize opportunities. It was also found that IT implementation is characterized by the systems-theory notion of equifinality, that is, there are likely several equally effective ways to achieve a given end goal. The selection of a particular implementation strategy appears to be a rational process where actions and decisions are largely influenced by the degree to which key actors recognize the mediating role of each tactic and are motivated to act. The nature of the implementation process is also characterized by the concept of "duality of structure," that is, context and actions mutually influence each other. Another key finding suggests that there is no underlying program that regulates the process of change and moves it from one given point toward a subsequent and already prefigured end. For this reason, the implementation process cannot be thought of as a series of activities performed in a sequential manner such as conceived in stage models. Finally, it was found that IT implementation is punctuated by a certain indeterminacy. Results suggest that unfavorable and undesirable consequences are less likely only when substantial efforts are focused on what to look for and think about.
Abstract:
Today, many organizations are turning to new approaches to building and maintaining information systems (I/S) to cope with a highly competitive business environment. Current anecdotal evidence indicates that the approaches being used improve the effectiveness of software development by encouraging active user participation throughout the development process. Unfortunately, very little is known about how the use of such approaches enhances the ability of team members to develop I/S that are responsive to changing business conditions. Drawing from predominant theories of organizational conflict, this study develops and tests a model of conflict among members of a development team. The model proposes that development approaches provide the relevant context conditioning the management and resolution of conflict in software development which, in turn, are crucial for the success of the development process. Empirical testing of the model was conducted using data collected through a combination of interviews with I/S executives and surveys of team members and business users at nine organizations. Results of path analysis provide support for the model's main prediction that integrative conflict management and distributive conflict management can contribute to I/S success by influencing differently the manifestation and resolution of conflict in software development. Further, analyses of variance indicate that object-oriented development, when compared to rapid and structured development, appears to produce the lowest levels of conflict management, conflict resolution, and I/S success. The proposed model and findings suggest academic implications for understanding the effects of different conflict management behaviors on software development outcomes, and practical implications for better managing the software development process, especially in user-oriented development environments.
Abstract:
The purpose of this study was to test Lotka’s law of scientific publication productivity, using the methodology outlined by Pao (1985), in the field of Library and Information Studies (LIS). Lotka’s law has been sporadically tested in the field over the past 30+ years, but the results of these studies are inconclusive due to the varying methods employed by the researchers. A data set of 1,856 citations found using the ISI Web of Knowledge databases was studied. The values of n and c were calculated to be 2.1 and 0.6418 (64.18%), respectively. The Kolmogorov-Smirnov (K-S) one-sample goodness-of-fit test was conducted at the 0.10 level of significance. The Dmax value is 0.022758 and the calculated critical value is 0.026562. It was determined that the null hypothesis, stating that there is no difference between the observed distribution of publications and the distribution obtained using Lotka’s and Pao’s procedure, could not be rejected. This study finds that literature in the field of Library and Information Studies does conform to Lotka’s law, with reliable results. As a result, Lotka’s law can be used in LIS as a standardized means of measuring author publication productivity, which will lead to findings that are comparable on many levels (e.g., department, institution, national). Lotka’s law can be employed as an empirically proven analytical tool to establish publication productivity benchmarks for faculty and faculty librarians. Recommendations for further study include (a) exploring the characteristics of the high and low producers; (b) finding a way to successfully account for collaborative contributions in the formula; and (c) a detailed study of institutional policies concerning publication productivity and its impact on the appointment, tenure and promotion process of academic librarians.
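Lotka's generalized law states that the proportion of authors with x publications is f(x) = c / x^n. The sketch below (toy author counts; only n = 2.1 and c = 0.6418 are taken from the abstract) shows how a K-S style Dmax is computed by comparing cumulative observed and expected distributions:

```python
# A minimal sketch of checking Lotka's generalized law f(x) = c / x**n
# against observed author counts (toy data, not the study's data set).
n_exp, c = 2.1, 0.6418                 # values reported in the abstract

# toy observation: authors_with[x] = number of authors with x publications
authors_with = {1: 640, 2: 150, 3: 60, 4: 35, 5: 20}
total = sum(authors_with.values())

dmax, F_obs, F_exp = 0.0, 0.0, 0.0
for x in sorted(authors_with):
    F_obs += authors_with[x] / total   # observed cumulative proportion
    F_exp += c / x ** n_exp            # expected under Lotka's law
    dmax = max(dmax, abs(F_obs - F_exp))
print(f"Dmax = {dmax:.4f}")            # compare against the critical value
```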
Abstract:
Construction organizations typically deal with large volumes of project data containing valuable information. It is found that these organizations do not use these data effectively for planning and decision-making. There are two reasons. First, the information systems in construction organizations are designed to support day-to-day construction operations. The data stored in these systems are often non-validated, non-integrated and are available in a format that makes it difficult for decision makers to use in order to make timely decisions. Second, the organizational structure and the IT infrastructure are often not compatible with the information systems, thereby resulting in higher operational costs and lower productivity. These two issues have been investigated in this research with the objective of developing systems that are structured for effective decision-making. A framework was developed to guide storage and retrieval of validated and integrated data for timely decision-making and to enable construction organizations to redesign their organizational structure and IT infrastructure to match information system capabilities. The research was focused on construction owner organizations that were continuously involved in multiple construction projects. Action research and data warehousing techniques were used to develop the framework. One hundred and sixty-three construction owner organizations were surveyed in order to assess their data needs, data management practices and extent of use of information systems in planning and decision-making. For in-depth analysis, Miami-Dade Transit (MDT), which is in charge of all transportation-related construction projects in Miami-Dade County, was selected. A functional model and a prototype system were developed to test the framework. The results revealed significant improvements in data management and decision-support operations that were examined through various qualitative (ease in data access, data quality, response time, productivity improvement, etc.) and quantitative (time savings and operational cost savings) measures. The research results were first validated by MDT and then by a representative group of twenty construction owner organizations involved in various types of construction projects.
Abstract:
The performance of building envelopes and roofing systems significantly depends on accurate knowledge of wind loads and the response of envelope components under realistic wind conditions. Wind tunnel testing is a well-established practice for determining wind loads on structures. For small structures, much larger model scales are needed than for large structures to maintain modeling accuracy and minimize Reynolds number effects. In these circumstances the ability to obtain a large enough turbulence integral scale is usually compromised by the limited dimensions of the wind tunnel, meaning that it is not possible to simulate the low-frequency end of the turbulence spectrum. Such flows are called flows with Partial Turbulence Simulation. In this dissertation, the test procedure and scaling requirements for tests in partial turbulence simulation are discussed. A theoretical method is proposed for including the effects of low-frequency turbulence in the post-test analysis. In this theory the turbulence spectrum is divided into two distinct statistical processes, one at high frequencies, which can be simulated in the wind tunnel, and one at low frequencies, which can be treated in a quasi-steady manner. The joint probability of load resulting from the two processes is derived, from which full-scale equivalent peak pressure coefficients can be obtained. The efficacy of the method is demonstrated by comparing predictions derived from tests on large-scale models of the Silsoe Cube and Texas Tech University buildings in the Wall of Wind facility at Florida International University with the available full-scale data. For multi-layer building envelopes such as rain-screen walls, roof pavers, and vented energy-efficient walls, not only peak wind loads but also their spatial gradients are important. Wind-permeable roof claddings like roof pavers are not well covered by many existing building codes and standards. Large-scale experiments were carried out to investigate the wind loading on concrete pavers, including wind blow-off tests and pressure measurements. Simplified guidelines were developed for the design of loose-laid roof pavers against wind uplift. The guidelines are formatted so that use can be made of the existing information in codes and standards such as ASCE 7-10 on pressure coefficients for components and cladding.
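The two-process idea lends itself to a short mathematical sketch. The notation below is a plausible reading of the abstract under the standard quasi-steady assumption, not the dissertation's actual derivation:

```latex
% Hedged sketch; u_H, u_L, \bar{U}, C_p^{H} are my notation, not the author's.
% Longitudinal velocity split into two statistical processes:
\[
  u(t) = \bar{U} + u_H(t) + u_L(t),
\]
% where u_H is reproduced in the wind tunnel and u_L is the missing
% low-frequency turbulence. Treating u_L quasi-steadily rescales the
% measured high-frequency pressure coefficient:
\[
  C_p^{\mathrm{full}}(t) \;\approx\; C_p^{H}(t)
    \left( \frac{\bar{U} + u_L(t)}{\bar{U}} \right)^{\!2}.
\]
% Full-scale equivalent peak coefficients then follow from the joint
% probability of the two (approximately independent) processes C_p^{H} and u_L.
```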