22 resultados para Data-Mining Techniques
Resumo:
Construction organizations typically deal with large volumes of project data containing valuable information. It is found that these organizations do not use these data effectively for planning and decision-making. There are two reasons. First, the information systems in construction organizations are designed to support day-to-day construction operations. The data stored in these systems are often non-validated, non-integrated and are available in a format that makes it difficult for decision makers to use in order to make timely decisions. Second, the organizational structure and the IT infrastructure are often not compatible with the information systems thereby resulting in higher operational costs and lower productivity. These two issues have been investigated in this research with the objective of developing systems that are structured for effective decision-making. ^ A framework was developed to guide storage and retrieval of validated and integrated data for timely decision-making and to enable construction organizations to redesign their organizational structure and IT infrastructure matched with information system capabilities. The research was focused on construction owner organizations that were continuously involved in multiple construction projects. Action research and Data warehousing techniques were used to develop the framework. ^ One hundred and sixty-three construction owner organizations were surveyed in order to assess their data needs, data management practices and extent of use of information systems in planning and decision-making. For in-depth analysis, Miami-Dade Transit (MDT) was selected which is in-charge of all transportation-related construction projects in the Miami-Dade county. A functional model and a prototype system were developed to test the framework. The results revealed significant improvements in data management and decision-support operations that were examined through various qualitative (ease in data access, data quality, response time, productivity improvement, etc.) and quantitative (time savings and operational cost savings) measures. The research results were first validated by MDT and then by a representative group of twenty construction owner organizations involved in various types of construction projects. ^
Resumo:
Ensemble Stream Modeling and Data-cleaning are sensor information processing systems have different training and testing methods by which their goals are cross-validated. This research examines a mechanism, which seeks to extract novel patterns by generating ensembles from data. The main goal of label-less stream processing is to process the sensed events to eliminate the noises that are uncorrelated, and choose the most likely model without over fitting thus obtaining higher model confidence. Higher quality streams can be realized by combining many short streams into an ensemble which has the desired quality. The framework for the investigation is an existing data mining tool. First, to accommodate feature extraction such as a bush or natural forest-fire event we make an assumption of the burnt area (BA*), sensed ground truth as our target variable obtained from logs. Even though this is an obvious model choice the results are disappointing. The reasons for this are two: One, the histogram of fire activity is highly skewed. Two, the measured sensor parameters are highly correlated. Since using non descriptive features does not yield good results, we resort to temporal features. By doing so we carefully eliminate the averaging effects; the resulting histogram is more satisfactory and conceptual knowledge is learned from sensor streams. Second is the process of feature induction by cross-validating attributes with single or multi-target variables to minimize training error. We use F-measure score, which combines precision and accuracy to determine the false alarm rate of fire events. The multi-target data-cleaning trees use information purity of the target leaf-nodes to learn higher order features. A sensitive variance measure such as ƒ-test is performed during each node's split to select the best attribute. Ensemble stream model approach proved to improve when using complicated features with a simpler tree classifier. The ensemble framework for data-cleaning and the enhancements to quantify quality of fitness (30% spatial, 10% temporal, and 90% mobility reduction) of sensor led to the formation of streams for sensor-enabled applications. Which further motivates the novelty of stream quality labeling and its importance in solving vast amounts of real-time mobile streams generated today.
Resumo:
Construction organizations typically deal with large volumes of project data containing valuable information. It is found that these organizations do not use these data effectively for planning and decision-making. There are two reasons. First, the information systems in construction organizations are designed to support day-to-day construction operations. The data stored in these systems are often non-validated, nonintegrated and are available in a format that makes it difficult for decision makers to use in order to make timely decisions. Second, the organizational structure and the IT infrastructure are often not compatible with the information systems thereby resulting in higher operational costs and lower productivity. These two issues have been investigated in this research with the objective of developing systems that are structured for effective decision-making. A framework was developed to guide storage and retrieval of validated and integrated data for timely decision-making and to enable construction organizations to redesign their organizational structure and IT infrastructure matched with information system capabilities. The research was focused on construction owner organizations that were continuously involved in multiple construction projects. Action research and Data warehousing techniques were used to develop the framework. One hundred and sixty-three construction owner organizations were surveyed in order to assess their data needs, data management practices and extent of use of information systems in planning and decision-making. For in-depth analysis, Miami-Dade Transit (MDT) was selected which is in-charge of all transportation-related construction projects in the Miami-Dade county. A functional model and a prototype system were developed to test the framework. The results revealed significant improvements in data management and decision-support operations that were examined through various qualitative (ease in data access, data quality, response time, productivity improvement, etc.) and quantitative (time savings and operational cost savings) measures. The research results were first validated by MDT and then by a representative group of twenty construction owner organizations involved in various types of construction projects.
Resumo:
Today, databases have become an integral part of information systems. In the past two decades, we have seen different database systems being developed independently and used in different applications domains. Today's interconnected networks and advanced applications, such as data warehousing, data mining & knowledge discovery and intelligent data access to information on the Web, have created a need for integrated access to such heterogeneous, autonomous, distributed database systems. Heterogeneous/multidatabase research has focused on this issue resulting in many different approaches. However, a single, generally accepted methodology in academia or industry has not emerged providing ubiquitous intelligent data access from heterogeneous, autonomous, distributed information sources. This thesis describes a heterogeneous database system being developed at Highperformance Database Research Center (HPDRC). A major impediment to ubiquitous deployment of multidatabase technology is the difficulty in resolving semantic heterogeneity. That is, identifying related information sources for integration and querying purposes. Our approach considers the semantics of the meta-data constructs in resolving this issue. The major contributions of the thesis work include: (i.) providing a scalable, easy-to-implement architecture for developing a heterogeneous multidatabase system, utilizing Semantic Binary Object-oriented Data Model (Sem-ODM) and Semantic SQL query language to capture the semantics of the data sources being integrated and to provide an easy-to-use query facility; (ii.) a methodology for semantic heterogeneity resolution by investigating into the extents of the meta-data constructs of component schemas. This methodology is shown to be correct, complete and unambiguous; (iii.) a semi-automated technique for identifying semantic relations, which is the basis of semantic knowledge for integration and querying, using shared ontologies for context-mediation; (iv.) resolutions for schematic conflicts and a language for defining global views from a set of component Sem-ODM schemas; (v.) design of a knowledge base for storing and manipulating meta-data and knowledge acquired during the integration process. This knowledge base acts as the interface between integration and query processing modules; (vi.) techniques for Semantic SQL query processing and optimization based on semantic knowledge in a heterogeneous database environment; and (vii.) a framework for intelligent computing and communication on the Internet applying the concepts of our work.
Resumo:
As the Web evolves unexpectedly fast, information grows explosively. Useful resources become more and more difficult to find because of their dynamic and unstructured characteristics. A vertical search engine is designed and implemented towards a specific domain. Instead of processing the giant volume of miscellaneous information distributed in the Web, a vertical search engine targets at identifying relevant information in specific domains or topics and eventually provides users with up-to-date information, highly focused insights and actionable knowledge representation. As the mobile device gets more popular, the nature of the search is changing. So, acquiring information on a mobile device poses unique requirements on traditional search engines, which will potentially change every feature they used to have. To summarize, users are strongly expecting search engines that can satisfy their individual information needs, adapt their current situation, and present highly personalized search results. In my research, the next generation vertical search engine means to utilize and enrich existing domain information to close the loop of vertical search engine's system that mutually facilitate knowledge discovering, actionable information extraction, and user interests modeling and recommendation. I investigate three problems in which domain taxonomy plays an important role, including taxonomy generation using a vertical search engine, actionable information extraction based on domain taxonomy, and the use of ensemble taxonomy to catch user's interests. As the fundamental theory, ultra-metric, dendrogram, and hierarchical clustering are intensively discussed. Methods on taxonomy generation using my research on hierarchical clustering are developed. The related vertical search engine techniques are practically used in Disaster Management Domain. Especially, three disaster information management systems are developed and represented as real use cases of my research work.
Resumo:
Thanks to the advanced technologies and social networks that allow the data to be widely shared among the Internet, there is an explosion of pervasive multimedia data, generating high demands of multimedia services and applications in various areas for people to easily access and manage multimedia data. Towards such demands, multimedia big data analysis has become an emerging hot topic in both industry and academia, which ranges from basic infrastructure, management, search, and mining to security, privacy, and applications. Within the scope of this dissertation, a multimedia big data analysis framework is proposed for semantic information management and retrieval with a focus on rare event detection in videos. The proposed framework is able to explore hidden semantic feature groups in multimedia data and incorporate temporal semantics, especially for video event detection. First, a hierarchical semantic data representation is presented to alleviate the semantic gap issue, and the Hidden Coherent Feature Group (HCFG) analysis method is proposed to capture the correlation between features and separate the original feature set into semantic groups, seamlessly integrating multimedia data in multiple modalities. Next, an Importance Factor based Temporal Multiple Correspondence Analysis (i.e., IF-TMCA) approach is presented for effective event detection. Specifically, the HCFG algorithm is integrated with the Hierarchical Information Gain Analysis (HIGA) method to generate the Importance Factor (IF) for producing the initial detection results. Then, the TMCA algorithm is proposed to efficiently incorporate temporal semantics for re-ranking and improving the final performance. At last, a sampling-based ensemble learning mechanism is applied to further accommodate the imbalanced datasets. In addition to the multimedia semantic representation and class imbalance problems, lack of organization is another critical issue for multimedia big data analysis. In this framework, an affinity propagation-based summarization method is also proposed to transform the unorganized data into a better structure with clean and well-organized information. The whole framework has been thoroughly evaluated across multiple domains, such as soccer goal event detection and disaster information management.
Resumo:
Ensemble Stream Modeling and Data-cleaning are sensor information processing systems have different training and testing methods by which their goals are cross-validated. This research examines a mechanism, which seeks to extract novel patterns by generating ensembles from data. The main goal of label-less stream processing is to process the sensed events to eliminate the noises that are uncorrelated, and choose the most likely model without over fitting thus obtaining higher model confidence. Higher quality streams can be realized by combining many short streams into an ensemble which has the desired quality. The framework for the investigation is an existing data mining tool. First, to accommodate feature extraction such as a bush or natural forest-fire event we make an assumption of the burnt area (BA*), sensed ground truth as our target variable obtained from logs. Even though this is an obvious model choice the results are disappointing. The reasons for this are two: One, the histogram of fire activity is highly skewed. Two, the measured sensor parameters are highly correlated. Since using non descriptive features does not yield good results, we resort to temporal features. By doing so we carefully eliminate the averaging effects; the resulting histogram is more satisfactory and conceptual knowledge is learned from sensor streams. Second is the process of feature induction by cross-validating attributes with single or multi-target variables to minimize training error. We use F-measure score, which combines precision and accuracy to determine the false alarm rate of fire events. The multi-target data-cleaning trees use information purity of the target leaf-nodes to learn higher order features. A sensitive variance measure such as f-test is performed during each node’s split to select the best attribute. Ensemble stream model approach proved to improve when using complicated features with a simpler tree classifier. The ensemble framework for data-cleaning and the enhancements to quantify quality of fitness (30% spatial, 10% temporal, and 90% mobility reduction) of sensor led to the formation of streams for sensor-enabled applications. Which further motivates the novelty of stream quality labeling and its importance in solving vast amounts of real-time mobile streams generated today.