10 resultados para Knowledge discovery in databases

em Digital Commons at Florida International University


Relevância:

100.00% 100.00%

Publicador:

Resumo:

The primary aim of this dissertation is to develop data mining tools for knowledge discovery in biomedical data when multiple (homogeneous or heterogeneous) sources of data are available. The central hypothesis is that, when information from multiple sources of data are used appropriately and effectively, knowledge discovery can be better achieved than what is possible from only a single source. ^ Recent advances in high-throughput technology have enabled biomedical researchers to generate large volumes of diverse types of data on a genome-wide scale. These data include DNA sequences, gene expression measurements, and much more; they provide the motivation for building analysis tools to elucidate the modular organization of the cell. The challenges include efficiently and accurately extracting information from the multiple data sources; representing the information effectively, developing analytical tools, and interpreting the results in the context of the domain. ^ The first part considers the application of feature-level integration to design classifiers that discriminate between soil types. The machine learning tools, SVM and KNN, were used to successfully distinguish between several soil samples. ^ The second part considers clustering using multiple heterogeneous data sources. The resulting Multi-Source Clustering (MSC) algorithm was shown to have a better performance than clustering methods that use only a single data source or a simple feature-level integration of heterogeneous data sources. ^ The third part proposes a new approach to effectively incorporate incomplete data into clustering analysis. Adapted from K-means algorithm, the Generalized Constrained Clustering (GCC) algorithm makes use of incomplete data in the form of constraints to perform exploratory analysis. Novel approaches for extracting constraints were proposed. For sufficiently large constraint sets, the GCC algorithm outperformed the MSC algorithm. ^ The last part considers the problem of providing a theme-specific environment for mining multi-source biomedical data. The database called PlasmoTFBM, focusing on gene regulation of Plasmodium falciparum, contains diverse information and has a simple interface to allow biologists to explore the data. It provided a framework for comparing different analytical tools for predicting regulatory elements and for designing useful data mining tools. ^ The conclusion is that the experiments reported in this dissertation strongly support the central hypothesis.^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

There is growing popularity in the use of composite indices and rankings for cross-organizational benchmarking. However, little attention has been paid to alternative methods and procedures for the computation of these indices and how the use of such methods may impact the resulting indices and rankings. This dissertation developed an approach for assessing composite indices and rankings based on the integration of a number of methods for aggregation, data transformation and attribute weighting involved in their computation. The integrated model developed is based on the simulation of composite indices using methods and procedures proposed in the area of multi-criteria decision making (MCDM) and knowledge discovery in databases (KDD). The approach developed in this dissertation was automated through an IT artifact that was designed, developed and evaluated based on the framework and guidelines of the design science paradigm of information systems research. This artifact dynamically generates multiple versions of indices and rankings by considering different methodological scenarios according to user specified parameters. The computerized implementation was done in Visual Basic for Excel 2007. Using different performance measures, the artifact produces a number of excel outputs for the comparison and assessment of the indices and rankings. In order to evaluate the efficacy of the artifact and its underlying approach, a full empirical analysis was conducted using the World Bank's Doing Business database for the year 2010, which includes ten sub-indices (each corresponding to different areas of the business environment and regulation) for 183 countries. The output results, which were obtained using 115 methodological scenarios for the assessment of this index and its ten sub-indices, indicated that the variability of the component indicators considered in each case influenced the sensitivity of the rankings to the methodological choices. Overall, the results of our multi-method assessment were consistent with the World Bank rankings except in cases where the indices involved cost indicators measured in per capita income which yielded more sensitive results. Low income level countries exhibited more sensitivity in their rankings and less agreement between the benchmark rankings and our multi-method based rankings than higher income country groups.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Natural and man-made disasters have gained attention at all levels of policy-making in recent years. Emergency management tasks are inherently complex and unpredictable, and often require coordination among multiple organizations across different levels and locations. Effectively managing various knowledge areas and the organizations involved has become a critical emergency management success factor. However, there is a general lack of understanding about how to describe and assess the complex nature of emergency management tasks and how knowledge integration can help managers improve emergency management task performance. ^ The purpose of this exploratory research was first, to understand how emergency management operations are impacted by tasks that are complex and inter-organizational and second, to investigate how knowledge integration as a particular knowledge management strategy can improve the efficiency and effectiveness of the emergency tasks. Three types of specific knowledge were considered: context-specific, technology-specific, and context-and-technology-specific. ^ The research setting was the Miami-Dade Emergency Operations Center (EOC) and the study was based on the survey responses from the participants in past EOC activations related to their emergency tasks and knowledge areas. The data included task attributes related to complexity, knowledge area, knowledge integration, specificity of knowledge, and task performance. The data was analyzed using multiple linear regressions and path analyses, to (1) examine the relationships between task complexity, knowledge integration, and performance, (2) the moderating effects of each type of specific knowledge on the relationship between task complexity and performance, and (3) the mediating role of knowledge integration. ^ As per theory-based propositions, the results indicated that overall component complexity and interactive complexity tend to have a negative effect on task performance. But surprisingly, procedural rigidity tended to have a positive effect on performance in emergency management tasks. Also as per our expectation, knowledge integration had a positive relationship with task performance. Interestingly, the moderating effects of each type of specific knowledge on the relationship between task complexity and performance were varied and the extent of mediation of knowledge integration depended on the dimension of task complexity. ^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Researchers have extensively discussed using knowledge management to achieve sustainable competitive advantages; however, the successful implementation of knowledge management programs in organizations remains challenging. Problems with knowledge management arise primarily from issues related to inter-subjective creation of meaning by diverse individuals in a dynamic learning environment. ^ The first part of this dissertation examined the concepts of shared interpretive resources referring to background assumptions, shared language, and symbolic resources upon which individuals draw in their interactions in the community. The discussion adopted an interpretive research approach to underscore how community members develop shared interpretive resources over time. The second part examined how learners' behaviors influence knowledge acquisition in the community, emphasizing the associations between learners' learning approaches and learning contexts. An empirical survey of learners provided significant evidence to demonstrate the influences of learners' learning approaches. The third part examined an instructor's strategy—namely, advance organizer—to enhance learners' knowledge assimilation process. Advance organizer is an instructor strategy that refers to a set of inclusive concepts that introduce and sum up new material, and refers to a method of bridging and linking old information with something new. In this part, I underscore the concepts of advance organizer, and the implementations of advance organizer in one learning environment. A study was conducted in one higher educational environment to show the implementation of advance organizer. Additionally, an advance organizer instrument was developed and tested, and results from learners' feedback were analyzed. The significant empirical evidence showed the association between learners' learning outcomes and the implementation of advance organizer strategy. ^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Natural and man-made disasters have gained attention at all levels of policy-making in recent years. Emergency management tasks are inherently complex and unpredictable, and often require coordination among multiple organizations across different levels and locations. Effectively managing various knowledge areas and the organizations involved has become a critical emergency management success factor. However, there is a general lack of understanding about how to describe and assess the complex nature of emergency management tasks and how knowledge integration can help managers improve emergency management task performance. The purpose of this exploratory research was first, to understand how emergency management operations are impacted by tasks that are complex and inter-organizational and second, to investigate how knowledge integration as a particular knowledge management strategy can improve the efficiency and effectiveness of the emergency tasks. Three types of specific knowledge were considered: context-specific, technology-specific, and context-and-technology-specific. The research setting was the Miami-Dade Emergency Operations Center (EOC) and the study was based on the survey responses from the participants in past EOC activations related to their emergency tasks and knowledge areas. The data included task attributes related to complexity, knowledge area, knowledge integration, specificity of knowledge, and task performance. The data was analyzed using multiple linear regressions and path analyses, to (1) examine the relationships between task complexity, knowledge integration, and performance, (2) the moderating effects of each type of specific knowledge on the relationship between task complexity and performance, and (3) the mediating role of knowledge integration. As per theory-based propositions, the results indicated that overall component complexity and interactive complexity tend to have a negative effect on task performance. But surprisingly, procedural rigidity tended to have a positive effect on performance in emergency management tasks. Also as per our expectation, knowledge integration had a positive relationship with task performance. Interestingly, the moderating effects of each type of specific knowledge on the relationship between task complexity and performance were varied and the extent of mediation of knowledge integration depended on the dimension of task complexity.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

With the proliferation of multimedia data and ever-growing requests for multimedia applications, there is an increasing need for efficient and effective indexing, storage and retrieval of multimedia data, such as graphics, images, animation, video, audio and text. Due to the special characteristics of the multimedia data, the Multimedia Database management Systems (MMDBMSs) have emerged and attracted great research attention in recent years. Though much research effort has been devoted to this area, it is still far from maturity and there exist many open issues. In this dissertation, with the focus of addressing three of the essential challenges in developing the MMDBMS, namely, semantic gap, perception subjectivity and data organization, a systematic and integrated framework is proposed with video database and image database serving as the testbed. In particular, the framework addresses these challenges separately yet coherently from three main aspects of a MMDBMS: multimedia data representation, indexing and retrieval. In terms of multimedia data representation, the key to address the semantic gap issue is to intelligently and automatically model the mid-level representation and/or semi-semantic descriptors besides the extraction of the low-level media features. The data organization challenge is mainly addressed by the aspect of media indexing where various levels of indexing are required to support the diverse query requirements. In particular, the focus of this study is to facilitate the high-level video indexing by proposing a multimodal event mining framework associated with temporal knowledge discovery approaches. With respect to the perception subjectivity issue, advanced techniques are proposed to support users' interaction and to effectively model users' perception from the feedback at both the image-level and object-level.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Today, databases have become an integral part of information systems. In the past two decades, we have seen different database systems being developed independently and used in different applications domains. Today's interconnected networks and advanced applications, such as data warehousing, data mining & knowledge discovery and intelligent data access to information on the Web, have created a need for integrated access to such heterogeneous, autonomous, distributed database systems. Heterogeneous/multidatabase research has focused on this issue resulting in many different approaches. However, a single, generally accepted methodology in academia or industry has not emerged providing ubiquitous intelligent data access from heterogeneous, autonomous, distributed information sources. ^ This thesis describes a heterogeneous database system being developed at High-performance Database Research Center (HPDRC). A major impediment to ubiquitous deployment of multidatabase technology is the difficulty in resolving semantic heterogeneity. That is, identifying related information sources for integration and querying purposes. Our approach considers the semantics of the meta-data constructs in resolving this issue. The major contributions of the thesis work include: (i) providing a scalable, easy-to-implement architecture for developing a heterogeneous multidatabase system, utilizing Semantic Binary Object-oriented Data Model (Sem-ODM) and Semantic SQL query language to capture the semantics of the data sources being integrated and to provide an easy-to-use query facility; (ii) a methodology for semantic heterogeneity resolution by investigating into the extents of the meta-data constructs of component schemas. This methodology is shown to be correct, complete and unambiguous; (iii) a semi-automated technique for identifying semantic relations, which is the basis of semantic knowledge for integration and querying, using shared ontologies for context-mediation; (iv) resolutions for schematic conflicts and a language for defining global views from a set of component Sem-ODM schemas; (v) design of a knowledge base for storing and manipulating meta-data and knowledge acquired during the integration process. This knowledge base acts as the interface between integration and query processing modules; (vi) techniques for Semantic SQL query processing and optimization based on semantic knowledge in a heterogeneous database environment; and (vii) a framework for intelligent computing and communication on the Internet applying the concepts of our work. ^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Modern geographical databases, which are at the core of geographic information systems (GIS), store a rich set of aspatial attributes in addition to geographic data. Typically, aspatial information comes in textual and numeric format. Retrieving information constrained on spatial and aspatial data from geodatabases provides GIS users the ability to perform more interesting spatial analyses, and for applications to support composite location-aware searches; for example, in a real estate database: “Find the nearest homes for sale to my current location that have backyard and whose prices are between $50,000 and $80,000”. Efficient processing of such queries require combined indexing strategies of multiple types of data. Existing spatial query engines commonly apply a two-filter approach (spatial filter followed by nonspatial filter, or viceversa), which can incur large performance overheads. On the other hand, more recently, the amount of geolocation data has grown rapidly in databases due in part to advances in geolocation technologies (e.g., GPS-enabled smartphones) that allow users to associate location data to objects or events. The latter poses potential data ingestion challenges of large data volumes for practical GIS databases. In this dissertation, we first show how indexing spatial data with R-trees (a typical data pre-processing task) can be scaled in MapReduce—a widely-adopted parallel programming model for data intensive problems. The evaluation of our algorithms in a Hadoop cluster showed close to linear scalability in building R-tree indexes. Subsequently, we develop efficient algorithms for processing spatial queries with aspatial conditions. Novel techniques for simultaneously indexing spatial with textual and numeric data are developed to that end. Experimental evaluations with real-world, large spatial datasets measured query response times within the sub-second range for most cases, and up to a few seconds for a small number of cases, which is reasonable for interactive applications. Overall, the previous results show that the MapReduce parallel model is suitable for indexing tasks in spatial databases, and the adequate combination of spatial and aspatial attribute indexes can attain acceptable response times for interactive spatial queries with constraints on aspatial data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This study assessed the civic knowledge, skills, and attitudes of Hispanic eighth grade students in Miami-Dade County Public Schools (M-DCPS), Florida. Three hundred sixty one Hispanic students of Cuban (253), Colombian (57), and Nicaraguan (51) ancestry from 10 middle schools participated in the study. Two hundred twenty eight students were from low socio-economic status (SES) background, and 133 were of middle SES background. There were 136 boys and 225 girls. The International Association for the Evaluation of Educational Achievement Civic Education Student Questionnaire was used to collect data. The instrument assessed the students’ civic knowledge, skills, and attitudes. Multivariate analysis of variance was used to test for differences in the civic knowledge, skills, and attitudes of participants based on ancestry, SES, and gender. ^ The findings indicated that there was no significant difference in the civic knowledge, skills, and attitudes of Hispanic eighth grade students that were of Cuban, Colombian, and Nicaraguan ancestry. There was no significant difference in the civic skills and in five of the civic attitude scales for students from low SES families compared to those from middle SES families. However, there was a significant difference in the civic knowledge and in the civic attitude concerning classroom discussions and participation based on SES. The civic knowledge of middle SES students was higher than that of low SES students. Furthermore, middle SES Hispanic students displayed a higher mean score for the civic attitude of classroom discussions and participation than low SES students. There was no significant difference in the civic knowledge and in five of the civic attitude scales between boys and girls. However, there was a significant difference in the civic skills and the civic attitude of support for women’s rights between boys and girls. Hispanic girls displayed a higher mean score in civic skills than Hispanic boys. Furthermore, the mean score of civic attitude of support for women’s rights for Hispanic girls was higher than that of Hispanic boys. ^ It was concluded that Cuban, Colombian, or Nicaraguan participants did not demonstrate differences in civic attitudes and levels of civic knowledge and skills that eighth grade students possessed. In addition, when compared to boys, girls demonstrated a higher level of civic skills and a greater support for women’s rights and participation in politics and their roles in politics. Moreover, SES was demonstrated to be a key factor in the acquisition of civic knowledge, regardless of ancestry.^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Today, databases have become an integral part of information systems. In the past two decades, we have seen different database systems being developed independently and used in different applications domains. Today's interconnected networks and advanced applications, such as data warehousing, data mining & knowledge discovery and intelligent data access to information on the Web, have created a need for integrated access to such heterogeneous, autonomous, distributed database systems. Heterogeneous/multidatabase research has focused on this issue resulting in many different approaches. However, a single, generally accepted methodology in academia or industry has not emerged providing ubiquitous intelligent data access from heterogeneous, autonomous, distributed information sources. This thesis describes a heterogeneous database system being developed at Highperformance Database Research Center (HPDRC). A major impediment to ubiquitous deployment of multidatabase technology is the difficulty in resolving semantic heterogeneity. That is, identifying related information sources for integration and querying purposes. Our approach considers the semantics of the meta-data constructs in resolving this issue. The major contributions of the thesis work include: (i.) providing a scalable, easy-to-implement architecture for developing a heterogeneous multidatabase system, utilizing Semantic Binary Object-oriented Data Model (Sem-ODM) and Semantic SQL query language to capture the semantics of the data sources being integrated and to provide an easy-to-use query facility; (ii.) a methodology for semantic heterogeneity resolution by investigating into the extents of the meta-data constructs of component schemas. This methodology is shown to be correct, complete and unambiguous; (iii.) a semi-automated technique for identifying semantic relations, which is the basis of semantic knowledge for integration and querying, using shared ontologies for context-mediation; (iv.) resolutions for schematic conflicts and a language for defining global views from a set of component Sem-ODM schemas; (v.) design of a knowledge base for storing and manipulating meta-data and knowledge acquired during the integration process. This knowledge base acts as the interface between integration and query processing modules; (vi.) techniques for Semantic SQL query processing and optimization based on semantic knowledge in a heterogeneous database environment; and (vii.) a framework for intelligent computing and communication on the Internet applying the concepts of our work.