39 results for multiple data sources

in Digital Commons at Florida International University
Relevance:

100.00%

Publisher:

Abstract:

The primary aim of this dissertation is to develop data mining tools for knowledge discovery in biomedical data when multiple (homogeneous or heterogeneous) sources of data are available. The central hypothesis is that, when information from multiple sources of data is used appropriately and effectively, knowledge discovery can be better achieved than is possible from a single source alone.

Recent advances in high-throughput technology have enabled biomedical researchers to generate large volumes of diverse types of data on a genome-wide scale. These data include DNA sequences, gene expression measurements, and much more; they provide the motivation for building analysis tools to elucidate the modular organization of the cell. The challenges include extracting information from the multiple data sources efficiently and accurately; representing the information effectively; developing analytical tools; and interpreting the results in the context of the domain.

The first part considers the application of feature-level integration to design classifiers that discriminate between soil types. The machine learning tools SVM and KNN were used to successfully distinguish between several soil samples.

The second part considers clustering using multiple heterogeneous data sources. The resulting Multi-Source Clustering (MSC) algorithm was shown to perform better than clustering methods that use only a single data source or a simple feature-level integration of heterogeneous data sources.

The third part proposes a new approach to effectively incorporate incomplete data into clustering analysis. Adapted from the K-means algorithm, the Generalized Constrained Clustering (GCC) algorithm makes use of incomplete data in the form of constraints to perform exploratory analysis. Novel approaches for extracting constraints were proposed. For sufficiently large constraint sets, the GCC algorithm outperformed the MSC algorithm.

The last part considers the problem of providing a theme-specific environment for mining multi-source biomedical data. The database, called PlasmoTFBM and focused on gene regulation of Plasmodium falciparum, contains diverse information and has a simple interface that allows biologists to explore the data. It provided a framework for comparing different analytical tools for predicting regulatory elements and for designing useful data mining tools.

The conclusion is that the experiments reported in this dissertation strongly support the central hypothesis.
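
The abstract does not detail how GCC folds constraints into K-means, so the following is only a minimal Python sketch of constraint-guided k-means in the COP-KMeans style: each point is assigned to the nearest centroid that violates no must-link or cannot-link constraint. All names, parameters, and the fallback rule are illustrative assumptions, not the dissertation's algorithm.

```python
import numpy as np

def constrained_kmeans(X, k, must_link=(), cannot_link=(), n_iter=50, seed=0):
    """Constraint-guided k-means in the COP-KMeans style (not GCC itself)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].astype(float)
    # make constraint pairs symmetric so (i, j) also covers (j, i)
    ml = list(must_link) + [(b, a) for a, b in must_link]
    cl = list(cannot_link) + [(b, a) for a, b in cannot_link]
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        labels[:] = -1
        for i, x in enumerate(X):
            # try centroids from nearest to farthest until constraints hold
            for c in np.argsort(((centers - x) ** 2).sum(axis=1)):
                ok_ml = all(labels[j] in (-1, c) for a, j in ml if a == i)
                ok_cl = all(labels[j] != c for a, j in cl if a == i)
                if ok_ml and ok_cl:
                    labels[i] = c
                    break
            if labels[i] == -1:  # no feasible centroid: fall back to nearest
                labels[i] = int(np.argmin(((centers - x) ** 2).sum(axis=1)))
        for c in range(k):  # usual k-means centroid update
            if (labels == c).any():
                centers[c] = X[labels == c].mean(axis=0)
    return labels, centers

# hypothetical usage on a NumPy array `data`:
# labels, centers = constrained_kmeans(data, k=3, must_link=[(0, 1)], cannot_link=[(0, 2)])
```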

Relevance:

100.00%

Publisher:

Abstract:

The purpose of this study was to develop, explicate, and validate a comprehensive model to more effectively assess community injury prevention needs, plan and target efforts, identify potential interventions, and provide a framework for an outcome-based evaluation of the effectiveness of interventions. A systems model approach was developed to conceptualize the major components of inputs, efforts, outcomes, and feedback within a community setting. Profiling of multiple data sources demonstrated a community feedback mechanism that increased awareness of priority issues and elicited support from traditional as well as non-traditional injury prevention partners. Injury countermeasures, including education, enforcement, engineering, and economic incentives, were presented for their potential synergistic effect on the knowledge, attitudes, or behaviors of a targeted population. Levels of outcome data were classified into ultimate, intermediate, and immediate indicators to assist in determining the effectiveness of intervention efforts. A collaboration between business and health care succeeded in achieving data access and the use of emergency department injury data for monitoring the impact of community interventions. Evaluation of injury events and preventive efforts within the context of a dynamic community systems environment was applied to a study community, with examples detailing actual profiling and trending of injuries. The resulting model of community injury prevention was validated using a community focus group, community injury prevention coordinators, and national injury prevention experts.

Relevance:

100.00%

Publisher:

Abstract:

Since the 1980s, governments and organizations have promoted cash transfers in education as a tool for motivating elementary-aged children to attend school. Oftentimes, the monthly payments supplemented the income a child would otherwise earn in the labor market. In Brazil, where these Bolsa (grant) programs were pioneered, there has been much success in removing children from harsh labor conditions and increasing enrollment rates among the poorest families. However, the capacity of Bolsa Escola programs to meet other objectives, such as improving educational outcomes and reducing the incidence of poverty, continues to be examined. As these programs are adopted globally, funding millions of children and families, evidence demonstrating such success becomes ever more imperative. This study therefore examined evidence to determine whether Bolsa Escola programs have a significant impact on the academic performance of beneficiaries in Brazil.

Through the course of three data collection phases, multiple data sources were used to demonstrate the academic performance of fourth- and eighth-grade Brazilian students who were eligible to participate in either an NGO program or the federal cash transfer program. MANOVAs were conducted separately for fourth- and eighth-grade data to determine whether significant differences existed between measures of academic performance of Bolsa and non-Bolsa students. In every case and for both grade levels, significant effects were found for participation.

The limited qualitative data collected did not support drawing conclusions. Thematic analysis of the limited interview data pointed to possible dependency on Bolsa monthly stipends and to a reallocation of responsibilities in the home in cases where children shifted from being breadwinners to students.
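
As a hedged illustration of the per-grade analysis, the sketch below runs a MANOVA with two hypothetical outcome measures against a Bolsa participation factor using statsmodels; the variable names and data are invented, and the study's actual measures and software are not given in the abstract.

```python
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

# hypothetical scores for Bolsa participants and non-participants
df = pd.DataFrame({
    "math":    [62, 55, 70, 48, 66, 52, 74, 50],
    "reading": [58, 50, 68, 45, 64, 49, 71, 47],
    "bolsa":   ["yes", "no", "yes", "no", "yes", "no", "yes", "no"],
})

fit = MANOVA.from_formula("math + reading ~ bolsa", data=df)
print(fit.mv_test())  # Wilks' lambda etc. for the participation effect
```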

Relevance:

100.00%

Publisher:

Abstract:

With the explosive growth in the volume and complexity of document data (e.g., news, blogs, web pages), it has become a necessity to semantically understand documents and deliver meaningful information to users. The areas dealing with these problems cut across data mining, information retrieval, and machine learning. For example, document clustering and summarization are two fundamental techniques for understanding document data and have attracted much attention in recent years. Given a collection of documents, document clustering aims to partition them into different groups to provide efficient document browsing and navigation mechanisms. One open problem in document clustering is how to generate a meaningful interpretation for each document cluster resulting from the clustering process. Document summarization is another effective technique for document understanding, which generates a summary by selecting sentences that deliver the major or topic-relevant information in the original documents. How to improve automatic summarization performance and how to apply it to newly emerging problems are two valuable research directions. To help people capture the semantics of documents effectively and efficiently, the dissertation focuses on developing effective data mining and machine learning algorithms and systems for (1) integrating document clustering and summarization to obtain meaningful document clusters with summarized interpretation, (2) improving document summarization performance and building document understanding systems to solve real-world applications, and (3) summarizing the differences and evolution of multiple document sources.
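
A minimal sketch of the first direction, pairing clustering with a summarized interpretation: documents are clustered on TF-IDF vectors and each cluster is labeled with its highest-weight terms. This is only a baseline illustration, not the dissertation's integrated algorithm; the sample documents are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = [
    "stocks fell as markets reacted to interest rate hikes",
    "the central bank raised interest rates again this quarter",
    "the team won the championship in overtime",
    "fans celebrated the playoff victory downtown",
]

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(docs)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

terms = vec.get_feature_names_out()
for c in range(km.n_clusters):
    # highest-weight terms in the cluster centroid serve as a crude label
    top = km.cluster_centers_[c].argsort()[::-1][:4]
    print(f"cluster {c}: {[terms[i] for i in top]}")
```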

Relevance:

90.00%

Publisher:

Abstract:

An implementation of Sem-ODB, a database management system based on the Semantic Binary Model, is presented. A metaschema of the Sem-ODB database as well as the top-level architecture of the database engine is defined. A new benchmarking technique is proposed that allows databases built on different database models to compete fairly. This technique is applied to show that Sem-ODB has excellent efficiency compared to a relational database on a certain class of database applications. A new semantic benchmark is designed that allows evaluation of the performance of the features characteristic of semantic database applications. An application used in the benchmark represents a class of problems requiring databases with sparse data, complex inheritances, and many-to-many relations. Such databases can be naturally accommodated by the semantic model. A fixed predefined implementation is not enforced, allowing the database designer to choose the most efficient structures available in the DBMS tested. The results of the benchmark are analyzed.

A new high-level querying model for semantic databases is defined. It is proven adequate to serve as an efficient native semantic database interface, and it has several advantages over existing interfaces. It is optimizable and parallelizable, and it supports the definition of semantic user views and the interoperability of semantic databases with other data sources such as the World Wide Web, relational databases, and object-oriented databases. The query is structured as a semantic database schema graph with interlinking conditionals. The query result is a mini-database, accessible in the same way as the original database. The paradigm supports and utilizes the rich semantics and inherent ergonomics of semantic databases.

The analysis and high-level design are presented for a system that exploits the superiority of the Semantic Database Model over other data models in expressive power and ease of use to allow uniform access to heterogeneous data sources, such as semantic databases, relational databases, web sites, ASCII files, and others, via a common query interface. The Sem-ODB engine is used to control all the data sources combined under a unified semantic schema. A particular application of the system, providing an ODBC interface to the WWW as a data source, is discussed.

Relevance:

90.00%

Publisher:

Abstract:

Methods for accessing data on the Web have been the focus of active research over the past few years. In this thesis we propose a method for representing Web sites as data sources. We designed Data Extractor, a data retrieval solution that allows us to define queries to Web sites and process the resulting data sets. Data Extractor is being integrated into the MSemODB heterogeneous database management system. With its help, database queries can be distributed over both local and Web data sources within the MSemODB framework.

Data Extractor treats Web sites as data sources, controlling query execution and data retrieval. It works as an intermediary between the applications and the sites. Data Extractor utilizes a twofold “custom wrapper” approach to information retrieval. Wrappers for the majority of sites are easily built using a powerful and expressive scripting language, while complex cases are processed using Java-based wrappers that utilize a specially designed library of data retrieval, parsing, and Web access routines. In addition to wrapper development, we thoroughly investigate issues associated with Web site selection, analysis, and processing.

Data Extractor is designed to act as a data retrieval server as well as an embedded data retrieval solution. We also use it to create mobile agents that are shipped over the Internet to the client's computer to perform data retrieval on behalf of the user. This approach allows Data Extractor to distribute and scale well.

This study confirms the feasibility of building custom wrappers for Web sites. The approach provides accuracy of data retrieval, along with power and flexibility in handling complex cases.
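
Data Extractor's scripting language is not shown in the abstract, but the “custom wrapper” idea can be sketched in a few lines of Python: fetch a page, parse it, and expose its rows as tuples. The URL and table layout below are hypothetical.

```python
import requests
from bs4 import BeautifulSoup

def wrap_table(url):
    """Treat a Web page's table as a data source, yielding rows as tuples."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    for row in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in row.find_all(["td", "th"])]
        if cells:
            yield tuple(cells)

# hypothetical usage; the URL does not point at a real data set
# for record in wrap_table("https://example.com/prices.html"):
#     print(record)
```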

Relevance:

90.00%

Publisher:

Abstract:

The research presented in this dissertation comprises several parts that jointly attain the goal of semantic distributed database management with applications to Internet dissemination of environmental data.

Part of the research into more effective and efficient data management has been pursued through enhancements to the Semantic Binary Object-Oriented database (Sem-ODB), such as more effective load balancing techniques for the database engine and the use of Sem-ODB as a tool for integrating structured and unstructured heterogeneous data sources. Another part of the research in data management has pursued methods for optimizing queries in distributed databases through the intelligent use of network bandwidth; this has applications in networks that provide varying levels of Quality of Service or throughput.

The application of the Semantic Binary database model as a tool for relational database modeling has also been pursued. This has resulted in database applications that are used by researchers at Everglades National Park to store environmental data and remotely sensed imagery.

The areas of research described above have contributed to the creation of TerraFly, which provides for the dissemination of geospatial data via the Internet. The TerraFly research presented herein ranges from the development of TerraFly's back-end database and interfaces, through the features presented to the public (such as the ability to provide autopilot scripts and on-demand data about a point), to applications of TerraFly in the areas of hazard mitigation, recreation, and aviation.

Relevance:

90.00%

Publisher:

Abstract:

The purpose of this study was to determine fifth-grade students' perceptions of the Fitnessgram physical fitness testing program. This study examined whether the Fitnessgram physical fitness testing experience promotes an understanding of the health-related fitness components and examined the relationship between individual fitness test scores and time spent participating in out-of-school physical activity. Lastly, students' thoughts and feelings concerning the Fitnessgram experience were examined.

The primary participant population for the study was 110 fifth-grade students at Redland Elementary School, a Miami-Dade County Public School (M-DCPS). Data were collected over the course of 5 months. Multiple sources of data allowed for triangulation. Data sources included Fitnessgram test scores, questionnaires, document analysis, and in-depth interviews.

Interview data were analyzed qualitatively for common broad themes, which were identified and defined. Document analysis included analyzing student fitness test scores and student questionnaire data. This information was analyzed to determine whether the Fitnessgram test scores have an impact on student views about the school fitness-testing program. Data were statistically analyzed using frequency analysis, cross-tabulations, and Somers' d correlation (Bryman & Duncan, 1997). The analysis of data on student knowledge of the physical fitness components tested by each Fitnessgram test revealed that students do not understand the health-related fitness components.

The analysis of the relationship between individual fitness test scores and time spent in out-of-school physical activity revealed a significant positive relationship for 2 of the 6 Fitnessgram tests.

Students' thoughts and feelings about each Fitnessgram test centered on 2 broad themes: (a) the children do not mind the physical fitness testing, and (b) how they felt about the experience was directly related to how they thought they had performed.

If the goal of physical fitness were only to get children fit, this test might be appropriate. However, the ultimate goal of physical fitness is to encourage students to live active and healthy lifestyles. Findings suggest that the Fitnessgram as implemented by M-DCPS may not be the most suitable measurement instrument when assessing attitudinal changes that affect a healthy lifelong lifestyle.

Relevance:

90.00%

Publisher:

Abstract:

Mediation techniques provide interoperability and support integrated query processing among heterogeneous databases. While such techniques help data sharing among different sources, they increase the risk to data security, such as the violation of access control rules. Successful protection of information by an effective access control mechanism is a basic requirement for interoperation among heterogeneous data sources.

This dissertation first identified the challenges a mediation system must meet in order to achieve both interoperability and security in an interconnected and collaborative computing environment: (1) context-awareness, (2) semantic heterogeneity, and (3) multiple security policy specification. Few existing approaches address all three of these security challenges. This dissertation provides a modeling and architectural solution to the problem of mediation security that addresses the aforementioned challenges. A context-aware flexible authorization framework was developed to deal with the security challenges faced by mediation systems. The authorization framework consists of two major tasks: specifying security policies and enforcing security policies. First, the security policy specification provides a generic and extensible method for modeling security policies with respect to the challenges posed by the mediation system. The security policies in this study are specified as 5-tuples followed by a series of authorization constraints, which are identified based on the relationships of the different security components in the mediation system. Two essential features of mediation systems, i.e., the relationships among authorization components and interoperability among heterogeneous data sources, are the focus of this investigation. Second, this dissertation supports effective access control on mediation systems while providing uniform access to heterogeneous data sources. The dynamic security constraints are handled in the authorization phase instead of the authentication phase; thus the maintenance cost of the security specification can be reduced compared with related solutions.
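
The abstract names the 5-tuple form but not its fields, so the sketch below assumes a common choice, (subject, object, action, sign, context), plus a list of constraint predicates and a deny-overrides evaluation rule; all of these specifics are assumptions, not the dissertation's design.

```python
from dataclasses import dataclass, field

@dataclass
class Policy:
    subject: str
    obj: str
    action: str
    sign: str        # "grant" or "deny" (assumed field)
    context: str     # required context label (assumed field)
    constraints: list = field(default_factory=list)  # extra predicates on ctx

def authorize(policies, subject, obj, action, ctx):
    """Deny-overrides evaluation over the matching policies (an assumption)."""
    granted = False
    for p in policies:
        if (p.subject, p.obj, p.action) == (subject, obj, action) \
                and p.context == ctx.get("label") \
                and all(check(ctx) for check in p.constraints):
            if p.sign == "deny":
                return False
            granted = True
    return granted

# example: a nurse may read records only during an on-shift context
policies = [Policy("nurse", "record", "read", "grant", "on-shift",
                   [lambda ctx: ctx.get("hour", 0) < 20])]
print(authorize(policies, "nurse", "record", "read", {"label": "on-shift", "hour": 14}))
```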

Relevance:

90.00%

Publisher:

Abstract:

The nation's freeway systems are becoming increasingly congested. A major contributor to traffic congestion on freeways is traffic incidents. Traffic incidents are non-recurring events, such as accidents or stranded vehicles, that cause a temporary reduction in roadway capacity; they can account for as much as 60 percent of all traffic congestion on freeways. One major freeway incident management strategy involves diverting traffic away from incident locations by relaying timely information through Intelligent Transportation Systems (ITS) devices such as dynamic message signs or real-time traveler information systems. The decision to divert traffic depends foremost on the expected duration of an incident, which is difficult to predict. In addition, the duration of an incident is affected by many contributing factors. Determining and understanding these factors can help in identifying and developing better strategies to reduce incident durations and alleviate traffic congestion. A number of research studies have attempted to develop models to predict incident durations, yet with limited success.

This dissertation research attempts to improve on these previous efforts by applying data mining techniques to a comprehensive incident database maintained by the District 4 ITS Office of the Florida Department of Transportation (FDOT). Two categories of incident duration prediction models were developed: “offline” models designed for use in the performance evaluation of incident management programs, and “online” models for real-time prediction of incident duration to aid decision making about traffic diversion during an ongoing incident. Multiple data mining techniques were applied and evaluated in the research. Multiple linear regression analysis and a decision tree based method were applied to develop the offline models, and a rule-based method and a tree algorithm called M5P were used to develop the online models.

The results show that the models in general can achieve high prediction accuracy within acceptable intervals of the actual durations. The research also identifies some new contributing factors that have not been examined in past studies. As part of the research effort, software code was developed to implement the models in the existing software system of District 4 FDOT for actual applications.
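
As a hedged sketch of the “offline” category, the snippet below fits a multiple linear regression of incident duration on a few candidate factors with scikit-learn. The feature names and data are invented, and the dissertation's tree-based models (e.g., M5P, a Weka algorithm) are not reproduced here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# columns: lanes_blocked, vehicles_involved, peak_hour (0/1), severity (1-3)
X = np.array([[1, 2, 1, 1],
              [2, 3, 1, 2],
              [1, 1, 0, 1],
              [3, 4, 1, 3],
              [2, 2, 0, 2]])
y = np.array([25.0, 47.0, 18.0, 95.0, 40.0])  # incident duration, minutes

model = LinearRegression().fit(X, y)
print("coefficients:", model.coef_)
print("predicted duration:", model.predict([[2, 3, 0, 2]])[0])
```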

Relevance:

90.00%

Publisher:

Abstract:

Recent advances in airborne Light Detection and Ranging (LIDAR) technology allow rapid and inexpensive measurement of topography over large areas. Airborne LIDAR systems usually return a 3-dimensional cloud of point measurements from reflective objects scanned by the laser beneath the flight path. This technology is becoming a primary method for extracting information about different kinds of geometric objects, such as high-resolution digital terrain models (DTMs), buildings, and trees. Over the past decade, LIDAR has attracted more and more interest from researchers in the fields of remote sensing and GIS. Compared to traditional data sources, such as aerial photography and satellite images, LIDAR measurements are not influenced by sun shadow and relief displacement. However, the voluminous data pose a new challenge for automated extraction of geometric information from LIDAR measurements, because many raster image processing techniques cannot be directly applied to irregularly spaced LIDAR points.

In this dissertation, a framework is proposed to automatically filter out information about different kinds of geometric objects, such as terrain and buildings, from LIDAR data. These objects are essential to numerous applications such as flood modeling, landslide prediction, and hurricane animation. The framework consists of several intuitive algorithms. First, a progressive morphological filter was developed to detect non-ground LIDAR measurements. By gradually increasing the window size and elevation difference threshold of the filter, the measurements of vehicles, vegetation, and buildings are removed, while ground data are preserved. Then, building measurements are identified from the non-ground measurements using a region-growing algorithm based on a plane-fitting technique. Raw footprints for segmented building measurements are derived by connecting boundary points and are further simplified and adjusted by several proposed operations to remove the noise caused by irregularly spaced LIDAR measurements. To reconstruct 3D building models, the raw 2D topology of each building is first extracted and then further adjusted. Since the adjusting operations for simple building models do not work well on 2D topology, a 2D snake algorithm is proposed to adjust it. The 2D snake algorithm consists of newly defined energy functions for topology adjustment and a linear algorithm to find the minimal energy value of 2D snake problems. Data sets from urbanized areas, including large institutional, commercial, and small residential buildings, were employed to test the proposed framework. The results demonstrated that the proposed framework achieves very good performance.
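
A minimal sketch of the progressive morphological filter described above, applied to an already-gridded surface: open the grid with growing windows and drop cells whose elevation falls by more than a threshold that grows with the window size. The parameter values and the exact window/threshold schedules are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage

def progressive_morph_filter(dsm, cell=1.0, max_win=17,
                             slope=0.3, dh0=0.5, dh_max=3.0):
    """Flag ground cells in a gridded surface by progressive opening."""
    ground = np.ones(dsm.shape, dtype=bool)
    surface = dsm.astype(float).copy()
    win, dh = 3, dh0
    while win <= max_win:
        opened = ndimage.grey_opening(surface, size=(win, win))
        ground &= (surface - opened) <= dh           # big drops = non-ground
        surface = opened
        dh = min(slope * win * cell + dh0, dh_max)   # threshold grows with window
        win = 2 * win - 1                            # enlarge the window
    return ground
```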

Relevance:

90.00%

Publisher:

Abstract:

This study describes the case of private higher education in Ohio between 1980 and 2006 using Zumeta's (1996) model of state policy and private higher education. More specifically, this study used case study methodology and multiple sources to demonstrate the usefulness of Zumeta's model and illustrate its limitations. Ohio served as the subject state, and data for 67 private, 4-year, degree-granting, Higher Learning Commission-accredited institutions were collected. Data sources for this study included the National Center for Education Statistics' Integrated Postsecondary Education Data System (IPEDS) as well as database information and documents from various state agencies in Ohio, including the Ohio Board of Regents.

The findings of this study indicated that the general state context for higher education in Ohio during the study period was shaped by deteriorating economic factors, stagnating population growth coupled with a rapidly aging society, fluctuating state income, and increasing expenditures in areas such as corrections, transportation, and social services. However, private higher education experienced consistent enrollment growth, an increase in the number of institutions, widening involvement in state-wide planning for higher education, and greater fiscal support from the state in a variety of forms such as the Ohio Choice Grant. This study also demonstrated that private higher education in Ohio benefited from its inclusion in state-wide planning and from the state's decision to grant state aid directly to students.

Taken together, this study supported Zumeta's (1996) classification of Ohio as having a hybrid market-competitive/central-planning policy posture toward private higher education. Furthermore, this study demonstrated that Zumeta's model is a useful tool for both policy makers and researchers for understanding a state's relationship to its private higher education sector. However, this study also demonstrated that Zumeta's model is less useful when applied over an extended time period. Additionally, this study identified a further limitation of Zumeta's model resulting from his failure to define "state mandate" and the "level of state mandates," which allows for inconsistent analysis of this component.

Relevance:

90.00%

Publisher:

Abstract:

With the advent of peer-to-peer networks and, more importantly, sensor networks, the desire to extract useful information from continuous and unbounded streams of data has become more prominent. For example, in tele-health applications, sensor-based data streaming systems are used to continuously and accurately monitor Alzheimer's patients and their surrounding environment. Typically, the requirements of such applications necessitate the cleaning and filtering of continuous, corrupted, and incomplete data streams gathered wirelessly in dynamically varying conditions. Yet existing data stream cleaning and filtering schemes are incapable of capturing the dynamics of the environment while simultaneously suppressing the losses and corruption introduced by uncertain environmental, hardware, and network conditions. Consequently, existing data cleaning and filtering paradigms are being challenged. This dissertation develops novel schemes for cleaning data streams received from a wireless sensor network operating under non-linear and dynamically varying conditions. The study establishes a paradigm for validating spatio-temporal associations among data sources to enhance data cleaning. To reduce the complexity of the validation process, the developed solution maps the requirements of the application onto a geometric space and identifies the potential sensor nodes of interest. Additionally, this dissertation models a wireless sensor network data reduction system, ascertaining that segregating the data adaptation and prediction processes augments data reduction rates. The schemes presented in this study are evaluated using simulation and information theory concepts. The results demonstrate that the dynamic conditions of the environment are better managed when validation is used for data cleaning. They also show that when a fast-convergent adaptation process is deployed, data reduction rates are significantly improved. Targeted applications of the developed methodology include machine health monitoring, tele-health, environment and habitat monitoring, intermodal transportation, and homeland security.
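
One simple instance of prediction-based data reduction, offered only as a hedged illustration of the idea (the dissertation's adaptive scheme is more sophisticated): sensor and sink share a last-value predictor, and the sensor transmits only when a reading deviates from the shared prediction by more than a tolerance.

```python
def reduce_stream(readings, eps=0.5):
    """Transmit a reading only when it deviates from the shared prediction."""
    sent = []
    pred = None
    for t, x in enumerate(readings):
        if pred is None or abs(x - pred) > eps:
            sent.append((t, x))  # transmit; the sink resyncs its predictor
            pred = x
    return sent

samples = [20.0, 20.1, 20.2, 23.5, 23.4, 23.6, 20.0]
print(reduce_stream(samples))  # only the initial value and the jumps are sent
```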

Relevance:

90.00%

Publisher:

Abstract:

It has long been known that vocabulary is essential in the development of reading. Because vocabulary leading to increased comprehension is important, it is necessary to determine strategies for ensuring that the best methods of teaching vocabulary are used to help students make gains in vocabulary leading to reading comprehension. According to the National Reading Panel, multiple strategies that involve active engagement on the part of the student are more effective than the use of just one strategy. The purpose of this study was to determine whether students' use of visualization, student-generated pictures of onset-and-rime-patterned vocabulary, and story read-alouds with discussion would enable diverse first-grade students to increase their vocabulary and comprehension. In addition, this study examined the effect of this multimodal framework of strategies on English learners (ELs). This quasi-experimental study (N = 69) was conducted in four first-grade classrooms in a low socio-economic school. Two treatment classes used the multimodal framework of strategies to learn weekly vocabulary words and comprehension; two comparison classrooms used the traditional method of teaching weekly vocabulary and comprehension. Data sources included Florida Assessments for Instruction in Reading (FAIR) comprehension and vocabulary scores, and weekly MacMillan/McGraw-Hill Treasures basal comprehension questions and onset-and-rime vocabulary questions. This research determined that the treatment had an effect on adjusted FAIR comprehension means by group, with the treatment group (adj M = 5.14) significantly higher than the comparison group (adj M = -8.26) on post scores. However, the treatment means did not increase from pre to post, while the comparison means significantly decreased from pre to post as the materials became more challenging. For the FAIR vocabulary, there was a significant difference by group, with the comparison group's adjusted post mean higher than the treatment group's, although both groups significantly increased from pre to post. However, the FAIR vocabulary posttest was not part of the Treasures vocabulary, which was taught using the multimodal framework of strategies. The Treasures vocabulary scores were not significantly different by group across the weeks, although the treatment means were higher than those of the comparison group. Continued research is needed on vocabulary and comprehension instructional methods in order to determine strategies that increase diverse, urban students' performance.

Relevance:

90.00%

Publisher:

Abstract:

The coastal zone of the Florida Keys features the only living coral reef in the continental United States and as such represents a unique regional environmental resource. Anthropogenic pressures combined with climate disturbances such as hurricanes can affect the biogeochemistry of the region and threaten the health of this unique ecosystem. As such, water quality monitoring has historically been implemented in the Florida Keys, and six spatially distinct zones have been identified. In these studies, however, dissolved organic matter (DOM) has been studied only as a quantitative parameter, yet DOM composition can be a valuable biogeochemical parameter for assessing environmental change in coastal regions. Here we report the first data of its kind on the application of the optical properties of DOM, in particular excitation-emission matrix fluorescence with parallel factor analysis (EEM-PARAFAC), throughout these six Florida Keys regions in an attempt to assess spatial differences in DOM sources. Our data suggest that while DOM in the Florida Keys can be influenced by distant terrestrial environments such as the Everglades, spatial differences in DOM distribution were also controlled in part by local surface runoff and fringe mangroves, contributions from seagrass communities, and the reefs and waters of the Florida Current. Application of principal component analysis (PCA) to the relative abundances of the EEM-PARAFAC components allowed for a clear distinction between the sources of DOM (allochthonous vs. autochthonous) and between different autochthonous sources and/or diagenetic states of DOM, and it further clarified the contribution of terrestrial DOM in zones where DOM abundance was low. The combination of EEM-PARAFAC and PCA proved to be ideally suited to discerning differences in DOM composition and sources in coastal zones with complex hydrology and multiple DOM sources.
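
A brief sketch of the PCA step described above, with invented numbers: rows are water samples, columns are relative abundances of hypothetical PARAFAC components, and sample scores on the first two principal components separate terrestrial-like from autochthonous-like DOM.

```python
import numpy as np
from sklearn.decomposition import PCA

# relative abundances of four hypothetical PARAFAC components per sample
abundance = np.array([[0.50, 0.20, 0.20, 0.10],   # terrestrial-like samples
                      [0.45, 0.25, 0.20, 0.10],
                      [0.15, 0.20, 0.40, 0.25],   # autochthonous-like samples
                      [0.10, 0.25, 0.40, 0.25]])

scores = PCA(n_components=2).fit_transform(abundance)
print(scores)  # sample positions in PC space; groupings indicate DOM sources
```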