6 resultados para Prediction of Heterogeneous Variables System

em Digital Commons at Florida International University


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Today, databases have become an integral part of information systems. In the past two decades, we have seen different database systems being developed independently and used in different applications domains. Today's interconnected networks and advanced applications, such as data warehousing, data mining & knowledge discovery and intelligent data access to information on the Web, have created a need for integrated access to such heterogeneous, autonomous, distributed database systems. Heterogeneous/multidatabase research has focused on this issue resulting in many different approaches. However, a single, generally accepted methodology in academia or industry has not emerged providing ubiquitous intelligent data access from heterogeneous, autonomous, distributed information sources. ^ This thesis describes a heterogeneous database system being developed at High-performance Database Research Center (HPDRC). A major impediment to ubiquitous deployment of multidatabase technology is the difficulty in resolving semantic heterogeneity. That is, identifying related information sources for integration and querying purposes. Our approach considers the semantics of the meta-data constructs in resolving this issue. The major contributions of the thesis work include: (i) providing a scalable, easy-to-implement architecture for developing a heterogeneous multidatabase system, utilizing Semantic Binary Object-oriented Data Model (Sem-ODM) and Semantic SQL query language to capture the semantics of the data sources being integrated and to provide an easy-to-use query facility; (ii) a methodology for semantic heterogeneity resolution by investigating into the extents of the meta-data constructs of component schemas. This methodology is shown to be correct, complete and unambiguous; (iii) a semi-automated technique for identifying semantic relations, which is the basis of semantic knowledge for integration and querying, using shared ontologies for context-mediation; (iv) resolutions for schematic conflicts and a language for defining global views from a set of component Sem-ODM schemas; (v) design of a knowledge base for storing and manipulating meta-data and knowledge acquired during the integration process. This knowledge base acts as the interface between integration and query processing modules; (vi) techniques for Semantic SQL query processing and optimization based on semantic knowledge in a heterogeneous database environment; and (vii) a framework for intelligent computing and communication on the Internet applying the concepts of our work. ^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

As congestion management strategies begin to put more emphasis on person trips than vehicle trips, the need for vehicle occupancy data has become more critical. The traditional methods of collecting these data include the roadside windshield method and the carousel method. These methods are labor-intensive and expensive. An alternative to these traditional methods is to make use of the vehicle occupancy information in traffic accident records. This method is cost effective and may provide better spatial and temporal coverage than the traditional methods. However, this method is subject to potential biases resulting from under- and over-involvement of certain population sectors and certain types of accidents in traffic accident records. In this dissertation, three such potential biases, i.e., accident severity, driver’s age, and driver’s gender, were investigated and the corresponding bias factors were developed as needed. The results show that although multi-occupant vehicles are involved in higher percentages of severe accidents than are single-occupant vehicles, multi-occupant vehicles in the whole accident vehicle population were not overrepresented in the accident database. On the other hand, a significant difference was found between the distributions of the ages and genders of drivers involved in accidents and those of the general driving population. An information system that incorporates adjustments for the potential biases was developed to estimate the average vehicle occupancies (AVOs) for different types of roadways on the Florida state roadway system. A reasonableness check of the results from the system shows AVO estimates that are highly consistent with expectations. In addition, comparisons of AVOs from accident data with the field estimates show that the two data sources produce relatively consistent results. While accident records can be used to obtain the historical AVO trends and field data can be used to estimate the current AVOs, no known methods have been developed to project future AVOs. Four regression models for the purpose of predicting weekday AVOs on different levels of geographic areas and roadway types were developed as part of this dissertation. The models show that such socioeconomic factors as income, vehicle ownership, and employment have a significant impact on AVOs.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Bio-systems are inherently complex information processing systems. Furthermore, physiological complexities of biological systems limit the formation of a hypothesis in terms of behavior and the ability to test hypothesis. More importantly the identification and classification of mutation in patients are centric topics in today's cancer research. Next generation sequencing (NGS) technologies can provide genome-wide coverage at a single nucleotide resolution and at reasonable speed and cost. The unprecedented molecular characterization provided by NGS offers the potential for an individualized approach to treatment. These advances in cancer genomics have enabled scientists to interrogate cancer-specific genomic variants and compare them with the normal variants in the same patient. Analysis of this data provides a catalog of somatic variants, present in tumor genome but not in the normal tissue DNA. In this dissertation, we present a new computational framework to the problem of predicting the number of mutations on a chromosome for a certain patient, which is a fundamental problem in clinical and research fields. We begin this dissertation with the development of a framework system that is capable of utilizing published data from a longitudinal study of patients with acute myeloid leukemia (AML), who's DNA from both normal as well as malignant tissues was subjected to NGS analysis at various points in time. By processing the sequencing data at the time of cancer diagnosis using the components of our framework, we tested it by predicting the genomic regions to be mutated at the time of relapse and, later, by comparing our results with the actual regions that showed mutations (discovered at relapse time). We demonstrate that this coupling of the algorithm pipeline can drastically improve the predictive abilities of searching a reliable molecular signature. Arguably, the most important result of our research is its superior performance to other methods like Radial Basis Function Network, Sequential Minimal Optimization, and Gaussian Process. In the final part of this dissertation, we present a detailed significance, stability and statistical analysis of our model. A performance comparison of the results are presented. This work clearly lays a good foundation for future research for other types of cancer.^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

As congestion management strategies begin to put more emphasis on person trips than vehicle trips, the need for vehicle occupancy data has become more critical. The traditional methods of collecting these data include the roadside windshield method and the carousel method. These methods are labor-intensive and expensive. An alternative to these traditional methods is to make use of the vehicle occupancy information in traffic accident records. This method is cost effective and may provide better spatial and temporal coverage than the traditional methods. However, this method is subject to potential biases resulting from under- and over-involvement of certain population sectors and certain types of accidents in traffic accident records. In this dissertation, three such potential biases, i.e., accident severity, driver¡¯s age, and driver¡¯s gender, were investigated and the corresponding bias factors were developed as needed. The results show that although multi-occupant vehicles are involved in higher percentages of severe accidents than are single-occupant vehicles, multi-occupant vehicles in the whole accident vehicle population were not overrepresented in the accident database. On the other hand, a significant difference was found between the distributions of the ages and genders of drivers involved in accidents and those of the general driving population. An information system that incorporates adjustments for the potential biases was developed to estimate the average vehicle occupancies (AVOs) for different types of roadways on the Florida state roadway system. A reasonableness check of the results from the system shows AVO estimates that are highly consistent with expectations. In addition, comparisons of AVOs from accident data with the field estimates show that the two data sources produce relatively consistent results. While accident records can be used to obtain the historical AVO trends and field data can be used to estimate the current AVOs, no known methods have been developed to project future AVOs. Four regression models for the purpose of predicting weekday AVOs on different levels of geographic areas and roadway types were developed as part of this dissertation. The models show that such socioeconomic factors as income, vehicle ownership, and employment have a significant impact on AVOs.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Today, databases have become an integral part of information systems. In the past two decades, we have seen different database systems being developed independently and used in different applications domains. Today's interconnected networks and advanced applications, such as data warehousing, data mining & knowledge discovery and intelligent data access to information on the Web, have created a need for integrated access to such heterogeneous, autonomous, distributed database systems. Heterogeneous/multidatabase research has focused on this issue resulting in many different approaches. However, a single, generally accepted methodology in academia or industry has not emerged providing ubiquitous intelligent data access from heterogeneous, autonomous, distributed information sources. This thesis describes a heterogeneous database system being developed at Highperformance Database Research Center (HPDRC). A major impediment to ubiquitous deployment of multidatabase technology is the difficulty in resolving semantic heterogeneity. That is, identifying related information sources for integration and querying purposes. Our approach considers the semantics of the meta-data constructs in resolving this issue. The major contributions of the thesis work include: (i.) providing a scalable, easy-to-implement architecture for developing a heterogeneous multidatabase system, utilizing Semantic Binary Object-oriented Data Model (Sem-ODM) and Semantic SQL query language to capture the semantics of the data sources being integrated and to provide an easy-to-use query facility; (ii.) a methodology for semantic heterogeneity resolution by investigating into the extents of the meta-data constructs of component schemas. This methodology is shown to be correct, complete and unambiguous; (iii.) a semi-automated technique for identifying semantic relations, which is the basis of semantic knowledge for integration and querying, using shared ontologies for context-mediation; (iv.) resolutions for schematic conflicts and a language for defining global views from a set of component Sem-ODM schemas; (v.) design of a knowledge base for storing and manipulating meta-data and knowledge acquired during the integration process. This knowledge base acts as the interface between integration and query processing modules; (vi.) techniques for Semantic SQL query processing and optimization based on semantic knowledge in a heterogeneous database environment; and (vii.) a framework for intelligent computing and communication on the Internet applying the concepts of our work.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Concrete substructures are often subjected to environmental deterioration, such as sulfate and acid attack, which leads to severe damage and causes structure degradation or even failure. In order to improve the durability of concrete, the High Performance Concrete (HPC) has become widely used by partially replacing cement with pozzolanic materials. However, HPC degradation mechanisms in sulfate and acidic environments are not completely understood. It is therefore important to evaluate the performance of the HPC in such conditions and predict concrete service life by establishing degradation models. This study began with a review of available environmental data in the State of Florida. A total of seven bridges have been inspected. Concrete cores were taken from these bridge piles and were subjected for microstructural analysis using Scanning Electron Microscope (SEM). Ettringite is found to be the products of sulfate attack in sulfate and acidic condition. In order to quantitatively analyze concrete deterioration level, an image processing program is designed using Matlab to obtain quantitative data. Crack percentage (Acrack/Asurface) is used to evaluate concrete deterioration. Thereafter, correlation analysis was performed to find the correlation between five related variables and concrete deterioration. Environmental sulfate concentration and bridge age were found to be positively correlated, while environmental pH level was found to be negatively correlated. Besides environmental conditions, concrete property factor was also included in the equation. It was derived from laboratory testing data. Experimental tests were carried out implementing accelerated expansion test under controlled environment. Specimens of eight different mix designs were prepared. The effect of pozzolanic replacement rate was taken into consideration in the empirical equation. And the empirical equation was validated with existing bridges. Results show that the proposed equations compared well with field test results with a maximum deviation of ± 20%. Two examples showing how to use the proposed equations are provided to guide the practical implementation. In conclusion, the proposed approach of relating microcracks to deterioration is a better method than existing diffusion and sorption models since sulfate attack cause cracking in concrete. Imaging technique provided in this study can also be used to quantitatively analyze concrete samples.