20 results for Analysis Tools

in Digital Commons at Florida International University


Relevance:

60.00%

Publisher:

Abstract:

The primary aim of this dissertation is to develop data mining tools for knowledge discovery in biomedical data when multiple (homogeneous or heterogeneous) sources of data are available. The central hypothesis is that, when information from multiple sources of data is used appropriately and effectively, knowledge discovery can be better achieved than is possible from a single source alone. Recent advances in high-throughput technology have enabled biomedical researchers to generate large volumes of diverse types of data on a genome-wide scale. These data include DNA sequences, gene expression measurements, and much more; they provide the motivation for building analysis tools to elucidate the modular organization of the cell. The challenges include efficiently and accurately extracting information from the multiple data sources, representing the information effectively, developing analytical tools, and interpreting the results in the context of the domain. The first part considers the application of feature-level integration to design classifiers that discriminate between soil types. The machine learning tools SVM and KNN were used to successfully distinguish between several soil samples. The second part considers clustering using multiple heterogeneous data sources. The resulting Multi-Source Clustering (MSC) algorithm was shown to perform better than clustering methods that use only a single data source or a simple feature-level integration of heterogeneous data sources. The third part proposes a new approach to effectively incorporate incomplete data into clustering analysis. Adapted from the K-means algorithm, the Generalized Constrained Clustering (GCC) algorithm makes use of incomplete data in the form of constraints to perform exploratory analysis. Novel approaches for extracting constraints were proposed. For sufficiently large constraint sets, the GCC algorithm outperformed the MSC algorithm. The last part considers the problem of providing a theme-specific environment for mining multi-source biomedical data. The database, called PlasmoTFBM and focused on gene regulation of Plasmodium falciparum, contains diverse information and offers a simple interface that allows biologists to explore the data. It provided a framework for comparing different analytical tools for predicting regulatory elements and for designing useful data mining tools. The conclusion is that the experiments reported in this dissertation strongly support the central hypothesis.
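
The abstract does not spell out the GCC algorithm, but the general idea it names — K-means extended to honor pairwise constraints derived from incomplete data — can be sketched roughly as below. This is a minimal COP-KMeans-style illustration, not the dissertation's actual formulation; the function names, the encoding of constraints as must-link/cannot-link index pairs, and the infeasibility handling are all assumptions.

```python
import numpy as np

def violates(i, c, labels, must_link, cannot_link):
    """True if putting point i in cluster c would break a constraint
    against an already-assigned point (label -1 means unassigned)."""
    for a, b in must_link:
        if i in (a, b):
            j = b if i == a else a
            if labels[j] not in (-1, c):
                return True
    for a, b in cannot_link:
        if i in (a, b):
            j = b if i == a else a
            if labels[j] == c:
                return True
    return False

def constrained_kmeans(X, k, must_link=(), cannot_link=(), n_iter=50, seed=0):
    """K-means that assigns each point to the nearest cluster whose
    current membership does not violate any pairwise constraint."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        labels[:] = -1
        for i in range(len(X)):
            # Try clusters from nearest to farthest; take the first feasible one.
            order = np.argsort(np.linalg.norm(centers - X[i], axis=1))
            for c in order:
                if not violates(i, c, labels, must_link, cannot_link):
                    labels[i] = c
                    break
            if labels[i] == -1:
                raise ValueError(f"no feasible cluster for point {i}")
        for c in range(k):  # update centers; keep the old one if a cluster empties
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels, centers

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
print(constrained_kmeans(X, 2, must_link=[(0, 1)], cannot_link=[(1, 2)])[0])
```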

Relevance:

60.00%

Publisher:

Abstract:

The Internet has become an integral part of our nation's critical socio-economic infrastructure. With its heightened use and growing complexity, however, organizations are at greater risk of cyber crimes. To aid in the investigation of crimes committed on or via the Internet, a network forensics analysis tool pulls together the needed digital evidence. It provides a platform for performing deep network analysis by capturing, recording, and analyzing network events to find the source of a security attack or other information security incidents. Existing network forensics work has mostly focused on the Internet and fixed networks, but the exponential growth and use of wireless technologies, coupled with their unprecedented characteristics, necessitates the development of new network forensic analysis tools. This dissertation fostered the emergence of a new research field in cellular and ad-hoc network forensics. It was one of the first works to identify this problem and offer fundamental techniques and tools that laid the groundwork for future research. In particular, it introduced novel methods to record network incidents and report logged incidents. For recording incidents, location is considered essential to documenting network incidents; however, in network topology spaces, location cannot be measured due to the absence of a 'distance metric'. Therefore, a novel solution was proposed to label the locations of nodes within network topology spaces, and then to authenticate the identity of nodes in ad hoc environments. For reporting logged incidents, a novel technique based on Distributed Hash Tables (DHT) was adopted. Although the direct use of DHTs for reporting logged incidents would result in uncontrollably recursive traffic, a new mechanism was introduced that overcomes this recursion. These logging and reporting techniques aided forensics over cellular and ad-hoc networks, which in turn increased the ability to track and trace attacks to their source. These techniques were a starting point for further research and development that would result in equipping future ad hoc networks with forensic components to complement existing security mechanisms.
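
The abstract names Distributed Hash Tables as the basis of the reporting mechanism without detailing it. As a rough illustration of the placement idea underneath any DHT — consistent hashing that maps each logged incident to a responsible node on an identifier ring — consider this sketch; the node names, key format, and SHA-1 ring are assumptions for illustration, not the dissertation's design.

```python
import hashlib
from bisect import bisect_right

def ring_id(key: str) -> int:
    """160-bit identifier, as in Chord-style DHTs."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest(), "big")

class IncidentRing:
    """Toy consistent-hash ring: an incident log record is stored on the
    first node whose ID follows the record's key on the ring."""
    def __init__(self, nodes):
        self.ring = sorted((ring_id(n), n) for n in nodes)

    def responsible_node(self, incident_key: str) -> str:
        ids = [nid for nid, _ in self.ring]
        idx = bisect_right(ids, ring_id(incident_key)) % len(self.ring)
        return self.ring[idx][1]

ring = IncidentRing(["node-a", "node-b", "node-c"])
print(ring.responsible_node("2005-06-01T12:00:00Z/incident-42"))
```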

Relevance:

60.00%

Publisher:

Abstract:

The need for efficient, sustainable, and planned utilization of resources is ever more critical. In the U.S. alone, buildings consume 34.8 quadrillion (10^15) BTU of energy annually, at a cost of $1.4 trillion; 58% of this energy is used for heating and air conditioning. Several building energy analysis tools have been developed to assess energy demands and lifecycle energy costs in buildings. Such analyses are also essential for an efficient HVAC design that avoids the pitfalls of an under- or over-designed system. DOE-2 is among the most widely known full building energy analysis models; it also constitutes the simulation engine of other prominent software such as eQUEST, EnergyPro, and PowerDOE. It is therefore essential that DOE-2 energy simulations be highly accurate. Infiltration is an uncontrolled process through which outside air leaks into a building. Studies have estimated infiltration to account for up to 50% of a building's energy demand. Considered alongside the annual cost of building energy consumption, this reveals the cost of air infiltration and underscores the need for prominent building energy simulation engines to account accurately for its impact. In this research, the relative accuracy of current air infiltration calculation methods is evaluated against an intricate multiphysics hygrothermal CFD building envelope analysis. The full-scale CFD analysis is based on a meticulous representation of cracking in building envelopes and on real-life conditions. The research found that even the most advanced current infiltration methods, including those in DOE-2, exhibit up to 96.13% relative error versus the CFD analysis. An Enhanced Model for Combined Heat and Air Infiltration Simulation was developed. The model yielded a 91.6% improvement in relative accuracy over current models: it reduces the error versus the CFD analysis to less than 4.5% while requiring less than 1% of the time needed for such a complex hygrothermal analysis. The algorithm used in the model was demonstrated to be easy to integrate into DOE-2 and other engines as a standalone method for evaluating infiltration heat loads. This will vastly increase the accuracy of such simulation engines while maintaining the speed and ease of use that make them so widely used in building design.
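
The Enhanced Model itself is not given in the abstract. For orientation, the textbook quantity that any infiltration method ultimately feeds into the load calculation is the sensible heat load of the leakage airflow, Q = m_dot * c_p * (T_out - T_in), commonly approximated in I-P units as Q [BTU/hr] ≈ 1.08 × CFM × ΔT [°F]. A minimal sketch under that standard approximation (the function and example numbers are illustrative, not from the dissertation):

```python
def infiltration_sensible_load(cfm: float, t_out_f: float, t_in_f: float) -> float:
    """Sensible heat load (BTU/hr) of an infiltration airflow, using the
    standard I-P approximation Q = 1.08 * CFM * dT. The 1.08 factor bundles
    air density (~0.075 lb/ft^3), specific heat (0.24 BTU/lb-F), and the
    minutes-to-hours conversion at typical indoor conditions."""
    return 1.08 * cfm * (t_out_f - t_in_f)

# e.g. 200 CFM of 35 F outdoor air leaking into a 70 F building:
print(infiltration_sensible_load(200, 35, 70))  # -7560 BTU/hr, i.e. a heating load
```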

Relevance:

30.00%

Publisher:

Abstract:

Software architecture is the abstract design of a software system. It plays a key role as a bridge between requirements and implementation, and serves as a blueprint for development. The architecture represents a set of early design decisions that are crucial to a system, and mistakes in those decisions are very costly if they remain undetected until the system is implemented and deployed. This is where formal specification and analysis come in. Formal specification ensures that an architecture design is represented in a rigorous and unambiguous way, and a formally specified model allows the use of different analysis techniques for verifying the correctness of those crucial design decisions. This dissertation presented a framework, called SAM, for the formal specification and analysis of software architectures. In terms of specification, formalisms and mechanisms were identified and chosen to specify software architecture based on different analysis needs; formalisms for specifying properties were also explored, especially for non-functional properties. In terms of analysis, the dissertation explored both the verification of functional properties and the evaluation of non-functional properties of software architecture. For the verification of functional properties, methodologies were presented for applying existing model checking techniques to a SAM model. For the evaluation of non-functional properties, the dissertation first showed how to incorporate stochastic information into a SAM model, and then explained how to translate the model into the input of existing tools and conduct the analysis using those tools. To ease the analysis work, a tool was also provided to automatically translate a SAM model for model checking. All the techniques and methods described in the dissertation were illustrated by examples or case studies, which also served the purpose of advocating the use of formal methods in practice.
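
The abstract does not show what a verification run looks like concretely. As a toy illustration of the mechanics behind model checking a safety property — exhaustively exploring a finite state space and returning a counterexample path if a bad state is reachable — here is a sketch; the transition system and the mutual-exclusion example are invented for illustration and are far simpler than SAM's actual formalisms:

```python
from collections import deque

def check_safety(initial, successors, is_bad):
    """Breadth-first exploration of all reachable states. Returns a
    counterexample path to a bad state, or None if the property holds."""
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        s = queue.popleft()
        if is_bad(s):
            path = []
            while s is not None:  # walk parents back to the initial state
                path.append(s)
                s = parent[s]
            return path[::-1]
        for t in successors(s):
            if t not in parent:
                parent[t] = s
                queue.append(t)
    return None

# Toy check: two processes with no lock; states are (proc1, proc2) locations.
moves = {"idle": ["trying"], "trying": ["critical"], "critical": ["idle"]}
succ = lambda s: [(a, s[1]) for a in moves[s[0]]] + [(s[0], b) for b in moves[s[1]]]
# Mutual exclusion fails here, so a counterexample path is printed.
print(check_safety(("idle", "idle"), succ, lambda s: s == ("critical", "critical")))
```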

Relevance:

30.00%

Publisher:

Abstract:

Microarray technology provides a high-throughput technique to study gene expression. Microarrays can help us diagnose different types of cancers, understand biological processes, assess host responses to drugs and pathogens, find markers for specific diseases, and much more. Microarray experiments generate large amounts of data, so effective data processing and analysis are critical for making reliable inferences from the data. The first part of the dissertation addresses the problem of finding an optimal set of genes (biomarkers) to classify a set of samples as diseased or normal. Three statistical gene selection methods (GS, GS-NR, and GS-PCA) were developed to identify a set of genes that best differentiate between samples. A comparative study of different classification tools was performed, and the best combinations of gene selection and classifiers for multi-class cancer classification were identified. For most of the benchmark cancer data sets, the gene selection method proposed in this dissertation, GS, outperformed other gene selection methods. Classifiers based on Random Forests, neural network ensembles, and K-nearest neighbors (KNN) showed consistently good performance. A striking commonality among these classifiers is that they all use a committee-based approach, suggesting that ensemble classification methods are superior. The same biological problem may be studied at different research labs and/or performed using different lab protocols or samples; in such situations, it is important to combine results from these efforts. The second part of the dissertation addresses the problem of pooling the results from different independent experiments to obtain improved results. Four statistical pooling techniques (the Fisher inverse chi-square method, the Logit method, Stouffer's Z transform method, and the Liptak-Stouffer weighted Z-method) were investigated. These pooling techniques were applied to the problem of identifying cell cycle-regulated genes in two different yeast species, and improved sets of cell cycle-regulated genes were identified as a result. The last part of the dissertation explores the effectiveness of wavelet data transforms for the task of clustering. Discrete wavelet transforms, with an appropriate choice of wavelet bases, were shown to be effective in producing clusters that were biologically more meaningful.
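
Three of the named pooling techniques are standard enough that SciPy implements them directly. A minimal sketch of combining one gene's p-values across three experiments (the p-values and sample-size weights below are invented for illustration, and the Logit method is omitted since SciPy has no direct equivalent):

```python
import numpy as np
from scipy.stats import combine_pvalues

# One gene's p-values from three independent experiments (made-up numbers):
p = np.array([0.04, 0.11, 0.002])

stat_f, p_fisher = combine_pvalues(p, method="fisher")    # inverse chi-square
stat_s, p_stouff = combine_pvalues(p, method="stouffer")  # Z transform

# Liptak-Stouffer weighted Z: weight each study, e.g. by sqrt(sample size).
w = np.sqrt(np.array([24, 12, 48]))
stat_w, p_liptak = combine_pvalues(p, method="stouffer", weights=w)

print(p_fisher, p_stouff, p_liptak)
```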

Relevance:

30.00%

Publisher:

Abstract:

Most research on tax evasion has focused on the income tax. Sales tax evasion has been largely ignored and dismissed as immaterial. This paper explored the differences between income tax and sales tax evasion and demonstrated that sales tax enforcement deserves and requires different tools to achieve compliance. Specifically, the major enforcement problem with the sales tax is not evasion: it is theft perpetrated by companies that act as collection agents for the state. Companies engage in a principal-agent relationship with the state, and many retain funds collected as an agent of the state for private use. As such, the act of sales tax theft bears more resemblance to embezzlement than to income tax evasion. It has long been assumed that the sales tax is nearly evasion-free, and state revenue departments report voluntary compliance in a manner that perpetuates this myth. Current sales tax compliance enforcement methodologies are similar in form to income tax compliance enforcement methodologies and are based largely on trust. The primary focus is on delinquent filers, with a very small percentage of businesses subject to audit. As a result, a very large group of noncompliant businesses file on time and fly below the radar while stealing millions of taxpayer dollars. The author utilized a variety of statistical methods with actual field data derived from operations of the Southern Region Criminal Investigations Unit of the Florida Department of Revenue to evaluate current and proposed sales tax compliance enforcement methodologies in a quasi-experimental, time series research design and to set forth a typology of sales tax evaders. This study showed that current estimates of voluntary compliance in sales tax systems are seriously and significantly overstated and that current enforcement methodologies are inadequate to identify the majority of violators and enforce compliance. Sales tax evasion is modeled using the theory of planned behavior and Cressey's fraud triangle, and it is demonstrated that proactive enforcement activities, characterized by substantial contact with non-delinquent taxpayers, result in a superior ability to identify noncompliance and provide a structure through which noncompliant businesses can be rehabilitated.

Relevance:

30.00%

Publisher:

Abstract:

Since the 1990s, scholars have paid special attention to public management's role in theory and research, under the assumption that effective management is one of the primary means for achieving superior performance. To some extent, this was influenced by the popular business writings of the 1980s as well as the reinventing-government literature of the 1990s. A number of case studies, but few quantitative studies, have been published showing that management matters in the performance of public organizations. My study examined whether management capacity increased organizational performance, using quantitative techniques. The specific research problem analyzed was whether significant differences existed between high- and average-performing public housing agencies on select criteria identified in the Government Performance Project (GPP) management capacity model, and whether this model could predict outcome performance measures in a statistically significant manner while controlling for exogenous influences. My model included two of the four GPP management subsystems (human resources and information technology), the integration and alignment of subsystems, and an overall managing-for-results framework. It also included environmental and client control variables that were hypothesized to affect performance independent of management action. Descriptive results of survey responses showed that high-performing agencies scored better on most high-performance dimensions of individual criteria, suggesting support for the model; however, quantitative analysis found limited statistically significant differences between high and average performers and limited predictive power for the model. My analysis led to the following major conclusions: past performance was the strongest predictor of present performance; high unionization hurt performance; and budget-related criteria mattered more for high performance than other model factors. As to the specific research question, management capacity may be necessary, but it is not sufficient, to increase performance. The research suggested managers may benefit from implementing best practices identified through the GPP model. The usefulness of the model could be improved by adding direct service delivery to it, which may also improve its predictive power. Finally, abundant tested concepts and tools for improving system performance are available to practitioners seeking to strengthen management subsystem support of direct service delivery.

Relevance:

30.00%

Publisher:

Abstract:

Stable isotope analysis has emerged as one of the primary means for examining the structure and dynamics of food webs, and numerous analytical approaches are now commonly used in the field. Techniques range from simple, qualitative inferences based on the isotopic niche, to Bayesian mixing models that can be used to characterize food-web structure at multiple hierarchical levels. We provide a comprehensive review of these techniques, and thus a single reference source to help identify the most useful approaches to apply to a given data set. We structure the review around four general questions: (1) what is the trophic position of an organism in a food web?; (2) which resource pools support consumers?; (3) what additional information does relative position of consumers in isotopic space reveal about food-web structure?; and (4) what is the degree of trophic variability at the intrapopulation level? For each general question, we detail different approaches that have been applied, discussing the strengths and weaknesses of each. We conclude with a set of suggestions that transcend individual analytical approaches, and provide guidance for future applications in the field.
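
Question (1) has a widely used closed form: the nitrogen-based trophic position estimate TP = λ + (δ15N_consumer − δ15N_base) / Δn, where λ is the trophic position of the baseline organism and Δn is the per-level trophic discrimination factor, commonly taken as about 3.4‰. A minimal sketch of that calculation (the isotope values below are invented for illustration):

```python
def trophic_position(d15n_consumer: float, d15n_base: float,
                     lam: float = 2.0, tdf: float = 3.4) -> float:
    """Nitrogen-based trophic position estimate.

    d15n_consumer : delta-15N of the consumer (per mil)
    d15n_base     : delta-15N of the isotopic baseline (per mil)
    lam           : trophic position of the baseline (2 = primary consumer)
    tdf           : trophic discrimination factor (~3.4 per mil per level)
    """
    return lam + (d15n_consumer - d15n_base) / tdf

print(trophic_position(12.1, 5.3))  # 4.0: roughly a tertiary consumer
```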

Relevance:

30.00%

Publisher:

Abstract:

In the article - Menu Analysis: Review and Evaluation - by Lendal H. Kotschevar, Distinguished Professor, School of Hospitality Management, Florida International University, Kotschevar's initial statement reads: “Various methods are used to evaluate menus. Some have quite different approaches and give different information. Even those using quite similar methods vary in the information they give. The author attempts to describe the most frequently used methods and to indicate their value. A correlation calculation is made to see how well certain of these methods agree in the information they give.” There is more than one way to look at the word menu. The culinary selections decided upon by the head chef or owner of a restaurant, which ultimately define the type of restaurant, is one way. The physical list of the food, which a patron actually holds in his or her hand, is another. These are the most common senses of the word. The author concentrates primarily on the latter, and uses the count of each item sold from a menu to measure the popularity of any particular item. This, along with a formula, allows Kotschevar to arrive at a specific value per item. Menu analysis would appear a difficult subject to broach: how does one approach it, and how does one qualify and quantify a menu? It seems such a subjective exercise. The author offers methods and outlines for approaching menu analysis from empirical perspectives. “Menus are often examined visually through the evaluation of various factors. It is a subjective method but has the advantage of allowing scrutiny of a wide range of factors which other methods do not,” says Kotschevar. “The method is also highly flexible. Factors can be given a score value and scores summed to give a total for a menu. This allows comparison between menus. If the one making the evaluations knows menu values, it is a good method of judgment,” he further offers. The author wants you to know that assigning values is fundamental to a pragmatic menu analysis; it is how the reviewer keeps score, so to speak. Value merit provides reliable criteria from which to gauge a particular menu item, and in the final analysis, menu evaluation provides the mechanism for either keeping or rejecting selected items on a menu. Kotschevar describes at least three different matrix evaluation methods: the Miller method, the Smith and Kasavana method, and the Pavesic method, offering illustrated examples of each in table format. These are helpful tools, since trying to explain the theories behind the tables would be difficult at best. Kotschevar also references analysis methods which aren't matrix based, such as the Hayes and Huffman Goal Value Analysis. The author sees no one method as better than another, and suggests that combining two or more of them would be a benefit.
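
Of the matrix methods named, the Smith and Kasavana approach (menu engineering) is the most mechanical to reproduce: each item is classed by whether its unit sales clear 70% of an equal share of total sales, and whether its contribution margin clears the sales-weighted average, yielding the familiar Star/Plowhorse/Puzzle/Dog quadrants. A sketch with invented menu data (the thresholds follow the commonly cited rules, not the article's worked tables):

```python
def menu_engineering(items):
    """Smith-Kasavana-style matrix: classify each menu item by unit sales
    (popularity) and contribution margin (CM).

    items: list of (name, units_sold, price, food_cost) tuples.
    """
    n = len(items)
    total_units = sum(u for _, u, _, _ in items)
    # Popularity threshold: 70% of an equal share of total sales.
    pop_cutoff = 0.70 * total_units / n
    # CM threshold: sales-weighted average contribution margin.
    avg_cm = sum(u * (p - c) for _, u, p, c in items) / total_units
    labels = {(True, True): "Star", (True, False): "Plowhorse",
              (False, True): "Puzzle", (False, False): "Dog"}
    return {name: labels[(u >= pop_cutoff, (p - c) >= avg_cm)]
            for name, u, p, c in items}

menu = [("Grouper", 90, 24.00, 9.00), ("Burger", 200, 14.00, 5.50),
        ("Duck", 30, 32.00, 11.00), ("Pasta", 40, 16.00, 7.00)]
print(menu_engineering(menu))
# {'Grouper': 'Star', 'Burger': 'Plowhorse', 'Duck': 'Puzzle', 'Pasta': 'Dog'}
```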

Relevance:

30.00%

Publisher:

Abstract:

In this study we have identified key genes that are critical in the development of astrocytic tumors. Meta-analysis of microarray studies comparing normal tissue to astrocytoma revealed a set of 646 differentially expressed genes in the majority of astrocytomas. Reverse engineering of these 646 genes using Bayesian network analysis produced a gene network for each grade of astrocytoma (Grade I–IV), and ‘key genes’ within each grade were identified. The genes found to be most influential in the development of the highest grade of astrocytoma, glioblastoma multiforme, were: COL4A1, EGFR, BTF3, MPP2, RAB31, CDK4, CD99, ANXA2, TOP2A, and SERBP1. All of these genes were up-regulated except MPP2, which was down-regulated. These 10 genes were able to predict tumor status with 96–100% confidence when using logistic regression, cross-validation, and support vector machine analysis. The Markov blanket genes interact with NF-κB, ERK, MAPK, VEGF, growth hormone, and collagen to produce a network whose top biological functions are cancer, neurological disease, and cellular movement. Three of the 10 genes - EGFR, COL4A1, and CDK4 in particular - appear to be potential ‘hubs of activity’. Modified expression of these 10 Markov blanket genes increases the lifetime risk of developing glioblastoma compared to the normal population, and the risk estimates increased dramatically with the joint effects of four or more of these genes: joint interaction effects of 4, 5, 6, 7, 8, 9, or 10 Markov blanket genes produced a 9, 13, 20.9, 26.7, 52.8, 53.2, 78.1, or 85.9% increase, respectively, in the lifetime risk of developing glioblastoma compared to the normal population. In summary, it appears that modified expression of several ‘key genes’ may be required for the development of glioblastoma. Further studies are needed to validate these ‘key genes’ as useful tools for early detection and novel therapeutic options for these tumors.
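
The abstract reports 96–100% prediction confidence from logistic regression with cross-validation on the 10-gene panel. A minimal sketch of that evaluation loop follows; the expression matrix and labels are random stand-ins (so the scores will hover near chance), and real microarray data would replace them:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

genes = ["COL4A1", "EGFR", "BTF3", "MPP2", "RAB31",
         "CDK4", "CD99", "ANXA2", "TOP2A", "SERBP1"]

# Synthetic stand-in for a (samples x 10 genes) expression matrix and
# tumor/normal labels; real data would come from the microarray studies.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, len(genes)))
y = rng.integers(0, 2, size=120)

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(clf, X, y, cv=5)  # 5-fold cross-validated accuracy
print(scores.mean())
```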

Relevance:

30.00%

Publisher:

Abstract:

To promote regional or mutual improvement, numerous interjurisdictional efforts to share tax bases have been attempted; most of these efforts fail to be consummated. Motivations to share revenues include narrowing fiscal disparities, enhancing regional cooperation and economic development, rationalizing land use, and minimizing revenue losses caused by competition to attract and keep businesses. Various researchers have developed theories to aid understanding of why interjurisdictional cooperation efforts succeed or fail. Walter Rosenbaum and Gladys Kammerer studied two contemporaneous Florida local-government consolidation attempts, and Boyd Messinger subsequently tested their Theory of Successful Consolidation on nine consolidation attempts. Paul Peterson's dual theories of modern federalism posit that all levels of government attempt to further economic development and that politicians act in ways that either further their futures or cement job security; actions related to the latter theory often interfere with the former. Samuel Nunn and Mark Rosentraub sought to learn how interjurisdictional cooperation evolves, and through multiple case studies they developed a model framing interjurisdictional cooperation in four dimensions. This dissertation investigates the ability of the above theories to help predict the success or failure of regional tax-base revenue sharing attempts. A research plan was formed that used five sequenced steps to gather data, analyze them, and determine whether hypotheses concerning the application of these theories were valid. The primary analytical tools were multiple case studies, cross-case analysis, and pattern matching. Data were gathered from historical records, questionnaires, and interviews. The results of this research indicate that the Rosenbaum-Kammerer theory can be a predictor of success or failure in implementing tax-base revenue sharing if it is amended as suggested by Messinger and further modified by a recommendation in this dissertation. Peterson's functional and legislative theories, considered together, were able to predict revenue sharing proposal outcomes. Many of the indicators of interjurisdictional cooperation put forward in the Nunn-Rosentraub model appeared in the cases studied, but the model was not a reliable forecasting instrument.

Relevance:

30.00%

Publisher:

Abstract:

The primary goal of this dissertation is the study of patterns of viral evolution inferred from serially-sampled sequence data, i.e., sequence data obtained from strains isolated at consecutive time points from a single patient or host. RNA viral populations have extremely high genetic variability, largely due to their astronomical population sizes within host systems, high replication rates, and short generation times. It is this aspect of their evolution that demands special attention and a different approach when studying the evolutionary relationships of serially-sampled sequence data. New methods for analyzing serially-sampled data were developed shortly after a groundbreaking HIV-1 study of several patients from whom viruses were isolated at recurring intervals over a period of 10 or more years. These methods assume a tree-like evolutionary model, while many RNA viruses have the capacity to exchange genetic material with one another through a process called recombination. A genealogy involving recombination is best described by a network structure, and a more general approach was therefore implemented in a new computational tool, Sliding MinPD, one that is mindful of the sampling times of the input sequences and that reconstructs the viral evolutionary relationships in the form of a network structure with implicit representations of recombination events. The underlying network organization reveals unique patterns of viral evolution and could help explain the emergence of disease-associated mutants and drug-resistant strains, with implications for patient prognosis and treatment strategies. In order to comprehensively test the developed methods and to carry out comparison studies with other methods, synthetic data sets are critical. Therefore, appropriate sequence generators were also developed to simulate the evolution of serially-sampled recombinant viruses, new and more thorough evaluation criteria for recombination detection methods were established, and three major comparison studies were performed. The newly developed tools were also applied to "real" HIV-1 sequence data, and it was shown that the results, represented within an evolutionary network structure, can be interpreted in biologically meaningful ways.
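
Sliding MinPD itself is not detailed in the abstract, but its core step — linking each newly sampled sequence to its minimum-pairwise-distance candidate among strains from earlier time points — can be sketched as below. The windowed recombination scan that gives the method its name is omitted, and the p-distance function, data layout, and toy sequences are assumptions for illustration:

```python
def p_distance(a: str, b: str) -> float:
    """Fraction of mismatched sites between two aligned, equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def minpd_links(samples):
    """Toy minimum-pairwise-distance ancestor assignment: each sequence is
    linked to its closest relative among all earlier time points.

    samples: dict mapping time point -> {strain_name: aligned_sequence}.
    """
    links = {}
    times = sorted(samples)
    for ti, t in enumerate(times[1:], start=1):
        earlier = [(n, s) for u in times[:ti] for n, s in samples[u].items()]
        for name, seq in samples[t].items():
            ancestor = min(earlier, key=lambda ns: p_distance(seq, ns[1]))
            links[name] = ancestor[0]
    return links

samples = {0: {"t0_a": "ACGTACGT"},
           1: {"t1_a": "ACGTACGA", "t1_b": "ACGTTCGT"},
           2: {"t2_a": "ACGTTCGA"}}
print(minpd_links(samples))
```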