17 results for "free software environment for statistical computing and graphics R"
in Digital Commons at Florida International University
Abstract:
The phenomenal growth of the Internet has connected us to a vast amount of computation and information resources around the world. However, making use of these resources is difficult because of the Internet's unparalleled scale, high communication latency, shared-nothing architecture, and unreliable connections. In this dissertation, we present a distributed software agent approach that brings a new distributed problem-solving paradigm to Internet computing research, with an enhanced client-server scheme, inherent scalability, and heterogeneity. Our study discusses the role of the distributed software agent in Internet computing and classifies it into three major categories by the objects it interacts with: computation agents, information agents, and interface agents. The problem domains and deployment of the computation agent and the information agent are presented, together with the analysis, design, and implementation of experimental systems in high-performance Internet computing and in scalable Web searching.

In the computation agent study, we show that high-performance Internet computing can be achieved with our proposed Java massive computation agent (JAM) model. We analyzed the JAM computing scheme and built a brute-force ciphertext decryption prototype. In the information agent study, we discuss the scalability problem of existing Web search engines and design an approach to Web searching with distributed collaborative index agents. This approach can be used to construct a more accurate, reusable, and scalable solution that copes with the growth of the Web and of the information on it.

Our research reveals that by deploying distributed software agents in Internet computing, we gain a more cost-effective way to make better use of the gigantic network of computation and information resources on the Internet. The case studies in our research show that we can now solve many practically hard or previously unsolvable problems caused by the inherent difficulties of Internet computing.
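To make the computation-agent scheme concrete, here is a minimal sketch of the key-space partitioning that a brute-force decryption prototype performs. It is not the JAM implementation: the toy XOR cipher, the plausibility test, and the process-pool "agents" are all illustrative stand-ins.

```python
import string
from concurrent.futures import ProcessPoolExecutor

# Toy single-byte XOR "cipher" stands in for the real cryptosystem.
def decrypt(ciphertext: bytes, key: int) -> bytes:
    return bytes(b ^ key for b in ciphertext)

def looks_like_plaintext(candidate: bytes) -> bool:
    # Crude plausibility test: every byte decodes to printable ASCII.
    return all(chr(b) in string.printable for b in candidate)

def search_slice(args):
    # One "agent": independently scans its own slice of the key space.
    ciphertext, keys = args
    return [(k, decrypt(ciphertext, k)) for k in keys
            if looks_like_plaintext(decrypt(ciphertext, k))]

if __name__ == "__main__":
    secret = decrypt(b"attack at dawn", 0x5A)   # XOR is its own inverse
    n_agents = 4
    # Partition the 8-bit key space into disjoint slices, one per agent.
    slices = [range(i, 256, n_agents) for i in range(n_agents)]
    with ProcessPoolExecutor(max_workers=n_agents) as pool:
        for hits in pool.map(search_slice, [(secret, s) for s in slices]):
            for key, plaintext in hits:
                print(f"candidate key 0x{key:02X}: {plaintext!r}")
```

Because each slice can be searched without any communication, the problem is embarrassingly parallel, which is what makes it a natural fit for independently acting agents scattered across the Internet.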
Abstract:
With advances in science and technology, computing and business intelligence (BI) systems are steadily becoming more complex, with an increasing variety of heterogeneous software and hardware components. They are thus becoming progressively more difficult to monitor, manage, and maintain. Traditional approaches to system management have largely relied on domain experts, through a knowledge acquisition process that translates domain knowledge into operating rules and policies. This is widely acknowledged to be a cumbersome, labor-intensive, and error-prone process that also struggles to keep up with rapidly changing environments. In addition, many traditional business systems deliver primarily pre-defined historical metrics for long-term strategic or mid-term tactical analysis, and lack the flexibility to support evolving metrics or data collection for real-time operational analysis. There is thus a pressing need for automatic and efficient approaches to monitoring and managing complex computing and BI systems. To realize the goal of autonomic management and enable self-management capabilities, we propose to mine the historical log data generated by computing and BI systems and automatically extract actionable patterns from it. This dissertation focuses on the development of data mining techniques that extract actionable patterns from various types of log data in computing and BI systems. Four key problems are studied: log data categorization and event summarization, leading indicator identification, pattern prioritization by exploring link structures, and a tensor model for three-way log data. Case studies and comprehensive experiments on real application scenarios and datasets show the effectiveness of the proposed approaches.
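As a toy illustration of the first problem, log categorization and event summarization, the sketch below collapses raw log lines into message templates by masking variable fields. The log lines and regular expressions are hypothetical, and the dissertation's actual techniques are considerably more sophisticated.

```python
import re
from collections import Counter

# Hypothetical raw log lines; real system/BI logs are far larger and messier.
logs = [
    "2024-01-05 10:01:02 ERROR disk /dev/sda1 92% full",
    "2024-01-05 10:01:09 INFO user alice logged in",
    "2024-01-05 10:03:44 ERROR disk /dev/sdb2 95% full",
    "2024-01-05 10:05:10 INFO user bob logged in",
]

def to_event_type(line: str) -> str:
    # Strip the timestamp, then mask variable tokens (paths, numbers, names)
    # so lines collapse into message templates, i.e. event categories.
    msg = re.sub(r"^\S+ \S+ ", "", line)
    msg = re.sub(r"/\S+", "<PATH>", msg)
    msg = re.sub(r"\d+%?", "<NUM>", msg)
    msg = re.sub(r"user \S+", "user <ID>", msg)
    return msg

counts = Counter(to_event_type(line) for line in logs)
for event, n in counts.most_common():
    print(f"{n:3d}  {event}")
```

Once raw lines are reduced to a small vocabulary of event types, downstream tasks such as summarization and leading-indicator mining can operate on event sequences instead of free text.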
Abstract:
This dissertation develops a new figure of merit for measuring the similarity (or dissimilarity) of Gaussian distributions through a novel concept that relates the Fisher distance to the percentage of data overlap. The derivations are expanded to provide a generalized mathematical platform for determining an optimal separating boundary between Gaussian distributions in multiple dimensions. Real-world data used for implementation and feasibility studies were provided by Beckman-Coulter. Although the data are flow cytometric in nature, the mathematics are derived generally and apply to other types of data as long as their statistical behavior approximates Gaussian distributions.

Because this new figure of merit is heavily based on the statistical nature of the data, a new filtering technique is introduced to account for the accumulation process involved in histogram data. When data are accumulated into a frequency histogram, they are inherently smoothed in a linear fashion, since an averaging effect takes place as the histogram is generated. The new filtering scheme addresses data accumulated in the uneven resolution of the channels of the frequency histogram.

The qualitative interpretation of flow cytometric data is currently a time-consuming and imprecise way to evaluate histogram data. The figure of merit derived in this dissertation offers a broader spectrum of capabilities for histogram analysis, since it integrates within its mathematics both a measure of similarity and the percentage of overlap between the distributions under analysis.
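As a numerical companion to the overlap idea (not the dissertation's derivation), the sketch below computes the overlap coefficient of two univariate Gaussians, i.e. the integral of min(f, g), with hypothetical parameters. For two equal-variance Gaussians this quantity has the closed form 2Φ(−|μ₁−μ₂|/(2σ)), which the numeric integral reproduces.

```python
import numpy as np
from scipy.stats import norm

mu1, s1 = 0.0, 1.0   # hypothetical population 1
mu2, s2 = 2.5, 1.2   # hypothetical population 2

# Dense grid covering both distributions out to six standard deviations.
x = np.linspace(min(mu1, mu2) - 6 * max(s1, s2),
                max(mu1, mu2) + 6 * max(s1, s2), 20_001)
f = norm.pdf(x, mu1, s1)
g = norm.pdf(x, mu2, s2)

# Overlap coefficient: the probability mass the two densities share.
overlap = np.sum(np.minimum(f, g)) * (x[1] - x[0])
print(f"overlap = {overlap:.4f}  ({100 * overlap:.1f}% of data in common)")
```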
Abstract:
Microarray technology provides a high-throughput technique for studying gene expression. Microarrays can help us diagnose different types of cancer, understand biological processes, assess host responses to drugs and pathogens, find markers for specific diseases, and much more. Microarray experiments generate large amounts of data, so effective data processing and analysis are critical for making reliable inferences from the data.

The first part of the dissertation addresses the problem of finding an optimal set of genes (biomarkers) to classify a set of samples as diseased or normal. Three statistical gene selection methods (GS, GS-NR, and GS-PCA) were developed to identify the sets of genes that best differentiate between samples. A comparative study of different classification tools was performed, and the best combinations of gene selection methods and classifiers for multi-class cancer classification were identified. For most of the benchmark cancer data sets, the gene selection method proposed in this dissertation, GS, outperformed the other gene selection methods. Classifiers based on Random Forests, neural network ensembles, and K-nearest neighbors (KNN) showed consistently good performance. A striking commonality among these classifiers is that they all use a committee-based approach, suggesting that ensemble classification methods are superior.

The same biological problem may be studied at different research labs and/or with different lab protocols or samples. In such situations, it is important to combine the results of these efforts. The second part of the dissertation addresses the problem of pooling the results from independent experiments to obtain improved results. Four statistical pooling techniques (Fisher's inverse chi-square method, the Logit method, Stouffer's Z-transform method, and the Liptak-Stouffer weighted Z-method) were investigated. These pooling techniques were applied to the problem of identifying cell cycle-regulated genes in two different yeast species, yielding improved sets of cell cycle-regulated genes. The last part of the dissertation explores the effectiveness of wavelet transforms for clustering. Discrete wavelet transforms, with an appropriate choice of wavelet bases, were shown to be effective in producing clusters that were biologically more meaningful.
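Three of the four pooling techniques have standard textbook forms, sketched below with hypothetical p-values and sample sizes; this illustrates the methods themselves, not the dissertation's yeast analysis.

```python
import numpy as np
from scipy import stats

# Hypothetical p-values for one gene from three independent experiments.
p = np.array([0.04, 0.11, 0.02])

# Fisher's inverse chi-square method: -2 * sum(ln p_i) ~ chi2 with 2k df.
chi2_stat = -2 * np.sum(np.log(p))
p_fisher = stats.chi2.sf(chi2_stat, df=2 * len(p))

# Stouffer's Z-transform method: average of probit-transformed p-values.
z = stats.norm.isf(p)                     # one-sided z-scores
p_stouffer = stats.norm.sf(z.sum() / np.sqrt(len(p)))

# Liptak-Stouffer weighted Z-method, e.g. weighting by sample size n_i.
n = np.array([20, 35, 12])                # hypothetical sample sizes
w = np.sqrt(n)
p_weighted = stats.norm.sf((w * z).sum() / np.sqrt((w ** 2).sum()))

print(f"Fisher: {p_fisher:.4f}  Stouffer: {p_stouffer:.4f}  "
      f"weighted: {p_weighted:.4f}")
```

All three combine evidence across experiments into a single p-value per gene; the weighted variant lets larger or more reliable experiments contribute more.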
Abstract:
Rivals may voluntarily share Research and Development (R&D) results even in the absence of any binding agreements or collusion. In a model where rival firms engage in non-cooperative, independent R&D processes, we used optimization and game theory to study the firms' equilibrium strategies. Our work showed that, while minimal spillover is always an equilibrium, there may be another equilibrium in which firms reciprocally choose high, sometimes perfect, spillover rates. The incentive for sharing R&D output is based on firms' expectations of learning from their rivals' R&D progress in the future. This leads to strategic complementarities between the firms' choices of spillover rates, and policy implications follow.

Public research agencies can contribute more to social welfare by providing research as a public good. In a non-cooperative public-private research relationship where parallel R&D is conducted, a public research agency can, by making its R&D results accessible, stimulate private spillovers, even if there is rivalry among the private firms who benefit from them.
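The multiplicity of equilibria can be illustrated with a stylized one-shot spillover game. The payoffs below are hypothetical numbers chosen only so that both minimal and perfect spillover are Nash equilibria; they do not come from the paper's model.

```python
import itertools

ACTIONS = ("low", "high")
PAYOFF = {  # (firm 1 spillover, firm 2 spillover) -> (payoff 1, payoff 2)
    ("low", "low"):   (2, 2),
    ("low", "high"):  (4, 1),
    ("high", "low"):  (1, 4),
    ("high", "high"): (5, 5),
}

def is_nash(a, b):
    u1, u2 = PAYOFF[(a, b)]
    # Neither firm can gain by unilaterally changing its spillover rate.
    return (all(PAYOFF[(a2, b)][0] <= u1 for a2 in ACTIONS) and
            all(PAYOFF[(a, b2)][1] <= u2 for b2 in ACTIONS))

for a, b in itertools.product(ACTIONS, repeat=2):
    if is_nash(a, b):
        print(f"Nash equilibrium: ({a}, {b}) spillover")
```

Running this prints both (low, low) and (high, high): reciprocal sharing can be self-enforcing even though unilateral sharing is not.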
Abstract:
Today, databases have become an integral part of information systems. In the past two decades, we have seen different database systems developed independently and used in different application domains. Today's interconnected networks and advanced applications, such as data warehousing, data mining and knowledge discovery, and intelligent data access to information on the Web, have created a need for integrated access to such heterogeneous, autonomous, distributed database systems. Heterogeneous/multidatabase research has focused on this issue, resulting in many different approaches. However, no single, generally accepted methodology has emerged in academia or industry that provides ubiquitous intelligent data access to heterogeneous, autonomous, distributed information sources.

This thesis describes a heterogeneous database system being developed at the High-performance Database Research Center (HPDRC). A major impediment to the ubiquitous deployment of multidatabase technology is the difficulty of resolving semantic heterogeneity, that is, identifying related information sources for integration and querying purposes. Our approach considers the semantics of the meta-data constructs in resolving this issue. The major contributions of the thesis include: (i) a scalable, easy-to-implement architecture for developing a heterogeneous multidatabase system, utilizing the Semantic Binary Object-oriented Data Model (Sem-ODM) and the Semantic SQL query language to capture the semantics of the data sources being integrated and to provide an easy-to-use query facility; (ii) a methodology for semantic heterogeneity resolution that investigates the extents of the meta-data constructs of component schemas, shown to be correct, complete, and unambiguous; (iii) a semi-automated technique for identifying semantic relations, the basis of semantic knowledge for integration and querying, using shared ontologies for context mediation; (iv) resolutions for schematic conflicts and a language for defining global views from a set of component Sem-ODM schemas; (v) the design of a knowledge base for storing and manipulating meta-data and knowledge acquired during the integration process, which acts as the interface between the integration and query processing modules; (vi) techniques for Semantic SQL query processing and optimization based on semantic knowledge in a heterogeneous database environment; and (vii) a framework for intelligent computing and communication on the Internet applying the concepts of our work.
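As a schematic illustration of contribution (iv), the sketch below defines a global view over two hypothetical component schemas. It uses plain SQL inside SQLite rather than the thesis's Sem-ODM/Semantic SQL constructs, and it resolves only a simple naming conflict; both "databases" live in one in-memory file purely to keep the sketch self-contained.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Two hypothetical component schemas describing the same entity
    -- with different table and column names.
    CREATE TABLE hr_employee  (emp_no INT, full_name TEXT, dept TEXT);
    CREATE TABLE payroll_staff(staff_id INT, name TEXT, salary REAL);

    INSERT INTO hr_employee   VALUES (1, 'Ada Lopez', 'R&D');
    INSERT INTO payroll_staff VALUES (1, 'Ada Lopez', 72000);

    -- A "global view" that resolves the schematic conflict and presents
    -- one integrated schema to the query facility.
    CREATE VIEW global_employee AS
    SELECT h.emp_no AS id, h.full_name AS name, h.dept, p.salary
    FROM hr_employee h JOIN payroll_staff p ON h.emp_no = p.staff_id;
""")
for row in db.execute("SELECT * FROM global_employee"):
    print(row)
```

In the thesis, identifying that hr_employee and payroll_staff refer to the same real-world concept is precisely the semantic heterogeneity problem that the meta-data and ontology machinery addresses.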
Abstract:
Modern power networks incorporate communications and information technology infrastructure into the electrical power system to create a smart grid in terms of control and operation. The smart grid enables real-time communication and control between consumers and utility companies, allowing suppliers to optimize energy usage based on price preferences and system technical issues. The smart grid design aims to provide overall power system monitoring and to create protection and control strategies that maintain system performance, stability, and security.

This dissertation contributed to the development of a novel smart grid test-bed laboratory with integrated monitoring, protection, and control systems. The test-bed was used as a platform to test the smart grid operational ideas developed here; its real-time software implementation creates an environment for studying, implementing, and verifying the novel control and protection schemes developed in this dissertation. Phasor measurement techniques were developed using the available Data Acquisition (DAQ) devices in order to monitor all points in the power system in real time. This provides a practical view of system parameter changes and abnormal conditions, together with stability and security information, and yields valuable measurements for power system operators in energy control centers. Phasor measurement technology is an excellent solution for improving system planning, operation, and energy trading, in addition to enabling advanced applications in Wide Area Monitoring, Protection and Control (WAMPAC).

Moreover, a virtual protection system with integrated functionality for wide-area applications was developed and implemented in the smart grid laboratory. Experiments and procedures were developed to detect abnormal system conditions and apply the proper remedies to heal the system. A DC microgrid was designed and integrated into the AC system with appropriate control capability. This arrangement represents a realistic hybrid AC/DC microgrid connected on the AC side, and it was used to study how such an architecture can help remedy abnormal system conditions. In addition, this dissertation explored the challenges and feasibility of implementing real-time system analysis features to monitor system security and stability measures; these indices were measured experimentally during operation of the developed hybrid AC/DC microgrids. Furthermore, a real-time optimal power flow system was implemented to optimally manage power sharing between AC generators and DC-side resources, and a real-time energy management algorithm for hybrid microgrids was studied to evaluate the use of energy storage resources in mitigating the impact of heavy loads on system stability and operational security.
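A minimal sketch of the phasor-estimation step (not the dissertation's DAQ implementation): it recovers the RMS magnitude and phase angle of a simulated 60 Hz waveform by correlating one cycle of samples with the fundamental. The sampling rate and signal parameters are hypothetical.

```python
import numpy as np

f0 = 60.0                        # nominal system frequency, Hz
n = 64                           # samples per cycle (hypothetical DAQ setting)
t = np.arange(n) / (n * f0)      # one full cycle of sample times

# Simulated measurement: 120 V RMS at a -30 degree angle, plus sensor noise.
rng = np.random.default_rng(42)
v = 120 * np.sqrt(2) * np.cos(2 * np.pi * f0 * t - np.pi / 6)
v += rng.normal(0, 1.0, n)

# Correlate with one cycle of the fundamental to extract the RMS phasor.
phasor = np.sqrt(2) / n * np.sum(v * np.exp(-1j * 2 * np.pi * f0 * t))
print(f"|V| = {abs(phasor):.1f} V rms, "
      f"angle = {np.degrees(np.angle(phasor)):.1f} deg")
```

Streaming such phasors from many measurement points, time-aligned against a common reference, is what gives WAMPAC applications their synchronized wide-area view of the grid.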
Abstract:
As researchers and practitioners move toward a vision of software systems that configure, optimize, protect, and heal themselves, they must also consider the implications of such self-management activities for software reliability. Autonomic computing (AC) describes a new generation of software systems characterized by dynamically adaptive self-management features. During dynamic adaptation, autonomic systems modify their own structure and/or behavior in response to environmental changes. Adaptation can result in new system configurations and capabilities, which need to be validated at runtime to prevent costly system failures. However, although the pioneers of AC recognize that validating autonomic systems is critical to the success of the paradigm, the architectural blueprint for AC does not provide a workflow or supporting design models for runtime testing.

This dissertation presents a novel approach for seamlessly integrating runtime testing into autonomic software. The approach introduces an implicit self-test feature into autonomic software by tailoring the existing self-management infrastructure to runtime testing. Autonomic self-testing facilitates activities such as test execution, code coverage analysis, timed test performance, and post-test evaluation. The approach is supported by automated testing tools and a detailed design methodology. A case study that incorporates self-testing into three autonomic applications is also presented. The findings reveal that autonomic self-testing provides a flexible approach for building safe, reliable autonomic software, while limiting development and performance overhead through software reuse.
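To illustrate the flavor of an implicit self-test feature (this is not the dissertation's framework or tooling), the sketch below shows an adaptive component that runs a hypothetical runtime test suite against each candidate configuration and commits only adaptations that pass.

```python
# A minimal sketch: an autonomic component that validates each dynamic
# reconfiguration at runtime before committing it.
class AutonomicComponent:
    def __init__(self, config):
        self.config = config

    def self_tests(self, candidate):
        """Hypothetical runtime test suite for a candidate configuration."""
        yield ("pool size positive", candidate["pool_size"] > 0)
        yield ("timeout sane", 0 < candidate["timeout_s"] <= 60)

    def adapt(self, candidate):
        # Run the implicit self-test feature; reject the change on failure.
        failures = [name for name, ok in self.self_tests(candidate) if not ok]
        if failures:
            print("adaptation rejected:", ", ".join(failures))
            return False
        self.config = candidate          # commit the validated configuration
        print("adaptation committed:", candidate)
        return True

c = AutonomicComponent({"pool_size": 4, "timeout_s": 10})
c.adapt({"pool_size": 8, "timeout_s": 20})    # passes, committed
c.adapt({"pool_size": 0, "timeout_s": 999})   # fails self-test, rejected
```

The key design point is reuse: the same monitor-and-act loop that drives self-management also drives validation, so testing adds little extra machinery.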
Abstract:
In outsourcing relationships with China, the Electronic Manufacturing (EM) and Information Technology Services (ITS) industry in Taiwan may possess such advantages as the continuing growth of its production value, a complete manufacturing supply chain, low production costs, a large-scale Chinese market, and language and cultural similarity, compared to outsourcing to other countries. Nevertheless, the Council for Economic Planning and Development of the Executive Yuan (CEPD) found that Taiwan's IT services outsourcing to China is subject to certain constraints and might not be as successful as EM outsourcing (Aggarwal, 2003; CEPD, 2004a; CIER, 2003; Einhorn and Kriplani, 2003; Kumar and Zhu, 2006; Li and Gao, 2003; MIC, 2006). Some studies examined this issue but failed to (1) provide statistical evidence about the lower prevalence rates of IT services outsourcing, and (2) clearly explain those lower prevalence rates by identifying similarities and differences between the two outsourcing contexts. This research seeks to fill that gap and to provide potential strategic guidelines to ITS firms in Taiwan.

The study adopts Transaction Cost Economics (TCE) as its theoretical basis. The basic premise is that different types of outsourcing activities may incur differing transaction costs, and realize varying degrees of outsourcing success, owing to differential attributes of the transactions in the outsourcing process. Using primary data gathered from questionnaire surveys of ninety-two firms, the results from exploratory analysis and binary logistic regression indicated that: (1) when outsourcing to China, Taiwanese firms' ITS outsourcing tends to have higher levels of asset specificity, uncertainty, and technical skills relative to EM outsourcing, and these features indirectly reduce firms' outsourcing prevalence rates via their direct positive impacts on transaction costs; (2) Taiwanese firms' ITS outsourcing tends to have a lower level of transaction structurability relative to EM outsourcing, and this feature indirectly increases firms' outsourcing prevalence rates via its direct negative impact on transaction costs; (3) frequency influences firms' transaction costs in ITS outsourcing positively, but has no impact on their outsourcing prevalence rates; (4) relatedness influences firms' transaction costs positively and prevalence rates negatively in ITS outsourcing, but its impact on prevalence rates is not caused by the mediating effect of transaction costs; and (5) the firm size of the outsourcing provider does not affect firms' transaction costs, but does affect their outsourcing prevalence rates in ITS outsourcing directly and positively.

Using primary data gathered from face-to-face interviews with executives from seven firms, the results from inductive analysis indicated that (1) IT services outsourcing has lower prevalence rates than EM outsourcing, and (2) this result is mainly attributable to Taiwan's core competence in manufacturing and management and to the higher overall transaction costs of IT services outsourcing. Specifically, there is not much difference between the two outsourcing contexts in the transaction characteristic of reputation or in most aspects of the overall comparison. Although there are some differences in the firm size of the outsourcing provider, they do not appreciably affect firms' overall transaction costs. The medium-or-greater differences in the transaction characteristics of asset specificity, uncertainty, frequency, technical skills, transaction structurability, and relatedness have caused higher overall transaction costs for IT services outsourcing, which may in turn cause its lower prevalence rates relative to EM outsourcing. Overall, the interview results are consistent with the statistical analyses and support the expectation that, in outsourcing to China, Taiwan's electronic manufacturing firms do have lower prevalence rates of IT services outsourcing relative to EM outsourcing because of higher transaction costs caused by certain attributes. To address this problem, firms' management should identify alternative strategies and strive to reduce the overall transaction costs of IT services outsourcing by initiating strategies that fit their environment and needs.
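For readers unfamiliar with the statistical machinery, the sketch below runs a binary logistic regression of an outsourcing-adoption outcome on three TCE attributes. The data are synthetic and the variable names hypothetical, so it illustrates the method only, not the study's survey results.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 92                          # the study surveyed ninety-two firms
# Hypothetical standardized predictors:
# asset specificity, uncertainty, transaction structurability.
X = rng.normal(size=(n, 3))
# Synthetic outcome generated from an assumed logit relationship.
true_logit = 0.5 - 1.2 * X[:, 0] - 0.8 * X[:, 1] + 0.9 * X[:, 2]
y = (rng.random(n) < 1 / (1 + np.exp(-true_logit))).astype(int)

model = LogisticRegression().fit(X, y)
print("intercept:", model.intercept_.round(2))
print("coefficients (specificity, uncertainty, structurability):",
      model.coef_.round(2))
```

The signs of the fitted coefficients are what carry the substantive story in such a model: negative coefficients depress the predicted probability of outsourcing adoption, positive ones raise it.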
Abstract:
Background: The HIV virus is known for its ability to exploit numerous genetic and evolutionary mechanisms to ensure its proliferation, among them high replication, mutation, and recombination rates. Sliding MinPD, a recently introduced computational method [1], was used to investigate the patterns of evolution of serially-sampled HIV-1 sequence data from eight patients, with a special focus on the emergence of X4 strains. Unlike other phylogenetic methods, Sliding MinPD combines distance-based inference with a nonparametric bootstrap procedure and automated recombination detection to reconstruct the evolutionary history of longitudinal sequence data. We present serial evolutionary networks as a longitudinal representation of the mutational pathways of a viral population in a within-host environment. The longitudinal representation of the evolutionary networks was complemented with charts of clinical markers to facilitate correlation analysis between pertinent clinical information and the evolutionary relationships.

Results: Analysis based on the predicted networks suggests the following: significantly stronger recombination signals (p = 0.003) for the inferred ancestors of the X4 strains; recombination events between different lineages and between putative reservoir virus and virus from a later population; and an early star-like topology observed for four of the patients who died of AIDS. A significantly higher number of recombinants was predicted at sampling points that corresponded to peaks in viral load levels (p = 0.0042).

Conclusion: Our results indicate that serial evolutionary networks of HIV sequences enable systematic statistical analysis of the implicit relations embedded in the topology of the structure, and can greatly facilitate the identification of patterns of evolution that lead to specific hypotheses and new insights. Applying our method to empirical HIV data supports the conventional wisdom behind the new generation of HIV treatments: to keep the virus in check, viral loads need to be suppressed to almost undetectable levels.
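Sliding MinPD itself is not reproduced here, but the nonparametric bootstrap it builds on can be illustrated generically: resample alignment columns with replacement and record how often a relationship recurs. The sequences and replicate count below are hypothetical.

```python
import numpy as np

# Toy alignment of serially-sampled sequences (hypothetical).
alignment = np.array([list(s) for s in [
    "ACGTACGTAC",
    "ACGTACGAAC",
    "ACTTACGAAC",
]])

def p_distance(a, b):
    return np.mean(a != b)            # fraction of mismatched sites

def closest_pair(aln):
    n = aln.shape[0]
    pairs = [(p_distance(aln[i], aln[j]), (i, j))
             for i in range(n) for j in range(i + 1, n)]
    return min(pairs)[1]              # most similar pair under this replicate

rng = np.random.default_rng(1)
n_sites = alignment.shape[1]
votes = {}
for _ in range(200):                  # bootstrap replicates
    cols = rng.integers(0, n_sites, n_sites)   # resample columns w/ replacement
    pair = closest_pair(alignment[:, cols])
    votes[pair] = votes.get(pair, 0) + 1

for pair, v in sorted(votes.items(), key=lambda kv: -kv[1]):
    print(f"sequences {pair}: grouped in {v / 200:.0%} of replicates")
```

High bootstrap support means a grouping survives perturbation of the data, which is the statistical confidence that the serial evolutionary networks report on their edges.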
Abstract:
Background: Biologists often need to assess whether unfamiliar datasets warrant the time investment required for more detailed exploration. Basing such assessments on brief descriptions provided by data publishers is unwieldy for large datasets whose insights depend on specific scientific questions. Alternatively, using complex software systems for a preliminary analysis may itself be deemed too time-consuming, especially for unfamiliar data types and formats. This can lead to wasted analysis time and to the discarding of potentially useful data.

Results: We present an exploration of the design opportunities that the Google Maps interface offers to biomedical data visualization. In particular, we focus on synergies between visualization techniques and Google Maps that facilitate the development of biological visualizations with both low overhead and sufficient expressivity to support the exploration of data at multiple scales. The methods we explore rely on displaying pre-rendered visualizations of biological data in browsers, with sparse yet powerful interactions, using the Google Maps API. We structure our discussion around five visualizations: a gene co-regulation visualization, a heatmap viewer, a genome browser, a protein interaction network, and a planar visualization of white matter in the brain. Feedback from collaborative work with domain experts suggests that our Google Maps visualizations offer multiple, scale-dependent perspectives and can be particularly helpful for unfamiliar datasets because of their accessibility. We also find that users, particularly those less experienced with computers, are attracted by the familiarity of the Google Maps interface. Our five implementations introduce design elements that can benefit visualization developers.

Conclusions: We describe a low-overhead approach that lets biologists access readily analyzed views of unfamiliar scientific datasets. We rely on pre-computed visualizations prepared by data experts, accompanied by sparse and intuitive interactions, and distributed via the familiar Google Maps framework. Our contributions are an evaluation demonstrating the validity and opportunities of this approach, a set of design guidelines for those wanting to create such visualizations, and five concrete example visualizations.
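The pre-rendering step can be sketched as a standard tile-pyramid cut: slice one large pre-rendered visualization into 256x256 tiles per zoom level, the layout the Google Maps API consumes for custom map types. The file names and zoom depth below are hypothetical, and this is a simplification of the paper's pipeline.

```python
import os
from PIL import Image   # assumes the Pillow package is installed

TILE = 256

def cut_tiles(image_path: str, out_dir: str = "tiles", max_zoom: int = 3):
    """Slice a pre-rendered visualization into a map-style tile pyramid."""
    os.makedirs(out_dir, exist_ok=True)
    full = Image.open(image_path)
    for z in range(max_zoom + 1):
        side = TILE * 2 ** z             # the "world" is side x side pixels
        world = full.resize((side, side))
        for ty in range(2 ** z):
            for tx in range(2 ** z):
                box = (tx * TILE, ty * TILE,
                       (tx + 1) * TILE, (ty + 1) * TILE)
                world.crop(box).save(f"{out_dir}/{z}_{tx}_{ty}.png")

# cut_tiles("heatmap.png")   # hypothetical pre-rendered heatmap image
```

Because the browser only ever fetches the handful of tiles in view, even a gigapixel heatmap stays responsive, which is the low-overhead property the paper exploits.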
Abstract:
Software engineering is one of the most widely researched areas of computer science. The ability to reuse software, much like the reuse of hardware components, is one of the key issues in software development. The object-oriented programming methodology is revolutionary in that it promotes software reusability. This thesis describes the development of a tool that helps programmers design and implement software from within the Smalltalk environment (an object-oriented programming environment). The ASDN tool is part of the PEREAM (Programming Environment for the Reuse and Evolution of Abstract Models) system, which advocates incremental development of software. The ASDN tool, along with the PEREAM system, seeks to enhance the Smalltalk programming environment by providing facilities for the structured development of abstractions (concepts). It produces a document that describes the abstractions developed using the tool, and its features are illustrated by an example.
Abstract:
The purpose of this study was to determine the flooding potential of contaminated areas within the White Oak Creek watershed on the Oak Ridge Reservation in Tennessee. The watershed was analyzed with an integrated surface and subsurface numerical model based on the MIKE SHE/MIKE 11 software. The model was calibrated and validated using five decades of historical data. A series of simulations was conducted to determine the watershed's response to 25-year, 100-year, and 500-year precipitation events, and flooding maps were generated for those events. Predicted flood events were compared with Log Pearson Type III flood-flow frequency values for validation. The investigation also provides an improved understanding of the water fluxes between the surface and subsurface subdomains as they affect flood frequencies. In sum, this study presents crucial information for further assessing the environmental risks of potential mobilization of contaminants of concern during extreme precipitation events.
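For reference, a Log Pearson Type III flood-flow frequency estimate can be sketched as follows. The annual peak-flow series is hypothetical, and this is not the study's calibrated MIKE SHE/MIKE 11 model.

```python
import numpy as np
from scipy import stats

# Hypothetical annual peak discharges (consistent flow units).
peaks = np.array([820, 1150, 640, 2100, 980, 1430, 760, 1890, 1220, 905])
logs = np.log10(peaks)

mean, std = logs.mean(), logs.std(ddof=1)
skew = stats.skew(logs, bias=False)       # station skew of the log series

for T in (25, 100, 500):                  # return periods in years
    # Pearson III quantile of the log-flows at non-exceedance prob 1 - 1/T.
    q_log = stats.pearson3.ppf(1 - 1 / T, skew, loc=mean, scale=std)
    print(f"{T:4d}-year flood = {10 ** q_log:,.0f} (same units as peaks)")
```

Comparing simulated flood peaks against such frequency-curve quantiles is the validation step the study describes.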
Abstract:
The objective of this thesis was to investigate the effects of the built environment on the outcomes of young patients. The investigation drew on recent innovations in children's hospitals, integrating both medical and architectural case studies into the design process. In addition, the intervention responded to the man-made conditions and natural elements of the site. The thesis project, a children's rehabilitation hospital, is located at 1500 N.W. River Drive in Miami, Florida. The intervention emerged from a site analysis that focused on the shifting of the urban grid, the variation in scale of the immediate context, and the visual and physical connection to the river's edge. It also addressed overnight accommodation for patients' families, as well as sound control, through the use of specific materials in space enclosures and open courtyards. The key to the success of this intervention lies in the special attention given to the integration of nature and the built environment. Issues such as the incorporation of nature within the building through vistas, and the exploitation of natural light through windows and skylights, were pivotal in creating a pleasant environment for visitors, employees, and young patients.