987 resultados para Computer Sciences Corporation
Resumo:
Spreadsheets are widely used but often contain faults. Thus, in prior work we presented a data-flow testing methodology for use with spreadsheets, which studies have shown can be used cost-effectively by end-user programmers. To date, however, the methodology has been investigated across a limited set of spreadsheet language features. Commercial spreadsheet environments are multiparadigm languages, utilizing features not accommodated by our prior approaches. In addition, most spreadsheets contain large numbers of replicated formulas that severely limit the efficiency of data-flow testing approaches. We show how to handle these two issues with a new data-flow adequacy criterion and automated detection of areas of replicated formulas, and report results of a controlled experiment investigating the feasibility of our approach.
Resumo:
End-user programmers are increasingly relying on web authoring environments to create web applications. Although often consisting primarily of web pages, such applications are increasingly going further, harnessing the content available on the web through “programs” that query other web applications for information to drive other tasks. Unfortunately, errors can be pervasive in web applications, impacting their dependability. This paper reports the results of an exploratory study of end-user web application developers, performed with the aim of exposing prevalent classes of errors. The results suggest that end-users struggle the most with the identification and manipulation of variables when structuring requests to obtain data from other web sites. To address this problem, we present a family of techniques that help end user programmers perform this task, reducing possible sources of error. The techniques focus on simplification and characterization of the data that end-users must analyze while developing their web applications. We report the results of an empirical study in which these techniques are applied to several popular web sites. Our results reveal several potential benefits for end-users who wish to “engineer” dependable web applications.
Resumo:
The multiple-instance learning (MIL) model has been successful in areas such as drug discovery and content-based image-retrieval. Recently, this model was generalized and a corresponding kernel was introduced to learn generalized MIL concepts with a support vector machine. While this kernel enjoyed empirical success, it has limitations in its representation. We extend this kernel by enriching its representation and empirically evaluate our new kernel on data from content-based image retrieval, biological sequence analysis, and drug discovery. We found that our new kernel generalized noticeably better than the old one in content-based image retrieval and biological sequence analysis and was slightly better or even with the old kernel in the other applications, showing that an SVM using this kernel does not overfit despite its richer representation.
Resumo:
Active machine learning algorithms are used when large numbers of unlabeled examples are available and getting labels for them is costly (e.g. requiring consulting a human expert). Many conventional active learning algorithms focus on refining the decision boundary, at the expense of exploring new regions that the current hypothesis misclassifies. We propose a new active learning algorithm that balances such exploration with refining of the decision boundary by dynamically adjusting the probability to explore at each step. Our experimental results demonstrate improved performance on data sets that require extensive exploration while remaining competitive on data sets that do not. Our algorithm also shows significant tolerance of noise.
Resumo:
Due to the lack of optical random access memory, optical fiber delay line (FDL) is currently the only way to implement optical buffering. Feed-forward and feedback are two kinds of FDL structures in optical buffering. Both have advantages and disadvantages. In this paper, we propose a more effective hybrid FDL architecture that combines the merits of both schemes. The core of this switch is the arrayed waveguide grating (AWG) and the tunable wavelength converter (TWC). It requires smaller optical device sizes and fewer wavelengths and has less noise than feedback architecture. At the same time, it can facilitate preemptive priority routing which feed-forward architecture cannot support. Our numerical results show that the new switch architecture significantly reduces packet loss probability.
Resumo:
In this paper, a cross-layer solution for packet size optimization in wireless sensor networks (WSN) is introduced such that the effects of multi-hop routing, the broadcast nature of the physical wireless channel, and the effects of error control techniques are captured. A key result of this paper is that contrary to the conventional wireless networks, in wireless sensor networks, longer packets reduce the collision probability. Consequently, an optimization solution is formalized by using three different objective functions, i.e., packet throughput, energy consumption, and resource utilization. Furthermore, the effects of end-to-end latency and reliability constraints are investigated that may be required by a particular application. As a result, a generic, cross-layer optimization framework is developed to determine the optimal packet size in WSN. This framework is further extended to determine the optimal packet size in underwater and underground sensor networks. From this framework, the optimal packet sizes under various network parameters are determined.
Resumo:
The integration of CMOS cameras with embedded processors and wireless communication devices has enabled the development of distributed wireless vision systems. Wireless Vision Sensor Networks (WVSNs), which consist of wirelessly connected embedded systems with vision and sensing capabilities, provide wide variety of application areas that have not been possible to realize with the wall-powered vision systems with wired links or scalar-data based wireless sensor networks. In this paper, the design of a middleware for a wireless vision sensor node is presented for the realization of WVSNs. The implemented wireless vision sensor node is tested through a simple vision application to study and analyze its capabilities, and determine the challenges in distributed vision applications through a wireless network of low-power embedded devices. The results of this paper highlight the practical concerns for the development of efficient image processing and communication solutions for WVSNs and emphasize the need for cross-layer solutions that unify these two so-far-independent research areas.
Resumo:
The emerging Cyber-Physical Systems (CPSs) are envisioned to integrate computation, communication and control with the physical world. Therefore, CPS requires close interactions between the cyber and physical worlds both in time and space. These interactions are usually governed by events, which occur in the physical world and should autonomously be reflected in the cyber-world, and actions, which are taken by the CPS as a result of detection of events and certain decision mechanisms. Both event detection and action decision operations should be performed accurately and timely to guarantee temporal and spatial correctness. This calls for a flexible architecture and task representation framework to analyze CP operations. In this paper, we explore the temporal and spatial properties of events, define a novel CPS architecture, and develop a layered spatiotemporal event model for CPS. The event is represented as a function of attribute-based, temporal, and spatial event conditions. Moreover, logical operators are used to combine different types of event conditions to capture composite events. To the best of our knowledge, this is the first event model that captures the heterogeneous characteristics of CPS for formal temporal and spatial analysis.
Resumo:
One problem with using component-based software development approach is that once software modules are reused over generations of products, they form legacy structures that can be challenging to understand, making validating these systems difficult. Therefore, tools and methodologies that enable engineers to see interactions of these software modules will enhance their ability to make these software systems more dependable. To address this need, we propose SimSight, a framework to capture dynamic call graphs in Simics, a widely adopted commercial full-system simulator. Simics is a software system that simulates complete computer systems. Thus, it performs nearly identical tasks to a real system but at a much lower speed while providing greater execution observability. We have implemented SimSight to generate dynamic call graphs of statically and dynamically linked functions in x86/Linux environment. A case study illustrates how we can use SimSight to identify sources of software errors. We then evaluate its performance using 12 integer programs from SPEC CPU2006 benchmark suite.
Resumo:
Mashups are becoming increasingly popular as end users are able to easily access, manipulate, and compose data from several web sources. To support end users, communities are forming around mashup development environments that facilitate sharing code and knowledge. We have observed, however, that end user mashups tend to suffer from several deficiencies, such as inoperable components or references to invalid data sources, and that those deficiencies are often propagated through the rampant reuse in these end user communities. In this work, we identify and specify ten code smells indicative of deficiencies we observed in a sample of 8,051 pipe-like web mashups developed by thousands of end users in the popular Yahoo! Pipes environment. We show through an empirical study that end users generally prefer pipes that lack those smells, and then present eleven specialized refactorings that we designed to target and remove the smells. Our refactorings reduce the complexity of pipes, increase their abstraction, update broken sources of data and dated components, and standardize pipes to fit the community development patterns. Our assessment on the sample of mashups shows that smells are present in 81% of the pipes, and that the proposed refactorings can reduce that number to 16%, illustrating the potential of refactoring to support thousands of end users developing pipe-like mashups.
Resumo:
Hundreds of Terabytes of CMS (Compact Muon Solenoid) data are being accumulated for storage day by day at the University of Nebraska-Lincoln, which is one of the eight US CMS Tier-2 sites. Managing this data includes retaining useful CMS data sets and clearing storage space for newly arriving data by deleting less useful data sets. This is an important task that is currently being done manually and it requires a large amount of time. The overall objective of this study was to develop a methodology to help identify the data sets to be deleted when there is a requirement for storage space. CMS data is stored using HDFS (Hadoop Distributed File System). HDFS logs give information regarding file access operations. Hadoop MapReduce was used to feed information in these logs to Support Vector Machines (SVMs), a machine learning algorithm applicable to classification and regression which is used in this Thesis to develop a classifier. Time elapsed in data set classification by this method is dependent on the size of the input HDFS log file since the algorithmic complexities of Hadoop MapReduce algorithms here are O(n). The SVM methodology produces a list of data sets for deletion along with their respective sizes. This methodology was also compared with a heuristic called Retention Cost which was calculated using size of the data set and the time since its last access to help decide how useful a data set is. Accuracies of both were compared by calculating the percentage of data sets predicted for deletion which were accessed at a later instance of time. Our methodology using SVMs proved to be more accurate than using the Retention Cost heuristic. This methodology could be used to solve similar problems involving other large data sets.
Resumo:
Software product line (SPL) engineering offers several advantages in the development of families of software products such as reduced costs, high quality and a short time to market. A software product line is a set of software intensive systems, each of which shares a common core set of functionalities, but also differs from the other products through customization tailored to fit the needs of individual groups of customers. The differences between products within the family are well-understood and organized into a feature model that represents the variability of the SPL. Products can then be built by generating and composing features described in the feature model. Testing of software product lines has become a bottleneck in the SPL development lifecycle, since many of the techniques used in their testing have been borrowed from traditional software testing and do not directly take advantage of the similarities between products. This limits the overall gains that can be achieved in SPL engineering. Recent work proposed by both industry and the research community for improving SPL testing has begun to consider this problem, but there is still a need for better testing techniques that are tailored to SPL development. In this thesis, I make two primary contributions to software product line testing. First I propose a new definition for testability of SPLs that is based on the ability to re-use test cases between products without a loss of fault detection effectiveness. I build on this idea to identify elements of the feature model that contribute positively and/or negatively towards SPL testability. Second, I provide a graph based testing approach called the FIG Basis Path method that selects products and features for testing based on a feature dependency graph. This method should increase our ability to re-use results of test cases across successive products in the family and reduce testing effort. I report the results of a case study involving several non-trivial SPLs and show that for these objects, the FIG Basis Path method is as effective as testing all products, but requires us to test no more than 24% of the products in the SPL.
Resumo:
Observability measures the support of computer systems to accurately capture, analyze, and present (collectively observe) the internal information about the systems. Observability frameworks play important roles for program understanding, troubleshooting, performance diagnosis, and optimizations. However, traditional solutions are either expensive or coarse-grained, consequently compromising their utility in accommodating today’s increasingly complex software systems. New solutions are emerging for VM-based languages due to the full control language VMs have over program executions. Existing such solutions, nonetheless, still lack flexibility, have high overhead, or provide limited context information for developing powerful dynamic analyses. In this thesis, we present a VM-based infrastructure, called marker tracing framework (MTF), to address the deficiencies in the existing solutions for providing better observability for VM-based languages. MTF serves as a solid foundation for implementing fine-grained low-overhead program instrumentation. Specifically, MTF allows analysis clients to: 1) define custom events with rich semantics ; 2) specify precisely the program locations where the events should trigger; and 3) adaptively enable/disable the instrumentation at runtime. In addition, MTF-based analysis clients are more powerful by having access to all information available to the VM. To demonstrate the utility and effectiveness of MTF, we present two analysis clients: 1) dynamic typestate analysis with adaptive online program analysis (AOPA); and 2) selective probabilistic calling context analysis (SPCC). In addition, we evaluate the runtime performance of MTF and the typestate client with the DaCapo benchmarks. The results show that: 1) MTF has acceptable runtime overhead when tracing moderate numbers of marker events; and 2) AOPA is highly effective in reducing the event frequency for the dynamic typestate analysis; and 3) language VMs can be exploited to offer greater observability.
Resumo:
In this paper, we consider the problem of topology design for optical networks. We investigate the problem of selecting switching sites to minimize total cost of the optical network. The cost of an optical network can be expressed as a sum of three main factors: the site cost, the link cost, and the switch cost. To the best of our knowledge, this problem has not been studied in its general form as investigated in this paper. We present a mixed integer quadratic programming (MIQP) formulation of the problem to find the optimal value of the total network cost. We also present an efficient heuristic to approximate the solution in polynomial time. The experimental results show good performance of the heuristic. The value of the total network cost computed by the heuristic varies within 2% to 21% of its optimal value in the experiments with 10 nodes. The total network cost computed by the heuristic for 51% of the experiments with 10 node network topologies varies within 8% of its optimal value. We also discuss the insight gained from our experiments.
Resumo:
Die Forschungsarbeit siedelt sich im Dreieck der Erziehungswissenschaften, der Informatik und der Schulpraxis an und besitzt somit einen starken interdisziplinären Charakter. Aus Sicht der Erziehungswissenschaften handelt es sich um ein Forschungsprojekt aus den Bereichen E-Learning und Multimedia Learning und der Fragestellung nach geeigneten Informatiksystemen für die Herstellung und den Austausch von digitalen, multimedialen und interaktiven Lernbausteinen. Dazu wurden zunächst methodisch-didaktische Vorteile digitaler Lerninhalte gegenüber klassischen Medien wie Buch und Papier zusammengetragen und mögliche Potentiale im Zusammenhang mit neuen Web2.0-Technologien aufgezeigt. Darauf aufbauend wurde für existierende Autorenwerkzeuge zur Herstellung digitaler Lernbausteine und bestehende Austauschplattformen analysiert, inwieweit diese bereits Web 2.0-Technologien unterstützen und nutzen. Aus Sicht der Informatik ergab sich aus der Analyse bestehender Systeme ein Anforderungsprofil für ein neues Autorenwerkzeug und eine neue Austauschplattform für digitale Lernbausteine. Das neue System wurde nach dem Ansatz des Design Science Research in einem iterativen Entwicklungsprozess in Form der Webapplikation LearningApps.org realisiert und stetig mit Lehrpersonen aus der Schulpraxis evaluiert. Bei der Entwicklung kamen aktuelle Web-Technologien zur Anwendung. Das Ergebnis der Forschungsarbeit ist ein produktives Informatiksystem, welches bereits von tausenden Nutzern in verschiedenen Ländern sowohl in Schulen als auch in der Wirtschaft eingesetzt wird. In einer empirischen Studie konnte das mit der Systementwicklung angestrebte Ziel, die Herstellung und den Austausch von digitalen Lernbausteinen zu vereinfachen, bestätigt werden. Aus Sicht der Schulpraxis liefert LearningApps.org einen Beitrag zur Methodenvielfalt und zur Nutzung von ICT im Unterricht. Die Ausrichtung des Werkzeugs auf mobile Endgeräte und 1:1-Computing entspricht dem allgemeinen Trend im Bildungswesen. Durch die Verknüpfung des Werkzeugs mit aktuellen Software Entwicklungen zur Herstellung von digitalen Schulbüchern werden auch Lehrmittelverlage als Zielgruppe angesprochen.