988 resultados para Computer Sciences


Relevância:

60.00% 60.00%

Publicador:

Resumo:

Mashups are becoming increasingly popular as end users are able to easily access, manipulate, and compose data from several web sources. To support end users, communities are forming around mashup development environments that facilitate sharing code and knowledge. We have observed, however, that end user mashups tend to suffer from several deficiencies, such as inoperable components or references to invalid data sources, and that those deficiencies are often propagated through the rampant reuse in these end user communities. In this work, we identify and specify ten code smells indicative of deficiencies we observed in a sample of 8,051 pipe-like web mashups developed by thousands of end users in the popular Yahoo! Pipes environment. We show through an empirical study that end users generally prefer pipes that lack those smells, and then present eleven specialized refactorings that we designed to target and remove the smells. Our refactorings reduce the complexity of pipes, increase their abstraction, update broken sources of data and dated components, and standardize pipes to fit the community development patterns. Our assessment on the sample of mashups shows that smells are present in 81% of the pipes, and that the proposed refactorings can reduce that number to 16%, illustrating the potential of refactoring to support thousands of end users developing pipe-like mashups.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Hundreds of Terabytes of CMS (Compact Muon Solenoid) data are being accumulated for storage day by day at the University of Nebraska-Lincoln, which is one of the eight US CMS Tier-2 sites. Managing this data includes retaining useful CMS data sets and clearing storage space for newly arriving data by deleting less useful data sets. This is an important task that is currently being done manually and it requires a large amount of time. The overall objective of this study was to develop a methodology to help identify the data sets to be deleted when there is a requirement for storage space. CMS data is stored using HDFS (Hadoop Distributed File System). HDFS logs give information regarding file access operations. Hadoop MapReduce was used to feed information in these logs to Support Vector Machines (SVMs), a machine learning algorithm applicable to classification and regression which is used in this Thesis to develop a classifier. Time elapsed in data set classification by this method is dependent on the size of the input HDFS log file since the algorithmic complexities of Hadoop MapReduce algorithms here are O(n). The SVM methodology produces a list of data sets for deletion along with their respective sizes. This methodology was also compared with a heuristic called Retention Cost which was calculated using size of the data set and the time since its last access to help decide how useful a data set is. Accuracies of both were compared by calculating the percentage of data sets predicted for deletion which were accessed at a later instance of time. Our methodology using SVMs proved to be more accurate than using the Retention Cost heuristic. This methodology could be used to solve similar problems involving other large data sets.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Software product line (SPL) engineering offers several advantages in the development of families of software products such as reduced costs, high quality and a short time to market. A software product line is a set of software intensive systems, each of which shares a common core set of functionalities, but also differs from the other products through customization tailored to fit the needs of individual groups of customers. The differences between products within the family are well-understood and organized into a feature model that represents the variability of the SPL. Products can then be built by generating and composing features described in the feature model. Testing of software product lines has become a bottleneck in the SPL development lifecycle, since many of the techniques used in their testing have been borrowed from traditional software testing and do not directly take advantage of the similarities between products. This limits the overall gains that can be achieved in SPL engineering. Recent work proposed by both industry and the research community for improving SPL testing has begun to consider this problem, but there is still a need for better testing techniques that are tailored to SPL development. In this thesis, I make two primary contributions to software product line testing. First I propose a new definition for testability of SPLs that is based on the ability to re-use test cases between products without a loss of fault detection effectiveness. I build on this idea to identify elements of the feature model that contribute positively and/or negatively towards SPL testability. Second, I provide a graph based testing approach called the FIG Basis Path method that selects products and features for testing based on a feature dependency graph. This method should increase our ability to re-use results of test cases across successive products in the family and reduce testing effort. I report the results of a case study involving several non-trivial SPLs and show that for these objects, the FIG Basis Path method is as effective as testing all products, but requires us to test no more than 24% of the products in the SPL.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Observability measures the support of computer systems to accurately capture, analyze, and present (collectively observe) the internal information about the systems. Observability frameworks play important roles for program understanding, troubleshooting, performance diagnosis, and optimizations. However, traditional solutions are either expensive or coarse-grained, consequently compromising their utility in accommodating today’s increasingly complex software systems. New solutions are emerging for VM-based languages due to the full control language VMs have over program executions. Existing such solutions, nonetheless, still lack flexibility, have high overhead, or provide limited context information for developing powerful dynamic analyses. In this thesis, we present a VM-based infrastructure, called marker tracing framework (MTF), to address the deficiencies in the existing solutions for providing better observability for VM-based languages. MTF serves as a solid foundation for implementing fine-grained low-overhead program instrumentation. Specifically, MTF allows analysis clients to: 1) define custom events with rich semantics ; 2) specify precisely the program locations where the events should trigger; and 3) adaptively enable/disable the instrumentation at runtime. In addition, MTF-based analysis clients are more powerful by having access to all information available to the VM. To demonstrate the utility and effectiveness of MTF, we present two analysis clients: 1) dynamic typestate analysis with adaptive online program analysis (AOPA); and 2) selective probabilistic calling context analysis (SPCC). In addition, we evaluate the runtime performance of MTF and the typestate client with the DaCapo benchmarks. The results show that: 1) MTF has acceptable runtime overhead when tracing moderate numbers of marker events; and 2) AOPA is highly effective in reducing the event frequency for the dynamic typestate analysis; and 3) language VMs can be exploited to offer greater observability.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this paper, we consider the problem of topology design for optical networks. We investigate the problem of selecting switching sites to minimize total cost of the optical network. The cost of an optical network can be expressed as a sum of three main factors: the site cost, the link cost, and the switch cost. To the best of our knowledge, this problem has not been studied in its general form as investigated in this paper. We present a mixed integer quadratic programming (MIQP) formulation of the problem to find the optimal value of the total network cost. We also present an efficient heuristic to approximate the solution in polynomial time. The experimental results show good performance of the heuristic. The value of the total network cost computed by the heuristic varies within 2% to 21% of its optimal value in the experiments with 10 nodes. The total network cost computed by the heuristic for 51% of the experiments with 10 node network topologies varies within 8% of its optimal value. We also discuss the insight gained from our experiments.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Die Forschungsarbeit siedelt sich im Dreieck der Erziehungswissenschaften, der Informatik und der Schulpraxis an und besitzt somit einen starken interdisziplinären Charakter. Aus Sicht der Erziehungswissenschaften handelt es sich um ein Forschungsprojekt aus den Bereichen E-Learning und Multimedia Learning und der Fragestellung nach geeigneten Informatiksystemen für die Herstellung und den Austausch von digitalen, multimedialen und interaktiven Lernbausteinen. Dazu wurden zunächst methodisch-didaktische Vorteile digitaler Lerninhalte gegenüber klassischen Medien wie Buch und Papier zusammengetragen und mögliche Potentiale im Zusammenhang mit neuen Web2.0-Technologien aufgezeigt. Darauf aufbauend wurde für existierende Autorenwerkzeuge zur Herstellung digitaler Lernbausteine und bestehende Austauschplattformen analysiert, inwieweit diese bereits Web 2.0-Technologien unterstützen und nutzen. Aus Sicht der Informatik ergab sich aus der Analyse bestehender Systeme ein Anforderungsprofil für ein neues Autorenwerkzeug und eine neue Austauschplattform für digitale Lernbausteine. Das neue System wurde nach dem Ansatz des Design Science Research in einem iterativen Entwicklungsprozess in Form der Webapplikation LearningApps.org realisiert und stetig mit Lehrpersonen aus der Schulpraxis evaluiert. Bei der Entwicklung kamen aktuelle Web-Technologien zur Anwendung. Das Ergebnis der Forschungsarbeit ist ein produktives Informatiksystem, welches bereits von tausenden Nutzern in verschiedenen Ländern sowohl in Schulen als auch in der Wirtschaft eingesetzt wird. In einer empirischen Studie konnte das mit der Systementwicklung angestrebte Ziel, die Herstellung und den Austausch von digitalen Lernbausteinen zu vereinfachen, bestätigt werden. Aus Sicht der Schulpraxis liefert LearningApps.org einen Beitrag zur Methodenvielfalt und zur Nutzung von ICT im Unterricht. Die Ausrichtung des Werkzeugs auf mobile Endgeräte und 1:1-Computing entspricht dem allgemeinen Trend im Bildungswesen. Durch die Verknüpfung des Werkzeugs mit aktuellen Software Entwicklungen zur Herstellung von digitalen Schulbüchern werden auch Lehrmittelverlage als Zielgruppe angesprochen.

Relevância:

60.00% 60.00%

Publicador:

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Simulation is an important resource for researchers in diverse fields. However, many researchers have found flaws in the methodology of published simulation studies and have described the state of the simulation community as being in a crisis of credibility. This work describes the project of the Simulation Automation Framework for Experiments (SAFE), which addresses the issues that undermine credibility by automating the workflow in the execution of simulation studies. Automation reduces the number of opportunities for users to introduce error in the scientific process thereby improvingthe credibility of the final results. Automation also eases the job of simulation users and allows them to focus on the design of models and the analysis of results rather than on the complexities of the workflow.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

The Simulation Automation Framework for Experiments (SAFE) is a project created to raise the level of abstraction in network simulation tools and thereby address issues that undermine credibility. SAFE incorporates best practices in network simulationto automate the experimental process and to guide users in the development of sound scientific studies using the popular ns-3 network simulator. My contributions to the SAFE project: the design of two XML-based languages called NEDL (ns-3 Experiment Description Language) and NSTL (ns-3 Script Templating Language), which facilitate the description of experiments and network simulationmodels, respectively. The languages provide a foundation for the construction of better interfaces between the user and the ns-3 simulator. They also provide input to a mechanism which automates the execution of network simulation experiments. Additionally,this thesis demonstrates that one can develop tools to generate ns-3 scripts in Python or C++ automatically from NSTL model descriptions.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

In this paper we present a new population-based implant design methodology, which advances the state-of-the-art approaches by combining shape and bone quality information into the design strategy. The method enhances the mechanical stability of the fixation and reduces the intra-operative in-plane bending which might impede the functionality of the locking mechanism. The method is presented for the case of mandibular locking fixation plates, where the mandibular angle and the bone quality at screw locations are taken into account. Using computational anatomy techniques, the method automatically derives, from a set of computed tomography images, the mandibular angle and the bone thickness and intensity values at the path of every screw. An optimisation strategy is then used to optimise the two parameters of plate angle and screw position. Results for the new design are presented along with a comparison with a commercially available mandibular locking fixation plate. A statistically highly significant improvement was observed. Our experiments allowed us to conclude that an angle of 126° and a screw separation of 8mm is a more suitable design than the standard 120° and 9mm.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

With the advent of cheaper and faster DNA sequencing technologies, assembly methods have greatly changed. Instead of outputting reads that are thousands of base pairs long, new sequencers parallelize the task by producing read lengths between 35 and 400 base pairs. Reconstructing an organism’s genome from these millions of reads is a computationally expensive task. Our algorithm solves this problem by organizing and indexing the reads using n-grams, which are short, fixed-length DNA sequences of length n. These n-grams are used to efficiently locate putative read joins, thereby eliminating the need to perform an exhaustive search over all possible read pairs. Our goal was develop a novel n-gram method for the assembly of genomes from next-generation sequencers. Specifically, a probabilistic, iterative approach was utilized to determine the most likely reads to join through development of a new metric that models the probability of any two arbitrary reads being joined together. Tests were run using simulated short read data based on randomly created genomes ranging in lengths from 10,000 to 100,000 nucleotides with 16 to 20x coverage. We were able to successfully re-assemble entire genomes up to 100,000 nucleotides in length.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Nitrogen and water are essential for plant growth and development. In this study, we designed experiments to produce gene expression data of poplar roots under nitrogen starvation and water deprivation conditions. We found low concentration of nitrogen led first to increased root elongation followed by lateral root proliferation and eventually increased root biomass. To identify genes regulating root growth and development under nitrogen starvation and water deprivation, we designed a series of data analysis procedures, through which, we have successfully identified biologically important genes. Differentially Expressed Genes (DEGs) analysis identified the genes that are differentially expressed under nitrogen starvation or drought. Protein domain enrichment analysis identified enriched themes (in same domains) that are highly interactive during the treatment. Gene Ontology (GO) enrichment analysis allowed us to identify biological process changed during nitrogen starvation. Based on the above analyses, we examined the local Gene Regulatory Network (GRN) and identified a number of transcription factors. After testing, one of them is a high hierarchically ranked transcription factor that affects root growth under nitrogen starvation. It is very tedious and time-consuming to analyze gene expression data. To avoid doing analysis manually, we attempt to automate a computational pipeline that now can be used for identification of DEGs and protein domain analysis in a single run. It is implemented in scripts of Perl and R.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Effective techniques for organizing and visualizing large image collections are in growing demand as visual search gets increasingly popular. iMap is a treemap representation for visualizing and navigating image search and clustering results based on the evaluation of image similarity using both visual and textual information. iMap not only makes effective use of available display area to arrange images but also maintains stable update when images are inserted or removed during the query. A key challenge of using iMap lies in the difficult to follow and track the changes when updating the image arrangement as the query image changes. For many information visualization applications, showing the transition when interacting with the data is critically important as it can help users better perceive the changes and understand the underlying data. This work investigates the effectiveness of animated transition in a tiled image layout where the spiral arrangement of the images is based on their similarity. Three aspects of animated transition are considered, including animation steps, animation actions, and flying paths. Exploring and weighting the advantages and disadvantages of different methods for each aspect and in conjunction with the characteristics of the spiral image layout, we present an integrated solution, called AniMap, for animating the transition from an old layout to a new layout when a different image is selected as the query image. To smooth the animation and reduce the overlap among images during the transition, we explore different factors that might have an impact on the animation and propose our solution accordingly. We show the effectiveness of our animated transition solution by demonstrating experimental results and conducting a comparative user study.

Relevância:

60.00% 60.00%

Publicador:

Resumo:

Three-dimensional flow visualization plays an essential role in many areas of science and engineering, such as aero- and hydro-dynamical systems which dominate various physical and natural phenomena. For popular methods such as the streamline visualization to be effective, they should capture the underlying flow features while facilitating user observation and understanding of the flow field in a clear manner. My research mainly focuses on the analysis and visualization of flow fields using various techniques, e.g. information-theoretic techniques and graph-based representations. Since the streamline visualization is a popular technique in flow field visualization, how to select good streamlines to capture flow patterns and how to pick good viewpoints to observe flow fields become critical. We treat streamline selection and viewpoint selection as symmetric problems and solve them simultaneously using the dual information channel [81]. To the best of my knowledge, this is the first attempt in flow visualization to combine these two selection problems in a unified approach. This work selects streamline in a view-independent manner and the selected streamlines will not change for all viewpoints. My another work [56] uses an information-theoretic approach to evaluate the importance of each streamline under various sample viewpoints and presents a solution for view-dependent streamline selection that guarantees coherent streamline update when the view changes gradually. When projecting 3D streamlines to 2D images for viewing, occlusion and clutter become inevitable. To address this challenge, we design FlowGraph [57, 58], a novel compound graph representation that organizes field line clusters and spatiotemporal regions hierarchically for occlusion-free and controllable visual exploration. We enable observation and exploration of the relationships among field line clusters, spatiotemporal regions and their interconnection in the transformed space. Most viewpoint selection methods only consider the external viewpoints outside of the flow field. This will not convey a clear observation when the flow field is clutter on the boundary side. Therefore, we propose a new way to explore flow fields by selecting several internal viewpoints around the flow features inside of the flow field and then generating a B-Spline curve path traversing these viewpoints to provide users with closeup views of the flow field for detailed observation of hidden or occluded internal flow features [54]. This work is also extended to deal with unsteady flow fields. Besides flow field visualization, some other topics relevant to visualization also attract my attention. In iGraph [31], we leverage a distributed system along with a tiled display wall to provide users with high-resolution visual analytics of big image and text collections in real time. Developing pedagogical visualization tools forms my other research focus. Since most cryptography algorithms use sophisticated mathematics, it is difficult for beginners to understand both what the algorithm does and how the algorithm does that. Therefore, we develop a set of visualization tools to provide users with an intuitive way to learn and understand these algorithms.