948 resultados para open source
Resumo:
Il focus di questo elaborato è sui sistemi di recommendations e le relative caratteristiche. L'utilizzo di questi meccanism è sempre più forte e presente nel mondo del web, con un parallelo sviluppo di soluzioni sempre più accurate ed efficienti. Tra tutti gli approcci esistenti, si è deciso di prendere in esame quello affrontato in Apache Mahout. Questa libreria open source implementa il collaborative-filtering, basando il processo di recommendation sulle preferenze espresse dagli utenti riguardo ifferenti oggetti. Grazie ad Apache Mahout e ai principi base delle varie tipologie di recommendationè stato possibile realizzare un applicativo web che permette di produrre delle recommendations nell'ambito delle pubblicazioni scientifiche, selezionando quegli articoli che hanno un maggiore similarità con quelli pubblicati dall'utente corrente. La realizzazione di questo progetto ha portato alla definizione di un sistema ibrido. Infatti l'approccio alla recommendation di Apache Mahout non è completamente adattabile a questa situazione, per questo motivo le sue componenti sono state estese e modellate per il caso di studio. Siè cercato quindi di combinare il collaborative filtering e il content-based in un unico approccio. Di Apache Mahout si è mantenuto l'algoritmo attraverso il quale esaminare i dati del data set, tralasciando completamente l'aspetto legato alle preferenze degli utenti, poichè essi non esprimono delle valutazioni sugli articoli. Del content-based si è utilizzata l'idea del confronto tra i titoli delle pubblicazioni. La valutazione di questo applicativo ha portato alla luce diversi limiti, ma anche possibili sviluppi futuri che potrebbero migliorare la qualità delle recommendations, ma soprattuto le prestazioni. Grazie per esempio ad Apache Hadoop sarebbe possibile una computazione distribuita che permetterebbe di elaborare migliaia di dati con dei risultati più che discreti.
Resumo:
La quantità di dati che vengono generati e immagazzinati sta aumentando sempre più grazie alle nuove tecnologie e al numero di utenti sempre maggiore. Questi dati, elaborati correttamente, permettono quindi di ottenere delle informazioni di valore strategico che aiutano nell’effettuare decisioni aziendali a qualsiasi livello, dalla produzione fino al marketing. Sono nati soprattutto negli ultimi anni numerosi framework proprietari e open source che permettono l'elaborazione di questi dati sfruttando un cluster. In particolare tra i più utilizzati e attivi in questo momento a livello open source troviamo Hadoop e Spark. Obiettivo di questa tesi è realizzare un modello di Spark per realizzare una funzione di costo che sia non solo implementabile all’interno dell’ottimizzatore di Spark SQL, ma anche per poter effettuare delle simulazioni di esecuzione di query su tale sistema. Si è quindi studiato nel dettaglio con ducumentazione e test il comportamento del sistema per realizzare un modello. I dati ottenuti sono infine stati confrontati con dati sperimentali ottenuti tramite l'utilizzo di un cluster. Con la presenza di tale modello non solo risulta possibile comprendere in maniera più approfondita il reale comportamento di Spark ma permette anche di programmare applicazioni più efficienti e progettare con maggiore precisione sistemi per la gestione dei dataset che sfruttino tali framework.
Resumo:
In these last years, systems engineering has became one of the major research domains. The complexity of systems has increased constantly and nowadays Cyber-Physical Systems (CPS) are a category of particular interest: these, are systems composed by a cyber part (computer-based algorithms) that monitor and control some physical processes. Their development and simulation are both complex due to the importance of the interaction between the cyber and the physical entities: there are a lot of models written in different languages that need to exchange information among each other. Normally people use an orchestrator that takes care of the simulation of the models and the exchange of informations. This orchestrator is developed manually and this is a tedious and long work. Our proposition is to achieve to generate the orchestrator automatically through the use of Co-Modeling, i.e. by modeling the coordination. Before achieving this ultimate goal, it is important to understand the mechanisms and de facto standards that could be used in a co-modeling framework. So, I studied the use of a technology employed for co-simulation in the industry: FMI. In order to better understand the FMI standard, I realized an automatic export, in the FMI format, of the models realized in an existing software for discrete modeling: TimeSquare. I also developed a simple physical model in the existing open source openmodelica tool. Later, I started to understand how works an orchestrator, developing a simple one: this will be useful in future to generate an orchestrator automatically.
Resumo:
BACKGROUND: Physiologic data display is essential to decision making in critical care. Current displays echo first-generation hemodynamic monitors dating to the 1970s and have not kept pace with new insights into physiology or the needs of clinicians who must make progressively more complex decisions about their patients. The effectiveness of any redesign must be tested before deployment. Tools that compare current displays with novel presentations of processed physiologic data are required. Regenerating conventional physiologic displays from archived physiologic data is an essential first step. OBJECTIVES: The purposes of the study were to (1) describe the SSSI (single sensor single indicator) paradigm that is currently used for physiologic signal displays, (2) identify and discuss possible extensions and enhancements of the SSSI paradigm, and (3) develop a general approach and a software prototype to construct such "extended SSSI displays" from raw data. RESULTS: We present Multi Wave Animator (MWA) framework-a set of open source MATLAB (MathWorks, Inc., Natick, MA, USA) scripts aimed to create dynamic visualizations (eg, video files in AVI format) of patient vital signs recorded from bedside (intensive care unit or operating room) monitors. Multi Wave Animator creates animations in which vital signs are displayed to mimic their appearance on current bedside monitors. The source code of MWA is freely available online together with a detailed tutorial and sample data sets.
Resumo:
Software evolution research has focused mostly on analyzing the evolution of single software systems. However, it is rarely the case that a project exists as standalone, independent of others. Rather, projects exist in parallel within larger contexts in companies, research groups or even the open-source communities. We call these contexts software ecosystems, and on this paper we present The Small Project Observatory, a prototype tool which aims to support the analysis of project ecosystems through interactive visualization and exploration. We present a case-study of exploring an ecosystem using our tool, we describe about the architecture of the tool, and we distill the lessons learned during the tool-building experience.
Resumo:
Software visualizations can provide a concise overview of a complex software system. Unfortunately, as software has no physical shape, there is no `natural' mapping of software to a two-dimensional space. As a consequence most visualizations tend to use a layout in which position and distance have no meaning, and consequently layout typically diverges from one visualization to another. We propose an approach to consistent layout for software visualization, called Software Cartography, in which the position of a software artifact reflects its vocabulary, and distance corresponds to similarity of vocabulary. We use Latent Semantic Indexing (LSI) to map software artifacts to a vector space, and then use Multidimensional Scaling (MDS) to map this vector space down to two dimensions. The resulting consistent layout allows us to develop a variety of thematic software maps that express very different aspects of software while making it easy to compare them. The approach is especially suitable for comparing views of evolving software, as the vocabulary of software artifacts tends to be stable over time. We present a prototype implementation of Software Cartography, and illustrate its use with practical examples from numerous open-source case studies.
Resumo:
Seaside is the open source framework of choice for developing sophisticated and dynamic web applications. Seaside uses the power of objects to master the web. With Seaside web applications is as simple as building desktop applications. Seaside lets you build highly dynamic and interactive web applications. Seaside supports agile development through interactive debugging and unit testing. Seaside is based on Smalltalk, a proven and robust language implemented by different vendors. Seaside is now available for all the major Smalltalk including Pharo, Squeak, GNU Smalltalk, Cincom Smalltalk, GemStone Smalltalk, and VA Smalltalk.
Resumo:
Background The estimation of demographic parameters from genetic data often requires the computation of likelihoods. However, the likelihood function is computationally intractable for many realistic evolutionary models, and the use of Bayesian inference has therefore been limited to very simple models. The situation changed recently with the advent of Approximate Bayesian Computation (ABC) algorithms allowing one to obtain parameter posterior distributions based on simulations not requiring likelihood computations. Results Here we present ABCtoolbox, a series of open source programs to perform Approximate Bayesian Computations (ABC). It implements various ABC algorithms including rejection sampling, MCMC without likelihood, a Particle-based sampler and ABC-GLM. ABCtoolbox is bundled with, but not limited to, a program that allows parameter inference in a population genetics context and the simultaneous use of different types of markers with different ploidy levels. In addition, ABCtoolbox can also interact with most simulation and summary statistics computation programs. The usability of the ABCtoolbox is demonstrated by inferring the evolutionary history of two evolutionary lineages of Microtus arvalis. Using nuclear microsatellites and mitochondrial sequence data in the same estimation procedure enabled us to infer sex-specific population sizes and migration rates and to find that males show smaller population sizes but much higher levels of migration than females. Conclusion ABCtoolbox allows a user to perform all the necessary steps of a full ABC analysis, from parameter sampling from prior distributions, data simulations, computation of summary statistics, estimation of posterior distributions, model choice, validation of the estimation procedure, and visualization of the results.
Resumo:
We recently reported on the Multi Wave Animator (MWA), a novel open-source tool with capability of recreating continuous physiologic signals from archived numerical data and presenting them as they appeared on the patient monitor. In this report, we demonstrate for the first time the power of this technology in a real clinical case, an intraoperative cardiopulmonary arrest following reperfusion of a liver transplant graft. Using the MWA, we animated hemodynamic and ventilator data acquired before, during, and after cardiac arrest and resuscitation. This report is accompanied by an online video that shows the most critical phases of the cardiac arrest and resuscitation and provides a basis for analysis and discussion. This video is extracted from a 33-min, uninterrupted video of cardiac arrest and resuscitation, which is available online. The unique strength of MWA, its capability to accurately present discrete and continuous data in a format familiar to clinicians, allowed us this rare glimpse into events leading to an intraoperative cardiac arrest. Because of the ability to recreate and replay clinical events, this tool should be of great interest to medical educators, researchers, and clinicians involved in quality assurance and patient safety.
Resumo:
BACKGROUND: Despite recent algorithmic and conceptual progress, the stoichiometric network analysis of large metabolic models remains a computationally challenging problem. RESULTS: SNA is a interactive, high performance toolbox for analysing the possible steady state behaviour of metabolic networks by computing the generating and elementary vectors of their flux and conversions cones. It also supports analysing the steady states by linear programming. The toolbox is implemented mainly in Mathematica and returns numerically exact results. It is available under an open source license from: http://bioinformatics.org/project/?group_id=546. CONCLUSION: Thanks to its performance and modular design, SNA is demonstrably useful in analysing genome scale metabolic networks. Further, the integration into Mathematica provides a very flexible environment for the subsequent analysis and interpretation of the results.
Resumo:
Recovering the architecture is the first step towards reengineering a software system. Many reverse engineering tools use top-down exploration as a way of providing a visual and interactive process for architecture recovery. During the exploration process, the user navigates through various views on the system by choosing from several exploration operations. Although some sequences of these operations lead to views which, from the architectural point of view, are mode relevant than others, current tools do not provide a way of predicting which exploration paths are worth taking and which are not. In this article we propose a set of package patterns which are used for augmenting the exploration process with in formation about the worthiness of the various exploration paths. The patterns are defined based on the internal package structure and on the relationships between the package and the other packages in the system. To validate our approach, we verify the relevance of the proposed patterns for real-world systems by analyzing their frequency of occurrence in six open-source software projects.
Resumo:
Genome-wide association studies (GWAS) are used to discover genes underlying complex, heritable disorders for which less powerful study designs have failed in the past. The number of GWAS has skyrocketed recently with findings reported in top journals and the mainstream media. Mircorarrays are the genotype calling technology of choice in GWAS as they permit exploration of more than a million single nucleotide polymorphisms (SNPs)simultaneously. The starting point for the statistical analyses used by GWAS, to determine association between loci and disease, are genotype calls (AA, AB, or BB). However, the raw data, microarray probe intensities, are heavily processed before arriving at these calls. Various sophisticated statistical procedures have been proposed for transforming raw data into genotype calls. We find that variability in microarray output quality across different SNPs, different arrays, and different sample batches has substantial inuence on the accuracy of genotype calls made by existing algorithms. Failure to account for these sources of variability, GWAS run the risk of adversely affecting the quality of reported findings. In this paper we present solutions based on a multi-level mixed model. Software implementation of the method described in this paper is available as free and open source code in the crlmm R/BioConductor.
Resumo:
Submicroscopic changes in chromosomal DNA copy number dosage are common and have been implicated in many heritable diseases and cancers. Recent high-throughput technologies have a resolution that permits the detection of segmental changes in DNA copy number that span thousands of basepairs across the genome. Genome-wide association studies (GWAS) may simultaneously screen for copy number-phenotype and SNP-phenotype associations as part of the analytic strategy. However, genome-wide array analyses are particularly susceptible to batch effects as the logistics of preparing DNA and processing thousands of arrays often involves multiple laboratories and technicians, or changes over calendar time to the reagents and laboratory equipment. Failure to adjust for batch effects can lead to incorrect inference and requires inefficient post-hoc quality control procedures that exclude regions that are associated with batch. Our work extends previous model-based approaches for copy number estimation by explicitly modeling batch effects and using shrinkage to improve locus-specific estimates of copy number uncertainty. Key features of this approach include the use of diallelic genotype calls from experimental data to estimate batch- and locus-specific parameters of background and signal without the requirement of training data. We illustrate these ideas using a study of bipolar disease and a study of chromosome 21 trisomy. The former has batch effects that dominate much of the observed variation in quantile-normalized intensities, while the latter illustrates the robustness of our approach to datasets where as many as 25% of the samples have altered copy number. Locus-specific estimates of copy number can be plotted on the copy-number scale to investigate mosaicism and guide the choice of appropriate downstream approaches for smoothing the copy number as a function of physical position. The software is open source and implemented in the R package CRLMM available at Bioconductor (http:www.bioconductor.org).
Resumo:
The main purpose of this project is to understand the process of engine simulation using the open source CFD code called KIVA. This report mainly discusses the simulation of the 4-valve Pentroof engine through KIVA 3VR2. KIVA is an open source FORTRAN code which is used to solve the fluid flow field in the engines with the transient 2D and 3D chemically reactive flow with spray. It also focuses on the complete procedure to simulate an engine cycle starting from pre- processing until the final results. This report will serve a handbook for the using the KIVA code.
Resumo:
The municipality of San Juan La Laguna, Guatemala is home to approximately 5,200 people and located on the western side of the Lake Atitlán caldera. Steep slopes surround all but the eastern side of San Juan. The Lake Atitlán watershed is susceptible to many natural hazards, but most predictable are the landslides that can occur annually with each rainy season, especially during high-intensity events. Hurricane Stan hit Guatemala in October 2005; the resulting flooding and landslides devastated the Atitlán region. Locations of landslide and non-landslide points were obtained from field observations and orthophotos taken following Hurricane Stan. This study used data from multiple attributes, at every landslide and non-landslide point, and applied different multivariate analyses to optimize a model for landslides prediction during high-intensity precipitation events like Hurricane Stan. The attributes considered in this study are: geology, geomorphology, distance to faults and streams, land use, slope, aspect, curvature, plan curvature, profile curvature and topographic wetness index. The attributes were pre-evaluated for their ability to predict landslides using four different attribute evaluators, all available in the open source data mining software Weka: filtered subset, information gain, gain ratio and chi-squared. Three multivariate algorithms (decision tree J48, logistic regression and BayesNet) were optimized for landslide prediction using different attributes. The following statistical parameters were used to evaluate model accuracy: precision, recall, F measure and area under the receiver operating characteristic (ROC) curve. The algorithm BayesNet yielded the most accurate model and was used to build a probability map of landslide initiation points. The probability map developed in this study was also compared to the results of a bivariate landslide susceptibility analysis conducted for the watershed, encompassing Lake Atitlán and San Juan. Landslides from Tropical Storm Agatha 2010 were used to independently validate this study’s multivariate model and the bivariate model. The ultimate aim of this study is to share the methodology and results with municipal contacts from the author's time as a U.S. Peace Corps volunteer, to facilitate more effective future landslide hazard planning and mitigation.