986 results for Source code


Relevance:

60.00%

Publisher:

Abstract:

Can crowdsourcing solutions serve many masters? Can they benefit both the layman and native speakers of minority languages on the one hand and serious linguistic research on the other? How did an infrastructure that was designed to support linguistics turn out to be a solution for raising awareness of native languages? Since 2012 the National Library of Finland has been developing the Digitisation Project for Kindred Languages, whose key objective is to support a culture of openness and interaction in linguistic research, but also to promote crowdsourcing as a tool for the participation of the language community in research. In the course of the project, over 1,200 monographs and nearly 111,000 pages of newspapers in Finno-Ugric languages will be digitised and made available in the Fenno-Ugrica digital collection. This material was published in the Soviet Union in the 1920s and 1930s, and users have had only sporadic access to it. The publication of open-access, searchable materials from this period is a goldmine for researchers. Historians, social scientists and laymen with an interest in specific local publications can now find text materials pertinent to their studies. The linguistically oriented population can also find writings to delight them: (1) lexical items specific to a given publication, and (2) orthographically documented specifics of phonetics. In addition to the open-access collection, we developed an open-source OCR editor that enables the editing of machine-encoded text for the benefit of linguistic research. This tool was necessary because these rare and peripheral prints often include archaic characters that are neglected by modern OCR software developers but belong to the historical context of the kindred languages and are thus an essential part of the linguistic heritage. When modelling the OCR editor, it was essential to consider the needs of researchers and the capabilities of lay citizens alike, and to have both groups participate in the planning and execution of the project from the very beginning. By implementing the feedback from both groups iteratively, it was possible to turn the requested changes into research tools that not only supported the work of linguists but also encouraged the citizen scientists to take up the challenge and work with the crowdsourcing tools for the benefit of research. This presentation will not only deal with the technical aspects, developments and achievements of the infrastructure, but will also highlight the way in which the user groups, researchers and lay citizens, were engaged as an active and communicative group of users, and how their contributions were turned to mutual benefit.

Relevance:

60.00%

Publisher:

Abstract:

Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014

Relevance:

60.00%

Publisher:

Abstract:

Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014

Relevance:

60.00%

Publisher:

Abstract:

We have developed software called pp-Blast that uses the publicly available Blast package and PVM (Parallel Virtual Machine) to partition a multi-sequence query across a set of nodes with replicated or shared databases. Benchmark tests show that pp-Blast running on a cluster of 14 PCs outperformed conventional Blast running on large servers. In addition, using pp-Blast and the cluster we were able to map all human cDNAs onto the draft of the human genome in less than 6 days. We propose here that the cost/benefit ratio of pp-Blast makes it appropriate for large-scale sequence analysis. The source code and configuration files for pp-Blast are available at http://www.ludwig.org.br/biocomp/tools/pp-blast.
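
pp-Blast itself distributes the query with PVM across cluster nodes; its source is at the URL above. As a minimal illustration of the underlying idea, the following Python sketch partitions a multi-sequence FASTA query and runs one BLAST process per chunk with local multiprocessing instead of PVM. It assumes NCBI BLAST+ (blastn) is installed and that query.fasta and a database named mydb exist; those names are placeholders.

    # Minimal sketch of query partitioning for parallel BLAST (not the pp-Blast source).
    import subprocess
    from multiprocessing import Pool

    def split_fasta(path, n_chunks):
        """Split a multi-sequence FASTA file into n_chunks lists of records."""
        with open(path) as handle:
            records = handle.read().split(">")[1:]   # each record begins with '>'
        chunks = [[] for _ in range(n_chunks)]
        for i, rec in enumerate(records):            # round-robin distribution
            chunks[i % n_chunks].append(">" + rec)
        return chunks

    def run_blast(args):
        idx, chunk = args
        query_file = "chunk_%d.fasta" % idx
        with open(query_file, "w") as out:
            out.writelines(chunk)
        # One BLAST process per chunk; on a cluster, each call would go to a node.
        subprocess.run(["blastn", "-query", query_file, "-db", "mydb",
                        "-out", "chunk_%d.out" % idx], check=True)
        return "chunk_%d.out" % idx

    if __name__ == "__main__":
        chunks = split_fasta("query.fasta", n_chunks=4)
        with Pool(processes=4) as pool:
            result_files = pool.map(run_blast, list(enumerate(chunks)))
        with open("all_hits.out", "w") as merged:    # concatenate per-chunk reports
            for name in result_files:
                with open(name) as part:
                    merged.write(part.read())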

Relevance:

60.00%

Publisher:

Abstract:

The pipeline for macro- and microarray analyses (PMmA) is a set of scripts with a web interface developed to analyze DNA array data generated by array image quantification software. PMmA is designed for use with single- or double-color array data and to work as a pipeline in five classes (data format, normalization, data analysis, clustering, and array maps). It can also be used as a plugin in the BioArray Software Environment, an open-source database for array analysis, or in a local version of the web service. All scripts in PMmA were developed in the Perl programming language, and the statistical analysis functions were implemented in the R statistical language. Consequently, our package is platform-independent software. Our algorithms can correctly select almost 90% of the differentially expressed genes, showing superior performance compared to other analysis methods. The pipeline software has been applied to public macroarray data of 1,536 expressed sequence tags from sugarcane exposed to cold for 3 to 48 h. PMmA identified thirty cold-responsive genes previously unidentified in this public dataset. Fourteen genes were up-regulated, two had variable expression and the other fourteen were down-regulated in the treatments. These new findings were certainly a consequence of using a superior statistical analysis approach, since the original study did not take into account the dependence of data variability on the average signal intensity of each gene. The web interface, supplementary information, and the package source code are available, free, to non-commercial users at http://ipe.cbmeg.unicamp.br/pub/PMmA.
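
The statistical point in the closing sentences, that spot-to-spot variability depends on the average signal intensity, is what intensity-dependent (loess) normalization addresses. PMmA itself is written in Perl and R; the following Python sketch only illustrates the idea on simulated two-channel data, and the smoothing span and simulated intensities are illustrative rather than PMmA defaults.

    # Illustrative intensity-dependent (MA-plot) normalization; not the PMmA source.
    import numpy as np
    from statsmodels.nonparametric.smoothers_lowess import lowess

    def ma_normalize(red, green):
        """Return loess-normalized log-ratios (M) for two channel intensities."""
        m = np.log2(red) - np.log2(green)            # log-ratio per spot
        a = 0.5 * (np.log2(red) + np.log2(green))    # average log-intensity per spot
        trend = lowess(m, a, frac=0.4, return_sorted=False)
        return m - trend                             # remove the intensity-dependent bias

    # Simulated intensities for 1536 spots, mimicking a small macroarray
    rng = np.random.default_rng(0)
    red = rng.lognormal(mean=8.0, sigma=1.0, size=1536)
    green = red * rng.lognormal(mean=0.1, sigma=0.2, size=1536)
    print(ma_normalize(red, green).mean())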

Relevance:

60.00%

Publisher:

Abstract:

The emerging technologies have recently challenged libraries to reconsider their role as mere mediators between collections, researchers, and wider audiences (Sula, 2013), and libraries, especially nationwide institutions like national libraries, have not always managed to face the challenge (Nygren et al., 2014). In the Digitization Project of Kindred Languages, the National Library of Finland has become a node that connects the partners and enables them to interact and work towards shared goals and objectives. In this paper, I will draw a picture of the crowdsourcing methods that have been established during the project to support both linguistic research and lingual diversity. The National Library of Finland has been executing the Digitization Project of Kindred Languages since 2012. The project seeks to digitize and publish approximately 1,200 monograph titles and more than 100 newspaper titles in various, in some cases endangered, Uralic languages. Once the digitization has been completed in 2015, the Fenno-Ugrica online collection will consist of 110,000 monograph pages and around 90,000 newspaper pages, to which all users will have open access regardless of their place of residence. The majority of the digitized literature was originally published in the 1920s and 1930s in the Soviet Union, the genesis and consolidation period of these literary languages. This was the era when many Uralic languages were converted into media of popular education, enlightenment, and dissemination of information pertinent to the developing political agenda of the Soviet state. The 'deluge' of popular literature in the 1920s and 1930s suddenly challenged the lexical and orthographic norms of the limited ecclesiastical publications from the 1880s onward. Newspapers were now written in orthographies and word forms that the locals would understand. Textbooks were written to address the separate needs of both adults and children. New concepts were introduced into the language. This was the beginning of a renaissance and period of enlightenment (Rueter, 2013). The linguistically oriented population can also find writings to their delight, especially lexical items specific to a given publication and orthographically documented specifics of phonetics. The project is financially supported by the Kone Foundation in Helsinki and is part of the Foundation's Language Programme. One of the key objectives of the Kone Foundation Language Programme is to support a culture of openness and interaction in linguistic research, but also to promote citizen science as a tool for the participation of the language community in research. In addition to sharing this aspiration, our objective within the Language Programme is to make sure that old and new corpora in Uralic languages are made available for the open and interactive use of the academic community as well as the language societies. Wordlists are available in 17 languages, but without tokenization, lemmatization, and so on. This approach was verified with the scholars, and we consider the wordlists as raw data for linguists. Our data is used for creating morphological analyzers and online dictionaries at the Universities of Helsinki and Tromsø, for instance. In order to reach these targets, we will produce not only the digitized materials but also development tools to support linguistic research and citizen science. The Digitization Project of Kindred Languages is thus linked with research on language technology.
The mission is to improve the usage and usability of the digitized content. During the project, we have advanced methods that refine the raw data for further use, especially in linguistic research. How does the library meet these objectives, which appear to be beyond its traditional playground? The written materials from this period are a gold mine, so how could we retrieve these hidden treasures of the languages from a stack that contains more than 200,000 pages of literature in various Uralic languages? The problem is that the machine-encoded text (OCR output) often contains too many mistakes to be used as such in research, so the mistakes in the OCRed texts must be corrected. For enhancing the OCRed texts, the National Library of Finland developed an open-source OCR editor that enables the editing of machine-encoded text for the benefit of linguistic research. This tool was necessary to implement, since these rare and peripheral prints often include archaic characters that are sadly neglected by modern OCR software developers but belong to the historical context of the kindred languages and are thus an essential part of the linguistic heritage (van Hemel, 2014). Our crowdsourcing application is essentially an editor for the ALTO XML format. It consists of a back-end for managing users, permissions, and files, communicating through a REST API with a front-end interface, that is, the actual editor for correcting the OCRed text. The enhanced XML files can be retrieved from the Fenno-Ugrica collection for further purposes. Could the crowd do this work to support academic research? The challenge in crowdsourcing lies in its nature. Tasks in traditional crowdsourcing have often been split into several microtasks that require no special skills from the anonymous people, a faceless crowd. This way of crowdsourcing may produce quantitative results, but from the research point of view there is a danger that the needs of linguists are not met. Another remarkable downside is the lack of a shared goal or social affinity: there is no reward in the traditional methods of crowdsourcing (de Boer et al., 2012). There has also been criticism that digital humanities makes the humanities too data-driven and oriented towards quantitative methods, losing the values of critical qualitative methods (Fish, 2012). On top of that, the downsides of traditional crowdsourcing become more evident when one leaves the Anglophone world. Our potential crowd is geographically scattered across Russia. This crowd is linguistically heterogeneous, speaking 17 different languages. In many cases the languages are close to extinction or in need of revitalization, and the native speakers do not always have Internet access, so an open call for crowdsourcing would not have produced satisfying results for linguists. Thus, one has to carefully identify the potential niches to complete the needed tasks. When using the help of a crowd in a project that aims to support both linguistic research and the survival of endangered languages, the approach has to be a different one. In nichesourcing, the tasks are distributed amongst a small crowd of citizen scientists (communities). Although communities provide smaller pools to draw resources from, their specific richness in skill is suited for complex tasks with the high-quality product expectations found in nichesourcing. Communities have a purpose and identity, and their regular interaction engenders social trust and reputation.
These communities can correspond to the needs of research more precisely (de Boer et al., 2012). Instead of repetitive and rather trivial tasks, we are trying to utilize the knowledge and skills of citizen scientists to produce qualitative results. In nichesourcing, we hand out assignments that precisely fill the gaps in linguistic research. A typical task would be editing and collecting words in those fields of vocabulary where researchers require more information. For instance, there is a lack of Hill Mari words and terminology in anatomy. We have digitized books on medicine, and we could track the words related to human organs by assigning citizen scientists to edit and collect them with the OCR editor. From the nichesourcing perspective, it is essential that altruism play a central role when the language communities are involved. In nichesourcing, our goal is to reach a certain level of interplay where the language communities benefit from the results. For instance, the corrected words in Ingrian will be added to an online dictionary, which is made freely available to the public, so society can benefit, too. This objective of interplay can be understood as an aspiration to support the endangered languages and the maintenance of lingual diversity, but also as a servant of 'two masters': research and society.
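
To make the data format concrete, the following Python sketch shows the kind of correction the editor ultimately stores in ALTO XML: replacing the CONTENT attribute of a single String element. The actual Fenno-Ugrica editor is a web application with a REST back-end; this is only an illustration of the format, and the file names, element ID and corrected word below are hypothetical (the namespace URI corresponds to ALTO v2 and may differ for other files).

    # Minimal illustration of an ALTO XML word correction (not the Fenno-Ugrica editor).
    import xml.etree.ElementTree as ET

    ALTO_NS = "http://www.loc.gov/standards/alto/ns-v2#"   # assumed ALTO v2 namespace
    ET.register_namespace("", ALTO_NS)

    def correct_word(alto_path, string_id, corrected, out_path):
        """Replace the CONTENT of one <String> element, leaving the rest of the file intact."""
        tree = ET.parse(alto_path)
        for string_el in tree.iter("{%s}String" % ALTO_NS):
            if string_el.get("ID") == string_id:
                string_el.set("CONTENT", corrected)          # store the corrected word form
                break
        tree.write(out_path, encoding="utf-8", xml_declaration=True)

    # Hypothetical usage: fix one OCR error on a digitized page
    # correct_word("page_0001.xml", "S42", "kirja", "page_0001_corrected.xml")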

Relevance:

60.00%

Publisher:

Abstract:

The Cloud Computing paradigm is continually evolving, and with it the size and complexity of its infrastructure. Assessing the performance of a Cloud environment is an essential but strenuous task. Modeling and simulation tools have proved their usefulness and power in dealing with this issue. This master's thesis contributes to the development of the widely used cloud simulator CloudSim and proposes CloudSimDisk, a module for the modeling and simulation of energy-aware storage in CloudSim. As a starting point, a review of Cloud simulators was conducted and hard disk drive technology was studied in detail. CloudSim was identified as the most popular and sophisticated discrete-event Cloud simulator, and the CloudSimDisk module was therefore developed as an extension of CloudSim v3.0.3. The source code has been published for the research community. The simulation results proved to be in accordance with the analytic models, and the scalability of the module has been demonstrated for further development.
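
The analytic models referred to above are, at their core, sums of power-state contributions over time. The following is a rough, hypothetical sketch of such a model in Python; CloudSimDisk itself is a Java extension of CloudSim, and the power figures below are placeholders rather than values from the thesis.

    # Hypothetical analytic HDD energy model; not CloudSimDisk code.
    ACTIVE_W = 7.5   # watts while reading or writing (assumed)
    IDLE_W = 4.0     # watts while spinning idle (assumed)

    def hdd_energy(active_seconds, idle_seconds):
        """Energy in joules consumed by one drive over a simulated interval."""
        return ACTIVE_W * active_seconds + IDLE_W * idle_seconds

    # e.g. a workload keeping the disk active for 120 s of a 3600 s interval
    print(hdd_energy(120, 3600 - 120), "J")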

Relevance:

60.00%

Publisher:

Abstract:

A quadcopter is a helicopter with four rotors; it is a mechanically simple device but requires complex electronic control of each motor. The control system needs accurate information about the quadcopter's attitude in order to achieve stable flight. The goal of this bachelor's thesis was to research how this information could be obtained. A literature review revealed that most quadcopters whose source code is available use a complementary filter, or some derivative of it, to fuse data from a gyroscope, an accelerometer and often also a magnetometer. These sensors combined are called an Inertial Measurement Unit. This thesis focuses on calculating angles from each sensor's data and fusing these with a complementary filter. On the basis of the literature review and measurements made with a quadcopter, the proposed filter provides sufficiently accurate attitude data for the flight control system. However, a simple complementary filter has one significant drawback: it works reliably only when the quadcopter is hovering or moving at a constant speed. The reason is that an accelerometer cannot be used to measure angles accurately if linear acceleration is present. This problem can be fixed using some derivative of the complementary filter, such as an adaptive complementary filter or a Kalman filter, which are not covered in this thesis.
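
The fusion step itself is a single weighted update per sample: the integrated gyroscope angle is trusted in the short term and the accelerometer angle in the long term. A minimal Python sketch, assuming a 100 Hz loop and an illustrative weight (the thesis's exact coefficients are not reproduced here):

    # Basic complementary filter for one axis (illustrative gains).
    import math

    ALPHA = 0.98   # weight of the integrated gyro angle (assumed)
    DT = 0.01      # sample period in seconds (assumed 100 Hz control loop)

    def complementary_filter(prev_angle, gyro_rate, accel_x, accel_z):
        """Fuse one gyro/accelerometer sample into a pitch-angle estimate (radians)."""
        accel_angle = math.atan2(accel_x, accel_z)   # angle from the gravity vector
        gyro_angle = prev_angle + gyro_rate * DT     # integrated angular rate
        return ALPHA * gyro_angle + (1.0 - ALPHA) * accel_angle

    # Hovering example: a small gyro bias stays bounded thanks to the accelerometer term
    angle = 0.0
    for _ in range(1000):
        angle = complementary_filter(angle, gyro_rate=0.001, accel_x=0.0, accel_z=9.81)
    print(math.degrees(angle))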

Relevance:

60.00%

Publisher:

Abstract:

The vast majority of our contemporary society owns a mobile phone, which has resulted in a dramatic rise in the number of networked computers in recent years. Security issues have followed the same trend, and nearly everyone is now affected by them. How could the situation be improved? For software engineers, an obvious answer is to build software with security in mind. A problem with building secure software is how to define secure software, or how to measure security. This thesis divides the problem into three research questions. First, how can we measure the security of software? Second, what types of tools are available for measuring security? And finally, what do these tools reveal about the security of software? Measuring tools of this kind are commonly called metrics. This thesis takes the perspective of software engineers in the software design phase. The focus on the design phase means that code-level semantics and programming language specifics are not discussed in this work. Organizational policy, management issues and the software development process are also out of scope. The first two research problems were studied using a literature review, while the third was studied using a case study. The target of the case study was a Java-based email server called Apache James, whose changelog, security issue history and source code were available. The research revealed that there is a consensus in the terminology on software security. Security verification activities are commonly divided into evaluation and assurance. The focus of this work was on assurance, which means verifying one's own work. There are 34 metrics available for security measurement, of which five are evaluation metrics and 29 are assurance metrics. We found, however, that the general quality of these metrics was not good. Only three metrics in the design category passed the inspection criteria and could be used in the case study. The metrics claim to give quantitative information on the security of the software, but in practice they were limited to comparing different versions of the same software. Apart from being relative, the metrics were unable to detect security issues or point out problems in the design. Furthermore, interpreting the metrics' results was difficult. In conclusion, the general state of software security metrics leaves a lot to be desired. The metrics studied had both theoretical and practical issues and are not suitable for daily engineering workflows. They nevertheless provide a basis for further research, since they point out the areas in which security metrics need to improve if verification of security from the design phase is desired.

Relevance:

60.00%

Publisher:

Abstract:

The phenomena associated with severe reactor accidents have been studied since the 1980s and are still being studied. These phenomena involve the melting of the reactor core and other in-vessel materials and their reactions with water and steam. It is also important to understand these phenomena and to model their occurrence at operating plants in order to ensure the adequacy of the safety systems. The operating licence of the Olkiluoto 1 and 2 plants will be renewed by 2018. The licensing process involves analyses in which the behaviour of the plants during a severe reactor accident is modelled. For these analyses, Teollisuuden Voima Oyj has used a code called MELCOR since 1994. Several program versions have been used, the latest of which is 1.8.6; it is still sufficient for the analyses related to the upcoming licence renewal project. However, the old MELCOR version 1.8.6 is no longer updated, so migrating to the newer version 2.1 will be necessary in the future. With the latest version update, however, the entire source code of the program has changed, and using the old plant models in the new program version requires conversion of the input files. This thesis presents the features of MELCOR version 2.1 and examines what is required to take plant models created for version 1.8.6 into use in version 2.1. To define these requirements, runs are performed with the plant models in both program versions and with different definitions of the accident's initiating event. Based on the results, the differences between the program versions are assessed, and the deficiencies remaining in the plant models after conversion are considered. On this basis, the further actions required by the conversion are evaluated.

Relevance:

60.00%

Publisher:

Abstract:

The goal of most clustering algorithms is to find the optimal number of clusters (i.e., the fewest clusters). However, analysis of molecular conformations of biological macromolecules obtained from computer simulations may benefit from a larger array of clusters. The Self-Organizing Map (SOM) clustering method has the advantage of generating large numbers of clusters, but often gives ambiguous results. In this work, SOMs have been shown to be reproducible when the same conformational dataset is independently clustered multiple times (~100), with the help of Cramér's V index (C_v). The ability of C_v to determine which SOMs are reproduced is generalizable across different SOM source codes. The conformational ensembles produced from MD (molecular dynamics) and REMD (replica exchange molecular dynamics) simulations of the pentapeptide Met-enkephalin (MET) and the 34-amino-acid protein human parathyroid hormone (hPTH) were used to evaluate SOM reproducibility. The training length of the SOM has a huge impact on reproducibility. Analysis of the MET conformational data definitively determined that toroidal SOMs cluster data better than bordered maps, because toroidal maps have no edge effect. For the MATLAB source code, it was determined that the learning rate function should be LINEAR with an initial learning rate factor of 0.05, and that the SOM should be trained with a sequential algorithm. The trained SOMs can be used for supervised classification of another dataset. The toroidal 10×10 hexagonal SOMs generated by the MATLAB program for the hPTH conformational data produced three sets of reproducible clusters (27%, 15%, and 13% of 100 independent runs), which find partitionings similar to those of smaller 6×6 SOMs. The χ² values produced as part of the C_v calculation were used to locate clusters with identical conformational memberships on independently trained SOMs, even those with different dimensions. The χ² values could relate the different SOM partitionings to each other.
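
The reproducibility measure named above is standard: Cramér's V computed from the contingency table of two cluster assignments of the same conformations, with the χ² statistic as an intermediate quantity. A minimal Python sketch with toy labels (not the MET or hPTH ensembles); it assumes every cluster index up to the maximum occurs at least once.

    # Cramér's V between two clusterings of the same items (1.0 = identical partitioning).
    import numpy as np

    def cramers_v(labels_a, labels_b):
        a, b = np.asarray(labels_a), np.asarray(labels_b)
        table = np.zeros((a.max() + 1, b.max() + 1))
        np.add.at(table, (a, b), 1)                                  # contingency table
        n = table.sum()
        expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
        chi2 = ((table - expected) ** 2 / expected).sum()            # the chi-squared value
        return np.sqrt(chi2 / (n * (min(table.shape) - 1)))

    run1 = [0, 0, 1, 1, 2, 2, 2, 1]
    run2 = [2, 2, 0, 0, 1, 1, 1, 0]   # same partitioning, different cluster labels
    print(cramers_v(run1, run2))       # 1.0: the two runs agree up to relabelling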

Relevance:

60.00%

Publisher:

Abstract:

"Mémoire présenté à la Faculté des études supérieures en vue de l'obtention du grade de LL.M. en droit option droit des technologies de l'information"

Relevance:

60.00%

Publisher:

Abstract:

The use of formal methods is increasingly common in software development, and type systems are the most successful formal method. The advancement of formal methods presents new challenges as well as new opportunities. One of the challenges is to ensure that a compiler preserves the semantics of programs, so that the properties guaranteed about the source code also hold for the executable code. This thesis presents a compiler that translates a higher-order functional language with polymorphism into a typed assembly language, whose main property is that type preservation is verified automatically, by means of type annotations on the compiler's code. Our compiler implements the code transformations essential for a higher-order functional language, namely a CPS conversion, a closure conversion, and code generation. We present the details of the strongly typed representations of the intermediate languages and the constraints they impose on the implementation of the code transformations. Our goal is to guarantee type preservation with a minimum of annotations, and without compromising the general qualities of modularity and readability of the compiler's code. This goal is largely achieved in the treatment of the core features of the language (the "simple types"), in contrast to the treatment of polymorphism, which still requires substantial work to satisfy the type checker.
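
Of the transformations listed, CPS conversion is the easiest to show compactly. The following Python sketch performs an untyped, call-by-value CPS conversion on a toy lambda-calculus term representation; it only illustrates the transformation and is not the thesis's implementation, which works on strongly typed intermediate representations precisely so that type preservation can be checked automatically.

    # Untyped call-by-value CPS conversion for a toy lambda calculus (illustration only).
    import itertools

    _counter = itertools.count()
    def fresh(prefix):
        return "%s%d" % (prefix, next(_counter))

    # Terms: ("var", x) | ("lam", x, body) | ("app", f, arg)
    def cps(term):
        """Return a term that expects a continuation as its single argument."""
        kind = term[0]
        k = fresh("k")
        if kind == "var":
            return ("lam", k, ("app", ("var", k), term))
        if kind == "lam":
            _, x, body = term
            k2 = fresh("k")
            # the converted function takes its original argument plus a continuation
            inner = ("lam", x, ("lam", k2, ("app", cps(body), ("var", k2))))
            return ("lam", k, ("app", ("var", k), inner))
        if kind == "app":
            _, f, arg = term
            fv, av = fresh("f"), fresh("v")
            return ("lam", k,
                    ("app", cps(f),
                     ("lam", fv,
                      ("app", cps(arg),
                       ("lam", av,
                        ("app", ("app", ("var", fv), ("var", av)), ("var", k)))))))
        raise ValueError("unknown term: %r" % (term,))

    # Example: (\x. x) y, converted and ready to be applied to a top-level continuation
    print(cps(("app", ("lam", "x", ("var", "x")), ("var", "y"))))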

Relevance:

60.00%

Publisher:

Abstract:

Nowadays, software must continually evolve and integrate ever more features so as not to become obsolete. This is why maintenance accounts for more than 60% of the cost of a piece of software. To reduce programming costs, features are programmed more quickly, which inevitably lowers quality. Understanding software evolution has therefore become necessary to guarantee a good level of quality and delay the decay of the code. By analysing both the data on code evolution contained in a version control system and the quantitative data we can derive from the code, we are able to better understand how the software evolves. However, the amount of data generated by such an analysis is too large to be studied manually, and automated analysis methods are not very precise. In this thesis, we propose to analyse these data with a semi-automatic method: visualization. Eyes Of Darwin, our 3D visualization system, uses a metaphor of city districts and buildings to visualize the whole evolution of the software in a single view. In addition, it integrates an occlusion-reduction system that turns the user's screen into a window open onto the 3D scene it displays. Finally, this thesis presents an exploratory study that validates our approach.
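
As a purely hypothetical illustration of the city metaphor (the actual Eyes Of Darwin mapping of districts, buildings and occlusion reduction is not reproduced here), one could map each class to a building whose dimensions track evolution data extracted from the version control system:

    # Hypothetical mapping for a city-metaphor view; not the Eyes Of Darwin code.
    from dataclasses import dataclass

    @dataclass
    class Building:
        name: str
        height: float     # e.g. number of commits touching the class (assumed mapping)
        footprint: float  # e.g. lines of code in the latest version (assumed mapping)

    def city_layout(history):
        """history maps class names to (change_count, loc) pairs."""
        return [Building(name, changes, loc) for name, (changes, loc) in history.items()]

    demo = {"Parser": (42, 1200), "Lexer": (7, 300)}
    for building in city_layout(demo):
        print(building)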

Relevance:

60.00%

Publisher:

Abstract:

In industrial software development, specification documents play an important role in the communication between analysts and developers. However, with time, staff changes, and ever shorter deadlines, these documents are often obsolete or inconsistent with the actual state of the system, i.e., its source code. Yet the components of the software system must be kept up to date and consistent with their specification documents to facilitate their development and maintenance and thus reduce their cost. Maintaining consistency between specification and source code requires being able to represent changes to both and being able to apply these changes consistently and automatically. We propose a solution for describing a representation of a piece of software, along with a mathematical formalism for describing and manipulating the evolution of the components of these representations. The formalism is based on Hoare triples to represent the transformations, and on group theory and group homomorphisms to manipulate these transformations and allow their application on the different representations of the system. We illustrate our formalism on two representations of a software system: PADL, a high-level architectural representation (similar to UML), and JCT, a Java-based abstract syntax tree. We also define transformations representing the evolution of these representations, and the transposition that carries transformations over from one representation to the other. Finally, we have developed, and briefly describe, an implementation of our illustration: a plugin for the Eclipse IDE that detects the transformations performed on the code by developers, and a code generator for integrating new representations into the implementation.
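
In rough outline, and with illustrative notation only (the thesis's exact definitions are not reproduced), the formalism can be pictured as follows: each transformation is specified by a Hoare triple, transformations on one representation compose as a group, and transposition to the other representation is a group homomorphism that preserves composition.

    % Illustrative LaTeX sketch, not the thesis's formal definitions.
    % {P} t {Q}: applying transformation t in a state satisfying P establishes Q.
    \[
      \{P\}\; t \;\{Q\},
      \qquad
      \varphi : G_{\mathrm{PADL}} \longrightarrow G_{\mathrm{JCT}},
      \qquad
      \varphi(t_2 \circ t_1) = \varphi(t_2) \circ \varphi(t_1).
    \]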