950 results for Online services using open-source NLP tools
imaxin|software: NLP applied to improving the multilingual communication of companies and institutions
Abstract:
imaxin|software is a company founded in 1997 by four computer engineering graduates with the aim of developing educational multimedia video games and multilingual natural language processing. Seventeen years later, we have developed reference multilingual resources, tools, and applications for several languages: Portuguese (Galicia, Portugal, Brazil, etc.), Spanish (Spain, Argentina, Mexico, etc.), English, Catalan, and French. In this article we describe the main milestones in bringing these NLP technologies to the industrial and institutional sectors.
Abstract:
Open source software (OSS) is growing steadily in popularity, and many OSS systems could be used to preserve cultural heritage objects. Such solutions make it affordable for organizations to develop a digital collection. This paper reviews two OSS tools, CollectionSpace and the Open Video Digital Library Toolkit, and discusses how they could be used to organize digital replicas of cultural objects. The features of the software are presented and some examples are given.
Abstract:
In the early 1990s, companies began to feel the need for better access to information about their activities to support decision-making. As a result, the Business Intelligence (BI) sector emerged in the computing world, initially consisting of data warehousing and reporting tools. Over the years the concept of BI evolved in line with business needs, making the analysis of organizations' activities and performance a critical aspect of their management. BI spans several areas, with reporting and data analysis being the ones that best meet the requirements for controlling access to business information and the related processes. Today, time and information are competitive advantages, and for that reason companies are increasingly concerned that the growth in the volume of information is becoming unsustainable, as the time needed to process it keeps increasing. For this reason many software companies, such as Microsoft, IBM, and Oracle, are fighting for a place in this expanding BI market. For companies to be competitive, the ability to anticipate and respond to market needs in real time is the main requirement, rather than merely reacting, too late, to a need. BI products have a reputation for working only with stored historical data, which means companies cannot rely on these solutions when some businesses require near-real-time operation. The latency introduced by a data warehouse is too high for performance to be acceptable. Business Activity Monitoring (BAM) technology therefore emerged, providing near-real-time data analysis and alerts on business processes, using data sources such as Web Services, message queues, etc.
The concept of BAM was introduced in July 2001 by Gartner as an event-driven extension of BI. BAM is defined as real-time access to business performance indicators with the aim of increasing the speed and effectiveness of business processes. BAM solutions are becoming increasingly common and sophisticated.
Abstract:
The purpose of this paper is to describe the design and development of a digital library at Cochin University of Science and Technology (CUSAT), India, using DSpace open source software. The study covers the structure, contents and usage of the CUSAT digital library. Design/methodology/approach – This paper examines the possibilities of applying open source software in libraries. An evaluative approach is used to explore the features of the CUSAT digital library. The Google Analytics service is employed to measure how much the digital library is used by users across the world. Findings – CUSAT has successfully applied DSpace open source software to build a digital library. The digital library has had visits from 78 countries, with the major share from India. The distribution of documents in the digital library is uneven: past exam question papers make up the major part of the collection, while research papers, articles and rare documents are fewer in number. Originality/value – The study is the first of its type that tries to understand digital library design and development using DSpace open source software in a university environment, with a focus on analyzing the distribution of items and measuring value through usage statistics from the Google Analytics service. The digital library model can be useful for designing similar systems.
Abstract:
Newspapers cover a large amount of information every day on topics of varied interest. To a university, newspapers are essential components of communication, as they cover various happenings at the university. These items of information are neither stored properly nor put into retrieval systems for future use. The news and views that appear in newspapers can be organized effectively in a digital library using open source software. The CUSAT digital library (http://dspace.cusat.ac.in/dspace/) has organized some news items about the university that appeared in local newspapers under a special community named “CUSAT-News”. This article describes the methods of collecting, selecting, organizing, providing access to and preserving the news items required by a university using DSpace open source software.
Abstract:
Advanced neuroinformatics tools are required for methods of connectome mapping, analysis, and visualization. The inherent multi-modality of connectome datasets poses new challenges for data organization, integration, and sharing. We have designed and implemented the Connectome Viewer Toolkit - a set of free and extensible open source neuroimaging tools written in Python. The key components of the toolkit are as follows: (1) The Connectome File Format is an XML-based container format to standardize multi-modal data integration and structured metadata annotation. (2) The Connectome File Format Library enables management and sharing of connectome files. (3) The Connectome Viewer is an integrated research and development environment for visualization and analysis of multi-modal connectome data. The Connectome Viewer's plugin architecture supports extensions with network analysis packages and an interactive scripting shell, to enable easy development and community contributions. Integration with tools from the scientific Python community allows the leveraging of numerous existing libraries for powerful connectome data mining, exploration, and comparison. We demonstrate the applicability of the Connectome Viewer Toolkit using Diffusion MRI datasets processed by the Connectome Mapper. The Connectome Viewer Toolkit is available from http://www.cmtk.org/
Abstract:
Translator training involves the use of procedures and tools that allow students to become familiar with professional contexts. Specialized free software includes professional-quality tools and procedures that are accessible to academic institutions and to distance-learning students working from home. Real projects that use free software and collaborative translation (crowdsourcing) are indispensable resources in translator training.
Abstract:
This project concerns the creation of an online shop with Web 2.0 features, using free software solutions throughout. The solution chosen for implementing the project rules out building everything from scratch, that is, a fully custom-programmed website, and instead adapts a CMS (a program for administering and managing a website's content) to the requirements of our shop.
Abstract:
Organizations across the globe are creating and distributing products that include open source software. To ensure compliance with the open source licenses, each company needs to evaluate exactly what open source licenses and copyrights are included - resulting in duplicated effort and redundancy. This talk will provide an overview of a new Software Package Data Exchange (SPDX) specification. This specification will provide a common format to share information about the open source licenses and copyrights that are included in any software package, with the goal of saving time and improving data accuracy. This talk will review the progress of the initiative; discuss the benefits to organizations using open source and share information on how you can contribute.
Abstract:
A seminar about the advantages of using open source licenses as a complementary strategy to the academic publishing process.
Abstract:
Summary: The objective of this work was to evaluate the sperm motility of 13 Steindachneridion parahybae males using open-source software (ImageJ/CASA plugin). The sperm activation procedure and image capture were initiated after semen collection. Four experimental phases were defined from the videos captured of each male, as follows: (i) standardization of the dialogue box generated by the CASA plugin within ImageJ; (ii) the number of frames used to perform the analysis; (iii) post-activation motility between 10 and 20 s, analyzed every 1 s; and (iv) post-activation motility between 10 and 50 s, analyzed every 10 s. The settings used in the CASA dialogue box were satisfactory, and the results were consistent. These analyses should be performed using 50 frames immediately after sperm activation because spermatozoa quickly lose their vigor. At 10 s post-activation, 89.1% of sperm were motile, with a curvilinear velocity of 107.2 μm s⁻¹, an average path velocity of 83.6 μm s⁻¹, and a straight line velocity of 77.1 μm s⁻¹; straightness was 91.6% and wobble 77.1%. The CASA plugin within ImageJ can be applied to sperm analysis of the study species using the established settings. © 2013 Blackwell Verlag GmbH.
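The straightness and wobble figures above are ratios of the three velocities. As a rough illustration, the sketch below applies the commonly used CASA definitions (STR = VSL/VAP, LIN = VSL/VCL, WOB = VAP/VCL) to the mean velocities quoted in the abstract. The function name `casa_ratios` is hypothetical and this is not the plugin's own code. The ratios of the reported means come out near 92.2% and 78.0%, slightly different from the reported 91.6% and 77.1%; one possible reason, stated here only as an assumption, is that the published percentages were averaged per track rather than computed from the mean velocities.

```python
from typing import Dict


def casa_ratios(vcl: float, vap: float, vsl: float) -> Dict[str, float]:
    """Return STR, LIN and WOB as percentages from three velocities.

    vcl: curvilinear velocity, vap: average path velocity,
    vsl: straight line velocity (all in the same unit, e.g. um/s).
    """
    if vcl <= 0 or vap <= 0:
        raise ValueError("vcl and vap must be positive")
    return {
        "STR": 100.0 * vsl / vap,  # straightness
        "LIN": 100.0 * vsl / vcl,  # linearity
        "WOB": 100.0 * vap / vcl,  # wobble
    }


# Mean velocities at 10 s post-activation, as quoted in the abstract.
ratios = casa_ratios(vcl=107.2, vap=83.6, vsl=77.1)
for name, value in ratios.items():
    print(f"{name}: {value:.1f}%")
```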
Abstract:
A growing number of applications in scientific fields such as bioinformatics and the geosciences are written under the MapReduce model, using open source tools such as Apache Hadoop. This project arises from the need to integrate Hadoop into HPC environments so that applications developed under the MapReduce paradigm can be run there. Two frameworks designed to make this integration easier for developers are analyzed: HoD and myHadoop. The project analyzes both the environments these frameworks offer for running MapReduce applications and the performance of Hadoop clusters generated with HoD or myHadoop compared with a physical Hadoop cluster.
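For readers unfamiliar with the MapReduce model mentioned above, the following is a minimal single-process sketch in plain Python. It is not Hadoop code and does not involve HoD or myHadoop; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the word-count task are illustrative assumptions chosen only to show the three stages of the model.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine the values of each key, here by summing them."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on an in-memory input."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["a b a", "B c"]))
```

In a real cluster the map and reduce functions run in parallel on different nodes and the shuffle moves data between them; here everything runs in one process to keep the structure visible.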
Abstract:
Natural language processing has achieved great success in a wide range of applications, producing both commercial language services and open-source language tools. However, most methods take a static or batch approach, assuming that the model has all the information it needs and makes a one-time prediction. In this dissertation, we study dynamic problems where the input comes in a sequence instead of all at once, and the output must be produced while the input is arriving. In these problems, predictions are often made based only on partial information. We see this dynamic setting in many real-time, interactive applications. These problems usually involve a trade-off between the amount of input received (cost) and the quality of the output prediction (accuracy). Therefore, the evaluation considers both objectives (e.g., plotting a Pareto curve). Our goal is to develop a formal understanding of sequential prediction and decision-making problems in natural language processing and to propose efficient solutions. Toward this end, we present meta-algorithms that take an existing batch model and produce a dynamic model to handle sequential inputs and outputs. We build our framework upon theories of Markov Decision Processes (MDPs), which allow learning to trade off competing objectives in a principled way. The main machine learning techniques we use are from imitation learning and reinforcement learning, and we advance current techniques to tackle problems arising in our settings. We evaluate our algorithm on a variety of applications, including dependency parsing, machine translation, and question answering. We show that our approach achieves a better cost-accuracy trade-off than the batch approach and heuristic-based decision-making approaches.
We first propose a general framework for cost-sensitive prediction, where different parts of the input come at different costs. We formulate a decision-making process that selects pieces of the input sequentially, and the selection is adaptive to each instance. Our approach is evaluated on both standard classification tasks and a structured prediction task (dependency parsing). We show that it achieves similar prediction quality to methods that use all input, while incurring a much smaller cost. Next, we extend the framework to problems where the input is revealed incrementally in a fixed order. We study two applications: simultaneous machine translation and quiz bowl (incremental text classification). We discuss challenges in this setting and show that adding domain knowledge eases the decision-making problem. A central theme throughout the chapters is an MDP formulation of a challenging problem with sequential input/output and trade-off decisions, accompanied by a learning algorithm that solves the MDP.
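To make the cost-accuracy trade-off concrete, here is a minimal sketch of incremental classification with a wait-or-commit decision at each step. It uses a fixed confidence threshold, which is the kind of heuristic rule the abstract contrasts with its learned policies; it is not the dissertation's method and involves no MDP learning. The names `predict_incrementally`, `toy_scorer` and `KEYWORDS`, and the keyword-count scorer itself, are hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical keyword lists standing in for a trained batch classifier.
KEYWORDS: Dict[str, set] = {
    "sports": {"goal", "match"},
    "science": {"atom", "cell"},
}


def toy_scorer(prefix: Sequence[str]) -> Dict[str, float]:
    """Score each label from keyword counts with add-one smoothing."""
    counts = {label: sum(tok in words for tok in prefix)
              for label, words in KEYWORDS.items()}
    total = sum(counts.values()) + len(counts)
    return {label: (count + 1) / total for label, count in counts.items()}


def predict_incrementally(
    tokens: Sequence[str],
    score_prefix: Callable[[Sequence[str]], Dict[str, float]],
    threshold: float,
) -> Tuple[str, int]:
    """Read tokens one at a time; commit once confidence >= threshold.

    Returns (label, number_of_tokens_read). The number of tokens read is
    the cost; if the threshold is never reached, the prediction made on
    the full input is returned.
    """
    if not tokens:
        raise ValueError("tokens must be non-empty")
    label = ""
    for n in range(1, len(tokens) + 1):
        scores = score_prefix(tokens[:n])
        label, confidence = max(scores.items(), key=lambda kv: kv[1])
        if confidence >= threshold:
            return label, n  # commit early
    return label, len(tokens)  # waited for the whole input


if __name__ == "__main__":
    question = ["the", "atom", "and", "the", "cell"]
    for threshold in (0.6, 0.75, 0.9):
        print(threshold, predict_incrementally(question, toy_scorer, threshold))
```

Raising the threshold makes the rule read more input before committing, which is one way to trace out points on a cost-accuracy curve. A learned policy would replace the fixed threshold with a decision that depends on the state.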
Abstract:
[EN] This abstract describes the development of a wildfire forecasting plugin using Capaware. Capaware is designed as an easy-to-use open source framework for developing 3D graphics applications over large geographic areas, offering high-performance 3D visualization and powerful interaction tools for the Geographic Information Systems (GIS) community.