983 resultados para programming language processing
Resumo:
L’augmentation de la croissance des réseaux, des blogs et des utilisateurs des sites d’examen sociaux font d’Internet une énorme source de données, en particulier sur la façon dont les gens pensent, sentent et agissent envers différentes questions. Ces jours-ci, les opinions des gens jouent un rôle important dans la politique, l’industrie, l’éducation, etc. Alors, les gouvernements, les grandes et petites industries, les instituts universitaires, les entreprises et les individus cherchent à étudier des techniques automatiques fin d’extraire les informations dont ils ont besoin dans les larges volumes de données. L’analyse des sentiments est une véritable réponse à ce besoin. Elle est une application de traitement du langage naturel et linguistique informatique qui se compose de techniques de pointe telles que l’apprentissage machine et les modèles de langue pour capturer les évaluations positives, négatives ou neutre, avec ou sans leur force, dans des texte brut. Dans ce mémoire, nous étudions une approche basée sur les cas pour l’analyse des sentiments au niveau des documents. Notre approche basée sur les cas génère un classificateur binaire qui utilise un ensemble de documents classifies, et cinq lexiques de sentiments différents pour extraire la polarité sur les scores correspondants aux commentaires. Puisque l’analyse des sentiments est en soi une tâche dépendante du domaine qui rend le travail difficile et coûteux, nous appliquons une approche «cross domain» en basant notre classificateur sur les six différents domaines au lieu de le limiter à un seul domaine. Pour améliorer la précision de la classification, nous ajoutons la détection de la négation comme une partie de notre algorithme. En outre, pour améliorer la performance de notre approche, quelques modifications innovantes sont appliquées. Il est intéressant de mentionner que notre approche ouvre la voie à nouveaux développements en ajoutant plus de lexiques de sentiment et ensembles de données à l’avenir.
Resumo:
Lappeenrannan teknillinen yliopisto tutkii pientasajännitesähkön käyttöä. Yliopisto on rakennuttanut Järvi-Suomen Energia Oy:n ja Suur-Savon Sähkö Oy:n kanssa yhteistyössä kokeellisen pientasajännitesähköverkon, jolla pystytään tarjoamaan kenttäolosuhteet pienjännitetutkimukselle todellisilla asiakkailla ja todentaa LVDC-teknologiaa ja muita älykkään sähköverkon toimintoja kenttäolosuhteissa. Verkon tasajänniteyhteys on rakennettu 20 kV sähkönjakeluverkon ja neljän kuluttajan välille. 20 kV keskijännite suunnataan tasamuuntamolla ±750 V pientasajännitteeksi ja uudestaan 400/230 V vaihtojännitteeksi kuluttajien läheisyydessä. Tämän kandidaatintyön tarkoituksena on luoda yliopistolle tietokanta pientasajännitesähköverkosta kertyvälle tiedolle ja mittaustuloksille. Tietokanta nähtiin tarpeelliseksi luoda, jotta pienjänniteverkon mittaustuloksia pystytään myöhemmin tarkastelemaan yhdessä ja yhtenäisessä muodossa. Yhdeksi tutkimuskysymykseksi muodostui, kuinka järjestää ja visualisoida kaikki verkosta palvelimille kertyvä mittausdata. Työssä on huomioitu myös kolme tietokantaa mahdollisesti hyödyntävää käyttäjäryhmää: kotitalousasiakkaat, sähköverkkoyhtiöt ja tutkimuslaboratorio, sekä pohdittu tietokannan hyötyä ja merkitystä näille käyttäjille. Toiseksi tutkimuskysymykseksi muodostuikin, mikä kaikesta tietokantaan talletetusta datasta olisi oleellisen tärkeää ottaa talteen näiden asiakkaiden kannalta, ja kuinka nämä voisivat hakea tietoa tietokannasta. Työn tutkimusmenetelmät perustuvat jo valmiiksi olemassa olevaan mittausdataan. Työtä varten on käytetty sekä painettua että sähköisessä muodossa olevaa kirjallisuutta. Työn tuloksena on saatu luotua tietokanta MySQL Workbench -ohjelmistolla, sekä mittausdatan keräys- ja käsittelyohjelmat Python-ohjelmointikielellä. Lisäksi on luotu erillinen MATLAB-rajapinta tiedon visualisoimista varten, jolla havainnollistetaan kolmen asiakasryhmän mittausdataa. Tietokanta ja sen tiedon visualisointi antavat kuluttajalle mahdollisuuden ymmärtää paremmin omaa sähkönkäyttöään, sekä sähköverkkoyhtiöille ja tutkimuslaboratorioille muun muassa tietoa sähkön laadusta ja verkon kuormituksesta.
Resumo:
Thesis (Ph.D.)--University of Washington, 2016-08
Resumo:
Objective: The study was designed to validate use of elec-tronic health records (EHRs) for diagnosing bipolar disorder and classifying control subjects. Method: EHR data were obtained from a health care system of more than 4.6 million patients spanning more than 20 years. Experienced clinicians reviewed charts to identify text features and coded data consistent or inconsistent with a diagnosis of bipolar disorder. Natural language processing was used to train a diagnostic algorithm with 95% specificity for classifying bipolar disorder. Filtered coded data were used to derive three additional classification rules for case subjects and one for control subjects. The positive predictive value (PPV) of EHR-based bipolar disorder and subphenotype di- agnoses was calculated against diagnoses from direct semi- structured interviews of 190 patients by trained clinicians blind to EHR diagnosis. Results: The PPV of bipolar disorder defined by natural language processing was 0.85. Coded classification based on strict filtering achieved a value of 0.79, but classifications based on less stringent criteria performed less well. No EHR- classified control subject received a diagnosis of bipolar dis- order on the basis of direct interview (PPV=1.0). For most subphenotypes, values exceeded 0.80. The EHR-based clas- sifications were used to accrue 4,500 bipolar disorder cases and 5,000 controls for genetic analyses. Conclusions: Semiautomated mining of EHRs can be used to ascertain bipolar disorder patients and control subjects with high specificity and predictive value compared with diagnostic interviews. EHRs provide a powerful resource for high-throughput phenotyping for genetic and clinical research.
Resumo:
The overwhelming amount and unprecedented speed of publication in the biomedical domain make it difficult for life science researchers to acquire and maintain a broad view of the field and gather all information that would be relevant for their research. As a response to this problem, the BioNLP (Biomedical Natural Language Processing) community of researches has emerged and strives to assist life science researchers by developing modern natural language processing (NLP), information extraction (IE) and information retrieval (IR) methods that can be applied at large-scale, to scan the whole publicly available biomedical literature and extract and aggregate the information found within, while automatically normalizing the variability of natural language statements. Among different tasks, biomedical event extraction has received much attention within BioNLP community recently. Biomedical event extraction constitutes the identification of biological processes and interactions described in biomedical literature, and their representation as a set of recursive event structures. The 2009–2013 series of BioNLP Shared Tasks on Event Extraction have given raise to a number of event extraction systems, several of which have been applied at a large scale (the full set of PubMed abstracts and PubMed Central Open Access full text articles), leading to creation of massive biomedical event databases, each of which containing millions of events. Sinece top-ranking event extraction systems are based on machine-learning approach and are trained on the narrow-domain, carefully selected Shared Task training data, their performance drops when being faced with the topically highly varied PubMed and PubMed Central documents. Specifically, false-positive predictions by these systems lead to generation of incorrect biomolecular events which are spotted by the end-users. This thesis proposes a novel post-processing approach, utilizing a combination of supervised and unsupervised learning techniques, that can automatically identify and filter out a considerable proportion of incorrect events from large-scale event databases, thus increasing the general credibility of those databases. The second part of this thesis is dedicated to a system we developed for hypothesis generation from large-scale event databases, which is able to discover novel biomolecular interactions among genes/gene-products. We cast the hypothesis generation problem as a supervised network topology prediction, i.e predicting new edges in the network, as well as types and directions for these edges, utilizing a set of features that can be extracted from large biomedical event networks. Routine machine learning evaluation results, as well as manual evaluation results suggest that the problem is indeed learnable. This work won the Best Paper Award in The 5th International Symposium on Languages in Biology and Medicine (LBM 2013).
Resumo:
Les unités linguistiques sous-lexicales (p.ex., la syllabe, le phonème ou le phone) jouent un rôle crucial dans le traitement langagier. En particulier, le traitement langagier est profondément influencé par la distribution de ces unités. Par exemple, les syllabes les plus fréquentes sont articulées plus rapidement. Il est donc important d’avoir accès à des outils permettant de créer du matériel expérimental ou clinique pour l’étude du langage normal ou pathologique qui soit représentatif de l’utilisation des syllabes et des phones dans la langue orale. L’accès à ce type d’outil permet également de comparer des stimuli langagiers en fonction de leurs statistiques distributionnelles, ou encore d’étudier l’impact de ces statistiques sur le traitement langagier dans différentes populations. Pourtant, jusqu’à ce jour, aucun outil n’était disponible sur l’utilisation des unités linguistiques sous-lexicales du français oral québécois. Afin de combler cette lacune, un vaste corpus du français québécois oral spontané a été élaboré à partir d’enregistrements de 184 locuteurs québécois. Une base de données de syllabes et une base de données de phones ont ensuite été construites à partir de ce corpus, offrant une foule d’informations sur la structure des unités et sur leurs statistiques distributionnelles. Le fruit de ce projet, intitulé SyllabO +, sera rendu disponible en ligne en accès libre via le site web http://speechneurolab.ca/fr/syllabo dès la publication de l’article le décrivant. Cet outil incomparable sera d’une grande utilité dans plusieurs domaines, tels que les neurosciences cognitives, la psycholinguistique, la psychologie expérimentale, la phonétique, la phonologie, l’orthophonie et l’étude de l’acquisition des langues.
Resumo:
This paper introduces the stochastic version of the Geometric Machine Model for the modelling of sequential, alternative, parallel (synchronous) and nondeterministic computations with stochastic numbers stored in a (possibly infinite) shared memory. The programming language L(D! 1), induced by the Coherence Space of Processes D! 1, can be applied to sequential and parallel products in order to provide recursive definitions for such processes, together with a domain-theoretic semantics of the Stochastic Arithmetic. We analyze both the spacial (ordinal) recursion, related to spacial modelling of the stochastic memory, and the temporal (structural) recursion, given by the inclusion relation modelling partial objects in the ordered structure of process construction.
Resumo:
A primary goal of context-aware systems is delivering the right information at the right place and right time to users in order to enable them to make effective decisions and improve their quality of life. There are three key requirements for achieving this goal: determining what information is relevant, personalizing it based on the users’ context (location, preferences, behavioral history etc.), and delivering it to them in a timely manner without an explicit request from them. These requirements create a paradigm that we term as “Proactive Context-aware Computing”. Most of the existing context-aware systems fulfill only a subset of these requirements. Many of these systems focus only on personalization of the requested information based on users’ current context. Moreover, they are often designed for specific domains. In addition, most of the existing systems are reactive - the users request for some information and the system delivers it to them. These systems are not proactive i.e. they cannot anticipate users’ intent and behavior and act proactively without an explicit request from them. In order to overcome these limitations, we need to conduct a deeper analysis and enhance our understanding of context-aware systems that are generic, universal, proactive and applicable to a wide variety of domains. To support this dissertation, we explore several directions. Clearly the most significant sources of information about users today are smartphones. A large amount of users’ context can be acquired through them and they can be used as an effective means to deliver information to users. In addition, social media such as Facebook, Flickr and Foursquare provide a rich and powerful platform to mine users’ interests, preferences and behavioral history. We employ the ubiquity of smartphones and the wealth of information available from social media to address the challenge of building proactive context-aware systems. We have implemented and evaluated a few approaches, including some as part of the Rover framework, to achieve the paradigm of Proactive Context-aware Computing. Rover is a context-aware research platform which has been evolving for the last 6 years. Since location is one of the most important context for users, we have developed ‘Locus’, an indoor localization, tracking and navigation system for multi-story buildings. Other important dimensions of users’ context include the activities that they are engaged in. To this end, we have developed ‘SenseMe’, a system that leverages the smartphone and its multiple sensors in order to perform multidimensional context and activity recognition for users. As part of the ‘SenseMe’ project, we also conducted an exploratory study of privacy, trust, risks and other concerns of users with smart phone based personal sensing systems and applications. To determine what information would be relevant to users’ situations, we have developed ‘TellMe’ - a system that employs a new, flexible and scalable approach based on Natural Language Processing techniques to perform bootstrapped discovery and ranking of relevant information in context-aware systems. In order to personalize the relevant information, we have also developed an algorithm and system for mining a broad range of users’ preferences from their social network profiles and activities. For recommending new information to the users based on their past behavior and context history (such as visited locations, activities and time), we have developed a recommender system and approach for performing multi-dimensional collaborative recommendations using tensor factorization. For timely delivery of personalized and relevant information, it is essential to anticipate and predict users’ behavior. To this end, we have developed a unified infrastructure, within the Rover framework, and implemented several novel approaches and algorithms that employ various contextual features and state of the art machine learning techniques for building diverse behavioral models of users. Examples of generated models include classifying users’ semantic places and mobility states, predicting their availability for accepting calls on smartphones and inferring their device charging behavior. Finally, to enable proactivity in context-aware systems, we have also developed a planning framework based on HTN planning. Together, these works provide a major push in the direction of proactive context-aware computing.
Resumo:
Este artículo presenta el proceso de implementación de una API (Application Programming Interface) que permite la interacción del guante P5 de Essential Reality1 con un entorno virtual desarrollado en el lenguaje de programación Java y su librería Java 3D.2 Por otra parte, se describe un ejemplo implementado, haciendo uso de la API en cuestión. Con base en este ejemplo se presentan los resultados de la ejecución de pruebas de requerimientos de recursos físicos como la CPU y memoria física. Finalmente, se especifican las conclusiones y resultados obtenidos.
Resumo:
This paper draws a parallel between document preparation and the traditional processes of compilation and link editing for computer programs. A block-based document model is described which allows for separate compilation of various portions of a document. These portions are brought together and merged by a linker program, called dlink, whose pilot implementation is based on ditroff and on its underlying intermediate code. In the light of experiences with dlink the requirements for a universal object-module language for documents are discussed. These requirements often resemble the characteristics of the intermediate codes used by programming-language compilers but with interesting extra constraints which arise from the way documents are executed .
Resumo:
In this thesis, tool support is addressed for the combined disciplines of Model-based testing and performance testing. Model-based testing (MBT) utilizes abstract behavioral models to automate test generation, thus decreasing time and cost of test creation. MBT is a functional testing technique, thereby focusing on output, behavior, and functionality. Performance testing, however, is non-functional and is concerned with responsiveness and stability under various load conditions. MBPeT (Model-Based Performance evaluation Tool) is one such tool which utilizes probabilistic models, representing dynamic real-world user behavior patterns, to generate synthetic workload against a System Under Test and in turn carry out performance analysis based on key performance indicators (KPI). Developed at Åbo Akademi University, the MBPeT tool is currently comprised of a downloadable command-line based tool as well as a graphical user interface. The goal of this thesis project is two-fold: 1) to extend the existing MBPeT tool by deploying it as a web-based application, thereby removing the requirement of local installation, and 2) to design a user interface for this web application which will add new user interaction paradigms to the existing feature set of the tool. All phases of the MBPeT process will be realized via this single web deployment location including probabilistic model creation, test configurations, test session execution against a SUT with real-time monitoring of user configurable metric, and final test report generation and display. This web application (MBPeT Dashboard) is implemented with the Java programming language on top of the Vaadin framework for rich internet application development. The Vaadin framework handles the complicated web communications processes and front-end technologies, freeing developers to implement the business logic as well as the user interface in pure Java. A number of experiments are run in a case study environment to validate the functionality of the newly developed Dashboard application as well as the scalability of the solution implemented in handling multiple concurrent users. The results support a successful solution with regards to the functional and performance criteria defined, while improvements and optimizations are suggested to increase both of these factors.
Resumo:
Depuis le milieu des années 2000, une nouvelle approche en apprentissage automatique, l'apprentissage de réseaux profonds (deep learning), gagne en popularité. En effet, cette approche a démontré son efficacité pour résoudre divers problèmes en améliorant les résultats obtenus par d'autres techniques qui étaient considérées alors comme étant l'état de l'art. C'est le cas pour le domaine de la reconnaissance d'objets ainsi que pour la reconnaissance de la parole. Sachant cela, l’utilisation des réseaux profonds dans le domaine du Traitement Automatique du Langage Naturel (TALN, Natural Language Processing) est donc une étape logique à suivre. Cette thèse explore différentes structures de réseaux de neurones dans le but de modéliser le texte écrit, se concentrant sur des modèles simples, puissants et rapides à entraîner.
Resumo:
Socioeconomic status (SES) influences language and cognitive development, with discrepancies particularly noticeable in vocabulary development. This study examines how SES-related differences impact the development of syntactic processing, cognitive inhibition, and word learning. 38 4-5-year-olds from higher- and lower-SES backgrounds completed a word-learning task, in which novel words were embedded in active and passive sentences. Critically, unlike the active sentences, all passive sentences required a syntactic revision. Measures of cognitive inhibition were obtained through a modified Stroop task. Results indicate that lower-SES participants had more difficulty using inhibitory functions to resolve conflict compared to their higher-SES counterparts. However, SES did not impact language processing, as the language outcomes were similar across SES background. Additionally, stronger inhibitory processes were related to better language outcomes in the passive sentence condition. These results suggest that cognitive inhibition impact language processing, but this function may vary across children from different SES backgrounds
Resumo:
A flexible and multipurpose bio-inspired hierarchical model for analyzing musical timbre is presented in this paper. Inspired by findings in the fields of neuroscience, computational neuroscience, and psychoacoustics, not only does the model extract spectral and temporal characteristics of a signal, but it also analyzes amplitude modulations on different timescales. It uses a cochlear filter bank to resolve the spectral components of a sound, lateral inhibition to enhance spectral resolution, and a modulation filter bank to extract the global temporal envelope and roughness of the sound from amplitude modulations. The model was evaluated in three applications. First, it was used to simulate subjective data from two roughness experiments. Second, it was used for musical instrument classification using the k-NN algorithm and a Bayesian network. Third, it was applied to find the features that characterize sounds whose timbres were labeled in an audiovisual experiment. The successful application of the proposed model in these diverse tasks revealed its potential in capturing timbral information.
Resumo:
Este documento descreve o trabalho realizado em conjunto com a empresa MedSUPPORT[1] no desenvolvimento de uma plataforma digital para análise da satisfação dos utentes de unidades de saúde. Atualmente a avaliação de satisfação junto dos seus clientes é um procedimento importante e que deve ser utilizado pelas empresas como mais uma ferramenta de avaliação dos seus produtos ou serviços. Para as unidades de saúde a avaliação da satisfação do utente é atualmente considerada como um objetivo fundamental dos serviços de saúde e tem vindo a ocupar um lugar progressivamente mais importante na avaliação da qualidade dos mesmos. Neste âmbito idealizou-se desenvolver uma plataforma digital para análise da satisfação dos utentes de unidades de saúde. O estudo inicial sobre o conceito da satisfação de consumidores e utentes permitiu consolidar os conceitos associados à temática em estudo. Conhecer as oito dimensões que, de acordo com os investigadores englobam a satisfação do utente é um dos pontos relevantes do estudo inicial. Para avaliar junto do utente a sua satisfação é necessário questiona-lo diretamente. Para efeito desenvolveu-se um inquérito de satisfação estudando cuidadosamente cada um dos elementos que deste fazem parte. No desenvolvimento do inquérito de satisfação foram seguidas as seguintes etapas: Planeamento do questionário, partindo das oito dimensões da satisfação do utente até às métricas que serão avaliadas junto do utente; Análise dos dados a recolher, definindo-se, para cada métrica, se os dados serão nominais, ordinais ou provenientes de escalas balanceadas; Por último a formulação das perguntas do inquérito de satisfação foi alvo de estudo cuidado para garantir que o utente percecione da melhor forma o objetivo da questão. A definição das especificações da plataforma e do questionário passou por diferentes estudos, entre eles uma análise de benchmarking[2], que permitiram definir que o inquérito iv estará localizado numa zona acessível da unidade de saúde, será respondido com recurso a um ecrã táctil (tablet) e que estará alojado na web. As aplicações web desenvolvidas atualmente apresentam um design apelativo e intuitivo. Foi fundamental levar a cabo um estudo do design da aplicação web, como garantia que as cores utilizadas, o tipo de letra, e o local onde a informação são os mais adequados. Para desenvolver a aplicação web foi utilizada a linguagem de programação Ruby, com recurso à framework Ruby on Rails. Para a implementação da aplicação foram estudadas as diferentes tecnologias disponíveis, com enfoque no estudo do sistema de gestão de base de dados a utilizar. O desenvolvimento da aplicação web teve também como objetivo melhorar a gestão da informação gerada pelas respostas ao inquérito de satisfação. O colaborador da MedSUPPORT é o responsável pela gestão da informação pelo que as suas necessidades foram atendidas. Um menu para a gestão da informação é disponibilizado ao administrador da aplicação, colaborador MedSUPPORT. O menu de gestão da informação permitirá uma análise simplificada do estado atual com recurso a um painel do tipo dashboard e, a fim de melhorar a análise interna dos dados terá uma função de exportação dos dados para folha de cálculo. Para validação do estudo efetuado foram realizados os testes de funcionamento à plataforma, tanto à sua funcionalidade como à sua utilização em contexto real pelos utentes inquiridos nas unidades de saúde. Os testes em contexto real objetivaram validar o conceito junto dos utentes inquiridos.